Wikipedia:Authority control integration proposal/RFC
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
This proposal covers a plan to add a large number of VIAF authority control identifiers to English Wikipedia biography articles, using the {{Authority control}} template. After an initial period of data-gathering and testing utilising multiple sources, the template and VIAF parameter will be added or augmented by bot. This plan is being coordinated by Max Klein, the Wikipedian in Residence at OCLC, and Andrew Gray, the Wikipedian in Residence at the British Library.
Video Summary of the proposal
On YouTube.
Summary of the proposal
The proposal was initially discussed on the Village Pump here and has been updated to include the feedback and commentary received during the discussions. While the Village Pump discussion was broadly favourable, it is being formally listed as an RFC in order to ensure clear support from the community before implementation later in 2012.
Authority control is the term used in librarianship, archival practice and related fields for unique identifiers to disambiguate objects (people, places, academic subjects, etc). On Wikipedia, this is handled with the {{authority control}} template, which places the identifiers at the end of the article and links out to library catalogues and central authority databases.
As well as the links for readers, this also embeds information which can be used to help build tools linking back into Wikipedia, or for maintaining its content.
It is widely used on the German Wikipedia (220,000 articles) and on Commons, but only lightly used on the English Wikipedia (4,000 articles). We plan to add a large number of identifiers to the English Wikipedia using data drawn from VIAF and from the German Wikipedia; depending on the level of overlap, this will probably be between 250,000 and 300,000 records. These will predominantly be drawn from the Virtual International Authority File (VIAF), an international project to merge multiple national authority files. VIAF identifiers correspond to identifiers in other systems, and can be used to populate other identifiers in the future.
Using data already embedded within VIAF, as well as on the German Wikipedia, we will identify pairs of corresponding VIAF numbers and articles. After data validation, a bot will add the VIAF number to the article using a reworked version of the {{Authority control}} template.
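The "added or augmented" step can be pictured with a minimal Python sketch. This is not the bot's actual code: the function name `add_viaf` is hypothetical, the parameter is assumed to be spelled `VIAF`, and appending a new template at the very end of the page is a simplification of where it would really be placed.

```python
import re

# Matches an existing {{Authority control ...}} call without nested templates.
TEMPLATE_RE = re.compile(r"\{\{\s*[Aa]uthority control\s*(\|[^{}]*)?\}\}")


def add_viaf(wikitext: str, viaf_id: str) -> str:
    """Return wikitext with the VIAF identifier added or the template augmented.

    Hypothetical helper for illustration only.
    """
    if not viaf_id.isdigit():
        raise ValueError("VIAF identifiers are expected to be numeric")

    match = TEMPLATE_RE.search(wikitext)
    if match is None:
        # No template yet: append one (assumed placement: end of the page).
        return wikitext.rstrip("\n") + "\n{{Authority control|VIAF=" + viaf_id + "}}\n"

    params = match.group(1) or ""
    if re.search(r"\|\s*VIAF\s*=", params):
        # A VIAF value is already present: leave it untouched for human review.
        return wikitext

    # Template exists without VIAF: keep its other identifiers and add ours.
    augmented = "{{Authority control" + params.rstrip() + "|VIAF=" + viaf_id + "}}"
    return wikitext[:match.start()] + augmented + wikitext[match.end():]
```

An article that already carries, say, an LCCN keeps that value and gains the VIAF parameter alongside it; an article with an existing VIAF value is returned unchanged, so any disagreement can be flagged rather than overwritten.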
Frequently asked questions
- How do I add a subject's VIAF to the article about them (or mine to my user page)?
- Use {{Authority control}}.
- Why use VIAF and not another identifier?
- VIAF is a composite of several existing authority control databases, and so includes all the content from many of the other systems. Any entity with, for example, a LCCN should have a corresponding VIAF number as well, but not every entity with a VIAF number will have an LCCN. Adding VIAF does not preclude the inclusion of other identifiers (and may indeed make it easier); this isn't aiming to impose a sole standard.
- Why only people?
- The authority control system does cover other things, but for the moment (written 2013) we are only planning to cover people—this is to simplify the initial program, as well as target the articles where the template is most likely to be useful.
- What about errors in VIAF?
- You can report apparent errors in VIAF (or its constituent catalogues) at Wikipedia:VIAF/errors. These are then available to the relevant managing body, and for linkage repair on-Wiki. For the German equivalent noticeboard, see de:WP:PND/F.
- What about licensing?
- VIAF is licensed as ODC-BY, which is compatible with Wikipedia licensing; the use of a VIAF URI is sufficient attribution for the terms of the license.
- Will this give any control over Wikipedia content to third parties?
- No. While we will be including VIAF identifiers, the content of Wikipedia and VIAF will remain entirely separate. No metadata will be imported automatically from VIAF, nor will Wikipedia need to follow VIAF naming conventions.
- What if editors object to the template or the identifier?
- Editors of specific pages will in all cases be free to remove the metadata where it is inaccurate or felt to be editorially inappropriate. For the purposes of Wikipedia:Sanctions, the first revert of an automated or semi-automated addition of authority control information shall not count as a revert.
- What about pages covering two people?
- There are many cases where a single article deals with two individuals. If two VIAF identifiers refer to the same article, this will be logged but not added to the article; if it currently contains one but not the other, or a mixture of identifiers referring to both, this will also be flagged.
- What about Wikidata?
- Wikidata includes authority identifiers. However, adding the template now allows us to gain the benefit of having this information available before Wikipedia transcludes it from Wikidata; it will also simplify any future work to add these identifiers to Wikidata.
- What about cases where several people have the same name?
- The primary purpose of authority control records is to help distinguish between people with the same (or similar) names. As such, identifiers are usually not matched on the name alone; the software is able to take account of other information such as birth and death dates.
- I wrote a new biographical article, how do I find the VIAF identifier?
- Thank you for contributing to Wikipedia! You can look up a subject's VIAF at http://viaf.org/ Enter their name as the "Search Terms:", and leave the other parameters at their default values. If there are two or more entries with the same name, check the listed works for a match. If you're not sure which to use, you can ask for advice at Wikipedia talk:Authority control.
- I have another question
- Any comments, criticisms, etc. will be gratefully received, again at Wikipedia talk:Authority control.
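Two of the answers above (same-name subjects, and articles covering two people) describe checks that can be sketched in Python. This is an illustration under stated assumptions, not the VIAF matching code: the `Person` record and the functions `same_person` and `triage` are hypothetical names, and the rule "names agree, and at least one known year agrees with none conflicting" is one possible reading of "not matched on the name alone".

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Person(NamedTuple):
    """Hypothetical minimal record for a biography subject."""
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]


def same_person(a: Person, b: Person) -> bool:
    """Match only when names agree AND the dates corroborate the match."""
    if a.name.casefold() != b.name.casefold():
        return False
    year_pairs = [(a.birth_year, b.birth_year), (a.death_year, b.death_year)]
    known = [(x, y) for x, y in year_pairs if x is not None and y is not None]
    # Require at least one year known on both sides, and no disagreement.
    return bool(known) and all(x == y for x, y in known)


def triage(pairs):
    """Split (article, viaf_id) pairs into identifiers to add and cases to log.

    An article matched by exactly one identifier is queued for the bot;
    an article matched by several identifiers is logged instead.
    """
    by_article = defaultdict(set)
    for article, viaf_id in pairs:
        by_article[article].add(viaf_id)
    to_add = {a: next(iter(ids)) for a, ids in by_article.items() if len(ids) == 1}
    to_log = {a: sorted(ids) for a, ids in by_article.items() if len(ids) > 1}
    return to_add, to_log
```

Requiring corroborating dates errs on the side of leaving an article untagged, which fits the cautious handling of ambiguous cases described in the answers.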
Responses
- Please leave feedback or comments below. More general queries can also be left at Wikipedia talk:Authority control integration proposal.
Support
- Tagishsimon (talk) 22:28, 28 June 2012 (UTC)
- DGG ( talk ) 00:45, 29 June 2012 (UTC)
- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:30, 29 June 2012 (UTC)
- Ironholds (talk) 10:46, 29 June 2012 (UTC)
- Nyttend (talk) 13:28, 29 June 2012 (UTC)
- --AndreasPraefcke (talk) 13:42, 29 June 2012 (UTC) Not only helpful for linking out to, but especially for getting linked to from catalogues, scholarly databases and the likes.
- Wer900 • talk • coordinationconsensus defined 16:41, 29 June 2012 (UTC)
- SarekOfVulcan (talk) 19:44, 29 June 2012 (UTC)
- --j⚛e deckertalk 22:31, 29 June 2012 (UTC)
- Imzadi 1979 → 23:02, 29 June 2012 (UTC)
- Sandstein 06:16, 30 June 2012 (UTC)
- --Jarekt (talk) 11:45, 2 July 2012 (UTC)
- Filminfo 15:50, 2 July 2012 (UTC)
- the wub "?!" 16:17, 2 July 2012 (UTC)
- I support this project for having a large benefit, a low risk of harm, for being able to be undone if it is unwanted, and for the attention its coordinators give to addressing the concerns people have for it. This is a great experiment both in terms of incorporating data into Wikipedia and in terms of transparency in doing something new. I appreciate the commitment which project coordinators and participants have shown to making forthright replies to community questions. I have seen no one make a comment or share an idea that makes me think anything other than that this project deserves to proceed. Blue Rasberry (talk) 20:51, 2 July 2012 (UTC)
- Bgwhite (talk) 06:57, 3 July 2012 (UTC)
- Mr impossible (talk) 12:07, 3 July 2012 (UTC) - this already seems to be appearing on Commons and the potential of this improved, linked data is very great.
- sunhai76 (talk) 14:20, 3 July 2012 (UTC)
- Yes please. Specs112 t c 12:36, 3 July 2012 (UTC)
- Some concerns, but outweighed by the benefit. Comments below. LeadSongDog come howl! 13:26, 3 July 2012 (UTC)
- —Ruud 14:06, 3 July 2012 (UTC)
- kosboot (talk) 14:07, 3 July 2012 (UTC)
- Whouk (talk) 14:42, 3 July 2012 (UTC) There might (or might not) be issues down the line with generating Wikipedia content from the established links, but we can discuss that as and when. Sounds like there's a lot of thought gone in and real potential for this to be useful.
- Night of the Big Wind talk 15:43, 3 July 2012 (UTC)
- Gobōnobo + c 20:46, 3 July 2012 (UTC)
- Helps to uniquely identify a person and to link to works by and about him. Edison (talk) 15:25, 4 July 2012 (UTC)
- Easy to support. Whether the template renders or not can be a separate discussion, but that's not enough of a drawback to outweigh the obvious benefits. In general, though, and this is off-topic, when will Wikipedia move from a file-based system to a database system? Embedding this sort of metadata (along with dozens of other pieces of metadata like it) directly into the content page is ridiculous and will have to be resolved with a fundamental change to the way Wikipedia stores, allows changes to and presents content and metadata. Is anybody working on that? I'd like to help but I don't know where to ask about it. Zad68 17:33, 5 July 2012 (UTC)
- I completely understand where opposers who see this as too much metadata on WP articles are coming from. However, the solution needs to be technical (via Wikidata, better rendering, or semantic web enabled browsers), not by omitting this valuable information. -- Gaurav (talk) 22:53, 7 July 2012 (UTC)
- It Is Me Here t / c 17:56, 8 July 2012 (UTC)
- Unlike some meta-data schemes this one seems to have real value and can be implemented fairly easily for a start. Eluchil404 (talk) 08:12, 9 July 2012 (UTC)
- BDD (talk) 15:23, 10 July 2012 (UTC)
- James F. (talk) 18:15, 11 July 2012 (UTC)
- I only see benefits, and can't see any harm. - Jorgath (talk) (contribs) 23:12, 11 July 2012 (UTC)
- Dsp13 (talk) 00:49, 12 July 2012 (UTC)
- This would be profoundly helpful. wholeheartedly support. eldamorie (talk) 20:58, 13 July 2012 (UTC)
- Ijon (talk) 18:00, 17 July 2012 (UTC)
- Useful, far easier to maintain as a template than EL's, and really of higher value than most EL's are now. Courcelles 00:28, 18 July 2012 (UTC)
- Support, as this would be incredibly useful in identifying sources in languages other than English. G. C. Hood (talk) 17:33, 22 July 2012 (UTC)
Oppose
- I am shocked, yet not surprised, yet perplexed, that no one thinks of the readers any more. I haven't seen any mention in discussions to date of why VIAF information needs to be visible in yet another stupid footer box. If you don't think these useless links belong in my comment, perhaps you ought to ask what possible use the common reader of an encyclopedia would have for them. Now even if you had been reading Sigmund Freud instead of this comment, those links in that ugly rectangle at the bottom of his article would be just as uninteresting as they are here. There's a reason that "PERSONDATA" does not render to the article; the same reason applies to "authority control codes". We've already got navboxes, article feedback tools, external links, other navboxes, categories, and more, and this proposal would see another extremely low-value GUI element added to hundreds of thousands of articles. I am not opposed to markup that does not render in the article, although I don't understand why the metadata lovers (which, believe it or not, includes me in other contexts) don't come up with an article subpage proposal for all metadata. There are other groups taking usability quite seriously these days (specifically, ease of editing), and at some point, they and the template formalists will have quite a battle. Riggr Mortis (talk) 09:51, 4 July 2012 (UTC)
- VIAF allows me to get to a number of catalogues of any author's work in a couple of clicks. That is useful to me as a reader, and I presume other readers will find it useful. It follows, in my head, that the central theses of your rant are invalid; that 1. no one has thought of the readers. Not so. 2. That the links are uninteresting and low value. Not so. Let me turn the question around. Why do you think it would be useful and what value would be added to wikipedia by preventing users from being able easily to traverse to a catalogue of the works of an author? --Tagishsimon (talk) 09:59, 4 July 2012 (UTC)
- We have far, far too much on the bottom of pages now, to the point that such templates make some large pages almost impossible to open on slow connections, and very expensive to open on mobile ones. The information is indeed of low value compared to the ever-increasing usability issues we are encountering. (It's all well and good that Westerners with high-speed, relatively inexpensive internet access might perhaps get a tiny bit of value from this; but we're trying to grow outside of our traditional reader base.) As external links, perhaps this is okay, but not as a template. And users should *never* be sanctioned for removing this kind of dross from articles. Risker (talk) 16:15, 4 July 2012 (UTC)
- wp:BuildTheWeb is what it's about. Visual presentation of this info in the page is (nearly) useless, it might just as well be in a subpage except we don't allow them in the mainspace. But where did you see even a suggestion of user sanctions? It is obviously intended to leave it up to the editors of each article to decide on whether these insertions should persist. LeadSongDog come howl! 16:36, 4 July 2012 (UTC)
- One point of the FAQ refers to them to clarify that any revert of the bot which included this explicitly doesn't count as a revert for the purpose of sanctions. (AIUI, it's an added protection from being sanctioned). I'm not sure this point is needed, but it's not an issue I do much work with so I left it in - please feel free to remove it if it's causing ambiguity. Andrew Gray (talk) 19:41, 4 July 2012 (UTC)
- No, it says the *first* revert will not be counted toward sanctions. It says nothing about subsequent reverts, and there have certainly been cases where users have been sanctioned for edit-warring with a bot. Risker (talk) 23:18, 17 July 2012 (UTC)
- I strongly doubt this template will add any significant amount of overhead to the loading time of pages (<<0.1%). The amount of space a particular interface element occupies on your screen is in general completely unrelated to the number of bytes needed to encode that element. —Ruud 19:19, 4 July 2012 (UTC)
- The concern about non-high-speed users is a valid one, but the solution shouldn't be not including valuable information. A low-fi skin of some sort, coupled with some classification of article content types as within or without the scope of the low-fi skin, would solve this problem more generally. Ijon (talk) 18:02, 17 July 2012 (UTC)
- You see what just happened there, Risker? You had us going with the concern about bandwidth thing. But you couldn't help but describe a link giving access to a plethora of third-party author bibliographies as "dross"; because, you know, honesty always wins out. And you know that author bibliographies, such as are curated by, for instance, national libraries, are not dross. Not of interest to you, maybe. But unambiguously and objectively not dross. So, we come away with the impression that the bandwidth thing was just a proxy for your general dislike for this sort of info. It really would be easier for all concerned if you'd step up to the plate, and, like Riggr Mortis, above, tell us: Why do you think it would be useful and what value would be added to wikipedia by preventing users from being able easily to traverse to a catalogue of the works of an author? --Tagishsimon (talk) 22:59, 17 July 2012 (UTC)
- Tagishsimon, what's the problem with an external link? Seriously, adding it as a template of any kind *is* dross, when we have other equally effective solutions that are more respectful to our users. The information may be useful, but the process by which we provide access is punitive to the audience that has the strongest need for it. And exactly how is this going to fit with Wikidata? Why are we adding this separately? Why is this not part of the Wikidata collaboration? Risker (talk) 23:08, 17 July 2012 (UTC)
- So let me get this straight. You're okay with the content, but you're arguing the toss over whether the content should be inserted as a template or a plain-text EL? (Let's get Wikidata out of the way first: it's not IMO a good reason to halt everything whilst we wait for Wikidata to catch up. I anticipate AC will integrate with Wikidata exactly as any other structured data within articles.)
- As to ELs, seems to me that there's quite a lot of info being given by the template - eight links, in fact. I don't think that's entirely consistent with users' expectations of an EL, viz, a single link, not a set of eight links. Even cutting down to the three key numbers and arranging those one on each line seems to me not to be so great an idea. Neither the additional data rendered on the screen, nor marginal page load overhead seem to be anything other than trivial. I'm just not seeing cause for outrage. You'll tell me what I'm missing. --Tagishsimon (talk) 23:33, 17 July 2012 (UTC)
- What outrage are you talking about, Tagishsimon? I'm not outraged, I'm just seeing another little project that someone thinks is a good idea adding on to other little projects that someone else thought was a good idea on top of even more little projects... Our article pages are full of all these little projects: special templates that link all kinds of articles (instead of creating a logical category); infoboxes that are ever-expanding and containing more and more trivial information; links to half a dozen other places; templates nested within templates that take increasingly long to call forward. The fact that pretty well everyone on this page has high-speed, relatively inexpensive access to the internet means that we don't know what the real "call time" is for a page, when at the end of a dial-up in Africa or a mobile in India. The German Wikipedia gets very few "hits" outside of the Western "high-speed-connected" parts of the world, so they do not have to have the same level of concern for accessibility. We, on the other hand, have become the standard reference for the world, and accessibility is becoming an ever-more significant factor for us. We are failing our audience by continually adding layer on layer of resource-intensive metadata. It's not that this one is *the* problem, it's that it is just *one more* problem. Risker (talk) 00:09, 18 July 2012 (UTC)
- There's some ground for concern here, but the target is misplaced. Seriously, look at all the daft navboxen on [1] and ask yourself where the problem lies. LeadSongDog come howl! 05:50, 18 July 2012 (UTC)
Comments
- I like this idea very much and think it would benefit both readers and researchers using Wikipedia. 64.40.54.97 (talk) 00:19, 29 June 2012 (UTC)
- With regards to FAQ question number three, how receptive have the VIAF people been to corrections submitted by the German community? Lankiveil (speak to me) 10:13, 29 June 2012 (UTC).
- Good question - I don't know, but I'll try to find out. That said, note that the German noticeboard is submitting corrections to PND/GND at the Deutsche Nationalbibliothek, rather than to VIAF, and so they'll be handled by different organisations. Andrew Gray (talk) 10:36, 29 June 2012 (UTC)
- VIAF reviews all corrections submitted by an editor. If they are notified of an error which they agree with (which is mostly an objective process) then that correction will appear in VIAF the next time it is updated. Typically VIAF is updated every 6 months to a year. Maximiliankleinoclc (talk) 19:21, 29 June 2012 (UTC)
- Have any actual corrections been incorporated though? It's one thing to say "oh, it might happen", but I am concerned that we'll wind up getting a head-pat and some soothing words when we submit corrections, which will leave our articles and VIAF out of sync. Lankiveil (speak to me) 04:05, 1 July 2012 (UTC).
- Having spoken to the lead scientist of VIAF, Thom Hickey, I can report that changes have been made, and that there is a commitment from that team to make all submitted changes. The release cycle of VIAF is just that, a cycle, not an editable wiki, so it may take 6 months or more for the changes to eventually be reflected. Maximiliankleinoclc (talk)
- In general, the major authority control programs (VIAF, ISNI) are very keen on finding ways to incorporate information from "independently curated" sources like Wikipedia - our data is not perfect and needs checking, but it's a lot more comprehensive than traditional authority control datasets are, and in many cases may be more accurate. The practicalities are yet to be worked out, but when I talk to the people involved there's definitely a sense they do want this information and do want to make some kind of system which will work. Andrew Gray (talk) 10:57, 13 July 2012 (UTC)
- As I noted in earlier discussion, we should look to moving AC links into infoboxes, where articles have them, during a subsequent phase of this initiative. That will allow them to be included in the emitted metadata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:32, 29 June 2012 (UTC)
- Should be pretty easy to move them into infoboxes later, right? Wouldn't it simply mean a bot cutting code from the bottom of the page and pasting it into the infobox? Nyttend (talk) 13:26, 29 June 2012 (UTC)
- Yes, but maybe infoboxes like Template:Infobox writer (and possibly some others) could be adjusted beforehand, and the bot could write the info directly there. (At de.wikipedia, the majority has always disapproved of infoboxes for most kinds of people, so we didn't bother to do that. Possibly, in geographic articles and other fields where we do have infoboxes, the authority control data will one day be shown there, but maybe only after the WikiData revolution.) --AndreasPraefcke (talk) 13:40, 29 June 2012 (UTC)
- It might be good to amend the FAQ with "What about cases where several people have the same name?" IOW, how are we going to be sure we put the right VIAF id on the right pages? ErikHaugen (talk | contribs) 22:15, 29 June 2012 (UTC)
- I haven't looked closely at the VIAF matching code - Max? - but I believe it explicitly takes account of cues such as birth and death dates. (Unlike Wikipedia, which tends to use an occupational term, most authority control systems use dates as the primary method of disambiguation) Andrew Gray (talk) 16:57, 2 July 2012 (UTC)
- As stated in the full proposal, the matching source python code is available for your perusal. Maximiliankleinoclc (talk) 19:04, 2 July 2012 (UTC)
- I would also propose to build tools for updating {{Authority control}} templates on other projects (that use them) for articles linked by interwiki links. This might require closer integration of {{Authority control}} templates at different projects. Eventually I see this as an ideal thing to add to the future Wikidata infrastructure which is now being built, so different Wikipedias linked by interwiki links can share a single copy of {{Authority control}} data. --Jarekt (talk) 11:56, 2 July 2012 (UTC)
- I should have finished reading FAQ, before answering. I can see that Wikidata idea is not unique. ;) --Jarekt (talk) 12:07, 2 July 2012 (UTC)
- This would be ideal (certainly no two linked articles should ever have different identifiers). For the moment, we won't run this script on any other languages, but there's nothing stopping anyone from localising it once we know it works. Andrew Gray (talk) 16:57, 2 July 2012 (UTC)
- Correct, certainly no two identical (and interwiki'd) articles should have different identifiers. However, I have a suspicion that in some cases two articles may be interwiki'd but in fact the articles have different subjects. I could really see this happening in a case where *in a totally fictional example* deWP for John Smith links to [[w:en:John Smith]] but should really link to [[w:en:John Smith (Plumber)]], either because it was never checked, or because it was accurate at some point but was then moved and now points to a DAB. That's the difficulty that I've been attempting to explain with using deWP as a validation step. Maximiliankleinoclc (talk) 19:17, 2 July 2012 (UTC)
- I'd like to see a clearer explanation of what linkages are being built. Looking at http://viaf.org/viaf/141474549 we see a linkage to the stub-class article http://wikipedia.org/wiki/Stephan_Kekul%C3%A9_von_Stradonitz, and there we find an interlanguage link to http://de.wikipedia.org/wiki/Stephan_Kekule, which is a much better article. Further, at that article, there is an instance of the template Normdaten (Person) which links to http://viaf.org/viaf/57425893/. That VIAF record's history shows it was added to DNB, then to PTBNB, then removed from the DNB, yet it's still listed in the German Wikipedia article. The PTBNB is still there, but doesn't link to any Wikipedia article. The Portuguese Wikipedia article at http://pt.wikipedia.org/wiki/Stephan_Kekul%C3%A9_von_Stradonitz doesn't link to the PTBNB, nor the VIAF. (Confused yet?) It is not at all clear that the English-language Wikipedia article should be linked in preference to other languages. We may need either multiple wp articles linked, or else a way to agree between wikis on which single article to link. LeadSongDog come howl! 13:57, 3 July 2012 (UTC)
- This is a very interesting example. In fact this goes to address some problems with verifying links against deWP efforts. Here's what happened:
- In 2009 VIAF thought that "Stradonitz, Stephan Kekule von" (Portuguese entry) and "Kekule von Stradonitz, Stephan" (German entry) were the same person. I think this is just a Portuguese error in not doing German last names properly, but VIAF recovered from it, matched them, and created cluster number 57425893. Later deWP linked by hand to that VIAF cluster 57425893. Then in 2012 the Norwegian database was added, which has this person cataloged correctly (or at least the same as the Germans) as "Kekule von Stradonitz, Stephan". At this point VIAF identified the exact match of the German and Norwegian names, and deemed the difference of the Portuguese one to mean that it was probably a different person, since at least two other countries corroborated on the right name. So the German/Norwegian cluster became cluster number 141474549, while cluster 57425893 had its German part removed. This left deWP pointing to the cluster of the wrongly cataloged name. It's not their fault. But what it does mean is that if my bot went to add cluster 141474549 to the enWP article and checked against deWP, it would not match the deWP and would classify the mismatch as a VIAF clustering error, when in fact it is a Wikipedia linking error. That is one reason not to check the deWP (or treat it as law). The bot that is being proposed here for enWP is going to have a maintenance schedule that will update enWP (and down the road Wikidata) based on diffs, so this sort of thing wouldn't happen. Maximiliankleinoclc (talk) 19:04, 3 July 2012 (UTC)
- The concern, of course, is to avoid wp:CIRCULAR referencing. It is policy on enWP (and I believe most others) that open wikis are not to be treated as wp:RS. This certainly does not permit using deWP as a RS for enWP, even if sanitized through VIAF or other external process. Some method of identifying similar clusters without necessarily asserting whether or not they refer to the same person seems inevitable. Equating and merging the clusters should be based on an identifiable basis document. Consider two living people of the same name and birthyear as the most worrisome case: when one dies (or does something discreditable), the other may be inadvertently mourned (or libelled, as the case may be). Our wp:BLP treatment is necessarily cautious. LeadSongDog come howl! 22:16, 3 July 2012 (UTC)
- All this is not so much a point for not checking de.wikipedia, but for treating VIAF as what it is (the poorest of identifiers, since it changes so much) and always including LCCN and/or GND in this bot run wherever these numbers are included in current VIAF matchings. --AndreasPraefcke (talk) 05:15, 4 July 2012 (UTC)
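The Kekule von Stradonitz case above can be sketched in a few lines. This is only an illustration of the reasoning in this thread, not the proposed bot's actual logic: the function name, the result labels, and the placeholder source identifiers are all hypothetical, and it assumes the deWP article also records a more stable identifier (such as GND) that can be looked up among the current VIAF cluster's members.

```python
from typing import FrozenSet, Optional


def classify_mismatch(
    current_viaf: str,
    current_sources: FrozenSet[str],
    dewp_viaf: str,
    dewp_gnd: Optional[str],
) -> str:
    """Compare the VIAF cluster a bot wants to add with what deWP links to.

    All names and labels here are hypothetical, for illustration only.
    """
    # Same cluster number on both sides: nothing to resolve.
    if dewp_viaf == current_viaf:
        return "match"
    # The cluster numbers differ, but deWP's stable identifier is a member
    # of the current cluster: VIAF re-clustered and the wiki link is stale.
    if dewp_gnd is not None and dewp_gnd in current_sources:
        return "stale_wikipedia_link"
    # Otherwise the cause is unknown; leave it to a human.
    return "needs_review"


# Cluster numbers are taken from the discussion above; the source
# identifiers are placeholders, not real authority records.
result = classify_mismatch(
    current_viaf="141474549",
    current_sources=frozenset({"GND:placeholder", "NORWAY:placeholder"}),
    dewp_viaf="57425893",
    dewp_gnd="GND:placeholder",
)
print(result)
```

Under these assumptions the old deWP link is reported as a stale link rather than a VIAF clustering error, which is the distinction both comments above are drawing.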
- My first thought is that the primary author needs to go back and rewrite this entire presentation in plain English without undefined acronyms (note: wikilinking does not count as defining the acronym). Nothing in here has persuaded me that this is of benefit. Risker (talk) 13:05, 4 July 2012 (UTC)
- My apologies - I missed this comment! I'll have a prune through the RFC (though it's perhaps a little late now) and the proposal documentation itself to try and make it less specialised and more comprehensible. Andrew Gray (talk) 10:53, 13 July 2012 (UTC)
- I don't think the FAQ emphasizes the meta-Wikipedia usefulness, importance and significance of this project. Most of us have heard that Wikipedia is the 5th most-used website in the world. But DBpedia (Wikipedia in data format) is THE most-used source of data in the world. Virtually all new sites that seek to harness data seem to regard DBpedia as something they must incorporate. Seen in that light, efforts like the one under discussion are all the more crucial to have. -- kosboot (talk) 20:36, 4 July 2012 (UTC)
- FAQ #3 – errors in VIAF – is discussed further up, however answers are rather unspecific. Establishing feedback channels for reports of obvious errors seems important to me. The Wikipedia community is used to transparent workflows and the ability to instantly resolve errors. Having VIAF linked in articles while not offering a productive way to correct errors might lead to frustration on the side of Wikipedians. Below I will list an example I noticed some days ago. What would be the workflow and how long would it take to clean up VIAF and national authority datasets for Frank Herbert? -- Make (talk) 09:46, 9 July 2012 (UTC)
- Example: VIAF entries for Frank Herbert expose some inconsistencies.
- http://viaf.org/hosted/xa/538 – found via http://viaf.org/viaf/59083797 – attempts to force two records from DNB into one cluster. (Can't see where this would ever make sense)
- http://www.idref.fr/076966089 – found via http://viaf.org/viaf/197081323 – mixes two persons. Most titles should be moved to http://www.idref.fr/026919710
- http://d-nb.info/gnd/107444062 – found via http://viaf.org/viaf/17741678 – does not (no longer?) show the work "Der Wüstenplanet Science-Fiction-Roman" associated, which has not been autocorrected in VIAF.
- VIAF has a very clear place where people can make corrections to VIAF records. VIAF does not tamper with individual libraries' (or countries') authorities records - that is their responsibility. Additionally, it is not the responsibility of an authority file to list all the works of a creator - it is just to establish the version of the name to be used in bibliographic records. -- kosboot (talk) 14:28, 9 July 2012 (UTC)
- By "clear place" do you mean the email feedback form? Anyway, the point I want to make is a different one: I am just curious what the proposed »method for reporting apparent errors in VIAF (or its constituent catalogues) back to the relevant managing body« (FAQ #3) would look like. My concern is that an opaque one-way channel, where one just dumps error reports, might lead to frustration on the side of Wikipedians. Offhand I can think of two important questions that arise when errors are spotted: "Has this already been reported?" and "What is done about it?" – In addition, error resolution times of currently 10 months, like we are experiencing on the otherwise superb de:Wikipedia:PND/F noticeboard, are highly discouraging. -- Make (talk) 15:35, 9 July 2012 (UTC)
- I didn't see your response previously. A good example is http://viaf.org/viaf/79197757 - Edmond Duponchel. He was listed on VIAF as two people - the French had him under Henri, and other countries used Edmond as the first name. As it happens, he was the subject of an ample Wikipedia discussion on his talk page. I reported it to VIAF in February. Thom Pease got back to me a few days later questioning my source, so I showed him that Wikipedia talk page and he was convinced - all within about 1 hour's worth of emails. The correction was implemented by March (I believe VIAF is updated once a month). If someone is going to send in feedback, they should know what they're doing (i.e. have experience working with libraries' authority files). But the system does work. -- kosboot (talk) 02:06, 12 July 2012 (UTC)
- What about the British Library? – On http://www.oclc.org/viaf/contributors.htm the British Library is listed as a contributor through NACO – However at this point there are no links from VIAF entries to British Library resources. Seems to me like a huge amount of information not incorporated into VIAF. Is this going to change? -- Make (talk) 11:10, 9 July 2012 (UTC)
- The British Library's contributions are not a separate database but are to the Library of Congress's authority file: http://www.bl.uk/bibliographic/authority.html -- kosboot (talk) 13:59, 9 July 2012 (UTC)
- Thanks for the background. Still, wouldn't it be desirable to have weblinks from VIAF to British Library catalogue/authority file entries? I can only guess, but it looks like the British Library has not yet implemented permanent URLs for authorities. -- Make (talk) 15:35, 9 July 2012 (UTC)
- As I said above, the BL does not have its own authorities; they use LC. So when you see an indication of American use, it stands for US & UK usage. I'm sure you've noticed that though an authority file can cite various works in the authority record, it does not link to bibliographic records. The purpose of authority records is to establish "authorized" and consistent forms of entry for names, places, things, etc. so that all bibliographic records (whatever the language) will concatenate. VIAF is trying to do this on a world basis, so that, whatever form of name you have (based on country of use), it will link all the other forms used by other countries. For example, a name like "Tchaikovsky" (US usage) will link to "Tschaikowsky" (German usage), which would be different in Hebrew, French, Japanese, etc. -- kosboot (talk) 16:13, 10 July 2012 (UTC)
- I had a quick chat to the BL's metadata people on Wednesday, and unfortunately it's not currently possible to build a simple deeplink into the catalogue using author identifiers - the catalogue handles the authors as text strings not as ID codes, so you would need to resolve them before using them, which is a bit clunky. This may change in future, of course. That aside, the BL presumably isn't on the main VIAF page because it's bundled in with the LoC records, as Kosboot says. Andrew Gray (talk) 10:49, 13 July 2012 (UTC)