Wikipedia:Bots/Requests for approval/VIAFbot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Maximilianklein (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 17:19, Friday August 17, 2012 (UTC)
Automatic, Supervised, or Manual: The bot is technically automatic, but logs cases for humans to check where it found conflicts between dewp and enwp.
Programming language(s): Python
Source code available: https://github.com/notconfusing/VIAFbot
Function overview: This bot takes a list of links from viaf.org to Wikipedia and reciprocates those links by way of the {{Authority control}} template after checking to see if the reciprocal link would agree with dewp's Normdaten template. Full details: Wikipedia:Authority control integration proposal
Links to relevant discussions (where appropriate):
Passed RfC: Wikipedia:Authority control integration proposal/RFC
and before that at the village pump
Edit period(s): It will be run once. A maintenance script could be developed later to refresh content when VIAF is refreshed.
Estimated number of pages affected: ~250,000
Exclusion compliant (Yes/No): No. This bot takes the list of articles from a file generated by viaf.org; they are virtually all WP:BLPs.
Already has a bot flag (Yes/No): No
Function details: Takes this list. Runs this switch:
Key:
- nd - normdaten; ac - authority control; vl - viaf->wikipedia link number
- (xx) - template does not exist
- |xx| - has template but no viaf parameter
- |(xx)| -> (xx) or |xx|
Cases:
1 - |(nd)|, (ac). AC template added, VIAF parameter written.
2 - |(nd)|, |ac|. VIAF parameter written.
3 - |(nd)|, ac == vl. No writing necessary.
4 - |(nd)|, ac != vl. Requires human attention. Nothing written.
5 - nd, (ac), nd == vl. AC template added, VIAF parameter written.
6 - nd, (ac), nd != vl. Requires human attention. Nothing written.
7 - nd, |ac|, nd == vl. VIAF parameter written.
8 - nd, |ac|, nd != vl. Requires human attention.
9 - nd, ac, nd == ac == vl. No writing necessary.
10 - nd, ac, nd == ac != vl. Requires human attention. Nothing written.
11 - nd, ac, nd != ac, nd == vl. Requires human attention. Nothing written.
12 - nd, ac, nd != ac, ac == vl. Requires human attention. Nothing written.
13 - nd, ac, nd != ac != vl. Requires human attention. Nothing written.
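The thirteen cases above can be read as a pure function of the three values. The following is a minimal sketch under stated assumptions, not the bot's actual code: the names Missing, Action and decide are hypothetical, and VIAF numbers are assumed to be compared as plain strings.

```python
from enum import Enum


class Missing(Enum):
    """Hypothetical markers for the two 'no value' states in the key."""
    NO_TEMPLATE = "(xx)"   # template does not exist
    NO_PARAM = "|xx|"      # template exists but has no VIAF parameter


class Action(Enum):
    ADD_TEMPLATE = "AC template added, VIAF parameter written"
    WRITE_PARAM = "VIAF parameter written"
    NOTHING = "no writing necessary"
    HUMAN = "requires human attention"


def has_value(state):
    """True when the template carries an actual VIAF number."""
    return not isinstance(state, Missing)


def decide(nd, ac, vl):
    """Return (case number, Action) for Normdaten value nd,
    Authority control value ac, and VIAF link number vl."""
    if not has_value(nd):                      # |(nd)|: cases 1-4
        if ac is Missing.NO_TEMPLATE:
            return 1, Action.ADD_TEMPLATE
        if ac is Missing.NO_PARAM:
            return 2, Action.WRITE_PARAM
        return (3, Action.NOTHING) if ac == vl else (4, Action.HUMAN)
    if ac is Missing.NO_TEMPLATE:              # cases 5-6
        return (5, Action.ADD_TEMPLATE) if nd == vl else (6, Action.HUMAN)
    if ac is Missing.NO_PARAM:                 # cases 7-8
        return (7, Action.WRITE_PARAM) if nd == vl else (8, Action.HUMAN)
    if nd == ac:                               # cases 9-10
        return (9, Action.NOTHING) if ac == vl else (10, Action.HUMAN)
    if nd == vl:                               # cases 11-13: nd != ac
        return 11, Action.HUMAN
    if ac == vl:
        return 12, Action.HUMAN
    return 13, Action.HUMAN
```

In this sketch every disagreement between a value already on a wiki and the VIAF link resolves to Action.HUMAN, so the bot only writes when nothing it finds contradicts the number it is about to add.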
Discussion
Please review Wikipedia:Bot policy. In particular, note:
- Please update the bot's userpage per WP:Bot policy#Bot requirements.
- Pay particular attention to the third paragraph in WP:Bot policy#Bot accounts.
Code comments:
- Why does it try to write Template:Db-meta? Just to test the error?
- Right, that was just to test the error handling for locked pages. It was cruft, and is removed now.
- Why does pageValidate follow 10 levels of redirects, when only 1 is valid?
- I know that double redirects sometimes exist before they are cleaned up, and some of these pages jumped around a bit, so I just set the maximum number of redirects to follow to 10, to make sure we're not in a loop. I can reduce it to 5 or even 4.
- The bot does not need to be editing its status pages (via calls to logOnWiki and from exception handling all over the place) for every edit it tries to make. Log them locally if at all possible. If it is really necessary to write them to the wiki, batch them to make one edit every few hours.
- The reason I am logging onWiki is that this information is useful to those other editors, also on German Wikipedia who are attempting to improve VIAF representation. Each time a dewp/enwp conflict is detected I believe that information needs to be reviewed - likewise where dewp disagreed with enwp & viaf.org. I refactored the code to batch the edits and write once there are 100 errors or conflicts of every variety. The plan is for humans to be able to follow along and look at conflicts as the bot is running, so I feel that every 100 is a good compromise.
- Similarly, the bot writes its stats far too often. Rather than posting to the wiki every 100 edits (which probably works out to once every few minutes), write the stats after a reasonable period of time has elapsed.
- OK, switched to 1,000 Maximilianklein (talk) 17:18, 19 August 2012 (UTC)
I'll leave any comments on whether there are more efficient ways to use pywikipedia to others, as I'm not very familiar with that framework. Anomie⚔ 14:39, 18 August 2012 (UTC)
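The batching compromise discussed above (collect conflicts locally, then write them to the wiki in one edit once 100 have accumulated) can be sketched as follows. This is an illustration only and is not taken from the bot's source: BatchedLog and its flush callback are hypothetical names, and the callback stands in for whatever single wiki edit the operator would make.

```python
class BatchedLog:
    """Collect log entries and hand them over in batches.

    flush_fn is a caller-supplied function (an assumption of this sketch)
    that receives a list of entries and would perform one wiki edit.
    """

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.pending = []

    def add(self, entry):
        """Queue one entry; flush automatically when the batch is full."""
        self.pending.append(entry)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write out whatever is queued, if anything, and reset the queue."""
        if self.pending:
            self.flush_fn(list(self.pending))
            self.pending = []
```

Calling flush() once more at the end of a run writes out any final partial batch, so conflicts logged after the last full batch are not lost.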
- Some more code review:
- I don't think it's a safe idea to do "from wikipedia import *", though I don't see any conflicts.
- Done
- Why are you using the replace.py class? Isn't it simpler (and faster) to just call wikipedia.replaceExcept() directly?
- Done.
- Same with add_text.py (using add_text allows for text to be put before interwiki links, as Max explained on IRC)
- Correct. add_text.py is nice because it puts the text after the article and before the interwiki and category links.
- I'm not sure what the point of pageValidate is. Couldn't you just use:
if page.isRedirectPage():
    return page.getRedirectTarget()
else:
    return page
- Two reasons. The list of links that I'm getting from viaf.org is about 6 months out of date, so:
- I'm also checking for multiple redirects, which do sometimes exist - especially within WP:BLPs. For instance sometimes a page title can go from John Smith -> John Smith (Footballer) -> John Smith (Goalkeeper). This is a full list of all redirects I found from the links creation date to last week.
- And secondly some pages don't exist any more because of the out of date nature of the list. This is a full list of all does not exist errors I found from the links creation date to last week. Maximilianklein (talk) 17:18, 19 August 2012 (UTC)
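The two checks described above (following a chain of redirects up to a fixed limit, and detecting pages that no longer exist) can be sketched without any wiki framework. This is a minimal illustration, not the bot's pageValidate: resolve_title, RedirectLoop and PageMissing are hypothetical names, and a dict and a set stand in for the live wiki.

```python
class RedirectLoop(Exception):
    """Raised when redirects cycle or exceed the hop limit."""


class PageMissing(Exception):
    """Raised when the final title does not exist."""


def resolve_title(title, redirects, existing, max_hops=10):
    """Follow redirects from title and return the final page title.

    redirects maps a redirect title to its target; existing is the set
    of titles that are real pages. Both are stand-ins for wiki lookups.
    """
    seen = {title}
    hops = 0
    while title in redirects:
        if hops >= max_hops:
            raise RedirectLoop("more than %d redirects" % max_hops)
        title = redirects[title]
        hops += 1
        if title in seen:
            raise RedirectLoop("cycle at %r" % title)
        seen.add(title)
    if title not in existing:
        raise PageMissing(title)
    return title
```

With the chain from the example above, resolve_title("John Smith", ...) would return "John Smith (Goalkeeper)", while a title absent from both mappings raises PageMissing so it can be logged rather than edited.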
There's a clear consensus for this task, and it appears all questions regarding the code have been answered. Let's see a trial to evaluate results and iron out any bugs. Approved for trial (35 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. — madman 04:10, 2 September 2012 (UTC)
- Thanks for the trial. I am busy but I hope to have the trial completed by Oct 1st. Maximilianklein (talk) 10:18, 6 September 2012 (UTC)
Trial complete. See [[1]]. Here's the log from the run:
Articles with VIAF numbers already on DEWP
- 15:53, 13 September 2012 . A.C. Reed added VIAF number 76504615
- 15:56, 13 September 2012 . Abel Decaux added VIAF number 34650276
- 15:56, 13 September 2012 . Abel Ferrara added VIAF number 12486003
- 15:57, 13 September 2012 . Abram Ioffe added VIAF number 10024694
- 15:58, 13 September 2012 . Adalbert Krüger added VIAF number 30290947
Articles with VIAF numbers not already on DEWP
- 15:53, 13 September 2012 . A. A. Long added VIAF number 108651772
- 15:53, 13 September 2012 . A. Craig Copetas added VIAF number 112277389
- 15:54, 13 September 2012 . A. Hays Town added VIAF number 93635604
- 15:54, 13 September 2012 . A. Owen Aldridge added VIAF number 108208332
- 15:54, 13 September 2012 . AM (musician) added VIAF number 170151494
- 15:54, 13 September 2012 . Aaron Glantz added VIAF number 70794690
- 15:54, 13 September 2012 . Aaron Hobart added VIAF number 59872150
- 15:54, 13 September 2012 . Aaron Krach added VIAF number 43671720
- 15:55, 13 September 2012 . Aasmund Nordstoga added VIAF number 226360175
- 15:55, 13 September 2012 . Aatish Taseer added VIAF number 83517573
- 15:55, 13 September 2012 . Ab Jenkins added VIAF number 29425332
- 15:55, 13 September 2012 . Abbas Ibn al-Ahnaf added VIAF number 90047960
- 15:55, 13 September 2012 . Abby May added VIAF number 67278497
- 15:55, 13 September 2012 . Abd al-Qahir al-Jurjani added VIAF number 9918029
- 15:56, 13 September 2012 . Abdolhossein Zarrinkoob added VIAF number 188571388
- 15:56, 13 September 2012 . Abdul Hamid Lahori added VIAF number 77040323
- 15:56, 13 September 2012 . Abdullah Ahmed Abdullah added VIAF number 46460912
- 15:56, 13 September 2012 . Abo El Seoud El Ebiary added VIAF number 124854473
- 15:57, 13 September 2012 . Abraham B. Hasbrouck added VIAF number 63891746
- 15:57, 13 September 2012 . Abraham Kaplan added VIAF number 122218121
- 15:57, 13 September 2012 . Abraham Raimbach added VIAF number 12578323
- 15:57, 13 September 2012 . Abrahm Lustgarten added VIAF number 4398109
- 15:57, 13 September 2012 . Abu Mansur Maturidi added VIAF number 89281707
- 15:58, 13 September 2012 . Abyale added VIAF number 19869978
- 15:58, 13 September 2012 . Achille Compagnoni added VIAF number 88403827
- 15:58, 13 September 2012 . Adam Bernstein added VIAF number 120262993
- 15:58, 13 September 2012 . Adam Dutkiewicz added VIAF number 181992364
- 15:56, 13 September 2012 . Abdoulaye Mamani requires human attention: nd != vl . VIAF number: 56613360
- 15:56, 13 September 2012 . Abdülaziz requires human attention: nd != vl . VIAF number: 122114473
This will be reviewed when possible. You really need to make sure you're never editing under the bot's account when you're not performing an automated task, though. Making human edits under a bot account is an excellent way to get the account blocked. — madman 00:14, 14 September 2012 (UTC)
- Understood. I'm very sorry for doing that. There may need to be better guidelines about how to prototype in the pre-trial period, or a note on not prototyping in the pre-trial period. Maximilianklein (talk) 11:51, 14 September 2012 (UTC)
- I have no idea what you mean by prototyping, but the Wikipedia:Bot policy clearly states "Bot accounts should not be used for contributions that do not fall within the scope of the bot's designated tasks. In particular, bot operators should not use a bot account to respond to messages related to the bot. Bot operators may wish to redirect a bot account's discussion page to their own." — madman 15:07, 14 September 2012 (UTC)
- Hmmm. Everything looks good except the VIAF record for Abraham Kaplan. This may be a case of "garbage in, garbage out" as the match does make sense; some of the works are correct on the VIAF record. It still may be worth filtering records like this out, however, as the birth years don't match. — madman 23:21, 17 September 2012 (UTC)
- Yes, there may be some errors. As it says on WP:VIAF, the matching was done with this code, which attempts to match names, birth and death dates, works, and some other factors. Checking by hand, it's about 98% accurate. In the RfC it was acknowledged that even with these imperfections there was consensus to go ahead. Maximilianklein (talk) 16:20, 18 September 2012 (UTC)
{{BAG assistance needed}}
It's been one week since the last comment on this bot discussion. Maximilianklein (talk) 00:14, 27 September 2012 (UTC)
- Approved. — madman 16:09, 27 September 2012 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.