Wikipedia:Bots/Requests for approval/Lowercase sigmabot III
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: Σ (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 06:48, Sunday September 16, 2012 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: Yes
Function overview: Duplicate User:WebCiteBOT
Links to relevant discussions (where appropriate):
Edit period(s): Continuous
Estimated number of pages affected: [0, ∞)
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): No
Function details: See Wikipedia:Bots/Requests for approval/WebCiteBOT.
Discussion
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Don't forget to create a bot userpage. MBisanz talk 00:51, 19 September 2012 (UTC)
- Granted confirmed userright per IRC request from the bot's operator so it can proceed with the trial (it adds external links, so it runs into CAPTCHA errors). Snowolf How can I help? 06:03, 30 September 2012 (UTC)
Thank you Snowolf. At the moment it is Trial complete. We didn't quite make 50 article-space edits, due to a lack of response from the webcitation.org team. We had stockpiled around 70 archived links, which were used in the bot trial. We are waiting to find out an acceptable rate at which to hit their API, and to get Σ's email whitelisted. In the meantime we have identified a few bugs/features that need to be fixed:
- Links being added in multiple edits (example at Czech regional elections, 2012: [1][2][3][4])
- The problem here is in how the bot has been constructed. The bot archives links as soon as they have stayed on-wiki for 48 hours, so not all links on a page might have been archived and ready for adding by the time the bot is ready to edit. To somewhat fix this problem we will implement a quick
SELECT * from `archived_links` where article="current_article";
before each edit; however, that still does not fix the entire problem. Advice is appreciated.
- Archiving links listed in the "External links" section (example: [5])
- Ideally all links on an article should be archived (including external links), however I don't think {{cite web}} should be used for this, and WP:EL does not have any other templates listed that could be used in this situation.
- A quick fix for this is to update the regular expressions we are currently using to ensure the link is encased in <ref></ref> tags (or suitable equivalents). However, I still believe all links should be archived.
- The bot's error log isn't very descriptive; that will be fixed soon with more descriptive messages.
- If the bot wasn't able to add the URL in for whatever reason, should it leave a note on the article's talkpage so a human can do it?
- In some cases the bot guessed titles for various URLs using the <title> tag. I neglected to add <!--BOT generated title--> after it like other bots typically do. This will be fixed shortly.
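For the first item above, a minimal sketch of the per-article check, assuming a SQLite-style database. Only the `archived_links` table and its `article` column come from the query quoted above; the `url` and `archive_url` columns and the function name `links_ready_for_edit` are hypothetical. The idea is to hold the edit until every link found on the page has an archive row, so that all links go in with a single edit.

```python
import sqlite3


def links_ready_for_edit(conn, article, links_in_article):
    """Return {url: archive_url} for `article`, or None if any link is still missing.

    `links_in_article` is the list of URLs the bot found on the page.
    The `url` and `archive_url` columns are assumed for illustration.
    """
    rows = conn.execute(
        "SELECT url, archive_url FROM archived_links WHERE article = ?",
        (article,),
    ).fetchall()
    archived = dict(rows)
    missing = [url for url in links_in_article if url not in archived]
    if missing:
        return None  # not everything is archived yet; try again on a later pass
    return archived


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archived_links (article TEXT, url TEXT, archive_url TEXT)"
    )
    conn.execute(
        "INSERT INTO archived_links VALUES (?, ?, ?)",
        ("Example article", "http://example.com/a", "http://archive.example/1"),
    )
    # One of the two links has no archive row yet, so the edit is deferred.
    print(links_ready_for_edit(
        conn, "Example article", ["http://example.com/a", "http://example.com/b"]
    ))
```

This does not cover a link that can never be archived, which would hold the page back indefinitely, so some cut-off would still be needed.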
Those are the main things that Σ and I noticed. We are currently waiting for a response from the webcitation.org team, and after that and the above bugs are fixed, we should be ready for another trial. LegoKontribsTalkM and →Σσς. 06:20, 1 October 2012 (UTC)
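As a rough illustration of the quick fix for "External links" sections mentioned above, the following sketch only collects URLs that sit inside <ref></ref> tags. The two patterns and the name `urls_in_refs` are assumptions made for this example, not the bot's actual regular expressions, and they do not handle every citation style (for instance, a ref name containing a slash is skipped).

```python
import re

# Matches an opening <ref> or <ref name="..."> tag and its body.
# Self-closing tags such as <ref name="x" /> and <references /> do not match.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.IGNORECASE | re.DOTALL)

# A deliberately simple URL pattern: stop at whitespace and wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\[\]|}<>\"]+")


def urls_in_refs(wikitext):
    """Return the URLs that appear inside <ref>...</ref> bodies, in page order."""
    urls = []
    for body in REF_RE.findall(wikitext):
        urls.extend(URL_RE.findall(body))
    return urls


if __name__ == "__main__":
    sample = (
        'Text.<ref name="a">{{cite web|url=http://example.com/a|title=A}}</ref>\n'
        "More text.<ref>[http://example.com/b B]</ref>\n"
        "==External links==\n"
        "* [http://example.com/official Official site]\n"
    )
    # The external link at the bottom is outside any <ref>, so it is skipped.
    print(urls_in_refs(sample))
```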
- Why not just leave the problems that the bot had in the <!-- -->? Although I don't really see a problem with leaving the notes on the talk page. Cheers, --ceradon talkcontribs 21:20, 4 October 2012 (UTC)
- Because many title tags contain rubbish and SEO stuff, the comment makes it clearer to editors that such a title was added by a bot and can be safely changed. (A big reason why so many linkrot bots are simply stupid and should be avoided, in my eyes!) mabdul 11:47, 5 October 2012 (UTC)
- Ceradon: Many of the problems I referred to involve the bot not being able to find the URL within the wikitext, so it can't add anything in the comments. I'm trying to improve our regular expressions each time we hit a new one, but there are a lot of different ways people cite stuff. (I'm also probably going to borrow some of Dispenser's.)
- Mabdul: I agree with you about the SEO crap. I don't really see any way around it though besides maintaining a large blacklist. LegoKontribsTalkM 14:03, 5 October 2012 (UTC)
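A small sketch of the title handling discussed in this thread: take the page's <title>, drop it if it is on a blacklist, and mark it as bot generated otherwise. The blacklist entries and the name `guess_title` are hypothetical and only illustrate the approach; a real list of junk titles would need to be much larger.

```python
import re
from html import unescape

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BOT_MARKER = "<!--BOT generated title-->"

# Hypothetical blacklist of titles that carry no useful information.
JUNK_TITLES = {"home", "index", "untitled", "404 not found"}


def guess_title(html):
    """Return a marked title taken from the <title> tag, or None if unusable."""
    match = TITLE_RE.search(html)
    if not match:
        return None
    # Decode entities and collapse runs of whitespace.
    title = " ".join(unescape(match.group(1)).split())
    if not title or title.lower() in JUNK_TITLES:
        return None
    # A raw pipe would be read as a parameter separator inside a template.
    title = title.replace("|", "&#124;")
    return title + BOT_MARKER


if __name__ == "__main__":
    print(guess_title("<html><head><title>Election results</title></head></html>"))
    print(guess_title("<title>Home</title>"))
```

Returning None leaves the decision to the caller, for example to skip the title or to flag the link for a human.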
Update: Betacommand has graciously allowed us to use his code as well, so I will be merging our codebases together over the next few days and will be back with another update. LegoKontribsTalkM 04:44, 11 October 2012 (UTC)
- {{OperatorAssistanceNeeded}} Any updates? Thanks, madman 06:12, 27 October 2012 (UTC)
- {{OperatorAssistanceNeeded|D}} Any updates? Thanks. MBisanz talk 14:46, 26 November 2012 (UTC)
- Done MBisanz talk 05:54, 2 December 2012 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.