Wikipedia:Bots/Requests for approval/BogBot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Boghog (talk · contribs)
Time filed: 21:14, Friday July 22, 2011 (UTC)
Automatic or Manual: Manually assisted.
Programming language(s): Python
Source code available: Yes: link
Function overview: Populating new fields that have been recently added to the {{Drugbox}} template.
Links to relevant discussions (where appropriate): Adding clinical fields (and by extension external links) to the drugbox has been extensively discussed:
There is general consensus to populate the new fields. In contrast, there was some reservation about creating special purpose templates that would be transcluded back into the respective drug articles:
Hence the present request is only to populate the new fields and not to create special purpose templates.
Edit period(s): One time run for now.
Estimated number of pages affected: 4854 pages currently transclude the {{drugbox}} template
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Function details: Populates the recently added fields to the drugbox. Some of these fields create links to external sites. Hence the bot checks to make sure each link is "live" before populating the field. In addition, the sections of the drugbox were also recently reordered, so the bot will sort the fields in the order they are currently rendered. See diff for an example bot edit for this requested task.
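The two steps in the function details (check that a link is live before populating a field, then sort the fields into rendering order) could be sketched roughly as follows. This is not BogBot's actual source: `RENDER_ORDER`, `link_is_live`, `populate_field` and `sort_fields` are hypothetical names, the field order shown is a placeholder rather than the real Drugbox order, and leaving already-filled fields untouched is an assumption.

```python
import urllib.error
import urllib.request

# Placeholder order (assumption); the real order is defined by the Drugbox template.
RENDER_ORDER = ["IUPAC_name", "image", "tradename", "Drugs.com",
                "MedlinePlus", "CAS_number", "DrugBank"]


def link_is_live(url, timeout=10):
    """Return True if a HEAD request to url gets a non-error HTTP status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, malformed URLs, DNS failures and timeouts all count as dead.
        return False


def populate_field(params, name, value, url, is_live=link_is_live):
    """Set params[name] = value only if the field is empty and url is live."""
    if params.get(name, "").strip():
        return False  # assumption: never overwrite an existing value
    if not is_live(url):
        return False
    params[name] = value
    return True


def sort_fields(params, order=RENDER_ORDER):
    """Return a new dict with known fields in rendering order, unknown ones last."""
    rank = {name: index for index, name in enumerate(order)}
    # sorted() is stable, so unknown fields keep their original relative order.
    return dict(sorted(params.items(),
                       key=lambda item: rank.get(item[0], len(order))))
```

Passing the liveness check in as `is_live` keeps the populate step testable without network access.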
Discussion
This looks pretty good to me. I notice that this template is trans'd onto other namespaces such as User:. Would the proposed bot ignore these? SQLQuery me! 12:45, 24 July 2011 (UTC)[reply]
- Sorry, I should have specified this above. The list of pages that the bot will be working from is generated by the script "python pagegenerators.py -namespace:0 -transcludes:Drugbox > drugbox_titles.txt". Hence, the bot will only work on pages in the main namespace and not in the user namespace. Boghog (talk) 15:33, 24 July 2011 (UTC)[reply]
- Sounds good. Please consider putting some sort of link in the edit summary to this page. Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. SQLQuery me! 20:04, 26 July 2011 (UTC)[reply]
- Trial complete. In the process, I have uncovered and fixed a number of bugs. In addition, I have also included a number of additional checks per this discussion. I think the script is now fairly robust, but I intend to ramp up slowly to make sure there are no additional surprises. Boghog (talk) 02:11, 14 August 2011 (UTC)[reply]
- I noticed there is another bot editing drugboxes soon after BogBot does (e.g. [1] [2] [3]; there are dozens), but I'm assuming the tasks are sufficiently different that there's nothing we can do about this. I'm not an expert on the subject at all, so I can't gauge whether the changes are accurate, but everything seems to look good as far as I can tell. — The Earwig (talk) 21:05, 15 August 2011 (UTC)[reply]
- Yes we seem to be getting there... No concerns with the CheMoBot at this point. Doc James (talk · contribs · email) 21:27, 15 August 2011 (UTC)[reply]
- The subsequent edits by CheMoBot highlighted a small bug where BogBot forgot to propagate the DrugBank_Ref parameter. This bug has now been fixed. BogBot and CheMoBot are complementary: the former is adding data while the latter is verifying that changes to the template have not messed anything up. So CheMoBot did its job by catching an error by BogBot. I am also making a few additional small changes per this discussion. I will make an additional small test run to make sure everything is functioning correctly. Boghog (talk) 23:24, 15 August 2011 (UTC)[reply]
- OK, I have implemented the changes mentioned above and run some additional tests. The percentage of follow-on edits by CheMoBot is reduced. The few that remain are justified (e.g., diff, BogBot added ChemSpiderID and CheMoBot then added the ChemSpiderID_Ref parameter indicating that the data has been verified). Both bots appear to be functioning correctly. Boghog (talk) 02:46, 16 August 2011 (UTC)[reply]
Your bot seems to be editing outside of the approved trial. While it's not that big a deal, you really shouldn't do that (WP:BOTPOL is very clear on this). A question for now: could your bot add the ChemSpiderID_Ref instead of CheMoBot and save the extra edit? Headbomb {talk / contribs / physics / books} 02:56, 16 August 2011 (UTC)[reply]
- As stated these are just additional tests to determine how the new changes discussed here [4] are working. This overall is a tremendous improvement to articles on pharmaceuticals and hopefully this bot can soon be approved for a full run. Doc James (talk · contribs · email) 03:06, 16 August 2011 (UTC)[reply]
- Sorry for exceeding the 100 edit approved trial, but I was trying to test some modifications to the script that were made in response to Earwig and Jmh649 comments. I also have been monitoring closely the edits the bot makes and manually fixing any mistakes. I could add the ChemSpiderID_Ref parameter but I would prefer that CheMoBot makes an independent check to make sure that the ChemSpiderID link is valid. Where do we go from here? Perhaps an extended trial of 100 additional edits? Boghog (talk) 03:24, 16 August 2011 (UTC)[reply]
- Approved for extended trial (≤100). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I'm fine with an extended trial. I'd have suggested 50, but 100 is fine too. While the trial is going on, could you ask the Chem people what they think about the ChemSpiderID_ref thing? If they are fine with the dual edits, so am I. If they'd rather have BogBot do the update itself, then I'd prefer that. See also this suggestion for edit summaries. Headbomb {talk / contribs / physics / books} 04:04, 16 August 2011 (UTC)[reply]
As a side-suggestion, would it be possible to fix those stupid first two fields? It's a pet peeve of mine to see something like
{{Drugbox| Watchedfields = changed | Verifiedfields = changed
When they should/could be like
{{Drugbox
| Watchedfields = changed
| Verifiedfields = changed
Headbomb {talk / contribs / physics / books} 04:07, 16 August 2011 (UTC)[reply]
I am here after a question from Boghog. The verification part needs more than just adding the parameters. Where new and correct parameters are added, the index should also be updated (Wikipedia:WikiProject Chemicals/Index or Wikipedia:WikiProject Pharmacology/Index). BogBot could pre-fill the _Ref parameters and also update the index if needed, but then an extra edit is still needed to get the correct value for the index ('verifiedrevid = ######') in the box. The only option I could think of to circumvent that would be a change in the saving mechanism of the MediaWiki software: if I were able to 'reserve' a revid, put that in the box, fill in the _Ref parameters, and then save the page giving it the revid that I reserved, plus put the revid in the index, then CheMoBot would not need to update the page after an edit. Otherwise it is inevitable that there will be follow-up edits.
Regarding the verification, Boghog: I work from several external lists, some of which were privately mailed to me (CAS, UNII, ChEBI, ChEMBL), some of which are readily available from the internet (DrugBank, KEGG; the latter is not too useful but it can be cross-checked against the other databases), some that I generate by eye (ChemSpider; I also got a list from them), and some that I source based on the others (StdInChI and StdInChIKey come from a ChemSpider search on the correct record). I am sure you could do it as well using BogBot, but since the code of CheMoBot seems pretty stable (though I removed a bug a couple of days ago) I think it is better to use that mechanism (CheMoBot also handles the more complex ChemBox). Note that CheMoBot is not capable (at the moment) of 'sorting' fields in a box into a preferred order.
Second part, yesssss, pretty please. I will see if I can do that as well where possible. Also
{{Drugbox| Watchedfields = changed | Verifiedfields = changed |
is something that I sometimes see and which hurts my eyes. I know it is just edit-esthetics, but it bugged CheMoBot in the beginning as well (along with many other misformats and crazy parameters). --Dirk Beetstra T C 13:19, 16 August 2011 (UTC)[reply]
Regarding the test edits now performed: adding the parameters is not the only thing that needs to be done. It would also need a check whether all identifiers are correct (with respect to the index), and set the _Ref parameters accordingly, but that can be done. If you add e.g. a 'DrugBank = ####' that is different from the value in the version which is indexed in the appropriate index (and you do not update the index), then the _Ref field should be set to '{{drugbankref|changed|DrugBank}}'; the bot will then not follow up if that is done consistently for all fields. If the index is updated, I don't think you can circumvent a follow-up edit by CheMoBot, as it for sure will update 'verifiedrevid = ####' (except if you know beforehand what the revid will be when the page gets saved). I must say, this is something that has been bugging me for some time, especially when I update a lot of pages. --Dirk Beetstra T C 14:04, 16 August 2011 (UTC)[reply]
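The DrugBank rule described in this comment could be sketched as below. Only the `{{drugbankref|changed|DrugBank}}` marker comes from the discussion; the function name `drugbank_ref_update` is hypothetical, and returning `None` (leave the existing _Ref untouched) when the value matches the indexed one is an assumption, since that case is not spelled out here.

```python
CHANGED_MARKER = "{{drugbankref|changed|DrugBank}}"


def drugbank_ref_update(new_value, indexed_value):
    """Return the DrugBank_Ref value to write, or None to leave it as it is.

    indexed_value is the DrugBank identifier stored in the indexed revision;
    the index itself is assumed not to be updated by this edit.
    """
    if new_value.strip() != (indexed_value or "").strip():
        # The value no longer matches the verified revision: flag it as changed.
        return CHANGED_MARKER
    return None  # assumption: an unchanged value keeps its current _Ref
```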
- If CheMoBot does a good job of this, having a second bot run after BogBot is not too big of a deal. BogBot is only going to be run once on all the pharmaceutical articles to add data. CheMoBot is run whenever article data is changed. Doc James (talk · contribs · email) 16:43, 16 August 2011 (UTC)[reply]
- Thanks everyone, especially Dirk, for your comments. As Dirk mentioned above, there is already a well-functioning mechanism in place for verifying the data, namely CheMoBot. In addition, the process is significantly more complicated than adding a single parameter. Therefore I will not attempt to duplicate what CheMoBot is already doing well. As Doc James indicated, the update of the drugbox templates is a one-run job, so there will be a limited number of follow-up edits. Concerning the second point, I completely agree. The previous version of the bot rebuilt the template from the first character after the first pipe, so that if the first pipe was on the first line, it remained on the first line. This has now been changed so that the template is totally rebuilt from scratch, from the opening curly bracket to the closing bracket, so that each parameter including the first is on its own line. Furthermore, templates embedded within the drugbox (e.g., citation templates) are collapsed so that all parameters within the embedded template are on the same line. This should make parsing the template by bots in the future much easier. I will start the second trial run shortly. Boghog (talk) 23:20, 16 August 2011 (UTC)[reply]
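The rebuild described in this comment (one parameter per line, embedded templates collapsed onto a single line) could look roughly like the sketch below. It is an illustration under stated assumptions, not the bot's code: `split_top_level` and `rebuild` are hypothetical names, runs of whitespace inside values are normalised to single spaces, and empty parameters (such as a trailing pipe) are dropped.

```python
import re


def split_top_level(body):
    """Split a template body on pipes that are not inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def rebuild(template_text):
    """Rewrite one {{...}} template with each top-level parameter on its own line."""
    inner = template_text.strip()[2:-2]  # strip the outer {{ and }}
    name, *params = split_top_level(inner)
    lines = ["{{" + name.strip()]
    for param in params:
        # Collapsing whitespace also puts embedded templates on a single line.
        collapsed = re.sub(r"\s+", " ", param).strip()
        if not collapsed:
            continue  # assumption: drop empty parameters, e.g. a trailing pipe
        key, sep, value = collapsed.partition("=")
        if sep and "{{" not in key and "[[" not in key:
            lines.append("| " + key.strip() + " = " + value.strip())
        else:
            lines.append("| " + collapsed)  # positional parameter
    lines.append("}}")
    return "\n".join(lines)
```

Tracking nesting depth is what keeps the pipes of an embedded citation template from being mistaken for drugbox parameters.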
Alright, well if Chem folks are alright with the dual edits, then I'm alright with it. Also, concerning the edit summary, "per bot trial" would be better than "per bot approval", as the latter implies it's already approved. You get up to one hundred edits in article space (and as many as you want in your own sandboxes). Headbomb {talk / contribs / physics / books} 03:29, 17 August 2011 (UTC)[reply]
- BTW, {{tl|foobar}} doesn't work in edit summaries. You can use the {{foobar}} directly. Headbomb {talk / contribs / physics / books} 21:02, 21 August 2011 (UTC)[reply]
- Extended trial complete. See last 50 contributions. Per the requests above, every parameter is on its own line and the edit summary has been tweaked. Everything seems to be functioning normally. Boghog (talk) 21:32, 22 August 2011 (UTC)[reply]
- Approved. MBisanz talk 22:20, 27 August 2011 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.