Wikipedia:Bots/Requests for approval/Monkbot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Trappist the monk (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 15:23, Tuesday March 4, 2014 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AWB
Source code available: Yes (source)
Function overview: Scans Category:Pages containing cite templates with deprecated parameters for Citation Style 1 citations that use the deprecated parameters |coauthor= or |coauthors= and where:
- these parameters are empty, removes the parameter;
- the parameter contains one 1–4 segment name, replaces |coauthor= or |coauthors= with |author2=
- the parameter contains multiple (2–9) semicolon delimited names, replaces |coauthor= or |coauthors= and the semicolons with |author2= – |authorn= (where n is 3–10)
- template contains |ref=harv, does nothing (does not apply when |coauthor= or |coauthors= is empty)
- template contains |lastn= or |authorn= where n is greater than 1, does nothing
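The rules in the function overview can be sketched roughly as follows. This is a minimal illustration in Python, not the bot's AWB script; the function name `plan_coauthor_edit`, its flags, and the reading of "segment" as a whitespace-separated word are all assumptions made for the example. It also assumes the 1–4 segment limit applies to every name in a semicolon-delimited list, and that the numbered-author rule takes priority over removal of an empty parameter, neither of which the overview states.

```python
def plan_coauthor_edit(value, has_ref_harv=False, has_numbered_author=False):
    """Decide what to do with one |coauthor=/|coauthors= value.

    Returns ("skip", []), ("remove", []), or ("replace", [(param, name), ...]).
    Hypothetical helper: names and rule ordering are assumptions, not the bot's code.
    """
    # Template already has |lastn= or |authorn= with n > 1: leave it alone.
    if has_numbered_author:
        return ("skip", [])
    # Empty parameter: remove it (the |ref=harv rule does not apply here).
    if not value.strip():
        return ("remove", [])
    # |ref=harv present and the parameter is not empty: do nothing.
    if has_ref_harv:
        return ("skip", [])
    names = [name.strip() for name in value.split(";")]
    # Accept between one and nine names.
    if not 1 <= len(names) <= 9:
        return ("skip", [])
    # Assumed: each name must have 1-4 whitespace-separated segments.
    if any(not 1 <= len(name.split()) <= 4 for name in names):
        return ("skip", [])
    # First coauthor becomes |author2=, the next |author3=, and so on.
    return ("replace", [(f"author{i}", name) for i, name in enumerate(names, start=2)])
```

With these assumptions, a value such as `Delaware County Institute of Science` (five segments) is skipped, while `A. Smith; B. Jones` becomes |author2= and |author3=.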
Links to relevant discussions (where appropriate):
Edit period(s): Occasionally after initial run through the category
Estimated number of pages affected: At the time of this writing, the deprecated parameter category contains 102,700 pages
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: A full and detailed description of the task's functionality is available with its source.
Discussion
I looked through the code, and maybe I missed the following nuance. Category:CS1 errors: coauthors without author contains articles in which |coauthors= exists in a citation in the absence of a populated |author1= or |last1=. Does the bot's code account for the possibility that |coauthors= can exist without |author1= or |last1=? If you take a citation with no populated author parameters and replace |coauthors= with |author2=, the citation will not display any authors.
Related: I have put in a feature request to detect author2 without author1 or first1 without last1, but it hasn't been implemented yet. If we implement that feature, I'd be OK with this bot's current code, since any replacements of coauthors with author2 in the absence of author1 would simply move the article to a new, easily-fixable error category. I want to avoid "fixing" an article by replacing coauthors with author2 in the absence of author1, since the article will not currently have any error tracking after the "fix" even though it still contains a broken citation.
I have been working on cleaning up Category:CS1 errors: coauthors without author via an AutoEd script. AWB should also work fine if someone wants to run through the category. These articles are easy to fix, except when they contain a mix of citations that are "coauthors-only" and "author1 plus coauthors". – Jonesey95 (talk) 21:55, 4 March 2014 (UTC)
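The concern above can be illustrated with a small check. This is only a sketch in Python, assuming the citation's parameters have already been parsed into a dictionary; the helper names are hypothetical, and treating |last= and |author= as equivalents of |last1= and |author1= is an assumption of the example.

```python
# Parameters that can hold the first author (assumed equivalents for this sketch).
FIRST_AUTHOR_KEYS = ("last", "last1", "author", "author1")
COAUTHOR_KEYS = ("coauthor", "coauthors")


def has_populated_first_author(params):
    """True if any first-author parameter is present and not blank."""
    return any(params.get(key, "").strip() for key in FIRST_AUTHOR_KEYS)


def safe_to_convert(params):
    """True only when coauthors are populated AND a first author exists.

    Converting |coauthors= to |author2= without a first author would hide
    the authors and remove the page from the error category.
    """
    has_coauthors = any(params.get(key, "").strip() for key in COAUTHOR_KEYS)
    return has_coauthors and has_populated_first_author(params)
```

A citation with only |first= and |coauthors= populated returns `False` here, so it would be left for a human editor.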
- Good catch. I've edited about 3000 articles using variants of this script and haven't seen (yet) the case of |coauthor= without |author=. I've added the item to my todo list. It essentially means that there will be 2x the number of regexes that there are now.
- A tricky situation to watch out for in the above category is the presence of a populated |first=, an empty or missing |last=, and a populated |coauthors=. The intent of the original editor was to list the "first" author followed by coauthors. The fix is to replace |first= with |author1= and |coauthors= with |author2=. I have fixed about 600 articles in the category and have seen this arrangement a few dozen times.
- The more I think about this proposed bot, the more I think that it should fix only the most obvious of low-hanging fruit, at least at first. Will it behave properly if there are four authors, one missing author, three more authors, and coauthors? Would the bot create a duplicate |author6= in this case? We don't have code to flag repeated parameters (the citation simply displays the final one), so if the bot created a second |author6=, nobody would ever know. Since we also don't have code to flag the errant "missing author" situation, I suggest leaving citations like this alone for a human to sort out. I think it would be reasonable, at least for a first pass through the category, to fix only those situations in which there is a single populated author or author1 or last1/first1 followed immediately by coauthors. Run through the 100,000 articles fixing only that condition, see what the category looks like when the bot is done, and make refinements to the bot's code. That would be a conservative approach. – Jonesey95 (talk) 00:07, 5 March 2014 (UTC)
- If I understand the essence of your coauthors-without-author comment that opened this discussion, the citation must have |last=, |last1=, |author= or |author1= and that parameter must have a value. If none of those parameters are present, or one is but is empty, then the citation shall be skipped. Because one of the |last= or |author= parameters is required, the |first= issue is not an issue, right?
- As written right now, if values are assigned to any |lastn= or |authorn= (where n is greater than 1 and less than 100) and that parameter is located ahead of the |coauthor= parameter: {{cite ... |author6=Sixth Author |... |coauthors=First Coauthor; Second Coauthor; ... Fifth Coauthor}}
- then, there is no replacement because the simple |author2=First Coauthor ... |author6=Fifth Coauthor replacement would bugger up the citation. However, if all of the existing |author2= – |authorn= parameters in my example are empty, then there is no reason not to proceed with the replacement.
- In the other case, where |lastn= or |authorn= follows |coauthor=, there is no reason to do the replacement because the existing |authorn= parameters will override the new.
Will it behave properly if there are four authors, one missing author, three more authors, and coauthors?
Yes, because the script found a match in step 2 and so protected that citation from step 3 editing. No duplicate |author6=.
- We can do as you suggest and limit the search and replace to the case where |last=, |last1=, |author= or |author1= precedes |coauthor= (that is the most common case). I have a test version of the script that is doing just that for the one-coauthor case.
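The ordering rules discussed in this reply can be sketched as a guard that looks at where each parameter sits in the template. This is an illustration in Python under stated assumptions, not the AWB regexes themselves: the scanner is deliberately naive (no nested templates or piped wikilinks), all names are hypothetical, and blocking on any later |authorn= that follows |coauthor=, populated or not, is a conservative reading rather than something stated above.

```python
import re

# Naive scanner: assumes no nested templates or piped wikilinks inside values.
PARAM_RE = re.compile(r"\|\s*([^=|{}]+?)\s*=\s*([^|{}]*)")
NUMBERED_RE = re.compile(r"^(?:last|author)(\d{1,2})$")
FIRST_AUTHOR_KEYS = {"last", "last1", "author", "author1"}
COAUTHOR_KEYS = {"coauthor", "coauthors"}


def ordered_params(template):
    """Return (name, value) pairs in the order they appear in the template."""
    return [(n.strip().lower(), v.strip()) for n, v in PARAM_RE.findall(template)]


def is_later_author(name):
    """True for |lastn= or |authorn= with n greater than 1."""
    match = NUMBERED_RE.match(name)
    return bool(match) and int(match.group(1)) > 1


def may_replace_coauthors(template):
    """Apply the ordering guards before any coauthor replacement."""
    params = ordered_params(template)
    positions = [i for i, (name, _) in enumerate(params) if name in COAUTHOR_KEYS]
    # Assumed: exactly one coauthor parameter, otherwise leave for a human.
    if len(positions) != 1:
        return False
    before, after = params[:positions[0]], params[positions[0] + 1:]
    # A populated first-author parameter must precede |coauthor=.
    if not any(name in FIRST_AUTHOR_KEYS and value for name, value in before):
        return False
    # A populated later author ahead of |coauthor= would collide with |author2=...
    if any(is_later_author(name) and value for name, value in before):
        return False
    # A later author after |coauthor= would override the new parameters.
    if any(is_later_author(name) for name, _ in after):
        return False
    return True
```

Empty |author2= – |authorn= parameters ahead of |coauthor= do not block the replacement in this sketch, matching the exception described above.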
- I follow your logic and am satisfied that the "first/coauthors" citations will be left alone.
- I do think it would be conservative and reasonable to start the bot with a simple task, run through the category knowing that the bot should not make any mistakes because the code is so straightforward, then see what is left in the category. At that point, as we did with BattyBot 25, we can work to suggest refinements that will take care of known problems that appear to be easy to add to the existing code.
- This is a coding philosophy, and you do not have to agree with it. I prefer to roll out simple code, make sure it works and is bug-free, then add complexity from there based on known needs. One can try to build a program that performs complex actions right from the start, and if one is very clever, one might succeed, but I am not that clever. I expect that addressing the most common case will be easy and will take care of two-thirds to three-quarters of the errors in the category. Once it does, it will be easier to find the odd situations that require additional complexity.
- I am ready to see some test edits if there is an admin around who can approve them. I will be happy to check all of the edits. – Jonesey95 (talk) 01:21, 5 March 2014 (UTC)
- You'll get no argument from me that simple is good. I think that this is the simple case that isn't so simple that it's trivial. The challenge is still ahead of us: |coauthor=Last, First M., First M. Last, ... – I had much grander visions when I started down this path.
Manual test edits
- All of the Step 3 regexes now require |last=, |last1=, |author= or |author1= to precede |coauthor=. All of the AWB edits in Special:Contributions/Trappist the monk from 11:39, 5 March 2014 were made with this version of the script.
I looked at the first 35 edits by the script. Comments:
- This edit and this edit did not fix any errors. They only deleted empty coauthors parameters. Editors will probably object if a bot does only that to an article. Perhaps the script should exit without editing if all citations end up protected.
- This edit shows that the protection is working as intended. The script is being very conservative. That's good.
- This edit has some GIGO going on ("foreword by Mark L."). The script worked fine.
- This edit also has GIGO. The script worked fine. The output is no worse than the input.
- This edit and a couple of others resulted in a citation with exactly nine authors, which triggers the displayauthors CS1 error. That's OK. Another bot or editor can fix that problem. I think Citation Bot is being programmed to work on those errors, which it should be able to fix easily.
Good work. On a side note, if you could run some test edits on the Q-Z section of the alphabet in Category:CS1 errors: coauthors without author, that would help me clear out that category. The end of the alphabet contains articles with a mix of coauthors-related errors, and the script should be able to get the articles down to just one type of error that is easier for me to fix. – Jonesey95 (talk) 17:02, 5 March 2014 (UTC)
- I have made about 3500 edits with various versions of this script. There are a lot of pages with empty |coauthor= parameters that have been removed. There have been no complaints – no doubt, now that I have written that, someone will complain.
- It is trivial to add |displayauthors=9 to the replacement when there are 9 authors. Is that the correct solution to that problem? Is it a problem? Is it something that a bot should be doing?
- A human (I assume you are a human) making the change with a script is one thing. A bot doing it is another. I'm looking at WP:COSMETICBOT, which I have seen people cite when making objections to edits by bots.
- As for |displayauthors=9, the problem is that the original source may have more than nine authors, but the editor inserting the citation may have listed only nine because of the previous nine-author limit in cite journal. Citation Bot goes out to check the original source (if a DOI or PMID is available) and adds the remaining authors (or, pending a feature request, adds |displayauthors=9). The solution, in any case, is to refer to the original source before deciding the number of authors to display. – Jonesey95 (talk) 18:50, 5 March 2014 (UTC)
- I don't think that the removal of empty deprecated parameters qualifies as cosmetic – cosmetic implies appearance. The script is only removing something that isn't seen anyway. I look at it more as instructive and preventive. Instructive because editors will see that |coauthor= is deprecated, and preventive because editors aren't tempted to fill in the empty blank.
- I have run the script through Category:CS1 errors: coauthors without author. It fixed about 475 pages.
Ready for trial, approval needed
I am ready to see some test edits if there is an admin around who can approve them. I will be happy to check all of the edits. This bot task owner has a track record of being a conservative, responsible, and responsive bot owner. – Jonesey95 (talk) 05:41, 13 March 2014 (UTC)
{{BAGAssistanceNeeded}}
—Trappist the monk (talk) 15:30, 16 March 2014 (UTC)
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. MBisanz talk 20:06, 29 March 2014 (UTC)
Trial complete.
Fifty-seven edits made (I started without getting the edit summary right). They are listed here: Special:Contributions/Monkbot beginning at 21:35, 29 March 2014 and ending at 21:46, 29 March 2014 (times in UTC). Except for the first six, these edits are marked with this edit summary: Task 2: Fix CS1 deprecated coauthor parameter errors (bot trial)
It's a rather uninteresting collection of edits, though all of Task 2's features are demonstrated except the longer strings of coauthor names (3–9). But, it does illustrate the most common edits. I didn't see anything untoward in these edits.
Pinging Editor Jonesey95.
—Trappist the monk (talk) 22:09, 29 March 2014 (UTC)
- So that I can let Monkbot continue to work on Task 1, here is a link to the wmflabs edit-summary search tool results that lists the edits made in this trial.
I inspected the 50 edits linked immediately above. I noticed the following:
- The bot removed empty |coauthors= parameters, as described above. This will discourage editors from filling in this deprecated parameter.
- The bot appeared to limit itself to names containing no more than four segments, as described above. For example, this edit skipped | last =Smith | first =George | coauthors =Delaware County Institute of Science, as it should have.
- The bot operated correctly on |coauthors= parameters containing multiple (2 or 3) semicolon delimited names, as described above. This is evidenced in this edit. The test edits did not include a |coauthors= parameter with more than three authors.
- I do not have an easy way to confirm that the bot ignores citations containing |ref=harv or that it ignores citations in which a template contains |lastn= or |authorn= where n is greater than 1, but I did not see any evidence in the test edits that the bot modified any such citations.
I found no errors in the test edits I inspected. The bot appears to be conservative in operation, as it should be. – Jonesey95 (talk) 02:29, 31 March 2014 (UTC)
- Thank you for doing that.
Per a conversation at BRFA Monkbot 3, I have changed the script to add |displayauthors=9 when the replacement results in nine authors listed in the citation. This prevents the script from adding the page to Category:Pages using citations with old-style implicit et al..
—Trappist the monk (talk) 11:38, 1 April 2014 (UTC)
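A rough sketch of that change, in Python rather than the script's AWB regexes: the function `coauthor_replacement` and its `existing_authors` argument are hypothetical, and the sketch assumes the citation already lists exactly one author before the coauthors.

```python
def coauthor_replacement(coauthors, existing_authors=1):
    """Build the wikitext that would replace a |coauthors= parameter.

    `existing_authors` is the number of authors already in the citation
    (assumed to be 1 in this sketch, i.e. only the first author).
    """
    parts = [
        f"|author{existing_authors + i}={name}"
        for i, name in enumerate(coauthors, start=1)
    ]
    # Exactly nine authors would otherwise trigger the implicit "et al." category.
    if existing_authors + len(coauthors) == 9:
        parts.append("|displayauthors=9")
    return " ".join(parts)
```

With eight coauthors the result ends in |displayauthors=9; with fewer, or with nine coauthors (ten authors in total), nothing extra is added.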
{{BAG assistance needed}}
—Trappist the monk (talk) 11:08, 10 April 2014 (UTC)
Approved for trial (500 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. --slakr\ talk / 06:49, 12 April 2014 (UTC)
- I checked a little over 50 of these test edits, and I found zero errors.
- Here is an example of the bot correctly adding |displayauthors=9 to a cite template.
- It handles ampersands and "and" gracefully.
- It avoids wikilinked coauthor values, as it should.
- I recommend approval. – Jonesey95 (talk) 13:59, 12 April 2014 (UTC)
Trial complete. Thank you. Every edit through edit 200 inspected, thereafter frequent random inspections.
This edit is flawed. Monkbot should have removed the 'and ' from |coauthors= Y. Hasegawa; and Y. Azuma. I reverted, tweaked the script, and let Monkbot try again, this time successfully (this reedit makes the total trial edit count 501).
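The kind of cleanup involved can be sketched as below. This is a Python illustration only; the actual tweak to the script is not shown in this discussion, and the pattern and the function name `clean_names` are assumptions.

```python
import re

# Leading "and" (as a whole word) or "&" in front of a name.
LEADING_CONJUNCTION = re.compile(r"^(?:and\b|&)\s*", re.IGNORECASE)


def clean_names(value):
    """Split a semicolon-delimited coauthor value and drop a leading 'and'/'&'."""
    names = []
    for raw in value.split(";"):
        name = LEADING_CONJUNCTION.sub("", raw.strip()).strip()
        if name:  # ignore segments that end up empty
            names.append(name)
    return names
```

The word boundary keeps names such as "Anderson" or "Andrew" intact, since only a standalone "and" is stripped.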
Not a bad edit by task 5, but rather an editor's choice.
Another edit where the editor's choice mystifies Monkbot:
|coauthors=McBurnie MA, Newman A, Tracy RP, Kop WJ, Hirsch CH, Gottdiener J, Fried LP; Cardiovascular Health Study
- →
|author2=McBurnie MA, Newman A, Tracy RP, Kop WJ, Hirsch CH, Gottdiener J, Fried LP
|author3=Cardiovascular Health Study
In this case, it looks like the editor merely copy/pasted the author list from Pubmed: PMID 12418947. Still, I reverted, tweaked the script. All rules enabled for ten edits, Monkbot reedited with this result. From this point through edit 150, only the multiple coauthors rules were enabled.
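One conservative way to avoid this kind of edit is to refuse to split a value when any semicolon-delimited segment itself looks like a comma-delimited list. The Python sketch below is a heuristic invented for illustration, not the tweak actually made to the script; it assumes a single name has at most one comma (as in "Last, First").

```python
def looks_like_comma_list(name):
    """Heuristic: more than one comma suggests several names in one segment."""
    return name.count(",") > 1


def safe_to_split(value):
    """True if no semicolon-delimited segment looks like a comma-delimited list."""
    return not any(looks_like_comma_list(segment) for segment in value.split(";"))
```

The heuristic errs on the side of skipping: a single name with a suffix, such as "Smith, John, Jr.", would also be left for a human.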
For edit 151, I disabled all rules except the 9 coauthor rule in order to make sure to find a display authors edit. After which, all rules were enabled for the duration of the test. I found no other questionable edits.
The edits are listed at Special:Contributions/Monkbot beginning at 11:30, 12 April 2014 and ending at 15:09, 12 April 2014 (times in UTC) and have this edit summary: Task 2: Fix CS1 deprecated coauthor parameter errors (bot trial). Also edit summary search results.
—Trappist the monk (talk) 15:29, 12 April 2014 (UTC)
A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with {{t|BAG assistance needed}}. —Trappist the monk (talk) 11:54, 27 April 2014 (UTC)
- Approved. MBisanz talk 05:02, 4 May 2014 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.