Wikipedia:Bots/Requests for approval/WikiCleanerBot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: NicoV (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 09:40, Tuesday, April 2, 2019 (UTC)
Function overview: To fix some simple cases of square brackets without correct beginning.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Java (Wikipedia:WPCleaner)
Source code available: On Github (and especially algorithm 46)
Links to relevant discussions (where appropriate): Maintenance task for CW Error #46
Edit period(s): Twice a month, preceding the dump analysis that I already perform, see Wikipedia:Bots/Requests for approval/WikiCleanerBot.
Estimated number of pages affected: Probably a few hundred for the first complete run (an estimate, as only the simple cases in the list will be fixed; pages with such problems are listed in Check Wiki #46, which currently contains 2420 pages), and probably no more than a few dozen on each run after that, given how the number of pages in the list evolves.
Namespace(s): Main namespace
Exclusion compliant (Yes/No): No; the cases that will be fixed are simple enough that they can be fixed in every article.
Function details: The function will fix simple cases of problems detected by Check Wiki #46 (Square brackets without correct beginning). The cases identified so far are the following situations:
- an external link ending with 2 square brackets ([https://... ...]]): remove the extra square bracket, provided that it doesn't fall into one of the following situations:
  - it starts with 2 square brackets ([[https://... ...]])
  - it contains another square bracket ([https://... ...[...]])
  - there's a stray opening square bracket earlier in the line ([...[https://... ...]])
- an internal link or a category ending with 4 square brackets ([[...]]]]): remove the extra 2 square brackets
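The rules listed above can be sketched in a few lines. This is a minimal illustration in Python, not WPCleaner's actual Java code: the names `EXTERNAL`, `INTERNAL`, `has_unclosed_open` and `fix_line` are hypothetical, the regular expressions are assumptions about what counts as a "simple case", and the lookarounds that refuse longer bracket runs are extra caution added for this sketch.

```python
import re

# [http(s)://target text]] : one surplus closing bracket. The text part may not
# contain "[" or "]", which covers the "contains another square bracket" case.
EXTERNAL = re.compile(r"\[https?://[^\s\[\]]+[^\[\]\n]*\]\](?!\])")

# [[...]]]] : two surplus closing brackets on an internal link or category.
# The lookbehind and lookahead are assumptions of this sketch, not source rules.
INTERNAL = re.compile(r"(?<!\[)(\[\[[^\[\]\n]+\]\])\]\](?!\])")


def has_unclosed_open(prefix):
    """Return True if an opening bracket in `prefix` is still unclosed."""
    depth = 0
    for ch in prefix:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
    return depth > 0


def fix_line(line):
    """Apply both fixes to a single line of wikitext."""
    def fix_external(match):
        # Skip "[[https://...]]" and "[...[https://...]]": in both cases an
        # opening bracket earlier on the line is still waiting for its close.
        if has_unclosed_open(line[: match.start()]):
            return match.group(0)
        return match.group(0)[:-1]  # drop one closing bracket

    fixed = EXTERNAL.sub(fix_external, line)
    return INTERNAL.sub(r"\1", fixed)
```

Working line by line keeps the "stray opening bracket earlier in the line" check cheap, at the cost of ignoring links that span several lines; anything the patterns do not recognise is left untouched.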
You can tell me if it would be ok to add other fixes for #46 later if I find other cases where I'm sure of the fix.
I already ran this fix on frwiki (possibly with additional modifications); results can be seen in WikiCleanerBot's contributions ("Lien interne mal ouvert" is the French translation for this problem). For enwiki, I don't plan to make additional modifications (unless you think it's better to also fix some additional CW errors in the process).
For the test run, I can stop WPCleaner after a few modifications to let you check what the results are.
For the first articles in the list provided by Check Wiki #46, WPCleaner should do the following:
- 1 John 5: nothing, not a simple case ([[First Epistle of John]] 4:11–12, 14–17]])
- 10 nanometer: removing the extra square bracket by replacing
  <ref>[http://www.cse.cuhk.edu.hk/~byu/papers/N01-SPIE-TPL.pdf triple patterning for 10nm metal]]</ref>
  by
  <ref>[http://www.cse.cuhk.edu.hk/~byu/papers/N01-SPIE-TPL.pdf triple patterning for 10nm metal]</ref>
- 11AM (TV program): nothing, not a simple case ([Vincent Smith (television presenter)|Vincent Smith]]), but it may be a candidate for more ways of fixing ([XXX (YYY)|[XXX]] replaced by [[XXX (YYY)|XXX]]) in a later version
- 12 Scorpii: nothing, not a simple case ({{odlist | B=c<sup>1</sup>]])
- 16-cell: nothing, not a simple case ([<nowiki/>[4,2<sup>+</sup>,4]]), and it would be a cosmetic fix only to remove a false positive
- 1746 English cricket season: nothing, not a simple case (*14 July - Addington & Bromley]])
- 185th Air Refueling Wing: nothing, not a simple case (the group received General Dynamics F-16 Fighting Falcon]]s)
- ...
- 1970 Law on dangerousness and social rehabilitation: removing the extra square bracket by replacing twice
  <ref name="BOE">[http://www.boe.es/buscar/doc.php?id=BOE-A-1970-854 Ley 16/1970, de 4 de agosto, sobre peligrosidad y rehabilitación social]]. Boletín oficial del estado español (B.O.E) nº187 de 6/8/1970. Incluye un PDF con el texto de la ley y su análisis jurídico]</ref>
  by
  <ref name="BOE">[http://www.boe.es/buscar/doc.php?id=BOE-A-1970-854 Ley 16/1970, de 4 de agosto, sobre peligrosidad y rehabilitación social]. Boletín oficial del estado español (B.O.E) nº187 de 6/8/1970. Incluye un PDF con el texto de la ley y su análisis jurídico]</ref>
- 1970–71 Iraqi Central Premier League: removing the extra square bracket by replacing twice
  <ref>[http://forum.kooora.com/f.aspx?t=3591389 Iraq 1970/71]]</ref>
  by
  <ref>[http://forum.kooora.com/f.aspx?t=3591389 Iraq 1970/71]</ref>
Discussion
What happens in a case like
- J. Smith (2010) "Comment on "Do Shitzus have feelings", [J. Poodles, v. 4, iss. 4, pp. 4-8]" J. Poodles v. 19, iss. 3, pp. 134-134.
? Could it handle conversions to [ / ] Headbomb {t · c · p · b} 12:44, 3 April 2019 (UTC)[reply]
- Hi Headbomb. With the current code, nothing happens as it doesn't respect the check that it doesn't contain another square bracket. So for a first run, it would simply be ignored. It may fall in the other fixes I could add for #46, but I would first have to try it to see if there are no situations where this replacement shouldn't be used. --NicoV (Talk on frwiki) 16:21, 3 April 2019 (UTC)[reply]
- @NicoV: Alright, then that case can always be handled separately. Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Headbomb {t · c · p · b} 16:25, 3 April 2019 (UTC)
- @Headbomb: I ran WPCleaner on a few pages, one at a time, and everything was good. I let it run on an extra dozen, and I found several articles with
  Source: [Source: [https://int.soccerway.com/teams/romania/fc-dinamo-1948-sa-bucuresti/1775/ Soccerway]]
  which was replaced by
  Source: [Source: [https://int.soccerway.com/teams/romania/fc-dinamo-1948-sa-bucuresti/1775/ Soccerway]
  as specified, where the correct human fix would be
  Source: [https://int.soccerway.com/teams/romania/fc-dinamo-1948-sa-bucuresti/1775/ Soccerway]
  I wonder if I should also make this an exception (a stray [ before the external link, in the same line). What do you think? --NicoV (Talk on frwiki) 07:36, 4 April 2019 (UTC)
- I think those would be malfunctions, and that they need to be addressed, either by skipping those cases, or handling them correctly. But it is not at all clear to me that [Source: link] is something that needs to be 'fixed' in the first place, so unless a clear and compelling argument can be made, I say skip it. Headbomb {t · c · p · b} 14:02, 4 April 2019 (UTC)
- Ok. I will modify WPCleaner to skip such cases, and I will run it on a few pages. I'll keep you posted. --NicoV (Talk on frwiki) 14:17, 4 April 2019 (UTC)[reply]
@NicoV: remember that you need to post the trial results here once they are completed. Headbomb {t · c · p · b} 14:49, 4 April 2019 (UTC)[reply]
- @Headbomb:, I ran WPCleaner for 49 edits (miscounted, stopped 1 too early), here are the results (comment of the edits is "v2.01b - Task 3 - WP:WCW project (Square brackets without correct beginning)"):
- First part mentioned above with 17 edits, where I detected the case with stray opening square bracket before the external link: I modified WPCleaner to apply no automatic fixing in such case, and updated the description above accordingly. I also fixed manually the few pages where the automatic fix wasn't the best one to do by removing the extra "Source: [".
- Second part with the extra 32 edits, which seems ok to me.
- The trial edits seem to confirm the estimated number of pages that will be modified if run on the entire list, a few hundreds. --NicoV (Talk on frwiki) 10:53, 6 April 2019 (UTC)[reply]
- @NicoV: if the trial is complete you need to add {{BotTrialComplete}} so that AnomieBOT moves this from "in trial" to "trial complete" at WP:BRFA. Based on what you wrote, it seems like the trial is done; is it? --DannyS712 (talk) 01:01, 10 April 2019 (UTC)[reply]
- Thanks DannyS712. Trial complete. --NicoV (Talk on frwiki) 11:35, 10 April 2019 (UTC)[reply]
- [1], [2], [3], [4] are tricky. I'll need to think a bit, but you're welcome to brainstorm about them in the meantime, or come up with improved logic for those cases. Headbomb {t · c · p · b} 16:46, 10 April 2019 (UTC)[reply]
- Ok Headbomb. Here are my thoughts about them:
- Adam Davies (footballer, born 1992):
- Before the edit, the reference was really malformed:
  [https://www.barnsleyfc.co.uk/teams/first-team/goalkeeper/adam-davies/ {{Webarchive|url=https://web.archive.org/web/20180907182939/https://www.barnsleyfc.co.uk/teams/first-team/goalkeeper/adam-davies/ |date=7 September 2018 }} Barnsley F.C. Adam Davis]]
  which gives
  [https://www.barnsleyfc.co.uk/teams/first-team/goalkeeper/adam-davies/ {{Webarchive|url=https://web.archive.org/web/20180907182939/https://www.barnsleyfc.co.uk/teams/first-team/goalkeeper/adam-davies/ |date=7 September 2018 }} Barnsley F.C. Adam Davis]]
- The fix doesn't make it worse, by only balancing the square brackets, but leaving out the real problem. Right now, I don't see any improvement to the logic or to avoid modifying it, but the current fix is still slightly better than the original. So, I think we could let it be that way...
- Obviously, the correct fix would be something like the following, but I can't see any way to make that fix by bot...
Barnsley F.C. Adam Davis Archived 7 September 2018 at the Wayback Machine
- Air America (airline):
- Before the edit it was malformed,
  [https://www.findagrave.com/memorial/7622220/charles-grant-herrick|Charles Charles Herrick]]
  which gives
  Charles Herrick]
- The fix removes the extra square bracket, which gives
  Charles Herrick
- This fixes one of the problems, the other one being the incorrect target of the external link. It's still an improvement.
- April 1913 and April 1915 are very similar to the previous one: the link has 2 problems (incorrect target, extra square bracket), and the bot fixes one of them.
- --NicoV (Talk on frwiki) 20:06, 10 April 2019 (UTC)[reply]
- I agree they're both improvements. Just wondering if those other errors are tracked somewhere. It's not really a strike against this bot, just wondering out loud if this is something this bot should also tackle, or if it's best left to another bot/tool. Headbomb {t · c · p · b} 20:09, 10 April 2019 (UTC)[reply]
- Hi Headbomb.
- For the first one, I don't see any way to fix it automatically, or even detect it properly (for example some templates can be used inside external links texts, some can't...)
- For the other ones, I can easily add code to track them with WPCleaner (pipe inside an external link target). For automatic fixing, it's probably possible for some of them: when the pipe is at the end seems a good first candidate for automatic fixing, maybe the repetition of the first word is another good candidate. I can try to add that also, but it will require testing and I suggest to make it a separate request for approval: in the end, if they are all accepted, I will combine them with the dump analysis task (automatic fixing of some errors based on CW lists, dump analysis to create WPC lists, automatic fixing of some errors based on WPC lists). --NicoV (Talk on frwiki) 07:56, 11 April 2019 (UTC)[reply]
- I haven't had an opportunity to look through all of the trials, but I've looked into the ones mentioned here. I'm not overly thrilled with the idea of automatically replacing a known error (particularly something like the Air America page) with an error that might not be as obvious or track-able. Also not thrilled about what appears to be about a 10% failure rate. Not sure if this means the task should be declined, modified, or just closely watched; will have to think on that. Primefac (talk) 21:02, 27 April 2019 (UTC)
- Primefac. The link has 2 completely unrelated problems: an incorrect target for the link (the pipe and the characters just after) and an extra closing square bracket at the end. They are really completely unrelated: a good example is Air America (airline), where there was an extra square bracket on 1 link (which was fixed by my bot) and an incorrect target for 4 links:
  [http://www.findagrave.com/cgi-bin/fg.cgi?page=pv&GRid=69444105&PIpi=85988325|Joseph Joseph C. Cheney]
  [https://www.findagrave.com/memorial/7622220/charles-grant-herrick|Charles Charles Herrick]]
  [http://www.findagrave.com/cgi-bin/fg.cgi?page=gr&GSln=Oyer&GSfn=john&GSmn=Lerdo&GSbyrel=all&GSdyrel=all&GSob=n&GRid=70395016&df=all&|John John Lerdo Oyer]
  [http://www.findagrave.com/cgi-bin/fg.cgi?page=pv&GRid=3958452&PIpi=132195031|Jack Jack J Wells]
  So I kind of disagree with the statement that my bot is replacing a known error with a less obvious one: it's just fixing one error, the other one being totally unrelated, and it's not a failure. But I agree that the other kind of error should be tracked: I'm currently adding a check for that in WPCleaner and will track such problems at Wikipedia:CHECKWIKI/WPC 543 dump, which will be filled in by my next dump analysis. --NicoV (Talk on frwiki) 17:04, 1 May 2019 (UTC)
- Primefac. I did a dump analysis on frwiki last night to try reporting the problem with pipes inside external links; the result is available at fr:Projet:Correction syntaxique/Analyse 543. It's not as obvious as I hoped, as there are many links that are correct even with a pipe inside. I see several cases that I could probably report without false positives (but I will miss some true errors):
- Pipe at the end of the external link (followed by a whitespace), like [http://www.balistique-joel-serre.fr/233366341| Description de la cartouche] in fr:.300 Winchester Magnum
- Pipe followed by a template, like [https://books.google.fr/books?id=VPcQAQAAMAAJ&pg=PA495|{{p.|495}}] in fr:1650 en France
- Pipe followed by an opening square bracket, like [http://www.tv-tokyo.co.jp/|[http://www.allcinema.net/] in fr:Akiho Yoshizawa
- Pipe followed by a word that is repeated just after the whitespace, like Air America (airline)
- What do you think? --NicoV (Talk on frwiki) 07:24, 2 May 2019 (UTC)[reply]
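The four reporting cases proposed above could be checked roughly as follows. This is a hedged Python sketch, not the WPCleaner check itself (which is written in Java): the function name `classify_pipe` and the label strings are made up for illustration, and the input is assumed to be a single external link including its outer square brackets.

```python
from typing import Optional


def classify_pipe(link: str) -> Optional[str]:
    """Classify the pipe in one external link, given with its outer brackets.

    Returns None when the target has no pipe, or when the pipe matches none
    of the four cases (it may then be a legitimate part of the URL).
    """
    inner = link[1:-1]              # strip the outer "[" and "]"
    parts = inner.split(None, 1)    # target ends at the first whitespace
    target = parts[0] if parts else ""
    text = parts[1] if len(parts) > 1 else ""
    if "|" not in target:
        return None
    after = target.split("|", 1)[1]  # what follows the first pipe in the target
    if after == "":
        return "pipe at end of target"
    if after.startswith("{{"):
        return "pipe followed by template"
    if after.startswith("["):
        return "pipe followed by opening bracket"
    words = text.split()
    if words and words[0] == after:
        return "pipe followed by repeated word"
    return None
```

Returning None for everything else reflects the trade-off stated in the comment above: some true errors are missed in exchange for avoiding false positives on links that are correct even with a pipe inside.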
- Primefac. I ran the dump analysis on enwiki last night, results for the pipe inside external links are available at Wikipedia:CHECKWIKI/WPC 543 dump (around 1000 articles with this problem), and they contain the articles mentioned above, so this problem will be tracked independently. If need be, I can also run a bot on this list to fix the most obvious ones (pipe at the end of the external link target): I already ran it on frwiki, and it fixed a vast majority of the reported problems (will be visible in the next version of fr:Projet:Correction syntaxique/Analyse 543 when I rerun the dump analysis). --NicoV (Talk on frwiki) 07:39, 6 May 2019 (UTC)[reply]
- Primefac, This diff gives a good idea of what could be fixed for obvious errors (on frwiki, it went down from 1302 pages to 56) for this pipe inside an external link. --NicoV (Talk on frwiki) 09:33, 7 May 2019 (UTC)[reply]
Headbomb, Primefac. If need be, when an extra closing square bracket is fixed from a page, I can also do the automatic fixing for the pipe in the external link. Currently, it would only fix a pipe at the end of the target (if the link is properly formatted), but I can also try to add the automatic fixing when the word after the pipe is a copy of the first word of the text. --NicoV (Talk on frwiki) 13:21, 8 May 2019 (UTC)[reply]
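As an illustration of the two automatic fixes mentioned in this comment (pipe at the end of the target, and a word after the pipe that repeats the first word of the text), here is a minimal Python sketch. It is not part of the approved task and not WPCleaner's code: `PIPED` and `fix_pipe` are hypothetical names, and the pattern only matches links that close with a single square bracket, as an assumption about what "properly formatted" means.

```python
import re

# [target|after text] where the target is an http(s) URL containing one pipe,
# and the link closes with exactly one "]" (assumed "properly formatted").
PIPED = re.compile(
    r"\[(https?://[^\s\[\]|]+)\|([^\s\[\]|]*)[ \t]+([^\[\]\n]+)\](?!\])"
)


def fix_pipe(line):
    """Remove a stray pipe (and a repeated first word) from link targets."""
    def repl(match):
        target, after, text = match.groups()
        words = text.split()
        if not words:
            return match.group(0)
        # Fix only the two candidate cases: nothing after the pipe, or a
        # copy of the first word of the link text.
        if after == "" or after == words[0]:
            return f"[{target} {text}]"
        return match.group(0)

    return PIPED.sub(repl, line)
```

Links that also carry an extra closing square bracket are deliberately left alone here, so the two kinds of error stay independent of each other.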
- @NicoV: The more I think of it, the more I think this is best left as a separate task/trial. You could run both tasks together, but it's easier to keep trials separate, I feel. So this task (CW Error #46) is Approved, and you can make a separate BRFA for the pipe issues. Headbomb {t · c · p · b} 14:16, 8 May 2019 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.