Wikipedia:Bots/Requests for approval/WikiCleanerBot 16
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: NicoV
Time filed: 14:49, Saturday, April 25, 2020 (UTC)
Function overview: Fix Check Wikipedia (CW) Error #92 (Headline double).
Automatic, Supervised, or Manual: Automatic
Programming language(s): Java (WPCleaner)
Source code available: On GitHub (especially algorithm 92)
Links to relevant discussions (where appropriate): Wikipedia_talk:WikiProject_Check_Wikipedia#Request_for_addition_of_error
Edit period(s): Twice a month
Estimated number of pages affected: At first, I included only pages from Main in the Wikipedia:CHECKWIKI/WPC 092 dump (a dry run on the 11598 pages resulted in the modification of 386 pages). Afterwards, I also included pages from File in the dump analysis, keeping only articles where the duplicate headings were consecutive (a dry run on the 12040 pages resulted in the modification of 5455 pages).
Namespace(s): Main + File
Exclusion compliant (Yes/No): Yes
Function details: The bot will remove some of the redundant duplicated headlines found in articles, but only when the duplicates are consecutive.
I have already run a similar task on frwiki, with 23 edits in Main and around 200 edits in File.
Discussion
Couple of questions
- a) If this task is "off" at WP:CWERRORS, why do we need a bot to run this task?
- b) Will the bot only remove headers where it's ==<header>== <whitespace> ==<header>==?
Primefac (talk) 19:05, 11 May 2020 (UTC)
- Hi Primefac.
- a) I think the task is currently "off" at WP:CWERRORS because it was producing too many false positives. WPCleaner's detection is restricted to consecutive titles and a maximum level of 3 for the titles.
- Reactivating this detection was requested by Jonteemil in this discussion.
- I tested this detection on frwiki, and all the pages reported had actual problems with the headlines or the content (various situations). I fixed all of them, either automatically for simple situations (most of the pages in File: were in such situations, as seems to be the case here), or manually for the others.
- b) The bot will only remove headers for non-ambiguous situations, leaving more complex situations for humans to fix.
- Non-ambiguous situations can also include things like ==<header>== <text> ==<header>== <text> <other_text> (both sections have the same content, or one section has the same text as the other section + other text after), but it's less frequent.
- --NicoV (Talk on frwiki) 10:48, 12 May 2020 (UTC)
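The simplest case discussed above (the same heading repeated with only whitespace in between, restricted to low heading levels) could be sketched roughly as follows. This is an illustration only and not WPCleaner's algorithm 92: the class name, method name and regular expression are hypothetical, and "maximum level of 3" is assumed here to mean headings written with at most three `=` signs. Anything with text between the two headings is deliberately left untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the WPCleaner implementation.
final class DuplicateHeadingSketch {

    // Matches a wikitext heading line such as "== Title ==".
    private static final Pattern HEADING =
            Pattern.compile("^(={1,6})\\s*(.*?)\\s*\\1\\s*$");

    // Assumption: only headings with at most three '=' signs are handled.
    private static final int MAX_LEVEL = 3;

    // Drops a heading that repeats the previous heading (same level, same title)
    // when only blank lines separate the two.
    static String removeWhitespaceSeparatedDuplicates(String wikitext) {
        String[] lines = wikitext.split("\n", -1);
        List<String> out = new ArrayList<>();
        String previousKey = null;          // "level:title" of the last heading kept
        boolean onlyBlankSinceHeading = false;

        for (String line : lines) {
            Matcher m = HEADING.matcher(line);
            if (m.matches()) {
                int level = m.group(1).length();
                String key = level + ":" + m.group(2);
                if (onlyBlankSinceHeading && level <= MAX_LEVEL && key.equals(previousKey)) {
                    continue;               // unambiguous duplicate: skip this line
                }
                previousKey = key;
                onlyBlankSinceHeading = true;
            } else if (!line.trim().isEmpty()) {
                onlyBlankSinceHeading = false;   // real content: later repeats are left for humans
            }
            out.add(line);
        }
        return String.join("\n", out);
    }
}
```

Calling `DuplicateHeadingSketch.removeWhitespaceSeparatedDuplicates(text)` on a page's wikitext returns the text with such repeats removed; the more complex situations mentioned in b) would need separate handling or manual review.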
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 18:01, 22 May 2020 (UTC)
- Trial complete. @Primefac: Thanks, I've done 50 edits. I didn't see any problems in the edits. Fixes also take into account cases like:
==<header 1>== <text> ==<header 1>== ===<header 2>===
which is simplified into:
==<header 1>== <text> ===<header 2>===
Examples: 1922 Manitoba general election, ...
- --NicoV (Talk on frwiki) 15:24, 24 May 2020 (UTC)
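The case described in this trial report (a repeated heading whose own section is empty because it is directly followed by a subheading) might be handled along these lines. Again, this is only a sketch under stated assumptions, not the bot's actual code: the class and method names are hypothetical, headings are assumed to sit on their own lines, and the level limit of 3 is interpreted as at most three `=` signs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the WPCleaner implementation.
final class EmptyRepeatedHeadingSketch {

    // Matches a wikitext heading line such as "== Title ==".
    private static final Pattern HEADING =
            Pattern.compile("^(={1,6})\\s*(.*?)\\s*\\1\\s*$");

    // Assumption: only headings with at most three '=' signs are handled.
    private static final int MAX_LEVEL = 3;

    // Drops a heading that repeats the previous heading (same level, same title)
    // when the repeated heading is followed, ignoring blank lines, by a deeper heading.
    static String dropEmptyRepeatedHeading(String wikitext) {
        String[] lines = wikitext.split("\n", -1);
        List<String> out = new ArrayList<>();
        String previousKey = null;   // "level:title" of the last heading kept

        for (int i = 0; i < lines.length; i++) {
            Matcher m = HEADING.matcher(lines[i]);
            if (m.matches()) {
                int level = m.group(1).length();
                String key = level + ":" + m.group(2);
                if (level <= MAX_LEVEL
                        && key.equals(previousKey)
                        && isDirectlyFollowedBySubheading(lines, i, level)) {
                    continue;        // the repeated heading carries no content of its own
                }
                previousKey = key;
            }
            out.add(lines[i]);
        }
        return String.join("\n", out);
    }

    // True when the first non-blank line after index is a heading deeper than level.
    private static boolean isDirectlyFollowedBySubheading(String[] lines, int index, int level) {
        for (int j = index + 1; j < lines.length; j++) {
            if (lines[j].trim().isEmpty()) {
                continue;
            }
            Matcher m = HEADING.matcher(lines[j]);
            return m.matches() && m.group(1).length() > level;
        }
        return false;
    }
}
```

When the repeated heading is followed by its own text rather than a subheading, the sketch leaves the page unchanged, in line with the stated intent of leaving ambiguous situations to human editors.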
Approved. As per usual, if amendments to - or clarifications regarding - this approval are needed, please start a discussion on the talk page and ping. --TheSandDoctor Talk 05:30, 27 May 2020 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.