Wikipedia:Bots/Requests for approval/Galobot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Galobtter (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 07:12, Friday, August 3, 2018 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python (Pywikibot)
Source code available: Here
Function overview: Fix multiple unclosed formatting tags lint errors
Links to relevant discussions (where appropriate): Wikipedia:Village pump (technical)#Remex: Pages that used to look fine are now broken; Wikipedia:Bot requests#HTML errors on discussion pages
Edit period(s): One time run
Estimated number of pages affected: ~10000 based on 34000 errors
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): No
Function details: Basically, replaces things like <tt>…<tt> with <tt>…</tt>, as Jc86035 suggested here. Tags fixed are <tt>, <s>, <u>, <b>, <i>, <code>, and <strike>. Specifically:
- for each page that has multiple unclosed formatting tags, finds every multiple unclosed formatting tags error for that page
- uses the "location" output of Linter to narrow down where to fix the error in the page text
- searches for two instances of start tags of the erroneous tag
- if there are no closing tags or templates in between, it replaces the latter instance with a closing tag
- update: the two deprecated tags are now handled a bit differently; <strike> tags are replaced with <s> tags, and <tt>...</tt> with {{mono}} if the fix is to the 99%+ case of <tt>reviewer<tt>
- only makes an edit if it has fixed all multiple unclosed formatting tags errors.
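The steps above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual source: the function name `fix_unclosed_pair`, the `FIXABLE_TAGS` tuple, and the simple substring checks for "closing tags or templates in between" are all assumptions made for the example.

```python
import re

# Tags listed in the function details; the tuple name is hypothetical.
FIXABLE_TAGS = ("tt", "s", "u", "b", "i", "code", "strike")


def fix_unclosed_pair(text, tag, start, end):
    """Turn the second <tag> inside text[start:end] into </tag>.

    Returns the new page text, or None if no safe fix was found.
    """
    if tag not in FIXABLE_TAGS:
        return None
    region = text[start:end]
    opener = re.compile(r"<%s>" % re.escape(tag), re.IGNORECASE)
    matches = list(opener.finditer(region))
    if len(matches) < 2:
        return None
    first, second = matches[0], matches[1]
    between = region[first.end():second.start()]
    # Assumed safety check: give up if any closing tag or template
    # delimiter sits between the two start tags.
    if "</" in between or "{{" in between or "}}" in between:
        return None
    fixed = region[:second.start()] + "</%s>" % tag + region[second.end():]
    return text[:start] + fixed + text[end:]
```

For instance, calling it on `You are a <tt>reviewer<tt> now.` with the whole string as the location turns the second `<tt>` into `</tt>`, while a region with a `</s>` or a template between the two start tags is left alone.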
I know that Ahecht's Ahechtbot is having a BRFA partly for doing the same for just <s>; however, this fixes all non-nesting tags with such errors, and as it only edits when all errors are fixed there shouldn't be any double-watchlist hits from both bots or anything like that. Also, my bot account was blocked by Oshwah for having "bot" in its name; probably should be unblocked, at least now :)
Discussion
- Unblocked as there is a BRFA open on this and it is not editing outside the bot policy. — xaosflux Talk 13:03, 3 August 2018 (UTC)
- Regarding "only makes an edit if it has fixed all multiple unclosed formatting tags errors" - is this for the entire page? How will you determine this? — xaosflux Talk 13:08, 3 August 2018 (UTC)
- Yes, this is for the entire page. All the multiple unclosed formatting tags errors of the page are retrieved through an API call; if it cannot make a fix for any one of the errors on a page, it doesn't edit the page (there is no need to make more than one fix per error given by Linter). This filtering decreases the number of pages edited by ~5%. Galobtter (pingó mió) 13:24, 3 August 2018 (UTC)
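A rough sketch of that all-or-nothing rule, assuming each error is a dict with `title`, `location` (a `[start, end]` pair) and `params["name"]` (the tag name), which is how the Linter API output is understood here. The helper names and the stand-in `naive_fix` are hypothetical, and the sketch assumes the reported locations on a page do not overlap.

```python
from collections import defaultdict


def group_errors_by_page(linterrors):
    """Group a flat list of lint errors by page title."""
    pages = defaultdict(list)
    for err in linterrors:
        pages[err["title"]].append(err)
    return dict(pages)


def fix_page(text, errors, fix_one):
    """Apply fix_one to every error; return new text only if all succeed."""
    # Work from the end of the page backwards so that earlier
    # offsets stay valid after each (length-changing) fix.
    ordered = sorted(errors, key=lambda e: e["location"][0], reverse=True)
    for err in ordered:
        start, end = err["location"]
        new_text = fix_one(text, err["params"]["name"], start, end)
        if new_text is None:
            return None  # one unfixable error means no edit at all
        text = new_text
    return text


def naive_fix(text, tag, start, end):
    """Stand-in fixer for demonstration: close the second <tag> in range."""
    region = text[start:end]
    opener = "<%s>" % tag
    i = region.find(opener)
    j = region.find(opener, i + len(opener)) if i != -1 else -1
    if j == -1:
        return None
    fixed = region[:j] + "</%s>" % tag + region[j + len(opener):]
    return text[:start] + fixed + text[end:]
```

A page is saved only when `fix_page` returns text; a `None` result means at least one reported error could not be fixed and the page is skipped.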
- From an "error-handling" perspective, how likely is it that there will be nested instances of these calls? I know it's unlikely that there will be something like <i>This is <i>italics</i> when we do this</i> (which shows up as This is italics when we do this), but it's very possible you could have someone saying "To highlight code, use <code>", which has two <code> calls in it. Primefac (talk) 16:12, 3 August 2018 (UTC)
- I mean, I know that the second example I've given doesn't actually throw any errors, but if there was another error on the page, would it correct it? Primefac (talk) 16:13, 3 August 2018 (UTC)
- Interesting edge case, but no :). Linter gives the location of the error (from the start tag to the incorrect end tag (addendum: or sometimes till the end of the line)). The script only tries fixing the specific tag that Linter says is problematic within that particular location. (see here for example of API output) So another error elsewhere would not cause "fixing" of that.
- (Additional thoughts that may not make sense and are of minor import: The only way that would even be close to happening is if the page had two unclosed formatting lint errors as in here. Linter sometimes gives the location as from the first erroneous tag to the very last one, instead of stopping at the second paired erroneous tag (but not in the case I've made though), and thus the whole of the text would be in the location of one of the reported errors, and thus the text to be fixed would include the example you've given, and the program would be looking to fix a <code> error within that. But the program would only fix the first error there and not "fix" the next line; and the location of the second error would only contain <code>Bar<code>) Galobtter (pingó mió) 16:47, 3 August 2018 (UTC)
- Cool. I've been a little more out-of-the-loop on the Linter stuff recently, so I wasn't sure how the errors were being handled these days. Primefac (talk) 16:29, 4 August 2018 (UTC)
- I've added the source code, nothing too much to it. Galobtter (pingó mió) 21:46, 7 August 2018 (UTC)
- {{BAG assistance needed}} Galobtter (pingó mió) 15:15, 13 August 2018 (UTC)
- @Galobtter: Wouldn't it be better to change tt tags to {{tt}} or to <kbd>...</kbd> (and similarly for strike tags), since they're now deprecated? Jc86035 (talk) 07:53, 14 August 2018 (UTC)
- Done, thanks. <tt>...</tt> are instead replaced with {{mono}} if there are no pipes in between to muck things up. <strike> are replaced with <s>. Galobtter (pingó mió) 09:44, 14 August 2018 (UTC)
- @Galobtter: You should also check for curly brackets, just in case someone was trying to type <tt>}}</tt> and accidentally did <tt>}}<tt> instead. I also created a pull request to explicitly call out the first parameter as "1=" and to add "|needs_review=yes". --Ahecht (TALK PAGE) 18:18, 17 August 2018 (UTC)
- 99%+ of <tt> fixes are of <tt>reviewer<tt> from an old version of {{Pending changes reviewer granted}} so I'm thinking of maybe only changing to {{mono}} then (to avoid any errors); or at least in that case the fixes won't need review ({{mono}} is what is used now for those notices) Galobtter (pingó mió) 18:28, 17 August 2018 (UTC)
- Yeah, code updated to only replace with mono if it is <tt>reviewer<tt> Galobtter (pingó mió) 10:36, 18 August 2018 (UTC)
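The deprecated-tag handling settled on in this exchange could look something like the sketch below. The function names are hypothetical, and writing the template call as `{{mono|1=reviewer}}` follows Ahecht's suggestion of an explicit "1="; the discussion does not confirm the exact wikitext the bot emits.

```python
def tt_replacement(inner):
    """Return replacement wikitext for a broken <tt>inner<tt> pair."""
    if inner == "reviewer":
        # The 99%+ case from old {{Pending changes reviewer granted}} notices.
        return "{{mono|1=reviewer}}"
    # Anything else (pipes, curly brackets, ...) just gets a closing tag.
    return "<tt>%s</tt>" % inner


def modernize_strike(fragment):
    """Swap deprecated <strike> tags for <s> in an already-fixed fragment."""
    return fragment.replace("<strike>", "<s>").replace("</strike>", "</s>")
```

Restricting the template conversion to the exact string `reviewer` sidesteps the pipe and curly-bracket cases raised above, since those fall through to the plain closing-tag fix.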
- @Xaosflux: tis been nearly a month, I have dealt with any issues brought up; would appreciate if this could move forward. Thanks. Galobtter (pingó mió) 13:43, 1 September 2018 (UTC)
- Approved for trial (150 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. SQLQuery me! 23:56, 7 September 2018 (UTC)
- SQL, thanks, Trial complete. Edits. I made the bot skip user talk base pages for the trial per comments by Xaosflux on the Ahechtbot BRFA regarding creating new messages alerts. A large portion of the edits were of changing the <tt> tag; here are sample edits of each of the tags: tt (tt fix that isn't of <tt>reviewer<tt>), s, code, b, i, u, strike. There was one error, but I spotted it and tweaked the bot code (turns out the location given by linter is sometimes 1 off, and code was tweaked to account for that; it now skips that page). Other than that I was checking every edit and there were no issues I could spot. Galobtter (pingó mió) 12:49, 8 September 2018 (UTC)
- @SQL: Galobtter (pingó mió) 14:03, 7 October 2018 (UTC)
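One way to guard against a location that is off by one, as described in the trial report, is to check that the reported start offset really points at the expected start tag and to skip the page otherwise. This is only a guess at the kind of check involved; the function name and the exact comparison are assumptions, not taken from the bot's code.

```python
def location_matches_tag(text, tag, start):
    """Return True if text at offset start begins with <tag> (case-insensitive)."""
    opener = "<%s>" % tag
    return text[start:start + len(opener)].lower() == opener


def should_skip_page(text, tag, start):
    """Skip the page when the reported location does not line up with the tag."""
    return not location_matches_tag(text, tag, start)
```

Skipping is the conservative choice here: a shifted location costs one unfixed page rather than risking an edit at the wrong offset.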
- {{BAGAssistanceNeeded}} It's been over a month since the trial was completed, and a week since Galobtter nudged SQL. --Ahecht (TALK PAGE) 02:54, 14 October 2018 (UTC)
- I apologize for the delay. I've been very busy off-wiki. I don't see any issues with the trial edits, and there has been more than ample time for anyone else to comment on this task. Approved. SQLQuery me! 03:40, 14 October 2018 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.