Wikipedia:Bots/Requests for approval/HooptyBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Denied.
Operator: JPxG (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 08:31, Sunday, August 22, 2021 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): Python
Source code available: https://github.com/jp-x-g/wigout
Function overview: When requested, adds convenience links to Earwig's Copyvio Checker next to articles and revision links on selected WP:CCI casepages.
Links to relevant discussions (where appropriate): User talk:JPxG/Archive9#Potential trial page for your script, Wikipedia talk:Contributor copyright investigations#New tool for CCI casepages (automatic Earwig linking bot), User talk:Enterprisey#CCI script
Edit period(s): When requested
Estimated number of pages affected: ?
Exclusion compliant (Yes/No): No
Already has a bot flag (Yes/No): No
Function details: This is my first bot approval request, so I apologize in advance if I am doing it incorrectly. I asked around and tried to come up with ideas that could help with processing the WP:CCI backlog; one suggestion that came up was to automatically provide links to Earwig's Copyvio Checker next to the diffs generated on CCI casepages. This isn't a terribly complicated task; what my script does is take the title of an input page, identify the diff links, and put appropriate Earwig hyperlinks next to them. An example of its output can be seen here, and I've run it previously in my userspace at User:JPxG/CCIsandbox (so you can see what its diffs look like). The source code is available on my GitHub, if you want to look through it (suggestions appreciated).
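In rough outline, the core of the script does something like the following (a simplified sketch rather than the exact code in the repository; the regex and the function name here are illustrative):

<syntaxhighlight lang="python">
import re

# Earwig's Copyvio Checker, parameterized the same way as the links
# the script emits on casepages.
EARWIG_BASE = ("https://copyvios.toolforge.org/?lang=en&project=wikipedia"
               "&action=search&use_engine=1&use_links=1")

def add_earwig_links(wikitext: str) -> str:
    """Append a superscript [C] convenience link after every Special:Diff link."""
    def tag(match: re.Match) -> str:
        oldid = match.group(1)
        link = (f'<span class="plainlinks"><sup>'
                f'[{EARWIG_BASE}&oldid={oldid} C]</sup></span>')
        # Keep the original diff link intact; add the Earwig link after it.
        return match.group(0) + link

    # Matches e.g. [[Special:Diff/538795362|(+922)]]
    return re.sub(r"\[\[Special:Diff/(\d+)\|[^\]]*\]\]", tag, wikitext)
</syntaxhighlight>

(Note that re-running this sketch over already-tagged text would add duplicate links; that is one of the details glossed over here.)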
For now, I don't want to think very hard about running it automatically; while it'd be possible to have it automatically scrape links from a requests page, I want to start out by just running it attended myself, i.e. someone pings me on-wiki and I run the script on the page from my terminal. I don't expect there to be thousands of pages getting run through this (by my count, there are only 968 CCI casepages in existence, and for many of them this task would be irrelevant).
I've checked "automatic" on the operation mode because I plan to run it unattended at some point (although obviously there'd need to be some safeguard against ruffians coming along and adding a ton of random stuff to the requests page).
Discussion
Initial thoughts: Since it's only a limited number of people who do CCI, wouldn't a client-side userscript be more ideal? (this is not a blocker to this BRFA btw, just a question) ProcrastinatingReader (talk) 11:54, 22 August 2021 (UTC)[reply]
- My initial thoughts agree with this; a click of a few buttons to fill this in for the user seems easier than a bot. Primefac (talk) 20:33, 22 August 2021 (UTC)[reply]
- The possibility of writing it as a userscript is something that occurred to me. From my conversations with technical editors and CCI workers familiar with the area, it seems like the most productive thing to do would be to completely change the way CCI casepages are structured. For example, @Enterprisey: suggested an entire standalone page that implements a revision-checking workflow. Of course, that would be great -- but in the meantime, people will still be going over the casepages by hand; it didn't seem worth the effort to develop a userscript (which would presumably languish in everyone's common.js forever) to handle the specific syntax of pages of which there are a couple hundred, tops.
- Of course, that said -- if I were developing the whole thing from scratch and didn't already have a working version, I think you are probably right that it would be easier to do so as a userscript. jp×g 01:26, 23 August 2021 (UTC)[reply]
- If I understand correctly, the bot adds copyvio links like in the following extract:
* [[:Mark Mazzetti]]<span class="plainlinks"><sup>[https://copyvios.toolforge.org/?lang=en&project=wikipedia&action=search&use_engine=1&use_links=1&title=Mark_Mazzetti C]</sup></span>: (1 edits, 1 major, +922) [[Special:Diff/538795362|(+922)]]<span class="plainlinks"><sup>[https://copyvios.toolforge.org/?lang=en&project=wikipedia&action=search&use_engine=1&use_links=1&oldid=538795362 C]</sup></span>
- I dunno how case pages are created/managed (whether via template or via userscript), but I'd have guessed these links can be added via the template (at least going forward)? I mean it seems to use the same rev ID? So something like Template:CVD perhaps? Paging Moneytrees in here as well. ProcrastinatingReader (talk) 10:34, 24 August 2021 (UTC)[reply]
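(For reference, the two flavors of link in the extract are the same Earwig base URL with either a page title or a revision ID filled in -- a minimal sketch of the construction, independent of how the casepage generator actually works:)

<syntaxhighlight lang="python">
from urllib.parse import quote

EARWIG_BASE = ("https://copyvios.toolforge.org/?lang=en&project=wikipedia"
               "&action=search&use_engine=1&use_links=1")

def title_check_url(title: str) -> str:
    """Earwig check of an article's current revision, by title."""
    return f"{EARWIG_BASE}&title={quote(title.replace(' ', '_'))}"

def revision_check_url(oldid: int) -> str:
    """Earwig check pinned to a single revision, by revision ID."""
    return f"{EARWIG_BASE}&oldid={oldid}"

# e.g. title_check_url("Mark Mazzetti") and revision_check_url(538795362)
# reproduce the two links in the extract above.
</syntaxhighlight>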
- Replacing all of the diff links with templates would shorten the wikitext, but casepages tend to be extremely long, so I'm a little worried about the transclusion count. A database query tells me the largest casepage is Wikipedia:Contributor_copyright_investigations/20210531, with 20,876 instances of Special:Diff/; this is certainly an outlier, but even the 100th-largest casepage (Wikipedia:Contributor_copyright_investigations/Contaldo80) has 1,886. I'm not 100% solid on what constitutes an undue load on the servers, but from experience it seems like templates tend to get messed up when you get into the thousands (and {{CVD}} seems to have other templates within it). As far as the workflow goes, it's kind of confusing. My understanding is that the wikitext for a casepage is generated by this script, written and maintained by @MER-C:. When I tested it a few days ago, I couldn't figure it out (if I put in my own username, it would only fetch a couple of edits). But it's been used for a very long time, and all of the old casepages were generated using it. The majority of pages I envision HooptyBot getting used on are extremely old. It seems possible (and smart) to come up with a completely new system for administering the casepages, or to replace them altogether, but there are still some CCI cases that have been open for a decade. It may be possible to import them into a new system (if/when one is devised), but for now, this is what we've got to work with. jp×g 11:21, 25 August 2021 (UTC)[reply]
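(A rough, reproducible stand-in for that count, done over the live API rather than the database replicas the query presumably ran against -- a sketch, not the actual query:)

<syntaxhighlight lang="python">
import requests

API = "https://en.wikipedia.org/w/api.php"

def count_diff_links(title: str) -> int:
    """Fetch a casepage's current wikitext and count its Special:Diff links."""
    params = {
        "action": "query", "prop": "revisions", "rvprop": "content",
        "rvslots": "main", "titles": title,
        "format": "json", "formatversion": "2",
    }
    data = requests.get(API, params=params).json()
    text = data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]
    return text.count("Special:Diff/")

# e.g. count_diff_links("Wikipedia:Contributor copyright investigations/Contaldo80")
</syntaxhighlight>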
- Another problem is that there are also a lot of pages copied from sources Earwig can't and probably won't ever be able to read... Moneytrees🏝️Talk/CCI guide 14:21, 25 August 2021 (UTC)[reply]
- Exactly. The risk of someone coming along who doesn't realize that Earwig can't cover all sources, running Earwig, not seeing a match, and marking the entry as N is why I oppose any hard insertion of Earwig links into the wikitext. A user script is fine. I am absolutely not making any changes to the contribution surveyor that are specific to the English Wikipedia, or to detecting copyright violations (I now use it to go after cross-wiki UPE). MER-C 17:04, 25 August 2021 (UTC)[reply]
Denied. While I understand the interest in having this bot task, the two responses above mine are from editors heavily involved in this area so I trust their judgement about how useful this would be. Combined with the other concerns mentioned further up, it looks like this task could potentially cause more problems than it solves. Primefac (talk) 10:28, 27 August 2021 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.