User talk:Citation bot/Archive 31

This is an archive of past discussions about User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Bot sputtering

The bot seems to be stalling: it pauses for significant lengths of time and was also giving 503 errors. Abductive (reasoning) 20:24, 24 March 2022 (UTC)

DOIs that point to larger document

Status: not a bug
Reported by: Ariconte (talk) 03:01, 27 March 2022 (UTC)

What happens: New link added - https://en.wikipedia.org/w/index.php?title=Frederick_C._Leonard&type=revision&diff=1079490992&oldid=1028937799
What should happen: nothing
We can't proceed until: Feedback from maintainers

I will revert the added link; it points to a publication that adds no value.

The problem appears to be that the doi already there (doi:10.1111/j.1945-5100.2002.tb00912.x) goes to a 150-page "abstracts" section on which the publication actually cited is somewhere in the middle (page A34). The newly added link is to a different publication under the same doi (one of the two on the first page, page A9). —David Eppstein (talk) 07:43, 27 March 2022 (UTC)

Fixed up the refs on that page to fix issues. AManWithNoPlan (talk) 13:26, 28 March 2022 (UTC)

Olearia cuneifolia

Status: Fixed by adding to NO_DATE_WEBSITES list
Reported by: Gderrin (talk) 04:07, 1 April 2022 (UTC)

We can't proceed until: Feedback from maintainers

The bot is adding a random date to a reference in Olearia cuneifolia. The Queensland Government Department of Environment and Science regularly updates the linked page,^[1] the last time on 8 March 2022. Adding the date "20 October 2014" suggests to a reader of the article that the linked website is out of date. Gderrin (talk) 04:07, 1 April 2022 (UTC)

undefined issue

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 13:24, 2 April 2022 (UTC)

What happens: Added |issue=undefined
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Aging-associated_diseases&diff=prev&oldid=1080635948
We can't proceed until: Feedback from maintainers

I know this is garbage, but it should be added to a bad-list. Jonatan Svensson Glad (talk) 13:24, 2 April 2022 (UTC)

Billboard dates

Status: {{fixed}} by adding billboard to ignore dates list
Reported by: Exallonyx (talk) 15:43, 2 April 2022 (UTC)

What happens: The bot seems to be putting incorrect dates on Billboard year-end chart citations. I noticed it from this edit: https://en.wikipedia.org/w/index.php?title=I_Hate_U,_I_Love_U&diff=prev&oldid=1080644452 , where it put |date=2 January 2013 on 2 citations of 2016 year-end charts, which is obviously incorrect. I assume it's coming from the <meta property="article:published_time" content="2013-01-02T14:48:25+00:00" /> in the Billboard page's source code, so maybe the bot should ignore those tags on Billboard.com URLs?
It seems to have done this many times - I searched insource:/\{\{cite[^\}]+?billboard\.com\/charts\/year-end\/2016[^\}]+? 2013/ and that brought up 68 results of what, from clicking on the first few, appear to be the same deal, and that's just for 2016 links.
We can't proceed until: Feedback from maintainers
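
The suggestion above (ignore the article:published_time tag on Billboard.com URLs) could be sketched roughly as follows. This is only an illustration in Python, not the bot's actual code: NO_DATE_HOSTS, host_is_ignored and extract_date are hypothetical names, and the bot's real ignore list may be structured differently.

```python
import re
from urllib.parse import urlparse

# Hypothetical ignore list; the bot's real list is not shown in this thread.
NO_DATE_HOSTS = {"billboard.com"}

# Matches the date part of the meta tag quoted in the report above.
META_DATE = re.compile(
    r'<meta\s+property="article:published_time"\s+content="(\d{4}-\d{2}-\d{2})'
)


def host_is_ignored(url):
    """True if the URL's host is an ignored domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in NO_DATE_HOSTS)


def extract_date(url, html):
    """Return the ISO date from article:published_time, or None if the host is ignored."""
    if host_is_ignored(url):
        return None
    match = META_DATE.search(html)
    return match.group(1) if match else None
```

With a check like this, the year-end chart pages would simply get no |date= rather than the unrelated 2013 timestamp.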

Link publisher?

It would be nice if the citation bot could wikilink |publisher= attributes that it adds when the wiki article exists, e.g., |publisher=IBM, |publisher=IEEE. --Shmuel (Seymour J.) Metz Username:Chatul (talk) 14:19, 3 April 2022 (UTC)

@Chatul: I have created a pull request for this: https://github.com/ms609/citation-bot/pull/3888

The bot's maintainers will decide whether to implement it. BrownHairedGirl (talk) • (contribs) 21:59, 3 April 2022 (UTC)

It has been implemented: https://github.com/ms609/citation-bot/pull/3888 BrownHairedGirl (talk) • (contribs) 23:59, 3 April 2022 (UTC)

Publisher type error

See [1]. {{u|Sdkb}} talk 01:49, 7 April 2022 (UTC)

Fixed AManWithNoPlan (talk) 13:30, 7 April 2022 (UTC)

book reviews

Status: Won't fix without examples. I tried a bunch and could not reproduce
Reported by: Prairieplant (talk) 03:39, 28 March 2022 (UTC)

We can't proceed until: Feedback from maintainers

This bot is turning online reviews of books into books, that is, changing cite web into cite book, making up an ISBN and sometimes deleting the URL. It has happened in the Reviews section of two different novels by Ellis Peters in the Cadfael series. The bot should leave Publishers Weekly and Library Journal citations alone. -- Prairieplant (talk) 03:39, 28 March 2022 (UTC)

Example edits please. AManWithNoPlan (talk) 12:31, 28 March 2022 (UTC)

ISBN in Cite web

Status: new bug
Reported by: Johannes Schade (talk) 06:30, 10 January 2022 (UTC)

We can't proceed until: Feedback from maintainers

The bot changed "{{Cite web|last=Coolahan |first=Marie-Louise |date=9 May 2019 |title=Dowdall [née Southwell], Elizabeth |website=[[Oxford Dictionary of National Biography]] |doi=10.1093/odnb/9780198614128.013.112775 |url=https://www.oxforddnb.com/view/10.1093/ref:odnb/9780198614128.001.0001/odnb-9780198614128-e-112775 |access-date=14 March 2021 |url-access=subscription}} – Online edition" -> "{{Cite web|last=Coolahan |first=Marie-Louise |date=9 May 2019 |title=Dowdall [née Southwell], Elizabeth |website=[[Oxford Dictionary of National Biography]] |doi=10.1093/odnb/9780198614128.013.112775 |isbn=978-0-19-861412-8 |url=https://www.oxforddnb.com/view/10.1093/ref:odnb/9780198614128.001.0001/odnb-9780198614128-e-112775 |access-date=14 March 2021 |url-access=subscription}} – Online edition". I doubt the bot checks the book against the website. The website could differ from what was published in the book with that ISBN. I do not think an ISBN should be added under these circumstances. With thanks and best regards, Johannes Schade (talk) 06:30, 10 January 2022 (UTC)

A diff would be more useful than the above. Headbomb {t · c · p · b} 10:26, 10 January 2022 (UTC)

To cite the ODNB, use {{cite ODNB}}:

{{Cite ODNB |last=Coolahan |first=Marie-Louise |date=9 May 2019 |title=Dowdall [née Southwell], Elizabeth |doi=10.1093/odnb/9780198614128.013.112775}}

Coolahan, Marie-Louise (9 May 2019). "Dowdall [née Southwell], Elizabeth". Oxford Dictionary of National Biography (online ed.). Oxford University Press. doi:10.1093/odnb/9780198614128.013.112775. (Subscription or UK public library membership required.)

—Trappist the monk (talk) 13:12, 10 January 2022 (UTC)

Normally this would work, but in this case, DOI is broken and needs to use the URL parameter instead. I fixed all 3 articles using this citation to use Cite ODNB with the URL. — Chris Capoccia 💬 14:10, 18 January 2022 (UTC)

Stalled job

A batch job of mine has been stalled for about 14 hours. I have tried to kill it using https://citations.toolforge.org/kill_big_job.php, but it won't die.

Please can it be killed? BrownHairedGirl (talk) • (contribs) 23:13, 7 April 2022 (UTC)

This zombie job is still blocking me from starting a new batch. BrownHairedGirl (talk) • (contribs) 17:02, 8 April 2022 (UTC)

Is it working now? AManWithNoPlan (talk) 18:21, 8 April 2022 (UTC)

@AManWithNoPlan: it is still not working.

I tried a few seconds ago, and got the same big bolded response: "Run blocked by your existing big run". BrownHairedGirl (talk) • (contribs) 05:42, 9 April 2022 (UTC)

Wrong S2CID

Status: Not a bug - doi was wrong
Reported by: Nardog (talk) 18:10, 8 April 2022 (UTC)

What happens: Wrong S2CID added (241198800, for "New England: phonology", instead of 242118647, for "New Zealand English: phonology").
Relevant diffs/links: Special:Diff/1075641803
We can't proceed until: Feedback from maintainers

Just simple GIGO. AManWithNoPlan (talk) 18:29, 8 April 2022 (UTC)

spelling change

Status: Not a bug
Reported by: -- ☽☆ NotCharizard (talk) 17:01, 10 April 2022 (UTC)

What happens: just changed spelling in main entry to be incorrect (connectivity -> connecvity)
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=K-vertex-connected_graph&curid=7566175&diff=1081957629&oldid=987331267&diffmode=source

You are mistaken. The bot's edit did not change 'connectivity' to 'connecvity'. That was done at the diff you provided by an ip editor. Not a bug.

—Trappist the monk (talk) 17:14, 10 April 2022 (UTC)

yeah, I know, probably gigo...

Status: Fixed, added to the do-not-spell-correct list
Reported by: Trappist the monk (talk) 00:06, 11 April 2022 (UTC)

What happens: converts French parameter |lien= to unrelated English parameter |lccn=
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers

It begins with Editor Sulpyensid using French citation templates in the English Wikipedia at the article's creation. These templates are: fr:Modèle:Lire en ligne and fr:Modèle:Lien web. Editor Sulpyensid should have translated those French templates to their more-or-less matching English templates.

At this edit, AnomieBOT fixed the Lien web templates by substing them to {{cite web}}.

At this edit, SporkBot renamed Lire_en_ligne without completing the job by renaming the parameters within those templates. SporkBot should not be renaming non-English templates without also renaming the parameters within those templates; cs1|2 coughs up red empty citation error messages because the only parameter in these templates is |lien=.

Because of SporkBot's failure, at this edit, Citation bot changed the French |lien= (a url-holding parameter) in the English {{cite web}} template to |lccn= (an identifier-holding parameter), after which cs1|2 coughs up four separate red error messages per template.

Yeah, this is a cascade of events that should not have happened had the various participants done the right things at the right time...

—Trappist the monk (talk) 00:06, 11 April 2022 (UTC)

Can be run locally?

The hosted version of this bot is frequently overloaded. Can it be run on my local machine? Is there a document that describes how that can be done? I'm especially having trouble understanding how to do OAuth in env.php.example. --Mblumber (talk) 22:10, 8 April 2022 (UTC)

I have updated the code and in-code documentation. You will need your own Wikipedia OAuth tokens, which are described in the example ini file. Please change the BOT_USER_AGENT variable. Finally, only the process_page.php file supports command line running. AManWithNoPlan (talk) 22:07, 10 April 2022 (UTC)

Should work now and is documented.

Fixed. AManWithNoPlan (talk) 19:59, 11 April 2022 (UTC)

ISFDB connected to incorrect URL

Status: Fixed
Reported by: Mike Christie (talk - contribs - library) 09:39, 13 April 2022 (UTC)

What happens: A citation to ISFDB, whose website is isfdb.org, instead gets sfdb.org as the website.
Relevant diffs/links: Diff of the error: https://en.wikipedia.org/w/index.php?title=Imagination_(magazine)&diff=prev&oldid=1082465500&diffmode=source

I have figured it out and fixed it. Also, I am going back and fixing the damaged pages. AManWithNoPlan (talk) 13:27, 13 April 2022 (UTC)

Great; thanks. Mike Christie (talk - contribs - library) 14:08, 13 April 2022 (UTC)

Default titles for dead links without flagging

Status: {{fixed}}
Reported by: Mikeblas (talk) 19:00, 13 April 2022 (UTC)

What happens: Citation bot doesn't recognize decorated dead links and places title=404 in a {{cite web}} tag. The bot made two such changes here, which went undetected for several weeks, since the robot's actions aren't monitored or audited.
What should happen: dead raw links shouldn't be converted to {{cite web}} tags and should instead be either left alone or marked with {{dead link}}. The bot's changes should be reviewed to make sure they're constructive.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Waxahachie,_Texas&diff=prev&oldid=1067168696&diffmode=source
We can't proceed until: Feedback from maintainers

Garbage title: ShieldSquare Captcha

Status: Fixed
Reported by: Headbomb {t · c · p · b} 20:19, 13 April 2022 (UTC)

What should happen: See the first diff line [2]

When using a "new section" and then using a gadget, the section header title is not displayed correctly

See Draft talk:Suita conjecture (diff).--SilverMatsu (talk) 01:20, 16 April 2022 (UTC)

That's an issue with WP:EDITSUMMARYs when creating new sections, where you cannot have an edit summary separate from the section title. At that time, the user is responsible for their own edits.

It might be worth investigating if edit summaries should be disabled in that situation, but really, review the changes before saving them. Headbomb {t · c · p · b} 03:08, 16 April 2022 (UTC)

@Headbomb: Thank you for teaching me !--SilverMatsu (talk) 03:53, 16 April 2022 (UTC)

and the academy award for the most unexpected bug goes to......10:53, 16 April 2022 (UTC)

Adding note to gadget talk area. AManWithNoPlan (talk) 13:58, 18 April 2022 (UTC)

Some cite magazine conversions

Status: {{fixed}}
Reported by: Lightlowemon (talk) 01:32, 17 April 2022 (UTC)

What happens: Playthings should be corrected to magazine (this looks like a junk in junk out situation)
What should happen: Cite journal should be converted to cite magazine
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Legacy_of_Kain&diff=prev&oldid=1083100047
We can't proceed until: Feedback from maintainers

Not entirely certain I'm correct on this one, but given it has magazine in the brackets on the page and is in WikiProject magazines, I feel like I am. --Lightlowemon (talk) 01:32, 17 April 2022 (UTC)

Similar here with Nintendo Power. --Lightlowemon (talk) 02:14, 17 April 2022 (UTC)

Some more missed conversions Electronic Gaming Monthly, Official U.S. PlayStation Magazine, PlayStation: The Official Magazine, Play and Silicon Mag. Also Games Radar, Hyper, Famitsu and GamePro--Lightlowemon (talk) 05:13, 17 April 2022 (UTC)

Just checked Official Xbox Magazine as well and a whole host of them in this article... I wasn't expecting there to be so many incorrect cite journals. I thought it was just Game Informer and Edge when I started looking at these. Sorry. --Lightlowemon (talk) 05:22, 17 April 2022 (UTC)

There was a not insignificant period where cite mag redirected to cite journal and where AWB was run with one of its general fixes being to change one to the other. This is probably why you are seeing so many. Izno (talk) 06:07, 17 April 2022 (UTC)

That seems like a weird choice, but it explains why there are so many. I've got one more that also seemed to stick around: Entertainment Weekly. --Lightlowemon (talk) 00:21, 18 April 2022 (UTC)

DOI Removal

Status: {{fixed}}, very rare and obscure
Reported by: Lightlowemon (talk) 00:21, 18 April 2022 (UTC)

What happens: doi was removed, but the parameter was left behind
What should happen: Either doi stays if valid, or parameter removed with entry if not
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Submerged_floating_tunnel&diff=prev&oldid=1083231508
We can't proceed until: Feedback from maintainers

Wrong publication dates from Apple Music

Why does the bot add publication dates to citations of Apple Music album pages? The date it uses is not the date the album page was published on Apple Music but the album's release date, and the page may appear, for example, a month before the album is released. Apple Music doesn't state when an album page was added to the store, so no date should be added. Eurohunter (talk) 09:49, 14 April 2022 (UTC)

Incorrect changes should be cancelled. Eurohunter (talk) 19:26, 15 April 2022 (UTC)

And how do you suggest finding them? AManWithNoPlan (talk) 19:44, 15 April 2022 (UTC)

Caps: Journal of the International Association of Physicians in AIDS Care

Status: Fixed
Reported by: Headbomb {t · c · p · b} 20:11, 22 April 2022 (UTC)

What should happen: [3]

Caps: BioMedical Engineering OnLine

Status: Fixed
Reported by: Headbomb {t · c · p · b} 03:53, 23 April 2022 (UTC)

What should happen: [4]

Why is the bot processing batches of drafts?

In September 2021, the bot was reconfigured to exclude drafts from batch jobs, but still allow them as individual requests. See User talk:Citation bot/Archive_27#Please_exclude_draft_space_from_batch_jobs

This was because a) most drafts are never promoted to mainspace, so it's silly to waste bot time on them when there are plenty of actual mainspace articles needing the bot's attention; b) cleaning up refs doesn't help the assessments of drafts.

But now the bot is chomping its way through the whole of Category:AfC pending submissions by age/0 days ago, at the request of RoanokeVirginia.

How is this possible? BrownHairedGirl (talk) • (contribs) 09:42, 3 April 2022 (UTC)

Not sure. Will look at when I have time. AManWithNoPlan (talk) 00:02, 19 April 2022 (UTC)

@AManWithNoPlan: did you get a chance to look at this?

More batches of drafts are being processed today, again including the whole of Category:AfC pending submissions by age/0 days ago. BrownHairedGirl (talk) • (contribs) 12:37, 28 April 2022 (UTC)

Fixed AManWithNoPlan (talk) 13:11, 28 April 2022 (UTC)

Wrong publication dates from Apple Music

What about removing the bot's wrong edits? It added publication dates to citations of Apple Music album pages, but the date it uses is not the date the album page was published on Apple Music; it is the album's release date, and the page may appear, for example, a month before the album is released. Apple Music doesn't state when an album page was added to the store, so no date should be added. Eurohunter (talk) 14:24, 19 April 2022 (UTC)

Nothing will (or can) be done to remove those dates. Many tools/bots that add dates to wikipedia use those dates. Even if removed, other tools will add them back. The majority of these dates were added by human accounts and not the bot. Tracking them down and removing them is something the bot is not authorized to do. AManWithNoPlan (talk) 20:21, 19 April 2022 (UTC)

@AManWithNoPlan: Just check whether the URL contains "https://music.apple.com/" and remove the date. If the bot can find this address and add a date, then it can find this address and remove the date. Eurohunter (talk) 16:01, 20 April 2022 (UTC)

there is no way the bot could go more than an hour without being banned. AManWithNoPlan (talk) 19:56, 20 April 2022 (UTC)

@AManWithNoPlan: ? Eurohunter (talk) 20:09, 22 April 2022 (UTC)

The bot will get banned within the first couple hours of doing this. AManWithNoPlan (talk) 21:24, 22 April 2022 (UTC)

{{wontfix}} but I suggest opening a discussion on the CS1/2 help pages to get agreement. Then someone can have a bot run that fixes all of them. The bot would need to only fix ones that exactly matched the Apple dates, since anything else is clearly human added. Also, if the dates are generally not off by much, then simply truncating to the year might be advantageous. AManWithNoPlan (talk) 12:42, 30 April 2022 (UTC)

Change syntax of cite templates for future bot edits

Status: mostly {{fixed}}
Reported by: Jason Quinn (talk) 00:44, 23 April 2022 (UTC)

What happens: In template parameters, no space before pipes, spaces around equal signs

AND order of parameters should be considered

What should happen:

Space before pipes, no spaces around equals. This dramatically improves readability in a text editor, and is critically important when the editor is using text wrapping as in the case in web browser form editors. We want the editor to group the related text, not unrelated text. Also, the bot should put a lot more thought into the order of the cite template parameters inserted. In both CS1 and CS2 there's a roughly similar order for the display appearance of the reference. It is very helpful when the cite template roughly matches this. It is also helpful when the unreadable things like |url= come near the end instead of at the beginning. Also details like using |last= instead of |last1= when there's only one author are nice. So instead of

{{Cite web|url = https://www.example.com|title = Star Trek Picard is Bad|publisher = Non-shrill reviewer|last1 = Quinn|first1 = Jason|date = 23 April 2022}}

it is dramatically better as something like

{{cite web |last=Quinn |first=Jason |date=23 April 2022 |title=Star Trek Picard is Bad |publisher=Non-shrill reviewer |url=https://www.example.com}}

In isolation this may not be instantly obvious, but when an article has many references and you are using a wrapping text editor, the benefits of the latter formatting become striking for readability.

Relevant diffs/links: example of undesired spacing at Autocracy
We can't proceed until: Feedback from maintainers
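
As a rough illustration of the requested output style (one space before each pipe, no spaces around the equals sign, display-like parameter order), a new citation could be rendered along these lines. This is a hedged Python sketch, not the bot's code: render_template is a hypothetical helper, and PREFERRED_ORDER is an assumed ordering for the example, not an official CS1/CS2 parameter order.

```python
# Assumed ordering that roughly follows how the rendered citation reads;
# it is not an official CS1/CS2 order, and url deliberately comes late.
PREFERRED_ORDER = [
    "last", "first", "date", "title", "website", "publisher", "url", "access-date",
]


def render_template(name, params):
    """Render a cite template as '{{name |key=value |key=value}}'."""

    def sort_key(item):
        key = item[0]
        # Unknown parameters sort after the known ones, keeping their input order.
        if key in PREFERRED_ORDER:
            return PREFERRED_ORDER.index(key)
        return len(PREFERRED_ORDER)

    ordered = sorted(params.items(), key=sort_key)  # sorted() is stable
    parts = [name] + [f"|{key}={value}" for key, value in ordered]
    return "{{" + " ".join(parts) + "}}"
```

Fed the parameters from the example above in their original order (url first), this produces the second, reordered form shown in the report.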

For existing citations, this will run afoul of WP:COSMETICBOT and piss people off by the dozens. This should not be done by bot (again, for existing citations). Could be done for new citations though. Headbomb {t · c · p · b} 03:25, 23 April 2022 (UTC)

the number of people who act like this type of action is worse than <<insert heinous act>> is surprising. although, I think that is because people react to things they have control over (one death is a tragedy, one million is a statistic) AManWithNoPlan (talk) 11:20, 23 April 2022 (UTC)

@Headbomb:. I don't think I explained myself properly. I am not proposing to fix existing citations. I am proposing that the bot be changed so that NEW citations follow the given format. Please reopen under that idea. Jason Quinn (talk) 16:11, 23 April 2022 (UTC)

No way. The bot will be banned. We follow the existing format within each template. AManWithNoPlan (talk) 18:28, 23 April 2022 (UTC)

Umm, I'm perplexed. In the OP's diff, search for {cite and you will find three pre-existing cs1|2 templates that all have the spacing style described by Editor Jason Quinn. Looking at the whole article, there are 18 cs1|2 templates; all but one (the template created by the bot) have the spacing style described above. The bot may follow the existing format within each template when modifying a template, but clearly, the bot does not follow the existing format that predominates in the article. But, when creating a new template from a url as was the case here, it does not have to. Recently, the template data that controls the spacing used by that abomination that is visual editor, was changed to use the spacing style described by Editor Jason Quinn. As far as I know, there has been no pushback from that. New templates created with WP:RefToolbar also use that spacing:

{{cite web |last1=Quinn |first1=Jason |title=Star Trek Picard is Bad |url=http://www.example.com |website=Non-shrill reviewer |date=23 April 2022}}

Because both of the primary cs1|2 template creation tools create spaced templates, it seems to me that The bot will be banned for creating NEW citations [that] follow the given format is just not true.

—Trappist the monk (talk) 19:02, 23 April 2022 (UTC)

try out new code. AManWithNoPlan (talk) 21:38, 23 April 2022 (UTC)

@AManWithNoPlan and Headbomb:. As Trappist wrote, the Bot doesn't detect and follow the predominant style so I also don't follow that objection. Nor do I see how changing the bot to use the suggested formatting for its future edits runs afoul of WP:COSMETICBOT. But I do want bot editors and bot maintainers to be extremely meticulous in the quality of the bot's edits and to have carefully considered them. While there's no official syntax of cite template parameters, I do expect bot devs to care about it when coding the bot. I am suggesting, for instance, that not putting a space before the pipe hinders reading of the source in text editors, especially those that wrap text. The same goes for putting spaces around the equals sign. If there's a reasonable argument that is weightier than these objections I fail to guess what it is. In fact, as Trappist also mentions, cite template formatting has tended to evolve to the format I'm proposing. Why? Because it's simply better. A bot should want to take advantage of that. Jason Quinn (talk) 05:46, 24 April 2022 (UTC)

PS I have changed the section heading title from "Fix bad spacing choices" to "Change syntax of cite templates for future bot edits" because I think my title was too ambiguous and took the initial discussion in the wrong direction. I arrived here by clicking a "Report a bug" link for the bot. So I wrote the title as if it were a bug report suggesting a change to the current version of the software. Jason Quinn (talk) 05:52, 24 April 2022 (UTC)

Please check new default for new refs. AManWithNoPlan (talk) 20:53, 24 April 2022 (UTC)

@AManWithNoPlan: What do you mean by this? I had checked the github but didn't see any commits or any discussion about a new default or any edits using a new default. Jason Quinn (talk) 08:42, 27 April 2022 (UTC)

I mean try it out on some examples. While looking over the code change can be insightful, the actual actions of the bot don't always match what a quick look at the code would imply. AManWithNoPlan (talk) 11:20, 27 April 2022 (UTC)

This was brought up here before, and there was general agreement that the "tidy" spaces-before-pipes-no-spaces-elsewhere format is superior to the "crammed" no-spaces-anywhere format and the "roomy" spaces-everywhere format. Anything else is awful and marks the users who do it as careless or perhaps unhinged. But I think it might be better to bring this up at a larger forum, such as Help talk:Citation Style 1 or the WP:Village Pump. Abductive (reasoning) 03:07, 29 April 2022 (UTC)

Caps: Cutter IT Journal

Status: Fixed
Reported by: Headbomb {t · c · p · b} 04:36, 1 May 2022 (UTC)

What should happen: [5]

Converts an arxiv link to cite web instead of cite arxiv

Status: Fixed
Reported by: Headbomb {t · c · p · b} 06:36, 3 May 2022 (UTC)

What happens: [6]
What should happen: cite arxiv instead, i.e. [7]

Fails to expand cite arxiv with a v#

Status: Fixed - added support for v# to the regex
Reported by: Headbomb {t · c · p · b} 21:38, 2 May 2022 (UTC)

What should happen: [8]
We can't proceed until: Feedback from maintainers

I had to change <ref>{{arxiv|q-bio/0309009v1}}</ref> to <ref>{{arxiv|q-bio/0309009}}</ref> to make it expand. Headbomb {t · c · p · b} 21:38, 2 May 2022 (UTC)
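
The fix noted in the status line ("added support for v# to the regex") is not quoted in this thread. A minimal Python sketch of the general idea, accepting an optional version suffix and stripping it before lookup, might look like the following. The pattern and the name split_arxiv_id are assumptions for illustration, and the pattern only covers the common old-style (q-bio/0309009) and new-style (2204.01234) identifier shapes.

```python
import re

# Old-style IDs look like "q-bio/0309009" (optionally "math.AG/0309009");
# new-style IDs look like "2204.01234". Either may carry a "v<N>" suffix.
ARXIV_ID = re.compile(
    r"^(?P<id>[a-z-]+(?:\.[A-Z]{2})?/\d{7}|\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$"
)


def split_arxiv_id(raw):
    """Return (base_id, version or None), or None if raw is not an arXiv ID."""
    match = ARXIV_ID.match(raw.strip())
    if match is None:
        return None
    version = match.group("version")
    return (match.group("id"), int(version) if version else None)
```

Under this sketch, q-bio/0309009v1 and q-bio/0309009 resolve to the same base identifier, so the manual edit described above would not be needed.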

Fails to expand when No title found

Status: Fixed - will now expand if journal and (issue/volume) and year and pages are all found.
Reported by: Headbomb {t · c · p · b} 22:02, 2 May 2022 (UTC)

What should happen: Ideally, this. But if that can't be done, this.
We can't proceed until: Feedback from maintainers

by design. the bot has safeguards to avoid making things worse. it is an odds game: it misses some expansions in exchange for not adding junk. AManWithNoPlan (talk) 11:26, 3 May 2022 (UTC)

Surely those safeguards are overly aggressive here. How can things be worse than a bare doi:10.1023/A:1018861226606? It's one thing to not touch an existing CS1/2 template with a title that doesn't expand. But here there's literally nothing to make worse. Headbomb {t · c · p · b} 11:35, 3 May 2022 (UTC)
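
The relaxed safeguard described in the status line (expand when journal, issue or volume, year, and pages are all found, even without a title) can be sketched as a simple predicate. This is an illustrative Python sketch only: enough_to_expand and the metadata keys are hypothetical, and treating a found title as sufficient on its own is an assumption, not something stated in this thread.

```python
def enough_to_expand(meta):
    """Decide whether the metadata found for a bare DOI is worth expanding."""

    def has(key):
        # A field counts as found only if it is present and non-blank.
        return bool(str(meta.get(key, "")).strip())

    # Assumption: a title alone is enough, as in the usual case.
    if has("title"):
        return True
    # Rule from the status line: journal and (issue or volume) and year and pages.
    return (
        has("journal")
        and (has("issue") or has("volume"))
        and has("year")
        and has("pages")
    )
```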

Cite news to cite journal conversion

Status: Fixed
Reported by: Headbomb {t · c · p · b} 23:14, 13 April 2022 (UTC)

What should happen: [9]

incomplete expansion

Status: Won't fix
Reported by: Astro$01 (talk) 23:42, 2 May 2022 (UTC)

What happens: bot looks to website for authorship, e.g., "ABC News", and cites web
What should happen: should cite news for a news article and cite actual author & organization, e.g., "Felicia Fonseca", "Associated Press"
Relevant diffs/links: https://abcnews.go.com/US/wireStory/autopsy-teenage-girl-died-dog-attack-navajo-nation-78840441

A news story should cite the author and organization they work for, if present in the original web page.

Sorry, but the problem is with the Zotero host that Wikipedia provides, and it is outside the control of the bot. AManWithNoPlan (talk) 14:17, 3 May 2022 (UTC)

Better hdl handling / cleanup

Status: Fixed
Reported by: Headbomb {t · c · p · b} 09:55, 9 May 2022 (UTC)

What happens: [10]
What should happen: [11]

Cosmetic edit: Template capitalization

Status: Fixed
Reported by: Headbomb {t · c · p · b} 17:35, 9 May 2022 (UTC)

What should happen: Only if there are other non-cosmetic changes

Better issue/date declusterfuckering

Status: Won't fix, since difficulty to benefit ratio is too high
Reported by: Headbomb {t · c · p · b} 18:49, 5 May 2022 (UTC)

What should happen: [12]
We can't proceed until: Feedback from maintainers

It is a thing, and not an easy one to solve via automation at first glance. Unrelated observation: the last word of the section title cannot be used in Scrabble, as I think you made it up. 50.74.109.2 (talk) 01:21, 8 May 2022 (UTC)

Proper conversion of cite journal |doi=10.48550/arXiv.####.##### to proper cite arXiv |eprint=####.#####

Status: Fixed
Reported by: Headbomb {t · c · p · b} 02:11, 10 April 2022 (UTC)

What happens: [13]
What should happen: [14]

That is, when you have |journal=arxiv ..., TNT the template as {{cite arxiv |eprint=...}} and expand. What the eprint is can be determined from the DOI or the url. Headbomb {t · c · p · b} 02:11, 10 April 2022 (UTC)
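
Deriving the eprint from the DOI, as proposed above, can be sketched in a few lines of Python. This is an illustration rather than the bot's implementation: it assumes arXiv DOIs have the form 10.48550/arXiv.<id>, and eprint_from_doi and to_cite_arxiv are hypothetical names. The subsequent expansion step is not shown.

```python
import re

# Assumption: arXiv-issued DOIs are "10.48550/arXiv." followed by the arXiv ID.
ARXIV_DOI = re.compile(r"^10\.48550/arxiv\.(.+)$", re.IGNORECASE)


def eprint_from_doi(doi):
    """Return the arXiv ID embedded in an arXiv DOI, or None for any other DOI."""
    match = ARXIV_DOI.match(doi.strip())
    return match.group(1) if match else None


def to_cite_arxiv(doi):
    """Build a minimal cite arxiv template from an arXiv DOI, or None if not applicable."""
    eprint = eprint_from_doi(doi)
    if eprint is None:
        return None
    return "{{cite arxiv |eprint=" + eprint + "}}"
```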

Added {Cite xxx}s clash with common-use {cite xxx |param1=value1 |param2=value2}

Status: Fixed
Reported by: A876 (talk) 01:29, 10 May 2022 (UTC)

What happens: When it fixes bare references, it puts {{Cite web | url=httzzzzz | title=zzz }}.
What should happen: It should put {{cite web |url=httzzzzz |title=zzz}}.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Panos_Panay&diff=prev&oldid=1087050803

It is discouraging to see a bot rolling backwards like this. My objectionable gaffes get multiplied by only 1000 views. Objectionable gaffes by bots get multiplied by 1000 edits * 1000 views. (I am jealous. 😀)

{{Cite web}} is spelled {{cite web}} almost everywhere else.
Parameters for inline citation templates (and most templates) are documented as |param1=value1. (One space before |, none after. No spaces around =.)
Space before closing }} is pointless.

(When changing existing {{cite web}}, it appears to leave existing case and spacing as-is, except for spaces before and after parameter(s) that it creates.[15]) -A876 (talk) 01:29, 10 May 2022 (UTC)

|chapter= is not a valid parameter for cite web

Status: Fixed
Reported by: Trappist the monk (talk) 13:42, 7 May 2022 (UTC)

What happens: bot changed |title= in {{cite web}} to |chapter= and changed |url= to |chapter-url=
What should happen: in this case, the best possible action would have been to change {{cite web}} to {{cite grove}} as I did here; barring that, bot should not use |chapter= (or aliases thereof) in {{cite web}}, {{cite journal}}, {{cite magazine}}, {{cite news}}, {{cite periodical}}; this same restriction applies to adjunct parameters |chapter-url=, |script-chapter=, etc
Relevant diffs/links: diff

Often it is better to leave these for people to fix. They are rare, but I will look into specific cases that can be bot fixed. AManWithNoPlan (talk) 18:32, 9 May 2022 (UTC)
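
A rough sketch of the restriction requested in this report, assuming a citation has already been parsed into a template name and a dict of parameters. The template list comes from the report; the parameter set covers only the three names mentioned there (the "etc" is not filled in), and the function name is hypothetical rather than anything in the bot.

```python
# Periodical-style templates named in the report, which should not carry |chapter=.
NO_CHAPTER_TEMPLATES = {
    "cite web",
    "cite journal",
    "cite magazine",
    "cite news",
    "cite periodical",
}

# Only the parameters named in the report; other aliases are deliberately omitted.
CHAPTER_PARAMS = {"chapter", "chapter-url", "script-chapter"}


def disallowed_chapter_params(template_name, params):
    """Return the chapter-type parameters that should not appear in this template."""
    if template_name.strip().lower() not in NO_CHAPTER_TEMPLATES:
        return []
    return sorted(p for p in params if p.strip().lower() in CHAPTER_PARAMS)


print(disallowed_chapter_params("Cite web", {"chapter": "x", "chapter-url": "y", "url": "z"}))
```

A caller could refuse to add these parameters, or flag the citation for a human, when the returned list is not empty.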

Bad series= on conference

Status: Fixed
Reported by: —David Eppstein (talk) 06:45, 11 May 2022 (UTC)

What happens: The Association for Computing Machinery produces bad metadata for conference proceedings like the "Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '99)" with the acronym for the conference (including its year, capitalized incorrectly as "Soda '99") in the series parameter. That is not the name of a series. Citation bot has been importing this garbage wholesale into Wikipedia.
What should happen: Do not respect series= data from ACM. It is bad.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Anna_Lubiw&type=revision&diff=1087222456&oldid=957764836

In the same citation, the url= should have been contribution-url=. It would be nice if citation bot could have caught and fixed that error instead of adding the wrong series. —David Eppstein (talk) 06:56, 11 May 2022 (UTC)

https://github.com/ms609/citation-bot/commit/32a6e4904b4e16604ebe502f4ad2bd8e53de519f AManWithNoPlan (talk) 13:05, 11 May 2022 (UTC)

External Relations as author

Status: Fixed
Reported by: —David Eppstein (talk) 07:47, 11 May 2022 (UTC)

What happens: Adds last=Relations first=External
What should happen: not that
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Katherine_Heinrich&type=revision&diff=1087242291&oldid=1085074266

PS re this and the previous bug: For some reason a recent run is triggering a huge number of citation bot edits in my watchlist. So far only those two have been problematic, and the problems are relatively minor. All the rest of the edits look good to me. So, my thanks for making the bot so reliable and useful. —David Eppstein (talk) 07:48, 11 May 2022 (UTC)

https://github.com/ms609/citation-bot/commit/637bbe891398f37754e30e269072058794a03d86 AManWithNoPlan (talk) 13:01, 11 May 2022 (UTC)

title=404 Not Found

Status: Fixed
Reported by: BrownHairedGirl (talk) • (contribs) 10:58, 12 May 2022 (UTC)

What happens: bare URL filled with |title=404 Not Found
What should happen: Nothing. That is an error message, not a title, so the bot should either skip that ref or tag it with {{Dead link}}.

(However, Citation bot could usefully catch a large trout, and go give the zotero server a sustained trout-slapping for feeding such junk to the bot. The zotero server should itself reject such a title from a website.)
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Thornton_v_Shoe_Lane_Parking_Ltd&diff=1087423252&oldid=1082982942

Just when you think you have all the bad 404 forms covered. AManWithNoPlan (talk) 11:20, 12 May 2022 (UTC)

Sadly, webmasters are an ingenious bunch who show huge creativity in devising new ways to break the really simple HTTP 404 response, and thereby make avoidable extra work for you.

I just did a wiksearch for "insource:/\| *title *= *404 Not Found/i", which gave 135 hits. So this is evidently not a new issue, although obviously they may not all have been caused by Citation bot.

Some of them have since been archived, and those ones may be rescuable. A simple revert of all of them would therefore not be appropriate, so I will begin a selective cleanup. --BrownHairedGirl (talk) • (contribs) 11:30, 12 May 2022 (UTC)
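
For anyone wanting to run the same check offline, the insource search quoted above translates fairly directly to a Python regular expression. This is only a sketch: the function name is made up, and it covers just the one "404 Not Found" form from this report, not the bot's own list of bad titles.

```python
import re

# Python equivalent of insource:/\| *title *= *404 Not Found/i
TITLE_404 = re.compile(r"\| *title *= *404 Not Found", re.IGNORECASE)


def has_404_title(wikitext):
    """True if the wikitext contains a citation whose title is the 404 error text."""
    return TITLE_404.search(wikitext) is not None


sample = "{{cite web |url=http://example.com/gone |title=404 Not Found}}"
print(has_404_title(sample))
```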

Fails to properly TNT cite journal with journal = arxiv... / handle

Status: Fixed
Reported by: Headbomb {t · c · p · b} 01:25, 12 May 2022 (UTC)

What happens: [16]
What should happen: TNT + expand

Cite conference is book-title&title, not title&chapter

Status: Fixed
Reported by: Gumshoe2 (talk) 19:49, 11 May 2022 (UTC)

What happens: [17]

The metadata added in this edit reproduces information already present, but makes it so that the most relevant information (article title rather than book title) does not render, with the book title repeated twice. For example, it modified

Floater, Michael S.; Hormann, Kai (2005). "Surface parameterization: a tutorial and survey". In Dodgson, Neil A.; Floater, Michael S.; Sabin, Malcolm A. (eds.). Advances in multiresolution for geometric modelling. Papers from the workshop (MINGLE 2003) held in Cambridge, September 9–11, 2003. Mathematics and Visualization. Berlin: Springer. pp. 157–186.

to

Floater, Michael S.; Hormann, Kai (2005). "Advances in Multiresolution for Geometric Modelling". In Dodgson, Neil A.; Floater, Michael S.; Sabin, Malcolm A. (eds.). Advances in multiresolution for geometric modelling. Papers from the workshop (MINGLE 2003) held in Cambridge, September 9–11, 2003. Mathematics and Visualization. Berlin: Springer. pp. 157–186.

Gumshoe2 (talk) 19:49, 11 May 2022 (UTC)

The existence of |chapter= in {{cite conference}} should be an error that CS1/2 tracks. AManWithNoPlan (talk) 20:09, 11 May 2022 (UTC)

Trivial and undesirable changes to `|work=`

Status: Not a bug - as discussed
Reported by: — SMcCandlish ☏ ¢ 😼 10:33, 23 April 2022 (UTC)

What happens: Making trivial changes, with no other edits, in contravention of WP:COSMETICBOT policy.
What should happen: Should not make trivial changes except as part of a more substantive edit. Also, it should not be making this specific change at all, since it is not an improvement: There is no reason to use a long parameter alias when a short one will do, and using this particular short one, |work=, facilitates conversion between citation templates, e.g. when {{Cite web}} or {{Cite news}} would be more appropriate for the source in question.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Van_cat&type=revision&diff=1083853848&oldid=1083423045&diffmode=source
We can't proceed until: Feedback from maintainers

While a bug for it to be done alone, reverting that edit is nonsense. |journal= is clear, |work= isn't. Headbomb {t · c · p · b} 13:40, 23 April 2022 (UTC)

Definitely an unorthodox use of work in the cite journal template. have never seen that system. template clearly shows using journal parameter. seems like the bot made a good fix to me. — Chris Capoccia 💬 14:13, 23 April 2022 (UTC)

And SMcCandlish's revert was a breach of WP:COSMETICREVERT. BrownHairedGirl (talk) • (contribs) 20:48, 26 April 2022 (UTC)

Editors also convert from cs1 to cs2 so converting {{cite journal}} to {{citation}} when the journal title is held in |work= causes a loss of 'journal-style' |volume= and |issue= formatting:

{{cite journal |title=Title |work=Journal |volume=123 |issue=6}} – assuming that the source really is a scholarly or academic journal

"Title". Journal. 123 (6).

{{citation |title=Title |work=Journal |volume=123 |issue=6}} – now renders like a magazine or generic periodical

"Title", Journal, vol. 123, no. 6

No doubt, going the other way can cause similar mis-rendering. So, in general, the 'work' parameter should follow the template name: |journal= for {{cite journal}}, |magazine= for {{cite magazine}}, |periodical= for {{cite periodical}}, |website= for {{cite web}}, certainly |newspaper= for {{cite news}} but (until we invent something that is more semantically correct) |website= or |work= also for {{cite news}}.

—Trappist the monk (talk) 14:45, 23 April 2022 (UTC)
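
The rule of thumb above ("the 'work' parameter should follow the template name") can be sketched as a small lookup. This assumes citations are already parsed into a name and a dict of parameters; the table and function name are illustrative, not the bot's code. {{cite news}} is left out on purpose, since the comment above allows |newspaper=, |website= or |work= there.

```python
# Preferred periodical parameter per template, as listed in the comment above.
PREFERRED_WORK_PARAM = {
    "cite journal": "journal",
    "cite magazine": "magazine",
    "cite periodical": "periodical",
    "cite web": "website",
}


def rename_work(template_name, params):
    """Return a copy of params with |work= renamed to the template's preferred alias."""
    preferred = PREFERRED_WORK_PARAM.get(template_name.strip().lower())
    # Leave things alone for unknown templates, or if the alias is already in use.
    if preferred is None or "work" not in params or preferred in params:
        return dict(params)
    return {(preferred if key == "work" else key): value for key, value in params.items()}


print(rename_work("cite journal", {"title": "Title", "work": "Journal", "volume": "123", "issue": "6"}))
```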

title=404页面

Status: Fixed
Reported by: BrownHairedGirl (talk) • (contribs) 01:58, 13 May 2022 (UTC)

What happens: | title=404页面
What should happen: nothing. The link is dead
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Vantour_Mangoungou&diff=prev&oldid=1087519539

Caps: AORN J/AORN J.

Status: Fixed
Reported by: Headbomb {t · c · p · b} 06:28, 13 May 2022 (UTC)

What happens: [18]
What should happen: [19]

Untitled_new_bug

Status: Not a bug
Reported by: Epiphyllumlover (talk) 21:56, 15 May 2022 (UTC)

What happens: The bot keeps bolding the word "Roe" in references
What should happen: It should leave the existing <i></i> formatting in place so it doesn't bold things in references
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Roe_v._Wade&diff=1087952340&oldid=1087913623

Huh? The diff shows only the bot changing a curly quote to a straight one per MOS:CURLY. There is no bold visible there or nearby. —David Eppstein (talk) 22:13, 15 May 2022 (UTC)

Whatever this complaint is, it's got nothing to do with Citation bot. Headbomb {t · c · p · b} 22:18, 15 May 2022 (UTC)

This edit changed <i>Roe</i>'s to ''Roe'''s but that edit was not made by this bot. Fixed by changing to ''Roe''{{'s}}

—Trappist the monk (talk) 22:31, 15 May 2022 (UTC)

title=Sign up | LinkedIn

Status: Fixed
Reported by: BrownHairedGirl (talk) • (contribs) 14:59, 15 May 2022 (UTC)

What happens: bot fills a bare URL ref to a http://www.linkedin.com profile with the generic title | title=Sign up | LinkedIn
What should happen: Nothing. Better to leave the link bare
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Ron_Arnold&diff=prev&oldid=1087969140
Replication instructions: try any of the 697 pages returned in a search for insource:/\>https?:\/\/(www\.)?linkedin\.com[^ \<\>\{\}]+ *(\{\{bare *url *inline[^\}]*\}\} *)?\<\/ref/i
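
The search pattern in the replication instructions can also be tried locally. Below is a sketch of a Python translation, assuming case-insensitive matching as in the original `/i` flag; the sample refs and the function name are invented for illustration.

```python
import re

# Bare LinkedIn URL inside a <ref>, optionally followed by {{Bare URL inline}}.
BARE_LINKEDIN_REF = re.compile(
    r">https?://(www\.)?linkedin\.com[^ <>{}]+ *(\{\{bare *url *inline[^}]*\}\} *)?</ref",
    re.IGNORECASE,
)


def is_bare_linkedin_ref(wikitext):
    """True if the text contains a bare LinkedIn URL reference."""
    return BARE_LINKEDIN_REF.search(wikitext) is not None


print(is_bare_linkedin_ref("<ref>https://www.linkedin.com/in/example</ref>"))
```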

El País is a newspaper, not a person

Status: {{fixed}} once deployed, which might take a while
Reported by: BrownHairedGirl (talk) • (contribs) 21:38, 16 May 2022 (UTC)

What happens: BEFORE: {{Cite web|url=https://elpais.com/diario/1980/10/18/cultura/340671609_850215.html|website=[[El País]]|date=18 October 1980|title=El destino de una vida}}
AFTER: {{Cite news|url=https://elpais.com/diario/1980/10/18/cultura/340671609_850215.html|website=[[El País]]|date=18 October 1980|title=El destino de una vida|last1=País |first1=El }}
What should happen: nothing
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=D%C3%A1maso_Berenguer&diff=1088228152&oldid=1087820834
Replication instructions: This seems to be a quirk of how El País handles archive pages.
This article https://elpais.com/diario/1980/10/18/cultura/340671609_850215.html places the newspaper's name in the position where I would expect to find the name of the author, which is where the author's name appears in an article from today: https://elpais.com/internacional/2022-05-16/putin-afirma-que-la-entrada-de-finlandia-y-suecia-en-la-otan-no-supone-una-amenaza-inmediata-para-rusia.html
We can't proceed until: Feedback from maintainers
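
One possible guard against this kind of result, sketched under the assumption that the candidate author and the existing |website= value are both available as strings. It simply refuses an author whose full name equals the publication name. The helper names are hypothetical and this is not how the bot was actually fixed.

```python
import re


def strip_wikilink(value):
    """Turn [[Target]] or [[Target|Label]] into its displayed text."""
    match = re.fullmatch(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", value.strip())
    return match.group(1) if match else value.strip()


def author_is_publication(first, last, work):
    """True if 'first last' is just the name of the work/website."""
    author = " ".join((first + " " + last).split()).casefold()
    return author == strip_wikilink(work).casefold()


# Values taken from the AFTER citation in the report.
print(author_is_publication("El ", "País ", "[[El País]]"))
```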

CAPS: For.

Status: Fixed
Reported by: Headbomb {t · c · p · b} 14:30, 18 May 2022 (UTC)

What happens: [20]
What should happen: [21] (or rather it shouldn't have touched 'For.' to begin with).

Double edit

Status: Fixed
Reported by: Headbomb {t · c · p · b} 14:23, 18 May 2022 (UTC)

What happens: [22]+[23]

Seems to be due to first adding |doi-access=free, then realizing you don't need the URL anymore. Headbomb {t · c · p · b} 14:42, 18 May 2022 (UTC)

Lower case to capital

Status: Fixed
Reported by: Roelof Hendrickx (talk) 10:07, 18 May 2022 (UTC)

What happens: Changing journal=Spiegel der Historie. Maandblad voor de geschiedenis der Nederlanden to journal=Spiegel der Historie. Maandblad voor de Geschiedenis der Nederlanden.
What should happen: The lower case g in geschiedenis should not be changed into capital G
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=John_VII,_Count_of_Nassau-Siegen&type=revision&diff=1088467180&oldid=1087773875&diffmode=source

Apostrophe

Status: {{not a bug}}
Reported by: Roelof Hendrickx (talk) 10:17, 18 May 2022 (UTC)

What happens: Changing title=l’Allemagne Dynastique into title=l'Allemagne Dynastique
What should happen: It shouldn't be changed. An apostrophe is written as ’ not as '
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=John_VII,_Count_of_Nassau-Siegen&type=revision&diff=1088467180&oldid=1087773875&diffmode=source
We can't proceed until: Feedback from maintainers

On Wikipedia, they're ' not ’, see MOS:APOSTROPHE. Headbomb {t · c · p · b} 10:38, 18 May 2022 (UTC)

Then Wikipedia is telling me that 1x1=3. Graphic designers designed the apostrophe as ’ instead of '. And they had a very good reason for it. The former is far more readable than the latter. As a professional editor it is a violation of my professional code to use the reverse tear on modern keyboards. I dislike it when a bot is changing texts that are better. Roelof Hendrickx (talk) 12:15, 18 May 2022 (UTC)

If you have a problem with MOS:APOSTROPHE, take it to WT:MOS.

And as a 'professional editor', you should be well-aware of external requirements that go against your personal preferences. Headbomb {t · c · p · b} 14:26, 18 May 2022 (UTC)

As a professional editor for almost 34 years I have never been asked to go for consistency making it worse/less readable than it was. Besides that, I am not talking about personal preferences but about normal use of apostrophes in texts. Roelof Hendrickx (talk) 14:56, 18 May 2022 (UTC)

Accessibility and consistency are both pillars of wikipedia, and probably legal requirements in many areas. AManWithNoPlan (talk) 15:36, 18 May 2022 (UTC)

Changing correct url

Status: {{not a bug}}
Reported by: Roelof Hendrickx (talk) 10:19, 18 May 2022 (UTC)

What happens: Changing url=https://books.google.fr/books?id=uUITAAAAQAAJ&pg=PA683&dq=mariage+watteville+boba into url=https://books.google.com/books?id=uUITAAAAQAAJ&dq=mariage+watteville+boba&pg=PA683
What should happen: Nothing, the original link was and is correct and working
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=John_VII,_Count_of_Nassau-Siegen&type=revision&diff=1088467180&oldid=1087773875&diffmode=source
We can't proceed until: Feedback from maintainers

The second link is simpler and doesn't contain useless parameters. Not a bug. Headbomb {t · c · p · b} 10:38, 18 May 2022 (UTC)

The first link is correct so a bot shouldn't change it. So yes, it is a bug imho. Roelof Hendrickx (talk) 12:10, 18 May 2022 (UTC)

It's also de-localized (.fr to .com). Not a bug. Headbomb {t · c · p · b} 14:25, 18 May 2022 (UTC)

Why is it so difficult to understand the principle do not change what is not incorrect? Roelof Hendrickx (talk) 14:53, 18 May 2022 (UTC)

They are incorrect according to the Wikipedia style guides for Google Books links. AManWithNoPlan (talk) 15:11, 18 May 2022 (UTC)

"Not incorrect" doesn't mean "can't/must not be improved." I can write 'In MMXXII James bought XVI pears and CDLII apples' and that would be not incorrect. That doesn't mean it's what I should write. Headbomb {t · c · p · b} 15:30, 18 May 2022 (UTC)

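The de-localization mentioned in this thread (.fr to .com) can be sketched with the standard library. This example only rewrites the host and leaves the query string untouched. It does not try to reproduce the bot's actual handling or reordering of Google Books parameters, and the host pattern is an assumption about what country-specific domains look like.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of Google Books hosts: books.google.fr, books.google.co.uk, etc.
GOOGLE_BOOKS_HOST = re.compile(r"^books\.google\.[a-z]{2,3}(\.[a-z]{2})?$", re.IGNORECASE)


def delocalize_google_books(url):
    """Point a country-specific Google Books URL at books.google.com."""
    parts = urlsplit(url)
    if GOOGLE_BOOKS_HOST.match(parts.netloc):
        parts = parts._replace(netloc="books.google.com")
    return urlunsplit(parts)


print(delocalize_google_books(
    "https://books.google.fr/books?id=uUITAAAAQAAJ&pg=PA683&dq=mariage+watteville+boba"
))
```
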
Bot thinks El País is the name of an author

Status: Fixed
Reported by: Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 03:22, 19 May 2022 (UTC)

What happens: The bot mistakenly assumes that "El País" is the name of the article's author, rather than that of the website that published it, and inserts author-name fields to this end.
What should happen: Not that.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=D%C3%A1maso_Berenguer&diff=1088567386&oldid=1088228595

@Whoop whoop pull up: I can't fix the bot, but I'll work on fixing the existing articles with incorrect references. GoingBatty (talk) 03:52, 19 May 2022 (UTC)

The bot was rebooted. I suspect all running jobs died

AManWithNoPlan (talk) 13:37, 19 May 2022 (UTC)

Incorrect change of template type

Status: Fixed
Reported by: Invasive Spices (talk) 19 May 2022 (UTC)

What happens: Incorrect change {{cite proceedings}}⇨{{cite journal}}
Relevant diffs/links: This diff.

Slack in bot usage

It seems like demand for the bot has declined quite a bit, and there have been periods with no jobs running. There are often long stretches with only one job running. Perhaps the job size limit could be increased a bit, and see how it goes? Abductive (reasoning) 15:23, 13 May 2022 (UTC)

define("MAX_PAGES", 2850); now set. AManWithNoPlan (talk) 20:22, 13 May 2022 (UTC)

Part of the slack might be some significant speed-ups made to the code implemented recently. AManWithNoPlan (talk) 20:23, 13 May 2022 (UTC)

Isn't 2850 the current limit? Abductive (reasoning) 03:52, 14 May 2022 (UTC)

Added a thousand. Now 3850. AManWithNoPlan (talk) 11:43, 14 May 2022 (UTC)

Thanks! Abductive (reasoning) 19:01, 14 May 2022 (UTC)

Curious - why 3850, instead of a rounder number like 4000 or 4096? (Also, when'd the limit get raised from 2200 to 2850?) Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 19:56, 14 May 2022 (UTC)

So I tried a single-shot using the ☑ Citations button. It failed as usual. Quelle surprise. Not. If priority is to be given to these bottom-dragging trawls, please just stop wasting everybody else's time and withdraw the button. --John Maynard Friedman (talk) 20:13, 14 May 2022 (UTC)

Oftentimes the page gets processed eventually even if you see a 502 error. Also, your OAuth login to Meta expires in something like 24 hours, so if it has been more than a day since you tried to run the bot, a delay in Meta bringing up the login will look like a delay in the bot. These Meta delays are quite common. Abductive (reasoning) 00:01, 15 May 2022 (UTC)

That's activating through the 'expand citations' options. The ☑ Citations button times out. Headbomb {t · c · p · b} 02:15, 15 May 2022 (UTC)

Exactly. Like most users, I am not using a bot. --John Maynard Friedman (talk) 10:34, 15 May 2022 (UTC)

Put me down for a return to a ~1k page limit. Headbomb {t · c · p · b} 02:17, 15 May 2022 (UTC)

And me. 2850 was anti-social. 3850 looks like enemy action. Why are the very few (<5?) bot operators allowed to make this tool unusable for everyone else? Like I suspect most other 'single shot' users, I had given up on using the button because failure was the usual option. When any efficiency gain is immediately thrown to the wolves, that will persist. If the bot limit is to be increased rather than heavily reduced, let's stop pretending that there is a credible single shot option. --John Maynard Friedman (talk) 10:34, 15 May 2022 (UTC)

I agree with JMF.

If bot tasks are indeed making human submissions impossible, then some sort of queuing system needs to be fixed or placed to make sure the bot tasks go as planned while prioritizing human submissions to the bot. This should be of utmost importance as any other bug fixes are worthless if people can't even use the thing.

Even archive.today and ghostarchive.org, with a budget probably 1000x smaller than the WMF, have priority queues, so the human submitters get priority over what I presume are automated submitters. So it can be done. The question now is: will the WMF put in the resources to do it? Rlink2 (talk) 13:00, 15 May 2022 (UTC)
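
To make the queuing idea concrete, here is a toy priority queue in which single-page (interactive) requests are always served before pages from batch jobs, with first-come-first-served order inside each class. Everything here is hypothetical: the class, the two priority levels and the method names are illustrative, and nothing in this thread says Citation bot or the archive sites work this way.

```python
import heapq
import itertools

# Lower number = served first.
INTERACTIVE, BATCH = 0, 1


class JobQueue:
    """Toy queue: interactive requests jump ahead of batch pages."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per class

    def submit(self, page, kind):
        heapq.heappush(self._heap, (kind, next(self._order), page))

    def next_page(self):
        """Return the next page to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = JobQueue()
queue.submit("Category page A", BATCH)
queue.submit("Category page B", BATCH)
queue.submit("Single article", INTERACTIVE)
print(queue.next_page())
```

With this ordering a long batch cannot starve a single-page request, although a page that is already being processed would still have to finish first.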

It's not really why the citation button chokes. That can be due to a ton of reasons. The issue with runs bigger than 1000 pages is that certain users run the bot against more-or-less random categories and it hogs the bot for a really long time, at the expense of other more targeted runs. Headbomb {t · c · p · b} 13:34, 15 May 2022 (UTC)

I agree with Headbomb.

Last year, I reported several times on how Abductive was abusing Citation bot by running huge, untargeted runs on categories, which generated very low returns. Abductive repeatedly argued verbosely that having the bot trawl through vast sets of pages where it had almost nothing to do was somehow an appropriate use of the bot. Many other editors pointed out how that was a waste of a precious resource, because the bot could otherwise be processing jobs with much higher rates of return, but Abductive just parroted slogans along the lines of "my job is as valid as anyone else's", completely ignoring the low productivity of his jobs.

Mercifully, in the last few months, Abductive stopped doing these low-return speculative trawls. But in the last week or so, they have resumed, with the same issues: most pages in Abductive's batches are not edited, and most of those which are edited have only trivial changes such as removing unused parameters or changing the style of quotation marks.

This is not a technical problem. It is a human problem of one editor who has a severe WP:IDHT issue, and the remedy is simply to ban Abductive from using Citation bot. BrownHairedGirl (talk) • (contribs) 14:44, 15 May 2022 (UTC)

Oh, ok so it's not a technical issue, but rather an issue with one person who is supposedly abusing the bot. I now understand. Well on Wikipedia, consensus is required, so if Abductive is using the bot in a way that is unfair to other editors, and other editors tell him to stop doing it, then he should change or explain his behavior.

I do recall reading about this behavior in the talk page archives a while ago. User_talk:Citation_bot/Archive_29#Bot_still_being_abused_by_Abductive - not sure if it is relevant. Regardless, this is a serious issue that needs to be discussed in its entirety. I think the previous thread was closed too early. Rlink2 (talk) 15:01, 15 May 2022 (UTC)

@Rlink2: there were at least half-a-dozen threads about Abductive's abuse of Citation bot, some of which were very lengthy. The issue has already been discussed in its entirety, many times ... and Abductive's tediously self-serving explanations have been heard and rejected many times.

The problem is well-documented, so there is no point in rehashing those discussions again.

Abductive is:

running low-return speculative trawls of categories
flooding the job queue with so many single-page requests that the other requests don't get seen, leading to available slots going unused.

The only issue now is: will Abductive voluntarily desist from these long-running abuses of Citation bot, or do we have to ban Abductive from abusing the bot? BrownHairedGirl (talk) • (contribs) 15:16, 15 May 2022 (UTC)

The definition of Abductive reasoning is to

start with an observation or set of observations and then seeks the simplest and most likely conclusion from the observations.

The observations are the following:

Abductive continues to push through even though other editors tell them to stop. Per Wikipedia guidelines: Serious or repeated breaches or an unwillingness to accept feedback from the community (Wikipedia:I didn't hear that) may be grounds for sanction. If there are many threads regarding this, then it shows a repeated unwillingness to accept feedback, instead just pushing right through.
This comment from Abductive stood out to me: The fundamental problem is that we have all been running the bot so much that it has fixed most articles by now, so that category runs aren't going to have a high enough rate of return. So he acknowledged that the jobs were not useful, but yet when BrownHairedGirl asked why he is still wasting his time with the jobs, he basically had no response.

So the simplest conclusion is that either Abductive should stop the behavior or be banned from the bot.

Note that I am trying to stay neutral for now because I don't know the whole story. But it seems to be a "one vs many" situation, with Abductive being the "one". Rlink2 (talk) 15:33, 15 May 2022 (UTC)

@Rlink2: the whole story is in the talk page archives, if you have the time and energy to wade through it. It includes many occasions when I documented in detail how Abductive was abusing the bot, with links which make it verifiable.

One of Abductive's defences of this conduct is that they are using spare capacity in the bot, which would otherwise be wasted: better, says Abductive, to use it for low-return tasks than to have it unused.

That is superficially plausible, but only superficially. The flaw in this logic is that unless the bot has spare capacity at any given moment, it cannot respond promptly to a single-page request by an editor who is using the bot interactively to assist in their manual work on a page. It may mean that the bot does not ever process that single-page request, because it times out before a slot is available.

So ... trying to use all the bot's capacity is actually a fundamentally bad idea. Like spaces in a car park or beds in a hospital, 100% utilisation is a nightmare for users. Spare capacity is what makes the system usable.

It is this problem which @John Maynard Friedman (JMF) complains about, and rightly so. The bot is an invaluable tool for helping to build complex citations while writing an article. See Wikipedia:Wikipedia Signpost/2022-08-01/Tips and tricks for examples of how it can help.

This ability of the bot to take an isbn or a doi or a handle and build a full reference to a book or journal is a huge timesaver. I often use it myself when manually filling the refs on an article (as I did 3 times yesterday), and in my experience it can save 5-10 minutes for each reference to a scholarly journal. (The complexity of those refs needs a lot of careful checking, so 5-10 minutes each is a genuine number).

Abductive's approach of using-all-the-bot's capacity makes the bot unavailable for this sort of work, which actively sabotages the efforts of those who improve articles. I thoroughly understand why JMF is so angry, and I share that anger. BrownHairedGirl (talk) • (contribs) 16:17, 15 May 2022 (UTC)

All that argument has merit but let's not lose sight of the main issue. If bot runs are restricted to 1000 edits, no matter who is making them or how lucky they feel, at least requiring their operators to take manual intervention every ten minutes might (a) give the rest of us some oxygen and (b) give them pause to consider whether there is not something more useful to spend their (and our) time on. --John Maynard Friedman (talk) 16:02, 15 May 2022 (UTC)

@John Maynard Friedman: as above, I have huge sympathy for your frustration about the unavailability of the bot to assist in manual referencing.

However, I am exasperated by your continued refusal to distinguish between productive and unproductive batch jobs. Your desire to punish those using the bot for productive batches is a vindictive and disruptive case of WP:IDHT.

Lemme explain. For 10 months, I have used Citation bot to fill WP:Bare URLs. I use a variety of tools to build and maintain lists of bare URLs, which I feed to the bot in systematic ways designed to minimise re-processing of articles. I also tag bare URLs of types which the bot cannot fix (see e.g. the 38,000 articles which use {{Bare URL PDF}}). I also follow behind the bot to use other tools to fill refs which the bot fails to fix, and to identify websites where they can fill refs ... and I have developed a bunch of tools to identify and tag dead links: in February alone, my tools identified and tagged over 60,000 bare URL refs which were actually dead.

This work by me takes many hours per day of list-making, tagging and programming. I estimate that in the last ten months, I have put over 3,000 hours of work into this task.

I have put in that time because it is getting results. At the start of May 2021, there were ~470,000 en.wp articles with bare URL refs. Now there are about 140,000. That fall of ~330,000 masks the real progress, because new bare URLs are added at a rate of over 300 per day. So without this work, the tally would have grown by ~90,000, meaning that about 420,000 pages have been cleaned up.

Note that many of those pages contained multiple bare URLs, so the total number of bare URLs filled by this work is over a million.
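
The arithmetic behind those figures can be checked in a few lines. The counts and the daily rate are the approximate numbers quoted above; the 300-day span is an assumption standing in for "ten months", chosen because it reproduces the ~90,000 figure.

```python
start_count = 470_000    # articles with bare URL refs, start of May 2021
current_count = 140_000  # articles with bare URL refs now
new_per_day = 300        # approximate rate of newly added bare URL articles
days_elapsed = 300       # assumption: roughly ten months

net_fall = start_count - current_count       # visible drop in the tally
added_meanwhile = new_per_day * days_elapsed  # growth that was also absorbed
pages_cleaned = net_fall + added_meanwhile    # total pages actually cleaned up

print(net_fall, added_meanwhile, pages_cleaned)  # 330000 90000 420000
```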

I do not want any praise or thanks for this. But I am angry at your contemptuous desire to sabotage this work.

I try to keep Citation bot filling bare URLs 24 hours per day, and I do that by structuring my days so that I am available to start a new batch when one finishes. That means keeping my laptop open while I do other tasks, and it often means setting alarms so that I wake in the night to start a new batch.

Your call for me to manually intervene every ten minutes would allow me to run these big batches only if I never slept, or did anything which took me away from my desk for more than ten minutes. Since that is impossible, the effect of your proposal would be to allow me to set Citation bot to fix bare URLs for only about 4 or 5 hours per day. That would be possible only if I took on a big extra burden, and even so it would reduce the productivity of this task by about 80%. It's slow enough already, so I would simply stop this work.

If you genuinely think that me using the bot to fill about a million bare URLs is silly and that I should "find something more useful to spend my time on", then please say so directly, and we can discuss whether there is consensus for your view. Please note that both @Headbomb and @AManWithNoPlan also run a lot of targeted, productive batches, so your approach would also sabotage their work.

This is far from the first time when you have failed to acknowledge the distinction between productive and unproductive batch jobs, and have chosen instead to lash out indiscriminately, lumping those of us who target the bot efficiently into the same category as those who waste the bot's time. That distinction has been pointed out to you many times, and it is made clear in this thread, but yet again you choose to ignore it. Damn you. BrownHairedGirl (talk) • (contribs) 16:58, 15 May 2022 (UTC)

I intended no contempt nor, on re-reading, do I see any reasonable inference of such. Yes, I totally understand your frustration that the behaviour as you see it of one bot operator is bringing all operators into disrepute. Yes, I notice that your runs have generally brought improvement to pages I watch. But looked at from the perspective of the many who want to validate the citations in an article they have worked on, the fact that they cannot do so appears to be caused by a much smaller number of bot operators. It is impractical for me to make a value judgement between operators and I consciously choose not to try. It is not obvious to me why a limit of 1000 articles per run will bring your work to a halt, only that you will have to keep restarting it.

The other option of course is to formally request that Abductive's bot privileges be withdrawn but that will take time. John Maynard Friedman (talk) 17:49, 15 May 2022 (UTC)

Nonsense, @John Maynard Friedman. It is perfectly practical to make a distinction between bot operators, by analysing the number and value of the edits made in each batch. I set out that data in about half a dozen threads last year.

I understand your frustration. But you repeatedly choose to ignore the evidence of the cause of that problem, and instead you lash out indiscriminately, and advocate solutions which would sabotage most of the productive work of the bot. BrownHairedGirl (talk) • (contribs) 19:15, 15 May 2022 (UTC)

User:BrownHairedGirl running two jobs at once


I would like to know how it is possible that User:BrownHairedGirl has consistently been running two jobs at once for the past week or so? And then has the gall to complain about my past behavior? She needs to stop running two jobs at once, as it clogs the bot for other users—no matter that her jobs seem important to her, or make a lot of edits. Abductive (reasoning) 18:23, 15 May 2022 (UTC)

Sadly but unsurprisingly, Abductive continues to fail to distinguish between my high-return batch jobs which tackle the worst refs, and Abductive's own low-return speculative trawls which mostly make no edit at all to the listed pages.

If Abductive actually wants to unclog the bot, the remedy is for Abductive to stop running low-return speculative trawls. BrownHairedGirl (talk) • (contribs) 19:19, 15 May 2022 (UTC)

Well if Abductive's submissions aren't making any edits, and BHG's and others are making edits, then that's a clear-cut case IMO. It's going to be hard to convince anyone that we should stop a highly productive edit run for one that barely does anything. Rlink2 (talk) 19:47, 15 May 2022 (UTC)

I have just spent about 5 hours today doing what I have done every day this week: checking Citation bot's history to examine the diff of every edit where the editsummary says that a bare URL was filled, while the bot works its way through my regular lists of articles with bare URLs. (It is currently processing articles with bare URLs tagged in May 2022, as of 13 May)

In each case, I copy the domain name of the filled ref to a list in my text editor. Periodically I purge it to remove domains which were in previous lists. So far, this afternoon's list has 172 new domains. In a few hours, I will stop list-making, by which time the list will probably be over 200 domains.

With that list, I wrap each domain in a wiki-search regex. Then I take each line in turn, and use a wiki-text-search within AWB's listmaker to find articles with one or more bare URL refs to that domain.

When complete, I sort the list and remove duplicates.

That builds the list of articles which I feed to Citation bot.

By doing this, I create an article list which consists solely of articles where both the following conditions apply:

articles which currently have bare URLs (tagged or untagged), and
in every case at least one of those bare URLs is to a domain which should be fixable, 'cos Citation bot has filled a ref to that domain today.

That has produced very high returns, which is why I have been pursuing that approach this week. The result is that so far this month, the total number of articles with bare URLs has fallen by well over 10%. (The number of untagged bare URLs has fallen from ~116K to ~98K, the first time that the tally has been below 100K since I started this work with the tally at over 470K).

Abductive wants me to stop this productive work, in order to facilitate Abductive's low-return speculative trawls. My answer is a firm "no".

Obviously, if there is a consensus that the community would prefer the bot's resources to be used for yet more of Abductive's unproductive, no-effort-required clog-the-bot-with-categories speculative trawls, then I will abide by that decision. --BrownHairedGirl (talk) • (contribs) 20:50, 15 May 2022 (UTC)
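The list-building steps described in the comment above (collect the domains filled today, drop domains already covered by earlier lists, wrap each domain in a search regex, then sort and de-duplicate the articles found) can be sketched roughly as follows. This is an illustrative sketch only, not the poster's actual scripts: the function names and the exact `insource:` pattern are assumptions, and the searches themselves (done in AWB's listmaker in the original workflow) are not reproduced here.

```python
def new_domains(filled_today, seen_before):
    """Return today's filled domains that were not in any earlier list."""
    seen = {d.lower() for d in seen_before}
    return sorted({d.lower() for d in filled_today} - seen)


def search_pattern(domain):
    """Wrap a domain in a wiki-search regex.

    The pattern below is a hypothetical example of a bare-URL ref search;
    the regex actually used in the thread is not given.
    """
    escaped = domain.replace(".", r"\.")
    return rf"insource:/\<ref[^\>]*\> *https?:\/\/(www\.)?{escaped}/"


def merge_article_lists(lists):
    """Combine per-domain search results, sorted with duplicates removed."""
    return sorted({title for titles in lists for title in titles})


if __name__ == "__main__":
    domains = new_domains(["Example.com", "bbc.co.uk", "example.com"], ["bbc.co.uk"])
    for d in domains:
        print(search_pattern(d))
    print(merge_article_lists([["B", "A"], ["A", "C"]]))
```

Each pattern would be run as a separate search, and the merged result is the article list fed to the bot.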

It would be a different situation if the dispute was the type of edits being done (my batch is adding DOI information, but your batch is adding date information). However the dispute here is having more articles be edited vs less. I can't really come up with any plausible explanation to show how using the bot's capacity to focus on articles on which it will be less effective is a good idea.

When BHG makes a list, she puts a lot of time and effort into the list, and it shows. However, the lists Abductive seems to be submitting do not seem to be controlled for quality and are basically random, which results in a much lower efficiency rate. If Abductive wants to use the bot, then he should make sure his list meets a certain quality and results in articles actually being edited.

I see the consensus to be in BHG's favor. I think Abductive should stop using the bot until he's figured out how to make high-throughput lists. Rlink2 (talk) 21:17, 15 May 2022 (UTC)

The bot only has four channels. If User:BrownHairedGirl uses two of them, and two other users run batch jobs (note: all other users' batch and category jobs are as "inefficient" as mine) then nobody can run the bot on a single page. The bot is supposed to run through categories and do the tedious work of deciding which ones to make edits to—this is the bot functioning as intended, not some sort of misuse of the bot. There was a user who was using the bot on more than one job at a time; the bot was then reconfigured to not allow that. User:BrownHairedGirl has always called out other users when she discovered them running the bot on two jobs, even though they did it inadvertently. Now she has discovered a way to "game" the bot to run two jobs, and is doing it constantly. I barely use the bot anymore, and I always make sure that there are two channels open before running a batch. My use of the bot is not problematic, but running two huge jobs at the same time is. In fact, that was why I requested the size limit of the runs be increased: so that BHG could run larger jobs, and not be tempted to run two at once. Abductive (reasoning) 21:42, 15 May 2022 (UTC)

Abductive writes The bot is supposed to run through categories and do the tedious work of deciding which ones to make edits.

That is utter bull. The purpose of the bot is to improve references. It has the ability to skip pages where it can make no improvements, but its purpose is not to spend its time skipping pages. (It takes just as long to find that no changes can be made as it does to make the changes).

The only purpose of that bogus assertion is to try to justify Abductive's longstanding habit of wasting the bot's time on low-value trawls. Every bot benefits from targeting, and nearly all bots use some form of targeting. This one is no exception.

Abductive writes My use of the bot is not problematic. Wow! This is the tactic known as the Big Lie: assert a known falsehood, and try to make it stick. After 8 months of explanation to Abductive about how their batch jobs are unproductive, it is extraordinary that Abductive is still in denial ... and even more bizarre that he still expects someone to believe him.

The issue here is very very simple. Abductive wants to displace highly productive jobs to make way for unproductive jobs. I have no idea what on earth motivates that desire, but it is clearly not the objective of someone who is here to improve the encyclopedia. BrownHairedGirl (talk) • (contribs) 22:09, 15 May 2022 (UTC)

The issue is you are taking up 50% of the bot's capacity for days at a time. The bot was reconfigured to stop users doing that. It is especially bad for people who are editing a single article and can't get the bot to run. Abductive (reasoning) 23:28, 15 May 2022 (UTC)

BHG has been using Citation bot for longer than I can remember. If there was some sort of issue with her usage of the bot, it would have been raised earlier. Why is it only after Abductive picks up their usage of Citation bot that the complaints come rolling in? Rlink2 (talk) 23:57, 15 May 2022 (UTC)

This is a "where there's smoke there's fire" argument. But if you actually look through the bot contribs, you'll see that I am not abusing the bot, and she is. Abductive (reasoning) 00:09, 16 May 2022 (UTC)

So, in Abductive's alternate reality, it is disruptive for me to use the bot productively ... but not disruptive for Abductive to use it unproductively.

Bizarre. BrownHairedGirl (talk) • (contribs) 00:28, 16 May 2022 (UTC)

It has caused problems only when you decided to run some of your low-return speculative trawls at the same time.

The solution is simple: stop doing these low-return speculative trawls.

It is now nearly a year since you began expressing your persistent desire to displace productive uses of Citation bot, absurdly claiming that your abuse of the bot for unproductive purposes is "valid". Classic WP:NOTHERE stuff. BrownHairedGirl (talk) • (contribs) 23:57, 15 May 2022 (UTC)

What you call "speculative trawls" is an approved function of the bot. Any user who runs a category would be running a "speculative trawl". Abductive (reasoning) 00:10, 16 May 2022 (UTC)

Again, not true. And very simple.

As has been explained to Abductive many times over the past year, there are categories which concentrate problems which the bot can fix. Feeding those categories to the bot can be very productive ... but randomly-chosen content categories are not productive.

Note that Abductive is still wikilawyering (badly) in pursuit of his desire to displace highly-productive bot jobs, allowing Abductive to run unproductive tasks instead.

This desire to actively impede improvement of Wikipedia is classic WP:NOTHERE stuff. It's time for Abductive to explain what exactly motivates him to impede improvement of Wikipedia. Does he just like pressing the button on the bot? Does he actively want to stop more refs being improved? Is he somehow resentful that he is unable or unwilling to devise productive bot jobs?

Go on, Abductive. Please do tell why you are so keen to make the bot less productive. BrownHairedGirl (talk) • (contribs) 00:25, 16 May 2022 (UTC)

First off, this thread is about you using 2 of the 4 channels available to the bot. For those that don't know, the bot can only run four jobs at one time. Now, if these are singleton jobs, done by the hundreds (or thousands?) of users who have activated the gadget in their preferences, they should go fast. These uses of the bot do not appear on the bot's user contributions page. But the bot also allows any user to activate it on a category, a userpage, or on a manually entered list. There are a number of users who make a regular practice of activating the bot on such jobs (batches). Just looking over the past few days, there are about a dozen such users active. When there are three batches running, clicking the button while editing an article often times out. I know this because I use that button when creating an article all the time. When there are four batch jobs running, clicking the button always times out, leading to much frustration and complaints here on this talk page. When there are four batch jobs running for a couple of hours (or three?) the queue fills up and everybody gets a 503 error. Thus, it behooves us, the big users of the bot, to moderate our usage so as not to lock out the editors who are trying to create content from using the bot on the article they are working on. Now, if one user has discovered a way to trick the bot into running two jobs, and then runs jobs that take almost a day to complete, this greatly increases the chances of time-out failures and 503 errors. This is because all it takes is two more users to activate a batch while two jobs are already running. Abductive (reasoning) 03:38, 16 May 2022 (UTC)

I know all that. Most other users of CB know all that.

As already noted, the solution is for Abductive to not clog up the bot with speculative trawls. BrownHairedGirl (talk) • (contribs) 03:45, 16 May 2022 (UTC)

I use the bot one job at a time. Even if I stopped using the bot, if four jobs are run at the same time, nobody else can use the bot. This has already happened since you selfishly started running two jobs. It will happen again, and then what will you do? Tell the other users running batches that it's their fault? Abductive (reasoning) 06:41, 16 May 2022 (UTC)

First off, this thread is about you using 2 of the 4 channels available to the bot. Now you are trying to change the subject by completely brushing off anything BHG had to say about the quality of your lists. If there are 2 issues, we'll discuss the two issues at the same time. No need to ignore one of them.

What metric do you use to create your lists? What is your intent with the lists? BHG has shared her process, so it would be fair for you to share yours. I looked at the contrib list and I see that BHG is basing the jobs off her lists, while you are just basing it off random categories. So it would be interesting to see Abductive's rationale. Why those categories specifically? Rlink2 (talk) 14:56, 16 May 2022 (UTC)

How do I select them? First I look to see if the bot is overloaded, in which case I don't do a run. If there are zero jobs running I hastily choose a small category to give the bot something to do. I might select a category such as Garden Plants, an area in which I am an active editor. Or I might select a maintenance category, or something in Portal:Current events. If it is a large category, I check through a few histories to see if the bot has been run on members of that category recently.

All users of the bot should be treated equally. A user taking up 50% of the bot's capacity denies use of the bot to the entire rest of Wikipedia. Also, all other users who select categories have the same or lower rate of return as I do. I went through the last few days of the bot's contribs; here are the rates of return for categories—see if you can pick mine from the list: 6.7%, 10.2%, 12.2%, 13.1%, 16.3%, 17.3%, 22.7%, 28.2%, 30.6%, 31.1%, 43.8%, and 63.3% (hint; one of mine). Here are some of BrownHairedGirl recent rates: ~~14.0% (399/2841, this batch took over 25~~ 49 hours to run), 17.6%, 35.4%, and 47.0%. Abductive (reasoning) 17:09, 16 May 2022 (UTC)

A user taking up 50% of the bot's capacity denies use of the bot to the entire rest of Wikipedia. I don't know if BHG is using 2 slots or not. Maybe she is, maybe she isn't. But when she was using 2 slots no one was complaining. The complaining only started after you resumed your usage of the bot.

Assuming that your numbers were calculated with no error, the presumed low rate could be because maybe the bot is reaching diminishing returns at this point. BHG will continue to retarget and focus the lists she feeds to the bot.

The average return for categories (using your numbers) is 24%. The average return for BHG's batches is 33.3% (and that is just using 3 numbers, the real rate is probably higher). So clearly, BHG's approach of targeted lists is better than random category selection. Rlink2 (talk) 18:01, 16 May 2022 (UTC)
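The two averages quoted above can be checked with a short calculation. This is only a sketch of the arithmetic: the percentages are copied from the preceding comment (leaving out the struck-through 14.0% figure), the variable names are illustrative, and a simple unweighted mean is assumed, so batches of very different sizes count equally.

```python
from statistics import mean

# Category rates of return (%) as listed in the thread.
category_rates = [6.7, 10.2, 12.2, 13.1, 16.3, 17.3,
                  22.7, 28.2, 30.6, 31.1, 43.8, 63.3]

# Batch rates of return (%) attributed to BHG, excluding the struck-out figure.
bhg_rates = [17.6, 35.4, 47.0]

category_avg = round(mean(category_rates), 1)
bhg_avg = round(mean(bhg_rates), 1)

print(f"categories: {category_avg}%  BHG batches: {bhg_avg}%")
```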

You can tell that she is using two channels by looking at the bot's contribs. She started complaining when I returned to the talk page to request an expansion of the size limit. I have been consistently using the bot without interruption—again, check the contribs. The categories are dinged on their rate of return because the bot counts subcategories in the total but doesn't edit them. Also, running a small category is less likely to lock people out of using the bot than a large job, and certainly less than two large jobs. Currently the bot is configured to allow category jobs 1/4 the size of list jobs because categories have a lower rate of return. Are you saying that the category function is somehow illegitimate? Are you saying that the dozen or so people in the past few days who ran categories are breaking a rule? Should they be stopped? Abductive (reasoning) 18:33, 16 May 2022 (UTC)

It's long past time for Abductive to explain why they persist in clogging-up the bot with low-return tasks when there are higher-return tasks ready to roll. BrownHairedGirl (talk) • (contribs) 19:26, 16 May 2022 (UTC)

~~Why don't you explain your recent job of 2841 articles that only edited 399 of them?~~ Enough with the deflecting. Abductive (reasoning) 02:09, 17 May 2022 (UTC)

If you pick categories at random then of course you will have some times where throughput is really high. And there will be times where the throughput of the targeted lists is low. But we took the average of both, and BHG's approach works out better than the random approach by a significant amount. Rlink2 (talk) 02:52, 17 May 2022 (UTC)

My choices of how to use the bot are not random, but if randomness is bad, why don't you go to Wikipedia:Random page patrol and demand that they disband, and start a thread at the Village Pump demanding that the Random article link in the sidebar be removed? And answer the question; if my runs are above the average for category runs by other users, why don't you demand that they stop too? Abductive (reasoning) 05:57, 17 May 2022 (UTC)

@Abductive: I am not the one deflecting. You are engaged in a year-long process of trying to deflect attention from your abuse of the bot by vast numbers of speculative trawls.

Please identify your claim that I made a recent job of 2841 articles that only edited 399 of them.

I think that if the claim is true, I can guess which one that might be (and why) ... but if you are actually serious about having a meaningful dialogue rather than your usual deflection tactics, please stop the passive-aggressive games and provide the links which would allow me to verify your claim and identify the reason. BrownHairedGirl (talk) • (contribs) 03:29, 17 May 2022 (UTC)

Another way of interpreting these past events is that I have done nothing wrong or unusual, and that you have singled out me for unwarranted castigation for reasons unknown. You have started threads about me and other users accidentally running two jobs at once, and then when you do it and I complained, point to your alleged better use of the bot and refuse to stop. Moreover, the difference in bot output between my jobs and yours is underwhelming. It is especially strange that you keep pointing at me because as I have demonstrated, I do better than the average user when calling for runs of categories (and my occasional batch run) for a very simple reason; I know how to find articles that have more citations than the average article. ~~No doubt you know that the search you conducted on your recent batch that underperformed was flawed in some way.~~ I have no problem with that (but one could argue that in your haste and under pressure to create two lists to run the bot on, you are getting sloppy). The only problem is that you are running two jobs at once—even if they had a 100% edit rate instead of your usual rate (which I guess is around 50%) you are still hogging resources meant for everybody—when other people try to run the bot on two jobs, they get the message Run blocked by your existing big run. Abductive (reasoning) 05:57, 17 May 2022 (UTC)

Quit waffling. You made an allegation, so post the links to support it. BrownHairedGirl (talk) • (contribs) 06:43, 17 May 2022 (UTC)

Okay, if you look at the bot's contribs, right now you are running two jobs in violation of the bot's rules. One job has 1432 members and one has 3847. Abductive (reasoning) 07:33, 17 May 2022 (UTC)

Abductive's attrition strategy of deflect-and-counter-attack has reached a new low: a claim about my work which was unsupported by evidence, and after my 3 demands for the evidence, Abductive came to my talk to tell me to do my own research into his claim. I refused, and when he went to collect the evidence, he had to admit that it was bogus. See User talk:BrownHairedGirl#So_we_don't_get_distracted;_you_may_find_this_useful (permalink)

This time-wasting menace has had enough of my time (as well as enough of Citation bot's time), so I won't reply further in this thread.

It would be nice if Abductive made an unequivocal apology and struck out the paras of nonsense based on his bogus claim, but I don't expect it. --BrownHairedGirl (talk) • (contribs) 07:38, 17 May 2022 (UTC)

Yes, I screwed up the example of a job where I said User:BrownHairedGirl got a 14% rate of return, and I apologized for that. But all I had to do was look in the bot's contribs to find one where her rate of return was 10.9%. Interested readers can see it for themselves here: https://en.wikipedia.org/w/index.php?title=Special:Contributions/Citation_bot&offset=20220516025921&limit=500&target=Citation+bot. At 22:22, 15 May 2022, BHG starts a job with 304 articles. The bot made only 33 edits to that batch, which ended at 00:27, 16 May 2022. But running the bot on any list or category, even one with a low rate of return, is a perfectly legitimate use of the bot. No, the real problem is while that job was running, she was simultaneously running a job on 2061 articles, beginning at 14:46, 15 May 2022 and ending at 06:34, 16 May 2022. That batch saw 447 edits, or a rate of return of 21.7%. Running two jobs at the same time is an abuse of the bot. Abductive (reasoning) 19:15, 17 May 2022 (UTC)
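The rate-of-return figures in the comment above are simply edits divided by articles in the batch. A minimal sketch of that calculation follows, using the counts quoted in the comment; the function name is illustrative and is not part of any bot tooling.

```python
def rate_of_return(edits, articles):
    """Percentage of articles in a batch that the bot actually edited."""
    if articles <= 0:
        raise ValueError("batch must contain at least one article")
    return round(100 * edits / articles, 1)


# Counts quoted in the comment above.
small_batch = rate_of_return(33, 304)     # job started 22:22, 15 May 2022
large_batch = rate_of_return(447, 2061)   # job started 14:46, 15 May 2022

print(small_batch, large_batch)
```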

So: no proper apology, just another attempt to cherry-pick a batch to prove a point and try to deflect attention away from Abductive's own endless run of low-return speculative trawls.

However, Abductive fails to follow the evidence properly. Like all my main work with Citation bot over the last year, that job of 304 articles is fully documented, in this case at User:BrownHairedGirl/Articles with bare links. That particular list of 304 articles is at special:permalink/1088043457#Lists, which is the list titled "Lists Bare URLs tagged in May 2022, as of 13 May - part 3 of 3". That page:

sets out the selection criteria
lists all the articles which have been submitted to the bot in that batch

Note that in every case the articles have been chosen because every article on the list has a bot-fixable problem: a bare URL which should be filled. Note that to ensure that the bare URLs in these lists are actually fillable, I spent hundreds of hours tagging dead bare URLs with {{Dead links}} (including over 60,000 in February alone, using a suite of Perl scripts which I wrote for the purpose). For the same reason, I have also tagged bare URLs of types which the bot cannot fill: see the transclusions of {{Bare URL PDF}}, {{Bare URL image}}, {{Bare URL plain text}} and {{Bare URL spreadsheet}}. Bare URLs with those tags are ignored when making the list, adding a further level of targeting.

In the last year I have put over 3,000 hours of work into this project to tag and fill bare URLs, which is well documented. It has reduced the total number of articles with bare URLs from over 470,000 in May 2021 to ~140,000 in May 2022, despite articles with new bare URLs being added at a rate of over 300 per day. Anyone can roughly verify those numbers by running this (slow) search for untagged Bare URLs, and by looking at the page count of Category:All articles with bare URLs for citations (currently 39,425). Note that the total of those two is an overestimate, because they overlap.

Yet despite all that, I am being attacked here by a menace of an editor who has hogged the bot for much of the year, but has never to my knowledge documented any of their uses of the bot. Abductive has never explained the basis for choosing a single one of the thousands of categories which they have fed to the bot, nor for a single one of the tens of thousands of individual pages with which they repeatedly clog the bot's queue by piling up so many requests that other requests time out.

Citation bot is an invaluable tool, which is at its most effective when used in a targeted way. But despite Abductive having been asked many times by many editors to stop their speculative trawls and stop clogging the bot queue, they come here with the utter hypocrisy to accuse me of "abusing" the bot.

This is Wikipedia at its worst. The editor running a long-term project, thoroughly-documented, with proven success is being attacked by a long-term menace who puts precisely zero verifiable effort into selecting the batches with which they hog the bot. And they are free to waste hours of my time in this vindictive effort to smear my work, whilst making zero contribution to assist it.

Abductive complains about me running two batches at once. But Abductive is interested only in attack and in finding decontextualised nuggets in the hope that if they hurl enough muck some of it will stick ... and in the knowledge that whether or not it sticks, they can waste hours of my time in rebutting their smears. This is a classic attrition strategy, which sadly is usually tolerated on wiki, and worse still actually empowered by the community's enthusiasm for yelling "uncivil" at anyone who replies harshly to this sustained goading.

In this case, the reason for my running two simultaneous jobs is already explained above, in my post of 20:50, 15 May 2022.[24] The batch which Abductive points to here was a batch of tagged Bare URLs. Those batches have a much lower rate of return than my batches of untagged bare URLs, because many of those tagged bare URLs were tagged after failing to be filled by other tools. In other words this is the return on the most problematic set.

To try to get added benefit from that low-return set, I have taken in the last ten days to leveraging those low-return batches to feed other higher-return jobs. I have done that by checking each edit in which the bot has filled a bare URL, finding domains where the bot has filled a ref, and building secondary lists which consist solely of bare URLs to fixable domains. This is very time-consuming work, but it has been very effective, reducing the total number of articles with bare URLs by almost 20,000 in the last 15 days (a fall of over 10%), and reducing the number of bare URLs on many of the remaining pages.

These two tasks have to work in parallel: I can build the lists of fixable bare URLs only by analysing the previous batches.

But Abductive objects, because me doing this reduces the number of bot slots available for Abductive's low-return, undocumented, unexplained speculative trawls. And Abductive calls my productive work an abuse of the bot. YCMTSU.

Now, having wasted yet another hour of my time on rebutting Abductive's malicious nonsense, I need to go cook supper for my hungry partner. BrownHairedGirl (talk) • (contribs) 20:57, 17 May 2022 (UTC)

Honestly I think things will be easier if Abductive stopped using the bot until he releases a detailed plan about how he plans to use the bot and people agree to the plan. To his credit, I think he has explained a little bit of his methodology, but not in a way that everyone can understand.

As for randomness, I think Abductive is missing the point that Citation bot has limited capacity. People can read articles and click the "random article" button at the same time. People can do random page patrol and write articles at the same time. But citation bot only has limited capacity, so it must be used in a way that maximizes its capacity. BHG is doing her best to maximize the output of the bot. Rlink2 (talk) 22:19, 17 May 2022 (UTC)

It's a double standard that BHG can constantly run huge jobs (sometimes two at a time) that get lower rates of return than my runs, then tell me that my above average, infrequent, small jobs are somehow anybody's business. Especially if the bot is not overloaded, which I try my best not to do. Abductive (reasoning) 02:08, 18 May 2022 (UTC)

Yet again, Abductive ignores nearly all that has been written and cherrypicks an edge-case job of mine to misrepresent it as typical of the whole of my work ... and misrepresents Abductive's own jobs as more productive, when most of their edits are trivial, and zero evidence is offered of their claimed effectiveness.

This is Abductive's attrition strategy: throw out masses of unevidenced false assertions, knowing that it will take others ages to document their falseness ... and then Abductive will ignore the rebuttals anyway. BrownHairedGirl (talk) • (contribs) 03:11, 18 May 2022 (UTC)

I invite interested readers to look through the bot's contribs and see for themselves what is really going on. And I always respond to fellow editors. Abductive (reasoning) 03:34, 18 May 2022 (UTC)

They could usefully start by looking at my latest completed batch of articles with known-fixable bare URLs. This list of 2800 bot edits shows the whole of my latest batch of 1,850 articles with known-fixable bare URLs. 607 of the 1,850 pages were edited; 56 of those edits filled at least one bare URL (look for "Changed bare reference to CS1/2" in the editsummary).

As explained above, that highly-productive run was possible because I found the fixable bare URLs in the less-productive run of tagged bare URLs.

But I am sure that this evidence of very effective targeting of a long-standing backlog will not deter Abductive from more sniping. BrownHairedGirl (talk) • (contribs) 06:46, 18 May 2022 (UTC)

You know what they'll notice? That four jobs are running right now, including two of yours, and that means nobody else can use the bot. Abductive (reasoning) 07:00, 18 May 2022 (UTC)

No, that's not what they will notice because most editors don't get to see the engine room. But what they do see is nearly cosmetic edits like this one of yours and want to know why it makes it so important that their one-shot job timed out yet again. And how many null evaluations were run against how many pages to deliver that one stunning edit. --John Maynard Friedman (talk) 07:17, 18 May 2022 (UTC)

Above, on this talk page, you complained that you got timed out. Now you are complaining again that your one-shot timed out. But here's the thing: because there are four batches running (none of them are mine) everyone else who tries to use the bot right now will get the time-out failure. Who ran the fourth job that just locked everybody else out? The user who was already running a job. Why did the bot do a small edit on swastika? I dunno, I ran the bot on it because it was on the Front Page, along with all the other articles that were on the Front Page at that moment. Abductive (reasoning) 07:48, 18 May 2022 (UTC)

My bad, it seems the dam has broken and the bot is now running requests I made 7 hours ago. Abductive (reasoning) 07:57, 18 May 2022 (UTC)

Indeed, @John Maynard Friedman, a high proportion of the bot edits triggered by Abductive are purely cosmetic. Hyphen-to-dash in that case, and lots of edits which just change a curly quote mark to a straight one, or remove redundant parameters or change the template type. The bot should not even be making those changes as standalone edits: such trivial tweaks should be done only as part of a more major edit.

Meanwhile my run of 1850 articles includes edits like this one[25], which filled 34 bare URL refs. It didn't happen by accident; it was part of my targeting of bare link Youtube refs, the tally of which has in the last few days been brought down from over 1,000 to under 100.

All those null evaluations in Abductive's speculative trawls take the same amount of bot time as evaluations which actually find a needed fix. Some bots can evaluate with trivial effort, and do the heavy lifting only if a problem is found; but with Citation bot every page gets the full check ... which is why abusing it for speculative trawls is so wasteful. BrownHairedGirl (talk) • (contribs) 07:39, 18 May 2022 (UTC)

You just ran some huge jobs that got 10.9% and 21.7% edits. Talk about wasted bot time. Abductive (reasoning) 07:48, 18 May 2022 (UTC)

As usual, Abductive makes no attempt to identify the jobs, so that their claims can be verified.

Given Abductive's bogus claim above, it would be very foolish to trust their unevidenced assertions.

And of course, Abductive continues to ignore the point that I have explained several times in this thread: that my runs of tagged URLs are returning low edit rates ... but they provide the data for much-higher return runs of known-fixable URLs.

This is part of a consistent pattern that I have seen in the past year in discussions which involve Abductive. Even when evidence is provided to show that Abductive's claims are false or misleading, Abductive ploughs on with the same bogus assertions.

That repetition of claims which are known to be false or misleading is a smear tactic which should have no part in collegial discussion. BrownHairedGirl (talk) • (contribs) 01:24, 19 May 2022 (UTC)

Dash changes are not cosmetic nor are curly quote changes. It may be worthwhile to make these as minor edits, but they are not cosmetic. Izno (talk) 19:40, 18 May 2022 (UTC)

I said "almost cosmetic". The kind of change you might do in passing in the course of a significant edit. Yes, if there were no problems with capacity, it would be unremarkable. But while there are and the effect is to get in the way of more productive edits, then it is certainly not worthwhile to make these trivial edits. --John Maynard Friedman (talk) 20:02, 18 May 2022 (UTC)

While we can look at Special:Contributions/Citation bot to find the edits the bot has made, does the bot actually log anywhere edits made versus the batch size it was requested to search against? I've seen on this page and on BrownHairedGirl's talk page, Abductive pointing out what appear to be unsupported percentages relating to job success rates. Just a few hours ago Abductive said "You [BHG] just ran some huge jobs that got 10.9% and 21.7% edits." How has this success/fail rate been calculated? It seems to me as though at least some of this conversation would be easier to agree/disagree with if we had verifiable numbers.

That said, while I can sympathise with Abductive's point to a degree, I also agree that BHG's use of the bot is very productive and certainly more so at a glance than Abductive's. If I check the most recent 50,000 edits by Citation bot, only 146 of those are by Abductive. What was the batch size that resulted in those 146 edits? And is that actually helping to address issues like link rot, bare or partial citations, or is it primarily cosmetic per the example given by John Maynard Friedman?

I definitely agree though, regardless of the answers above, that targeted use of the bot, as BHG is doing, is a far more effective use of its resources than running it against random categories. At the very least, running it against maintenance categories like Category:Articles with bare URLs for citations is a much better idea than running it against (picked at random) Category:Military history Sideswipe9th (talk) 16:15, 18 May 2022 (UTC)

The batch sizes are given behind the slash. So right now a job is running that happens to have 639 members. Then I typically expand the number of contribs to 500 or more (for example 3500) and use 'Ctrl F' to have it count the number of instances that /639 appears (once the job is done, of course). Dividing gives a rate. Abductive (reasoning) 16:51, 18 May 2022 (UTC)
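The counting method described above can be sketched in a few lines of Python. This is only an illustration, not something the bot provides: the function name `batch_edit_rate` and the sample summaries are hypothetical, and it assumes that each edit from a batch carries a marker ending in a slash followed by the batch size (for example `/639`), as described in the comment above.

```python
import re

def batch_edit_rate(summaries, batch_size):
    """Fraction of pages in a batch that resulted in an edit.

    Counts edit summaries containing '/<batch_size>' (the same thing a
    Ctrl+F search for '/639' would count) and divides by the batch size.
    """
    marker = re.compile(r"/%d\b" % batch_size)
    edits = sum(1 for summary in summaries if marker.search(summary))
    return edits / batch_size

# Hypothetical edit summaries copied from a contributions listing.
sample = [
    "Add: s2cid. | Suggested by Example | #UCB_Category 12/639",
    "Alter: title. | Suggested by Example | #UCB_Category 40/639",
    "Add: doi. | Suggested by Example | #UCB_toolbar",
]

print(f"{batch_edit_rate(sample, 639):.1%}")
```

Note that two different batches with the same size would be counted together, so the figure is only as reliable as the assumption that the batch size is unique within the listing examined.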

So does that mean, to pick one of your edits as an example ("Add: s2cid. | Use this bot. Report bugs. | Suggested by Abductive | #UCB_toolbar"), that this edit was not requested as part of a batch, but as a single page request?

If so, then how are you determining your success rate? And how can we verify that? While there were 146 edits at the time I made my reply, 157 now, all of them have UCB_toolbar and no number at the end of the edit log. If the contributions page for the bot only shows successful edits, how can we determine how many unsuccessful single page requests you have made? Sideswipe9th (talk) 17:00, 18 May 2022 (UTC)

With those I usually just do all the articles that appear on the Front Page every day, and the Recent Deaths, but I skip ones that I have done before, like World War II, so it would be difficult for an outside observer to see what the success rate was. But if an article has very few refs, it goes very fast, and if it has a lot of refs, it usually finds an error. I estimate around 40% for the ones I click. My success rate for categories is higher than that of the average user who selects a category, because I pre-screen those by looking at a sample of histories to see if it or a related category has been run on them lately. I also look for topics that have more citations, as they will have a higher success rate. But I must defend the rights of all users to run whatever they wish. For example, Category:Theoretical computer science stubs would have a poor success rate, but also be over very quickly since there aren't very many citations to check. Abductive (reasoning) 17:16, 18 May 2022 (UTC)

"But I must defend the rights of all users to run whatever they wish." If someone else is having an issue using the bot, they can speak up. If you are unable to use the bot, say so. Maybe it's working for everyone except you. It is best to speak for yourself. Rlink2 (talk) 19:32, 18 May 2022 (UTC)

Well it is time to challenge the "right of all users to run whatever they wish", when their justification is so thin that the effect of what they are doing is WP:DISRUPTIVE and one has to ask whether WP:NOTHERE applies. See also Tragedy of the commons --John Maynard Friedman 20:02, 18 May 2022 (UTC)

I agree with @John Maynard Friedman, and I find John's pointer to Tragedy of the commons very timely.

We have in Citation bot a very powerful tool, which can do great work to improve Wikipedia's compliance with the core policy of WP:Verifiability.

However, the bot has limited capacity. So we have a choice of approaches:

Strive for efficiency: We try to regulate use of the bot to improve its efficiency, at least by eliminating the least efficient uses
Free for all. We say "sod efficiency", and just leave editors to use the bot however their whimsies take them.

I am firmly in the "strive for efficiency" camp, and I agree with John that there are WP:NOTHERE issues with the other approach. Whatever You Want is great music, but a terrible way to allocate a scarce resource. BrownHairedGirl (talk) • (contribs) 20:47, 18 May 2022 (UTC)

If User:BrownHairedGirl gets her way and prevents me from using the bot, there will be three outcomes:

She will go after other users, inventing rationales as she goes along. Remember that the other users of the bot are less "efficient" than I am.
The bot will continue to be clogged, frustrating those even more casual users that have little idea about how the bot works.
She will give people who complain no recourse, using the rhetorical methods on full display here to attack them for daring to complain. Abductive (reasoning) 04:35, 19 May 2022 (UTC)

More nonsense. Abductive makes unevidenced claims about my future actions, which is a form of personal attack. Abductive would be well-advised to withdraw that post.

For the record, all the other users of the bot with whom I have discussed prioritisation have been reasonable people. They have all been willing to consider the issues without making the counter-attacks and false denials used by Abductive, and the issues have been resolved amicably.

As to my rhetorical methods, I will continue to document and explain my use of the bot and I will continue to object to unevidenced claims about my actions. It is very revealing that Abductive finds this so threatening.

I suggest that Abductive goes to WP:ANI to lodge a complaint about the evil BHG documenting her own work and objecting to unevidenced allegations against her. Then we can see if I get burnt as a witch. BrownHairedGirl (talk) • (contribs) 04:59, 19 May 2022 (UTC)

Document your own actions of using 50% of the bot's channels, and (sometimes) running extremely low "efficiency" jobs, while trying to make that the metric that other users are judged by. Abductive (reasoning) 06:04, 19 May 2022 (UTC)

Again, Abductive blatantly cherrypicks facts to try to create a case.

For the umpteenth time, Abductive: the lower-return jobs are being used as the data collectors for a bigger set of highly-productive jobs.

Abductive's determination to misrepresent that situation has been repeated and corrected so many times that it is no longer excusable as error. It is clearly a pattern of wilful dishonesty by Abductive, designed to mislead other editors by knowingly painting a false picture of my work.

This deliberate repetition by Abductive of known falsehoods is the smear tactics of a propagandist. It should have no part in a collegial discussion. BrownHairedGirl (talk) • (contribs) 06:42, 19 May 2022 (UTC)

If we can't take a third option of adding another two or more channels to the bot, to increase the resources available to it, perhaps even "protected" in some way to only allow the types of use that Abductive is advocating for, then I would be in favour of striving for efficient use of resources over allowing a free-for-all. Sideswipe9th (talk) 21:42, 18 May 2022 (UTC)

Abductive is advocating for the free for all, not efficiency. Surely you mean BHG here. Headbomb {t · c · p · b} 21:51, 18 May 2022 (UTC)

Perhaps I'm mistaken, but the core of Abductive's complaint seems to be that his requests to CitationBot keep timing out because the bot is busy? The third option would be to add one or two extra channels to the bot, with that reserved solely for the purpose of immediate or near-immediate use on a single article. The sort of batches being queued up by BHG and others would not be allowed on this specific "single use" channel. Then the other channels could continue to be used as they are presently. Or perhaps I wasn't clear when I said that. Sideswipe9th (talk) 22:09, 18 May 2022 (UTC)

@Sideswipe9th: those ideas are attractive, but ...

The idea of more capacity for Citation bot has been discussed to death. Lots of support for it, and the bot's wonderful maintainer @AManWithNoPlan would like to do it, but doesn't know how ... and his requests for technical assistance have brought no response. So, not likely to happen.
The bot relies heavily on the zotero servers, which have limited capacity. Already, when the bot is very busy, the zoteros fail to return info on pages which they could process at other times. So there is a risk that more CB channels might just result in a lot more false skips, with little extra editing done.
Citation bot is such a wonderfully helpful tool that I think demand is always likely to tend to exceed available capacity. So I expect that frivolous batches will always be something to avoid. BrownHairedGirl (talk) • (contribs) 05:12, 19 May 2022 (UTC)

Just a small technical point in reply to @Sideswipe9th's thoughtful comment.

"running it against maintenance categories like Category:Articles with bare URLs for citations" is of course right in spirit, but not quite complete in technical detail.

I know that Swipe's comment was intended as a suggested approach rather than a how-to, but I just thought I'd note the issues in case anyone is inclined to literally try that well-intentioned example.

That particular category is a container. For all articles with WP:Bare URLs, see Category:All articles with bare URLs for citations, or see the monthly subcats such as the current Category:Articles with bare URLs for citations from May 2022.

But ... beware.

Those categories include articles identified as having bare URLs which are not bot-fixable: Citation bot cannot get a title for a PDF file or an image or a spreadsheet. Most of those unfixable file types are tagged with specific templates such as {{Bare URL PDF}}, {{Bare URL image}}, {{Bare URL plain text}}, and {{Bare URL spreadsheet}}. Asking the bot to fill those refs will just waste the bot's time ... and about 75% of all the currently-tagged bare URLs are one of those bot-unfixable types.

So the method I developed is to select only articles tagged with {{Bare URL inline}} and/or the banner template {{Cleanup bare URLs}}. That means that the bot will not waste its time on articles where all the bare URLs have been identified as PDFs. For an example, see my current batch Bare URLs tagged before May 2022, as of 16 May - part 2 of 3, which was built using this Petscan search and then refined by an AWB pre-parse.

I doubt that anyone else will try to replicate that particular task. I am just using it to illustrate how a lot of care is needed in identifying which maintenance categories actually collect bot-fixable issues, and hence are a suitable basis for a CB batch job. BrownHairedGirl (talk) • (contribs) 21:30, 18 May 2022 (UTC)
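The selection rule described in this comment (keep only pages tagged with {{Bare URL inline}} or {{Cleanup bare URLs}}, and skip pages whose only tags are for unfixable file types) could be approximated locally along the following lines. This is a hedged sketch, not the Petscan/AWB workflow actually used: the function name `has_bot_fixable_tag` is hypothetical, and the regular expression is a simplification that ignores template redirects and nested markup.

```python
import re

# Templates named in the comment above as marking bot-fixable bare URLs.
FIXABLE_TEMPLATES = ("Bare URL inline", "Cleanup bare URLs")

# Match '{{Name}}' or '{{Name|params}}'. Matching is fully case-insensitive,
# which is looser than MediaWiki (only the first letter is case-insensitive).
_FIXABLE_RE = re.compile(
    r"\{\{\s*(?:%s)\s*(?:\||\}\})" % "|".join(re.escape(t) for t in FIXABLE_TEMPLATES),
    re.IGNORECASE,
)

def has_bot_fixable_tag(wikitext):
    """True if the page text carries at least one tag the bot can act on."""
    return _FIXABLE_RE.search(wikitext) is not None

# Hypothetical page texts.
pages = {
    "Page A": "<ref>https://example.org/story {{Bare URL inline|date=May 2022}}</ref>",
    "Page B": "<ref>https://example.org/file.pdf {{Bare URL PDF|date=May 2022}}</ref>",
    "Page C": "{{Cleanup bare URLs|date=April 2022}}\nArticle text.",
}

batch = [title for title, text in pages.items() if has_bot_fixable_tag(text)]
print(batch)
```

A page tagged only with {{Bare URL PDF}} is left out of the batch, which matches the aim of not spending bot time on refs it cannot fill.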

I appreciate this clarification! The example categories I used were for illustrative examples towards a generalised approach, and not a specific how to as you've said. Thanks :) Sideswipe9th (talk) 21:38, 18 May 2022 (UTC)

Sideswipe, this is mainly in reply to your earlier comment. You may not realise that there are two categories of users of the bot: batch-runners exemplified by BHG and Abductive, and single-shot requestors – the 'ordinary' users like me. Batch runs don't time out: Abductive has never made that complaint. But they do hold one of the four 'channels' for as long as it takes to run the batch. I don't know what the queueing mechanism is for submitting a batch run. Single-shot requests are submitted in real time: if a channel is not available and does not become available within about five minutes, the request just fails (the underlying reason is time-out). It is we single-shot users who complain about time-outs. It still appears to be the case that there is no facility to prioritize or even reserve a channel for real-time usage. Most single-shot users have simply given up on the bot – but get very annoyed to see a trivial batch edit on a watched article containing an edit note inviting even more editors to try to use it in single-shot mode.

In my view, [while capacity is limited] every batch runner should be required to provide a justification for their jobs. It is clear that peer pressure is not going to work on those who believe in their right to run whatever they like and it is nobody's business to object. If that means a Star Chamber, then so be it. --John Maynard Friedman (talk) 07:45, 19 May 2022 (UTC)

@John Maynard Friedman: that idea of a requirement to provide a justification for each job is a good idea. I have thought about it myself for a long time, and have held myself to that standard from when I first started using the bot. I think that I am the only CB user to have done so.

It would be helpful to log with each batch request answers to 3 questions:

By what criteria were these articles selected?
Why were those criteria chosen?
Why do you believe that this batch will make a significant proportion of non-trivial edits?

But that would require a whole new queueing system akin to that used on InternetArchiveBot, which would be a non-trivial task to create.

So in the meantime the only options open to us are non-technical: to have a simple guideline urging restraint, and to ban editors who use the bot disruptively. Mercifully, there is only one such editor. BrownHairedGirl (talk) • (contribs) 08:19, 19 May 2022 (UTC)

Rather than approval for each job, I had in mind approval for a programme of work, which might need a series of jobs. Consider it like a driving license, a privilege which could be restricted to single jobs or even be suspended completely like a topic ban. You do realise of course that approval to run multiple concurrent jobs would require a higher standard of justification, like a bus-driver license.

So how can it be made to happen? --John Maynard Friedman (talk) 16:14, 19 May 2022 (UTC)

People make this more complicated than it actually is. The solution is simple: If Abductive is using the bot in a way that goes against consensus, Abductive should change his usage of the bot to suit the consensus reflected in the discussion. It's simple, at least I think it is.

If multiple people were causing drama, then a system like this would make sense. Rlink2 (talk) 21:27, 19 May 2022 (UTC)

That's also my reading of the situation.

And Rlink2's suggested remedy is similar to existing practice in other areas of en.wp. For example, we don't protect an article because one editor is being a menace; we topicban or block the rogue editor. BrownHairedGirl (talk) • (contribs) 08:16, 20 May 2022 (UTC)

URLencoded bare ref not filled

Status: Fixed - I will need to hardcode this host by host to avoid disaster
Reported by: BrownHairedGirl (talk) • (contribs) 06:32, 19 May 2022 (UTC)

What happens: bot fails to fill 7 bare URL refs to Youtube where the link has been url-encoded. When I decoded the URLs, the bot filled all the refs.
(I have not tested whether this applies to other websites).
What should happen: the citation should be filled even if it is encoded.
Relevant diffs/links: special:diff/1088629373
Replication instructions: testcases at User:BrownHairedGirl/sandbox199, where the encoded and unencoded versions are listed side-by-side. Feel free to experiment there.
We can't proceed until: Feedback from maintainers
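As an illustration of the behaviour requested in this report, the following Python sketch percent-decodes a bare URL before lookup, but only for an allowlist of hosts, in the spirit of the "host by host" note in the status line. It is not the bot's actual code (the bot is not written in Python); the names `DECODE_HOSTS` and `decode_bare_url` and the choice of hosts are assumptions made for the example.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical allowlist: hosts for which decoding is assumed to be safe.
DECODE_HOSTS = {"www.youtube.com", "youtube.com", "youtu.be"}

def decode_bare_url(url):
    """Return the percent-decoded URL if its host is on the allowlist.

    URLs on any other host are returned unchanged, because an encoded
    character may be a meaningful part of the address there.
    """
    decoded = unquote(url)
    host = urlsplit(decoded).hostname
    if host in DECODE_HOSTS and decoded != url:
        return decoded
    return url

print(decode_bare_url("https://www.youtube.com/watch%3Fv%3Dabc123"))
print(decode_bare_url("https://example.org/a%2Fb"))
```

Restricting the decoding to known hosts is a conservative choice: blanket decoding could change the meaning of URLs in which `%2F` or `%3F` is deliberately encoded.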

Bot fails to fill ref to bare URL followed by punctuation

Status: Fixed
Reported by: BrownHairedGirl (talk) • (contribs) 08:35, 20 May 2022 (UTC)

What happens: ref is not filled until the punctuation is removed
What should happen: bot should remove the trailing punctuation
Relevant diffs/links: sandbox test at https://en.wikipedia.org/w/index.php?title=User:BrownHairedGirl/sandbox198&diff=1088818046&oldid=1088817981
Replication instructions: testcases at User:BrownHairedGirl/sandbox198.
Searches find for example: 344 pages where trailing punctuation is a full stop, 116 cases where trailing punctuation is whitespace then a full stop and 40 cases where the trailing punctuation is a comma. Insource searches will timeout on a search for all permutations, but I estimate the total to be over 0.5% of all remaining WP:Bare URLs.
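The three cases listed in the report (a trailing full stop, whitespace followed by a full stop, and a trailing comma) can be handled with a single right-strip. The sketch below is illustrative only and is not taken from the bot's source; the function name `strip_trailing_punctuation` is hypothetical.

```python
def strip_trailing_punctuation(url):
    """Remove trailing spaces, tabs, full stops and commas from a bare URL."""
    return url.rstrip(" \t.,")

# Hypothetical test cases mirroring the three patterns in the report.
for raw in (
    "https://example.com/page.",
    "https://example.com/page .",
    "https://example.com/page,",
):
    print(repr(strip_trailing_punctuation(raw)))
```

One caveat: a URL that legitimately ends in a full stop or comma would also be altered, so a cautious implementation might check that the cleaned URL resolves before saving the change.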

Double edit for ProQuest

Status: Fixed
Reported by: * Pppery * _{it has begun...} 16:31, 22 May 2022 (UTC)

What should happen: The bot does both sets of changes in one edit
Relevant diffs/links: Special:Diff/1089224844 + Special:Diff/1089224932

Incorrect case change bug

Status: Fixed
Reported by: -- Verbarson ^talk_edits 09:08, 22 May 2022 (UTC)

What happens: "journal=VII: Journal of the Marion E. Wade Center" changed to "journal=VII: Journal of the Marion e. Wade Center" (initial E made lower-case)
What should happen: no change needed
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Tolkien:_A_Cultural_Phenomenon&diff=1089166636&oldid=1068150363

I have reverted and added a comment to prevent recurrence in this case, however, there are similar cites in other Tolkien-related articles.-- Verbarson ^talk_edits 09:08, 22 May 2022 (UTC)

Also I have found (and will correct) this diff where the same happened 12 Dec 2021.-- Verbarson ^talk_edits 13:03, 22 May 2022 (UTC)

Strip #citeas from springer.com URLs

Status: Fixed
Reported by: Headbomb {t · c · p · b} 03:22, 22 May 2022 (UTC)

What should happen: [26]
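For readers without access to the linked diff, the request in the section title amounts to dropping the `#citeas` fragment from springer.com links. A minimal Python sketch of that idea follows; the function name `strip_citeas` and the example URL are hypothetical, and the bot's own implementation may differ.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_citeas(url):
    """Drop a '#citeas' fragment from springer.com URLs; leave others alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    on_springer = host == "springer.com" or host.endswith(".springer.com")
    if on_springer and parts.fragment.lower() == "citeas":
        return urlunsplit(parts._replace(fragment=""))
    return url

# Hypothetical example URL with a placeholder DOI.
print(strip_citeas("https://link.springer.com/article/10.1007/s00000-000-0000-0#citeas"))
```

Only the exact `citeas` fragment is removed, so other fragments (for example section anchors) are preserved.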

I'm not sure if this is an error, but I'm not sure of the purpose of an action the bot took?

Specifically in this edit [27], it changed the URL from a .ca domain to a .com one. Does that make any substantial difference? I'm not sure what the purpose of it is, but it's quite possible I'm missing something blatantly obvious. I wasn't going to report it as an error/bug because I'm not sure if the bot is acting as intended and it just seemed better to seek clarification here. Clovermoss (talk) 20:48, 21 May 2022 (UTC)

Acting as intended. The reason is .com is country neutral. .ca is the Canadian interface, which differs substantially from the .fr interface or .cn interface etc... Headbomb {t · c · p · b} 21:33, 21 May 2022 (UTC)

Thanks Headbomb. Does this mean I should use .com instead of .ca where it's possible to do so? I'm Canadian so websites tend to redirect me automatically to .ca domains. Clovermoss (talk) 21:46, 21 May 2022 (UTC)

For sites like Google Books, yes. For something like lapresse.ca, no. Really, just edit the article normally, and when you're done just ask the bot to take care of it. Headbomb {t · c · p · b} 21:48, 21 May 2022 (UTC)

Bot down

Presumably for maintenance. @AManWithNoPlan: any estimate of when the bot will be running again? BrownHairedGirl (talk) • (contribs) 20:09, 19 May 2022 (UTC)

Should be back up now. Sorry about that. AManWithNoPlan (talk) 20:10, 19 May 2022 (UTC)

No prob, @AManWithNoPlan. These things are necessary.

But now https://citations.toolforge.org/ is blank. BrownHairedGirl (talk) • (contribs) 20:12, 19 May 2022 (UTC)

Have you tried a hard-refresh? The reboot was needed as I changed underlying parameters to the web-server itself. The documentation is wrong, so I have complained about that. AManWithNoPlan (talk) 20:15, 19 May 2022 (UTC)

I apologize for the reboots, but the server should now have a much greater capacity. AManWithNoPlan (talk) 21:15, 19 May 2022 (UTC)

I wonder if we will hit any CPU limits now? https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&refresh=5m&var-namespace=tool-citations AManWithNoPlan (talk) 21:19, 19 May 2022 (UTC)

Sorry for bugging you about the https://citations.toolforge.org/ page. The only prob was that my PC needed its monthly reboot.

What do you mean by "the server should now have a much greater capacity"? More channels? BrownHairedGirl (talk) • (contribs) 22:03, 19 May 2022 (UTC)

It will have more channels. AManWithNoPlan (talk) 22:12, 19 May 2022 (UTC)

Will at least one be dedicated to the gadget? Headbomb {t · c · p · b} 22:49, 19 May 2022 (UTC)

I am thinking about how to do that. AManWithNoPlan (talk) 01:18, 20 May 2022 (UTC)

Check out the network and CPU usage. It is up significantly https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&refresh=5m&var-namespace=tool-citations&from=now-2d&to=now AManWithNoPlan (talk) 12:05, 20 May 2022 (UTC)

Ah yes, way up. And watching throughput on my batches, it seems to be faster. BrownHairedGirl (talk) • (contribs) 12:56, 20 May 2022 (UTC)

I have archived and shut down the other two discussions on "abuse/misuse" of the bot. We can revisit if the new and improved bot still has issues. AManWithNoPlan (talk) 14:21, 20 May 2022 (UTC)

User:Headbomb and User:Anas1712 and User:BrownHairedGirl the bot was rebooted to increase capacity. Your jobs did not survive. AManWithNoPlan (talk) 19:58, 20 May 2022 (UTC)

Would it be at all doable, at some point in the future, to find a way, when rebooting the bot, to save currently-running jobs and their current state of progress so the bot can pick them up where it left off when it comes back up again? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 20:51, 21 May 2022 (UTC)

Thanks for the notification, @AManWithNoPlan.

Will there be more reboots this evening? I don't want a re-run of yesterday, where I went through several rounds of a cycle of reboot -> restart job -> reboot -> restart job -> reboot -> restart job. BrownHairedGirl (talk) • (contribs) 20:10, 20 May 2022 (UTC)

I am done, and will not be working at all this weekend. AManWithNoPlan (talk) 20:11, 20 May 2022 (UTC)

Thanks for clarifying. Enjoy your weekend off!

I hope you take some time to celebrate your successful increase in the bot's capacity. You deserve a treat after that good work. BrownHairedGirl (talk) • (contribs) 20:14, 20 May 2022 (UTC)

I will try to. AManWithNoPlan (talk) 20:19, 20 May 2022 (UTC)

The need to "login" should be significantly reduced also. Total number of workers increased by a factor of 12. The problem with the bot not sending any output until the job is completely done is not fixed. That appears to be buffering somewhere that the bot either cannot control or else I cannot figure out how to control. AManWithNoPlan (talk) 20:56, 20 May 2022 (UTC)

That is even more good news. Apart from two minor issues, you have resolved some of the major issues which have plagued the bot for years: slow performance, and lack of capacity. And in the midst of all that, you kindly resolved the bare URL+punctuation issue which I raised.

These are all problems which bot users found exasperating, and which seemed intractable ... but now all sorted.

You really do deserve that celebration. BrownHairedGirl (talk) • (contribs) 21:31, 20 May 2022 (UTC)

^ "Species profile - Olearia cuneifolia". Queensland Government Department of Environment and Science. Retrieved 1 April 2022.
