Wikipedia:Bots/Requests for approval/Platybot
Operator: BilledMammal
Time filed: 08:51, Monday, July 8, 2024 (UTC)
Function overview: Adjusts templates based on provided JSON configuration files. This request is limited to Template:Cite news and Template:Cite web, and is primarily intended to correct issues where the work or publisher is linked to the wrong target.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: Not currently
Links to relevant discussions (where appropriate):
Edit period(s): Initially, irregular one-off runs, with each held after significant expansions to the configuration file. Once most citations have been fixed I will open a request for continuous operation in a maintenance mode.
Estimated number of pages affected: Varies considerably with the configuration. One configuration, which applies to ten sources, will edit approximately 23,000 pages. Another configuration, which goes beyond correcting wrong links and also always inserts the correct link when one is missing, will edit approximately 450,000 pages.
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details: Adjusts parameters of Cite news and Cite web based on a configuration file. This configuration can be applied to any parameter, but the intent of this request is to apply it to the following:
- work
- publisher
- publication-place
- department
- agency
- url-access
It determines which change to apply based on current parameter field values. Any field or combination of fields can be used, but the intent of this request is to use the "url" field.
Adjustments can be specified as "always", "onEdit", or "never". When "always" is specified, the article is edited whenever a desired change to the parameter is identified. When "onEdit" is specified, a desired change is implemented only if the page is already being edited. When "never" is specified, the change is not made. This reduces the impact on watchlists by skipping articles that have no high-priority issues.
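The bot's source is not published, so the following is only a minimal sketch of how the "always" / "onEdit" / "never" rule could be expressed. The `Change` class and `changes_to_apply` function are hypothetical names introduced for illustration and are not taken from Platybot.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One proposed parameter change (hypothetical structure)."""
    parameter: str  # e.g. "work"
    new_value: str  # replacement wikitext
    mode: str       # "always", "onEdit", or "never"


def changes_to_apply(candidates):
    """Return the changes to save for one page, or [] to skip the page."""
    allowed = [c for c in candidates if c.mode != "never"]
    # Only an "always" change justifies an edit on its own.
    if not any(c.mode == "always" for c in allowed):
        return []
    # The page is being edited anyway, so "onEdit" changes are included.
    return allowed
```

Under this reading, a page whose only pending changes are "onEdit" is left alone, which is what keeps low-priority fixes off watchlists.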
Configuration schema
{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "array", "items": { "type": "object", "properties": { "includes": { "type": "array", "items": { "type": "object", "properties": { "key": { "type": "string", "example": "url" }, "value": { "type": "array", "items": { "type": "string", "example": ["www.bbc.com", "www.bbc.co.uk"] } } } }, "description": "Lists conditions required to be met for this configuration to be applied to the template." }, "excludes": { "type": "array", "items": { "type": "object", "properties": { "key": { "type": "string", "example": "url" }, "value": { "type": "array", "items": { "type": "string", "example": ["www.bbc.com/sport", "www.bbc.co.uk/sport"] } } }, "description": "Lists conditions that must not be met for this configuration to be applied to the template." } }, "patternProperties": { "^[a-zA-Z0-9-]+$": { "oneOf": [ { "type": "array", "description": "Named for the parameter, and defines what will be done with it. Used when there are multiple possible configurations for the parameter.", "items": { "$ref": "#/definitions/parameter-config" } }, { "type": "object", "description": "Named for the parameter, and defines what will be done with it. Used when there is only one possible configuration for the parameter.", "$ref": "#/definitions/parameter-config" } ] } } }, "definitions": { "parameter-config": { "$schema": "http://json-schema.org/draft-07/schema#", "$id": "parameter-config", "type": "object", "properties": { "includes": { "type": "array", "items": { "type": "object", "properties": { "key": { "type": "string", "example": ["url"] }, "value": { "type": "array", "items": { "type": "string", "example": ["www.bbc.com", "www.bbc.co.uk"] } } } }, "description": "Lists conditions required to be met for this configuration to be applied to the parameter." 
}, "excludes": { "type": "array", "items": { "type": "object", "properties": { "key": { "type": "string", "example": ["url"] }, "value": { "type": "array", "items": { "type": "string", "example": ["www.bbc.com/sport", "www.bbc.co.uk/sport"] } } } }, "description": "Lists conditions that must not be met for this configuration to be applied to the parameter." }, "link": { "type": "string", "description": "Where the parameter should normally link to", "example": ["ABC News (Australia)"] }, "wikitext": { "type": "string", "description": "What the wikitext of the parameter should normally be", "example": ["ABC News"] }, "blacklist": { "type": "array", "items": { "type": "string", "example": ["ABC News (United States)", "ABC News"] }, "description": "Links that will always be removed" }, "greylist": { "type": "array", "items": { "type": "string", "example": ["Australian Broadcasting Corporation"] }, "description": "Links that will only be removed when already editing the page. Used to prevent edits that would only fix issues we consider minor." }, "whitelist": { "type": "array", "items": { "type": "string", "example": ["The Sunday Telegraph (Sydney)"] }, "description": "Links that will never be removed. Used when we believe editors may have deliberately provided a non-standard value that we wish to respect." }, "fixRedirects": { "type": "string", "enum": ["always", "onEdit", "never"], "default": "onEdit", "description": "Specifies when we will replace redirects to the provided link with the provided link." }, "fixDisplay": { "type": "string", "enum": ["always", "onEdit", "never"], "default": "onEdit", "description": "Specifies when we will replace the currently displayed text with the displayed version of the provided Wikitext." }, "fixOthers": { "type": "string", "enum": ["always", "onEdit", "never"], "default": "always", "description": "Specifies when we will replace links to pages that are neither redirects to the link nor on the provided lists." 
}, "fixMissing": { "type": "string", "enum": ["always", "onEdit", "never"], "default": "onEdit", "description": "Specifies when we will add a missing value" }, "priority": { "type": "integer", "default": 5, "description": "Provides a tie-breaker when multiple array objects meet the inclusion or exclusion criteria. Higher value is preferred. It is unspecified which configuration object is used when both have the same priority level.", "minimum": 1 } } } } } }
What it does to these parameters depends on the configuration. For example:
"work": { "link": "ABC News (Australia)", "wikitext": "ABC News", "blacklist": ["ABC News (United States)", "ABC News"], "greylist": ["Australian Broadcasting Corporation"], "fixMissing": "onEdit", "fixRedirects": "onEdit", "fixOthers": "always" }
This ensures that the "work" parameter links only to ABC News (Australia). When the bot finds a link to anything other than ABC News (Australia), one of its redirects, or Australian Broadcasting Corporation, it will edit the article to correct that link.
When it encounters a redirect, a link to Australian Broadcasting Corporation, or a missing value, it will correct it only if it is already editing the article.
If "fixMissing" were changed to "always", it would edit the article to insert the missing value.
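As a rough illustration of the behaviour just described, the sketch below maps the current link of a parameter to one of the three modes. It assumes the order in which the lists are checked, and `fix_mode_for_link`, `REDIRECTS`, and the redirect title in it are hypothetical; the defaults are those given in the schema.

```python
# Parameter configuration copied from the "work" example.
WORK = {
    "link": "ABC News (Australia)",
    "wikitext": "ABC News",
    "blacklist": ["ABC News (United States)", "ABC News"],
    "greylist": ["Australian Broadcasting Corporation"],
    "fixMissing": "onEdit",
    "fixRedirects": "onEdit",
    "fixOthers": "always",
}

# Hypothetical redirect title, used only to exercise the redirect branch.
REDIRECTS = {"ABC News Online"}


def fix_mode_for_link(current_link, config, redirects):
    """Return "always", "onEdit", or "never" for a parameter's current link.

    current_link is the linked page title, or None when the value is missing.
    redirects is the set of titles assumed to redirect to config["link"].
    """
    if current_link is None:
        return config.get("fixMissing", "onEdit")
    if current_link == config["link"]:
        return "never"  # already correct
    if current_link in config.get("whitelist", []):
        return "never"  # deliberate non-standard value, left alone
    if current_link in config.get("blacklist", []):
        return "always"
    if current_link in config.get("greylist", []):
        return "onEdit"
    if current_link in redirects:
        return config.get("fixRedirects", "onEdit")
    return config.get("fixOthers", "always")
```

With this configuration, a link to ABC News (United States) maps to "always", while a missing value or a link to Australian Broadcasting Corporation maps to "onEdit".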
"agency": { "includes": [ { "key": "agency", "value": ["Reuters"] } ], "remove": "onEdit" }
This removes the agency field when it contains "Reuters". It corrects cases where the field has been incorrectly filled with the name of the publisher or work.
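A removal rule of this kind might be applied roughly as follows. This is a sketch under assumptions: conditions are treated as substring matches, the "remove" key is taken from the example rather than from the schema shown earlier, and `apply_remove_rule` is a hypothetical helper.

```python
# Rule copied from the "agency" example.
AGENCY = {
    "includes": [{"key": "agency", "value": ["Reuters"]}],
    "remove": "onEdit",
}


def apply_remove_rule(params, name, rule, already_editing):
    """Return a copy of params with `name` dropped when the rule applies."""
    # Assumption: every condition must match, and a condition matches when
    # any of its values occurs as a substring of the named field.
    hit = all(
        any(value in params.get(cond["key"], "") for value in cond["value"])
        for cond in rule.get("includes", [])
    )
    mode = rule.get("remove", "never")
    if hit and (mode == "always" or (mode == "onEdit" and already_editing)):
        return {k: v for k, v in params.items() if k != name}
    return dict(params)
```

Because the mode is "onEdit", the field is dropped only when some other change has already caused the page to be edited.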
"department": [ { "includes": [ { "key": "url", "value": ["reuters.com/world/"] } ], "wikitext": "World" }, { "includes": [ { "key": "url", "value": ["reuters.com/world/reuters-next/"] } ], "wikitext": "Reuters Next", "priority": 6 }, { "includes": [ { "key": "url", "value": ["reuters.com/business/"] } ], "wikitext": "Business" } ]
This fills in the department field based on the source URL. When more than one entry matches, the entry with the higher priority is used. If none of the conditions is met, the department field is left unfilled.
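The selection among several department entries can be sketched as below. The matching rules are assumptions (substring matching, all "includes" conditions required, any "excludes" condition disqualifying), and `condition_met` and `select_config` are hypothetical names; the default priority of 5 comes from the schema.

```python
# Entries copied from the "department" example.
DEPARTMENT = [
    {"includes": [{"key": "url", "value": ["reuters.com/world/"]}],
     "wikitext": "World"},
    {"includes": [{"key": "url", "value": ["reuters.com/world/reuters-next/"]}],
     "wikitext": "Reuters Next", "priority": 6},
    {"includes": [{"key": "url", "value": ["reuters.com/business/"]}],
     "wikitext": "Business"},
]


def condition_met(cond, params):
    """True when any listed value occurs in the named template field."""
    field = params.get(cond["key"], "")
    return any(value in field for value in cond["value"])


def select_config(candidates, params):
    """Return the highest-priority matching entry, or None if none match."""
    eligible = [
        c for c in candidates
        if all(condition_met(x, params) for x in c.get("includes", []))
        and not any(condition_met(x, params) for x in c.get("excludes", []))
    ]
    if not eligible:
        return None
    # Ties are resolved arbitrarily here (first match wins), since the
    # schema leaves the choice unspecified for equal priorities.
    return max(eligible, key=lambda c: c.get("priority", 5))
```

A URL under reuters.com/world/reuters-next/ matches both the "World" and "Reuters Next" entries, and the priority of 6 is what makes "Reuters Next" win.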
The current configuration file will do the following:
- ABC News (Australia)
  - Set "work" to ABC News
  - Set "publisher" to Australian Broadcasting Corporation
  - Remove "publication-place"
  - Remove "agency" when incorrect
- The Daily Telegraph
  - Set "work" to The Daily Telegraph
  - Set "publisher" to Telegraph Media Group
  - Set "publication-place" to "London, United Kingdom"
  - Set "department" when it can be determined
- Reuters
  - Set "work" to Reuters
  - Set "publisher" to Thomson Reuters
  - Set "publication-place" to "London, United Kingdom"
  - Set "department" when it can be determined
  - Remove "agency" when incorrect
- The New York Times
  - Set "work" to The New York Times
  - Set "url-access" to "limited"
  - Remove "publisher"
  - Remove "publication-place"
- BBC News
  - Set "work" to BBC News
  - Remove "publisher"
  - Remove "publication-place"
  - Set "department" when it can be determined
- BBC Sport
  - Set "work" to BBC Sport
  - Remove "publisher"
  - Remove "publication-place"
- The Guardian
  - Set "work" to The Guardian
  - Remove "publisher"
  - Set "publication-place" to "London, United Kingdom"
  - Set "department" when it can be determined
- The Guardian (Swan Hill)
  - Set "work" to The Guardian
- The Daily Telegraph (Sydney)
  - Set "work" to The Daily Telegraph
  - Set "publisher" to News Corp Australia
  - Remove "publication-place"
- ABC News (United States)
  - Set "work" to ABC News
  - Set "publisher" to American Broadcasting Company
  - Remove "publication-place"
The intent is that the community will expand the configuration file, increasing the number of citations it can fix.
Example of template replacements
When editing a template, to improve readability it will also apply a consistent format and naming convention. This involves converting parameters away from aliases to their primary values, and placing the parameters into the following order:
Order
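The alias conversion and reordering described above might look something like the sketch below. Both tables are assumptions made for illustration: `ALIASES` is a hypothetical two-entry map, and `ORDER` is a made-up subset that follows the "author(s) data, date, title, url" sequence mentioned in the discussion, not the bot's actual order.

```python
# Hypothetical alias map: alias name -> name treated as primary.
ALIASES = {"newspaper": "work", "accessdate": "access-date"}

# Illustrative subset of a canonical order, not the bot's real list.
ORDER = ["last", "first", "date", "title", "url"]


def normalise(params):
    """Rename aliased parameters, then sort them into the canonical order."""
    renamed = {}
    for name, value in params.items():
        # setdefault keeps the first value if an alias and its primary
        # name are both present.
        renamed.setdefault(ALIASES.get(name, name), value)
    rank = {name: i for i, name in enumerate(ORDER)}
    # Parameters missing from ORDER go last; sorted() is stable, so they
    # keep their original relative order.
    ordered = sorted(renamed, key=lambda n: rank.get(n, len(ORDER)))
    return {n: renamed[n] for n in ordered}
```

Placing unlisted parameters at the end mirrors the remark in the discussion that new fields would otherwise be "dumped at the end".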
Discussion
- I'd prefer if this bot (and every bot) stopped short of reordering template parameters. Doing a full reorganisation on any template edited will make it much more difficult to tell what changes have been made when reviewing diffs. Folly Mox (talk) 09:23, 16 July 2024 (UTC)
- We can trust our bots that much, I'd say. And it shouldn't be much of a problem if you compare the diffs in visual diff mode, try here. In my experience, it's much easier for a bot (program) to reassemble a template in some predefined order. Having data in the order of final appearance does help with readability (BilledMammal: that'd be url?, author(s) data, date, title…). Ponor (talk) 06:48, 18 July 2024 (UTC)
- @Ponor: Currently, author(s) data, date, title, url - the full order can be seen in the final collapsed box. However, that is easy to change.
- It wouldn't be difficult to put it back in the original order (although it would result in new fields being dumped at the end), but personally I believe it is better to reorganize it: while it makes it harder for editors using a non-visual viewer to identify the changes, it is easier for editors to parse the template going forward. BilledMammal (talk) 23:05, 18 July 2024 (UTC)
- I support putting the params in some canonical order, my only question is which one it should be. VisualEditor (TemplateData), IAbot, maybe even reFill, probably use the same one ("Full parameter set in horizontal format" from {{Cite web}}?), which is what I'd use as well. Up to you, though. Ponor (talk) 14:05, 19 July 2024 (UTC)
- I started with the full parameter set from Template:Cite news, but quickly found that "full parameter set" doesn’t actually mean "full parameter set".
- I see the two templates differ in where to put the URL; I think Cite news' method is better, as the URL is difficult to read so better to put that at the end. BilledMammal (talk) 14:11, 19 July 2024 (UTC)
- The order is probably from the order used by TemplateData as that is where ProveIt takes its order from. Gonnym (talk) 11:07, 4 August 2024 (UTC)
- I think consensus would need to be established for this at other venues. The part of the proposal regarding adding links where none exist has the potential to conflict with WP:WHENINROME. voorts (talk/contributions) 21:18, 16 August 2024 (UTC)
- That aspect doesn’t need to be enabled; exactly how this functions depends entirely on the configuration file.
- However, that aspect isn’t covered by WP:WHENINROME, which says
If all or most of the citations in an article consist of bare URLs, or otherwise fail to provide needed bibliographic data – such as the name of the source, the title of the article or web page consulted, the author (if known), the publication date (if known), and the page numbers (where relevant) – then that would not count as a "consistent citation style" and can be changed freely to insert such data.
- Emphasis mine. BilledMammal (talk) 18:24, 17 August 2024 (UTC)
- I was referring to the part of WHENINROME that states:
Editors should not attempt to change an article's established citation style, merely on the grounds of personal preference or to make it match other articles, without first seeking consensus for the change.
For example, if an article has proper citation formatting, but none of the publication titles are wikilinked, or only the first instance is, running this bot to add wikilinks to each publication parameter would run afoul of WHENINROME. In any event, given that we have a reasonable disagreement on this point, I think consensus would be needed to implement that part of the bot. voorts (talk/contributions) 18:28, 17 August 2024 (UTC)
- Ah, I misunderstood. The configuration file can be updated to not replace unlinked, but otherwise correct, source names, if such behaviour is desirable.
- With that said, I’m not sure whether the decision to Wikilink or not falls under WP:WHENINROME, as such a decision appears to go beyond referencing style and instead fall under MOS:LINK, specifically MOS:UL, which says
Proper names that are likely to be unfamiliar to readers
- which would include virtually all source names, as few have worldwide recognition - should be linked. BilledMammal (talk) 18:48, 17 August 2024 (UTC)
- I broadly construe WHENINROME to avoid referencing conflicts since the MOS is a contentious topic. voorts (talk/contributions) 19:04, 17 August 2024 (UTC)
- I don't necessarily have an issue with the rest of what the bot would do. Also, I would like to see a process for establishing consensus for what parameters should be included for each ref. For example, why doesn't The Guardian (Swan Hill) have a publication-place parameter? Why use publisher instead of publication-place for The Daily Telegraph(s)? These are things that might need to be worked out. voorts (talk/contributions) 18:31, 17 August 2024 (UTC)
- The omissions for the Swan Hill Guardian are primarily because I wanted an example of a minimally completed source, to demonstrate the tool's range.
- (The Daily Telegraph actually uses both)
- The process I was planning was standard WP:CONSENSUS, with the requirement that consensus be obtained prior to changing the primary configuration file. Or do you think something more involved is needed? BilledMammal (talk) 18:48, 17 August 2024 (UTC)
- I think even a rough consensus would be fine for the contents of the configuration file. I'd like to see it advertised at Wikipedia talk:Citing sources, Wikipedia talk:Manual of Style, and potentially other venues before this bot goes active. voorts (talk/contributions) 18:58, 17 August 2024 (UTC)
- Good idea; I think WP:VPR would also be a good location, although I’ll wait till BAG gives preliminary approval before taking it to the wider community. BilledMammal (talk) 19:01, 17 August 2024 (UTC)
- Apologies, have been meaning to tag this with "Needs wider discussion" but have had other things to deal with; I would like to see a rough consensus that this is a desired bot task. Primefac (talk) 12:02, 22 August 2024 (UTC)
- I've opened a discussion at the Village Pump. BilledMammal (talk) 09:03, 25 August 2024 (UTC)
- Link expanded to include section, no other change made. Primefac (talk) 20:09, 25 August 2024 (UTC)
- Discussion archived, link updated. Primefac (talk) 11:43, 20 October 2024 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I do note a very weak consensus at the Pump that this will be a reasonable bot trial. For the sake of getting more eyes on this, please do not mark these edits as minor. Primefac (talk) 11:46, 20 October 2024 (UTC)