Wikipedia:Bots/Requests for approval/GreenC bot 10
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 15:25, Wednesday, February 6, 2019 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): Awk
Source code available: TBU
Function overview: Add {{Shadows Commons}} to candidate File pages.
Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Shadows_Commons
Edit period(s): Weekly
Estimated number of pages affected: 30
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Add {{Shadows Commons}} template to File: pages on EnWiki that have the same name on Commons. It uses Quarry 18894 to find candidate articles.
Discussion
- Approved for trial (30 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. — xaosflux Talk 15:12, 7 February 2019 (UTC)[reply]
- Comment - I would very strongly suggest the bot run its own query directly rather than relying on a manually updated Quarry one. ShakespeareFan00 (talk) 18:32, 7 February 2019 (UTC)[reply]
- I hadn't planned on that. Just downloading the JSON with each run. What problem do you foresee? -- GreenC 18:52, 7 February 2019 (UTC)[reply]
- There's no guarantee that a Quarry query is updated in a timely way. ShakespeareFan00 (talk) 01:00, 8 February 2019 (UTC)[reply]
- What does 'timely' mean for Quarry? The database on Tools has a replication lag, also. The tool is only running once a week or so. -- GreenC 01:22, 8 February 2019 (UTC)[reply]
- @GreenC: are you doing any checks if the shadow template is already in place (to avoid placing a second one), and/or that the file is actually shadowing? If so it won't really matter too much if this is delayed or using older data. This would be for cases where on edit someone else has already tagged the file, or the commons file has since been moved or deleted (i.e. the same checks we would expect of a human editor). — xaosflux Talk 13:42, 10 February 2019 (UTC)[reply]
- Those are good points. I was going to check for the existence, but hadn't thought to check that the shadow exists. Both are relatively easy and not costly and yeah it would resolve any problem with delays in the replication server pool. -- GreenC 15:52, 10 February 2019 (UTC)[reply]
- Looks like Quarry is not stable: the link to the JSON file changes with each run of Quarry. The bot will connect to the DB directly. -- GreenC 19:29, 10 February 2019 (UTC)[reply]
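The two checks discussed above (is a tag already on the page, and does the Commons file still exist) can be sketched as below. This is an illustrative Python sketch, not the bot's Awk source. The helper names `template_pattern`, `has_any_template` and `should_tag` are hypothetical, and the Commons existence check is passed in as a plain boolean on the assumption that the caller has already looked it up. The matching relies on two MediaWiki title rules: the first letter of a template name is case-insensitive, and spaces and underscores are interchangeable.

```python
import re


def template_pattern(name):
    """Build a regex that matches {{name}} or {{name|...}} in wikitext.

    The first letter is matched case-insensitively and runs of spaces or
    underscores are treated as equivalent, mirroring MediaWiki titles.
    """
    words = re.split(r"[ _]+", name.strip())
    first = words[0]
    head = (
        "["
        + re.escape(first[0].upper())
        + re.escape(first[0].lower())
        + "]"
        + re.escape(first[1:])
    )
    body = r"[ _]+".join([head] + [re.escape(w) for w in words[1:]])
    return re.compile(r"\{\{\s*(?:[Tt]emplate:)?\s*" + body + r"\s*(?:\||\}\})")


def has_any_template(wikitext, names):
    """Return True if any of the named templates is transcluded in wikitext."""
    return any(template_pattern(n).search(wikitext) for n in names)


def should_tag(wikitext, commons_file_exists, skip_names):
    """Decide whether to add the tag (hypothetical helper).

    commons_file_exists is assumed to come from a fresh lookup at edit time,
    so stale query results cannot cause a tag on a file that no longer shadows.
    """
    if not commons_file_exists:
        return False
    return not has_any_template(wikitext, skip_names)
```

With both checks made at edit time, a delayed or lagging candidate list only costs a wasted lookup rather than a wrong or duplicate tag.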
Images are the same
- @ShakespeareFan00: one of the images is File:Mosh kashi self portrait.jpg (Commons). According to the template instructions, when the images are the same, the template should not be used. Not the only case, also File:Léon-Vasseur.jpg and probably others. What would happen in these cases? A bot can't determine the images are the same. Should it add the template anyway - or is the bot not viable? -- GreenC 07:23, 11 February 2019 (UTC)[reply]
- One solution: add the template regardless. The burden will be manual removal of the template. This is less work than manual addition of the template, as the ratio of additions to removals is high. It can also leave instructions in the template like:
{{Shadows Commons |bot=Added by shadows bot. Remove this template if the images are the same. The bot will remember.}}
- The bot will keep a record and not add a second time. As a bonus the bot will now have a list of images that are the same, if ever needed. -- GreenC 08:12, 11 February 2019 (UTC)[reply]
- That sounds reasonable. Identifying images for CSD F8 (i.e Images identical), would be a related task. You could use an image hash to check IIRC. ShakespeareFan00 (talk) 09:31, 11 February 2019 (UTC)[reply]
- Like with File:Mosh kashi self portrait.jpg they have different dimensions so it's complicated. Will keep image comparison in mind, it would probably require a machine learning API and some other work. Currently the bot is skipping images with templates {{Shadows Commons}}, {{Keep Local}}, {{Now Commons}} and {{Do not move to Commons}} (+ aliases) as well as anything with the magic word {{PROTECTIONLEVEL:(edit|move|copy)}}. Anything else to avoid? -- GreenC 16:31, 11 February 2019 (UTC)[reply]
- Huh. I was expecting that someone would file such a bot in due time. Having worked on Shadows Commons cases in the past, I have a few thoughts:
- Not sure that files with {{Do not move to Commons}} and {{Keep local}} should be ignored. They simply say that a file can't be copied over and should be kept (respectively), not that it should stay at its file name.
- {{Shadows Commons}} has |keeplocal= and |reason= parameters; perhaps if the bot encounters files with {{Keep local}} and {{Do not move to Commons}} it should set the parameter to "yes"? And in the case of {{Do not move to Commons}} it might also set the parameter |reason= to "{{Do not move to Commons}}"?
- What is the problem with {{PROTECTIONLEVEL:(edit|move|copy)}} files?
- Jo-Jo Eumerus (talk, contributions) 17:01, 11 February 2019 (UTC)[reply]
- Hi Jo-Jo Eumerus, thanks for the info. {{PROTECTIONLEVEL:(edit|move|copy)}} as they are high-risk (use on the main page etc) so renaming or moving to Commons would likely be avoided? I'm on-board with |keeplocal= as replacement for {{keep local}}. Not positive about {{Do not move to Commons}} as that template is further embedded in 8 other templates. Something like |reason={{Do not move to Commons|reason=Original reason}}}} and moving any of those 8 templates creates complexity of embedded templates and |reason= (for future bots and tools). It would still work with separate templates I believe. -- GreenC 18:28, 11 February 2019 (UTC)[reply]
- It is confusing with all the moving parts. Current thinking what action to take when the bot encounters:
- No templates - add {{Shadows Commons}}
- {{Shadows Commons}} - do nothing
- {{PROTECTIONLEVEL:move}} - do nothing? Or add {{Shadows Commons}}. Uncertain.
- {{Keep local}} - delete and replace with {{Shadows Commons}} with |keeplocal=yes
- {{Do not move to Commons}} - keep and add {{Shadows Commons}}
- {{Now Commons}} - keep and add {{Shadows Commons}}
- Thoughts / comments? -- GreenC 22:16, 11 February 2019 (UTC)[reply]
- I would ignore anything tagged {{Now Commons}} , as those have already been identified. ShakespeareFan00 (talk) 17:49, 12 February 2019 (UTC)[reply]
- Done. -- GreenC 17:59, 12 February 2019 (UTC)[reply]
- @ShakespeareFan00: Actually it was done in the SQL you gave me, but I added a few more aliases and backup regex check in the source. The current SQL list. The additions are all aliases.
Extended content
('ShadowsCommons', 'Shadows_commons', 'Shadows_Commons', 'Now_Commons', 'NowCommons', 'Nowcommons', 'NowCommonsThis', 'Now_commons', 'CommonsNow', 'NC', 'NCT', 'Nct', 'Db-now-commons', 'Db-nowcommons', 'Uploaded to Commons', 'Pp-template', 'Keep_local_high-risk', 'Pp-upload', 'C-uploaded', 'C-upload', 'C uploaded', 'C-uploaded', 'M-protected', 'Main page protected', 'Mpimgprotected', 'Mprotect', 'Mprotected', 'PP-main', 'PP-main-page', 'PP-mainpage', 'ProtectedMainPageImage', 'Uploaded_from_Commons', 'Protected_sister_project_logo', 'Rename_media', 'lfr', 'Image_move', 'Media_rename', 'Rename_file', 'Rename_image', 'Rename-image', 'Rename_media', 'RenameMedia', 'Renamemedia', 'Ffd', 'FFD', 'lfd', 'Imagevio', 'PUF', 'Puf', 'PUi', 'Pui', 'PUIdisputed' )
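One way to read the per-template actions floated in this section is as a small decision function. The Python sketch below is only an editorial illustration of that reading, not the bot's actual logic (the bot is written in Awk and its source is not shown here). The names `Action` and `decide` are hypothetical, template detection is assumed to have been done already and passed in as a set of canonical names, and the `previously_tagged` flag stands in for the bot's record of files it has tagged before. {{Now Commons}} is treated as a skip, following the later agreement to ignore those files, and the move-protection case is left undecided because the thread did not settle it explicitly.

```python
from enum import Enum


class Action(Enum):
    ADD = "add {{Shadows Commons}}"
    ADD_KEEPLOCAL = "replace {{Keep local}} with {{Shadows Commons|keeplocal=yes}}"
    SKIP = "do nothing"
    UNDECIDED = "left open in the discussion"


def decide(templates, move_protected=False, previously_tagged=False):
    """Map the templates found on a File page to a proposed action.

    templates: set of canonical template names already detected on the page
    (alias resolution is assumed to happen before this call).
    """
    # The bot remembers files it tagged once, so a manually removed tag
    # is not re-added on a later run.
    if previously_tagged:
        return Action.SKIP
    # Already tagged, or already identified as being on Commons.
    if "Shadows Commons" in templates or "Now Commons" in templates:
        return Action.SKIP
    # Protected files: the thread lists this case as uncertain.
    if move_protected:
        return Action.UNDECIDED
    if "Keep local" in templates:
        return Action.ADD_KEEPLOCAL
    # No relevant templates, or only {{Do not move to Commons}}, which is kept.
    return Action.ADD


if __name__ == "__main__":
    for case in (set(), {"Keep local"}, {"Now Commons"}):
        print(sorted(case), "->", decide(case).value)
```

Putting the `previously_tagged` check first means the record of earlier runs overrides everything else, which matches the stated intent that the bot "will remember" a removal.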
Trial results
Trial complete. I accidentally issued a "-continuous" to jsub which circumvented the bot's internal halts so it processed all available (44) instead of 33. I forgot the |bot= message which is now included. Question about a few cases like File:Garlin Gilchrist II in Ann Arbor (cropped).jpg that have {{Copy to Wikimedia Commons}} and have been copied but the image still exists on Enwiki. Should it be tagged? @Jo-Jo Eumerus and ShakespeareFan00: -- GreenC 17:41, 12 February 2019 (UTC)[reply]
- I think yes, they should still be tagged. Jo-Jo Eumerus (talk, contributions) 17:43, 12 February 2019 (UTC)[reply]
- Ok. -- GreenC 18:00, 12 February 2019 (UTC)[reply]
Approved. SQLQuery me! 18:04, 19 February 2019 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.