Wikipedia:WikiProject Copyright Cleanup/2023 backlog drive
This page was last edited by The4lines 13 months ago.
Instructions
For new users
Firstly, thank you for taking the time to help clear the backlog! Your efforts are appreciated. Copyright is a complex and nuanced topic, so we recommend that you start with the easier backlogs linked below. For CCI, these are mainly pages that involve copying from non-free websites, so no offline research is required. For the category backlogs, all of the suspected violations should have a source URL.
An exhaustive list of instructions for handling text-based copyright violations is available at the top of the copyright problems page. A good guide on how to start editing at CCI is User:Moneytrees/CCI guide. A brief rundown of handling CCIs follows; it is no substitute for reading the relevant pages:
- Check for dead links; if there are any, use IABot to restore them.
- Run the page through Earwig's copyright detector to get a cursory score. Mirrors often copy from Wikipedia, so make sure to identify these and ignore them.
- Check the article's sources and compare them to the existing text. WP:REX may be helpful for hard-to-access sources.
- If you have identified any possibly infringing content with a source
  - Check the page's licence: is it compatible per WP:COMPLIC?
  - If the content is not compatible, remove or rewrite it with a link to the source material in the edit summary
  - Remove the diff from the CCI page and mark it with {{y}}. Mark the article talk with {{CCI}}
- If you have identified any possibly infringing content without a source
  - In case of content added by repeat copyright violators at CCI, the content may be presumptively removed
    - Please note this in your edit summary, linking to the CCI page if applicable
  - Otherwise, if you still suspect the content of being plagiarised from a non-free source, removing it under other policies (e.g. if it's unreferenced) may be appropriate.
Please do not hesitate to ask any experienced editor for help.
For returning users
Welcome back, and thanks for taking part. This drive focuses mainly on CCI; the rewards system is described below.
Rewards system
For articles at CCI...
- Handling a diff <1k bytes - one point
- Handling a diff >1k bytes - two points
For everything else...
- Handling any article - two points
- Reviewing all diffs of an article - four points
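The point values above can be tallied with a few lines of code. The following is only a minimal sketch: the function names are hypothetical, "1k" is assumed to mean 1,000 bytes, a diff of exactly 1,000 bytes is assumed to count as a large diff (the list above does not say), and the two non-CCI activities are assumed to be counted separately.

```python
# Hypothetical scoring helper for the drive; not an official tool.
LARGE_DIFF_BYTES = 1000  # assumed meaning of "1k"; the exact boundary is not defined on the page


def cci_diff_points(diff_size_bytes: int) -> int:
    """Points for handling a single diff at CCI."""
    # Assumption: a diff of exactly 1,000 bytes is scored as a large diff.
    return 2 if diff_size_bytes >= LARGE_DIFF_BYTES else 1


def other_points(articles_handled: int, articles_fully_reviewed: int) -> int:
    """Points for non-CCI work: 2 per article handled, 4 per article whose diffs were all reviewed."""
    return 2 * articles_handled + 4 * articles_fully_reviewed


def total_points(cci_diff_sizes, articles_handled=0, articles_fully_reviewed=0) -> int:
    """Sum CCI diff points and non-CCI points for one participant."""
    cci_total = sum(cci_diff_points(size) for size in cci_diff_sizes)
    return cci_total + other_points(articles_handled, articles_fully_reviewed)
```

For example, `total_points([200, 1500, 999], articles_handled=1, articles_fully_reviewed=1)` adds 1 + 2 + 1 points for the three CCI diffs to 2 + 4 points for the other work.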
Awards
Minimum | Award |
---|---|
5 points | The Invisible Barnstar |
10 points | The Working Wikipedian's Barnstar |
25 points | The Tireless Contributor Barnstar |
50 points | The Cleanup Barnstar |
100 points | The Copyright Cleanup Barnstar |
200 points | The Great Copyright Drive Barnstar |
500 points | The Order of the Superior Scribe of Wikipedia |
Re-reviewing 25 articles | The Teamwork Barnstar |
In addition, the person who accumulates the most points during the backlog elimination drive will receive the Copyright Review Medal of Merit.
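The point thresholds can be expressed as a simple lookup. This is an illustrative sketch only: the names `AWARD_THRESHOLDS` and `highest_award` are hypothetical, and it assumes that a participant receives only the highest award they qualify for, which the page does not state. The Teamwork Barnstar and the Medal of Merit are left out because they are not based on a point threshold.

```python
from typing import Optional

# Point thresholds copied from the awards list, in ascending order.
AWARD_THRESHOLDS = [
    (5, "The Invisible Barnstar"),
    (10, "The Working Wikipedian's Barnstar"),
    (25, "The Tireless Contributor Barnstar"),
    (50, "The Cleanup Barnstar"),
    (100, "The Copyright Cleanup Barnstar"),
    (200, "The Great Copyright Drive Barnstar"),
    (500, "The Order of the Superior Scribe of Wikipedia"),
]


def highest_award(points: int) -> Optional[str]:
    """Return the highest point-based award reached, or None below 5 points."""
    earned = None
    for minimum, name in AWARD_THRESHOLDS:
        if points >= minimum:
            earned = name  # keep overwriting; thresholds are sorted ascending
    return earned
```

If awards turn out to be cumulative rather than highest-only, the loop could collect every matching name into a list instead.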
Beginner-friendly CCIs
Category backlogs to clear
- Category:Copied and pasted articles and sections with url provided
- Category:Articles with close paraphrasing
- Category:Suspected copyright infringements without a source
- Category:Copied and pasted articles and sections
- Category:Articles with improper non-free content
Construction
Currently, there are significant backlogs in the three principal queues of copyright cleanup: CCI, CP and CopyPatrol. Other parts of the project have made significant progress in clearing their backlogs by gamifying reviews and providing rewards for a certain number of points. Whilst a backlog drive is appealing, a gamified approach may not be effective with respect to copyright.
The Backlog (August 2023)
Based on rough estimates and database counts, copyright backlogs on Wikipedia are:
- CCI currently has over 100,000 remaining diffs to be reviewed
- CopyPatrol has roughly 70 open reports at any one time
- CP is at a manageable level for now
Rough ideas
- Backlog drive where we award points for older CCIs
- Focus on a large CCI that's easier for beginners to tackle (rtkat3, werldwayd, etc.)
- Tackle low-risk stuff towards the end of CCIs
- Clear out Category:Copied and pasted articles and sections with url provided, so it doesn't have to be listed at CP
- Not too big, so we could evaluate each one like a CCI review
- Bot to collate number of articles fixed
- ?
Development
Rewards system
Most backlog drives use a points-per-article system, and this would make sense here: barnstars, etc. could be given out for meeting certain criteria, in a similar manner to the GAN drive. Tallying points can be automated relatively easily: the NPP drive made use of bots to collect data such as the backlog size and user points.
The main problem is quality. Unlike in the drives above, it is much more difficult to review individual users' work, not only because of the sheer number of pages but also because far fewer editors have sufficient copyright experience than have the GAN or NPP experience needed for those drives. However, we could probably still maintain a relatively high standard by re-reviewing a set sample, the size of which will have to be decided. One per 25 pages may be a good starting point; if this proves to be an issue, we can amend it as appropriate.
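The "one per 25 pages" idea could be applied by drawing a random sample of each participant's reviewed pages for re-review. The sketch below is only one possible approach: the function name and parameters are hypothetical, and rounding the sample size up (so that even a short list gets one check) is an assumption rather than anything decided here.

```python
import math
import random


def pick_spot_checks(reviewed_pages, pages_per_check=25, seed=None):
    """Pick a random subset of a participant's reviewed pages for re-review.

    Assumption: the sample size is rounded up, so 1 to 25 pages yield one
    check, 26 to 50 pages yield two, and so on.
    """
    pages = list(reviewed_pages)
    if not pages:
        return []
    sample_size = math.ceil(len(pages) / pages_per_check)
    rng = random.Random(seed)  # pass a seed for a reproducible draw
    return rng.sample(pages, sample_size)
```

Calling `pick_spot_checks(pages, pages_per_check=25)` on a list of 60 page titles returns three titles chosen at random; changing `pages_per_check` adjusts the rate if one in 25 proves too loose or too strict.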