User talk:Coren/Archive/2016/April


False positive, matching a Wikipedia clone

CorenSearchBot's suspected copyvio on United States presidential election in Hawaii, 2016 is outright absurd. The matched page is a Wikipedia clone, and all the pages have in common is a couple of transcluded templates. If not disregarded altogether, Wikipedia clones should at least be treated the way other Wikipedia pages are treated: disregarding any templates the two pages might have in common. Regards, PanchoS (talk) 06:12, 26 March 2016 (UTC)

  • Hi PanchoS, note that the bot did not tag this article (and indeed no longer flags against that particular WP clone). The user who created it pasted that notice in with the rest of the text as you can see in the edit summary of the first edit to that page: [1]. This is its own problem, as that makes this article an unattributed copy/paste. Easily fixed but still an issue to be aware of. They've apparently done this to several states' articles: [2] CrowCaw 23:01, 3 April 2016 (UTC)
  • @Crow: Ouch, true. CorenBot however did suspect quite a few articles based on that particular WP clone, before I made sure it was added to the blacklist. I'm wondering if we can be slightly more diligent in autodetecting Wikipedia clones, not necessarily to automatically exclude them from copyvio comparisons, but at least treating them like original Wikipedia content, by placing {{Csb-wikipage}} rather than {{Csb-pageincludes}}. Harsh copyvio allegations that turn out to be false positives are really detrimental to CSB's credibility. They may also hurt innocent users' credentials and deter them from further contributions. IMHO we should go to great lengths to avoid that. Cheers, PanchoS (talk) 23:30, 3 April 2016 (UTC)
  • PanchoS: Generally I try to add the clones as I see them. All the bot's reports get listed at WP:SCV, which is heavily backlogged (shameless plug: any help there would be most appreciated!), so it may be a couple of weeks before they get eyes on them. Oftentimes, as with these election articles, the author has made a bunch of similar articles in the meantime and has been spammed about them. The "problem" is that the bot often can't tell what is a pure WP clone and what is not, so finding the criteria with which to pick the softer message is difficult. (@Coren: Perhaps if the target page includes a link back to WP, it gets the softer warning, since at least the site is complying with the attribution?). Additionally, there are lots of sites that mirror WP articles but also pull in content from lots of other places, so whitelisting them would not be appropriate since not all of the content is freely licensed. At the risk of another shameless plug, I think ultimately the best solution is faster clearing of the copyvio boards so the authors can get an explanation of why the bot left that message, and ways to prevent it (or letting them know it is safe to remove/ignore). Thanks! CrowCaw 15:21, 4 April 2016 (UTC)
  • @Crow: Not saying we should whitelist those sites. But remember, it's not the Wikipedia user's fault whether the WP mirror does or does not provide a backlink for attribution. So I think we should throw the softer warning at the WP user whenever a particular page is found using Wikipedia as a source. A backlink would be the easiest case, but even without a backlink it shouldn't be all too complicated. This would avoid inappropriately harsh messages before a new WP mirror gets blacklisted. I'm ready to help you guys, though I've never looked into the source code of CSB. Cheers, PanchoS (talk) 16:41, 4 April 2016 (UTC)
  • PanchoS: I completely agree; the difficulty is just finding the criteria that the bot can use to identify a true WP mirror. You can peruse the source code at the bot's main user page. It is currently not operational, as the tool it used (Yahoo Boss) has ceased operating, so the existing code will need to be tweaked anyway to adapt to whichever service Coren re-codes it to use; this does present an opportunity of sorts to see if this scenario can be coded in. Thanks, CrowCaw 16:52, 4 April 2016 (UTC)
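
For illustration only, the backlink criterion floated above (softer {{Csb-wikipage}} notice when the matched page links back to Wikipedia, {{Csb-pageincludes}} otherwise) could be sketched roughly as follows. This is not taken from the bot's actual source code: the function names `links_back_to_wikipedia` and `choose_template` are hypothetical, and the sketch assumes the matched page's HTML has already been fetched. It only covers the "easiest case" of an explicit backlink; mirrors without one would still need some other signal.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to_wikipedia(html):
    """Hypothetical check: does the page contain a link to a wikipedia.org host?"""
    collector = _LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        # hostname is None for relative links, so fall back to an empty string
        host = urlparse(href).hostname or ""
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            return True
    return False


def choose_template(html):
    """Pick the softer notice when the matched page attributes Wikipedia."""
    if links_back_to_wikipedia(html):
        return "Csb-wikipage"
    return "Csb-pageincludes"
```

A call such as `choose_template(page_html)` would then return the template name to place. Matching on the parsed hostname rather than on a plain substring avoids treating a URL like `https://example.com/wikipedia.org` as a backlink.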