User talk:ClueBot Commons/Archives/2018/September
This is an archive of past discussions about User:ClueBot Commons. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Cool Bot
Hi, I think it's really cool that you created a bot that detects vandalism. Is there any reason why you chose to write the majority of your bot in C/C++ instead of the other languages? Also what direction do you think is next after the vandalism bot? Thanks! --Hiitsjamez (talk) 05:34, 13 September 2018 (UTC)
- The primary motivating factor in choosing C++ over other languages is that the machine learning core needs to be fast: it has to keep up with the edit rate of Wikipedia, and occasionally process hundreds of thousands of edits when retraining the bot after tweaking settings. C++ provides close-to-the-metal performance for executing the ANN. Sure, today there are libraries for Python and other languages that allow for fast machine learning in them, but the core of those libraries is also written in C or C++. And those weren't around, or at least weren't mature, when ClueBot NG was written.
- The next direction that anti-vandalism bots will take will probably be toward better NLP and better learning models like CNNs combined with better datasets. -- Cobi(t|c|b) 06:39, 13 September 2018 (UTC)
Archival edit by bot without leaving link to Archive page
Hi, thanks for your work on this bot. There seems to be an edit it made on this page that removed a lot of material to archive but did not include any link to the created archive page: https://en.wikipedia.org/w/index.php?title=Talk:Michael_Shermer&diff=prev&oldid=844643772&diffmode=source
The archived material is here: https://en.wikipedia.org/wiki/Talk:Michael_Shermer/Archives/_1
It looks like it's possible that the header template on the talk page is supposed to include archive links, but the new archive link is not showing up anywhere that I could see. So I guess the question is about the bot making sure the new archive page is linked adequately. If I'm missing something, please let me know. mennonot (talk) 19:22, 13 September 2018 (UTC)
A cup of coffee for you!
You need this. Just sayin'. You've been working nonstop for the past decade or so. Hdjensofjfnen (If you want to trout me, go ahead!) 23:37, 17 September 2018 (UTC)
I've just got a great idea
At the moment, when ClueBot NG decides that an edit isn't vandalism, it simply ignores it. My idea is to take edits that are considered "almost-vandalism" (edits that are almost classified as vandalism but aren't reverted) and create a script for the Recent Changes feed that lists them, allowing recent changes patrollers to clean up the edits that ClueBot NG can't or won't revert but that are still vandalism. Terrariola 13:21, 20 September 2018 (UTC)
- @Terrariola: Nice idea! So Cluebot could have its own "RSS" feed in a separate page? Hdjensofjfnen (If you want to trout me, go ahead!) 16:20, 20 September 2018 (UTC)
- @Terrariola and Hdjensofjfnen: I haven't tried the tool, but the description at Wikipedia:STiki#Edit prioritization seems to indicate this is already available. -- John of Reading (talk) 16:51, 20 September 2018 (UTC)
- It used to be the default queue STiki used, but reportedly the IRC feed that the bot used to relay this information died, and with ClueNet being in shambles these days it wasn't revived, so West.andrew.g changed the default queue to STiki's own metadata queue. STiki's own method of deciding what's vandalism and what isn't works fairly well, and STiki by design always monitors recent changes and adds edits to its queue even when nobody's currently using it, so you can already use STiki to go through and review the edits that CBNG didn't revert. Of course, CBNG's queue was generally the most accurate (at least from what I can remember years ago), so it'd be nice if the maintainers of CBNG could bring it back up. Perhaps set up a channel on freenode and have it be relayed there so other anti-vandalism tools can draw from it more easily? (Huggle 3 and the late DefconBot used to use CBNG scores, at least until it died) —k6ka 🍁 (Talk · Contributions) 03:18, 21 September 2018 (UTC)
(As creator and maintainer of WP:STiki) -- Everything User:K6ka said is accurate. I pestered CBNG's creators for a while after the IRC feed went down, to no avail. CBNG's model was much better, in part because I haven't had the time to retrain my metadata model since its initial creation more than 7(?) years ago. Getting the IRC running again would be a huge boost to anti-vandalism efforts due to STiki and Huggle integration. All the code is already in place to ingest the IRC feed and make the CBNG queue the default for users again. West.andrew.g (talk) 04:48, 21 September 2018 (UTC)
- First, apologies for the non-replies to the pestering. I will speak to @DamianZaremba: in relation to modifying the CBNG code to get the feed going somewhere else (in theory, not hard) - - RichT|C|E-Mail 08:42, 22 September 2018 (UTC)
- Done FreeNode - #wikipedia-en-cbngfeed — Preceding unsigned comment added by Rich Smith (talk • contribs) 18:21, 22 September 2018 (UTC)
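The "almost-vandalism" queue discussed in this thread can be sketched as a two-threshold triage over classifier scores. This is only an illustration, not ClueBot NG's or STiki's actual code: the thresholds, the `ScoredEdit` type, and the assumption that the score lies in [0, 1] with higher meaning more likely vandalism are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
REVERT_THRESHOLD = 0.90  # at or above this score, the bot reverts on its own
REVIEW_THRESHOLD = 0.60  # at or above this score, the edit goes to human review


@dataclass
class ScoredEdit:
    rev_id: int
    score: float  # assumed: classifier output in [0, 1], higher = more suspicious


def triage(edit: ScoredEdit) -> str:
    """Place an edit in one of three buckets based on its score."""
    if edit.score >= REVERT_THRESHOLD:
        return "revert"
    if edit.score >= REVIEW_THRESHOLD:
        return "review"  # the "almost-vandalism" band
    return "ignore"


def review_queue(edits: list[ScoredEdit]) -> list[ScoredEdit]:
    """Return the near-miss edits, most suspicious first."""
    near_misses = [e for e in edits if triage(e) == "review"]
    return sorted(near_misses, key=lambda e: e.score, reverse=True)
```

A patrolling tool would then work through `review_queue(...)` from the top, which roughly matches how the thread describes STiki consuming CBNG scores as a prioritized queue.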
Why was the edit summary on CBNG reverts changed?
Between 17:44, 22 September 2018 and 17:59, 22 September 2018, from "Reverting possible vandalism by USER1 to version by USER2" to "Edit by USER1 has been reverted by ClueBot NG due to possible noncompliance with Wikipedia guidelines." Is ClueBot NG going to be branching out into reverting violations other than potential vandalism? Gatemansgc (TɅ̊LK) 22:56, 23 September 2018 (UTC)
- I don't know, but the word "noncompliance" coming from a bot sure sounds like a dystopian robot police movie quote to me. --AntiCompositeNumber (talk) 23:21, 23 September 2018 (UTC)
- Agreed. It does sound more harsh than "possible vandalism". Gatemansgc (TɅ̊LK) 23:40, 23 September 2018 (UTC)
- There was a discussion a while back around this: User talk:ClueBot Commons/Archives/2017/December#Proposed change to standard wording
- DamianZaremba or Rich Smith might have accidentally reverted the revert in the code. -- Cobi(t|c|b) 00:38, 24 September 2018 (UTC)
- Does that mean that the change wasn't supposed to happen because the discussion about it wasn't complete? Gatemansgc (TɅ̊LK) 01:02, 24 September 2018 (UTC)
- So checking GitHub, it looks like I didn't push the revert of the code, hence it probably got overwritten... however that push was from November so it would have been saying that for quite a while. Anyway, I'll get it changed properly - - RichT|C|E-Mail 07:23, 24 September 2018 (UTC)
- I disagree with changing the edit summary. ClueBot NG has always been designed to be an anti-vandalism bot, not an anti-COI or anti-OR bot. "Reverting possible vandalism" is about the best you can get that makes it clear that CBNG is an anti-vandalism bot while not sounding presumptuous about the bot being 100% confident the edit is for sure vandalism. Other anti-spam messages from around the Internet use similar wording, or even harsher wording than the one CBNG currently uses. We ought to remember that no matter what we do, there will always be some people that take offense to their edit being automatically reverted and will lash out; we already have people that lash out over human reverts, even if they're politely explained. In any case, "Edit by <user> has been reverted by ClueBot NG due to possible noncompliance with Wikipedia guidelines" is 1) way too long—both a very short and a very long edit summary intimidates users; 2) is vague and unclear as it doesn't link to any Wikipedia guidelines nor does it explain which guideline the edits specifically violated; and 3) says "Edit" as a singular entity despite the fact that the bot uses rollback and can revert multiple edits. Like I said, the bot was designed to catch vandalism, not COI editors editing inappropriately. —k6ka 🍁 (Talk · Contributions) 13:16, 24 September 2018 (UTC)
- The change was in relation to OTRS agent making a query, not any user or IP that was reverted. Please see the thread that Cobi linked above - - RichT|C|E-Mail 13:21, 24 September 2018 (UTC)
What's broken about the "Review edits for the dataset" thing?
Like, did it just crash and nobody has time to put it back up, or has it been broken from the start? Terrariola 09:13, 26 September 2018 (UTC)
An idea
How about making Cluebot NG ignore edits by other bots, so it won't end up reverting legitimate edits from other bots? Terrariola 09:49, 26 September 2018 (UTC)
- Terrariola, has it reverted any edits from other bots? It shouldn't, because it ignores users that have more than a certain number of edits (something like 50-150 edits). Galobtter (pingó mió) 09:59, 26 September 2018 (UTC)
- It hasn't yet, but it might happen with newly-added bots and it also clutters up the IRC feed. Terrariola 11:16, 26 September 2018 (UTC)
- CBNG ignores bots with the bot flag, I'm sure. This isn't anything new as I'm sure anti-vandal bots of the past have done it before. —k6ka 🍁 (Talk · Contributions) 17:45, 26 September 2018 (UTC)
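The two exemptions mentioned in this thread (accounts with the bot flag, and accounts above some edit count) could be combined in a pre-filter like the sketch below. This is not taken from ClueBot NG's source: the `Author` type, the `should_skip` helper, and the cutoff value are hypothetical, and the thread itself only guesses at a cutoff of "something like 50-150 edits".

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the thread only estimates "something like 50-150 edits".
EDIT_COUNT_CUTOFF = 50


@dataclass
class Author:
    name: str
    edit_count: int
    groups: set[str] = field(default_factory=set)  # e.g. {"bot"} for flagged bots


def should_skip(author: Author, cutoff: int = EDIT_COUNT_CUTOFF) -> bool:
    """Return True if edits by this author should not be scored at all."""
    if "bot" in author.groups:
        return True  # flagged bots are exempt regardless of edit count
    return author.edit_count > cutoff  # established accounts are exempt too
```

Under these assumptions, a newly approved bot with few edits is still skipped as long as it carries the bot flag, which addresses the concern about newly added bots raised above.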