Wikipedia talk:Edit filter/Archive 2


Auto-filtration

If a filter is designed to stop X from doing Y, will everyone who is doing Y be affected? This seems reminiscent of being caught in an autoblock.

Also, will admins be exempt from filters the way that they are from autoblocks? - jc37 01:52, 10 July 2008 (UTC)

Only if the filters themselves specify so; there is no wholesale exemption from filters. — Werdna • talk 01:53, 10 July 2008 (UTC)
I think I understand what you're trying to do, and for the most part, I support it. That said, the above response is concerning. Not because I'm looking for a "get-out-of-jail-free card" for admins, but because a filter that acts up (defined poorly, or acting in ways unforeseen) could, among other things, make the wiki a ghost town rather easily. I'm starting to think that access to filter creation (if implemented) should be on a tight rein, almost at developer/steward level. - jc37 02:00, 10 July 2008 (UTC)

There will be safeguards in place which prevent filters from going completely haywire, in the software itself. That said, I certainly do hear you on the matter of having very private filters, and this was, in fact, my original proposal. — Werdna • talk 04:53, 10 July 2008 (UTC)

For the purposes of the request for comments, it would be good to document what the "safeguards in place which prevent filters from going completely haywire" constitute; unless I missed them somewhere. —Sladen (talk) 12:41, 10 July 2008 (UTC)
AFAIK, they consist of technical restrictions which prevent the extension from reacting to more than X% of monitored actions. X is likely to be extremely low - say 2 or 3% - so trying to set a filter like TRIGGER: edit-namespace:0, ACTION:disallow will just result in the extension going off in a huff and refusing to trigger. I very strongly oppose tight restriction on abusefilter-modify - I would rather see limits on what actions are permitted than on who is allowed to set them. Happymelon 14:02, 10 July 2008 (UTC)
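For illustration, a minimal Python sketch of the kind of percentage-based safety valve described here — the function names and the exact threshold are assumptions, not the extension's actual code:

    # Hypothetical emergency throttle: a filter matching more than
    # MAX_MATCH_RATE of recent monitored actions stops being honoured.
    MAX_MATCH_RATE = 0.02  # assumed 2% cap, per the figure mentioned above

    def filter_is_throttled(matches_in_window, actions_in_window):
        """True if the filter has tripped the safety valve."""
        if actions_in_window == 0:
            return False
        return matches_in_window / actions_in_window > MAX_MATCH_RATE

    # A filter like TRIGGER: edit-namespace:0, ACTION: disallow would match
    # nearly every edit, exceed the cap, and simply be ignored thereafter.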

The real question is, if you're doing Y, shouldn't you be blocked as well? Given that "Y" is going to consist of "blatant pagemove vandalism", "replacing Evolution with text from Genesis", "Sockfarm creation" (that last most likely meriting a "warn to ANI" response rather than anything proactive), and other blatantly and unquestionably undesirable activities, why wouldn't you want users who participate in those activities, whoever they might be, to be prevented from doing so? Happymelon 14:02, 10 July 2008 (UTC)

Because computers make arbitrary choices based upon pre-selected criteria, and surprise us every day in how they interpret their criteria. (See: Category:Programming bugs for some "fun" examples.) I guarantee that I'm no Frankensteiner, but just like fire, or anything else we utilise as a tool, it has the potential to harm, as well as to help. So I don't think there is anything wrong with careful analysis beforehand, rather than just "lighting the fire", and subsequently burning ourselves in hindsight, because it did something we didn't expect. - jc37 20:03, 10 July 2008 (UTC)
I think I sort of understand what you are saying, but I don't agree that we should/should not put up the program on the basis that there are "bugs". As a programmer myself, I know that it is not possible to make a program of any sort of complexity that is completely bug-free. However, as you suggested, I think we need to analyze whether this "tool" will solve more problems than it will create. Vivio TestarossaTalk Who 20:21, 10 July 2008 (UTC)
And that is why we can run the filters on months of previous logs and see if and what mistakes they make. Something new may come along once in a while and confuse it, but the same thing happens to people. See #Dry Run 2. 1 != 2 20:18, 10 July 2008 (UTC)

Anything, any single thing, which is done to stop vandalism.....

.....is fine with me.

I have neither the time nor the energy nor the interest in trying to read and grasp the arcane arguments which have been put forward on the last 36" of scroll-down discussion.......

What needs to be SOLVED - NOW - IMMEDIATELY - is the issue of (mostly) unregistered half-wits who post what is little more than obscenities, and who are given a slap on the wrist with bullshit standard postings on their talk pages about "maybe this was an accident" or other such rubbish. (The registered ones seem to be soon clamped down on, thank goodness).

Somewhere, in the depths of Wikipedia, there was a discussion about not allowing an unregistered user who makes an edit to have that edit appear immediately, and to have it reviewed by an admin beforehand. WHAT HAPPENED TO THIS???????

If there is one single thing which would cause me to stop editing on Wikipedia, it is this very one: the constant, day-by-day crude and deliberate vandalism which appears on the (mostly in my experience) opera-related articles on which I work. (Check my "contribution" list for the last 2 years).

I do not suffer fools gladly: we need to admit that vandalism is vandalism. Most of it is NOT an experimental "sandbox" issue; it is not a "mistake". (Just look at how I spell out the words used on these vandals' pages: I'll QUOTE their obscenities right there for everyone to see - and post a "Please stop your disruptive editing. If your vandalism continues, you will be blocked from editing Wikipedia." warning, even if it is a first-time piece of editing.)

Viva-Verdi (talk) 04:00, 10 July 2008 (UTC)

Unless I have misunderstood something, unregistered editors are identifiable only by their dotted quad IP address. I know that my dotted quad changes often, as my ISP assigns me one from a pool when I log on (long digression about satellite service provider and dial-up uplinks omitted). Any kind of block based on the IP address would therefore catch the next guy to draw that number from the pool, and not necessarily the offender. Perhaps most home-based broadband users have a stable IP, but not everyone is in that boat. If you allow unregistered editors, this problem is intractable. That doesn't mean one shouldn't allow them, of course.
I assume the idea of reviewing edits by unregistered users before allowing them to go through failed on the vast amount of thankless work it would entail. Are you volunteering? Would having their first attempt at editing apparently cause nothing to happen either embolden or discourage new users? Have you ever noticed how many double posts you get in blogs or comments on news articles, because the poster didn't see his input appear right away? Even when a line appears saying the post is being moderated and there will be a delay? Perhaps new or unregistered users would benefit from an enforced "Show preview" before the edit could be saved, but if vandalism is intended, this would not deter anyone. (PS- the many misspellings in your comment above are tempting, but I restrain my urge to edit them.) TheNameWithNoMan (talk) 10:16, 10 July 2008 (UTC)
As for the code that requires edits to be reviewed before they affect the publicly-visible page, that's almost finished, and is being tested on the German Wikipedia right now. So it shouldn't be too long before there's a big row on some talk page somewhere as to whether to enable it or not, and if so what the settings should be... --ais523 15:46, 11 July 2008 (UTC)

My two cents

First, let me just say that I have not read the entire discussion on this page, nor do I intend to. I skimmed the discussions and I figured I should add my thoughts here.

I think this is a great idea; in fact, I was actually about to propose something similar myself. However, I think you've taken it a bit far. Being able to automatically desysop someone is completely unnecessary. How often do admins become involved in vandalizing Wikipedia? And even if that happened often, wouldn't it be easier to manually ban them than to deal constantly with a bunch of false positives? Same with rollback-equipped users and such. I like the idea of being able to stop blatant vandalism, such as page blanking and what-not, before it actually happens (to save editors/admins some trouble), but I believe going any further than that makes it no different from any bot we currently have, and we have many. Except this system would have much more power than any of those bots, thus making it dangerous. If we have 10-15 false positives every day from bots, what would that mean for this filter? Personally, I think it would make even more of a mess than the vandalism it's trying to prevent. Imagine having 5 falsely blocked users every day, and everyday contributors having rollback and other privileges stripped from them. Having a truly accurate filter would require a Google-size algorithm and lots of automatically-retrieved statistical information, neither of which is realistic. (That was not meant to offend any programmers here.)

About the issue of hiding the abuse log: I don't believe it should be hidden. Lots of people here have said that vandals could adapt and change their strategies, but I disagree. Most vandals on Wikipedia don't even know much about it, they just come because they think they are hilarious for writing "Mike is the coolest person ever!!!" on a random page. I guarantee that if this goes into effect, about 95% of those vandals won't even know about it. Sure, they might be surprised that all of a sudden they can't edit anything, but they aren't all of a sudden going to go look at the algorithm and try to get around it. The great thing about vandals (at least most of them) is that they don't care enough to do so. Sure, there will be the "smart" guys who'll figure it out, but no algorithm will be able to stop them anyway, so I don't think it's even worth trying.

Just my two cents. Sorry if this has already been posted somewhere, I'm not going to bother checking. Cheers. — FatalError 04:04, 10 July 2008 (UTC)

As I understand it, this extension won't be able to recognize the "Mike is the coolest person ever!!!" vandalism; this is left to recent changes patrollers and the external bots we already have. This is more to catch and block more serious acts of vandalism, such as disruptive page moves. The extension probably won't be triggered more than, say, 10 times a day. These vandals usually have good knowledge of our internal processes, which is why hiding the logs might have a positive effect (I personally am not that sure). -- lucasbfr talk 06:47, 10 July 2008 (UTC)
Oh, well in that case I don't see the point of this at all. If it only triggers 10 times a day, wouldn't it just be easier for admins and such to take care of it manually? These things usually don't take more than 5 minutes anyway. We wouldn't be saving that much manpower, and as I've said before, it would probably make more of a mess than the vandalism itself. And since they're "smart" vandals, they are going to learn to get around the filter, one way or another, so I don't see a point. — FatalError 17:57, 10 July 2008 (UTC)
Reverting pagemove vandalism, especially when there are subpages involved, is a very difficult process, especially when it's A > B > C > D; getting back to A isn't as easy as it would seem to be, and when you're talking about reverting hundreds of them, you're going to be taking a lot longer than 5 minutes. Them being more intelligent vandals is a good argument for not hiding the filters from the public, but not a good one for "we don't need this". Celarnor Talk to me 03:11, 11 July 2008 (UTC)

Statistics, please

In principle, I'd welcome additional anti-vandal measures; this one has added value over the bots, since the software has access to IP addresses of registered editors, which otherwise only checkusers have. I think the algorithm should be public, for the purpose of trust, because an algorithm that detects obvious vandalism should not depend on being secret, and because most vandals don't make an extensive study of methods to circumvent the system before getting started.

But what I am missing in the discussion here is some numbers on how big the problem is that this software will solve, and how many false positives we can accept and expect.

Based on statistics of the past, how many vandalism edits should it catch? Should it only deal with really bad serial vandalism, or with all run-of-the-mill swearing/page blanking? How many false positives would be caught? How many false positives are acceptable? Especially with things like IP-range blocking there can be big side effects. Will the software check for legitimate edits from a /16 range (including logged-in users) before doing an IP-range block?

Werdna wrote somewhere above that it will be tested against a few thousand edits. Although a test corpus of a few thousand is nice for testing and tweaking the rules, it should really be tested against a much larger number of edits for false positives of the unacceptable kind (blocks, desysopping, IP-range blocks) before the system is rolled out for production use. There should also be a way to pre-test new rules before they are applied.

Han-Kwang (t) 06:53, 10 July 2008 (UTC)

A rudimentary filter checking for page-move vandalism matched 288 of 294 pagemoves by users who were blocked in the next ten minutes. There were no false positives, and those which weren't matched were found to not conform to any repeating pagemove vandalism style. — Werdna • talk 01:00, 11 July 2008 (UTC)

Captcha

In actions when a rule is triggered, you could add asking for a captcha before the edit can be committed.

There are precedents for such action on vandalism (especially linkspam) filters on some wikis. I think even some smaller wikis run by Wikimedia have such protection enabled, albeit with much simpler rules for recognizing spam.

This would also be easy to implement, as Wikipedia already has a captcha system used for logins. An even simpler variant would be invalidating the user's login cookies, so they'd have to fill out the captcha again to log in, but that only works for registered users.

b_jonas 07:17, 10 July 2008 (UTC)
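To make the suggestion concrete, a small sketch of per-rule consequence lists with a 'captcha' action added alongside warn/disallow — every rule name and action here is invented for the example, not taken from the extension:

    # Hypothetical per-rule consequence lists, with 'captcha' as an action
    # that holds the edit until the user solves a captcha.
    RULE_ACTIONS = {
        'linkspam-pattern': ['warn', 'captcha'],   # suspected spam: challenge first
        'pagemove-flood':   ['disallow', 'block'], # blatant abuse: stop outright
    }

    def consequences_for(rule_name):
        """Actions to apply when the named rule matches an edit."""
        return RULE_ACTIONS.get(rule_name, [])

    assert 'captcha' in consequences_for('linkspam-pattern')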

Yes, I've had this in mind, but as yet been too lazy to implement it. — Werdna • talk 08:05, 10 July 2008 (UTC)

Case Study

Today, we had incidents of people from 4chan coming on and posting death threats and other nonsense in the sandbox. It's great that they did it in the sandbox — it meant practically zero cleanup. However, it would be really good if we could have quite simply stated that anybody who mentioned mudkips in the edit summary, or included "I am going to kill [name]", or something, on the sandbox, was blocked until further notice.

This is the sort of application I envisage for the abuse filter — being able to save plenty of admin time by finding problems like this, and spending five minutes on a filter, rather than having ten people reverting stuff for an hour. — Werdna • talk 08:09, 10 July 2008 (UTC)

Hm. I don't think the edit summary filters are a particularly good idea, especially the ones you mentioned; for example, someone actually editing Mudkip is liable to include that in their edit summary. Going through the dry run you posted below, it uses 'epic lulz' as a trigger. I've used 'epic lulz' in edit summaries before in discussions. I don't think it is at all appropriate, since that in and of itself is not at all related to page move vandalism.
These are the reasons that I didn't like the idea of having the filters closed; we would have had to wait for me to get blocked for sticking 'lulz' in an edit summary, or someone else get blocked for editing the article on Mudkip. Additionally, new editors might not realize there are words they aren't allowed to say, so there really should be some kind of mention of this in the new user pages. Celarnor Talk to me 14:22, 10 July 2008 (UTC)

I'm not talking about using these as sole criteria, of course. In this case, we'd be applying the filters to that particular page, for new and unregistered users, etc, etc. You should note that I'm not currently proposing to hide the filters. — Werdna • talk 14:27, 10 July 2008 (UTC)

Regarding page move vandalism

I must admit that there are technicalities here that I do not grasp fully, but my spontaneous approach to combating page-move vandalism would be to make the entire 'move' function semi-protected, only accessible to established users. A newcomer account clicking on 'move' would simply be referred to WP:RM. I'm not sure what that would mean in terms of wiki programming, though.

Imho, a newcomer account should ideally begin their editing at Wikipedia by contributing written text. Massive move schemes, even well-intended ones, are not suitable for people who have yet to learn the practices of the Wikipedia community. --Soman (talk) 09:29, 10 July 2008 (UTC)

This is what Autoconfirmed is for. Unfortunately, vandals know how to get around it. — Werdna • talk 09:59, 10 July 2008 (UTC)

I stand corrected. --Soman (talk) 16:12, 10 July 2008 (UTC)

Secrets

Hello,

I must admit I see a problem with secrecy in regards to this, but would it not be possible to do as the security-flaw researchers do -- that is, to have a delay on the release time of the secret information? I.e. you could, after one month, force all logs to become public? I think this would ease the secrecy concerns. User A1 (talk) 11:24, 10 July 2008 (UTC)

That's an interesting idea. — Werdna • talk 13:14, 10 July 2008 (UTC)

AGE and CONFIRMED units

Both the AGE and CONFIRMED units have a granularity and increment of seconds; however, AGE is in negative relative seconds while the CONFIRMED variable is an absolute binary-coded-decimal timestamp. This prevents any arithmetic combining or comparing the two. —Sladen (talk) 12:35, 10 July 2008 (UTC)

I suppose it would be possible to convert the CONFIRMED variable to a unix timestamp. Arithmetic isn't possible with the current filter syntax, although it's certainly worth doing. — Werdna • talk 13:16, 10 July 2008 (UTC)
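A sketch of the conversion Werdna mentions, assuming CONFIRMED is a MediaWiki-style YYYYMMDDHHMMSS string (the variable names and sample value are illustrative):

    import calendar
    import time

    def mw_timestamp_to_unix(ts):
        """Convert a YYYYMMDDHHMMSS timestamp string to unix seconds (UTC)."""
        return calendar.timegm(time.strptime(ts, '%Y%m%d%H%M%S'))

    # Once both values are plain seconds, AGE-style comparisons become trivial:
    confirmed_unix = mw_timestamp_to_unix('20080710123456')  # example value
    account_age_seconds = time.time() - confirmed_unix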

No such thing as vandal-proof

You can do anything you want to make WP as vandal-resistant as you'd like, but it seems like this is another project seeking to attain near vandal immunity. You know what they say, you can't make anything vandal proof - the world will just make a better vandal. In other words, I believe the difficulty of making a more effective anti-vandal bot/filter/script grows exponentially; it is ultimately an exercise in futility. Existing editors and bots are doing their best to handle vandalism, and they're doing a good (although not great) job at present. Security through obscurity is never effective, and besides, only the absolutely necessary amount of secrecy should be applied to WP. This filter would not be absolutely necessary and therefore thwarts the spirit of WP while adding a - in my opinion - trivial benefit. --ž¥łǿχ (ŧäłķ | čøŋŧřīъ§) 19:11, 9 July 2008 (UTC)

The security-through-obscurity approach has now been largely abandoned, leaving us with just a posh version of autoconfirmed, semi-protection, adminbots and RCPatrol all rolled up into one elegant package and delivered with love. I agree with you: it is not possible to make a wiki immune from vandalism without making it approved-accounts-only. That's not the point. Just because we can never eliminate all vandalism, would you have us not try to eliminate any of it? Every vandal who is prevented from vandalising by this extension frees up seconds or minutes of other users' time, with which they can build the encyclopedia. Let's not forget why we're all here: we want to write a free encyclopedia and give it away. Everything we do that is not part of that process is wasting our time; why would we not want to give some of those tasks, those that can be reliably done by an automated process with minimal risk of false positives, to a simple interface extension to handle, so we can all write another featured article? Or a new template. Or a better category structure. Or a nicer Main Page. Or anything that doesn't involve clearing up the mess created by vandals like these (1, 2, 3, 4). Who cares if it doesn't stop all the vandals? It stops some of them, and will do no damage to legitimate users now that it is in compliance with the wiki-philosophy of openness and transparency. Do you want to try doing its job? Happymelon 21:28, 9 July 2008 (UTC)
You make it sound as if this would be the first anti-vandal bot ever created. --ž¥łǿχ (ŧäłķ | čøŋŧřīъ§) 14:04, 11 July 2008 (UTC)

Impressive results!

But how usable is this filter going to be in the long run? The filter itself seems to make virtually no mistakes, but also seems to detect nothing that manual patrols would have missed, which leads me to think (confirmed by the example run) that mostly the blatant cases are being filtered out. Personally I notice during vandalism patrol that if a user gets caught red-handed, their next attempt is often much more sneaky. For example, try one is blanking the article, try two is altering a few numbers. If the user gets a message or a warning that one of their actions is not allowed, can the filter prevent them from ducking under the radar? This is hard even for a human, unless the second revert is a prejudged blind one, which it should not be. (Sometimes it stops after one bad edit, sometimes it doesn't.) Any thoughts on this? Excirial (Talk,Contribs) 13:24, 11 July 2008 (UTC)

As has been stated, and re-stated, and re-stated, the filter is not intended for simple vandalism. It is intended for repeated vandalism with a known modus operandi, which can be easily detected and filtered with next to no false positives. What you specify seems to lie outside that category. — Werdna • talk 13:33, 11 July 2008 (UTC)
If I understand it correctly, it functions much like rule-based antiviral software, whereas now the rules for the filter can be edited almost in real time by an admin? Interesting idea really, and I think the log can actually become quite helpful if some infrastructure that permits easy checking were present around it. It would require quite the effort to keep the rulesets up to date, but it seems the bot owners manage to do so. Perhaps a WP:RRS (Request Ruleset) page to allow non-sysop forwarding of new threats? Excirial (Talk,Contribs) 14:18, 11 July 2008 (UTC)
I am amused that many people's first instinct is to think of the appropriate acronym for a process page for the extension. Trust a Wikipedian. I'm sure discussion of filters will occur there, in the admin IRC channel, on the mailing list, and wherever else. It is noteworthy that each filter has a space for general notes viewable by anybody who can view the ruleset. — Werdna • talk 14:21, 11 July 2008 (UTC)
Since Wikipedia is built on acronyms, I'm not surprised that I tend to think in them first. Either way, thanks for the responses. It cleared up the entire idea quite a bit, and it certainly looks promising. Good luck with it, and I'm sure we will all see this in perfect working order soon. :) Excirial (Talk,Contribs) 14:30, 11 July 2008 (UTC)

Acceptable False Positive Ratio and False Positive Detection?

Hello there. Since this is all very quantifiable, what do you suppose is a good ratio we will allow? If you were to sample the American justice system, you'd find a false positive weighs quite heavily. How do you weigh it? 1:100? 1:1000? 1:3.14? Yeago (talk) 04:30, 9 July 2008 (UTC)

Are there any automated processes in place for detecting false positives, considering that many of those who may trip false positives may indeed be completely novice editors who may simply wander off after finding their contribution attempts ineffective? Yeago (talk) 04:30, 9 July 2008 (UTC)

I conceive that the uses of this extension will be so limited, that false positives will rarely, if at all, occur. In any case, the abuse log would provide adequate information as to which filters have been triggered, which parameters caused the trigger to occur, and so on. If users would monitor this page, then we have monitoring of false positives under control. — Werdna • talk 04:38, 9 July 2008 (UTC)
'So limited' = thousands of reverts per day? Yeago (talk) 05:32, 9 July 2008 (UTC)

No, I would imagine that it would trigger a few times a day, when WoW and Grawp and so on come to do their rounds. — Werdna • talk 05:50, 9 July 2008 (UTC)

Your filter isn't if user=='WoW'|user=='Grawp' is it? =) Yeago (talk) 01:07, 10 July 2008 (UTC)
If there were an automated process for detecting a false positive, then it would not be a false positive, because we would have automatically detected it as false and made it not a detection :-).
Somewhere Werdna mentioned he would test the filters on the last 1000 edits. Perhaps a lot of angst could be saved right now if Werdna were to show the results of such tests using proposed filters. This would allow us to see what the false positive results would be. Werdna could run such a test for a month or so to show that what he is saying is correct. Ttguy (talk) 14:24, 9 July 2008 (UTC)
An automated process, or at least an automated process which produced results for human oversight. I'm fairly familiar with automation. Yes, Werdna could, but shows no interest. Yeago (talk) 01:07, 10 July 2008 (UTC)
The filters can be modified, probably by admins, so the false positive ratio will vary depending on how conservatively it's used. Mr.Z-man 21:03, 9 July 2008 (UTC)
Yes, of course. The threshold could be raised or lowered, but Werdna doesn't seem interested in giving any specific criteria. Yeago (talk) 01:07, 10 July 2008 (UTC)

Details are below. Test runs on the last year's data indicated a 40% sensitivity and 99.8% specificity. — Werdna • talk 11:16, 13 July 2008 (UTC)

We need some more background

Or maybe it's just me. Basically I would want to have some statistics.

How many page moves are there per day? (I would think that this should be a relatively rare event.) How many of them are vandalism? Is it feasible to restrict the ability to move pages to admins, or would that overburden them? Maybe we can restrict the page move requirements in such a way that the top 20% (or so) of contributors to Wikipedia can move pages? This would mean making the requirements depend on the number of active editors; for instance, they could automatically be changed every month or every year. --KarlFrei (talk) 13:30, 9 July 2008 (UTC)

Why does this proposal have to be independent of the implementation of this extension? While I think that your idea is unlikely ever to gain the necessary consensus, countering pagemove vandalism is not the only thing this extension can do - in fact its ability to counter just about any kind of vandalism we can reliably identify is what makes it so powerful and useful. Happymelon 16:28, 9 July 2008 (UTC)
Looking at the move log for today, there have been more than 1000 pagemoves today, and there's still 3 hours left in the day. Since most move operations cause 2 log entries (page and talk page), this is still more than 500 per day. Probably fewer than 1% of moves are vandalism and reverts of the vandalism. Heavily restricting all moves makes little sense given that so few moves are vandalism. This would encourage new users to do copy-and-paste moves, which take even longer to fix than vandalism. Mr.Z-man 21:13, 9 July 2008 (UTC)

I estimate that 5000 moves of a recent dry run of 250,000 were vandalism. Therefore, it's about 2% vandalism. — Werdna • talk 11:15, 13 July 2008 (UTC)

Dry Run

Due to popular demand, I've gone ahead and done a dry run of a rudimentary anti-grawp filter to demonstrate the ability of this extension. The test was overwhelmingly successful — of the 25,000 pagemoves analysed (about 20 days' worth), 288 matched the filter. Of these 288, there were absolutely no false positives, as evidenced by every single user detected by the filter being blocked for pagemove vandalism.

Full results are available at Wikipedia:Abuse filter/Sample.

Werdna • talk 13:14, 10 July 2008 (UTC)

Thanks for running this test. It looks very promising indeed. Angus McLellan (Talk) 15:49, 10 July 2008 (UTC)

Based on these results, I support the implementation of this bot. Tim Vickers (talk) 15:51, 10 July 2008 (UTC)

Very impressive. I would support this filter as a matter of common sense now that it has shown itself to work so well. I suggest that each filter be given a dry run first and then approved by the community. I am sure this filter will be appreciated. 1 != 2 16:12, 10 July 2008 (UTC)
Oooooh, looks good! I remember you posting a link to a test wiki where the extension was operational - is this still available? Happymelon 17:36, 10 July 2008 (UTC)
Strong support, with until1=2's caveat :) Great work, Werdna :) SQLQuery me! 20:43, 10 July 2008 (UTC)

I've just done some false negative analysis, too. In the period examined, I looked for users whose accounts were less than a month old, and who were blocked within ten minutes of making a page-move. In doing this, I found 45 user accounts, who had, between them, made 295 page-moves. I then cross-referenced this user list with a user list generated by running the filter across the same period. The lists were virtually identical, except the filter had failed to detect three of the accounts, which had made 7 moves between them. These seven false negatives, after careful examination, were run-of-the-mill vandalism, not worth the risk of programming a filter to detect. In other words, this filter has detected 98% of the last month's pagemove vandalism, with zero false positives. — Werdna • talk 01:15, 11 July 2008 (UTC)
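The cross-referencing described above amounts to a set comparison; a minimal sketch with stand-in data (the real lists came from the block and move logs; the names here are invented):

    # Illustrative stand-in data, not the actual dry-run lists.
    blocked_movers = {'VandalA', 'VandalB', 'VandalC'}  # blocked within 10 min of a move
    filter_matches = {'VandalA', 'VandalB'}             # accounts the filter caught

    false_negatives = blocked_movers - filter_matches   # caught by admins, missed by filter
    false_positives = filter_matches - blocked_movers   # matched, but never blocked

    print(sorted(false_negatives))  # ['VandalC']
    print(sorted(false_positives))  # []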

That's awesome. —Giggy 03:40, 11 July 2008 (UTC)
Absolutely great work, yet again, Werdna. I'd probably like to see a little more testing, however, if it's not too much of a bother (maybe pointless, too, so, whatever :) ), maybe another month back... Perhaps on some other filters you may have in mind :) SQLQuery me! 03:45, 11 July 2008 (UTC)

I ran the same filter over the last 250,000 pagemoves (which go back to the 12th of December last year). The filter again showed its worth. Of the 250,000 pagemoves in that period of time, the filter matched 1496 moves. Of these matches, four are not currently blocked; one was a vandalistic page-move made on April Fools' Day. That leaves three false positives. One was a user converting an ASCII title to unicode (which matched the filter's provisions for adding large numbers of special characters to titles), and the other two were newbies moving their userspace articles out into articlespace (which matched the filter's provisions for moving other users' userspace articles to long titles). The latter two could possibly be removed by more tweaking of the filter. I think that one to three false positives in six to seven months is well worth the benefits the extension could bring. The final line, summarizing the results, is:

Checked all 250000 moves. Of these, 1496 hit the filter, and 4 of these are probably false positives (user is not blocked). Evaluated 583304 conditions.

I'm putting a full summary of the sample run at Wikipedia:Abuse filter/Sample2. — Werdna • talk 08:52, 11 July 2008 (UTC)

Previous estimates of the false negative rate appear to have been dramatically over-optimistic. A test of the filter over the longer period of time achieved the following statistics:

Checked all 250000 moves. Of these, 1565 hit the filter, and 4 of these are probably false positives (user is not blocked).
Of the 250,000 moves checked, 2951 resulted in a block within 10 minutes, but were not matched by the filter. These are probably false negatives.
So, 34.65 percent sensitivity, 99.74 percent specificity.
Note that sensitivity refers to the percentage of pagemove vandalism detected by the filter. Specificity refers to the percentage of moves matched by the filter which were pagemove vandalism. — Werdna • talk 10:44, 11 July 2008 (UTC)
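For readers wanting to check the figures, a quick reconstruction of the arithmetic from the counts quoted above, using 'sensitivity' and 'specificity' exactly as defined in this post:

    hits            = 1565  # moves matched by the filter
    false_positives = 4     # matched, but the user was never blocked
    false_negatives = 2951  # blocked within 10 minutes, but not matched

    sensitivity = hits / (hits + false_negatives)  # share of vandalism the filter caught
    specificity = (hits - false_positives) / hits  # share of matches that were vandalism

    print(round(100 * sensitivity, 2))  # 34.65
    print(round(100 * specificity, 2))  # 99.74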

Very nice! :) Another minor annoyance (that we're starting to see more frequently): would it be possible to use it to block this sort of edit? (admin only, sorry) SQLQuery me! 16:45, 11 July 2008 (UTC)

The short answer to that is "not yet". There are some changes that VasilievVV wants to make to my code to improve the scripting language, which should, theoretically, allow this. There are a few more hoops to jump through on the technical side. I suspect that we would have a single rule to block all instances of that behaviour, simply targeting the large number of HTML elements of a particular nature required for it. We could then have a separate rule which blocks that particular instance, with extra sanctions (seeing as there is no legitimate reason for it to be posted on Wikipedia). This could be done by picking a substring at random from the HTML code used, and targeting that, after normalisation and so on. — Werdna • talk 11:10, 13 July 2008 (UTC)
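A rough sketch of the 'pick a substring, normalise, then match' idea described above — the normalisation rules and the marker fragment are assumptions for illustration, not the actual rule:

    import re

    def normalise(text):
        """Crude normalisation: lower-case and collapse all whitespace."""
        return re.sub(r'\s+', ' ', text.lower()).strip()

    # Hypothetical fragment picked from the offending HTML, post-normalisation.
    BANNED_FRAGMENT = normalise('<div style="position: absolute')

    def matches_banned_html(new_wikitext):
        """True if the saved text contains the targeted fragment."""
        return BANNED_FRAGMENT in normalise(new_wikitext)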

Whitelist

Sorry if this has been discussed before, but is it possible to have a whitelist for users that the filter should not interfere with, similar to the Huggle whitelist? We would probably want the requirements for being on the whitelist much higher than Huggle's, but what do editors think about the idea in principle? Skomorokh 12:21, 11 July 2008 (UTC)

Well, this thing would be built into the MediaWiki software, so I think a whitelist would be much harder to do; it would probably need to be a new usergroup of some sort, and given past issues with compromised admin accounts, wouldn't be something we could tack on to an existing group. So I'm really not in favor of exempting anyone's edits from this sort of thing. MBisanz talk 12:24, 11 July 2008 (UTC)
Okay, I didn't appreciate the technical requirements would be so onerous. No point in adding to the usergroup soup. There's no way the thing could query a database like the Huggle whitelist? Skomorokh 13:11, 11 July 2008 (UTC)

It is not technically onerous, but I would prefer that we did not exempt users wholesale from the filters. The filters are flexible enough to allow such exemptions to be included in the filters themselves. — Werdna • talk 13:19, 11 July 2008 (UTC)

I agree that the filters will need to avoid certain users, such as bots, or people who do mass vandalism cleanup. Many filters, such as those that seek out people doing things with sleeper accounts, can be told to ignore users with bot/admin flags. Other actions may be so specific that false positives are unavoidable. The automated message given to any auto-blocked user should contain instructions to use a special unblock template, and an apology if it is mistaken. 1 != 2 15:18, 11 July 2008 (UTC)
Indeed, this is probably going to be desirable functionality at some point. I know, I know, it can be done in the filters themselves, though... Still might be useful (maybe a MediaWiki: page so admins can exempt people?) at some point. SQLQuery me! 05:54, 13 July 2008 (UTC)

If it were implemented, it would be implemented as a user group for exempt users. It would absolutely not be done in a MediaWiki page. — Werdna • talk 10:58, 13 July 2008 (UTC)

Good point :) SQLQuery me! 20:42, 13 July 2008 (UTC)

Whitelisting all admins would seem the best option to me. The group already exists, so there's no added bureaucracy, and if a major vandal has managed to get an admin account then they're likely to move on to more "interesting" vandalism than whatever they were doing before that got filtered. --Tango (talk) 20:49, 13 July 2008 (UTC)

Changes

How are adds/changes in the filters decided on? And how would changes in functionality and viewing permissions be decided on? In other words, how are we going to avoid scope creep in the filters and unauthorized changes in the functionality? RxS (talk) 15:26, 12 July 2008 (UTC)

Any answers to this? RxS (talk) 21:02, 13 July 2008 (UTC)
You may want to participate in the above discussion regarding the abusefilter-view privilege if you're interested in preventing unauthorized changes in the functionality. Celarnor Talk to me 22:18, 13 July 2008 (UTC)

Leaky admins?

The idea of restricting the coding of the filter to admins is pointless. I estimate that within 4 weeks of implementation, the text of the filter will end up wikipedia-r-you-know-where. Not necessarily a bad thing, either... Since the secrecy hinges on thousands of people - many opposed to the plan - keeping shtum, it's a daft idea, surely? ╟─Treasury§Tagcontribs─╢ 13:10, 13 July 2008 (UTC)

4 weeks longer than if we make it wide open. Regardless, what would such a leaky admin even have to gain? Helping someone like grawp perform juvenile vandalism? If an admin wanted to be a dick, this is not the most attractive way. 1 != 2 00:46, 14 July 2008 (UTC)
Yes, one could make an argument that leaking something like the content of a deleted page or incriminating IRC logs could serve some sort of higher purpose. Leaking the filters would just be leaking for the sake of leaking. The way I see it, if we let everyone see the filters, they'll be bypassed in a day or 2, maybe less. If we heavily restrict it to a group of highly trusted users, you get potential cabal concerns and you lose the responsiveness you would have if a large group has access. If we let a larger group of generally trusted users have access, it can be quickly adapted and used more effectively while still maintaining some security. Mr.Z-man 02:15, 14 July 2008 (UTC)

Random thoughts on another application

I've just been reading up on this with the attention it recently got through the Signpost, and this extension looks incredible, especially with all the test results posted above. While it looks as though this is mainly intended to work on vandalism, especially of the type done by our favorite recent fantasy-fiction loving page move vandal, I'm wondering if another application could be drawn from this.

Would it be possible to add another filter criterion, to focus filters on a specific user, to monitor them for edits made in violation of a community or ArbCom sanction placed on them in response to previously disruptive behavior (in layman's terms, a topic ban)? It doesn't seem like it's currently set up to do this, focusing mainly on wide ranges of users instead of one specifically, but this could be a potential asset to processes like Arbitration Enforcement. Obviously such an addition should only be made in serious cases - an "I'll unblock you on the understanding you avoid Namespace X for a few days" obviously wouldn't merit such - but this could be a tool used by the Arbitration Committee to enforce some of their more specific remedies that don't really require individual review, for example, one-revert probations. In that case, an arbitrator or ArbCom clerk could set a filter on User X to check for edit summaries matching the formats used by rollback, Twinkle, and the undo button, and more than one edit to the same article within 24 hours, in the article namespace only, set to disallow on the first offense, remove rollback on the second, and block temporarily on the third (a sketch of such a check follows below). A bot could be set up to automatically include these entries in the case's sanction log. Filters used in this manner would of course need to be double-checked from time to time (which, it being an automated system, should be done regardless of the use), but it would help enforce the remedies and reduce the mindset of "Well, if I only do it occasionally, nobody will probably notice, and I can still get away with it."
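A minimal sketch of what such a per-user check might look like, with every username, pattern, and threshold invented for the example (the real filter syntax would differ):

    import re

    # Summaries left by rollback, undo, and Twinkle-style tools (approximate).
    REVERT_SUMMARY = re.compile(r'(Reverted \d+ edits? by|Undid revision \d+|\bTW\b)')

    def violates_1rr(user, edit, recent_edits):
        """True if a hypothetical 1RR-restricted user reverts the same
        article twice within 24 hours in the article namespace."""
        if user != 'RestrictedUser' or edit['namespace'] != 0:
            return False
        if not REVERT_SUMMARY.search(edit['summary']):
            return False
        return any(e['page'] == edit['page']
                   and edit['time'] - e['time'] < 86400
                   and REVERT_SUMMARY.search(e['summary'])
                   for e in recent_edits)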

Obviously this is bound to be somewhat controversial, and I would in fact be surprised if it were put into use, but since it seems as though it would only require a small addition to the existing code, and it is what my first impressions led me to think this was in the first place, I thought I'd point it out for everyone to stew over. Shout at me if you like, and apologies if I happened to repeat anything said above (I read some discussions in detail, but skipped a lot of the shorter ones), but in a few limited severe cases, all subject to review and subsequent removal as deemed necessary, it could be another useful addition to this already great tool. Hersfold non-admin(t/a/c) 03:14, 14 July 2008 (UTC)

The use of the extension for this purpose is specifically discouraged. The reason is that there are better technical ways to achieve this aim. I would expect that such a use would cause a ballooning in the number of filters, each of which must be checked against every edit, with quite significant performance implications. There are plans in the works for better ways to enforce such specific restrictions on users. — Werdna • talk 03:35, 14 July 2008 (UTC)

I thought as much, just thought I'd throw it out there. Thanks for the reply.

Deployment

The way I see this proceeding is for the devs to install the extension with every action bar 'logging' disabled; we can then do 'live' testing without any skynet complaints. Then we can have a poll to take the extension 'live' by enabling the 'active sanctions' features. Does this sound like a promising idea? If so, we need to decide, once and for all, how this thing is going to be configured. So let's continue the discussion from the #Assigning permissions section above. Below are subheadings for each configuration setting that is available, and an 'initial setting' to prompt discussion. If you agree with the setting, please leave the section alone, no need for any "endorse" I think ATM (we'll have to have a poll later to 'endorse' all the settings to present to the devs); by contrast if you disagree (and while these are my personal interpretation of the debates above, discussion is urgently needed to hammer out the still-controversial ones), please do speak up. That way, we can easily see which ones are still controversial by which sections go on growth spurts. If, however, you agree with everything below, please drop a note to that effect in the 'general discussion' section so we can see you've actually read it all :D. Happymelon 17:38, 10 July 2008 (UTC)

abusefilter-private

'checkuser' Happymelon 17:38, 10 July 2008 (UTC)

I have a question. It is my understanding that, through a conventional checkuser interface, if a CU wants to get private information, that information request is logged for other checkusers to audit.

Will this function contain private information that would normally require a CU to be logged to obtain? Will this function log all accesses to it from CUs? My only concern is that this tool could be used by a CU to gather information without the transparency that is supposed to be in place. Or perhaps I am wrong about how things work. 1 != 2 15:23, 11 July 2008 (UTC)

Does nobody know about this? 1 != 2 14:55, 13 July 2008 (UTC)

I wouldn't imagine that this would need a log. The privacy policy states that IP addresses may be released in cases of abuse, I'm not sure anybody will be terribly upset if the checkusers can find the IP addresses of people who have triggered filters — it simply makes things much easier (having the information "all there" for quick responses). — Werdna • talk 02:40, 14 July 2008 (UTC)

abusefilter-log

'*' Happymelon 17:38, 10 July 2008 (UTC)

abusefilter-log-detail

'*' Happymelon 17:38, 10 July 2008 (UTC)

abusefilter-view

'*'?? Happymelon 17:38, 10 July 2008 (UTC)

Since these filters are being used against specific types of vandalism made by a person or small groups of people with a specific modus operandi, I don't think it is a good idea that they can just look up what filters are being used to prevent them. Grawp, for example, could easily look up his filter and change what he is doing just slightly. With private filters he would have to burn up his sleeper accounts trying to figure it out, and when he does figure it out we have a good chance of noticing that the filter missed it and fixing it before he can do a large-scale deployment with multiple accounts. He could not create such a large-scale automated attack without first revealing his new method of vandalism, whereas if he could read the filters he could plan it all without giving us any prior example of what he intends to do.
Security through obscurity is a terrible model when it is put up against a huge group of people, but this tool is meant to go after very specific types of attacks by small groups or even individuals, and in that situation security through obscurity does work. Keep in mind that an open security system may use open-source software that is publicly audited, but they will still keep configuration files a secret, as otherwise they may make brute-force attacks easier, or even just let the attacker know what methods will not work. Open security does not mean all your books are open. 1 != 2 20:23, 10 July 2008 (UTC)
Considering the capabilities of the bot, I think that the only safe route to go is this one; the above comparison with FOSS security software is a little off with regard to closed elements: seeds for encryption software don't dictate the algorithm, and .confs don't contain any super-secret extra algorithms or the like that are hidden from the rest of the world. The idea is to keep everything open so anyone can look at it and suggest or implement improvements. If a vandal is determined enough to learn the filter syntax and go through all the filters before his attack, having a closed one isn't going to help any; they can simply buy/work an account up to sysop and then use it to read the filters. Having them closed gives us a lot of negatives, namely fewer knowledgeable pairs of eyes on the filters, hindered improvement, and potential for massive systematic abuse. In my opinion, the minor (if any) protection offered by restricting read access to the filters to just the administrative team is grossly offset by the benefits of complete openness. Celarnor Talk to me 01:47, 11 July 2008 (UTC)
  • "They might do positie work while pretending to help herd the cattle, so instead, let's just leave the corral barn door open for anyone, so they don't have to." - Logical?
Coulda, woulda, shoulda. If that's what the reasoning is, then let's make them jump through those hoops. Think of the good work that would have to be done to actually scale the walls that are RfA. Compare that to leaving this open to the thousands (millions?) of viewers. Doesn't compare in my estimation. If they are that determined anyway, they would have access to far more things than just viewing filters. Sorry, but I really do not want to see much, if any of this in the hands of the general editor, much less general reader. Logs which indicate why you (or someone else) was/were blocked? Sure. Everything else? No. - jc37 03:27, 11 July 2008 (UTC)
That's just one of the reasons that we shouldn't keep them secret. You may think that it hasn't happened before, but it certainly has. There have been stories about people selling sysop accounts on Craigslist, people like Runcorn who worked their way up to sysop so they could unblock Tor nodes ... it's certainly a problem, and when you're talking about something that could (depending on the throttling features) block hundreds of users a minute, you need to have as many eyes looking at them as possible. I don't care whether they passed an RfA or not. Being an administrator isn't a big deal anyway, and it's not like they're automatically better people simply because they have an extra bit; by having the capability of modifying filters without the oversight of the community, they're endowed with considerably more power than they would have had otherwise. Rather than increase the threshold of adminship, which would be the alternative solution, I think it is a much better idea to increase the transparency of the tools.
Preferably, I'd like to keep it open to the general public, but if that isn't acceptable, autoconfirmed users or at the very least rollbackers should be able to view them, or the majority of them (see below). That said, there aren't really many reasons why we should keep all of the filters secret (i.e., some you wouldn't be able to get around at all, like a single filter that triggers upon moving the main page to some hagger variant); the goal should be to keep everything as open as possible, and close things only when absolutely necessary. While I recognize that in some cases, where only weak, vague heuristics can be applied or which otherwise suffer an unsafe margin of false positives, secrecy may be warranted, the default position should certainly be one of editors being able to review the filters; that itself carries a number of non-philosophical, more pragmatic advantages in terms of improvement, keeping tabs on editing of said filters, prevention of abuse...
The minimal benefit that security through obscurity provides is entirely offset by the psychological problems and false sense of security alone ("Oh, the vandals can't see it, so it doesn't matter"); the work that goes into making a heuristic capable of withstanding visual analysis only makes a better heuristic, and I think that alone is enough to endorse a position of openness; but in this case, there are other offsets, both practical and philosophical, in this environment that make it even more of a negative. Celarnor Talk to me 05:33, 11 July 2008 (UTC)

I am happy with adding a 'hidden' attribute to individual filters to hide them from public view, as a compromise. — Werdna • talk 02:54, 11 July 2008 (UTC)

I don't really see that working as a compromise. There would have to be a decision making process to decide which of the filters are weak enough to need to be hidden, and to keep it from becoming cabalistic, it would have to come from the community; by doing that, you've already released the content of the filter, which doesn't satisfy the 'security through obscurity' crowd. Conversely, having a cabal of editors decide which filters to remain hidden probably won't satisfy the 'open and transparent' crowd, since said cabal could simply hide whatever they wanted to without any community oversight or involvement. Celarnor Talk to me 03:21, 11 July 2008 (UTC)

The rules are hidden, but the (mandatory) descriptions of them are not, nor are which edits they apply to (the abuse log can easily be sorted by filter), nor are the actions assigned to them. — Werdna • talk 03:24, 11 July 2008 (UTC)

True; assuming that there are descriptions of them that are as accurate and specific as possible without giving away exact details (i.e., not "Pagemove vandalism filter", but "Quickly moving large numbers of pages to unicode titles" for moving >= 30 pages over the course of a minute with titles containing unicode), I suppose that wouldn't be a problem if it was done in that way. Celarnor Talk to me 03:30, 11 July 2008 (UTC)
Look, 1 editor in 50 here is going to actually be able to understand these filters. Everyone else can see both an accurate description and the dry run results, and the results of it actively running. This argument seems more about "I want everything open" than "there is real use to this being open". Instead of sysop, why not a "filter" access group so that non-admins who are trusted can view it?
Seriously folks, you can argue for weeks to see the filters, get your way and then nobody will use them because they are in an arcane symbolic language. Then they will not work because grawp just loaded his bookmark in the morning and changed his method slightly. If we are going to have open filters then we might as well just forget about using this to target individuals who bother to take a bit of effort.
In my experience, when a config page can be viewed by non-admins but edited only by admins, and is in a non-simple language, regular editors don't even bother to suggest edits on the talk page. See User:HBC NameWatcherBot/Blacklist for an example.
Despite this the community will have control over this system. They can discuss what they want, and what they think of the results. You tell your Chef you want a Yorkshire Pudding, but you don't need to go into the kitchen and tell him how much water to use. 1 != 2 15:08, 11 July 2008 (UTC)
Sysops plus 1/50th of all editors is still better than just sysops. Besides, only one in fifty of the sysops are going to be able to understand this; that point only really works if you're advocating viewing privileges to developers alone. In any case, we're not talking about regular expressions in perl here; from what I've seen, it looks to be a rather simple scripting language. It doesn't look particularly esoteric, and I don't think it would take much for most people to figure out how to use it; the scripting language itself isn't obfuscated, and since the documentation is going to be entirely open, I don't quite see how people who want to won't be able to figure it out.
That's a moot point; it could be that your example doesn't warrant the same kind of critical attention from the community that this requires. It could be that it's really an admin-only type of thing (i.e, adding names to the blacklist after they have blocked the editor in question) and doesn't really need input from the community as a whole. We aren't discussing the blacklist for that bot, which doesn't have any reason to be hidden; we're talking about the filters for this extension.
That's a terrible metaphor. Your Yorkshire pudding doesn't have the power to block people. Your Yorkshire pudding can't cause mass chaos that could require days to repair; your Yorkshire pudding is your food. If I'm working on a DBMS schema for enterprise-level deployment, you bet your ass the people that contracted me are going to want to see the schema and not just my description of it. There's simply too much risk involved to assume that everything is going okay, and there's really no reason to hide things that don't need to be hidden; openness should be the default unless circumstances dictate that they be hidden, which I concede that some of the weaker and super-specific heuristics may. Celarnor Talk to me 18:01, 11 July 2008 (UTC)

It is my view that we should hide all filters from non-sysop viewing, if and only if circumvention becomes a problem that cannot be dealt with by hiding the filters which are being circumvented. — Werdna • talk 11:03, 13 July 2008 (UTC)

Not sure if it is technically feasible, but it would be nice to have them all visible except for those we need hidden for one reason or another. 1 != 2 14:57, 13 July 2008 (UTC)
I guess we can all agree on this, then. If possible, those that circumstances dictate need hiding can be hidden, and those that don't stay open by default; if it turns out that keeping everything open is a miserable failure, we can always tweak the filters' parameters and hide them. Celarnor Talk to me 16:12, 13 July 2008 (UTC)
As long as I can see exactly what the filters are doing in the log, I don't need to know the details of how that outcome is achieved. If a tool works as advertised I don't feel any need to take the back off and poke around inside. Tim Vickers (talk) 18:19, 14 July 2008 (UTC)
In an ideal world, that would be true. However, without access to the filters themselves, all you can see is the results achieved, not the mechanism by which they were achieved. Correlation does not imply causation. While there are mechanisms in place to throttle the extension so it can't provide mass abuse, there's nothing in place to keep it from targeting a specific editor who is on the bad side of the filter controlling cabal or the like; without being able to examine the filter parameters (or going through relevant edits made by all the blockees, a very tedious and unnecessarily indirect process), there is no way to discover the specifics of the situation. Celarnor Talk to me 20:02, 14 July 2008 (UTC)
If the results of the filter are reasonable and it only picks up obvious vandalism, as it is intended to do, then there would be no reason to think that would be a problem. If non-vandal edits are picked up by a filter, then we'd need to work out why and look at what the code is doing. However there is no need to solve that hypothetical problem in advance. Indeed, looking at the false negative/positive rate from the test run above, this is not a concern with the current filter. Tim Vickers (talk) 23:24, 14 July 2008 (UTC)
For that filter. That just indicates that filter is successful in what it was meant to do, and having seen both the text of the filter itself and the non-existent number of false positives, I wouldn't have a problem with it being used. However, that may or may not be true for those that come later. The extension allows for multiple filters, not just that specific pagemove vandalism heuristic. It doesn't necessarily follow that those that follow in its footsteps will be equally successful with regards to false positive ratios, or that their writers won't be abusing their privileges. The easiest solution to these issues is wider transparency with regards to the filters. Even as a hypothetical problem, considering its scope and magnitude, I think it easier to solve it now, when the solution is so simple, than to deal with these problems later as they manifest and we have to clean up a mess. Celarnor Talk to me 02:26, 15 July 2008 (UTC)

abusefilter-modify

'sysop'? Happymelon 17:38, 10 July 2008 (UTC)

$wgAbuseFilterAvailableActions

Initially: → array( 'flag' )

As I said, we should (IMO) be looking first and foremost to get this extension installed. Let's not try to run before we can walk: let's leave the discussions about whether to block or desysop in their sections above - they're all valid debates and they do need to be resolved, but they need to be resolved before we ask the devs to take the extension 'live'; they don't have to be resolved right now. They're important questions, so we need to take our time over them; but the sooner we get this on the devs' desk for performance review and installation, the better. Happymelon 17:38, 10 July 2008 (UTC)

Technically speaking, it'd be just a blank array, since 'flag' doesn't really exist as an action. We'd also need 'throttle'. There is merit in adding 'disallow' and 'warn', too. — Werdna • talk 00:22, 11 July 2008 (UTC)
My argument is that the extension shouldn't be allowed to do anything active when first installed, so we can run some live tests without any skynet complaints. When it goes live I personally think we should have everything (preferably with 'degroup' filters needing to be set by stewards); but that's another story. Happymelon 09:36, 11 July 2008 (UTC)
Wtf, nobody is mentioning Skynet except in jest... something is odd here. Wait, perhaps I am the only human left! Ummm, I mean... Beep. 1 != 2 15:15, 11 July 2008 (UTC)

Well, we could shoot for disallow and blockautopromote too, and see what the community says. You seem to be making concessions to non-existent opposition. — Werdna • talk 11:02, 13 July 2008 (UTC)

General discussion

In particular, if anyone thinks that I'm way out of line with the above, or that this game-plan for deployment is totally mad, please do flame me below :D Happymelon 17:38, 10 July 2008 (UTC)

  • I think these, like anything, should restrict access to only those who may potentially "need" the information or access. (Needing the "tools to do the job".) The "titles" of the various parts seem a touch confusing at times as to what refers to what. But to try to make some sense of it, here's what I think, anyway:
    • abusefilter-modify: allows holders to modify filters which are applied by the Abuse Filter. - (Presuming that this includes "creating" a filter.) I think this should be strictly restricted, at the very least to those with demonstrated coding experience. I think that this should be more of a "by request" thing, where developers (whom we trust with such code) do the actual coding. I understand "too busy", but from what I understand, they've been doing this anyway. And once in place, automatic filters should be easier to maintain than several bot accounts (among other things).
    • abusefilter-private: allows holders to view private information on the abuse log. - checkuser? Oversight?
    • abusefilter-view: allows holders to view filters active on the site. - I'm leaning towards developer only, but not sure. If it's broadened, then probably to those who can unblockuser or checkuser.
    • abusefilter-log-details: allows holders to view detailed information about individual events in the abuse log. - I think that anyone who can perform unblockuser or checkuser should be able to view the filter log details.
    • abusefilter-log: allows holders to view the abuse log. - I'm presuming that this just lists that a filter action was taken concerning a user, and what that action was (blocking, etc.) If so, then anyone with an account should be able to view.
    • All of the above presume that developers have access to them all. (Dunno how much access developers have to "privacy"-related info, so dunno about the privacy one.) And presuming that Stewards could technically have access at any time, since they can give themselves the ability. And Oversight might need to be able to view everything.
    • If I'm missing something, please clarify. - jc37 22:49, 10 July 2008 (UTC)
      Only the fact that none of these permissions requires access anything like as high as you've set them :D. Other than abusefilter-view-private, which must be granted only to checkuser, we're at liberty to set these how we want. The developers have the technical ability to do anything on-wiki - anything they can't officially see or modify, they can do directly in the database tables. For an example, check out the history of MediaWiki:Copyright. See the top two entries? Now try and find the log entry where Midom got his sysop rights :D. The developers are the gods on Wikipedia, and hence they are far too busy to take an active role in vandalism-fighting on any particular wiki - apart from having 749 other wikis to similarly maintain, they are responsible for the MediaWiki software as well. As you can see from Special:ListUsers/developer, they aren't even around here very often - pretty much the only pages they edit are WP:VPT. They haven't been "doing this anyway", which is why we've had to improvise with adminbots etc. Now that we have the technical ability to defend ourselves, why would we want to pass that power back to the devs?
      There is no one on-wiki with "indicated coding experience for some time". This extension uses a unique filter programming language, which is nonetheless very easy to understand. Any technically-literate person could have a reasonable crack at making a filter and is no more likely to screw up than our best bot operators. As ever, the problem is likely to be unanticipated legitimate uses of proscribed behavior, which has nothing to do with coding skill. Furthermore, this extension will only be useful if we are able to constantly update and tweak the filters to match the vandals' latest moves; this will be even more important if, as seems to have been agreed, security through obscurity has been largely abandoned. The higher up the tree we set these permissions, the less useful the extension becomes. Happymelon 09:48, 11 July 2008 (UTC)
      Hm. I was basing my comments on what Werdna said above concerning the bots. Perhaps I was presuming that the developers were the ones running such. My apologies for misunderstanding.
      That said, I somewhat agree with 1 != 2. I don't think just "anyone" should be configuring these filters. (Which I think is Celarnor's point as well, though Celarnor seems to have a different suggestion for dealing with the concern.)
      I think we're all concerned with the possibility of misuse and abuse, though I feel that cases of misuse are more likely than abuse, due to well-intentioned "help". As such, I'd like to see more "professional" hands holding these keys and utilising these tools.
      It looks like what is "seen" can be modified in several ways by Werdna (hidden, and otherwise), and so who can view what can be another thread. But I strongly feel that only the developers (those we trust with such code) should be doing the actual inputting of the filters. If there was a group called "trusted/admin coders" (a coder version of admin - a different set of tools based on different "needs"), I'd be all for them having such access. But we don't have such a group, and creating such a group arbitrarily by selecting a few people to have filter access seems wrong as well (that cry of cabalism rears its ugly head : )
      We just don't really have anything that's more than a sysop, but less than a developer. (Well we do, but bureaucrats are slightly different. But then, maybe that's the simplest answer, I dunno.)
      I do welcome ideas. - jc37 04:51, 12 July 2008 (UTC)
      Bot operators seem like obvious choices for this new trusted class. Most of them were able to enter the bot operator class by demonstrating their technical skill without raising much user protest, and that's the skill set also required here. It's not a cabal if anyone can progressively enter the class, beginning with constructing and operating a responsible bot.
      All bot operators have coding experience and some of them are specifically experienced with anti-vandal programming. If they are made personally accountable for the filter sets they program, how is that significantly different from responsible bot operation? Milo 07:20, 12 July 2008 (UTC)
      Because besides the handful of antivandalbots (I think we have 2 or 3 running now) and some (but not all) of the semi-secret adminbots, most bots don't have anywhere near the effect that these filters can. If most bots screw up, it's a few minutes of reverting and recoding, then back to business as usual. Mr.Z-man 14:55, 12 July 2008 (UTC)
      I used Wikipedia:Abuse filter#Safeguards – i.e., Werdna considered worst cases – in my assumption that bot operators were up to the task. But I don't know. Let's see what Werdna, 1!=2, and Happy‑melon have to say about worst-known-case recovery time. Milo 20:40, 12 July 2008 (UTC)
      The other problem with bot operators is that it restricts it to a fairly small group of people. We have fewer than 500 flagged bots, and some people operate many bots. Many (most?) bot operators aren't admins and giving them access to the filters would allow them to block people through the filter. Some have even failed on RFA. We also have lots of interwiki bot operators, for many of whom this isn't their home project and they may not speak good English. Mr.Z-man 21:06, 12 July 2008 (UTC)
    Z-man (21:06): "The other problem with bot operators is that it restricts it to a fairly small group of people."
    I counted the whole table at Wikipedia:Bots/Status#Registered bots (472), counted active bots as listed (177), purged those bots with no owner name (mostly "discontinued"), extracted a list of named bot owners (261), deleted duplicates leaving unique owner names (207) (counts not error checked).
    Since there are 54 duplicate names, subtract that from 177 for a low estimate of 123 operators (it could be figured exactly, but I didn't).
    Based on an estimated range of 123 active operators (including an unknown number of ESLs) up to 207 uncertainly available, this depth of talent may be adequate for Abuse filter parameter programming – provided the screwup risk isn't too high. (And if the screwup risk is really that high, why wasn't AF proposed as a developer-only tool rather than being for "privileged users"?)
    At Wikipedia:Bot Approvals Group, I counted 21 active members who approve some/most bots. All but three of these are admins.
    Z-man (21:06): "Some have even failed on RFA."
    Not significant without a case-by-case analysis. I know an admin coach who didn't pass RFA. I assume that WP:BAG can distinguish operators with excellent technical judgment and at least adequate interpersonal skills from those who lack them.
    Z-man (21:06): "giving them access to the filters would allow them to block people"
    I thought blocking had been dropped as a contentious Abuse filter function? Anyway, my solution to that problem is for the community to sort out the known high-risk vandalisms, and approve for those a 20-minute block log-labeled "NO FAULT TRIGGERED BLOCK - USER MAY BE INNOCENT OF WRONGDOING" (I suggest 20 minutes for the community to manually react, as just today I noticed an old WoW pagemove vandalism which took 21 minutes to revert). Milo 03:27, 13 July 2008 (UTC)
    Z-man (21:06): "interwiki bot operators ... may not speak good English"
    Yes, that's a serious problem if writing high-risk (are they?) filter parameters. However, the vetted Abuse filter technology needs to be rapidly disseminated to other language wikis. Possibly an apprenticeship can be arranged where an interwiki ESL novice operator, an EFL journeyman AF operator, and a translator can form an Abuse filter parameter training team. Milo 03:27, 13 July 2008 (UTC)
  • Everyone here seems to be under the impression that this involves some kind of incredibly complicated scripting language; that isn't the case. It's very simple, and any technically-minded person who wants to can easily review the relevant documentation and learn the scripting language; even someone already versed in Perl/Ruby/PHP/whatever would have to do that, since it isn't a preexisting language per se. Celarnor Talk to me 14:26, 12 July 2008 (UTC)
      • I think it's fair to say that the majority of editors would actually not have the experience you describe. Most would need a bit of learning time, at the very least.
      • From my own experience, most people haven't even the experience to set the clock on their electronic devices.
  • I don't understand what you're saying; there's nothing to keep anyone, including you, from reading the documentation and learning how to read and write the filters; they don't have to know how to read PHP and SQL syntax unless they actually want to review the extension itself. Sure, some people aren't capable of the linear logic and the like involved, but anyone who wishes to is free to learn how to read them. Celarnor Talk to me 02:46, 14 July 2008 (UTC)
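(As a taste of the language under discussion - a minimal sketch with an invented condition, not an actual filter - a rule is just a boolean expression over variables describing the edit:)

    !("autoconfirmed" in user_groups)
    & article_namespace == 0
    & lcase(added_lines) contains "some blacklisted phrase"

An edit matching all three conditions would trigger whatever actions that filter is configured to take.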

Collateral Damage

This proposal has its merits, but it also has an Achilles heel: from what I have read, there will be quite a lot of well-meaning contributors who will be stuffed around by this proposed extension   «l| Ψrom3th3ăn ™|l»  (talk) 05:38, 11 July 2008 (UTC)

See the dry run above (#Dry Run) - there won't be. Where did you hear this from? —Giggy 05:45, 11 July 2008 (UTC)
Quite. The first test of the system has had a 0% false positive ratio. Happymelon 09:51, 11 July 2008 (UTC)
Arguably, this filter was very restrictive. We are only safe if we use that kind of filter (which will have tons of false negatives). -- lucasbfr talk 10:38, 11 July 2008 (UTC)

Of course, the second test of the system had a mere 99.8% specificity. — Werdna • talk 12:36, 11 July 2008 (UTC)

I think the community will be checking both dry runs and active running to determine if a given filter is worth the risk of collateral damage. 1 != 2 15:16, 11 July 2008 (UTC)
Yes, and note that private mbots could be dry-run tested prior to active running, without compromising the private filter parameters. Milo 19:28, 11 July 2008 (UTC)
I have no recourse but to object. The other solution is for everyone to allow all edits and interdict as needed, as exists now. My business school profs always said "Never say you are sorry". I object to the filter on the grounds that this is the finest encyclopedia we have on Earth today; why mess it up? Sswonk (talk) 23:59, 11 July 2008 (UTC)
How will this "mess it up?" Mr.Z-man 00:12, 12 July 2008 (UTC)
If implemented correctly and with sufficient community oversight over the filters, the danger to content is minimal; as long as the community at large is there to watch over the filters, there isn't any harm in having them there, since the system is only designed to be used against very obvious vandalism with very specific MOs (i.e., HAGGAR). The danger isn't in the technical power of the extension, it is in what might happen if there aren't enough editors capable of watching over the system to make sure it doesn't get subverted. Celarnor Talk to me 05:05, 15 July 2008 (UTC)

De-autoconfirmation

I'm still very much opposed to de-autoconfirmation. This commit message makes it clear to me that the developer of this extension cares very little about the constructive contributors who will be victims of it. If this extension goes live some administrator WILL screw up a filter so it blocks everything. That is an absolute certainty. If you don't believe me, see the title blacklist talk page; on several occasions creation of most or all titles has been accidentally blocked. When that happens with this thing, that's going to be at least a few hundred users de-autoconfirmed before anyone manages to fix it. And they're supposed to just quietly accept their fate if this "hacky tool that I haven't put much effort into" doesn't work? -- Gurch (talk) 12:34, 22 September 2008 (UTC)

Have you even read this page? Any filter which blocks more than 5% of edits is automatically disabled by software after as few as two triggers. So, in your given situation, if a filter is created that blocks everything, two users will get their autoconfirmed status revoked by mistake, before the filter is switched off, and a big warning placed on the edit page for that filter.

I will also thank you not to make this a personal issue, as you have in writing "the developer of this extension cares very little about the constructive contributors who will be victims of it". There is no need for theatrics here. The tool worked (although, admittedly, it let anybody do the re-autoconfirming, so it's been reverted). It's not like it'd be harder to use than a better-designed tool, it was just put in a weird place, and some of the links needed weren't sprinkled throughout logs and things. Comment on the content, not on the contributor.

Please inform yourself on the abilities of this extension, and base your discussion around that, before you start posting rants.— Werdna • talk 14:43, 22 September 2008 (UTC)

Sorry. The threshold was 50% when I posted that comment; you changed it immediately before you replied. :) Am I correct in thinking that 'autoconfirmed' cannot be added/removed through Special:UserRights? If so, given that the code for restoring it has been removed, does that mean all such restorations would have to be done manually by a developer? -- Gurch (talk) 17:57, 22 September 2008 (UTC)

Yeah, the change was a response to your concerns, as well as those of others. I thought it might be prudent to go for harsher limits, and lift them if they become problems, rather than the reverse, so I guess I was a bit harsh in quoting that as the current limit, although the same logic would apply for 50% with filters blocking all edits. As for restoring autoconfirmed, what you say is true, but that revision isn't going to stay reverted, it just needed to be removed until I actually come up with a decent mechanism with access control. You can rest assured that neither Brion nor myself will allow the abuse filter to go live with such an oversight :-)Werdna • talk 04:48, 23 September 2008 (UTC)

I'd rather keep things as close to the status quo as possible until there's sufficient data to support that the community is able to keep control of whoever controls the filters, and go from there if it turns out to be useless, rather than give them everything and then let the community plead to Arbcom to fix it later. Best to start with everything set as low as possible to test the waters, then grant more rights as the need arises if it seems like the community can keep tight controls over everything. Celarnor Talk to me 14:45, 23 September 2008 (UTC)
Once bug 15702 is implemented, I think most of the opposition to de-autoconfirmation will be dealt with. עוד מישהו Od Mishehu 06:24, 24 September 2008 (UTC)
Somehow, I doubt any of the people opposed to de-autoconfirmation were doing so solely on the basis that it couldn't be manually assigned without SQL statements. Celarnor Talk to me 15:37, 24 September 2008 (UTC)

SQL statements wouldn't work, either. You'd have to do what I did and make code changes. — Werdna • talk 00:02, 25 September 2008 (UTC)

Indeed, Autoconfirmed status is not kept in the user_groups table like 'sysop' or 'rollbacker' etc are. SQLQuery me! 02:15, 25 September 2008 (UTC)
That's...really odd. Why was it implemented that way rather than as a usergroup? Celarnor Talk to me 00:59, 26 September 2008 (UTC)

It is a user group. It's an "implicit group", one which is assigned based on criteria, rather than manually assigned (i.e. the software checks each time whether a user should be autoconfirmed). Putting it in the user_groups table would be akin to marking a list of first names as male names and female names – not necessary, you can usually figure it out just by looking at the name. Likewise, the software already has the number of edits and age of a user when we check if they're autoconfirmed, so, two comparisons later, we can figure out if they should be autoconfirmed. Also, this way any changes will be retroactive. — Werdna • talk 02:11, 26 September 2008 (UTC)
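(To illustrate the "two comparisons" in the same condition syntax the filters use - a sketch only, since the actual thresholds are site configuration and the 10-edit figure here is an assumption:)

    user_age >= 4 * 86400    /* account is at least four days old */
    & user_editcount >= 10   /* assumed edit threshold; the real value is configuration */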

Towards a decision...or maybe not?

I was wondering what the status of this decision was. The implementation request has been filed, but there are still issues open. For some reason, there seems to be opposition to fixing the lack of a manual override of autoconfirmation. Is the Abuse Filter going through despite the lack of an undo-blockautopromote feature? NuclearWarfare contact meMy work 20:34, 29 September 2008 (UTC)

en: options?

I'm somewhat lost between the archived discussions and the project page on a few points that I'd like to see simply documented (please refer me to another page if one exists).

  1. Which options are/are not going to be active on en:? (e.g. "The user's account may be removed from all privileged groups" - so will this really kill checkusers and 'crats?)
  2. How will the permission sets be assigned to users? Will this be done before activation?
There is a lot of information about what can be done with this tool, and not enough here about what will be done that I see. Thank you, — xaosflux Talk 02:08, 20 October 2008 (UTC)
And yes, I have seen the answers to some of these on bugtraq, but that is not a good place to document our en: specific settings in a manner that users can understand. — xaosflux Talk 02:13, 20 October 2008 (UTC)

I've updated it a little. — Werdna • talk 02:32, 21 October 2008 (UTC)

Thanks for that. — xaosflux Talk 03:15, 21 October 2008 (UTC)

Discussion at VPR page

Please see Wikipedia:Village pump (proposals)#Use a filter to block the manual insertion of the AES arrow. -- IRP 17:59, 25 December 2008 (UTC)

Filtering edits in a way similar to flagged revisions

As pointed out, we shouldn't use the extension in its present state for ordinary vandalism because, in my opinion, the actions we can take are either too extreme for that (disallow) or inefficient (warn). However, we can still detect most of it, and we could find more moderate actions for this kind of vandalism - other ways to filter edits. For pages with flagged revisions enabled, the revisions more recent than the latest flagged revision are 'moved' to the draft page, and the stable page contains the latest flagged revision. We could use a similar method to filter edits: when a revision is found by the abuse filter to be 'suspect', it is moved to the draft page and the latest non-suspect revision is set as the stable page. The difficulty is how to identify a revision as suspect or non-suspect, and on what basis. Cenarium (Talk) 03:59, 26 December 2008 (UTC)

A first possibility, detailed here, is to use a passive (disabled by default) flaggedrevs type, say 'sight'. For pages sighted at least once, we could say a revision (with a prior sighted revision) is suspect if it triggers rules based on the diff to the latest sighted revision, which is assumed to be clean and so can serve as a reference.

From a technical perspective, it would probably require some tweaks in flaggedrevs or another extension to handle the stable/draft separation. Cenarium (Talk) 03:59, 26 December 2008 (UTC)

This is somewhat confusing, can you try and explain in another way? I don't really understand what you're proposing. Happymelon 10:30, 26 December 2008 (UTC)
Suppose the latest revision of a page is sighted and a user without 'autosight' rights edits the page once: if the new revision is identified as 'suspect' by the abuse filter, then the latest sighted revision becomes the stable page.
For pages with sighted edits but where the latest one is not sighted, the latest non-suspect revision becomes the stable page. For example, see this: here edit 3 will become the stable page and revision 4 will be put in the draft page, until someone reverts the edit, sights it, or makes a new non-suspect revision.
More generally, we could modify the flaggedrevs extension this way: to each flag type, we add a subordinate automatic flag. Revisions would be automatically flagged if they meet some conditions (for example: not identified as suspect by the abuse filter, or old enough - allowing flags to expire). Then we could decide, for each namespace, members of a category, or individual page, to use as the stable version either the latest (manually) flagged revision, the latest flagged or automatically flagged revision, or the latest revision. Cenarium (Talk) 13:43, 26 December 2008 (UTC)

Addition to warning template

The de-group should trigger an automatic review and the warning template should say as much. Please update the bug and the warning template accordingly. I haven't updated the warning template as the two should be updated together. davidwr/(talk)/(contribs)/(e-mail) 03:27, 2 January 2009 (UTC)

The degroup option will be disabled on Wikimedia wikis. — Werdna • talk 06:46, 3 January 2009 (UTC)

Toward a decision

Werdna has asked me to look this page over with an eye to deciding whether we have a consensus. Judging only by the "Vote" section, I'd say no, but of the fourteen people opposing in that section, ten cite only their discomfort with blocking-by-bot, and one more opposer cites two qualms of which one is blocking. This means that if the filter does not automatically block anyone, there is quite a strong consensus in its favor. The next section, "Discussion: blocking or de-autoconfirming?", points me toward the same conclusion. It's my opinion that the filter has gained consensus, with the condition that it de-autoconfirm users rather than block them.

I think Werdna asked me to interpret this discussion because I'm a bureaucrat. I should mention that bureaucrats do not at the moment have any sort of official status outside of the RFA/B process, so I present my considered opinion as an uninvolved user, rather than as a bureaucrat. I hope it's good for something anyway. — Dan | talk 06:40, 21 September 2008 (UTC)

Thanks, Dan :-)Werdna • talk 09:11, 21 September 2008 (UTC)
I see Dan has already got to it. I agree with the assessment too. =Nichalp «Talk»= 14:02, 21 September 2008 (UTC)
Yes. De-autoconfirming seems to be the right level of control for the time being, until there's more confidence in the system. -- The Anome (talk) 22:40, 21 September 2008 (UTC)

Thanks to Nicholas and Dan for helping out here. I've filed bugzilla:15684, and everybody is encouraged to vote for that bug (but don't comment just to say "me, too"). I consider the discussion closed at this point. — Werdna • talk 07:50, 22 September 2008 (UTC)

Deferred revisions

I've started Wikipedia:Deferred revisions, a system to use the abuse filter to defer suspect edits to reviewers for review, to be used against vandalism, spam, ... But there are technical difficulties, for example to correctly handle multiple edits, the abuse filter would have to analyze the diff between the latest version and the stable version, instead of just checking the edit. Cenarium (Talk) 01:41, 31 January 2009 (UTC)

It would perhaps have been prudent to ask for information about implementation details before making a proposal. At first blush, it seems that this system would be best used in conjunction with Flagged Revisions, not as an alternative to it. The reason is that most of the requisite functionality already exists in Flagged Revisions – some integration might be possible, but it's not necessarily going to be prioritised. — Werdna • talk 05:44, 31 January 2009 (UTC)

Yes, it would be used in conjunction with FlaggedRevs, but on all articles and possibly other non-talk pages, while flagged revisions would be used for certain pages only. It should be possible if FlaggedRevs can handle a negative flag. On the other hand, the abuse filter would need to analyze the diff to the latest stable version. And the system would be even more efficient if the defer flag were rollback-like and the abuse filter could analyze the diff to the revision before the user edited; but to work properly, they need each other. Cenarium (Talk) 16:14, 1 February 2009 (UTC)

Uses of the AbuseFilter

One of the issues that needs consideration is the limits within which this extension should be used. With a program this powerful, there are a lot of possible uses that wouldn't fall under the auspices of vandalism prevention per se, but might be legitimate uses of the tool. For example:

  • Setting up filters to catch and warn about test edits (especially those from the toolbar buttons, like http://example.com)
  • Discouraging people from certain behaviors, like creating cross-namespace redirects or redirecting their User_talk: page to somewhere else
  • As a mechanism to prevent the insertion of spam
  • As a mechanism in content disputes, for example, stopping the re-insertion of a controversial phrase into particular articles
  • As a mechanism to block certain editors from certain articles (e.g., User:AntiJew can't edit Israel)
  • As a mechanism to enforce Arbitration Committee sanctions

These are just some of the examples off the top of my head. There are likely a number of other ways this tool could be (mis)used. Any thoughts on all of this? --MZMcBride (talk) 18:00, 5 February 2009 (UTC)

The abuse filter in its current state has only two options for edits: warn or disallow (or both). Therefore, it should not be used for anything other than clear-cut cases. But I agree an automated filtering of certain edits may be valuable; this is why I proposed Wikipedia:Deferred revisions. It would add the additional action of deferring the edit to a trusted user for review. Cenarium (Talk) 18:26, 5 February 2009 (UTC)
To elaborate, let's consider test edits. There are cases where the user only adds a "link title"; this can be reverted, and it can be done by a bot such as SoxBot III. There are also cases where a user adds a "link title" or example image, but also adds other information, for example a sentence. This should not be reverted by a bot or disallowed by the abuse filter. Instead, it should be deferred, so that it comes to the attention of a reviewer who can remove the "link title" and consider whether the sentence should be removed or not (vandalism, POV pushing, whatever) (and "undefer" the edit). I would be opposed to disallowing test edits, even those adding no content, as users should be encouraged to edit; but of course they need to be reverted. The same can be done for filtered images (not bad enough to be on the bad image list, but with a use needing review) and filtered external links (not blacklisted but, again, needing review). I gave some examples of filters here. However, I don't believe it should be used to block certain editors from editing certain articles or to prevent insertion of controversial material. That is extreme filtering and tends towards quasi-totalitarian content control. Cenarium (Talk) 18:52, 5 February 2009 (UTC)
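(For the test-edit case being discussed, a condition might look like the following sketch - the strings are the default toolbar placeholders mentioned above and below, and would need tuning in practice:)

    added_lines contains "Insert footnote text here"
    | added_lines contains "[http://www.example.com link title]"

With a defer- or tag-style action, hits like these could go to a review queue instead of being reverted outright.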
Per-article blocking is being dealt with by bugzilla:674, and will be applied to enforce ArbCom sanctions and community-based editing restrictions. Tim Vickers (talk) 19:16, 5 February 2009 (UTC)
I think you're missing the point here. While that bug may be resolved, as far as I'm aware, this tool will be capable of per-user per-article restrictions. So I imagine there are some who will want to use the AbuseFilter for this purpose, especially if it takes a long time for that bug to be resolved (if ever). --MZMcBride (talk) 09:36, 6 February 2009 (UTC)
Using a list of users disallowed from editing a certain page would simulate per-article blocking, but Werdna says this cannot be used for performance reasons. Cenarium (Talk) 09:42, 6 February 2009 (UTC)
Yes, selective blocking is an excellent feature for arbitration enforcement. Cenarium (Talk) 09:06, 6 February 2009 (UTC)
Is it possible to also add basic anti-vandal features, like more than one page blank in 15 seconds for an editor with...100 edits? Or an edit that only adds "penis" to an article? Or is there no consensus for that yet? NuclearWarfare (Talk) 22:43, 5 February 2009 (UTC)
The AbuseFilter can't do tracking like the former would require; it can only get information about individual edits and about the user who made the edit at the time they made the edit. Mr.Z-man 23:29, 5 February 2009 (UTC)
You may think that's a clear example, but what if a vandal removes a legitimate use of "penis" from an article? --NE2 08:50, 6 February 2009 (UTC)
It's the reason edits should generally not be disallowed, but instead deferred - or tagged for review (possibly with a warning before that). Editing should be encouraged, even if it creates problems with tests, etc., which we can handle. Only indisputably bad, Grawp-style vandalism should be disallowed. Cenarium (Talk) 09:06, 6 February 2009 (UTC)

Correcting some misconceptions:

  • Edits can also be tagged for review on recent changes.
  • Edits can be tracked (using the 'throttle' feature).
  • The Abuse Filter is not intended for use in enforcing arbcom sanctions, for performance reasons.

And some others. — Werdna • talk 06:43, 6 February 2009 (UTC)
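(For instance, the blanking case raised above splits into a condition plus throttle settings: the condition catches a single blanking-style edit, and the filter's throttle - a count over a period, grouped per user - supplies the "more than N in M seconds" part. A sketch with invented size thresholds:)

    action == "edit"
    & old_size > 1000   /* the page had substantial content */
    & new_size < 50     /* nearly all of it was removed */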

Reporting pages for review

A system to report pages with edits matching a certain filter to a special page, e.g. Special:ReportedPages, would be quite useful. It could be used for anti-vandalism and anti-spam purposes, but also for maintenance or policy enforcement. For example broken refs, links to userspace, redlink transclusions, non-free images in non-articles, etc. A filter by severity would be needed to distinguish vandalism and other urgent issues from the rest, for example high/moderate/low. I realize this is a shift from the initial purpose of the abuse filter, but this is an opportunity to detect and fix numerous problems. Is something like this doable? Cenarium (talk) 21:27, 1 March 2009 (UTC)

There is already the abuse log, which can be filtered by filter. — Werdna • talk 22:45, 6 March 2009 (UTC)

Being able to filter the abuse log by filter is good... It would be nice to be able to filter by multiple filters, e.g. by placing 14,26,31 in the filter ID inputbox, all actions taken by filters 14, 26 and 31 are shown. Also, a way to know if an edit has been checked/reviewed would be helpful for report-only filters - maybe some kind of button to mark the edit checked/reviewed (only available for users in a given usergroup), and a possibility to filter so that only unchecked edits appear? Cenarium (talk) 23:21, 6 March 2009 (UTC)

BASEPAGENAME

Is there any way to get the {{BASEPAGENAME}} in the abuse filter? I want this for a log-only filter (maybe later upgraded to warning) for edits to another's userspace (one of the exceptions is edits to a user's talk page). עוד מישהו Od Mishehu 05:04, 18 March 2009 (UTC)
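(There is no BASEPAGENAME variable as such, but the title is exposed to filters, so the check can be approximated. A sketch, assuming article_text holds the title without its namespace prefix and that the username contains no regex metacharacters:)

    article_namespace == 2                               /* User: namespace */
    & !(article_text rlike ("^" + user_name + "(/|$)"))  /* not the editor's own userspace */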

Possibly urgent false positive

[1] doesn't seem right. --NE2 05:44, 18 March 2009 (UTC)

Contains: "<ref>Insert footnote text here</ref>", which is the kind of default edit bar text it was designed to exclude. Not exactly sure what to do since there is a lot of legitimate text there. Was this a vandal revert or something? Dragons flight (talk) 06:04, 18 March 2009 (UTC)
It looks like the IP is doing a lot of additions to that and similar articles (no idea if they're constructive). This wasn't abuse; it was a misclick. Even if it was simple vandalism, the filter falls afoul of the note at the top of this page: "Please keep in mind that this extension's aim is not to catch simple vandalism, as some bots already do. [...] It won't catch childish vandalism or page blanking." --NE2 06:10, 18 March 2009 (UTC)
Unfortunately, I need to bow out. Hopefully by next morning someone else will have things sorted. Dragons flight (talk) 06:16, 18 March 2009 (UTC)

Possible function

Could a filter be created which would prevent speedy deletion tags from being removed by the creator of the article they're in, and by non-autoconfirmed users? That is, it should prevent anyone who isn't autoconfirmed, and anyone who created the article in question, from removing a speedy tag. That would save a lot of trouble for us all, I think. Other helpful things would be preventing large text-dumps in the sandbox, preventing the removal of the sandbox header, etc. - possible? Thanks! ╟─TreasuryTagcontribs─╢ 18:32, 7 March 2009 (UTC)

All three of those are possible. — Werdna • talk 03:24, 10 March 2009 (UTC)
And while we're at it, will there be any place where additions like these can be discussed before they are being implemented? Or is a single suggestion that makes sense to the developer enough? (Yep, I do object to the above suggestions, but this is a more general question.) --Conti| 14:14, 12 March 2009 (UTC)
Since admins will be editing the filters, WP:AN or here would probably make the most sense. Perhaps this page should be split up like the spam blacklist page; 1 section for policy discussion, the other for filter discussion. Mr.Z-man 16:55, 12 March 2009 (UTC)

And where are we up to with getting this installed on en-wiki? :-) ╟─TreasuryTagcontribs─╢ 18:25, 11 March 2009 (UTC)

Updated versions pushed to other wikis yesterday. Doing some profiling to make sure everything's in order, hoping for deployment to enwiki within the week if no problems arise. — Werdna • talk 05:11, 12 March 2009 (UTC)

  • As a non-programmer, and since I don't seem to be able to edit the filters anyway (!) could someone possibly look into these ideas? Thanks! ╟─TreasuryTagcontribs─╢ 08:25, 18 March 2009 (UTC)
    • I don't think we should prevent anyone from removing speedy tags, even from articles they created (someone may need to IAR for a trigger-happy new page patroller) - perhaps a warn. –xeno (talk) 15:53, 18 March 2009 (UTC)
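(A sketch of the speedy-tag half of the request in filter syntax; article_first_contributor is assumed here to be available as a page-creator variable, and the template match is deliberately crude:)

    article_namespace == 0
    & (!("autoconfirmed" in user_groups) | user_name == article_first_contributor)
    & lcase(removed_lines) contains "{{db-"   /* most speedy tags begin {{db-... */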

Uh... blocked users can't report a false positive...

I know blocking is not turned on (perhaps those templates should be commented out), but if it were on, the link to "report this error" would not be available. --NE2 05:33, 18 March 2009 (UTC)

We would presumably have some type of instruction about reporting a false positive through an unblock template if the abuse filter were set to block people. –xeno (talk) 13:54, 18 March 2009 (UTC)
  Fixed [2]. Thanks, –xeno (talk) 02:11, 19 March 2009 (UTC)

New user vs. non-autoconfirmed user

I made a change to Special:AbuseFilter/30 changing "new user" to "non-autoconfirmed user" for clarity's sake. But do we want to keep it simple or clear? I see other filters use the description "new user" as well when they actually mean non-autoconfirmed user. Now that the filter can revoke autoconfirmed status, not all non-autoconfirmed users will be "new". Also, the filter was disabled per the 5% rule; I may have inadvertently re-enabled it... –xeno (talk) 14:15, 18 March 2009 (UTC)

/ in regex

It appears that including "/" in regex expressions can fail without giving a syntax error. The correct way to do this is to escape it as "\/". Dragons flight (talk) 15:24, 18 March 2009 (UTC)
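(Concretely, with a made-up pattern:)

    added_lines rlike "</ref>"    /* may fail silently: unescaped "/" */
    added_lines rlike "<\/ref>"   /* safe: "/" escaped as "\/" */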

Idiot's guide for admins?

Would someone with a clue mind writing up a walkthrough guide on how to read/view/change/modify/etc these for people that don't have the time or knowledge of the filter system to get a running start? :) rootology (C)(T) 16:22, 18 March 2009 (UTC)

  • It wouldn't hurt to have a guide (that's what the navigation template redlink is for), but I don't think it should be aimed at newbies (or "idiots"). I've also added a request page so people without any sort of technical knowledge can ask others for filters to be created. - Mgm|(talk) 19:56, 18 March 2009 (UTC)

Requesting "abusefilter-private" right

  Resolved
 – The private right would be restricted to users who have confirmed their identity with Wikimedia, i.e. checkusers and the like. This has nothing to do with viewing private filters, which are viewable by anyone with the abusefilter-modify right. –xeno (talk) 23:35, 18 March 2009 (UTC)

While I'm not a huge fan of Security through obscurity, I can see why the details of filters would be hidden. That said, is there a way for us lowly editors to get this user right? Burzmali (talk) 16:37, 18 March 2009 (UTC)

It's not "-private" that you want, that's Checkuser IP information. I think you mean "-modify"? ╟─TreasuryTagcontribs─╢ 16:41, 18 March 2009 (UTC)
See below. — Jake Wartenberg 16:54, 18 March 2009 (UTC)
I think the -private right is currently rolled into the admin package. There are private filters that probably aren't viewable by non-admins. For example, filter 2 is currently set to private. –xeno (talk) 16:55, 18 March 2009 (UTC)
The article indicates that "abusefilter-private" is the user right that allows you to view the filters that are marked private. I don't really want to edit the filters at the moment, but I am curious to see how they work. Burzmali (talk) 16:57, 18 March 2009 (UTC)
According to Special:ListGroupRights, abusefilter-private isn't assigned to any group. I believe TreasuryTag is correct with what its purpose is. Mr.Z-man 17:44, 18 March 2009 (UTC)
No, I think someone just forgot to add the entry. Try clicking this: Special:AbuseFilter/2 from a non-admin account - You may not view details of this filter, because it is hidden from public view.xeno (talk) 17:57, 18 March 2009 (UTC)
I can't view that, either. rootology (C)(T) 18:33, 18 March 2009 (UTC)
Add yourself to abusefilter-editors and you should be able to. –xeno (talk) 18:36, 18 March 2009 (UTC)
If you read this section, it seems to suggest that the -private permission is for viewing "private information" - and I recall from the pre-installation discussion that that right was given to checkuser. Viewing "secret" filters presumably comes with the admins' -modify right. ╟─TreasuryTagcontribs─╢ 18:06, 18 March 2009 (UTC)
Whatever it is, it should be clarified. –xeno (talk) 18:13, 18 March 2009 (UTC)

No, it doesn't do that. It shows the IP of whoever triggers a filter. You have to be identified to the Foundation to view it. Techman224Talk 21:11, 18 March 2009 (UTC)

Not even that currently. It isn't assigned to any groups at all. Stewards, staff, or sysadmins could presumably give themselves the group but none of them currently have it. Mr.Z-man 23:12, 18 March 2009 (UTC)

Be careful about load

This rule was so burdensome against large pages that the server was timing out whenever someone tried to save a page like ANI. Please try to only do as much processing to the edit stream as is necessary. Dragons flight (talk) 18:43, 18 March 2009 (UTC)

You can monitor the load by looking at [3] and searching for "AbuseFilter::filterAction". The fourth column measures how many milliseconds it is adding to the edit commit time to run things through Abuse Filter. Right now it is adding nearly 2 seconds to every edit, which is very excessive. MBisanz talk 19:35, 18 March 2009 (UTC)

According to Tim Starling, there is no intelligent branching. Every operation of every rule gets evaluated, even if there are AND clauses and such that are clearly false. Dragons flight (talk) 20:42, 18 March 2009 (UTC)

Messages

How we've done the messages could be improved a bit, maybe something like this:

These use more friendly wording and have better explanations of what is going on. ViperSnake151 21:04, 18 March 2009 (UTC)

In response to block notices from the filter, how is the blocked user supposed to edit to report a false positive? ѕwirlвoy  22:54, 18 March 2009 (UTC)
By using our usual unblock system - let me clarify that. Anyway, I also made it so that we use the regular ambox colors, reserving speedy colors for anything that actually has an effect. ViperSnake151 01:18, 19 March 2009 (UTC)

Impact on new users

This is a very powerful tool, and especially with very common problem actions, I'm optimistic that it will help to prevent a lot of damage and allow us to focus more attention on editing that's currently spent on routine maintenance.

That said, I'm concerned about the impact all these filters will have on a new user's experience. It's very easy to make up new filters, and there are already dozens of them. Some filters seem to be created without a measured, known need. What's our process for assessing whether the benefit for a new filter (potentially recognizing a harmful edit) is greater than the harm (potentially confusing a new user)? Here are some thoughts:

  • Should there be a "Filters for deletion page" or something similar to nominate unnecessary filters for removal?
  • Should there be a Wikipedia:WikiProject BITE or something similar focused on recognizing and eliminating filters (and other practices) that distract or irritate people acting in good faith?
  • Should there be a policy/guideline page on what makes a good filter? (For example: Avoid overcommunicating to the user, stick to the basics and most common misuses and get those right with well-designed filters)?

Given that we're also now using this for recommendations rather than actions known to be abusive, I also think we may want to change the language "abuse filter": Imagine that you're a new user doing something completely innocuous and being directed to a page that shows that you're triggering an "abuse filter". Would it be hard to generalize this to be called an "edit filter"?

But in general, I hope that we can have a discussion about how to make sure that this tool doesn't conflict with our ideals of being welcoming, accessible, and friendly to newcomers. --Eloquence* 21:17, 18 March 2009 (UTC)

Actually, none of the messages mention an "abuse filter". I deliberately triggered one of the most frequently triggered filters from a school computer, and the message seems fine to me. J.delanoygabsadds 22:08, 18 March 2009 (UTC)
Yep, I didn't see it in any of the messages, just on the pages that are being linked to for further information. That may be acceptable, but IMO it's a bit of a misnomer if truthfully the usage goes significantly beyond abusive actions.--Eloquence* 23:49, 18 March 2009 (UTC)
I randomly was looking through about 50-60 of these earlier, and consistently the filters were stopping vandalism across the board. There should be a link, though, on any of these pages saying something like "If you think this blocked edit was in error, please go HERE to report it". I can't see the vandals going HERE, wherever that is, to complain they couldn't penisize a page or anything like that, and it would help weed out any bad filters. rootology (C)(T) 00:02, 19 March 2009 (UTC)
Yes, it sends them to WP:FALSEPOS (see Special:PrefixIndex/Mediawiki:Abusefilter-warning). –xeno (talk) 01:45, 19 March 2009 (UTC)

Personal Logs

Every editor including every IP now has an Abuse Filter log linked from their Contributions page. Right now, that log apparently shows changes that person made to the filters. Would it not be more helpful to log every time that editor tripped a filter? EnviroboyTalkCs 21:36, 18 March 2009 (UTC)

Actually that log does absolutely nothing at the moment. Dragons flight (talk) 21:52, 18 March 2009 (UTC)

Messages spelling mistake

On most (if not all) the templates, the messages refer to a non-existent "Submit" button (should be "Save page"). Could someone fix them? x42bn6 Talk Mess 22:10, 18 March 2009 (UTC)

  Done. Thanks, –xeno (talk) 01:44, 19 March 2009 (UTC)

Importing/updating syntax

While I know that it makes sense to sort of start from scratch so that false-positives are avoided, I think making a use of syntax that is already used should be considered. For example, there are several characters in User:Lupin/badwords. Would it not make sense to at least use some of them? And I don't mean to brag, but heck, I've even got some commonly used "bad words" on one of my sandboxes. It can be pretty handy and less time consuming if you refer to things like that when making changes to a filter or creating a new filter. ~ Troy (talk) 03:17, 19 March 2009 (UTC)

Ouch, my virgin eyes! If we were to load a page-long regex in there, it may seriously affect load! — xaosflux Talk 04:13, 19 March 2009 (UTC)
Well, I didn't suggest taking all of the regular expressions! All I'm saying is: would it make sense to use some of the regex available? It's already in use, so I thought maybe it could be of help ...without busting Wikipedia's servers. ~ Troy (talk) 04:32, 19 March 2009 (UTC)

Some of this could perhaps be implemented with the fast string search extension, making it pretty fast. I'll keep it in mind. — Werdna • talk 05:16, 19 March 2009 (UTC)

Disabled

Abuse Filter is disabled per Brion with the note

"disabling AbuseFilter on en.wikipedia.org; performance problems on save. Needs proper per-filter profiling for further investigation."

So that means we need to go back to our AV-Bot system for the time being. MBisanz talk 19:43, 18 March 2009 (UTC)

Ok, Tim found a work-around, being selectively re-enabled. MBisanz talk 19:48, 18 March 2009 (UTC)
Meaning? :-) ╟─TreasuryTagcontribs─╢ 19:49, 18 March 2009 (UTC)
Meaning some filters in place had to be turned off since they were killing the servers and others are fine to leave running. MBisanz talk 19:58, 18 March 2009 (UTC)
It would be nice to have an easy way of telling the difference between high load and low load on a per filter basis. Obviously one can make some intelligent guesses, but real stats would be handy. Dragons flight (talk) 20:02, 18 March 2009 (UTC)
I think that is going to be the next thing Werdna works on. MBisanz talk 20:04, 18 March 2009 (UTC)

Because of load issues, is it really smart to run filters that are basically duplicates of toolserver anti-vandalism bots that are running 24/7? = Mgm|(talk) 20:38, 18 March 2009 (UTC)

In the long term, yes. There is a transactional cost to adding junk edits to the database, having those pulled to the toolserver, having a bot decide to revert, and sending another edit to the database. In the long run, clamping down on that is in our best interest (roughly 20% of all edits are either junk or reverts of junk). In the short run it might make sense to disable those functions depending on where we stand on load. For the moment we seem to be okay, though, provided we don't push things further. Dragons flight (talk) 20:57, 18 March 2009 (UTC)
  • It just seems pointless to duplicate efforts when the bots are already doing an admirable job. Perhaps it's a good idea to keep them as a backup in case one of those bots stops running, but I don't see the use in running a filter for which all triggered actions are already reverted by a bot. - Mgm|(talk) 09:43, 19 March 2009 (UTC)

Special:AbuseFilter/43

  Resolved

Can someone review this filter? Somehow it is getting triggered, despite the fact that the people triggering the filter aren't removing any text. What could be causing these false positives and how do we solve them? - Mgm|(talk) 20:55, 18 March 2009 (UTC)

If a line is modified, its old version is included in removed_lines and its new version is included in added_lines. If you really want to know that something was removed you need to find it in removed_lines and not be able to find it in added_lines. Dragons flight (talk) 21:03, 18 March 2009 (UTC)
Or have length(added_lines) = 0, etc. Dragons flight (talk) 21:07, 18 March 2009 (UTC)
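(In other words, a removal check needs both halves - sketched here with a stand-in phrase:)

    removed_lines contains "some phrase"
    & !(added_lines contains "some phrase")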

AF - combine?

In looking the above over, there seem to be several filters that are rather similar. (With some similar names, as well.)

Would it be possible to combine some of these? Especially those which search for certain word/character combinations. (Not going into details, since I'm not sure yet what's seen and unseen, and therefore what should be mentioned...) - jc37 02:12, 19 March 2009 (UTC)

  • Perhaps as long as the reason for checking is similar and if it checks in similar namespaces for similar users. I'm just wondering if adding them together has any effect on the runtime of the rule...- Mgm|(talk) 09:32, 19 March 2009 (UTC)

Branching intelligence

Werdna has now added branching logic so that if early conditions fail then later conditions do not get evaluated.

For example if you have A & B and A is false then the parse skips over B. Likewise if you have A | B and A is true then B is skipped.

Because logic branching operates from front to back, one should put any simple tests (i.e. namespace checks, edit counts, page sizes, etc.) ahead of expensive tests such as regexes and full_text manipulations. Dragons flight (talk) 04:06, 19 March 2009 (UTC)
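(For example, ordering a rule like this lets most edits short-circuit out on the cheap integer comparisons before the regex - a placeholder here - is ever evaluated:)

    article_namespace == 0                        /* cheap: integer comparison */
    & user_editcount < 50                         /* cheap: integer comparison */
    & added_lines rlike "(some|costly|pattern)"   /* expensive: only reached if the above pass */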

Is there an easy way to see if any edits to featured articles trigger these filters? You can't select them based on page content, because their status is mentioned on the related talk page only. - Mgm|(talk) 09:40, 19 March 2009 (UTC)

There is the {{Featured article}} template that adds the star. Ruslik (talk) 10:55, 19 March 2009 (UTC)

Hit count; how many handled

A lot of filters result in hits that have been already handled by bots or which are later patrolled by a user. Currently, there are no easy options to find out which log entries still need checking. Can some patrol type thingy be implemented in this log similar to the newpages one? - Mgm|(talk) 10:35, 19 March 2009 (UTC)

Double work

I think we can further ease the strain on the servers. Some filters obviously don't need to be run if other ones have already been triggered. Take this one for example. It triggered both 61 and 3, but if 3 is triggered, there's no elegant way to ignore any related rules. - Mgm|(talk) 12:51, 19 March 2009 (UTC)

Detection issues?

Keeping in mind that the filters just started going, and also keeping in mind that the filters shouldn't be overworked, I can't help but ask why a sock like this can go completely undetected. That's an example right there of a serial vandal using unicode characters and moving pages in a manner that should be at least somewhat preventable. ~ Troy (talk) 01:50, 20 March 2009 (UTC)

How would you suggest we prevent this? Of course we can add a filter that checks for moves to the same title plus a dot, and we can blacklist that particular edit summary (and variations of it), but tomorrow Grawp will simply use another edit summary and move articles to different titles. --Conti| 02:00, 20 March 2009 (UTC)
Actually, Grawp switched to that new pattern after we did block 17 move attempts on other accounts. There is something of an arms race to this, but trust me that we are working on some new tricks. As Troy says, we are a little new at this. Dragons flight (talk) 02:05, 20 March 2009 (UTC)

Non-admin abuse filter rights

Are we going to grant the abusefilter group to non-admins soon? And if so, what are the criteria for getting it? Techman224Talk 02:57, 18 March 2009 (UTC)

I brought this up over here. Not sure which venue is better. — Jake Wartenberg 03:16, 18 March 2009 (UTC)
This is absolutely not a "show good faith and understanding of the rules" thing like rollback. Any attempt to create a "request for abuse filter" process is going to be a disaster. There is no reason at all for non-admins to have the right anyway; after the flurry of new filters slows down, we will be left with an established set of filters that will only need a few changes a day, at most. The Abuse Filter is in no way comparable to rollback; it simply isn't needed for day-to-day editing. It is an admin task; let's leave it like that. BJTalk 07:03, 18 March 2009 (UTC)
It really should be accessible only to those with the technical skill, like bots. --NE2 07:58, 18 March 2009 (UTC)
Admins have shown they are generally smart enough to leave things they don't understand (complex templates, JavaScript, CSS, blacklists) alone. I don't see any reason to create a new closed group for this. BJTalk 09:48, 18 March 2009 (UTC)

Now fully active?

Just to clarify... is the filter now fully active, and if so, what are the rights, and which groups hold them? Can a non-admin apply for any further rights similar to rollback? Just for clarity... Thanks! ╟─TreasuryTagcontribs─╢ 08:22, 18 March 2009 (UTC)

Abuse Filter is fully active; non-admins cannot apply for the editor right. BJTalk 09:45, 18 March 2009 (UTC)
When - or, actually, will - non-admins be able to apply for this right? §hawnhath 16:47, 18 March 2009 (UTC)
I don't see any reason to give away this right like rollback. This can cause site wide damage; one admin already messed things up for a while. Non-admins who want to propose new filters or changes to existing filters can always do so on this talk page—just as changes to MediaWiki interface messages are proposed at MediaWiki talk. We have enough admins with the technical knowledge and judgment to handle such requests. Capricorn24 (talk) 07:16, 20 March 2009 (UTC)

Special:AbuseFilter/29 - prevention of removal of deletion templates

  Resolved
 – Filter re-set to warn only by Hersfold. A related, but more general discussion is below, at #Requiring on-wiki consensus prior to setting a filter to disallow or revoke groups. –xeno (talk) 20:19, 19 March 2009 (UTC)

I'm concerned about this filter as it has been recently set to disallow by Hersfold (talk · contribs). I'm afraid that we are throwing one of our cardinal rules, IAR, out the window. Thoughts? –xeno (talk) 12:44, 19 March 2009 (UTC)

  • The rule only prevents new non-autoconfirmed users from doing this and the rules have a provision that ignores removal of speedy deletion templates when hangon is used. I can't think of any valid reason for a non-established user to remove an AfD template. Can you? - Mgm|(talk) 12:51, 19 March 2009 (UTC)
    • This seems outside the scope of the feature and will almost certainly be very bitey. Do we have a huge problem with templates being removed? RxS (talk) 13:01, 19 March 2009 (UTC)
Just because someone is brand new does not mean that the person adding the tag is right and the new guy removing it is wrong. I think it would be fine to warn the user about the rules when they attempt to do it, but not to prevent them. As for valid reasons, I suppose they would be the same as any established user. Chillum 13:04, 19 March 2009 (UTC)
Established users have little reason to do this either, unless they're closing an AFD debate, for which newbies have insufficient experience anyway. - Mgm|(talk) 13:19, 19 March 2009 (UTC)
  • @Mgm, what about an experienced user on a new IP address? I agree with RxS, this is too bitey. –xeno (talk) 13:07, 19 March 2009 (UTC)
  • The underlying IP address doesn't matter. The rule only checks for user group permissions. As long as they log in, established users are not affected by the rules. - Mgm|(talk) 13:19, 19 March 2009 (UTC)
    And for established users who choose not to log in? –xeno (talk) 13:21, 19 March 2009 (UTC)
  • I agree, too, that this could be kinda bitey. And while we're at it, shouldn't there be a page somewhere where we can discuss the various existing filters? This page seems suboptimal for that. --Conti| 13:21, 19 March 2009 (UTC)
    That's probably a good idea (separate page to discuss filters). We could also perhaps institute some kind of policy to have filters !voted on before they are set to disallow. –xeno (talk) 13:44, 19 March 2009 (UTC)
  • I'd rather keep the discussion central (regarding filter discussion). A separate voting page wouldn't hurt. Anyway, the filter has code that can distinguish between new and experienced users, but if an experienced user chooses not to sign in, there's no way to recognize them. We can't start throwing out filters because of that. - Mgm|(talk) 13:49, 19 March 2009 (UTC)
    • An experienced user purposely logging out to close an AFD discussion would be rather suspicious, and would likely be a violation of WP:SOCK#SCRUTINY. Mr.Z-man 16:30, 19 March 2009 (UTC)
      • I was more thinking of say, an experienced user just poking around from a public terminal, seeing a deletion tag placed on an article that clearly meets inclusion guidelines, and removing it as an IP. Oh, and don't forget Mr. IP (talk · contribs) and similar users who prefer to edit anonymously. –xeno (talk) 16:34, 19 March 2009 (UTC)

Guys, why doesn't someone go through 50 or so of its 200 hits and figure out what the error rate is? It is much easier to discuss problems in actual terms rather than speculate about hypothetical user behavior. Dragons flight (talk) 16:50, 19 March 2009 (UTC)

  • This filter is against WP:CSD—I will not be surprised if someone changes this filter in the future to block all non-admins from removing the speedy tag, just like the unilateral change that was done to the policy page once[4]. There was a debate at WT:CSD and the consensus was against this change. If the policy says "any" editor (except the creator of the article) can remove the tag, then any editor, including IPs and non-autoconfirmed users, should be allowed to do that. There is no "usergroup" based restriction in the policy, so why add one by technical means? I would prefer if this filter restricted only the creator of the article from removing the tag. Capricorn24 (talk) 16:50, 19 March 2009 (UTC)
  • For my part, I can see no reason whatsoever why a non-autoconfirmed user should ever need to remove a speedy-tag. There will always be experienced users to do that on review; the vast majority of IPs deleting such tags will be disruptive, removing them from their own articles or just because they think it's fun... Sad but true. ╟─TreasuryTagcontribs─╢ 16:53, 19 March 2009 (UTC)
WP:ABF? I think that should be dealt with on a case by case basis. We do have several good IP editors. This kind of filter needs a consensus to change the policy, first. Technical measures shouldn't be implemented against policy. Capricorn24 (talk) 17:11, 19 March 2009 (UTC)
  • Can the filter not be modified to prevent only the article's creator from removing the tag? This would still violate IAR, but at least it wouldn't violate current deletion policy. –xeno (talk) 17:14, 19 March 2009 (UTC)
Xeno it can not. Prodego talk 17:21, 19 March 2009 (UTC)
kk. –xeno (talk) 17:22, 19 March 2009 (UTC)
If there is a consensus here to set the filter to warn-only, I don't really care. I set it to disallow following a discussion on IRC, where we noticed that there were very few if any false positives at the time. I do see your points, but then I'd also remind you that it is very rare for a non-autoconfirmed user to really know what they're doing when removing these. Hersfold (t/a/c) 17:33, 19 March 2009 (UTC)
I'd support warn-only as I think it would deal with the majority of the problem, and it would also mean that in the few problematic cases we would know that the user in question had ignored two warnings not to do this without good reason - one on the template and one in the edit window. Tim Vickers (talk) 20:10, 19 March 2009 (UTC)
Warn only sounds perfect. Chillum 23:50, 19 March 2009 (UTC)

I've set it to log-only. It's not that there are no false positives; it's that users generally make other edits when removing the tag, potentially constructive ones, as in filter 18. While it's OK to warn for filter 18, there's no particular need to warn for this filter. Its chief purpose is to provide a list of pages for review, which is easy: check the blue links. There are already users rollbacking removals of db-tags that were coupled with constructive editing... Let's not scare off new users more than necessary. Cenarium (talk) 00:42, 21 March 2009 (UTC)

Requiring on-wiki consensus prior to setting a filter to disallow or revoke groups

I think that we really ought to require clear on-wiki consensus prior to setting a filter to disallow or revoke groups (and block, when/if this is enabled). This tool has great potential for biteyness and for preventing people from being able to IAR, concerns I raised above. Requiring a number of editors to support a disallowing/revoking filter would also have the added benefit of catching erroneous filters before they result in the situation we encountered last night. Thoughts? –xeno (talk) 17:46, 19 March 2009 (UTC)

And on the subject of consensus, further input on the issue of who should be given the -modify right to view hidden filters, and edit/create/disable etc. any filter, would be appreciated in the relevant section above. ╟─TreasuryTagcontribs─╢ 17:57, 19 March 2009 (UTC)
I strongly agree, although I'm not sure if we should turn this into a vote (or a !vote). A mandatory on-wiki discussion for filters that disallow or revoke groups/rights is needed, tho. --Conti| 20:33, 19 March 2009 (UTC)
Yes to consensus, no to voting. Chillum 23:49, 19 March 2009 (UTC)
I think the more important point is that filters with the heavy duty actions need to be thoughtfully targeted. Under normal conditions, having a discussion is fine; however, I don't want to say that one can never invoke those options without one. If a new type of vandal (or vandal bot) is making a mess then using the stronger options may be an effective way to shut it down, and in an emergency I'd want to leave those options open for skilled users without necessarily requiring a large and slow discussion. Dragons flight (talk) 01:07, 20 March 2009 (UTC)
Don't we have WP:IAR for emergencies? We could have the discussion with the filter turned on in such cases, but there should be a discussion nonetheless, even if it is just a bunch of guys agreeing with each other. --Conti| 02:03, 20 March 2009 (UTC)
There is also the issue of discussion publicity. I am thinking about turning on some anti-Grawp page move logic, but I'm not sure I want to discuss what that does publicly since that would just teach Grawp to defeat it if he visits these pages. Dragons flight (talk) 07:57, 20 March 2009 (UTC)
Yeah, that could definitely be a problem, and I'm not sure yet how we could get around that. The "Notes" section of the filters can be (and is being) used currently for discussions about the corresponding filters, but that's a pretty awkward way to do things. Still, my point remains for the public filters that we have. --Conti| 11:26, 20 March 2009 (UTC)
Nah, I would apply here the same as what we do with blocking, page protecting, and blacklisting links: the admin that is applying the filter should be able to provide sufficient proof of disruption for the action. We block Grawp on sight; we don't seek consensus first. If the filter is likely to cause, or can be shown to cause, collateral damage (when an IP is blocked, we don't check whether a genuine editor is using it; we don't know if genuine IPs are trying to edit a semi-protected page, or if editors are trying to use a perfectly proper reference on a site which was heavily misused and hence blacklisted. I expect only a few try to complain! Those blocks are WAY more bitey than this filter can ever be, and we will never know!), then indeed, adapt the filter, or remove the blocking/disallow (leave warn intact), put a note on the filter (!), and start a discussion (often admins other than the blocking admin unblock editors/IP-ranges or deprotect pages). I really would not worry about the biteyness of this system; it is way less bitey than blocking, blacklisting or protecting! --Dirk Beetstra T C 10:54, 20 March 2009 (UTC)
I'd rather compare this to lots and lots of new anti-vandal bots. We do block Grawp on sight, but we're not letting anyone start a new untested and unobserved anti-Grawp bot. There would be chaos if we did, and I'm fearing that there might be (some) chaos if there aren't any rules or guidelines as to how to use this filter. I couldn't disagree more about the biteyness of this system, considering that most filters that are currently set on disallow or warn are for new users and (more importantly) anons only. --Conti| 11:35, 20 March 2009 (UTC)
That is a matter of the specificity of the filter. I see we now have a lot of broad filters, while this allows for specific fine tuning of the rule. Something that is very difficult (if not impossible) with bots, blocking, page protecting and blacklisting. I am not saying that we should blindly turn on filters, but requiring consensus before applying them may just be a bit too much. --Dirk Beetstra T C 12:00, 20 March 2009 (UTC)
Disallowing blatant vandalism I can see, I still think it would be a good idea to have a few eyes look at it before it's implemented (cf. the filter 58 bit). Disallowing non-vandal edits (i.e. filter 29) on the other hand, definitely needs on-wiki consensus. –xeno (talk) 12:44, 20 March 2009 (UTC)
For specific and limited filters targeting clear abuse, it should be OK to turn on disallow, if they have been well tested. For those that are of large scope or target non-abusive edits, disallowing should be discussed, on-wiki. I had created filter 28 to be log-only. It provides a handful of pages to delete for interested admins: follow the blue links. But there's no real purpose in warning beyond scaring off newcomers, and disallow can prevent constructive edits; the same goes for filter 18. Cenarium (talk) 01:15, 21 March 2009 (UTC)

Proposed warning messages

For vandalism

For use of restricted images


-- IRP 23:38, 19 March 2009 (UTC)
If we're gonna use tango for the stop icons, why not tango for everything there? Also changed around some of the icons. ViperSnake151 00:39, 20 March 2009 (UTC)
I dislike the "temporarily restricted from executing some sensitive operations" wording. Nobody's going to understand what that means. --Carnildo (talk) 00:46, 20 March 2009 (UTC)

I have added another warning message. -- IRP 00:54, 20 March 2009 (UTC)

  • This should be worded differently. People will complain about admin power abuse if you need an administrator to make edits that aren't done to protected pages. - Mgm|(talk) 23:52, 20 March 2009 (UTC)

Throttling

Please note that the "Throttle" setting merely means that the filter has to see X matching actions before triggering its other actions, so throttle needs to be combined with warn, disallow, etc. to tell the filter what to do once the throttle limit is reached. Dragons flight (talk) 08:08, 20 March 2009 (UTC)

Also note that there is no logging until the throttle is exceeded, so it is not directly possible to see how often the event occurs without exceeding the throttle limit. (Though one could set up a duplicate rule that is unthrottled and log-only.) Dragons flight (talk) 19:27, 20 March 2009 (UTC)
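A minimal sketch of the combination described above (the condition, rate and actions here are purely illustrative, not an actual filter):

    Conditions: article_namespace == 0 & "some trigger text" in added_lines
    Throttle:   5 matching actions per 60 seconds, grouped by user
    Actions:    throttle + disallow
    Effect:     the first 5 matches per user per minute pass through unlogged;
                once the limit is exceeded, further matches are disallowed and logged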

The noticebox at the top of this page

This one:

We already have a bunch of filters catching simple vandalism (and mostly doing a pretty good job at it), so that message is clearly being ignored. So what should we do? Remove the notice or abide by it? --Conti| 11:51, 20 March 2009 (UTC)

  • If the particular type of vandalism found by a filter is routinely caught by antivandalism bots, there's not really any point in keeping it running, because the bot will do the reverting and warning for us. It's better to free up some of the runtime to use on other filters. - Mgm|(talk) 12:13, 20 March 2009 (UTC)
  • Or disable that part of the antivandal bot; this is less bitey and more effective than what the bots are doing. But the performance first has to improve, I would say. --Dirk Beetstra T C 12:22, 20 March 2009 (UTC)
  • It depends on the kind of vandalism. I don't see the use in warning someone who plasters the same abusive sentence on an article hundreds of times. People who blank substantial sections of text while adding material of their own, are more likely to have good intentions. Alternatively, we could ask the bot operators to send out friendly warning messages in some cases. - Mgm|(talk) 13:38, 20 March 2009 (UTC)
  • It was my intention with Special:Abusefilter/48 to put some external links (youtube/myspace/blogs; which are not actually spam, but very often inappropriate when added without having some idea of our policies or guidelines) into a mode where the editor is warned that these links are often inappropriate, with links to all the relevant policies and guidelines, and given the choice to save anyway. At the same time I would remove the relevant rules from XLinkBot, and see what happens. I can see from IRC how often XLinkBot gets reverted on these links, and do an estimate. Unfortunately, the rule is way too heavy on the servers for now. --Dirk Beetstra T C 16:34, 20 March 2009 (UTC)
  • Are the policies clear enough about when myspace and youtube links are appropriate? I still see people who always call YouTube links spam despite the fact that official videos from, say, Oprah Winfrey are not in any way a copyright violation and obviously suitable to link to in her article. Blogs are the most abused. People rarely check if they might be written by an expert. - Mgm|(talk) 23:06, 20 March 2009 (UTC)

Removing {{reflist}}, {{reflist|<number>}} and <references/>

Can anyone think of a valid reason to remove these tags from an article, other than copyvio tagging (with the template that requires complete article removal) or redirecting? If not, it might be a good idea to split this off from my other references filter (filter 61) and give it a specified warning message or disallow. While there are several reasons to remove references themselves, the reasons to remove these tags are a lot less numerous and easier to code. Am I missing a glaringly obvious reason? - Mgm|(talk) 12:13, 20 March 2009 (UTC)

Makes sense: as long as there are <ref> tags in the article, these templates should not be deleted. The other way around could also be done: if the editor adds <ref>, but there is not one of these templates -> give a warning. --Dirk Beetstra T C 12:25, 20 March 2009 (UTC)
In the cases where there are no <ref> tags, the problem is usually the empty references section header. The template itself wouldn't show anything and it doesn't hurt to keep it around until an article is referenced, does it? - Mgm|(talk) 12:30, 20 March 2009 (UTC)
Duplicate reflist and references tags need to be removed sometimes. Disallowing will create more problems than it will solve. Ruslik (talk) 12:27, 20 March 2009 (UTC)
It's easy to check for duplicates or to let the filter ignore it if there's still such a tag in the resulting page. I'd only be disallowing instances where no such templates would be left after the edit. - Mgm|(talk) 12:30, 20 March 2009 (UTC)
Such a filter will be expensive—it will need to check the entire text of the article. Ruslik (talk) 12:34, 20 March 2009 (UTC)
That is not a reason why we should/could not have such a filter, but merely why it should not be enabled at the moment (until the performance of the system is good enough to do these things). --Dirk Beetstra T C 12:40, 20 March 2009 (UTC)
  • Or we could simply encode that in the warning: "If you are removing a duplicate, feel free to hit the save button." In the time this filter has been running I have yet to see a single example of duplicates of the final reference collection tag. The number of false positives would be extremely small. - Mgm|(talk) 12:44, 20 March 2009 (UTC)
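A minimal sketch of the condition being discussed, using the standard filter variables removed_lines and new_wikitext (the tag list and case handling are illustrative and would need tuning; this is not an actual filter). The check that no such template remains after the edit addresses the duplicate concern above:

    !("autoconfirmed" in user_groups) &     /* new users only */
    ("{{reflist" in removed_lines |
     "<references" in removed_lines) &      /* a reference tag was removed */
    !("{{reflist" in new_wikitext) &        /* and none remains afterwards, */
    !("<references" in new_wikitext)        /* so removing duplicates is ignored */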
Comment: While removing these tags may be a worthwhile housekeeping project, I'm not seeing anything in this thread that indicates it is an abuse issue. Risker (talk) 13:04, 20 March 2009 (UTC)
  • It is abuse because it makes the article look like it's inadequately referenced when it's really not. It's plain disruption. - Mgm|(talk) 13:13, 20 March 2009 (UTC)
    Yea, it's a common target for blanking (vandals like to chop off the bottom bits of the article). Tag or warn only for now I would say. –xeno (talk) 13:22, 20 March 2009 (UTC)
  • I'd be running it in log only mode first for a couple of days before thinking about adding a warning message. If after a couple of months there's very few to no false positives, I might think about having this set to disallow, but that's not relevant at the moment. I'm trying for a warning message if this filter is confirmed to work properly. (I'll code and start running it when I have the time to observe the log). - Mgm|(talk) 13:35, 20 March 2009 (UTC)
  • It does bring up an interesting question: does the abuse filter have to be used for abuse only, or can we also consider using it in a positive way: "hey, thanks for the reference, good work, just as a note, there is no {{reflist}} in the article, please don't forget to put it in a section at the bottom"? --Dirk Beetstra T C 16:27, 20 March 2009 (UTC)
  • It's something we can't get around. Almost every unconstructive edit has potentially good intentions -- they're linked. If providing a warning improves a user's knowledge of the system and allows them to make the intended change while avoiding the unconstructive part of their edit, it improves Wikipedia, which is always a good thing. I believe that "abuse" should be defined widely and include all sorts of edits that are generally considered bad. - Mgm|(talk) 17:19, 20 March 2009 (UTC)
  • I like the basis of the thought, though calling it a bad edit or abuse when someone simply doesn't know that a {{reflist}} also has to be inserted goes a bit far. Does the warning screen you get say that what the editor did is abuse? Does it have a negative tone? Or does it leave all options open? --Dirk Beetstra T C 17:42, 20 March 2009 (UTC)
  • Forgetting to add such a tag has no effect; effectively removing it does trigger the filter, because that is what's harmful. I don't have a warning template yet, but when I write it, I'm obviously going to do my best to give it a positive tone. - Mgm|(talk) 23:02, 20 March 2009 (UTC)

Which warning message?

What warning message will be shown if an edit trips multiple filters that carry one? - Mgm|(talk) 12:13, 20 March 2009 (UTC)

The first one in the filter list only. Dragons flight (talk) 14:39, 20 March 2009 (UTC)

Peak

Earlier today, there was a peak that resulted in various filters running for 1-2 seconds, and some even for ridiculous times like 13 seconds. Can anyone track down what the cause of this was? - Mgm|(talk) 12:41, 20 March 2009 (UTC)

There are several possible issues. Being sent a really large edit is one, as some filters will chew for a long time. Another is having the job multitasked on a server that is temporarily overloaded (so all threads are running slow.) One could also imagine being hit by temporarily high database latencies. If a filter is giving a really high number, I'd recommend looking at the simpler filters. If they also have really high numbers then it is probably a temporary issue. Dragons flight (talk) 19:32, 20 March 2009 (UTC)

AND and OR

A & B | C evaluates left to right and is implicitly (A & B) | C. I've seen several examples where someone wrote A & B | C when they clearly meant A & (B | C). Please be careful about this. Dragons flight (talk) 23:15, 20 March 2009 (UTC)
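For example, with hypothetical conditions (the "badword" strings are made up for illustration; summary and added_lines are standard filter variables):

    /* Evaluated left to right, this is ((not autoconfirmed) & summary check) | text check,
       so the last condition also fires for established users: */
    !("autoconfirmed" in user_groups) & "badword" in summary | "badword" in added_lines

    /* What was probably meant, restricting both checks to new users: */
    !("autoconfirmed" in user_groups) & ("badword" in summary | "badword" in added_lines)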

Adding Disallow on Filter 3

The blanking filter triggers if a new user removes the entire contents of a page except for at most 50 characters (and isn't creating a redirect). I have reviewed 50 entries where a user resubmitted a blanked page even after being warned. Of these I found 48 that appeared to be vandalism. Most of the time an abusive or silly short message was left behind or left in the edit summary. For a few cases, the page was simply blanked with no explanation, and after looking at the page contents I saw no reason to believe it ought to be removed.

The remaining 2 confirmed submissions were essentially deletion requests. One was a self-blank by the author (effectively csd g7). The other attempt at deletion was for an article that may be a hoax, which I have now sent to AFD. In both of these cases the page blanking was initially reverted, which is the likely outcome when a new user blanks a page without any explanation. (In fact, most blanked pages are reverted by ClueBot with no intervention anyway.)

I'd like to suggest that rather than allowing these blankings, which are mostly vandalism, we set filter 3 to Disallow and craft a message that explains in simple terms how to correctly get a page deleted, and where to get help if they need it. This would A) cut down on vandalism (96% of the passed edits in my sample), and more importantly, B) help us avoid blanking-reverting cycles on pages that really do need to be deleted by giving the people who are legitimately seeking a deletion the tools to accomplish that. Right now MediaWiki:Abusefilter-warning-blanking says only a little bit about deletion, but I think it could be the main thrust of the message. Dragons flight (talk) 23:41, 19 March 2009 (UTC)
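For reference, the trigger described above corresponds to a condition roughly like the following (a sketch only, using the standard filter variables and assuming the rlike regex operator; the real filter 3 may differ in its details):

    !("autoconfirmed" in user_groups) &            /* new or anonymous users only */
    article_namespace == 0 &
    old_size > new_size &                          /* the page actually shrank */
    new_size <= 50 &                               /* at most 50 characters remain */
    !(lcase(new_wikitext) rlike "^\s*#redirect")   /* and no redirect is being created */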

Since blanking a page that you are the sole contributor to is a form of CSD, is there no way to make it not trigger if the person blanking is the sole author? Chillum 23:48, 19 March 2009 (UTC)
Actually the filter does ignore it if the person is the sole author. In the case mentioned above, someone else had tagged the text with a message box. From the software's point of view there were then two editors, even though only one was really an author. Dragons flight (talk) 23:51, 19 March 2009 (UTC)
Hmmm. I see. Chillum 23:56, 19 March 2009 (UTC)
  • As long as the template they get served explains all eventualities and gives clear instructions on how to get help, I support setting this one to disallow. Simply explaining db-g7 helps here. What about new users blanking pages with copyvios? Is it ignored if they put up the correct "this is a copyvio" template with an http:// link? - Mgm|(talk) 23:58, 19 March 2009 (UTC)
Didn't we have a case (or two) of BLP subjects blanking their own articles? I don't think it would be a good idea to prevent them from doing that, even if it is inappropriate in the end. And does the filter consider whether an edit summary (possibly explaining the blanking) is being used? --Conti| 00:07, 20 March 2009 (UTC)
We don't want them blanking the page (which often goes directly into an edit war, leading to more anguish for the living person); we want them to ask for the page to be deleted in a clear manner, even if that is just by pointing them at the help desk. I would assume that if someone is concerned about their own article, they would stop and read a message box when presented with it. No, the filter does not currently consider edit summaries; do you have an idea for how it could usefully do so? Dragons flight (talk) 00:15, 20 March 2009 (UTC)
I agree that we don't want them blanking the page, but what we want and what people do isn't always the same thing. :) My point was that if we have an angry BLP subject, a human should deal with it, and not a bot or a part of the software. We don't want to WP:BITE newbies, and we especially don't want to bite BLP subjects. Hmm, apart from excluding any blanking where an edit summary is used at all, I'm not really having a good idea there. Maybe we should have a look at what kinds of edit summaries are used in page blankings by vandals. I would imagine that they're either not used at all, or that some of the terms in Special:AbuseFilter/52 are used. --Conti| 00:31, 20 March 2009 (UTC)
  • I don't see how serving BLP subjects with an automated message is in any way bitey. As long as it properly explains what the proper alternative to blanking is, it is helping rather than biting them. (Besides, all log entries should be checked by humans. It's too new a feature to rely on it working completely faultlessly) - Mgm|(talk) 08:37, 20 March 2009 (UTC)
  • I just found this edit in the log. It triggered filter 3 and supposedly resulted in a warning and a tag. I can't find any trace of the edit existing. Does that mean the user decided not to follow through after reading the warning? We should have a way to distinguish between edits that were attempted and actually made. - Mgm|(talk) 00:25, 20 March 2009 (UTC)
    • Any entry that says Warn means the edit was stopped to issue a warning. In addition to the warning message, the user is given an edit box and a chance to fix the problem, abandon the edit entirely, or continue as is. If they continue, and the issue that triggered the filter still exists, there would be a second entry in the log with a different action statement (in this case "tag"). A substantial fraction of the edits that trigger Filter 3 are actually abandoned with just a warning. As noted above, I looked at a sample of 50 that chose to continue. Dragons flight (talk) 00:32, 20 March 2009 (UTC)
  • Should we have someone collect statistics on how effective warnings are in causing edits to be abandoned? - Mgm|(talk) 23:09, 20 March 2009 (UTC)

Messaging

If we are going to disallow page blanking in favor of a deletion-oriented message, I'm going to take a stab at writing one. My original draft tried to explain things like {{db-author}} and {{copyvio}}, but that ended up being both very long and very opaque. So instead I opted for a simple text that encourages people to ask for help. The downside is that helpers may end up hating me for the extra work load :-). Please try out the key link, as I am using pre-filled information to further the messaging.

So what do people think? Does this seem like a reasonable response when someone blanks an article? Dragons flight (talk) 07:17, 20 March 2009 (UTC)

I think that is a very well worded message, good work. Just a question, does the term "new users" include IPs? Oli OR Pyfan! 07:47, 20 March 2009 (UTC)
The logic includes IPs, we could say "new or anonymous users" or something like that, but that felt a little overcomplicated to my ears. Dragons flight (talk) 07:51, 20 March 2009 (UTC)
Hmmmm, maybe "new or logged-out", "new or unregistered", ... Dragons flight (talk) 07:54, 20 March 2009 (UTC)
I agree with "new or unregistered". Oli OR Pyfan! 08:18, 20 March 2009 (UTC)
  • I believe we should expand this message with improved wording on deletion instead. The message originally suggested in this particular section does not explain to a new user any alternatives besides deletion, which might be equally or even more valid. - Mgm|(talk) 08:42, 20 March 2009 (UTC)
  • So suggest something in between. I will say though, I've yet to notice a log entry where I thought reverting vandalism or redirecting a duplicate was actually the right course of action. In addition to being rare, they are somewhat tricky for a true newbie to grasp, so I'd definitely demote those. Editing might deserve more attention though. Dragons flight (talk) 08:57, 20 March 2009 (UTC)
    Maybe we could merge the two messages together, for example I think there should be something in the message about test edits but I also agree with Mgm's view that the message should incorporate other options as well as deletion. Oli OR Pyfan! 09:14, 20 March 2009 (UTC)
    I strongly suggest that "If you feel..." be replaced with "If you believe...". That is better English and doesn't make it seem like we encourage users to act on emotional whims. Indianopilot (talk) 10:26, 21 March 2009 (UTC)

    Specific cases versus wide scale

    I see now that most of the rules are quite general, broad rules which are normally and easily caught by the vandalism bots. If I see it correctly, none of the rules is for specific vandalism, which is difficult to program into the antivandalism bots. Not that I want to say that the first type of rules are not good (really, we should also have these), but why do we not also try some specific cases of vandalism, where specific editors are 'attacking' a small set of pages, or where a small set of editors (a subset of IPs?) is doing something we dislike? The application of this filter is way less bitey than general page protection or link blacklisting, and way less work than blocking every new sock we encounter. --Dirk Beetstra T C 11:31, 20 March 2009 (UTC)

    I am trying to do this with filter 37. Ruslik (talk) 13:22, 21 March 2009 (UTC)

    -modify and -private

    Hi there, I'm just trying to get my head around all this AF stuff at the moment. From what I gather, -modify is currently a standalone permission and -private doesn't exist yet. If I'm not mistaken, -private is going to be made an Oversighter/Checkuser privilege, but could -modify become a Sysop privilege (just like ipblock-exempt and rollback are subsets of Sysop)? I think this is especially appropriate given that Sysops can -revert but not -modify at present, if Special:ListGroupRights hasn't betrayed me.

    For the time being, however, can we just go ahead and give ourselves AF editor privileges without it being considered an abuse of tools or what have you, or is there already an RfX for it? It Is Me Here t / c 11:02, 21 March 2009 (UTC)

    Many (including me) did exactly this. Ruslik (talk) 12:31, 21 March 2009 (UTC)
    OK, I'll do that, then. What are people's thoughts about making Abuse Filter editor a subset of Sysop if we are only going to grant the right to Sysops? It Is Me Here t / c 12:54, 21 March 2009 (UTC)
    The jury is still out on that one. Depending on how often we sysops royally screw things up, it could receive tighter restrictions, or, consensus could create some process whereby non-sysops could receive it. –xeno (talk) 13:19, 21 March 2009 (UTC)
    My tally so far is that admins, collectively, have caused major user-visible issues three times in filter writing. Dragons flight (talk) 19:59, 21 March 2009 (UTC)
    I see. But am I right in thinking that -private is unrelated to viewing private filters? Also, could private filters receive normal talk pages (just make them hidden from non-Sysops or non-editors)? Those little talk boxes are not as useful in determining who said what as normal talk pages, in my opinion. It Is Me Here t / c 13:49, 21 March 2009 (UTC)
    Yes, private filters are viewable by anyone with the -modify right. See the discussion above (#Requesting_.22abusefilter-private.22_right) for more on the -private right. Hidden talk pages aren't possible (I suppose if we deleted it after every message we left, this would be sub-optimal), but people do tend to sign their comments in the abuse filter notes. –xeno (talk) 13:56, 21 March 2009 (UTC)
    How about having the system create Special:AbuseFilter/#/Talk every time a new filter is made, whose protection status is the same as its master page's? It Is Me Here t / c 14:12, 21 March 2009 (UTC)
    Seems redundant to the notes field, but that's jmo. –xeno (talk) 14:17, 21 March 2009 (UTC)
    Protecting it doesn't make it unreadable by others. It would still end up being a public discussion of a private filter. Mr.Z-man 15:55, 21 March 2009 (UTC)
    ← Sorry, I didn't mean "protect" as in WP:PROTECT, but rather as in the Public/Private tag. Hence, the /talk subpage would also be private if the filter page was, thus making it inaccessible by those without proper access. That is, unless you can't put "normal" (i.e. editable) pages in the Special: namespace? It Is Me Here t / c 18:02, 21 March 2009 (UTC)
    I'm pretty sure we wouldn't be able to keep a normal wiki page (with page history and the ability to sign with four tildes, and the like) within the Special:AbuseFilter as you desire. At least, not without some further wizardry. –xeno (talk) 22:40, 21 March 2009 (UTC)

    Batch testing fixed

    The batch testing interface that has been broken for a couple days should be working again. Dragons flight (talk) 03:19, 22 March 2009 (UTC)

    -modify right (moved from AN)

    Hi all,

    After about six months' waiting, I've finally activated the AbuseFilter extension on enwiki!

    In brief, the Abuse Filter allows automated heuristics to be run against every edit. It's designed as an anti-vandalism tool for very simple and/or pattern based vandalism.

    PLEASE do not activate a filter with any action other than flagging without testing it first with just flagging enabled. — Werdna • talk 23:36, 17 March 2009 (UTC)

    It's finally here! :D — neuro(talk)(review) 00:09, 18 March 2009 (UTC)
    Wonderful! One thing we need to do now is hammer out some kind of a guideline as to when admins are allowed to give out the "abusefilter" flag to users. — Jake Wartenberg 00:23, 18 March 2009 (UTC)
    Right. That would be a marvelous idea. Synergy 00:30, 18 March 2009 (UTC)
    Awesome, this is great. Cirt (talk) 00:35, 18 March 2009 (UTC)
    Hooray! shoy (reactions) 01:37, 18 March 2009 (UTC)
    Yay! Lets do this as quickly as possible. I'm waiting for it. Techman224Talk 03:24, 18 March 2009 (UTC)

    Giving non-admins the ability to use this could create a dangerous escalation of privileges. Esp. as this can be used to block editors. Though I'm told there are ways to restrict what certain people can do with AbuseFilter. --MZMcBride (talk) 03:39, 18 March 2009 (UTC)

    Agree with MZMcBride (talk · contribs). Cirt (talk) 03:41, 18 March 2009 (UTC)
    Agree, also it could be used to pseudo-protect pages (by denying any edit to a certain page). MBisanz talk 04:16, 18 March 2009 (UTC)
    As currently implemented, AbuseFilter cannot block users; it can only take away their autoconfirmed status. And using the extension to pseudo-protect a page would be a clear abuse of the tool, and access to it would be revoked immediately. So I see no reason not to grant this permission to trusted members of the community, especially as many of the filters cannot even be viewed without it. Social, as well as technical, restrictions need to be considered here. — Jake Wartenberg 04:27, 18 March 2009 (UTC)
    Seems too much of a risk to allow for potential abuse of the AbuseFilter. Cirt (talk) 04:30, 18 March 2009 (UTC)
    To be fair, many admins shouldn't have access to it either, given the possibility of false positives. --NE2 04:38, 18 March 2009 (UTC)
    Maybe we need an AbuseFilterFilter to prevent abuse of the AbuseFilter. Cirt (talk) 04:41, 18 March 2009 (UTC)
    Now thats an idea! :). Oli OR Pyfan! 09:44, 18 March 2009 (UTC)
    but how can we stop people abusing it?--Jac16888Talk 15:29, 18 March 2009 (UTC)
    (edit conflict)x2 It should be pretty easy to tell if a user is going to use this for deliberate disruption. The bar for getting this flag should be rather high, but there are knowledgeable, trusted users that are not sysops that can help out in this area. I think that there is a reason we created yet another flag for this, rather than just bundling it in with +sysop. The AbuseFilter guideline says "anybody in reasonably good standing may request the appropriate permissions on Wikipedia", though I am not sure what that is intended to mean. — Jake Wartenberg 04:42, 18 March 2009 (UTC)

    What's the project page for this, for requests for filters/filter reports/discussion of things to do with it? rootology (C)(T) 05:53, 18 March 2009 (UTC)

    Wikipedia:Abuse filter (talk). --MZMcBride (talk) 05:56, 18 March 2009 (UTC)
    Hmm, shouldn't there also be some kind of noticeboard/discussion page where the actual filters can be discussed, new ones proposed, etc.? --Conti| 11:28, 18 March 2009 (UTC)

    We can restrict certain actions to administrators while still allowing non-admins in the abusefilter group to modify filters. None of these actions are available yet. When we have some better performance data on the abuse filter, we'll consider whether we want to activate some of the harsher actions. — Werdna • talk 08:28, 18 March 2009 (UTC)

    Roux, that is exactly the problem—we don't have a policy for giving this to non-admins yet. Ideally, I think, this would be a WP:PERM thing, but the criteria for getting the flag will have to be quite a bit higher than rollback or accountcreator, as if deliberately abused it can cause quite a bit of disruption. Perhaps we want there to be a three day waiting period on the requests so that other people can raise objections before it can be granted by a single admin? — Jake Wartenberg 12:55, 18 March 2009 (UTC)
    • I think at present it should probably be restricted to admins until we have some time on the system and see how it performs in the wild. Kudos to everyone involved in getting this thing up and running. –xeno (talk) 12:57, 18 March 2009 (UTC)
    I am not sure anyone should have it until they demonstrate a technical understanding of the features. I for one will not touch it till I have time to thoroughly read the manual. Chillum 15:15, 18 March 2009 (UTC)
    I thoroughly agree. — Jake Wartenberg 15:32, 18 March 2009 (UTC)

    A parallel discussion seems to be happening over here. It would be nice to keep this in one place. — Jake Wartenberg 15:48, 18 March 2009 (UTC)

    The discussion should be held there, so as to not be lost in the annals of AN archives. –xeno (talk) 15:54, 18 March 2009 (UTC)
    As you can see, the thread has been moved. I hope this is OK. — Jake Wartenberg 16:53, 18 March 2009 (UTC)
    Considering the disruption that a poorly-written filter could cause, I don't think this should be an easy flag to get. I'm not a fan of people wanting to poke things around simply out of curiosity. Tim Vickers (talk) 18:35, 18 March 2009 (UTC)

    I'd be very much in favour of trusted users (like myself... :p ) - perhaps the rollbacker group, perhaps a little stricter than that - being given the right. We're no more foolhardy than admins, would realise that if we cause chaos then we'll be in trouble (!) and on my own count, I'd like to be able to see all filters, all settings, test etc. I do have a limited amount of programming experience, sufficient to confirm that I won't muck anything up, anyway. ╟─TreasuryTagcontribs─╢ 18:38, 18 March 2009 (UTC)

    Make it blue then? –xeno (talk) 18:43, 18 March 2009 (UTC)
    Well, not out of the question, but easier said than done!! ╟─TreasuryTagcontribs─╢ 18:44, 18 March 2009 (UTC)

    I don't think the issue here is that people will make mistakes. Admins can make mistakes just as easily as anyone else. The issue is, the Abuse Filter has the ability to affect actions across the entire site. If someone wrote a bad (malicious) filter, it wouldn't take long for people to figure it out, but even if it took only 30 seconds to fix (ludicrously short, IMO), the speed at which Wikipedia runs means that around a hundred people would be affected. Administrators are, through their access to the MediaWiki namespace and by extension the various blacklists, implicitly trusted by the community to make actions that have an effect on the entire site. Non-admins may or may not be trusted, and in any case, the trust is not "formalized", so to speak.
    That said, I am not opposed to allowing non-admins to create filters that perform some of the less drastic functions, but the right should not be nearly as easy to get as rollback. Maybe we could make it so that two or three administrators have to agree to give the right out. J.delanoygabsadds 19:02, 18 March 2009 (UTC)

    As an alternative, or at the very least a temporary measure till we work out a better system, we could just allow users to propose/suggest filters somewhere and have admins add them. --Jac16888Talk 19:06, 18 March 2009 (UTC)
    Of course this will always be possible. Probably using {{editprotected}} on this page. — Jake Wartenberg 19:29, 18 March 2009 (UTC)
    See Wikipedia:Abuse filter/Requested. –xeno (talk) 19:32, 18 March 2009 (UTC)
    That's what I meant, a specific request page; that one wasn't linked to anywhere --Jac16888Talk 19:35, 18 March 2009 (UTC)
    Yea, it's brand new... Linked from {{Filternav}} –xeno (talk) 19:39, 18 March 2009 (UTC)

    Here, it says that "anyone in reasonably good standing can request the appropriate permissions"... are we going to stick to that, or does it need to come out? ╟─TreasuryTagcontribs─╢ 19:49, 18 March 2009 (UTC)

    Removed for now. –xeno (talk) 19:57, 18 March 2009 (UTC)
    There is also the part in Wikipedia:Abuse filter/Requested that says "only trusted editors in good standing will be given the ability to create new filters". Might wanna zap that too, to avoid confusion. — Jake Wartenberg 21:31, 18 March 2009 (UTC)
    Tweaked, thanks. –xeno (talk) 21:55, 18 March 2009 (UTC)

    Now what?

    Ok, I think that things are starting to stabilize to some degree, although I can't really know as I can't see most of the filters. *Glares pointedly* So I think we have a pretty good idea of what we are dealing with now. The three main ideas I seem to recall are:

    • Keep this for admins only. If we do this, it may well make sense to bundle this with +sysop, rather than keeping it as a separate right. No matter what we do, it probably makes sense to assign all sysops -modify, so they don't need the separate flag.
    • There will of course be people that want to wait longer, and see what happens.
    • Assign this at WP:PERM, like rollback and accountcreator. This is the idea I like the most. Because of the potential for disruption, the bar will have to be way higher than with either of those other permissions. We may want to have more than one admin agree before the flag is added, or have a waiting period during which editors can comment on requests.

    Besides the technical matter of giving people the rights, there is the question of what we let people do with 'em. If we ever turn on blocking of users, we may well want to restrict substantive edits to these filters by non admins. But this restriction could be a social one, and not a technical one.

    One of the most important things about this right is that it lets users see the private filters. The fact that we have these at all is kinda un-wiki, although probably a necessary evil. But we don't need to widen the gap between admins and non-admins any further. We may even want to consider asking for the ability to see private filters to be unbundled from the ability to modify them.

    So where do we wanna go now? — Jake Wartenberg 05:31, 22 March 2009 (UTC)

    If we give it to non-admins (which I'm not sure I like the idea of), it would definitely need to be a more stringent process than WP:PERM. I don't think a simple "lack of objections" would suffice for this. A full-RFA-like process wouldn't really be necessary, but there would need to be some way to gauge community trust/competency. Mr.Z-man 05:54, 22 March 2009 (UTC)
    I don't have any problem with granting all admins the right to see private filters etc. Letting people edit filters also has a component of technical competency which is independent of sysop trust. That said, we can probably deal with that socially rather than technically (at least I hope so with the admins). Simply discouraging people from mucking around with things before they understand them will hopefully avoid most of the bigger kinds of mess ups. And I would encourage people to ask questions. Hopefully we'll even improve the documentation eventually too.  :-) On the other point, there are some established bot operators who would probably have greater technical competency and ability to make thoughtful and productive use of the filters than most admins. I would be open to granting abuse filter rights to a few people who have demonstrated strong technical involvement even if they aren't admins. Doing so should involve a thoughtful discussion about trust and competency though. Dragons flight (talk) 07:09, 22 March 2009 (UTC)
    While I am not necessarily opposed to granting this right to non-admins, I think that if I would trust somebody with this permission then I would probably also trust them with +sysop. Master&Expert (Talk) 16:42, 22 March 2009 (UTC)
    I certainly agree with you on that point. But we don't live in an ideal world :|Jake Wartenberg 18:47, 22 March 2009 (UTC)

    Base page name

    Is there any way to get the base page name for use in a filter? For example, for User talk:Od Mishehu/Archive1, it would return "Od Mishehu"? עוד מישהו Od Mishehu 17:30, 21 March 2009 (UTC)

    Not presently, though one can do things like detecting the "/", with related logic that covers some cases. Dragons flight (talk) 23:44, 21 March 2009 (UTC)
    How can I use this to check if the user who edits a particular user/user talk subpage is, in fact, the owner of the userspace? עוד מישהו Od Mishehu 05:21, 22 March 2009 (UTC)
    "user_name in article_text". "X in Y" checks if X is a substr of Y. "user_name" is the user's name and "article_text" is (somewhat confusingly) the name of the page without the namespace, so "Od Mishehu/Archive1" in your example. Dragons flight (talk) 06:54, 22 March 2009 (UTC)
    I was actually looking into this same question. "X in Y" would false-positive on user_name "J" and pages under User:Jimbo, for example. "(user_name + '/') in article_text" would still false-positive when user "Smith" tried to edit "John Smith"'s subpages (or subpages of User:Jimbo/Smith, for that matter). "article_text like (user_name + '/*')" would almost work, but could false-positive for any user with a "?" or "*" in their name.
    Besides a #titleparts-like function, a "startswith" comparison could work for this case, as could "substr" or a "quote_glob" (or "quote_regex") function. Anomie 15:48, 22 March 2009 (UTC)
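    To illustrate the false positives described above with hypothetical names:

        user_name + "/" in article_text
        /* matches user "Smith" editing "John Smith/Archive1",
           since "Smith/" is a substring of "John Smith/Archive1" */

        article_text like (user_name + "/*")
        /* anchors the match at the start of the title, but glob
           metacharacters such as "?" or "*" in user_name are not
           escaped, so such user names would still misfire */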

    Email address filter

    The goal of Special:AbuseFilter/76 is to provide a warning to new users who try to add an email address into pages. Most such email addresses are bad and would get removed by other editors when they are noticed anyway. Since there are at least a few cases where adding an email address is okay (e.g. @wikimedia.org), it may need to remain a warning only, i.e. a user could still submit the email after the warning. However it appears that User:XLinkBot is specifically approved for reverting all emails added to mainspace, so that is something to consider.

    The question arises: what is the right way to detect an email address?

    • Last night I added: "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b", which literally came from Googling for suggestions.
    • OverlordQ changed it to: "([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})"
    • Beetstra tells me XLinkBot uses: "(?<![^\s:])[^\s\]\[\{\}\\\|^\/`<>@:]+@\w+(?!\.htm)(?:\.\w+){1,3}"

    So, let's open it up for discussion. What is the right magic regex to have a very high efficiency of detection and a very low false positive rate? It probably isn't terrible if we let a few valid emails through, so personally, I'd prefer to make the error rate as low as possible. Dragons flight (talk) 23:18, 21 March 2009 (UTC)

    P.S. I think my default reaction is something like: "Let's use what the existing bot uses, since that seems to work.  ;-)". Dragons flight (talk) 23:29, 21 March 2009 (UTC)
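    For context, a regex like the first one would slot into the filter roughly as follows (a sketch, assuming the rlike regex operator; the wikimedia.org exclusion mentioned further below is the only one shown):

        !("autoconfirmed" in user_groups) &
        added_lines rlike "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b" &
        !(added_lines rlike "@(lists\.)?wikimedia\.org")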

    I am trying to analyse:

    Caught:

    • 22:09, 21 March 2009: WATerian (talk | contribs | block) triggered filter 76, performing the action "edit" on User talk:Boghog2. Actions taken: Warn; Filter description: Adding email address (details) (examine)
      • NickGracey@Gmail.com (not parsed by the linkwatchers, wrong namespace)
    • 21:51, 21 March 2009: 41.204.224.18 (talk | block) triggered filter 76, performing the action "edit" on User:Chidi ugwu. Actions taken: Warn; Filter description: Adding email address (details) (examine)
      • extraordinary.chidi@yahoo.com (parsed by the linkwatchers, not reverted, XLinkBot does not revert outside mainspace)
    • 21:50, 21 March 2009: Music roxs (talk | contribs | block) triggered filter 76, performing the action "edit" on Fulston Manor School. Actions taken: Warn; Filter description: Adding email address (details) (examine)
      • mail@fulstonmanor.kent.sch.uk (edit not performed, hence not seen by the linkwatchers)
    • 21:48, 21 March 2009: 41.204.224.18 (talk | block) triggered filter 76, performing the action "edit" on User:Chidi ugwu. Actions taken: none; Filter description: Adding email address (details) (examine)
      • extraordinary.chidi@yahoo.com (not performed, not parsed by the linkwatchers)
    • 21:48, 21 March 2009: 41.204.224.18 (talk | block) triggered filter 76, performing the action "edit" on User:Chidi ugwu. Actions taken: Warn; Filter description: Adding email address (details) (examine)
      • extraordinary.chidi@yahoo.com (not performed, not parsed by the linkwatchers)
    • 21:21, 21 March 2009: BurmaCampaignJapan (talk | contribs | block) triggered filter 76, performing the action "edit" on User:BurmaCampaignJapan. Actions taken: none; Filter description: Adding email address (details) (examine)
      • info@burmacampaign.net (not reported on IRC (flooding), parsed by the linkwatchers, not reverted by XLinkBot (not allowed in userspace))
    • 21:19, 21 March 2009: BurmaCampaignJapan (talk | contribs | block) triggered filter 76, performing the action "edit" on User:BurmaCampaignJapan. Actions taken: Warn; Filter description: Adding email address (details) (examine)
      • info@burmacampaign.net (not performed)
    • 20:40, 21 March 2009: 209.33.110.109 (talk | block) triggered filter 76, performing the action "edit" on Toby Keith. Actions taken: none; Filter description: Adding email address (details) (examine)
      • apparently added toby_keith_lover@hotmail.com, not warned on this edit, but edit not performed.
    • 19:41, 21 March 2009: Xyttz1 (talk | contribs | block) triggered filter 76, performing the action "edit" on Talk:RuneScape. Actions taken: none; Filter description: Adding email address (details) (examine)
      • mmccombs96@yahoo.com (talkspace, not seen by the linkwatchers, performed anyway)
    • 19:40, 21 March 2009: Xyttz1 (talk | contribs | block) triggered filter 76, performing the action "edit" on Talk:RuneScape. Actions taken: Warn; Filter description: Adding email address (details) (examine)
      • mmccombs96@yahoo.com (talkspace, not seen by the linkwatchers, warned, not performed)
    • 19:11, 21 March 2009: Nikhiltej777 (talk | contribs | block) triggered filter 76, performing the action "edit" on User:Nikhiltej777. Actions taken: none; Filter description: Adding email address (details) (examine)
      • nikhiltej777@yahoo.com, parsed but not reported on IRC, not reverted by XLinkBot (outside space)
    • 19:09, 21 March 2009: 98.116.74.51 (talk | block) triggered filter 76, performing the action "edit" on Captain Underpants. Actions taken: none; Filter description: Adding email address (details) (examine)
      • thomas97@yahoo.com (parsed but not reported on-IRC)

    LinkWatcher catches:

    (The edit on Chidi ugwu was not reported on IRC, though was parsed, probably the bot flooded just before that. XLinkbot does not revert in userspace, the linkwatcher only parses main, user, template and category namespace).

    (?<![^\s:])[^\s\]\[\{\}\\\|^\/`<>@:]+@\w+(?!\.htm)(?:\.\w+){1,3} is a better regex for email addresses (excluding the stuff in urls that look like email addresses). Still, I know a case of a CD/DVD title which looks like an email address (I think it was something like 'artist@wembley.2007'), which was reverted by XLinkBot (it is the only mistaken revert I know of ..). Besides this, there are some things that need to be filtered. Some places on wiki use 'username@en.wikipedia' as a way of describing who on which wiki. This looks like an email address, but is not one.

    About the things not working in the filter:

    • \b[0-9a-zA-Z]*: [0-9a-zA-Z] is a 'word with possible digits in it', \b is superfluous, but would not block e.g. http:// before it.
    • ([-.\w]*[0-9a-zA-Z])* is catching the same as [-.\w]*[0-9a-zA-Z]*, the brackets don't do anything here
    • There are more valid characters in an email address than '0-9a-zA-Z'

    I would prefer the filter to catch this instead of XLinkBot; using the filter is a less bitey method, and probably more complete. But I think the whole thing needs some serious tweaking, and some things have to be excluded; I hope that it does not make the regex take too much time. --Dirk Beetstra T C 23:47, 21 March 2009 (UTC)

    Since edit conflict: indeed, let's see if this works better. But I am afraid the regex is taking more time .. --Dirk Beetstra T C 23:47, 21 March 2009 (UTC)

    (edit conflict) Hi!
    Actually all three regexps are very similar. The second expression is more precise than the first, and XLinkBot's regexp is more general than the other two regexps. But I guess XLinkBot's regexp still has a very low false positive rate. You should ask Beetstra whether he knows about some false positive cases. I guess there are (almost) none.
    To make an exception for wikimedia.org you could use:
    (?<![^\s:])[^\s\]\[\{\}\\\|^\/`<>@:]+@(?!wikimedia\.org)\w+(?!\.htm)(?:\.\w+){1,3}
    -- seth (talk) 23:57, 21 March 2009 (UTC)
    Actually there is already a separate exclusion clause for "@(lists\.)?wikimedia\.org", but that was just an example. I imagine there could be others, especially if one watches spaces other than mainspace. Dragons flight (talk) 00:01, 22 March 2009 (UTC)
    Beetstra, does the bot have any intelligence for things other than namespace? i.e. based on the editor or any special exemptions? Dragons flight (talk) 00:01, 22 March 2009 (UTC)
    Also, are there any valid TLDs that have numbers in them? I didn't think there were. Dragons flight (talk) 00:02, 22 March 2009 (UTC)
    There is an exclusion for '@[A-Za-z]+\.(wiki([mp]edia|books|species|source|versity|news)|wiktionary)(?!\.)' (email address part, there are others for urls; email addresses that match this regex are ignored by the linkwatchers, and hence not seen by XLinkBot), which may not be fully complete. The whole bot-system is very complex, doing a lot of checks and combined regexes, checking for duplicates etc. What it reports is almost failsafe, but not completely.
    I am not sure if there are any TLDs which contain only numbers, but @S0mewhere.com should be valid, and @admins.S0mewhere.com maybe as well. And I have seen strange tricks to get around these things (will not stuff beans), but as this is never going to be a forbid-edit filter, but at most a warn filter, those rare ways of getting around should be fine (I will talk about the very strange ways to get around some systems with Werdna in PM if necessary). --Dirk Beetstra T C 00:14, 22 March 2009 (UTC)
    Question in return: why did the ones I have reported here as being seen by EnLinkWatcher2 not trip the filter? I don't really see the reason, and my off-wiki testing shows that the regex should have caught 6 out of 7 of them. --Dirk Beetstra T C 00:16, 22 March 2009 (UTC)
    The third and fourth are autoconfirmed and so were definitely excluded for that reason. Dragons flight (talk) 00:26, 22 March 2009 (UTC)
    I can give you a phenomenological answer on the others. Some quirk associated with "([-.\w]*[0-9a-zA-Z])*" is apparently causing the regex to abort prematurely. In other words, though it ought to match the email, the regex engine is aborting before the search gets there. I don't really know why it does that, but obviously it is an issue. Dragons flight (talk) 01:49, 22 March 2009 (UTC)
    Also, please comment out the email rule in User:XLinkBot/RevertList when enabling warn on this filter (and vice versa); we don't want double warnings here, I think. If they read the warning and still perform the edit, they should not get a second warning... --Dirk Beetstra T C 00:18, 22 March 2009 (UTC)
    I'm going to try "(?<![^\s:])[^\s\]\[\{\}\\\|^\/`<>@:]+@\w+(?!\.htm)(?:\.\w+){0,2}\.[A-Za-z]{2}" at the log level. It is the same as XLinkBot's except that I modified it to require that the last frame begin with ASCII letters, in recognition of the fact that all approved TLDs start with letters (aside from ICANN's limited experiment with UTF-8 names). That would eliminate the '.2007' error you mention above. Dragons flight (talk) 02:23, 22 March 2009 (UTC)
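    The same kind of spot check against the modified pattern shows the '.2007' case dropping out while ordinary addresses still hit (again a sketch with Python's re and illustrative strings):

        import re

        # Modified pattern: the final dotted frame must start with two ASCII letters
        pattern = re.compile(
            r"(?<![^\s:])[^\s\]\[\{\}\\\|^\/`<>@:]+@\w+(?!\.htm)(?:\.\w+){0,2}\.[A-Za-z]{2}"
        )

        print(bool(pattern.search("mail me at someone@example.com")))  # True
        print(bool(pattern.search("the DVD artist@wembley.2007")))     # False: frame starts with digits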
    <speech mode="clippy">It looks like you are trying to develop a regular expression to check email addresses. It's not as easy as you think, and most of the ones you'll find with Google are broken. May I direct you to http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html which has the simplest known regex to do the job?</speech> --Carnildo (talk) 01:08, 22 March 2009 (UTC)
    Yeah, yeah, it's hard to match emails 100% correctly, got it. I'd settle for mostly right with very few false positives. Dragons flight (talk) 01:51, 22 March 2009 (UTC)
    The one we are using in the linkwatchers seems quite OK; it does occasionally catch stuff that is not an email address (it caught an image on Commons on the Turkish wiki a couple of days ago). However, I don't know how much mine is missing! And I use programming tricks to clean up, something we can't do here. Let's see what the log does. --Dirk Beetstra T C 09:45, 22 March 2009 (UTC)
    Shouldn't this be main-namespace only (or exclude the user and user-talk namespaces)? --Conti| 20:42, 22 March 2009 (UTC)
    Right now it is log-only, which will let us get a feel for where email addresses are legitimately added, versus where a warning is appropriate. Some people, myself included, do have an email on our own user page, but we may still want to warn new users against this in nearly all cases. Dragons flight (talk) 21:52, 22 March 2009 (UTC)
    Maybe we should have a different filter for that, then, to have different warnings for different namespaces. It's probably not a good idea to use an email address in userspace, but we certainly don't forbid it. On the other hand, we should warn strongly against using email addresses in articles. --Conti| 21:59, 22 March 2009 (UTC)
    If the filter properties are the same, we could vary the message using parser functions and a {{NAMESPACE}} switch in the warning template. Of course we may decide we don't want the filters to be the same. For example, we might consider forbidding new users from adding emails to articles (which is close to what XLinkBot does since it is approved to revert all emails added to articles). Dragons flight (talk) 22:09, 22 March 2009 (UTC)

    Special:AbuseFilter/79

    {{reflist}} sometimes comes with a column parameter, like {{reflist|2}}. I'd like the regex in this new filter to include those options too, but if there's a number there's also that vertical line, which is usually an alternation operator in regexes. Will "(\|[1-9])?" read the column-number part correctly, or will it treat the | as a choice between \ and a digit? Basically, I don't know how to work the | into the expression. Any help is appreciated. (Please don't run it; I'd like to enable it myself and keep an eye on the log to avoid duplication with 61.) -- Mgm|(talk) 01:06, 22 March 2009 (UTC)

    If you don't understand how to write regular expressions then why are you writing filters? --Malleus Fatuorum 02:29, 22 March 2009 (UTC)
    \| should work. Ruslik (talk)
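
    To illustrate (a Python sketch; the escape behaves the same way in PCRE, which PHP uses), the escaped pipe is just a literal character inside the optional group:

        import re

        # \| is a literal pipe; (\|[1-9])? makes the column parameter optional
        pattern = re.compile(r"\{\{reflist(\|[1-9])?\}\}", re.IGNORECASE)

        for text in ["{{reflist}}", "{{reflist|2}}", "{{reflist|colwidth=30em}}"]:
            print(text, "->", bool(pattern.search(text)))
        # The named-parameter form fails; matching it would need a broader pattern.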

    Throttle "user" vs. IPs

    The throttle "user" setting apparently collapses all IP editors to user "0" because that's the ID used in Mediawiki. The side effect of this is that setting a "user" throttle does not differentiate between IP users. If you are throttling on an action that IP are allowed to perform, I'd recommend a "user,ip" in the throttle. Dragons flight (talk) 18:25, 22 March 2009 (UTC)

    Query regarding the "search the abuse log" function for IPs/ranges

    Because so many vandals are IP hoppers in certain ranges, some of which are more notorious than others (wink ;), shouldn't we be able to search for logs of IP ranges and not just individual IPs when we input the user? Perhaps a better question would be, why shouldn't we? ~ Troy (talk) 03:30, 23 March 2009 (UTC) By the way, here's a simple example of a guy hopping IPs to circumvent the filter (not that you needed to see that, because it's already such a widespread problem). ~ Troy (talk) 03:33, 23 March 2009 (UTC)

    Microvandalism

    I've been asked how we can spot changes of dates, name, number, formulae, etc. A lot of vandalism is just such tiny changes. Will this help? Thanks. dougweller (talk) 19:12, 21 March 2009 (UTC)

    I don't think the filter can tell the difference between misinformation and a correction. Chillum 19:14, 21 March 2009 (UTC)
    I've just thought of something regarding this. How about we create <verified> ... </verified> tags which would serve this purpose? The tags would mean that a date had been verified by, say, at least two independent sources. The filter could be set up so that only a Sysop could add <verified> to a page, and the markup would work so that <verified>20 March</verified> would just display as "20 March". Now, there would be an approval process for verifying dates, similar to the current GA or FA reviews. Once a date had been verified (or, basically, any block of text had been placed within <verified> ... </verified> tags), another Abuse Filter would prevent anyone other than a Sysop from editing anything within those tags; essentially, fully protecting little areas of an article which there is no real reason to edit.
    Thoughts? It Is Me Here t / c 19:41, 21 March 2009 (UTC)
    I can't think of any reliable way to make the filter do this. Probably would not be too hard to write a separate Mediawiki extension to do something like this but that's a different issue. Dragons flight (talk) 21:22, 21 March 2009 (UTC)
    I'd be concerned about starting to allow admins to lock down individual bits of articles in this fashion. –xeno (talk) 22:43, 21 March 2009 (UTC)
    I agree it would be a significant policy change. Since we can't accomplish it here, that question is largely moot at the moment. Dragons flight (talk) 23:19, 21 March 2009 (UTC)
    It also seems to be impossible to implement with 0 false positives - for example, it may be the case that an article needs to be split, and the <verified> section is part of that content. עוד מישהו Od Mishehu 06:03, 23 March 2009 (UTC)
    You're looking for flagged revisions   --NE2 08:30, 23 March 2009 (UTC)

    Semi-protection of controversial articles

    I was wondering, were the abuse filter proposal to work well, could it make semi-protection of controversial articles obsolete (if we set the filters to the right settings)? Master&Expert (Talk) 16:48, 22 March 2009 (UTC)

    Unlikely. The abuse filter has powerful mojo, but it still can only detect things that have predictable patterns (such as page blanking, adding "poop", etc.). It can't do much about edit wars and partisan editing on controversial topics, since those are complex topics. There may be some cases where cutting down on the silly vandalism alone is enough reason for unprotection, but it won't cover everything. Dragons flight (talk) 18:19, 22 March 2009 (UTC)
    Well, it could be cut down to, e.g., blocking certain ranges of IPs, or certain ranges of IPs plus non-autoconfirmed users, which could be far less than blocking all IPs and non-autoconfirmed users. But it depends on the type of vandalism on the article. --Dirk Beetstra T C 19:11, 22 March 2009 (UTC)
    This can also be used to enforce topic bans in some cases. Ruslik (talk) 19:21, 22 March 2009 (UTC)
    Technically yes, but WP:AE usually does a good job at that, and I would think writing rules to topic ban specific users would usually be a poor use of the filter. Dragons flight (talk) 19:27, 22 March 2009 (UTC)
    This definitely shouldn't be used for topic bans and the like, unless there's a clear and strong consensus to do so. --Conti| 20:30, 22 March 2009 (UTC)
    It might be useful in some cases to use a filter instead of semi-protection, when an article is semi-protected because of a persistent vandal with a simple MO. I doubt there are too many cases like that, tho. --Conti| 20:30, 22 March 2009 (UTC)
    Well, I have a case of an IP-hopping vandal using only a couple of ranges; I am planning to use this to see if I can 'semiprotect' the handful of articles in question by just 'blocking' edits from those ranges. It may result in the editor moving to other ranges, but it would cause considerably less collateral damage than semi-protection (the articles have been semiprotected for something like 3 out of the last 4 years, I think). --Dirk Beetstra T C 08:25, 23 March 2009 (UTC)

    Baffled

    In an attempt to fix a bug in filter 79, I figured I'd make some alterations and run it against a change I knew it had to trigger. Unfortunately, it didn't work, and now when I run the debugging tool, I can't even get the code that was in effect at the time to catch the edit it did the first time (as can be seen here). I know it worked once, because it is logged. Why won't it work in the check now? - Mgm|(talk) 09:01, 23 March 2009 (UTC)

    Filter 30 warning message on Family Court With Judge Penny

    (see WP:FALSEPOS) I'm not surprised OhioRuthie (talk · contribs) didn't know what to do. They say they tried to have the page deleted, but filter 30's warning message isn't as informative as the one in filter 3. The user got:

    when

    would've provided her with the information on how to nominate something for deletion. Would someone object to the blanking template receiving a rewrite so it can be applied to both filters? - Mgm|(talk) 10:45, 23 March 2009 (UTC)

    I'd lean towards keeping two messages but making the first more friendly/informative. As mentioned way up the page, I'd also give more emphasis to deletion in the blanking template, since that seems to be the main non-vandal reason for blanking. By contrast, content removal is probably less likely to be about deletion (or redirection) and more likely to be about editing and reverting vandalism. Dragons flight (talk) 18:57, 23 March 2009 (UTC)
    • That's probably a good idea. I don't really care if there's two messages or one as long as the first one is far more informative and friendly. - Mgm|(talk) 22:35, 23 March 2009 (UTC)

    Of the last 8,747 actions, 8,747 (100.00%) have reached the condition limit of 1,000...

    Sounds like a Bad Thing to me... thoughts? Happymelon 12:07, 23 March 2009 (UTC)

    You mean we have too many filters? Some of them can actually be merged. Ruslik (talk) 12:59, 23 March 2009 (UTC)
    Does this mean that every single edit is tested against at least one rule? Pff, seems fine to me. --Dirk Beetstra T C 13:08, 23 March 2009 (UTC)
    I'm concerned about whether the 1001st condition is the one that stops the goatse vandalism... step 1 is to work out exactly what the hell this actually means. Werdna?!? Happymelon 14:07, 23 March 2009 (UTC)

    It means there are too many conditions being checked. Delete some filters. — Werdna • talk 03:38, 24 March 2009 (UTC)

    Argentina IP Hopper

    Special:AbuseFilter/38 and the related 86 through 92 are thrashing the condition limit. As presently implemented, the condition limit counts every | and & branch, whether or not it needs to be executed, plus every function call it does need to execute. Even when the ip_in_range branches don't need to execute, the rule counts as >160 conditions, which is a large portion of the way to the hard execution limit of 1000. I'll talk to Werdna about changing the implementation, since I don't see a reason to count branches that don't execute, but for the moment I believe these rules should stay disabled. Dragons flight (talk) 14:59, 23 March 2009 (UTC)
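
    A rough model of that counting scheme (hypothetical code, for illustration only): charging every boolean operand in the rule text means a big OR of ip_in_range calls costs its full weight even when a cheap guard in front of it fails.

        # Hypothetical: condition cost as counted from the rule text
        def charged_conditions(rule_text):
            return rule_text.count("&") + rule_text.count("|") + 1

        guard = "article_namespace == 0"
        big_or = " | ".join("ip_in_range(ip, 'r%d')" % i for i in range(160))
        rule = "(%s) & (%s)" % (guard, big_or)
        print(charged_conditions(rule))  # 161 charged, even when the guard is false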

    Comments

    One can embed comments within rules with the format:

    /* some comment */
    

    I'd prefer general descriptive information to stay in the main text box, but this is available if you do want to add a short note about something in the rule itself. Dragons flight (talk) 18:52, 23 March 2009 (UTC)
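
    For instance, borrowing the condition style quoted elsewhere on this page, a note can sit right next to the test it explains:

        !("autoconfirmed" in user_groups) /* new accounts and IPs only */ & article_namespace == 0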

    Speed optimisations

    I just split a rule into 7 (rule 38 -> rules 86-92). It did some seemingly easy tests at the beginning of rule 38, one of which was a 7-way OR (a == b | c == d | e == f | g == h | i == j | k == l | m == n) on the article title, and hence should hardly ever be true. This rule took 130-160 milliseconds to run. The 7 rules, which each do ONE of the 7 tests, run in about 0.2-0.3 seconds (each!). It may really be worth splitting some of the rules we have... --Dirk Beetstra T C 11:50, 23 March 2009 (UTC) I stand corrected, 86 seems to run quite a bit longer (45-50 seconds) than the other 6 .. maybe it still has to average properly, as I first saved it as the same rule as 38. --Dirk Beetstra T C 11:54, 23 March 2009 (UTC)

    They "run" in 0.2 seconds because we're hitting the hard execution limit and not executing at all. See section below. Dragons flight (talk) 15:05, 23 March 2009 (UTC)

    If this is a genuine improvement in runtime, it strikes me as more a bug in the way the filter is processed by the extension than a valid way to improve performance. We've been going for a week and we already have almost a hundred filters; splitting the ones we've already got into numerous parts is just messy. A filter should be a semantic grouping of conditions, not a technical one; it's the filter that people see, not the tests it runs. Let's see if we can find out why it was being so pathetically slow, and fix that, rather than destroy the semantic clarity of the filter organisation in a chase for the performance at the end of the rainbow. Don't get me wrong, thank you for taking the time to explore this issue; I just don't agree that division and duplication is the way forward when we could just as easily get the AbuseFilter software to do whatever tweak we're hacking, automatically. Happymelon 12:06, 23 March 2009 (UTC)

    I agree. It also looks a bit too strange to me. Now this is 'just' 7 rules instead of 1 rule, but I can imagine that there are bigger issues than this one (I could actually split every rule in two again, which according to this should give a further speed improvement...). Speed optimisation is something that we should keep in mind, and some tests will be slow in any case. I do think it is a mixed responsibility: the system should not be easy to slow down, but we still have to keep in mind to write fast rules.
    I have been testing around with 38, moving things around, implementing some more simple tests, etc. Nothing really worked (I could make it worse, though). Strangely enough, killing half of a test resulted in it going from 140 to 0.35 milliseconds. That effect is just too large. All 7 tests in the OR have to be run to see if it matches, which will be slower than a single test, but it should not be a factor of 700, I think. --Dirk Beetstra T C 12:19, 23 March 2009 (UTC)

    It seems to have to do with some caching: the first one is the slowest, the others re-use its work. I have enabled them as separate rules for now; when we see how to speed this up, or when it is at full speed, we can combine them again and delete the extras. --Dirk Beetstra T C 14:53, 23 March 2009 (UTC)

    These rules are so specific that the average time isn't giving you a good picture. The filters are all too slow, and should all be disabled. The reason is that if an edit gets past those first conditions, very, very large checks are run. Thus for most edits you get 0.3 ms or so, but for others it takes much, much longer (likely multiple seconds) if it has to do all those IP checks. The average time, therefore, is largely random, and the max time is significantly larger than the largest average time. Prodego talk 16:43, 23 March 2009 (UTC)
    But that is a matter of making the beginning specific enough .. those large tests are really hardly ever run! One has to be editing from two specific ranges of IPs on just a couple of articles before the large test is run. I agree that the rule is too slow, but I don't see how the large IP check has anything to do with that; for 99.999% of the articles, those do not (or should not) run... --Dirk Beetstra T C 17:46, 23 March 2009 (UTC)
    The token parser (i.e. the thing that turns the rule string into a set of machine instructions) currently parses through the entire string every time a rule is run. That parsing is probably quite fast, but even if it is 0.025 ms / token, adding an extra 1000 tokens is going to add overhead even if that long branch is never executed. Improving the token parser so it is amenable to caching is a significant target for optimization, but that hasn't been done yet. Dragons flight (talk) 18:46, 23 March 2009 (UTC)
    But when they are run it is going to be very slow. The edit may actually time out right there, even if you don't hit the condition limit. Prodego talk 01:21, 24 March 2009 (UTC)
    ip_in_range benchmarks for me at well under 1 ms per call. It is a very lightweight function, and unlikely to directly cause a timeout. I'm pretty sure most of the pain is the token parser, and not the actual construction. That said, I have a distinct dislike for such huge conditional statements and the condition limiter is going to reject them. Dragons flight (talk) 01:39, 24 March 2009 (UTC)
    That is what I said: they are very likely hardly ever to run; it is not even a rule which runs on 1 out of 1000 edits. It is probably still going to take a couple of days before our POV pusher realises that the pages are unprotected. And when it runs on those 10-20 edits (we are at the moment at 142 actions per minute, according to my linkwatcher, stats over 3 1/2 days), do you really believe that those 20 edits that do hit this filter in full will bring down Wikipedia's parsing time? No, Prodego, I really don't believe that is the problem. I think it is more that further development is needed to improve the parsing of the rules.
    As it is now, it is becoming pretty clear to me that this filter will, at the moment, not enable us to stop the specific modi operandi of long-term vandals with a specific scope. The articles under this rule have been protected for something like 3 out of the last 4 years, and whenever protection ended or was lifted, this editor came back. I was (and still am) hoping that the abuse filter, besides picking out some 'simple' vandalism as we do now, is capable of targeting specific vandals, giving these articles (which have been protected for years, with much collateral damage) back to the good-faith editors. --Dirk Beetstra T C 08:07, 24 March 2009 (UTC)

    IRC feed of Abuse log

    Does anyone know if there is a live IRC feed of the abuse log? I would love to get access to that for data analysis purposes. Also, if Huggle could read that feed, then it could better prioritize its queue to focus on edits that are potentially problematic. --CapitalR (talk) 19:20, 24 March 2009 (UTC)

    As far as I know, there's no IRC representation of the feature at all, though I agree that a tailored version (problematic edits that weren't disallowed) could be a great move, even one merged, in a new colour, into the CounterVandalism channel. But I'm not techy enough to do anything like that, and one may need Werdna's permission; I don't know how it'd work...
    Great idea, though! ╟─TreasuryTagcontribs─╢ 19:22, 24 March 2009 (UTC)
    There is an IRC feed of it, what channel it is in, I can't remember, I think it is #wikipedia-en-abuselog . MBisanz talk 20:35, 24 March 2009 (UTC)
    Actually #wikipedia-en-abuse-log . MBisanz talk 21:39, 24 March 2009 (UTC)
    Note that that channel doesn't report all abuse filter hits, only some of them. Some of the "less important" filter hits are removed to reduce the noise and allow the channel to be used by humans. Also, there may still be a delay of up to 30 seconds on reports; I'm not sure if the bot has been switched over to use the toolserver database yet. Mr.Z-man 22:18, 24 March 2009 (UTC)

    There is an open bug to get a channel on irc.wikimedia.org for this, akin to recentchanges.  — Mike.lifeguard | @en.wb 01:34, 25 March 2009 (UTC)

    Filter #4

    The prolific sockpuppeteer that inspired #4 is active right now: some refinement by those more competent than I might be possible, as the filter hasn't hit on the activity. Acroterion (talk) 22:26, 24 March 2009 (UTC)

    Action taken?

    Sorry to keep plaguing with questions... could I ask what is meant by "Action taken - tag," and "Action taken - warn," ? Thanks! ╟─TreasuryTagcontribs─╢ 11:12, 18 March 2009 (UTC)

    Warn shows you a bar resembling an edit notice before it accepts the edit. Try saving a page with test edit artifacts to see for yourself. Not sure what tag does. –xeno (talk) 14:08, 18 March 2009 (UTC)
    I think tag tags the edit in recent changes, but I'm not sure how it is meant to appear. –xeno (talk) 16:34, 18 March 2009 (UTC)
    I would like some clarification on what "tag" does, as well. MahangaTalk 09:15, 25 March 2009 (UTC)
    The intent of tag is to add an identifying marker to certain logs, e.g. recent changes, to call attention to the edit. The tagging system is not yet active, so presently tag does nothing. Dragons flight (talk) 09:38, 25 March 2009 (UTC)

    "Can I Have Some Filter, sir?"

    Perhaps this has been answered up there somewhere, but who will be able to get access to the right, besides admins? Or is it just admins? Thanks. —Mr. E. Sánchez (that's me!)What I Do / What I Say 00:42, 21 March 2009 (UTC)

    Admin only due to potential site-wide disruption. Cenarium (talk) 01:07, 21 March 2009 (UTC)
    We've noticed that these filters have the strong potential to slow down editing significantly, to the point that editing some larger pages will time out, so that you get Wikimedia's version of the blue screen of death. That aside, it's also possible to effectively place hard IP blocks or rangeblocks through this system without necessarily having access to Special:Block. For that reason, it's currently restricted to admins only, and the flag may even be restricted further so that it can only be issued by bureaucrats (at least, the idea has been tossed around). Hersfold (t/a/c) 07:11, 21 March 2009 (UTC)
    • I think we should extend access to trusted bot developers who are not admins. It's unlikely an attempt at using the block part of a filter will go unnoticed. - Mgm|(talk) 11:12, 21 March 2009 (UTC)

    Some filter actions aren't addable except by a certain group. — Werdna • talk 06:17, 25 March 2009 (UTC)

    If a filter has such an action on it, does that mean the filter logic can only be changed by members of that group? Otherwise, it would seem easily exploitable. Dragons flight (talk) 06:46, 25 March 2009 (UTC)

    Optimisation statistics

    Many rules begin with something like "!("autoconfirmed" in user_groups) & article_namespace == 0". Does anyone have real data about what proportion of edits belong to each usergroup or namespace so we can put these in the right order? For example would it be faster to use "(article_namespace == 2) & !("user" in USER_GROUPS)" or "!("user" in USER_GROUPS) & (article_namespace == 2)"? -- zzuuzz (talk) 12:46, 23 March 2009 (UTC)

    For equivalent servers with equivalent loads, I don't think the example you are worrying about would ever be important enough to make a meaningful difference. I might care for strong discriminators like editing MediaWiki space, or being a sysop, but for common groups on common namespaces I wouldn't worry about it. The namespace and group checks are already very fast. Dragons flight (talk) 19:05, 23 March 2009 (UTC)
    Most, if not all, of this type of data is already loaded elsewhere when editing a page and stored for the whole request in various global variables, so any extra time for the abuse filter to load it would likely be the same. It's possible, depending on where the abuse filter does its checks, that not all of the data will have been loaded yet, so the loading will be included in the filter's execution time; but for something like namespace or user groups, this would just be shifting the loading from some other function to the abuse filter. Mr.Z-man 16:32, 25 March 2009 (UTC)
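
    The short-circuit arithmetic is easy to model: with A & B, the second operand is evaluated only when the first passes, so the better lead condition is whichever is false most often. A toy Python sketch with made-up proportions (which also bears out that, for checks this cheap, the ordering hardly matters):

        import random

        random.seed(0)
        edits = [{"ns": random.choice([0, 0, 0, 1, 2, 3]),
                  "autoconfirmed": random.random() < 0.8}
                 for _ in range(10000)]

        # Order A: namespace first -> group check runs only for ns == 2 edits
        runs_a = sum(1 for e in edits if e["ns"] == 2)
        # Order B: group first -> namespace check runs for every non-autoconfirmed edit
        runs_b = sum(1 for e in edits if not e["autoconfirmed"])
        print("second operand evaluated:", runs_a, "vs", runs_b)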

    Page blanking filters

    We currently have 4 filters related to blanking pages, Special:AbuseFilter/3 for new users blanking or nearly blanking articles, Special:AbuseFilter/30 for new users removing large amounts of content from an article, Special:AbuseFilter/33 for talk page blanking by anons, and Special:AbuseFilter/34 for user talk blanking. Some of the logic in these could probably be merged. Mr.Z-man 20:33, 23 March 2009 (UTC)

    3 and 30 use very similar logic and should be merged. I was ready to merge them, but realized that a new message was necessary. I will probably do the merge later. Ruslik (talk) 08:32, 25 March 2009 (UTC)
    I'd actually prefer they not be merged and 3 be set to full disallow per the discussion higher up the page. While blanking can sometimes be understood as an implied request for deletion it is not actually an appropriate method of deleting articles, and hence channeling people towards more appropriate responses would be a better goal for 3. However, I don't think one can act so broadly with 30. Even though most partial content deletions are bad, there are still a number of good and valid reasons to remove some of a page's content. Dragons flight (talk) 08:47, 25 March 2009 (UTC)

    A simple suggested feature for managing load

    To help manage load, I suggest that each filter include two conditions, both of which must match in order for the edit to match. The first is a "broad" condition that is quick to test and narrows things down to a small number of edits. The second may be more expensive. The point of this is that the second condition doesn't have to be parsed or executed at all unless the first condition matches; see the sketch below. It would be great to cache the parsed machine code for the rules instead (provided that and/or do proper short-circuit evaluation), but I believe that would be more difficult to implement. Dcoetzee 04:04, 24 March 2009 (UTC)
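
    A minimal sketch of the two-stage idea (hypothetical API, not the extension's actual design): the expensive condition is neither parsed nor compiled until the cheap guard has matched.

        import re

        def make_filter(broad, expensive_pattern):
            compiled = [None]  # parsed lazily, only when first needed
            def check(edit_text):
                if not broad(edit_text):
                    return False  # expensive part never parsed or executed
                if compiled[0] is None:
                    compiled[0] = re.compile(expensive_pattern)
                return bool(compiled[0].search(edit_text))
            return check

        f = make_filter(lambda t: "@" in t, r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
        print(f("no email here"))           # guard fails fast; regex untouched
        print(f("ping me: a@example.com"))  # guard passes; regex parsed and run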

    • Aren't we doing this already by selecting on usergroup and namespace in most filters (broad and narrow condition)? The only chance of improvement I see is to allow 1 condition in multiple filters to be evaluated just once for a particular edit and to avoid entire filters when another one already triggered. (Don't trigger 3 and 30 or 61 and 30 at the same time) - Mgm|(talk) 06:06, 24 March 2009 (UTC)
      • If I'm not mistaken, the current approach avoids executing the more expensive tests but not actually parsing them. If parsing is trivial then of course this is a nonissue. It might be neat to have dependencies between rules, but you could achieve a similar effect by combining several rules into one (ignoring the cost of parsing). Dcoetzee 07:14, 24 March 2009 (UTC)
        • That's correct, the code short circuits executions but not parsing. Parsing is cheap most of the time. It is significantly less than 1 ms per condition typically, so <10 conditions should never be a problem from a parser point of view. 20-30 would be okay with good reason. The problem with rule 38 is that it had ~170 conditions, and was basically drowning in them, with considerable overhead devoted to the parser even though those branches didn't execute (not to mention chewing up 500+ units of the 1000 unit condition limit). I talked to Werdna about caching the parsed code. According to him, they actually tried it, but for normal rules (i.e. the not huge ones) it was taking longer to retrieve the cached code from the MemCached server than it was to reparse from scratch. That dynamic might change if one allowed really long rules, but for now the aim is improving the parser speed and logic rather than caching. So for the moment how about we agree to keep the rules not huge and use the built-in timing functions to judge problems? And I'll continue to kick at things that seem to cause problems.  :-). For the curious, the filters collectively are adding about 550 ms to the average edit save (which includes both parsing and execution for the 52 active rules). I am actively interested in pushing that latency number down, but I don't think devs will start getting actively pissed till somewhere north of 1000 ms. Dragons flight (talk) 08:25, 24 March 2009 (UTC)
          • That's interesting. It seems to me like that's exactly the kind of situation my original proposal would help a lot with - if the 170 conditions are only parsed for 5% of edits, the average overhead goes way down. Dcoetzee 08:32, 24 March 2009 (UTC)

    I would indeed cheer at this. Small rules are generally great for catching broad-scale problems, but the more specific cases sometimes need specifically targeted rules (which are really needed to free up some long-standing problems). The ~170 conditions in rule 38 were probably never tested on edits (and if so, only once or twice, I think; IPs in that range have not edited the 7 articles in question), as the pre-checks should have prevented that (even the 170 IP checks were preceded by 2 IP-range checks to narrow down the actual testing of the 170). Still, it took down the system with it. Personally, I am much more interested in being able to solve some of those long-term specific disruption cases than only the small cases (but not neglecting those either!).

    Question: If an edit is filtered by the first rule, does it then still parse all others? --Dirk Beetstra T C 09:58, 24 March 2009 (UTC)

    It shouldn't matter if we assume the vast majority of edits will pass all filters—in most cases every active filter is evaluated. ~ Ningauble (talk) 15:33, 24 March 2009 (UTC)
    Good point .. did not think about that. --Dirk Beetstra T C 19:16, 24 March 2009 (UTC)

    It should be noted that filter 38 is using far more ranges than are really necessary. You could encapsulate all those ip_in_range calls into one or two, unless I'm very much mistaken. — Werdna • talk 05:08, 25 March 2009 (UTC)

    Depends on your tolerance for collateral damage, and whether he really wanted just those selected ranges. He targeted ~150 /24s in a larger space that could be covered by 2 /16s and a /20, but doing so would triple the number of affected IPs. As it was, his list has weird gaps like 117-130 except 119, 123, and 125. Dragons flight (talk) 08:22, 25 March 2009 (UTC)
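
    Python's ipaddress module can do this kind of collapsing mechanically; a sketch using the gap pattern mentioned above (third octets 117-130 except 119, 123 and 125; the prefix is illustrative, not the vandal's actual space):

        import ipaddress

        thirds = [i for i in range(117, 131) if i not in (119, 123, 125)]
        ranges = [ipaddress.ip_network("198.51.%d.0/24" % i) for i in thirds]

        # Merge adjacent /24s into /23s, /22s, etc., with no extra coverage
        collapsed = list(ipaddress.collapse_addresses(ranges))
        print(len(ranges), "->", len(collapsed))
        for net in collapsed:
            print(net)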


    I'll explain a bit more widely. Our long-term POV pusher is using IPs from this provider, which has quite a few strangely separated ranges to give out. The editor is known to change IP within minutes sometimes (200.45.150.231 at 21:18, 200.45.150.174 at 21:53). If you get range contributions for some of the /24s, you'll see that (very rough figures) >95% of the edits by these IPs are to the articles caught in the filter (see e.g. this range), <1% are abusive remarks regarding his POV edits on talk pages (e.g.), or, when the articles were protected, abusive undoing of edits by involved admins (e.g. this IP in the range here). Some of the articles have been protected 5-6 times, and when the articles get unprotected after half a year, it only takes a couple of days for the vandal to return with exactly the same MO (on August 3, protected for 6 months on 3 August 2007, expiring 16:05, 3 February 2008 (UTC); then on 22 February 2008, 19 days after protection ended, exactly the same edit; I wonder if he had it in his diary or on his to-do list!).

    Protecting this handful of pages (which was, after this edit, indefinite) does cause collateral damage. There is at least one IP active on some of the talk pages who can't make the edits himself. I wonder how many we don't know about.

    Options:

    • Enable a huge, very specific rule targeting only these /24s, making the collateral damage minimal (good-faith edits of others in those ranges, though I have seen none yet). But this would mean that parts of rules have to be parsed and evaluated only when they are called; otherwise this is never going to be possible.
    • Figure out other, faster functions to specifically target all the ranges.
    • Block 3 /16 IP ranges from editing (close to how the rule is now), accepting a bit of collateral damage, as there are bound to be some IPs outside the target ranges which may be affected. But as the rule still has 0 hits, that is going to be minimal, and everyone else can still edit, though we may have scared away some good-faith editors (which happens with semi-protection as well).
    • Semiprotect the articles again, with the collateral damage that no IPs and no unestablished editors can edit the articles.

    I wrote the rule as a funnel, doing the big tests first and narrowing down to the specific test at the end. That specific test is only run on very, very special occasions (article in mainspace, not a 'user', only 7 articles, IP in 2 broad ranges, and only then the specific ranges). I am sorry, but it should not matter that that part is there; it does not need to be parsed, because it is hardly ever run.

    As I suggested earlier, catching simple vandalism is good: it brings down the number of edits to the database, means less work for the anti-vandalism bots, and less work for admins/editors to clean up. All nice and fine. But I really hope that this filter will also enable us to target very, very specifically. There are many cases where there is much collateral damage due to full protection or semi-protection, which could be minimised by writing specific rules (e.g. disabling editing of a couple of articles by a couple of editors and/or some IP ranges). And specific rules are sometimes bound to be huge. --Dirk Beetstra T C 09:02, 25 March 2009 (UTC)

    I wrote a patch, and Werdna applied it, that partially addresses this issue. It makes short-circuiting of disabled subexpressions faster and doesn't count them towards the condition limit. You can't entirely ignore such branches, because you have to do at least enough parsing to find their end and see whether they are followed by an expression that is not short-circuited. (It actually still does more parsing than is necessary for that purpose, but less than occurred before.) The side effect is that one could re-enable rule 38 and things like it without killing the rest of the filter. There are still good reasons for not having a really long filter, though (including that the condition limit still applies whenever that branch executes, and could block it from completing). I believe those ~150 /24 calls could be replaced exactly with ~40 calls by collapsing adjacent blocks into /23s and /22s, etc. I'd actually prefer to see what could be done with ~10-15 calls or so. With this limited to particular articles, the collateral damage would still be less extensive than semi-protecting the articles. Dragons flight (talk) 16:18, 25 March 2009 (UTC)

    I now have 3 /16 ranges, which is wider than the 150 /24s, though considerably less wide than semi-protecting the articles. I understand the problem of 'finding the end', though my thinking for a parser would be (working left to right): if the statement is A AND B, find the first AND or OR and evaluate everything before it; if true and followed by OR, finished; if false and followed by AND, finished; otherwise, walk on until the next AND or OR. This would only fail if the whole expression is encapsulated in '(' (in which case the matching ')' has to be found; this could be solved by a small pre-parser). But I must say, I have never written this type of interpreter; the situation may be more complex than I think.

    If you think you can narrow it down further without too much loss in performance, feel free. I am happy with 180,000 instead of billions of IP addresses blocked (plus the new accounts). If the editor comes back and is still a problem, we can try warning and disallowing then. --Dirk Beetstra T C 20:11, 25 March 2009 (UTC)

    To give some idea of the issues, keep in mind that things like X & Y | Z evaluate as (X & Y) | Z, etc., so one pretty much always has to look ahead for a possible next statement to evaluate. And if X is false, and so one decides to skip Y, one needs to find the end of Y. Looking for the &, |, or ^ that follows Y can be complicated, because those same tokens can occur within strings, comments, and ( ) embedded sub-expressions. So the jump-ahead parser still needs to intelligently process at least those three categories. With care one can see significant improvements, and we already have, but these are non-trivial considerations. (Oh, and depending on circumstances there are at least a half dozen additional ways to terminate a boolean argument such as Y. Anyone care to list them?  ;-) ). Dragons flight (talk) 20:35, 25 March 2009 (UTC)
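
    A toy version of that skip-ahead step (a sketch, not the extension's parser) shows why strings, comments, and parentheses all have to be honoured while hunting for the operator that ends a skipped operand:

        # A sketch: find where a short-circuited operand ends, honouring
        # quoted strings, /* comments */ and parentheses along the way.
        def skip_operand(s, i):
            depth = 0
            while i < len(s):
                c = s[i]
                if c in "\"'":  # skip a quoted string, honouring escapes
                    quote, i = c, i + 1
                    while i < len(s) and s[i] != quote:
                        i += 2 if s[i] == "\\" else 1
                    i += 1
                elif s.startswith("/*", i):  # skip a comment (assumes it is closed)
                    i = s.index("*/", i) + 2
                elif c == "(":
                    depth, i = depth + 1, i + 1
                elif c == ")":
                    if depth == 0:
                        return i  # operand ends at the enclosing ")"
                    depth, i = depth - 1, i + 1
                elif c in "&|^" and depth == 0:
                    return i  # next top-level boolean operator
                else:
                    i += 1
            return i

        expr = 'contains_any(added_lines, "a & b") & other_check'
        print(expr[skip_operand(expr, 0):])  # -> '& other_check'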

    Hmmm .. true, I see and understand the problem. Guess we will have to live with some limitations. Thanks for the explanation. --Dirk Beetstra T C 22:30, 25 March 2009 (UTC)

    Am I an idiot, or is this a bug?

    Look at http://en.wikipedia.org/w/index.php?title=Special:AbuseLog&details=45606

    I can't find the edit in the article mentioned, can't find the warning on that user's talk page, and can't find the edit in that user's contributions. What gives? Did that edit actually rise to the level of needing full-on oversight deletion or something? DeFaultRyan (talk) 00:13, 25 March 2009 (UTC)

    I am no expert, but I do know that the filter can prevent an action from taking place. Chillum 00:17, 25 March 2009 (UTC)
    When the abuse filter warns an editor, it logs the warning. It then logs the edit again, as having taken no action, if the user continues and saves the page without fixing the issue (assuming the edit isn't otherwise disallowed or throttled). So a warned edit will often have 2 entries in the log: the warning, and the actual edit. Mr.Z-man 01:44, 25 March 2009 (UTC)
    • We can already filter the log by user, filter and article. It would be even better if we could also filter by action taken, so we could check only the edits that actually went through. - Mgm|(talk) 09:55, 25 March 2009 (UTC)

    Another day, another bit faster.

    Yesterday, two patches increased parser speed and made short-circuiting more efficient. Today, a patch corrected a caching issue that should also make nearly all rules noticeably faster.

    So we can continue our headlong rush towards 100 rules without worrying the servers will explode (or something like that anyway. :-) )

    In addition, there is a new function, "contains_any( haystack, needle1, needle2, needle3, ... )", which should be a simpler alternative to the construction: "(A in Z) | (B in Z) | (C in Z) | ...". Dragons flight (talk) 05:27, 26 March 2009 (UTC)
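
    In Python terms, the new helper behaves roughly like this one-liner (a sketch of the semantics, not the PHP implementation):

        def contains_any(haystack, *needles):
            # true if any needle occurs anywhere in the haystack
            return any(needle in haystack for needle in needles)

        print(contains_any("some added text", "poop", "added"))  # True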

    The self-reported stats for the 61 active filters indicate that filtering is adding 295 ms to the average edit (or 4.8 ms per filter). This doesn't count some of the overhead associated with the filter process as a whole, but is still about a factor of two less than we had a couple of days ago. I am also happy to note that filters targeted at particular pages (or small groups of pages) are often very fast, meaning we could potentially address a very large number of topic-specific issues. Dragons flight (talk) 06:39, 26 March 2009 (UTC)

    This seems like a good use for the filter. --NE2 01:24, 21 March 2009 (UTC)

    We have a Mediawiki:Spam-blacklist for that. Hersfold (t/a/c) 07:12, 21 March 2009 (UTC)
    That may depend on the goal. They are talking about a warning or simple tagging of those edits, which wouldn't fit the spam blacklist. Dragons flight (talk) 07:23, 21 March 2009 (UTC)
    • It improves Wikipedia. If the runtime is not too large, I'd give it a go and see what shakes out of the tree. We can always disable it. - Mgm|(talk) 13:35, 21 March 2009 (UTC)
    Way ahead of you? ViperSnake151 18:26, 21 March 2009 (UTC)
    The template should probably include some tips on how to improve the link. The one posted here doesn't give the editor any incentive to change it. - Mgm|(talk) 18:57, 21 March 2009 (UTC)
    Tell them to consider using Webcite in the template --DFS454 (talk) 20:17, 21 March 2009 (UTC)

    Could we remove this filter? If the user gives up instead of fixing it or saving again, we lose valuable edits. --Apoc2400 (talk) 19:23, 26 March 2009 (UTC)

    Though discussed, I don't think anyone actually wrote the rule. (I may be wrong, it is a little hard to keep track). Dragons flight (talk) 19:28, 26 March 2009 (UTC)

    AbuseFilter diffs and layout

    I've been looking at the interface for AbuseFilter, and I think Werdna is reinventing the wheel in several instances here. For one thing, the history and diff system is needlessly hairy. I'm wondering if it wouldn't be better to create a new namespace altogether. That way filter developers could put filters on their watchlists, use talk pages (instead of a plaintext discussion field in a tiny text box), page history, and all of the other features that a wiki is supposed to have. Filters could also be enabled by simply adding them to an "active filters" category (not to mention other categories for sorting them). What do others think? Spidern 12:33, 26 March 2009 (UTC)

    I do have to wonder why they aren't just MediaWiki messages... Happymelon 13:27, 26 March 2009 (UTC)
    Well, in no particular order 1) it wouldn't allow for private filters, 2) it would tie the ability to edit filters to the 'editinterface' right, so if a wiki wanted to make filter editing more or less restricted, they couldn't, without also affecting the rest of the interface. 3) filters aren't part of the interface, 4) this allows them to be cached separately from the message cache, 5) changing options like throttle settings would be more awkward, 6) wiki pages don't work so well for mixes of wikitext and plain text. Mr.Z-man 19:20, 26 March 2009 (UTC)
    I certainly take your points 1 and 2, 1 in particular. The spam blacklist, bad image list, robots.txt addon, etc., aren't part of the interface either; and they could of course be cached separately anyway by the extension. I was only thinking that the filter conditional itself would be in the message page; everything else would still be done through Special:AbuseFilter, which probably also does away with point 6. The AbuseFilter extension could use one of the many hooks in the edit-a-page code flow to stop edits to the filters by people without abusefilter-modify even if they had editinterface; although I guess it would be more difficult to resolve the counter-issue of people with abusefilter-modify but without editinterface... the software would probably have to use another hook to temporarily grant that permission 'on the fly', which would be rather hackish. I can see how it could work, but not, as you note, with private filters, which could be something of a showstopper. Happymelon 20:08, 26 March 2009 (UTC)

    Extend Documentation

    Would it be possible for someone to extend the how-to guide somewhat, with details such as which namespace has which number? I had to check through several filters to find out which was user talk. Thanks--Jac16888Talk 13:29, 26 March 2009 (UTC)

    See mw:Help:Magic_words. Ruslik (talk) 13:49, 26 March 2009 (UTC)
    Cheers.--Jac16888Talk 15:31, 26 March 2009 (UTC)

    Extra documentation would still be very welcome since there are several non-standard regex functions in there for which the syntax isn't very clear. - 87.211.75.45 (talk) 05:55, 27 March 2009 (UTC)

    Oversight?

    Do we have any process for discussing new filters before they are implemented? This seems like something that could scare away huge numbers of new users if it isn't applied carefully enough. Also, is there any systematic checking for false positives? --Apoc2400 (talk) 19:18, 26 March 2009 (UTC)

    WP:RAF is used for discussion, as is this page, though neither is mandatory. Except for very trivial cases or emergency intervention against vandalism, all filters should start in the log-only mode to demonstrate empirically that they produce no (or very few) false positives. Continued monitoring is of course appreciated and most warning messages link to WP:FALSEPOS to report false positives. Dragons flight (talk) 19:32, 26 March 2009 (UTC)
    I think I just saw the too-many-spaces filter have false positives on source code in a programming article. I may have misunderstood it though. --Apoc2400 (talk) 19:44, 26 March 2009 (UTC)
    I've added an exclusion for articles using the <source> tag. Thanks for reporting this. This filter also only affects non-autoconfirmed editors. Like many filters, this is a "warning" filter, meaning that the editor is presented with a warning message (currently Mediawiki:abusefilter-warning-whitespace in the case you mention) and then has the option to either correct the issue or save anyway without changes. Because warnings are friendlier than prohibitions and give the user the option of continuing anyway, we tolerate a higher false positive rate on such actions, though we would still want it to be low (e.g. < 5%). Dragons flight (talk) 19:57, 26 March 2009 (UTC)

    Special:AbuseFilter/98

    Please add more conditions to this filter: if a new page contains a disambiguation template such as {{geodis}}, {{hndis}} or {{surname}}, it should not be triggered. Korg (talk) 22:52, 26 March 2009 (UTC)

    I told it to look for "disambig" in the rendered HTML (rendering should be fast, since the rule is limited to pages of < 200 characters). Assuming all the relevant templates say "disambiguation" somewhere in their output, that should cover it. Dragons flight (talk) 23:03, 26 March 2009 (UTC)
    Thank you. Surname pages, such as Guesdon, still trigger the filter as they don't contain the text "disambig". Could you please add another condition? Thanks, Korg (talk) 00:28, 27 March 2009 (UTC)

    Some known persistent vandals

    I read this notice at the top:

    This extension will help to prevent harm from some known persistent vandals, with very specific modi operandi

    and wonder what I did to deserve this abuse from AbuseFilter; I am not the editor who vandalized Tourette syndrome in February 2008 (as mentioned here), did not make changes that fit the modi operandi exhibited back then, and would not consider the pattern of contributions from this IP address to be persistent. I understand that vandalism is a problem, but at the moment AbuseFilter seems counter to the first three items from the statement of principles that have been part of Wikipedia for over seven years. 67.100.125.197 (talk · contribs) 23:16, 26 March 2009 (UTC)

    Bot edits

    I tried to exclude all bot edits from filter 76, but have failed so far. I tried both "bot" and "Bot", but they do not seem to work. Does anybody know how to do this? Ruslik (talk) 20:10, 26 March 2009 (UTC)

    "bot" works, as best I can tell. XLinkBot does not have the bot flag. Happymelon 20:16, 26 March 2009 (UTC)
    Thanks. I did not know about XLinkBot. Ruslik (talk) 20:19, 26 March 2009 (UTC)
    Just to explain: XLinkBot does not have a bot flag because its reversions should be checked by recent-changes patrollers (especially the questionable ones). As an example, a check some time ago of 30 myspace.com reverts showed that one was questionable but OK (it was the official myspace of the subject, though there were quite a few other official sites as well), and one contained an acceptable myspace (the myspace of the band, but it also included myspaces of other band members, which is less OK). This does mean that there are possibly cases where XLinkBot reverts good edits (and hence it tries to be extremely friendly, especially on a first revert of an editor, see settings), which can subsequently be reverted (or at least checked) by patrollers. I hope this explains the 'special' role of XLinkBot in some rules. --Dirk Beetstra T C 10:04, 27 March 2009 (UTC)

    Concern about violation of WP:NOT

    Whatever happened to "WP:NOT Censored"? Isn't filtering of edits a form of censorship? 192.12.88.7 (talk) 02:46, 28 March 2009 (UTC)

    You should probably re-read that section of WP:NOT. --Carnildo (talk) 03:46, 28 March 2009 (UTC)

    Basically, WP:NOT means that articles don't have necessary encyclopedic content left out just because of social norms. Nude pictures, swear words, descriptions of gory violence... if they have relevance to the article, they're not excluded just because they might offend some people (if people are offended by such material, they don't have to read Wikipedia). The AbuseFilter, however, is designed to filter edits which actively hurt Wikipedia. While deleting nudity from articles such as List of sex positions would obviously be destructive and impair the quality of the article, preventing someone from adding "haha fuck ass" to the article Pencil (to take a random example) is obviously not harmful! ╟─TreasuryTagcontribs─╢ 08:56, 28 March 2009 (UTC)

    Blocked edits

    It might be useful if Wikipedia editors were able to see blocked edits in edit histories. For example, my userpage was blanked and replaced with a toilet plunger picture, which raised my curiosity level. When I blanked my user page as an anon, it was blocked by the filter (I don't know which filter) and no edits showed up in the page history. — Rickyrab | Talk 03:32, 28 March 2009 (UTC)

    Actually, part of the point is to keep the swear words, page blankings, and poop jokes out of the page history. However, you can review the Special:AbuseLog for any page (use the "Title" search), and a thoughtful person might find a nice way to modify the user interface to include a link from the history page to the corresponding abuse log page. Dragons flight (talk) 04:31, 28 March 2009 (UTC)
    sounds ok to me... how do other users feel about this? — Rickyrab | Talk 04:45, 28 March 2009 (UTC)

    I agree with DragonF, basically. The point of the abuse-filter is to prevent abuse, the point of the edit-history is to log abuse. If the abuse is prevented, then there's nothing to log, and it means that histories don't get cluttered up with drivel; plus, there's the relatively easy workaround, which can even have a direct link created - out of interest, why do you want such data? ;-) ╟─TreasuryTagcontribs─╢ 08:52, 28 March 2009 (UTC)

    Viewing deleted pages

    On log entries such as this, one seems to be able to view the contents of a deleted page, which is, of course, a privacy risk... Is it possible for this to be fixed?

    Also, more generally, surely the "non-autoconfirmed users deleting {{db}} tags" should disallow, rather than just log, such edits? ╟─TreasuryTagcontribs─╢ 11:21, 18 March 2009 (UTC)

    Are non-autoconfirmed users not allowed to remove speedy-deletion tags? --Conti| 11:37, 18 March 2009 (UTC)
    They should never have any reason to. They are the most likely candidates for removing the tags from articles they created, vandalising, etc. An admin or experienced user can always make the final decision on a tag; an anon or non-autoconfirmed user should never have to. ╟─TreasuryTagcontribs─╢ 11:38, 18 March 2009 (UTC)
    But they are allowed to at present, so you'll have to change the policy first. The only exclusion is that the article creator may not remove the tag Fritzpoll (talk) 12:51, 18 March 2009 (UTC)
    True, so why the filter, then? ╟─TreasuryTagcontribs─╢ 13:52, 18 March 2009 (UTC)
    Beats me. Fritzpoll (talk) 13:59, 18 March 2009 (UTC)
    The filter doesn't block the edit, it doesn't even warn currently. I don't see why the policy would need to be changed for that. Mr.Z-man 15:57, 18 March 2009 (UTC)
    So what's the point of the filter, then? --Conti| 16:05, 18 March 2009 (UTC)
    To provide a list of possibly disruptive edits. Personally I'd be in favour of it disallowing or warning, but it's relatively useful as it is now. ╟─TreasuryTagcontribs─╢ 16:08, 18 March 2009 (UTC)
    Hmm, you're right, actually, listing these edits might be useful. I'd be against any automatic action being taken tho, because the filter will probably have too many false positives. --Conti| 16:27, 18 March 2009 (UTC)

    And as for my original point, that deleted pages can be viewed? ╟─TreasuryTagcontribs─╢ 14:28, 18 March 2009 (UTC)

    Well, it's about as problematic as new page patrollers seeing the "soon-to-be-deleted page" as they tag it... but then it gets kept as a permanent record in the log. I'm sure we have some way of suppressing offensive or BLP-violating material. –xeno (talk) 14:40, 18 March 2009 (UTC)
    You're not hinting at actually requesting oversight for all of these logged edits (and in some cases, logged attempts at edits) that yesterday would have been simply deleted as attack pages and never seen by the general public again, are you? wodup 15:04, 18 March 2009 (UTC)
    No, not oversight, but perhaps some way for admins to flag parts of the log details as hidden if its required. –xeno (talk) 15:08, 18 March 2009 (UTC)
    Phew! Had me worried. :P What about having the AbuseFilter automatically hide the content if the page that was acted upon to trigger the filter is deleted? wodup 15:17, 18 March 2009 (UTC)
    That would work, if it's even possible. –xeno (talk) 15:38, 18 March 2009 (UTC)

    Filed as bugzilla:18043. wodup 01:10, 19 March 2009 (UTC)

    What if the page in question is not a BLP or libelous, but speedy deleted for other reasons (such as the page being a bad joke, or non-notable)? Surely the public ought to have the right to view THOSE pages. — Rickyrab | Talk 04:20, 28 March 2009 (UTC)
    I wouldn't say that there's a right to see those pages. It may be less harmful or not harmful at all, but it's not a right. wodup 08:09, 30 March 2009 (UTC)

    Disallow filter review

    I'd like to propose we review the five filters currently set to disallow that have seen more than 100 hits. I assume anything that generates only a few hits is being managed by its author, but when one gets a large number of hits it is easy to miss false positives, so a collaborative review may be helpful. This is especially important for the disallow filters.

    I've created sections below to identify any problems with these filters. Dragons flight (talk) 03:37, 27 March 2009 (UTC)

    • I've seen several false positives on this one when people tried to include relevant obscenities discussed by reliable sources. Warning is clearly needed and perhaps throttling, but a ban on these edits is not yet a good idea. - Mgm|(talk) 12:47, 27 March 2009 (UTC)
    As always, providing some examples of problems would be nice.  :-) I haven't yet looked at this one, but if you are seeing a non-trivial number of problems with the current logic please do take off the disallow flag. Reducing the attack level should pretty much be our default position if there are significant false positives not immediately resolved with changes in logic. Dragons flight (talk) 15:12, 27 March 2009 (UTC)
    And where's "is a kike", "is a gweilo", "is a pendejo", etc/. in there? 192.12.88.7 (talk) 02:53, 28 March 2009 (UTC)
    Terms that have false positives should be removed from 9 (disallow, IP only) and only be placed in 97 (warn, autoconfirmed and below). I've got a script for building the filters, if a term needs to be removed leave me a note on my talk or ping me on IRC. BJTalk 22:08, 29 March 2009 (UTC)
    All the filter 9 false positives have been fixed in the last update and I took a few other words out as well. BJTalk 09:54, 30 March 2009 (UTC)

    This also turned up no problems in my sampling. Dragons flight (talk) 04:40, 27 March 2009 (UTC)

    I've gone through 40 or so of these, and despite being disturbed at how many juvenile minds are allowed unsupervised access to the internet (one poop joke every 3-5 minutes?!), I don't see any problems. Dragons flight (talk) 04:26, 27 March 2009 (UTC)

    [5] looks like a false positive. --Jayron32.talk.contribs 03:53, 27 March 2009 (UTC)
    :-) That one was fixed yesterday. Actually, with this filter (and others targeting specific keywords), it might be worth searching/googling to see whether the key words have legitimate uses. The false positive you note was a hit against "Roland R", which was obviously too general; one probably could have predicted that just from thinking about it. Dragons flight (talk) 03:59, 27 March 2009 (UTC)
    How could these false positives [6] [7] happen? --Conti| 14:28, 29 March 2009 (UTC)
    The first is the same issue mentioned above. The second relates to a derogatory variant of a specific user's name that unfortunately also has legitimate uses. Both issues should now be removed from the filter. Dragons flight (talk) 17:19, 29 March 2009 (UTC)
    Whoops. Prolly should've checked the other link first. Anyhow, thanks for fixing the filter. :) --Conti| 17:43, 29 March 2009 (UTC)

    Adding this which wasn't set to disallow last night, but now is. Dragons flight (talk) 15:12, 27 March 2009 (UTC)

    Of those 400, less than 30 are "real". When I originally created the filter, I made a mistake, so there was a lot of stuff being filtered that should not have been. J.delanoygabsadds 15:50, 27 March 2009 (UTC)
    Actually, 390 of the 397 hits are from J.delanoy's original version. The last seven hits are from the filter as rewritten, and they are all valid hits against a JA/Grawp sock. There have been no false positives since the filter was changed. NawlinWiki (talk) 16:42, 27 March 2009 (UTC)