Wikipedia talk:Flagged revisions/Archive 7


Proposal for implementation

Given that the results of the straw poll showed support for some implementation of flagged revisions but no consensus for a full implementation on all articles, I propose that we produce a proposal for a limited implementation to put to the full community. To try and get the widest consensus possible this implementation should be quite limited in scale - if it is successful then consensus for extension in the future should be easier.

Therefore, how about producing a proposal for implementing flagged revisions only on BLPs and only where a consensus is formed on the talk page for implementing them on that page? This tries to get support from those who only support on a categorical basis (BLPs only) and from those who support based on requiring a consensus for each article (must get a consensus on the talk page before implementing).

The flagged revisions would only be there to prevent vandalism and blatant BLP violations not as a quality check.

Personally I support implementing flagged revisions widely but it is plain that consensus for that will probably be lacking. However a limited implementation like this could get more support and will enable us to see how well it works in practice here (such as how quickly revisions are flagged).

Thoughts? Davewild (talk) 09:39, 19 October 2008 (UTC)

Implementing based on both a category and on consensus is self-defeating - if we were to conduct a small-scale trial, Evolution should be included as a case-study of how FlaggedRevs' handling of disputes compares to semi- or full-protection. Consensus trumps a category every time; why not just suggest a trial on ~500 articles, to be decided by consensus? Happymelon 12:29, 19 October 2008 (UTC)
Would it be such a small number? That would depend on consensus; my thought was that there would be an initial rush of articles getting flagged revisions, then, dependent on how well flagged revisions are seen to be working, the number with flagged revisions would either stagnate or expand to cover a large number of the BLPs. This would be one element in perception of the success of flagged revisions. The purpose for limiting it to BLPs is twofold: firstly, there is the clearest benefit to limiting the amount of vandalism readers see on BLPs even more than on other articles; secondly, it reassures opponents and less enthusiastic supporters that there is a limit to how many articles would be flagged without the whole community making a new decision.
Limiting to a trial of 500 appears top-down to me, instead of permitting the editors to make the decision as they go. I don't see this just as a trial but as an actual implementation as well. Davewild (talk) 12:44, 19 October 2008 (UTC)
The problem with implementing it on any category of articles is that there will be articles for which FlaggedRevisions is not particularly useful, and articles outside the category for which it is desperately needed. On the other hand, if we trust consensus (and we have to!) then we assume that all articles for which there is a consensus to enable FlaggedRevs on would benefit from it; if there are other articles, there is no reason not to find a consensus there too. The 500 number is purely arbitrary, but you must see how your open-ended proposal (given that BLPs make up ~10% of our article base) would appear frightening to 'weak opponents' (people prepared to be convinced, but currently in opposition) to FlaggedRevs. Such a proposal is not really a "test" so much as a blanket enabling over an enormous swathe of articles. A "top down" approach is more obviously a trial, and is therefore more controllable (and, if necessary, reversible). You might "[not] see this just as a trial but as an actual implementation", but this will polarise the discussion in a fashion that is not helpful at this time. If we are to run a trial, we must run a trial; if we make an implementation, we must be entirely honest about it. This is not least because the developers simply won't make the technical changes unless we present a clear consensus for them to do so. Happymelon 19:16, 19 October 2008 (UTC)
I have proposed something similar for days, User:Cenarium/Proposal. Cenarium Talk 12:53, 19 October 2008 (UTC)
I would like to clarify that proposals for implementation (who, how) are rather different than proposals for trials (what, when). Before we proceed with planning a test run, we need to have a specific understanding of what we are trying to implement. My proposal does not include specifics for how a test-run would be conducted, but I feel that it fits the bill for the initial implementation of flagged revisions. I'd love some feedback. Random89 22:59, 20 October 2008 (UTC)
I think your proposal is straightforward, easy to understand, and workable. Let's go with it. Cla68 (talk) 00:33, 21 October 2008 (UTC)
Now, you see, I must disagree. I think the template namespace is an appallingly bad place to start FlaggedRevs. It's underwatched, mainly edited by experienced and niche editors, and monitored by the same. The majority of those users (and I count myself as one of them) are unlikely to want to take on the task of sighting the Template: namespace; of the huge number of editors we have who are interested in that role, only a minority have the technical competence (experience with template and ParserFunction syntax) to evaluate which edits are destructive and which are benign. The Template: namespace is one of the least widely-vandalised (now that would be an interesting study!) and so the benefit to be gained from the work required is the lowest of all the highly-public namespaces. On the other hand, merely having templates sighted would not enable us to unprotect them en-masse - I usually bring out this example of why some templates are protected - it's for the protection of would-be editors as much as anything else. Full protection makes people stop and think twice, no matter who they are or how good they think they are at template coding. Similarly, we can't use FlaggedRevisions to 'sandbox' changes to a template before they go live - either the current version is displayed on all pages, or the sighted version is displayed even on the sandboxes where we want to test the latest updates. FlaggedRevisions in the Template: namespace are, in summary, useless. Happymelon 12:48, 21 October 2008 (UTC)
I'm not sure if you were referring to Cenarium's proposal or my "clean pages" proposal, but I will address your concerns as they relate to the latter. I purposely did not state which pages should be initially covered by this proposal, as I said above that is a matter separate from the proposal itself. I claimed that it might be of use to apply it to the main page templates. By this I meant templates such as ITN, DYK, and OTD; templates that are not highly technical and are not likely to have accidental syntax errors. Obviously more complex templates are another matter, and may warrant continuing full protection. Random89 19:06, 21 October 2008 (UTC)
Then I disagree. Precisely because the template space (and portal space) is almost exclusively edited by experienced editors, it won't add much work. Vandalism there affects a number of pages, and is harder to detect and repair, even when obvious; for example, Template:Convert/m2, transcluded on almost 10000 pages, remained vandalized for 20 minutes. Even if rare, this kind of vandalism is particularly harmful, and one of the preferred targets of persistent vandals. Like in other name-spaces, when a user is unsure on an edit, it'll be left for review to someone else, and the default action is sight, as always. Sighted revs should be used to prevent blatant vandalism from being transcluded, and only that. It won't prevent syntax errors, whether intentional or malicious. Cenarium Talk 17:58, 21 October 2008 (UTC)
Your comment is interesting, but resolves none of my concerns. Yes, the Template: namespace is edited largely by experienced editors, but you are likely to find that a much lower proportion of template editors are interested in sighting revisions rather than working on their pet projects. The majority of templates are either trivially simple (in which case there are likely to be few people monitoring them) or hideously complicated. The latter cases are often fully protected not because they are vandal targets, but because they are so easy to screw up - my point is not that FlaggedRevisions should be expected to catch syntax errors, but that merely knowing that changes to a template are not reflected immediately wiki-wide is not sufficient to replace full protection in these cases. If FlaggedRevs is used solely to counter vandalism, and every non-vandal edit must be sighted, then such errors will 'advance to go' very rapidly, necessitating the same level of protection. At the other end of the scale, there are hundreds of thousands of pages in the Template: namespace which contain seldom-used, simple templates that were written up to years ago and have no maintenance activity going on. These pages might not be being watched by anyone, and while I agree this makes them a tempting target for vandalism, it also means that sighting the template namespace is by no stretch of the imagination "[not] much work". Whenever vandalism appears in an article, it is noticed and the vandalism is reverted at source. That's not the ideal solution, I admit. But the ability of FlaggedRevisions to combat vandalism has to be balanced against the work and inconvenience that maintaining the system entails. Sighting a namespace which is both so specialist and so quiet is simply not possible without diverting more effort to sighting the changes that are made than to actually improving the content within. 
I know that's a common criticism of FlaggedRevisions generally, and I know as well as you do that it does not apply to more active namespaces. In fact the more active the namespace involved, the more easily the sighting process can be kept up to date. That's not going to happen in the Template namespace, and we'd be foolish to try. There are other places on en.wiki where FlaggedRevisions can be more usefully employed. Certainly it's a terrible place to conduct a test or begin an implementation. Happymelon 22:18, 21 October 2008 (UTC)
I agree, enabling sighted revs on all template space would be unwise. It's too big and too varied, and the templates where it would be useful (for example, navigational templates used in high risk articles) are too few in comparison. So I think that we should enable it individually, if the risks of vandalism are too important but SP is unwarranted. I would also be very much opposed to removing protection from high risk templates and sighting them instead, in particular because surveyors won't always be reliable for that matter; it would be too risky. Cenarium Talk 17:07, 23 October 2008 (UTC)

Please comment on Random's "how to" proposal here. Where to implement it is a separate issue. Please start another section to discuss the where. Cla68 (talk) 00:20, 22 October 2008 (UTC)

Random's proposal is yet another permutation of settings and policies to produce a potential implementation of FlaggedRevisions. It is no better suited to a trial than Cenarium's or the OP's. My suggestion is to first discuss and decide: should we be talking about a "trial" here, or an "implementation"?? Happymelon 17:12, 22 October 2008 (UTC)
As I see it, "implementation" has to come before "trial", or else the said trial gains us no objective information. That is the major problem I see with Cenarium's proposal: it mixes the two without distinction. I do of course like my own implementation, but the key is to select one so we can proceed with planning a trial. Random89 18:41, 22 October 2008 (UTC)
I'm sorry, your first sentence makes no sense to me. We are discussing the possibility of enabling FlaggedRevisions on en.wiki in some form in the immediate future here, no? Either we're thinking of a "trial", whereby we explicitly construct a finite, small-scale experiment with the main objective of gathering data, which is intended to be easily terminated and reversed, probably as a default. Or we're thinking of an "implementation", where we enable FlaggedRevs in what we expect to be its final form, onto a subset of our pages, with the expectation that, if everything goes according to plan, we will simply expand that implementation to the rest of the wiki. In order to make an "implementation", therefore, we have to know, or at least have a good general idea, of which of the innumerable permutations of settings we want to use. Deciding that is a whole realm of discussion that we haven't really had yet, and which we ultimately need to have. A trial, being by definition finite and reversible, is not subject to this restriction: we genuinely can just try a permutation and see if it works, without establishing a 'status quo' that then needs to be overcome. Do you see what I mean: we must ultimately produce an implementation, but only one such implementation can ever be... er... implemented... (:D) here - yes, it can evolve and change slightly, but if we get it fundamentally wrong in some way, it will be almost impossible to correct. In order to get to that one implementation that we can find a consensus on, we need a lot of discussion, and preferably objective evidence. That evidence can come from trials, and indeed we should give a small-scale trial serious consideration. But again, you're mixing terminology to label the penultimate step in the "work out what we probably want to do, enable it technically, test it to make sure it works, then deploy it wiki-wide" process as a "trial".
It may be that we want to start on that process immediately, indeed we already have; but once again we fall into the hole of having innumerable possible implementations to choose from. The possible permutations of FlaggedRevisions are numerous enough for me to be able to state absolutely categorically: no one person's suggestion for implementation will ever gain consensus. The extension can just provide too many things to too many people. Instead of saying "let's do it this way" or "let's do it that way", we need to be thinking in general terms, breaking it down into its separate parts and trying to find a nice compromise that the majority of people can agree with if they're not in blanket opposition to the whole principle. I guarantee, however, that if every man and his dog comes in here and says "what about this permutation" then they'll still be doing that when hell freezes over. I'm not criticising your proposal or the fact that you made it - equally unhelpful would be people not putting the suggestions out there. But individual proposals should be the nuclei of constructive discussion, not potential endpoints. If you think we're anywhere near the end of the road to FlaggedRevisions, then I'm afraid you're sorely mistaken.

This post has wandered terribly from the original topic, so apologies for that :D Happymelon 23:15, 22 October 2008 (UTC)

Sorry, I see now that I may have been a bit unclear. What I was trying to say was that if we run a trial with certain settings of flagged revision, then turn around and try and change those settings before widespread implementation, we have gained nothing. Basically we need to choose these "permutations", run a trial, and see what the result is. If it is well received, great, if not back to the drawing board. I also think that the discussion below is great; choosing the trial settings (over all FAs for a certain period of time maybe) is different than choosing the "permutations" of flagged revision.
In regards to your final point, I would like to say that I do believe that whatever version of flagged revision we implement in the near-ish future will not be the end of all flagged revision discussion. Indeed, that is the beauty of a wiki. If we agree now that FRs are best used to limit vandalism, there's no reason that 2-3 years from now, or even sooner, we couldn't all be back here thinking "the first implementation was great, but we can go further with this", and decide to adopt a version that results in even higher content standards. That's essentially my philosophy for this; since we're in no hurry, let's start with the basics, and I believe that was reflected in my proposal and others. Random89 17:24, 23 October 2008 (UTC)
A note on your proposal. The absence of spelling or grammatical errors could be a fine requirement, as it can be checked quickly. But there are many articles with cleanup tags; not sighting them would be a huge loss of scope, and it's generally not easy to address their concerns. So again, if we diversify the goals for a given flagged revisions level, it will decrease its usability and cause several kinds of problems, as I explained above. For example, in this case, what if a cleanup tag is added after the rev was sighted? It's also the problem of the outdated Wikipedia:Flagged revisions/Sighted versions, whose requirements are even higher. Cenarium Talk 17:56, 23 October 2008 (UTC)

In response to the original suggestion, I would add that enabling flagged revisions should also probably be done when requested by the subjects of BLPs (probably through OTRS). If they are concerned about vandalism to a page about them, it should at least help reduce the problem. It might not allay their concerns, but it would allow them a small measure of control. Sχeptomaniacχαιρετε 23:51, 22 October 2008 (UTC)

Where to conduct the trial

Please use this space to discuss where to conduct a trial run of flagged revisions. I suggest it be tried out on featured articles (FA). There are only 2,000 FA's, so it's a manageable amount for an experiment. Also, FA's get a fair share of attention, so they do get edited, including vandalism, especially the main page FA of the day. Cla68 (talk) 00:21, 23 October 2008 (UTC)

I think our FA collection makes a good guinea pig for FlaggedRevs: the articles cover a broad range of topics, they are all prone to vandalism, the intensity of which varies greatly, and they are a neat and self-contained set of a reasonable size. Happymelon 07:46, 23 October 2008 (UTC)
I agree - the approximately 2000 FA articles provide a manageable subset and the most logical choice. Being peer-reviewed and carefully watched, they are essentially "sighted" already. If it is felt that this is too small a subset to see results quickly enough, the GA articles could also be included. CactusWriter | needles 07:55, 23 October 2008 (UTC)
I'm happy enough with, say, a two-month trial of flagged revisions on featured articles. After and during this process, constructive discussion can take place. – Thomas H. Larsen 09:05, 23 October 2008 (UTC)

We need more than just FAs, it's not flexible enough to judge adequately the expected results of flagged revisions and test the various cases of need. We need various data to see what works well, what is in need of improvement, etc. So below are some propositions from User:Cenarium/Flagged revisions/Proposal#Where, so that we can discuss them individually. Cenarium Talk 17:35, 23 October 2008 (UTC)

Visible pages
  • Temporarily or indefinitely: articles linked from the Main Page, very frequently viewed, related to a current event, and high-visibility pages in other non article spaces (Wikipedia, Help,...). They are particularly needed there, and obviously, a lot of users monitor those. Cenarium Talk 17:35, 23 October 2008 (UTC)
Per consensus
  • An article or a limited series of articles if there's a consensus to do so on the talk page or on a noticeboard. Whether it's a trial or for 'real', consensus prevails. Cenarium Talk 17:35, 23 October 2008 (UTC)
Featured articles
Biographies of living persons
Other
  • I would support enabling sighted revs on a template when it is transcluded in articles with sighted revs, and on the main page of portals. We should also consider this for semi-protected articles. Cenarium Talk 17:35, 23 October 2008 (UTC)
For the who-knows-how-many-th time, these are not viable sets of articles for "trials". You want to get it out there and start doing good with it, and I can applaud that, but it's not a helpful way to proceed. If FlaggedRevisions are enabled sitewide, yes, these are likely to be the first places they reach, and rightly so, for all the reasons you give above. But yet again, what you are proposing is not a trial, it is a full-scale deployment, which en.wiki in general is just not ready for. To be clear, I'm not necessarily saying that these proposals are not good ideas, I'm simply raining on the parade of the idea that there is actually the consensus to implement them. A trial on FAs might, might, just muster the necessary consensus. A deployment over such a hugely open-ended set of articles (I estimate this is a good quarter or even third of all reader-facing pages) is just not going to find the necessary support at this time. Happymelon 20:25, 23 October 2008 (UTC)
Cenarium, I want, as much as you do, to enable flagged revisions here – as soon as possible. However, we need to consider that a complicated trial will not actually garner the consensus that is desperately needed in this situation. – Thomas H. Larsen 00:14, 24 October 2008 (UTC)
I concur. While I too have hesitations over whether FAs give enough info to judge, a simple and obvious subset of our articles is the right group for a trial. Either all FAs or all GAs or even all v.1.0 articles (that gives some articles which are not as well developed) would be a good, well defined set. Also, about 2 months seems like a decent trial length. Random89 04:07, 24 October 2008 (UTC)
I would suggest that in addition to the FA articles (currently there are 2,290), the trial could include this set of most vandalized pages. This list might need updating but currently contains approximately 500 articles across a broad range of subjects. It also includes several Wikipedia non-article pages and portals. One of the objectives of this small trial will be to determine the required extra workload for monitoring the very heavily trafficked pages -- and if editors can handle it. I think this condensed list adds a small enough number of these kinds of articles that the trial won't get out of hand -- yet, it will provide a proper set of results for critical analysis of the ability to handle heavy vandalism, and for extrapolating those results onto a Wikipedia-wide scale. CactusWriter | needles 06:22, 24 October 2008 (UTC)

What if...?

1. What safeguards against abuse by 'sighters' who imagine they are Admins is it proposed to build in (see Case in point under 'The German implementation' above)?

2. How is one supposed to correct an illiterate 'sighter' who imagines that he or she has the last word?

3. Do we really have to use the ghastly term? --PL (talk) 15:51, 26 October 2008 (UTC)

1. There will always be people who try and abuse power. The wording of the proposal should make it clear that this is a technical ability, not an extra right.
2. To any abusers, we would follow the standard dispute resolution steps. Discuss, then ask for outside intervention. If the actions of the abuser are harming the project, take it up at WP:AIV right away.
3. No. In my proposal I tried to stay away from that term because it seems to just be meaningless techno-babble, and could be confusing to someone not familiar with the idea of flagged revs. Random89 16:33, 26 October 2008 (UTC)

How to

I believe we've gained a consensus above for testing flagged revisions with Featured Articles, and possibly with heavily vandalized articles as well on a case-by-case basis. So, why don't we discuss the "how" now? I believe Random89's proposal works fine. Cla68 (talk) 02:14, 30 October 2008 (UTC)

This should be discussed on Wikipedia talk:FA. Ruslik (talk) 04:43, 30 October 2008 (UTC)
That's true. I'll post something there if someone else doesn't first. Cla68 (talk) 06:30, 30 October 2008 (UTC)
Done. Cla68 (talk) 21:29, 30 October 2008 (UTC)
(de-lurk) - I'd have no problems filing a 'zilla bug to have this enabled, if needs be. I'm seeing rough consensus above for stuff like use on FAs. Just let me know and I'll make the request ;) - Alison 08:22, 7 November 2008 (UTC)
Don't see any consensus. Ruslik (talk) 10:33, 7 November 2008 (UTC)
Oddly, I do not see a consensus NOT to do this. So I say, let's test this. What are we waiting for? This is a needful change, it's way overdue, it's been in test since forever, the de:wp has developed workable processes, etc.... why are we futzing around? Just do it. ++Lar: t/c 17:11, 7 November 2008 (UTC)
What I'm about to say is probably rather controversial, but never mind. I feel that there are limits to the consensus approach, and, if it's not working, it should be ignored to improve the encyclopedia. Frankly, this page is becoming something a little like WT:RFA, all talk and no progress. Whenever anybody suggests anything, it usually results in nothing. Frankly, I feel that if we've got consensus to do this we should do it, and, if we don't have consensus to do it, we should do it anyway because it's best for the community and the encyclopedia as a whole.
I'll probably receive a lot of flak for saying this, but I ask everybody who opposes flagged revisions: if you have another way to ensure that readers receive consistently "safe" results, bring it up, or else stop complaining.
Our allegiance is to reliability and its associated accuracy and neutrality, not to openness. If Wikipedia's goal could be better attained if it was maintained solely by credentialed experts, we should restrict the project to experts. (Of course, Wikipedia's goal could not be better attained by doing so, so have no fear.) If Wikipedia's goal could be better attained by sacrificing a very little openness for hopefully a great deal of stability, we should strive towards this goal of reliability by implementing such necessary measures. – Thomas H. Larsen 23:13, 7 November 2008 (UTC)
And we were doing so well. I strongly recommend that you retract this statement, Thomas Larsen; or this thread is going to descend into yet another flame war. How difficult is it to have a productive discussion? Harder than it looks, certainly, but comments such as this do absolutely nothing to help. WP:IAR won't cut any ground with the developers, hence it's not worth even mentioning it. This is not like any old on-wiki discussion: the output of this process can't be anything other than a clear consensus for a particular technical implementation of FlaggedRevisions that we can post off to the devs and say "look here's the consensus (X% in favour, go count votes), here's the specification, please do it". Nothing else will work, so nothing else is worth trying. Happymelon 23:31, 7 November 2008 (UTC)
Then we need to get to "look here's the consensus (X% in favour, go count votes), here's the specification, please do it", because as it stands this proposal is sitting here with only sporadic bursts of (very low level) activity. At the current rate, I don't see it ever progressing. Maybe the next step is coming up with a few implementations and just (!)voting on them? Or not? Basically I think we need to figure out how to get the ball rolling again, as right now it's stuck in a local minimum of debating a lot, and doing nothing (not even trying to use the debate to change or shape proposals). --Falcorian (talk) 23:35, 7 November 2008 (UTC)
Indeed we need more users to get involved, look at the thread above. I have made dozens of proposals and received almost no input. I'll see if other talk pages have more activity. Cenarium Talk 01:24, 8 November 2008 (UTC)
I know it will be impossible to enable flagged revisions without consensus. However, I'm not entirely confident we will obtain such consensus and I would be disappointed if an uneducated majority could overrule a more knowledgeable minority based on opinions rather than facts and evidence. – Thomas H. Larsen 04:23, 11 November 2008 (UTC)
That's not the definition of consensus as used on Wikipedia: it's the quality of the arguments that counts, not how many people are making them. I don't think, however, that this issue is in any way uncontroversial or that it's fair to pigeonhole editors as "uneducated" for holding a particular point of view. The correct response is to move slowly to build up that support, demonstrate that FlaggedRevs has potential, and thus convince otherwise skeptical editors that this is the way forward. Happymelon 08:48, 12 November 2008 (UTC)

Take the bull by the horns

What needs doing? Can we get a draft of a proposal that is good enough, without a lot of "make it perfect or else", and then put it forward for a vote? Is there another way out of this? That it is taking this long is an embarrassment, frankly, and something that will be used against the project by those who do not wish it well. I suggest someone put up a new draft on a subpage, and set a definite time for getting it in front of the community for a vote. Note: Random89's proposal: User:Random89/Proposal for Clean Pages is good enough for me. It may not be perfect but it's good enough. Let's roll. ++Lar: t/c 00:24, 8 November 2008 (UTC)

No, we need more discussion, there are issues to address before even a test, and obviously before a global implementation. For example, I proposed that edits be sighted automatically after a reasonable time if it has not been done before. This would address many oppositions raised above and in the poll, see the specific thread for details. Granting surveyor rights automatically after such light requirements would also considerably diminish the efficiency of sighted revisions. Instead, I propose that a group intermediary between autoconfirmed and surveyor, say 'established', be created, that users passing certain requirements be automatically assigned to it and that this group be assignable and removable by administrators. Established users would have the delay between their edit and the automatic flagging reduced to a very limited time (for example 5 minutes instead of 24 hours). From a security point of view, this would be a huge improvement (it would solve the recurring problem of autoconfirmed users vandalizing or violating major policies occasionally, despite otherwise acceptable contributions, i.e. not indef-blockable but posing a serious threat for many high risk articles). Cenarium Talk 01:06, 8 November 2008 (UTC)
All that seems an implementation detail that could be sorted out after the basics were enabled. Why does this need to be determined in advance? Or gotten exactly perfect? This is a wiki. If it's not quite right, tweak it. ++Lar: t/c 03:16, 8 November 2008 (UTC)
It's not so easy to tweak an extension: if we start a test and then ask the developers for a modification, the test will be over long before the changes become live. Cenarium Talk 03:38, 8 November 2008 (UTC)
I agree with Lar, we don't need to flesh out all these details to start a limited test with the 2,000 featured articles. We can use the basic rules laid out in Random89's proposal because most, if not all, of the FA's are already watched by editors who presumably qualify as trusted users. As the test runs, we can address the issues that need to be taken care of before a wider implementation. Cla68 (talk) 03:45, 8 November 2008 (UTC)
There are things we won't be able to reset at the end of the trial. We should be extremely cautious when granting surveyor rights and not use automatic assignment since we don't know what the requirements will be after full implementation. Cenarium Talk 04:10, 8 November 2008 (UTC)
Imagine, then, that users feel this is too obtrusive to the open nature of Wikipedia and don't like sighted revisions. That would be counter-productive in the end. This objection is the most recurring. Delaying sighting would address this. And we would have the 'established' usergroup, so we could test its use. There are other issues to consider for a test and a specific proposal should be drafted. Cenarium Talk 04:34, 8 November 2008 (UTC)
  • How will it be implemented on FAs ? Shall we enable it individually by configuring each FA page ? How are we suppose to say to sysops don't enable it anywhere else then ? Cenarium Talk 04:34, 8 November 2008 (UTC)
I support the proposal to use FAs (or anything!) as a test bed for the idea. With a small set of articles like this, we don't need a huge team of "sighters", so I don't fully understand Cenarium's caution (above) about granting surveyor rights. As I see it, a small-scale test should be just that; obviously we will make changes afterwards. Cenarium, can you explain your reasons for delaying, because I think I'm missing something important; otherwise, I think we should go ahead - we can't just talk forever! FYI: I posted a RFC here, though don't expect a flood of comments from that! Cheers, Walkerma (talk) 10:09, 8 November 2008 (UTC)
If we enable sighted revs on a large number of articles, we'll have huge backlogs of edits to sight. This will indeed be the end of free editing (see two threads above for details). I proposed that, after a certain delay, an edit be automatically sighted by the software (for example, 24 hours for IPs, 18 hours for users, 12 hours for autoconfirmed users, 5 minutes for 'established' users, a new usergroup with automatic assignment, and immediately for surveyors). Whether this is needed for the trial is debatable. Cenarium Talk 17:00, 8 November 2008 (UTC)
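A minimal sketch of the delayed automatic sighting rule described above, assuming the example delays quoted in that comment. The group keys, the function names and the rule that a reverted edit is never auto-sighted are all hypothetical; nothing here corresponds to an existing FlaggedRevs setting.

```python
from datetime import datetime, timedelta

# Hypothetical per-group delays, copied from the example figures in the comment above.
AUTO_SIGHT_DELAY = {
    "ip": timedelta(hours=24),
    "user": timedelta(hours=18),
    "autoconfirmed": timedelta(hours=12),
    "established": timedelta(minutes=5),   # the proposed new usergroup
    "surveyor": timedelta(0),              # sighted immediately
}


def auto_sight_time(edit_time, group):
    """Return the moment at which an untouched edit would be sighted automatically."""
    return edit_time + AUTO_SIGHT_DELAY[group]


def is_auto_sighted(edit_time, group, now, reverted=False):
    """True if the edit has waited out its delay and was not reverted (assumed rule)."""
    if reverted:
        return False
    return now >= auto_sight_time(edit_time, group)
```

With these figures, an IP edit made at 17:00 would become visible to readers at 17:00 the next day unless someone reverts or sights it first.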
Agreed. - Dan Dank55 (send/receive) 13:47, 8 November 2008 (UTC)
I do not think User:Random89/Proposal for Clean Pages is acceptable. I disagree with 3 of the 4 criteria for a clean page. Free from spelling errors? How realistic is this requirement? If you take a random FA you will probably find a couple of typos. If a long article like, for instance, Roman Catholic Church is seriously rewritten (this happened recently), is a sighter expected to read the whole article carefully, fix all the typos and then mark it as sighted? How much time will that take? If a sighter spends ~1 minute reviewing a complex article, won't the whole process look like a parody of itself?
The second requirement (grammar) has the same problems squared. Grammar is a much more complex issue than spelling. I am not sure that a sighter who does not understand the content can decide anything about grammar. There is no sharp boundary between grammar issues and content issues.
Maintenance tags at the top? Sorry, but I got involved in this project only because I was frustrated with {{unreferenced}} templates at the tops of some pages.
FA articles as "guinea pigs"? This sample is not representative of all of Wikipedia and is too small. Therefore any results obtained from the test will be useless.
Flagged revisions need, in my opinion, a cautious approach to their implementation. Controversies should be avoided at any cost. Why not implement flagged revisions without changing how online Wikipedia actually works? I mean that any reader should see the latest revision, as now. This limited implementation will still be extremely helpful for the WP1.0 project. The article sample will also be broad, as it will include articles of very different quality. This approach is the least controversial, because it will not influence how the online wiki functions, and the potential for abuse of the tools will be minimal as well. The changes in what readers see should be the last stage in the implementation process (if it is ever implemented). Ruslik (talk) 15:45, 8 November 2008 (UTC)
I agree, these 3 requirements are idealistic but unrealistic, for a test and even more for a wider implementation. As I said above, it would be a huge loss of scope not to use sighted revs on articles with cleanup tags (take 2008 South Ossetia war for example). And what about when a cleanup tag is added to an article that did not have one before (isn't that quite common?): what do we do then? As I explained, a given level of flagged revs should have a determined, specific purpose and stick to it. Sighted revs: vandalism and libel. Confirmed revs: edit wars and disputes. If we try to mix everything in the same flagged rev level, it'll be an inextricable mess and inevitably backfire. Cenarium Talk 16:00, 8 November 2008 (UTC)

We need to write something specific for the test, here is a draft. Go ahead, edit. Cenarium Talk 16:53, 8 November 2008 (UTC)

Confirmed revisions

I propose that a system of flagged revisions, "confirmed revisions", be used to control edit wars, disputes and contentious articles (especially BLPs) where problematic edits are regularly made against consensus; see User:Cenarium/Flagged revisions/Confirmed for a detailed description. This system is independent of the sighted versions discussed above, but the two can work together. The principle is that a sysop can configure an article so that the version shown to IPs is the latest confirmed one, and versions are confirmed when they are consensual or non-controversial. Users able to confirm revisions may be called 'moderators' and chosen after a light process. That would be a net positive compared to full protection or blocks, which are the only choices we have in such cases. Cenarium Talk 03:43, 8 November 2008 (UTC)

Trial

See: Wikipedia:Flagged revisions/Trial

My updated proposal for a trial may be found at User:Thomas H. Larsen/flagged revisions trial. – Thomas H. Larsen 00:46, 12 November 2008 (UTC)

[My suggestion for a trial,] [f]or what it's worth:

  • Flagged revisions are enabled for a trial period of two months over all featured articles. It should be made very clear that this test is actually a temporary trial. After two months, flagged revisions will be discussed and submitted to a community vote which will determine whether or not flagged revisions remain enabled and over which pages they cover.
  • The most recently sighted revision is displayed to logged-out users. Logged-in users can configure whether they see the most recent or last sighted revision of articles in their preferences.
  • All revisions are to be sighted unless they are found to contain vandalism, copyright violations, or other material obviously and uncontroversially unacceptable for a free content encyclopedia.
  • Any registered editor may request that an administrator grant them sighter privileges; administrators may choose, at their discretion, to grant or refuse these privileges. Any editor who believes they were unjustly refused privileges may make an appeal on the administrators' noticeboard, and any editor who believes that somebody has abused their privileges may make a request that an administrator remove them on the same noticeboard.

How's this? – Thomas H. Larsen 04:19, 11 November 2008 (UTC)

I think this looks like a good start. Can you add this into Cenarium's proposed language? His definitions help clarify your first and fourth bullets. On the second point: Will the logged-out user see only the sighted revision, or will they also have the option of clicking a link to view the latest version? This option should be made apparent for logged-in users as well. Your third point is a bit unclear. Cenarium's sentence, A revision should be sighted when it is free from vandalism and libel, is better but can be expanded upon. CactusWriter | needles 07:40, 11 November 2008 (UTC)
How are you going to do it from the technical point of view? How will you enable them only over FAs without enabling them over other articles? FAs differ from other pages only in that they have a banner on the talk page and a star. Ruslik (talk) 07:53, 11 November 2008 (UTC)
I think this would be ok as a trial (my only worry is that it might be too small a scale for a trial but it's certainly better than nothing). To give my thoughts on the points raised above, a logged out user (and logged in users as well) should have a link, either as a tab at the top of the page (as on the German wikipedia) or as a link in the top right corner of the page, which would take them to the most recent version (and back again to the sighted version).
On how to keep it only to featured articles, I would make it so that only admins can first turn on flagged revisions on a page or turn it off again if an article is defeatured. Any admin who flags a non featured article or deflags a still featured article should be taken to the administrators' noticeboard as a first step and it should be treated as wheel warring if they were aware of what they were doing. Davewild (talk) 09:31, 11 November 2008 (UTC)
We set all pages to show the most recent revision by default, and grant the stablesettings permission only to a small trusted usergroup (I think it's more sensible to assign this to 'bureaucrat'). A 'crat begins the trial by setting all FAs to display the stable version, and ends it by resetting them to display the current version. With some CSS or JS we can probably hide all the effects of FlaggedRevs on the other pages if desired. We can restrict the trial as much as technically possible by limiting it to the mainspace, disabling all auto-reviewing, and limiting allocation of the 'surveyor' right to people with an interest in editing the article set. In the initial consensus we mandate the 'crats to start the trial, and set the status quo to be 'deactivation' after a fixed time period. Of course if it goes well there will be cries to extend the trial into a full deployment; so much the better and it's easy to do so, but it's only possible to go down that road in one direction. If we make the initial configuration very restricted, by giving the tools to manage the deployment to bureaucrats (who will only use them in accordance with clear consensus), we retain the ability to easily go in both directions: either we can end the trial by having the crats restore all articles to 'normal' status and strip all users of the reviewing rights; or we can go the other way by asking the devs to extend the various rights and features across the whole article set. If we make the trial too open, however (by giving the ability to set articles to display FlaggedRevs to such a large group as 'sysop'), then the only way we can 'retreat' from a full deployment, given the strong opinions on both sides of this debate, is to have the devs uninstall the extension altogether. 
It is much more likely that consensus will be found for a tightly-controlled trial than for a loosely-controlled one, and it is just as easy to proceed forward from either one (but difficult if not impossible to retreat from an uncontrolled trial). I will try to flesh out the specifics of this roadmap somewhere, maybe Cenarium's page. Happymelon 16:40, 11 November 2008 (UTC)
Just a small side question about the technical aspect. What is the procedure for adding the "flag indicator" on all the FA articles? Is this something that is programmed into an administrative bot or does it need to be handled manually? I am just wondering how easy it is to set up the pages for the test. CactusWriter | needles 17:22, 11 November 2008 (UTC)
The 'show-stable/show-current' flag (which is what I assume you're talking about) can be set on a per-namespace basis by the developers, or on a per-page basis by users with a specific permission (a technical ability like delete or rollback). By having the developers set the mainspace to 'show-current-revision' and giving 'crats the permission that enables them to override that setting on a per-page basis, it is then possible for them to (manually) change all FAs to 'show-stable'. This should be very easy to automate in the initial setup, and then it should be easy for them or a bot script to keep up with promotions and demotions. Happymelon 17:30, 11 November 2008 (UTC)
Okay, got it. Thanks. CactusWriter | needles 18:10, 11 November 2008 (UTC)
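The namespace default with a per-page override described in the exchange above can be sketched roughly as follows. This is an illustration only: the mode strings, the dictionaries and the fallback to the latest revision when nothing has been flagged yet are assumptions, not the extension's actual configuration interface.

```python
def shown_revision(page, namespace, latest_rev, stable_rev,
                   page_overrides, namespace_defaults):
    """Pick the revision a logged-out reader sees (hypothetical model).

    namespace_defaults: set by the developers, e.g. {"main": "current"}.
    page_overrides: set per page by holders of the override permission,
                    e.g. {"Evolution": "stable"}.
    """
    # A per-page setting wins over the namespace-wide default.
    mode = page_overrides.get(page, namespace_defaults.get(namespace, "current"))
    # Assumed fallback: with no flagged revision yet, show the latest one.
    if mode == "stable" and stable_rev is not None:
        return stable_rev
    return latest_rev
```

Under this model, starting the trial means adding each FA to `page_overrides`, and ending it means emptying that mapping again; pages outside the trial are never affected.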
I think it would be better if the most recently flagged revision were shown by default to users, since this is not only a test to see how well the community can handle flagged revisions but also a test of public perception (albeit a very small test of same). The purpose of flagged revisions is to prevent vandalism from appearing publicly. I certainly agree, though, that a link should be provided to the most recent revision; users should see the stable revision by default, but should have the option to easily acquire the most recent revision if they so desire. – Thomas H. Larsen 00:32, 12 November 2008 (UTC)
I think you misunderstand: the 'show-current/show-stable' distinction is what sets the trial articles apart from the rest. If you display the flagged revisions on all articles, that's not a trial, that's a full-scale deployment. On our 2000-some FAs, users will see the most recent flagged revision; on other pages, the current version. Happymelon 08:39, 12 November 2008 (UTC)
Rereading your post I saw that I did misunderstand - I thought you were implying that flagged revisions should always display the current revision by default and never the stable revision, which was not what you meant. My apologies ... – Thomas H. Larsen 00:25, 14 November 2008 (UTC)

← I've created User:Thomas H. Larsen/flagged revisions trial with my trial proposal. You're welcome to edit it. Cenarium's proposal seems to be similar, but I'm trying to be a little more specific and expand on various points. – Thomas H. Larsen 00:39, 12 November 2008 (UTC)

I don't think it's helpful to continually create proposals in userspace; although I know it's not intentional, the location inevitably discourages editing of what really needs to be a joint community effort. It also has the effect of linking proposals very strongly to their originators, which again discourages other editors from making major changes to them. You'd be better off moving that to a subpage of this page, maybe Wikipedia:Flagged revisions/Trial. That way, we can be constantly building on our previous work rather than starting afresh every other day. Happymelon 08:43, 12 November 2008 (UTC)
I have created this page, merging our two proposals. Cenarium Talk 22:15, 13 November 2008 (UTC)
Agree with Happy-melon, and thanks Cenarium. – Thomas H. Larsen 00:37, 14 November 2008 (UTC)

I have a few suggestions. First, how will we measure the success or failure of the trial? Sure, we will have a vote, but what quantitative evidence will we be able to point to to say that it worked? We should not base the vote afterwards on how we feel it worked, but on evidence. And who will make featured articles sightable but other articles unsightable, and how? Z gin der 2008-11-12T23:00Z (UTC)

There are several things that can be measured. First, for our purposes here, define a good edit as one that improves the readability without changing the meaning; adds content with a reference; is normal housekeeping; changes the tone of the writing toward neutrality; or removes things that have previously been tagged as problematic. Second, a bad edit is vandalism, a copyright violation, or other material obviously and uncontroversially unacceptable for a free content encyclopedia. Everything else is an indifferent edit. For the period of time preceding the trial: how many good/indifferent/bad edits were made to the articles? What percentage of the edits reverted were good/indifferent/bad? What is the average time a bad edit exists in the article before it is reverted?
After the trial we can measure: how many good/indifferent/bad edits were made to the articles? What percentage of edits sighted were good/indifferent/bad? What is the average time a good edit exists in the article before it is sighted? I personally think an interesting thing to find out is how admins follow the instructions. The trial reads All revisions are to be sighted unless they are found to contain vandalism, copyright violations, or other material obviously and uncontroversially unacceptable for a free content encyclopedia. But I suspect that even though good and indifferent edits should be sighted equally, there will be a longer lag for indifferent edits as people hesitate to stamp their approval on something they are unsure of.
But measuring the success of the trial will not rely solely on these numbers. It will be a judgment call about whether the pros outweigh the cons, which will take some discussion. And even if this trial is deemed successful, there are still questions about exactly which implementation will work best here. There is likely a scale issue. The average time for a good edit to be sighted likely depends on the number of articles with flagged revisions activated, the number and type of editors with sighting permissions, and the obscurity of the topic. I would predict that running this trial would give us useful information to better design the next trial. I don't think this trial alone will give a definitive answer, but I definitely support the trial.--BirgitteSB 18:11, 13 November 2008 (UTC)
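As a rough illustration of the measurements listed above, the sketch below assumes each edit has already been hand-labelled as good, indifferent or bad and carries the time it was made and the time it was reverted (before the trial) or sighted (during it). The `Edit` record and both function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edit:
    quality: str                          # "good", "indifferent" or "bad" (hand-labelled)
    made: datetime                        # when the edit was saved
    resolved: Optional[datetime] = None   # when it was reverted or sighted, if ever


def counts_by_quality(edits):
    """Number of good/indifferent/bad edits in a sample."""
    return Counter(e.quality for e in edits)


def hours_to_resolution(edits, quality):
    """Mean and maximum wait in hours for resolved edits of one quality class."""
    waits = [(e.resolved - e.made).total_seconds() / 3600
             for e in edits
             if e.quality == quality and e.resolved is not None]
    if not waits:
        return None
    return sum(waits) / len(waits), max(waits)
```

Running the same functions on a pre-trial sample and a trial sample gives directly comparable figures, including the lag for indifferent edits that the comment above expects to be longer.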
Good luck with the trial of the flagged revisions, but I will continue to suggest that the displayed version of all Wikipedia articles should be the most recent one, with a link next to the Featured Article star to the flagged version, thus keeping the advantage of the flagged revisions and making sure that wikipedia doesn't become out of date as normal encyclopedias so easily do. Judgesurreal777 (talk) 19:06, 13 November 2008 (UTC)
Note that I proposed above that an edit should be automatically sighted after a certain period of time (which could depend on the user rights, for example 18 hours for IPs, 12 hours for users, 6 hours for autoconfirmed users, 5 minutes for 'established' (a new group) users and immediately for surveyors). This would solve the problems raised above and prevent the huge backlogs that are inevitable if we enable SR massively. Since a bad edit is generally reverted within minutes or hours, the efficiency would be very similar. Cenarium Talk 21:58, 13 November 2008 (UTC)
I don't think we should discuss what happens after the trial in this thread, since if we do so it will quickly become convoluted and confused. Suggestions in another thread, though, would be very helpful. – Thomas H. Larsen 00:25, 14 November 2008 (UTC)
The difficulty with having the current revision displayed by default is that it assumes most users will want the most recent, possibly-vandalised content instead of the stable, non-vandalised content. While I'm open to evidence to the contrary, I feel that most casual users of Wikipedia care more that they don't get a page with the words "YOU SUCK!" than that they don't get a page that's slightly out of date. – Thomas H. Larsen 00:29, 14 November 2008 (UTC)
Do you have any idea how huge a backlog there is on newpage patrol? What makes you think that this would be any better? DS (talk) 02:43, 14 November 2008 (UTC)
Hopefully, we won't have this kind of problem if an edit is automatically sighted after a reasonable period of time. Cenarium Talk 22:47, 14 November 2008 (UTC)

Implementation backlog?

All of these discussions are a bit futile.... There are 13(?) open requests for FlaggedRevs on other WMF wikis where consensus is evident and demonstrable. (See here for the list.) So, it may be best to focus energy on other parts of the project (clearing out the new page patrol backlog?) that would be a far better use of everyone's time and energy. --MZMcBride (talk) 02:56, 14 November 2008 (UTC)

I think it is a bad idea to say "since there's a backlog let's not do this at all". Instead, try to see if there is a way to aid the devs in somehow clearing the implementation backlog. I will also point out that (almost certainly) the longer en:wp waits to come to an implementation decision, the longer the wait will be to get it implemented. So why wait? Especially since it's a trial we speak of. ++Lar: t/c 04:22, 14 November 2008 (UTC)
I think you're missing what I'm saying. Read between the lines a bit. ;-) --MZMcBride (talk) 17:41, 14 November 2008 (UTC)
I think you're missing what I'm saying. Read between the lines a bit. ;-) ++Lar: t/c 19:06, 14 November 2008 (UTC)
This blog post by Brion Vibber - [1] is saying that they are tackling the backlog. Davewild (talk) 09:11, 15 November 2008 (UTC)
Note that there are (as of November 28 2008) only four outstanding requests for FlaggedRevs implementation. Happymelon 17:44, 28 November 2008 (UTC)

Possible modification to trial

I have been thinking about this and thought of a way we could get more useful information from this trial. Instead of showing all logged-out users the sighted version, we could randomly select half the articles to be configured that way, with the other half showing logged-out users the most recent version. Or, if that is not technically possible, we could run the trial for two months in each configuration.--BirgitteSB 14:31, 14 November 2008 (UTC)
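One way the random split suggested above could be drawn, as a sketch only. The function name, the group labels and the fixed seed are assumptions; the seed is there so that anyone can reproduce and check the assignment.

```python
import random


def split_articles(titles, seed=0):
    """Randomly assign trial articles to two equally sized configurations."""
    rng = random.Random(seed)      # fixed seed makes the draw reproducible
    shuffled = sorted(titles)      # sort first so input order does not matter
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "show_sighted": shuffled[:half],   # logged-out readers see the sighted version
        "show_latest": shuffled[half:],    # logged-out readers see the latest version
    }
```

With an odd number of articles the second group is one larger than the first.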

Wikipedia talk:Flagged revisions/Trial

Copied from Wikipedia talk:Flagged revisions/Trial.

Backlog

No. Absolutely not. Do you have any idea how much of a backlog there is just for dealing with flagged newpages? We don't have anywhere near the resources to deal with something like this for every single revision. DS (talk) 02:36, 14 November 2008 (UTC)

I agree with DragonflySixtyseven. I hear de has over 100,000 revisions awaiting review. To me this seems like a massive extension of WP:AFC, which as everyone says is a giant mess. Also reminiscent of AFC is the "just a trial" idea. Also the granting of surveyor rights is bound to be a mess and to introduce bureaucracy. delldot ∇. 02:47, 14 November 2008 (UTC)
What if revisions are automatically sighted within 12 or 24 hours? See here. (Note also that AFC has improved a great deal in recent months: Category:Pending Afc requests contains only 22 pages at the time of writing, all recent.) We used to say the same of rollback, that it would be a mess to assign and would introduce bureaucracy, but in the end it did not. Cenarium Talk 03:47, 14 November 2008 (UTC)
Seems to me like if they're automatically sighted after time you lose the benefit without losing the cost (that people won't get to see their changes right away, it's disempowering or discouraging for new users, new content doesn't show up as fast, etc.). delldot ∇. 04:19, 14 November 2008 (UTC)
Vandalism is reverted very quickly nowadays, so we won't lose the benefit. And new users can still see their edits on the draft pages, and they know their edits will be visible very soon. Cenarium Talk 04:34, 14 November 2008 (UTC)
Sorry for not being clear, I meant the benefit of each revision being reviewed. I would think the success of current vandalism fighting would indicate less of a need for flagged revisions. Seems like clear vandalism and edits that clearly meet all WP's criteria will get dealt with fast--it's the well-intentioned but slightly problematic edits that'll languish, like we see at AFC and with newpage patrol. These are of course the vast majority of newbie edits. I'm not sure seeing edits on the draft page will be as encouraging as the current setup for new contributors. delldot ∇. 04:58, 14 November 2008 (UTC)
As you know, it's unrealistic to think we could review all edits by non-surveyors: we have a total of 1,251,467,734 edits on en, about 5 times the number of edits on de. If there are already 100000 unreviewed edits on de, then most of them are probably not uncontroversially inappropriate. Editors of the English Wikipedia, me included, will never accept such a situation on en. It is precisely because uncontroversially inappropriate edits are in most cases handled within a few hours, like clearly good ones, while others require a longer time, that delayed automatic sighting will be extremely helpful and will not hinder the efficiency of SR. While there may be a benefit in reviewing each edit, it's unrealizable and, I think, not the point of SR, which is rather to prevent readers from seeing vandalism and libel. Other edits must be sighted if not overwritten or reverted for other reasons. Users can still review edits on their watchlist as before. There are also measures we could take to improve the system. For example, since the abuse filter automatically filters all actions, we could create a filter that prevents certain edits from being automatically sighted (including surveyors' edits, in which case the surveyor who made the edit would be prevented from sighting it) as an additional protection, and possibly require sysop rights to sight certain edits identified as 'very bad'. For newbies, I think there are ways to reassure them, for example with the message displayed when the edit is made. The message displayed at test.wikipedia is "Edits will be incorporated into the stable version once an authorised user reviews them. The draft is shown below. 
1 change awaits review"; it needs to be improved and made more friendly (I suppose it's editable in a MediaWiki page), telling them that edits will be visible on the article very soon, within a few minutes or hours at most (things we couldn't say without delayed automatic sighting), and that other users may edit in the meantime. We could also arrange that when an IP has recently edited an article, it is redirected to the draft page for this article (temporarily, as recorded in the computer's cookies). Cenarium Talk 18:17, 14 November 2008 (UTC)
Just to establish a sense of perspective, you are aware that de.wiki has fifty five million edits in total? However, this discussion is misplaced: all discussion over whether or not to enable FlaggedRevisions at all should take place on Wikipedia talk:Flagged revisions. This talk page should be used only for discussing the technical details of the proposed FlaggedRevs trial. Happymelon 18:02, 14 November 2008 (UTC)
Maybe we should transclude this page in Wikipedia talk:Flagged revisions. Cenarium Talk 20:58, 14 November 2008 (UTC)

Don't automatically sight for the trial

I would recommend against automatically sighting after a set length of time for the trial period. If sighted revisions cause a backlog, we would want to be able to gather data on how long revisions remain unsighted, and how big the backlog grows. Perhaps the idea can be revisited a month into the trial, or as a second trial, if this one isn't satisfactory. We would clearly not want to assess too early, as it will take time for editors to be granted privileges and get used to it.

Additional thought: It is possible we might want to implement the feature temporarily (say for the first week) as a stop-gap while editors are requesting and getting used to the sighting privileges. Sχeptomaniacχαιρετε 20:21, 14 November 2008 (UTC)

Since the trial is only on FAs, we won't be able to draw conclusions from it with respect to the size of the backlog. The German experience can tell us, however: they have a backlog of about 100000 edits, and we have a much larger volume of edits. Cenarium Talk 21:11, 14 November 2008 (UTC)
I don't find the number of edits in the backlog to be useful information (in fact, it smells more of FUD than data). It would actually be useful to know the average length of time a revision sits in the backlog, and the upper range. If the average is an hour or two, that's probably not serious, but if it's a few days, then it's worthy of concern. In addition, how many of the German Wikipedians have privileges for sighting, versus the number of active, registered users?
As far as data on backlogs here, I believe data can be acquired, as long as it's reasonably used. A lack of a backlog won't tell us very much, but having any significant one will. There's a lot to be gained from testing this out and seeing what exactly happens, rather than the hysteria I've seen from both sides (one side saying this will destroy WP, the other thinking it will magically fix things). Sχeptomaniacχαιρετε 22:53, 14 November 2008 (UTC)
That would be useful of course, I suppose we should ask German surveyors. I don't have the source for the 100000 edits in the backlog, but if this is indeed true, no matter how terrifying it looks, it should be taken into consideration for a full-scale implementation (and it means some edits haven't been reviewed for days or weeks). As you say, there are two extreme sides, delaying sighting would be a fair middle point. And as I said, it would also be far more adaptable, allow specific behaviors of filters w.r.t. SR, etc. Having randomized on de, I found edits needing review from Nov. 4 and more recent, for example de:Der Ball ist rund, de:MRT (Taipei), de:Leewellen and de:Stefan Jedele. I think the threshold for surveyor rights on de is too high (see de:special:listgrouprights), for example surveyors have rollback rights, while many users with otherwise good edits had it removed on en for misuse, edit warring, etc. Rollback requires surveyor rights for compatibility, but the inverse is not true, so it's an unneeded additional constraint. Cenarium Talk 23:39, 14 November 2008 (UTC)
Well, it's a question of quality. A de.WP surveyor leaves a version unsighted if he isn't sure. We have discussed the backlog problem and set up one, two, three different flagged-revisions projects to limit the particular problems of the different phases. A sighted article with two-week-old unsighted versions is possible. Automatic sighting will diminish the quality effect of this feature. Best regards --Jan eissfeldt (talk) 02:44, 15 November 2008 (UTC)

If 99% of simple vandalism is removed within an hour or less (I'm making those numbers up, but for this example that's OK, and I think they are not far off the mark, maybe even conservative?), and a revision gets automatically sighted after 2 hours, then over 99% of simple vandalism will never be seen by the general logged-out public. That improves the apparent quality of pages and discourages simple vandalism. Those are both good things. So I support automatic sighting. That doesn't mean sighted edits can't be reviewed and un-sighted later. It does not make things perfect, and it does not defend against POV pushing, or subtle vandalism, or other things, but it vastly reduces the probability of seeing YOU SUCK! for those who are not editors here. They are our target market after all, not us. ++Lar: t/c 16:57, 15 November 2008 (UTC)
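The arithmetic in the comment above can be checked against real revert data with a few lines. This is a sketch under stated assumptions: the function name is hypothetical, and the sample delays in the usage note are invented to match the made-up 99% figure, not measured.

```python
def share_never_shown(revert_delays_hours, auto_sight_after_hours):
    """Fraction of vandal edits reverted before automatic sighting would expose them.

    revert_delays_hours: for each vandal edit, hours until it was reverted.
    """
    if not revert_delays_hours:
        raise ValueError("need at least one revert delay")
    # Only edits reverted strictly before the deadline stay hidden from readers.
    hidden = sum(1 for d in revert_delays_hours if d < auto_sight_after_hours)
    return hidden / len(revert_delays_hours)
```

For instance, with 99 edits reverted after 0.1 hours and one reverted after 5 hours, a 2-hour automatic sighting delay hides 99 of the 100 from logged-out readers.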

Sorry, but how will the automatic sighting after a period of time be realized technically? I studied FlaggedRevs.php, but failed to find any parameter related to it. Of course, I may be missing something. Ruslik (talk) 17:38, 15 November 2008 (UTC)
Now that's a good question. I had assumed that the code already allowed this but if it does not, then a new bugzilla bug would be needed. ++Lar: t/c 01:15, 16 November 2008 (UTC)
To my knowledge it doesn't exist; I made this up. It seems to be realizable, but there are a few technical details to resolve. Cenarium Talk 17:30, 16 November 2008 (UTC)
Well, I created Wikipedia:Flagged revisions/Trial/php with a detailed technical proposal. Only features that are currently available in FlaggedRevs.php were used. I think it is better to focus the discussion on which variables should be set to 'true' and which to 'false' instead of on abstract concepts. Ruslik (talk) 18:36, 16 November 2008 (UTC)
OK, yes, in that case I agree. Let's stick to which switches and dials we actually have available. ++Lar: t/c 18:42, 16 November 2008 (UTC)
I think we only have a consensus for sighted revisions (others have been vehemently opposed). There should only be two settings, unreviewed and sighted, and only one additional usergroup, surveyor. Only bureaucrats should be able to configure a page. About granting surveyor rights, I'm still undecided: admins or bureaucrats only? No autopromotion to surveyor, as that has been too little discussed and too much opposed, and it could be used for another user group, larger than surveyor. No need to make things more complicated at this time. We still need to talk about the full implementation in parallel and how to resolve a number of issues, but, I agree, not here. Cenarium Talk 18:50, 16 November 2008 (UTC)
I want to avoid overburdening bureaucrats. I am proposing to have a special user group responsible for the trials ('reviewers', the name being not so important). Ruslik (talk) 19:01, 16 November 2008 (UTC)
Someone above mentioned the time required for a "Sichtung" (if that's the correct term) on the German Wikipedia. It CAN take quite a while (if it ever gets done!). I edited Joseph Sheridan Le Fanu on 5 September. It's still waiting for a Sichtung. The article was created years ago and it has "Keine Version Gesichtet" ("no version sighted") at the top. I created Cappella Sansevero on 17 September and have made a number of edits since then; it is still waiting for its first Sichtung. My edits of Henry Williamson and County Dublin from 12 November are still waiting (and County Dublin, which has been around for several years, also has the sad news: Keine Version Gesichtet).
That's not the only problem with the German Wikipedia, but it's a nuisance.
BTW, I've also contributed to the Italian and Spanish Wikipedias and never had a problem. Hohenloh + 21:49, 28 November 2008 (UTC)

Minimalist proposal

I updated Wikipedia:Flagged revisions/Trial/php. Now it is as minimalist as it can be in principle. I think now it can be !voted up or down. Ruslik (talk) 09:33, 17 November 2008 (UTC)

It's not too bad, but I disagree with the "one binary scale called accuracy". My justification for this is the lack of support flagged revisions has so far received as an accuracy-maintaining system; consensus, whether rightly or wrongly, seems to support flagged revisions only as a method of countering vandalism. Therefore, I'd propose a single binary scale named Sighted or similar, which can be set to "yes" or, by default, "no". – Thomas H. Larsen 01:27, 19 November 2008 (UTC)
I renamed the two levels as sighted and unsighted. As to accuracy, I simply cannot find a better word. Sighted is not a noun, and cannot be used as the name of a scale, in my opinion. The name of the scale is not so important. To avoid confusion I removed the word accuracy from the description. Ruslik (talk) 09:08, 19 November 2008 (UTC)

Who is going to do this?

We currently have 2,306 featured articles. Who is going to go around with the reviewer right and make every one of these pages flaggable and then undo all of this in two months? Z gin der 2008-11-19T17:12Z (UTC)

Short answer, a bot will. That is a trivially-sized task. Happymelon 18:18, 19 November 2008 (UTC)

Metrics

I am very much in favour of sighted/flagged revisions, but one aspect of this trial concerns me. How can we have a trial without having some metric for whether or not it has been successful? Otherwise all trials will dissolve into a qualitative argument about the efficacy. What I'd propose is that when a trial is agreed on, the articles selected are first reviewed for a fixed period of time (or the history is assessed) so that we can say what the last X weeks have brought in the form of vandalism (which is currently automatically visible), and compare that to the amount of vandalism made visible (through human error, presumably) during the trial. Then there is a reasonable metric of performance. Beware also of bias in the sample: if, for instance, we chose FAs as the class for testing, I imagine that the TFAs might be skewed, which means that the metric should be measured over all FAs so that it encompasses the TFA effect. Fritzpoll (talk) 11:18, 28 November 2008 (UTC)

While I agree with the principle of setting out what we should be monitoring over the trial period, anyone who believes that there is some magic number by which its "success" or "failure" can be judged is unduly optimistic. While the very thought of there being a way to quantitatively measure the efficacy of FlaggedRevs (or indeed any other process on-wiki) is itself laughable, even if there were a magic formula to objectify the process, how on earth are we supposed to decide what value represents "success"? Like everything else on-wiki, FlaggedRevs is a consensus-building process; our main problem from the start has been an inclination to view it erroneously as a matrix of binary choices. There is no simple straight line on one side of which lies "success" and on the other "failure"; there is a continuum whereby each contributor to the future discussion will evaluate the trial results and draw their own conclusions. Some will change their opinions based on the new evidence, some will not, and of course change can and will occur in both directions. The important thing about a trial is that it gives people hard evidence on which to refine and reevaluate their opinions and arguments. So while I fully anticipate reams of statistics and synopses being extracted from the raw trial data, I think it is incorrect to assume that we can or should all interpret those data in the same way. Happymelon 17:06, 28 November 2008 (UTC)
My point was more that the data should be available for analysis, otherwise we'll end up no better off after the trial than before it. Fritzpoll (talk) 10:40, 30 November 2008 (UTC)
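
A minimal sketch, in Python, of the before/during comparison proposed in this thread. The record type, field names and sample numbers are hypothetical and purely illustrative; nothing here is part of FlaggedRevs or drawn from real data. The sketch only normalises counts to a weekly rate so that a baseline window and a trial window of different lengths can be laid side by side, and it deliberately sets no threshold for "success".

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hypothetical record of one observation window for a set of articles."""
    label: str
    days: int
    visible_vandal_edits: int  # vandal edits that logged-out readers could see


def weekly_rate(period: Period) -> float:
    """Visible vandal edits per week, so windows of different length compare."""
    if period.days <= 0:
        raise ValueError("period must cover at least one day")
    return period.visible_vandal_edits * 7 / period.days


def compare(baseline: Period, trial: Period) -> dict:
    """Return both weekly rates and their ratio; interpretation is left open."""
    before = weekly_rate(baseline)
    during = weekly_rate(trial)
    return {
        "before_per_week": before,
        "during_per_week": during,
        # No ratio is reported if the baseline saw no visible vandalism.
        "ratio": during / before if before else None,
    }


# Made-up numbers, for illustration only.
baseline = Period("4 weeks before trial", days=28, visible_vandal_edits=40)
trial = Period("8-week trial", days=56, visible_vandal_edits=16)
summary = compare(baseline, trial)
```

If the sample were all FAs, the same calculation could be run once over the whole class and once excluding TFAs, which would make the TFA effect mentioned above visible in the raw data.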

Name of usergroup

'Editor' doesn't make sense: all users are editors, and it doesn't describe the role at all (it also implies that others are not editors, huh). 'Sighter' is more explicit, but it has too strong a connotation of warfare. Surveyor has been used quite regularly, and if no better name is found, it sounds like an acceptable choice. Cenarium Talk 15:50, 29 November 2008 (UTC)

I agree that the implicit declaration of other users as non-editors is at best unhelpful. How about using 'reviewer' where we currently use 'editor' and 'surveyor' where we currently use 'reviewer'?? That seems more in keeping with the latter's role of oversight and moderation, while the former is responsible for actually reviewing pages. Happymelon 16:00, 29 November 2008 (UTC)
Sounds acceptable, for the trial at least. Cenarium Talk 23:14, 29 November 2008 (UTC)

Some comments

  1. Link to most recent version: In this trial, will there be prominent links to allow logged-out users to easily view the most-recent version if they wish, and to allow logged-in users to easily view the most-recent sighted version (or a diff of the two)? The wording of this trial page makes it sound as if maybe logged-out users won't be allowed to view the most-recent version, etc. I was going to edit in the phrase "by default", but that could be taken to mean something else, so I couldn't think of a good wording.
  2. Symbols in Recent Changes and on watchlists: if people can easily see which edits are unsighted, that will help a lot, I think.
  3. Wide level of trust: if people are worried about a big backlog, then give reviewer privileges to almost everybody. It will still catch vandalism by new users, i.e. most vandalism, I think.
  4. Nomenclature: we could label the link to the most recent sighted version "current", and to the newest (potentially unsighted) version "newest". This avoids implying that the "current" version is officially approved by highly trusted reviewers; in an optimally balanced system, some vandalism will still slip through. It also gives a more encouraging name than "draft" to the version someone has just edited. Um, by the way, if automatic flagging occurs, "sighted" wouldn't be accurate.
  5. Automatic flagging: if the software allows it, we could have three categories: 0, new edits; 1, automatically flagged after a period of time (5 minutes? 24 hours?); and 2, sighted. The most recent version with a non-zero flag would be displayed by default to non-logged-in users, but the difference in flags would still (if the software allows it) be evident in RC and watchlists to alert people that the edit may still need to be checked for vandalism. Coppertwig(talk) 21:08, 29 November 2008 (UTC)
I think 1, 2, 3 and 4 are good ideas. Especially 2! That would help with upkeep, making it obvious which edits to double-check and sight. --Falcorian (talk) 00:26, 30 November 2008 (UTC)
Some editors have proposed that we don't use 'delayed sighting' or similar systems, like expired revisions, for the trial, to see how huge the backlog will become. Anyway, it is likely that the extension won't be updated to include this kind of system by the time of the trial. I also think we need to work out ways to make this appear less like an approval process; it is not one, but rather a manual and possibly partially automatic verification process. As I said in the #backlog section, the post-edit message needs to be completely revamped. In the default extension, unsighted edits have an exclamation mark in recentchanges, watchlists... For the 'classification' of edits, we may use 'new', 'expired' and 'sighted'. On granting reviewer/surveyor rights, we should be careful in the trial since we won't know how the usergroup will differ from the trial version. Cenarium Talk 02:01, 30 November 2008 (UTC)
The right that's currently called 'reviewer' is totally different to the 'editor' right. In fact reviewers can't actually sight revisions! They have to be editors as well. Reviewers are solely responsible for organising and implementing trials, as such, they must be a very restricted group. I think you've misunderstood the permissions (or, equally confusingly, you've started using the new terminology we suggested above :S). For the other points, I suggest that you investigate the setup at http://en.labs.wikimedia.org where the full FlaggedRevs demo is running. Most pages display the current version by default; I've got sysop rights there so I've set Tablature to display the stable version. You can see the answers to your points 1 and 2 there. I'm not sure I follow your argument against the use of "draft"; how is it inaccurate? Since we're not using automatic sighting for this trial at least, I don't think your points 4b and 5 are valid. I agree that automatic sighting is a very useful concept that we might well consider using, but we seem to be getting a little caught up in a feature that's not actually available yet! Happymelon 09:55, 30 November 2008 (UTC)
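
A minimal sketch, in Python, of the three-category scheme from point 5 above, to make the display rule concrete. As noted in this thread, automatic flagging is not an existing FlaggedRevs feature, so every name here (`Revision`, `flag_level`, `default_revision`, the delay parameter) is hypothetical. The rule sketched is the one described: logged-out readers see the most recent revision whose flag is non-zero.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical flag levels from point 5: 0 = new, 1 = auto-flagged, 2 = sighted.
NEW, AUTO_FLAGGED, SIGHTED = 0, 1, 2


@dataclass
class Revision:
    rev_id: int
    age_seconds: int       # time elapsed since the edit was saved
    sighted: bool = False  # True once a user with the right has sighted it


def flag_level(rev: Revision, auto_flag_delay: int) -> int:
    """Classify a revision; manual sighting outranks the time-based flag."""
    if rev.sighted:
        return SIGHTED
    if rev.age_seconds >= auto_flag_delay:
        return AUTO_FLAGGED
    return NEW


def default_revision(history: List[Revision],
                     auto_flag_delay: int) -> Optional[Revision]:
    """Newest revision with a non-zero flag; history is ordered oldest first."""
    for rev in reversed(history):
        if flag_level(rev, auto_flag_delay) != NEW:
            return rev
    return None  # nothing flagged yet


history = [
    Revision(rev_id=1, age_seconds=100_000, sighted=True),
    Revision(rev_id=2, age_seconds=90_000),
    Revision(rev_id=3, age_seconds=60),
]
shown = default_revision(history, auto_flag_delay=86_400)  # 24-hour delay
```

With a 24-hour delay, revision 2 has aged past the threshold and is shown by default while revision 3 is still 'new'; keeping the level (1 versus 2) separate from the display rule is what would let RC and watchlists still mark revision 2 as unchecked.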

/Trial updated

I have spent a few hours today going over the proposal at WP:Flagged revisions/Trial and trying to corral it into a presentable form. I believe I have now done so. As discussion on this page seems to have dried up somewhat, I suggest that we consider if and how this proposal can be improved and, if not, how we intend to go about determining a consensus to implement it. This would be, I feel, a good way to proceed. Comments, please. Happymelon 15:35, 29 November 2008 (UTC)

There is one question that worries me. It is connected to RCPatrol. According to the manual, patrolling and autopatrolling are disabled in any namespace where Flagged Revisions are enabled. Therefore it will be possible to mark a page in the mainspace or portalspace as patrolled only by sighting it. However, Flagged Revisions will be enabled only on a limited number of pages, which means it will not be possible to patrol the vast majority of mainspace or portalspace pages. At least, that is my interpretation of the manual. Ruslik (talk) 16:26, 29 November 2008 (UTC)
I think you're right: I've looked at the source and, by installing FlaggedRevs, you add a hook that strips all users of their normal patrolling rights, so you can only use the FlaggedRevs' own autopatrolling features. That is, frankly, ridiculous, and shouldn't be necessary. I'll file a bug and maybe write a patch to make that configurable at least. It's entirely reasonable to want to have FlaggedRevs enabled on only a subset of articles, and still use RCPatrol for the rest of them. Happymelon 17:15, 29 November 2008 (UTC)
Filed as T18495 Happymelon 17:34, 29 November 2008 (UTC)
Thank you. Ruslik (talk) 17:42, 29 November 2008 (UTC)
Should be fixed in r44044, which will be live long before this implementation ever is. Pages where FlaggedRevs are not visible behave exactly like normal pages. Happymelon 19:24, 29 November 2008 (UTC)
You probably mean r44045. -- Jitse Niesen (talk) 19:56, 29 November 2008 (UTC)
Nah, Doxygen updates all the way! Lol... Happymelon 20:07, 29 November 2008 (UTC)
So $wgFlaggedRevsReviewForDefault should be set to false? Ruslik (talk) 20:18, 29 November 2008 (UTC)
Looks great. Rather than using all featured articles, I'd suggest instead having the community wiki-edit a list of articles that have had to be protected a lot recently due to vandalism.
This quote from Arbcom might possibly be useful: "...mitigate the harms caused by BLP violations while also reducing any negative impact created by the necessary enforcement measures themselves. The developers are urged to give priority attention to any needed software enhancements that may be needed to implement new features recommended by consensus of the community with respect to these matters." (link). Coppertwig(talk) 20:40, 29 November 2008 (UTC)
I think that's a good suggestion and I think the updated trial proposal is good to go. Cla68 (talk) 21:35, 29 November 2008 (UTC)
The updated proposal is very good—nice work, HM. Does everybody agree that it's time for a community vote, announced via watchlist notices perhaps, to gain approval for the trial? – Thomas H. Larsen 23:12, 30 November 2008 (UTC)
In doing such, it will be important to be clear that consensus is only being sought for the technical implementation itself (and hence the mandate to conduct trials) rather than any specific trial (ie FA/FP). I'm not sure how best to explain that; moving the FA/FP trial to another page would work, but it does provide a useful example of what is likely to be done with the implementation. Thoughts? Happymelon 23:36, 30 November 2008 (UTC)
I think it is wise to present two questions to the community: one about the technical implementation, and another about the first trial. Other trials can be discussed later. Ruslik (talk) 04:32, 1 December 2008 (UTC)