Wikipedia talk:Pending changes/Archive (Trial)


Reviewers promotion

DISCUSSION MOVED TO Wikipedia talk:Reviewers

Hope that's ok! AndrewRT(Talk) 23:24, 2 April 2009 (UTC)

Control groups

If this proposal is turned on, then it is important for those on both sides of the debate to gain useful information from the trial. The problem I see in establishing control groups is that the set of articles that will be flag-protected during the trial is not determined, and cannot be determined in advance. This will limit what we can learn from the trial.

I propose 2 sets of control groups, as follows:

  1. For articles where flagged protection is turned on during the 2 months, the same article prior to protection being turned on is the control. Obviously the protection needs to be applied for long enough to get useful data, say 4 weeks minimum.
  2. A blanket group of articles, selected by a category they appear in, or the letter they start with. A similarly selected group would be the control.

This second group needs to be large enough that bottlenecks in reviewing become apparent. Kevin (talk) 22:20, 29 March 2009 (UTC)

Sounds good. –Drilnoth (TC) 22:25, 29 March 2009 (UTC)
I object. An underlying premise of suggestion 1 above is that the behavior of an article will be fundamentally the same whether protection is on or off. This is false. Protection and semi-protection are, under the current proposal, going to be used only when an article runs into trouble of some sort: An edit war, recurring vandalism, and so on. Articles which do not have these problems are not supposed to be protected or semi-protected, so they are not supposed to be flag protected or semi-flag protected, either. If the above proposal is carried through, then we will be comparing articles in trouble to articles which are not in trouble. This is fundamentally unfair, and it will skew the results: For instance, if we turn on semi-flag protection for any article with vandalism problems, then it'll look like semi-flag protection causes vandalism. Obviously this is absurd!
The right thing to do is to compare the behavior of two randomly or semi-randomly chosen sets of articles. Choosing articles by category is possible, but articles move into and out of categories all the time as they are reclassified; and what would we do with an article in two categories, one chosen for flagging and another chosen not? A much simpler solution is choosing by initial letter, as Kevin suggests at the end and as I've suggested before. Article moves are much less common, so we can just leave out articles whose initial letter changes during the trial.
Again as before, I suggest that we separate articles according to which half of the alphabet they're in. A–M would be the flagged group, and all other articles would be in the non-flagged group. (Pages not in the article namespace are not covered by the trial, and admins can use whatever they feel is appropriate.) This would work as follows:
  • At the start of the trial, a bot would convert all semi-protected and protected articles in the range A–M into semi-flag protected and flag protected articles.
  • During the trial, the same bot would monitor all changes to protection status. An article in the range A–M that was semi-protected or protected would be converted to semi-flag protected or flag protected. Similarly, an article not in the range A–M that was semi-flag protected or flag protected would be converted to semi-protected or protected.
  • Admins would be instructed to use flagging for articles in the range A–M and non-flagging for everything else. The bot will fix any errors.
  • At the end of the trial, the bot would convert semi-flag protection and flag protection to semi-protection and protection.
Here our trial group would be articles beginning with A–M and our control group would be everything else. If people think that trial would be too large for a first step, then we can use the same procedure on smaller groups of articles: The trial group could be A–E and the control group could be F–J. Articles not in either group would be up to the discretion of admins.
The only issue I can think of is that there may be articles so sensitive that flag protection would be unwise. An example is the Main page. I realize that some people have talked about unprotecting the main page, but I think that's ridiculous. If I were a hardcore vandal, I'd jump through just about any hoop in order to vandalize the main page. Of course, under the proposal I've sketched above, the main page would be flag protected, so they'd need to get their vandalism approved by an admin; the danger is that an admin might flag vandalism by accident, whereas now that's not even possible. Thoughts? Ozob (talk) 16:07, 30 March 2009 (UTC)
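The letter-based split proposed above comes down to two pieces: a rule that assigns each article to a group, and the conversion step the bot applies whenever protection changes. The following is a minimal Python sketch, not an actual bot. The level names (`semi`, `full`, `semi-flag`, `flag`) and the `group_of` and `reconcile` helpers are hypothetical. It also assumes that titles starting with digits or non-ASCII letters fall outside A–M and therefore land in the control group. The optional `control` range covers the smaller variant (A–E trial, F–J control), where articles in neither range are left to admin discretion.

```python
from typing import Optional, Tuple

# Hypothetical names for the protection levels discussed above.
TO_FLAGGED = {"semi": "semi-flag", "full": "flag"}
TO_UNFLAGGED = {v: k for k, v in TO_FLAGGED.items()}

Range = Tuple[str, str]


def group_of(title: str, trial: Range = ("A", "M"),
             control: Optional[Range] = None) -> Optional[str]:
    """Return 'trial', 'control', or None (left to admin discretion).

    control=None means "every article outside the trial range".
    """
    initial = title[:1].upper()
    if trial[0] <= initial <= trial[1]:
        return "trial"
    if control is None or control[0] <= initial <= control[1]:
        return "control"
    return None


def reconcile(title: str, protection: Optional[str],
              trial: Range = ("A", "M"),
              control: Optional[Range] = None) -> Optional[str]:
    """Protection level the bot would leave on the article after its check."""
    if protection is None:
        return None  # unprotected articles are not touched
    group = group_of(title, trial, control)
    if group == "trial":
        return TO_FLAGGED.get(protection, protection)
    if group == "control":
        return TO_UNFLAGGED.get(protection, protection)
    return protection  # outside both groups: admins decide
```

Running `reconcile` on every protection change covers the "during the trial" bullet. The end-of-trial cleanup is simply `TO_UNFLAGGED` applied to every article, whatever its group.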
This sounds reasonable, too, although I think that extending it to templates would also be nice. I think that a lot of established template-editors would like to try their hands at the more highly-used and complicated ones, and that would be a nice thing to try during the trial. ({{Articleissues}}, for example). –Drilnoth (TC) 16:17, 30 March 2009 (UTC)
Templates would be interesting. I'm not sure whether it's appropriate for a trial because I don't know how common template vandalism is; and if we're not measuring vandalism, then what are we measuring? It seems to me that the primary benefit is opening things up to non-admins. Under my suggestion above, that's certainly possible: If most admins watching a template feel that flag protection would be an improvement, then it could be flag protected instead of fully protected. For most pages, I think flag protection is a better idea than full protection, so I'd be supportive (even though I've never edited a template and have no desire to). Ozob (talk) 16:35, 30 March 2009 (UTC)
For flagged protection, the ability to edit templates would be more of a measure of how many more constructive edits there are to them. Right now, non-admins can't edit templates like {{Article history}}, but if they could there might be a large number of constructive edits from template programmers. Only the smarter vandals will really think to edit templates, and those edits can be reverted by the reviewing admin, leaving the constructive edits that normally wouldn't be made in place to help improve the wiki. –Drilnoth (TC) 17:16, 30 March 2009 (UTC)
What I'm not sure about is how to measure constructive edits. As a rough guess, vandalism = reverted and constructive = not reverted; therefore, we want to measure non-reverted edits. Do you think quantity of edits is an appropriate metric? I realize that it's got flaws, but I don't know of any better ones. (My giant list at Wikipedia:Flagged_revisions/Trial/Proposed_trials#Possible metrics to measure success of trials does not help too much here, since it was mostly intended to measure whether flagged revisions discouraged vandalism and anonymous editing.) If everyone here is okay with using quantity of edits to measure success, then I think extending the trial to the template namespace may be a good idea.
There are only two other potential issues that I can think of. The first is that it may be too much too fast for some people; I'm okay with it myself, but I'd be willing to listen to anyone who wants to slow down. The second is that some people may interpret "article" in the proposal to mean "the article namespace" and not "pages", and they might therefore claim that the community does not support a trial on templates at this time. I think this is silly, but the objection might arise, so we should be aware of it. Ozob (talk) 20:55, 30 March 2009 (UTC)
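The rough metric above (reverted = vandalism, not reverted = constructive) still needs some way to decide which edits were reverted. One common heuristic is the "identity revert": if a revision's content is identical to an earlier revision, everything in between is treated as reverted. This heuristic is an assumption here, not something agreed in this discussion. The minimal sketch below works on one page's revisions given as content hashes, oldest first. How those hashes are obtained is left open.

```python
from typing import Dict, List


def reverted_flags(hashes: List[str]) -> List[bool]:
    """Flag revisions undone by an identity revert.

    hashes[i] is a content hash of revision i (oldest first). When revision j
    has the same content as an earlier revision i, every revision strictly
    between i and j is marked as reverted.
    """
    flags = [False] * len(hashes)
    last_seen: Dict[str, int] = {}  # hash -> latest index with that content
    for j, h in enumerate(hashes):
        if h in last_seen:
            for k in range(last_seen[h] + 1, j):
                flags[k] = True
        last_seen[h] = j
    return flags


def constructive_count(hashes: List[str]) -> int:
    """Number of edits that were never reverted (the proposed success metric)."""
    return sum(not reverted for reverted in reverted_flags(hashes))


# "b" is vandalism undone by restoring "a"; "c" is a later, kept edit.
print(reverted_flags(["a", "b", "a", "c"]))  # [False, True, False, False]
```

The heuristic misses partial reverts and undos that also change other text, so it likely undercounts reverted edits.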
@Ozob, above. I thought the point of this proposal was to see if articles that are semi-flag protected will still have valuable IP edits while keeping the vandalism from open view. So my point was that an article that qualifies for protection by suffering a period of excessive vandalism can be semi-flag protected, and then the activity can be compared between protected and non-protected periods. Seems to be a valid comparison to me. Kevin (talk) 02:44, 31 March 2009 (UTC)
I was hoping to capture not only that information but also the effect of flagging on vandalism. Most people seem to believe that flagging will fight vandalism. But we don't even know if that's true. Even if it is true, we don't have any idea how effective it is because we have no data. I consider this the most important thing to measure. It might be that the proposal makes vandal fighting hugely easier, or it might be that it's just a bureaucratic mess. The community's willingness to turn it on indefinitely will depend a lot on how effective it appears to be.
If we restrict only to your concern, trying to understand how flagging interacts with IP editing, then I think your suggestion is okay. I can't think of any scenarios where it would distort IP editing in some way that ordinary semi-protection wouldn't already. Ozob (talk) 17:40, 31 March 2009 (UTC)

The situation is a bit more complicated, because there are cases where semi flag protection wouldn't be appropriate. We need to apply both in a sensible manner. Articles receiving very high levels of vandalism should be semi-protected; it would be counter-productive to use semi flag protection for them: most edits would be vandalism and reverts, which would increase the reviewing backlog, the work of reverting vandalism, etc. Semi flag protection would be well suited to articles with little editing activity, but not to those with a high volume of edits. We could try, of course, but we should be ready to semi-protect if needed. Barack Obama, for example, and other heavily vandalized articles would probably have to be semi-protected, and maybe additionally semi flag protected with (auto)review by non-reviewers disabled. I think it's crucial to have a trial with good results in terms of backlog, and that it should reflect the way we're going to use flag protection in the future. If we use massive, strict control groups, I'm pretty sure the trial will fail. As for full protection, we should try full flag protection whenever we have the chance, since there are not that many disputes. Cenarium (talk) 17:32, 31 March 2009 (UTC)

As a matter of statistics, we can separate frequently edited articles from infrequently edited articles very easily: Just count how many edits they get. So the issues seem to be:
  1. For the purposes of the trial, do we ever want to force admins to use semi-flag protection instead of ordinary semi-protection, even when they would prefer otherwise?
  2. If so, in what circumstances?
Above I was suggesting that the answers were "Yes" and "A–M". But you're right that this is probably suboptimal: An article may be semi-protected now to prevent IP vandalism, and while semi-flag protection would prevent that vandalism from going live, it would still need to be reverted.
It seems that we really have two competing ideas for a trial here:
  1. Fix a trial group and a control group at the start, and keep articles in those groups no matter what. These could be large groups like I proposed above, or smaller ones such as A–E and F–J. The advantage is that you could get interesting data comparing semi-protection to semi-flag protection. The disadvantage is that the backlog issue is confused because articles beginning with A–J will need more or less reviewing than in normal circumstances.
  2. Allow admins to flag articles according to how they would be flagged in practice. There would be a transition time while articles were switched (the length of this will depend on how much effort people put in to switching), and the advantage is that after that we would get an accurate picture of backlogs. The disadvantage is that we would get very little data comparing semi-protection and flag semi-protection.
I can see two ways forward here. One is to run one of these trials for two months; then we have to pick which kind of trial and why. The other is to run two consecutive trials, each for a month; then we have to fill in the details for both trials. Ozob (talk) 21:45, 31 March 2009 (UTC)

We don't need any control groups. We are not trying to find out how flagged protection works, and the trial will never be blind anyway. As I see it, the purpose of a trial is to let people try it pout in practice and see how they accept it. --Apoc2400 (talk) 19:26, 1 April 2009 (UTC)

I think they need to try it out and then they can pout if they don't like, they can't try it pout. :) –Drilnoth (TC) 19:27, 1 April 2009 (UTC)
If you don't measure the results of the trial, how do you know the system is any good? I mean that as a serious question! After the trial, there will probably be a poll to turn on the system indefinitely. Some people will oppose it: They'll say it does more harm than good. How will you counter that? How do you know that they're not right? If you tell them, "I didn't think it was important to see how the two systems compared, but I still want flagging turned on," then that's as good as admitting you're a member of the WP:CABAL. (And remember, we're not supposed to do that! Oops.) Ozob (talk) 18:31, 3 April 2009 (UTC)

Observation affects the experiment

One major concern that I have about organizing control groups and such is that it will affect the trial in complex ways that can't be controlled for. Two simple examples:

  1. If a particular subset of articles is used, the smaller subset may not accurately reflect flagging backlog.
  2. If consensus describes a particular method for applying flagged protection, admins will not flag-protect articles naturally; this live application is relevant to the proposal.

While I respect efforts to measure the success of the trial accurately, I'm not sure that this is the right path to a useful analysis. {{Nihiltres|talk|log}} 13:24, 3 April 2009 (UTC)

You're right; but we have to do something. I'm leaning towards having two trials. One would flag articles by first letter and would get data on vandalism and IP editing, and the second would tell admins "flag as you would in practice" and would get backlog data. Do you agree that would solve this issue? Ozob (talk) 18:56, 3 April 2009 (UTC)
While I'm concerned that a (short) one-month timespan for each sub-trial might also end up distorting the results (especially for the "flag as in practice" test), that seems like a reasonable compromise. {{Nihiltres|talk|log}} 06:54, 4 April 2009 (UTC)

Talk pages

As a check against manipulation by outside groups, and to preserve the free flow of ideas, I think it is important to say explicitly that Talk: pages should never be subject to anything but passive patrolling, even for BLPs. Because Talk: pages are not currently subject to full or partial protection, I think there should be broad consensus for this. Wnt (talk) 17:10, 19 January 2010 (UTC)

The existing policy on this isn't changing, but it makes sense to be explicit. --Elvey (talk) 23:17, 8 February 2010 (UTC)
The trial is limited to articles, and I don't think that the FlaggedRevs extension allows flagging talk pages anyway. Cenarium (talk) 17:02, 27 March 2010 (UTC)
I believe that is correct. On the testing site flagged protection can only be enabled in namespace 0 (the article namespace), and I assume that's how it will work here as well. Reach Out to the Truth 19:41, 27 March 2010 (UTC)

Range of articles we need to test this on

Ideally we need to test this on:

  • Articles under full protection
  • Articles under semi protection
  • Articles under neither but that we suspect will have BLP problems (keep an eye on the news to generate this list).
  • Control groups.

Genisock2 (talk) 17:19, 8 May 2010 (UTC)

That looks like a reasonable list Genisock2. Could you work that into the text of Wikipedia:Flagged protection and patrolled revisions/Trial? -- RobLa (talk) 19:02, 9 May 2010 (UTC)
Keep in mind that as established in the consensus found for this trial we should respect the protection policy in choosing where to use flagged protection. Preemptive protection for example is a no go (except in truly exceptional instances). Cenarium (talk) 00:34, 14 May 2010 (UTC)

Simplified protection levels for the trial

After a few private conversations and trying to get my head around the proposed levels and policies, it looks to me like there are some pretty complicated extra user access levels being proposed, which seem a bit much given the level of experience this community has with the feature. So, I've proposed a simpler set of access levels for the trial, with the idea that none of this is permanent, and we'll be able to add in a new layer should a consensus emerge that a new access level is necessary to go forward. -- RobLa (talk) 20:15, 10 May 2010 (UTC)

Categories for edits needing review

How are we planning on listing edits needing review? Can we list them by WikiProject? I would be primarily interested in reviewing medical edits and less in reviewing edits in general. Also, how will it be decided what pages get flagged? Will this be left to the discretion of the WikiProjects within certain guidelines?--Doc James (talk · contribs · email) 04:44, 29 May 2010 (UTC)

Success criteria?

We have 4 days to go and don't yet know what the criteria for measuring success of the trial are? Rich Farmbrough, 18:20, 11 June 2010 (UTC).

Hopefully we can measure a bunch of stuff such as:
  • Time between edit and review
  • Volume of pending changes
  • Change in vandalism frequency pre- and post-implementation on pages not previously protected
  • Number of suggestions on pages which were previously semi protected
We have this data from before "42% of vandalism is repaired within one viewing and 11% is still present after 100 viewings."[1] Doc James (talk · contribs · email) 18:48, 11 June 2010 (UTC)
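The first two metrics in this list can be computed from (edit time, review time) pairs. The minimal sketch below assumes such pairs can be extracted from the review log. The input shape is hypothetical, and `None` stands for an edit still awaiting review. The 5-minute threshold is only an example, taken from the figure suggested in the bot discussion. The timestamps in the usage lines are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional, Tuple

Pair = Tuple[datetime, Optional[datetime]]  # (edit time, review time or None)


def review_delays(pairs: List[Pair]) -> List[float]:
    """Minutes from each edit to its review; unreviewed edits are skipped."""
    return [(r - e).total_seconds() / 60 for e, r in pairs if r is not None]


def summarize(pairs: List[Pair], threshold_min: float = 5.0) -> dict:
    """Median review delay, share reviewed within the threshold, pending count."""
    delays = review_delays(pairs)
    return {
        "median_minutes": median(delays) if delays else None,
        "share_within_threshold": (
            sum(d <= threshold_min for d in delays) / len(delays) if delays else None
        ),
        "still_pending": sum(1 for _, r in pairs if r is None),
    }


t0 = datetime(2010, 6, 15, 12, 0)
sample = [
    (t0, t0 + timedelta(minutes=2)),
    (t0, t0 + timedelta(minutes=10)),
    (t0, t0 + timedelta(minutes=4)),
    (t0, None),
]
print(summarize(sample))
# {'median_minutes': 4.0, 'share_within_threshold': 0.6666666666666666, 'still_pending': 1}
```

The median is used rather than the mean so that a few edits left unreviewed overnight do not dominate the figure.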

Testing pages

We'll have pending changes available in the WP namespace so we can do testing; we can use subpages of Wikipedia:Pending changes/Testing, e.g. Wikipedia:Pending changes/Testing/1, Wikipedia:Pending changes/Testing/2, etc. There should be some content on them; we could copy a few articles (uncontroversial ones, like ancient history). Cenarium (talk) 23:30, 14 June 2010 (UTC)

A stupid question, I realize, but I'll ask anyway: wouldn't there be a problem with attribution? Salvio ( Let's talk 'bout it!) 13:23, 15 June 2010 (UTC)
Not if you say where it comes from. At http://flaggedrevs.labs.wikimedia.org/wiki/Main_Page the pages are articles imported from enwiki. Cenarium (talk) 15:24, 15 June 2010 (UTC)
I had anticipated it was a stupid question... Thanks for your answer, however. ;) Salvio ( Let's talk 'bout it!) 17:15, 15 June 2010 (UTC)

Entering in a reason

It queries you for a reason whether you accept or decline an edit. Where are those summaries stored and how does one access them? Mkdwtalk 06:53, 16 June 2010 (UTC)

I found them in the page logs, but to access them you have to go to view history -> logs -> review logs. That's quite a few steps for something people will surely want readily available. Perhaps the review summary could show up on mouse-over, or be written beside the accepted tag. Mkdwtalk 06:58, 16 June 2010 (UTC)

Closure

How should the trial end? It had been decided that at the end of the (two-month) trial, the community should decide whether to continue the implementation or not, with consensus needed to continue it. So a discussion should happen at or before the end of the trial, but it hasn't been decided what to do with the implementation pending the conclusion of the discussions, i.e. whether it should be entirely deactivated, put on hold, or still used.

Comments
I guess I'll start by brainstorming some possible outcomes, and then ask for the right metrics on those outcomes:
Positive outcomes
  • Previously semi-protected articles receive a substantial number of accepted edits from anonymous and new users
  • A substantial number of malign/poor edits from autoconfirmed non-reviewers are reverted on review of previously semi-protected articles, thus never becoming visible
  • Good edits to pages under 'pending changes' are accepted quickly
Negative outcomes
  • Previously semi-protected articles receive fewer edits from autoconfirmed, non-reviewers.
  • Good edits to pages under 'pending changes' are accepted slowly
  • Editors who make their first edits on pages under 'pending changes' are less inclined to stick around
The correct thresholds are hard to define in advance, and I have no idea how hard the metrics are to come up with (I'll ask around). The point is to come up with metrics that illustrate the most important benefits and risks of this new feature. It's virtually certain we're not going to be able to pull together all of the data we want, so we need to quickly narrow down the subset of most important information that is realistic to gather.
As for figuring out the consensus, we could conduct a !vote with two separate questions:
  • Should 'pending changes', in its current form, be kept or removed from English Wikipedia (keep/remove)?
  • Should work on future versions of 'pending changes' continue, or stop for now (continue/stop)?
Possible outcomes:
  • keep/continue - this means the current configuration will remain on, and work to refine the implementation and configuration will continue
  • keep/stop - this means we keep the current implementation and configuration, and largely stop futzing with it for a while
  • remove/continue - this means we should take the current iteration down, and work to refine the implementation and configuration for a future trial
  • remove/stop - this means we remove the current implementation, and stop work on this for now
Does this sound like the right framework for having an informed discussion about retaining/removing 'pending changes' at the end of the trial? -- RobLa (talk) 20:48, 6 June 2010 (UTC)
Yes, it looks good. But, assuming the trial ends circa August 15, we'll need to have some discussion before the vote, say 1 week, and the vote itself should stay open for at least 2 weeks, so it'll take at least 3 weeks to reach a conclusion. Until we reach the conclusion, what should we do with the implementation? Should it be frozen (no new articles under PC), or entirely deactivated? Cenarium (talk) 02:39, 13 June 2010 (UTC)
The trial IMO should be left running while we decide whether or not to implement it long term. Turning things on and off gets confusing. Doc James (talk · contribs · email) 22:33, 13 June 2010 (UTC)
Good edits to pages under 'pending changes' are accepted quickly
I'm not sure that the trial actually tests this, as there is going to be an absolute limit to the pages that can be protected in this fashion which is probably substantially lower than the number of pages likely to be so protected were the scheme to be approved. Espresso Addict (talk) 01:34, 14 June 2010 (UTC)
Agreed; the ratio of reviewers to pages is something like 100 to 1 right now. Until we have a critical mass of protected pages, there's no way to tell how our response time will be. — Carl (CBM · talk) 14:31, 16 June 2010 (UTC)

Bots

Bots could be helpful for the trial. I'm thinking of the following tasks, if they are possible:

  1. reporting edits by autoconfirmed users who are not reviewers to level-1 pending-changes-protected pages with pending edits
  2. reporting pending edits not accepted for more than n minutes, where n is fixed depending on the average backlog
  3. holding as many statistics as possible that can't be held by the software itself
  4. listing all autoconfirmed users who edit pages protected with pending changes (not only when they have pending edits).

Cenarium (talk) 08:38, 14 June 2010 (UTC)

How are you thinking of "reporting" these things, on an IRC channel or in some kind of list/category on a Wikipedia page? - EdoDodo talk 19:00, 14 June 2010 (UTC)
On a Wikipedia talk page or a noticeboard, the bot would report the edits under a fixed section, in a way similar to WP:AIV. Cenarium (talk) 21:26, 14 June 2010 (UTC)
In my opinion, that's a very good idea! Salvio ( Let's talk 'bout it!) 22:28, 14 June 2010 (UTC)
Well, they could also be reported in an IRC channel such as the one for pending changes. Cenarium (talk) 03:48, 17 June 2010 (UTC)

We'd also need to know the precise number of articles under pending changes. Cenarium (talk) 23:19, 14 June 2010 (UTC)

So we can control the number of articles under PC, the bot would just update a template with the current number. Cenarium (talk) 17:41, 15 June 2010 (UTC)
I could probably manage (3) if you were to give me more information on what you need. It seems to me to also be the most pressing. I'd leave (1) and (2) for others with more time/experience. - Jarry1250 [Humorous? Discuss.] 18:16, 15 June 2010 (UTC)

How is this data available? Can we get it from the API? — Carl (CBM · talk) 19:25, 15 June 2010 (UTC)

The extension is described at mw:Extension:FlaggedRevs, it mentions API. Cenarium (talk) 13:59, 16 June 2010 (UTC)
I spent a while looking at the API yesterday. It has some output, but not everything that might be desired.
I'm not really sure what bot tasks you are looking for here. There is already a collection of special pages that seem to list everything, and some of the things you suggest sound like they would be better as special pages (with dynamic updating).
What bot tasks, exactly, are you thinking of? — Carl (CBM · talk) 14:27, 16 June 2010 (UTC)
  1. It's important to know if there are edits by autoconfirmed users who are not reviewers to level-1 pending-changes-protected pages with pending edits, so I wonder if a bot could detect them and report them. I'm not familiar with the API and how bots work; it seems the API allows access to the pages listed at Special:Oldreviewedpages, see mw:Extension:FlaggedRevs#list_.3D_oldreviewedpages, so it may be able to detect each new edit to one of those pages by non-reviewer autoconfirmed users?
  2. It's also important to know if there is a backlog of pending changes. Could a bot warn if there is a pending edit which has not been reviewed for more than x minutes (say 5 for now), also using Special:Oldreviewedpages (by checking the timestamp for the latest accepted revisions)?
  3. Also requested: a bot which updates a template with the number of pages under pending changes, which can be retrieved from Special:StablePages. Cenarium (talk) 03:48, 17 June 2010 (UTC)
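Task 1 above boils down to a check over a page's revisions since its last accepted version. Any revision after that point is pending, and an autoconfirmed non-reviewer who edits while one is already pending has their edit delayed, which is the situation Cenarium describes further down. The minimal sketch below assumes the bot already has each revision's ID and the editor's user groups at edit time. The `Revision` type, the group names, and treating `sysop` as a reviewer are all assumptions, not FlaggedRevs API output.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Assumption: members of these groups can accept pending changes.
REVIEWER_GROUPS = {"reviewer", "sysop"}


@dataclass
class Revision:
    rev_id: int
    user: str
    groups: Set[str] = field(default_factory=set)  # user's groups when editing


def delayed_autoconfirmed_edits(revisions: List[Revision],
                                stable_rev_id: int) -> List[Revision]:
    """Edits by autoconfirmed non-reviewers made while an earlier edit was pending.

    revisions must be ordered oldest first; stable_rev_id is the latest
    accepted revision, so every later revision is still pending.
    """
    hits = []
    pending_seen = False
    for rev in revisions:
        if rev.rev_id <= stable_rev_id:
            continue  # already accepted (revision IDs only increase)
        autoconfirmed = "autoconfirmed" in rev.groups
        reviewer = bool(rev.groups & REVIEWER_GROUPS)
        if pending_seen and autoconfirmed and not reviewer:
            hits.append(rev)
        pending_seen = True
    return hits
```

Each hit could then be posted under the fixed noticeboard section suggested above, in the style of WP:AIV.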
All of these would be better if they were implemented directly in mediawiki, rather than done as a bot. For #1, I don't understand the difference between what you are saying and Special:Oldreviewedpages.
For #2, the problem is that if the bot posts a message "there are old edits" and someone reviews them, the bot's message is instantly outdated. Even if the bot ran once per minute, many people would see the message. So it would be better to just have a special page that shows the pages in order of "oldest unreviewed revision".
For #3, it is just a matter of adding a new magic word to the system. — Carl (CBM · talk) 11:12, 17 June 2010 (UTC)
There is a bug filed (feature request) for #3 at bugzilla:23903. — Carl (CBM · talk) 11:39, 17 June 2010 (UTC)
Special:Oldreviewedpages lists pages with pending edits. What we need to gather are edits made by autoconfirmed users who are not reviewers to pages with pending edits. Because on semi-protected pages, autoconfirmed users would be able to edit without restriction, but if they're editing a PC-protected page while the latest edit was by an unregistered or new user and not yet accepted, their edit will be delayed. So we need to be aware of such events.
I suppose the bot could check if there are several old edits, for example more than 5 edits older than 5 minutes. For now it's not going to be a problem; I rarely see a page listed at Special:Oldreviewedpages, but once we have more pages under PC, backlogs may happen occasionally, so we'll need to attract the attention of reviewers. Cenarium (talk) 23:17, 17 June 2010 (UTC)
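Carl's suggested ordering (pages sorted by their oldest unreviewed revision) and Cenarium's alert rule (more than 5 pending edits older than 5 minutes) are both simple computations over the list of pending edits, whether they end up in a special page or in a bot. The minimal sketch below assumes the pending edits are available as (page title, edit timestamp) pairs. The function names and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

PendingEdit = Tuple[str, datetime]  # (page title, time of the pending edit)


def oldest_first(pending: List[PendingEdit]) -> List[str]:
    """Page titles ordered by their oldest unreviewed edit."""
    oldest: Dict[str, datetime] = {}
    for title, ts in pending:
        if title not in oldest or ts < oldest[title]:
            oldest[title] = ts
    return sorted(oldest, key=lambda title: oldest[title])


def backlog_alert(pending: List[PendingEdit], now: datetime,
                  max_age: timedelta = timedelta(minutes=5),
                  max_count: int = 5) -> bool:
    """True when more than max_count pending edits are older than max_age."""
    stale = sum(1 for _, ts in pending if now - ts > max_age)
    return stale > max_count


now = datetime(2010, 6, 17, 23, 0)
pending = [
    ("Foo", now - timedelta(minutes=12)),
    ("Bar", now - timedelta(minutes=30)),
    ("Foo", now - timedelta(minutes=1)),
]
print(oldest_first(pending))        # ['Bar', 'Foo']
print(backlog_alert(pending, now))  # False (only 2 stale edits)
```

Recomputing this whenever the list is viewed, as a special page would, avoids the stale-message problem Carl raises with a bot that posts periodically.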

Bug

Reverting changes with Twinkle doesn't work. --MrStalker (talk) 16:40, 17 June 2010 (UTC)

Pending changes for the TFA

I think that we should include each day's Featured article, for that day, for some of the trial; I think this may be a way to keep the Featured Article from being vandalized so much. עוד מישהו Od Mishehu 13:19, 20 June 2010 (UTC)

Strong oppose - it might be worth consideration, in a few weeks, but it'll take a lot of discussion - yes, I'm prepared to discuss it, I wrote 'strong oppose' to make it clear that careful consideration is required. We've had lots and lots of debates over whether or not TFA should be 'normally' protected, and the same arguments can be used for pending.  Chzz  ►  13:49, 20 June 2010 (UTC)
  1. ^ Priedhorsky, R.; Chen, J.; Lam, S. K.; Panciera, K.; Terveen, L.; Riedl, J. (2007). Creating, Destroying, and Restoring Value in Wikipedia. In Proceedings of the 2007 International ACM Conference on Supporting Group Work (GROUP '07), November 4–7, 2007, Sanibel Island, Florida, USA. ACM Press, pp. 259–268. ISBN 9781595938459. DOI 10.1145/1316624.1316663. URL: http://portal.acm.org/citation.cfm?id=1316624.1316663