Wikipedia:Wikipedia Signpost/Single/2011-10-31

The Signpost
Single-page Edition
31 October 2011

 

2011-10-31

The monster under the rug

Sven Manguard has been editing Wikipedia for just over a year. He works primarily in the File namespace, but also participates in backlog eliminations and other gnomish tasks. Below, Sven makes a personal plea to the community, asking editors to become more involved in eliminating backlogs. The author would like to thank editors ThatPeskyCommoner, Ironholds, and Fox for offering their support and advice in the creation of this essay.

The views expressed are those of the author only. Responses and critical commentary are invited in the comments section. The Signpost welcomes proposals for op-eds. If you have one in mind, please leave a message at the opinion desk.


The task of encyclopaedia cultivation generates vast amounts of paperwork that, if left unaddressed by volunteers, accumulate into enormous backlogs.

Whatever people may say about declining participation, Wikipedia still generates a lot of new content. We add articles and upload dozens upon dozens of files every day, and that is unquestionably a good thing. However, as a community, we tend to neglect a large variety of problems that have cropped up in older articles. We sweep them under the rug, so to speak, and that is unquestionably a very bad thing.

The fact of the matter is that Wikipedia has swept so many problems under the rug that we now have a monster on our hands. We have backlogs that are in the hundreds, in the thousands, and in a few cases, in the hundreds of thousands, that have sat relatively untackled for months or years. These aren't petty issues either. There are 250,000 articles that need references. By that, I don’t mean that they need more references; I mean that there are, at last count, a quarter million articles that do not have a single citation to support them, and those are just the articles that are tagged as such. Some of these completely unreferenced articles were tagged as far back as October 2006, half a decade ago. There are an additional 250,000 articles that need additional references, and over 200,000 with unsourced statements. Less absurdly high in count but just as important, there are almost 10,000 articles tagged as containing original research, over 8,500 with disputed neutrality, and over 5,500 with disputed accuracy. I am cherry-picking especially important issues with especially high numbers, yes, but there are about two dozen other content-related backlogs with over a thousand items in them (listed at the Wikipedia Contribution Team’s backlog dashboard) that are not listed here.

What am I trying to say by listing all of these massive backlogs? I am saying that we, as a community, are failing our readers. People come to Wikipedia, for the most part, expecting accurate, neutral, well written articles. In almost a million cases, we cannot with a straight face vouch for the accuracy of the articles we're presenting. It is depressing, it is unacceptable, and unless the community, or significant portions of it, works to tackle these backlogs, the problem will only get worse.

There are a number of factors to blame for this problem. There was a time when ignorance of the problem was a valid claim, but considering the number of times that one backlog or another has been mentioned in a prominent location, I no longer believe ignorance is a passable excuse. Instead, I believe it comes down to our culture. Working in backlogs certainly isn't glamorous, but more importantly, I don't perceive the community as regarding it as especially commendable or even especially valuable. It seems rather rare that a candidate for RfA puts forth their nomination by leading off their credentials with something like "I have spent the last six months clearing out the backlog at Category:Articles that need to differentiate between fact and fiction" (a category with over 3,500 items, by the way). Even worse, I can point to a few cases where someone did put forth backlog work as a credential, only to have it implicitly or explicitly disregarded by people who only seemed to focus on whether the nominee had written "enough" articles or had "enough" good and featured articles. Simply put, until the community decides that working on backlogs is a valuable activity, and shows it not only at RfA, but also in discussions and everyday community interaction, not enough people are going to jump in and start working on clearing backlogs.

This is not to say that no one values backlog work. There are a few groups of editors dedicated to working on clearing out particularly important backlogs. The Guild of Copy Editors and WikiProject Wikify deserve a tremendous amount of respect in particular for keeping the backlogs at Category:Wikipedia articles needing copy edit and Category:Articles that need to be wikified low; by doing so they ensure a great many articles are a great deal more readable than they otherwise would have been. In the area of files, which happens to be where I spend a majority of my time, backlogs are kept low by a combination of exceedingly useful bots, a few organized drives (such as WikiProject Images and Media's recently concluded Move to Commons drive), and a small handful of editors who devote large amounts of time to working with files.

It is, of course, not enough. This brings me to the primary motivation behind my decision to write this opinion piece:

I am asking, no, begging, everyone that reads this piece to go to this page, select a backlog that they think they can help out with, and knock off a few items. Spend an hour on it, devote ten minutes to backlogs once or twice a week, or do whatever else works for you. It doesn't have to take up a lot of time. If you want, show me a few diffs and I'll give you a barnstar; I'd be happy to. If 1,000 people read this, and each of them clears ten items this month, that’s 10,000 items. If everyone does ten items a month for an entire year, 120,000 items will have been cleared. Even distributed among two dozen or more backlogs, that is a formidable number.

I wouldn't go as far as to beg random strangers to do this if I weren't absolutely convinced that this was of vital importance, but here I am begging for all to see. I also wouldn't ask this of the community if I didn't think it were possible to make a noticeable difference. Recently I cleared a 1,500 item backlog in just a month, with the assistance of one other editor. The two of us, in weeks, took out a backlog that had sat untouched for years, and that specific backlog will never come back. While we'll never be able to eliminate maintenance tasks, it is possible to eliminate the massive backlogs that we have now, and return the number of pending cleanup tasks to a reasonable, functional, level. All it takes is work — and editors willing to do that work. Please join me in the coming months. Together we can defeat the monster under the rug.


2011-10-31

WikiSym; predicting editor survival; drug information found lacking; RfAs and trust; Wikipedia's search engine ranking justified

Wiki research beyond the English Wikipedia at WikiSym

Panel discussion at WikiSym 2011

WikiSym 2011, the "7th international symposium on wikis and open collaboration", took place from October 3–5 at the Microsoft Research Campus in Silicon Valley (Mountain View, California). Although the conference's scope has broadened to include the study of open online collaborations that are not wiki-based, Wikipedia-related research still took up a large part of the schedule. Several of the conference papers have already been reviewed in the September and August issues of this research overview, and the rest of the proceedings have since become available online.

The workshop "WikiLit: Collecting the Wiki and Wikipedia Literature"[1], led by Phoebe Ayers and Reid Priedhorsky, explored the daunting task of collecting the scholarly literature pertaining to Wikipedia and wikis generally. Research about wikis can be difficult to find, since there are papers published in many fields (from sociology to computer science) and in many formats, from published articles to on-wiki community documents. There have been several attempts over the years to collect the wiki and Wikipedia literature, including on Wikipedia itself, but all such projects have suffered from not keeping up to date with the sheer volume of research that is published every year. While the workshop did not reach consensus on what platform to proceed with to build a sustainable system, there was agreement that this is an important topic for the research and practitioner community, and the group developed a list of requirements that such a system should have. The workshop followed and extended discussions on the wiki-research-l mailing list earlier this year on the topic.

In a panel titled "Apples to Oranges?: Comparing across studies of open collaboration/peer production",[2] six US-based scholars reviewed the state of this field of research. Among the takeaways were a call to study failed collaboration projects more often instead of focusing research on successful "anomalies" like Wikipedia, and – especially in the case of Wikipedia – to broaden research to non-English projects.

Another workshop, titled "Lessons from the classroom: successful techniques for teaching wikis using Wikipedia"[3] was a retrospective on the Wikimedia Foundation's Public Policy Initiative.

Among the conference papers not mentioned before in this newsletter are:

  • "Mentoring in Wikipedia: a clash of cultures" [4], a paper which "draw[s] insights from the offline mentoring literature to analyze mentoring practices in Wikipedia and how they influence editor behaviors. Our quantitative analysis of the Adopt-a-user program shows mixed success of the program".
  • "Vandalism Detection in Wikipedia: A High-Performing, Feature–Rich Model and its Reduction"[5] – arguing that on Wikipedia "human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity", the authors present a vandalism classifier constructed using machine learning techniques.

Wikipedia-related posters included

  • "A scourge to the pillar of neutrality: a WikiProject fighting systemic bias"[6] presenting preliminary findings from an ongoing survey and interviews among members of the WikiProject Countering systemic bias.
  • Another poster presentation planned to analyze the contributions of the members of this WikiProject to see what kind of systemic bias they might exhibit themselves ("Places on the map and in the cloud: representations of locality and geography in Wikipedia"[7]).
  • "Participation in Wikipedia's article deletion processes"[8] found that "the deletion process is heavily frequented by a relatively small number of longstanding users" and that "the vast majority of [speedily] deleted articles are not spam, vandalism, or 'patent nonsense', but rather articles which could be considered encyclopedic, but do not fit the project's standards".
  • "Exploring underproduction in Wikipedia"[9] examined "two key circumstances in which collective production can fail to respond to social need: when goods fail to attain high quality despite (1) high demand or (2) explicit designation by producers as highly important".

Quality of drug information in Wikipedia

Lovastatin, the first statin to be marketed.

A study of five widely prescribed statins, entitled "Accuracy and completeness of drug information in Wikipedia: an assessment"[10] and published in this month's issue of the Journal of the Medical Library Association, found that while the Wikipedia articles on these drugs are generally accurate, they are incomplete and inconsistent. The study's authors conclude:

The study's main criticism is that most of the articles lacked sufficient information on adverse effects, contraindications, and drug interactions, and that this lack of information might harm the consumer. These criticisms echo earlier ones (two similar studies reported in the Signpost: "Pharmacological study criticizes reliability of Wikipedia articles about the top 20 drugs", "Wikipedia drug coverage compared to Medscape, found wanting"). However, the authors did note the benefit of Wikipedia's hypertext links to additional information, which most other web sources on drug information lack, and noted in addition that all the Wikipedia articles contained references to peer-reviewed journals and other reliable sources. Hence, overall, the latest study is somewhat more positive than the earlier two.

Predicting editor survival: The winners of the Wikipedia Participation Challenge

Banner of the Wikipedia Participation Challenge

The Wikimedia Foundation announced the winner of the Wikipedia Participation Challenge. The data competition, organized in partnership with Kaggle and the 2011 IEEE International Conference on Data Mining, asked data scientists to use Wikipedia editor data and develop an algorithm to predict the number of future edits, and in particular one that correctly predicts who will stop editing and who will continue to edit (see the call for submissions). The response was overwhelming, with 96 participating teams, comprising in total 193 people who jointly submitted 1029 entries (listed in the competition's leaderboard).

The brothers Ben and Fridolin Roth (from team prognoZit) developed the winning algorithm. They developed a linear regression model using Python and GNU Octave. The algorithm used 13 features (2 based on reverts and 11 based on past editing behavior) to predict future editing activity. Both the source code and a description of the algorithm are available. Unfortunately, because it relied on patterns in the training dataset that would not be present in the actual one, the model's ongoing use is severely restricted.

Second place went to Keith Herring. Submitting only 3 entries, he developed a highly accurate model, using random forests, and utilizing a total of 206 features. His model shows that a randomly selected Wikipedia editor who has been active in the past year has approximately an 85 percent probability of becoming inactive (no new edits) in the following 5 months. The most informative features captured both the edit timing and volume of an editor's activity.
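To make the quantity behind the 85 percent figure concrete, here is a minimal Python sketch of how such an inactivity rate could be computed from edit dates. It is not Herring's model: the function name, the data layout (a dictionary mapping editor names to edit dates) and the sample dates are all assumptions made for illustration.

```python
from datetime import date

def inactivity_rate(edits, window_start, cutoff, horizon_end):
    """Share of editors active in [window_start, cutoff) who make
    no edits in [cutoff, horizon_end).

    `edits` is a hypothetical mapping: editor name -> list of edit dates.
    """
    # Editors with at least one edit in the observation window.
    active = {
        editor for editor, days in edits.items()
        if any(window_start <= d < cutoff for d in days)
    }
    if not active:
        return 0.0
    # Of those, editors with no edit at all in the follow-up period.
    inactive = {
        editor for editor in active
        if not any(cutoff <= d < horizon_end for d in edits[editor])
    }
    return len(inactive) / len(active)

# Invented sample data: one-year window, five-month follow-up.
edits = {
    "Ann": [date(2011, 3, 1), date(2011, 10, 15)],  # stays active
    "Bob": [date(2011, 5, 20)],                     # becomes inactive
    "Cy": [date(2010, 12, 5), date(2011, 8, 30)],   # becomes inactive
    "Dee": [date(2009, 1, 1)],                      # not active in the window
}
rate = inactivity_rate(edits, date(2010, 9, 1), date(2011, 9, 1), date(2012, 2, 1))
print(f"{rate:.0%} of recently active editors made no edits in the follow-up period")
```

A predictive model such as the ones described here would go further and estimate this outcome per editor from features of past activity; the sketch only shows the label being predicted.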

The challenge also announced two Honourable Mentions for participants who only used open source software. The first Honourable Mention went to Dell Zang (team zeditor) who used a machine learning technique called gradient boosting. His model mainly uses recent past editor activity. The second Honourable Mention went to Roopesh Ranjan and Kalpit Desai (team Aardvarks). Using Python and R, they too developed a random forest model. Their model used 113 features, mainly based on the number of reverts and past editor activity (see its full description).

All the documentation and source code has been made available on the main entry page for the WikiChallenge.

What it takes to become an admin: Insights from the Polish Wikipedia

Logo of the Polish Wikipedia

A team of researchers based at the Polish-Japanese Institute of Information Technology (PJIIT) published a study presented at SocInfo 2011 looking at Requests for Adminship (RfA) discussions in the Polish Wikipedia.[11] The paper presents a number of statistics about adminship in the Polish Wikipedia since the RfA procedure was formalized (2005), including the rejection rate of candidates across different rounds, the number of candidates and votes over the years and the distribution of tenure and experience of candidates for adminship. The results indicate that it was far more difficult to obtain admin status in 2010 than it was in previous years, and that the tenure required to be a successful RfA candidate has risen dramatically: "the mean number of days since registration to receiving adminship is nearly five times larger than it was five years before".

The remainder of the paper studies RfA discussions by comparing the social network of participants based on their endorsement (vote-for) or rejection (vote-against) of a given candidate with an implicit social network derived from three different types of relations between contributors (trust, criticism and acquaintance). The goal is to measure to what extent these different kinds of relations can predict voting behavior in the context of RfA discussions. The findings suggest that "trust" and "acquaintance" (measured respectively as the amount of edits by an editor in the vicinity of those by the other editor and as the amount of discussions between two contributors) are significantly higher in votes-for than in votes-against. Conversely, "criticism" (measured as the number of edits made by one author and reverted by another editor) is significantly higher in votes-against than in votes-for.
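As a rough illustration of how one of these relation measures might be computed, the following Python sketch counts "criticism" as the number of edits by one author that were reverted by another, then averages it separately over votes-for and votes-against. This is not the authors' implementation: the revert log, the vote list, the function names and the choice to measure criticism from voter to candidate are all assumptions.

```python
from collections import Counter
from statistics import mean

def criticism_counts(reverts):
    """Count reverted edits per (reverter, reverted_author) pair.

    `reverts` is a hypothetical list of (reverter, reverted_author) tuples,
    one per reverted edit.
    """
    return Counter(reverts)

def mean_criticism_by_vote(votes, reverts):
    """Average voter-to-candidate criticism for each vote type.

    `votes` is a hypothetical list of (voter, candidate, vote) tuples,
    where vote is "for" or "against".
    """
    counts = criticism_counts(reverts)
    by_vote = {"for": [], "against": []}
    for voter, candidate, vote in votes:
        # Assumption: criticism = edits by the candidate that the voter reverted.
        by_vote[vote].append(counts[(voter, candidate)])
    return {vote: (mean(vals) if vals else 0.0) for vote, vals in by_vote.items()}

# Invented sample data.
reverts = [("A", "C"), ("A", "C"), ("B", "C"), ("D", "E")]
votes = [("A", "C", "against"), ("B", "C", "against"), ("D", "C", "for")]
result = mean_criticism_by_vote(votes, reverts)
print(result)
```

The paper's finding corresponds to the "against" average being significantly higher than the "for" average; a real analysis would also need a significance test, which this sketch omits.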

This study complements research on the influence of social ties on adminship discussions reviewed in the past edition of the research newsletter.

High search engine rankings of Wikipedia articles found to be justified by quality

An article titled "Ranking of Wikipedia articles in search engines revisited: Fair ranking for reasonable quality?", by two professors for information research from the Hamburg University of Applied Sciences (which appeared earlier this year in the Journal of the American Society for Information Science and Technology and is now available as open access, also in form of a recent arxiv preprint[12]) addresses "the fiercely discussed question of whether the ranking of Wikipedia articles in search engines is justified by the quality of the articles". The authors recall an earlier paper coauthored by one of them[13] that had found Wikipedia to be "by far the most popular" host in search engine results pages (in the US): In "1000 queries, Yahoo showed the most Wikipedia results within the top 10 lists (446), followed by MSN/Live (387), Google (328), and Ask.com (255)". They then set out to investigate "whether this heavy placement is justified from the user’s perspective". First, they re-purposed the results of a 2008 paper of the first author,[14] where students had been asked to judge the relevance of search engine results for 40 queries collected in 2007, restricting them to the search results that consisted of Wikipedia articles – all of them from the German version. They found that "Wikipedia results are judged much better than the average results at the same ranking position" by the jurors, and that

To conduct a more thorough investigation (the 2008 assessments having only focused on the criterion of relevance), the present paper sets out to develop a set of quality criteria for the evaluation of Wikipedia articles by human jurors. It first gives an overview of existing literature about the information quality of Wikipedia, and of encyclopedias in general, identifying four main criteria that several pre-2002 works about the quality of reference works agreed on. Interestingly, "accuracy" was not among them, an omission explained by the authors by the difficulty of fact-checking an entire encyclopedia. From this, the authors derive a set of 14 evaluation criteria, incorporating both the general criteria from the literature about reference works and internal Wikipedia criteria such as the status of being a featured/good article, the verifiability of the content and the absence of original research. These were then applied by the jurors (two final-year undergraduate students with experience in similar coding tasks) to 43 German Wikipedia articles that had appeared in the 2007 queries, in their state at that time. While "the evaluated Wikipedia articles achieve a good score overall", there were "noticeable differences in quality among the examples in the sample" (the paper contains interesting discussions of several strengths and weaknesses according to the criteria set, e.g. the conjecture that the low score on "descriptive, inspiring/interesting" writing could be attributed to "the German academic style. A random comparison with the English version of individual articles seems to support this interpretation").

The authors conclude:

Both the search engine ranking data and the evaluated Wikipedia article revisions are somewhat dated, referring to January 2007 (the authors themselves note that it "could well be that in the meantime search engines reacted to that fact [the potential of improving results by ranking Wikipedia higher] and further boosted Wikipedia results", and also that regarding the German Wikipedia, the search engine results did not take into account possible effects of the introduction of stable versions in 2008).

Attempts to predict the outcome of AfD discussions from an article's edit history

A master's thesis defended by Ashish Kumar Ashok, a student in computing at Kansas State University, describes machine learning methods to determine how the final outcome of an Article for Deletion (AfD) discussion is affected by the editing history of the article.[15] The thesis considers features such as the structure of the graph of revisions of an article (based on text changed, added or removed), the number of edits of the article, the number of disjoint edits (according to some contiguity definition), as well as properties of the corresponding AfD, such as the number of !votes and the total length of words used by participants in AfD who expressed their preference to keep, merge or delete the article. Different types of classifiers based on the above features are applied to a small sample of 64 AfD discussions from the 1 August 2011 deletion log. The results of the analysis indicate that the performance of the classifiers does not significantly improve by considering any of the above features in addition to the sheer number of !votes, which limits the scope and applicability of the methods explored in this work to predict the outcome of AfD discussions. The author suggests that datasets larger than the sample considered in this study should be obtained to assess the validity of these methods.
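The vote-count baseline that the other features failed to beat can be pictured with a short Python sketch. It is an illustration only, not the thesis's classifier: the function names, the tie-break order and the sample discussions are assumptions.

```python
def predict_from_votes(vote_counts):
    """Predict the outcome that received the most !votes.

    `vote_counts` is a hypothetical dict such as {"keep": 5, "delete": 2}.
    Ties are broken in the assumed order keep, merge, delete, because
    max() returns the first of several equal candidates.
    """
    tie_break_order = ["keep", "merge", "delete"]
    return max(tie_break_order, key=lambda outcome: vote_counts.get(outcome, 0))

def accuracy(discussions):
    """Fraction of discussions whose actual outcome matches the prediction.

    `discussions` is a hypothetical list of (vote_counts, actual_outcome) pairs.
    """
    if not discussions:
        return 0.0
    correct = sum(
        1 for counts, outcome in discussions
        if predict_from_votes(counts) == outcome
    )
    return correct / len(discussions)

# Invented sample data.
sample = [
    ({"keep": 5, "delete": 2}, "keep"),
    ({"keep": 1, "delete": 6}, "delete"),
    ({"keep": 2, "merge": 4, "delete": 1}, "merge"),
    ({"keep": 3, "delete": 3}, "delete"),  # tie: the baseline guesses "keep"
]
print(accuracy(sample))
```

Any richer classifier built on edit-history features would have to exceed the accuracy of a baseline like this to be worth its extra complexity, which is the comparison the thesis reports as unsuccessful on its 64-discussion sample.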

Briefly

  • Why did Wikipedia succeed while others failed?: In a presentation on October 11 at the Berkman Center for Internet and Society ("Almost Wikipedia: What Eight Collaborative Encyclopedia Projects Reveal About Mechanisms of Collective Action", with video), MIT researcher and Wikimedia Foundation advisory board member Benjamin Mako Hill presented preliminary results of his research comparing Wikipedia and seven other Internet encyclopedia projects or proposals that did not take off, based on interviews with the projects' founders as well as examinations of their archives. The event was summarised for the Nieman Journalism Lab (reprinted in Business Insider), and in the Signpost: "The little online encyclopaedia that could". Hill later gave a shorter (ca. 12min) talk about the same topic at the "Digital commons" forum (see below): video, slides.
  • "Digital Commons" conference: On October 29–30, the "Building Digital Commons" conference took place in Barcelona, organized by Catalan Wikimedians and Wikimedia Research Committee member Mayo Fuster Morell, and supported by the Wikimedia Foundation. The program featured several presentations about Wikipedia research; further online documentation is expected to become available later.
  • Placement of categories examined: A paper from two computer science researchers based at the Katholieke Universiteit Leuven examines the order in which categories are placed on Wikipedia articles and reports on connections between a category's position in this list and "its persistence within the article, age, popularity, size, and descriptiveness".[16] The order in which categories are added is not determined by any explicit rule. However, the research found, older, more persistent and more exclusive categories are consistently placed in lower positions. Categories appearing at lower positions also tend to do so across all the articles they contain and they include articles that are more similar to each other in terms of category overlap.
  • Visualizing semantic data: A team from the UCSB Department of Computer Science recently presented[17] WiGiPedia, a tool visualizing rich semantic data about Wikipedia articles, designed to "inform the user of interesting contextual information pertaining to the current article, and to provide a simple way to introduce and/or repair semantic relations between wiki articles". The tool builds on structured data represented via templates, categories and infoboxes and queried via DBpedia. By supporting collaborative editing of rich semantic data and one-click semantic updates of Wikipedia articles, the tool aims to bridge the gap between Wikipedia and DBpedia. The source code of the tool doesn't appear to be publicly released.
  • Wikipedia literature review: Owen S. Martin posted to arXiv a 28-page Wikipedia literature review towards his Ph.D. in statistics.[18] About half the paper gives an overview of Wikipedia's database structure; the remainder reviews about 30 recent papers from the perspective of assessing their quality, trust, semantic extraction, governance, economic implications and epistemological implications.
  • Vandalism detection contest: An "Overview of the 2nd International Competition on Wikipedia Vandalism Detection" has been published.[19]
  • Matching Wikipedia articles to Geonames entries: A four-page paper by two researchers from Hokkaido University[20] explored the problem of "merging Wikipedia's Geo-entities and GeoNames" to form a larger geographical database. This is already being done by the YAGO (Yet Another Great Ontology) database, but the paper uses additional data beyond the article name, such as categories and disambiguation pages on Wikipedia, to identify further matching pairs missed in YAGO (and in the process found several errors in GeoNames).
  • Attempt to examine evolution of key activities in Wikipedia: A paper titled "Governing Complex Social Production in the Internet: The Emergence of a Collective Capability in Wikipedia"[21] (presented last month at the "Decade in Internet Time" symposium at the Oxford Internet Institute) undertakes "an exploratory theoretical analysis to clarify the structure and mechanisms driving the endogenous change of [Wikipedia]", using the framework of capability theory to construct six hypotheses such as "the membership in group(s) of contributors that take up governance tasks varies less than in those revolving on content production". These are then tested empirically by applying a clustering algorithm to monthly snapshots of the English Wikipedia (until 2009) "to identify distinct groupings of contributors at each month". However, the clustering algorithm leaves out a group of users "that covers all the observed domains of activity" and "despite its relatively small share of overall contributor population ... provides the majority of the work", which leads the authors to dub it "the core editors of Wikipedia".

References

  1. Ayers, Phoebe, and Reid Priedhorsky (2011). WikiLit: Collecting the wiki and Wikipedia literature. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 229. New York, New York, USA: ACM Press, 2011.
  2. Antin, Judd, Ed H. Chi, James Howison, Sharoda Paul, Aaron Shaw, and Jude Yew (2011). Apples to oranges? Comparing across studies of open collaboration/peer production. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 227. New York, New York, USA: ACM Press, 2011.
  3. Schulenburg, Frank, LiAnna Davis, and Max Klein (2011). Lessons from the classroom. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 231. New York, New York, USA: ACM Press, 2011.
  4. Musicant, David R., Yuqing Ren, James A. Johnson, and John Riedl (2011). Mentoring in Wikipedia: a clash of cultures. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 173. New York, New York, USA: ACM Press, 2011.
  5. Javanmardi, Sara, David W. McDonald, and Cristina V. Lopes (2011). Vandalism detection in Wikipedia. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 82. New York, New York, USA: ACM Press, 2011.
  6. Livingstone, Randall M. (2011). A scourge to the pillar of neutrality. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 209. New York, New York, USA: ACM Press, 2011.
  7. Livingstone, Randall M. (2011). Places on the map and in the cloud: representations of locality and geography in Wikipedia. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 211. New York, New York, USA: ACM Press, 2011.
  8. Geiger, R. Stuart, and Heather Ford (2011). Participation in Wikipedia's article deletion processes. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 201. New York, New York, USA: ACM Press, 2011.
  9. Gorbatai, Andreea D. (2011). Exploring underproduction in Wikipedia. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration – WikiSym '11, 205. New York, New York, USA: ACM Press, 2011.
  10. Kupferberg, Natalie, and Bridget McCrate Protus (2011). Accuracy and completeness of drug information in Wikipedia: an assessment. Journal of the Medical Library Association 99(4): 310–3.
  11. Turek, Piotr, Justyna Spychała, Adam Wierzbicki, and Piotr Gackowski (2011). Social Mechanism of Granting Trust Basing on Polish Wikipedia Requests for Adminship. In: Social Informatics 2011. Lecture Notes in Computer Science, 6984: 212–225.
  12. Lewandowski, Dirk, and Ulrike Spree (2011). Ranking of Wikipedia articles in search engines revisited: Fair ranking for reasonable quality? Journal of the American Society for Information Science 62(1): 117–132.
  13. Höchstötter, Nadine, and Dirk Lewandowski (2009). What users see – Structures in search engine results pages. Information Sciences 179(12): 1796–1812.
  14. Lewandowski, Dirk (2008). The retrieval effectiveness of Web search engines: Considering results descriptions. Journal of Documentation 64(6): 915–937.
  15. Ashok, Ashish Kumar (2011). Predictive data mining in a collaborative editing system: the Wikipedia articles for deletion process.
  16. Gyllstrom, Karl, and Marie-Francine Moens (2011). Examining the "Leftness" Property of Wikipedia Categories. In: CIKM '11.
  17. Bostandjiev, Svetlin, John O'Donovan, Brynjar Gretarsson, Christopher Hall, and Tobias Hollerer (2011). WiGiPedia: Visual Editing of Semantic Data in Wikipedia. In: Workshop on Visual Interfaces to the Social and Semantic Web (VISSW2011).
  18. Martin, Owen S. (2011). A Wikipedia Literature Review. arXiv, October 17, 2011.
  19. Potthast, Martin, and Teresa Holfeld (2011). Overview of the 2nd International Competition on Wikipedia Vandalism Detection. In: PAN 2011.
  20. Yiqi Liu, and Masaharu Yoshioka (2011). Construction of large geographical database by merging Wikipedia's Geo-entities and GeoNames.
  21. Aaltonen, Aleksi, and Giovan Francesco Lanzara (2011). Governing Complex Social Production in the Internet: The Emergence of a Collective Capability in Wikipedia. In: Decade in Internet Time symposium.



2011-10-31

German Wikipedia continues image filter protest

Image filter sparks protests on the German Wikipedia

In June 2010, following controversy over the appearance of the vulva article as the German Wikipedia's article of the day, allegations by Larry Sanger of hosting inappropriate graphic depictions of children, and other controversial events, the board voted for an external survey, to be conducted by Robert Harris, of controversial images on Wikimedia. The study was completed by October of that year, but its recommendations were not immediately adopted. In the interim, in December, a poll failed to gain the consensus necessary to promote Commons:Sexual content to a policy, and the Wikimedia leadership focused on the topic as a central issue for 2011. In March 2011 a technical draft of a personal image filter that enables users to hide for themselves images they do not want to see was presented to the Board.

However, a poll this August showed just how polarizing the issue is for many users; on the German Wikipedia in particular, a separate vote showed that more than 4/5 of users were opposed to the institution of the filter, including some 35% of core users. As Jan eissfeldt explained in an op-ed last month, the German community is particularly sensitive to censorship issues; as another user put it on the mailing list, "it is against the basic rules of the project. It is intended to discriminate content. To judge about it and to represent you this judgment before you have even looked at it." On October 9, the results of the poll were followed by a "Letter to the community on Controversial Content" from WMF Board chair Ting Chen (User:Wing) and a clarification by WMF executive director Sue Gardner that although the Board's May resolution on controversial content still stood, "the specific thing that has been discussed over the past several months, and which the Germans voted against" was not being pursued any more, and that "the goal is a solution that's acceptable for everyone". Still, the letter triggered extensive discussion among German Wikipedians, and Sue Gardner promised to discuss the issue with them directly when coming to Germany in November for the German chapter's annual meeting.

Shortly thereafter, Wikipedian Sargoth proposed on October 19 that users should put white paper bags over their heads as a sign of protest when Gardner arrived. In the interim, users have taken to posting an image of a white paper bag on their userpages in protest. As of writing, more than 150 users have done so. The image filter issue has united the Germans, as one user wrote, "in a way that I haven't seen in several years." In addition, a filter-less German Wikipedia fork has been proposed, and as of writing the community poll on the issue stands at 31/40/24/1. In the meantime the referendum committee published the second and third appendices of results on Meta to make the votes per project and by age of account transparent. On October 28, Sue Gardner reiterated that she had taken the category-based solution off the table, and will not impose anything on the German community against their will.

In brief

  • One possible design for the Article Feedback extension version 5
    Feedback on article feedback: The Wikimedia Foundation hosted IRC office hours this week specifically for questions on the article feedback tool. The tool, which was introduced in September 2010 and fully rolled out in July 2011, is in its fifth iteration. In a message to the foundation-l mailing list, Deputy Director Erik Moeller clarified that "the idea is to experiment with some alternative approaches in parallel with the existing deployment, not to scrap the existing deployment and start over immediately". A log of the office hours can be found here.
  • Featured sounds tagged as inactive: The Featured sounds candidates process was tagged as inactive on October 25 by FSC director Guerillero. A change in process standards and an RfC both stagnated in the middle of the year. The two current nominations, a composition of the Maple Leaf Rag and a delist nomination for an older version of the same, have not had any comments for two months; as Wikipedian Major Bloodnok remarked, "Ultimately few Wikipedians are going to be interested in getting involved in a project which doesn't even know what it stands for." The project is in the process of closing its last two nominations.

  • Women and Wikimedia Survey 2011: In January, The New York Times reported that Wikipedia was suffering from a gender gap, stating that female contributors made up less than 13% of editors. The 2011 Editor Survey elaborated on this gap, showing that only 9% of editors on Wikipedia are women, and that the gap is even more pronounced among high-activity editors. The gap has drawn much discussion from editors, and the Wikimedia Foundation has made one of its goals through 2015 to increase participation, especially among women. This week Wikipedian SarahStierch published the results of her manual survey of women writers, the Women and Wikimedia Survey 2011. Stierch emailed 500+ female editors with a set of 22 questions, and garnered a total of 329 responses; the results of the survey can be seen here.
  • New Page Patrol survey: New page patrollers have been invited to participate in a survey intended to inform the Wikimedia Foundation in advance of the design of an overhauled quality control system for new articles. Participants are being actively solicited ahead of the closure of the survey on Monday 7 November at 23:59 (UTC).
  • Fundraising discussion 2012: The open talks about how to manage Wikimedia Foundation fundraising after the 2011/12 fundraiser have been kicked off by a statement from Jan-Bart on Meta. The next key date is November 15, after which the Board of Trustees will take the input that has been gathered in the discussion process and make a final set of guiding principles.
  • Identity guidelines published: The Wikimedia Foundation has published "Identity guidelines for Wikipedia." The file, which is an expanded version of the Wikimedia visual identity guidelines, deals with the treatment of Wikimedia's official marks on other sites and publications (the official marks are copyrighted by the Wikimedia Foundation).
  • Venezuelan chapter recognised: The Wikimedia Foundation has resolved to provisionally recognise Wikimedia Venezuela as an official chapter, in accordance with a recommendation from the Chapter Committee. The arrangement is expected to be finalised with the signing of a Chapters Agreement in the coming year.
  • Call for Wikipedian in residence in South Africa: A call has been made for applications for a position as Wikipedian in residence in Cape Town, South Africa. The one-year position carries a monthly stipend of R8,500 and will be supported by the Africa Centre. The job description can be found here.
  • Wikimedia Participation Grants: The Wikimedia Foundation and the German chapter have started a joint grant program for volunteers. The program offers scholarships covering travel, accommodation and incidental expenses in connection with active participation in events. More details are available here.
  • Milestones: This week, the Kazakh Wikipedia reached 100,000 articles, the Ukrainian Wikipedia reached 1,000,000 total pages, and the French Wikisource reached 80,000 total entries. In fact, according to Wikimedia News, the Kazakh Wikipedia jumped from 90,000 to 100,000 articles in three days as a result of thousands of machine translations of articles on the Russian Wikipedia, imported by bot.


2011-10-31

Citizendium on the rocks, Shankbone celebrated, and the week in vandalism

Citizendium at 5: birthday candles or last rites?

A graph showing numbers of active editors of Citizendium through July 2011, compiled by RationalWiki. Like those of Wikipedia, the contributor data of Citizendium indicate a worrying trend.

On the occasion of Citizendium's fifth anniversary, Ars Technica interviewed its founder Larry Sanger (known for his role in starting Wikipedia, with which he was involved until 2002) and editorial council member Hayford Pierce, presenting their "candid assessments of what went wrong, and what we can learn from the experience" and looking back at the "great debate about the merits of Wikipedia's radically democratic editing process" which had been prompted by Sanger's September 2006 announcement. "Citizendium turns five, but the Wikipedia fork is dead in the water" was the grim headline given to the interview. Last month, shortly after the anniversary of Citizendium's first announcement, the Signpost interviewed the project's managing editor Daniel Mietchen: "Citizendium, half a decade later".

Vandalism noted

Vandalism to the article on Anna Dello Russo this weekend was picked up in several places. Part of why it received so much attention was undoubtedly its unusually humorous nature. "As much as I'm trying to be pissed at whomever did this, it's kind of...hilarious", wrote Ology.com. The defacement was also noted by New York magazine's fashion desk.

Meanwhile, progressive magazine Mother Jones spotted intensive edit warring at the article about Walid Phares, a foreign policy advisor for U.S. presidential hopeful Mitt Romney. The dispute centered on attempts to detail Phares's relationship with the right-wing Lebanese Forces during that nation's civil war. Finally, the Herald Sun documented alternately juvenile and death-threatening defamatory edits to articles on Australian politicians Robert Doyle and Ted Baillieu.

Occupy Shankbone

Dog "protester" at "Occupy Wall Street" (named by David Shankbone as one of his favorites among his photos of the demonstrations)

American magazine Good interviewed editor David Shankbone this week, portraying him as "The Most Important Occupy Wall Street Photographer You've Never Heard of". In the interview, he discussed his photography ("In 2003 I was on a volcano in Ecuador with some locals who ended up stealing my digital camera and all of my clothes, and it wasn’t until 2006 that I had a camera again."), the role it has played on Wikipedia, as well as his opinion of Occupy Wall Street. Shankbone had previously been interviewed as a "Thought Leader" in March for the PBS MediaShift blog by former WMF staffer Sandra Ordonez with the acclamation that he was "arguably the most influential new media photojournalist in the world."

In brief

  • Archive opens floodgates of out-of-copyright journals: JSTOR, the pre-eminent online archiver of academic journals in the humanities and social sciences, has released that portion of its journal content first published prior to 1923 in the United States and prior to 1870 elsewhere, a haul comprising 500,000 articles from a broad swathe of disciplines, accounting for about 6% of its total archive. The non-profit service had been widely criticised earlier in the year for its perceived reluctance to facilitate free access to such material (see Signpost coverage: "Open-access activists clash with proprietary journal establishment").
  • Wikipedians in the stacks: The Daily Targum excerpts highlights from an address given by librarian and long-standing administrator David Goodman (User:DGG) and fellow editor Ann Matsuuchi to Rutgers University librarians, covering familiar ground on the construction, culture and best use of the peer-generated encyclopaedia.
  • Amanda Knox coverage excoriated: In citizen journalism site GroundReport, Joseph Bishop slammed the Murder of Meredith Kercher article as "a rare failure at Wikipedia", accusing it of having been controlled by partisan "mostly European" administrators convinced of the guilt of the eventually-acquitted Amanda Knox, who had been among those charged with the murder. Bishop, who had led a petition detailing the alleged failings of the article, an effort that attracted the sympathy of Jimmy Wales amongst others, hailed the recent thorough rewrite of the article by "Super Administrator" [sic] SlimVirgin, to whom he attributed more power on the site than anyone other than Mr. Wales himself.
  • Pyromania and graphomania compete for Halloween attention: New Zealand's Stuff magazine gave a roundup of Halloween-related Wikipedia lists and articles, while The Saginaw News noted the prominence of the Michigan city's arsonists in the article on Devil's Night, a tradition of seasonal mischief.
  • Female Wikipedians survey noted: The Melville House Publishing blog covered the recent informal survey of female-identified editors by Sarah Stierch. Concerns by those surveyed included a hostile and adversarial working climate, a culture of tolerance towards soft prejudice, and the widely observed phenomenon of stalking and harassment of female functionaries. See "News and notes" for more.
  • All Wikipedia, all the time: ARNnet reviewed as their "app of the day" All Of Wikipedia - Offline, an app for iPad and iPhone that provides users with offline access to an image-free version of the encyclopaedia. Costing US$8.99 and weighing in at 2.7 MB for the app and up to 4 GB for the database, it was judged by the reviewer to be a daunting download, and – for an otherwise free product – cheekily expensive, but ultimately a "good app for Wikipedia junkies", who may well welcome the increased access opportunity offered.
  • Foundation ramps up mobile ambitions: paidContent gave an overview of the Wikimedia Foundation's efforts in expanding mobile penetration on the occasion of the announcement of the imageless dedicated mobile platform Wikipedia Zero, noting the development as proof of the foundation's pledge to prioritise its mobile offerings and expansion in the Global South. A note of concern was voiced at the foundation's failure to sign any carriers onto its scheme to provide free access to Wikimedia content, but WMF senior manager of mobile partnerships Amit Kapoor was upbeat about attracting partners in India and China specifically. Media Nama meanwhile noted with interest Kapoor's ambitions for Wikipedia to be available even to those mobile users without a data plan. See "Technology report" for more.
  • Wikipedia.ee: Estonian Public Broadcasting trumpeted the Estonian Internet Foundation's ruling in favour of the Wikimedia Foundation against a speculator who had cybersquatted the Wikipedia.ee domain name with a duplicate copy of the Estonian Wikipedia.


2011-10-31

Proposal to return this section from hiatus is successful

Reporter's note: Last issue, a call for more writers was put out by the managing editors. In response, I decided to try to revive the Discussion report section. However, I'm going to need your help to do it. I watch most of the major discussion boards, and had done so long before taking on this responsibility, but I can't possibly be watching everything at once. At the beginning of the week, I am going to start the discussion report and throw in several items I will try to include in the finished product. You will be able to see the report in progress from the Signpost's Newsroom. If there is an important discussion that you're aware of, that hasn't been covered in a previous report, and that is not on my list, feel free to add a link to the discussion at the Signpost's suggestion board. I will not be able to cover sister projects or meta at all without tips, so I really, really need people who are active at other projects to keep me in the loop, so I can in turn keep you all in the loop. If I decide to write up a discussion I received from a tip (which I will, if it's a good tip), I'll give you credit for bringing it in. You can keep sending me tips until 48 hours before publication (I need the weekend to do the write-ups).

I am currently planning for the report to be bi-monthly, but that may change in future. I'm also open to co-writing the report with anyone interested. For those who remember the pre-hiatus discussion report, I've also slightly reorganized the layout, adding a few sections including the sidebar.

And now, with all that explanation out of the way, please welcome back the Discussion report.

– Sven Manguard


Critical issues

A 45-day-long request for comment was initiated by MuZemike to solicit community opinions on a number of questions related to the upcoming Arbitration Committee elections. Dozens of proposals and hundreds of comments on important structural issues were made, and the RfC is set to close on November 1. A special edition of the discussion report will appear in next week's Signpost, devoted entirely to the results of this important discussion.

Surveys

New page patrol improvement survey

Wikimedia Foundation contractor Okeyes (WMF) (also known as Ironholds outside his official WMF capacity) recently distributed more than 4,000 invitations asking editors with experience in new page patrolling to participate in a survey designed to aid the Foundation's efforts to develop a new Special:NewPages interface. The Signpost asked Mr. Keyes about the survey:

"The staffers I've been in contact with have really been bowled over by the number of responses. We were expecting maybe 450; as of now, we've got over 1,000. The data gathered so far has torpedoed a lot of assumptions about new page patrollers. A majority of them have tertiary qualifications and are well above 18, for example.

However, we still need to normalise the data; a majority of patrollers have tertiary qualifications, but do they do the majority of patrols? If not, what demographic does most of the work, and what are their attributes? Hopefully this will be done in the next couple of weeks, and I should have some very interesting data to show people quite soon.

My thanks to everyone who has submitted information so far, or who plans to do so in the future; we're going to use it to build a Special:NewPages interface that's easier for existing editors to use, and easier for new editors to adapt to. Hopefully we can bring down the workload and make it a better experience."

If you have experience in the area but have not received an invitation, you can still take the survey by clicking this link. The discussion report will publicize the results of the survey as they become available.


Discussions that are happening now
Unless otherwise mentioned, all discussions profiled in the report remain open as of October 31, 2011.

A proposal was launched on the last day of September by Noleander to remove from the Notability (music) guideline the line "In general, if the musician or ensemble is notable, and if the album in question has been mentioned in multiple reliable sources, then their officially released albums may have sufficient notability to have individual articles on Wikipedia", on the grounds that it is so easy to find sources for just about any album that the criteria in question are not effective for judging notability. Instead, Noleander advocated the principle of "significant coverage". After five days, Lawrencekhoo restarted the discussion as a formal Request for Comment in a new thread immediately below Noleander's discussion. Discussion on the issue has slowed, with a significant majority supporting Noleander's proposal.

In response to concerns raised during the discussion of the possible unblock of TreasuryTag (coverage below), Eraserhead1 removed the line "as punishment against users" from the section of the blocking policy that listed what blocks were not to be used for. The change was reverted, and a Request for Comment was initiated by Hydroxonium to determine whether there was consensus for reinstating Eraserhead1's change. After 10 days and more than 100 comments, SilkTork closed the discussion with the conclusion that there was not a consensus for making the alteration.

Hammersoft (talk · contribs), on behalf of Δ (formerly Betacommand), has filed requests to undertake 20 separate automated tasks. Δ is required by community sanctions to seek consensus before undertaking a "pattern of edits", and is restricted to an average of four edits a minute during any ten-minute interval. At the time of writing, a majority of the proposals have more opposition than support, or have almost the same levels of opposition as support; seven, however, have achieved varying pluralities of support. They are:

3. undertake edits to remove external links where such links were used as a failed attempt to include an image in an infobox.
7. add {{dead link}} as appropriate to references where the link is dead.
9. replace "Image:" with "File:".
13. add titles to bare URLs and convert inline links to refs where needed.
14. add non-breaking spaces to units, in accordance with WP:NBSP.
18. date maintenance templates.
20. combine templates as needed into {{multiple issues}}.

An effort to rewrite the editing restrictions placed on Δ is also underway at the same page; however, none of the proposed versions have achieved significant support, and the level of participation in the discussion is low.

A five-part RfC was initiated at the end of September by Dominus Vobisdu, who raised concerns that several sections of the article Astrology give undue weight to a minority view of astrology and rely on unreliable sources to support those claims. During the past month, more than 200 separate comments have been left on the talk page on these matters. Several editors have engaged in heated exchanges, and in mid-October the Arbitration Committee imposed a six-month topic ban from the topic of astrology on Ludwigs2, due to his comments at the RfC (the Astrology article is under general sanctions as a result of the case Pseudoscience). While a broad consensus has formed on most of the issues, the discussion appears likely to remain open for some time.

An RfC filed jointly by Paul Siebert and Smallbones seeks to end a dispute over the content of the lead section of Mass killings under Communist regimes, by crafting two potential lead sections and asking the community to choose which one should be placed in the article. The discussion has slowed to a halt, and at present neither of the two originally proposed leads, nor a third lead suggested during the discussion, has managed to achieve more support than opposition.

A modification to the verifiability policy was proposed earlier this month that would make two changes to the handling of "verifiability, not truth". The change would remove mention of the concept "verifiability, not truth" from the lead, in favor of mention that Wikipedia policies other than verifiability also affect the inclusion of content. "Verifiability, not truth" would instead be addressed in a new section, "Assertions of truth and untruth", placed right after the lead. Support for the change was just above a 2:1 margin, with almost 100 opinions already in, before a sudden spike in participation after a thread was opened on October 28 at the administrators' noticeboard alleging that the RfC was closed too early and was not closed by an uninvolved admin. The margin of support has decreased to around 3:2, with just under 250 comments in. SarekOfVulcan, the administrator whose close of the discussion led to the AN/I thread, voluntarily resigned his administrator tools on October 29.

An RfC was started earlier this week by Ridernyc that sought to gain consensus for inserting the line, "At this time there is no consensus that Esports [sic] participants are covered by the criteria of this guideline" into the page Notability (sports). A dozen editors have commented, and the discussion is ongoing. Another e-sports-related discussion is under way at the Reliable sources noticeboard, regarding whether or not several websites can be considered independent sources.

Other discussions

TreasuryTag, an editor since 2006, was blocked at the beginning of the month for "generally combative behaviour not conducive to collaborative environment". Having been blocked in August and September for unacceptable behavior, and each time being unblocked after promising to reform, TreasuryTag's October block was indefinite. A proposal by Worm That Turned was put forth at the Administrators' noticeboard that would have allowed the editor to be unblocked and given a final chance, on the condition that they be mentored and monitored by Worm That Turned and Fastily, both admins. Almost 50 people left comments on the matter. Sjakkalle closed the discussion and stated, in part, that consensus was against unblocking TreasuryTag, but that there was "general agreement in the discussion that both Worm That Turned and Fastily should be commended for their generous offer and attempts to find a satisfactory outcome".


2011-10-31

'In touch' with WikiProject Rugby union


WikiProject news
News in brief
Submit your project's news and announcements for next week's WikiProject Report at the Signpost's WikiProject Desk.
The All Blacks perform Ka Mate before a match against France in 2006. New Zealand are the current holder of the Rugby World Cup.
Rugby School in Rugby, Warwickshire, is reputed to be where rugby was started in 1823
This is the only known portrait of William Webb Ellis, circa 1857, who is famous for allegedly being the inventor of Rugby football
A rugby scrum is a way of restarting the game after an accidental infringement
A giant rugby ball was suspended from the Eiffel Tower to commemorate France's hosting of the 2007 Rugby World Cup

The 2011 Rugby World Cup, which concluded on 23 October, saw New Zealand's All Blacks crowned world champions after defeating France 8–7 in a nail-biting finish. Rugby union is a full contact team sport which originated in England in the early 19th century. It is one of two codes of rugby football. This week, we took time out to speak with members of WikiProject Rugby union. Started by DaGizza in December 2005, the Project is home to over 9,700 articles, with 7 Featured articles, 1 Featured list, 14 Good articles and a Featured portal. The Project has 166 participants. The Signpost interviewed project members MacRusgail, FruitMonkey, Bob247 and Aircorn.

MacRusgail is a Scottish Wikipedian, and has been editing since April 2005. A rugby fan and former player, he was motivated to join WikiProject Rugby union because: "I felt that the coverage at the time was poor, and the articles in general were not well written/non-existent. American sports and association football are already well covered, but many others are not." FruitMonkey has been a Wikipedian since October 2006, and works on many Welsh articles: "Although I was well aware of the sport, I was not a huge rugby fan. After a few low level edits on Welsh articles I got into a heated debate when a Welsh rugby club was flagged for deletion by an editor who was unaware of the nature of 'amateur' within rugby union. That set me on a little crusade and I started building articles on Welsh Victorian players, which has now spiraled into all things rugby." Bob247 is another Scottish Wikipedian, editing since July 2005: "I was contributing to other articles and came across the rugby ones while searching for information on the [Six Nations] and was surprised to see how little there was on rugby union compared with other major sports." Aircorn is a New Zealander and has been on Wikipedia since December 2009. He says that his "religion" is rugby union: "I played rugby until a broken knee turned me into a referee a couple of years ago. Rugby union was one of the first subjects I started editing when I joined Wikipedia in late 2009. Like many, I started editing on my favourite players and now contribute to a range of articles within the project."

Your Project has over 9,700 articles associated with it. How does the Project keep all these up to standard, and what are its biggest challenges?

  • MacRusgail: Sports articles are frequently subject to vandalism. Also, there is a tendency to rely on websites as references, which leads to link rot. I try to provide print references where possible but unfortunately, rugby books are frequently poor resources. Also, I think there is some confusion, as rugby union is still an amateur sport in the main, and was officially 100% amateur until the mid-1990s, so non-project members frequently confuse amateurism with non-notability.
  • FruitMonkey: Through hard work and long watchlists. That said, we are blessed with editors who hold differing areas of interest within the sport, and we appear to jigsaw quite well together. The biggest problems for the Project are BLP issues, the fact that many of the articles are still "live", and require constant attention and the belief that "if it's not on the web it can't be notable" held by many editors.
  • Bob247: Most of the articles under WP:RU are substandard. Those that have made it into GA and FA are only due to the dedication of a handful of individuals. Trying to keep RU articles from being deleted is one of the biggest challenges, as people consistently compare RU to soccer. RU has only recently turned pro and only the top level clubs in a few countries are pro. Scotland, Wales and Ireland, three of the biggest countries in RU, don't even have their own pro domestic league. They have to have a pro league that spans all three plus now, Italy. I also have to agree with the other editors here in that most editors require references to be web-based, which just isn't the case for RU. Another big challenge was sockpuppetry, which has now been dealt with.

WikiProject Rugby union has 7 Featured articles, 1 Featured list, 14 Good articles and a Featured portal. How did your Project achieve this and how can other Projects work toward this?

  • MacRusgail: Mainly by group effort, although sometimes individuals must take the initiative.
  • FruitMonkey: Like many other Projects, WikiProject Rugby union relies on a mixture of collaboration and bloody mindedness to achieve these results. Some of our editors don't place as much emphasis on gaining GA and FA status, but then will put lots of effort into the portal, anti-vandalism patrols or just improving existing articles. Sometimes, talking about achieving results just ends in just that, a talking shop. I would suggest that a little bravery goes a long way, make a commitment and listen to those who give advice, your fellow Wikipedians want to see articles succeed.
  • Bob247: Mostly by a few determined individuals.

Does WikiProject Rugby union collaborate with any other WikiProjects?

  • MacRusgail: Not really, although some members are also interested in rugby league and there is a degree of cross-over... [including] very minor cross-over with national/regional WPs, and in a very few instances, with the WikiProject Cricket. However, on articles about players who are more notable for things other than rugby, we find that the rugby info is frequently removed, or worthiness is determined by different criteria.
  • FruitMonkey: There is little obvious cross-over with other Projects, though considering the poor past relationship between the sports of rugby union and rugby league, there is genuine good-will between the editors of both codes. Some of the biggest stars of rugby union have played both sports and members of these Projects will aid articles where we share a common interest. I have also gained a lot of support, due to my interest in pre-1960s players, from WikiProject Military history, whose members are wonderful at finding information on those players who served.
  • Bob247: Very little. On occasions with military, rugby league and cricket.

Your members have been heavily involved with updating articles related to the recently concluded 2011 Rugby World Cup. Can you share with us your experience of working on a live sporting event such as this?

  • MacRusgail: I tend to leave this to others, or wait a few days.
  • Aircorn: I was motivated by the Rugby World Cup to improve articles that I thought might be useful for new fans unfamiliar with the game. These included articles relating to the laws, positions and gameplay. Unfortunately, the comp started before I was completely finished, and I then found most of my attention devoted to watching the game rather than writing about it. The Cup's results are usually updated by others much quicker than I can manage, and I have mainly kept an eye on any controversies that have occurred (most tend to involve fans venting their frustrations over certain referee decisions).

What are the most pressing needs for WikiProject Rugby union? How can a new contributor help today?

  • MacRusgail: Reliable print sources... information on players and RU history from 20+ years ago. We have a bias towards recent events which haven't been entirely rectified. RU articles tend to have a bias towards an Anglophone point of view. We need more information on players/clubs etc. from non-English speaking nations. And even some English speaking countries such as Scotland and the USA! The biggest "hole" in our coverage is probably French rugby.
  • FruitMonkey: The sport suffers terribly from poor sourcing, probably caused by [its] early amateur history, which has resulted in the early history of the sport relying on volumes by amateur enthusiasts, [are] usually in very limited runs. There are plenty of articles that need a bit of TLC and not just from cites. We have important clubs outside the top-flight needing urgent updating and improving, plus hundreds of internationals which are either stubs or simply non-existent. It's a great project for finding articles that you as an editor can make a real impact on.
  • Bob247: Improve article language and add histories to the hundreds of club side pages. And yes, the French pages are particularly poor (but this is also the case for the French wiki so it isn't just the question of translating, which is something I have done in the past), as are the Argentine [articles].

Anything else to add?

  • MacRusgail: I should also mention we had a bureaucratic and time consuming discussion about which Irish flag to use. I didn't find this helpful, or conclusive, but it took a lot of time and energy which editors could have employed elsewhere.
  • FruitMonkey: Due to the amateur nature of the early sport, you can find some wonderful cross-over articles. The President of the Barbarians was the medical officer who was in charge of the liberated Bergen-Belsen concentration camp, a past Scottish player saved tens of thousands of lives through treating malaria in Sudan, a Bermudian England international performed 150 amputations on a single boat load of passengers, while the second youngest Welsh international was killed when struck by a poison dart. You don't hit articles like that every day.

Next week's Report will be about the SysRq button. Until then, read zeros and ones in the archive.

2011-10-31

The best of the week



2011-10-31

Abortion case stalls, request for clarification on Δ, discretionary sanctions streamlined

Activity was at a virtual stand-still this week, with only a single edit to the Workshop of the only open case, Abortion. On October 26, the request to amend the Climate Change case was closed, with William M. Connolley's topic ban being modified to allow editing within the topic of climate change, while still prohibiting him from editing articles about living people associated with the topic. Two days later, another request to amend the case was opened, requesting that Scjessey's voluntary editing restriction be lifted. Also on October 26, a request for clarification on the Δ (formerly Betacommand) case was opened, requesting the Arbitration Committee's opinion on whether the community-proposed task of removing deleted images falls under Δ's NFCC-enforcement ban. The request for clarification has prompted SirFozzie to initiate a motion that would open up a new ArbCom case, tentatively called "Review of Δ sanctions".

Finally, a motion was passed this week that applied discretionary sanctions to articles within the scopes of thirteen prior ArbCom cases. All of the affected cases had imposed editing restrictions when the cases were first closed, so the only effect of this motion was to standardize the wording used in those restrictions.

2011-10-31

Wikipedia Zero announced; New Orleans successfully hacked

Wikimedia proposes Wikipedia Zero

In an effort to increase its mobile presence, the Wikimedia Foundation has reached out to mobile carriers, who it hopes will see value in allowing free access to a "lite" version of the encyclopedia (Wikimedia blog, paidContent article).

The lite version will contain all of Wikipedia's textual content, but no images or other media, reducing the cost to a mobile carrier of supplying the service to users. In return, mobile carriers will hope to "lure in" potential web users with tasters such as Wikipedia. The WMF is following in the footsteps of Facebook, which unveiled a similar plan eighteen months ago. In addition to Wikipedia Zero, the WMF is also taking the opportunity to push for inclusion of "links to Wikipedia in [carrier's] WAP portals and basic browser bookmarks [and] use Wikipedia logos and other branding material in their own marketing efforts", paidContent reported. WMF Senior Manager Amit Kapoor added that the WMF was also "exploring ways to develop feature phone access to Wikipedia through SMS and USSD".

The efforts are forming part of a wider programme of delivering Wikimedia wikis to the developing world, where the mobile-to-desktop browsing ratio is far greater than in developed nations. Even in countries where that ratio is relatively low at the moment, readers are increasingly switching their Internet usage to mobile devices. Whilst in the West smartphones are generally the primary mobile access point for the Internet, the WMF's actions show it is also reaching out to users of older phones, as is common in the developing world.

The focus on mobile browsing was originally outlined as a top priority in the five-year strategic plan published in 2010. More recently, it has prompted the launch of a new mobile site in September (see previous Signpost coverage) and the creation of an Android app set to debut shortly. It was also reported this week on the Wikimedia blog that users of the new mobile site will be able to "opt in" to receive beta features as soon as they are available.

New Orleans hackathon explored

Chad Horohoe teaching developers unit testing

Volunteer Development Coordinator Sumana Harihareswara published a writeup of the New Orleans hackathon (held in the American city on 14–16 October) this week on the Wikimedia blog (also summarised in a wikitech-l post). The two-day event, aimed at encouraging more, and more productive, volunteer MediaWiki development as well as allowing developers with different backgrounds to meet in person, included talks from a number of longtime MediaWiki developers such as Chad Horohoe (pictured) and Brion Vibber.

Reporting "broad progress", Harihareswara described the event as specifically helping to further work on "the SwiftMedia extension, Wikimedia Labs, continuous integration, ArchiveLinks, user scripts, Max's API Query Sandbox, Puppetization, Git migration, and more". She also reported how a "volunteer came in on Friday night knowing nothing about developing for MediaWiki, and by the end of the weekend had a working development environment on her laptop and had some ideas about how to contribute".

Future hackathons are scheduled for the Indian city of Mumbai (18–20 November; full details are available) and the British seaside resort of Brighton (19–20 November; full details). The former has been designed to coincide with WikiConference India 2011, and the timing and the proximity of its venue should allow potential contributors to attend both.

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.

  • Google and Wikipedia: chicken or egg? The latest contribution to the long running debate over whether Google drives more or fewer visitors to Wikipedia than Wikipedia drives to Google was published this week on the Wikimedia blog. Survey data, collated for a number of countries, showed that 30–50% of visitors (depending on country) specifically look for Wikipedia in Google results, whilst 40–60% visit it by virtue of the fact that it is the top site listed. The difference has been linked with brand awareness: Japan and the United States were more likely to look for Wikipedia, whilst India and Russia tended to only visit as a result of it being the first result. See also this week's "Recent research" report on "High search engine rankings of Wikipedia articles found to be justified by quality".
    A chart showing the self-reported flow of visitors between Google and Wikipedia for selected countries. It illustrates that 30–50% of visitors specifically look for Wikipedia in Google results, whilst 40–60% chance upon the encyclopedia because it is listed as the first result (data from editor survey).

  • Feedback dashboard unveiled: Special:FeedbackDashboard has been unveiled as a new way to monitor the early experiences of new editors. It picks out contributions from the MoodBar extension, allowing praise to be noted and any problems new users have encountered to be addressed in a similar way to Mozilla's instant "Firefox made me happy because..." feedback system (Wikimedia blog).
  • Visual Editor updates: the notes from a meeting, held on 26 October, of the teams working on a next-generation parser backend and a new "visual editor" WYSIWYG frontend described progress on both fronts (wikitext-l mailing list). The work could generate a call for testers as early as December, although the new functionality is unlikely to be widespread for anything but page creation until well into 2012.
  • Software security engineer opening: An opening was created on October 29 for the position of Software Security Engineer. The position is for "a smart, experienced PHP/MySQL software developer with software security experience... [who would] enjoy the technical challenges associated with managing databases with millions of records." The position requires a B.S. or M.S. or equivalent experience, and 5 years of working experience.
  • Discussion over commit access: The question of who should have commit access to the MediaWiki repository was asked once again this week on the wikitech-l mailing list. Lead Software Architect Brion Vibber explained that the WMF, who manage access, had been left in a "no man's land" by the proposed move to Git, which will make it "a *lot* easier to fully participate in the development ecosystem without having to get an account manually approved and created".
  • Work on re-imagined article feedback tool gains momentum: As also reported in this week's "News and notes", the Wikimedia Foundation hosted IRC office hours this week specifically for questions on the article feedback tool. The tool, which was introduced in September 2010 and fully rolled out in July 2011, is in its fifth iteration. In a message to the foundation-l mailing list, Deputy Director Erik Moeller clarified that "the idea is to experiment with some alternative approaches in parallel with the existing deployment, not to scrap the existing deployment and start over immediately". A log of the office hours can be found here.
  • Data analysis results published: The results of a data analysis competition looking at editor retention were published this week (Wikimedia blog). As the Foundation's Diederik van Liere and Howie Fung explained, "what the four winning models have in common is that past activity and how often an editor is reverted are the strongest predictors for future editing behaviour". Unfortunately, errors in the dataset used by the model that came first have since significantly impaired its ongoing usefulness.
