Wikipedia:Wikipedia Signpost/2014-09-17/News and notes

News and notes

Wikipedia's traffic statistics understated by nearly one-third

A Wikipedia researcher has discovered that the encyclopedia's widely used article traffic statistics are missing out on approximately one-third of total views.

Computer scientist Andrew West has found that mobile readers are not counted by stats.grok.se (an unofficial website linked from the "history" tab on every Wikipedia page) or by any other service or report that tabulates and visualizes the Wikimedia Foundation's official raw data. Because of a historical artifact, desktop and mobile counts have been kept separate since the figures were first released in 2007. "The world has changed a lot since the original code was written," the WMF's director of analytics, Toby Negrin, told the Signpost. "We are working hard to catch up."

Impact

Of the 9.5 billion total views to the English Wikipedia in August 2014, about 3 billion (31.6%) are not reported in the raw per-article statistics. Other projects are assumed to have similar omissions, in proportion to their own mobile viewership.

West told the Signpost he ran into the problem when collating view statistics for the English Wikipedia's Medicine WikiProject. The figures are being used in an upcoming academic paper comparing Wikipedia to WebMD, the World Health Organization, the National Institutes of Health, and other high-traffic medical websites. West caught the error early enough to add a disclaimer, but he's "curious and fearful as to how many other WikiProjects and researchers might have fallen into the same trap."

Unfortunately, that number is not zero. Variety's new "Digital Audience Ratings", for example, use Wikipedia's traffic statistics as a key input. Jason Klein of ListenFirst, the company writing the posts, said in an interview with Lost Remote that "We have been monitoring Wikipedia page views daily for tv shows (as well as films and consumer brands) for over two years, and have found fascinating trends ..." (Editor's note: for additional information, please see this week's "In the media".)

Also affected is the English Wikipedia's list of the 25 most-viewed articles (ten of which are featured in the Signpost's weekly "Traffic report"). All of these initiatives miss what West calls the mobile "bump" that popular culture and breaking-news events generate.

The greatest impact may fall on readers in the global south, where a higher share of people browse the web on mobile phones. Expensive traditional computers can be out of reach for large segments of the population, many of whom have turned instead to smartphones; this was a chief inspiration for the Wikimedia Foundation's Wikipedia Zero project. Pgallert described the scale of the computer-access problem on the Wikimedia blog last year:


Future

Negrin told us that the analytics team is aware of the problem and is working to replace the current apparatus with a "modern, scalable system," a preliminary version of which will come out next quarter. The team is also redefining what a "page view" is, taking into account modern concepts such as mobile apps and the mobile web, API requests, and automated bots. Negrin added that "fortunately, we'll be in a position very soon to provide more accurate data to the Foundation and the Community."

The work involved is substantial. As research analyst Oliver Keyes wrote to us, "The overall page view trends are of increasing importance to how we understand how people consume our site. At the moment we ... have a lot of ideas and a lot of the nuts and bolts worked out and tested, but it's fairly inchoate and needs to be organised better before we do anything with it. Once we have done that, we'll move on to implementing it and running it in parallel to the existing infrastructure to detect irregularities."

In the meantime, the unofficial status of grok.se (it is still listed as a "beta service") and the varying reliability of the WMF's data dumps leave researchers like West in the lurch. For example, grok.se periodically shows no data for entire days (such as 28 August), which invariably draws frustration toward the website's coder, Henrik, even though the problem lies in the WMF-released data. In the 28 August case, the raw traffic statistics for five hours (16:00–21:00 UTC) are missing.

It appears that statisticians, researchers, and curious Wikimedia contributors will have to wait only a little longer for a more stable and reliable solution.

Editor's note: emails to Henrik, the owner of stats.grok.se, and Domas Mituzas, the former WMF database administrator who originally coded the raw data output, were not returned by publishing time.
Update: the Wikimedia Foundation released a new Pageview API in December 2015, and stats.grok.se was replaced by a WMF Labs tool in January–February 2016.

In brief

Rachel diCerbo, new manager of the WMF's Community Engagement (Product) team.
  • Indian chapter in crisis: A major community consultation about the future of Wikimedia work in India will be held in Bangalore on 4 and 5 October. The consultation follows a community planning process and comes amid general recognition that key parts of the movement's presence and programmatic activities in India need to be revamped. The unstable situation in India includes the state of the chapter, which held an emergency meeting on 31 August. Three members of the chapter board have resigned in recent weeks: the president, Moksh Juneja, who was the subject of Signpost coverage concerning his failure to disclose to voters before last year's board elections that two sitting members were in his employment; Pranav Curumsey; and Srikanth Ramakrishnan ("I am not pleased with the way things are working out right now"). The board had already lost another member at the end of June, when Nikita Belavate did not renew chapter membership. Ramakrishnan wrote last week that "since the chapter is now in a limbo", an administrator should be appointed "to conduct the Annual General Meeting and elections as soon as possible and run the organisation in the interim."
  • Template reform: A "Let's fix templates" thread was opened on the Wikimedia mailing list after heated discussions concerning the Media Viewer roll-out and the difficulty of developing software products that must cope with a sprawl of inconsistently built templates on WMF sites. The WMF has since launched a metadata cleanup drive on Meta. The goal is to "fix file description pages and tweak templates to ensure that multimedia files consistently contain machine-readable metadata" across WMF projects.
  • Community feedback on product development: Editors are invited to visit the page for community feedback and discussion on improving the ways in which software components are built and delivered to communities. The page was set up by the Foundation's relatively new Community Engagement (Product) team, headed by Rachel diCerbo. Editors of all WMF projects are encouraged to take part on the talk page.
  • Affiliations Committee: Three new user groups have been approved by the WMF's AffCom: Cascadia, LGBT, and Ghana.
  • IEG: The second round of individual engagement grants is open for submissions.
  • Stub contest: The stub contest is open until 30 September; prizes will be awarded to the winners.