Wikipedia:Wikipedia Signpost/2011-05-30/Technology report

Technology report

Wikimedia down for an hour; What is: Wikipedia Offline?

Wikimedia wikis down for an hour

As noted in last week's "Technology Report", Wikimedia wikis underwent a scheduled downtime of one hour on Tuesday 24 May at around 13:00–14:00 UTC. The downtime means that, for this calendar year, the Foundation has already missed previously aired targets of limiting downtime to just 5.256 minutes per annum (equivalent to 99.999% uptime) or 52.6 minutes (99.99% uptime). However, the work does appear to have succeeded in reducing the number of out-of-date pages served to readers, along with other similar problems.
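The downtime figures quoted above follow directly from the number of minutes in a year. As a minimal sketch of the arithmetic (assuming a 365-day year; the function name is purely illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year permitted by a given uptime percentage."""
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


OUTAGE_MINUTES = 60  # the one-hour scheduled downtime on 24 May

for target in (99.999, 99.99):
    budget = downtime_budget_minutes(target)
    status = "exceeded" if OUTAGE_MINUTES > budget else "within budget"
    print(f"{target}% uptime allows {budget:.3f} minutes per year: {status}")
```

A single hour of downtime exceeds both budgets, which is why either target is already out of reach for the year.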

During the downtime, designed to allow the operations team sufficient time to "update the router software and tune the configuration", access to Wikimedia sites was intermittent. The episode and associated issues were alluded to by cartoonist Randall Munroe in his comic strip xkcd (see also this week's "In the news" for more details). Wikimedia developers enjoyed dissecting the technical aspects of the cartoon on the wikitech-l mailing list.

What is: Wikipedia Offline?

Many Wikipedia editors can now access the Internet from multiple locations: at home, at work, even on the go with smartphones. In 2010, however, only 30% of the world had any access at all to the so-called "World Wide Web", even when the high rates of availability found in the developed world are allowed to skew the data (source: CIA World Factbook). Since the Wikimedia Foundation's aim is to "encourage the growth, development and distribution of free, multilingual content", it is clear that either the remaining 70% will have to be supplied with the Internet so they can access the online versions of Wikimedia wikis, or the Wikimedia wikis will have to be provided in an offline-friendly format. By contrast, 50% of the world has used a computer, according to Pew Research. The "Wikipedia Offline" project, then, is a WMF initiative aimed at spreading its flagship product freely to the two billion people who use a computer but cannot access the Internet.

There are two parts to the challenge. The first is ensuring that there are Wikipedias in as many languages as possible. The proportion of people for whom a Wikipedia exists in a language they speak was recently estimated at above 98% (foundation-l mailing list); about 82% have a Wikipedia in their native tongue (also foundation-l). The second is the technical challenge of supplying the information. The Foundation's current strategy is to continue to make the raw data of Wikipedias available via so-called "dumps", while simultaneously supporting open-source programs that can process these files. In combination, this will allow whole Wikipedias either to be downloaded when an Internet connection is available, or to be shipped on DVDs or other portable media. This runs alongside the Foundation's existing project to select the most useful articles from a given Wikipedia, thereby condensing an encyclopedia onto a single CD.

While "dumps" are largely tried and tested (though recent work has focussed on improving their regularity and reliability), there have also been efforts to enable the export of smaller "collections" of articles, for example those relating to major health issues faced by developing countries. This was made possible in part by a new export format (ZIM, developed by the openZim project) that can be read by some offline readers. However, ongoing efforts focus mainly on the second half of the strategy: the provision of a good-quality reader capable of displaying offline versions of wikis. A number of possible readers were tested. The "Kiwix" reader was selected in late 2010, and the Foundation has since devoted time to improving its user interface, including by translating it. There is also competition from other readers, including "Okawix", the product of the French company Linterweb. User:Ziko blogged last week about the differences he found between the two. Which, if either, will become the standard is unclear, because the area is moving so quickly.

See also: Wikimedia strategy document, update on Wikimedia's progress (as of March 2011).

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.

  • On 26 May, user MZMcBride reported at the administrators' noticeboard that some deletion debates and arbitration pages were being indexed by Google (that is to say, they appeared in Google's search listings, despite this being prohibited by the English Wikipedia's "robots.txt"). The problem was traced to a complexity in Google's spidering system, which does not equate ordinary characters (such as ":" and "/") with their encoded forms ("%3A" and "%2F"). As a result, effective blocking requires a number of additional variants to be listed. Within 50 minutes the Foundation's operations engineer Ryan Lane was working on the case, and NOINDEX code was added to relevant templates. NukeBot (run by admin NuclearWarfare) also began to add the directive to each page in turn to enforce non-spidering. Afterwards, bug #29162 was opened to propose handling such cases automatically in future.
  • David Gerard launched a vocal attack on the GFDL documentation licence recommended to developers by the Free Software Foundation. Instead, he advised that "[developers should] use CC by-sa, CC-by or Public Domain ... If it's a software manual, [they should] license or dual-license it under the same licence as the software itself".
  • Wikimedia developers honoured a request from the Foundation's legal department (one of what are termed "office actions") to delete certain image files permanently from Wikimedia servers (server admin log).
  • The codebase of the anti-vandalism tool Twinkle was updated, prompting a number of bug reports and some loss of functionality during the transitional period. It is now "gadget only", according to its developer User:AzaToth (English Wikipedia's Technical Village Pump). Many other Wikimedia wikis have their own copy of the tool, and many of these copies will have to be updated manually.
  • Bug #27465, which prevented the SVG parser from rendering unusual but perfectly valid images, was fixed.
  • Magnus Manske, one of the original developers of the MediaWiki software, began a new blog. His first blogpost concerned one of the gadgets he has written for Wikimedia sites, "Commons Commander".
  • The Article Feedback extension for rating articles was listed on the Foundation's "Software deployments" page to be expanded to all articles on the English Wikipedia on 31 May. The lack of publicity given to the deployment raised criticism from some quarters, particularly in the light of recent controversies about the Pending Changes feature (example 1, example 2). Erik Möller explained that the page was in error, and instead announced that the tool would be rolled out incrementally over the next few weeks. In related news, a fix preventing the tool from appearing on redirect pages was pushed live to Wikimedia sites (bug #29164).
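The robots.txt item above notes that blocking a page effectively requires listing variants in which ":" and "/" appear in their encoded forms. As a rough sketch of how such a list could be generated (the function name and the example path are hypothetical and are not taken from the English Wikipedia's actual robots.txt; this simplified version encodes every occurrence of a character together rather than each occurrence independently):

```python
from itertools import combinations
from urllib.parse import quote


def disallow_variants(prefix: str, title: str, special: str = ":/") -> list[str]:
    """Build Disallow lines for a page title, covering plain and encoded forms.

    For every subset of the special characters present in the title, all
    occurrences of those characters are percent-encoded (":" becomes "%3A",
    "/" becomes "%2F"). The prefix itself is left untouched.
    """
    present = [ch for ch in special if ch in title]
    lines = []
    for size in range(len(present) + 1):
        for subset in combinations(present, size):
            encoded = title
            for ch in subset:
                # safe="" forces quote() to encode "/" as well
                encoded = encoded.replace(ch, quote(ch, safe=""))
            lines.append(f"Disallow: {prefix}{encoded}")
    return lines


# Hypothetical example path, for illustration only
for line in disallow_variants("/wiki/", "Wikipedia:Articles_for_deletion/"):
    print(line)
```

For a title containing both characters this yields four lines: the plain form, one with only ":" encoded, one with only "/" encoded, and one with both encoded.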