We have clear guidelines on how best to disambiguate (distinguish) article names from each other. With the exception of place names, which use commas, these require the use of bracketed terms, for example Robert Smith (musician) and Robert Smith (mathematician). In practice, this ideal relies on good naming conventions and an understanding of which common terms to use: for example, Joe Bloggs (trainer) or Joe Bloggs (coach)? I aimed to analyse existing disambiguations and to spot useful trends and biases in the way users disambiguate.

Previous studies

The author of this study respects the work done by Kevinkor2 when he compiled his report in January 2007. Many of the conclusions drawn from this study concern the changes between these two points in time; it is assumed that similar criteria were used (only mainspace pages, for example) or that any differences were statistically insignificant (discarding redirects, for example).

Method

  • Using the Toolserver, a list of all article names (not including talk pages, subpages or redirects) containing brackets was created. This list was accurate as of 1 May 2009. 320,000 records were collected.
  • Everything except the contents of the brackets was stripped away and discarded.
  • The term 'disambiguation' was also discarded, to narrow the sample. 293,000 records remained.
  • A sample of 65,535 records (22.3%) was kept; the rest were discarded.
  • The sampled terms were analysed by passing them through regexes, and the number of terms matching each regex was recorded.
  • Total counts for the top 50 individual terms were made, correct as of 9 May 2009. These refer to the total population, i.e. all articles on Wikipedia, rather than to the sample.
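
The steps above can be sketched roughly in Python. This is only an illustration, not the code actually used for the study: the function names (`bracket_terms`, `count_matches`), the example titles and the two regexes are all hypothetical. It also assumes that the bracketed term sits at the end of the title and that the sample was drawn at random, neither of which is stated above.

```python
import random
import re
from collections import Counter

# Assumption: the disambiguator is the last bracketed group in the title,
# e.g. "Robert Smith (musician)" -> "musician".
BRACKET_TERM = re.compile(r"\(([^()]*)\)\s*$")


def bracket_terms(titles):
    """Keep only the bracket contents, dropping 'disambiguation' pages."""
    terms = []
    for title in titles:
        match = BRACKET_TERM.search(title)
        if match is None:
            continue  # no bracketed term in this title
        term = match.group(1).strip()
        if term.lower() == "disambiguation":
            continue
        terms.append(term)
    return terms


def count_matches(terms, patterns):
    """Count how many terms match each labelled regex."""
    counts = Counter()
    for term in terms:
        for label, pattern in patterns.items():
            if re.search(pattern, term):
                counts[label] += 1
    return counts


# Hypothetical input; the real list came from the Toolserver.
titles = [
    "Robert Smith (musician)",
    "Robert Smith (mathematician)",
    "Mercury (disambiguation)",
    "Titanic (1997 film)",
    "Paris, Texas",
    "Joe Bloggs (coach)",
]

terms = bracket_terms(titles)

# Assumption: a random sample, capped at the sample size quoted above.
sample = random.Random(0).sample(terms, k=min(len(terms), 65535))

# Hypothetical regexes; the ones used in the study are not listed here.
patterns = {"year": r"\b\d{4}\b", "film": r"\bfilm\b"}
pattern_counts = count_matches(sample, patterns)

# Top individual terms, most frequent first.
top_terms = Counter(terms).most_common(50)
```

A term can match more than one regex ("1997 film" matches both of the examples here), so the per-regex counts need not add up to the sample size.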

Findings

These were my findings, including a list of the top 50 disambiguation terms. The full lists are available via the navigation box below.