Wikipedia talk:Overlink crisis
Technical implications
There are none, as I corrected the page to say. Database storage alone is not a major cost for Wikimedia and probably will not be for the foreseeable future, compared to its overall technical budget (which is increasing exponentially over time). There are data warehouses that easily deal with orders of magnitude more data than Wikipedia contains. If storage should ever become a problem in the future, the ones paid by the Wikimedia Foundation to pay attention to such things will recognize and deal with the issue in whatever way is best. Until then, you can assume that you don't need to do anything special to avert such problems.
In particular, I would like to comment on this paragraph that I removed:
The guideline WP:PERF focuses on the wiki server-performance issues, not on article-display issues, basically stating that the wiki developers have purposely limited or delayed the server operation to prevent groups of users from creating denial-of-service events. The guideline really focuses on minute-to-minute response time, not on long-term plans about storing Wikipedia data. The guideline admits exceptions for unallowable pages, such as when the servers limit and truncate very large pages that might hog server operation. However, if an editor creates a page with a 1-megabyte moving graphic image, that is not a server-side concern, and readers will simply have to wait until the 1-megabyte graphic is transferred into their browsers. The guideline WP:PERF basically states that no single user can stop server response for all other users, but it doesn't mean users can't systematically make a set of pages way too big or way too slow for comfortable viewing. The wiki servers will simply delay the viewing/editing of big pages, allowing other readers to view/save their smaller pages comfortably.
The guideline is every bit as relevant to long-term as to short-term issues. It is relevant to any issue involving server resources alone. While it's correct (as I have previously clarified) that it does not apply to the very narrow segment of performance concerns that deal with client-side limitations, such as limitations in viewers' bandwidth or processing power, the question of database space is very much a server issue, and as such is the responsibility of the server administration team, not Wikipedians. Of course, adding more links also makes pages' HTML source larger, and so slightly increases the time that people need to take to load pages, but for articles large enough to be problematic, it's the article text that's the problem, not the infoboxes.
I have no particular comment on the merits of the idea outside of technical implications. You could certainly argue that a few concise links are more useful than a forest of them. I've never really found myself using the link-boxes at all, personally. Could be they're just useless clutter. —Simetrical (talk • contribs) 01:10, 14 February 2008 (UTC)
The author of the stuff I removed (Wikid77) re-added it more or less verbatim without addressing my rather lengthy response above. I've removed it again until, at a minimum, he at least responds to my objections here. I would preferably like to see him explain why he thinks that he's in any position to talk about performance problems without having spoken to any developer or system administrator or performance engineer about it, or, from what I can recall, participating in any way at any time in the development of the software or operation of the servers. Thanks. —Simetrical (talk • contribs) 22:34, 7 September 2008 (UTC)
- 13-Nov-2008: I had thought that by re-adding that text (on 7 September 2008), that User:Simetrical would read and understand it: it simply says that if a user creates a huge page, it will be hugely slow to edit or display, and that's not the fault of the servers or any developer. Please calm down, I'm not blaming anyone that Wikipedia is slow or is bloated with wikilinks. There's no need to censor those statements in fear of being blamed. I'm an experienced system manager and computer scientist (including 18-variable commercial loan calculations, and credit life insurance). So, I understand the concern "why he thinks that he's in any position to talk" when it comes to analyzing technical "Capacity planning" with computers. Also, I have worked on numerous navboxes. These matters can be very difficult to predict, and many computer people would be puzzled by the math. I'm not the enemy here, I'm just trying to reduce articles to half size (or smaller). Unless you profit by selling thousands of disks to Wikipedia, I am not the enemy. -Wikid77 (talk) 07:23, 13 November 2008 (UTC)
- I read the text the first time. It was just as groundless the second time. If you want to advocate that pages be made smaller or have fewer links as an editorial choice, then go ahead. If you want to argue they'll make anything slower, that needs to be backed up by facts, because your theory is wrong. —Simetrical (talk • contribs) 02:59, 14 November 2008 (UTC)
Overlink sizing example
13-Nov-2008: I will give an example of overlink sizing math so that the numbers can be seen: consider 4,000 articles that each have the current average of 100 revisions ("100.69" on 13-Nov-2008), and each article has remained at 5,000 bytes, with each delta revision stored as 1,000 bytes (to simplify sizing). The total space for those 400,000 article revisions would be as follows:
- total_page_space = article_space + revisions_space = 420 million bytes, where,
- article_space = 4,000 * 5,000b = 20,000,000 bytes
- revisions_space = 4,000 * 100 revisions * 1,000b = 400 million bytes.
Now consider that each article contains a navbox linking the other 4,000 small articles, with the navbox using 4,000 20-letter wikilinks (3,999 + a header link), so a navbox is over 80,000 bytes (20 x 4,000). The extra wikilinks number 4,000 x 4,000 (quadratic), or 16,000,000 wikilinks in total. Assuming each wikilink is stored as 30 bytes in the page-link database(s), then the total is:
- total_link_space = 30 * 16,000,000 = 480 million bytes
In this example, with just one navbox per article, the storage for extra wikilinks in the page-link database(s) has exceeded the entire storage of the 4,000 pages (articles+revisions).
Now imagine a likely scenario where several "tangent" navboxes (totalling 1,000 more wikilinks) are added to each of those 4,000 articles, so the wikilinks stored per article increase by those 1,000:
- total_link_space = 4,000 * (4000+1000 links)*30 = 600 million bytes.
In that likely scenario, with multiple navboxes per article, the total space for wikilinks is nearly 1.5 times the total storage of the articles modified 100 times each.
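The arithmetic above can be checked with a short script. This is only a sketch of the example's own numbers: the constants are the hypothetical figures from this comment (4,000 articles, 5,000 bytes each, 100 revisions of 1,000 bytes, 30 bytes per stored wikilink), not measurements of the real database, and the function names are made up for illustration.

```python
# Hypothetical figures copied from the sizing example; none are measured values.
ARTICLES = 4_000
ARTICLE_BYTES = 5_000
REVISIONS_PER_ARTICLE = 100
DELTA_BYTES = 1_000
BYTES_PER_LINK = 30  # assumed cost of one stored wikilink


def page_space(articles=ARTICLES):
    """Bytes for the articles plus all of their stored revisions."""
    article_space = articles * ARTICLE_BYTES
    revisions_space = articles * REVISIONS_PER_ARTICLE * DELTA_BYTES
    return article_space + revisions_space


def link_space(articles=ARTICLES, links_per_article=ARTICLES):
    """Bytes for the stored wikilinks when every article carries the navbox."""
    return articles * links_per_article * BYTES_PER_LINK


if __name__ == "__main__":
    pages = page_space()
    one_navbox = link_space()
    with_tangents = link_space(links_per_article=ARTICLES + 1_000)
    print(f"pages and revisions: {pages:,} bytes")
    print(f"links, one navbox:   {one_navbox:,} bytes")
    print(f"links, plus tangents: {with_tangents:,} bytes")
    print(f"ratio links/pages:   {with_tangents / pages:.2f}")
```

Changing `links_per_article` shows how strongly the result depends on the assumed navbox size, which is the premise under dispute in this thread.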
Note, this is a simplified example that doesn't even consider the formatting of each small article to transclude more than 80,000 bytes of navboxes appended at the end and sent over the Internet. Plus, to simplify viewing an article, more navboxes will be appended with "Show/Hide" so the thousands of wikilinks are not always seen by a reader. An article "Einstein" can have navboxes for "Famous physicists", "Nobel winners", "Quantum pioneers", "German geniuses", "Swiss workers", "Zurich professors", "NAZI victims", "UN advocates", "Ban the Bomb", "Violin scientists", "Double divorces" and many more. That was half-joking, but you should see other wild-tangent navboxes being piled up. If each navbox was just the compressed navpage link, then Einstein navboxes would reduce to less than 30 wikilinks, rather than thousands. -Wikid77 (talk) 07:23, 13 November 2008 (UTC)
- Well, your premises seem to be pretty ludicrous to me (how many navboxes have 4,000 links in them?!). But it doesn't matter. Even if the amount of space used scaled quadratically with the number of articles, it wouldn't matter much. Quadratic is not very fast. The database size has been increasing more or less exponentially for all of Wikipedia's existence. Wikimedia's technical budget has also been increasing exponentially, so it's able to handle the added load with few problems. You have given no plausible reason why this marginal added increase in database size would cause any of the catastrophes your text outlined; and why, if it would cause such problems, it hasn't already with 15,000,000 pages on enwiki. In short, your claims are unjustified by either theoretical or empirical evidence. —Simetrical (talk • contribs) 02:59, 14 November 2008 (UTC)
Implications for 'What links here?'
Hello - one of the most useful Wikipedia features is the 'What links here?' link. Once an article is included in a popular navbox, the specific useful in-links are dwarfed by a huge number of links from merely tangentially related pages. This is a real over-link crisis. What's particularly annoying is that the navboxes provide no real new functionality, just the questionable convenience of not needing to click twice. By contrast, once the 'What links here?' page is swamped by template links, that feature is essentially ruined. I also suspect that navboxes are created simply to add visual interest.
Anyway, as far as I can tell, nobody else is bothered by this. However, this seemed a good place to raise my concerns.
Until a technical fix exists, I would like to see link-stuffed navboxes removed. - Crosbiesmith (talk) 20:20, 30 September 2008 (UTC)
- 13-Nov-2008: Thank you for noting that problem about users trying to find "real links" to an article, where those wikilinks directly mention the article, in context with the descriptive text. -Wikid77 (talk) 07:23, 13 November 2008 (UTC)
- I'm confused... if those links are only visible due to being included in a template, then how come they still show up when one clicks "hide transclusions"? --Explodicle (T/C) 21:09, 30 September 2008 (UTC)
- 13-Nov-2008: I tried it & agree that "hide transclusions" is broken on a "What-links-here" page. See work-around below. -Wikid77 (talk) 07:23, 13 November 2008 (UTC)
Showing real links to an article
13-Nov-2008: When clicking "What links here" (left-menu of a page), all links to an article are shown by default. Those links also include links from navboxes, where the wikilink is not really mentioned directly in the article text. Well, what if you want to list just pages that actually mention the wikilinked article, in context, as part of the upper text? Supposedly, clicking "Hide transclusions" would omit those transcluded navbox links; however, in November 2008, that option appeared broken, and the direct links were shown by "Hide links" instead:
- clicking "Hide transclusions" - doesn't work to omit navbox links;
- clicking "Hide links" - actually hides navbox links, showing only direct links from article text, not from templates (infoboxes or navboxes).
So, click the option "Hide links" to hide navbox or infobox references to a particular page on What-links-here. What appear to be "broken" click-options might be valid display-options instead: when it says "Hide transclusions | Show links" then that's what you get (don't view those as buttons, but as options-in-effect). However, I understand that it is broken, in the sense that "Show transclusions | Show links" then shows nothing. It would be great if experienced wiki-users could help fix those problems in the wiki-software, but that's controlled by a different organization, "MediaWiki" not Wikipedia. -Wikid77 (talk) 07:23, 13 November 2008 (UTC)
- Hi Wikid77. That doesn't work for me - 'Hide links' hides everything but redirects. See for example Pages that link to "Galloway hydro-electric power scheme". - Crosbiesmith (talk) 07:47, 15 November 2008 (UTC)
- I agree, doesn't work for me either. Has anyone found any reference to this at mediawiki?--Larrybob (talk) 17:07, 16 January 2009 (UTC)
Proposal on the use of redirect
I have a proposal to deal with this over the medium term. Navboxes should not directly link an article; instead they should link a redirect specially created for the purpose. Because 'What links here?' indicates which links arrive via a redirect, it becomes possible to differentiate between specifically created links and those which were simply included as a template. I propose that such specifically created redirects should take a name ending in '(Template)' to indicate their purpose.
As an example, see the template Template:Scottish Energy. I have altered it such that it now links to Galloway hydro-electric power scheme (Template), instead of the article itself. All future links generated by this template will now be apparent in the Pages that link to "Galloway hydro-electric power scheme" report.
Note that this does not work overnight. Links are updated only when the article itself is updated, not when the template is changed. As of today, only a few links in the 'What links here?' report have been moved, as only a few of the transcluding articles have been updated. Over time, however, most or all other articles will be touched, and 'What links here?' will gradually be decluttered, greatly increasing its usefulness. This proposal need not be adopted wholesale to be useful - it could be adopted as and where a need arises to declutter particular articles' 'What links here?' lists.- Crosbiesmith (talk) 19:58, 1 February 2009 (UTC)
What purpose does this serve? The only problem here is aesthetic, and this does absolutely nothing to change the aesthetics of the situation. It just makes things uglier and more confusing. —Simetrical (talk • contribs) 19:05, 2 February 2009 (UTC)
- The purpose is to allow users to tell which articles link to a particular page. The purpose is to make the 'What links here?' function serve the purpose it served before navboxes became ubiquitous. The aesthetic problem is not the only problem - Crosbiesmith (talk) 20:13, 2 February 2009 (UTC)
Can we do this?
I had basically the same idea @ Help_talk:What_links_here#Question._How_do_we_hide_excessive_linkage.3F. This could be implemented by a bot: a bot could go to every navbox, replace every ordinary link with the transformed alternative, make sure that the redirect (from the alternative to the original link) has been created correctly, and generally ensure that no page except the navbox explicitly uses the transformed alternative link form. I suggest that the alternative, rather than appending " (Template)", append " (from Template:Scottish Energy)", which is much clearer for users to understand and allows distinguishing between multiple navboxes containing the same links.
How do we go ahead and get this done? Do we need to find somebody with a bot account and a server? Is there somewhere we can propose and vote for this to be implemented? Somepage more watched than here? Do we need to fully implement it by hand on some example navboxes as a demonstration first? It seems that this page is not really the optimal forum, since we're not concerned with "excessive linkage" per se, just with practical nonfunctionality of "what links here" due to navboxes. Cesiumfrog (talk) 03:09, 19 September 2011 (UTC)
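For discussion, the link-rewriting step such a bot would need might look roughly like the sketch below. It only transforms wikitext in memory and returns the redirect pages that would have to exist; it does not log in, fetch, or save anything. The function name `retarget_navbox_links` and the sample titles other than those mentioned above are hypothetical, and the colon check is a crude assumption that would wrongly skip article titles that themselves contain a colon.

```python
import re

# Matches [[Target]] and [[Target|label]]; nested brackets are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def retarget_navbox_links(wikitext, template_title):
    """Point each plain article link at a per-template redirect title.

    Returns the rewritten wikitext and a dict mapping each new redirect
    title to the wikitext that redirect page would need.
    """
    suffix = f" (from {template_title})"
    redirects = {}

    def swap(match):
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) is not None else target
        # Assumption: a colon means another namespace or an already
        # rewritten link; a '#' means a section link. Leave both alone.
        if ":" in target or "#" in target:
            return match.group(0)
        alias = target + suffix
        redirects[alias] = f"#REDIRECT [[{target}]]"
        return f"[[{alias}|{label}]]"

    return LINK_RE.sub(swap, wikitext), redirects


if __name__ == "__main__":
    navbox = (
        "* [[Galloway hydro-electric power scheme]]\n"
        "* [[Example Dam|the dam]]\n"  # hypothetical entry
    )
    new_text, needed = retarget_navbox_links(navbox, "Template:Scottish Energy")
    print(new_text)
    for title, content in needed.items():
        print(title, "->", content)
```

Because already-rewritten links contain a colon, running the function twice leaves the text unchanged, which matters if a bot revisits the same navbox.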
Fixable by bots?
I believe that much of the overlink crisis could be dealt with by bots. (i) Bots could remove certain links automatically. Otherwise manual cleanup could be assisted by a bot that inserts either (ii) comments or (iii) cleanup-required templates to bring suspected problems to the attention of editors. A combination of link text content and context could be used to drive decisions. If a link is found to be repeated, this might be used to drive a "first instance only" culling policy, but this might be an example of where it would be better to let human intelligence make the choice as in a long article the presence of repeated duplicate links might be judged to be beneficial. Suggestions? CecilWard (talk) 10:33, 14 July 2009 (UTC)
- If it isn't fixable by 'bots, it would certainly help if software tools were made available to editors to find and categorize repetitive links in articles. When I encounter articles that are loaded with wikilinks and notice the same links used over and over throughout because the author or previous editors were too lazy to remove brackets during copy-paste operations, I will leave the first instance as a link and remove links from most, if not all, subsequent instances. This is done with a tedious find-and-replace procedure in an external text editor. I envision an analysis tool that finds all links that occur two or more times in the body text (not in refs or footnotes) and lists them in descending order of frequency for the editor to decide which ones to keep and which ones to remove. That way, the most egregious overlinking could be handled first. — Quicksilver (Hydrargyrum)T @ 19:00, 4 October 2019 (UTC)
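A rough sketch of the kind of analysis tool described above, working on pasted wikitext: it drops `<ref>` content, counts link targets, and lists those used two or more times, most frequent first. The function names are invented for illustration, the regular expressions are a simplification (templates, nested links, and comments are not parsed), and treating the first letter of a title as case-insensitive is an assumption about how the wiki is configured.

```python
import re
from collections import Counter

# Self-closing <ref ... /> tags and <ref>...</ref> pairs.
REF_RE = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)
# Captures the target of [[Target]], [[Target|label]] or [[Target#Section]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)[^\[\]]*\]\]")
SKIPPED_PREFIXES = ("file:", "image:", "category:")


def normalise(target):
    """Underscores to spaces, trim, and capitalise the first letter."""
    title = target.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def repeated_links(wikitext, minimum=2):
    """Return (target, count) pairs for targets linked at least `minimum` times."""
    body = REF_RE.sub("", wikitext)  # ignore links inside refs and footnotes
    counts = Counter()
    for match in LINK_RE.finditer(body):
        target = normalise(match.group(1))
        if target and not target.lower().startswith(SKIPPED_PREFIXES):
            counts[target] += 1
    repeated = [(title, n) for title, n in counts.items() if n >= minimum]
    # Descending frequency, then alphabetical for a stable listing.
    return sorted(repeated, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = (
        "[[Volkswagen]] is a [[Germany|German]] automaker based in "
        "[[Wolfsburg]], [[Germany]].<ref>See [[Germany]].</ref> "
        "The [[volkswagen]] marque belongs to the [[Volkswagen Group]]."
    )
    for title, n in repeated_links(sample):
        print(f"{n:3d}  {title}")
```

The output is only a worklist; deciding which repeats to keep would still be left to the editor, as suggested above.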
Link density could be selectable by user preference
The essay says (somewhat weaselly): "... some users may find the information overload or clutter of too many wikilinks to be an aesthetic or usability problem." This could well be true. And some other users may benefit from having lots of links, even on common words. One size does not fit all. A compromise would be to classify links according to relevance (with links on common words having lowest relevance, and links on topic-specific jargon having highest relevance), and then a user preference setting would determine the link relevance level to display. Linking the jargon is helpful for (probably) most readers and editors, particularly when the jargon is not standardized, and unlinked jargon won't as easily alert editors to synonym disease between related articles. For example, in power station articles there are many synonyms for terms such as capacity factor and nameplate capacity, so we want all instances of such terms and their synonyms to be linked at least once per screen. In some cases no good link exists yet for various terms that end up getting independently and inconsistently redefined in several articles (such as the often-confusing term "additionality" which appears in several articles about economic development and carbon offsetting; we have a Wiktionary link for it, but the definition there does not cover all its usage on Wikipedia).
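To make the preference idea concrete, here is a toy sketch of filtering links by relevance level. Nothing like this exists in the software as far as this comment knows: the three levels, the `relevance_of` lookup table, and the function `render_links` are all hypothetical, and the hard part (deciding each link's relevance) is simply assumed to have been done already.

```python
import re
from enum import IntEnum


class Relevance(IntEnum):
    """Hypothetical relevance levels, lowest to highest."""
    COMMON_WORD = 1
    GENERAL = 2
    JARGON = 3


# Matches [[Target]] and [[Target|label]]; nested brackets are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def render_links(wikitext, relevance_of, min_level):
    """Keep links at or above `min_level`; show the rest as plain text."""

    def swap(match):
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) is not None else target
        # Unclassified links default to the middle level (an assumption).
        level = relevance_of.get(target, Relevance.GENERAL)
        return match.group(0) if level >= min_level else label

    return LINK_RE.sub(swap, wikitext)


if __name__ == "__main__":
    text = "The [[capacity factor]] of a [[Germany|German]] plant"
    levels = {
        "capacity factor": Relevance.JARGON,
        "Germany": Relevance.COMMON_WORD,
    }
    # A reader who asked for jargon links only:
    print(render_links(text, levels, Relevance.JARGON))
    # A reader who wants every link:
    print(render_links(text, levels, Relevance.COMMON_WORD))
```

The same stored wikitext serves both readers; only the display step differs, which is why this would be a preference rather than an editing rule.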
Navboxes seem to be an issue quite different from in-line links. Navboxes that already collapse to a single line at the very bottom of an article do not "clutter" an article in the same way that in-line links do. An actual usability study would be necessary to determine that. I would imagine the lead section gets more views than any other part of an article, with views decreasing toward the bottom, possibly in a Pareto distribution. (However, a usability study would have to determine that.) "Boxifying" a complex parent topic can be quite helpful in a subsidiary article which might otherwise not indicate the larger topic structure or the minor topic's place within it. Having a navbox helps editors standardize their presentation of articles that share a parent taxonomy. Otherwise we get different groups of editors writing redundant, unmaintainable, and sometimes conflicting parent topic summaries within minor topics. However, I agree that a navbox should not try to duplicate an entire "Index of ..." article.
External links are in another class again; they proliferate for different reasons (e.g. promotion or advocacy), and we have other rules to keep a lid on them (WP:EL, WP:SPAMHOLE, etc.). --Teratornis (talk) 00:00, 11 March 2011 (UTC)
Not just navboxes: in-text wikilinks posing a serious threat
First of all, I'd like to emphasize the importance of this article and how glad I am that this problem has been taken seriously in the Wikipedia Community. In my own experience as a contributor, however, it isn't just the navboxes that are causing a problem. It is also the regular in-text wikilinks.
Excess linking not only leads to a quadratic explosion in the total number of links, but also reduces the quality of articles, making them hard and unpleasant to read. Excess linking also does contributors a disservice, as they might not really explain the terminology they are using, but just throw in a wikilink instead.
I personally think of it like this: if you were writing an article on paper, how often and on which occasions would you add a parenthetical reference to an external source? E.g. Volkswagen is a German (see Bayer, H.: World Geographic Atlas, 2013) automobile manufacturer headquartered in Wolfsburg, Lower Saxony, Germany (see Dickinson, R. E.: Germany: A General and Regional Geography, 2011). Volkswagen is the original and top-selling marque (see Kotler, P.: Principles of Marketing, 2006) of the Volkswagen Group, the biggest German automaker and the second largest automaker in the world.
I am not sure if the above serves as the best possible example, but I think it demonstrates the situation with excess linking quite well. If one sentence has as many as three wikilinks (and often even more), and this keeps recurring from sentence to sentence, the text becomes absolutely ludicrous! In a paper-published article you naturally couldn't do that.
I am optimistic about the future though, and a lot of good work has been done in order to fix this matter. WP:LINKICRISIS and WP:OVERLINK are already raising awareness on this matter. Much is still to be done. For example, at WP:OVERLINK, the names of major geographic features and locations; languages; religions; common occupations; and pre- and post-nominals are mentioned in the list of "not to be linked". This is just a tiny fraction of the problem, but perhaps the aforementioned could be extended to geographic locations in general (towns, cities, regions, countries). Well, that's just an example, so better not to cling to it too much.
Anyway, great thanks to everybody working in order to fix this problem and to improve the Wikipedia project! :) Jayaguru-Shishya (talk) 18:41, 6 March 2014 (UTC)
Crucial other essay to link to immediately!
Don't forget to add this to the see-also section at Wikipedia:Indulge in histrionic hyperbole essay, and vice versa, or the entire system will collapse!!! — SMcCandlish ☺ ☏ ¢ ≽ʌⱷ҅ᴥⱷʌ≼ 21:22, 8 May 2014 (UTC)