Wikipedia talk:Size comparisons/Archive 1

Latest comment: 14 years ago by Emijrp in topic Bad Link
Archive 1

Just how many pages about Wikipedia's size and activity do we need? We've got Wikipedia:Statistics, Wikipedia:Size of Wikipedia, Wikipedia:Traffic, and now Wikipedia:Size comparisons. Probably half a dozen more I haven't noticed yet, too. --Brion 19:15 Sep 11, 2002 (UTC)


These and the other new statistics pages are really fantastic! --Larry Sanger


I moved this here from the main article:

In addition to the comparisons above, the size of Wikipedia articles can be compared with the size of articles in other encyclopedias.

The table below compares the number of words for five topics in five categories across Encarta, Wikipedia and encyclopedia.com.

Word counts:

Category      Subject                  Encarta   Wikipedia   encyclopedia.com   Notes
Literature    Shakespeare                3,130         765              2,230   without summaries of individual plays
Mathematics   Calculus                   3,298       1,093              1,377   including differential equations
Geography     France                    60,841       9,020              6,712   Wikipedia count includes text from main articles
Biology       Elephant                   4,318         673                891
History       Attack on Pearl Harbor       339         940                315

It looks like we compare well with Columbia (encyclopedia.com), but we still have a long way to go to catch up with Encarta in the depth of information in articles.
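The table's column totals and per-topic ratios can be recomputed with a short sketch. The figures are the word counts from the table; the function names are illustrative only, and, as the replies below point out, the counts are not strictly like-for-like because each work splits its topics differently.

```python
# Word counts from the table: (Encarta, Wikipedia, encyclopedia.com).
# Caveat: each work organizes topics differently, so these are rough.
WORD_COUNTS = {
    "Shakespeare": (3130, 765, 2230),
    "Calculus": (3298, 1093, 1377),
    "France": (60841, 9020, 6712),
    "Elephant": (4318, 673, 891),
    "Attack on Pearl Harbor": (339, 940, 315),
}


def totals(counts):
    """Sum each column; returns (encarta, wikipedia, encyclopedia_com)."""
    encarta = sum(row[0] for row in counts.values())
    wiki = sum(row[1] for row in counts.values())
    ecom = sum(row[2] for row in counts.values())
    return encarta, wiki, ecom


def wiki_share(counts):
    """Wikipedia's word count as a fraction of Encarta's and of
    encyclopedia.com's, for each topic."""
    return {
        topic: (wiki / encarta, wiki / ecom)
        for topic, (encarta, wiki, ecom) in counts.items()
    }


if __name__ == "__main__":
    enc, wiki, ecom = totals(WORD_COUNTS)
    print(f"Totals: Encarta {enc:,}, Wikipedia {wiki:,}, encyclopedia.com {ecom:,}")
    for topic, (vs_enc, vs_ecom) in wiki_share(WORD_COUNTS).items():
        print(f"{topic}: {vs_enc:.0%} of Encarta, {vs_ecom:.0%} of encyclopedia.com")
```

The totals come out at 71,926 words for Encarta, 12,491 for Wikipedia and 11,525 for encyclopedia.com; note that the France article alone accounts for 60,841 of Encarta's words.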

As for the topics, I have tried to choose them from different categories. With the exception of the mathematics topic, I suppose these are topics that children might easily choose for a school presentation.


Please describe which specific articles were used for the above comparison (article titles). I do not think this is a fair comparison if relevant linked articles that contain further information are not counted. --Eloquence 05:52 20 Jun 2003 (UTC)


hi Eloquence,

What you ask would require much more time than the original count did. You may have noticed that I said I included text from main articles where appropriate. In the article about France, 'main articles' are used as sub-articles because the original article became too long.

Let me detail my count for one line of the table above, France:

  • Encarta: 15-page article, with subtopics:
    • Introduction;
    • Land and Resources;
    • People and Society;
    • Culture;
    • Economy;
    • Government;
    • History.

Not included are listed related topics such as: capital, Paris; Charlemagne, major figure in medieval French history; more... Total 60,841 words, excluding numerous ads. For counting, I copied and pasted everything into MS Word 2000 and used the word count function.

Not included is Geography, a list of some of the major towns. Total of 6,712 words, using the same technique as for Encarta.

Not included was

List of regions in France

I am no longer sure whether I included French literature; I probably did, but it consists of little more than a list of French authors.


The problem is that Wikipedia articles are typically not neatly integrated long discussions. Encarta tries to do this with main topics such as "France", whereas we have articles about each individual aspect of France's history, politics, culture etc. So if you wanted to do a fair comparison, you would have to make a keyword-style list of the topics discussed in Encarta's "France" article and then check which of these topics are covered in Wikipedia, and then compare the length of the overall coverage. And to be fair, you would have to go the other way, too, and look at all the Wikipedia articles about France and see if coverage within Encarta exists.

Here are some other France articles that are relevant:

...

Do equivalents exist within Encarta for each of those? If not, are the topics covered within the main article? If only the main article covers them, the size comparison must include the word count from the Wikipedia articles.

As you see, it is quite difficult to use a fair methodology with projects that have such different information organization. In comparisons it is usually best not to pick articles about large meta-subjects, but to compare those about individuals, or specific historical events instead. I have done this (using random picks from Encarta and Wikipedia) in part 2 of my German article series about Wikipedia, and Encarta did not look very good. I also compared some of the Brilliant Prose stuff in Wikipedia to the Encarta equivalents, and found that, at least the German Encarta did not even have articles about essential stuff like the Milgram experiment. Coverage of controversial subjects like sexuality or real-world conspiracies (MKULTRA, COINTELPRO) is close to zero. Articles about movies, books etc. are often extremely POV ("masterpiece", "his best work").

Where Wikipedia fails is with subjects few people care about, but which are nevertheless important for a reference work, e.g. Politics of Cambodia. --Eloquence 08:17 20 Jun 2003 (UTC)


hi User:Eloquence,

You are right that it is difficult to make comparisons between the three encyclopedias because of their different structures. I realized this from the outset; that's why I started including notes in the rightmost column about what I had and had not included. I intended to include a note about this; no comparison will be perfect.

I did a quick count of the articles you mentioned, the same way I had counted the others (copy/paste the text into MS Word, then count words), with the exception of freedom fries, which has more to do with US-based 'Francophobia' than with France itself. The France count should be increased by 22,453 words, making a total of 9,020 + 22,453 = 31,473. That includes numerous quotations from the 1911 Britannica.
I agree it is a major boost, but it still brings us only about halfway to Encarta's size. And because we have split things up, we have numerous overlaps. Compare the main France article with its subpages (listed as 'main articles' in the France article) to see what I mean.
I did not check whether Encarta has separate entries on any of the subjects you mentioned.

I also did a quick recount of the elephant article, including:

This raises the word count by 919, to 673 + 919 = 1,592, still less than 50% of what Encarta has.

The essential conclusion remains the same: we are on a par with Columbia (and in some ways have surpassed it) but are still not at the same level as Encarta. That's not a bad thing; I guess we are growing faster.

I would have loved to include info on the Encyclopaedia Britannica, but my time is too limited to use their temporary access offer for such a count.

You moved this info from the article to the discussion page. What should we do to restore it? I feel the count does contain relevant info. TeunSpaans 18:29 20 Jun 2003 (UTC)

Linking to http://www.wikipedia.org/wikistats/EN/Sitemap.htm is not very helpful.

(results in: You don't have permission to access /EN/Sitemap.htm on this server.)

--Icekiss 11:31, 14 November 2006 (UTC)

The correct link is http://stats.wikimedia.org/EN/Sitemap.htm emijrp (talk) 11:32, 31 August 2010 (UTC)

American Jurisprudence

"American Jurisprudence? 1st ed. is an 83 vol. collection of American common law, 2nd ed. 231 volumes!" seems tangential to the discussion. --Twinxor 15:51, 7 Jul 2004 (UTC)

Comparison with Encyclopaedia Britannica

One reason why Wikipedia has more articles than Encyclopaedia Britannica is that Wikipedia has a guideline that articles should generally not be more than 32 KB long. This is yet another way that Wikipedia differs from every other encyclopedia published on Earth. For example, the Britannica article "China" is over a megabyte long, longer than any article on Wikipedia. The Encyclopedia Americana article "United States of America" is 1.9 MB. World Book's article is 211 KB and Columbia Encyclopedia's is 229 KB. Compton's Encyclopedia's article on China is 233 KB.

As for covering more subjects, of course we do, but it's my opinion that we tend to cover much less important subjects. For example, a favorite pastime of mine is skimming through EB and looking for articles it covers that aren't in Wikipedia. Take "Culture of Cameroon," a stub article about the culture of a country of over 13 million people. I expanded the section in the main article, though. Many of the articles I created (e.g., Ebira, Tehuelche, Sahaptin people, among many others) are dealt with very extensively in Britannica,[1][2][3] but had no articles at all here before I intervened.--Primetime 20:57, 16 April 2006 (UTC)

Primetime has a passion for denigrating Wikipedia, and for engaging in an argument with imaginary antagonists who think Wikipedia is finished and perfect - but there are no such people. His selection of articles is crudely and obviously biased. Wikipedia has many articles on mainstream topics which are far superior to those in Britannica; and many mainstream subjects more important than those he mentions are not covered in Britannica at all. But as he has shown repeatedly that he has no interest in carrying out a balanced assessment of where we stand I am wasting my time responding to him again. Hawkestone 06:42, 17 April 2006 (UTC)
Actually, I wrote the essay above in response to an edit summary stating that the article length of EB was irrelevant as Wikipedia covers "vastly more topics". I just get kind of upset when people treat pages about other encyclopedias as places for Wikipedia pep rallies, that's all. On the "Encyclopaedia Britannica" article talk page, I got defensive because I noticed some people were pretending that Wikipedia was perfect, especially when editing the article. I don't think that Encyclopedia Britannica is perfect; in fact, I'll admit that I've noticed more inaccuracies in it than in other printed encyclopedias such as Americana and World Book. I would add that to the Britannica article, but I'm sure someone would revert me. I also think that the Macropaedia's long-article format is hard to use and that the writing can be unnecessarily complicated at times (more so than in other encyclopedias, but not as bad as here). To say that Wikipedia is almost as accurate as Britannica is misleading, because I always thought that Britannica was one of the less accurate printed encyclopedias available. Not many users seem willing to criticize Wikipedia, so I'm willing to step in for a reality check every once in a while.

Also, can you please clarify what in the page scans of the Allgemeine Encyclopädie der Wissenschaften und Künste leads you to believe that it isn't anywhere near Wikipedia's size?--Primetime 07:04, 17 April 2006 (UTC)

The scans show around a thousand words a page, maybe less. Most of the volumes were below 500 pages and there were 167 of them. 1,000 × 500 × 167 is 83.5 million. Wikipedia must have passed 400 million words now. How a comparison with Britannica can mislead about how Wikipedia compares with Britannica is unclear to me; once again you are imagining yourself to be taking part in a debate which isn't actually happening, in this case a survey of the accuracy of English language encyclopedias in general. Your past comments were so aggressive and biased that they didn't act as the "reality check" you say you wanted to provide, but simply seemed to be abuse from someone who was out to win an argument rather than define the truth. Hawkestone 07:51, 17 April 2006 (UTC)
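The back-of-the-envelope estimate in the comment above can be written out as a small function. The inputs (1,000 words per page, 500 pages per volume, 167 volumes, and 400 million words for Wikipedia) are the comment's own rough figures, not measurements, and the function name is illustrative. The sketch does not model the objection in the reply below about German words being longer than English ones.

```python
def estimate_words(words_per_page, pages_per_volume, volumes):
    """Rough size estimate: words/page x pages/volume x volumes.

    When the per-page and per-volume figures are upper bounds, as in the
    comment, the result is an upper bound too."""
    return words_per_page * pages_per_volume * volumes


ersch_gruber = estimate_words(1_000, 500, 167)
wikipedia_words = 400_000_000  # the comment's rough assumption, not a count

print(f"Ersch-Gruber upper bound: {ersch_gruber:,} words")
print(f"Wikipedia / Ersch-Gruber: {wikipedia_words / ersch_gruber:.1f}x")
```

Under these assumptions the estimate is 83,500,000 words, and Wikipedia comes out a little under five times larger.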
That is a very unscientific count, and words in German are much longer than in English, as they are more inflected. (In addition, see Wikipedia:No Original Research.) Finally, see Wikipedia: Assume Good Faith. I am trying to keep the article balanced.--Primetime 08:21, 17 April 2006 (UTC)
  • I suppose I can take out the passage about the Ersch-Gruber as a compromise. As for the Britannica passage, that does help put the article count in perspective, so I think it should stay. The mention about articles being more broad is a simplification, as Britannica is divided into two sets--one specific, one broad--so I don't think it does the issue justice.--Primetime 08:46, 17 April 2006 (UTC)

I'm not disputing your good faith, but I think your reasoning is absurd. You are clutching at such tiny straws. Any reasonable person can see that there is absolutely zero chance the old German encyclopedia was longer. It is not a margin of 10% or 20% or even 100% or 200%, but of 400%. And that's today, tomorrow it will be more.
The article size comment provides an entirely false perspective. It is obvious to a blind man that it was intended as a criticism of wikipedia (from a pro-Wikipedia perspective) and as a pep talk for wikipedians, but it is an old comment and every day it becomes less meaningful. Let's make some wild assumptions:

  • The average length of Wikipedia's articles on subjects covered by EB as of now is zero words.
  • In the next 24 hours 44 million words will be added on those 85,000 topics and all those words will be better than the articles in EB.

How would the overall average article size comparison work then? Well, if you added 44 million words in aggregate to the current 1.083 million articles that would increase the average article size by less than 41 words. Wikipedia's average would still be lower, but Wikipedia would contain superior versions of everything in EB and another 998,000 articles - many of them pretty good - yet this comparison would still suggest it was worse than EB.
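The arithmetic in the hypothetical above can be checked directly. The 44 million words, 1.083 million articles and 85,000 EB topics are the comment's own figures; the function name is illustrative.

```python
def average_increase(words_added, article_count):
    """Rise in mean article length when words_added are spread over an
    unchanged number of articles (article_count)."""
    return words_added / article_count


increase = average_increase(44_000_000, 1_083_000)
print(f"Average article size rises by about {increase:.1f} words")

# Articles left over once the 85,000 EB-equivalent topics are set aside.
print(f"Other articles: {1_083_000 - 85_000:,}")
```

This gives an increase of about 40.6 words per article, matching the "less than 41 words" in the comment, and 998,000 remaining articles.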
Surely you can see that the only word average comparison which would be of any use would be a like-for-like one, i.e. the number of words in each work's articles on the same topics. (Including break-out articles, which are the equivalent of the subsections of EB articles. Indeed, the way the internet version of EB is set out, the subsections might be better as separate articles.) The next million Wikipedia articles will inevitably be on less important topics than the last million (if you dispute that, you will be criticising EB's topic selection, not Wikipedia's). The comparison will simply become more and more meaningless. Trying to get the word average up to the same as EB's should not be a major goal from your point of view, as it can only be done by adding lots of padding to all the articles you like to dismiss as unimportant. The millions of extra articles which will be added are relevant to comparisons between specialist sections of Wikipedia and specialist reference works, but they have nothing to do with comparisons between Wikipedia's coverage of mainstream general topics and EB. Hawkestone 12:53, 17 April 2006 (UTC)

Much of what you just wrote seems like an argument against word count comparisons in general, and therefore an argument for deleting the article. The sentence saying Britannica has twice as many words per article as Wikipedia means that Britannica would have twice as many articles as it does now if they were as short as Wikipedia's. The sentence is obviously very relevant to the topic being treated in the article. Finally, I was compromising with you by not reverting your changes entirely. Are you willing to compromise with me or is the deal off?--Primetime 17:26, 17 April 2006 (UTC)
What deal? Anyway, yes I am against all misleading use of word counts, but I don't think there is anything in the article which should not be there which I have not removed already. Hawkestone 22:22, 17 April 2006 (UTC)
In any case, it appears as if you have violated the Three Revert Rule, which stipulates that you cannot revert the edits to any article more than three times in any 24-hour period. Violation of the rule can result in a block from editing.--Primetime 22:42, 17 April 2006 (UTC)
Resorting to threats just shows the poverty of your arguments. Hawkestone 18:14, 21 April 2006 (UTC)
I wasn't threatening you because I can't block you. I was asking that you not violate the rules. Is that a problem with you? If you don't like the reason this article was created, then nominate it for deletion. I can see you don't have what it takes to admit that you're wrong or compromise with me, so that's why I have stopped conversing with you. For example, you haven't explained why we should include Wikipedia's word count but not mention article length. I'm sure you'll figure out some sort of excuse, but I really don't want to read it.--Primetime 20:13, 21 April 2006 (UTC)
I have provided more detailed reasoning than you and I am waiting for you to have the decency to admit it. I resent your personal attacks. I have now had to add a verify tag temporarily to cover Wikipedia as the current version contains misleading statements. It is false to say that I am saying we should not mention article length as I have left it in the table. What I am saying is that you should not use the statistics in a misleading way. Hawkestone 20:27, 21 April 2006 (UTC)
OK. I'm going to try this one more time. I'm going to remove the statement about the Ersch-Gruber but keep the statement about article length in EB. That way, we can both leave the article alone and not write messages to each other anymore. If it's important enough to keep the statistic in the table, then it's good enough to put in prose, as well. I found it enlightening to read and cutting Wikipedia's article count in half is a dramatic change, I'd hope you'd agree.--Primetime 20:56, 21 April 2006 (UTC)
No, it isn't. The number in the table is just a statistic, but the phrase in the text is a biased and misleading criticism of Wikipedia on spurious grounds. The comparison is not like with like, and every day it becomes less so. The ground on which you are standing is tissue paper over a void. Will you still be defending this meaningless comparison when Wikipedia has ten million articles? What on earth does it prove? Absolutely nothing! Hawkestone 14:24, 22 April 2006 (UTC)

The comparison was somewhat meaningful once, but it isn't in the least meaningful now

I am not going to back down Primetime. The comment was added on 5 May 2004 when Wikipedia had 239,000 articles, a number not completely out of touch with Britannica's article tally. At that time the comparison made some sort of sense, though even then it was biased against Wikipedia (that was the point, it was part of a pep talk). But now it is utterly meaningless and worthless for any purpose other than denigrating Wikipedia on spurious grounds. As it happens, I agree with you that on average Wikipedia's articles on topics covered by Britannica are not as good as Britannica's (though not by much and many of them are better), but this opinion is not based on or in any way supported by word counts. For that matter it is quite likely that Wikipedia's like for like articles are longer on average. I can point to many which are many times longer. The statement that you are defending is not a comparison of like with like because Wikipedia has a much broader range of content than Britannica. Every day the difference grows wider. You seem to be clinging to the idea that there is an automatic connection between article length and quality, which is absurd. Quality is independent of length, because length is determined mainly by editorial conventions and by the scope of the average topic covered. Why do you persist in defending a worthless, spurious statistic? NPOV does not require bias against Wikipedia. Hawkestone 05:20, 24 April 2006 (UTC)

I'm not going to back down, either--ever. Also, you wrote, "The statement that you are defending is not a comparison of like with like because Wikipedia has a much broader range of content than Britannica." and I disagree that the content in Wikipedia is broader. As I demonstrated through my initial post, it is skewed away from non-controversial, non-technical fields. The purpose of this article is to compare Wikipedia with other encyclopedias, so why did you mention that it was "not a comparison of like with like" other than to gabble? You also wrote, "You seem to be clinging to the idea that there is an automatic connection between article length and quality, which is absurd." so back up your slander with a quote where I say such a thing, you liar. Finally, your it's-worthless slogan you keep chanting is not convincing me at all. This article is here to compare the size of the encyclopedias, and article length puts that in context. I find it disappointing that you are repeating your arguments--possibly to overshadow mine. Your primitive propaganda techniques are useless here, so try something else or go away.--Primetime 06:06, 24 April 2006 (UTC)
I will not debate with you further, but I will continue to revert. You are not capable of being civil. Read Wikipedia:No personal attacks. Hawkestone 06:03, 25 April 2006 (UTC)
  • You're a hypocrite. Here are some of your edit summaries: "You are making a laughing stock of yourself," "You are a polemicist," "Don't just revert dumbly." As for the Britannica statement, it would still be relevant if Wikipedia were 10 million articles long. I find your failure to admit that pathetic. Unlike you, I offered to compromise. But, apparently, no one ever taught you how to do that. When it was written is equally irrelevant as article length will always be relevant to understanding the number of articles in Wikipedia.--Primetime 01:02, 30 April 2006 (UTC)
    • You called me a liar! Removing a patently false statement is not "compromise" but just a basic necessity if you wish to have any credibility as a good editor. Your "response" is just abusive rhetoric and you continue to show a complete inability to engage with the serious and detailed case I have presented. Hawkestone 16:31, 30 April 2006 (UTC)
      • It seems to me that we're both being a bit rude to each other. I presented a pretty detailed rationale myself for why the EB statement should stay because I believe it quite strongly. In any case, the most you can hope to achieve here is the removal of the statement regarding the Ersch-Gruber. Otherwise, you're just spinning your wheels and facilitating more conflict.--Primetime 04:50, 1 May 2006 (UTC)

I agree that the Ersch-Gruber is smaller than Wikipedia. However, I just read that the largest encyclopedia by far at 11,095 volumes is the Yung-lo ta-tien. Wikipedia, it seems, will never be the largest.--Primetime 03:29, 4 May 2006 (UTC)

  • Hold on. The Chinese People's Daily only says that it's "twelve times larger" than the Encyclopédie. That could mean the number of pages or the physical volume. Also, I will keep your additions to the article-length paragraph in the hopes that doing so will end the edit war. However, you should, in turn, allow the Britannica statement to remain.--Primetime 18:58, 4 May 2006 (UTC)
    • It's down to you to end it. I added the details on the length of the Chinese encyclopedia, not you. I then rewrote the section to make it less controversial (it should never have been a pep talk in the first place), but that was then reverted by a brand new user whose comments suggested a strange familiarity with the debate and with Wikipedia. I have little doubt the new user is your sockpuppet. If this new user shows the same kind of persistence as you in defending the same misleading comment by the same means that really will be too much of a coincidence, so please stick to using one account. Hawkestone 06:40, 12 May 2006 (UTC)
      • The month after this unpleasantness Primetime was given a long term ban for "massive copyright violations" and the use of multiple "abusive sock puppets". Hawkestone 22:21, 20 October 2006 (UTC)

Article is for the main namespace

Just to say that. Maybe some sentences should be changed, but this is a good article and it should go into the main namespace, not inside of Project: namespace. --millosh (talk (sr:)) 23:01, 13 December 2006 (UTC)

100,000,000 edit counts! Yeehah!

 
Have a Toast! Cheers! :)

Again, another Wikipedian statistical milestone has arrived! (Vandalism and inactive users aside.) This is my third compliment, and perhaps the fourth for Wikipedia itself: I truly praise, commend and greatly congratulate the English Wikipedia once again for surpassing yet another wiki record, the one hundred millionth (in figures: 100,000,000) total edit by Wikipedians!!! This whopping number has been built up by the users and Wikipedians of this big free encyclopedia since July 2002, and they never stop growing (as shown in the Wikipedia user statistics)! WOW, what else can I say, man!? Congratulations and kudos to the English Wikipedia! Keep the numbers going and keep on editing and contributing! Yaaahooooo!!! --onWheeZierPLot 00:40, 26 December 2006 (UTC)

sources?

Just curious -- I know this isn't in the main namespace -- but how did we determine the average number of words per article among various encyclopedias? If there's a good source, that information, and some of the other statistics here, might be relevant for the main articles on these publications. -- Bailey(talk) 00:46, 31 January 2007 (UTC)

It is rare for a review to give the average number of words per article. So, those stats were determined by dividing the total number of words by the number of articles. The sources for the word counts are the same used for the number of articles. For example, Enciclopedia italiana has a note next to the number of articles giving the source. That article also was used for the number of words. The same I know is true for Britannica, Americana, Italiana, Encarta, and the Great Soviet Encyclopedia.--Tree 'uns 5 01:35, 1 February 2007 (UTC)
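The method described above (total words divided by number of articles) can be sketched as follows, together with the related reasoning from earlier on this page that a work with twice the words per article would have twice as many articles at half the length. Both function names are hypothetical, and the figures in the usage lines are made-up round numbers for illustration, not data for any encyclopedia.

```python
def average_words_per_article(total_words, article_count):
    """Mean article length: total words divided by number of articles."""
    if article_count <= 0:
        raise ValueError("article_count must be positive")
    return total_words / article_count


def articles_at_length(total_words, words_per_article):
    """How many articles the same body of text would make at a given
    mean article length."""
    if words_per_article <= 0:
        raise ValueError("words_per_article must be positive")
    return total_words / words_per_article


# Hypothetical figures, not taken from any real encyclopedia.
avg = average_words_per_article(1_000_000, 2_000)
print(f"Average: {avg:.0f} words per article")
print(f"At half that length: {articles_at_length(1_000_000, avg / 2):.0f} articles")
```

Halving the mean length while keeping the total word count fixed doubles the article count (2,000 to 4,000 in this made-up example), which is the relationship the earlier Britannica argument relies on.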