Talk:List of dictionaries by number of words

(Redirected from Talk:List of languages by number of words)

Requested move 13 August 2016

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: No consensus. Since this has been listed separately at AfD, and has run its course here without any definite consensus to move, I see little point in keeping it open. If the AfD is closed as a "keep" then this could of course be revisited in a fresh RM if desired.  — Amakuru (talk) 09:45, 22 August 2016 (UTC)



List of languages by number of words → List of dictionaries by number of words – I don't think there's any determinate sense to the concept of "number of words in a language". If there is one, I'd expect to see it based on the studies about the highest number of words used in non-technical contexts by a certain cohesive community of speakers. What the article lists are dictionaries and the number of words in a given dictionary will depend above all on the thoroughness of the lexicographical research that went into it and also on its scope (how much of the geographical, social and historical variation should it reflect? should it include the technical vocabulary of specialised fields? etc.). Not to mention the question of drawing the line between words worthy of a separate dictionary entry and ones that are (more or less) predictable derivational variants (e.g. in Slavic languages, should each member of an aktionsart pair be included? should there be a separate entry for every -ness noun in English?, etc.). In short, the number of words in a dictionary can't be used as a proxy for the number of words in a language (even if such a thing exists). I'm open to suggestions for a more precise alternative title. Uanfala (talk) 13:50, 13 August 2016 (UTC)

  • I second the nominator's concerns. —  AjaxSmack  20:59, 13 August 2016 (UTC)
  • Basic support/comment: This indeed seems to be strictly based on dictionaries, and your linguistic concerns are completely valid. To use an English-language example, presumably "two" and "twenty" both count as words. But how about "twenty-two"? Or "two thousand twenty-two"? Is every number a word, and thus there are infinite words in the English language? However, it just now occurs to me that perhaps a clearer title would be something like "List of languages by number of dictionary words" or "List of languages by word number in dictionaries" since otherwise your proposed title might be interpreted to mean "A list of English dictionaries comparing their differing numbers of words." Wolfdog (talk) 02:36, 16 August 2016 (UTC)
  • Oppose. A list of dictionaries is a wholly different thing from a list of languages. Of course it is very difficult to agree on one measure of the vocabulary of a language, but in the absence of one, I think the number of entries in a most authoritative dictionary is a reliable, primary and academic quantity. Is twenty-two an English word? I think it is for primary sources to answer, i.e., did linguists give it a separate entry in reputed dictionaries? Every major language has one or two such dictionaries, whether it has a central authority (such as Spanish) or not (like English). If concerns are raised about how the grammar of a language affects the number of entries in such a dictionary, or about how difficult it is to measure vocabulary with the same standard for all languages, that might well be specified in the body of the article. Cato censor (talk) 12:40, 17 August 2016 (UTC)
@Jbaranao:, @LiliCharlie: pinging relevant contributors to this article to take part in the discussion. Cato censor (talk) 12:43, 17 August 2016 (UTC)
The "number of entries in a most authoritative dictionary" is a reliable, primary and academic quantity, but what it measures is the number of words in the given dictionary, and using it as a measure of the vocabulary of the language is, in my opinion, unreliable, primary original research on the part of us Wikipedia editors. I don't think the list will be substantially different as a list of dictionaries, and I think we could (and should) allow a couple of (but not more) dictionaries per language. Uanfala (talk) 14:52, 17 August 2016 (UTC)
Agreed, it would be original research if used as a measure of vocabulary. I see my mistake. Then a valid title for the list could also be List of languages by number of words in authoritative dictionaries, more than one dictionary allowed, especially when there are two that dispute the title of the most authoritative (not the case of Spanish). I still think what makes the list relevant is the comparison of languages, not that of dictionaries. Even so, the underlying specifics driven by grammar and by schools of linguists should be mentioned in the article in any case. Cato censor (talk) 21:01, 17 August 2016 (UTC)
  • Comment. The entire article is of dubious value, even as a "List of dictionaries by number of words", since different dictionaries/languages treat words differently, especially words with multiple meanings, some listing the word as a headword, i.e. as a single entry with the different meanings explained underneath, while others treat each meaning as a separate word/entry, inflating the number of words for that language. The scope of the article also needs to be clarified, since some dictionaries (Oxford English Dictionary, Svenska Akademiens Ordbok, Deutsches Wörterbuch and possibly more) list all words that have been used in print since the 15th-16th century, most of which are obsolete and understood only by experts, whereas most dictionaries only list contemporary words, or words that have been in common use during the past 100 years or so, and are still understood by a large number of native speakers of that language (such as Svenska Akademiens Ordlista, which only lists words in current common use, and removes words that haven't been in common use since the last edition, even though they're still encountered in literature, and still understood by a large part of the population; which is why there are ~600K words in SAOB but "only" ~126K words in the latest edition of SAOL). - Tom | Thomas.W talk 15:58, 17 August 2016 (UTC)
  • Oppose. I appreciate the interest! I agree that this is not a perfect metric for the number of words in a particular language. Far from it. But the suggested new title changes the whole aim of the article, which is to compare the vocabulary of different languages, not of particular dictionaries. What I have always wanted to know is how rich the different languages are compared to one another. My native tongue is Spanish, and almost everybody I have talked to in my area believes that Spanish is the richest, notably compared to English, a notion that is now common sense even though largely unsubstantiated. In my opinion, the main dictionary can be used as a proxy. If the decision is to rename the article, I strongly suggest something in the vein of 'list of languages by number of words according to its main dictionary'. --Jbaranao (talk) 15:54, 17 August 2016 (UTC)
  • Comment 1. First thing I thought was: Languages can't be listed by number of words when it isn't clear at all what a language is (witness Chinese or Serbo-Croatian) nor what is meant by “word.” Let me stick to Chinese: A Chinese dictionary with over 85,000 characters is listed here, but 1. well over 80% of them have not seen use for centuries or even for millennia (except in dictionaries; they are merely more or less obscure variant characters, used on only one occasion or for never-important proper names, or simply fell into disuse long long ago). — I mean, would you treat French the same as pre-classical Latin? Or consider historical variant spelling like weorold, wuruld, worold, uoruld, wiarald, weoruld, woreld, wurold, wæruld, we(o)relld. weorld, worlde, worlð, wurld, whorlld(e), werlð, werrld, werld(e), warld, warlde, varld, warlede, wordle, wordel, wordil, wardle, wardill, vardil, wardel, vardel, werdle, word, worde, woaude, werd, werde, wird, ward, worl, worle, worlle, orlle, worell, worl', warle, warl', warl to be English words other than the word world? (All of these are historical English spellings, according to the Oxford English Dictionary.); 2. Almost all (at least 99.99%) of Chinese words can be written with less than 5000 characters. All words can, except for a few obscure ones. Only, the Chinese script doesn't mark words by spaces between words. Like the Thai script, although this one is alphabetic (i.e. denoting consonants and vowels). What are we listing? — If you look at the entry for Korean this becomes even clearer: Korean words are separated like ours (i.e. by spaces between them) and by far not all Korean words can be written in Chinese characters. Yet our current list prefers characters over words.
  • Comment 2 will be added before the week is over. Love —LiliCharlie (talk) 20:29, 17 August 2016 (UTC)
  • Comment. Certainly the number of character-entries in an ideographic script like that of Chinese deserves special attention. The article should define how to deal with these, either excluding them or adding a column to clarify. Also, if there is any academic terminology for the treatment of different meanings as different entries or as the same entry (which I'm pretty sure there must be), I think it would be interesting to the scope of the list. Cato censor (talk) 21:01, 17 August 2016 (UTC)
1. I don't consider Chinese characters to be ideographic. (Cf. The Chinese Language: Fact and Fantasy by John DeFrancis.) 2. I think the point is whether or not orthographies mark word boundaries. Love —LiliCharlie (talk) 21:15, 17 August 2016 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.

The reliable source in the Arabic language is not complete. Citing several scholarly sources is being put off, since citing a single source means a single point of view, and that is a fatal mistake. Thank you very much. Starmaster3 (talk) 12:19, 31 January 2023 (UTC)

Dutch is duplicated

Why is Dutch listed twice? 88.159.88.249 (talk) 01:24, 5 May 2018 (UTC)

This is a list of dictionaries, not languages, and there are entries for two Dutch dictionaries. – Uanfala (talk) 11:00, 5 May 2018 (UTC)

Wiktionary?

The English Wiktionary includes over 500,000 entries for English (see the "gloss entries" column on its Statistics page). Should it be included? The number of entries would include affixes, acronyms, idioms, and many archaic words (though not Middle English, which is treated as a separate language). However, the number of entries increases by about 3500 per month, and many of the new words added do not fall into any of the above categories. Not to mention that many of the dictionaries mentioned in this list probably include similar things in their count of headwords. 204.191.85.34 (talk) 01:22, 28 September 2019 (UTC)

I concur, but then we face the problem of deciding which dictionaries should be included at all. Dictionaries like Merriam-Webster and the German Duden also include prefixes. I guess the main aim of this page is to approximate the size of the vocabulary in a given language, relative to other languages. CaffeineWitcher (talk) 18:53, 14 May 2020 (UTC)

CaffeineWitcher: No, the purpose of this page is not to measure the size of a language's vocabulary. Its main purpose is to let nationalists present their language as supreme. They added the Turkish language 3-4 times, and Turkish does not have 600,000 words: according to the source given on the Turkish language website, that figure includes words, idioms and rumors, yet the page presents all of them as words in the language. Also, in Kurdish, Sorani alone has 90,000 words, and the Azerbaijani language does not have 50,000 words. Human knowledgeable (talk) 23:04, 16 August 2020 (UTC)

@Human knowledgeable: Please provide sources for your assertions, as well as a workable definition of the term word (=headword, see intro) that can be applied to many languages. Love —LiliCharlie (talk) 00:19, 17 August 2020 (UTC)

Büyük Türkçe Sözlük: 616,767 headwords?

The day before yesterday Ghostbatuhan added the Büyük Türkçe Sözlük to the list. I then reverted, and my edit summary was: "Source says the dictionary contains 616,767 "söz, deyim, terim ve ad" (words, phrases, terms and names) rather than 616,767 headwords". The sentence in the source I was referring to is: "Büyük Türkçe Sözlük’te söz, deyim, terim ve ad olmak üzere toplam 616.767 söz varlığı bulunmaktadır."

Today I was re-reverted without a comment. Am I missing something or was my revert justified? Love —LiliCharlie (talk) 19:50, 5 November 2020 (UTC)

The Turkish Language Association does not have a dictionary called "Büyük Türkçe Sözlük" (Great Turkish Dictionary). The figure of 616,767 is the combined vocabulary of 11 dictionaries[1][2] belonging to the Turkish Language Association:
Güncel Türkçe Sözlük (Current Turkish Dictionary): 117,000
TDK Bilim ve Sanat Terimleri Sözlüğü (TDK Dictionary of Science and Art Terms): 188,866
Türkiye Türkçesi Ağızları Sözlüğü (Dictionary of Turkish Dialects of Türkiye): 217,736
Yer Adları Sözlüğü (Dictionary of Place Names): 37,424
Kişi Adları Sözlüğü (Dictionary of Personal Names): 9,697
Atasözleri ve Deyimler Sözlüğü (Dictionary of Proverbs and Idioms): 13,605
Türkçede Batı Kökenli Kelimeler Sözlüğü (Dictionary of Western Origin Words in Turkish): 5,321
Türk Lehçeleri Sözlüğü (Dictionary of Turkic Languages): 7,000
Tarama Sözlüğü / Eş ve Yakın Anlamlı Kelimeler Sözlüğü / Zıt Anlamlı Kelimeler Sözlüğü (Old Turkish Dictionary, Dictionary of Synonyms and Near-Synonyms, Dictionary of Antonyms): 20,000
Canuur (talk) 10:32, 12 August 2023 (UTC)
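
As a rough cross-check of the breakdown above, the per-dictionary figures can be summed and compared with the 616,767 total. This is only a sketch: the counts are copied from the list as given, the two figures written with a period are assumed to use it as a thousands separator, several counts look rounded, and the variable names are made up for illustration.

```python
# Figures copied from the list above. Several appear to be rounded
# (117,000; 7,000; 20,000), so an exact match is not expected.
# Assumption: a period in the source figures is a thousands separator.
component_counts = {
    "Güncel Türkçe Sözlük": 117_000,
    "TDK Bilim ve Sanat Terimleri Sözlüğü": 188_866,
    "Türkiye Türkçesi Ağızları Sözlüğü": 217_736,
    "Yer Adları Sözlüğü": 37_424,
    "Kişi Adları Sözlüğü": 9_697,
    "Atasözleri ve Deyimler Sözlüğü": 13_605,
    "Türkçede Batı Kökenli Kelimeler Sözlüğü": 5_321,
    "Türk Lehçeleri Sözlüğü": 7_000,
    "Tarama / Eş ve Yakın Anlamlı / Zıt Anlamlı (combined)": 20_000,
}

claimed_total = 616_767  # total quoted for the "Büyük Türkçe Sözlük"

listed_sum = sum(component_counts.values())
difference = claimed_total - listed_sum

print(f"Sum of listed figures: {listed_sum:,}")
print(f"Claimed total:         {claimed_total:,}")
print(f"Difference:            {difference:,}")
```

With the figures as listed, the sum comes to 616,649, which is 118 short of 616,767. A gap of that size is plausible if a few of the components are rounded.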

Why the hell

Some months ago I saw Kurdish in the first lines with around a million words. But you have let users who have not logged in wipe Kurdish out, as we see in edits like [3].

Wow, I never knew Wikipedia lets things like this happen. RealRojSerbest (talk) 10:03, 23 April 2021 (UTC)


If anyone wants to re-add the Büyük Türkçe Sözlük, please just append it after Kurdish, without any OCD, phobia or anything like that. Regards, RealRojSerbest (talk).

Century Dictionary

The Century Dictionary (English) is not listed. It is said to contain 500,000 entries. I haven't counted them all. — Preceding unsigned comment added by 65.175.242.16 (talk) 23:35, 14 April 2022 (UTC)

Urban Dictionary not within defined definition?

Is there a reason why Urban Dictionary is listed for the English language? It meets none of the criteria set out earlier on the page: "These figures do not take account of entries with senses for different word classes (such as noun and adjective) and homographs. Although it is possible to count the number of entries in a dictionary, it is not possible to count the number of words in a language. In compiling a dictionary, a lexicographer decides whether the evidence of use is sufficient to justify an entry in the dictionary." 37.171.12.197 (talk) 02:38, 23 September 2022 (UTC)

Thanks for bringing this up! I have removed Urban Dictionary as it does not fit the criteria for this list. SakurabaJun (talk) 01:19, 4 October 2022 (UTC)

my question

How many words are there in total in English? That is what I want to know. 2401:4900:3138:19B1:8D7C:C067:B2D:F36 (talk) 09:33, 8 October 2022 (UTC)

Persian Dehkhoda dictionary

Hello, I was wondering why it says that the Dehkhoda dictionary contains 343,466 entries when the actual Dehkhoda Wikipedia page states that it has 500,000 entries. I think the 343,466 number needs to be updated. — Preceding unsigned comment added by 99.209.41.22 (talk) 15:03, 25 October 2022 (UTC)

Hi! If you can find a reliable source stating the number of headwords, please feel free to update the number in the list and add the reference. SakurabaJun (talk) 00:07, 26 October 2022 (UTC)

Güncel Türkçe Sözlük

The figure of 114,767 headwords claimed for the Current Turkish Dictionary (Güncel Türkçe Sözlük) is incorrect. The Turkish Language Association has published its own offline dictionary application for mobile, and the database file inside it contains word counts. The total number of headwords is 93,405. Of these, 60,075 are words; the remaining 32,330 headwords consist of idioms, proverbs and phrases. https://www.linkpicture.com/q/Screenshot-2023-02-04-at-20.44.37.png If anyone wants, they can open the database file in the TDK Turkish Dictionary mobile application and have a look. 81.215.85.193 (talk) 19:07, 4 February 2023 (UTC)
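
The three figures quoted in the comment above can be checked against each other. The snippet below is a minimal sketch that uses only the numbers from that comment; the variable names are hypothetical, and nothing here comes from the TDK database itself.

```python
# Numbers as quoted in the comment above (not independently verified).
total_headwords = 93_405
plain_words = 60_075
idioms_proverbs_phrases = 32_330

parts_sum = plain_words + idioms_proverbs_phrases
gap = total_headwords - parts_sum

print(f"Words + idioms/proverbs/phrases: {parts_sum:,}")
print(f"Quoted total:                    {total_headwords:,}")
print(f"Unaccounted for:                 {gap:,}")
```

The two parts add up to 92,405, which is 1,000 less than the quoted total of 93,405, so one of the three figures is probably mistyped. Which one cannot be determined from the comment alone.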

Why was Kurdi removed?

A few months ago, the Kurdish language was ranked first with 1.6 million words. Why was it removed? Under what policy? Chiako2701 (talk) 04:32, 18 August 2023 (UTC)

There is no dictionary named "Deng Publications Dictionary of all Kurdish dialects (2017)". Instead of this non-existent dictionary, an existing Kurdish dictionary containing the most vocabulary has been added. Canuur (talk) 08:34, 23 August 2023 (UTC)

Contradictory information.

The arrangement in the article presented in English is not the same as that in Arabic. The order varies. 176.70.59.36 (talk) 06:10, 24 August 2023 (UTC)

kindly replace kurdish language

A few months ago, the Kurdish language was ranked first with 1.6 million words. Why was it removed? The Kurdish language was ranked second, and Wikipedia is known as a trusted site. Hani.alshiekh (talk) 17:53, 31 August 2023 (UTC)

There is no dictionary named "Deng Publications Dictionary of all Kurdish dialects (2017)". The source of the Kurdish dictionary is incorrectly shown on the Turkish dictionary site. Instead of this non-existent dictionary, an existing Kurdish dictionary containing the most vocabulary has been added. Canuur (talk) 05:17, 2 September 2023 (UTC)

Stop adding non-existent, unsourced imaginary Kurdish dictionaries

@2001:9e8:22b0:a00:d075:7cec:f15e:2fea @2001:9e8:22b0:a00:c96:ad2e:557e:a319 @2001:9E8:22B0:A00:C1AD:E841:3AC0:CC3D @2001:9e8:22b0:a00:2c36:aa90:1073:1e55 @87.123.246.58 @2001:9e8:2286:ee00:e05f:d916:b205:5493 @2001:9e8:2296:6400:7c45:e68f:3f71:e9da @2001:9E8:229F:5C00:449F:2861:72B6:1DB4 @Falcowon Stop adding non-existent, unsourced imaginary Kurdish dictionaries. There is no dictionary named "Deng Publications Dictionary of all Kurdish dialects (2017)". There is no online link or ISBN for the Kurdish dictionary that is claimed to have 735,320 words. Do not remove the existing Ötüken Turkish dictionary that is cited as a source. Stop calling people "racist" instead of providing sources. The Kurdish dictionary with the largest vocabulary on record is the 4-volume Kurdish-to-Persian and Persian-to-Kurdish dictionary prepared by Majid Rouhani. That dictionary has 93,000 words. Canuur (talk) 04:41, 11 September 2023 (UTC)

Why are you removing the Kurdish language, for a reason everyone knows? Fortunately, I informed an admin about this, and they re-added the Kurdish language. Falcowon (talk) 02:37, 4 October 2023 (UTC)
The Kurdish dictionary that everyone knows does not actually exist. There is no dictionary named "Deng Publications Dictionary of all Kurdish dialects (2017)". There is no online link or ISBN for the Kurdish dictionary that is claimed to have 735,320 words. If one is available, show the ISBN or a PDF version of this book. Canuur (talk) 15:20, 4 October 2023 (UTC)

Kurdish Wiktionary, which claims to have a vocabulary of 913,077

Hundreds of thousands of entries in the Kurdish Wiktionary's 913,077-word list have nothing to do with Kurdish. In particular, there are Turkish idioms and expressions, as well as Turkish grammatical structures, for example: allah'a bir can borcu var, allah'tan kork, allah'tan korkmak, allah'ın belası, allah'ın gazabı, allah'ın işine bak, allah'ın kulu, allah'ın öküzü, allah'ından bulsun, allah'ını seversen (1). There are German or Polish names: aufsätze, auftragsmörderinnen, aufwand, aufzählungszeichen, augenhöhle, augusiewicz, augustynowicz, ausfuhrverbot, ausschlüsse (2). There are French, Portuguese and Spanish names: bélanger, bélgica, bélinda, bélisaire, béliveau, bénigne (3). There are English and Italian names: carwell, cattanach, casagrande, casaleggio, casciano, caspian roaches (4). There are also Cyrillic or Georgian suffixes (5). So how correct is it to present vocabulary created by importing from other languages as "Kurdish words"? Canuur (talk) 08:39, 25 September 2023 (UTC)

There are thousands of Arabic and Persian words in the Turkish dictionary, mind checking these too? Falcowon (talk) 02:35, 4 October 2023 (UTC)
Borrowed words in Turkish have become Turkish, and they take suffixes according to Turkish grammar rules. The expressions and idioms on the Kurdish Wiktionary, however, were added according to Turkish sentence order rules, for example "allah'ın işine bak" (1). How can you accept this Turkish expression as a "Kurdish word"? I provided many examples in the comment above. German, Italian, Spanish, French, Polish and Portuguese proper names, as well as city and region names that are not used in Kurdish, have been transferred to the Kurdish Wiktionary and claimed to be "Kurdish words". This is wrong. Even words written in Cyrillic are on the Kurdish Wiktionary. Are we now to accept the "-тарин" suffix as Kurdish? (2) Therefore the number 913,077 is not correct. Canuur (talk) 15:32, 4 October 2023 (UTC)

Collins 730k

Collins claim to have 730k in their big-ass flagship. Huge if true, as they say. I might add it to this article (citing the ref) if the spirit moves me. Quercus solaris (talk) 04:36, 30 March 2024 (UTC)

There's no way. Read carefully: "More than 730,000 words meanings and phrases". Not just headwords, but also meanings and also phrases. I make about 60 headwords per page, and at 2300 pages Collins is lucky to make 150,000 to 200,000 words. Try to find even one page with 240 headwords on it; that's what it would need to average to make 730k by the end. 76.71.23.238 (talk) 23:41, 29 July 2024 (UTC)
D'oh. Good catch. Thanks for checking it harder than I did. Quercus solaris (talk) 01:09, 30 July 2024 (UTC)
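
The back-of-the-envelope estimate in this thread can be reproduced in a few lines. This is a sketch under the inputs quoted above (about 60 headwords per page, 2300 pages, 730,000 claimed items); the variable names are made up for illustration, and the per-page figure is the commenter's own rough count, not a measured value.

```python
# Inputs as quoted in the thread above; none are independently verified.
pages = 2300
headwords_per_page_estimate = 60   # commenter's rough count per page
claimed_items = 730_000            # "words meanings and phrases"

# Headwords implied by the per-page estimate.
estimated_headwords = pages * headwords_per_page_estimate

# Average headwords per page that would be needed if all
# 730,000 claimed items were headwords.
required_per_page = claimed_items / pages

print(f"Estimated headwords:         {estimated_headwords:,}")
print(f"Required headwords per page: {required_per_page:.0f}")
```

With these inputs the estimate comes to 138,000 headwords, and the average needed to reach 730,000 works out to roughly 317 per page. That is higher than the 240 mentioned above, which only strengthens the point that 730k cannot be a headword count.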