Template talk:Wikipedia languages/Archive 6


Hindi Wikipedia

I would like to inform you that the Hindi Wikipedia has well over 50,000 articles, due to which I think it should be included on the main page by adding it to this template. --Tow Trucker talk 07:57, 27 May 2012 (UTC)

As noted in the template's documentation, "this is not a complete list of Wikipedias containing 50,000 or more articles; Wikipedias determined to consist primarily of stubs and placeholders are omitted."
The Hindi Wikipedia has been evaluated in the past. I just performed our standard 50-article sample again, yielding 46 stubs/placeholders, three short articles and one long article. —David Levy 01:01, 28 May 2012 (UTC)
OK. Thank You. --Tow Trucker talk 01:12, 28 May 2012 (UTC)

WHY 700000?

This is inconsistent with Template:Wikipedias. — Preceding unsigned comment added by Ibicdlcod (talkcontribs) 03:19, 24 August 2012 (UTC)

Edit request on 15 November 2012

Uzbek Wiki

Could we add the Uzbek Wikipedia to the list of Wikipedias with more than 50,000 entries? The Uzbek Wikipedia is currently blocked in the territory of Uzbekistan. Listing uzwiki on the main page of enwiki would help us spread the word about the blockage. Nataev (talk) 10:32, 15 November 2012 (UTC)

  Not done: I added this, and then thought better of it and removed it again. When I tried the 50-article test as described further up on this page, I came up with 49 stubs and 1 short article, so it doesn't look like the Uzbek Wikipedia qualifies for a place here for now. Sorry. — Mr. Stradivarius (have a chat) 11:02, 15 November 2012 (UTC)
OK, fair enough. Since uzwiki is blocked in Uzbekistan and the country has very low rates of Internet penetration, we cannot really expect the quality of articles to be high. For now. Hopefully uzwiki will improve as time goes by. Nataev (talk) 13:12, 15 November 2012 (UTC)

Malay Wikipedia

Salam / Hello. Hopefully someone can move the link for the Malay Wikipedia from the "More than 50,000 articles" section to the "More than 150,000 articles" section. Thank you - 26 Ramadan (talk) 14:27, 14 December 2012 (UTC)

  Done — Mr. Stradivarius (have a chat) 15:54, 14 December 2012 (UTC)

Horizontal lists

Per WP:HLIST, shouldn't this template use {{flatlist}}? Gorobay (talk) 20:06, 14 December 2012 (UTC)

Yes. I've mocked something up in the sandbox as a starter. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:58, 25 January 2013 (UTC)
I have modified the sandbox. Now it looks exactly the same as the current version, but it uses hlist. Gorobay (talk) 02:28, 26 January 2013 (UTC)
It's a mix of hlist and non-hlist (not an anticipated use). The current code should work fine though. Edokter (talk) — 11:13, 26 January 2013 (UTC)
Nice work, thank you. Let's implement it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:39, 26 January 2013 (UTC)
  Done. Edokter (talk) — 15:02, 26 January 2013 (UTC)


Million plus

Please add another row, above the 750,000 line, with:

More than 1,000,000 articles: Deutsch Français Italiano Nederlands

formatted like the existing lines, and remove those from the 750,000 line; per meta:List of Wikipedias#1 000 000+ articles. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:39, 25 January 2013 (UTC)

Done! Ed [talk] [majestic titan] 22:03, 25 January 2013 (UTC)
Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:51, 25 January 2013 (UTC)

Missing languages

Hello there, sorry, I couldn't find out how to start a new topic/request. I'd like to point out that several languages are missing from the 50,000-200,000 category, for example Georgian, which now has 69,000 articles. (Or is this intentional?) Thank you! — Preceding unsigned comment added by 176.241.48.244 (talk) 21:46, 31 January 2013 (UTC)

I moved your message to a new section. In the future, you can create one via the "+" link at the top of the page (between "Edit" and "View history").
Yes, the omissions are intentional. As noted in the template's documentation, "this is not a complete list of Wikipedias containing 50,000 or more articles; Wikipedias determined to consist primarily of stubs and placeholders are omitted."
Some Wikipedias contain large quantities of essentially empty articles, which artificially inflate their counts. So we review them to ensure that this isn't the case. —David Levy 21:44, 1 February 2013 (UTC)

Spanish million

Please move the Spanish Wikipedia (español) to the top line; they now have a million+ articles; see es:Especial:Estadísticas. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:55, 17 May 2013 (UTC)

I actually bumped it up before I saw this request. Congrats to Spanish Wikipedia for reaching this landmark. --ThaddeusB (talk) 20:56, 18 May 2013 (UTC)

Should Hindi Wikipedia be included on the mainpage?

Should our mainpage include an interwiki link to the Hindi Wikipedia? ThaddeusB (talk) 02:33, 19 May 2013 (UTC)

Some statistics

Basic stats
Language | Language (local) | Wiki | Articles | Edits | Active Users | Depth | Native Speakers (million)
Malay | Bahasa Melayu | ms | 218667 | 3549558 | 346 | 18 | 77
Bulgarian | Български | bg | 147582 | 6078163 | 756 | 29 | 6.8
Estonian | Eesti | et | 111292 | 3744442 | 486 | 31 | 1.05
Hindi | हिन्दी | hi | 105420 | 2277175 | 210 | 45 | 180
Serbo-Croatian | Srpskohrvatski / Српскохрватски | sh | 81325 | 1661259 | 216 | 20 | 19
Tamil | தமிழ் | ta | 53465 | 1485352 | 253 | 32 | 70
50 article test stats

I used each Wikipedia's random article button to view 50 articles and tabulated the results. (All are listed Wikipedias except Hindi.) I counted the number of articles with pictures, since adding a picture takes somewhat more effort and is less likely to be done by a bot. Geographic stubs (including 1-2 sentence articles) were counted since these are especially easy targets for rapid stubbing. --ThaddeusB (talk) 18:23, 19 May 2013 (UTC)

Language | Full | Short | Stub | 1-2 sentences or just headers | Dab | List | w/Pics | w/Maint. tags | Geo stubs
Malay 2 2 10 35 1 16 2 35
Bulgarian 4 5 32 5 4 20 9
Estonian 1 5 32 10 1 1 11 4 12
Hindi 1 3 20 26 19 1 18
Serbo-Croatian 3 7 24 14 2 20 7
Tamil 1* 6 17 23 2 1 18 2 3

* Tamil's sole long article was tagged as a machine translation. Tamil avoided the geo stub temptation, but made up for it with tons of 1-2 line articles on cricket players (in my test).

Estimates of total articles in various classes

To be taken with a grain of salt, as the margin of error on a sample size of 50 is ~13.8%.

Language | Decent articles | Stubs | articles+stubs | article+stub+1-2, excl. Geo
Malay | 17500 | 43700 | 61200 | 61200
Bulgarian | 26600 | 94000 | 121000 | 109000
Estonian | 13300 | 71000 | 84600 | 80100
Hindi | 8400 | 42000 | 50600 | 67500
Serbo-Croatian | 16300 | 39000 | 55300 | 66700
Tamil | 7500 | 18000 | 25700 | 47000
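(A minimal sketch of where the figures above come from, assuming the usual normal approximation for a binomial proportion and simple scaling from the 50-article samples; the ~13.8% is the worst-case 95% margin of error, and the article total used below is Hindi's figure from the basic-stats table.)

  import math

  n = 50
  # Worst-case 95% margin of error for a proportion estimated from 50 articles:
  worst_case_moe = 1.96 * math.sqrt(0.5 * 0.5 / n)   # ~0.139, i.e. ~13.8%

  def estimate(total_articles, count_in_sample, sample_size=50):
      # Scale a 50-article sample count up to the whole Wikipedia.
      return total_articles * count_in_sample / sample_size

  print(round(worst_case_moe, 3))      # 0.139
  print(round(estimate(105420, 4)))    # Hindi, 4/50 non-stubs -> ~8434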
Conclusions
  • All tested wikis in this range had 20% or fewer decent articles, so merely looking at 50 articles and concluding that most are stubs is an insufficient test
  • With the possible exception of Bulgarian, the quality differences are too small to be conclusive given the sample size
  • Tamil is especially illustrative of why I feel Hindi should be included. It is from the same region of the world as Hindi, has fewer pages overall, and has similar quality in the articles it does have. In absolute numbers, Hindi likely has an equal number of or more decent articles. Hindi also has more native speakers.

--ThaddeusB (talk) 18:23, 19 May 2013 (UTC)

2012 comment by Eukesh

This is regarding the need for a better set of criteria for Wikipedias to be listed on the main page and to be judged in general (to aid in improving the quality of Wikipedia). I compared a set of articles in Simple English (~90,000 articles and listed) and Hindi (~100,000 and unlisted). The method I used was to check the 10001st, 20001st, 30001st, 40001st, 50001st, 60001st and 70001st longest articles of the two Wikipedias. Beyond the 70,000th article, most articles were very short in both Wikipedias, so they were not compared. Here are the articles that I got

Based on this, I think there is very little difference in article length between the two.

Also, out of the 500 longest articles in each, there were 158 "list of" articles in Simple (and numerous articles like simple:2012 in movies, which are non-featured lists with very little paragraph content) and only 8 "सूची" (list-related) articles in Hindi (please correct me if there are more lists among the longest 500 articles in Hindi), which suggests that there are more content-based articles among the longest 500 articles in Hindi than in Simple.

About the quality of a Wikipedia, regarding stubs and placeholders, I think there is essentially no difference whether they are generated by bots or not. More than that, I think a better judgement can be reached by comparing the quality of the list of articles every Wikipedia should have. What is the use of an encyclopedia with pages this long for a country and comparatively this long for a dance?

At this point, I would also like to request the community here to adopt objective written standards based on reproducible numbers (featured articles, good articles) and statistics for judging the quality of a Wikipedia, rather than just 50 random articles (from which some Wikipedias are exempt and others not, based on some other arbitrary criteria), which, in all fairness, looks like bias. Thank you.--Eukesh (talk) 19:20, 18 December 2012 (UTC)

Why not? I don't understand the relevance of all the above statistics. What do they mean? GeorgeLouis (talk) 03:31, 19 May 2013 (UTC)
One problem with this method is that Wikipedia stores text in UTF-8, which is twice as large for Devanagari (Hindi script) text as for English. So these comparable numbers actually indicate that the Simple English articles are about double the size of the Hindi ones. VanIsaacWS Vexcontribs 15:18, 4 June 2013 (UTC)
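(As a minimal illustration of the encoding point, assuming raw UTF-8 byte counts are the size metric being compared: each Devanagari code point takes three bytes in UTF-8 versus one per ASCII character, so mixed article text, which still contains ASCII markup, digits and spaces, comes out roughly two to three times larger per character of content. The sample words below are arbitrary.)

  hindi = "विकिपीडिया"          # 10 Devanagari characters
  english = "encyclopedia"      # 12 ASCII characters
  print(len(hindi), len(hindi.encode("utf-8")))      # 10 characters, 30 bytes
  print(len(english), len(english.encode("utf-8")))  # 12 characters, 12 bytes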

On the rather thin evidence of a score of random button presses, I see only 30% that are more than a one-liner (85% for en.wiki, 60% for gl.wiki, 50% for simple.wiki, 40% for gd.wiki, 35% for lij.wiki, 25% for ang.wiki), so it is clearly at a very early stage, especially considering the potential readership. So the question is probably: What is the purpose of listing at the Main Page? To bring attention and thereby fuel growth, or to reward the work done already? The Main Page already features both approaches (Featured articles and DYK). I'd opt for drawing attention so that it might grow in this case, especially as there will be much English-Hindi bilingualism. Kevin McE (talk) 09:24, 19 May 2013 (UTC)

So the question is probably: What is the purpose of listing at the Main Page? To bring attention and thereby fuel growth, or to reward the work done already?
Heretofore, it's neither of those. Past discussions have consistently indicated that our goal is to link to the best Wikipedias — not as a reward (though this can be considered a side effect), but as a service to Main Page visitors seeking such content. —David Levy 09:43, 19 May 2013 (UTC)

Further discussion

  • Looking at the past history of this page and Template talk:Main Page interwikis, I noticed Hindi appears to be the most requested interwiki link (and possibly the only language that has been requested multiple times in recent history). I checked and it appears to have the highest depth score (higher than several listed WPs, but the score can be deceptive) and User:Eukesh makes an excellent argument (see above). It's been about 18 months since any link was added, so I went ahead and BOLDly added Hindi. David Levy reverted me, so now it's time to start a discussion... I support the inclusion of Hindi per my comments above & because it is "approximately the sixth largest language" in terms of number of native speakers worldwide (according to Hindi). --ThaddeusB (talk) 02:13, 19 May 2013 (UTC)
    Hindi certainly isn't the only language that has been requested multiple times. (On occasion, we've even received multiple requests from editors sent here via a discussion at the relevant Wikipedia.) As you noted, Hindi is one of the world's most widely spoken languages, so it makes sense that it would be among the most requested.
    Unfortunately (and perhaps due to India's poverty and illiteracy rates), the Hindi Wikipedia's quality is lacking. When I performed our standard 50-article test tonight, I saw 45 stubs and placeholders.
    A placeholder (such as this one) is an article consisting primarily of headings for empty sections, typically machine-generated. The Hindi Wikipedia contains a great many placeholders, whose existence dramatically skews (and renders essentially meaningless) the statistics cited by Eukesh (because the aforementioned headings artificially inflate the quantity of characters without adding any useful information).
    As you noted, a Wikipedia's depth score can be deceptive (due, in part, to the widespread use of multiple bots performing trivial edits). That's why we stopped considering it years ago.
    As discussed in the past, establishing "written standards based on reproducible numbers and statistics" would be counterproductive, as it would encourage Wikipedias to focus their efforts on the technical fulfillment of said criteria. This isn't speculative. When we relied on article counts alone, Wikipedias organized drives to create essentially empty articles (both manually and via the use of bots), specifically for the purpose of crossing our numeric threshold.
    "Featured article" and "good article" already have no consistent definition from one Wikipedia to the next (and therefore cannot be relied upon as relevant standards, let alone ones that Wikipedias wouldn't simply lower until x articles qualified). —David Levy 03:11, 19 May 2013 (UTC)
    The requests bit was only a way of explaining how I came upon this, not really an argument, and this RfC is only about Hindi, not about developing written standards... I did a bunch of random articles on Simple and they were mostly stubs too. I didn't count, but the 10,000th largest article as per above is a stub, so it is probably ~90% stubs. I suspect the same is true of the majority of the 50k Wikipedias listed. As you note, pretty much any stat can be "faked". However, the number of speakers is certainly something that can't be faked and I think it should be considered. They also appear to have an active community as evidenced by a regularly updated main page. I guess that could be faked too, so we can't explicitly rely on it, but it seems like a reasonable thing to consider.
    The placeholders are a concern; let's say they are 10% of Hindi. That leaves them at 90,000 stubs+articles. Should they be penalized because some idiot(s) used a bot to create a bunch of useless articles? I personally don't think so. --ThaddeusB (talk) 03:31, 19 May 2013 (UTC)
    I did a bunch of random articles on Simple and they were mostly stubs too.
    The simple English Wikipedia is a highly unusual case. Its very nature (written in English, but with limitations imposed) tends to discourage the creation of highly developed articles, but it's been argued that it's exceptionally useful to non-native English readers (for whom even stubs can bridge gaps in understanding and assist in translation efforts). But this is controversial (and substantial criticisms have arisen in various discussions), so we probably should discuss the possibility of removing the simple English Wikipedia from the list.
    As you note, pretty much any stat can be "faked". However, the number of speakers is certainly something that can't be faked and I think it should be considered.
    Were we to consider such matters, the number of Hindi readers accessing our main page would be more relevant. But we'd also need to consider the fact that English is an official language of India. (If a Hindi reader also is fluent in written English, the Hindi Wikipedia's utility to him/her is highly questionable.)
    The placeholders are a concern; let's say they are 10% of Hindi.
    The percentage appears to be much higher than that. Regardless, we generally don't distinguish between placeholders and stubs. (I did so in response to Eukesh's comments regarding byte counts.)
    Should they be penalized because some idiot(s) used a bot to create a bunch of useless articles?
    Many/most of the smaller Wikipedias' stubs are bot-generated too (typically based on lists of places, years, etc.) and of similarly low value. I don't ascribe idiocy or regard our response as a penalty. Stubs and placeholders aren't inherently harmful. They simply aren't very helpful to readers (whose experiences we consider when determining which Wikipedias to list). Abandoning our current approach would effectively penalize Wikipedias for not creating stubs and placeholders. —David Levy 04:12, 19 May 2013 (UTC)
    Simple may be an odd case. I'll try comparing to some other 50k Wikipedias tomorrow. As to the last point, the "50 article test" implicitly penalizes Wikipedias for having stubs. If you have 200k articles and 90% are stubs, that is 20k "better" articles, but 5/50 in a test will be such articles. If you have 50k articles and 60% are stubs, that is the same 20k better quality articles with 20/50 showing up in a 50 article test. --ThaddeusB (talk) 05:08, 19 May 2013 (UTC)
    As to the last point, the "50 article test" implicitly penalizes Wikipedias for having stubs.
    No, it prevents us from rewarding Wikipedias for creating stubs and placeholders that artificially inflate their article counts above our minimum inclusion threshold (effectively penalizing the smaller Wikipedias that don't).
    A Wikipedia containing 45,000 stubs/placeholders and 5,000 "better" articles receives the same treatment as one containing no stubs/placeholders and 5,000 "better" articles. If we were to consider only the article counts (as we did originally), the former would receive preferential treatment over a Wikipedia containing 45,000 "better" articles and no stubs/placeholders. (Of course, that wouldn't last long, as the latter Wikipedia would be encouraged to create 5,000 stubs/placeholders.)
    If you have 200k articles and 90% are stubs, that is 20k "better" articles, but 5/50 in a test will be such articles. If you have 50k articles and 60% are stubs, that is the same 20k better quality articles with 20/50 showing up in a 50 article test.
    Right. That's why both hypothetical Wikipedias would be included on our list. We don't apply the 50-article test to the higher tiers. As discussed in the past (sometimes in response to the argument that the English Wikipedia contains a high percentage of stubs), we care about the absolute quantity of "better" articles, not the relative quantity. For the lowest tier, we measure the latter purely as a means of gauging the former. —David Levy 05:56, 19 May 2013 (UTC)
    There are two Wikipedias with over 300k articles (ceb & war) that aren't listed because they are exceptionally poor quality, so clearly quality considerations still apply when 200k is hit (as well they should). The intent may be to judge the absolute # of quality articles, but if you are gauging that only on the %age found in a 50-article sample, you are favoring Wikipedias closer to 50k articles than to 200k. A 100k Wikipedia needs twice as many decent articles as a 50k Wikipedia to get the same percentage; a 200k Wikipedia needs 4x as many decent articles. You are penalizing Wikipedias for having stubs, at least if they are between 50k and 200k total articles. --ThaddeusB (talk) 15:40, 19 May 2013 (UTC)
    There are two Wikipedias with over 300k articles (ceb & war) that aren't listed because they are exceptionally poor quality, so clearly quality considerations still apply when 200k is hit (as well they should).
    Indeed. It's unusual for higher-tier Wikipedias to be called into question, but it does occur. (Bots needn't stop creating articles when they reach 50,000, after all.)
    The intent may be to judge the absolute # of quality articles, but if you are gauging that only on the %age found in a 50-article sample, you are favoring Wikipedias closer to 50k articles than to 200k. A 100k Wikipedia needs twice as many decent articles as a 50k Wikipedia to get the same percentage; a 200k Wikipedia needs 4x as many decent articles. You are penalizing Wikipedias for having stubs, at least if they are between 50k and 200k total articles.
    You seem to be under the impression that we simply check 50 articles and exclude the Wikipedia if x are stubs/placeholders. Were that so, your assessment would be accurate. But it isn't.
    We have no formal cutoff point. (As discussed above, such a criterion would encourage Wikipedias to game the system.) In borderline cases, we consider various factors (including, but not limited to, the Wikipedia's total size). When in doubt, our general practice is to err on the side of inclusion. (We omit Wikipedias obviously in poor shape.) —David Levy 16:09, 19 May 2013 (UTC)
    You are the one who called the 50 article test the standard way of deciding, not me. Hindi is not "obviously" in poor shape any more than many of the listed Wikipedias - all 6 in my test are in the range of 80-90% stubs or worse. --ThaddeusB (talk) 18:49, 19 May 2013 (UTC)
    You are the one who called the 50 article test the standard way of deciding, not me.
    At no point did I state that we simply gauge the relative percentage of stubs/placeholders and omit the Wikipedia if it's above a certain figure. You apparently inferred that, so I've explained that you were mistaken.
    Hindi is not "obviously" in poor shape any more than many of the listed Wikipedias - all 6 in my test are in the range of 80-90% stubs or worse.
    I just noticed that the Tamil Wikipedia was inserted by Sundar (a bureaucrat there), who evidently was under the impression that it automatically qualified for inclusion when it reached 50,000 articles (despite the documentation's explanation to the contrary). I've removed it from Template:Wikipedia languages. (For some reason, it wasn't added to Template:Main Page interwikis.)
    According to your analysis, all of the other Wikipedias tested contain significantly more "decent articles" than the Hindi Wikipedia does. —David Levy 12:36, 20 May 2013 (UTC)
    You say the 50 article test is just a gauge, yet have supplied exactly zero other criteria that are considered. --ThaddeusB (talk) 20:50, 20 May 2013 (UTC)
    As I noted, we have no formal criteria, which would encourage Wikipedias to game the system (as in the past).
    When the percentage of stubs/placeholders is overwhelmingly high, we rarely look beyond the 50-article sample. In borderline cases, we look for signs of relative quality as deciding factors. That's when we value proper stubs over placeholders and multi-paragraph stubs over smaller ones. That's when we look for images. It's subjective, but as I said, we err on the side of inclusion.
    You're welcome, of course, to propose an alternative evaluation method. —David Levy 21:42, 20 May 2013 (UTC)
  • Go for it. Seems reasonable. --Jayron32 02:42, 19 May 2013 (UTC)
    On what do you base this determination? Have you examined the Hindi Wikipedia's articles? Are you aware that Eukesh actually cited placeholders (articles whose character counts are artificially inflated via the inclusion of machine-generated headings for empty sections) as evidence that the Hindi Wikipedia's articles are superior in size? —David Levy 03:11, 19 May 2013 (UTC)

I do agree that there are concerns for "penalizing" Wikipedias that have aggressively created a ton of stubs, but the way around that is simply to take a larger sample of the random article test. I just did an independent 51 random article sample, of which there appeared to be 1 full article (hi:गुप्त ऊर्जा), perhaps 2 short but decent-looking articles, 47 placeholders / stubs that had only a sentence or two and often empty headings, and 1 piece of apparent spam that had been ignored (hi:अध्याय २ साख्यंयोग). That doesn't fill me with confidence that there are in fact 20K "real" articles hiding somewhere in Hindi WP. SnowFire (talk) 05:27, 19 May 2013 (UTC)

The (expected) percentage of stubs will be the same whether you take a sample of 50 or 100 or 200. (The margin of error will decrease as the sample size increases, though.) Relying on the %age has the same problem regardless... I doubt any of the 50-200k Wikipedias have 20k decent articles. --ThaddeusB (talk) 15:40, 19 May 2013 (UTC)
Reducing the error margin is exactly the point, yes. If Hindi WP had a million articles after some extremely aggressive bot campaign, the difference between "20k decent articles" and "2k decent articles" is the difference between a 2% rate of non-stubs and a 0.2% rate of non-stubs. So a 50-article check that finds one non-stub isn't very reliable for convincing people that the rate really was 1/50 = 2% = 20k decent articles in a hypothetical 1 million article Hindi WP. My point is still that there's a way to not penalize editions with a ton of stubs by taking a large sample size to get a more accurate guess of the proportion of real articles.
Even if standards are loosened, I don't think Hindi WP hits 10k non-stub articles, though. And I suspect that plenty of the 50k-200k article tier can at least manage 10k non-stubs. SnowFire (talk) 17:00, 19 May 2013 (UTC)
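(For reference, a minimal sketch of the sample-size point above, using the usual normal approximation for a binomial proportion; at rates this low the approximation is rough and an exact binomial interval would be somewhat wider, but the 1/sqrt(n) scaling is the same.)

  import math

  def margin_of_error(p, n, z=1.96):
      # Approximate 95% half-width of the confidence interval for a proportion.
      return z * math.sqrt(p * (1 - p) / n)

  for n in (50, 200, 1000):
      print(n, round(margin_of_error(0.02, n), 3))
  # 50 -> 0.039, 200 -> 0.019, 1000 -> 0.009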
The minimum cutoff is definitely less than 20k as is (see stats above). My test yielded 4/50 as non-stub for Hindi; David Levy found 5/50 in his test (according to his edit summary). 10% (or just under) would yield 10k decent articles, which seems to be closer to the actual minimum at present (Tamil has an estimated 7,500), so Hindi is borderline by the numbers. Considering it is (by far) the most widely spoken language that meets the 50k requirement and is not included, that should be more than enough to push it over the edge, IMO. --ThaddeusB (talk) 18:46, 19 May 2013 (UTC)
As explained above, the Tamil Wikipedia's insertion was inappropriate and has been reverted. I'd have done this in December if I'd noticed at the time. (You stated above that "it's been about 18 months since any link was added", so you evidently overlooked Sundar's edit as well.) —David Levy 12:36, 20 May 2013 (UTC)
I looked at Template:Main Page interwikis to get the 18 month figure, which apparently never listed Tamil. --ThaddeusB (talk) 20:50, 20 May 2013 (UTC)
The possibility occurred to me. Sundar apparently missed that part of the documentation too. —David Levy 21:42, 20 May 2013 (UTC)
  • I just spent several hours testing 5 listed Wikipedias with sizes from 50k to 200k, plus Hindi. The stats are above. The takeaway is that all of them are mostly stubs or worse. Hindi appears to be on roughly the same level as the others in most categories. --ThaddeusB (talk) 18:49, 19 May 2013 (UTC)
    According to your analysis, all of the Wikipedias tested (excepting the Tamil Wikipedia, which I've removed) contain significantly more "decent articles" than the Hindi Wikipedia does. —David Levy 12:36, 20 May 2013 (UTC)
    Actually, given the margin of error, the differences are insignificant. I found 8% decent articles on Hindi in my test, but with a 13+% margin of error, the true figure could easily be as low as (essentially) 0% or as high as 21%. On Estonian, for example, I found 12%, but that could easily be 0-25%. There is very little material difference between 0-21 and 0-25. It is possible Hindi is substantially inferior, but that conclusion cannot be reached by a 50-article sample. Furthermore, if you consider proper stubs as useful (they should be useful), but exclude placeholders as useless (a good idea), only Bulgarian is obviously superior among the tested wikis. We either need a better test, or your statement that only Wikipedias that are obviously inferior are denied is false. --ThaddeusB (talk) 20:50, 20 May 2013 (UTC)
    Actually, given the margin of error, the differences are insignificant.
    Feel free to take larger samples. Note that disambiguation pages traditionally aren't counted.
    We either need a better test,
    Feel free to propose one.
    or your statement that only Wikipedias that are obviously inferior are denied is false.
    I wrote "obviously in poor shape" (emphasis added). Perhaps our current methodology's limitations have led to the erroneous listing of Wikipedias when it wasn't obvious that they were in such poor shape. (As I noted, we err on the side of inclusion.) —David Levy 21:42, 20 May 2013 (UTC)
  • If we are going to keep denying requests on the basis of 50-article test, we should apply this test to all languages currently listed on the Main Page, and remove those for which a request to add it would be denied if the link was not already present. This will make the process seem a lot fairer and less arcane to editors coming here with their requests. — This, that and the other (talk) 11:49, 20 May 2013 (UTC)
    No Wikipedia's inclusion is grandfathered. When such deficiencies are pointed out, we act on this information. —David Levy 12:36, 20 May 2013 (UTC)
    I don't think I was clear enough; what I meant is we should actually go and carry out the 50-article test on each wiki currently listed on the main page, and act on the results we get. — This, that and the other (talk) 08:07, 22 May 2013 (UTC)
    You're more than welcome to, but I believe that every realistically borderline Wikipedia has been checked at some point. (The Tamil Wikipedia was an exception, but that error has been rectified.)
    Because we seek to gauge the absolute (not relative) quantity of non-stub/placeholder articles, it's highly unlikely that any Wikipedia previously deemed appropriate for inclusion would be rejected under the same standard. (Whether a different standard should be applied is a separate matter.) —David Levy 08:52, 22 May 2013 (UTC)
    In your test of Hindi, you found 5/50 non-stubs and rejected it. In my test of Estonian (which has a very similar number of total pages) I found 6/50 non-stubs. Most likely Estonian would have been rejected if it had been newly added and you got those results. Given the variance of a 50-article sample, I think it is near certain some Wikipedias currently listed would be rejected in a fresh test. --ThaddeusB (talk) 13:44, 23 May 2013 (UTC)
    I personally evaluated the Estonian Wikipedia in the past, and I just did so again. My fifty-article sample yielded nine non-stubs (by my standard, which might differ from yours), along with numerous large stubs (many with multiple paragraphs/sections) and zero placeholder articles (but one article containing a couple of placeholder sections).
    The Estonian Wikipedia clearly is in much better shape than the Hindi Wikipedia is. Do you dispute this? —David Levy 14:25, 23 May 2013 (UTC)
    No, it was probably my sample that was out of the ordinary. --ThaddeusB (talk) 03:54, 25 May 2013 (UTC)
    I'm pretty sure every 50k Wikipedia would be denied if it was not already listed, as every one was at least 80% stubs in my test. --ThaddeusB (talk) 20:50, 20 May 2013 (UTC)
    We have no such acceptance threshold. —David Levy 21:42, 20 May 2013 (UTC)

By that measure, the following changes would be made: Apteva (talk) 23:27, 20 May 2013 (UTC)

lc | language | speakers | views | articles | action
eo | Esperanto | 1 M | 7363 | 178052 | remove
eu | Basque | 1 M | 6446 | 150260 | remove
nn | Nynorsk | 5 M | 6050 | 99830 | remove
gl | Galician | 4 M | 5280 | 101161 | remove
kk | Kazakh | 12 M | 12730 | 199976 | add
ka | Georgian | 4 M | 9957 | 74020 | add
hi | Hindi | 550 M | 8231 | 97151 | add
bs | Bosnian | 3 M | 8127 | 32466 | add
az | Azeri | 27 M | 7986 | 94755 | add
lv | Latvian | 2 M | 7974 | 47756 | add
Interesting proposal. It certainly would make things objective without giving other Wikipedias a way to (easily) fake their way onto our main page. For the record, Icelandic isn't actually listed at present. --ThaddeusB (talk) 00:13, 21 May 2013 (UTC)
Corrected. Somehow I mixed up Lithuanian with Icelandic. Apteva (talk) 00:43, 21 May 2013 (UTC)
This is somewhat similar to how www.wikipedia decides what to put in their top 10 (or however many) that they put around the globe. Nil Einne (talk) 07:00, 21 May 2013 (UTC)
Views *are* quite easily faked if people use them for anything important. The real viewership is a very important statistic, so let's not give people incentives to game it. (A single student at a university who writes a script to ping their favorite language's Wikipedia repeatedly using the uni's fat pipes can have a pretty noticeable effect.) The "amount of real articles, estimated" metric with a dash of "native speakers" is definitely the best guess, I think. SnowFire (talk) 17:59, 21 May 2013 (UTC)
Does the "usage (views per hour)" statistic exclude bots? —David Levy 07:29, 21 May 2013 (UTC)
For now it includes bots, but excluding them is planned.[2] Basically the easiest method would be just to figure out the percentage of edits that are bots (roughly 15% on en), and adjust the counts accordingly. Apteva (talk) 23:19, 21 May 2013 (UTC)
Presumably, most hits come from readers, so I hope you don't mean just knocking 15% off the total hits. --ThaddeusB (talk) 00:51, 22 May 2013 (UTC)
Indeed, the figure wouldn't carry over to page views. But bots do load pages without editing them, so simply subtracting the number of bot edits from the number of page views wouldn't be accurate either.
And as SnowFire noted, adopting such a criterion would encourage the artificial inflation of page view counts, much like the "Let's get on the English Wikipedia's main page by creating empty articles!" drives that were organized when we considered article counts alone. Actually, this would be worse, as individual users could do it on their own, with no approval from the various Wikipedias' communities (which would be powerless to stop them). —David Levy 02:14, 22 May 2013 (UTC)
If this metric is used, the correct thing to do would be to compile a table called "estimated user page views/week", and fudge the numbers at will to try to remove known errors. For example, if we get a count of 100,000 from the server log, and know that roughly 16.9% are bots and students hitting the server to inflate the count, we enter 83,100 into the table and go on to the next row. But in the interest of openness include a note or a column for that. Apteva (talk) 00:27, 23 May 2013 (UTC)
How would we determine the approximate percentage of page views stemming from bots and deliberate attempts to inflate the count? —David Levy 00:39, 23 May 2013 (UTC)
This page has a list of bot counts. Some of the entries are possible bots. As to attempts to inflate the count, some of those would be easy to spot, and some hard. We use grok wiki stats extensively for deciding primary topics, and there are often spikes, sometimes because a page was linked from the main page, sometimes for unknown reasons. Sometimes I just go back a year and look to see what the stats looked like back then. Apteva (talk) 01:01, 23 May 2013 (UTC)

Hi all, this link may give some overall idea about Indian-language Wikipedias: http://shijualex.in/analysis-of-the-indic-language-statistical-report-2012/ — Preceding unsigned comment added by 112.135.223.2 (talk) 07:03, 22 May 2013 (UTC)

  • The diplomatic thing to do is to look at the above table, and add kk and hi and not remove anything. Apteva (talk) 01:08, 23 May 2013 (UTC)
  • Hi all. It seems the discussion has moved in a more progressive direction, and there are already some users looking for alternative ways to test the qualitative shape of the smaller Wikipedias, typically ranging from 50k to 100k articles, and thus to replace the old-fashioned "50-article test", whose statistical error may easily lead to a misleading outcome. I was ready to propose similar changes some time ago, but wasn't able to do so because of a lack of time. Now seems a good time, and your fruitful discussion gives enough to build on. The changes I'd like to propose in determining a Wikipedia's general quality are listed in turn:
  • replace the "50-article test" with a broader analysis of article quality, which will include screening different tiers of articles sorted by length to yield an overall view of the number of articles over some length threshold;
By using the "50-article test" you get figures that exhibit high volatility and statistical error (Thaddeus has found it to be ~13.8% in his analysis), which makes the test less representative. You can of course re-assess your findings by repeating the test, or take a greater sample, and thereby improve the overall view. Nonetheless, a better alternative is to analyse different tiers of articles sorted by length and pick some of them to deduce whether the outcome of the 50-article test represents your findings correctly or not. For example, if you get 10 non-stub/placeholder articles out of 50 for a 100k Wikipedia, it implies that 20k of its size comprises articles that are larger than stubs or placeholders. To test the validity of your findings you can pick the tier of articles from the 19,501st to the 20,000th largest and thus get a better view of the quality of that Wikipedia. In the case of the Hindi Wikipedia, four articles were found to be non-stub, which implies approximately 8,400 non-stub articles relative to its total size. Then you check the tier from the 8,001st to the 8,500th largest article and pick some articles from the list to test your validity. The main advantage of this method is that you can perform a broader analysis of the stubs and placeholders as well. In the same case, there are 20 stub articles, implying 42,000 relative to its total size. This translates to a wider tier ranging from the 8,501st to the 42,000th largest article, where you can easily find the average quality of the stub articles by selecting different tiers. (A short sketch of this arithmetic appears after this list.)
  • introduce the activity-level as a useful criterion to gauge the quality of a Wikipedia;
Some may say that having strict criteria means that one can easily game the system by focusing on fulfilling them, but it's almost impossible to game the number of active users. I really don't see the point of wasting time creating accounts to artificially increase this number. The number does vary, owing to factors such as seasonal activity growth or periodic effects, so plotting data over a longer period (~6 months) might be a necessary condition.
  • fully disregard the page views as a useful criterion to determine Wikipedia's quality.
This is one of the criteria that could be easily manipulated. You can simply press F5 repeatedly to increase the number of page views on individual articles. To hide the discrepancy so it cannot be found and subtracted in a later analysis, you can perform the same trick every day to keep the number of page views at a consistently higher level. If you need 5 min. a day to make 1,500 views (assuming 5 page views per second), then you can get 6,000 for 20 min. on a daily basis, and 180,000 monthly with the same time spent every day. To take the manipulation further, you can engage the whole community in this game and thereby easily reach an increase of 15-20% over the regular figure. Even if we set aside the possibility of manipulating these figures, there are numerous other factors making the comparison between the page views of two Wikipedias more complex. These factors include: the total number of speakers of the language (e.g. a language spoken by 5 million people is simply not comparable to one spoken by 100 million); the percentage of speakers who have access to the Internet (impossible to measure, as most languages are spoken by people inhabiting more than one country); the level of multilingualism among the speakers of that language (impossible to measure as well); the level of bot activity (the number of bot edits can be measured relatively easily, but it may put an additional burden on the analysis); etc.
  • I'd like to invite all of you to discuss the proposed changes. My name appeared in some discussions in the past relating to the Macedonian Wikipedia (my home wiki), which is another example of a Wikipedia that far exceeds the 50k minimum but is still not listed in the template. Best regards.--Kiril Simeonovski (talk) 12:01, 25 May 2013 (UTC)
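(A rough sketch of the tier arithmetic in the first proposal above, assuming the wiki's list of pages sorted by length is available, e.g. via its special page for articles by size; the function names here are illustrative, not an existing API.)

  def implied_rank(total_articles, hits_in_sample, sample_size=50):
      # Number of articles implied to be above the sampled quality cutoff.
      return round(total_articles * hits_in_sample / sample_size)

  def tier_to_check(rank, width=500):
      # The 500-article tier straddling the implied rank, e.g. 19501-20000.
      upper = ((rank + width - 1) // width) * width
      return upper - width + 1, upper

  print(tier_to_check(implied_rank(100000, 10)))   # 10/50 non-stubs -> (19501, 20000)
  print(tier_to_check(implied_rank(105420, 4)))    # Hindi example   -> (8001, 8500)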
    All good suggestions. I left a notice on David Levy's talk page last week to see if he could (potentially) get behind a bot that assessed total quality (rather than a highly variable 50-article test) and he didn't reply. I think it is a worthwhile idea; of course, the devil is in the details (i.e. what exactly is considered). In truth, there is very little interest here, so I am hesitant to proceed w/o explicit approval of the idea by those likely to object (mostly David). --ThaddeusB (talk) 06:23, 30 May 2013 (UTC)
    My apologies. I overlooked your message until reading the one above. (Someone else posted on my talk page seven minutes later, and I failed to notice that I had two new messages.)
    In response to your question, I support the idea in principle. Assuming that a bot script meeting your description can be developed and used as a tool (not a replacement for human judgement), I don't see how it could be anything other than helpful. —David Levy 23:16, 30 May 2013 (UTC)
    No worries, I figured it was an oversight and nothing intentional. I'll put an "analysis bot" on my to do list. --ThaddeusB (talk) 05:11, 1 June 2013 (UTC)

Regarding my method

The method that I used to compare the Hindi and Simple English Wikipedias was roughly based on a percentile system to measure the central tendency of articles based on their sizes. It showed that the hypotheses generated on the basis of 50 random articles are inadequate, as the sample size is too small. One can clearly see that the articles in Hindi (and also the length of articles) are better than the corresponding articles in Simple English, as per my findings. I strongly believe that better statistical methods must be used to measure the central tendency and dispersion of the articles. At least something better than 50 random articles. One can delve into subjective criteria of "quality", which are more prone to observer bias. Even after a study of central tendency shows that the Hindi wiki is superior to Simple English when it comes to article size (and length of articles), people are still not willing to accept it. This clearly shows an observer bias. It is clearly an unscientific assumption as such. Hypothesis-testing methods such as 50 random articles should be reported with their p-values and other statistical parameters, which clearly has not been done. Also, why were 50 articles chosen in the first place? Why not 75 or 100 or 10? Is there any statistical relevance to 50? I don't think the adamant proponents of the test have any statistically relevant answer to that. I think the central tendency and dispersion of all articles can be used to generate a Gaussian curve, which would be the most accurate test for article length. In its absence, we can still use the list of articles by length to generate percentile-based data. Random sampling is only an alternative when we do not have any of these methods available. Just imagine a scenario where one determines the average height in a country of 100,000 people using 50 random people in the street as the standard method when the height data for the whole population is available! Once again, I am just delving into the quantitative dimension of the Wikipedias as such. This has nothing to do with the qualitative aspect. Thank you. --Eukesh (talk) 00:48, 12 June 2013 (UTC)
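(To illustrate the kind of reporting asked for above, a sketch of a two-sided two-proportion z-test comparing two of the 50-article samples quoted earlier on this page, e.g. the 4/50 non-stubs found for Hindi against the 9/50 found for Estonian. The normal approximation is rough at counts this small; the point is only that such samples rarely yield a statistically significant difference.)

  import math

  def two_proportion_p_value(x1, n1, x2, n2):
      # Two-sided p-value for H0: the two underlying proportions are equal.
      p1, p2 = x1 / n1, x2 / n2
      pooled = (x1 + x2) / (n1 + n2)
      se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
      z = abs(p1 - p2) / se
      return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

  print(round(two_proportion_p_value(4, 50, 9, 50), 2))   # ~0.14, not significant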

Telugu Wikipedia

I see a lot of interesting discussion going on regarding the inclusion of Hindi. At the same time, please evaluate the Telugu Wikipedia for inclusion in the 50,000+ section. te.wikipedia has considerably improved since its last evaluation. Thanks --వైజాసత్య (talk) 08:49, 2 June 2013 (UTC)

Swedish million

Please move the Swedish Wikipedia (svenska) to the top line, as they/we now have a million+ articles. See article Swedish Wikipedia for update and links to sources. Many thanks in advance.--Paracel63 (talk) 12:33, 16 June 2013 (UTC)

At the same time, perhaps Cebuano wp (359k articles) and Waray-waray wp (377k articles) could be promoted to the 200,000+ group? Lsj (talk) 14:32, 16 June 2013 (UTC)
There are also so many things to add in 50k groups, so why just change it into 100k groups? --kwan-in (talk) 04:27, 18 June 2013 (UTC)
Request fulfilled by User:David Levy. Harryboyles 05:33, 18 June 2013 (UTC)

Request for template updating

-- — Preceding unsigned comment added by Zlobny (talkcontribs) 18:28, 18 July 2013‎ (UTC)

Polish (polski) should be moved to the 1,000,000+ line

...As https://pl.wikipedia.org/wiki/ bumped past one million articles on Tuesday, or thereabouts. Rwessel (talk) 00:51, 27 September 2013 (UTC)

Somehow I didn't see the entry just above here when I posted this, so this is (obviously) a duplicate request. Sorry. Rwessel (talk) 00:59, 27 September 2013 (UTC)

Polish Wikipedia

Polish Wikipedia has already more than 1 000 000 articles. --Brateevsky (talk to me) 18:54, 25 September 2013 (UTC)

  Done Thingg 14:07, 27 September 2013 (UTC)

Serbian Wikipedia miscategorized

copied from Talk:Main_Page
The Wikipedia in Serbian has well over 200000 articles, but it has been listed in the "over 50000" category instead of the correct one for at least a few days now. Possibly the same goes for some of the others. — Preceding unsigned comment added by Spa (talkcontribs) 12:23, 29 August 2013

Hi Spa, please remember to sign your talk page posts with ~~~~. I've copied this query here where this sort of thing is handled, please discuss here. Edgepedia (talk) 11:57, 29 August 2013 (UTC)
  Done Thingg 14:16, 27 September 2013 (UTC)

Georgian wikipedia

Hello. The Georgian Wikipedia now has more than 75,000 articles and it should be included in the list. Can you please add the Georgian wiki page to the list? georgianJORJADZE 15:04, 20 August 2013 (UTC)

Heeeeeeelo. Is anybody here? georgianJORJADZE 13:12, 22 August 2013 (UTC)
  Done. Depth category is sufficient (38) for inclusion in my opinion. Thingg 14:18, 27 September 2013 (UTC)

Edit request on 29 September 2013

Update to use the language parser function for the title as well. Change is in the sandbox. Test cases here and here. Raw output diff here. Lfdder (talk) 09:46, 29 September 2013 (UTC)

  Done --Redrose64 (talk) 11:11, 29 September 2013 (UTC)

Proposed update of lowest category to 100k

A somewhat arbitrary cutoff for quality based mostly on the depth column (>=~18) on the Meta page produces the following (200, 400, and 1000k are the same as they are currently):

If no one has any objections, I will make the change in a few days. Thingg 14:01, 11 October 2013 (UTC)

Thanks very much for inviting me to comment.
As discussed previously, we stopped relying on "depth" as a criterion because it can be skewed (either intentionally or as a side effect) via edits by multiple bots. This is a factor in the decision to omit the Hindi Wikipedia, which consists largely of stubs and placeholders. The oʻzbekcha Wikipedia does too. This bot-generated article was the first random page that loaded for me. As you can see, it contains an infobox, two sentences, and two empty sections. But it's been edited by five different bots — a process that, when repeated throughout the site, artificially inflates the reported "depth". I saw other stubs/placeholders edited by as many as six bots (and no humans).
I see that you added the Georgian Wikipedia (on the basis of its "depth") on 27 September. This was the first random page that loaded for me. I won't even attempt to count the number of bots that have edited it. Please see the archives for past discussions regarding the Hindi and Georgian Wikipedias' omission.
I just reverted the Georgian Wikipedia's insertion, so the proposed change would result in the removal of four Wikipedias (including the Latvian Wikipedia, which you inserted on the same day) and the addition of three Wikipedias of lower quality than those removed. —David Levy 17:11, 11 October 2013 (UTC)
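(For context, a minimal sketch of why bot edits skew the score, assuming the depth formula documented on Meta, roughly edits/articles × non-articles/articles × (1 - articles/total pages); if the exact definition differs, the point stands: edit counts sit in the numerator, so repeated trivial bot edits raise the score without adding content. The figures below are made-up round numbers.)

  def depth(edits, articles, non_articles):
      total = articles + non_articles
      return (edits / articles) * (non_articles / articles) * (1 - articles / total)

  # 100k articles, 150k other pages, 1M edits by humans:
  print(depth(1_000_000, 100_000, 150_000))                # -> 9.0
  # The same wiki after bots add five trivial edits per article:
  print(depth(1_000_000 + 5 * 100_000, 100_000, 150_000))  # -> 13.5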
We no longer rely on "Depth" as a useful criterion to gauge anyone's quality, but it doesn't mean that we should rely on David Levy's previous findings that lack factual accuracy. That is, if one Wikipedia was omitted several times in the past, it has to be taken with a grain of salt in the future and stripped for good from the right to be included again. A very fruitful discussion was opened on this page earlier this year, with some users complaining about the use of the 50 random-article test and at the same time coming up with new solutions to replace this evaluation method, but it unfortunately was archived with only an affirmation to take action, which didn't yield any serious results at all.--Kiril Simeonovski (talk) 20:41, 11 October 2013 (UTC)
We no longer rely on "Depth" as a useful criterion to gauge anyone's quality, but it doesn't mean that we should rely on David Levy's previous findings that lack factual accuracy.
The findings aren't mine alone, nor do I assert that they're sacrosanct. I've cited the most recent determinations, but new evaluations/discussions would be entirely appropriate.
That is, if one Wikipedia was omitted several times in the past, it has to be taken with a grain of salt in the future and stripped for good from the right to be included again.
No one advocates such an approach. Some Wikipedias have been rejected repeatedly because their inclusion has been requested repeatedly in the absence of substantial improvement.
A very fruitful discussion was opened on this page earlier this year, with some users complaining about the use of the 50 random-article test and at the same time coming up with new solutions to replace this evaluation method, but it unfortunately was archived with only an affirmation to take action, which didn't yield any serious results at all.
I expressed support for a hypothetical improved method. At no point have I asserted that the current one is perfect or set in stone. —David Levy 22:22, 11 October 2013 (UTC)
As for the proposal to raise the minimum threshold to 100,000 articles, it's not a good idea in my opinion, since very few quality Wikipedias have reached more than 100k. Most of the growth was among those already included, and involved enormous bot activity to artificially inflate article counts from the low to the high hundreds of thousands, or from the high hundreds of thousands to over one million articles. The length of the sidebar is still in a normal range of 40-50 links, so there is no need for an early and premature change.--Kiril Simeonovski (talk) 20:56, 11 October 2013 (UTC)
I agree. —David Levy 22:22, 11 October 2013 (UTC)
I am ok with whatever. I was not aware of the decision regarding depth. Thingg 02:15, 12 October 2013 (UTC)
David Levy, why did you remove the Georgian wiki link from the template? It's the only link that you've removed. Please explain yourself. Jorjadze (ჯჯჯ) 08:03, 12 October 2013 (UTC)
My edit summary was "per the talk page" because I explained above. Have you read this discussion? —David Levy 15:23, 12 October 2013 (UTC)
Yes, I've read the above and it is groundless. The Georgian wiki has more than 70K articles, so why did you remove it from the list? Jorjadze (ჯჯჯ) 15:30, 12 October 2013 (UTC)
Did you find my explanation unclear, or are you challenging the underlying concept's validity? Either way, please elaborate. —David Levy 15:42, 12 October 2013 (UTC)

Macedonian Wikipedia

I see that the template has recently been updated with the inclusion of links to the Georgian and Latvian Wikipedias, so it seems timely to request the inclusion of the Macedonian Wikipedia, which now has more than 73,000 articles, or 46% above the minimum threshold for inclusion.--Kiril Simeonovski (talk) 20:43, 8 October 2013 (UTC)

As mentioned in numerous discussions (including those in which you've participated), quantity isn't the sole inclusion threshold. (I won't bother reiterating practices of which you're fully aware). Thingg's addition of the Georgian Wikipedia apparently stemmed from a misunderstanding of our criteria. The Latvian Wikipedia, conversely, recently reached 50,000 articles and appears to be in decent condition. —David Levy 17:11, 11 October 2013 (UTC)
Sure it was discussed in the past, but things may rapidly change. Your premature conclusion loosely based on your empirical evidence you've collected by using your own evaluation methods demonstrates that the omission of one edition is for good and it will never change your personal opinion regardless of one Wikipedia's quality. But please don't forget that the first time it was when the Macedonian Wikipedia reached 50,000 articles, the second time when it counted app. 62,000 and now it has more than 73,000. If your findings remain the same in relative terms, then the number of quality articles has automatically increased by more than 46%, which, if there had been 10,000 quality articles before, would mean there are now almost 15,000. Hence, even if you persist in opposing inclusion without taking any action, your previous findings are enough to conclude that the number of quality articles increases while the minimum threshold remains the same. So it would be sincerely appreciated if you took some action to raise the barrier, or made a new evaluation, before repeating the same thing again and again. Please also note that there has been a discussion (in which you were one of the fellow users who have commented) with different proposals to replace the old-fashioned random-article test with other techniques to evaluate one Wikipedia's quality.--Kiril Simeonovski (talk) 20:24, 11 October 2013 (UTC)
Sure it was discussed in the past, but things may rapidly change.
Agreed. I'm not citing the discussions as evidence that nothing has changed or that past decisions are sacrosanct. I'm citing them to point out that you're aware of our current conventions (which you didn't acknowledge in your message). We certainly could decide to adopt different conventions (and perhaps we should), but this has not yet occurred.
Your premature conclusion loosely based on your empirical evidence you've collected by using your own evaluation methods
I haven't unilaterally imposed the evaluation methods.
demonstrates that the omission of one edition is for good and it will never change your personal opinion regardless of one Wikipedia's quality.
That isn't so. I recall personally adding Wikipedias that previously missed the cut. And despite past accusations to the contrary, I have no prejudice against the Macedonian Wikipedia or its contributors.
But please don't forget that the first time it was when the Macedonian Wikipedia reached 50,000 articles, the second time when it counted app. 62,000 and now it has more than 73,000.
The Macedonian Wikipedia wasn't in borderline condition when it reached 50,000 articles. The subsequent quantity increase wouldn't change that.
However, it does appear that the average article quality has improved. I still see many placeholder sections, but a new discussion of whether the Macedonian Wikipedia is now in good enough condition to warrant inclusion would be entirely reasonable. My above objection pertains strictly to the premise that its inclusion should be automatic because it "now has more than 73,000 articles".
Please also note that there has been a discussion (in which you were one of the fellow users who have commented) with different proposals to replace the old-fashioned random-article test with other techniques to evaluate one Wikipedia's quality.
And I expressed support for a hypothetical formula, provided that it were to account for all of the longstanding concerns. At no point have I asserted that the current methodology is perfect or set in stone. —David Levy 22:22, 11 October 2013 (UTC)
Well, then you're encouraged to take some action in evaluating the Macedonian Wikipedia once again. I hope that your findings might be different or enough to warrant inclusion this time.--Kiril Simeonovski (talk) 14:39, 12 October 2013 (UTC)
Okay, I'll perform a more thorough examination sometime soon. (Please bear with me, as I've been extremely busy lately.) Irrespective of my findings, I'll invite others to comment. —David Levy 15:23, 12 October 2013 (UTC)
Thanks. Don't hurry if you're busy with something else. Please also take a look at the special pages providing direct links to the articles sorted by different categories on each project. For instance, you can find a list of 500 articles sorted from the 49,501st to 50,000th longest article or another list of 500 articles sorted from 9,501st to 10,000th shortest.--Kiril Simeonovski (talk) 18:04, 12 October 2013 (UTC)
I should become less busy soon (hopefully) and will give this matter my full attention when that occurs. —David Levy 17:22, 2 November 2013 (UTC)

Albanian Wikipedia

Recently, the Albanian Wikipedia has surpassed 50,000 articles, and it needs to be included in the 50k category. Thank You. Visi90 (talk) 13:14, 2 November 2013 (UTC)

As noted in the template's documentation, "this is not a complete list of Wikipedias containing 50,000 or more articles; Wikipedias determined to consist primarily of stubs and placeholders are omitted." The Albanian Wikipedia appears to consist primarily of stubs and placeholders. —David Levy 17:22, 2 November 2013 (UTC)

Welsh Wikipedia, 50,000

Welsh Wikipedia has passed the 50,000 article threshold, with a depth of 43. As I'm not sure what the criteria are nowadays, I won't add it myself, but could someone look into this please? Optimist on the run (talk) 22:25, 12 November 2013 (UTC)

I've deactivated the request for now. I think there needs to be some time for discussion about this, seeing as there seems to be disagreement over the criteria by which Wikipedias are included here. Please reopen the request after a week if there is a consensus to include Welsh Wikipedia, or if no-one has replied here during that time. — Mr. Stradivarius ♪ talk ♪ 05:32, 13 November 2013 (UTC)
No response - needs attention from an uninvolved admin. Optimist on the run (talk) 11:43, 21 November 2013 (UTC)
David Levy usually has an opinion on these requests. If not I'll add shortly. — Martin (MSGJ · talk) 12:52, 21 November 2013 (UTC)
Thanks for pinging me (and my apologies for not noticing the section previously).
I just performed the usual sample of fifty random articles, yielding forty-nine stubs and one short article. I've seen Wikipedias in worse shape than this (particularly given the absence of placeholders), but I regard the current content level as insufficient. (If others disagree, the matter certainly is open for discussion.)
This strikes me as an ideal situation for an article expansion drive. Most speakers of Welsh also speak English, so translation from the latter should be relatively straightforward. —David Levy 16:59, 21 November 2013 (UTC)

Isn't the threshold of 50,000 articles too low?

What I see is that most language Wikipedias are now fixated on passing the magical 50,000 figure by whatever means available, resorting to mechanized automatic creation of articles (mostly on villages and small localities, which are easy to prepare from exhaustive lists, tweak minimally and launch as articles) or creating one- or two-line pages for sportsmen or music artists, and then applying for inclusion. It is high time we went back to the earlier threshold of 100,000 or more. This would filter out some of the less productive language Wikipedias. Second: for the new, higher 100,000 threshold, it is high time very clear guidelines and unified acceptance rules were set once and for all for every language Wikipedia, and that the communities are well informed about the rules, so that they know how to comply with the acceptance criteria and when they can actually apply. As it stands, the process by which the English Wikipedia accepts languages feels quite arbitrary. Was a quality comparison test made, and are all the Wikipedias we see included 100% legitimate, full-content Wikipedias? Can we reopen the files for some of the languages, or are the ones there now final and set in stone? I suppose the most urgent step for now is raising the threshold to 100,000 as a priority. How many languages would qualify from the ones we now have? werldwayd (talk) 21:57, 14 December 2013 (UTC)

Latin Wikipedia

Latin Wikipedia reached 100,000 today, so should surely be here somewhere. StevenJ81 (talk) 21:42, 18 December 2013 (UTC)

As noted in the template's documentation, "this is not a complete list of Wikipedias containing 50,000 or more articles; Wikipedias determined to consist primarily of stubs and placeholders are omitted."
I just performed our standard fifty-article sample, which yielded fifty stubs/placeholders (mostly stubs), just as it did in 2011. —David Levy 23:57, 18 December 2013 (UTC)
Whatever. StevenJ81 (talk) 03:14, 19 December 2013 (UTC)

Protected edit request on 23 February 2014

Please update the list. For example, the Kazakh Wikipedia now has more than 200,000 articles. Arystanbek (talk) 07:01, 23 February 2014 (UTC)

  Not done: it's not clear what changes you want made. Please mention the specific changes in a "change X to Y" format. — {{U|Technical 13}} (tec) 17:11, 26 February 2014 (UTC)

Remove Nederlands (Dutch / .nl) from list?

Given that the Dutch Wikipedia has expanded from ~800K entries to 1.8M entries over the past few years primarily through a couple of bots adding large numbers of stubs, I'm wondering if it should be demoted from the list in this template. FWIW, I did a quick 50 article survey, and hit 35 stubs, 10 very short articles (counted generously), four articles with significant content (again, being somewhat generous), and a DAB page. Rwessel (talk) 03:10, 3 May 2014 (UTC)

"Demoting" Dutch to the 800K tier might be fair, but removing it seems harsh. What we secretly care about is the number of "real" articles, and while the Dutch WP has inflated their score, there's still surely enough real articles that it qualifies to be on the template somewhere. SnowFire (talk) 22:03, 9 May 2014 (UTC)

Urdu Wikipedia

User:Amire80 has added the Urdu Wikipedia to the list. But after performing a random 50-article sample, it appears that most of the articles on urwiki are stubs, thus failing the "stubs and placeholders" criterion. — Bill william comptonTalk 19:22, 24 April 2014 (UTC)

Yeah, apparently there's a discussion about it on wikimedia-l.
I don't have a strong opinion about it. --Amir E. Aharoni (talk) 19:47, 24 April 2014 (UTC)
Then why did you insert a link to the Urdu Wikipedia (which appears to consist primarily of bot-generated stubs) unilaterally and decline to self-revert when you learned that the change wasn't uncontroversial? And why did you add a link to the Hindi Wikipedia (whose intentional omission has been discussed on multiple occasions)? —David Levy 17:02, 15 July 2014 (UTC)

fawiki

The Persian Wikipedia has reached 400k and can be moved up a level :) –ebraminiotalk 15:14, 18 July 2014 (UTC)

  Done Ruslik_Zero 19:19, 19 July 2014 (UTC)

Replace 50 Random articles with better statistical method

There is a much better way to check article length than looking at 50 random articles. A 50 random article check is a very poor sampling method for assessing article length: by chance it might return mostly the shortest or longest articles, and the results are not consistent across repeated tests. Besides, it is a tedious task, often subject to bias. I would like to request that users here replace it with quartiles or deciles, as these provide a better picture of the distribution of article lengths.

How to do it?

Let's say there is a wiki with 1,000 articles; check the articles at Q1 (the 25%-75% split of article lengths), Q2 (the 50%-50% split) and Q3 (the 75%-25% split). For this,

  • Calculate the quartiles; e.g. for the 1,000-article wiki, these will be the 250th, 500th and 750th articles.
  • Find the first quartile. E.g. for the 250th longest article in the "xx" wiki with 1,000 articles, open https://xx.wikipedia.org/w/index.php?title=Special:LongPages&limit=50&offset=249 . One needs to change the offset value to do this.
  • A list of 50 articles appears, starting (in this case) from the 250th longest article.
  • Right next to each article, its length in bytes is shown, e.g. ABCD [xyz bytes].
  • The byte counts at each quartile give a better picture of the range of article lengths than random articles do.
  • Repeat the same method for Q2 and Q3, resetting the offset value.
  • To increase the power of the method, check the lengths of the 1,000-2,000 longest articles in the wiki using the special pages. This will be more useful for larger wikis (e.g. English).

This gives a better picture of the range of page lengths than 50 random articles. For an even better view of the length distribution, deciles can be used. Thank you--Eukesh (talk) 11:36, 6 July 2014 (UTC)
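For illustration only, here is a minimal sketch in Python of the offset arithmetic described above. It merely computes the quartile positions and builds Special:LongPages URLs in the format quoted in the steps; the wiki code "xx" and the 1,000-article count are the placeholder values from the example, not real figures.

```python
# Minimal sketch of the quartile method described above: compute the
# Q1/Q2/Q3 positions for a wiki of a given size and build the
# Special:LongPages URLs (format taken from the example URL quoted above).
# "xx" and the article count of 1,000 are placeholders, not real data.

def quartile_urls(wiki_code, article_count, limit=50):
    """Return Special:LongPages URLs starting at the Q1, Q2 and Q3 articles by length."""
    urls = []
    for fraction in (0.25, 0.50, 0.75):
        # The offset parameter is zero-based, so the Nth longest article
        # sits at offset N - 1 (e.g. offset=249 for the 250th longest).
        offset = int(article_count * fraction) - 1
        urls.append(
            "https://{}.wikipedia.org/w/index.php?"
            "title=Special:LongPages&limit={}&offset={}".format(wiki_code, limit, offset)
        )
    return urls

# Example: the hypothetical 1,000-article "xx" wiki from the description.
for url in quartile_urls("xx", 1000):
    print(url)
```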

A random sample is usually considered a valid statistical tool. I might complain that 50 is a bit on the small side, but the Wikipedias failing this test tend to be *really* low grade. In any event, this isn't directly about article sizes, but rather about content - so you still need to make a judgement about each article. IOW, is this article a stub, for example (and while stubs tend to be short, that's not really their defining characteristic). And while it's a bit tedious, it's not really done that often. As for repeatability, it's no worse than any other random statistical sampling (although my comment about the sample being a bit small stands). In any event, your scheme presents repeatability difficulties as well: since straight article size is not the criterion, every time someone uses it to get a sample, they'll get a different selection of articles to judge (assuming a reasonable number of articles has been added to the encyclopedia in the meantime). Rwessel (talk) 16:59, 6 July 2014 (UTC)
Thanks Rwessel for the comment. There is a qualitative aspect (quality of articles) and a quantitative aspect (length of articles) to each wiki. First, let's delve into the quantitative aspect. Sampling is favored when data on the entire population cannot be obtained. We have excellent data (a census, if you will) on entire wikis, so moving to census-based data makes more sense than a random sample. Measuring central tendency and dispersion and creating a Gaussian curve of each wiki's length distribution would be the ideal solution here. However, since we do not have many users here who are savvy in this regard, we can settle for the median.

The discussions here are always based on "I performed a 50 random article sample...", which is not very accurate. E.g. the Hindi Wikipedia has far longer articles than Simple English at almost every decile (which I found using a modified decile method and posted here in 2012). I checked the article lengths at each decile myself, as well as the byte counts of the articles. However, the users here were still not ready to believe it.

Coming to quality, the 50-random-articles method does not assess the quality of articles at all! How does one assess the quality of articles in languages one cannot even read? There should be a better method for quality assessment; judging the Wikipedias by their basic articles and those articles' status (featured, good, long, etc.) would provide a qualitative assessment. Thank you.--Eukesh (talk) 12:34, 8 July 2014 (UTC)

Let's assume there's a Wikipedia with 2,000 really good (and long) articles and about 1,000,000 bot-generated stubs. Among 50 random articles there might be 48 to 50 stubs. But why should this Wikipedia be mentioned in the box with more than 1 million articles when it really has only 2,000 of them? The statistical method isn't really the problem here. -- 32X (talk) 14:40, 14 August 2014 (UTC)
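As a quick, purely illustrative check of that hypothetical (using only the numbers given above, and approximating sampling without replacement with a binomial), a 50-article sample from such a wiki would almost always consist entirely of stubs:

```python
# Rough check of the hypothetical above: 2,000 substantial articles among
# ~1,000,000 bot-generated stubs, sampled 50 at a time. Sampling without
# replacement is approximated by a binomial, which is fine at this scale.
good, stubs, sample = 2_000, 1_000_000, 50
p_good = good / (good + stubs)        # probability a random article is substantial (~0.2%)
expected_good = sample * p_good       # expected substantial articles per sample (~0.1)
p_all_stubs = (1 - p_good) ** sample  # chance the whole sample is stubs (~90%)
print(f"expected substantial articles in a 50-article sample: {expected_good:.2f}")
print(f"probability the sample contains only stubs: {p_all_stubs:.1%}")
```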

Esperanto

Hi,

The Esperanto Wikipedia just reached 200,000 articles. Could you edit the list accordingly?

Thank you very much! :) Thomas Guibal (talk) 06:00, 14 August 2014 (UTC)

Done. —David Levy 21:15, 14 August 2014 (UTC)

Adding Bosnian Wikipedia to the list

Hi. Could someone please add the Bosnian Wikipedia to the specified list? The 50k article mark was passed some time ago. I did the random 50-article test, and based on the results it seems that this Wikipedia is worth including. Regards, -- Edinwiki (talk) 15:35, 6 June 2014 (UTC)

So... what were the results of your test?
I performed the 50 article test myself - actually a bit more than the 50 article test - and came up with 33 stubs, 13 "short" articles (more than a paragraph), and 4 proper articles. Additionally (not included in my stub count), there were 7 year articles and 16 astronomical feature articles which were clearly populated by bot and had little human involvement. (The 4 actual articles I found: https://bs.wikipedia.org/wiki/Gradac_(pe%C4%87ina) , https://bs.wikipedia.org/wiki/Prohujalo_s_vihorom_(film) , https://bs.wikipedia.org/wiki/Dinastija_Shang , https://bs.wikipedia.org/wiki/Niko_Kranj%C4%8Dar .) But there were also some worrying signs, like https://bs.wikipedia.org/wiki/Kneset being a stub. Anyway... it's borderline. It is certainly far better than Urdu or Hindi, which should probably be re-removed, since checking the talk page archives shows the 50 article test would literally turn up 50 stubs on those, but I'm not sure it's good enough to merit inclusion here.
Also, maybe there should be a FAQ for this page, but it's worth mentioning that this template isn't a "reward" for other language Wikipedias - if all Wikipedias instantly had the breadth & quality of English, we still wouldn't have room to list them all. We just use Wikipedia size as a rough proxy of interest for which Wikipedias can get the precious mainpage space. SnowFire (talk) 21:48, 13 June 2014 (UTC)
Hi SnowFire. Sorry for the (very) late reply. I did a 50-article test and my results are as follows: 12 "proper" articles ([3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]), 10 articles on astronomical NGC/IC objects (which I consider proper stubs), 4 stubs, 6 disambiguation pages, 15 small to medium articles, and 3 lists. It of course depends on the definition of a proper article, but I think the results are positive, especially if you compare them to some of the Wikipedias that are currently on the list. And you are right, this shouldn't be a reward, but nonetheless, some Wikipedias still use it as a sort of goal which they try to achieve by continuously improving the quality of their articles. Bs.wiki does indeed need improvement, but I think that based on these results it should at least be considered whether it can replace some less qualified wiki on the list. -- Edinwiki (talk) 10:26, 6 September 2014 (UTC)
Those 12 look good enough to me, sure. I wouldn't complain about adding it myself, it did appear to be borderline from my look and it sounds on the good-side-of-the-border from your test. SnowFire (talk) 05:27, 7 September 2014 (UTC)

Wikipedia Other Language Editions Problem

On the Main Page, the following Wikipedia editions are missing from the 1,000,000+ articles section: Cebuano and Waray-Waray. On the List of Wikipedias statistics page, these two editions are clearly listed as having over one million articles. Administrators should kindly correct the information on the Main Page. I have read some studies on the foundation of the Cebuano-language Wikipedia and used them in my thesis; it is really amazing that it has grown so fast. By the way, if these language editions consist mostly of stubs, shouldn't they at least be placed under the 400,000+ or 200,000+ section? I see them on this page (the template above), but do not see them on the Main Page. — Preceding unsigned comment added by 94.123.205.101 (talk) 13:57, 9 September 2014 (UTC)

I tested 50 random Cebuano articles. All 50 were stubs. This fails our inclusion criteria. I also examined the page history of all 50. Not one of them had a single edit by a human. It was 100% bots. PrimeHunter (talk) 00:51, 25 September 2014 (UTC)

Basque passing 200000?

Basque (Euskara) currently shows 202057 at meta:List of Wikipedias. Talk:Main Page‎#Basque Wikipedia discusses whether this is supposed to move it to "More than 200,000". PrimeHunter (talk) 01:22, 25 September 2014 (UTC)

Thanks PrimeHunter. Euskaldunaa (talk) 10:08, 25 September 2014 (UTC)
Done (as explained in that discussion). —David Levy 00:35, 26 September 2014 (UTC)

Serbo-Croatian (sh.wikipedia.org)

Serbo-Croatian Wiki has passed 260k articles according to http://meta.wikimedia.org/wiki/List_of_Wikipedias#100_000.2B_articles but it's still listed as only 50k+. Could that be fixed? 93.136.48.197 (talk) 06:24, 29 September 2014 (UTC)

Done. —David Levy 21:03, 26 December 2014 (UTC)

Request for updating

Please update some other languages to 1.5 million articles. Qwertyxp2000 (talk) 04:51, 22 December 2014 (UTC)

Your list is highly inaccurate (with multiple Wikipedias misplaced/duplicated) and it includes Wikipedias intentionally omitted because they consist primarily of stubs/placeholders. Additionally, we don't use that many tiers or establish one for only three Wikipedias. —David Levy 21:03, 26 December 2014 (UTC)

Protected edit request on 15 February 2015

Please move the Slovak Wikipedia (sk) to the 200,000 category. 213.151.215.195 (talk) 18:39, 15 February 2015 (UTC)

  Done Ruslik_Zero 21:40, 15 February 2015 (UTC)

Urdu Wikipedia

Please move the Urdu Wikipedia to the 50,000+ category.

--Obaid Raza (talk) 15:31, 17 February 2015 (UTC)

Protected edit request on 15 March 2015

Add the Macedonian Wikipedia to 50,000+ per this. Also Georgian, Occitan, Chechen, Newar / Nepal Bhasa, Urdu, Tamil, etc. Fauzan✆ talk✉ mail 14:02, 15 March 2015 (UTC)

Just read the doc, but still, a few might be added. --Fauzan✆ talk✉ mail 14:07, 15 March 2015 (UTC)
Has a particular Wikipedia not consisting primarily of stubs and placeholders been omitted? —David Levy 14:57, 15 March 2015 (UTC)

mk.wiki

Could you please add Macedonian (mk.wiki), as it passed the 80,000 mark some time ago? Cheers! --B. Jankuloski (talk) 11:14, 5 April 2015 (UTC)

Having previously discussed the matter extensively, you know perfectly well that the minimum article quantity isn't the sole inclusion criterion. —David Levy 15:43, 5 April 2015 (UTC)
It's evident that David Levy draws conclusions solely from the evidence he found in the past. There was a lengthy discussion on the inclusion of the Hindi Wikipedia almost two years ago, where some users even complained about the relevance of the quality testing used by David Levy and proposed changes which, unfortunately, have never been implemented; there hasn't been a single attempt to change anything in the whole process. Later that year, I requested a more thorough examination of the case of the Macedonian Wikipedia, with no result, even though David responded that he would do it after becoming less busy. That said, I'm strongly inclined to think that David doesn't want to put any effort into examining the state of the Wikipedias any more and simply turns down any user requesting inclusion by copying his usual automatic rejection. His last two comments, "Has a particular Wikipedia not consisting primarily of stubs and placeholders been omitted?" and "Having previously discussed the matter extensively, you know perfectly well that the minimum article quantity isn't the sole inclusion criterion.", are clear indicators of unwillingness, hubris and disparagement of the work done by the communities on the other Wikipedias. Considering the involvement of users from other communities, this is not a minor issue and deserves further consideration. I think it's worth reporting on the administrators' noticeboard to see if any helpful suggestions come from it. Thanks.--Kiril Simeonovski (talk) 11:42, 9 April 2015 (UTC)
My apologies for failing to follow up in the earlier instance. It was an honest oversight that occurred during a hectic period, not a deliberate brush-off.
As anyone familiar with my long-winded messages can attest, I'm not one to shun discussion. I'm baffled as to why you've perceived my responses as "clear indicators of unwillingness, hubris and disparagement of the work done by the communities on the other Wikipedias".
"Has a particular Wikipedia not consisting primarily of stubs and placeholders been omitted?" was a sincere question, directed toward an editor who'd just become aware of our inclusion criteria and suggested that perhaps "a few" absent Wikipedias qualified (without specifying which).
My above reply to B. Jankuloski, which you've evidently interpreted as something along the lines of "Get lost! I have spoken.", simply reflects the user's feigned ignorance and intentional omission of relevant information, apparently intended to mislead a passing administrator unfamiliar with our conventions (which, incidentally, I didn't institute unilaterally); that is exactly what happened when B. Jankuloski did this previously. —David Levy 12:55, 9 April 2015 (UTC)

  Comment: You are obviously under the impression that our wiki consists mostly of very small articles, and that our situation is somehow similar to what it was years ago when we made the request. I can assure you that this is no longer the case at all. In the years since our last suggestion, we have created many articles of very good size through painstaking labour, and it is very untoward to dismiss it. What I am talking about can best be illustrated by this list of long pages. On it, this article ranks at no. 50,000 by length. As can be seen, there are 49,999 articles larger than it, and a good number of them considerably so. I am sure that we more than meet the relevant criteria for inclusion. I expect that whoever decides will take an objective look at our wiki in accordance with the relevant criteria and conclude what I have just expounded. --B. Jankuloski (talk) 18:43, 9 April 2015 (UTC)