Wikipedia talk:WikiProject Wikipedia essays/Archives/2010/April
This is an archive of past discussions on Wikipedia:WikiProject Wikipedia essays. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Let's generalize this
And make it just WikiProject:Essay. I support the effort either way, but I've wanted to start cat-sorting essays for ~3 years now, and had no idea this project existed. -- Kendrick7talk 02:03, 31 January 2010 (UTC)
- I agree that the current name is very awkward-sounding. — Martin (MSGJ · talk) 08:42, 31 January 2010 (UTC)
- "WikiProject Essay Sorting" is another possibility.--Father Goose (talk) 09:29, 31 January 2010 (UTC)
- Hi all. I resurrected this WikiProject after I discovered it a few weeks ago. It had been in suspended animation for about a year. I also agree that the current name is clunky, which is why I created the WP:ESSAY C/C shortcut. However, that was really just a band-aid effort so I didn't have to type the whole project name all the time. I like WikiProject:Essay as well. Categorically taking over essays also keeps other issues to a minimum (e.g. "Hey, making a portal isn't categorization or classification!") ɳoɍɑfʈ Talk! 03:18, 1 February 2010 (UTC)
I support Kendrick's proposal to generalize this WikiProject to all essays, calling it WP:WikiProject Essay. ɳorɑfʈ Talk! 15:38, 1 April 2010 (UTC)
- I think Wikipedia:WikiProject Essays makes more sense than a singular. –xenotalk 15:41, 1 April 2010 (UTC)
- Ah, good call. WP:WikiProject Essays. Are you up for that too? ɳorɑfʈ Talk! 15:43, 1 April 2010 (UTC)
Done Moved. SilkTork *YES! 14:38, 6 April 2010 (UTC)
Proposal for Importance Grading
I think we should take a page from Google's playbook in determining importance of essays. I think it should be based solely on how many other pages link to the essay. I propose to structure it like this:
- Low Importance: 0 to 50
- Medium Importance: 51 to 200
- High Importance: 201 to 1000
- Top Importance: 1000+
If we notice the above scale is too top-heavy, we can adjust accordingly (move "High" to 500-2000 and "Top" to 2000+ or something like that). ɳoɍɑfʈ Talk! 18:16, 23 January 2010 (UTC)
Essay Link Survey
Okay, I did a little research on various pages, seeing how many links they had. Here is what I found, taking samples of various pages' "What links here" page.
- Wikipedia:Arguments to avoid in deletion discussions = 1528
- WP:BRD = 607
- Wikipedia:Don't-give-a-fuckism = 484
- Wikipedia:Don't revert due to "no consensus" = 173
- WP:NONFACT = 42
- Wikipedia:Who is a low profile individual = 23
An idea I hit upon while counting the pages is that the list of resultant pages can be viewed 20, 50, 100, 250, and 500 at a time.
I think we could structure it parallel to that:
- Low Importance: 0 to 250
- Medium Importance: 251 to 500
- High Importance: 501 to 1000
- Top Importance: 1000+
This way I can check to see if an article is low, medium, or high simply by selecting the number of entries per page. If I select 500 and the page loads without a "next 500" link, I know there are fewer than 501 links, so the essay is medium or below. Next I click 250. If it comes up the same, I know it is low importance.
I'm going to go ahead and run with this idea unless someone stops me. ɳoɍɑfʈ Talk! 04:24, 25 January 2010 (UTC)
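The tiers above can be sketched in a few lines. This is only an illustration, not anything the project runs: the function name is made up, and because the list gives both "501 to 1000" and "1000+", the sketch assumes a count of exactly 1000 stays in High.

```python
# Illustrative sketch only; the function name and the handling of exactly
# 1000 links (kept in "High") are assumptions, not part of the proposal.
def importance_from_links(link_count: int) -> str:
    """Map a "What links here" count to the tier proposed above."""
    if link_count > 1000:
        return "Top"
    if link_count > 500:
        return "High"
    if link_count > 250:
        return "Medium"
    return "Low"


# Counts taken from the link survey earlier in this thread.
survey = {
    "Wikipedia:Arguments to avoid in deletion discussions": 1528,
    "WP:BRD": 607,
    "Wikipedia:Don't-give-a-fuckism": 484,
    "Wikipedia:Don't revert due to \"no consensus\"": 173,
    "WP:NONFACT": 42,
    "Wikipedia:Who is a low profile individual": 23,
}

for page, count in survey.items():
    print(f"{page}: {count} links -> {importance_from_links(count)}")
```

On the surveyed pages this puts one essay in Top, WP:BRD in High, one essay in Medium, and the remaining three in Low.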
- Any chance we could call it something other than "importance"? Essays tend to be written by opinionated people, who might not take kindly to having the importance of their thoughts graded, especially given the essentially subjective nature of that process. "Impact" would strike me as a more apt description of what is being measured here. Skomorokh 15:01, 25 January 2010 (UTC)
- I'm not sure I see the value of this effort. Why try and rate importance, when an essay's importance evolves organically? Jclemens (talk) 17:40, 25 January 2010 (UTC)
- I think the value in rating importance lies in the way information can be presented to editors. Like the {{essay}} tag says, "Essays may represent widespread norms or minority viewpoints." I think editors might like to know which essays are most widely cited. While it's true that anyone can click "What links here" in the sidebar, it would be convenient to be able to get a directory listing of every essay with more than 1000 incoming links, for example. We may find some surprises as well; I was a little surprised that WP:BRD only had 607 incoming links. I would have thought it was more. ɳoɍɑfʈ Talk! 21:46, 25 January 2010 (UTC)
- To answer your question about "Impact," I also think that's more apt than "Importance." Importance implies value, as if higher ranked essays are superior to lower ranked essays. Impact is a little more neutral. That just says "Everyone is talking about this essay," or "Nobody is talking about this essay." I'm pretty sure that "Importance" can be changed to "Impact." ɳoɍɑfʈ Talk! 12:02, 27 January 2010 (UTC)
- My support for this depends on how you would display the "importance" -- a category? A list of "most important"?
- If it were possible to implement it via a template that merely signified the number of incoming links (approximately or otherwise, maybe on the talk page), then I'd go for it. Incoming links is a useful metric for measuring the importance of an essay, though not a perfect one. The existence of navboxes like Template:Civility biases the count, for one thing.--Father Goose (talk) 09:39, 31 January 2010 (UTC)
- Father Goose, what we're talking about here is the standard article evaluation criteria, used in almost all article WikiProjects, where articles are graded in terms of quality (FA, GA, A, B, C, Start, Stub) and importance (top, high, mid, low). We dropped quality as it doesn't make sense to evaluate the quality of an opinion. We're talking about renaming "importance" to something that fits "# of incoming links" better. And the existence of navboxes only biases the count at the low end. I have an essay that's in one of those navboxes, and I have 42 incoming links because of that. If you set the bar at 201 for mid-, it doesn't really help. Also, essays in those navboxes are better read than orphans, so they deserve a higher incoming link count. ɳoɍɑfʈ Talk! 14:06, 1 February 2010 (UTC)
- Here is a report MZMcBride has generated for this project for incoming links. You might also consider pageviews as a metric. Number of watchers may also provide some clue as to an essay's mindshare. I just grabbed one at random, WP:Wikipedia is not a wine guide, top importance per your scale and the report, viewed only 66 times last month. Its many links are because it is linked in {{WikiProject Food and drink}} - something I'm not sure we can easily filter. –xenotalk 04:01, 16 March 2010 (UTC)
- One other metric would be the delta in incoming link totals month-over-month. Of course that data is 30 days away. =) –xenotalk 04:19, 16 March 2010 (UTC)
- You raise some excellent points. I'm not wedded to number of incoming links. I chose it because I knew it was available via the "What Links Here" link. Maybe we can build an algorithm that looks at number of watchers, number of pageviews, and delta in incoming link totals. If that's too complicated, I'd be happy with just pageviews. Thanks for your contribution. It is very helpful. ɳorɑfʈ Talk! 07:31, 16 March 2010 (UTC)
- Request filed for Z-bot to generate pageview data. –xenotalk 15:41, 16 March 2010 (UTC)
- This looks good. I'm wondering if the bot can read the rank-order, because that would be even better than an arbitrary pageview range. For example Top importance being the top 10 essays, High importance being 11 through 50, Medium being 51 through 100, and Low being everything else? ɳorɑfʈ Talk! 16:03, 16 March 2010 (UTC)
- I could just chop up the list this way manually. It wouldn't be too hard. –xenotalk 16:05, 16 March 2010 (UTC)
- MZMcBride (talk · contribs), in his infinite ubertude, has generated all the data on a single page, see Wikipedia:WikiProject Essay Categorization and/or Classification/Assessment/Links. Let me know what you think. –xenotalk 19:27, 16 March 2010 (UTC)
- That's pretty awesome. So I'm wondering if we can give a relative weight to incoming links, watchers, and pageviews, weighting incoming links the lightest and watchers the heaviest, with pageviews in between. Let's say "incoming links" = i, "watchers" = w, and "pageviews" = p. Something like (i/100)*w*(p/10)=n. All essays are rank ordered by n, then the top 10 = Top importance, 11-50 = Medium, etc. That formula is likely crude, as that's not my field, but I'm sure you get the idea. Can the bot do something like this, or do we have to pick a single variable (like pageviews)? If the proposed formula is too much, even w*p=n would be good. Anyway, let me know what the capabilities are. I'd love to also have a list like the one you've shown me generated monthly just so we can track statistics of the essays, and have a reference. ɳorɑfʈ Talk! 03:59, 17 March 2010 (UTC)
- Sure that's not at all hard to calculate in Excel... And probably could be built into the report as well. You may want to massage it a bit. (Raw results here: [1]). –xenotalk 12:04, 17 March 2010 (UTC)
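For reference, the two candidate formulas floated above, (i/100)*w*(p/10) and w*p, can be sketched as follows. The essay names and numbers are invented for illustration, and the function names are hypothetical; nothing here reflects what the bot or the spreadsheet actually did.

```python
# Sketch of the two formulas proposed above. All data below is hypothetical.
def product_score(links: int, watchers: int, pageviews: int) -> float:
    """n = (i/100) * w * (p/10), as written in the proposal."""
    return (links / 100) * watchers * (pageviews / 10)


def simple_score(watchers: int, pageviews: int) -> int:
    """The fallback suggested in the proposal: n = w * p."""
    return watchers * pageviews


# name -> (incoming links, watchers, pageviews); made-up values.
essays = {
    "Essay A": (1500, 3, 66),
    "Essay B": (600, 40, 900),
    "Essay C": (40, 0, 500),
}

# Rank-order the essays by n, highest first.
ranked = sorted(essays, key=lambda name: product_score(*essays[name]), reverse=True)
for position, name in enumerate(ranked, start=1):
    print(position, name, round(product_score(*essays[name]), 1))
```

One property worth noting: because both formulas multiply the factors, any essay with zero watchers (or zero pageviews) scores exactly zero no matter how large the other numbers are, as "Essay C" shows.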
Importance in Practice
At this point I think it is appropriate to begin a discussion about how we should set the criteria for an essay's importance.
What constitutes "Important"?
First off, we need to determine what constitutes Top importance, High, Medium, and Low. I propose that the top 10 essays be considered Top importance, the top 5% (besides the Top 10) be considered High Importance, the next 5% be considered Medium, and the bottom 90% be considered Low importance. Considering there are about 1000 essays now in the Wikipedia essay category (although only about 700 have the project banner tag), that would mean there were 40 High importance essays and 50 Medium importance essays. If you look at the way pageviews, links, etc, drop off after the top 100, I think it makes sense to have 90% be low. ɳorɑfʈ Talk! 02:47, 18 March 2010 (UTC)
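The arithmetic behind those counts can be sketched like this. It assumes a round total of 1000 essays, as in the comment above, and the function names are hypothetical.

```python
# Sketch of the proposed split: top 10, rest of the top 5%, next 5%, rest.
# Function names are illustrative; the total of 1000 comes from the comment above.
def tier_sizes(total: int, top_n: int = 10, high_pct: int = 5, mid_pct: int = 5) -> dict:
    """Return how many essays land in each tier for a given total."""
    high = total * high_pct // 100 - top_n  # top 5% minus the Top 10
    mid = total * mid_pct // 100            # the next 5%
    low = total - top_n - high - mid        # everything else
    return {"Top": top_n, "High": high, "Medium": mid, "Low": low}


def tier_for_rank(rank: int, total: int) -> str:
    """Tier for an essay at a 1-based rank, using the sizes above."""
    sizes = tier_sizes(total)
    if rank <= sizes["Top"]:
        return "Top"
    if rank <= sizes["Top"] + sizes["High"]:
        return "High"
    if rank <= sizes["Top"] + sizes["High"] + sizes["Medium"]:
        return "Medium"
    return "Low"


print(tier_sizes(1000))
```

With 1000 essays this gives 10 Top, 40 High, 50 Medium, and 900 Low, matching the figures in the proposal.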
How do we measure?
There are several variables we have access to, and we can weight them as we see fit. The variables we can get for each essay are:
- Pageviews
- Watchers
- Incoming Links
My sense is that how often an essay is cited in discussion is a good measure of how "talked about" it is. However, this is not reliably determined by Incoming Links because category links, WikiProject links, and others are also included in this number. One might argue that an essay nobody reads that has 1000 incoming links is still more important/influential than an essay nobody reads with zero incoming links, because the probability that someone might read it in the future is higher. Still, this appears much weaker than Pageviews and Watchers.
Pageviews is a good measure of importance because it records how often someone actually navigates to the essay (and presumably reads it). However, people read essays for a variety of reasons, and reading an essay does not constitute agreement with it. At the same time, one might argue that an essay nobody agrees with, but everyone reads is still important, whether or not people share its views, because everyone thought enough of it to at least read it.
Watchers, in my opinion, is the best indicator of an essay's importance, because people generally watch essays in a sort of custodial manner, to make sure they are not vandalized, or rewritten in such a way that does not reflect the spirit of the essay.
I think that a formula which weights Watchers as most important, Pageviews as second most important, and Incoming Links as least important is appropriate. I think Incoming Links should be weighted so lightly as to really serve as little more than a tiebreaker when the other variables are too close together.
What do other people think? ɳorɑfʈ Talk! 02:47, 18 March 2010 (UTC)
- I think it's probably a good start. The importance can be tweaked manually, of course, if the formulaic approach doesn't cut the mustard for every essay. –xenotalk 16:14, 24 March 2010 (UTC)
- Okay, so let's be WP:BOLD and do it. I ran W*10+P*2+L/100 against a few of the stats at Wikipedia:WikiProject_Essay_Categorization_and/or_Classification/Assessment/Links and that seems to put things in their rightful place (drops essays that have high incoming links only, raises essays with lots of watchers). Want to program it and do a test run? ɳorɑfʈ Talk! 00:20, 25 March 2010 (UTC)
- The task is running... –xenotalk 15:12, 1 April 2010 (UTC)
- Can you post some metrics on what has been done? All the essays on my watchlist have been summarily tagged low. The threshold for importance may have been set too high. Jclemens (talk) 15:42, 1 April 2010 (UTC)
- See WP:ESSAY C/C/A#Score. Task complete. 891 edits. If you start from the bottom up, the edits are in sequential order until you get to low-import where I alphabetized. –xenotalk 15:46, 1 April 2010 (UTC)
- Something's wrong when WP:TURNIP is mid and WP:HAMMER is low. I'm wondering if template links are distorting the scoring. Jclemens (talk) 16:13, 1 April 2010 (UTC)
- Looks like the Bot doesn't take into account incoming links from redirects. Should probably ask MZMcBride if he can fix that... That being said, WP:HAMMER doesn't seem to get very many views. –xenotalk 16:16, 1 April 2010 (UTC)
- Jclemens, you don't have to wonder: the weighted scale is published. The weighted score system takes into account number of page watchers (W), pageviews in the previous month (P), and number of incoming links (L). The score is calculated with the following equation: W*10+P*2+L/100. As you can see, one watcher equals 5 pageviews equals 1000 links. We weighted links extremely low so that it would really only serve as a tiebreaker when the other two factors were about equal to another essay. This is also to ensure that template linking won't have much direct effect on the rankings. You can also view the entire raw table (the one that Xenobot is using to do the rankings) here: Wikipedia:WikiProject_Essay_Categorization_and/or_Classification/Assessment/Links ɳorɑfʈ Talk! 16:52, 1 April 2010 (UTC)
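As a quick check of the equivalences stated above, here is a minimal sketch of the published equation W*10+P*2+L/100. The function name and the two sample essays are hypothetical; only the equation itself comes from the discussion.

```python
# Sketch of the weighted score W*10 + P*2 + L/100 described above.
# The function name and the sample figures are illustrative assumptions.
def weighted_score(watchers: int, pageviews: int, links: int) -> float:
    return watchers * 10 + pageviews * 2 + links / 100


# One watcher, five pageviews, and 1000 links each contribute the same amount.
print(weighted_score(1, 0, 0))     # one watcher
print(weighted_score(0, 5, 0))     # five pageviews
print(weighted_score(0, 0, 1000))  # one thousand links

# Hypothetical comparison: a heavily linked but little-read essay versus
# a lightly linked essay with many watchers and readers.
link_heavy = weighted_score(watchers=3, pageviews=66, links=1500)
well_read = weighted_score(watchers=30, pageviews=400, links=50)
print(link_heavy, well_read)
```

Under these made-up numbers the link-heavy essay scores far below the well-read one, which is the behaviour the weighting was meant to produce.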
- Further, looks like WP:HAMMER had 82 pageviews last month. WP:TURNIP had over 700. That's a pretty big difference. ɳorɑfʈ Talk! 16:56, 1 April 2010 (UTC)
- The scale itself is ridiculously biased in favor of people who watch policies: 1,000 incoming links equals one watcher? Very, very few pages even have 1,000 incoming links, so the current algorithm essentially is watchers + 1/5th monthly pageviews. That's silly. Why not go with something more balanced like straight watchers+views+incoming links? Jclemens (talk) 17:08, 1 April 2010 (UTC)
- Covered already. Please read this talk page. Starting with the beginning of the current discussion: Wikipedia_talk:WikiProject_Essay_Categorization_and/or_Classification#Proposal_for_Importance_Grading Then you'll know why we didn't go with that. ɳorɑfʈ Talk! 17:33, 1 April 2010 (UTC)
Why are we using the current formula? It was consensus at the time. Of course, consensus can change (and we're open to that). We started with the idea of watchers+views+incoming links. Then we saw how well-linked essays that nobody reads (like WP:WINEGUIDE) get ranked REALLY highly. So then we thought about not using incoming links at all, because WP:WINEGUIDE has tons of them but few pageviews, so using incoming links seemed to skew it. I don't think you should think of it as "1000 incoming links equals one watcher." I think you should think of it as a system that uses watchers and pageviews and uses incoming links to break ties. We weren't setting up some sort of equivalency and deciding the "worth" or "value" of these different factors against each other. We were trying to determine the most accurate way available to determine which essays are read more, cited more, and therefore more influential (or representative of consensus). We didn't think incoming links really helped much. Now if we could filter out incoming links from templates, categories, etc, and be left only with incoming links from talk pages, that would be a useful piece of information. But that's beyond our capabilities. So we're left with P, W, and L, and they certainly aren't equal in weight, so we've got to weight them one way or another. If you've got a suggestion, we're certainly open to tweaking the formula. I don't particularly care what the formula is as long as it accurately represents the most/least influential/used/read essays on Wikipedia. ɳorɑfʈ Talk! 17:33, 1 April 2010 (UTC)
- In the specific case of WP:HAMMER, there are several redirects that point to the same place. I'll check 'em out and post some numbers. Jclemens (talk) 17:08, 1 April 2010 (UTC)
- Bah, one of the things I hate about Wikipedia is the fact that the toolserver is separate and poorly categorized. How do I go looking at all the uses of all the redirects that point to HAMMER, as well as trending over time? I'm thinking that the reason HAMMER may get relatively few pageviews is that it's pretty straightforward, static, short, and most everyone already knows what it says. Does that make it less important? I think not. Jclemens (talk) 17:24, 1 April 2010 (UTC)
- Thinking about this more, it might make sense to give the "top 5" essays on any of the three metrics "top" importance. Or something. But to address Jclemens' concern, I don't see an issue bumping HAMMER up to at least mid - it's a fairly well-regarded essay in deletion discussions - even if it doesn't have the weighted score to prove it. Again, we aren't married to this equation. –xenotalk 17:40, 1 April 2010 (UTC)
- Would it make sense to give stable incoming links from policies & guidelines more weight, say, as much as or more than one watcher? Would it change the makeup of the top two categories, or would it reflect the same priorities? ~ Ningauble (talk) 15:10, 1 April 2010 (UTC)
- This would probably be difficult to program for with the bot-generated report. The important thing to note is we aren't married to the weighted score: if other factors that aren't easily divined by the bot/weighted score come into play, we should bump an essay up manually. Perhaps with a parameter to make it clear it was set manually. –xenotalk 15:12, 1 April 2010 (UTC)
- FYI it should be noted that the "Links" value is even less useful than we previously thought. It only counts the first link and ignores subsequent links. If, for example, WP:STAYCOOL is linked 15 separate times in a particular ANI archive, that would be significant: unfortunately, in our score system it only receives credit for the one link. This isn't possible to work around without complicated and troublesome in_wiki_text searches. –xenotalk 14:24, 16 April 2010 (UTC)
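To illustrate the difference between "linked from this page" and "linked N times on this page", here is a rough sketch that counts wikilinks to one target inside a piece of wikitext. It is not how the report works. The function name and sample text are invented, and the regular expression is a simplification that ignores redirects, templates, and capitalization variants, which is presumably part of why a real text search was described as troublesome.

```python
import re


# Rough sketch only: counts [[target]], [[target|label]] and [[target#section]]
# occurrences in a wikitext string. Redirects, templates, and case variants
# are deliberately not handled.
def count_links(wikitext: str, target: str) -> int:
    pattern = re.compile(r"\[\[\s*" + re.escape(target) + r"\s*(?:[|#][^\]]*)?\]\]")
    return len(pattern.findall(wikitext))


# Made-up sample of a discussion page.
sample = (
    "Please see [[WP:STAYCOOL]] before replying. "
    "As I said, [[WP:STAYCOOL|stay cool]]. Also relevant: [[WP:BRD]]."
)

mentions = count_links(sample, "WP:STAYCOOL")
credited = 1 if mentions > 0 else 0  # what a per-page link count would record
print(mentions, credited)
```

In this sample the essay is mentioned twice but a per-page link count would credit it once.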
Importance Index
FYI, I'm in the process of putting together a Wikipedia Editorial Team 1.0 Index for essays, like the one here: Wikipedia:Version 1.0 Editorial Team/Index to appear both on the portal and here at Wikipedia:Essay Categorization and/or Classification/Assessment/Statistics. I'm about halfway there, have a question in at the help desk, then should be able to finish quickly. ɳorɑfʈ Talk! 08:36, 2 April 2010 (UTC)
What is the importance of "Importance"?
I think the formula given above for sorting essays into "importance" ranking is a decent starting point, and if there is to be some form of ranking, it is better than nothing at all. While looking into this matter, it may be worth considering the purpose of ranking the essays by importance. Is it that the essays ranked as important are the ones this project will pay more attention to for maintenance purposes? If maintenance is the case, then those essays which are most heavily edited and/or vandalised would be ones to also consider for high importance. I would suggest that while looking at an essay, a human judgement can be made to update/supplement the basic importance formula based on knowledge and observation. SilkTork *YES! 15:12, 6 April 2010 (UTC)
- I viewed this "importance" exercise as trying to divine the "mindshare" that an essay has. As far as your second comment - yes - we are not 'locked in' to the formula. If you feel that a particular essay is of higher 'importance' than the formula says, be bold and upgrade it. (Suggest using "auto=no" just in case we ever run the bot to "re-rate" these on future months' numbers) –xenotalk 15:17, 6 April 2010 (UTC)
User essays
Are user essays within the scope of this project? There are many very very good essays in userspace as well:
- User:Uncle G/On sources and content (and a few others in his userspace)
- User:Antandrus/observations on Wikipedia behavior
- User:Giano/A fool's guide to writing a featured article
- User:Beyond My Ken/thoughts
- User:Art LaPella/Devil's Dictionary of Wikipedia Policy
- User:Mangojuice/Administrators are not slaves
And many others. -- Ϫ 09:43, 8 April 2010 (UTC)
- User essays are not in the scope of the project, because we don't have any authority over anything in user space. We can certainly move (or copy) an essay in user space to WP space if we want to administrate it, though. ɳorɑfʈ Talk! 10:50, 8 April 2010 (UTC)
Adjusting essays in categories of rank
16-Apr-2010: Of course, the auto-ranked essays would need to be re-ranked within the rank-categories, depending on their future impact to Wikipedia users, so that requires a human decision about the future importance of many essays. The rank-categories (for April 2010) are:
Also, the count-limits (such as only 10 in "Top") seem too severe, where perhaps the count should be based as a percentage of the total existing essays, such as perhaps 5% of 950 essays being "Top" in rank, where the average essays would be "Mid-importance" as being most of them. The proposed counts, in each category, would become:
- Category:Top-importance Wikipedia essays - 5% = 47 essays
- Category:High-importance Wikipedia essays - 15% = 142 essays
- Category:Mid-importance Wikipedia essays - 45% = 427 essays
- Category:Low-importance Wikipedia essays - 35% = 332 essays
The rationale for having most essays as "Mid-importance" is the notion that Low-importance essays would be avoided, as being a transition rank, as subject to removal by "userfication" because they don't affect enough users. Hence, the other essays, which remained, would tend to be considered mostly "Mid" when the "Low" essays would either rise to "Mid" or become userfied, as removed from the list. Hand-ranking of the Top 47 (+37 more than the top 10) should be easy, based on experienced notions of what essays would impact most users in the future. The problem of small counts, such as the Top 10, is like choosing the top 10 important people of all time, versus the top 47 or top 142, which gives much more room for diversity of essay topics. -Wikid77 (talk) 21:07, 16 April 2010 (UTC)
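The proposed counts follow from rounding each percentage of 950 down to a whole essay. A small sketch, with a hypothetical function name:

```python
# Sketch of the percentage quotas proposed above, rounding down to whole essays.
# The function name is illustrative; the shares and the total come from the comment.
def quota_counts(total: int, shares: dict) -> dict:
    return {tier: total * pct // 100 for tier, pct in shares.items()}


shares = {"Top": 5, "High": 15, "Mid": 45, "Low": 35}
counts = quota_counts(950, shares)
print(counts)
print("unassigned after rounding:", 950 - sum(counts.values()))
```

This reproduces 47, 142, 427, and 332. Because each figure is rounded down, the four counts sum to 948, leaving two essays to be placed by hand.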
- Hi, thanks for the input! Regarding the first sentence in your post: What you're talking about is the difference between being descriptive and predictive. I think being predictive is a bad idea, because 1) people don't agree on things like "future importance," which leads to discussion and attempts at consensus building, and if you have to do that for a whole bunch of essays, that becomes problematic. This project (as it is right now) simply describes what it sees. There is a freedom in that. You think essay X is more important than 27 incoming links, no watchers, and 3421 pageviews suggests? Either adjust your perceptions, because someone else is inclined to disagree with you. Also choosing to be predictive means that we'll have authors arguing for higher importance ratings (or arbitrarily assigning their own). These arguments all sap vital time and energy from the project. As it stands right now we have an automated system that tells everyone exactly what is happening. That has never been done for essays before. Now when some idiot cites an essay nobody reads (which s/he probably wrote), everyone can easily see that nobody reads it. ɳorɑfʈ Talk! 03:53, 17 April 2010 (UTC)
- Regarding the rest of the post: The purpose of the entire project is to describe the essays better (through classification and categorization) to make it easier for users to find the information they want, and to evaluate the information they find. If I want to see what the top essays are, I don't want to see 50 (we have over 1000). I want to see 10, maybe 20. That's what "Top" means to me. Further, the reason we chose 10 is because you can see marked drop-offs in importance as you scroll down the page. We tried to choose breakover points that mirrored what we were seeing. In other words, things in a certain importance category should be more similar to each other (in terms of impact) than to things in another importance category. The difference between essay #5 and essay #15 was too great for us to say "These are both TOP importance." Also, you'll have to explain to me how you can use percentages and specify that all those in the bottom tier get userfied. If you userfy the bottom 35% every month, after two months you'll have userfied half the essays.ɳorɑfʈ Talk! 03:53, 17 April 2010 (UTC)
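The point about repeated userfication can be checked with a line of arithmetic. The sketch below assumes the bottom 35% is removed each month from whatever pool remains and that no new essays are added; the function name is made up.

```python
# Sketch: fraction of essays remaining if the bottom 35% is userfied each month.
# Assumes the cut is re-applied to the shrinking pool and no essays are added.
def remaining_fraction(months: int, cut: float = 0.35) -> float:
    return (1 - cut) ** months


for month in range(4):
    print(month, f"{remaining_fraction(month):.1%} remaining")
```

After two rounds about 42% of the essays remain, so a little over half have been userfied.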
- Here is why arbitrarily imposed distributions won't work
What you're trying to impose is a normal distribution on a population that doesn't even come close to a normal distribution. Can we really call essays in the middle of this graph "mid-importance"? If so, then that's like going to a country where 1% of the population is filthy rich and 90% are dirt poor, and calling the guy in the 50th percentile "middle-income," even though he only makes 50 cents more per day than the guy in the 10th percentile. I'd love for your proposed distribution to be reality, but there simply aren't that many essays out there that have much of an impact. I think our distribution of essays is probably similar to the distribution of basketball skill if we include every adult in the country. A few hall of famers (Top), a league of really great players like the NBA (high), a larger group of skilled players like the NCAA (mid) and everyone else, and that everyone else includes the guy who is the best in his neighborhood, and me (I suck ass). Here is a closeup of the top end of the chart: We broke over at 10 essays, then 50, then after that it is the next 10%. We believe that the other 85% simply do not have much impact, and that is borne out by the numbers. I think the graph shows that Top 10 (or 9, actually) is clearly a breakpoint. There is more room for argument about the second and third breakpoints. ɳorɑfʈ Talk! 04:48, 17 April 2010 (UTC)
- Those graphs need to be "inverted" so that the scores are on the bottom line, and the counts would be on the left line. A probability distribution (such as the normal distribution) should show the height of the number at each score, where the score is calibrated on the x-axis (bottom axis) and the counts are on the y-axis. At that point, you could exclude the extreme top-scoring essays (perhaps 15 essays?) and examine the remaining distribution. -Wikid77 10:01, 17 April 2010 (UTC)
- That depends on whether you want to see how the frequency of scores are distributed or how the frequency of essays at a particular score are distributed. If we're ranking them, it makes sense to look at them in terms of that rank, which this graph does. How high is the score (i.e. how much impact) of the top ten? You can see it there. How high is the score of the next 40? It's there. Etc. I think this graph speaks for itself. Manipulating the data (i.e. data cleaning) by excluding the top 15 is inappropriate for the purposes of what we're analyzing. The top 15 aren't outliers if we're ranking the essays; they are integral to the dataset. ɳorɑfʈ Talk! 14:36, 17 April 2010 (UTC)
- I see your point: most of the essays rank close to zero. So those impact scores cannot distinguish among the bulk of the essays. Should essay size be considered, as an attempt to blend the metrics for quality with readership levels? I would suspect a very tiny essay to be easily considered as minimal quality. Does that make sense? -Wikid77 (talk) 09:19, 18 April 2010 (UTC)
- There's nothing to distinguish: they have no impact. We intentionally do not measure quality. Essays are editors' opinions. We find it inappropriate to measure the quality of opinions here. To attempt to do so would create tons of arguments. ɳorɑfʈ Talk! 09:55, 18 April 2010 (UTC)
Some essays to be re-ranked
We should discuss some particular essays, to consider criteria for re-ranking them, based on actual essay contents. To begin, I was shocked that one of my 30 essays ranked High, when it contains merely quickly collected pageview-stats of most-missed article names that did not exist, even listing misspelled redlinks as "wanted" articles. Meanwhile, similar essays of extensive pageview stats, which I spent many weeks writing, were judged as "Low" essays. Compare the reader pageview counts:
- Wikipedia:Most_missed_articles_in_2008 - viewed 9516 in 2009
- Wikipedia:Most_read_articles_in_2008 - viewed 2629 in 2009
- Wikipedia:Most_read_articles_in_2009 - viewed 200 per month
Those actual Top 1000 pageview-statistics essays are extremely difficult to write. Plus, 2009 was the first year when enough data was collected to pinpoint most of the Top 1000 articles very accurately. NOTE: The essay "WP:Most read articles in 2009" covers the first year of highly accurate pageviews. Anyway, those are some issues to consider when re-ranking essays which might have more impact on Wikipedia in the future. I understand the ranking process has just begun, as a new task, and so this is a good time to note the vast differences in essay contents, as an issue for re-ranking them. Hint: perhaps include page-size kilobytes (kb) as a factor, when re-ranking the essays: "Most read...2008" (80kb) & "Most read...2009" (61kb). Anyway, I think we should swap my High-ranked essay with a Mid-ranked one, as follows:
- Raise to High: WP:Protecting_children's_privacy - viewed 1547 in 2009
- Lower to Mid: WP:Most_missed_articles_in_2008 - viewed 9516 in 2009
The rationale for re-ranking them: Essay "Protecting children" informs users about privacy issues they should know, whereas essay "Most missed...2008" mainly reveals that a misspelled redlink of a celebrity looks like a highly-requested new article. That difference of impact, to actual users, should outweigh the fact that more people were curious about what "missed articles" are. -Wikid77 (talk) 22:24, 16 April 2010 (UTC)
- I would contend that none of those (most read/most missed) are actually essays as such. –xenotalk 01:55, 18 April 2010 (UTC)
- The essays for most-read do have more text about ranking; they're not just data lists, but essays heavily based in reality. -Wikid77 09:09, 18 April 2010 (UTC)
- Whoa, hold on there. There are a couple of misconceptions happening here. First, when you talk about misspellings and the like, you're talking about quality, not importance. We don't evaluate essay quality. If an essay is poorly written, but a million people read it each month, it's going to be Top importance. Second, "importance" is a measure of impact, not how crucial we think the information is. Maybe we can rename the attribute "Impact," because this is like the third time I've had to explain this, and I'm sure it's going to keep coming up, the same way people confuse the common usage of the word "notable" with Wikipedia's notability criteria. "Importance" in this case is a descriptive measure of impact. If you think an essay has more or less impact than the numbers show, you can attempt to build consensus to have it moved into a certain level. ɳorɑfʈ Talk! 05:13, 18 April 2010 (UTC)
- I think "impact" would help alleviate some of these concerns. –xenotalk 13:16, 18 April 2010 (UTC)
Renaming Wikilawyering as Wikifogging
Because "wiki-stalking" was renamed as policy section WP:Wikihounding (to avoid legal term "stalking"), I think we should rename essay WP:Wikilawyering as "WP:Wikifogging". See:
Note, we would then adjust the main writings to use "wikifog". WP:Wikilawyering would be a redirect to "WP:Wikifogging" so that not every link need be changed. The navbox should be changed to avoid "lawyering". I realize renaming just 1 essay goes beyond this WikiProject, but the term "wikilawyer" has been offensive for years, so I'm just asking for some help, if you have time. Also, consider the impacts of renaming in various essay lists. Thanks. -Wikid77 11:53, 19 April 2010 (UTC)
- I don't think you're going to be able to get something this entrenched changed. –xenotalk 18:42, 19 April 2010 (UTC)
- This space is for discussing WP:WikiProject Essays. Please discuss individual essays on that essay's talk page. ɳorɑfʈ Talk! 09:49, 20 April 2010 (UTC)
Importance renamed to Impact
The template has been changed, and the importance category has been renamed "Impact." The ranking system (Top, High, Mid, Low) has not changed. Hopefully this will put some of the recurring arguments about the meaning of "importance" to bed. ɳorɑfʈ Talk! 17:14, 26 April 2010 (UTC)