Talk:Nonparametric skew

Name

So who calls this thing the "Nonparametric skew"? It is no more "nonparametric" than any other. Melcombe (talk) 14:49, 26 January 2012 (UTC)

It's the only name I have seen it referred to by - so far. Perhaps it should be referred to as Bowley's first measure of skewness to credit its creator? DrMicro (talk) 16:32, 26 January 2012 (UTC)
If you have seen it referred to thus, then do give a citation for this usage. Wikipedia is more concerned with terminology/names as used in practice, not with inventing new terminology or names based on someone's perceived importance in development. Melcombe (talk) 20:57, 26 January 2012 (UTC)

Melcombe is right -- indeed the sentence "Its calculation does not require any knowledge of the form of the underlying distribution—hence the name nonparametric" makes no sense -- the same comment applies to every measure of skewness I've ever heard of (dozens at least). If that's really why it's called nonparametric skewness, it's a terrible reason. Better to offer the earliest available reference that calls it that and use that reference as the basis for the name. (Personally, since it's just the second Pearson skewness with the "3" filed off, why does this get a whole article of its own, while second Pearson skewness gets a teeny section in the "Skewness" article? If there's an article to be written here, should it not talk about both?) The obvious encompassing term would be *median skewness* (which is sometimes applied to the second Pearson skewness but would apply just as well to this measure), or perhaps "median-based skewness"; either way, the article could discuss the two together. Failing that, the second Pearson skewness needs to be more prominent in this article; I think it should be mentioned in the opening paragraph, possibly the opening sentence. Glenbarnett (talk) 02:24, 27 April 2017 (UTC)

Mistake in notational definition?

The section Sharper bounds, subsection Extensions currently says

It has been shown that

−√(q/(1−q)) ≤ (μ − xq)/σ ≤ √((1−q)/q)

where xq is the qth quartile.

For the second or third quartile this gives imaginary numbers. I don't have access to the given source, but presumably the formula should have q replaced everywhere on the right side by q/4. That would make it collapse to the standard formula regarding the median (q=2).

Any objection to my making that change? Duoduoduo (talk) 18:29, 6 May 2013 (UTC)
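The proposed q → q/4 repair can be sanity-checked numerically. A minimal sketch, assuming the bound in question is the Cantelli-type quantile bound −√(q/(1−q)) ≤ (μ − xq)/σ ≤ √((1−q)/q) with q a fraction in (0, 1) (an assumption, since the source isn't reproduced here):

```python
import math

# Upper bound of the Cantelli-type quantile inequality, with q in (0, 1).
def upper_bound(q):
    return math.sqrt((1 - q) / q)

# Misreading q as the quartile *number* (1, 2, 3) breaks the formula:
for k in (2, 3):
    assert (1 - k) / k < 0  # negative radicand -> imaginary number

# Replacing q by q/4 repairs it: the median (quartile number 2, i.e.
# q = 0.5) recovers the standard bound |mu - median| <= sigma.
assert upper_bound(2 / 4) == 1.0
```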

Also: the immediately following sentence

This statistic has also been extended to distributions with infinite means,

while given with a source, is too vague to be useful.

For the above reasons I'm deleting the above passages. They could certainly be reinserted if the formula is corrected by someone with access to the source and the subsequent statement is clarified. Duoduoduo (talk) 14:35, 7 May 2013 (UTC)

The passage has been restored, with an explanation. But the explanation does not accord with standard terminology. The new explanation is "Here the quartile lies between 0 and 1: the median has q = 0.5." But standard terminology refers to the first quartile, etc., not the .25 quartile. Can we find a better way to express this?
Also, the puzzling passage "This statistic has also been extended to distributions with infinite means" has been restored. Could someone go into the source and see what it says more specifically, and clarify this? The expression is not well defined for distributions with infinite means, because both the numerator and the denominator are infinite. Duoduoduo (talk) 20:09, 9 May 2013 (UTC)

Inequalities

Maybe I'm just misunderstanding, or maybe there are some typos in copying from the source (or omitted caveats) into the section "Extensions".

1. It says

 

But consider the distribution Pr(X= -1) = 1/2 = Pr(X=1) with mean zero and standard deviation one. For the 1/4 quantile the first inequality says   or  , which is not correct.

2. Our article also says

 

But consider the distribution Pr (X = -1) = (1/2 + epsilon), Pr(X = 1) = (1/2 - epsilon) for arbitrarily small epsilon, and consider q=1/2. The mean is arbitrarily close to zero, and the standard deviation is arbitrarily close to 1. The first inequality here says something arbitrarily close to   so   hence   But that can't be true since for all values of X, X is no more than (and usually less than) a distance of 1 from the mean.

Am I confused, or is something different from what the source says? Duoduoduo (talk) 18:26, 10 May 2013 (UTC)

Good points.
I will have to re-read the reference to answer the first point.
If I understand the second point correctly, what we have here is a limiting case where the bounds are attained. Instead of an inequality, equality applies. DrMicro (talk) 19:13, 10 May 2013 (UTC)
You're right about my second example -- it gives equality so it's not a counterexample. Duoduoduo (talk) 19:32, 10 May 2013 (UTC)
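The limiting case can be illustrated numerically; a sketch assuming the relevant lower bound is the Cantelli-type −√((1−q)/q) for the q = 1/2 quantile (an assumption, since the disputed chain isn't reproduced above):

```python
import math

# Epsilon-perturbed two-point distribution:
# Pr(X = -1) = 1/2 + eps, Pr(X = +1) = 1/2 - eps.
def standardized_quantile(eps):
    xs, ps = [-1, 1], [0.5 + eps, 0.5 - eps]
    mu = sum(x * p for x, p in zip(xs, ps))  # equals -2*eps
    sigma = math.sqrt(sum(p * (x - mu) ** 2 for x, p in zip(xs, ps)))
    x_half = -1  # F(-1) = 0.5 + eps >= 0.5, so the q = 1/2 quantile is -1
    return (x_half - mu) / sigma

q = 0.5
lower = -math.sqrt((1 - q) / q)  # -1
for eps in (0.1, 0.01, 0.001):
    ratio = standardized_quantile(eps)
    assert ratio >= lower  # bound holds; it is approached as eps -> 0
```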
But how about this counterexample to the second inequality chain above: let Pr(X = -1) = 1/2 = Pr(X = 1) and let q = 1/4. Then the inequality says -2/3 ≤ -1. Duoduoduo (talk) 19:41, 10 May 2013 (UTC)
This is a Rademacher distribution. The mean is zero. Consequently the expectation operator here acts on X alone. The expectation of X is also zero (as this is the mean). This gives 0 ≤ x1/4 ≤ 0 as the bounds of the inequality. If the statement is correct then x1/4 for this distribution must also be zero. I will have to look up details of this distribution to verify this. DrMicro (talk) 11:25, 11 May 2013 (UTC)
It may be the case that the quantile function is not properly defined for discrete distributions such as the Rademacher distribution. Alternatively this statement may only apply to continuous distributions. I will have to check the details again. DrMicro (talk) 11:32, 11 May 2013 (UTC)
A quick check in WP confirms my suspicions: quantile functions are only properly defined for continuous distributions. For discrete distributions pathologies such as the one noted above may occur. Since this is a possible point of confusion, perhaps it is worth noting that quantile functions are only properly defined for continuous distributions? Alternatively the statement of the theorem could be altered to state that it only applies to continuous distributions. If the second course is chosen it may be necessary to check cases for piecewise continuous distributions. DrMicro (talk) 11:39, 11 May 2013 (UTC)
The difficulty with the proper definition of the quantile function would also deal with the first possible problem noted above. DrMicro (talk) 11:40, 11 May 2013 (UTC)
My reading of the article quantile function is that it is not the same concept as what we are talking about here. In any event, do you have access to the source that gives the disputed inequalities? If so, could you look there and see precisely what caveats it gives -- e.g., does it limit itself to continuous distributions? I'd be a little surprised if that solves our problem, since you can turn a Rademacher distribution into a continuous distribution by transferring an infinitesimal amount of probability mass to the region between -1 and 1, and smoothing out the point masses into continuous peak regions of infinitesimal width. Incidentally, x1/4 = -1 in the Rademacher distribution. Duoduoduo (talk) 13:12, 11 May 2013 (UTC)
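In support of the point that a quantile is well defined for discrete distributions too, via the generalized inverse Q(q) = inf{x : F(x) ≥ q}, here is a small sketch (the implementation is illustrative, not from any cited source):

```python
# Generalized inverse CDF (quantile function) for a finite discrete
# distribution: Q(q) = inf{x : F(x) >= q}. This is well defined for any
# distribution, discrete ones included.
def make_quantile(xs, ps):
    def Q(q):
        F = 0.0
        for x, p in sorted(zip(xs, ps)):
            F += p
            if F >= q:
                return x
    return Q

Q = make_quantile([-1, 1], [0.5, 0.5])  # Rademacher distribution
# F(-1) = 0.5, so every q <= 0.5 maps to -1 (in particular x1/4 = -1).
print([Q(q) for q in (0.1, 0.25, 0.5, 0.75, 0.9)])  # [-1, -1, -1, 1, 1]
```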
I am clearly missing something in the proof. Here is a paper discussing these inequalities: [1]
There was just a notational confusion in going from the paper's notation to the notation in our article. I've fixed it now. Thanks for finding the online link. If you have links for the other references, could you put them into the article? Duoduoduo (talk) 18:46, 14 May 2013 (UTC)
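For the record, the Rademacher example is consistent with the Cantelli-type ordering, stated here as an assumption about what the corrected notation amounts to:

```python
import math

# Rademacher: Pr(X = -1) = Pr(X = +1) = 1/2, so mu = 0, sigma = 1,
# and (by the inf-definition of the quantile) x_{1/4} = -1.
mu, sigma, x_q, q = 0.0, 1.0, -1, 0.25

# Assumed corrected chain:
# -sqrt((1-q)/q) <= (x_q - mu)/sigma <= sqrt(q/(1-q))
lo = -math.sqrt((1 - q) / q)  # -sqrt(3), about -1.732
hi = math.sqrt(q / (1 - q))   # sqrt(1/3), about 0.577
assert lo <= (x_q - mu) / sigma <= hi  # -1.732 <= -1 <= 0.577 holds
```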

Lede needs better definition

Currently the lede reads in its entirety:

In statistics and probability theory, the nonparametric skew is a statistic occasionally used with random variables that take real values.[1][2] Its calculation does not require any knowledge of the form of the underlying distribution – hence the name nonparametric. It has some desirable properties: it is zero for any symmetric distribution; it is unaffected by scale shift; and it reveals either left- or right-skewness equally well. Although its use has been mentioned in older textbooks[3][4] it appears to have gone out of fashion. It has been shown to be less powerful[5] than the usual measures of skewness.[6]

This does not say what the statistic is in terms of a formula, nor does it say what it measures other than the fact that "it reveals either left- or right-skewness equally well" -- which is too self-referential, since it says that nonparametric skew measures skewness. Duoduoduo (talk) 14:12, 29 June 2013 (UTC)
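For reference, the statistic the article body defines is S = (mean − median)/(standard deviation), which is what the lede should state. A minimal sketch of the properties the lede claims (the sample data are illustrative):

```python
import math
import statistics

def nonparametric_skew(data):
    # S = (mean - median) / standard deviation.
    return ((statistics.fmean(data) - statistics.median(data))
            / statistics.pstdev(data))

symmetric = [-2, -1, 0, 1, 2]
right_skewed = [1, 1, 2, 2, 3, 9]

assert nonparametric_skew(symmetric) == 0.0  # zero for a symmetric sample
assert nonparametric_skew(right_skewed) > 0  # positive for right skew

# Unaffected by location/scale changes (a + b*X with b > 0):
shifted = [3 + 2 * x for x in right_skewed]
assert math.isclose(nonparametric_skew(shifted),
                    nonparametric_skew(right_skewed))
```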

References

  1. ^ Arnold B.C.; Groeneveld R.A. (1995) "Measuring skewness with respect to the mode". The American Statistician 49 (1): 34–38. DOI:10.1080/00031305.1995.10476109
  2. ^ Rubio F.J.; Steel M.F.J. (2012) "On the Marshall–Olkin transformation as a skewing mechanism". Computational Statistics & Data Analysis. Preprint
  3. ^ Yule G.U.; Kendall M.G. (1950) An Introduction to the Theory of Statistics. 3rd edition. Harper Publishing Company. pp 162–163
  4. ^ Hildebrand D.K. (1986) Statistical Thinking for Behavioral Scientists. Boston: Duxbury
  5. ^ Tabor J. (2010) "Investigating the Investigative Task: Testing for skewness - An investigation of different test statistics and their power to detect skewness". Journal of Statistics Education 18: 1–13
  6. ^ Doane D.P.; Seward L.E. (2011) "Measuring Skewness: A Forgotten Statistic?" (PDF). Journal of Statistics Education 19 (2).

Problem with "less powerful than the usual measures of skewness"

Currently the last sentence of the lede says "It has been shown to be less powerful[5] than the usual measures of skewness.[6]" This doesn't make sense unless the reader is told "less powerful at doing what?" Reference 5 is entitled "Investigating the Investigative Task: Testing for skewness - An investigation of different test statistics and their power to detect skewness". This suggests to me that that paper defines skewness in some particular way, maybe the way Wikipedia's skewness article defines it or maybe some other way, and then uses nonparametric skew of samples to test for skewness in that defined sense. This needs to be clarified.

I looked it up in reference [6]. It refers to power against a null of normality. I put that in. Duoduoduo (talk) 16:44, 29 June 2013 (UTC)
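"Power against a null of normality" can be made concrete with a rough Monte Carlo sketch (the sample size, replication count, and exponential alternative are all illustrative choices, not taken from the cited papers):

```python
import random
import statistics

# Reject when the sample nonparametric skew |S| is large, with the cutoff
# calibrated under the normal null; power is the rejection rate under a
# skewed alternative.
random.seed(0)
n, reps = 50, 2000

def S(data):
    return ((statistics.fmean(data) - statistics.median(data))
            / statistics.pstdev(data))

null = sorted(abs(S([random.gauss(0, 1) for _ in range(n)]))
              for _ in range(reps))
cutoff = null[int(0.95 * reps)]  # approximate 5%-level critical value

power = sum(abs(S([random.expovariate(1) for _ in range(n)])) > cutoff
            for _ in range(reps)) / reps
print(f"power against exponential alternative: {power:.2f}")
```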