Talk:Regression toward the mean/Archive 3


"Proof" debunking for gifted ninth graders (Fallacy: Part Six)

I need a proof of:

For any c in [a,b]:
  • Pr( [Sum (over all i such that y_i > c) e_i] > 0) > 0.5
  • Pr( [Sum (over all i such that y_i < c) e_i] < 0) > 0.5

This is the obvious generalization of his formulation. I wrongly assumed that he knew how to generalize his own formulation. What AaCBrown provided is rubbish, purporting to prove that "for y_j > c, Pr(e_j > 0) > 0.5", something entirely different. His "proof" is:

Consider any point c in the range [a,b]. Consider any j. Unconditionally, e_j is equally likely to be positive or negative. If x_j > c, if e_j > 0 then Pr(y_j > c) = 1. If e_j < 0 then Pr(y_j > c) < 0.5. If x_j < c, if e_j > 0 then Pr(y_j > c) > 0 and if e_j < 0 then Pr(y_j > c) = 0. So either way, for y_j > c, Pr(e_j > 0) < Pr(e_j < 0), so Pr(e_j > 0) > 0.5

This "proof" does not even prove what it claims to prove. I have gone through this trash, and it is totally worthless. (It is an excellent exercise for gifted ninth graders to debunk.) The most egregious error is this "so", equating Pr(e_j > 0) to Pr(y_j > 0) [This gross error can be spotted by gifted ninth graders easily]:

So either way, for y_j > c, Pr(e_j > 0) < Pr(e_j < 0), so Pr(e_j > 0) > 0.5.--Palaeoviatalk 01:43, 2 August 2010 (UTC)

There are several other elementary mistakes. I recommend this, seriously, as an exercise for gifted eleventh graders to debunk point by point. (See my "Guide to debunkers" below.)--Palaeoviatalk 02:16, 2 August 2010 (UTC)

I am convinced that I have been absolutely right not to trust the arguments on RTM of someone whose mathematical sophistication, mathematical maturity, and muddleheadedness are reflected in such a proof as this.--Palaeoviatalk 02:23, 2 August 2010 (UTC)

Guide to debunkers (from gifted eleventh graders to math majors)

The exercise is to debunk, in detail, the following "proof".

Claim Let X = {x_1, x_2, . . ., x_n} be any set of unknown points. Let E = {e_1, e_2, . . .,e_n} be unknown i.i.d draws from a distribution with median zero and support over the entire real line. We observe only Y = {x_1 + e_1, x_2 + e_2, . . ., x_n + e_n}. The minimum value of y is a, the maximum value is b. Let c be any value in the range [a,b].

For all j such that y_j > c, Pr(e_j > 0) > 0.5.

(Though greater clarity is possible, I have preserved the original phrasing of the Claim.)
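(A minimal simulation sketch of the Claim, for those attempting the exercise. The x values, the normal error distribution, and the cutoff c below are illustrative assumptions, not taken from either disputant. The sketch tallies how often e_j > 0 among the indices j with y_j > c; whether that frequency bears on the Claim depends on whether one reads it conditionally on the observed Y, or marginally for a j fixed in advance. See Correct Claim below.)

    import numpy as np

    # Sketch only: fixed "unknown" points x, normal errors (median zero,
    # support over the whole real line), and a tally of positive errors
    # among the indices j whose observed y_j exceeds c.
    rng = np.random.default_rng(0)
    n, trials = 20, 50_000
    x = np.linspace(-2.0, 2.0, n)    # assumed x values, fixed across trials
    c = 0.5                          # assumed cutoff inside the observed range

    positive, total = 0, 0
    for _ in range(trials):
        e = rng.normal(0.0, 1.0, n)  # i.i.d., median zero
        y = x + e
        sel = y > c                  # the indices j with y_j > c
        positive += int(np.sum(e[sel] > 0))
        total += int(np.sum(sel))

    print(f"frequency of e_j > 0 among j with y_j > c: {positive / total:.3f}")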

Invalid Proof (for debunking): It is in the citations, and it is as trivially obvious as the first proof. In fact, it's the same argument and I can do better, I can prove it for every point, not just the sum. However, again, I'm just reporting from the sources, none of this is my original work.

Consider any point c in the range [a,b]. Consider any j. Unconditionally, e_j is equally likely to be positive or negative. If x_j > c, if e_j > 0 then Pr(y_j > c) = 1. If e_j < 0 then Pr(y_j > c) < 0.5. If x_j < c, if e_j > 0 then Pr(y_j > c) > 0 and if e_j < 0 then Pr(y_j > c) = 0. So either way, for y_j > c, Pr(e_j > 0) < Pr(e_j < 0), so Pr(e_j > 0) > 0.5


(Note: see subsection Correct Notation below.)

To start off gifted eleventh graders taking up this challenge, let me analyze the beginning of the "proof":

First, note the bold phrases. They illustrate proof by intimidation. Always refuse to submit to such a proof tactic. Now we examine:

If x_j > c, if e_j > 0 then Pr(y_j > c) = 1.

Remember that the only probability space is that of the error E. So "If x_j > c, if e_j > 0" means that x_j, the error e_j, and y_j are all known, and no more uncertainty remains. "y_j > c" is true. You should say "y_j > c" (a certain fact). It is wrong to say "Pr(y_j > c) = 1". Pr should always refer to the probability space in question. Now the next sentence,

If e_j < 0 then Pr(y_j > c) < 0.5.

Now this is confusing. If e_j is known, then y_j is also known, and (y_j > c) should be either true or false. "Pr(y_j > c) < 0.5" makes no sense. Is this Pr(y_j > c) the probability before e_j is known? If so, then "Pr(y_j > c) = 1" in the earlier sentence must also refer to the probability before e_j is known. But how can the earlier sentence say "Pr(y_j > c) = 1" (i.e. before e_j is known, it is certain, with probability 1, that y_j > c)? It is plainly false. So we are in a major notational and conceptual muddle here. Very sloppy thinking is exhibited here. Try to avoid such laziness and sloppiness of thought. Such sloppy thinking can lead to "discoveries" such as "0=1".

I'll leave the rest to you. You can have great fun debunking this "proof".

It is a good exercise in clear and rigorous mathematical reasoning.--Palaeoviatalk 05:08, 2 August 2010 (UTC)

Correct Claim

We have above this Claim:

Claim Let X = {x_1, x_2, . . ., x_n} be any set of unknown points. Let E = {e_1, e_2, . . .,e_n} be unknown i.i.d draws from a distribution with median zero and support over the entire real line. We observe only Y = {x_1 + e_1, x_2 + e_2, . . ., x_n + e_n}. The minimum value of y is a, the maximum value is b. Let c be any value in the range [a,b].

For all j such that y_j > c, Pr(e_j > 0) > 0.5.


The "proof" was a mess. The question remains: Is the claim true? Is there a valid proof? The answer is "No". The Claim is false. No valid proof exists for a false claim.

(Note: See subsection Correct Notation below.)

It is straightforward. The intention is to assert something about Pr(e_j > 0) for all j such that y_j > c.

Values in E are unknown. Consider any j (a particular j) such that y_j > c (we don't know which numbers qualify as such j yet). What is Pr(e_j > 0)?

Simple. Because the median of the error (E) distribution is 0, Pr(e_j > 0) = 0.5. (This is in fact true of any j in [1,n].)

The correct (trivial) claim is therefore:

For all j such that y_j > c, Pr(e_j > 0) = 0.5.--Palaeoviatalk 16:00, 2 August 2010 (UTC)
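(Under this marginal reading, with j fixed in advance of observing Y, the statement is easy to check numerically. A sketch, assuming a normal error distribution, which has median zero:)

    import numpy as np

    # For a j fixed in advance, the sign of e_j is a fair coin:
    # the error distribution has median zero, so Pr(e_j > 0) = 0.5.
    rng = np.random.default_rng(1)
    e_j = rng.normal(0.0, 1.0, 1_000_000)   # assumed normal, median zero
    print(f"estimated Pr(e_j > 0): {np.mean(e_j > 0):.3f}")   # ~0.500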

Correct Notation

I need to highlight a notational issue here. I wrote statements such as "Pr(e_j > 0) = 0.5" because AaCBrown wrote with this wrong notation, and I wanted to focus only on the essential conceptual errors in his proof. e_j is a specific value, and no probability space exists with respect to e_j.

The correct notation is Pr(E_j > 0) = 0.5, where E_j is a random variable. Such misuse of notation shows his lack of basic knowledge in an area where he authoritatively propounds preposterous falsehood.--Palaeoviatalk 00:58, 5 August 2010 (UTC)

Palaeovia's concluding remarks

Anyone is welcome to propose improvements to the article. However, when someone with a history of writing utter nonsense proposes to re-introduce into this article his pet theory, for which after lengthy discussions I was shown neither a proof nor a credible source, I am apt to be fervent in my pursuit of truth (mathematical and statistical, theoretical and empirical). My impression has been strengthened that his understanding of the issue is superficial, his mathematical training is inadequate, and his interpretation is either "original research" or a gross distortion of more carefully phrased, qualified statements from scholars.

Mathematicians generally promptly admit their errors, when pointed out, and proceed to seek the truth. Crackpots never admit their patent errors (usually they cannot understand mathematical and logical reasoning), and proceed to defend their pet theories to their last breath. I respect the former, and expose the latter.

In mathematics, truth is remarkably uncontroversial. It is not a matter of compromise. Of course, what truth belongs in this article is a matter of debate and compromise. Excluding error and fallacy is my sole objective. I am open to being proved an idiot. I will learn from my errors, and improve. --Palaeoviatalk 23:02, 1 August 2010 (UTC)

Curious deletion of Palaeovia's posts on Crackpots and Sciolists

Given the persistent effort by Melcombe to delete my following post from this Talk page, I will explain its relevance:

My post on Crackpots and Sciolists

On the matter of mathematicians' honesty in facing up to their errors, the following example (from the article Andrew Wiles) of Andrew Wiles is exemplary:

The proof of Fermat's Last Theorem
Starting in the summer of 1986, based on successive progress of the previous few years of Gerhard Frey, Jean-Pierre Serre and Ken Ribet, Wiles realised that a proof of a limited form of the modularity theorem might then be in reach. He dedicated all of his research time to this problem in relative secrecy. In 1993, he presented his proof to the public for the first time at a conference in Cambridge. In August 1993, however, it turned out that the proof contained a gap. In desperation, Andrew Wiles tried to fill in this gap, but found out that the error he had made was a very fundamental one. According to Wiles, the crucial idea for circumventing, rather than closing this gap, came to him on 19 September 1994. Together with his former student Richard Taylor, he published a second paper which circumvented the gap and thus completed the proof. Both papers were published in 1995 in a special volume of the Annals of Mathematics.

How do you tell a mathematician from a mathematical crackpot?

  • A mathematician occasionally makes subtle mistakes, understands his mistakes, and readily admits to them.
  • A crackpot frequently makes obvious, elementary mistakes, cannot understand that they are mistakes, and never admits to any mistake.
  • A mathematician confesses ignorance in fields beyond his expertise.
  • A crackpot (a sciolist) propounds authoritatively on subjects of which he has but superficial and faulty knowledge.

Against sciolism:

  • "A little learning is a dangerous thing; drink deep, or taste not the Pierian spring: there shallow draughts intoxicate the brain, and drinking largely sobers us again."

Explanation

The recent exchange concerns the truth of certain claims, and the behavior of their author upon being shown their falsity. An appropriate description for such behavior is "crackpot", a term not necessarily familiar to everyone. Given that I have had some experience in dealing with such behavior, and have given some serious thought to the same, my post above should help to clarify the precise sense in which I have employed the term, and serve to distinguish mathematicians from crackpots, in general.

I wonder at Melcombe's standard in policing this Talk page. I could point to certain other posts that, if such standard had been consistently applied, should have suffered at his hand.

Could my post have been singled out as a result of our previous exchange?--Palaeoviatalk 09:25, 10 August 2010 (UTC)

None of the above diatribe has anything to do with discussing what should be in the article and so should be deleted. Standards for talk pages are at WP:Talk. Melcombe (talk) 11:13, 10 August 2010 (UTC)

Express your views, by all means. Let each be his/her own judge. Deleting others' posts in a Talk page (meant for talks) seems devious. --Palaeoviatalk 11:33, 10 August 2010 (UTC)

If those posts breach our talk page guidelines, they can certainly be removed. Editors who make personal attacks against other editors can also be blocked or banned from editing Wikipedia. Comment on the content, not the contributor. --Avenue (talk) 13:00, 10 August 2010 (UTC)

Notation needs definition.

A casual reader pointed out here that there is no definition for most of the notation in the regression section. I don't know Wikipedia's standards for this, but E[], Var[], Cov[], \hat, epsilon, and the pipe (|, meaning "conditional on") all need either to be defined, eliminated, or noted as defined on another page. On the least squares page, it seems they are defined with a link when first introduced. — Preceding unsigned comment added by 108.75.137.21 (talk) 14:47, 22 June 2011 (UTC)

Cross-Cultural Differences in recognizing and adjusting to a regression toward the mean.

A recent study performed by Roy Spina et al. found that there are cultural differences in being able to account for the regression toward the mean, and I think that this may be found in other studies and would add to this article. Here is the citation for his article. Spina, R. R., Ji, L., Ross, M., Li, Y., & Zhang, Z. (2010). Why best cannot last: Cultural differences in predicting regression toward the mean. Asian Journal Of Social Psychology, 13(3), 153-162. doi:10.1111/j.1467-839X.2010.01310.x Fotherge (talk) 21:19, 7 February 2012 (UTC)

Please select non-controversial examples illustrating regression to mean.

Have removed a reference to alleged criticism of UK speed cameras partly because it appeared to argue an unrelated point about speed cameras being an unproductive use of road safety funds. A good illustrative example of regression to the mean should be clear & easy to understand and should not drag in any secondary issues which could detract from the idea being explained.

Noel darlow (talk) 21:46, 23 October 2013 (UTC)

Reinserted with offending sentence removed and two additional references. Qwfp (talk) 19:30, 24 October 2013 (UTC)
Great :) It does make a good, topical example of R2M when it's worded to be camera-neutral. Noel darlow (talk) 01:07, 25 October 2013 (UTC)

Regression effect/fallacy

The explanation given of regression towards mediocrity seems plausible. But it does not explain why this phenomenon also occurs with entirely random data. Generate (x, y) pairs from a bivariate normal distribution with the same marginal distributions and correlation 0.5. The regression effect will show up, and no genetic theory will account for it.

This is already discussed in the article, I'm just wondering if the example from genetics really explains something that is not already an artifact of the definition of the regression line.TerryM--re (talk) 22:52, 24 May 2016 (UTC)
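(A minimal sketch of the simulation TerryM--re describes, assuming standard normal marginals and correlation 0.5:)

    import numpy as np

    # Purely random (x, y) pairs with correlation 0.5: an extreme slice of
    # x pairs, on average, with y values pulled back toward the mean.
    rng = np.random.default_rng(0)
    rho, n = 0.5, 1_000_000
    x = rng.normal(0.0, 1.0, n)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, n)

    extreme = x > 2.0                                    # an extreme slice of x
    print(f"mean x in slice: {x[extreme].mean():.2f}")   # about 2.37
    print(f"mean y in slice: {y[extreme].mean():.2f}")   # about 1.19, closer to 0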

Why is it MORE likely that high performers are unlucky the next day?

Seems like they should be just as likely to be unlucky (or lucky) the next day as they were on the first day. Danielx (talk) 02:44, 20 December 2013 (UTC)

They aren't. However, if they are a high scorer for this event, then it's more likely that they have been lucky this time and entirely probable they won't be next time. Alpha3031 (talk) 13:17, 3 April 2015 (UTC)
They are. Note that the disagreement here hinges on a subtle difference in the frame of reference for the (un)luckiness. The highest performers are likely *to have had* (past) luck on the exam. The implicit assumption here, however, is that everyone is equally likely *to have* (future) luck on an exam, and that that luck is generated independently of the course and the individual (e.g. you get sick, your dog dies, etc.). Supposing the expected luck to be 0, the high performers are just as likely to be lucky on the second exam as they were likely to be lucky before going into the first exam: expected luck = 0. However, their high performance on the first exam is evidence of good luck on the first exam. In fact, they are more likely *to have had* (past) good luck on the first exam than they are *to have* (future) good luck on the next exam. E.g. the expected value of luck on the first exam of a high performer on the first exam, after the first exam has taken place, is 1; the expected value of luck on the second exam of a high performer on the first exam, before the second exam has taken place, is 0. --Ihearthonduras (talk) 18:16, 14 November 2017 (UTC)
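(A sketch of the frame-of-reference point above, under an assumed additive model: score = ability + luck, both normal; the 95th-percentile cutoff is illustrative.)

    import numpy as np

    # High performers on exam 1 have, on average, already enjoyed good
    # luck; their luck on exam 2 is still a fresh draw with mean zero.
    rng = np.random.default_rng(0)
    n = 1_000_000
    ability = rng.normal(100.0, 10.0, n)
    luck1 = rng.normal(0.0, 10.0, n)    # luck on exam 1
    luck2 = rng.normal(0.0, 10.0, n)    # luck on exam 2, independent
    score1 = ability + luck1

    top = score1 > np.percentile(score1, 95)    # exam-1 high performers
    print(f"mean past luck of top scorers:   {luck1[top].mean():+.2f}")  # > 0
    print(f"mean future luck of top scorers: {luck2[top].mean():+.2f}")  # ~ 0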

Issue with some phrasing

In the *Misunderstandings* section, there is this phrase "So for every individual, we expect the second score to be closer to the mean than the first score." This is not true. For example, individuals who have a score of exactly the mean should experience regression away from the mean. Regression toward the mean is specifically a phenomenon that affects the 'highest'/'lowest' performers. Everyone else expects to experience some sort of motion with respect to the mean, but not necessarily towards it.

In fact, there are a few statements surrounding that one that are related and misleading. I will now go and attempt to edit to clarify this statement in the page myself. --Ihearthonduras (talk) 18:27, 14 November 2017 (UTC)

Different use in finance

I added a note to the end of the introduction, because I'm almost certain that "mean reversion" as used in finance is fundamentally different from "reversion to the mean" or "regression to the mean" as described here. I don't think the Wikipedia article on Mean reversion (finance) is clear on this. As I understand it, as used in science and statistics, mean reversion is an effect that shows up when genuinely independent random samples are drawn successively from a fixed population having a constant frequency distribution.

As used in finance, it seems to refer to a situation in which performance over successive time periods is not independent, but shows a negative correlation from one time period to the next. A fluke period of low returns is not merely followed by a typical period of average returns, as the nature of a random process would imply. On the contrary, a period of low returns has an actual tendency to be followed by a compensating period of high returns. Thus, as the holding period increases, an unusually high (or low) average return decays toward the long-run mean faster than it would if the process were a random walk.

The law of large numbers says that if you throw 10 heads in a row, then flip a coin 100 times more, the average number of heads for the whole 110 throws will be closer to 50/50, not because there's any tendency to throw more tails after a long series of heads, but simply because the maximum likelihood is that the 100 additional throws will be split 50/50, so the percentage for the whole series will decline from 10/10 = 100% heads to (10 + 50)/110 ≈ 55%. I've talked to a couple of financial specialists who have been quite definite that in finance, "mean reversion" does not just mean swamping out an unusual run with a series that simply has the mean value; it means active compensation--a run of low stock returns will (supposedly) tend to be followed, not by a run with mean-value stock returns, but by a run of higher-than-mean stock returns.
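(A sketch contrasting the two notions. The AR(1) model and its coefficient are illustrative choices, not taken from Siegel: with i.i.d. returns a fluke run is merely swamped, while negative autocorrelation means active compensation.)

    import numpy as np

    # i.i.d. returns vs. returns with active compensation (negative
    # autocorrelation): compare the lag-1 correlations.
    rng = np.random.default_rng(0)
    n, phi = 200_000, -0.3   # phi < 0: low returns tend to be followed by high ones

    iid = rng.normal(0.0, 1.0, n)
    noise = rng.normal(0.0, 1.0, n)
    ar = np.zeros(n)
    for t in range(1, n):
        ar[t] = phi * ar[t - 1] + noise[t]

    for name, r in (("i.i.d.", iid), ("compensating", ar)):
        lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]
        print(f"{name}: corr(r_t, r_t+1) = {lag1:+.2f}")   # ~0.00 vs ~-0.30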

In the article, I'm doing my best to present this by paraphrasing what Jeremy Siegel says, but I admit that I'm going just a little farther by using the word "compensation." Dpbsmith (talk) 15:33, 22 December 2011 (UTC)

I think, with regards to finance, the random parts of the time series are generally modeled as a stationary process. I would conjecture with 95% confidence that stationary processes exhibit a "regression toward the mean" sort of phenomenon. This is probably the missing link that you want between these two articles. --Ihearthonduras (talk) 18:33, 14 November 2017 (UTC)

Examples

"If your favorite sport team won the championship last year, what does that mean for their chances for winning next season? To the extent this is due to skill (the team is in good condition, with a top coach etc.), their win signals that it's more likely they'll win next year. But the greater the extent this is due to luck (other teams embroiled in a drug scandal, favourable draw, draft picks turned out well etc.), the less likely it is they'll win next year."

I don't see this as a good example of regression to the mean at all. There are so many feedback loops going on (increased investment, morale, attracting better-quality people, etc.) that they are going to override any theoretical underlying probability based on 'normal conditions'. Also, winning a championship is binary (you either do or you don't) - it might be more useful to ask whether the final rank is higher or lower than their average rank. Btljs (talk) 11:53, 19 January 2018 (UTC)

Confusion of terms and concepts

In the *Other Examples* section the author appears to repeatedly confuse probability and statistics. They claim "If your favorite sport team won the championship last year, what does that mean for their chances for winning next season? To the extent this is due to skill (the team is in good condition, with a top coach etc.), their win signals that it's more likely they'll win next year." This confuses the statistics of the victorious match (1 win out of 1 trial, an empirical rate of 100%) with the probability of future matches, and compares the probability to that statistical sample as such. How could a win become more likely than 100%? The entire section needs to be revised or removed. — Preceding unsigned comment added by 184.54.35.21 (talk) 08:05, 15 April 2018 (UTC)

First sentence

"[...] the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the mean or average on its second measurement, and if it is extreme on its second measurement, it will tend to have been closer to the average on its first." Put another way, this sentence says that if you measure something twice, one of the measurements would be closer to the mean than to other, which is a rather meaningless sentence. Or one could understand it as saying that if one measurement is extreme, then the two measurements would be different from each other. This of course depends on the measurement in question. Not a very good way to start an article. I would have corrected it but I came here to learn about the concept, so I don't really have what it takes to make this correction. --178.8.24.240 (talk) 15:44, 28 November 2018 (UTC)

Picking nits - 'regression AWAY from the mean'

Regarding this: individuals that measure very close to the mean should expect to move away from the mean. I found a couple of references to 'regression away from the mean' online, but something's wrong here. A person who scores 100 on an IQ test (the mean, by definition) may indeed have a different score in a retest, and so would have moved 'away from the mean.' But is this 'regression'?

The word regression implies a return to an expected value. In the case of normally distributed IQ scores, the most probable score of a random test-taker (also the mean) is given the value of 100. So a test-taker who does, in fact, score 100 on a first test is already at the most probable value. So when, after a second test, a score of 98 results, to what has the test-taker 'regressed'?

The only way this makes sense is if each test score is considered a single outcome in a sample of test results - the sample representing a part of all possible test results. So if the testee took 100 tests, we might find a range of scores (error on IQ tests is within 4 points, I believe). If the mean of the 100 test scores were 102, then the first score of 100 would, in later tests, regress toward the mean value of all possible test results.

In this case, there is some logic to a regression away from the population mean, but only in the sense that the first score of 100 was away from the test-taker's true ability, and thus, what we really have is a regression of the test-taker's score to his or her personal mean test score.

The other way to think of it - the only other way I can think of - is that the movement away from the mean after a first score is not a regression at all, but simply error variance. Thus, if a testee's true IQ were 100, we would expect later test scores to be different, but only because test results are not perfect measures of intelligence - there is error in the testing process.
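(A sketch of this error-variance reading, assuming true IQ ~ N(100, 15) and test error ~ N(0, 4), roughly matching the four-point error figure above:)

    import numpy as np

    # Observed score = true IQ + test error. A first score at the mean
    # spreads on retest but drifts nowhere; an extreme first score is,
    # on average, pulled back toward 100.
    rng = np.random.default_rng(0)
    n = 1_000_000
    true_iq = rng.normal(100.0, 15.0, n)
    test1 = true_iq + rng.normal(0.0, 4.0, n)
    test2 = true_iq + rng.normal(0.0, 4.0, n)

    at_mean = np.abs(test1 - 100.0) < 0.5    # first score ~100
    high = test1 > 130.0                     # first score far above the mean

    print(f"mean retest after scoring ~100: {test2[at_mean].mean():.1f}")  # ~100
    print(f"mean retest after scoring >130: {test2[high].mean():.1f}")     # below the first scores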

Another way to think of the first case also comes from IQ. If two parents each have an IQ of 100, we do not expect their children to 'regress away from the mean.' In fact, their children will be expected to have IQs normally distributed with a mean of ... 100. The fact that siblings have a mean IQ difference of about 11 points does not come from any regression away from the mean. It is simply the result of the probability distribution of IQ, centered on the mean, with lesser probabilities as you move out from the mean.

MarkinBoston (talk) 18:50, 17 April 2019 (UTC)