Wikipedia:Reference desk/Archives/Mathematics/2021 February 25
February 25
Nelson rules
Nelson rules present 8 rules descriptively. Included are illustrations which in general are fake. For example in rule 2, it is obvious that the mean is in the wrong place, and as soon as that is corrected then for the shown data, there won't be "nine or more" points on the same side of the mean.
I would like to redo the graphs with real data, to which end, in each case, I would like to find 20 integer values that meet the relevant criterion. I have no idea how to go about generating such sets of data other than running a program using rand() and waiting until I get a set that happens to meet the need.
Is there a better way of generating such sets of values? -- 09:30, 25 February 2021 (UTC)
- The figures show sections from much longer time series. The mean (and the standard deviations) are those of the long time series and therefore are mostly based on data points that are not shown. I don't think the figures need or should be altered. --Wrongfilter (talk) 10:18, 25 February 2021 (UTC)
- I agree that there is no need to alter the chart examples. Any example should eventually appear in a sufficiently long purely Gaussian random sequence, for which one should not use rand() on its own. (You get a decent approximation of the standard normal distribution using
rand()-rand()+rand()-rand()+rand()-rand()+rand()-rand()+rand()-rand()+rand()-rand()
that is, twelve calls, half of which count as negative.) Among the innumerable examples, you probably do not want one that is statistically very unlikely (such as having excursions beyond five sigma). But those that are statistically the most likely will also be very atypical. For example, the most likely occurrences of nine points in a row on the same side have them infinitesimally displaced from the mean, and three points in a row more than two sigma from the mean will be exactly two sigma away. In the time it takes you to devise a smart criterion for what you want, the dumbest program you can imagine will have produced zillions of examples. --Lambiam 13:41, 25 February 2021 (UTC)
- I'd add that the test is meant to be applied to data with a normal distribution, which is a continuous distribution so integer data would not work. Similar tests might be developed for non-normal distributions, but they would have different criteria. --RDBury (talk) 14:23, 25 February 2021 (UTC)
- The formula I gave above assumed that 0 ≤ rand() < 1, like random.random() in Python, but I see that in C++ rand() returns a number in the range 0 to RAND_MAX, so to get variance 1 the results need to be divided by RAND_MAX+1. --Lambiam 14:31, 25 February 2021 (UTC)
- Another way to generate normal deviates is the Box–Muller transform. The rand() function does generate numbers between 0 and 1 in most languages; I guess C++, being built a bit closer to the hardware, lets you code that in yourself if that's what you want. --RDBury (talk) 14:44, 25 February 2021 (UTC)
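For illustration, the two ways of producing normal deviates mentioned in this thread can be sketched in Python, where random.random() returns a value with 0 ≤ x < 1 as the twelve-call formula assumes. The function names approx_normal_12 and box_muller are hypothetical, chosen only for this sketch. With a C-style rand() each call would first have to be scaled into that range, as noted above.

```python
import math
import random


def approx_normal_12(rng):
    """Approximate standard normal deviate from twelve uniform calls.

    Six uniforms on [0, 1) are added and six are subtracted. Each uniform
    has variance 1/12, so the result has mean 0 and variance 12 * (1/12) = 1.
    """
    plus = sum(rng.random() for _ in range(6))
    minus = sum(rng.random() for _ in range(6))
    return plus - minus


def box_muller(rng):
    """Return two independent standard normal deviates (Box-Muller transform)."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed only to make the run repeatable
    print([round(approx_normal_12(rng), 3) for _ in range(5)])
    print([round(z, 3) for z in box_muller(rng)])
```

Note that the twelve-call sum can never leave the interval from -6 to 6, whereas Box–Muller output is unbounded in principle; for plotting a 20-point control chart the difference is unlikely to matter.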
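The "dumbest program" approach suggested in the thread might look like the following minimal sketch for rule 2 (nine or more points in a row on the same side of the mean). It makes several assumptions: the mean and standard deviation are those of the long-run process (0 and 1 here), not of the 20 points shown; the values are continuous rather than integer; and Python's random.gauss stands in for whatever normal generator is preferred. The names longest_run_same_side and find_rule2_example are hypothetical.

```python
import random


def longest_run_same_side(values, mean=0.0):
    """Length of the longest run of consecutive points strictly on one side of mean."""
    best = run = 0
    prev_side = 0
    for v in values:
        side = (v > mean) - (v < mean)  # +1 above, -1 below, 0 exactly on the mean
        if side != 0 and side == prev_side:
            run += 1
        else:
            run = 1 if side != 0 else 0  # a point on the mean breaks any run
        prev_side = side
        best = max(best, run)
    return best


def find_rule2_example(n=20, run_length=9, seed=None, max_tries=100_000):
    """Draw n-point samples until one contains a qualifying run.

    Returns the sample and the number of attempts it took.
    """
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if longest_run_same_side(sample) >= run_length:
            return sample, tries
    raise RuntimeError("no qualifying sample found within max_tries")


if __name__ == "__main__":
    sample, tries = find_rule2_example(seed=25)
    print(f"found after {tries} attempts")
    print([round(x, 2) for x in sample])
```

The other rules could be handled the same way by swapping in a different predicate; the search loop itself would not need to change.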