Wikipedia:Reference desk/Archives/Mathematics/2009 March 21

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


March 21

Looking for a reference

A document I'm reading has the following passage (italics are mine):

"...laboratory controls must include the establishment of scientifically sound and appropriate specifications, standards, sampling plans and test procedures to assure that components and products conform to appropriate standards. One example of a scientifically sound statistical sampling and analytic plan is based on a binomial approach (see Table 1: Product Performance Qualification Criteria for the Platelet Component Collection Process). The sampling sizes described in Table 1 will confirm with 95% confidence a < 5% non-conformance rate for pH and residual WBC count, and < 25% non-conformance rate for actual platelet yield.

However, other statistical plans may also be appropriate, such as the use of scan statistics."

Do we have an article on scan statistics that is under another name? If not, does anyone know of a concise introduction to the principles? I see books on Amazon, but I'm looking for a not-too-technical overview (i.e. fewer than ten pages in length). SDY (talk) 19:54, 21 March 2009 (UTC)

There is a short (and IMHO not very good) introduction to scan statistics in the "Guide to the preparation, use and quality assurance of blood components" published by the Council of Europe. Basically, it is binomial statistics with a small twist: instead of considering your samples independently, you look at a "moving window". Say your window size is n. When you do a new quality control, you look at the set of the n-1 previous quality controls plus your new one. When setting up a scan statistics-based QC program, you need to know (1) the baseline error rate that is considered acceptable, and (2) the error rate at which you want the test to indicate a quality failure. Next, you need to find a combination of a moving window size and a maximum acceptable number of failed tests within such a window that results in a low false-positive rate and a high probability of detecting a quality failure. The chapter presents a table with some combinations of window sizes and maximum allowed failures per window. Unfortunately, the accepted error rates in that table are too high for the table to be very useful. In addition to determining a window size and a maximum acceptable number of failures per window, the table requires that you specify a third number, the "universe", corresponding to the number of samples analyzed per year. The false-positive rates in the table are calculated with respect to the "universe", whereas the power of the test to detect quality failure is calculated on a sample-by-sample basis.
Since two consecutive windows are not independent (they share n-1 observations), the maths for calculating false-positive rates becomes quite tricky. The chapter refers to this book for the calculations. However, I've heard from a trustworthy source that the false-positive rates in the table were actually calculated by Monte Carlo simulations, and not by the formulae in the book. The probability of detecting a quality failure (power), as presented in the table, was calculated using the cumulative binomial distribution, and is thus easily checked. If you do so, you will see that there is an error in the bottom row of the table. I've done simulations myself, which have shown that the false-positive rates in the table appear to be correct. --NorwegianBlue talk 13:24, 22 March 2009 (UTC)
That helps, thanks. SDY (talk) 15:35, 22 March 2009 (UTC)
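The moving-window scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the parameter values (a universe of 400 samples, a window of 30, at most 3 failures per window, a 2% baseline failure rate and a 15% failure rate to be detected) are hypothetical and are not taken from the Council of Europe table. The false-positive rate is estimated by Monte Carlo simulation over the whole universe, and the per-window power comes from the cumulative binomial distribution.

```python
import random
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_alarm(results, window, max_failures):
    """True if any run of `window` consecutive results holds more than
    `max_failures` failures (results are 0/1 or False/True, 1 = failure)."""
    failures = sum(results[:window])
    if failures > max_failures:
        return True
    for i in range(window, len(results)):
        # Slide the window by one: add the newest result, drop the oldest.
        failures += results[i] - results[i - window]
        if failures > max_failures:
            return True
    return False


def false_positive_rate(universe, window, max_failures, p0, trials=2000, seed=0):
    """Monte Carlo estimate of the chance of at least one alarm in a
    'universe' of samples when the true failure rate is the baseline p0."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(trials):
        results = [rng.random() < p0 for _ in range(universe)]
        alarms += scan_alarm(results, window, max_failures)
    return alarms / trials


if __name__ == "__main__":
    # Hypothetical settings, chosen only for illustration.
    universe, window, max_failures = 400, 30, 3
    p0, p1 = 0.02, 0.15
    print("estimated false-positive rate per universe:",
          false_positive_rate(universe, window, max_failures, p0))
    print("power for a single window at failure rate p1:",
          binomial_tail(window, max_failures + 1, p1))
```

The split mirrors the description above: the false-positive rate depends on the universe because overlapping windows are scanned repeatedly, whereas the power is a plain binomial tail for one window.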

Central Limit Theorem & the Gamma Distribution

Hi there refdesk - I'm trying to show that the limit of the gamma distribution under the following integral:

$\int_0^{\lambda n} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx$ tends to 0 as n tends to infinity for any positive real lambda, by using the central limit theorem. By taking the limit as n tends to infinity, we should have a normal distribution as follows:

$\frac{G - n/\lambda}{\sqrt{n}/\lambda} \to N(0,1)$ where G is our gamma distribution, so then $G \approx N(n/\lambda,\ n/\lambda^2)$, right? Then how do I show that the integral of the normal PDF over $[0, \lambda n]$ tends to 0 as n tends to infinity?

Mathmos6 (talk) 22:48, 21 March 2009 (UTC)

Take $\lambda = 1$. Then the integral of the normal pdf tends to 1 as $n \to \infty$. So there is an error somewhere. 71.182.216.55 (talk) 02:31, 22 March 2009 (UTC)

I figured there must be, but I can't spot where - I assume it must be my very first bit with G, but I'm not sure where I've gone wrong... Mathmos6 (talk) 02:48, 22 March 2009 (UTC)

I believe that the initial assertion is wrong. I can do it with n! but not (n-1)!. Check your earlier calculations. 71.182.216.55 (talk) 03:08, 22 March 2009 (UTC)
With an n! it's obvious: the quantity given is a probability, so at most 1, so dividing by n makes it tend to 0. Algebraist 03:13, 22 March 2009 (UTC)
I didn't mean to suggest that my modification of the result was deep. But the result as stated is wrong. Would you care to confirm? 71.182.216.55 (talk) 03:23, 22 March 2009 (UTC)
The limit appears to be 1/2 (for λ=1). I can't see anything wrong with the CLT argument, and the integral of the normal tends to 1/2 (not 1 as stated above). Algebraist 03:27, 22 March 2009 (UTC)
Yes, 1/2 is right (I took the normal distribution over (-n,n) rather than (0,n)). Anyway, we are substantively in agreement that the result is definitely not zero. 71.182.216.55 (talk) 03:38, 22 March 2009 (UTC)

OK, we're looking at this integral:

$\int_0^{\lambda n} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx$

Now let

$u = \lambda x.$

Then as x goes from 0 to λn, u goes from 0 to λ²n, and already I'm wondering if you didn't intend n/λ instead of λn. If what you wrote is what you intended, then the integral becomes

$\int_0^{\lambda^2 n} \frac{u^{n-1} e^{-u}}{(n-1)!}\,du$

But if you intended n/λ, then the integral becomes

$\int_0^{n} \frac{u^{n-1} e^{-u}}{(n-1)!}\,du$

For a Gamma distribution with expected value n and variance n, this integral is the probability that a random variable with that distribution is between the mean and √n standard deviations below the mean. That doesn't go to 0, but maybe it looks more promising than the other thing. Michael Hardy (talk) 03:43, 22 March 2009 (UTC)

I had considered myself whether n lambda was the correct upper limit for the integral but checked and rechecked and it certainly is - perhaps the question was simply written down wrong? Mathmos6 (talk) 04:55, 22 March 2009 (UTC)

Yes, something is wrong, as they say. But what exactly did you check? Anyway, you may clarify this thing a little if you consider what the central limit theorem actually tells you about a sequence of iid random variables $X_1, X_2, \dots$ with exponential distribution law, which was the topic of your problem, as we may reasonably presume (and I assume that it was an exercise, whose text has been corrupted at some point). If $X_i$ has pdf $\lambda e^{-\lambda x}$ supported in $[0,\infty)$, you should find, by the central limit theorem,
$\int_0^{n/\lambda + c\sqrt{n}/\lambda} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx$ tends to $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{c} e^{-t^2/2}\,dt$,
as n tends to infinity, for any positive real lambda and all $c \in \mathbb{R}$. Notice that for $\lambda = 1$ and $c = 0$ we have again the upper limit n in the integral, and we find again the correct limit 1/2 for the special case considered above --pma (talk) 15:27, 22 March 2009 (UTC)
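For readers who want the intermediate step, the passage from the central limit theorem to the integral statement can be sketched as follows. This assumes, as in the comment above, that the $X_i$ are iid exponential with rate $\lambda$; the symbols $S_n$ (the partial sum) and $\Phi$ (the standard normal CDF) are introduced here only for illustration.

```latex
\begin{align*}
\mathbb{E}[X_i] &= \frac{1}{\lambda}, \qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^2}, \\
S_n &= X_1 + \dots + X_n \ \text{ has density } \ \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!} \ \text{ on } [0,\infty), \\
\int_0^{n/\lambda + c\sqrt{n}/\lambda} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx
  &= P\!\left(S_n \le \frac{n}{\lambda} + c\,\frac{\sqrt{n}}{\lambda}\right)
   = P\!\left(\frac{S_n - n/\lambda}{\sqrt{n}/\lambda} \le c\right) \\
  &\xrightarrow[n\to\infty]{} \Phi(c) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{c} e^{-t^2/2}\,dt .
\end{align*}
```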
RMK. It is possible that the original integral had an upper bound $\lambda_n$, defined somewhere; then the definition has been lost, and successively the unintelligible $\lambda_n$ has been wrongly corrected into $\lambda n$; or that $\lambda_n$ became $\lambda n$ after a typo, and consequently the definition of $\lambda_n$ was expunged as useless. Still, I can't see where the statement that the limit be 0 comes from. Ignorabimus...  :-(
Oh, but most likely the integral is exactly the one you wrote, only the correct statement is that the limit is 0 for all positive $\lambda$ strictly less than 1, it is 1/2 for $\lambda = 1$, and it is 1 for all $\lambda$ strictly larger than 1. If so, you just need to use the central limit theorem as written above to conclude. The point is that, no matter what $c$ is, we have: $\lambda n < n/\lambda + c\sqrt{n}/\lambda$ for all large n, respectively, $\lambda n > n/\lambda + c\sqrt{n}/\lambda$ for all large n, according to whether we are in the case $\lambda < 1$ or in the case $\lambda > 1$. This allows us to make a comparison of integrals, finding limit (superior) = 0 in the former case, respectively, limit (inferior) = 1 in the latter. Does this make sense to you? --pma (talk) 16:19, 22 March 2009 (UTC)
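It's just this: if $\lambda < 1$ then for all $c \in \mathbb{R}$ we have $\lambda n < n/\lambda + c\sqrt{n}/\lambda$ for all large n, so
$\limsup_{n\to\infty} \int_0^{\lambda n} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx \le \lim_{n\to\infty} \int_0^{n/\lambda + c\sqrt{n}/\lambda} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{c} e^{-t^2/2}\,dt$.
Since this holds for all c, and the LHS does not depend on c, we have
$\lim_{n\to\infty} \int_0^{\lambda n} \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}\,dx = 0$;
analogously, if $\lambda > 1$ we find the limit to be 1, and if $\lambda = 1$ we simply choose c=0 and find the limit 1/2. --pma (talk) 08:26, 25 March 2009 (UTC)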
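The three cases can also be checked numerically. The sketch below assumes the integral in question is that of the Gamma density with shape n and rate λ over [0, λn], which after the substitution u = λx equals P(G ≤ λ²n) for G ~ Gamma(n, 1). The function names are hypothetical, and only the standard library is used, via the identity P(G ≤ x) = 1 − Σ_{k<n} e^{−x} x^k / k! for integer n.

```python
from math import exp, fsum, lgamma, log


def gamma_cdf_integer_shape(n, x):
    """P(G <= x) for G ~ Gamma(shape n, rate 1), with n a positive integer.

    Uses P(G <= x) = 1 - sum_{k=0}^{n-1} e^{-x} x^k / k!, with each term
    evaluated in log space so that large n does not overflow.
    """
    if x <= 0:
        return 0.0
    log_x = log(x)
    below = fsum(exp(k * log_x - x - lgamma(k + 1)) for k in range(n))
    return min(1.0, max(0.0, 1.0 - below))


def original_integral(n, lam):
    """Integral over [0, lam*n] of the Gamma(shape n, rate lam) density.

    After u = lam * x this is P(Gamma(n, 1) <= lam**2 * n).
    """
    return gamma_cdf_integer_shape(n, lam * lam * n)


if __name__ == "__main__":
    for lam in (0.9, 1.0, 1.1):
        values = [original_integral(n, lam) for n in (10, 100, 1000)]
        print(f"lambda = {lam}:", ", ".join(f"{v:.6f}" for v in values))
```

As n grows, the printed values should drift towards 0 for λ < 1, towards 1/2 for λ = 1, and towards 1 for λ > 1, in line with the argument above.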