Talk:Fisher's exact test
This article is rated Start-class on Wikipedia's content assessment scale.
Please note that the minimum expected value for a chi-squared test to be appropriate is 10 not 5, in the particular case where there is only one degree of freedom (see any responsible stats cookbook). This was correct in earlier versions of the page and I have put it back now. seglea 21:30, 11 May 2006 (UTC)
In Bob Moore (2004), "On Log-Likelihood-Ratios and the Significance of Rare Events", in Proc. of the ACL 2004, Moore shows that Fisher's exact test is not really prohibitively more expensive to compute than chi-square. In light of this, the introductory paragraph suggesting that its computational complexity is a major consideration may deserve some qualification. —Preceding unsigned comment added by 70.108.245.148 (talk) 21:16, 19 February 2008 (UTC)
I checked the link to http://mathworld.wolfram.com/FishersExactTest.html and found there was no content for the general case, so I deleted it. Lixiaoxu (talk) 13:33, 9 November 2008 (UTC)
- Yes you're quite right. I've also removed the link at the bottom. RupertMillard (Talk) 10:53, 24 March 2009 (UTC)
- I don't understand this. I quote from the beginning of the MathWorld article, "Let there exist two such variables X and Y, with m and n observed states, respectively...."; and there then follows a paragraph giving the formula and some description of procedures for the m x n case. In what sense can this be described as having no content for the m x n case? I have therefore restored the link and reference. seglea (talk) 23:34, 24 March 2009 (UTC)
- Oh yes - you're right. I think I'm going mad! Thank you for putting the link back in. I don't think the article's brilliantly clear, but it's a start - very vague about the other measures of association that are required for the m x n case. RupertMillard (Talk) 07:10, 25 March 2009 (UTC)
- Agreed. I only put it in because at least it states unambiguously that the m x n case is possible, and so many students (and not a few lecturers) believe that only 2 x 2 can be done. There might be a reference to a better source in some SPSS manual, since SPSS will calculate the m x n case, but I don't have one to hand. seglea (talk) 21:43, 25 March 2009 (UTC)
In the example the notation switches from girls and boys to men and women. Perhaps it would be less confusing to maintain one label. Australisergosum (talk) 01:41, 16 December 2008 (UTC)
I wonder if the example gender x dieting is well-chosen... It is a requirement of the standard Fisher exact test that both marginals are fixed; it can easily be assumed that a researcher could choose to include an equal number of men and women in his/her sample, but how about dieters versus non-dieters? These particular marginal counts seem random to me. —Preceding unsigned comment added by 201.52.149.7 (talk) 23:09, 31 March 2010 (UTC)
The link to http://www.socr.ucla.edu/htmls/ana/FishersExactTest_Analysis.html points to an applet that only calculates P(Cutoff), and not the actual probability of the null hypothesis. http://www.physics.csbsju.edu/stats/exact2.html calculates the interesting probability correctly, and works for NxN matrices. 128.243.21.225 (talk) 21:12, 22 January 2009 (UTC)
- I just looked at the link for the Fisher exact test calculator that you gave: Fisher Exact Test Calculators: 2-by-2 and N-by_N, but the HTML was rather mangled, so it is not rendered in Firefox 12 or IE9. Looking at the source, I see that the page has good information. Here are the direct (working) links to the calculators:
Typo in formula explanation
"If the marginal totals (i.e. a+b, a+b, a+c, and b+d) are known"
I believe the second a+b should be c+d. 109.65.36.159 (talk) 22:01, 13 January 2019 (UTC)
Reference does not exist - Exact inference in categorical data. Biometrics, 53(1), 112-117.
Mehta, C. R. & Patel, N. R. 1997. Exact inference in categorical data. Biometrics, 53(1), 112-117. definitely does not exist.[1] Is the intention to reference Mehta CR. Exact inference for categorical data. Encyclopedia of Biostatistics 1998; 2:1411–1422 as per [2]? I would probably cite this as Corcoran, Christopher D; Senchaudhuri, Pralay; Mehta, Cyrus R; Patel, Nitin R, Exact Inference for Categorical Data, doi:10.1002/0470011815.b2a10019. Anyway, I have removed the reference for now, as it was superfluous to the 1984 reference. RupertMillard (Talk) 10:47, 24 March 2009 (UTC)
- Very odd. That reference was added by an anon in April 2008, presumably relying on a secondary source. seglea (talk) 23:43, 24 March 2009 (UTC)
Question
Can someone please spell out how the value from the Fisher exact test is used? Is the Fisher exact value the same as the p-value? What is considered to be statistically significant? —Preceding unsigned comment added by Sedoc (talk • contribs) 16:09, 5 June 2009 (UTC)
- You should try the mathematics reference desk for a question like that. Baccyak4H (Yak!) 17:41, 5 June 2009 (UTC)
Is there any confirmation on the minimum value of n=5 or 10, or is it still a debated topic? I have seen textbooks (Biostatistics: The Bare Essentials, 2nd edition, Geoffrey R. Norman / David L. Streiner) and statistics professors in the flesh that say otherwise. Any paper or summary would help the layman understand the debate, if any. Thanks a million. —Preceding unsigned comment added by 155.69.163.224 (talk) 04:51, 30 October 2009 (UTC)
Fisher-Irwin Test
This is the same as the Fisher-Irwin test, correct? If so there should at least be a redirect, and a mention in the article. Esoxidt•contribs 18:13, 12 January 2013 (UTC)
- According to Campbell (https://doi.org/10.1002/sim.2832), the answer is yes, they are the same.
- "Versions of the Fisher–Irwin test — This test appears in the literature under various names including ‘Fisher’s exact test’. Because the test was developed independently by Fisher [1, 17] and Irwin [18], and because it is controversial whether the P values obtained are ‘exact’ in all 2 x 2 tables, the test will be referred to here as the ‘Fisher–Irwin test’." Cajawe (talk) 11:57, 29 March 2024 (UTC)
Dieting
I changed the example from dieting to studying. Female teenagers are particularly likely to develop eating disorders, and dieting seems to be influenced by societal expectations that it's normal to diet (see eating disorder). Since there's no reason whatsoever that this example must be about dieting, I changed it. "Studiers" is an awkward word, so feel free to change it to "slackers" and "keeners" or whatever you can think of that fits better. But really, it would make most sense to find an example that doesn't involve made up statistics about the habits of people who happen to have penises vs people without penises. For example, it could be two sets of patients taking a new medication.
Example is wrong?
According to online calculators and R, fisher.test(matrix(c(1, 9, 11, 3), 2,2)) results in a p-value of 0.002759 and not 0.0013. — Preceding unsigned comment added by 83.153.126.238 (talk) 19:23, 21 September 2017 (UTC)
- Today (2018-03-30) I ran this:
- fisher.test(rbind(c(1,9),c(11,3)), alternative="less")$p.value
- [1] 0.001379728
- and this:
- fisher.test(rbind(c(1,9),c(11,3)), alternative="two.sided")$p.value
- [1] 0.002759456
- with R version 3.2.2 (2015-08-14).
- So the example is right, and you have applied the two-sided test where the example was using the "less" one-sided test. You should read the article again, more carefully: after the example, the two-sided case is discussed, and it is stated that in the framework of the example the two-sided p-value is twice the one-sided one. GizTwelve (talk) 15:19, 20 March 2018 (UTC)
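For anyone who wants to check the two R results above without R, here is a minimal sketch in Python (standard library only). It assumes the usual hypergeometric formulation with both margins fixed, and it assumes the two-sided p-value is the sum over all tables that are no more probable than the observed one. The function names `hypergeom_pmf` and `fisher_p_values` are made up for this illustration and are not from any library.

```python
from math import comb


def hypergeom_pmf(k, row1, col1, n):
    """Probability that the top-left cell equals k, given fixed margins."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def fisher_p_values(a, b, c, d):
    """Return (one-sided 'less', two-sided) p-values for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - n)  # smallest feasible top-left count
    hi = min(row1, col1)          # largest feasible top-left count
    probs = {k: hypergeom_pmf(k, row1, col1, n) for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # One-sided ("less"): tables whose top-left count is at most the observed one.
    p_less = sum(p for k, p in probs.items() if k <= a)
    # Two-sided (assumed definition): tables no more probable than the observed
    # one, with a small relative tolerance to absorb floating-point ties.
    p_two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return p_less, p_two


p_less, p_two = fisher_p_values(1, 9, 11, 3)
print(f"one-sided (less): {p_less:.9f}")
print(f"two-sided:        {p_two:.9f}")
```

Because the two column totals are equal in this table, the distribution is symmetric, which is why the two-sided value comes out as exactly twice the one-sided one here; that doubling should not be expected for tables with unequal margins.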
Simplification
The example
| | Men | Women | Row total |
|---|---|---|---|
| Studying | 1 | 9 | 10 |
| Not-studying | 11 | 3 | 14 |
| Column total | 12 | 12 | 24 |
is conveniently analyzed by computing the mean values and variances of the hypergeometric distributions, rather than computing the hypergeometric probabilities themselves.
Based on the sums
| | Men | Women | Row total |
|---|---|---|---|
| Studying | | | 10 |
| Not-studying | | | 14 |
| Column total | 12 | 12 | 24 |
The mean values are (10)(12)/(24) = 5, (14)(12)/(24) = 7, etc.:
| | Men | Women | Row total |
|---|---|---|---|
| Studying | 5 | 5 | 10 |
| Not-studying | 7 | 7 | 14 |
| Column total | 12 | 12 | 24 |
All the variances are equal to (10)(14)(12)(12)/[(24)(24)(24-1)] = 35/23.
The squares of the deviations from the mean values are all equal to 16 (for example, (1-5)^2 = 16).
As 16/(35/23) ≈ 10.5, the difference of proportion is indeed significant.
This calculation is simpler than that in the text.
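The arithmetic above can be checked with a short Python sketch using exact fractions. The variable names (`row`, `col`, `observed`, `statistic`) are made up for this illustration; the formulas are just the hypergeometric mean and variance as used in the steps above.

```python
from fractions import Fraction

# Margins and observed counts from the example table.
row = (10, 14)                 # Studying, Not-studying
col = (12, 12)                 # Men, Women
n = 24
observed = ((1, 9), (11, 3))

# Hypergeometric mean of each cell: (row total)(column total)/n.
expected = [[Fraction(r * c, n) for c in col] for r in row]

# Hypergeometric variance, identical for all four cells of a 2 x 2 table.
variance = Fraction(row[0] * row[1] * col[0] * col[1], n * n * (n - 1))

# Squared deviation of each observed count from its mean.
sq_dev = [[(observed[i][j] - expected[i][j]) ** 2 for j in range(2)]
          for i in range(2)]

# Squared deviation in units of the variance.
statistic = sq_dev[0][0] / variance

print("expected:", [[int(x) for x in r] for r in expected])
print("variance:", variance)
print("statistic:", statistic, "≈", round(float(statistic), 2))
```

Working in `Fraction` keeps 35/23 exact, so the ratio 16/(35/23) = 368/35 is not affected by rounding.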
On the requirement that the margin of the tables be fixed
In Fisher's original example quoted in the article, the number of cups of tea Bristol judged to have had milk put in first is not fixed a priori. It is of course fixed once the experiment has been carried out, but if we take this as the meaning of 'fixed' then every experiment seems to me to have fixed margins, rendering such a stipulation redundant. Therefore, I am assuming the article currently means to say that both margins must be fixed a priori in order for the test to apply. It appears to me from Fisher's motivating example that this is in fact untrue. I was wondering if there was disagreement on this point. If not, I would propose removing the stipulation that marginal totals be fixed. Marko1973 (talk) 16:24, 5 March 2020 (UTC)
- Marko1973 makes a perceptive observation. I would also add that it is not clear whether Bristol was constrained to select exactly 4 and 4. Presumably she was offered the cups one by one and had to make an irrevocable judgement. What if she was up to 4-3 (that is, she has said 4 with milk first and 3 with milk second) and, tasting cup number 8 was sure that this had milk first? Could she say that the final cup was milk first, giving a 5-3 split of her guesses? I've never been sure about this. Best wishes, Robinh (talk) 20:58, 7 March 2020 (UTC)
Politically incorrect example
I am unhappy with the example given to illustrate Fisher's statistical significance idea. Please change it back to what Fisher originally applied the idea to, i.e., tea with milk, or any other example from biology, of which there are plenty from Fisher's own papers.