Talk:Allele frequency spectrum
This article is rated Start-class on Wikipedia's content assessment scale.
I think this should discuss the situation of a single locus before introducing "equivalent" loci and copy-number variation. Sminthopsis84 (talk) 15:20, 24 September 2015 (UTC)
- The allele frequency spectrum is a statistic over a whole ensemble of allele frequencies from many loci, so it necessarily requires considering more than a single locus. A lot of the misunderstanding stems from how poorly the allele frequency page is written. Many statements there are confusing, misleading, or possibly flat-out wrong. It definitely needs some editing attention. Aaronragsdale (talk) 17:03, 24 September 2015 (UTC)
- Yes, an AFS can even be constructed for one population with many loci, but it can also be constructed for one locus and many populations. E.g., if there are 2 populations sampled at 1 locus, the sample sizes are 18 and 9 respectively, and the derived allele (assuming only two alleles are being considered) was found 4 and 8 times respectively in population 1 and population 2, then the AFS is a 2-dimensional matrix of order 18x9, and all entries are zero except entry (4,8), which holds the value 1. If there are 3 populations, then the AFS is a 3-dimensional matrix ... My point is that unless readers can figure out how to build an AFS for a simple case, there is little hope that they will be able to grasp what the general case is about. In this field it is extremely common for people to have no idea what the mathematics is, so they are unable to supervise the computer-aided calculations, and the effect is garbage-in => garbage-out. It would be very helpful if Wikipedia could provide an explanation that builds from the simple to the more complex, so a reader could learn from it. (The allele frequency page needs editing to remove material that doesn't belong there; it could be made more readable by using the magic of hyperlinks.) Sminthopsis84 (talk) 19:45, 24 September 2015 (UTC)
- What you've described is the joint allele frequency spectrum for multiple populations. I agree that an explicit example of constructing a 2-population frequency spectrum would be instructive. The text already contains an example for a single-population frequency spectrum that clearly describes constructing the spectrum in that case. And true, technically you could construct an allele frequency spectrum for a single locus, not only a large ensemble of loci. However, in practice that would be rather pointless. Allele frequency spectra are used to describe the observed patterns of allele frequencies, and the shape of the spectrum is what is informative. Along the same lines, you could sequence 10 chromosomes at a locus, not see any variation, and say the allele frequency spectrum is just , which also doesn't really tell you anything interesting. Allele frequency spectra are really only used, and only useful, when looking at a sufficient number of loci to say something about their pattern. This could probably use clarification in the text. Aaronragsdale (talk) 21:43, 24 September 2015 (UTC)
- This page equates the allele frequency spectrum with the site frequency spectrum. However, those are different concepts. The distinction is not subtle and should not be overlooked. Maybe the best approach would be to create a separate page for the site frequency spectrum and distribute the content accordingly. Preceding unsigned comment added by 73.223.61.110 (talk) 08:37, 27 March 2016 (UTC)
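For reference, the two constructions discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not text proposed for the article: the function names `single_population_sfs` and `joint_sfs` are hypothetical, and the single-population counts are made up. The sketch assumes that every derived-allele count from 0 up to the sample size is possible, so each axis has n + 1 entries (19 by 10 for sample sizes 18 and 9); that is a convention chosen here, not something settled above.

```python
def single_population_sfs(derived_counts, n):
    """Tally loci by derived-allele count in one population.

    derived_counts: one count per locus, each between 0 and n.
    Returns a list of length n + 1 where entry k is the number of loci
    at which the derived allele was seen k times among n chromosomes.
    """
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs


def joint_sfs(derived_counts, n1, n2):
    """Tally loci by derived-allele count in two populations jointly.

    derived_counts: one (k1, k2) pair per locus.
    Returns an (n1 + 1) x (n2 + 1) nested list where entry [k1][k2] is
    the number of loci with k1 derived copies in population 1 and
    k2 derived copies in population 2.
    """
    jsfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for k1, k2 in derived_counts:
        if not (0 <= k1 <= n1 and 0 <= k2 <= n2):
            raise ValueError(f"counts ({k1}, {k2}) outside the sample sizes")
        jsfs[k1][k2] += 1
    return jsfs


# The single-locus, two-population case from the thread:
# sample sizes 18 and 9, derived allele seen 4 and 8 times.
jsfs = joint_sfs([(4, 8)], n1=18, n2=9)

# A single population with several loci (hypothetical counts, n = 10).
sfs = single_population_sfs([1, 1, 2, 1, 3, 0, 2], n=10)
```

With one locus, the joint spectrum has a single nonzero entry, `jsfs[4][8] == 1`, which matches the example given above. Adding more loci only adds more `(k1, k2)` pairs to the input, which is where the shape of the spectrum starts to carry information.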
Chimpanzee used for ancestral allele estimation: why may it work?
The Calculation section says the following: "For example in human population genetic studies, the homologous chimpanzee reference sequence is typically used to estimate the ancestral allele." This doesn't sound intuitive to me. Chimpanzees have also diverged since the human-chimp common ancestor, so it seems to me that in many cases the chimpanzee allele might not be the ancestral allele. Why is my intuition incorrect? Besides, there might be polymorphism in chimpanzees too, and if the site is variable in that species, the chimpanzee reference sequence may reflect an arbitrary choice among the chimpanzee alleles. My understanding may be wrong, but wouldn't these points deserve clarification? Bli (talk) 16:25, 1 June 2016 (UTC)