G-value paradox

The G-value paradox arises from the lack of correlation between the number of protein-coding genes in eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only about a thousand cells but has roughly as many genes as a human.^[1]^[2] Researchers suggest that the paradox may be resolved by mechanisms such as alternative splicing and complex gene regulation, which make the genes of humans and other complex eukaryotes relatively more productive.^[3]

DNA and biological complexity

The lack of correlation between the morphological complexity of eukaryotes and the amount of genetic information they carry has long puzzled researchers.^[4] The total amount of DNA in an organism, measured either as the mass of DNA in the nucleus or as the number of nucleotide pairs it contains, varies by several orders of magnitude among eukaryotes and is often unrelated to an organism's size or developmental complexity.^[5] One amoeba has 200 times more DNA per cell than humans,^[6] and even insects and plants within the same genus can vary dramatically in their quantity of DNA.^[7] This discrepancy, known as the C-value paradox (the C-value being the amount of DNA in a haploid nucleus), troubled genome scientists for many years.

Eventually, researchers recognized that not all DNA contributes directly to the production of proteins or to other biological functions.^[8] Susumu Ohno coined the phrase "junk DNA" to describe these apparently nonfunctional stretches of DNA.^[9] They include introns, sequences that are transcribed but then spliced out of the precursor mRNA and thus are not translated into protein;^[4]^[10] transposable elements, mobile fragments of DNA, most of which are nonfunctional in humans;^[8]^[11] and pseudogenes, nonfunctional DNA sequences that originated from functional genes.^[12] The share of the human genome that is functional, and therefore how much of it may be considered "junk", remains controversial. Estimates of the functional fraction range from as low as 8%^[13] to as high as 80%,^[14] with one researcher arguing that the genome's genetic load imposes a fixed ceiling of 15%.^[15] (Prokaryotes, which have little "junk" DNA by comparison, exhibit a fairly close relationship between genome size and biological functionality.)^[16]

In any case, the assumption was that once the C-value paradox was swept away and the focus shifted to the number of protein-coding genes, the anticipated correlation between genetic information and biological complexity in eukaryotes would emerge.^[3] Instead, the G-value paradox simply picked up where the C-value paradox left off, because the discrepancy persisted even when comparisons were narrowed to protein-coding genes alone.^[3]^[17]

G-value paradox

Estimates of the number of protein-coding genes in the human genome reached upwards of 100,000 before the Human Genome Project,^[18] but have since dwindled to as few as 19,000 following the completion of that massive sequencing effort and subsequent refinements.^[1] By comparison, the tiny water flea Daphnia pulex has about 31,000 genes;^[19] the nematode C. elegans, about 19,700;^[2] the fruit fly (Drosophila melanogaster), about 14,000;^[20] the zebrafish (Danio rerio), about 26,000;^[21] and the small flowering plant Arabidopsis thaliana, about 27,000.^[22] Plants in general tend to have more genes than other eukaryotes.^[23] One explanation is their higher incidence of gene and whole-genome duplication and their retention of the resulting extra genes, due in part to their evolution of a large collection of defensive secondary metabolites.^[23]
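To make the comparison concrete, the following Python sketch simply sorts the approximate protein-coding gene counts quoted above, using the lower human estimate of 19,000. The figures are the rounded values given in this section rather than authoritative annotation counts, and the names GENE_COUNTS and ranked_by_gene_count are illustrative only.

```python
# Approximate protein-coding gene counts as quoted in this article.
# Rounded values; annotation releases differ, so treat them as illustrative.
GENE_COUNTS = {
    "Daphnia pulex (water flea)": 31_000,
    "Arabidopsis thaliana": 27_000,
    "Danio rerio (zebrafish)": 26_000,
    "Caenorhabditis elegans": 19_700,
    "Homo sapiens (lower estimate)": 19_000,
    "Drosophila melanogaster": 14_000,
}


def ranked_by_gene_count(counts):
    """Return (species, count) pairs sorted from most to fewest genes."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rank, (species, n) in enumerate(ranked_by_gene_count(GENE_COUNTS), start=1):
        print(f"{rank}. {species}: ~{n:,} genes")
```

With these rounded figures, the human entry ranks fifth of six, ahead only of the fruit fly.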

The apparent disconnect between the number of genes in a species and its biological complexity was dubbed the G-value paradox.^[3] Whereas the C-value paradox unraveled with the discovery of vast stretches of noncoding DNA, the resolution of the G-value paradox appears to rest on differences in genome productivity. Humans and other complex eukaryotes may simply be able to do more with what they have, genetically speaking.

Among the mechanisms cited for this greater productivity are more sophisticated transcriptional controls,^[24] multifunctional proteins, more interaction between protein products, and alternative splicing^[25] and post-translational modifications that may produce several protein products from the same genetic raw material.^[3]^[24] In addition, thousands of non-coding RNAs, which are transcribed from DNA but not translated into protein, have emerged as important regulators of gene expression and development in humans and other eukaryotes.^[26] They include short RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs),^[26] and long non-coding RNAs (lncRNAs) that may regulate gene expression at different stages of development.^[27] Some researchers suggest that the focus should now shift from the number of genes to gene interactions and the network of regulatory mechanisms that allow genes to support a variety of biological activities.^[28]^[24] These shifts have taken the analysis of genetic complexity from the C-value to the G-value and on to what some call the I-value, a measure of the total information contained in a genome.^[3]
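As a rough illustration of how alternative splicing can multiply the output of a single gene, the Python sketch below enumerates the transcripts a hypothetical gene could produce if each of its "cassette" exons can be independently included or skipped, which gives 2^k combinations for k such exons. The gene model, the exon names and the independence assumption are all illustrative simplifications: real splicing is regulated, and many combinations may never be observed.

```python
from itertools import product


def splice_isoforms(constitutive_5p, cassette_exons, constitutive_3p):
    """Enumerate exon orders for a hypothetical gene in which every cassette
    exon may be independently included or skipped (an idealization).

    The first and last exons are always kept; each cassette exon is either
    present or absent, so k cassette exons give 2**k combinations.
    """
    isoforms = []
    for choices in product((True, False), repeat=len(cassette_exons)):
        middle = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([constitutive_5p, *middle, constitutive_3p])
    return isoforms


if __name__ == "__main__":
    # Hypothetical gene: E1 and E5 are always kept; E2, E3 and E4 are optional.
    forms = splice_isoforms("E1", ["E2", "E3", "E4"], "E5")
    print(len(forms))  # 8, i.e. 2**3 possible transcripts from one gene
    for form in forms:
        print("-".join(form))
```

In this idealized model, three optional exons already yield eight distinct transcripts, and each additional independently skippable exon doubles the count, which is one way a modest gene number can understate potential protein diversity.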

Defining complexity

One of the challenges in the long debate over the mismatch between genome size and biological complexity has been ambiguity in defining complexity. Is it the number of cell types in an organism, the sophistication of its nervous system, or the number of different proteins it produces?^[17] By some definitions, the greater complexity of humans compared to other organisms may be illusory.^[29] Even once complexity is defined, some researchers argue that complexity of function does not necessarily require equal complexity of process. Evolution is not a paragon of efficiency; it travels a crooked path that can leave some species with a more cumbersome genome than is necessary.^[30]

References

[1] Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, et al. (November 2014). "Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes". Human Molecular Genetics. 23 (22): 5866–78. doi:10.1093/hmg/ddu309. PMC 4204768. PMID 24939910.
[2] Hillier LW, Coulson A, Murray JI, Bao Z, Sulston JE, Waterston RH (December 2005). "Genomics in C. elegans: so many genes, such a little worm". Genome Research. 15 (12): 1651–60. doi:10.1101/gr.3729105. PMID 16339362.
[3] Hahn MW, Wray GA (2002). "The g-value paradox". Evolution & Development. 4 (2): 73–5. doi:10.1046/j.1525-142X.2002.01069.x. PMID 12004964. S2CID 2810069.
[4] Gall JG (December 1981). "Chromosome structure and the C-value paradox". The Journal of Cell Biology. 91 (3 Pt 2): 3s–14s. doi:10.1083/jcb.91.3.3s. PMC 2112778. PMID 7033242.
[5] Cavalier-Smith T (December 1978). "Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox". Journal of Cell Science. 34: 247–78. doi:10.1242/jcs.34.1.247. PMID 372199.
[6] Holm-Hansen O (January 1969). "Algae: amounts of DNA and organic carbon in single cells". Science. 163 (3862): 87–8. Bibcode:1969Sci...163...87H. doi:10.1126/science.163.3862.87. PMID 5812598. S2CID 44975843.
[7] Thomas CA (1971). "The genetic organization of chromosomes". Annual Review of Genetics. 5 (1): 237–56. doi:10.1146/annurev.ge.05.120171.001321. PMID 16097657.
[8] Gregory TR (September 2005). "Synergy between sequence and size in large-scale genomics". Nature Reviews. Genetics. 6 (9): 699–708. doi:10.1038/nrg1674. PMID 16151375. S2CID 24237594.
[9] Ohno S (1972). "So much 'junk' DNA in our genome". Brookhaven Symp. Biol. 23: 366–370. PMID 5065367.
[10] Gilbert W (May 1985). "Genes-in-pieces revisited". Science. 228 (4701): 823–4. Bibcode:1985Sci...228..823G. doi:10.1126/science.4001923. PMID 4001923.
[11] Orgel LE, Crick FH (April 1980). "Selfish DNA: the ultimate parasite". Nature. 284 (5757): 604–7. Bibcode:1980Natur.284..604O. doi:10.1038/284604a0. PMID 7366731. S2CID 4233826.
[12] Balakirev ES, Ayala FJ (2003). "Pseudogenes: are they 'junk' or functional DNA?". Annual Review of Genetics. 37 (1): 123–51. doi:10.1146/annurev.genet.37.040103.103949. PMID 14616058. S2CID 24683075.
[13] Rands CM, Meader S, Ponting CP, Lunter G (July 2014). Schierup MH (ed.). "8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage". PLOS Genetics. 10 (7): e1004525. doi:10.1371/journal.pgen.1004525. PMC 4109858. PMID 25057982.
[14] Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. (ENCODE Project Consortium) (September 2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. Bibcode:2012Natur.489...57T. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
[15] Graur D (July 2017). Martin B (ed.). "An Upper Limit on the Functional Fraction of the Human Genome". Genome Biology and Evolution. 9 (7): 1880–1885. doi:10.1093/gbe/evx121. PMC 5570035. PMID 28854598.
[16] Taft RJ, Pheasant M, Mattick JS (March 2007). "The relationship between non-protein-coding DNA and eukaryotic complexity". BioEssays. 29 (3): 288–99. doi:10.1002/bies.20544. PMID 17295292. S2CID 16226307.
[17] Claverie JM (February 2001). "Gene number. What if there are only 30,000 human genes?". Science. 291 (5507): 1255–7. doi:10.1126/science.1058969. PMID 11233450. S2CID 11444318.
[18] Fields C, Adams MD, White O, Venter JC (July 1994). "How many genes in the human genome?". Nature Genetics. 7 (3): 345–6. doi:10.1038/ng0794-345. PMID 7920649. S2CID 26164550.
[19] Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, et al. (February 2011). "The ecoresponsive genome of Daphnia pulex". Science. 331 (6017): 555–61. Bibcode:2011Sci...331..555C. doi:10.1126/science.1197761. PMC 3529199. PMID 21292972.
[20] Hales KG, Korey CA, Larracuente AM, Roberts DM (November 2015). "Genetics on the Fly: A Primer on the Drosophila Model System". Genetics. 201 (3): 815–42. doi:10.1534/genetics.115.183392. PMC 4649653. PMID 26564900.
[21] Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, et al. (April 2013). "The zebrafish reference genome sequence and its relationship to the human genome". Nature. 496 (7446): 498–503. Bibcode:2013Natur.496..498H. doi:10.1038/nature12111. PMC 3703927. PMID 23594743.
[22] Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, et al. (January 2008). "The Arabidopsis Information Resource (TAIR): gene structure and function annotation". Nucleic Acids Research. 36 (Database issue): D1009–14. doi:10.1093/nar/gkm965. PMC 2238962. PMID 17986450.
[23] Sterck L, Rombauts S, Vandepoele K, Rouzé P, Van de Peer Y (April 2007). "How many genes are there in plants (... and why are they there)?". Current Opinion in Plant Biology. 10 (2): 199–203. doi:10.1016/j.pbi.2007.01.004. PMID 17289424.
[24] Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA (September 2003). "The evolution of transcriptional regulation in eukaryotes". Molecular Biology and Evolution. 20 (9): 1377–419. doi:10.1093/molbev/msg140. PMID 12777501.
[25] Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O (December 2005). "Species-specific variation of alternative splicing and transcriptional initiation in six eukaryotes". Gene. 364: 53–62. doi:10.1016/j.gene.2005.07.027. PMID 16219431.
[26] Gaiti F, Calcino AD, Tanurdžić M, Degnan BM (July 2017). "Origin and evolution of the metazoan non-coding regulatory genome". Developmental Biology. 427 (2): 193–202. doi:10.1016/j.ydbio.2016.11.013. PMID 27880868.
[27] Leone S, Santoro R (August 2016). "Challenges in the analysis of long noncoding RNA functionality". FEBS Letters. 590 (15): 2342–53. doi:10.1002/1873-3468.12308. PMID 27417130. S2CID 19766152.
[28] Szathmáry E, Jordán F, Pál C (May 2001). "Molecular biology and evolution. Can genes explain biological complexity?". Science. 292 (5520): 1315–6. doi:10.1126/science.1060852. PMID 11360989. S2CID 86104866.
[29] McShea DW (April 1996). "Perspective: Metazoan Complexity and Evolution: Is There a Trend?". Evolution; International Journal of Organic Evolution. 50 (2): 477–492. doi:10.1111/j.1558-5646.1996.tb03861.x. PMID 28568940. S2CID 29590466.
[30] Jacob F (June 1977). "Evolution and tinkering". Science. 196 (4295): 1161–6. Bibcode:1977Sci...196.1161J. doi:10.1126/science.860134. PMID 860134. S2CID 29756896.