Ridges (regions of increased gene expression) are domains of the genome with high gene expression; their opposites are called antiridges. The term was first used by Caron et al. in 2001.[1] Characteristics of ridges are:[1]
- Gene dense
- Contain many C and G nucleobases
- Genes have short introns
- High SINE repeat density
- Low LINE repeat density
Discovery
Clustering of genes in prokaryotes has been known for a long time. Their genes are grouped in operons, and the genes within an operon share a common promoter unit. These genes are mostly functionally related. The genome of prokaryotes is relatively simple and compact. In eukaryotes the genome is huge and only a small fraction of it consists of functional genes; furthermore, the genes are not arranged in operons, except in nematodes and trypanosomes, whose operons differ from prokaryotic operons. In eukaryotes each gene has a transcription regulation site of its own, so genes do not have to be in close proximity to be co-expressed. It was therefore long assumed that eukaryotic genes were randomly distributed across the genome due to the high rate of chromosome rearrangements. Once complete genome sequences became available, however, it became possible to determine the absolute location of a gene and measure its distance to other genes.
The first eukaryotic genome ever sequenced was that of Saccharomyces cerevisiae, or budding yeast, in 1996. Half a year later, Velculescu et al. (1997) published a study in which they integrated SAGE data with the newly available genome map. Different genes are active at different points of the cell cycle, so they used SAGE data from three moments of the cell cycle (log phase, S-phase-arrested and G2/M-phase-arrested cells). Because all yeast genes have a promoter unit of their own, genes were not expected to cluster near each other, but they did. Clusters were present on all 16 yeast chromosomes.[2] A year later Cho et al. also reported (in more detail) that certain genes are located near each other in yeast.[3]
Characteristics and function
Co-expression
Cho et al. were the first to determine that clustered genes have the same expression levels. They identified transcripts that show cell-cycle-dependent periodicity. Of those genes, 25% were located in close proximity to other genes that were transcribed in the same cell cycle. Cohen et al. (2000) also identified clusters of co-expressed genes.
Caron et al. (2001) made a human transcriptome map of 12 different tissues (cancer cells) and concluded that genes are not randomly distributed across the chromosomes. Instead, genes tend to cluster in groups of sometimes 39 genes in close proximity. The clusters were not only gene dense: the authors identified 27 clusters of genes with very high expression levels and called them RIDGEs. A common RIDGE counts 6 to 30 genes per centiray. However, there were notable exceptions: 40 to 50% of the RIDGEs were not that gene dense and, just like in yeast, these RIDGEs were located in the telomere regions.[1]
Lercher et al. (2002) pointed to some weaknesses in Caron's approach. Clusters of genes in close proximity with high transcription levels can easily be generated by tandem duplicates. Genes can generate duplicates of themselves, which are incorporated in their neighborhood. These duplicates can either become a functional part of the pathway of their parent gene or (because they are no longer favored by natural selection) gain deleterious mutations and turn into pseudogenes. Because these duplicates are false positives in the search for gene clusters, they have to be excluded. Lercher et al. excluded neighboring genes with high resemblance to each other, and then searched with a sliding window for regions with 15 neighboring genes.[4]
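The two steps described above (discarding likely tandem duplicates, then scanning neighboring genes with a sliding window) can be illustrated with a minimal sketch. It is not the published procedure: the record layout, the `same_family` similarity test (a stand-in for a real sequence comparison) and the use of the window mean as the summary statistic are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical record layout: (gene_name, family_label, expression_level),
# listed in chromosomal order. The family label is an assumed stand-in for
# a real sequence-similarity comparison between neighboring genes.

def same_family(gene_a, gene_b):
    """Treat two genes as tandem duplicates if they share a family label."""
    return gene_a[1] == gene_b[1]

def drop_tandem_duplicates(genes, is_similar=same_family):
    """Keep a gene only if it does not resemble the previously kept gene."""
    kept = []
    for gene in genes:
        if kept and is_similar(kept[-1], gene):
            continue  # likely tandem duplicate of its neighbor: skip it
        kept.append(gene)
    return kept

def window_means(values, size=15):
    """Mean expression of every run of `size` neighboring genes."""
    if len(values) < size:
        return []
    return [mean(values[i:i + size]) for i in range(len(values) - size + 1)]

# Toy data; a window of 2 is used only because the example is tiny.
genes = [("a", "f1", 10.0), ("b", "f1", 12.0), ("c", "f2", 4.0), ("d", "f3", 6.0)]
kept = drop_tandem_duplicates(genes)
means = window_means([g[2] for g in kept], size=2)
```

With real data the window size would be set to 15, and windows whose summary value exceeds a chosen threshold would be candidate clusters.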
It was clear that gene-dense regions existed. There was a striking correlation between gene density and a high GC content. Some clusters indeed had high expression levels. But most of the highly expressed regions consisted of housekeeping genes: genes that are highly expressed in all tissues because they code for basal mechanisms. Only a minority of the clusters contained genes that were restricted to specific tissues.
Versteeg et al. (2003) tried, with a better human genome map and better SAGE tags, to determine the characteristics of RIDGEs more specifically. Overlapping genes were treated as one gene, and genes without introns were rejected as pseudogenes. They determined that RIDGEs are very gene dense and have high gene expression, short introns, high SINE repeat density and low LINE repeat density. Clusters containing genes with very low transcription levels had characteristics that were the opposite of RIDGEs; those clusters were therefore called antiridges.[5] LINE repeats are junk DNA that contains an endonuclease cleavage site (TTTTA). Their scarcity in RIDGEs can be explained by natural selection against LINE repeats in ORFs, because their endonuclease sites can cause deleterious mutations in the genes. Why SINE repeats are abundant is not yet understood.
Versteeg et al. also concluded that, contrary to Lercher's analysis, the transcription levels of many genes in RIDGEs (for example a cluster on chromosome 9) can vary strongly between different tissues. Lee et al. (2003) analyzed the trend of gene clustering between different species. They compared Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans, Arabidopsis thaliana and Drosophila melanogaster, and found a degree of clustering, as the fraction of genes in loose clusters, of 37%, 50%, 74%, 52% and 68% respectively. They concluded that pathways whose genes are clustered across many species are rare. They found seven universally clustered pathways: glycolysis, aminoacyl-tRNA biosynthesis, ATP synthase, DNA polymerase, hexachlorocyclohexane degradation, cyanoamino acid metabolism, and photosynthesis (ATP synthesis in non-plant species). Not surprisingly, these are basic cellular pathways.[6]
Lee et al. used very diverse groups of organisms. Within these groups clustering is conserved; for example, the clustering motifs of Homo sapiens and Mus musculus are more or less the same.[7]
Spellman and Rubin (2002) made a transcriptome map of Drosophila. Of all assayed genes, 20% were clustered. Clusters consisted of 10 to 30 genes spanning about 100 kilobases. The members of the clusters were not functionally related, and the location of the clusters did not correlate with known chromatin structures.[8]
This study also showed that within clusters the expression levels of, on average, 15 genes were much the same across the many experimental conditions that were used. These similarities were so striking that the authors reasoned that the genes in the clusters are not individually regulated by their own promoters but that changes in chromatin structure were involved. A similar co-regulation pattern was published in the same year by Roy et al. (2002) in C. elegans.[9]
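One common way to quantify how similar neighboring genes behave across experimental conditions is the Pearson correlation between their expression profiles. The sketch below is only an illustration of that idea, not the statistic used in the studies cited here; the function names and the toy profiles are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / sqrt(var_x * var_y)

def neighbor_correlations(profiles):
    """Correlation of each gene with the next gene in chromosomal order.

    `profiles` is a list of per-gene expression profiles (one value per
    experimental condition), ordered by position along the chromosome.
    """
    return [pearson(profiles[i], profiles[i + 1])
            for i in range(len(profiles) - 1)]

# Toy profiles for three adjacent genes measured under three conditions.
profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
correlations = neighbor_correlations(profiles)
```

Runs of consecutive high values in `correlations` would then mark candidate domains of similarly expressed genes.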
Many genes that are grouped into clusters show the same expression profiles in human invasive ductal breast carcinomas. Roughly 20% of the genes show a correlation with their neighbors. Clusters of co-expressed genes were separated by regions with less correlation between genes. These clusters could cover an entire chromosome arm.
Contrary to the previously discussed reports, Johnnidis et al. (2005) discovered that (at least some) genes within clusters are not co-regulated. Aire is a transcription factor that has an up- and down-regulating effect on various genes. It functions in the negative selection, by medullary cells, of thymocytes that respond to the organism's own epitopes.[10]
The genes that were controlled by Aire clustered. Of the genes most activated by Aire, 53 had an Aire-activated neighbor within 200 kb, and of the genes most repressed by Aire, 32 had an Aire-repressed neighbor within 200 kb; these distances are smaller than expected by chance. They did the same screening for the transcriptional regulator CIITA.
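A neighbor test of this kind can be sketched as follows: count the selected genes that have another selected gene within 200 kb on the same chromosome, then compare that count with counts obtained for random gene sets of the same size. This is a generic permutation sketch under assumed data structures, not the statistical procedure of the cited study; `count_with_neighbor`, `permutation_p_value` and the coordinates are hypothetical.

```python
import random

def count_with_neighbor(genes, max_dist=200_000):
    """Count genes that have another listed gene within `max_dist` bp.

    `genes` is a list of (chromosome, coordinate) pairs.
    """
    by_chrom = {}
    for chrom, pos in genes:
        by_chrom.setdefault(chrom, []).append(pos)
    count = 0
    for coords in by_chrom.values():
        coords.sort()
        for i, pos in enumerate(coords):
            near_left = i > 0 and pos - coords[i - 1] <= max_dist
            near_right = i + 1 < len(coords) and coords[i + 1] - pos <= max_dist
            if near_left or near_right:
                count += 1
    return count

def permutation_p_value(all_genes, selected, max_dist=200_000,
                        n_perm=1000, seed=0):
    """Estimate how often random gene sets cluster at least as much."""
    rng = random.Random(seed)
    observed = count_with_neighbor(selected, max_dist)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(selected))
        if count_with_neighbor(sample, max_dist) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy genome: 50 genes spaced 1 Mb apart on each of two chromosomes,
# plus a tight group of three "regulated" genes on chr1.
background = [(c, i * 1_000_000) for c in ("chr1", "chr2") for i in range(50)]
regulated = [("chr1", 5_100_000), ("chr1", 5_250_000), ("chr1", 5_400_000)]
observed, p_value = permutation_p_value(background + regulated, regulated)
```

A small `p_value` indicates that the selected genes lie closer together than random gene sets of the same size typically do.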
These transcription regulators did not have the same effect on all genes in the same cluster. Genes that were activated, repressed or unaffected were sometimes present in the same cluster. In this case, co-regulation cannot be the reason that the Aire-regulated genes were clustered.
So it is not very clear whether domains are co-regulated or not. A very effective way to test this would be to insert synthetic genes into RIDGEs, antiridges and/or random places in the genome, determine their expression, and compare those expression levels with each other. Gierman et al. (2007) were the first to demonstrate co-regulation using this approach. As an insertion construct they used a fluorescent GFP gene driven by the ubiquitously expressed human phosphoglycerate kinase (PGK) promoter. They integrated this construct at 90 different positions in the genome of human HEK293 cells. They found that the expression of constructs inserted in RIDGEs was indeed higher than that of constructs inserted in antiridges (although all constructs had the same promoter).[11]
They investigated whether these differences in expression were due to genes in the direct neighborhood of the constructs or to the domain as a whole. They found that constructs next to highly expressed genes were slightly more expressed than others. But when they enlarged the window size to the surrounding 49 genes (domain level), they saw that constructs located in domains with an overall high expression had a more than 2-fold higher expression than those located in domains with a low expression level.
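The domain-level comparison can be illustrated by summarising the expression of the genes surrounding an insertion site. The sketch below takes the median expression of the 49 genes nearest in gene order to the insertion coordinate; the choice of the median, the handling of chromosome ends and the function name `domain_expression` are assumptions, not details taken from the study.

```python
from bisect import bisect_left
from statistics import median

def domain_expression(gene_positions, gene_expression, insert_pos, window=49):
    """Median expression of the `window` genes around an insertion site.

    `gene_positions` must be sorted coordinates of the genes on one
    chromosome and `gene_expression` the matching expression values.
    Near a chromosome end the window is shifted inward so that it still
    contains `window` genes whenever enough genes are available.
    """
    idx = bisect_left(gene_positions, insert_pos)  # first gene at/after the site
    start = max(0, idx - window // 2)
    end = min(len(gene_positions), start + window)
    start = max(0, end - window)
    return median(gene_expression[start:end])

# Toy chromosome with six genes; a window of 3 keeps the example readable.
positions = [100, 200, 300, 400, 500, 600]
expression = [1, 2, 3, 10, 20, 30]
high_domain = domain_expression(positions, expression, insert_pos=450, window=3)
low_domain = domain_expression(positions, expression, insert_pos=50, window=3)
```

Comparing construct expression against such a domain value, rather than against the single nearest gene, separates domain-wide effects from the influence of the immediate neighbors.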
They also checked whether the construct was expressed at similar levels as neighboring genes, and whether that tight co-expression was present solely within RIDGEs. They found that expression levels were highly correlated within RIDGEs, and that this correlation was almost absent near the ends of and outside the RIDGEs.
Previous observations and the research of Gierman et al. showed that the activity of a domain has a great impact on the expression of the genes located in it, and that the genes within a RIDGE are co-expressed. However, the constructs used by Gierman et al. were regulated by a constitutively active promoter, whereas the genes studied by Johnnidis et al. depended on the presence of the Aire transcription factor. The unusual expression of the Aire-regulated genes could partly have been caused by differences in the expression and conformation of the Aire transcription factor itself.
Functional relation
It was known before the genomic era that clustered genes tend to be functionally related. Abderrahim et al. (1994) had shown that all the genes of the major histocompatibility complex were clustered on chromosome 6p21. Roy et al. (2002) showed that in the nematode C. elegans, genes that are solely expressed in muscle tissue during the larval stage tend to cluster in small groups of 2–5 genes. They identified 13 clusters.
Yamashita et al. (2004) showed that genes related to specific functions in organs tend to cluster. Six liver-related domains contained genes for xenobiotic, lipid and alcohol metabolism. Five colon-related domains had genes for apoptosis, cell proliferation, ion transport and mucin production. These clusters were very small and their expression levels were low. Brain- and breast-related genes did not cluster.[12]
This shows that at least some clusters consist of functionally related genes. However, there are notable exceptions. Spellman and Rubin have shown that there are clusters of co-expressed genes that are not functionally related. It seems that clusters appear in very different forms.
Regulation
Cohen et al. found that, of a pair of co-expressed genes, only one promoter has an Upstream Activating Sequence (UAS) associated with that expression pattern. They suggested that UASs can activate genes that are not immediately adjacent to them. This could explain the co-expression of small clusters, but many clusters contain too many genes to be regulated by a single UAS.
Chromatin changes are a plausible explanation for the co-regulation seen in clusters. Chromatin consists of the DNA strand and the histones that are attached to the DNA. Regions where chromatin is very tightly packed are called heterochromatin. Heterochromatin very often consists of remains of viral genomes, transposons and other junk DNA. Because of the tight packing, the DNA is almost unreachable for the transcription machinery; covering deleterious DNA with proteins is a way in which the cell can protect itself. Chromatin that contains functional genes often has an open structure in which the DNA is accessible. However, most genes do not need to be expressed all the time.
DNA with genes that are not needed can be covered with histones. When a gene must be expressed, special proteins can alter the chemical groups that are attached to the histones (histone modifications), which causes the histones to open the structure. When the chromatin of one gene is opened, the chromatin of the adjacent genes is opened as well, until this modification meets a boundary element. In that way genes in close proximity are expressed at the same time. So, genes are clustered in “expression hubs”. In agreement with this model, Gilbert et al. (2004) showed that RIDGEs are mostly present in open chromatin structures.[13][14]
However, Johnnidis et al. (2005) have shown that genes in the same cluster can be expressed very differently. How eukaryotic gene regulation, and the associated chromatin changes, precisely work is still very unclear and there is no consensus about it. To get a clear picture of the mechanism behind gene clusters, the workings of chromatin and gene regulation first need to be elucidated. Furthermore, most papers that identified clusters of co-regulated genes focused on transcription levels, whereas few focused on clusters regulated by the same transcription factors. Johnnidis et al. discovered unexpected phenomena when they did.
Origins
The first models that tried to explain the clustering of genes were focused on operons, because operons were discovered before eukaryotic gene clusters were. In 1999 Lawrence proposed a model for the origin of operons. This selfish operon model suggests that individual genes were grouped together by vertical and horizontal transfer and were preserved as a single unit because that was beneficial for the genes, not necessarily for the organism. This model predicts that gene clusters must have been conserved between species. This is not the case for many operons and gene clusters seen in eukaryotes.[15]
According to Eichler and Sankoff, the two main processes in eukaryotic chromosome evolution are 1) rearrangements of chromosomal segments and 2) localized duplication of genes. Clustering could be explained by reasoning that all genes in a cluster originated from tandem duplicates of a common ancestor. If all co-expressed genes in a cluster had evolved from a common ancestral gene, they would be expected to be co-expressed because they all have comparable promoters. However, gene clustering is a very common trait in genomes and it is not clear how this duplication model could explain all of the clustering. Furthermore, many genes that are present in clusters are not homologous.
How did evolutionarily unrelated genes come into close proximity in the first place? Either there is a force that brings functionally related genes near each other, or the genes came near each other by chance. Singer et al. proposed that genes came into close proximity by random recombination of genome segments, and that when functionally related genes ended up close to each other, this proximity was conserved. They determined all possible recombination sites between genes of human and mouse. After that, they compared the clustering of the mouse and human genomes and checked whether recombination had occurred at the potential recombination sites. It turned out that recombination between genes of the same cluster was very rare. So, as soon as a functional cluster is formed, recombination is suppressed by the cell. On sex chromosomes, the number of clusters is very low in both human and mouse. The authors reasoned that this was due to the low rate of chromosomal rearrangements of sex chromosomes.
Open chromatin regions are active regions, and genes are more likely to be transferred to them. Genes from organelle and viral genomes are inserted more often in these regions. In this way non-homologous genes can be brought together in a small domain.[16]
It is possible that some regions in the genome are better suited for important genes. It is important for the cell that genes responsible for basal functions are protected from recombination. It has been observed in yeast and worms that essential genes tend to cluster in regions with a low recombination rate.[17]
It is possible that genes came into close proximity by chance. Other models have been proposed, but none of them can explain all observed phenomena. It is clear that as soon as clusters are formed they are conserved by natural selection. However, a precise model of how genes came into close proximity is still lacking.
The bulk of the present clusters must have formed relatively recently, because only seven clusters of functionally related genes are conserved between phyla. Some of these differences can be explained by the fact that gene expression is regulated very differently in different phyla. For example, DNA methylation is used in vertebrates and plants, whereas it is absent in yeast and flies.[18]
Notes
1. Caron H, van Schaik B, van der Mee M, et al. (February 2001). "The human transcriptome map: clustering of highly expressed genes in chromosomal domains". Science. 291 (5507): 1289–92. Bibcode:2001Sci...291.1289C. doi:10.1126/science.1056794. PMID 11181992.
2. Velculescu VE, Zhang L, Zhou W, et al. (January 1997). "Characterization of the yeast transcriptome". Cell. 88 (2): 243–51. doi:10.1016/S0092-8674(00)81845-0. PMID 9008165.
3. Cho RJ, Campbell MJ, Winzeler EA, et al. (July 1998). "A genome-wide transcriptional analysis of the mitotic cell cycle". Mol. Cell. 2 (1): 65–73. doi:10.1016/S1097-2765(00)80114-8. PMID 9702192.
4. Lercher MJ, Urrutia AO, Hurst LD (June 2002). "Clustering of housekeeping genes provides a unified model of gene order in the human genome". Nat. Genet. 31 (2): 180–3. doi:10.1038/ng887. PMID 11992122. S2CID 5797987.
5. Versteeg R, van Schaik BD, van Batenburg MF, et al. (September 2003). "The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes". Genome Res. 13 (9): 1998–2004. doi:10.1101/gr.1649303. PMC 403669. PMID 12915492.
6. Lee JM, Sonnhammer EL (May 2003). "Genomic gene clustering analysis of pathways in eukaryotes". Genome Res. 13 (5): 875–82. doi:10.1101/gr.737703. PMC 430880. PMID 12695325.
7. Singer GA, Lloyd AT, Huminiecki LB, Wolfe KH (March 2005). "Clusters of co-expressed genes in mammalian genomes are conserved by natural selection". Mol. Biol. Evol. 22 (3): 767–75. doi:10.1093/molbev/msi062. hdl:2262/29227. PMID 15574806.
8. Spellman PT, Rubin GM (2002). "Evidence for large domains of similarly expressed genes in the Drosophila genome". J. Biol. 1 (1): 5. doi:10.1186/1475-4924-1-5. PMC 117248. PMID 12144710.
9. Roy PJ, Stuart JM, Lund J, Kim SK (August 2002). "Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans". Nature. 418 (6901): 975–9. doi:10.1038/nature01012. PMID 12214599. S2CID 4379384.
10. Johnnidis JB, Venanzi ES, Taxman DJ, Ting JP, Benoist CO, Mathis DJ (May 2005). "Chromosomal clustering of genes controlled by the aire transcription factor". Proc. Natl. Acad. Sci. U.S.A. 102 (20): 7233–8. Bibcode:2005PNAS..102.7233J. doi:10.1073/pnas.0502670102. PMC 1129145. PMID 15883360.
11. Gierman HJ, Indemans MH, Koster J, et al. (September 2007). "Domain-wide regulation of gene expression in the human genome". Genome Res. 17 (9): 1286–95. doi:10.1101/gr.6276007. PMC 1950897. PMID 17693573.
12. Yamashita T, Honda M, Takatori H, Nishino R, Hoshino N, Kaneko S (November 2004). "Genome-wide transcriptome mapping analysis identifies organ-specific gene expression patterns along human chromosomes". Genomics. 84 (5): 867–75. doi:10.1016/j.ygeno.2004.08.008. PMID 15475266.
13. Kosak ST, Groudine M (October 2004). "Gene order and dynamic domains". Science. 306 (5696): 644–7. Bibcode:2004Sci...306..644K. doi:10.1126/science.1103864. PMID 15499009. S2CID 7293449.
14. Gilbert N, Boyle S, Fiegler H, Woodfine K, Carter NP, Bickmore WA (September 2004). "Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers". Cell. 118 (5): 555–66. doi:10.1016/j.cell.2004.08.011. PMID 15339661.
15. Lawrence JG (September 1997). "Selfish operons and speciation by gene transfer". Trends Microbiol. 5 (9): 355–9. doi:10.1016/S0966-842X(97)01110-4. PMID 9294891.
16. Lefai E, Fernández-Moreno MA, Kaguni LS, Garesse R (June 2000). "The highly compact structure of the mitochondrial DNA polymerase genomic region of Drosophila melanogaster: functional and evolutionary implications". Insect Mol. Biol. 9 (3): 315–22. doi:10.1046/j.1365-2583.2000.00191.x. PMID 10886416. S2CID 39243989.
17. Pál C, Hurst LD (March 2003). "Evidence for co-evolution of gene order and recombination rate". Nat. Genet. 33 (3): 392–5. doi:10.1038/ng1111. PMID 12577060. S2CID 21567576.
18. Regev A, Lamb MJ, Jablonka E (July 1998). "The role of DNA methylation in invertebrates: developmental regulation or genome defense?". Mol Biol Evol. 15 (7): 880–891. doi:10.1093/oxfordjournals.molbev.a025992.