Uncharacterized protein C1orf131 is a protein that in humans is encoded by the gene C1orf131. The first ortholog of this protein was discovered in humans.[5][6] Subsequently, through the use of algorithms and bioinformatics, homologs of C1orf131 have been discovered in numerous species, and as a result, the name of the majority of the proteins in this protein family is Uncharacterized protein C1orf131 homolog.

C1orf131
Identifiers
Aliases: C1orf131, chromosome 1 open reading frame 131
External IDs: MGI: 1913773; HomoloGene: 11982; GeneCards: C1orf131; OMA: C1orf131 - orthologs
Orthologs (human; mouse)
RefSeq (mRNA): NM_152379, NM_001300830 (human); NM_025615 (mouse)
RefSeq (protein): NP_001287759, NP_689592 (human); NP_079891 (mouse)
Location (UCSC): Chr 1: 231.22 – 231.24 Mb (human); Chr 8: 125.56 – 125.59 Mb (mouse)
PubMed search: [3] (human); [4] (mouse)


Gene

In humans, C1orf131 is located on the minus strand of chromosome 1 at cytogenetic band 1q42.2, along with 193 other genes.[7] The gene upstream of C1orf131 is GNPAT, and the gene downstream is TRIM67. The most common human transcript of C1orf131 is an mRNA 1458 base pairs long, composed of seven exons. At least nine other alternative splice forms in humans produce proteins. The transcripts range in size from 129 base pairs (2 exons) to 1458 base pairs (7 exons).[8]

Protein


In the C1orf131 protein family, the proteins are between 93 and 450 amino acids long; however, the majority are between 160 and 295 amino acids long. Their molecular weights range from 10.6 to 49.0 kDa, with the majority between 18.6 and 32.7 kDa, and their isoelectric points range from 9.6 to 11.2.[9] Over 30 orthologs from mammals, birds and lizards have been identified as having a poly(A) RNA binding site.[10] All orthologs in this protein family have a domain of unknown function, DUF4602.[10][11] The human protein has been shown to be both phosphorylated and acetylated.[12][13][14][15][16][17] These proteins are rich in lysine, in charged amino acids (DEHKR), and in basic amino acids (HKR).[18] The secondary structure of these proteins consists primarily of alpha helices and coils, with a small percentage of beta strands.[19] C1orf131 has been shown to interact with ubiquitin,[20] through affinity capture followed by mass spectrometry, and with APP (amyloid beta (A4) precursor protein),[21] through a reconstituted complex.

 
Graphical overview of the human protein C1orf131, with DUF4602 shown in green, phosphorylation shown as red points, and acetylation shown as a gray point.
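
The composition terms used in this section (lysine-rich, charged DEHKR, basic HKR) can be made concrete with a small calculation. The following is a minimal sketch, not the method used by the cited tools: the function name `composition_summary`, the toy sequence, and the rule-of-thumb average residue mass of about 110 Da are all assumptions introduced here for illustration, and the toy sequence is not the real C1orf131 sequence.

```python
from collections import Counter

CHARGED = set("DEHKR")  # charged residues, grouped as in the text
BASIC = set("HKR")      # basic subset of the charged residues

# Rule-of-thumb average mass per residue (assumption, not from the source).
AVG_RESIDUE_MASS_DA = 110.0


def composition_summary(sequence: str) -> dict:
    """Return residue-class fractions and a rough mass estimate."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return {
        "length": n,
        "lysine_fraction": counts["K"] / n,
        "charged_fraction": sum(counts[aa] for aa in CHARGED) / n,
        "basic_fraction": sum(counts[aa] for aa in BASIC) / n,
        "approx_mass_kda": n * AVG_RESIDUE_MASS_DA / 1000.0,
    }


# Hypothetical toy sequence for illustration only.
toy = "MKKRSEKLHDKAGKRLEKKV"
summary = composition_summary(toy)
print(summary)
```

Under the same rule of thumb, a 93-residue protein comes out near 10 kDa and a 450-residue protein near 50 kDa, which is roughly in line with the 10.6 to 49.0 kDa range quoted above; exact values depend on the actual residue composition.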

DUF4602


DUF4602 (PF15375) is generally 120 or more amino acids long.[22] Typically only one gene in a species contains this domain; however, the domain has been identified in two different proteins in several species. In Trichuris suis, DUF4602 is found in both hypothetical protein M5114_09117 and tRNA pseudouridine synthase D, and in Echinococcus granulosus, DUF4602 has been found in hypothetical protein EGR 05135 and expressed conserved protein. DUF4602 has been found primarily in eukaryotes; however, it has also been identified in the virus DRHN1, Bacillus sp. UNC41MFS5, Enterococcus faecalis, and Enterococcus faecalis 13-SD-W-01. In the larger C1orf131 orthologs (250+ residues), the domain is typically located in the middle of the sequence, toward the C-terminus; in the smaller orthologs (160-250 residues), it is located near the N-terminus. The larger orthologs also contain regions of low complexity, which could indicate that these proteins are intrinsically disordered.

Evolutionary history


This gene family exists only in eukaryotes. There are no paralogs of this gene; however, there are a few pseudogenes of C1orf131. Thus far, these have only been found in orangutans, mouse lemurs, and sloths.[11] When this gene family is compared to cytochrome C, a slow-evolving gene,[23] and fibrinogen gamma chain, a fast-evolving gene,[24] it is shown to evolve at a faster rate than fibrinogen.

 
Graph of divergence of this gene as compared to fibrinogen and cytochrome C.
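
The article does not state how divergence was calculated for this comparison. One common approach, shown here only as a hedged sketch, is to convert the percent identity between two orthologs into a Poisson-corrected distance, d = -ln(identity fraction), which accounts for multiple substitutions at the same site. The function name `corrected_divergence` and the example identity values are assumptions for illustration, not data from the source.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Poisson-corrected distance from percent sequence identity.

    d = -ln(1 - p), where p is the fraction of differing sites.
    """
    if not 0.0 < percent_identity <= 100.0:
        raise ValueError("percent_identity must be in (0, 100]")
    p = 1.0 - percent_identity / 100.0  # fraction of sites that differ
    return -math.log(1.0 - p)


# Hypothetical identities for the same species pair; a larger distance
# over the same divergence time implies a faster-evolving protein.
for label, identity in [("slow-evolving", 95.0), ("fast-evolving", 60.0)]:
    print(label, round(corrected_divergence(identity), 3))
```

Plotting such distances against estimated divergence times for several species gives one line per protein, and the protein with the steeper line is the faster-evolving one.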

References

  1. ^ a b c GRCh38: Ensembl release 89: ENSG00000143633 - Ensembl, May 2017
  2. ^ a b c GRCm38: Ensembl release 89: ENSMUSG00000031984 - Ensembl, May 2017
  3. ^ "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. ^ "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. ^ Gerhard DS, Wagner L, et al. (October 2004). "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)". Genome Research. 14 (10b): 2121–2127. doi:10.1101/gr.2596504. PMC 528928. PMID 15489334.
  6. ^ Ota T, Suzuki Y, et al. (January 2004). "Complete sequencing and characterization of 21,243 full-length human cDNAs". Nature Genetics. 36 (1): 40–45. doi:10.1038/ng1285. PMID 14702039.
  7. ^ "Browse Homo sapiens ORF cDNA clones by chromosome 1, map 1q42, page 1". Archived from the original on 2015-05-18. Retrieved 2015-04-27.
  8. ^ "AceView: Gene:C1orf131, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". AceView.
  9. ^ Kozlowski LP (2016). "IPC - Isoelectric Point Calculator". Biology Direct. 11 (1): 55. doi:10.1186/s13062-016-0159-9. PMC 5075173. PMID 27769290.
  10. ^ a b "Uniprot Gene: C1orf131". Retrieved May 7, 2015.
  11. ^ a b "BLAT". Retrieved May 7, 2015.
  12. ^ Olsen JV, Vermeulen M, Santamaria A, Kumar C, Miller ML, Jensen LJ, Gnad F, Cox J, Jensen TS, Nigg EA, Brunak S, Mann M (January 2010). "Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis". Science Signaling. 3 (104): ra3. doi:10.1126/scisignal.2000475. PMID 20068231. S2CID 24775963.
  13. ^ Wang B, Malik R, Nigg EA, Körner R (December 2008). "Evaluation of the low-specificity protease elastase for large-scale phosphoproteome analysis". Analytical Chemistry. 80 (24): 9526–9533. Retrieved April 26, 2015.
  14. ^ Matsuoka S, Ballif BA, Smogorzewska A, McDonald ER 3rd, Hurov KE, Luo J, Bakalarski CE, Zhao Z, Solimini N, Lerenthal Y, Shiloh Y, Gygi SP, Elledge SJ (May 2007). "ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage". Science. 316 (5828): 1160–1166. Bibcode:2007Sci...316.1160M. doi:10.1126/science.1140321. PMID 17525332. S2CID 16648052.
  15. ^ Kim D, Hahn Y (July 9, 2011). "Identification of novel phosphorylation modification sites in human proteins that originated after the human–chimpanzee divergence". Bioinformatics. 27 (18): 2494–501. doi:10.1093/bioinformatics/btr426. PMID 21775310.
  16. ^ Dephoure N, Zhou C, Villén J, Beausoleil SA, Bakalarski CE, Elledge SJ, Gygi SP (August 2008). "A quantitative atlas of mitotic phosphorylation". Proceedings of the National Academy of Sciences of the United States of America. 105 (31): 10762–10767. Bibcode:2008PNAS..10510762D. doi:10.1073/pnas.0805139105. PMC 2504835. PMID 18669648.
  17. ^ Choudhary C, Kumar C, Gnad F, Nielsen ML, Rehman M, Walther TC, Olsen JV, Mann M (August 2009). "Lysine acetylation targets protein complexes and co-regulates major cellular functions". Science. 325 (5942): 834–840. Bibcode:2009Sci...325..834C. doi:10.1126/science.1175371. PMID 19608861. S2CID 206520776.
  18. ^ Brendel V, Bucher P, Nourbakhsh IR, Blaisdell BE, Karlin S (March 1992). "Methods and algorithms for statistical analysis of protein sequences". Proceedings of the National Academy of Sciences of the United States of America. 89 (6): 2002–2006. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. PMC 48584. PMID 1549558.
  19. ^ Garnier J, Gibrat JF, Robson B (1996). "GOR method for predicting protein secondary structure from amino acid sequence". Computer Methods for Macromolecular Sequence Analysis. Methods in Enzymology. Vol. 266. pp. 540–553. doi:10.1016/S0076-6879(96)66034-0. ISBN 978-0-12-182167-8. PMID 8743705.
  20. ^ Stes E, Laga M, Walton A, Samyn N, Timmerman E, De Smet I, Goormachtig S, Gevaert K (June 2014). "A COFRADIC Protocol To Study Protein Ubiquitination". J Proteome Res. 13 (6): 3107–3113. doi:10.1021/pr4012443. PMID 24816145.
  21. ^ Olah J, Vincze O, Virok D, Simon D, Bozso Z, Tokesi N, Horvath I, Hlavanda E, Kovacs J, Magyar A, Szucs M, Orosz F, Penke B, Ovadi J (September 2011). "Interactions of pathological hallmark proteins: tubulin polymerization promoting protein/p25, beta-amyloid, and alpha-synuclein". J. Biol. Chem. 286 (39): 34088–34100. doi:10.1074/jbc.m111.243907. PMC 3190826. PMID 21832049.
  22. ^ "Family: DUF4602". Retrieved May 8, 2015.
  23. ^ Dickerson R (1971). "The structure of cytochrome c and the rates of molecular evolution". Journal of Molecular Evolution. 1 (1): 26–45. Bibcode:1971JMolE...1...26D. doi:10.1007/bf01659392. PMID 4377446. S2CID 24992347.
  24. ^ Prychitko TM, Moore WS (2000). "Comparative evolution of the mitochondrial cytochrome b gene and nuclear beta-fibrinogen intron 7 in woodpeckers". Mol Biol Evol. 17 (7): 1101–11. doi:10.1093/oxfordjournals.molbev.a026391. PMID 10889223.