Chromosome 4 open reading frame 51 (C4orf51) is a protein that in humans is encoded by the C4orf51 gene.[5]

C4orf51
Identifiers
Aliases: C4orf51, chromosome 4 open reading frame 51
External IDs: MGI: 1914937; HomoloGene: 78034; GeneCards: C4orf51; OMA: C4orf51 - orthologs
Orthologs
Species: Human; Mouse
RefSeq (mRNA): NM_001080531; NM_026315
RefSeq (protein): NP_001074000; NP_080591
Location (UCSC): Chr 4: 145.68 – 145.77 Mb; Chr 8: 79.94 – 79.98 Mb
PubMed search: [3]; [4]

Gene

Genomic neighborhood of C4orf51, labeled in red.[5]

The C4orf51 gene is located at 4q31.21 on the plus strand of chromosome 4.[6] The gene spans 120,289 base pairs and contains 6 exons.[7] The genomic neighborhood of C4orf51 includes LOC285422, LINC02491, NCOA4P3, and MMAA, all located upstream of C4orf51.[5] ZNF827 and LOC105377468 are located downstream of C4orf51.

mRNA

There are three known transcript variants of C4orf51, which encode isoforms X1, X2, and X3.[5] Though the variants differ in length, all contain exons 1 and 2. C4orf51 is also occasionally transcribed as a single mRNA that spans both C4orf51 and a neighboring gene.

Protein

Schematic illustration of predicted post-translational modifications for C4orf51, made using DOG 2.0.[8] DUF4722 shown.

C4orf51 encodes a protein of 202 amino acids with a molecular weight of 23 kDa.[6] The theoretical isoelectric point of C4orf51 is 8.6.[9] Relative to other human proteins, C4orf51 has more serine residues and fewer valine residues.[10]

Domains and motifs

In humans, the C4orf51 protein contains one domain of unknown function, DUF4722.[11] DUF4722 spans the first 168 amino acids of C4orf51 and has a predicted molecular weight of 19.3 kDa.[9] In a compositional analysis of this domain, no extremes were identified.[10] The DUF is highly conserved in orthologous proteins, particularly near the N-terminus.[12]
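
As a rough consistency check on the figures quoted above, each molecular weight can be divided by its residue count. The sketch below uses only the numbers given in this section; the 110 Da average residue mass is a common rule of thumb and is an assumption here, not a value taken from the cited tools.

```python
# Figures quoted in the text for the full-length protein and the DUF4722 domain.
PROTEIN_LENGTH_AA = 202
PROTEIN_MW_KDA = 23.0
DUF_LENGTH_AA = 168
DUF_MW_KDA = 19.3

# Assumption: ~110 Da is a rule-of-thumb average mass per amino acid residue.
AVERAGE_RESIDUE_DA = 110.0


def mean_residue_mass_da(mw_kda: float, length_aa: int) -> float:
    """Return the mean mass per residue in daltons."""
    return mw_kda * 1000.0 / length_aa


def estimated_mw_kda(length_aa: int) -> float:
    """Estimate a molecular weight in kDa from length alone (rule of thumb)."""
    return length_aa * AVERAGE_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    print("Full protein, Da per residue:",
          round(mean_residue_mass_da(PROTEIN_MW_KDA, PROTEIN_LENGTH_AA), 1))
    print("DUF4722, Da per residue:",
          round(mean_residue_mass_da(DUF_MW_KDA, DUF_LENGTH_AA), 1))
    print("Rule-of-thumb estimate for 202 residues (kDa):",
          round(estimated_mw_kda(PROTEIN_LENGTH_AA), 2))
```

Both ratios come out slightly above the rule-of-thumb value, so the quoted weights are plausible for sequences of these lengths. A length-based estimate is only approximate; tools such as Compute pI/Mw work from the actual residue composition.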

Secondary structure

Alpha helices are predicted to span amino acids 20–34 and 150–165 of C4orf51.[13][14][15] Amino acids 45–48 are predicted to form a beta strand.[13][14] No coiled coils are predicted in C4orf51.[16]

 
Structural analog of C4orf51, generated by I-TASSER and visualized with iCn3D.[17][18] Conserved domain Clr2_transil, involved in transcriptional silencing, is labeled in yellow.

Tertiary and quaternary structure

The best-aligned structural analog of C4orf51, generated by I-TASSER, contains Clr2_transil, a domain involved in transcriptional silencing.[17][18] According to OriGene, gel electrophoresis with a C4orf51 rabbit polyclonal antibody produced bands at 23 kDa and at ~44–46 kDa, suggesting that C4orf51 may form a dimer.[19]
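
The dimer inference rests on simple arithmetic: the upper band is close to twice the monomer weight. A minimal sketch of that check follows, assuming the band positions quoted above; the function name is hypothetical and the calculation ignores gel-migration artifacts and modifications that can shift apparent weight.

```python
MONOMER_KDA = 23.0               # monomer weight quoted in the text
UPPER_BAND_KDA = (44.0, 46.0)    # approximate range of the second band


def nearest_oligomer_state(band_kda: float, monomer_kda: float) -> int:
    """Return the whole number of monomers closest to the observed band weight."""
    return round(band_kda / monomer_kda)


if __name__ == "__main__":
    for band in UPPER_BAND_KDA:
        print(band, "kDa is closest to",
              nearest_oligomer_state(band, MONOMER_KDA), "monomer(s)")
```

A band near twice the monomer weight is consistent with a dimer but does not establish one.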

Post-translational modifications

Conceptual translation of C4orf51, with annotation key below. Exon-exon boundaries, transcription start and stop sites, and predicted post-translational modifications are marked.

C4orf51 is predicted to undergo several post-translational modifications, including phosphorylation, glycation, and acetylation.[20][21][22] Though SUMOylation and tyrosine sulfation are also predicted, the sites of these modifications are not conserved in distant C4orf51 orthologs.[23][24]

Subcellular localization

C4orf51 is predicted to be localized to the cell nucleus.[25] The protein contains pat4, a motif commonly used to identify potential nuclear localization signals. This motif is conserved in the most distantly related C4orf51 ortholog known, found in Anolis carolinensis.
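
To illustrate how a pat4-style scan works, the sketch below slides a four-residue window along a sequence. It assumes the commonly described PSORT II definition of pat4 (four basic residues, K or R, or three basic residues plus one H or P); this is not PSORT II's own code, the function name is hypothetical, and the example sequence is a made-up toy, not the C4orf51 sequence.

```python
from typing import List, Tuple

BASIC_RESIDUES = frozenset("KR")
HIS_OR_PRO = frozenset("HP")


def find_pat4(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, window) for each 4-residue window matching pat4.

    Assumed rule: 4 basic residues (K/R), or 3 basic residues plus one H or P.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - 3):
        window = seq[start:start + 4]
        n_basic = sum(residue in BASIC_RESIDUES for residue in window)
        n_his_pro = sum(residue in HIS_OR_PRO for residue in window)
        if n_basic == 4 or (n_basic == 3 and n_his_pro == 1):
            hits.append((start + 1, window))
    return hits


if __name__ == "__main__":
    toy_sequence = "MSAKKRKLPEAKRPKAT"  # hypothetical, NOT the C4orf51 sequence
    for position, window in find_pat4(toy_sequence):
        print(position, window)
```

Running the same scan on aligned ortholog sequences is one way to check whether a candidate motif is conserved, as described for Anolis carolinensis above.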

Expression

C4orf51 expression is low in all tissues except the testes.[26] However, because the C4orf51 gene body contains long terminal repeats (LTRs) of human endogenous retroviruses (HERVs), the gene is highly expressed in differentiation-defective human induced pluripotent stem cells.[27][28]

Promoter

There are two promoter regions predicted by Genomatix, but only one (GXP_921944) is located upstream of the transcription start site.[29] GXP_921944 spans 1910 base pairs on chromosome 4. There are 15 coding transcripts supporting this promoter, but none are experimentally verified.[29]

Interacting proteins

No experimentally determined protein interactions have yet been identified for C4orf51.[30][31][32]

Clinical significance

Vlaikou et al. (2014) report that a 4q deletion containing C4orf51 and six other genes is associated with growth failure and developmental delay, minor craniofacial dysmorphism, digital anomalies, and cardiac and skeletal defects.[33]

Homology

Paralogs

No paralogs or paralogous domains exist for C4orf51.[7]

Orthologs

Orthologs of C4orf51 have been found in mammals and reptiles.[7] Within class Mammalia, orthologs have been identified in the orders Primates, Scandentia, Lagomorpha, Rodentia, Perissodactyla, Chiroptera, Carnivora, Cetartiodactyla, Sirenia, and Proboscidea, as well as in the mammalian infraclass Marsupialia. The green anole (Anolis carolinensis) and Burmese python (Python bivittatus) contain the most distantly related orthologs of C4orf51. Both species diverged from humans an estimated 312 million years ago. C4orf51 orthologs have not yet been identified in bacteria, archaea, protists, plants, fungi, trichoplax, invertebrates, bony or cartilaginous fish, amphibians, or birds.

C4orf51 Orthologs
Genus and species | Common name | Taxonomic group | Estimated date of divergence (million years ago) | Accession number | Length (amino acids) | Sequence identity | Sequence similarity
Homo sapiens | Human | Mammalia (Primate) | 0 | NP_001074000.1 | 202 | 100.00% | 100%
Macaca mulatta | Rhesus macaque | Mammalia (Primate) | 29.44 | NP_001181807.1 | 202 | 94.55% | 97%
Callithrix jacchus | Common marmoset | Mammalia (Primate) | 43.6 | XP_008990874.1 | 217 | 79.72% | 88%
Tupaia chinensis | Chinese tree shrew | Mammalia (Scandentia) | 82 | XP_006143532.1 | 201 | 68.96% | 77%
Oryctolagus cuniculus | European rabbit | Mammalia (Lagomorpha) | 90 | XP_017202803.1 | 222 | 57.40% | 76%
Mus musculus | House mouse | Mammalia (Rodentia) | 90 | NP_080591.1 | 208 | 50.96% | 66%
Urocitellus parryii | Arctic ground squirrel | Mammalia (Rodentia) | 90 | XP_026248522.1 | 142 | 43.35% | 72%
Ceratotherium simum simum | Southern white rhinoceros | Mammalia (Perissodactyla) | 96 | XP_014635653.1 | 199 | 66.50% | 77%
Equus asinus | Donkey | Mammalia (Perissodactyla) | 96 | XP_014693612.1 | 201 | 64.71% | 75%
Pteropus vampyrus | Large flying fox | Mammalia (Chiroptera) | 96 | XP_023385935.1 | 200 | 62.56% | 71%
Enhydra lutris kenyoni | Sea otter | Mammalia (Carnivora) | 96 | XP_022368037.1 | 201 | 61.39% | 74%
Myotis brandtii | Brandt's bat | Mammalia (Chiroptera) | 96 | XP_014393999.1 | 199 | 59.11% | 69%
Callorhinus ursinus | Northern fur seal | Mammalia (Carnivora) | 96 | XP_025730051.1 | 146 | 50.50% | 68%
Vicugna pacos | Alpaca | Mammalia (Artiodactyla) | 96 | XP_006210007 | 158 | 50.50% | 59%
Balaenoptera acutorostrata scammoni | Minke whale | Mammalia (Cetacea) | 96 | XP_007189508.1 | 168 | 49.51% | 56%
Trichechus manatus latirostris | West Indian manatee | Mammalia (Sirenia) | 105 | XP_004378925.1 | 162 | 57.64% | 66%
Loxodonta africana | African bush elephant | Mammalia (Proboscidea) | 105 | XP_023412869.1 | 213 | 53.00% | 65%
Sarcophilus harrisii | Tasmanian devil | Mammalia (Marsupial) | 159 | XP_023361728.1 | 190 | 28.71% | 52%
Anolis carolinensis | Green anole | Reptilia | 312 | XP_003221711.1 | 194 | 21.53% | 46%
Python bivittatus | Burmese python | Reptilia | 312 | XP_025028520.1 | 176 | 19.43% | 51%
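
The table suggests that sequence identity to the human protein falls as divergence time grows. One way to quantify that trend is sketched below, using the divergence dates (in million years) and identity percentages copied from the table; the choice of a Pearson correlation is an illustrative assumption, not a method used in the cited sources.

```python
import math

# (species, estimated divergence in million years, % identity to human), from the table.
ORTHOLOGS = [
    ("Homo sapiens", 0, 100.00), ("Macaca mulatta", 29.44, 94.55),
    ("Callithrix jacchus", 43.6, 79.72), ("Tupaia chinensis", 82, 68.96),
    ("Oryctolagus cuniculus", 90, 57.40), ("Mus musculus", 90, 50.96),
    ("Urocitellus parryii", 90, 43.35), ("Ceratotherium simum simum", 96, 66.50),
    ("Equus asinus", 96, 64.71), ("Pteropus vampyrus", 96, 62.56),
    ("Enhydra lutris kenyoni", 96, 61.39), ("Myotis brandtii", 96, 59.11),
    ("Callorhinus ursinus", 96, 50.50), ("Vicugna pacos", 96, 50.50),
    ("Balaenoptera acutorostrata scammoni", 96, 49.51),
    ("Trichechus manatus latirostris", 105, 57.64),
    ("Loxodonta africana", 105, 53.00), ("Sarcophilus harrisii", 159, 28.71),
    ("Anolis carolinensis", 312, 21.53), ("Python bivittatus", 312, 19.43),
]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    divergence = [row[1] for row in ORTHOLOGS]
    identity = [row[2] for row in ORTHOLOGS]
    print("Pearson r (divergence vs. identity):",
          round(pearson_r(divergence, identity), 2))
```

A strongly negative coefficient would match the expected pattern of decreasing identity with evolutionary distance. Several entries are short or partial predicted sequences, which can lower identity independently of divergence time.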

References

  1. GRCh38: Ensembl release 89: ENSG00000237136. Ensembl, May 2017.
  2. GRCm38: Ensembl release 89: ENSMUSG00000031682. Ensembl, May 2017.
  3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. "C4orf51 chromosome 4 open reading frame 51 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-05-06.
  6. "C4orf51 Gene (Protein Coding)". Gene Cards. Retrieved 2019-02-03.
  7. "Homo sapiens chromosome 4 open reading frame 51 (C4orf51), mRNA". NCBI (National Center for Biotechnology Information): Nucleotide. May 2019.
  8. "DOG 2.0 - Protein Domain Structure Visualization". dog.biocuckoo.org. Retrieved 2019-05-02.
  9. "ExPASy - Compute pI/Mw tool". web.expasy.org. Retrieved 2019-04-21.
  10. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-04-21.
  11. "uncharacterized protein C4orf51 [Homo sapiens]". NCBI (National Center for Biotechnology Information): Protein.
  12. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". www.ebi.ac.uk. Retrieved 2019-05-06.
  13. "CFSSP: Chou & Fasman Secondary Structure Prediction Server". www.biogem.org. Retrieved 2019-05-03.
  14. "NPS@ : GOR4 secondary structure prediction". npsa-prabi.ibcp.fr. Retrieved 2019-04-21.
  15. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2019-05-03.
  16. "COILS Server". embnet.vital-it.ch. Archived from the original on 2019-07-12. Retrieved 2019-05-03.
  17. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2019-05-03.
  18. "iCn3D: Web-based 3D Structure Viewer". www.ncbi.nlm.nih.gov. Retrieved 2019-05-03.
  19. "C4orf51 Rabbit Polyclonal Antibody – TA335924 | OriGene". www.origene.com. Retrieved 2019-05-03.
  20. "GPS 3.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.org. Archived from the original on 2018-05-06. Retrieved 2019-04-21.
  21. "NetGlycate 1.0 Server". www.cbs.dtu.dk. Retrieved 2019-04-21.
  22. "NetAcet 1.0 Server". www.cbs.dtu.dk. Retrieved 2019-04-21.
  23. "SUMOplot™ Analysis Program | Abgent". www.abgent.com. Archived from the original on 2005-01-03. Retrieved 2019-05-03.
  24. "ExPASy - Sulfinator". web.expasy.org. Retrieved 2019-05-03.
  25. "PSORT II Prediction". psort.hgc.jp. Retrieved 2019-04-21.
  26. "C4orf51 chromosome 4 open reading frame 51 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-04-21.
  27. Koyanagi-Aoi M, Ohnuki M, Takahashi K, Okita K, Noma H, Sawamura Y, Teramoto I, Narita M, Sato Y, Ichisaka T, Amano N, Watanabe A, Morizane A, Yamada Y, Sato T, Takahashi J, Yamanaka S (December 2013). "Differentiation-defective phenotypes revealed by large-scale analyses of human pluripotent stem cells". Proceedings of the National Academy of Sciences of the United States of America. 110 (51): 20569–74. Bibcode:2013PNAS..11020569K. doi:10.1073/pnas.1319061110. PMC 3870695. PMID 24259714.
  28. Ohnuki M, Tanabe K, Sutou K, Teramoto I, Sawamura Y, Narita M, Nakamura M, Tokunaga Y, Nakamura M, Watanabe A, Yamanaka S, Takahashi K (August 2014). "Dynamic regulation of human endogenous retroviruses mediates factor-induced reprogramming and differentiation potential". Proceedings of the National Academy of Sciences of the United States of America. 111 (34): 12426–31. Bibcode:2014PNAS..11112426O. doi:10.1073/pnas.1413299111. PMC 4151758. PMID 25097266.
  29. "Genomatix: Gene2Promoter Result". www.genomatix.de. Retrieved 2019-04-21.
  30. "C4orf51 protein (human) - STRING interaction network". string-db.org. Retrieved 2019-04-21.
  31. "The Molecular INTeraction Database – An ELIXIR Core Resource". Retrieved 2019-05-06.
  32. "Mentha: the interactome browser". www.mentha.uniroma2.it. Retrieved 2019-05-06.
  33. Vlaikou AM, Manolakos E, Noutsopoulos D, Markopoulos G, Liehr T, Vetro A, Ziegler M, Weise A, Kreskowski K, Papoulidis I, Thomaidis L, Syrrou M (2014). "An interstitial 4q31.21q31.22 microdeletion associated with developmental delay: case report and literature review". Cytogenetic and Genome Research. 142 (4): 227–38. doi:10.1159/000361001. PMID 24733116. S2CID 32287205.