Chromosome 1 open reading frame 94, or C1orf94, is a protein that in humans is encoded by the C1orf94 gene.[5] The function of this protein is still poorly understood.
C1orf94 identifiers | Value |
---|---|
Aliases | C1orf94, chromosome 1 open reading frame 94 |
External IDs | MGI: 3616080; HomoloGene: 57187; GeneCards: C1orf94; OMA: C1orf94 - orthologs |
Gene
The C1orf94 gene is also known by the identifiers Q6P1W5, B3KVT1, D3DPR3, E9PJ76, Q96IC8 and MGC15882.
FLJ20508 is also an alias of C1orf94.[5]
Locus
C1orf94 is located on the short arm of chromosome 1 at 1p34.3 (chr1:34,166,883-34,219,131), near the HSPD1P14 gene. It is encoded on the sense strand.[6]
The gene has 7 exons, of which only 6 are coding.[7]
Exon | Start (bp) | End (bp) | Size (bp) |
---|---|---|---|
ENSE00001207243 (non-coding) | 34,166,883 | 34,167,171 | 289 |
ENSE00003530680 | 34,197,225 | 34,197,913 | 689 |
ENSE00002095077 | 34,200,772 | 34,201,032 | 261 |
ENSE00002136629 | 34,202,084 | 34,202,259 | 176 |
ENSE00002136447 | 34,208,157 | 34,208,234 | 78 |
ENSE00002125161 | 34,212,210 | 34,212,406 | 197 |
ENSE00001460399 | 34,218,686 | 34,219,131 | 446 |
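The sizes in the table follow from the coordinates if both ends are counted (size = end - start + 1). The short sketch below recomputes them from the table values; the variable and function names are illustrative only and do not come from any database API.

```python
# Exon coordinates copied from the table above (chromosome 1, inclusive ends).
exons = [
    ("ENSE00001207243", 34_166_883, 34_167_171),
    ("ENSE00003530680", 34_197_225, 34_197_913),
    ("ENSE00002095077", 34_200_772, 34_201_032),
    ("ENSE00002136629", 34_202_084, 34_202_259),
    ("ENSE00002136447", 34_208_157, 34_208_234),
    ("ENSE00002125161", 34_212_210, 34_212_406),
    ("ENSE00001460399", 34_218_686, 34_219_131),
]


def interval_size(start, end):
    """Length in base pairs of an interval whose start and end are both included."""
    return end - start + 1


# Size of each exon, in the same order as the table.
sizes = [interval_size(start, end) for _, start, end in exons]

# Combined length of all seven exons.
total_exonic_bp = sum(sizes)

# Genomic span from the first exon start to the last exon end.
gene_span_bp = interval_size(exons[0][1], exons[-1][2])

for (exon_id, _, _), size in zip(exons, sizes):
    print(exon_id, size)
print("total exonic bp:", total_exonic_bp)
print("gene span bp:", gene_span_bp)
```

Under these assumptions the seven exons add up to 2,136 bp, the same figure listed for the C1orf94-201 transcript, and the span from the first exon start to the last exon end is 52,249 bp, matching the locus coordinates.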
mRNA
The protein has two isoforms, a and b; isoform a is the longer of the two (598 aa).[8]
Name | Transcript ID | Base pairs | Protein type | Protein length |
---|---|---|---|---|
C1orf94-202 | ENST00000488417.2 | 3050 | Protein Coding | 598 aa |
C1orf94-201 | ENST00000373374.7 | 2136 | Protein Coding | 408 aa |
Transcription
Two promoters are predicted for C1orf94, only one of which is predicted for the transcript used in the analysis. The predicted transcription factor binding sites include:[9]
ZF02 (C2H2 zinc finger transcription factors 2)
Cart1 Sequence-specific DNA-binding transcription factor
HTLV-I U5 repressive element-binding protein 1
NKX homeodomain factors
AARE binding factors PREB core-binding element
Protein
DUF4688, a domain of unknown function, is a large region found within the C1orf94 protein sequence in both isoforms a and b.[10] This sequence is conserved in eukaryotes.[11]
C1orf94 is a protein tissue co-expression partner for RBBP8NL.[12] The isoelectric point is 8.56 and the molecular weight is about 65.4 kDa (65,353 Da). Proline is the most abundant amino acid in the protein sequence (11.7%), followed closely by leucine (10.4%).[13]
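As a rough plausibility check on these figures, a 598-residue protein at about 110 Da per residue comes out near 66 kDa, which agrees with a mass of 65,353 Da (about 65.4 kDa). The sketch below shows that estimate together with a simple percent-composition count of the kind SAPS reports. The 110 Da average is a common rule of thumb rather than a value from the cited sources, and the toy sequence is hypothetical, not the C1orf94 sequence.

```python
from collections import Counter

# Rule-of-thumb average mass of one amino acid residue (an assumption).
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues):
    """Estimate protein mass in kDa from its length alone."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def composition_percent(seq):
    """Percent of each residue in seq, most abundant first."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}


# Hypothetical 10-residue peptide, used only to exercise the function.
toy_sequence = "MPPLPSLLEP"
toy_composition = composition_percent(toy_sequence)

print(approx_mass_kda(598))   # length of isoform a
print(toy_composition)
```

Running the same composition count on the real isoform a sequence would be expected to reproduce the proline and leucine percentages quoted above, although that has not been checked here.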
Seven candidate PEST motifs, which are signatures rich in proline (P), glutamic acid (E), serine (S), and threonine (T), were identified between positions 1 and 598.
Only one of them was predicted to be a potential PEST motif, with 21 amino acids between positions 133 and 155. This type of sequence is associated with proteins that have a short intracellular half-life.[14]
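To illustrate what "rich in P, E, S and T" means in practice, the sketch below scans a sequence with a sliding window and reports the window with the highest fraction of those four residues. This is a simplified illustration only: it is not the scoring scheme used by dedicated PEST prediction tools, and the toy sequence and function names are hypothetical.

```python
PEST_RESIDUES = set("PEST")


def pest_fraction(segment):
    """Fraction of residues in segment that are P, E, S or T."""
    return sum(residue in PEST_RESIDUES for residue in segment) / len(segment)


def richest_window(seq, width):
    """Return (start, end, fraction) of the most PEST-rich window.

    Positions are 1-based and inclusive, as in protein annotations.
    If several windows tie, the first one is returned.
    """
    best = max(
        range(len(seq) - width + 1),
        key=lambda i: pest_fraction(seq[i:i + width]),
    )
    return best + 1, best + width, pest_fraction(seq[best:best + width])


# Hypothetical peptide with a PEST-rich stretch in the middle.
toy_sequence = "AKLLPESTPEKLLA"
print(richest_window(toy_sequence, 6))
```

For the toy peptide, the six-residue window at positions 5 to 10 consists entirely of P, E, S and T residues.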
Post-translational modifications
C1orf94 is predicted to undergo palmitoylation,[16] phosphorylation[17] and glycation,[18] mainly near the N-terminus. A mitochondrial processing peptidase cleavage site is also predicted at the first methionine.
Structure
According to CFSSP,[19] the secondary structure of C1orf94 includes alpha helices, extended strands, beta turns, and random coils.
The tertiary structures predicted by both Phyre2[20] and SWISS-MODEL[15] show that C1orf94 is a monomer.
According to I-TASSER,[21] the closest identified structural analogs to C1orf94 are 3IXZ (pig gastric H+/K+-ATPase complexed with aluminum fluoride) and 3B8E (crystal structure of the sodium-potassium pump).
Protein-protein Interactions
Mentha[22] reports a strong physical interaction with ATXN1, a chromatin-binding factor that represses Notch signaling in the absence of the Notch intracellular domain.
According to PSICQUIC,[23] C1orf94 and MMADHC have physical interactions that were demonstrated through affinity chromatography technology. MMADHC is a gene that encodes a mitochondrial protein that is involved in early steps of vitamin B12 metabolism.[24]
According to STRING,[25] RFX2 is a possible functional partner of C1orf94; it appears among the query proteins and first shell of interactors. RFX2 is a transcription factor that acts as a key regulator of spermatogenesis.
Expression
According to AceView, this gene is well expressed, at 0.5 times the level of the average gene in this release.[26]
PSORT II[27] predicts a 69.6% probability of nuclear localization for C1orf94.
Data from NCBI show that C1orf94 is primarily expressed in the testis.[28]
According to the Human Protein Atlas,[29] C1orf94 is slightly expressed in brain tissue.
According to GEO profiles,[30] increased C1orf94 expression is highly correlated with morbid obesity. C1orf94 expression also increased after related coactivator depletion.
Function
The function of C1orf94 is not yet understood, and no experiments have established it so far. However, HPA RNA-seq data show higher C1orf94 expression in normal tissues than in tissues during fetal development.[28]
Association with diseases
According to GWAS,[31] C1orf94 was identified as an OncoORF (oncogenic open reading frame). According to the Colorectal Cancer Atlas,[32] C1orf94 takes part in a protein-protein interaction network of 50 nodes associated with colorectal cancer. The most notable of these interactions is with A-kinase anchor protein 9 (AKAP9), which promotes colorectal cancer development by regulating Cdc42-interacting protein 4.[33]
Sequence homology
C1orf94 evolved faster than cytochrome c but more slowly than fibrinopeptides.
C1orf94 has no paralogs. Orthologs were identified using NCBI BLASTp.[34] Mammals showed the most conservation, and the most distant orthologs were found in fish.
After running SAPS on a group of orthologs (gorilla, rat, dog, and bat), the protein's composition shows only minor variations compared to the human sequence: proline is still the most abundant amino acid, followed by leucine, and tryptophan remains the least abundant.[13]
References
1. GRCh38: Ensembl release 89: ENSG00000142698 – Ensembl, May 2017
2. GRCm38: Ensembl release 89: ENSMUSG00000028813 – Ensembl, May 2017
3. "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
4. "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
5. "C1orf94 - Uncharacterized protein C1orf94 - Homo sapiens (Human) - C1orf94 gene & protein". www.uniprot.org. Retrieved 2020-05-01.
6. "C1orf94 Gene - GeneCards | CA094 Protein | CA094 Antibody". www.genecards.org. Retrieved 2020-05-01.
7. "GeneLoc Integrated Map for Chromosome 1: Exon structure for C1orf94". genecards.weizmann.ac.il. Retrieved 2020-05-01.
8. "Transcript: C1orf94-201 (ENST00000373374.7) - Summary - Homo sapiens - Ensembl genome browser 100". uswest.ensembl.org. Retrieved 2020-05-01.
9. "Genomatix - NGS Data Analysis & Personalized Medicine". www.genomatix.de. Retrieved 2020-05-01.
10. "uncharacterized protein C1orf94 isoform b [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-05-01.
11. "InterPro". www.ebi.ac.uk. Retrieved 2020-05-01.
12. "RBBP8NL Gene - GeneCards | RB8NL Protein | RB8NL Antibody". www.genecards.org. Retrieved 2020-05-01.
13. "SAPS < Sequence Statistics < EMBL-EBI". www.ebi.ac.uk. Retrieved 2020-05-01.
14. "PEST sequence", Wikipedia, 2020-04-15, retrieved 2020-05-01
15. "SWISS-MODEL". swissmodel.expasy.org. Retrieved 2020-05-01.
16. "CSS-Palm - Palmitoylation Site Prediction". csspalm.biocuckoo.org. Retrieved 2020-05-01.
17. "NetPhos 3.1 Server". www.cbs.dtu.dk. Retrieved 2020-05-01.
18. "GPS 5.0 - Kinase-specific Phosphorylation Site Prediction". gps.biocuckoo.cn. Retrieved 2020-05-01.
19. "CFSSP: Chou & Fasman Secondary Structure Prediction Server". www.biogem.org. Retrieved 2020-05-01.
20. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2020-05-01.
21. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2020-05-01.
22. "mentha: the interactome browser". mentha.uniroma2.it. Retrieved 2020-05-01.
23. "PSICQUIC View". www.ebi.ac.uk. Retrieved 2020-05-01.
24. "MMADHC Gene - GeneCards | MMAD Protein | MMAD Antibody". www.genecards.org. Retrieved 2020-05-01.
25. "STRING: functional protein association networks". string-db.org. Retrieved 2020-05-01.
26. "AceView: Gene:C1orf94, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2020-05-01.
27. "PSORT II Prediction". psort.hgc.jp. Retrieved 2020-05-01.
28. "C1orf94 chromosome 1 open reading frame 94 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-05-01.
29. "The Human Protein Atlas". www.proteinatlas.org. Retrieved 2020-05-01.
30. "Home - GEO Profiles - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2020-05-01.
31. Delgado AP, Brandao P, Chapado MJ, Hamid S, Narayanan R (2014-07-01). "Open reading frames associated with cancer in the dark matter of the human genome". Cancer Genomics & Proteomics. 11 (4): 201–13. PMID 25048349.
32. "Colorectal Cancer Atlas | C1orf94 Gene summary". colonatlas.org. Retrieved 2020-05-01.
33. Hu ZY, Liu YP, Xie LY, Wang XY, Yang F, Chen SY, Li ZG (June 2016). "AKAP-9 promotes colorectal cancer development by regulating Cdc42 interacting protein 4". Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease. 1862 (6): 1172–81. doi:10.1016/j.bbadis.2016.03.012. PMC 4846471. PMID 27039663.
34. "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2020-05-01.