C9orf50

C9orf50
Identifiers
Aliases	C9orf50, chromosome 9 open reading frame 50
External IDs	MGI: 1923631; HomoloGene: 18859; GeneCards: C9orf50; OMA:C9orf50 - orthologs
Gene location (Human)
Chr.	Chromosome 9 (human)
End	129,620,776 bp
Gene location (Mouse)
Chr.	Chromosome 2 (mouse)
End	30,693,673 bp
RNA expression pattern
	Top expressed in (human)
	left testis; right testis; tibial nerve; stromal cell of endometrium; hypothalamus; amygdala; caudate nucleus; putamen; C1 segment; Brodmann area 9
	Top expressed in (mouse)
	seminiferous tubule; spermatid; spermatocyte; muscle of thigh; embryo; morula; ventricular zone; superior frontal gyrus; skeletal muscle tissue
Orthologs
Species	Human	Mouse
Entrez	375759	73598
Ensembl	ENSG00000179058	ENSMUSG00000044320
UniProt	Q5SZB4	A2APZ1
RefSeq (mRNA)	NM_199350	NM_198000
RefSeq (protein)	NP_955382	NP_932117

Chromosome 9 open reading frame 50 is a protein that in humans is encoded by the C9orf50 gene.^[5] C9orf50 has one other known alias, FLJ35803.^[6] In humans, the gene sequence is 10,051 base pairs long and is transcribed into an mRNA of 1,624 bases that encodes a 431-amino-acid protein.

Gene

Location

In humans, the gene is located on the negative strand at 9q34.11 and spans 8,552 base pairs.^[7] On human chromosome 9, it occupies bases chr9:132,374,504-132,383,055.^[8] ASB6 lies directly before C9orf50 on the negative strand, and NTMT1, which is more than double the size of C9orf50, lies nearby on the positive strand.^[1]^[2]
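The 8,552 base pair figure can be checked against the quoted coordinates. The sketch below assumes the coordinates are 1-based and inclusive, which is the usual convention for genome browser positions; the function name is illustrative only.

```python
# Coordinates quoted in the text for C9orf50 on human chromosome 9.
# Assumption: 1-based, inclusive start and end positions.
START = 132_374_504
END = 132_383_055


def span_length(start: int, end: int) -> int:
    """Return the number of bases in the closed interval [start, end]."""
    return end - start + 1


gene_span = span_length(START, END)
print(f"chr9:{START:,}-{END:,} spans {gene_span:,} bp")
```

Under that convention, the interval length agrees with the 8,552 base pairs stated in this section.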

Protein

Schematic illustration of C9orf50 made with DOG 2.0. The 431-amino-acid protein is shown with regions of disorder in yellow, the polyampholyte region in red, and DUF4685 in teal. Other motifs are labelled at their amino acid positions. Pink circles represent sites of acetylation, teal circles represent glycation, and blue circles represent sumoylation.

The C9orf50 protein has a molecular weight of 47,639 Da (47.6 kDa) and consists of 431 amino acids, with a predicted isoelectric point of 10.38.^[7] It contains the conserved domain DUF4685 (pfam15737), which is conserved in vertebrates and whose function is not well understood. The protein is encoded by 7 exons.

Isoforms

C9orf50 has 9 splice isoforms (SI) and 11 transcript variants (TV); the most common are isoform 1 and transcript variant 1.^[9]

C9orf50 Isoform table. Author Hannah Berhow

C9orf50 Isoforms and Transcript Variants

Domains

The protein can be analyzed as a whole or split into three parts: an N-terminal domain (NTD) of 193 residues, DUF4685 of 103 residues, and a C-terminal domain (CTD) of 135 residues. The pI of the full protein is similar to the average pI of the NTD, DUF4685, and CTD. Of these sections, the NTD has the highest pI and molecular weight (MW), and it is also the longest, at 193 of 431 residues.^[10]^[11]

C9orf50	pI	MW (kDa)	Residues
Human Whole Protein	10.38	47.6	431
NTD	11.14	21.1	193
DUF4685	10.8	11.8	103
CTD	9.47	14.7	135
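The figures in the table can be cross-checked with simple arithmetic. The sketch below uses only the values listed above; the dictionary layout and variable names are illustrative. The arithmetic mean of the segment pI values is only the rough comparison made in the text, since pI is not additive in general.

```python
# Values copied from the domain table above.
segments = {
    "NTD": {"pI": 11.14, "mw_kda": 21.1, "residues": 193},
    "DUF4685": {"pI": 10.8, "mw_kda": 11.8, "residues": 103},
    "CTD": {"pI": 9.47, "mw_kda": 14.7, "residues": 135},
}
whole_protein = {"pI": 10.38, "mw_kda": 47.6, "residues": 431}

# The three segments should account for every residue of the protein.
total_residues = sum(seg["residues"] for seg in segments.values())

# Segment weights should add up to roughly the whole-protein weight.
total_mw_kda = round(sum(seg["mw_kda"] for seg in segments.values()), 1)

# Unweighted mean of the segment pI values, as compared in the text.
mean_pi = sum(seg["pI"] for seg in segments.values()) / len(segments)

print(total_residues, total_mw_kda, round(mean_pi, 2))
```

The residue counts and segment weights add up to the whole-protein values, and the mean segment pI is within about 0.1 of the whole-protein pI of 10.38.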

Composition

Compositional analysis of the C9orf50 protein shows low amounts of I, M, Y, and the FIKMNY group, and high amounts of R and KR-ED, relative to other human proteins. No charge clusters, high-scoring charged or uncharged segments, charge runs, charge patterns, or high-scoring hydrophobic or transmembrane segments were found. Three distinctive spacings of C were found, at positions 161, 190, and 342. C9orf50 also has three repetitive structures: the sequence PRLP_KLT occurs at position 30 and is repeated at position 78, SLLP occurs at positions 99 and 398, and KAAL occurs at positions 250 and 303.^[12]

Tertiary Structure

Tertiary structures of the C9orf50 protein can be predicted using I-TASSER. The tool returns 5 models; the two with the highest C-scores have scores of -3.25 and -1.27.

Gene level regulation

Promoter

The promoter region for C9orf50 was identified using the Genomatix Gene2Promoter search engine,^[13] which returned 6 promoter regions, only 2 of which were supported by transcripts and CAGE tags. The best-supported promoter region spans 1,962 bases, is conserved in 6 of 8 orthologous loci, and is supported by 945 CAGE tags. The transcription start site was placed at position 1,503, based on a transcript with 7 exons supported by 118 CAGE tags.^[13]

Transcription factor binding sites

Hundreds of transcription factors are predicted to bind the promoter region. The promoter region transcription factors table highlights 20 of these.

Transcript level regulation

The most stable base-paired structure predicted for the C9orf50 5' UTR has a ΔG of -323.4 kcal/mol, the lowest free energy of any structure predicted for this region.^[14] For the 3' UTR, the most stable predicted structure has a ΔG of -127.5 kcal/mol, indicating that it is less stable than the 5' UTR structure.

C9orf50 3' UTR Stem Loop Structures

C9orf50 5' UTR Stem Loop Structures

Tissue expression

RNA-seq data show that C9orf50 is expressed at a low level, in the 25th-50th percentile relative to all human proteins, in most human tissues.^[15] It is most highly expressed in testes, brain, and gallbladder.^[9] C9orf50 protein expression is higher than C9orf50 RNA expression.^[16] In in situ hybridization data, expression of the mouse C9orf50 ortholog, 1700001O22Rik, was compared against beta-actin, which is ubiquitously expressed, and the two show similar expression patterns in the mouse brain.^[17] During development, the protein can be found in fetal stages.^[18]

Subcellular expression

The protein is predicted to localize primarily to the nucleus and, to a lesser extent, to the mitochondria and cytosol.^[19]

C9orf50 Promoter Region Transcription Factors

Orthologs

There are no known paralogs of C9orf50. Orthologs of C9orf50 are conserved across most groups of mammals; the most distant, the opossum of the infraclass Marsupialia, diverged from humans 159 million years ago.^[20] The gene is not found in reptiles, amphibians, birds, or any lineage that diverged before mammals. Mammals in which C9orf50 is conserved are listed below.

C9orf50 Orthologs
Common Name	Taxonomic Group	Divergence from Humans (MYA)	NCBI Accession #	Protein Length (AA)	Sequence Identity to Human (%)
Human	Hominini	0	NP_955382.3	431	100
Chimpanzee	Primates	6.65	XP_016817319.1	431	97.22
Gorilla	Primates	9.06	XP_018889539.1	435	93.17
Deer Mouse	Rodentia	90	XP_006983488.1	391	46.14
Prairie Vole	Rodentia	90	XP_005346778.1	370	45.18
American Pika	Lagomorpha	90	XP_004593748.1	579	38.11
Narrow-Ridged Finless Porpoise	Cetacea	96	XP_024617982.1	473	56.71
Killer Whale	Cetacea	96	XP_012388229.1	343	59.34
Alpaca	Artiodactyla	96	XP_006205645.1	399	53.83
Black Flying Fox	Chiroptera	96	XP_015449607.1	432	53.21
Egyptian Fruit Bat	Chiroptera	96	XP_015989428.1	431	53.01
Goat	Artiodactyla	96	XP_017910228.1	438	52.4
Northern Fur Seal	Carnivora	96	XP_025744313.1	441	52.36
Grizzly Bear	Carnivora	96	XP_026369526.1	447	50.63
European Hedgehog	Soricomorpha	96	XP_007527129.1	419	51.42
Star-Nosed Mole	Soricomorpha	96	XP_012576659.1	383	48.68
Southern White Rhinoceros	Perissodactyla	96	XP_014637447.1	489	47.25
African Bush Elephant	Proboscidea	105	XP_023401069.1	527	49.31
Nine-Banded Armadillo	Cingulata	105	XP_023443586.1	476	46.72
Gray Short-Tailed Opossum	Didelphimorphia	159	XP_007475193.1	583	32.56
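Percent identity understates divergence between distant orthologs because repeated substitutions at the same site go uncounted. One common way to account for this is the Poisson correction, d = -ln(1 - p), where p is the fraction of differing sites. The sketch below applies it to three rows of the table. The choice of correction is an assumption for illustration; the source does not state which method was used for its molecular clock analysis.

```python
import math

# (common name, divergence from humans in MYA, % identity to human),
# copied from three rows of the ortholog table above.
orthologs = [
    ("Chimpanzee", 6.65, 97.22),
    ("Deer Mouse", 90, 46.14),
    ("Gray Short-Tailed Opossum", 159, 32.56),
]


def poisson_distance(identity_percent: float) -> float:
    """Poisson-corrected distance (substitutions per site).

    Assumption: p = 1 - identity is the observed fraction of differing
    sites, and d = -ln(1 - p) corrects for multiple hits at one site.
    """
    p = 1.0 - identity_percent / 100.0
    return -math.log(1.0 - p)


distances = {name: poisson_distance(ident) for name, _, ident in orthologs}

for name, mya, ident in orthologs:
    print(f"{name}: {mya} MYA, {ident}% identity, d = {distances[name]:.3f}")
```

Plotting corrected distance against divergence time for all rows would give a rough rate of change that can be compared with other proteins.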

Evolution

C9orf50 is predicted to evolve more quickly than other common proteins including cytochrome C, hemoglobin beta, and fibrinogen alpha chain.

C9orf50 Molecular Clock

Amino acid conservation

Important amino acids are defined as those on the 100% consensus line generated in MView from the multiple sequence alignment of strict orthologs.^[21] Amino acids in red are conserved residues in DUF4685; 14 of the 22 highly conserved amino acids fall within this domain. Leucine occupies the most conserved positions in the C9orf50 protein.

Conserved Amino Acids	C9orf50 AA Position
Proline	33, 325
Leucine	147, 155, 158, 280, 285, 321, 328
Phenylalanine	231, 275
Arginine	272, 286
Valine	273, 313
Alanine	267
Aspartic Acid	277
Glutamic Acid	278, 289
Threonine	279
Tyrosine	287
Tryptophan	288
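The count of conserved residues inside DUF4685 can be reproduced from the table. The sketch below assumes the domain occupies residues 194-296, a range inferred from the segment lengths given in the Domains section (193 NTD residues followed by 103 DUF4685 residues) rather than stated directly in the source.

```python
# Conserved positions copied from the table above.
conserved = {
    "Proline": [33, 325],
    "Leucine": [147, 155, 158, 280, 285, 321, 328],
    "Phenylalanine": [231, 275],
    "Arginine": [272, 286],
    "Valine": [273, 313],
    "Alanine": [267],
    "Aspartic Acid": [277],
    "Glutamic Acid": [278, 289],
    "Threonine": [279],
    "Tyrosine": [287],
    "Tryptophan": [288],
}

# Assumption: DUF4685 starts right after the 193-residue NTD and is
# 103 residues long, so it covers positions 194 through 296 inclusive.
DUF_START = 193 + 1
DUF_END = 193 + 103

all_positions = [pos for positions in conserved.values() for pos in positions]
in_domain = [pos for pos in all_positions if DUF_START <= pos <= DUF_END]

# Residue type that occupies the largest number of conserved positions.
most_conserved = max(conserved, key=lambda name: len(conserved[name]))

print(len(all_positions), len(in_domain), most_conserved)
```

With those boundaries, the tally matches the text: 22 conserved positions in total, 14 of them inside DUF4685, with leucine the most frequent.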

Mutations

Post-translational modifications and secondary structure of C9orf50. PTMs for C9orf50 were found using the tools listed on the Expasy protein modifications site. The secondary structure was predicted using GOR, COILS, CFSSP, JPred, and SOPMA.^[1]^[2]^[3]^[4]^[5] Helices (green cylinders), beta sheets (blue arrows), and turns (pink arrows) are shown in the conceptual translation if they had a high prediction score or were found by more than one analysis tool. The protein has no transmembrane sequences.

Common variants in C9orf50 were found with NCBI SNPGeneView.^[22]

dbSNP rs# Cluster ID	Function	dbSNP Allele	Amino Acid Position
rs146521610	Synonymous	V → G	317
rs566893379	Synonymous	S → T	310
rs111868243	Synonymous	S → A	258
rs918165	Missense	K → A	248
rs141573674	Missense	S → A	201
rs759058008	Frameshift	Deleted L	189
rs111606531	Synonymous	A → T	86
rs146618124	Missense	S → C	52
rs372378735	Synonymous	G → A	45
rs751493011	Nonsense	Insert T	11

References

[1] GRCh38: Ensembl release 89: ENSG00000179058 – Ensembl, May 2017
[2] GRCm38: Ensembl release 89: ENSMUSG00000044320 – Ensembl, May 2017
[3] "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
[4] "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
[5] "uncharacterized protein C9orf50 [Homo sapiens] - Protein - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-02-25.
[6] "Gene: C9orf50 (ENSG00000179058) - Summary - Homo sapiens - Ensembl genome browser 95". uswest.ensembl.org. Retrieved 2019-02-25.
[7] "C9orf50 Gene". www.genecards.org. Retrieved 2019-02-25.
[8] "C9orf50 chromosome 9 open reading frame 50 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2019-02-25.
[9] "C9orf50 chromosome 9 open reading frame 50 [Homo sapiens (human)] - Gene - NCBI".
[10] Gene. https://www.ncbi.nlm.nih.gov/gene/375759
[11] "ExPASy - Compute pI/Mw tool".
[12] "EBI Tools: Job not available".
[13] "Genomatix: Login Page".
[14] "The Mfold Web Server | mfold.rit.albany.edu".
[15] "Gds3113 / 115495".
[16] "Anti-C9orf50 antibody produced in rabbit Prestige Antibodies Powered by Atlas Antibodies, affinity isolated antibody, buffered aqueous glycerol solution | Sigma-Aldrich".
[17] "Gene Detail :: Allen Brain Atlas: Mouse Brain".
[18] "EST Profile - Hs.124223".
[19] "WoLF PSORT: Advanced Protein Subcellular Localization Prediction Tool - GenScript".
[20] "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2019-02-25.
[21] "EBI Tools: Error".
[22] "SNP linked to Gene (geneID:375759) Via Contig Annotation".
