Family with sequence similarity 98, member A, or FAM98A, is a gene that in the human genome encodes the FAM98A protein. FAM98A has two paralogs in humans, FAM98B and FAM98C. All three are characterized by DUF2465, a conserved domain shown to bind to RNA.[5] FAM98A is also characterized by a glycine-rich C-terminal domain.[6] FAM98A also has homologs in vertebrates and invertebrates and has distant homologs in choanoflagellates and green algae.

FAM98A
Identifiers
Aliases: FAM98A, family with sequence similarity 98 member A
External IDs: MGI: 1919972; HomoloGene: 41042; GeneCards: FAM98A; OMA: FAM98A - orthologs
Orthologs (human; mouse)
Ensembl: human ENSG00000119812 [1]; mouse ENSMUSG00000002017 [2]
RefSeq (mRNA): human NM_015475, NM_001304538; mouse NM_133747, NM_001357860
RefSeq (protein): human NP_001291467.1, NP_056290.3; mouse NP_598508, NP_001344789
Location (UCSC): human Chr 2: 33.53 – 33.6 Mb; mouse Chr 17: 75.84 – 75.86 Mb
PubMed search: human [3]; mouse [4]

Gene

Locus

The FAM98A gene is located on 2p22.3 in humans on the "-" (minus) strand. Including the 5' and 3' UTR, the gene spans 15,634 bases and contains 8 exons.[7]

mRNA

The mRNA is 2,745 bp long and comprises the 8 exons. The coding sequence runs from base 75 to base 1631. The polyadenylation signal is a six-nucleotide sequence at bases 2725-2730, 20 bases from the 3' end of the transcript, and the polyA site is at base 2745.[8]
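The coordinates above can be cross-checked with a little arithmetic. The sketch below assumes the positions are 1-based and inclusive, which is the usual convention for sequence records but is not stated in the text. The variable names are illustrative only.

```python
# Coordinates quoted in the text; assumed to be 1-based and inclusive.
MRNA_LENGTH = 2745
CDS_START, CDS_END = 75, 1631
POLYA_SIGNAL_START, POLYA_SIGNAL_END = 2725, 2730


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


cds_nt = span_length(CDS_START, CDS_END)      # nucleotides in the coding sequence
codons, remainder = divmod(cds_nt, 3)         # whole codons, leftover bases
protein_aa = codons - 1                       # the stop codon is not translated
five_prime_utr = CDS_START - 1                # bases before the start codon
three_prime_utr = MRNA_LENGTH - CDS_END       # bases after the stop codon
signal_nt = span_length(POLYA_SIGNAL_START, POLYA_SIGNAL_END)
signal_to_end = MRNA_LENGTH - POLYA_SIGNAL_START

print(cds_nt, codons, remainder, protein_aa)
print(five_prime_utr, three_prime_utr, signal_nt, signal_to_end)
```

Under that assumption the coding sequence is 1,557 nucleotides, or 519 codons including the stop codon. That corresponds to a 518-residue protein, in line with the protein length given in the Primary Sequence section.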

Protein

Primary Sequence

FAM98A is 518 amino acids in length with a molecular weight of 55.3 kDa, without modifications. Residues 10-329 comprise the DUF2465, and the remainder of the protein is a diglycine-rich C terminus. Glycine makes up approximately 20% of the protein, with the majority of these in the last 200 residues.[9]
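As a rough consistency check on these figures, the sketch below combines the approximate 20% overall glycine content with the approximate 40% glycine content that the Domains and Motifs section gives for the C terminus. It assumes that "the C terminus" means everything after DUF2465 (residues 330-518). Because both percentages are approximate, the counts are estimates rather than measured values.

```python
PROTEIN_LENGTH = 518
DUF_START, DUF_END = 10, 329
GLY_FRACTION_PROTEIN = 0.20   # "approximately 20%" of the whole protein
GLY_FRACTION_C_TERM = 0.40    # "about 40%" of the C terminus

duf_length = DUF_END - DUF_START + 1           # residues in DUF2465
c_term_length = PROTEIN_LENGTH - DUF_END       # residues 330-518 (assumed C terminus)

# Estimated glycine counts, rounded because the percentages are approximate.
total_gly = round(GLY_FRACTION_PROTEIN * PROTEIN_LENGTH)
c_term_gly = round(GLY_FRACTION_C_TERM * c_term_length)
c_term_share = c_term_gly / total_gly

print(duf_length, c_term_length, total_gly, c_term_gly, round(c_term_share, 2))
```

With these inputs, roughly three quarters of the estimated glycines fall in the C-terminal region. That agrees with the statement that most glycines lie in the last 200 residues.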

Post-Translational Modifications

FAM98A has six strongly predicted phosphorylation sites in DUF2465: S169, T178, S236, T243, S276, and S285, all predicted to be phosphorylated by protein kinase C.[10] GPS also predicts phosphorylation by protein kinase C at S285 and T178.[11] FAM98A is likely sumoylated at K183 and K195.[12] Sumoylation may allow the cell to relocalize FAM98A between the nucleus and the cytoplasm.[13] The glycine-rich C terminus contains repeated GRG sequences, in which the arginine has been shown to be susceptible to symmetric or asymmetric methylation.[14] Arginine methylation affects biochemical functions such as transcriptional activation and repression, mRNA splicing, nuclear-cytosolic shuttling, and DNA repair.[15]

Secondary Structure

The N terminus is predicted to contain multiple alpha helices, whereas the C terminus is likely only coiled.[16] The alpha helices do not form a channel, and FAM98A is not a transmembrane protein.

Tertiary and Quaternary Structure

The structure of FAM98A was predicted with the program Phyre2. The prediction shows several alpha helices in the N-terminal region and a coiled region corresponding to the glycine-rich C terminus. These two regions are connected by an alpha helix approximately 50 residues long, spanning residues 200-256. Phyre2 identified the human NDC80 kinetochore complex component, a nuclear protein that binds to microtubules, as the most similar protein.[17]

Domains and Motifs

FAM98A has a domain of unknown function 2465 (DUF2465) spanning amino acids 10-329. Within DUF2465, near its C-terminal end, there is a heptapeptide (VPDRGGR) that is conserved in all species tested. The C-terminal end of the protein is a glycine-rich domain (glycine makes up about 40% of the C terminus) with GGRGGR repeats.[9] Residues 149-155 form a predicted nuclear export signal with the sequence ICIALGM (consensus [LIVFM]-X-[LIVFM]-X-[LIVFM]-X-[LIVFM]).[18] Residues 173-176 form a predicted nuclear localization signal with the sequence KKLK (consensus K-[K/R]-X-[K/R]).[19]
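The two consensus patterns can be written as regular expressions and checked against the fragments quoted above. This is a minimal sketch: `find_motifs` and its `offset` parameter are hypothetical helpers, and only the two fragments from the text are scanned, not the full protein sequence.

```python
import re

# Consensus patterns as written in the text; X (any residue) becomes ".".
NES_PATTERN = re.compile(r"[LIVFM].[LIVFM].[LIVFM].[LIVFM]")
NLS_PATTERN = re.compile(r"K[KR].[KR]")


def find_motifs(pattern, sequence, offset=1):
    """Return (start, end, match) tuples with 1-based inclusive coordinates.

    `offset` is the residue number of the first character of `sequence`.
    A lookahead is used so that overlapping matches are also reported.
    """
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    hits = []
    for m in lookahead.finditer(sequence):
        start = m.start() + offset
        end = start + len(m.group(1)) - 1
        hits.append((start, end, m.group(1)))
    return hits


# Fragments and starting positions quoted in the text.
print(find_motifs(NES_PATTERN, "ICIALGM", offset=149))
print(find_motifs(NLS_PATTERN, "KKLK", offset=173))
```

Both fragments match their consensus patterns over their full length. Short patterns like these also match by chance in many proteins, so a match on its own is only weak evidence of a functional signal.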

Homology

Paralogs

FAM98A has two paralogs: FAM98B and FAM98C. FAM98A is the longest of the three paralogous protein products, with 518 amino acids. It is more similar to FAM98B, whose glycine-rich C terminus is much shorter than that of FAM98A. FAM98C is less similar to FAM98A than FAM98B is: it all but lacks a C terminus after DUF2465 and has more amino acid differences within DUF2465. All three protein products have been shown experimentally to associate non-specifically with RNA: FAM98A binds mRNA, and FAM98B is incorporated into a tRNA-splicing complex.[5]

Orthologs

Orthologs of FAM98A have been found in vertebrates, and FAM98A-like proteins are predicted in insects and molluscs. Because humans have three FAM98 paralogs, these genes must share a common ancestor. Whether a given gene is a strict ortholog, that is, orthologous to FAM98A specifically rather than to the FAM98 family as a whole, is less clear. FAM98A has not yet been thoroughly studied and many genomes have yet to be recorded, which makes it difficult to determine whether the predicted FAM98A gene in mosquitoes is a strict ortholog (the split of FAM98 into FAM98A, FAM98B, and FAM98C occurred before the species diverged) or a more general homolog ("FAM98A" in mosquitoes is the ancestral FAM98 gene).

 
Figure: percent identity between each species' FAM98A protein sequence and the human sequence, plotted from the data in the table below.
No. | Genus species (abbreviation) | Common name | Divergence from humans (MYA, from TimeTree) | Accession (NCBI) | Length (aa) | Identity (%) | Similarity (%)
1 | Homo sapiens (Hsa) | Human | 0 | NM_015475.3 | 518 | 100 | 100
2 | Mus musculus (Mmu) | Mouse | 92.3 | NP_598508.2 | 515 | 95 | 96
3 | Camelus ferus (Cfe) | Bactrian Camel | 94.2 | XP_006192455.1 | 517 | 97 | 98
4 | Pantholops hodgsonii (Pho) | Tibetan Antelope | 94.2 | XP_005963883.1 | 521 | 96 | 97
5 | Elephantulus edwardii (Eed) | Cape Elephant Shrew | 98.7 | XP_006882420.1 | 517 | 94 | 96
6 | Geospiza fortis (Gfo) | Medium Ground Finch | 296 | XP_005416400.1 | 648 | 84 | 88
7 | Pseudopodoces humilis (Phu) | Ground Tit | 296 | XP_005526966.1 | 545 | 84 | 88
8 | Alligator mississippiensis (Ami) | American Alligator | 296 | XP_006273242.1 | 556 | 81 | 86
9 | Pelodiscus sinensis (Psi) | Chinese Soft-shelled Turtle | 296 | XP_006131385.1 | 549 | 85 | 88
10 | Chrysemys picta bellii (Cpi) | Western Painted Turtle | 296 | XP_005296336.1 | 549 | 85 | 88
11 | Xenopus tropicalis (Xtr) | Western Clawed Frog | 371.2 | XP_002934502.2 | 520 | 79 | 86
12 | Anoplopoma fimbria (Afi) | Sablefish | 400.1 | BT082651.1 | 353 | 31 | 48
13 | Ictalurus punctatus (Ipu) | Channel Catfish | 400.1 | AHH38396.1 | 543 | 67 | 75
14 | Camponotus floridanus (Cfl) | Florida Carpenter Ant | 782.7 | EFN74857.1 | 516 | 41 | 53
15 | Culex quinquefasciatus (Cqu) | Mosquito | 782.7 | XM_001846602.1 | 498 | 38 | 52
16 | Ceratitis capitata (Cca) | Medfly | 782.7 | JAC03102.1 | 454 | 35 | 51
17 | Lepeophtheirus salmonis (Lsa) | Salmon Louse | 782.7 | BT078155.1 | 467 | 29 | 45
18 | Crassostrea gigas (Cgi) | Pacific Oyster | 782.7 | EKC33026.1 | 422 | 45 | 59
19 | Clonorchis sinensis (Csi) | Chinese Liver Fluke | 792.4 | GAA34581.2 | 378 | 35 | 47
20 | Echinococcus granulosus (Egr) | Dog Tapeworm | 792.4 | CDJ19758.1 | 1177 | 39 | 56
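The trend that the graph is described as showing, identity to the human protein falling as divergence time increases, can be summarized directly from the table. The sketch below transcribes the species, divergence and identity columns and averages identity within each divergence date. The grouping is an illustrative choice, not an analysis from the cited sources.

```python
from collections import defaultdict
from statistics import mean

# (species, divergence from humans in MYA, percent identity), from the table.
ORTHOLOGS = [
    ("Homo sapiens", 0.0, 100),
    ("Mus musculus", 92.3, 95),
    ("Camelus ferus", 94.2, 97),
    ("Pantholops hodgsonii", 94.2, 96),
    ("Elephantulus edwardii", 98.7, 94),
    ("Geospiza fortis", 296.0, 84),
    ("Pseudopodoces humilis", 296.0, 84),
    ("Alligator mississippiensis", 296.0, 81),
    ("Pelodiscus sinensis", 296.0, 85),
    ("Chrysemys picta bellii", 296.0, 85),
    ("Xenopus tropicalis", 371.2, 79),
    ("Anoplopoma fimbria", 400.1, 31),
    ("Ictalurus punctatus", 400.1, 67),
    ("Camponotus floridanus", 782.7, 41),
    ("Culex quinquefasciatus", 782.7, 38),
    ("Ceratitis capitata", 782.7, 35),
    ("Lepeophtheirus salmonis", 782.7, 29),
    ("Crassostrea gigas", 782.7, 45),
    ("Clonorchis sinensis", 792.4, 35),
    ("Echinococcus granulosus", 792.4, 39),
]


def mean_identity_by_divergence(rows):
    """Average percent identity for each distinct divergence date, oldest last."""
    groups = defaultdict(list)
    for _species, mya, identity in rows:
        groups[mya].append(identity)
    return {mya: mean(values) for mya, values in sorted(groups.items())}


for mya, identity in mean_identity_by_divergence(ORTHOLOGS).items():
    print(f"{mya:6.1f} MYA  mean identity {identity:.1f}%")
```

The sablefish entry (31% identity over 353 residues) sits well below the channel catfish entry at the same divergence date. This pulls the 400.1 MYA average down and may reflect a partial sequence rather than faster divergence.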

Distant Homologs

Genes homologous to FAM98A are predicted in many taxa within Animalia, and taxa outside Animalia may also carry homologous FAM98 genes. Eukaryotes such as the opisthokonts Monosiga brevicollis (XP_00174505.1) and Capsaspora owczarzaki (XP_004346371.1), and even the protist Chlorella variabilis (XP_005845167.1), a green alga, may contain FAM98 in their genomes.[20]

Homologous Domains

The homologous domain in FAM98A is DUF2465 (domain of unknown function 2465). The function of this domain, like that of the gene itself, is largely unknown, though it has been reported to bind RNA preferentially, targeting mRNA in FAM98A and tRNA in FAM98B.[5]

Expression

Promoter

The promoter (GXP_90934) assigned to the human FAM98A transcript (GXT_24436545)[21] is 915 bp long and overlaps the mRNA transcript by 243 bp. Nuclear respiratory factor 1 (NRF1) is a transcription factor with seven predicted binding sites on the promoter, four of which have a "Matrix similarity - optimum" score of at least 0.085. The two highest-scoring predicted sites, with scores of 0.204 and 0.199, are both NRF1 sites.[22]

Expression

In a large-scale human transcriptome dataset in GEO, FAM98A was expressed ubiquitously but not uniformly. Expression was highest in many parts of the brain (cortex, amygdala, thalamus, corpus callosum, and pituitary gland), the testis, the uterus, and smooth muscle.[23] According to AceView, FAM98A is expressed at 3.9 times the level of the average gene. AceView has identified eleven transcripts, five of which encode "good", complete proteins (both N and C termini fully translated). The transcripts suggest that FAM98A has two main parts, the first four exons and the last four exons, which correspond roughly to the tertiary structure of the protein: exons 1-4 to the N-terminal alpha helices, and exons 5-8 to the long alpha-helical arm and the C-terminal coils.[24]

Function and Biochemistry

The function of FAM98A has not been experimentally determined, though its DUF2465 has been shown to bind mRNA.[5] Kiraga et al. have noted that basic proteins bind nucleic acids.[25] Consistent with this, FAM98A (and its orthologs) has an unmodified isoelectric point of approximately 9.[26]
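To illustrate what an unmodified isoelectric point means, the sketch below estimates the pH at which a peptide's net charge is zero, using the Henderson-Hasselbalch equation and bisection. The pKa values are illustrative assumptions and are not necessarily those used by the EMBL service cited above. The peptides are toy sequences, not FAM98A.

```python
# Illustrative pKa values (assumed); published scales differ slightly.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}             # basic side chains
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}    # acidic side chains
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of an unmodified peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))       # protonated N terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))      # deprotonated C terminus
    for residue in sequence:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect on pH: net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid      # still positive, so the pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2


# Toy peptides: an arginine-rich GRG-style repeat and an acidic stretch.
print(round(isoelectric_point("GRGGRGGRG"), 1))
print(round(isoelectric_point("DDEEDDEE"), 1))
```

An arginine-rich peptide comes out strongly basic under this model. This matches the idea that GRG-type repeats contribute to a high isoelectric point and a positive charge at physiological pH.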

Known Interactions

FAM98A has been experimentally shown to interact with UBC, DDX1, C14orf166, and SUMO3, and it is coexpressed with DDX1, C14orf166, and RBM25.[27] These latter three proteins interact with mRNA, as FAM98A is also predicted to do. DDX1 is a putative ATP-dependent RNA helicase in a spliceosome that likely releases the RNA from the splicing complex.[28] C14orf166 is a polymerase II binding factor,[29] and RBM25 regulates alternative splicing.[30] All of these interactions suggest that FAM98A is a nuclear protein. FAM98A also interacts with SUMO3, which sumoylates lysines in the protein to facilitate transport across the nuclear membrane between the nucleus and cytosol.[13] FAM98A also binds mRNA nonspecifically, which suggests that it may shuttle mRNA out of the nucleus to the ribosomes.[5]

Clinical Significance

One study compared the expression of certain genes (including FAM98A) in young and old men on high- or low-protein diets, reporting expression as the ratio of low- to high-protein diet within each age group. FAM98A expression was higher on the low-protein diet in both young and old men, with ratios of 1.01 and 1.20, respectively. Only one other gene in the study showed the same trend in both groups: THOC4.[31] THOC4 (THO complex 4, or Aly/REF export factor) dimerizes to form a larger complex and chaperones spliced mRNA, assisting with processing and export of the mRNA.[32] The paper notes that up-regulation of mRNA in older individuals is associated with RNA binding/splicing, signaling proteins, and protein degradation. Consistent with this, the increase in FAM98A expression on the low-protein diet was larger in the older men than in the younger men.[31]

Disease Association

Research on a population in Taiwan found an association between young-onset hypertension and two SNPs upstream of four genes at the locus 2p22.3. One of these four genes was FAM98A, though more research is needed to establish whether FAM98A is the gene responsible for the hypertension.[33] FAM98A is expressed at a moderately high level (roughly the 75th percentile) in smooth muscle and cardiac myocytes.[23]

References

  1. ^ a b c GRCh38: Ensembl release 89: ENSG00000119812 – Ensembl, May 2017
  2. ^ a b c GRCm38: Ensembl release 89: ENSMUSG00000002017 – Ensembl, May 2017
  3. ^ "Human PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  4. ^ "Mouse PubMed Reference:". National Center for Biotechnology Information, U.S. National Library of Medicine.
  5. ^ a b c d e Dürnberger G, Bürckstümmer T, Huber K, Giambruno R, Doerks T, Karayel E, Burkard TR, Kaupe I, Müller AC, Schönegger A, Ecker GF, Lohninger H, Bork P, Bennett KL, Superti-Furga G, Colinge J (July 2013). "Experimental characterization of the human non-sequence-specific nucleic acid interactome". Genome Biology. 14 (7): R81. doi:10.1186/gb-2013-14-7-r81. PMC 4053969. PMID 23902751.
  6. ^ "Pfam: Family: DUF2465 (PF10239)". Pfam. EMBL-EBI. Retrieved 5 May 2014.
  7. ^ "Human Gene FAM98A (uc002rpa.1)". Genome. NCBI. Retrieved 5 May 2014.
  8. ^ NCBI (National Center for Biotechnology Information) mRNA sequence FAM98A NM_015475.3 https://www.ncbi.nlm.nih.gov/nuccore/NM_015475.3
  9. ^ a b Brendel V, Bucher P, Nourbakhsh I, Blaisdell B, Karlin S (1992). "Methods and algorithms for statistical analysis of protein sequences". Proc. Natl. Acad. Sci. U.S.A. 89 (6): 2002–2006. Bibcode:1992PNAS...89.2002B. doi:10.1073/pnas.89.6.2002. PMC 48584. PMID 1549558.
  10. ^ Blom N, Sicheritz-Pontén T, Gupta R, Gammeltoft S, Brunak S (June 2004). "Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence". Proteomics. 4 (6): 1633–49. doi:10.1002/pmic.200300771. PMID 15174133. S2CID 18810164.
  11. ^ Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X (September 2008). "GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy". Molecular & Cellular Proteomics. 7 (9): 1598–608. doi:10.1074/mcp.m700574-mcp200. PMC 2528073. PMID 18463090.
  12. ^ Abgent, a WuXi App Tec company. SUMOplotTM Analysis Program. 2013. http://www.abgent.com/tools
  13. ^ a b Matunis MJ, Coutavas E, Blobel G (December 1996). "A novel ubiquitin-like modification modulates the partitioning of the Ran-GTPase-activating protein RanGAP1 between the cytosol and the nuclear pore complex". The Journal of Cell Biology. 135 (6 Pt 1): 1457–70. doi:10.1083/jcb.135.6.1457. PMC 2133973. PMID 8978815.
  14. ^ Hyun YL, Lew DB, Park SH, Kim CW, Paik WK, Kim S (June 2000). "Enzymic methylation of arginyl residues in -gly-arg-gly- peptides". The Biochemical Journal. 348 (3): 573–8. doi:10.1042/0264-6021:3480573. PMC 1221099. PMID 10839988.
  15. ^ Bedford MT, Clarke SG (January 2009). "Protein arginine methylation in mammals: who, what, and why" (PDF). Molecular Cell. 33 (1): 1–13. doi:10.1016/j.molcel.2008.12.013. PMC 3372459. PMID 19150423.
  16. ^ PELE (BPS, D_R, DSC, GGR, GOR, G_G, H_K, K_S, L_G, Q_S, JOI). SDSC Workbench. Board of Trustees of the University of Illinois, 1999.
  17. ^ Kelley LA, Sternberg MJ (2009). "Protein structure prediction on the Web: a case study using the Phyre server" (PDF). Nature Protocols. 4 (3): 363–71. doi:10.1038/nprot.2009.2. hdl:10044/1/18157. PMID 19247286. S2CID 12497300.
  18. ^ Fu SC, Imai K, Horton P (September 2011). "Prediction of leucine-rich nuclear export signal containing proteins with NESsential". Nucleic Acids Research. 39 (16): e111. doi:10.1093/nar/gkr493. PMC 3167595. PMID 21705415.
  19. ^ Timmers AC, Stuger R, Schaap PJ, van 't Riet J, Raué HA (June 1999). "Nuclear and nucleolar localization of Saccharomyces cerevisiae ribosomal proteins S22 and S25". FEBS Letters. 452 (3): 335–40. doi:10.1016/s0014-5793(99)00669-9. PMID 10386617. S2CID 1933493.
  20. ^ Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–402. doi:10.1093/nar/25.17.3389. PMC 146917. PMID 9254694.
  21. ^ Transcript GXT_2827489. Genomatix software. 2014. http://www.genomatix.de/cgi-bin/[permanent dead link]/eldorado/eldorado.pl?s=2ab9d4751cbd873358acdd746c629f61;TRANS=1;TRANSCRIPTID=2827489;ELDORADO_VERSION=E28R1306
  22. ^ GXP_90934. Genomatix software. 2014. http://www.genomatix.de/cgi-bin/[permanent dead link]/eldorado/eldorado.pl?s=99a7e4da5d3118fa8a93fb9a283d710f;PROM_ID=GXP_90934;GROUP=vertebrates;GROUP=others;ELDORADO_VERSION=E28R1306
  23. ^ a b National Center for Biotechnology Information, US National Library of Medicine. Gene Expression Omnibus (GEO) Profiles. "Large-scale analysis of the human transcriptome (HG-U133A)". https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS596:212333_at
  24. ^ "Homo sapiens complex locus FAM98A, encoding family with sequence similarity 98, member A." AceView. NCBI.
  25. ^ Kiraga J, Mackiewicz P, Mackiewicz D, Kowalczuk M, Biecek P, Polak N, Smolarczyk K, Dudek MR, Cebrat S (June 2007). "The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms". BMC Genomics. 8: 163. doi:10.1186/1471-2164-8-163. PMC 1905920. PMID 17565672.
  26. ^ Program by Dr. Luca Toldo, developed at http://www.embl-heidelberg.de. Changed by Bjoern Kindler to print also the lowest found net charge. Available at EMBL WWW Gateway to Isoelectric Point Service "EMBL WWW Gateway to Isoelectric Point Service". Archived from the original on 2008-10-26. Retrieved 2014-05-10.
  27. ^ STRING 9.1. FAM98A. http://string-db.org/newstring_cgi/show_network_section.pl
  28. ^ "DEAD (Asp-Glu-Ala-Asp) Box Helicase 1". GeneCards.
  29. ^ "Chromosome 14 Open Reading Frame 166". GeneCards.
  30. ^ "RNA Binding Motif Protein 25". GeneCards.
  31. ^ a b Thalacker-Mercer AE, Fleet JC, Craig BA, Campbell WW (November 2010). "The skeletal muscle transcript profile reflects accommodative responses to inadequate protein intake in younger and older males". The Journal of Nutritional Biochemistry. 21 (11): 1076–82. doi:10.1016/j.jnutbio.2009.09.004. PMC 2891367. PMID 20149619.
  32. ^ "Aly/REF Export Factor". GeneCards.
  33. ^ Yang HC, Liang YJ, Wu YL, Chung CM, Chiang KM, Ho HY, Ting CT, Lin TH, Sheu SH, Tsai WC, Chen JH, Leu HB, Yin WH, Chiu TY, Chen CI, Fann CS, Wu JY, Lin TN, Lin SJ, Chen YT, Chen JW, Pan WH (2009). "Genome-wide association study of young-onset hypertension in the Han Chinese population of Taiwan". PLOS ONE. 4 (5): e5459. Bibcode:2009PLoSO...4.5459Y. doi:10.1371/journal.pone.0005459. PMC 2674219. PMID 19421330.