Computational methods use different properties of protein sequences and structures to find, characterize and annotate protein tandem repeats.
Sequence-based annotation methods
Name | Last update | Usage | Result types | Description | Open source? | Repeat type specific | Reference | |
---|---|---|---|---|---|---|---|---|
ard2 | 2013 | web | annotated sequence | Neural network | no | alpha-solenoid | [1] | |
DECIPHER | 2021 | downloadable | | Detection of tandem and/or interspersed repeats by orthology (DetectRepeats function in R package) | yes | no | [2] | |
TRUST | 2004 | downloadable / web | unit position, multiple sequence alignment | Ab-initio determination of internal repeats in proteins. Exploits transitivity of alignments | ? | no | [3] | |
T-REKS | 2009 | downloadable / web | repeat unit | Clustering of lengths between identical short strings by using a K-means algorithm | yes | no | [4] | |
HHRepID | 2008 | downloadable / web | Identification of repeats in protein sequences via HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs | no | [5] | |||
RADAR | 2018 | downloadable / web | unit position, multiple sequence alignment | RADAR identifies short composition biased and gapped approximate repeats, as well as complex repeat architectures involving many different types of repeats in a query sequence | yes | no | [6][7] | |
XSTREAM | 2007 | web | unit position, different periods, multiple sequence alignment | Data-mining tool designed to efficiently identify tandem repeat (TR) patterns in biological sequence data. The program uses a seed-extension strategy coupled with several post-processing algorithms to analyze FASTA-formatted protein or nucleotide sequences | no | no | [8] | |
TRED | 2007 | downloadable | | definition for tandem repeats over the edit distance and an efficient, deterministic algorithm for finding these repeats | no | no | | |
TRAL | 2015 | downloadable | Detects tandem repeats with both de novo software and sequence profile HMMs; statistical significance analysis of putative tandem repeats, and filtering of redundant predictions | yes | [9] | |||
DOTTER | 1995 | downloadable | | Graphical dotplot program for detailed comparison of two sequences | | | [10] | |
0J.PY | | | | | | | [11] | |
PTRStalker | 2012 | downloadable | unit position, multiple sequence alignment | Ab-initio detection of fuzzy tandem repeats in protein amino acid sequences. | no | [12] | ||
TRDistiller | 2015 | | | Rapid sorting of tandem repeat (TR)- and no-TR-containing sequences | | | [13] | |
REPRO | 2000 | web | | Repeats detection based on a variation of the Smith-Waterman local alignment strategy followed by a graph-based iterative clustering procedure | no | no | [14] | |
REP | 2000 | web | | | no | yes | | |
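
Several of the sequence-based tools above start from the spacing between recurring short strings; T-REKS, for example, is described as clustering the lengths between identical short strings. The Python sketch below illustrates only that raw signal. It is not the T-REKS algorithm: it replaces the K-means step with a simple "most frequent distance" vote, and the function names, the default `k = 3`, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def kmer_distance_histogram(seq: str, k: int = 3) -> Counter:
    """Count distances between consecutive occurrences of each identical k-mer."""
    last_seen = {}          # k-mer -> index of its most recent occurrence
    distances = Counter()   # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return distances


def guess_period(seq: str, k: int = 3) -> Optional[int]:
    """Return the most frequent inter-k-mer distance, or None if no k-mer recurs."""
    distances = kmer_distance_histogram(seq, k)
    if not distances:
        return None
    return distances.most_common(1)[0][0]


# Hypothetical toy sequence: a 7-residue unit repeated five times with short flanks.
toy = "MK" + "GSPAETL" * 5 + "QR"
print(guess_period(toy))  # 7
```

A perfect repeat like the toy sequence yields a single dominant distance. Real repeats are usually degenerate, which is one reason the listed tools add clustering, alignment, or profile-based steps on top of a signal like this.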
Structure-based annotation methods
Name | Last update | Usage | Result types | Description | Open source? | Repeat type specific | Reference |
---|---|---|---|---|---|---|---|
TAPO | 2016 | web | unit position | Uses periodicities of atomic coordinates and other types of structural representation, including strings generated by conformational alphabets, residue contact maps, and arrangements of vectors of secondary structure elements | no | no | [15] |
SYMD | 2014 | galaxy | repeat geometry | Detects internally symmetric protein structures through an “alignment scan” procedure in which a protein structure is aligned to itself after circularly permuting the second copy by every possible number of residues | no | no | [16] |
RAPHAEL | 2012 | web | repeat probability | Reduces the three-dimensional structure to a wave function, then determines periodicity information from it | no | no | [17] |
CE-SYMM | 2021 | ||||||
ProSTRIP | 2010 | ||||||
DAVROS | 2004 | ||||||
RQA | 2009 | ||||||
OPAAS | 2006 | ||||||
Gplus | 2009 | ||||||
REUPRED | 2016 | ||||||
ConSole | 2015 | ||||||
RepeatsDB-Lite | 2017 | ||||||
PRIGSA | 2014 | ||||||
Swelfe | 2008 | ||||||
Frustratometer | 2021 |
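
The "alignment scan" idea described for SYMD (compare a structure with a circularly permuted copy of itself at every possible offset) can be illustrated on a one-dimensional stand-in. The Python sketch below is a toy, not SymD: it works on a string of per-residue symbols, such as the conformational-alphabet strings mentioned for TAPO, and scores each circular shift by the fraction of identical positions instead of performing a 3D structural alignment. The function names and the toy string are assumptions.

```python
from typing import Dict, Optional


def circular_shift_scores(symbols: str) -> Dict[int, float]:
    """Score every non-trivial circular shift by the fraction of matching positions."""
    n = len(symbols)
    scores = {}
    for shift in range(1, n):
        rotated = symbols[shift:] + symbols[:shift]  # circular permutation
        matches = sum(a == b for a, b in zip(symbols, rotated))
        scores[shift] = matches / n
    return scores


def best_shift(symbols: str) -> Optional[int]:
    """Return the smallest shift with the highest score, or None for inputs shorter than 2."""
    scores = circular_shift_scores(symbols)
    if not scores:
        return None
    # Ties are broken in favour of the smallest shift, i.e. the shortest repeat unit.
    return max(scores, key=lambda s: (scores[s], -s))


# Hypothetical toy string: a 12-symbol helix/loop/strand unit repeated four times.
toy = "HHHHLLEEEELL" * 4
print(best_shift(toy))  # 12
```

Shifts of 12, 24, and 36 all superimpose the toy string on itself perfectly; reporting the smallest one recovers the length of a single unit.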
References
- ^ Fournier D, Palidwor GA, Shcherbinin S, Szengel A, Schaefer MH, Perez-Iratxeta C, Andrade-Navarro MA (21 November 2013). "Functional and genomic analyses of alpha-solenoid proteins". PLOS ONE. 8 (11): e79894. Bibcode:2013PLoSO...879894F. doi:10.1371/journal.pone.0079894. PMC 3837014. PMID 24278209.
- ^ Wright ES (2015). "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R". The R Journal. 8 (1): 352–359. doi:10.1186/s12859-015-0749-z. PMC 4595117. PMID 26445311.
- ^ Szklarczyk, Radek; Heringa, Jaap (2004-08-04). "Tracking repeats using significance and transitivity". Bioinformatics. 20 (Suppl 1): i311–317. doi:10.1093/bioinformatics/bth911. ISSN 1367-4811. PMID 15262814.
- ^ Jorda, Julien; Kajava, Andrey V. (2009-10-15). "T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm". Bioinformatics. 25 (20): 2632–2638. doi:10.1093/bioinformatics/btp482. ISSN 1367-4811. PMID 19671691.
- ^ Zimmermann, Lukas; Stephens, Andrew; Nam, Seung-Zin; Rau, David; Kübler, Jonas; Lozajic, Marko; Gabler, Felix; Söding, Johannes; Lupas, Andrei N. (2018-07-20). "A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core". Journal of Molecular Biology. Computation Resources for Molecular Biology. 430 (15): 2237–2243. doi:10.1016/j.jmb.2017.12.007. ISSN 0022-2836. PMID 29258817. S2CID 22415932.
- ^ Heger, Andreas; Holm, Liisa (2000). "Rapid automatic detection and alignment of repeats in protein sequences". Proteins: Structure, Function, and Genetics. 41 (2): 224–237. doi:10.1002/1097-0134(20001101)41:2<224::aid-prot70>3.0.co;2-z. ISSN 0887-3585. PMID 10966575. S2CID 21757391.
- ^ Lopez, Rodrigo; Paern, Juri; Squizzato, Silvano; Valentin, Franck; Li, Weizhong; McWilliam, Hamish; Goujon, Mickael (2010-07-01). "A new bioinformatics analysis tools framework at EMBL–EBI". Nucleic Acids Research. 38 (suppl_2): W695–W699. doi:10.1093/nar/gkq313. ISSN 0305-1048. PMC 2896090. PMID 20439314.
- ^ Newman, Aaron M.; Cooper, James B. (2007-10-11). "XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences". BMC Bioinformatics. 8 (1): 382. doi:10.1186/1471-2105-8-382. ISSN 1471-2105. PMC 2233649. PMID 17931424.
- ^ Anisimova, Maria; Xenarios, Ioannis; Zoller, Stefan; Stockinger, Heinz; Murri, Riccardo; Messina, Antonio; Pečerska, Jūlija; Korsunsky, Alexander; Schaper, Elke (2015-09-15). "TRAL: tandem repeat annotation library". Bioinformatics. 31 (18): 3051–3053. doi:10.1093/bioinformatics/btv306. hdl:20.500.11850/103876. ISSN 1367-4803. PMID 25987568.
- ^ Sonnhammer, E. L.; Durbin, R. (1995-12-29). "A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis". Gene. 167 (1–2): GC1–10. doi:10.1016/0378-1119(95)00714-8. ISSN 0378-1119. PMID 8566757.
- ^ Wise, M. J. (2001). "0j.py: a software tool for low complexity proteins and protein domains". Bioinformatics. 17 (Suppl 1): S288–295. doi:10.1093/bioinformatics/17.suppl_1.s288. ISSN 1367-4803. PMID 11473020.
- ^ Pellegrini, Marco; Renda, Maria Elena; Vecchio, Alessio (2012-03-21). "Ab initio detection of fuzzy amino acid tandem repeats in protein sequences". BMC Bioinformatics. 13 (3): S8. doi:10.1186/1471-2105-13-S3-S8. ISSN 1471-2105. PMC 3402919. PMID 22536906.
- ^ Richard, François D.; Kajava, Andrey V. (2014-06-01). "TRDistiller: A rapid filter for enrichment of sequence datasets with proteins containing tandem repeats". Journal of Structural Biology. 186 (3): 386–391. doi:10.1016/j.jsb.2014.03.013. ISSN 1047-8477. PMID 24681324.
- ^ George, Richard A.; Heringa, Jaap (October 2000). "The REPRO server: finding protein internal sequence repeats through the Web". Trends in Biochemical Sciences. 25 (10): 515–517. doi:10.1016/s0968-0004(00)01643-1. ISSN 0968-0004. PMID 11203383.
- ^ Do Viet, Phuong; Roche, Daniel B.; Kajava, Andrey V. (2015-09-14). "TAPO: A combined method for the identification of tandem repeats in protein structures". FEBS Letters. 589 (19 Pt A): 2611–2619. doi:10.1016/j.febslet.2015.08.025. ISSN 1873-3468. PMID 26320412.
- ^ Tai, Chin-Hsien; Paul, Rohit; KC, Dukka; Shilling, Jeffery D.; Lee, Byungkook (2014-07-01). "SymD webserver: a platform for detecting internally symmetric protein structures". Nucleic Acids Research. 42 (Web Server issue): W296–W300. doi:10.1093/nar/gku364. ISSN 0305-1048. PMC 4086132. PMID 24799435.
- ^ Walsh, Ian; Sirocco, Francesco G.; Minervini, Giovanni; Di Domenico, Tomás; Ferrari, Carlo; Tosatto, Silvio C. E. (2012-09-08). "RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures". Bioinformatics. 28 (24): 3257–3264. doi:10.1093/bioinformatics/bts550. ISSN 1460-2059. PMID 22962341.