Molecular recognition feature

Molecular recognition features (MoRFs) are small (10–70 residues) intrinsically disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are implicated in protein–protein interactions, which serve as the initial step in molecular recognition. MoRFs are disordered before binding but adopt a common 3D structure once bound to their partners.[1][2] As MoRF regions tend to resemble disordered proteins with some characteristics of ordered proteins,[2] they can be classified as existing in an extended semi-disordered state.[3]

Categorization


MoRFs can be separated into four categories according to the secondary structure they adopt once bound to their partners.[2]

The categories are:

  • α-MoRFs (form alpha-helices)
  • β-MoRFs (form beta-strands)
  • irregular-MoRFs (form no regular secondary structure)
  • complex-MoRFs (combine more than one of the above structure types)
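
The four categories can be illustrated with a minimal Python sketch. It assumes the bound conformation is already available as a three-state secondary-structure string (H for helix, E for strand, C for everything else). The function name `classify_morf` and the `margin` cutoff are hypothetical choices made for this example; they are not the criteria used in the cited analysis.

```python
from collections import Counter


def classify_morf(ss: str, margin: float = 0.1) -> str:
    """Assign a bound MoRF to one of the four categories.

    `ss` is a three-state secondary-structure string for the bound region.
    `margin` is an illustrative cutoff (an assumption): if the two most
    common states differ by less than this fraction, no single state
    dominates and the region is called complex.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    ss = ss.upper()
    unknown = set(ss) - set("HEC")
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")

    counts = Counter(ss)
    # Fraction of residues in each of the three states.
    fractions = {state: counts.get(state, 0) / len(ss) for state in "HEC"}
    ranked = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)
    (top_state, top_frac), (_, second_frac) = ranked[0], ranked[1]

    if top_frac - second_frac < margin:
        return "complex"
    return {"H": "alpha", "E": "beta", "C": "irregular"}[top_state]


if __name__ == "__main__":
    for example in ("CHHHHHHHHHHC", "CEEEEEECCC", "CCCCCCCCCC", "HHHHHEEEEE"):
        print(example, classify_morf(example))
```

A majority rule with a tie margin keeps the sketch short. Any real assignment would depend on how secondary structure is derived from the bound complex.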


MoRF predictors


Determining protein structures experimentally is a very time-consuming and expensive process. Therefore, recent years have seen a focus on computational methods for predicting protein structure and structural characteristics. Some aspects of protein structure, such as secondary structure and intrinsic disorder, have benefited greatly from applications of deep learning on an abundance of annotated data. However, computational prediction of MoRF regions remains a challenging task due to the limited availability of annotated data and the rarity of the MoRF class itself.[4] Most current methods have been trained and benchmarked on the sets released by the authors of MoRFPred[5] in 2012, as well as another set released by the authors of MoRFchibi[6][7][8] based on experimentally annotated MoRF data. The table below lists methods available as of 2019 for predicting MoRFs and for related problems such as intrinsic disorder and protein binding regions.[9]

| Predictor | Year Published | Predicts for | Methodology | Uses MSA |
|---|---|---|---|---|
| ANCHOR[10] | 2009 | Protein Binding Regions | Amino acid propensity and energy estimation analysis. | N |
| ANCHOR2[11] | 2018 | Protein Binding Regions | Amino acid propensity and energy estimation analysis. | N |
| DISOPRED3[12] | 2015 | Protein Intrinsic Disorder and Protein Binding Sites | Multistage component prediction (utilizing neural network, Support Vector Machine, and K-nearest neighbour models) for protein disorder prediction. Also uses an additional Support Vector Machine to interpolate binding regions from the disorder predictions. | Y |
| DisoRDPbind[13] | 2015 | RNA, DNA, and Protein Binding Regions | Multiple logistic regression models based on predicted disorder, amino acid properties, and sequence composition. The result is aligned with transferred annotations from a functionally-annotated database. | N |
| fMoRFPred[4] | 2016 | MoRFs | Faster version of MoRFPred without the use of multiple sequence alignments. | N |
| MoRFchibi SYSTEM[6][7][8] | 2015 | MoRFs | Hierarchy of different in-house MoRF prediction models. MoRFchibi: Utilizes Bayes rule to combine the outcomes of two Support Vector Machine modules using amino acid composition (Sigmoid kernel) and sequence similarity (RBF kernel). MoRFchibi_light: Utilizes Bayes rule to combine MoRFchibi and disorder prediction hierarchically. MoRFchibi_web: Utilizes Bayes rule to combine MoRFchibi, disorder prediction and PSSM (MSA) hierarchically. | N/Y |
| MoRFPred[5] | 2012 | MoRFs | Support Vector Machine based on predicted sequence characteristics and alignment of input sequence to known MoRF database. | Y |
| MoRFPred-Plus[14] | 2018 | MoRFs | Combined predictions from two Support Vector Machines, predicting for both MoRF regions and MoRF residues. | Y |
| OPAL[15] | 2018 | MoRFs | Support Vector Machine based on physicochemical properties and predicted structural attributes of protein residues. | Y |
| OPAL+[16] | 2019 | MoRFs | Ensemble of Support Vector Machines trained individually for length-specific MoRF regions. Also incorporates other predictors as a metapredictor. | Y |
| SPINE-D[17][18] | 2012 | Protein Intrinsic Disorder and Semi-Disorder | Neural network for predicting both long and short disordered regions. Semi-disorder can be linearly interpolated from its predicted disorder probabilities (0.4 <= P(D) <= 0.7). | Y |
| SPOT-Disorder[19] | 2017 | Protein Intrinsic Disorder and Semi-Disorder | Bidirectional Long Short-Term Memory network for predicting intrinsic disorder. Semi-disordered regions can be linearly interpolated from its predicted disorder probabilities (0.28 <= P(D) <= 0.69). | Y |
| SPOT-MoRF[20] | 2019 | MoRFs | Transfer learning from the large disorder prediction tool SPOT-Disorder2[21] (which itself utilizes an ensemble of Bidirectional Long Short-Term Memory networks and Inception ResNets). | Y |
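
The semi-disorder ranges quoted for SPINE-D and SPOT-Disorder can be read as a simple filter on per-residue disorder probabilities. The following Python sketch takes that reading as an assumption: it treats a residue as semi-disordered when its predicted P(D) falls inside the quoted range and reports contiguous runs. The function name, the `min_length` parameter, and the input list are hypothetical; neither tool is claimed to expose such an interface.

```python
# Semi-disorder probability ranges as quoted for each predictor above.
SEMI_DISORDER_RANGES = {
    "SPINE-D": (0.40, 0.70),
    "SPOT-Disorder": (0.28, 0.69),
}


def semi_disordered_regions(probs, predictor, min_length=1):
    """Return (start, end) index pairs, zero-based and inclusive, for runs of
    residues whose disorder probability lies in the predictor's range.

    `min_length` (an illustrative parameter) drops runs shorter than the
    given number of residues.
    """
    low, high = SEMI_DISORDER_RANGES[predictor]
    regions = []
    start = None
    for i, p in enumerate(probs):
        inside = low <= p <= high
        if inside and start is None:
            start = i  # a new run begins
        elif not inside and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(probs) - start >= min_length:
        regions.append((start, len(probs) - 1))
    return regions


if __name__ == "__main__":
    # Made-up probabilities for a seven-residue toy sequence.
    toy = [0.1, 0.3, 0.5, 0.65, 0.9, 0.45, 0.2]
    print(semi_disordered_regions(toy, "SPINE-D"))
    print(semi_disordered_regions(toy, "SPOT-Disorder"))
```

Because the two predictors use different ranges, the same probability profile can give different semi-disordered segments, so the range has to be matched to the tool that produced the probabilities.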

Databases


mpMoRFsDB[22]

Mutual Folding Induced by Binding (MFIB) database[23]

References

  1. ^ van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, et al. (July 2014). "Classification of intrinsically disordered regions and proteins". Chemical Reviews. 114 (13): 6589–631. doi:10.1021/cr400525m. PMC 4095912. PMID 24773235.
  2. ^ a b c Mohan A, Oldfield CJ, Radivojac P, Vacic V, Cortese MS, Dunker AK, Uversky VN (October 2006). "Analysis of molecular recognition features (MoRFs)". Journal of Molecular Biology. 362 (5): 1043–59. doi:10.1016/j.jmb.2006.07.087. PMID 16935303.
  3. ^ Zhang T, Faraggi E, Li Z, Zhou Y (2013-05-31). "Intrinsically semi-disordered state and its role in induced folding and protein aggregation". Cell Biochemistry and Biophysics. 67 (3): 1193–205. doi:10.1007/s12013-013-9638-0. PMC 3838602. PMID 23723000.
  4. ^ a b Yan J, Dunker AK, Uversky VN, Kurgan L (March 2016). "Molecular recognition features (MoRFs) in three domains of life". Molecular BioSystems. 12 (3): 697–710. doi:10.1039/C5MB00640F. hdl:1805/11056. PMID 26651072.
  5. ^ a b Disfani FM, Hsu WL, Mizianty MJ, Oldfield CJ, Xue B, Dunker AK, et al. (June 2012). "MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins". Bioinformatics. 28 (12): i75-83. doi:10.1093/bioinformatics/bts209. PMC 3371841. PMID 22689782.
  6. ^ a b Malhis N, Gsponer J (June 2015). "Computational identification of MoRFs in protein sequences". Bioinformatics. 31 (11): 1738–44. doi:10.1093/bioinformatics/btv060. PMC 4443681. PMID 25637562.
  7. ^ a b Malhis N, Wong ET, Nassar R, Gsponer J (2015). "Computational Identification of MoRFs in Protein Sequences Using Hierarchical Application of Bayes Rule". PLOS ONE. 10 (10): e0141603. Bibcode:2015PLoSO..1041603M. doi:10.1371/journal.pone.0141603. PMC 4627796. PMID 26517836.
  8. ^ a b Malhis N, Jacobson M, Gsponer J (July 2016). "MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences". Nucleic Acids Research. 44 (W1): W488-93. doi:10.1093/nar/gkw409. PMC 4987941. PMID 27174932.
  9. ^ Katuwawala A, Peng Z, Yang J, Kurgan L (2019). "Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions". Computational and Structural Biotechnology Journal. 17: 454–462. doi:10.1016/j.csbj.2019.03.013. PMC 6453775. PMID 31007871.
  10. ^ Mészáros B, Simon I, Dosztányi Z (May 2009). "Prediction of protein binding regions in disordered proteins". PLOS Computational Biology. 5 (5): e1000376. Bibcode:2009PLSCB...5E0376M. doi:10.1371/journal.pcbi.1000376. PMC 2671142. PMID 19412530.
  11. ^ Mészáros B, Erdos G, Dosztányi Z (July 2018). "IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding". Nucleic Acids Research. 46 (W1): W329–W337. doi:10.1093/nar/gky384. PMC 6030935. PMID 29860432.
  12. ^ Jones DT, Cozzetto D (March 2015). "DISOPRED3: precise disordered region predictions with annotated protein-binding activity". Bioinformatics. 31 (6): 857–63. doi:10.1093/bioinformatics/btu744. PMC 4380029. PMID 25391399.
  13. ^ Peng Z, Kurgan L (October 2015). "High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder". Nucleic Acids Research. 43 (18): e121. doi:10.1093/nar/gkv585. PMC 4605291. PMID 26109352.
  14. ^ Sharma R, Bayarjargal M, Tsunoda T, Patil A, Sharma A (January 2018). "MoRFPred-plus: Computational Identification of MoRFs in Protein Sequences using Physicochemical Properties and HMM profiles". Journal of Theoretical Biology. 437: 9–16. Bibcode:2018JThBi.437....9S. doi:10.1016/j.jtbi.2017.10.015. hdl:10072/376330. PMID 29042212.
  15. ^ Sharma R, Raicar G, Tsunoda T, Patil A, Sharma A (June 2018). "OPAL: prediction of MoRF regions in intrinsically disordered protein sequences". Bioinformatics. 34 (11): 1850–1858. doi:10.1093/bioinformatics/bty032. hdl:10072/379824. PMID 29360926.
  16. ^ Sharma R, Sharma A, Raicar G, Tsunoda T, Patil A (March 2019). "OPAL+: Length-Specific MoRF Prediction in Intrinsically Disordered Protein Sequences". Proteomics. 19 (6): e1800058. doi:10.1002/pmic.201800058. hdl:10072/382746. PMID 30324701. S2CID 53502553.
  17. ^ Zhang T, Faraggi E, Xue B, Dunker AK, Uversky VN, Zhou Y (2012). "SPINE-D: accurate prediction of short and long disordered regions by a single neural-network based method". Journal of Biomolecular Structure & Dynamics. 29 (4): 799–813. doi:10.1080/073911012010525022. PMC 3297974. PMID 22208280.
  18. ^ Zhang T, Faraggi E, Li Z, Zhou Y (2017). "Intrinsic Disorder and Semi-disorder Prediction by SPINE-D". In Zhou Y, Kloczkowski A, Faraggi E, Yang Y (eds.). Prediction of Protein Secondary Structure. Methods in Molecular Biology. Vol. 1484. New York: Springer. pp. 159–174. doi:10.1007/978-1-4939-6406-2_12. ISBN 9781493964048. PMID 27787826.
  19. ^ Hanson J, Yang Y, Paliwal K, Zhou Y (March 2017). "Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks". Bioinformatics. 33 (5): 685–692. doi:10.1093/bioinformatics/btw678. PMID 28011771.
  20. ^ Hanson J, Litfin T, Paliwal K, Zhou Y (2019-09-05). Gorodkin J (ed.). "Identifying Molecular Recognition Features in Intrinsically Disordered Regions of Proteins by Transfer Learning". Bioinformatics. 36 (4): 1107–1113. doi:10.1093/bioinformatics/btz691. ISSN 1367-4803. PMID 31504193.
  21. ^ Hanson J, Paliwal KK, Litfin T, Zhou Y (2020-03-13). "SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning". Genomics, Proteomics & Bioinformatics. 17 (6): 645–656. doi:10.1016/j.gpb.2019.01.004. ISSN 1672-0229. PMC 7212484. PMID 32173600.
  22. ^ Gypas F, Tsaousis GN, Hamodrakas SJ (October 2013). "mpMoRFsDB: a database of molecular recognition features in membrane proteins". Bioinformatics. 29 (19): 2517–8. doi:10.1093/bioinformatics/btt427. PMID 23894139.
  23. ^ Fichó E, Reményi I, Simon I, Mészáros B (November 2017). "MFIB: a repository of protein complexes with mutual folding induced by binding". Bioinformatics. 33 (22): 3682–3684. doi:10.1093/bioinformatics/btx486. PMC 5870711. PMID 29036655.