DIMPL (Discovery of Intergenic Motifs PipeLine) is a bioinformatics pipeline that extracts and selects bacterial GC-rich intergenic regions (IGRs) that are enriched for structured non-coding RNAs (ncRNAs).[1] The underlying method of enriching bacterial IGRs for ncRNA motif discovery was first reported in the study "Genome-wide discovery of structured noncoding RNAs in bacteria".[2]

The DIMPL pipeline automates whole-genome analysis by extracting IGRs, filtering them by length and nucleotide composition, and collecting the data needed to identify candidate motifs and assign their possible functions. It provides reproducible techniques for identifying genomic regions enriched for ncRNAs using support vector machine (SVM) classifiers. It can be used to look for nucleic acid and protein motifs, including riboswitch-like elements, upstream open reading frames (uORFs), short open reading frames (sORFs), ribosomal protein leader sequences, selfish genetic elements and other structured RNA motifs of unknown function.
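
The extraction and filtering steps described above can be illustrated with a minimal sketch. This is not DIMPL's own code: the function names, the fixed length and GC-content cutoffs, and the treatment of the genome as a linear sequence (ignoring the wrap-around of a circular chromosome) are all assumptions made for illustration, and the simple thresholds stand in for the SVM classifiers that the pipeline uses for selection.

```python
from typing import List, Tuple

# Hypothetical helper names; none of these come from the DIMPL code base.
Igr = Tuple[int, int, str]  # (start, end, sequence), 0-based half-open


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 if seq is empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def extract_igrs(genome: str, genes: List[Tuple[int, int]]) -> List[Igr]:
    """Return the gaps between annotated genes as intergenic regions.

    genes holds (start, end) coordinates, 0-based and half-open.
    Assumption: the genome is treated as linear, so a region spanning
    the origin of a circular chromosome is reported as two pieces.
    """
    igrs: List[Igr] = []
    prev_end = 0
    for start, end in sorted(genes):
        if start > prev_end:
            igrs.append((prev_end, start, genome[prev_end:start]))
        # max() handles overlapping or nested gene annotations.
        prev_end = max(prev_end, end)
    if prev_end < len(genome):
        igrs.append((prev_end, len(genome), genome[prev_end:]))
    return igrs


def filter_igrs(igrs: List[Igr], min_len: int, min_gc: float) -> List[Igr]:
    """Keep IGRs that meet both an illustrative length and GC cutoff."""
    return [
        igr for igr in igrs
        if len(igr[2]) >= min_len and gc_content(igr[2]) >= min_gc
    ]


if __name__ == "__main__":
    # Toy genome with two "genes" at positions 6-10 and 16-20.
    genome = "GCGCGC" + "AAAA" + "ATATAT" + "CCCC" + "GG"
    genes = [(6, 10), (16, 20)]
    candidates = filter_igrs(extract_igrs(genome, genes), min_len=5, min_gc=0.5)
    for start, end, seq in candidates:
        print(start, end, seq, round(gc_content(seq), 2))
```

In this toy example only the first region passes both cutoffs; the AT-rich gap fails the GC cutoff and the trailing two-base gap fails the length cutoff. A real analysis would read coordinates from a genome annotation and would choose the selection boundary from the data rather than from fixed numbers.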

DIMPL uses various sequence analysis resources, including:

  • Rfam database,[3] as a reference of known RNA families
  • BLASTX search tool,[4] to eliminate unannotated protein coding regions
  • INFERNAL package,[5][6] to search the IGR sequences
  • CMfinder,[7] to look for possible RNA secondary structure features
  • R-scape software[8] and R2R drawing algorithm,[9] to generate the consensus model
  • RNAcode,[10] to look for the presence of coding regions
  • GenomeView,[11] to visualize the genetic context of the RNA motif

RNA motifs discovered using DIMPL include the HMP-PP riboswitch, the icd-II ncRNA motif, the carA ncRNA motif and the ldh2 ncRNA motif,[12] among others.

References

  1. ^ Brewer, Kenneth I.; Gaffield, Glenn J.; Puri, Malavika; Breaker, Ronald R. (2021-09-15). "DIMPL: a bioinformatics pipeline for the discovery of structured noncoding RNA motifs in bacteria". Bioinformatics. 38 (2): 533–535. doi:10.1093/bioinformatics/btab624. ISSN 1367-4811. PMC 8723152. PMID 34524415.
  2. ^ Stav, Shira; Atilho, Ruben M.; Mirihana Arachchilage, Gayan; Nguyen, Giahoa; Higgs, Gadareth; Breaker, Ronald R. (2019-03-22). "Genome-wide discovery of structured noncoding RNAs in bacteria". BMC Microbiology. 19 (1): 66. doi:10.1186/s12866-019-1433-7. ISSN 1471-2180. PMC 6429828. PMID 30902049.
  3. ^ Kalvari, Ioanna; Nawrocki, Eric P.; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D.; Bateman, Alex; Petrov, Anton I. (2018-06-05). "Non-Coding RNA Analysis Using the Rfam Database". Current Protocols in Bioinformatics. 62 (1): e51. doi:10.1002/cpbi.51. ISSN 1934-340X. PMC 6754622. PMID 29927072.
  4. ^ Camacho, Christiam; Coulouris, George; Avagyan, Vahram; Ma, Ning; Papadopoulos, Jason; Bealer, Kevin; Madden, Thomas L. (2009-12-15). "BLAST+: architecture and applications". BMC Bioinformatics. 10: 421. doi:10.1186/1471-2105-10-421. ISSN 1471-2105. PMC 2803857. PMID 20003500.
  5. ^ Mandiwanza, Tafadzwa; Kaliaperumal, Chandrasekaran; Mulligan, Linda; Ryan, Elizabeth; Looby, Seamus; Caird, John; Brett, Francesca (2017-02-20). "Child with radiologically recurrent thalamic tumor". Brain Pathology. 27 (2): 239–240. doi:10.1111/bpa.12490. ISSN 1015-6305. PMC 8029015. PMID 28217956.
  6. ^ Nawrocki, Eric P.; Eddy, Sean R. (2013-11-15). "Infernal 1.1: 100-fold faster RNA homology searches". Bioinformatics. 29 (22): 2933–2935. doi:10.1093/bioinformatics/btt509. ISSN 1367-4811. PMC 3810854. PMID 24008419.
  7. ^ Yao, Zizhen; Weinberg, Zasha; Ruzzo, Walter L. (2006-02-15). "CMfinder--a covariance model based RNA motif finding algorithm". Bioinformatics. 22 (4): 445–452. doi:10.1093/bioinformatics/btk008. ISSN 1367-4803. PMID 16357030.
  8. ^ Rivas, Elena; Clements, Jody; Eddy, Sean R. (2016-11-07). "A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs". Nature Methods. 14 (1): 45–48. doi:10.1038/nmeth.4066. ISSN 1548-7105. PMC 5554622. PMID 27819659.
  9. ^ Weinberg, Zasha; Breaker, Ronald R. (2011-01-04). "R2R--software to speed the depiction of aesthetic consensus RNA secondary structures". BMC Bioinformatics. 12: 3. doi:10.1186/1471-2105-12-3. ISSN 1471-2105. PMC 3023696. PMID 21205310.
  10. ^ Washietl, Stefan; Findeiss, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick (2011-02-28). "RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data". RNA. 17 (4): 578–594. doi:10.1261/rna.2536111. ISSN 1469-9001. PMC 3062170. PMID 21357752.
  11. ^ Abeel, Thomas; Van Parys, Thomas; Saeys, Yvan; Galagan, James; Van de Peer, Yves (2011-11-18). "GenomeView: a next-generation genome browser". Nucleic Acids Research. 40 (2): e12. doi:10.1093/nar/gkr995. ISSN 1362-4962. PMC 3258165. PMID 22102585.
  12. ^ Brewer, Kenneth I.; Greenlee, Etienne B.; Higgs, Gadareth; Yu, Diane; Mirihana Arachchilage, Gayan; Chen, Xi; King, Nicholas; White, Neil; Breaker, Ronald R. (2021-05-10). "Comprehensive discovery of novel structured noncoding RNAs in 26 bacterial genomes". RNA Biology. 18 (12): 2417–2432. doi:10.1080/15476286.2021.1917891. ISSN 1555-8584. PMC 8632094. PMID 33970790. S2CID 234361097.