Serratus is a large-scale viroinformatics platform for uncovering the total genetic diversity of Earth's virome. The project began with the goal of uncovering novel coronaviruses[1] that may have been incidentally sequenced by other researchers, and later expanded to encompass all RNA viruses, meaning those that encode a viral RNA-dependent RNA polymerase (RdRp).

Serratus
Stable release: v210110 / January 10th 2023
Operating system: Linux, web-based
Type: Bioinformatics
License: code, GPLv3; data, CC0
Website: serratus.io

By the end of 2020, approximately 15,000 distinct RNA virus sequences were known from public databases, measured by the number of distinct RdRp sequences (those differing by more than 10% in amino acid sequence). Using a bioinformatics workflow optimized for large-scale cloud computing, the research team analyzed 5.7 million freely available sequencing datasets (20.4 petabytes of raw data) in the Sequence Read Archive (SRA) in only 11 days, at a computing cost of US$23,900.[2] This analysis yielded 132,000 novel viral RdRp sequences, representing nearly an order of magnitude increase in the known genetic diversity of RNA viruses.[3]
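The scale of these figures can be checked with simple arithmetic. The following Python sketch uses only the numbers quoted above; the variable names are illustrative and are not part of Serratus.

```python
# Figures quoted in the paragraph above (approximate values).
KNOWN_RDRP = 15_000      # distinct RdRp known by the end of 2020
NOVEL_RDRP = 132_000     # novel RdRp reported by the analysis
DATASETS = 5_700_000     # sequencing datasets analyzed
DAYS = 11                # duration of the analysis
COST_USD = 23_900        # reported computing cost

# Growth in known RdRp diversity relative to the 2020 baseline.
fold_increase = (KNOWN_RDRP + NOVEL_RDRP) / KNOWN_RDRP

# Average cost and throughput implied by the reported totals.
cost_per_dataset = COST_USD / DATASETS
datasets_per_day = DATASETS / DAYS

print(f"Fold increase in known RdRp: {fold_increase:.1f}")
print(f"Cost per dataset: US${cost_per_dataset:.4f}")
print(f"Datasets per day: {datasets_per_day:,.0f}")
```

Under these figures the total rises from about 15,000 to about 147,000 distinct RdRp, which is consistent with the "nearly an order of magnitude" description, and the average cost works out to well under one US cent per dataset.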

Within the database, RNA viruses are classified according to their RdRp palmprint,[4] a type of molecular barcode. The palmprint can be used as a computationally efficient index for identifying which SRA sequencing runs contain a particular RNA virus. Such an index allows targeted analysis of the raw sequencing datasets, from which novel RNA viruses can be characterized.[5]
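The idea of using a palmprint as an index can be illustrated with a minimal inverted index that maps each palmprint to the set of sequencing runs in which it was found. This is a sketch of the general technique only, not the Serratus implementation: the function names, the run labels (`RUN_A`, ...) and the palmprint labels (`pp1`, ...) are hypothetical placeholders.

```python
from collections import defaultdict


def build_index(hits):
    """Build an inverted index from (run, palmprint) pairs.

    `hits` is any iterable of (run_label, palmprint_label) tuples;
    both labels are hypothetical placeholders in this sketch.
    """
    index = defaultdict(set)
    for run, palmprint in hits:
        index[palmprint].add(run)
    return index


def runs_containing(index, palmprint):
    """Return the sorted list of runs in which `palmprint` was observed."""
    return sorted(index.get(palmprint, ()))


# Toy data: which palmprint was detected in which sequencing run.
hits = [
    ("RUN_A", "pp1"),
    ("RUN_B", "pp1"),
    ("RUN_B", "pp2"),
    ("RUN_C", "pp3"),
]

index = build_index(hits)
print(runs_containing(index, "pp1"))  # ['RUN_A', 'RUN_B']
print(runs_containing(index, "pp9"))  # []
```

A lookup in such an index returns only the runs relevant to one virus, so the subsequent analysis can be limited to those raw datasets rather than the whole archive.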

All Serratus data are freely available under the INSDC release policy.

References

  1. ^ Pennisi, Elizabeth. "New dangers? Computers uncover 100,000 novel viruses in old genetic data". www.science.org. Science. Retrieved 13 January 2023.
  2. ^ Pelley, Lauren. "Supercomputer helps Canadian researcher uncover thousands of viruses that could cause human diseases". CBC. Retrieved 13 January 2023.
  3. ^ Edgar RC, Taylor J, Lin V, Altman T, Barbera P, Meleshko D; et al. (2022). "Petabase-scale sequence alignment catalyses viral discovery". Nature. 602 (7895): 142–147. Bibcode:2022Natur.602..142E. doi:10.1038/s41586-021-04332-2. PMID 35082445. S2CID 221141152.
  4. ^ Babaian, Artem; Edgar, Robert (13 October 2022). "Ribovirus classification by a polymerase barcode sequence". PeerJ. 10: e14055. doi:10.7717/peerj.14055. ISSN 2167-8359. PMC 9573346. PMID 36258794.
  5. ^ Cabrera Mederos, Dariel; Debat, Humberto; Torres, Carolina; Portal, Orelvis; Jaramillo Zapata, Margarita; Trucco, Verónica; Flores, Ceferino; Ortiz, Claudio; Badaracco, Alejandra; Acuña, Luis; Nome, Claudia; Quito-Avila, Diego; Bejerman, Nicolas; Castellanos Collazo, Onias; Sánchez-Rodríguez, Aminael; Giolitti, Fabián (October 2022). "An Unwanted Association: The Threat to Papaya Crops by a Novel Potexvirus in Northwest Argentina". Viruses. 14 (10): 2297. doi:10.3390/v14102297. ISSN 1999-4915. PMC 9610017. PMID 36298852.