Time-resolved RNA sequencing

Time-resolved RNA sequencing methods are applications of RNA-seq that allow RNA abundances to be observed over time in one or more biological samples. Second-generation DNA sequencing has enabled cost-effective, high-throughput, and unbiased analysis of the transcriptome.[1] Normally, RNA-seq captures only a snapshot of the transcriptome at the time of sample collection.[1] Following the transcriptome over time therefore requires sampling at multiple time points, which increases both the monetary and time costs of experiments. Methodological and technological innovations have allowed the transcriptome to be analyzed over time without requiring a separate sampling at each time point.

Background

While DNA encodes all of the functional elements of life, the encoded information must be converted into a functional form. Following the central dogma of molecular biology, messenger RNA carries genetic information for producing proteins, which, alongside functional RNA, carry out the majority of cellular processes required for life.[2] Changes in RNA abundance may be used to measure changes in cellular behavior, such as those that occur during heat stress, viral infection, or oncogenesis.[3] Knowing how the transcriptome changes during cellular processes allows for greater understanding of the mechanisms underlying these processes.

Originally, transcriptome-wide RNA abundance could only be assessed using methods such as DNA microarrays or serial analysis of gene expression (SAGE).[4][5] Each method has its own limitations: microarrays, while cheap, provide inconsistent results,[6] and SAGE is based on Sanger sequencing, which provides limited throughput. With second-generation sequencing, instead of measuring relative hybridization of sequences to probes (as with microarrays) or sequencing short segments (as with SAGE), a researcher can sequence the bulk RNA within a sample and measure the relative abundance of each RNA species by counting the number of times it was sequenced in that sample.
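The idea of measuring relative abundance by counting sequenced molecules can be illustrated with a minimal sketch. It assumes reads have already been assigned to transcripts, and it uses counts per million (CPM), one common scaling that is chosen here only for illustration; the gene names and counts are hypothetical.

```python
def counts_per_million(read_counts):
    """Scale raw read counts so that the values for one sample sum to one million."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


# Hypothetical read counts for three transcripts in a single sample.
sample = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(sample)

for gene, value in cpm.items():
    print(gene, value)
```

Scaling by the total number of reads makes samples sequenced to different depths comparable, but it does not account for transcript length or composition effects, which dedicated analysis tools handle.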

Normally, in a traditional RNA-seq, microarray, or SAGE experiment, RNA is extracted from a biological sample such as cultured cells and analyzed using the chosen method. The data obtained from such an experiment correspond to the abundance of RNA under the given experimental conditions at the time of harvest. For many applications, such as comparing the abundance of mRNA molecules between cells exposed to a drug and those not exposed to it, this type of experimental approach is sufficient. However, many cellular processes of scientific and medical interest occur over time, such as cellular differentiation or phagocytosis.[7][8] Studying such processes requires analysis of RNA abundance across a series of time points.

Methods

Comparison of time-resolved RNA-seq methods. Time series sampling requires samples from both before and after all time points, as well as separate sequencing of each biological sample. Affinity purification reduces the number of biological samples required, but increases the number of sequencing runs required. Nucleotide conversion requires the fewest biological samples and sequencing runs overall. h = hours.

Time series samples

Sample preparation and data processing

The simplest approach to assessing RNA abundance over time is to use multiple samples that are treated in exactly the same way, except for the duration of treatment. For example, to investigate a biological process that is estimated to last about an hour, a researcher might design an experiment in which the process is triggered for five minutes, 15 minutes, 30 minutes, 45 minutes, one hour, and two hours in separate cell culture samples before the cells are harvested for RNA-seq analysis. The researcher would then have measurements of the transcriptome at each of these time points, and comparing these samples would indicate which cellular processes are activated and deactivated over time.
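A comparison between time points of this kind can be sketched as a log2 fold change relative to the earliest sample. The untreated 0-minute sample, the pseudocount, and the abundance values below are assumptions made for illustration and do not come from any particular study.

```python
import math


def log2_fold_changes(abundance_by_minute, pseudocount=1.0):
    """Return the log2 fold change of each time point relative to the earliest one."""
    baseline_time = min(abundance_by_minute)
    baseline = abundance_by_minute[baseline_time] + pseudocount
    return {
        minute: math.log2((abundance + pseudocount) / baseline)
        for minute, abundance in sorted(abundance_by_minute.items())
    }


# Hypothetical normalized abundance of one transcript at each harvest time (minutes).
series = {0: 99.0, 5: 99.0, 15: 199.0, 30: 399.0, 45: 399.0, 60: 199.0, 120: 99.0}
changes = log2_fold_changes(series)

for minute, change in changes.items():
    print(minute, round(change, 2))
```

In this made-up series the transcript is induced between 5 and 30 minutes and returns to baseline by two hours. The pseudocount only guards against division by zero for unexpressed transcripts.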

Strengths

This method is the most common for measurement of RNA over time in cell culture models, mainly due to its simplicity. Each biological sample need only be processed in exactly the same way, and the factor of time is easily adjusted in most experimental protocols. Furthermore, since each time point is its own sample, more RNA can be harvested and sequenced for a study.

Weaknesses

The requirement of multiple samples for time-resolved data collection increases the cost of the experiment and introduces a greater potential for technical errors. While the price of massively parallel sequencing has decreased greatly since its introduction, it is still prohibitively expensive for many laboratories to conduct large-scale RNA-seq studies. This issue is compounded by the fact that the number of samples scales with the number of time points; using two time points rather than one doubles the number of samples required in an experiment. Consequently, many studies that use time series RNA-seq become limited in their sample size, which reduces statistical power,[9] in the number of time points, which reduces their time resolution, or in both. Finally, a greater number of biological samples carries a greater risk of human error affecting the results, which may lead to spurious conclusions.[10][11]
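The scaling described above can be made concrete with a short calculation. The design below (six time points, three replicates, treated and control conditions) is a hypothetical example, and the function name is illustrative.

```python
def samples_required(time_points, replicates, conditions=1):
    """Number of separately processed and sequenced samples in a time series design."""
    if min(time_points, replicates, conditions) < 1:
        raise ValueError("all design parameters must be at least 1")
    return time_points * replicates * conditions


# Hypothetical design: 6 time points, 3 biological replicates, 2 conditions.
single_point = samples_required(time_points=1, replicates=3, conditions=2)
full_series = samples_required(time_points=6, replicates=3, conditions=2)

print(single_point)  # 6
print(full_series)   # 36
```

With a fixed budget, every added time point must be paid for with fewer replicates or fewer conditions, which is the trade-off between statistical power and time resolution.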

Affinity purification

Comparison of metabolic labeling workflows.

Sample preparation and data processing

In this approach, cells are cultured with tagged nucleotides that allow newly synthesized RNA molecules to be selectively purified. One popular approach is pulse labeling with 4-thiouridine (4-sU), a uridine analogue that is incorporated into newly synthesized RNA molecules.[12] In this type of experiment, a researcher supplements cells with 4-sU at the start of the experimental treatment or shortly beforehand, so that RNA synthesized while the treatment is affecting RNA expression is labeled with 4-sU. The labeled RNA carries a reactive thiol group, making it possible to link useful molecules to it.[13] Biotin is a popular choice for this type of assay, as it is inexpensive and binds very strongly and selectively to streptavidin. Incubating the biotinylated RNA with streptavidin-coated beads allows the newly synthesized RNA to be selectively purified. Newly synthesized and total RNA are then sequenced separately and compared.
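One way the comparison of newly synthesized and total RNA can be turned into a kinetic estimate is sketched below. The sketch makes assumptions that go beyond the description above: the transcript is at steady state and decays exponentially, so the fraction of newly synthesized RNA after a labeling time t is 1 - exp(-k t), and the two libraries have already been normalized to each other. The function name and example values are hypothetical.

```python
import math


def half_life_from_new_fraction(new_abundance, total_abundance, labeling_hours):
    """Estimate RNA half-life (hours) from the new-to-total ratio.

    Assumes steady state and first-order decay: new/total = 1 - exp(-k * t).
    """
    if total_abundance <= 0 or labeling_hours <= 0:
        raise ValueError("total abundance and labeling time must be positive")
    fraction_new = new_abundance / total_abundance
    if not 0.0 < fraction_new < 1.0:
        raise ValueError("new/total ratio must lie strictly between 0 and 1")
    decay_rate = -math.log(1.0 - fraction_new) / labeling_hours
    return math.log(2.0) / decay_rate


# Hypothetical transcript: half of the total RNA is labeled after one hour.
half_life = half_life_from_new_fraction(50.0, 100.0, labeling_hours=1.0)
print(round(half_life, 3))
```

When half of a transcript pool has been replaced during the labeling period, the estimated half-life equals the labeling time. Ratios close to 0 or 1 give unstable estimates, so the labeling time would need to suit the transcripts of interest.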

Strengths

Affinity purification makes use of the widely used biotin-streptavidin system for fractionation of biological materials. Binding of biotin to streptavidin is extremely strong (Kd < 10⁻¹⁴ mol/L).[14] It is also highly specific, which results in minimal background signal from non-specific binding events. Furthermore, time resolution is obtained within a single biological sample, which reduces biological variability compared to using separate samples for each time point.

Weaknesses

The weaknesses of this method mainly concern efficiency. One major difficulty is the uptake of 4-sU into cultured cells. If 4-sU is given too early, it will be incorporated into RNA that was synthesized before the cells began responding to the experimental conditions. If it is given too late, the early stages of the cellular response are not captured by the experiment. The rate of 4-sU uptake can be measured, but this requires additional experiments to determine the optimal dosage and timing. Furthermore, these parameters need to be measured in the specific cell lines of interest, as different cell lines may take up 4-sU at different rates.

RNA is also prone to degradation in vitro. Experimental protocols involving RNA commonly include a number of steps to reduce the chance of ribonuclease contamination or spontaneous degradation of samples, as RNA quality affects RNA-seq results.[15] Metabolic labeling involves a number of additional laboratory steps that must be performed on RNA in solution. Since the RNA must be kept unfrozen in solution during these steps, some spontaneous degradation is unavoidable, although usually not to an extent that affects results. A greater risk is ribonuclease contamination, which would render a sample useless, wasting time and resources. Because of these risks, it is important for researchers working with RNA in any capacity to minimize unnecessary handling.

An additional drawback is that, for an equivalent sample size, more sequencing runs are required than in a time-series experiment, because multiple RNA samples corresponding to the initial time point must be sequenced.

Research suggests that 4-sU labeling may result in transcriptional changes on its own, which would affect any results obtained using this method.[16]

Nucleotide conversion

Sample preparation and data processing

Nucleotide conversion works by chemically modifying some nucleotides in newly synthesized RNA so that they are read as different nucleotides during sequencing. SLAMseq and TimeLapse-seq are examples of such approaches.[17][18] As in affinity purification, cells are incubated with 4-sU. After RNA is extracted from the samples, it is treated with iodoacetamide (SLAMseq) or with 2,2,2-trifluoroethylamine and sodium periodate (TimeLapse-seq). This treatment modifies 4-sU so that it is read as cytosine instead of uracil during sequencing, through alkylation in SLAMseq and through conversion into a cytosine analogue in TimeLapse-seq. During sequence alignment and data processing, the U-to-C conversions are used to quantify the number of transcripts that are newly synthesized compared to bulk RNA.[19]
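The in silico step can be sketched in simplified form. In sequencing data the U-to-C conversions appear as T-to-C mismatches between a read and the reference. The sketch below assumes ungapped alignments given as pairs of equal-length strings, reads from the same strand as the transcript, and no sequencing errors or genetic variants; the function names, the threshold, and the sequences are hypothetical. Dedicated pipelines such as the one described in [19] handle these complications.

```python
def count_t_to_c(reference, read):
    """Count positions where the reference has T and the aligned read has C."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length (ungapped)")
    return sum(
        1
        for ref_base, read_base in zip(reference, read)
        if ref_base == "T" and read_base == "C"
    )


def fraction_new(alignments, min_conversions=1):
    """Fraction of reads with at least `min_conversions` T-to-C mismatches."""
    if not alignments:
        raise ValueError("no alignments supplied")
    labeled = sum(
        1 for reference, read in alignments
        if count_t_to_c(reference, read) >= min_conversions
    )
    return labeled / len(alignments)


# Hypothetical (reference, read) pairs for four reads from one transcript.
alignments = [
    ("ACGTTGCAT", "ACGTTGCAT"),  # no conversion
    ("ACGTTGCAT", "ACGCTGCAT"),  # one T-to-C conversion
    ("GGTATTCGA", "GGCATCCGA"),  # two T-to-C conversions
    ("TTAGGCTAA", "TTAGGCTAA"),  # no conversion
]
print(fraction_new(alignments))  # 0.5
```

Raising `min_conversions` trades sensitivity for robustness against background mismatches, which is one reason real analyses usually model conversion rates statistically rather than applying a fixed cutoff.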

Strengths

This method shares many strengths with affinity purification, notably that multiple samples are not required for a time series. It also eliminates the need for separate sequencing runs for each time point, as all RNA is sequenced together and labeled RNA is separated from unlabeled RNA in silico. This reduces sequencing costs significantly, since time resolution is obtained without additional samples or additional sequencing runs. Furthermore, because the time points are sequenced together, technical variability introduced by sample processing is reduced, in addition to the reduced biological variability provided by the 4-sU experimental strategy.

Weaknesses

As with its strengths, this method shares many weaknesses with affinity purification methods, notably the difficulty of 4-sU uptake and the increased sample handling. Since methods such as TimeLapse-seq rely upon synthetic chemistry to convert nucleotides, incomplete reactions result in an underestimation of the abundance of newly synthesized RNA and may result in variability between samples.
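The underestimation caused by incomplete conversion can be illustrated with a simple probabilistic sketch. It assumes, purely for illustration, that each uridine position in a read from newly synthesized RNA is independently read as cytosine with the same probability (combining 4-sU incorporation and chemical conversion efficiency). This binomial model, the function names, and the numbers are assumptions and are not taken from the cited methods.

```python
def detection_probability(conversion_rate, uridines_in_read):
    """Probability that a read from new RNA shows at least one conversion."""
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    if uridines_in_read < 0:
        raise ValueError("uridines_in_read must not be negative")
    return 1.0 - (1.0 - conversion_rate) ** uridines_in_read


def corrected_new_fraction(observed_fraction, conversion_rate, uridines_in_read):
    """Scale the observed labeled fraction by the chance of detecting a new read."""
    p_detect = detection_probability(conversion_rate, uridines_in_read)
    if p_detect == 0.0:
        raise ValueError("new reads cannot be detected with these parameters")
    return min(1.0, observed_fraction / p_detect)


# Hypothetical values: 2% per-position conversion, 25 uridines per read,
# and 12% of reads observed with at least one conversion.
p_detect = detection_probability(0.02, 25)
estimate = corrected_new_fraction(0.12, 0.02, 25)
print(round(p_detect, 3), round(estimate, 3))
```

Under this model, a lower conversion rate or a read covering fewer uridines lowers the chance that a new transcript is recognized, so the uncorrected fraction is always an underestimate.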

Nascent transcript sequencing

Sample preparation and data processing

Unlike metabolic labeling, native elongating transcript sequencing (NET-seq), a form of nascent transcript sequencing, directly sequences transcripts that are still being transcribed by RNA polymerase II.[20] This method allows the dynamics of transcription elongation to be studied, which is not possible with metabolic labeling techniques. In a NET-seq experiment, cells are treated as in a standard RNA-seq experiment until they are lysed. Lysis is performed such that RNA-protein complexes remain intact, and RNA polymerase II is immunoprecipitated from the lysate. RNA that was being transcribed from DNA is still attached to RNA polymerase; it is subsequently eluted from the polymerase and sequenced.

Strengths

Since NET-seq extracts transcripts that have not completed transcription, it is possible to obtain single-nucleotide resolution on the most recently synthesized nucleotide of transcripts. This is valuable in the study of phenomena such as transcriptional kinetics. Furthermore, it allows for the study of unstable transcripts which are degraded shortly after transcription. The general approach of immunoprecipitating RNA-binding proteins has great utility in understanding other areas of RNA biology, such as splicing.
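The single-nucleotide readout can be sketched as a pileup of transcript 3′ ends. The sketch assumes that each alignment is given as a 0-based, half-open interval with a strand that matches the strand of the RNA; whether this holds depends on the library preparation, and the coordinates below are hypothetical.

```python
from collections import Counter


def three_prime_end(start, end, strand):
    """Genomic coordinate of the last transcribed nucleotide of an aligned read.

    Coordinates are 0-based and half-open: the read covers [start, end).
    """
    if strand == "+":
        return end - 1
    if strand == "-":
        return start
    raise ValueError("strand must be '+' or '-'")


def polymerase_profile(alignments):
    """Count reads by the strand and position of their 3' end."""
    return Counter(
        (strand, three_prime_end(start, end, strand))
        for start, end, strand in alignments
    )


# Hypothetical alignments as (start, end, strand) tuples.
reads = [(100, 130, "+"), (105, 130, "+"), (200, 240, "-"), (110, 141, "+")]
profile = polymerase_profile(reads)

for (strand, position), count in sorted(profile.items()):
    print(strand, position, count)
```

Positions where many 3′ ends accumulate are candidate sites where the polymerase dwells, which is the kind of signal used to study transcriptional kinetics.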

Weaknesses

This method relies upon immunoprecipitation of RNA polymerase II, which has a number of known issues, including non-specific binding interactions that may result in the immunoprecipitation of off-target RNA molecules. The temporal information provided by NET-seq is limited to transcription elongation. While relative abundances of transcripts can be compared using NET-seq, this is not the purpose of the method.

Future directions

Aside from time-series sampling, there are currently no methods for comparing more than two time points. Metabolic labeling experiments can only compare RNA abundances before and after pulse labeling. Being able to observe changes in the transcriptome over a series of time points in a single sample is of interest, as this would increase the time resolution of studies. Existing metabolic labeling methods are a possible starting point: if several different metabolic labels were used at different time points, intermediate time points could be investigated. However, such approaches must be developed with care, as biases in labeling methods and sample processing steps could contribute to misleading results if data from different methods are compared to one another.

Metabolic labeling with 4-sU has been reported to affect cellular phenotype.[16] In current practice, this is unavoidable and is tolerated because the data obtained still fit current biological models and because, in most cases, 4-sU samples are compared with other 4-sU samples. However, this has the potential to result in spurious conclusions, especially if there is any interaction between the effect of 4-sU and the chosen experimental condition. In that case, it is not possible to tell whether differences in RNA levels are due to the experimental conditions being studied or to the 4-sU treatment. Identifying labeling chemicals that do not affect cellular phenotype would eliminate these issues altogether.

References

1. Wang Z, Gerstein M, Snyder M (January 2009). "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews. Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
2. Crick FH (1958). "On Protein Synthesis". In Sanders FK (ed.). Symposia of the Society for Experimental Biology, Number XII: The Biological Replication of Macromolecules. Cambridge University Press. pp. 138–163.
3. Ozsolak F, Milos PM (February 2011). "RNA sequencing: advances, challenges and opportunities". Nature Reviews. Genetics. 12 (2): 87–98. doi:10.1038/nrg2934. PMC 3031867. PMID 21191423.
4. Schena M, Shalon D, Davis RW, Brown PO (October 1995). "Quantitative monitoring of gene expression patterns with a complementary DNA microarray". Science. 270 (5235): 467–470. Bibcode:1995Sci...270..467S. doi:10.1126/science.270.5235.467. PMID 7569999. S2CID 6720459.
5. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (October 1995). "Serial analysis of gene expression". Science. 270 (5235): 484–487. Bibcode:1995Sci...270..484V. doi:10.1126/science.270.5235.484. PMID 7570003. S2CID 16281846.
6. Okoniewski MJ, Miller CJ (June 2006). "Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations". BMC Bioinformatics. 7: 276. doi:10.1186/1471-2105-7-276. PMC 1513401. PMID 16749918.
7. Spies D, Ciaudo C (2015). "Dynamics in Transcriptomics: Advancements in RNA-seq Time Course and Downstream Analysis". Computational and Structural Biotechnology Journal. 13: 469–477. doi:10.1016/j.csbj.2015.08.004. PMC 4564389. PMID 26430493.
8. Hejblum BP, Skinner J, Thiébaut R (June 2015). "Time-Course Gene Set Analysis for Longitudinal Gene Expression Data". PLOS Computational Biology. 11 (6): e1004310. Bibcode:2015PLSCB..11E4310H. doi:10.1371/journal.pcbi.1004310. PMC 4482329. PMID 26111374.
9. Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, et al. (June 2016). "How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?". RNA. 22 (6): 839–851. doi:10.1261/rna.053959.115. PMC 4878611. PMID 27022035.
10. McIntyre LM, Lopiano KK, Morse AM, Amin V, Oberg AL, Young LJ, Nuzhdin SV (June 2011). "RNA-seq: technical variability and sampling". BMC Genomics. 12: 293. doi:10.1186/1471-2164-12-293. PMC 3141664. PMID 21645359.
11. Liu Y, Zhou J, White KP (February 2014). "RNA-seq differential expression studies: more sequence or more replication?". Bioinformatics. 30 (3): 301–304. doi:10.1093/bioinformatics/btt688. PMC 3904521. PMID 24319002.
12. Dölken L, Ruzsics Z, Rädle B, Friedel CC, Zimmer R, Mages J, et al. (September 2008). "High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay". RNA. 14 (9): 1959–1972. doi:10.1261/rna.1136108. PMC 2525961. PMID 18658122.
13. Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, et al. (May 2011). "Global quantification of mammalian gene expression control". Nature. 473 (7347): 337–342. Bibcode:2011Natur.473..337S. doi:10.1038/nature10098. PMID 21593866. S2CID 205224972.
14. Green NM (1975). "Avidin". Advances in Protein Chemistry. 29: 85–133. doi:10.1016/S0065-3233(08)60411-8. ISBN 9780120342297. PMID 237414.
15. Gallego Romero I, Pai AA, Tung J, Gilad Y (May 2014). "RNA-seq: impact of RNA degradation on transcript quantification". BMC Biology. 12: 42. doi:10.1186/1741-7007-12-42. PMC 4071332. PMID 24885439.
16. Burger K, Mühl B, Kellner M, Rohrmoser M, Gruber-Eber A, Windhager L, et al. (October 2013). "4-thiouridine inhibits rRNA synthesis and causes a nucleolar stress response". RNA Biology. 10 (10): 1623–1630. doi:10.4161/rna.26214. PMC 3866244. PMID 24025460.
17. Herzog VA, Reichholf B, Neumann T, Rescheneder P, Bhat P, Burkard TR, et al. (December 2017). "Thiol-linked alkylation of RNA to assess expression dynamics". Nature Methods. 14 (12): 1198–1204. doi:10.1038/nmeth.4435. PMC 5712218. PMID 28945705.
18. Schofield JA, Duffy EE, Kiefer L, Sullivan MC, Simon MD (March 2018). "TimeLapse-seq: adding a temporal dimension to RNA sequencing through nucleoside recoding". Nature Methods. 15 (3): 221–225. doi:10.1038/nmeth.4582. PMC 5831505. PMID 29355846.
19. Neumann T, Herzog VA, Muhar M, von Haeseler A, Zuber J, Ameres SL, Rescheneder P (May 2019). "Quantification of experimentally induced nucleotide conversions in high-throughput sequencing datasets". BMC Bioinformatics. 20 (1): 258. doi:10.1186/s12859-019-2849-7. PMC 6528199. PMID 31109287.
20. Churchman LS, Weissman JS (January 2011). "Nascent transcript sequencing visualizes transcription at nucleotide resolution". Nature. 469 (7330): 368–373. Bibcode:2011Natur.469..368C. doi:10.1038/nature09652. PMC 3880149. PMID 21248844.