Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which only targets active euchromatin,[1] scGET-seq is also capable of probing inactive heterochromatin.[2]

This is achieved through the use of TnH, a transposase created by linking the chromodomain (CD) of heterochromatin protein-1-alpha (HP-1a) to the Tn5 transposase. TnH is thereby able to target histone H3 lysine 9 trimethylation (H3K9me3), a marker of heterochromatin.[3]

Akin to RNA velocity, which uses the ratio of spliced to unspliced RNA to infer the kinetics of changes in gene expression over the course of cellular development,[4] the ratio of TnH to Tn5 signals obtained from scGET-seq can be used to calculate chromatin velocity, which measures the dynamics of chromatin accessibility over the course of cellular developmental pathways.[2]

History

Transcriptional regulation is tightly linked to chromatin states. Chromatin that is open, or permissive to transcription, makes up only 2-3% of the genome but encompasses 94.4% of transcription factor binding sites.[5][6] Conversely, more tightly packed DNA, or heterochromatin, is responsible for genome organization and stability.[7] Chromatin density also changes over the course of cellular differentiation,[8] but there is a lack of high-throughput sequencing methods for directly assaying heterochromatin.

Many genome-related diseases, such as cancer, are closely linked to changes in the epigenome. Cancers in particular are characterized by single-cell heterogeneity, which can drive metastasis and treatment resistance.[9][10] The mechanisms that underlie these processes are still largely unknown, although the advent of single-cell technologies, including single-cell epigenomics, has contributed greatly to their elucidation.[11]

In 2015, ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible chromatin, or euchromatin, for sequencing, became feasible at single-cell resolution.[12] scGET-seq builds upon this technology by also capturing information on heterochromatin, giving a more comprehensive view of chromatin structure and dynamics within each cell.[13]

Methods

 
Figure: Broad overview of how scGET-seq is performed.

Sample preparation

Sample preparation for scGET-seq starts with obtaining a suspension of nuclei from cells using a method appropriate for the starting material.[14]

The next step is to produce the TnH transposase. Tn5 is a transposase that cuts and ligates adapters to genomic regions unbound by nucleosomes (open chromatin).[15] HP-1a is a member of the HP1 family and is able to recognize and specifically bind to H3K9me3.[16][17] Its chromodomain uses an induced-fit mechanism for recognizing this chromatin modification.[18] Linking the first 112 amino acids of HP-1a containing the chromodomain to Tn5 using a three poly-tyrosine-glycine-serine (TGS) linker leads to the creation of the TnH transposase, which is capable of targeting heterochromatin marked by H3K9me3.[2]

Library preparation follows a modified protocol for single-cell ATAC-seq,[19] in which the nuclei suspension is incubated first with the Tn5 transposase and then with TnH.[2]

Data analysis

The goals of the data analysis are:[2]

  1. To identify and characterize distinct cell populations using clustering
  2. To profile chromatin accessibility across the genome
  3. To predict copy-number variants and single-nucleotide variants

Pre-processing

  1. Post-sequencing, reads need to be demultiplexed and mapped to the appropriate reference genome. Duplicated reads are identified and removed.
  2. "Peaks", or regions in the DNA enriched in the number of reads mapped, are identified.[20]
  3. Quality control is performed, and cells with low numbers of reads or few detected features are filtered out.
  4. Four count matrices (matrices where each column is a cell and each row is a feature) are generated: Tn5-dhs, Tn5-complement, TnH-dhs and TnH-complement, representing signal from accessible and compacted chromatin.[2]
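The quality-control step (step 3) can be illustrated with a minimal sketch. It assumes each cell's counts are held as a plain list of per-feature read counts keyed by a cell barcode; the function name `filter_cells` and the thresholds `min_reads` and `min_features` are hypothetical and are not taken from the published pipeline.

```python
from typing import Dict, List


def filter_cells(
    counts: Dict[str, List[int]],
    min_reads: int = 1000,      # assumed threshold, for illustration only
    min_features: int = 200,    # assumed threshold, for illustration only
) -> Dict[str, List[int]]:
    """Keep cells that pass both a total-read and a detected-feature cutoff."""
    kept: Dict[str, List[int]] = {}
    for barcode, row in counts.items():
        total_reads = sum(row)                          # reads assigned to this cell
        detected = sum(1 for count in row if count > 0)  # features with any signal
        if total_reads >= min_reads and detected >= min_features:
            kept[barcode] = row
    return kept


if __name__ == "__main__":
    toy = {"cellA": [5, 0, 7], "cellB": [1, 0, 0], "cellC": [2, 2, 2]}
    print(sorted(filter_cells(toy, min_reads=6, min_features=2)))
```

In practice the same filter would be applied consistently to all four matrices, so that each retains the same set of cells.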

Analysis

Dimension reduction, visualization and clustering

Each of the matrices is filtered of shared regions, then normalized and log2-transformed. Linear dimension reduction is performed using principal component analysis (PCA). Groups of cells are identified using a k-nearest neighbours (k-NN) algorithm[21] and the Leiden algorithm.[22] Finally, the four matrices are combined using matrix factorization[23] and UMAP reduction.[24]
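The normalization, log2 transformation and neighbour search can be sketched as follows. This is a simplified illustration rather than the published implementation: it stores cells as rows (the transpose of the count matrices described above), uses an assumed scaling target `scale`, and finds neighbours by brute force. PCA, Leiden clustering, matrix factorization and UMAP are omitted, and the function names are hypothetical.

```python
import math
from typing import List


def normalize_log2(matrix: List[List[float]], scale: float = 10_000.0) -> List[List[float]]:
    """Scale each cell (row) to a common total, then apply log2(x + 1)."""
    result = []
    for row in matrix:
        total = sum(row)
        factor = scale / total if total > 0 else 0.0  # empty cells stay all-zero
        result.append([math.log2(value * factor + 1.0) for value in row])
    return result


def knn_indices(matrix: List[List[float]], k: int) -> List[List[int]]:
    """Return, for each cell, the indices of its k nearest cells (Euclidean)."""
    neighbours = []
    for i, cell in enumerate(matrix):
        distances = [
            (math.dist(cell, other), j)
            for j, other in enumerate(matrix)
            if j != i
        ]
        distances.sort()  # nearest first; ties broken by cell index
        neighbours.append([j for _, j in distances[:k]])
    return neighbours


if __name__ == "__main__":
    toy = [[1, 1, 0], [2, 2, 0], [0, 0, 4], [0, 0, 8]]
    normalized = normalize_log2(toy, scale=2.0)
    print(knn_indices(normalized, k=1))
```

A community-detection method such as Leiden would then be run on the graph defined by these neighbour lists, usually after the data have been projected onto the leading principal components.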

Cell identity annotation

There are two approaches to cell identity annotation: annotation based on feature annotation of ATAC peaks[25] and annotation based on integration with reference scRNA-seq data.[26]

Applications

 
Figure: Differences between scGET-seq and scATAC-seq.

Current

By using the ratio of Tn5 to TnH signals, quantitative values describing how quickly and in what direction chromatin remodelling is taking place can be calculated (chromatin velocity).[2] By isolating regions that are most dynamic and identifying which transcription factors bind there, chromatin velocity can be used to infer the dynamic epigenetic processes happening within a given cell and the contributions of various transcription factors to those processes.[2]
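As a rough illustration of the underlying quantity, the sketch below computes a per-region log2 ratio of Tn5 to TnH signal for a single cell. It is not the chromatin velocity model itself, which infers rates and directions of change; the function name `log2_accessibility_ratio` and the `pseudocount` parameter are assumptions introduced here to avoid division by zero.

```python
import math
from typing import List


def log2_accessibility_ratio(
    tn5: List[float],
    tnh: List[float],
    pseudocount: float = 1.0,  # assumed smoothing term, not from the source
) -> List[float]:
    """Per-region log2((Tn5 + p) / (TnH + p)) for one cell.

    Positive values indicate relatively more Tn5 (open chromatin) signal,
    negative values relatively more TnH (H3K9me3-marked) signal.
    """
    if len(tn5) != len(tnh):
        raise ValueError("Tn5 and TnH vectors must cover the same regions")
    return [
        math.log2((open_signal + pseudocount) / (closed_signal + pseudocount))
        for open_signal, closed_signal in zip(tn5, tnh)
    ]


if __name__ == "__main__":
    print(log2_accessibility_ratio([7, 0, 3], [1, 3, 3]))
```

A static ratio of this kind only describes the current balance between the two signals in a region; estimating how quickly and in which direction that balance is shifting requires a kinetic model fitted across many cells.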

Future

Chromatin remodelling precedes changes in gene expression, so profiling it can improve the understanding of the trajectories and mechanisms of cellular changes.[27][28] Thus, platforms and tools for integrating multimodal data are areas of active research.[29][30][31] Incorporating temporal and directional information through the integration of chromatin velocity with RNA velocity has been proposed to reveal even more information about differentiation pathways.[32][33]

Limitations

scGET-seq shares some limitations with scATAC-seq. Both methods require nuclei from viable cells and samples with high cellular viability.[13] Low cellular viability leads to high levels of background DNA contamination that does not represent authentic biological signal. Additionally, the sparse and noisy nature of scATAC-seq and scGET-seq data makes analysis challenging, and there is not yet a consensus on how best to handle these data.[34]

Another limitation is that single-nucleotide variant (SNV) calls from scGET-seq still need to be validated by bulk genome sequencing. Although mutations detected by scGET-seq correlate highly with those found by bulk exome sequencing, scGET-seq fails to capture all exome SNVs.[2]

References

  1. ^ Yan F, Powell DR, Curtis DJ, Wong NC (February 2020). "From reads to insight: a hitchhiker's guide to ATAC-seq data analysis". Genome Biology. 21 (1): 22. doi:10.1186/s13059-020-1929-3. PMC 6996192. PMID 32014034.
  2. ^ a b c d e f g h i Tedesco M, Giannese F, Lazarević D, Giansanti V, Rosano D, Monzani S, et al. (February 2022). "Chromatin Velocity reveals epigenetic dynamics by single-cell profiling of heterochromatin and euchromatin". Nature Biotechnology. 40 (2): 235–244. doi:10.1038/s41587-021-01031-1. hdl:11368/3007419. PMID 34635836. S2CID 238637962.
  3. ^ Kouzarides T (February 2007). "Chromatin modifications and their function". Cell. 128 (4): 693–705. doi:10.1016/j.cell.2007.02.005. PMID 17320507. S2CID 11691263.
  4. ^ La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, et al. (August 2018). "RNA velocity of single cells". Nature. 560 (7719): 494–498. Bibcode:2018Natur.560..494L. doi:10.1038/s41586-018-0414-6. PMC 6130801. PMID 30089906.
  5. ^ Klemm SL, Shipony Z, Greenleaf WJ (April 2019). "Chromatin accessibility and the regulatory epigenome". Nature Reviews. Genetics. 20 (4): 207–220. doi:10.1038/s41576-018-0089-8. PMID 30675018. S2CID 59159906.
  6. ^ Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. (September 2012). "The accessible chromatin landscape of the human genome". Nature. 489 (7414): 75–82. Bibcode:2012Natur.489...75T. doi:10.1038/nature11232. PMC 3721348. PMID 22955617. S2CID 4304439.
  7. ^ Penagos-Puig A, Furlan-Magaril M (2020). "Heterochromatin as an Important Driver of Genome Organization". Frontiers in Cell and Developmental Biology. 8: 579137. doi:10.3389/fcell.2020.579137. PMC 7530337. PMID 33072761.
  8. ^ Golkaram M, Jang J, Hellander S, Kosik KS, Petzold LR (October 2017). "The Role of Chromatin Density in Cell Population Heterogeneity during Stem Cell Differentiation". Scientific Reports. 7 (1): 13307. Bibcode:2017NatSR...713307G. doi:10.1038/s41598-017-13731-3. PMC 5645312. PMID 29042584.
  9. ^ Dagogo-Jack I, Shaw AT (February 2018). "Tumour heterogeneity and resistance to cancer therapies". Nature Reviews. Clinical Oncology. 15 (2): 81–94. doi:10.1038/nrclinonc.2017.166. PMID 29115304. S2CID 2194691.
  10. ^ Lawson DA, Kessenbrock K, Davis RT, Pervolarakis N, Werb Z (December 2018). "Tumour heterogeneity and metastasis at single-cell resolution". Nature Cell Biology. 20 (12): 1349–1360. doi:10.1038/s41556-018-0236-7. PMC 6477686. PMID 30482943.
  11. ^ Dai Z, Gu XY, Xiang SY, Gong DD, Man CF, Fan Y (November 2020). "Research and application of single-cell sequencing in tumor heterogeneity and drug resistance of circulating tumor cells". Biomarker Research. 8 (1): 60. doi:10.1186/s40364-020-00240-1. PMC 7653877. PMID 33292625.
  12. ^ Pott S, Lieb JD (August 2015). "Single-cell ATAC-seq: strength in numbers". Genome Biology. 16 (1): 172. doi:10.1186/s13059-015-0737-7. PMC 4546161. PMID 26294014.
  13. ^ a b Tang L (December 2021). "Sketching open and closed chromatin". Nature Methods. 18 (12): 1448. doi:10.1038/s41592-021-01351-9. PMID 34862496. S2CID 244871731.
  14. ^ "Isolation of Nuclei for Single Cell RNA Sequencing & Tissues for Single Cell RNA Sequencing -Demonstrated Protocol -Sample Prep -Single Cell Gene Expression -Official 10x Genomics Support". support.10xgenomics.com. Retrieved 2022-03-02.
  15. ^ Hsu FM, Gohain M, Chang P, Lu JH, Chen PY (January 2018). "Chapter 4 - Bioinformatics of Epigenomic Data Generated From Next-Generation Sequencing". In Tollefsbol TO (ed.). Epigenetics in Human Disease. Translational Epigenetics. Vol. 6 (Second ed.). Academic Press. pp. 65–106. doi:10.1016/B978-0-12-812215-0.00004-2. ISBN 978-0-12-812215-0.
  16. ^ Bannister AJ, Zegerman P, Partridge JF, Miska EA, Thomas JO, Allshire RC, Kouzarides T (March 2001). "Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain". Nature. 410 (6824): 120–124. Bibcode:2001Natur.410..120B. doi:10.1038/35065138. PMID 11242054. S2CID 4334447.
  17. ^ Watanabe S, Mishima Y, Shimizu M, Suetake I, Takada S (May 2018). "Interactions of HP1 Bound to H3K9me3 Dinucleosome by Molecular Simulations and Biochemical Assays". Biophysical Journal. 114 (10): 2336–2351. Bibcode:2018BpJ...114.2336W. doi:10.1016/j.bpj.2018.03.025. PMC 6129468. PMID 29685391.
  18. ^ Nielsen PR, Nietlispach D, Mott HR, Callaghan J, Bannister A, Kouzarides T, et al. (March 2002). "Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9". Nature. 416 (6876): 103–107. Bibcode:2002Natur.416..103N. doi:10.1038/nature722. PMID 11882902. S2CID 4423019.
  19. ^ "Chromium Single Cell ATAC Reagent Kits User Guide (v1.1 Chemistry) -User Guide -Official 10x Genomics Support". support.10xgenomics.com. Retrieved 2022-03-02.
  20. ^ Baek S, Lee I (2020-01-01). "Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation". Computational and Structural Biotechnology Journal. 18: 1429–1439. doi:10.1016/j.csbj.2020.06.012. PMC 7327298. PMID 32637041.
  21. ^ Polański K, Young MD, Miao Z, Meyer KB, Teichmann SA, Park JE (February 2020). "BBKNN: fast batch alignment of single cell transcriptomes". Bioinformatics. 36 (3): 964–965. doi:10.1093/bioinformatics/btz625. PMC 9883685. PMID 31400197.
  22. ^ Traag VA, Waltman L, van Eck NJ (March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.
  23. ^ Žitnik M, Zupan B (January 2015). "Data Fusion by Matrix Factorization". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (1): 41–53. arXiv:1307.0803. doi:10.1109/TPAMI.2014.2343973. PMID 26353207. S2CID 362295.
  24. ^ "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction — umap 0.5 documentation". umap-learn.readthedocs.io. Retrieved 2022-03-04.
  25. ^ Cittaro D (2022-02-21), dawe/scatACC, retrieved 2022-03-04
  26. ^ Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJ, Mahfouz A (September 2019). "A comparison of automatic cell identification methods for single-cell RNA sequencing data". Genome Biology. 20 (1): 194. doi:10.1186/s13059-019-1795-z. PMC 6734286. PMID 31500660.
  27. ^ Stadhouders R, Vidal E, Serra F, Di Stefano B, Le Dily F, Quilez J, et al. (February 2018). "Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming". Nature Genetics. 50 (2): 238–249. doi:10.1038/s41588-017-0030-7. PMC 5810905. PMID 29335546.
  28. ^ Ranzoni AM, Tangherloni A, Berest I, Riva SG, Myers B, Strzelecka PM, et al. (March 2021). "Integrative Single-Cell RNA-Seq and ATAC-Seq Analysis of Human Developmental Hematopoiesis". Cell Stem Cell. 28 (3): 472–487.e7. doi:10.1016/j.stem.2020.11.015. PMC 7939551. PMID 33352111.
  29. ^ Lin Y, Wu TY, Wan S, Yang JY, Wong WH, Wang YX (January 2022). "scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning". Nature Biotechnology. 40 (5): 703–710. doi:10.1038/s41587-021-01161-6. PMC 9186323. PMID 35058621. S2CID 246150572.
  30. ^ Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, et al. (June 2019). "Comprehensive Integration of Single-Cell Data". Cell. 177 (7): 1888–1902.e21. doi:10.1016/j.cell.2019.05.031. PMC 6687398. PMID 31178118.
  31. ^ Wang C, Sun D, Huang X, Wan C, Li Z, Han Y, et al. (August 2020). "Integrative analyses of single-cell transcriptome and regulome using MAESTRO". Genome Biology. 21 (1): 198. doi:10.1186/s13059-020-02116-x. PMC 7412809. PMID 32767996.
  32. ^ Xu Y, Begoli E, McCord RP (2021-12-01). "sciCAN: Single-cell chromatin accessibility and gene expression data integration via Cycle-consistent Adversarial Network". bioRxiv: 2021.11.30.470677. doi:10.1101/2021.11.30.470677. S2CID 244821695.
  33. ^ Chen Z, King WC, Gerstein M, Zhang J (2022-02-23). "scDVF: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations". bioRxiv: 2022.02.15.480564. doi:10.1101/2022.02.15.480564. S2CID 247000437.
  34. ^ Baek S, Lee I (January 2020). "Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation". Computational and Structural Biotechnology Journal. 18: 1429–1439. doi:10.1016/j.csbj.2020.06.012. PMC 7327298. PMID 32637041.