Abstract
Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
References
Macaulay, I.C. & Voet, T. Single cell genomics: advances and future perspectives. PLoS Genet. 10, e1004126 (2014).
Stegle, O., Teichmann, S.A. & Marioni, J.C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
Wu, A.R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2014).
Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Preprint at http://biorxiv.org/content/early/2016/06/29/035758/ (2016).
External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).
Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
Munro, S.A. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nat. Commun. 5, 5125 (2014).
Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).
Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).
Viphakone, N., Voisinet-Hakil, F. & Minvielle-Sebastia, L. Molecular dissection of mRNA poly(A) tail length control in yeast. Nucleic Acids Res. 36, 2418–2433 (2008).
Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).
Walker, E. & Nowacki, A.S. Understanding equivalence and noninferiority testing. J. Gen. Intern. Med. 26, 192–196 (2011).
SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol. 32, 903–914 (2014).
Kapteyn, J., He, R., McDowell, E.T. & Gang, D.R. Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples. BMC Genomics 11, 413 (2010).
Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014).
Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).
Pollen, A.A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058 (2014).
Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
Jaitin, D.A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
Ferreira, T. et al. Silencing of odorant receptor genes by G protein βγ signaling ensures the expression of one odorant receptor per olfactory sensory neuron. Neuron 81, 847–859 (2014).
Owens, N.D.L. et al. Measuring absolute RNA copy numbers at high temporal resolution reveals transcriptome kinetics in development. Cell Rep. 14, 632–647 (2016).
Llorens-Bobadilla, E. et al. Single-cell transcriptomics reveals a population of dormant neural stem cells that become activated upon brain injury. Cell Stem Cell 17, 329–340 (2015).
Fan, X. et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 16, 148 (2015).
Dang, Y. et al. Tracing the expression of circular RNAs in human pre-implantation embryos. Genome Biol. 17, 130 (2016).
Velten, L. et al. Single-cell polyadenylation site mapping reveals 3′ isoform choice variability. Mol. Syst. Biol. 11, 812 (2015).
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
Paul, F. et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).
Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
Macaulay, I.C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).
Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).
Padovan-Merhar, O. et al. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol. Cell 58, 339–352 (2015).
Sansom, S.N. et al. Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).
Wilson, N.K. et al. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724 (2015).
Streets, A.M. et al. Microfluidic single-cell whole-transcriptome sequencing. Proc. Natl. Acad. Sci. USA 111, 7048–7053 (2014).
Guo, F. et al. The transcriptome and DNA methylome landscapes of human primordial germ cells. Cell 161, 1437–1452 (2015).
Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Preprint at http://biorxiv.org/content/early/2016/07/26/065912/ (2016).
Brennecke, P. et al. Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Nat. Immunol. 16, 933–941 (2015).
Patro, R., Duggal, G., Love, M.I., Irizarry, M.A. & Kingsford, C. Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference. Preprint at http://biorxiv.org/content/early/2016/08/30/021592/ (2015).
Srivastava, A., Sarkar, H., Gupta, N. & Patro, R. RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics 32, i192–i200 (2016).
Bray, N.L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Carpenter, B., Gelman, A., Hoffman, M., Lee, D. & Goodrich, B. Stan: A probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).
Acknowledgements
We are grateful to O. Stegle and J.K. Kim for helpful discussions and comments on the manuscript. We thank M. Lynch for support with the C1 experiments, X. Chen for discussions on spike-ins, and M. Quail for help with 10× Chromium experiments. We extend our gratitude to S. Linnarsson and A. Zeisel for invaluable support in implementing STRT-seq in our laboratory and for help with sequencing the STRT library. We also thank D. Grün for sharing smFISH molecule counts. Finally we thank R. Kirchner for many improvements to the umis tool. This study was supported by Cancer Research UK grant C45041/A14953 to A.C. and C.L.; European Research Council project 677501–ZF_Blood to A.C.; a core support grant from the Wellcome Trust and MRC to the Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute; ERC grant ThSWITCH to S.A.T. (grant 260507); and a Lister Institute Research Prize to S.A.T. K.N.N. was supported by the Wellcome Trust Strategic Award 'Single cell genomics of mouse gastrulation'. We thank P. Liu (Wellcome Trust Sanger Institute) for providing cells.
Author information
Contributions
V.S. and S.A.T. conceived the study. V.S. and L.-H.L. annotated and processed all data. V.S. conceived and implemented the umis tool. V.S. conceived and performed the performance modeling of the data. V.S., R.J.M., and K.N.N. designed the in-house experiments. K.N.N. optimized and implemented the protocols. The degradation experiments were designed by V.S., I.C.M., R.J.M., and K.N.N., who performed the experiments. I.C.M. and C.L. performed zebrafish Smart-seq2 experiments under the supervision of A.C. V.S. and L.-H.L. designed the degradation model, and L.-H.L. implemented the model. V.S., K.N.N., and S.A.T. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Comparison and overview of spike-in sets.
ERCC spike-ins consist of 92 very distinct sequences based on bacterial genes, logarithmically distributed across 22 abundance levels (in Mix 1), with poly-A tails ranging from 20 to 26 base pairs. SIRV spike-ins are 69 sequences modeled after the sequences and splicing patterns of 7 human genes. In Mix 2, which we used, the SIRV molecules are present at 4 abundance levels, with virtual alternative isoforms from each gene present at each abundance level. All SIRV molecules have 30-base-pair poly-A tails.
Supplementary Figure 2 UMI efficiency as an alternative metric of sensitivity.
(A) Assuming that UMI counts correspond to a count of the fraction of molecules successfully captured by the RNA-sequencing process, in log-log space the efficiency corresponds to the offset from perfect correspondence between input molecules and counted UMIs. (B) With the exception of data from the MARS-Seq protocol, spike-in detection limits correspond well with UMI efficiency measures. The spike-in detection limit can, however, also be used for coverage-based data quantified by TPM. (C) The assumption behind UMI counting as a quantitative measurement is that efficiency is the only factor determining differences between real counts and observed counts. However, fitting a model that allows the exponent on the number of input molecules to differ from 1 shows that the exponent is < 1 in almost all cases. This means UMI counts underestimate the expression of highly expressed genes. (D) The saturation of UMI counts can be partially explained by short UMIs. If an experiment uses UMIs that are too short, the number of possible observable UMIs eventually plateaus. However, even for very long UMIs, such as 10 base pairs, the mean molecule exponent is 0.8, indicating that some additional unexplained factor is causing a saturation of UMI counts. (E) Averaged efficiency comparison of endogenous genes and ERCC spike-ins. The data by Grün et al. included smFISH measurements for 9 genes in the same experimental conditions as the single-cell RNA-seq data. Assuming a 100% capture rate for smFISH, we can compare average smFISH counts with average UMI counts. Round markers correspond to the median value across cells, and bars correspond to the 95% confidence interval across cells. The smFISH counts suggest that the UMI efficiency for endogenous transcripts is on the order of 5–10% on average, while ERCC spike-in UMI counts correspond to 0.5–1% efficiency on average.
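The relationships described in panels A, C and D can be illustrated with a minimal sketch. This is not the authors' code or their fitting procedure: the function names `fit_efficiency_and_exponent` and `expected_distinct_umis` are hypothetical, the fit is an ordinary least-squares line in log-log space, and the saturation formula assumes that every molecule is labeled with a UMI drawn uniformly and independently from all 4^L possible sequences of length L.

```python
import math


def fit_efficiency_and_exponent(input_molecules, umi_counts):
    """Fit log10(umi) = log10(efficiency) + exponent * log10(input).

    Ordinary least squares in log-log space (an illustrative choice).
    Pairs with a zero or negative value are skipped because their
    logarithm is undefined.
    """
    pairs = [
        (math.log10(x), math.log10(y))
        for x, y in zip(input_molecules, umi_counts)
        if x > 0 and y > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two positive (input, UMI) pairs")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("input abundances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    exponent = sxy / sxx                    # slope; 1 means no saturation
    intercept = mean_y - exponent * mean_x  # offset in log-log space
    return 10 ** intercept, exponent


def expected_distinct_umis(n_molecules, umi_length):
    """Expected number of distinct UMIs among n uniformly labeled molecules.

    With K = 4 ** umi_length possible UMIs, the expectation is
    K * (1 - (1 - 1/K) ** n), which can never exceed K.
    """
    k = 4 ** umi_length
    return k * (1 - (1 - 1 / k) ** n_molecules)


# Synthetic spike-in data generated with efficiency 0.1 and exponent 0.8.
inputs = [10, 100, 1000, 10000, 100000]
umis = [0.1 * x ** 0.8 for x in inputs]
efficiency, exponent = fit_efficiency_and_exponent(inputs, umis)
print(f"efficiency={efficiency:.3f}, exponent={exponent:.3f}")

# Short UMIs saturate early; long UMIs barely saturate at this depth.
for length in (4, 10):
    print(length, round(expected_distinct_umis(1000, length), 1))
```

Under these assumptions, a fitted exponent of 1 recovers the pure-efficiency model of panel A, and the offset alone gives the capture efficiency. The occupancy formula only accounts for UMI collisions, so it cannot by itself explain an exponent below 1 for long UMIs, which is consistent with the legend's remark that an additional factor is involved.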
Supplementary Figure 3 Trace plots from Bayesian models of degradation.
The posterior samples of the model parameters, obtained with Stan (ref. 41), show very narrow confidence intervals and good correspondence between the different sampling chains for both the ERCC and the SIRV analysis. The SIRV-based model is slightly noisier, which is to be expected, as quantifying isoform-level expression when multiple isoforms are present is a harder problem than quantifying expression of the unique ERCC sequences. For the ERCC model, the mode of the degradation rate parameter p is 19%, and for the SIRV model it is 18.5%.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 (PDF 417 kb)
Supplementary Table 1
Descriptive summaries of the public studies used for the comparison (XLSX 11 kb)
Supplementary Table 2
Full data table of technical parameters for each sample used for comparison and generation of all figures (CSV 9363 kb)
Supplementary Software
Umis version 0.3.0, which we used for processing all UMI data. See https://github.com/vals/umis for updated versions (ZIP 30 kb)
About this article
Cite this article
Svensson, V., Natarajan, K., Ly, LH. et al. Power analysis of single-cell RNA-sequencing experiments. Nat Methods 14, 381–387 (2017). https://doi.org/10.1038/nmeth.4220
DOI: https://doi.org/10.1038/nmeth.4220