Power analysis of single-cell RNA-sequencing experiments

Svensson, Valentine; Natarajan, Kedar Nath; Ly, Lam-Ha; Miragaia, Ricardo J; Labalette, Charlotte; Macaulay, Iain C; Cvejic, Ana; Teichmann, Sarah A

doi:10.1038/nmeth.4220

Analysis
Published: 06 March 2017

Valentine Svensson^1,2,na1^ (ORCID: orcid.org/0000-0002-9217-2330),
Kedar Nath Natarajan^1,2,na1^,
Lam-Ha Ly^2^,
Ricardo J Miragaia^2,3^,
Charlotte Labalette^2,4,5^,
Iain C Macaulay^2^,
Ana Cvejic^2,4,5^ &
Sarah A Teichmann^1,2^

Nature Methods volume 14, pages 381–387 (2017)

Abstract

Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.

**Figure 1: Strategy for scRNA-seq protocol comparison.**

**Figure 2: Performance metrics for scRNA-seq protocols.**

**Figure 3: Performance metrics after accounting for sequencing depth.**

**Figure 4: Effects of various factors on performance metrics.**

Accession codes

Primary accessions

ArrayExpress

Referenced accessions

ArrayExpress

European Nucleotide Archive

Gene Expression Omnibus

Sequence Read Archive

References

Macaulay, I.C. & Voet, T. Single cell genomics: advances and future perspectives. PLoS Genet. 10, e1004126 (2014).
Stegle, O., Teichmann, S.A. & Marioni, J.C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
Wu, A.R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2014).
Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Preprint at http://biorxiv.org/content/early/2016/06/29/035758/ (2016).
External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).
Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
Munro, S.A. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nat. Commun. 5, 5125 (2014).
Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).
Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).
Viphakone, N., Voisinet-Hakil, F. & Minvielle-Sebastia, L. Molecular dissection of mRNA poly(A) tail length control in yeast. Nucleic Acids Res. 36, 2418–2433 (2008).
Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).
Walker, E. & Nowacki, A.S. Understanding equivalence and noninferiority testing. J. Gen. Intern. Med. 26, 192–196 (2011).
SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol. 32, 903–914 (2014).
Kapteyn, J., He, R., McDowell, E.T. & Gang, D.R. Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples. BMC Genomics 11, 413 (2010).
Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014).
Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).
Pollen, A.A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058 (2014).
Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
Jaitin, D.A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
Ferreira, T. et al. Silencing of odorant receptor genes by G protein βγ signaling ensures the expression of one odorant receptor per olfactory sensory neuron. Neuron 81, 847–859 (2014).
Owens, N.D.L. et al. Measuring absolute RNA copy numbers at high temporal resolution reveals transcriptome kinetics in development. Cell Rep. 14, 632–647 (2016).
Llorens-Bobadilla, E. et al. Single-cell transcriptomics reveals a population of dormant neural stem cells that become activated upon brain injury. Cell Stem Cell 17, 329–340 (2015).
Fan, X. et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 16, 148 (2015).
Dang, Y. et al. Tracing the expression of circular RNAs in human pre-implantation embryos. Genome Biol. 17, 130 (2016).
Velten, L. et al. Single-cell polyadenylation site mapping reveals 3′ isoform choice variability. Mol. Syst. Biol. 11, 812 (2015).
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
Paul, F. et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).
Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
Macaulay, I.C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).
Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).
Padovan-Merhar, O. et al. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol. Cell 58, 339–352 (2015).
Sansom, S.N. et al. Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).
Wilson, N.K. et al. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724 (2015).
Streets, A.M. et al. Microfluidic single-cell whole-transcriptome sequencing. Proc. Natl. Acad. Sci. USA 111, 7048–7053 (2014).
Guo, F. et al. The transcriptome and DNA methylome landscapes of human primordial germ cells. Cell 161, 1437–1452 (2015).
Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Preprint at http://biorxiv.org/content/early/2016/07/26/065912/ (2016).
Brennecke, P. et al. Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Nat. Immunol. 16, 933–941 (2015).
Patro, R., Duggal, G., Love, M.I., Irizarry, R.A. & Kingsford, C. Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference. Preprint at http://biorxiv.org/content/early/2016/08/30/021592/ (2015).
Srivastava, A., Sarkar, H., Gupta, N. & Patro, R. RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics 32, i192–i200 (2016).
Bray, N.L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Carpenter, B., Gelman, A., Hoffman, M., Lee, D. & Goodrich, B. Stan: A probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).

Acknowledgements

We are grateful to O. Stegle and J.K. Kim for helpful discussions and comments on the manuscript. We thank M. Lynch for support with the C1 experiments, X. Chen for discussions on spike-ins, and M. Quail for help with 10× Chromium experiments. We extend our gratitude to S. Linnarsson and A. Zeisel for invaluable support in implementing STRT-seq in our laboratory and for help with sequencing the STRT library. We also thank D. Grün for sharing smFISH molecule counts. Finally, we thank R. Kirchner for many improvements to the umis tool. This study was supported by Cancer Research UK grant C45041/A14953 to A.C. and C.L.; European Research Council project 677501–ZF_Blood to A.C.; a core support grant from the Wellcome Trust and MRC to the Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute; ERC grant ThSWITCH to S.A.T. (grant 260507); and a Lister Institute Research Prize to S.A.T. K.N.N. was supported by the Wellcome Trust Strategic Award 'Single cell genomics of mouse gastrulation'. We thank P. Liu (Wellcome Trust Sanger Institute) for providing cells.

Author information

Valentine Svensson and Kedar Nath Natarajan: These authors contributed equally to this work.

Authors and Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
Valentine Svensson, Kedar Nath Natarajan & Sarah A Teichmann
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
Valentine Svensson, Kedar Nath Natarajan, Lam-Ha Ly, Ricardo J Miragaia, Charlotte Labalette, Iain C Macaulay, Ana Cvejic & Sarah A Teichmann
Centre of Biological Engineering, University of Minho, Braga, Portugal
Ricardo J Miragaia
Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute, Cambridge, UK
Charlotte Labalette & Ana Cvejic
Department of Haematology, University of Cambridge, Cambridge, UK
Charlotte Labalette & Ana Cvejic

Contributions

V.S. and S.A.T. conceived the study. V.S. and L.-H.L. annotated and processed all data. V.S. conceived and implemented the umis tool. V.S. conceived and performed the performance modeling of the data. V.S., R.J.M., and K.N.N. designed the in-house experiments. K.N.N. optimized and implemented the protocols. The degradation experiments were designed by V.S., I.C.M., R.J.M., and K.N.N., who performed the experiments. I.C.M. and C.L. performed zebrafish Smart-seq2 experiments under the supervision of A.C. V.S. and L.-H.L. designed the degradation model, and L.-H.L. implemented the model. V.S., K.N.N., and S.A.T. wrote the manuscript.

Corresponding authors

Correspondence to Valentine Svensson or Sarah A Teichmann.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Comparison and overview of spike-in sets.

ERCC spike-ins consist of 92 very distinct sequences based on bacterial genes. In Mix 1, they are logarithmically distributed across 22 abundance levels, with poly-A tails ranging from 20 to 26 base pairs. SIRV spike-ins comprise 69 sequences modeled after the sequences and splicing patterns of 7 human genes. In Mix 2, which we used, the SIRV molecules are present at 4 abundance levels, with virtual alternative isoforms from each gene present at each abundance level. All SIRV molecules have poly-A tails 30 base pairs long.

Supplementary Figure 2 UMI efficiency as an alternative metric of sensitivity.

(A) Assuming that UMI counts correspond to the fraction of input molecules successfully captured by the RNA-sequencing process, the efficiency corresponds, in log-log space, to the offset from perfect correspondence between input molecules and counted UMIs. (B) With the exception of data from the MARS-Seq protocol, spike-in detection limits correspond well with UMI efficiency measures. Unlike UMI efficiency, however, the spike-in detection limit can also be used for coverage-based data quantified by TPM. (C) Using UMI counts as a quantitative measurement assumes that efficiency is the only factor determining differences between real and observed counts. However, fitting a model that allows an exponent other than one on the number of input molecules shows that the exponent is < 1 in almost all cases. This means that UMI counts underestimate the expression of highly expressed genes. (D) The saturation of UMI counts can be partially explained by short UMIs. If an experiment uses UMIs that are too short, the number of observable UMIs eventually plateaus. However, even for very long UMIs, such as 10 base pairs, the mean molecule exponent is 0.8, indicating that some additional unexplained factor is causing saturation of UMI counts. (E) Averaged efficiency comparison of endogenous genes and ERCC spike-ins. The data by Grün et al. included smFISH measurements for 9 genes under the same experimental conditions as the single-cell RNA-seq data. Assuming a 100% capture rate for smFISH, we can compare average smFISH counts with average UMI counts. Round markers correspond to the median value across cells, and bars correspond to the 95% confidence interval across cells. The smFISH counts suggest that the UMI efficiency for endogenous transcripts is on the order of 5–10% on average, while ERCC spike-in UMI counts correspond to 0.5–1% efficiency on average.
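
The relationships described in panels (A), (C) and (D) can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `fit_umi_model` and `expected_distinct_umis` are hypothetical, the spike-in values are synthetic, and the saturation formula assumes that each molecule receives one of 4^L possible UMIs uniformly at random, which the legend does not state. The fit assumes strictly positive counts, so undetected spike-ins would have to be excluded or handled separately.

```python
import numpy as np


def fit_umi_model(input_molecules, umi_counts):
    """Least-squares fit of log10(UMI) = log10(efficiency) + exponent * log10(input).

    With exponent fixed at 1, the intercept alone is the log-efficiency
    (the offset described in panel A). Requires strictly positive values.
    """
    x = np.log10(np.asarray(input_molecules, dtype=float))
    y = np.log10(np.asarray(umi_counts, dtype=float))
    # np.polyfit returns coefficients from highest degree to lowest.
    exponent, intercept = np.polyfit(x, y, deg=1)
    return 10.0 ** intercept, exponent


def expected_distinct_umis(n_molecules, umi_length):
    """Expected number of distinct UMIs seen when n molecules each draw
    one of 4**umi_length UMIs uniformly at random (assumed model)."""
    k = 4.0 ** umi_length
    return k * (1.0 - (1.0 - 1.0 / k) ** n_molecules)


# Synthetic spike-in data (made up): 5% efficiency, exponent 0.9.
molecules = np.array([10.0, 100.0, 1_000.0, 10_000.0])
umis = 0.05 * molecules ** 0.9

efficiency, exponent = fit_umi_model(molecules, umis)
print(f"efficiency={efficiency:.3f}, exponent={exponent:.2f}")

# A 4-bp UMI offers only 256 distinct labels, so counts plateau near 256.
print(expected_distinct_umis(10_000, umi_length=4))
```

An exponent below 1 in such a fit means that observed counts grow more slowly than input molecules, which is the underestimation of highly expressed genes described in panel (C).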

Supplementary Figure 3 Trace plots from Bayesian models of degradation.

The posterior samples of the model parameters from Stan⁴¹ for both the ERCC and SIRV analyses show very narrow credible intervals and good correspondence between the different sampling chains. The SIRV-based model is slightly noisier, which is expected, because isoform-level quantification when multiple isoforms are present is a harder problem than quantifying expression of the unique ERCC sequences. For the ERCC model, the mode of the degradation rate parameter p is 19%, and for the SIRV model it is 18.5%.
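
Agreement between sampling chains, judged visually from trace plots above, can also be summarized numerically. The sketch below computes the classic (non-split) Gelman-Rubin potential scale reduction factor. It is an illustration only: the function name `gelman_rubin_rhat` is hypothetical, the draws are synthetic values centred on 0.19 merely to resemble a degradation-rate parameter, and Stan itself reports a more refined variant of this diagnostic.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for one parameter.

    chains: array of shape (n_chains, n_draws) holding posterior draws.
    Values close to 1 indicate that the chains agree with each other.
    """
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)     # B: between-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_hat / within))


# Synthetic draws (not from the paper): four well-mixed chains.
rng = np.random.default_rng(0)
mixed = rng.normal(loc=0.19, scale=0.005, size=(4, 1000))

# Same draws, but the last chain is shifted to mimic poor mixing.
stuck = mixed + np.array([[0.0], [0.0], [0.0], [0.05]])

print(gelman_rubin_rhat(mixed))  # close to 1
print(gelman_rubin_rhat(stuck))  # clearly above 1
```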

About this article

Cite this article

Svensson, V., Natarajan, K., Ly, LH. et al. Power analysis of single-cell RNA-sequencing experiments. Nat Methods 14, 381–387 (2017). https://doi.org/10.1038/nmeth.4220

Received: 09 September 2016
Accepted: 07 February 2017
Published: 06 March 2017
Issue Date: April 2017
DOI: https://doi.org/10.1038/nmeth.4220
