Abstract
Biobanked, cryopreserved lung cells obtained from human lung tissue are viable and valuable resources for large-scale molecular phenotyping http://bit.ly/31gtpcP
Introduction
Recent advances in high throughput omic technologies have greatly enhanced our knowledge of the molecular basis of complex lung diseases. In particular, with the advent of next generation sequencing, transcriptomic studies with RNA sequencing (RNA-seq) have yielded important insights into COPD and idiopathic pulmonary fibrosis (IPF) [1, 2]. However, most studies to date have been performed on whole lung tissue, such that many unique cell type-specific gene expression signatures, particularly those of alveolar epithelial cells (AECs), which play a key role in COPD and IPF, may have been diluted, if not missed altogether. Single-cell RNA-seq is rapidly gaining momentum as a strategy to profile individual cells [3, 4], but sequencing depth is typically much lower and cost much higher than that of bulk RNA-seq.
The challenges of procuring human lung tissue in a standardised way and isolating epithelial cells from cryopreserved samples, in addition to the requirement for RNA of adequate quantity and quality, have hampered the field's ability to perform transcriptomic profiling in AECs. Using biobanked samples would significantly expand the pool of lung specimens for investigation and minimise the statistical biases related to batch processing of fresh specimens [5]. Protocols detailing lung digestion and isolation of specific cell populations have been described [3, 6, 7], but few studies have examined how cryopreservation alters lung epithelial cell viability, RNA quality and gene expression [8–10]. Here we demonstrate an example of successful RNA-seq of AECs isolated from biobanked lung cell suspensions. We discuss the technical factors that may influence the success of this methodology and how it may be adapted for a broad range of downstream applications.
Sample pipeline
Figure 1 depicts the overall methodology of our sample pipeline. Human explanted lungs were procured from donors with end-stage lung disease (COPD, IPF and non-IPF fibrosis) undergoing transplant, or were control lungs rejected for transplant. For IPF and COPD specimens, tissue was selected from visibly diseased parenchyma from any lobe. Lung tissue was digested and processed according to the protocol provided in the supplementary methods. Cell suspensions were either sorted immediately or cryopreserved in liquid nitrogen and thawed later (figure 1a). Non-biobanked and biobanked cell suspensions were sorted according to a previously described protocol [7] to enrich for viable (DAPI−) non-haematopoietic (CD45−) type I and II AECs using two gates, EpCAM+/PDPNlow (P1 gate) and EpCAMhigh/PDPN− (P2 gate), respectively (figure 1b). A subset of sorted cells was fixed in paraformaldehyde, incubated with antibodies against surfactant protein C (SFTPC) and aquaporin 5 (AQP5), and visualised using confocal microscopy (figure 1c). RNA was extracted from sorted cell suspensions and purified. RNA quality and quantity were measured with a bioanalyser. RNA-seq was performed on 11 biobanked control samples. The Illumina TruSeq RNA Access Library Prep kit was used for library preparation. Purified libraries were validated on TapeStation and quantitated with Qubit. FastQC was used to visualise aggregate Phred scores from demultiplexed sample FASTQ files. Sequence quality was assessed using Phred scores, mapping rate and RNA composition. The R packages Rsubread [11] and DESeq2 [12] were used to quantify and normalise gene expression counts. Statistical inference testing was performed using linear regression models in DESeq2, controlling for age, gender and cell type percentage.
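The two-gate sorting logic described above can be sketched in code. This is a minimal illustration only: the numeric cut-offs, the `Event` record and the `assign_gate` function are hypothetical names and values introduced here, since real gates are drawn per experiment on the cytometer and the thresholds are not given in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fluorescence cut-offs in arbitrary units (assumptions for
# illustration; the actual gates are set per experiment from controls).
EPCAM_POS = 1_000.0    # lower bound for EpCAM+ (assumed)
EPCAM_HIGH = 10_000.0  # lower bound for EpCAMhigh (assumed)
PDPN_NEG = 500.0       # below this a cell is called PDPN- (assumed)
PDPN_LOW = 5_000.0     # upper bound of PDPNlow (assumed)


@dataclass
class Event:
    """One flow cytometry event (hypothetical record layout)."""
    dapi_positive: bool  # True means DAPI uptake, i.e. a non-viable cell
    cd45_positive: bool  # True means a haematopoietic cell
    epcam: float
    pdpn: float


def assign_gate(e: Event) -> Optional[str]:
    """Return 'P1', 'P2' or None (event falls outside both gates)."""
    # Keep only viable (DAPI-) non-haematopoietic (CD45-) cells.
    if e.dapi_positive or e.cd45_positive:
        return None
    # P2 gate: EpCAMhigh/PDPN-, enriching for type II AECs.
    if e.epcam >= EPCAM_HIGH and e.pdpn < PDPN_NEG:
        return "P2"
    # P1 gate: EpCAM+/PDPNlow, enriching for type I AECs.
    if e.epcam >= EPCAM_POS and PDPN_NEG <= e.pdpn < PDPN_LOW:
        return "P1"
    return None


# Example call on a made-up event.
gate = assign_gate(Event(dapi_positive=False, cd45_positive=False,
                         epcam=2_000.0, pdpn=1_000.0))
```

Viability and CD45 exclusion are applied before the epithelial gates, mirroring the order in which the gates are listed in the text.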
Key observations
To investigate the impact of technical variables on cell viability and RNA quality, we examined the effect of disease, cold ischaemic time (CIT, defined as the period from lung explantation to the first step of tissue digestion) and use/duration of cryopreservation on the yield of viable P1- and P2-gated epithelial cells and on the RNA Integrity Number (RIN) and DV200, standard metrics of RNA quality (figure 1d and e). Data on warm ischaemic times (period from cross-clamping of the aorta to lung explantation) were not routinely available. There was a relative abundance of P2-gated cells in IPF lungs compared to COPD and control lungs, possibly reflecting aberrant alveolar re-epithelialisation, a key pathological feature of IPF [13]. P2-gated samples also yielded a higher quantity of RNA than P1-gated samples (427±395 ng versus 241±197 ng per mL of total cell suspension). Fresh cell suspensions had more viable total cells, but there was no significant difference in the percentage of P1- or P2-gated cells compared to biobanked cells. Longer CIT and storage duration (median 726 days, interquartile range 441–1021) generally affected the yield and RNA integrity of P1-gated cells more than P2-gated cells. This was not surprising, as type I AECs are more vulnerable and less abundant than type II AECs [14], which serve as facultative progenitor cells for alveolar regeneration in response to lung injury [15, 16]. While RIN varied widely among samples, no significant differences between non-biobanked and biobanked sorted cells were observed (figure 1d). Moreover, there were no strong correlations (r² >0.9) of RIN with disease, CIT or storage times, although there was a trend towards lower RNA quality as CIT and storage time increased.
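The summary statistics used in this paragraph (median with interquartile range, and the squared correlation coefficient r²) can be computed as in the following sketch. The helper names `median_iqr` and `r_squared` are introduced here for illustration, the quartile convention is an assumption (software packages differ), and the input values are made up and are not data from the study.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile convention."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Made-up illustrative values (NOT study data).
storage_days = [100, 200, 300, 400, 500]
rin = [6.0, 5.5, 5.8, 4.9, 5.1]

median_days, q1_days, q3_days = median_iqr(storage_days)
r2 = r_squared(storage_days, rin)
is_strong = r2 > 0.9  # the threshold for a "strong" correlation used in the text
```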
To determine whether RNA obtained from biobanked samples using this enrichment strategy was adequate for transcriptomic profiling, RNA-seq of P1- and P2-gated cells from control lungs was performed using exome capture sequencing. Overall quality of sequence data was high, with a high read mapping frequency (>90% for all samples), adequate insert size (median 248 bp) (figure 1f), high level of agreement between observed and expected concentrations of target sequences (figure 1g) and no duplicate transcripts (data not shown). There was a significant correlation of input reads with CIT but not with storage duration (p=0.03 and 0.79, respectively) (figure 1h and i). To assess whether our cell sorting approach reliably partitioned type I and type II AECs, we compared differential gene expression of paired P1- and P2-gated biobanked samples, focusing on signature markers associated with these cell types. Genes related to type I and type II AECs were upregulated in P1- and P2-gated samples, respectively, although differential gene expression did not reach statistical significance for AQP5 (figure 1j). This was consistent with the relative expression of AQP5 in the two cell populations as determined by immunofluorescence; among P1-gated cells, only 39.4±13.7% stained positive for AQP5, while among P2-gated cells, 86.1±2.8% stained positive for SFTPC (figure 1c). The comparatively low differential expression of AQP5 in P1-gated cells likely reflects admixture by other cell types, as well as potential transdifferentiation of type II to type I AECs. However, unbiased hierarchical clustering demonstrated overall good segregation of P1- and P2-gated samples (figure 1k). Gene ontology enrichment analysis revealed upregulation of genes related to focal adhesion and lamellar bodies in P2-gated cells (figure 1l).
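To make the paired P1 versus P2 comparison concrete, the sketch below shows the median-of-ratios idea that DESeq2 uses to normalise counts for library size, followed by a naive log2 fold change. It is written from scratch in Python for illustration; it is not the DESeq2 API and not the authors' analysis script. The function names and the pseudocount are assumptions, and DESeq2 itself estimates fold changes and significance from a negative binomial model rather than from a simple ratio.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps sample name -> list of raw counts, one per gene, with all
    samples sharing the same gene order.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene, using only genes that have non-zero
    # counts in every sample.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = statistics.fmean([math.log(v) for v in values])
    if not log_geo_means:
        raise ValueError("no gene has non-zero counts in every sample")

    # A sample's size factor is the median ratio of its counts to the
    # per-gene geometric mean (computed on the log scale).
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm
                      for g, lgm in log_geo_means.items()]
        factors[s] = math.exp(statistics.median(log_ratios))
    return factors


def log2_fold_change(p2_norm, p1_norm, pseudocount=1.0):
    """Naive log2(P2/P1) on normalised counts; pseudocount avoids log(0)."""
    return math.log2((p2_norm + pseudocount) / (p1_norm + pseudocount))


# Made-up counts for three genes in one paired P1/P2 sample (NOT study data).
raw = {"P1": [10, 20, 30], "P2": [20, 40, 60]}
sf = size_factors(raw)
normalised = {s: [c / sf[s] for c in raw[s]] for s in raw}
```

In this toy example the P2 library is simply twice as deep as the P1 library, so the normalised counts of the two samples coincide and no gene appears differentially expressed.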
Implications for current and future applications
Our work provides a sample pipeline for isolating a targeted cell population from cryopreserved human lung explants for large-scale molecular analysis. After enriching for AECs from lung cell suspensions using an established flow-sorting protocol, we performed RNA-seq on biobanked control samples using exome capture sequencing. Our approach was limited by low sample numbers, the use of imperfect cell markers, which likely contributed to admixture in our P1-gated cells, and a restricted set of comparisons in our analysis. Despite these challenges, as well as the significant variability in source collection and cryopreservation times, input RNA was still of adequate quality to generate transcriptome libraries that identify biologically relevant signatures, paving the way for the next step of profiling our diseased samples as well as collaborative studies with other groups.
High throughput molecular studies have yielded valuable insights into lung pathobiology, yet many studies are limited by low sample numbers and lack validation cohorts [17]. Individuals diagnosed with the same disease may have widely disparate clinical courses and responses to therapies, underscoring the need to generate biorepositories with comprehensive cellular and molecular phenotyping that can facilitate personalised approaches to disease classification and management. Organ explants obtained during transplantation and diagnostic surgical lung biopsies are valuable sources of human tissue. However, obtaining good quality lung specimens remains challenging due to the unpredictable timing of transplants and vulnerability to ischaemic injury, particularly in epithelial cells. Prospectively biobanking lung tissue and systematically enriching for cell populations of interest such as AECs, which are at the core of the pathogenesis of diseases such as IPF and COPD, can facilitate biologically and statistically rigorous analyses of samples by minimising batch variability. It also provides an entry point for investigating lung architecture and function at multiple levels of regulation beyond the transcriptome, including the epigenome, proteome and metabolome. Our methodology validates the prospective biobanking of digested lung suspensions, but alternative approaches, such as biobanking whole tissue and then digesting and sorting cells, could also be considered and systematically studied.
In our study, we demonstrated the impact of ischaemic time and cryopreservation on AEC viability and RNA quality. RIN scores higher than seven are typically recommended for RNA-seq, which is commonly performed using poly(A) tail selection to enrich mRNA. Poly(A) libraries may greatly underrepresent the target transcriptome in degraded RNA [18]. As our samples had lower mean RIN scores, we adopted exome capture technology, a validated alternative approach that enriches cDNA transcripts after the main enzymatic steps of library construction rather than at the RNA stage. Using excess complementary capture probes at multiple positions enables transcript recovery even without poly(A) tails [18, 19]. RNA-seq on biobanked samples achieved standard quality control metrics, suggesting that exome capture sequencing is a good alternative in settings where RNA quality is reduced, such as clinical or formalin-fixed paraffin-embedded samples. The limited availability of fresh tissue often leads to sporadic processing of individual samples, introducing biases from technical/operator variability that are mitigated when processing cryopreserved specimens in bulk. Importantly, our study showed similar RINs between non-biobanked and biobanked specimens. While we did not determine the effect of biobanking on transcriptomic signatures, other studies have demonstrated that cryopreservation does not significantly alter gene expression in cell lines or lung cancer tissue, including expression of genes involved in inflammatory or immune responses, even with over 2 h from resection to freezing [8–10].
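The choice between enrichment strategies described above reduces to a simple rule of thumb, sketched below. The function name `choose_enrichment` is hypothetical, and the default cut-off of 7 is the typical recommendation quoted in the text rather than a validated threshold; in practice other metrics such as DV200 would also inform the decision.

```python
def choose_enrichment(rin: float, rin_cutoff: float = 7.0) -> str:
    """Suggest an mRNA enrichment strategy from the RNA Integrity Number.

    RIN is reported on a scale from 1 (degraded) to 10 (intact). Above the
    cut-off, poly(A) selection is the common choice; at or below it, exome
    capture can still recover transcripts that have lost their poly(A) tails.
    """
    if not 1.0 <= rin <= 10.0:
        raise ValueError("RIN must lie between 1 and 10")
    return "poly(A) selection" if rin > rin_cutoff else "exome capture"


# Example calls on made-up RIN values (NOT study data).
intact = choose_enrichment(8.5)
degraded = choose_enrichment(5.2)
```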
Having validated our methodology with RNA-seq of control AECs, we can thus proceed with bulk RNA-seq and single-cell gene expression studies to identify key transcriptomic signatures in chronic parenchymal lung disease. While RNA-seq's most widely used application has been gene expression profiling, its capacity for single base pair resolution enables characterisation of other facets of the human genome, including non-coding RNA species, regulatory elements, allele-specific expression, alternative splicing and gene fusions [20]. All of these may have unprecedented implications for our understanding of complex lung diseases and the development of precision therapies.
Harnessing such applications requires sufficient sequencing depth to ensure accuracy, but this frequently limits the number of transcripts that can be read in each sample. One strategy to maximise read depth in heterogeneous organs such as the lung is to isolate targeted populations of cells, which we achieved using a straightforward method of enriching AECs from whole lung tissue. However, admixture by other cell types in our sorted populations does highlight the challenges of distinguishing AEC subtypes solely using positive/negative selection with established markers, particularly as there may be loss of these markers associated with disease pathogenesis. Other markers of type II cells have been identified that could be used for more targeted enrichment [21]. Lung cells may transdifferentiate [22] and type II cells can transform into type I cells under various conditions [21, 23, 24]. Integrating agnostic sequencing approaches such as single-cell RNA-seq or single-nucleus RNA-seq (Nuc-seq), which obviates the need for live cells [25], might effectively deal with cell admixture and uncover novel cell types and molecular signatures, including more specific cell surface markers. While our study has several technical limitations as discussed, it presents methodological concepts that can be refined and widely adapted for diverse applications, facilitating large-scale, deep molecular phenotyping of complex organ diseases.
Supplementary material
Supplementary material ERJ-01635-2018.SUPPLEMENT
Shareable PDF ERJ-01635-2018.Shareable
Acknowledgements
A subset of control lungs was provided by the International Institute for the Advancement of Medicine.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Author contributions: Conception and design: S.G. Chu, S. Poli De Frias, Y. Sakairi, B.A. Raby and I.O. Rosas. Data acquisition, analysis and/or interpretation: all authors. Writing and revising the article: S.G. Chu, S. Poli De Frias, Y. Sakairi, R.S. Kelly, R. Chase, B.A. Raby and I.O. Rosas.
Support statement: This work was supported by NHLBI grant P01HL114501 (G.R. Washko, A.M.K. Choi, B.A. Raby and I.O. Rosas). Funding information for this article has been deposited with the Crossref Funder Registry.
Conflict of interest: S.G. Chu has nothing to disclose.
Conflict of interest: S. Poli De Frias has nothing to disclose.
Conflict of interest: Y. Sakairi reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: R.S. Kelly reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: R. Chase reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: K. Konishi reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: A. Blau reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: E. Tsai reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: K. Tsoyi has nothing to disclose.
Conflict of interest: R.F. Padera reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: L.M. Sholl reports grants from National Institutes of Health (P01HL114501), during the conduct of the study; personal fees for consultancy from Foghorn Therapeutics, outside the submitted work.
Conflict of interest: H.J. Goldberg reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: H.R. Mallidi reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: P.C. Camp reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: S.Y. El-Chemaly has nothing to disclose.
Conflict of interest: M.A. Perrella reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: A.M.K. Choi reports grants from National Institutes of Health (P01HL114501, R01HL132198, P01HL108801, R01HL055330, R01HL133801), during the conduct of the study.
Conflict of interest: G.R. Washko reports grants from NIH and BTG Interventional Medicine, grants from and consultancy/advisory board work for Boehringer Ingelheim, consultancy for Genentech, Regeneron and GlaxoSmithKline, consultancy/data monitoring committee work for PulmonX, advisory board work for ModoSpira and Toshiba, grants from and consultancy for Janssen Pharmaceuticals, outside the submitted work; is founder and co-owner of Quantitative Imaging Solutions; and G.R. Washko's spouse works for Biogen, which is focused on developing therapies for fibrotic lung disease.
Conflict of interest: B.A. Raby reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
Conflict of interest: I.O. Rosas reports grants from National Institutes of Health (P01HL114501), during the conduct of the study.
- Received August 27, 2018.
- Accepted October 10, 2019.
- Copyright ©ERS 2020