Abstract
Background Microbiome studies of the lower airways based on bacterial 16S rRNA gene sequencing assess microbial community structure but can only infer functional characteristics. Microbial products, such as short-chain fatty acids (SCFAs), in the lower airways have significant impact on the host's immune tone. Thus, functional approaches to the analyses of the microbiome are necessary.
Methods Here we used upper and lower airway samples from a research bronchoscopy smoker cohort. In addition, we validated our results in an experimental mouse model. We extended our microbiota characterisation beyond 16S rRNA gene sequencing with the use of whole-genome shotgun (WGS) and RNA metatranscriptome sequencing. SCFAs were also measured in lower airway samples and correlated with each of the sequencing datasets. In the mouse model, 16S rRNA gene and RNA metatranscriptome sequencing were performed.
Results Functional evaluations of the lower airway microbiota using inferred metagenome, WGS and metatranscriptome data were dissimilar. Comparison with measured levels of SCFAs shows that the inferred metagenome from the 16S rRNA gene sequencing data was poorly correlated, while better correlations were noted when SCFA levels were compared with WGS and metatranscriptome data. Modelling lower airway aspiration with oral commensals in a mouse model showed that the metatranscriptome most efficiently captures transient active microbial metabolism, which was overestimated by 16S rRNA gene sequencing.
Conclusions Functional characterisation of the lower airway microbiota through metatranscriptome data identifies metabolically active organisms capable of producing metabolites with immunomodulatory capacity, such as SCFAs.
Abstract
This study shows that both whole-genome shotgun and RNA metatranscriptome sequencing can be done on lower airway samples and can provide valuable information on bacterial function https://bit.ly/3hNmZfi
Introduction
Characterisation of the lower airway microbiota by 16S rRNA gene sequencing has revealed that the lower airways are frequently enriched with oral commensals in healthy subjects [1–7], most likely due to microaspiration. The presence of oral commensals in the lower airways has also been identified in multiple pulmonary diseases such as cystic fibrosis, bronchiectasis, chronic obstructive pulmonary disease and lung cancer [7–11]. However, the viability of organisms identified in lower airway samples using targeted gene sequencing is uncertain, and most investigations have been limited to just taxonomic description of the lower airway microbiota and its association with host phenotypes [1, 2, 7, 12–14]. Whole-genome shotgun (WGS) and RNA metatranscriptome sequencing can directly capture gene content and active transcription, respectively. These techniques have the potential to provide a more precise functional assessment of the lower airway microbiome [15–17]. Due to the limited microbial biomass in the lower airways, these methods are challenging and have not yet been fully evaluated in comparison with standard microbial profiling and inferred functional content based on 16S rRNA gene sequencing.
Functionally active microbes can produce microbial products of relevance to the host and may modify host functions [13, 14, 18]. For example, short-chain fatty acids (SCFAs) cannot be produced by mammalian cells (with the exception of acetate), but are produced by facultative and obligate anaerobes under hypoxic conditions [19–24]. SCFAs produced by the gut microbiota induce regulatory T-cells that modify asthma, inflammatory bowel disease and cancer [19–24]. These SCFAs have also been identified in the lower airways and, with 16S rRNA gene sequencing, our group has shown that their presence is associated with enrichment of the lower airway microbiota with oral commensals [13]. To better characterise functional aspects of the lower airway microbiome, we explored the use of WGS and RNA metatranscriptome approaches to uncover active microbial metabolism of immunologically relevant metabolites, such as SCFAs.
Methods
Participants and samples
For this study samples were used from 21 healthy participants who were recruited for research bronchoscopy as part of our ongoing Chronic Obstructive Pulmonary Disease and Smoker Control cohort. All participants signed informed consent and the protocol was approved by the New York University and Bellevue Hospital Center (New York, NY, USA) institutional review boards (approval S14-01546). Further details are provided in the supplementary material.
Sample processing
DNA was extracted from all samples using the QIAamp DNA Mini Kit spin column protocol (Qiagen, Germantown, MD, USA). RNA extraction was carried out with the miRNeasy Micro Kit (Qiagen). Bacterial burden was measured by droplet digital PCR (ddPCR). All samples had high-throughput sequencing of bacterial 16S rRNA gene amplicons, and WGS and RNA metatranscriptome sequencing. Sequence data were filtered for bacteria only. Additionally, to identify active bacterial metabolism, SCFAs were measured by mass spectrometry. Further details are provided in the supplementary material.
Mouse experiment
20 female C57BL/6J mice (8–14 weeks of age, 18–22 g) were used (Jackson Research Laboratories, Bar Harbor, ME, USA). Three mice were inoculated with PBS, while the remaining 17 mice were inoculated with a mixture of human oral commensals consisting of Prevotella melaninogenica, Streptococcus mitis and Veillonella parvula. Bronchoalveolar lavage (BAL) samples were sent for 16S rRNA gene sequencing and RNA metatranscriptome sequencing. New York University Institutional Animal Care and Use approved the animal studies (approval S16-00032). Further details are provided in the supplementary material.
Statistical analysis
For association with discrete factors, we used nonparametric tests (Mann–Whitney or Kruskal–Wallis ANOVA). We used the vegan package in R to construct principal coordinate analysis based on Bray–Curtis distances [25, 26]. To cluster microbiome communities into exclusive “metacommunities” we used a Dirichlet Multinomial Mixture (DMM) model [27, 28]. To evaluate differences between groups within each sequence data type, we evaluated differential expression with DESeq2 [29] with a false discovery rate (FDR) <0.05 [30]. All data is publicly available in the Sequence Read Archive under accession numbers PRJNA603592, PRJNA573853 and PRJNA603675. All codes utilised for the analysis included in this article are available at: www.github.com/segalmicrobiomelab/functional_microbiomics. Further details are provided in the supplementary material.
Results
We recruited 21 healthy smokers for this study; lower airway samples from two subjects did not yield an adequate cDNA library for metatranscriptome sequencing and were excluded from the analysis, leaving a study cohort of 19 subjects (table 1). 16S rRNA gene sequencing characterised the microbiota present in background (“BKG”) controls, upper airway (“UA”) and lower airway (“BAL”) samples. Hierarchical clustering of the most abundant taxa shows that the microbiota in UA and BKG samples are differentially contained within the two dominant clusters (figure 1a). Some BAL samples were more similar to the UA samples, composed of taxa commonly identified as oral commensals such as Veillonella, Prevotella and Streptococcus, whereas other samples were more similar to BKG samples dominated by taxa such as Methylobacterium, Actinobacillus and Lactobacillus. We confirmed by DMM that BAL samples clustered into two distinct groups (figure 1b). Samples that clustered with UA samples were enriched with supraglottic predominant taxa (BAL.16S.SPT), while samples that clustered with BKG samples were enriched with background predominant taxa (BAL.16S.BPT) [1, 2]. This last cluster represents a group of samples with lower microbial biomass and with a taxonomic composition where most of the identified taxa likely come from methodological contamination and not from “true” lower airway microbes. Significant differences between all sample types were determined by both α-diversity (Shannon Index; figure 1c) and β-diversity (Bray–Curtis distance; figure 1d); these microbial community metrics further supported qualifying BAL.16S.SPT samples as more similar to UA samples. The median bacterial load, as determined by ddPCR, was around 1000-fold higher for UA samples and 10-fold higher for BAL samples compared with BKG samples (figure 1e). However, three of the BAL samples clearly had higher bacterial burden, with levels similar to those found in UA samples; they were all identified as BAL.16S.SPT based on taxonomic composition (figure 1e). To explore functional aspects using the 16S rRNA gene sequence data, we inferred metagenomic composition by PICRUSt [31]. Comparison of the inferred metagenome between BAL.16S.SPT and BAL.16S.BPT samples suggested that there should be several KEGG (Kyoto Encyclopedia of Genes and Genomes) and associated functional pathways differentially expressed between these two clusters (figure 1f and data 1 file in the supplementary material). While this analysis infers microbial function, it is possible to directly measure microbial function through the use of other new next-generation sequencing methods and metabolomics.
Baseline characteristics of the study population
16S rRNA gene, whole-genome shotgun (WGS) and RNA sequencing: background (BKG), upper airway (UA) and bronchoalveolar (BAL) samples were collected via bronchoscopy, and 16S rRNA gene, WGS and RNA sequencing was performed. SPT: supraglottic predominant taxa; BPT: background predominant taxa; DMM: Dirichlet Multinomial Mixture. a) A heatmap based on the Bray–Curtis distance for 16S rRNA gene sequencing illustrates the top taxa for all samples. Hierarchical clustering showed two clear clusters: one cluster with BKG samples and BAL samples similar to BKG (BPT), and one cluster with UA samples and BAL samples similar to UA (SPT). b) DMM showed two clusters had the best model fit for the 16S rRNA gene sequencing. c) α-diversity (Shannon Index) showed significant (Wilcoxon) difference between all samples, and the lowest diversity in UA samples and among BAL samples that clustered to BAL.16S.SPT by DMM. Box plots represent median and interquartile range; whiskers represent the 5–95th percentile. d) β-diversity (Bray–Curtis) also indicates a significant (PERMANOVA) difference between all samples for 16S rRNA gene sequencing. e) Bacterial load measured by droplet digital PCR showed the highest levels in UA samples (Kruskal–Wallis). BAL samples also had higher levels when compared with BKG samples. Median values are indicated. f) The inferred metagenome was assessed using PICRUSt, highlighting several significantly enriched pathways (coloured in red). The dashed line indicates a false discovery rate cut-off of 0.05. g) β-diversity (Bray–Curtis) for WGS showed a significant (PERMANOVA) difference between all samples, with UA samples separate from BKG and BAL.BPT samples. Three BAL.SPT samples clustered with UA samples. h) β-diversity (Bray–Curtis) for RNA showed a significant (PERMANOVA) difference between all sample types. Two BAL.SPT samples clustered with UA samples. i) z-transformed Bray–Curtis distance between BAL samples and paired UA samples showed clear separation of BAL.16S.BPT and BAL.16S.SPT samples in 16S rRNA gene sequencing. This separation was not as clear in WGS and RNA sequencing.
Evaluation of the lower airway metagenome and metatranscriptome
To further characterise functional aspects of the airway microbiota we profiled the metagenome by WGS sequencing and the metatranscriptome by RNA sequencing. For this analysis, all BAL and UA samples were used, while only two BKG samples had RNA sequencing libraries that passed quality control. Importantly, rarefaction analysis for the WGS and RNA data showed plateauing of the curves at a lower depth than the one accomplished in this investigation (supplementary figure S2). Within the WGS and RNA sequence data, UA and BKG samples were significantly different from each other, based on α- and β-diversity (figure 1g and h and supplementary figure S1a and b), similar to the 16S rRNA gene sequence data, while BAL samples were similar to either the UA or BKG samples. Using each Bray–Curtis distance matrix for 16S rRNA gene sequence, WGS and RNA metatranscriptome data, we compared the paired distances between BAL and UA samples. 16S rRNA gene data indicated a clear separation of what we identified as BAL.16S.SPT and BAL.16S.BPT, but this distinction was lost in WGS and RNA metatranscriptome data (figure 1i). Importantly, β-diversity analyses on both WGS and RNA sequencing data showed that all BAL samples identified as BAL.16S.BPT remained similar to BKG samples. However, among BAL samples identified as BAL.16S.SPT, both WGS and RNA sequencing identified a subset of samples that clearly showed greater similarity to UA samples, while others showed greater similarity to BKG samples (figure 1g and h). Interestingly, two out of three BAL samples that had the greatest similarities with UA samples in WGS data also had the greatest similarities with UA samples in the RNA metatranscriptome data (figure 1i).
Taxonomic signature differences between sequencing data types
To evaluate similarities of taxonomic annotation at a global level we used PROCRUSTES with Monte Carlo simulation. While there was high correlation in β-diversity between WGS and RNA taxonomic assignment (p=0.001), there was no significant correlation when 16S rRNA gene sequencing data were compared with WGS or RNA data (supplementary figure S3).
Based on 16S rRNA gene sequencing, several taxa were significantly enriched (FDR <0.05) in samples identified as BAL.SPT compared with BAL.BPT (figure 2a and data 2 file in the supplementary material), with a number of known oral commensals such as Prevotella, Veillonella and Streptococcus enriched in BAL.SPT.
Further differences identified in taxonomic signatures between sequencing data types are discussed in the supplementary material.
Taxonomic annotation of all three sequencing data types. BAL: bronchoalveolar lavage; SPT: supraglottic predominant taxa; BPT: background predominant taxa; WGS: whole-genome shotgun. DESeq2 analysis of taxonomic annotation (at the genus level) between BAL.16S.SPT versus BAL.16S.BPT samples (false discovery rate <0.05) was performed on a) 16S rRNA gene sequencing data, c) WGS data and e) RNA metatranscriptome data. Circle size is representative of relative abundance. b, d, f) Gene set enrichment analysis was used to compare the taxonomic signatures identified as distinctly enriched in BAL.16S.SPT versus BAL.16S.BPT samples across the different sequencing platforms.
Functional overlap and differences across sequencing data types
Using gene set enrichment analysis to compare the functional annotations across the sequencing data types, we identified significant overlap between the data obtained (>1000 overlapping KEGG Orthologies (KOs) for each comparison) (figure 3a). In order to compare the differentially enriched pathways identified (with DESeq2) between BAL.16S.SPT and BAL.16S.BPT, we overlapped the fold change of the functional pathways (summarised to Level 3 of annotation). We identified some concordance in the directionality of the fold change, most identified as enriched in BAL.16S.SPT (figure 3b). For example, fatty acid biosynthesis as well as purine and pyrimidine metabolism were significantly enriched in all three sequence datasets. However, the presence of statistical significance (identified in figure 3b by the use of coloured symbols) differed by the method used. Furthermore, other functional pathways showed discordant directionality. For example, genes belonging to the fatty acid metabolism pathway appeared to be significantly depleted in BAL.16S.SPT samples by the inferred metagenome but significantly enriched in WGS and nonsignificantly enriched in RNA sequencing data (figure 3b). Since several genes annotated to fatty acid biosynthesis and fatty acid metabolism are part of the production of SCFAs (end-products of microbial metabolism associated with enrichment of the lower airway microbiota with oral anaerobes [13]), we measured the levels of these products directly using mass spectrometry.
Functional annotation of all three sequencing data types: 16S rRNA gene, whole-genome shotgun (WGS) and RNA metatranscriptome. MAPK: mitogen-activated protein kinase; CoA: coenzyme A; ABC: ATP-binding cassette; BAL: bronchoalveolar lavage; SPT: supraglottic predominant taxa; BPT: background predominant taxa; KO: KEGG Orthology. a) Gene set enrichment analysis comparing functional signatures identified across the different sequence data types as distinctly enriched in BAL.16S.SPT versus BAL.16S.BPT samples based on KO annotation (differential enrichment performed based on DESeq2 analysis). b) KOs were summarised to associated pathways; differential expression between BAL.16S.SPT and BAL.16S.BPT displayed as circles for 16S rRNA gene, diamonds for WGS and squares for RNA data. Coloured symbols indicate statistical significance (DESeq2; p<0.05) for each sequence data type and symbol size is relative to the amount of KOs contributing to that pathway. Two pathways highlighted in red include fatty acid biosynthesis, which shows concordance of directionality between the three sequence data types, and fatty acid metabolism, which shows discordance.
Further differences identified in functional signatures between sequencing data types are discussed in the supplementary material.
SCFA levels are different in the upper and lower airways
Initially, SCFA levels in ex vivo cultures were analysed (supplementary material). We then evaluated SCFA levels in the 19 UA and BAL samples and four BKG samples. The levels of four out of seven SCFAs were significantly higher in UA samples when compared with BKG samples: acetate, propionate, isovalerate and butyrate (figure 4). However, the levels of three other SCFAs measured were similar when comparing UA with BKG samples: hexanoate, valerate and octanoate (data not shown). These data suggest that some SCFAs are not produced by oral commensals or that their measurement lacks a dynamic range above BKG samples. Among BAL samples, there were three SCFAs with levels statistically higher than BKG samples: acetate, propionate and isovalerate (all of which were exponentially higher in UA samples). However, three BAL samples identified as BAL.16S.SPT had significantly higher levels of these three SCFAs (figure 4). These concentrations are comparable to those we have previously published as measurable in the lower airways of a separate cohort and found to be correlated with a regulatory T-cell phenotype, i.e. blunted interleukin-17 and interferon-γ response [13]. The remaining BAL samples had similar levels to BKG samples (regardless of their 16S.SPT/16S.BPT grouping). Importantly, the levels of these SCFAs in BAL samples did not correlate with levels in UA samples (p-values nonsignificant for all comparisons) (supplementary figure S7), suggesting that microaspiration of upper airway secretions containing these metabolites was not the main source of SCFAs in the lung. To further test whether levels of SCFAs are dependent on lower airway microbial metabolism, we correlated the findings from the different genomic datasets with the measured SCFAs.
Concentrations of short-chain fatty acids (SCFAs) in bronchoscopy samples: a panel of SCFAs were measured and compared (Kruskal–Wallis) in background (BKG), upper airway (UA) and bronchoalveolar lavage (BAL) samples by gas chromatography-mass spectroscopy. SPT: supraglottic predominant taxa; BPT: background predominant taxa. SCFAs were derived from the linear phase of the standard curve leading to the following cut-off values (dotted line): a) 1 µM for acetate, b) 0.6 µM for propionate, c) 0.01 µM for isovalerate and d) 1 µM for butyrate. ns: nonsignificant.
The RNA metatranscriptome correlates with measured SCFAs in the lower airways
At a global compositional level (β-diversity), the levels of these four SCFAs were not statistically significantly associated with the 16S rRNA gene sequencing data, but three out of four were statistically significantly associated with the WGS and RNA metatranscriptome data (figure 5a). This is likely to be driven by the three BAL samples with high levels of measured SCFAs identified by mass spectrometry (figure 4), as this correlation was not seen with BAL.BPT samples. Furthermore, rarefying the three datasets led to no significant change in the correlations.
Diversity correlations with short-chain fatty acid (SCFA) measurements. WGS: whole-genome shotgun; BKG: background; UA: upper airway; BAL: bronchoalveolar lavage; BPT: background predominant taxa; SPT: supraglottic predominant taxa; KO: KEGG Orthology. a) Levels of SCFAs with acetate, propionate, isovalerate and butyrate were tested (PERMANOVA) against β-diversity distribution of data from all three sequencing techniques in BAL samples. Coloured bars indicate a statistically significant association. b–d) Relative abundance of three KOs with direct annotation to measured SCFAs was compared across sample types: K01738 (acetate), K00925 (propionate) and K01034 (butyrate) with b) 16S rRNA gene, c) WGS and d) RNA metatranscriptome sequencing. Box plots represent median and interquartile range; whiskers represent the 5–95th percentile. e) RNA metatranscriptome taxonomic annotation for these three SCFA-associated KOs in UA, BAL.RNA.SPT, BAL.RNA.BPT and BKG samples are represented. Each circle represents a different sample type and colours indicate a different taxonomic annotation.
Since DMM clustering on 16S rRNA gene sequencing data did not distinguish BAL samples with high and low SCFAs levels, we performed DMM analysis of the WGS and RNA metatranscriptome data (supplementary methods), again identifying two separate clusters among BAL samples.
As validation, we focused on three KOs with direct SCFA annotation: K01738 for acetate, K00925 for propionate and K01034 for butyrate. KO enrichment differences could be identified between sample types with the RNA metatranscriptome data, but not with the inferred metagenome (16S rRNA data) or WGS data (figure 5b–d). KOs in the RNA metatranscriptome data were significantly elevated in UA samples and at very low levels in BKG samples. Importantly, the BAL samples identified in DMM clustering as being compositionally similar to UA samples had significantly higher levels of these KOs when compared with the remaining BAL samples which clustered with BKG samples (p<0.03 for all comparisons) (figure 5d). We then evaluated the taxonomic annotation available for these three KOs in the RNA metatranscriptome data and noted that the taxonomic source for these genes was predominantly oral commensals, e.g. Streptococcus and Veillonella (figure 5e). Thus, the RNA metatranscriptome has better resolution (when compared with 16S rRNA gene sequencing) for the identification of enriched genes involved in the metabolism of SCFAs present in oral anaerobes, and supports the presence of viable (RNA measurable) and metabolically active (metabolites elevated above background) bacteria in the lower airways.
By identifying potential contaminants (coming from DNA/RNA present in the bronchoscope or added through sample processing) within each sequencing method using the decontam package [32] (supplementary results), we ensured that these microbial patterns identified were related to signals present in the lower airways.
The RNA signature is lost earlier than the DNA signature in a mouse model of aspiration
It is possible that the improved resolution of RNA metatranscriptome data in the identification of active microbial metabolism in the lower airways is due to differences in DNA and RNA clearance over time. To evaluate the stability and functional dynamics of aspirated oral commensals in the lower airways, we used a mouse model. For this, mice were inoculated with a mixture of human oral commensals consisting of Prevotella, Streptococcus and Veillonella co-housed with a PBS control group. Mice were sacrificed at 1 h, 4 h, 1 day, 3 days and 7 days post-inoculation (figure 6a), and BAL samples were sent for 16S rRNA gene and RNA metatranscriptome sequencing. β-diversity analysis on 16S rRNA gene sequencing data shows that BAL samples remain similar to the mixture of human oral commensals for at least 1 day and become more similar to PBS by day 3 with a concordant decrease in the relative abundance of oral commensals (figure 6b–d). However, in the analysis based on the RNA metatranscriptome, BAL samples remain similar to the mixture of human oral commensals until the 4-h time-point and become more similar to PBS by day 1, with a concordant rapid loss of the RNA signal from oral commensals (figure 6e–g). These data support that discrepancies between these sequencing data can be time dependent and likely reflect the loss of viable (and transcriptionally active) microbes.
Mouse experiment with 16S rRNA gene and RNA metatranscriptome sequencing. MOC: mixture of human oral commensals. a) Schematic of the experiment. Mice (n=17) were inoculated with a mixture of Prevotella, Streptococcus and Veillonella (MOC) and sacrificed at specific time intervals: 1 h, 4 h, 1 day, 3 days and 7 days. Bronchoalveolar lavage samples were analysed by b–d) 16S rRNA gene and e–g) RNA metatranscriptome sequencing: b, e) principle coordinate analysis was performed with Bray–Curtis distances by time-point, c, f) mean intergroup distance between sample time-point and PBS was calculated, and d, g) relative abundance for taxa annotated to Prevotella, Streptococcus and Veillonella was calculated for each time-point.
Discussion
Functional characterisation of the lower airway microbiota has been attempted in a limited number of studies. In most of these, the inferred metagenome was used [2, 33]. Few have attempted metagenomic analyses [15]. The purpose of this study was to evaluate different sequence data types in the evaluation of the functional microbiome of the lower airways and to use the measurement of SCFAs as an independent biological outcome, i.e. a direct measure of bacterial metabolism. Our analysis showed that, among lower airway samples with enrichment of oral commensals, determined by taxonomic assignment of 16S rRNA gene sequencing, the use of WGS or RNA metatranscriptomic sequencing provides a distinct representation of functional aspects of the lower airway microbiota. Importantly, by pairing our sequence data with SCFA measurement, we showed that in lower airway samples with oral commensal enrichment, based on 16S rRNA gene sequencing, there is a subset with evidence of active microbial metabolism indicative of viability of the lower airway microbiota. This active microbial metabolism in the lower airways has been shown to influence lower airway immunity [34]. Further support for dissimilarity between 16S rRNA gene and RNA metatranscriptomic sequencing is provided with a mouse model of aspiration of oral commensals, demonstrating time-dependent differences likely related to loss of metabolically active microbes as the lower airways clear them.
With the introduction of next-generation sequencing, we have discovered complex microbial communities within several different mucosae [35–38]. For each of these environments, the microbial–host interplay has an impact on mucosal homeostasis in health and disease [37–42]. Within the lower airways, several studies have shown that complex microbial communities significantly impact the mucosal host immune tone [1, 2, 14, 43, 44]. For example, we have previously shown that lower airway enrichment with oral commensals leads to an increased lower airway inflammatory tone, characterised by a T-helper 17-like inflammatory phenotype [2, 45]. Thus, it is increasingly important to describe these environments beyond just the presence/absence of bacteria and to look at the functional implications of these bacteria. A common technique used to evaluate bacterial function is to infer the metagenome from 16S rRNA gene sequencing data. Major concerns associated with this approach are the poor strain resolution of 16S rRNA gene sequencing and the dependence on existing reference databases of annotated microbes, which can bias the results. Direct measurement of microbial genes can be accomplished by WGS and RNA metatranscriptome sequencing. In this study, we used all three methods to evaluate taxonomic and functional signatures in the lower airways. As previously described, we identified a subset of subjects that had a lower airway microbiota enriched with oral commensals such as Prevotella, Veillonella and Streptococcus [2]. Enrichment with oral commensals in a subset of samples that were identified as BAL.SPT based on 16S rRNA gene sequencing was also found in WGS and RNA sequencing data but, importantly, not in all of them. This enrichment with supraglottic taxa in the lower airways and its impact on host immune tone also has implications in disease states, as we have previously shown [2]. Thus, the information we glean from each of these data types is variable and potentially important when combined in a multi-omic approach. In addition, we have shown performing multi-omic analysis on lower airways samples is a feasible approach that provides deeper insight into the lower airway micro-environment.
As validation for such an approach, we focused on SCFAs. Several publications have identified SCFAs as the products of bacterial metabolism [46, 47]; the role these metabolites play in disease has been extensively studied in the gastrointestinal microbiota [20, 21, 23, 24], and is thought to be beneficial in inflammatory bowel disease and bowel cancer [48–51]. Within the lower airways, we have described that levels of these metabolites are associated with oral commensal enrichment (as defined by 16S rRNA gene sequencing) and have significant immunomodulatory effects on T-cells [13]. In our prior investigations we also noted that not all subjects that had enrichment of the lower airway microbiota with oral commensals had elevated levels of SCFAs [13]. This suggests that functional characterisation of the lower airway microbiota cannot be fully determined based solely on 16S rRNA gene sequencing data. We therefore integrated our WGS and RNA data with the measurement of SCFAs in UA, BAL and BKG samples. As expected, SCFAs were highest in UA samples, consistent with the presence of oral anaerobes in these high biomass samples. Within BAL samples, a small subset of samples had detectable concentrations of acetate, propionate and isovalerate at levels similar to the UA samples, identified as BAL.SPT samples based on 16S rRNA gene sequencing. Other BAL samples also identified as BAL.SPT based on 16S rRNA gene sequencing had low/below the limit of detection SCFA levels that were comparable with BAL.BPT and BKG samples. In contrast, the RNA metatranscriptome showed better sample type differentiation concordant with detected levels of SCFAs. In a recent report evaluating samples from 13 subjects, SCFA levels did not correlate with bacterial burden in BAL samples [52]. In the current investigation we identified associations between SCFAs and the lower airway RNA metatranscriptome, suggesting that it is this active microbial translation that can be associated with levels of SCFAs. Importantly, taxonomic evaluation of these KOs identified that the bacteria expressing these genes were oral commensals such as Streptococcus, Prevotella and Veillonella. Thus, these data suggest that in these samples, although oral commensals might have reached the lower airways and left traces of their genomic DNA, these bacteria have been cleared and are neither transcriptionally active nor capable of producing SCFAs at the time of sampling. This is further supported with a pre-clinical model of aspiration where mice were exposed to a mixture of human oral commensals and sacrificed at different time-points post-exposure, showing rapid loss of an RNA metatranscriptome signal from these microbes and longer persistence of a 16S rRNA gene signal. Considering the known immunomodulatory effects of SCFAs and other microbial metabolites, both possibly beneficial and detrimental depending on the different human conditions, improved understanding of the value of different sequencing methods will be key to gain functional insights of the lower airway microbiome.
There are several limitations to this study. First, in our analysis we did not remove any potential contaminants, which we found as the predominant taxa identified in many of the lower airway samples. Removing taxa identified as contaminant is frequently done in many microbiome studies hoping to remove contamination. However, there are many sources of noise, including DNA contamination from the reagents/bronchoscope and stochastic sequencing noise [53, 54]. Furthermore, in low biomass samples background removal can be quite variable within different sequencing datasets and its effect on the resulting new dataset is unclear. A recent guideline on lung microbiome research has not recommended background removal [55], so our analysis was limited to just identifying possible contaminants. Importantly, none of the oral commensals associated with active microbial metabolism were identified as background contaminants. In addition, our sample size was small and further validation will require a larger cohort. The lack of good correlation between 16S rRNA gene sequencing and WGS data may seem quite surprising since both are based on a similar DNA template. However, similarity between two genomic datasets will be dependent on 1) their ability to detect “true” bacterial signals present in a biological sample (a well-recognised challenge in low biomass samples), and 2) the background contamination due to differences in library preparation and sequencing techniques [53, 54]. We also recognise that this approach could impose a significant increase in sequencing cost compared with traditional 16S rRNA gene sequencing. However, improved accuracy in identifying active microbial metabolism in the lower airways can potentially lead to novel mechanistic insights about microbial metabolites with significant potential effects on the host, such as SCFAs. Future investigations should focus on determining the value of this improved accuracy by evaluating the potential implications for the host immune tone, an undertaking that should be designed with a larger cohort. Thus, we acknowledge that in the current investigation we did not attempt to evaluate host factors. Instead, we focused on functional evaluation of the lower airway microbiota using SCFAs as proof of concept. It is important to note that we are already facing an increase in literature suggesting that “near-bedside” metagenomics is feasible (both technically and computationally), and has potential clinical implications in terms of rapid detection of pathogens when compared with culture-based approaches and ability to detect resistant genes [56, 57]. In this setting, our data support that RNA sequencing could provide a better resolution of what microbial functions are active at a given time and may therefore contribute to the development of more targeted therapy. We also acknowledge that concentrations of acetate, which are not specific to microbes, can be influenced by the host and the environment, including water. For our assay, we used freshly opened high-performance liquid chromatography-grade water which did not have detectable acetate above that in the BKG samples. Dilution of BAL can affect the levels of SCFAs, but should not affect compositional data such as metagenome/metatranscriptome data. Future investigations may consider estimating BAL dilution factors, noting that there is still controversy in the literature about the accuracy and the best method for this [58, 59]. Also, variability between sequence data type may be due to differences in measured targets (target amplicon versus WGS versus RNA) as well as technical differences. For example, it is expected that deeper sequence depth will be needed to characterise the whole genome than to characterise taxonomic composition based on 16S rRNA gene sequencing and infer the metagenome using that data. The low bacterial biomass of the lower airway environment represents a critical challenge for the evaluation of the lower airway microbiome, both taxonomically and functionally. It is likely that this is an exponentially bigger problem for the WGS metagenome and RNA metatranscriptome. Although we aimed to achieve more sequence depth for our WGS metagenome and RNA metatranscriptome data, it is not surprising that differences may be more difficult to assess and that there is a greater level of background intrusion in these samples using these methods. However, the correlation of these data with SCFA levels suggests a more accurate functional evaluation can be achieved with WGS metagenome and RNA metatranscriptome data than with 16S rRNA gene sequencing data. Finally, the analyses presented here focused on fatty acid metabolism as a surrogate for bacterial activity. The highest level of precision and differential functional expression for lower airway samples identified by using RNA metatranscriptome data suggests that this might be a preferable functional method. However, it is likely that other microbial functional pathways may be important to study in health and disease, and future investigations should focus on experimental approaches to expand the observations made as proof of concept here.
In summary, the evaluation of the lower airway microbiome with 16S rRNA gene sequencing is limited in assessing bacterial function and therefore in assessing the potential impact on disease/host. The use of functional microbiome approaches that measure bacterial genes (WGS) and bacterial transcripts (RNA metatranscriptome) provides evidence of viable and active bacterial metabolism in the lower airways, and will likely define subgroups of lower airway microbiota with different implications for the host.
Supplementary material
Supplementary Material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary methods and results ERJ-03434-2020.Supplement
Supplementary figures ERJ-03434-2020.Figures
Supplementary data file 1 ERJ-03434-2020.Supp_Data_1
Supplementary data file 2 ERJ-03434-2020.Supp_Data_2
Supplementary data file 3 ERJ-03434-2020.Supp_Data_3
Supplementary data file 4 ERJ-03434-2020.Supp_Data_4
Shareable PDF
Supplementary Material
This one-page PDF can be shared freely online.
Shareable PDF ERJ-03434-2020.Shareable
Footnotes
Data availability: All data is publicly available in the Sequence Read Archive under accession numbers PRJNA603592, PRJNA573853 and PRJNA603675. All codes utilised for the analysis included in this article are available at: www.github.com/segalmicrobiomelab/functional_microbiomics
Author contributions: Conception and design: I. Sulaiman and L.N. Segal. Acquisition of data: B.G. Wu, Y. Li, A.S. Scott, K. Ji, A. Geber, S. Banakis, E. Ghedin, I. Sulaiman and L.N. Segal. Analysis and interpretation of data: B.G. Wu, J-C. Tsay, M. Sauthoff, M. Weiden, J.C. Clemente, D. Jones, Y.J. Huang, K.A. Stringer, L. Zhang, L. Tipton, E. Ghedin, I. Sulaiman and L.N. Segal. Drafting or revising the manuscript: I. Sulaiman, B.G. Wu, Y. Li, M. Sauthoff, A.S. Scott, J-C. Tsay, S.B. Koralov, M. Sauthoff, M. Weiden, J.C. Clemente, D. Jones, Y.J. Huang, K.A. Stringer, E. Ghedin and L.N. Segal. Final approval of the manuscript: I. Sulaiman and L.N. Segal.
This article has supplementary material available from erj.ersjournals.com This article has an editorial commentary: https://doi.org/10.1183/13993003.00321-2021
Conflict of interest: I. Sulaiman has nothing to disclose.
Conflict of interest: B.G. Wu has nothing to disclose.
Conflict of interest: Y. Li has nothing to disclose.
Conflict of interest: J-C. Tsay has nothing to disclose.
Conflict of interest: M. Sauthoff has nothing to disclose.
Conflict of interest: A.S. Scott has nothing to disclose.
Conflict of interest: K. Ji has nothing to disclose.
Conflict of interest: S.B. Koralov has nothing to disclose.
Conflict of interest: M. Weiden has nothing to disclose.
Conflict of interest: J.C. Clemente has nothing to disclose.
Conflict of interest: D. Jones has nothing to disclose.
Conflict of interest: Y.J. Huang has nothing to disclose.
Conflict of interest: K.A. Stringer has nothing to disclose.
Conflict of interest: L. Zhang has nothing to disclose.
Conflict of interest: A. Geber has nothing to disclose.
Conflict of interest: S. Banakis has nothing to disclose.
Conflict of interest: L. Tipton has nothing to disclose.
Conflict of interest: E. Ghedin has nothing to disclose.
Conflict of interest: L.N. Segal has nothing to disclose.
Support statement: R37 CA244775 (L.N. Segal, National Institutes of Health/National Cancer Institute), R01 HL125816 (S.B. Koralov and L.N. Segal, National Institutes of Health/National Heart, Lung, and Blood Institute), R01 GM111400 (K.A. Stringer, National Institutes of Health/National Institute of General Medical Sciences), Stony Wold (I. Sulaiman), Chest Foundation Grant (I. Sulaiman), Flight Attendant Medical Research Institute (B.G. Wu), PACT grant (L.N. Segal, Foundation for the National Institutes of Health (FHIH)), R01AI129958 (Y.J. Huang). Financial support for the PACT project is made possible through funding support provided to the FNIH by AbbVie Inc., Amgen Inc., Boehringer Ingelheim Pharma GmbH & Co. KG, Bristol-Myers Squibb, Celgene Corporation, Genentech Inc., Gilead, GlaxoSmithKline plc, Janssen Pharmaceutical Companies of Johnson & Johnson, Novartis Institutes for Biomedical Research, Pfizer Inc. and Sanofi. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received October 16, 2020.
- Accepted December 19, 2020.
- Copyright ©The authors 2021. For reproduction rights and permissions contact permissions{at}ersnet.org