Abstract
Studying the respiratory microbiome provides critical novel insights into respiratory disease pathogenesis which may improve clinical management, and we should strive to standardise study design, laboratory procedures and statistical methods in the field http://ow.ly/qNF330mWrEK
Over the past decade, researchers have begun to unravel the causes and consequences of variation within the respiratory microbiota, developing a more profound understanding of its role in the pathogenesis of pulmonary disease to improve clinical management. Developments in culture-independent identification of bacterial species have provided faster and more cost-effective methods to characterise niche-specific microbial ecosystems. Historically, the gut has been the niche of focus for human microbiome research, but recent studies have revealed an unexpected diversity of bacteria in both the upper and lower airways, linking community composition to a number of respiratory diseases, including cystic fibrosis (CF), chronic obstructive pulmonary disease (COPD) and acute infections [1]. Monitoring temporal changes in community composition of the respiratory microbiota can reveal the influence of host and environmental drivers on ecosystem behaviour, as well as the consequences of infection susceptibility or severity, and treatment effects. Here we outline current best practices and upcoming developments for respiratory microbiome research and potential clinical applications.
Study design
So far, in respiratory microbiome study design, we have learned that crucial elements in generating valid, useful results include clear research questions, power calculations, enrolment of sufficient numbers of subjects and controls, robust sampling and exhaustive patient information collection. Although this applies to any well-designed population-based or clinical study, we also need to carefully consider possible confounding effects of a broad range of environmental and host characteristics on microbiome composition [1, 2]. The first pioneering cross-sectional studies linked altered microbial community structure and composition to disease state [3, 4], but longitudinal sampling is needed to fully understand the causes and long-term clinical outcomes of variation in respiratory microbiota. For example, recent well-characterised healthy birth cohorts have shown the dynamics of nasopharyngeal microbiota development in relation to lifestyle factors [5, 6], and have revealed marked shifts in microbial community composition associated with acute respiratory infections [7, 8]. Intensive follow-up of CF [9, 10] and COPD [11] patients demonstrated changes in the airway microbiome composition preceding symptom onset, suggesting that dysbiosis coupled to a dysregulated host immune response could be the basis of disease progression [12]. Support for the potential role of the respiratory microbiome in early disease pathogenesis is evident in early childhood, as microbial communities with fewer commensals and more potential pathogens are associated with subsequent wheeze and asthma [8]. To date, every study of respiratory microbiota in relation to any lung disease has revealed clear aberrations of microbial community composition from the healthy state, redefining commonly accepted pathophysiological concepts in respiratory disease pathogenesis [12].
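As an illustration of the power calculations mentioned above, the following minimal sketch estimates the number of subjects per group needed to detect a given standardised difference between two groups (for example, a diversity index in cases versus controls). It uses the standard normal-approximation formula for a two-sided, two-group comparison of means. The function name, default values and choice of outcome are assumptions made for illustration only. Microbiome outcomes are often multivariate and non-normal, so this should be read as a rough starting point and not as a recommended method.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed per group (normal approximation).

    effect_size is the standardised mean difference (Cohen's d) between
    two equally sized groups; alpha is the two-sided significance level.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical scenario: a moderate difference (d = 0.5) at 80% power.
print(subjects_per_group(0.5))  # 63
```

Halving the expected effect size roughly quadruples the required sample size, which is one reason why small exploratory cohorts may struggle to detect subtle shifts in community composition.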
Sample collection
With respect to anatomy and site of sampling, the respiratory tract is not a single uniform system, but consists of interconnected niches harbouring distinct microbial communities that depend highly on local microenvironmental conditions. Therefore, when designing a new microbiome study, the appropriate sampling niche will largely depend on the research question, hypothesis and target population. Key procedural practicalities also require consideration; for example, sampling the lower respiratory tract (LRT) requires invasive bronchoscopic procedures, limiting sample size, age-groups to be studied, and frequency of repeated sampling. To overcome this lack of access to the lungs, many studies use the easily accessible upper respiratory tract (URT), which is considered the probable source community of the lungs as well as a reservoir for most respiratory pathogens [12, 13]. In healthy adults, microbial colonisation of the LRT is assumed to originate from micro-aspiration of the oropharyngeal “flora”, and hence, the oropharynx can be used as an, albeit imperfect, proxy for the lungs. In children, however, both the nasopharynx and oropharynx are likely sources of microbial seeding to the LRT, probably resulting from anatomical differences, nasal breathing, and higher production of nasal secretions by children [14, 15], further limiting result extrapolation. In chronic lung diseases such as CF and COPD, the URT and LRT communities appear to become segregated with increasing disease duration. This is probably due to chronic inflammation, failure of lung clearance mechanisms and repeated antimicrobial treatment resulting in localised selection and evolution of independent communities, the latter rendering LRT sampling from multiple sites mandatory to obtain meaningful results [16, 17].
Sample processing
An important aspect to consider throughout the design and execution of a respiratory microbiome study is the risk of and control for contamination. The respiratory tract harbours low-density bacterial communities, with microbial densities dropping along the way from the URT to the LRT [14, 18]. As a result, environmental DNA introduced during sample collection and processing becomes a likely threat and can entirely overwhelm the true microbial signal [18]. In particular, sampling of the LRT carries a high risk of microbial carryover from the URT, and so accurate sampling should be undertaken by well-trained and consistent personnel to reduce the risk of contamination. During transportation, samples should be kept cooled in appropriate storage media, and then processed and stored at −80°C as soon as possible to prevent selective bacterial outgrowth. In addition, contamination from the laboratory environment and the reagents used for sample processing can significantly influence results from low-biomass microbial communities [19]. Implementing proper “negative” controls for all sampling, storage and laboratory procedures allows for later comparison and identification of potentially confounding environmental signals (for more details see [20]). Variations in methodology and batches can also affect results, highlighting the importance of clean working practices during DNA extraction and of using fully optimised methods for the specific sample type. In addition to contamination, the extraction method can also affect the quality of the data, and care should be taken to use methods that do not bias which bacteria are extracted from the samples [18]. Including “positive” controls in the form of mock communities will allow for adequate control and comparison between sequencing runs, laboratories and institutes [13].
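To make the use of negative controls more concrete, the sketch below flags taxa that are detected at least as often in negative controls as in biological samples. This is a deliberately crude prevalence comparison written for illustration. It is not the approach of any particular published tool, and the function names, taxa and read counts are hypothetical. In practice, flagged taxa would need careful manual review and not automatic removal.

```python
def prevalence(profiles, taxon):
    """Fraction of profiles in which the taxon has at least one read."""
    return sum(1 for p in profiles if p.get(taxon, 0) > 0) / len(profiles)


def flag_possible_contaminants(samples, negative_controls):
    """Return taxa seen in negative controls at least as often as in samples.

    Each profile is a dict mapping a taxon name to its read count.
    """
    if not samples or not negative_controls:
        raise ValueError("need at least one sample and one negative control")
    taxa = set()
    for profile in samples + negative_controls:
        taxa.update(profile)
    return sorted(
        t for t in taxa
        if prevalence(negative_controls, t) > 0
        and prevalence(negative_controls, t) >= prevalence(samples, t)
    )


# Hypothetical read counts, for illustration only.
samples = [
    {"Moraxella": 900, "Ralstonia": 20},
    {"Streptococcus": 700, "Moraxella": 50},
    {"Haemophilus": 400, "Ralstonia": 15, "Streptococcus": 10},
]
negative_controls = [
    {"Ralstonia": 30},
    {"Ralstonia": 12, "Streptococcus": 1},
]
print(flag_possible_contaminants(samples, negative_controls))  # ['Ralstonia']
```

A prevalence rule like this only works when negative controls are included at every stage and processed alongside the samples, as recommended above.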
Sequencing platforms
In terms of sequencing platforms, amplicon sequencing is currently the most commonly used method for determining microbial community composition. It targets the bacterial 16S ribosomal RNA (rRNA) gene, which contains highly conserved as well as hypervariable regions. This targeted approach has revealed a wealth of information regarding community composition and dynamics. However, the taxonomic resolution provided by 16S rRNA sequencing is limited due to the short target region length, complicating accurate species- and strain-level identification. In comparison, metagenomic sequencing captures the entire microbial genomic content, including bacteria, viruses and eukaryotes, and allows for microbial characterisation at the deepest taxonomic levels as well as functional potential profiling. However, applying this technique to low-biomass respiratory samples is challenging, as genome assembly requires high numbers of sequencing reads per sample, which makes detection of low-abundance species difficult and increases the risk of contamination [21].
Data handling
Once the data are generated, the bioinformatics and statistical methods required to analyse the large numbers of raw DNA reads produced by sequencing can be daunting. Initially, raw reads are filtered to remove sequencing errors and are assembled into complete sequences, after which the sequences are grouped based on similarity and assigned taxonomic names to reveal their identities. Several bioinformatics pipelines are freely available for data pre-processing, including QIIME [22] and mothur [23]. Each resulting microbial profile shows the abundance of individual species relative to the entire bacterial population within a sample, and contains many zero abundances, demanding nonparametric statistical methods developed specifically for handling microbiome data [24]. Characterising microbial development over time requires multiple measurements of the same individual, further complicating data analysis, but several approaches have been proposed to correct for repeated measures [25, 26]. The increasing application of machine-learning techniques that perform predictive modelling of clinical outcomes from microbial profiles combined with host and environmental characteristics is a promising development [27]. However, the study of temporal microbiome dynamics, especially while accounting for confounding factors, remains in its infancy [24].
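The structure of a microbial profile as described above can be illustrated with a short sketch that converts read counts to relative abundances and computes two widely used ecological summaries: the Shannon diversity index within a sample and the Bray-Curtis dissimilarity between two samples. The function names and read counts are hypothetical, and the code is a simplified illustration and not a substitute for the dedicated pipelines and statistical methods cited above.

```python
from math import log


def relative_abundance(counts):
    """Convert read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile contains no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_diversity(counts):
    """Shannon index (natural log); taxa with zero reads contribute nothing."""
    return -sum(p * log(p) for p in relative_abundance(counts).values() if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity on relative abundances (0 = identical, 1 = no shared taxa)."""
    a = relative_abundance(counts_a)
    b = relative_abundance(counts_b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in set(a) | set(b))
    return 1 - shared


# Hypothetical profiles from two visits of the same subject.
visit_1 = {"Moraxella": 800, "Dolosigranulum": 150, "Streptococcus": 50}
visit_2 = {"Moraxella": 100, "Haemophilus": 900}
print(round(shannon_diversity(visit_1), 3))
print(round(bray_curtis(visit_1, visit_2), 3))
```

Taxa absent from one profile are treated as zero abundances, which is why most pairwise comparisons in sparse respiratory datasets are driven by a small number of shared, dominant taxa.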
Clinical applications
In the era of the 100 000 Genomes Project and the launch of the NHS Genomic Medicine Service, sequencing techniques are not only more accessible but are also becoming more integral to the clinical environment. In the clinic, identification by culture still dominates pathogen detection, and although quantitative methods such as qPCR are increasingly available, applications of sequencing technologies are lacking. Cost-effectiveness and efficiency of sample and data processing are currently being improved to enable clinical implementation of sequencing methods. Single-use sequencing applications are being developed, as are faster methods of DNA extraction and library preparation [28]. Technological and bioinformatic advances are in the pipeline to improve detection of subtle strain-specific variation within the target region [24]. For applying sequencing at the point of care, the portable, low cost, real-time DNA sequencer Oxford Nanopore MinION (Oxford Nanopore Technologies, Oxford, UK) has real potential with its ability to rapidly sequence the bacterial 16S gene, even up to strain-specific resolution [29]. The emergence of real-time sequencing technologies could dramatically influence diagnostic methods through accurate species identification and quantification within a clinically relevant time frame.
Research priorities
To move towards clinical applications, comparative and meta-analyses must combine results from different cohorts to define actionable thresholds of microbial abundance. Current methodological heterogeneity restricts comparability across institutes, and so by underlining essential aspects of study design including consistent sample collection and processing, adequate contamination controls, and longitudinal sampling (summarised in figure 1 and box 1), we hope to encourage reaching a consensus on solid, robust methodology for respiratory microbiota research. Our increased understanding of respiratory disease pathogenesis will contribute to reshaping clinical diagnostic, preventative and therapeutic strategies. Important challenges remain to integrate the advances within microbiota research into everyday medical practice, and future efforts should prioritise standardisation of protocols and analysis, adaptation of technology for application in the field including remote settings, and collaboration across countries and disciplines (box 2). However, current progress in respiratory microbiota research certainly provides a promising platform for the clinical application of culture-independent techniques in the future.
BOX 1 Essentials for respiratory microbiome studies
Longitudinal study design
Appropriate power calculations
Consistent sampling
Appropriate niche (proxy)
Minimise contamination at all stages
Contamination controls at all stages
Robust quality checks
Consistent bioinformatic processing
Suitable analysis techniques (complex data)
BOX 2 Research priorities for future studies
International platforms for communication
Uniform sampling and transport protocols
Standardised controls across laboratories
Agreement on handling complex data
Adapt technology for remote settings
Collaboration between research disciplines (clinics, microbiology, molecular biology, ecology, bioinformatics)
Invest in (interdisciplinary) training
FIGURE 1 Challenges in characterising the respiratory microbiome. Niche-specific communities reside in the different parts of the upper respiratory tract (URT) and lower respiratory tract (LRT), and therefore sampling site should depend on the research question and population studied. During health and acute URT infection (URTI) or LRT infection (LRTI), the LRT is transiently colonised with microbes from the URT (oropharynx for adults, naso- and oropharynx for children), while in chronic lung diseases, local selection and community assembly lead over time to differences between URT and LRT assemblages. In general, the local bacterial density in the respiratory tract is low, further decreasing when descending towards the LRT. Therefore, working with low-biomass samples requires careful sampling procedures and laboratory handling, including appropriate negative and positive controls to acquire reliable results. Microbial development over time is affected by environmental stimuli including crowding factors and pollution, and is altered in various acute and chronic diseases, so repeated sampling and exhaustive data collection of the same subjects over time are required to study cause–consequence relationships and estimate environment-induced variation. Appropriate bioinformatic processing of the sequencing data is required before robust statistical analysis is executed, which preferably accounts for covariates and repeated measures where relevant.
Footnotes
Conflict of interest: R.L. Watson has nothing to disclose.
Conflict of interest: E.M. de Koff has nothing to disclose.
Conflict of interest: D. Bogaert reports personal fees from Friesland Campina, grants from Nutricia, and grants from MedImmune, outside the submitted work.
- Received September 6, 2018.
- Accepted November 5, 2018.
- Copyright ©ERS 2019