Abstract
Background Sarcoidosis is a multisystem granulomatous disease of unknown origin with a variable and often unpredictable course and pattern of organ involvement. In this study we sought to identify specific bronchoalveolar lavage (BAL) cell gene expression patterns indicative of distinct disease phenotypic traits.
Methods RNA sequencing by Ion Torrent Proton was performed on BAL cells obtained from 215 well-characterised patients with pulmonary sarcoidosis enrolled in the multicentre Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study. Weighted gene co-expression network analysis and nonparametric statistics were used to analyse genome-wide BAL transcriptome. Validation of results was performed using a microarray expression dataset of an independent sarcoidosis cohort (Freiburg, Germany; n=50).
Results Our supervised analysis found associations between distinct transcriptional programmes and major pulmonary phenotypic manifestations of sarcoidosis including T-helper type 1 (Th1) and Th17 pathways associated with hilar lymphadenopathy, transforming growth factor-β1 (TGFB1) and mechanistic target of rapamycin (MTOR) signalling with parenchymal involvement, and interleukin (IL)-7 and IL-2 with airway involvement. Our unsupervised analysis revealed gene modules that uncovered four potential sarcoidosis endotypes including hilar lymphadenopathy with increased acute T-cell immune response; extraocular organ involvement with PI3K activation pathways; chronic and multiorgan disease with increased immune response pathways; and multiorgan involvement, with increased IL-1 and IL-18 immune and inflammatory responses. We validated the occurrence of these endotypes using gene expression, pulmonary function tests and cell differentials from Freiburg.
Conclusion Taken together, our results identify BAL gene expression programmes that characterise major pulmonary sarcoidosis phenotypes and suggest the presence of distinct disease molecular endotypes.
Abstract
Genome-wide BAL transcriptomics identified novel gene expression profiles associated with distinct phenotypic traits in sarcoidosis and is suggestive of the presence of novel molecular and clinical sarcoidosis endotypes https://bit.ly/3vf7VfT
Introduction
Sarcoidosis is a granulomatous disease of unknown aetiology which can affect almost every organ, but affects the lungs in majority of the cases (>90%). The patterns of organ involvement and disease course are often unpredictable, but a substantial number of patients suffer either a relapsing or progressive course with mortality estimated at 12% in advanced cases [1–4]. Despite significant advances in understanding the contribution of genetic predisposition, immune aberrations and the presence of microbial antigens in patients with sarcoidosis, little is known about the genetic networks and environmental factors that determine the phenotype in sarcoidosis. Similarly, treatment is still based on immunosuppression, with corticosteroids serving as first-line, and then use of “steroid-sparing” agents with limited evidence of long-term benefit [5, 6]. Genetics and genomics studies on sarcoidosis have focused on identifying DNA variants or gene signatures associated with sarcoidosis [7–13]. Genome-wide association studies have found associations between the major histocompatibility complex region and HLA-DRB1 variants and disease severity and sarcoidosis risk. Most previous whole-transcriptome studies focused on identifying gene signatures that distinguish sarcoidosis from control; progressive from nonprogressive disease; or from other granulomatous diseases such as tuberculosis [7–14]. While highly informative, they were mostly focused on the peripheral blood and limited in size, heterogeneity of disease manifestations and depth of phenotyping.
In this study, we performed a genome-wide transcriptome analysis of bronchoalveolar lavage (BAL) samples collected from a large cohort of well-characterised sarcoidosis patients recruited by the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study [15]. We conducted both supervised and unsupervised analysis on the measured gene expression profiles of these BAL samples to identify gene signatures that are associated with the heterogeneity in the clinical and phenotypic manifestations of sarcoidosis (figure 1). The results suggest the presence of novel molecular and clinical endotypes of sarcoidosis.
Methods
GRADS patient population
BAL samples were available from subjects enrolled in the GRADS study as described previously [15]. After informed consent and recruitment screening (supplementary figure S1), patients were grouped into pre-defined phenotypic groups based on a modified organ assessment instrument developed by the ACCESS study [16] and described by us in detail [15]. The collected phenotypic information according to the GRADS study protocol include physiological parameters, high-resolution computed tomography (CT), BAL and detailed questionnaires assessing dyspnoea, fatigue and quality of life, current and past medical information, occupational and environmental exposures, among others. All recruitment and consenting procedures were compliant with current United States Health Insurance Portability and Accountability Act regulations and approved by the local institutional review board (IRB) [15].
Freiburg validation cohort
BAL gene expression data from Affymetrix Human Gene 1.0 ST arrays were available from 50 individuals with sarcoidosis recruited independently in Freiburg, Germany (table 1, supplementary material). The consents were collected following institutional IRB protocols.
Sample preparation and transcriptomic analysis
Total RNA was extracted as described previously [17] (supplementary material). For GRADS samples, cDNA libraries were generated, amplified and sequenced on the Ion Proton System for next-generation sequencing to obtain sequencing depth of ∼30 million single-end reads per sample with an average read length of 150 bps (supplementary table S1). Gene expression in Freiburg samples was quantified using the Affymetrix Human Gene 1.0 ST arrays (supplementary material).
Statistical analysis
Our analysis plan included a supervised analysis aiming to identify known associations between pre-defined parameters and unsupervised analysis aimed to identify novel relationships. The supervised analysis identified gene signatures associated with pre-defined clinically relevant sarcoidosis phenotypes including Scadding staging, pulmonary function tests (PFTs), CT scan parameters, age, gender, BAL cell counts and the eight phenotypic groups defined by the GRADS study [15] using nonparametric tests (supplementary material). Unless otherwise stated in the text we used the false discovery rate (FDR) to control for multiple testing error. The unsupervised analysis was performed using the weighted gene co-expression network analysis (WGCNA) [18] as described previously by us [19]. Genes from chosen WGCNA modules (p<0.05) were used to cluster the patients into subgroups using K-means clustering (supplementary material). Chi-squared test and Wilcoxon rank sum test were used to assess the difference for categorical and continuous patient characteristics (p<0.05), respectively, among the clusters for chosen gene modules (supplementary material). GeneGO MetaCore was applied to identify significant enriched pathways (FDR<0.05) for each gene module identified by the unsupervised analysis. The data are publicly available on the Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo/) under the accession number GSE109516. All analysis codes and results are available on github page (https://yale-p2med.github.io/SARC_BAL).
Results
Patient cohorts
For the GRADS cohort, based on RNA quality and quantity as well as the quality of RNA-Seq data of BAL samples, we included 209 BAL samples from eight pre-defined clinical phenotype groups [15]: n=25 stage I, n=34 stage II–III treated, n=42 stage II–III untreated, n=19 stage IV treated, n=12 stage IV untreated, n=14 acute sarcoidosis, n=40 remitting and n=23 subjects with multiorgan involvement (supplementary table S1). 53.6% were female and 23.4% were black (table 2). For the Freiburg cohort, 50 BAL samples were available; 36% were female and 100% were white. The clinical information collected included demographics and 12 clinical phenotypic traits overlapping with the GRADS protocol (table 1, supplementary table S3).
Gene expression patterns associated with hilar lymphadenopathy, parenchymal or airway involvement
Using supervised analysis, we assessed the association of gene expression of 209 BAL samples with each of 24 clinical traits that were pre-defined including Scadding staging, PFTs, CT scan parameters, age, gender, BAL cell counts and the eight phenotypic groups. This analysis revealed genes associated with each trait (FDR<0.05) as well as overlapping genes between traits (figure 2a, supplementary material).
Increased Scadding stage was significantly associated with increased expression in 166 genes and decreased expression in 29 genes (figure 2a). Of these genes, many were also significantly associated with CT traits such as hilar lymphadenopathy (17 genes) and reticular abnormalities (83 genes). Increased genes associated with both progressive Scadding staging and more reticular abnormalities include PLA2G7, ID1, LGMN and CCL2, and decreased genes include PDLIM1 and AOC3. SLC40A1 was increased with progressive Scadding staging, more reticular abnormalities and more hilar lymphadenopathy. These genes are known to be involved in fibrosis and chronic obstructive lung disease [20–22]. Genes increased with progressive Scadding staging were enriched for interleukin (IL)-1, IL-8 and IL-6 pathways (figure 2b), previously associated with inflammatory bowel disease [23] and lung fibrosis [20]. Interestingly, many of these genes are also increased with changes in clinical traits not included in the definition of Scadding stage, such as increased bronchial wall thickening (68 genes; e.g. TREM2, CHIT1, LGMN) and decreased forced expiratory volume in 1 s (FEV1) percentage and forced vital capacity (FVC) % predicted (47 and six genes, respectively, e.g. IDI1 and INSIG1), previously reported in sarcoidosis [24, 25].
Among the CT phenotypic traits, reticular abnormality, hilar lymphadenopathy and bronchial wall thickening were significantly associated with large number of genes (757, 266 and 487, respectively; figure 2a). Genes increased in the presence of hilar lymphadenopathy were enriched for T-helper type 1 (Th1) and Th17, interferon (IFN)-γ and nuclear factor of activated T-cells (NFAT) signalling (figure 2b), and included CD28, STAT1, CXCR3 and CCR4, previously reported in sarcoidosis immune response and implicated in pulmonary fibrosis [13, 26]. Genes increased with more severe bronchial wall thickening are enriched for aberrant IL-2 and IL-7 pathways (figure 2b) including MRC2, SLC40A1, F2R, IL7, PTPN7, ADORA2A, SPRY2, PLA2G7 and PTGS1. Reticular abnormalities on CT scan were positively correlated with many known fibrosis-associated genes such as TGFBR1, COL3A1, TLR3, ID1, TCF4, IGFBP6, PLA2G7, FADS1, ARGHAP12 and MMP10 [27–30]. Interestingly, ID1, HMGC1 and SEPP1 were also inversely correlated with diffusing capacity of the lung for carbon monoxide (DLCO) percentage and FVC % pred, potentially reflecting the shared biology underlying physiological restriction, loss of diffusion capacity and reticular abnormalities. Pathway analysis of genes positively correlated with reticular abnormality revealed an enrichment for mechanistic target of rapamycin (MTOR; cytochrome C, SC5D, HIF1A, PPAR-α) and cell cycle (CDC25C, CHK2, CDK1) signalling, which was also seen in genes increased with progressive Scadding staging (figure 2b).
Taken together, our results suggest that different transcriptional programmes affect the three major phenotypic manifestations of pulmonary involvement in sarcoidosis with Th1 and Th17 associated with hilar lymphadenopathy, transforming growth factor-β1 (TGFB1) and MTOR signalling for parenchymal involvement and IL-7 and IL-2 for airway involvement.
Genes associated with race are also associated with hilar lymphadenopathy, parenchymal or airway involvement
308 genes were associated with race (188 increased in white subjects and 120 in black subjects; figure 2a). SLC22A16, NME4, PWP2, ASRGL1 and SCARB1 were the top increased genes in black subjects while CD300C, LAMA1, RNF135, ST14 were the top genes increased in white subjects (figure 2a). Expression of SLC22A16 and PWP2 were previously shown to be race dependent in cancer subjects [31]. Aberrant lipid trafficking, coenzyme A biosynthesis and arachidonic acid production were enriched in the genes higher in black subjects, while no enriched pathway was found for the genes higher in white subjects. Genes increased in black subjects had a significant overlap with genes increased with progressive Scadding stage (MYOZ1, SQRDL, FAM213A) and decreased PFTs (TCEA3, FAM213A, MYO1E, CYP51A1, SQLE), being consistent with findings from previous studies on the importance of race in disease severity [32]. Genes increased with reduced lung function (DLCO %, FVC %, FEV1 %), increased reticular abnormality and in black subjects were associated with SCAP/SREBP transcriptional control of cholesterol and fatty acid biosynthesis (figure 2b), potentially reflecting race-specific pathways of injury.
Genes increasing with higher lymphocyte fraction in BAL reveal shared transcriptional programmes related to hilar lymphadenopathy and bronchial wall thickening
Disease activity is known to be associated with changes in BAL cell composition [33]. While total BAL cell count was associated with only a small number of genes, lymphocyte and macrophage fractions in BAL were associated with the largest number of genes (2435 and 1284, respectively; figure 2a, supplementary material). Among the top genes (Spearman's ρ>0.3, FDR<0.05) positively correlated with lymphocyte count were known markers of T-lymphocyte subpopulations such as CD2, CD3, CD6, CD5, CD96 and CD247 [34–37]. Among other positively correlated genes were markers of lymphocyte activation (ITK, LCK, CD28, CTLA4, IL2RB), granzymes (GZMA, GZMB and GZMH, ETS1), chemokines and their receptors (CCL5, CXCL9, CCR4, CXCR3,4,6), cytokines and their receptors (IFNG, IL6, IL26, IL32, IL2RB,G, IL12RB2, IL15RA, IL18RA, IL21RA) and inflammatory regulators (JAK3, NFATC, NKB2). Genes significantly correlated with lymphocyte count significantly overlapped with those genes associated with bronchial wall thickening, hilar lymphadenopathy and Scadding stage, but not with genes associated with reticular abnormality, PFT and demographics (figure 2a). Among the genes positively correlated with both lymphocyte count and bronchial thickening were CCR5, CCR6, CD84, CD28, IL12RB, IL18R, IL21R and IFNG, potentially reflecting activation of CD4+ T-lymphocytes, T:B lymphocyte interactions, Th1 inflammatory response and susceptibility to sarcoidosis [38–41]. Genes increasing with both increased lymphocyte count and increased hilar lymphadenopathy were LY9, GNAO1, IFNG, F2R, CCR 4/5/8, CD6, CD5 and KIF21B [42]. Genes increased with higher lymphocyte fractions were enriched for numerous immune T-cell responses (Th1/Th17, IFN-γ, OX40L/OX40), follicular Th-cell dysfunction, T-cell co-signalling receptors and systemic lupus erythematosus genetic marker genes (figure 2b, supplementary material).
WGCNA gene modules associate significantly with PFTs, CT imaging features and BAL cell differentials
To identify gene modules associated with important clinical features of sarcoidosis in an unbiased way, we ran a WGCNA analysis. This analysis identified 48 gene modules that correlated with PFTs, CT scan findings (mediastinal and hilar lymphadenopathy, traction bronchiectasis, micronodule, ground glass, reticular abnormality) and BAL cell differentials (supplementary figure S5). Among the 48 modules, five (modules 1, 4, 18, 33 and 47) had the largest number of genes significantly correlated (p<0.05) with the highest number or unique combination of clinical traits (figure 3a).
Module 4 (1626 genes) was positively correlated with most CT imaging features, BAL cell differentials and Scadding staging, suggesting a plausible link between BAL gene expression, increased Scadding stage, increased lymphocyte and eosinophil differentials. This module was negatively correlated with macrophage differential and FEV1/FVC ratio (figure 3a). Module 1 had the largest number of genes (n=7258). It was negatively correlated with all PFT % pred values, and positively correlated with the presence of mediastinal lymphadenopathy and reticular abnormality. No correlation was found for BAL cell differentials or the total BAL cell counts, suggesting the existence of large number of gene signatures associated with lung function regardless of BAL cell differentials. In addition, module 18 (99 genes) was negatively correlated with FEV1 % pred, FVC % pred and FEV1/FVC ratio, as well as macrophage differential. Module 33 (51 genes) was positively correlated with the presence of mediastinal and hilar lymphadenopathy, micronodule, traction bronchiectasis and higher lymphocyte and eosinophil cell differentials (figure 3a). Module 47 contained 21 genes that correlated with sex and PFT but not % pred PFT values, reflecting the effect of sex on PFT. Disease duration, smoking and specific sarcoidosis treatment were not significantly associated with any gene module (figure 3a).
Four gene modules are suggestive of novel molecular endotypes of sarcoidosis
To define novel disease endotypes we performed K-means clustering analysis using genes from each of the WGCNA modules to identify clusters of sarcoidosis patients using 2289 clinical traits collected under the GRADS protocol, including 204 environmental factors relevant for pathogenesis of lung disease and outlined by GRADS investigators (figure 4).
Clustering based on module 47 revealed separation of sarcoidosis patients based on sex in two clusters (A: male, B: female) but no other phenotypic traits (figure 4a).
Clustering based on genes in module 4 identified clusters of patients who differed clinically with one group having significantly more hilar and mediastinal lymphadenopathy, larger lymph nodes, increased Scadding stage, less remitting phenotype and more lymphocytes in BAL (figure 4b, cluster C) than the others. These patients had a history of increased exposure to woodfire smoke and had more skin and kidney involvement. Increased T-cell immune response and decreased signal transduction via cAMP and protein kinase A was observed in this cluster of patients, suggesting that these patients have an acute lymphocytic inflammation (figure 4b, supplementary table E3).
Clustering patients based on genes in module 33 revealed clusters of patients who differed clinically with one group (cluster C; figure 4c), having significantly more lymphocytes in BAL and more lymph organ involvement, more exposure to sand, less fatigue and less work in the house than patients in cluster B. Patients in cluster C were more chronic and had higher multiorgan involvement (skin) than cluster A (figure 4c, supplementary table E3).
Clustering patients based on genes in module 18 identified clusters of patients who differed clinically and environmentally with one distinguished group (cluster C; figure 4d) having significantly more patients from US states Connecticut, New Jersey, New York, Pennsylvania and Tennessee; more lung, joint and kidney involvement; and increased urinary calcium; and were exposed to more wood coal before diagnosis. However, these patients have less mediastinal lymphadenopathy and micronodules than patients in cluster B (figure 4d, supplementary table E3). Increased IL-1 and IL-18 immune and inflammatory responses were observed in cluster C (figure 4d).
Clustering patients based on genes in module 1 revealed clusters of patients who differed clinically and environmentally with one group having significantly more mediastinal lymphadenopathy and reticular abnormality in the lung, less multiorgan phenotype (affected eyes), more environment effects (aluminum) and with more subjects living in US states Arizona and Tennessee. Upregulation of apoptotic, immune response and development pathways related to phosphoinositide 3-kinase (PI3K) activation was observed in the same cluster of patients (cluster A; figure 4e, supplementary table E3).
Validation using an independent cohort
Endotype validation
To validate the endotypes we discovered, we used the genome-wide BAL transcriptome data and clinical traits from an independent cohort of sarcoidosis patients (Freiburg cohort). 12 clinical traits (Scadding stage, age, gender, BAL cell differential and PFTs) were available for both the GRADS and the Freiburg cohorts (table 1). Patients in both cohorts were clustered using genes from each of the four novel modules and gender module independently and revealed a similar gene expression pattern visually (figure 4). The two extreme patient clusters defined by each module in both cohorts were compared for the 12 clinical common traits. Validated traits were considered if p-values from the two cohorts were either both significant (p<0.05) or both insignificant (p>0.05) (table 3 using original data, supplementary table S3 using data adjusted for BAL cell differentials). This analysis showed that the endotype of sex (module 47) was fully validated for its significant association with sex. Both modules 4 and 33 were mainly defined by CT scan features (figure 3a), which were not available in the Freiburg cohort for validation. However, for both modules, the significant association with macrophage, lymphocyte and neutrophil differential were validated and no association with age and PFTs (basal and predicted) was found in either cohort. Module 18, a chronic sarcoidosis endotype, was validated for its association with macrophage and lymphocyte differential. No association with any other traits was found in either cohort. Module 1, an extraocular organ involvement and PI3K activation, had a significant but weak association with PFT % pred values (figure 3a, table 3). This association was not validated possibly due to the weak association strength. No association with age, gender, Scadding stage, cell differentials, PFTs (basal) and FEV1/FVC ratio was found in either cohort. In general, validation for modules 47, 4, 33 and 18 were all significant with a less stringent significance level (hypergeometric test p<0.1) chosen due to the small number of overlapping features (table 3). Gene expression data adjustment for BAL cell differentials removed correlation with important clinical traits (supplementary table S3), suggesting that cell differentials are indeed indicative of sarcoidosis severity.
Phenotypic trait validation using supervised analysis
We also examined the overlap in genes significantly associated with each of the 12 overlapping traits between GRADS and Freiburg cohorts for validation of individual phenotypic trait in supervised analysis. Due to the small sample size of Freiburg cohort, we considered genes with a p-value <0.05 as significant in each cohort for comparison. Significant overlap was observed in five out of the 10 phenotypic traits (Scadding stage, neutrophils %, lymphocytes %, FVC %, FEV1 %; supplementary table S4). 1682 genes overlapped for Scadding with SPRY2, PLA2G7, CHIT1, RFTN1 and MMP9 among the most positively correlated. For FVC % pred and FEV1 % pred in both cohorts (regardless of endotype), we found 16 and 39 genes, respectively (Chi-squared p-values 7.3×10−15 and 1.5×10−2, respectively; supplementary table E4). Among the top negatively correlated genes with FVC % pred in both cohorts were COL1A2, IGFBP6, MAT2A, AQP9, GJA1 and EPDR1. The 39 genes highly negatively correlated with FEV1 % pred included CYP51A1, FADS1, COL1A1, HSPA7, LDLR and NFKB1. SIGLEC6, GIMAP6, GEMIN7, RAPGEF5, A2M and GHRL were among the most positively correlated with lymphocyte percentage, suggestive of increased inflammatory and immune response.
The substantial replication of the results despite differences in cohorts support the validity of our results.
Discussion
Our study based on genome-wide transcriptome analysis of 209 BAL samples represents the largest effort to examine the BAL transcriptome in sarcoidosis with pulmonary involvement. Using both supervised and unsupervised analysis, our study revealed specific gene profiles correlated with activity and severity of disease including immune, inflammatory and profibrotic mediators. Most importantly, our study identified four new groups of sarcoidosis patients with specific molecular, clinical and environmental characteristics.
The unsupervised analysis identified four gene modules that were strongly correlated with CT imaging features of pulmonary involvement, Scadding stage and BAL cell differentials. These modules further identified groups of patients with 1) hilar lymphadenopathy and acute lymphocytic inflammation, 2) extraocular organ involvement and PI3K activation, 3) chronic sarcoidosis and 4) multiorgan involvement with increased immune response. Identification of BAL gene modules represents a robust and novel molecular approach to address clinical heterogeneity of sarcoidosis patients initially classified into eight clinical phenotypes in the GRADS study [15]. BAL gene modules had the strongest correlation with multiple clinical features suggestive of pulmonary involvement, which might be the top factor to consider for further patient phenotyping, as suggested previously [43]. The gender response gene module clearly separated female from male patients based on expression of 15 out of 21 genes, reflecting the accuracy and sensitivity of our method. The same gene modules were confirmed in the Freiburg cohort and were suggestive of the same endotypes regardless of clinical and demographic differences present between two cohorts. Previously, unsupervised clustering approaches used solely clinical patient characteristics to successfully subgroup sarcoidosis patients based on organ involvement (lung, abdominal, heart, muscle and extrapulmonary) or to modulate sarcoidosis based on personal and environmental factors [44, 45]. Our study is the first to combine multiple clinical and environmental factors with BAL transcriptome data, in an unsupervised approach, to phenotype sarcoidosis with pulmonary involvement.
The supervised analysis identified numerous divergent and convergent gene expression patterns associated with hilar lymphadenopathy, parenchymal or airway involvement in sarcoidosis. Our results suggest that different transcriptional programmes affect the three major phenotypic manifestations of pulmonary involvement in sarcoidosis with Th1 and Th17 associated with hilar lymphadenopathy, TGFB1 and MTOR signalling for parenchymal involvement and IL-7 and IL-2 for airway involvement. These responses were among identified pathways known to be involved in pathogenesis of sarcoidosis [7, 9, 46, 47]. An increase in reticular abnormality observed in chest radiology of sarcoidosis patients was associated with increased molecular signalling in BAL such as growth factor, apoptosis/survival, mTORC1 and immune CD28 signalling. The activation of these signalling pathways is suggestive of BAL cells involvement in granuloma formation and fibrosis in sarcoid lungs [48–50]. Genes associated with race (black) were also associated with hilar lymphadenopathy, aberrant lipids pathways, parenchymal or airway involvement in sarcoidosis when race was used as individual variable in supervised analysis, but not in combination with other clinical and environmental traits in unsupervised analysis. Thus, our data do not allow determination of whether gene expression differences are race specific or directly related to the presence of sarcoidosis, as described previously [51], or due to differences in disease severity and manifestations. Genes increasing with increased lymphocyte fraction in BAL reveal shared transcriptional programmes related to hilar lymphadenopathy and bronchial wall thickening. Genes increasing with increased macrophage fraction in BAL are associated with increased cell differentiation and development pathways such as PI3K/AKT, MAPK and BMP7 signalling, suggesting indirectly that these BAL macrophages were in contact with inflammatory sites and granuloma core in lungs [52, 53]. Genes positively correlated with decreased eosinophil fraction in BAL were associated with decreased airway thickness and could reflect the level of lung inflammation, as well as on severity and progression of sarcoidosis.
Our study has several limitations including sample size, heterogeneity of clinical and environmental features and confounding effect of BAL cell differentials. Although our study collected a relatively large sample size for transcriptomics analysis, the original design that aimed for equal recruitment of patients with a wide range of presentations of sarcoidosis [15] resulted in a substantial fragmentation and heterogeneity of the patient populations. Despite these limitations we have been able to identify statistically significant gene expression patterns that associated with clinical attributes in our supervised analysis, and four distinct endotypes in our unsupervised analysis. While the validation cohort was different from our cohort in size, racial and clinical diversity, and clinical information collected, we were able to replicate our major findings, supporting future more detailed prospective validations of our findings. Another key limitation of our study is the impact of the BAL cell differentials on gene expression data. In many of our findings it was impossible to distinguish between the effect of gene expression changes, and the effect of change in cell counts, as has happened for gene modules 4 (hilar lymphadenopathy and acute lymphocytic inflammation) and 33 (multiorgan involvement with increased immune response) but not for modules related to sex, chronic sarcoidosis (module 18) or extraocular organ involvement (module 1). While an important distinction, this does not detract from the value of the information, as clearly cell counts do not provide any information regarding molecular mechanisms and pathways, whereas our data provide the information about the genes differentially expressed and relevant to distinct clinical attributes regardless of the cells involved. Considering the statistically significant and biological plausible gene expression signals we uncovered for sarcoidosis clinical attributes and endotypes, our results support future studies utilising novel technologies such as single-cell RNA sequencing [54] to better understand the mechanistic role of our discoveries. Finally, our study was not designed to specifically study the effects of race, disease duration or specific therapy on gene expression and indeed we did not find significant gene expression changes associated with these attributes. Our results suggest that treatment [55] and race effects [56] should be studied specifically, prospectively in studies using repeated sampling and single cell technologies as previously mentioned.
In summary, our study identified gene profiles associated with major phenotypic manifestations of pulmonary involvement in sarcoidosis, as well as identified four novel endotypes that help to better stratify patients in the future. Our findings support the design of future studies to focused on specific attributes of sarcoidosis, and the use of novel single cell profiling technologies leading to novel therapeutic interventions and biomarkers.
Supplementary material
Supplementary Material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERJ-02950-2020.SUPPLEMENT
Supplementary table E1 ERJ-02950-2020.Table_E1
Supplementary table E2 ERJ-02950-2020.Table_E2
Supplementary table E3 ERJ-02950-2020.Table_E3
Supplementary table E4 ERJ-02950-2020.Table_E4
Shareable PDF
Supplementary Material
This one-page PDF can be shared freely online.
Shareable PDF ERJ-02950-2020.Shareable
Acknowledgements
We thank all patients participating in the GRADS study for contributing samples.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Author contributions: N. Kaminski, L.L. Koth, D.R. Moller, K.F. Gibson and W.P. Drake conceived and designed the experiments; K.F. Gibson, M. Gulati, M.J. Becich, H. Hochheiser, E.L Herzog, E.S. Chen, A. Morris , J.K. Leader, J.G.N. Garcia, S.R. Wisniewski, L.A. Maier, D.R. Moller, R.G. Collman and W.P. Drake conducted patient phenotyping, classification, supervised sample and data collection; M. Vukmirovic, T.S. Adams, T.N. Woolard and G. Deluliis performed the RNA sequencing experiments; J.C. Schupp and A. Prasse collected and generated the microarray data of the Freiburg cohort for validation; X. Yan, M. Vukmirovic, N. Kaminski, B. Hu, A. Mihaljinec, Y. Zhang, N. Emeagwali, P.V. Benos, J.C. Schupp and A. Prasse analysed the data; N. Kaminski, X. Yan, M. Vukmirovic and L.L. Koth supervised the analytical plan; M. Vukmirovic, X. Yan and N. Kaminski wrote the manuscript with input from all other authors. All authors have read and approved the manuscript.
Conflict of interest: M. Vukmirovic has nothing to disclose.
Conflict of interest: X. Yan has nothing to disclose.
Conflict of interest: K.F. Gibson has nothing to disclose.
Conflict of interest: M. Gulati reports grants from NIH, during the conduct of the study; personal fees for advisory board work and other (PI/publication committee) from Boehringer Ingelheim, other (lectures) from France Foundation, other (PI/centre director) from Pulmonary Fibrosis Foundation, grants from NIH and Sarcoidosis Research Foundation, outside the submitted work.
Conflict of interest: J.C. Schupp has nothing to disclose.
Conflict of interest: G. DeIuliis has nothing to disclose.
Conflict of interest: T.S. Adams has nothing to disclose.
Conflict of interest: B. Hu has nothing to disclose.
Conflict of interest: A. Mihaljinec has nothing to disclose.
Conflict of interest: T.N. Woolard has nothing to disclose.
Conflict of interest: H. Lynn has nothing to disclose.
Conflict of interest: N. Emeagwali has nothing to disclose.
Conflict of interest: E.L Herzog reports grants from NIH, Sanofi, Bristol Myers and Promedior, personal fees for consultancy from Boehringer Ingelheim and Pfizer, outside the submitted work.
Conflict of interest: E.S. Chen has nothing to disclose.
Conflict of interest: A. Morris reports grants from NIH, during the conduct of the study.
Conflict of interest: J.K. Leader has nothing to disclose.
Conflict of interest: Y. Zhang has nothing to disclose.
Conflict of interest: J.G.N. Garcia has nothing to disclose.
Conflict of interest: L.A. Maier grants from NIH (1U01 HL112695-01, U01 HL112707-03) and NIH/NCRR (UL1TRR002535), during the conduct of the study; grants from National Institutes of Health (1R01 HL127461-01A1, R01HL136681-01A1, 1R01 HL140357-01A1, R01HL136681-01A1), FSR, University of Cinncinati under a Mallinckrodt foundation, MNK14344100, ATYR1923-C-002, outside the submitted work; and is a member of the FSR scientific advisory board, for which no compensation is received.
Conflict of interest: R.G. Collman reports grants from National Institutes of Health, during the conduct of the study.
Conflict of interest: W.P. Drake has nothing to disclose.
Conflict of interest: M.J. Becich reports grants from NCATS, NCI, PCORI, NHLBI and CDC NIOSH, other (startup) from SpIntellx, during the conduct of the study; other (startup) from SpIntellx, outside the submitted work; and has patents SpIntellx (multiple) pending.
Conflict of interest: H. Hochheiser has nothing to disclose.
Conflict of interest: S.R. Wisniewski has nothing to disclose.
Conflict of interest: P.V. Benos has nothing to disclose.
Conflict of interest: D.R. Moller reports grants from NHLBI (1U01HL112708), during the conduct of the study; personal fees for consultancy from Merck, aTYR and Roivant, personal fees for advisory board work from SarcoMed, personal fees for consultancy/witness from Legal Expert, other (royalties) from Hodder Education and Taylor & Francis Group, outside the submitted work; has patents number 9,683,999 B2 issued, and number 9,977,029 B2 issued; is Chairman and Chief Technical Officer of Sarcoidosis Diagnostic Testing, LLC (a company whose goal is to develop a diagnostic blood test for sarcoidosis) and has received funding including past salary support under the NHLBI STTR programme, grant R41 HL129728 more than 3 years ago; and is a former member of the Scientific Advisory Board of the Foundation for Sarcoidosis Research.
Conflict of interest: A. Prasse reports personal fees for lectures and consultancy and non-financial support for meeting attendance from Boehringer Ingelheim and Roche, personal fees for lectures from Novartis and AstraZeneca, personal fees for consultancy from Amgen, Pliant and Nitto Denko, outside the submitted work.
Conflict of interest: L.L. Koth has nothing to disclose.
Conflict of interest: N. Kaminski reports personal fees for consultancy and/or advisory board work from Biogen Idec, Boehringer Ingelheim, Third Rock, Samumed, NuMedii, Indaloo, Theravance, LifeMax, Three Lake Partners, RohBar and Pliant, non-financial support from Miragen, equity with Pliant, a grant from Veracyte; all outside the submitted work; and has a patent New Therapies in Pulmonary Fibrosis and on Peripheral Blood Gene Expression that have been licensed to Biotech.
Support statement: This work is supported by NIH grants: U01 HL112707, U01 HL112694, U01 HL112695, U01 HL112696, U01 HL112702, U01 HL112708, U01 HL112711, U01 HL112712, UL1 RR029882, UL1 RR025780, R01 HL110883, R01 HL114587, R01 HL127349, U01 HL137159, CTSI U54 grant 9 UL1 TR000005, CDC NMVB 5U24 OH009077, CTSA UL1 TR002535, R21 LM012884, German Network for Lung Research (DZL/ BREATH) and Fraunhofer CIMD Forschungscluster. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received July 28, 2020.
- Accepted April 20, 2021.
- Copyright ©The authors 2021. For reproduction rights and permissions contact permissions{at}ersnet.org