Abstract
Surfactant protein D (SP-D) is produced primarily in the lung and is involved in regulating pulmonary surfactants, lipid homeostasis and innate immunity. Circulating SP-D levels in blood are associated with chronic obstructive pulmonary disease (COPD), although causality remains elusive.
In 4061 subjects with COPD, we identified genetic variants associated with serum SP-D levels. We then determined whether these variants affected lung tissue gene expression in 1037 individuals. Finally, within a Mendelian randomisation framework, the serum SP-D-associated variants were tested for association with COPD risk in 11 157 cases and 36 699 controls and with the 11-year decline in lung function in the 4061 individuals.
Three regions on chromosomes 6 (human leukocyte antigen region), 10 (SFTPD gene) and 16 (ATP2C2 gene) were associated with serum SP-D levels at genome-wide significance. In Mendelian randomisation analyses, variants associated with increased serum SP-D levels decreased the risk of COPD (estimate −0.19, p=6.46×10⁻³) and slowed lung function decline (estimate 0.0038, p=7.68×10⁻³).
Leveraging the effects of genetic variation on protein levels, lung gene expression and disease phenotypes provided novel insights into SP-D biology and established a causal link between increased SP-D levels and protection against COPD risk and progression. SP-D represents a very promising biomarker and therapeutic target for COPD.
Tweetable abstract
Surfactant protein D is a causal risk factor for COPD http://ow.ly/n1OG30eUQlf
Introduction
Chronic obstructive pulmonary disease (COPD) is currently the third leading cause of death worldwide [1]. The disease is characterised by airway obstruction that is not fully reversible, and is caused by a complex interaction between genetic and environmental risk factors (e.g. smoking) [2]. Although one-third of the variability in lung function can be attributed to genetic factors, the genes that directly control the risk of COPD remain largely unknown [3].
Surfactants are a critical class of molecules involved in lung growth and homeostasis. One surfactant-associated protein that shows correlation with COPD, even in the absence of cigarette smoke, is surfactant protein D (SP-D) [4]. SP-D is a large multimeric collagenous glycoprotein assembled from ∼43 kDa monomers and belongs to the collectin family of proteins [5]. SP-D is produced primarily by type II alveolar cells, as well as by club cells [6, 7]. SP-D plays an important role in regulating pulmonary surfactants, maintaining lipid homeostasis in the lung [8] and promoting innate immunity [9] to protect the lungs from microbial and chemical insults [10]. SP-D deficiency in mice leads to alveolar macrophage activation, increased oxidant stress in the airways and emphysematous changes in lung parenchyma by 3 weeks of age [11]. In humans, single-nucleotide polymorphisms (SNPs) in the surfactant protein D (SFTPD) gene on chromosome 10 have been associated with COPD in several different cohorts [12]. The most-studied SFTPD variant (rs721917) is a missense SNP that results in the substitution of threonine for methionine at amino acid 11 (Met11Thr) [13]. This SNP is associated with SP-D serum levels and emphysema [14]. COPD is associated with reduced levels of SP-D in bronchial and bronchoalveolar lavage (BAL) fluids [15], as well as increased serum SP-D levels [16]. Despite these associations, it remains uncertain whether SP-D is merely a biomarker of COPD or part of the causal pathway leading to COPD.
One way of evaluating causality in biomarker-disease associations is to use Mendelian randomisation (MR), which uses genetic variation to ascertain causal associations [17–19]. MR exploits two attributes of genotypes that enable robust causal inference: 1) the random allocation of parental alleles to gametes at meiosis, independent of environmental exposures later in life; and 2) the unidirectional flow of biological information from gene to transcript to protein, which avoids reverse causation [20]. Within the MR framework, if the relationship between SP-D and COPD is truly causal, the genetic variant(s) influencing serum SP-D levels should also be significantly associated with COPD risk and/or decline in forced expiratory volume in 1 s (FEV1) over time.
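As a minimal illustration of this logic, the simplest MR estimator for a single variant is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The Python sketch below uses hypothetical summary statistics (not values from this study), and its standard error is the common first-order approximation that ignores uncertainty in the variant–exposure estimate.

```python
def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> tuple[float, float]:
    """Single-variant MR estimate (Wald ratio) with a first-order standard error.

    beta_exposure: per-allele effect of the variant on the exposure (e.g. log serum SP-D)
    beta_outcome:  per-allele effect of the same variant on the outcome (e.g. COPD log odds)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure; ratio is undefined")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: ignores the uncertainty in beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics for one variant (not taken from the study)
est, se = wald_ratio(beta_exposure=0.30, beta_outcome=-0.06, se_outcome=0.02)
print(f"causal estimate {est:.3f} (SE {se:.3f})")  # prints: causal estimate -0.200 (SE 0.067)
```

A variant that raises the exposure but lowers the outcome yields a negative ratio, which is the pattern expected if higher SP-D is protective.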
In the present study, we unravelled the genetic architecture of SP-D serum levels in a large cohort of COPD patients to determine SP-D protein quantitative trait loci (pQTLs) and mRNA expression quantitative trait loci (eQTLs) and used MR to demonstrate a causal relationship between serum SP-D and two important phenotypes: COPD risk and FEV1 decline over time.
Methods
Overall study design
The overall study design is depicted in figure 1. First, we identified pQTLs for SP-D using a genome-wide association study (GWAS) of serum SP-D levels in COPD patients who participated in the Lung Health Study (LHS) [21]. Second, using data from the Lung eQTL Consortium [22], we determined which of these pQTLs were also associated with changes in gene expression (eQTLs) in lung tissue. In addition, we used gene expression data from lung tissue to identify networks of genes strongly related to SFTPD. Third, we determined which of the SP-D pQTLs were associated with risk of COPD and with COPD progression, defined by the rate of decline in FEV1 over 11 years. Finally, we applied an MR framework to determine whether there was a causal relationship between serum SP-D levels and COPD risk or progression.
The Lung Health Study
The details of the LHS have been published previously [21, 23]. Briefly, the LHS was a multicentre clinical study that evaluated the effects of ipratropium bromide and smoking cessation on lung function decline in current smokers with mild to moderate COPD. Participants were recruited from 10 sites, nine in the United States and one in Canada, as previously described [21, 23]. Lung function was measured annually for the first 5 years, with a further measurement at year 11. In year 5 of the LHS, venipuncture was performed on 5413 participants. Genotyping was performed using buffy coat samples from 4251 European Americans in the LHS. The details of genotyping and quality control have been described previously [24]. Briefly, samples were genotyped using the Human 660WQuad v.1_A BeadChip (Illumina, San Diego, CA, USA). Imputation was undertaken on the Michigan Imputation Server [25] with the Haplotype Reference Consortium [26] reference panel. Variants were excluded if the imputation r² was <0.7 or the minor allele frequency was <1%.
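The variant filter described above removes a variant if it fails either threshold. A minimal Python sketch of that rule follows; the `ImputedVariant` record and the rsIDs are hypothetical placeholders, and keeping variants that sit exactly on a threshold (r² of 0.7, MAF of 1%) follows from the "<" wording of the exclusion criteria.

```python
from dataclasses import dataclass


@dataclass
class ImputedVariant:
    rsid: str
    maf: float            # minor allele frequency, between 0 and 0.5
    imputation_r2: float  # imputation quality estimate reported by the imputation server


def passes_qc(v: ImputedVariant, min_maf: float = 0.01, min_r2: float = 0.7) -> bool:
    """Keep a variant only if it meets BOTH thresholds; failing either one excludes it."""
    return v.maf >= min_maf and v.imputation_r2 >= min_r2


# Hypothetical variants, for illustration only
variants = [
    ImputedVariant("rs_hypothetical_1", maf=0.25, imputation_r2=0.95),  # kept
    ImputedVariant("rs_hypothetical_2", maf=0.004, imputation_r2=0.99),  # too rare
    ImputedVariant("rs_hypothetical_3", maf=0.30, imputation_r2=0.55),  # poorly imputed
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)  # ['rs_hypothetical_1']
```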
Serum SP-D measurements in the LHS
SP-D levels were measured in duplicate in 4754 LHS participants, using one aliquot of serum from blood drawn at the year 5 visit and a commercially available solid-phase sandwich ELISA according to the manufacturer's instructions (BioVendor Laboratory Medicine, Modrice, Czech Republic).
pQTLs of serum SP-D: GWAS of SP-D serum levels
We performed a GWAS for log serum SP-D levels using SNPTEST [27] assuming an additive genetic model and adjusting for age, sex, body mass index (BMI), smoking status and the first five genetic principal components. The final GWAS dataset included 4041 subjects. Conditional analysis was performed to reveal independent SNPs in the significant regions associated with SP-D levels (online supplementary material). Using this approach, we identified three additional independent SNPs on the chromosome 10 locus.
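For each variant, the model described above is a linear regression of log SP-D on allele dosage (additive coding) plus covariates. The pure-Python sketch below shows that regression for one simulated SNP. It is not SNPTEST: it uses ordinary least squares with a normal approximation for the p-value, and for brevity it adjusts only for age and sex rather than the full covariate set (BMI, smoking status and five genetic principal components). All data and effect sizes are simulated.

```python
import math
import random


def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [xr - f * xc for xr, xc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def snp_association(dosage, covariates, log_spd):
    """OLS of log SP-D on intercept + allele dosage + covariates; returns (beta, se, p) for dosage."""
    X = [[1.0, d, *c] for d, c in zip(dosage, covariates)]
    n, p = len(X), len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(X, log_spd)) for a in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y - sum(ca * xa for ca, xa in zip(coef, r)) for r, y in zip(X, log_spd)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    se = math.sqrt(sigma2 * inv[1][1])  # index 1 is the dosage column
    z = coef[1] / se
    pval = math.erfc(abs(z) / math.sqrt(2))  # two-sided p, normal approximation
    return coef[1], se, pval


# Simulated cohort (hypothetical): allele frequency 0.3, per-allele effect 0.3 on log SP-D
rng = random.Random(42)
dosage, covs, y = [], [], []
for _ in range(2000):
    d = sum(rng.random() < 0.3 for _ in range(2))  # allele count 0, 1 or 2
    age, sex = rng.uniform(40, 60), float(rng.random() < 0.4)
    dosage.append(d)
    covs.append([age, sex])
    y.append(4.0 + 0.3 * d + 0.01 * age + 0.1 * sex + rng.gauss(0, 0.3))

beta, se, pval = snp_association(dosage, covs, y)
print(f"beta={beta:.3f}, se={se:.4f}, p={pval:.2e}")
```

The same function also supports a simple conditional analysis: appending the lead SNP's dosage to each participant's `covariates` row tests whether a second SNP remains associated once the lead signal is accounted for.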
Association of SP-D pQTLs with lung tissue expression in the lung eQTL study
We used data from the lung eQTL study to identify potential effects of SP-D pQTLs on lung gene expression. The study details and the subjects' characteristics have been described previously [22]. Briefly, lung eQTLs were derived from a meta-analysis of genotyping and mRNA expression data generated from nontumour lung tissue samples of 1037 patients who underwent lung resection surgery. The eQTL analysis was adjusted for age, sex and smoking status. Genome-wide significant pQTLs for serum SP-D levels were tested for association with lung gene expression (i.e. lung eQTLs).
Association of SP-D pQTLs with COPD risk and progression
The relationship between SNPs identified as pQTLs for serum SP-D and COPD risk was investigated in the International COPD Genetics Consortium (ICGC) study. The ICGC study is the largest COPD GWAS to date, and the lookup was performed in GWAS results for individuals of European ancestry (11 157 cases and 36 699 controls) [28].
In addition, SP-D pQTLs were tested for association with FEV1 decline over the 11 years of follow-up in the LHS using a multivariable linear mixed-effects model. All analyses in the LHS were adjusted for age, sex, BMI and smoking status, as noted previously.
Mendelian randomisation analysis
The independent pQTLs for SP-D identified in this study were used as genetic instruments in the MR analyses. We used two MR methods: inverse variance weighting (IVW) and MR-Egger. The IVW approach regresses the genetic associations with the outcome (COPD risk from the ICGC study and lung function phenotypes in the LHS) on the genetic associations with serum SP-D levels in an inverse variance weighted linear regression with the intercept constrained to zero, while accounting for any correlation (linkage disequilibrium) between the tested variants [29, 30]. The MR-Egger approach [31, 32] allows for pleiotropy (SNPs associated with multiple unrelated phenotypes) by using the same weighted regression but leaving the intercept unconstrained. The intercept represents the average pleiotropic effect of the tested SNPs on the outcome; if it differs from zero (the MR-Egger test), there is evidence of directional pleiotropy [32]. In both approaches, we additionally tested for heterogeneity to assess whether the SNPs' effects on the outcome were homogeneous.
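The two estimators can be sketched in a few lines of Python operating on per-variant summary statistics. This is an illustrative sketch under simplifying assumptions, not the implementation used in the study: it treats the variants as uncorrelated (the analysis described above additionally accounts for linkage disequilibrium), uses first-order weights 1/SE², reports Cochran's Q as the heterogeneity statistic and omits standard errors for the MR-Egger terms. The input numbers are hypothetical.

```python
import math


def ivw(bx: list[float], by: list[float], se_y: list[float]) -> tuple[float, float, float]:
    """Fixed-effect IVW estimate: weighted regression of by on bx through the origin.

    Returns (estimate, standard error, Cochran's Q). Assumes uncorrelated variants.
    """
    w = [1 / s**2 for s in se_y]
    den = sum(wi * x * x for wi, x in zip(w, bx))
    est = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / den
    se = 1 / math.sqrt(den)
    # Cochran's Q: weighted spread of the variant-level effects around the common IVW slope
    q = sum(wi * (y - est * x) ** 2 for wi, x, y in zip(w, bx, by))
    return est, se, q


def mr_egger(bx: list[float], by: list[float], se_y: list[float]) -> tuple[float, float]:
    """MR-Egger: the same weighted regression, but with a free intercept.

    Returns (slope = causal estimate, intercept = average directional pleiotropy).
    """
    # Orient each variant so its effect on the exposure is positive (the outcome effect flips too)
    x = [abs(b) for b in bx]
    y = [yi if b >= 0 else -yi for b, yi in zip(bx, by)]
    w = [1 / s**2 for s in se_y]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical summary statistics for six variants (NOT the study's values)
bx = [0.10, 0.25, 0.40, 0.15, 0.30, 0.05]              # per-allele effects on log SP-D
by = [-0.020, -0.048, -0.081, -0.031, -0.058, -0.009]  # per-allele effects on COPD log odds
se_y = [0.010, 0.012, 0.015, 0.010, 0.013, 0.011]
est, se, q = ivw(bx, by, se_y)
slope, intercept = mr_egger(bx, by, se_y)
print(f"IVW {est:.3f} (SE {se:.3f}); Cochran's Q = {q:.2f} on {len(bx) - 1} df")
print(f"MR-Egger slope {slope:.3f}, intercept {intercept:.4f}")
```

Under the null of no heterogeneity, Q is compared with a χ² distribution on (number of variants − 1) degrees of freedom.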
Weighted gene co-expression analysis
To identify genes whose expression in the lung correlated with that of SFTPD, the weighted gene co-expression network analysis (WGCNA) R package [33] was used to cluster co-expressed genes into modules using 1037 lung tissue samples. Details of the WGCNA approach are available in the online supplementary material. Briefly, modules are defined by co-expression patterns, and “hub” genes are those most highly correlated with their module's summary expression profile (the module eigengene), a quantity referred to as module membership.
Statistical analysis software
All analyses were performed using R (version 3.2.1; www.r-project.org). Detailed and additional methods are available in the online supplementary material.
Results
Descriptive demographics of LHS participants
The demographics of LHS participants across quintiles of serum SP-D levels are shown in table 1. Higher serum SP-D levels were significantly associated with older age, lower BMI, lower FEV1 and smoking status (continuous smokers had higher average serum SP-D levels than sustained ex-smokers).
GWAS of serum SP-D levels
Variants with minor allele frequency <1% or imputation quality <0.7 were filtered out, leaving 7 807 998 variants in this GWAS analysis. Quantile–quantile plots are presented in online supplementary figure S1 and showed a deviation from the expected distribution only for low p-values, indicating strong association signals. The genomic inflation factor (λ) was 1.02, suggesting no systematic deviation in the association statistics due to factors such as population structure.
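The genomic inflation factor is the median of the observed 1-df association χ² statistics divided by the median expected under the null (≈0.455). A minimal Python sketch that starts from two-sided p-values is shown below; it assumes 1-df tests, as for the additive single-SNP model used here, and the example p-values are synthetic.

```python
import statistics
from statistics import NormalDist


def genomic_inflation(p_values: list[float]) -> float:
    """lambda_GC = median observed 1-df chi-square / median expected under the null (~0.455)."""
    std_normal = NormalDist()
    # Two-sided p -> 1-df chi-square via z = Phi^-1(p/2); using the lower tail keeps very
    # small p-values (e.g. 1e-99) from rounding 1 - p/2 to exactly 1.0
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return statistics.median(chi2) / expected_median


# Null-like p-values spread evenly over (0, 1) should give lambda close to 1
null_p = [(i + 0.5) / 10_000 for i in range(10_000)]
print(round(genomic_inflation(null_p), 3))  # prints 1.0
```

Values well above 1 would point to systematic inflation (for example, from population stratification); the reported λ of 1.02 is close to the null expectation.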
A total of 1633 SNPs representing three regions on chromosomes 6, 10 and 16 achieved genome-wide significance (p<5×10⁻⁸). The Manhattan plot is shown in figure 2 and the region plots are shown in figure 3. The most significantly associated pQTL was an intronic SNP (rs34406153) in the SFTPD gene (p=2.02×10⁻⁹⁹). The sentinel SNP in the second associated locus was an intergenic SNP (rs12660817) in the human leukocyte antigen (HLA) region on chromosome 6 (p=1.00×10⁻⁵⁵). The third locus harboured an intronic SNP (rs9927461) in the ATPase secretory pathway Ca²⁺ transporting 2 (ATP2C2) gene on chromosome 16 (p=3.36×10⁻¹⁸).
Conditional analyses on chromosome 10 identified three additional independent signals exceeding genome-wide significance (online supplementary figure S2). The regions on chromosomes 6 and 16 did not harbour any additional independent signals at the p<5×10⁻⁸ threshold. Genome-wide significant pQTLs for serum SP-D in these three regions are presented in table 2. SNP rs721917, which the ICGC recently reported to be associated with COPD [28], was a pQTL for serum SP-D (p=2.34×10⁻¹³). This SNP was in modest linkage disequilibrium with the other sentinel SNPs in the chromosome 10 region, with r² values ranging from 0.16 (rs726014) to 0.35 (rs12358676).
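For reference, pairwise linkage disequilibrium r² between two biallelic SNPs can be computed from phased haplotypes as D²/[pA(1−pA)pB(1−pB)], where D = pAB − pA·pB. The text does not state how the reported r² values were obtained (for example, from study genotypes or a reference panel), so the Python sketch below simply applies this textbook definition to toy haplotypes.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes (1 = allele of interest).

    r^2 = D^2 / (pA(1-pA) pB(1-pB)), with D = pAB - pA*pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("one of the SNPs is monomorphic in this sample")
    return d * d / denom


# Toy haplotype sets (hypothetical)
print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))            # 1.0: alleles always travel together
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))            # 0.0: no association
print(round(ld_r2([(1, 1), (1, 1), (1, 0), (0, 0)]), 3))  # 0.333
```

Values between 0.16 and 0.35, as reported for rs721917, therefore indicate that each pair of SNPs shares only a small fraction of its variation, which is why they can behave as separate signals in conditional analysis.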
The effect of SP-D pQTLs on lung mRNA expression
Given that SP-D is predominantly synthesised in the lung, the three loci harbouring pQTLs for serum SP-D were evaluated in lung tissue to determine whether these regions also affected lung gene expression (i.e. were lung eQTLs) [22]. The top eQTL results (the lowest eQTL p-value per gene) for each locus are shown in table 3. The pQTL SNPs in the SFTPD locus on chromosome 10 were also lung eQTLs for SFTPD, and the SNPs' effects on serum protein levels were significantly related to their effects on mRNA expression in lung tissue (figure 4). We additionally report in table 2 the effect of the SP-D pQTLs identified in our GWAS on lung tissue expression of SFTPD, the gene encoding SP-D (i.e. lung eQTLs for SFTPD). Three of the four pQTLs in table 2 were significant eQTLs for SFTPD expression in lung tissue (p<0.01), with the same direction of effect as on serum protein levels. Together, these data suggest that SNPs in the SFTPD gene region affect serum SP-D by altering gene expression in lung tissue. In contrast, the pQTL SNPs on chromosomes 6 and 16 were not related to SFTPD mRNA levels in the lung, indicating that they affect serum levels through other mechanisms (online supplementary figure S3).
Mendelian randomisation analyses
The six pQTL SNPs for serum SP-D levels were tested for association with COPD risk and progression (defined by FEV1 decline over 11 years). The associations with COPD risk were extracted from the ICGC study meta-analysis. The association with COPD progression was tested using FEV1 decline over 11 years in LHS participants. The combined SNP allelic effects on serum SP-D levels, COPD risk and FEV1 decline were then evaluated using two MR methods: IVW and MR-Egger. In the ICGC cohort, IVW MR analysis revealed that SNPs associated with increased serum SP-D levels decreased the risk of COPD (estimate −0.19 change in COPD log odds ratio for every unit increase in log SP-D, p=6.46×10⁻³; figure 5a). In the LHS, IVW MR analysis revealed that SNPs that increased serum SP-D levels slowed FEV1 decline over 11 years (estimate 0.0038 less decline in FEV1 slope per unit increase in log SP-D, p=7.68×10⁻³; figure 5b). Similarly, the MR-Egger method demonstrated a significant protective relationship between SP-D and FEV1 decline (estimate 0.0082 less decline in FEV1 slope per unit increase in log SP-D, p=1.97×10⁻²) as well as COPD risk (estimate −0.42 change in COPD log odds ratio for every unit increase in log SP-D, p=3.51×10⁻³). The MR-Egger intercept was not significantly different from zero (p=0.15), providing no evidence of directional pleiotropy among the SNPs used in the IVW analysis. The heterogeneity p-value for all MR tests was not significant (p>0.05), consistent with the SNPs having similar effects on COPD risk and progression. Results from all MR analyses are shown in table 4 and the MR-Egger plots are presented in online supplementary figure S4.
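To put the IVW estimate for COPD risk on a more familiar scale, the log odds ratio can be exponentiated. The short Python sketch below does this arithmetic; the per-doubling figure assumes that "log SP-D" is the natural logarithm, which the text does not state, so it would change if another base was used.

```python
import math


def odds_ratio(log_or_per_unit: float, delta_log_spd: float = 1.0) -> float:
    """Odds ratio implied by a log-odds-ratio estimate for a given change in log SP-D."""
    return math.exp(log_or_per_unit * delta_log_spd)


ivw_estimate = -0.19  # IVW estimate reported above (COPD log OR per unit of log SP-D)
print(f"{odds_ratio(ivw_estimate):.3f}")               # 0.827 per 1-unit increase in log SP-D
print(f"{odds_ratio(ivw_estimate, math.log(2)):.3f}")  # 0.877 per doubling, IF log is natural
```

On this reading, each unit increase in genetically predicted log SP-D corresponds to roughly 17% lower odds of COPD.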
WGCNA and SFTPD co-expression in lung tissue
To further understand the biology of SP-D, we searched for genes whose expression was strongly correlated with SFTPD expression in lung tissue. Using the consensus network approach, we identified 38 distinct modules of strongly co-expressed genes in the lung expression data. The module containing the SFTPD gene included 364 genes, which were enriched for pathways related to lipid, cholesterol and fatty acid metabolism and biosynthesis (online supplementary table S1). This network construction also allowed us to identify “hub” genes, which, by definition, have the highest module membership value in a given module. The hub gene in the module containing SFTPD was surfactant-associated 3 (SFTA3). The top 50 genes with the strongest correlation with SFTPD are shown in online supplementary figure S4. Genes strongly co-expressed with SFTPD included CRTAC1, CACNA2D2 and S100A14.
Discussion
Despite its high prevalence and impact, there are currently no biomarkers or disease-modifying therapies for COPD. Establishing causality for selected proteins and pathways is one promising step toward their development as both biomarkers and therapeutic targets.
In this study, we determined the effect of genetic variation on SP-D protein levels in serum, mRNA levels in lung tissue and disease phenotypes to identify a potential causal role for SP-D in the aetiology of COPD. We identified six independent SNPs that were significantly associated with serum SP-D levels: four independent SNPs at the SFTPD locus on chromosome 10, plus one in each of two distal regions of the genome (the HLA locus on chromosome 6 and the ATP2C2 gene region on chromosome 16). Importantly, by integrating lung tissue-specific eQTL data, we showed that SNPs in the SFTPD gene probably affect serum SP-D by modulating SFTPD mRNA levels in the lung. In contrast, the genetic signal on chromosome 16 appears to be related to expression of the ATP2C2 gene, and thus its effect on serum SP-D is likely to be indirect. Most importantly, using two MR approaches, we showed that SNPs associated with higher serum SP-D levels conferred a lower risk of COPD and slower FEV1 decline over time, with no evidence of heterogeneity across SNPs for these phenotypes. Together, these data suggest a causal role of SP-D in the pathogenesis of COPD.
A study by Kim et al. [34] identified the three loci on chromosomes 6, 10 and 16 as pQTLs for SP-D. A more recent study by Sun et al. [35] additionally identified SNP rs2146192 in the SFTPD gene region as associated with SP-D blood levels in COPD subjects. This SNP is in modest linkage disequilibrium with the top SP-D pQTL in our study (r²=0.53). Using conditional analyses, Sun et al. identified two other independent pQTL SNPs: one on chromosome 10 and the other in the HLA region on chromosome 6. In the current study, we confirm and extend the findings of Kim et al. and Sun et al. We replicated the same three loci reported by Kim et al. and identified several additional SNPs in the SFTPD, HLA and ATP2C2 regions that act as pQTLs for SP-D levels. In total, the three loci explained 30.7% of the variation in serum SP-D levels. The strongest pQTLs for SP-D in this study were in SFTPD, the gene encoding the SP-D protein, suggesting that serum levels are driven in part by SP-D production in lung tissue. Consistent with this notion, the SNPs' effects on SFTPD mRNA levels in lung tissue were directionally consistent with their effects on serum SP-D levels. In addition, there were no distal genetic determinants of SFTPD expression in lung tissue, suggesting that the SP-D serum pQTLs on chromosome 10 captured most of the regulatory effects on SP-D production in the lung.
The identification of distal pQTLs may provide important insights into SP-D biology. The chromosome 16 locus contains the ATP2C2 gene, which is expressed in many tissues (including the lungs) and functions as a transporter of Ca²⁺ ions [36]. SNPs in the ATP2C2 region had no effect on SFTPD mRNA levels in the lung, although ATP2C2 mRNA levels were positively correlated with SFTPD mRNA expression (r=0.2, p=6.9×10⁻¹¹). Together, these data suggest that ATP2C2 may regulate serum SP-D through an indirect mechanism, such as post-translational modification, transport and/or clearance, or through a shared functional pathway.
In lung tissue, SFTPD was mapped to a module of 364 strongly co-expressed genes that were enriched for pathways related to lipid and cholesterol biosynthesis. Surfactant-associated 3 (SFTA3) was the “hub” gene, and its expression was positively related to that of SFTPD. SFTA3 is an immunoregulatory protein in the lung and may play an important role in inflammation and immune defence [37]. Other genes that were strongly co-expressed with SFTPD included cartilage acidic protein 1 (CRTAC1), the calcium channel, voltage-dependent, α2/δ subunit 2 (CACNA2D2) and S100 calcium-binding protein A14 (S100A14) genes. CRTAC1 is upregulated (>10-fold) in type II alveolar epithelial cells as they differentiate in culture [38]. Previous studies have shown that binding of SP-D to phosphatidylinositol [39], Toll-like receptor (TLR)2 and TLR4 [40], and carbohydrate structures on the surface of bacteria, viruses and fungi is calcium dependent [41–46]. Calcium may also cause SP-D oligomers to self-aggregate, as has been shown previously for surfactant protein A [47]. Further work is needed to unravel the mechanisms underlying the associations of calcium-related genes with SP-D gene expression and serum levels.
In the LHS, there was a modest negative correlation between serum SP-D levels and FEV1 (r=−0.063, p<1×10⁻⁴), consistent with several other studies [48, 49]. In addition, higher serum SP-D levels were associated with accelerated FEV1 decline in a linear mixed-effects regression (estimate −0.0068, p=6.60×10⁻²³). Superficially, these relationships appear to contradict the main findings of the present study (opposite direction of association). However, cross-sectional relationships between lung function and serum SP-D have to be interpreted with caution because of the possibility of reverse causation (i.e. pathological processes that contribute to COPD causing increased leakage of SP-D from the lungs into the systemic circulation) and confounding (e.g. smoking). For instance, we and others have previously shown that serum SP-D is affected not only by lung synthesis of SP-D, but also by permeability of the alveolar–capillary interface, which is perturbed in COPD and increased by active smoking and lung inflammation [48, 50]. Furthermore, none of the LHS participants were taking inhaled or systemic steroids at the time of blood sampling, so the current results were not affected by steroid use. Genetic analyses avoid most of these pitfalls because the causal pathway from genetic variants to protein and disease is unidirectional. In the present study, MR results supported a significant protective role for SP-D on both risk of COPD and FEV1 decline.
There is a major unmet clinical need to identify novel biomarkers and therapeutic targets for COPD. SP-D represents an excellent candidate because 1) it is a lung-specific protein that can be reliably assayed in blood, BAL fluid and lung tissue; 2) it tracks well with disease severity and exacerbations [49]; 3) it is responsive to steroid treatment [49]; and 4) it has inherent functional and biological attributes that suggest a role in COPD pathogenesis [10]. Moreover, to our knowledge, this study is the first to report a causal link between genetically elevated serum SP-D levels and a decreased risk of COPD and slower FEV1 decline.
The current study has a number of limitations. First, LHS participants were all smokers with mild or moderate COPD at the time of recruitment, and this may have affected the serum measures of SP-D as shown by the discordance between genetic and observational analyses. Second, the serum and lung mRNA expression were measured in different individuals, which prevents direct comparisons between the two sites. However, the use of genetic variation provided a robust method to estimate and extrapolate the findings for mRNA and protein between lung tissue and blood. Third, SP-D was detected in serum using ELISA, which may not differentiate between intact (dodecameric) and monomeric or trimeric forms of SP-D. Further quantitative studies of different multimeric SP-D protein structures are warranted.
In conclusion, the current study investigated the effect of genetic variation on serum SP-D levels, lung tissue gene expression and COPD risk and progression. The findings provided novel insights into SP-D biology and support a novel causal link between increased SP-D levels and protection against COPD risk and accelerated FEV1 decline. SP-D represents a very promising biomarker and therapeutic target for COPD which warrants further study.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERJ-00657-2017_Supplement
Disclosures
Author disclosures ERJ-00657-2017_Disclosures
Acknowledgements
Authors' contributions are as follows. Conception and design of the study: M. Obeidat, P.D. Paré and D.D. Sin; lung expression quantitative trait loci data collection and analysis: K. Hao, D.C. Nickle, M. van den Berge, W. Timens, Y. Bossé and P. Joubert; International COPD Genetics Consortium genome-wide association study data collection and analysis: M.H. Cho, B.D. Hobbs, K. de Jong and M. Boezen; Lung Health Study data analyses: M. Obeidat, X. Li, G. Zhou, N. Fishbane, N.N. Hansel, T.H. Beaty, N. Rafaels, R. Mathias, I. Ruczinski and K.C. Barnes; statistical support and advice on study conduct: R. Hung and S. Burgess; writing the manuscript: M. Obeidat, P.D. Paré and D.D. Sin; all co-authors discussed results and implications and commented on the manuscript at all stages.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: M. Obeidat is a Postdoctoral Fellow of the Parker B. Francis Foundation. He is also a recipient of a British Columbia Lung Association Research Grant. K. Hao is partially supported by National Natural Science Foundation of China grant no. 21477087. M.H. Cho is a recipient of National Institutes of Health grant R01 HL113264, and he also received grant support from GlaxoSmithKline. D. Sin is a tier 1 Canada Research Chair in COPD.
Conflict of interest: Disclosures can be found alongside this article at erj.ersjournals.com
- Received March 28, 2017.
- Accepted August 22, 2017.
- Copyright ©ERS 2017