Abstract
Background Genetic factors and smoking contribute to chronic obstructive pulmonary disease (COPD), but whether a combined polygenic risk score (PRS) is associated with incident COPD and whether it acts synergistically with smoking remain unclear. We aimed to investigate the association of the PRS with COPD and to explore whether smoking behaviours modify this association.
Methods Multivariable Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals for the association of the PRS and smoking with COPD.
Results The study included 439 255 participants (mean age 56.5 years; 53.9% female), with a median follow-up of 9.0 years. PRSlasso containing 2.5 million variants showed better discrimination and a stronger association for incident COPD than PRS279 containing 279 genome-wide significant variants. Compared with low genetic risk, the HRs of intermediate and high genetic risk were 1.39 (95% CI 1.31–1.48) and 2.40 (95% CI 2.24–2.56), respectively. Compared with low genetic risk and never smoking, the HR of high genetic risk and current smoking was 11.62 (95% CI 10.31–13.10). There were significant interactions between PRSlasso and smoking status for incident COPD (pinteraction<0.001). From low genetic risk to high genetic risk, the HRs of current smoking increased from 4.32 (95% CI 3.69–5.06) to 6.89 (95% CI 6.21–7.64) and the population-attributable risks of smoking increased from 42.7% to 61.1%.
Conclusions The PRS constructed from millions of variants below genome-wide significance showed significant associations with incident COPD. Participants with a high genetic risk may be more susceptible to developing COPD when exposed to smoking.
The polygenic risk score, constructed from 2.5 million variants, showed a significant association with incident COPD. Individuals with a high genetic risk may be more vulnerable to the lung-damaging effects of smoking and develop COPD. https://bit.ly/2T3lgub
Introduction
Chronic obstructive pulmonary disease (COPD) is the most prevalent chronic respiratory disease and is characterised by progressive airflow limitation that is not fully reversible. In 2017, 3.20 million deaths were attributed to COPD worldwide, 23% more than in 1990 [1]. Cigarette smoking is the major risk factor for COPD, mainly causing chronic inflammatory responses and oxidative stress [2]. Substantial evidence has shown that avoiding smoking can reduce the risk of COPD [3]. However, a striking 25–45% of COPD cases occur in never-smokers and only 25% of continuous smokers will develop COPD [4, 5]. These findings highlight the potential importance of other risk factors, one of which is genetic make-up.
Over the past decade, genome-wide association studies (GWASs) have successfully identified multiple genetic loci associated with the risk of COPD and COPD-related phenotypes [6–8], including FAM13A, HHIP, RIN3, CHRNA3/5 and IREB2 [9–13]. However, the effect of each common variant that is consistently and significantly associated with COPD risk is modest, indicating that any individual genetic variant explains only a small fraction of COPD susceptibility. Aggregating multiple single nucleotide polymorphisms (SNPs) with small effects into a composite polygenic risk score (PRS) may better capture the genetic risk of complex diseases. Recently, in case–control studies, PRSs containing millions of SNPs that have not reached genome-wide significance have been shown to predict a higher risk of various diseases, including COPD [8, 14]. However, it is still unclear whether the PRS is associated with new-onset COPD events, and whether smoking and genetic factors have synergistic effects on COPD remains controversial.
The purpose of this study was to evaluate whether the PRS constructed from 2.5 million variants below genome-wide significance is associated with the risk of incident COPD in a large population-based cohort and whether there are differences in the effects of smoking among different genetic risks.
Methods
Study design
The UK Biobank study recruited more than 500 000 participants aged 40–69 years from the general population at 22 assessment centres throughout the UK between 2006 and 2010 [15]. Participants provided information on health-related aspects through extensive baseline questionnaires, verbal interviews and physical measurements. Self-reported ethnicities were categorised as Mixed, Black/African, East Asian, White/European, South Asian and Unknown. Participants were excluded if they withdrew from the study, their genotype data did not meet the quality control conditions, they had a relatedness of second degree or higher, or they had a history of COPD (figure 1). The UK Biobank received ethical approval from the Research Ethics Committee (REC 11/NW/0382) and participants provided written informed consent. Any additional ethical approval was adjudged unnecessary for the present study.
Polygenic risk score
Two PRSs derived from the same GWAS, each comprising genetic variants related to lung function and previously shown to predict COPD prevalence, were included in this study [7, 8]. PRS279 was created following an additive model for 279 genome-wide significant variants (supplementary table E1) [7]: for each SNP, the number of risk alleles was multiplied by the SNP's effect size for forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC), and the products were summed. PRSlasso was created as a weighted sum of two PRSs, one for FEV1 and one for FEV1/FVC [8], which were penalised using lasso regression to simplify the final model. The score, comprising 2.5 million SNPs, was calculated using lassosum version 0.4.5 [16]. The detailed derivation of PRSlasso is shown in the supplementary material. Each PRS was then categorised into low (lowest quintile), intermediate (quintiles 2–4) and high (highest quintile) risk.
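As an illustration of the additive scoring and quintile categorisation described above, the following minimal Python sketch may help. It is not the lassosum pipeline used in the study; the function names, the SNP identifiers and the rank-based handling of ties are assumptions made for the example.

```python
from typing import Dict, List


def weighted_prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive score: sum over SNPs of (risk-allele count x effect size).

    SNPs missing from `dosages` contribute zero (an assumption of this sketch).
    """
    return sum(weights[snp] * dosages.get(snp, 0.0) for snp in weights)


def risk_category(scores: List[float]) -> List[str]:
    """Label scores as low (lowest quintile), intermediate (quintiles 2-4)
    or high (highest quintile), using ranks within the supplied sample."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        quintile = rank * 5 // n + 1  # 1..5; tied scores are split by rank
        if quintile == 1:
            labels[i] = "low"
        elif quintile == 5:
            labels[i] = "high"
        else:
            labels[i] = "intermediate"
    return labels


# Hypothetical example: two SNPs with made-up effect sizes.
example_score = weighted_prs({"rsA": 2, "rsB": 1}, {"rsA": 0.1, "rsB": -0.2})
example_labels = risk_category([float(x) for x in range(10)])
```

In a real analysis the quintile cut-points would be derived from the full cohort distribution rather than from a small list.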
Smoking status and pack-years
Touchscreen questionnaires collected information on smoking status and pack-years at baseline. Detailed definitions of smoking status and pack-years of smoking are provided in supplementary table E2. All participants were categorised as never, former and current smoking according to their smoking status, or as no (0), light (0.1–19.9), intermediate (20–39.9) and heavy (≥40) smoking according to pack-years of smoking.
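The pack-year grouping above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and treating any value above zero but below 20 as "light" is an assumption about how boundary values were handled.

```python
def pack_year_category(pack_years: float) -> str:
    """Map pack-years of smoking to the four exposure groups used in the text."""
    if pack_years < 0:
        raise ValueError("pack-years cannot be negative")
    if pack_years == 0:
        return "no"            # 0 pack-years
    if pack_years < 20:
        return "light"         # 0.1-19.9 pack-years
    if pack_years < 40:
        return "intermediate"  # 20-39.9 pack-years
    return "heavy"             # >=40 pack-years
```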
COPD and lung function
Trained healthcare technicians and nurses at UK Biobank assessment centres used a Pneumotrac 6800 spirometer (Vitalograph, Maids Moreton, UK) to perform lung function tests. Each participant performed two to three tests, and the maximum acceptable values of FVC and FEV1 were used for analysis [17]. Participants with incident COPD were identified as those having a diagnosis in hospital admission electronic health records or death register data, or an FEV1/FVC ratio below the lower limit of normal according to the Global Lung Function Initiative (GLI) 2012 reference values [18] on a lung function test performed after the date of baseline assessment [19]. We calculated the follow-up time from the date of attendance until the date of first diagnosis, date of death or 25 February 2018 for Wales and England, or 28 February 2017 for Scotland, whichever occurred first (supplementary table E3).
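The follow-up rule (first of diagnosis, death or region-specific censoring date) can be sketched as follows. The function name, the region keys and the conversion of days to years using 365.25 are assumptions for illustration; only the censoring dates come from the text.

```python
from datetime import date
from typing import Optional, Tuple

# Administrative censoring dates stated in the text.
CENSOR_DATES = {
    "England": date(2018, 2, 25),
    "Wales": date(2018, 2, 25),
    "Scotland": date(2017, 2, 28),
}


def follow_up(
    baseline: date,
    region: str,
    diagnosis: Optional[date] = None,
    death: Optional[date] = None,
) -> Tuple[float, bool]:
    """Return (years of follow-up, whether incident COPD was observed)."""
    candidates = [CENSOR_DATES[region]]
    if diagnosis is not None:
        candidates.append(diagnosis)
    if death is not None:
        candidates.append(death)
    end = min(candidates)  # whichever occurred first
    # The event counts only if the diagnosis is the date that ended follow-up.
    event = diagnosis is not None and diagnosis <= end
    years = (end - baseline).days / 365.25
    return years, event
```

A diagnosis dated after the censoring date is therefore treated as unobserved, which is the usual convention for administrative censoring.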
Statistical analyses
The associations between the two sets of PRSs and incident COPD across strata of ethnicity were assessed by the area under the curve from receiver operating characteristic (ROC) models and the Harrell C-statistic from Cox proportional hazards models. We assessed whether the Cox proportional hazards models and the correlated ROC curves differed significantly using ANOVA and the DeLong test, respectively.
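For readers unfamiliar with the Harrell C-statistic, a minimal pure-Python sketch is given below. It assumes a simplified definition (a pair is comparable when the participant with the shorter follow-up had an event, and tied follow-up times are ignored) and is not the R routine used in the study; the function name is hypothetical.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Proportion of comparable pairs in which the participant with the
    earlier event also has the higher risk score (score ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # only an observed event can anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which is the scale on which the two PRSs are compared in the Results.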
The characteristics of the participants were summarised across incident COPD status as number (percentage) for categorical variables, mean with standard deviation for normally distributed variables and median (interquartile range (IQR)) for skewed variables. The associations of genetic risk categories, smoking categories, and the combination of genetic and smoking categories (nine categories with low genetic risk and never smoking as the reference; 12 categories with low genetic risk and no smoking pack-years as the reference) with incident COPD were explored using multivariable Cox proportional hazards models. The covariates included in this study were recognised risk factors for COPD [20], which were unevenly distributed among the exposure groups. All models were adjusted for age, sex, education, socioeconomic status (household income and Townsend deprivation index [21]), body mass index, physical activity, healthy diet, alcohol consumption, passive smoking, occupational exposure, third-degree relatedness of individuals in the sample, genotyping chip and the first 20 principal components of ancestry (supplementary table E2). Detailed information on the number of missing covariates is shown in supplementary table E4. We used multiple imputation by chained equations to impute missing covariate values with the mice package version 3.12.0 [22]. The proportional hazards assumption was evaluated by tests based on Schoenfeld residuals [23]; no violation of this assumption was observed in our analyses. Moreover, interactions between PRS and smoking status or pack-years were tested by adding the cross-product term to the Cox models.
The associations between genetic risk and smoking pack-years and incident COPD were evaluated on a continuous scale with restricted cubic spline curves based on multivariable Cox proportional hazards models. To balance best fit and over-fitting in the main splines for incidence, the number of knots, between three and five, was chosen as the lowest value for the Akaike Information Criterion, but if within two of each other for different knots, the lowest number of knots was chosen. The sensitivity and subgroup analysis methods are shown in the supplementary material.
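The knot-selection rule can be made concrete with a short sketch. It assumes that "within two" means an Akaike Information Criterion difference of at most 2 from the best model; the function name and the example values are hypothetical.

```python
from typing import Dict


def choose_knots(aic_by_knots: Dict[int, float], tolerance: float = 2.0) -> int:
    """Pick the number of spline knots with the lowest AIC, preferring the
    smallest number of knots among models within `tolerance` of the best AIC."""
    best_aic = min(aic_by_knots.values())
    near_best = [k for k, aic in aic_by_knots.items() if aic - best_aic <= tolerance]
    return min(near_best)


# Made-up AIC values for three, four and five knots.
selected = choose_knots({3: 1005.0, 4: 1001.5, 5: 1000.0})
```

With these made-up values the five-knot model has the lowest AIC, but the four-knot model is within the tolerance and is therefore preferred as the simpler fit.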
The population-attributable fraction (PAF), an estimate of the proportion of events that would have been prevented if all individuals had been in a lower smoking category, was calculated [24]. Analyses were undertaken using R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). A p-value <0.05 (two-sided) was considered significant. Because we tested the joint association of genetic risk and smoking, we conservatively corrected for multiple testing using Bonferroni correction to maximise the likelihood of reporting true findings, and set a significance level of 0.05/11=0.0045.
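To show how a PAF relates to hazard ratios, the sketch below uses one common estimator for multi-category exposures (the case-based formula often attributed to Miettinen). This is an assumption: the text cites reference [24] without stating the formula, so the authors' method may differ. The function name and all input values are hypothetical and are not study results.

```python
from typing import Iterable, Tuple


def attributable_fraction(exposed_groups: Iterable[Tuple[float, float]]) -> float:
    """PAF = sum over exposed categories of pd_i * (HR_i - 1) / HR_i,
    where pd_i is the proportion of all cases falling in category i and
    HR_i is that category's adjusted hazard ratio versus the reference."""
    total = 0.0
    for case_proportion, hazard_ratio in exposed_groups:
        if hazard_ratio <= 0:
            raise ValueError("hazard ratio must be positive")
        total += case_proportion * (hazard_ratio - 1.0) / hazard_ratio
    return total


# Hypothetical inputs: 30% of cases in a category with HR 1.5 and
# 40% of cases in a category with HR 4.0.
example_paf = attributable_fraction([(0.3, 1.5), (0.4, 4.0)])

# Bonferroni threshold stated in the text: 0.05 divided by 11 tests.
bonferroni_alpha = 0.05 / 11
```

The formula makes clear why the PAF rises with both the share of cases that are exposed and the strength of the association, which is the pattern examined across genetic risk strata in the Results.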
Results
Participants' characteristics
The process of enrolling participants in this study is shown in figure 1. The overall study population included 439 255 participants (mean±sd age 56.5±8.0 years), of whom 236 795 (53.9%) were female (supplementary table E5). PRSlasso showed better discrimination for incident COPD (PRSlasso C-statistic 0.60; PRS279 C-statistic 0.53; p<0.001) (supplementary table E6), while PRSlasso also showed a greater risk of incident COPD for each quintile score increase among the White/European participants (PRSlasso hazard ratio (HR) 1.24, 95% CI 1.23–1.26; PRS279 HR 1.07, 95% CI 1.05–1.08) (supplementary figure E1). However, there was no significant association between the two sets of PRSs and incident COPD in other ethnic groups, except for Black/African individuals, with HR 1.26 (95% CI 1.00–1.58) per quintile PRSlasso increase. To ensure adequate statistical power and predictive strength, we subsequently analysed only individuals of White/European ethnicity and used PRSlasso as the main genetic risk assessment method (figure 1).
Table 1 presents the baseline characteristics of the participants ultimately included. Of the 411 712 White/European individuals (mean±sd age 56.8±8.0 years), 222 212 (54.0%) were female. There were 145 662 (35.4%) former smokers and 41 558 (10.1%) current smokers; 40 629 (11.6%) individuals had intermediate smoking exposure (20–39.9 pack-years) and 17 909 (5.1%) individuals had heavy smoking exposure (≥40 pack-years). Over 3 625 259 person-years of follow-up (median (IQR) length of follow-up 9.0 (8.3–9.5) years), there were 9577 cases of incident COPD. The characteristics of COPD cases determined by the two sources are shown in supplementary table E7. Participants who developed incident COPD were slightly older, more likely to be male and obese, had more smoking exposure, and had higher genetic risk scores.
Associations of genetic risk with incident COPD
The incidence and HR of COPD increased progressively across the genetic risk groups. The HR of the high genetic risk group was 2.40 (95% CI 2.24–2.56) compared with the low genetic risk group. After additional adjustment for smoking status or pack-years, the HRs of the high genetic risk group were 2.37 (95% CI 2.21–2.53) and 2.27 (95% CI 2.12–2.44), respectively (table 2). The association between PRSlasso on a continuous scale and risk of incident COPD was nonlinear (pnonlinear<0.001); a high PRSlasso presented a very high risk (figure 2). When genetic risk deciles were used instead of categories, the same trend was observed (supplementary table E9). Supplementary figure E3a shows the cumulative risk of incident COPD in each genetic risk group during follow-up.
Associations of smoking with incident COPD
The incidence and HR of COPD increased across smoking status categories and with increasing smoking pack-years. The HRs of the current and heavy smoking groups were 5.97 (95% CI 5.64–6.32) and 8.32 (95% CI 7.81–8.86), respectively, compared with the never smoking group. After additional adjustment for PRSlasso, the HRs among the smoking groups were slightly lower (table 3). The association between smoking pack-years on a continuous scale and risk of incident COPD was nonlinear (pnonlinear<0.001); heavy smoking presented a very high risk (supplementary figure E2). When the number of smoking pack-years was further subdivided into more categories, the same results were observed (supplementary table E10). The associations between smoking pack-years and COPD risk were stronger among current smokers than among former smokers (supplementary table E11). The cumulative risk of incident COPD in each smoking status and pack-year group during follow-up is shown in supplementary figure E3b and c.
Associations of smoking and genetic risk with incident COPD
When the entire cohort was grouped by combined genetic risk and smoking, the risk of incident COPD still increased with both smoking and genetic risk (figure 3). Compared with the low genetic risk and never smoking group, the HR of the high genetic risk and current smoking group was 11.62 (95% CI 10.31–13.10). A similar pattern was observed among the genetic risk and smoking pack-year groups, and the highest risk was observed among individuals with high genetic risk and heavy smoking exposure (HR 14.85, 95% CI 13.09–16.84).
Moreover, we observed significant interactions between the PRSlasso categories and smoking status or pack-years (both pinteraction<0.001) (table 4), whereas the interactions between PRS279 and smoking were not significant (smoking status: pinteraction=0.116; smoking pack-years: pinteraction=0.334) (supplementary figure E4). When current smokers and former smokers were analysed separately, the interactions between genetic risk and smoking pack-years for incident COPD remained significant in both groups (both pinteraction<0.001) (supplementary table E12). The impact of smoking was substantially more pronounced in those with an elevated genetic risk. In the low, intermediate and high genetic risk groups, the HRs of current smoking were 4.32 (95% CI 3.69–5.06), 5.92 (95% CI 5.48–6.39) and 6.89 (95% CI 6.21–7.64), respectively, compared with never smoking (table 4). Supplementary figure E5 shows the cumulative risk of incident COPD in each smoking status and pack-year group among each genetic risk category during follow-up.
The same pattern of associations was observed in a series of sensitivity analyses with additional adjustment for asthma, chronic pulmonary infections and residential air pollution, and after excluding participants related to others and participants who developed outcomes within the first 2 years of follow-up (supplementary tables E13 and E14). Subgroup analyses were performed by age and sex (supplementary tables E15 and E16). The increase in the risk of incident COPD with elevated smoking and genetic risk was more evident in individuals aged >60 years (pinteraction<0.001).
Population-attributable fractions
If no individuals had smoked, 54.9% (95% CI 53.1–56.6%, based on smoking status) to 55.3% (95% CI 53.4–57.1%, based on smoking pack-years) of new-onset COPD events might have been prevented during follow-up. Alternatively, if all current smokers had quit smoking, new-onset events might have been reduced by 20.2% (95% CI 19.9–20.5%). In addition, if smoking pack-years had been reduced by one or two levels, 40.7% (95% CI 39.6–41.8%) or 53.1% (95% CI 51.4–54.7%) of new incident cases might have been prevented. Further analyses stratified by genetic risk category showed that 42.7% (95% CI 37.1–48.0%), 54.1% (95% CI 51.7–56.4%) and 61.1% (95% CI 58.2–63.9%) of incident COPD cases were attributable to smoking among the low, intermediate and high genetic risk populations, respectively (table 5).
Discussion
In this large population-based prospective cohort study, the PRS constructed from 2.5 million variants below genome-wide significance showed better predictive accuracy for incident COPD than the PRS constructed from 279 genome-wide significant variants. After stratifying more than 411 000 White/European participants by the PRS, we found that smoking was more strongly associated with incident COPD in the high genetic risk group. The population-attributable risks of smoking increased from 42.7% to 61.1% from low genetic risk to high genetic risk.
Moll et al. [8] developed PRSlasso based on the GWAS for FEV1 and FEV1/FVC ratio, and showed that it could significantly improve predictive power for COPD in a case–control study. This PRS construction method, which includes millions of variants below genome-wide significance, has also been applied to other diseases [14]. In contrast, the current study was based on a prospective design and our results suggest that the PRS was a significant predictor of new-onset COPD cases. The significant nonlinear association between PRSlasso and COPD risk allowed the identification of individuals with extremely high genetic risk. Meanwhile, our results also demonstrated that the PRS constructed from cross-sectional data showed associations not only with peak lung function but also with accelerated lung function decline [25, 26], which is the main feature of COPD. Therefore, this study provided evidence from a large-sample prospective cohort study for the application of the PRS. Moreover, because variants below genome-wide significance were included, the study also showed that the new method based on regularised regression might significantly improve the predictive performance of the PRS.
The predictive power of the PRS for incident COPD has not been well verified in ethnic groups other than White/European and Black/African participants. The main reason for this may be that the sample size of other ethnic groups included in the UK Biobank study is limited and there were not enough outcome events recorded during follow-up, which may lead to inadequate statistical power. In addition, most GWASs are currently conducted in populations of European ancestry, and directly generalising the weights and risk loci to other races/ethnicities may attenuate predictive accuracy [27]. Thus, using appropriate methods to develop additional PRSs in multi-ethnic populations is critical to implementing precision medicine and prevention in global health [28].
Smoking can directly induce tissue damage through oxidative stress or indirectly induce an inflammatory response, resulting in irreversible airway remodelling. Whether smoking and genetic susceptibility interact in the occurrence and development of COPD and decreased lung function has long been a concern. This study is the first large population-based prospective study to confirm that smoking has a greater impact on new-onset COPD in a high genetic risk population. Both the attributable risks and the smoking hazard ratios increased from low genetic risk to high genetic risk. Similar results have been reported: individuals with a high genetic risk for low FEV1/FVC (a PRS constructed with 26 SNPs) are more susceptible to the deleterious effects of smoking [29]. However, studies of the UK Biobank suggested that smoking and genetic effects generally act independently. Wain et al. [30] included 50 008 individuals with extreme FEV1 and smoking behaviours in a GWAS. The results showed shared genetic causes of low FEV1 between heavy smokers and never-smokers, and smoking likely interacted with only a small proportion of the genetic effects. Shrine et al. [7] combined 279 FEV1/FVC-related variants to construct a PRS. In their cross-sectional results, no significant smoking–PRS interaction for FEV1/FVC was observed and only a weak interaction for COPD was observed. A GWAS of 5070 participants found five FEV1/FVC-related variants, but no interaction between them and smoking [31]. In this study, the significant interaction may be attributed to the novel PRS estimation method, which combines the two lung function phenotypes of FEV1 and FEV1/FVC ratio and includes 2.5 million variants. Meanwhile, detection of the interaction may also have benefited from the identification of populations at extremely high genetic risk, made possible by the significant nonlinear association between PRSlasso and incident COPD risk. PRSlasso thus provided a more accurate description of the genetic characteristics of lung function. The results also suggested that similar PRS estimation methods may be an effective tool for discovering interactions between genetic and environmental factors.
Previous studies showed that the PRS was associated with several computed tomography imaging phenotypes, including quantitative emphysema and airway wall thickness measures, and that it was associated with reduced lung growth patterns in children with asthma [8]. These characteristics may explain why an individual with high genetic risk has more severe damage or poorer repair and is more likely to develop COPD after smoking exposure. Moreover, multiple genes, including CHRNA3 and CHRNA5 at 15q25, were strongly associated with lung function, COPD and smoking behaviours [30, 32–35]. A higher PRS may be accompanied by more undocumented smoking exposure, including deeper inhalation and selection of tobacco with a higher nicotine content. It is challenging to propose a rigorous biological mechanism for millions of SNPs, but it is worth noting that the omnigenic model may serve as a theoretical basis [36]. This model refers to the existence of a considerable number (“omni-” refers to “all”) of genes that may contribute to disease risk, among which peripheral genes (numerous, with pleiotropic and regulatory effects) play a synergistic role by influencing core genes (rare, specific, with interpretable biological roles) through a regulatory network [37]. Although the current study could not address the model's concerns about regulatory networks and rare mutations, our findings agree with it regarding the involvement of variants below genome-wide significance. Therefore, we speculate that including such variants may be an essential strategy for genetic risk assessment.
Considering that more than half of new-onset COPD cases are still attributed to smoking, interventions to protect lung function, including smoking cessation at an earlier age to bring about more benefits, should be strengthened in all populations, especially in those with a high genetic risk. PRS-informed intervention may be crucial. After becoming aware of their genetic risk, the high-risk population may actively choose a healthier lifestyle, as confirmed in a study of α1-antitrypsin deficiency [38]. To date, lung function testing is not included in routine COPD screening [39, 40]. As the cost continues to decrease, genome-wide genotyping, which only needs to be performed once in a lifetime and can provide an evaluation of various phenotypes, may become an additional solution. This study suggests that the PRS was associated with incident COPD in nonsmokers. Therefore, the PRS also provides information independent of traditional factors for these populations, and individuals may be more active in reducing exposure to other risk factors after receiving this information. Whether PRS-informed early disease screening and intervention can improve COPD underdiagnosis and reduce the overall burden of severe COPD deserves more in-depth research.
Strengths and limitations
Our study has several significant strengths, including the prospective population-based study design, large sample size and detailed information on related covariates.
Some limitations should also be considered. First, smoking behaviours were self-reported and lacked information on tobacco type, inhalation depth and smoking setting, which may cause recall bias and decreased accuracy. In addition, smoking was not randomly assigned and behaviour at baseline may be affected by lung function or other unmeasured variables. Second, rare variants that may have large functional effects were not included in this study, which may make the genetic risk estimation less accurate. Third, PRSlasso used in this study was developed from the GWAS conducted in the UK Biobank (n=321 047) and SpiroMeta (n=79 055). The samples from the UK Biobank accounted for 80% of that study population, which may lead to over-fitting in the current analysis. Fourth, although we introduced lung function testing results during follow-up to compensate for the limitations of relying solely on a hospital diagnosis to define incident COPD, the UK Biobank has not yet repeated the lung function measurement in all participants and there may still be an underdiagnosis problem. Fifth, incident COPD cases were collected through hospital records and death registries, and some mild cases or cases only receiving primary care could be missed. Failure to obtain information about the severity of the disease may obscure potential dose–response relationships. Sixth, smoking exposure may change during follow-up, leading to misclassification of exposure.
Conclusions
The PRS, constructed from 2.5 million variants below genome-wide significance, showed a significant association with incident COPD. The study also found that participants with a high genetic risk may be more susceptible to developing COPD if exposed to smoking.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERJ-01320-2021.Supplement
Acknowledgements
We are grateful to UK Biobank participants. This research has been conducted using the UK Biobank resource (www.ukbiobank.ac.uk) under application number 43795. We are grateful to the International COPD Genetics Consortium (www.copdconsortium.org) for providing the polygenic risk score calculation code.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Author contributions: C. Mao had full access to all the data in the study, and took responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: P-D. Zhang and X-R. Zhang. Acquisition, analysis or interpretation of data: all authors. Drafting of the manuscript: P-D. Zhang and A. Zhang. Critical revision of the manuscript for important intellectual content: all authors. Statistical analysis: P-D. Zhang and Z-H. Li. Obtained funding: C. Mao. Administrative, technical or material support: D. Liu and Y-J. Zhang. Study supervision: C. Mao.
Conflict of interest: None declared.
Support statement: This work was supported by the National Natural Science Foundation of China (81973109), the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019), the Construction of High-level University of Guangdong (G820332010, G618339167 and G618339164), and the Guangzhou Science and Technology Project (202002030255). Funding information for this article has been deposited with the Crossref Funder Registry.
- Received February 25, 2021.
- Accepted June 14, 2021.
- Copyright ©The authors 2022. For reproduction rights and permissions contact permissions@ersnet.org