Abstract
This study aimed to identify simple rules for allocating chronic obstructive pulmonary disease (COPD) patients to clinical phenotypes identified by cluster analyses.
Data from 2409 COPD patients of French/Belgian COPD cohorts were analysed using cluster analysis resulting in the identification of subgroups, for which clinical relevance was determined by comparing 3-year all-cause mortality. Classification and regression trees (CARTs) were used to develop an algorithm for allocating patients to these subgroups. This algorithm was tested in 3651 patients from the COPD Cohorts Collaborative International Assessment (3CIA) initiative.
Cluster analysis identified five subgroups of COPD patients with different clinical characteristics (especially regarding severity of respiratory disease and the presence of cardiovascular comorbidities and diabetes). The CART-based algorithm indicated that the variables relevant for patient grouping differed markedly between patients with isolated respiratory disease (FEV1, dyspnoea grade) and those with multi-morbidity (dyspnoea grade, age, FEV1 and body mass index). Application of this algorithm to the 3CIA cohorts confirmed that it identified subgroups of patients with different clinical characteristics, mortality rates (median, from 4% to 27%) and age at death (median, from 68 to 76 years).
A simple algorithm, integrating respiratory characteristics and comorbidities, allowed the identification of clinically relevant COPD phenotypes.
An algorithm integrating respiratory characteristics and comorbidities identifies clinical COPD phenotypes http://ow.ly/eSRp30fJPG5
Introduction
Airflow limitation is the hallmark of chronic obstructive pulmonary disease (COPD), and forced expiratory volume in 1 s (FEV1) has long been used as the main criterion for the characterisation of disease severity [1, 2]. Analyses of observational cohorts (e.g. the ECLIPSE cohort) have revealed that COPD patients with similar levels of FEV1 experience different degrees of disease burden, reflected by dyspnoea levels, exacerbation rates, health-related quality of life (HRQoL) impairment and exercise limitation [3]. Accordingly, the current classification of COPD proposed by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) incorporates not only the FEV1, but also dyspnoea or HRQoL, and previous occurrence of COPD exacerbations and/or hospitalisation [1]. Although this classification is not fully evidence-based, it has the advantage of taking into account some of the clinical heterogeneity of COPD with the aim of predicting future risk and proposing corresponding treatment choices. A limitation of this classification is that it does not account for age, an important determinant of prognosis in patients with COPD [4]. Furthermore, the GOLD classification does not account for comorbidities, which are frequent and contribute to prognosis [5–7].
Several groups have used cluster analyses to explore clinical heterogeneity in cohorts of patients with COPD [8–10]. These studies have identified consistent clinical COPD phenotypes at high risk of mortality, including 1) younger patients with severe respiratory disease, few cardiovascular comorbidities, and poor nutritional status; and 2) older patients with moderate respiratory disease, metabolic and cardiovascular comorbidities, and obesity [11]. They have also identified patients with mild disease and a good prognosis [12, 13]. However, all published studies had limitations related to a relatively small sample size and lack of further validation in independent samples [11, 13]. Furthermore, the results of cluster analyses are difficult to translate for use in daily practice, as they provide no tool for individual patient allocation in the identified phenotypes.
In the present study, our aim was to develop and validate an algorithm, based on easily available clinical data, to assign patients with COPD to clinically relevant phenotypes.
Methods
Overall design
Data from three French/Belgian COPD cohorts were used to identify clinical COPD phenotypes using cluster analysis. Classification and Regression Tree (CART) [14] analysis was then used to develop an algorithm to allocate individual COPD patients recruited in these French/Belgian cohorts to specific subgroups. This algorithm was further tested in an independent sample of patients with COPD, using data from the COPD Cohorts Collaborative International Assessment (3CIA) initiative [15].
COPD patient cohorts
The French/Belgian COPD cohorts are composed of three cohorts: the Initiatives BPCO cohort [8], the French College of General Hospital Respiratory Physicians (CPHG) cohort [16] and the Leuven cohort [12]. Patients within these cohorts had a diagnosis of COPD, based on post-bronchodilator FEV1/FVC<0.70, and were recruited in a stable state in university hospitals (Initiatives BPCO and Leuven cohorts) [8, 12], or at the time of hospitalisation for COPD exacerbations (CPHG cohort) [16], as previously described. The 3CIA initiative contains pooled individualised data from 22 cohorts of patients with COPD, who were recruited in publicly funded hospitals or in population-based studies [15]. All cohorts were approved by a local Ethics Committee and all subjects provided informed written consent.
Statistical analysis plan
First, COPD patients recruited in the French/Belgian cohorts were classified into subgroups, based on the results of cluster analysis of data obtained at inclusion in the cohorts. The clinical relevance of the identified subgroups was established by examining their association with 3-year all-cause mortality. Next, CARTs were used for the development of an algorithm, assigning COPD patients to the subgroups identified by cluster analysis. The clinical value of this algorithm was examined using 3-year all-cause mortality in the French/Belgian cohorts. Finally, the algorithm was tested for external validation using data from the 3CIA initiative database [15]. Mortality risks among subgroups were analysed using Kaplan–Meier curves and Cox models. The concordance probability estimate was used to evaluate the discriminatory power of classifications for mortality prediction. Data are presented as median (interquartile range, IQR) or n (%). Analyses were performed using SAS 9.2 (SAS Institute Inc., Cary, NC, USA) and Tanagra 1.4 (Lyon, France) software. Additional information on the methods used can be found in the online supplementary material.
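The Kaplan–Meier estimate mentioned above can be illustrated with a minimal sketch. This is not the authors' SAS code; the function name `kaplan_meier` and the toy follow-up data are assumptions introduced purely for illustration.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one death occurs.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data: 6 patients, follow-up in months.
toy_times = [6, 12, 12, 20, 30, 36]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the estimator suits cohorts with incomplete 3-year follow-up.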
Cluster analysis of the French/Belgian COPD cohorts
Variables were selected for inclusion in the cluster analysis, based on their previous association with future risk and prognosis in COPD patients [1, 6], and included age, body mass index (BMI), FEV1 (% predicted), modified Medical Research Council (mMRC) dyspnoea scale, number of exacerbations in the previous 12 months, and presence/absence of cardiovascular comorbidities (hypertension, coronary artery disease and/or left heart failure) and/or diabetes. Identification of subgroups of patients with COPD associated with survival was achieved using factor analysis for mixed data (FAMD) [17, 18], followed by classification of patients using Ward's agglomerative hierarchical cluster analysis [8, 12]. The clinical relevance of the identified subgroups was examined by comparing their all-cause mortality at 3 years, as previously described [8, 12]. These subgroups (phenotypes) were labelled using Roman numerals.
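Ward's agglomerative criterion can be sketched in a few lines. The example below is a simplified illustration, not the procedure run in the study: it omits the FAMD step and assumes each patient is already represented by numeric, standardised coordinates. The function name `ward_clusters` and the toy points are hypothetical.

```python
def ward_clusters(points, k):
    """Agglomerate points (lists of floats) into k clusters using Ward's criterion."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged.
        na, nb = len(a[0]), len(b[0])
        dist2 = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * dist2

    while len(clusters) > k:
        # Pick the pair of clusters whose merge is cheapest.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]

# Hypothetical standardised coordinates (e.g. two factor scores per patient).
toy_points = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 0.0]]
toy_groups = sorted(ward_clusters(toy_points, 3))
```

This naive version rescans every pair of clusters at each merge, so it suits only small illustrations; statistical packages use more efficient update formulas for cohorts of several thousand patients.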
Development of an algorithm for assigning COPD patients to specific subgroups in the French/Belgian cohorts
The development of an algorithm to assign COPD patients to the subgroups identified by cluster analysis was achieved using CART analysis [14, 19], a non-parametric decision tree learning technique [19]. Variables included in this analysis were those selected for the cluster analysis (see above). Threshold values for these variables were based on those obtained by CART analysis and were slightly modified for improved practicality (see online supplement for a detailed explanation).
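To illustrate how a classification tree chooses a threshold, the sketch below searches one variable for the binary split that minimises the weighted Gini impurity, a splitting criterion commonly used in CART. It is an assumption-laden toy: the function names, the FEV1 values and the group labels "A" and "B" are hypothetical and do not reproduce the thresholds of the published algorithm.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split on one variable."""
    best = (None, gini(labels))  # no split: impurity of the parent node
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v < threshold]
        right = [lab for v, lab in zip(values, labels) if v >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical toy data: FEV1 (% predicted) and an arbitrary subgroup label.
toy_fev1 = [25, 28, 33, 55, 60, 72]
toy_label = ["A", "A", "A", "B", "B", "B"]
toy_threshold, toy_impurity = best_split(toy_fev1, toy_label)
```

A full tree repeats this search over every candidate variable at every node. Rounding the resulting thresholds to clinically convenient values, as described above, trades a little purity for ease of use.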
External validation of the algorithm
The algorithm established in the French/Belgian cohorts was then tested in an independent group of patients with COPD from the 3CIA database. Patients in this database (n=16 332) were considered eligible for the study if data necessary to apply the algorithm (age, BMI, FEV1 % predicted, mMRC scale, presence/absence of cardiovascular comorbidities and diabetes) and information on vital status at 3 years were available. Patients with appropriate data (n=3651) were classified by the algorithm into the five classes described above (labelled using Arabic numerals), and these classes were compared according to their clinical characteristics, all-cause mortality at 3 years and age at death.
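The eligibility rule above amounts to a complete-case filter. A minimal sketch follows, assuming each patient is a dictionary; the field names and toy records are hypothetical and do not reflect the actual 3CIA database schema.

```python
# Hypothetical field names for the variables needed by the algorithm,
# plus vital status at 3 years.
REQUIRED_FIELDS = (
    "age",
    "bmi",
    "fev1_pct_pred",
    "mmrc",
    "cardiovascular",
    "diabetes",
    "vital_status_3y",
)

def is_eligible(record):
    """True when every required variable is present and not missing (None)."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)

# Toy records: the second lacks an mMRC grade, the third lacks vital status.
toy_records = [
    {"age": 64, "bmi": 23.1, "fev1_pct_pred": 48, "mmrc": 2,
     "cardiovascular": False, "diabetes": False, "vital_status_3y": "alive"},
    {"age": 71, "bmi": 31.0, "fev1_pct_pred": 62, "mmrc": None,
     "cardiovascular": True, "diabetes": True, "vital_status_3y": "dead"},
    {"age": 58, "bmi": 19.5, "fev1_pct_pred": 27, "mmrc": 4,
     "cardiovascular": False, "diabetes": False},
]
toy_eligible = [r for r in toy_records if is_eligible(r)]
```

Checking for `None` rather than truthiness matters here, because `False` (no comorbidity) and `0` (mMRC grade 0) are valid values, not missing data.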
Results
Patients and overall study design
The study design is presented in figure 1 and the characteristics of the patients with COPD at inclusion in the French/Belgian cohorts (n=2409 patients) and in the 3CIA database (n=3651 patients) are presented in supplementary table S1. Their 3-year all-cause mortality rates were 30.8% and 11.6%, respectively. Patients included in the French/Belgian cohorts were characterised by older age, more severe airflow limitation and higher rates of cardiovascular comorbidities and/or diabetes. Furthermore, 57% of patients in the French/Belgian cohorts were recruited at the time of hospitalisation for COPD exacerbations (as part of the CPHG cohort) [16].
Figure 1. Study design. Patients with chronic obstructive pulmonary disease (COPD) recruited in the French/Belgian cohorts were classified into subgroups (phenotypes), based on the results of a cluster analysis of clinical data obtained at inclusion in the cohorts. Next, classification and regression trees (CARTs) were used on the same data to determine the best variables and thresholds necessary for the development of an algorithm for assigning COPD patients to the subgroups identified by cluster analysis in the French/Belgian cohorts. This analysis led to the development of a simple algorithm for allocating patients with COPD into five classes. This algorithm was then tested for external validation using data from the 3CIA initiative database (n=16 332). This latter analysis was only possible in patients with available data (n=3651), i.e. with all the variables contained in the algorithm. In each analysis, the clinical relevance of the identified subgroups/classes was established by examining their association with 3-year all-cause mortality. BMI: body mass index; FEV1: forced expiratory volume in 1 s; mMRC: modified Medical Research Council.
Cluster analysis of the French/Belgian cohorts
Table 1 shows the five subgroups (labelled I to V) identified in the French/Belgian COPD cohorts using cluster analysis (see online supplementary tables S2–S6 and figure S1). Table 2 summarises the main descriptors of these subgroups, according to increasing rates of 3-year all-cause mortality. Subgroup V (mortality rate 2.5%) was characterised by mild respiratory disease and low rates of comorbidities. Subgroup II (mortality rate 21.8%) was characterised by moderate to severe respiratory disease and low rates of comorbidities. Subgroup III (mortality rate 30.0%) was generally characterised by an older age than that of subgroup II, with a high prevalence of comorbidities and obesity. Subgroup IV (mortality rate 47.0%) was characterised by very severe respiratory disease with low rates of cardiovascular comorbidities and diabetes. Subgroup I (mortality rate 50.9%) had less severe respiratory disease than subgroup IV, but was characterised by older age and very high rates of cardiovascular comorbidities and diabetes.
Table 1. Characteristics and 3-year mortality in chronic obstructive pulmonary disease (COPD) patients (n=2409) recruited in the French/Belgian COPD cohorts, according to the five subgroups identified using cluster analysis
Table 2. Main descriptors of the five chronic obstructive pulmonary disease (COPD) phenotypes identified by cluster analysis in the French/Belgian COPD cohorts
Use of CART for the development of an algorithm to assign COPD patients to subgroups of patients, identified by cluster analysis in the French/Belgian cohorts
The CART analysis provided an algorithm that facilitated the assignment of up to 80% of the patients to the subgroups identified by cluster analysis (see online supplementary tables S7 and S8). This algorithm is presented in figure 2 and the clinical characteristics of patients, according to the five classes obtained by applying this algorithm, are presented in table 3. Kaplan–Meier survival curves by cluster analysis-defined subgroups (figure 3a) and CART-defined classes (figure 3b) showed comparable results. Concordance probability estimates were 0.61 (95% CI 0.59–0.63) for cluster analysis-defined subgroups and 0.60 (95% CI 0.58–0.62) for CART-defined classes, confirming that both methods had comparable discriminatory power for the identification of subgroups with different prognoses.
Figure 2. Algorithm developed by classification and regression tree (CART) analysis for the classification of chronic obstructive pulmonary disease (COPD) patients. Application to the French/Belgian and 3CIA cohorts. BMI: body mass index; FEV1: forced expiratory volume in 1 s; mMRC: modified Medical Research Council.
Table 3. Characteristics and 3-year mortality rates in chronic obstructive pulmonary disease (COPD) patients recruited in the French/Belgian COPD cohorts, or in the 3CIA initiative database, according to the five classes identified using the classification and regression tree (CART)-based algorithm
Figure 3. Kaplan–Meier analyses for assessing all-cause mortality at 3 years. a) French/Belgian chronic obstructive pulmonary disease (COPD) cohorts according to the five subgroups (phenotypes, Ph) identified by cluster analysis. b) French/Belgian COPD cohorts according to the five classes identified by classification and regression tree (CART) analysis. c) The 3CIA COPD cohort according to the five classes identified by the algorithm developed in the French/Belgian cohorts. All analyses, p<0.0001 (log-rank test).
Evaluation of the algorithm using data from the 3CIA initiative database
The algorithm developed in the French/Belgian cohorts was then tested, using data obtained in COPD patients from the 3CIA database. Characteristics of the 3651 patients distributed into classes, according to this algorithm, are presented in table 3. Kaplan–Meier survival curves by classes are presented in figure 3c. The concordance probability estimate was 0.62 (95% CI 0.59–0.64).
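The idea behind a concordance measure can be illustrated with Harrell's C-index. Note that this is a related but different estimator from the concordance probability estimate reported here, and it is shown only as an assumption-based sketch. The function name, the toy data and the use of an ordinal risk score per class are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time per patient
    events      -- 1 if the patient died at that time, 0 if censored
    risk_scores -- higher score means higher predicted risk of death
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i died strictly before patient j's
            # last follow-up time; pairs with tied times are ignored here.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: four patients, with risk ranked by assigned class.
toy_c = harrell_c(times=[6, 12, 20, 30], events=[1, 1, 0, 0], risk_scores=[3, 1, 2, 1])
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. With only five classes, many patient pairs share the same score, which limits how high any such index can go.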
Comparison of mortality rates among classes in the French/Belgian COPD cohorts versus the 3CIA database
Because 3-year mortality rates varied widely between the French/Belgian COPD cohorts and the 3CIA database, we used Cox analysis to examine hazard ratios of mortality among patients in the five classes defined by our algorithm, separately in each group of cohorts. Forest plots corresponding to these analyses are presented in figure 4. Although absolute rates of death were markedly higher in the French/Belgian cohorts, hazard ratios of mortality among the five classes were comparable between the French/Belgian cohorts and the 3CIA initiative.
Figure 4. Relative mortality risks at 3 years among chronic obstructive pulmonary disease (COPD) patients in a) the French/Belgian COPD cohorts and b) the 3CIA initiative. COPD patients were classified into five classes according to the algorithm. Horizontal bars show hazard ratios and 95% confidence intervals of mortality risks between classes. For example, in the French/Belgian COPD cohorts, subjects in class 4 have a 23.2-fold (95% CI 10.2–52.7) increased risk of mortality when compared with subjects in class 5.
Distribution by GOLD grades of severity of airflow limitation [1] in patients who died during follow-up is presented in figure 5. When comparing classes with high rates of all-cause mortality, patients without cardiovascular comorbidities/diabetes (class 4) who died were predominantly in GOLD 4; whereas patients with cardiovascular comorbidities/diabetes (class 1) who died had less severe airflow limitation (predominantly GOLD 2 and 3). Comparable observations were made when comparing patients in class 2 versus class 3 (intermediate mortality rates).
Figure 5. Distribution of airflow limitation severity by Global Initiative for Chronic Obstructive Lung Disease (GOLD) grade at inclusion in the cohorts, in patients who died during follow-up. a) French/Belgian cohorts. b) 3CIA initiative. Data are presented as a percentage of the total number of deaths in each class. Absolute numbers of deaths (n) in each class are also presented.
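The tabulation of deaths by GOLD grade can be sketched as follows. The grade boundaries are the standard GOLD spirometric cut-offs for post-bronchodilator FEV1 % predicted (in patients with FEV1/FVC<0.70); the function names and toy values are hypothetical.

```python
from collections import Counter

def gold_grade(fev1_pct_pred):
    """GOLD grade of airflow limitation from post-bronchodilator FEV1 (% predicted)."""
    if fev1_pct_pred >= 80:
        return 1  # mild
    if fev1_pct_pred >= 50:
        return 2  # moderate
    if fev1_pct_pred >= 30:
        return 3  # severe
    return 4      # very severe

def grade_distribution(fev1_values):
    """Percentage of patients in each GOLD grade, e.g. among deaths in one class."""
    counts = Counter(gold_grade(v) for v in fev1_values)
    total = len(fev1_values)
    return {grade: 100.0 * counts.get(grade, 0) / total for grade in (1, 2, 3, 4)}

# Hypothetical FEV1 values (% predicted) for patients who died in one class.
toy_distribution = grade_distribution([25, 28, 35, 62])
```

Running the same tabulation separately for each class yields percentages of the kind displayed per class in the figure.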
Discussion
In the present study, we first performed cluster analysis in a pool of French/Belgian COPD cohorts, which identified five subgroups (phenotypes) of patients with different rates of all-cause mortality at 3 years and different ages at death. We then used CART analysis in this pool of French/Belgian cohorts to develop an algorithm that allowed allocation of patients into five classes that corresponded to the subgroups identified by cluster analysis. This simple algorithm was based on clinical variables (including cardiovascular comorbidities and/or diabetes and respiratory characteristics) that are routinely available in daily practice. Classification of COPD patients using this algorithm allowed the identification of subgroups of patients, which differed on 3-year all-cause mortality and age at death in the pool of French/Belgian cohorts, thereby providing internal validation of the approach. This method provided comparable results in patients included in the 3CIA initiative database, which contained an independent group of patients with COPD recruited in multinational cohorts, thereby providing external validation. The algorithm identifies clinical phenotypes that are relevant to the prognosis of patients with COPD, which could aid in the exploration of underlying pathophysiological mechanisms and development of novel strategies of care.
The algorithm described in the present study is the first to integrate comorbidities (cardiovascular diseases, diabetes and obesity) and age with more classical respiratory variables (FEV1 and dyspnoea) to improve the characterisation of patients with COPD. An important contribution of this algorithm is that it identifies patients who belong to the two subgroups with a poorer prognosis, i.e. classes 1 and 4, and highlights the corresponding determinants, i.e. the severity of the respiratory component (as assessed by the degrees of lung function impairment and dyspnoea) and the presence of major cardiovascular comorbidities or metabolic risk factors (diabetes). These data confirm previous studies, which show that 1) cardiovascular and metabolic comorbidities contribute to worsening outcomes (e.g. mortality, hospitalisation and exacerbation) in patients with COPD [6, 20]; and 2) two very different phenotypes of COPD patients with a poor prognosis exist (those with severe respiratory disease, often occurring at a younger age; and those with multi-morbidities including cardiovascular and metabolic diseases, often characterised by an older age) [9, 12]. Importantly, this study extends previous data by studying larger numbers of patients (including larger numbers of women) recruited in multiple countries, and provides a simple algorithm that can be used in the clinic to classify patients. One notable characteristic of the algorithm is that it highlights the variables on which clinicians and researchers should focus during follow-up and treatment adaptation. Whether specific strategies need to be developed for all or some of the identified phenotypes now needs to be tested prospectively. Similarly, future studies should aim to determine whether these phenotypes are associated with specific biomarkers that reflect underlying pathophysiological mechanisms.
The main strengths of the present study were the application of exploratory statistical analyses complemented by clinical knowledge in large cohorts of patients, the validation of findings in an external pool of cohorts and the use of a robust variable (mortality) for validation. We also recognise that the present study has limitations. Our assessment of comorbidities was based on physician diagnoses that did not consider occult conditions, which reportedly occur in COPD patients [21]. To limit such underestimation of the impact of undiagnosed cardiovascular diseases, the definition of cardiovascular comorbidities was relatively loose and included hypertension (a risk factor for cardiovascular disease rather than a disease itself). This definition also corresponds to what occurs in real-life daily practice, where many patients do not benefit from systematic screening for cardiovascular comorbidities. Although COPD patients are at high risk for lung cancer, which is associated with a poor prognosis, patients with active lung cancer were generally excluded from the present cohorts, thus limiting our findings to COPD patients without active lung cancer. Specific causes of mortality were not available in the cohorts used in the present analyses, and the prognostic value of the phenotypes was confirmed using all-cause mortality. Previous studies have shown that causes of mortality in COPD populations differ between patients with mild versus severe airflow obstruction, with a higher relative weight of cancer and cardiovascular causes in patients with less severe airflow impairment, and more respiratory causes in those with more severe airflow impairment [22]. Among the patients who died, differences in the GOLD grades of airflow obstruction (see figure 5) between phenotypes with comparable survival rates (e.g. class 1 versus class 4 and class 2 versus class 3) suggest that patients with relatively high rates of cardiovascular comorbidities and/or diabetes (e.g. classes 1 and 3) were more likely to die from extrapulmonary causes. Importantly, even if one of its purposes is to identify populations with different mortality rates, the algorithm is not intended to represent a prognostic index, as the determinants of a given prognosis might differ markedly between patients of a given group. The large difference in mortality rates between the two groups of cohorts largely relates to the fact that 57% of patients in the French/Belgian cohorts were recruited at the time of hospitalisation for a COPD exacerbation (CPHG cohort) [16], reflecting the prognostic impact of hospitalisations. Although hospitalisation appears to be an important prognostic factor, it should be considered a marker of disease severity rather than a phenotype per se. This was the basis for not including previous hospitalisation as a variable in the cluster analysis. However, COPD exacerbations (which are important in the characterisation of patients with COPD) [23] were included in the cluster analysis and the CART analysis. The finding that exacerbations were not retained in our final algorithm should not be misinterpreted, as exacerbations remain important events in the life of patients with COPD [24]; it merely reflects that non-hospitalised exacerbations were not significantly related to prognosis. The performance of classification trees could also be improved by the integration of biomarkers that reflect inflammatory (fibrinogen, white blood cell count, C-reactive protein, eosinophils, etc.) [25–27] and cardiovascular (brain natriuretic peptide, copeptin, pro-adrenomedullin, etc.) [28] biological phenomena.
The field of COPD phenotypes was once considered ‘the future of COPD’ [29], but moving from exploratory research studies to the clinic has proven to be difficult. The algorithm described in the present study offers a new way of combining and hierarchising well-known prognostic criteria (including comorbidities, age and symptoms) to identify COPD phenotypes in the clinic. This approach could serve as a basis to develop phenotype-specific therapeutic strategies, by recruiting appropriate at-risk target populations in clinical trials. We speculate that our algorithm might also help in unravelling specific biological pathways that were previously missed, owing to the mixing of various phenotypes in the current classifications of COPD.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: The analyses reported here were supported by an unrestricted grant from Boehringer Ingelheim France, which played no role in study design, data collection, analysis and interpretation of data, writing of the manuscript or the decision to submit it for publication. The Initiatives BPCO study was supported by an unrestricted grant from Boehringer Ingelheim France and (until 2015) Pfizer. None of the funding sources of the individual trials were involved in any aspect of the 3CIA initiative, including the design, data collection and analysis, decision to publish, or preparation of the manuscript. P. Martinez-Camblor was supported by research grant MTM2011-23204 from the Spanish Ministerio de Ciencia e Innovación (FEDER support included). J. Garcia-Aymerich has a researcher contract from the Instituto de Salud Carlos III (CP05/00118), Ministry of Health, Spain. Funding information for this article has been deposited with the Crossref Funder Registry.
Conflict of interest: Disclosures can be found alongside this article at erj.ersjournals.com
- Received May 19, 2017.
- Accepted July 28, 2017.
- Copyright ©ERS 2017