Abstract
Chronic obstructive pulmonary disease (COPD) is commonly staged according to the percentage of predicted forced expiratory volume in 1 s (FEV1 % pred), but other methods have been proposed. In this study we compared the performance of seven staging methods in predicting outcomes.
We retrospectively studied 296 COPD outpatients. For each patient the disease severity was staged by separately applying the following methods: the criteria proposed by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), quartiles of FEV1 % pred and z-score of FEV1, quartiles and specified cut-off points of the ratio of FEV1 over height squared ((FEV1·Ht−2)A and (FEV1·Ht−2)B, respectively), and quartiles of the ratio of FEV1 over height cubed (FEV1·Ht−3) and of FEV1 quotient (FEV1Q). We evaluated the performance of these methods in predicting the risks of severe acute exacerbation and all-cause mortality.
Overall, staging based on the reference-independent FEV1Q performed best in predicting the risks of severe acute exacerbation (including frequent exacerbation) and mortality, followed by (FEV1·Ht−2)B. The performance of staging methods could also be influenced by the choice of cut-off values. Future work using large and ethnically diverse populations to refine and validate the cut-off values would enhance the prediction of outcomes.
Reference-independent methods for staging COPD performed better than the GOLD criteria in predicting outcomes http://ow.ly/W95F30iiMKO
Introduction
Chronic obstructive pulmonary disease (COPD) is a major cause of morbidity and mortality worldwide [1]. Accurate staging of COPD is important for determining treatment strategy and prognosis. Common staging based on the percentage of predicted forced expiratory volume in 1 s (FEV1 % pred), as proposed by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), has been criticised for its susceptibility to the influence of patient age, height, sex and race [2–4]. Alternative expressions of forced expiratory volume in 1 s (FEV1) for staging COPD have been proposed. Staging based on z-scores of FEV1 potentially avoids the confounding effect of inter-personal physiological variations [4–6]. Other alternative staging methods are independent of reference values, such as the ratios of FEV1 over height squared (FEV1·Ht−2) and over height cubed (FEV1·Ht−3), and the ratio of FEV1 over the sex-specific Miller values (FEV1 quotient, or FEV1Q) [7–10]. Most studies evaluating these alternative methods were done before the Global Lung Function Initiative (GLI) issued their multi-ethnic reference equations [3]. Studies by Turkeshi et al. [11] and Hegendörfer et al. [12] applied the GLI reference equations while comparing the prognostic performance of different expressions of FEV1. These studies, however, were not performed specifically on patients with COPD [11, 12]. In the present study, with the application of the GLI reference equations and working with a cohort of patients with COPD, we aimed to evaluate the performance of staging based on different expressions of FEV1 in predicting the risk of two important clinical outcomes: severe acute exacerbation (SAE) and all-cause mortality.
Methods
Study design and population
This was a retrospective study using delinked clinical data from a database that was established in our previous study [13]. All patients were aged between 40 and 95 years old, had received a diagnosis of COPD by board-certified pulmonologists at the tertiary medical centre National Cheng Kung University Hospital, Taiwan, between January 2006 and December 2012, and had been followed for at least 1 year or until death. For the analysis of all-cause mortality, only those patients with complete longitudinal data at the final censoring date of the study (April 30, 2015) were included. The diagnosis of COPD required that the ratio of FEV1 to forced vital capacity (FVC) be lower than the fifth percentile value (lower-limit-of-normal) of the population [1, 14, 15]. Post-bronchodilator values were used if available. Patients with asthma or advanced-stage malignancy of any organ, patients lost to follow-up, and patients with incomplete or questionable data were excluded from the study. Relevant data (including sex, age, height, body mass index (BMI), cigarette smoking history, Charlson comorbidity index, frequency of SAE during the year before and the first 3 years after study enrolment, survival status and spirometric measurements) were obtained from medical records and delinked. SAE was defined, according to relevant guidelines, as an acute event characterised by the worsening of respiratory symptoms that were beyond daily variation and required hospitalisation [1, 16]. Experienced and board-certified pulmonary technicians at our hospital performed all pulmonary function tests according to standard protocols established by the American Thoracic Society [17]. All patients were in a stable condition when they underwent pulmonary function tests. The test that was closest in date to study enrolment was considered the baseline test. For each patient, we calculated five expressions of FEV1 (FEV1 % pred, z-score, FEV1·Ht−2, FEV1·Ht−3 and FEV1Q). 
We used the GLI specialised software and reference equations for South East Asians to convert the measured FEV1 to FEV1 % pred and z-scores [3]. When calculating FEV1Q, we used the Miller values (i.e. 0.5 L for males and 0.4 L for females) [9]. This study was approved by the Institutional Review Board of the National Cheng Kung University Hospital (A-ER-104-366).
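The three reference-independent expressions described above can be sketched in a few lines. This is only an illustration, not the authors' software: the function and key names are hypothetical, and it assumes FEV1 is given in litres and height in metres. The GLI-based FEV1 % pred and z-scores are not reproduced here because they require the GLI reference equations.

```python
# Sex-specific Miller values (in litres) used as the FEV1Q denominator,
# as stated in the text: 0.5 L for males and 0.4 L for females.
MILLER_FEV1_L = {"male": 0.5, "female": 0.4}


def fev1_expressions(fev1_l, height_m, sex):
    """Return the reference-independent expressions of FEV1.

    Assumptions (not stated in the source): fev1_l is in litres,
    height_m is in metres, and sex is either "male" or "female".
    """
    if fev1_l <= 0 or height_m <= 0:
        raise ValueError("FEV1 and height must be positive")
    return {
        "fev1_ht2": fev1_l / height_m ** 2,     # FEV1 over height squared
        "fev1_ht3": fev1_l / height_m ** 3,     # FEV1 over height cubed
        "fev1q": fev1_l / MILLER_FEV1_L[sex],   # FEV1 quotient
    }


# Hypothetical patient: male, FEV1 1.5 L, height 1.70 m.
example = fev1_expressions(1.5, 1.70, "male")
```

For this hypothetical patient, FEV1Q is simply 1.5 L divided by the male Miller value of 0.5 L.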
COPD severity staging
For each patient, the severity of COPD was staged into four stages (stage 1–4 in order of increasing severity) using the different expressions of FEV1. Cut-off values for staging based on FEV1 % pred were those proposed by GOLD (i.e. 80%, 50% and 30%) and quartiles of FEV1 % pred. Quartiles were also applied as cut-off values for staging based on z-scores, FEV1·Ht−3 and FEV1Q [9, 11, 12]. For staging based on FEV1·Ht−2, we used two distinct sets of cut-off values: for (FEV1·Ht−2)A we used quartiles, whereas for (FEV1·Ht−2)B we used the cut-off values proposed by Miller et al. [7] (i.e. 0.3, 0.4 and 0.5). In total, we evaluated seven staging methods.
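The staging rules above amount to comparing one value against three cut-off points. The sketch below shows one way to express this; the names are hypothetical, and two details are assumptions rather than facts from the text: a value exactly equal to a cut-off is placed in the milder stage, and quartiles are computed with Python's default (exclusive) quantile method, which may differ from the method used by the authors' statistical packages.

```python
import statistics

# Cut-off values taken from the text, listed from mildest to most severe.
GOLD_CUTOFFS = (80.0, 50.0, 30.0)      # FEV1 % pred
MILLER_HT2_CUTOFFS = (0.5, 0.4, 0.3)   # (FEV1·Ht−2)B


def stage_by_cutoffs(value, cutoffs_descending):
    """Return stage 1-4, where stage 1 is the mildest.

    A lower FEV1 expression means more severe disease, so the stage
    increases each time the value falls below the next cut-off.
    Assumption: a value equal to a cut-off belongs to the milder stage.
    """
    stage = 1
    for cutoff in cutoffs_descending:
        if value >= cutoff:
            return stage
        stage += 1
    return stage


def quartile_cutoffs(values):
    """Return cohort quartiles as descending cut-offs (Q3, Q2, Q1)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3, q2, q1)


# Hypothetical patient with FEV1 % pred of 65 and FEV1·Ht−2 of 0.45.
gold_stage = stage_by_cutoffs(65.0, GOLD_CUTOFFS)
ht2b_stage = stage_by_cutoffs(0.45, MILLER_HT2_CUTOFFS)
```

The same `stage_by_cutoffs` function can serve the quartile-based methods by passing it the output of `quartile_cutoffs` computed on the cohort's values.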
Statistical analysis
Categorical data are presented as counts and percentages, while continuous data (mostly not normally distributed as assessed by the Shapiro–Wilk test) are presented as medians and interquartile ranges (IQR). Variables were compared between groups using the Mann–Whitney U test or Pearson's Chi-squared test of homogeneity, as appropriate. Single- and multivariate logistic regression was carried out to determine the performance of each staging method in predicting the risk of having at least one SAE at 1 year, the risk of frequent (≥2) SAE at 1 year, and the risk of having SAE every year at 2 years. No collinearity or interaction among the candidate covariates was found. Adjustment was made for age, BMI, Charlson comorbidity index, smoking status and history of SAE in the year preceding enrolment. Kaplan–Meier curves were plotted and log-rank tests performed to compare differences in the survival of patients at different stages based on each staging method. Single- and multivariate Cox proportional hazard regression analyses (adjusting for covariates such as age, BMI, Charlson comorbidity index, smoking status and history of SAE in the preceding year) were carried out to assess the performance of each staging method in predicting mortality risk. No violation of the proportional hazard assumption was detected. A p-value <0.05 was considered to indicate statistical significance; all tests were two-tailed. Statistical analysis was performed with the statistical packages SPSS (Version 22, SPSS, Chicago, IL, USA), R (Version 3.3.2) and MedCalc (Version 16.8.4, MedCalc Software, Ostend, Belgium).
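For readers unfamiliar with the Kaplan–Meier method mentioned above, the product-limit estimate can be sketched as follows. This is a minimal illustration with hypothetical names and made-up follow-up times; the analyses in this study were run in the statistical packages listed, not with this code.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct death time.

    durations: follow-up time for each patient (any consistent unit).
    events: 1 if the patient died at that time, 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Made-up follow-up times in months for five patients (1 = died, 0 = censored).
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

Computing one such curve per stage and overlaying them is what allows crossovers between stages to be seen.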
Results
We screened the data of 667 outpatients, and identified 296 patients who met the inclusion criteria and were therefore enrolled into the study (figure 1). Table 1 shows the baseline characteristics of the cohort. Most participants were male (94%), and either current (60%) or former (30%) smokers. Patients were followed for a median duration of 57.9 months (IQR 49.8 months, range 14 days to 9.5 years). Pre-bronchodilator spirometric data were used for 56 patients (19%) owing to a lack of bronchodilator tests. A total of 72 patients (24%) died during follow-up, and the median duration from enrolment to death was 43.5 months (IQR 39.9 months, range 14 days to 8.5 years). Complete longitudinal data up to the final censoring date were available for 189 patients (64%), who were thus included in the analysis of all-cause mortality. There was no significant difference in baseline characteristics between those included and those excluded from the mortality analysis, except for a lower Charlson comorbidity index (median 1, IQR 1) and a shorter duration of follow-up (median 37.6 months, IQR 26.4 months) in those excluded (table 1).
The distributions of the measured FEV1, FEV1 % pred, FEV1·Ht−2, FEV1·Ht−3 and FEV1Q of our patients all exhibit positive skewness, while z-scores of FEV1 are normally distributed (figure S1a−f). Although we applied the Miller values for deriving FEV1Q for our patients, we also calculated the sex-specific first-percentile value of FEV1 for our study cohort, which was 0.51 for males and 0.52 for females. Our male-specific first-percentile value agrees closely with the Miller value for males (0.5), and the difference between our female-specific first-percentile value and the Miller value for females (0.4) is likely due to the small number of female patients (17 of 296 (6%)) in our cohort [9].
Table 2 displays the distribution of patients with a specific outcome across the stages based on different staging methods. We found that 48 patients (16%) had at least one SAE at 1 year of follow-up. Of these, 22 patients (7%) had two or more episodes of SAE. Additionally, 23 patients (8%) had SAE every year at 2 years. Staging based on (FEV1·Ht−2)A, (FEV1·Ht−2)B, FEV1·Ht−3 and FEV1Q stratified the distribution of patients with SAE-related outcomes well, with the percentage of patients with SAE-related outcomes rising at each more severe stage. In contrast, staging based on the GOLD criteria, FEV1 % pred (quartiles) and z-scores did not stratify well. For mortality, staging based on the GOLD criteria, FEV1 % pred (quartiles), (FEV1·Ht−2)B and FEV1Q yielded progressively increasing mortality rates across ascending stages. Staging based on z-scores, (FEV1·Ht−2)A and FEV1·Ht−3 performed worse in this regard (table 2).
Single- and multivariate logistic regression assessing the risk of SAE revealed that staging methods that were independent of predicted reference values performed better than reference-dependent methods. Staging based on FEV1Q and (FEV1·Ht−2)A consistently discriminated the risk of SAE well, including the risk of ≥1 SAE at 1 year as shown in table 3, and the risk of frequent SAE at 1 year and having SAE every year at 2 years as shown in table 4. Advancing stages had progressively increasing crude and adjusted odds ratios. Staging based on (FEV1·Ht−2)B and FEV1·Ht−3 also showed an overall incremental trend of the risk of SAE across the ascending stages. However, for these two methods, the crude and adjusted odds ratios for ≥1 SAE at 1 year were slightly lower for stage 4 than for stage 3. For staging based on (FEV1·Ht−2)B, the adjusted odds ratio for frequent SAE at 1 year was also very slightly lower for stage 4 than for stage 3. In contrast, all three reference-dependent staging methods (i.e. GOLD, FEV1 % pred (quartiles) and z-scores) performed unsatisfactorily in stratifying the risk of SAE (tables 3 and 4).
The Kaplan–Meier survival curves for staging based on FEV1 % pred (quartiles), (FEV1·Ht−2)B and FEV1Q discriminated the survival differences of the patients in different stages. However, the survival curves for staging based on z-scores, (FEV1·Ht−2)A and FEV1·Ht−3 showed early crossovers between stages 3 and 4. The survival curves for GOLD-based stages separated well initially but then exhibited late-phase crossovers between stages 1 and 2 and between stages 3 and 4, which were likely caused by the low numbers of remaining subjects at risk near the end of follow-up (figure 2a–g).
Single- and multivariate Cox regression analysis revealed that, consistent with the distribution of mortality rates as displayed in table 2, staging based on FEV1 % pred (quartiles), FEV1Q and (FEV1·Ht−2)B in particular stratified the mortality risk well, with advancing stages exhibiting progressively increasing crude and adjusted hazard ratios. For the GOLD stages, the crude and adjusted hazard ratios showed a similarly incremental trend, but mostly without reaching statistical significance. Stages based on z-scores exhibited progressively increasing mortality risk only in the multivariate Cox regression model. Staging based on (FEV1·Ht−2)A and FEV1·Ht−3 performed unsatisfactorily in predicting the different mortality risk across the stages (table 5).
Considering the relatively small size of our cohort and the skewed distribution of the major variables, we further applied bootstrapping (with 5000 random samplings with replacement, using SAS version 9.4 (SAS Institute, Cary, NC, USA)) to all the logistic and Cox regression analyses to enhance the accuracy of our statistical estimates. The bootstrapping-derived crude and adjusted odds ratios and hazard ratios, and the corresponding 95% confidence intervals, generally reached statistical significance and agreed with the findings obtained from our original regression models (tables S1 and S2).
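The resampling idea behind this step can be sketched generically. The code below is an illustrative percentile bootstrap, not the SAS procedure used in this study: the function name, the fixed seed and the percentile method for the interval are all assumptions, and the `statistic` argument stands in for whatever estimate is recomputed on each resample (in this study, an odds ratio or hazard ratio from a refitted regression model).

```python
import random
from statistics import mean


def bootstrap_percentile_ci(records, statistic, n_resamples=5000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(records).

    records: one entry per patient; whole patients are resampled.
    statistic: function mapping a list of records to a single number.
    """
    if not records:
        raise ValueError("records must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(records)
    estimates = sorted(
        statistic(rng.choices(records, k=n))  # sample with replacement
        for _ in range(n_resamples)
    )
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lower_idx], estimates[upper_idx]


# Made-up FEV1Q values, used only to show the call; not study data.
fev1q_values = [0.9, 1.2, 1.5, 1.1, 2.0, 0.7, 3.1, 2.4]
ci_low, ci_high = bootstrap_percentile_ci(fev1q_values, mean)
```

With 5000 resamples and alpha of 0.05, the interval bounds are read off near the 2.5th and 97.5th percentiles of the sorted resampled estimates.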
Discussion
In this study, we compared seven methods for staging the severity of COPD. We showed that staging based on FEV1Q accurately predicted survival and the risks of all the adverse outcomes studied. Staging based on (FEV1·Ht−2)B also performed well in general, particularly in predicting mortality risk, although there was a slight inversion in the crude and adjusted odds ratios for ≥1 SAE at 1 year and in the adjusted odds ratios for frequent SAE at 1 year between stages 3 and 4. A slight inversion in the bootstrapping-derived adjusted hazard ratios for mortality was also noticed between stages 2 and 3. Staging based on (FEV1·Ht−2)A and FEV1·Ht−3 discriminated the SAE-associated risks well, but performed unsatisfactorily in predicting the survival differences and mortality risk. The commonly applied GOLD criteria using GLI reference equations failed to discriminate all the SAE-associated risks, and predicted the mortality risk less well than FEV1Q and (FEV1·Ht−2)B. By using different cut-off values from the GOLD criteria, staging based on FEV1 % pred (quartiles) better predicted the risks of mortality and of frequent SAE at 1 year, but still inadequately differentiated other SAE-associated risks. Staging based on GLI-derived z-scores performed well only in the multivariate Cox regression model for mortality after adjusting for confounders, but completely failed to predict all the SAE-associated risks.
The severity of COPD, as represented by the degree of impairment in FEV1, is associated with acute exacerbation [18–27] and mortality [28–34]. However, the best method and cut-off values for staging the severity of COPD remain to be identified. The commonly used GOLD criteria have been criticised for their liability to bias [2–4]. Pioneering works by Miller and co-workers [7, 8] showed that staging based on reference-independent FEV1·Ht−2 and FEV1·Ht−3 correlated well with mortality. Later, Miller et al. [9] found that FEV1Q-based staging was an even better predictor of survival. The superiority of FEV1Q, and to a lesser extent FEV1·Ht−3, over other expressions in staging COPD was further supported by the work of Pedone et al. [10]. Moreover, Vaz Fragoso and co-workers [4, 5] applied z-scores of FEV1 to stage COPD among elderly patients, and found that advanced stages based on z-scores were associated with enhanced risks of adverse outcomes and worsened respiratory symptoms. Furthermore, it is unclear whether the use of the GLI reference equations for the calculation of FEV1 % pred and z-scores alters the performance of these staging methods in predicting outcomes. Hegendörfer et al. [12] applied the GLI reference equations and compared staging based on five different expressions of FEV1 in predicting mortality, unplanned hospitalisation, and physical and mental decline. They also identified FEV1Q and FEV1·Ht−3 as the two best methods in predicting outcomes. Nevertheless, only about 14% of the subjects in their study cohort had COPD [12]. To the best of our knowledge, our present study is the first to have applied the GLI reference equations and compared the performance of different expressions of FEV1 in staging COPD and predicting disease-specific outcomes on a cohort of patients all having COPD.
The findings of our present work agree with previous studies in that staging based on FEV1Q and (FEV1·Ht−2)B performed well in predicting and stratifying the risks of adverse outcomes, in both single- and multivariate analyses. This carries practical importance, because in clinical settings clinicians often weigh those patient-specific variates differentially and even consider the severity stages in isolation. Like previous researchers, we also found that the widely applied GOLD criteria performed inadequately in predicting outcomes, particularly the risk of SAE. The GOLD criteria and (FEV1·Ht−2)B both used cut-off values that resulted in an uneven splitting of the patients. Consistent with the findings of Miller et al. [7], compared with the GOLD criteria, (FEV1·Ht−2)B classified more patients into the highest and lowest stages than into intermediate stages 2 and 3, and yet it performed better than the GOLD criteria in predicting outcomes. Our present study differs from those of Vaz Fragoso and co-workers [4, 5] in that we did not observe any advantage in terms of outcome prediction in z-score-based staging. The performance of z-scores in outcome prediction was still unsatisfactory following the use of the five-stage cut-off values proposed by Quanjer et al. [6] (table S3 and figure S2a). Unlike in previous studies, the performance of staging based on FEV1·Ht−3 in the present study was inadequate in discriminating the differences in survival and mortality risk, and this could not be improved by changing the cut-off values to quintiles as proposed by Pedone et al. [10] (table S3 and figure S2b). When we analysed the risks of having at least one SAE at 2 and 3 years of follow-up, FEV1Q-based staging was the only method that consistently yielded a stepwise increase in all the crude and adjusted odds ratios with advancing stages (table S4).
Another important implication from our findings is that the performance of staging methods in predicting outcomes can also be influenced by the choice of cut-off values. This was demonstrated by the difference in performance between the GOLD criteria and FEV1 % pred (quartiles), and also between (FEV1·Ht−2)A and (FEV1·Ht−2)B. It is unlikely that the relationship between the severity of COPD based on any staging method and the risk of various adverse outcomes is linear. The best cut-off values, even for staging with FEV1Q, may still need to be refined and validated, particularly in large clinical populations.
There are some limitations to the present study. First, the relatively small size of our cohort might have rendered our study insensitive to subtle differences. Nevertheless, the application of bootstrapping helped to offset this limitation. Second, the majority of our participants were male, which was mainly because of the epidemiology of cigarette smoking and COPD in Taiwan, where more than 95% of smokers are male [35]. Generalising our findings to populations with different sex ratios needs to be carried out with caution. Third, this study was conducted in the outpatient setting of a single tertiary medical centre, and the conclusions might not be generalisable to all patients with COPD. Finally, we did not analyse psychosocial factors or treatments in our participants, and therefore cannot exclude the potential confounding effects from those factors.
Conclusion
We show that staging of COPD severity based on the GOLD criteria performed inadequately in predicting the risk of adverse outcomes. Staging based on the reference-independent FEV1Q using quartiles as cut-off values best predicted the risks of SAE and mortality and performed well in stratifying the differences in survival. Staging based on (FEV1·Ht−2)B was the next best performing method. The performance of staging methods depends on the expression of FEV1 and on the choice of cut-off values.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary figure S1 ERJ-00577-2017_Supplementary_figure_S1
Supplementary figure S2 ERJ-00577-2017_Supplementary_figure_S2
Supplementary table S1 ERJ-00577-2017_Supplementary_table_S1
Supplementary table S2 ERJ-00577-2017_Supplementary_table_S2
Supplementary table S3 ERJ-00577-2017_Supplementary_table_S3
Supplementary table S4 ERJ-00577-2017_Supplementary_table_S4
Acknowledgements
We thank all the patients enrolled in this study for their contribution. We are also grateful to Wei-Ming Wang, Wan-Ni Chen and Chih-Hui Hsu, statisticians from the Biostatistics Consulting Center of National Cheng Kung University Hospital, for providing statistical consultation and assistance. Parts of the data in this paper were presented in a poster discussion entitled “In- and out-patient COPD management” at the European Respiratory Society 2016 International Congress.
Author contributions: T-H. Huang: study design, acquisition of subjects, data analysis and interpretation, figures and tables, writing of manuscript. T-R. Hsiue, P-L. Su, X-M. Liao: acquisition of subjects, data analysis, and comments on data interpretation and figures. S-H. Lin: consultation and assistance on statistical analysis, computation, and interpretation. C-Z. Chen: study design and concept, analysis and interpretation of data, writing of manuscript.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: This study was supported by the grant NCKUH-10508003 from National Cheng Kung University Hospital. The sponsor had no role in the design, methods, subject recruitment, data collections, data analysis or preparation of the paper. Funding information for this article has been deposited with the Crossref Funder Registry.
Conflict of interest: None declared.
- Received December 14, 2016.
- Accepted January 31, 2018.
- Copyright ©ERS 2018