Abstract
Published guidelines recommend spirometry to accurately diagnose chronic obstructive pulmonary disease (COPD). However, even spirometry-based COPD prevalence estimates can vary widely. We compared properties of several spirometry-based COPD definitions using data from the international Burden of Obstructive Lung Disease (BOLD) study.
14 sites recruited population-based samples of adults aged ≥40 yrs. Procedures included standardised questionnaires and post-bronchodilator spirometry. 10,001 individuals provided usable data.
Use of the lower limit of normal (LLN) forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) ratio reduced the age-related increases in COPD prevalence that are seen among healthy never-smokers when using the fixed ratio criterion (FEV1/FVC <0.7) recommended by the Global Initiative for Chronic Obstructive Lung Disease. The added requirement of an FEV1 either <80% predicted or below the LLN further reduced age-related increases and also led to the least site-to-site variability in prevalence estimates after adjusting for potential confounders. Use of the FEV1/FEV6 ratio in place of the FEV1/FVC yielded similar prevalence estimates.
Use of the FEV1/FVC<LLN criterion instead of the FEV1/FVC <0.7 should minimise known age biases and better reflect clinically significant irreversible airflow limitation. Our study also supports the use of the FEV1/FEV6 as a practical substitute for the FEV1/FVC.
Although chronic obstructive pulmonary disease (COPD) is recognised as a major public health problem worldwide, estimates of its prevalence vary widely 1. Much of this variation probably reflects differences in the populations studied, spirometry methods and data quality control, and the rules used to define COPD. For example, self-reported physician diagnosis of COPD typically results in estimated prevalences well below those obtained based on spirometry 1, 2.
Although no gold standard definition of COPD exists, published guidelines recommend use of spirometry to define it 3, 4. However, even spirometry-based COPD prevalence estimates can vary by two-fold or more, depending on the definition used to classify mild disease 5, 6. The most widely used definition comes from the Global Initiative for Chronic Obstructive Lung Disease (GOLD), which recommends using a post-bronchodilator forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) ratio <0.7 to define irreversible airflow limitation, and the FEV1 to stage disease 3. This “fixed ratio” approach, while easy to apply, appears to overestimate COPD in older individuals 2, 7–10 and to underestimate it in young adults 9, 11. Alternative definitions that account for normal ageing can alleviate this bias 9, 12 but, in turn, this raises questions about which reference equations are appropriate for which populations. In addition, if pre- (rather than post-) bronchodilator spirometry is used, COPD prevalence may be overestimated by as much as 30% 8, 10, 13, 14.
The Burden of Obstructive Lung Disease (BOLD) study is an international effort to collect population-based estimates of the prevalence and economic burden of COPD using standardised methods 15, 16. Using BOLD study data, we examined the impact on prevalence estimates of using the fixed ratio criterion versus various other spirometry-based definitions of COPD. We also compared the effects of using central versus site-specific prediction equations and of using the FEV1/FEV6 ratio in place of FEV1/FVC.
METHODS AND MATERIALS
The design of the BOLD study is described in detail elsewhere 15, 16 and only summarised here. Participating entities in the BOLD Collaborative Research Group are listed in the online supplement.
Population
Participating sites were expected to recruit population-based samples of ≥600 noninstitutionalised adults aged ≥40 yrs. We report data from the first 14 BOLD sites (table 1), consisting of 10,001 individuals (93% of all responders) with acceptable post-bronchodilator spirometry. Each site obtained approval from local ethical committees and written informed consent from each participant.
Questionnaires
Questionnaires were administered by trained and certified staff and covered respiratory symptoms, smoking history, respiratory diagnoses and comorbidities. We defined pack-years of cigarette smoking exposure as average number of packs smoked per day (20 cigarettes per pack) multiplied by the number of years smoked. Never-smoking was defined as <20 packs of cigarettes in a lifetime.
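The smoking definitions above reduce to simple arithmetic. The following is a minimal sketch only; the function names and the choice of inputs (cigarettes per day, lifetime packs) are illustrative assumptions and do not reproduce the BOLD questionnaire coding.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Average packs per day (20 cigarettes per pack) times years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return (cigarettes_per_day / 20.0) * years_smoked


def is_never_smoker(lifetime_packs: float) -> bool:
    """Never-smoking: fewer than 20 packs of cigarettes in a lifetime."""
    return lifetime_packs < 20


# Hypothetical participant: 30 cigarettes per day for 20 years.
print(pack_years(30, 20))
```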
Site-specific prediction equations were developed using never-smokers who had never been told by a healthcare provider that they had emphysema, COPD or tuberculosis, and did not report a current diagnosis of asthma or chronic bronchitis. We were unable to restrict to asymptomatic never-smokers due to the extremely small numbers of (particularly male) never-smokers at some sites.
Height and weight
We measured height (to the nearest centimetre) with the participant standing on a firm, level surface that was perpendicular to the vertical board of the height measurement device (ideally a wall-mounted stadiometer). Participants were instructed to remove their shoes and stand erect with feet flat on the floor, heels together, and head in the horizontal (Frankfort) plane.
Sites used calibrated scales (preferably balance beam or digital) to measure weight to the nearest 0.1 kg. Participants were instructed to remove shoes, hats, coats, and heavy items in their pockets in order to be weighed in light indoor clothing.
Body mass index (BMI) was computed as weight over height-squared and expressed in units of kg·m−2.
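As a small illustration of the BMI calculation, the sketch below assumes height is recorded in centimetres (as measured) and converted to metres before squaring; the function name is hypothetical.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg per square metre from weight (kg) and height (cm)."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # height was measured to the nearest centimetre
    return weight_kg / height_m ** 2


# Hypothetical participant: 70.0 kg and 175 cm.
print(round(bmi(70.0, 175), 1))
```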
Spirometry
Lung function data were collected using the ndd EasyOne Spirometer (ndd Medical Technologies, Zurich, Switzerland), which was chosen for its portability and level of accuracy 17. Lung function was measured before and 15 min after administration of 200 μg of albuterol/salbutamol. Spirometry measures reported here include the FEV1, FEV6 and FVC, as well as the FEV1/FVC and FEV1/FEV6 ratios. FEV1 % predicted, although not reported separately, was used to stage COPD 3.
All spirograms were reviewed by the BOLD Pulmonary Function Reading Center and assigned an overall quality score based on standardised criteria 18. Local spirometry technicians were trained and certified, and received regular quality control feedback during data collection. Usable spirometry was defined as two or more acceptable blows, with FEV1 and FVC repeatability within 200 mL. Acceptable manoeuvres were defined as those with a rapid start (back-extrapolated volume <150 mL or <5% of the FVC), lack of a cough during the first second, and a small end-of-test volume (<40 mL during the final second). The calibration of all spirometers was verified to be accurate within 3.0% using a 3.00 L syringe at the beginning of each day of testing. Biological controls were not used.
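The acceptability and usability rules above can be expressed as simple checks. This is a sketch under stated assumptions, not the Reading Center's scoring procedure: the `Manoeuvre` record and its field names are hypothetical, and "repeatability within 200 mL" is interpreted here as the two largest values among acceptable blows differing by no more than 200 mL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Manoeuvre:
    fev1_ml: float
    fvc_ml: float
    back_extrapolated_ml: float
    cough_in_first_second: bool
    end_of_test_volume_ml: float  # volume exhaled during the final second


def is_acceptable(m: Manoeuvre) -> bool:
    """Rapid start, no cough in the first second, small end-of-test volume."""
    rapid_start = (m.back_extrapolated_ml < 150
                   or m.back_extrapolated_ml < 0.05 * m.fvc_ml)
    return (rapid_start
            and not m.cough_in_first_second
            and m.end_of_test_volume_ml < 40)


def _top_two_within(values: List[float], tolerance_ml: float = 200.0) -> bool:
    # Assumed reading of "repeatability": compare the two largest values.
    top = sorted(values, reverse=True)[:2]
    return len(top) == 2 and (top[0] - top[1]) <= tolerance_ml


def is_usable(session: List[Manoeuvre]) -> bool:
    """Two or more acceptable blows with repeatable FEV1 and FVC."""
    ok = [m for m in session if is_acceptable(m)]
    if len(ok) < 2:
        return False
    return (_top_two_within([m.fev1_ml for m in ok])
            and _top_two_within([m.fvc_ml for m in ok]))


def calibration_ok(measured_l: float, syringe_l: float = 3.00,
                   tolerance: float = 0.03) -> bool:
    """Daily check: measured volume within 3.0% of the 3.00 L syringe."""
    return abs(measured_l - syringe_l) <= tolerance * syringe_l
```

Keeping the per-manoeuvre check separate from the session-level check mirrors the two-step wording of the criteria.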
Definition of COPD
The BOLD study uses the GOLD criteria for defining and staging COPD 3, which are consistent with the 2004 American Thoracic Society (ATS)/European Respiratory Society (ERS) criteria 4 and define COPD as a post-bronchodilator FEV1/FVC <0.70. The FEV1 % pred is used to further stage disease (FEV1 ≥80% pred: stage 1; ≥50 and <80% pred: stage 2; ≥30 and <50% pred: stage 3; <30% pred: stage 4). The BOLD study also uses the prediction equations for Caucasian adult males and females derived from the Third US National Health and Nutrition Examination Survey (NHANES-III) 19 as its primary reference equations for all participants, although this paper also examined the impact of using equations derived from Norway's Hordaland County Respiratory Health Study 20, as well as site-specific prediction equations, in place of the NHANES-III equations.
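The staging rule described above can be written as a short function. This is an illustrative sketch: the function name is hypothetical, and FEV1 % pred is taken as an input rather than computed from a reference equation.

```python
from typing import Optional


def gold_stage(fev1_fvc: float, fev1_pct_pred: float) -> Optional[int]:
    """Return GOLD stage 1-4 from post-bronchodilator values, or None if FEV1/FVC >= 0.70."""
    if fev1_fvc >= 0.70:
        return None  # fixed ratio criterion not met
    if fev1_pct_pred >= 80:
        return 1
    if fev1_pct_pred >= 50:
        return 2
    if fev1_pct_pred >= 30:
        return 3
    return 4


# Hypothetical values: ratio 0.65 with FEV1 at 62% of predicted.
print(gold_stage(0.65, 62))
```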
In addition, we assessed the impact of restricting COPD to GOLD stages 2 or above, and of using the lower limit of normal (LLN) of the FEV1/FVC and of the FEV1 in place of the fixed ratio and FEV1 <80% pred criteria, respectively, in the GOLD definitions. Finally, we examined the impact of using FEV1/FEV6 in place of FEV1/FVC in our definitions. Table 2 summarises the various definitions of COPD assessed in this manuscript.
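To make the competing definitions concrete, the sketch below applies the five criteria compared in this paper to a single post-bronchodilator result. It is illustrative only: the `PostBD` record, its field names and the label strings are assumptions (the labels paraphrase the text and do not reproduce the table of definitions), and the predicted and LLN values are taken as inputs rather than derived from any particular reference equation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PostBD:
    ratio: float        # post-bronchodilator FEV1/FVC
    ratio_lln: float    # lower limit of normal for FEV1/FVC
    fev1_l: float       # measured FEV1 in litres
    fev1_pred_l: float  # predicted FEV1 in litres
    fev1_lln_l: float   # lower limit of normal for FEV1 in litres


def classify(s: PostBD) -> Dict[str, bool]:
    """Apply each of the five spirometric definitions to one result."""
    low_ratio_fixed = s.ratio < 0.70
    low_ratio_lln = s.ratio < s.ratio_lln
    low_fev1_pct = s.fev1_l < 0.80 * s.fev1_pred_l
    low_fev1_lln = s.fev1_l < s.fev1_lln_l
    return {
        "fixed ratio (GOLD stage 1+)": low_ratio_fixed,
        "GOLD stages 2-4": low_ratio_fixed and low_fev1_pct,
        "LLN (FEV1/FVC)": low_ratio_lln,
        "LLN (FEV1/FVC) and FEV1 <80% pred": low_ratio_lln and low_fev1_pct,
        "LLN (FEV1/FVC) and LLN (FEV1)": low_ratio_lln and low_fev1_lln,
    }


# Hypothetical older participant: ratio below 0.70 but above the LLN.
older = PostBD(ratio=0.68, ratio_lln=0.64, fev1_l=2.4,
               fev1_pred_l=2.6, fev1_lln_l=2.0)
for name, positive in classify(older).items():
    print(name, positive)
```

In this hypothetical case only the fixed ratio criterion is met, which is the pattern of discordance the comparison is designed to examine.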
Although the text focuses on post-bronchodilator spirometry, the results of comparable analyses based on pre-bronchodilator data are included in the online supplementary material.
Analysis
To provide comparability with earlier reports 16, the site-specific prevalences presented in figure 1 are population-based estimates reflecting sampling designs used at each site. For all other analyses, data are pooled across sites and presented as unweighted prevalences with standard errors accounting only for correlations within the site and, where applicable, for clustering in the sampling plan. Comparisons of the prevalence estimates in figures 1–3 and in table 3 were computed using McNemar's test.
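Because every participant is classified by every definition, comparisons between two definitions are paired, which is why McNemar's test applies. The sketch below shows the textbook chi-squared form without continuity correction; which variant was used in the analysis is not stated, the function name and the counts are hypothetical, and no adjustment for the sampling design is attempted.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int) -> Tuple[float, float]:
    """McNemar chi-squared statistic and p-value from the discordant counts.

    b: participants positive by definition A only
    c: participants positive by definition B only
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    stat = (b - c) ** 2 / n
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical discordant counts for two definitions.
stat, p_value = mcnemar(30, 10)
print(stat, p_value)
```

Only the discordant pairs enter the statistic; participants classified identically by both definitions carry no information about the difference.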
A desired characteristic of any prevalence estimator is that it gives comparable estimates in different populations after adjusting for known confounders. In order to compare the residual site-to-site variability associated with our various prevalence estimators, we report the Wald statistic for the “site” effect, as derived from logistic regression models that adjusted for age (40–49, 50–59, 60–69 and ≥70 yrs), sex, cigarette smoking history (never-smokers, 0–9, 10–19 and ≥20 pack-yrs), BMI (<20, 20–25, 25–30, 30–35 and >35 kg·m−2), years worked in a dusty job (0, 1–9 and ≥10 yrs) and interactions of sex with both age and smoking history. We also report Wald statistics for testing the significance of age in selected regression models. Where appropriate, we tested heterogeneity of age effects across strata using appropriate interaction terms. Under the null hypothesis of no effect, the Wald statistic will have an F-distribution with an expected value equal to one, and higher values indicate greater heterogeneity across subgroups. All Wald tests are adjusted for clustering in the sampling plan.
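For orientation, the site-effect statistic can be sketched in its standard form. The notation is introduced here for illustration and is an assumption: the vector of site coefficients, its estimated covariance matrix and the count q = 13 (14 sites entered as indicators against a reference site) do not appear in the original text, and the additional design-based adjustment for clustering is not shown.

```latex
W = \hat{\beta}_{\text{site}}^{\top}\,\hat{V}_{\text{site}}^{-1}\,\hat{\beta}_{\text{site}},
\qquad
F = \frac{W}{q},
\qquad
q = 14 - 1 = 13 .
```

Under the null hypothesis of no site effect, W is approximately chi-squared with q degrees of freedom, so dividing by q gives a statistic with expectation close to one, consistent with the interpretation given above.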
All analyses were conducted using Stata version 9.2 (Stata Corp., College Station, TX, USA).
RESULTS
Participants exhibited marked differences in smoking patterns across sites and between sexes within sites (table 4). BOLD sites also differ markedly in prevalences of occupational and other potential COPD risk factors 16.
Use of the fixed ratio criterion (GOLD stage 1 and higher) produced overall population prevalence estimates that, for each site, were significantly greater than those for each of the other estimators (all but one p<0.0001). The fixed ratio estimates were generally 5–11 percentage points higher than those for GOLD stages 2–4 (fig. 1a). The LLN (FEV1/FVC) criterion produced estimates that tended to be intermediate to these two GOLD-based definitions, although generally closer to the GOLD stages 2–4 criterion than to the fixed ratio criterion. The added requirement of an FEV1 <80% pred and an FEV1/FVC ratio below the LLN resulted in estimates that were 1–3 percentage points lower than estimates for GOLD stages 2–4. Finally, use of FEV1<LLN in place of FEV1 <80% pred in this latter definition further reduced estimates (although generally by less than one percentage point). These patterns were generally consistent across sites.
Regardless of the definition used, we observed sizable site-to-site variation in prevalence estimates (fig. 1b). After adjusting for potential confounders, site-to-site variance in COPD prevalence (as measured by the Wald statistic) ranged from 7.1 to 8.6 and was lowest (7.1 and 7.3, respectively) using the “LLN (FEV1/FVC) and LLN (FEV1)” and “LLN (FEV1/FVC) and FEV1 <80% pred” criteria. These Wald statistics all indicated highly statistically significant (p<0.0001) residual site-to-site variability in prevalence estimates.
All prevalences reported in figure 1 were lower than they would have been had we based them on pre-bronchodilator measurements (see online supplementary material). For the fixed ratio criterion, absolute declines between pre- and post-bronchodilator values ranged from 1 to 11 percentage points across centres, while using GOLD stages 2–4 instead of the fixed ratio criterion led to a decline in prevalence ranging from 1 to 6 percentage points across centres. On a relative basis, prevalence estimates declined by 25–29% (depending on the definition used) across the five measures in going from pre- to post-bronchodilator measurements.
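The distinction between absolute declines (in percentage points) and relative declines (in per cent) can be illustrated with a short calculation. The function name and the prevalence values below are hypothetical and are not BOLD results.

```python
from typing import Tuple


def decline(pre_pct: float, post_pct: float) -> Tuple[float, float]:
    """Return (absolute decline in percentage points, relative decline in %)."""
    if pre_pct <= 0:
        raise ValueError("pre-bronchodilator prevalence must be positive")
    absolute = pre_pct - post_pct
    relative = 100.0 * absolute / pre_pct
    return absolute, relative


# Hypothetical prevalences: 20% pre-bronchodilator, 15% post-bronchodilator.
print(decline(20.0, 15.0))
```

The same absolute decline corresponds to a larger relative decline when the starting prevalence is lower, which is why the two summaries are reported separately.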
The prevalence of “COPD” per the fixed ratio criterion increased sharply with age even among healthy never-smokers (fig. 2), a population in which COPD is expected to be rare. By contrast, for the other measures we observed much more muted increases with age and, except for the LLN (ratio) criterion for the lowest age group (p = 0.14), the fixed ratio prevalence estimates were all significantly greater than those for each of the other estimators (p<0.0001). These age-related increases in prevalence were lowest for the “LLN (FEV1/FVC) and LLN (FEV1)” and “LLN (FEV1/FVC) and FEV1 <80% pred” criteria, for which the age-specific prevalence estimates varied from 2% among 40 yr olds, to 4–5% among those aged ≥70 yrs. We observed the same general patterns within each site (data not shown).
The Wald statistic for testing for age effects in figure 2 dropped from a high of 62.6 for the fixed ratio criterion to 24.5 for GOLD stages 2–4, to ∼6.6 for the two analogues of these criteria in which FEV1/FVC <0.7 is replaced by FEV1/FVC<LLN, and to 3.4 for the “LLN (FEV1/FVC) and LLN (FEV1)” criteria. All were statistically significant. We found modest evidence of a statistically significant sex-age interaction using the fixed ratio criterion (Wald statistic 3.1, p = 0.027) and no evidence of statistically significant sex-age interactions in these healthy never-smokers using any of the other prevalence estimators.
When we assessed site differences in the group of healthy, never-smoking individuals, we observed smaller site differences for the GOLD stages 2–4 criterion (Wald statistic 1.6) than for the LLN (FEV1/FVC) criterion (Wald statistic 2.9), although once again the smallest site differences were seen for the “LLN (FEV1/FVC) and LLN (FEV1)” and “LLN (FEV1/FVC) and FEV1 <80% pred” criteria (Wald statistic 0.9–1.1). Indeed, for both of these latter criteria, the site differences did not come close to reaching statistical significance (p>0.35), whereas for the other three criteria the p-values were all less than 0.07.
Figure 3 illustrates the impact on prevalence of using a single common prediction equation (the US NHANES-III Caucasian equations or the Hordaland County Respiratory Health Study equations) versus site-specific prediction equations. For both males and females, the estimated GOLD stage 2–4 prevalences were higher (by 2–3 percentage points overall; p<0.0001) when using common reference equations for all sites (NHANES-III and Hordaland County) than when using local prediction equations. The NHANES-III and Hordaland County prevalence estimates were generally similar, although they differed significantly overall and for the oldest age group. The Wald statistic for site differences computed from the site-specific equations (4.7) was less than the Wald statistic for the NHANES (9.5) and Hordaland County (8.4) equations, although all were highly statistically significant (p<0.0001). We observed similar patterns when we replaced the GOLD stage 2–4 criterion with the LLN (ratio) and FEV1 <80% pred criterion (data not shown), although the Wald statistics were closer (6.5 versus 7.3 and 8.9).
Finally, the use of the FEV1/FEV6 in place of the FEV1/FVC when using the “LLN (FEV1/FVC) and FEV1 <80% pred” criterion had little clinically relevant impact on prevalence estimates, whether computed overall, by age or smoking history categories, or by site (table 3). When we did observe statistically significant differences, the prevalences were generally smaller for the FEV1/FEV6-based criterion.
DISCUSSION
This analysis of data from the BOLD study confirmed previously reported limitations associated with the use of the fixed ratio criterion to define COPD. Adjusting the FEV1/FVC for normative ageing effects appears to reduce the rate of false-positive diagnoses that has been reported for older individuals 2, 7–10, and the added requirement of a low FEV1 further reduced the age-related increases in COPD prevalence seen among healthy never-smokers.
A strength of this analysis is that data were gathered using a standardised approach from a wide range of populations, with close attention paid to spirometry quality control. The qualitative similarity of results across sites (fig. 1a) provides strong evidence for the robustness of our findings. The wide variation in characteristics of BOLD sites enabled us to use site-to-site variation in prevalence (assessed using the Wald statistic) as a convenient metric for comparing alternative measures of COPD prevalence, since a desired characteristic of any prevalence estimate is that it yields comparable estimates in different populations after adjusting for known risk factors.
An obvious limitation of this analysis is the lack of a gold standard against which to assess our putative definitions of COPD (indeed, a more accurate descriptor of what we are measuring may simply be chronic airflow limitation). Nonetheless, it is possible to evaluate how alternative definitions perform in individuals who have a low a priori probability of disease. Our results confirm previous reports that the fixed ratio criterion lacks specificity and that, as age increases, it increasingly misclassifies apparently healthy never-smokers as having COPD 2, 7–10, 12. This pattern of (apparent) misclassification with increasing age was greatly muted by adding the requirement that the FEV1 % pred be below a defined threshold, or by replacing the fixed ratio criterion with a criterion that the FEV1/FVC be below the LLN (fig. 2). However, only the method requiring both an FEV1/FVC below the LLN and a low FEV1 (measured as either FEV1<LLN or FEV1 <80% pred) largely eliminated this age-related increase.
The upward trend that still persists in figure 2 even with our “best” definitions of COPD may reflect the fact that our “healthy” never-smokers did include some individuals with symptoms. As noted below, this was a pragmatic decision due to the limited number of never-smokers at some sites. The fact that the NHANES-III prediction equations were fit to a cohort whose upper age limit was 80 yrs also may create an upward bias for very old individuals that helps explain the upward drift in figure 2. However, <4% of the BOLD cohort were aged ≥80 yrs; in addition, the NHANES-III prediction equations for FEV1 include an age-squared term and so allow for accelerated ageing effects.
Notably, the recent ATS/ERS statement recommends using the LLN of the FEV1/FVC in place of the fixed ratio criterion to diagnose airflow obstruction 21; a recent paper by Swanney et al. 12, albeit using pre-bronchodilator spirometry, also supports this recommendation. Use of both an FEV1/FVC below the LLN and a low FEV1 was consistently associated with low site-to-site and age-related variation relative to other measures, after adjusting for known risk factors. Assuming that variability about the prediction equations is stable, using the LLN as a threshold for defining low FEV1 should produce less misclassification 22, although in practice these two measures performed similarly.
The results of our study also add to the evidence suggesting that, without both a low FEV1/FVC and a low FEV1, confidence is low that a true lung function abnormality (or airway disease) exists. The current GOLD stage 1 classification was based solely on expert opinion, not on evidence of airway disease or subsequent rapid loss of lung function. Patients with GOLD stage 1 do not have reduced exercise capacity 23. Among Lung Health Study participants, a rapid fall in FEV1 was not seen when baseline FEV1 was >70% pred 24.
Apart from the fixed ratio criterion, the competing measures we evaluated all require use of prediction equations. One of the purported benefits of the fixed ratio criterion is that it does not rely on such equations. However, as Swanney et al. 12 note, this easy-to-apply definition is only valid at age ∼50 yrs. In addition, the fixed ratio criterion is not necessarily easier to use in practice, since even inexpensive pocket spirometers include a microprocessor that calculates the appropriate LLN for FEV1/FVC, FEV1/FEV6 and FEV1. Lastly, even GOLD relies on prediction equations to stage disease, so any advantage of the fixed ratio in terms of its simplicity disappears as soon as one looks at clinically relevant impairment (nominally GOLD stage 2 or higher). The only way to overcome the limitations of the current fixed ratio criterion while still avoiding the need for prediction equations would be to establish a series of separate fixed ratio cut points for different ages.
The question then arises, what is a suitable prediction equation for any given population, and what if normative prediction equations do not exist for that population? While the documented variability in lung function that exists among “healthy” never-smokers in different racial groups may reflect, at least in part, true genetic differences in these populations, it also may represent the cumulative effect of environmental exposures, including childhood factors. For this reason, BOLD chose to use a single set of sex-specific prediction equations for all subjects in all sites. We chose the US NHANES-III equations for Caucasian adults because they were derived from a large study conducted in a diverse population with rigorous attention to quality control. We observed similar prevalence estimates using equations derived from Norway's Hordaland County Respiratory Health Study 20 in place of the NHANES-III equations.
The PLATINO study (Latin American Project for the Investigation of Obstructive Lung Disease), conducted in five Latin American countries using methods similar to those of the BOLD study, used site-specific prediction equations 25. In BOLD, the use of local prediction equations led to prevalence estimates 2–3 percentage points lower, on average, than those based on a single, common equation. Whether this means that the BOLD prevalence estimates overestimate the “true” prevalence, or the local equations underestimate it, we cannot say, but on balance we prefer to maintain the site-to-site variation and see if it can be explained by other risk factors. Because our local equations were fitted to individuals aged ≥40 yrs, while the NHANES-III equations were fitted to adults aged ≥18 yrs, the former may better describe the accelerated ageing that is known to occur in healthy adults. Also, we included symptomatic individuals in our prediction equations as long as they did not report diagnosed disease, whereas the NHANES-III equations required individuals to be asymptomatic. Since there can be large discrepancies between prediction equations based on individuals with and without major respiratory symptoms 26, this may also help to explain the somewhat lower prevalence estimates obtained with the site-specific equations. One final consideration relating to the use of site-specific prediction equations, particularly if reliable normative equations for that population do not exist, is that the resulting estimates may be highly variable owing to limited sample sizes. For instance, despite relatively large sample sizes from each site, the number of healthy never-smokers available to build our prediction equations was very limited in some sites due to extremely high rates of ever having smoked.
Considerable attention is now being paid to the use of the FEV1/FEV6 as an alternative to the FEV1/FVC, particularly in older, less healthy populations for whom achievement of a high quality, reproducible FVC may be problematic 27. Several studies have shown that the FEV1/FEV6, for which reliable reference equations exist 19, is a more reproducible measure than is the FEV1/FVC 28, 29, and predicts subsequent lung function decline about as well as the FEV1/FVC 30. Our results (table 3) show that using the FEV1/FEV6 in place of the FEV1/FVC in our definition of “LLN (FEV1/FVC) and FEV1 <80% pred” yields very similar prevalence estimates, thus further supporting the use of this alternative measure in future studies of COPD prevalence. Once an obstructive lung disease has been diagnosed, however, changes in FEV1 should be used to follow disease progression or treatment responses.
Finally, our observation that use of pre-bronchodilator spirometry results in consistently inflated estimates of chronic airflow obstruction, regardless of the definition used, further emphasises the need for using post-bronchodilator spirometry to classify COPD 14. Our finding that prevalence estimates dropped, on average, ∼25% when using post-bronchodilator spirometry is generally consistent with other reports 8, 10, 13. Although we recognise that well-assessed, normal pre-bronchodilator spirometry has high negative predictive value even in the absence of post-bronchodilator testing, its use is associated with the more serious risk of increased false-positive diagnoses.
In summary, data from the BOLD study confirm previous reports of misclassification using the fixed ratio criterion to measure COPD. As an alternative, we recommend a definition based on an FEV1/FVC ratio less than the LLN, and an FEV1 either <80% pred, or below the LLN. This modification of the current GOLD stage 2 severity threshold appears to better account for known ageing effects in healthy never-smokers. While this new definition will likely miss many individuals with mild COPD, it should capture most individuals with clinically significant disease, while minimising the risk of false-positive diagnoses. Finally, substitution of the FEV1/FEV6 in place of the FEV1/FVC in this definition appears to yield similar prevalence estimates and, based on previous reports, may be a more reproducible and practical measure.
Statement of interest
Statements of interest for W.M. Vollmer and P.L. Enright can be found at www.erj.ersjournals.com/misc/statements.dtl. The BOLD initiative has been funded in part by unrestricted educational grants to the Operations Center. For full details see the online supplementary material available from www.erj.ersjournals.com.
Acknowledgments
The BOLD study group wishes to acknowledge the contributions of G. Harnoncourt (ndd Medizintechnik AG, Zurich, Switzerland) and P.L. Enright (The University of Arizona, Tucson, AZ, USA) for their assistance with spirometry training and quality control during the study.
Footnotes
- For editorial comments see page 527. This article has online supplementary material available from www.erj.ersjournals.com
- Received October 30, 2008.
- Accepted April 27, 2009.
- © ERS Journals Ltd