Abstract
The objective of this study was to redesign the current grading of obstructive lung disease so that it is clinically relevant and free of biases related to age, height, sex and ethnic group.
Spirometric records from 17 880 subjects (50.4% female) from hospitals in Australia and Poland, and 21 191 records (53.0% female) from two epidemiological studies (age range 18–95 years) were analysed. We adopted the American Thoracic Society (ATS)/European Respiratory Society (ERS) criteria for airways obstruction based on a forced expiratory volume in 1 s (FEV1)/(forced) vital capacity ((F)VC) ratio below the fifth percentile and graded the severity of pulmonary function impairment using z-scores for FEV1, which signify how many standard deviations a result is from the mean predicted value.
Using the lower limit of normal for FEV1/(F)VC and z-scores for FEV1 of -2, -2.5, -3 and -4 to delineate severity grades of airflow limitation leads to close agreement with ATS/ERS severity classifications and removes age, sex and height related bias.
The new classification system is simple, easily memorised and clinically valid. It retains previously established associations with clinical outcomes and avoids biases due to the use of per cent predicted FEV1. Combined with the Global Lung Function prediction equations it provides a worldwide diagnostic standard, free of bias due to age, height, sex and ethnic group.
Using FEV1 z-scores to classify severity of airways obstruction is clinically valid and overcomes bias inherent in % pred http://ow.ly/pRyGI
Introduction
It is accepted that a diagnosis of airflow limitation should be based on an abnormally low ratio of the forced expiratory volume in 1 s (FEV1) and vital capacity (expressed as the larger of inspiratory or expiratory vital capacity (VC), or forced vital capacity (FVC)). Surprisingly, there is no universal agreement on what constitutes a low FEV1/(F)VC ratio. The American Thoracic Society (ATS)/European Respiratory Society (ERS) recommend the fifth percentile of the distribution in a population of healthy lifelong nonsmokers as the lower limit of normal (LLN) [1–3], whereas the Global Initiative for Chronic Obstructive Lung Disease (GOLD) [4] recommend a post-bronchodilator ratio of <0.70 as indicative of persistent airways obstruction. The latter lower limit was arbitrarily selected because of its simplicity, and because it constituted a limit that would not vary with the prediction equation used. Many investigators have documented that the GOLD criterion for defining airway obstruction can result in marked overestimation of the prevalence of airway obstruction in subjects aged ≥45 years, which may lead to unnecessary medical expenditure, and in underestimation in younger adults [5]. This is because using a fixed ratio of 0.70 ignores both the natural decline of the FEV1/(F)VC ratio with age and the sex differences observed in a normal population. The limitations of this approach and the resultant misdiagnoses have been acknowledged by the GOLD group [4]. A standardised approach to interpretation of spirometry utilising the LLN for FEV1/(F)VC provides a statistically and scientifically more valid approach to defining abnormality in the FEV1/(F)VC ratio, with consequent reductions in misdiagnoses of pathological airways obstruction and earlier detection of mild disease.
Once a diagnosis of airways obstruction has been made, spirometry values are commonly used to categorise pulmonary function impairment and the ATS/ERS recommended approach is to use the FEV1 as a percentage of the predicted value (FEV1 % pred) as the basis for this classification [3]. Current ATS/ERS recommendations define an FEV1 % pred of >70% as mild impairment, 60–69% as moderate impairment, 50–59% as moderately severe impairment, 35–49% as severe impairment, and <35% as very severe impairment [3]. This approach uses an arbitrary number of severity categories with arbitrarily chosen cut-off values. Moreover, the use of per cent predicted in this way leads to a pronounced age-related bias. For example, the LLN for FEV1 in elderly healthy subjects and in preschool children may be as low as 65% pred and 72% pred, respectively [6, 7]. The result is that a given severity category corresponds to differing degrees of abnormality in different age groups. A more valid approach would be to take into consideration the underlying distribution of normal lung function data when classifying test results into severity categories. The z-score (signifying the number of standard deviations a result is from the mean predicted value) provides a metric for achieving this. A similar approach has previously been shown to provide effective risk stratification for death from respiratory and vascular causes [8].
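The ATS/ERS scale quoted above amounts to a simple lookup on FEV1 % pred. The following Python sketch is illustrative only: the function name is hypothetical, and the treatment of values that fall between the published integer ranges (for example exactly 70%, or 69.5%) is an assumption, since the recommendations as quoted leave them unassigned.

```python
def ats_ers_grade(fev1_pct_pred: float) -> str:
    """Severity category from FEV1 % pred, once obstruction has been diagnosed.

    Assumption: values not covered by the quoted ranges (e.g. exactly 70,
    or 69.5) are assigned to the category whose range lies just below them.
    """
    if fev1_pct_pred > 70:
        return "mild"
    if fev1_pct_pred >= 60:
        return "moderate"
    if fev1_pct_pred >= 50:
        return "moderately severe"
    if fev1_pct_pred >= 35:
        return "severe"
    return "very severe"


# Illustrative values only; these are not study data.
for value in (82, 65, 55, 40, 30):
    print(value, ats_ers_grade(value))
```

Because the thresholds are fixed percentages, the same category is returned regardless of age, which is the source of the bias discussed in this paper.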
The objective of the present study is to evaluate an alternative to the ATS/ERS recommendations for grading respiratory impairment to overcome biases related to age, height and sex.
Materials and methods
Materials
We used adult data from two population-based studies: the Health Survey for England (HSE) comprising 9961 subjects (53.8% female, age range 18–93 years) [9] and the National Health and Nutrition Examination Survey (NHANES III) comprising 11 230 subjects (52.2% female, age range 18–90 years) [10]. All data from these epidemiological studies were included in the analysis, and not just those data used for the derivation of normal predicted values. We also utilised data from three clinical populations: 2776 records from the Austin Hospital in Victoria, Australia (49.8% female, age range 18–95 years); 4258 from the John Hunter Hospital in New South Wales, Australia (53.8% female, age range 18–95 years); and 10 846 from the National Research Institute of TB and Lung Diseases (TLDRI) in Warsaw, Poland (47.8% female, age range 18–92 years). The clinical data comprised consecutively collected test results from patients referred for lung function assessment for the typical wide range of clinical purposes seen in large hospital-based clinical laboratories. Tests were conducted between August 2008 and June 2012 (John Hunter Hospital), January 2001 and May 2012 (Austin Hospital), and April 2009 and June 2012 (TLDRI). All spirometry was performed in accordance with internationally agreed standards applicable at the time of data collection [11, 12], and only baseline or pre-bronchodilator status data were included in the analysis.
All analyses were limited to those aged ≥18 years and to those of European descent, because there were few non-European data in the clinical dataset. This study is a retrospective analysis of de-identified data, obviating the need for approval from local Ethics Committees.
Methods
Predicted values and z-scores were derived for each subject in each dataset using prediction equations from the Global Lung Function Initiative (GLI-2012) [7], using specially developed software (GLI-2012 Data Conversion software; www.lungfunction.org/files/InstallGLI2012_DataConversion.EXE). The z-score indicates the number of standard deviations that a measurement differs from the mean predicted value. Airway obstruction was diagnosed using the ATS/ERS definition [3] of FEV1/(F)VC <LLN, i.e. the z-score was < -1.645 [1–3, 6, 7]. Categorisation of the severity of airway obstruction was also made using the five category scale recommended by ATS/ERS [3].
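The cut-off of -1.645 is the fifth percentile of a standard normal distribution. As a minimal sketch (not the GLI-2012 software, whose z-scores come from its own prediction equations), the threshold and the resulting obstruction test can be written in Python as follows; the function and constant names are hypothetical.

```python
from statistics import NormalDist

# Fifth percentile of the standard normal distribution (about -1.645).
LLN_Z = NormalDist().inv_cdf(0.05)


def is_obstructed(fev1_fvc_z: float) -> bool:
    """True if the FEV1/(F)VC z-score lies below the lower limit of normal."""
    return fev1_fvc_z < LLN_Z


print(round(LLN_Z, 3))
print(is_obstructed(-1.9), is_obstructed(-1.2))
```

The z-score passed to `is_obstructed` is assumed to have been computed beforehand, for example by the GLI-2012 conversion software mentioned above.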
Data analysis was performed using the statistical software R (Version 3.0.1; The R Project for Statistical Computing; www.r-project.org). As the relationship between the z-score for FEV1 and FEV1 % pred is symmetrically heteroscedastic (fig. 1), the mean z-score was derived at ratios of 0.8, 0.7, 0.6, 0.5, 0.4, 0.35 and 0.30, and subsequently described by linear regression. Differences between the ATS/ERS recommended scale and new grading systems were tested using two-sided t-tests and by multinomial logistic regression using sex, age and height categories as covariates.
Figure 1. Relationship between the z-score for forced expiratory volume in 1 s (FEV1) and FEV1 as percentage of the predicted value, with subjects stratified by age group; values above zero z-score and 100% pred fan out in a symmetrical way.
Results
Table 1 shows the age distribution of the subjects in the five study groups. As expected, the prevalence rate of airways obstruction is much higher in clinical patients than in the population-based samples (fig. 2).
Figure 2. Percentage of subjects with airways obstruction for the clinical datasets and population-based surveys using the American Thoracic Society/European Respiratory Society definition of forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) less than the lower limit of normal (solid lines) and using the Global Initiative for Chronic Obstructive Lung Disease definition of FEV1/FVC <0.70 (dotted lines). The grey line depicts the percentage of healthy nonsmokers with FEV1/FVC <0.70 derived from the population-based surveys.
There is a strong linear relationship between FEV1 % pred and FEV1 expressed as a z-score (fig. 1) across the FEV1 30–80% pred range (z-score FEV1=5.589×FEV1 % pred−5.894, with FEV1 % pred expressed as a fraction of the predicted value, e.g. 0.70; adjusted R²=0.9997). This allows reliable reconstruction of ATS/ERS severity classification cut-off values by simple replacement of FEV1 % pred with z-scores. After accounting for differences in age there were clinically trivial, albeit statistically significant, differences in intercept: -0.0048 in centre 2 (John Hunter Hospital) and 0.0024 in centre 3 (TLDRI).
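To illustrate how the regression translates the ATS/ERS boundaries into z-scores, the short Python sketch below evaluates the equation at 70%, 60%, 50% and 35% pred. It assumes that FEV1 % pred enters the equation as a fraction (0.70 rather than 70), which is inferred from the ratios listed in the Methods and is the only scaling that gives plausible z-scores; the function name is hypothetical.

```python
SLOPE = 5.589       # regression coefficient reported in the text
INTERCEPT = -5.894  # regression intercept reported in the text


def z_from_fraction_pred(fraction_pred: float) -> float:
    """Mean FEV1 z-score for FEV1 given as a fraction of predicted (assumed scaling)."""
    return SLOPE * fraction_pred + INTERCEPT


for pct in (70, 60, 50, 35):
    z = z_from_fraction_pred(pct / 100)
    print(f"{pct}% pred -> z = {z:.2f}")
```

The computed values are approximately -1.98, -2.54, -3.10 and -3.94, which round to the cut-offs of -2, -2.5, -3 and -4.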
Table 2 shows the z-score equivalencies to the ATS/ERS schema. The z-scores have been rounded for simplification since it is cumbersome to use z-scores to precise values and, given that the category boundaries were arbitrarily defined originally [3], this rounding will have little overall effect. Z-scores of -2, -2.5, -3 and -4 can be used as cut-off values that faithfully reproduce the ATS/ERS severity categories.
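A minimal Python sketch of the resulting grading rule is given below. The category names follow the ATS/ERS scale; the function name is hypothetical, and assigning a z-score that falls exactly on a cut-off to the milder category is an assumption, chosen to agree with the definition of severe obstruction as a z-score below -3 used elsewhere in this paper.

```python
def z_score_grade(fev1_z: float) -> str:
    """Severity category from the FEV1 z-score, once obstruction has been diagnosed.

    Assumption: a value exactly on a cut-off belongs to the milder category.
    """
    if fev1_z >= -2.0:
        return "mild"
    if fev1_z >= -2.5:
        return "moderate"
    if fev1_z >= -3.0:
        return "moderately severe"
    if fev1_z >= -4.0:
        return "severe"
    return "very severe"


# Illustrative values only; these are not study data.
for z in (-1.8, -2.2, -2.7, -3.5, -4.6):
    print(z, z_score_grade(z))
```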
FEV1 % pred correlated negatively (R=0.29; p<0.001) with age and sex (higher FEV1 % pred in females). The relationship between FEV1 % pred and z-score for FEV1 was significantly affected by age (fig. 1) and sex (R=0.988) (data not shown). At an FEV1 of 70% pred the z-score for FEV1 varied between -1.53 and -2.59; at a z-score for FEV1 of -2 FEV1 % pred varied between 68% pred and 81% pred (fig. 1). In patients with an FEV1/FVC ratio <LLN the average ATS/ERS grade of 2.79 differed significantly from the 2.69 average grade with the new system (two-sided t-test, p=0.0006). Multinomial logistic regression revealed that allocation to more severe grades of airways obstruction using the ATS/ERS recommendations compared with the new allocation system was significantly related to age, sex and height.
There were 2097 (11.6%) clinical patients classified as having severe or very severe airflow limitation using the GOLD definition of FEV1/FVC <0.70 and FEV1 <50% pred [4]. The proposed new system classified a similar number as having severe or very severe airflow limitation (n=2079, 11.5%); however, in 624 (29.8%) patients the GOLD and proposed criteria lead to a different classification. The GOLD classification system underestimated severity of airways obstruction in 303 of these patients and overestimated severity in 321. Figure 3 shows the prevalence of severe and very severe airway obstruction using these two classification systems as related to age.
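The comparison between the two definitions of severe or very severe airflow limitation can be sketched as follows. This Python example is illustrative: the function names are hypothetical and the three patients are invented to show one concordant and two discordant cases; they are not drawn from the study data.

```python
LLN_Z = -1.645  # z-score cut-off for the lower limit of normal


def gold_severe(fev1_fvc: float, fev1_pct_pred: float) -> bool:
    """GOLD: FEV1/FVC < 0.70 and FEV1 < 50% pred."""
    return fev1_fvc < 0.70 and fev1_pct_pred < 50


def proposed_severe(fev1_fvc_z: float, fev1_z: float) -> bool:
    """Proposed: FEV1/FVC below the LLN and FEV1 z-score < -3."""
    return fev1_fvc_z < LLN_Z and fev1_z < -3


def count_discordant(patients):
    """Return (severe by GOLD only, severe by the proposed system only)."""
    gold_only = proposed_only = 0
    for p in patients:
        g = gold_severe(p["fev1_fvc"], p["fev1_pct_pred"])
        n = proposed_severe(p["fev1_fvc_z"], p["fev1_z"])
        if g and not n:
            gold_only += 1
        elif n and not g:
            proposed_only += 1
    return gold_only, proposed_only


# Hypothetical patients (invented values, for illustration only).
patients = [
    {"fev1_fvc": 0.55, "fev1_pct_pred": 52, "fev1_fvc_z": -3.5, "fev1_z": -3.6},
    {"fev1_fvc": 0.58, "fev1_pct_pred": 48, "fev1_fvc_z": -2.0, "fev1_z": -2.6},
    {"fev1_fvc": 0.40, "fev1_pct_pred": 30, "fev1_fvc_z": -4.5, "fev1_z": -4.8},
]
print(count_discordant(patients))
```

In this invented example the first patient is graded as severe only by the proposed system and the second only by GOLD, mirroring the two directions of disagreement reported above.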
Figure 3. Percentage of subjects with severe or very severe airways obstruction for the clinical dataset using the proposed classification system (forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) < lower limit of normal and z-score for FEV1 < -3) (solid line) or the Global Initiative for Chronic Obstructive Lung Disease definition (FEV1/FVC <0.70 and FEV1 <50% pred) (dotted line).
Table 3 indicates the small differences in severity classification between the ATS/ERS grading system and the new z-score based grading system. The overall prevalence of airways obstruction is identical with the old and new systems since they both utilise FEV1/FVC <LLN to define abnormality.
Discussion
The primary objective of this study was to design a statistically robust system for detecting airways obstruction and for grading the extent of impairment. Our analysis reveals that the ATS/ERS recommended system of using FEV1 % pred to grade respiratory impairment is prone to biases caused by age, height and sex differences in the underlying distribution of lung function. This can be circumvented by the use of z-scores, since they fully account for these biases, and also for bias that may be introduced by ethnicity factors. The bias due to sex and height, although statistically highly significant, is quite small because the coefficient of variation in males and females is virtually the same [7]. To illustrate this point, a 30-year-old 170 cm subject with a z-score for FEV1 of -2 has an FEV1 of 75.9% pred and 75.4% pred for a male and a female, respectively; at age 75 years the corresponding values are 65.1% pred and 64.3% pred, respectively.
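The worked example above can be used to back-calculate the approximate coefficient of variation implied at each age. The Python sketch below relies on a simplifying assumption: that FEV1 is normally distributed around the predicted value, so that z = (fraction of predicted - 1)/CV. The GLI-2012 equations use the LMS method, which also allows for skewness, so the figures are approximations only; the function name is hypothetical.

```python
def implied_cv(fev1_pct_pred: float, z: float) -> float:
    """Coefficient of variation implied by a (% pred, z-score) pair.

    Assumption: normal distribution around the predicted value (no skewness).
    """
    return (fev1_pct_pred / 100 - 1) / z


# (age, sex, FEV1 % pred at z = -2), taken from the example in the text.
examples = [
    (30, "male", 75.9),
    (30, "female", 75.4),
    (75, "male", 65.1),
    (75, "female", 64.3),
]
for age, sex, pct in examples:
    print(f"age {age}, {sex}: CV = {implied_cv(pct, -2):.3f}")
```

Under this approximation the implied coefficient of variation is about 0.12 at age 30 years and about 0.17–0.18 at age 75 years in both sexes, consistent with a small sex effect and a substantial age effect.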
The age bias has created recent discussion and controversy about the criteria used to define airways obstruction [5, 13–21]. The ATS and ERS advocate use of the fifth percentile of FEV1/FVC from a healthy population as the LLN [3], whereas the GOLD group adopted a fixed 0.70 cut-off value for the FEV1/FVC ratio because of its simplicity and independence of different prediction equations [4]. The GOLD criterion for airway obstruction, by ignoring sex differences and the natural decline of the FEV1/FVC ratio with age to well below 0.70, leads to a >50% overestimation of the prevalence rate of airway obstruction in subjects aged ≥45 years and to significant underestimates in younger adults [5]. These findings are corroborated in the present study (fig. 2) and have been implicitly acknowledged by the GOLD group [4]. On that account we adopted the LLN for FEV1/(F)VC as the gold standard for diagnosing airways obstruction. It is important to note that this study is not about diagnosing clinical syndromes, such as asthma or chronic obstructive pulmonary disease (COPD) (which rely on clinical assessment in addition to spirometric measurement), but about providing a statistically valid system for interpretation of spirometry.
There is obviously some overlap in FEV1/FVC ratio between those with and those without respiratory disease, and spirometry results cannot be used to categorically separate ill from healthy individuals. How test results are interpreted is clearly important and the choice between ATS/ERS and GOLD guidelines for this requires some justification. One key difference between the two guidelines is that the GOLD criteria identify a higher incidence of airflow obstruction in elderly subjects. One would expect that subjects in this “marginal zone” (with FEV1/FVC <0.70 but >LLN) would develop clinical symptoms and signs of disease, and follow-up studies have shed light on this clinically relevant issue. In longitudinal studies of subjects in this marginal zone no association has been found with increased risk of all-cause mortality [13–15, 22] (except in symptomatic smokers [16]), development of respiratory symptoms [16], accelerated decline in FEV1 [13, 16–18], or with respiratory care utilisation or poorer quality of life scores compared with a reference group [16]. In a review of the literature, Mohamed Hoesein et al. [23] could only find one publication by Mannino et al. [24] allegedly demonstrating that subjects with an FEV1/(F)VC ratio <0.70 but >LLN had an increased risk of premature death and hospitalisation for COPD, and on that basis accepted that GOLD grade I does represent respiratory disease. However, Mannino et al. [24] misrepresented their own findings: the adjusted hazard ratio for death of 1.1 was not elevated given the confidence interval for this value was 0.96–1.3. Also, the authors conceded that the “measure of COPD-related hospitalisations was too inclusive” [25]. Thus, there is no evidence that this marginal zone represented by GOLD stage I indicates respiratory disease. 
Conversely, individuals with FEV1/(F)VC <LLN, but not nonsmokers with FEV1/(F)VC <0.70 but >LLN, have an increased risk of all-cause mortality [19, 20], development of respiratory symptoms [19, 20], accelerated decline in FEV1 [16] and hospitalisation for COPD [21]. Therefore, we believe that there is overwhelming evidence that the LLN, defined as the fifth percentile of the distribution in a reference population, should be used to diagnose pathological airflow limitation.
Another issue of concern is the use of FEV1 % pred for categorising disease severity. There have been numerous publications, summarised in the GLI-2012 report [7], that show the underlying scatter in FEV1 is not proportional to the predicted value, and that it varies with age. There is, therefore, a widening gap between 80% pred and the LLN for FEV1, particularly in adults aged ≥45 years (fig. 4), leading to an age-related overestimate of the severity of airflow limitation. The proposed modifications to the ATS/ERS criteria (table 2) remove the bias in age, sex and height; they led to identification of 4% of healthy females and males from the NHANES III and HSE studies aged ≥18 years as having mild airways obstruction.
Figure 4. Relationship between age and the predicted value for forced expiratory volume in 1 s (FEV1) (solid black line), its lower limit of normal (solid grey line) and FEV1 80% pred (dashed black line) in females aged 18–95 years.
Using the absolute value of FEV1/FVC to define disease severity is problematic, in that more severe airflow obstruction can lead to a reduction in FVC with a paradoxical increase in FEV1/FVC and poorer prognosis [26]. For this reason severity scaling using FEV1 % pred was recommended [3]. In the present data, FEV1/FVC ratios as low as 0.31, 0.36, 0.29, 0.21 and 0.15 occurred in the ATS/ERS categories of mild, moderate, moderately severe, severe and very severe airway obstruction, respectively. Hence we have focused on FEV1 z-scores.
Correct classification of obstructive spirometry values in the moderate to severe range is of importance in both clinical and research settings. The most recent GOLD guidelines (2011–2013) [4] introduced categorisation for risk of exacerbation, with one condition leading to patients being categorised as “high risk” if FEV1 was <50% pred. We identified 2097 (11.6%) patients meeting this spirometric criterion. Nearly the same number of patients were identified with severe or very severe airway obstruction according to the new proposed criteria (FEV1/FVC<LLN and FEV1 z-score < -3). Although these results appear to be similar, there is actually a 29.8% disagreement in classification. In the younger age groups the GOLD criterion underestimates the real severity of the obstruction in half of these patients and overestimates it in half of the elderly (fig. 3). This may lead to inappropriate management and/or treatment in both situations and introduces a considerable bias in the selection of patients for intervention studies and other research.
Utilising the underlying spread of spirometry data to stratify risk is not new. In 1983, Peto et al. [8] described a 10-fold increase in risk of death from respiratory disease over the 20 subsequent years in those with an FEV1 more than two standard deviations below average. In developing these new classification criteria we have used z-scores as an essentially identical approach to help stratify severity of disease state. The use of z-scores is already standard procedure when evaluating bone mineral density and growth curves [27, 28], and is recommended for the interpretation of lung function test results [6, 7]. However, recent analyses reveal that neither FEV1 % pred nor z-scores provide the best indices of choice for studying mortality. Expressing lung function as multiples of the minimum FEV1 considered to be compatible with life (estimated at 400 mL and 500 mL in female and male adults, respectively) may provide better prediction of mortality [29]. In a 175 cm tall male aged 30 years this minimum FEV1 of 500 mL represents an FEV1 of 11.5% pred, z-score -6.51, and at age 75 years an FEV1 of 17.0% pred, z-score -4.17, illustrating that neither the per cent predicted nor the z-score approach accounts for this age bias. A z-score of -4.17 in the elderly subject is equivalent to -6.51 in the younger subject from a mortality perspective. Hence, disease severity should be categorised using our suggested cut-off values but we should not be tempted to use absolute z-scores to further stratify disease severity.
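The arithmetic behind this example can be made explicit. The Python sketch below back-calculates the predicted FEV1 implied by the quoted percentages and expresses an FEV1 of 50% pred as a multiple of the minimum FEV1 at each age. The function names are hypothetical and the 50% pred comparison is an added illustration, not a result from the study.

```python
# Minimum FEV1 considered compatible with life, in litres (values from the text).
MIN_FEV1_L = {"female": 0.4, "male": 0.5}


def implied_predicted_fev1(fev1_litres: float, fev1_pct_pred: float) -> float:
    """Predicted FEV1 (L) implied by a measured FEV1 and its % pred."""
    return fev1_litres / (fev1_pct_pred / 100)


def fev1_multiple(fev1_litres: float, sex: str) -> float:
    """FEV1 expressed as a multiple of the sex-specific minimum FEV1."""
    return fev1_litres / MIN_FEV1_L[sex]


# 175 cm male: 500 mL equals 11.5% pred at age 30 and 17.0% pred at age 75.
for age, pct_at_minimum in ((30, 11.5), (75, 17.0)):
    predicted = implied_predicted_fev1(0.5, pct_at_minimum)
    half_predicted = 0.5 * predicted  # an FEV1 of 50% pred, for illustration
    print(
        f"age {age}: predicted {predicted:.2f} L, "
        f"50% pred = {fev1_multiple(half_predicted, 'male'):.2f} x minimum"
    )
```

The implied predicted values are about 4.35 L at age 30 years and 2.94 L at age 75 years, so the same FEV1 of 50% pred corresponds to roughly 4.3 and 2.9 multiples of the minimum, respectively.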
Spirometric indices of pulmonary function correlate poorly with respiratory symptoms, dyspnoea, exacerbations, hospitalisation, exercise limitation and quality of life. Therefore, any system for grading severity of airflow limitation based solely on spirometric test results will inherently fail to satisfactorily capture the complete clinical picture. Rather than being treated solely as a disease of the lungs, COPD is now considered by some as a multifactorial systemic disease [30]. In view of this the GOLD committee introduced a new classification system that combines spirometry with respiratory symptoms in an attempt to quantify future risk; however, this system had not been previously validated. Leivseth et al. [22] concluded that GOLD grade 1 was not associated with increased mortality, and that spirometric GOLD grades 2 and higher predicted mortality better than the new GOLD ABCD groups amongst people with COPD from a Norwegian general population. Also, survival is better in the more severe COPD group C (low lung function but less dyspnoea) than in the less severe group B (much better lung function but more dyspnoea) [31], and the addition of dyspnoea and exacerbations to the severity classification did not add prognostic value on long-term COPD outcomes [32]. It therefore seems that more research is required to clinically phenotype the heterogeneous syndrome of COPD and identify biomarkers of disease, with a view to bringing treatment tailored to the individual patient within reach [33], and to improving both the classification of respiratory impairment and risk assessment.
Adopting the proposed adjustments to the ATS/ERS recommendations avoids contaminating research groups with healthy subjects and avoids unnecessary medical interventions. In addition, this proposal would enable younger patients with milder disease, who would not have been identified using the GOLD criteria, to be included in future research initiatives. The all-age prediction equations for different ethnic groups issued by the Global Lung Function Initiative [7], and endorsed by six large international respiratory societies, have for the first time provided a means for categorising the LLN and also the severity of pulmonary impairment free of bias due to age, height, sex or ethnic group. Thus, a diagnosis of airways obstruction and categorisation of its severity no longer needs to be biased and dependent on where the measurements were performed.
Conclusion
The redesigned system for diagnosing airways obstruction and for categorising pulmonary function impairment is based on evidence which underpins its clinical validity and retains previously documented associations between severity of respiratory impairment and outcome variables, such as all-cause death. Application with the GLI-2012 prediction equations [7], coupled with standardisation in testing methodologies [11], provides the opportunity for a worldwide standard for performance and interpretation of spirometry, which is free of major sources of bias.
Acknowledgments
The authors gratefully acknowledge the constructive comments from the reviewers and Associate Editor M.R. Miller which helped to substantially improve the manuscript.
Footnotes
Conflict of interest: None declared.
- Received May 21, 2013.
- Accepted July 16, 2013.
- ©ERS 2014