Abstract
The diagnosis and severity categorisation of obstructive lung disease are determined using reference values. The American Thoracic Society/European Respiratory Society in 2005 recommended the National Health and Nutrition Examination Survey (NHANES) III spirometry prediction equations for patients in the USA aged 8–80 years. The Global Lung Initiative 2012 (GLI 12) provided spirometry prediction equations for patients aged 3–95 years. Comparison of the NHANES III and GLI 12 prediction equations for diagnosing and categorising airway obstruction in patients in the USA has not been made.
We aimed to quantify the differences between NHANES III and GLI 12 predicted values in Caucasians aged 18–95 years, using both mathematical simulation and clinical data. We compared predicted forced expiratory volume in 1 s (FEV1) and lower limit of normal (LLN) FEV1/forced vital capacity (FVC) % for NHANES III and GLI 12 prediction equations by applying both a simulation model and clinical spirometry data to quantify differences in the diagnosis and categorisation of airway obstruction.
Mathematical simulation revealed significant similarities and differences between prediction equations for both LLN FEV1/FVC % and predicted FEV1. There are significant differences when using GLI 12 and NHANES III to diagnose airway obstruction and severity in Caucasian patients aged 18–95 years.
Similarities and differences exist between NHANES III and GLI 12 for some age and height combinations. The differences in LLN FEV1/FVC % and predicted FEV1 are most prominent in older taller/shorter individuals. The magnitude of the differences can be large and may result in differences in clinical management.
Significant differences exist between NHANES III and GLI 12 prediction equations for some age and height combinations http://ow.ly/4mWTZ8
Introduction
Pulmonary function tests (PFTs) are fundamental in respiratory medicine, but their usefulness depends on many factors, including appropriate spirometry prediction equations [1]. The American Thoracic Society (ATS) and European Respiratory Society (ERS) task force (2005) recommended that reference values for PFTs be obtained from “normal” or “healthy” subjects from a similar age range and race/ethnicity as the population being tested, wherever possible [1]. The ATS/ERS task force recommends the use of the National Health and Nutrition Examination Survey (NHANES) III prediction equations as the spirometry standard to be used in patients aged 8–80 years in the USA [1]. The NHANES III prediction equations have also been used globally.
NHANES III prediction equations are based on data obtained using equipment and techniques that met or exceeded the ATS guidelines [2]. After applying strict exclusion criteria, 7429 subjects (3041 males) aged 8–80 years were included in the NHANES III polynomial model development with age and height as predictors for three specific populations: Caucasian, African-American and Mexican-American [2]. The ATS/ERS task force did not recommend extrapolating the NHANES III prediction equations for patients aged >80 years [1]. However, extrapolating NHANES III equations beyond 80 years is common in clinical practice and there is some evidence that this is reasonable [3].
The ERS Global Lung Initiative 2012 (GLI 12) through the GLI task force developed spirometric reference equations for ages 3–95 years using the generalised additive models for location, scale and shape approach [4–8]. The GLI 12 prediction equations were drawn from data collected in a number of international studies (73 centres around the world), including NHANES III data, and are intended to be applied globally [4]. GLI 12 data were collected under standardised measurement conditions using validated equipment and software [4]. After applying strict exclusion criteria, 74 187 subjects (42.94% male) aged 2.5–95 years were included in the GLI 12 spirometric prediction equations developed for four specific populations: Caucasian, Black, North-East Asian and South-East Asian [3]. The GLI 12 spirometric prediction equations have been endorsed by the ERS, the ATS, the Australian and New Zealand Society of Respiratory Science, the Asian Pacific Society of Respirology, the Thoracic Society of Australia and New Zealand and the American College of Chest Physicians [4]. GLI 12 prediction equations are in use in some laboratories in the USA [9, 10]. Several spirometry manufacturers have programmed, or are programming, the GLI 12 prediction equations into products intended for use in the USA and around the world [11].
Differences and similarities between the NHANES III and GLI 12 prediction equations in diagnosing and categorising airway obstruction in patients in the USA have not been evaluated using model simulation and clinical data. We quantified these differences and similarities in both a mathematical model simulation and a clinical spirometry data set.
Methods
NHANES III and GLI 12 model simulation prediction equation comparisons
Diagnosis of pulmonary airway obstruction is based on prediction of lower limit of normal (LLN) of forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) ratio (LLN FEV1/FVC%). The severity categorisation of pulmonary airway obstruction is based on FEV1 % predicted [1]. For comparison of these two prediction equations, we simulated NHANES III [2] and GLI 12 [4–6] prediction equations for FEV1 and LLN FEV1/FVC% for Caucasians aged 18–95 years with 5-cm increments of height in the range 155–200 cm for males and 145–185 cm for females. This height range was chosen because it includes most patients presenting to an adult PFT laboratory. For comparison purposes we extrapolated the NHANES III prediction equations for subjects aged >80 years, a practice that is commonly used in the clinical setting, but not recommended by the ATS/ERS task force [1].
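The simulation grid described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB code used in the study; the names `AGES`, `HEIGHTS_CM` and `simulation_grid` are hypothetical, and the sketch assumes ages were stepped in whole years.

```python
from itertools import product

# Ages 18-95 years inclusive (assumption: whole-year steps).
AGES = range(18, 96)

# Heights in 5-cm increments: 155-200 cm for males, 145-185 cm for females.
HEIGHTS_CM = {
    "male": range(155, 201, 5),
    "female": range(145, 186, 5),
}


def simulation_grid(sex):
    """Return every (age, height_cm) pair to be simulated for the given sex."""
    return list(product(AGES, HEIGHTS_CM[sex]))
```

Each pair in the grid would then be passed to both sets of prediction equations so that the outputs can be compared point by point.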
For the GLI 12 prediction equations, we used the male and female Lspline, Mspline and Sspline values, coefficients and prediction equations for the Caucasian population, taken from the GLI 12 “lookup tables” (www.ers-education.org/home/browse-all-content.aspx?idParent=138978) (online supplementary table S1). We performed model simulations using MATLAB (MATLAB Central File Exchange version R2013b; MathWorks, Natick, MA, USA).
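As a hedged illustration of how L, M and S values are typically combined in the LMS method (on which, to our understanding, the GLI 12 equations are built), the following Python sketch computes the value at a given z-score and the z-score of a measurement. The function names are hypothetical, and the numbers in the usage note are illustrative only; they are not taken from the GLI 12 lookup tables.

```python
import math

Z_LLN = -1.645  # z-score corresponding to the 5th percentile


def lms_value_at_z(L, M, S, z=Z_LLN):
    """Value at z-score z, given skewness L, median M and coefficient of variation S."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def lms_z_score(measured, L, M, S):
    """z-score of a measured value under the same L, M and S."""
    if L == 0:
        return math.log(measured / M) / S
    return ((measured / M) ** L - 1.0) / (L * S)
```

For example, with the illustrative values L = 1, M = 4.0 and S = 0.1, `lms_value_at_z(1.0, 4.0, 0.1)` gives a lower limit of normal of 4.0 × (1 − 0.1645), and calling it with `z=0` returns the median M, which serves as the predicted value.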
To quantify the prediction equation differences we generated two-dimensional displays of LLN FEV1/FVC% and predicted FEV1 differences as a function of age with height isopleths in increments of 5 cm. If the difference in predicted FEV1 between NHANES III and GLI 12 exceeded 150 mL, the repeatability recommendation from the ATS/ERS task force [12, 13], we declared the difference clinically significant.
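The 150 mL criterion can be expressed as a simple check. The sketch below is illustrative; the function names are hypothetical and it assumes the criterion is applied to the absolute difference, with predicted FEV1 given in litres.

```python
REPEATABILITY_LIMIT_ML = 150.0  # ATS/ERS repeatability recommendation


def fev1_difference_ml(nhanes_fev1_l, gli_fev1_l):
    """Signed difference (NHANES III - GLI 12) in predicted FEV1, in mL."""
    return (nhanes_fev1_l - gli_fev1_l) * 1000.0


def is_clinically_significant(nhanes_fev1_l, gli_fev1_l,
                              limit_ml=REPEATABILITY_LIMIT_ML):
    """True when the absolute difference in predicted FEV1 exceeds the limit."""
    return abs(fev1_difference_ml(nhanes_fev1_l, gli_fev1_l)) > limit_ml
```

With hypothetical predicted values of 2.50 L and 2.30 L the difference is about 200 mL and would be flagged, whereas 3.00 L and 2.90 L (about 100 mL) would not.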
Differences between NHANES III and GLI 12 prediction equations in the diagnosis and categorisation of airway obstruction in a clinical spirometry data set
We obtained 12 116 clinical spirograms from Caucasian males and females aged 18–95 years in adult academic PFT laboratories at Intermountain Medical Center and LDS Hospital (Salt Lake City, UT, USA). The data were collected from 2001 to 2013 (table 1). Certified PFT technicians performed the tests using standard equipment and techniques that met ATS standards in use at the time of testing [1, 12]. Laboratory technicians routinely made extensive efforts to meet the acceptability and repeatability criteria applicable at the time of spirometry [1, 12]. Subjects were weighed and their height measured in indoor clothing without shoes using a calibrated scale and stadiometer, respectively. Age was recorded to the nearest birthday.
Because our patient population is 92% Caucasian, we restricted our analysis to Caucasian patients for this NHANES III versus GLI 12 clinical airway obstruction diagnosis and severity comparison. We used only pre-bronchodilator spirograms. We divided patients into those aged 18–80 years (the age range of the NHANES III study) and those aged 81–95 years (table 1). We analysed the clinical data of patients aged 81–95 years separately because NHANES III needs to be extrapolated beyond 80 years and the ATS/ERS 2005 task force did not recommend extrapolation beyond 80 years.
We defined obstruction as a measured FEV1/FVC % <LLN for either the NHANES III or GLI 12 predictions [1]. We then compared the number of patients obstructed according to NHANES III with the number obstructed using GLI 12 and identified a common set of patients who were obstructed according to both prediction equations. We used this common set to compare the categorisation of obstruction severity using both prediction models. Severity of airway obstruction was categorised using FEV1 % pred according to the ATS/ERS task force [1] for either the NHANES III or GLI 12 prediction equations. Specifically, mild obstruction was defined as FEV1 >70% pred, moderate as FEV1 50–69% pred and severe as FEV1 <50% pred. Next, we evaluated obstructive classifications and severity of obstruction according to NHANES III and GLI 12 with rates of concordance and discordance. Differences in categorisation for airway obstruction severity between NHANES III and GLI 12 were based on predicted FEV1 values, as a function of age and sex.
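The diagnostic and severity rules above can be sketched as follows. This is an illustrative Python sketch with hypothetical function names. The text leaves a value of exactly 70% pred (and values between 69% and 70%) unassigned; the sketch assumes that FEV1 of at least 70% pred is mild and 50% to below 70% pred is moderate.

```python
def is_obstructed(fev1_fvc_pct, lln_pct):
    """Obstruction: measured FEV1/FVC % below the lower limit of normal."""
    return fev1_fvc_pct < lln_pct


def severity(fev1_measured_l, fev1_predicted_l):
    """Categorise severity from FEV1 % predicted (boundary handling is an assumption)."""
    pct_pred = 100.0 * fev1_measured_l / fev1_predicted_l
    if pct_pred < 50.0:
        return "severe"
    if pct_pred < 70.0:
        return "moderate"
    return "mild"
```

Because the category depends on the predicted value, the same measurement can be classified differently by the two equations: a hypothetical measured FEV1 of 2.0 L is mild against a predicted 2.8 L (about 71% pred) but moderate against a predicted 3.0 L (about 67% pred).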
Institutional review board approval was waived because only de-identified patient data were used.
Statistical analysis
We compared diagnostic obstructive classifications (obstruction versus no obstruction) using rates of concordance and discordance. Concordance is defined as the same outcome for an individual according to NHANES III and GLI 12, while discordance is defined as different outcomes returned for the same individual by the two approaches (table 2). In addition, we evaluated binary severity categorisations (mild versus moderate or greater severity) with rates of concordance and discordance to evaluate whether NHANES III and GLI 12 produced similar classifications on the same patients (table 3). All statistical analyses were performed using R (R statistical software, version 2.11; www.r-project.org).
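The concordance rate defined above is the proportion of patients given the same outcome by both equations. The study analyses were performed in R; the following is only a minimal Python sketch with a hypothetical function name, where each list holds one classification per patient in the same order.

```python
def concordance_rate(calls_a, calls_b):
    """Fraction of patients classified identically by two prediction equations."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both lists must cover the same patients")
    if not calls_a:
        raise ValueError("at least one patient is required")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree / len(calls_a)
```

The discordance rate is one minus this value. The same function applies to the binary severity comparison if the lists hold severity labels instead of obstruction flags.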
Results
NHANES III and GLI 12 model simulation prediction equation comparisons
Male and female prediction equation differences (NHANES III − GLI 12) for LLN FEV1/FVC% are shown in figure 1. Male and female prediction equation differences for predicted FEV1 are shown in figure 2.
Figure 1 shows that prediction equation differences in LLN FEV1/FVC% between NHANES III and GLI 12 are a function of age, height and sex. The differences are most pronounced for taller and shorter males and for taller females aged >30 years. The prediction equation differences in LLN FEV1/FVC% between the NHANES III and GLI 12 ranged from −2% to +3.5% (fig. 1). NHANES III consistently predicted a higher LLN FEV1/FVC% than did GLI 12 for males aged >70 years and of any height; this difference was greater for taller males. For males aged 18–70 years, NHANES III predicted higher LLN FEV1/FVC% than did GLI 12 for taller individuals. In the case of females, NHANES III consistently predicted a higher LLN FEV1/FVC% for individuals aged 25–95 years; this difference was greater for taller females. For females aged 18–25 years, NHANES III predicted higher LLN FEV1/FVC% than did GLI 12 for taller individuals. These results indicate that taller males and females are more likely to be diagnosed with airflow obstruction when the NHANES III rather than GLI 12 prediction equations are used. Moreover, males of any height aged >70 years are more likely to be diagnosed with airflow obstruction.
FEV1 predicted differences between NHANES III and GLI 12 are a function of age, height and sex. FEV1 predicted differences were small when using the average height for males and females across the age range. When average height is used, FEV1 predicted is similar for most males and females.
When using individual height, NHANES III and GLI 12 prediction equation differences were within the intra-individual variability limits (±150 mL) for any height in males and females aged ∼18–60 years, but there are notable differences for specific age and height combinations outside this range (fig. 2). Generally speaking, the differences are more pronounced for taller males aged >60 years and for shorter males aged >70 years. For females the differences in predicted FEV1 are greatest for taller individuals aged >60 years, and for shorter individuals aged >80 years (fig. 2).
Differences between NHANES III and GLI 12 prediction equations in the diagnosis and categorisation of airway obstruction in a clinical spirometry data set
We analysed clinical spirograms from 12 116 Caucasian patients (aged 18–95 years). These included 5654 males, of whom 5288 were aged 18–80 years, and 6462 females, of whom 6034 were aged 18–80 years (table 1). For patients aged 18–80 years, NHANES III predicted LLN FEV1/FVC% identified obstruction in 1892 (35.7%) males and in 1733 (28.7%) females. In contrast, GLI 12 identified obstruction in 1959 (37%) males and in 1719 (28.5%) females (fig. 3). While these aggregate values are similar for the PFT patients in our clinical database, the differences show that the two methods can categorise individual patients differently. When we extrapolated NHANES III predicted LLN FEV1/FVC% for patients aged >80 years, NHANES III identified obstruction in 123 (33.6%) males and 122 (28.5%) females. In contrast, GLI 12 identified obstruction in 112 (30.6%) males and 112 (26.2%) females (table 2). The diagnosis concordance rate for males aged 18–80 years was 98.1% (5187/5288), and the diagnosis concordance rate for females aged 18–80 years was 99.1% (5980/6034). The diagnosis concordance rate for males aged 81–95 years was 96.9% (355/366), and the diagnosis concordance rate for females aged 81–95 years was 97.6% (418/428) (see table 2 for separate concordance and discordance rates).
For comparison of categorisation of severity of airway obstruction, we used all patients aged 18–95 years with measured FEV1/FVC% below the predicted LLN FEV1/FVC% from both NHANES III and GLI 12. For patients aged 81–95 years, NHANES III identified mild airway obstruction in 49 (43.7%) males and 32 (28.5%) females, whereas GLI 12 identified mild airway obstruction in 36 (32.1%) males and 27 (22.1%) females. The concordance rate for obstruction severity for males aged 18–80 years was 98.6% (1850/1875) and for females aged 18–80 years was 97.6% (1658/1699). The obstruction severity concordance rate for males aged 81–95 years was 88.4% (99/112) and the severity concordance rate for females aged 81–95 years was 93.8% (105/112) (see table 3 for separate concordance and discordance rates).
Figure 3 shows prediction equation differences from our clinical spirometric data for LLN FEV1/FVC% (NHANES III − GLI 12). The clinical spirometric data mirror the simulation model in figure 1 for both males and females. For example, figure 3b shows that most females aged >25 years present a positive difference between NHANES III and GLI 12. In the case of males, figure 3a shows that the differences are most pronounced for older, taller patients. These positive differences could lead to more diagnoses of airway obstruction when using NHANES III than when using GLI 12.
The predicted FEV1 differences between NHANES III and GLI 12 in our clinical spirometric data are within the intra-individual variability limits for males aged ∼30–70 years, and for females aged 18–65 years (fig. 4). However, taller males aged 18–30 years and >70 years are outside the intra-individual limits. Shorter males aged >70 years are also outside these limits. In the case of females, taller individuals aged >65 years and shorter females aged >75 years are outside the intra-individual limits (fig. 4). In some cases, differences in predicted FEV1 between the two prediction equations exceeded 400 mL. Individuals outside the intra-individual limits could be categorised differently in airway obstruction diagnosis and severity.
Discussion
PFT laboratories in the United States and around the world using NHANES III and/or GLI 12 need to be cognisant of the similarities and differences between the NHANES III and GLI 12 spirometry prediction equations. The differences could affect the diagnosis of airway obstruction and categorisation of its severity.
We expected differences between the Caucasian NHANES III and GLI 12 reference equations because they were drawn from different data sets (with some overlap) and used different mathematical models. While differences between the two prediction equations are generally small and probably not clinically significant for most patients of average height, particular caution must be exercised in selecting prediction equations for older patients, especially those taller and shorter than average height. In older patients, the NHANES III − GLI 12 differences in LLN FEV1/FVC% may exceed 3.5%, and differences in predicted FEV1 may be as large as ±400 mL for specific age and height combinations. For example, in an 80-year-old male 195 cm (77 inches) tall, the difference between NHANES III and GLI 12 is >250 mL (figs 1 and 2).
Quanjer et al. [4] previously compared spirometry predicted FEV1 values between NHANES III and GLI 12 for a limited number of age and height combinations. They found that, for example, for persons aged 55 years and 175 cm tall, the difference was 20 mL, as it is in our current study. But our figure 2 shows that in a 90-year-old individual of the same height (175 cm), the difference increases to ∼200 mL. The current study therefore expands on the work of Quanjer et al. [4] related to model comparison by showing how predicted values vary across GLI 12 and NHANES III models for different height and age combinations.
Analysis of our clinical spirometric data revealed similarities and differences between NHANES III and GLI 12. The Caucasian patients studied in our laboratory had differences in LLN FEV1/FVC% and predicted FEV1 that were similar in both magnitude and direction to those observed in the mathematical simulation (figs 1–4). There are differences in the categorisation of obstruction severity between the NHANES III and GLI 12 for Caucasian males and females (table 3). However, in the age range of 40–60 years, the predicted FEV1 used to categorise airway obstruction differs little between the two prediction equations and is within the ATS/ERS task force suggested repeatability range of 150 mL (fig. 4).
The average differences between the two prediction equations for the LLN FEV1/FVC ratio (resulting in differences in the diagnosis of obstruction) are small and indicate that, for population/epidemiological purposes, the two prediction models seem quite similar and are probably interchangeable (fig. 1). However, for application in “personalised” medicine (individual patient diagnosis), the prediction equations are not interchangeable for all physiological age and height combinations. When our entire clinical pulmonary laboratory population is considered, the differences in obstruction diagnoses between the two prediction equations are significant, particularly for men and women aged >80 years (table 2).
Notwithstanding the relatively limited number of discordant results between NHANES III and GLI 12 for older patients, these differences may be important. These patients may have a variety of cardiopulmonary diseases resulting in respiratory symptoms [14, 15]. An erroneous diagnosis of airway obstruction may result in inappropriate clinical interventions, including prescribing bronchodilators that may be costly and, even worse, associated with adverse drug reactions [16, 17]. This problem may increase as the population ages and more patients aged >80 years are tested in the future.
The optimal reference equation for patients aged >80 years is not known. Extrapolation of the NHANES III prediction equations for patients aged >80 years has not been recommended because these subjects were not included in the NHANES III derivation cohort [2]. Despite this limitation in the NHANES III study, laboratories must make decisions about patients aged >80 years. The Multi-Ethnic Study of Atherosclerosis revealed extrapolation of NHANES III predictions to 84 years was acceptable [18]. The GLI 12 study did include a limited number of subjects aged 80–95 years; however, this small sample may not be representative of the larger group of elderly patients [4]. While the NHANES III and GLI 12 prediction equations provide similar results for many older patients, there are potentially significant differences in the diagnosis and categorisation of obstruction severity when using the extrapolated NHANES III prediction equations compared with the GLI 12 reference equations.
Prediction equations that most accurately distinguish individuals as normal (“healthy”) or abnormal (“sick”) are most clinically useful. One study suggested that the NHANES III prediction equations were more strongly associated with survival in patients aged ≥90 years than were the GLI 12 equations, and therefore may be more appropriate for clinical use [19].
Our study has limitations. We did not analyse clinical outcomes data (e.g. survival) and are not able to recommend a change in prediction equations for clinical use. We excluded non-Caucasian patients and limited our study to adults. Our clinical laboratory population is largely Caucasian (92%), precluding confident analysis of other ethnic/racial groups. We also chose to focus on the diagnosis and categorisation of airflow obstruction, as this is the most clinically important spirometric result. We acknowledge that using FEV1 % pred to categorise airflow obstruction severity is flawed as it incorporates sex, age and height bias, as shown by Miller et al. [19]. While other methods of categorising airflow obstruction severity are more clinically relevant, they are not widely used currently [19].
The purpose of our current study was not to determine the “best” spirometry prediction equation, but to inform clinicians that there are important differences in diagnosis and categorisation of airway obstruction between the NHANES III and GLI 12 prediction equations. NHANES III and GLI 12 spirometry prediction equations should not be used interchangeably to diagnose and categorise airway obstruction for all patients. Abandoning NHANES III in favour of the GLI 12 prediction equations will lead to significant changes in diagnosis and categorisation of airflow obstruction, specifically in patients at extremes of height and age, and may disrupt longitudinal spirometry studies. Until we have credible long-term clinical outcome data to indicate the preferable prediction equations, clinical laboratories should be cautious about changing prediction equations.
Conclusion
There is broad general agreement between NHANES III and GLI 12 spirometry prediction equations for most patients. However, significant differences exist between NHANES III and GLI 12 for some age and height combinations. The differences in LLN FEV1/FVC% and predicted FEV1 are most prominent in older taller/shorter individuals. The magnitude of the differences can be large and may result in differences in clinical management. While we cannot definitively endorse one of the prediction equations over the other, generally they should not be used interchangeably. Caution should be used when either the GLI 12 or NHANES III prediction equations are used for older shorter/taller patients in the diagnosis and categorisation of pulmonary airway obstruction.
Acknowledgements
We thank Julia Ratomaharo (Service de Pneumologie, Hôpital Privé d'Athis-Mons, Athis-Mons, France) and Steve Howe (Intermountain Healthcare, Salt Lake City, UT, USA), for reviewing the manuscript.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: This study was supported by the Intermountain Research and Medical Foundation, and Intermountain Healthcare-Utah, USA.
Conflict of interest: None declared.
- Received October 15, 2015.
- Accepted April 14, 2016.
- Copyright ©ERS 2016