Abstract
We evaluated which prediction equations were best suited to the lung function of a cohort of nonagenarians by testing which best accounted for subsequent survival.
In 1998, we measured lung function, grip strength and dementia score (Mini Mental State Examination (MMSE)) in a population-based sample of 2262 Danes born in 1905. Mortality was registered to 2011 when only five (0.2%) subjects were alive. In half the cohort, we recorded forced expiratory volume in 1 s (FEV1).
Complete data were available in 592 subjects, with results expressed as standardised residuals (SR) using various prediction equations. Cox proportional hazard regression found that lower FEV1SR predicted mortality after controlling for MMSE, grip strength and sex. The US National Health and Nutrition Examination Survey (NHANES) III (1999) equations gave a better spread of median survival by FEV1SR quartile (3.94, 3.65, 3.51 and 2.61 years, with hazard ratios for death of 1, 1.16, 1.32 and 1.60, respectively) than equations derived with the inclusion of elderly subjects.
We conclude that extrapolating from NHANES III equations to predict lung function in nonagenarians gave better survival predictions from spirometry than when employing equations derived using very elderly subjects with possible selection bias. These findings can help inform how future lung function equations for the elderly are derived.
Extrapolating from NHANES III equations to predict lung function in nonagenarians gave the best survival predictions http://ow.ly/rFPXq
Introduction
Lung function assessed by spirometry is known to be a predictor of all-cause mortality [1–5] in the general population and in patients with respiratory disease [6, 7]. The proportion of subjects who are very elderly (aged ≥80 years) is increasing, with one in 20 of the population of the European Union in 2012 being in this category [8], and so there is an increasing requirement to manage lung diseases in the very elderly. Generating lung function prediction equations for the very elderly is problematic for several reasons. The requirement to sample elderly subjects who are free from any disease that might affect lung function causes the possible sample size to be small, and then getting access to and obtaining reliable test results from the elderly is difficult. Furthermore, survivors free of disease to a very advanced age may be deemed in some respects to be atypical subjects.
Accurate prediction equations are necessary for the elderly just as much as for the young, as misdiagnosis due to incorrect interpretation of the results can lead to inappropriate clinical interventions, which, in the elderly, are more likely to cause problematic side-effects and iatrogenic disease [9].
Recently, the ability of various prediction equations to predict survival in a large UK patient population was used to determine which equation was the best fit for that population [10]. We have previously studied the Danish 1905 cohort as part of a large study exploring issues around ageing [11, 12] and obtained recordings of forced expiratory volume in 1 s (FEV1), which was one of the predictors of their subsequent survival [13]. Nearly all members of the Danish 1905 cohort have now died in the 14 years since the original study, and we have used these data to test whether survival analysis could determine which lung function prediction equations were the best fit for the very elderly.
Methods
In 1998, all surviving Danes born in 1905 (n=3600) were invited to participate in a cohort study conducted by the University of Southern Denmark, Odense, Denmark (approved by the Danish Ethics Committee, number 19980073). Interviews were conducted in the subjects’ residences during September 1998. The interviewers had no previous medical or paramedical experience and were specifically trained for this project by physicians [11]. Study subjects had their grip strength and peak expiratory flow (PEF) and/or FEV1 recorded, and were tested cognitively by the Mini Mental State Examination (MMSE) questionnaire [14] (giving a score from 0 to 30 with a higher score indicating better cognitive ability).
There were 2262 people who agreed to participate (63% of the total) and survival was assessed in September 2011. There were no specific exclusion criteria for subjects to attempt lung function tests as this would have pre-judged the feasibility. Standing height and weight were as specified by the subject. If the subject was unable to state their height and weight, a previously recorded value was obtained.
For logistical reasons, in only half the cohort (the eastern half of Denmark) did we attempt to record both FEV1 and PEF. The devices used (MicroDL; Micro Medical, Chatham, UK) underwent regular calibration checks to ensure they were performing satisfactorily and were maintained according to manufacturer’s instructions. No malfunctions were detected. Mouthpieces without filters were used. At least two and up to three satisfactory forced expiratory manoeuvres were recorded with the subject in an upright sitting position. Spirometry end-of-test criteria [15] could not be achieved in this age group so forced vital capacity data were not analysed. Forced expiratory volume in 6 s (FEV6) values were derived post hoc from stored data. Spirometry quality checks were applied post hoc but were not used to reject any data at the time of acquisition.
We applied equations to our data that have been commonly used in Europe and the USA, and included some that especially cater for predicting for the elderly. The equations included those of the European Coal and Steel Community [16], the US National Health and Nutrition Examination Survey (NHANES) III from 1999 [17], Enright et al. [18], Miller et al. [19] and Garcia-Rio et al. [20], and the new Global Lung Initiative (GLI) 2012 Lambda-Mu-Sigma equations [21]. Three of these equations (Enright, Garcia and GLI) were derived using subjects aged ≥85 years, and so may be deemed more applicable for our age group, and two of the equations (Garcia and NHANES III) used quadratic expressions. Deviation from predicted values was expressed as standardised residuals (SR) (e.g. FEV1SR) [16].
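The standardised residual used throughout can be illustrated with a minimal sketch. It assumes the usual definition, the observed value minus the predicted value, divided by the residual standard deviation (RSD) of the prediction equation. The function name and the numerical values are hypothetical and are not taken from the study data.

```python
def standardised_residual(observed, predicted, rsd):
    """Return (observed - predicted) / RSD.

    observed, predicted: FEV1 in litres; rsd: residual standard deviation
    of the chosen prediction equation (assumed to be supplied by the caller).
    """
    if rsd <= 0:
        raise ValueError("rsd must be positive")
    return (observed - predicted) / rsd


# Hypothetical illustrative values, not taken from the study.
fev1_sr = standardised_residual(observed=1.20, predicted=1.50, rsd=0.40)
print(round(fev1_sr, 2))
```

A negative value indicates a measurement below the prediction for a subject of that age, height and sex.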
Statistical analysis
Statistical analyses were performed using Stata/SE version 11.0 (StataCorp LP, College Station, TX, USA). Cox proportional-hazard models were used to test the relationship between possible lung function predictor variables and survival while controlling for the previously described effects of sex, MMSE and grip strength [13]. A significance level of 0.05 was chosen for inclusion into a model. The proportional hazards assumption for Cox regression was confirmed for lung function indices from plots of Schoenfeld residuals. The Akaike information criterion (AIC) values for the Cox models were compared, with lower values suggesting an improved model. Harrell’s C statistic, which is the proportion of predictions and outcomes that were concordant, was used to express closeness of fit for Cox models.
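As an illustration of the two model-comparison measures, the following is a minimal sketch and not the Stata routines used in the analysis. It assumes the common pairwise definition of Harrell's C (ties in the risk score count as half concordant) and the standard AIC formula applied to the Cox log partial likelihood. The function names and example data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Proportion of usable pairs in which the subject who died first
    also had the higher predicted risk (ties in risk count as 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i is observed to die
            # strictly before subject j's death or censoring time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values suggest a better model."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical data: survival time (years), death indicator, model risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.2, 0.5, 0.1]
print(harrell_c(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking of survival times.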
Results
Of the 2262 subjects who agreed to participate, 584 (26%) were male, indicating the survival disadvantage for male sex up to the age of entry to this study. By the censor date of September 21, 2011, a total of 2257 (99.8%) were known to have died. Figure 1 shows the Kaplan–Meier survival for the males and females, with the males, even at this advanced age, still having significantly worse survival with separation of the curves (log-rank test Chi-squared, 26.7; p<0.0001). Of the 1132 subjects we tested in Eastern Denmark, only 606 subjects were able to record at least two expirations to obtain spirometric results, of whom complete data were available in 592 (28% male). Flow–volume curves were available for only 480 (79%) of these subjects, of which 285 (47% of total) were deemed acceptable based on three criteria: 1) visual inspection of the flow–volume curve, 2) a 10–90% rise time to PEF of ≤300 ms at the start of the test and 3) a back-extrapolated volume that met American Thoracic Society (ATS) criteria [15].
Figure 1. Kaplan–Meier plots of the proportion of subjects surviving to a given age by sex.
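For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate and the median survival read from it can be sketched as follows. This is a minimal illustration under the standard definition, not the code used for the analysis. The function names and the small data set are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times: follow-up time per subject; events: 1 = died, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up times (years); the third subject is censored.
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0, 5.0], [1, 1, 0, 1, 1])
print(median_survival(curve))  # 4.0
```

Because almost the whole cohort had died by the censor date, censoring has very little influence on the curves in this study.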
Table 1 presents the principal measures for the study population, stratified by sex, with significance values given for the differences that were expected. There were complete data for 592 subjects (164 males, 28%) with FEV1 measurements for each of the prediction equations. Table 2 shows the mean FEV1 value for the 592 subjects stratified by sex, together with the average predicted values for each of the equations, the SR values for the cohort for each equation and the difference in SR value between the sexes. The GLI and NHANES equations had the smallest difference in SR values between the sexes, but the NHANES equations placed the mean cohort FEV1 values closest to predicted, with GLI second lowest in terms of SR values. The Miller equations predict much higher values in males than the other equations and the Garcia equations much lower values in females. The plot of age against the predicted values for the average height and weight of our subjects (fig. 2) demonstrated that the quadratic equations of Garcia and NHANES III were as linear as the others over this age range.
Figure 2. A plot of age against the predicted values for the average height and weight of our subjects for each of the equations. NHANES: National Health and Nutrition Examination Survey; ECSC: European Coal and Steel Community; GLI: Global Lung Initiative.
Histograms of FEV1SR are shown in figure 3. As our study population was not free from disease, there was no expectation that these distributions would approximate to Gaussian curves. The SR values from the Garcia and NHANES equations were most closely centred on zero, with those from the Miller equations slightly left skewed and those from GLI slightly right skewed.
Figure 3. Distribution of forced expiratory volume in 1 s (FEV1) expressed as standardised residuals (SRs) from the predictions using the a) Enright et al. [18], b) National Health and Nutrition Examination Survey (NHANES) III [17], c) European Coal and Steel Community (ECSC) [16], d) Miller et al. [19], e) Global Lung Initiative (GLI) [21] and f) Garcia-Rio et al. [20] equations.
Cox proportional hazards models were created with lung function indices as predictors for survival entered as continuous variables while controlling for known predictive factors such as sex, MMSE and grip strength [13]. There were 592 subjects with FEV1 values and results for all equations, and the results are presented in table 3. Age was not a significant predictor of survival for this cohort because of the cohort’s narrow age band. In addition, ever being a smoker (45% of subjects) and pack-years of smoking exposure were not significant predictors of survival. FEV1 was a significant predictor, with higher lung function values having a reduced hazard for death. FEV1 standardised by height cubed performed better than when standardised by height squared but neither of these transformations performed as well as FEV1SR. FEV6SR was not a significant predictor from the NHANES equations but was for the Garcia equations, although this model was not as good as that for FEV1SR. Inability to perform completely satisfactory spirometry, as judged by our post hoc quality check, was also associated with increased hazard for death.
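The height-based standardisations compared above are simple ratios of FEV1 to a power of height. A minimal sketch, assuming FEV1 in litres and height in metres; the function name and the example values are hypothetical.

```python
def fev1_height_index(fev1_litres, height_m, power):
    """Return FEV1 divided by height raised to the given power.

    power=2 gives FEV1/height squared; power=3 gives FEV1/height cubed.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return fev1_litres / height_m ** power


# Hypothetical subject, not taken from the study.
fev1, height = 1.2, 1.60
print(fev1_height_index(fev1, height, 2))
print(fev1_height_index(fev1, height, 3))
```

Unlike FEV1SR, these indices take no account of age or sex, so any such adjustment has to come from the other covariates in the model.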
Cox proportional-hazard models were then created for the data with FEV1SR values in quartiles and the models are shown in table 4. The NHANES and Miller equations gave a progressive increase in median survival across the quartiles, whereas the remaining equations reversed the order of survival for the middle two quartiles. The percentage of males in each quartile varied between the equations, with NHANES and GLI showing the least variation across quartiles. The hazard ratio for all-cause mortality when using the NHANES and Miller equations increased progressively with each quartile, and was significantly different from unity for the upper two quartiles. For the Garcia equations, the hazard ratios for the middle two quartiles were the wrong way around, and for the other equations only the worst quartile had a hazard ratio significantly different from unity. The models were similar in terms of AIC with the GLI model being the least good.
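The grouping step behind this comparison can be sketched as follows. This is a minimal illustration, not the Stata procedure used for table 4. It assumes that every subject has died, so that the plain median of survival times stands in for the Kaplan–Meier median, and it labels quartiles from 1 (lowest FEV1SR) to 4 (highest), which is an arbitrary choice for this sketch. The function names and data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median, quantiles


def quartile_labels(values):
    """Label each value 1 (lowest quartile) to 4 (highest quartile)."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_right(cuts, v) + 1 for v in values]


def median_survival_by_quartile(sr_values, survival_years):
    """Median survival time within each FEV1SR quartile (no censoring)."""
    groups = defaultdict(list)
    for label, years in zip(quartile_labels(sr_values), survival_years):
        groups[label].append(years)
    return {label: median(groups[label]) for label in sorted(groups)}


# Hypothetical data: FEV1SR and years survived after testing.
sr = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
years = [1, 2, 2, 3, 3, 4, 4, 5]
print(median_survival_by_quartile(sr, years))
```

A coherent set of prediction equations would be expected to give medians that rise steadily from the lowest to the highest FEV1SR quartile.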
Discussion
We have found that using survival analysis in an elderly cohort demonstrated how different prediction equations put subjects into different severity categories for lung function, with some equations appearing superior to others in this respect. We conclude that the American NHANES III equations [17] performed the best because of a combination of factors: they showed less sex bias; the population means were close to predicted (table 2); and they had the highest Harrell’s C in predicting survival, with a coherent increase in survival by lung function quartiles (table 4). Thus, extrapolation from these equations to make predictions for the very elderly gave a better prediction than using equations specifically derived with input from elderly normal subjects (GLI, Garcia and Enright). The NHANES III equations are quadratic with regard to the effect of age, as are the Garcia equations; however, the better performance of NHANES was not seen with Garcia and so it is not just the fact that the NHANES equations are quadratic that gives them the best result overall. The data from NHANES form a part of the GLI equation dataset but the GLI equations did not give the best survival prediction at the advanced age of our cohort, although they have been found to be good in predicting mortality in younger patients [10]. Obtaining meaningful spirometric measurements of lung function in subjects of very advanced age is problematic but the data we obtained from our cohort were able to predict subsequent survival. Previous studies [1–4] have shown that survival in cohorts of younger subjects is related to FEV1 and we have confirmed that FEV1 is a survival predictor even at advanced age.
There is a lack of prediction equations for the very elderly because obtaining reliable lung function data from a large representative sample of elderly free from conditions that might adversely affect their lung function is extremely difficult. Our findings suggest that striving to obtain representative “normal” subjects in extreme old age does not lead to improved prediction equations and that robust equations from a younger age group can be extrapolated to older subjects to give better results. Therefore, when newer equations are derived in the future, such as updating the GLI equations [21], it may be better to leave out very elderly subjects from these calculations and extrapolate from a younger age. Including ostensibly normal very elderly people in studies to determine normal ranges of lung function includes a select group of survivors who are functionally and cognitively inclined to participate and, in some sense, atypical and “supranormal”. Such studies do not record lung function in the subject’s place of residence, whereas we did to avoid such bias. This elderly selection bias in studies defining normal lung function tends to lead to overestimation of the predicted values. Studies in other domains of measurement have shown that the results obtained from the very elderly tend to come from subjects who had results falling in the upper range of values measured earlier in life, with the subjects with the lower values being lost from studies in the elderly due to their death [22].
A potential weakness of our study was that we used nontechnical staff who had undertaken a training programme on all aspects of the study, including spirometry, but their level of expertise was likely to be below that which the ATS and European Respiratory Society (ERS) recommend for spirometry [23]. Our interviewers had not previously undertaken supervision of physical tests like spirometry so it is possible they may not have coached these elderly subjects as rigorously as trained lung function technicians might have done. However, in return for this possible slight loss of expertise, we were able to sample a more representative study population. Recording spirometry in a clinic with fully trained staff would result in testing only the strongest and fittest as well as the weakest and most ill, as only those healthy enough to make their way to the clinic for the test or those already hospitalised would have been sampled. The ATS/ERS guidelines require subjects to perform at least three manoeuvres [15] but some of our subjects were too weak to perform more than two expiratory manoeuvres. We believe we have adequately sampled a representative group of nonagenarians, as shown by the distributions of FEV1. Our recorded FEV1 values may be slightly lower than might be obtained in a lung function laboratory, which may explain why the median SR values for most equations were below zero (table 2 and fig. 2), with those for males being slightly lower than for females. Previous smoking and the accelerated loss of lung function found in elderly males and not females [24] may contribute to this finding.
Smoking was not a predictor of survival in our subjects because the major smoking-related risks will have been manifest in mortality prior to the age of our subjects. Furthermore, smoking in nonagenarians is associated with general activity, interaction and an appetite for life, all features associated with better survival [12]. We have shown that in the very elderly, the inability to perform adequate spirometry is a marker of subsequent mortality. Failure of a subject to obtain a satisfactory set of lung function data has been shown to be as likely to be due to ill health as it is to be due to poor cooperation, poor effort or the incompetence of the technician [25]. Others have found that inability to perform spirometry to accepted criteria in an occupational setting was a reflection of poorer function and subsequent worse decline in lung function [26]. The likely explanation for our finding is that performing these tests is a complex undertaking requiring cognitive and physical ability [27], and so is a surrogate measure for these attributes, which are known predictors of survival.
The ability to inhale maximally prior to the start of the forced expiration influences the magnitude of the FEV1 achieved. We have measured grip strength and shown that it independently relates to survival. Future similar studies should consider recording maximum inspiratory pressure (MIP), which is a more appropriate measure of muscle strength for achieving the highest total lung capacity prior to the exhalation; MIP has been found to be an independent predictor of survival in a study of cardiovascular mortality [28]. FEV1/height³ was not as good as FEV1SR at predicting survival in this age group, which is contrary to the findings of studies in younger subjects [2, 4]. This may be because height in this age group is reduced by kyphosis and posture, and so no longer properly accounts for differences in lung size. We used recollected height, as slight kyphosis and stoop will affect height measurement but will not alter the lung capacity in the thorax, so height recorded at a younger age might be a better estimate for lung function. A follow-up study in COPD recently found that height estimated from arm span was a better measure for lung function prediction [29]. Future studies in the aged may need to measure arm span as a surrogate for true height to see if this better standardises FEV1, to improve survival prediction.
The reason for the link between lung function and survival is not known. A plausible explanation is that by having lower lung function, a subject is more at risk of mortality from a wide range of illnesses that would otherwise be survivable with better lung function. For example, a debilitating disease such as a stroke would be more likely to lead to death from pneumonia in someone with lower lung function. This hypothesis would be especially true in advanced age where pneumonia is increasingly common [30], but in younger subjects one might expect the association between lung function and survival to be less strong as lung function is more likely to be above a critical threshold for survivability. This aspect is worthy of future study in established cohorts of subjects with a wide age range. A further explanation for the link between lung function and survival would be that poorer lung function is a surrogate marker of an established disease or ageing in general, but this is not likely to account for the link seen in much younger populations [2–4]. Another possible explanation for lung function predicting mortality might be that genes that determine lung function are in linkage disequilibrium with those conferring an increased susceptibility to other nonrespiratory diseases that commonly cause death, such as cancer and myocardial infarction. However, cancer as a cause of death reduces in frequency in subjects of advanced age [31] and any increased genetic susceptibility to cancer or vascular disease would have already been expressed in mortality prior to the late age of our cohort.
Despite possible limitations to a study such as this in terms of the quality of the lung function tests, we have shown that meaningful FEV1 measurements can be made in a representative sample of nonagenarians and are important biomarkers of relative ageing and mortality. We conclude that lung function equations derived from older subjects did not predict survival as well as the US equations of NHANES III [17], which seem the most appropriate to apply to the very elderly. Our finding that extrapolation from prediction equations based on younger subjects gives the best results for the very elderly may help determine how prediction equations such as GLI [21] are prepared in the future.
Footnotes
Support statement: The present study was supported by grants from the National Institutes of Health/National Institute of Aging (grant number P01 AG08761) and the VELUX Foundation.
Conflict of interest: Disclosures can be found alongside the online version of this article at www.erj.ersjournals.com
- Received June 13, 2013.
- Accepted November 29, 2013.
- ©ERS 2014