Abstract
Spirometric lung function is partly determined by sex, age and height (Ht). Commonly, lung function is expressed as a percentage of the predicted value (PP) in order to account for these effects.
Since the PP method retains sex, age and Ht bias, forced expiratory volume in 1 s (FEV1) standardised by powers of Ht and by a new sex-specific lower limit (FEV1 quotient (FEV1Q)) were investigated to determine which method best predicted all-cause mortality in 26,967 patients and normal subjects.
On multivariate analysis, FEV1Q was the best predictor, with a hazard ratio for the worst decile of 6.9 compared to 4.1 for FEV1PP. On univariate analysis, the hazard ratios were 18.8 compared to 6.1, respectively; FEV1·Ht−3 was the next-best predictor of survival. Median survival was calculated for simple cut-off values of FEV1Q and FEV1·Ht−3. These survival curves were accurately fitted (r2 = 1.0) by both FEV1Q and FEV1·Ht−3 values expressed polynomially, and so an individual's test result could be used to estimate survival (with sd for median survival of 0.22 and 0.61 yrs, respectively).
It is concluded that lung function impairment should be expressed in a new way, here termed the FEV1Q, or, alternatively, as FEV1·Ht−3, since these indices best relate spirometric lung function to all-cause mortality and survival.
From the first scientific recording of lung function data 1, 2, it was appreciated that the values obtained were dependent upon the subject’s sex, age and height (Ht). This led to the practice of trying to take these influences into account by using prediction equations and then relating the subject's result to the expected value (percentage of the predicted value (PP)). In the second edition of Respiratory Function in Disease by Bates et al. 3, it was suggested that, if a lung function index value were <80% pred, then it was likely to be abnormal. This method was widely embraced 4 and has endured, but there are several reasons why this is not a helpful rule of thumb 5, 6. The European Respiratory Society (ERS) was the first to recommend the use of standardised residuals (SRs), which are in essence a z-score, for determining whether or not an index is outside the normal range 7. The lower limit of normal (LLN) is -1.645 SRs, which is an estimate of the lower 5th percentile, and this method has been recommended by the most recent American Thoracic Society (ATS) and ERS statement 8 for determining whether or not a result is abnormal, with PP suggested as the method for expressing severity.
Whenever lung function is related to a predicted value, it requires accurate prediction equations against which to compare the subject's data. The equation must be obtained from a relevant population of subjects, using comparable equipment and with rigorous technical standards applied. Exactly how normal these subjects are can be hard to define, and a population that is too pure may be unrepresentative. Even the best prediction equation has quite wide 95% confidence limits for its predicted value, and so this, in itself, is an inexact science. An alternative approach has been to standardise spirometric lung function using a power relationship with Ht that helps to account for some size and sex difference. This method was shown, in the Framingham study 9, to be helpful in relating lung function data to subsequent survival, and was found by Fletcher et al. 10 to be the best method for evaluating longitudinal decline in function in chronic obstructive pulmonary disease (COPD).
The purpose of the present article is to explore the limitations and advantages of these various methods, and to explore other and perhaps better methodologies so that these can be tested by other researchers in order to help determine the best way forward. The relationship between lung function and all-cause mortality was used to explore this since this was information that was readily available. It has previously been shown that forced expiratory volume in 1 s (FEV1) is a predictor of all-cause mortality in the general population 9, 11–13, and that this relationship is even stronger for mortality caused by respiratory diseases related to airflow obstruction 11.
DATA AND METHODS
Data sets
In order to help explore various methods for using spirometric lung function data, three sets of data were used. One was obtained from routine lung function tests performed at the University Hospitals Birmingham National Health Service Trust (Birmingham, UK). These data comprise the results of the most recent attendance tests obtained from 11,972 patients (53% male) referred for whatever reason for lung function tests, and all had their survival registered up to October 2008 in UK National Health Service data records. Tests were performed on equipment validated to conform to ATS/ERS specifications 14, and the data were obtained following these test criteria. The second set of data was from the Copenhagen City Heart Study (CCHS) 15, kindly released to us by P. Lange (Hvidovre Hospital, Hvidovre, Denmark) in order to facilitate exploration of a novel approach to using lung function data 11. The lung function data from subjects entered into the CCHS during the period 1976–1978 and their survival up to December 2002 were released for analysis. The methods and background to this large project have been described previously 12, 15. FEV1 and forced vital capacity (FVC) were recorded without prior bronchodilatation, with subjects seated, using a Vitalograph bellows spirometer (Maids Moreton, UK). Only subjects with at least two measurements within 5% of each other were included, and the highest value obtained was recorded. Any subject whose recorded FEV1 was <0.3 L (n = 4) or whose FEV1 exceeded FVC (n = 19) was excluded, leaving data from 13,900 subjects (46% male) for analysis. The third set of data comprised 1,095 patients (41% male) with COPD who had had their post-bronchodilator FEV1 recorded and were then followed for 15 yrs in order to explore predictors of survival 16. When all three data sets were combined and only those aged ≥20 yrs retained, this left 26,967 subjects for analysis with regard to FEV1 and survival prediction.
Methods for expressing FEV1 impairment
FEV1 has been expressed in a number of ways: as PP (FEV1PP), using European Coal and Steel Community (ECSC) reference equations; as FEV1 divided by Ht squared (FEV1·Ht−2) 13, 17; and as FEV1 divided by Ht cubed (FEV1·Ht−3) 10. FEV1 has also been presented as an SR (FEV1SR), which is derived from:
FEV1SR = (observed FEV1 − predicted FEV1)/RSD
where RSD is the residual standard deviation of the prediction equation used 7. Although the ECSC reference equations are only relevant for people aged ≤70 yrs, they are frequently used beyond this age, and we have found the equations to be as good at prediction as other more age-specific equations up to an age of 95 yrs 18. Plotting the patient data showed that, regardless of age, there was a flat lower limit to FEV1, as shown in figure 1⇓. It was found that the lower 1st percentile of FEV1 in the patient group of nearly 12,000 subjects differed between the sexes (0.5 L for males and 0.4 L for females), but did not vary significantly with age at ages >50 yrs, where more reliable estimates of the 1st percentile were possible, as shown in figure 2⇓. It was then decided to standardise FEV1 using these sex-specific lowest 1st percentiles, and this index was termed the FEV1 quotient (FEV1Q). It is an index of the number of turnovers of a nominal lower limit of lung function remaining, and takes into account some sex and size differences in lung function. When using data from a single sex, FEV1Q has no advantage over raw FEV1.
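As an illustrative sketch only, the indices described in this section can be computed as shown below. The function and variable names are hypothetical and are not part of the original analysis, which was carried out in Stata. The sex-specific divisors are the 1st percentile values reported above (0.5 L for males and 0.4 L for females); the predicted value and RSD must be supplied from whichever reference equation is in use.

```python
# Sex-specific 1st percentile FEV1 values reported in the text (litres).
FEV1_FIRST_PERCENTILE_L = {"male": 0.5, "female": 0.4}


def fev1_quotient(fev1_l, sex):
    """FEV1Q: observed FEV1 divided by the sex-specific 1st percentile."""
    return fev1_l / FEV1_FIRST_PERCENTILE_L[sex]


def fev1_height_power(fev1_l, height_m, power):
    """FEV1 standardised by a power of height, e.g. power=3 for FEV1.Ht^-3."""
    return fev1_l / height_m ** power


def fev1_standardised_residual(fev1_l, predicted_l, rsd_l):
    """FEV1SR = (observed FEV1 - predicted FEV1) / RSD."""
    return (fev1_l - predicted_l) / rsd_l


def fev1_percent_predicted(fev1_l, predicted_l):
    """FEV1PP: observed FEV1 as a percentage of the predicted value."""
    return 100.0 * fev1_l / predicted_l
```

For example, a male with an FEV1 of 1.5 L and a female with an FEV1 of 1.2 L would both have an FEV1Q of 3.0, i.e. three turnovers of the nominal lower limit.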
Statistics
All analysis was undertaken using Stata/SE version 9.1 (StataCorp, College Station, TX, USA). Cox's regression models for predicting survival from FEV1 were derived together with age and sex as predictors, and then without these predictors, since the object was to determine the best method for using lung function data in a clinical setting in which other factors would not be explicitly accounted for in decision making. All models were confirmed to abide by the assumptions implicit in proportional hazard analysis.
RESULTS
Each method of expressing lung function impairment is taken in turn, with results provided to support or reject its use in this context. Finally, results on survival in the present large data set are explored.
Percentage of the predicted value
PP methodology has sustained itself over the years, but it has no statistical basis and can be misleading when comparing different lung function indices. Figure 3⇓ shows idealised data for males and females indicating that the true LLN is at different PPs for different ages, and that this differs between the sexes. It also varies with Ht. Thus, if the procedure of relating to a predicted value is an attempt to account for age, Ht and sex, this method indeed conceals influences for each of these three domains that potentially corrupt the result. The problems with PP get worse if it is desired to compare results from different indices because the PP that relates to the estimated 5th percentile is very different according to the index considered. Table 1⇓ shows the LLN (estimated 5th percentile) expressed as PP for both sexes for a variety of indices using the ECSC equations 7. The values for LLN range from 58% pred for residual volume to 87% pred for FEV1 as a percentage of FVC. If the PP 80% rule 3 were used, it might be falsely assumed that the result for residual volume was extremely low but that the FEV1 as a percentage of FVC was acceptable, whereas they are indeed both equivalent and at the LLN. This table indicates how difficult and potentially misleading it is to use PP to look for patterns of abnormality amongst lung function indices.
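The age dependence of the LLN when expressed as PP can be illustrated with a short calculation. The sketch below assumes the ECSC male FEV1 equation in its commonly published form (predicted FEV1 = 4.30·Ht − 0.029·age − 2.49, RSD 0.51 L, with Ht in m and age in yrs). These coefficients are not quoted in the present text and should be checked against reference 7 before any use; the function names are hypothetical.

```python
# Assumed ECSC male FEV1 reference equation (coefficients are NOT given in
# this article; they are an assumption to be verified against reference 7).
ECSC_MALE_FEV1 = {"height": 4.30, "age": -0.029, "intercept": -2.49, "rsd": 0.51}


def predicted_fev1_male(height_m, age_yr, eq=ECSC_MALE_FEV1):
    """Predicted FEV1 (L) from a linear height and age equation."""
    return eq["height"] * height_m + eq["age"] * age_yr + eq["intercept"]


def lln_as_percent_predicted(height_m, age_yr, eq=ECSC_MALE_FEV1, z=1.645):
    """Express the LLN (predicted - z * RSD) as a percentage of predicted."""
    pred = predicted_fev1_male(height_m, age_yr, eq)
    lln = pred - z * eq["rsd"]
    return 100.0 * lln / pred


pp_25 = lln_as_percent_predicted(1.77, 25)
pp_70 = lln_as_percent_predicted(1.77, 70)
print(f"LLN at 25 yrs: {pp_25:.1f}% pred; at 70 yrs: {pp_70:.1f}% pred")
```

Under these assumptions, the LLN for a male of Ht 1.77 m falls at roughly 81% pred at 25 yrs but roughly 73% pred at 70 yrs, so a fixed 80% cut-off cannot correspond to the 5th percentile at both ages.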
If PP were a valid method of expressing severity of impairment, then it might be expected that the lowest FEV1PP seen in patients would be roughly the same irrespective of sex, age and Ht. Looking at this another way, if FEV1PP were a valid measure of severity, then young patients with cystic fibrosis would die with a larger raw FEV1 than older people because their predicted value is larger than that found in older subjects. Figure 1⇑ shows the FEV1 data from all of the present patients plotted against age expressed as raw FEV1, FEV1PP, FEV1·Ht−2, FEV1SR and FEV1Q. For raw FEV1, it is striking that the lower boundary was roughly the same irrespective of age. For FEV1PP, the lower limit in the young subjects was lower than that seen in older subjects, i.e. young subjects can survive with an FEV1 that is a much lower PP than can older subjects. For FEV1·Ht−2 and FEV1Q, the lower boundary is flat, much as for raw FEV1. Figure 1⇑ confirms what is known from clinical practice, i.e. that young cystic fibrosis patients can survive with an absolute FEV1 just as low as can 70 yr olds, and so can survive with a much lower PP 19. This suggests that FEV1PP is also not the best method for estimating severity.
Standardised residuals
Use of SRs is the method endorsed by the ATS and ERS in their recommendations for determining whether or not an individual's lung function is outside the normal range 7, 8. The SR is commonly used in statistical analysis, with the term being synonymous with a z-score, and was first used in the context of lung function data in a study looking at patterns of abnormality in smokers 20. The advantage of this technique is that the units are the same for all types of index, and the SR indicates where a subject's result lies with regard to the Gaussian distribution of the normal population. Since 1.645 SRs below the predicted value is an estimate of the lower 5th percentile (1.96 SRs below estimates the 2.5th percentile), the level of deviation from predicted at which clinical interest is to be directed can be decided upon. The ATS and ERS 8 have suggested that 1.645 SRs below the predicted value is the level to use in patients to define the LLN. Implicit in this is that 5% of people who have been judged to be completely normal would now be considered as abnormal (i.e. they are false positives). For patients or symptomatic subjects, this may be acceptable, but, if an asymptomatic population of nondiseased subjects were being tested, the estimated 2.5th percentile might be chosen instead in order to minimise the number of false positive results.
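The correspondence between SR thresholds and percentiles of a Gaussian distribution can be checked with the Python standard library. This is a minimal sketch; the helper names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: SRs (z-scores) are in units of the RSD.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def percentile_below(sr):
    """Percentage of a normal reference population expected below this SR."""
    return 100.0 * standard_normal.cdf(sr)


def sr_for_percentile(percentile):
    """SR threshold corresponding to a given lower percentile."""
    return standard_normal.inv_cdf(percentile / 100.0)


for threshold in (-1.645, -1.96):
    print(f"SR {threshold}: {percentile_below(threshold):.2f}th percentile")
```

The two thresholds discussed above correspond to approximately the 5th and 2.5th percentiles, respectively.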
Using SRs to express the degree of abnormality below the LLN is more problematic since the predicted values for younger subjects are higher and thus, in terms of the number of RSDs available to fall, the younger are able to go lower. This can be seen in figure 3a⇑ for males, where point A is the predicted value for a male aged 25 yrs of Ht 1.77 m and point B is for a male of the same age and Ht with an FEV1 of 0.6 L. Points C and D are the equivalent for a male aged 70 yrs. The baseline of zero FEV1 is 8.6 and 6.1 SRs below the predicted values for these males aged 25 and 70 yrs, respectively. The FEV1 at B represents 7.44 SRs below the predicted value and point D represents an FEV1 of 4.9 SRs below the predicted value. It is not possible for the male aged 70 yrs to have an FEV1 of 7.44 SRs below the predicted value since this would require a negative FEV1, which is nonsensical. If the two subjects at B and D were indeed equivalently disabled, showed equivalent symptoms and had similar survival projections, then the SR method would not appear to reflect properly the degree of impairment. This is borne out in figure 1⇑, where the SRs go much lower in the younger subjects than in the older subjects.
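The figures in this example can be reproduced with a few lines of arithmetic. The sketch below assumes the ECSC male FEV1 equation in its commonly published form (4.30·Ht − 0.029·age − 2.49, RSD 0.51 L); the coefficients are not stated in the present text and are an assumption, as are the function names.

```python
# Assumed ECSC male FEV1 equation (not quoted in this article):
# predicted FEV1 = 4.30 * Ht(m) - 0.029 * age(yrs) - 2.49, RSD = 0.51 L.
RSD_L = 0.51
HEIGHT_M = 1.77


def predicted_fev1_male(height_m, age_yr):
    return 4.30 * height_m - 0.029 * age_yr - 2.49


def standardised_residual(observed_l, height_m, age_yr):
    """SR = (observed - predicted) / RSD."""
    return (observed_l - predicted_fev1_male(height_m, age_yr)) / RSD_L


for age in (25, 70):
    floor_sr = standardised_residual(0.0, HEIGHT_M, age)  # zero FEV1 baseline
    low_sr = standardised_residual(0.6, HEIGHT_M, age)    # points B and D
    print(f"age {age}: zero FEV1 = {floor_sr:.1f} SRs, FEV1 0.6 L = {low_sr:.2f} SRs")
```

With these assumed coefficients, the calculation returns the values quoted above: the zero baseline lies 8.6 and 6.1 SRs below predicted at 25 and 70 yrs, and an FEV1 of 0.6 L lies 7.44 and 4.9 SRs below predicted, respectively.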
Standardising by powers of height
The Framingham study 9 showed that FEV1 divided by Ht gave a reasonable prediction of long-term survival. Fletcher et al. 10 showed that FEV1·Ht−3 as a means of standardising FEV1 was the best method for evaluating lung function decline. This form of standardisation by Ht takes some sex and size differences into account, and it is these differences that make use of raw FEV1 problematic, especially when considering data from both sexes together.
Regression of log FEV1 against log Ht in the CCHS data gives a slope of 3.7, but only 0.33 of the variance in ln FEV1 was explained by ln Ht. The fit was not very good and this slope for both sexes suggested Ht to the power of three or four might provide the best fit. Figure 4⇓ shows histograms of CCHS FEV1 data expressed as FEV1PP, FEV1SR, FEV1Q and FEV1·Ht−3. Since these data were randomly acquired from a normal population with respect to their lung function, the distribution for a satisfactory method of expressing lung function for both sexes together should be normal. For raw data, the histogram would indeed show two separate distributions for males and females, with their known size differences (skewness of 0.57). When expressed as PP the distribution was negatively skewed (-0.27), and the same was true for SRs (-0.26). FEV1Q had a skewness of 0.23, but FEV1·Ht−2 gave a better fit for a normal distribution, with skewness of 0.15, and the fit was best for FEV1·Ht−3, with skewness of 0.00.
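The two calculations used in this paragraph, the slope of ln FEV1 on ln Ht and the skewness of a distribution, can be sketched as follows. This is a simple moment-based skewness and an ordinary least-squares slope; the exact estimators used by the statistical software are not stated in the text, and the function names and the synthetic values in the usage lines are purely illustrative.

```python
import math


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def loglog_slope(heights_m, fev1_l):
    """Least-squares slope of ln(FEV1) regressed on ln(Ht)."""
    xs = [math.log(h) for h in heights_m]
    ys = [math.log(f) for f in fev1_l]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check (not study data): FEV1 exactly proportional to Ht cubed.
heights = [1.5, 1.6, 1.7, 1.8, 1.9]
fev1 = [0.6 * h ** 3 for h in heights]
print(f"slope = {loglog_slope(heights, fev1):.2f}")
```

A slope near 3 in such a regression indicates that dividing FEV1 by Ht cubed removes most of the height dependence, and a skewness near zero indicates a symmetric distribution.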
Testing lung function impairment and survival
The hardest end-point for lung function impairment to be tested against is survival, and this is a clearly defined end-point. Table 2⇓ shows the mean ages and mean survival for each component of this large data set, and table 3⇓ shows the number of subjects, split into 10-yr age bands, with their mean±sd FEV1SR and survival, and the percentage of subjects who had died. Receiver operating characteristic (ROC) curves were calculated to investigate which method of expressing FEV1 was best, on its own, at predicting survival, and the area under the curve was best for FEV1Q (0.631 (95% confidence limit 0.624–0.637)), with FEV1·Ht−3 almost the same (0.626 (0.619–0.633)); next best was FEV1·Ht−2 (0.621 (0.614–0.628)), followed by raw FEV1 (0.606 (0.599–0.612)) and FEV1PP (0.586 (0.579–0.592)), with FEV1SR being worst at 0.571. Figure 5⇓ shows the ROC curves for FEV1Q and FEV1PP, with FEV1Q being more specific and no less sensitive than FEV1PP.
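The area under a ROC curve can be read as the probability that a randomly chosen subject who died had a lower index value than a randomly chosen survivor. A minimal sketch of this rank-based calculation is given below; it is not the routine used in the original analysis, the function name is hypothetical and the values in the usage line are toy numbers, not study data.

```python
def auc_lower_is_worse(index_values, died):
    """Rank-based AUC for an index where lower values predict death.

    Counts the fraction of (deceased, survivor) pairs in which the deceased
    subject has the lower index value; ties count as half.
    """
    dead = [v for v, d in zip(index_values, died) if d]
    alive = [v for v, d in zip(index_values, died) if not d]
    wins = 0.0
    for x in dead:
        for y in alive:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(dead) * len(alive))


# Toy example: four subjects, the first two died.
toy_auc = auc_lower_is_worse([1.0, 3.0, 2.0, 4.0], [True, True, False, False])
print(f"AUC = {toy_auc:.2f}")
```

An area of 0.5 indicates no discrimination and 1.0 indicates perfect separation, which places the values reported above (0.571–0.631) in context.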
The best FEV1 predictor of survival on multivariate analysis was determined from Cox's regression models, which were derived using each index, sex and age as predictors, with lung function in deciles. The best model for predicting survival was with FEV1Q, followed by FEV1·Ht−2, FEV1·Ht−3 and then FEV1PP, with each model significantly better than the next (p<0.05 (likelihood-ratio test)), with the hazard ratios for the results shown in table 4⇓. The FEV1Q column in table 4⇓ shows that the hazard ratios for older age groups were smaller for FEV1Q than for FEV1PP, but that the opposite was true for the hazard ratios associated with worsening lung function. This indicates that age per se plays a smaller part in prediction of survival for the model using FEV1Q than for the model with FEV1PP. The data were then split into data set A, comprising 12,181 subjects with an FEV1SR ranging from 0.0 to -1.645 (mean±sd -0.82±0.46), and data set B with 9,630 subjects with an FEV1SR of <-1.645 (mean±sd -2.81±0.95), i.e. all were below the LLN. Cox's models were generated for each of these two data sets using sex and quintiles of both function and age to determine whether or not the ability of the various FEV1 indices to predict survival was different in those with better (data set A) or worse (data set B) lung function. The best model for predicting survival in both data sets was with FEV1Q, followed, in order, by FEV1·Ht−3, FEV1·Ht−2 and then FEV1PP. For each data set, the FEV1Q model was significantly better than the other three models, and the FEV1PP model was significantly worse than the others (p<0.001 (likelihood-ratio test)). Models with FEV1SR were very much worse with little utility. Thus the superiority of FEV1Q in predicting survival was not affected by the range of lung function being considered.
Each method of expressing lung function was then split into the top quartile, as the reference group for normal survival, and then the remaining values for the index were divided into a further nine bins of subjects using cut-off levels derived as follows. If Xq were the value defining the upper quartile, the other bins were defined at Xq, Xq×9/10, Xq×8/10 . . . Xq×1/10. The worst two groups were combined since the number of subjects was <10 in the lowest groups. Regression models using only these bins as predictors were derived without sex or age as predictors, with the results shown in table 5⇓. Again, FEV1Q was the best predictor, with FEV1·Ht−3 being the next best. Lastly, Cox's regression models were derived for simple numerical cut-off levels of FEV1Q and FEV1·Ht−3 that might easily be applied in lung function laboratories (<1.0, 1.0–1.9, 2.0–2.9 . . . 6.0–6.9 and ≥7.0 for FEV1Q, with the cut-off levels for FEV1·Ht−3 being numerically a tenth of these). Median survival was calculated for each group, with the results shown in figure 6⇓. The survival curves in figure 6⇓ predicted from FEV1Q and FEV1·Ht−3 could each be accurately fitted by polynomial functions, as presented in table 6⇓.
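The two binning schemes described above can be sketched as follows. The function names are hypothetical, and treating each simple band as a half-open interval (e.g. 1.0–1.9 meaning 1.0 ≤ FEV1Q < 2.0) is an assumption, since the text does not state how values between the printed limits were assigned.

```python
def proportional_cutoffs(xq):
    """Cut-off levels Xq, Xq*9/10, ..., Xq*1/10 below the upper quartile Xq."""
    return [xq * k / 10 for k in range(10, 0, -1)]


def fev1q_band(fev1q):
    """Label for the simple FEV1Q cut-off bands (<1.0, 1.0-1.9, ..., >=7.0).

    Assumes half-open intervals, e.g. 1.0 <= FEV1Q < 2.0 for the 1.0-1.9 band.
    """
    if fev1q < 1.0:
        return "<1.0"
    if fev1q >= 7.0:
        return ">=7.0"
    lower = int(fev1q)  # whole-number part, valid here because 1.0 <= fev1q < 7.0
    return f"{lower}.0-{lower}.9"


print(proportional_cutoffs(5.0))
print(fev1q_band(3.4))
```

The same band function could be applied to FEV1·Ht−3 after multiplying by 10, since its cut-off levels are numerically a tenth of those for FEV1Q.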
DISCUSSION
The present study has shown that the currently widely used PP method is significantly inferior to other methods for expressing FEV1 when considering the relation between lung function and subsequent survival. In this context, it has been shown that standardisation using a power of Ht is much better than using FEV1PP or FEV1SR. Overall FEV1·Ht−3 has a slight edge as the best power of Ht for removing sex and size bias, and, in a random normal population, this measure was normally distributed. However, the best way of expressing impairment was with use of FEV1Q, a novel method we propose for expressing lung function data. Like FEV1PP and FEV1SR, however, it depends on sex, since the denominator is sex-dependent. Choosing the best method in a given circumstance is dependent upon the aspect of clinical care or management that is relevant. We chose the method that was best for predicting all-cause mortality as this information was readily available to us, and it is a well-defined end-point that is ultimately the most important outcome in any medical condition. The results presented here may not be correct for an alternative end-point, such as symptoms like breathlessness, and this aspect needs testing in other appropriate data sets. Focussing on respiratory mortality alone might further enhance the prediction; it has previously been shown, in a general population sample, that the hazard ratios of cut-off levels of FEV1·Ht−2 for predicting death caused by respiratory disease were 10 times higher than for all-cause mortality in the more severely affected subjects 11.
FEV1 has been found by many authors to relate to survival in the general population 9, 11–13, but the exact reason for this is not clear. It is possible that this link occurs because genes that are associated with worse lung function are in some way co-located with genes that determine susceptibility to common diseases, such as cancer or cardiovascular disease. In support of this, the Framingham study found that FEV1 did not relate to survival in the elderly 9. However, it has recently been shown that lung function still predicts survival in a cohort of 95 yr olds 18, when a putative link between lung function and other disease risks would have been mainly spent. This suggests that having lower lung function may mean that other diseases are more likely to be fatal, for example by predisposing to pneumonia following a stroke, but a firm causal link has not yet been proven.
The problems with the PP method relate to the proportional assumption implicit in this expression and the numerical aspects that lead to PP retaining unwanted age, sex and Ht bias. A further issue is that all assessments of impairment are currently based on looking at how far a subject has fallen from an estimated predicted point that is deemed appropriate for the subject's sex, age and Ht. This predicted value includes a lot of uncertainty, and so the resulting index includes this uncertainty, and perhaps contributes to why the PP method is not the best index. We chose to turn the issue on its head and concentrate on looking at how far a subject is above the bottom line. Using a zero value as the bottom line is no good since, with data for both sexes, size differences obscure the signal. Using Ht standardisation is effective in this respect and it seems, from the combined data here, that standardisation by Ht cubed is better than using lower powers of Ht.
We here propose a new concept for expressing spirometric data, which was suggested to us from the observation in figure 1⇑ that there is an absolute lower limit of FEV1 seen in laboratory testing. It was then found that this limit is slightly lower in females and, if a subject's FEV1 is standardised by the relevant sex-specific lower limit, this gives the number of turnovers of FEV1 left for the subject. This index, FEV1Q, is the best overall predictor for use with regard to predicting survival. This is true with multivariate (table 4⇑), as well as univariate, analysis (table 5⇑). This represents a change in thinking to concentrate on what function it is known a subject has left to survive on, rather than on what it is thought that they might have lost. In a clinical setting, there is always awareness of the age and sex of the patient and appreciation that survival is related to both of these attributes. However, an index may be of greater utility to a clinician if it does not require any additional manipulation in order to take these into account. With FEV1Q and FEV1·Ht−3, this is possible, but, if FEV1PP were to be used, then this index must be manipulated with a complex function in order to take into account the age and sex of the subject before it can accurately be used to predict survival potential. Since an age effect is retained in FEV1Q and FEV1·Ht−3, it is possible that these indices might not be so suitable if it were necessary to focus solely on the extent of lung function abnormality per se and totally avoid any age effect, or, alternatively, if there were a research need to tease out the exact effect of age, as distinct from lung function, on an aspect of medical interest. 
Although FEV1PP attempts to account for age effects on lung function by use of prediction equations, this method introduces other age, sex and Ht biases from the equation used and from the assumption of proportionality; these effects introduce noise into the signal and reduce the overall ability of a researcher to predict mortality. In analysis of the effect of time-related exposures on lung function, such as in occupational medicine, there may be an advantage in using FEV1Q or FEV1·Ht−3 as they do not hide such biases. It has been shown here that FEV1Q and FEV1·Ht−3 are the best indices for investigating all-cause mortality, but how well they relate to symptoms and other markers of lung disease remains to be tested.
The results in figure 6⇑, based on simple cut-off points of the lung function index alone, can be used clinically to judge the severity of a situation with regard to survival by using the appropriate polynomial prediction from table 6⇑. This represents a potential benefit to patients for initiating treatment strategies and for disease severity stratification in future research into outcomes of lung diseases. Thus, for example, a subject’s FEV1 would first be tested to see whether or not it was outside the expected range (e.g. below the lower 90% confidence limit, i.e. an FEV1SR of <-1.645), and then the severity of any abnormality could be judged using the estimated median survival for the FEV1Q. This survival could, if needed, be related to the estimated survival for the predicted FEV1Q. In relating any other measurements that might arise in research to severity, using the FEV1Q itself would suffice. Table 7⇓ shows some examples of how lung function results can be expressed using the present results.
The method for standardising FEV1 using the lowest sex-specific FEV1 (FEV1Q) is a way of avoiding the difficulties of using raw FEV1 because of sex and size differences between individuals. This method expresses an individual's FEV1 as the number of turnovers of the bottom line level of lung function that remain. It has been found that decline in FEV1 in normal subjects is greater in males than females, and is defined in terms of absolute loss of volume 21. For never-smokers aged >50 yrs the annual FEV1 decline has been estimated to be 28 mL in males and 22 mL in females 22, giving a female to male ratio of 0.78, which is approximately the same as the female to male ratio of 1st percentile FEV1 of 0.8. In patients with asthma, FEV1 decline has been found to be ∼50 mL·yr−1 in intrinsic asthmatics and ∼23 mL·yr−1 in extrinsic asthmatics 23, and ∼32 mL·yr−1 in asthmatic nonsmokers aged 40–60 yrs and ∼26 mL·yr−1 in those aged >60 yrs 21. None of these studies suggested that this loss was a proportional effect. These longitudinal findings can be easily applied to FEV1Q data in that, for nonsmokers aged 40–60 yrs, a decrease in FEV1Q of 1.0 would take ∼18 yrs 22. In smokers aged >60 yrs, their accelerated loss would equate to a decrease in FEV1Q of 1.0 every 10 yrs 21. These estimates of FEV1Q decline are independent of sex because the proportional difference in lung function decline between males and females appears to be an approximate fit for the observed sex difference in 1st percentile FEV1.
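The arithmetic behind the ∼18 yrs estimate can be made explicit. One unit of FEV1Q equals the sex-specific 1st percentile FEV1 (500 mL in males, 400 mL in females), so dividing this by the annual decline gives the time taken to lose 1.0 of FEV1Q. The sketch below uses only the never-smoker decline rates quoted above; the names are hypothetical.

```python
# Values quoted in the text: 1st percentile FEV1 and never-smoker annual decline.
FIRST_PERCENTILE_ML = {"male": 500.0, "female": 400.0}
NEVER_SMOKER_DECLINE_ML_PER_YR = {"male": 28.0, "female": 22.0}


def years_per_unit_fev1q(sex, decline_ml_per_yr):
    """Years needed for FEV1Q to fall by 1.0 at a constant absolute decline."""
    return FIRST_PERCENTILE_ML[sex] / decline_ml_per_yr


years_male = years_per_unit_fev1q("male", NEVER_SMOKER_DECLINE_ML_PER_YR["male"])
years_female = years_per_unit_fev1q("female", NEVER_SMOKER_DECLINE_ML_PER_YR["female"])
print(f"male: {years_male:.1f} yrs; female: {years_female:.1f} yrs per 1.0 FEV1Q")
```

Both sexes give a value close to 18 yrs, which is why the estimate of FEV1Q decline is approximately independent of sex.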
As FEV1 declines, the likelihood of terminal hypoventilatory failure increases, and the fact that females appear to have a smaller absolute lower limit of FEV1 can be explained in two ways. First, ventilatory dead space 24, 25 and airway volume 26, which are closely correlated to anatomical dead space, have been shown to be smaller in females and are related to their overall smaller stature. Secondly, females also exhibit a lower basal metabolic rate than males 27 and so have a lower basal ventilatory demand. Thus females may be able to survive to a lower absolute FEV1 than males.
It is concluded that the FEV1PP method is not ideal for expressing lung function impairment and should be dropped in favour of a new method of expressing FEV1 impairment called FEV1Q, with FEV1·Ht−3 being the next best alternative. Future work should determine how these expressions of spirometric lung function impairment relate to symptoms and whether other lung function indices can be managed in a similar way.
Statement of interest
None declared.
Acknowledgments
We thank P. Lange (Hvidovre Hospital, Hvidovre, Denmark) and the research group of the Copenhagen City Heart Study (Epidemiological Research Unit, Bispebjerg University Hospital, Copenhagen, Denmark) for releasing the spirometric data for this analysis. We thank A. Dirksen (Gentofte University Hospital, Hellerup, Denmark) for the inclusion of the 1,095 chronic obstructive pulmonary disease patients followed for 15 yrs.
- Received February 14, 2009.
- Accepted August 27, 2009.
- © ERS Journals Ltd