Abstract
Published prediction equations for lung function differ considerably, but the components of population variation responsible for the differences are unknown.
Data were analysed for 6,323 never-smoking adults who did not report wheeze or asthma, from 42 centres participating in the European Community Respiratory Health Survey. Means and components of variance were estimated for males and females aged 20–24 yrs, and the relationships with age and height were examined in those aged 25–44 yrs.
Mean lung function for those aged 20–24 yrs differed between centres, but the variation could not be wholly attributed to differences in population or equipment. The maximum difference in means by equipment type was 101 mL, for FVC in males. Equipment differences were not statistically significant after adjustment for country, but differences in mean forced expiratory volume in one second and forced vital capacity by country, adjusted for instrument, were statistically significant in males. Differences between centres in the relationships with age and height had less influence on predicted values.
In conclusion, there are unexplained differences in lung function between ethnically similar nonsmoking symptom-free populations. Neither national reference curves nor those based on the same ethnic group can be guaranteed to give accurate norms of lung health.
Spirometric lung function measurements are used clinically for diagnosis and monitoring, and have many uses in research. In clinical use, forced expiratory volume in one second (FEV1) and forced vital capacity (FVC) are usually each expressed as a percentage of predicted value for height, age and sex. For example, the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria classify patients with chronic obstructive pulmonary disease into categories according to post-bronchodilator FEV1/FVC and FEV1 % predicted 1. Researchers often analyse FEV1 % pred as an outcome variable 2, 3 or select patient groups according to FEV1 % pred 4.
The GOLD criteria do not specify the reference equations that should be used to obtain predicted values, despite evidence that the published predicted values for FEV1 and FVC differ considerably 5, for both males and females. Recent recommendations for lung function testing have suggested that reference values should be derived from lung function measurements from a population including the “age range, sex and ethnic group of individuals to be tested” 6. This assumes that predicted values should depend only on these three parameters. Each of the equations compared by Roca et al. 5 was a regression equation linear in height and age fitted to data for adults, mostly those aged ≥25 yrs 7–11. Hence, reference equations constructed from adult data usually have four elements: the mean value at the youngest age considered fully mature; the adjustment for height; the relationship with age; and the residual standard deviation. Each of these may differ between populations and over time.
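The four elements listed above can be made concrete with a small sketch of a linear adult reference equation. The class name, its fields and every coefficient in the usage example are hypothetical, chosen only for illustration; they are not ECRHS estimates or any published equation. Holding the prediction at the plateau value below the youngest mature age is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEquation:
    """Hypothetical linear adult reference equation with four elements."""
    mean_at_age0: float   # element 1: mean (L) at youngest mature age, at ref_height
    age0: float           # youngest age considered fully mature (yrs)
    ref_height: float     # height (m) at which mean_at_age0 applies
    height_slope: float   # element 2: change in L per m of height
    age_slope: float      # element 3: change in L per yr after age0 (negative = decline)
    rsd: float            # element 4: residual standard deviation (L)

    def predicted(self, height_m: float, age_yrs: float) -> float:
        # Assumption: no age effect before age0 (plateau), linear decline after.
        years_past = max(age_yrs - self.age0, 0.0)
        return (self.mean_at_age0
                + self.height_slope * (height_m - self.ref_height)
                + self.age_slope * years_past)

    def percent_predicted(self, observed: float, height_m: float, age_yrs: float) -> float:
        # Observed value expressed as a percentage of the predicted value.
        return 100.0 * observed / self.predicted(height_m, age_yrs)

    def lower_limit_of_normal(self, height_m: float, age_yrs: float, z: float = 1.645) -> float:
        # 5th centile, assuming normally distributed residuals.
        return self.predicted(height_m, age_yrs) - z * self.rsd


# Hypothetical coefficients for illustration only.
eq = ReferenceEquation(mean_at_age0=4.5, age0=25.0, ref_height=1.79,
                       height_slope=5.0, age_slope=-0.03, rsd=0.5)
print(eq.predicted(1.79, 35))                # prediction at age 35 yrs, reference height
print(eq.percent_predicted(3.78, 1.79, 35))  # FEV1 % pred for an observed 3.78 L
```

A change in any one field shifts predicted values differently: `mean_at_age0` moves every prediction, whereas `age_slope` matters more with increasing age.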
The reasons for the differences in predicted values were not explored by Roca et al. 5, but the regression coefficients in the published equations suggest that these include variation in the relationship with height or age between populations. Calculations show some variation in predicted values for those aged 25 yrs and those of average height.
Roca et al. 5 found variation between the mean values for adults aged 20–44 yrs from 34 centres participating in the European Community Respiratory Health Survey (ECRHS). That variation was not partitioned into differences in mean values in young adults and differences in the relationships with age and height; knowledge of these components can inform debate about appropriate reference values. The analysis presented in the current study describes variation in FEV1 and FVC in healthy nonsmoking adults aged 20–24 yrs in 42 centres taking part in ECRHS I. The relationships with height and age in adults aged 25–44 yrs are also estimated.
METHODS
Participants and study design
The protocol for ECRHS I has been described in detail elsewhere 12. Participating centres invited young adults from an area defined by pre-existing administrative boundaries, with a population of ≥150,000 people. At stage one, where possible, an up-to-date sampling frame was used to randomly select at least 1,500 males and 1,500 females aged 20–44 yrs, who were sent a self-completed postal questionnaire. A random sample of responders to stage one was invited to stage two, which included an administered questionnaire and measurement of lung function. The administered questionnaire in stage two included the question: “Have you ever smoked for as long as a year?” Stage two was carried out from 1990 to 1995 across 42 centres, which included the 34 centres in the analysis of Roca et al. 5 and eight that contributed data later. The centre in Bombay (India) was omitted, as were those in Aarhus (Denmark) and Wroclaw (Poland), where the equipment used was unknown. Most centres were in Western Europe; the remainder were three in New Zealand, one in Australia, six in Canada and one in the USA. Ethnic origin was not recorded, but participants were known to be almost exclusively White.
Spirometry
The equipment used was Biomedin spirometer (Biomedin, Padua, Italy; 16 centres), SensorMedics spirometer (SensorMedics, Yorba Linda, CA, USA; nine centres), Spirotech spirometer (Spirotech, Bilthoven, the Netherlands; eight centres), Jaeger pneumotach (Jaeger, Hoechberg, Germany; three centres), Morgan spirometer (Morgan, Haverhill, MA, USA; three centres), Morgan pneumotach (Morgan; one centre), Fleisch pneumotach connected to a Hewlett-Packard lung function analyser (Massachusetts, USA; one centre) and Vitalograph spirometer (Vitalograph, Buckingham, UK; one centre), each of which complied with American Thoracic Society (ATS) standards. The maximum FEV1 and maximum FVC of up to five technically acceptable blows were determined, and it was recorded whether FEV1 and FVC each met the ATS criterion for reproducibility 13. Height was recorded prior to spirometry: of the 42 centres, it was measured in 31, self-reported in five, and in six it was not recorded whether height was measured or asked. Each European centre took part in a centrally organised training day, and was visited by a central organiser who checked that the common protocol for lung function testing was observed.
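The per-participant summary described above (maximum value over up to five acceptable blows, plus a reproducibility flag) can be sketched as follows. The reproducibility rule used here is the one stated in the statistical analysis, i.e. the two largest values do not differ by >0.2 L. The function name is hypothetical, and treating a single acceptable blow as not reproducible is an assumption of this sketch.

```python
def best_and_reproducible(blows, tolerance_l=0.2):
    """Return (maximum value, reproducible flag) for one measure (FEV1 or FVC).

    blows: values in litres from technically acceptable manoeuvres (up to five).
    The flag is True when the two largest values differ by no more than tolerance_l.
    """
    if not 1 <= len(blows) <= 5:
        raise ValueError("expected between one and five acceptable blows")
    ranked = sorted(blows, reverse=True)
    if len(ranked) < 2:
        # Assumption: reproducibility cannot be shown from a single blow.
        return ranked[0], False
    # Small epsilon so that a difference of exactly 0.2 L is not lost to float rounding.
    return ranked[0], (ranked[0] - ranked[1]) <= tolerance_l + 1e-9


# Hypothetical FEV1 blows (L) for one participant.
print(best_and_reproducible([4.40, 4.52, 4.31]))
```

FEV1 and FVC are handled separately, so the two maxima may come from different blows, as in the protocol above.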
Statistical analysis
All analyses were carried out for never-smokers who did not report ever having had asthma, or wheeze in the last 12 months, for males and females separately. Data were divided a priori between participants aged 20–24 yrs and those aged 25–44 yrs, since 25 yrs has been reported as the age from which lung function starts to decline 11, at least in males 8. Falaschetti et al. 14 demonstrated a plateau up to around age 25 yrs for FEV1 and FVC in males, and for FVC in females. Variation of FEV1 and FVC in those aged 20–24 yrs was analysed for heterogeneity between centres using Bartlett's test. The variation between individuals and between centres was estimated without and with adjustment for height. This was carried out using multilevel models with participant at level one and centre at level two, with height included as a covariate at level one for the height-adjusted components. The regression coefficients for height from the models were used to calculate height-adjusted values. Values were analysed according to whether they met the ATS reproducibility criterion or not, i.e. that the two largest values did not differ by >0.2 L 13. Height-adjusted centre means were analysed in relation to response rate, type of instrument used and country. A meta-analysis method was used to estimate heterogeneity of means and relationships with height and age between centres, for those aged 25–44 yrs 15. The percentage of total variation across centres due to chance was calculated 16.
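The between-centre and within-centre components, and the ICC derived from them, can be illustrated without multilevel-modelling software. The sketch below swaps in the classical one-way ANOVA (method-of-moments) estimator for a random-intercept model; it targets the same two components as the multilevel models described above but is not the fitting method used in the study. Height adjustment is shown as a simple shift using a user-supplied slope. All function names and the toy data are hypothetical.

```python
from statistics import fmean


def height_adjust(value, height, slope, ref_height):
    """Shift one measurement to a common reference height using a fitted
    height slope (L per m); slope and ref_height are supplied by the user."""
    return value - slope * (height - ref_height)


def variance_components(groups):
    """One-way ANOVA (method-of-moments) estimates for a random-intercept model.

    groups: dict mapping centre -> list of values (e.g. FEV1 in L).
    Returns (between-centre variance, within-centre variance, ICC).
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    means = {c: fmean(v) for c, v in groups.items()}
    grand_mean = fmean([x for v in groups.values() for x in v])

    ss_between = sum(len(v) * (means[c] - grand_mean) ** 2 for c, v in groups.items())
    ss_within = sum((x - means[c]) ** 2 for c, v in groups.items() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective centre size, allowing for unequal numbers per centre.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)  # truncated at zero
    icc = var_between / (var_between + ms_within)
    return var_between, ms_within, icc


# Toy data for two hypothetical centres.
print(variance_components({"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0]}))
```

For unbalanced data the ANOVA and likelihood-based multilevel estimates generally differ somewhat, which is why this is only a sketch of the idea.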
RESULTS
In the 42 centres, there were 44,801 respondents to the stage one questionnaire in the sample randomly selected for stage two, of whom 21,503 (48.0%) participated in stage two and 18,160 (40.5%) had an acceptable FEV1. Response rates varied considerably between centres (table 1). Of those taking part in stage two, 18,160 (84.5%) provided an acceptable FEV1, ranging from 52.2% to 98.5% across centres. Of these, 17,545 (96.6%) met the ATS criterion. The number of never-smokers who did not report wheeze in the last 12 months or ever having asthma was 6,323, of whom 1,234 were aged <25 yrs (table 1).
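Each percentage in the paragraph above follows from the reported counts, so they can be recomputed directly. The helper name `pct` is hypothetical; the counts are those given in the text.

```python
# Counts as reported in the text above.
responders_stage1 = 44_801
participants_stage2 = 21_503
acceptable_fev1 = 18_160
met_ats = 17_545


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(participants_stage2, responders_stage1))  # 48.0
print(pct(acceptable_fev1, responders_stage1))      # 40.5
print(pct(acceptable_fev1, participants_stage2))    # 84.5
print(pct(met_ats, acceptable_fev1))                # 96.6
```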
Adults aged 20–24 yrs
There were 605 males aged <25 yrs with a valid FEV1 who did not report wheeze, ever having had asthma or ever having smoked for as long as 1 yr. Of these, 604 also had a valid FVC. There were 629 females meeting these criteria with FEV1 and FVC measurements. There was some evidence that the degree of variation in FEV1 (p = 0.042 for males; p = 0.047 for females) and in FVC for females (p = 0.041) differed between centres, but this was not the case for FVC in males (p = 0.85).
Table 2 shows the mean, the average within-centre variation and the between-centre variation for males and females. Without adjustment for height, differences between centre means accounted for at most 10% of the total between-person variation, as shown by the intraclass correlation coefficients (ICCs). This centre variation increased the standard deviation of a single lung function measurement by between 3% (for females) and 6% (for FEV1 in males), as shown by the ratios of the total single-determination standard deviation to the within-centre standard deviation in table 2. No statistically significant variation between centres in the relationship of FEV1 or FVC with height was detected in either males or females. The mean±sd height was 1.79±0.07 m for the young males and 1.66±0.07 m for the young females. Adjustment for height reduced each component of variation but had little effect on the ICCs (table 2). Although the ICCs were small, the differences between centres were highly statistically significant (p-value for heterogeneity <0.0001 in each case). Chance accounted for less than half of the observed variation between centres (FEV1 27.6% in males and 32.6% in females; FVC 42.5% and 26.0%, respectively).
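The link between a small ICC and a small increase in the standard deviation follows from the variance decomposition: if total variance = within + between and ICC = between/total, then total/within = 1/(1 − ICC), so the ratio of standard deviations is 1/√(1 − ICC). A minimal sketch, with hypothetical function names and illustrative ICC values:

```python
from math import sqrt


def sd_inflation(icc):
    """Ratio of the total single-determination SD to the within-centre SD."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("ICC must lie in [0, 1)")
    return 1.0 / sqrt(1.0 - icc)


def icc_from_sd_ratio(ratio):
    """Inverse relation: ICC implied by a total/within SD ratio."""
    if ratio < 1.0:
        raise ValueError("total SD cannot be smaller than within-centre SD")
    return 1.0 - 1.0 / ratio ** 2


# Illustrative ICC values, not those of table 2.
for icc in (0.05, 0.10):
    print(f"ICC {icc:.2f}: SD inflated by {100 * (sd_inflation(icc) - 1):.1f}%")
```

For example, an ICC of 0.10 inflates the SD by only about 5%, which is why highly significant centre heterogeneity can coexist with a modest effect on the width of a reference range.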
ATS criterion
Mean FEV1, and FVC in males, did not differ significantly between measurements that met the ATS criterion and those that did not. Mean FVC was 0.32 L lower (95% confidence interval (CI) 0.09–0.55) in females for whom the criterion was met. This factor did not explain any of the between-centre variation in either measure for either sex (data not shown).
Relationship of centre means at ages 20–24 yrs with response rate and type of instrument
There was no evidence for a relationship of height-adjusted centre mean FEV1 (p = 0.72) or FVC (p = 0.57) in males, or of FVC in females (p = 0.17), with overall centre response rate, but there was some evidence that mean FEV1 in females increased with response rate (0.021 L per 10% increase in response rate; 95% CI 0.001–0.041; p = 0.040). Mean FEV1 by instrument ranged from 4.33 L (Vitalograph) to 5.14 L (Morgan pneumotach) in males, and from 3.40 L (Fleisch) to 3.72 L (Morgan pneumotach) in females (fig. 1), with corresponding differences in FVC. The variation in means by instrument was statistically significant for FVC in males (p = 0.021), accounting for 29% of the centre variation. The differences by instrument type were not statistically significant for FEV1 in males (p = 0.06; 20% of variation explained), FEV1 in females (p = 0.44; <1%) or FVC in females (p = 0.81; 0%). Comparing the three makes of spirometer that were used in more than one country (Biomedin, SensorMedics and Spirotech), there were no significant differences in mean FEV1 or FVC by make after adjustment for country, but country differences adjusted for make were statistically significant in males (FEV1 p = 0.0005; FVC p = 0.0027). When instruments were divided a priori into Biomedin, other spirometers and other types of instrument, there were significant differences between the groups in mean FEV1 in males adjusted for country (p = 0.009), but not in the other measures; without adjustment for country, only FVC in males differed significantly (p = 0.017).
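Estimates in the paragraph above are reported with both a 95% CI and a p-value. Under a Wald-type (normal) approximation these are linked: the CI half-width equals 1.96 standard errors, so the p-value can be recovered from the interval and used as an internal consistency check. The sketch below assumes a symmetric Wald interval on the analysis scale; the function names and inputs are hypothetical.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p-value implied by an estimate and its CI.

    Assumes a Wald-type interval: symmetric about the estimate, with
    half-width z_crit * SE.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ci_is_symmetric(estimate, lower, upper, rel_tol=0.05):
    """Flag intervals whose midpoint is far from the estimate, for which the
    Wald back-calculation above is not appropriate."""
    half_width = (upper - lower) / 2
    return abs((lower + upper) / 2 - estimate) <= rel_tol * half_width


# Hypothetical inputs: an estimate exactly 1.96 SE from zero gives p close to 0.05.
print(p_from_ci(1.96, 0.0, 3.92))
```

An interval whose lower limit lies well above zero implies a p-value well below 0.05, so a reported CI and p-value that disagree markedly suggest a transcription error.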
Figure 2 shows mean FEV1 by country, ranging from 4.18 L (Norway) to 5.02 L (New Zealand) in males, and from 3.23 L (Norway) to 3.68 L (Ireland) in females, with corresponding differences in FVC. The percentage of centre variation explained by country was 52% (p = 0.002) and 34% (p = 0.035) for FEV1 in males and females, respectively, and 53% (p = 0.002) for FVC in males but 0% (p = 0.52) for FVC in females. Between-country differences were statistically significant after adjustment for instrument group in males (FEV1 p = 0.0006; FVC p = 0.0004), but not in females (FEV1 p = 0.073; FVC p = 0.348).
Relationships with age and height for subjects aged 25–44 yrs
There was some heterogeneity between centres in the relationship of FEV1 with age (p = 0.001 in males; p = 0.048 in females) and of FVC with age (p = 0.008 in males; p = 0.006 in females), adjusted for variation in height. Chance accounted for 55.8%, 71.8%, 62.4% and 60.7% of the observed between-centre variation in these relationships with age, respectively. There was heterogeneity in the relationship of FEV1 with height (p = 0.009) and of FVC with height (p = 0.003) in females, adjusted for age, but not in males (p = 0.809 and 0.291, respectively). Across centres, there was no association between the relationship with age in those aged 25–44 yrs and the mean for participants aged 20–24 yrs, for either measure in males or females. The estimated relationships with age in subjects aged 25–44 yrs are shown in table 3. The estimated linear decline in FEV1 with age did not differ significantly between males and females (p = 0.283), and there was weak evidence for a sex difference in the decline in FVC with age (p = 0.046). However, there was some evidence that the relationship of each measure with age was nonlinear in females, with a greater decline in those aged >40 yrs (data not shown). Both FEV1 and FVC increased more per centimetre of height in males than in females (table 3). The components of variance in table 2 were virtually unchanged on adjustment for the statistically nonsignificant relationships of height with age within the age range 20–24 yrs.
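Heterogeneity of centre-specific relationships (for example, slopes of FEV1 on age) can be summarised by inverse-variance pooling and Cochran's Q. The sketch below also returns I² and, as an assumption about how the share of variation "due to chance" is defined, takes that share as 100 − I², i.e. the fraction df/Q of the observed dispersion that would be expected without true heterogeneity. The function name and inputs are hypothetical, and this is not necessarily the exact meta-analysis procedure used in the study.

```python
def heterogeneity(estimates, std_errors):
    """Inverse-variance pooling of centre-specific estimates with Cochran's Q.

    Returns (pooled estimate, Q, I^2 in %, share attributed to chance in %),
    where the chance share is taken as 100 - I^2 (an assumption).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2 = (Q - df) / Q, truncated at zero when Q <= df (no excess heterogeneity).
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2, 100.0 - i2


# Hypothetical slopes (L per yr) and standard errors for three centres.
print(heterogeneity([-0.030, -0.022, -0.035], [0.004, 0.005, 0.006]))
```

A p-value for Q can be obtained from the chi-squared distribution with df degrees of freedom, for example with `scipy.stats.chi2.sf(q, df)` if SciPy is available.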
DISCUSSION
Of the four elements of adult reference equations, i.e. the mean in early adulthood, the relationship with height, the decline with age and the variation around the predicted value, the present study has shown that the mean values at the youngest mature age may differ substantially between populations. Between-population variation in the other components is likely to have less influence, or almost none. Although the current data were obtained in young adults, and variation in relationships with age may be more important at older ages, mean values at the age of maximum lung function are important components of reference equations.
Although there was evidence for variation between centres in the relationship of lung function with age, more than half of the observed variation could be attributed to chance. For most uses, in diagnosis and patient selection, cross-sectional reference curves are appropriate. Several authors have found differences between cross-sectional associations with age and longitudinal decline with age 17, but not all in a consistent direction 18, 19. The longitudinal decline with age should represent the true mean decline due to the ageing process, but estimates may be affected by selective participation in multiple surveys, learning effects and healthy survivor effects. Cross-sectional relationships with age will encompass cohort effects as well as healthy survivor effects, but are less affected by participation bias than longitudinal estimates. These different influences may explain the discrepancies between the cross-sectional and longitudinal findings. Clearly, it is desirable to allow for pure cohort effects in reference equations, but part of the decline with age may be due to increasing ill health, and earlier cohorts may have poorer health than later cohorts. Hence, full age adjustment may lead to underdiagnosis of lung disease. Although never-smokers without wheeze or asthma were selected, in common with most studies that reported reference equations, asymptomatic disease could not be ruled out. The increasing decline with age, observed in the current females (data not shown) and reported by others 17, 18, 20, may also be due to effects of poorer health.
The current authors chose to compare means in those aged 20–24 yrs since several studies have estimated the decline after that age 8, 11, while others have modelled means from an earlier age 14, 20–23. Studies that have modelled variation from subjects aged ≤20 yrs do not show a consistent age of maximum lung function. Falaschetti et al. 14 found an earlier decline in FEV1 in females than in males but an extended plateau in FVC; Gulsvik et al. 20 showed an apparent maximum lung function in male subjects at age 30 yrs and stated that curves in males and females were parallel; and Langhammer et al. 23 demonstrated a decline in male subjects from age 20 yrs but a later decline in females. These differences may be an artefact of the various forms of equation used to model mean lung function. It is unlikely that the sample sizes available, such as ∼6,000 in the Health Survey for England across the age range from 16 yrs to ≥85 yrs 14, would be adequate to distinguish between these models.
By far the most difficult issue is what mean value of each measure of lung function in subjects aged 20–24 yrs, or at the age of maximum lung function, should be used in reference equations. Considered as random effects, the centre differences increased the residual standard deviation, and hence the width of a reference range, by at most 6% (table 2); nevertheless, the differences in means by type of instrument were not negligible (fig. 1). Compared with between-centre variation, differences between the eight types of equipment were not generally statistically significant, but this analysis of 42 mean values does not rule out important real differences. Assuming a normal distribution, and therefore taking the mean minus 1.64 total standard deviations of FEV1 from table 2 14, 21, the estimated 5th centiles of FEV1 for males and females of average height aged 20–24 yrs are 3.94 L and 2.87 L, respectively; using instead the minimum and maximum means from figure 1 with the within-centre variation from table 2, they range from 3.48 to 4.29 L for males and from 2.73 to 3.05 L for females. By comparison, the ATS 13 and the European Respiratory Society 11 each state that calibration of equipment should achieve readings to within ±50 mL.
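The 5th-centile calculation described above is the mean minus about 1.645 standard deviations under a normal distribution. The sketch below shows that calculation and how the spread of group means carries straight through to the lower limit when a common within-group SD is assumed. The function names and all inputs in the example are hypothetical; they are not the values in table 2 or fig. 1.

```python
from statistics import NormalDist

# Lower 5% point of the standard normal, about -1.645 (rounded to 1.64 in the text).
Z_5TH = NormalDist().inv_cdf(0.05)


def fifth_centile(mean, sd):
    """5th centile of a normal distribution with the given mean and SD."""
    return mean + Z_5TH * sd


def fifth_centile_range(group_means, within_sd):
    """5th centiles implied by the lowest and highest group means, assuming
    every group shares the same within-group SD."""
    return (fifth_centile(min(group_means), within_sd),
            fifth_centile(max(group_means), within_sd))


# Hypothetical inputs for illustration only.
print(fifth_centile(4.6, 0.55))
print(fifth_centile_range([4.3, 4.7, 5.1], 0.52))
```

Because the SD term is common to both ends, the width of the resulting range equals the difference between the highest and lowest means, so between-group differences of several hundred millilitres dwarf a ±50 mL calibration tolerance.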
Although participation bias appeared unlikely to be a major cause of the centre variation, true population differences could not be fully separated from instrument variation. The latter comprises differences between spirometers and pneumotachs, between makes and models, and between machines of the same model, as well as inconsistent calibration and operation. It is not possible to ascertain whether corrections to body temperature, ambient pressure, saturated with water vapour (BTPS) conditions are comparable between different manufacturers. Insofar as the same type of instrument was used in several centres in different countries, there is evidence that there are true population differences and that differences between spirometers may be less important, but neither of these can be quantified from the current study. However, there is evidence that even devices of the same type, used under carefully controlled conditions and calibration, may give differing results 24, 25.
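To give a sense of why BTPS handling matters, the sketch below applies the textbook conversion from ambient temperature and pressure, saturated (ATPS), to BTPS: the volume is scaled by the ratio of absolute temperatures and by the ratio of dry-gas partial pressures. The Antoine-equation constants for water, the default barometric pressure and the function names are assumptions of this sketch; how any particular manufacturer implements the correction is not known from this study.

```python
def saturated_vapour_pressure_mmhg(temp_c):
    """Saturated water vapour pressure (mmHg) from the Antoine equation for water,
    using constants commonly quoted for roughly 1-100 degC (an assumption)."""
    return 10 ** (8.07131 - 1730.63 / (233.426 + temp_c))


def btps_factor(ambient_temp_c, barometric_mmhg=760.0, body_temp_c=37.0):
    """Factor converting an ATPS volume to BTPS conditions."""
    ph2o_ambient = saturated_vapour_pressure_mmhg(ambient_temp_c)
    ph2o_body = saturated_vapour_pressure_mmhg(body_temp_c)  # about 47 mmHg
    temperature_ratio = (273.15 + body_temp_c) / (273.15 + ambient_temp_c)
    pressure_ratio = (barometric_mmhg - ph2o_ambient) / (barometric_mmhg - ph2o_body)
    return temperature_ratio * pressure_ratio


# Hypothetical example: a 4.5 L ATPS volume corrected assuming 20 degC versus 23 degC.
volume_atps = 4.5
for temp in (20.0, 23.0):
    print(f"{temp:.0f} degC: {volume_atps * btps_factor(temp):.3f} L BTPS")
```

Under these assumptions, a 3 °C error in the assumed ambient temperature changes the factor by about 1.5%, roughly 70 mL on a 4.5 L volume, which is larger than the ±50 mL calibration tolerance.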
The current results suggest that centre variation was more likely to be due to true population differences than to instrument differences. Population differences may be due to genetic differences or to differences in health that are not removed by restricting the data to nonsmokers without wheeze or asthma. In the present data, differences remained after adjusting for variation between countries in height and also in body mass index. As participants in the ECRHS were almost exclusively White, the country differences lead to the conclusion that reference curves cannot be guaranteed to apply to a population of the same ethnic group as that from which they were derived, as recent recommendations assume 6. Conversely, national reference curves cannot be used in epidemiological studies that seek to compare populations and, if used to select patients with lung function below a given percentage of predicted value in multicentre trials across different countries, may result in heterogeneity in the severity of disease among those chosen.
Without a large international study comparing several instruments within each centre, it is impossible to fully separate instrument from true population differences. Studies that wish to compare population values between centres must be prepared to invest in standardised equipment. In order to show whether population differences in lung function represent differences in health, it would be necessary to compare mortality and morbidity between populations in relation to lung function measured in a standardised way. Neither national reference curves nor ones based on the same ethnic group can be guaranteed to give accurate norms of lung health.
- Received June 22, 2005.
- Accepted December 15, 2005.
- © ERS Journals Ltd