## Abstract

Reliable interpretation of pulmonary function results relies on the availability of appropriate reference data to help distinguish between health and disease and to assess the severity and nature of any functional impairment.

The overwhelming number of published reference equations, with at least 15 published for spirometry alone in the past 3 yrs, complicates the selection of an appropriate reference. The use of inappropriate reference equations, and misinterpretation even when potentially appropriate equations are used, can lead to serious errors of both under- and over-diagnosis, with the associated financial and human costs.

Further misdiagnosis occurs when fixed cut-offs, such as 80% predicted forced expiratory volume in 1 s (FEV_{1}) or an FEV_{1}/forced vital capacity (FVC) ratio of 0.70, are used, particularly in young children and elderly adults. While per cent predicted has historically been used to interpret lung function results, z-scores are more appropriate as they take into account both the predicted value and the between-subject variability of measurements.

We aim to highlight some of the main issues in selecting and using reference equations and discuss how recent developments may improve interpretation of pulmonary function results.

Like most medical observations, reliable interpretation of pulmonary function results relies on the availability and use of appropriate reference data to help distinguish between health and disease and to assess the severity and nature of any functional impairment. The overwhelming number of published reference equations, with at least 15 published for spirometry alone in the past 3 yrs, complicates the selection of an appropriate reference. The use of inappropriate reference equations, and misinterpretation even when potentially appropriate equations are used, can lead to serious errors of both under- and over-diagnosis, with the associated financial and human costs 1, 2. Overdependence on fixed cut-offs to define abnormality, irrespective of well-recognised age-related changes, further magnifies these problems 3–7.

In this article we aim to highlight some of the main issues in selecting and using reference equations and discuss how recent developments may improve interpretation of pulmonary function results. In the interests of brevity, this article has been limited to spirometric reference data, but most of the principles discussed apply to other pulmonary function tests. No attempt has been made to provide a comprehensive review of all available reference data, with emphasis instead being placed on what is commonly used, new developments in the past 3 yrs and practical issues relating to the use and misuse of reference data.

## WHAT IS NORMAL?

The range of values obtained from a “healthy population” is assumed to represent normal. Individuals with values outside the central 95% of this range (which may be defined as the normal range) are often considered to have atypical results and are referred for further testing. Unlike many other medical observations, lung function measurements are strongly related to body size and age, with height serving as a proxy for chest size and age reflecting maturity. During childhood and adolescence, growth is particularly rapid, with lung function increasing 20-fold during the first 10 yrs of life 8. By contrast, once peak lung function has been attained in early adulthood (some 5 yrs later in males than in females), there is a steady age-related decline in most lung function outcomes 9, 10 (fig. 1). While the precise age at which lung growth ceases depends on whether cross-sectional or longitudinal data are inspected 12, 13, height, age, sex and, ideally, ethnic/racial group must be taken into consideration when defining the normal range for lung function.

The selection of “healthy” subjects who comprise the reference population is of paramount importance 14. However, “health” itself is difficult to define, and the choice of appropriate inclusion and exclusion criteria often depends on the intended use of the reference ranges. Inevitably, predicted values will differ between studies that sampled the general population (*i.e.* minimal exclusion criteria) and studies that specifically excluded those with any clinical risk factors (for example, smoking history, pollutant exposure or a history of respiratory symptoms). Ideally, the sample should be unbiased and generalisable, with the characteristics of the reference population well documented so that they can be considered by the user.

## POPULATION-SPECIFIC REFERENCE EQUATIONS

The observed differences between reference equations may be explained by differences in population characteristics, although the equipment, software and measurement technique used may also account for some of the variation 15, 16. Therefore, steps should be taken to ensure that the reference equations selected by the user are applicable and appropriate for the population being tested. In practice this is extremely difficult to achieve, as it ideally necessitates investigating a large number of healthy subjects of both sexes over the entire relevant age range from the local community, so that small but important differences may be identified 17. In future, a consortium of users with the same equipment, software and methodology in similar populations may be able to address this issue more effectively. Manufacturers should help to coordinate such efforts and facilitate the decision-making process by providing an easy-to-search guide describing each reference population, together with details of the equipment and measurement protocol used. These issues are currently being addressed by a European Respiratory Society (ERS) working group 17.

The available references also raise the question of whether population-specific equations are necessary, or whether the more practical solution of large, generalisable (multicentre) reference populations should be preferred 14, 18. Population-specific equations might be more representative of the population being tested, but they are expensive and logistically difficult to produce and are based on fewer subjects than can be included in multicentre studies. By contrast, equations based on large representative populations may be less accurate for any given population but more precise. Recently published measurement and equipment standards in both young children and adults 7, 19, 20 should help to make reference data more comparable across different populations and equipment types in future.

Neither of the two approaches deals adequately with the well-recognised ethnic differences in lung function 16, 21, 22. Defining ethnicity itself is both complex and continually evolving. The issue of how best to address the impact of variation in ethnicity when interpreting lung function tests requires urgent international attention. Ideally, multi-ethnic populations should be studied using identical equipment and protocols and a more sophisticated approach to characterising body shape and size, with modern statistical techniques 23 applied to the resultant data. This would enable the effect of population characteristics to be distinguished from those of methodology and may show that appropriate anthropometric adjustments will allow one set of equations to be applied across a wide range of ethnic groups.

## WHAT IS RECOMMENDED?

Currently, the American Thoracic Society (ATS) recommends the use of the third National Health and Nutrition Examination Survey (NHANES III) reference 21 for interpreting spirometry results. This dataset is one of the few references spanning childhood and adulthood that is also nationally representative and generalisable. The ERS does not recommend a particular reference, although in Europe the European Community for Steel and Coal reference 24 is the most widely used. In the UK, it is recommended that the reference equation of Rosenthal *et al.* 25 be used in children aged <18 yrs, whereas the European Community for Steel and Coal equations are usually adopted in adults. In Australia, New Zealand and the Asia-Pacific region, no recommendations are set; the selection of appropriate reference equations is left to individual laboratories. Even when recommendations are available, problems may arise if laboratories do not use the same quality standards and equipment (or even software) when measuring patients in clinical settings as were used to construct the reference ranges.

The ATS and ERS both recommend the use of the lower limit of normal (LLN), or upper limit of normal where appropriate (*e.g.* plethysmographic lung volumes), to delineate between health and suspected disease. As, by convention, the LLN is set at the 5th percentile, so that 95% of the healthy population lie above it, it must be appreciated that using this cut-off results in a 5% false positive rate.
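The relationship between the 5th-percentile LLN, its z-score and the false-positive rate can be checked with a few lines of Python. This is a minimal sketch that assumes the reference values have already been transformed so that healthy z-scores follow a standard normal distribution; it uses only the standard-library `statistics.NormalDist` class.

```python
from statistics import NormalDist

# Healthy reference population expressed as z-scores, assumed standard normal.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# By convention the LLN is the 5th percentile (a one-sided limit);
# the upper limit of normal (ULN), where relevant, is the 95th percentile.
LLN_Z = std_normal.inv_cdf(0.05)
ULN_Z = std_normal.inv_cdf(0.95)

# Share of healthy subjects at or above the LLN, and the share flagged as
# "abnormal" by chance alone (the false-positive rate).
above_lln = 1.0 - std_normal.cdf(LLN_Z)
false_positive_rate = std_normal.cdf(LLN_Z)

print(f"LLN z = {LLN_Z:.3f}, ULN z = {ULN_Z:.3f}")
print(f"healthy above LLN: {above_lln:.1%}, false positives: {false_positive_rate:.1%}")
```

Running it gives an LLN of z = -1.645 and a 5.0% false-positive rate. If both a lower and an upper limit were applied to the same index, the combined false-positive rate would double to 10%.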

## WHAT IS COMMONLY USED?

Despite the profound influence that the choice of reference equation may have on the interpretation of results 1, 2, many users of lung function equipment, and indeed the clinicians who request such tests, are not aware of which equations are being used to interpret results, simply relying on default values set by manufacturers at the time of installation. Although it is now recognised that the source of reference data should be an integral component of any pulmonary function test report, this practice has yet to be implemented by many manufacturers and users.

To ascertain which reference equations are currently available in commonly used equipment, and how these equations are used and presented to the user, we surveyed manufacturers of spirometry equipment. The responses from the 16 companies that replied showed that, despite the growing number of published reference equations, relatively few are readily available in equipment software. The majority of systems can install new equations upon request and switch between adult/paediatric or Caucasian/non-Caucasian equations, but relatively few provide guidance to help the respiratory scientist select the most appropriate population group according to the age or ethnic group of the subject being studied.

It is also apparent that the default equations set by several manufacturers are still based on reference equations developed 30–40 yrs ago. The development of international standards over the past two decades 7, 20, with corresponding changes in equipment, software and measurement techniques, combined with shifts in population characteristics, means that such equations may no longer be appropriate.

Another major limitation remains the lack of a single reference across all ages. The NHANES III reference is limited by a lack of subjects aged <8 yrs and >80 yrs, which often results in reference data being extrapolated to younger and older ages. Subbarao *et al.* 2 have demonstrated the inaccuracies in interpreting results at younger ages using NHANES III, and the ATS and ERS strongly discourage extrapolation of reference data beyond the intended age/height range 16. The alternative, using paediatric reference equations before switching to NHANES III, as currently recommended by the Cystic Fibrosis (CF) Foundation (M. Rosenfeld, CF Foundation, Bethesda, MD, USA; personal communication), introduces discontinuities between equations at the transition point (fig. 2). These arbitrary jumps mean that the interpretation of results may reflect neither the baseline nor any change in the clinical status of the subject, particularly at crucial junctions of clinical care, such as the transfer from paediatric to adult centres.
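Why switching equations at a fixed transition age produces a jump can be shown with a toy example. The two prediction functions below, the transition age of 18 yrs and every coefficient are hypothetical and invented for illustration; they are not the published paediatric, NHANES III or CF Foundation equations. The same subject, with an unchanged observed FEV_{1} and height, is scored just before and just after the switch.

```python
def paediatric_fev1_pred(height_cm: float) -> float:
    """Hypothetical paediatric equation: FEV1 (L) as a power of height only."""
    return 1.0e-5 * height_cm ** 2.5

def adult_fev1_pred(height_cm: float, age_yr: float) -> float:
    """Hypothetical adult equation: linear in height and age."""
    return 0.043 * height_cm - 0.029 * age_yr - 2.8

TRANSITION_AGE_YR = 18.0  # hypothetical switching point between the two equations

def percent_predicted(fev1_l: float, height_cm: float, age_yr: float) -> float:
    """Per cent predicted using whichever (hypothetical) equation applies at this age."""
    if age_yr < TRANSITION_AGE_YR:
        pred = paediatric_fev1_pred(height_cm)
    else:
        pred = adult_fev1_pred(height_cm, age_yr)
    return 100.0 * fev1_l / pred

# Same subject, same lung function, a few days either side of the switch.
before = percent_predicted(3.5, 170.0, 17.99)
after = percent_predicted(3.5, 170.0, 18.0)
print(f"{before:.1f}% pred -> {after:.1f}% pred")
```

With these made-up numbers the result drops from about 92.9% to 87.8% pred overnight, although nothing about the subject has changed; a single smoothly changing reference avoids this artefact.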

## WHAT IS NEW?

The continued publication of new reference equations reflects widespread recognition of the limitations of existing equations, particularly with respect to specific ethnic groups. In the past 3 yrs, reference equations have been published for Brazilian adults 27, Polish adults 28, Chinese adults 29, Kazakh adolescents 30, Italian adults 31 and young children 32, 33. However, the abundance of unrelated studies can introduce new complexities, not least because most are not generalisable beyond the population in which they were collected. Two exceptions were the Health Survey for England 34, which studied a large population (6,053 individuals) aged 16–90 yrs, and the recently published study by Kuster *et al.* 35, which was also based on a very large sample (8,684 individuals aged 18–80 yrs) and used sophisticated statistical techniques to define a more accurate, age-dependent LLN. While both populations were representative of people of English and Central European origin across a wide age range, they are limited by a lack of data in non-Caucasians and in subjects aged <16 yrs. Furthermore, Kuster *et al.* 35 used self-reported height, which may introduce errors and bias in the predicted values, particularly in elderly subjects, who tend to overestimate their current height. Both studies were limited to adults, thereby avoiding the complexities of puberty and growth spurts. The Italian equations by Pistelli *et al.* 31 do attempt to provide smoothly changing equations across the entire age range, but are based on <500 subjects (aged 8–74 yrs) from one population and, therefore, may not be representative.

## PRE-SCHOOL EQUATIONS

Although several paediatric reference equations have been published, very few include children aged <5 yrs, measurement guidelines for which were only published recently 20. Of the pre-school equations that are available, lack of details regarding either the population characteristics or measurement techniques, together with a failure to link results to school-age equations, may limit their usefulness beyond the centre where they were generated 36. Furthermore, many are based on forced expiratory volume in 1 s (FEV_{1}), which may not be an appropriate measurement for very young children. These factors, together with the extent to which equations have been extrapolated beyond the intended height and age range, mean that paediatric spirometry results are frequently subject to misinterpretation.

In 2007, two studies specifically of children aged <6 yrs were published 32, 33. Both included reference equations for FEV in 0.75 s (FEV_{0.75}). This may be a more appropriate outcome for this age group, since young children have large airways relative to their lung volumes, so that during forced expiration emptying may be virtually complete within 1 s, if not earlier 8.

## STATISTICAL MODELLING

The complex nature of the relationship between body size and lung function, particularly during periods of rapid growth 37–39, means that traditional multiple regression analysis, although commonly used, is inadequate. Recent advances in computational power and statistical software allow more sophisticated statistical methods to be applied relatively easily 40. This added flexibility allows the relationship of lung function with age and height to be described by smoothly changing models that reflect biologically plausible patterns.
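One widely used family of such methods is Cole's LMS approach, to which the lmsGrowth macro acknowledged at the end of this article belongs: at each age (and height) the distribution of an outcome is summarised by a Box-Cox power L, a median M and a coefficient of variation S. Assuming that parameterisation, the standard conversions between an observation y and its z-score can be sketched as follows; the parameter values in the usage lines are hypothetical.

```python
import math

def lms_z(y: float, L: float, M: float, S: float) -> float:
    """z-score of observation y under the LMS (Box-Cox) model.

    z = ((y / M) ** L - 1) / (L * S) when L != 0, and ln(y / M) / S when L == 0.
    """
    if abs(L) < 1e-12:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def lms_value(z: float, L: float, M: float, S: float) -> float:
    """Inverse of lms_z: the value lying at z-score z (e.g. the LLN at z = -1.645)."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

# Hypothetical LMS values for one age/height: slightly skewed, median 3.0 L, CV 12%.
L, M, S = 0.8, 3.0, 0.12
print(round(lms_z(2.4, L, M, S), 2))        # z-score of an observed 2.4 L
print(round(lms_value(-1.645, L, M, S), 2))  # LLN in litres
```

When L = 1 the z-score reduces to (y/M - 1)/S, i.e. the deviation from the median divided by an sd proportional to the median, which is the simple relationship behind per cent predicted.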

## ALL AGE EQUATIONS

Given the well-recognised issues around the use and misuse of reference equations, two recent international collaborative initiatives have attempted to address some of the limitations described above. The “all-age spirometry” study 41 investigated ways to develop reference ranges that describe the relationship between lung function, height and age more accurately during childhood, while also being applicable to adults and to the critical transition between the two (fig. 3). These equations provide smoothly changing reference curves during periods of rapid growth and transition, producing a single reference across a wide age range (5–80 yrs) in Caucasians. Furthermore, the equations describe a multiplicative and allometric relationship, in which FEV_{1}, forced vital capacity (FVC) and forced expiratory flow at 25–75% of FVC (FEF_{25–75}) are proportional to height raised to the power 2.5. For example, a 1% increase in height corresponds to an increase of approximately 2.5% in each of these outcomes.
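Under a power-law relationship of this kind, the proportional change in an outcome depends only on the ratio of heights, not on the units or on the other terms in the equation. The short sketch below assumes only the exponent of 2.5 quoted above and shows that the 2.5% figure is a small-change approximation: the exact change for a 1% height difference is slightly larger, and the approximation degrades for larger differences.

```python
HEIGHT_EXPONENT = 2.5  # exponent quoted in the text for FEV1, FVC and FEF25-75

def relative_change(height_ratio: float, exponent: float = HEIGHT_EXPONENT) -> float:
    """Fractional change in outcome when height is multiplied by height_ratio.

    If outcome = k * height ** exponent, then outcome2 / outcome1 = height_ratio ** exponent,
    whatever the constant k (which may itself depend on age and sex).
    """
    return height_ratio ** exponent - 1.0

print(f"{relative_change(1.01):.2%}")  # 1% taller
print(f"{relative_change(1.10):.2%}")  # 10% taller
```

This prints 2.52% and 26.91%: a 10% difference in height implies about 27% more FEV_{1} or FVC, not the 25% that a linear reading of the rule of thumb would suggest.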

Since publication in 2008, these data have been supplemented with the largest collection of pre-school data (3–7 yrs) to extend the continuous all-age models from 3–80 yrs without changing the equations in children aged >10 yrs 8. In addition to extending the outcomes already reported, reference equations were developed for FEV_{0.75}.

The all-age reference demonstrates that the between-subject variability in lung function is highly age-dependent, which has important implications for defining the LLN. The largest variability was observed in children aged <11 yrs, but between-subject variability also increased steadily with age after 30 yrs (fig. 4). The practical implication of these findings is that the “normal range” for FVC or FEV_{1} is considerably wider than the frequently quoted 80–120% pred, both for young children and for subjects aged >30 yrs. Table 1 gives examples of the predicted values across the age range, along with the LLN (z-score of -1.645) and the normal range expressed as per cent predicted. This study also emphasised the much wider normal range for FEF_{25–75} compared with FEV_{1} and FVC.

## IMPLEMENTATION OF ALL-AGE EQUATIONS

Owing to the complexity of the smoothly changing models, the equations cannot be expressed as simple polynomial equations; instead, they require look-up tables. Most modern pulmonary function test software can install the equations and the complementary tables required. The equations are also available as an Excel add-in (fig. 5), which can be downloaded from www.growinglungs.org.uk 42.
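A look-up table of this kind can be thought of as a grid of model parameters indexed by age, with values between grid points obtained by interpolation. The sketch below is a hypothetical illustration, not the format used by the growinglungs add-in or by any manufacturer: it stores invented L, M and S values at whole years, interpolates linearly, and refuses to extrapolate outside the table, in line with the advice against extrapolating reference data. In the published models the tables may instead hold age-dependent terms that are combined with height; the principle of interpolating between tabulated points is the same.

```python
from bisect import bisect_right

# Hypothetical look-up table for one outcome and sex: (age in yrs, L, M in litres, S).
# All numbers are invented for illustration only.
LMS_TABLE = [
    (6.0, 1.0, 1.20, 0.130),
    (7.0, 1.0, 1.38, 0.125),
    (8.0, 1.0, 1.55, 0.120),
]

def interpolate_lms(age_yr: float) -> tuple[float, float, float]:
    """Linearly interpolate (L, M, S) between the tabulated ages; no extrapolation."""
    ages = [row[0] for row in LMS_TABLE]
    if not ages[0] <= age_yr <= ages[-1]:
        raise ValueError(f"age {age_yr} outside table range {ages[0]}-{ages[-1]} yrs")
    i = bisect_right(ages, age_yr)
    if i == len(ages):  # exactly at the last tabulated age
        return LMS_TABLE[-1][1:]
    lower, upper = LMS_TABLE[i - 1], LMS_TABLE[i]
    weight = (age_yr - lower[0]) / (upper[0] - lower[0])
    return tuple(a + weight * (b - a) for a, b in zip(lower[1:], upper[1:]))

print(interpolate_lms(7.5))  # midway between the 7- and 8-yr rows
```

Real tables are much finer-grained than this three-row example, which keeps linear interpolation close to the underlying smooth curves.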

## INTERPRETING RESULTS

Clinicians in respiratory medicine have become familiar with the concept of expressing lung function as per cent predicted, (observed/predicted)×100, where the predicted value is derived from reference equations. In a healthy population the median is, by definition, 100% pred, and any deviation from 100% indicates an offset from the predicted value. Conventionally, the variability between healthy subjects is taken to be an sd of 10% pred; on this basis (predicted ±2 sd), the normal range would be 80–120% pred.

This may be valid as long as the between-subject sd genuinely is 10% at all ages and for all lung function outcomes. In practice, it is not (fig. 4). The flexible modelling techniques used to develop the all-age equations have quantified the sd and demonstrated that it is highly age- and outcome-dependent 8, 41. In young children and the elderly, for example, the sd for FEV_{1} is close to 15%, so that the normal range extends from 70% to 130%, and from 67% to 133% for 3-yr-olds (table 1). Ignoring this age-dependent variability means that many patients will be flagged incorrectly as “abnormal”. Furthermore, the variability is appreciably greater for flows (FEF_{25–75}) than for volumes, the “normal range” extending from 46% to 154% in a 3-yr-old.
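The consequence of an age-dependent sd can be made concrete with a simplified calculation. The sketch below assumes, for illustration only, that healthy values are normally distributed around the predicted value with an sd equal to a fixed fraction of it (the coefficient of variation, CV); under that assumption the LLN expressed as per cent predicted is 100 × (1 − 1.645 × CV). The CVs of 10% and 15% correspond to the conventional assumption and to the value quoted above for young children and the elderly.

```python
Z_LLN = -1.645  # 5th percentile of the standard normal distribution

def lln_percent_predicted(cv: float) -> float:
    """LLN in % pred, assuming normality with sd = cv * predicted."""
    return 100.0 * (1.0 + Z_LLN * cv)

def z_from_percent_predicted(pct_pred: float, cv: float) -> float:
    """z-score for a result expressed as % pred, under the same assumption."""
    return (pct_pred / 100.0 - 1.0) / cv

for cv in (0.10, 0.15):
    z = z_from_percent_predicted(78.0, cv)
    status = "below LLN" if z < Z_LLN else "within normal range"
    print(f"CV {cv:.0%}: LLN = {lln_percent_predicted(cv):.1f}% pred; "
          f"78% pred -> z = {z:.2f} ({status})")
```

With a CV of 10% a result of 78% pred lies below the LLN, whereas with a CV of 15% the same result is within the normal range, even though a fixed 80% cut-off would flag it in both cases. (The LLN here is the one-sided 5th percentile, so it sits somewhat above the lower bound of the ±2 sd ranges quoted in the text.)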

A better approach to reporting lung function measures is to express results as z-scores (or sd scores). The z-score combines the per cent predicted and the between-subject variability into a single number that accounts for the age- and height-related variability in lung function expected within comparable healthy individuals. The LLN corresponds to a z-score of -1.645. Unlike per cent predicted, where each outcome has a different cut-off, the same z-score cut-off of -1.645 applies across all ages, both sexes, all ethnic groups and all spirometric indices. For example, in a 160 cm tall, 50-yr-old male, an FEF_{25–75} of 60% pred and an FEV_{1} of 80% pred both equate to a z-score of -1.5. For some lung function outcomes (*e.g.* plethysmographic lung volumes), impairment is indicated by an elevated value, in which case an upper limit of normal, the 95th percentile (z-score of +1.645), would be used.

There are still many questions to be answered before a consensus can be reached regarding the requirements an index of severity of lung disease should fulfil. We may need to adopt an entirely different approach in future to ascertain, for example, the minimum FEV_{1} required to sustain life and the level (whether in a “pathological” range or not) that does not limit daily activities. Furthermore, a lung function test must never be used in isolation to define disease severity; a number of factors, including quality of life, are likely to contribute, and the ideal approach remains to be determined. Neither per cent predicted nor z-scores used in isolation can answer these fundamental questions.
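Because the z-score already incorporates outcome-specific variability, a single pair of limits can be applied to every index. The sketch below assumes, as a simplification, normally distributed values with sd = CV × predicted. The two CVs are not published values: they were back-calculated from the worked example in the discussion of z-scores (60% pred for FEF_{25–75} and 80% pred for FEV_{1} both giving z = -1.5). The `high_is_abnormal` flag covers outcomes such as plethysmographic lung volumes.

```python
Z_LIMIT = 1.645  # same magnitude for every index, age, sex and ethnic group

def z_score(observed: float, predicted: float, cv: float) -> float:
    """z-score assuming normality with sd = cv * predicted (a simplification)."""
    return (observed / predicted - 1.0) / cv

def classify(z: float, high_is_abnormal: bool = False) -> str:
    """Apply the one-sided 5th or 95th percentile limit appropriate to the index."""
    if high_is_abnormal:
        return "above ULN" if z > Z_LIMIT else "within normal range"
    return "below LLN" if z < -Z_LIMIT else "within normal range"

# CVs back-calculated from the text's example (160 cm, 50-yr-old male), not published values.
CV_FEF25_75 = 0.40 / 1.5
CV_FEV1 = 0.20 / 1.5

fef_z = z_score(60.0, 100.0, CV_FEF25_75)
fev1_z = z_score(80.0, 100.0, CV_FEV1)
print(f"FEF25-75: z = {fef_z:.2f}, {classify(fef_z)}")
print(f"FEV1:     z = {fev1_z:.2f}, {classify(fev1_z)}")
```

Both lines report z = -1.50 and fall within the normal range, even though one result is 60% pred and the other 80% pred.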

Regardless of whether z-scores or per cent predicted are used to express results, the age-specific normal range should always be included in the lung function report. When interpreting results, it is important to remember that there will always be a degree of within-person variability, so that by chance a measurement may be just outside the “normal range” on one occasion but just within it on the next. It is also essential to take other clinical information into account and to weigh the consequences of an erroneous decision against those of a correct diagnosis. Particular caution is required when interpreting results that lie close to the somewhat arbitrary cut-offs between health and suspected disease, especially when results are limited to a single test occasion. Presenting the actual z-score, rather than only whether it lies above or below a cut-off, will assist interpretation.

## FEV_{1}/FVC RATIO

The limitations of using a fixed ratio as a cut-off to define airway obstruction have also been highlighted recently. Hansen *et al.* 43 found that use of the Global Initiative for Chronic Obstructive Lung Disease criteria (FEV_{1}/FVC <0.7 and FEV_{1} below the normal range) resulted in an inappropriately high prevalence of obstruction in adults aged >70 yrs. The all-age spirometry analysis indicated that the ratio has a strong negative age dependency: the LLN for FEV_{1}/FVC does not fall to the frequently used fixed threshold of 0.7 until ∼50 yrs of age in males, and later in females, so that airway obstruction in younger subjects would be missed. By contrast, the 0.7 cut-off would falsely identify a large number of older healthy subjects as having lung disease. Similar observations were made when data from 40,646 adults aged 17–90 yrs were re-examined 4.
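The disagreement between a fixed ratio and an age-appropriate LLN can be illustrated with two hypothetical subjects. The predicted FEV_{1}/FVC values and sds below are invented to mimic the age trend described above (a higher predicted ratio in the young, a lower one in the elderly) and are not taken from any published equation; normality of the ratio is also assumed.

```python
Z_LLN = -1.645
FIXED_RATIO_CUTOFF = 0.70

def assess_ratio(observed: float, predicted: float, sd: float) -> dict:
    """Compare the fixed 0.70 criterion with an LLN-based (z-score) criterion."""
    z = (observed - predicted) / sd
    return {
        "z": z,
        "obstructed_fixed": observed < FIXED_RATIO_CUTOFF,
        "obstructed_lln": z < Z_LLN,
    }

# Hypothetical young adult: predicted ratio 0.86 (sd 0.055), observed 0.74.
young = assess_ratio(0.74, 0.86, 0.055)
# Hypothetical elderly subject: predicted ratio 0.74 (sd 0.065), observed 0.67.
elderly = assess_ratio(0.67, 0.74, 0.065)

for label, result in (("young", young), ("elderly", elderly)):
    print(label, f"z = {result['z']:.2f}",
          "fixed 0.70:", result["obstructed_fixed"],
          "LLN:", result["obstructed_lln"])
```

The fixed cut-off misses the young subject (z ≈ -2.18, below the LLN) and labels the elderly subject (z ≈ -1.08, within the normal range) as obstructed, reproducing both errors described in the text.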

## OUTSTANDING ISSUES

### Elderly subjects

There is a relative lack of reference data for elderly subjects 35, with existing data based on relatively small and unrepresentative samples. Most of the currently recommended reference equations were developed in the 1980s and 1990s, so that the oldest participants would have been born at the beginning of the 20th century, with potential for cohort effects due to changes in health, nutrition and measurement standards 44. The tendency of elderly subjects to overestimate their self-reported height may add further bias. Recent studies, such as those by Falaschetti *et al.* 34 and Kuster *et al.* 35, are particularly important as they can provide more appropriate reference equations for current generations. As lung function is highly age-dependent in adults, future studies should aim to include larger numbers of older subjects to improve the accuracy with which spirometry can be interpreted in this population.

### Ethnicity

Despite recent progress, there remains a lack of appropriate equations for ethnic groups other than those of white European descent, especially among younger children. In the past, even when attempts to correct for such ethnic differences have been made, these have tended to apply the same fixed adjustment factor across all ages, all ethnic groups, both sexes and all spirometric outcome measures, an approach now shown to be over-simplistic 22, 41.

Ethnic-specific equations are not necessarily a satisfactory solution, since this approach requires large and representative samples, which are not readily available. Currently, most ethnic-specific equations are based on small numbers, which are unlikely to be either representative or generalisable. Most importantly, ethnicity itself is extremely difficult to define, especially given the growing multi-ethnic population, and may be politically sensitive, with some nations now forbidding the recording of such details. Therefore, there is an urgent need for future research in this field to focus on finding an appropriate proxy measure that accounts for the variability in lung function due to ethnicity. While sitting height has been suggested as an appropriate measure, it may not be practical to measure in clinical settings. Furthermore, sitting height was not found to explain the variability observed in Asian subjects 22 and may not be appropriate for all ethnic groups.

### Tracking longitudinal changes

Accurate identification and interpretation of changes in lung function as a result of disease or treatment requires knowledge of normal variability over time within healthy subjects 45–47, but most reference ranges are based on cross-sectional samples, with a paucity of data regarding either the short- or long-term repeatability of spirometry 44. Spirometry is an effort-dependent technique, the accuracy and repeatability of which depend on many factors, including the equipment used, the coordination and motivation of the subject, the skill of the technician and overall quality control. Thus, studies that include repeated measures in healthy subjects over specified intervals (ranging from within a day to a week, a month or a year) are urgently required, particularly during periods of rapid growth and development.

### Future steps

Many important advances have occurred in recent years, including: 1) the development of standardised measurement protocols across all ages, including those for pre-school children 7, 19, 20; 2) the application of more appropriate statistical techniques for developing reference equations 31, 35, 41; and 3) the establishment of collaborative networks with access to spirometry and other pulmonary function data from healthy individuals 17. Such initiatives provide an opportunity to develop an international collaborative group that can take the next steps towards developing more up-to-date and generalisable reference equations applicable across different populations and equipment models. A large multinational collaborative study also has the potential to address the lack of ethnic-specific equations or to develop more appropriate adjustment factors, and could be expanded to other pulmonary function tests. The established collaborative initiatives could eventually be extended to include infants, elderly subjects and different ethnic groups in order to track longitudinal development throughout the life course. Such references would need maintenance and updating (every 5–10 yrs) and would therefore require long-term funding.

Finally, these advances will have little impact unless they are disseminated and implemented in equipment software. Both manufacturers and users need to ensure that the most appropriate reference is applied. As an initial step, manufacturers should clearly display the source of the reference equations and assist the technician in selecting equations based on the most appropriate population group, according to age and ethnic background. To facilitate interpretation of results, manufacturers should also ensure that z-scores and the lower and upper limits of normal are clearly displayed, ideally with a bar delineating the position of the actual data relative to the predicted mean and normal range.
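The suggested bar display needs little more than a mapping from z-score to screen position. The text-based sketch below is purely illustrative: its scale of -4 to +4, its width and its symbols are arbitrary choices, not any manufacturer's format. `|` marks the limits at z = ±1.645, `+` the predicted mean and `*` the subject's result.

```python
Z_LIMIT = 1.645

def z_bar(z: float, lo: float = -4.0, hi: float = 4.0, width: int = 41) -> str:
    """Render a z-score on a fixed scale as a one-line text bar."""
    def column(value: float) -> int:
        clipped = min(max(value, lo), hi)  # results beyond the scale sit at the ends
        return round((clipped - lo) / (hi - lo) * (width - 1))

    cells = ["-"] * width
    cells[column(-Z_LIMIT)] = "|"  # lower limit of normal
    cells[column(Z_LIMIT)] = "|"   # upper limit of normal
    cells[column(0.0)] = "+"       # predicted mean (z = 0)
    cells[column(z)] = "*"         # subject's result, drawn last so it stays visible
    return "".join(cells)

print(f"FEV1  z = -2.30  {z_bar(-2.30)}")
print(f"FVC   z = -0.40  {z_bar(-0.40)}")
```

Printing the numeric z-score next to the bar keeps the exact value available for borderline results, while the bar shows at a glance where the result lies relative to the predicted mean and the limits of normal.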

## Acknowledgments

We gratefully acknowledge the assistance of T. Cole and P. Quanjer with respect to much of the work underpinning this report. We also thank H. Pan for developing the lmsGrowth macro and the spirometry add-in.

## Footnotes

**Support Statement** S. Stanojevic received funding from Asthma UK (London, UK).

**Statement of Interest** None declared.

- Received September 10, 2009.
- Accepted November 24, 2009.

- ©ERS 2010