Abstract
Background Appropriate interpretation of pulmonary function tests (PFTs) involves classifying observed values as within or outside the normal range based on a reference population of healthy individuals, integrating knowledge of the physiological determinants of test results into functional classifications, and integrating the resulting patterns with other clinical data to estimate prognosis. In 2005, the American Thoracic Society (ATS) and European Respiratory Society (ERS) jointly adopted technical standards for the interpretation of PFTs. We aimed to update the 2005 recommendations and incorporate evidence from recent literature to establish new standards for PFT interpretation.
Methods This technical standards document was developed by an international joint Task Force, appointed by the ERS/ATS with multidisciplinary expertise in conducting and interpreting PFTs and developing international standards. A comprehensive literature review was conducted and published evidence was reviewed.
Results Recommendations for the choice of reference equations and for the limits of normal of the healthy population used to identify individuals with unusually low or high results are discussed. Interpretation strategies for bronchodilator responsiveness testing, limits of natural changes over time and severity are also updated. Interpretation of spirometry, lung volume and gas transfer measurements is described as it relates to underlying pathophysiology, with updated classification protocols for common impairments.
Conclusions Interpretation of PFTs must be complemented with clinical expertise and consideration of the inherent biological variability of the test and the uncertainty of the test result to ensure appropriate interpretation of an individual's lung function measurements.
Data from pulmonary function tests must be complemented with clinical expertise and consideration of the inherent biological variability and uncertainty of the test result to ensure appropriate interpretation of an individual's lung function measurements https://bit.ly/3ecIuFc
Introduction
Pulmonary function tests (PFTs)/respiratory function tests reflect the physiological properties of the lungs (e.g. airflow mechanics, volumes and gas transfer). These tests have been used for decades to help diagnose lung disease, explain dyspnoea, and monitor disease progression and treatment response. In addition, PFTs have been employed in population studies of the association between exposures and lung health. The American Thoracic Society (ATS)/European Respiratory Society (ERS) Task Force on the standardisation of PFTs published a series of technical documents in 2005 [1–4]. The technical standards for spirometry [5] and single-breath carbon monoxide uptake in the lung (transfer factor (TLCO) or diffusing capacity (DLCO)) [6] have recently been updated, and an update on lung volumes is forthcoming. This document is an update to the interpretation strategies of routine PFTs [3].
Interpretation of technically acceptable PFT results has three key aspects. 1) Classification of observed values as within/outside the normal range with respect to a population of healthy individuals. This involves consideration of the measurement error of the test, as well as the inherent biological variability of measurements both between individuals and between repeated measurements in the same individual. 2) Integration of knowledge of physiological determinants of test results into a functional classification of the identified impairments. 3) Integration of the identified patterns with other clinical data to inform differential diagnosis and guide therapy. These are three distinct, yet complementary aspects of interpretation. This document addresses only the first two aspects. The final integration of pulmonary function results into a diagnosis or management plan is beyond the scope of this technical guidance on physiological interpretation.
Appropriate interpretation of PFTs requires measurements that meet technical specifications for test performance and appropriate levels of quality [6–8]. Poorer quality tests must be interpreted with greater uncertainty as the measurements may not reflect functional impairments. Interpretation also relies on clear reporting of results; therefore, current ATS standards for reporting of PFTs are recommended [9]. Technical aspects of PFT measurement, equipment and biological controls are summarised in the ERS/ATS standards for each PFT [6–8].
This document considers the 2005 recommendations [1–4] and incorporates evidence from subsequent literature to establish new standards for PFT interpretation. The key distinction between the previous recommendations and the current ones is the emphasis on the uncertainty of measurement and interpretation.
A summary of the changes from the 2005 interpretation standard can be found in table 1.
Methods
Task Force members were selected by the ATS Proficiency Standards for Pulmonary Function Laboratories Committee, as well as ERS leadership. Conflicts of interest, including academic conflicts, were declared and vetted by the ATS throughout the duration of the Task Force. Six of the 16 Task Force members are current or past members of the Global Lung Function Initiative Network Executive. A comprehensive literature search was conducted by a professional librarian using the following databases: Ovid MEDLINE, Epub Ahead of Print, In-Process & Other Non-Indexed Citations, Ovid MEDLINE Daily, Ovid MEDLINE 1946 to Present, Embase Classic, Embase 1947 to 29 March 2019 and Wiley Cochrane. The search terms are listed in figure 1. All identified publications were screened by two members of the Task Force at the title/abstract level. Publications identified as relevant for the Task Force were read in full by at least one member of the Task Force. The literature search was systematic, but not a formal systematic review of the evidence. Available literature was used to inform the discussions and recommendations. The reported standards were reached by consensus among the Task Force members and apply to all settings globally (clinical interpretation, research studies, and tertiary, community and primary care). Consensus was reached after all Task Force members agreed on the final version.
Comparison of measured values to a healthy population
Global Lung Function Initiative (GLI) reference equations for spirometry [10], diffusing capacity [11] and lung volumes [12] should be used to define the expected range of values in healthy individuals.
Summaries of data collected in otherwise healthy individuals provide meaningful benchmarks against which to compare an individual's PFT results. The range of values expected in a healthy population is expressed using population-based reference equations that, ideally, are derived from large and representative samples of healthy individuals (i.e. never-smokers without a history of respiratory disease). There are hundreds of published reference equations for different populations and for each PFT. Comparisons of published reference equations, and of individual results derived from different reference equations, demonstrate large differences that may be attributed to real population differences in lung function or simply to sampling variability in equations derived from small samples. The lack of standards for how to derive and use PFT reference equations has led to considerable confusion in the interpretation of PFT results.
Typically, height, age and sex are used to estimate expected lung function in health and to account for the wide biological variability observed within and between populations. Height per se is not a direct determinant of lung size but is a reasonable proxy for chest size. Differences in height and body proportions (e.g. leg length and trunk length) have been observed between populations [13]. The determinants of the observed differences in height and chest size are multifaceted and must be considered during PFT interpretation. Age makes two important contributions to the expected range of lung function in health. In childhood, somatic growth (i.e. height) is strongly linked to chronological age, except during periods of rapid growth and development, such as puberty, when there is asynchrony between height and thoracic volume, and thus disproportionate growth between lung parenchyma and airway calibre [14, 15]. In older adults, chest wall rigidity, the chest wall muscles and lung elasticity change with the normal ageing process [16, 17]. Sex is an important predictor of lung size, even after accounting for differences in height [18–21]. Thus, while gender identity should be respected, use of biological sex will yield a more accurate prediction of lung function. The effect of gender-affirming hormonal therapy on lung function is poorly understood, so the appropriate reference equation for transgender individuals is currently not known. Timing of gender reassignment, especially during adolescence, may affect lung growth and development, and thus needs to be considered when interpreting results during adulthood [22, 23]. Considerations for individuals in whom standing height cannot be measured are summarised in the technical standards for each PFT [6–8].
The reasons for observed differences in lung function between people around the world are multifactorial and not fully understood. The narrow definition of health may contribute to the observed differences, as "healthy" individuals may include people exposed to risk factors for poor lung health during their lifetime. There are ongoing efforts to better understand the geographical, environmental, genetic and social determinants of health that play a role in explaining these observed differences. The differences by population groupings that were observed in the GLI data may represent genetic differences or health disparities, and thus reflect social and environmental determinants of health. The specific contribution of genetic ancestry to the regional differences observed in GLI data remains uncertain. Furthermore, assigning ethnicity is challenging. It is important that individuals have their lung function assessed against the appropriate reference population for that individual. The historical approach of fixed adjustment factors for race is not appropriate and is unequivocally discouraged [24, 25]. As there are observed population differences in body proportions [13, 26, 27] and lung function [28, 29], in some contexts it may be relevant to interpret results for an individual relative to those of a similar ancestral grouping, whereas in others it may be more appropriate to compare with the whole population. The choice of equation requires care, and the same reference equations should be applied across serial encounters. An individual's medical history, symptoms and social circumstances must be considered when applying PFT results to inform clinical decision making.
GLI equations
The GLI reference equations are available for spirometry [10], DLCO [11] and lung volumes [12], and facilitate standardised reporting and interpretation of pulmonary function measurements. These three GLI equations (spirometry, DLCO and lung volumes) are internally consistent, providing a single suite of PFT equations which will avoid discordant results between PFTs and potential misclassification of physiological phenotypes. The GLI equations include the largest samples of healthy individuals and represent a single standard to compare observed measurements applicable across all ages. The GLI equations also explicitly describe the between-subject variability across age, such that the limits of normal are age-specific. Despite the name, the GLI do not include individual data from all populations around the world and do not explicitly consider the factors that may contribute to the observed differences in lung function between populations. Spirometry equations are available for four specific population groupings as well as a composite “other” equation which represents a multi-ethnic population (table 2). The GLI “other” equation was mathematically derived from the four population-specific equations, including the White group, and represents an average across these populations. GLI reference ranges for the forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) ratio appear relatively independent of population differences and will result in more consistent interpretation between populations. GLI DLCO equations and GLI static lung volumes are currently based on measurements predominantly from individuals of European ancestry due to insufficient reference data from other populations.
Further studies on the use of reference equations for specific population groupings are under way; these recommendations are therefore based on current evidence and are designed to increase the precision of determining whether results are outside the expected range for an individual. There is no single reference equation equally applicable to all populations. There is a trade-off between applying reference equations that are specific to population groupings and applying a single standard for all. Different approaches may be warranted in different contexts. Therefore, at present, the appropriate GLI spirometry equation based on self-reported ancestral origin, if known, should be used to standardise lung function measurements for sex, age and height. If ancestral origin is unknown or uncertain, the GLI "other" equation should be used. PFT reports and research publications must state the specific reference equation used.
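The equation-selection rule in the preceding paragraph can be sketched as a small lookup. This is a minimal illustration under stated assumptions: the group labels below are placeholders standing in for the four GLI-2012 spirometry groupings (check table 2 for the groupings actually used), and the function name is hypothetical, not part of any GLI software. Returning a label, rather than silently applying an equation, also makes it easy to print the equation used on the report.

```python
from typing import Optional

# Placeholder labels for the four GLI-2012 population-specific spirometry
# equations (assumed names; see table 2). Not an official API.
GLI_SPIROMETRY_GROUPS = {
    "Caucasian",
    "African American",
    "North East Asian",
    "South East Asian",
}


def choose_gli_spirometry_equation(self_reported_ancestry: Optional[str]) -> str:
    """Return the label of the GLI spirometry equation to apply (hypothetical helper).

    A population-specific equation is used only when the self-reported
    ancestral origin maps onto one of the groupings; otherwise the composite
    "Other" equation is used.
    """
    if self_reported_ancestry in GLI_SPIROMETRY_GROUPS:
        return f"GLI-2012 {self_reported_ancestry}"
    # Unknown, uncertain or unmapped ancestry: fall back to the composite equation.
    return "GLI-2012 Other"
```

For example, `choose_gli_spirometry_equation(None)` returns `"GLI-2012 Other"`, which is the label that would then be stated on the PFT report.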
Differences from the previous recommendations
The 2005 ATS/ERS interpretation strategy [3] recommended the use of the National Health and Nutrition Examination Survey (NHANES) III spirometry reference equations for individuals in North America. The NHANES III spirometry data are included within the GLI equations and, overall, the predicted values are similar, with a few notable differences. NHANES III derived separate equations for Mexican Americans and Caucasians, whereas the GLI equations do not make this distinction, as re-analysis of NHANES III data reveals minimal differences between expected lung function in these populations [30]. The GLI equations span a wider age range (3–95 years) than NHANES III (8–80 years). There have been notable differences observed in predicted values between the two equations for adults older than 65 years [31–33]. The 2005 ATS/ERS interpretation document [3] did not make specific recommendations for reference equations in Europe and elsewhere, although the European Community for Steel and Coal (ECSC) equations have been used widely. There are demonstrable differences between the predicted values from ECSC and GLI, where the predicted GLI values are consistently higher than ECSC values [34–36].
Special considerations for DLCO
The overall recommendation to use the GLI reference equations also applies to DLCO [11]. Interpretation of DLCO values requires adjustment for equipment dead space and barometric pressure (altitude), which should be done by the equipment software before calculating predicted values [11]. Changes in haemoglobin, carboxyhaemoglobin and carbon monoxide back-pressure must also be considered when interpreting results. This is particularly important in situations where patients are being serially monitored for possible drug toxicity and where haemoglobin is subject to large shifts (e.g. chemotherapy for cancer) [6, 7]. The clinician must incorporate information about haemoglobin concentrations on an individual basis while interpreting results. It is recommended that the reference value be adjusted for measured haemoglobin concentration.
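Adjusting the reference value for haemoglobin can be sketched as follows. The text above does not give the adjustment formula; this sketch assumes the Cotes-type correction described in the ERS/ATS DLCO technical standard [6], which scales predicted DLCO by 1.7·Hb/(10.22 + Hb) for adolescent and adult males and by 1.7·Hb/(9.38 + Hb) for women and children younger than 15 years, with Hb in g/dL. The function and group names are hypothetical.

```python
def hb_adjusted_dlco_predicted(dlco_pred: float, hb_g_dl: float, group: str) -> float:
    """Scale a predicted DLCO for measured haemoglobin (assumed Cotes-type correction).

    group: "male_15_plus" for adolescent and adult males, or
           "female_or_child_under_15" for women and children aged <15 years.
    The correction factor equals 1.0 at Hb 14.6 g/dL (males) and 13.4 g/dL
    (females/children), so normal Hb leaves the predicted value unchanged.
    """
    if group == "male_15_plus":
        k = 10.22
    elif group == "female_or_child_under_15":
        k = 9.38
    else:
        raise ValueError(f"unknown group: {group!r}")
    return dlco_pred * (1.7 * hb_g_dl) / (k + hb_g_dl)
```

With anaemia the adjusted predicted value falls, so a low measured DLCO may be less abnormal relative to the Hb-adjusted reference than relative to the unadjusted one.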
Special considerations for lung volumes
The overall recommendations for reference equations also apply to the interpretation of lung volumes. GLI [12] and other reference equations for lung volumes adjust for height but not weight. However, lung volumes can be affected by obesity, with significant reductions in functional residual capacity (FRC) and expiratory reserve volume (ERV) at body mass index (BMI) >30 kg·m−2 [37, 38], and similar findings in children and adolescents when obesity is defined as a BMI >97th percentile [39]. In extreme obesity, both obstructive and restrictive ventilatory impairment patterns are seen [40]. Nonetheless, measured lung volumes for the majority of obese individuals still fall within the normal range, and total lung capacity (TLC) is usually not reduced until BMI >40 kg·m−2 [37]. The typical patterns of obstruction and restriction may be altered in obesity; thus, in the context of obesity, results outside the normal range need to be interpreted with greater uncertainty [41]. Lung volume measurements are also affected by pregnancy, and results need to be interpreted cautiously both during pregnancy and in the post-partum period [42].
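The adult BMI thresholds quoted above can be turned into report flags. This is a minimal sketch, assuming adult patients only (in children and adolescents obesity is defined by BMI percentile, which is not handled here); the function names and flag wording are hypothetical and only encode the >30 and >40 kg·m−2 thresholds from the text.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def obesity_interpretation_flags(bmi_value: float) -> list[str]:
    """Hypothetical caution flags for adult lung volume interpretation."""
    flags = []
    if bmi_value > 30:
        # FRC and ERV are often reduced above this threshold.
        flags.append("BMI >30: FRC/ERV may be reduced; interpret abnormal results with greater uncertainty")
    if bmi_value > 40:
        # TLC is usually preserved until this threshold.
        flags.append("BMI >40: TLC may be reduced by obesity alone")
    return flags
```

For example, a 120 kg adult who is 1.70 m tall has a BMI of about 41.5 kg·m−2 and would receive both flags.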
Practical considerations
PFT reports must include the reference equations applied for each index [9]. Caution should be applied to interpretation of results where different reference equations or combinations of reference equations are used for each test (or indices) as there may be differences in the healthy populations used to derive the equations. A change in reference equations must be clearly documented and communicated, as an individual's results may appear to change based solely on the change in reference equation [34, 43–45]. If reference equations are changed, interpretation of trends should include re-calculation of prior predicted values as well as comparison of raw values to avoid misinterpretation. If standing height, biological sex or ancestral background are not known, the report must clearly state what is assumed.
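Recalculating prior predicted values after a change of reference equation can be sketched as below. The point is that the raw value is unchanged while % predicted shifts purely because the predicted value changed. The predicted values in the usage note are invented for illustration and do not come from any real equation; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    observed: float       # raw measurement, e.g. FEV1 in litres
    predicted_old: float  # predicted value from the previously used equation
    predicted_new: float  # predicted value from the newly adopted equation


def percent_predicted(observed: float, predicted: float) -> float:
    """% predicted = observed x 100 / predicted."""
    return observed * 100 / predicted


def recalculated_trend(visits: list[Visit]) -> list[tuple[float, float, float]]:
    """Return (raw value, % predicted with old equation, % predicted with new equation) per visit."""
    return [
        (v.observed,
         percent_predicted(v.observed, v.predicted_old),
         percent_predicted(v.observed, v.predicted_new))
        for v in visits
    ]
```

With a raw FEV1 of 2.60 L and hypothetical predicted values of 3.00 L (old) and 3.20 L (new), the same measurement is about 86.7% predicted under the old equation but 81.25% under the new one, an apparent "decline" caused only by the equation change.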
Validation of reference equations in individual PFT laboratories with a small sample of healthy individuals (e.g. 100) is not recommended. Differences due to sampling variability alone can be as large as 0.4 z-score (6–9% predicted) even when the same equipment and protocols are used and the sample size is at least 1000 [46].
Limits of normal
The 5th and 95th percentile limits (−1.645 and +1.645 z-score) of the healthy population can be used to identify individuals with unusually low or high results, respectively.
Ideally, limits of normal ought to be based on an individual's pre-disease measure or baseline. Further clinical decision making requires relevant thresholds based on prognosis or clinical risk of adverse outcomes. To date, no satisfactory outcome-based thresholds for lung function have been defined; therefore, careful consideration of an individual's medical and exposure history is necessary when interpreting lung function results against the limits of normal. Further research to establish a comprehensive disease-specific clinical approach to interpretation (not simply relying on whether results are within or outside the normal range) is necessary. It is the consensus of the Task Force that the percentile limits represent a standardised and unbiased approach to identifying values outside the range of expected results from a normal population.
A reference range represents the distribution of values that are expected in a healthy population, and the lower limit of normal (LLN) represents a cut-off to define results that are outside the range of values typically observed in health. This approach is used for many clinical outcomes in medicine [47–49]. Population-defined z-scores or percentile values describe the chance that the observed result falls within the distribution of values in healthy individuals (figure 2). At the 5th percentile (corresponding to a z-score of −1.645), there is a 5% chance that the result in a healthy individual would be at or below this level, as shown in figure 2. At the 1st percentile, there would be a 1% chance. Since low values are typically considered abnormal for spirometry, it has become standard to define the LLN as the 5th percentile, accepting that 5% of healthy individuals will have a false-positive result (i.e. be incorrectly classified as having an abnormal result). The 5th percentile represents a trade-off between incorrectly classifying a low value in a healthy individual and missing a clinically significant reduction in lung function (i.e. increased sensitivity for less specificity compared with using a lower percentile). For tests that may be outside the normal range in either direction (e.g. lung volumes or DLCO), the potential for false positives increases to 10%. However, the probability of a false positive is lower in an individual for whom these tests are requested because of concern for lung disease, since there is a higher likelihood (pre-test probability) that lung function will be outside the normal range [50]. The LLN does not necessarily indicate a pathophysiological abnormality, nor is it a clinically meaningful threshold to diagnose disease. It provides an indication of whether the observed result can be expected in otherwise healthy individuals of similar age, sex and height.
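To make the z-score and percentile concepts concrete, the sketch below computes a z-score, its percentile and the LLN/ULN from LMS parameters (L: skewness, M: median or predicted value, S: coefficient of variation), the parameterisation used by the GLI equations. The parameter values in the usage note are hypothetical, not GLI coefficients, and the helper names are illustrative. The ±1.645 limits follow the recommendation above; whether a value exactly at the limit counts as abnormal is a reporting convention, and here it is treated as within range.

```python
import math
from statistics import NormalDist


def lms_zscore(measured: float, L: float, M: float, S: float) -> float:
    """z-score of a measurement given LMS parameters."""
    if L == 0:
        return math.log(measured / M) / S
    return ((measured / M) ** L - 1) / (L * S)


def lms_value_at_z(z: float, L: float, M: float, S: float) -> float:
    """Measurement corresponding to a z-score (z = -1.645 gives the LLN, +1.645 the ULN)."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1 + L * S * z) ** (1 / L)


def percentile(z: float) -> float:
    """Percentile of the standard normal distribution at z."""
    return 100 * NormalDist().cdf(z)


def classify(z: float, limit: float = 1.645) -> str:
    if z < -limit:
        return "below LLN"
    if z > limit:
        return "above ULN"
    return "within normal range"
```

With hypothetical parameters L = 1, M = 3.0 L and S = 0.12, a measured value of 2.4 L gives z ≈ −1.67 (about the 4.8th percentile) and is classified as below the LLN, which is about 2.41 L.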
A result within the expected range for a subject does not exclude the presence of a disease process impairing function. For example, a drop from the 95th percentile to the 10th percentile is a very significant change but still leaves lung function within the normal limits.
The LLN need not be the 5th percentile. With adequate supporting evidence, the LLN could be adjusted lower when PFTs are performed in the absence of elevated risk (e.g. screening the general population). For example, when screening a general population, a more conservative lower limit of 2.5% (−1.96 sd or z-score) or even 1% (−2.326 sd or z-score) will reduce the number of false positives. The specific LLN that is used must be clearly documented in PFT reports. Results that are close to the LLN should be interpreted with caution and considered in the context of the individual patient's medical history, physical findings and pre-test probability of disease. This further emphasises that the person interpreting PFTs should be informed of the patient's context and not solely rely on the numbers generated in reports.
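The percentile-to-z conversions used above, and the corresponding expected false-positive rates, follow directly from the standard normal distribution. This short sketch uses Python's `statistics.NormalDist`; the helper names are illustrative. It also quantifies the earlier example: a fall from the 95th to the 10th percentile is a drop of almost 3 z-scores, yet the result remains within normal limits.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_percentile(p: float) -> float:
    """z-score corresponding to the p-th percentile (0 < p < 100)."""
    return STD_NORMAL.inv_cdf(p / 100)


def false_positive_rate(z_limit: float, two_sided: bool = False) -> float:
    """Expected fraction of healthy people flagged by a limit of |z| > z_limit."""
    one_tail = STD_NORMAL.cdf(-abs(z_limit))
    return 2 * one_tail if two_sided else one_tail


for p in (5, 2.5, 1):
    print(f"{p}th percentile: z = {z_for_percentile(p):.3f}")

# One-sided LLN only (e.g. spirometry) versus LLN and ULN (e.g. lung volumes, DLCO).
print(f"{false_positive_rate(1.645):.3f}", f"{false_positive_rate(1.645, two_sided=True):.3f}")

# Size of a fall from the 95th to the 10th percentile, in z-score units.
print(f"{z_for_percentile(95) - z_for_percentile(10):.2f}")
```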
The widely used cut-offs of 80% of predicted for FEV1 (% predicted=observed×100/predicted) and the 0.70 cut-off for the FEV1/FVC ratio are strongly discouraged [51]. Percent predicted does not take into account the observed age-related changes in measurement variability (figure 3). These “rules of thumb” only approximate the LLN in the mid-range of age, where screening or case finding for obstructive disease is most likely to be conducted (figure 4). The simplicity of these cut-offs has resulted in their use across the age spectrum, leading to systematic misinterpretation of results, particularly for women, children and older adults [52, 53]. For example, the LLN for FEV1 varies from 81% predicted at the age of 10 years to 68% predicted at the age of 85 years (figure 3 and table 3).
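Why a fixed % predicted cut-off cannot track the LLN follows from the LMS model: expressed as a percentage of predicted (the median M), the LLN depends only on L and S, and S (the coefficient of variation) changes with age. The sketch below uses hypothetical L and S values, not GLI coefficients, to show both directions: the LLN in % predicted moves as S changes, and a fixed 80% predicted cut-off corresponds to different z-scores.

```python
import math


def lln_percent_predicted(L: float, S: float, z_lln: float = -1.645) -> float:
    """LLN as % of predicted (median M) under the LMS model; independent of M."""
    if L == 0:
        return 100 * math.exp(S * z_lln)
    return 100 * (1 + L * S * z_lln) ** (1 / L)


def z_at_percent_predicted(pct: float, L: float, S: float) -> float:
    """z-score equivalent of a fixed % predicted cut-off (e.g. 80%)."""
    ratio = pct / 100
    if L == 0:
        return math.log(ratio) / S
    return (ratio ** L - 1) / (L * S)


for S in (0.10, 0.15, 0.20):  # hypothetical coefficients of variation
    print(f"S={S:.2f}: LLN = {lln_percent_predicted(1.0, S):.1f}% predicted, "
          f"80% predicted = z {z_at_percent_predicted(80, 1.0, S):.2f}")
```

With L = 1, an S of 0.10 places the LLN at about 83.6% predicted (80% predicted would then be z = −2.0), whereas an S of 0.20 places it at about 67.1% predicted (80% predicted would then be only z = −1.0).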
The limits of normal derived from data collected in healthy individuals represent a cross-sectional snapshot of an otherwise healthy population, and the range of values does not represent the ideal lung growth and development expected under optimal social and environmental conditions. Therefore, neither simple cut-offs nor the 5th percentile should be used as absolute diagnostic criteria, as risk increases gradually the further a value lies from the range observed in health (figure 2). There is considerable overlap between the ranges of values in health and disease, resulting in a "zone of uncertainty" (figure 5). Early-life exposures and cumulative environmental exposures have negative effects on lung growth and development that predispose individuals to lung disease in later life [54, 55]. For some ventilatory impairments, the development of airflow obstruction is characterised by a slowly progressive decline in FEV1 relative to FVC [56], and it is likely that early stages of airflow obstruction will be present before the FEV1/FVC value falls below the LLN.
Future directions
There is an urgent need to develop more precise and individualised ways to define what normal lung function should be under ideal growth and environmental conditions. There is a need to understand the factors that contribute to population differences and environmental influences in lung function, and the impact of using ethnic-specific equations on clinical decisions in populations around the world. There is also a need for data to better define the relationship between risk factors, lung function and outcomes that would allow a shift from the interpretive dichotomy of normal/abnormal to a more realistic probability assessment as lung function declines through lower percentiles or z-scores.
Bronchodilator responsiveness testing
Changes in FEV1 and FVC following bronchodilator responsiveness (BDR) testing should be expressed as the percent change relative to the individual's predicted value. A change >10% of the predicted value indicates a positive response.
When clinically indicated, the BDR test assesses the change in respiratory function in response to bronchodilator administration. The BDR result reflects the integrated physiological response of airway epithelium, nerves, mediators and airway smooth muscle, along with structural and geometric factors that affect airflow in the conducting airways [3, 57–59]. The choice of bronchodilator, dose and mode of delivery is a clinical decision. The relative merits of different protocols (e.g. delivered dose) are unclear. Recommended BDR protocols are included in the 2019 ATS/ERS spirometry standard [5]. The concept of a response to bronchodilators must not be confused with “reversibility” of airflow obstruction, which is a qualitative term reflecting the normalisation of FEV1/FVC (and hence airflow obstruction) after bronchodilator administration [60]. Here we address how to interpret acute changes in lung function after bronchodilator administration and do not consider how BDR can be used to make diagnostic or clinical decisions.
Expressing the results of a BDR test
Interpretation of BDR can employ two approaches: 1) the upper limit of the changes expected in a healthy population or 2) a threshold at which a clinically meaningful event occurs. The upper limit of the changes expected in a healthy population may not be clinically relevant [61]. Although data are limited for clinically meaningful thresholds across a range of diseases and age groups, there is evidence related to survival to support a threshold-based approach [27, 57, 59, 62, 63]. In over 4000 patients referred for BDR in a hospital laboratory, those with BDR >8% of predicted FEV1 had a lower subsequent mortality than those with BDR below this threshold [62]. Thus, a threshold approach that is supported by both methods (i.e. the percentage of predicted value threshold) should be used until further data are available [27].
Established methods to assess the change in FEV1 and FVC after administration of a bronchodilator include: 1) an absolute change from the initial value, 2) a relative change related to the initial value or 3) a change related to the individual's predicted value, or a combination of these options. The combination of an absolute and relative change (percentage change) in FEV1 and FVC from baseline as evidence of BDR was recommended in the 2005 ATS/ERS interpretation statement (i.e. >200 mL AND ≥12% increase in FEV1 and/or FVC) [3]. The major limitation to this approach is that the absolute and relative changes in FEV1 and FVC are inversely proportional to baseline lung function, and are associated with height, age and sex in both health and disease [57, 59, 62–64]. The use of approaches 1) and 2) to define BDR are no longer recommended.
We recommend reporting the change in FEV1 or FVC as the increase relative to the predicted value, which minimises sex and height differences in assessing BDR [57, 59, 62]. Two studies of collated epidemiological data in healthy adults reported the upper limit (95th percentile) of the bronchodilator response in healthy individuals to be 11.6% and 10.1% of predicted for FEV1, and 10.2% and 9.6% of predicted for FVC [59, 62]. Similar changes of 8.5% for FEV0.75 have been reported in young children [65]. BDR in FVC, rather than FEV1, has been shown to better reflect the physiological processes of air trapping [66–70]. Based on these considerations, it is recommended that BDR be classified as a change of >10% relative to the predicted value for FEV1 or FVC (see box 1 for an example calculation). This approach avoids misinterpretation due to the magnitude of the baseline lung function level. Over-reliance on strict cut-offs for BDR should be avoided, as these cut-offs are prone to the same limitations as limits of normal. Importantly, this is not equivalent to a 10% change between pre- and post-bronchodilator measurements.
BOX 1 Determination of a bronchodilator response
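$$\text{Bronchodilator response (\% predicted)} = \frac{\text{post-bronchodilator value} - \text{pre-bronchodilator value}}{\text{predicted value}^{\#}} \times 100$$
A change of >10% is considered a significant bronchodilator response. #: predicted value should be determined using the appropriate Global Lung Function Initiative (GLI) spirometry equation.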
For example, a 50-year-old male, height 170 cm, has a pre-bronchodilator forced expiratory volume in 1 s (FEV1) of 2.0 L and a post-bronchodilator FEV1 of 2.4 L. The predicted FEV1 is 3.32 L (GLI 2012 "other" equation). Therefore, his bronchodilator response is reported as an increase of 12.0% of his predicted FEV1 ((2.4 − 2.0) × 100 / 3.32) and classified as a significant response.
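The box 1 calculation can be sketched in a few lines, alongside the former 2005 criterion for comparison. This is an illustrative sketch; the function names are hypothetical, and the 2005 rule is coded exactly as quoted in this section (>200 mL AND ≥12% from baseline), which is no longer recommended.

```python
def bdr_percent_predicted(pre: float, post: float, predicted: float) -> float:
    """Bronchodilator response as the change relative to the predicted value (%)."""
    return (post - pre) * 100 / predicted


def bdr_positive_current(pre: float, post: float, predicted: float, threshold: float = 10.0) -> bool:
    """Recommended criterion: change >10% of the predicted value."""
    return bdr_percent_predicted(pre, post, predicted) > threshold


def bdr_positive_2005(pre: float, post: float) -> bool:
    """Former criterion (no longer recommended): >200 mL AND >=12% of baseline."""
    change = post - pre
    return change > 0.200 and change * 100 / pre >= 12
```

For the box 1 example, `bdr_percent_predicted(2.0, 2.4, 3.32)` is about 12%, a positive response under both rules. The rules diverge when baseline lung function is near normal: with a pre-bronchodilator FEV1 of 3.0 L, a post-bronchodilator FEV1 of 3.35 L and a predicted value of 3.32 L, the change is about 10.5% of predicted (positive), but only about 11.7% of baseline (negative under the 2005 rule).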
Changes in forced expiratory flows (e.g. peak expiratory flow (PEF) or forced expiratory flow at 25–75% of FVC (FEF25–75%)) are highly variable and significantly influenced by changes in FVC, such that pre- and post-bronchodilator measurements are not comparable [3].
Future directions
The recommended BDR threshold balances the available data and consistency across age groups. There were limited data in children and young adults to inform recommendations; further evidence is needed to validate this approach in children. Future research is also needed to understand the impact of bronchodilator protocols (e.g. delivered dose) on results. The ability of an acute response to bronchodilators to predict future clinical status other than survival is unclear and BDR does not accurately differentiate between types of airway diseases [71–73]. Further evidence is needed to support anchor-based approaches associated with outcomes other than survival. Finally, there are limited data regarding changes in pulmonary function indices derived from lung volumes, gas transfer and airway resistance following bronchodilator administration.
Natural changes in lung function over time
There are limited data to support a single recommendation for interpreting PFT reproducibility. Two distinct approaches were identified to express natural changes in lung function: conditional change scores for children and FEV1Q for adults.
Interpreting a series of lung function measurements and identifying meaningful changes over time are often used to guide clinical decisions. Ideally, an individual's pre-disease measure of lung function or baseline should be used as a reference. Comparison with the rate of decline observed in a group of healthy individuals can help to determine whether an individual's rate of decline is greater than can be expected in health. Accelerated lung function decline, irrespective of baseline lung function, is associated with poor clinical outcomes [74, 75]. Interpretation of serial measurements relies on accurate limits of reproducibility of a PFT index, including the natural changes over time and the changes that would be considered outside normal biological variability over both short and long periods of time.
Reproducibility
Previous recommendations define a meaningful change as one greater than the biological variability (and measurement error) of a test. An absolute change in FEV1 (e.g. 100 mL) or the relative change from a previous assessment (e.g. a 10% change in FEV1 from baseline in healthy individuals) has historically been used to indicate clinically meaningful changes. However, changes over time have been demonstrated to be dependent on age, sex, baseline lung function and disease severity, limiting the generalisability of these approaches [76, 77]. Furthermore, these limits were derived from population data in healthy individuals and do not necessarily reflect clinically meaningful outcomes for a specific disease or condition [78].
A visual representation of serial measurements (e.g. a trend graph) may be included as part of a PFT report. A decline in lung function observed from multiple measurements over time is more likely to reflect a real change in lung function than two measurements alone.
Considerations in children
Lung function measurements in children are more variable than in adults. This is due both to the physiology of the chest wall muscles and to cognitive development, which may influence test quality and biological variability. Interpretation of serial measurements during periods of rapid growth and development (e.g. adolescence and early adulthood) requires special attention to avoid misinterpreting the normal plateau of lung growth as a decline. Absolute values should be examined to verify any apparent “decline” during this period. Generally, limits of reproducibility applied in children are extrapolated from studies in adults and do not consider the unique developmental aspects of childhood, including the fact that somatic and lung growth are not always synchronous. We identified one recently published study demonstrating that conditional change scores can be used to identify changes in lung function greater than what can be expected in healthy children and young people [77]. The conditional change score adjusts for longitudinal changes in FEV1 z-score and is conditioned on the initial FEV1 value (see box 2). This concept has yet to be validated, extended to adults or applied to other lung function indices, but it may be a reasonable tool to facilitate interpretation.
BOX 2 Calculation of a conditional change score
The change score is defined as:
$$\text{change score} = \frac{zFEV_{1,t_2} - r \times zFEV_{1,t_1}}{\sqrt{1 - r^{2}}}$$
where zFEV1 at t1 and t2 are the observed forced expiratory volume in 1 s (FEV1) z-scores at the initial and second time-point, and r is defined as 0.642 − 0.04 × time (years) + 0.020 × age (years) at t1, with time being the interval between the two tests. Changes within ±1.96 change scores are considered within the normal limits.
For example, a 14-year-old male (170 cm) with a lung function drop from −0.78 z-score (90.6% predicted) to −1.60 z-score (80.6% predicted) within 3 months (r=0.912) has a corresponding change score of −2.17, which is outside the limits of normal. The same drop over a period of 4 years (r=0.762) corresponds to a change score of −1.55, which is within the limits of normal variability.
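The Box 2 calculation can be sketched in a few lines of Python. This is a minimal illustration, not a validated tool: it assumes that "time" in the expression for r is the interval between the two tests in years (as in the worked example), the function names are hypothetical, and the guard on r is an added safeguard because the expression was derived in children and young people and can reach or exceed 1 outside that range.

```python
import math


def conditional_r(interval_years: float, age_years: float) -> float:
    """Correlation between successive FEV1 z-scores, per the Box 2 expression.

    interval_years: time between the two tests; age_years: age at the first test.
    """
    return 0.642 - 0.04 * interval_years + 0.020 * age_years


def conditional_change_score(z1: float, z2: float, r: float) -> float:
    """Change score = (z2 - r * z1) / sqrt(1 - r**2)."""
    if not -1.0 < r < 1.0:
        # Added safeguard (hypothetical): the formula is undefined for |r| >= 1.
        raise ValueError(f"r must lie strictly between -1 and 1, got {r:.3f}")
    return (z2 - r * z1) / math.sqrt(1.0 - r ** 2)


def within_normal_limits(score: float, limit: float = 1.96) -> bool:
    """Changes within +/-1.96 change scores are considered normal."""
    return -limit <= score <= limit


# Box 2 example: a 14-year-old whose FEV1 z-score falls from -0.78 to -1.60
for interval in (0.25, 4.0):
    r = conditional_r(interval, age_years=14)
    score = conditional_change_score(-0.78, -1.60, r)
    print(f"interval {interval} y: r = {r:.3f}, change score = {score:.2f}, "
          f"within normal limits: {within_normal_limits(score)}")
```

Run as written, this reproduces the two Box 2 results: a change score of about −2.17 over 3 months (outside the limits) and about −1.55 over 4 years (within the limits).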
Considerations in adults
In adults over the age of 25 years, FEV1 typically declines in healthy non-smokers by 30 mL per year [79, 80]; however, this does not necessarily translate into a threshold of change that can be expected within an individual between two repeated measurements. In occupational medicine, where repeated measurements are made annually (or further apart), a 15% threshold has been proposed as a change outside the biological variability of the test and considered clinically relevant [80]. These limits would not necessarily apply to an individual with a chronic progressive lung disease where the follow-up interval is shorter. Individualised approaches that consider the test quality, time interval between tests, an individual's baseline lung function as well as the clinical findings at the time of measurement are needed for accurate interpretation.
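Why a population average decline does not translate into an individual threshold can be illustrated with a short calculation. This sketch assumes that the proposed 15% threshold is applied as a relative fall from the baseline FEV1, and it uses the average decline of 30 mL per year quoted above for healthy adult non-smokers; the function names and the example values are hypothetical.

```python
def percent_change(baseline_l: float, follow_up_l: float) -> float:
    """Relative change in FEV1 from baseline, in percent."""
    return 100.0 * (follow_up_l - baseline_l) / baseline_l


def fall_exceeds_threshold(baseline_l: float, follow_up_l: float,
                           threshold_pct: float = 15.0) -> bool:
    """True if FEV1 has fallen by more than threshold_pct percent of baseline."""
    return -percent_change(baseline_l, follow_up_l) > threshold_pct


def average_healthy_decline_l(years: float, rate_l_per_year: float = 0.030) -> float:
    """Average FEV1 loss (L) expected in healthy adult non-smokers over `years`."""
    return years * rate_l_per_year


# Hypothetical adult: FEV1 3.50 L at baseline and 3.20 L five years later
baseline, follow_up, years = 3.50, 3.20, 5.0
print(f"change: {percent_change(baseline, follow_up):.1f}%")
print(f"exceeds 15% threshold: {fall_exceeds_threshold(baseline, follow_up)}")
print(f"observed fall: {baseline - follow_up:.2f} L, "
      f"average healthy fall: {average_healthy_decline_l(years):.2f} L")
```

In this hypothetical case the fall stays within the 15% limit even though it is twice the average healthy decline over the same period, which is one reason individualised interpretation is needed.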
An alternative approach is FEV1Q, i.e. FEV1 divided by the sex-specific 1st percentile of absolute FEV1 values found in adults with lung disease (0.4 L for women and 0.5 L for men) [81]. FEV1Q expresses FEV1 in relation to a “bottom line” required for survival, rather than as a distance from the individual's predicted value. Under normal circumstances 1 unit of FEV1Q is lost approximately every 18 years, and about every 10 years in smokers and the elderly (see box 3). Over a short interval, or even annually, FEV1Q should therefore remain stable; a change in FEV1Q may indicate a precipitous change in lung function and can be used as an alternative approach to gauge meaningful changes over time in adults. FEV1Q is not appropriate for children and adolescents.
BOX 3 Calculation of FEV1Q in adults
FEV1Q is the observed forced expiratory volume in 1 s (FEV1) in litres divided by the sex-specific 1st percentile of the FEV1 distribution found in adult subjects with lung disease; these percentiles are 0.5 L for males and 0.4 L for females. The index approximates the number of turnovers remaining of a lower survivable limit of FEV1.
For example, a 70-year-old woman with an FEV1 of 0.9 L would have an FEV1Q of 0.9 L/0.4 L or 2.25. Values closer to 1 indicate a greater risk of death.
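The Box 3 calculation, together with a simple annualised change between two visits, might be sketched as follows. The dictionary of sex-specific divisors comes from Box 3; the helper names and the two-visit example are hypothetical and only illustrate how a change in FEV1Q could be expressed per year for comparison with the roughly 1 unit per 18 years lost in health.

```python
# Sex-specific 1st percentile of FEV1 (L) in adults with lung disease (Box 3)
FEV1_FIRST_PERCENTILE_L = {"male": 0.5, "female": 0.4}


def fev1q(fev1_l: float, sex: str) -> float:
    """FEV1Q = observed FEV1 (L) / sex-specific 1st percentile FEV1 (L)."""
    try:
        floor_l = FEV1_FIRST_PERCENTILE_L[sex]
    except KeyError:
        raise ValueError(f"sex must be 'male' or 'female', got {sex!r}") from None
    return fev1_l / floor_l


def annual_fev1q_change(q_first: float, q_second: float, years: float) -> float:
    """Hypothetical helper: FEV1Q change per year between two visits."""
    return (q_second - q_first) / years


print(fev1q(0.9, "female"))  # Box 3 example: 2.25

# Hypothetical woman whose FEV1 falls from 1.6 L to 1.2 L over 2 years
q1, q2 = fev1q(1.6, "female"), fev1q(1.2, "female")
print(f"FEV1Q {q1:.2f} -> {q2:.2f}: {annual_fev1q_change(q1, q2, 2.0):.2f} units/year")
```

A loss of about 0.5 units per year in this hypothetical case is far faster than the roughly 0.06 units per year (1 unit per 18 years) described above for healthy adults.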
Further research
There is a paucity of data describing natural variability in lung function indices within an individual over time across all ages, PFTs and disease groups [82]. Future work is urgently needed to identify a minimum clinically important difference for each lung function test and index that is anchored to disease-specific outcomes. Research addressing short-term (months), annual and long-term (years) changes in healthy individuals is equally needed. Disease-specific anchor-based approaches that link to clinically meaningful end-points are strongly recommended to define appropriate thresholds for clinical interpretation.
Severity of lung function impairment
A three-level system to assess the severity of lung function impairment using z-scores should be used: z-scores above −1.645 (the LLN) are normal, z-scores from −1.645 to −2.5 indicate mild impairment, z-scores below −2.5 down to −4 indicate moderate impairment, and z-scores below −4 indicate severe impairment.
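The grading rule can be expressed as a small function. This is a sketch only: the cut points are written here as contiguous intervals (normal above −1.645, mild down to −2.5, moderate down to −4, severe below −4), which is an assumption about how values falling between the rounded cut points (e.g. between −4 and −4.1) should be handled. The function name is hypothetical.

```python
LLN_Z = -1.645  # lower limit of normal: 5th percentile of the healthy population


def severity_from_z(z: float) -> str:
    """Three-level severity grade for a lung function z-score."""
    if z > LLN_Z:
        return "normal"
    if z >= -2.5:
        return "mild"
    if z >= -4.0:
        return "moderate"
    return "severe"


for z in (-0.5, -2.0, -3.2, -4.6):
    print(f"z = {z:+.2f}: {severity_from_z(z)}")
```

Because the grades sit on a continuum, a result close to a cut point (e.g. −2.45 versus −2.55) should be reported with the caution discussed in the text rather than as a sharp change in category.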
The magnitude of lung function deviation from what is expected of healthy individuals, having accounted for age-dependent variability, can be used to determine the association with objective outcomes such as quality of life or mortality [83–87]. The association of FEV1, FVC and DLCO z-scores with all-cause mortality in patients is shown in figure 6 [88]. As lung function impairment is a continuum, setting multiple fixed boundaries to define grades is somewhat artificial and may imply tiered differences that are unfounded.
The previously recommended severity levels for airflow obstruction used percent predicted FEV1 with five levels using cut values of 70%, 60%, 50% and 35% [3]. The use of percent predicted does not give uniform gradations across age [53, 89]. To account for an individual's sex, height, age and ethnic background, the previous severity scale for airflow obstruction was adapted for z-scores with cut values of −2, −2.5, −3 and −4 [88, 90]. The categories between z-scores of −1.65 and −2.5 showed little difference in risk of death and were therefore merged into a “mild” group (figure 7). Individuals with z-scores between −2.51 and −4 exhibit a moderate risk of mortality, and these categories were therefore merged into a single category called “moderate”. The proposed three-level system thus merges the two previous mildest categories into one for mild impairment and extends the moderate level to improve the fit for gradation of mortality risk [88].
Importantly, the severity of lung function impairment is not necessarily equivalent to disease severity, which encompasses quality of life, functional impairment, imaging, etc. Disease severity will be influenced by many other possible clinical features not related to lung function impairment such as anaemia, neuromuscular weakness or drug side-effects, to mention just a few. There are numerous questionnaires designed and validated to assess the severity of symptoms and impairment [91–94]; these are outside the scope of this work. In addition, the association between the proposed gradations and survival in children has not been evaluated.
Rationale for z-scores
z-scores express how far an observed lung function value is from the predicted value after accounting for sex, age, height and ancestral grouping, expressed in standard deviations. This is the method recommended for determining the limit of normality and for stating the degree of lung function impairment. Percentile values are easily derived from z-scores, and explicitly indicate the probability a healthy individual would have a result below this level and where the individual's result lies in relation to the healthy population. Percentile values are useful in assessing results around the normal range but are less useful for extreme values.
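As an illustration of how a z-score, its percentile and the LLN relate, the sketch below assumes that the reference equation supplies the three LMS parameters (L for skewness, M for the predicted median, S for the coefficient of variation) for the individual's sex, age and height, as the GLI equations do. The L, M and S values in the example are hypothetical, not taken from any published equation, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def lms_z_score(observed: float, L: float, M: float, S: float) -> float:
    """z = ((observed / M) ** L - 1) / (L * S); the log form is used when L == 0."""
    if L == 0:
        return math.log(observed / M) / S
    return ((observed / M) ** L - 1.0) / (L * S)


def lms_lln(L: float, M: float, S: float, z_lln: float = -1.645) -> float:
    """Value at the lower limit of normal (5th percentile by default)."""
    if L == 0:
        return M * math.exp(S * z_lln)
    return M * (1.0 + L * S * z_lln) ** (1.0 / L)


def percentile_from_z(z: float) -> float:
    """Percentage of the healthy reference population expected below z."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical LMS values for one individual's FEV1 (litres)
L, M, S = 1.1, 3.2, 0.12
observed = 2.5
z = lms_z_score(observed, L, M, S)
print(f"z = {z:.2f}, percentile = {percentile_from_z(z):.1f}, "
      f"LLN = {lms_lln(L, M, S):.2f} L, percent predicted = {100 * observed / M:.0f}%")
```

Because the percentile is a monotonic transform of the z-score, both classify results identically against the LLN; the percentile simply compresses differences in the tails, which is why it is less informative for extreme values.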
T-scores are similar to z-scores but are expressed in the number of standard deviations an observation is below a maximum predicted value achieved during early adulthood for an individual of the same sex, height and ancestral grouping [95]. However, T-scores assume that population level maximum lung function can be maintained throughout adulthood. Furthermore, T-scores cannot be applied to children and young adults.
Assessing severity of impairment using z-scores is more consistent across age and sex than percent predicted [88, 90]. Figure 7 shows the previously recommended categories for airflow obstruction using percent predicted (i.e. 70%, 60%, 50% and 35% defining mild, moderate, moderately severe, severe and very severe) for eight different people at their respective z-score values. Older age has the greatest differences in interpretation between percent predicted and z-score cut-points such that the 80-year-old individual is deemed to have a mild impairment using percent predicted thresholds when their lung function is within the normal range using z-scores. Figure 7 shows that percent predicted creates problems in equitable grading with mild impairment, but z-scores have problems with respect to severe grading in older subjects as many older individuals will be classified as severe.
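The age inconsistency of percent-predicted grading can be reproduced with a toy calculation. The sketch below assumes an LMS-type reference equation, under which observed/M = (1 + L×S×z)^(1/L), so the same z-score maps to a lower percent predicted when the coefficient of variation S is larger. The two S values are hypothetical stand-ins for a younger and an older adult, and the treatment of values falling exactly on the 70/60/50/35% cut points is an assumption.

```python
import math


def percent_predicted_from_z(z: float, L: float, S: float) -> float:
    """Percent predicted implied by a z-score under an LMS model."""
    if L == 0:
        return 100.0 * math.exp(S * z)
    return 100.0 * (1.0 + L * S * z) ** (1.0 / L)


def grade_2005_percent_predicted(fev1_pp: float) -> str:
    """Previous five-level FEV1 percent-predicted scale (cut points 70/60/50/35%)."""
    if fev1_pp >= 70:
        return "mild"
    if fev1_pp >= 60:
        return "moderate"
    if fev1_pp >= 50:
        return "moderately severe"
    if fev1_pp >= 35:
        return "severe"
    return "very severe"


# Same z-score, two hypothetical coefficients of variation (L fixed at 1)
z = -3.0
for label, S in (("younger adult", 0.11), ("older adult", 0.18)):
    pp = percent_predicted_from_z(z, 1.0, S)
    print(f"{label}: z = {z}, {pp:.0f}% predicted -> {grade_2005_percent_predicted(pp)}")
```

With these assumed values, an identical z-score of −3 is graded "moderate" by percent predicted in one case and "severe" in the other, mirroring the inconsistency shown in figure 7.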
Other approaches
In adults, FEV1Q has been found to be better than z-scores, percent predicted and FEV1 standardised by powers of height (e.g. FEV1·Ht⁻² and FEV1·Ht⁻³) in predicting survival [81, 96, 97], chronic obstructive pulmonary disease (COPD) exacerbations [98] and adverse health outcomes [99]. There is also evidence that the FEV1Q approach may be more useful to differentiate lung function impairment within the “severe” group and in older adults [81, 96, 97], but FEV1Q has not been adequately explored in children and adolescents.
Considerations in the elderly
Reference equations for lung function indices represent the range of values expected in healthy individuals of the same sex, height and age. The number of healthy individuals over age 80 years in reference cohorts is smaller and may represent a selected population of survivors. In older individuals, interpreting lung function as an absolute measure, such as FEV1Q, may be more meaningful than using reference equations. There is evidence that extrapolating predicted values from a younger age may address some of these issues [32, 33]. Nonetheless, interpretation at the extremes of the age and/or height ranges has greater uncertainty and requires careful consideration.
Future directions
Assessing the severity of lung function reduction should be linked to important clinical outcomes (survival, exacerbations, admissions, symptoms, imaging, etc.) which may be disease-specific. FEV1Q and other reference-free indices should be explored in this way. FEV1Q highlights that survival better relates to how far the FEV1 is above a “survivable bottom line” rather than how far it has dropped from a predicted value. Simpler grading with fewer tiers as proposed should be investigated for a broader range of lung function indices and for different diseases in both children and adults.
Classification of physiological impairments by PFTs
The interpretation of PFTs should focus on values of airflow, lung volume and gas transfer measurements to recognise patterns of altered physiology. PFTs alone should not be used to diagnose a specific pathological condition.
PFT interpretations should be clear, concise and informative to help understand whether the observed result is normal and, if not, what type of physiological impairment is likely involved. In addition, repeated assessment of PFTs is important to detect clinically meaningful deviations from an individual's previous results. Here, we will review the interpretation of measurements made by spirometry, lung volumes and DLCO as they relate to underlying pathophysiology.
Routine PFTs address three functional properties of the lungs: 1) airflow (inspiratory and expiratory), 2) lung volumes and capacities (TLC, residual volume (RV) and FRC), and 3) alveolar–capillary gas transfer (measurement of carbon monoxide uptake over time), expressed as the transfer factor of the lung for carbon monoxide (TLCO), also known as the diffusing capacity of the lung for carbon monoxide (DLCO). Abnormalities in these three functional properties are conventionally classified as obstructive ventilatory, restrictive ventilatory and gas transfer limitations or impairments (table 4).
Ventilatory impairments defined by spirometry
Airflow limitation and airflow obstruction
Expiratory airflow is generally assessed by spirometry, with the most important indices being FEV1, FVC and the FEV1/FVC ratio. In normal lungs, airflow is determined by the magnitude of expiratory driving pressure (expiratory muscles and elastic recoil) and the size and viscoelastic properties of the lungs and airways. Maximal airflow may be limited by different disease processes, with different spirometric consequences: 1) impaired expiratory muscle function (weakness or poor effort; neuromuscular ventilatory impairment), reduced elastic recoil or reduced chest wall expansion, which reduce PEF, FEV1 and FVC, with a variable FEV1/FVC ratio; 2) physical obstruction of a central airway (i.e. outside of the lung parenchyma), which can affect the trachea/major bronchi and leads to a disproportionate reduction in PEF compared to FEV1, with a variable FEV1/FVC ratio; and 3) intrapulmonary airflow obstruction produced by premature airway collapse, bronchoconstriction or airway inflammation/wall thickening/oedema leading to airway narrowing. These obstructed airways reduce PEF and FEV1 to a much greater extent than FVC, so the FEV1/FVC ratio is characteristically low [100–102].
While we recognise the normal physiological events involved in expiratory “airflow limitation”, we use the term “airflow obstruction” to refer to pathological reduction in airflow from the lungs that leads to a reduced FEV1/FVC ratio.
An obstructive ventilatory impairment is defined by FEV1/FVC (or VC) below the LLN, which is defined as the 5th percentile of a normal population (figure 8 and table 5). This spirometric definition of airflow obstruction is consistent with the 1991 ATS [103] and 2005 ATS/ERS [3] recommendations; however, it contrasts with the definitions suggested by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) [104] and the ATS/ERS [105] guidelines on COPD, which use a fixed FEV1/FVC value of 0.7 to define an obstructive ventilatory impairment.
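The practical consequence of the two definitions can be shown with two hypothetical individuals. The LLN values below are invented for illustration only (in practice they come from the reference equation for the individual's sex, age and height), and the function names are hypothetical.

```python
def obstructed_by_lln(fev1_fvc: float, fev1_fvc_lln: float) -> bool:
    """Obstruction if FEV1/FVC is below its lower limit of normal (5th percentile)."""
    return fev1_fvc < fev1_fvc_lln


def obstructed_by_fixed_ratio(fev1_fvc: float, cutoff: float = 0.70) -> bool:
    """Fixed-ratio definition used in COPD guidelines."""
    return fev1_fvc < cutoff


# (description, observed FEV1/FVC, hypothetical LLN for that individual)
cases = [
    ("older adult", 0.68, 0.63),
    ("young adult", 0.74, 0.76),
]
for label, ratio, lln in cases:
    print(f"{label}: LLN-based = {obstructed_by_lln(ratio, lln)}, "
          f"fixed 0.7 = {obstructed_by_fixed_ratio(ratio)}")
```

Because the normal FEV1/FVC ratio falls with age, a fixed cut-off tends to label more older individuals, and fewer younger individuals, as obstructed than the LLN does, which is the discordance these two hypothetical cases illustrate.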
The earliest changes associated with respiratory diseases that produce airflow obstruction are thought to occur in the smaller, more distal airways [106]. Since the total cross-sectional area of the small airways is very large, they offer little resistance to airflow at high lung volumes, and impairment limited to these airways has little impact on maximal airflow as measured by FEV1 [107]. However, as exhalation proceeds during a maximal forced expiratory manoeuvre, these smaller airways decrease in calibre, with a marked increase in resistance, which can substantially reduce expiratory flow at lower lung volumes. Loss of elastic recoil due to emphysematous changes in the lung parenchyma also contributes to the reduction in maximal expiratory flow [108]. This results in a slowing of flow in the terminal portion of the spirogram, even when the initial part of the spirogram is barely affected [100–102]. These reductions in late- or mid-expiratory flow are best appreciated by examining the flow–volume loop, where a characteristic concave shape is thought to reflect small airway dysfunction (figure 9b and c), in contrast to the normal flow–volume curve (figure 9a).
A number of attempts have been made to quantify these small airway impairments, especially when FEV1 and FEV1/FVC are normal (“isolated small airway dysfunction”) [109]. A common approach is to measure the average flow between 25% and 75% of exhaled FVC (FEF25–75%); however, mid-range flow measurements during a forced exhalation are highly variable, poorly reproducible and not specific for small airway disease in individuals [110]. Furthermore, mid-range flow measurements usually do not add to clinical decision making beyond information contributed by FEV1, FVC and FEV1/FVC [111]. There is insufficient evidence to support the use of spirometry to identify small airway dysfunction [112]. There has been recent interest in FEV3/FVC [113] or FEV3/FEV6 [114] providing more sensitive indication of airflow obstruction in adults when FEV1/FVC is still in the normal range. Other tests such as oscillometry, multiple-breath washout and imaging may also provide evidence of airflow obstruction when FEV1/FVC is normal [115].
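To make clear what FEF25–75% measures (and why it depends so heavily on where FVC ends), it can be computed from a sampled volume–time curve as the volume between 25% and 75% of FVC divided by the time taken to exhale it. This is a simplified sketch, not a standards-compliant implementation: it assumes monotonically increasing cumulative volumes starting at time zero, takes FVC as the last sample, and applies no back-extrapolation or end-of-test criteria. The function names are hypothetical.

```python
from bisect import bisect_left


def time_at_volume(times: list[float], volumes: list[float], target: float) -> float:
    """Linearly interpolate the time at which cumulative exhaled volume reaches target."""
    i = bisect_left(volumes, target)
    if i == 0:
        return times[0]
    if i == len(volumes):
        raise ValueError("target volume exceeds the recorded volumes")
    v0, v1 = volumes[i - 1], volumes[i]
    t0, t1 = times[i - 1], times[i]
    return t0 + (target - v0) * (t1 - t0) / (v1 - v0)


def fef25_75(times: list[float], volumes: list[float]) -> float:
    """Mean forced expiratory flow (L/s) between 25% and 75% of FVC."""
    fvc = volumes[-1]  # simplification: FVC taken as the final sample
    t25 = time_at_volume(times, volumes, 0.25 * fvc)
    t75 = time_at_volume(times, volumes, 0.75 * fvc)
    return 0.5 * fvc / (t75 - t25)


# Hypothetical curve sampled every 0.5 s (litres)
t = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
v = [0.0, 1.9, 2.7, 3.1, 3.35, 3.6, 3.7, 3.75]
print(f"FVC = {v[-1]:.2f} L, FEF25-75% = {fef25_75(t, v):.2f} L/s")
```

Because the 25% and 75% points are fractions of FVC, any change in the measured FVC (for example a longer exhalation) shifts the segment over which flow is averaged, which contributes to the poor reproducibility noted above.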
Dysanapsis and other patterns of impairment in FEV1, FVC and FEV1/FVC
For healthy individuals, the meaning of a low FEV1/FVC ratio accompanied by FEV1 within the normal range is unclear. This pattern may be due to “dysanaptic” or unequal growth of the airways and lung parenchyma [116]. While this pattern has been thought to be a normal physiological variant [103], new data suggest that it may be associated with a propensity for obstructive lung disease [117, 118]. Factors associated with this pattern in healthy people included male sex, younger age and taller stature, with FVC above predicted and higher terminal flows as seen by FEF75% [119]. A high FVC with a low RV can be seen in this instance (normal FEV1 but low FEV1/FVC). Whether this pattern represents airflow obstruction will depend on the prior probability of obstructive disease and possibly on the results of additional tests, such as BDR, DLCO, gas exchange evaluation, measurement of muscle strength or exercise testing.
The “non-specific” pattern: a low FEV1 and FVC with normal FEV1/FVC
The pattern of reduced FVC and/or FEV1, normal FEV1/FVC and normal TLC has been termed the “non-specific” pattern. This pattern was described in the 2005 ATS/ERS interpretation statement and was thought to relate to airflow occlusion/collapse [3], but we now know that this interpretation was too simple. Indeed, this pattern can reflect reduced effort, a restrictive ventilatory impairment or be an early consequence of small airway disease with air trapping and/or emphysema [120, 121]. However, measurement of a low TLC is necessary to confirm restriction.
In the setting of reduced effort, the non-specific pattern reflects the failure of the individual to inhale or exhale completely, resulting in a “falsely low” FEV1 and FVC. It may also occur when flow is so reduced that the subject cannot exhale long enough to empty the lungs to RV. In this circumstance, the flow–volume curve should appear concave downward towards the end of the manoeuvre. The volume–time curve can also be informative and may help to differentiate glottic closure from a sudden interruption of expiration due to poor effort, or from other causes.
The non-specific pattern may be an early indicator of a restrictive process in which FVC reduction is not yet accompanied by a reduction in RV. A low TLC under these circumstances would confirm restriction. In contrast, in early obstruction, small airway collapse can reduce FVC and increase RV before the FEV1/FVC ratio falls. At 3-year follow-up, the non-specific pattern persisted in two-thirds of people, while the remaining one-third had been diagnosed with overt obstructive or restrictive disease. In current and former smokers when TLC is not available (typically in population-based studies), the non-specific pattern has been labelled “preserved ratio impaired spirometry (PRISm)”, which on follow-up has been shown to be associated with the subsequent development of more typical restrictive or obstructive patterns [122–124]. As with any pattern involving a low FVC, TLC should be measured to confirm restriction, as clinically indicated.
When the non-specific pattern is observed in an individual performing a maximal, sustained effort, it may be useful to repeat spirometry after treatment with an inhaled bronchodilator. Significant improvement in FEV1, FVC or both would suggest the presence of some degree of bronchial responsiveness. Another approach is to compare FVC to an untimed slow vital capacity (SVC). If SVC is significantly larger than FVC (>100 mL [125]), it implies that airway collapse is occurring during the forced exhalation [126].
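The SVC versus FVC comparison reduces to a simple difference check. In this minimal sketch, volumes are handled as whole millilitres to avoid floating-point edge effects at the 100 mL boundary. The function name is hypothetical, and the margin is the >100 mL value cited above.

```python
def svc_exceeds_fvc(svc_ml: int, fvc_ml: int, margin_ml: int = 100) -> bool:
    """True if slow VC exceeds forced VC by more than margin_ml, suggesting
    airway collapse during the forced exhalation."""
    return svc_ml - fvc_ml > margin_ml


# Hypothetical paired measurements (mL)
print(svc_exceeds_fvc(3250, 3100))  # difference of 150 mL
print(svc_exceeds_fvc(3150, 3100))  # difference of 50 mL
```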
Alternative spirometric indices and supplementary tests assessing ventilatory impairments
The use of VC (i.e. the larger of SVC and FVC) in place of FVC in the ratio (i.e. FEV1/VC) was recommended in the 2005 ATS/ERS interpretation document [3]. Using VC in this ratio may be more sensitive, but less specific, than FEV1/FVC for identifying obstruction [127]. The recording of FVC is easier to standardise because there are many ways to record VC, some using different equipment, and VC is very dependent on the preceding flow and volume histories [128]. In health, FVC does not differ significantly from VC [10]. FVC should be used in the ratio because both FEV1 and FVC come from forced expiratory manoeuvres performed with the same equipment, and there are robust reference equations for FEV1/FVC but not for FEV1/VC. Using the previously recommended FEV1/VC to diagnose airflow obstruction will increase the uncertainty about the validity of the diagnosis, especially in the older population.
In adults, FEV6 may be substituted for FVC and appears accurate in diagnosing obstruction [129–133], but this only applies if the appropriate LLNs [134] for FEV1/FEV6 are used (GLI equations do not include FEV6). FEV2 or FEV3 have also been shown to be useful surrogates for the estimation of FVC in terms of providing an accurate diagnosis of obstruction [135].
Another measure of an obstructive ventilatory impairment derived from spirometry is inspiratory capacity (IC). A reduction in IC usually reflects an elevated FRC due to air trapping. IC, when expressed relative to TLC, correlates closely with acute exacerbations and survival in individuals with COPD, and reduction in IC during exercise is an important determinant of dyspnoea and exercise intolerance [136].
Multiple other indices derived from analysis of the forced expiratory manoeuvre, such as measures of the slope or curvature of the flow–volume loop, have been identified [137]. In the future, techniques using artificial intelligence/machine learning (AIML) of the expiratory flow–volume loop may offer more accurate assessments of small airway function [138].
In people with early manifestations of lung disease, and especially in children, spirometry values can be normal even in those with confirmed disease. Other measurements of airway function may supplement spirometry in assessing ventilatory impairments. Airway resistance (Raw) measured by body plethysmography, and its volume-related measures of specific Raw (sRaw) or specific airway conductance (sGaw), are not commonly used to identify airflow obstruction. They are more sensitive for detecting narrowing of extrathoracic or large central intrathoracic airways than of more peripheral intrathoracic airways. However, measurements of respiratory system resistance by the non-invasive techniques of oscillometry, which require only tidal breathing, may be useful in individuals who are unable to perform a maximal forced expiratory manoeuvre, including very young children [139–142].
Central and upper airway obstruction
Central airway obstruction and upper airway obstruction occur in the airways outside the lung parenchyma. These may occur in the intrathoracic airways (intrathoracic trachea and main bronchi) or extrathoracic airways (pharynx, larynx and extrathoracic portion of the trachea). In the early stages, these conditions may not lead to a decrease in FEV1 and/or FVC, but PEF can be severely reduced. Therefore, an increased ratio of FEV1 (mL) to PEF (L·min−1) can alert the clinician to the need for an inspiratory and expiratory flow–volume loop [143]. In adults, an FEV1 (mL)/PEF (L·min−1) ratio >8 suggests the presence of central or upper airway obstruction [144]. Poor initial effort can also affect this ratio. The indices presented in table 6 may help to distinguish intrathoracic from extrathoracic airway obstruction. Importantly, a progressively severe upper airway obstruction will ultimately reduce FEV1 and the FEV1/FVC (VC) ratio.
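The FEV1/PEF screen depends on the units in which each quantity is expressed, so a sketch makes them explicit: FEV1 in millilitres and PEF in litres per minute, with the adult cut-off of 8. The function names and example values are hypothetical. A positive screen is a prompt to examine the inspiratory and expiratory flow–volume loop, not a diagnosis, and poor initial effort can produce the same result.

```python
def fev1_pef_ratio(fev1_ml: float, pef_l_per_min: float) -> float:
    """FEV1 (mL) divided by PEF (L/min)."""
    return fev1_ml / pef_l_per_min


def suggests_central_or_upper_obstruction(fev1_ml: float, pef_l_per_min: float,
                                          cutoff: float = 8.0) -> bool:
    """Adult screen: a ratio above the cut-off suggests central/upper airway obstruction."""
    return fev1_pef_ratio(fev1_ml, pef_l_per_min) > cutoff


# Hypothetical adults
for fev1_ml, pef in ((3000, 450), (2800, 250)):
    ratio = fev1_pef_ratio(fev1_ml, pef)
    flag = suggests_central_or_upper_obstruction(fev1_ml, pef)
    print(f"FEV1 {fev1_ml} mL, PEF {pef} L/min: ratio {ratio:.1f}, flag = {flag}")
```

Note that entering PEF in L/s instead of L/min would inflate the ratio 60-fold, so unit handling matters more than the arithmetic here.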
Examination of the inspiratory and expiratory flow–volume loop can be very helpful in assessing upper airway obstruction. When forced efforts are acceptable, a repeatable plateau of forced inspiratory flow in the presence of relatively normal expiratory flow suggests variable extrathoracic upper airway obstruction (figure 9d). Conversely, a repeatable plateau in forced expiratory flow with relatively normal inspiratory flow suggests variable intrathoracic central airway obstruction. A repeatable plateau in both forced inspiratory and expiratory flows suggests fixed central or upper airway obstruction (figure 9e). With unilateral mainstem bronchus obstruction, a rare event, maximum inspiratory flow tends to be higher at the beginning than towards the end of the forced inspiration because of a delay in gas filling (figure 9f). In this instance, flow during forced expiration initially diminishes as the rapidly emptying regions of the lung empty, but then plateaus in the mid-portion of the expiratory loop as the slower emptying regions come to dominate expiratory flow. Another pattern, flow oscillations (a sawtooth pattern), may occasionally be observed on either the inspiratory or expiratory phase and likely represents mechanical instability of the airway wall. The absence of classic spirometric patterns for central airway obstruction does not reliably exclude pathology [145]. As a result, clinicians need to maintain a high index of suspicion for this problem and refer suspected cases for direct endoscopic inspection or imaging of the airways.
Ventilatory impairments defined by lung volume measurements
Spirometry can only suggest a restrictive pattern and lung volume measurements are necessary to confirm this. Lung volume measurements start with determinations of FRC by gas wash-in/washout analyses or body plethysmography. Thereafter, expiration to RV and inspiration to TLC define fractional lung volumes.
Typically, measurement of TLC and fractional lung volumes discussed in this section add little to spirometric measurements in identifying an obstructive ventilatory impairment; however, these measurements may be helpful in the setting of borderline or atypical spirometric patterns [146–149]. An increase in RV or RV/TLC above the 95th percentile may indicate hyperinflation or air trapping due to the presence of airway obstruction [102]. Indeed, one of the earliest manifestations of small airway disease is an increase in RV or RV/TLC due to premature airway closure and air trapping. With progression, lung hyperinflation and air trapping are reflected by increases in FRC or FRC/TLC and often in TLC. An increased FRC/TLC indicates a reduced IC, which is a hallmark of COPD and closely associated with reduced exercise tolerance and dyspnoea [150]. Note that an increased RV/TLC may also be seen with muscle weakness or suboptimal effort and in some restrictive processes when TLC is reduced proportionally more than RV (table 5) [151, 152].
Restrictive impairments
A reduction in lung volumes defines a restrictive ventilatory impairment and is classically characterised by a reduction in TLC below the LLN (5th percentile) (figure 10 and table 7). A typical example is shown in figure 9g. The presence of a restrictive impairment may be suspected from spirometry alone when FVC is reduced, FEV1/FVC is normal or increased and the flow–volume curve shows a convex pattern (reflecting high elastic recoil). However, a reduced FVC by itself does not prove a restrictive ventilatory impairment. Indeed, it is associated with a low TLC less than half the time [153, 154]. Conversely, in adults, a normal FVC and FEV1/FVC are highly reliable at ruling out restriction as measured by low TLC [153]. Note that a high PEF with normal FEV1 may be seen in early interstitial lung disease before restriction limits FVC [155].
In most restrictive disease processes, FEV1, FVC and TLC are typically reduced in roughly the same proportion; this pattern is known as “simple restriction”. However, some individuals present with a reduction in FVC that is out of proportion to the reduction in TLC, indicating a disproportionately elevated RV. This pattern is termed “complex restriction” and is associated with processes that impair lung emptying, such as neuromuscular disease, chest wall restriction or occult obstruction with gas trapping. When associated with a low FEV1/FVC ratio, it is termed a “mixed” disorder, indicating the presence of both significant airflow obstruction and restriction [156].
Obstructive impairments
Obstructive ventilatory impairments are generally assessed with spirometric measurements of expiratory airflow. As noted earlier, however, there are specific lung volume patterns associated with airflow obstruction that generally reflect hyperinflation/air trapping. These patterns involve reduced VC, IC and FVC with increased FRC and RV. Because obstructive diseases interfere with intrapulmonary gas mixing, they may also have important effects on the gas dilution or washout techniques used to measure FRC, alveolar volume (VA) and TLC. In these conditions, TLC assessed by gas dilution techniques will be low since only the communicating gas volume is measured. In the presence of airway disease, a low TLC from a single-breath test (such as VA from the DLCO) should not be interpreted as demonstrating restriction, since such measurements systematically underestimate TLC. The same is true of lung volumes measured by multiple-breath helium dilution or nitrogen washout [157]. The degree of underestimation of lung volume increases as airflow obstruction and regional maldistribution of gas worsen. In the presence of severe airflow obstruction, TLC can be underestimated by a gas dilution method by as much as 3 L, greatly increasing the risk of misclassifying the physiological phenotype [158–160]. A method of adjusting the single-breath VA for the effect of airflow obstruction has been published but needs further validation [125, 161]. In the case of severe airflow obstruction, lung volume may be overestimated by body plethysmography, possibly due to heterogeneous time constants (resulting in underestimation of alveolar pressure by mouth pressure) and increased extrathoracic airway compliance [160].
Mixed ventilatory impairments
A mixed ventilatory impairment is characterised by the coexistence of obstruction and restriction, and is defined physiologically when both FEV1/FVC and TLC are below the LLN (5th percentile). Since FVC may be equally reduced in either obstruction or restriction, the presence of a restrictive component in an obstructed individual cannot be inferred from simple measurements of FEV1 and FVC. A typical example is presented in figure 9h. If FEV1/FVC is low, FVC is below its LLN and there is no measurement of TLC by body plethysmography, it is possible that the reduction in FVC is due to an increased RV, but a superimposed restriction of lung volumes cannot be ruled out [162]. Conversely, when FEV1/FVC is low and FVC is normal, a superimposed restriction of lung volumes can almost always be ruled out [153, 154]. Mixed obstruction and restriction commonly involves the combination of a pulmonary parenchymal disorder plus a non-pulmonary disorder, such as COPD plus congestive heart failure [163]. In cases where expiratory airflow obstruction and restriction are concomitantly present, the sensitivity of a reduced FEV1/FVC or reduced TLC to identify one of these conditions is reduced. Table 8 shows a summary of spirometric and lung volume patterns with obstructive, restrictive and mixed ventilatory impairments.
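The spirometry and lung volume rules described in this section can be summarised as a simplified decision sketch. It is not a substitute for table 8 or the interpretation algorithms: it assumes that all inputs are z-scores from appropriate reference equations, uses z < −1.645 (5th percentile) as "low" for every index, ignores DLCO, BDR and test quality, and returns hypothetical label strings. TLC is optional because restriction can only be confirmed when it has been measured.

```python
from typing import Optional

LLN_Z = -1.645  # 5th percentile of the healthy reference population


def is_low(z: float) -> bool:
    return z < LLN_Z


def ventilatory_pattern(ratio_z: float, fev1_z: float, fvc_z: float,
                        tlc_z: Optional[float] = None) -> str:
    """Simplified classification from FEV1/FVC, FEV1, FVC and (optionally) TLC z-scores."""
    if tlc_z is None:
        # Spirometry only: restriction can be suspected but not confirmed.
        if is_low(ratio_z):
            if is_low(fvc_z):
                return "obstruction; superimposed restriction not excluded"
            return "obstruction"
        if is_low(fev1_z) or is_low(fvc_z):
            return "reduced FEV1 and/or FVC with normal ratio; TLC needed"
        return "normal spirometry"
    if is_low(ratio_z) and is_low(tlc_z):
        return "mixed"
    if is_low(ratio_z):
        return "obstruction"
    if is_low(tlc_z):
        return "restriction"
    if is_low(fev1_z) or is_low(fvc_z):
        return "non-specific pattern"
    return "normal"


# Hypothetical z-scores: (FEV1/FVC, FEV1, FVC, TLC)
print(ventilatory_pattern(-2.5, -2.8, -2.0, -2.0))
print(ventilatory_pattern(0.0, -2.0, -2.0, -0.5))
```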
Gas transfer impairments defined by DLCO
Gas transfer is commonly assessed by measuring the uptake of carbon monoxide (as a surrogate for oxygen) by the lungs. In general, overall carbon monoxide uptake is determined by the alveolar–capillary membrane surface area and diffusion properties, the volume of capillary blood haemoglobin in contact with alveolar gas (Vc), and the reaction rate between haemoglobin and carbon monoxide. The importance of haemoglobin cannot be overemphasised and all interpretations must have the reference values adjusted for haemoglobin content.
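One widely used way to adjust predicted DLCO for haemoglobin applies the Cotes-based factors given in the ERS/ATS DLCO technical standard. These factors come from that standard, not from this section, so treat the sketch as an assumption-labelled illustration. It assumes haemoglobin in g·dL⁻¹ and uses the constant 10.22 for males aged 15 years or older and 9.38 for females and for children younger than 15 years; the function name is hypothetical.

```python
def hb_adjusted_dlco_predicted(dlco_pred: float, hb_g_dl: float,
                               male: bool, age_years: float) -> float:
    """Scale a predicted DLCO for haemoglobin (assumed Cotes-based factor).

    hb_g_dl: haemoglobin concentration in g/dL.
    """
    if hb_g_dl <= 0:
        raise ValueError("haemoglobin must be positive")
    # Assumed constants: 10.22 for males >= 15 years, 9.38 otherwise.
    k = 10.22 if (male and age_years >= 15) else 9.38
    return dlco_pred * (1.7 * hb_g_dl) / (k + hb_g_dl)
```

The factor equals 1 at 14.6 g·dL⁻¹ with the 10.22 constant and at 13.4 g·dL⁻¹ with the 9.38 constant. A lower haemoglobin lowers the predicted value, so a measured DLCO that appears reduced against unadjusted reference values may fall within the normal range after adjustment.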
The primary measurements are KCO (the measured carbon monoxide concentration change over time) and VA (the volume of gas containing carbon monoxide measured by the dilution of an inert tracer gas in the inspired volume). Their product (DLCO=KCO×VA) is the key index that is interpreted for gas transfer, with its pathophysiological importance previously reviewed [164, 165].
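To make the relationship between the primary measurements concrete, the sketch below uses the simplified single-breath (Krogh) formulation commonly presented in DLCO technical standards. VA comes from inert tracer dilution, (VI − VD) × FI,Tr/FA,Tr, and KCO comes from the logarithmic fall in alveolar carbon monoxide over the breath-hold, with the starting alveolar CO fraction inferred from the tracer dilution. This is an assumption-laden illustration: it ignores breath-hold timing conventions and dead space washout and sampling corrections, and it assumes VA is already expressed in mL STPD so that the product is in mL·min⁻¹·mmHg⁻¹. The function names and example numbers are hypothetical.

```python
import math


def alveolar_volume(vi: float, vd: float, fi_tr: float, fa_tr: float) -> float:
    """VA from inert tracer dilution: (VI - VD) * FI,Tr / FA,Tr (units of VI)."""
    return (vi - vd) * fi_tr / fa_tr


def kco(fi_co: float, fi_tr: float, fa_tr: float, fa_co: float,
        breath_hold_s: float, pb_mmhg: float, ph2o_mmhg: float = 47.0) -> float:
    """Simplified CO transfer coefficient per min per mmHg (Krogh form)."""
    # Alveolar CO fraction at the start of the breath-hold, before any uptake,
    # inferred from the dilution of the inert tracer gas.
    fa_co0 = fi_co * fa_tr / fi_tr
    t_min = breath_hold_s / 60.0
    return math.log(fa_co0 / fa_co) / (t_min * (pb_mmhg - ph2o_mmhg))


def dlco(kco_value: float, va_ml_stpd: float) -> float:
    """DLCO = KCO x VA (mL/min/mmHg when VA is in mL STPD)."""
    return kco_value * va_ml_stpd


# Hypothetical inputs: 0.3% inspired CO, 10% inspired tracer diluted to 8%.
va_l = alveolar_volume(4.5, 0.15, 0.10, 0.08)
k = kco(0.003, 0.10, 0.08, 0.0024 / math.e, breath_hold_s=10.0, pb_mmhg=760.0)
print(va_l, k, dlco(k, 5000.0))
```

Because KCO depends on how quickly CO disappears from the gas volume it is mixed into, the same DLCO can arise from very different combinations of KCO and VA. This is why the two components are examined separately.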
Interpreting a reduced DLCO must be done with these concepts in mind. The normal range for DLCO and VA should be based on the 5th percentile and 95th percentile [6, 11]. In the setting of a normal VA, KCO also has 5th and 95th percentile values. However, because KCO will rise in a non-linear fashion as lung volumes fall (smaller lung gas volumes mean more rapid carbon monoxide concentration changes due to an increasingly higher surface area/volume ratio), this “normal” range for KCO progressively loses meaning as lung volumes decrease. This is why in the setting of low VA, a so-called “normal” KCO (often expressed as DLCO/VA) cannot “correct” for low lung volumes [154]. Defining an impaired KCO in the setting of a low VA has minimal evidence to inform interpreters and, in practice, becomes an empirical exercise often focusing on the observed KCO percent predicted [166]. Figure 11 depicts a reasonable interpretation algorithm using DLCO along with KCO and VA.
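The reasoning in this paragraph can be sketched as a helper function. This is not a reproduction of figure 11. It is a hypothetical function that encodes only two points made above: DLCO and VA are classified against their 5th and 95th percentiles (z = ±1.645), and KCO is classified against its percentiles only when VA is within the normal range. Otherwise KCO is reported as percent predicted with a caution. The function name and output format are invented for illustration.

```python
def describe_gas_transfer(dlco_z: float, va_z: float, kco_z: float,
                          kco_pct_pred: float) -> dict:
    """Hypothetical summary of DLCO, VA and KCO using 5th/95th percentile limits."""

    def band(z: float) -> str:
        # Normal range between the 5th and 95th percentiles.
        if z < -1.645:
            return "low"
        if z > 1.645:
            return "high"
        return "normal"

    summary = {"DLCO": band(dlco_z), "VA": band(va_z)}
    if summary["VA"] == "normal":
        summary["KCO"] = band(kco_z)
    else:
        # KCO rises non-linearly as lung volume falls, so its percentile
        # limits lose meaning; fall back to percent predicted with a caution.
        summary["KCO"] = (f"{kco_pct_pred:.0f}% predicted "
                          "(percentile limits not validated outside the normal VA range)")
    return summary
```

With a low VA and a KCO near 100% predicted, the helper deliberately avoids labelling KCO as "normal", consistent with the point that a normal KCO cannot correct for low lung volumes.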
It is also useful to compare VA to TLC measured by body plethysmography to determine whether test gas maldistribution may contribute to lowering the DLCO (i.e. carbon monoxide uptake can only be determined for the regions in which the test gases distribute). The normal value for the ratio of VA/TLC in adults is ∼0.85–0.90 [166]. Values significantly below this suggest that gas mixing impairments are likely contributing to a low measured DLCO. In the absence of plethysmographic lung volume data, the presence of a steep downward slope to the inert gas tracing during exhalation suggests the possibility of gas maldistribution. There are no ideal ways to adjust for these conditions and the interpreter can only note that the problem exists [167, 168].
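The VA/TLC comparison reduces to simple arithmetic. The sketch assumes both volumes are in litres at the same conditions (BTPS). It uses 0.85, the lower end of the ∼0.85–0.90 adult range quoted above, as an illustrative flag threshold. The text does not define how far below this range a value must fall to be significantly low, so the cut-off and the function name are assumptions.

```python
def va_tlc_ratio(va_l: float, tlc_pleth_l: float, flag_below: float = 0.85):
    """Return VA/TLC and whether it is below an assumed flag threshold."""
    if tlc_pleth_l <= 0:
        raise ValueError("TLC must be positive")
    ratio = va_l / tlc_pleth_l
    # A low ratio suggests test gas maldistribution may be lowering DLCO.
    return ratio, ratio < flag_below
```

For example, a VA of 4.2 L against a plethysmographic TLC of 6.0 L gives a ratio of 0.70 and raises the flag. The interpreter would then note possible gas maldistribution rather than attempt a correction.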
The future of pulmonary function interpretation
Normal results from routine PFTs do not exclude physiological impairment, especially in mild disease and in children. Specialised PFTs, when used together with routine PFTs, may provide a more comprehensive and multidimensional evaluation of lung function and may further improve interpretation. There is also rapid development of wearable devices that allow continuous monitoring of ventilatory indices during daily life (i.e. under natural physiological conditions) [169]. Together with applications that capture and interpret data, and integrated enterprise and cloud data repositories, wearable devices will provide novel solutions for personalised respiratory medicine, including tele-monitoring of respiratory function.
In the era of precision medicine and novel prediction tools, more sophisticated diagnostic models should be developed to identify early determinants of reductions in lung function more accurately. Longitudinal data across the life course are essential to identify opportunities for early intervention. Exciting research in this field will likely provide significant improvements, especially in handling the uncertainty of measurements. Ongoing efforts are devoted to developing AIML approaches for both novel tests and currently standard tests. The updated interpretation standards may inform future AIML algorithms and help ensure that uncertainty is considered in those algorithms. Examples of uses in standard tests include AI analysis of the expiratory flow–volume pattern, as noted earlier, along with analysis of inert gas washout and of carbon monoxide measurements during the DLCO exhalation manoeuvre [170]. AIML-based software may also provide more accurate and standardised interpretations, and may serve as a powerful decision support tool to improve clinical practice [171, 172]. AIML may help to develop personalised, unbiased predictions of normal lung function, and may enhance the analysis of lung function data by identifying complex, multidimensional patterns associated with disease subtypes. While such algorithms may help to reduce bias arising from poor quality data [172], AIML algorithms must be trained only on good quality data to avoid introducing bias.
The widespread use of electronic health records [173] for data collected during routine clinical practice, together with large clinical databases from multicentre randomised controlled trials, offers unique data sources for training AIML algorithms. These algorithms may be combined with natural language processing, a set of methods that apply linguistics and ML to large corpora of clinical text in order to extract structured information at a large scale. Sharing and using individual data requires a robust and appropriate internationally recognised ethical, legal and information governance framework, which has yet to be established.
Conclusions
When interpreting PFT results, a clinician must classify a particular result as within or outside the normal range for an individual of that age, sex, height and ethnic background based on reference equations, and consider how measures of lung function change over time. Interpretation of PFTs must take into account a level of uncertainty relating to 1) how representative the obtained result was of the individual's lung function at the time of testing, 2) how the pre-test probability of disease may influence which threshold is appropriate for each individual and 3) how valid the reference population against which the test is being judged is for that individual.
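Point 2) can be illustrated with Bayes' theorem. Because the LLN is set at the 5th percentile, about 5% of healthy individuals fall below it, which corresponds to a specificity of roughly 95% for a single index. The sketch below shows how the probability that a below-LLN result reflects disease changes with pre-test probability. The sensitivity of 80% and the two pre-test probabilities are hypothetical numbers chosen for illustration, not estimates from this document.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float = 0.95) -> float:
    """Probability of disease given a result below the LLN (Bayes' theorem)."""
    true_pos = sensitivity * pretest
    false_pos = (1.0 - specificity) * (1.0 - pretest)
    return true_pos / (true_pos + false_pos)


# Hypothetical settings: low pre-test probability (screening-like)
# versus high pre-test probability (symptomatic referral-like).
for pretest in (0.02, 0.50):
    post = post_test_probability(pretest, sensitivity=0.80)
    print(f"pre-test {pretest:.0%}: post-test {post:.0%}")
```

Under these assumed values, the same below-LLN result implies a post-test probability of about 25% at a 2% pre-test probability but about 94% at 50%. This is why a single fixed threshold may not suit every clinical context.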
The requirements for obtaining a technically acceptable measurement have already been set out [4–7]. The quality of the individual's effort must also be considered when assessing how representative the obtained result is of the individual's lung function. A poor quality result might be sufficient to answer a particular clinical question, such as whether lung function is sufficient for a lobectomy. However, a poor quality result should ideally be repeated before important decisions are based on it. Some lung function indices, such as FEV1, FEV6 and FVC, are inherently more reproducible over time and will lead to more certainty in decision making than less reproducible tests.
There is clearly uncertainty about the best choice of reference equations to account for an individual's sex and geographic and ancestral background. The GLI equations are the most generalisable suite of equations to date. Nonetheless, it remains unclear how to apply such a reference equation without introducing the possibility of bias. Clinicians must always take this increased uncertainty into consideration when making diagnoses and treatment recommendations.
It may also be reasonable to set clinical decision-making thresholds for a test based on clinical risk and observed clinical outcomes. A more comprehensive approach to interpretation (not simply relying on whether results are within or outside the normal range) is imperative for appropriate interpretation of lung function when pre-screening for employment, tracking the effects of exposure, disability assessment and risk assessment for therapies potentially toxic to the lungs. To date, no satisfactory outcome-based thresholds for lung function have been defined; therefore, careful consideration of the medical and exposure history of an individual is necessary when interpreting lung function results.
Importantly, clinicians should take time to explain PFT results to individuals and how these are used to guide decisions. A recent survey of people living with respiratory conditions found that more than half (59.4%) did not know what FEV1 meant or what it represented for their condition [174]. People living with respiratory conditions, as well as those referred for PFTs, may want to know what their results mean for them.
Translation of these recommendations to clinical practice will require a paradigm shift whereby the idea of an absolute level of ideal lung function (i.e. the predicted value) is replaced in favour of a range of values that are observed in the majority of individuals without respiratory disease (i.e. z-scores or percentiles). Graphical displays as part of a PFT report can be helpful in communicating results. Interpretation of results should consider the inherent biological variability of the tests and the uncertainty of the test result. We anticipate that these interpretation recommendations will be considered in future disease-specific guidelines.
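The shift from percent predicted to z-scores can be illustrated with the LMS method used by the GLI equations. For a given age, sex and height, each equation supplies a median (M), a coefficient of variation (S) and a skewness parameter (L), and z = ((observed/M)^L − 1)/(L × S). The sketch assumes L, M and S have already been obtained from the relevant GLI lookup; the numbers in the example are made up, and the function names are hypothetical.

```python
import math


def lms_z_score(observed: float, l: float, m: float, s: float) -> float:
    """z-score from LMS parameters (L = skewness, M = median, S = coefficient of variation)."""
    if abs(l) < 1e-12:
        # Limit as L -> 0 is the log-normal case.
        return math.log(observed / m) / s
    return ((observed / m) ** l - 1.0) / (l * s)


def percentile_from_z(z: float) -> float:
    """Percentile of the standard normal distribution corresponding to z."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Made-up LMS values for illustration only.
obs, L, M, S = 3.0, 1.0, 4.0, 0.1
z = lms_z_score(obs, L, M, S)
print(f"percent predicted = {100 * obs / M:.0f}%, z = {z:.2f}, "
      f"percentile = {percentile_from_z(z):.1f}")
```

In this made-up example, 75% predicted corresponds to z = −2.5, below the LLN. With S = 0.15 instead, the same 75% predicted gives z ≈ −1.67, just below the LLN. The same percent predicted can therefore carry different meanings depending on the spread of the reference population.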
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Executive summary ERJ-01499-2021.Supplement
Shareable PDF
This one-page PDF can be shared freely online.
Shareable PDF ERJ-01499-2021.Shareable
Footnotes
This article has an editorial commentary: https://doi.org/10.1183/13993003.00317-2022
This article has supplementary material available from erj.ersjournals.com
This document was endorsed by the ERS Executive Committee on 6 December 2021, by the American Thoracic Society Board of Directors on 8 December 2021, and by the Thoracic Society of Australia and New Zealand on 10 December 2021.
Conflict of interest: S. Stanojevic reports grants, support for travel from the European Respiratory Society and was the Co-Chair of the Global Lung Function Initiative for the European Respiratory Society; and additionally acted on the Pulmonary Function Testing Proficiency committee for the American Thoracic Society, all outside the submitted work.
Conflict of interest: D.A. Kaminsky reports speaker fees from MGC Diagnostics, Inc. and honoraria from UpToDate, Inc., outside the submitted work.
Conflict of interest: M.R. Miller has nothing to disclose.
Conflict of interest: B. Thompson reports grants from the NHMRC; consulting fees from 4D Medical and Chiesi; lecture honoraria from 4D Medical, Chiesi and Mundipharma; and reports past academic work with the Global Lung Function Initiative for the European Respiratory Society, outside the submitted work.
Conflict of interest: A. Aliverti reports patents on Forced Oscillation Technique from Philips and patents on Opto-Electronic Plethysmography from BTS Bioengineering, outside the submitted work.
Conflict of interest: I. Barjaktarevic has nothing to disclose.
Conflict of interest: B.G. Cooper acted as Co-Chair of the Global Lung Function Initiative at the European Respiratory Society, outside the submitted work.
Conflict of interest: B. Culver has nothing to disclose.
Conflict of interest: E. Derom has nothing to disclose.
Conflict of interest: G.L. Hall was former Co-Chair of the Global Lung Function Initiative and reports previous academic and leadership work with the Global Lung Function Initiative, both with the European Respiratory Society, outside the submitted work.
Conflict of interest: T.S. Hallstrand reports research grants from the NIH (NHLBI, NIAID), outside the submitted work.
Conflict of interest: J.D. Leuppi reports grants from the Swiss National Science Foundation (SNF 160072 and 185592) and Swiss Personalised Health Network (SPHN 2018DR108); and has also received unrestricted grants from AstraZeneca AG Switzerland, Boehringer Ingelheim GmbH Switzerland, GSK AG Switzerland and Novartis AG Switzerland, outside the submitted work.
Conflict of interest: N. MacIntyre reports consulting fees from Vyaire, outside the submitted work.
Conflict of interest: M. McCormack reports royalties for authorship for PFT chapters from UpToDate; consulting fees related to PFT quality and reading from Aridis, outside the submitted work.
Conflict of interest: M. Rosenfeld has nothing to disclose.
Conflict of interest: E.R. Swenson has nothing to disclose.
- Received May 26, 2021.
- Accepted November 18, 2021.
- The content of this work is not subject to copyright. Design and branding are copyright ©ERS 2022. For reproduction rights and permissions contact permissions{at}ersnet.org