European Respiratory Society

Outcomes for COPD pharmacological trials: from lung function to biomarkers

M. Cazzola, W. MacNee, F. J. Martinez, K. F. Rabe, L. G. Franciosi, P. J. Barnes, V. Brusasco, P. S. Burge, P. M. A. Calverley, B. R. Celli, P. W. Jones, D. A. Mahler, B. Make, M. Miravitlles, C. P. Page, P. Palange, D. Parr, M. Pistolesi, S. I. Rennard, M. P. Rutten-van Mölken, R. Stockley, S. D. Sullivan, J. A. Wedzicha, E. F. Wouters


The American Thoracic Society/European Respiratory Society jointly created a Task Force on “Outcomes for COPD pharmacological trials: from lung function to biomarkers” to inform the chronic obstructive pulmonary disease research community about the possible use and limitations of current outcomes and markers when evaluating the impact of a pharmacological therapy. Based on their review of the published literature, the following document has been prepared with individual sections that address specific outcomes and markers, and a final section that summarises their recommendations.


Clinical trials in chronic obstructive pulmonary disease (COPD) normally include forced expiratory volume in one second (FEV1), the principal measure of lung function, as an outcome, mainly because the COPD research community and regulatory agencies have traditionally recognised its importance as an objective index of airflow obstruction that measures both symptomatic relief and disease progression 1. International bodies, such as the Global Initiative for Chronic Obstructive Lung Disease (GOLD), the American Thoracic Society (ATS) and the European Respiratory Society (ERS), have worked together to promote the use of FEV1 for defining and staging this disease. As a result of these efforts, COPD is now defined as a “disease state characterised by airflow limitation that is not fully reversible” 2.

However, many researchers still have difficulty defining what constitutes a response to a pharmacological intervention in COPD 3. They are faced with a multicomponent disease characterised by a range of pathological changes, which include mucus hypersecretion, airway narrowing and loss of alveoli in the lungs, and loss of lean body mass and cardiovascular effects at a systemic level. COPD patients are also heterogeneous in terms of their clinical presentation, disease severity and rate of disease progression 4. Their degree of airflow limitation, as measured by FEV1, is also known to be poorly correlated with the severity of their symptoms or their health-related quality of life (HRQoL), which adds to the difficulties faced by researchers trying to improve the definition of COPD and current disease staging systems.

Since the relationship between spirometry and symptoms appears to be poor, measures of lung physiology alone may not adequately describe either the social impact of COPD or the effectiveness of therapeutic interventions in individual patients 4. Most researchers regard changes in patient-centred outcomes, such as symptoms, exacerbations, exercise capacity and HRQoL, as being as important as, or more important than, changes in lung function. A panel of COPD experts has recently highlighted the importance of such outcomes and indicated the need for a more comprehensive assessment of both disease progression and treatment efficacy. The panel has therefore proposed a multidimensional measure for COPD that encompasses FEV1, the modified Medical Research Council (MRC) dyspnoea scale and the body mass index (BMI) 5. This development reflects the need for a better understanding of newly proposed and existing COPD measures, so that researchers and regulatory agencies can make better informed decisions when assessing new drug therapies for COPD. It also shows the challenges of abandoning traditional outcome measures, such as FEV1 and BMI, that have no clearly defined relationship with patient-centred outcomes.


The ATS/ERS jointly created a Task Force on “Outcomes for COPD pharmacological trials: from lung function to biomarkers” to inform the COPD research community about the possible use and limitations of current outcomes and markers when evaluating the impact of a pharmacological therapy. Based on their review of the published literature, the following report has been prepared with individual sections that address specific outcomes and markers, and a summary of their recommendations.

Task Force selection

Task Force members were selected on the basis of a single criterion: internationally recognised expertise in COPD trials and in specific COPD outcomes. An initial list was prepared by those who developed the proposal, and this list was supplemented with other names suggested by the ERS Scientific Committee and by the three reviewers of the application, again considering expertise in COPD trials and in specific outcomes.

Outcomes assessed

The major sections of the present report are: Lung function; Patient-reported outcomes; Exacerbations; Exercise; Mortality; Social and economic burden; Imaging; Nonpulmonary markers; Minimal important difference; and Biomarkers. Each outcome measure was assessed by a group of authors using the set of criteria described in table 1. This assessment was left to the discretion of the authors. Throughout this process, the authors have considered an “outcome” to mean a consequence of the disease that a patient normally experiences. This may be cough, dyspnoea, weight loss, exercise intolerance, exacerbations, impaired HRQoL, increased health resource use or mortality. Conversely, a “marker” is a measurement known to be associated with an outcome; for example, exercise capacity as tested in a laboratory is a marker of exercise intolerance in daily life 4.

Literature review

The authors searched the literature according to strategies that they developed independently. No central literature review or standardised evaluation for inclusion or exclusion of evidence was applied. Thus, while the ATS/ERS Task Force has adopted a comprehensive approach to reviewing the measures covered in the present report, it acknowledges that, given the volume of evidence on each measure in the medical literature, not all measures and not all evidence may have been included.



COPD is characterised by physiological abnormalities, including airflow limitation, abnormalities in gas exchange and lung hyperinflation. These can be objectively assessed in the laboratory using measurements such as FEV1, arterial oxygen tension (Pa,O2) and arterial carbon dioxide tension from blood gas determinations, as well as lung volumes measured at rest or during exercise. These markers serve as objective physiological measurements that aid in diagnosing disease, assessing its severity and analysing the mechanisms underlying some of its morbidity. In COPD, FEV1 is used in diagnosis and staging, arterial blood gases are useful in defining respiratory failure and dynamic hyperinflation helps to explain exertional dyspnoea 6.

Unfortunately, it is not easy to determine whether a measured change reflects a true change in pulmonary status or is merely a result of test variability. Concepts such as sensitivity to change (responsiveness), which account for test variability and for the variability of differences measured at two time-points, should therefore be applied. Repeated measurements showing a consistent change from baseline provide greater confidence that a real change has occurred.

Forced expiratory manoeuvre outcomes

FEV1 is the volume of air that is forcibly exhaled in the first second, whereas forced vital capacity (FVC) is the total volume of air forcibly exhaled after a full inspiration. The methodology for obtaining forced expiratory manoeuvres and derived parameters has been standardised by the ATS and the ERS, first separately and now also jointly 7. For the general population, reference equations based on the distribution of spirometric parameters in normal populations have been made available by the ATS 8, by the European Community for Coal and Steel (ECCS) 9 and by studies on different ethnic groups and age ranges 10. It is therefore important to select the reference values most likely to represent the population to be tested, whether in clinical practice or in multicentre clinical trials. In COPD patients, FEV1 has been used to classify disease severity 11 and to describe disease progression 12.

FEV1 and FVC have been shown to be highly reproducible in the vast majority of patients, with differences between the highest and second highest values within 150 mL or 6% when obtained by well-trained technicians. Variability coefficients for FEV1 and FVC over time (within days, weeks or years) have been reported 10. With regard to the degree of change that regulators would consider clinically meaningful, a change in FEV1 of 5–10% from baseline is regarded as clinically important. During advisory committee meetings of the US Food and Drug Administration for new COPD drugs, a change of <3% of the baseline has been deemed not clinically important 13. According to ATS/ERS recommendations for the interpretation of lung function tests 10, the change in FEV1 should be ≥20% in short-term trials (lasting weeks) and ≥15% in long-term trials (≥1 yr) to be confident that a clinically meaningful change has occurred (a summary of these recommendations can be found in the Minimal important difference section).
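The arithmetic behind these thresholds can be sketched in Python. This is a minimal illustration under stated assumptions, not a validated analysis tool: the function names are hypothetical, and any trial shorter than 1 yr is assumed to fall under the short-term (≥20%) criterion, since the text does not address intermediate durations.

```python
def fev1_percent_change(baseline_ml: float, follow_up_ml: float) -> float:
    """Return the change in FEV1 as a percentage of the baseline value."""
    if baseline_ml <= 0:
        raise ValueError("baseline FEV1 must be positive")
    return 100.0 * (follow_up_ml - baseline_ml) / baseline_ml


def meets_atsers_change(pct_change: float, trial_duration_yr: float) -> bool:
    """Apply the ATS/ERS interpretation thresholds quoted in the text.

    Assumption (hypothetical rule): any trial shorter than 1 yr uses the
    short-term cut-off of >=20%; trials of >=1 yr use >=15%.
    """
    threshold = 15.0 if trial_duration_yr >= 1.0 else 20.0
    return abs(pct_change) >= threshold


def below_fda_relevance(pct_change: float) -> bool:
    """Flag changes of <3% of baseline, deemed not clinically important."""
    return abs(pct_change) < 3.0


change = fev1_percent_change(1200.0, 1260.0)  # +60 mL on a 1200 mL baseline
print(round(change, 1))                       # 5.0
print(meets_atsers_change(change, 0.25))      # False (12-week trial)
print(below_fda_relevance(change))            # False
```

In this example a 5% gain falls within the 5–10% range regarded as clinically important by regulators, yet well short of the ATS/ERS interpretation threshold, which shows how strongly the conclusion depends on the criterion chosen.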

Forced expiratory volume in six seconds (FEV6) has recently been suggested as an alternative to FVC. It has been found to be ∼25% less variable than FVC 14. The operating characteristics of this measurement are still being defined, and reference values have recently been published 15, 16. Its diagnostic accuracy for defining airflow obstruction or restriction seems good 17, 18, although this depends on the population tested and on the use of appropriate lower limit thresholds to define normality 19. It has been suggested that different fixed thresholds (FEV1/FEV6 <73% and FEV6 <82% pred) 20 are more appropriate than the thresholds currently used in international guidelines (FEV1/FVC <70% and FVC <80% pred).
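To show how the two sets of fixed thresholds can classify the same measurement differently, the following sketch applies each set to one hypothetical patient. The names are illustrative, the volume flag indicates only a possible restrictive pattern, and for simplicity the same volume is passed to both rule sets, although in practice FEV6 and FVC differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    ratio_pct: float        # FEV1/FVC or FEV1/FEV6 cut-off (%)
    volume_pct_pred: float  # FVC or FEV6 cut-off (% predicted)


GUIDELINE_FVC = Cutoffs(ratio_pct=70.0, volume_pct_pred=80.0)  # FEV1/FVC, FVC
PROPOSED_FEV6 = Cutoffs(ratio_pct=73.0, volume_pct_pred=82.0)  # FEV1/FEV6, FEV6


def flag_spirometry(fev1_l: float, volume_l: float, volume_pred_l: float,
                    cut: Cutoffs) -> dict[str, bool]:
    """Flag a low ratio and a low volume against fixed cut-offs.

    volume_l is FVC or FEV6, matching whichever Cutoffs set is supplied.
    """
    ratio = 100.0 * fev1_l / volume_l
    pct_pred = 100.0 * volume_l / volume_pred_l
    return {
        "low_ratio": ratio < cut.ratio_pct,          # suggests obstruction
        "low_volume": pct_pred < cut.volume_pct_pred,  # possible restriction
    }


# Hypothetical patient: FEV1 2.1 L, FEV6 (or FVC) 2.9 L, predicted 3.5 L
print(flag_spirometry(2.1, 2.9, 3.5, PROPOSED_FEV6))  # {'low_ratio': True, 'low_volume': False}
print(flag_spirometry(2.1, 2.9, 3.5, GUIDELINE_FVC))  # {'low_ratio': False, 'low_volume': False}
```

Here the ratio is about 72%, so the FEV1/FEV6 <73% criterion flags obstruction whereas the FEV1/FVC <70% criterion does not.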

In terms of health outcomes for the general population, a low pre-bronchodilator FEV1 has been found to be a predictor of all-cause and cardiovascular mortality 21–25. In COPD patients, post-bronchodilator FEV1 is poorly correlated with patient-centred outcomes, such as dyspnoea, exercise performance and HRQoL, both at baseline and after pharmacological interventions 26–31. However, FEV1 has the advantage of being the most repeatable lung function parameter, and one that reflects changes in both obstructive and restrictive types of lung disease 7. Therefore, FEV1 has been proposed as a measure for guiding treatment strategies in COPD based on severity classification 15.

The European Medicines Agency 1 specifies that strict airway reversibility criteria must be applied when selecting patients for inclusion in clinical trials (e.g. ≤12–15% FEV1 reversibility post-bronchodilator), together with a pre-bronchodilator FEV1/FVC ratio ≤70% and an FEV1 of 30–70% pred (post-bronchodilator). The ATS/ERS recommendation is that an increase in FEV1 and/or FVC of >12% and 200 mL compared with baseline during a single testing session suggests significant bronchodilatation 10. However, although changes in FEV1 and FVC have been suggested for assessing the reversibility of airway obstruction, a negative response to a single administration of a bronchodilator should not be used to determine the need for long-term bronchodilator treatment, owing to the lack of sensitivity of the parameters derived from full forced expiratory manoeuvres 32 and the variability of response over time 33. FEV1, FVC and the FEV1/FVC ratio have all been used as predictors of the outcome of lung volume reduction surgery 34.
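The ATS/ERS single-session criterion can be expressed as a small check. This sketch is hypothetical: the function names are illustrative, values are in mL, and both thresholds are treated as strict inequalities, an assumption based on the wording above.

```python
def significant_bronchodilatation(pre_ml: float, post_ml: float) -> bool:
    """True if a parameter rises by >12% and by >200 mL from the pre-dose value."""
    gain_ml = post_ml - pre_ml
    return gain_ml > 200.0 and 100.0 * gain_ml / pre_ml > 12.0


def bronchodilator_response(pre_fev1_ml: float, post_fev1_ml: float,
                            pre_fvc_ml: float, post_fvc_ml: float) -> bool:
    """Apply the criterion to FEV1 and/or FVC: either parameter may qualify."""
    return (significant_bronchodilatation(pre_fev1_ml, post_fev1_ml)
            or significant_bronchodilatation(pre_fvc_ml, post_fvc_ml))


# FEV1 +150 mL (15%) fails the volume criterion; FVC +300 mL (15%) meets both.
print(bronchodilator_response(1000, 1150, 2000, 2300))  # True
```

As noted above, a negative result from such a check after a single dose should not by itself be used to decide on long-term bronchodilator treatment.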

During exacerbations of COPD, FEV1 can be measured, but it does not seem to be useful for the early detection of exacerbations 35. However, even small changes in post-bronchodilator FEV1 following acute treatment have been found to be associated with meaningful clinical outcomes 36–38.

FEV1 has been used as the primary variable in the vast majority of multicentre trials, although an increase in FVC is also considered as proof of bronchodilatation 10, since physiologically relevant changes occur in a number of COPD patients who respond to bronchodilators with an increase in FVC without changes in FEV1 9, 39. Reliable FVC values may be difficult to obtain in patients with severe COPD who are unable to sustain expiration for >6 s; in such patients, FEV6 may be used instead of FVC. Smoking cessation has been shown to slow the rate of decline in FEV1 12.
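The rate of decline in FEV1 is usually expressed as an annual slope. The sketch below estimates it by ordinary least squares from serial measurements in one patient; this is an illustrative assumption, as trial analyses may instead use mixed-effects models, which are not attempted here. The function name and data are hypothetical.

```python
def annual_fev1_decline(years: list[float], fev1_ml: list[float]) -> float:
    """Least-squares slope of FEV1 against time, in mL per year.

    A negative value indicates decline.
    """
    if len(years) != len(fev1_ml) or len(years) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(fev1_ml) / n
    # Covariance of time and FEV1 divided by the variance of time
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, fev1_ml))
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    return sxy / sxx


print(annual_fev1_decline([0, 1, 2, 3], [2000, 1950, 1900, 1850]))  # -50.0
```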

With respect to feasibility, spirometry can be undertaken using different types of equipment, provided the standard requirements are met 7. Simple spirometry is only moderately time-consuming, with an average of 10–20 min per session 40. The reproducibility of spirometric measurements is critically dependent on the ability of the technician 7. Thus, specific training is mandatory for the technical staff and also for physicians who wish to undertake spirometry testing 36. The cost of spirometers is widely variable, but relatively inexpensive portable equipment can be used for large clinical trials, provided accuracy requirements are met 7.

When spirometry is performed, patients can experience some discomfort, light-headedness or even syncope. Airway obstruction may occur if multiple prolonged exhalations are required for the measurement of FVC. The use of FEV6 may prevent these events in more severe patients. There is also potential for infection transmission during pulmonary function testing but direct evidence has not been provided 41. Measures to reduce the risk of infection have been identified and should be followed 41.

Lung volumes

Functional residual capacity (FRC) is the volume of gas remaining in the lung at the end of tidal expiration. In healthy subjects at rest, the FRC corresponds to the relaxation volume (Vr) of the respiratory system. In COPD, FRC may be increased because the Vr is increased (static hyperinflation) or because the end-tidal lung volume exceeds Vr (dynamic hyperinflation). Inspiratory capacity (IC) is the maximum volume of gas that can be inspired from end-tidal expiration; it equals the difference between total lung capacity (TLC) and FRC. Residual volume (RV) is the volume of gas remaining in the lung at the end of a maximal expiration. In COPD, RV may be increased due to airway closure or extreme flow limitation.
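The relationship IC = TLC − FRC, and the IC/TLC ratio discussed later in this section, reduce to simple arithmetic. A minimal sketch with hypothetical function names and illustrative volumes in litres:

```python
def inspiratory_capacity(tlc_l: float, frc_l: float) -> float:
    """IC is the difference between total lung capacity and FRC (litres)."""
    if frc_l > tlc_l:
        raise ValueError("FRC cannot exceed TLC")
    return tlc_l - frc_l


def ic_tlc_ratio(tlc_l: float, frc_l: float) -> float:
    """Inspiratory capacity expressed as a fraction of TLC."""
    return inspiratory_capacity(tlc_l, frc_l) / tlc_l


# Hypothetical hyperinflated patient: TLC 7.0 L, FRC 5.2 L
print(round(inspiratory_capacity(7.0, 5.2), 2))  # 1.8
print(round(ic_tlc_ratio(7.0, 5.2), 3))          # 0.257
```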

The methods for measuring absolute lung volume and its subdivisions have been standardised in a joint ATS/ERS document 7, 42. For the general population, reference equations based on the distribution of FRC and RV in normal populations have been published by the ECCS 9. Published reference values are summarised in the ATS/ERS joint document on the interpretation of pulmonary function tests 10. Ethnic differences are not well characterised. Most laboratories use the reference values recommended by the ECCS 9 or by an ATS/ERS workshop 43. There are no equations for predicting IC; it is therefore obtained by subtracting predicted FRC from predicted TLC. Currently, there are no frequency distribution data available for any of these lung volume measures in COPD patients.

The reproducibility of FRC has been reported, with coefficients of variation ranging from 3.5% to 6.7% for plethysmography and from 4.9% to 10.4% for helium dilution, without apparent differences between normal and obstructed subjects 44. The coefficients of variation of RV measurements ranged from 9.5% to 12.4% for plethysmography and from 2.4% to 14% for helium dilution. However, no data are available on the reproducibility of FRC or RV in absolute values. Inter-laboratory differences seem to be as important as intra-individual differences. For IC, the short-term variability (95% confidence interval (CI)) in subjects with chronic airway obstruction at rest was 150–220 mL or 4.5–10% pred 32, 45.
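The coefficient of variation quoted above is the standard deviation of repeated measurements divided by their mean, expressed as a percentage. The sketch below assumes the sample standard deviation of one subject's repeats; the cited studies may have computed a pooled within-subject value, which is not reproduced here. The function name and values are hypothetical.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two repeated measurements")
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


# Three hypothetical repeat FRC measurements (litres) in one subject
print(round(coefficient_of_variation([3.0, 3.2, 3.1]), 2))  # 3.23
```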

In COPD patients, FRC increments 46 or IC decrements 29, 45, 47–54 have been shown to correlate better than changes in FEV1 with exercise tolerance, dyspnoea and HRQoL, both at baseline 45, 47, 51, 52 and after pharmacological 29, 46, 49, 50, 53 or surgical 48, 54 interventions. When expressed as a ratio to TLC, IC has been reported to be an independent predictor of mortality in COPD of different severities 55. Changes in RV have been shown to be a major determinant of the response to lung volume reduction surgery 56, but the usefulness of RV in large studies has yet to be determined.

Changes in lung volumes can occur in COPD patients after treatment with bronchodilators, even in the absence of changes in FEV1 32, 57. Small increases in IC after bronchodilator therapy, which signify a reduction in end-expiratory lung volume, are associated with reduced mechanical loading and increased functional strength of the inspiratory muscles. This in turn results in a decreased work of breathing and a reduced oxygen debt. Furthermore, an increased resting IC (of the order of 0.3 L or 10% pred) would be indicative of a greater ability to expand tidal volume during exercise, with a resultant increase in ventilatory capacity 29, 49, 58. Reduced operating volumes during exercise enhance the neuromechanical coupling of the respiratory system (i.e. the relationship between neural drive and mechanical response), thereby relieving respiratory discomfort. The net effect of these physiological benefits is an improvement in the patient's ability to engage in exercise 59. Several studies have also indicated that long-acting bronchodilators can reduce hyperinflation in COPD, as measured by RV, FRC and IC, in a manner that is somewhat similar to that seen with lung volume reduction surgery 49, 60, 61.

During recovery from mild exacerbations, both FRC and RV were found to decrease, while IC increased and TLC remained constant 38. Therefore, IC can be used to reflect changes in lung hyperinflation during acute exacerbations. Studies have confirmed that TLC does not change during more severe exacerbations 38, 62.

These variables can be measured in multicentre trials, possibly in selected centres. For acute interventions, in which TLC can be assumed to be constant 63–65, changes in IC are a good surrogate for changes in FRC. For chronic interventions, in which TLC may not be constant, changes in IC may not mirror changes in FRC 60.
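Because IC = TLC − FRC, the change in IC equals the change in TLC minus the change in FRC, so ΔIC equals −ΔFRC only when TLC is unchanged. The hypothetical function and numbers below illustrate both situations:

```python
def ic_change(tlc_pre: float, frc_pre: float,
              tlc_post: float, frc_post: float) -> float:
    """Change in IC (litres) between two sessions, with IC = TLC - FRC at each."""
    return (tlc_post - frc_post) - (tlc_pre - frc_pre)


# Acute intervention: TLC unchanged, so the IC gain mirrors the FRC fall
acute = ic_change(7.0, 5.0, 7.0, 4.7)
# Chronic intervention: TLC also falls, so the IC gain understates the FRC fall
chronic = ic_change(7.0, 5.0, 6.8, 4.7)
print(round(acute, 2), round(chronic, 2))  # 0.3 0.1
```

In both cases FRC falls by 0.3 L, but IC captures the whole fall only when TLC stays constant.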

In terms of feasibility, FRC and RV measurements require a body plethysmograph or spirometers with inert gas analysers. Either method can be used, provided the equipment meets the standard requirements 42. However, the methods are not interchangeable since, in moderate-to-severe airflow obstruction, dilution methods tend to underestimate TLC and body plethysmography tends to overestimate it. IC can be measured during simple spirometry if a closed circuit system is used. Dilution methods are too time-consuming for serial measurements, whereas body plethysmography is much less time-consuming and does not add much time to that necessary for simple spirometry. IC is part of the spirometric manoeuvre and only requires a few stable breaths before its measurement. The reproducibility of pulmonary function test measurements is critically dependent on the ability of the technician 41. Measurement of absolute lung volumes is technically more demanding than simple spirometry and specific training is necessary. The equipment for measuring lung volumes is substantially more expensive than simple spirometers. Body plethysmographs are more expensive than dilution equipment, but the extra cost may be offset by the time saved.

Gas exchange

The diffusing capacity of the lung for carbon monoxide (DL,CO) is a measure of carbon monoxide (CO) transfer from the airspace to pulmonary capillary blood; it is expressed as the total uptake of CO by the lung per unit of time and per unit of driving pressure 66, 67.
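The definition above is a simple ratio of CO uptake to driving pressure. The sketch below encodes only this conceptual definition, not the standardised single-breath calculation (which derives uptake from the fall in alveolar CO concentration during breath-holding). The function names are hypothetical, capillary CO pressure defaults to zero (an assumption of negligible back-pressure), the conversion factor of 0.335 to the SI units mmol/min/kPa is the commonly quoted value, and the inputs are illustrative.

```python
# Commonly quoted factor: mL/min/mmHg -> mmol/min/kPa
MMOL_MIN_KPA_PER_ML_MIN_MMHG = 0.335


def dlco(co_uptake_ml_min: float, alveolar_pco_mmhg: float,
         capillary_pco_mmhg: float = 0.0) -> float:
    """CO uptake per unit time divided by the CO driving pressure (mL/min/mmHg).

    capillary_pco_mmhg defaults to 0, i.e. negligible CO back-pressure (assumption).
    """
    driving_pressure = alveolar_pco_mmhg - capillary_pco_mmhg
    if driving_pressure <= 0:
        raise ValueError("driving pressure must be positive")
    return co_uptake_ml_min / driving_pressure


def to_si(dlco_traditional: float) -> float:
    """Convert mL/min/mmHg to mmol/min/kPa."""
    return dlco_traditional * MMOL_MIN_KPA_PER_ML_MIN_MMHG


value = dlco(40.0, 2.0)  # hypothetical inputs
print(value, round(to_si(value), 2))  # 20.0 6.7
```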

The single-breath method has been standardised by the ATS and the ERS, first separately 66, 67 and then jointly 68. For the general population, reference equations based on the distribution of DL,CO values in normal populations have been made available by studies on different ethnic groups and age ranges 69–71. DL,CO has been used to quantify the extent of emphysema in COPD patients 72. It has been measured before and after rehabilitation 73 and has been used to assess patients for lung volume reduction surgery 74, 75.

DL,CO measurement is reproducible, although the inter-laboratory coefficient of variation is larger than for spirometry. Acceptable DL,CO test criteria have been standardised 68. A coefficient of variation of 3–4% has been reported for repeated measurements in normal subjects and in patients with abnormal spirometric patterns. An inter-session DL,CO variability of ≤9% over time has been reported 76–79.

The relationship between DL,CO and health outcomes has not been established in the general population, whereas in COPD patients DL,CO has been used in the evaluation of surgical risk, particularly for lung cancer and other thoracic surgery 80. DL,CO has also been used in studies evaluating the effects of lung volume reduction surgery 74, 75, rehabilitation 73 and therapy for α1-antitrypsin deficiency 81, 82. However, it can only be measured in stable conditions and not during exacerbations of COPD.

DL,CO has been used as a primary outcome variable in multicentre trials, particularly in studies of lung volume reduction surgery 75, 76. It can be measured using different types of equipment, provided they meet the standard requirements 66, 67. The measurement takes 15–30 min and its reproducibility is critically dependent on the ability of the technician and on the achievement of acceptable test criteria 66, 67. Therefore, specific training is mandatory for the technical staff 66, 67. The cost of DL,CO equipment varies but is generally high. It is not available in primary care, although it can be used in limited clinical trials, provided accuracy requirements are met 66, 67. Discomfort, light-headedness or even syncope may occur if multiple prolonged exhalations are required for measurement of DL,CO. There is also the possibility of infection transmission, although direct evidence has not been provided. General considerations for pulmonary function testing also apply to DL,CO equipment and procedures 41.

Pa,O2 is the arterial oxygen partial pressure and is determined by direct arterial blood sampling. Some investigators have used arterialised earlobe capillary blood to assess Pa,O2, although several groups have suggested that this may underestimate arterial oxygenation 83, 84. Arterial oxygen saturation can be measured directly in arterial blood 85–87 or estimated noninvasively by pulse oximetry, which yields Sp,O2 88, 89.

Neither the ATS nor the ERS has standardised the methodology for either of these variables. However, many studies in the literature address the methodology, techniques and equipment required to measure arterial blood gases 85, 90, 91. The frequency distribution of blood gas values has been established for the general population; reference equations based on the distribution of Pa,O2 in normal populations have been made available by studies on different ethnic groups and age ranges 92, 93. In COPD patients, Pa,O2 has been used to define respiratory failure (Pa,O2 <7.98 kPa (<60 mmHg)), and it has been incorporated into severity classifications used to guide treatment strategies and to describe disease progression 11. In terms of measurement reproducibility, modern blood gas analysers are automated, self-diagnostic instruments that require minimal maintenance. Calibration of the electrodes is important to correct for any drift in the measurements over time 85–87. A coefficient of variation of 3–4% has been reported for many available pulse oximeters 90, 91.

In terms of health outcomes, Pa,O2 is useful for predicting survival in COPD patients hospitalised with exacerbations 94, 95. Sp,O2 is commonly used in sleep studies 96, 97. Pa,O2 has been used in many studies evaluating the effects of treatment in patients with severe COPD. Since pulse oximetry is noninvasive, Sp,O2 can easily be determined in large studies conducted in primary care settings 98–102. Pa,O2 is also an important variable that can be measured in severe, uncooperative COPD patients with exacerbations 103, and Sp,O2 can easily be used to monitor these patients 104, 105. Both variables can be obtained in all patients, with or without comorbidities.

Pa,O2 and Sp,O2 can be useful in multicentre trials in patients with severe COPD, provided the same equipment is used in each setting and standard requirements are met 85, 91. Both variables take little time to measure, and the only training required relates to arterial blood sampling 86, 87. The cost of the equipment for measuring Pa,O2 and arterial oxygen saturation varies but is generally high. Sp,O2 obtained by pulse oximetry is relatively inexpensive and is available in almost all patient care settings. There are safety issues with arterial puncture for blood gas analysis, such as local pain, discomfort, light-headedness, nerve damage or syncope. Although the risk of bleeding may be increased, the procedure has been performed successfully in individuals receiving therapeutic anticoagulation 106.

Safety and technical considerations

Pulmonary function testing can be physically demanding for a minority of patients, particularly those with common comorbidities such as ischaemic heart disease, diabetes mellitus or psychiatric disturbances. It is recommended that patients are not tested within 1 month of myocardial infarction 42. In addition, both spirometric and absolute lung volume measurements may be suboptimal in the presence of various comorbidities, such as chest or abdominal pain, facial abnormalities, stress incontinence or psychiatric disturbances. With respect to safety issues, claustrophobia may occur during body plethysmography. The potential also exists for infection transmission, although direct evidence has not been provided 41.


Based on the review of lung function measurements, FEV1 remains a primary end-point that regulatory authorities regard as an acceptable measure of efficacy for COPD pharmacological trials, particularly in combination with instruments that capture symptom-based end-points 1. However, since lung hyperinflation and its reduction in response to a bronchodilator are not reflected in routine spirometry, IC could also be included in COPD trials, particularly where changes in lung physiology are expected. In hyperinflated patients, RV and FVC are useful measures for identifying a therapeutic response that may not be detected by FEV1. It has also been highlighted that measuring IC after a pharmacological intervention without plethysmographic determination of the static lung volumes may not adequately reflect the underlying changes in these volumes 53. Arterial blood gases are useful outcomes in interventional studies that might affect respiratory drive or impair ventilation–perfusion relationships in the lungs.



The development of COPD may affect several aspects of a patient's health 107. These consequences of illness can be regarded as a process of illness progression, which normally starts with the development of physiological or biological abnormalities, resulting in symptoms and physical limitations that are noticed and reported by the patients. Eventually, patients have to face their inability to take part in their usual activities, which influences their perception of their health and ultimately their general well-being 108.

As a consequence, a number of clinical and physiological outcomes, such as dyspnoea, functional status, HRQoL and health status, are recognised as being important for the characterisation of response to treatment 109. For instance, dyspnoea is the primary reason for patients seeking medical care. Measurements of dyspnoea provide an insight into the practical effects of treatment on everyday life, reflecting whether or not patients perceive an improvement in this primary symptom of COPD. Patients with COPD frequently decrease their activity in order to avoid the unpleasant sensation of breathlessness. Functional status measurement reveals the number of activities that a patient can perform, something not reflected in measurements of FEV1 or dyspnoea.

Health status and HRQoL

Health status measurement provides a standardised method of assessing the impact of disease on patients' daily lives, activity and well-being. The term “quality of life” is often used loosely in this context, but this is inappropriate. The factors that determine an individual's quality of life are varied. Even in very ill people, health usually forms only a minor determinant of an individual's quality of life, with employment, finances, family and social factors being collectively more important 110. “HRQoL” is a more specific term.

Health status and HRQoL measurements are designed to allow comparisons across patients and studies. This means that all patients in all studies must be measured in the same way, using a common instrument and unit of measurement. The measurements must be made without bias, i.e. the instrument should apply equally to all relevant patients. Essentially, health status and HRQoL instruments are designed for use in groups of patients; they may also be used to assess individuals, but it should be understood that they provide a standardised assessment of health, not of quality of life.

The reader is also directed to the ATS Quality of Life website 111 as a useful source of information and for details of some of the questionnaires summarised herein.

Types of health status and HRQoL instruments

Generic health status

These questionnaires are designed to assess health irrespective of disease. A number have been used in COPD, in particular, the 36-item Short-Form Health Survey (SF-36), the Sickness Impact Profile (SIP) and the Nottingham Health Profile (NHP).

SF-36 was developed as a measure of general health 112. It has eight domains: physical function; mental health; energy/vitality; health perception; physical role limitation; mental role limitation; social function; and pain. Two summary scores may be calculated: the physical component summary and the mental component summary. It is self-completed and can be administered online.

SIP was developed as a measure of general health 113. It has 12 categories: sleep and rest; diet; work; home management; recreation and pastimes; ambulation; mobility; body care and movement; social interaction; alertness behaviour; emotional behaviour; and communication. Two sub-scores are produced: physical and psychosocial. It is self-completed.

NHP was developed as a measure of general health 114. It has six domains: pain; physical mobility; emotional reactions; energy; social isolation; and sleep. It is also self-completed.

Long disease-specific health status and HRQoL

These questionnaires have been developed and validated in COPD patients. They are comprehensive, covering a range of aspects of COPD and are therefore reasonably time-consuming to complete. Examples include: the individualised Chronic Respiratory Questionnaire (CRQ); the St George's Respiratory Questionnaire (SGRQ); and the Quality of Life for Respiratory Illness Questionnaire (QoL-RIQ).

CRQ was developed in 1987 to measure HRQoL in patients with COPD 115. It includes 20 questions across four domains: dyspnoea; emotional function; fatigue; and mastery. A self-administered standardised (SAS) version, CRQ-SAS (which takes 7 min to complete), is available and has been validated against the original interviewer-administered version and for telephone administration 116–118.

SGRQ was developed in 1992 to measure health status in patients with respiratory disease 119. It has also been validated for use in bronchiectasis 120. It has three domains: symptoms; activity; and impacts. A total score is also calculated. It was designed for supervised self-administration.

QoL-RIQ was developed in 1997 for use in mild-to-moderate asthma and COPD 121. It has seven domains: breathing problems; physical problems; emotions; situations triggering or enhancing breathing problems; general activities; daily and domestic activities; and social activities, relationships and sexuality. A total score is also calculated. It is self-administered and a shorter version has been described 122.

Short disease-specific health status and HRQoL

These questionnaires were developed to provide valid but shorter estimates of overall health status in COPD. They are less comprehensive than the long instruments. Examples of these questionnaires are: the CRQ-SAS; the Airways Questionnaire 20 (AQ20); and the Breathing Problems Questionnaire (BPQ). AQ20 was developed in 1998 for use in asthma 123 and COPD 124, 125. BPQ was developed in 1994 for use in COPD 126, and was shortened in 1998 127. Both are self-administered questionnaires with a total score calculated.

Disease-specific health status and HRQoL for patients with respiratory failure

There is only one questionnaire in this class that assesses patients with severe hypoxia or ventilatory failure. The Maugeri Foundation Respiratory Failure questionnaire was developed in 1999 for use in patients with respiratory failure 128. It has three domains: daily activity; cognitive function; and invalidity. A total score is calculated, and the questionnaire is self-administered.

COPD control questionnaires

The Clinical Control Questionnaire is the only questionnaire that exists in this class. It was developed in 2003 to assess the quality of control of COPD from the physician's perspective 129. It has three domains: symptom; functional state; and mental state. A total score is calculated and it can be self-administered.

Functional status questionnaires

These questionnaires are the same length as the long disease-specific health status questionnaires but address functional status (which is more than just mobility) in more detail than those instruments. The two most common questionnaires are the modified Pulmonary Functional Status and Dyspnoea Questionnaire (PFSDQ-M) and the Pulmonary Functional Status Scale (PFSS).

PFSDQ-M and PFSS were both developed in 1998 for use in patients with COPD 130, 131. PFSDQ-M has two domains, activity level and dyspnoea, whereas PFSS has three sub-domains, daily activities/social functioning, psychological functioning and sexual functioning. Both are self-administered questionnaires with a total score calculated.

Activity of daily living scales

The Nottingham Extended Activity of Daily Living (EADL) scale and the London Chest Activity of Daily Living (LCADL) scale are the principal questionnaires in this class. They are intended for patients with more severe disease, who are largely more housebound than typical COPD patients.

The Nottingham EADL scale was developed in 1987 as a general measure of limitation of essential activities of daily living for stroke patients 132 but has been used in COPD 133. It contains four domains: mobility; domestic; kitchen; and leisure. A total score is calculated and a clinician administers it.

The LCADL scale was developed in 2000 as a measure of activities of daily living in patients with severe COPD 134. A total score is calculated and it is self-administered.

Preference-based instruments

These are generic scales, anchored from health to death, designed to rank patients' health or their preference for a health state. Preference scores can be estimated either with complex direct methods, such as the Standard Gamble or Time Trade-off techniques, or with simpler methods, such as the feeling thermometer, which is a simple visual analogue scale (VAS) 135–138.
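The direct methods above differ mainly in how the point of indifference is elicited. The Python sketch below shows, under textbook assumptions rather than any instrument's scoring manual, how a single elicited response is turned into a 0–1 preference score. The function names are hypothetical, and reading a thermometer rating as a linear 0–1 score assumes that 0 is anchored to death (or the worst state) and 100 to full health.

```python
def time_trade_off_utility(years_full_health: float, years_in_state: float) -> float:
    """Utility from a time trade-off (TTO) response.

    The respondent is indifferent between `years_in_state` years in the
    current health state and `years_full_health` years in full health;
    the utility is the ratio of the two.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie between 0 and years_in_state")
    return years_full_health / years_in_state


def standard_gamble_utility(indifference_probability: float) -> float:
    """Utility from a standard gamble (SG) response.

    The respondent is indifferent between the current state for certain and
    a gamble giving full health with probability p and immediate death with
    probability 1 - p; the utility equals p.
    """
    if not 0.0 <= indifference_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return indifference_probability


def thermometer_score(rating: float, scale_max: float = 100.0) -> float:
    """Linear rescaling of a feeling-thermometer (VAS) rating to 0-1.

    Assumes 0 = death (or worst state) and scale_max = full health.
    """
    if not 0.0 <= rating <= scale_max:
        raise ValueError("rating lies outside the scale")
    return rating / scale_max


print(time_trade_off_utility(7, 10))   # 0.7
print(standard_gamble_utility(0.85))   # 0.85
print(thermometer_score(65))           # 0.65
```

Trading 3 of 10 years of remaining life to be in full health therefore corresponds to a utility of 0.7. In practice, direct valuations such as these tend to disagree with one another for the same respondent, which is part of why indirect instruments with reference equations are often preferred.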

Indirect preference instruments provide ratings that are derived by translating scores from other outcome measures through reference equations. In that respect, they are unlike any other scale discussed herein. One of these, the Short-Form 6 Dimensions (SF-6D), is an indirect utility instrument derived from an existing generic instrument, the SF-36. Other common questionnaires in this class are the Quality of Well-being scale (QWB) and the European Quality of Life Questionnaire (EQ-5D). Even more than the other health status questionnaires, they are very much population-based.

QWB was developed as a utility instrument 139. It addresses symptoms, mobility, physical activity and social activity. Scores can be used in economic evaluations, either in cost-effectiveness studies or to calculate quality-adjusted life-years (QALYs). It is administered by an interviewer.

EQ-5D was developed as a utility questionnaire 140. It addresses mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Scores can be used in economic evaluations, either in cost-effectiveness studies or to calculate QALYs. It contains a feeling thermometer and is self-administered.
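Because QWB and EQ-5D scores lie on a scale on which 1 represents full health and 0 represents death, they can be combined with time to give QALYs. A widely used approach, shown here as a hedged Python sketch rather than a prescribed method, is to take the area under the utility–time curve by the trapezoidal rule. This assumes that utility changes linearly between assessments and that a patient who dies is assigned 0 from the time of death. The function name and example data are hypothetical.

```python
from typing import Sequence


def qalys(times_years: Sequence[float], utilities: Sequence[float]) -> float:
    """Quality-adjusted life-years as the area under the utility-time curve.

    times_years: strictly increasing assessment times (years from baseline).
    utilities:   preference-based scores at those times (1 = full health,
                 0 = death); linear change between assessments is assumed.
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need at least two paired times and utilities")
    total = 0.0
    for i in range(1, len(times_years)):
        dt = times_years[i] - times_years[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Trapezoid: interval length multiplied by the mean utility over it.
        total += dt * (utilities[i - 1] + utilities[i]) / 2
    return total


# Survivor assessed at baseline, 6 months and 12 months.
print(qalys([0.0, 0.5, 1.0], [0.7, 0.8, 0.6]))
# Patient who dies at 9 months: utility is recorded as 0 from death onwards.
print(qalys([0.0, 0.5, 0.75, 1.0], [0.7, 0.6, 0.0, 0.0]))
```

The two example patients accrue approximately 0.725 and 0.4 QALYs over the year. The second case shows why the death anchor matters: the patient who died still contributes data to the economic analysis instead of being dropped.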

General comments on available questionnaires in COPD

There is a wide range of questionnaires available. However, certain generalisations can be made.

Disease-specific instruments tend to be more sensitive to change (more responsive) than generic instruments and are therefore better suited to measuring treatment effects in COPD. Of the three generic instruments listed herein, SF-36 is the most widely used and is well supported by a comprehensive website. Of the disease-specific measures, CRQ and SGRQ have been used very widely and there is an extensive literature on both. They have slightly different properties. Both have been used in rehabilitation and pharmaceutical studies, CRQ more often in rehabilitation and SGRQ particularly in long-term pharmaceutical studies. Both are supported by the groups that developed them and are subject to continuing developmental work. The published literature concerning QoL-RIQ is much more limited.

The comprehensive functional status questionnaires are well established and are similar to the comprehensive disease-specific questionnaires in many respects, but were developed from a narrower perspective. All of the remaining questionnaires listed have been developed to address perceived weaknesses in the long disease-specific instruments. They all have evidence for validity in their chosen applications and patient populations, but there is limited evidence as to whether they have the same level of validity, reliability and responsiveness as the long health status questionnaires when compared side by side in the same patients. It is also unclear whether they contribute any additional information.

The preference instruments are of greatest interest to health economists since they can be scaled from death to health, which means that data from patients who have died can be included in health economic analyses. QWB and EQ-5D have been used quite widely in COPD but, as with all generic instruments, they are relatively insensitive. The feeling thermometer has shown relatively good responsiveness 135–138.

Details of questionnaires

Details of some of the properties of these questionnaires are provided hereafter. No attempt has been made to list all the references that provide validation data for each instrument, although the references listed provide important validation data on each questionnaire. All have been validated to a varying degree. The most extensively used questionnaires are CRQ and SGRQ, and a PubMed search will provide a very wide range of references on these instruments. For many of the instruments listed herein, few additional references are available. Websites are available for a number of these instruments (particularly for the generic and utility instruments).


Dyspnoea, or breathlessness, can be measured based on the principles of psychophysics (stimulus–response relationship). In general, two different approaches have been used to measure dyspnoea in clinical trials: clinical ratings based on activities of daily living and ratings during an exercise task.

The purposes of measuring dyspnoea in pharmacological trials include differentiating between individuals who have less dyspnoea and those who have more (a discriminative instrument), and determining how much dyspnoea has changed (an evaluative instrument). For the assessment of treatment efficacy, the two important measurement criteria of an evaluative instrument are responsiveness and construct validity 141. Responsiveness refers to the ability to detect change; if a treatment results in an improvement in dyspnoea, both the investigator and the clinician want to be confident that they can detect the difference, even if it is small. For construct validity, changes in dyspnoea scores should correlate with expected changes in other variables, such as lung function and exercise performance, consistent with theoretically derived predictions. However, the magnitude of the correlations is typically modest, indicating that dyspnoea ratings and measures of lung function represent different constructs.

Clinical ratings

Over the past 50 yrs a variety of questionnaires or scales has been developed to quantify dyspnoea 141. Many of the original instruments are considered unidimensional: they relate the severity of dyspnoea to various physical tasks (e.g. walking on a level surface or climbing stairs). More recently developed instruments are multidimensional and include additional factors that influence dyspnoea. The following sections consider the three clinical scales most widely used to quantify dyspnoea: the MRC scale 142; the Baseline and Transition Dyspnoea Indices (BDI and TDI, respectively) 143; and the dyspnoea domain of CRQ 115.

MRC scale

The MRC scale is a five-point scale, published in 1959, that considers certain activities that provoke breathlessness, such as walking or climbing stairs 142. In <1 min, the patient selects the grade on the MRC scale that most closely matches his/her severity of dyspnoea. The MRC scale is considered a discriminative instrument that can categorise patients with COPD in terms of their disability 144. In one study of COPD patients 145, the frequency distribution of MRC values showed a tendency towards lower ratings (i.e. less dyspnoea), whereas dyspnoea ratings with other instruments showed a more normal distribution. MRC values are reproducible in both the short (2 days–2 weeks) and long term. In terms of their influence on health outcomes in COPD patients, factor analyses have demonstrated that dyspnoea scores are separate and distinct from lung function and exercise capacity 145–147. However, the MRC scale is not satisfactory as an evaluative instrument to measure changes in dyspnoea; its broad grades are generally unresponsive to interventions such as pharmacotherapy 141. It is also not a useful measure during exacerbations of COPD. It is possible that cardiac disease could influence the severity of dyspnoea. However, in randomised controlled trials (RCTs) involving patients with COPD, patients with clinically significant cardiac disease are typically excluded from entry. The MRC scale is not recommended for use in multicentre trials, since its broad grades are considered unresponsive to change.


BDI and TDI

BDI and TDI were developed in 1984 so that a physician, nurse or technician could interview a patient in order to obtain a comprehensive understanding of the patient's severity of breathlessness based on three components: functional impairment; magnitude of task; and magnitude of effort 143. BDI is a discriminative instrument used to quantify the severity of dyspnoea at an initial or baseline state, whereas TDI is an evaluative instrument used to quantify the changes in dyspnoea from the initial or baseline state.

The methodology of BDI and TDI has been standardised. With the original BDI/TDI, an interviewer would ask the patient various questions about how activities of daily living influenced the patient's breathlessness. According to the patient's responses to the BDI, the interviewer would select a grade, based on standard and specific criteria, for each of three components: 1) functional impairment, with grades 0–4; 2) magnitude of task, with grades 0–4; and 3) magnitude of effort, with grades 0–4. The component grades are summed to give a baseline total score ranging 0–12; the lower the score, the more severe the dyspnoea. For grading changes in dyspnoea with TDI, the interviewer would ask the patient various questions about how activities of daily living influence changes in dyspnoea experienced by the patient compared with the initial or baseline state. The components and grades are as follows: 1) functional impairment, with grades −3 to +3; 2) magnitude of task, with grades −3 to +3; and 3) magnitude of effort, with grades −3 to +3. These are summed to give a transition total score ranging −9 to +9; a negative value indicates deterioration, whereas a positive value indicates improvement.
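The scoring arithmetic just described can be expressed directly. This Python sketch, with hypothetical function names, sums the three component grades and checks each against the stated ranges: 0–4 for a BDI component and −3 to +3 for a TDI component. It ignores any additional response codes that the official BDI/TDI materials may define, so it is an illustration rather than a substitute for the licensed instrument.

```python
from typing import Iterable

BDI_GRADES = range(0, 5)    # each BDI component: 0 to 4
TDI_GRADES = range(-3, 4)   # each TDI component: -3 to +3


def _checked_sum(grades: Iterable[int], allowed: range, label: str) -> int:
    """Sum component grades after checking each lies in the allowed range."""
    grades = list(grades)
    for g in grades:
        if g not in allowed:
            raise ValueError(f"{label} grade {g} outside {allowed.start}..{allowed.stop - 1}")
    return sum(grades)


def bdi_total(functional_impairment: int, magnitude_of_task: int, magnitude_of_effort: int) -> int:
    """Baseline total score, 0-12; the lower the score, the more severe the dyspnoea."""
    return _checked_sum((functional_impairment, magnitude_of_task, magnitude_of_effort),
                        BDI_GRADES, "BDI")


def tdi_total(functional_impairment: int, magnitude_of_task: int, magnitude_of_effort: int) -> int:
    """Transition total score, -9 to +9; negative = deterioration, positive = improvement."""
    return _checked_sum((functional_impairment, magnitude_of_task, magnitude_of_effort),
                        TDI_GRADES, "TDI")


print(bdi_total(2, 3, 2))   # 7
print(tdi_total(1, -1, 2))  # 2
```

Range checking is the main design point here. A grade of 4 entered on the TDI, for example, is rejected instead of silently inflating the transition score.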

In 2004, self-administered and computerised (SAC) versions of BDI and TDI were developed to provide more standardised criteria for measuring breathlessness 148. For BDI, one criterion in the magnitude-of-task component was changed from “climbing three flights of stairs” to “climbing one flight of stairs”. For TDI, there were two changes. First, to remind the patient of his/her original dyspnoea status, the descriptor selected for the corresponding BDI component was displayed on the computer screen. Secondly, a bidirectional VAS was created for each component of the TDI: the patient presses an up or down key on the computer keyboard to move a vertical bar up for improvement or down for deterioration in dyspnoea compared with the baseline dyspnoea status.

In terms of statistical frequency distributions, BDI and TDI scores have been found to be normally distributed in COPD patients 145, 149, 150. Both are reproducible measures in the short (i.e. 2 days–2 weeks) and long term.

With respect to their influence on health outcomes, factor analyses have demonstrated that dyspnoea scores are separate and distinct from lung function, exercise capacity and other outcomes in COPD patients 145–147. Longitudinal studies have also demonstrated an expected decrement in the TDI score (i.e. more breathlessness) in patients with COPD over 1–2 yrs 151, 152. Numerous RCTs have also demonstrated improvements in TDI with pharmacotherapy compared with placebo in patients with COPD 150–160. In an observational study 36, BDI/TDI were shown to be valid and responsive measures of acute changes in dyspnoea associated with a COPD exacerbation. In an RCT 161, COPD patients had significant improvements in dyspnoea (p = 0.04) after 10 days of treatment for an exacerbation with prednisone plus standard therapy (+4.0 units in TDI) compared with placebo plus standard therapy (+2.1 units in TDI).

In terms of feasibility, BDI/TDI have been used to measure dyspnoea with pharmacotherapy compared with placebo in numerous RCTs performed at multiple centres 152–160. Both instruments require an interviewer and the questionnaire. For the SAC versions, a computer and the software program are required. It takes ∼3–4 min for either the original or the SAC versions. For the original BDI/TDI instruments, the interviewer should have a basic knowledge of respiratory disease and view a training video or observe an interview between a patient and an experienced interviewer. When implementing the SAC versions, the patient needs to complete a practice session by rating “tiredness” on the computer before selecting a grade for each of three components of BDI or TDI. Interviewer-administered BDI and TDI are free to academic institutions and to individuals using the instruments in a research protocol. Both instruments can be obtained from the Mapi Research Institute in Lyon, France. When pharmaceutical companies use them in RCTs, a modest fee is charged per study and per language. The SAC versions of BDI and TDI are available from Psychological Applications in Waterbury, VT, USA. A fee is determined based on the intended use of the instruments.

Dyspnoea component of CRQ

CRQ was developed in 1987 to measure HRQoL in patients with respiratory disease 115. Dyspnoea was one of four dimensions included in the CRQ. While the original dyspnoea domain was individualised (patients chose the activities that made them most short of breath), the authors validated, in two randomised studies, a standardised version that inquires about the same activities in all patients. Comparison of the individualised with the standardised version revealed slightly greater responsiveness of the individualised version. In addition, self-administration led to increased responsiveness compared with interviewer administration 116, 117.

For the individualised domain, the patient is asked to report or identify the five most important activities that caused breathlessness over the previous 2 weeks. A list of 26 activities is offered for consideration by the patient. The patient then grades the severity of dyspnoea for each activity on a 1–7 (“extremely short of breath” to “not at all short of breath”) Likert-type scale; in other words, for each of the five activities that are either individual specific or standardised, a score of 1–7 is allotted. The mean score across completed items is calculated and yields a score for the CRQ dyspnoea domain ranging 1–7. In COPD patients, the scores for the dyspnoea domain of the CRQ are normally distributed 145 and they are reproducible in the short (2 days–2 weeks) and long term.
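The domain score is the mean of the completed items. Below is a minimal Python sketch, assuming five item slots with `None` marking an item that was not completed. The function name and this missing-data convention are illustrative assumptions, and the sketch does not model any minimum number of completed items that the official scoring rules may require.

```python
from typing import Optional, Sequence


def crq_dyspnoea_score(items: Sequence[Optional[int]]) -> float:
    """Mean of completed CRQ dyspnoea items on the 1-7 scale.

    1 = extremely short of breath, 7 = not at all short of breath.
    items: five entries, one per activity; None = item not completed.
    """
    if len(items) != 5:
        raise ValueError("the dyspnoea domain has five activity items")
    completed = [s for s in items if s is not None]
    if not completed:
        raise ValueError("no completed items")
    if any(not 1 <= s <= 7 for s in completed):
        raise ValueError("item scores must lie between 1 and 7")
    return sum(completed) / len(completed)


print(crq_dyspnoea_score([3, 4, 5, None, 4]))  # 4.0
print(crq_dyspnoea_score([2, 2, 3, 3, 4]))     # 2.8
```

Averaging over completed items only, as the text describes, keeps the result on the 1–7 scale even when one activity is left blank.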

Although the CRQ has been used to measure HRQoL in RCTs evaluating pharmacotherapy in patients with COPD, in most of these studies the complete scores for the dyspnoea domain were not reported 160, 162–164. In a study examining the efficacy of a pharmacological intervention using the dyspnoea domain of the CRQ 165, there were no significant differences among three treatment groups (salmeterol and placebo; salmeterol and ipratropium bromide; and placebo). In an observational study 36, the dyspnoea domain of the CRQ was shown to be a valid and responsive measure of acute changes in dyspnoea associated with a COPD exacerbation. In an RCT 161, patients had significant improvements in the dyspnoea domain (p = 0.02) after 10 days of treatment for an exacerbation of COPD with prednisone plus standard therapy (+1.69 units) compared with placebo plus standard therapy (+0.97 units).

In a similar manner to previous types of dyspnoea measurement, it is possible that cardiac disease could influence the severity of dyspnoea. However, in RCTs involving patients with COPD, those patients with clinically significant cardiac disease are typically excluded from entry into the studies.

The measurement of the dyspnoea domain of the CRQ can be used in multicentre trials 160, 162–165. For the recommended SAS version, no specific training is required. It takes ∼10 min to administer with an interviewer, compared with ∼7 min for the self-administered version 115, 116. For the original CRQ instrument, an interviewer, a list of 26 activities that might cause dyspnoea and Likert scoring cards are required. The interviewer should have a basic knowledge of respiratory disease and scoring on the Likert-type scale. Any cost of either version is based on the intended use of the questionnaire, as determined by McMaster University, Hamilton, ON, Canada.

Ratings during exercise

Another approach is to instruct patients to report the severity of dyspnoea while performing an exercise task, such as cycling or walking. Various exercise protocols have been used, including incremental and constant work exercise tests. There are two common methods for COPD patients to rate their dyspnoea during an exercise test: the 0–10 category ratio (CR10) scale and VAS.

The CR10 scale incorporates nonlinear spacing of verbal descriptors of severity corresponding to specific numbers with ratio properties 166.

VAS consists of a vertical or horizontal line, usually 100 mm in length, with descriptors typically positioned at the extremes of the scale as anchors.

There are several advantages of using the CR10 scale rather than VAS for rating dyspnoea during exercise. First, the CR10 is open-ended as there is an opportunity to provide a rating >10, whereas VAS has a ceiling effect (the highest possible rating is 100 mm). Secondly, the presence of descriptors on the CR10 scale allows direct comparisons between individuals or groups. Thirdly, the CR10 scale should be easier for patients to use for exercise prescription. The use of a number or descriptor on the CR10 scale (e.g. a rating of 3 or moderate breathlessness) would be more relevant as a dyspnoea target for exercise training, rather than a length in millimetres on the VAS. The CR10 scale has been combined with a continuous method for patients to rate dyspnoea 167. With this system, the subject moves a computer mouse that adjusts a vertical bar positioned next to the CR10 scale so that patients can provide ratings “whenever there is a change in breathlessness” throughout exercise, rather than “on cue” at each minute of exercise 167–169.

In general, ratings during exercise provide information that is different and distinct from that obtained by clinical ratings of dyspnoea 141.

The CR10 scale

In 1982, Borg 166 developed a 0–10 rating scale constructed as a category scale with ratio properties. This CR10 scale incorporates nonlinear spacing of verbal descriptors of severity corresponding to specific numbers and ratio properties of sensation intensities. The CR10 scale provides a standard method for patients to select ratings of dyspnoea on a scale based on descriptors that correspond to specific numbers. The patient should be given specific written instructions on how to use the CR10 scale to rate dyspnoea prior to the exercise test 170. During the incremental or constant work exercise test, the patient is typically cued to indicate a rating toward the end of each minute or each new workload. An alternative approach enables the subject to provide continuous ratings of dyspnoea (“whenever there is a change in breathlessness”) by moving a computer mouse that adjusts a vertical bar positioned adjacent to the CR10 scale visible on a monitor 167–169.
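With the continuous method, a rating is recorded only when the patient moves the bar, so comparing continuous data with cued, once-a-minute ratings requires resampling. One simple approach, assumed here and not taken from the published system, is to carry the most recent rating forward to the end of each minute. The function names and the data layout (time in seconds, CR10 rating) are hypothetical. No upper limit is enforced because the CR10 scale is open-ended.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Event = Tuple[float, float]  # (seconds since exercise onset, CR10 rating)


def rating_at(events: Sequence[Event], t: float) -> float:
    """Most recent rating at or before time t (last value carried forward)."""
    times = [time for time, _ in events]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no rating recorded at or before this time")
    return events[i][1]


def end_of_minute_ratings(events: Sequence[Event], duration_s: float) -> list[float]:
    """Ratings at 60 s, 120 s, ... up to the last complete minute of exercise."""
    if any(b[0] < a[0] for a, b in zip(events, events[1:])):
        raise ValueError("events must be in time order")
    if any(rating < 0 for _, rating in events):
        raise ValueError("CR10 ratings cannot be negative")
    n_minutes = int(duration_s // 60)
    return [rating_at(events, 60.0 * m) for m in range(1, n_minutes + 1)]


# Hypothetical trace: the patient moved the bar five times during a 5-min test.
trace = [(0.0, 0.5), (45.0, 1.0), (100.0, 3.0), (170.0, 4.0), (250.0, 5.0)]
print(end_of_minute_ratings(trace, 300.0))  # [1.0, 3.0, 4.0, 4.0, 5.0]
```

The repeated 4.0 at minutes 3 and 4 shows the trade-off: a cued protocol would have asked the patient again, whereas carrying the last value forward assumes the sensation was unchanged because the bar was not moved.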

With respect to the frequency distribution of CR10 ratings, there is a wide range of dyspnoea ratings among patients with COPD 169, 171. The CR10 scale is a reproducible measure in the short (i.e. 2 days–2 weeks) and long term. Factor analyses have shown that dyspnoea ratings during exercise tests are separate and distinct from lung function and exercise capacity 145. These ratings have also been shown to be responsive to various pharmacotherapies in patients with COPD 29, 49, 50, 169, 172, 173. However, its use is not recommended during exacerbations of COPD.

In multicentre studies, patients have successfully rated the severity of dyspnoea on the CR10 scale during exercise tests as part of RCTs evaluating various pharmacotherapies 29, 49, 50, 169, 172, 173. Again, similar to the other measures of dyspnoea, it is possible that cardiac disease could influence the severity of dyspnoea.

The ratings of dyspnoea on the CR10 scale are typically obtained from an exercise task, such as cycle ergometry or treadmill walking. For the continuous method for patients to rate dyspnoea, a computer with a mouse, a monitor with the CR10 scale, and the programme to operate the system are required. The time needed to retrieve these ratings is dependent on the duration of the exercise task. Patients need to have an initial familiarisation session to practise using the CR10 scale and exercise equipment prior to being randomised to therapy. The programme to operate the computerised continuous method for patients to rate dyspnoea is available from Psychological Applications and any fee related to its use is determined based on its intended purpose.


A number of clinical and physiological outcomes, such as dyspnoea, functional status and health status, are recognised as being important for the characterisation of response to treatment. Instruments used to measure dyspnoea should rely on patient-reported outcomes, be multidimensional where possible, adhere to standardised methodology and ideally be computerised. Instruments that are third-party rated may prove less compelling than validated patient-rated instruments. These recommendations should be upheld if the evaluation of dyspnoea in COPD is to provide consistently reliable data in future clinical trials 109. Health is an abstract concept, but it is possible to produce standardised health status measures that have true interval-scaling properties (i.e. the questionnaire behaves like a ruler). While there is a relationship between reduced lung function and impaired health 174, it is not sufficiently strong for spirometric measures to provide a reliable estimate of HRQoL 175. For that reason, health status must be measured using specifically designed questionnaires. Unfortunately, recent evidence indicates that health status may improve in COPD patients even during regular treatment with placebo 176, and a meta-analysis of published clinical trials indicates that SGRQ can differentiate between health and the presence of COPD but is not a true indicator of disease severity as graded by the severity of airway obstruction 177. At present, the poor correlation between changes in FEV1 and changes in quality-of-life scores indicates the need for a multidimensional approach.



The chronic and progressive course of COPD is often aggravated by short periods of increasing symptoms, particularly increasing cough, dyspnoea and production of sputum that can become purulent. The majority of these COPD exacerbations are caused by bronchial infection and, if frequent, have been shown to have a negative impact on HRQoL in patients with COPD 178–181. Furthermore, acute exacerbations are the most frequent cause of medical visits, hospital admissions and death among patients with COPD 182. One of the main objectives of COPD treatment is to reduce the frequency and severity of exacerbations. Unfortunately, there is no validated diagnostic test or biomarker of exacerbations 183; therefore, the diagnosis of an exacerbation must be based on a clinical definition that includes the most frequent symptoms observed during these episodes. Definitions of exacerbations and their severity need to be standardised to allow comparisons between different interventions in different settings 184.

Definition of exacerbation

At present, there is no standardised and unanimously accepted definition of exacerbation of COPD 184. However, four definitions are widely used.

The first definition uses a combination of three cardinal exacerbation symptoms: increased dyspnoea; increased sputum volume; and increased sputum purulence. Type I exacerbations were defined as occurring when all three symptoms were present. Type II exacerbations were defined as occurring when two of the three symptoms were present. Type III exacerbations were defined as occurring when one of the three symptoms was present in addition to at least one of the following findings: upper respiratory infection within the previous 5 days; fever without other cause; increased wheezing; increased cough; or an increase in respiratory rate or cardiac frequency by 20% compared with baseline 185. This definition has been widely used in clinical trials of antibiotics for exacerbations of COPD.
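The type assignment is a counting rule, so it can be encoded directly. The Python sketch below uses hypothetical symptom labels. It interprets the 20% rise in respiratory rate or cardiac frequency as "at least 20% above baseline", which is an assumption, and it returns `None` when the criteria for types I–III are not met.

```python
from typing import Optional

CARDINAL = {"increased_dyspnoea", "increased_sputum_volume", "increased_sputum_purulence"}
SUPPORTING = {
    "urti_within_5_days",
    "fever_without_other_cause",
    "increased_wheezing",
    "increased_cough",
    "rr_or_hr_up_20_percent",
}


def rose_at_least_20_percent(current: int, baseline: int) -> bool:
    """True if current >= 1.2 x baseline; integer arithmetic avoids rounding error."""
    return 5 * current >= 6 * baseline


def anthonisen_type(findings: set[str]) -> Optional[str]:
    """Return 'I', 'II' or 'III', or None if no type applies."""
    n_cardinal = len(findings & CARDINAL)
    if n_cardinal == 3:
        return "I"
    if n_cardinal == 2:
        return "II"
    if n_cardinal == 1 and findings & SUPPORTING:
        return "III"
    return None


findings = {"increased_sputum_purulence", "increased_cough"}
if rose_at_least_20_percent(current=24, baseline=20):  # e.g. respiratory rate, breaths/min
    findings.add("rr_or_hr_up_20_percent")
print(anthonisen_type(findings))  # III
```

As the text on severity notes, the returned type indicates the likelihood of bacterial infection, not how severe the episode is.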

The second definition of exacerbations requires either of the following patterns of symptoms on ≥2 consecutive days: two or more of three major symptoms (increase in dyspnoea, sputum volume or sputum purulence); or any one major symptom together with any one minor symptom (increase in nasal discharge, wheeze, sore throat, cough or fever) 178. This definition has been used in follow-up studies of COPD patients. Unlike the first definition, it has the advantage that all exacerbations can be identified, whether or not they are reported to healthcare professionals, which increases the recorded exacerbation frequency. The identification of unreported exacerbations requires the use of a diary card of symptoms.
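Applying this definition to diary cards is essentially a search for runs of qualifying days. In the hypothetical Python sketch below, each diary day is the set of symptoms the patient recorded as increased over baseline. A day qualifies if it shows two or more major symptoms, or one major plus one minor symptom. An exacerbation onset is the first day of any run of ≥2 consecutive qualifying days. Treating a single non-qualifying day as the end of an episode is a simplifying assumption, since the definition quoted here does not specify a recovery rule.

```python
MAJOR = {"dyspnoea", "sputum_volume", "sputum_purulence"}
MINOR = {"nasal_discharge", "wheeze", "sore_throat", "cough", "fever"}


def day_qualifies(increased: set[str]) -> bool:
    """Two or more major symptoms, or at least one major plus one minor symptom."""
    n_major = len(increased & MAJOR)
    n_minor = len(increased & MINOR)
    return n_major >= 2 or (n_major >= 1 and n_minor >= 1)


def exacerbation_onsets(diary: list[set[str]], min_days: int = 2) -> list[int]:
    """Index of the first day of each run of >= min_days consecutive qualifying days."""
    onsets: list[int] = []
    run_start = None
    for day, increased in enumerate(diary + [set()]):  # empty sentinel closes a final run
        if day_qualifies(increased):
            if run_start is None:
                run_start = day
        elif run_start is not None:
            if day - run_start >= min_days:
                onsets.append(run_start)
            run_start = None
    return onsets


diary = [
    set(),
    {"dyspnoea", "sputum_volume"},   # two major symptoms
    {"dyspnoea", "cough"},           # one major + one minor
    set(),
    {"dyspnoea"},                    # one major only: does not qualify
    {"wheeze"},
    {"sputum_purulence", "fever"},
    {"dyspnoea", "sputum_volume"},
]
print(exacerbation_onsets(diary))  # [1, 6]
```

Day 4 in the example would be missed by a clinician-reported count and still does not qualify here, illustrating that diary-based detection depends on both the symptom rule and the minimum run length.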

The third definition was proposed as a consensus definition of an experts' panel: a sustained worsening of the patient's condition, from the stable state and beyond normal day-to-day variations, that is acute in onset and necessitates a change in regular medication in a patient with underlying COPD 186.

The fourth definition, proposed in some pharmacological randomised clinical trials 152, 187, identifies exacerbations as a complex of respiratory events (i.e. cough, wheezing, dyspnoea or sputum production) lasting ≥3 days. However, there is no evidence that 3 days of symptoms are required to define an exacerbation. Unlike patients with asthma, patients with COPD do not experience sudden increases in symptoms that may disappear spontaneously or with medication in a few hours or days 188. Furthermore, delay in initiating treatment for an exacerbation may result in a longer duration of the episode 189. Consequently, no minimum duration should be required to define an exacerbation of COPD.

The use of exacerbation frequency as an outcome measure in clinical trials in COPD may require the quantification of all episodes. Consequently, the definition should include any increase in respiratory symptoms over baseline. In this respect, the definitions used in existing clinical trials of bronchodilators and/or inhaled corticosteroids in COPD are appropriate. Thus, the proposed definition of exacerbation of COPD would be an increase in respiratory symptoms over baseline that usually requires a change in therapy 190, 191. However, if this definition is used, patients must be encouraged to recognise exacerbations and either increase their own therapy and self-manage, or present to healthcare professionals for exacerbation therapy. Otherwise, the exacerbation frequency will be underestimated. When using this definition, it is important to note that approximately two-thirds of patients are aware when an exacerbation is imminent and, in most cases, symptoms are consistent from one exacerbation to another 192.

Definition of severity of exacerbations

The effect of any given therapeutic intervention may be not only to reduce the frequency of exacerbations but also, and more commonly, to reduce their severity. No validated scale of severity exists for exacerbations. Some authors have used a composite scale of symptoms to evaluate the resolution of the episode in clinical trials of antibiotics 193 or in observational follow-up studies 194. However, to date these scales have not been validated in long-term trials of interventions in stable COPD patients. In contrast, most studies have used the intensity of the medical intervention required as a grade of severity, from self-management at home to admission to an intensive care unit 152, 154, 191, 195–197. The classification into types I–III according to the criteria defined by Anthonisen et al. 185 is not a severity scale but a classification that indicates the likelihood of bacterial infection as the cause of an exacerbation (e.g. a type I exacerbation in a patient with mild disease may have a better prognosis than a type III exacerbation in a patient with severe disease).

The proposed severity classification includes three categories: 1) mild, which involves an increase in respiratory symptoms that can be controlled by the patient with an increase in the usual medication; 2) moderate, which requires treatment with systemic steroids and/or antibiotics; and 3) severe, which describes exacerbations that require hospitalisation or a visit to the emergency department.
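Because the proposed categories are defined by the most intensive care an episode required, they can be assigned mechanically from treatment records. The following is a hypothetical Python sketch in which the parameter names are illustrative. The highest applicable category is chosen, and an episode with no change in therapy is returned as `None`, on the assumption that the proposed scale does not grade it.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    MILD = "mild"          # controlled by the patient with an increase in usual medication
    MODERATE = "moderate"  # systemic steroids and/or antibiotics required
    SEVERE = "severe"      # hospitalisation or emergency department visit


def classify_severity(
    increased_usual_medication: bool,
    systemic_steroids_or_antibiotics: bool,
    hospitalisation_or_ed_visit: bool,
) -> Optional[Severity]:
    """Grade an exacerbation by the most intensive intervention it required."""
    if hospitalisation_or_ed_visit:
        return Severity.SEVERE
    if systemic_steroids_or_antibiotics:
        return Severity.MODERATE
    if increased_usual_medication:
        return Severity.MILD
    return None


print(classify_severity(True, True, False))  # Severity.MODERATE
```

Checking the most intensive intervention first means that an episode treated with antibiotics and then admitted is counted once, as severe, and not in two categories.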

Evaluating the frequency of exacerbations

Due to seasonal variation, an evaluation of exacerbation frequency requires a period of ≥1 yr 179. The methods for recording exacerbation frequency as a variable have not been standardised, but various methods have been used in several clinical trials of inhaled corticosteroids and/or bronchodilators in COPD 152, 154, 190, 191, 195–198.

In observational studies of COPD patients, a skewed distribution of this variable has been found, with a large number of patients having 0–2 exacerbations per yr and a small number having ≥10 exacerbations per yr 179, 194, 198. The mean number of exacerbations is generally related to the severity of the baseline disease and to the definition used; in observational studies it ranges from 1 to 2.5 episodes per yr 179, 194, 198. If unreported exacerbations are included, severe patients (GOLD III) have a mean of 3.43 exacerbations per yr and GOLD II patients a mean of 2.68 per yr 194.

In the short term (i.e. weeks or months), this variable does not appear to be reproducible, owing to the small number of episodes per yr; the probability of a repeat episode within weeks or months is small. However, in the long term, patients with frequent exacerbations in the past have a high probability of suffering frequent exacerbations in the future 194, 199. In COPD patients, exacerbations also have both a short-term 36 and a long-term impact on HRQoL 178–180, 200. The magnitude of this impact is directly related to the frequency of exacerbations 178–180, 200. In addition, poor HRQoL is a risk factor for frequent exacerbations 179. Numerous clinical trials have also shown that exacerbation frequency is sensitive to treatment effect, with apparent reductions in the frequency of exacerbations 152, 154, 190, 191, 195, 197, 198. Comorbidities, particularly cardiac diseases, are a risk factor for poor outcome of the exacerbation 201 but, to date, there is no evidence that comorbidities are a risk factor for frequent exacerbations.

With respect to the feasibility of measuring the frequency of exacerbations, simpler methods are more likely to be used in multicentre trials. A questionnaire in the form of a diary card is needed in order to capture the unreported exacerbations from patients and another questionnaire for investigators to quantify the reported exacerbations 178, 194. The questionnaires take <1 min each to complete. No additional equipment or training is required.

The statistical methodology used to calculate the annual rate of exacerbations in a given cohort, and to compare rates between treatment arms in clinical trials, must be described in detail, since large and significant differences have been reported when different approaches have been used 202.
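The sketch below illustrates why the method must be stated. It contrasts two simple ways of annualising exacerbation counts in one treatment arm: dividing total events by total person-years, and averaging each patient's own annualised rate. These are not necessarily the approaches compared in reference 202, and the patient data are invented. When follow-up is unequal, for example after early withdrawal, the two estimates can diverge considerably.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    exacerbations: int       # events recorded during follow-up
    follow_up_years: float   # time at risk in the trial


def pooled_rate(patients):
    """Total events divided by total person-years of follow-up."""
    events = sum(p.exacerbations for p in patients)
    person_years = sum(p.follow_up_years for p in patients)
    return events / person_years


def mean_of_individual_rates(patients):
    """Average of each patient's own annualised exacerbation rate."""
    rates = [p.exacerbations / p.follow_up_years for p in patients]
    return sum(rates) / len(rates)


# Invented arm: the third patient withdrew after 3 months with one event.
arm = [Patient(2, 1.0), Patient(0, 1.0), Patient(1, 0.25)]
print(round(pooled_rate(arm), 2))     # 1.33 per patient-year
print(mean_of_individual_rates(arm))  # 2.0 per patient-year
```

In practice, count models such as Poisson or negative binomial regression, with follow-up time as an offset, are often used. Whichever method is chosen, it should be reported together with the handling of withdrawals.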

Evaluating the severity of exacerbations of COPD

The methodology surrounding the use of severity of exacerbations as a variable has not been standardised. The criteria for hospital admission may vary from country to country and in different hospitals. In addition, the use of systemic steroids and/or antibiotics for exacerbations has different patterns in different areas. The severity of exacerbations is closely related to the severity of the baseline disease, i.e. severe COPD patients are more likely to be hospitalised due to an exacerbation. Therefore, distribution of severity of exacerbations parallels distribution of severity of COPD in a given population. Up to 50% of exacerbations may be unreported to the physician and managed at home (mild exacerbations), 40–45% are moderate exacerbations and <10% are severe exacerbations 178, 194, 198, 203. This variable is not reproducible in the short term but, in the long term, COPD patients who experience severe exacerbations have an increased risk of experiencing more severe exacerbations in the future 194, 204. In COPD patients, this variable does influence health outcome since severe exacerbations, which require hospital admission, do have an impact on health status 179. Many RCTs have also demonstrated improvements in the severity of exacerbations with pharmacotherapy compared with placebo in patients with COPD 152, 154, 195, 197, 198. It is possible that cardiac disease may influence the severity of the exacerbation. Patients with cardiac disease have an increased risk of hospitalisation due to an episode 205 and have an increased risk of mortality 206.

The severity of exacerbations, graded by medical resource utilisation, can be measured in multicentre trials, as numerous RCTs of pharmacotherapy compared with placebo have shown 152, 154, 187, 190, 195–198. This requires a questionnaire or a diary card for the patient to detect unreported (mild) exacerbations, and a questionnaire for the investigator to collect information on reported (moderate-to-severe) exacerbations 178, 194. Both take <1 min to complete and require no additional equipment or training costs.


The definition of exacerbations for clinical trials with anti-infectives should include the recognition of symptoms related to the likelihood of infectious aetiology. However, the definition used in clinical trials of pharmacotherapy in stable COPD is not restricted to exacerbations of infectious aetiology. The proposed definition of an exacerbation of COPD is an increase in respiratory symptoms over baseline that usually requires medical intervention. This definition requires the patient to be encouraged to recognise exacerbations and increase their own therapy or present to healthcare professionals. Patient diary cards are useful for recognising these episodes. The definition of severity is as follows: 1) mild, which involves an increase in respiratory symptoms that can be controlled by the patient with an increase in the usual medication; 2) moderate, which requires treatment with systemic steroids and/or antibiotics; and 3) severe, which describes exacerbations that require hospitalisation or a visit to the emergency department. The severity of exacerbations is largely based on the degree of healthcare utilisation and this definition has the disadvantage of having different criteria for hospital admission or different patterns of use of antibiotics and/or oral steroids in different settings or countries. At this time, no biological marker for the risk of exacerbation or severity exists that can be used in clinical practice or for an RCT.



In chronic cardiopulmonary diseases the ability to exercise is an important clinical outcome in its own right and a marker of other significant outcomes. Exercise tolerance is significantly impaired in many patients with COPD and is an important determinant of HRQoL 119, 207. In COPD patients, exercise tolerance cannot be predicted by resting lung function measurements (e.g. FEV1), and exercise testing is useful in the clinical setting to assess the degree of impairment, prognosis and the effects of interventions. Exercise capacity is mostly limited by ventilatory constraints, lung gas exchange inefficiency leading to increased exercise ventilatory demand, lung dynamic hyperinflation and dyspnoea sensation 208, 209. Deconditioning and/or malnutrition leading to peripheral muscle dysfunction and leg fatigue may also contribute significantly to reduced exercise capacity 210.

The ability of exercise to provoke breathlessness is used in the MRC dyspnoea scale to estimate symptom intensity (see section on Dyspnoea) 144 and exercise impairment is an important predictor of mortality 211, 212.

The severity and cause of exercise intolerance are best assessed by performing detailed physiological measurements in the laboratory (minute ventilation, breathing pattern, oxygen uptake, carbon dioxide production, oxygen saturation and other derived indexes; all during exercise). Simpler field tests, in which only the duration of exercise or the distance covered in a fixed time period is usually recorded, can also be used. Laboratory test protocols can be either incremental, with a steady rise in the intensity of the workload, or constant work rate (“endurance”) tests, in which the workload is set at a fixed percentage of a previously established maximum. Incremental testing is less sensitive to interventions than endurance exercise, which is usually conducted at ∼75% of the symptom-limited peak oxygen uptake (V′O2,peak) or peak work rate 213, 214. Field tests, such as the widely used 6-min walking test (6MWT), represent an alternative in which patients are encouraged to walk as far as they are able to for the period of the test 215. In addition to the distance covered, it is useful to record, at the beginning and end of exercise, the intensity of breathlessness (and sometimes leg fatigue) using a modified Borg category scale or a VAS (see section on Dyspnoea), cardiac frequency and Sp,O2. Differences in physiological adaptation to cycling and to walking have been reported in patients with COPD, with increased dyspnoea and arterial oxygen desaturation during walking exercise 216, 217.

All exercise tests have been shown to have good validity, specificity, reliability 49, 50, predictive ability 212, discriminative ability and evaluative ability 218.

Laboratory exercise testing

Laboratory exercise testing is conducted under standardised conditions, with the recording of cardiac frequency, minute ventilation, respiratory frequency and breathing pattern, oxygen uptake and carbon dioxide production (from which the anaerobic threshold can be estimated 219), exercise duration and maximum workload performed. Lung volumes can be measured noninvasively using optoelectronic plethysmography 220, but the most common method involves the measurement of IC (assuming TLC to be constant during exercise). Care is needed in performing this measurement 64, but its reproducibility within individual laboratories is good 65. The intensity of respiratory sensation, breathlessness and the presence of leg fatigue are recorded using a category ratio scale (see section on Dyspnoea) at rest and at intervals during exercise (commonly every 1 min or just before a planned increase in workload). It is useful to note the principal symptom limiting exercise performance 216, 217. Monitoring of cardiac frequency, blood pressure, oxygen saturation and ECG is also recommended.
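IC is the volume from end-expiratory lung volume (EELV) up to TLC. If TLC is assumed to be constant, the change in EELV during exercise can therefore be inferred from the change in IC. The minimal Python sketch below makes that arithmetic explicit. The function names are hypothetical, and the volumes in the usage line are illustrative values rather than reference data.

```python
def eelv(tlc_l: float, ic_l: float) -> float:
    """End-expiratory lung volume (L) inferred as TLC minus IC."""
    return tlc_l - ic_l


def dynamic_hyperinflation(tlc_l: float, ic_rest_l: float, ic_exercise_l: float) -> float:
    """Rise in EELV (L) from rest to exercise, assuming TLC stays constant.

    Because TLC cancels out, the result equals the fall in IC.
    """
    return eelv(tlc_l, ic_exercise_l) - eelv(tlc_l, ic_rest_l)


# Hypothetical values: TLC 7.0 L; IC falls from 2.2 L at rest to 1.7 L at end-exercise.
print(round(dynamic_hyperinflation(7.0, 2.2, 1.7), 2))  # 0.5
```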

According to ERS and ATS/American College of Chest Physicians guidelines for cardiopulmonary exercise testing 208, 209, patients rest for a few minutes breathing quietly on the equipment before beginning light exercise, which increases in intensity in a ramp fashion. Cycle and treadmill exercise have been used interchangeably, although the former has been mostly used in COPD clinical studies, as the work rate for incremental and endurance tests is easier to quantify. As the exercise period should last 10–12 min, the work rate increment should be selected carefully. In COPD studies the usual rate of workload increase is 10 W·min−1, although slower or faster rates are possible in the very sick and the very fit patients, respectively. The maximal incremental exercise test is also utilised to determine the appropriate work rate to be used in an endurance protocol (i.e. 75% peak work rate).
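The protocol arithmetic in this paragraph can be sketched in two steps. The first chooses a ramp so that an expected peak work rate is reached in roughly 10–12 min. The second derives the constant work rate for an endurance test as 75% of the peak achieved. The expected peak work rate is an assumption that the investigator must supply (for example, from a previous test), and the function names are hypothetical.

```python
def ramp_increment(expected_peak_w: float, target_minutes: float = 10.0,
                   start_w: float = 0.0) -> float:
    """Work-rate increment (W/min) that reaches the expected peak in target_minutes."""
    if not 10.0 <= target_minutes <= 12.0:
        raise ValueError("the incremental phase should last about 10-12 min")
    return (expected_peak_w - start_w) / target_minutes


def endurance_work_rate(peak_w: float, fraction: float = 0.75) -> float:
    """Constant work rate for an endurance test as a fraction of the incremental peak."""
    return fraction * peak_w


print(ramp_increment(100.0))      # 10.0 W/min, the usual rate in COPD studies
print(endurance_work_rate(80.0))  # 60.0 W
```

A patient expected to reach about 100 W would therefore receive the usual 10 W·min−1 ramp. A more or less impaired patient would receive a slower or faster ramp, respectively.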

The measurement of respiratory and metabolic variables together with various exercise protocols is standard, although there is no universal agreement on the rate of increase of work during exercise, which tends to depend on the purpose of the investigation. Although the statistical frequency distributions of these measures have not yet been determined in the general population or in COPD patients, they are reproducible in the short term as well as in the long term, where changes can reflect disease progression. In the general population, exercise is an important determinant of overall cardiovascular mortality 221, and in COPD patients mortality has been related to V′O2,peak 212. Many studies have also demonstrated that interventions, such as rehabilitation 222, 223, lung volume reduction surgery 224, oxygen 225, 226 and heliox 227 administration, and treatment with bronchodilator drugs 49, 50, improve exercise-related indices. These improvements have largely been demonstrated with endurance rather than incremental testing. Exercise testing should not be performed during exacerbations of COPD, for obvious practical reasons. The presence of comorbidities that themselves limit exercise performance, e.g. intermittent claudication or exertional chest pain, precludes exercise testing with any test modality. Coexisting occult cardiac disease can be identified during exercise from abnormalities in cardiovascular variables (i.e. cardiac frequency, ECG and blood pressure).

Exercise testing can be performed in multicentre studies, provided there is strict central quality control of data collection. Studies with ≤100 patients conducted in six or more centres have now been reported with good data quality 49. It can be expected that >90% of COPD patients would be able to carry out exercise testing in a study. Appropriate laboratory space is necessary to undertake such testing. An electrically braked cycle ergometer or a treadmill able to operate at a gradient is required. Additionally, an appropriate flow meter and gas analysers, which are commonly part of commercial computerised exercise-recording systems, will be needed. Ideally, physiological variables should be measured breath by breath. With this modality, 45–60 min are required to complete a test and allow the patient to rest. Training is necessary for the operators, who would normally be physiological measurement technicians, graduate students or equivalent. Costs vary from centre to centre but are not insignificant.

Field tests

Self-paced timed walking tests

Patients are asked to walk as far as they are able to on a level corridor for a set time, originally 12 min 228 but now usually 6 min 215. They are allowed to stop if they cannot continue but are asked to resume walking when they are able to. The intensity of relevant respiratory sensation is recorded as for laboratory testing, usually at the beginning and end of the exercise period. Oxygen saturation and cardiac frequency are also recorded using a light-weight portable pulse oximeter. The principal outcome is the distance walked in either metres or feet.

In terms of standardisation and reproducibility, the use of both encouragement during the walk 229 and a circular course 230 improves test-to-test reproducibility. The reproducibility of the tests is well defined in both respiratory and cardiac populations 231, 232. There is a consistent practice effect and at least one, and preferably more, practice walks should be conducted before reliable data can be obtained 232. There are both floor and, in particular, ceiling effects in this form of testing. The measurement is not suitable for use in less disabled subjects with walking distances >600 m, as factors other than those related to disease intensity determine exercise performance. The 6-min walking distance (6MWD) is an important predictor of mortality 233 and health status 174 in COPD patients. This test is now used as a standardised outcome measure in studies of treatment of pulmonary hypertension and has been applied to patients with congestive cardiac failure. Multiple studies also confirm that treatment with lung volume reduction surgery 234, 235, rehabilitation 236, oxygen during exercise 237 and a range of bronchodilator drugs 27, 238 improve this outcome significantly. As with exercise testing, the walking test is not recommended to be performed during exacerbations of COPD or when the patient has another serious comorbidity.

The walking test can be performed in multicentre trials, provided that suitable care is taken with the instructions to the operators. The simplicity of this test can be deceptive, since its performance may be influenced by patients’ fatigue when they are asked to participate in protocols involving multimeasurements. The test requires a level measured distance in a corridor, an appropriate timer and a pulse oximeter, and CR10 questionnaires. The walk itself will take 6 min but appropriate preparation and resting usually requires 20–30 min per test. A practice walk for the patient and appropriate familiarity with the test by the operators are recommended. The test is relatively inexpensive and can be mastered by nonspecialists.

Shuttle walking tests

These tests were developed as an alternative to the unpaced 6MWT for use as field exercise tests in an attempt to improve standardisation and reproducibility. The patient walks around an elliptical course defined by two cones 10 m apart, i.e. the shuttle distance. Walking speed is externally paced by the frequency of bleeps on a pre-recorded tape. The frequency increases progressively during the walk in the incremental shuttle tests until patients can no longer match their walking pace to the bleep frequency. Cardiac frequency, oxygen saturation and symptom scores are measured. The principal outcome is the distance covered in metres or feet. The distance completed on the shuttle test strongly correlates (r = 0.86) with V′O2,peak measured during incremental cycling exercise 239.
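The distance achieved in the incremental test is determined by the level structure of the protocol, which this section does not spell out. The Python sketch below assumes the commonly published 12-level design: 3 shuttles in the first minute and one additional shuttle in each subsequent minute. Walking speed therefore starts at 0.5 m·s−1 and rises by about 0.17 m·s−1 per level. These parameters, and the function names, are assumptions to be checked against the standardised protocol 240, 241 before use.

```python
SHUTTLE_M = 10.0  # distance between the two cones
N_LEVELS = 12     # assumed number of one-minute levels


def shuttles_in_level(level: int) -> int:
    """Assumed protocol: 3 shuttles in level 1, one more in each later level."""
    if not 1 <= level <= N_LEVELS:
        raise ValueError("level out of range")
    return level + 2


def speed_m_per_s(level: int) -> float:
    """Walking speed implied by the number of shuttles in a one-minute level."""
    return shuttles_in_level(level) * SHUTTLE_M / 60.0


def iswt_distance(completed_levels: int, extra_shuttles: int = 0) -> float:
    """Distance (m): fully completed levels plus shuttles finished in the next level."""
    if not 0 <= completed_levels <= N_LEVELS:
        raise ValueError("completed_levels out of range")
    if completed_levels < N_LEVELS:
        if not 0 <= extra_shuttles < shuttles_in_level(completed_levels + 1):
            raise ValueError("extra shuttles must not complete the next level")
    elif extra_shuttles:
        raise ValueError("no level follows the last one")
    full = sum(shuttles_in_level(lv) for lv in range(1, completed_levels + 1))
    return (full + extra_shuttles) * SHUTTLE_M


print(iswt_distance(12))                                 # 1020.0 m, the test maximum
print(iswt_distance(3, 2))                               # 140.0 m
print(round(speed_m_per_s(2) - speed_m_per_s(1), 2))     # 0.17 m/s per level
```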

This test has been adapted as an endurance protocol with the external signal frequency constant throughout the walk. The peak bleep rate is set at a value that corresponds to 70% of the maximum oxygen uptake, which is determined indirectly from a previously established relationship between cardiac frequency and shuttle distance. The principal outcome of the endurance shuttle walk test is the duration of exercise in seconds.

The shuttle walk test and its variants have been standardised and a protocol published 240, 241. The incremental test requires an a priori familiarisation study, while the endurance shuttle walk needs an incremental test that serves to familiarise the patient with the key aspects of the protocol. In terms of reproducibility, current data (not yet published in full) suggest that the shuttle walking distance is stable over a 6-month period, although this is sufficiently long for exercise performance to decline as a result of disease progression. The influence of this test on health outcome has not been specifically studied; however, there are studies published in full or in abstract that suggest that this measure is responsive to pulmonary rehabilitation 242, nutritional support 243, 244 and bronchodilator drugs 245. In addition, as with all exercise tests, the same considerations are made regarding its use during exacerbations or in the presence of coexisting diseases.

The shuttle walk test may also prove useful in multicentre studies. There has been only one small trial (n = 80), conducted at three UK centres, but the influence of variability between centres on the results was not assessed. In terms of cost, the shuttle walk test is relatively inexpensive, requiring two cones separated by 10 m and an appropriate tape recorder, together with a pulse oximeter and a CR10 questionnaire.


Exercise capacity is another important clinical outcome that could be measured in COPD pharmacological trials. Several methods for evaluating exercise capacity have been developed. The 6MWT is a relatively simple test that has been used extensively in trials to evaluate possible benefits of pharmacological intervention. More standardised tests have patients walking at a specific speed on the treadmill or performing cycle ergometry. Exercise duration, power output (in watts) and oxygen consumption (in mL·kg−1·min−1) are also standard measures of exercise capacity.



Mortality provides the best possible outcome to measure. Reliable, relatively easy to measure and of great importance, it has been the gold standard in the evaluation of predictors and therapies. In COPD, several studies have evaluated predictors of mortality. Four reports have shown improvements in mortality with COPD therapies: two trials evaluating oxygen for hypoxaemic patients 246, 247, one patient-level meta-analysis of inhaled corticosteroids 248, and one trial of lung volume reduction surgery in a small subgroup of patients 218.

Cause of death in patients with COPD

Patients dying as a consequence of severe COPD do not regularly have COPD listed on their death certificates, making COPD-specific mortality a difficult outcome measure 249, 250. Patients with advanced COPD are often recorded as dying from other causes, sometimes trivial insults, such as hospital admissions for unrelated events (e.g. fractures), or therapeutic mishaps, such as sedation, uncontrolled oxygen or maintenance treatment withdrawal. COPD is also an independent risk factor for cardiovascular deaths 251, 252. All-cause mortality is probably the best COPD outcome measure.

Adjudication of the cause of death

The correct determination of the cause of death remains important, however, as it may help generate hypotheses regarding the mechanism by which a variable in question affects the outcome or the influence that a treatment under study may have on the outcome.

Very little has been published in the area of accurately defining the cause of death. As a matter of fact, only one study 196 was found in the COPD literature where a methodology for the adjudication of the actual cause of death was described. In contrast, adjudication committees for the cause of death and a description of their methods are frequent in cardiovascular trials 253, 254. The discrepancy between the original and the adjudicated cause of death in some of those studies has reached 20% 196. There is emerging consensus that large clinical trials evaluating death as an outcome should have an independent adjudicating committee 255 consisting of at least three members 256. More members do not appear to increase the accuracy of classification 254.

Predictors of mortality

In table 2, the variables that represent predictors of mortality are presented along with the stage (GOLD–ATS/ERS) included in the studies and whether each predicts all-cause and respiratory mortality 5, 55, 212, 249, 257–269.

With mortality as the outcome, the variables listed represent surrogate outcomes. They could be used as surrogates for mortality, especially if changes in them reflect changes in mortality. A case in point is the correction of hypoxaemia with oxygen, which alters the mortality rate. Unfortunately, surrogates often show no relationship with, or even contradict, important patient outcomes.

Thus, the relative strength of the variables remains difficult to ascertain. Recent data do suggest that, at least for the body mass index (B), obstruction (O), dyspnoea (D) and exercise endurance (E) index (BODE index), changes in its value after intervention confer changes in prognosis 270, 271. More studies are needed to confirm these preliminary findings. What is clear is that variables that differ from FEV1 are predictors of mortality and some, such as BODE, may be better than lung function alone.
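For reference, the BODE score can be computed as in the sketch below. The point thresholds are not given in this document. They are taken from the original description of the index by Celli and colleagues as it is commonly reproduced, and they should be verified against that source. How non-integer values that fall between the published bands are handled is an assumption. In the original index, the exercise component (E) is the 6MWD. The total ranges from 0 to 10, and higher scores indicate greater risk.

```python
def bode_index(fev1_pct_pred: float, six_mwd_m: float, mmrc: int, bmi: float) -> int:
    """BODE index (0-10); thresholds as commonly reproduced from Celli et al."""
    # O: airflow obstruction, FEV1 % predicted
    if fev1_pct_pred >= 65:
        o = 0
    elif fev1_pct_pred >= 50:
        o = 1
    elif fev1_pct_pred >= 36:
        o = 2
    else:
        o = 3
    # E: exercise capacity, 6-min walking distance (m)
    if six_mwd_m >= 350:
        e = 0
    elif six_mwd_m >= 250:
        e = 1
    elif six_mwd_m >= 150:
        e = 2
    else:
        e = 3
    # D: dyspnoea, modified MRC grade 0-4 (grades 0 and 1 both score 0)
    if mmrc not in range(5):
        raise ValueError("mMRC grade must be 0-4")
    d = max(mmrc - 1, 0)
    # B: body mass index (kg/m2)
    b = 0 if bmi > 21 else 1
    return b + o + d + e


print(bode_index(55, 300, 2, 22))  # 3
```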


Mortality remains the most important and robust clinical outcome in COPD research. Several variables other than the degree of airflow obstruction independently predict mortality in patients with COPD. Studies assessing mortality as an outcome require correct adjudication of the cause of death, and the adjudication committee should consist of at least three individuals.



An economic evaluation is not an outcome measure as such, but rather a specific type of analysis that compares costs and effects between two or more interventions and integrates the differences in costs and effects into a cost-effectiveness ratio. Cost is a cumulative variable that includes a wide variety of different types of healthcare utilisation and other resource use, all of which can be measured in different ways. These data are increasingly important in various jurisdictions in order to support coverage and reimbursement decisions by health authorities.
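In its simplest form, the cost-effectiveness ratio is the incremental cost of the new intervention divided by its incremental effect relative to the comparator. The Python sketch below is illustrative. The function name is hypothetical, and the cost and QALY figures in the usage line are invented.

```python
def icer(cost_new: float, effect_new: float,
         cost_comparator: float, effect_comparator: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_cost = cost_new - cost_comparator
    delta_effect = effect_new - effect_comparator
    if delta_effect == 0:
        raise ValueError("no difference in effect; the ratio is undefined")
    return delta_cost / delta_effect


# Invented per-patient figures: costs in any currency unit, effects in QALYs.
print(round(icer(2400.0, 0.70, 1900.0, 0.68)))  # 25000 per QALY gained
```

Negative ratios need care. They arise both when the new treatment is cheaper and more effective and when it is more costly and less effective. For this reason, the separate differences in costs and effects should be reported alongside the ratio.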

Because economic evaluations require high-quality data on outcome measures, a detailed description of the methods, data acquisition and handling, and reporting requirements is relevant to the remit of the ATS/ERS Task Force.

There are several educational books and guidelines on economic evaluation 272–274 and there are reviews of economic evaluations of COPD interventions 275, 276. The purpose of the present section is to focus on: 1) the aspects of economic evaluations that are typical or specifically relevant to COPD in the context of pharmacological trials; 2) the issues around the measurement and valuation of healthcare utilisation; 3) the measurement and valuation of productivity costs; 4) the outcomes typically used in economic evaluation; 5) the interpretation of the cost-effectiveness ratio; and 6) the epidemiological models of COPD specifically developed to estimate cost-effectiveness of COPD interventions. Most economic evaluations of COPD interventions were conducted alongside randomised clinical trials. An increasing number of cost-effectiveness analyses are using models to estimate cost-effectiveness.

Use of healthcare resources

Factors to consider for measurement and valuation

Exacerbations are an important outcome in COPD, representing treatment failure and progression of the disease. Between 40 and 60% of medical expenditure for COPD is a direct consequence of exacerbations 277–281. Hospitalisation, emergency department visits and unscheduled clinic visits, as well as use of rescue medication, including antibiotics, comprise the majority of these emergency treatment costs 203. When a resource use-based definition of severity is used, costs increase with severity by definition, but costs also increase with severity when a symptom-based definition is used 282, 283. In clinical trials, use of emergency treatments, alone or in combination with symptom and lung function data, is customarily employed to characterise an exacerbation, especially when the primary study outcome is reduction in the frequency of, or the time to, an exacerbation event. Routine collection of emergency treatment data can be undertaken in the field through patient or caregiver self-report. In some circumstances, automated data from clinical or billing records are more reliable and valid and can substitute for self-reports.

For studies of cost impact, data on preventive pharmacotherapy, diagnostic and follow-up spirometry, oxygen use and routine office visits (downstream cost) are required to supplement data on emergency treatments in order to provide a comprehensive assessment of health resource use and costs. These data can be acquired in a similar fashion, using self-report or automated data collection methods.

Although it is relevant to record the aforementioned use of maintenance therapy and scheduled healthcare utilisation to obtain a complete picture of the costs, it is well established that hospital admissions are the main driver of the cost-effectiveness of most COPD interventions. However, the incidence of hospital admissions in clinical trial populations of stable moderate-to-severe COPD patients is relatively low. Hence, a very large number of patients is required to show that a 20 or 30% reduction in the number of hospital admissions or hospitalisation days is statistically significant. As this number far exceeds the number needed to demonstrate a difference in lung function, exacerbation rate or COPD-specific quality of life, most clinical trials do not have sufficient power to detect cost differences.
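The sample size problem can be illustrated with a standard normal-approximation formula for comparing two Poisson event rates with equal allocation. The sketch below is an approximation resting on several assumptions and is not a recommended design method. It ignores overdispersion and dropout. The baseline rates used (0.2 hospital admissions versus 1.5 exacerbations per patient-year) are hypothetical values chosen only to show the contrast, and the function name is also hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_arm(control_rate: float, relative_reduction: float,
                     years: float = 1.0, alpha: float = 0.05, power: float = 0.9) -> int:
    """Approximate patients per arm to detect a relative reduction in an event rate.

    Normal approximation for two Poisson rates with equal allocation and equal
    follow-up; overdispersion and dropout are ignored.
    """
    active_rate = control_rate * (1.0 - relative_reduction)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = z ** 2 * (control_rate + active_rate) / (years * (control_rate - active_rate) ** 2)
    return math.ceil(n)


# Hypothetical baseline rates per patient-year; 20% reduction; 1 yr follow-up.
print(patients_per_arm(0.2, 0.2))  # hospital admissions: 2365
print(patients_per_arm(1.5, 0.2))  # all exacerbations: 316
```

Under these assumptions, detecting the same relative reduction requires about seven times as many patients when the outcome is hospital admission.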

Medications are an important contributor to the total costs of COPD but, with the exception of the costs of study medication, they are not usually an important driver of the cost-effectiveness of COPD interventions. Economic evaluations performed alongside clinical trials are not suitable for detecting a difference in medication costs, because most medications are given as maintenance therapy and clinical investigators are often instructed to keep the dose constant during the trial. Moreover, the medications used to treat exacerbations, such as short-acting β-agonists, prednisone or antibiotics, are relatively inexpensive, so a relevant reduction in exacerbation rate will not immediately translate into a relevant reduction in medication costs.

Total costs versus COPD-related costs

An important decision when calculating costs of healthcare utilisation is whether to calculate total costs or COPD-related costs. Theoretically, it is better to record all healthcare utilisation because it is unknown in advance whether the treatment under investigation may affect healthcare use for other than respiratory indications. Another reason it is better to record all healthcare utilisation is the difficulty of disentangling COPD from comorbidities that occur more frequently in patients with COPD than in patients without COPD. Conversely, a few rare but costly events unrelated to COPD or the treatment investigated, which by chance occur in one treatment group and not in the other, may influence the cost-difference in a way that does not correctly reflect treatment impact.

Implications of clinical trial protocol-driven costs

Among the disadvantages of an economic evaluation appended to a clinical trial is the occurrence of protocol-driven costs. These typically comprise the costs of the regularly scheduled trial visits and examinations, and they are usually excluded. However, exclusion may underestimate the total costs, since these trial visits may have substituted for visits that would have occurred had the trial not taken place. Conversely, in the trial situation, patients may be less reluctant to contact their physician early about minor complaints. The latter affects both treatment groups equally. However, the substitution effect is more likely to occur in the control than in the active treatment group, as the condition of the patients in the control group may be less well controlled. Consequently, if there is a bias, it is more likely to be a bias against the active treatment group. However, the contribution of unscheduled visits to the total cost is generally small, and it is unlikely that the difference in costs between treatment groups is largely affected by this bias.

Productivity losses

Measurement issues and analysis perspective

Patient travel and waiting time, disability, absence from work, reduced productivity while at work, and caregiver costs 284 are additional and important outcome measures in COPD. These nonmedical economic consequences of COPD account for ∼50% of the overall disease burden. However, only a few instruments are available to evaluate production losses 285.

Whether or not to include these costs in the cost-effectiveness analyses depends on the perspective of the analyses. Choosing a societal perspective implies including the costs of lost or impaired ability to work, as well as other production losses that may occur when patients perform household or caring activities less well, or engage less in volunteer activities.

The absolute minimum that needs to be recorded to calculate productivity costs is the number of days of absence from paid work or the start and end date of the absence spell. At baseline, whether or not the patient has a paid job, how many days per week they work and how many hours per day should be recorded. This information should be updated during the trial. From that information, the mean number of working hours per day can be calculated and multiplied by the number of days of absence from work to estimate the total number of hours missed.
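A minimal sketch of this calculation follows. It assumes that absence is recorded as the start and end dates of a spell, counted inclusively in calendar days, so the mean working hours are spread over all seven days of the week. If absence is recorded in working days instead, the usual hours per working day should be used directly. The function name and example values are hypothetical.

```python
from datetime import date


def hours_missed(spell_start: date, spell_end: date,
                 working_days_per_week: float, hours_per_working_day: float) -> float:
    """Estimate paid working hours lost during one absence spell (calendar-day basis)."""
    calendar_days = (spell_end - spell_start).days + 1  # inclusive of both dates
    weekly_hours = working_days_per_week * hours_per_working_day
    # Mean working hours per calendar day is weekly_hours / 7.
    return weekly_hours * calendar_days / 7


# Hypothetical patient working 4 days of 8 h per week, absent for two calendar weeks.
print(hours_missed(date(2024, 3, 4), date(2024, 3, 17), 4, 8))  # 64.0
```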

Possible approaches to calculating costs of productivity losses

There are two different approaches to calculating the costs of production losses: the friction cost approach and the human capital approach 286. The friction cost method is based on the idea that “the amount of production lost due to disease depends on the time-span organisations need to restore the initial production level” 286. It is assumed that sick employees can be replaced after a period necessary for adaptation, i.e. the friction period. In the friction cost method, productivity costs are calculated by multiplying the days absent from work by the value of the daily productivity, where the number of days absent from work is limited to the duration of the friction period. The human capital approach does not take any friction period into account but estimates the cost of lost production from the first day of sick leave onwards.

In both approaches the value of the daily productivity can be approximated by the average gross daily earnings, which include the direct salaries and social security contributions payable by the employee.
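The difference between the two approaches can be illustrated with a short sketch. The friction period (84 days) and the daily productivity value (€200 gross daily earnings) are hypothetical values chosen for illustration, not recommended figures, and refinements used in some applications of the friction cost method (for example, an elasticity factor relating working time to productivity) are deliberately left out.

```python
from typing import Optional


def productivity_cost(days_absent: float, daily_value: float,
                      friction_days: Optional[float] = None) -> float:
    """Cost of production losses due to absence from paid work.

    friction_days=None -> human capital approach: every day absent counts.
    friction_days=N    -> friction cost approach: only the first N days count,
                          after which the sick employee is assumed to be replaced.
    """
    if friction_days is None:
        counted_days = days_absent
    else:
        counted_days = min(days_absent, friction_days)
    return counted_days * daily_value


DAILY_VALUE = 200.0   # hypothetical average gross daily earnings (EUR)
FRICTION_DAYS = 84    # hypothetical friction period (days)

for days in (30, 120):
    hc = productivity_cost(days, DAILY_VALUE)
    fc = productivity_cost(days, DAILY_VALUE, FRICTION_DAYS)
    print(f"{days} days absent: human capital EUR {hc:,.0f}, friction cost EUR {fc:,.0f}")
```

For short absence spells the two approaches give the same result; they diverge only once the absence outlasts the friction period.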

Outcomes in economic evaluation

Typical outcomes and level of assessment

Economic evaluations can assess the economic impact of different types of interventions ranging from diagnostic, therapeutic or palliative interventions to the organisation of the process of care delivery or the implementation of COPD treatment guidelines. The topic of the evaluation and the decisions that need to be supported with the evidence from the economic evaluation drive the choice of the outcome measures. When the decision to be supported is at the macro level, such as the inclusion of a new treatment into the reimbursed benefit package of a health insurer, economic evaluations require the use of final outcomes, such as life-years gained, improvement in generic quality of life and QALYs. Cost-effectiveness ratios, such as costs per life-year gained or cost per QALY, allow the comparison of cost-effectiveness of interventions across different diseases but limit comparison across jurisdictions, as the value of single cost units differs across settings. For decisions at the institutional level, such as whether or not to introduce early assisted discharge when patients are hospitalised for a COPD exacerbation, it might be sufficient to measure disease-specific quality of life, re-admission rate and mortality, in addition to other clinical outcomes. In this specific example, caregiver quality of life would also be a relevant outcome measure to include in an economic evaluation.


Obtaining QALYs requires the use of a preference-based quality-of-life instrument or utility instrument, such as the EQ-5D or the Health Utilities Index. However, these instruments have been criticised for not being sensitive to changes in COPD patients' health status. Part of this insensitivity is caused by the absence from these instruments of dimensions of health that are particularly relevant for COPD, and by the relatively low importance for COPD of some of the dimensions that are included. For example, the pain dimension is often present, but breathlessness and fatigue are usually absent. Moreover, these utility measures do not capture the impact of exacerbations on quality of life very well. This criticism also applies to the commonly used COPD-specific quality-of-life measures.

In COPD, the cost-effectiveness of a few interventions has been assessed in terms of cost per QALY. These interventions include lung transplantation 287–289, lung volume reduction surgery 290, mechanical ventilation 291, pulmonary rehabilitation 292, 293, smoking cessation 294, screening 295 and pharmacotherapy 296.

Incremental cost-effectiveness ratio

The cost-effectiveness ratio and its graphical analysis

In a similar manner to most new treatments, new interventions to manage COPD rarely generate net savings; that is, the costs of the intervention are not offset by the savings in other healthcare resources. More commonly, new interventions are more effective than their comparator but also more costly. This information on additional costs and effects can be combined into a cost-effectiveness ratio. A cost-effectiveness ratio is calculated as the mean difference in costs divided by the mean difference in outcomes (e.g. QALYs). Conventional confidence limits cannot meaningfully be applied to this ratio. If they could, a negative ratio that results from dividing a negative difference in costs (i.e. savings) by a positive difference in outcomes would be treated exactly the same as a negative ratio that results from dividing a positive difference in costs by a negative difference in outcomes, which is obviously wrong. Therefore, the uncertainty around a cost-effectiveness ratio is usually shown as a confidence region on the cost-effectiveness plane 297, 298. A cost-effectiveness plane is an x-y diagram of the difference in outcomes on the x-axis and the difference in costs on the y-axis. An example of a cost-effectiveness plane of drug A compared with drug B is provided in figure 1. The quadrants of the plane show the possible combinations of positive or negative outcomes with positive or negative costs. The dots that form the confidence region can be obtained by bootstrapping. The bootstrap technique estimates the sampling distribution of the costs and effects through a large number of random draws from the original data, based on sampling with replacement 299. For each bootstrap sample a new cost-effectiveness ratio is calculated. All these cost-effectiveness ratios are plotted on the cost-effectiveness plane, reflecting the uncertainty as a confidence region around the cost-effectiveness ratio.
It can be calculated which proportion of the bootstrap replications of the cost-effectiveness ratio fall in each of the quadrants. Figure 1 shows a cost-effectiveness plane with 72% of all ratios in the north-east quadrant (drug A costs more than drug B, but is more effective), 3% of all ratios in the south-east quadrant (drug A is more effective at less cost than drug B), 2% of all ratios in the south-west quadrant (drug A is less costly but also less effective than drug B) and 44% of all ratios in the north-west quadrant (drug A is more costly and less effective than drug B).
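The bootstrap procedure and the quadrant proportions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: patient-level (cost, effect) pairs are available for each arm, each arm is resampled independently with replacement, and the simulated trial data are entirely hypothetical (they do not reproduce figure 1). Replicates falling exactly on an axis are assigned to the south or west side by convention in this sketch.

```python
import random
from statistics import fmean


def bootstrap_ce_plane(arm_a, arm_b, n_boot=2000, seed=1):
    """Bootstrap the mean differences (drug A minus drug B) in effect and cost.

    arm_a, arm_b: lists of (cost, effect) tuples, one per patient.
    Returns the replicate points (delta_effect, delta_cost) and the
    proportion of replicates in each quadrant of the cost-effectiveness plane.
    """
    rng = random.Random(seed)
    counts = {"NE": 0, "SE": 0, "SW": 0, "NW": 0}
    points = []
    for _ in range(n_boot):
        # Resample patients with replacement within each arm.
        a = rng.choices(arm_a, k=len(arm_a))
        b = rng.choices(arm_b, k=len(arm_b))
        d_cost = fmean(c for c, _ in a) - fmean(c for c, _ in b)
        d_eff = fmean(e for _, e in a) - fmean(e for _, e in b)
        points.append((d_eff, d_cost))
        # North = drug A more costly; east = drug A more effective.
        quadrant = ("N" if d_cost > 0 else "S") + ("E" if d_eff > 0 else "W")
        counts[quadrant] += 1
    return points, {q: n / n_boot for q, n in counts.items()}


# Hypothetical trial data: annual cost (EUR) and QALYs per patient.
gen = random.Random(42)
arm_a = [(gen.gauss(5000, 1500), gen.gauss(0.70, 0.10)) for _ in range(150)]
arm_b = [(gen.gauss(3500, 1500), gen.gauss(0.68, 0.10)) for _ in range(150)]

points, shares = bootstrap_ce_plane(arm_a, arm_b)
print({q: round(p, 3) for q, p in shares.items()})
```

Because every replicate falls in exactly one quadrant, the four proportions necessarily sum to 100%.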

The cost-effectiveness acceptability curve

Whether an intervention is cost-effective cannot be judged without information on the maximum that decision makers are willing to pay for a QALY, an exacerbation-free month or another unit of effect. Although some countries, such as the UK, disclose information on the maximum acceptable costs per QALY 300, most countries do not. As the maximum willingness to pay is unknown, the information on the cost-effectiveness plane can be used to estimate the likelihood that a treatment is cost-effective at various levels of willingness to pay. This likelihood is presented in a cost-effectiveness acceptability curve 301. Figure 2 shows the acceptability curve created from the plane in figure 1. The acceptability curve presents the likelihood that drug A is the more cost-effective of the two treatments as a function of the maximum acceptable willingness to pay for a QALY. This maximum acceptable willingness to pay is often called a ceiling ratio. The curve shows, for example, an ∼70% chance that the incremental cost-effectiveness ratio of drug A versus drug B is <€10,000. In other words, the probability that drug A is the more cost-effective option when decision makers are willing to pay ≤€10,000 for a QALY is ∼70%. The curve starts somewhat below 0.2, which indicates the probability that drug A is cost saving compared with drug B, and asymptotes to 0.8, which indicates the probability that drug A is more effective than drug B. Reading across from a probability of 0.5 to the curve and then down to the x-axis gives the incremental cost-effectiveness ratio of drug A compared with drug B; in this example €4,118.
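Given bootstrap replicates of the kind plotted on the cost-effectiveness plane, each point on an acceptability curve can be obtained by counting the replicates in which drug A is cost-effective at a given ceiling ratio. The sketch uses the net monetary benefit formulation (ceiling ratio × difference in effect − difference in cost > 0), a standard way of computing acceptability curves that avoids dividing by differences close to zero; the four replicates in the example are hypothetical.

```python
def ceac(points, ceilings):
    """Cost-effectiveness acceptability curve.

    points: (delta_effect, delta_cost) pairs for drug A minus drug B,
            e.g. bootstrap replicates.
    ceilings: maximum willingness-to-pay values per unit of effect.
    Returns, for each ceiling, the proportion of replicates in which
    drug A has a positive incremental net monetary benefit.
    """
    n = len(points)
    return [
        sum(1 for d_eff, d_cost in points if wtp * d_eff - d_cost > 0) / n
        for wtp in ceilings
    ]


# Hypothetical replicates (delta QALY, delta cost in EUR).
reps = [(0.10, 500.0), (0.05, 1000.0), (-0.02, 200.0), (0.02, -100.0)]
print(ceac(reps, [0, 10_000, 1_000_000]))  # [0.25, 0.5, 0.75]
```

At a ceiling ratio of zero the curve equals the proportion of replicates in which drug A is cost saving, and at very high ceiling ratios it approaches the proportion in which drug A is more effective, which is how the start and asymptote of an acceptability curve are interpreted.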

COPD progression simulation models

Models may build on the empirical assessment of cost-effectiveness alongside clinical trials, for example when there is a need to extend the time horizon of a clinical trial in order to capture all relevant economic end-points. Simulation models can be used to project disease burden or estimate the cost-effectiveness of interventions 277. The Burden of Lung Disease model is designed as a burden-estimation tool for use by policy makers and local researchers. In addition, recent work describes estimates of the cost-effectiveness of COPD interventions 294, 302–305. These models are called “state transition models” or “Markov models” 306; they simulate the progression of COPD over different stages of disease severity and model the probability of experiencing COPD exacerbations. Interest in COPD models is also driven by the need to adapt the model input (e.g. the prevalence distribution of COPD severity or the average length of hospital stay) before transferring cost-effectiveness results from one country or setting to another.
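A state transition (Markov) model of this kind can be sketched as a cohort simulation over annual cycles. Every number in the sketch is hypothetical: the severity states, transition probabilities, annual costs, utility weights and 3.5% discount rate are placeholders chosen for illustration, not published estimates. Exacerbations are not modelled explicitly, and no half-cycle correction is applied.

```python
STATES = ["moderate", "severe", "very severe", "dead"]

# Hypothetical annual transition probabilities; row = from state, column = to state.
P = [
    [0.85, 0.10, 0.02, 0.03],
    [0.00, 0.83, 0.12, 0.05],
    [0.00, 0.00, 0.88, 0.12],
    [0.00, 0.00, 0.00, 1.00],
]
ANNUAL_COST = [1500.0, 3000.0, 6000.0, 0.0]  # hypothetical EUR per year in each state
UTILITY = [0.79, 0.71, 0.61, 0.0]            # hypothetical utility weights


def run_cohort(start, transitions, cost, utility, cycles, discount=0.035):
    """Return the final state distribution, discounted costs and discounted QALYs."""
    dist = list(start)
    total_cost = total_qaly = 0.0
    for t in range(cycles):
        weight = 1 / (1 + discount) ** t  # discount factor for cycle t
        total_cost += weight * sum(d * c for d, c in zip(dist, cost))
        total_qaly += weight * sum(d * u for d, u in zip(dist, utility))
        # Move the cohort one cycle forward: new[j] = sum over i of dist[i] * P[i][j].
        dist = [sum(dist[i] * transitions[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return dist, total_cost, total_qaly


final, cost, qalys = run_cohort([1.0, 0.0, 0.0, 0.0], P, ANNUAL_COST, UTILITY, cycles=10)
print({s: round(p, 3) for s, p in zip(STATES, final)}, round(cost), round(qalys, 2))
```

Running the model once per treatment arm, with inputs reflecting each arm, gives incremental costs and QALYs; replacing inputs such as the starting severity distribution is the kind of adaptation needed when transferring results between settings.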


Cost-effectiveness analyses evaluate the net changes in costs and outcomes that will result from using a new treatment in a particular group of patients, compared with an existing treatment. In particular, where the drug is more expensive, it is necessary to determine whether the increase in cost is justified by the resultant improvement in patient-centred outcomes. Unfortunately, in COPD pharmacological trials the presence of protocol-induced visits may lead to increased monitoring of patients, with associated improvements in health behaviour and compliance, which may lead to an underestimation of costs compared with what would occur in a more naturalistic setting. Furthermore, clinical trials usually have a double-masked design, so the results are unlikely to reflect the differences in compliance and patient preference that might be seen in an open comparison of treatments. The relatively short duration of trials, in particular, places important limitations on the accuracy of the cost estimates, as expensive hospitalisations may occur outside the study period. Moreover, few studies determine all resource use and potential downstream costs. In any case, without an evaluation of new agents using summary outcomes, such as QALYs, as the measure of effectiveness, it is difficult to gauge the value of pharmacological therapy in individuals with COPD.



FEV1 is a nonspecific end-point that does not distinguish the relative contribution to airflow obstruction arising from emphysema, chronic obstructive bronchitis, asthma and bronchiectasis. As has been recently suggested, progress toward specific treatments for COPD might be accelerated by moving beyond measurements of airflow limitation to the precise diagnosis of the specific targets responsible for the airflow limitation 307. Computed tomography (CT) imaging provides a means of accurately characterising lung parenchymal changes, and the nature of the image data facilitates quantitative assessment. Although comparison between plain radiography and CT has shown that for clinical purposes the plain film still has an important role in the evaluation of COPD 308, CT is more sensitive than plain radiography in diagnosing emphysema, and correlates with the presence and severity of emphysema better than nonspecific physiological parameters, such as FEV1 and DL,CO/alveolar volume 309, 310. Longitudinal studies indicate that densitometric indices relate to the decline in FEV1 311 but, in addition, are a more sensitive measure of emphysema progression than pulmonary function tests and health status 312–315. Furthermore, the addition of CT measurements of airway wall thickening has greatly contributed to the in vivo morphological study of COPD 316. Recent data show that by quantifying both the extent of emphysema and of airway remodelling, high-resolution CT (HRCT) is useful in differentiating COPD patients who have predominant parenchymal disease from those who have predominant airway pathology 317, 318.
The importance of determining the relative contribution of emphysema and conductive airway remodelling in individual subjects with COPD is further highlighted by a study showing that neutrophil counts in the induced sputum are significantly associated with CT indices of peripheral airway dysfunction but not with the severity of emphysema, as assessed by both CT and DL,CO 319. Consequently, quantitative CT presents the first real opportunity to measure accurately and repeatedly in vivo lung pathological changes related to specific mechanisms of airflow limitation in COPD.

General principles of CT lung parenchyma and airways analysis

Various indices have been used to quantify lung density changes by CT. Mean lung density (MLD) is calculated by averaging the density of all pixels in the image that represent the entire lung. It has been validated by correlation with lung function tests 320–323. The percentile point is defined as the cut-off density value in Hounsfield units (HU) below which a predetermined percentage of all voxels lie and, as with MLD, is influenced by density changes in all lung structures 324, 325. Only the fifth percentile point has been correlated with pathology 326 and lung function tests 327, although in the assessment of emphysema progression the 10th and 20th percentiles show similar sensitivity 328. The voxel index (VI), also referred to as Density Mask after the software program developed by General Electric Medical Systems (Milwaukee, WI, USA), or the “relative area”, is defined as the proportional area under the curve of the histogram below a predetermined threshold. It is not influenced by changes in the attenuation value of voxels that remain above the designated threshold. Different thresholds have been applied and validated by comparative studies using pathological standards 329–333 and physiology 331, 332, 334–337.
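The three indices can be computed from the attenuation values (HU) of the voxels inside a segmented lung. The sketch is a minimal illustration, assuming the lung has already been segmented; it uses the nearest-rank definition of the percentile point (software packages may interpolate differently) and a −950 HU threshold for the voxel index, a threshold often used in the literature but only one of the thresholds referred to above. The voxel values are hypothetical.

```python
import math


def mean_lung_density(hu):
    """MLD: mean attenuation (HU) of all lung voxels."""
    return sum(hu) / len(hu)


def percentile_point(hu, pct):
    """HU value at or below which pct% of lung voxels lie (nearest-rank method)."""
    ordered = sorted(hu)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def voxel_index(hu, threshold=-950):
    """VI / relative area: percentage of lung voxels below the HU threshold."""
    return 100 * sum(1 for v in hu if v < threshold) / len(hu)


lung = [-980, -960, -940, -900, -870, -850, -820, -800, -780, -700]  # hypothetical voxels
print(mean_lung_density(lung), percentile_point(lung, 10), voxel_index(lung))
# -860.0 -980 20.0
```

Only voxels below the threshold enter the numerator of the VI, which is why the VI is unaffected by attenuation changes in voxels that stay above it, whereas the MLD responds to every voxel.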

The percentage wall area and the ratio of wall thickness to whole diameter of the right upper lobe bronchus 316, 318 or all depicted bronchi of >2 mm in diameter 317 have been used to evaluate airway wall thickening. These measurements have been shown to be reliable in the assessment of the conductive airway remodelling that is characteristic of COPD, and in defining the relative contribution of emphysema and airway disease to physiological impairment. HRCT-pathological correlation has shown that CT measurements of airways with an internal perimeter ≥0.75 cm could be used to estimate the dimensions of the small peripheral conductive airways 338. Recent studies have attempted to address standardisation of the methodology of airway measurements by CT 339–342.
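For a single airway, both indices follow from the lumen and wall dimensions. The sketch assumes, for illustration only, a circular airway measured perpendicular to its long axis, with a hypothetical lumen radius and wall thickness; CT software measures the areas from segmented lumen and wall contours instead.

```python
import math


def airway_wall_indices(lumen_radius_mm, wall_thickness_mm):
    """Percentage wall area, wall thickness/outer diameter ratio and internal perimeter.

    Assumes a circular cross-section: lumen area Ai = pi*r^2,
    outer area Ao = pi*(r + T)^2 and WA% = 100 * (Ao - Ai) / Ao.
    """
    r, t = lumen_radius_mm, wall_thickness_mm
    lumen_area = math.pi * r ** 2
    outer_area = math.pi * (r + t) ** 2
    wa_percent = 100 * (outer_area - lumen_area) / outer_area
    t_to_d = t / (2 * (r + t))          # wall thickness over whole (outer) diameter
    internal_perimeter = 2 * math.pi * r
    return wa_percent, t_to_d, internal_perimeter


wa, ratio, pi_mm = airway_wall_indices(1.5, 1.0)
print(f"WA% {wa:.1f}, T/D {ratio:.2f}, Pi {pi_mm / 10:.2f} cm")
# WA% 64.0, T/D 0.20, Pi 0.94 cm
```

In this hypothetical airway the internal perimeter exceeds the 0.75-cm value mentioned above.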

Inherent errors in the evaluation of airways that are oriented obliquely to the axial plane may be overcome through the use of dedicated software that has been shown to be accurate in the three-dimensional computation of the central axis of the bronchi and in its two-dimensional lumen and wall contour segmentation 342.

Variability of quantitative CT indices

With respect to scanner performance, the density of the lungs is much lower than that of water or bone and falls in a range for which not all manufacturers have optimised their scanners. Some systems, for instance, have large and varying offsets at air density 343, 344. The lung, consisting of air-filled cavities of almost zero density within nearly water-equivalent tissue, is very heterogeneous and gives rise to the nonlinear partial volume effect, which may cause an underestimation of lung density 345.

Several physiological variables affect lung density. Lung volume is the main confounder and the physiological variable that is probably most pertinent to the reproducibility of densitometry in long-term studies. Gross lung density can vary by as much as 80–100 HU from full inspiration to end expiration 346, 347.

At this time, the methodology has not been standardised. A number of issues remain unresolved. First, there needs to be control of or correction for variability in inspiratory level. Incorporation of a respiratory gating device consisting of a spirometer and a microcomputer has been proposed as a means of controlling ventilatory volume during image acquisition 348. In an evaluation of this procedure, however, it was concluded that the repeatability of lung densitometry could not be improved by spirometric control 349. A similar technique using a pneumotachometer allows the patient to breathe between acquisitions and scanning is initiated on return to a pre-selected inspiratory level 350. An alternative approach is to standardise densitometric measurements for lung volume using a technique that acquires two volume scans at different lung volumes. The relationship between lung density and logarithmically transformed volume of air in the lungs measured from CT images is linear and thus lung density can be calculated for a specified lung volume using linear regression 328. This approach of volume correction has been shown to improve reproducibility 351, 352, but its use in long-term studies may mask some of the lung density loss secondary to emphysema-related hyperinflation. The routine use of these volume-control methods in densitometric studies remains contentious 350–355.
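The volume-correction approach can be sketched as a least-squares line of lung density against the natural logarithm of CT-measured lung air volume, evaluated at a chosen standard volume. This is a minimal illustration with hypothetical values (two scans, with density expressed as a single index such as MLD or a percentile point); it is not the software used in the cited studies.

```python
import math


def volume_adjusted_density(volumes_l, densities_hu, target_volume_l):
    """Predict lung density at target_volume_l from density = a + b*ln(volume).

    volumes_l: lung air volumes (L) from two or more CT acquisitions.
    densities_hu: the matching density index values (HU).
    """
    if len(volumes_l) < 2 or len(volumes_l) != len(densities_hu):
        raise ValueError("need at least two paired volume/density measurements")
    x = [math.log(v) for v in volumes_l]
    x_mean = sum(x) / len(x)
    y_mean = sum(densities_hu) / len(densities_hu)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, densities_hu))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * math.log(target_volume_l)


# Hypothetical paired scans: 5.0 L at -880 HU and 4.0 L at -850 HU.
print(round(volume_adjusted_density([5.0, 4.0], [-880.0, -850.0], 4.5), 1))  # -865.8
```

Because the target volume is held fixed, any density loss that is expressed as an increase in lung volume (hyperinflation) is removed by the correction, which is the masking effect noted above for long-term studies.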

Consistent measurements are most likely to be achieved at maximum inspiration 353 because variation in CT lung density is lowest at full inspiration 356; in cooperative patients, breath-holding at maximum inspiration is most reproducible 357. A study investigating the relationship between HRCT lung attenuation measurements (employing spirometric lung volume control), pulmonary dysfunction and dyspnoea severity in patients with COPD 358 has shown that pulmonary dysfunction in COPD cannot be assessed by a single modality of lung attenuation measurement. In particular, the inspiratory level at which spirometrically gated measurements of HRCT lung attenuation are acquired influences the relationship with physiological measurements and dyspnoea perception in COPD: inspiratory measurements assess the extent of emphysematous tissue loss, whereas expiratory measurements may reflect airflow limitation and lung hyperinflation with attendant dyspnoea perception 358.

Secondly, there are issues with the optimum image acquisition protocol. CT numbers are recognised to be unreliable 359 and are dependent on scanner type, model, object positioning within the scanner gantry and various physical factors (e.g. kilovoltage, current-time product, slice thickness and reconstruction algorithm) 343, 359–361. Furthermore, it is recognised that spatial uniformity of CT numbers over the entire subject area may only be achievable for certain combinations of parameters 362.

The optimum acquisition protocol remains contentious. Scanner settings for optimal visual resolution and density resolution 363 are mutually exclusive and, although a standardised densitometry protocol has been developed that gives comparable results in different CT scanners 361, 364, this is at the expense of visual interpretation.

Finally, delineation methods are perhaps the most important component of imaging software. Semi-automated image-processing programs, such as the “seeded-region growing” method 365, reduce interobserver variability compared with the manual outlining of complex structures 366. Software incorporating an internal calibration step may allow correction of errors in scanner calibration 344, 360, 367, and the generation of audit trails for regulation and electronic security ensures data integrity 368.
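The general principle of seeded region growing can be sketched as a breadth-first flood fill: starting from a seed pixel placed in the lung, neighbouring pixels are added while their attenuation stays within an inclusion window. This sketch works on a single 2D slice with 4-connectivity and a fixed window of −1000 to −500 HU (a hypothetical choice); it illustrates the technique in general and is not the algorithm of the cited method 365, which may use different inclusion criteria.

```python
from collections import deque


def region_grow(image, seed, lo, hi):
    """Return the set of (row, col) pixels 4-connected to seed with lo <= HU <= hi."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if not lo <= image[r0][c0] <= hi:
        return set()  # the seed itself lies outside the inclusion window
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and lo <= image[nr][nc] <= hi):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 4 x 4 slice (HU): two air-like areas separated by soft tissue.
slice_hu = [
    [-850, -870, 40, 30],
    [-860, -900, 35, -880],
    [20, 25, 30, -890],
    [10, 15, 20, 25],
]
print(sorted(region_grow(slice_hu, (0, 0), -1000, -500)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The air-like pixels on the right of the slice are excluded because they are not connected to the seed; this connectivity requirement is what distinguishes region growing from simple thresholding.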

There is no convincing evidence with regard to whether analysis of the whole lung is superior to single or limited slice assessment. Studies based on visual point-counting scores 369 suggest that adequate assessment cannot be obtained from one lung slice alone, although limited evidence from densitometric studies 328, 370 suggests that similar results may be obtained for the whole lung and single or limited slice analysis.

The optimum densitometric index for use in longitudinal trials remains contentious. The MLD is subject to noise 364 and is less sensitive to progression than other indices 371. The sensitivity of the VI is influenced by threshold 328, 344, whereas the percentile point is less dependent on the choice of percentile 328, 372. Other techniques have been described but are less well validated than the aforementioned methods 373–375.

The normal range of lung density has been determined in relation to age and height 327, 376, 377. This has been taken to support the view that density changes in disease can be detected independently of age or height 326. However, more recent studies have demonstrated age-related changes in both airspace dimensions 378–380 and CT lung density measurements. CT densitometry is reproducible in the short term 314, 343, 351–353, 381, 382 and probably also in the long term 344, 367.

During exacerbations of COPD, accurate and precise measurements of emphysema cannot be performed. Lung density will be altered by the influence of changes in airways resistance on gas trapping and by the presence of interstitial changes resulting from infection. In addition, increased dyspnoea will influence the ability to breath-hold. In the case of coexisting illnesses, lung density will be altered by respiratory (e.g. pneumonia, pulmonary embolism, asthma, interstitial lung disease) and nonrespiratory (e.g. cardiac failure and other causes of pulmonary oedema) comorbidities.

With regard to patient safety, CT examination does entail exposure to ionising radiation. Although CT has historically been considered a high-dose examination 383, modern scanners and protocols have allowed a great reduction in effective dose without loss of fidelity. Conventional HRCT (1.5-mm images at 10-mm intervals with 140 kVp and 175 mAs) delivers an effective dose of 0.98 mSv, which is ∼12 times that of a posteroanterior and lateral chest radiograph 384. Volumetric protocols entail several times this dose, but the use of low-dose multidetector imaging, employing tube currents as low as 8 mAs, reduces the dose to below the range recommended for studies of mild-to-moderate risk (1–10 mSv) 385.

The exposure to ionising radiation with CT scanning raises concerns about the risk/benefit ratio, but estimates of this ratio can only be approximate. The damaging effects of ionising radiation are assumed to follow a linear dose–effect relationship, and estimates of risk are largely extrapolated from data obtained from extremely high-dose exposure 386. The so-called “linear-no threshold” hypothesis 387 allows estimation of risk by extrapolation and rightly adopts a cautious stance towards safety. Nevertheless, this hypothesis has been questioned 388, and the explanation of risk may be more relevant and more meaningful to the lay person if expressed as multiples of the natural background radiation using the background-equivalent radiation time unit 389. The risk of exposing patients to ionising radiation by CT scanning in the absence of a clinical indication underlines the need to standardise the use of low-radiation dose protocols in the context of COPD clinical trials 390. Conversely, the lifetime mortality risk from cancer of an average 50–70-yr-old COPD patient after exposure to a low-dose chest CT is sufficiently low 390, 391 to justify the use of the technique in carefully controlled studies designed to: 1) define the in vivo lung morphological changes underlying airflow limitation; 2) identify different clinical phenotypes of the disease; and 3) understand its natural history more clearly and the possible effects of preventative and pharmacological intervention. The assessment of the risk/benefit balance of CT studies in the absence of routine clinical indication is complicated by the lack of consensus on whether the risk of low levels of exposure (<100 mSv) can be extrapolated from the complications arising from extreme levels of exposure, such as nuclear explosions or accidents.
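Expressing a dose as background-equivalent radiation time is a simple division of the effective dose by the natural background dose rate. The sketch assumes an annual natural background dose of 2.4 mSv (a commonly quoted worldwide average; local values differ and should be substituted) and uses the 0.98-mSv HRCT dose given above; the three-scan trial schedule is hypothetical.

```python
def bert_days(effective_dose_msv, annual_background_msv=2.4):
    """Background-equivalent radiation time, in days, for a given effective dose.

    annual_background_msv is an assumed natural background dose rate (mSv per year).
    """
    return effective_dose_msv / annual_background_msv * 365


HRCT_DOSE_MSV = 0.98   # conventional HRCT effective dose quoted in the text
scans_in_trial = 3     # hypothetical: baseline, mid-study and final scan

print(round(bert_days(HRCT_DOSE_MSV)))                   # ~149 days per scan
print(round(bert_days(HRCT_DOSE_MSV * scans_in_trial)))  # cumulative, ~447 days
```

Under these assumptions a conventional HRCT scan corresponds to roughly 5 months of natural background radiation, a figure that may be easier for a trial participant to interpret than an effective dose in mSv.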


Although CT densitometric evaluation allows the measurement of the progression of emphysema and the evaluation of airway wall thickening, a number of concerns have been raised regarding its use in COPD clinical trials. These mainly relate to the unresolved issues of the repeated exposure of patients to ionising radiation and the high costs involved in its frequent use. Another issue is that this methodology has not yet been fully validated. CT densitometric evaluation could be limited to a selected group of patients in whom the impact of treatment on airway remodelling and the progression of emphysema can be studied.



There is a growing realisation that COPD is a multiorgan-system disease. In particular, there is accumulating evidence that the skeletal muscles do not function normally, contributing to exercise intolerance. This is important because skeletal muscle dysfunction may well be a remediable source of exercise intolerance 392. The fat-free mass (FFM) is also a significant determinant of exercise capacity in patients with COPD. With the recognition that extra-pulmonary aspects of COPD are important, standardised markers of skeletal muscle function and lean body mass must be considered. Loss of skeletal muscle mass is the main cause of weight loss in COPD, whereas loss of fat mass contributes to a lesser extent 393.

Body weight and FFM

BMI is a measure of body weight corrected for body height (body weight×height⁻²; in kg·m⁻²), whereas the FFM index (FFMI) is a measure of FFM corrected for body height (FFM×height⁻²; in kg·m⁻²) 394.

These measures have been standardised. BMI is widely used in health and disease as a measure of body weight adjusted for height 266. FFM includes all compartments of the body except fat mass. It consists of muscle tissue, bone tissue and body fluids, and it can be easily measured using bioelectrical impedance analysis. More sophisticated methods are dual-energy X-ray absorptiometry or deuterium and bromide dilution 394–400. Statistical frequency distributions have been established for both measures. For BMI and FFMI in the general population and in the COPD patient population, the World Health Organization and COPD researchers have specified a number of ranges that represent certain nutritional states, as outlined in table 3 261, 266, 399, 401. Measurement of BMI is highly reproducible when the same equipment is used. FFMI measurement is also reproducible but depends on the method used 398, 402.
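Both indices, and a simple nutritional classification, can be computed in a few lines. Because table 3 is not reproduced here, the cut-offs in the sketch are assumptions: the WHO adult BMI categories (underweight <18.5, normal 18.5 to <25, overweight 25 to <30, obese ≥30 kg·m⁻²) and FFMI depletion thresholds of <16 kg·m⁻² in males and <15 kg·m⁻² in females, values commonly used in COPD studies. They should be checked against table 3 before use; the patient is hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index (kg/m^2)."""
    return weight_kg / height_m ** 2


def ffmi(ffm_kg, height_m):
    """Fat-free mass index (kg/m^2)."""
    return ffm_kg / height_m ** 2


def bmi_category(value):
    """WHO adult BMI categories (assumed cut-offs)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


def ffm_depleted(ffmi_value, sex):
    """FFM depletion using assumed cut-offs of 16 (male) and 15 (female) kg/m^2."""
    return ffmi_value < (16.0 if sex == "male" else 15.0)


# Hypothetical male patient: 60 kg, 1.75 m, FFM 46 kg (e.g. from bioelectrical impedance).
b, f = bmi(60, 1.75), ffmi(46, 1.75)
print(f"BMI {b:.1f} ({bmi_category(b)}), FFMI {f:.1f}, depleted: {ffm_depleted(f, 'male')}")
# BMI 19.6 (normal), FFMI 15.0, depleted: True
```

The example illustrates why FFMI can add information: under these cut-offs, a patient with a BMI in the normal range may still be classified as FFM depleted.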

In the general population, these measures have been found to influence health outcome. Decreased BMI and FFMI are associated with impaired physical functioning. Obesity correlates with increased cardiovascular comorbidity and diabetes mellitus, but the shape of the association is not linear 403. In COPD patients, a decreased BMI and especially a decreased FFMI are associated with impaired muscle function, exercise capacity and health status and with decreased survival 261, 266, 397, 404, 405. BMI can be increased by diet or nutritional supplementation 244, 406–408, whereas FFMI can be increased by training (rehabilitation) 409 and/or anabolic therapies, such as anabolic steroids 410. During exacerbations of COPD, BMI and FFMI can also be measured, in particular using bioelectrical impedance analysis 411. However, fluid shifts can occur during exacerbations and may affect FFM measurement. FFM depletion and involuntary weight loss are associated with an enhanced systemic inflammatory response 412–414. Osteoporosis is commonly associated with FFM depletion 415.

The variables BMI and FFMI can easily be used in multicentre trials, with the restriction that the same equipment should be used throughout the centres. Measurements of these variables require a height rod, a weighing beam scale and bioelectrical impedance equipment. The process takes only 5 min and involves little training or cost.

Measurement of quadriceps muscle function

Quadriceps muscle weakness has been observed in patients with COPD 416, 417 and has been related to exercise intolerance 418, utilisation of healthcare resources 419 and survival in patients with moderate-to-severe COPD 420. Therefore, treatment of quadriceps muscle weakness is very important in the management of patients with moderate-to-severe COPD 421–423.

Many different techniques have been used to measure quadriceps muscle weakness in clinical trials. Volitional and nonvolitional quadriceps muscle force can be measured.

Isometric assessments involve patients sitting with their hip in 90° flexion with a lever connected to the lower leg by a strap (with two fingers above the lateral malleolus). The lever has to be moved towards the knee angle of interest (e.g. 60° knee flexion) and the patient is then asked to extend the leg with as much force as possible for 4–6 s.

In COPD patients, this technique has not been standardised. Only a few studies, with small samples of patients, have examined its short-term reproducibility, and they suggest that it is reproducible 424. In terms of the influence of quadriceps muscle function on the health outcome of COPD patients, it is known that patients with moderate-to-severe COPD who had been admitted to hospital at least twice in the year prior to the study had significantly lower isometric quadriceps muscle force than those without hospital admission 419. However, the actual cost of healthcare resource utilisation was not studied. Quadriceps muscle function appears to be a sensitive measure of treatment effect attributable to specific lower limb training, and multiple international clinical trials, including patients with COPD, have used a computerised dynamometer to assess isometric quadriceps muscle force before and after an exercise training programme 421. Limited data are available with regard to muscle function testing in pharmacological intervention studies. Improvements in muscle bulk and strength have been reported after hormonal replacement therapy in elderly males and COPD patients 425, 426.

In the case of isokinetic assessments, COPD patients sit with their hip in 90° flexion and a lever is connected to the lower leg by a strap (with two fingers above the lateral malleolus). The lever has to be moved from 90° to 0° knee flexion with as much force as possible; gravity will bring the leg back into the starting position. This can be repeated 15–30 times to assess quadriceps muscle endurance 417.

Femoral nerve magnetic stimulation requires that patients lie on a specially modified couch with their knees bent at 90°. The ankle on their dominant side is placed in an inextensible strap connected to a strain gauge. The signal is amplified and passed to a personal computer running the software program LabVIEW (Instron Deutschland GmbH, Darmstadt, Germany). Stimulation of the femoral nerve is carried out using a custom-made double 70-mm branding iron design coil connected via a Y-connector to two Magstim 200 Mono Pulse electromagnets (Magstim Co. Ltd, Whitland, UK). The output of two magnets combined in this fashion is equivalent to 120% of the output of a single unit. Stimulations are delivered spaced ≥20 s apart to avoid twitch-on-twitch potentiation.


The observed relationships between weight loss, muscle wasting and muscle weakness 394, 427, which are independent of FEV1, as well as the known close relationship between respiratory muscle weakness and dyspnoea, indicate the importance of adequately recording and reporting all patient characteristics in sufficient detail during pharmacological trials. This enables investigators to better understand the physical status of their patients, particularly the state of their peripheral muscles in the context of COPD. Unfortunately, BMI and FFMI are the only standardised measures available. Measurements of quadriceps muscle function are not yet standard, mainly due to the expense and need for a trained operator.



With the increased interest in COPD and increasing number of reports of clinical trials, physicians are faced with the almost daily challenge of evaluating published reports of therapies for this disorder. Evaluating the clinical significance of such studies requires a firm grasp of statistics. The concept of minimal clinically important difference (MCID) in outcomes of therapy for patients with COPD has been proposed as a tool to assist clinicians and researchers in understanding the results of clinical trials. The minimal important difference (MID) has been defined by a group of clinical epidemiologists at McMaster University as “the smallest difference in score in the outcome of interest that informed patients or informed proxies perceive as important, either beneficial or harmful, and which would lead the patient or clinician to consider a change in management.” The description of the MID precludes making MID estimates for outcomes that are remote from those important, in themselves, to patients, such as spirometry or laboratory exercise capacity. Further, the definition suggests that only if there were reasons to question the reliability or accuracy of data from patients could proxies be relied on to provide estimates of the MID 428–430. The MID should optimally be determined in a population of subjects similar to that in which the MID is to be applied. Thus, the MID for various measures to be used in outcome studies of COPD should be determined from populations with COPD. The severity of the population and homogeneity of the population in which MID is estimated are also important factors to consider.

There are three basic methods for estimating the MID: 1) statistical or distribution-based methods, which focus on the variance and distributional properties of scores in an untreated population of patients with the disease of interest; 2) panel-based estimates from healthcare professionals and patients; and 3) external or anchor-based methods, which compare changes in the outcome of interest with other clinically important outcomes. Statistical methods for estimating the MID include half the sd, the se of measurement (the product of the sd and the square root of one minus the reliability of the measure), the effect size (the average change divided by the baseline sd) and the standardised response mean (the average change divided by the sd of that change) 431–433. The original method to determine the MID relied on anchor-based approaches, while panel-based methods are infrequently used. Leidy and Wyrwich 434 have recently suggested “triangulation methodology” to describe the need to consider all three estimation methods, along with expert input, to identify a final MID. In addition, the uncertainty around the point estimate for the MID should be considered and reported.
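The four distribution-based estimators defined above can be written out directly from their definitions. The following is a minimal Python sketch, assuming that summary statistics (baseline sd, reliability coefficient) or individual change scores are available; the function names are hypothetical and are not taken from any of the cited analyses. For example, the TDI sd of 2.4 reported in one tiotropium study gives a half sd estimate of 1.2 units.

```python
import math
import statistics


def half_sd_mid(baseline_sd: float) -> float:
    """Half the baseline standard deviation."""
    return 0.5 * baseline_sd


def sem_mid(baseline_sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return baseline_sd * math.sqrt(1.0 - reliability)


def effect_size(mean_change: float, baseline_sd: float) -> float:
    """Average change divided by the baseline sd."""
    return mean_change / baseline_sd


def standardised_response_mean(changes: list[float]) -> float:
    """Average change divided by the sd of the individual changes."""
    return statistics.mean(changes) / statistics.stdev(changes)


if __name__ == "__main__":
    # TDI sd of 2.4 from one tiotropium study, as quoted in the text.
    print(half_sd_mid(2.4))  # 1.2
    # Hypothetical values: sd 10 units, reliability 0.91 -> se of measurement of about 3 units.
    print(sem_mid(10.0, 0.91))
```

Note that the se of measurement shrinks towards zero as reliability approaches 1, which is why different statistical approaches applied to the same data can give very different MID estimates.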

MID estimates for some outcome measures in studies of COPD are summarised in table 4.

Health status and HRQoL

St George's Respiratory Questionnaire

Professional opinion-based MID

Physicians who were experts in respiratory disease were asked to judge the magnitude of differences in exercise capacity, shortness of breath, wheeze, cough and depression, variables of importance in patients with COPD 444. The magnitude of change that clinicians felt to be clinically significant was then used to estimate the MID for the SGRQ 445. The resulting 3.9-unit MID estimate was similar for the SGRQ total and impact scores 444. In a clinical therapeutic trial, a 4.2-unit SGRQ change was associated with a minimum clinician rating of improvement 435.

Patient opinion-based MID

In a 16-week controlled investigation of salmeterol in COPD 446, subjects were asked to rate the magnitude of the treatment effect. The smallest rated treatment improvement corresponded to an SGRQ change of ∼4 units.

External measure-based MID

In a study of the effects of pulmonary rehabilitation, the SGRQ was compared with another disease-specific health status measure, the CRQ 447. Using the MID for the CRQ, it was estimated that the MID for SGRQ total score (95% CI) was 3.05 (0.39–5.71) units. The SGRQ has also been related to shortness of breath. A change in one grade of the MRC dyspnoea scale from 5 (dyspnoea leading to inability to leave the house) to 4 (significant dyspnoea but able to leave the house) correlated with an SGRQ change of 3.9 units 144.

Suggested MID

An MID (range) of ∼4 (2.4–5.6) units in the SGRQ is supported by published studies.

Chronic Respiratory Questionnaire

The reliability of the CRQ has been studied: test–retest intra-class correlation coefficients of 0.73–0.95 and internal consistency reliability (Cronbach's alpha) of 0.53–0.90 have been reported 117, 165, 448–451.

Statistical MID estimate

In 471 outpatients with COPD, Wyrwich et al. 118 found the se MID estimate for CRQ to be 0.5 units. Others have reported MID estimates using the se approach ranging 0.37–0.62 units 428.

Effect sizes have also been calculated from a study of 51 patients undergoing pulmonary rehabilitation. An effect size of 0.5 corresponded to changes of 0.61 units in the CRQ dyspnoea score, 0.67 units in the fatigue score, 0.60 units in the emotional function score and 0.60 units in the mastery score 116, 447.

Panel-based MID

Wyrwich et al. 452 used a panel of nine expert general physicians and specialists to estimate the MID. The panel's estimates supported an MID of ∼0.5 units.

Anchor-based MID

In a longitudinal study of patients with COPD 453, patients rated their degree of change at subsequent visits on a global rating scale, and these ratings were compared with changes in the CRQ. Within-patient global ratings suggested mean CRQ domain MIDs of 0.43–0.64, with ranges of 0.28–0.80.
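The within-patient anchor-based approach can be sketched as follows: the MID is taken as the mean change in score among patients who rate themselves as having changed by a small amount on the global rating scale. The rating label "a little better", the data and the normal-approximation CI are illustrative assumptions; the cited study's rating categories and statistical analysis may differ. Returning a CI follows the earlier point that the uncertainty around the MID point estimate should be reported.

```python
import math
import statistics
from statistics import NormalDist


def anchor_based_mid(changes, ratings, anchor="a little better", confidence=0.95):
    """Mean change score among patients whose global rating equals `anchor`,
    with a normal-approximation confidence interval (hypothetical helper)."""
    selected = [c for c, r in zip(changes, ratings) if r == anchor]
    if len(selected) < 2:
        raise ValueError("at least two patients are needed in the anchor group")
    mean = statistics.mean(selected)
    se = statistics.stdev(selected) / math.sqrt(len(selected))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, (mean - z * se, mean + z * se)


# Hypothetical CRQ domain changes (units per item) and patient global ratings.
changes = [0.4, 0.6, 0.5, 1.5, 0.0, 0.3, 0.7]
ratings = ["a little better", "a little better", "a little better",
           "much better", "no change", "a little better", "a little better"]
mid, (lower, upper) = anchor_based_mid(changes, ratings)
print(f"MID {mid:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Patients reporting "no change" or "much better" are excluded, so the estimate reflects only the smallest change that patients themselves recognise.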

In a study of between-patient global ratings 454, subjects discussed and compared their problems on the CRQ with those of other patients. The MCID in CRQ domains ranged from 0.09 to 0.87, with a pooled 95% CI of 0.32–0.53.

Suggested MID

An MID for the CRQ in the range of ∼0.5 units is supported by numerous published investigations 428.


Transitional dyspnoea index

The BDI and TDI (the TDI measuring change in response to an intervention) have been widely used in clinical studies of COPD to measure shortness of breath. The instruments are interviewer-administered, and good reliability (r = 0.75) between different interviewers has been demonstrated in a study of 25 patients with COPD 148.

Statistical MID estimate

TDI cannot be analysed in an untreated population since the instrument is only used in response to an intervention. Some of the largest changes in TDI in response to a pharmacological therapeutic intervention in COPD have been shown with tiotropium. An sd of 2.4 in one study of tiotropium provides a statistical MID estimate of 1.2 units 149.

Professional opinion-based MID

The developer of the TDI indicated that expert physicians suggest an MID of 1 unit 436.

External measure-based MID

The Physician's Global Evaluation (PGE) has been used as an external measure to compare with changes in TDI in a total of 921 subjects with COPD in therapeutic studies of tiotropium. Witek and Mahler 149, 455 found that a 1-unit change in TDI corresponded to minimal change in the PGE. Subjects who had a >1-unit change in TDI had better health status, as assessed by the SGRQ, fewer COPD exacerbations, and used less rescue short-acting β-agonists.

Suggested MID

There is sufficient evidence to suggest that the MID for the TDI is 1 unit.


6-min walking test

Of the various measures of activity performance, the 6MWT and the incremental shuttle walk test have been studied most extensively in patients with COPD. More data on which to base an MID are available for the 6MWT than for the shuttle walk test. The reproducibility of the 6MWT is very good, with a reported coefficient of variation of ∼8% 215, 232, 456, 457. However, when tests are repeated there appears to be a definite improvement, which may be due to a learning effect 230. Additionally, factors such as encouragement and course layout have been shown to affect the results 230. The ATS has published standards for the 6MWT, and rigorous application of the standardised technique may help investigators reduce the variability of the test in clinical studies 458. The 6MWT distance also correlates with pulmonary function (typical correlation coefficient of 0.5–0.6), with dyspnoea and, to a lesser degree, with HRQoL 458–461.

Statistical MID estimate

The National Emphysema Treatment Trial (NETT) reported 6MWT results in a very large number of subjects (n = 761) with severe pulmonary dysfunction and CT-documented emphysema 230. Wise and Brown 439 used data from the 470 subjects who performed repeated walking tests, the second test being on average 23 m longer, and calculated an intra-class correlation of 0.88 and a reliability coefficient of 0.63. Using this approach, the MID was estimated as 80 m. Using the 6MWT sd of 90 m in all 761 subjects in the NETT, the half sd estimate of the MID was 47 m.

Patient opinion-based MID

Redelmeier et al. 462 studied 112 patients with severe COPD, 50% of whom were female. Patients were asked to rate their change in 6MWT over a period of months. Patient perception correlated poorly with actual walk distance, which was attributed to poor memory of prior walk distance. Patients were also asked to compare themselves with other patients, using global ratings of change to indicate the magnitude of the difference they perceived. The smallest difference in walk distance (95% CI) that patients were able to perceive was 54 (37–71) m.

Suggested MID

The ATS guideline on 6MWT 458, the report of Redelmeier et al. 462, the information from the NETT 230, and a recent summary by Wise and Brown 439 support a 6MWD MID range of 54–80 m.

Constant work rate tests at submaximal exercise intensity

Constant work rate tests performed at a submaximal exercise intensity are increasingly being used as an outcome measure in studies of COPD, largely because they can also indicate the mechanisms underlying improved exercise performance. However, only a few recent studies have used this measure 49, 225, 463. The outcome of a constant work rate test at submaximal intensity is the duration of exercise (min) that patients can sustain. As recently emphasised by Casaburi 464, methodological issues may limit the interpretation and utility of this measure. The choice of initial exercise intensity is a major determinant of the increase in exercise duration seen on repeated testing: if the baseline exercise intensity chosen is too low, subjects may be able to exercise almost indefinitely following a therapeutic intervention.

Statistical MID estimate

O'Donnell et al. 49 performed submaximal exercise tests on 187 subjects with severe COPD and hyperinflation. Mean FEV1 was 41% pred, TLC was 119% pred and RV was 198% pred. Submaximal exercise was performed at 75% of maximal work assessed on incremental cycle ergometry. Mean±sd baseline exercise duration prior to treatment was 492±290 s. A half sd MID estimate based on baseline exercise capacity is 145 s (2.4 min). Improvement following tiotropium was 105 s greater than with placebo for a modest effect size of 0.36 (mean change divided by baseline sd).

Oga et al. 213 evaluated three types of exercise test (submaximal exercise test, 6MWT and incremental cycle ergometry) in response to oxitropium in patients with COPD. The endurance exercise test was performed at 80% of the maximal work achieved during an incremental cycle test. In 42 consecutive male COPD patients with a mean FEV1 of 42% pred, mean±sd baseline submaximal exercise endurance time was 189±92 s, less than half that seen in the study by O'Donnell et al. 49. Thus, the half sd MID estimate based on this study is 46 s. The mean change following oxitropium was 34 s, giving an effect size of 0.37.
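The half sd and effect size figures quoted for the two endurance studies follow directly from the reported baseline sds and mean treatment effects. The short Python sketch below simply reproduces them (rounded as in the text) and makes explicit that the half sd estimate depends entirely on the spread of baseline endurance time in each cohort; the dictionary layout and function name are illustrative only.

```python
# Baseline endurance-time sd and mean treatment effect (s), as reported in the text.
STUDIES = {
    "O'Donnell et al. (tiotropium)": {"baseline_sd": 290.0, "mean_change": 105.0},
    "Oga et al. (oxitropium)": {"baseline_sd": 92.0, "mean_change": 34.0},
}


def summarise(baseline_sd: float, mean_change: float) -> dict:
    """Half sd MID (s and min) and effect size (mean change / baseline sd)."""
    half_sd_s = 0.5 * baseline_sd
    return {
        "half_sd_s": half_sd_s,
        "half_sd_min": half_sd_s / 60.0,
        "effect_size": mean_change / baseline_sd,
    }


for name, stats in STUDIES.items():
    r = summarise(**stats)
    print(f"{name}: half sd MID {r['half_sd_s']:.0f} s "
          f"({r['half_sd_min']:.2f} min), effect size {r['effect_size']:.2f}")
```

Both studies give similar, modest effect sizes (∼0.36–0.37) even though their half sd estimates differ roughly three-fold.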

External measure-based MID

There has not been a rigorous evaluation of external measure-based MID estimates for submaximal exercise tests, and investigations of exercise performance have not simultaneously evaluated other outcomes. For example, the study by O'Donnell et al. 49 was an RCT in which the average improvement following tiotropium compared with placebo after 42 days of treatment was 105 s (1.75 min). This investigation did not assess other outcomes, but in other studies of bronchodilators, quality-of-life outcomes have reached the threshold for the MCID. In addition, some studies of oxygen therapy and exercise training have shown improved submaximal exercise performance, while other studies suggest that these same interventions improve HRQoL.

Suggested MID

From this discussion, the MID range for submaximal exercise endurance time on a cycle ergometer may be 46–105 s (0.77–1.75 min). In a recent review, Casaburi 464 suggested an MID of 105 s (1.75 min). Further investigation of the MID for submaximal exercise is clearly warranted.

Maximal exercise test

Maximal cardiopulmonary exercise tests have infrequently been used as outcome measures in clinical studies of patients with COPD. However, this laboratory test is familiar to many pulmonary physicians and has the advantage of assessing the physiological limitations to exercise, as well as providing an objective measure of exercise performance. Test–retest reliability has been reported by Cox et al. 465 as good, with a reliability coefficient of 0.96.

Statistical MID estimate

In NETT, maximal exercise performance was used as a primary outcome measure 218. The half sd approach applied to NETT yields an MID of 10.5–11.1 W, whereas the se approach yields an MID of only 0.9 W 438.

Professional opinion-based MID

Prior to dissemination of the results of NETT, the investigators were asked their opinion of an MID in the context of lung volume reduction surgery. A value of 10 W was chosen by the investigators as the MID.

External measure-based MID

In NETT, the mean change in exercise capacity at 2 yrs in patients receiving lung volume reduction surgery compared with patients receiving medical therapy was 10.9 W. In the cohort not at risk of short-term mortality, there was also a significant improvement in HRQoL, as measured by the SGRQ 218.

Suggested MID

In the context of lung volume reduction surgery, 10 W may be the MID for maximal exercise workload. However, an MID for other interventions has not been established.

Lung function and FEV1

The universally used measure of lung function in clinical studies of COPD is FEV1. However, this measure, now recognised as only one of the key components needed to fully characterise patients with COPD, has statistically significant but weak correlations with other patient-centred outcomes, such as dyspnoea 436. Therefore, in more recent studies of patients with COPD, FEV1 is only one, imperfect, method of assessing outcomes that are important to patients. Despite its widespread use and the large number of clinical investigations that have simultaneously measured FEV1 and other patient-centred outcomes, relatively little effort has been made to determine an MID for FEV1. Furthermore, the definition of the MID calls into question whether pulmonary function measures are important to patients or lead to changes in management. Nevertheless, as recently reviewed by Donohue 466, an MID for FEV1 of 100 mL can be suggested, although a more rigorously defined MID is needed. A single MID estimate may be difficult to establish for several reasons. First, baseline disease severity, as assessed by FEV1, is likely to be important: the absolute change in FEV1 after bronchodilator therapy is smaller in patients with lower baseline lung function. Secondly, some patients have a greater response to short-acting β-agonists, which may be associated with a widely variable response to other therapeutic agents. Thirdly, FEV1 has been used as an outcome at different time-points after therapy: as a short-term outcome measuring peak response over minutes to hours, as a longer-term outcome over hours or days, as a “trough” response prior to the next treatment dose, and as a very long-term outcome over years. Fourthly, a recent study 467 indicated greater variation in spirometry performed in clinical practice settings than in pulmonary function laboratories.

Statistical MID estimate

A minimum of three acceptable spirometry manoeuvres must be performed for a test session to be acceptable. The recent ATS/ERS spirometry standards note that, at a single evaluation, the two largest FEV1 values must be within 150 mL of each other 7. This standard is supported by the study of Enright et al. 468, which showed that 90% of 18,000 patients were able to reproduce FEV1 within 120 mL during a single test session. However, patients with moderate-to-severe pulmonary dysfunction demonstrated higher variability in terms of FEV1 % pred but less variability when assessed as absolute volume (42–58 mL). In the Lung Health Study 469, spirometry performed 17 days apart showed an average absolute difference between the two tests of 110–123 mL. This variability of FEV1, within a single test session and over a short period, suggests that the minimal detectable FEV1 difference over time in response to an intervention might be ≥110–150 mL in patients with less severe disease, but may be lower in patients with more severe pulmonary dysfunction.

Further information can be obtained by applying the statistical approach to estimating the MID to studies of large populations. The largest study of the short-term reproducibility of spirometry was reported from the Lung Health Study 469, in which spirometry was repeated 17 days apart in 5,885 subjects with a mean post-bronchodilator FEV1 of 2.75 L (78% pred). In this population, with only mild COPD, the coefficient of variation of FEV1 was 4.1–4.9%. The sd of FEV1 values in this untreated population at entry was 620 mL. Using the half sd approach, the MID estimate for FEV1 from the Lung Health Study 469 was 310 mL.

However, MID estimates based on patients with near-normal lung function may not be appropriate for patients with more severe disease. For example, in a recently reported clinical trial of N-acetylcysteine in 526 COPD patients 187, mean FEV1 was 1.65 L (57% pred) with an sd of 380 mL. In another study more typical of clinical trials in COPD 470, 71 subjects in a crossover trial comparing tiotropium and formoterol had a mean FEV1 of 1.94 L (37% pred) with an sd of 290 mL. In a study of the effects of tiotropium on exacerbations 37, pre-treatment FEV1 was 1.60 L (35.6% pred) with an sd of 400 mL. Applying the half sd approach to these clinical therapeutic trials of patients with symptomatic COPD gives a statistical MID estimate of 145–200 mL.
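The contrast between the Lung Health Study estimate and the estimates from trials in symptomatic COPD can be reproduced from the baseline sds quoted in the two preceding paragraphs. This Python sketch only restates the reported summary values; it does not re-analyse any trial data, and the cohort labels are shorthand introduced here.

```python
# Baseline FEV1 sd values (mL) quoted in the text.
MILD_COPD_SD_ML = {"Lung Health Study": 620}
SYMPTOMATIC_COPD_SD_ML = {
    "N-acetylcysteine trial": 380,
    "Tiotropium vs formoterol crossover": 290,
    "Tiotropium exacerbation study": 400,
}


def half_sd(sd_ml: float) -> float:
    """Half the baseline sd, the distribution-based MID used in the text."""
    return sd_ml / 2


mild = {name: half_sd(sd) for name, sd in MILD_COPD_SD_ML.items()}
symptomatic = [half_sd(sd) for sd in SYMPTOMATIC_COPD_SD_ML.values()]
print(mild)
print(f"Symptomatic COPD trials: {min(symptomatic):.0f}-{max(symptomatic):.0f} mL")
```

Because the half sd estimate tracks the spread of baseline FEV1, the more homogeneous, more severe trial populations yield smaller estimates than the broad, mild Lung Health Study cohort.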

Professional opinion-based MID

There are different perspectives on the change in FEV1 felt to be “significant” by professional organisations and clinical guidelines. The ATS/ERS and GOLD suggest that a “significant” response during one test session is a change of >12% or 200 mL, whichever is greater 2. However, this stipulation may simply indicate an improvement that is outside the range observed in normal individuals in response to a short-acting bronchodilator. The ERS has previously suggested that a change of 9% pred (∼250–300 mL) is a significant response to short-acting bronchodilators 9.
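The “greater of 12% or 200 mL” rule quoted above is equivalent to requiring both conditions, since a change that exceeds the larger threshold necessarily exceeds the smaller one. A minimal Python sketch, assuming FEV1 values in litres and a strict inequality as written in the text; the function name is hypothetical.

```python
def significant_bronchodilator_response(pre_fev1_l: float, post_fev1_l: float) -> bool:
    """True if the FEV1 rise exceeds the greater of 12% of baseline and 200 mL."""
    change_l = post_fev1_l - pre_fev1_l
    threshold_l = max(0.12 * pre_fev1_l, 0.200)
    return change_l > threshold_l


# Hypothetical pre/post values (L). The 200 mL floor governs when baseline FEV1 is
# below ~1.67 L (0.200 / 0.12); above that, the 12% criterion governs.
print(significant_bronchodilator_response(1.00, 1.25))  # True: +250 mL > 200 mL
print(significant_bronchodilator_response(2.50, 2.75))  # False: +250 mL < 300 mL (12%)
```

For many patients in the trials described above, whose baseline FEV1 is below ∼1.67 L, the fixed 200 mL floor is therefore the binding criterion.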

Patient opinion-based MID

One cross-sectional survey 471 asked 120 patients with COPD to compare their shortness of breath with that of other subjects enrolled in a pulmonary rehabilitation programme. FEV1 correlated weakly with self-reported dyspnoea (r = 0.29). An FEV1 difference of 4% (or 112 mL) was associated with patients rating their dyspnoea as either slightly better or slightly worse than that of other patients.

Anchor-based MID

Although not rigorously evaluated to determine an MID, general results of published investigations in which FEV1 was assessed simultaneously along with other outcomes can provide insight into an FEV1 MID in COPD. Further studies are necessary to refine the FEV1 MID. Data from previously published clinical trials in COPD are likely to provide improved estimates of MID.

One study evaluated the relationship between change in FEV1 and the clinical outcomes of acute exacerbations of COPD: in the study by Niewoehner et al. 37, FEV1 change was associated with clinical response to treatment, and an FEV1 improvement of <100 mL was associated with a higher relapse rate.

Other studies have shown improvement in FEV1 and effects on exacerbations, but these investigations have not been rigorously analysed to determine a precise MID. In an emergency department study of 147 patients, oral corticosteroid therapy was associated with a mean 140 mL improvement in FEV1 compared with patients treated with placebo. This change in FEV1 was associated with fewer relapses.

Suggested MID for FEV1

Table 5 summarises estimates for FEV1 MID using different approaches. As outlined by Donohue 466, further research including additional expert opinion is needed. Because these approaches result in different estimates and there has not been extensive literature addressing the MID for FEV1, an appropriate range of values for the MID for FEV1 might be 100–140 mL.

Other pulmonary function measures

Because of the cost and complexity of measuring lung volumes, diffusing capacity and arterial blood gases, these measures have not been widely employed in research studies of therapeutic outcomes in COPD. Therefore, there is limited information regarding the MID for pulmonary function measures other than FEV1. However, there has recently been increased interest in the assessment of hyperinflation and the associated measurement of static and dynamic lung volumes in response to bronchodilators and lung volume reduction surgery 49, 58, 224. Some of these investigations have concurrently measured lung volumes, exercise capacity and quality of life, and could be further analysed to develop external measure-based estimates of the MID, particularly for selected lung volume measures. Oxygenation is also a potentially important outcome. Substantial improvements in oxygenation may reduce or eliminate the need for supplemental oxygen therapy, which is critically important to patients. In addition, a reduced oxygen requirement may lower healthcare costs, a factor of importance to payers of healthcare and to society.


The MID for outcome measures is a promising tool to assist clinicians and investigators in interpreting therapeutic trials. However, the MID for some outcome measures, such as pulmonary function, should be more rigorously evaluated according to standard methods before it can be universally applied.



A biomarker refers to the measurement of any molecule or material (e.g. cells, tissue) that reflects the disease process. In COPD, several types of biomarker related to disease pathophysiology and to the inflammatory and destructive processes in the lung have been measured. Pulmonary biomarkers have been measured in bronchial biopsies, bronchoalveolar lavage (BAL), sputum and exhaled breath. Plasma biomarkers are discussed in the Nonpulmonary markers section. A review of >600 published studies suggests that few of these biomarkers have been validated and that there is little information about their reproducibility or their relationship to disease development, severity or progression 177, 472. In evaluating pulmonary biomarkers, it is important to compare findings in patients with COPD with those in cigarette smokers matched for exposure who do not have significant airflow limitation (normal smokers) and in age-matched nonsmoking normal subjects. Such matched comparisons are rarely made, which makes abnormal findings difficult to interpret. The advantages and disadvantages of various pulmonary biomarkers in COPD have recently been reviewed 473.

Bronchial biopsies

Although the inflammation in COPD predominantly involves the lung parenchyma and small airways, bronchial biopsies appear to reflect the cellular abnormalities seen in the peripheral lung 474, 475. Bronchial biopsies have been useful for documenting the structural changes, cellular patterns and expression of inflammatory proteins in patients with COPD. In stable COPD there is increased infiltration of macrophages and activated T-lymphocytes, particularly CD8+ T-lymphocytes 474, 476, which express interferon-γ, inducible protein-10 and interleukin (IL)-9 477, 478. Moreover, these lymphocytes express chemokine receptors associated with a type-1 response, such as CXC chemokine receptor 3, in contrast to lymphocytes in asthma, which express chemokine receptors typical of a type-2 response, such as CC chemokine receptor 4 479. Although prominent neutrophilia is present in the airway lumen of patients with stable COPD, it is not observed at the tissue level, except in patients with severe airflow limitation 480. Finally, during exacerbations, increased recruitment of eosinophils and neutrophils has been described, associated with upregulation of specific chemoattractants, such as regulated on activation, normal T-cell expressed and secreted and CXC chemokine ligand 5 481–483.

Several studies have assessed the potential anti-inflammatory effects of treatments in bronchial biopsies of patients with COPD. These studies usually involve either a baseline biopsy and a second biopsy after a defined period of treatment, or a single biopsy at the end of active treatment compared with a biopsy from a parallel group of patients taking placebo. Overall, inhaled corticosteroids seem to have little effect on the airway inflammation typical of COPD, although they can reduce mast cell numbers, an effect associated with a reduction in the number of exacerbations 484, 485. More encouraging results have been obtained after treatment with either a phosphodiesterase-4 inhibitor or a combination of corticosteroids and bronchodilators 486, 487. However, further studies are required to establish whether the airway inflammation in COPD can be successfully eradicated and whether this would result in a significant clinical improvement.


The main advantage of endobronchial biopsies is that they directly sample airway tissue, maintaining the spatial relationships of structural components that may be important to functional changes 488. Unlike sputum and BAL, endobronchial biopsies can provide an assessment of structural components of the airway wall, such as epithelium, basement membrane, vessels, connective tissue deposition and, sometimes, smooth muscle and submucosal glands. Therefore, biomarkers of structural damage, such as apoptosis or uncontrolled proliferation, can be measured. Moreover, the different inflammatory cell subtypes can be identified by immunostaining in their microenvironment, allowing investigation of the interaction between inflammatory and resident cells. Finally, individual structural components can be dissected from the biopsies and studied in isolation, using recently developed techniques such as laser microdissection 489.


There are, however, several limitations to bronchial biopsies as an outcome measure in COPD. Since this is an invasive procedure, it may be difficult to recruit patients, especially for studies investigating treatment effects, which require two biopsies (pre- and post-treatment). Biopsies of the proximal airways may not closely reflect all the pathological changes present in the peripheral airways and lung parenchyma, which are the sites responsible for airflow limitation in COPD. Moreover, it may not be possible to apply this procedure to patients with more severe disease, which is often complicated by cardiac comorbid conditions and associated with significant oxygen desaturation and hypercapnia 490. There is also relatively high variability in baseline measurements of inflammatory cells, which would require multiple biopsies. Finally, because studies evaluating the effect of treatment should be designed to provide a power of ≥80%, this variability means that a large number of patients is usually required in each treatment group.
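Why high between-subject variability translates into large treatment groups can be illustrated with the standard normal-approximation formula for comparing two means with equal group sizes, n per group = 2(z1−α/2 + z1−β)²σ²/Δ². The Python sketch below is a generic illustration with hypothetical values, not a calculation from any cited biopsy study; a t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(sd: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect a mean difference `delta` with a two-sided test
    (normal approximation, equal allocation, common sd)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical cell-count variability: detecting a difference of half an sd
# needs 63 patients per arm at 80% power; doubling the sd roughly quadruples this.
print(n_per_group(sd=1.0, delta=0.5))
print(n_per_group(sd=2.0, delta=0.5))
```

Since n scales with σ², reducing measurement variability (e.g. by sampling more biopsies per patient) is often as effective as enrolling more patients.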

Bronchoalveolar lavage

BAL has the advantage, unlike bronchial biopsies, of sampling inflammation in the lung periphery. BAL can generally be performed safely 490, provided that patients are carefully assessed and guidelines are followed. In general, fluid recovery is greater in patients with less extensive emphysema, as assessed by diffusing capacity 491. BAL may be performed in the same patients as bronchial biopsy, thus providing additional and complementary information.

Cellular composition

In individuals with COPD, BAL cells are predominantly (>80%) alveolar macrophages, with some neutrophils and T-lymphocytes; some patients also have increased numbers of eosinophils. In general, the percentages of macrophages and neutrophils are significantly higher than in healthy nonsmokers, and increases are also frequently reported in healthy smokers. Studies of individuals with COPD, healthy smokers and ex-smokers show that smoking is generally associated with increased numbers of neutrophils. Lymphocytes are generally higher in ex-smokers than in smokers, with or without COPD. Moreover, some patients with COPD have higher eosinophil percentages than healthy smokers, although this finding is not consistent across publications. Alveolar macrophages may be separated by adhesion and cultured in vitro. Macrophages from COPD patients behave abnormally in culture, with increased expression of inflammatory proteins such as tumour necrosis factor (TNF)-α, IL-8 and matrix metalloproteinase (MMP)-9 492, 493. In future, it may be possible to study the effects of treatment on the behaviour of patients' cells in vitro.


Several mediators can be measured in BAL fluid. Levels of eosinophil cationic protein, myeloperoxidase and IL-8 are frequently increased in COPD patients and in healthy smokers compared with healthy nonsmokers, an observation suggesting that smoking, rather than COPD itself, induces these changes. Two studies of tryptase and histamine levels showed that these were also higher in COPD patients, suggesting mast cell activation in COPD 494, 495. However, the data were not compared with those of healthy smokers, so the increase in mast cell mediators may be entirely attributable to smoking itself. This is also suggested by findings that adenosine monophosphate responsiveness diminishes after smoking cessation 496. Studies of other mediators have not been replicated and are not discussed here.

Effect of smoking and disease severity

In one study 497, smokers with COPD had lower mast cell numbers in BAL than ex-smokers with COPD; no other studies have compared smokers and ex-smokers with COPD. Only one study has investigated the association between COPD severity and BAL inflammation; it showed that healthy smoking males with near-normal FEV1 have signs of lower airway inflammation that are related to a decrease in DL,CO and to emphysematous lesions on HRCT 498. This inflammation appears to result from macrophage and neutrophil activation, as assessed by mediators measured in BAL. In contrast, in a healthy population, the number of inflammatory cells did not correlate with lung function decline over a 4-yr follow-up. However, higher levels of neutrophil elastase-α1 protease inhibitor complexes in BAL fluid were significantly correlated with an accelerated decline in FEV1 499. This also suggests that an increased number or percentage of cells is not a prerequisite for the development or progression of emphysema, whereas the activation state of these cells, with accompanying mediator release, is important.

Effects of interventions

There are few published studies of the effects of different treatments on BAL cellular and mediator components. Three studies, one open-label and two double-blind, assessed the effect of different inhaled corticosteroids over various treatment periods on inflammatory cell counts and mediators in BAL. Although the small numbers of patients preclude firm conclusions, these studies suggest that inhaled corticosteroid treatment may reduce the percentages of neutrophils and lymphocytes; long-term studies in larger populations are needed to establish whether this is the case. Some studies have investigated the effects of smoking cessation on BAL composition, showing inconsistent decreases in cell numbers, particularly macrophages 500, 501.


BAL is an invasive procedure and may cause more discomfort to the patient than bronchial biopsy. It may also cause transient fever 490. The return of fluid is often reduced in COPD patients, resulting in samples that are inadequate for analysis. The quantification of biomarkers in supernatant is a problem as there is no satisfactory marker for the dilution of the saline lavage. This is one of the factors that may contribute to the variability in measurements and the necessity for relatively large numbers of patients.


Many COPD patients produce suitable sputum spontaneously, but spontaneous sputum may contain a high proportion of dead cells 502, which potentially provide misleading cell counts and mediator measurements 503, 504. For this reason, induced sputum has usually been the procedure of choice. It should be recognised that sputum obtained after inhaling nebulised hypertonic saline may have a different composition than mucus and may be more similar to a washing of the proximal airways. The procedure is tolerated by patients with FEV1 >30% pred. However, airflow obstruction is often observed 505, 506 and cannot be totally prevented by pre-medication with β2-agonists 507.

Inflammatory cells

There is an abnormal pattern of inflammatory cells in COPD patients, with an increase in the total number of inflammatory cells and in the percentage of neutrophils and, in some patients, eosinophils (the latter predicting a greater response to corticosteroids) 508, 509. CD8+ T-cells are increased in induced sputum of COPD patients 510. Neutrophils have been studied most extensively and are increased in number compared with those of matched smokers with normal lung function 511. Several studies have reported the effects of drugs on sputum neutrophils. Most have shown no change in inflammatory cells with inhaled or oral corticosteroids 512–514, although a reduction with oral theophylline has been reported 515.

However, although there is some evidence for the long-term reproducibility of sputum inflammatory cells and mediators 516, this evidence is still preliminary, since the sample sizes studied are usually too small for the findings to be generalised to the wider COPD population.
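Reproducibility of repeated sputum measurements is often summarised as an intraclass correlation coefficient (ICC), which compares the variance between subjects with the variance among repeated measurements in the same subject. The Python sketch below computes the one-way random-effects ICC(1,1); the choice of this ICC form, the function name `icc_oneway` and the example values are assumptions for illustration and are not taken from the studies cited above.

```python
def icc_oneway(measurements):
    """Hypothetical helper: one-way random-effects ICC(1,1).

    `measurements` is a list of subjects, each a list of k repeated
    measurements (same k for every subject).
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(measurements[0])
    if k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("each subject needs the same number (>= 2) of repeats")
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (visit-to-visit scatter around each subject's mean).
    msw = sum((x - m) ** 2
              for row, m in zip(measurements, subject_means)
              for x in row) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("no variability in the data")
    return (msb - msw) / denominator


# Invented sputum neutrophil percentages, two visits in each of three subjects.
visits = [[60, 65], [75, 72], [85, 88]]
print(round(icc_oneway(visits), 3))
```

Values close to 1 indicate that most of the observed variation lies between subjects rather than between visits; with very small samples, as noted above, the estimate itself is imprecise.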

Inflammatory mediators

Many mediators have been reported to be increased in the sputum supernatant of COPD patients; most show a greater increase in COPD than in normal smokers, with a further increase during exacerbations. However, few have been related to disease severity or progression. Sputum IL-8 has been studied most extensively: it is increased in COPD patients compared with smokers, is related to disease severity (FEV1 % pred) and is further increased during exacerbations 511, 517, 518. Sputum IL-8 concentrations are unaffected by corticosteroids but reduced by theophylline 512–514, 519, 520. Increased concentrations of proteases have been reported in the sputum of patients with COPD, including neutrophil elastase 521 and MMP-8 and -9 522–524.


Although induced sputum samples are relatively easy to obtain in COPD patients and provide considerable information about inflammatory cells and mediators, several problems need to be addressed. Induced sputum is derived predominantly from the large airways 525 and may not reflect the peripheral inflammation that may be important for clinical outcomes in COPD. Sputum induction with hypertonic saline itself induces neutrophilic inflammation that persists for 24 h, so repeated sampling within this period is not possible 525, 526. Solubilisation of sputum with dithiothreitol (DTT) may disrupt disulphide bonds and alter proteins so that they are no longer recognised by antibodies 527; this is a particular problem for several cytokines and chemokines. Furthermore, proteases in sputum, particularly in COPD, may degrade certain protein mediators. A recent study that used dialysis to remove DTT, together with protease inhibitors 528, showed that the measured concentrations of several cytokines in induced sputum of COPD patients can be markedly increased. More work is needed on long-term reproducibility in COPD patients, on the effect and duration of exacerbations, and on correlating individual biomarkers with disease severity and progression.

Exhaled gases

Measuring biomarkers in the breath is a very attractive approach to monitoring airway inflammation in COPD, as it is noninvasive and allows repeated sampling 529, 530. However, important issues of reproducibility and sensitivity need to be addressed before this approach can be recommended as an outcome measurement.

Nitric oxide

Exhaled nitric oxide (eNO) has been extensively investigated in asthma and shown to correlate with eosinophilic airway inflammation and to be reduced by corticosteroid therapy. There are ERS and ATS recommendations for measuring the fraction of exhaled nitric oxide (FeNO) 531, 532. The measurement is highly reproducible in normal and asthmatic subjects if careful attention is paid to technique 533. However, conventionally measured eNO is less useful in COPD as the levels are usually normal or only slightly elevated, except during exacerbations 534–538. This is likely to be due to the increase in oxidative stress, resulting in the formation of peroxynitrite and nitrate, so that nitric oxide (NO) is removed from the gaseous phase. This also explains why eNO is reduced in normal smokers 539.

Recently, eNO measurement has been extended to several exhalation flows, making it possible to partition NO into an airway component (a flow-independent airway NO flux) and a peripheral component derived from the alveoli and probably the small airways. Using this technique it has been shown that, while airway NO is low or normal in COPD, peripheral NO is increased and related to disease severity 540. This may reflect the increase in inducible NO synthase in the lung periphery of patients with COPD 541. Peripheral NO may prove to be a useful noninvasive biomarker of COPD inflammation, but further studies of its reproducibility, its relationship to disease severity and the effects of treatment are needed.
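One common way to perform this partition (assumed here; the text does not state which model was used in the study cited) is the linear two-compartment model, typically applied at exhalation flows of about 100 mL/s and above. NO output (FeNO × flow) plotted against flow gives a straight line whose slope is the alveolar NO concentration (CalvNO) and whose intercept is the airway NO flux (J'awNO). The Python sketch below fits that line by ordinary least squares; the function name and the example values are hypothetical.

```python
def partition_exhaled_no(flows_l_per_s, feno_ppb):
    """Hypothetical helper: estimate alveolar NO (ppb) and airway NO flux (nL/s).

    Assumes the linear two-compartment model
        NO output = CalvNO * flow + J'awNO,
    where NO output (nL/s) = FeNO (ppb, i.e. nL/L) * flow (L/s).
    """
    if len(flows_l_per_s) != len(feno_ppb) or len(flows_l_per_s) < 2:
        raise ValueError("need matched FeNO values at two or more flows")
    # Convert each FeNO reading into an NO output rate in nL/s.
    outputs = [c * v for c, v in zip(feno_ppb, flows_l_per_s)]
    n = len(outputs)
    mean_flow = sum(flows_l_per_s) / n
    mean_output = sum(outputs) / n
    # Ordinary least-squares slope and intercept of output against flow.
    sxx = sum((v - mean_flow) ** 2 for v in flows_l_per_s)
    if sxx == 0:
        raise ValueError("flows must not all be identical")
    sxy = sum((v - mean_flow) * (y - mean_output)
              for v, y in zip(flows_l_per_s, outputs))
    calv_no = sxy / sxx                         # slope: alveolar NO, ppb
    jaw_no = mean_output - calv_no * mean_flow  # intercept: airway NO flux, nL/s
    return calv_no, jaw_no


# Made-up readings generated from CalvNO = 5 ppb and J'awNO = 1 nL/s.
flows = [0.1, 0.2, 0.3]                # L/s (100, 200 and 300 mL/s)
feno = [5.0 + 1.0 / v for v in flows]  # ppb
calv, jaw = partition_exhaled_no(flows, feno)
print(round(calv, 3), round(jaw, 3))   # 5.0 1.0
```

Real measurements are noisy, and other fitting methods, including nonlinear ones, have also been described; the sketch only shows how flow dependence separates the two compartments.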

Carbon monoxide

Although it is easy to measure CO in the breath, this has not proven to be as useful a measurement as FeNO. Exhaled CO is elevated in patients with COPD but it is also elevated in normal smokers due to the high CO content in cigarette smoke 538, 542. Exhaled CO is elevated to a greater extent in COPD than in matched normal smokers and remains elevated in sustained ex-smokers. However, the signal is small and the measurement is also confounded by highly variable environmental CO levels and the effects of passive smoking, so further evaluation is not warranted.


Volatile hydrocarbons, such as ethane and pentane, have been detected in exhaled breath and are biomarkers of lipid peroxidation resulting from oxidative stress. Concentrations of ethane are elevated in patients with COPD and correlate with disease severity 542. Offline measurement of ethane by gas chromatography–mass spectrometry is difficult, so this measurement is unlikely to be useful in clinical trials, although smaller and more sensitive hydrocarbon detectors are now in development.

Exhaled breath condensate

Many mediators have now been detected in exhaled breath condensate (EBC), the collection of which is easy and completely noninvasive 543. Several factors affect the measurement, and recommendations have recently been formulated by an ERS/ATS Task Force 544. Limitations of the technique are the variability of the measurement and the low concentrations of mediators detected.

Oxidative/nitrative stress

Hydrogen peroxide (H2O2) is increased in EBC of COPD patients, is further increased during exacerbations 545 and is related to disease severity 546. Exhaled H2O2 is reported to be reproducible in repeated measurements over 3 days 547. 8-Isoprostane is a stable marker of oxidative stress and is also increased in EBC of COPD patients. Concentrations of 8-isoprostane are greater in COPD patients than in normal smokers, are related to disease severity 548–550 and are further increased during exacerbations 551. Certain aldehydes resulting from lipid peroxidation are also increased in COPD patients, but only malondialdehyde is increased in COPD patients compared with normal smokers 552. Increased nitrosative stress in COPD is indicated by increased concentrations of nitrite and nitrosothiols in EBC 553.

Inflammatory mediators

Inflammation is associated with tissue acidification and there is a decrease in pH in EBC of COPD patients 554. There is considerable variability in exhaled pH in COPD patients, which is greater than in normal subjects 555. There is an increase in the concentration of leukotriene B4 in COPD patients, which is further increased during exacerbations 551, 556, 557. Increases in prostaglandin E2 and IL-6 have also been reported in COPD patients 556, 558. It is unclear how most of these biomarkers relate to disease severity and patient-centred outcomes. Most protein mediators, including cytokines and enzymes, cannot reliably be measured in EBC.


There is relatively high variability in repeated measurements of EBC biomarkers. This may relate to the extensive and variable dilution by water vapour during condensation, and to mediator concentrations that may be close to the detection limits of the assays used 559. Further work is needed to optimise these measurements and to determine the causes of variability; correction for the variable dilution is one approach 560. Assays are usually performed by ELISA, and for some mediators these assays have been validated against gas chromatography–mass spectrometry 561, 562.
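The text does not specify the correction method used in the study cited. One approach that has been proposed, assumed here purely for illustration, estimates how much airway lining fluid (ALF) is diluted in the condensate by measuring a nonvolatile reference solute, such as urea, in both EBC and plasma, on the assumption that ALF and plasma urea concentrations are similar. The hypothetical Python helper below applies that ratio; the function name and all values are illustrative only.

```python
def correct_ebc_for_dilution(mediator_ebc, urea_ebc, urea_plasma):
    """Hypothetical helper: estimate a mediator's concentration in airway
    lining fluid (ALF) from its concentration in EBC.

    Assumes ALF urea is approximately equal to plasma urea, so that
    plasma urea / EBC urea estimates the dilution factor of ALF in EBC.
    Both urea values must be in the same units.
    """
    if urea_ebc <= 0 or urea_plasma <= 0:
        raise ValueError("urea concentrations must be positive")
    dilution_factor = urea_plasma / urea_ebc
    # The estimated ALF concentration keeps the mediator's original units.
    return mediator_ebc * dilution_factor, dilution_factor


# Two made-up samples from one subject: the second condensate is twice as
# dilute, so its raw mediator value is half that of the first.
plasma_urea = 5.0                                        # mmol/L
day1 = correct_ebc_for_dilution(4.0, 0.02, plasma_urea)  # mediator in pg/mL
day2 = correct_ebc_for_dilution(2.0, 0.01, plasma_urea)
print(round(day1[0], 6), round(day2[0], 6))              # 1000.0 1000.0
```

In this idealised example the correction removes the between-day difference entirely; in practice the reference-solute measurement contributes its own error.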


In recent years, researchers have hypothesised that a “low-grade” systemic inflammation may play an important role in the pathogenesis of systemic complications observed in chronic respiratory diseases such as COPD 248, 252, 268, 563–568. In the Third National Health and Nutrition Examination Survey of the US population 564, a number of markers of systemic inflammation were shown to be associated with active smoking and reduced FEV1. The markers investigated were C-reactive protein (CRP), fibrinogen, leukocytes and platelets in serum and plasma. However, their level of involvement has since been questioned due to the cross-sectional methodology used and the lack of information regarding the temporal aspects and biological plausibility of this observed association 569. Previous studies 566, 568, 570–584 have suggested roles for systemic CRP, fibrinogen, leukocytes, TNF-α, IL-6 and IL-8 in COPD and its exacerbations, but more longitudinal randomised controlled studies with larger sample sizes are needed to confirm their specificity and sensitivity as biomarkers in patients with reduced lung function 441.

Regulatory issues

Many drugs are now in development as potential anti-inflammatory therapies for COPD 585. However, as there is no effective anti-inflammatory treatment for COPD, it is uncertain how much and how rapidly clinical parameters will change in patients. This makes it important to develop reliable biomarkers to quantify inflammation in COPD patients and to validate these against some other measure of disease activity and progression. For assessment of anti-inflammatory treatments it is important to identify biomarkers that indicate the efficacy of the drug on components of the inflammatory process before proceeding to large and prolonged clinical trials. Biomarkers can facilitate drug development in a number of ways, such as: 1) providing evidence that a drug can reach its target and modify that target in some positive way; 2) identifying criteria for dose selection for phase-2 and -3 studies; 3) providing “go-no-go” decisions at early stages of the drug-development process; 4) identifying populations that are more likely to benefit from a drug; and 5) predicting safety problems.

There are several types of drugs that can be developed for COPD, depending on whether the drug is intended to improve airflow obstruction, provide symptom relief, modify or prevent exacerbations, alter disease progression, or modify lung structure. The efficacy end-points currently used in phase-3 studies to support registration of a drug for COPD are based on measures that translate into a direct benefit in some aspect of the disease that is important to patients, such as improvement of symptoms, functional capacity, HRQoL or survival. With the possible exception of drugs intended to improve airflow obstruction, whose efficacy can be assessed relatively easily by measuring FEV1 in short-term studies, other types of drugs are likely to require prolonged studies, often extending over many years. Such studies are risky and expensive, which further underscores the need to develop biomarkers.

The biomarkers described elsewhere in the present report are not yet sufficiently validated for use as evidence of efficacy in phase-3 studies or for supporting specific labelling claims. Nevertheless, these biomarkers reflect the disease and have potential use for regulatory purposes. Carefully selected biomarkers, with or without a patient-centred, clinically meaningful end-point, can be used in early phase studies, such as proof-of-action or proof-of-concept studies, on the basis of which a rational decision can be made on further development of the drug. Biomarkers can also be used in either early phase studies or phase-3 studies to support the drug's putative mode of action. In addition, use of biomarkers in phase-3 studies in conjunction with clinically meaningful end-points may help validate the biomarker, or even help elevate it to surrogate end-point status.


Although many pulmonary biomarkers have been described in COPD patients, there is little information regarding reproducibility and correlation with other outcome measurements in COPD (i.e. dyspnoea, HRQoL, exacerbation frequency and mortality). In the future, these biomarkers need to be assessed in normal smokers and age-matched normal subjects and linked to disease stage (and rate of FEV1 decline), clinical phenotype (emphysema versus small airway disease), smoking status (current versus ex-smokers), clinical status (stable versus exacerbation) and treatment (effect of corticosteroids, theophylline, etc.). Further research in this area is important as pulmonary biomarkers may be useful in the future for predicting clinical outcomes of COPD and for assessing new therapies which may modify the inflammatory/destructive disease process.


By the time patients with COPD seek medical attention, they usually have significant symptoms, especially dyspnoea, reduced exercise performance and impaired health status. These aspects of COPD morbidity have been investigated for many years and their association with the disease process has resulted in measurable outcomes used for the assessment of pharmacological treatment. The ATS/ERS Task Force has summarised many of these outcomes in tables 6–9. Although some of these outcomes have been shown to change with therapy, their observed changes are not always reflected by changes in traditional measures of disease severity such as FEV1. This is because other pathophysiological (e.g. dynamic hyperinflation of the lungs) and psychological (e.g. coexisting anxiety) influences also affect these outcomes. Therefore, changes in FEV1 with therapy should not be regarded as a surrogate for changes in dyspnoea, exercise performance or HRQoL. These variables should be measured separately to complement other markers of physiological impairment when assessing a therapy for COPD 6.

COPD trials should therefore include outcomes other than FEV1; for example, other lung function parameters (such as FVC and the IC/TLC ratio), measures of dyspnoea, functional status, health status and HRQoL, exercise tolerance, and breathlessness after exercise.

The frequency of exacerbations is another important outcome that should be considered in COPD pharmacological trials. The definition of an exacerbation can significantly affect trial results, including the size of any observed treatment benefit. A general definition, such as “an exacerbation of COPD is an increase in respiratory symptoms over baseline that usually requires medical intervention”, may be the most widely applicable. In addition, exacerbations should be classified according to a severity scale.

Currently, no well-validated biomarker or surrogate marker of COPD or its exacerbations has been identified other than FEV1, and the value of FEV1 as a surrogate marker is limited. Mortality, dyspnoea and HRQoL remain the most important and robust clinical outcomes in COPD research. Other potential surrogate markers should be included as secondary end-points in future clinical trials. This may lead to the identification of biomarkers that correlate with patient-centred outcomes, and the data generated may also help in developing new hypotheses for future clinical trials 3.

Based on the rate of disease progression and the frequency of exacerbations, it is now recognised that pharmacological trials in stable COPD should last ≥6 months in order to examine potential outcomes or to support claims of treatment response, particularly for regulatory submissions. However, because of seasonal variation, evaluation of exacerbation frequency requires a period of ≥1 yr, and the timing of the study may in any case prove important (e.g. ensuring that the winter cold season is captured for the majority of patients).

Given the slow progression of COPD (as measured by FEV1), small studies of interventions of limited efficacy may not show a minimal important difference (MID) between treated and untreated groups. However, those who design clinical trials should be aware that comparing the proportions of patients who reach the MID may provide important information, even if the mean effect does not exceed the MID. In general, the randomised controlled trial is the most useful study design for determining the effect of treatment on outcomes, including the rate of decline in FEV1 and the change in frequency of COPD exacerbations. Only placebo-controlled trials allow the effect of active treatment to be isolated; however, because control groups should always receive the best available proven treatment, the use of placebo raises ethical issues.
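The distinction between a mean effect and the proportion of patients reaching the minimal important difference (MID) can be made concrete with a small calculation. The Python sketch below uses invented changes in FEV1 (mL) for five treated and five control patients and an illustrative MID of 100 mL; neither the data nor the threshold is taken from this report (table 5 lists published MID estimates), and the function name is hypothetical.

```python
from statistics import mean


def summarise_against_mid(treated, control, mid):
    """Hypothetical helper: compare a mean treatment effect and a
    responder analysis against a minimal important difference (MID).

    `treated` and `control` hold per-patient changes from baseline;
    a patient counts as a responder if their change is >= mid.
    """
    if not treated or not control:
        raise ValueError("both groups need at least one patient")
    mean_difference = mean(treated) - mean(control)
    responders_treated = sum(1 for x in treated if x >= mid) / len(treated)
    responders_control = sum(1 for x in control if x >= mid) / len(control)
    return {
        "mean_difference": mean_difference,
        "mean_reaches_mid": mean_difference >= mid,
        "responders_treated": responders_treated,
        "responders_control": responders_control,
        "responder_difference": responders_treated - responders_control,
    }


treated = [150, 120, 20, 10, 0]  # change in FEV1, mL (invented)
control = [0, 10, -10, 20, 30]   # change in FEV1, mL (invented)
result = summarise_against_mid(treated, control, mid=100)
print(result["mean_difference"], result["mean_reaches_mid"])       # 50 False
print(result["responders_treated"], result["responders_control"])  # 0.4 0.0
```

Here the mean difference (50 mL) falls short of the illustrative MID, yet 40% of treated patients and none of the control patients reach it, which is the kind of information a responder analysis can add. With groups this small, neither figure would be statistically meaningful.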

Support statement

This Task Force has received a grant for its activity directly from the ERS.

Statement of interest

Statements of interest for F.J. Martinez and S.I. Rennard can be found at

Fig. 1. Example of a cost-effectiveness plane comparing drugs A and B. QALYs: quality-adjusted life-years.

Fig. 2. Example of an acceptability curve. #: drug A is cost-effective compared with drug B.

Table 1. Confounder/effect-modifier criteria for assessment of published chronic obstructive pulmonary disease (COPD) outcomes and markers

Table 2. Variables that predict mortality

Table 3. Body mass index (BMI) and fat-free mass index (FFMI) ranges for the general and chronic obstructive pulmonary disease (COPD) populations

Table 4. Suggested minimal important differences (MIDs) of commonly used outcomes in chronic obstructive pulmonary disease (COPD) trials

Table 5. Minimal important difference (MID) estimates for forced expiratory volume in one second (FEV1)

Table 6. Summary of various chronic obstructive pulmonary disease (COPD) outcomes from lung function and patient-reported outcomes

Table 7. Summary of various chronic obstructive pulmonary disease (COPD) outcomes from patient-reported outcomes

Table 8. Summary of various chronic obstructive pulmonary disease (COPD) outcomes from patient-reported outcomes, and exacerbations and exercise

Table 9. Summary of various chronic obstructive pulmonary disease (COPD) outcomes from mortality, social and economic burden, computed tomography (CT) imaging, nonpulmonary markers of disease and biomarkers


Affiliations were as follows. From Italy: M. Cazzola, University of Rome Tor Vergata and P. Palange, University of Rome La Sapienza, Rome; V. Brusasco, University of Genoa, Genoa; and M. Pistolesi, University of Florence, Florence. From the UK: W. MacNee, Colt Research Laboratories, Medical School, Edinburgh; L.G. Franciosi, King’s College London, P.J. Barnes, National Heart and Lung Institute, Imperial College London, P.W. Jones, St George’s Hospital, University of London, C.P. Page, Sackler Institute of Pulmonary Pharmacology, King’s College Hospital Medical School and J.A. Wedzicha, University College London, London; P.S. Burge, Birmingham Heartlands Hospital and R. Stockley, Queen Elizabeth Hospital, Birmingham; D. Parr, University Hospitals of Coventry and Warwickshire, Coventry; and P.M.A. Calverley, University Hospital Aintree, Liverpool. From the USA: F.J. Martinez, University of Michigan Medical School, Ann Arbor, MI; B.R. Celli, Caritas-St. Elizabeth’s Medical Center, Boston, MA; D.A. Mahler, Dartmouth-Hitchcock Medical Center, Lebanon, NH; B. Make, National Jewish Medical and Research Center, Denver, CO; and S.D. Sullivan, University of Washington, Seattle, WA. From the Netherlands: K.F. Rabe, Leiden University Medical Center, Leiden; M.P. Rutten-van Mölken, Erasmus MC-University Medical Centre Rotterdam, Rotterdam; and E.F. Wouters, University Hospital Maastricht, Maastricht. From Spain: M. Miravitlles, Hospital Clínic, Barcelona.

Participants in the Task Force report were as follows.

From the UK: I.M. Adcock (Imperial College School of Medicine, London), N. Barnes (London Chest Hospital, London), P.J. Barnes (National Heart and Lung Institute, Imperial College London, London), P.S. Burge (Birmingham Heartlands Hospital, Birmingham), P.M.A. Calverley (University Hospital Aintree, Liverpool), R. Djukanovic (University of Southampton, Southampton), P.W. Jones (St George’s Hospital, University of London, London), S. Kharitonov (National Heart and Lung Institute, Imperial College London, London), W. MacNee (Colt Research Laboratories, Medical School, Edinburgh), D. Parr (University Hospitals of Coventry and Warwickshire, Coventry), R. Stockley (Queen Elizabeth Hospital, Edgbaston, Birmingham), J. Vestbo (University of Manchester, Manchester) and J.A. Wedzicha (University College London, London).

From Sweden: L. Bjermer (University Hospital, Lund) and C.G. Löfdahl (Lund University, Lund).

From Italy: V. Brusasco (University of Genoa, Genoa), M. Cazzola (University of Rome Tor Vergata, Rome), G. D'Amato (A. Cardarelli Hospital, Naples), C.F. Donner (Multidisciplinary and Rehabilitation Outpatient Clinic, Borgomanero), L.M. Fabbri (University of Modena, Modena), P.L. Paggiaro (University of Pisa, Pisa), P. Palange (University of Rome La Sapienza, Rome), M. Pistolesi (University of Florence, Florence), A. Rossi (Bergamo General Hospital, Bergamo), M. Saetta (University of Padua, Padua), G. Viegi (CNR Institute of Clinical Physiology, Pisa) and A.M. Vignola (Institute of Lung Pathophysiology, National Research Council, Palermo).

From Germany: R. Buhl (University Hospital, Mainz) and H. Magnussen (Center for Pneumology and Thoracic Surgery, Großhansdorf).

From the USA: S. Buist (Oregon Health and Science University, Portland, OR), B.R. Celli (Caritas-St Elizabeth’s Medical Center, Boston, MA), D.A. Mahler (Dartmouth-Hitchcock Medical Center, Lebanon, NH), B. Make (National Jewish Medical and Research Center, Denver, CO), F.J. Martinez (University of Michigan Medical School, Ann Arbor, MI), S.I. Rennard (University of Nebraska Medical Center, Omaha, NE), B. Chowdhury (Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD) and S.D. Sullivan (University of Washington, Seattle, WA).

From Denmark: R. Dahl (Aarhus University Hospital, Aarhus).

From Japan: M. Ichinose (Wakayama University Hospital, Wakayama).

From the Netherlands: D. Postma (University Medical Center Groningen, Groningen), K. F. Rabe (Leiden University Medical Center, Leiden), M.P. Rutten-van Mölken (Institute for Medical Technology Assessment, Rotterdam) and E.F. Wouters (University Hospital Maastricht, Maastricht).

From Belgium: G. Joos and R. Pauwels (Ghent University Hospital, Ghent).

From Spain: M. Miravitlles (Hospital Clínic, Barcelona) and J. Roca (Hospital Clínic, IDIBAPS, Universitat de Barcelona, Barcelona).

From Argentina: L. Nannini (G. Baigorria Hospital, National University of Rosario, Rosario).

From Ireland: D. Lyons (Irish Medicines Board, Dublin).


  • For editorial comments see page 238.

  • Received July 31, 2006.
  • Accepted November 20, 2007.