1 St George's Hospital Medical School, London, UK. 2 Dept of Family and Preventive Medicine, University of California, San Diego, CA, USA
CORRESPONDENCE: P.W. Jones, St George's Hospital Medical School, London SW17 0RE, UK. Fax: 44 2087255955. E-mail: pjones@sghms.ac.uk
Keywords: chronic obstructive pulmonary disease, health status, measurement standardisation, outcome measures, utility-based measures
Received: August 21, 2002
Accepted: February 20, 2003
Abstract
Specific outcomes measure a single biological variable, such as forced expiratory volume in one second or depression. The specificity of such measures is attractive but requires precise definition of what is being measured and why. Other, summative, outcomes are used to quantify the overall effect of a number of different biological processes.
The simplest summative measures are global questions such as "How would you rate your health overall?" Others are complex with many items. If designed and used correctly, these questionnaires can provide an estimate of the overall impact of disease or response to therapy and an index of whether that response was clinically worthwhile.
Standardisation of measurements is important to permit comparisons between patients and studies, but this requirement makes it difficult to measure an individual's "quality of life". The term "health-status measurement" may be better when referring to the use of standardised questionnaires. Utility-based measures help address concerns regarding clinical versus statistical improvement and place outcomes for chronic obstructive pulmonary disease treatment trials in the context of all healthcare treatments.
Chronic disease usually has three types of effect, which in chronic obstructive pulmonary disease (COPD) may be defined as follows: primary effects in the lungs, which may be structural or mechanical; secondary effects in other organs, such as the muscles and circulation; and tertiary effects, which involve an interaction between patients and their environment.
From the patient's perspective, health is related to better functioning, symptom relief and longer life 1–3. However, life duration and quality of life by themselves are not the only important outcomes; all effects need to be taken into account when evaluating treatment. Effects on pulmonary function or secondary effects on organs are important because they may reduce quality of life or shorten life expectancy 4, 5. If pulmonary function had no effect on these outcomes, it would be of little concern 1.
Against what criteria should measures be judged? Is it appropriate to evaluate outcome measures against forced expiratory volume in one second (FEV1), maximum oxygen consumption and diffusing capacity? A substantial number of studies show that the correlations between physiological outcomes and measures of health-related quality of life (health status) are modest, and that much of the variance in health status is not explained by physiological variables 6, 7.
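One way to make "modest correlation" concrete: the proportion of variance in a health-status score that a single physiological variable accounts for in a linear relationship is the square of the correlation coefficient. The sketch below uses hypothetical correlation values, not figures from the cited studies, and the function name is illustrative.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in one variable accounted for by a linear
    relationship with another, given the Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return r * r


# Hypothetical correlations between FEV1 and a health-status score.
for r in (0.2, 0.4, 0.6):
    explained = variance_explained(r)
    print(f"r = {r:.1f}: explained {explained:.0%}, unexplained {1 - explained:.0%}")
```

Even a correlation of 0.6 would leave 64% of the variance in health status unexplained by the physiological measure.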
Nevertheless, several lines of evidence suggest that health-status measures are important. For example, health status is a significant prospective predictor of mortality in patients with advanced lung disease 5. Furthermore, improved quality of life is what patients want to achieve with their medical treatment. When seeking care, patients want relief from shortness of breath, the ability to function in the community and the capacity to perform activities of daily living 1. A treatment that alters a physiological parameter, such as FEV1, but does nothing for quality of life may not be regarded as successful.
A wide variety of measures suitable for assessing outcomes are available 8–12. The choice of measure will depend on the aspect of the disease being addressed and the purpose of the study 13–15. Measuring outcome in patients with COPD may differ from doing so in patients with other chronic diseases and may involve greater challenges. To illustrate this point, consider comparing outcomes for patients treated for COPD with those of patients treated for osteoarthritis of the hip or for cataracts. Patients with osteoarthritis of the hip are often treated with total joint replacement surgery, and many studies using both generic and disease-targeted measures have demonstrated clinical improvement after this procedure 16–20. A substantial number of studies have evaluated patients undergoing cataract extraction with lens replacement and show large changes with disease-targeted measures, but only modest changes with generic measures 21, 22. In both osteoarthritis of the hip and cataract disease, surgical interventions produce substantial treatment benefit.
Although treatments for COPD may not produce the dramatic benefits seen with total joint replacement or cataract extraction, quality of life can improve after some treatments for COPD. In particular, several studies have documented improvements in quality of life following participation in rehabilitation programmes 23. Quality-of-life measures such as the 36-item Short Form (SF-36) of the Medical Outcomes Study 24, the Quality of Well-Being Scale (QWB) 25, the St George's Respiratory Questionnaire (SGRQ) 26 and the University of California at San Diego Shortness of Breath Questionnaire 6 are sensitive to relatively small changes in COPD patients.
Classifying outcomes
Specific outcomes
Specific outcomes measure a single biological variable such as FEV1 or depression. Their characteristic attribute is that they address a unidimensional construct (for example, the degree of airway obstruction or a particular mood disturbance).
Use of a specific outcome is attractive because it should be clear what is being measured. That very specificity, however, requires a precise definition of the question being asked. This may be illustrated by the choice of outcome used to assess the effect of long-acting bronchodilators, agents that act by inducing airway smooth muscle relaxation, a process that cannot be measured in vivo. Indeed, it is noteworthy that even at the level of physiological function, outcome parameters other than those produced by the immediate action of the drug must be used. In other words, the outcomes used in practice are often surrogate measures of the drug's basic physiological action. The inability to measure airway smooth muscle relaxation directly may matter only occasionally, because it is the consequence of that process that is clinically relevant. Furthermore, the process of interest does not occur in isolation, but takes place in the context of other primary and secondary effects of the disease and may be modified by them.
Measurable outcomes of airway smooth muscle relaxation caused by long-acting bronchodilators include FEV1, forced inspiratory flow, inspiratory capacity, slow vital capacity and end-expiratory lung volume. Some of these measurements are the direct result of changes in the airway wall, but others are influenced by lung volumes, which may themselves be improved through a reduction in the volume of trapped gas. Other relevant physiological variables that are more difficult to collect are dynamic end-expiratory and end-inspiratory lung volumes, yet these may be more closely associated with breathlessness during exercise than are spirometric measurements obtained at rest 27, 28. Some benefits of reduced bronchomotor tone, such as those that occur during sleep or an acute exacerbation, may be timing- or state-specific and only loosely related to spirometric measurements made during the day, in a laboratory and in a stable state.
The selection of a particular specific outcome should depend on the study's purpose: demonstrating clinical efficacy or providing a mechanistic explanation of drug effects. If the study is not directed primarily at elucidating mechanisms, the reason for including a specific outcome should be that it may provide pathophysiological confirmation that the therapy produced clinical benefit through its postulated mechanism of action. Unfortunately, a specific mechanistic outcome is more often chosen because of ease of measurement than for sound scientific reasons.
Global outcomes
Global or summative outcomes are used to quantify the overall effect of a number of biological steps 29, but they may not be immediately recognisable as such, because some of them measure factors that appear to be unidimensional. One example is exercise performance: this physiological outcome is determined by cardiac, pulmonary, circulatory and peripheral muscle function, together with the sensations of breathlessness and fatigue 6, 7, 30. Even the FEV1 is a summative measure in COPD (as opposed to asthma) because it reflects both disease in the airway wall and the loss of alveolar attachments caused by emphysema. This summative property of the FEV1 is exploited in practice, because the FEV1 is used to define the severity of COPD regardless of the underlying pathophysiology 9, 31.
Health status is more readily recognisable as a summative measure. In theory, it is easy to conceptualise health as a single construct, but in practice, such measurements address a range of different aspects of disturbance to health and well-being. Some questionnaires, such as the generic SF-36 32, do not even provide a single summative scale of overall health impairment, but instead present their results as a profile of scores or as physical and mental summary scores. By contrast, other generic instruments, such as the Sickness Impact Profile 33 and the QWB 34, do provide a total score, as do some disease-specific questionnaires for COPD, such as the SGRQ 35.
Health-status questionnaires are complex instruments, but other global outcomes, such as the asthma severity scores recorded on diary cards, use much simpler techniques. Typically, patients are asked to rate their overall symptom level on a three- to seven-point category scale (e.g. none, mild, moderate, severe). Similar techniques are used by patients or physicians to assess the overall efficacy of therapy. Scores of this type are now being used in COPD clinical trials. In contrast to the total scores obtained from complex questionnaires, such outcomes are pure global scores because they are not calculated from responses to multiple discrete items. Their chief disadvantage is that it is never clear how an individual arrives at a judgment about the overall level of symptoms, state of health or effect of therapy.
Global outcomes offer a number of attractive properties. If designed and used correctly, they may provide a measure of the overall impact of disease or response to therapy 2, 36–39. This may be especially useful when a treatment has multiple beneficial actions. Global outcomes may also be more sensitive to treatment than specific outcomes because they have the potential to aggregate multiple small effects. Each treatment effect may not be large in itself, but together the effects may amount to a significant benefit.
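The claim that aggregation can raise sensitivity can be illustrated under a deliberately simple model, which is an assumption and not taken from the cited work: k components each shift by the same standardised effect d, each has unit noise variance, and all pairs share a noise correlation rho. The sum's signal grows as k·d while its noise SD grows as sqrt(k + k(k-1)·rho), so with independent noise the composite effect is d·sqrt(k), and the gain disappears as rho approaches 1. The function name and numbers are hypothetical.

```python
import math


def composite_effect_size(d: float, k: int, rho: float = 0.0) -> float:
    """Standardised effect of the sum of k components.

    Assumes each component shifts by d (in SD units), has unit variance,
    and shares a common pairwise noise correlation rho (0 = independent).
    """
    signal = k * d
    # Variance of a sum of k unit-variance variables with pairwise correlation rho.
    noise_sd = math.sqrt(k + k * (k - 1) * rho)
    return signal / noise_sd


for k in (1, 4, 9):
    independent = composite_effect_size(0.2, k)
    correlated = composite_effect_size(0.2, k, rho=0.5)
    print(f"k = {k}: independent {independent:.2f}, rho = 0.5 {correlated:.2f}")
```

With independent noise, nine components each with d = 0.2 give a composite effect of 0.6; real questionnaire items are correlated, so the gain in practice is smaller.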
Global scores may be useful in one other respect. They are high-level outcomes and thus they may be closer to constructs that are relevant to patients and physicians alike. As a result, concepts such as a "worthwhile" improvement in exercise tolerance or reduction in symptoms may be easier to conceptualise than a worthwhile improvement in FEV1. The latter has little immediate or obvious worth to a patient, unlike exercise performance or reduced breathlessness, each of which has an intrinsic worth. Thus, improvement in FEV1 may be perceived as worthwhile only because it is associated with an improvement in other measures of clinical outcome.
When using a global outcome measure, it is important to recognise that its role is to summarise and aggregate. It can demonstrate that a change has occurred and provide an assessment of whether that change is clinically significant, but it may not identify the mechanisms. In this respect, it should be used for hypothesis generation rather than hypothesis testing.
Measurement properties
For questionnaires designed to be responsive to therapeutic intervention, the usefulness of the instrument will depend upon three additional properties: reliability, the ability of the instrument to perform in the same manner in different settings with different operators; repeatability, the stability of the measurement when the testing conditions and the patient are stable; and sensitivity, the ability to detect change 8, 43, 44. Sensitivity may depend in part upon a trade-off with the other two properties. For example, a measure that is reliable because it has very broad response categories, and that has high repeatability, may have poor sensitivity owing to lack of precision; it cannot detect small changes or discriminate between small differences. The parameter that most clearly defines an outcome's usefulness is its signal-to-noise ratio, i.e. the ratio of its sensitivity (the change it detects) to the variability seen on repeated measurement 45. A highly sensitive outcome will be of practical value only if it is reliable.
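One common way to operationalise this ratio, assumed here rather than taken from reference 45, is to divide the mean change after treatment by the within-subject standard deviation estimated from test-retest measurements in stable patients, using the two-occasion estimate SD(differences)/sqrt(2). The function names, scores and treatment effect below are hypothetical.

```python
import statistics


def within_subject_sd(test: list[float], retest: list[float]) -> float:
    """Within-subject SD from two measurements per stable patient.

    Uses SD(differences) / sqrt(2), the usual two-occasion estimate.
    """
    if len(test) != len(retest):
        raise ValueError("test and retest scores must be paired")
    diffs = [a - b for a, b in zip(test, retest)]
    return statistics.stdev(diffs) / 2 ** 0.5


def signal_to_noise(mean_change: float, test: list[float], retest: list[float]) -> float:
    """Mean treatment effect expressed in units of test-retest noise."""
    return mean_change / within_subject_sd(test, retest)


# Hypothetical questionnaire scores from five stable patients tested twice,
# and a hypothetical mean improvement of 4 points after treatment.
baseline = [50, 42, 61, 38, 55]
repeat = [52, 40, 60, 41, 54]
print(round(signal_to_noise(4.0, baseline, repeat), 1))
```

With these made-up data the ratio is about 2.6; a coarser instrument with less test-retest variation but a smaller detectable change could end up with a lower ratio, which is the trade-off described above.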
Significance of outcome measurements
It is of fundamental importance to distinguish between statistical significance and clinical significance. The former depends on the size of the study as much as on the size of the effect: small effects may become statistically significant if the study is sufficiently large. Clinical significance is a much more useful concept, but one that is difficult to define and measure 10. Value judgments are always required at some stage in establishing thresholds for a clinically significant effect or a minimum clinically important difference. Such judgments are required whether the outcome being validated is a quality-of-life score or a physiological measure.
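The dependence of statistical significance on study size can be shown with a two-group comparison under a normal approximation (an assumed simplification): for a fixed standardised difference d between two groups of n patients each, z = d·sqrt(n/2), so the p-value shrinks as n grows even though the effect itself is unchanged. The effect size of 0.1 SD and the function name are hypothetical.

```python
import math


def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardised mean difference d
    between two equal groups (normal approximation, unit SD assumed)."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


for n in (50, 200, 1000, 4000):
    print(f"n = {n:>4} per group: p = {two_sided_p(0.1, n):.4f}")
```

The same 0.1 SD difference is far from significant with 50 patients per group but highly significant with 4000, which says nothing about whether it is clinically worthwhile.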
When establishing thresholds for clinical significance, it is necessary to prespecify the criteria used for assessing what magnitude of change in an outcome will be judged clinically significant. These criteria will also require a selection of other outcomes to be used as a reference standard for what constitutes clinical significance 10. Such references may include patient/physician global judgments, clinically significant changes in another clinical variable and prediction of future events (e.g. death, exacerbations and hospital admission).
It may not always be necessary, or even a worthwhile enterprise, to produce thresholds for clinical significance for all outcomes used in COPD. Although it may be possible to produce reliable estimates for a clinically significant threshold for changes in FEV1 in patients with COPD, is it worthwhile to do so? The outcomes used to establish criteria for a clinically significant improvement in COPD can themselves be measured in a clinical trial, and the degree of association between FEV1 and these outcomes is only modest, such that a clinical threshold for FEV1 would merely be a weak surrogate for a clinical outcome's threshold.
Issues surrounding the establishment of thresholds for clinical significance are complex and are reviewed in detail elsewhere 10. Thresholds for the 6-min walking distance 48, the CRQ 49 and the SGRQ 50 are available, but it is important to appreciate that these are mean estimates obtained from patient groups. These thresholds are helpful, but they should be used only as indicative values, not as rigid or high-precision boundaries between what is worthwhile to a patient and what is not 10.
Measurement standardisation
Some physiological measures, such as FEV1, are expressed in agreed-upon standard units and have criteria for the adequacy of a measurement. Such standardisation is the result of years of custom, practice and international agreement. By contrast, psychology has a number of scales for measuring depression, but none is universally accepted and consequently there is no standard measurement unit. That said, one or two depression scales are now widely used in respiratory medicine, an example being the Hospital Anxiety and Depression Scale 51. However, it will be some time before a particular scale becomes the de facto standard.
The concept of a "health-related quality-of-life measurement" presents a challenge to standardisation 2, 10. Life is potentially too rich and varied for standardised measures to capture quality-of-life effects in individuals, even for the most socially restricted of COPD patients. For example, the inability to play with grandchildren may be an important factor in the lives of many patients with COPD, but this activity is often restricted for reasons unrelated to health. As a result, an item in a questionnaire would need to be worded along the lines of "If you have suitably aged grandchildren with whom you would wish to play but are unable to do so solely because of breathlessness or fatigue, please check the box." This complex item, with conditional and specific requirements, may have low repeatability and would certainly present the developer of the questionnaire with the problem of how to handle "not applicable" responses in the scoring system. Furthermore, the presence of items that are "not applicable" reduces the number of items that can be used by some patients and, thereby, the instrument's precision.
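The loss of precision from "not applicable" responses can be sketched with one common scoring approach, assumed here rather than prescribed by any particular questionnaire: score only the applicable items and report their mean. If item errors are independent with SD sigma, the standard error of that mean is sigma/sqrt(m) for m answered items, so fewer applicable items means a less precise score. The None-for-N/A convention, the item values and sigma are hypothetical.

```python
import math
from typing import Optional, Sequence


def score_with_na(responses: Sequence[Optional[float]], item_sd: float = 1.0):
    """Mean of applicable items and its approximate standard error.

    None marks a 'not applicable' response, which is excluded from scoring.
    Assumes independent item errors with a common SD of item_sd.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no applicable items to score")
    mean = sum(answered) / len(answered)
    se = item_sd / math.sqrt(len(answered))
    return mean, se


full = [2, 3, 1, 2, 3, 2, 2, 1, 3]                 # all nine items apply
with_na = [2, 3, None, 2, None, 2, None, 1, None]  # four items not applicable
print(score_with_na(full))
print(score_with_na(with_na))
```

In this example the standard error rises from 1/3 with nine items to about 0.45 with five, before any extra unreliability introduced by the complex conditional wording is considered.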
In clinical trials, all measurements should be made using an instrument that is appropriate to the task. Each patient must be evaluated using a standardised questionnaire that is suitable for every individual who is being assessed. In this context, standardisation means that all items in the questionnaire are common (at least potentially) to all patients with the disease. The consequence of this item selection process is that the resulting scores are population-based estimates of health that may not reflect precisely any given individual's actual health impairment. This is in no way different from the use of the FEV1 expressed as a percentage of age-, sex-, height- and race-matched predicted values for assessing an individual patient's degree of airway obstruction. Such estimates are based upon population norms and not the patient's own premorbid state. By analogy, the items in health-status questionnaires are those that reflect the usual effect of the disease in a population of patients with COPD.
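The percent-predicted expression used in the analogy is a simple ratio against a population norm. The sketch takes the predicted value as an input, because the reference equations that supply it (depending on age, sex, height and race) are outside the scope of this section; the patient values and function name are hypothetical.

```python
def percent_predicted(observed_l: float, predicted_l: float) -> float:
    """FEV1 as a percentage of the population-based predicted value (litres)."""
    if predicted_l <= 0:
        raise ValueError("predicted value must be positive")
    return 100.0 * observed_l / predicted_l


# Hypothetical patient: observed FEV1 1.2 L, predicted 3.0 L from a reference equation.
print(f"{percent_predicted(1.2, 3.0):.0f}% predicted")
```

The denominator comes from a matched population, not from the patient's own premorbid FEV1, which is the same population-based logic that underlies standardised health-status items.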
Standardisation should also apply to symptom measurement. Diary cards have been used for many years as an outcome in asthma and they are now being used in COPD. Data from such diaries are used to calculate mean symptom scores and also to calculate derived parameters such as "symptom-free days" in asthma and "bad days" in COPD. Both are potentially valuable measurements, but there is no consensus (except perhaps within a given pharmaceutical company) concerning the wording and number of response categories in the diary. This is important because a recent study has shown that one diary card question, phrased to address the level of asthma symptoms, produced a more severe mean score over 14 days than a similar question in the same diary that addressed the effect of asthma on daily life 52. Diary cards are also being used to identify exacerbations prospectively 53, but again, there is no consensus concerning the level and duration of change in symptoms (or FEV1 or peak expiratory flow) that constitutes an exacerbation. In view of the increased appreciation of the importance of these events, agreement on methods of identifying the occurrence of an exacerbation must be sought soon.
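Because there is no consensus definition, derived diary parameters depend on cut-offs chosen by the analyst. The sketch below assumes a 0 (none) to 3 (severe) daily symptom score, counts a day as "symptom-free" when the score is 0 and as "bad" when it reaches an analyst-chosen threshold; the scale, thresholds, function names and diary are all hypothetical.

```python
from typing import Sequence


def symptom_free_days(scores: Sequence[int]) -> int:
    """Days with a symptom score of 0 (assumed definition)."""
    return sum(1 for s in scores if s == 0)


def bad_days(scores: Sequence[int], threshold: int) -> int:
    """Days at or above an analyst-chosen threshold (assumed definition)."""
    return sum(1 for s in scores if s >= threshold)


# Hypothetical 14-day diary on a 0 (none) to 3 (severe) scale.
diary = [0, 1, 2, 2, 3, 1, 0, 0, 2, 3, 3, 1, 1, 2]
print(symptom_free_days(diary), bad_days(diary, 2), bad_days(diary, 3))
```

The same diary yields 7 "bad days" with a threshold of 2 but only 3 with a threshold of 3, which is why agreed definitions matter when results are compared across studies.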
Utility-based measures
Utility-based health-outcome measures place levels of wellness on a continuum ranging from death (0.0) to perfect health (1.0). These measures represent a significant refinement over traditional survival analysis, which classifies each individual in a binary fashion (alive or dead). Utility weights can be used to represent levels of wellness along this continuum and are often applied to "quality adjust" survival time 29. Utility-based measures put clinical effect size into context by showing how observed differences map onto the continuum between optimum function and death. For example, COPD patients participating in rehabilitation programmes improve by 0.04 units, which is four hundredths of the distance between death and perfect health. These numbers can be used to estimate quality-adjusted life years, taking into account the duration of the benefit 34.
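Quality adjustment multiplies a utility difference by the time over which it persists. The sketch below reuses the 0.04-unit rehabilitation gain from the text; the duration and number of patients are hypothetical, the function name is illustrative, and it assumes the benefit stays constant over the period (no decay and no discounting).

```python
def qalys_gained(utility_gain: float, years: float, patients: int = 1) -> float:
    """QALYs from a constant utility gain sustained for `years`,
    summed over `patients` (no decay or discounting assumed)."""
    return utility_gain * years * patients


print(qalys_gained(0.04, 1.0))      # one patient, benefit lasting 1 year
print(qalys_gained(0.04, 1.0, 25))  # 25 patients, same benefit
```

Under these assumptions, a gain of 0.04 sustained for one year in each of 25 patients adds up to about one quality-adjusted life year.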
The three most commonly used methods are the EuroQol (EQ)-5D 56, the Health Utilities Index (HUI) 57 and the QWB 25. The EQ-5D was created by a collaborative group from Western Europe known as the EuroQol group 56. Its method has been validated in postal surveys in England, Sweden and the Netherlands. More recent versions of the EQ-5D are now used in a substantial number of clinical and population studies 58, 59.
The HUI, which was developed in Canada by Feeny et al. 60, uses a multiattribute model to map preferences for its 972,000 possible health states onto the 0.0–1.0 continuum. The HUI has been used in many population and clinical studies.
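In a multiattribute system, a health state is one level on each attribute, so the number of states is the product of the level counts. The counts below are those of the HUI Mark 3 (eight attributes with five or six levels each), which reproduce the 972,000 states quoted. The utility function, however, is only a generic multiplicative sketch with made-up per-attribute scores; it is not the published HUI scoring formula, which uses its own preference weights and rescaling.

```python
import math

# Number of levels per attribute in the HUI Mark 3 classification.
LEVELS = {
    "vision": 6, "hearing": 6, "speech": 5, "ambulation": 6,
    "dexterity": 6, "emotion": 5, "cognition": 6, "pain": 5,
}

n_states = math.prod(LEVELS.values())


def multiplicative_utility(attribute_scores: dict[str, float]) -> float:
    """Hypothetical multiplicative model: each attribute contributes a score
    in [0, 1] (1 = no impairment) and the scores are multiplied together."""
    return math.prod(attribute_scores.values())


example = dict.fromkeys(LEVELS, 1.0)
example["ambulation"] = 0.8  # hypothetical score for some mobility limitation
example["pain"] = 0.9        # hypothetical score for moderate pain
print(f"{n_states:,} states; example utility {multiplicative_utility(example):.2f}")
```

A multiplicative form lets impairments on several attributes compound, so a person with two limitations scores lower than either limitation alone would imply.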
The QWB integrates several components into a single score 25, 61, 62. Patients are classified according to objective levels of functioning, represented by the scales of mobility, physical activity and social activity. Once the observable, behavioural levels of functioning have been classified, each individual is placed on the 0.0–1.0 scale of wellness, a continuum between optimum function and death.
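The placement step can be pictured as starting from 1.0 (optimum function) and subtracting a preference-weighted decrement for the patient's level on each functioning scale. The sketch follows that additive idea with the three scales named above; the level labels, decrement weights and function name are hypothetical placeholders, not the published QWB weights or components.

```python
# Hypothetical decrement weights for each functioning scale (NOT the published QWB weights).
DECREMENTS = {
    "mobility": {"no limitation": 0.0, "limited": 0.06, "in hospital": 0.09},
    "physical activity": {"no limitation": 0.0, "limited": 0.06, "confined": 0.08},
    "social activity": {"no limitation": 0.0, "limited": 0.06, "self-care only": 0.10},
}


def wellness_score(levels: dict[str, str]) -> float:
    """Additive sketch: 1.0 minus the decrement for each scale's level,
    clipped to the 0.0-1.0 continuum."""
    score = 1.0 - sum(DECREMENTS[scale][level] for scale, level in levels.items())
    return max(0.0, min(1.0, score))


patient = {
    "mobility": "no limitation",
    "physical activity": "limited",
    "social activity": "limited",
}
print(round(wellness_score(patient), 2))
```

Because the classification is based on observable behaviour, two patients at the same functional levels receive the same score, which is what makes the scale standardised.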
Each of these three methods is well validated and can be used in outcome studies for patients with chronic disease. Most importantly, one of these methods is required if the investigator intends to perform a cost-utility study 54.
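A cost-utility analysis typically reports an incremental cost-effectiveness ratio: the difference in cost between two strategies divided by the difference in QALYs. The costs and QALY values below are hypothetical (in arbitrary currency units), the function name is illustrative, and the sketch ignores discounting and uncertainty.

```python
def cost_per_qaly(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio (cost per QALY gained)."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("no QALY difference; ratio undefined")
    return (cost_new - cost_old) / delta_qaly


# Hypothetical: a programme adds 2,000 per patient and 0.04 QALYs over one year.
print(round(cost_per_qaly(12_000, 10_000, 0.79, 0.75)))
```

Expressing benefit in QALYs is what allows a COPD intervention to be compared on the same scale as treatments for other conditions.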
Conclusions
The choice of an outcome should reflect a study's purpose. Studies of basic physiological mechanisms or pharmacological efficacy should use specific outcomes that assess biological variables as close to the site of action or process as possible. In studies where the result of interest is the product of multiple effects or mechanisms, the chosen outcome should be at the point of convergence or the end of a sequence of effects. The outcome should be clinically relevant and provide a measure of overall efficacy and an estimate of clinical value. At present, clinical outcomes, such as breathlessness, exercise capacity and health status, may provide the closest approaches to this ideal.
Several excellent quality-of-life measures are designed specifically for evaluating outcomes in chronic obstructive pulmonary disease patients. Beyond these measures, other, more generic methods are available for estimating outcomes in clinical trials. Utility-based measures offer the extra advantage of contributing to economic analysis, and these methods should be given careful consideration for clinical studies of patients with chronic obstructive pulmonary disease.