Abstract
The endurance time during constant high work-rate exercise (tLIM) is used to assess exercise capacity in patients with chronic obstructive pulmonary disease and as an outcome measure for pulmonary rehabilitation. Our study was designed to establish the minimum clinically important difference for the tLIM.
tLIM was measured in 105 patients (86 males) before and after an 8-week outpatient pulmonary rehabilitation programme. Subjects were asked to identify, from a five-point Likert scale, the perceived change in their exercise performance immediately upon completion of the exercise tests. The scale ranged from “better” to “worse”.
The mean±sd age was 64±5 yrs, forced expiratory volume in 1 s (FEV1) 47±10% and FEV1/forced vital capacity 54.7±16.3%. Baseline tLIM at 75% of the peak work rate was 397±184 s, which increased by 62±63% after rehabilitation. In subjects who felt their exercise tolerance was “slightly better”, the mean improvement over the baseline value was 34 (95% CI 29–39)% or 101 (86–116) s, compared with 121 (109–134)% in those who reported that their exercise tolerance was “better” and 8 (2–14)% in those who felt their exercise tolerance was “about the same”.
The minimum clinically important improvement for tLIM averaged ∼33% of baseline. Patients were able to distinguish at least one additional level of benefit at 120% of baseline.
Limitation of physical activity occupies a central role in the symptom complex of patients with chronic obstructive pulmonary disease (COPD) and, for that reason, improvement in exercise capacity is a key dimension of response to therapy in COPD and other chronic respiratory diseases 1. Standardised exercise testing is a reliable and accurate means of assessing exercise performance both in clinical practice and in large-scale clinical trials 2. Several exercise-testing protocols are available; among them, endurance time during constant high work-rate exercise (tLIM), i.e. above the critical power 3, can be very sensitive in detecting physiological changes induced by interventions 4, 5. However, we have previously shown that both tLIM and the change in tLIM after leg muscle training vary considerably depending on the intensity set for the constant work rate (CWR) 3; when above the “critical power”, the closer tLIM is to such a physiological point (i.e. the less stressful within the high-intensity work-rate domain), the larger the expected change 3, making the test more responsive to interventions.
Published reports of clinical investigations generally provide sufficient information to judge the statistical significance of the treatment effect as measured by functional outcomes. However, the clinical interpretation of these changes would be more meaningful if clinically relevant thresholds (i.e. the minimum cut-off points that are perceived as beneficial by the patients or subjects) were defined 6–8. A pilot attempt to estimate the minimal clinically important difference (MCID) of tLIM at 75% of the peak work rate (tLIM,75) arrived at a tentative figure of 105 s 9.
The aim of this study was, therefore, to define the MCID for tLIM on a cycle-ergometer. A secondary aim was to test whether different CWR intensities (75% and 85% of the peak ramp work rate) performed differently.
METHODS
Patients diagnosed with COPD 10 according to the reference values used in our laboratory were sent to our centre for rehabilitation at the Hospital Universitario Gregorio Marañón, Madrid, Spain, and were selected for the study if they were currently nonsmoking, stable and not in need of chronic oxygen therapy 11. Subjects were excluded if desaturation (<85% by pulse oximetry) was accompanied by arrhythmia or chest pain, or if the patient did not complete the training programme. All subjects signed an informed consent form as approved by our Institutional Committee for Ethics in Human Research (Comité de Ética en la Investigación Clínica del Área 1, Madrid).
The outpatient rehabilitation programme consisted of leg training on the cycle-ergometer for 45 min·day−1 divided into as many as three bouts, 3 days·week−1, for 8 weeks. The training began at a work rate equal to 70% of the maximal work rate (WRmax) achieved on the baseline incremental exercise test (described below). When the subjects were able to tolerate the work rate for the 45 min session, we attempted to incrementally increase the work rate by 5 W every week.
Incremental exercise tests were performed on an electromagnetically braked cycle-ergometer (ER-900; Jaeger, Hochberg, Germany) using a 1-min step protocol 12 at 10 W·min−1 to a symptom-limited maximum. Ventilation and pulmonary gas exchange were measured breath-by-breath by a cardiopulmonary exercise system (Oxycon α; Jaeger). WRmax was defined as the maximum work rate that could be sustained for at least 30 s. At least 1 day after the incremental test, the subjects performed the two CWR tests on different days at 75% and 85% of the WRmax from the pre-training incremental test. The order of the CWR tests was randomised. All the exercise tests (both incremental and constant) were standardised with respect to the proper seat adjustment relative to leg length and pedalling cadence (60 rpm). The CWR endurance test protocol was 1 min of rest and 1 min of unloaded cycling before the pre-set work rate was instituted. The tests were terminated when, after standardised encouragement, the subjects were unable to continue because of symptoms. If the initial tLIM was <120 s or >1,200 s in either of the two tests, the work rate was increased or decreased in both the 75% and 85% tests by 10% of WRmax or by 5 W, whichever was smaller. In total, 13 (12%) of the study subjects needed work-rate adjustment, but none needed more than one adjustment. Test–retest reliability of the 75% and 85% CWR endurance and ventilatory responses was assessed in 25 consecutive subjects by repeating the tests within 2–3 days after the first one. All these tests were performed prior to the start of the leg training programme. Post-rehabilitation tests were performed within a week of the last session.
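The work-rate adjustment rule described above can be illustrated with a short sketch. The function name and arguments are hypothetical, and the direction of the adjustment (lowering the load after a test shorter than 120 s, raising it after a test longer than 1,200 s) is an assumption, since the text states only that the work rate was "increased or decreased".

```python
def adjust_work_rate(current_wr, wr_max, t_lim):
    """Return the work rate (W) for a repeat CWR test.

    current_wr: work rate used in the initial CWR test (W)
    wr_max:     maximal work rate from the incremental test (W)
    t_lim:      endurance time of the initial CWR test (s)
    """
    # Step size: 10% of WRmax or 5 W, whichever is smaller.
    step = min(0.10 * wr_max, 5.0)
    if t_lim < 120:
        # Assumption: a test that is too short calls for a lower load.
        return current_wr - step
    if t_lim > 1200:
        # Assumption: a test that is too long calls for a higher load.
        return current_wr + step
    # Endurance time within 120-1,200 s: no adjustment needed.
    return current_wr
```

For a subject with a WRmax of 80 W, the step is capped at 5 W; for a subject with a WRmax of 40 W, the step is 4 W (10% of WRmax).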
Subjective ratings of exercise capacity change after the intervention were measured by a Likert scale containing seven categories: “no change”, “a little bit better (or worse)”, “somewhat better (or worse)”, and “much better (or worse)” 7, 13, and administered by the therapist after the training programme. As done previously by Singh et al. 8, we grouped the categories “somewhat better” and “much better” into “better” and “somewhat worse” and “much worse” into “worse” to increase the power of the sample.
Quality of life was measured in all subjects by the same investigator using a version of the Chronic Respiratory Questionnaire (CRQ) 14 translated into, and validated in, Spanish 15. The CRQ was administered just before the exercise tests. Clinically relevant changes in the dyspnoea score (CRQ-D) have been found to be 0.5 per item 13, so that a sum total of 2.5 points for the five items was considered relevant. Accordingly, four categories of CRQ-D response were defined: ≤ -2.5, -2.4–2.4, 2.5–4.9 and >5 of the baseline CRQ-D score 16.
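As an illustration of how the CRQ-D threshold and response categories fit together, the following minimal sketch classifies a change in the summed CRQ-D score. The function name and the category labels are hypothetical; assigning a change of exactly 5 points to the top category is also an assumption, as the text gives that category only as ">5".

```python
ITEMS = 5                 # number of dyspnoea items in the CRQ-D
MCID_PER_ITEM = 0.5       # clinically relevant change per item
THRESHOLD = ITEMS * MCID_PER_ITEM  # 2.5 points on the summed score


def crq_d_category(change):
    """Classify a change in the summed CRQ-D score (post minus baseline)."""
    if change <= -THRESHOLD:
        return "relevant deterioration"   # <= -2.5
    if change < THRESHOLD:
        return "no relevant change"       # -2.4 to 2.4
    if change < 2 * THRESHOLD:
        return "relevant improvement"     # 2.5 to 4.9
    # Assumption: exactly 5 points is grouped with larger improvements.
    return "large improvement"
```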
We integrated certain statistical indices with the two clinical benchmarks described above (subjective ratings of improvement and CRQ-D score) to “triangulate” the minimum relevant changes, as suggested by Leidy and Wyrwich 16. The statistical indices used were the percentage change from baseline and, as a distribution-based approach, the effect size (defined as the change from baseline to the end of the treatment divided by the sd of the whole group at baseline). Therefore, an effect size of 1 corresponds to an increase equal to one sd of the parameter at baseline. In addition, we calculated the “optimal” tLIM cut-off points (i.e. those that maximised the area under the receiver operating characteristic curve (AUC)) to detect a subject's changes in their self-rating from an “about the same” or lower score to an “a little bit better” or higher score, for the subjective ratings of improvement, and changes of <2.5 points versus improvements of ≥2.5 for the CRQ-D. Sensitivities and specificities of such cut-off points were also calculated.
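The indices described above can be sketched in a few lines of Python. This is an illustrative sketch, not the software used in the study: all function names are hypothetical, the sample (n−1) standard deviation is assumed for the effect size, and the cut-off search assumes that, for a single threshold, the AUC equals (sensitivity + specificity)/2, so that maximising it is equivalent to maximising Youden's index.

```python
from statistics import mean, stdev


def percent_change(baseline, post):
    """Change from baseline, expressed as a percentage of baseline."""
    return 100.0 * (post - baseline) / baseline


def effect_size(baselines, posts):
    """Mean change divided by the sd of the whole group at baseline."""
    changes = [p - b for b, p in zip(baselines, posts)]
    return mean(changes) / stdev(baselines)  # assumption: sample sd


def best_cutoff(changes, improved):
    """Find the change in tLIM that best separates improved subjects.

    changes:  change in tLIM for each subject (s or %)
    improved: True if the subject met the clinical benchmark
    Returns (cutoff, auc, sensitivity, specificity).
    Both groups (improved and not improved) must be non-empty.
    """
    best = None
    for cutoff in sorted(set(changes)):
        pairs = list(zip(changes, improved))
        tp = sum(1 for x, y in pairs if x >= cutoff and y)
        fn = sum(1 for x, y in pairs if x < cutoff and y)
        tn = sum(1 for x, y in pairs if x < cutoff and not y)
        fp = sum(1 for x, y in pairs if x >= cutoff and not y)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        auc = (sens + spec) / 2  # AUC of a single-threshold classifier
        if best is None or auc > best[1]:
            best = (cutoff, auc, sens, spec)
    return best
```

With this definition, a group whose mean tLIM rises by exactly one baseline sd has an effect size of 1, matching the interpretation given in the text.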
tLIM exercise variables and their changes after intervention were normally distributed and are described well by their mean±sd values. The relationships between changes in tLIM, subjective global scale of exercise capacity and CRQ were assessed using the nonparametric Spearman's rank correlation coefficient “ρ”. Reliability was evaluated by means of Bland–Altman graphical analysis 17, intra-class correlation coefficients and paired t-tests. Variation coefficients were also calculated. The sample size was estimated for an α error of 0.05 and a power of 0.8 from the mean differences observed after recruiting the first 70 subjects (mean values for “worse”, “slightly worse”, “about the same”, “slightly better” and “better” were (in s): -100 (-134− -65), -23.8 (-33− -14), 39 (26–53), 102 (89–116) and 544 (474–614), respectively, with an overall standard deviation of 262 s). From these results we considered it reasonable to power the study to detect differences with one-way ANOVAs of at least 60 s between all groups except for the last two (slightly better and better), for which detection of a difference of 120 s was considered enough due to the large differences seen in the pilot study and a distribution of the sample over the categories of 10, 10, 20, 30 and 30%, respectively. The estimated sample size needed was 12, 12, 23, 34, 34 or 105 subjects, respectively. The required sample size was calculated using Ene 2.0 software (GSK Servei d'Estadística de la UAB; www.e-biometria.com/ene-ctm/index.htm).
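The Bland–Altman analysis listed among the reliability methods above reduces to a bias (the mean test–retest difference) and its 95% limits of agreement. The sketch below is a minimal illustration with a hypothetical function name; the multiplier of 1.96 is the conventional choice and is an assumption here, as the text does not state it.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired measurements.

    test, retest: endurance times (s) from the first and repeated test.
    """
    diffs = [r - t for t, r in zip(test, retest)]
    bias = mean(diffs)   # systematic difference between repetitions
    sd = stdev(diffs)    # sample sd of the paired differences
    # Conventional 95% limits of agreement (assumed multiplier: 1.96).
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A positive bias would correspond to longer endurance times on the second repetition.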
RESULTS
We studied 105 COPD subjects (86 males); 22 met the Global Initiative for Chronic Obstructive Lung Disease (GOLD) II criteria and the rest, who were moderately hyperinflated and had moderate reductions in exercise capacity, met GOLD III (table 1) 10. Patients were recruited until the total estimated number was reached, even if the expected number in the “worse” and “a little bit worse” groups could not be reached. All the participants who tolerated the 75% test were able to tolerate the 85% tests (i.e. were able to endure the 85% test for >2 min).
Both CWR tests showed excellent test–retest reliability (fig. 1), with a tendency to increase tLIM in the second repetition (table 2). Highly significant intra-class correlation coefficients (≥0.85; p<0.001) were found between test and retest assessments of endurance time, as well as of end-exercise ventilatory responses.
For the 75% and 85% CWR endurance tests, pre-treatment tLIM was 397±184 s and 315±194 s, respectively. The average increases in tLIM after the intervention were 289±311 s and 138±147 s, or 62±63% and 48±57%, respectively, for the 75% and 85% CWR tests.
The distribution of responses to the question about perceived improvement was as follows: “better”, 33% (n = 35); “slightly better”, 32% (n = 33); “about the same”, 24% (n = 25); “slightly worse”, 6% (n = 7); and “worse”, 5% (n = 5). There were significant differences (p<0.05) in baseline CRQ-D scores and tLIM between the “better” group and the other groups.
The average improvement in the CRQ-D score was 5.5±5.1. Of the subjects, ∼75% improved >2.5 points in their scores (table 3). Improvements in tLIM correlated well with the changes observed in the CRQ-D score (ρ = 0.65, p<0.001 for the 75% CWR test and ρ = 0.61, p<0.001 for the 85% CWR test). A very good correlation between changes in CRQ-D and perceived improvement was found (ρ = 0.89).
The mean changes in daily living dyspnoea, as measured by the CRQ-D, and in perceived exercise tolerance are described in tables 3 and 4, respectively. Patients needed to improve by 34 (29–39)% or 101 (86–116) s in the 75% CWR test and by 31 (25–34)% or 67 (61–85) s in the 85% CWR test to perceive their tolerance to exercise as “slightly better”. A further difference was found at 121 (109–134)% or 521 (472–571) s for the 75% CWR and at 92 (81–103)% or 299 (263–335) s for the 85% CWR. When the effects were reported as absolute time, a relationship (r≈0.4) between baseline tLIM and the magnitude of the improvements was observed (fig. 2); the apparent influence of the initial performance on the magnitude of the effects disappeared when they were standardised as a percentage over the baseline.
The “optimum” tLIM cut-off points, by receiver operating characteristic analysis (sensitivity 0.97, specificity 0.93), to detect a subject's changes in their self-rating from “about the same” to “a little bit better” were 90 s for tLIM,75 (sensitivity 0.9, specificity 1.0) and 76 s (sensitivity 0.84, specificity 1.0) for tLIM,85. On analysing the percentages, the best cut-off points were 27% (sensitivity 0.87, specificity 0.9) for tLIM,75 and 23% (sensitivity 0.83, specificity 0.9) for tLIM,85. From the point of view of their classifying power, the AUC was slightly better (AUC 0.94 (95% CI 0.92–0.98)) for the 75% CWR test compared with the 85% CWR test (AUC 0.85 (95% CI 0.82–0.88)).
The cut-off points were similar: 30 (26–35)% or 90 (74–106) s for the 75% CWR test and 26 (21–30)% or 55 (39–73) s for the 85% CWR test when using changes in CRQ-D ≥2.5 points as the clinical benchmark of improvement.
DISCUSSION
Our study showed that both tLIM,75 and endurance time during the constant work-rate test at 85% of the peak work rate (tLIM,85) were quite reproducible (fig. 1), which could be explained, in part, by the subjects becoming acquainted with the testing environment and procedures in the previously performed baseline incremental test. In addition, we integrated several clinical and statistical approaches to “triangulate” the MCID 16, and the different methods rendered narrow ranges of values of important differences or cut-off points for both tLIM,75 and tLIM,85 (tables 3 and 4). For tLIM,75, for instance, subjects needed to improve by 101 (86–116) s or 34 (29–39)% to rate themselves as clinically slightly improved (table 4). The other approaches gave quite similar results: 30 (26–35)% or 90 (74–106) s for a significant increase in the CRQ-D score (table 3), and an “optimum” cut-off point, calculated as described in the Methods, of 90 s. Additional levels of benefit could be perceived by those with larger improvement (tables 3 and 4). Finally, we found that changes in tLIM,75 of -12 (-28–3.5)% or -29 (-36– -22) s were perceived by the subjects as a slight deterioration, but the sample size of the group who deteriorated was very small.
We found that, when measured in seconds, the MCID was dependent on the baseline state, but this was not the case when it was expressed as a percentage of baseline (fig. 2). Furthermore, when expressed as a percentage of baseline, the MCIDs were similar at both intensities, suggesting that the percentage of change over baseline is a better way of standardising the response to interventions.
tLIM during high CWR exercise (i.e. at 75–85% of the WRmax) has previously been shown to be reliable 18 and highly responsive to therapeutic interventions, such as bronchodilator therapy 5, 19–22, oxygen 23, heliox administration during exercise 24 and rehabilitation 3, 4. Work rates utilised in recent clinical studies were 75–85% of maximal oxygen uptake or WRmax measured during symptom-limited incremental cardiopulmonary exercise testing 5, 18–22, 24. Changes much larger than the proposed cut-off points are typically seen in patients after rehabilitation programmes that include intense leg training 2–4.
One limitation of our study is that we did not perform a prospective validation of the criteria obtained in this work for an independent population in different settings. Secondly, in the present study, we did not test the minimum clinically important difference for bronchodilator therapy. Prior work has demonstrated that small improvements in tLIM can be achieved with this therapy 5, 19–22, but effects >100 s are unusual; for example, the mean increase in endurance time in a large-scale tiotropium trial was observed to be 110 s compared with placebo 19. The physiological determinants of improvement after leg training are likely to be different from those after bronchodilator therapy 3, 19. Thirdly, we had to readjust the work rate in 13 (12%) subjects. Based on previous experience 3, 4, we considered that tests of <2 min might be unreliable and that tests of >20 min might not be stressful enough for the individual, so that the subject might terminate them for different physiological or psychological reasons (boredom, seat pain, exhaustion of the muscles, etc.). Furthermore, while tests <2 min do not allow time for certain additional measurements, tests >20 min are impractical. Therefore, we faced a dilemma: either to exclude the subjects or to retest them. It is doubtful that these patients represent “outliers” to the COPD population; rather, they represent either an under- or overestimation of the target work rate. Moreover, we believe that in both clinical practice and trials, similar adjustments are and will be common and thus our data could be more readily extrapolated if we included such subjects. Fourthly, only ∼20% of subjects enrolled were female, although this proportion is consistent with the published gender distribution of COPD in Spain 25, 26. Finally, in the present paper, we studied patients who were mostly symptomatically severe (GOLD III) and, therefore, inferences about other populations of COPD subjects have to be made with caution. However, the types of patients included in the present study are those most frequently included in rehabilitation programmes.
In summary, our study has found that CWR tLIM is highly reproducible. We have also identified that an increase of ∼33% over baseline (∼100 s in the 75% CWR test and ∼70 s in the 85% CWR test) is an important threshold for clinically relevant changes in tLIM.
Support statement
This work was partly funded by the FIS PI052562 Grant.
Statement of interest
None declared.
- Received May 24, 2008.
- Accepted February 17, 2009.
- © ERS Journals Ltd