Lung function decline in COPD trials: bias from regression to the mean
- ¹McGill Pharmacoepidemiology Research Unit, Jewish General Hospital, Depts of ²Epidemiology and Biostatistics, and ³Medicine, McGill University, Montreal, QC, Canada.
- S. Suissa, Division of Clinical Epidemiology, Royal Victoria Hospital, 687 Pine Avenue West, Ross 4.29, Montreal, QC, Canada H3A 1A1. Fax: 1 5148431493. E-mail: samy.suissa@clinepi.mcgill.ca
The decline in lung function over time is a fundamental measure of disease progression among patients with chronic obstructive pulmonary disease (COPD). As a result, decline in forced expiratory volume in one second (FEV1) has been used as a central outcome measure in many randomised controlled trials evaluating whether pharmacological treatments can modify the natural history of COPD. These trials, which mostly assessed the benefit of inhaled corticosteroids, and their meta-analyses have highlighted the complexities of analysing repeated FEV1 measurements over time, particularly when COPD patients discontinue follow-up early and in large numbers. This has led to contradictory results and divergent conclusions 1–3. Most recently, a paper from the Towards a Revolution in COPD Health (TORCH) trial reported that the yearly decline in FEV1 was significantly slower with fluticasone, salmeterol or both than with placebo 4.
A common misconception in interpreting these studies is that the effect of the study drug on FEV1 decline is likely to be greater than the data suggest. The rationale is that more patients receiving placebo discontinued the study medication, and that patients who dropped out early had a steeper decline in lung function than those who remained. Consequently, it has generally been believed that the resulting analysis “actually minimises the differences observed in the rate of FEV1 decline” 4.
To clarify this misconception, a fundamental aspect of the design and statistical analysis of these trials is addressed here, namely bias from regression to the mean resulting from the absence of an authentic intent-to-treat approach. The bias is described using the TORCH trial as an example and illustrated using data from the Canadian Optimal randomised trial.
BIAS FROM REGRESSION TO THE MEAN
Most trials to date were not designed for a full intent-to-treat analysis of lung function decline. The TORCH trial, for example, measured FEV1 only until patients discontinued treatment. It randomised 6,112 patients with moderate-to-severe COPD to one of four treatment groups (fluticasone, salmeterol, both or placebo) and followed them for 3 yrs. However, the lung function analysis involved only 5,343 subjects, with 26,539 measurements of post-bronchodilator FEV1, made twice a year during follow-up, used to estimate the rate of FEV1 decline between 6 months and 3 yrs after randomisation. Of the 36,672 measurements that the study could have yielded (six per patient), 10,133 were therefore missing, so a pure intent-to-treat analysis was not possible. These missing measurements are unlikely to be a random sample of all possible measurements, i.e. they are not missing at random, which can generate two forms of bias from regression to the mean 5.
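The measurement counts above can be checked with a few lines of arithmetic. This minimal sketch assumes, as the text implies, six scheduled FEV1 measurements per patient (at 6, 12, 18, 24, 30 and 36 months); the function name `expected_measurements` is hypothetical.

```python
def expected_measurements(n_randomised: int, years: float,
                          first_visit_months: int = 6, interval_months: int = 6) -> int:
    """Scheduled FEV1 measurements if every randomised patient stayed in follow-up."""
    visits_per_patient = (int(years * 12) - first_visit_months) // interval_months + 1
    return n_randomised * visits_per_patient


expected = expected_measurements(6112, 3)   # 6 visits per patient x 6,112 patients
observed = 26539                            # measurements actually analysed in TORCH
missing = expected - observed
print(expected, missing, f"{missing / expected:.1%}")
```

On this count, roughly 28% of the scheduled measurements were unavailable for the lung function analysis.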
The first form of the bias results from excluding some subjects altogether. Nearly 18% of patients allocated to placebo did not contribute a single FEV1 value because they discontinued placebo before the first FEV1 measurement at the 6-month visit. In contrast, only 9% of patients allocated to combination therapy did not reach this first 6-month visit. It is generally accepted that these excluded patients would have had the worst FEV1 values at the first visit had they been measured. Thus, the slope of decline in the remaining subjects, who had better FEV1 values at the first visit, may have been affected by regression to the mean. This phenomenon is illustrated below.
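A small simulation can show why excluding the patients with the lowest first-visit FEV1 inflates the apparent decline. The sketch is hypothetical and uses invented parameters, not TORCH or Optimal data. Each simulated patient has a true FEV1 level, every patient declines by exactly the same true amount, and each visit adds independent measurement error. A low measured value at the first visit partly reflects a negative measurement error, so the next measurement tends to be higher (regression to the mean). Dropping the lowest first-visit values therefore leaves a group whose apparent decline is too steep.

```python
import random
import statistics

def simulate_first_last(n=20_000, true_decline=40.0, mean_fev1=1150.0,
                        sd_between=350.0, sd_error=150.0, seed=1):
    """Simulate (first-visit FEV1, observed decline) pairs in mL.

    All parameter values are hypothetical. Every subject has the same true
    decline; only independent measurement error differs between visits.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_level = rng.gauss(mean_fev1, sd_between)                # true FEV1 at the first visit
        first = true_level + rng.gauss(0.0, sd_error)                # measured at the first visit
        last = true_level - true_decline + rng.gauss(0.0, sd_error)  # measured one year later
        pairs.append((first, first - last))                          # positive value = decline
    return pairs

def mean_decline_excluding_lowest(pairs, fraction):
    """Mean decline among kept and excluded subjects when the lowest
    `fraction` of first-visit FEV1 values is dropped."""
    ordered = sorted(pairs, key=lambda p: p[0])
    k = round(fraction * len(ordered))
    kept = [d for _, d in ordered[k:]]
    excluded = [d for _, d in ordered[:k]]
    return statistics.fmean(kept), (statistics.fmean(excluded) if excluded else float("nan"))

pairs = simulate_first_last()
full, _ = mean_decline_excluding_lowest(pairs, 0.0)
kept_9, excl_9 = mean_decline_excluding_lowest(pairs, 0.09)
kept_18, excl_18 = mean_decline_excluding_lowest(pairs, 0.18)
print(f"all subjects:       {full:.1f} mL")
print(f"exclude lowest 9%:  kept {kept_9:.1f} mL, excluded {excl_9:.1f} mL")
print(f"exclude lowest 18%: kept {kept_18:.1f} mL, excluded {excl_18:.1f} mL")
```

Under these assumptions, the retained subjects show a decline steeper than the true 40 mL and the excluded subjects appear to improve. This is qualitatively the pattern reported for the Optimal data below. The exact numbers depend entirely on the invented parameters.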
The second form of this bias results from discontinuing the follow-up of patients who have an initial FEV1 value measured but are missing some subsequent values; these measurements are also unlikely to be missing at random. In the TORCH study, placebo patients who discontinued before the end of follow-up had a faster decline in FEV1 (76 mL·yr−1) than those completing the trial (54 mL·yr−1) 4. Here again, these slopes of decline may have been affected by regression to the mean.
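The second form can be sketched in the same hypothetical way. This version uses four visits timed as in the Optimal illustration (weeks 4, 20, 36 and 52) and treats subjects whose second-visit FEV1 is among the lowest 20% as lost. The annual decline is estimated as minus the least-squares slope across all four visits, multiplied by 52 weeks. All parameter values and function names are invented, and regression to the mean arises here purely from measurement error.

```python
import random
import statistics

WEEKS = (4, 20, 36, 52)                          # visit times (weeks), as in the illustration
T_BAR = sum(WEEKS) / len(WEEKS)
SXX = sum((t - T_BAR) ** 2 for t in WEEKS)

def simulate_visits(n=50_000, true_decline=40.0, mean_fev1=1150.0,
                    sd_between=350.0, sd_error=150.0, seed=2):
    """Four measured FEV1 values (mL) per subject; all parameters hypothetical.

    Every subject loses FEV1 linearly at the same true rate (mL/yr);
    each visit adds independent measurement error.
    """
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        level = rng.gauss(mean_fev1, sd_between)  # true FEV1 at visit 1
        subjects.append([level - true_decline * (w - WEEKS[0]) / 52 + rng.gauss(0.0, sd_error)
                         for w in WEEKS])
    return subjects

def annual_decline(values):
    """Minus the least-squares slope over the four visits, rescaled to mL/yr."""
    v_bar = sum(values) / len(values)
    sxy = sum((t - T_BAR) * (v - v_bar) for t, v in zip(WEEKS, values))
    return -(sxy / SXX) * 52

subjects = simulate_visits()
ranked = sorted(subjects, key=lambda v: v[1])    # rank by second-visit FEV1
k = round(0.20 * len(ranked))
excluded, kept = ranked[:k], ranked[k:]          # lowest 20% at visit 2 are treated as lost

all_rate = statistics.fmean([annual_decline(v) for v in subjects])
kept_rate = statistics.fmean([annual_decline(v) for v in kept])
excluded_rate = statistics.fmean([annual_decline(v) for v in excluded])
excluded_early = statistics.fmean([v[0] - v[1] for v in excluded])  # visit 1 minus visit 2

print(f"all subjects, 4-visit slope:      {all_rate:.1f} mL/yr")
print(f"retained subjects, 4-visit slope: {kept_rate:.1f} mL/yr")
print(f"excluded subjects, 4-visit slope: {excluded_rate:.1f} mL/yr")
print(f"excluded subjects, visit 1 to 2:  {excluded_early:.1f} mL")
```

With these invented parameters, the retained subjects' slope exceeds the true 40 mL·yr−1. The excluded subjects show a sharp early drop followed by a much flatter four-visit slope. This is the same qualitative pattern as the Optimal results in Table 2.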
ILLUSTRATION
Data from the Canadian Optimal study, a three-arm randomised trial of 449 patients with moderate or severe COPD, are used to illustrate the bias 6. Post-bronchodilator FEV1 was measured at randomisation (visit 0) and at 4, 20, 36 and 52 weeks thereafter during the 1-yr follow-up (visits 1–4). The 322 subjects with FEV1 measurements at all visits were used in this illustration. FEV1 decline was measured from visit 1 onwards, and the rate of decline was estimated in two ways: 1) as the difference in FEV1 between visits 4 and 1; and 2) using measurements from all four visits in a mixed linear regression model accounting for within-subject correlation, with the slope standardised to a 1-yr time span.
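The two per-subject estimates of decline can be illustrated on one hypothetical patient. The sketch computes (1) the visit 1 minus visit 4 difference and (2) an ordinary least-squares slope across the four visits, rescaled from mL per week to mL per year by multiplying by 52. The FEV1 values are invented, and the 52-week scaling is an assumption about how the slope was standardised. This per-subject slope is a simpler stand-in for the mixed linear regression used in the study. With complete data at identical visit times, as for the 322 subjects here, the average of the per-subject slopes should coincide with the fixed-effect slope of a linear mixed model with random intercepts and slopes and independent residual errors. That equivalence breaks down once visits are missing.

```python
WEEKS = (4, 20, 36, 52)   # visits 1-4, weeks after randomisation

def endpoint_decline(fev1):
    """Method 1: FEV1 at visit 1 minus FEV1 at visit 4 (mL; positive = loss)."""
    return fev1[0] - fev1[-1]

def slope_decline_per_year(fev1, weeks=WEEKS, weeks_per_year=52):
    """Method 2 (simplified): minus the least-squares slope over all visits,
    rescaled from mL/week to mL/yr (positive = loss)."""
    t_bar = sum(weeks) / len(weeks)
    v_bar = sum(fev1) / len(fev1)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(weeks, fev1))
    sxx = sum((t - t_bar) ** 2 for t in weeks)
    return -(sxy / sxx) * weeks_per_year

patient = (1200, 1190, 1175, 1160)   # hypothetical FEV1 values (mL) at visits 1-4
print(endpoint_decline(patient))
print(slope_decline_per_year(patient))
```

The endpoint difference covers the 48 weeks between visits 1 and 4, whereas the slope is rescaled to a full year. The two estimates can therefore differ even for a patient declining at a steady rate.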
The 322 patients had a mean FEV1 of 1,131 mL at visit 1 and a mean decline in FEV1 of 38.7 mL from visit 1 to visit 4. There was a significant correlation of 0.33 between this decline and FEV1 at visit 1: patients with the highest initial FEV1 tended to have the greatest decline, and those with the lowest initial FEV1 the smallest. Figure 1 depicts this correlation, showing that patients in the highest quartile of initial FEV1 (>1,440 mL) had the largest decline (mean 119 mL), while patients in the lowest quartile (<770 mL) in fact showed an improvement of 32 mL.
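The two summaries in this paragraph can be computed as below: the correlation between first-visit FEV1 and subsequent decline, and the mean decline within quartiles of first-visit FEV1. The eight (baseline, decline) pairs are invented purely to exercise the code. Quartiles are formed by rank, which is only one of several reasonable conventions; the original analysis used cut-points whose method is not stated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_decline_by_quartile(baseline, decline):
    """Mean decline within rank-based quartiles of baseline FEV1 (lowest quartile first)."""
    order = sorted(range(len(baseline)), key=lambda i: baseline[i])
    groups = [[] for _ in range(4)]
    for rank, i in enumerate(order):
        groups[rank * 4 // len(order)].append(decline[i])
    return [sum(g) / len(g) for g in groups]

# Hypothetical data: first-visit FEV1 (mL) and decline to the last visit (mL; negative = improvement)
baseline = [600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
decline = [-40, -20, 20, 40, 80, 120, 100, 140]
print(round(pearson_r(baseline, decline), 2))
print(mean_decline_by_quartile(baseline, decline))
```

A positive correlation with an apparent improvement in the lowest quartile is the signature that regression to the mean would leave in such data.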
Table 1 shows that the overall 1-yr decline in FEV1 estimated using measurements from all four visits was 38.6 mL (p = 0.001). If the 18% of patients with the lowest FEV1 at visit 1 (<700 mL) are excluded, in keeping with the hypothesis that patients in poorer health are more likely to leave the study, the 1-yr rate of FEV1 decline among the remaining subjects becomes 52.2 mL, a clear overestimate. If only the 9% of patients with the lowest FEV1 at visit 1 (<630 mL) are excluded, analogous to the lower exclusion rate with combination therapy in TORCH, the 1-yr rate of decline among the remaining subjects is 44.1 mL, still an overestimate but less marked. By contrast, the two groups of excluded subjects show a mean increase in FEV1 of 40.7 and 52.5 mL, respectively.
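The size of the overestimation in Table 1 can be expressed relative to the 38.6 mL estimate from all subjects. The calculation below uses only the figures quoted in the text. The dictionary labels and the helper `relative_excess` are descriptive names introduced for illustration, not taken from the original table.

```python
ALL_SUBJECTS = 38.6   # 1-yr FEV1 decline (mL) estimated from all 322 subjects

# 1-yr decline (mL) among the remaining subjects after each exclusion, as quoted in the text
after_exclusion = {
    "lowest 18% excluded (<700 mL)": 52.2,
    "lowest 9% excluded (<630 mL)": 44.1,
}

def relative_excess(estimate, reference=ALL_SUBJECTS):
    """Overestimation of the decline as a fraction of the all-subject estimate."""
    return (estimate - reference) / reference

for label, decline in after_exclusion.items():
    print(f"{label}: +{decline - ALL_SUBJECTS:.1f} mL ({relative_excess(decline):.0%} overestimate)")
```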
Table 2 addresses subjects who have an FEV1 value at visit 1 but possibly missing values thereafter. If the 20% of patients with the lowest FEV1 at visit 2 (<740 mL) are excluded, the 1-yr rate of FEV1 decline among the remaining subjects becomes 50.3 mL. These excluded subjects had a decline in FEV1 of 50.6 mL between the first two visits (the initial 4-month period), but no significant decline over all four visits (1.7 mL).
CONCLUSION
The intent-to-treat principle for randomised controlled trials is fundamental to avoiding bias. Selection bias can occur when patients are allowed to leave the study once they discontinue the study medication. A more severe form of this bias occurs when randomised patients are excluded from the analysis altogether.
The TORCH study performed an authentic intent-to-treat analysis for the outcome of mortality by assessing the survival status of all 6,112 randomised patients over the entire 3-yr follow-up period. However, the recent TORCH analysis of lung function was not a true intent-to-treat analysis, since it was based on FEV1 values from only 5,343 of the 6,112 randomised patients. Moreover, twice as many patients randomised to placebo as to combination therapy were excluded altogether, and many patients discontinued early with no further FEV1 measurements. This paper shows that such differential exclusion rates can introduce selection bias if the basis for exclusion is associated with the outcome, in this case decline in FEV1.
The current illustration can help interpret the TORCH findings. Assume, for instance, that the true FEV1 decline in the TORCH study was 38.6 mL·yr−1 in every treatment arm. Excluding 18% of placebo patients, most likely those with the lowest initial FEV1 values, could yield a mean decline of 52.2 mL·yr−1 among the remaining subjects, whereas excluding 9% of the treated patients would change the decline to 44.1 mL·yr−1. Thus, instead of comparing 38.6 with 38.6 mL·yr−1, a study based on such incomplete groups would mistakenly compare 44.1 with 52.2 mL·yr−1 and infer that the declines differ. In addition, among patients with an initial FEV1 value, the TORCH placebo patients who discontinued before the end of follow-up had a decline of 76 mL·yr−1, compared with 54 mL·yr−1 for those completing the trial. In the present illustration, the patients most likely to discontinue had a 4-month decline in FEV1 of 50.6 mL (extrapolated to 152 mL over 1 yr), compared with a 1-yr decline of 50.3 mL among the remaining subjects; both differ substantially from the true 1-yr decline of 38.6 mL. Incidentally, this second form of the bias may explain the differing initial spikes in FEV1 seen just after randomisation in these trials. Such bias is not eliminated by advanced techniques of data analysis, such as mixed linear regression, that account for within-subject correlation and variable numbers of FEV1 measurements per subject.
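The arithmetic of this hypothetical scenario can be made explicit. The figures below are those quoted above. The ×3 extrapolation follows the text in treating the interval between visits 1 and 2 as 4 months.

```python
TRUE_DECLINE = 38.6          # mL/yr, assumed identical in every arm
placebo_apparent = 52.2      # mL/yr after excluding the 18% with the lowest initial FEV1
treated_apparent = 44.1      # mL/yr after excluding the 9% with the lowest initial FEV1

# Spurious "treatment effect" created purely by differential exclusion
spurious_effect = placebo_apparent - treated_apparent

# Early decline among likely dropouts, extrapolated from 4 months to 12 months
early_decline_4_months = 50.6
extrapolated_1_yr = early_decline_4_months * 12 / 4

print(f"true difference between arms: {TRUE_DECLINE - TRUE_DECLINE:.1f} mL/yr")
print(f"apparent difference:          {spurious_effect:.1f} mL/yr")
print(f"extrapolated early decline:   {extrapolated_1_yr:.0f} mL/yr")
```

An apparent 8.1 mL·yr−1 difference between arms emerges even though, by construction, the true difference is zero.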
While the TORCH study may provide some evidence that pharmacological therapy could modify the decline in lung function, its findings could also reflect an artefact of regression to the mean. Indeed, the decline was estimated after excluding patients, in a context where patients with the best initial FEV1 value generally have the greatest decline in FEV1, and those with the poorest initial FEV1 value have the smallest decline, or even an increase. In such a context, the belief that the effect of the study drug on FEV1 decline is greater than the data suggest is a misconception; the effect of the drug could even be nil. Regression to the mean can clearly bias randomised trials of COPD treatment, so proper attention to the intent-to-treat principle is crucial to avoid this bias and provide valid data. This will hopefully be addressed in the upcoming Understanding Potential Long-term Impacts on Function with Tiotropium (UPLIFT) trial 7.
Support statement
S. Suissa is a Distinguished Investigator of the Canadian Institutes of Health Research (CIHR).
Statement of interest
A statement of interest for S. Suissa can be found at www.erj.ersjournals.com/misc/statements.shtml
Acknowledgments
The author would like to thank S. Aaron (University of Ottawa, Ottawa, ON, Canada) and the Canadian OPTIMAL trial group for kindly providing data used in this editorial and P. Ernst (McGill University, Montreal, QC, Canada) who provided crucial comments.
- © ERS Journals Ltd