Abstract
The TORCH (Towards a Revolution in COPD Health) trial has highlighted some important issues in the design and analysis of long-term trials in chronic obstructive pulmonary disease. These include collection of off-treatment exacerbation data, analysis of exacerbation rates and the effect of inclusion of patients receiving inhaled corticosteroids (ICS) prior to randomisation.
When effective medications are available to patients who withdraw, inclusion of off-treatment data can mask important treatment effects on exacerbation rates. Analysis of on-treatment data avoids this bias but it needs to be combined with careful analysis of withdrawal patterns across treatments.
The negative binomial model is currently the best approach to statistical analysis of exacerbation rates, while analysis of time to exacerbation can supplement this approach. In the TORCH trial, exacerbation rates were higher among patients with previous use of ICS than among those with no prior use, in all study treatment arms. Retrospective subgroup analysis suggests that ICS reduced exacerbation rates compared with placebo, regardless of prior use of ICS before entry to the study.
Factorial analysis provides an alternative analysis for trials with combinations of treatments, but assumes no interaction between treatments, an assumption which cannot be verified by a significance test. No definitive conclusions can yet be drawn on whether ICS treatment has an effect on mortality.
The TORCH (Towards a Revolution in COPD Health) study 1 was one of the largest (6,112 subjects in the intention-to-treat (ITT) population) and longest trials (3 yrs) of pharmacotherapy in patients with chronic obstructive pulmonary disease (COPD) and the first prospective mortality study. Subjects were randomised to placebo, salmeterol 50 μg, fluticasone propionate 500 μg or the combination of salmeterol 50 μg and fluticasone propionate 500 μg (SFC). All treatments were administered twice daily via a single inhaler.
The primary outcome was all-cause mortality, based on 3-yr survival data from all subjects, regardless of whether they had withdrawn. The primary analysis of mortality compared SFC to placebo. Secondary efficacy end-points were the rate of moderate and severe COPD exacerbations (requiring treatment with systemic corticosteroids and/or antibiotics, or requiring hospitalisation), and health status, determined using the St George’s Respiratory Questionnaire (SGRQ). The design and analysis of this study were discussed with regulatory agencies in advance of unblinding.
Our paper seeks to explain the background to the choices made in the design and analysis of TORCH and explore the impact of these choices on the results. It is important to build on the experience of this trial in order to plan future trials in COPD. Specific areas that required careful evaluation were as follows.
1) Extent of follow-up. Data on exacerbations were only collected while patients were receiving treatment. It has been argued that data following discontinuation of treatment must be obtained and that if these data are missing, then the subsequent analysis cannot be considered to follow the ITT principle 2, 3.
2) Analysis of exacerbations. The primary analysis of exacerbation rates was performed using the negative binomial model, a relatively novel approach. In addition, it is of interest to examine whether randomised treatment affects the time to subsequent exacerbations beyond the first.
3) Impact of previous therapy. Approximately 50% of patients entering the TORCH trial had used inhaled corticosteroids (ICS) in the year prior to randomisation. It is important to evaluate whether ICS reduces exacerbation rates only among patients already using ICS, and therefore whether any observed effect of ICS therapy is in fact due to steroid withdrawal in patients randomised to non-ICS treatment.
4) Analysis combining treatment arms. The primary analysis compared individual treatment arms. An alternative approach using factorial analysis would have been to combine the two long-acting β-agonist (LABA)-containing arms and compare the results with the two non-LABA-containing arms. Similarly, the two ICS-containing arms would be combined and their results compared with the two non-ICS-containing arms.
It was recognised at the design stage that patient withdrawal rates could affect the outcome of the trial 4. As expected, withdrawal rates differed between treatments (fig. 1). Patients withdrew significantly more frequently in the placebo group (44%) and were least likely to withdraw when taking SFC (34%). The large number of withdrawals and the different withdrawal patterns between treatments have important implications for the design and analysis of trials in COPD, as discussed below.
Figure 1. TORCH (Towards a Revolution in COPD Health): summary of study drug discontinuation. – – – –: placebo; ▓: fluticasone propionate 500 μg (FP); - - - -: salmeterol 50 μg (SAL); ––––: combination of salmeterol 50 μg and fluticasone propionate 500 μg (SFC). Reproduced from 1 with permission from the publisher.
ITT ANALYSIS OF EXACERBATION RATES AND OFF-TREATMENT INFORMATION
ITT is the accepted methodology for analysis of clinical trials. This requires inclusion of all patients and was introduced to address the possible bias of only including patients who adhered closely to the requirements of the protocol. This pragmatic approach is normally preferred to one focusing on compliant patients only, “since it provides a more valid assessment of treatment efficacy as it relates to actual clinical practice” 5. For example, if a treatment benefits comparatively few patients but results in adverse events that lead to withdrawal, an analysis which only includes those who complete the study could overestimate treatment efficacy.
In the TORCH trial there was virtually complete follow-up of mortality status of all patients at 3 yrs, including those who had discontinued treatment. In contrast, data on exacerbations were only collected while patients were on randomised treatment.
Suissa et al. 2 have argued that the analysis of exacerbation rates in the TORCH study does not conform to the ITT principle, because data on exacerbations were missing from patients following discontinuation of randomised therapy. ITT analysis requires inclusion of all available subjects in the analysis. In principle there should also be complete follow-up of all patients 6 but, in practice, for outcomes other than mortality there are nearly always missing data. The CONSORT (Consolidating Standards of Reporting Trials) statement 7 is the standard guideline for reporting randomised clinical trials adopted by major medical journals. In an accompanying article to the latest revision, the CONSORT group state 8: “It is common for some patients not to complete a study – they may drop out or be withdrawn from active treatment – and thus are not assessed at the end. Although these participants cannot be included in the analysis, it is customary still to refer to analysis of all available participants as an intention-to-treat analysis.” In the analysis of outcomes from the TORCH trial, all available patients were included in the analysis, whether they withdrew early or not, so, by this definition, the results presented are from an ITT analysis.
The question remains of what should be the relevant data and end-points for testing the effect of treatment on exacerbations. Should the TORCH trial have collected data on exacerbations after discontinuation of randomised treatment and used this in the primary analysis of this outcome? Using this approach, data would have been included regardless of the therapy that the patient received after withdrawing from the study treatment. Such a design would conform more closely to the ITT ideal of complete follow-up data for all patients.
The fundamental problem with including post-withdrawal data is that after withdrawal, patients can switch to any licensed COPD therapy. In the TORCH trial, many patients started treatment with one of the active comparators in the trial. By the end of the trial, over 50% of the 673 withdrawn placebo patients had started open-label treatment with LABA, ICS, or LABA plus ICS prescribed by their physician. This represents over 25% of randomised placebo patients.
A comparison that includes off-treatment data does not provide a reliable assessment of the clinical question under test in the study. For example, suppose hypothetically that all placebo patients withdraw early in the trial and take active treatment. Under these circumstances, an analysis that included off-treatment data would essentially compare identical treatment regimens. It is illogical to conclude that a treatment lacks efficacy on the basis of an analysis that compares the test treatment with a treatment regimen in which a substantial number of patients take the same or similar treatment, but in a nonrandomised manner. A more meaningful analysis is to compare on-treatment data only, since this is a comparison of the treatment regimens of clinical interest.
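A simplified calculation makes the dilution explicit. Assume, purely for illustration, that patients exacerbate at rate λ_A on active treatment and λ_P on placebo, so that the true rate ratio is RR = λ_A/λ_P, and that a fraction f of the placebo arm's follow-up time is spent on active treatment after withdrawal (ignoring any difference in severity between patients who switch and those who do not). The placebo-arm rate estimated from on- and off-treatment data combined is then approximately a time-weighted mixture of the two rates:

```latex
\hat{\lambda}_P \approx (1-f)\,\lambda_P + f\,\lambda_A,
\qquad
\mathrm{RR}_{\mathrm{obs}}
  = \frac{\lambda_A}{(1-f)\,\lambda_P + f\,\lambda_A}
  = \frac{\mathrm{RR}}{(1-f) + f\,\mathrm{RR}}
```

With RR = 0.75 and f = 0.25, for example, RR_obs = 0.75/0.9375 = 0.80, so a true 25% reduction would appear as a 20% reduction. As f approaches 1 the observed ratio tends to 1 whatever the true effect, which is the extreme case described above.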
On-treatment analysis is open to bias in the sense that those patients who stay on the active treatment regimen may be those who benefit from and/or tolerate the treatment. If patients withdraw from a treatment because of lack of efficacy or because of an adverse event, then this may not be appropriately captured in the on-treatment data. This is a legitimate concern, particularly when there are more withdrawals on the test treatment compared with placebo. In order to understand these potential biases, it is important to examine the pattern of withdrawals and the reasons for withdrawal. This set of data in itself can aid assessment of whether a treatment is useful, as well as give reassurance regarding the validity of the on-treatment analysis.
In the TORCH trial, 673 (44%) withdrew on placebo compared with 561 (37%) on salmeterol, 587 (38%) on fluticasone propionate and 522 (34%) on the combination therapy 1. As in other studies 9, patients withdrawing tended to have more severe disease than those remaining in the study on all treatment arms. Because of the increased withdrawals on placebo, the patients remaining on-treatment in the placebo arm tended to have less severe disease than those on the active treatment arms. More patients withdrew for lack of efficacy on the placebo arm compared with the other treatment arms 1. Therefore, the potential bias from an on-treatment analysis seems most likely to be against active treatment.
In contrast to TORCH, the OPTIMAL study obtained information on exacerbations from some subjects withdrawing from the study. Of the 449 patients randomised, 175 patients stopped their randomised treatment during this 1-yr study and, of these, 110 (63%) provided off-treatment data 3. Withdrawals were less frequent in the arm that received ICS than in the two arms that did not. In the tiotropium plus placebo arm, 74% of withdrawn patients received an open-label inhaled steroid and LABA combination inhaler for the remainder of the study 3. In the primary analysis, data from these patients were included as being from patients on tiotropium plus placebo. No conclusion about a lack of effect of adding ICS can be drawn from a comparison in which patients in both groups are receiving the same medication.
There are practical difficulties in collecting some data, such as exacerbation data from patients who withdraw from the study. Ascertaining mortality is relatively straightforward, since it only requires determining whether a patient is alive or dead and, if dead, the date of death. For exacerbations the issue is more complex, since exacerbations may not be routinely recorded in the patient notes, clinicians managing the patient may use a different definition of exacerbation from the one in the study, and patients may have withdrawn their consent for their data to be used in the trial. Even when extensive efforts were made to follow up patients off-treatment, as in the OPTIMAL trial, the ideal of complete capture of all data was not achieved and a problem with missing data remained.
Missing data represent a problem for any statistical analysis of clinical trial data, and no statistical method can completely compensate for them. Although analysis of exacerbations using methods such as the negative binomial model accounts for length of exposure to treatment 10, 11, the analysis makes an important assumption about missing data: that, conditional on the data observed for each patient and the covariates in the model, the unobserved data are missing at random. Standard time-to-event analyses, such as the Cox model, make a similar assumption: conditional on the covariates in the model, a patient lost to follow-up is just as likely to have a future event as one who stays in the study (e.g. patients are assumed not to withdraw shortly before they are about to experience an exacerbation).
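The consequence of violating this assumption can be seen in a small simulation. The sketch below is illustrative only and is not based on TORCH data: it assumes exacerbations follow a Poisson process with a true rate of 1 per patient-year over a 3-yr treatment period. In the "independent" scenario, withdrawal time is drawn independently of exacerbations; in the "informative" scenario, each impending exacerbation has a hypothetical 30% chance of instead triggering withdrawal, so that the event is never recorded. The function name `on_treatment_rate` and all parameter values are assumptions chosen for illustration.

```python
import random


def on_treatment_rate(n_patients, true_rate, max_years, informative, rng, p_drop=0.3):
    """Exacerbations per patient-year when only on-treatment data are recorded.

    Exacerbations follow a Poisson process with rate `true_rate` (assumption).
    """
    total_events = 0
    total_years = 0.0
    for _ in range(n_patients):
        if informative:
            # Withdrawal can only be triggered by an impending exacerbation
            stop = max_years
        else:
            # Non-informative: withdrawal time drawn independently of exacerbations
            stop = min(max_years, rng.uniform(0.0, 2.0 * max_years))
        t = 0.0
        while True:
            t += rng.expovariate(true_rate)  # time of the next exacerbation
            if t >= stop:
                break
            if informative and rng.random() < p_drop:
                stop = t  # withdraws just before this exacerbation; it is never recorded
                break
            total_events += 1
        total_years += stop
    return total_events / total_years


rng = random.Random(1)
rate_independent = on_treatment_rate(5000, 1.0, 3.0, informative=False, rng=rng)
rate_informative = on_treatment_rate(5000, 1.0, 3.0, informative=True, rng=rng)
print(f"independent withdrawal: {rate_independent:.2f}; "
      f"withdrawal before an exacerbation: {rate_informative:.2f}")
```

With independent withdrawal the on-treatment rate is close to the true value of 1; with informative withdrawal it is close to 0.7, because the exacerbations that trigger withdrawal are systematically lost and no amount of exposure adjustment recovers them.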
An alternative approach to the analysis would be to include withdrawal as an adverse outcome in a composite end-point, such as time to withdrawal or exacerbation. Such an approach is similar in principle to the early escape designs proposed by Temple and co-workers 12, 13 for situations where long-term use of placebo is problematic. All patients in such an analysis would be included as either having the event or as having completed the treatment course.
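A minimal sketch of how such a composite end-point might be constructed, assuming each patient record gives the time (in years from randomisation) of the first exacerbation and of withdrawal, either of which may be absent, and a planned treatment period of 3 yrs. The helper names and the five example patients are hypothetical. The Kaplan-Meier estimator is used here as a standard way to summarise the resulting data; it is not necessarily the analysis Temple and co-workers proposed.

```python
from typing import List, Optional, Tuple

PLANNED_YEARS = 3.0  # assumed planned treatment period


def composite_endpoint(first_exac: Optional[float], withdrawal: Optional[float],
                       planned: float = PLANNED_YEARS) -> Tuple[float, int]:
    """Time to first exacerbation or withdrawal, whichever comes first.

    Returns (time, 1) if either occurred within the planned period,
    otherwise (planned, 0) for a patient who completed treatment event-free.
    """
    times = [t for t in (first_exac, withdrawal) if t is not None and t <= planned]
    if times:
        return min(times), 1
    return planned, 0


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve


# Hypothetical patients: (time of first exacerbation, time of withdrawal)
patients = {"A": (0.5, None), "B": (None, 1.0), "C": (2.0, 1.5),
            "D": (None, None), "E": (2.5, None)}
results = [composite_endpoint(e, w) for e, w in patients.values()]
print(results)  # [(0.5, 1), (1.0, 1), (1.5, 1), (3.0, 0), (2.5, 1)]
times, events = zip(*results)
curve = kaplan_meier(list(times), list(events))
```

Because withdrawal counts as an event, the only censoring is at the planned end of treatment, so the final value of the curve (about 0.2 here) is simply the proportion of patients completing treatment free of exacerbation.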
STATISTICAL ANALYSIS OF EXACERBATIONS, INCLUDING TIME TO SECOND AND SUBSEQUENT EXACERBATIONS
Patients withdrawing early reveal important information on treatment efficacy. Estimates of exacerbation rate from the Poisson model are weighted according to follow-up time, and an over-dispersion correction is used to account for inter-patient variability 10. This model does not account for the overall higher exacerbation rate among patients who withdraw; therefore, it underestimates the true exacerbation rates and the efficacy of treatment with ICS 11. The negative binomial model used in the TORCH analysis assumes that each individual has their own underlying rate of exacerbations, with rates varying between patients according to a gamma distribution, and that, given this rate, the number of exacerbations for each individual follows a Poisson distribution. In contrast to the simple Poisson model, the negative binomial model therefore allows the expected number of exacerbations to vary across patients beyond what is explained by the covariates. Nonparametric methods, such as those used to analyse the ISOLDE (Inhaled Steroids in Obstructive Lung Disease in Europe) trial, make no assumptions about the distribution of exacerbation rates and are a valuable alternative.
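The difference between the time-weighted Poisson estimate and the negative binomial estimate can be illustrated with a small sketch. It assumes a single common rate (no treatment effect or covariates), the common negative binomial parameterisation Var(Y) = m + k·m² with m = rate × follow-up, and a simple grid search for the dispersion k. The twelve patients are invented: eight complete 3 yrs, and four hypothetical withdrawers have short follow-up and high rates. This is not the TORCH analysis code, which also included treatment and covariates.

```python
import math


def poisson_rate(events, years):
    """Poisson MLE of a common rate with follow-up as exposure:
    total events divided by total patient-years."""
    return sum(events) / sum(years)


def nb_rate(events, years, k):
    """MLE of a common rate mu under a negative binomial model with mean
    m = mu * t and variance m + k * m**2, for a fixed dispersion k > 0.
    Solves the score equation sum((y - mu*t) / (1 + k*mu*t)) = 0 by bisection."""
    def score(mu):
        return sum((y - mu * t) / (1 + k * mu * t) for y, t in zip(events, years))
    lo, hi = 0.0, 1.0
    while score(hi) > 0:  # score decreases in mu; widen the bracket until it is negative
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nb_loglik(events, years, mu, k):
    """Negative binomial (gamma-Poisson mixture) log-likelihood, with r = 1/k."""
    r = 1.0 / k
    ll = 0.0
    for y, t in zip(events, years):
        m = mu * t
        ll += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(r / (r + m)) + y * math.log(m / (r + m)))
    return ll


def fit_nb(events, years, k_grid):
    """Profile the likelihood over a grid of k; return (mu, k) at the maximum."""
    best = None
    for k in k_grid:
        mu = nb_rate(events, years, k)
        ll = nb_loglik(events, years, mu, k)
        if best is None or ll > best[2]:
            best = (mu, k, ll)
    return best[0], best[1]


# Hypothetical data: 8 patients followed for 3 yrs, 4 withdrawing after 0.5 yr
events = [0, 1, 2, 3, 5, 1, 0, 4, 2, 3, 1, 2]
years = [3.0] * 8 + [0.5] * 4
k_grid = [0.05 * i for i in range(1, 101)]  # assumed search range for k

pooled = poisson_rate(events, years)
mu_nb, k_hat = fit_nb(events, years, k_grid)
mean_individual = sum(y / t for y, t in zip(events, years)) / len(events)
print(f"Poisson (time-weighted):  {pooled:.3f}")
print(f"Negative binomial:        {mu_nb:.3f} (k = {k_hat:.2f})")
print(f"Mean of individual rates: {mean_individual:.3f}")
```

The pooled Poisson rate is 24/26 ≈ 0.92 exacerbations per patient-year and the unweighted mean of individual rates is about 1.78. The negative binomial estimate falls between the two, because its score equation weights each patient by 1/(1 + k·μ·t), which gives relatively more weight to patients with short follow-up (here, the withdrawers) than the time-weighted Poisson estimate does.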
As well as the primary analysis of rate of exacerbations, it can be of interest to examine the time to first exacerbation and the time to subsequent exacerbations. The analysis of time to first exacerbation is relatively straightforward using Cox's proportional hazards model. However, it is not correct to apply the same methodology to the time from the first to the second exacerbation, as was done in the analysis by Suissa et al. 2. The problem is that this analysis does not compare similar groups of patients: it includes only the patients who have had a first exacerbation, i.e. a biased sample that is not representative of the patients initially randomised. For statistical analysis, the point of randomisation is the only sensible choice for time zero 14. Because ICS prevents exacerbations in some patients, the patients on ICS who nevertheless have an exacerbation are likely to have more severe disease than those not on ICS, and this cannot be accounted for through use of covariance analysis.
The correct statistical approach to addressing time to second and subsequent exacerbations is to use a multiple time-to-event method, such as the proportional hazards model reported by Prentice et al. 15 or the model described by Andersen and Gill 16. The Prentice et al. 15 method estimates the hazard ratio for the time to first event, time to second event and so on. The results of this analysis (table 1) show that the active treatments in TORCH maintained the reduction in risk of an exacerbation for the second and subsequent exacerbations.
Table 1. The TORCH (Towards a Revolution in COPD Health) trial: summary of comparisons of time to each event
The Andersen and Gill 16 method combines information on time to each event to produce an overall hazard ratio for the risk of experiencing any exacerbation event. This analysis shows a hazard ratio of 0.85 (95% CI 0.77–0.93) for salmeterol compared with placebo, 0.87 (95% CI 0.79–0.95) for fluticasone propionate compared with placebo and 0.78 (95% CI 0.72–0.86) for SFC compared with placebo.
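Both models are commonly fitted to data in a counting-process layout, with one row per at-risk interval and all times measured from randomisation. The sketch below shows one way to build that layout, assuming each patient's exacerbation times (in years since randomisation, all within follow-up) and end of on-treatment follow-up are known; the `Row` type and function name are hypothetical. In the Andersen and Gill model all rows share a single baseline hazard; in a Prentice-type analysis the rows are stratified by event number, so that separate hazard ratios can be estimated for the first, second and later exacerbations. With the R survival package, for instance, these correspond roughly to `coxph(Surv(start, stop, event) ~ treatment + cluster(patient))` and the same model with `strata(stratum)` added, although the exact specification should follow references 15 and 16.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    patient: str
    stratum: int   # which exacerbation the patient is at risk of (1 = first)
    start: float   # years since randomisation
    stop: float
    event: int     # 1 = exacerbation at `stop`, 0 = censored


def counting_process_rows(patient: str, exac_times: List[float],
                          end_of_followup: float) -> List[Row]:
    """Split one patient's follow-up into intervals between exacerbations.

    Assumes all exacerbation times lie in (0, end_of_followup]. The clock is
    never reset at an exacerbation: start/stop stay on the randomisation scale.
    """
    rows = []
    start = 0.0
    for k, t in enumerate(sorted(exac_times), start=1):
        rows.append(Row(patient, k, start, t, 1))
        start = t
    if start < end_of_followup:  # censored interval after the last exacerbation
        rows.append(Row(patient, len(exac_times) + 1, start, end_of_followup, 0))
    return rows


rows = (counting_process_rows("P01", [0.4, 1.1], 3.0)
        + counting_process_rows("P02", [], 2.2))
for r in rows:
    print(r)
```

Because every interval starts from the randomisation clock and every patient contributes a first-event row, the comparison for the first exacerbation still involves all randomised patients, unlike a gap-time analysis restarted at the first exacerbation.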
USE OF PREVIOUS THERAPY
It has been suggested that the benefit of ICS therapy in reducing exacerbation frequency is confined to those who were previously receiving ICS at randomisation 2. We have examined the TORCH dataset to see what evidence there is to support this.
In TORCH, of the 6,112 patients in the ITT population, it was possible to determine prior use of ICS in 5,960 (98%). Of these, 2,976 (50%) were recorded as having received any ICS in the year prior to screening. Results for exacerbation rates are given in table 2. For comparison, we have also included exacerbation rates split by prior use of LABA.
Table 2. The TORCH (Towards a Revolution in COPD Health) trial: exacerbation rates by prior use of inhaled corticosteroid (ICS) or long-acting β-agonist (LABA)
In all four treatment arms, patients with prior use of ICS had higher exacerbation rates after randomisation compared with those with no prior use. This suggests that patients who had prior use of ICS were different from those who did not. ICS-containing treatments are recommended in patients with more severe COPD who have a history of repeated exacerbations 17. Therefore, patients prescribed ICS at baseline will be more likely to have a long-term history of repeated exacerbations and, hence, exacerbate more during the study. A similar pattern is evident for patients with prior use of LABA: patients with prior use of LABA had higher exacerbation rates after randomisation compared with those with no prior use of LABA for all treatment groups.
Despite the lower exacerbation rates among those with no prior use of ICS, exacerbation rates were significantly reduced for SFC and fluticasone propionate compared with placebo in this retrospective subgroup analysis.
An analysis performed at the request of one of the reviewers of this paper, using the alternative approach of time-to-first event, also suggests that the reduced risk of exacerbation for SFC compared with placebo was similar for those with no prior use of ICS (hazard ratio 0.87, 95% CI 0.77–0.99) and for those with prior use of ICS (hazard ratio 0.83, 95% CI 0.74–0.93).
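Whether the two subgroup hazard ratios differ can be checked approximately from the published estimates alone, by comparing log hazard ratios with standard errors recovered from the 95% CIs. The sketch below assumes that each CI is symmetric on the log scale and that the two subgroups are independent; the helper names are hypothetical, and this is a back-of-envelope check rather than a reanalysis of TORCH data.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_hr_and_se(hr, lower, upper):
    """Log hazard ratio and its standard error recovered from a 95% CI,
    assuming the interval is symmetric on the log scale."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def ratio_of_hrs(est1, est2):
    """Compare two independent subgroup estimates: ratio of HRs,
    its 95% CI and a two-sided p-value for the difference."""
    (b1, se1), (b2, se2) = est1, est2
    diff = b1 - b2
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(diff), math.exp(diff - Z95 * se), math.exp(diff + Z95 * se), p


no_prior_ics = log_hr_and_se(0.87, 0.77, 0.99)  # SFC vs placebo, no prior ICS
prior_ics = log_hr_and_se(0.83, 0.74, 0.93)     # SFC vs placebo, prior ICS
ratio, lo, hi, p = ratio_of_hrs(no_prior_ics, prior_ics)
print(f"ratio of HRs {ratio:.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {p:.2f}")
```

The ratio of hazard ratios is about 1.05, with a 95% CI of roughly 0.88 to 1.24 and p of about 0.6. The data therefore show no evidence of a difference between the subgroups, but the wide interval means this is consistent with, rather than proof of, equal effects.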
ANALYSIS COMBINING TREATMENT ARMS
The design of two ICS-containing arms and two LABA-containing arms permits a factorial analysis 2, 18 that combines the results from the two ICS arms to examine the effect of ICS, and combines the results from the two LABA arms to examine the effect of LABA. Such an analysis has increased power relative to an analysis which compares individual treatment arms.
A key assumption of such an analysis, however, is that each treatment has the same additive effect in the presence and absence of the other treatment. Suissa et al. 2 use a nonsignificant p-value for the interaction test to claim that no such interaction exists but, unfortunately, an assumption such as this cannot be proven by a nonsignificant p-value 19, particularly since these tests lack power 20. Compared with placebo, the two treatments used together might produce an effect greater than the numerical sum of each used separately, or an effect smaller than this but still worthwhile.
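The gain in power from pooling, and the weakness of the interaction test, can be made explicit. Write θ_P, θ_S, θ_F and θ_SF for the true effects (e.g. log event rates) in the placebo, salmeterol, fluticasone propionate and SFC arms, and assume, for simplicity, that the arm estimates are independent with equal variance σ². Then:

```latex
\begin{aligned}
\text{ICS effect, single comparison:}\quad
  & \hat\theta_F - \hat\theta_P,
  && \operatorname{Var} = 2\sigma^2 \\
\text{ICS effect, factorial:}\quad
  & \tfrac{1}{2}\big[(\hat\theta_F - \hat\theta_P) + (\hat\theta_{SF} - \hat\theta_S)\big],
  && \operatorname{Var} = \sigma^2 \\
\text{ICS} \times \text{LABA interaction:}\quad
  & (\hat\theta_{SF} - \hat\theta_S) - (\hat\theta_F - \hat\theta_P),
  && \operatorname{Var} = 4\sigma^2
\end{aligned}
```

The factorial estimate has half the variance of a single two-arm comparison, but the interaction contrast has four times the variance of the factorial main effect (standard error 2σ versus σ), so an interaction as large as the main effect itself is detected with only half the z-statistic. If the effect of ICS differs with and without salmeterol, the factorial estimate is simply the average of the two effects, which can mask effects of opposite sign.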
The primary statistical methods for a clinical trial need to be determined in advance of seeing the data. Lubsen and Pocock 21, in a review of factorial trials in cardiology, state: “there are few situations where it is reasonable a priori to make such a strong assumption about an absence of interaction.” The pre-planned analysis of TORCH, therefore, compared individual treatment arms, rather than pooling arms.
TORCH showed a reduction in mortality for SFC compared with salmeterol, but this was not statistically significant. The lack of any effect of ICS on mortality in the post hoc factorial analysis is driven by the small increase in mortality in the fluticasone propionate alone arm. The fact that the two ICS-containing arms show effects on mortality in opposite directions raises doubts as to whether the effect of fluticasone propionate is the same in the presence and absence of salmeterol and, therefore, about the validity of pooling these two arms for analysis.
Given these reservations, it is not appropriate to conclude that adding an ICS to a LABA has no additional effect on mortality. More studies are needed with adequate power to conclusively establish whether there is a real reduction in mortality in this setting.
DISCUSSION AND CONCLUSION
Long-term trials in COPD to evaluate exacerbations present difficult problems of design and statistical analysis. A major issue is the large number of withdrawals, and this problem is compounded by the fact that patients can be prescribed the same type of active medications used in the trial after they withdraw. Withdrawals due to lack of efficacy are very important in the assessment of the efficacy of a treatment in any disease setting. An analysis which includes off-treatment data from patients treated with effective medications post-withdrawal has a large potential for bias and could favour study treatments that increase withdrawal. This approach cannot be recommended for the primary analysis of exacerbation rates in COPD trials. Analysis of on-treatment responses is not ideal and is open to different types of bias, but it remains the preferred choice for the primary analysis of exacerbation rates in these circumstances.
Evaluation of mortality is open to similar sources of potential bias, but the issues here are different. The effect of treatment on exacerbations can be expected to be comparatively rapid, while the potential for death to be delayed beyond discontinuation of treatment cannot be ignored. Therefore, complete follow-up of mortality, as was done in the TORCH trial, is typically required. Nevertheless, the potential for such analysis to underestimate treatment effects still needs to be recognised when considering the outcome of trials like TORCH.
Increased rates of exacerbation are observed among patients stopping steroid treatment. In the TORCH trial, those patients with prior use of ICS had higher rates of exacerbation compared with those with no prior use, and a similar increase was also observed among those with prior use of LABAs. It remains unclear whether this increase was due to patients who exacerbate more frequently being prescribed steroids and, therefore, returning to their previous rate of exacerbations, or whether there is an underlying biological mechanism that produces a steroid withdrawal effect. The efficacy of randomised fluticasone-containing regimens in TORCH was not, however, confined simply to patients who had previously been prescribed inhaled steroids. This conclusion is supported by data from the INSPIRE study, in which similar exacerbation rates were observed between tiotropium and SFC, regardless of previous use of inhaled steroids 22.
This potential confounding of medication use with severity has important implications for the interpretation of observational database studies that assume that the treatment received depends only on measurable baseline variables, such as lung function. If the way patients are treated cannot be captured by these baseline variables, then such studies can only provide limited information regarding efficacy and safety of specific treatments.
Statistical analysis of exacerbation rates is not simple and the negative binomial analysis is currently the most appropriate choice for primary analysis. Time-weighted estimates from the Poisson model analysis of exacerbations do not account for the overall higher exacerbation rate among patients who withdraw, so they may underestimate the true exacerbation rates and treatment effects. The finding of a reduced exacerbation rate with ICS reported from the TORCH study does not, however, depend on the choice of analysis; irrespective of statistical method used (negative binomial or Poisson), fluticasone propionate has been shown to be more effective than placebo 10, 11 and SFC has been shown to be more effective than salmeterol in the TORCH trial 1, 23.
Post hoc factorial analysis of the TORCH trial provides some additional support for efficacy of LABA treatments in reducing mortality. However, the key assumption of lack of interaction between effects of different treatment arms cannot be verified by a significance test. The mortality benefit shown in TORCH for the SFC combination arm is encouraging and reductions in rate of decline of forced expiratory volume in 1 s with ICS-containing regimens are supportive 24. As previously noted, the use of open-label active medication among withdrawals may have diluted the size of the effect of ICS on mortality in the TORCH study. However, no definitive conclusions can yet be drawn and further studies are required to clarify whether ICS treatment has an effect on mortality.
As this extensive discussion of the issues involved in the TORCH trial illustrates, there are important methodological issues in the design and analysis of COPD trials which merit wider discussion. Different approaches to statistical analysis can lead to different conclusions about the efficacy of treatments and, therefore, interpretation of presented results requires understanding of what comparisons are being made. The conclusions of trials should reflect the analyses performed. When reporting a meta-analysis, which involves combining results from several clinical trials, allowance should be made for the impact of these different methods when drawing conclusions about the effectiveness of treatment.
Statement of interest
Statements of interest for all authors can be found at www.erj.ersjournals.com/misc/statements.dtl
- Received August 8, 2008.
- Accepted April 16, 2009.
- © ERS Journals Ltd