Missing data in IPF trials: do not let methodological issues undermine a major therapeutic breakthrough
- 1Service de pneumologie B et transplantation pulmonaire, Hôpital Bichat, APHP, Paris, France
- 2INSERM U1152, Université Paris Diderot-Paris 7, France
- 3Service de pneumologie A, Hôpital Bichat, APHP, Paris, France
- 4Centre d’épidémiologie Clinique, Groupe Hospitalier Cochin – Hôtel Dieu, Paris, France
- 5INSERM U1153, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- 6National Institute for Health Research Southampton Respiratory Biomedical Research Unit and Clinical and Experimental Sciences, University of Southampton, Southampton, UK
- Gabriel Thabut, Service de pneumologie B et transplantation pulmonaire, Hôpital Bichat, 46 rue Henri Huchard, 75018, Paris, France. E-mail: gthabut{at}gmail.com
Abstract
Missing data will not disappear in future IPF (and not only IPF) trials http://ow.ly/PijJo
Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive disorder associated with a poor prognosis. Until recently, there has been limited evidence that any drug could alter the course of the disease [1]. However, thanks to worldwide cooperative efforts, large randomised controlled trials have recently demonstrated the efficacy of pirfenidone and nintedanib in slowing down the course of IPF [2–4]. After decades of disappointing results in a number of clinical trials, these robust, groundbreaking studies now give patients with IPF the choice of safe and effective therapeutic options.
Although there has been some debate over the optimal end-point for phase III clinical trials in IPF [5, 6], these trials used forced vital capacity (FVC) decline as the primary end-point, in line with the recommendations from the US Food and Drug Administration (FDA), and measured mortality in order to support the FVC results [7]. However, as for any functional parameter, lung function could not be evaluated in some study participants who dropped out because of loss to follow-up, withdrawal or death, leading to missing data that complicated the interpretation of the results. The same is true for vital status data, which may not be reported for patients lost to follow-up and are difficult to interpret in patients who underwent lung transplantation. It is therefore important to understand the main challenges that missing values pose in these and future IPF trials, and the pitfalls of the methods that are used to deal with them [8, 9].
FVC decline
FVC has been the standard clinical measure of pulmonary function in IPF for many years. Longitudinal change in serial measures of FVC is a widely accepted marker of disease progression in patients with IPF. Several studies have identified change in FVC % predicted as an independent predictor of mortality [10–15]. In a recent study, change in FVC % predicted over 24 weeks was highly predictive of death over the subsequent 1-year period. Risk of death was nearly five-fold higher for patients with absolute declines in FVC % predicted ≥10% and more than two-fold higher for those with absolute declines between 5% and 10%, compared with patients who experienced declines in FVC % predicted <5% [16]. Similarly, a relative or absolute decline of FVC ≥10% over 12 months was associated with a shorter 2-year transplant-free survival [17]. FVC decline has been consistently used as an end-point in IPF clinical trials, and is also used by clinicians as a trigger for treatment initiation and as a criterion for referral for lung transplantation [18]. The FDA acknowledges that interpreting changes in lung function in IPF is challenging but proposes that a decrease in FVC % predicted ≥10% could be regarded as clinically meaningful based on available evidence [19]. The five trials that investigated the efficacy of pirfenidone and nintedanib in IPF patients have generated a considerable amount of data that reinforce the validity of FVC change as a surrogate for mortality [20].
In the setting of controlled clinical trials, FVC values are typically collected every 6–12 weeks. FVC decline has thus usually been calculated as the difference between FVC measured at the last study visit and that measured at baseline. However, FVC cannot be measured at the end of the study in all patients.
When looking at the five phase III trials that led to the approval of pirfenidone and nintedanib, FVC data were missing in 13.2% of patients in the CAPACITY trials and 15.3% of patients in the two INPULSIS trials, whereas the number of missing FVC data has not been reported for the ASCEND trial [2]. The main cause of missing data was death, occurring in 8.8% of CAPACITY, 6.4% of INPULSIS and 5.6% of ASCEND patients.
Missing data for the primary end-point raise problems in the assessment of the magnitude and statistical significance of the treatment effect. It is also important to point out that most missing data are informative in nature and may differ between treatment groups, which is an important consideration when interpreting efficacy outcomes. This includes dropouts due to adverse events and disease progression, which, in addition to deaths, have accounted for a high proportion of dropouts in these trials. Investigators have used different strategies to deal with these missing data; these strategies are summarised in table 1. Here, we discuss the methods used in these trials.
Handling of missing data in the INPULSIS trials
In these trials of nintedanib in IPF, the primary efficacy end-point was the annual rate of decline in FVC (measured in mL per year). With this approach, mean FVC at each time-point can be computed on all the nonmissing available data. Consequently, FVC is computed for a decreasing number of patients over time. This method implicitly assumes that patients who dropped out had the same FVC decline as patients who did not (the assumption that data are missing completely at random). However, FVC decline is a well-known risk factor for death, implying that patients who died were likely to have faster FVC decline than those still alive at the end of the study. In the INPULSIS trials, analysis of the treatment effect involved a random coefficient regression model (with random slopes and intercepts) that included sex, age and height as covariates. As acknowledged by the authors, this model allowed for missing data, assuming that they were missing at random. In addition, this model assumes that the evolution of FVC is linear over time, which might not be the case. The impact of the violation of these assumptions remains unknown. The authors ran several pre-specified sensitivity analyses to test whether the violation of the missing at random assumption would alter the results of the study.
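The consequence of averaging only the nonmissing data can be illustrated with a small numerical sketch. The declines below are hypothetical values chosen for illustration, not trial data, and the rule that the fastest decliners die before the final visit is an assumption made to mimic informative dropout.

```python
from statistics import mean

# Hypothetical 52-week FVC declines in mL (positive = loss); NOT trial data.
true_decline = [50, 80, 120, 150, 200, 260, 320, 400, 520, 700]

# Assumption: patients with the steepest decline die before the final visit,
# so their last FVC value is never observed (informative dropout).
died = [d >= 500 for d in true_decline]

# Available-case analysis: average only the patients who were measured.
observed = [d for d, dead in zip(true_decline, died) if not dead]

full_mean = mean(true_decline)    # the quantity one would like to estimate
available_mean = mean(observed)   # what the nonmissing data alone suggest

print(f"mean decline, all patients:     {full_mean} mL")
print(f"mean decline, observed at end:  {available_mean} mL")
```

In this toy cohort the available-case mean understates the decline because the missing patients are precisely those who declined fastest. The size of the distortion in a real trial depends on how strongly dropout is related to FVC and cannot be read from this sketch.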
Handling of missing data in the CAPACITY and ASCEND trials
In these trials of pirfenidone in IPF, the primary efficacy end-point was the decline in FVC % predicted in the intention-to-treat population. The magnitude of the treatment effect was estimated by categorical changes in FVC (absolute decline of 10 points in FVC or death) and use of differences in treatment group means.
The use of a composite end-point, including death or FVC decline, is a way to deal with missing FVC values owing to death, as it gives the same weight to death or FVC decline >10%. However, there is no natural FVC cut-off and dichotomisation of a continuous variable (such as FVC) leads to a considerable loss of power. Two patients with 1% and 9% FVC decline will be in the same group, whereas two patients with 9% and 11% FVC decline will not. The use of composite end-points raises concerns that have been described elsewhere [33].
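The dichotomisation issue can be made concrete with a minimal sketch of a composite end-point. The function name, its arguments and the choice of "at least 10 points" as the threshold are illustrative assumptions; the exact definitions used in the trials should be taken from the trial reports.

```python
def composite_event(fvc_decline_points, died, cutoff=10.0):
    """Return True if the patient died or lost at least `cutoff` points of
    FVC % predicted. `fvc_decline_points` may be None when FVC is missing.
    Hypothetical helper for illustration only."""
    if died:
        return True
    return fvc_decline_points is not None and fvc_decline_points >= cutoff


# Two patients with very different declines fall on the same side of the cut-off,
# whereas two patients with nearly identical declines are separated by it.
for decline in (1.0, 9.0, 11.0):
    print(decline, composite_event(decline, died=False))

# A death counts as an event even though no final FVC value exists.
print(None, composite_event(None, died=True))
```

The information lost is visible directly: the 8-point difference between the first two patients is discarded, while the 2-point difference between the last two decides the outcome.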
In the analysis of differences in treatment group means, imputation strategies in these trials varied according to the cause of missing data. Missing data owing to death were assigned an FVC value of 0 mL at week 72 or 52 in the CAPACITY and ASCEND trials, respectively [2, 3]. Even if the absolute number of deaths was low in the patients who fulfilled the inclusion criteria of these trials [35], this strategy greatly influenced the estimate of the mean FVC decline in the whole population, as patients who died were attributed a huge, unrealistic FVC decline. In these trials, the mean FVC decline (expressed as a percentage of the predicted value) measured in patients with complete follow-up was around 5%. For patients who died, assuming that their baseline FVC value was 70% predicted, their FVC decline was then set to 70%. For instance, in the CAPACITY-004 trial, decline in FVC % predicted in the pirfenidone group was 4.4% when missing data were ignored compared to 8.0% when imputation of missing data was done, which is almost twice the decline. When the absolute decline in FVC was expressed in millilitres, the estimate for the pirfenidone group was 169 mL without imputation as compared to 318 mL with imputation. Figure 1 shows the FVC decline over time in the two treatment arms of the CAPACITY-004 trial according to the method of imputation [19]. Not only does the imputation strategy influence the estimation of FVC decline in both treatment groups but it may also impact the estimate of the treatment effect. For instance, in the CAPACITY-004 trial, the effect size of treatment increased from 2.1% to 4.4% with imputation; however, both were statistically significant (p=0.001 and p=0.007 with and without imputation, respectively).
Patients who had no FVC data available for reasons other than death were given imputed values according to the sum of squared differences method. This method replaces missing data with imputed data based on the average measurements for “similar” patients at the given time-point, namely the three patients with the smallest sum of squared deviations from that patient for all visits before the one with the missing data.
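The sum of squared differences rule can be sketched in a few lines. This is an illustrative reading of the description above, not the trial sponsors' code: the function name, the data layout and the handling of ties are assumptions, and the sketch does not address which patients are allowed to act as donors (for example, whether donors must come from the same treatment arm).

```python
def ssd_impute(target_history, donors, visit, k=3):
    """Impute a missing FVC value at index `visit` for one patient.

    target_history: the patient's FVC values at visits 0 .. visit-1
    donors: FVC series of other patients, each observed through `visit`
    k: number of most similar donors to average (three in the text above)
    Hypothetical helper for illustration only.
    """
    if len(donors) < k:
        raise ValueError("not enough donors with data at this visit")

    def ssd(donor):
        # Sum of squared deviations over all visits before the missing one.
        return sum((donor[i] - target_history[i]) ** 2 for i in range(visit))

    nearest = sorted(donors, key=ssd)[:k]
    return sum(d[visit] for d in nearest) / k


# Hypothetical FVC % predicted series; the target patient misses visit 3.
target = [70, 68, 66]
donors = [
    [70, 68, 66, 64],
    [71, 69, 67, 65],
    [69, 67, 65, 63],
    [90, 89, 88, 87],
    [50, 48, 45, 40],
]
imputed = ssd_impute(target, donors, visit=3)
print(imputed)
```

Because donors must themselves have been measured at the visit in question, the imputed value inherits the trajectory of patients who stayed in the study, which may be optimistic for a patient who dropped out because of progression.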
In these trials, analysis of the treatment effect involved a ranked ANCOVA in which each original data value is replaced by its rank (from 1 for the smallest to N for the largest). This method accommodates non-normal data. In this analysis, missing values owing to death were assigned the worst rank, with early deaths ranked worse than later deaths. This method makes the reasonable assumption that patients who died had worse FVC than patients who did not.
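The rank assignment described above can be sketched as follows. Only the ranking step is shown, not the ANCOVA itself. The data layout is hypothetical, FVC change is expressed so that more negative values are worse, and ties are assumed not to occur.

```python
def worst_rank(patients):
    """Return {patient id: rank}, where rank 1 is the worst outcome.

    Each patient is a dict with keys 'id', 'change' (FVC change, None if the
    patient died) and 'death_week' (None if alive at the end of the study).
    Hypothetical helper for illustration only; assumes no ties.
    """
    # Deaths receive the worst ranks, earliest death first.
    deaths = sorted(
        (p for p in patients if p["death_week"] is not None),
        key=lambda p: p["death_week"],
    )
    # Survivors are ranked by FVC change, largest loss first.
    alive = sorted(
        (p for p in patients if p["death_week"] is None),
        key=lambda p: p["change"],
    )
    return {p["id"]: rank for rank, p in enumerate(deaths + alive, start=1)}


patients = [
    {"id": "A", "change": -2.0, "death_week": None},
    {"id": "B", "change": -12.0, "death_week": None},
    {"id": "C", "change": None, "death_week": 40},
    {"id": "D", "change": None, "death_week": 10},
    {"id": "E", "change": -6.0, "death_week": None},
]
ranks = worst_rank(patients)
print(ranks)
```

A useful property of this step is that the ranks do not depend on the numerical value imputed for patients who died, only on their ordering relative to survivors.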
Other strategies to handle missing data
Table 1 summarises the methods used to account for missing data in the primary end-point for trials assessing treatments for IPF with FVC decline as the primary end-point. A historically popular approach is the last observation carried forward method [30, 33], in which the last FVC value recorded before the visit where FVC is missing is used. This method has its own limitations, however, since it assumes that there was no change in FVC value between the last point available and the time the patient dropped out [36]; this method has been mostly abandoned. Other strategies to impute data missing because of dropouts have been more recently advocated, such as reference-based imputation [37, 38]. These methods may accommodate data missing not at random and rely on hypothesised behaviour of the outcome after dropout. For instance, one may consider the so-called “jump to reference” scenario, where patients who withdraw from the experimental arm would behave as similar patients from the control arm. Several other “patterns” have been studied and can be used as sensitivity analyses [39].
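The last observation carried forward rule mentioned above is simple enough to write out in full, which also makes its central assumption visible. The function name and the example series are illustrative.

```python
def locf(values):
    """Fill missing visits (None) with the most recent observed value.
    Visits before the first observation stay None."""
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical FVC series in mL; the patient drops out after the fourth visit.
series = [2800, 2750, None, 2600, None, None]
filled = locf(series)
print(filled)
```

The filled series is flat after dropout: a patient who was losing FVC at every observed visit is treated as if the decline stopped at withdrawal, which is the limitation noted above.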
Joint modelling of the longitudinal (FVC over time) and time-to-event (survival) outcomes has recently been proposed, and could provide more accurate estimates of FVC decline when nonrandom dropouts occur. As stated above, the traditional linear mixed-effect model used, for instance, in the INPULSIS trials assumes that missing FVC data are missing at random, which is untenable, as FVC values and FVC evolution are associated with survival. The basic idea of joint modelling is to evaluate both outcomes simultaneously and borrow strength from each outcome to better estimate the effect of treatment on the other. In the present case, it allows recovery of information from a survival model that includes FVC as a time-varying covariate, to estimate the evolution of FVC. However, the use and interpretation of these models are not straightforward [40].
Mortality
In the aforementioned trials, vital status could not be measured in some patients because some of them received a lung transplant before the end of the study. For instance, in the ASCEND trial, six patients underwent lung transplantation in the pirfenidone arm compared with only one in the placebo arm [2]. To further explore the validity of the findings, the investigators face several choices. The first possibility is to treat transplantation as a censoring event, as in a traditional survival analysis where the patients still alive at the last follow-up are censored. This approach assumes that the distribution of the survival times of the patients who received a transplant (had they not received a transplant) would be the same as that of the patients who were followed up till death, which is called noninformative censoring. This assumption is untenable in IPF, as it is clear that being listed for a lung transplant is a strong indicator of a high risk of death (informative censoring). Another method, used, for instance, by the investigators of the ASCEND study, is to disregard the lung transplant event and to record the vital status of all patients whether they receive a transplant or not. Although statistically correct, this approach is of dubious value as, despite the patient being alive, their being in need of a transplant is not really what one would expect from an effective IPF treatment. Moreover, lung transplantation reduces the risk of death in these patients [41] and when death occurs following lung transplant it is largely unrelated to the underlying disease. The last solution is the worst case scenario, counting transplantations as deaths; in other words, this approach assumes that the patient would have died the day they underwent a transplant, had the transplant not been done. These three different approaches give obviously different survival figures as well as treatment estimates and all yield biased estimates of the true unknown survival.
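The three ways of handling transplantation can be compared on a toy cohort with a minimal Kaplan-Meier estimator. The cohort, the 52-week follow-up and the function names are hypothetical, and the sketch assumes that a transplant, when it occurs, precedes any death.

```python
END = 52  # assumed end of follow-up, in weeks

def km_survival(times, events):
    """Kaplan-Meier survival probability at the end of follow-up."""
    surv = 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1 - n_events / at_risk
    return surv

def code_patient(tx_week, death_week, strategy):
    """Return (time, event) for one patient under a given strategy."""
    last = death_week if death_week is not None else END
    if strategy == "ignore" or tx_week is None:
        return last, death_week is not None  # follow vital status throughout
    if strategy == "censor":
        return tx_week, False                # transplant ends follow-up
    if strategy == "worst_case":
        return tx_week, True                 # transplant counted as a death
    raise ValueError(strategy)

# Hypothetical patients: (transplant week or None, death week or None).
cohort = [(None, 20), (None, None), (30, None), (25, 45), (None, None), (None, 40)]

results = {}
for strategy in ("censor", "ignore", "worst_case"):
    times, events = zip(*(code_patient(tx, d, strategy) for tx, d in cohort))
    results[strategy] = km_survival(times, events)
    print(f"{strategy:>10}: {results[strategy]:.3f}")
```

On this toy cohort the three codings give three different 52-week survival estimates from the same six patients, with the worst-case coding giving the lowest. The ordering of the other two depends on how many transplanted patients die during follow-up.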
Another issue regarding transplantation is that not all patients have the same likelihood of receiving a transplant. According to their characteristics, some might be less likely to receive transplants than others and some may even not be eligible for transplantation at all, as eligibility for lung transplantation depends not only on several patient-related factors (age and comorbidities) but also practices that may vary across centres and countries. It must be noted, however, that in these studies, several sensitivity analyses were performed in order to test the robustness of the results. In the mortality analysis with the worst-case handling of lung transplantations (i.e. transplants that were counted as deaths), the risk was 37% lower at 1 year in the pirfenidone group than in the placebo group (hazard ratio 0.63, 95% CI 0.40–0.99). In the mortality analysis in which patients who underwent transplantation were followed and the data were not censored, the hazard ratio was 0.51 (95% CI 0.31–0.86), close to the results of the pre-specified analysis of all-cause mortality (hazard ratio 0.52, 95% CI 0.31–0.87) [42].
Approaches have been developed to obtain more accurate estimates of survival times of patients while accounting for informative censoring due to transplantation, such as inverse probability of censoring weighting (IPCW) [43]. The basic idea is to weight observations based on their likelihood of being incomplete (because of lung transplant); that is, to reweight cases from underrepresented groups. This approach assigns time-dependent weights to every patient still under follow-up, equal to the inverse of the probability that a patient with similar characteristics would still be untransplanted at that time. Patients who did not receive a lung transplant (but had a high probability of transplantation) are thus given more weight to account for the attrition of patients like them due to transplant. In the context of a different lung disease, the use of the IPCW method yielded survival estimates that fell between a traditional survival analysis that would treat transplant as a censoring event and the worst-case scenario [44].
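The weights can be sketched under strong simplifying assumptions. Here "patients with similar characteristics" is reduced to membership of a single stratum, and the probability of remaining untransplanted is estimated with a Kaplan-Meier estimator in which transplant is the event. Published IPCW analyses typically model this probability with covariates and may evaluate it just before each time-point; the function names and data below are hypothetical.

```python
def prob_untransplanted(times, transplanted, t):
    """Kaplan-Meier estimate, within one stratum, of the probability of
    remaining untransplanted through time t.

    times: follow-up time of each patient in the stratum
    transplanted: True if follow-up ended with a transplant
    """
    p = 1.0
    for s in sorted({u for u, tx in zip(times, transplanted) if tx and u <= t}):
        at_risk = sum(1 for u in times if u >= s)
        n_tx = sum(1 for u, tx in zip(times, transplanted) if tx and u == s)
        p *= 1 - n_tx / at_risk
    return p

def ipcw_weight(times, transplanted, t):
    """Weight at time t for a patient of this stratum still under follow-up."""
    p = prob_untransplanted(times, transplanted, t)
    if p == 0:
        raise ValueError("weight undefined: everyone at risk was transplanted")
    return 1.0 / p


# Hypothetical stratum of transplant candidates (weeks, transplanted?).
cand_times = [10, 20, 30, 40]
cand_tx = [True, False, True, False]
# Hypothetical stratum of patients not eligible for transplantation.
inel_times = [15, 35, 52]
inel_tx = [False, False, False]

w_candidate_early = ipcw_weight(cand_times, cand_tx, t=15)
w_candidate_late = ipcw_weight(cand_times, cand_tx, t=35)
w_ineligible = ipcw_weight(inel_times, inel_tx, t=35)
print(w_candidate_early, w_candidate_late, w_ineligible)
```

Transplant candidates who remain under follow-up receive weights above 1 that grow over time, standing in for similar patients removed by transplantation, while patients in a stratum where nobody is transplanted keep a weight of 1. A full analysis would then use these weights in a weighted survival estimator, which is not shown here.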
We think that the limitations that we have underlined do not invalidate the results of these trials (leading to the approval of both pirfenidone and nintedanib) because they were backed up by numerous pre-defined sensitivity analyses showing that the results were consistent whatever the imputation strategy employed.
A need for uniform recommendations for handling of missing data in IPF trials
The use of imputation strategies that differ from trial to trial may confuse the interpretation of results. For instance, in his editorial on the ASCEND and INPULSIS trials, Hunninghake [45] pointed out that the pirfenidone group in the ASCEND study had a greater annual FVC decline (235 mL) than did the placebo groups of both INPULSIS studies (205 mL). This observation led to speculation about inclusion criteria or recruitment issues, but these apparent discrepancies can be explained by differences in the imputation strategies for the two trials. In fact, when no imputation strategy was used in either trial, the annual FVC declines in the active groups of the ASCEND and INPULSIS trials were 164 and 95 mL·year−1, respectively, whereas the FVC declines in the placebo groups were 280 and 205 mL·year−1, respectively. However, it has to be noted that, even if imputation strategies were identical, comparing the effects of two drugs tested in two separate trials would still be likely to give misleading estimates. The only way to decide whether one drug is better than another is to carry out adequately powered head-to-head studies.
Every effort should be made to limit missing data in clinical trials. Nevertheless, missing data will not disappear in future IPF (and not only IPF) trials. Although trial designs in IPF will differ, as they will need to be tailored to the specific intervention, the mechanism of action, the patient population and various other factors, we advocate a consensus statement about the reporting of these trials with uniform strategies regarding the handling of missing data and the analysis of efficacy end-points.
Footnotes
Conflict of interest: Disclosures can be found alongside the online version of this article at erj.ersjournals.com
- Received November 29, 2014.
- Accepted May 9, 2015.
- Copyright ©ERS 2015