Abstract
Joint modelling is a valuable method to accurately estimate the association between a time-varying biomarker and a survival outcome. The idea of the joint modelling closely reflects contemporary trends in medicine: personalised and dynamic. https://bit.ly/3pbMBFX
Introduction
A commonly used approach to study the association between a time-varying biomarker, such as forced vital capacity (FVC), and a time-to-event outcome, such as new onset of COPD, is to incorporate the biomarker as a time-dependent covariate in a Cox model. This approach, which is known as the time-dependent Cox model (TDCM), requires knowledge of the value of the time-varying biomarker at all time points at which the event of interest occurs [1]. However, in clinical studies, longitudinal biomarker measurements are taken intermittently during scheduled (and sometimes unscheduled) visits, meaning that imputation of missing values is required for those event times at which the biomarker is not observed. In practical application of the TDCM, this is achieved by carrying forward the most recent biomarker measurement. While the use of last observation carried forward (LOCF) is easy to understand and implement, the resulting step function (example in figure 1c) is unlikely to provide a good approximation of the true biomarker trajectory. Consequently, the use of LOCF in the context of the TDCM has been shown to result in an underestimation of the true effect of that biomarker [2]. For example, when studying whether the amount of gluten intake during the first 5 years of life was associated with coeliac disease autoimmunity and coeliac disease in genetically at-risk children, Andrén Aronsson et al. [3] found that the TDCM greatly underestimated the effect of this variable as compared to an alternative approach that was based on the joint modelling of the longitudinal and time-to-event data, with estimated values of the hazard ratio (HR) of 1.14 and 1.30, respectively. In clinical terms, the 6% to 7% increase in absolute risk of coeliac disease for a small 1 g per day increase in gluten intake as estimated by the joint model appeared to be clinically meaningful, while the attenuated effect estimated by the TDCM might not have surpassed the minimum clinically important difference. How this alternative approach based on joint modelling of longitudinal and time-to-event data works and of how it can be applied in the setting of respiratory medicine is described in the next sections.
a) Schematic representation of the joint model (the subscript i is used as the subject index and the subscript j is used as the measurement moment index; tij denotes the time of the j-th measurement moment for the i-th subject). Difference between the b) joint model (JM) and c) time-dependent Cox model (TDCM) in approximating the biomarker trajectory. The black line represents the unobserved true biomarker trajectory for a particular subject. Asterisks represent the observed biomarker measurements for that subject. The red line in panel (b) represents the estimated biomarker trajectory by the JM. The red line in panel (c) represents the estimated biomarker trajectory by the TDCM. The x-axis is the follow-up time. The y-axis is the biomarker value.
How does it work?
A schematic representation of the joint model [4, 5] is provided in figure 1a. At the heart of the joint model are the subject-specific biomarker trajectories mi(t) that provide, for a given subject i, the value of the subject's time-varying biomarker at time t. Given mi(t), the association between the time-varying biomarker and the risk for an event can be described through the proportional hazards modelwhere exp(β) is the HR for a unit increase in the value of the biomarker at time t. The hazard function hi(t) may also depend on other covariates, such as age and gender, for which additional terms can be included.
In clinical studies, the value of a time-varying biomarker is not observed in between two measurement moments. In addition, a subject's observed biomarker measurement at a particular time point may be expected to fluctuate around the “true” value of the subject's biomarker trajectory at that time point due to short-term biological variation and variability in the measurement process (e.g. intra- and inter-assay variability). The joint model addresses this issue by approximating the true but unknown biomarker trajectories with linear regression equations. If we assume, for simplicity of exposition, that these trajectories change linearly with time, the model for mi(t) can be specified aswhere ai and bi are the intercept and slope (modelled as random coefficients that are distributed according to a bivariate normal distribution) of the biomarker trajectory of the i-th subject. This approach is illustrated in figure 1b, where the red line represents the modelled biomarker trajectory (i.e. the fitted regression line) for a particular subject and the asterisks represent the observed biomarker measurements for that subject. The random fluctuation of the observed measurements around the modelled biomarker trajectory is represented by a normally distributed error term.
The different components of the joint model (i.e. the parameters of the hazard function, the parameters of the bivariate normal distribution of the intercepts and slopes of the subject-level biomarker trajectories, and the standard error of the normally distributed error term) are jointly estimated from the available longitudinal and survival data by maximising the joint likelihood of these observations. For a more detailed discussion of the construction of the joint likelihood function and the available estimation procedures, we refer to Rizopoulous [4].
Real example
Data
SERVE-HF was a multicentre randomised clinical trial that included 1325 patients with heart failure and reduced ejection fraction (ejection fraction ≤45%) and predominant central sleep apnoea [6]. Patients were randomised into an adaptive servo-ventilation (ASV) group (n=666) and a control group (n=659). Clinical visits took place at study entry, after 2 weeks, at 3 and 12 months, and every 12 months thereafter until the end of the study.
Objective
Our goal was to estimate the association between the current value of a subject's central apnoea-hypopnoea index (cAHI) trajectory and cardiovascular mortality. All analyses were performed in the ASV group because follow-up cAHI measurements were limited to this group. TDCM and joint model were fitted and compared. Two covariates, left ventricular ejection fraction (LVEF; ≤30%, 31–36%, or >36%) and Cheyne–Stokes respiration (CSR; <20%, 20–50%, or >50%), were included as covariables and potential interactions with current value of cAHI trajectory [7, 8].
Results and interpretation
We first fitted a TDCM and found that the association between the current value of cAHI and cardiovascular mortality differed in the LVEF subgroups (pinteraction=0.02). Higher cAHI in the follow-up was not associated with outcome in patients with LVEF ≤30% (HR per 5 units: 0.96, 95% CI 0.85–1.08). It was, however, significantly associated with increased risk of cardiovascular mortality in those with LVEF 31–36% (HR per 5 units: 1.28, 95% CI 1.13–1.45). This association disappeared again in patients with LVEF >36%, (HR per 5 units: 1.19, 95% CI 0.97–1.45).
The joint model was then fitted to the same data. Compared to the TDCM, the interaction effect between the current value of the cAHI trajectory and LVEF was more evident under the joint model (pinteraction<0.01). Higher cAHI in the follow-up indicated lower risk of cardiovascular mortality in patients with LVEF ≤30%, although the effect was still not statistically significant (HR per 5 units: 0.82, 95% CI 0.61–1.10). In the other two LVEF subgroups, higher cAHI values were significantly associated with increased risk of cardiovascular mortality (LVEF 31–36%, HR per 5 units: 1.35, 95% CI 1.07–1.69; LVEF >36%, HR per 5 units: 1.44, 95% CI 1.03–2.00).
By comparing the results from the two models, we found that the estimated hazard ratios from TDCM were attenuated as expected based on the statistical literature. This attenuation might conceal a true harmful effect of increased cAHI on cardiovascular mortality in patients with LVEF >36%, thus missing the indication that lowering cAHI by wearing ASV might improve the outcome in this subgroup.
Current and future application of joint modelling in respiratory medicine
By simultaneously modelling longitudinal and survival data, joint modelling can also be used to account for informative dropout when the longitudinal outcome is of main interest. This usage of joint modelling has already received some attention in respiratory medicine, especially in scleroderma-related interstitial lung disease (SSc-ILD) [9, 10]. However, joint modelling has seldom been used to examine the association between a time-varying biomarker and risk of experiencing a clinical event. Instead, such analyses were conducted using simpler methods. For example, in the SSc-ILD field, several studies identified change in FVC% as an independent predictor of mortality [11–16]. In these studies, change in FVC% was defined as the percentage change from baseline to a specific landmark time point (mostly 6 months to 12 months). Clinicians found that the prognostic value of change in FVC% over 12 months was highly variable across studies [16], possibly because more patients dropped out or died over a longer follow-up. Joint modelling would be a better choice here to study the association of the change in FVC% with survival by including all the available information and accounting for the measurement error of FVC.
Compared to the TDCM, the use of joint modelling has shown to result in less biased estimates of the association between a time-varying biomarker and a time-to-event outcome [2]. However, the approach does require the analyst to make additional assumptions that could adversely affect these estimates if left unchecked. For example, previous studies have shown that the estimated value of the hazard ratio could be severely biased when the shape of the biomarker trajectory is misspecified [2, 17]. This issue can be addressed by considering a more flexible function of time, e.g. polynomials or splines, for the subject-level biomarker trajectories. A similar bias has shown to occur when the baseline hazard is misspecified [2, 4]. The joint model is, however, relatively robust to misspecification of the distribution of the random coefficients of the subject-level biomarker trajectories [2, 18]. Another important aspect of joint modelling is the appropriate specification of the relationship between the biomarker trajectory and the risk for an event. Until now, we have assumed that the risk for an event depends on the current value of the biomarker trajectory. However, other specifications, such as where the hazard is modelled as a function of the slope of the biomarker trajectory or the cumulative area under the biomarker trajectory, are also possible [4]. Which specification of the relationship to use can be decided based upon statistical criteria like the Bayesian information criterion or can be based on reasoning regarding the underlying biological mechanisms.
While joint modelling may be more complicated to understand and utilise than some of the simpler methods, functionality to perform such analyses is nowadays implemented in most major statistical software packages, such as the JM [19] and joineR [20] packages in R, the %JM macro [21] in SAS, and the stjm command [22] in Stata. The availability of such software to fit a range of flexible models (e.g. competing risk, cure rates) and to account for epidemiological biases (e.g. late-entry, interval-censored) is likely to facilitate the application of joint modelling even further. Precision medicine, including the development of biomarkers, is increasingly influencing respiratory medicine, e.g. PDL1 in lung cancer, FENO in asthma, and fibrinogen in COPD [23]. Joint modelling is likely to gain more popularity because it can easily update a patient's prognosis after each visit in which new biomarker measurements are taken. This strategy more closely reflects contemporary trends in medicine: personalised and dynamic. Also, from the perspective of statistical design, joint modelling, by incorporating longitudinal biomarker information, may lead to a reduction of bias and increase in power in estimating treatment effects in randomised controlled trials [24].
Shareable PDF
Supplementary Material
This one-page PDF can be shared freely online.
Shareable PDF ERJ-03206-2020.Shareable
Footnotes
Conflict of interest: Y. Chen has nothing to disclose.
Conflict of interest: D. Postmus has nothing to disclose.
Conflict of interest: M.R. Cowie has nothing to disclose.
Conflict of interest: H. Woehrle has nothing to disclose.
Conflict of interest: K. Wegscheider has nothing to disclose.
Conflict of interest: A.K. Simonds has nothing to disclose.
Conflict of interest: M. Boezen has nothing to disclose.
Conflict of interest: V.K. Somers has served on a steering committee for SERV HF, outside the submitted work.
Conflict of interest: H. Teschler has nothing to disclose.
Conflict of interest: C. Eulenburg has nothing to disclose.
- Received August 19, 2020.
- Accepted November 4, 2020.
- Copyright ©ERS 2021