Abstract
Background The gender–age–physiology (GAP) model was developed to predict the risk of death. Comorbidities are common in idiopathic pulmonary fibrosis (IPF) and may impact on survival. We evaluated the ability of comorbidities to improve prediction of survival in IPF patients beyond the variables included in the GAP model.
Methods We developed a prediction model named TORVAN using data from two independent cohorts. Continuous and point-score prediction models were developed with estimation of full and sparse versions of both. Model discrimination was assessed using the C-index, and calibration was assessed by comparing predicted and observed cumulative mortality at 1–5 years.
Results Discrimination was similar for the sparse continuous model in the derivation and validation cohorts (C-index 71.0 versus 70.0, respectively), and the model significantly improved upon the performance of the GAP model in the validation cohort (increase in C-index of 3.8, p=0.001). In contrast, the sparse point-score model did not perform as well in the validation cohort (C-index 72.5 in the derivation cohort versus 68.1 in the validation cohort), but still significantly improved upon the performance of the GAP model (C-index increased by 2.5, p=0.037).
Conclusions The inclusion of comorbidities in TORVAN models significantly improved the discriminative performance in prediction of risk of death compared to GAP.
This is the first ever validated clinical prediction model and point score index for all-cause mortality in IPF to include comorbidity variables. Their inclusion significantly improved prediction of survival beyond demographic and physiological parameters. http://ow.ly/H6Dn30mZsxh
Introduction
Idiopathic pulmonary fibrosis (IPF) is a rare lung disease of unknown aetiology, characterised by irreversible, progressive fibrosis of the lungs that leads to worsening lung function [1, 2]. Prognosis of IPF is very poor, with a median survival estimated at 3–5 years, which is worse than many types of cancer [3–5]. However, there is substantial heterogeneity in risk of death among individual patients, with survival times ranging from <1 year to >10 years [6–8]. Accurate prediction of survival in IPF is important both for patient counselling and for informing management decisions.
Several studies have reported predictors of survival in IPF, either alone or in combination, the latter usually through multivariable risk prediction models [9–13]. The gender–age–physiology (GAP) model is the most widely validated multivariable prediction model for mortality in IPF; it includes age, sex, percentage of predicted forced vital capacity (FVC) and percentage of predicted diffusing capacity of the lung for carbon monoxide (DLCO) [14]. While this model has demonstrated consistent prediction across multiple cohorts, its discriminative performance is modest. Although the GAP model is intended to predict all-cause mortality in IPF, it does not account for non-respiratory causes of death. This is important as only 60–70% of patients with IPF die from causes directly related to IPF [7], and the remaining deaths may be due to other comorbid diseases present in this older population. In addition, comorbid diseases have been reported to interact with IPF and other progressive interstitial lung diseases (ILDs) to increase the risk of both IPF and non-IPF mortality [15, 16]. Comorbidities are common in IPF, and several have been shown to be associated with survival in IPF; the most notable examples include lung cancer, pulmonary hypertension and cardiovascular diseases [17–25]. However, the ability of these comorbidities to improve survival prediction in IPF, beyond basic demographics and measures of disease severity (i.e. pulmonary function), has not been systematically evaluated.
In this study (the TORVAN study), we evaluated the ability of comorbidities to improve prediction of overall survival in patients with IPF beyond those variables included in the GAP model. To do this, we derived and validated multivariable prediction models that considered comorbidities, in addition to the GAP variables, in two large, multinational, independent cohorts of IPF. We then evaluated their predictive performance, including discrimination and calibration, in comparison to the GAP models in order to assess the contribution of comorbidities to survival prediction in IPF. The online TORVAN index calculator tool is available at www.polmoraresicilia.it/torvan.
Methods
Study patients
The study population consisted of 931 consecutive patients with IPF evaluated at four international academic ILD centres. Data were retrospectively extracted from clinical medical records. All patients were required to have received a diagnosis of IPF according to established criteria [1, 26]. Patients were then divided into a derivation and a validation cohort. The derivation cohort included a total of 476 patients diagnosed at the Regional Referral Centre for Interstitial and Rare Lung Diseases of the University of Catania (Catania, Italy) (n=126); the department of respiratory medicine of the Erasmus Medical Center (University Medical Center Rotterdam, Rotterdam, the Netherlands) (n=91); and the Centre for Interstitial and Rare Lung Diseases (Thoraxklinik, University of Heidelberg, Heidelberg, Germany) (n=259) between January 2004 and December 2016. The validation cohort included 455 patients diagnosed at the University of California, San Francisco (San Francisco, CA, USA) between January 2007 and March 2017. Some patients in the derivation cohort were previously included by Kreuter et al. [15] in their study on the establishment of a comorbidome in IPF, while some others (n=228) in the validation cohort were already included in the derivation of the GAP model [14]. This excluded the possibility of a self-validation of the study.
Pulmonary function tests
Pulmonary function tests (PFTs) were performed according to American Thoracic Society/European Respiratory Society criteria [27]. Only patients with PFTs completed within 3 months of the time of diagnosis were included in the analysis. As in the GAP model, FVC % pred and DLCO % pred were considered as potential predictors of prognosis and if patients were found to be unable to perform DLCO, this was considered as a further indicator of worse prognosis.
Comorbidities
In the derivation cohort, data on comorbidities and related treatments were routinely collected at baseline visits through direct questioning of the patient (including standardised questionnaires) and a systematic analysis of related medical reports and exams. Comorbidities collected in the derivation cohort included systemic hypertension; coronary artery disease; cerebrovascular diseases; atrial arrhythmias; valvular heart diseases defined as mitral, tricuspid or aortic stenosis or regurgitation assessed through echocardiography; venous thromboembolism; peripheral vascular disease; emphysema defined as areas of decreased attenuation in comparison with contiguous normal lung assessed using computed tomography [28]; diabetes mellitus; gastro-oesophageal reflux disease (GORD) assessed by direct questioning about symptoms and use of proton-pump inhibitors/histamine 2 blocker drugs and/or evaluation with 24-h pH monitoring and endoscopy; pulmonary hypertension defined as mean pulmonary artery pressure of ≥25 mmHg on right heart catheterisation or estimated systolic pulmonary artery pressure of ≥40 mmHg according to the criteria of Galiè et al. [29]; sleep apnoea assessed using polysomnography; major depressive disorder assessed through medical reports and related drugs; dyslipidaemia; hypo-/hyperthyroidism; lung cancer; kidney failure; and liver failure. These comorbidities represented the candidate comorbidity variables for the derivation model. In the validation cohort, only comorbidities selected as important for survival prediction in the derivation cohort were collected, through a combination of retrospective review of the medical record and an intake ILD questionnaire that specifically asked patients about a history of GORD, diabetes, pulmonary hypertension and obstructive sleep apnoea. Only comorbidities present at the time of diagnosis were considered for the analysis.
Outcome
The primary outcome was survival, defined as the time from initial diagnosis to death, with right-censoring at the time of lung transplantation or, for patients who remained alive and transplant-free, at the end of the observation period.
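The outcome definition above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the argument names, the 365.25-day year and the rule that a death recorded on the same day as a censoring date counts as a death are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumed conversion factor, not taken from the paper


def transplant_free_survival(
    diagnosis: date,
    end_of_observation: date,
    death: Optional[date] = None,
    transplant: Optional[date] = None,
) -> Tuple[float, int]:
    """Return (follow-up in years, event flag); event 1 means death was observed."""
    # Candidate end points as (date, event flag).
    # Lung transplantation and the end of observation both right-censor.
    candidates = [(end_of_observation, 0)]
    if transplant is not None:
        candidates.append((transplant, 0))
    if death is not None:
        candidates.append((death, 1))
    # The earliest date ends follow-up; on a same-day tie, death is kept (assumption).
    end, event = min(candidates, key=lambda c: (c[0], -c[1]))
    return (end - diagnosis).days / DAYS_PER_YEAR, event
```

A patient transplanted before death therefore contributes follow-up only up to the transplant date, with the event flag set to 0.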
Statistical analysis
The distributions of baseline continuous variables were reported as mean±SD and compared between cohorts using the t-test. For baseline binary variables, the number and percentage of the cohort were reported and compared between cohorts using the chi-squared test.
Multivariable Cox proportional hazards models for transplant-free survival were estimated in the derivation cohort using the least absolute shrinkage and selection operator (LASSO) [30]. We chose LASSO analysis because it can improve the prediction accuracy and interpretability of regression models. LASSO forces the sum of the absolute values of the regression coefficients to be less than a fixed value, which sets certain coefficients to zero. As a result, the fitting process selects only a subset of the provided covariates for the final model rather than using all of them. Age, sex, baseline FVC and baseline DLCO, as well as all comorbidity variables, were considered as potential predictors. The LASSO Cox model was first estimated using continuous variables (e.g. age, FVC and DLCO) as observed; in addition, we categorised the continuous variables, re-estimated the model and re-scaled the resulting coefficients, generating point scores. Full and parsimonious (sparse) versions of both models were estimated, respectively minimising cross-validated prediction error and obeying a more parsimonious criterion accounting for simulation variability in the cross-validation. In a final step, model results were used to estimate probability of transplant-free survival 1–5 years after diagnosis for patients in both the derivation and validation cohorts.
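The distinction between the full and sparse models can be made concrete with a short Python sketch. The description above resembles the two penalty values that cross-validation in glmnet reports (`lambda.min` and `lambda.1se`), but the text does not name the criterion, so reading the sparse model as the "one standard error" rule is an assumption. The function name and the input lists are hypothetical.

```python
def select_penalties(lambdas, cv_error, cv_se):
    """Pick two LASSO penalties from cross-validation results.

    lambdas  : candidate penalty values
    cv_error : mean cross-validated error for each penalty
    cv_se    : standard error of that mean for each penalty
    Returns (penalty minimising CV error, largest penalty within 1 SE of it).
    """
    # "Full" model: the penalty with the lowest cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_error[i])
    # "Sparse" model (assumed rule): the strongest penalty whose error is
    # still within one standard error of the minimum.
    threshold = cv_error[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_error) if err <= threshold)
    return lambdas[i_min], lambda_1se
```

Because a larger penalty shrinks more coefficients to exactly zero, the second value yields a model with fewer predictors at a small cost in cross-validated error.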
Model discrimination was evaluated using the C-index, with 95% confidence intervals estimated using bias-corrected bootstrap resampling with 500 repetitions. The LASSO models were compared in terms of the C-index to the GAP model, re-estimated using the derivation cohort, using bootstrapping to evaluate differences. Model calibration was evaluated by comparing model-based and nonparametric Kaplan–Meier transplant-free survival estimates at years 1–5, by quartile of model-estimated risk for the continuous models and approximate quartiles of point scores for the point-score models. We also formally compared the Kaplan–Meier survival rates across quartiles using the log-rank test. LASSO was implemented using the glmnet package version 1.0 in R version 3.4.3 (www.R-project.org). All other analyses were performed using STATA version 15.0 (StataCorp, College Station, TX, USA).
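For readers unfamiliar with the C-index, the following Python sketch shows one common way to compute it for right-censored data (Harrell's concordance). It is an illustration only: the exact estimator and tie handling used in the analysis are not described in the text, the function and argument names are hypothetical, pairs with identical follow-up times are simply skipped, and the bootstrap confidence intervals are not reproduced. The result is scaled to 0–100 to match the values reported in this article.

```python
def c_index(times, events, risk):
    """Harrell-type concordance index on a 0-100 scale.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if censored
    risk   : model-predicted risk score (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only when patient i is observed to die
            # strictly before patient j's follow-up ends.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier death had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return 100.0 * concordant / comparable
```

A value of 50 corresponds to predictions no better than chance and 100 to perfect ranking of patients by survival time.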
Results
Cohort characteristics
Characteristics of both cohorts are reported in table 1. Compared to the validation cohort, patients in the derivation cohort were, on average, younger and had higher baseline FVC % pred, but had similar DLCO % pred. Fewer patients in the derivation cohort had GORD, while more had lung cancer, pulmonary hypertension, cerebrovascular disease, diabetes and systemic hypertension. The proportions of patients with atrial arrhythmias, valvular heart disease and depression were similar between the two cohorts. Median follow-up time was comparable between the cohorts (2.9 versus 2.5 years, p=0.95). A higher proportion of patients died in the derivation cohort (57% versus 41%) while fewer patients underwent lung transplantation (1.26% versus 13%). Overall median transplant-free survival was shorter in the derivation cohort compared to the validation cohort (3.7 versus 4.6 years, log-rank p=0.001).
Model derivation and variable selection
Variables selected, and their effect sizes, for each version of the model (full versus sparse and continuous versus point-score) are shown in table 2. All models selected age, FVC, DLCO, GORD (the presence of which was protective), pulmonary hypertension, lung cancer, valvular heart disease and atrial arrhythmias as important for survival prediction. The full models also selected diabetes, cerebrovascular disease, systemic hypertension and major depressive disorder. None of the models selected sex. Comorbidities not selected in any of the models included coronary artery disease, venous thromboembolism, peripheral vascular disease, emphysema, sleep apnoea, dyslipidaemia, hypo-/hyperthyroidism, kidney failure and liver failure.
Model performance and external validation
Model discrimination in the derivation and validation cohorts compared to the GAP model is shown in table 3. Discrimination was similar for the sparse continuous model in the derivation and validation cohorts (C-index 71.0, 95% CI 67.8–74.2 versus 70.0, 65.6–74.3, respectively), and the model significantly improved upon the performance of the GAP model in the validation cohort (increase in C-index 3.8, p=0.001). In contrast, the sparse point-score model did not perform as well in the validation cohort (C-index 72.5, 95% CI 69.5–75.6 in the derivation cohort compared to 68.1, 65.1–72.1 in the validation cohort), but still significantly improved upon the performance of the GAP model (increase in C-index 2.5, p=0.037). The full versions of the continuous and point-score models demonstrated similar discrimination to the sparse versions, without appreciable improvement in discrimination despite inclusion of more variables. Table 4 shows how the TORVAN index and stage are calculated.
Model calibration in years 1–5 for both cohorts is shown in figure 1 for the sparse models and supplementary figure S1 for the full models. In general, all models tended to overestimate risk of death at each time point in the validation cohort. Kaplan–Meier survival plots, by quartile of risk, for both cohorts, are shown in figure 2 for the sparse models and supplementary figure S2 for the full models. Survival by these groupings was significantly different for all models in both cohorts (log-rank p-value <0.001 for all comparisons).
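The calibration comparison described above (model-predicted versus Kaplan–Meier observed mortality at a fixed time point) can be sketched in Python as follows. This is an illustrative sketch, not the analysis code: the function names are hypothetical, and it treats a single group of patients, whereas the analysis repeated the comparison within each risk quartile.

```python
def kaplan_meier_at(times, events, t):
    """Kaplan-Meier estimate of the probability of surviving beyond time t."""
    survival = 1.0
    # Distinct observed death times up to and including t, in increasing order.
    death_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for d in death_times:
        at_risk = sum(1 for ti in times if ti >= d)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == d and ei == 1)
        survival *= 1.0 - deaths / at_risk
    return survival


def calibration_gap(times, events, predicted_mortality, t):
    """Mean predicted mortality at time t minus Kaplan-Meier observed mortality.

    A positive value means the model overestimates the risk of death.
    """
    observed = 1.0 - kaplan_meier_at(times, events, t)
    mean_predicted = sum(predicted_mortality) / len(predicted_mortality)
    return mean_predicted - observed
```

A positive gap in every quartile at every time point would correspond to the systematic overestimation of risk described for the validation cohort.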
Discussion
In this study, we developed and validated the first-ever clinical prediction model and point-score index (called the TORVAN model and index) for all-cause mortality in IPF that includes comorbidity variables. In addition to the model's potential clinical value, we made other important observations in developing the models. These include 1) inclusion of comorbidities improves prediction of survival beyond basic demographic and physiological information (i.e. the GAP model); 2) relatively few comorbidities demonstrated significant improvement in survival prediction when considered along with basic demographic and physiological information and these tended to be comorbidities expected to influence short-term mortality; and 3) patient sex becomes a less important prognostic indicator when considered in the context of comorbidities.
We found that the most important comorbidities for survival prediction in IPF are GORD, pulmonary hypertension, lung cancer, valvular heart disease and atrial arrhythmias (figure 3). These variables were selected in all modelling analyses, and the inclusion of more comorbidities in models (by relaxing the LASSO selection criterion) did not appreciably improve prediction. Most of the selected comorbidity variables have previously been associated with survival in IPF [17–25]. It is notable that selected variables, with the exception of GORD, tended to be less common but highly morbid, whereas more common comorbidities such as systemic hypertension and coronary artery disease were not selected. We speculate that this may be because these comorbidities would be expected to influence longer-term mortality (relative to pulmonary hypertension and lung cancer) and IPF itself has high short-term mortality. Somewhat unexpected was the consistent, protective association of GORD in all of our modelling analyses. The reason for this association is unclear, but it is consistent with findings of previous studies [15, 31]. Potential explanations include 1) patients with GORD may have received an earlier diagnosis of IPF because of symptoms related to reflux; 2) a GORD-driven endotype of IPF may exist that has better prognosis relative to non-GORD-driven endotypes; and 3) the association could indirectly reflect benefits of anti-acid therapies in IPF. In addition, we explored the possibility that the selection of atrial arrhythmia could have represented a surrogate measure of anticoagulant use [32]. However, since only a small and nonsignificant number of patients were treated with vitamin K antagonists, we concluded that in our study the impact of atrial arrhythmias is due to the comorbidity itself and not related to its therapy. Finally, in contrast to the GAP model, sex was not selected as an important predictor of survival in the context of comorbidities. We speculate that this may be because male sex serves as a marker of greater comorbidity burden, rather than a biological marker of disease behaviour, and thus becomes a less important predictor when comorbidities are considered.
All models (sparse versus full, continuous versus point-score) demonstrated acceptable but modest discriminative performance, with very little difference in the C-index across models and across the derivation and validation cohorts. Importantly, the comorbidity models significantly improved upon discriminative performance compared to basic demographic and physiological variables included in the widely validated GAP prediction models. Because the comorbidity variables included in our models are routinely collected in the course of patient evaluations, their use in the clinical setting should be straightforward, adding relatively little complexity compared to the GAP model. Calibration (the comparison of model-predicted and observed mortality risk) was generally good in both cohorts, but the TORVAN models tended to overestimate risk in the validation cohort because of overall reduced mortality risk in this cohort compared to the derivation cohort, which was mostly explained by the higher rate of lung transplantation in the validation cohort. We believe that the TORVAN model may help clinicians to discuss prognosis with patients; to identify patients at greater risk of mortality for early referral to lung transplantation or for future clinical trials dedicated to these subsets; and to discuss end-of-life care and palliative support [33, 34].
There are several strengths to our study. First, our study design has several features that increase generalisability of our prediction model. These include the use of large multicentre and multinational cohorts of well-characterised patients with IPF, collection of data from the real-world clinical setting and the use of independent model derivation and validation cohorts. Second, our analytic strategy, which utilised the LASSO procedure, is expected to improve generalisability by limiting overfitting of the model and overly optimistic (inflated) predictor effects. Third, we evaluated model performance by assessing both model discrimination and calibration, the two essential features of model fit. Finally, we compared the ability of the TORVAN models to improve upon the discriminative performance of a widely validated base model, the GAP model.
There are important limitations of our study to consider. Perhaps the most important limitation is the retrospective design, which could affect the quality and accuracy of our comorbidity data. This is because comorbidities were collected from retrospective review of medical records and patient intake questionnaires and were not prospectively and systematically evaluated at the onset of the study. This could cause both under-reporting (especially in the case of pulmonary hypertension, where echocardiography was not routinely performed in all patients) and over-reporting of certain comorbidities (e.g. GORD, where confirmatory tests were not performed on all patients). However, both forms of misclassification would be expected to reduce overall model performance. We also could not account for the effects of IPF therapies (i.e. antifibrotics as positive and immunomodulators as negative effects on survival) or the potential influence of comorbidity-related treatments on survival. In addition, we were not able to evaluate the cause of death in most cases, and therefore we were only able to develop models predictive of all-cause mortality. This is unfortunate because we may expect to find different sets of predictors, and predictor effects, for IPF versus non-IPF related causes of death. Differentiating the probabilities of death from IPF versus non-IPF causes could be useful clinically by informing certain aspects of management such as treatment of comorbidities, anticipated benefit from antifibrotic therapy and appropriateness for lung transplantation. However, it must be said that the separation between IPF and non-IPF related causes of mortality might be somewhat artificial and, in an “intention to prognosticate” approach, all-cause mortality is the only end-point that captures the true prognostic significance of a comorbidity.
In conclusion, the TORVAN prediction index demonstrates that inclusion of comorbidities improves the prediction of survival beyond basic clinical and physiological parameters in IPF, with similar predictive performance in two independent, multinational cohorts. Risk stratification by this index may inform both clinical practice and the design of new clinical trials.
Supplementary material
Supplementary figure S1. Calibration plots for the continuous full model in the (A) derivation and (B) validation cohorts, and for the point-score full model in the (C) derivation and (D) validation cohorts.
Supplementary figure S2. Kaplan–Meier plots of transplant-free survival by quartile of model-predicted risk for the continuous full model in the (A) derivation and (B) validation cohorts, and by point-score grouping for the point-score full model in the (C) derivation and (D) validation cohorts.
Acknowledgements
We would like to acknowledge Emanuele Martorana (Dept of Physics and Astronomy, University of Catania, Catania, Italy) for developing the TORVAN calculator tool.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Author contributions: S.E. Torrisi and C. Vancheri conceived and designed the study. S.E. Torrisi collected the data. B. Ley and E. Vittinghoff did the statistical analysis. All authors contributed to data interpretation. S.E. Torrisi and B. Ley wrote the original draft of the paper and all authors reviewed and edited drafts and approved the final version for submission.
Conflict of interest: S.E. Torrisi reports grants from Boehringer Ingelheim and F. Hoffmann-La Roche Ltd, outside the submitted work.
Conflict of interest: B. Ley has nothing to disclose.
Conflict of interest: M. Kreuter reports grants from Galapagos, and grants and personal fees from Boehringer Ingelheim and Hoffmann-La Roche, outside the submitted work.
Conflict of interest: M. Wijsenbeek reports grants and other from Boehringer Ingelheim and Hoffmann-La Roche, and other from Galapagos, outside the submitted work.
Conflict of interest: E. Vittinghoff has nothing to disclose.
Conflict of interest: H.R. Collard reports personal fees from Bayer, Boehringer Ingelheim, Bristol-Myers Squibb, Global Blood Therapeutics, Genoa, ImmuneWorks, Navitor, Parexel, PharmAkea, Prometic, Toray, Unity, Patara, Veracyte, Roche/Genentech, aTyr, Advance Medical, Aeolus and MedImmune, grants from Pulmonary Fibrosis Foundation, and grants and personal fees from Three Lakes Partners, outside the submitted work.
Conflict of interest: C. Vancheri reports grants from AstraZeneca, Boehringer Ingelheim, Chiesi, F. Hoffmann-La Roche Ltd and Menarini, outside the submitted work.
- Received August 22, 2018.
- Accepted November 28, 2018.
- Copyright ©ERS 2019