Abstract
The Thoracoscore mortality risk model has been incorporated into the British Thoracic Society guidelines on the radical management of patients with lung cancer. Its discriminative and predictive ability for mortality and post-operative pulmonary complications (PPCs) in this group of patients is uncertain.
A prospective observational study was carried out on all patients following lung resection via thoracotomy in a regional thoracic centre over 42 months. 128 out of 703 subjects developed a PPC. 16 (2%) patients died in hospital. In a logistic regression analysis the Thoracoscore was not a significant predictor of mortality (OR 1.07, 95% CI 0.99–1.17; p=0.11) but was a significant predictor of PPCs (OR 1.08, 95% CI 1.03–1.13; p=0.002). However, the area under the receiver operator characteristic curve for the Thoracoscore was 0.68 (95% CI 0.56–0.80) for predicting mortality and 0.64 (95% CI 0.59–0.69) for PPCs, indicating limited discriminative ability.
In a logistic regression analysis, another risk model, the European Society Objective Score, was predictive of mortality (OR 1.43, 95% CI 1.11–1.83; p=0.006) and PPCs (OR 1.48, 95% CI 1.30–1.68; p<0.0001).
Therefore, Thoracoscore may have poor discriminative and predictive ability for mortality and PPCs following elective lung resection.
Risk models may aid in making the decision for surgery and allow surgeons to provide accurate information and obtain informed consent from patients undergoing surgery. Several risk models and stratification tools have been developed to measure and evaluate the risk of death after adult general thoracic surgery [1, 2]. In the new British Thoracic Society (BTS) guidelines for the radical management of patients with lung cancer, much emphasis is placed on informing patients of the possible risks and letting them make the final decision regarding surgery [3].
The new BTS guidelines on selection for lung cancer surgery [3] have incorporated the Thoracoscore [1], which uses age, sex, American Society of Anaesthesiologists (ASA) score, performance status, dyspnoea score, priority of surgery, procedure class, diagnosis group and comorbidity score to provide a predicted risk of in-hospital mortality for individual patients. It was developed and validated in patients undergoing general thoracic surgery [1, 4]. However, its discriminative and predictive ability in this specific group of lung cancer patients is untested.
The European Society Objective Score (ESOS), involving age and post-operative predicted forced expiratory volume in 1 s (FEV1), was developed to identify pre-operative risk factors associated with mortality after lung resection surgery [2]. ESOS was developed in a population where 83% of cases had cancer, but it requires external validation within the lung cancer population.
Post-operative pulmonary complications (PPCs) are the leading cause of morbidity and mortality following lung resection surgery. With an incidence rate of 15%, PPCs have a significant health and economic impact on patients and healthcare services [5].
Consequences of PPCs may include a prolonged hospital stay, invasive ventilation or intensive therapy unit (ITU) admission, all of which may be more important to patients than a small risk of death. No validated system exists to provide patients with the risk of this complication.
In this study we aim to test whether the Thoracoscore and ESOS can accurately predict post-operative mortality and PPCs after elective lung resection surgery.
METHODS
In this prospective observational study, consecutive patients who underwent thoracotomy and lung resection between October 2007 and October 2010 at a large UK regional thoracic centre were observed. Patients had been selected for surgery based on the BTS guidelines (2001) [6].
All patients were managed daily by a specialised thoracic team comprising a senior thoracic surgeon and junior doctors. All patients received a daily physiotherapy programme from the first post-operative day, which included sitting out of bed, early mobilisation and progressive exercise, deep breathing exercises and assisted coughing. The physiotherapy regimen and frequency were escalated in any patient thought to require more assistance with airway clearance. PPCs were assessed using the Melbourne Group Scale, which was recorded daily by an experienced physiotherapist (table 1) [5, 7].
Data collected included demographic information, diagnosis, surgery type, smoking and alcohol intake, ASA score, performance status and breathlessness score, pre-operative self-reported exercise capacity, pre-operative FEV1, post-operative predicted FEV1, comorbidities, length of in-hospital and high-dependency unit stay, ITU admission and mortality.
Statistical methods
Results are expressed as percentages for categorical variables, and as median (interquartile range (IQR)) and full range for non-normally distributed continuous variables.
Univariate analyses using Pearson’s Chi-squared test or Fisher's exact test, where appropriate, were performed for categorical factors and a Wilcoxon rank sum test was performed for continuous factors to assess their relationship with mortality or PPCs.
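As an illustration of the categorical comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test for a 2×2 table. It is not the code used in this study (the analyses were run in SAS); the function name `fisher_exact_two_sided` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so floating-point ties are counted
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts: rows are exposure groups, columns are event / no event
p_value = fisher_exact_two_sided(8, 2, 1, 9)
```

The exact test is typically preferred over the Chi-squared test when expected cell counts are small, which is the situation "where appropriate" usually refers to.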
The performance of the scores in predicting mortality and PPCs was investigated by fitting a logistic regression model. The models were validated by assessing model calibration using the Hosmer and Lemeshow goodness of fit test and model discrimination using the area under a receiver operator characteristic (ROC) curve and associated 95% confidence intervals [8]. The R2 measure of goodness of fit was used as a measure of predictive ability.
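The two validation measures named above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the SAS procedure used in this study: the function names `auc` and `hosmer_lemeshow` are hypothetical, the area under the ROC curve is computed through its Mann-Whitney interpretation, and the Hosmer and Lemeshow statistic assumes cases are split into 10 near-equal groups by predicted probability (which gives the 8 degrees of freedom typical of this test).

```python
from math import fsum


def auc(scores, outcomes):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen patient with the event
    has a higher score than a randomly chosen patient without it
    (ties count as one half).
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-squared statistic and its degrees of freedom.

    Cases are sorted by predicted probability and split into `groups`
    bins of near-equal size; observed and expected event counts are
    then compared within each bin.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = fsum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Binomial variance of the bin: N * mean_p * (1 - mean_p)
        denom = expected * (1 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2
```

A value of `auc` near 0.5 indicates discrimination no better than chance, while a large `hosmer_lemeshow` statistic relative to its degrees of freedom suggests that predicted and observed event rates disagree.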
A stepwise multivariate logistic regression analysis was performed to identify the independent predictors of PPCs within this dataset. A complete case analysis was considered the primary analysis, as physical activity was the only incomplete variable. A sensitivity analysis was performed to assess the impact of the missing data on the conclusions of the multivariate logistic regression. Multiple imputation was carried out using an ordinal logistic regression model for a monotone missingness pattern, including all characteristic variables, scores and outcome variables [8]. Estimates from fitting a logistic regression model to predict PPCs in each imputed dataset were combined using Rubin's rules [9]. All analyses were performed using the SAS statistical package (version 9.2; SAS Institute Inc., Cary, NC, USA).
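For readers unfamiliar with the pooling step, the combination of estimates across imputed datasets can be sketched as below. This is a minimal Python illustration rather than the SAS code used here; the function name `pool_rubin` and the input values are hypothetical, and the 95% confidence interval uses a normal approximation instead of the t reference distribution that a full implementation would apply.

```python
from math import exp, sqrt


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets.

    estimates: per-dataset coefficient estimates (e.g. log odds ratios)
    variances: per-dataset squared standard errors
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")
    q_bar = sum(estimates) / m                        # pooled estimate
    within = sum(variances) / m                       # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between            # total variance
    return q_bar, total


# Hypothetical log odds ratios and variances from five imputed datasets
log_or, var = pool_rubin(
    [0.10, 0.14, 0.08, 0.12, 0.11],
    [0.004, 0.005, 0.004, 0.005, 0.004],
)
se = sqrt(var)
# Back-transform to an odds ratio with an approximate 95% CI
odds_ratio = exp(log_or)
ci = (exp(log_or - 1.96 * se), exp(log_or + 1.96 * se))
```

The between-imputation term is what carries the extra uncertainty due to the missing data, so the pooled standard error is never smaller than the average within-imputation standard error.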
RESULTS
A total of 703 patients who underwent surgery in a single institution were analysed. The median (range) age was 68 (14–89) yrs and 57% were male. The following procedures were performed: lobectomies (n=385, 55%), wedge excisions (n=158, 22%), pneumonectomies (n=67, 10%), segmentectomies (n=42, 6%), explorations (n=31, 4%) and sleeve resections (n=20, 3%). 641 (91%) patients underwent surgery for primary lung cancer. The median (range) FEV1 was 2.00 (0.40–4.65) L and FVC was 3.09 (0.48–6.05) L. Patients’ characteristics are presented in table 2.
There were 16 (2%) in-hospital deaths and all but two were secondary to pulmonary complications. A total of 128 (18%) patients experienced PPCs and had a longer length of stay (11 (8–23) days) compared to patients who did not develop a PPC (5 (4–7) days). Patients who developed a PPC also had a higher rate of ITU admission: 14% versus 2%.
The median (IQR) predicted in-hospital mortality based on the Thoracoscore for those patients who died was 4.1 (2.3–5.7)% and for the remaining patients was 2.3 (1.3–4.1)%. The predicted risk of mortality from the Thoracoscore was considerably higher than the observed mortality within each Thoracoscore risk group (table 3). The Thoracoscore was not a significant predictor of mortality in a logistic regression analysis (OR 1.07, 95% CI 0.99–1.17; p=0.11). The area under the ROC curve for the Thoracoscore in this population was 0.68 (95% CI 0.56–0.80) for predicting mortality, indicating limited discriminative ability. The Hosmer and Lemeshow goodness of fit test was not statistically significant (Chi-squared=11.32, degrees of freedom (df)=8, p=0.18), suggesting there may be adequate calibration, but the R2 measure of goodness of fit of 0.0028 indicates poor predictive ability. The predicted risk of mortality increases with increasing Thoracoscore risk group (table). The factors that differed between the patients who died and those who survived were sex (p=0.003), chronic obstructive pulmonary disease (COPD) (p=0.03) and type of operation (p=0.06) (table 2). The number of events was too small to perform a reliable multivariate analysis.
The median (IQR) predicted mortality rate based on the Thoracoscore was higher for those patients who had a PPC (3.2 (2.0–4.8)%) than for patients without PPCs (2.3 (1.2–4.1)%). The Thoracoscore was a significant predictor of PPCs in a logistic regression analysis (OR 1.08, 95% CI 1.03–1.13; p=0.002). However, the area under the ROC curve for the Thoracoscore in this population was 0.64 (95% CI 0.59–0.69) for predicting PPCs, indicating unsatisfactory discriminative ability. The Hosmer and Lemeshow goodness of fit test was statistically significant (Chi-squared=17.78, df=8, p=0.02), suggesting poor calibration, and an R2 measure of goodness of fit of 0.013 indicates relatively poor predictive ability. The predicted risk of PPCs from the logistic regression analysis is higher in the high-risk group based on the Thoracoscore (table 4).
The median (IQR) predicted in-hospital mortality based on the ESOS score for those patients who died was 3.4 (2.6–4.2)% and for the remaining patients was 1.9 (1.2–3.1)%. The predicted risk of mortality is similar to the observed mortality and was highest for the patients in the high-risk group based on the ESOS score (table 3). The ESOS score was predictive of mortality in a logistic regression analysis (OR 1.43, 95% CI 1.11–1.83; p=0.006). It had good discriminative ability (area under the ROC curve 0.73, 95% CI 0.62–0.85) and adequate calibration (Hosmer and Lemeshow goodness of fit test Chi-squared=8.37, df=8, p=0.40), but poor predictive ability (R2=0.009).
The median (IQR) predicted mortality rate based on the ESOS score for those patients who had a PPC was 3.1 (7–4.0)% and for the patients without a PPC was 1.8 (1.1–2.2)%. The ESOS score was a significant predictor of PPCs in a logistic regression analysis (OR 1.48, 95% CI 1.30–1.68; p<0.0001). However, it had limited discriminative ability (area under the ROC curve 0.68, 95% CI 0.63–0.73), poor calibration (Hosmer and Lemeshow goodness of fit test Chi-squared=15.75, df=8, p=0.05) and limited predictive ability (R2=0.05). The predicted risk of PPCs from the logistic regression analysis is higher in the ESOS high-risk group than in the other risk groups but differed from the observed PPC rates (table 3).
Univariate analyses of factors in patients with PPCs compared with those without PPCs are shown in table 5. A stepwise multivariate logistic regression identified COPD (p<0.0001), an ASA score of 3 or 4 (p=0.002), a dyspnoea score of 3 (p=0.004), high creatinine levels (>2 mg·dL−1; p=0.002) and smoking (p=0.02) as the most important independent factors for predicting PPCs in this dataset (table 6). The area under the ROC curve was 0.74 (95% CI 0.69–0.79), suggesting good discriminatory ability. The Hosmer and Lemeshow goodness of fit test was not statistically significant (Chi-squared=3.37, df=6, p=0.76), indicating good calibration, and an R2 of 0.11 indicates reasonable predictive ability. Physical activity was not significant after fitting COPD, ASA, creatinine and dyspnoea into the model. A sensitivity analysis was performed to assess the impact of the missing physical activity data on the logistic regression model; after multiple imputation, physical activity was still not an independent predictor of PPCs.
DISCUSSION
Thoracoscore is a model with good accuracy to predict in-hospital and mid-term mortality after all types of general thoracic surgery [1, 4]. The new BTS guidelines have incorporated Thoracoscore into the selection for elective surgery for lung cancer [3]. Its applicability in this situation is uncertain. For example, it assigns a minimally invasive keyhole procedure (removing <1% of lung function) the same risk as an open bilobectomy (removing one-third of lung function). Similarly, the number of patients with poor performance status and/or breathlessness who would be selected for elective lung resection is very small, but these are important factors defining risk in the original cohort. In this study, Thoracoscore has limited discriminative and poor predictive ability for mortality following elective lung resection. There is a trend for Thoracoscore to underestimate risk in low-risk cases and overestimate risk in high-risk cases.
The ESOS score is a model developed to identify pre-operative risk factors associated with mortality after lung resection surgery [2]. The score is defined by age and post-operative predicted FEV1. Interestingly, the latter was not a significant risk factor, but the authors chose to use this measure as it could be clearly defined. It is difficult to define how well the objective model performed in the defining cohort because no solid data are presented on this in the original publication. In our population the ESOS, in contrast to Thoracoscore, was a significant predictor of mortality and had better discriminative ability, but its predictive ability was still poor.
The variation of expected and observed mortality in both ESOS and Thoracoscore might not be considered relevant clinically. This raises the issue of the importance to the patient of such defined mortality values. These numbers might be of use in research and outcome assessment, but quoting a range of risk to patients might be more appropriate and easier to comprehend.
In-hospital mortality is a rare occurrence; perhaps what is of more relevance to patients is the development of more common complications such as PPCs. Both the Thoracoscore and ESOS had limited discriminative and predictive ability and poor calibration, providing predictions for PPCs that were too low following elective lung resection. At first glance, observed PPC rates appear misleadingly similar to rates predicted using the Thoracoscore or ESOS. However, the patients who actually develop PPCs are not the ones predicted to do so by these scores. The poor model performance of these scores may be a consequence of their development for predicting mortality and not PPCs.
Factors identified as independently associated with a risk of developing PPCs were COPD, ASA, smoking, high dyspnoea score and pre-operative blood creatinine. Pre-operative self-reported exercise capacity (unavailable in 26%) was found not to be a significant predictor of PPCs once other factors were included in the multivariate logistic regression model, a finding that was unchanged in the sensitivity analysis. Using these significant factors in a regression model resulted in good predictive and discriminatory ability. PPCs are associated with patient morbidity and prolonged length of hospital stay. Thus, a predictive model that could identify those at high risk of PPCs would help to target preventative strategies, such as pre-operative rehabilitation.
By the time a scoring system finds its way into clinical practice it can already be out of date. A revised Thoracoscore (EPITHOR) for predicting in-hospital mortality specifically in cancer patients has been developed [10] since the publication of the new BTS guidelines and the development of our study. The EPITHOR score was not a significant predictor of mortality (OR 1.23, 95% CI 0.90–1.68; p=0.19) and had poor discriminative ability (area under the ROC curve 0.61, 95% CI 0.46–0.76) in our cohort of patients.
In addition, the ESOS has also been updated [11]. The European Society for Thoracic Surgery mortality score was more predictive of mortality (OR 2.33, 95% CI 1.24–4.37; p=0.008) than the ESOS score but the discriminative ability was slightly worse (area under the ROC curve 0.68, 95% CI 0.54–0.82). Therefore, a large national study is required to corroborate the findings from this study, and provide a validation of these new scores and a comparison of all scores in order to be able to inform and update the BTS guidelines.
In the new BTS guidelines, much emphasis is placed on putting the risk of mortality to patients to help them decide on surgery. By using a scoring system that inaccurately assesses mortality and morbidity, are we unknowingly misleading patients rather than improving consent?
Footnotes
Statement of Interest
None declared.
- Received December 12, 2011.
- Accepted February 28, 2012.
- ©ERS 2012