Abstract
The aim of the present study was to validate and compare published prognostic classifications for predicting the survival of patients with small cell lung cancer.
We pooled data from three phase III randomised clinical trials, used Cox models for validation and used concordance probability estimates to assess predictive ability.
We included 693 patients. All the classifications had a significant impact on survival, with hazard ratios (HRs) in the range 1.57–1.68 (all p<0.0001). Median survival times were 16–19 months for the groups with the best predicted prognosis and 6–7 months for those with the worst. Most of the paired comparisons were statistically significant. Results were similar when the analysis was restricted to patients with extensive disease. We also assessed the published multivariate Cox models, each summarised as a single score covariate: the HRs were 8.23 (95% CI 5.88–11.69) and 9.46 (6.67–13.50) and, for extensive disease, 5.60 (3.13–9.93), 12.49 (5.57–28.01) and 8.83 (4.66–16.64). Concordance probability estimates ranged from 0.55 to 0.65, with overlapping confidence intervals.
The published classifications were validated and are suitable for use at the population level. As expected, prediction at the individual level remains problematic. A model designed specifically for patients with extensive disease did not appear to perform better.
Despite the sensitivity of small cell lung cancer to radiotherapy and chemotherapy, the prognosis of patients with this disease is poor, with a 5-yr survival rate of <10% [1]. The most reproducible prognostic factor is disease extent, which is also the main factor guiding therapy. Very few other prognostic factors have been clearly established: performance status, sex and some routine biological parameters, such as neutrophil or leukocyte counts, or albuminaemia [2, 3]. Our group, the European Lung Cancer Working Party (ELCWP), has long been interested in identifying prognostic factors and integrating them into a classification system that could be used in clinical care to inform patients and in clinical research to stratify randomisation or adjust treatment comparisons. In 2000, we published such a classification, constructed with two different statistical strategies: recursive partitioning and amalgamation (RECPAM) algorithms, which naturally lead to a patient classification, and Cox regression modelling [4].
The RECPAM technique starts with the whole patient population and divides it into two groups according to the prognostic covariate with the most significant effect on the outcome (here, overall survival). The search for a new split is then performed separately within each of the two resulting nodes, and the partitioning proceeds iteratively until no further variable is identified as a significant prognostic factor or until the number of patients becomes too small. Because terminal nodes from different branches of the tree do not necessarily correspond to patients with significantly different survival distributions, a second iterative algorithm, amalgamation, is then applied to merge terminal nodes whose survival distributions do not differ significantly; the resulting groups directly define the classification.
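The partitioning step described above can be illustrated with a short Python sketch. It is a simplified illustration, not the RECPAM software used in [4]: it assumes binary (0/1) covariates, uses the two-sample log-rank test as the splitting criterion (the split with the smallest p-value wins), stops when no split reaches a hypothetical threshold `alpha` or when a child node would contain fewer than a hypothetical `min_size` patients, and omits the amalgamation step entirely. The function names `logrank_chi2` and `partition` are hypothetical.

```python
import math


def logrank_chi2(times, events, group):
    """Two-sample log-rank chi-square statistic (1 df); group holds 0/1 labels."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    var = 0.0        # hypergeometric variance of o_minus_e
    for t in event_times:
        n = n1 = d = d1 = 0
        for tt, e, g in zip(times, events, group):
            if tt >= t:  # still at risk just before t
                n += 1
                n1 += g
                if e and tt == t:  # death observed at t
                    d += 1
                    d1 += g
        o_minus_e += d1 - n1 * d / n
        if n > 1:
            var += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    return o_minus_e ** 2 / var if var > 0 else 0.0


def partition(rows, covariates, alpha=0.05, min_size=20, path=()):
    """Recursively split rows (dicts with 'time', 'event' and binary covariates).

    Returns a list of (path, rows) terminal nodes; path records the splits taken.
    Amalgamation of terminal nodes is NOT performed in this sketch.
    """
    times = [r["time"] for r in rows]
    events = [r["event"] for r in rows]
    best = None  # (covariate, p-value) of the most significant admissible split
    for cov in covariates:
        group = [r[cov] for r in rows]
        n1 = sum(group)
        if n1 < min_size or len(rows) - n1 < min_size:
            continue  # split would create a node that is too small
        chi2 = logrank_chi2(times, events, group)
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
        if p < alpha and (best is None or p < best[1]):
            best = (cov, p)
    if best is None:
        return [(path, rows)]  # no significant split: terminal node
    cov = best[0]
    left = [r for r in rows if r[cov] == 0]
    right = [r for r in rows if r[cov] == 1]
    return (partition(left, covariates, alpha, min_size, path + ((cov, 0),))
            + partition(right, covariates, alpha, min_size, path + ((cov, 1),)))
```

Because a covariate already used for a split is constant within each child node, it can never be selected again further down that branch; a real implementation would also handle ordinal or continuous covariates by searching over cut-points.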
From data on 763 patients registered in four clinical trials (one phase II and three phase III trials), we proposed a classification based on disease extent, Karnofsky performance index, age, sex and relative neutrophil count. Our system comprised four groups: one with the best prognosis, including only patients with limited disease; one with the worst prognosis, including only patients with extensive disease; and two intermediate groups composed of patients with either disease extent. To date, neither the RECPAM classification nor the final Cox model has been tested in an independent validation series. Moreover, therapeutic standards have changed with the introduction of combined thoracic radiation and chemotherapy for patients with limited disease. We conducted the present study on a new series of patients with small cell lung cancer with two objectives: 1) to validate our published classification and Cox model; and 2) to validate other previously published classifications/models and compare the different prognostic systems.
MATERIAL AND METHODS
We searched the literature for other published classifications derived from series of >500 patients and including only purely prognostic factors (i.e. excluding therapeutic covariates). The 500-patient threshold was arbitrary but was considered necessary to obtain survival-distribution estimates with low variance. We identified four other studies with a published classification, all based on RECPAM [2, 5–7]. When a publication also proposed a regression-based model [2, 4, 7], we included that model in our comparisons as well.
All the selected classifications and models were based on clinical, demographic or routine biological variables. All the RECPAM classifications define four prognostic groups, except that proposed by Foster et al. [7], which focuses on patients with extensive disease and has five prognostic levels. The Cox regression models used four or five factors to explain the distribution of overall survival. The most frequently used covariates were disease extent, sex and age; others were lactate dehydrogenase level, white blood cell or neutrophil counts, creatinine level, alkaline phosphatase and variables linked to disease stage. The full classifications and models are presented in the online supplementary material.
As the validation series, we pooled the databases of three further clinical trials in small cell lung cancer conducted by the ELCWP between 1992 and 2008. Two of these studies are closed and their results have been published (ELCWP 1923 [8] and ELCWP 1922 [9]); the third is still ongoing (ELCWP 1994). Data on the patients registered in these studies were not included in the database used to derive our prognostic classification [4]. All three are phase III trials, and their characteristics are described in the online supplementary material. Two trials included patients with extensive disease treated with chemotherapy alone. The third was dedicated to patients with limited disease treated with combined chemoradiation, and addressed the possible radiosensitising effect of cisplatin. For the ongoing trial, only patients randomised before October 31, 2007 were included in the present analysis, to ensure a theoretical follow-up of >2 yrs for all patients.
Some eligibility criteria were common to the three trials and similar to those used in the trials considered in our previous study [4]: small cell lung cancer had to be histologically proven and untreated, and patients had to have normal haematological, hepatic and renal function and a Karnofsky performance status ≥60, to have provided informed consent and to be accessible for follow-up. In one trial [8], limited disease was defined as disease confined to the primary site, mediastinum and homolateral subclavicular lymph nodes, without malignant pleural effusion; in the other two, it was defined as disease that could be treated within a single radiotherapy field. Evaluation criteria in the trials included response to treatment after three and six courses of chemotherapy, progression-free survival and overall survival.
Statistical methodology
The primary objective of the present study was to validate the classification and model published by the ELCWP in 2000; the secondary objective was to assess and compare, in the new series of patients, the prognostic value of the selected previously published RECPAM classifications and Cox models. The end-point was overall survival, measured from registration in the trials; deaths from any cause were counted as events.
All the data required for the assessment of the five published classifications were prospectively collected during the conduct of each of the trials.
We estimated the survival distributions nonparametrically and compared them with log-rank tests; HRs were estimated with Cox regression models. To assess the prognostic value of the previously published Cox models, we constructed an overall score from the published regression coefficients. The original scores were transformed to share the same range of theoretical values, so that HR estimates could be compared between models.
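One way to carry out this rescaling is a linear min–max transformation of each patient's score onto 0–2, the scale used in the Results, with the bounds taken over the theoretical covariate values (for age, 18–85 yrs). The Python sketch below assumes a purely linear score (sum of coefficient × covariate) and a linear rescaling; the coefficients, covariate names and helper functions are hypothetical placeholders, not the published models, which are given in the online supplementary material.

```python
def score_bounds(coefs, ranges):
    """Theoretical minimum and maximum of the linear score sum(b_k * x_k).

    Each term is linear in its covariate, so its extremes lie at the
    endpoints of that covariate's range.
    """
    lo = sum(min(b * ranges[k][0], b * ranges[k][1]) for k, b in coefs.items())
    hi = sum(max(b * ranges[k][0], b * ranges[k][1]) for k, b in coefs.items())
    return lo, hi


def rescaled_score(patient, coefs, ranges, upper=2.0):
    """Map a patient's raw linear score onto [0, upper]."""
    lo, hi = score_bounds(coefs, ranges)
    raw = sum(b * patient[k] for k, b in coefs.items())
    return upper * (raw - lo) / (hi - lo)


# Hypothetical coefficients, for illustration only (not from any published model).
COEFS = {"extensive_disease": 0.8, "male": 0.2, "age_yrs": 0.01}
RANGES = {"extensive_disease": (0, 1), "male": (0, 1), "age_yrs": (18, 85)}
```

With these placeholder values, a woman aged 18 yrs with extensive disease has a raw score of 0.98 against bounds of 0.18 and 1.85, giving a rescaled score of about 0.96. Because every model is mapped onto the same 0–2 range, the HR per unit of rescaled score has a comparable meaning across models.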
Because one model was specific to patients with extensive disease, we tested the general classifications and models both on all the patients and on the subgroup of patients with extensive disease.
In the two published trials included in our validation series, we found no survival difference between the arms [8, 9]. For the third, ongoing trial, we did not examine the survival comparison between arms. We did not stratify our analyses by trial, as one trial was dedicated to patients with limited disease and the other two enrolled only patients with extensive disease.
To assess the predictive ability of the prognostic covariates, we used the concordance probability estimate [10].
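We assume that [10] refers to the Gönen–Heller concordance probability estimate, which is the usual meaning of the term. For a Cox model with a single covariate x and fitted coefficient β, it averages, over all patient pairs, the model-implied probability that the patient with the higher linear predictor dies first; it uses only β and the covariate values, not the observed survival times. The Python sketch below is a minimal illustration under that assumption (the function name is hypothetical, and no confidence interval is computed). Tie handling matters for classifications with only four or five categories: this sketch follows the indicator definition of the unsmoothed estimator, so pairs with identical linear predictors contribute nothing, and other implementations may treat ties differently.

```python
import math
from itertools import combinations


def concordance_probability_estimate(beta, x):
    """Gonen-Heller concordance probability estimate, one-covariate Cox model.

    beta: fitted Cox coefficient; x: covariate value for each patient.
    Each pair with d = beta * (x_j - x_i) != 0 contributes
    1 / (1 + exp(-|d|)); pairs with d == 0 contribute 0.
    """
    n = len(x)
    total = 0.0
    for xi, xj in combinations(x, 2):
        d = beta * (xj - xi)
        if d < 0:
            total += 1.0 / (1.0 + math.exp(d))
        elif d > 0:
            total += 1.0 / (1.0 + math.exp(-d))
    return 2.0 * total / (n * (n - 1))
```

For two patients whose linear predictors differ by 1, the estimate is 1 / (1 + e^-1) ≈ 0.73; the estimate approaches 1 only when the linear predictors of most pairs are far apart, which is why coarse classifications tend to give modest values.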
All p-values were two-tailed; p<0.05 was considered statistically significant.
RESULTS
We collected data on a total of 693 patients: 204 from ELCWP 1922 [9], 233 from ELCWP 1923 [8] and 256 from the ongoing trial (ELCWP 1994). Some institutions that had recruited patients for the derivation series also contributed to the trials used for the present validation series. However, the overlap was only partial, and our validation lies between internal and external validation [11]. Patient characteristics are presented in table 1. Compared with our derivation series, the proportions of both females and patients with a Karnofsky index ≥80 were higher, and the proportion of patients with limited disease was lower. Missing data for biological parameters prevented us from assessing the five classifications in all patients; depending on the classification, the rate of missing data ranged from <1% to 12%.
Median length of follow-up was 119 months; death was observed in 646 (93%) patients. Theoretical follow-up was >2 yrs in 97% of the patients and >5 yrs in 85% of the patients.
Distributions of classifications
Table 2 presents the distribution of the classifications in the validation series, for all patients and for the subgroup of patients with extensive disease. Category frequencies are given after excluding patients whose missing data prevented their allocation to a category. In each classification, the lower the category, the better the predicted survival. Depending on the classification, 17–31% of patients were assigned to the best prognostic group and 11–26% to the worst.
Validation study
Figure 1 shows survival curves according to the ELCWP prognostic classification. Group I had an estimated median survival time of 90 weeks, compared with 48, 34 and 28 weeks for groups II, III and IV, respectively. The overall comparison, with the prognostic classification entered as a continuous covariate, was highly significant (p<0.001), as were all the paired comparisons. Our previous Cox model identified four independent prognostic covariates. Table 3 gives the regression coefficients re-estimated in the validation series when all four covariates were entered into the model together, with the same categorisation as in the derivation model. Three of them remained statistically significant (p<0.0001), whereas the fourth, sex, did not (p=0.08). The goodness-of-fit of a model including disease extent alone was improved when the RECPAM classification was used, and further improved when the Cox model was used, confirming the usefulness of the integrated models.
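Median survival times such as these are read off the estimated survival curve. The text does not name the nonparametric estimator; assuming it is the Kaplan–Meier product-limit estimator, a minimal Python sketch of the median computation follows. The function name is hypothetical; the sketch returns the first time at which the estimated survival falls to 0.5 or below, and `None` when the median is not reached. Statistical packages can differ in how they report the median when the curve sits exactly at 0.5.

```python
def km_median(times, events):
    """Kaplan-Meier median survival time.

    times: follow-up times; events: 1 if death observed, 0 if censored.
    Returns the smallest time at which the estimated survival S(t) <= 0.5,
    or None if the curve never drops that low (median not reached).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all observations sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            if surv <= 0.5:
                return t
        at_risk -= leaving
    return None
```

Patients censored at a given time are still counted as at risk at that time, the usual product-limit convention.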
Prognostic models comparison
In table 4, we present the estimated median survival time in each prognostic category and, for each RECPAM classification, an overall comparison reflecting its global prognostic value, based on nonparametric estimation and the log-rank test. When all the patients were analysed, every overall comparison was highly significant. The HR estimates for a change from one category to the adjacent higher one ranged from 1.57 to 1.87. This means that the hazards for the group with the worst prognosis were four- to five-fold higher than for the group with the best prognosis. No heterogeneity was found between these HRs. As the overall comparisons were all significant, we performed paired comparisons (group n versus group n+1), the results of which are presented in table 5. For the classification of Albain et al. [5], there was no statistical evidence that groups II and III had different survival distributions. For the classification of Foster et al. [7], which focused on patients with extensive disease, group I could not be shown to differ from group II, and the comparison between groups II and III was not significant either. Group IV was not analysed because very few patients in our series belonged to that group.
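The four- to five-fold figure follows from compounding the per-category HR. Assuming the classification is coded as a single ordinal covariate (0, 1, 2, 3 for groups I–IV) with Cox coefficient β, each one-category step multiplies the hazard by HR_step = e^β, and the comparison between the extreme groups of a four-group classification spans three steps. With the per-step HRs of 1.57 and 1.68 given in the Abstract:

```latex
\[
\mathrm{HR}_{\mathrm{IV}\ \text{vs}\ \mathrm{I}} = e^{3\beta} = \left(\mathrm{HR}_{\text{step}}\right)^{3},
\qquad 1.57^{3} \approx 3.87, \qquad 1.68^{3} \approx 4.74
\]
```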
The regression coefficients of the three Cox models were used to calculate an overall prognostic score for each model. The scores were standardised to range from 0 to 2 (for a patient population aged 18–85 yrs). The HRs associated with these scores are shown in table 6; all are significantly different from 1. The more recent models [2, 7] appear to perform better than the third [4].
Concordance probability estimates
Concordance probability estimates were calculated for each classification and model, with predictions obtained from a Cox model whose single covariate was either a RECPAM classification or the score calculated from the regression coefficients of a published Cox model. They are presented in table 7 with 95% CIs. The Cox models had slightly higher estimates than the RECPAM classifications. The estimates were lower when the analysis was restricted to patients with extensive disease, even for the models proposed by Foster et al. [7], which were specifically constructed on patients with extensive disease.
DISCUSSION
Prognostic factor studies are numerous in the literature. However, reproducible, well-established independent prognostic factors are lacking [3]. Indeed, prognostic factors are most often identified retrospectively. Phase III prognostic factor studies, as described by Simon and Altman [11], are conducted prospectively, with a priori hypotheses and a sample size calculated from those hypotheses, but there are few, if any, such studies in small cell lung cancer. It is therefore crucial to validate results before using them. Furthermore, it is the independent prognostic value of a factor that needs to be validated; in that context, what must be validated is not a single factor but a set of prognostic factors, i.e. a classification or a model. Validation can be internal, using cross-validation or bootstrapping techniques, or external, on a new series of patients, which is the more convincing approach whenever possible [12]. External validation can itself be more or less general: validating on patients recruited at the same institutions during the same time period is less general than validating on patients of different origins who developed the disease later than those in the derivation sets. We performed an external validation of our previous study [4] with an assessment of its historical transportability, as defined by Justice et al. [13]; however, some institutions had contributed to both series of patients. It should also be noted that our validation of the classification constructed by the International Association for the Study of Lung Cancer (IASLC) staging project is not entirely independent, as the ELCWP contributed to the worldwide IASLC database and data from the present series may also have been integrated into that database.
This latter classification had already been validated within the IASLC study, in which two-thirds of the patients were used for model derivation and one-third for validation. Beyond the fact that our validation is not fully external, we should stress that our study was retrospective, which led to some missing data in the assessment of the classifications/models and to some heterogeneity in the way the required covariates were assessed. Furthermore, our population was restricted to patients registered in clinical trials, which limits the generalisability of the conclusions.
Our validation was successful for both our RECPAM classification and our Cox model in a population of patients included in clinical trials, despite the evolution of treatment strategies with the introduction of combined chemoradiation for limited disease. The prognosis of patients with limited disease has improved, and we may hypothesise that this results jointly from more accurate staging techniques and more effective treatment. Furthermore, all four groups of our classification were validated, as the ordered paired comparisons were all significant. Group I, which included only patients with limited disease, might be the target for developing new treatment modalities. A second conclusion is that the concordance between predicted and observed survival times in patients belonging to different risk groups is insufficient for the models to be used at the level of the individual patient. The prognostic covariate built on the Cox model performed slightly better than the RECPAM classification. This may be disappointing, as the RECPAM classification did not seem to benefit from the interaction effects that were naturally integrated during its construction. Finally, the performance of our models decreased when applied to patients with extensive disease.
Most of these conclusions also hold for the other prognostic classifications, whether published before the present study or developed recently: all were globally validated and most of the ordered paired comparisons were significant. One exception is the comparison between groups I and II in the classification of Sagman et al. [6]. This may be due to the small proportion of patients in group I and a consequent lack of power. Moreover, the concept of very limited disease may recently have come to correspond to a highly selected patient population. The other exception is the comparison between groups II and III in the IASLC classification [2]; the proportion of patients in group II was also rather small, so a lack of power may again explain the failure to confirm the separation between groups II and III. For the classifications of Sculier et al. [2], Paesmans et al. [4] and Albain et al. [5], the most relevant difference between groups appeared to be the identification of the group with the best predicted survival distribution (group I), i.e. patients with limited disease and other favourable features. Depending on the classification, this group was more or less strictly defined, with corresponding consequences for the size of the group and its median survival.
One of the prognostic classifications we compared [7] was developed specifically for patients with extensive disease, with the expectation that it would be more specific and, therefore, more accurate. Our results do not support this, suggesting either that covariates other than those tested are important or that the effects of the other covariates are more variable. Our sample size might also be insufficient, as this classification has five groups and was tested in a restricted validation series. The predictive ability of all the models, as assessed by the concordance probability estimates, was lower in the restricted population of patients with extensive disease.
All the models studied here were based on easy-to-assess variables (age, sex and routine laboratory parameters) or on variables that are required anyway to determine tumour stage and therapeutic strategy. We therefore recommend using at least one classification to stratify patients in clinical trials or to compare patient populations. Indeed, all are of value, as there is no relevant difference between the models. To our knowledge, this is the first work validating several classification systems. All of them were successful in a further series, showing once again that it is less relevant to identify isolated prognostic factors than to integrate them into a prognostic system, and that several prognostic systems, although based on different covariates, may have comparable discriminant and predictive values. Disease extent, age, sex and performance status are, however, the cornerstones of the classifications.
None of these classifications takes into account molecular biological factors or gene signatures, and there is indeed room for improvement, as the concordance probability estimates are clearly unsatisfactory. However, as the current classifications are very simple to assess, any study of such potential new prognostic factors should compare the predictive ability of the more costly prognostic tools with that of the simpler ones.
Footnotes
This article has supplementary material available from www.erj.ersjournals.com
Statement of Interest
None declared.
- Received July 14, 2010.
- Accepted December 15, 2010.
- ©ERS 2011