Abstract
18F-fluoro-2-deoxy-d-glucose positron emission tomography (PET) complements conventional imaging for diagnosing and staging lung cancer. Two literature-based meta-analyses suggest that maximum standardised uptake value (SUVmax) on PET has univariate prognostic value in nonsmall cell lung cancer (NSCLC). We analysed individual data pooled from 12 studies to assess the independent prognostic value of binary SUVmax for overall survival.
After searching the published literature and identifying unpublished data, study coordinators were contacted and requested to provide data on individual patients. Cox regression models stratified for study were used.
Data were collected for 1526 patients (median age 64 years, 60% male, 34% squamous cell carcinoma, 47% adenocarcinoma, 58% stage I–II). The combined univariate hazard ratio for SUVmax was 1.43 (95% CI 1.22–1.66) and nearly identical if the SUV threshold was calculated stratifying for histology. Multivariate analysis of patients with stage I–III disease identified age, stage, tumour size and receipt of surgery as independent prognostic factors; adding SUV (HR 1.58, 95% CI 1.27–1.96) improved the model significantly. The only detected interaction was between SUV and stage IV disease.
SUV seems to have independent prognostic value in stage I–III NSCLC, for squamous cell carcinoma and for adenocarcinoma.
Introduction
Worldwide, lung cancer is the leading cause of cancer-related death in men and the second in women [1]. Despite the availability of new treatments, survival of patients has remained relatively poor [2]. Few validated and accurate prognostic factors are currently used in clinical practice for managing or predicting outcomes for individual patients, although many clinical and histopathological characteristics, laboratory markers, molecular biological markers and gene signatures have been tested for their potential prognostic value. Despite abundant literature on this topic, only two characteristics have so far been definitively established as independent prognostic factors: performance status (especially in advanced disease) and disease stage [3], as confirmed by the International Association for the Study of Lung Cancer (IASLC) staging project. In this context, data on >100 000 patients were collected and used to build the 7th edition of the TNM (tumour, node, metastasis) classification for lung cancer [4]. Prognostic factors were also studied, but the findings were limited by the small number of covariates present in the existing worldwide databases. This indirectly illustrates that very few factors are universally accepted as being important for predicting outcome in lung cancer.
TNM staging assessment was traditionally based on surgical findings and conventional imaging. However, for more than two decades, positron emission tomography (PET) with the glucose analogue 2-[18F]-fluoro-2-deoxy-d-glucose (FDG) has been used extensively for disease staging in lung cancer. Indeed, several meta-analyses have shown that the use of FDG-PET improves the accuracy of staging [5, 6].
Our group has performed two systematic reviews and meta-analyses of the literature (the second an update of the first) analysing the possible prognostic value of FDG-PET for survival [7, 8]. The results, published in January 2008 and May 2010, suggested that having a high maximum standardised uptake value (SUVmax) of the primary tumour might be a poor prognostic factor. Altogether, 21 studies were analysed [9–29].
In these meta-analyses, our comparison of patients with low and high SUV produced an overall combined hazard ratio of 2.08 for high SUV, significantly different from 1 (95% CI 1.69–2.56). No interaction was detected between older and newer studies (p=0.60) or between studies in which patients with metastatic disease were excluded or included (p=0.46).
Considering the limitations of literature-based meta-analyses [30], in particular the absence of multivariate analysis, the previous work was extended by performing a pooled analysis of individual patient data, a project endorsed by the IASLC staging project. The primary objective was to assess the prognostic value of primary tumour FDG-PET SUVmax (corrected for body weight) for overall survival, with and without adjustment for established prognostic factors, especially the TNM classification. As a secondary objective, subgroup analyses were performed and the possible interactions between SUV and patient or tumour characteristics were investigated.
Methods
All corresponding authors of the studies previously included in the systematic reviews were contacted and provided with a protocol and a plan for further analysing individual data. Those who joined the project were asked to provide such data to the coordinating institution (Institut Jules Bordet, Brussels, Belgium) with pre-specified variables, to update their survival data and to complete a questionnaire about the method used to measure FDG uptake. One unpublished series was also included.
Data from the different series were then reviewed, pooled and analysed at the coordinating institution. A descriptive analysis of the associations between SUV and other possible prognostic factors was performed. Cox regression models were used to obtain, for each individual study, an estimate of the hazard ratio for the effect of SUV on overall survival. For these comparisons, SUV was represented as a binary variable (low or high level), with the median in each individual series chosen as the threshold. The overall impact of SUV was assessed using a stratified Cox model. When possible, overall survival was calculated from the date of PET until the date of death (from any cause) or date of last follow-up. Otherwise, overall survival was used as provided in the individual datasets.
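As a minimal sketch of the dichotomisation step described above, the following Python snippet splits patients into low and high SUV groups using each series' own median as the threshold. The record layout, the function name dichotomise_by_study_median and the convention that only values strictly above the median count as "high" are assumptions made for illustration; the text does not state how values equal to the median were handled, and the actual analysis was run in SAS.

```python
from collections import defaultdict
from statistics import median


def dichotomise_by_study_median(records):
    """Label each (study, suv) record as 'high' or 'low' SUV.

    The threshold is the median SUV within the record's own study.
    Assumption: only values strictly above the median are 'high'.
    """
    by_study = defaultdict(list)
    for study, suv in records:
        by_study[study].append(suv)
    thresholds = {study: median(values) for study, values in by_study.items()}
    labelled = [
        (study, suv, "high" if suv > thresholds[study] else "low")
        for study, suv in records
    ]
    return thresholds, labelled


# Hypothetical SUV values for two series (not data from the study).
records = [("A", 3.0), ("A", 5.0), ("A", 9.0), ("A", 12.0),
           ("B", 8.0), ("B", 10.0), ("B", 15.0)]
thresholds, labelled = dichotomise_by_study_median(records)
```

In this toy example, an SUV of 9.0 would be classified as high in series A (median 7.0) but as low in series B (median 10.0), which illustrates how a per-series threshold absorbs between-centre differences in measurement.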
Subgroup analyses were performed using the same method and interaction tests were undertaken. Finally, multivariable Cox regression models stratified by study were used to assess the independent prognostic value of SUV and to assess the benefit of adding SUV to a model based on already-known prognostic variables. For this purpose, a multivariable analysis stratified per study was carried out using the covariates available except SUV. A backward selection method was used. Afterwards, entry of SUV in the model was forced to evaluate model improvement. Only patients with complete data were included in the analyses (no missing values). Hazard ratio estimates are reported with 95% confidence intervals, and all p-values are two-tailed. SAS software (version 9.4; SAS Institute Inc., Cary, NC, USA) was used.
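One common way to judge whether forcing SUV into the model improves it is a likelihood ratio test between the two nested models. The text does not specify which criterion was used, so this is an assumption. The sketch below uses hypothetical -2 log-likelihood values and the function name lr_test_one_df, neither of which comes from the study. It relies on the fact that, for a single added parameter, the chi-squared tail probability with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def lr_test_one_df(neg2loglik_base, neg2loglik_extended):
    """Likelihood ratio test for adding ONE parameter to a nested model.

    Returns the chi-squared statistic and its p-value (1 degree of freedom).
    """
    statistic = neg2loglik_base - neg2loglik_extended
    if statistic < 0:
        raise ValueError("extended model cannot fit worse than the nested base model")
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical -2 log-likelihoods (not values from the study).
stat, p = lr_test_one_df(4000.00, 3996.16)
```

A drop of about 3.84 in -2 log-likelihood corresponds to the conventional 5% significance threshold for one added binary covariate such as high versus low SUV.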
The method of measuring SUV and the type of FDG-PET scanner are clearly important, and the reproducibility of measurements between centres depends on the use of standardised protocols. Nevertheless, considerable measurement variability is possible and has been described elsewhere (see appendixes 1 and 2 of [7]). Briefly, different factors and errors can lead to differences in SUV measurements: clinical/physiological factors (such as glycaemia, time between injection and images, fasting, motion and patient comfort), sources of error (cross-calibration of clocks for correct decay time, accurate cross-calibration of dose calibrator and cameras, exact injected activity and paravenous injection) and physical effects (acquisition settings such as time per bed position, percentage of overlap between bed positions, two- or three-dimensional acquisition mode, type of attenuation correction, image reconstruction methods and image resolution). For these reasons, a specific questionnaire (online supplementary material) was sent to the authors to collect clinical and technical information about the conditions under which both the FDG-PET measurements and the SUV measurements were obtained. The completed questionnaires were intended to be used to construct a covariate representing subgroups of studies that might be considered homogeneous in the method of acquiring SUV values.
Results
25 corresponding authors were contacted. Five did not reply [14, 21, 22, 26, 31]; another nine [9, 10, 12, 15, 17, 24, 28, 32, 33] did not approve participation, did not send the data after preliminary agreement or did not have the data available at the time of our request. For the remaining 11 studies [11, 13, 16, 18–20, 23, 25, 27, 29, 34], the individual data were available and received. Additionally, data on one unpublished series of patients were collected.
In terms of numbers of patients, 3578 potentially eligible patients according to the publications (including the unpublished series) were identified and data related to 1563 of these patients (44%) were received. Baseline characteristics of the studies as well as reasons for excluding some patients from the analysis are presented in table 1.
Patient characteristics, including covariates present in most of the datasets, are shown in table 2 for the 1526 patients included in the analysis. Plans had been made to adjust the analysis for other known prognostic factors (e.g. biological factors), but this was not possible because of lack of data. The pre-specified analysis of progression-free survival was not possible, as the available data allowed calculation of progression-free survival for only ∼65% of the patients.
SUV was provided as a continuous covariate in 10 out of the 12 series. In one series, SUV was provided as a binary covariate [27], and in the other, SUV was provided in a mixed way [19]. However, in those two series, the individual hazard ratio could be estimated using the median as the SUV threshold. 10 out of the 12 series reported SUVmax, and two did not [13, 20]. In one study [29], both SUVmax and SUVmean were available.
ANOVAs adjusted for study identified an association between SUV (as a continuous covariate) and sex, with higher metabolic activity (and hence SUV) in male patients (p=0.04). In contrast, metabolic activity was lower in adenocarcinoma (p<0.001) (as illustrated in table 1 and fig. 1) and in stage I or stage II disease (p<0.001). Finally, a significant positive correlation was noted between SUV and tumour size (both variables continuous, p<0.001), but not between SUV and age.
The impact of SUV on overall survival is shown in table 3, together with the rates of events and the hazard ratios we obtained in our literature-based meta-analysis. The main differences between the literature-based hazard ratio estimates and the present estimates resulted from the use of SUVmedian as the threshold, from updates of survival data or from slight changes in the populations of patients analysed.
The combined hazard ratio was 1.43 (95% CI 1.22–1.66): patients with a high SUV had significantly worse survival than patients with a low SUV. If the two studies in which SUVmean was used were excluded, the combined hazard ratio became 1.46 (95% CI 1.23–1.73), with the same conclusion. Survival curves are depicted in figure 2.
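To make the strength of evidence behind the combined estimate more tangible, the sketch below recovers the log hazard ratio, its standard error and an approximate z-statistic from the reported hazard ratio and 95% confidence interval. It assumes a normal approximation on the log scale and a critical value of 1.96; the function name hr_summary is hypothetical, and the result is only as precise as the rounded confidence limits allow. It is not the stratified Cox computation used in the study.

```python
import math


def hr_summary(hr, ci_low, ci_high, z_crit=1.96):
    """Recover log(HR), its standard error, z and a two-sided p-value
    from a hazard ratio and its 95% CI (normal approximation on the log scale)."""
    log_hr = math.log(hr)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return log_hr, se, z, p_two_sided


# Combined univariate estimate reported in the text: HR 1.43 (95% CI 1.22-1.66).
log_hr, se, z, p = hr_summary(1.43, 1.22, 1.66)
```

Under these assumptions the log hazard ratio is roughly 0.36 with a standard error of roughly 0.08, so the estimate lies more than four standard errors from the null value of 0.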
When SUV was dichotomised according to the median values with stratification by histology (adenocarcinoma versus squamous cell versus other histological subtype), 159 (10%) patients changed SUV categories (88 from a low value to a high value and 71 in the other direction). However, the combined hazard ratio did not change appreciably (HR 1.43, 95% CI 1.23–1.67).
Results of subgroup analyses are shown in table 4. The only interaction identified was with stage, suggesting that SUV has little or no prognostic value among patients with stage IV disease.
The first fitted multivariable model included age, stage, tumour size and receipt of surgery as explanatory variables (981 patients and 337 events). Because sex was not in the model, refitting was performed, adding patients with unknown sex category (1117 patients and 374 events). Older age (HR 0.51, 95% CI 0.40–0.66; p<0.0001), stage I (HR 0.42, 95% CI 0.29–0.60; p<0.0001) or stage II (HR 0.60, 95% CI 0.40–0.91; p<0.0001) were favourable prognostic factors; larger tumour size (HR 1.09, 95% CI 1.04–1.14; p=0.0006) and lack of surgery (HR 2.98, 95% CI 2.04–4.35; p<0.0001) were poor prognostic factors. Notably, because tumour size was not available for any patient with stage IV disease, this model did not include any stage IV patient. Addition of SUV to this previous model significantly improved it (table 5). The hazard ratio for high versus low SUV was 1.58 (95% CI 1.27–1.96; p<0.0001) and increased slightly if the threshold used to categorise it depended on histology (HR 1.61, 95% CI 1.30–2.00). No interaction between SUV and other covariates included in the model was found to be significant.
When tumour size was not considered, stage IV patients could be added to the analysis, again without statistical significance for sex. Without SUV, the best model (1526 patients and 669 events) included age, stage and lack of surgery, with the following hazard ratios. Age ≥70 years: 0.66 (95% CI 0.55–0.78; p<0.0001); stage I: 0.20 (95% CI 0.13–0.29; p<0.0001); stage II: 0.31 (95% CI 0.20–0.48; p<0.0001); stage III: 0.48 (95% CI 0.37–0.62; p<0.0001); and lack of surgery: 2.46 (95% CI 1.94–3.13; p<0.0001). The addition of SUV to this model improved its predictive ability, as shown in table 5 (similar results for the two different categorisations of SUV). In this latter case, the interaction between stage and SUV was significant with no detectable effect of SUV in patients with stage IV.
The data collected on the methods used to obtain SUV are shown in table 6. This table shows that the various studies did not have identical clinical or physiological patient preparations, probably did not have the same physical effects and had some unassessable sources of error because several studies were retrospective. This substantial heterogeneity did not allow us to construct groups of studies as planned, and therefore prevented us from studying SUV as a continuous variable.
Discussion
We performed a pooled analysis based on individual data from 1526 patients from 12 different series; in other words, 44% of the patients identified as eligible were included in the analysis. We confirmed the results of our previous work: patients with a high SUV in the primary tumour have shorter survival than patients with lower metabolic activity. We have now also identified high SUV as an independent prognostic factor, a conclusion that could not be reached from our previous literature-based meta-analysis. However, as expected, the point estimates were lower than those previously obtained. Indeed, we used the SUVmedian values in each series as the threshold to dichotomise SUV measurements instead of the so-called “best cut-off”, which is known to increase the chance of false-positive results [35]; adjustment for other prognostic covariates may also explain the lower point estimates. As expected, tumour metabolic activity was associated with histology. To correct for this factor when assessing the prognostic value of SUV, we calculated separate thresholds stratifying by histology, but doing so did not modify the magnitude of the effect of SUV on survival. Similarly, we did not find an interaction between SUV and histology. This is also a new finding obtained thanks to the use of individual patient data.
There are several limitations to this study. The number of covariates was restricted compared with our protocol; as previously discussed [3, 4], this illustrates the need to standardise the study of prognostic factors in lung cancer. One of the cited advantages of an individual patient data meta-analysis [30] is to provide more mature data, but in the present analysis we observed that it is difficult, at least outside the context of clinical trials, to perform searches to update survival status. Finally, we did not have a sufficient number of datasets that included information on disease progression, and thus we could not calculate progression-free survival, which would have been a further advantage over our previous systematic reviews.
Our literature-based meta-analyses [7, 8] prevented us from addressing the issue of heterogeneity of studies with regard to the method of assessing SUV and the resulting large variability in measurements. This is why our analyses of correlations between SUV as a continuous variable and other baseline characteristics should be interpreted carefully, although they were adjusted for study. Some scientific associations, such as the European Association of Nuclear Medicine, have not only published guidelines for semi-quantitative FDG-PET evaluation in oncology [36], but have also built a system for PET/computed tomography centre accreditation [37], taking into account staff training, clinical conditions of the examination, calibration of the machines, methods of reconstruction and attenuation algorithms, among others. However, accreditation does not fully solve the issue of reproducibility of results [38]. Even with a questionnaire asking specifically about the methods used in each study, we were unable to identify homogeneous groups, and we can only conclude that higher values of SUV imply higher hazards. Some authors have suggested that patients do not fall into two discrete groups, but rather show a continuous increase in the hazard as SUV increases [23, 39]. However, our data did not allow us to propose one or several thresholds. In this field of metabolic imaging, perhaps more than in any other, prospective standardised multicentre studies with an a priori estimated sample size allowing sufficient statistical power are lacking but greatly needed.
Publication bias is another important issue, and our study is likely to have suffered from it: we missed 56% of the “eligible” patients, and many other series may well exist without having been published. This is why our study has to be interpreted as a pooled data analysis and not as a meta-analysis of the available evidence. Despite the limited number of patients, power is not an issue, as a significant independent prognostic value for high SUV was identified, but there is a problem of publication bias, intrinsic to any meta-analysis of prognostic factors, as no registry of such studies exists. This is the price to pay when conducting individual patient data meta-analyses, as obtaining approval to share data from all database owners is very difficult; we partially failed in this exercise even though we shared a protocol for the present study and a pre-planned analysis with all the authors contacted. However, when we returned to our literature-based meta-analysis and calculated combined hazard ratios separately for studies with and without individual patient data available, similar point estimates were found: 2.15 (95% CI 1.57–2.93) and 2.03 (95% CI 1.54–2.69), with a p-value of 0.79 for the heterogeneity test. This suggests that our subgroup of studies may be reasonably representative of the overall evidence and that publication bias is not a major concern. As discussing data exchange with the authors of published series takes a lot of time, we did not try to obtain more recently published data. However, should high SUV truly have no effect at all on survival, a very large study (or many medium-sized studies) would be required to render our results non-significant.
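The heterogeneity comparison between the two subgroups of studies quoted above can be approximated from the published figures alone. The sketch below assumes that each standard error can be recovered from the 95% confidence interval on the log scale and that heterogeneity is tested with Cochran's Q on 1 degree of freedom; the text does not state which test was actually used, and the function names are hypothetical.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z_crit=1.96):
    """Log hazard ratio and standard error recovered from a 95% CI."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)


def heterogeneity_two_groups(group_a, group_b):
    """Cochran's Q (1 df) comparing two pooled hazard ratios.

    Each group is a (hr, ci_low, ci_high) tuple.
    """
    theta_a, se_a = log_hr_and_se(*group_a)
    theta_b, se_b = log_hr_and_se(*group_b)
    # With two groups, Q reduces to the squared difference over the summed variances.
    q = (theta_a - theta_b) ** 2 / (se_a ** 2 + se_b ** 2)
    # For 1 df: P(chi2 > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value


# Estimates quoted in the text for the two subgroups of studies.
q, p = heterogeneity_two_groups((2.15, 1.57, 2.93), (2.03, 1.54, 2.69))
```

Under these assumptions the sketch yields a p-value close to the 0.79 quoted in the text, which is consistent with the two subgroups having similar combined hazard ratios.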
Nevertheless, for all the above reasons, a prospective study aiming to detect a hazard ratio similar to the point estimate we obtained would be necessary to confirm our results.
Although the SUV measurements were mixed, only two series provided SUVmean, compared with 10 series that provided SUVmax. Excluding these two series neither materially affected the results nor modified the conclusion that patients with a high SUV have significantly worse survival than patients with a low SUV.
Interestingly, we confirmed that lung adenocarcinomas showed the lowest SUV values among tumour histologies, as has been reported by several authors [32, 40–42]. As far as the analysis could be taken without using SUV as a continuous covariate, this correlation did not modify the prognostic value of SUV.
The well-known partial volume effect [43] could at least partially explain the finding that patients with stage I–II lung cancer showed lower metabolic uptake. Nevertheless, SUV consistently showed prognostic value in each stage, from stage I to III, but our results suggest that SUV is not prognostic among patients with stage IV disease. It is possible that once a lung tumour shows metastatic spread, the anatomic extent of the tumour is more important in predicting prognosis than the metabolic activity of the primary tumour alone. Although this supposition should be confirmed with a larger number of patients with stage IV disease, it might be the reason why other measures of metabolic activity, such as tumour glycolysis and metabolic volume, as well as heterogeneity and texture analysis, are very promising as prognostic factors [44–49]. However, these “new” derived metabolic parameters seem to be more informative in large tumours, because the limited spatial resolution of PET prevents it from characterising tracer distribution within small tumours. Moreover, multiple texture parameters have been described and tested to characterise tumour heterogeneity, but no consensus has been reached on which is the best and most robust. These parameters are also of particular interest in patients with stage IV NSCLC, in whom we failed to identify any independent prognostic value for SUVmax. In our opinion, large multicentre prospective trials testing these new derived parameters are needed to determine in which cases and stages their use is most pertinent.
Conclusions
Although our study suffered from selection bias and lack of standardised SUV assessment, our findings suggest that SUV at the time of diagnosis, measured on the primary tumour, is an independent prognostic marker for patients with stage I–III NSCLC. There was no detectable interaction between histology and SUV, despite the confirmed observation that SUV is higher in squamous cell cancer than in adenocarcinoma. The utility of SUV in predicting survival in stage IV patients requires further study.
Acknowledgements
The authors thank Claude Hossein-Foucher (Nuclear Medicine, CHRU de Lille, Lille, France) and Arnaud Scherpereel (Pulmonary and Thoracic Oncology, CHRU de Lille) for their support of the work and their critical reading of the manuscript.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Conflict of interest: Disclosures can be found alongside the online version of this article at erj.ersjournals.com
- Received January 20, 2015.
- Accepted July 5, 2015.
- Copyright ©ERS 2015