Main

Lung cancer accounts for almost one-fourth of total cancer deaths (Jemal et al, 2011). Approximately 80% of all lung tumours are non-small cell lung cancers (NSCLC). Since most NSCLC are not resectable at diagnosis, chemotherapy remains the mainstay of treatment. Despite recent progress with individualised treatment, prognosis is still poor. To maximise the benefit of therapy, prognostic assessment according to the tumour, node, metastasis system and histomorphological characteristics (Warth et al, 2012b), and predictive biomarker analyses, that is, for epidermal growth factor receptor (EGFR) mutations (Mok et al, 2009) and anaplastic lymphoma kinase (ALK) translocations (Kwak et al, 2010), are of utmost importance for patient stratification.

Assessment of tumour proliferation, as a morphology-based measure of tumour growth kinetics, has a long-standing history. To this end, analysis of proliferation-associated antigens, especially Ki-67, is frequently performed. Ki-67 is a DNA-binding nuclear protein expressed throughout the cell cycle in proliferating but not in quiescent (G0) cells. It is a well-known potent biomarker with significant prognostic and/or predictive value in leading cancer entities such as breast (Urruticoechea et al, 2005; Viale et al, 2008; Cuzick et al, 2011; Luporsi et al, 2011), prostate (Pollack et al, 2004; Khor et al, 2009) and colorectal cancer (Allegra et al, 2003). Consequently, the proliferation index (PI) influences clinical decision-making and choice of treatment in several tumour entities. This includes breast cancer where PI is used to separate luminal A and luminal B phenotypes (Leong and Zhuang, 2011), which impacts on therapy, and neuroendocrine tumours of the gastrointestinal tract where assessment of PI is the key component of tumour grading with strong prognostic as well as therapeutic implications (Rindi et al, 2012). First data also indicate a diagnostic and prognostic value of PI in pulmonary carcinoids (Zahel et al, 2012; Warth et al, 2013); the value of Ki-67 in neuroendocrine tumours of the lung will be further substantiated in the future (Pelosi et al, 2014). Beyond its use in formal classification systems, tumour cell proliferation is frequently analysed by pathologists to comprehensively describe the aggressiveness and the biological behaviour of a tumour which may provide additional ‘non-formalised’ clues for therapy selection.

In resected NSCLC, meta-analysis of numerous studies suggested that high PI is a negative prognosticator (Martin et al, 2004) and predicts recurrence (Poleri et al, 2003; Yamashita et al, 2011) as well as survival after lung tumour ablation (Sofocleous et al, 2013). However, the informative value of the vast majority of studies on this issue is limited due to the usually small sample numbers investigated. Hence, published meta-analyses are hampered by a lack of uniformity with respect to clinicopathological data and chosen cut-off values. Thus, Ki-67 assessment in NSCLC has not been successfully translated into routine diagnostics yet.

In order to clarify some of these issues and to elucidate the prognostic impact of PI in NSCLC we therefore retrospectively analysed three independent NSCLC cohorts including a total of almost 1500 cases in which we correlated PI with clinicopathological data including overall (OS), disease-specific (DSS), and disease-free survival (DFS).

Patients and methods

Patients

For the test cohort only R0-resected NSCLC with clinicopathological data including sex, age, tumour size, lymph node and distant metastases, type of resection, histotype, applied adjuvant therapy, OS, DFS, and DSS were chosen. All tumours were resected between 2002 and 2008 at the Thoraxklinik Heidelberg, Germany (local ethics committee no. 206/2005). Tumours resected after neoadjuvant therapy were excluded. The ADC (adenocarcinoma) validation cohort consisted of tumours with the same dataset resected between 2009 and 2010 at the Thoraxklinik Heidelberg. The SQCC (squamous cell carcinoma) validation cohort consisted of tumours resected R0 between 1993 and 2002 at the University Hospital Zurich, Switzerland. Basic clinical (age, sex and stage) data, OS and DFS status as well as postoperative treatment data were available in this dataset, as well. Tumours were classified by AW, WW, and PAS (Heidelberg) or AS (Zurich) according to the criteria of the 2004 WHO classification and re-staged according to the seventh Edition International Union Against Cancer/American Joint Committee on Cancer Tumour, Node, and Metastasis (TNM) classification. Adenocarcinoma subtyping was performed according to the International Association for the Study of Lung Cancer (IASLC)/American Thoracic Society (ATS)/European Respiratory Society (ERS) proposal (Travis et al, 2011).

Clinicopathological characteristics

The test cohort included 1065 NSCLC specimens, 482 ADC (45.3%), 437 SQCC (41%), 42 adenosquamous carcinomas (ASC; 3.9%), 57 large cell carcinomas (LC; 5.4%), 11 large cell neuroendocrine carcinomas (LCNEC; 1%), 28 sarcomatoid carcinomas (SC; 2.6%), and 8 combined tumours (0.8%). Mean age of patients was 63.2 years, 773 patients were male (72.6%), 292 patients were female (27.4%). A total of 17 patients were treated by wedge resection (1.6%), 8 patients received segmentectomy (0.8%), 779 (73.1%) lobectomy, 37 (3.5%) bilobectomy, and 224 (21%) pneumonectomy, accompanied by systematic lymph node dissection. Adjuvant chemotherapy or mediastinal irradiation was administered according to the guidelines in effect at the time of resection. Mean follow-up of patients alive at the endpoint of analysis was 45.6 (ADC: 43.5; SQCC: 49.3) months. Mean follow-up of patients alive at the endpoint of the analysis or who died of other causes was 42.1 (ADC: 40.5; SQCC: 44.8) months. Mean follow-up of patients alive without recurrence was 40.5 (ADC: 37.8; SQCC: 44.1) months. Clinicopathological characteristics of the test cohort are provided in Table 1.

Table 1 Association of proliferative activity with staging parameters for the whole test cohort of non-small cell lung cancers (NSCLC), the test cohort adenocarcinoma subgroup (ADC), and the test cohort squamous cell carcinoma subgroup (SQCC)

The ADC validation cohort comprised 184 ADC. Mean age of patients was 63.1 years. Overall, 96 (52.2%) patients were male, 88 (47.8%) were female; 73 (39.7%) tumours were classified as stage I, 43 (23.4%) as stage II, 62 (33.6%) as stage III and 6 (3.3%) as stage IV. Mean follow-up time of patients alive at the endpoint of analysis was 27.9 months. Mean follow-up time of patients who were alive at the endpoint of analysis or died of non-cancer-related causes was 26.7 months. Mean recurrence-free follow-up time was 25.1 months.

The SQCC validation cohort comprised 233 SQCC. Mean age of patients was 65.6 years, 187 (80.3%) were male, 46 (19.7%) were female; 86 patients (36.9%) were classified as stage I, 88 (37.8%) as stage II, 46 (19.7%) as stage III and 13 (5.6%) as stage IV. Mean follow-up of patients alive at the endpoint of analysis was 89.1 months. Mean recurrence-free follow-up was 91.7 months.

Tissue microarray construction, immunohistochemistry, and PI analysis

Tissue microarray (TMA) construction for the test cohort and the ADC validation cohort was previously described in detail (Warth et al, 2012a). In brief, a TMA machine (AlphaMetrix Biotech, Rödermark, Germany) was used to extract two 1.0-mm cylindrical core samples from each tumour block. Immunohistochemistry (IHC) was performed on both cohorts using an antibody against Ki-67 (MIB1, 1 : 400; DAKO, Hamburg, Germany) according to a quality controlled and accredited (DAkkS) protocol. Tissue microarray slides were deparaffinised, pre-treated with an antigen retrieval buffer (pH 9.0; DAKO) and stained using an automated device (DAKO Techmate 500plus). For the SQCC validation cohort, TMA construction was performed with a custom-made, semiautomatic tissue arrayer (Beecher Instruments, Sun Prairie, WI, USA) extracting two 0.6-mm core samples from each donor block. A rabbit monoclonal anti-Ki-67 antibody clone 30-9 (Confirm kit, prediluted, Ventana-Roche, Basel, Switzerland) was used on a Benchmark automated IHC platform (Ventana-Roche) using protocol CC1m and ultraView-HRP detection.

Ki-67-positive tumour cells were counted per 100 tumour cells separately in every core by focusing on representative areas avoiding obvious hot spots as well as areas with specifically low indices and areas with necrosis or haemorrhage. The mean of the results of all cores, which could be evaluated for a given case, was assigned as final Ki-67 index to this case. In a certain number of cases, only one core was available for evaluation due to inadequate sampling of tumour tissue in one of the cores or loss of the second core during TMA processing. In the test cohort, for example, in 913 (ADC: 404; SQCC: 386; others: 123) cases both cores could be evaluated while in 152 (ADC: 78; SQCC: 51; others: 23) cases (14.3%) only one core was available for Ki-67 evaluation.
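The per-case index described above can be illustrated with a minimal Python sketch. It is not the software used in the study; the function names and the use of None for a lost or inadequate core are assumptions made for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence


def core_index(positive_cells: int, counted_cells: int = 100) -> float:
    """Ki-67 index of a single core in percent (positive cells per counted tumour cells)."""
    if counted_cells <= 0:
        raise ValueError("counted_cells must be positive")
    return 100.0 * positive_cells / counted_cells


def case_index(core_indices: Sequence[Optional[float]]) -> Optional[float]:
    """Final Ki-67 index of a case: mean over all evaluable cores.

    None marks a core that could not be evaluated (assumed convention);
    a case without any evaluable core gets no index.
    """
    evaluable = [value for value in core_indices if value is not None]
    return mean(evaluable) if evaluable else None
```

With two evaluable cores the case index is their mean; with one lost core the remaining core alone defines the index, as in the 152 single-core cases of the test cohort.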

Statistics

Correlation of PI with clinicopathological characteristics was done by unpaired t-test and analysis of variance. Comparison of tumour diameters with PI was tested with Pearson’s correlation coefficient. Overall survival, DSS, and DFS were analysed for all possible cut-offs using the Kaplan–Meier method, with a log-rank test to probe for significance. The testing for continuous hazard ratios (HRs) was done with a web-based software tool (cut-off finder) based on Cox regression (Budczies et al, 2012). Multivariate survival analysis was done by the Cox proportional hazard model. All tests were done two sided and performed using SPSS Statistics 19 (IBM, Ehningen, Germany), Prism 4.03 (GraphPad, La Jolla, CA, USA), and R (www.r-project.org). P-values <0.05 were considered significant.
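To make the cut-off scan concrete, the following Python sketch dichotomises a cohort at every observed PI value and compares the two groups with a two-sided log-rank test. It is an illustration under stated assumptions, not the Cutoff Finder tool or the SPSS/R code used here: the function names, the minimum group size, and the choice of the log-rank test (rather than Cox-based HRs) for the scan are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def logrank_p(times: Sequence[float], events: Sequence[int],
              in_high: Sequence[bool]) -> float:
    """Two-sided log-rank P-value for high- vs low-PI groups (event: 1 = event, 0 = censored)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation at time t, flagged by group and by event at t
        risk_set = [(g, bool(e) and ti == t)
                    for ti, e, g in zip(times, events, in_high) if ti >= t]
        n = len(risk_set)
        n_high = sum(1 for g, _ in risk_set if g)
        d = sum(1 for _, died in risk_set if died)
        observed += sum(1 for g, died in risk_set if g and died)
        expected += d * n_high / n
        if n > 1:
            # Hypergeometric variance of the number of events in the high group
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def scan_cutoffs(pi: Sequence[float], times: Sequence[float],
                 events: Sequence[int], min_group: int = 2) -> List[Tuple[float, float]]:
    """Return (cut-off, P) for every observed PI value used as 'PI >= cut-off'."""
    results = []
    for cutoff in sorted(set(pi)):
        in_high = [value >= cutoff for value in pi]
        n_high = sum(in_high)
        if min(n_high, len(pi) - n_high) < min_group:
            continue  # skip splits that leave one group nearly empty
        results.append((cutoff, logrank_p(times, events, in_high)))
    return results
```

Because many cut-offs are tested on the same data, the smallest P-value from such a scan overstates significance; a cut-off chosen this way needs confirmation in an independent cohort.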

Results

NSCLC proliferative activity depends on the histotype

Mean PI in the test cohort was 40.7%. Large cell neuroendocrine carcinomas showed the highest PI (mean: 70.7%) followed by SQCC (52.8%), SC (48.1%), LC (47.7%), mixed tumours (38.1%), and ADC (25.8%; Supplementary Figure 1, Figure 1). Within ADC subtypes solid predominant ADC showed the highest PI (39.4%), followed by acinar (21.6%), micropapillary (15.7%), papillary (14.4%), and lepidic predominant ADC (9.5%; Figure 1).

Figure 1

Histology and proliferative activity. (A) Proliferative activity in dependence of histological tumour type. (B) Association of pulmonary adenocarcinoma growth patterns with proliferation.

Proliferative activity of NSCLC is associated with tumour stage and nodal status

In the test cohort PI was associated with tumour size (r=0.121, P<0.001; Supplementary Figure 2), due to an association of PI and size in ADC (r=0.118). This association was not observed for SQCC. UICC stage, tumour stage, and nodal status were also associated with PI (Table 1). This could be demonstrated for the whole NSCLC cohort and for the ADC subgroup, but not for SQCC. The most striking differences of proliferation with respect to UICC stage were evident between stages I and II for ADC with a >10% higher PI in stage II. Adenocarcinomas with lymph node metastasis had higher PI compared to node-negative tumours (Table 1).

Proliferative activity of NSCLC and survival: opposing effects of low and high Ki-67 indices

In most published studies mean or median PI has been used for stratification of cases according to proliferative activity. Applying these categories to our test cohort did not yield significant differences with respect to OS, DSS, and DFS (P≥0.1 for all comparisons; not shown). To investigate this further, we continuously plotted HRs for all possible cut-offs for OS, DSS, and DFS (Figure 2A, Supplementary Figure 3) using an automated software tool (Budczies et al, 2012). Cut-offs in the low PI range resulted in HRs higher than 1 (OS and DSS) or around 1 (DFS), indicating a survival disadvantage, while cut-offs in the higher PI range showed HRs <1, indicating a better prognosis for OS, DSS, and DFS. This is exemplified by Kaplan–Meier curves for the cut-offs of 10% and 55% as shown in Supplementary Figure 4. To dissect these surprising findings further, we then investigated ADC and SQCC separately.

Figure 2

Association of proliferative activity with survival in non-small cell lung cancer. All possible cut-off values for proliferative activity and their impact on overall survival (OS) are depicted for the whole test cohort (A), and in the subgroups of pulmonary ADC (B) and SQCC (C).

Strong proliferative activity in ADC predicts poor survival: definition of cut-off values and validation

In ADC, high PI was significantly associated with a worse prognosis over almost the whole range of potential cut-off values for OS, DSS, and DFS (Figure 2B, Supplementary Figure 5). The statistically optimal cut-off for the separation of a good and a poor prognostic ADC group across all survival parameters was at 25%. Application of this cut-off led to a mean DFS of 60.3 months (OS: 71.3 months; DSS: 80.3 months) for patients with a PI <25% vs 54.3 months (OS: 56.4 months; DSS: 65.2 months) for patients with a PI ≥25% (Figure 3) in the test cohort.

Figure 3

Adenocarcinoma survival probabilities in dependence of proliferative activity. Distribution of survival probabilities for (A and D) overall survival (OS), (B and E) disease-specific survival (DSS), and (C and F) disease-free survival (DFS) in pulmonary adenocarcinomas stratified for proliferative activity (cut-off 25%) in the test cohort (A–C) as well as in the independent validation cohort (D–F).

Stage and growth pattern of ADC were also predictors of OS, DSS, and DFS in univariate analyses (not shown). In multivariate survival analysis under inclusion of stage, pattern and PI, PI was an independent predictor of OS (HR 1.56, P=0.004; Supplementary Table 1) and DSS (HR=1.80, P=0.002) but not DFS (HR=1.11, P=0.442) in the ADC test cohort.

Analysis of the independent ADC validation cohort confirmed the impact of the predefined cut-off of 25% on survival in ADC. Survival of patients with high PI (≥25%) was considerably shorter, which was statistically significant for DFS but not for OS and DSS (Figure 3). Mean DFS for patients with low PI (<25%) was 35.4 months (OS: 39.3 months; DSS: 41.2 months), while DFS for patients with a PI ≥25% was only 28.8 months (OS: 37.5 months; DSS: 40.0 months). This survival difference was borderline significant in a multivariate Cox proportional hazard model for DFS (HR: 1.63, P=0.055) under inclusion of the above-mentioned parameters.

The prognostic impact of proliferation in ADC patients is abolished by adjuvant treatment

We examined whether adjuvant treatment regimens alter the prognostic impact of PI in ADC. The proportion of potential cut-offs that produced a HR significantly >1 for DSS decreased from 68.5% in ADC patients without adjuvant chemotherapy to 13.9% in patients with adjuvant chemotherapy (Supplementary Figure 6). This effect is exemplified for the cut-off at 25% (Supplementary Figure 7). A comparable drop in the frequency of significant results (48.1% to 6.7%) was also evident for adjuvant radiotherapy (Supplementary Figure 6). The same was true for OS and DFS (Supplementary Table 2). To test whether adjuvant therapy interfered with the overall independent prognostic impact of PI in ADC, we included this information in an additional multivariate analysis. In this analysis, which included PI, stage, pattern, adjuvant chemotherapy and adjuvant radiotherapy, adjuvant chemotherapy (HR: 0.63, CI: 0.44–0.88; P=0.008) but not adjuvant radiotherapy (HR: 0.88, CI: 0.59–1.31; P=0.536) proved to be an additional independent factor for better OS. The independent impact of PI on OS, however, remained unchanged in this setting (HR: 1.57, CI: 1.16–2.12; P=0.004).

High proliferation in SQCC is associated with better survival

In SQCC of the test cohort only cut-offs in the low range (<15%) resulted in a HR >1. The majority of potential cut-offs over the medium to high PI range resulted in a HR <1 (Figure 2C, Supplementary Figure 8), indicating better survival of SQCC patients with higher PI. The most robust differentiation for survival was found at a PI cut-off of 50%. Patients with a PI <50% had a mean OS of 63.2 months (DFS: 65.9 months), while patients with a PI ≥50% had a mean OS of 73.7 months (DFS: 74.4 months; Figure 4). The survival effect was stage independent for OS (HR: 0.65, 95% CI: 0.48–0.89; P=0.007; Supplementary Table 3) and DFS (not shown). We validated these data in an independent cohort of 233 SQCC. In our SQCC validation cohort patients with a PI <50% also had a worse survival (mean OS: 58.7 months; mean DFS: 54.8 months) compared to patients with a PI ≥50% (mean OS: 74.8 months; mean DFS: 67.1 months; Figure 4). Again, the effect of PI on OS was stage independent in multivariate survival analysis (HR: 0.63, 95% CI: 0.46–0.87; P=0.005). In contrast to ADC, adjuvant treatment did not significantly impact on the association of PI with survival in SQCC (data not shown).

Figure 4

Squamous cell carcinoma survival probabilities in dependence of proliferative activity. Kaplan–Meier curves depicting survival differences for OS (A and C) and DFS (B and D) in dependence of proliferative activity in squamous cell carcinomas (SQCC) with a 50% cut-off in the test cohort (A and B) as well as in the independent validation cohort (C and D).

Impact of proliferation on survival in other NSCLC subtypes

To delineate whether the prognosis of rare NSCLC subgroups is influenced by PI we tested the prognostic impact of our optimised cut-offs for ADC (25%) and SQCC (50%) in LC, LCNEC, SC, and ASC. The 50% cut-off did not result in statistically significant survival differences in any of the groups. In contrast, patients with ASC were prognostically stratified in the same way as ADC when the cut-off of 25% was applied (Supplementary Figure 9; Supplementary Table 2). In the other subgroups no significant survival differences were noted.

Discussion

In this large-scale study on the prognostic impact of proliferation in NSCLC we demonstrate that PI is a highly significant and independent predictor of survival in ADC and ASC. In accordance with our data, Kadota et al (2012) recently showed that mitotic count in combination with architectural grade is an independent recurrence predictor in stage I ADC. Ki-67 has been described as a potent biomarker with significant prognostic value in cancer entities such as breast (Cuzick et al, 2011), prostate (Khor et al, 2009) and colorectal cancer (Allegra et al, 2003). In breast cancer and gastrointestinal neuroendocrine tumours it has already entered clinical decision-making. The largest meta-analysis on PI in NSCLC to date indicated a prognostic value as well (Martin et al, 2004); however, in one recent study of 778 NSCLC, PI failed to predict a benefit from adjuvant cisplatin-based chemotherapy (Filipits et al, 2007). Optimal and widely applicable Ki-67 cut-offs allowing clinically relevant patient stratification in NSCLC, however, have not been determined yet. Many single studies provided heterogeneous results; this has led to confusion and hampered the use of Ki-67 in NSCLC routine diagnostics. There are several reasons for this heterogeneity. First, many studies on this topic are considerably underpowered with regard to the number of analysed cases. Second, our data demonstrate that the prognostic impact of PI depends on the NSCLC subtype analysed. Hence, analysing a mixed NSCLC cohort will not lead to meaningful results. Third, different and sometimes randomly chosen cut-off values are likely confounders resulting in heterogeneous results (Lee et al, 1995). We could demonstrate that the determination of suitable cut-off values for NSCLC subtypes is one of the keys to obtaining meaningful results. Evidence-based cut-off values and standardised assessment will be a prerequisite to establish proliferation as a clinically useful parameter in the routine diagnostic setting.

The use of means or medians is generally accepted as one approach to define biomarker-related cut-off values (Altman et al, 1994); this, however, did not turn out to be useful for PI in NSCLC. Continuous evaluation of all possible cut-offs over several survival parameters, with the subsequent choice of optimal discriminators and their independent validation, suggested clinically meaningful PI cut-off values of 25% for ADC and 50% for SQCC. Nevertheless, it is evident that dichotomisation of continuous parameters may pose a problem for diagnostic categorisation in some instances (Royston et al, 2006): a sharp PI cut-off of 25% implies that patients with 24% and 26% PI have a significantly different outcome, which potentially leads to different treatment decisions. For clinical use it has therefore been suggested to apply two cut-off values that delineate a central ‘grey’ area in which other patient-related factors might be considered to select appropriate treatment regimens (Goldhirsch et al, 2009). Although we were not able to define such a grey area in our dataset (not shown), large-scale validation studies in well-characterised NSCLC cohorts are warranted to ultimately determine respective cut-off values and potentially delineate such a ‘grey’ zone.
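The difference between a sharp cut-off and a two-threshold scheme can be sketched in a few lines of Python. The cut-offs of 25% (ADC) and 50% (SQCC) are those derived in this study; the grey_margin parameter is purely hypothetical, since no grey zone could be defined in our dataset, and the function name is an assumption for illustration.

```python
# Histotype-specific PI cut-offs (percent) as derived in the test cohort
CUTOFFS = {"ADC": 25.0, "SQCC": 50.0}


def stratify(histotype: str, pi: float, grey_margin: float = 0.0) -> str:
    """Assign a case to 'low', 'high', or 'grey' proliferative activity.

    grey_margin is a hypothetical half-width (in percentage points) of an
    undetermined zone around the cut-off; 0 reproduces the sharp dichotomy.
    """
    cutoff = CUTOFFS[histotype]
    if abs(pi - cutoff) < grey_margin:
        return "grey"
    return "high" if pi >= cutoff else "low"
```

With the default margin, ADC cases with 24% and 26% PI fall into different groups; any non-zero margin would place both in the grey zone, where other patient-related factors would have to guide the decision.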

The survival differences between ADC with high and low PI were especially prominent in patients who did not receive adjuvant therapy and diminished in patients who received adjuvant treatment, implying that patients with strongly proliferating ADC may benefit to a higher extent from adjuvant treatment compared to patients with slowly proliferating ADC. Thus, PI may be a useful parameter for stratification of patients for adjuvant therapy. This should be tested in the prospective setting.

This study is the first to correlate PI assessed by Ki-67 staining with the different ADC subtypes defined by the IASLC/ATS/ERS (Travis et al, 2011). Of note, although micropapillary predominant ADC have the worst prognosis and are significantly associated with the presence of metastasis at the time of diagnosis (Warth et al, 2012b), this extremely aggressive biological behaviour is not reflected by a high PI, pointing to other malignancy-associated mechanisms, for example, altered expression of adhesion molecules (Kuner et al, 2009) as crucial factors.

In contrast to the situation in ADC we found a complex association of PI with prognosis in SQCC. An increased PI had a negative prognostic impact in SQCC only in the low PI range (<15%), while higher PI cut-offs resulted in an inverse correlation of PI with patient survival. Comparable findings concerning the prognostic impact of Ki-67 for SQCC (and also for ADC) have been reported previously in a few smaller studies (Hommura et al, 2000; Takahashi et al, 2002; Poleri et al, 2003). From a biological viewpoint these results are difficult to explain, and at present one can only speculate on the mechanisms underlying this observation. One aspect might be that fast uncontrolled tumour growth could lead to an inadequate blood supply with a higher propensity of the tumour to develop necrosis, which in turn might induce a stronger antitumour immune response. However, functional data on such associations are still lacking. Although we do not see an association of adjuvant treatment with survival differences predicted by PI in SQCC, another explanation could be that chemotherapeutic treatment administered later in the disease course might have stronger effects in patients with rapidly proliferating tumours.

Limitations of our study include its retrospective nature as well as the use of TMA cores and not whole slide sections to determine the proliferative activity of tumours. Although the use of TMA may result in a bias, PI in NSCLC biopsies (which are comparable to TMA cores with respect to size) corresponds to PI in the respective resection specimens in >80% of cases (Meert et al, 2004). Another limitation in our study might be that staining protocols and antibodies used in the SQCC test and validation subgroup were not identical. However, this also indicates that the determination of PI as a biomarker is not strictly assay dependent. Nevertheless, international standardisation of analysis and assessment of any potential biomarker is an important aspect for a successful translation into the routine setting.

Although the findings concerning the prognostic impact of PI in our test cohort could largely be recapitulated in our ADC validation cohort with respect to DFS, we could not ultimately confirm the prognostic effect of PI for OS and DSS. This is most likely due to the fact that the validation cohort was smaller and follow-up was less mature when compared to the test cohort. This again points out that large and well-characterised cohorts are essential for biomarker studies which might explain why the results of many previous smaller studies on PI do not show conclusive results. Of course, the results of our study should be validated and further refined in other large, independent and well-characterised cohorts. Furthermore, it has to be investigated if PI is also of prognostic value in the palliative setting.

In the future, Ki-67 might be used in concert with other clinicopathological parameters to prognostically stratify patients. In preceding studies, we could already demonstrate that combined assessment of clinical, morphological, immunohistochemical, molecular, and imaging characteristics is helpful in this regard (Lederlin et al, 2013; Warth et al, 2014). Inclusion of a variety of different markers in multidimensional prognostic models will ultimately pave the way for individualised tumour treatment based on evidence-based prognostic stratification. The combined assessment of clinical and (molecular) pathological risk factors has already proven to be a helpful tool in other major cancer entities. Good examples of such clinically useful combined risk scores are nomograms, which have been established for a variety of cancer entities, including, among others, prostate and breast cancers (Toi et al, 2012; Adamis and Varkarakis, 2014).

Taken together, we demonstrate that proliferation in ADC is an independent negative prognostic factor for survival, while in SQCC high proliferative activity is associated with better outcome. If these results can be confirmed in future prospective studies, standardised assessment of proliferation might become a useful biomarker in the daily routine diagnostic setting in NSCLC.