Discriminant analysis on small cell lung cancer and non-small cell lung cancer by means of NSE and CYFRA-21.1

G. Paone, G. De Angelis, R. Munno, G. Pallotta, D. Bigioni, C. Saltini, A. Bisetti, F. Ameglio. ©ERS Journals Ltd 1995.

ABSTRACT: A correct diagnosis of small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) is essential for both prognostic and therapeutic reasons. We used discriminant analysis as a method to optimize the discriminant power of serum tumour marker levels for differentiation between SCLC and NSCLC. A panel of serum markers, including neurone specific enolase (NSE), cytokeratin fragment antigen 21.1 (CYFRA-21.1), tissue polypeptide antigen (TPA) and carcinoembryonic antigen (CEA), was obtained in 50 consecutive NSCLC and 17 SCLC patients. Data were analysed by the BMDP statistical program after logarithmic transformation of marker levels. The variables selected were NSE and CYFRA-21.1. Considered together, they gave a 97% rate of correct classification. The formula generated (canonic variable, CV) was validated on a group of seven SCLC and 22 NSCLC patients. Only two errors occurred. We therefore conclude that the canonic variable tested, based on NSE and CYFRA-21.1, provides good discrimination between the two types of lung cancer. The method is rapid, relatively inexpensive, and based on simple serum tests.

Eur Respir J., 1995, 8, 1136–1140.

*Dept of Cardiovascular and Respiratory Sciences, University "La Sapienza", Rome, Italy. **III Division of Pneumology, and +Centre of Nuclear Medicine, C. Forlanini Hospital, Rome, Italy. ++Dept of Medical, Oncologic and Radiological Sciences, University of Modena, Italy. #Laboratory of Clinical Pathology, S. Gallicano Institute, Rome, Italy.

Lung cancer represents the most frequent malignant neoplasm in man [1], with an incidence of 33,000 new cases annually in Italy [2], and 600,000 deaths worldwide [3]. The latter figure may be an underestimate, due to the lack of reliable statistics in many countries.
Lung cancers are classified by histology into four major cell types: small cell lung cancer (SCLC), lung adenocarcinoma (LADC), squamous cell lung cancer (SQCLC), and large cell lung cancer (LCLC) [4]. The last three types are grouped together as non-small cell lung cancer (NSCLC). Other known types of lung cancer are not represented in our study.
Differentiation between SCLC and NSCLC is very important for prognostic and therapeutic reasons, due to their different behaviour [5,6].
In addition to histology, an alternative diagnostic methodology may be useful, especially if it is based on simple laboratory tests performed on serum. Until now, it has not been possible to classify lung cancer as SCLC or NSCLC using single tumour markers. Some authors have tried to improve the discrimination rate by combining several markers, but with limited success [7][8][9][10][11][12][13].
Discriminant analysis is a mathematical method that can be applied to a set of markers to improve on the classification power of any single variable, thereby increasing the final discrimination rate [14][15][16][17].
Using a suitable computer program [18,19] and four tumour markers, we investigated the possibility of enhancing differentiation between SCLC and NSCLC by discriminant analysis.

Patient population
From January 1993 to December 1993, a first group of 67 consecutive unselected and untreated patients with newly diagnosed lung cancer was evaluated in a division of the Forlanini Hospital (Rome, Italy) to generate a canonic variable (CV) for discrimination between SCLC and NSCLC using serum tumour marker levels.
A second group of 29 patients was then enrolled to validate the discriminant power of the canonic variable in patients not previously used to generate the algorithm [20].
Histological types and disease stages of both study groups are displayed in table 1.

Lung cancer diagnosis
Histological diagnosis was made by at least two pathologists, following World Health Organisation (WHO) criteria [4]. For each patient, three sputum samples and bronchoscopic biopsies were routinely evaluated. Transthoracic needle aspiration was needed for seven patients and resected tissue for three. The pathologists did not know the serum marker data.
The laboratory personnel did not know the histological diagnosis.

Statistical analysis
Because the raw data were not normally distributed, a natural logarithmic (Ln) transformation was needed. To permit comparisons with other studies, results are presented as medians and ranges of the individual non-transformed data and as mean±SD of the values obtained after the Ln transformation.
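The two forms of presentation can be illustrated with a minimal sketch using only the Python standard library. The function name `summarize_marker` and the example values are hypothetical and are not study data.

```python
import math
import statistics

def summarize_marker(values):
    """Median and range of the raw values, plus mean and SD of their natural logs."""
    logs = [math.log(v) for v in values]
    return {
        "median": statistics.median(values),
        "range": (min(values), max(values)),
        "ln_mean": statistics.mean(logs),
        "ln_sd": statistics.stdev(logs),  # sample SD (n - 1 denominator)
    }

# Hypothetical serum levels chosen so that the logs are 0, 1 and 2.
example = [1.0, math.e, math.e ** 2]
summary = summarize_marker(example)
```

The sample SD is used here as an assumption; the source does not state which SD estimator was applied.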
Comparisons and correlations were evaluated by using non-parametric tests (Kruskal-Wallis one-way analysis of variance and Spearman rank test, respectively) on the raw data, and by means of Student's t-test on normalized data.
Discriminant analysis, a multiparametric test, was performed using the BMDP program (BMDP, statistical software, University of California, USA, P7M module) [18,19].
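Rank-based tests can be applied to the raw data because they depend only on the ordering of the values, which a logarithmic transformation does not change. The sketch below is a plain-Python illustration of the Spearman rank correlation (Pearson correlation of average ranks); the helper names and the example values are hypothetical and are not study data.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical marker levels for five patients.
marker_a = [3.1, 12.0, 7.5, 40.2, 5.0]
marker_b = [1.2, 2.9, 4.4, 9.8, 2.0]
rho_raw = spearman(marker_a, marker_b)
rho_ln = spearman([math.log(v) for v in marker_a],
                  [math.log(v) for v in marker_b])
```

In this sketch `rho_raw` and `rho_ln` coincide, since the logarithm preserves the rank order of positive values.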
This method generates an index, negative or positive, able to separate two groups. This index is named the canonic variable (CV). CV=0 serves as the cut-off point: patients with CV>0 are classified as SCLC, and patients with CV<0 as NSCLC. Inclusion of the variables into the CV formula was obtained by the criterion of corrected means by F statistics. The analysis was performed using the "equal prior probability" option to assign the subjects to groups.
The jack-knife approach was used to classify the patients. With this method, each patient is evaluated by a canonic variable generated after excluding that patient's own data. Furthermore, the canonic variable formula was also validated on another group of patients enrolled consecutively after the first [20].
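The cut-off rule can be sketched with Fisher's two-group linear discriminant for two variables, with the constant chosen so that the index is zero midway between the group means (the equal-prior case). This is an illustration under stated assumptions, not the BMDP P7M stepwise procedure, and its scaling of the index may differ from BMDP's. All function names and the (Ln NSE, Ln CYFRA-21.1) pairs are hypothetical; the coefficients fitted in the study are not reproduced here.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def pooled_cov(g1, g2):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (g1, g2):
        m = mean_vec(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(g1) + len(g2) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fit_cv(g1, g2):
    """Return (weights, constant) so that CV > 0 points towards group g1."""
    m1, m2 = mean_vec(g1), mean_vec(g2)
    (a, b), (c, d) = pooled_cov(g1, g2)
    det = a * d - b * c
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    # weights = inverse(pooled covariance) * (m1 - m2)
    w = [(d * diff[0] - b * diff[1]) / det,
         (-c * diff[0] + a * diff[1]) / det]
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    const = -(w[0] * mid[0] + w[1] * mid[1])  # CV = 0 at the midpoint
    return w, const

def cv(x, w, const):
    return w[0] * x[0] + w[1] * x[1] + const

def classify(x, w, const):
    return "SCLC" if cv(x, w, const) > 0 else "NSCLC"

# Hypothetical (Ln NSE, Ln CYFRA-21.1) pairs, not study data.
sclc = [(4.0, 0.5), (4.5, 1.0), (3.8, 0.8), (4.2, 1.2)]
nsclc = [(2.0, 2.0), (2.5, 2.6), (1.8, 1.5), (2.3, 2.4)]
w, const = fit_cv(sclc, nsclc)
```

With these illustrative data the weight on Ln NSE is positive and the weight on Ln CYFRA-21.1 is negative, so the two markers push the index in opposite directions.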
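The leave-one-out logic of the jack-knife can be sketched as follows. To keep the example short, a one-variable threshold at the midpoint of the group means stands in for the discriminant function; this stand-in, the function names and the data are all hypothetical.

```python
def jackknife_accuracy(samples, fit, predict):
    """samples: list of (value, label). Each sample is classified by a model
    fitted on all the other samples, never on itself."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        model = fit(training)
        if predict(model, x) == label:
            correct += 1
    return correct / len(samples)

def fit_midpoint(training):
    """Stand-in classifier: threshold halfway between the two group means."""
    sclc = [x for x, y in training if y == "SCLC"]
    nsclc = [x for x, y in training if y == "NSCLC"]
    return (sum(sclc) / len(sclc) + sum(nsclc) / len(nsclc)) / 2

def predict_midpoint(threshold, x):
    return "SCLC" if x > threshold else "NSCLC"

# Hypothetical Ln NSE values, not study data.
data = [(4.0, "SCLC"), (4.4, "SCLC"), (3.9, "SCLC"),
        (2.0, "NSCLC"), (2.4, "NSCLC"), (1.9, "NSCLC"), (2.2, "NSCLC")]
rate = jackknife_accuracy(data, fit_midpoint, predict_midpoint)
```

Because the model is refitted for every patient, the resulting rate is less optimistic than one obtained by classifying the same patients used to fit the formula.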

Sample size
As suggested by LACHENBRUCH [21], the number of subjects considered must be at least five times greater than that of the variables selected. In our case, this recommendation was amply satisfied. The significance level of the discrimination rate obtained in the study indicated that the sample size was sufficient.
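The rule can be checked with simple arithmetic. The sketch below applies it to the whole first group of 67 patients, which is an assumption, since the text does not say whether the rule refers to the total sample or to each group; the function name is hypothetical.

```python
def satisfies_rule(n_subjects, n_variables, ratio=5):
    """True if there are at least `ratio` subjects per variable."""
    return n_subjects >= ratio * n_variables

n_subjects = 67                                   # first study group
selected_ok = satisfies_rule(n_subjects, 2)       # NSE and CYFRA-21.1
candidates_ok = satisfies_rule(n_subjects, 4)     # all four markers tested
```

Under this reading the requirement is 10 subjects for the two selected variables and 20 for the four candidate markers.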

Results
As shown in table 2, only CYFRA-21.1 and NSE differed significantly between the two types of lung cancer. Figure 1 compares the distributions of Ln CYFRA-21.1 and Ln NSE in SCLC and NSCLC patients. NSE has the higher discrimination power (21 patients misclassified by CYFRA-21.1 and eight by NSE).
Correlation matrices of the four variables in the groups are shown in table 3a (SCLC) and 3b (NSCLC). Serum levels of CEA and TPA, TPA and CYFRA-21.1, and TPA and NSE were significantly correlated in both cancer types. In NSCLC the levels of CEA and NSE, and in SCLC the levels of CYFRA-21.1 and NSE, were also correlated. Applying the discriminant analysis to SCLC and NSCLC, a good classification was obtained by means of Ln NSE and Ln CYFRA-21.1. The results are reported in figure 2; the calculation formula is reported in the legend to this figure. CV showed significantly different values between the two groups: SCLC (mean±SD) 1.71±1.42 versus NSCLC -1.62±0.81 (p<0.0001).
The correlation between the two classification systems was evaluated by canonic correlation (r coefficient = 0.83; p<0.0001).
As reported in the legend to figure 2, the Ln NSE variable coefficient is positive in contrast to that of Ln CYFRA-21.1, indicating that these two variables have opposite effects in determining the final CV value, as reported previously [7][8][9][10][12]. The most important discriminant variable is NSE (see fig. 1).
TPA and CEA were not selected, possibly because of the nonsignificant differences found between the two groups of subjects and their strong correlations with the CYFRA-21.1 and NSE markers (tables 2 and 3).
To validate the formula previously generated, a control group (table 1b) was used. The rate of correct classification obtained was 93%, with only two errors: one patient with NSCLC and one with SCLC (fig. 3). Although misclassified, these two patients had CV values between -0.5 and 0.5, close to the cut-off.
Early stages (I or limited disease (LD)) were misclassified in two cases: one of 13 belonging to the first group (table 1a) and one of seven belonging to the validation group (table 1b).
A further attempt to improve the correct classification rate by adding the squamous cell carcinoma antigen (SCC-Ag) and carbohydrate antigenic determinant 19-9 (Ca 19-9) [22] did not improve the results (data not shown).

Discussion
Due to the different biology, prognosis and sensitivity to therapy of SCLC and NSCLC, their differentiation is very important. Generally, this aim is achieved using histological techniques. The recognition that lung cancer is often associated with changes in the levels of various plasma markers suggests their possible use as diagnostic and discriminant indices. For this purpose, several studies have been performed to evaluate the ability of tumour markers to diagnose and to differentiate the various histological types of lung cancer [7][8][9][10][11][12][13].
Discriminant analysis represents one of the best methods of combining the discriminant power of several variables to obtain the maximum separation between two or more groups. This methodology is currently applied in taxonomy. Several examples of the use of this method are also reported in the literature concerning different fields [14][15][16][17][20]. Recently, serum tumour marker levels were reliably used to distinguish between primary and metastatic malignant bone tumours [16].
Our study attempted to optimize the use of some common markers of lung cancer (TPA, CEA, NSE and CYFRA-21.1) to differentiate SCLC and NSCLC by means of discriminant analysis. To our knowledge, no other reports exist in the literature on this topic.
The canonic variable generated was able to separate SCLC from NSCLC with an overall 97% (98% on NSCLC and 94% on SCLC) rate of correct classification, whilst no acceptable classification was obtained among LADC, SQCLC and LCLC.
One of the critical points for the correct use of discriminant analysis is the validation of the canonic variable generated. In fact, overestimation of the classification rate cannot be ruled out, since the formula derives directly from the data of the group selected. To overcome this problem, a second group of patients, not previously employed to generate the algorithm, was evaluated, confirming the validity of the formula generated.
Generally, histology furnishes both cancer diagnosis and histological typing. In contrast, due to the type of groups used, the formula obtained cannot be employed to distinguish lung cancer from other diseases, but only to distinguish histological types. Therefore, a previous clinical examination is necessary to indicate the presence of a lung malignancy.
In addition to the theoretical importance of CV in recognizing histological types of cancer, a clinical role for this method may be represented by those cases where histology cannot be obtained. For instance, there are subsets of patients with poor cardiorespiratory function and negative sputum cytology in whom fibreoptic bronchoscopy, transthoracic needle aspiration and thoracotomy may not be performed. Although histology remains the reference method, a relatively simple serological test may be helpful to obtain a presumptive diagnosis of lung cancer type in these patients.
Consistent with the notion that tumour marker levels in the serum are more readily detectable in advanced stages, one might expect that this multimarker serological test could not be employed with success in early stage cancers. However, although the relative frequency of misclassification was higher in early stage cancers, the rate of correct classification (90%) appears promising.
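The classification rates quoted in the Results and Discussion can be re-derived from the reported group sizes. In the sketch below, the validation and early-stage error counts are those given in the text, whereas the split of errors in the first group (one NSCLC and one SCLC patient) is an assumption inferred from the reported 98% and 94% rates; the function name is hypothetical.

```python
def percent_correct(errors, total):
    """Rate of correct classification, rounded to the nearest whole percent."""
    return round(100 * (total - errors) / total)

rates = {
    # First group: error counts inferred from the reported percentages.
    "first group, NSCLC": percent_correct(1, 50),
    "first group, SCLC": percent_correct(1, 17),
    "first group, overall": percent_correct(2, 67),
    # Validation group: two errors among 29 patients, as reported.
    "validation group": percent_correct(2, 29),
    # Early stages: 1 of 13 and 1 of 7 misclassified, as reported.
    "early stages": percent_correct(2, 13 + 7),
}
```

Under these assumptions the computed values agree with the rates of 98%, 94%, 97%, 93% and 90% given in the text.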
In conclusion, the results indicate that the use of discriminant analysis with a small panel of markers may be useful to differentiate SCLC and NSCLC, especially when applied to an appropriate subset of lung cancer patients.