Abstract
Endobronchial ultrasound (EBUS)-guided transbronchial needle aspiration (TBNA) may diagnose suspected lung cancer. Determination of non-small cell lung cancer (NSCLC) subtype may guide therapy in select patients. Small-volume biopsies may be subject to significant interobserver variability in subtype determination.
Three pathologists independently reviewed specimens from 60 patients who underwent EBUS-TBNA for diagnosis/staging of suspected/known NSCLC. Smear, haematoxylin and eosin (H&E) and immunohistochemistry (IHC) specimens were reviewed without reference to other specimen types obtained from the same patient. Final diagnoses, and degree of confidence in the diagnosis, were recorded for each specimen.
Almost perfect agreement was seen for distinguishing between small cell lung cancer and NSCLC for all specimen types. Agreement in determination of NSCLC subtype for smear, H&E and IHC specimens was slight (κ=0.095, 95% CI −0.164–0.355), fair (κ=0.278, 95% CI 0.075–0.481) and moderate (κ=0.564, 95% CI 0.338–0.740), respectively. Perfect agreement was seen when all three observers were confident of diagnoses made on IHC specimens.
Interobserver agreement in interpretation of EBUS-TBNA specimens is moderate for determination of NSCLC subtype. Agreement is highest following examination of IHC specimens. Clinicians should be aware of the degree of pathologist confidence in the tissue diagnosis prior to commencement of subtype-specific therapy for NSCLC.
Endobronchial ultrasound (EBUS)-guided transbronchial needle aspiration (TBNA) was first developed to allow minimally invasive mediastinal staging of patients with known non-small cell lung cancer (NSCLC) [1]. More recently, it has been used to obtain diagnostic tissue samples in patients suspected of having lung cancer in whom computed tomography (CT) or positron emission tomography suggest mediastinal or hilar metastases [2–4]. Such an approach allows both diagnostic and staging information to be obtained in a single procedure, thereby expediting the management process, and is predicated upon the high diagnostic sensitivity of EBUS-TBNA [5]. Consequently, an increasing number of patients may have treatment decisions based solely upon small-volume tissue samples obtained at EBUS-TBNA.
Recent studies have demonstrated that NSCLC subtype determines the choice of systemic therapy in patients with advanced NSCLC [6], and the need for molecular characterisation of tumours [7]. Significant interobserver variability between pathologists in the subclassification of NSCLC subtypes in small specimens obtained at routine flexible bronchoscopy has previously been observed, with only “fair” agreement observed among pathologists examining NSCLC in endobronchial biopsies [8]. Cytological specimens are subject to significantly higher interobserver variability than histologic specimens [9]. Given the increasing importance of accurate NSCLC subclassification, we believed it was important to examine interobserver variability in evaluation of the small-volume samples obtained by EBUS-TBNA.
MATERIALS AND METHODS
Institutional review board approval was granted for the performance of this study (Melbourne Health Human Research Ethics Committee, Melbourne, Australia).
Patients
From the time of inception of EBUS-TBNA at our two tertiary referral centres, we have prospectively recorded demographic and detailed clinical information for all completed procedures. After performing a retrospective review of this database, we identified a convenience sample of 60 consecutive patients who underwent EBUS-TBNA for staging/diagnosis of known/suspected NSCLC.
Performance of EBUS-TBNA
Experienced consultant respiratory physicians performed EBUS-TBNA. A dedicated linear-array bronchoscope (BF-UC180F-OL8; Olympus, Tokyo, Japan) was used to visualise pathological lymph nodes, as directed by CT chest findings, before performance of EBUS-TBNA using a 22-gauge needle (NA-201SX-4022; Olympus). A maximum of three needle passes was performed, with initial material transferred to slides for rapid on-site cytological evaluation by a cytotechnician. All subsequent material was placed in formalin solution to allow the preparation of a cell block for histological evaluation and immunohistochemistry (IHC).
Specimen processing
Cytological slides were fixed in 95% ethanol and Papanicolaou stain was used for all slides. Material in formalin was centrifuged and 3% molten agar was added to the pellet, then solidified and processed as a cell block using a tissue processor. Serial sections were cut and placed on slides before staining with haematoxylin and eosin (H&E). IHC stains used were at the discretion of the pathologist at the time of original reporting of the specimen; however, they mostly consisted of panels including cytokeratin (CK)5/6 (Dako, Glostrup, Denmark; dilution 1:50), p63 (Dako; dilution 1:500), CK7 (Novocastra, Newcastle upon Tyne, UK; dilution 1:120) and thyroid transcription factor 1 (Novocastra; dilution 1:300). In some cases, markers of neuroendocrine differentiation, such as CD56 (Novocastra; dilution 1:50), chromogranin A (Dako; dilution 1:800) and synaptophysin (Dako; dilution 1:400), were also used. IHC stains were prepared using a Leica Bond automated immunostainer (Leica Microsystems, Wetzlar, Germany).
Pathology review
Three pathologists with experience in the reporting of lung cancer pathology specimens independently reviewed each specimen. All three are consultant anatomical pathologists with a minimum of 10 yrs' experience. Each regularly attends Australian and international meetings in pulmonary pathology, though none is a specialist pulmonary pathologist or pulmonary cytopathologist. All are regularly involved in multidisciplinary lung cancer meetings at their institution.
All specimens were deidentified and reviewed in random order. Smear, H&E and IHC specimens were reviewed without reference to other specimen types obtained from the same patient. Previously described cytological criteria were used for the diagnosis of each tumour subtype [10, 11].
Diagnoses were recorded on a specifically designed pro forma, first utilised by Burnett et al. [8] and outlined in the International Association for the Study of Lung Cancer (IASLC)/American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines for classification of NSCLC in small biopsy/cytology specimens [12]. Pathologists were also asked to record on the form whether they were confident or had some doubt regarding the diagnosis. Final diagnoses were classified as either NSCLC (adenocarcinoma, squamous cell carcinoma (SCC) or not otherwise specified (NOS)) or small cell lung carcinoma (SCLC). Diagnoses of large cell carcinoma were included in the NOS category. In a 2010 review, Travis et al. [13] noted that NSCLC-NOS is a more appropriate term than “large cell carcinoma” in small biopsy specimens, but recognised that the terms are frequently used interchangeably.
Evaluation of agreement using kappa statistics does not require any assumption about the "correct" diagnosis; therefore, no separate confirmation of diagnoses determined by EBUS-TBNA was sought.
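As an illustration of why no reference diagnosis is needed, the following minimal sketch computes a multi-rater kappa from the raters' category counts alone. It assumes Fleiss' formulation, which is one common choice for three raters; the text does not state which kappa variant was used. The function name `fleiss_kappa` and the example ratings are hypothetical and are not study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from per-specimen category counts.

    counts[i][j] is the number of raters who assigned specimen i to
    category j. Every specimen must have the same number of ratings.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each specimen needs the same number of ratings")

    # Observed agreement: share of agreeing rater pairs, averaged over specimens.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    shares = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(s * s for s in shares)

    # Undefined (division by zero) if every rating falls in one category.
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings by three raters (columns: adenocarcinoma, SCC, NOS).
example = [
    [3, 0, 0],  # all three agree on adenocarcinoma
    [0, 3, 0],  # all three agree on SCC
    [1, 1, 1],  # each rater chooses a different subtype
    [2, 1, 0],  # two against one
]
kappa = fleiss_kappa(example)
print(round(kappa, 3))  # 0.268
```

Only the distribution of ratings enters the calculation; no "true" diagnosis appears anywhere in the function.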
Statistical methods
Summary statistics were used to describe patient groups. Kappa statistics were used to calculate interobserver agreement. Degree of agreement was determined according to the widely used scale first described by Landis and Koch [14]. Categorical data were analysed using a two-sided Fisher's exact test. A p-value <0.05 was considered significant. Statistical analysis was performed using GraphPad Instat 3 for Macintosh (GraphPad Software, La Jolla, CA, USA).
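For readers unfamiliar with the Landis and Koch scale, the sketch below maps a kappa value to its conventional descriptive label. The cut-offs are those of the published scale; the function name `landis_koch` is a hypothetical helper and not part of the study's analysis software.

```python
def landis_koch(kappa: float) -> str:
    """Return the Landis and Koch descriptive label for a kappa value."""
    if kappa < 0:
        return "poor"            # less agreement than expected by chance
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"      # 0.81 to 1.00


# Kappa values reported in this study, labelled with the same scale.
for value in (0.095, 0.278, 0.564, 0.701, 0.814):
    print(value, landis_koch(value))
```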
RESULTS
60 consecutive patients undergoing EBUS for evaluation of suspected/known primary lung cancer were identified. H&E specimens were available for all 60 patients, with matched “smear” and IHC specimens available in 49 and 36 patients, respectively. Kappa scores for each specimen type are summarised in table 1.
Cytology smears
All three pathologists gave concordant diagnoses after examination of smear specimens in 19 (39%) of 49 cases. Substantial agreement was seen in differentiation of SCLC from NSCLC (κ=0.701, 95% CI 0.420–0.982). Only slight agreement in determining NSCLC subtypes was seen (κ=0.095, 95% CI −0.164–0.355).
All three pathologists expressed confidence in their diagnosis in 13 (27%) of 49 smear specimens, with complete concordance in diagnosis seen in all 13 cases (κ=1.0). In contrast, where at least one pathologist expressed doubt, concordance was seen in only six (17%) of 36 cases. Agreement was reduced for this group of specimens, with moderate agreement in differentiation of SCLC from NSCLC (κ=0.426, 95% CI 0.097–0.948), and agreement in determination of NSCLC subtype less than that expected due to chance alone (κ=−0.194, 95% CI −0.418–0.030).
H&E
All three pathologists gave concordant diagnoses following examination of H&E specimens in 33 (55%) of 60 cases. In 23 cases, agreement was seen between two out of three pathologists, while in four cases of NSCLC, different subtypes were reported by each pathologist for the specimens studied (fig. 1). Final diagnoses for specimens where at least two of three pathologists concurred are recorded in table 2.
Figure 1. Demonstration of a smear specimen where each pathologist identified a different non-small cell lung cancer (NSCLC) subtype. Pathologist interpretation of the Papanicolaou-stained specimen (×400) included: 1) a reasonably cohesive group of malignant cells with some possible papillary structures and mildly pleomorphic eccentric nuclei, some with prominent nucleoli, suspicious for adenocarcinoma; 2) spindling of the tumour cells with streaming within the groups as well as dense keratin-like material in the background indicative of squamous cell carcinoma; and 3) cellular sheets with homogeneous non-orangeophilic cytoplasm, and oval to elongated nuclei with hyperchromasia and some nucleoli, consistent with not otherwise specified NSCLC.
Almost perfect agreement was seen in differentiating SCLC from NSCLC (κ=0.814, 95% CI 0.562–1.067). Fair agreement in determination of NSCLC subtype between the three pathologists was seen (κ=0.278, 95% CI 0.075–0.481).
In 26 out of 60 cases, all three pathologists expressed confidence in their diagnosis, with concordant diagnoses made in 21 (81%) of these 26 cases. In contrast, where doubt was expressed by at least one pathologist, concordance was seen in only 11 (32%) of 34 cases. The comparison was highly significant (p=0.0002). The two most frequent sources of doubt identified were the poor differentiation of the tumour and a paucity of cells present for pathological examination.
Interobserver agreement for the subtyping of NSCLC was almost perfect when all three pathologists reported confidence in their results (κ=0.881, 95% CI 0.655–1.0). In contrast, only slight agreement was observed when at least one pathologist expressed doubt regarding their diagnosis (κ=0.143, 95% CI −0.119–0.405).
Immunohistochemistry
Complete agreement in differentiating SCLC from NSCLC was observed following examination of IHC specimens. Overall agreement for determination of NSCLC subtype was moderate (κ=0.564, 95% CI 0.338–0.740).
All three pathologists expressed confidence in their diagnosis following examination of IHC specimens in 19 (53%) of 36 cases, with complete concordance of diagnosis seen between all three pathologists for these 19 cases (κ=1.0, 95% CI 1.0–1.0). In contrast, in the 17 cases where at least one pathologist expressed “doubt” regarding the final diagnosis, concordance was seen in only nine (53%), with the difference in concordance rates being highly significant (p=0.0008). Doubt was expressed by a pathologist for a total of 33 specimens out of 108 IHC examinations undertaken. Specific sources of doubt were recorded for 20 of these, the commonest being an inadequate panel of IHC stains performed to fully characterise the NSCLC subtype (n=14).
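The concordance comparisons reported for H&E and IHC specimens can be checked with a two-sided Fisher's exact test on the corresponding 2×2 tables (concordant versus discordant, by whether all three pathologists were confident). The following is a minimal sketch using only the Python standard library; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption, as the text does not specify how the software defines the two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# H&E: 21 of 26 concordant when all confident; 11 of 34 when doubt expressed.
p_he = fisher_exact_two_sided(21, 5, 11, 23)
# IHC: 19 of 19 concordant when all confident; 9 of 17 when doubt expressed.
p_ihc = fisher_exact_two_sided(19, 0, 9, 8)
print(f"H&E p={p_he:.4f}, IHC p={p_ihc:.4f}")
```

Under these assumptions, both results round to the p-values reported in the text for the respective specimen types.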
Diagnoses recorded for IHC were discordant with H&E diagnoses for the same pathologist in five, six and 11 cases for individual pathologists. The most frequent revision of diagnosis was from SCC on H&E to adenocarcinoma after IHC analysis (seven cases, of 22 overall discordant cases) (fig. 2). IHC specimens were available for 20 (58%) of 34 cases for which “doubt” regarding NSCLC subtype was expressed by the pathologist following examination of H&E specimens. Use of IHC resulted in confident diagnoses being made by all three pathologists in 10 (50%) of the 20 specimens. Agreement for specimens where at least one pathologist expressed “doubt” following examination of H&E specimens was significantly improved with use of IHC (from κ=0.143, 95% CI −0.119–0.405, to κ=0.494, 95% CI 0.118–0.871).
Figure 2. Cytology smear specimens stained by Papanicolaou staining showing a) necrotic tumour cells (×200) mimicking keratinised squamous cells with shrunken dark nuclei and orangeophilic cytoplasm, and b) tumour cells (×400) appearing squamoid with dense cytoplasm and irregular hyperchromatic nuclei. c) Immunohistochemistry of corresponding cell block specimens demonstrates thyroid transcription factor 1 positivity. Cells were also cytokeratin (CK)7 positive but CK5/6 and p63 negative. Final diagnosis for this patient was adenocarcinoma.
IHC specimens were available for 14 (58%) of 24 cases where all three pathologists were confident of their H&E diagnosis. No pathologists altered a diagnosis of SCLC made on five H&E specimens. Despite confidence in their H&E diagnosis, at least one pathologist altered their final NSCLC subtype diagnosis following IHC analysis in three of nine cases (33%; 95% CI 12–65%).
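The confidence interval quoted for three of nine cases can be reproduced with a Wilson score interval, as sketched below. The method used for this interval is not stated in the text, so the choice of the Wilson formula is an assumption made because it matches the reported bounds; the function name `wilson_interval` is hypothetical.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Three of nine NSCLC cases had a subtype diagnosis altered after IHC.
low, high = wilson_interval(3, 9)
print(f"{low:.0%} to {high:.0%}")  # 12% to 65%
```

The width of the interval reflects the small number of cases available for this comparison.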
Agreement between H&E and IHC diagnoses made by the same pathologist for NSCLC specimens differed for each pathologist. One pathologist demonstrated complete concordance in diagnosis when their H&E diagnosis was compared to the paired IHC specimen from the same patient (κ=1.0), one pathologist altered the H&E diagnosis following review of IHC specimens in one case (κ=0.609, 95% CI −0.114–1.332) and one altered the diagnosis in three cases (κ=0.308, 95% CI −0.332–0.947).
DISCUSSION
The distinction between NSCLC subtypes is becoming increasingly important in determining optimal treatment. This is due to recent studies that have shown either increased efficacy or toxicity of chemotherapeutic [15–17] and biological [18, 19] agents in particular histologies, and the association of molecular abnormalities, such as epidermal growth factor receptor and echinoderm microtubule-associated protein-like 4–anaplastic lymphoma kinase gene abnormalities with adenocarcinoma histology [20, 21]. For this reason, an understanding of the reliability of NSCLC subtype as determined by EBUS-TBNA is critical to guide future clinical decision-making in patients in whom the only diagnostic tissue has been obtained by EBUS-TBNA. To our knowledge, no studies have previously examined interobserver agreement in interpretation of NSCLC obtained by EBUS-TBNA.
Our findings indicate that there is very high interobserver agreement between pathologists in distinguishing between NSCLC and SCLC (κ=0.814, 95% CI 0.562–1.067). However, agreement is lower for determination of NSCLC subtype. The agreement seen for both determination of NSCLC subtype (κ=0.278, 95% CI 0.075–0.481) and distinction of SCLC from NSCLC is consistent with agreement previously reported for bronchial biopsy specimens [8, 22, 23].
Two factors appear to improve interobserver agreement. First, pathologist confidence in their diagnosis appears to be associated with improved interobserver agreement. Concordance in final diagnosis was significantly higher among the three pathologists when all were confident of their diagnosis (H&E, p=0.0002; IHC, p=0.0008). Interobserver agreement was also higher when all three were confident in their diagnosis (κ=0.881 versus κ=0.143). Second, IHC appears to increase pathologists' confidence in their diagnoses. In 20 cases where at least one pathologist had previously expressed doubt on H&E examination, IHC allowed all three pathologists to be confident in their diagnosis in 10 (50%). This in turn improved interobserver agreement for these specimens (κ=0.494 versus κ=0.143).
The diagnosis NSCLC-NOS has been used to convey the difficulty in confidently determining the NSCLC subtype. The proportion of NOS as the histologic diagnosis in NSCLC has increased over time, which may be due to increasing use of minimally invasive means to achieve diagnosis [24]. Small-volume specimens may be paucicellular or have an absence of tissue architecture, making identification of tumour subtype more difficult [25]. Use of IHC stains may potentially overcome this by identifying differentiation (e.g. squamous or glandular differentiation) and by more accurately characterising the differentiation of the limited cellular material present.
Consistent with our findings, previous studies have noted a decreased proportion of NSCLC-NOS diagnoses made on small-volume samples with use of IHC studies [26], indicating improved ability to subtype NSCLC specimens as a result of IHC. Despite this, there are still a number of patients in whom confident diagnoses could not be made. This may reflect poor differentiation of underlying tumour rather than the limitations of EBUS-TBNA, as interobserver agreement has previously been reported to be lower in poorly differentiated tumours [27].
Our results highlight the importance of use of the NSCLC-NOS diagnosis to accurately convey pathologist doubt regarding the NSCLC subtype. Our results strongly suggest that doubt in the subtype diagnosis is associated with a low interobserver agreement and, by inference, the accuracy of NSCLC subtyping is likely to be low. For this reason, inclusion of a measure of pathologist confidence within the diagnostic report may be of value to clinicians. Our results also support performance of the minimum IHC panel of stains (when possible), as recommended by the IASLC/ATS/ERS guidelines, to maximise the likelihood of a confident diagnosis [12].
Original reports confirming the excellent diagnostic accuracy of EBUS-TBNA in evaluation of the mediastinum did not compare NSCLC subtype diagnoses to those obtained at surgical resection [1, 28, 29]. Therefore, while diagnostic accuracy of EBUS-TBNA for detection of NSCLC matches [30], or even exceeds [31], that of mediastinoscopy, the accuracy in determination of NSCLC subtype remains unclear. One study examined accuracy of EBUS-TBNA in subtype determination in 23 specimens (retrospectively selected from over 1,800 EBUS-TBNA procedures performed) [32]. However, in 19 of these, comparison was made solely with other small-volume biopsies (e.g. transbronchial biopsy or CT-guided fine-needle aspiration). Diagnostic accuracy in interpretation of small-volume specimens obtained at routine bronchoscopy has previously been suggested to be as low as 50% for identification of NSCLC subtype [23], making use of these as gold standard measures highly problematic. Furthermore, those authors did not examine interobserver variability in discordant diagnoses, which we have demonstrated to be significant for EBUS-TBNA specimens. Of note, consistent with our findings, the study reported that accuracy was improved with examination of H&E slides, and improved further with use of IHC [32].
A more recent study examined the accuracy of fine-needle aspirate cytology (FNAC) specimens in differentiating squamous from non-squamous NSCLC [33]. The authors retrospectively reviewed 474 patients who had NSCLC diagnosed by FNAC and identified 186 who had tissue retrieved by other means and noted good agreement between cytological and histological diagnoses (κ=0.755). The study did not use IHC to achieve cytological diagnosis nor did the authors examine interobserver variability in cytological diagnosis, which our study suggests may be significant, and for 60% of patients, only endobronchial biopsies were available as the reference test. Given the significant interobserver variability [8] and limited diagnostic accuracy [23] for such specimens, the clinical utility of these findings is uncertain.
Given the poor interobserver agreement, our results suggest that subtype-specific therapies should not be based on smear diagnosis alone unless the classic cytological features of SCC, adenocarcinoma or small cell carcinoma are present. If a confident diagnosis of an NSCLC subtype cannot be made on examination of smears alone, examination of an H&E specimen coupled with use of the minimum panel of IHC, as recommended by the IASLC/ATS/ERS guidelines [12], is suggested. Prior to commencement of subtype-specific therapies, review of such specimens in a multidisciplinary setting may inform clinicians of the level of confidence a pathologist has in a particular diagnosis and the manner in which the diagnosis was made, e.g. examination of smears alone versus use of an IHC panel.
Limitations
Pathologists involved in the study were experienced, but not expert, pulmonary pathologists. Interobserver agreement and confidence may differ based on the experience of reporting pathologists. While agreement may be higher between experienced pulmonary pathologists, interobserver agreement noted in this study is comparable to that reported by Burnett et al. [8] among pathologists not experienced in evaluation of lung pathology. This suggests that our findings may accurately represent clinical practice in the majority of centres worldwide where anatomical pathologists report on EBUS-TBNA specimens rather than specialist pulmonary pathologists.
We have not attempted to examine the accuracy of EBUS-TBNA. While histological specimens would constitute the gold standard, the intrinsic heterogeneity of NSCLC [12] means that even surgically resected specimens are subject to less-than-perfect interobserver agreement [9, 34]. The fact that EBUS-TBNA demonstrated NSCLC in these patients means that further biopsy was clinically unnecessary. Prior biopsy specimens, if present, would also mostly be small-volume specimens (e.g. transbronchial lung biopsy (TBLB)), where poor accuracy has previously been reported [23]. Any retrospective analysis comparing EBUS-TBNA diagnosis with surgical diagnosis will have several inherent biases as only a select minority of patients will have surgical tissue available. The question of accuracy is answerable only through a prospective study.
Furthermore, previous studies have suggested that no more than 30% of lung carcinomas are of a single cell type [35]. Heterogeneity may mean that discrepant diagnoses in some cases are likely. Therefore the optimal assessment of accuracy may in fact be consensus among multiple pathologists. While some experts have suggested that cytology smears may provide greater nuclear and cytoplasmic resolution than histology [25], our results suggest that this is most likely to be achieved through H&E examination and use of IHC staining, rather than simply relying on a smear diagnosis.
A standardised panel of IHC markers was not used. Use of IHC markers was at the discretion of the pathologist at the time the biopsy was performed. 40% of samples were not analysed immunohistochemically. The reason some specimens were not subject to IHC analysis remains unknown, and it is not known if this would alter the recorded interobserver agreement. However, improvement in agreement is clearly demonstrated through use of IHC and we believe the absence of IHC analysis for a minority of specimens does not significantly alter the findings of the study. IHC analysis of Papanicolaou-stained smear specimens is possible [36], although it is less reliable for examination of Diff-Quik (Point of Care Diagnostics, Artarmon, Australia) prepared smears [37]. The interobserver variability in reporting of such specimens remains unknown [37].
Conclusions
In summary, our findings confirm very high agreement between pathologists in differentiating SCLC from NSCLC in specimens obtained by EBUS-TBNA. We also confirm only slight interobserver agreement in determination of NSCLC subtype based on cytology smear specimens, and fair agreement following examination of H&E specimens obtained by EBUS-TBNA. Agreement in NSCLC subtyping is improved when pathologists feel confident in their diagnosis, and both confidence and interobserver agreement may be improved with use of IHC.
Our results highlight the value of IHC analysis of low-volume specimens to confirm histological subtyping in patients prior to commencement of therapy with divergent clinical outcomes based on tissue subtypes. Finally, it is important that clinicians are aware of the degree of pathologist confidence in the tissue diagnosis prior to commencement of subtype-specific therapy for NSCLC. Inclusion of a measure of pathologist confidence within reports may be of value to treating clinicians.
Footnotes
Support Statement
D.P. Steinfort is supported by a post-graduate research scholarship from the National Health and Medical Research Council of Australia, and by the Roslyn Hogan Early Detection of Lung Cancer Award, awarded by the Australian Lung Foundation.
Statement of Interest
None declared.
- Received June 27, 2011.
- Accepted January 11, 2012.
- ©ERS 2012