Copyright ©ERS Journals Ltd 2008
Multidisciplinary interobserver agreement in the diagnosis of idiopathic pulmonary fibrosis
1 University Hospitals, Catholic University of Leuven, Leuven, and 2 East Limburg Hospital, Genk, Belgium, 3 Dept of Internal Medicine I, Grosshadern Clinic, Ludwig Maximilian University, Munich, 4 Dept of Internal Medicine III, Johannes Gutenberg University Clinic, Mainz, and 5 Dept of Respiratory Medicine, Ruhrland Clinic, Essen, Germany, 6 Evelyn Hospital, Cambridge, and 8 Royal Brompton Hospital, London, UK, 7 Cardiological Hospital, Bordeaux University Hospital, Bordeaux, and 9 Pitie-Salpetriere Hospital, Paris, France, 10 Zambon Group, Bresso, Italy.
CORRESPONDENCE: M. Thomeer, UZ Leuven, Afdeling Longziekten, Herestraat 49, B - 3000 Leuven, Belgium. Fax: 32 16346803. E-mail: michiel.thomeer{at}scarlet.be
Keywords: Idiopathic pulmonary fibrosis, kappa coefficient, lung biopsy, radiology
Received: May 11, 2006
The purpose of the present study was to evaluate the accuracy of the diagnosis of idiopathic pulmonary fibrosis (IPF) by respiratory physicians in six European countries, and to calculate the interobserver agreement between high-resolution computed tomography reviewers and histology reviewers in IPF diagnosis. The diagnosis of usual interstitial pneumonia (UIP) was assessed by a local investigator, following the American Thoracic Society/European Respiratory Society consensus statement, and confirmed when a minimum of two out of three expert reviewers from each expert panel agreed with the diagnosis. The level of agreement between readers within each expert panel was calculated by weighted kappa. The diagnosis of UIP was confirmed by the expert panels in 87.2% of cases. A total of 179 thoracic high-resolution computed tomography scans were independently reviewed, and an interobserver agreement of 0.40 was found. Open or thoracoscopic lung biopsy was performed in 97 patients, 82 of whom could be reviewed by the expert committee. The weighted kappa between histology readers was 0.30. It is concluded that, although the level of agreement between the readers within each panel was only fair to moderate, the overall accuracy of a clinical diagnosis of idiopathic pulmonary fibrosis in expert centres is good (87.2%).
Idiopathic pulmonary fibrosis (IPF) is a specific form of a chronic fibrosing interstitial pneumonia limited to the lung, and is typically characterised by the histological appearance of usual interstitial pneumonia (UIP) on open or thoracoscopic lung biopsy (OLB and TLB, respectively) 1. The clinical diagnosis of IPF is based on the exclusion of known causes of interstitial lung disease, a restrictive lung function pattern with impaired gas exchange and the presence of a typical pattern of bibasilar reticular abnormalities with minimal ground-glass opacities on thoracic high-resolution computed tomography (HRCT) 1.
Patients with IPF show worse survival than those with other types of idiopathic interstitial pneumonia 2–4. Since the diagnosis of IPF depends upon the expertise of the pathologist and radiologist, it is important that the clinician knows the diagnostic accuracy of thoracic HRCT and of lung biopsy in UIP. Various studies have calculated the accuracy of thoracic HRCT in fibrotic lung diseases 5–7, evaluated interobserver agreement for the diagnosis of different thoracic HRCT patterns (e.g. ground-glass and reticular pattern) 8, 9 in patients with biopsy-proven nonspecific interstitial pneumonia (NSIP) or UIP 3, or in different forms of interstitial lung disease 10, 11. Studies on interobserver agreement amongst pathologists are sparse 12, and only one study with a multicentric prospective design has addressed the issue of diagnostic accuracy in UIP in relation to both radiologist and pathologist 7. No study has addressed this issue in view of the new American Thoracic Society (ATS)/European Respiratory Society (ERS) consensus criteria 1. Therefore, the aim of the present study was to evaluate the diagnostic accuracy of respiratory physicians in IPF, and to calculate the interobserver agreement between HRCT reviewers and histology reviewers in the diagnosis of UIP.
Patients
All of the patients presented in the current study were included in the Idiopathic Pulmonary Fibrosis International Group Exploring N-Acetylcysteine I Annual (IFIGENIA) trial 13. The IFIGENIA trial is a European prospective double-blind placebo-controlled trial studying the effect of high-dose N-acetylcysteine in combination with standard therapy (prednisone and azathioprine) in patients with IPF. Following the judgment of a local investigator, patients were included if the diagnosis of IPF was based on the international consensus criteria 1, and they were aged 18–75 yrs. Newly diagnosed (<6 months) as well as previously diagnosed (>6 months) patients were considered for the study. The IFIGENIA trial was approved by the local ethical committee of the participating centres and every patient signed their informed consent.
HRCT scanning protocol
Review by the radiology committee
Review of lung biopsy specimens by the histology committee
Definite diagnosis of UIP
Statistics
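The abstract states that agreement between readers within each panel was calculated by weighted kappa. A minimal sketch of that statistic follows, assuming linear disagreement weights (the weighting scheme is not stated in the text) and the three ordinal categories used in the results (unlikely, probable, very suggestive). The function name and the toy ratings are hypothetical and are not study data.

```python
from itertools import product

# Ordinal rating scale reported in the results section.
CATEGORIES = ["unlikely", "probable", "very suggestive"]


def weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Cohen's weighted kappa for two raters.

    Assumption: linear disagreement weights |i - j| / (k - 1);
    the paper does not say which weights were used.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    cells = list(product(range(k), repeat=2))
    observed_disagreement = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_disagreement = sum(weight(i, j) * row[i] * col[j] for i, j in cells)

    if expected_disagreement == 0:
        # Both raters used one and the same category throughout: kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


# Toy ratings for five hypothetical cases (not taken from the study).
reader_1 = ["unlikely", "probable", "very suggestive", "probable", "very suggestive"]
reader_2 = ["probable", "probable", "very suggestive", "unlikely", "very suggestive"]
print(round(weighted_kappa(reader_1, reader_2), 3))
```

With more than two readers, as in the radiology panel, one option is to compute the statistic for each pair of readers and average the results; whether the study did this is not stated in the text available here.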
A total of 36 local investigators from six European countries were included (table 1
Radiology reviewer A reviewed 178 HRCT scans (one scan was never reviewed), reviewer B 176 (two scans were judged uninterpretable and one was never reviewed) and reviewer C 176 (two scans were judged uninterpretable and one was never reviewed; fig. 1
All 82 biopsy specimens (44 OLB and 38 TLB) were sent to the international trial coordinator for review by pathology reviewers D and E. After combining the observations of the three histology reviewers, the 178 OLB/TLB observations were judged to be unlikely in 33 (18.5%) cases, probable in 66 (37.1%) and very suggestive in 76 (42.7%) for the diagnosis of UIP. For three (1.7%) observations, the biopsy slide was judged to be uninterpretable. Reviewer D reviewed all 82 OLB/TLB specimens and reviewer E 79 (three were judged uninterpretable). Histology reviewer F was solicited to review 14 biopsy slides (fig. 1
In 12.8% of the patients, the diagnosis of UIP was rejected by at least one review committee (table 1
Table 2
Two salient findings emerge from the present study. First, the diagnosis of IPF proposed by a respiratory specialist was rejected in 12.8% of cases after review of histology and HRCT by the expert committees. Secondly, the mean level of agreement between the three different HRCT reviewers was 0.40, and between the two pathology reviewers 0.30.
The diagnostic accuracy of a pulmonary physician in IPF in relation to the ATS/ERS diagnostic criteria 1 remains to be established. A confident diagnosis of IPF proposed by a clinician was confirmed in 87.2% of cases in the present study. The rejection of the diagnosis was not based on clinical criteria, but rather on HRCT and/or lung biopsy findings that were not compatible with the diagnosis of UIP. Hunninghake et al. 7 found the probability of a patient being given a confident diagnosis by the referring clinician to be 81%, similar to the present results. Although the study of Hunninghake et al. 7 represented the first published prospective multicentric study regarding the level of agreement between clinicians, radiologists and pathologists as to the diagnosis of IPF, it is not clear from their study on which clinical grounds the diagnosis of IPF was made, since no clinical or radiological criteria were provided for the diagnosis of IPF.
The present study also addressed the question of agreement between histology reviewers in the diagnosis of UIP in view of the new pathological classification 1. The interobserver agreement between the histology reviewers was low, with a mean
The level of agreement between the HRCT readers was fair to moderate 14. This
It is important to emphasise the fact that radiologists with differing levels of experience and expertise may interpret radiographic images differently. The radiologists in the present study are specialists in thoracic imaging and have extensive expertise in the interpretation of HRCT scans. Each reader was blinded to the clinical parameters, and the reading was performed separately, so that the different readers could not influence each other. The weighted kappa (κw) was used to evaluate observer variability in order to remove that component of agreement attributable to chance. Although this method of statistical analysis permits a more accurate assessment of observer variability than unadjusted data, a κw may underestimate a high level of agreement 14. In the present study, the level of agreement between the different readers was unexpectedly low. Might this be due to the high prevalence of the disease in the study population, or might it be observer variation bias? The interpretation of κw depends upon the prevalence of the disease, which was high (0.84) in the present study 14. The prevalence of the disease was high because the HRCT scans and lung biopsy slides were from patients selected by a local investigator who had already confirmed the diagnosis of IPF to conform to the ATS/ERS criteria. The higher prevalence of the disease in the present study population is a possible explanation for the conflicting finding that 67% of histologically confirmed UIP cases gave HRCT results that were reported as unlikely. However, this is not astonishing, since a recent publication reported that 59% of patients with definite or probable NSIP on CT had a histological diagnosis of UIP 4. The present authors assume that many of the CT scans that had been reported as being unlikely for UIP would fulfil the CT criteria for NSIP.
The results of the present study may evoke concerns about diagnostic accuracy in IPF.
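The dependence of kappa on prevalence noted in the discussion can be illustrated with a small numerical sketch. It uses the simpler unweighted Cohen's kappa on a two-category rating (an assumption made for clarity; the study used a weighted kappa on more categories), and the two tables of counts are hypothetical, not study data. Both tables have the same raw agreement of 80%, but in the second almost all cases are rated positive by both readers.

```python
def cohen_kappa_2x2(both_pos, a_only, b_only, both_neg):
    """Unweighted Cohen's kappa from the four cells of a 2x2 agreement table.

    both_pos: both readers rate the case positive
    a_only:   only reader A rates it positive
    b_only:   only reader B rates it positive
    both_neg: both readers rate it negative
    """
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("table is empty")

    p_observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n  # proportion rated positive by reader A
    b_pos = (both_pos + b_only) / n  # proportion rated positive by reader B

    # Agreement expected by chance if the readers rated independently.
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if p_expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tables of 100 cases each, both with 80% raw agreement.
balanced = cohen_kappa_2x2(40, 10, 10, 40)  # positives and negatives equally common
skewed = cohen_kappa_2x2(75, 10, 10, 5)     # positive ratings dominate
print(round(balanced, 2), round(skewed, 2))
```

In the skewed table the chance-expected agreement is already high, so the same raw agreement yields a much lower kappa. This is one way a preselected, high-prevalence population can depress the coefficient even when readers agree on most cases.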
This form of lung fibrosis is a rare disease, and no single accurate diagnostic test for it exists. Studies of diagnostic accuracy in IPF are performed mostly in tertiary referral centres. Even in these studies, significant interobserver variability exists. In most of these studies, as in the present one, prior knowledge of the presence of a form of interstitial lung disease exists, which may incite observer bias and therefore influence the results in terms of diagnostic accuracy. The incidence of IPF is low in a general pulmonary practice. Diagnostic accuracy (i.e. sensitivity and specificity) also depends upon the prevalence of the disease. A lower prevalence of disease results in a higher number of false positive and false negative diagnoses. If very costly therapeutic options come to market in the future, the only means of ensuring the greatest possible diagnostic accuracy in IPF is to refer patients to centres with expertise in pulmonary histology and thoracic imaging and clinical experience of IPF 22.
In summary, it has been shown that the accuracy of a clinical diagnosis of idiopathic pulmonary fibrosis is 87.2%. Given that idiopathic pulmonary fibrosis has such a poor prognosis 2 in relation to other forms of idiopathic interstitial pneumonia 3, it was concluded that the use of independent high-resolution computed tomography and histology panels to ensure accurate diagnosis of idiopathic pulmonary fibrosis, as performed in the Idiopathic Pulmonary Fibrosis International Group Exploring N-Acetylcysteine I Annual study 13, is extremely valuable and helps minimise bias. The present study demonstrated that the use of reviewer panels for radiology and histology in idiopathic pulmonary fibrosis trials is feasible.
It is important that the clinician knows that an accurate diagnosis of idiopathic pulmonary fibrosis requires specific expertise that is available in tertiary referral centres with the close collaboration of histopathologists, radiologists and clinicians.
Statements of interest for M. Thomeer, M. Demedts, C.D.R. Flower, J. Verschakelen, F. Laurent, A.G. Nicholson, E.K. Verbeken, F. Capron, M. Sardina, G. Corvasce and I. Lankhorst, and the Idiopathic Pulmonary Fibrosis International Group Exploring N-Acetylcysteine I Annual (IFIGENIA) study can be found at www.erj.ersjournals.com/misc/statements.shtml
The members of the Idiopathic Pulmonary Fibrosis International Group Exploring N-Acetylcysteine I Annual (IFIGENIA) study group are as follows. Steering committee: J. Behr (Grosshadern Clinic, Ludwig Maximilian University, Munich, Germany); R. Buhl (Johannes Gutenberg University Clinic, Mainz, Germany); U. Costabel (Ruhrland Clinic, Essen, Germany); R. Dekhuijzen (Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands); M. Demedts (Chairman) and M. Thomeer (University Hospitals, Catholic University of Leuven, Leuven, Belgium); H.M. Jansen (Academic Medical Centre, Amsterdam, The Netherlands); W. MacNee (University of Edinburgh Medical School, Edinburgh, UK); and B. Wallaert (Calmette Hospital, Lille Regional University Hospital, Lille, France). Country coordinators: P. de Vuyst (Erasmus University Hospital, Brussels, Belgium); B. Wallaert (France); J. Behr (Germany); S. Petruzzelli (Cardiothoracic Dept, Pisa University, Pisa, Italy); J.M.M. van den Bosch (St Antonius Hospital, Nieuwegein, The Netherlands); E. Rodríguez-Becerra (Virgen del Rocío University Hospital, Seville, Spain); W. MacNee (UK). Radiology review committee: C.D.R. Flower (Evelyn Hospital, Cambridge, UK); J. Verschakelen (University Hospitals, Catholic University of Leuven, Leuven, Belgium); F. Laurent (Cardiological Hospital, Bordeaux University Hospital, Bordeaux, France). Histology review committee: A.G. Nicholson (Royal Brompton Hospital, London, UK); E.K. Verbeken (University Hospitals, Catholic University of Leuven, Leuven); F. Capron (Pitie-Salpetriere Hospital, Paris, France). Local investigators. Belgium: M. Demedts, P. de Vuyst, E. Michiels (East Limburg Hospital, Genk), H. Slabbynck (Middelheim General Hospital, Antwerp), M. Thomeer. France: A. Bourdin and P. Chanez (Arnaud de Villeneuve Hospital, Montpellier), J. Cadranel (Tenon Hospital, Paris), P. Camus (Le Bocage University Hospital, Dijon), P. Delaval (Pontchaillou Hospital, Rennes), N. Just and B. 
Wallaert (Calmette Hospital, Lille Regional University Hospital, Lille, France), J.F. Muir (Bois Guillaume Hospital, Rouen). Germany: U. Costabel, R. Baumgartner (Grosshadern Clinic, Ludwig Maximilian University, Munich), J. Behr, R. Bonnet and I. Mäder (Bad Berka Central Clinic, Bad Berka), R. Buhl, A.M. Kirsten (Johannes Gutenberg University Clinic, Mainz), R. Loddenkemper (Heckeshorn Lung Clinic, Zehlendorf Clinic, Berlin), A. Meyer (Eppendorf University Hospital, Hamburg), J. Müller-Quernheim (Borstel Research Centre, Medical Clinic, Borstel), H. Steveling (Ruhrland Clinic, Essen, Germany), T. Welte (Magdeburg University Clinic, Magdeburg), H. Worth (Clinic Fürth, Fürth). Italy: G. Anzalone (Prato Hospital, Prato), G.B. Bottino (DIMI, Genoa University, Genoa), G. Bustacchini (S. Maria delle Croci Hospital, Ravenna), M. Dottorini (R. Silvestrini Hospital, Perugia), S. Gasparini (Torrette Hospital, Torrette di Ancona), C. Giuntini (Cardiothoracic Dept, Pisa University, Pisa), A. Rossi (IRCCS S. Matteo General Hospital, Pavia), G. Simon (Azienda Ospedaliera Villa Sofia, Palermo). The Netherlands: F. Beaumont (Bosch Medicentrum, Locatie Grootziekengasthuis, Hertogenbosch), M. Drent (Maastricht University Hospital, Maastricht), H.M. Jansen, J.M.M. van den Bosch, and F.J.J. van den Elshout (Rijnstate Hospital, Arnheim). Spain: J. Ancochea Bermudez (Hospital Universitario de la Princesa, Madrid), L. Callol Sanchez (Hospital Universitario Del Aire, Madrid), J.L. Llorente (Hospital De Cruces, Baracaldo-Bilbao), J.M. Rodriguez-Arias and I. Vigil (Hospital Sant Pau, Barcelona), E. Rodríguez-Becerra (Hospital Universitario Virgen del Rocío, Seville). Zambon personnel and consultants: A. Ardia (consultant), M. Sardina, G. Corvasce, and I. Lankhorst (consultant).