Abstract
Accurate assessment of prognosis in idiopathic pulmonary fibrosis remains elusive due to significant individual radiological and physiological variability. We hypothesised that short-term radiological changes may be predictive of survival.
We explored the use of CALIPER (Computer-Aided Lung Informatics for Pathology Evaluation and Rating), a novel software tool developed by the Biomedical Imaging Resource Laboratory at the Mayo Clinic Rochester (Rochester, MN, USA) for the analysis and quantification of parenchymal lung abnormalities on high-resolution computed tomography. We assessed baseline and follow-up (time-points 1 and 2, respectively) high-resolution computed tomography scans in 55 selected idiopathic pulmonary fibrosis patients and correlated CALIPER-quantified measurements with expert radiologists’ assessments and clinical outcomes.
Findings of interval change (mean 289 days) in volume of reticular densities (hazard ratio 1.91, p=0.006), total volume of interstitial abnormalities (hazard ratio 1.70, p=0.003) and per cent total interstitial abnormalities (hazard ratio 1.52, p=0.017) as quantified by CALIPER were predictive of survival after a median follow-up of 2.4 years. Radiologist interpretation of short-term global interstitial lung disease progression, but not specific radiological features, was also predictive of mortality.
These data demonstrate the feasibility of quantifying interval short-term changes on high-resolution computed tomography and their possible use as independent predictors of survival in idiopathic pulmonary fibrosis.
Short-term quantified CT changes are predictive of survival in IPF http://ow.ly/qmbjd
Introduction
Baseline measures of disease severity have been used to predict survival in idiopathic pulmonary fibrosis (IPF) and include clinical features, pulmonary function measures, computed tomography (CT) findings and composite scoring indices [1–4]. While these measures have been validated in several studies, they appear inferior to dynamic surrogates of disease progression, such as longitudinal changes in physiological indices, i.e. decline in forced vital capacity (FVC) and diffusing capacity of the lung for carbon monoxide (DLCO) over 6–12 months [2, 5, 6]. However, the utility of such longitudinal physiological indices for individual patient management is still significantly limited by substantial intra-individual variability, perhaps with confounding by coexisting conditions, such as emphysema or pulmonary hypertension [7, 8].
CALIPER (Computer-Aided Lung Informatics for Pathology Evaluation and Rating) is an image analysis tool developed by the Biomedical Imaging Resource Laboratory at the Mayo Clinic Rochester (Rochester, MN, USA) for the characterisation and quantification of lung parenchymal findings on high-resolution computed tomography (HRCT). The detection and quantification of pulmonary parenchymal abnormalities by CALIPER is based on histogram signature mapping techniques trained through expert radiologist consensus assessment of pathologically confirmed training sets obtained through the Lung Tissue Research Consortium (LTRC) [9]. We hypothesised that this novel computer-aided method of analysing pulmonary parenchymal features on chest HRCT would provide a reproducible and accurate assessment of disease that may correlate with the semi-quantitative assessment of radiologists. We further postulated that the overall extent of, or short-term changes in, parenchymal features as detected by CALIPER or radiologists may correlate with and be independently predictive of survival in patients with IPF.
Materials and methods
Patient and CT selection
Patients evaluated at the Mayo Clinic from January 2000 to December 2010 were selected if they had IPF diagnosed according to the latest international consensus guideline [10] (with or without surgical lung biopsy) and at least two serial HRCTs obtained within a 3–15-month interval, representing time-points 1 and 2. HRCTs obtained at the time of acute exacerbation, infection, fluid overload or thromboembolic disease were excluded from the study. In addition, the HRCTs had to satisfy the technical requirements for CALIPER, which include high-resolution images (≤2-mm slice thickness). The scans in this study were acquired using LightSpeed Ultra (BONE kernel; GE Healthcare, Cleveland, OH, USA) and Model Sensation 64 (B46 kernel; Siemens, Munich, Germany) scanners.
Collected demographic data included age, sex and smoking history with number of pack-years. Time of diagnosis was defined as the first date of contact with a clinical care provider who made or suspected a diagnosis of IPF based on assessment of clinical presentation, HRCT scan and/or biopsy. Patients were followed to end-points of death or last known visit date recorded in the medical record at Mayo Clinic. Patients who underwent lung transplantation were followed until the transplant date and then censored, with interval HRCT changes prior to this included in the analysis.
The study protocol was approved by the Mayo Clinic institutional review board (IRB number 10-000320). All included subjects provided informed consent.
Pulmonary function studies
All included patients had corresponding pulmonary function test (PFT) data obtained within 30 days of the selected scans. Spirometry was performed using a Puritan Bennett Renaissance pneumotachography-based flow spirometer (Mallinckrodt, St Louis, MO, USA) according to standards set by the American Thoracic Society/European Respiratory Society guidelines [11]. Flows were expressed as % predicted using the reference equations of Miller et al. [12].
Radiologists' semiquantitative assessment of CT
For comparison of radiologists' semiquantitative assessment of HRCT findings with CALIPER quantification, two subspecialty thoracic radiologists (T.E. Hartman and B.J. Bartholmai) independently reviewed the HRCT scans unblinded to time-points 1 and 2 for each patient and scored them in a standardised manner while blinded to the clinical data and CALIPER results. The scoring system consisted of a visual estimation of the percentage of parenchymal involvement, to the nearest 5%, in a particular region for four radiological abnormalities: emphysema, ground-glass opacities, reticular opacities and honeycombing. The distribution of abnormalities was scored separately within each of 12 regions: the central and peripheral zones of right upper, right middle, right lower, left upper, lingula and left lower lobes. The reviewing radiologists also recorded their assessment of interval stability, progression or improvement of overall CT abnormalities between the two scans of each subject. The percentage of regional involvement for each abnormality and total involvement (summing the radiologist’s assessment for each region) of each of the radiological parameters (emphysema, ground glass, reticulation and honeycombing) was used for statistical analysis.
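The scoring arithmetic described above can be illustrated with a minimal sketch. The function names, data layout and example scores are hypothetical and are not taken from the study data; the whole-lung figure is taken here as the unweighted mean of the 12 regional scores, which is an assumption about how regional scores are combined.

```python
import math

LOBES = ["right upper", "right middle", "right lower",
         "left upper", "lingula", "left lower"]
ZONES = ["central", "peripheral"]
# 6 lobes x 2 zones = 12 scored regions
REGIONS = [(lobe, zone) for lobe in LOBES for zone in ZONES]


def round_to_nearest_5(percent):
    """Round a visual estimate (0-100%) to the nearest 5%, ties rounding up."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    return int(math.floor(percent / 5 + 0.5)) * 5


def whole_lung_mean(regional_scores):
    """Unweighted mean of one abnormality's scores over all 12 regions."""
    missing = [r for r in REGIONS if r not in regional_scores]
    if missing:
        raise ValueError(f"missing regions: {missing}")
    return sum(regional_scores[r] for r in REGIONS) / len(REGIONS)


# Hypothetical reticular scores for one scan (illustrative values only)
reticular = {region: 0 for region in REGIONS}
reticular[("right lower", "peripheral")] = round_to_nearest_5(28)  # 30
reticular[("left lower", "peripheral")] = round_to_nearest_5(22)   # 20
print(whole_lung_mean(reticular))
```

An unweighted mean treats every region as equally sized; a volume-weighted mean would be an alternative if regional volumes were available.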
CALIPER analysis
Data processing
The data processing step included lung extraction and segmentation with identification and extraction of central airways and central vascular structures, manual segmentation of the anatomical lobes, and classification of the remaining pulmonary parenchyma. Segmentation of the lungs was achieved using an adaptive density-based morphological approach [13]. Airways were segmented using iterative three-dimensional region growing and connected components analysis. Pulmonary vessels were extracted using an optimised multi-scale tubular structure enhancement filter [14]. Additional sub-segmentation into anatomical lobes by manual tracing technique was performed by a thoracic radiologist (B.J. Bartholmai) and determination of central/peripheral regions was completed by three-dimensional volume erosion techniques, such that the peripheral zone represented ∼50% of the total volume of each lung. Separate inclusion of the perihilar regions into the central zone of each lobe was performed by semi-automated determination of the hila through tracheal tree analysis and inclusion of a spherical region of 5 cm, bilaterally, from the hilar points. Parenchymal tissue type detection and quantification of the classes (normal, emphysema, ground glass, reticular and honeycombing) (fig. 1) was performed using a sliding window supervised classification scheme.
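The airway step above (three-dimensional region growing with connected components) can be illustrated with a minimal sketch. It assumes a single fixed Hounsfield unit threshold and 6-connectivity, neither of which is specified in the text; the function name, toy volume and threshold are hypothetical and do not reproduce the CALIPER implementation.

```python
from collections import deque

# 6-connected neighbourhood: voxels sharing a face
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
              (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def region_grow(volume, seed, max_hu):
    """Return the set of voxels connected to `seed` whose value is <= max_hu.

    `volume` is a nested list indexed as volume[z][y][x] holding HU values.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    z, y, x = seed
    if volume[z][y][x] > max_hu:
        return set()
    grown = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS:
            n = (z + dz, y + dy, x + dx)
            inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
            if not inside or n in grown:
                continue
            if volume[n[0]][n[1]][n[2]] > max_hu:
                continue
            grown.add(n)
            queue.append(n)
    return grown


# Toy 3x3x3 volume: soft tissue (0 HU) with an air-filled column (-1000 HU)
volume = [[[0 for _ in range(3)] for _ in range(3)] for _ in range(3)]
for z in range(3):
    volume[z][1][1] = -1000
volume[0][0][0] = -1000  # air voxel that is not connected to the seed

airway = region_grow(volume, seed=(0, 1, 1), max_hu=-950)
print(sorted(airway))
```

Only voxels reachable from the seed are kept, so the disconnected air voxel is excluded; this is the connected-components property that separates the airway tree from other low-density regions.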
During the pre-processing supervised training phase, multiple (n=976) 15×15×15-pixel volumes of interest were selected through independent analysis by four subspecialty thoracic radiologists from HRCT scans of 14 subjects with proven pathological diagnosis of diffuse pulmonary disease (interstitial lung disease (ILD) or emphysema) or known control subjects without ILD. The exemplar candidates with agreement on the class of abnormality by all four radiologists were used to determine canonical histogram signatures for each of the classes of visual abnormality with automatic cluster affinity techniques. The signatures of each of the visual classes were then used for the volumetric classification of the HRCT data of test subjects (fig. 2).
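The idea of classifying a local neighbourhood by comparing its histogram with canonical class signatures can be sketched as follows. This is a toy illustration under stated assumptions: the bin edges, the three signatures and the L1 distance are all invented for the example, and the text does not specify the similarity measure or the cluster affinity technique that CALIPER actually uses.

```python
def local_histogram(values, bin_edges):
    """Normalised histogram of HU values; `bin_edges` must be ascending."""
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def classify(values, signatures, bin_edges):
    """Label a neighbourhood with the class of the closest signature."""
    hist = local_histogram(values, bin_edges)
    return min(signatures, key=lambda name: l1_distance(hist, signatures[name]))


# Hypothetical bin edges (HU) and class signatures, for illustration only
BIN_EDGES = [-1024, -950, -850, -700, -300, 100]
SIGNATURES = {
    "emphysema":    [0.80, 0.20, 0.00, 0.00, 0.00],
    "normal":       [0.05, 0.70, 0.20, 0.05, 0.00],
    "ground glass": [0.00, 0.10, 0.30, 0.55, 0.05],
}

print(classify([-980] * 10, SIGNATURES, BIN_EDGES))
print(classify([-500] * 8 + [-900] * 2, SIGNATURES, BIN_EDGES))
```

In a sliding window scheme, `classify` would be applied to the neighbourhood around each voxel in turn. A histogram alone carries density but not shape information, which is one reason classes of similar density can be confused.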
Tissue quantification
CALIPER analysis involved algorithmic identification and volumetric quantification of five radiological parenchymal features: normal lung, emphysema, ground-glass density, reticular abnormalities and honeycombing, each measured in litres for the whole lung. Total ILD was defined as the volumetric sum of ground-glass density, reticular abnormalities and honeycombing. Percentage ILD was defined as total ILD divided by the CALIPER-segmented total lung parenchymal volume, expressed as a percentage.
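These two definitions, and the interval change between time-points, can be written out as a short sketch. The function names and the example volumes are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def total_ild(ground_glass, reticular, honeycombing):
    """Total ILD volume (litres): sum of the three ILD feature volumes."""
    return ground_glass + reticular + honeycombing


def percent_ild(ground_glass, reticular, honeycombing, total_lung_volume):
    """Total ILD as a percentage of the segmented total lung volume."""
    if total_lung_volume <= 0:
        raise ValueError("total lung volume must be positive")
    ild = total_ild(ground_glass, reticular, honeycombing)
    return 100.0 * ild / total_lung_volume


# Hypothetical volumes in litres at two time-points (not study data)
scan_1 = {"ground_glass": 0.4, "reticular": 0.3, "honeycombing": 0.1,
          "total_lung_volume": 4.0}
scan_2 = {"ground_glass": 0.5, "reticular": 0.5, "honeycombing": 0.1,
          "total_lung_volume": 3.8}

# Interval change = value at time-point 2 minus value at time-point 1
change_in_percent_ild = percent_ild(**scan_2) - percent_ild(**scan_1)
print(round(percent_ild(**scan_1), 1), round(change_in_percent_ild, 1))
```

Because percentage ILD depends on the total lung volume as well as on the ILD volume, it can rise between scans even when the absolute ILD volume is unchanged, if the lung volume falls.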
Statistical analysis
Associations between CALIPER measures, PFTs and radiologists’ estimation of percentage involvement were assessed using Spearman's correlation coefficient. Changes in CALIPER measures between the two CT scans of each subject were calculated for total volume of normal lung, ground-glass density, reticular abnormalities, honeycombing, total lung volume, total ILD and percentage ILD. Cox proportional hazards regression was used to assess the association with survival of change in CALIPER measures, change in radiologists’ assessments of regional semi-quantitative findings, and radiologists’ overall impression of progression, regression or stability of disease. A landmark survival analysis was performed with survival indexed from the date of the second CT, a method we chose in order to avoid survival bias conditioned on living to the second CT. Both univariable and multivariable analyses including sex, pack-years, baseline FVC % pred, baseline DLCO % pred and time between CTs were performed. In all cases, two-sided p-values <0.05 were considered statistically significant.
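The landmark indexing described above can be sketched as a small helper that builds the (time, event) pair used as input to a survival model. The function name, argument names and dates are hypothetical; the censoring rule at transplantation follows the description in the patient selection section, and the conversion of days to years with 365.25 is an assumption.

```python
from datetime import date


def landmark_record(second_ct, last_visit, death=None, transplant=None):
    """Return (years from second CT, event) for one patient.

    event = 1 if death was observed, 0 if the patient was censored
    (at lung transplantation or at the last known visit).
    """
    if transplant is not None and (death is None or transplant <= death):
        end, event = transplant, 0   # censored at transplant date
    elif death is not None:
        end, event = death, 1        # death observed
    else:
        end, event = last_visit, 0   # censored at last known visit
    if end < second_ct:
        raise ValueError("follow-up ends before the landmark (second CT)")
    return (end - second_ct).days / 365.25, event


# Hypothetical patients (dates invented for illustration)
died = landmark_record(date(2005, 1, 1), date(2008, 1, 1),
                       death=date(2007, 1, 1))
transplanted = landmark_record(date(2005, 1, 1), date(2009, 6, 1),
                               death=date(2009, 6, 1),
                               transplant=date(2006, 1, 1))
print(died, transplanted)
```

Starting the clock at the second CT means that the interval between scans, which every included patient necessarily survived, contributes no follow-up time to the analysis.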
Results
Patient and HRCT characteristics
55 patients were included in the analysis with a mean±sd age of 72.4±6.9 years and a sex distribution of 51% males (n=28) (table 1). 31 (56%) of the 55 patients had undergone surgical lung biopsy to confirm usual interstitial pneumonia pathology, with the remainder diagnosed on the basis of typical usual interstitial pneumonia pattern on HRCT and clinical exclusion of known causes of pulmonary fibrosis. The median transplant-free survival was 2.1 years (range 1.1–3.4 years) from the second HRCT time-point. For the 18 subjects not known to be deceased, the median follow-up time was 2.4 years (range 0.1–8.5 years). Mean±sd time between serial HRCT was 289.2±109 days. Although the study did not exclude subjects based on differences in the brand or model of scanner between the two scans, 75% of subjects had studies performed on the same brand of scanner at both time-points (Siemens and GE Healthcare).
PFTs
PFT values (absolute and % pred values) corresponding to the two CT scan time-points are presented in table 2 along with % pred differences over the interval.
CALIPER analysis
Mean volumetric quantifications by CALIPER, in litres, for each specific radiological parameter at time-points 1 and 2 are presented in table 3, along with the respective differences over the time interval. As CALIPER quantifies the total volumetric proportion of ground glass, reticulation and honeycombing in patients presenting with varying stages of IPF severity, mean total and percentage changes over time rather than single time-point volumes were used in survival analysis. Radiologist interpretation of radiological findings at time-points 1 and 2 is presented in table 4. Mean percentages represent scoring to the closest 5% for each region, averaged across 12 regions to give a percentage of that radiological finding in the whole lung. Results of multivariable analysis adjusted for sex, pack-years, baseline FVC % pred and baseline DLCO % pred are presented in table 5. On analysis of pulmonary function testing, a statistically significant association with survival was noted for changes in DLCO % pred (hazard ratio (HR) 2.14, 95% CI 1.27–3.60; p=0.004) and total lung capacity (HR 4.17, 95% CI 1.42–12; p=0.009). On adjusted analysis, changes over time in CALIPER-measured per cent ILD (HR 1.52, 95% CI 1.08–2.15; p=0.017), total ILD volume (HR 1.70, 95% CI 1.19–2.43; p=0.003) and total reticulation volume (HR 1.91, 95% CI 1.21–3.0; p=0.006) were associated with survival (table 5).
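As a reading aid, the relationship between a hazard ratio, its 95% confidence interval and the two-sided p-value can be checked with a short back-calculation. This sketch assumes a Wald-type interval that is symmetric on the log scale and works from the rounded figures quoted above, so it can only approximate the reported p-values; the function name is hypothetical.

```python
import math


def wald_p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value from an HR and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the CI
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value under a standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hazard ratios and confidence limits as quoted in the text
reported = {
    "per cent ILD": (1.52, 1.08, 2.15),
    "total ILD volume": (1.70, 1.19, 2.43),
    "total reticulation volume": (1.91, 1.21, 3.0),
}
for name, (hr, lower, upper) in reported.items():
    z, p = wald_p_from_hr_ci(hr, lower, upper)
    print(f"{name}: z={z:.2f}, p={p:.3f}")
```

Small differences from the published p-values are expected, because the hazard ratios and confidence limits are rounded to two or three significant figures.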
The correlation between total lung volume as measured by CALIPER and total lung capacity measured by PFT was very good (r=0.77 (p<0.001) and r=0.87 (p<0.001) for time-points 1 and 2, respectively).
Radiologist correlation with CALIPER and survival
Agreement between the two expert radiologists, and between each radiologist and CALIPER, is reported in the online supplementary tables S3 and S4. Overall, correlation between the two radiologists for ILD scoring (ground-glass opacities, reticular opacities and honeycombing) was moderate to substantial in all regions (range 0.33, p<0.001 to 0.73, p<0.001 for time-point 1, and 0.33, p<0.001 to 0.77, p<0.001 for time-point 2). Correlations were overall best for ground glass and reticular findings in all lobes and poorest for honeycombing. Correlation with CALIPER for ILD scoring was mild to moderate for both radiologist 1 (range 0.29, p=0.001 to 0.63, p<0.005 for time-point 1 and 0.24, p<0.001 to 0.64, p<0.001 for time-point 2) and radiologist 2 (range 0.29, p<0.001 to 0.48, p<0.001 for time-point 1 and 0.28, p<0.001 to 0.64, p<0.001 for time-point 2). After adjustment for smoking history (pack-years), sex, FVC and DLCO % pred, and time between HRCT scans, change in individual ILD parenchymal findings as assessed by radiologists was not predictive of survival (table 5). Radiologist interpretation of overall global change in terms of progression of ILD was predictive of survival.
Discussion
CALIPER represents an automated volumetric quantification tool for assessing specific parenchymal radiological features on HRCT. In our study of IPF patients, CALIPER-measured short-term (3–15 months) changes in reticular volume, percentage ILD and total ILD were predictive of survival. Correlation between radiologists was moderate to substantial regarding ground glass and reticular findings when estimating to the nearest 5%, and mild to moderate between radiologists and CALIPER for those same ILD findings. No specific parenchymal estimates of ILD were predictive of survival with radiologist assessment, although overall global assessment of disease progression or change by radiologists was predictive in our study.
The absence of a gold standard to validate accuracy of CALIPER volumetric measurements to specific regions in a pathological specimen led us to attempt to correlate longitudinal changes in quantitative estimates of radiological fibrosis to patient outcomes, specifically mortality. As the quantitative assessment of fibrosis by CALIPER is not influenced by confounding conditions such as emphysema or pulmonary hypertension that may affect pulmonary function measures, we felt radiological features may reflect more directly the progression of fibrotic processes and could represent a novel and promising tool in the assessment and management of patients with IPF.
The mild-to-moderate correlation between CALIPER measurements and estimates of extent of fibrosis by two expert radiologists in interstitial lung diseases is reassuring and suggests that CALIPER may indeed allow both accurate and reproducible assessments, though CALIPER’s quantitative measurements of specific parenchymal ILD features may prove more advantageous when subtle changes are not readily recognised between serial scans. Estimation using a scoring system to the nearest 5% allowed for moderate-to-substantial correlation between two experienced radiologists in our study, but only had mild-to-moderate correlation with CALIPER. We note that while algorithmic quantitative analysis by CALIPER may detect subtle abnormalities that radiologists may not, CALIPER may misclassify abnormalities with similar density but different morphology (such as honeycombing versus emphysema), or its classification may differ slightly from the radiologist assessment (ground glass versus reticular abnormality), which may explain the lower correlation.
Interestingly, radiologists' global assessment of short-term ILD progression was also predictive of survival in our study. We know, in general, that radiologists' quantitative assessments and diagnostic interpretation are often inconsistent with one another, although our degree of radiologist correlation was higher. Perhaps this may be explained by our reviewing radiologists being subspecialists in thoracic radiology at the same institution, and both having reviewed and standardised cases and terminology as training for other ILD research studies. Nonblinding to time-points 1 and 2 may have increased radiologist vigilance for expected change between interval scans and biased interpretation of disease progression. Nonetheless, it is gratifying that progression detected by experienced radiologists over shorter time intervals was statistically predictive of survival, although detection of such subtle changes may not be reproducible or consistent across all practices and institutions. A quantitative method such as CALIPER may provide this consistency, particularly as detection of subtle progression over shorter time intervals may be valuable in estimating survival and perhaps be used as a marker of treatment response in future clinical trials. We are encouraged that correlation of our reproducible quantitative assessments of disease with outcomes is significant, independent of correlation with the subjective descriptions of disease features by a radiologist.
Prognostication of IPF remains challenging due to the paucity of validated surrogate markers of disease progression and an unpredictable natural history. The use of physiological data as surrogate markers has significant limitations. First, physiological measures, such as FVC and DLCO, are characterised by significant intra-individual variability. In fact, the threshold of 10% decline in FVC is below the degree of normal intra-individual variability, as suggested by recent guidelines [15]. Secondly, physiological measures provide only indirect estimates of the progression of fibrosis, and may be affected by coexisting emphysema and/or pulmonary hypertension that are frequently associated with IPF [7, 8]. Lastly, PFTs may not be sensitive enough to detect subclinical progression of fibrosis: FVC decline as defined by current thresholds is a relatively rare event in IPF clinical trials, which has led some authors to suggest that marginal declines in FVC may be more sensitive, though less specific [16].
A number of other studies suggest that the extent of fibrosis assessed semiquantitatively on HRCT is a strong predictor of outcomes in IPF [1, 4, 17]. Correlation of quantitative estimates of pulmonary fibrosis by automated analysis to mortality has been the object of few reports [1, 18, 19]. This line of research has been hampered by the lack of validation methodology and poor access to high-quality HRCT allowing for volumetric assessment of the lung parenchyma. Most of the available data on quantitative analysis of lung fibrosis have focused on the predictive power of baseline HRCT abnormalities.
The use of longitudinal quantitative indices of lung fibrosis as identified on HRCT could represent an appealing alternative to physiological measures. An accurate and reproducible method allowing monitoring of fibrosis progression on HRCT would be a valuable surrogate marker of disease. Unfortunately, assessment of fibrosis volumes by expert radiologists has been hampered by substantial intra- and interobserver variability, and quantitative CT indices using fractal analysis and global histogram-based methods have not been validated or found helpful in clinical practice thus far [1, 4, 19–24]. CALIPER is based on a texture-sensitive volumetric analysis that allows automated classification of lung parenchyma according to a database of HRCT volumes of interest validated by radiologists using data from the LTRC [9]. The majority of existing expert systems and associated quantitative tools depend on strictly controlled image acquisition protocols to provide consistent results. We believe that careful selection of training sets enables more reproducible classification by CALIPER, and that its greyscale local histogram-based algorithms are less affected by image noise, reconstruction kernel and other scan parameters.
While our preliminary results are promising, we recognise the limitations of our study. First, the CALIPER technology was developed based on “lung pattern signatures” derived from standardised CT analyses and acquisition protocols used in the LTRC database [9]. Scan parameters used for CT in this retrospective study were different to those used in the LTRC database and were not always identical for the scans at the two time-points. Despite differences in acquisition parameters for some of the data sets, the mild-to-moderate correlation between radiologist semiquantitative assessment and CALIPER analysis supports the validity of our results. We postulate that our histogram signature-based method may be more robust and less sensitive to specific slice thickness or image reconstruction parameters than other texture-based or pixel counting techniques. Secondly, the requirement of two serial HRCTs obtained solely for follow-up purposes led us to exclude patients with HRCT obtained for coexisting illnesses (heart failure, infection and acute exacerbation) and others with only one HRCT available. These exclusion criteria arguably limit the external validity of our study, as included patients were more likely to represent a subset of IPF patients with gradual decline rather than stable patients (less likely to have repeat HRCT) or unstable patients (more likely to be lost to follow-up, die or experience acute exacerbation). However, the fact that the median survival of included patients was similar to that typically seen for patients with IPF is reassuring in this regard. Finally, the number of patients included was small, due to stringent eligibility criteria. While small, the numbers are comparable to those used in prior studies evaluating the value of longitudinal trends in physiological measures.
We recognise these limitations and believe that further validation of our preliminary results is warranted, including prospective analysis of standardised HRCT with equivalent time intervals, comparison of IPF abnormalities to those of other ILD and application of short-term changes found with CALIPER to predicting acute exacerbation or the presence of related complications such as pulmonary hypertension.
In conclusion, we have shown that CALIPER characterisation and quantification of lung parenchyma on HRCT correlates with visual assessment by expert radiologists, and that quantitative short-term volumetric longitudinal changes on serial HRCT correlate with IPF mortality. We believe that quantitative features of ILD on HRCT determined by CALIPER may represent an accurate and reproducible biomarker for IPF that warrants further validation studies and application.
Footnotes
This article has supplementary material available from www.erj.ersjournals.com
Support statement: This study was funded by the Brewer Award for Research in Idiopathic Pulmonary Fibrosis and Other Interstitial Lung Diseases. This publication was supported by NIH/NCRR CTSA grant number UL1 RR024150. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
Conflict of interest: Disclosures can be found alongside the online version of this article at www.erj.ersjournals.com
- Received May 5, 2012.
- Accepted February 19, 2013.
- ©ERS 2014