Abstract
Deep learning software can provide decision support to radiologists. New evidence shows that these tools are almost ready for implementation in clinical practice. https://bit.ly/3vk4h5t
Artificial intelligence (AI) presents an attractive opportunity for providing decision support to radiologists, who are often overburdened by the ever-increasing number of radiographs that are requested each year [1]. Interpretation errors, reporting delays and backlogs, particularly of chest radiographs (CXR), continue to be a major problem faced by busy radiology departments.
Deep learning is a branch of AI that shows particular promise, being proficient at identifying patterns in large quantities of data and mapping these patterns to simple categories, such as diagnosis, without the need for human programming [2]. This technology is particularly suited to medical imaging analysis and, given the increasing workload faced by radiologists, may play an invaluable role in providing instantaneous decision support and reducing perceptual errors and reporting delays. The autonomous learning capabilities of deep learning algorithms also create an opportunity for developing novel image-based biomarkers that are not easily detected visually [3].
Several AI applications for CXR diagnosis have been developed for specific tasks, such as the detection of lung nodules [4], tuberculosis [5] or pneumothorax [6], and have demonstrated radiologist-level accuracy. However, an obstacle to implementing these tools in clinical practice is that they are designed to identify only one specific pathology and have been validated in retrospective, in silico settings. Also, although there has been some attempt to develop systems capable of identifying multiple different CXR abnormalities [7–9], many lack prospective validation in real-world clinical practice. Furthermore, it is unclear how easily deep learning-based applications will integrate into a radiologist's clinical workflow and what impact they will have on efficiency. These data will be critically important if deep learning technology is to be routinely adopted.
In this issue of the European Respiratory Journal, Nam et al. [10] present a deep learning algorithm trained to identify 10 abnormalities on CXR. The abnormalities selected (pneumothorax, mediastinal widening, pneumoperitoneum, nodule/mass, consolidation, pleural effusion, linear atelectasis, fibrosis, calcification and cardiomegaly) include some of the more significant findings routinely encountered in clinical practice. The software was tested on an internal and external cohort of cases and showed excellent detection accuracy, with an area under the receiver operating characteristic curve of 0.893–1.0, depending on the patient cohort and abnormality. The algorithm also demonstrated higher sensitivity than radiologists, albeit with lower specificity. This is particularly attractive for radiologists facing increased workloads, where fatigue and stress can lead to search and recognition errors during reporting [11]. By ensuring a low false-negative rate, important pathology is not missed, while false-positives can be reviewed and discarded by the radiologist.
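To make the trade-off concrete, a high-sensitivity operating point accepts more false-positives in exchange for fewer missed abnormalities. A minimal sketch, using invented counts (not figures from Nam et al.):

```python
# Hypothetical confusion-matrix counts for a high-sensitivity operating point.
# tp/fn/tn/fp values below are illustrative only.

def sensitivity(tp, fn):
    """True-positive rate: fraction of abnormal CXRs correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: fraction of normal CXRs correctly cleared."""
    return tn / (tn + fp)

# Few misses (fn), at the cost of more false alarms (fp) for the
# radiologist to review and discard.
sens = sensitivity(tp=95, fn=5)    # 0.95
spec = specificity(tn=80, fp=20)   # 0.80
```

Lowering the decision threshold moves the operating point further in the same direction: sensitivity rises, specificity falls.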
Importantly, the authors tested the algorithm's applicability in clinical practice by integrating it into the picture archiving and communication system (PACS). The algorithm boosted the detection performance of the radiologists and helped in identifying those CXRs needing more urgent attention. Using the software as a prioritisation tool allowed a significant reduction in the time-to-report of critical cases by sorting radiographs based on clinical urgency.
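In principle, such prioritisation amounts to re-ordering the reporting worklist by a model-derived urgency score, so that critical studies surface first. A hypothetical sketch (the study IDs and scores are invented, not drawn from the PACS integration described above):

```python
# Hypothetical PACS worklist re-ordered by an AI urgency score.
# Study identifiers and scores are invented for illustration.

worklist = [
    {"study": "CXR-001", "urgency": 0.12},  # likely normal
    {"study": "CXR-002", "urgency": 0.97},  # suspected critical finding
    {"study": "CXR-003", "urgency": 0.55},
]

# Critical cases float to the top of the reporting queue.
prioritised = sorted(worklist, key=lambda s: s["urgency"], reverse=True)
order = [s["study"] for s in prioritised]
# order == ["CXR-002", "CXR-003", "CXR-001"]
```

The time-to-report gain comes entirely from this re-ordering: the radiologist's total workload is unchanged, but urgent studies are read first.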
As with any deep learning software development, the training methodology is critical to the algorithm's performance. Although the quantity of data available for training is often a limiting factor, how accurately training data are labelled is equally important. In the study reported by Nam et al. [10], the software was trained with more than 146 000 radiographs, which were reviewed and labelled by board-certified radiologists. One concern with the human labelling of diagnostic images is the inevitable bias that is introduced because of the subjective nature of the evaluation; visual assessment of medical images is notoriously susceptible to interobserver disagreement [12]. This problem may be amplified when the diagnostic reference standard is not well-defined [13]. Nam et al. [10] attempted to mitigate this difficulty by labelling CXRs, where available, based on same-day computed tomography (CT) findings. Only a proportion of CXRs had same-day CT available, so one might expect that the algorithm performance could have been improved if more CT data had been available. This highlights a pervasive problem when developing deep learning applications for medical imaging analysis; most institutions do not have access to the requisite quantity of high-quality data for algorithm training. Several international initiatives are underway to address this challenge by creating large, diverse multi-institution repositories of imaging data to support deep learning research. Such a resource could also serve as a common dataset for benchmarking algorithms as well as comparing the performance of different applications [3].
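Interobserver disagreement of this kind is conventionally quantified with chance-corrected agreement statistics such as Cohen's kappa. A minimal sketch, using two invented sets of reader labels (1 = abnormal, 0 = normal):

```python
# Cohen's kappa for two readers' binary CXR labels.
# Both label lists are hypothetical, for illustration only.

def cohens_kappa(a, b):
    n = len(a)
    labels = set(a) | set(b)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance
    return (po - pe) / (1 - pe)

rater1 = [1, 1, 0, 1, 0, 0, 1, 0]
rater2 = [1, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(rater1, rater2)   # 0.5: moderate agreement
```

Low kappa between labellers propagates directly into noisy training labels, which is precisely why an objective reference standard such as same-day CT is preferable where it exists.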
Another major obstacle to implementing this technology in clinical practice is that the complexity of deep learning algorithms limits their interpretability [14]. This issue is exacerbated when an algorithm is basing its predictions on features that human observers cannot detect. In recent years, efforts have been made to improve algorithm interpretability; attribution techniques, such as saliency mapping, can help to identify which features have the most influence on algorithmic decision making. Nam et al. [10] used probability maps to localise the regions representing the target abnormality on the CXR. This feature allows the radiologist to be more confident in confirming or rejecting a diagnosis made by the software.
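One intuitive attribution technique is occlusion sensitivity: systematically blank out patches of the image and record how much the model's output score drops; large drops mark influential regions. The sketch below uses a toy stand-in for a classifier (not the method of Nam et al.) purely to illustrate the idea:

```python
import numpy as np

# Occlusion-based saliency sketch. `model_score` is an invented toy
# "classifier" that responds to brightness in the upper-left quadrant.

def model_score(img):
    return img[:8, :8].mean()

def occlusion_map(img, score_fn, patch=4):
    """Score drop when each patch is blanked; bigger drop = more influential."""
    base = score_fn(img)
    h, w = img.shape
    sal = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = img.copy()
            occluded[i:i+patch, j:j+patch] = 0.0  # blank one patch
            sal[i // patch, j // patch] = base - score_fn(occluded)
    return sal

img = np.zeros((16, 16))
img[:8, :8] = 1.0                 # simulated "abnormality" in upper-left
sal = occlusion_map(img, model_score)
# The influential patches cluster in the upper-left quadrant.
```

Overlaying such a map on the radiograph gives the radiologist a visual check that the algorithm is attending to the suspected abnormality rather than to an artefact.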
Introducing deep learning software to the daily workflow can dramatically impact radiology reporting, just as the introduction of PACS did many years ago. However, this will take time; radiologists will need to become familiar with, and fully understand, these tools before they can be implemented in routine practice [7]. Lastly, prospective clinical utility studies are needed to test algorithm performance in real-world clinical settings and demonstrate patient benefit over current best practice. The study by Nam et al. [10] represents a significant step in the right direction.
Footnotes
Conflict of interest: L. Calandriello declares honoraria from Boehringer Ingelheim and Roche.
Conflict of interest: S.L.F. Walsh declares a fellowship and honoraria from the National Institute for Health Research; consultancy fees and honoraria from Boehringer Ingelheim, Sanofi-Genzyme, Galapagos, Roche, Bracco, Fluidda, the Open Source Imaging Consortium, Oncoarendi Therapeutics and Medscape; and advisory board membership for Boehringer Ingelheim, Sanofi-Genzyme, Galapagos and Roche.
- Received March 1, 2021.
- Accepted March 2, 2021.
- Copyright ©The authors 2021. For reproduction rights and permissions contact permissions{at}ersnet.org