Abstract
Artificial intelligence holds promise as an aid to interpreting complex medical data, such as pulmonary function tests. Transparency and reproducibility are just as important as diagnostic accuracy for the success of this technology. http://bit.ly/2vYjZ8w
To the Editor:
We read with interest the recent article by Topalovic et al. [1], in which they showed that artificial intelligence (AI) could interpret pulmonary function tests (PFTs) and reach a diagnosis with accuracy greater than that of individual pulmonologists, and approximately equal to that of an expert panel. Their work raises a number of important issues that should be explored further.
First, a number of caveats to their work should be acknowledged. PFTs, like any other investigation, can only be interpreted in the light of the pre-test probability of the diagnoses under consideration. The probabilities of different diagnoses suggested by the AI algorithm would have been heavily influenced by their prevalence within the training dataset. This may not have been fully representative of a population with undifferentiated symptoms encountered in clinical practice, since it was enriched with less common diagnoses such as neuromuscular disease and pulmonary vascular disease. Furthermore, the authors do not describe in detail how the 50 cases within the test dataset were chosen: were they truly random or to a certain extent hand-picked to ensure a broad mix of diagnoses? The AI algorithm would have been advantaged if the diagnostic prevalence in the test dataset broadly matched that of the training dataset.
The authors show that the AI algorithm could match an expert panel with respect to diagnostic ability, but that individual pulmonologists fell far short of this. While it may be over-optimistic to expect every individual pulmonologist to reach the level of an expert panel, the results do suggest that there is room for improvement in the training of pulmonologists in PFT interpretation. Patout et al. [2] recently reported that only 16% of French pulmonology trainees had attended a placement in a PFT laboratory, but that those who had performed significantly better on a written test of PFT knowledge. We consider that all pulmonology trainees should attend an accredited course on PFT interpretation and undertake a short attachment with a regional PFT laboratory.
Much of the focus of AI research has hitherto been on training AI algorithms to learn patterns from data in order to emulate or surpass human abilities. Often this has involved training the AI using human expert judgement as the gold standard (AI learning from humans). However, relatively little attention has been paid to the transfer of knowledge from AI back to humans. The AI algorithm developed by Topalovic et al. [1] presents the opportunity to do exactly that. It would be relatively straightforward to develop an interface for educational use that would allow users to alter the various parameters included in the model and observe how the probabilities of the different diagnoses change accordingly. Moreover, it would be worthwhile for experts to explore the model's multidimensional parameter space by hand, in order to discern whether the algorithm has uncovered completely novel associations between PFT patterns and disease, or simply recapitulated the multitude of patterns which are already known to experts. As AI becomes increasingly embedded into medical practice it is important that the diagnoses and treatment decisions emerging from computer algorithms are presented together with a rationale or explanation that is understandable to humans. This will allow clinicians to sense-check the computer output, as well as ensure continued two-way learning between artificial and human intelligence.
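Purely as an illustration of the kind of educational interface we have in mind, the sketch below varies one PFT parameter while holding the others fixed and reports how the predicted diagnostic probabilities shift. The model itself is an invented toy (the feature names, weights, biases and three candidate diagnoses are our own placeholders, not the published algorithm of Topalovic et al.):

```python
import numpy as np

# Toy softmax classifier over three illustrative PFT-derived features.
# All weights and biases below are invented for demonstration only.
FEATURES = ["fev1_fvc_ratio", "tlc_pct_pred", "dlco_pct_pred"]
DIAGNOSES = ["COPD", "ILD", "normal"]
WEIGHTS = np.array([
    [-20.0,   0.0,  -3.0],   # COPD: driven down by a preserved FEV1/FVC ratio
    [  5.0, -15.0, -10.0],   # ILD: driven down by preserved TLC and DLCO
    [ 15.0,   3.0,   3.0],   # normal: favoured when all indices are preserved
])
BIASES = np.array([12.0, 5.0, -12.0])

def predict(x):
    """Return softmax probabilities over DIAGNOSES for feature vector x
    (features expressed as fractions of predicted, roughly 0-1)."""
    scores = WEIGHTS @ x + BIASES
    p = np.exp(scores - scores.max())   # subtract max for numerical stability
    return p / p.sum()

def explore(x, feature, values):
    """Vary one feature, holding the others fixed, and print how the
    diagnostic probabilities respond -- a one-dimensional slice through
    the model's parameter space."""
    i = FEATURES.index(feature)
    for v in values:
        x2 = x.copy()
        x2[i] = v
        probs = predict(x2)
        print(f"{feature}={v:.2f}: " +
              ", ".join(f"{d}={p:.2f}" for d, p in zip(DIAGNOSES, probs)))

baseline = np.array([0.75, 1.0, 0.9])   # near-normal PFT profile
explore(baseline, "fev1_fvc_ratio", [0.4, 0.6, 0.8])
```

Even in this caricature, sliding the FEV1/FVC ratio downwards while TLC and DLCO remain preserved shifts probability mass from "normal" towards "COPD", which is exactly the kind of behaviour a trainee could probe interactively.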
We encourage the authors to make the detailed structure and parameters of their model publicly available to allow others to investigate its properties. We consider that manuscripts reporting the results of AI research should specify their models in detail within a supplementary appendix in order to allow others to replicate their research. Open source code should also be made available within an online repository. As the field matures it is likely that standard checklists will be developed by the scientific publishing community to ensure that all relevant aspects of AI research are reported, particularly in clinical and other domain-specific journals.
Footnotes
Conflict of interest: S. Gonem reports personal fees from 3M and Anaxsys, outside the submitted work.
Conflict of interest: S. Siddiqui reports grants from Sir Jules Thorne Trust, NIHR-EME, Anaptys Bio, MRC/EPSRC and Napp, personal fees from AstraZeneca, GSK, Novartis, Boehringer Ingelheim, Mundipharma and Owlstone Medical, outside the submitted work.
Support statement: This work was supported by the National Institute for Health Research (NIHR) Leicester Respiratory Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the National Health Service, the NIHR or the Department of Health. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received March 29, 2019.
- Accepted April 11, 2019.
- Copyright ©ERS 2019