Abstract
Computer-aided reading (CAR) of medical images is becoming increasingly common, but few studies exist for CAR in tuberculosis (TB). We designed a prospective study evaluating CAR for chest radiography (CXR) as a triage tool before Xpert MTB/RIF (Xpert).
Consecutively enrolled adults in Dhaka, Bangladesh, with TB symptoms received CXR and Xpert. Each image was scored by CAR and graded by a radiologist. We compared the sensitivity and specificity of CAR with those of the radiologist, calculated the area under the receiver operating characteristic curve (AUC) for CAR, and estimated the number of Xpert tests that could be saved.
A total of 18 036 individuals were enrolled. TB prevalence by Xpert was 15%. The radiologist graded 49% of CXRs as abnormal, resulting in 91% sensitivity and 58% specificity. At a similar sensitivity, CAR had a lower specificity (41%), saving fewer (36%) Xpert tests. The AUC for CAR was 0.74 (95% CI 0.73–0.75). CAR performance declined with increasing age. The radiologist grading was superior across all sub-analyses.
Using CAR can save Xpert tests, but the radiologist's specificity was superior. Differentiated CAR thresholds may be required for different populations. Access to, and costs of, human readers must be considered when deciding to use CAR software. More studies are needed to evaluate CAR using different screening approaches.
Automated CXR reading reduces expensive TB test use; differentiated scoring may be needed and costs must be reduced http://ow.ly/jPC8309Rx3F
Introduction
Modelling and research suggest that chest radiography (CXR) can be a good screening tool to identify people in need of further testing to diagnose tuberculosis (TB), especially when the follow-on diagnostic test is expensive [1, 2]. However, although diagnostic testing for TB in high burden settings is generally free, CXR is almost always paid for by the care seeker and can be quite costly [3, 4]. In many settings there is also a dearth of trained staff who can quickly read CXR images, and there is a high degree of inter-reader variability, resulting in both over- and under-diagnosis of TB [5, 6]. These limitations have generated interest in scoring systems for chest radiography [7] as well as a growing awareness of, interest in, and use of computer-aided reading (CAR) software.
CAR software can quickly evaluate digital radiographs to recognise abnormalities in the lung fields and identify people in need of further diagnostic testing. CAR software is already commonplace in mammography services [8], but CAR for identifying TB abnormalities on chest radiographs is still in the development stage. All of the published literature for CAR for TB has evaluated CAD4TB [9], developed by the Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, the Netherlands. CAD4TB is currently the only commercially available CAR software for TB. Evaluations of CAD4TB in high TB burden countries have been conducted in Africa [10–14] in collaboration with the software's developers. Briefly, CAD4TB identifies shape and textural abnormalities in CXR images to produce an abnormality score ranging from 0 (normal) to 100 (highly abnormal). A cut-off score can be selected, above which people should receive a diagnostic TB test. The optimal cut-off score may depend on the prevalence of abnormal lung images in the screened population and may vary by setting. Given the lack of evidence around its use, no formal guidance from the World Health Organization (WHO) is available on which abnormality score to use as the cut-off. The literature evaluating CAD4TB demonstrates that the software's sensitivity has improved with each version and that its performance is similar to that of human readers [15].
Smear microscopy is overwhelmingly used as the diagnostic test in most high TB burden countries and is inexpensive [16], but it is not nearly as sensitive as the Xpert MTB/RIF (Xpert) assay [17, 18]. WHO currently recommends Xpert as an initial diagnostic test for TB “where resources permit” [19]. This condition on the recommendation highlights the usefulness of a sensitive triage test (i.e. CXR) to reduce costs by minimising the number of expensive Xpert tests performed [20, 21]. The Xpert assay has a concessional price of USD9.98 for public-sector providers, with shipping and importation costs increasing the cost [22]. There has been much global interest around Xpert [23], with huge increases in procurement of the assay [22].
In many Asian settings, especially in urban areas, most people choose to first access the private healthcare market because it is more convenient and/or is perceived as having a higher standard of care than public facilities [24, 25]. However, private providers are not eligible for concessional pricing, and higher procurement costs are passed on to patients, making the test unaffordable for many people with TB in high burden countries [26]. As a result, most countries have not engaged the private sector for Xpert use [27]. Referral systems for free Xpert testing may be an attractive way to link public- and private-sector providers.
To address the lack of evidence around CAD4TB in Asian settings, and the high use of private-sector care in Asia, we designed a prospective study to link patients visiting private providers and public facilities to Xpert testing and evaluated the performance of CAD4TB software as a triage tool in Dhaka, Bangladesh.
Methods
Study sites
With support from the Stop TB Partnership's TB REACH initiative, three screening centres were established across Dhaka. The screening centres operate as a social enterprise, where any funds generated through the sale of laboratory tests are then used to fund programmatic TB care. A Delft EZ DR X-ray system linked with CAD4TB software version 3.07 was installed in each screening centre, as well as multiple four-module GeneXpert systems. A custom-built mHealth application connected to a medical record system (OpenMRS) systematically captured demographic and symptom data from all individuals visiting the screening centres. A referral network of more than 2000 private providers was established to refer clients for testing services. Any private provider could refer patients to a screening centre for testing based on clinical suspicion. In addition, 133 facilities linked to the National TB Control Program (NTP) were engaged to encourage referral of smear-negative individuals for Xpert testing. The screening centres were also open to walk-in clients with no referral history.
Screening, diagnostic and treatment algorithm
Adults (≥15 years) presenting at any of the screening centres were verbally screened for a cough of more than 2 weeks' duration, as both public- and private-sector referrals were made independently of the study based on the clinical decisions of the referring physicians. Symptomatic individuals paid approximately USD6.40 (BDT 500) for a CXR, which was immediately analysed using CAD4TB software. The cost was on the lower end of the digital CXR market in Dhaka, and Xpert testing and TB care were provided without additional charges. Regardless of the CAD4TB abnormality score, all symptomatic individuals were asked to submit a sputum sample and received a free Xpert test. If the Xpert test failed (invalid, error or no result), it was performed again to obtain a valid outcome if enough sample remained. A Bangladeshi, board-certified radiologist with 10 years of experience read all CXR images offsite at a cost of USD0.50 per image. The radiologist was blinded to both the CAD4TB score and the Xpert result. CXRs were graded as highly TB suggestive, possible signs of TB, non-TB abnormality, and normal. In addition, the radiologist provided a standard radiology report to guide clinical care. Those unable or unwilling to pay for the CXR were referred to a nearby public-sector facility.
People with positive Xpert results were actively referred to the NTP-linked health facility nearest to their home, but were also allowed to receive anti-TB treatment from their private referring physician. If a TB patient opted to receive care with his/her referring physician, treatment initiation and outcomes were reported to the NTP by the study. Individuals with rifampicin-resistant TB were referred to the National Institute of Diseases of the Chest and Hospital for additional testing, clinical evaluation and second-line treatment initiation.
All enrolled participants provided informed written consent. The study protocol was reviewed and approved by the research review committee and the ethical review committee at the International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b).
Data analysis
De-identified data were abstracted from OpenMRS and analysed using R version 3.2.4 (R Foundation for Statistical Computing, Vienna, Austria). The radiologist's grades were dichotomised, with the “highly TB suggestive” and “possible signs of TB” grades considered “abnormal”. Basic demographic data, the yields from Xpert testing, and the grading of CXRs by the radiologist and CAD4TB software were compared across age groups and referral groups for people with valid Xpert results. We calculated the sensitivity, specificity, and positive and negative predictive values (PPV and NPV) of the radiologist's grading and different CAD4TB abnormality scores using Xpert as the bacteriological standard. Receiver operating characteristic (ROC) graphs were generated using the pROC package [28]. We calculated the theoretical numbers of Xpert tests saved by using CXR as a triage tool at different CAD4TB cut-off values.
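The accuracy measures described above can be sketched in a few lines. The study's analysis was done in R; the Python below is only an illustration, not the authors' code. The function names and the six toy records are hypothetical, and the grade labels follow the wording used in the Methods.

```python
ABNORMAL_GRADES = {"highly TB suggestive", "possible signs of TB"}


def dichotomise(grade):
    """Collapse the radiologist's four grades into abnormal (True) or not (False)."""
    return grade in ABNORMAL_GRADES


def confusion_counts(test_abnormal, xpert_positive):
    """Return (tp, fp, fn, tn) for a triage test against Xpert as the reference."""
    tp = fp = fn = tn = 0
    for abnormal, positive in zip(test_abnormal, xpert_positive):
        if abnormal and positive:
            tp += 1
        elif abnormal and not positive:
            fp += 1
        elif not abnormal and positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy records, NOT study data.
grades = [
    "highly TB suggestive",
    "possible signs of TB",
    "non-TB abnormality",
    "normal",
    "possible signs of TB",
    "normal",
]
xpert_positive = [True, True, False, False, False, True]

counts = confusion_counts([dichotomise(g) for g in grades], xpert_positive)
measures = accuracy_measures(*counts)
print(counts, measures)
```

The same two functions apply to CAD4TB once a cut-off has been chosen, by passing a list of booleans that flag scores at or above the cut-off.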
Results
Between May 2014 and February 2016, 18 746 people visited the screening centres and were verbally screened. Only 53 individuals declined to pay for the CXR. Of the remaining, 18 036 symptomatic individuals aged 15 years and over were identified, and 17 134 (95%) of them were able to provide a sputum specimen for testing with Xpert. Among the people tested, 17 066 (99.6%) had a valid Xpert result and were included in the analysis.
The final sample comprised 11 368 (67%) men and 5698 (33%) women. The greatest proportion of test results came from people who were referred by their private provider (n=12 340, 72%), while other referral groups accounted for smaller and similar proportions (table 1). Just over half of participants were aged 26–54 years, and 26% (n=4485) of the participants were aged 55 years or older.
Table 1. Demographic and clinical characteristics of people screened with chest radiography (CXR) and Xpert MTB/RIF
Overall, 2623 people (15%) had Mycobacterium tuberculosis-positive (MTB+) results. The yield was lowest among walk-ins (8% versus 17% for the other referral groups). Young adults (≤25 years) had the highest rates of TB (22%), while middle age and older groups had yields of 14% and 12%, respectively. Among people with MTB+ results, 139 (5%) were resistant to rifampicin. The radiologist graded 8428 (49%) CXRs as abnormal, overall. The radiologist's CXR abnormality rate was lowest among walk-ins compared to the NTP facility and private provider referrals (27% versus 48% and 54%, respectively). The radiologist also graded a greater proportion of CXRs as abnormal among participants 55 years and older (61% versus 45%). For all participants, the median (interquartile range) CAD4TB abnormality score was 79 (48–95). Walk-ins had the lowest median score (55, interquartile range 37–82). Those 55 years and older had a median CAD4TB score of 93, whereas it was 65 among young adults.
If CXR had been used as a triage tool before Xpert testing, the radiologist's binary grading would have had a sensitivity and specificity of 91% and 58% overall (table 2). The sensitivity of the radiologist was around 90% for all groups, while the specificity ranged from 53% (private providers) to 78% (walk-ins). Among older participants, specificity was 43% and among those aged 25 years or younger it was 70%. The overall PPV of the radiologist was 28%, and the NPV was 97%. Using CAD4TB to approximate the radiologist's sensitivity for the whole sample, a threshold score of 63 would have been chosen (table 2), and it would have varied from 56 to 82 for the different sub-groups. However, CAD4TB specificity was inferior to the radiologist at each threshold score (for the whole sample it was 41% versus 58%). Specificity of CAD4TB was below the radiologist for all referral types and age groups when holding sensitivity similar. The trade-offs between sensitivity and specificity across the different referral and age groups can be visualised in the ROC graphs in figures 1 and 2. The graphs show that, regardless of the CAD4TB cut-off value, the radiologist performed better across all groups. For CAD4TB, the area under the curve (AUC) across all referral groups was 0.74 (95% CI 0.73–0.75). The CAD4TB software performed significantly better among walk-ins (AUC 0.84, 95% CI 0.81–0.87) compared to people referred from NTP facilities (AUC 0.77, 95% CI 0.74–0.79) and private providers (AUC 0.72, 95% CI 0.70–0.73). The software also performed significantly better among young adults (AUC 0.81, 95% CI 0.80–0.83) than the older age group (AUC 0.66, 95% CI 0.65–0.69). AUC by gender was not significantly different (data not shown).
Table 2. Performance and comparison of radiologist and computer-aided reading for tuberculosis diagnosis
Figure 1. Receiver operating characteristic analysis for the detection of Mycobacterium tuberculosis stratified by referral type.
Figure 2. Receiver operating characteristic analysis for the detection of Mycobacterium tuberculosis stratified by age group.
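For readers less familiar with the AUC, it can be read as the probability that a randomly chosen Xpert-positive person has a higher abnormality score than a randomly chosen Xpert-negative person, with ties counted as one half. The sketch below computes it that way on hypothetical toy scores. It is not the pROC routine used in the study, it does not reproduce the confidence intervals, and the function name is an assumption made for illustration.

```python
def auc_by_pairwise_comparison(scores, xpert_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, xpert_positive) if y]
    negatives = [s for s, y in zip(scores, xpert_positive) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one Xpert-positive and one Xpert-negative score")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical abnormality scores (0-100) and Xpert results, NOT study data.
toy_scores = [95, 80, 63, 63, 40, 20]
toy_xpert = [True, True, False, True, False, False]

toy_auc = auc_by_pairwise_comparison(toy_scores, toy_xpert)
print(f"AUC = {toy_auc:.3f}")
```

The double loop compares every positive with every negative, which is fine for illustration; a rank-based implementation would be preferable for a sample of the size reported here.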
Table 3 presents the results of the analysis of potential Xpert tests saved using different algorithms. Compared to testing all symptomatic individuals with Xpert, using CXR with the human reader could have saved 51% of Xpert tests, with a total loss of 9% (n=249) of the MTB+ cases. To approximate the sensitivity of the radiologist, a CAD4TB threshold score of 63 would be used and it would have saved 36% of the Xpert tests. To save 50% of Xpert tests using CAD4TB, a higher threshold score of 80 would be needed, resulting in 18% (n=473) of MTB+ individuals being missed. Tables of performance of CAD4TB at each threshold and for each referral and age group are presented in the supplementary files.
Table 3. Performance of different algorithms for Xpert tests saved and cases missed in Bangladesh
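As a consistency check, the radiologist's 2x2 table can be rebuilt from counts given in the Results, and the reported percentages recomputed. The sketch below does this by subtraction. It rests on one assumption, flagged in the comments: that the 249 missed MTB+ cases are exactly the Xpert-positive participants whose CXR was graded as not abnormal. The variable names are illustrative.

```python
# Counts reported in the Results.
n_valid_xpert = 17066     # participants with a valid Xpert result
n_mtb_positive = 2623     # MTB+ by Xpert
n_graded_abnormal = 8428  # CXRs graded abnormal by the radiologist
n_mtb_missed = 249        # MTB+ cases lost under radiologist triage

# Cells derived by subtraction. ASSUMPTION: the missed cases are the
# MTB+ participants whose CXR was graded as not abnormal.
tp = n_mtb_positive - n_mtb_missed
fn = n_mtb_missed
fp = n_graded_abnormal - tp
tn = n_valid_xpert - n_mtb_positive - fp

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
# People graded not abnormal would not go on to Xpert, so their tests are saved.
xpert_tests_saved = (tn + fn) / n_valid_xpert

# Share of MTB+ cases missed with a CAD4TB threshold of 80 (n=473 in the text).
missed_share_cad4tb_80 = 473 / n_mtb_positive

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("PPV", ppv),
    ("NPV", npv),
    ("Xpert tests saved", xpert_tests_saved),
    ("MTB+ missed at CAD4TB threshold 80", missed_share_cad4tb_80),
]:
    print(f"{name}: {value:.1%}")
```

Under that assumption the recomputed values round to the percentages reported in the text (91%, 58%, 28%, 97%, 51% and 18%).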
Discussion
To our knowledge, this is the first published study to evaluate CAD4TB software in Asia and also the first using a public–private partnership. Our evaluation's sample size was also large. We identified more people with confirmed TB than the combined enrolment of all other published CAD4TB studies to date [15]. CAD4TB software can quickly identify people who require additional diagnostic testing for TB, with small losses in TB detection, and has the potential to save many Xpert tests. However, the usefulness of CAD4TB will be heavily dependent on the setting and will probably need further improvements in performance and cost reductions to become a useful triage tool in urban Asian settings.
Published studies using CAD4TB have been conducted in Africa, where there are relatively fewer trained radiologists than in Asia [29]. Although a number of studies have shown CAD4TB software to be comparable to a trained human reader [10–14], the radiologist in our study performed better across every referral group, and was less expensive. The use of CAD4TB software as a triage tool in passive case finding in Dhaka (to rule out unnecessary tests) may not be as beneficial as using it as a screening tool to identify people in need of TB testing in rural settings and/or active case-finding approaches where high throughput, the need for quick decisions, and shortages of highly skilled readers are more likely. More prospective studies are needed to evaluate CAD4TB in screening and active case-finding interventions, because all of the published studies have been in populations with TB prevalence ranging from 18% to 60% [7].
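One reason prevalence matters when comparing settings is that predictive values shift with it even if sensitivity and specificity stay the same. The sketch below applies Bayes' rule to the radiologist's overall sensitivity (91%) and specificity (58%) at the prevalence seen in this study (15%) and at the two ends of the range cited above (18% and 60%). Holding sensitivity and specificity constant across settings is a simplifying assumption made only for illustration; the sub-group results in this study show that they do vary.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Radiologist's overall sensitivity and specificity from the Results,
# applied at this study's prevalence.
ppv_study, npv_study = predictive_values(0.91, 0.58, 0.15)

# ASSUMPTION: the same sensitivity and specificity at other prevalences.
for prevalence in (0.15, 0.18, 0.60):
    ppv, npv = predictive_values(0.91, 0.58, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

At 15% prevalence the formula returns the PPV and NPV reported for the radiologist (28% and 97%). PPV rises and NPV falls as prevalence increases.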
To our knowledge, this is also the first published evaluation of the CAD4TB software across different sub-groups. The CAD4TB software performed differently depending on the age of the patient and referral group, which should be further evaluated. It is not surprising that CAD4TB performed worse among older individuals, as they have more time to develop non-TB-related abnormalities. We expect the differences in referral type are probably explained by other factors, but could not identify them through the current analysis. Our findings suggest that, to maximise case detection and Xpert test conservation, differentiated CAD4TB threshold scores may be needed, highlighting the need for piloting the software in each setting to identify the ideal cut-off scores. For instance, a study from Zambia achieved 100% sensitivity by including 83% of the individuals for testing [12], while in our evaluation 99.9% of individuals would need testing to achieve 100% sensitivity (supplementary files).
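The idea of differentiated thresholds can be illustrated with a small search: for each sub-group, take the highest cut-off that still reaches a target sensitivity, then read off the specificity at that cut-off. The sketch below uses hypothetical scores for two made-up groups and a target of 80%. It assumes that a score at or above the cut-off triggers testing, which the text does not specify exactly, and it is not the procedure used to derive the thresholds in table 2.

```python
def sensitivity_at(scores, xpert_positive, cutoff):
    """Share of Xpert-positive people with a score at or above the cut-off."""
    positives = [s for s, y in zip(scores, xpert_positive) if y]
    return sum(s >= cutoff for s in positives) / len(positives)


def specificity_at(scores, xpert_positive, cutoff):
    """Share of Xpert-negative people with a score below the cut-off."""
    negatives = [s for s, y in zip(scores, xpert_positive) if not y]
    return sum(s < cutoff for s in negatives) / len(negatives)


def highest_cutoff_for(scores, xpert_positive, target_sensitivity):
    """Largest integer cut-off in 0..100 whose sensitivity meets the target."""
    for cutoff in range(100, -1, -1):
        if sensitivity_at(scores, xpert_positive, cutoff) >= target_sensitivity:
            return cutoff
    raise ValueError("no cut-off reaches the target sensitivity")


# Hypothetical scores and Xpert results for two made-up groups, NOT study data.
groups = {
    "younger (hypothetical)": (
        [95, 90, 80, 70, 40, 20, 30, 50, 75, 10],
        [True] * 5 + [False] * 5,
    ),
    "older (hypothetical)": (
        [99, 97, 92, 85, 60, 90, 88, 70, 95, 40],
        [True] * 5 + [False] * 5,
    ),
}

cutoffs = {}
specificities = {}
for name, (scores, xpert_positive) in groups.items():
    cutoffs[name] = highest_cutoff_for(scores, xpert_positive, 0.8)
    specificities[name] = specificity_at(scores, xpert_positive, cutoffs[name])
    print(f"{name}: cut-off {cutoffs[name]}, specificity {specificities[name]:.0%}")
```

In this toy example the two groups need different cut-offs to reach the same sensitivity, and the specificity obtained differs between them. That is the pattern that would motivate piloting the software in each setting.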
There are several limitations of our study. We used Xpert, not culture, as the end-point of our diagnostic algorithm, meaning that people with Xpert-negative, culture-positive TB have incorrectly been labelled as TB-free. The available culture facilities could not cope with the specimen throughput, and funds were not available to systematically culture even a sample of those tested. Given the evidence for intra-reader variability [5, 13], future work should evaluate CAD4TB against multiple radiologists with varying experience in reading chest radiographs, as our results included only one reader. We did not test for HIV, but Bangladesh's TB–HIV co-infection rate is estimated to be less than 1% [30]. The entry point for our diagnostic algorithm was TB symptoms, meaning that CXR was not used to detect asymptomatic individuals with TB, who make up the majority of TB cases in recent prevalence surveys [31]. Our sample comprises only people willing to pay for CXR. Although this is certainly a different population from those recruited for other CAD4TB evaluation studies, it reflects the reality of health services in many Asian settings where people choose to pay for services [32]. Just 53 people came to the screening centres and decided not to pay. Because patients pay for CXR out of their own pocket (whether attending public or private facilities) in almost every high TB burden country [3, 4], our results may be more representative of populations who receive CXR as a triage tool than those of studies that provide CXR for free. As with other studies evaluating CAD4TB software, we did not measure the possible beneficial effect that having the CXR results immediately available might have had on dropout in a diagnostic algorithm which otherwise would require multiple visits. This is an area, especially for active case finding/screening interventions, that requires more research, as there may be value in having an immediate decision to access a diagnostic test rather than waiting for a human reader. Finally, we used version 3.07, and the latest software version released after this evaluation (version 5) may provide better results. Future studies should compare results across software versions to determine whether performance has improved.
There are a number of lessons for those interested in implementing CAD4TB screening and Xpert testing outside a research setting. Offering the clients of private providers the opportunity for a free Xpert test was an excellent incentive to encourage referrals and can be a useful tool for public–private partnerships. There were 39 GeneXpert systems operating in the public sector in Bangladesh during this intervention, yet none were available at a for-profit price, probably due to lack of market. Educating clinicians about the interpretation of the abnormality score is essential. It was difficult to explain to providers that only one in three people with a score of 95 would test MTB+ (supplementary file) when CAD4TB is marketed for TB. The developers of CAD4TB have produced software updates that are compatible with multiple radiography systems, which will allow for wider adoption of the technology. Developing a version for children may be of interest as well, given the low rates of TB case-finding among children globally [30]. The cost of a CAD4TB reading must be low enough to be an attractive option for screening initiatives, as a radiologist report costs around USD0.50 per image read in Bangladesh. Furthermore, if paying clients demand a radiology report and a printed film, this can add costs for implementers, making the running costs of direct digital systems with CAD4TB higher than what has been reported [14]. Finally, newer versions of CAD4TB require an internet connection, and sending CXR images to telemedicine centres for reading may be an alternative depending on costs, turnaround times and reading quality.
Conclusions
In Bangladesh, the CAD4TB software worked on a high quality, direct digital CXR system, and has the potential to quickly screen out a large number of symptomatic individuals who do not need further diagnostic TB tests. If such an algorithm had been employed, it would have resulted in a significant reduction in the number of Xpert tests used. However, the software performed worse overall than, and was more expensive than, a highly trained radiologist in this setting. The software may be more useful in high throughput programmes such as active case-finding interventions where a quick decision on further testing is critical, especially if trained readers are scarce and/or costs are high. Improvements in future versions, the cost of CAD4TB, and its compatibility with different radiography systems will help determine its overall usefulness.
Supplementary material
Supplementary data ERJ-02159-2016_Supplement
Disclosures
S. Banu ERJ-02159-2016_Banu
A.J. Codlin ERJ-02159-2016_Codlin
J. Creswell ERJ-02159-2016_Cresswell
T. Islam ERJ-02159-2016_Islam
M.A.S. Khan ERJ-02159-2016_Khan
A. Nahar ERJ-02159-2016_Nahar
Z.Z. Qin ERJ-02159-2016_Qin
M.M. Rahman ERJ-02159-2016_MM_Rahman
M.T. Rahman ERJ-02159-2016_MT_Rahman
M. Reja ERJ-02159-2016_Reja
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: This study was supported by the Stop TB Partnership’s TB REACH initiative and was funded by the Government of Canada and UNITAID. Funding information for this article has been deposited with the Crossref Funder Registry.
Conflict of interest: Disclosures can be found alongside this article at erj.ersjournals.com
- Received November 3, 2016.
- Accepted January 9, 2017.
- Copyright ©ERS 2017
This ERJ Open article is open access and distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0.