Diagnostic accuracy of centralised assays for TB detection and detection of resistance to rifampicin and isoniazid: a systematic review and meta-analysis
- Mikashmi Kohli1,2,
- Emily MacLean1,2,
- Madhukar Pai1,2,
- Samuel G. Schumacher3,5 and
- Claudia M. Denkinger3,4,5
- 1Dept of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- 2McGill International TB Centre, McGill University, Montreal, QC, Canada
- 3Foundation for Innovative New Diagnostics, Geneva, Switzerland
- 4Centre for Infectious Diseases, University Heidelberg, Heidelberg, Germany
- 5S.G. Schumacher and C.M. Denkinger are joint senior authors
- Samuel G. Schumacher, FIND, Campus Biotech, Chemin des Mines 9, Geneva 1202, Switzerland. E-mail: samuel.schumacher{at}finddx.org
Abstract
Various diagnostic companies have developed high throughput molecular assays for tuberculosis (TB) and resistance detection for rifampicin and isoniazid. We performed a systematic review and meta-analyses to assess the diagnostic accuracy of five of these tests for pulmonary specimens. The tests included were Abbott RealTime MTB, Abbott RealTime RIF/INH, FluoroType MTB, FluoroType MTBDR and BD Max MDR-TB assay.
A comprehensive search of six databases for relevant citations was performed. Cross-sectional, case-control, cohort studies, and randomised controlled trials of any of the index tests were included. Eligible specimens were respiratory specimens (such as sputum, bronchoalveolar lavage and tracheal aspirate) or their culture isolates.
A total of 21 included studies contributed 26 datasets. We could only meta-analyse data for three of the five assays identified, as data were limited for the remaining two. For TB detection, the included assays had a sensitivity of 91% or more and the specificity ranged from 97% to 100%. For rifampicin resistance detection, all the included assays had a sensitivity of more than 92%, with a specificity of 99–100%. Sensitivity for isoniazid resistance detection varied from 70 to 91%, with higher specificity of 99–100% across all index tests. Studies that included head-to-head comparisons of these assays with Xpert MTB/RIF for detection of TB and rifampicin resistance suggested comparable diagnostic accuracy.
In people with symptoms of pulmonary TB, the centralised molecular assays demonstrate comparable diagnostic accuracy for detection of TB, rifampicin and isoniazid resistance to Xpert MTB/RIF assay, a WHO recommended molecular test.
Abstract
In people with symptoms of pulmonary TB, the centralised molecular assays demonstrate comparable diagnostic accuracy for detection of TB, rifampicin resistance, and isoniazid resistance to existing WHO recommended tests https://bit.ly/3kQE20V
Introduction
Tuberculosis, caused by Mycobacterium tuberculosis complex (MTBC), has surpassed HIV/AIDS as the world's leading infectious cause of death. The World Health Organization (WHO) estimates that, in 2018, 10 million people became ill with tuberculosis, and approximately 1.45 million died of the disease. In 2018, only half of all confirmed tuberculosis patients underwent drug susceptibility testing [1].
The introduction and rollout of nucleic acid amplification tests (NAATs) have significantly improved tuberculosis diagnosis by providing rapid tuberculosis and drug resistance detection (WHO 2010). The principle behind these assays is amplification of a targeted region of the M. tuberculosis genome by PCR. NAATs are used for both tuberculosis detection (particularly the Xpert MTB/RIF) and identification of mutations that confer resistance to anti-tuberculosis drugs, most commonly rifampicin (RIF) and isoniazid (INH) (for example, Bruker-Hain and Nipro line probe assays (LPAs)) [2, 3]. Globally, INH mono-resistant tuberculosis is more prevalent than multidrug-resistant tuberculosis (MDR-TB), and WHO guidelines advocate for universal testing for both RIF and INH resistance before commencing tuberculosis treatment [4].
Recently, several companies have developed molecular tests for tuberculosis and RIF/INH resistance detection on centralised platforms, many of which have already been established as multi-disease platforms, primarily for detection of HIV, human papillomavirus and hepatitis C virus.
This systematic review evaluated the diagnostic accuracy of five of these tests for M. tuberculosis and RIF/INH resistance detection. The tests included were Abbott RealTime MTB, Abbott RealTime RIF/INH, FluoroType MTB, FluoroType MTBDR and BD Max MDR-TB assay.
Methods
Search strategy, information sources and eligibility criteria
We followed standard guidelines and methods for systematic review and meta-analyses of diagnostic test accuracy [5, 6]. A comprehensive search of databases (PubMed, EMBASE, BIOSIS, Web of Science, LILACS, Cochrane) for relevant citations was performed without language restrictions. An example search strategy is provided in the supplementary methods. The search period was restricted to January 2009 to June 2018, and a further scoping search was done until May 2020 to look for published studies for these platforms. We also asked the developers of these tests to provide available data and lists of studies they were aware of. Cross-sectional, case–control, cohort studies and randomised controlled trials of any of the index tests (listed above) were included if at least 25 specimens were tested. Abstracts and unpublished studies were excluded. Patients of all age groups with presumed or confirmed pulmonary tuberculosis or MDR-TB, in all settings and any country, were included.
Our search strategy also included terms for two assays by Roche and Bioneer that are comparable to the assays reviewed; however, we did not find any studies for these assays.
Citation screening and study selection
Two authors (M. Kohli and E. MacLean) independently screened and reviewed the full texts. Any discrepancies were resolved by discussion, and in case of disagreement, a third author was consulted (C.M. Denkinger). If a study contributed data to more than one analysis (e.g. two different index tests in one study), it was considered as two or more datasets. Disagreements in extracted information were resolved by discussion with the third author (C.M. Denkinger). Study authors were contacted in cases of missing data. A paper without extractable diagnostic accuracy data was excluded if the study author did not reply after three attempts.
Reference standards
For tuberculosis detection, solid or liquid culture was the reference standard. For resistance detection, phenotypic drug susceptibility testing (DST) was the primary reference standard. However, if the studies provided information on sequencing, we analysed the data using a phenotypic DST reference standard, a sequencing reference standard, and a composite reference standard (CRS). For a CRS, if phenotypic DST showed drug sensitivity but sequencing identified mutations recognised to be associated with resistance, the CRS was considered resistant when the mutations were associated with high or moderate confidence of resistance as per Miotto et al. [7]. If phenotypic DST showed resistance but sequencing did not identify mutations associated with resistance, the CRS was considered resistant (as mutations could be outside of the region sequenced).
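The composite reference standard rule described above can be written as a small decision function. The following is a minimal sketch, not the authors' code: the names `Call`, `RESISTANCE_GRADES` and `composite_reference` are hypothetical, and the grade label "minimal" in the usage example is illustrative only.

```python
from enum import Enum


class Call(Enum):
    RESISTANT = "resistant"
    SUSCEPTIBLE = "susceptible"


# Confidence grades that the text treats as resistance-conferring.
RESISTANCE_GRADES = {"high", "moderate"}


def composite_reference(phenotypic: Call, mutation_grades: list[str]) -> Call:
    """Combine phenotypic DST and sequencing into one composite call.

    mutation_grades holds the confidence grade of each mutation found by
    sequencing; an empty list means no mutation was identified.
    """
    # Phenotypic resistance always stands, even without a detected mutation,
    # because the causal mutation may lie outside the sequenced region.
    if phenotypic is Call.RESISTANT:
        return Call.RESISTANT
    # Phenotypically susceptible: reclassify as resistant only when a high
    # or moderate confidence mutation is present.
    if any(grade in RESISTANCE_GRADES for grade in mutation_grades):
        return Call.RESISTANT
    return Call.SUSCEPTIBLE


if __name__ == "__main__":
    print(composite_reference(Call.SUSCEPTIBLE, ["moderate"]))
    print(composite_reference(Call.RESISTANT, []))
    print(composite_reference(Call.SUSCEPTIBLE, ["minimal"]))
```

Under this rule the composite standard can only add resistant calls relative to phenotypic DST, never remove them, so index-test results counted as false positives against phenotypic DST may become true positives.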
Head-to-head comparisons
When possible, the index tests were also compared with another well-characterised, WHO-recommended molecular test, Xpert MTB/RIF, for both TB detection and rifampicin resistance detection. Such head-to-head comparisons are preferred, as using a WHO-recommended comparator test with known diagnostic accuracy serves as an easily understood benchmark for the index test's performance [8]. It can allow flagging of studies with particularly strong or weak results for the index test, which may help explain some between-study heterogeneity.
Assessment of methodological quality
The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, a validated quality assessment tool for diagnostic studies [9], was used to assess the included studies' risk of bias.
Statistical analysis and data synthesis
For each index test, meta-analyses of sensitivity and specificity were performed for TB detection, RIF resistance detection and INH resistance detection when at least four studies were available. Studies were pooled using bivariate random-effects hierarchical models to calculate sensitivity and specificity, with associated 95% confidence intervals, of each index test against the relevant reference standard. When there were fewer than four studies for an index test, or evident heterogeneity between studies, only a descriptive analysis was performed.
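As an illustration of the study-level inputs to such an analysis, the sketch below computes sensitivity and specificity with 95% confidence intervals from a single 2x2 table. It does not reproduce the bivariate random-effects pooling. The Wilson score interval is an assumption made here for illustration (the text does not state which interval method was used for individual studies), the function names are hypothetical, and the counts in the usage example are invented.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, lower, upper) for sensitivity and specificity."""
    sens = tp / (tp + fn)  # proportion of reference-positive specimens detected
    spec = tn / (tn + fp)  # proportion of reference-negative specimens ruled out
    return {
        "sensitivity": (sens, *wilson_interval(tp, tp + fn)),
        "specificity": (spec, *wilson_interval(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Invented counts for one hypothetical study.
    for name, (est, lo, hi) in accuracy(tp=90, fp=5, fn=10, tn=95).items():
        print(f"{name}: {est:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

A bivariate model would take the four counts from every study jointly and estimate pooled sensitivity and specificity together with their correlation, which this per-study calculation does not attempt.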
Results
From the literature search, 750 citations were identified, 81 full-text articles were reviewed, and 21 studies were included in the systematic review (see figure 1). The 21 studies contributed 26 datasets, as four provided data for more than one index test. All studies were conducted in central level laboratories, which was expected as these assays require sophisticated laboratory infrastructure and skilled laboratory workers. As most studies were laboratory-based, there were limited demographic data available, such as age, HIV status, and past tuberculosis history of the included patient population. Tables 1 and 2 show the results of all the index tests analysed separately for both tuberculosis detection and resistance detection. Table 3 provides data for head-to-head comparisons of the index tests with Xpert MTB/RIF.
Risk of bias by QUADAS-2 assessment
The overall methodological quality of the included studies for each index test is summarised in supplementary figures S1–S10. For all assays except BD Max MDR-TB, the studies had applicability concerns in the domain of participant selection, as the studies were not conducted in high tuberculosis or MDR-TB burden settings. Similarly, for risk of bias, some studies had concerns in the patient selection domain and also the reference standard domains. In all other domains, risk of bias was low.
Abbott RealTime MTB
Ten studies with 4858 respiratory specimens were included in the meta-analysis that evaluated the Abbott RealTime MTB assay for TB detection [10–19] (figure 2). In all studies, the assay was run directly on specimens, as opposed to positive culture isolates. Most studies (six out of 10) used fresh specimens, while four used frozen specimens. The median sample size was 389 (interquartile range 242 to 599). In individual studies, the sensitivity point estimates of the Abbott MTB assay varied from 79% to 100% and the specificity point estimates varied from 84% to 99% (figure 2). Pooled sensitivity and specificity were 96.2% (95% CI 90.2–98.6) and 97.1% (95% CI 93.7–98.7), respectively.
Comparator test for tuberculosis detection: Xpert MTB/RIF
In addition to the RealTime MTB assay, three studies [10, 14, 19] performed Xpert MTB/RIF on the same specimens [10, 19] or on different specimens obtained from the patient on the same visit [14] (figure 3a). In the study by Wang et al. [19], a lower overall specificity was observed for both Xpert (90%) and RealTime MTB (84%) than would be expected. In contrast, Scott et al. [14] showed Xpert specificity of 98% and RealTime MTB specificity of 92%. Berhanu et al. [10] also evaluated Xpert Ultra on the same specimens. The study showed an increased Xpert Ultra sensitivity of 89%, but with a trade-off of lower specificity of 96% (figure 3b).
Sub-group analyses: smear status
All ten studies provided data allowing stratification by smear status. For smear-positive specimens, the sensitivity of RealTime MTB assay varied from 95% to 100%. Pooled sensitivity was 99.0% (95% CI 97.7–100) (10 studies, 765 specimens).
For smear-negative specimens, the sensitivity varied from 41% to 100%. Pooled sensitivity was 88.4% (95% CI 74.0–99.3) (10 studies, 4056 specimens). The study by Berhanu et al. [10] demonstrated very low sensitivity of 41.0% (95% CI 18.0–67.0), which may partially be explained by the high prevalence of HIV in their study population, meaning that a high proportion of cases suffered from paucibacillary disease. We were not able to explore test performance by HIV status further as most of the studies (60%) did not report HIV prevalence.
Abbott RealTime MTB RIF/INH
Seven studies provided data for RIF and INH resistance detection by RealTime MTB RIF/INH, with phenotypic DST as the reference standard in both use cases [10, 13–15, 20–22]. Six studies performed the index test directly on known tuberculosis-positive specimens or as an accompanying drug susceptibility test with RealTime MTB. One study [20] used tuberculosis-positive culture isolates for the index test specimen. Four studies used fresh specimens while others used bio-banked specimens.
RIF-resistance detection
The pooled sensitivity and specificity for RIF resistance were 94% (95% CI 89.0–99.0) and 100% (95% CI 99.0–100), respectively, from seven studies and a total of 1008 specimens (figure 4). There was little heterogeneity across studies.
Additionally, three studies provided sequencing data for RIF resistance, so we compared RealTime MTB RIF/INH performance against sequencing and a composite reference standard (CRS) in these instances (figure S11). In the paper by Hofmann-Thiel et al. [20], three specimens were classified as resistant by RealTime MTB RIF/INH due to an L511P mutation in the rpoB gene, but were sensitive on phenotypic DST. These three specimens were reclassified as true positives with CRS. In the same study, 10 specimens that were susceptible to RIF by index test were resistant by both sequencing and culture (six specimens with the high confidence mutation H526R and four with the moderate confidence mutation L533P). In the paper by Kostera et al. [21], four specimens were classified as susceptible wildtype by the index test and sequencing, but were classified as false negatives by the CRS, as phenotypically they were resistant to RIF. In the smaller study by Tam et al. [15], the index test and reference standards had complete concordance. Thus overall, given the limited number of discordances between the phenotypic and genotypic DST, the results hardly changed across the different reference standards (figure S11).
INH resistance detection
For INH resistance detection, the pooled sensitivity and specificity were 89% (95% CI 86.0–92.0) and 99% (95% CI 98.0–100), respectively, from seven studies and a total of 1013 specimens (figure 5). There was little heterogeneity across studies.
The same three studies provided data for INH resistance against sequencing. RealTime MTB RIF/INH displayed better accuracy when compared against the sequencing reference standard than against the phenotypic DST. For Hofmann-Thiel et al. [20], there were 18 specimens that were susceptible by index test but resistant by phenotypic DST. These 18 specimens did not show any mutations in the katG or inhA target regions using sequencing, so by the CRS we classified them as resistant, since the responsible mutations could have been outside the target regions. Hence the accuracy estimates with CRS in the study were identical to those with phenotypic DST. In the study by Kostera et al. [21], seven discordant specimens that were classified as susceptible phenotypically but INH resistant by index test were confirmed to be resistant by sequencing. This was due to the presence of the katG mutation S315T in three cases and an inhA promoter region mutation, c-15t, in four cases. These seven specimens were correctly identified as resistant by the index test but were missed by conventional phenotypic DST (figure S12).
FluoroType MTB
Five studies with 2660 respiratory specimens were included in the meta-analysis [13, 23–26]. Median sample size was 608 (interquartile range 296–661). The assay was performed directly on specimens in all studies, with four of the five studies (80%) reporting use of fresh specimens. One study used biobanked specimens [13]. Individual sensitivities ranged from 87% to 95%, while specificities ranged from 60% to 100% (figure 6). Pooled sensitivity and specificity were 92.1% (95%CI: 87.6–93.3) and 98.9% (95%CI: 64.0–99.9), respectively. Obasanya et al. observed relatively low specificity of 60% (95%CI: 53.0–66.0), which may be partially explained by the study being conducted in a low resource setting with higher potential for sample contamination, the use of Petroff's method for sputum decontamination, and Löwenstein-Jensen solid culture as the reference standard [26].
Comparator test for TB detection: Xpert MTB/RIF
In assessing Xpert as a comparator test in the same study [26], a substantially higher specificity was observed (94% for Xpert versus 60% for the FluoroType) (figure 7). However, the specificity of Xpert was lower than the observed specificity of the test for pulmonary tuberculosis in a large meta-analysis [27]. This study observed Xpert MTB/RIF sensitivity of 79% and FluoroType MTB sensitivity of 89%.
FluoroType MTBDR
Two studies [28, 29] evaluated FluoroType MTBDR for TB detection using 782 frozen specimens (table 3). The study by de Vos et al. [28] reported a sensitivity of 96% (95%CI: 93–98) and a specificity of 100% (95%CI: 97–100). Haasis et al. [29] reported a sensitivity of 91% (95%CI: 82–97) and specificity of 100% (95%CI: 98–100). The study by de Vos et al. [28] only included Xpert-positive specimens, which could have introduced spectrum bias and an inflated sensitivity estimate.
RIF resistance detection
Two studies [29, 30] assessed the performance of the test for RIF resistance detection using a phenotypic DST. Hillemann et al. [30] used culture isolates for FluoroType MTBDR while Haasis et al. [29] performed the testing directly on specimens. Sensitivity was 97% (95% CI 82.0–100) for Haasis et al. [29] and 99% (95% CI 96.0–100) for Hillemann et al. [30] and specificity was 100% in both studies. No comparison to sequencing was performed.
INH resistance detection
For isoniazid resistance detection, phenotypic culture was also the reference standard. In Haasis et al. [29] and Hillemann et al. [30], sensitivities were 70% (95% CI 46.0–88.0) and 92% (95% CI 84.0–97.0), respectively, and specificity was 100% in both studies. No comparison to sequencing was performed.
For the Hillemann et al. [30] study, the use of culture isolates for testing might have resulted in better resistance detection than in Haasis et al. [29].
BD Max MDR-TB
One recently published multicentre study provided data for this assay [31]. The assay was run on fresh sputum specimens. It reported a sensitivity of 93% (95% CI 89.0–96.0) with specificity of 97% (95% CI 96.0–98.0) on raw sputum specimens. For decontaminated sputum specimens, the sensitivity was 91% (95% CI 87.0–94.0) and specificity was 95% (95% CI 93.0–97.0).
Comparator test for tuberculosis detection: Xpert MTB/RIF
The study performed Xpert on the same processed sputum specimens as a comparator test. It reported similar sensitivities of 91% and 90% and specificities of 96% and 98% for BD Max and Xpert, respectively (figure 8).
RIF resistance detection
For RIF resistance, the sensitivity and specificity with phenotypic DST as reference standard were 90% (95% CI 55–100) and 95% (95% CI 91–97), respectively (one study, 232 specimens). However, six of 11 specimens classified as false positives by phenotypic DST had rpoB mutations identified by Sanger sequencing. Two specimens each had D516Y and L511P mutations, while one specimen each had D516F and L533P mutations, all of which are considered to confer resistance with high or moderate confidence [7]. Based on this reclassification, specificity increased from 95% (211 out of 222) against phenotypic DST to 98% (211 out of 216) with the sequencing and CRS reference standards [31].
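The reclassification arithmetic above can be checked directly. This is a minimal sketch using only the counts reported in the text; the function and variable names are hypothetical.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity = TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


# Counts reported for BD Max MDR-TB RIF resistance detection.
TRUE_NEGATIVES = 211
REFERENCE_NEGATIVES_PDST = 222  # specimens RIF-susceptible by phenotypic DST
FALSE_POSITIVES_PDST = REFERENCE_NEGATIVES_PDST - TRUE_NEGATIVES  # 11
RECLASSIFIED = 6  # false positives with rpoB mutations found by sequencing

spec_pdst = specificity(TRUE_NEGATIVES, FALSE_POSITIVES_PDST)
# Reclassified specimens become true positives, so they leave the
# specificity denominator entirely.
spec_crs = specificity(TRUE_NEGATIVES, FALSE_POSITIVES_PDST - RECLASSIFIED)

if __name__ == "__main__":
    print(f"Against phenotypic DST: {spec_pdst:.1%}")
    print(f"Against sequencing/CRS: {spec_crs:.1%}")
```

The denominator falls from 222 to 216 because the six reclassified specimens are counted as resistant by the sequencing and composite reference standards.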
INH resistance detection
For INH resistance, the sensitivity and specificity were 82% (95% CI 63.0–92.0) and 100% (95% CI 98.0–100), respectively, against phenotypic DST.
Discussion
In this systematic review, we summarise the performance of five diagnostic tests for tuberculosis and RIF/INH resistance detection: Abbott RealTime MTB, Abbott RealTime MTB RIF/INH, FluoroType MTB, FluoroType MTBDR and BD Max MDR-TB. Overall, the tests show similar performance to tests currently recommended by WHO.
Sensitivity across tests was in the range of 90% and above with markedly low observed variability for all assays. For specificity in tuberculosis detection, there was more variability across studies and tests, and further research is needed to understand whether this variability is related to test characteristics. For some studies, accuracy estimates were low for both the index test and the comparator (Xpert), which suggests that the decreased accuracy could be due to confounders or study characteristics not stated explicitly [19, 26]. In contrast, other studies were well conducted and there was more confidence in the diagnostic accuracy of the index tests, as the comparators had accuracy estimates that were in line with WHO estimates [14, 31].
Conceivably, the different tests might perform differently when it comes to detection of viable and non-viable bacteria depending on the extraction methods and the methods used to enrich whole cell bacteria (e.g. filters) [32, 33]. Therefore, studies recruiting individuals with recent tuberculosis history that compare index tests to existing WHO recommended tests (such as Xpert MTB/RIF) would be useful. As well, manual extraction methods, such as those employed by Obasanya et al. [26] for Fluorotype MTB, in the hands of less experienced users might have contributed to contamination and thus false positive results.
For RIF and INH resistance detection, the sensitivity and specificity estimates were also in the range of the published accuracy estimates for Xpert [27] and LPA [34]. However, data were limited and variability was observed, which might relate to how the tests were performed (e.g. from isolates or sample) or to the study populations.
Three assays were evaluated for the detection of RIF and INH resistance. Abbott RealTime RIF/INH assay was the only assay that had sufficient data for meta-analysis, with pooled sensitivity and specificity for RIF resistance of 94% and 100%, respectively, and for INH resistance 89% and 99%, respectively. For the other two assays, data were insufficient to meta-analyse, but overall diagnostic accuracy for RIF and INH resistance detection at this point appeared comparable to that of the WHO-recommended LPA test (90%) [34]. The use of CRS increased the specificity in some studies due to the identification of disputed mutations by sequencing that went undetected by phenotypic DST [7]. All studies that provided sequencing information performed targeted Sanger sequencing, which is a limitation as only targeted sequences can be identified, whereas whole genome sequencing would provide information on the entire genome and thus identify resistance-conferring mutations outside of target regions such as rpoB. A concerning finding was that, in one study [20] of RealTime MTB RIF/INH, six specimens were identified as susceptible to RIF by the index test despite the presence of the high confidence mutation H526R. This finding needs to be further assessed in additional studies. S315T is a frequent katG mutation and typically arises before all other drug resistance mutations. It is also one of the mutations termed “harbinger mutations”. Its early detection may help in preventing transmission of multidrug resistance [35]. In the current systematic review, Abbott RealTime RIF/INH assay picked up this mutation correctly in three specimens that were missed by phenotypic DST [21]. There were insufficient data to assess these mutations in other assays included in the review.
Only for the BD Max MDR-TB, a single well-conducted study provided information across a well-characterised and representative population. For other tests, HIV status, sex, tuberculosis history and tuberculosis treatment status were not available for 70% of the datasets included in the analyses, making generalisability to specific settings difficult. For these tests, additional studies are needed that provide more demographic information for the samples tested to allow for further generalisability of the data.
Operational characteristics are also a critical component for the use of testing platforms in different settings. The throughput of all of the mostly automated platforms assessed in this study is large. Specifically, the number of specimens that can be processed in these platforms varies from 24 (BD Max) to 94 specimens (Abbott RealTime, Hain FluoroType, Roche Cobas). The turnaround time varies from 3 to 5 h according to the manufacturers' package inserts. All of the platforms can be connected to central laboratory information management systems, which is beneficial for disseminating reports to clinicians and patients without delay. Furthermore, the platforms are able to run a large portfolio of assays for different diseases, with Abbott having the largest among the tests evaluated. As such, the assays are suited for centralised settings and can provide results to many patients with minimal hands-on manipulation. This limits infection risk to healthcare workers and laboratory technicians, as well as the risk of sample contamination. All tests demonstrated sensitivity for smear-negative cases comparable to Xpert MTB/RIF assay, making them good contenders for this frequently difficult-to-diagnose use case.
However, the tests are not suited for use in lower levels of the healthcare system where patients first present for care. For the platforms to have the same impact as near-patient platforms, specimen transport needs to be optimised. In addition, without reliable systems in place to deliver test results to patients, the impact of these centralised platforms will be very limited, despite their high performance.
An important strength of our systematic review and meta-analysis was that we provided head-to-head comparisons of the index tests with Xpert MTB/RIF, a WHO recommended molecular test [8]. We also used multiple reference standards for evaluating drug resistance, which provided information on the mutations captured or missed by the index tests. However, the review and meta-analysis also had some limitations. As most of these tests are very new to market, there were minimal data to perform more detailed analyses. Most of the studies were laboratory-based studies, and therefore demographic data of the included participants were not provided. Thus, the generalisability of the performances of all tests (with the exception of BD Max) is uncertain. Another potential concern was that most of the studies had test manufacturers' involvement.
In summary, for patients with pulmonary tuberculosis, these centralised molecular assays demonstrate promising diagnostic accuracy for tuberculosis, RIF resistance, and INH resistance detection. While data were limited, the performance of these assays appears similar to that of WHO-recommended Xpert and LPA assays. The assays might prove to have operational advantages in some settings, but further research is necessary.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material ERJ-00747-2020.SUPPLEMENT
Acknowledgements
We would like to thank Genevieve Gore, McGill Librarian, for helping with the literature search for this review. We would also like to thank the Government of the UK for providing financial support.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Conflict of interest: E. MacLean has nothing to disclose.
Conflict of interest: M. Pai has nothing to disclose.
Conflict of interest: S.G. Schumacher has nothing to disclose.
Conflict of interest: C.M. Denkinger has nothing to disclose.
Conflict of interest: M. Kohli has nothing to disclose.
Support statement: This work was supported by the Government of the UK. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received March 18, 2020.
- Accepted July 30, 2020.
- Copyright ©ERS 2021