Abstract
Light-emitting diode fluorescence microscopy (LED-FM) is recommended by the World Health Organization to replace conventional Ziehl–Neelsen microscopy for pulmonary tuberculosis diagnosis. Uptake of LED-FM has been slow. One reason is its reported loss of specificity compared with Ziehl–Neelsen microscopy. We aimed to determine the diagnostic accuracy of LED-FM for tuberculosis detection and explore potential factors that might affect its performance.
A comprehensive search strategy based on pre-specified criteria was employed to identify eligible studies between January 1, 2000 and April 1, 2014 in 11 databases. Standardised study selection, data extraction and quality assessment were conducted. Pooled sensitivity and specificity of LED-FM using culture as the reference standard were estimated through meta-analyses using a bivariate random-effects model. Investigation of heterogeneity was performed by subgroup analyses.
We identified 12 unique studies, half of which were from peripheral healthcare facilities. LED-FM achieved a pooled sensitivity of 66.9% (95% CI 60.5–72.7%) and pooled specificity of 96.8% (95% CI 93.1–98.6%). A pooled sensitivity of 53.0% (95% CI 42.8–63.0%) and pooled specificity of 96.1% (95% CI 86.0–99.0%) were obtained by LED-FM among HIV-infected patients. Study methodology factors and differences in the LED-FM procedure or device could also affect the performance.
LED-FM specificity is high and should not be a barrier to device introduction, particularly among peripheral healthcare settings where this technology is meant to be used. Sensitivity is reduced in HIV-infected patients.
Meta-analysis showed high performance of LED fluorescence microscopy in diagnosing pulmonary TB http://ow.ly/U3dxd
Introduction
An estimated 9 million new patients developed tuberculosis (TB) in 2013, while globally there were an estimated 11 million prevalent TB cases [1]. As around one-third of TB cases are thought to be undetected [2], early, rapid and accurate diagnosis of TB is crucial in lowering the global burden of the disease.
Despite the effort to roll out the Xpert MTB/RIF assay for TB diagnosis, smear microscopy remains the cornerstone frontline diagnostic test in the majority of primary health centres in countries with a high burden and limited resources. Based on its increased sensitivity and reduced reading time compared with conventional Ziehl–Neelsen (ZN) light microscopy, the World Health Organization (WHO) recommended the introduction of light-emitting diode fluorescence microscopy (LED-FM) as an alternative to ZN microscopy in both high- and low-volume laboratories [3]. The WHO recommendation was mostly based on results from accuracy studies in reference or research laboratory centres presented in a meta-analysis conducted by Minion et al. [4] in 2009. However, in 2013, only 6% of microscopy centres had reportedly switched to LED-FM [1]. One barrier to the uptake of LED-FM is a concern, shared among technologists, about its lower specificity compared with ZN microscopy [5–7] and the lack of clear quality control procedures.
Here, we provide an updated systematic review of the diagnostic accuracy of LED-FM for pulmonary TB detection, with the addition of studies that assessed LED-FM performance at the microscopy laboratory level in limited-resource countries, where this technology is meant to be used. We aimed to determine the diagnostic accuracy of LED-FM from sputum specimens using culture as the reference standard and to explore potential factors that might affect its performance, particularly the reduced specificity of LED-FM reported in the earlier literature [5–7].
Methods
Search strategy and selection criteria
Our review included only primary studies that assessed the diagnostic accuracy of LED-FM for detecting pulmonary TB from unprocessed sputum specimens of adults. Studies using LED-FM with additional digital manipulation devices, such as CellScope (CellScope, San Francisco, CA, USA), were excluded. The studies included must have compared LED-FM with a reference standard using sputum culture. We searched for all types of study designs from which we could extract data to populate a diagnostic 2×2 table. We excluded any patients, or sputum specimens of patients, who had been on treatment for pulmonary TB as well as studies in which results from diagnostic and follow-up specimens could not be differentiated.
We searched MEDLINE, EMBASE, BIOSIS, DARE, Cochrane Database of Systematic Reviews, PROSPERO, Health and Technology Assessment, WHO International Clinical Trial Registry platform, Cochrane Central Register of Controlled Trials, metaRegister of Controlled Trials, and Cochrane Infectious Disease Group Specialized Register to identify all relevant studies dated from January 1, 2000 up to April 1, 2014 in English, French and Chinese, regardless of publication status. The search terms are listed in the online supplementary material. All electronic searches were performed between March 18 and April 2, 2014. In addition, we reviewed reference lists, conducted citation tracking of all included articles, and hand-searched reports produced by WHO, the Special Programme for Research and Training in Tropical Diseases on LED-FM and FIND. We also contacted experts in the field of TB diagnosis to identify more relevant studies.
Data extraction
The three review authors collectively conducted double and independent screening of the titles and abstracts of the accumulated citations. Full-text articles of potentially eligible studies were first assessed by E.W.C. Preliminary exclusions were reviewed and validated by A.-L.P. Discrepancies in preliminary full-text review were resolved through discussion with M.B. All three review authors performed independent full-text review of the remaining eligible studies. Final study selection was confirmed by consensus.
A standardised data extraction form (see online supplementary material), including a modified version of the latest QUADAS quality assessment of diagnostic accuracy studies tool [8], was finalised after piloting on four studies. The three review authors abstracted data and assessed risk of bias of the included studies. Discrepancies were resolved through discussion. Study authors were contacted for missing data and further clarifications if required. Extracted data were entered into Microsoft Excel for analysis.
Statistical analysis
We tabulated true positives, false positives, false negatives and true negatives by study, including those from subgroup analyses of HIV-infected patients and/or different LED-FM systems, to construct the 2×2 contingency tables. Enrolled participants who had missing, contaminated or nontuberculous mycobacteria culture results were excluded from the diagnostic accuracy analyses. We used Review Manager 5.2 [9] to generate forest plots and display the calculated sensitivity and specificity with 95% confidence intervals by study. If applicable, summary receiver operating characteristic (SROC) curves with 95% confidence and prediction regions by subgroup were also generated. Hierarchical bivariate models, which recognise the correlation between sensitivity and specificity and include a random-effects term to account for both within- and between-study variance, were fitted to generate summary estimates directly. We used the user-written program “metandi” in Stata 12 [10] to pool accuracy measures. We followed the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy to assess expected heterogeneity [11]. Our pre-specified variables for subgroup analyses included: HIV status of patients, clinical setting of studies (primary health clinics or district hospitals versus research or referral centres), laboratory setting of studies (microscopy laboratory versus research or reference laboratory), type of culture media (solid, liquid or both), method of reference standards (culture applied to more than one specimen per participant versus only one specimen), microscope magnification levels (×400 or ×200) and type of LED-FM devices. Bivariate models failed to converge when there were fewer than four studies available for the subgroup analysis [11]. In such cases, we assessed the SROC plots visually [11]. Studies with multiple sites under the same study protocol were pooled directly as one single study in the meta-analysis.
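As an illustration of the per-study step described above, the following minimal Python sketch computes sensitivity and specificity from a single 2×2 table, with 95% confidence intervals. It is not the Review Manager or metandi implementation: the Wilson score interval is assumed here for illustration only, the function names are hypothetical, and the counts are invented rather than taken from any included study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity of the index test against the reference."""
    return {
        "sensitivity": tp / (tp + fn),           # culture-positive, smear-positive
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),           # culture-negative, smear-negative
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for one study; not data from this review.
result = accuracy_from_2x2(tp=67, fp=3, fn=33, tn=97)
print(result)
```

Study-level estimates of this kind are the inputs displayed in the forest plots; the pooled estimates in this review come from the bivariate random-effects model, which this sketch does not attempt to reproduce.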
With regard to studies that evaluated multiple LED-FM systems on the same group of participants, we included data from only one LED-FM system for the meta-analysis in order to avoid bias caused by counting the same subjects more than once [12]. The LED-FM system most commonly used among our included studies in the meta-analysis was selected to minimise heterogeneity between studies.
Results
Study selection and characteristics
We identified 381 citations, of which 59 were eligible for full-text review (figure 1). Twelve articles [13–24], reporting 12 independent studies, were included in the meta-analysis. One study [15] had two different patient groups (group 1: unlikely to be HIV-infected; group 2: HIV-infected) with separately analysed data. We considered these two patient groups to be independent in the meta-analysis.
The 12 prospective studies [13–24] were conducted across 11 countries and assessed six commercial LED-FM systems (table 1). All studies were located in middle- or low-income countries [25] and 11 recruited sites in WHO-recognised, high-TB-burden countries [1]. TB incidence rates per 100 000 population ranged from 48 (Yemen) to 860 (South Africa) [1]. More than half of the studies had an unclear risk of bias in the domains of patient selection and reference standard because the patient sampling method, the exclusion criteria and whether reference standard results were interpreted blind were not reported (see online supplementary material).
Overall accuracy analysis
A total of 6256 participants, out of 7451 enrolled subjects, from the 13 study groups were included in the diagnostic accuracy analyses. Study-specific sensitivities varied from 40% to 83%, while specificities varied from 82% to 100% (figure 2). The pooled sensitivity and specificity of LED-FM were 66.9% (95% CI 60.5–72.7%) and 96.8% (95% CI 93.1–98.6%), respectively. The expanded prediction region on the SROC plot indicated a high level of heterogeneity observed among the included studies (figure 3) [11].
Subgroup accuracy analysis
Subgroup accuracy analysis is presented in table 2 and the online supplementary material. Pooled estimates were not generated for HIV-uninfected patients as there were fewer than four studies. SROC plots indicated higher diagnostic accuracy of LED-FM in the two study groups of HIV-negative patients [15, 16], whereas sensitivity and specificity estimates varied widely among studies of HIV-infected subjects (see online supplementary material).
In terms of study settings, the studies that were conducted at lower-level health facilities (district or primary healthcare), which represented half of the studies, had a high pooled specificity of 98.1% (table 2). SROCs showed a visible influence of clinical settings on the diagnostic accuracy of LED-FM and a higher variability in sensitivity associated with research/referral hospitals (see online supplementary material). However, minimal impact of laboratory settings on the diagnostic performance of LED-FM was observed from SROCs (see online supplementary material).
The bivariate models failed to converge in subgroup analyses of different culture media. Nonetheless, SROC plots suggested that liquid culture reference standard was associated with higher diagnostic accuracy of LED-FM (see online supplementary material). Studies using culture on one specimen (n=7) [14–17, 19, 22, 23] had a lower pooled specificity (94.7%) and a higher sensitivity (70.0%) compared with studies using more than one specimen (n=6) [13, 15, 18, 20, 21, 24] (98.5% and 61.1%, respectively), although the differences were not statistically significant. SROC plots suggested that culture reference based on more than one specimen improved the diagnostic accuracy of LED-FM, although associated with increasing heterogeneity in both sensitivity and specificity estimates of the LED-FM (see online supplementary material).
The included studies covered all commercially available LED-FM systems. Lumin™ (LW Scientific, Lawrenceville, GA, USA), evaluated in five studies [13, 18, 20–22], was the most commonly used device and displayed the best diagnostic performance on the SROC plots (see online supplementary material). Performance evaluations of all the other devices were based on fewer than four studies. Nevertheless, all three evaluations of CyScope® (Partec, Görlitz, Germany) [15, 17] reported low performance compared with the other devices. Pooled performance estimates could only be generated for studies using ×400 magnification [13–15, 18, 20, 22, 23]. The pooled sensitivity with ×400 magnification (68.3%) was close to the best sensitivity achieved by the three studies using ×200 magnification (69.0%) [16, 17, 21], but the pooled specificity with ×400 magnification (95.0%) only equalled the lowest specificity among studies using ×200 magnification (table 2). The SROC plots showed that ×200 magnification allowed the best LED-FM performance (see online supplementary material).
Only one study [18], conducted at a national reference laboratory, confirmed sufficient prior fluorescence microscopy reader experience so that no additional training was required. Seven studies reported provision of training of various lengths [13, 14, 16, 19–21, 23], but training provision did not consistently translate into better LED-FM accuracy (data not shown).
Discussion
To the best of our knowledge, this systematic review on the diagnostic accuracy of LED-FM for pulmonary TB detection is the first to be published in a peer-reviewed journal and the most up-to-date review conducted following the WHO's recommendation in 2011 [3]. Our reported sensitivity (66.9%) was much lower than that quoted in the WHO policy statement (83.6%) [3] based on the systematic review performed by Minion et al. [4], while the pooled specificity was similar (96.8% versus 98.2%).
Two major factors might explain the lower sensitivity in our review. First, in contrast to Minion et al. [4], we excluded studies using processed specimens (bleach sedimentation or centrifugation). Despite its potential to increase the sensitivity of microscopy, specimen processing before microscopy is not recommended by the WHO due to insufficient generalisable evidence [26]. Second, half of our review data were derived from primary health clinics and district hospitals, whereas the meta-analysis by Minion et al. [4] used mostly studies from research centres and referral hospitals. As suggested by our subgroup analyses, the sensitivity of LED-FM tended to be lower at primary healthcare settings than at referral hospitals, possibly because patients at referral hospitals had more advanced disease. As a result, our findings might better reflect the diagnostic performance of LED-FM closer to the operational level, where this technology is intended to be used.
As expected, sensitivity was lower in the HIV-infected population due to the higher occurrence of paucibacillary TB in this population and the difficulties in obtaining good quality sputum specimens in cases of advanced HIV infection [27]. However, specificity remained high in the HIV-infected population, as had been shown previously for ZN microscopy (96.1%) [28]. Significantly lower LED-FM sensitivity and specificity were reported from one study that recruited only HIV-infected patients [13]. This study took place in the context of an intensified TB case-finding strategy [13], which might have contributed to an influx of TB suspects being tested at an earlier stage [29], thus driving up the proportion of low- or very-low-positive sputum smears.
Although low specificity of LED-FM had been reported previously [5–7] and was thought to have had a negative impact on the introduction of this technique, we found good pooled specificity. Common factors hypothesised to influence specificity, such as prior reader experience and proportion of paucibacillary samples within a population, could not be well assessed due to the lack of information even after contacting the authors. However, the laboratory level had no impact in this review. Performance assessment of different LED-FM devices was also difficult given the small number of study groups. Nonetheless, our findings showed low performance of CyScope [15, 17]. They also supported the use of the ×20 objective for the examination of fluorescent staining, which is in agreement with the recent Global Laboratory Initiative proposal [30].
Study methodology, particularly the choice of reference standard, can influence performance estimates. A reference standard with imperfect sensitivity itself can lead to an underestimated specificity and an overestimated sensitivity of the tested diagnostic tool. Our review saw higher LED-FM specificity when evaluated against liquid culture, the most sensitive culture medium. Cuevas et al. [19] and Chaidir et al. [15] also reported that the lack of sensitivity of solid culture probably contributed to the low LED-FM specificity, especially among patients with scanty smears. Studies that obtained culture from more than one specimen showed higher LED-FM specificities in our review. Indeed, combining culture outcomes from multiple sputum specimens per participant could improve the reference standard by reducing the risk of false-negative reference results caused by the over-killing effects during the decontamination process, even when using liquid culture. It could also reduce the selection bias resulting from the exclusion of patients with contaminated culture results.
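The direction of the bias introduced by an imperfectly sensitive reference standard can be illustrated with a small worked calculation. The Python sketch below is a hypothetical scenario, not an analysis of the review data: all counts and accuracy values are invented, the reference is assumed to be 100% specific, and the two tests are assumed to err independently within each stratum of bacillary load. Under these assumptions, culture misses mainly paucibacillary patients, whom smear microscopy also tends to miss.

```python
def true_sensitivity(strata):
    """True index-test sensitivity across all TB patients."""
    total = sum(n for n, _, _ in strata)
    return sum(n * idx for n, idx, _ in strata) / total


def apparent_accuracy(strata, n_non_tb, index_spec):
    """Index-test accuracy as measured against an imperfect reference.

    strata: (n_tb, index_sensitivity, reference_sensitivity) per stratum.
    Assumes a 100% specific reference and independent errors within strata.
    """
    ref_pos = sum(n * ref for n, _, ref in strata)
    both_pos = sum(n * idx * ref for n, idx, ref in strata)
    # TB patients missed by the reference are counted as "non-TB".
    ref_neg_tb = sum(n * (1 - ref) for n, _, ref in strata)
    idx_pos_ref_neg_tb = sum(n * idx * (1 - ref) for n, idx, ref in strata)
    apparent_fp = idx_pos_ref_neg_tb + n_non_tb * (1 - index_spec)
    ref_neg = ref_neg_tb + n_non_tb
    return both_pos / ref_pos, 1 - apparent_fp / ref_neg


# Hypothetical population: 120 high-load and 80 paucibacillary TB patients,
# plus 800 patients without TB; the true index-test specificity is 0.98.
strata = [(120, 0.90, 1.00), (80, 0.30, 0.80)]
true_sens = true_sensitivity(strata)
app_sens, app_spec = apparent_accuracy(strata, n_non_tb=800, index_spec=0.98)
print(f"true sensitivity {true_sens:.3f}, apparent {app_sens:.3f}")
print(f"true specificity 0.980, apparent {app_spec:.3f}")
```

In this invented scenario the apparent sensitivity exceeds the true value and the apparent specificity falls below it, because smear-positive patients missed by culture are classified as false positives.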
In the absence of any real gold standard for evaluation of TB diagnostic tests, this review mainly focused on evaluations of LED-FM against culture, which is considered to be the optimal reference standard. We only found one study that used an alternate reference standard combining culture and an expert panel review of smear-positive, culture-negative results [31]. Future research on the assessment of LED-FM against alternative reference standards could be of interest.
Our review has a number of limitations, including language restriction in the searches and inclusion of studies with high heterogeneity in design, methodology and setting. Our investigation of sources of heterogeneity through subgroup analyses was constrained by the small number of studies and by missing information for some of the analyses (training experience and proportion of scanty results).
Conclusion
This meta-analysis showed lower pooled LED-FM sensitivity than previously reported, but this probably better reflects the performance of LED-FM at lower-level healthcare facilities where the device is intended to be used and where introduction of the Xpert MTB/RIF assay is more challenging. Although sensitivity might not represent a major advantage over ZN microscopy, the device-associated reduction in laboratory workload [3, 14, 15, 21, 23] remains a strong advantage in terms of cost-efficiency. LED-FM specificity is overall high and should not be a barrier to device introduction in high-TB-burden and limited-resource countries. Future studies should assess the diagnostic accuracy of LED-FM among HIV-infected populations in the context of more intensified case-finding strategies. Studies should use optimised culture reference standards including liquid culture on all specimens collected. Moreover, international multicentre studies using the same study design should be preferred over several independent studies in the future to reduce the heterogeneity of study populations and sites.
Acknowledgements
We thank Andrew Booth and Claire Beecroft (University of Sheffield, Sheffield, UK) for their helpful suggestions on our protocol; Drs Karen Steingart (Cochrane Infectious Diseases, Portland, OR, USA) and Vittoria Lutje (Cochrane Infectious Diseases, London, UK) for their help with the meta-analysis methodology and electronic search strategy, respectively; and Mathieu Bastard (Epicentre, Paris, France) for his assistance with the statistical analyses. We are also grateful to Drs Heidi Albert, Richard Anthony, Adithya Cattamanchi, Pamela Nabeta, Sharon Reed, Andrew Whitelaw, Armand Van Deun, Zhao Ping and Zhao Yanlin for answering our questions and providing additional information regarding their studies.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: This study was funded by Epicentre. Funding information for this article has been deposited with FundRef.
Conflict of interest: None declared.
- Received June 21, 2015.
- Accepted October 21, 2015.
- Copyright ©ERS 2016