TABLE 2

Study designs to evaluate diagnostic accuracy of computer-aided detection (CAD) programmes

Study characteristic / Ideal / Acceptable

Study population
Ideal:
Broad eligibility criteria to enhance generalisability.
Sufficient sample size to permit stratified analyses by key characteristics.
Participants should not have contributed CXRs used to develop or train the CAD software under study.
Prevalence of TB in the study population should reflect that expected in similar use-case settings.
Acceptable:
Limited to specific high-risk groups or key populations.

Participant selection
Ideal:
Minimise exclusion of participants.
Report the number of persons screened for eligibility, ineligible for enrolment, and enrolled.
Report reasons for ineligibility.
Acceptable:
Minimise exclusion of participants.
Report the number of persons screened for eligibility, ineligible for enrolment, and enrolled.

Design
Ideal:
Prospective or retrospective study of diagnostic accuracy.
The CAD programme should be used as it will be used in the clinical context (including additional procedures, e.g. a verification procedure prior to use of the software).
Acceptable:
Cross-sectional study of diagnostic accuracy.
If data from other studies are used retrospectively to evaluate a CAD solution, investigators should report how data were collected in the original study, the criteria for selecting data for the CAD evaluation, and the number of participants from the original study whose data were excluded.
The CAD programme should be used as it will be used in the clinical context (including additional procedures, e.g. a verification procedure prior to use of the software).

Reference standard
Ideal:
Microbiological reference standard of solid and/or liquid cultures performed on all participants regardless of CXR result.#
A combined clinical and microbiological reference standard can be considered. In that case, the final classification of patients should be well standardised or performed by an expert group blinded to the CAD result. A microbiological reference standard should always be reported in addition.
Acceptable:
Microbiological reference standard of NAAT performed on all participants regardless of CXR result.
For the systematic screening use case only: liquid culture or NAAT performed on participants whose CXR is classified as abnormal by a human reader, with pulmonary TB assumed absent in participants whose CXR is classified as normal by a human reader.

Comparison to human readers
Ideal:
A comparison to a human reader should be reported.
Human readers should be blinded to CAD and microbiological test results when interpreting CXRs, and should use simple two- or three-category readings [15].
The study should report the training and years of experience in radiographic reading of the human readers.
For each category of experience, there should be more than one human reader (e.g. at least two experts and at least two non-experts).
Inter-reader reliability should be reported.
Acceptable:
A comparison to a human reader should be reported.
Human readers should be blinded to CAD and microbiological test results when interpreting CXRs, and should use simple two- or three-category readings [15].
The study should report the training and years of experience in radiographic reading of the human readers.
Inter-reader reliability should be reported.

Index test
Ideal:
The CAD solution should not have been developed or trained with CXRs from the population being studied.
Acceptable:
Report whether the study site contributed CXRs for developing or training the CAD programme, the number of CXRs used, and whether this could affect the generalisability of study results to sites that did not contribute CXRs to software development or training.

Test performance and interpretation
Ideal:
Evaluate a pre-specified threshold score that is recommended for use in similar settings.
Explore diagnostic heterogeneity arising from age, gender, HIV infection and CD4 count, and smear status.
Acceptable:
Report accuracy measures across a range of pre-specified threshold scores.
Explore diagnostic heterogeneity arising from HIV infection and smear status.

Detection of important non-TB abnormalities
Ideal:
Include an assessment of the sensitivity of the CAD solution (compared to human readers) for identifying important non-TB abnormalities, defined as radiological findings that could require treatment (e.g. bacterial pneumonia) or further clinical and radiological follow-up (e.g. lung nodule/mass, pleural effusion).
The reference standard for important non-TB abnormalities should be a panel of at least one radiologist and one clinician, blinded to CAD results.
Acceptable:
Include an assessment of the sensitivity of the CAD solution (compared to human readers) for identifying important non-TB abnormalities, defined as radiological findings that could require treatment (e.g. bacterial pneumonia) or further clinical and radiological follow-up (e.g. lung nodule/mass, unfolded aorta, pleural effusion).
A single specialist physician serves as the reference standard for important non-TB abnormalities.

CXR: chest radiography; TB: tuberculosis; NAAT: nucleic acid amplification test.
#: A microbiological reference standard is preferred over human interpretation because the role of CXR in TB diagnosis is to select persons to undergo microbiological testing. Studies assessing the accuracy of CAD for detecting microbiologically confirmed TB provide evidence that is directly applicable to its role in the field, whereas studies using human reading as a reference standard provide evidence that is indirect and less generalisable.
Systematic screening use case (Reference standard, Acceptable): because of the large sample sizes of systematic screening studies, it may be unfeasible to perform a microbiological test on all participants; hence it would be reasonable to use a human reader's interpretation of a CXR as normal as the reference standard for the absence of pulmonary TB. In such studies, the human reader should have been instructed to "over-read" CXRs to lower the false-negative rate.
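The accuracy measures across pre-specified thresholds and the inter-reader reliability called for in the table can be computed in a few lines. The Python sketch below is illustrative only and rests on stated assumptions: each participant has one continuous CAD score and a binary microbiological reference result; a CXR counts as CAD-positive when its score is at or above the threshold; and inter-reader reliability is summarised with Cohen's kappa for two readers, which is one common choice rather than a measure prescribed here. The names `accuracy_at_thresholds`, `cohens_kappa` and `ThresholdAccuracy` are hypothetical, and the demo data are invented toy values, not study results.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ThresholdAccuracy:
    threshold: float
    sensitivity: float
    specificity: float


def accuracy_at_thresholds(
    scores: Sequence[float],
    tb_confirmed: Sequence[bool],
    thresholds: Sequence[float],
) -> List[ThresholdAccuracy]:
    """Sensitivity and specificity of CAD scores against a binary
    microbiological reference standard, at each pre-specified threshold.

    Assumption (hypothetical convention): CAD-positive means score >= threshold.
    """
    if len(scores) != len(tb_confirmed):
        raise ValueError("scores and reference results must have equal length")
    n_pos = sum(1 for y in tb_confirmed if y)
    n_neg = len(tb_confirmed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need reference-positive and reference-negative participants")

    results = []
    for t in thresholds:
        # True positives: reference-positive participants flagged by CAD.
        tp = sum(1 for s, y in zip(scores, tb_confirmed) if y and s >= t)
        # True negatives: reference-negative participants not flagged by CAD.
        tn = sum(1 for s, y in zip(scores, tb_confirmed) if not y and s < t)
        results.append(ThresholdAccuracy(t, tp / n_pos, tn / n_neg))
    return results


def cohens_kappa(reader_a: Sequence[str], reader_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two readers' categorical CXR readings."""
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("readers must rate the same, non-empty set of CXRs")
    n = len(reader_a)
    observed = sum(1 for a, b in zip(reader_a, reader_b) if a == b) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Agreement expected by chance, from each reader's marginal category frequencies.
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use one category only")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented toy data for illustration only (not study results).
    scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.1]
    tb = [True, True, True, False, False, False]
    for row in accuracy_at_thresholds(scores, tb, [0.5, 0.75]):
        print(row)
    print(cohens_kappa(["abnormal", "abnormal", "normal", "normal"],
                       ["abnormal", "normal", "normal", "normal"]))
```

Heterogeneity analyses (e.g. by HIV infection or smear status) could reuse `accuracy_at_thresholds` on each subgroup separately. A full analysis would normally also report confidence intervals, and more than two readers would call for a multi-rater statistic such as Fleiss' kappa; both are left out of this sketch for brevity.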