Abstract
Novel accurate tests are needed that identify individuals infected with Mycobacterium tuberculosis who have incipient disease and are likely to develop clinical tuberculosis (TB) in the near future to allow for targeted preventive treatment beyond the current risk groups. Recently, a target product profile was developed that outlines the minimal and optimal characteristics for such an incipient TB test. We describe an evaluation framework for generating evidence to inform the development of policy guidance for the use of such a new test by the World Health Organization. Two research objectives are addressed. 1) The predictive ability of an incipient TB test should be assessed in clinical evaluation studies that include the intended target population and follow-up of sufficient duration to observe whether individuals do or do not progress to clinical TB disease. 2) Studies are needed to evaluate the test under routine programmatic conditions and measure its impact on patient- or health-system-important outcomes. For both research objectives, study designs, methods and analysis are described, with the intent to inform the clinical development plans of test manufacturers, researchers and funders.
An evaluation framework to evaluate new tests that predict development of clinical tuberculosis http://ow.ly/zVZH30kZEsU
Introduction
Programmatic management of latent tuberculosis (TB) infection (LTBI) has been poor in many high-burden countries, but if scaled up together with other treatment and prevention strategies it could accelerate the progress towards TB elimination [1–3]. LTBI is currently identified through a response to in vivo or in vitro immune stimulation by Mycobacterium tuberculosis antigens as shown by a positive tuberculin skin test (TST) or interferon-γ release assay (IGRA) in the absence of clinical signs of TB [4]. These tests, however, poorly predict which individuals progress to clinical disease, with a positive predictive value for developing clinical TB over a 2-year period ranging between 1.5% and 6.8%, depending on the risk group and setting [5, 6]. In general, the positive predictive value is lower for populations with a high likelihood of M. tuberculosis infection acquired in the past rather than recently, hence the number needed to treat to prevent one TB case is higher. Targeted testing and treatment is therefore only recommended for individuals with a high probability of a recent M. tuberculosis infection or with increased risk of progression to TB disease [7, 8].
New tests are needed to identify with high accuracy individuals who are infected with M. tuberculosis and are likely to develop disease in the near future. Several promising correlates of risk have been described, although only a few of them have been validated thus far [9–11]. Recently, Cobelens et al. [12] proposed distinguishing two conceptually different tests for 1) detecting persistent infection and 2) identifying incipient TB. It was argued that incipient TB tests detect a phase of early disease during which pathology evolves due to active mycobacterial replication and an associated inflammatory response, in the absence of clinical signs of disease (table 1). Such tests are expected to have a higher specificity and positive predictive value for predicting progression to clinical TB disease than current LTBI tests. Moreover, their performance characteristics should be largely independent of the population studied. The tests are considered “rule-in” tests: a negative result provides limited information (because of the limited sensitivity), but a positive result indicates that symptomatic (i.e. clinical, overt) TB is likely to develop and therefore preventive treatment is indicated. Here, we outline an evaluation framework for new tests of incipient TB to inform assay manufacturers, researchers and funders about appropriate study designs to guide their clinical development plans and to generate evidence to inform the development of subsequent policy guidance by the World Health Organization (WHO). The process that was followed to reach consensus on the principles described here is depicted in table 2.
Definitions in tuberculosis (TB)
Consensus reaching process
Target product profile
The minimum specifications of an incipient TB test are described in a target product profile (TPP) [13]. This TPP was initiated by the Foundation for Innovative New Diagnostics, jointly developed with the Stop TB Partnership's New Diagnostic Working Group and WHO, and discussed in three expert group meetings (May 2015, July 2016 and February 2017). In short, such a test should be able to identify asymptomatic individuals with incipient TB who are likely to progress to clinical disease within the subsequent 2 years and who would therefore benefit from preventive treatment. A period of 2 years was chosen since progression to disease is highest shortly after infection; about half of the individuals who develop TB will do so within the first 2 years after infection [14]. In addition, evaluation of a recently identified “correlate of risk” assay based on a 16-gene mRNA blood signature showed that its predictive performance was greater nearer to the time of TB diagnosis and declined substantially after 2 years [15]. Our expectations of a test that measures incipient TB compared with tests that measure LTBI, persistent infection or clinical TB are outlined in more detail in the supplementary material.
WHO evaluation and subsequent development of policy guidance
Since 2008, WHO has followed the GRADE (Grading of Recommendations Assessment, Development and Evaluation) process for evidence synthesis and evaluation when developing new guidelines and policy recommendations [16, 17]. The evaluation framework presented here is meant to set a standard to generate the type of evidence that would be acceptable to support the GRADE process (“admissible evidence”) for WHO evaluation of an incipient TB test. Before entering into field evaluation studies, the test will have been scrutinised in earlier stages of development, to assess its reproducibility, robustness and variability under different circumstances. These early analytical studies, combined with data on test accuracy, are usually the focus for regulatory bodies to provide their approval. Requirements to obtain such approval are set out elsewhere [18–20]. However, to inform national and international guideline and policy development, evidence is required that looks beyond test accuracy, and demonstrates the effectiveness, cost-effectiveness and public health impact that can be anticipated when the product is applied in the clinical setting of intended use. Here, we describe the design standards for this last kind of study.
To generate such evidence, clinical evaluation studies are needed to assess the predictive ability of an incipient TB test. These studies are conducted in the intended target population and evaluate test performance in the absence of any additional intervention. To support a WHO policy guidance process, data generated from such studies should further be complemented by cost and transmission modelling in order to anticipate the patient and health system benefits. Health impact studies are needed to assess the intervention and measure its impact on patient- or health-system-important outcomes. Such studies are best done under routine programmatic conditions. We present here the design characteristics of the two types of studies.
Clinical evaluation studies: studies that aim to measure the test's predictive ability
WHO evaluation will be based on the ability of the test to predict the occurrence of TB disease. This requires longitudinal studies that follow a cohort of individuals over time, conducting the test in the intended target population, in settings where culture, Xpert MTB/RIF (Xpert) or Xpert Ultra are available to confirm or exclude incident clinical TB in the study population.
Study design and population
Studies assessing this question should be longitudinal (prospective), following a cohort of tested individuals at risk of TB progression (e.g. household contacts of smear-positive TB patients or HIV-infected individuals) and evaluating them over a specified duration (e.g. 2 years) for the occurrence of TB disease.
An alternative design is a case–control study nested within an (existing) cohort study, where samples from all individuals at risk of TB progression are taken at baseline and incident TB cases are captured either through active or passive follow-up. At study close-out, samples from individuals diagnosed with TB and a random subset of individuals who remained TB-free (controls) are then tested with the novel test. A random subset of individuals who were not diagnosed with TB at study close-out should be contacted to confirm that they remained TB-free. This design is less costly because fewer tests will be done. In these studies, as in others, active follow-up of all study participants is preferred to reduce the chance of loss to follow-up and potential selection bias.
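Because only a random subset of TB-free individuals is tested in the nested design, predictive values cannot be read directly from the tested samples, whereas sensitivity and specificity can. One common approach, sketched below in Python, is to weight the tested controls back to the full cohort. The sketch assumes that all incident cases are tested and that the control sampling fraction is known; the function and parameter names are hypothetical and not part of the framework.

```python
def reweighted_predictive_values(cases_pos, cases_neg,
                                 controls_pos, controls_neg,
                                 sampling_fraction):
    """Estimate cohort-level PPV and NPV from a nested case-control sample.

    Assumptions (illustrative only): every incident TB case was tested,
    and controls are a simple random sample of TB-free cohort members
    drawn with the given sampling fraction (0 < fraction <= 1).
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    weight = 1 / sampling_fraction
    # Scale tested controls up to the number of TB-free cohort members
    # they represent.
    est_false_pos = controls_pos * weight
    est_true_neg = controls_neg * weight
    ppv = cases_pos / (cases_pos + est_false_pos)
    npv = est_true_neg / (est_true_neg + cases_neg)
    return ppv, npv


# Hypothetical numbers: 50 cases (30 test-positive) and a 10% sample of
# TB-free individuals (35 test-positive, 160 test-negative).
ppv, npv = reweighted_predictive_values(30, 20, 35, 160, 0.1)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Confidence intervals for the reweighted estimates need to account for the sampling design and are not covered by this sketch.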
For reasons of efficiency, clinical evaluation studies, regardless of their design, should ideally enrol individuals with likely M. tuberculosis exposure, either recent or not, but with a high risk of disease progression. However, a careful assessment and weighing of the potential benefits (e.g. prevention of clinical TB) and harms (e.g. depriving individuals of preventive therapy when potentially beneficial for them) of participating in research of novel incipient TB tests is essential, to make sure that 1) the country's LTBI policies are followed, and 2) for individuals who are not covered by the country policy, equipoise can be assumed for the intervention and control arm.
Study methods
At study entry, prevalent symptomatic TB should be ruled out in accordance with current guidelines for starting preventive treatment. Studies should not attempt to rule out TB in a more rigorous way than is done routinely, e.g. by doing chest radiography or Xpert in individuals who do not have TB symptoms, as this might exclude cases of asymptomatic, incipient TB from the study population that the novel test is intended to identify.
Enrolled individuals should be tested with the novel test at least at baseline. Individuals are to be followed over time, irrespective of their initial test results, and evaluated for the occurrence of clinical TB at regular intervals (e.g. 3 or 6 monthly). The novel test may be repeated at the time-points when the end-point is being assessed to allow for evaluation of re-infection, disease progression or regression. The diagnosis of incident TB disease should be made blinded to the initial test result. Follow-up is preferably active to make sure that individuals are followed with the same rigour regardless of the result of their incipient TB test and to limit the risk of loss to follow-up. Passive follow-up (for most of the study period) with a scheduled visit at the end of the study period may be acceptable in places where migration is limited and systems are in place for tracing study participants. For nested case–control studies, all cases should be captured through robust registries and controls should be contacted to confirm that they remained TB-free. To prevent misclassification bias, ascertainment of incident TB disease should be with a highly specific test (e.g. culture or Xpert Ultra).
Data analysis
The primary study end-point is the cumulative incidence of bacteriologically confirmed clinical TB among individuals with a positive or negative incipient TB test at baseline. Secondary analyses may use other definitions for diagnosing and ruling out incident TB disease.
The predictive ability of the test can be expressed in different ways. We suggest reporting: 1) the test accuracy (sensitivity and specificity for predicting incident clinically confirmed TB); 2) its positive and negative predictive value for clinical TB within a pre-specified period, the corresponding number needed to screen to find one positive test, and the number needed to treat to prevent one incident TB case; 3) the relative risk of a positive compared with a negative test for clinical TB within a pre-specified period; and 4) the incidence rate of clinical TB after a positive and negative test, and corresponding incidence rate ratio.
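As an illustration only, the measures listed above can be derived from a 2×2 table of baseline test result against incident TB status, plus person-time for the incidence rates. The Python sketch below uses hypothetical class, function and parameter names. It assumes complete follow-up, and it approximates the number needed to treat as 1/(positive predictive value × efficacy), where `treatment_efficacy` is an assumed proportion of cases averted by preventive treatment among test-positive individuals; that input is not specified by the framework.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cohort counts by baseline test result and outcome (hypothetical)."""
    tp: int  # test-positive, progressed to clinical TB
    fp: int  # test-positive, remained TB-free
    fn: int  # test-negative, progressed to clinical TB
    tn: int  # test-negative, remained TB-free


def predictive_measures(t: TwoByTwo, treatment_efficacy: float) -> dict:
    """Return accuracy, predictive values, relative risk, NNS and NNT.

    Fails with ZeroDivisionError if a denominator is zero, e.g. when no
    test-negative individual progressed to TB.
    """
    n = t.tp + t.fp + t.fn + t.tn
    ppv = t.tp / (t.tp + t.fp)        # risk of TB after a positive test
    risk_neg = t.fn / (t.fn + t.tn)   # risk of TB after a negative test
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": ppv,
        "npv": 1 - risk_neg,
        "relative_risk": ppv / risk_neg,
        # Individuals screened per positive test found.
        "nns": n / (t.tp + t.fp),
        # Test-positives treated per incident case prevented (assumed model).
        "nnt": 1 / (ppv * treatment_efficacy),
    }


def incidence_rates(cases_pos, person_years_pos, cases_neg, person_years_neg):
    """Return incidence rates per person-year and their ratio."""
    ir_pos = cases_pos / person_years_pos
    ir_neg = cases_neg / person_years_neg
    return ir_pos, ir_neg, ir_pos / ir_neg


# Hypothetical cohort of 1000 individuals with 50 incident TB cases.
measures = predictive_measures(TwoByTwo(tp=30, fp=170, fn=20, tn=780),
                               treatment_efficacy=0.8)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Interval estimates would normally accompany each point estimate; they are omitted here to keep the sketch short.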
All suggested outcomes can be measured in the same study. Outcomes may be shown for the total duration of follow-up (e.g. 2 years) as well as for shorter durations (i.e. 3, 6 or 12 months) to demonstrate possible decreases in predictive ability with increasing time between sample collection and onset of TB disease. Examples of such analyses were conducted by Zak et al. [15] who assessed the predictive ability of an RNA signature for incident clinical TB disease in a prospective cohort study of adolescents in South Africa, as well as others who assessed the relation between the presence of the correlate of risk and time to onset of TB [10, 11, 21]. For tests with a quantitative readout, different cut-offs for a positive result and trade-offs between sensitivity and specificity may be explored and presented, e.g. through a receiver operating characteristic curve.
Where feasible, studies should record and show stratified results for a range of different variables, including the history of previous TB disease, age, sex, bacille Calmette–Guérin vaccination status, risk of re-exposure (high/low-incidence country) and comorbidities (table 3). Information on individual TST and IGRA results would allow direct comparison of new tests with these existing LTBI tests and is highly recommended, even though the TST or IGRA should not be used as the reference standard. Depending on the biological mechanism the novel test is measuring, it may be advisable to perform the TST only after the incipient TB test has been performed, in order to avoid a boosting phenomenon. To inform policy, subgroup analyses or separate studies that include populations of special interest will be required, including, but not limited to, children, people living with HIV, individuals with other types of immunodeficiency (e.g. under tumour necrosis factor-α inhibitor therapy), diabetic patients and individuals with extrapulmonary TB or a history of prior TB or LTBI treatment.
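For a quantitative readout, the trade-off between sensitivity and specificity across cut-offs can be tabulated as in the minimal Python sketch below. It assumes that higher scores indicate higher risk and that a result counts as positive when the score is at or above the cut-off; the function name and the example data are hypothetical.

```python
def roc_points(scores, progressed):
    """Return (cut-off, sensitivity, specificity) for each distinct score.

    scores: quantitative test readouts at baseline.
    progressed: booleans, True if the individual developed clinical TB
    during follow-up. A result is positive when score >= cut-off.
    """
    n_cases = sum(progressed)
    n_controls = len(progressed) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one progressor and one non-progressor")
    points = []
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, p in zip(scores, progressed)
                       if p and s >= cutoff)
        false_pos = sum(1 for s, p in zip(scores, progressed)
                        if not p and s >= cutoff)
        points.append((cutoff, true_pos / n_cases, 1 - false_pos / n_controls))
    return points


# Hypothetical readouts for four individuals, two of whom progressed.
for cutoff, sens, spec in roc_points([0.1, 0.4, 0.35, 0.8],
                                     [False, False, True, True]):
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Plotting sensitivity against 1 − specificity for these points gives the receiver operating characteristic curve; repeating the tabulation for shorter follow-up windows would show how the curve changes with time to disease.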
List with minimum variables to measure in studies evaluating an incipient tuberculosis (TB) test
Challenges
Clinical evaluation studies of an incipient TB test pose a number of design challenges. In areas where TB incidence is low, one might not find sufficient eligible individuals with a history of (recent) M. tuberculosis exposure to enrol. Individuals who will receive preventive treatment according to guidelines cannot be included in the study without introducing bias, because preventive treatment reduces the risk of progression to clinical disease and individuals who decline preventive treatment may not be representative of all individuals who were at risk initially. Therefore, these studies should solely include individuals who are currently not routinely recommended for preventive treatment, such as individuals assigned to a nonintervention arm in randomised controlled trials (e.g. post-exposure vaccination trials). Another possibility is to take HIV-uninfected individuals with a positive test who, according to national guidelines, are not recommended for preventive treatment and randomise them to either preventive treatment or placebo, as is done in the Correlate of Risk Targeted Intervention Study (CORTIS) in South Africa. In this trial, HIV-uninfected individuals with a positive RNA signature are randomised to receive a course of 3 months of weekly high-dose isoniazid and rifapentine (“3HP”) or no preventive treatment (ClinicalTrials.gov identifier NCT02735590) [22]. As all individuals with a positive RNA signature and a sample of those with a negative signature are followed, the predictive ability of the RNA signature for incident TB disease can be determined (figure 1).
Example of study design for the clinical evaluation of a novel incipient tuberculosis (TB) test. Study design is based on the Correlate of Risk Targeted Intervention Study (CORTIS) trial (ClinicalTrials.gov identifier NCT02735590) [22]. RR: risk ratio; IR: incidence rate; IRR: incidence rate ratio; NNS: number of individuals needed to screen to find a positive test; NNT: number of individuals needed to treat to prevent one incident TB case. Reproduced and modified from [13] with permission.
Another challenge is the low disease progression rate. Even in populations that carry an increased risk for reactivation, the cumulative TB incidence usually does not exceed 5% over a period of 2 years [23–25]. Studies therefore require large sample sizes to ensure that sufficient numbers of incident TB cases are observed during follow-up. For example, the CORTIS trial will be screening 10 000 HIV-uninfected individuals living in a highly endemic community in order to enrol 1500 test-positive and 1700 test-negative individuals [22].
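To give a rough sense of scale, the number of participants needed to observe a given number of incident TB cases can be approximated from the expected cumulative incidence. The Python sketch below is a back-of-the-envelope calculation, not a formal sample size method; the function name, the target number of cases and the loss-to-follow-up adjustment are hypothetical assumptions.

```python
import math


def required_enrolment(target_cases, cumulative_incidence,
                       loss_to_follow_up=0.0):
    """Approximate enrolment needed to observe `target_cases` TB cases.

    cumulative_incidence: expected proportion developing clinical TB over
    the follow-up period (e.g. 0.05 for 5% over 2 years).
    loss_to_follow_up: assumed proportion of participants whose outcome
    is never ascertained (simplifying assumption: lost before any event).
    """
    if not 0 < cumulative_incidence <= 1:
        raise ValueError("cumulative_incidence must be in (0, 1]")
    if not 0 <= loss_to_follow_up < 1:
        raise ValueError("loss_to_follow_up must be in [0, 1)")
    observed_risk = cumulative_incidence * (1 - loss_to_follow_up)
    return math.ceil(target_cases / observed_risk)


# Hypothetical: 50 cases wanted, 5% cumulative incidence, 10% attrition.
print(required_enrolment(50, 0.05, loss_to_follow_up=0.1))
```

A formal calculation would start from the desired precision of the sensitivity or predictive value estimates rather than from a fixed case count.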
Finally, TB re-infections may occur during the study period after the test was conducted. The rate of re-infection will be higher with higher TB incidence in the population in which the study is conducted. Re-infection may lead to misclassification bias in the accuracy estimates of the test depending on the re-infection rate and the length of follow-up. Since the re-infection rate may be modified by (partial) immunity due to existing LTBI, and differ between those tested positive and those tested negative, the magnitude and direction of this bias (under- or overestimation of the predictive values of the test) is difficult to predict. Potential misclassification bias can be minimised by shortening the follow-up period (especially in studies conducted in high-incidence settings), by repeating the test every 3 or 6 months during the study period and assessing its predictive ability for different lengths of follow-up, and by avoiding enrolment of individuals who are at repeated risk of TB exposure, such as healthcare workers exposed to TB patients (especially in studies conducted in low-incidence settings).
Health impact studies: studies to evaluate patient- or health-system-important outcomes
The second research objective to inform WHO evaluation addresses the potential impact of an incipient TB test on patient- or health-system-important outcomes when used in routine practice. Studies addressing this objective are conducted in settings of intended use, such as nontertiary care hospitals, DOTS (directly observed treatment, short-course) centres or primary healthcare facilities. Importantly, these studies assess the effectiveness and impact of the test when used to guide treatment decisions. Results of these studies may be used in subsequent impact and cost-effectiveness modelling studies, which further assess the potential public health impact of the test.
Research questions related to the public health impact of an incipient TB test include the following. 1) What is the effectiveness of the test for reducing incident TB when combined with a strategy offering preventive treatment upon a positive test? 2) Is the test, combined with preventive treatment, a cost-effective strategy to reduce clinical TB in the intended target population? 3) Is the test, combined with preventive treatment, a more effective and cost-effective strategy compared with the current standard (e.g. alternative LTBI test-and-treat strategies)? 4) What is the effect of the test combined with preventive treatment on the occurrence of adverse effects (e.g. hepatotoxicity) when compared with strategies based on the TST and/or IGRA? 5) What is the effect of the test combined with preventive treatment on the uptake and adherence to therapy?
While health impact studies could run in parallel with clinical evaluation studies, some ethical review boards may require data from clinical evaluation studies indicating that the novel test predicts incident TB disease at least as well as current LTBI tests, such that equipoise can be assumed.
Study design and population
Studies aiming to answer the aforementioned research questions may compare a test-and-treat strategy using the novel test with the current standard and be individually or group randomised [22, 24, 26]. The standard may be TST and/or IGRA testing followed by preventive treatment, or no LTBI testing in settings where this is not routinely done. Alternative study designs may include stepped-wedge trials, although these may have limitations with regard to their interpretation [27].
An example study design for a pragmatic randomised controlled trial is given in figure 2. Individuals or clusters are randomly assigned to receiving either the standard of care (in this example the TST and/or IGRA) or the novel test. Individuals in both arms are offered preventive treatment when their test is positive. All individuals, irrespective of their test results, are followed for the occurrence of incident TB disease. At study close-out, the cumulative number of incident TB cases, the number of patients provided preventive treatment, the frequency of adverse events, and patient and health system costs are compared between trial arms.
Example of study design for the evaluation of health impact. TST: tuberculin skin test; IGRA: interferon-γ release assay; TB: tuberculosis; AE: adverse event; NNS: number of individuals needed to screen to find one positive test; NNT: number of individuals needed to treat to prevent one incident TB case. Reproduced and modified from [13] with permission.
To inform international guideline development, study populations should include the intended (future) target population for the test, as described in the TPP. Studies may be conducted in low- and high-incidence countries, and could have a similar design. As studies evaluating the public health impact will offer preventive treatment to those tested positive, there is less potential for (indication) bias than for clinical evaluation studies.
Study methods
All individuals enrolled in the study are followed for the same pre-specified period, at least 1 year after the completion of preventive treatment, irrespective of their test result and irrespective of whether they receive preventive treatment or not. Follow-up after completion of preventive treatment is done to measure the incidence of TB post-treatment. The whole study population is assessed for the occurrence of TB disease at the end of the study period and identified cases classified as clinical TB are included in the outcome. Ideally, the outcome assessment is done blinded to the initial test result to avoid differential verification or incorporation bias. Active follow-up (e.g. screening participants for TB symptoms during 3-monthly visits) is preferred over passive follow-up, in particular to limit cohort attrition, which may otherwise be more likely to happen in the group with no preventive treatment. Ascertainment of incident clinical TB disease may be done according to routine practice, i.e. following national guidelines for diagnosing clinical TB.
Data analysis
At analysis the outcomes (e.g. incidence of TB disease, costs and occurrence of side-effects) in the arm that received the novel test-and-treat strategy are compared with those in the alternative arm. The primary analysis should be based on the intention-to-treat cohort, which includes all individuals who were enrolled in the arm they were allocated to, irrespective of whether they adhered to all interventions in their assigned arm.
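The intention-to-treat comparison described above can be sketched as follows for an individually randomised trial. This Python example is a simplified illustration: the record fields (`allocated_arm`, `incident_tb`, `adhered`) and arm labels are hypothetical, and clustering, censoring and confidence intervals are ignored.

```python
def itt_comparison(participants):
    """Compare cumulative TB incidence between arms by allocation.

    participants: iterable of dicts with keys 'allocated_arm'
    ('novel' or 'standard') and 'incident_tb' (bool). Any adherence
    information is deliberately ignored: everyone is analysed in the
    arm they were randomised to.
    """
    counts = {"novel": [0, 0], "standard": [0, 0]}  # [cases, total]
    for person in participants:
        arm = counts[person["allocated_arm"]]
        arm[0] += int(person["incident_tb"])
        arm[1] += 1
    risk_novel = counts["novel"][0] / counts["novel"][1]
    risk_standard = counts["standard"][0] / counts["standard"][1]
    return {
        "risk_novel": risk_novel,
        "risk_standard": risk_standard,
        "risk_difference": risk_novel - risk_standard,
        "risk_ratio": risk_novel / risk_standard,
    }


# Hypothetical trial: 100 participants per arm, some of whom did not adhere.
trial = (
    [{"allocated_arm": "novel", "incident_tb": i < 2, "adhered": i % 4 != 0}
     for i in range(100)]
    + [{"allocated_arm": "standard", "incident_tb": i < 5, "adhered": True}
       for i in range(100)]
)
print(itt_comparison(trial))
```

A per-protocol analysis restricted to adherent participants could be reported as a secondary analysis, but it no longer benefits from the balance provided by randomisation.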
The minimum list of variables to be collected is as suggested for studies of predictive ability (table 3). In addition, data should be collected on the acceptance of the novel test, acceptance of preventive treatment when test-positive, adverse events, cost of the whole test-and-treat intervention and that of the alternative strategy. In addition to a direct comparison of the effectiveness of the strategy, the study may also report on the cost and cost-effectiveness of the tested strategy and the occurrence of adverse effects. All these outcomes inform the positive and negative implications of scaling up the novel test-and-treat strategy and its potential budget implications.
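One simple way to summarise the cost-effectiveness comparison is the incremental cost per TB case averted. The Python sketch below is an assumption about how such a summary could be computed, since the framework does not prescribe a specific metric; the function name and the figures are hypothetical.

```python
def incremental_cost_per_case_averted(cost_novel, cost_standard,
                                      cases_novel, cases_standard):
    """Incremental cost per incident TB case averted by the novel strategy.

    Costs are total strategy costs per arm (testing, preventive treatment,
    management of adverse events); cases are incident TB cases per arm.
    Arms are assumed to be of equal size.
    """
    cases_averted = cases_standard - cases_novel
    if cases_averted == 0:
        raise ValueError("no difference in cases; ratio is undefined")
    return (cost_novel - cost_standard) / cases_averted


# Hypothetical totals for two equally sized arms.
print(incremental_cost_per_case_averted(120_000, 90_000, 20, 50))
```

A negative result needs careful interpretation, because it can mean either that the novel strategy is cheaper and more effective or that it is costlier and less effective.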
While the research questions outlined earlier serve to generate a minimum set of evidence, other analyses may be worthwhile to further explore using the data from the studies described. For instance, the predictive utility of different cut-off levels of the test may be explored in different subgroups. Furthermore, the predictive ability of the test may be improved when combined with other patient parameters [28]. Lastly, the long-term public health impact may be assessed by varying cut-offs or prediction models in combination with different preventive treatment regimens.
Concluding remarks
We described the design standards for two sets of research questions that need to be addressed to inform development of policy guidance for the use of novel incipient TB tests by WHO, with the intent to help test developers, manufacturers and others to design appropriate studies and report informative outcome measures.
For clinical evaluation, comparative studies, while providing the highest quality of evidence when using randomisation of subjects, are hampered by high cost. A way to minimise these costs is to design the study such that multiple research questions can be answered using the same design, of which several examples are known (ClinicalTrials.gov identifier NCT02735590 [24, 29]). Another option is to make use of stored specimens (sample banks) that were collected in longitudinal studies and retrospectively analyse the test performance in a nested case–control study [10, 11, 15]. For health impact studies, an alternative is to use mathematical modelling approaches to estimate the potential population-level impact of the test-and-treat intervention under different circumstances.
Altogether, the development of novel, highly specific tests that identify with high accuracy individuals with incipient TB who are likely to develop clinical TB may fill an important gap in the existing repertoire of TB diagnostics. When combined with highly effective preventive treatment in the intended target population, such tests can accelerate the progress towards TB elimination.
Supplementary material
Online supplement ERJ-00946-2018_Supplement
Acknowledgements
We thank the Stop TB Partnership New Diagnostics Working Group (NDWG) for their coordination of the Task Force on “Latent TB infection and test of progression” and coordinating the expert and stakeholders’ consultation meetings. We thank Alessandra Varga (NDWG, Geneva, Switzerland) for critical review of an earlier version of this manuscript and for facilitating the expert meetings. We also thank all those who participated in the technical expert consultation meeting on February 8, 2017: Dick Menzies (McGill University, Montreal, QC, Canada), Helen Ayles (LSHTM/Zambart, London, UK), Pauline Beattie (EDCTP, The Hague, The Netherlands), Grania Brigden (The Union, Paris, France), Hanif Esmail (University of Oxford, Oxford, UK), David Lewinsohn (University of Portland, Portland, OR, USA), Thomas Scriba (University of Cape Town, Cape Town, South Africa), Marieke van der Werf (ECDC, Stockholm, Sweden), Kevin Windthrop (Oregon HS, Portland, OR, USA), Jean-Pierre Zellweger (Swiss Lung Association, Bern, Switzerland), Ruvandhi Nathavitharana (TB Proof, South Africa), Khairunisa Suleiman (Global TB Community Advisory Board Diagnostic Workgroup, Nairobi, Kenya), Norbert Ndjeka (Drug-Resistant TB, TB and HIV, National Dept of Health, Pretoria, South Africa), Rohit Sarin (National Institute of TB and Respiratory Diseases, New Delhi, India), Irina Vasilieya (Ministry of Health, Moscow, Russian Federation), Nguyen Viet Nhugn (National TB Programme and National Lung Hospital, Hanoi, Vietnam), René Becker-Burgos (Global Fund, Geneva, Switzerland), Thomas Forissier (Bill and Melinda Gates Foundation, Seattle, WA, USA), Alexandra Asbach-Nitzsche (Lophius, Regensburg, Germany), Jeff Boyle (Qiagen, Germantown, MD, USA), Anke Coblenz (Abbott, Des Plaines, IL, USA), William Cruikshank (Oxford Immunotec, Oxford, UK), Philippe Jacon (Cepheid, Maurens-Scopont, France), Masae Kawamura (Qiagen, Germantown, MD, USA), Oksana Markova (Generium, Moscow, Russian Federation), Chris Novak (Roche, Basel, Switzerland), Morten Ruhwald (Staten Serum Institute, Copenhagen, Denmark), Andrae Vinson (BD, Research Triangle Park, NC, USA), Haileyesus Getahun (WHO, Geneva, Switzerland), Yohhei Hamada (WHO, Geneva, Switzerland), Alexei Korobitsyn (WHO, Geneva, Switzerland), Karin Weyer (WHO, Geneva, Switzerland).
Footnotes
This article has supplementary material available from erj.ersjournals.com
Conflict of interest: C.M. Denkinger: FIND (Foundation for Innovative New Diagnostics) is a not-for-profit foundation, whose mission is to find diagnostic solutions to overcome diseases of poverty in low- and middle-income countries (LMICs). It works closely with the private and public sectors and receives funding from some of its industry partners. It has organisational firewalls to protect it against any undue influences in its work or the publication of its findings. All industry partnerships are subject to review by an independent Scientific Advisory Committee or another independent review body, based on due diligence, target product profiles and public sector requirements. FIND catalyses product development, leads evaluations, takes positions and accelerates access to tools identified as serving its mission. It provides indirect support to industry (e.g. access to open specimen banks, a clinical trial platform, technical support, expertise, laboratory capacity strengthening in LMICs) to facilitate the development and use of products in these areas. FIND also supports the evaluation of prioritised assays and the early stages of implementation of WHO-approved (guidance and prequalification) assays using donor grants. In order to carry out test validations and evaluations, FIND has product evaluation agreements with several private sector companies, which strictly define its independence and neutrality vis-a-vis the companies whose products get evaluated, and describe roles and responsibilities.
- Received March 9, 2018.
- Accepted July 12, 2018.
- The content of this work is copyright of the authors or their employers. Design and branding are copyright ©ERS 2018.