Epidemiological cut-off values for a 96-well broth microdilution plate for high-throughput research antibiotic susceptibility testing of M. tuberculosis

Drug susceptibility testing of M. tuberculosis is rooted in a binary susceptible/resistant paradigm. While there are considerable advantages in measuring the minimum inhibitory concentrations (MICs) of a panel of drugs for an isolate, it is necessary to measure the epidemiological cut-off values (ECOFF/ECVs) to permit comparison with qualitative data. Here we present ECOFF/ECVs for 13 anti-tuberculosis compounds, including bedaquiline and delamanid, derived from 20 637 clinical isolates collected by 14 laboratories based in 11 countries on five continents. Each isolate was incubated for 14 days on a dry 96-well broth microdilution plate and then read. Resistance to most of the drugs due to prior exposure is expected and the MIC distributions for many of the compounds are complex, and therefore a phenotypically wild-type population could not be defined. Since a majority of samples also underwent genetic sequencing, we defined a genotypically wild-type population and measured the MIC of the 99th percentile by direct measurement and via fitting a Gaussian using interval regression. The proposed ECOFF/ECVs were then validated by comparing with the MIC distributions of high-confidence genetic variants that confer resistance and with qualitative drug susceptibility tests obtained via the Mycobacterial Growth Indicator Tube (MGIT) system or Microscopic-Observation Drug Susceptibility (MODS) assay. These ECOFF/ECVs will inform and encourage the more widespread adoption of broth microdilution: this is a cheap culture-based method that tests the susceptibility of 12–14 antibiotics on a single 96-well plate and so could help personalise the treatment of tuberculosis.


Introduction
Mycobacterium tuberculosis kills more people worldwide than any other single pathogen, severe acute respiratory syndrome coronavirus 2 excepted [1]. Despite its impact on global health, antibiotic susceptibility testing (AST) for M. tuberculosis has lagged behind other bacterial diseases due to its slow growth rate, difficulty in culturing and its low prevalence in high-income countries. The consequence is that most patients in the world receive empiric, or semi-empiric, treatment, which reduces the chance of treatment success and risks the amplification of resistance where too few effective drugs are prescribed.
infrastructure dependent, and comprehensive AST is the only way to optimise treatment regimens and outcomes.
The Comprehensive Research Prediction for Tuberculosis: an International Consortium (CRyPTIC) research project has collected 20 637 clinical M. tuberculosis samples from across the world. The primary aim of the project is to identify mutations in the M. tuberculosis genome that confer phenotypic resistance to a wide range of antibiotics. The CRyPTIC project measured minimum inhibitory concentrations (MICs) of each drug to permit quantitative analyses, associating mutations with MIC values with a view to using genome sequencing data to personalise drug regimens and doses. From the start the CRyPTIC project has taken a data-driven approach whereby all analyses are algorithmic, hence the allocation of a sample to a subgroup requires little or no expert, and hence subjective, intervention. This has the virtue of ensuring the results are reproducible.
The most practical and affordable means of determining MICs at scale was to use a pre-prepared 96-well 7H9 broth microdilution plate based on the Thermo Fisher Sensititre MYCOTB MIC plate [13][14][15][16][17][18], but including the new or repurposed antibiotics that feature in current World Health Organization (WHO) guidance [19]. The CRyPTIC project designed a variant of the MYCOTB plate, called UKMYC5, that contains 14 antibiotics, including bedaquiline (BDQ), delamanid (DLM), clofazimine (CFZ) and linezolid (LZD) but not pyrazinamide (PZA) (figure 1a). Based on a multi-laboratory study that examined the inter- and intra-laboratory reproducibility of the UKMYC5 plate, and determined the optimum reading methods and incubation period [17], CRyPTIC subsequently modified the design by removing para-aminosalicylic acid (PAS) and extending/changing the concentration of certain drugs, leading to the 13-drug UKMYC6 plate (figure 1b).
In this article we propose epidemiological cut-off values (ECOFF/ECVs) for the UKMYC series of plates to enable subsequent research on this dataset. The ECOFF/ECV is the highest MIC observed within a phenotypically wild-type ( pWT) population, usually defined as the MIC which encompasses 99% of that population [20], and allows interpretation of an MIC value as "susceptible" or "resistant", which is crucial to the decision on whether to prescribe a drug. The standard approach requires uncensored MICs and assumes that the pWT population can be readily identified, either because the population has been minimally exposed to the drug or because the MIC distribution is strongly bimodal. These conditions are not universally met in our dataset and we shall therefore identify a genotypically wild-type (gWT) population from which we can either measure the ECOFF/ECV directly or via a Gaussian fitted using interval regression, a statistical technique that can fit to censored data. We have made Python code publicly available that enables anyone to reproduce most of the figures and tables in a web browser window [21]. Although ECOFF/ECVs have been proposed for the MYCOTB microdilution plate using 385 strains from South Africa [22], here we are able to draw upon a far larger and more geographically diverse M. tuberculosis dataset.

Results
AST was performed on 20 637 isolates to 13 anti-tuberculosis (TB) drugs using either the UKMYC6 (12 672 (61%)) or the UKMYC5 (7965 (39%)) plate design (table 1, and figure 1a and b). These data were generated in 14 CRyPTIC laboratories based in 11 countries on five continents (figure 1c and  supplementary table S1). The isolates themselves were collected from 27 countries, with 19 countries contributing 10 or more isolates and 15 countries contributing 100 or more isolates (supplementary table  S2). Due to differences between the laboratories, it was not possible to collect clinical outcome data for the samples. Quality control processes detected that one laboratory developed a problem inoculating the plates (these plates were removed) and that another laboratory never managed to inoculate successfully (all their plates were excluded). Excluded these left 17 054 plates.
A previous study demonstrated that MICs measured by a single laboratory scientist after 14 days incubation of the UKMYC5 plate using either a Thermo Fisher Vizion instrument or a mirrored-box were reproducible and accurate [17]. As a further reproducibility check we pooled the MIC measurements of the H37Rv reference strain that were taken as part of our quality control process (supplementary figure S2 and  supplementary table S6); the histograms for both UKMYC plates showed that the majority of MICs measured by the laboratory scientists for many, but not all, of the drugs lay within one doubling dilution of the mode. Since the magnitude of MIC measurement error is anticipated to be much greater than the error in the genetic sequencing, we constructed an MIC quality assurance (QA) process to minimise the measurement error of the MICs (supplementary figure S3). This measured each MIC using up to three independent methods and only MICs where two of these methods concur are allowed into the final dataset.
Overall, 77% of all MIC measurements passed the MIC QA process.

MIC histograms are different for different drugs
As expected, the MIC histograms differ between drugs (figure 2, and supplementary figures S5 and S6); the MICs for some compounds form bimodal distributions (isoniazid (INH), kanamycin (KAN), amikacin (AMI), rifampicin (RIF) and rifabutin (RFB)) and therefore conform to the classical binary paradigm whereby an isolate is either "resistant" or "susceptible". CRyPTIC aimed for half the isolates collected to be multidrug resistant (MDR), and the MIC histograms for INH and RIF are consistent with this. Given this bias towards MDR in the dataset, one would expect appreciable resistance to ethambutol (EMB), ethionamide (ETH), both fluoroquinolones and both aminoglycosides. Both drugs belonging to the latter class indeed have a subset of isolates with very high MICs. The MIC histograms for the remaining compounds (EMB, ETH, moxifloxacin (MXF) and levofloxacin (LEV)) are not bimodal, hence it is unclear whether they can be adequately described by two log-normal distributions. Since the remaining drugs on the plates (BDQ, DLM, CFZ and LZD) have not yet been widely used, and for some countries were not even available to treat TB, one expects little resistance in the dataset and hence it is likely these MIC histograms are "phenotypically wild-type" ( pWT). All the MIC histograms are truncated/censored at either one or both ends and some are severely truncated with the mode MIC occurring in the lowest dilution (AMI, RFB and DLM). Our large dataset allows us to use reproducible, algorithmic approaches for estimating the 99th percentile of the wild-type population.
Iteratively fitting a log-normal distribution ECOFFinder is a heuristic approach that attempts to iteratively fit a log-normal distribution to the MIC histogram, and is recommended by both the European Committee on Antimicrobial Susceptibility Testing (EUCAST) and the Clinical and Laboratory Standards Institute (CLSI) [20,25]. EUCAST advise that ECOFFinder should not be applied to truncated data, but here we apply it to demonstrate how it performs for different levels of censored data. Distributions derived using ECOFFinder (figure 2, supplementary figure S6 and table 2) describe our data well where the MIC histogram is minimally truncated (ETH, LZD and BDQ); however, where the MIC histogram is heavily truncated (AMI, RFB and DLM) the resulting log-normal distribution does not fit the MIC histogram and where the mode MIC is resistant (RIF) it fails to perform a fit at all. In addition, since ECOFFinder requires a single consistent MIC distribution and our dataset is composed of two plate designs, two ECOFF/ECVs are returned for each drug. For many drugs these are very similar, but for EMB and DLM the estimates are almost a doubling dilution different.
To overcome the problem that our MIC histograms are truncated we applied interval regression, an established statistical method for fitting normal distributions to truncated data [26,27] which, unlike conventional maximum-likelihood algorithms, takes into account that observations are properly represented by intervals. The entire dataset containing measurements from both plate designs can then be considered simultaneously, resulting in a single pair of log-normal distributions that describe the MIC histograms on both plate designs (supplementary figure S7). The model fails to converge for KAN and EMB, and for several drugs the second distribution has a variance much larger than the MIC range, which is nonsensical (AMI, ETH, RFB, CFZ, LZD, DLM and BDQ), although for the new and repurposed compounds this is understandable since we do not expect many resistant isolates. Where the two distributions describe the data reasonably well (INH, RIF, MXF and LEV), they are well separated, as defined by the 99th percentile of the lower distribution (ECOFF/ECV) being smaller than the 1st percentile of the upper distribution (the non-wild-type cut-off value (NCOFF)), with the exception of INH where NCOFF<ECOFF.
Defining a gWT population Using these approaches we were not able to produce acceptable results when the MIC histogram is truncated and/or is not clearly bimodal. In the latter case it is probable that the overall MIC histogram is a convolution of several smaller, narrower distributions. Genetics offers a way to disentangle these subpopulations: one can predict genetically the susceptibility of strains to most, but not all, of the 13 anti-TB compounds of interest [6][7][8]. Note that we were unable to use the newer and more comprehensive genetic catalogue released by the WHO since its derivation set included these samples [9,10]. We predicted the antibiogram for the first-line (INH, RIF, EMB and also PZA; see Discussion) and second-line (AMI, KAN, LEV, MXF and ETH) compounds (Methods). No predictions were made for the other anti-TB compounds on the plate since the association between genetics and their resistance is poorly understood at present.
We defined an isolate as being gWT if it is predicted to be susceptible to the four first-line compounds and not resistant to the second-line compounds (see [6] for the distinction). The laxer criterion for the second-line compounds allowed for the fact that our understanding for these drugs is less complete. Epidemiologically, it is the case that if an isolate is susceptible to all four first-line antibiotics, it is also likely to be susceptible to second-line antibiotics (except perhaps for prior fluoroquinolone exposure or deeply rooted second-line resistance mutations). To contribute, a laboratory had to have collected susceptible samples which had undergone whole-genome sequencing and also had a high-quality   table S9). Visually, the resulting MIC histograms are simpler and more likely to be adequately described by a single log-normal distribution (supplementary figure S8).
Directly measuring the ECOFF/ECV from the gWT population Directly determining the MIC of the 99th percentile from the gWT population is an attractive option since it requires no further assumptions. This is not usually possible since typically either one cannot discern the wild-type population and/or there are an insufficient number of isolates. The large size of our dataset and the inclusion of genetic information enables us to directly measure the ECOFF/ECV (figure 3 and supplementary figure S9). Our dataset is enriched for resistance, hence the proportion of resistant samples misclassified as susceptible due to sample mislabelling is likely to be of the order of a few percentage points, even after we have removed some putative mislabelled samples (Methods). This makes directly identifying the 99th percentile challenging, hence we shall also consider the 97.5th and 95th percentiles.
All three percentiles for the MIC histograms of the gWT population are at most two doubling dilutions apart, except for LEV (UKMYC5) and INH (UKMYC6). The latter has an appreciable number of isolates that, despite being classified as gWT, have elevated MICs. These are likely due to some remaining samples that were mislabelled and illustrates the difficulty in using the 99th percentile to define an ECOFF/ECV due to its sensitivity to errors in the dataset, especially when the prevalence of resistance is high, as in the case for INH in our dataset. We shall take forward the values for the 99th percentiles (table 2) but will bear in mind that a high amount of variation may indicate strain mislabelling.

Interval regression takes account of the truncated distributions
To avoid the identification of the 99th percentile being disproportionately affected by a small number of mislabelled resistant samples one usually fits a log-normal distribution to the pWT (here gWT) population and then calculates from the resulting function the MIC of the 99th percentile. We cannot apply ECOFFinder here since its heuristic requires the presence of non-susceptible isolates in the distribution, so we instead simultaneously fit a single log-normal distribution using interval regression to the MIC histograms from both plate designs (figure 4, supplementary figure S10 and table 2). With the exceptions of INH, RFB and DLM, the resulting log-normal distributions describe the MIC histograms well, even when there is moderate truncation due to the plate design. The RFB MIC distribution is, however, extremely truncated and hence there are insufficient data to perform a fit; the concentration range for this drug should be lowered in future designs.

Proposed ECOFF/ECVs
We infer that direct measurement is the most reliable method since it makes the fewest assumptions. However, for drugs where there is variation of more than a doubling dilution between the 95th, 97.5th and 99th percentiles, which may indicate the gWT population includes a small but unknown number of isolates with elevated MICs, we shall place greater weight on the result obtained by interval regression. When the MIC histogram is not heavily truncated, we shall also include the 99th percentile reported by ECOFFinder. Note that to convert an MIC that is reported as a real number into an ECOFF/ECV it should be rounded up to the next value in the doubling dilution series. All these data and the resulting ECOFF/ECVs are shown in figure 5 and table 2.   The 99th percentile determined by direct measurement for INH for the UKMYC6 dataset was discounted due to the large range of MICs spanned by the percentiles; this is most likely due to resistant samples misidentified as susceptible due to laboratory mislabelling. INH has the highest prevalence of resistance in the dataset and hence would be affected most. Our starting point is therefore the corresponding value for the UKMYC5 dataset (0.1 mg·L −1 ). The ECOFFinder results were ignored for INH since the gWT MIC histogram is truncated on both plate designs and hence the fits were poor. Visually, the gWT MIC histograms (supplementary figure S8) do not appear to follow a log-normal distribution and consequently   FIGURE 4 Interval regression is able to fit a log-normal distribution to the minimum inhibitory concentration (MIC) histograms of the genotypically wild-type (gWT) isolates for all 13 drugs on the UKMYC6 plate. Data from both plate designs were considered simultaneously, hence the resulting distributions are those the algorithm considers to best describe both the UKMYC5 (supplementary figure S10) and UKMYC6 datasets. See figure 1 for drug abbreviations. See the TSV file in the supplementary material for the numerical data. The data can be reproduced online [21].   interval regression overestimates the 99th percentile ( figure 4 and supplementary figure S10). The ECOFF/ ECV of 0.1 mg·L −1 for INH is therefore less well supported than the values for the remaining drugs.
Direct measurement of the 99th percentiles for RIF were 0.5 and 0.25 mg·L −1 for the UKMYC6 and UKMYC5 datasets, respectively. Again, ECOFFinder was not used due to concerns about truncation. Visually, the gWT MIC histogram appears more normal in character and the 99th percentile derived from the interval regression fit is 0.52 mg·L −1 . Our proposed consensus ECOFF/ECV for RIF is hence 0.5 mg·L −1 . Direct measurement produced a consistent value of 4 mg·L −1 for EMB, which is supported by interval regression and ECOFFinder for the UKMYC5 dataset (the concentration range on the UKMYC6 plate was more truncated). Both fluoroquinolones behaved similarly: direct measurement gave a value of 1 mg·L −1 for the 99th percentile for both compounds for the UKMYC6 dataset, but 2 and 4 mg·L −1 for the UKMYC5 dataset (MXF and LEV, respectively). These values for the latter dataset were very sensitive to the exact percentile used in the definition (supplementary figure S9), again suggesting that these gWT populations may contain a small number of mislabelled resistant samples. Both drugs have the same concentration range on both plate designs, are only moderately truncated and hence ECOFFinder would be expected to give reasonable results. These, along with the result of the interval regression (figure 5), result in an ECOFF/ECV of 1 mg·L −1 for both fluoroquinolones.
Direct measurement indicates the 99th percentile for KAN is 8 and 4 mg·L −1 for the UKMYC6 and UKMYC5 datasets, respectively, while it produces the consistent value of 1 mg·L −1 for AMI. The MIC histogram of the latter is too truncated for ECOFFinder to function correctly and visually the log-normal fitted by interval regression appears to have overestimated the 99th percentile as 1.6 mg·L −1 ; hence, we propose an ECOFF/ECV for AMI of 1 mg·L −1 . The KAN MIC histograms are less truncated and interval regression better describes the gWT population; these data support an ECOFF/ECV of 4 mg·L −1 for KAN. For ETH, direct measurement produces a consistent value of 4 mg·L −1 , which is supported by interval regression and ECOFFinder. The MIC histogram of RFB is extremely truncated and hence only direct measurement is likely to be effective; it estimates 0.12 mg·L −1 to be the 99th percentile for both datasets, which is therefore our ECOFF/ECV.
Direct measurement yields consistent values of 0.25 and 1 mg·L −1 for BDQ and LZD, respectively, with each supported by both interval regression and ECOFFinder. Our ECOFF/ECV for DLM is 0.12 mg·L −1 since this is the direct measurement which is the same for both datasets and it is supported by interval regression. Lastly, direct measurement for CFZ suggests the 99th percentiles are 0.25 and 0.5 mg·L −1 for the UKMYC6 and UKMYC5 datasets, respectively. The latter has more variation and hence we propose its ECOFF/ECV is 0.25 mg·L −1 .
Comparison against genetic variants known to confer resistance Using the subset of the isolates with genetic information, we can examine our proposed ECOFF/ECVs by plotting the MIC histograms of several genetic variants that are widely accepted to confer resistance to key anti-TB drugs ( figure 6 and supplementary figure S11). The rpoB S450L and katG S315T single nucleotide polymorphisms (SNPs) substantially increase the MICs of RIF and INH, respectively, and the majority (96.9% and 99.5%) of isolates with these mutations had MICs greater than the ECOFF/ECV. The C-15T mutation in the promoter of the fabG1/inhA operon was associated with borderline (0.2 mg·L −1 ) INH MICs unless present in combination with a katG S315T mutation (MIC >1.6 mg·L −1 ), as observed elsewhere [22,28]. It is likely that this promoter mutation, and others like it, are responsible for the small peak in the MIC histogram observed for INH at 0.2 mg·L −1 .
Substituting isoleucine or valine at position 306 in the embB gene was associated with elevated EMB MICs; however, the increase in MIC is much less than observed for either of the RIF or INH resistance-conferring mutations mentioned earlier, leading to only 58.3% and 39.6% of isolates containing these mutations, respectively, having an MIC above the ECOFF/ECV. This is expected since it is known that isolates containing these variants can have variable or discordant MGIT results [29]. For both fluoroquinolones, the gyrA D94G mutation increases the MIC more than the gyrA A90V mutation [30]; however, for LEV the wild-type and non-wild-type populations appear slightly better separated, with the result that for these mutations 90.0% and 90.7% of isolates lie above the ECOFF/ECV, while for MXF the equivalent values are 87.0% and 58.2%. The majority of isolates (90.7% and 89.3%) with the A1401G mutation in the rrs gene have an MIC above the ECOFF/ECV for KAN and AMI, respectively. Finally, while the C-15T mutation in the promoter of the fabG1/inhA operon increases the MIC of ETH more than it does INH, only 80.4% of isolates with this variant lie above the ECOFF/ECV.

A role for a borderline category?
The ECOFF/ECV merely defines an MIC below which the majority of the "wild-type" isolates should lie. It does not necessarily follow that the majority of non-wild-type isolates have an MIC above the ECOFF/ ECV, and therefore care needs to be taken when using an ECOFF/ECV to define susceptibility and resistance. Although for most drugs an MIC below or equal to the ECOFF/ECV can be categorised as "susceptible" and those with MICs above the ECOFF/ECV are "resistant", the MIC histograms of INH, EMB and ETH are more complex. There is genetic evidence (figure 6) that this is due to a multitude of genetic variants, each with a different effect on the MIC. We therefore propose a third category, "borderline", for INH, EMB and ETH ( figure 6 and table 3).

Validation by comparison with MGIT and MODS results
The resistance of a subset of isolates was independently tested to a range of compounds using either the Mycobacterial Growth Indicator Tube (MGIT) system or Microscopic-Observation Drug Susceptibility (MODS) assay [31]. We can therefore validate our MIC-based categorisation by directly comparing between the binary (or ternary) phenotype derived from an MIC and the result from one of these well-established clinical microbiology methods (figure 7, supplementary figure S12 and supplementary table S10).  Minimum inhibitory concentrations (MICs) of isolates containing genetic variants known to confer resistance to different drugs tend to lie above the epidemiological cut-off value (ECOFF/ECV) on the UKMYC6 plate: a) isoniazid, b) rifampicin, c) ethambutol, d) moxifloxacin, e) levofloxacin, f) kanamycin, g) amikacin and h) ethionamide. The number of isolates lying above and below the ECOFF/ECV is annotated. The dashed line indicates the margin of a proposed "borderline" category for isoniazid, ethambutol and ethionamide. The same analysis has been repeated on the UKMYC5 dataset (supplementary figure S11) and can be reproduced online [21].  The agreement between MGIT and UKMYC is good, with a sensitivity of 93.4% and a specificity of 97.0% (supplementary table S10). Since the borderline category lies above the ECOFF/ECV, it is interpreted as providing a way of discriminating between isolates with a moderately elevated MIC and those with a high MIC. For RIF, the agreement between MGIT and UKMYC is excellent, with a sensitivity of 96.5% and a specificity of 96.6%. The borderline category for EMB provides a "buffer zone" since isolates with an MIC of 4 mg·L −1 are only 70.4% resistant according to MGIT. Ignoring these isolates, the sensitivities and specificities are 91.4% and 91.9%, respectively, for EMB. Since the "borderline" category in this case lies below the ECOFF/ECV, these isolates would otherwise be classified as "susceptible" but in reality have a mixed character. Hence, not assigning a borderline category would result in the sensitivities and specificities becoming 72.8% and 92.5%, respectively.
The aminoglycosides behave similarly to one another, with sensitivities and specificities of 76.2% and 99.1% for KAN and 84.3% and 99.3% for AMI, respectively. The "borderline" category for ETH (MIC 4 mg·L −1 ) is 79.4% resistant according to MGIT. Excluding these isolates, the sensitivity is 63.0% and 97.0%, respectively. Limited numbers of isolates were tested for MXF or LEV resistance using MGIT (supplementary figure S12). Although large number of isolates were tested for CFZ and LZD resistance by MGIT (supplementary figure S12), the low prevalence of resistance ensures no useful conclusions can be drawn.
A different set of samples were tested in parallel using the MODS assay (supplementary figure S13 and supplementary table S11). The sensitivities and specificities for INH (n=1888) and RIF (n=1857) were 95.3% and 98.9% and 95.1% and 99.2%, respectively.

Discussion
We have proposed ECOFF/ECVs for research use for 13 different anti-TB compounds for the UKMYC series of broth microdilution plates using an aggregated dataset of 20 637 TB samples collected worldwide by 14 CRyPTIC laboratories based in 11 countries on five continents. The UKMYC6 plate design (figure 1b) not only contains the first-line drugs RIF, INH and EMB, but also all of the Group A medicines, one of the two Group B medicines (CFZ) and five of the seven Group C medicines recommended by the WHO for treating cases of MDR-TB [19]. As such these plates offer near comprehensive phenotypic AST as well as a standardised, scalable phenotype that, once analysed with linked genomic data, will inform clinical decisions where routine diagnostics switch to genome sequencing.
We caution that while the ECOFF/ECVs proposed herein have been derived using the largest collection of M. tuberculosis samples to date, the methods do not conform with those laid out by EUCAST. That said, our analyses illustrate that the EUCAST definition of an ECOFF/ECV as the 99th percentile of the wild-type population [20] is difficult to apply in practice since firstly it is not always possible to define which isolates are "phenotypically wild-type" (pWT) without engaging in a circular argument. We were able to avoid this here by defining a "genotypically wild-type" (gWT) population. The second problem is that using the 99th percentile to define the ECOFF/ECV places a very stringent upper limit on the total error rate, which becomes harder to meet as the prevalence of resistance in any dataset increases. Despite our efforts, we see evidence that our gWT populations for some compounds contain >1% resistant isolates for some drugs which, for example, hampers the use of direct measurement. In contrast, the CLSI have a less-stringent definition for the ECOFF/ECV which avoids this issue [32], but in turn can create inconsistencies between studies. As suggested elsewhere, using a lower percentile (e.g. 97.5th) could help [22].
The CLSI recently proposed breakpoints for the MYCOTB plate [33]. There are a few minor differences: our proposed ECOFF/ECV for RIF is one doubling dilution lower at 0.5 mg·L −1 . The impact of this is difficult to assess due to the paucity of isolates with MICs of 0.5 and 1.0 mg·L −1 . For INH, although it is not possible to make an exact comparison between the ECOFF/ECVs since the doubling dilution series used on the UKMYC plates and by the CLSI are different, the value proposed by the CLSI (0.12 mg·L −1 ) is close the value proposed here (0.1 mg·L −1 ). The CLSI breakpoints for EMB exactly agree with the MIC-based classification adopted by CRyPTIC. Our ECOFF/ECVs are different to those of a recent MYCOTB study [22]; however, we note that the number of samples was modest (n=385) and originated from a single country. In addition, ECOFF/ECVs were determined using ECOFFinder, which given the truncated nature of the MIC histograms for many of the drugs, is not now advised and may have biased some of the results. Our ECOFF/ECVs for RIF and INH are, however, consistent with a recent recommendation made by the WHO, albeit for MGIT.
Critical concentrations for several of the drugs on the UKMYC plates (which are inoculated with 7H9 growth media) exist for M. tuberculosis grown in other growth media, such as Löwenstein-Jensen, 7H10 and 7H11, and also other AST methods, such as BACTEC MGIT 960 [34][35][36]. Caution must, of course, be applied when comparing cut-offs derived using fundamentally different growth media and AST methods.
The ECOFF/ECVs for nine drugs proposed by a series of 7H10 agar studies all either agree or are one doubling dilution different to our proposed ECOFF/ECVs for the UKMYC plates [37][38][39]. More recently, there has been a push to set breakpoints for the new compounds DLM and BDQ so that AST can be performed for these important drugs [40]. An early MGIT study using 194 isolates proposed an ECOFF/ ECV for DLM of 0.125 mg·L −1 [41], which is identical to our value. An ECOFF/ECV of 0.125 mg·L −1 for BDQ on broth microdilution plates was proposed [42]; however, the 95th percentile of the wild-type population was used to define the ECOFF/ECV since CLSI guidelines were followed [32] and ECOFFinder was used despite the truncated nature of the MIC histograms. This value was supported by two subsequent studies, the first of which showed that that the sensitivity and specificity is maximised with an ECOFF/ECV of 0.12 mg·L −1 compared with 0.25 mg·L −1 [43]. The second study confirmed this value, but also stated that the 99th percentile of the wild-type population was 0.25 mg·L −1 [44]. This illustrates that the exact value can be difficult to pin down when different ECOFF/ECV definitions are used; hopefully, the number and diversity of samples in our study will help resolve this important question. Lastly, the ECOFF/ECVs proposed here lie within the range of breakpoints recommended by the WHO for different growth media, with the exception of CFZ for which the WHO recommends a cut-off of 1 mg·L −1 in MGIT [35,36].
Deriving ECOFF/ECVs from MICs relies on several assumptions, foremost that applying a binary resistant/susceptible classification to a clinical infection is a reasonable and helpful way to proceed. That simplifying the description of the results of clinical microbiology investigations helps interpretation is not in doubt [45]; however, problems with reproducibility can arise depending on the character of the underlying MIC histogram. If the MIC histogram is "bimodal" (i.e. has two narrow peaks separated by an interval greater than their individual variance) then placing the ECOFF/ECV between the peaks leads to a helpful and reproducible classification system [46]. On the UKMYC series of plates, the only drugs that conform to this ideal are the rifamycins and the aminoglycosides; the other compounds either have more complex distributions (INH, EMB, MXF, LEV and ETH) or resistance is not sufficiently prevalent for us to fully characterise their MIC distributions (CFZ, LZD, DLM and BDQ).
There is a further implicit (weak) assumption that the "susceptible" and "resistant" subpopulations can each be described by a single MIC distribution, which is not necessarily true, as exemplified by the effect of the fabG1 promoter mutations on INH (figure 6a) [28]. In addition, this assumption implies that different lineages behave similarly when exposed to an antibiotic, which is unlikely to be true [47]. Finally, the wild-type distribution is usually assumed to be log-normal; however, our data do not support this for all drugs (e.g. LEV; figure 4) resulting in the log-normal distributions fitted by interval regression apparently overestimating the 99th percentile. Should this turn out to be generally true, this would invalidate methods based on fitting such distributions, making direct measurement more appealing [25].
One can deconstruct the error in determining an ECOFF/ECV using a microtitre plate into sample selection biases, data entry and labelling errors, inoculation and incubation errors, measurement errors, errors in defining the wild-type population, uncertainties arising from censored data, and errors in fitting a curve to the resulting MIC histogram. In addition to the obvious benefits in collecting such a large and diverse dataset (table 1, figure 1 and supplementary table S2), we have been careful to minimise measurement errors (supplementary figure S3) and have also used a principled method to attempt to remove some putative mislabelled samples (Methods). By defining a gWT population and either applying interval regression to fit normal distributions (figure 4) or directly measuring the 99th percentile (figure 5), we have also minimised the final two sources of error. Despite these steps, further sources of error no doubt remain.
Another key weakness of this study is the lack of PZA, which due to its preference for acidic conditions, is currently unable to be successfully incorporated onto broth microdilution plates, although there is hope that this could be rectified in future [48].
The debate about how to define and calculate ECOFF/ECVs will continue and new approaches will be suggested [49][50][51][52]. However it evolves, larger and more geographically diverse TB datasets, such as presented here, will bring more confidence and rigour to the AST of clinical TB samples. We hope also, that as clinical microbiology transitions into a data-driven science, our proposed method of directly measuring the required percentile of the gWT population will gain traction due to its simplicity and reproducibility as the genetics of M. tuberculosis resistance becomes better understood and accepted.
Although the main objective of the CRyPTIC project is to map the genetic variations in the M. tuberculosis genome that confer resistance to many antibiotics, the sheer number of samples collected provides a body of evidence to support the use of 7H9 broth microdilution plates in clinical mycobacteriology. Applications potentially include AST for samples that are predicted to be MDR or extensively drug resistant (XDR) by the GeneXpert RIF/MDR assay system, or surveying the prevalence of different patterns of resistance by region or country, allowing regional regimens to be designed and their impact monitored. Finally, even in settings which adopt genetics-based clinical microbiology [11], it would be prudent to maintain culture-based testing not only to identify new genetic variants as they arise but also to continuously monitor the performance of the genetic resistance catalogue which is likely to change over time as such catalogues are only likely partly causal.
In future work the CRyPTIC project will apply the ECOFF/ECVs proposed here not only to further optimise a genetic catalogue for the first-line anti-TB compounds [6] but also to extend coverage to second-line, repurposed and new compounds, with the aim of covering as many of the drugs recommended by the WHO for treating MDR-and XDR-TB [19]. Clearly the numerical data being collected by the CRyPTIC Consortium also lends itself to the development of a genetic catalogue for anti-TB compounds that can make quantitative predictions; such a catalogue would naturally take account of additivity, epistasis and non-linear effects. Finally, the tools and data-driven approaches developed here could be applied to other pathogens, especially other mycobacteria.

Ethics review
Approval for the CRyPTIC study was obtained by the Taiwan

Sample selection
The CRyPTIC project aimed for around half the samples collected to be susceptible to the first-line compounds with the remainder MDR/XDR. There was, however, large variation between the different participating laboratories.

Incubation and inoculation protocol
Each laboratory followed a standard operating protocol laid out by the CRyPTIC Consortium, which was similar to that described previously [17]. Clinical samples were subcultured either using Löwenstein-Jensen tubes, 7H10 agar plates or MGIT tubes. The protocol specified that first a suspension at 0.5 McFarland standard in saline Tween with glass beads (Thermo Fisher Scientific, Waltham, MA, USA) from 20-25-day-old colonies be prepared. These were then diluted 100-fold by adding 100 µL suspension to 10 mL enriched 7H9 broth [17]. A semi-automated Sensititre Autoinoculator (Thermo Fisher Scientific) was used to dispense 100 µL inoculum (1.5×10 5 CFU·mL −1 , with approximate range from 5×10 4 to 5×10 5 CFU·mL −1 ) into a well of a UKMYC5/6 microdilution plate. The plate was then sealed using transparent plastic provided by the manufacturer. The UKMYC5 and UKMYC6 microdilution plates were designed by the CRyPTIC Consortium and manufactured by Thermo Fisher Scientific. The drugs included and their concentrations are described in figure 1. DLM and BDQ pure substances were provided by Otsuka Pharmaceutical (Tokyo, Japan) and Jannsen Pharmaceuticals (Beerse, Belgium), respectively. H37Rv ATCC 27294 was used to perform periodic quality control runs since it is susceptible to all the drugs on both plate designs.

Measurement of MICs after 14 days incubation
In each laboratory a scientist read each plate after 14 days incubation using a Thermo Fisher Sensititre Vizion digital MIC viewing system, with results entered via a bespoke web portal, CliRes2 (https://clires2. oucru.org). In those cases where this was not possible, spreadsheets were sent. A photograph was also taken using the Vizion system and also stored in CliRes2. Two laboratories used a mirrored-box to read the plates and one of these also took a photograph using a DSLR camera. A plate was marked as invalid if it did not have adequate bacterial growth in both positive control wells. A small subset of plates with poor growth at day 14 were incubated for a further week and then read again.

Other AST measurements
Where available, the results of standard AST tests conducted by the participating laboratory were also entered via the CliRes2 online portal or, in some cases, shared via spreadsheet. The methods used were mainly either the MGIT system or MODS assay [31]. All MGIT tests used standard critical concentrations; for the MXF results only those with a critical concentration of 0.5 mg·L −1 were included.

Genetic sequencing and interpretation
Sequencing arrangements differed slightly between each CRyPTIC participating laboratory. All sequencing was performed using Illumina machines and hence the input to our genetic sequencing pipelines was a matched pair of FASTQ files containing the short reads. Data integrity was ensured throughout by tracking the MD5SUM hashs of the FASTQ files.
Human and HIV reads were removed from the raw sequence data as follows. Reads were mapped to the reference genome H37Rv, the human genome version GRC38, the HIV reference NC_001802.1, various other viral genomes (so that, if any reads mapped to HIV, one would only know that they mapped to some virus) and nasopharyngeal flora genomes from the human microbiome project, using the BWA-MEM aligner. First, a read pair was kept if either read matched H37Rv, then removed if either read matched one of the other genomes and finally kept if both reads were unmapped.
Variants were initially called using SAMtools and Cortex, two variant callers with orthogonal strengths (SAMtools is a high sensitivity SNP caller; Cortex is a high specificity SNP and insertion/deletion caller). These calls were then passed to the adjudication software minos, which produces a graph representation of the reference genome plus conflicting calls from the two callsets and then remaps reads to the graph to adjudicate statistically. This adjudication process and the performance of the combined SAMtools/Cortex callset are documented [53]. All of this process, including versions of SAMtools and Cortex and the reference genomes for filtering, is encapsulated in Clockwork version 0.8.3 [54].
Samples were excluded from the dataset if they had either >100 000 unfiltered SAMtools variant calls (a weak filter applied to detect samples contaminated with the wrong species) or an average read coverage of ⩽15 when mapped to reads covering the H37Rv reference. The samples that pass these criteria and have paired phenotype data are named the GPI (geno-pheno intersection). Variant calls were removed if they overlapped a set of masked positions as previously defined [55]. This mask consists of 324 971 positions from the H37Rv reference with self-blast matches (https://github.com/iqbal-lab-org/cryptic_tb_callable_ mask/commit/43ec21319209b23f648f32e4868bdf07cf09f2a0).
Version 3 of the H37Rv strain (NC_000962.3) was used as the TB reference genome throughout. The resulting VCF files were then transferred to the CRyPTIC data warehouse where they were interpreted.
Genetic resistance catalogue A hybrid TB genetic resistance catalogue was constructed by merging two published catalogues. The first more recent catalogue contained rows for the four first-line drugs (INH, RIF, EMB and PZA) [6]. The second also contained rows for MXF, LEV, streptomycin (STM), ofloxacin (OFX), AMI, KAN, capreomycin (CAP), ETH, LZD, CFZ, DLM, BDQ, RFB, prothionamide (PTO) and PAS [8]. Since each catalogue was constructed with respect to version 2 of the H37Rv M. tuberculosis reference genome, they were first translated to version 3 of the reference. These catalogues are freely available to download [56], and use a standard grammar, GARC, that is both machine and human readable. To avoid putting as few assumptions into downstream code as possible, default rules are included that, for example, specify that non-synonymous amino acid mutations that match no other row have an unknown effect. The hybrid catalogue was constructed by taking the rows for the first-line compounds from the first catalogue and rows for all other drugs from the second. This catalogue, called CRyPTICv1.31, is freely available for download [56] and is also provided in the attendant repository [21].

Genetic analysis
Each sample VCF was compared to a reference genome object using v0.1.0 of the Python gumpy module [57], thereby creating a table of genetic variants (both SNPs and insertions/deletions). Both the individual SNPs were stored and also their aggregated effect on any amino acids in the coding region of any gene. An intergenic region of up to 100 bases upstream of the start codon was assumed to be the promoter sequence and hence was associated with the gene. This list of variants was then parsed by a second bespoke Python module, piezo, that reads the hybrid catalogue and understands the GARC grammar, and so returns a resistant, susceptible or unknown prediction for each drug in the catalogue [58]. The species and, if M. tuberculosis, lineage of all samples was determined by SNP-IT [23].

Data warehousing
With the exception of the compressed FASTQ files, all data (VCFs, images, MIC metadata, genetic variants and catalogue predictions) were aggregated and stored in a hierarchical file system using the Python datreant 1.0.2 module [59], which allowed for data discovery, tagging and filtering. Updates were performed by inhouse Python scripts. Plate metadata was downloaded from CliRes2 using the zeep Python SOAP client.

QA of MIC readings
Central to our QA process is the photograph taken of the plate after 14 days of incubation using the Vizion instrument by the laboratory scientist. Images were deduplicated by checking the MD5SUM was unique. The remaining images were first read by bespoke software, AMyGDA [60,61], which detects the locations of the wells and, by measuring the growth in each well, estimates an MIC for all drugs. For 54.7% of all measurements the MICs measured by the laboratory scientist and AMyGDA were identical (supplementary figures S4 and S5a), and therefore passed the QA process.
Images of the 45.3% of cases where these two methods disagreed were uploaded to a Citizen Science project, hosted by the Zooniverse platform, called BashTheBug [62]. Each image was classified by at least 11 different volunteers and the median reading was taken to be the consensus. In 38.1% of the images sent (17.3% of the total) the consensus MIC agreed with the MIC measured by the laboratory scientist using the Vizion instrument. Visual inspection of a random subset (supplementary figure S5b) suggested that these were mostly cases where AMyGDA incorrectly estimated the MIC, usually because the growth was too small to be programmatically detected.
For a smaller proportion (12.0% of the images completed by BashTheBug; 5.4% of the total) (supplementary figure S3), the BashTheBug consensus agreed with the MIC measured by AMyGDA. Visual inspection of a random subset (supplementary figure S5c) indicated that, for the most part, these were errors made by the laboratory scientist. An error rate of 5.4% for a subjective laboratory-based measurement is reasonable, and catching and correcting these errors is the main goal of this QA process.
Overall, therefore, we have a high degree of confidence in 77.4% of the MIC measurements since two or more independent methods concur on the value. Finally, in 22.6% of cases all three methods gave a different answer (supplementary figure S5d); these are excluded from further analysis. All these proportions are averaged over all drugs; there is significant variation between drugs (supplementary tables S7 and S8).
Putative mislabelled samples Some 44 samples were assumed to be mislabelled samples as defined by being gWT but having both an INH MIC ⩾1.6 mg·L −1 and an RIF MIC ⩾4 mg·L −1 . This corresponds to 0.8% of the dataset, which is likely an underestimate. All 44 samples were removed.

ECOFFinder
A version of ECOFFinder (ECOFFinderXL2011forMac.xlxs) that worked on Microsoft Excel running on Apple Mac computers was provided by Claudio Köser (University of Cambridge, Cambridge, UK) [25].

Interval regression
The intreg function in Stata version 15.1 (StataCorp, College Station, TX, USA) was used.

Reproducibility
The raw data ( photographs of 96-well plates, genetic variant call files) along with a series of data tables related by a schema can be downloaded from the European Bioinformatics Institute at http://ftp.ebi.ac.uk/ pub/databases/cryptic/. In addition, one can reproduce nearly all the tables and figures, along with the supplementary data, in a browser window (i.e. no installation required) using Python code we have made publicly available [21].