Abstract
Modular and meta-profiling identify a common transcriptional response of patients with TB versus healthy controls http://ow.ly/YvEP7
To the Editor:
Mycobacterium tuberculosis is estimated to have infected one third of the world's population and continues to be a significant cause of mortality and morbidity [1]. There is a need for new and improved diagnostics or treatment-monitoring tools and blood-based mRNA diagnostics are a potential solution [2]. Gene expression microarray analysis of human blood has been widely used to profile the host transcriptional response in active tuberculosis (TB) to identify potential biomarkers and better understand the host immune response [2]. So far, there has been a relative lack of concordance in the actual genes being identified from the published studies [2, 3], although there has been agreement in some of the pathways identified. Interferon (IFN) signalling has been identified as a dominant signature in many of the individual studies [2, 4]; however, when significant gene lists were combined from eight publicly available TB datasets, TREM1 (triggering receptor expressed on myeloid cells 1) signalling became the most significant pathway [5].
We collectively reanalysed the publicly available datasets using differing methodologies to identify robustly differentially expressed genes that could distinguish active TB from controls. These genes are potential candidates for blood-based mRNA biomarkers of active disease and could provide valuable information regarding the immune and inflammatory response underlying TB pathogenesis.
We undertook a comprehensive search of PubMed and microarray repositories. Publicly available datasets that compared active TB patients with healthy controls, latently infected individuals or patients post-treatment were identified and retained. The latter three cohorts are transcriptionally indistinguishable at the group level [4, 6, 7]. HIV-infected individuals were excluded from this analysis.
Where possible, data were imported in their raw format: Illumina and Agilent data were 75th centile normalised, and Affymetrix data were RMA (robust multichip average) quantile normalised. If raw data were not available, the authors' normalisation was used. All datasets were then filtered to remove low-expression transcripts (only transcripts with expression showing a 2-fold change from the median in ≥10% of all samples were retained), followed by statistical filtering (independent t-test with Benjamini–Hochberg multiple testing correction, q-value <0.05). Probe/transcript IDs were matched to Entrez gene identifiers for each dataset. Where a gene was represented by multiple probes, only the most significant probe (by q-value) was retained. Venn Mapping [8] was used to check the significance of the overlaps between any two datasets. Meta-profiling [9] of the significant gene lists was undertaken to identify the number of overlaps required for inclusion in the meta-signature. Only those genes that were expressed in a consistent direction of regulation in at least the determined number of overlapping datasets were retained as the meta-signature.
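As an illustration of the statistical filtering and probe-collapsing steps described above, the following is a minimal sketch only, not the pipeline used in this study. It assumes per-probe p-values from the t-test are already available; the names `ProbeResult`, `benjamini_hochberg` and `significant_genes`, and the probe and gene identifiers in the example, are hypothetical.

```python
from collections import namedtuple

# Hypothetical record: one t-test result per probe, already mapped to a gene ID.
ProbeResult = namedtuple("ProbeResult", ["probe_id", "entrez_id", "p_value"])


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significant_genes(results, q_cutoff=0.05):
    """Map each gene ID to (probe_id, q) for its most significant probe.

    Only probes with q < q_cutoff are considered.
    """
    q_values = benjamini_hochberg([r.p_value for r in results])
    best = {}
    for result, q in zip(results, q_values):
        if q >= q_cutoff:
            continue
        current = best.get(result.entrez_id)
        if current is None or q < current[1]:
            best[result.entrez_id] = (result.probe_id, q)
    return best


if __name__ == "__main__":
    # Toy input: two probes for gene "g1", one each for "g2" and "g3".
    toy = [
        ProbeResult("p1", "g1", 0.001),
        ProbeResult("p2", "g1", 0.02),
        ProbeResult("p3", "g2", 0.9),
        ProbeResult("p4", "g3", 0.03),
    ]
    for gene, (probe, q) in sorted(significant_genes(toy).items()):
        print(gene, probe, round(q, 4))
```

In this sketch the correction is applied across all probes before collapsing to genes, so the q-value retained for a gene is that of its best probe; whether the original analysis collapsed before or after correction is not stated in the text.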
Modular analysis [10] was undertaken of compatible datasets; it was not possible to analyse GSE56153, GSE34608 and GSE28623 with this method as their technology platforms were not supported by the tool. Canonical pathway, gene network analysis, gene function annotation and upstream analyses were generated through the use of IPA (Ingenuity Pathway Analysis; Ingenuity Systems, Qiagen, Redwood City, CA, USA).
16 datasets were included in the meta-analysis (GSE39939, GSE39940, GSE54992, GSE37250, GSE31348, GSE36238, GSE42825, GSE42826, GSE42830, GSE40553, GSE56153, GSE34608, GSE28623, GSE19444, GSE19442 and GSE19439). Modular analysis of the compatible datasets revealed similarities, with overexpression of modules annotated as cytotoxic, IFN, inflammation and dendritic cells/apoptosis, and underexpression of modules annotated as B-cells, T-cells, lymphocyte activation and mitochondrial stress (figure 1a). The two datasets with the smallest cohort sizes (GSE54992 and GSE36238) had different, weaker modular patterns, with fewer modules identified as significantly different from the control group.
Differentially expressed genes between the control and TB groups were identified independently for each dataset. There was significant overlap between the differentially expressed gene lists across the datasets, and the degree of overlap did not depend on the choice of control group (figure 1b). Meta-profiling identified 380 genes that were differentially expressed in nine or more datasets in a consistent direction of regulation (figure 1c). Upregulated genes were more consistently identified across datasets than downregulated genes (figure 1d). Five genes were identified in all 16 datasets: AIM2, BATF2, FCGR1B, HP and TLR5.
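The selection rule behind the meta-signature (a gene must recur, in the same direction, in at least a given number of datasets) can be sketched as a simple vote count. This is an illustrative sketch under stated assumptions, not the meta-profiling method of reference [9]: that method also determines the required number of overlaps, whereas here the threshold is passed in as `min_datasets`. The function name `meta_signature` and the gene labels are hypothetical.

```python
from collections import Counter


def meta_signature(datasets, min_datasets):
    """Return {gene: direction} for genes recurring in >= min_datasets lists.

    datasets: list of dicts, one per dataset, mapping each significant gene
    to its direction of regulation ("up" or "down").
    A gene is kept only if exactly one direction reaches the threshold.
    """
    up_votes = Counter()
    down_votes = Counter()
    for calls in datasets:
        for gene, direction in calls.items():
            if direction == "up":
                up_votes[gene] += 1
            elif direction == "down":
                down_votes[gene] += 1
    signature = {}
    for gene in set(up_votes) | set(down_votes):
        up_ok = up_votes[gene] >= min_datasets
        down_ok = down_votes[gene] >= min_datasets
        if up_ok and not down_ok:
            signature[gene] = "up"
        elif down_ok and not up_ok:
            signature[gene] = "down"
    return signature


if __name__ == "__main__":
    # Three toy datasets with hypothetical gene labels.
    toy = [
        {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"},
        {"GENE_A": "up", "GENE_B": "down", "GENE_C": "down"},
        {"GENE_A": "up", "GENE_C": "up"},
    ]
    print(sorted(meta_signature(toy, min_datasets=2).items()))
```

Applied to this study's setting, the input would be the 16 per-dataset gene lists and the threshold would be nine; a gene called in all 16 datasets corresponds to a vote count of 16 in one direction.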
IFN-γ was the top predicted upstream regulator of these 380 meta-signature genes, with 54 genes directly or indirectly linked to IFN-γ within the IPA database (data not shown). The 380-gene meta-signature had enrichment for IPA canonical pathways involved in pattern recognition, IFN signalling, interleukin-6 signalling, TREM1 signalling and complement (data not shown). Based on these findings, a curated cartoon summarising the major functional groups of genes and their relationships was created (figure 1e).
In this study, we have identified a 380-gene meta-signature of active TB compared with healthy controls, patients post-treatment and asymptomatic latently infected individuals, which showed enrichment for both innate and adaptive immune functions.
Two main methodologies were used to analyse the publicly available data: modular analysis and meta-profiling. Modular analysis depends on identifying differences in coordinately expressed groups of genes (modules) rather than individual genes [10]. We identified remarkable similarity between the datasets, with overexpression of modules annotated with IFN, inflammation functions, monocyte and neutrophil functions, and underexpression of B- and T-cell modules. These findings are in keeping with the individual studies that have included grouped modular analysis [4, 7]. Where grouped modular profiles were less consistent, this may have resulted from the small cohort sizes used and emphasises the need for individual studies to be appropriately powered to detect all differentially expressed genes.
Using meta-profiling we found that 380 genes were consistently differentially expressed in nine or more datasets, with five genes identified as differentially expressed in all 16 datasets. These five genes were AIM2, BATF2, FCGR1B, TLR5 and HP. Genes among these have been shown to potentially play a role following M. tuberculosis infection [11–14], whereas a role for TLR5 has yet to be described. Identification of such differentially regulated genes may therefore reflect a programmatic response rather than a response specifically tailored to the pathogen.
Analysis of the 380 genes comprising the meta-signature identified IFN-γ as the most significant potential upstream regulator, with a large network of IFN-γ-regulated genes present within the 380 genes. IFN-γ is critical for control of mycobacterial disease in humans, with mutations in either the IFN-γ receptor or STAT1 (signal transducer and activator of transcription 1) resulting in increased susceptibility [15]. However, upregulation of genes downstream of type I IFN signalling was also observed, which is of relevance to TB exacerbation since type I IFN has been shown to antagonise signalling downstream of IFN-γ [15]. Thus, capturing the overall picture of significant enrichment may be more informative than identification of one individual pathway, as shown in the summary cartoon (figure 1e). A number of immune pathways/functions are enriched within the meta-signature, including multiple pattern recognition receptors, cytokines, the inflammasome, complement and immunoglobulin. This supports diverse findings from both mouse and human studies that the immune response following M. tuberculosis infection is complex and can be cross-regulatory [15].
This study confirms the reproducibility of blood-based transcriptional analysis in identifying the innate and adaptive host response in TB. It also shows that upregulated mRNA transcripts are more reliably identified than downregulated transcripts, and highlights several mRNA candidates that could collectively be used as potential biomarkers of active disease. These findings have implications for the design and implementation of mRNA expression tools to support diagnostics and treatment monitoring of TB.
Footnotes
Support statement: S. Blankley and A. O'Garra were supported by the Medical Research Council (MRC), UK (grant U117565642), now The Francis Crick Institute (A. O'Garra: Crick Budget 10126). S. Blankley was jointly funded by the UK MRC and the UK Dept for International Development (DFID) under the MRC/DFID Concordat agreement (grant MR/J010723/1). G. Santis was supported in part by the Dept of Health via the National Institutes of Health Research comprehensive Biomedical Research Centre award to Guy's and St Thomas' National Health Service Foundation Trust in partnership with King's College London. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding information for this article has been deposited with FundRef.
Conflict of interest: Disclosures can be found alongside the online version of this article at erj.ersjournals.com
- Received September 20, 2015.
- Accepted February 10, 2016.
- Copyright ©ERS 2016