Abstract
Sarcoidosis is a complex systemic inflammatory disease of unknown aetiology that is influenced by a variety of genetic and environmental factors.
To identify further susceptibility loci for sarcoidosis, a genome-wide association study (GWAS) was conducted in 381 patients and 392 control individuals based on Affymetrix 100k GeneChip data. The top 25 single-nucleotide polymorphisms (SNPs) were selected for validation in an independent study panel (1,582 patients versus 1,783 controls).
Variant rs10484410 on chromosome 6p12.1 was significantly associated, with a Bonferroni-corrected p-value of 2.90×10−2 in the validation sample and a nominal p-value of 2.64×10−4 in the GWAS. Extensive fine mapping of the novel locus narrowed down the signal to a region comprising the genes BAG2, C6orf65, KIAA1586, ZNF451 and RAB23. Verification of the sarcoidosis-associated nonsynonymous SNP rs1040461 in a further independent case–control sample and quantitative mRNA expression studies point to the RAB23 gene as the most likely risk factor. RAB23 is proposed to be involved in antibacterial defence processes and regulation of the sonic hedgehog signalling pathway.
The identified association of the 6p12.1 locus with sarcoidosis implicates this locus as a further susceptibility factor and RAB23 as a potential signalling component that may open up new perspectives in the pathophysiology of sarcoidosis.
Sarcoidosis (Mendelian Inheritance in Man (MIM) 181000) is a multisystem disease with a high risk of becoming chronic that is characterised by noncaseating epithelioid cell granulomas that can manifest in virtually any organ system, most frequently in the lung. It is known as a disease of young adults, with an annual incidence between <1 and 64 cases per 100,000 persons, depending on ethnicity and geographic region [1]. The cause of sarcoidosis is still unknown but it is thought to be triggered by a complex combination of yet unknown environmental [2, 3] and genetic factors [4]. Evidence for a strong genetic component is provided by twin and family studies, which have shown a higher concordance rate of sarcoidosis in monozygotic compared with dizygotic twins (0.148 versus 0.012) and a heritability of 66% [5]. The genetic underpinning of sarcoidosis is further supported by known genetic risk factors located in the human leukocyte antigen (HLA) region (reviewed in [6]) and the discovery of the first confirmed susceptibility gene butyrophilin-like 2 (BTNL2; MIM 606000) on chromosome 6 [7, 8]. Besides these disease loci, numerous association studies of potential candidate regions suggested additional susceptibility genes, such as the chemokine receptors chemokine receptor 2 (CCR2; MIM 601267) and chemokine receptor 5 (CCR5; MIM 601373), the tumour necrosis factor-α (TNFA; MIM 191160), and several other HLA loci (for a review see [4, 6]). However, many of these show conflicting results or await replication. Most recently, we identified annexin A11 (ANXA11; MIM 602572) as a further disease-related gene in sarcoidosis in a genome-wide association study (GWAS) using the Affymetrix 5.0 chip (Affymetrix, Santa Clara, CA, USA) [9]. In addition, we discovered a common susceptibility locus shared by sarcoidosis and Crohn’s disease (MIM 266600) on chromosome 10p12.2, based on a joint analysis of the Affymetrix 100k GWAS scans for the two diseases [10].
The 100k and 5.0 chips share only a small proportion of their single-nucleotide polymorphisms (SNPs) (<15% of the 100k array SNPs are also present on the 5.0 set). Therefore, we analysed our 100k sarcoidosis data set, which was formerly only part of the combined analysis with Crohn’s disease (pooled data) [10], to identify susceptibility loci that could not have been detected by the 5.0 chip due to the difference in marker coverage.
METHODS
Patient recruitment and phenotyping
German sarcoidosis patients of panels A (GWAS), B (validation) and C (fine mapping) were recruited as previously described [7, 9, 11] (for details see Appendix S2 in the online supplementary material). The diagnosis of the participating patients was established on the basis of the International Consensus Statement on Sarcoidosis [12] and by histological demonstration of non-necrotising granuloma in an involved organ, most commonly the lung. Only patients with classical Löfgren’s syndrome were recruited without histological support. According to the clinical presentation of the disease, patients were classified as having chronic or acute sarcoidosis as described previously [13]. Patients (n=210) who could not be classified unequivocally concerning the course of the disease were excluded from the subphenotype-specific analysis that was performed in the fine-mapping stages. German control individuals of panels A, B and C were obtained from the PopGen Biobank (Kiel, Germany) [14]. Panel D (replication) comprised sarcoidosis patients from different European locations that were recruited within the European network Genotype–Phenotype Relationship in Sarcoidosis (GenPhenReSa) (for details see the Acknowledgements section and Appendix S1 in the online supplementary material) and German controls (n=2,564) that were recruited as reported in detail previously [15].
Information on SNP genotyping and selection, statistical analysis, analysis of tissue-specific expression by RT-PCR, bronchoalveolar lavage cell samples, and mRNA isolation and real-time PCR can be found in Appendices S3–S10 in the online supplementary material.
RESULTS
GWAS analyses
After applying conservative and established quality filters to the dataset (see Methods section), 773 German samples (381 cases and 392 controls) and 97,088 SNPs were included in the association analysis of the screening stage. At a nominal significance level of 0.05, the experiment had 63% a priori power to detect variants with an odds ratio (OR) of ≥1.5 when assuming a frequency of the disease-associated allele of 20% in control subjects (see fig. S3 in the online supplementary material). The estimated population stratification was small, with a genomic inflation factor of λGC = 1.076 (1.0 indicating no inflation). Subsequent association results were corrected for this inflation of the chi-squared test statistic; the resulting quantile–quantile plot is shown in figure 1.
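The genomic-control adjustment described above can be illustrated with a short Python sketch. It assumes the standard approach of dividing each 1-degree-of-freedom chi-squared statistic by λGC before converting it to a p-value; the exact procedure is not spelled out in this section, and the raw statistic used here is a hypothetical value, not a result from the study.

```python
import math

LAMBDA_GC = 1.076  # genomic inflation factor reported for the screening stage


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    # For X = Z^2 with Z standard normal, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def gc_corrected_pvalue(stat, lam=LAMBDA_GC):
    """Divide the statistic by lambda (assumed genomic-control rule), then convert."""
    return chi2_pvalue_1df(stat / lam)


raw_stat = 13.0  # hypothetical allelic chi-squared statistic, for illustration only
print(f"uncorrected p  = {chi2_pvalue_1df(raw_stat):.2e}")
print(f"GC-corrected p = {gc_corrected_pvalue(raw_stat):.2e}")
```

Because λGC is only slightly above 1, the corrected p-value is modestly larger than the uncorrected one.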
Validation of lead variants
25 SNPs from the screening stage that passed the aforementioned criteria (see Methods section) were genotyped in the independent validation panel B (1,783 German controls and 1,582 German sarcoidosis patients). Association analysis results for these SNPs are shown in table 1 and full analysis results including genotype counts are listed in table S1 in the online supplementary material. Only one marker, rs10484410 on chromosome 6p12.1, passed the Bonferroni correction for multiple testing (corrected p=25×1.17×10−3 = 2.90×10−2) and was significantly associated with sarcoidosis (see also genome-wide plots of the chromosomes; fig. S4 in the online supplementary material). The allelic OR for the rare G allele of rs10484410 was 1.26 (95% CI 1.10–1.44). Only 5.8% of the control individuals were homozygous for the risk allele G of rs10484410, while 7.1% of the sarcoidosis cases were homozygous for it.
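The Bonferroni step can be retraced with simple arithmetic. The Python sketch below multiplies the uncorrected validation p-value by the number of SNPs tested (25) and caps the result at 1; the function name and the 0.05 threshold are illustrative assumptions. With the rounded figures quoted in the text the product is roughly 2.9×10−2, and any small difference from the reported value presumably reflects rounding of the uncorrected p-value.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


N_TESTS = 25              # SNPs taken forward to the validation panel
P_UNCORRECTED = 1.17e-3   # uncorrected validation p-value quoted for rs10484410
ALPHA = 0.05              # assumed significance threshold

p_corrected = bonferroni(P_UNCORRECTED, N_TESTS)
print(f"corrected p = {p_corrected:.3g}; below {ALPHA}: {p_corrected < ALPHA}")
```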
The screening stage did not confirm our previously reported sarcoidosis associations with BTNL2 [7] and ANXA11 [9]. Both loci were replicated in independent European populations (ANXA11 [19] and BTNL2 [20]) and in Americans (BTNL2 [8]) and can be considered as true associations. However, not a single SNP in the ANXA11 or in the BTNL2 gene was present in the Affymetrix GeneChip® Human Mapping 100k set, which shows <15% marker overlap with the 500K Array Set. Conversely, in the previously published GWAS using the Affymetrix 5.0 chip [9], the lead SNP rs10484410 itself was not included, but rs1044670 (SNP_A-1824553; p=9×10−3) was found to be strongly associated in the fine-mapping stage. However, this SNP ranked very low in the previous GWAS (rank 6,229). The results of the validation stage were verified using TaqMan genotyping as an independent technology (>99.8% genotype concordance).
Fine mapping around rs10484410 (6p12.1)
In addition to the lead SNP rs10484410, 44 HapMap tagging SNPs (tagSNPs) were selected for the fine mapping of ∼800 kb of the 6p12.1 region (see Methods section) carried out in panel C. 41 out of the 44 SNPs passed the abovementioned quality criteria (see Methods section). Nine markers yielded a p-value <0.05 in the analysis; detailed results, including genotype counts for the overall sarcoidosis panel, are shown in table S2 in the online supplementary material. The results were not corrected for multiple testing because the aim of the panel C analysis was to identify the probable source of the association signal that was established in panel B. No consistent difference in the significance of the association signal at the markers tested could be observed between the chronic and acute phenotype (see tables S2 and S3 in the online supplementary material).
Linkage disequilibrium (LD) between rs10484410 and the nine additional SNPs with a significant association from panel C varied greatly (r2 = 0.41–0.99; see table S2 in the online supplementary material). Of the 10 markers, SNP rs1411578 showed the strongest association (p=6.64×10−4). Marker rs1411578 (G>C) is located in exon 7 (3′ untranslated region (UTR)) of the Ras-related protein Rab23 (RAB23; MIM 606144). The minor G allele had a frequency of 17% in affected individuals and 14% in the control subjects, and 3.5% of the patients and 2% of the control individuals were homozygous for it. Figure 2 gives an overview of the association signals, the conservation status, the genes and the LD structure at the 6p12.1 locus.
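For readers less familiar with the LD measure used here, r2 between two biallelic SNPs can be computed from haplotype and allele frequencies as r2 = D²/(pA(1−pA)pB(1−pB)), where D = pAB − pA·pB. The following Python sketch uses made-up frequencies, not data from this study, and the function name is hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation (r2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B, respectively
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Hypothetical frequencies, for illustration only.
print(f"r2 = {ld_r2(p_ab=0.15, p_a=0.17, p_b=0.20):.2f}")
```

An r2 close to 1 means that one marker predicts the other almost perfectly, which is why signals at strongly correlated SNPs cannot be separated by single-marker tests alone.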
Extended fine mapping
Since the results of the first fine-mapping stage pointed to RAB23, which is sharply delineated by loci with increased recombination rates at positions 57,190 kb and 57,271 kb based on HapMap data (fig. 2b), we aimed to ensure that the association signal was limited to the RAB23 genomic region and did not extend further up- or downstream. We therefore genotyped the samples from panel C for another 46 HapMap CEU-based tagSNPs that covered a ∼105-kb region surrounding the lead SNP rs10484410 and the RAB23 genomic region (table S3 in the online supplementary material).
Looking at the total mapped region (∼900 kb), the most significantly associated SNP was rs7756421, located in the 3′UTR of the zinc finger protein 451 (ZNF451) gene. The marker is part of a region of high LD (r2≥0.8), which includes eight other associated variants (p≤0.001) that map to RAB23 (rs11398, rs1411578, rs1547226 and rs3800018), ZNF451 (rs6459178, rs17619360 and rs10484410) and to a nongenic region (rs12190575) (fig. 3). A nonsynonymous SNP (nsSNP) (rs1040461) in the RAB23 gene that showed a significant association (nominal p-value (pnom) = 7.83×10−3) with sarcoidosis as well was not part of this set of highly correlated markers.
We used backward model selection based on the Akaike information criterion [21] in a logistic regression analysis to determine whether all of these nine significant signals could be attributed, through LD, to one or more underlying causative variants. Model selection confirmed that the observed association signal of all the highly correlated SNPs had a single origin. If the significantly associated nsSNP rs1040461 was also included, it remained in the model after selection, thus potentially representing an independent association signal.
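The model-selection step described above can be sketched in pure Python. This is a minimal illustration on simulated genotypes, not the analysis pipeline used in the study: the SNP names, effect sizes, sample size and additive 0/1/2 genotype coding are all assumptions made for the example. The sketch fits logistic regressions by Newton-Raphson and repeatedly drops the predictor whose removal lowers the Akaike information criterion (AIC = 2k − 2 log L) the most.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_aic(X, y, n_iter=15):
    """Fit a logistic regression by Newton-Raphson and return its AIC."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * xi[i] * xi[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # log(p) = -log(1 + exp(-eta)); log(1 - p) = -log(1 + exp(eta))
        loglik -= math.log1p(math.exp(-eta)) if yi else math.log1p(math.exp(eta))
    return 2 * k - 2 * loglik


def backward_select(data, y, names):
    """Drop one predictor at a time for as long as doing so lowers the AIC."""
    def aic_of(subset):
        return fit_aic([[1.0] + [row[n] for n in subset] for row in data], y)

    current = list(names)
    best = aic_of(current)
    while current:
        trial_aic, drop = min((aic_of([n for n in current if n != d]), d) for d in current)
        if trial_aic >= best:
            break
        best = trial_aic
        current.remove(drop)
    return current, best


def genotype(maf):
    """Simulated additive genotype (0, 1 or 2 minor alleles)."""
    return (random.random() < maf) + (random.random() < maf)


random.seed(1)
data, y = [], []
for _ in range(1000):
    causal = genotype(0.2)
    # Proxy SNP in high LD: copies the causal genotype 95% of the time (assumption).
    proxy = causal if random.random() < 0.95 else genotype(0.2)
    indep = genotype(0.3)  # independent SNP with its own effect
    eta = -1.0 + 0.5 * causal + 0.7 * indep
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    data.append({"snp_causal": causal, "snp_proxy": proxy, "snp_indep": indep})

names = ["snp_causal", "snp_proxy", "snp_indep"]
selected, final_aic = backward_select(data, y, names)
print("retained predictors:", selected, "AIC:", round(final_aic, 1))
```

In such a simulation, a predictor that only mirrors another through LD adds little once its partner is in the model and tends to be dropped, whereas a predictor carrying an independent effect is retained. This is the logic behind interpreting a SNP that survives selection as a potentially independent signal.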
Replication of rs10484410, rs7756421 and rs1040461
The GWAS lead SNP rs10484410, the SNP with the strongest association in the fine-mapped region (rs7756421) and the nsSNP rs1040461 in the RAB23 gene were genotyped in panel D for replication. We assigned each individual from our panel to the closest subpopulation of a European data set described elsewhere [22] and subsequently included the subpopulation-specific average values of the first six principal components (PCs) of the genome-wide data of this reference panel to adjust for population stratification. Without adjustment for sampling location, markers rs10484410 and rs7756421 showed significant association with sarcoidosis on the allelic level (pnom = 3.30×10−3 and pnom = 5.10×10−3, respectively), while marker rs1040461 did not. However, after inclusion of the first six PCs in the model, rs1040461 showed a nominal significance of pnom = 1.05×10−2, which remained significant after correction for multiple testing, whereas the other two markers became nonsignificant (p=1.40×10−1 for rs10484410 and p=1.80×10−1 for rs7756421; table 2). Analysis of only German cases and controls yielded a similar result for rs1040461 (p=1.20×10−3 without adjustment for sampling location and p=1.05×10−2 with adjustment), but a minor decrease in significance for the other two markers (p=6.90×10−2 for rs10484410 and p=8.50×10−2 for rs7756421 without adjustment, and p=1.40×10−1 and p=1.80×10−1, respectively, with adjustment).
Expression analysis of candidate genes
To narrow down the association signal by plausible biological reasoning, the transcript levels of the five genes (BAG family molecular chaperone regulator 2 (BAG2; MIM 603882), BEN domain-containing protein 6 (C6orf65; BEND6), uncharacterised protein KIAA1586, ZNF451 and RAB23) located in the fine-mapped region were first assessed by RT-PCR in a panel of different human tissues. As shown in figure 4, only RAB23 and ZNF451 showed high expression in the lung, whereas BAG2 and KIAA1586 expression in this tissue was considerably lower. C6orf65 mRNA could only be detected in brain tissue and in a few other tissues at an extremely low level. Interestingly, a high expression of RAB23 also became apparent in small intestine and colonic mucosa.
Next, we analysed the expression levels of the candidate genes in cells derived from bronchoalveolar lavage (BAL) using quantitative real-time PCR and cDNA from sarcoidosis patients and controls (n=5 per group). As shown in figure 5, four out of the five candidate genes were expressed in BAL cells, while C6orf65 (BEND6) mRNA was undetectable. Most interestingly, statistical analysis revealed that only RAB23 displayed significant differences in relative expression levels between patients and controls (p=2.94×10−3 based on Mann–Whitney U-test for nonparametric data). Compared with controls, BAL cells from sarcoidosis patients exhibited up to a three-fold increase in RAB23 mRNA levels, further emphasising the potential involvement of this gene in the pathogenesis of sarcoidosis.
For further expression analysis see the expression quantitative trait loci (eQTL) results in the online supplementary material (fig. S5A–S7C).
DISCUSSION
We found evidence for a novel sarcoidosis susceptibility locus on chromosome 6p12.1 that harbours the five candidate genes C6orf65 (BEND6), BAG2, KIAA1586, ZNF451 and RAB23. This result was obtained by the analysis of genome-wide case–control association scan data with >97,000 SNP markers and by extensive fine mapping of the validated region. The scan did not confirm the sarcoidosis locus ANXA11 that has recently been discovered using the Affymetrix 5.0 chip. Not a single SNP in the ANXA11 gene was present in the Affymetrix GeneChip® Human Mapping 100k set that shows <15% marker overlap with the 5.0 set. Conversely, the 5.0 chip covered the newly discovered risk locus on chromosome 6p12.1 with 16 SNPs, including the highly associated SNP rs1044670. Despite this being a true signal, the SNP ranked below any feasible cut-off for replication in the previous GWAS (rank 6,229) and was, therefore, not validated.
The allelic OR for the rare allele of the detected GWAS lead SNP rs10484410 was moderate (1.26). We have to point out here that the low prevalence of sarcoidosis in the general population strongly limits the number of available samples. In turn, the recruitment of enough samples to achieve genome-wide significance for genetic factors conveying moderate-to-small risks or with low allele frequencies may be infeasible. In this regard, our study design (with GWAS as a hypothesis-generating step, leading to a small correction factor for multiple testing in the validation stage) enabled the detection of such a low-risk factor for sarcoidosis.
Fine mapping of the novel locus revealed a strong association of sarcoidosis with several SNPs in a small region of high LD spreading from C6orf65 (BEND6), over ZNF451, to RAB23. Among these SNPs are four putative functional variants: rs1044670 (c.*1109G>A), located in the 3′UTR of C6orf65 (BEND6); rs1411578 (c.*416G>C) in the 3′UTR and rs1040461 (c.619G>A) in exon 7 of RAB23; and rs7756421 (c.*742A>G) in the 3′UTR of the ZNF451 gene. Logistic regression analysis suggested that the association signal might be mainly driven by more than one variant in RAB23, including the missense mutation rs1040461. This SNP changes the protein sequence by replacing glycine with the small, polar amino acid serine (G207/S207) in a non-domain-containing region of the protein. Replication in a further panel revealed that marker rs1040461, but not rs7756421 or the GWAS lead SNP rs10484410, appeared to be associated with sarcoidosis in several European populations after adjustment for sampling location. This finding strongly supports the hypothesis of rs1040461 being a true sarcoidosis risk variant.
Analysis of tissue-specific mRNA expression profiles demonstrates high levels of RAB23 and ZNF451 in healthy lung. Moreover, RAB23 was the only candidate gene displaying highly significant differences in relative expression between sarcoidosis patients and controls, indicating a potential involvement of RAB23 in disease pathogenesis. The pathophysiological relevance of our findings is indicated by an eQTL analysis that identified alterations in disease-associated processes (e.g. response to external stimulus, cellular defence response and metabolic processes) in response to the presented genetic variation. Although these findings do not prove that the observed overexpression of RAB23 results from variation in rs1040461 or another potentially causative variant in the associated region, they do provide a valuable starting point for future studies elucidating the role of RAB23 and its variants in the aetiopathogenesis of sarcoidosis.
The RAB23 gene belongs to the Rab family of 160 small guanosine triphosphatases that regulate intracellular trafficking of membrane-associated proteins [23, 24]. Based on microarray expression data (UCSC GNF Expression Atlas 2 Data and Affymetrix All Exon Microarrays; see URL 8 in the online supplementary material), RAB23 is more highly expressed in bronchial epithelial and thyroid cells, in addition to uterus and cerebellum tissue. This appears to be consistent with our expression results in lung tissue (see earlier).
The potential involvement of this locus in sarcoidosis pathogenesis remains to be unravelled and can, as yet, only be proposed as a working hypothesis. Sarcoidosis is a systemic immune disorder in which T-cell-mediated inflammation causes the formation of granulomas, which resemble a delayed hypersensitivity reaction. A delayed hypersensitivity reaction may be caused by the intracellular presence of antigens of chemical or microbial origin. Many reports describe the presence of microbial cell wall agents in tissues of patients with sarcoidosis and several clinical studies demonstrate the occurrence of microbes in patients with sarcoidosis [25, 26]. Since RAB23 plays a role in the antibacterial activity of the endogenous autophagy machinery [27], it is biologically plausible that dysfunction might lead to impaired autophagic clearance after exposure of the lung epithelium to bacterial pathogens. The high expression of RAB23 not only in lung but also in small intestine and colonic mucosa may point to a broader potential role of RAB23 for antibacterial defence mechanisms in epithelial barrier organs.
However, the RAB23 protein has also been implicated in facilitating vesicular transport, controlling endocytic progression to lysosomes [27, 28] and, particularly, in antagonising sonic hedgehog (Shh) signal transduction in neural systems [29, 30]. Interestingly, the Shh pathway may also play a role in chronic lung fibrosis and immune system communication [31], thereby providing alternative views regarding the involvement of RAB23 in sarcoidosis. Components of the Shh cascade have been identified in the adult immune system, participating in CD4+ T-cell activation, and studies on fibrotic pulmonary disorders have demonstrated Shh in both human and mouse lung restricted to areas of active disease (for review see [32]). It is assumed that Shh signalling may contribute to epithelial repair and may act as an intermediary in cross-talk between damaged epithelium and the immune/inflammatory system. In sarcoidosis, activated pulmonary T-helper type 1 cells are essential for the inflammatory process and particular CD4+ T-cell subsets can be found at dramatically increased levels in BAL fluid of sarcoidosis patients with active disease. It is possible that RAB23 contributes to the disease through dysfunction within the Shh pathway leading to an over-activation of CD4+ T-cells or to an inadequate repair process in the damaged lung in pulmonary sarcoidosis.
Although there is evidence indicating that variations in RAB23 confer susceptibility to sarcoidosis, it is also possible that the causative variants are located in ZNF451, C6orf65 (BEND6) or another, as yet undefined, genetic element. ZNF451, also known as COASTER, KIAA0576 and KIAA1702, is conserved across vertebrates, is ubiquitously expressed and has been suggested to act as a coactivator for steroid receptors; it might also be involved in transcriptional regulation [33]. The putative gene C6orf65 (BEND6) is a complex locus that appears to produce several proteins with no sequence overlap and of as yet unknown functions.
To conclude, this is the first report of an association between sarcoidosis and the 6p12.1 locus that comprises several genes, a likely candidate being RAB23. The importance of this observation should be evaluated by further delineating the biological role of RAB23 in sarcoidosis.
Acknowledgements
The members of the GenPhenReSa Consortium are A. Günther (Lung Center, University of Giessen, Giessen, and Lung Clinic Waldhof-Elgershausen, Greifenstein, Germany), A. Dubaniewicz (Dept of Pneumonology, Medical University of Gdansk, Gdansk, Poland), S. Pabst (Medical Clinic II, Dept of Pneumology, University of Bonn, Bonn, Germany), D. Bumbacea (Marius Nasta Institute of Pulmonology, Bucharest, Romania), A. Prasse and A. Grubanovic (Dept of Pneumology, University Medical Center, Albert Ludwigs University, Freiburg, Germany), P. Rottoli and E. Bargagli (Respiratory Diseases Section, Clinical Medicine and Immunology, University of Siena, Siena, Italy), P. Bresser (Dept of Pulmonology, Academic Medical Center, Amsterdam, the Netherlands), V. Poletti (Ospedale GB Morgagni, Dipartimento di Malattie dell'Apparato Respiratorio e del Torace UO Endoscopia Toracica, Forli, Italy), M. Luisetti (Clinica Malattie Apparato Respiratorio, IRCCS Policlinico San Matteo-Pavia, University of Pavia, Pavia, Italy), and V. Vucinic and J. Videnovic-Ivanov (Medical School Belgrade, Belgrade, Serbia).
The authors wish to thank all patients, families and physicians for their cooperation. The support of the Deutsche Sarkoidose-Vereinigung eV (Meerbusch, Germany), the PopGen Biobank (Kiel, Germany) and the contributing pulmonologists is gratefully acknowledged. Finally, we thank the staff of the Institute for Clinical Molecular Biology (Kiel, Germany) for technical assistance.
Footnotes
This article has supplementary material available from www.erj.ersjournals.com
Support Statement
Experiments were performed at the Institute for Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany. The study was supported by grants from the Federal Ministry for Education and Research in Germany (BMBF) within the National Genome Research Network (NGFN) (grant numbers 01GS0426 and 01GS0809) and by the German Research Foundation (DFG) through the Cluster of Excellence “Inflammation at Interfaces” as well as through the projects DFG MU 692/8-1 and MU 692/7-1.
Statement of Interest
None declared.
- Received January 5, 2011.
- Accepted April 2, 2011.
- ©ERS 2011