A Pooling Genome-Wide Association Study Identifies Susceptibility Loci and Signaling Pathways of Immune Thrombocytopenia in Chinese Han Population

Immune thrombocytopenia (ITP) is an acquired bleeding disease due to immune-mediated destruction of antilogous platelets and ineffective thrombopoiesis. Although the etiology of ITP remains unknown, genetic variants are thought to predispose individuals to the disease. Several candidate gene analyses have identified several loci that increased ITP susceptibility, but no systematic genetic analysis on a genome-wide scope. To extend the genetic evidence and to identify novel candidates of ITP, we performed a pooling genome-wide association study (GWAS) by IlluminaHumanOmniZhongHua-8 combining pathway analysis in 200 ITP cases and 200 controls from Chinese Han population (CHP). The results revealed that 4 novel loci (rs117503120, rs5998634, rs4483616, and rs16866133) were strongly associated with ITP (P < 1.0 × 10−7). Expect for rs4483616, other three loci were validated by the TaqMan probe genotyping assay (P < 0.05) in another cohort including 250 ITP cases and 250 controls. And rs5998634 T allele was more sensitive to glucocorticoids for ITP patients (χ2 = 7.30, P < 0.05). Moreover, we identified three overrepresented signaling pathways including the neuroactive ligand-receptor interaction, pathways in cancer, and the JAK-STAT pathway, which involved in the etiology of ITP. In conclusion, our results revealed four novel loci and three pathways related to ITP and provided new clues to explore the pathogenesis of ITP.


Introduction
Immune thrombocytopenia (ITP) is an autoimmune disorder characterized by low platelet count [1]. The incidence of adult ITP is about 5-10 cases/100,000 population annually in China [2]. However, the etiology of ITP is unclear and is considered multifactorial and polygenic in most cases. The research suggested genetic factor plays an important role in the pathogenesis of ITP [3]. Several susceptible genes of ITP have been identified by traditional candidate gene approaches including direct sequencing and polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), but these mutations only explain a small fraction of ITP risk. The majority of heritability for ITP remains to be further elucidated.
Genome-wide association studies (GWAS) are a powerful tool in searching for gene variants of complex diseases by comparing single-nucleotide polymorphisms (SNPs) [4].
A number of repeatable susceptibility loci have been gradually translated into clinical treatment, prognosis, and pharmacological guidelines [5][6][7][8]. GWAS with pooled DNA has been widely used due to its rapid, efficient, and costeffective performance [9].
To extend the present genetic data and to identify the novel genetic and biological functional evidence of ITP, we firstly performed a pooling GWAS in 200 ITP patients and 200 control subjects from CHP using an IlluminaHumanOmniZhongHua-8 array scanning 862,620 SNPs across the autosomal region. By SNP-Map (singlenucleotide polymorphism microarrays and pools) analysis, our scanning revealed 4 novel loci (rs117503120, rs5998634, rs4483616, and rs16866133) were strongly associated with ITP from CHP. Furthermore, we validated the relationship between rs117503120, rs5998634, and rs16866133 and ITP by the TaqMan probe genotyping assay (P = 0:0019). Moreover, we analyzed the relationship of loci and clinical therapy and found rs5998634 had a positive association with response to glucocorticoids (χ 2 = 7:30, P = 0:03), suggesting that this SNP may have predictive value for the response to steroid treatment.
To provide further insight into the molecular function of these associated variants, we performed KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis based on the GWAS data. The most potential candidate pathways associated with ITP were the neuroactive ligand-receptor interaction, the pathways in cancer, and the JAK-STAT pathway.
In conclusion, our results suggest that these significantly associated loci, genes, and pathways may provide novel insights into the genetic etiology of ITP and novel clues for investigating the pathogenesis of ITP.

Patients and Controls.
This study was carried out in accordance with the principles of the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards and was approved by the Ethics Committee of the Second Affiliated Hospital of Nanchang University (No. Review [2010] No. (002)). All participants signed a written informed consent. A total of 450 adult ITP patients, who met the diagnostic criteria of consensus of Chinese experts on diagnosis and treatment of adult primary immune thrombocytopenia (version 2009, 10], from a Chinese Han population were enrolled during May 2010 to Feb 2017. None of the recruited ITP patients had hepatosplenomegaly or lymphadenopathy. In addition, the patients had normal or increased bone marrow megakaryocytes and significantly decreased peripheral blood platelet count. Familial ITP cases were not recruited in this study. In addition, patients with other types of thrombocytopenia such as heparin-induced thrombocytopenia or drug-induced thrombocytopenia were excluded. The 400 healthy unrelated control subjects were age-and sex-matched Chinese Han. Peripheral blood was collected from all participants including early-onset ITP cases and healthy controls. Clinical data from the two groups including platelet count (PLT), white blood cell (WBC), red blood cell (RBC), and hemoglobin (HB) were collected. The flowchart of two-stage sample collection was shown in Supplementary Figure 1.

Response to Glucocorticoid Treatment.
A total of 183 inpatients with ITP in the second stage were treated with glucocorticoids, including high-dose dexamethasone (HD-DEX) 40 mg daily for 4 days every 4 weeks and prednisone 1.0 mg/kg daily, which was then tapered. The enrolled patients were classified into two groups according to their response to glucocorticoid treatment: glucocorticoid response group (120 cases) and nonresponse group (63 cases). The response group included patients with complete response, which was defined as PLT higher than 100 × 10 9 /L, or with a partial response, which was defined as PLT ranging from 30 -100 × 10 9 /L or at least doubling of the baseline count. The nonresponse group only contained patients without a response, which was defined as PLT lower than 30 × 10 9 /L or less than doubling of the baseline count. The criteria for complete response, response, and nonresponse were judged according to previous criteria [10].

Pooling GWAS.
Genomic DNA was extracted from peripheral blood leukocytes from 200 cases and 200 controls using the Qiagen DNA Isolation kit (Qiagen DNA GmbH, Hilden, Germany) according to the manufacturer's protocol. Samples of intact genomic DNA showing no evidence of contamination by RNA and DNA degradation were selected for further analysis. DNA concentration and purity were calculated with a Nano Drop ND-2000 spectrophotometer (Nano Drop Technologies, DE, USA). For SNP-MaP scanning, the DNA concentration of each sample was quantified and adjusted to 5 ± 0:5 ng/μL using DNase-free water. The compared pools consisted of 200 cases and 200 control subjects, respectively. DNA (5 μL) of the case group was added to the case pool in equivalent molar amounts, and the same operation was applied for the control group. The concentration and purity of each pool were measured again. At last, eligible pools were processed for labeling and hybridization on IlluminaHumanOmniZhongHua-8 arrays according to the manufacturer's instructions (Illumina, Santa Clara CA, USA).

Analysis of SNPs.
The hybridization intensities of two probes for each SNP allele were derived from the raw scanning files. The frequencies of autosomal SNPs were averaged over three replicate case and control arrays. Before analyzing the chip data, the quality control should be conducted, as mentioned in the following points: (1) due to the phenomenon of hybridization failure of a small number of sites will occur in the process of gene chip hybridization, the failed sites should be deleted; (2) considering that too few SNP detection experiments will lead to unreliable experimental results, it is necessary to remove the sites with less than 3 repetitions; and (3) the DNA pooling method mixes the whole genomic DNA of male and female patients, resulting in uneven loci on the extraordinary chromosomes (X, Y, and mitochondria), so the loci on the extraordinary chromosomes are not considered. Differences in allelic frequency between the sample pools were evaluated by combined Z-test, as follows: The statistic combines (a) Chi-square statistic, T 1 , for testing differences between two proportions (allele frequencies) in cases and controls accounting for sampling variance: where f = Graw/ðGraw + RrawÞ and represents the approximation of allele A frequencies for each replicate, averaged over the number of replicates in each pool (Graw and Rraw are the intensities of the green and red fluorescence value).
represents the allele frequencies over n k pool replicates, v k = f k ð1 − f k Þ/2N k represents the binomial sampling variance, and N k represents the number of controls and cases (k = 1, 2).
(b) Z-statistics for testing the differences in mean allele frequencies between cases and controls: where is the square of the standard error.
Then, the relative allele frequency (RAF) of these two DNA pools was calculated using a method described previously [11]. For loci located nearby candidate genes, the false-positive report probabilities (FPRP) were calculated with the RAF to estimate the confidence intervals and the P value corresponding to the odds ratio (OR) scores [12].
The other SNP in linkage disequilibrium (LD) to the leading SNPs was analyzed using the functional mapping and annotation of genome-wide association studies (FUMA GWAS) tool (https://fuma. ctglab.nl/) [13].
2.5. TaqMan Probe Genotyping Analysis. The concentration of DNA samples from 250 cases and 250 controls was adjusted to 20 ng/μL using DNase-free water. Genotyping was performed using TaqMan SNP Genotyping Assays (Life Technologies, USA), TaqMan Genotyping Master Mix (Life Technologies, USA), and an Applied Biosystems ViiA™ 7 Real-Time PCR System (Life Technologies, USA) in a 96well format. The selected SNPs were genotyped with Taq-Man® SNP Genotyping Assays: AHCTBKM for rs5998634, AHQJQ0L for rs17503120, and C_32336830_10 for rs16866133, except for rs4483616 due to the failure probe synthesis. Each reaction (10 μL) contained 5.0 μL TaqMan Genotyping master mix, 0.25 μL primers and probes, 3.75 μL DNase-free water, and 1.0 μL DNA (20 ng/μL). Thermal cycling conditions were 95°C for 10 min, followed by 40 cycles of 95°C for 15 seconds and 60°C for 1 min.
2.6. SNPs Mapping to Genes. To scan for the genetic factors related to ITP, all identified SNPs were mapped to genes utilizing the EntrezGene database (http://www.ncbi.nim.nih .gov/entrez/). Using the information of chromosome and position, we located SNPs to the genes within a window 20 Kbp upstream and downstream. The annotations for human genome assembly version 37 (Feb. 2009, hg19, GRCh37), which was downloaded from the UCSC genome annotation databases (http://hgdownload.cse.ucsc.edu/), was used to map SNPs to genes. In our study, if multiple SNPs were mapped to the same gene, only the gene with the lowest P value was selected for further analysis. If no gene was found in a +/-20Kbp window of the SNP, the nearest gene on each side of the SNP was included.

Pathway Analysis of GWAS Data.
To provide further insight into the molecular function of identified associated variants, we utilized the WebGestalt (WEB-based Gene Set Analysis Toolkit, http://www.webgestalt.org/) to conduct functional enrichment analysis for genes in our study. All identified overrepresented pathways in our study derived from the KEGG pathway database (http://www.genome.ad .jp/kegg), which integrates genomic, chemical, and systemic functional information [14]. Although 7 genes reached the most significant criteria that P < 1:0 × 10 −7 , it was difficult to conduct a pathway analysis with such a few numbers of genes. Thus, we selected 287 genes by selecting P value less than 10 -5 as a gene set to analyze. The term "hsapiens" was selected as the organism and "hsapiens_gene_symbol" as the gene ID type when uploading both the interesting gene list and the reference gene set. Then, "hsapiens_genome" was selected as the reference set, and P < 0:05 was considered as significant when the hypergeometric method was used for statistical analysis.

Statistical Analysis.
In the clinical characteristics analysis, the continuous data were presented as mean ± standard deviation (SD), and the differences between cases and controls were evaluated using Student t tests or Chi-square tests where appropriate. P < 0:05 was considered statistically significant. In the discovery stage, the differences in allelic frequency between the sample pools were evaluated by combined Z-test. Statistical analysis was performed using SAS version 9.1.3 (SAS Institute Inc, Cary, NC, USA). Chi-square tests were used to detect whether the genotype distributions for the studied SNPs fit Hardy-Weinberg equilibrium (HWE), and P > 0:05 was considered to be consistent with the HWE equilibrium. Finally, 2 × 3 contingency tables were applied to compare genotype frequencies between cases and controls, and P < 0:05 was considered statistically significant. were recruited in the first stage. The mean age at the time of ITP onset was 40:36 ± 16:39 years, and the ratio of males and females was 1 : 2.1. A replicate cohort containing 250 cases (82 males and 168 females) and 250 unrelated controls was collected for the TaqMan probe genotyping assay. The mean age at the time of ITP onset was 40:85 ± 12:21 years, and the ratio of male to female was 1 : 2.04. Our data displayed a significant gender disparity. In both cohorts, PLT, RBC, and HB were significantly lower in the ITP group than in the control group (P < 0:001); however, WBC in ITP patients was significantly higher than in controls (P < 0.001). The detailed clinical information of all the subjects is shown in Table 1.

Loci Associated with ITP.
For SNP-Map scanning, 862,620 autosomal SNPs were screened by the IlluminaHumanOmniZhongHua-8 arrays. The raw data were deposited in the Gene Expression Omnibus as data set GSE76744 (https://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE76744). The averaged relative allele frequencies (RAF) of cases and controls presented a very high Pearson correlation (r 2 = 0:9947, Supplementary  Figure 2), indicating that the GWAS data was reliable for follow-up analysis. The difference in allelic frequency was assessed using the Z combination test, and the P value distribution is shown in Figure 1. The quantile-quantile (Q-Q) plots ( Figure 2) presented that an excess of small P values compared to the distribution expected under the null hypothesis. According to the Bonferroni correction for multiple tests, the top four loci (rs117503120, rs4483616, rs5998634, and rs16866133) located within or near GBE1/LINC02027, TENM4, SYN3/TIMP3, and RBM45/OSBPL6 genes reached a statistical significance in    The SNP name shown on the plot was the most significant SNP. The association between GBE1/LINC02027, TENM4, SYN3/TIMP3, RBM45/OSBPL6, and ITP in Chinese Han Population showed that the most significant site was rs117503120, rs4483616, rs5998634, and rs16866133, respectively, which were supported by the LD site (a-d). Plots were generated using LocusZooM (http://csg.sph.umich.edu/ locuszoom). 5 International Journal of Genomics association with ITP in the genome-wide scale (P < 1:0 × 10 −7 , Figure 1, Table 2). And the regional plots of four significant loci in the corresponding genes were shown in Figure 3. To further measure the strength of SNPs associated with ITP, we calculated the odds ratio scores and FPRP for each high-ranking SNP. As shown in Supplementary Table 1, among the top 20 SNPs, only rs5998634 located within or near the genes SYN3/TIMP3 showed an FPRP value of less than 1:0 × 10 −5 . The other SNPs in linkage disequilibrium (LD) to the top four loci were shown in Supplementary Table 2.

SNPs Associated with ITP Patient Response to
Glucocorticoid Therapy. We compared the genotype and allele frequencies of rs117503120, rs5998634, and rs16866133 between the response and nonresponse group for glucocorticoid therapy. The results showed that only the rs5998634 minor allele T was significantly associated with a favorable response to glucocorticoid treatment among ITP patients (χ 2 = 7:30, P = 0:03) ( Table 3). In term of rs5998634, there were 148 ITP patients with CC genotype and 35 patients with CT or TT genotype. We further evaluated the change of platelet count focused on the different genotypes of rs5998634 for ITP patients after the treatment of glucocorticoid within two weeks. As shown in Figure 4, the mean platelet count of ITP patients with the rs5998634 CT or TT genotype was significantly higher than in patients with the CC genotype after the fifth day of glucocorticoid treatment (P < 0:05).

Discussion
The genetic factor plays a nonnegligible role in modulating the course of ITP. Due to the low incidence of ITP, genetic   International Journal of Genomics analysis with large sample sizes can be challenging. Traditional candidate gene approaches have identified the relationship of genes such as IL-10, IL-3, and IFN-λ with ITP; but these findings have not elucidated the genetic etiology of ITP at a whole-genome scale [15,16]. To extend the present genetic evidence and to identify novel candidates of ITP, we performed a GWAS combining with pathway analysis for ITP from CHP. Our results revealed that 4 novel loci of GBE1/LINC02027 (rs117503120), TIMP3/SYN3 (rs5998634), TENM4 (rs4483616), and RBM45/OSBPL6 (rs16866133) were strongly associated with ITP from CHP. Rs117503120 is located on the GBE1/LINC02027 gene at chromosome 3q12.3. The glucan (1,4-alpha-) branching enzyme 1 (GBE1) was reported strongly associated with glycogen storage disease in previous research [17]. Our GWAS results showed that GBE1 was significantly associated with ITP (P = 6:45 × 10 −9 ), and the TaqMan probe genotyping results also showed that the MAF of rs117503120 in ITP cases was significantly lower than controls, suggesting that the rs117503120 minor allele may be protective for ITP. Thus, GBE1 may have a potential vital association with ITP from CHP, although the mechanism is yet to be investigated. In addition, the LINC02027 (long intergenic nonprotein coding RNA 2027) has been reported to be highly expressed in liver and kidney tissues from the 95 human individuals [18], but the detailed relationship with ITP remains unclear.
Rs5998634 is located on TIMP3/SYN3 at chromosome 22q12.3, which was strongly associated with ITP in our study (P = 8:06 × 10 −8 ). The tissue inhibitor of metalloproteinase-3 (TIMP3) is an inhibiting matrix metalloproteinase protein [19]. Researches have shown that the TIMP-3 protein has a statistically positive correlation with IL-4 and platelet count, but a negative correlation with IFN-γ in ITP patients, suggesting that this protein may lead to Th1/Th2 polarization via affected antigen-presenting cells and contribute to the occurrence and development of autoimmune disease [20]. Importantly, the TaqMan probe genotyping results confirmed such a strong association of rs5998634 with ITP (χ 2 = 12:5, P = 0:0019). The MAF of rs5998634 CT genotype in ITP patients (0.144) was significantly higher than in controls (0.045), suggesting that the T allele is a major genetic risk factor to ITP from CHP. Furthermore, we also evaluated the change difference of platelet count from the 183 ITP patients treated with glucocorticoid based on the different genotypes of rs5998634. Interestingly, the mean platelet count of 35 ITP patients with the rs5998634 CT or TT genotype was significantly higher than in 148 patients with the CC genotype after the fifth day of glucocorticoid treatment (P < 0:05). These results shown that the patients with ITP who carry the rs5998634 T allele may be more sensitive to glucocorticoids than patients with the C allele. The limited sample size and lack of replication resulted in low statistical efficiency, and follow-up studies with large sample sizes need to be carried out. Synapsin III (SYN3) mainly involved in the development of brain or neurons disease, such as Parkinson disease [21], but the detailed relationship with ITP remains to be confirmed.
Rs16866133 is located on RBM45/OSBPL6 at chromosome 2q31.2 and also had a strong association with ITP in our study (P = 9:39 × 10 −8 ). The association of RNAbinding motif protein 45 (RBM45) with ITP or other autoimmune diseases is unknown. The oxysterol-binding proteinlike 6 (OSBPL6) gene encodes the oxysterol-binding protein-like 6 receptor, which associated with multiple sclerosis (P = 4:64 × 10 −4 ) in the United Kingdom (UK) population [22]. It is not a surprise that ITP shares some immune mechanisms with multiple sclerosis; thus, OSBPL6 may participate in the pathogenesis of ITP in terms of immune regulation. Also, the TaqMan probe genotyping results showed a significantly lower MAF of rs16866133 TG genotype in ITP patients compared with the controls (0.005 vs 0.060), suggesting that rs16866133 G was a protective allele for ITP.
To provide further insight into the molecular function of identified associated variants, the pathway analysis converged GWAS datasets was conducted. In this study, the JAK-STAT signaling pathway, neuroactive ligand-receptor interaction, and pathways in cancer were proposed to be the most potentially associated with ITP from CHP. Among them, the JAK-STAT signaling pathway has been previously reported to participate in the etiological mechanism of pediatric ITP using the gene expression profile analysis methods [23]. It also plays a major role in the pharmacological mechanisms of eltrombopag, which is a thrombopoietin (TPO) receptor agonist approved by the FDA for the treatment of chronic ITP patients [24]. Thus, our results further provide evidence that the JAK-STAT signaling pathway involved in the pathogenesis of ITP, but its detail mechanism still needed to be explored. However, neuroactive ligand-receptor interaction and pathways in cancer have not been extensively studied in ITP to date. These findings motivate an in-depth evaluation of the contribution of these loci and pathways in the etiology of ITP.

Data Availability
The data used to support the findings of this study are available from the Gene Expression Omnibus as data set GSE76744 (https://www.ncbi.nlm.nih.gov/geo/ query/acc.cgi?acc=GSE76744). The number of reference genes in the category (total); ratio of enrichment (E-ratio); P value from hypergeometric test; P value adjusted by the multiple test adjustment (Adj-P), the significant threshold of Adj-P is P < 1:0 × 10 −5 .