Association Analysis of Genetic Variants with Type 2 Diabetes in a Mongolian Population in China

Large-scale genome-wide association studies (GWAS) have identified approximately 80 single nucleotide polymorphisms (SNPs) conferring susceptibility to type 2 diabetes (T2D). However, most of these loci have not been replicated in diverse populations, and much genetic heterogeneity has been observed across ethnic groups. We tested 28 SNPs previously found to be associated with T2D by GWAS in a Mongolian sample from Northern China (497 individuals diagnosed with T2D and 469 controls) for association with T2D and diabetes-related quantitative traits. We replicated the T2D association of 11 SNPs, namely, rs7578326 (IRS1), rs1531343 (HMGA2), rs8042680 (PRC1), rs7578597 (THADA), rs1333051 (CDKN2), rs6723108 (TMEM163), rs163182 and rs2237897 (KCNQ1), rs1387153 (MTNR1B), rs243021 (BCL11A), and rs10229583 (PAX4), in our sample. Further, we showed that the risk allele of the strongest T2D-associated SNP in our sample, rs7578326 (IRS1), is associated with an increased level of triglycerides (TG). We observed substantial differences in T2D risk allele frequency between the Mongolian sample and the 1000G Caucasian sample for a few SNPs, including rs6723108 (TMEM163), whose risk allele is near fixation in the Mongolian sample. Further study of the genetic architecture of these variants in T2D susceptibility is needed to understand their role in heterogeneous populations.


Introduction
Type 2 diabetes (T2D) is a complex disease characterized by insulin resistance and pancreatic beta-cell dysfunction. In China, 9.7% and 15.5% of the entire population suffer from T2D and prediabetes, respectively [1]. Given recent advances in genotyping and sequencing technology, GWAS have contributed significantly to the identification of susceptibility loci for T2D and many other complex disorders. At least 80 loci conferring susceptibility to T2D have been identified to date [2,3] (http://www.genome.gov/gwastudies/), and the genetic architecture underlying T2D varies substantially between populations of different ethnic backgrounds [4,5]. Studying the genetics of T2D in multiethnic cohorts has been insightful for fine-mapping causal variants and identifying new loci [6], demonstrating the value of investigating common variants in different ethnic samples.
There are 10 million Mongolians currently living in various regions of Asia [7]. To our knowledge, only one T2D association study has been conducted in a Mongolian sample; it replicated the association of variants in KCNQ1 and ABCC8 with T2D [8]. However, that small study (177 cases and 216 controls) had low power to detect variants with smaller effects. The prevalence of T2D among the adult urban Mongolian population in China has grown from 1.86% in 1980 to 5.6% in 2012 [9][10][11]. In our study, we aimed to explore the genetic risk conferred by 34 T2D SNPs previously reported by GWAS in a larger Mongolian sample. Twenty-eight SNPs passed rigorous quality control filtering, and 11 of them showed significant association with T2D (Bonferroni-corrected P < 0.05). For the SNP with the strongest T2D association, rs7578326, the risk allele is significantly associated with increased levels of TG. Our results demonstrate the need for further study of allelic differences in T2D-associated SNPs across diverse populations.

Methods and Materials
Ethics Statement.
This study was approved by the Institutional Review Board of the Affiliated Hospital of Inner Mongolia University for the Nationalities and complied with the Declaration of Helsinki. Written informed consent was obtained from each participant.

Study Population.
We collected whole blood samples from 986 individuals of Mongolian ethnicity from Inner Mongolia, China. The sample comprised 511 T2D cases and 475 healthy normoglycemic controls, of which 497 cases and 469 controls passed quality control filtering and were used for subsequent analysis (see below). Cases were registered based on the World Health Organization (WHO) criteria [12] of fasting plasma glucose concentration ≥7 mmol/L or 2-h plasma glucose concentration ≥11.1 mmol/L and were admitted to the Affiliated Hospital of Inner Mongolia University for the Nationalities. Nondiabetic healthy controls were selected from the same region and matched to cases on sex and ethnic background. Aside from the diagnosis of T2D, we collected diabetes-related lipid traits for each individual: total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglycerides (TG). We also collected lifestyle information (smoking and drinking habits), waist circumference (WC), and body mass index (BMI) for each participant.

Selection of SNPs and Genotyping.
We selected a list of SNPs previously found to be associated with T2D based on the NHGRI GWAS catalog [2] (available at http://www.genome.gov/gwastudies/, November, 2012). Candidate SNPs were initially selected with the following considerations: (1) SNPs found to be associated with T2D in an Asian sample were given higher priority (rs6723108 and rs5945326 were added after the initial selection date); and (2) SNPs found to be associated with T2D in multiple studies were subsequently included. We were able to genotype 34 SNPs located in or near 33 candidate genes (see Supplementary Material available online at http://dx.doi.org/10.1155/2015/613236). We included two SNPs around KCNQ1, because both have been reported to be associated with T2D in Asian populations [13,14].
We estimated the concentration of isolated genomic DNA using the Qubit dsDNA BR Assay Kit (Invitrogen, USA), and the DNA solution was further diluted to a concentration of 10 ng/μL. We designed the targeted sequencing primers and redesigned any primer sets that produced dispersed or weak electrophoretic bands. To prepare the chip array, we used a multisample nanodispenser (WaferGen, USA) to dispense DNA and primers into a SmartChip MyDesign Chip (WaferGen, USA). Following polymerase chain reaction (PCR) amplification, we purified the PCR products with Agencourt AMPure XP-medium beads to obtain mixed Illumina paired-end libraries. Insert sizes were measured with an Agilent 2100 Bioanalyzer (Agilent, USA), and concentrations were estimated by real-time PCR. Sequencing was performed on either an Illumina MiSeq or an Illumina HiSeq 2500. All sequencing steps were in strict accordance with Illumina recommended protocols.
The final sequencing depth reached > 200x, and the length of the paired-end reads was 100 bp. Reads with an average base quality of ≥20 were kept for further analysis. BWA [15] (v0.5.9, available at http://bio-bwa.sourceforge.net/) was used to map all clean reads against the human reference genome (hg19), allowing ≤3 mismatches across a single read. The Samtools mpileup command (v0.1.18, available at http://samtools.sourceforge.net/) was used to obtain SNP genotypes as described in [16].
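The read-level filter mentioned above (average base quality of ≥20) can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes FASTQ quality strings in Phred+33 encoding, and the function names `mean_phred` and `passes_quality_filter` are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of one read, assuming Phred+33 ASCII encoding."""
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_quality_filter(quality_string, min_mean_quality=20):
    """Keep a read when its average base quality reaches the cutoff."""
    return mean_phred(quality_string) >= min_mean_quality


# Under Phred+33, 'I' encodes Q40, '5' encodes Q20, and '+' encodes Q10.
for qual in ("IIIIIIII", "55555555", "++++++++"):
    print(qual, mean_phred(qual), passes_quality_filter(qual))
```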
These genotypes were further filtered according to the following criteria. SNPs with a missing call rate of ≥5% across the samples were removed. Samples with ≥3% of missing genotypes (which corresponds to 10% of missing SNP call rate) were removed. We tested SNPs for Hardy-Weinberg Equilibrium (HWE) and excluded SNPs with an HWE P value < 1 × 10⁻⁶ in unaffected individuals. Twenty-eight SNPs in 966 samples (497 cases and 469 controls) passed the quality control filtering, and the overall genotype call rate was 99.3% or higher across the sample.
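As an illustration of the HWE filter, the sketch below computes a P value from genotype counts with a one-degree-of-freedom chi-square test. The text does not state which HWE test was used (exact tests are common in practice), so the chi-square form and the function name `hwe_chi_square_p` are assumptions made for this example.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical control genotype counts; a SNP fails when P < 1e-6.
p_value = hwe_chi_square_p(120, 230, 119)
print(p_value, p_value < 1e-6)
```

In this form the filter would be applied to genotype counts from controls only, matching the "unaffected individuals" condition in the text.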

Statistical Analysis.
We tested association between candidate SNPs and T2D status using logistic regression (likelihood ratio test), adjusting for the effects of age, sex, and BMI. Study-wide significance was determined by applying a Bonferroni correction for the 28 tested SNPs (P value ≤ 0.05/28 = 1.8 × 10⁻³). We tested association with diabetes-related quantitative traits (TC, HDL-C, LDL-C, and TG) across both T2D cases and controls using linear regression with age, sex, BMI, and T2D status as covariates. All quantitative trait measures were normalized by quantile normalization, and the normalized values were used in the analyses. Statistical tests and estimation of 95% confidence intervals (CI) were performed using EPACTS [17] (v3.2.6, available at http://www.sph.umich.edu/csg/kang/epacts/). Differences in population structure between the Mongolian sample (healthy controls) and the healthy Caucasian (CEU) or Chinese (CHB and CHS) samples of the 1000 Genomes Project (1000G) [18] (http://www.1000genomes.org/) were estimated by comparing risk allele frequencies and Wright's fixation index (F_ST) using plink [19,20]. Trait values were compared between cases and controls using Student's t-test.
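Two of the steps above lend themselves to a short sketch: the Bonferroni threshold and the normalization of trait values. The analyses themselves were run in EPACTS; the code below is only an illustration. It assumes that "quantile normalization" means a rank-based inverse normal transformation, which is one common reading and is not confirmed by the text. Both function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def inverse_normal_transform(values):
    """Map values to standard normal quantiles by rank (ties share a rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


print(bonferroni_threshold(0.05, 28))                # about 1.8e-3
print(inverse_normal_transform([2.65, 2.12, 2.03]))  # largest value maps highest
```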

Results
After rigorous sample- and marker-level quality control filtering, genotypes of 28 SNPs in 966 individuals (497 T2D cases and 469 nondiabetic, ethnically matched controls) were kept for subsequent analyses. Clinical characteristics of the sample are summarized in Table 1. Overall, consistent with previous studies [21], T2D cases in the current study have higher TC, TG, and LDL-C values than controls and comparable HDL-C values, suggesting that TC, TG, and LDL-C are among the risk factors for T2D in the Mongolian sample. Table 2 presents the association results between the 28 SNPs and T2D status. Of the 28 SNPs tested, 11 were significantly associated after correcting for multiple testing (P < 1.8 × 10⁻³). We replicated a T2D association near KCNQ1 (rs2237897; OR = 1.39; P = 0.002), originally identified in a Japanese population [14] and subsequently replicated in another Mongolian population sample with the same ethnic background as our sample [22]. We also replicated the T2D association of three SNPs initially identified in Asian samples, namely, rs163182 (KCNQ1) [13] in Japanese, rs6723108 (TMEM163) [23] in Indian, and rs10229583 (PAX4) [24] in Chinese samples. In addition, we replicated associations of seven T2D SNPs previously identified in European populations. To our knowledge, this is the first replication in an Asian sample of the association of rs7578326 (IRS1), rs1531343 (HMGA2), rs8042680 (PRC1), rs1387153 (MTNR1B), rs7578597 (THADA) [25], rs243021 (BCL11A) [26], and rs1333051 (CDKN2) [27].
Among the four lipid-related traits tested, we observed only a single significant association, between the T2D risk allele A of rs7578326 (IRS1) and TG level (P = 0.0004, OR = 3.4, and 95% CI 1.7-6.6). The mean TG levels for the AA, AG, and GG genotypes are 2.65, 2.12, and 2.03 mmol/L, respectively (Figure 1). Individuals homozygous for the risk allele A have 25% higher TG values than heterozygotes. Heterozygotes have 4% higher TG values than those homozygous for the nonrisk allele G.
Most of the SNPs selected for testing in our sample were initially identified as associated with T2D in European populations. The lack of replication of 17 T2D loci identified by GWAS in other populations prompted us to examine differences in the allelic architecture of the SNPs tested for T2D association in our population. We observed variable differences in risk allele frequency between our sample and the 1000G Caucasian (CEU) and Chinese populations (Table 3). Six out of the 11 SNPs whose risk allele frequency is at least 10% higher in the Mongolian sample than in the 1000G Caucasian sample showed significant association with T2D in our study, whereas only two out of the nine SNPs whose risk allele frequency is lower by 10% or more compared to CEU showed significant association with T2D. Although not statistically significant, this observation suggests a trend toward enrichment of T2D-associated SNPs among those with a high risk allele frequency in the Mongolian population. In addition, we calculated Wright's fixation index (F_ST) to estimate how strongly each tested SNP is differentiated between the Mongolian sample and the Caucasian or Chinese samples. Notably, a T2D-associated SNP, rs6723108 (TMEM163), has a risk allele frequency near fixation (0.98 in our sample compared to 0.51 in the 1000G Caucasian sample) and shows substantial differentiation from the Caucasian sample (F_ST as high as 0.61). This trend is also present for rs8042680 (PRC1), which has a much higher risk allele frequency in the Mongolian sample than in Caucasians (0.92 versus 0.72; F_ST = 0.67). Although it is difficult to postulate the cause of the population difference, a high proportion of Mongolians appears to carry this risk allele.
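The percentage differences quoted above follow directly from the genotype means. A short check, using only the three mean TG values given in the text (the helper name `percent_higher` is hypothetical):

```python
# Mean TG (mmol/L) by rs7578326 genotype, as reported in the text.
mean_tg = {"AA": 2.65, "AG": 2.12, "GG": 2.03}


def percent_higher(value, reference):
    """Relative difference of value over reference, in percent."""
    return 100.0 * (value - reference) / reference


aa_vs_ag = percent_higher(mean_tg["AA"], mean_tg["AG"])
ag_vs_gg = percent_higher(mean_tg["AG"], mean_tg["GG"])
print(round(aa_vs_ag), round(ag_vs_gg))  # prints "25 4"
```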
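The comparison of 6 out of 11 versus 2 out of 9 significant SNPs can be examined with a 2 × 2 test. The text does not say which test was used to judge this trend, so the two-sided Fisher's exact test below is an assumption chosen for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more likely than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Rows: SNPs with higher vs lower risk allele frequency than CEU.
# Columns: significantly associated with T2D vs not.
p_value = fisher_exact_two_sided(6, 5, 2, 7)
print(round(p_value, 3))  # about 0.197
```

Under this assumed test the difference in proportions is not significant at the 0.05 level, which agrees with the statement in the text.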

Discussion
In this study, we examined the association of 34 GWAS SNPs, previously identified in European and East Asian populations, with susceptibility to T2D in a Mongolian sample from China. Six SNPs did not pass quality control filtering and were excluded from the analysis. This study confirmed the T2D association of rs2237897 in KCNQ1 that was reported in European, Mexican, Chinese, Japanese, and Mongolian populations [8]. We also replicated the T2D association of three SNPs previously identified in Asian samples, namely, rs163182 (KCNQ1) [13] in Japanese, rs6723108 (TMEM163) [23] in Indian, and rs10229583 (PAX4) [24] in Chinese samples. In addition, our study replicated the T2D association of seven additional GWAS SNPs. To our knowledge, the T2D association of rs7578326 (IRS1), rs1531343 (HMGA2), rs8042680 (PRC1), rs1387153 (MTNR1B), rs7578597 (THADA) [25], rs243021 (BCL11A) [26], and rs1333051 (CDKN2) [27] was replicated in an East Asian sample for the first time. Our study was thus able to replicate a result obtained in a population with a similar ethnic background and to extend the replication of several other loci to an Asian population.
Although the association is not significant after multiple test correction, six additional SNPs show an OR trend consistent with the originally reported studies. SNPs found to be associated with T2D in Asian populations, rs1048886 (C6orf57) [5], rs4402960 (IGF2BP2) [28], rs5015480 (HHEX, IDE), rs1359790 (SPRY2) [29], rs1552224 (CENTD2), rs3923113 (GRB14), rs5215 (KCNJ11), rs7903146 (TCF7L2) [6], rs10886471 (GRK5), and rs7403531 (RASGRP1) [30], were not replicated in our study. Several factors may explain this observation: (1) our study has limited power because of its relatively small sample size, which warrants further confirmation of these SNPs in a larger sample from the same ethnic group; (2) our sample shows variable degrees of difference in risk allele frequency compared to the Caucasian populations in which most of the original GWAS were conducted; and (3) the controls were not matched precisely to the cases, in particular with respect to age, and the healthy controls tend to be younger than the cases. However, we used age, sex, and BMI as covariates in our statistical analysis to minimize the effect of this disparity. A larger follow-up study should recruit controls that are more closely matched to the cases.
None of the SNPs we tested for association with T2D here has previously been reported to be associated with lipids in the NHGRI GWAS catalog (available at http://www.genome.gov/gwastudies/, November, 2012). However, we observed that the T2D risk allele (A, major allele) of rs7578326 (IRS1) is associated with an increased level of TG. An elevated level of TG has been implicated as a risk factor for T2D, likely resulting from diminished insulin activity, which reduces the inhibition of microsomal TG transfer protein activity [31]. The major allele (A) of a SNP in the upstream region of the same gene (IRS1), namely, rs2972146, is reported to be associated with an elevated level of TG as well [32]. rs2972146 is in nominal linkage disequilibrium with the SNP reported here (r² = 0.3753, 1000G phase I), indicating a possibility that the T2D-associated SNP or its proxy could play a role in the pathogenesis of T2D or in related TG metabolism through IRS1 activity. Further functional work will help to clarify the role of this SNP. Since we did not have information on the patients' previous treatment for either high cholesterol or T2D, it is possible that such treatments, if administered previously, prevented us from seeing the effect of SNP association with the diabetes-related quantitative traits, including TG.
Mongolians are one of the peoples residing on the Mongolian Plateau in Asia and have long depended on a nomadic lifestyle in a harsh environment characterized by low temperatures and scarce food sources [33]. Although we do not have any concrete evidence that these variants play a role in conserving energy, we found six T2D-associated SNPs in our sample whose risk allele frequency is at least 10% higher than in Caucasian samples. These SNPs have a risk allele frequency comparable to that of the Chinese populations in the 1000G project, indicating that our allele frequency estimates reflect the allelic structure of these SNPs in Asian populations. We observed that rs6723108 (TMEM163) and rs8042680 (PRC1) show substantial allelic differences in our sample compared to Caucasians and that their risk alleles are near fixation in Mongolians. On the other hand, rs7903146 (TCF7L2), a T2D SNP widely replicated in European populations, has a substantially lower risk allele frequency in the Mongolian sample (0.06 versus 0.31 in the Caucasian sample; F_ST = 0.27) and is not associated with T2D in our sample of modest size. It is likely that the frequency differences of some SNPs between our sample and European populations also contributed to the lack of reproducibility of T2D association in our study.
Although we note that rs6723108 has a wide confidence interval for its OR estimate (95% CI 2-33.3; risk allele frequency 0.98), the OR of 7.7 in our study is substantially greater than that reported in the original study in an Indian population [23] (OR = 1.31, 95% CI 1.20-1.44; risk allele frequency 0.89), indicating a potentially larger effect of the variant in our sample. More systematic studies of population-specific allelic architecture with respect to T2D are needed to dissect the potential impact of these highly differentiated SNPs in different populations.
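To make the difference in precision between the two estimates concrete, the sketch below recovers an approximate standard error of the log odds ratio from each reported 95% CI. It assumes Wald-type intervals that are symmetric on the log scale, which the text does not state, and the reported bounds are rounded, so the values are rough. The function name is hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def log_or_standard_error(ci_lower, ci_upper):
    """Approximate SE of log(OR) from a 95% CI, assuming a Wald interval."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)


se_mongolian = log_or_standard_error(2.0, 33.3)  # this study
se_indian = log_or_standard_error(1.20, 1.44)    # original Indian study [23]
print(round(se_mongolian, 2), round(se_indian, 3))
print(round(se_mongolian / se_indian, 1))
```

Under these assumptions the standard error in the Mongolian sample is more than ten times that of the original study, which is expected when the risk allele is near fixation and few nonrisk alleles are observed.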
In conclusion, our association study has confirmed association of several previously identified T2D susceptibility loci in the Mongolian sample. We also identified rs7578326 near IRS1 to be associated with an increased level of TG. The observation of the remarkable allele frequency difference of the T2D SNPs in our sample compared to Caucasians is important in further identifying causative variants for T2D and understanding the role of these SNPs in development of T2D in different ethnic populations.