Identifying Candidate Genes for Type 2 Diabetes Mellitus and Obesity through Gene Expression Profiling in Multiple Tissues or Cells

Type 2 Diabetes Mellitus (T2DM) and obesity have become increasingly prevalent in recent years. Recent studies have focused on identifying causal variations or candidate genes for obesity and T2DM via analysis of expression quantitative trait loci (eQTL) within a single tissue. However, T2DM and obesity are affected by comprehensive sets of genes in multiple tissues. In the current study, gene expression levels in multiple human tissues from GEO datasets were analyzed, and 21 candidate genes displaying high percentages of differential expression were identified. Specifically, DENND1B, LYN, MRPL30, POC1B, PRKCB, RP4-655J12.3, HIBADH, and TMBIM4 were identified from the T2DM-control study, and BCAT1, BMP2K, CSRNP2, MYNN, NCKAP5L, SAP30BP, SLC35B4, SP1, BAP1, GRB14, HSP90AB1, ITGA5, and TOMM5 were identified from the obesity-control study. The majority of these genes are known to be involved in T2DM and obesity. Therefore, analysis of gene expression in various tissues using GEO datasets may be an effective and feasible method to determine novel or causal genes associated with T2DM and obesity.


Introduction
T2DM, a complex endocrine and metabolic disorder, has become more prevalent in recent years, with significant adverse effects on human health. T2DM is characterized by insulin resistance (IR) and deficient β-cell function [1]. Interactions between multiple genetic and environmental factors are proposed to contribute to pathogenesis of the disease [1,2]. An association of obesity with T2DM has been reported, both within and among different populations [3]. Earlier research has shown that obesity and its duration are major risk factors for T2DM, and that a pathological state of IR is generally present in obesity [4,5].
In recent years, numerous susceptibility loci have been identified through genome-wide association studies (GWAS) and meta-analyses on T2DM and obesity, and nearby candidate genes are proposed to be directly involved in the diseases [6,7]. However, the underlying mechanisms by which these susceptibility loci affect and cause T2DM or obesity are currently unclear. Known SNPs associated with disease typically account for only a small fraction of overall disease [8,9]. Gene expression patterns play a key role in determining pathogenesis and candidate genes of T2DM and obesity. A large-scale computable model has been created to analyze the molecular actions and effects of insulin on muscle gene expression [10]. Based on GWAS results, investigators integrated expression quantitative trait loci (eQTL) with coexpression networks to establish novel genes and networks relevant to the disease. Sixty-two candidate genes were identified through integrating 32 SNPs associated with T2DM and nearby gene expression from blood samples of 1008 morbidly obese patients. Many of the highly ranked genes are known to be involved in the regulation and metabolism of insulin, glucose, and lipids [11]. Different gene expression patterns exist in various tissues of organisms, and complex metabolic diseases, such as T2DM and obesity, are affected by comprehensive gene expression in multiple tissues. Analysis of gene expression in six tissues of mice from obesity-induced diabetes-resistant and diabetes-susceptible strains before and after the onset of diabetes led to the identification of 105 coexpression gene modules [12]. In the present study, gene expression profiles of human skeletal muscle, adipose tissue, islet, liver, blood and arterial tissue (or skeletal muscle, omental adipose tissue, cumulus cells, liver, blood, and subcutaneous abdominal adipose tissue) from GEO datasets were analyzed to identify the candidate genes for T2DM and obesity. Furthermore, candidate genes 1 Mb upstream and downstream (±1 Mb) of susceptibility SNPs for human T2DM and obesity were screened. Our analysis of gene expression in various tissues using GEO datasets provides a valuable method to determine novel candidate genes for T2DM and obesity.

Materials and Methods
The overall experimental design is shown in Figure 1.

GEO Dataset Selection and Statistical Analysis.
Human GEO datasets for T2DM or obesity were downloaded from the Gene Expression Omnibus (GEO) database of NCBI (http://www.ncbi.nlm.nih.gov/gds/). In total, 23 datasets (14 for T2DM and 9 for obesity) were selected and downloaded. Some datasets were separated into several groups according to sample phenotype. Overall, 21 groups for T2DM and 14 for obesity were obtained. Samples of disease and control were included in the case and control subgroups, respectively (details of samples for each group are provided in Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2013/970435). Three or more samples were included in each case or control subgroup for every microarray experiment. CEL files of samples were submitted to RMAExpress, Version 1.0.4, with default parameters to yield normalized log2 expression values for each probe in individual groups [13]. Analysis of variance (ANOVA) for normalized log2 expression values of two independent samples in each group was performed with the F test. The t-test for equal or unequal variances was used, depending on the P-value of the F test.
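The variance check followed by the choice between the two forms of the t-test can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`betai`, `two_sided_t_p`, `f_test_p`, `compare_groups`) are hypothetical, an F test is assumed for the variance comparison, the 0.05 cutoff for that test is assumed, and the inputs are taken to be normalized log2 values of one probe in a case and a control subgroup with nonzero variance.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def two_sided_t_p(t, df):
    """Two-sided P value of a t statistic with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def f_test_p(var1, var2, d1, d2):
    """Two-sided P value of the variance-ratio F test."""
    f = var1 / var2
    cdf = betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))


def compare_groups(case, control, alpha=0.05):
    """Return (t-test P value, True if the equal-variance form was used)."""
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)
    equal_var = f_test_p(v1, v2, n1 - 1, n2 - 1) > alpha  # assumed cutoff
    if equal_var:
        # Student's t-test with a pooled variance estimate.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom.
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))
    t = (mean(case) - mean(control)) / se
    return two_sided_t_p(t, df), equal_var


# Illustrative log2 values for one probe (made-up numbers).
p_value, used_equal_var = compare_groups([10.1, 10.2, 9.9, 10.0],
                                         [8.0, 8.1, 7.9, 8.2])
```

In practice a statistics library would normally replace the hand-written distribution functions; they are spelled out here only so that the sketch has no dependencies.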
Gene annotation files were downloaded from Ensembl (http://asia.ensembl.org/biomart/martview/45e0798c53bbd-97ed0cf3d61142da3df) depending on the platform (GPL) of each group. Probes were matched with unique genes through gene annotation files. Probes corresponding to more than one gene were excluded. Probes or genes with significant differential expression were defined as those with a P-value ≤ 0.05. We calculated the differential expression (P ≤ 0.05) percentage of each gene in all 21 T2DM and 14 obesity groups. For a gene with several probes, P values ≤ 0.05 were selected to represent significance.

Statistical Analysis of Differential Expression Percentages of Genes.
Genes were ranked based on differential expression in the T2DM and obesity groups. Genes with the highest percentage of differential expression were identified as candidates (≥50% for T2DM and ≥60% for obesity). Ranked genes are presented in Supplementary Materials (Table S2).
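The percentage calculation and ranking can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function names and the input layout are hypothetical, a gene is counted as differentially expressed in a group when any of its probes has P ≤ 0.05, and the percentage is taken over the groups in which the gene was measured (the reported 61.1% is consistent with 11 of 18 groups, which suggests this denominator, but the text does not state it explicitly).

```python
def differential_expression_percentages(groups, alpha=0.05, min_groups=1):
    """Map each gene to the percentage of groups in which it is significant.

    groups: {group_id: {gene: [probe P values]}} (hypothetical layout).
    Genes measured in fewer than min_groups groups are dropped.
    """
    significant, measured = {}, {}
    for gene_pvalues in groups.values():
        for gene, probe_pvalues in gene_pvalues.items():
            if not probe_pvalues:
                continue
            measured[gene] = measured.get(gene, 0) + 1
            # Assumption: one significant probe is enough for the gene.
            if min(probe_pvalues) <= alpha:
                significant[gene] = significant.get(gene, 0) + 1
    return {gene: 100.0 * significant.get(gene, 0) / n
            for gene, n in measured.items() if n >= min_groups}


def rank_candidates(percentages, cutoff):
    """Sort genes by descending percentage and keep those at or above cutoff."""
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, pct) for gene, pct in ranked if pct >= cutoff]


# Made-up example with placeholder gene names.
example_groups = {
    "group1": {"GENE_A": [0.01, 0.40], "GENE_B": [0.20]},
    "group2": {"GENE_A": [0.03], "GENE_B": [0.04]},
    "group3": {"GENE_A": [0.50], "GENE_B": [0.60]},
    "group4": {"GENE_A": [0.02]},
}
percentages = differential_expression_percentages(example_groups)
candidates = rank_candidates(percentages, cutoff=50.0)
```

In this toy input GENE_A is significant in three of its four groups and GENE_B in one of its three, so only GENE_A passes a 50% cutoff.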

Screening of Genes within ±1 Mb of Susceptibility SNPs for T2DM and Obesity.
In total, 54 and 95 SNPs associated with T2DM and obesity, respectively, were selected (P ≤ 5 × 10^-8; detailed information in Tables S3 and S4). The chromosomal coordinate of each SNP was retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/SNP/). Consensus CDS (CCDS) files for human data were downloaded (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/), and genes within ±1 Mb of the SNPs were extracted. Overall, 445 and 917 genes lay within the 2 Mb windows (±1 Mb) around the T2DM and obesity SNPs, respectively. The genes were reordered based on differential expression percentages with the above method, and those with the highest percentages were selected as candidates for T2DM (>40%) and obesity (>50%). Detailed information on all ranked genes in close proximity to SNPs is provided in Supplementary Materials (Table S5).
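The ±1 Mb screening step can be illustrated with a short sketch. The function name, the input layout, and all coordinates below are hypothetical, and the code assumes that a gene counts as "within ±1 Mb" when any part of it overlaps the window around the SNP; the text does not say whether overlap or full containment was required.

```python
WINDOW_BP = 1_000_000  # 1 Mb on each side of the SNP


def genes_near_snps(genes, snps, window=WINDOW_BP):
    """Return {gene: sorted list of SNP ids whose window the gene overlaps}.

    genes: {gene: (chromosome, start, end)}
    snps:  {snp_id: (chromosome, position)}
    """
    hits = {}
    for gene, (gene_chrom, start, end) in genes.items():
        for snp_id, (snp_chrom, pos) in snps.items():
            same_chrom = gene_chrom == snp_chrom
            # Interval overlap between [start, end] and [pos - window, pos + window].
            overlaps = start <= pos + window and end >= pos - window
            if same_chrom and overlaps:
                hits.setdefault(gene, []).append(snp_id)
    return {gene: sorted(ids) for gene, ids in hits.items()}


# Made-up coordinates with placeholder names, for illustration only.
example_genes = {
    "GENE_NEAR": ("7", 27_500_000, 27_650_000),
    "GENE_FAR": ("7", 40_000_000, 40_100_000),
    "GENE_OTHER_CHROM": ("12", 27_500_000, 27_650_000),
}
example_snps = {"snp_1": ("7", 28_100_000), "snp_2": ("7", 28_400_000)}
nearby = genes_near_snps(example_genes, example_snps)
```

Here only GENE_NEAR is retained, and it is linked to both example SNPs because it overlaps both windows.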

GO (Gene Ontology) and Pathway Analysis of Candidate Genes.
Enrichment analysis of GO and pathways of all candidate genes was performed using CapitalBio Molecule Annotation System 3 (http://bioinfo.capitalbio.com/mas3/).

Results

Candidate Genes for T2DM and Obesity.
In total, expression patterns of 23,810 genes were analyzed in the T2DM-control study. All genes were ranked based on the differential expression percentage. The average percentage of all genes was ∼11%. Six highly ranked genes (DENND1B, LYN, MRPL30, POC1B, PRKCB, and RP4-655J12.3) were identified as candidates for T2DM (Table 1).
Since fewer groups were available for the obesity-control study, genes with fewer than 10 P values were excluded in order to obtain better statistical results. Expression of 14,367 genes was analyzed using the above method. The average percentage of all genes was ∼17.5%. Eight genes (BCAT1, BMP2K, CSRNP2, MYNN, NCKAP5L, SAP30BP, SLC35B4, and SP1) were identified as candidates for obesity (Table 2).

Candidate Genes within ±1 Mb of SNPs Conferring Susceptibility to T2DM and Obesity.
In total, 445 genes in close proximity to T2DM SNPs were reordered based on their differential expression percentages. In particular, two highly ranked genes, HIBADH and TMBIM4, within ±1 Mb of the SNPs rs864745, rs849134, and rs1531343 were identified (Table 1).
Gene symbols and the corresponding full names of all candidate genes are supplied in Tables 1 and 2.

GO and Pathway Analysis of Candidate Genes.
Results of GO and pathway analyses revealed that PRKCB is mainly associated with T2DM, and PRKCB and GRB14 are involved in insulin signaling within the gene pathway network (Figure 2). Further analysis of the correlation pathways of genes disclosed that PRKCB, SP1, GRB14, LYN, and ITGA5 are correlated with each other (Figure 3).

Discussion
Complex metabolic diseases are often caused by alterations in gene expression or metabolic pathways in various tissues. Here, we analyzed differences in gene expression levels in various human tissues from GEO datasets in T2DM- or obesity-control experiments with the t-test. The P values were adjusted using the Bonferroni or FDR method [20] to allow for multiple testing. We introduced strict criteria with FDR ≤ 0.05. However, with these criteria, no genes passed the filter in most groups (16 of 21 groups, Table S6), while the percentage of genes with t-test P values ≤ 0.05 was lower than 10% in most groups (15 of 21 groups, Table S6). Therefore, the unadjusted t-test P value was ultimately applied for the present study. In total, we identified 21 candidate genes (8 for T2DM and 13 for obesity). The list of up- and downregulated candidate genes is provided in Supplementary Material (Table S7). Similarly, an eGWAS was performed across 130 independent experiments in human, rat, and mouse to identify additional genes implicated in the molecular pathogenesis of T2DM [21]. Interestingly, the same genes were not identified among the different studies. These discrepancies may be attributed to the use of various species, statistical methods, and tissues by different groups.
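The effect of multiple-testing adjustment on a list of P values can be illustrated with a small sketch. It assumes that the FDR method referred to is the Benjamini-Hochberg step-up procedure, which the text does not state explicitly; the function names and the example P values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up P values for four probes.
raw = [0.001, 0.04, 0.03, 0.2]
n_raw = sum(p <= 0.05 for p in raw)
n_fdr = sum(q <= 0.05 for q in benjamini_hochberg(raw))
n_bonferroni = sum(q <= 0.05 for q in bonferroni(raw))
```

In this toy example three probes pass the unadjusted P ≤ 0.05 threshold but only one remains after either adjustment. With tens of thousands of probes and only a few samples per subgroup the loss is typically far more severe, which is consistent with the empty gene lists reported for most groups.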
Analysis of the correlation pathways of the identified genes revealed that PRKCB, SP1, GRB14, LYN, and ITGA5 are correlated with each other (Figures 2 and 3). The proteins interact directly within cells or indirectly among different tissues in the etiological process of T2DM or obesity. PRKCB mediates Ca2+- and DAG-evoked insulin secretion in the islets of Langerhans [22], functions downstream of insulin receptor substrate 1 (IRS1) in muscle cells, and participates in the regulation of glucose transport in adipocytes by negatively modulating insulin-stimulated translocation of the glucose transporter SLC2A4/GLUT4 [23,24]. Under high glucose conditions in pancreatic beta cells, PRKCB may be involved in the inhibition of insulin gene transcription [25]. In the present study, we observed PRKCB upregulation in skeletal muscle, islets, adipose tissue, and blood and downregulation in liver of T2DM individuals (Table S7). These findings suggest that PRKCB may be involved in IR and deficient β-cell function in vivo. GRB14 binds directly to the insulin receptor and regulates insulin-induced receptor tyrosine phosphorylation [19]. GRB14-deficient mice display enhanced insulin signaling via IRS1 and AKT activation in liver and skeletal muscle, despite lower circulating insulin levels [18]. An earlier study showed increased GRB14 expression in adipose tissues of both ob/ob mice and Goto-Kakizaki (GK) rats, but no changes in liver [26]. In our experiments, GRB14 expression was similarly increased in subcutaneous adipose tissue of obese humans, while a decrease was observed in liver (Table S7). In addition, GRB14 is located within ±1 Mb of the obesity SNP rs10195252, and the rs10195252 T-allele is associated with increased GRB14 mRNA expression in subcutaneous adipose tissue [27]. LYN is implicated in the insulin signaling pathway via phosphorylation of IRS1 and PI3K in liver and adipose tissues [14]. The insulin secretagogue glimepiride activates LYN in adipocytes [28]. This indirect LYN activation may modulate the glycemic control activity of glimepiride in the extrapancreatic environment [28,29]. In the present study, LYN expression was increased in adipose tissue, skeletal muscle, and blood of T2DM individuals, while a decrease was observed in islets and liver (Table S7). Moreover, LYN is a highly ranked gene with the highest differential expression percentage in the T2DM-control study (61.1%) and may therefore be a valuable candidate gene for future T2DM research. ITGA5 additionally promotes PI3K and AKT phosphorylation [30]. ITGA5 expression was shown to be upregulated in adipose tissue of New Zealand obese (NZO) mice (high fat diet versus standard diet) [31]. We observed increased expression of ITGA5 in human subcutaneous adipose tissue (Table S7). Moreover, ITGA5 is located within ±1 Mb of the obesity SNP rs1443512. SP1 is a zinc finger transcription factor that binds to GC-rich motifs and may be involved in insulin-mediated glucose uptake through positively regulating Glut4 expression in adipose tissue, skeletal muscle, and heart [32,33]. SP1 was downregulated in adipose tissue, while increased expression was observed in blood. Pathway analysis revealed the involvement of SP1 in oxidative stress and adipogenesis (Figure 2). SP1 is not only located within ±1 Mb of the obesity SNP rs1443512 (similar to ITGA5) but also has the highest differential expression percentage (63.6%). Therefore, further studies are necessary to determine whether rs1443512 is related to ITGA5 or SP1 expression.
Differential expression of HIBADH ((+) 5.1 − 03) was reported in liver mitochondria during development of Goto-Kakizaki (GK) rats [15]. We observed no changes in HIBADH expression in liver, while a decrease was evident in skeletal muscle and blood of humans with T2DM. In addition, HIBADH is located within ±1 Mb of the T2DM SNPs rs864745 and rs849134. The association of HIBADH with T2DM requires further evaluation.
An earlier study reported higher BCAT1 expression in subcutaneous adipose tissue of females in the insulin-resistant group than in the insulin-sensitive group [34]. Interestingly, higher BCAT1 expression was observed in subcutaneous adipose tissue of obese humans in this study. We additionally recorded an increase in blood and a decrease in omental adipose tissue (Table S7). BCAT1 has been identified as the optimal marker for weight regain [35]. Moreover, the rs2242400 polymorphism in BCAT1 appears to be associated with T2DM in more than one population [36]. SLC35B4 has been identified as a potential regulator of obesity and insulin resistance in mouse models. Both in vivo and in vitro studies in mice disclosed that decreased SLC35B4 expression is associated with a decrease in gluconeogenesis [17]. An increase in SLC35B4 expression was observed in subcutaneous adipose tissue of obese humans in our study (Table S7). Interestingly, a SNP in the human SLC35B4 gene (rs1619682) is associated with waist circumference [16]. HSP90AB1 mRNA is reported to be upregulated in 3T3-L1 cells 6 h after stimulation of adipogenesis [37]. Moreover, HSP90AB1 is located near the obesity SNP rs6905288. Expression levels of MYNN are negatively correlated with body weight (BW) in adipose tissues of F2 mice (C57BL/6J × TALLYHO/JngJ) [38]. Consistently, our data showed that MYNN expression is downregulated in subcutaneous adipose tissue of obese humans (Table S7). Furthermore, SAP30BP expression in adipose tissue may be related to body mass index (BMI) (Pearson correlation, −0.51) [39]. A decrease in SAP30BP expression was detected in subcutaneous adipocytes of obese human subjects in the present study (Table S7).
The rest of the candidates, C2orf15, DENND1B, MRPL30, POC1B, RP4-655J12.3, TMBIM4, BMP2K, CSRNP2, NCKAP5L, TOMM5, and BAP1, may be novel genes related to T2DM or obesity. TMBIM4 is located within ±1 Mb of the SNP rs1531343 conferring susceptibility to T2DM, while NCKAP5L, TOMM5, and BAP1 are mapped within ±1 Mb of SNPs conferring susceptibility to obesity. TMBIM4 encodes transmembrane BAX inhibitor motif containing 4, which inhibits apoptosis induced by intrinsic and extrinsic stimuli and modulates both capacitative Ca2+ entry and inositol 1,4,5-trisphosphate (IP3)-mediated Ca2+ release [40]. In our study, TMBIM4 was mainly upregulated in skeletal muscle, while downregulation was observed in liver (Table S7). The NCKAP5L gene, which encodes Nck-associated protein 5-like, was upregulated in adipose tissue but downregulated in blood (Table S7). TOMM5 encodes the mitochondrial import receptor subunit TOM5 homolog. TOMM5 was mainly annotated with four GO terms (GO:0008565, protein transporter activity; GO:0015031, protein transport; GO:0005739, mitochondrion; GO:0005742, mitochondrial outer membrane). BAP1 (ubiquitin carboxyl-terminal hydrolase) localizes to the nucleus and contains three domains: an N-terminal ubiquitin carboxyl-terminal hydrolase (UCH) catalytic domain, a unique linker region, and a C-terminal domain. The UCH domain confers deubiquitinase activity on BAP1 [41]. In flies and humans, the Polycomb repressive deubiquitinase (PR-DUB) complex is formed through interactions of BAP1 and ASXL1 [42]. DENND1B may promote the exchange of GDP for GTP and play a role in clathrin-mediated endocytosis [43]. The product of MRPL30 is a constituent of mitochondrial ribosomes. POC1B is involved in the early steps of centriole duplication and the later steps of centriole length control [44,45]. The CSRNP2 protein binds to the consensus sequence 5′-AGAGTG-3′ and has transcriptional activator activity. However, C2orf15 and RP4-655J12.3 have rarely been reported in databases or publications to date. Associations of all new candidate genes identified in the present study with obesity or T2DM require verification in future analyses.

Conclusions
LYN, a gene reported to be involved in the insulin pathway, was highly ranked with the highest differential expression percentage in the T2DM-control study (61.1%) and may therefore be a valuable candidate gene for future T2DM research. NCKAP5L with the highest differential expression percentage (63.6%) was located within ±1 Mb of the obesity susceptibility SNP, rs7132908, and was thus identified as the most likely novel candidate gene for obesity. We conclude that analysis of gene expression in various tissues via GEO datasets is an effective and feasible method to identify novel or causal genes associated with T2DM and obesity.