Genetic Risk Score Modelling for Disease Progression in New-Onset Type 1 Diabetes Patients: Increased Genetic Load of Islet-Expressed and Cytokine-Regulated Candidate Genes Predicts Poorer Glycemic Control

Genome-wide association studies (GWAS) have identified over 40 type 1 diabetes risk loci. The clinical impact of these loci on β-cell function during disease progression is unknown. We aimed at testing whether a genetic risk score could predict glycemic control and residual β-cell function in type 1 diabetes (T1D). As gene expression may represent an intermediate phenotype between genetic variation and disease, we hypothesized that genes within T1D loci which are expressed in islets and transcriptionally regulated by proinflammatory cytokines would be the best predictors of disease progression. Two-thirds of 46 GWAS candidate genes examined were expressed in human islets, and 11 of these significantly changed expression levels following exposure to proinflammatory cytokines (IL-1β + IFNγ + TNFα) for 48 h. Using the GWAS single nucleotide polymorphisms (SNPs) from each locus, we constructed a genetic risk score based on the cumulative number of risk alleles carried in children with newly diagnosed T1D. With each additional risk allele carried, HbA1c levels increased significantly within first year after diagnosis. Network and gene ontology (GO) analyses revealed that several of the 11 candidate genes have overlapping biological functions and interact in a common network. Our results may help predict disease progression in newly diagnosed children with T1D which can be exploited for optimizing treatment.


Introduction
In type 1 diabetes (T1D) the pancreatic -cells are destroyed by the immune system in a process involving the proinflammatory cytokines interleukin-1-(IL-1 ), interferon-(IFN ), and tumor necrosis factor-(TNF ) released from antigen-presenting cells and T-cells [1,2]. Genomewide association scans (GWAS) have identified more than 40 genomic regions that are associated with T1D risk [3] (http://www.t1dbase.org). Many of the GWAS candidate genes have annotated immune-cell functions and most of the genetic risk variants have therefore been suggested to modulate immune-regulatory pathways [4,5]. However, recent studies have highlighted that a significant proportion of the candidate genes are also expressed in human islets suggesting functional effects in -cells [6][7][8] and possibly involvement in inflammation-and immune-mediated -cell killing mechanisms thereby potentially affecting disease progression after clinical onset [9]. As most variants identified through GWAS contribute to only modest effects to disease risk, it is likely that a combination of variants will better capture effects of clinical relevance. In T1D, very few studies have analyzed 2 Journal of Diabetes Research the impact of multiple variants on disease prediction and progression [10][11][12], although candidate gene-focused studies have demonstrated association with parameters of disease progression [13][14][15][16].
In the current study, we aimed at investigating whether a combined genetic risk score of T1D risk variants can predict glycemic control and residual -cell function as assessed by HbA1c and insulin dose-adjusted HbA1c (IDAA1c) during disease progression in children with newly diagnosed T1D. We exclusively included SNPs for candidate genes expressed and transcriptionally regulated by cytokines in the target tissue of T1D, that is, human islets, as we hypothesized that these qualify as the most directly involved predictors.

Expression Profiling of Candidate Genes in Human Islets.
Human pancreatic islet preparations from nine nondiabetic donors (aged 8-57 years; 6 males and 3 females) were obtained from a multicenter European Union-supported program on -cell transplantation in diabetes. None had classical T1D-associated HLA-DR risk genotypes. The program was approved by central and local ethical committees. Islet preparation, cytokine stimulation (5000 U/mL TNF + 750 U/mL IFN + 75 U/mL IL-1 for 48 h), and RNA extraction have been described previously [17]. Relative gene expression of candidate genes was evaluated by TaqMan assays using the Low Density Array system on TaqMan 7900HT (Applied Biosystems). Target gene expression was normalized to the geometric mean of three housekeeping genes (GAPDH, 18S-RNA, and PPIA) and evaluated using the delta-delta Ct method [18]. One of the identified genes (IL10) whose expression was modulated following cytokine treatment was only detected in three of the human islet preparations. Genes with Ct values < 37 were considered as expressed.

Study Populations from the Hvidoere Study Group (HSG)
on Childhood Diabetes. The study population was collected through HSG and is described in [19]. The cohort included in total 257 children (126 girls and 131 boys). Eighty-four percent of the patients were white Caucasian, and age at clinical diagnosis was 9.1 ± 3.7 years (mean ± SEM), BMI 16.5 ± 3.2 kg/m 2 , and HbA1c 11.2 ± 2.1% at the time of diagnosis. DKA (HCO3 ≤ 15 mmol/L and/or pH ≤ 7.30) was present in 20.7% of the cases at the time of diagnosis. Exclusion criteria were suspected non-T1D (type 2 diabetes, maturityonset diabetes of the young (MODY), or secondary diabetes), decline of enrolment into the study by patients or parents, and patients initially treated outside of the centers for more than 5 days. The diagnosis of T1D was according to the World Health Organization criteria. The study was performed according to the criteria of the Helsinki II Declaration and was approved by the local ethic committee in each center. All patients, their parents, or guardians gave informed consent. In the current study, patients with missing values for genotyping and clinical outcome measures were excluded leaving a total of 182 patients with complete genotype profile and clinical characterization.

Gene Ontology Terms and Network Construction.
We used PANTHER [21] to perform functional annotation of the 11 input candidate genes. The enrichment for gene ontology (GO) terms in the biological process category was identified based on binomial test. The human genome was used as the reference list. To construct protein networks on the 11 input candidate genes, the STRING network tool was used. STRING is a database of known and predicted protein interaction data from multiple sources including experiments, coexpression, and text mining. In total, STRING covers nearly 10,000,000 proteins from over 2,000 organisms (http://string-db.org). Network was built with a medium confidence score (0.400) and up to 10 interactors.

Statistical Analysis.
A genetic risk score was calculated for each individual based on the cumulative number of risk alleles carried for the 11 SNPs and was used as a continuous variable to test for association with IDAA1c and HbA1c levels at 1, 3, 6, 9, and 12 months after T1D onset in linear regression models. The assigned risk alleles for CTSH and SKAP2 were opposite compared to risk of T1D due to regression analyses from individual SNP models. Regression models were adjusted for the covariates sex, age group (0-5, 5-10, and >10 years at diagnosis), and HLA risk groups. Forward stepwise regression models were selected from all SNPs and covariates. A value below 0.05 was considered statistically significant. All statistical analyses were performed in SAS version 9.2.

Cytokine-Induced Gene Expression in Human Islets.
Gene expression may represent an intermediate phenotype between genetic variation and disease. We therefore first evaluated the expression of established/pinpointed T1D GWAS candidate genes in human islets left untreated or exposed to a combination of proinflammatory cytokines (IL-1 + IFN + TNF ) for 48 hrs to mimic disease. We found 31 out of 46 tested genes to be expressed. Of these, 11 significantly changed their expression level following cytokine treatment ( < 0.05) ( Table 1). Six candidate genes were upregulated by cytokines, TNFAIP3, IFIH1, GSDMB, IL7R, IL10, and SH2B3, whereas 5 genes were downregulated, COBL, CTRB1, CTSH, SKAP2, and INS ( Figure 1). Comparable expression profiles of these genes were observed in a recently published human islet dataset [6].

Genetic Risk Score Modelling of Glycemic Control and -Cell Function.
A genetic risk score model was constructed from the GWAS-identified SNPs linked to the 11 genes identified above to investigate the cumulative effect of T1Dassociated risk alleles on disease progression in new-onset T1D children. The risk allele distribution is described in Supplementary  [20]) levels were increased in carriers with risk allele numbers at and above the 75th percentile (corresponding to minimum 15 risk alleles) during disease progression (HbA1c: 3, 6, and 9 months after disease onset, = 0.04, = 0.0004, and = 0.03, resp.; and IDAA1c: 9 months, = 0.04) (Figures 2(a) and 2(b)). We then performed a multiple linear regression analysis adjusted for age, sex, and HLA risk groups and found significantly increased HbA1c and IDAA1c levels with increasing genetic risk score (GRS) from 3-12 months following T1D onset ( Table 2). The validity of including GRS in the regression analysis was tested by comparing the variance explained by the model ( 2 ). This clearly showed that including GRS as explanatory factor improved the model (Supplementary Table 2). These findings suggest that residual -cell function declines faster following diagnosis in patients carrying increased genetic load of islet-expressed and cytokineregulated candidate genes.

Network and GO Analyses of Candidate Genes.
We next asked if any of the 11 candidate genes may interact with each other in a functional protein network which could explain their cumulative effects on disease progression. This was evaluated by the STRING network tool which constructed a network that contained 7 out of the 11 genes ( Figure 3). Consistent with this, the functional annotation of these candidate genes based on GO analyses revealed significantly enriched GO terms in biological processes category ( Table 3). The 11 candidate genes were found enriched for various immune-mediated processes including regulation of immune response ( = 0.0008) and immune system process ( = 0.01). These findings support that several of the 11 candidate genes act in common networks and pathways to affect disease risk and progression.

Discussion
Recent GWAS have identified a large number of loci affecting T1D risk [3]. In this study, we investigated the clinical relevance of a genetic risk score on markers of disease progression. An increased genetic risk score associated with increasing HbA1c and IDAA1c levels the first year after disease onset, indicating that a higher genetic load of isletexpressed candidate genes predicts poorer glycemic control and residual -cell function, respectively. One additional risk allele resulted in a 0.15% point increase in HbA1c after 12 months and a corresponding 0.19% increase in IDAA1c corresponding to a calculated 4% lower stimulated C-peptide [20]. Our cumulative genetic risk score assumes that each risk variant contributes with equal effects to the traits, which probably does not reflect the true underlying biology. An alternative approach would be to weight each variant by published effect sizes for T1D risk, which has been done in type 2 diabetes [22,23]. We chose the unweighted The genes that were transcriptionally regulated by cytokines in human islets are highlighted in bold, as are the corresponding risk SNPs included in the genetic risk score analysis.
HbA1c (%) * * * * *  The influence of increasing risk allele number on HbA1c and IDAA1c analyzed by genetic risk score generated from 11 qualified T1D genes in linear regression analysis adjusted for age, sex, and HLA risk groups. Data are presented as increase in HbA1c (%) and IDAA1c per additional risk allele during the first year after diagnosis in 182 children with new onset T1D.
cumulative score because disease risk and disease progression are different outcomes, which will likely not have identical effect sizes. This is underlined by our previous observation that there is no statistically significant association between HLA risk and T1D progression [19]. This is also the reason why we did not include HLA risk genes in the risk score model. The observed poorer glycemic control associated with higher genetic load might prove to be a valuable tool for prediction of disease progression. This should, however, be validated in independent cohorts. An advantage of our study is that the inclusion of variants in the genetic risk score was based on prior "biological" knowledge, as we strictly focused on islet-expressed and cytokine-regulated candidate genes. Because regulated gene expression is often highly dynamic due to positive and negative feedback mechanisms, that is, the expression of a specific gene might be increased at one time point but decreased in another and vice versa, we did not take into account in the risk score model whether genes were up-or downregulated by cytokines but simply focused on the fact that their expression level changed as we considered this most important. We may have missed genes that changed expression at different time points compared to those examined at the 48 hrs, and a more detailed time-course study in human islets would likely have allowed a greater number of SNPs to be included in the risk score and thus provide even more accurate predictions.
Interestingly, we found that 7 of the 11 investigated genes interact in a protein network and several of the genes also shared GO terms suggesting that they affect the same biological mechanisms within the -cells. We hypothesize that the genes are modulating -cell function in terms of insulin secretion and/or the regenerative capacity and/or regulate the vulnerability of the -cell to immune-mediated destruction. Indeed for some of the candidate genes, functional studies in -cells have been performed. Hence, we recently demonstrated that CTSH regulates insulin gene transcription and secretion and also has antiapoptotic properties in -cells 6 Journal of Diabetes Research  Figure 3: Protein interaction network of the 11 genes. The network was constructed using the STRING tool (http://string-db.org) and the 11 candidate genes as input. The width of the interactions depends on the confidence score to each association in STRING. [16]. Similarly, A20, the protein name of the gene product encoded by TNFAIP3, is an antiapoptotic protein that inhibits apoptosis induced by cytokines by blocking activation of the transcription factor NF B [24]. In conclusion, a cumulative genetic risk score comprising variants from 11 islet-expressed candidate genes predicted significantly poorer glycemic control and -cell function during disease progression in newonset T1D children. This knowledge might be useful to better predict disease progression after diagnosis with T1D.