Frequent Loss of Genome Gap Region in 4p16.3 Subtelomere in Early-Onset Type 2 Diabetes Mellitus

A small portion of Type 2 diabetes mellitus (T2DM) is familial, but the majority occurs as sporadic disease. Although causative genes are found in some rare forms, the genetic basis for sporadic T2DM is largely unknown. We searched for a copy number abnormality in 100 early-onset Japanese T2DM patients (onset age <35 years) by whole-genome screening with a copy number variation BeadChip. Within the 1.3-Mb subtelomeric region on chromosome 4p16.3, we found copy number losses in early-onset T2DM (13 of 100 T2DM versus one of 100 controls). This region surrounds a genome gap, which is rich in multiple low copy repeats. Subsequent region-targeted high-density custom-made oligonucleotide microarray experiments verified the copy number losses and delineated structural changes in the 1.3-Mb region. The results suggested that copy number losses of the genes in the deleted region around the genome gap in 4p16.3 may play significant roles in the etiology of T2DM.


Introduction
Type 2 diabetes mellitus (T2DM) is a common metabolic disease, affecting nearly 300 million individuals worldwide. T2DM affects over 10% of adult individuals over 40 years of age in Japan. The continuous increase in the number of patients is a major public health problem worldwide. Loci for rare monogenic forms of diabetes, such as maturity-onset diabetes of the young [1], mitochondrial diabetes [2,3], and Wolfram syndrome [4], have been elucidated in a limited proportion of patients. However, the etiology of sporadic T2DM remains largely unknown. Accumulating epidemiological evidence [5][6][7][8] suggests that genetic factors play an important role in the susceptibility to sporadic T2DM, in addition to environmental factors such as obesity, aging, and exercise.
Copy number variations (CNVs) or structural variations, such as deletion or gain of a genomic region, are increasingly recognized as important interindividual genetic variations across the human genome. CNVs account for more nucleotide variation between two individuals than do SNPs [19][20][21]. Repetitive, multicopy regions, such as segmental duplications and low copy repeats associated with CNV, are regarded as "rearrangement hotspots," and CNV regions are predisposed to the generation of deletion/duplication events [22]. Such repeat-rich regions were recently found to show 13-fold enrichment of CNVs over the average genomic coverage in a reference assembly [23]. Therefore, CNVs or structural variations are recognized as significant contributors to human genetic disease and disease susceptibility [24]. In the search for susceptibility gene(s) for T2DM, we recruited a panel of 100 early-onset Japanese T2DM patients (onset age <35 years) and 100 controls, and performed whole-genome CNV analysis using the deCODE-Illumina CNV370K BeadChip, which focuses on the CNV-rich regions of the human genome, followed by validation and characterization using an Agilent region-targeted high-density custom-made oligonucleotide tiling microarray. We found frequent copy number losses within the 1.3-Mb subtelomeric region in a substantial portion of early-onset Japanese T2DM patients. This region surrounds the genome gap in 4p16.3, which is rich in multiple low copy repeats.

Subjects.
We reasoned that early onset of T2DM reflects a greater contribution of genetic factors than of environmental factors. Therefore, we adopted young-onset diabetic patients as case subjects. We studied 100 unrelated Japanese T2DM patients who developed T2DM before 35 years of age. They were recruited at Tohoku University Hospital and affiliated hospitals and medical clinics. Diabetes was diagnosed using the WHO criteria. Type 1 diabetes mellitus was excluded on the basis of clinical features and the presence of anti-GAD (glutamic acid decarboxylase) antibodies or anti-IA-2 (insulinoma-associated antigen-2) antibodies. Patients with diabetes mellitus due to hepatic disease, pancreatic disease, other endocrinological disease, or mitochondrial DNA mutation, and patients with drug-induced diabetes, were excluded on the basis of laboratory data and clinical history.
We also studied 100 nondiabetic control subjects selected by the following criteria: 60 or more years of age, no prior diagnosis of diabetes mellitus, HbA1c less than 6.4%, and no family history of T2DM within third-degree relatives. These criteria were intended to exclude subjects who were more likely to develop diabetes later. HbA1c (%) was estimated as an NGSP (National Glycohemoglobin Standardization Program) equivalent value (%) calculated by the formula HbA1c (NGSP) (%) = HbA1c (JDS: Japan Diabetes Society value) (%) + 0.4%, which reflects the relationship between HbA1c (JDS) (%), measured with the previous Japanese standard substance and measurement methods, and HbA1c (NGSP) [25].
Clinical features available for the 100 early-onset T2DM patients and 100 controls are shown in Table S1 (see Supplementary Material available online at doi: 10.1155/2011/498460). In addition, clinical features of the 13 early-onset T2DM patients with copy number losses in 4p16.3 are shown separately in Supplementary Table S2 and Supplementary Table S3, in comparison with the remaining early-onset T2DM patients without copy number loss (n = 87).
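The control selection criteria can be illustrated with a minimal sketch. The function and parameter names below are hypothetical and are not taken from the study; only the +0.4 conversion and the four criteria come from the text.

```python
def jds_to_ngsp(hba1c_jds: float) -> float:
    """Convert an HbA1c value (JDS, %) to its NGSP-equivalent value (%).

    Uses the relation stated in the text: NGSP = JDS + 0.4.
    """
    return hba1c_jds + 0.4


def is_eligible_control(age: int, hba1c_jds: float,
                        prior_diabetes: bool,
                        family_history_t2dm: bool) -> bool:
    """Apply the four control criteria described in the text.

    Argument names are illustrative assumptions, not the study's variables.
    """
    return (
        age >= 60                           # 60 or more years of age
        and not prior_diabetes              # no prior diagnosis of diabetes
        and jds_to_ngsp(hba1c_jds) < 6.4    # NGSP-equivalent HbA1c below 6.4%
        and not family_history_t2dm         # no T2DM within third-degree relatives
    )
```

For example, a 65-year-old with an HbA1c (JDS) of 5.5% and no personal or family history would pass, whereas the same person with an HbA1c (JDS) of 6.2% (NGSP-equivalent 6.6%) would not.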
Genetic analysis of human subjects was approved by the ethics committee of Tohoku University Graduate School of Medicine. Appropriate informed consent was obtained from all the subjects examined.

Screening with Whole-Genome CNV BeadChip.
We screened the whole genome by CNV analysis using the deCODE-Illumina CNV370K BeadChip (Illumina Infinium system, deCODE genetics, Inc., Iceland), which, in addition to the Hap300 SNP marker content, has CNV probes designed to target the CNV-rich regions of the whole genome. The CNV part of the platform consists of probes covering CNV-rich regions of the genome, such as megasatellites (tandem repeats >500 bp), duplicons (regions flanked by highly homologous segmental duplications >1 kb), unSNPable regions (>15 kb gaps in the HapMap SNP map, and 5-15 kb gaps with >2 SNPs with Hardy-Weinberg failure), and CNVs registered in the Database of Genomic Variants. The CNV part of the probe content consists of 15,559 CNV segments covering 190 Mb, or 6% of the human genome. The platform has been tested in 4000 Icelandic and HapMap samples.
Data analysis of the deCODE-Illumina CNV chip was carried out using DosageMiner software developed by deCODE genetics, and loss/gain analysis consisted of the following four steps: (1) intensity normalization and GC content correction, (2) removal of batch effects using principal component analysis, (3) calling of clusters using a Gaussian mixture model, and (4) determination of CNV type using graphical constraints. In brief, CNVs were identified when CNV events stood out in the data, as all sample intensities for CNV probes should be increased or decreased relative to neighboring probes that are not in a CNV region. To determine deviations in signal intensity, we started by normalizing the intensities. The normalized intensities for each color channel were determined by an equation and fit formula developed by deCODE genetics. A consecutive stretch of the genome in which more than one marker shows a copy number abnormality is considered more likely to be evidence of a deletion or gain [26]. Raw data from the screening step with the deCODE-Illumina BeadChip, that is, the log2 ratio measured at each probe for every individual, are shown for all probes on chromosome 4p in Supplementary Table S4.
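The consecutive-marker criterion can be sketched as follows. This is not the DosageMiner implementation, which is not described in detail here; it is a minimal illustration that assumes each probe has already been flagged as abnormal or not, with probes ordered by genomic position. The function name and the `min_markers` parameter are hypothetical.

```python
def find_abnormal_stretches(flags, min_markers=2):
    """Return (start, end) index pairs, inclusive, for runs of consecutive
    abnormal probes that contain at least `min_markers` probes.

    `flags` is a sequence of booleans ordered by genomic position.
    `min_markers=2` encodes "more than one marker" from the text.
    """
    stretches = []
    start = None  # index where the current run of abnormal probes began
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_markers:
                stretches.append((start, i - 1))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and len(flags) - start >= min_markers:
        stretches.append((start, len(flags) - 1))
    return stretches
```

An isolated abnormal probe is ignored under this rule, while two or more adjacent abnormal probes are reported as a candidate deletion or gain.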

High-Density Custom-Made Oligonucleotide Tiling Microarray Analysis.
DNA samples from 13 early-onset T2DM patients and 15 control individuals were subjected to Agilent's high-density custom-made oligonucleotide tiling microarray analysis based on an array comparative genomic hybridization (aCGH) assay. We fabricated a custom-designed microarray targeted to a 1.3-Mb genome region in the subtelomere at 4p16.3 (Chr. 4: 550,000-1,850,000 (NCBI Build 36.1, hg18)) according to previously described methods [27,28]. In brief, we used the Agilent website (http://earray.chem.agilent.com/earray/) to select and design our custom tiling array; the array consisted of 60-mer probes (Agilent Technologies, Santa Clara, CA).
Tiling-aCGH experiments were performed essentially as described previously [29]. In brief, test and reference (NA19000, a Japanese male from the HapMap project) genomic DNAs (250 ng per sample) were fluorescently labeled with Cy5 (test) and Cy3 (reference) with a ULS Labeling Kit (Agilent Technologies). For each sample, respective labeling reactions were mixed and then separated prior to hybridizing to each of the arrays. Labeled test and reference DNAs were combined, denatured, preannealed with Cot-1 DNA (Invitrogen) and blocking agent, and then hybridized to the arrays for 24 hr in a rotating oven at 65°C and 20 rpm (Agilent Technologies). After hybridization and washes, the arrays were scanned at 3 µm resolution with an Agilent G2505C scanner. Images were analyzed with Feature Extraction Software 10. [...] of abnormal copy number (losses and gains) in a complex multicopy variable region by high-density tiling array was assessed by deviation of probe log2 ratios that exceeded a threshold of 1 SD from the median probe ratio, according to procedures described previously [29][30][31]. We defined two copy number classes, that is, "unchanged copy number" and "copy number loss." "Unchanged copy number" was defined when the log2 ratio stayed within the mean ± 1 SD distribution among the normal population. "Copy number loss" was called when the downward deviation of log2 ratios exceeded a threshold of 1 SD from the median probe ratio. Raw data, that is, the log2 ratio measured at each probe for every individual obtained by high-density custom-made tiling array analysis, are shown in Supplementary Table S5.
Figure legend (fragment): Copy number losses are displayed as gray vertical bars.
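The 1 SD calling rule can be illustrated with a minimal sketch. It assumes that the median and SD are estimated from log2 ratios of a reference (normal) population; the function name, argument names, and the choice of the sample SD are assumptions made for illustration and are not taken from the cited procedures [29][30][31].

```python
import statistics


def call_copy_number(log2_ratios, reference_ratios):
    """Classify each log2 ratio as "loss" or "unchanged".

    A probe is called "loss" when its log2 ratio falls more than 1 SD
    below the median of the reference ratios. Only the two classes
    defined in the text are used; upward deviations are not labeled
    separately in this sketch (an assumption of the example).
    """
    center = statistics.median(reference_ratios)
    sd = statistics.stdev(reference_ratios)  # sample SD (assumed choice)
    threshold = center - sd
    return ["loss" if r < threshold else "unchanged" for r in log2_ratios]
```

With reference ratios centered near zero, a probe at -0.5 would be called a loss, whereas a probe at -0.06 would stay within the unchanged band if the reference SD is about 0.07.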

Results
In searching for CNVs associated with early-onset T2DM, we screened the whole genome by CNV analysis using the deCODE-Illumina CNV370K BeadChip in 100 early-onset Japanese T2DM patients and 100 controls. We found four CNVs that fulfilled our screening criteria: (1) [...] Figure 1 shows the pattern of alterations in copy number loss observed among the 100 early-onset Japanese T2DM patients. Thirteen patients displayed copy number losses around the gap in the 4p16.3 subtelomere, whereas only one of 100 control samples showed copy number losses in this region. The association was statistically significant by Fisher's exact test (P = 6.75 × 10^-3, OR = 14.7, 95% confidence interval 3.02-72.3). We observed two copy number classes, that is, "unchanged copy number" and "copy number loss," in the 1.3-Mb region of chromosome 4p16.3 in our diabetic and control populations. The latter was frequently observed among early-onset T2DM patients at the 4p16.3 subtelomere. The position, length, and pattern of deletion did not differ appreciably between the one copy number loss in a control and the 13 copy number losses in T2DM patients, although the number of cases is too small to draw a meaningful conclusion.
To verify the CNV BeadChip results, we analyzed copy number changes along the 1.3-Mb region in the subtelomere surrounding the genome gap of 4p16.3 using a high-density custom-made oligonucleotide tiling microarray. We used peripheral blood DNA of the 13 early-onset T2DM patients identified in the screening and of 15 healthy control individuals. Again, we found frequent copy number losses in regions around the genome gap in all 13 early-onset T2DM patients, whereas none of the 15 healthy individuals showed copy number losses. Figure 2 shows the detailed structure of copy number losses in the 1.3-Mb region in the subtelomere in four representative early-onset Japanese T2DM patients (patients 1, 2, 3, 4; in Figures 2(a), 2(b), 2(c), and 2(d), resp.). Individual copy number plots using a moving average (y-axis) versus distance along the chromosome (x-axis) are shown. As a comparison, copy number plots of healthy individuals who did not exhibit copy number alterations in the region are also shown (Figures 2(e) and 2(f)). Figure 3 shows genomic copy number losses in all 13 early-onset T2DM patients. The high-density custom-made tiling microarray showed segmental losses in the subtelomeric region of 4p16.3 in all 13 patients. Genomic copy number losses in these patients were clustered around a gap region in the 4p16.3 subtelomeric region.
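The case-control comparison corresponds to a 2 × 2 table (13 losses and 87 non-losses among patients; 1 loss and 99 non-losses among controls). The following is a minimal, standard-library sketch of the sample odds ratio and a two-sided Fisher's exact test; it is not the authors' analysis code, and the function names are hypothetical. The text does not state which variant of the test or which confidence interval method was used, so values from this sketch need not coincide with the reported ones.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample (cross-product) odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Counts taken from the text: rows are patients and controls.
or_value = odds_ratio(13, 87, 1, 99)
p_value = fisher_exact_two_sided(13, 87, 1, 99)
```

The cross-product ratio (13 × 99)/(87 × 1) is about 14.8, close to the reported OR.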
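The moving-average smoothing used for the copy number plots can be sketched as follows. The window size is not stated in the text, so the default of 5 probes is purely an assumption, as is the function name.

```python
def moving_average(values, window=5):
    """Return the simple moving average of `values` over `window` probes.

    The output has len(values) - window + 1 points, one for each full
    window along the chromosome.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one probe: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages
```

Smoothing per-probe log2 ratios in this way reduces probe-level noise, so that segmental losses spanning many adjacent probes stand out in a plot against chromosomal position.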

Discussion
Our initial genome-wide screening with deCODE-Illumina CNV370K BeadChip for association with early-onset T2DM revealed losses in the subtelomeric region of 4p16.3. Subsequent high-density custom-made oligonucleotide tiling microarray verified copy number losses in this region.
It is worth noting that most patients with copy number losses were treated with insulin injection. Urine C-peptide reactivity was not significantly different between the two groups, and only a few patients underwent a glucagon challenge test; thus, the data are too limited to draw inferences about insulin secretory function. We did not observe significant differences between the two groups in age of onset, body mass index, postprandial plasma glucose levels, or HbA1c levels. Fasting immunoreactive insulin levels were examined in 5 patients with copy number losses and 14 patients without copy number loss. HOMA-R in the former patients was 6.1 ± 6.8 (mean ± SD) and that in the latter patients was 5.5 ± 6.0 (mean ± SD); these values were not significantly different. The incidence of dyslipidemia, hypertension, diabetic retinopathy, nephropathy, or neuropathy was not different between the two groups (Supplementary Tables S2, S3). Further investigation of a large panel of patients would be necessary to clarify any clinical differences that might be present between the two groups.
The current map of CNV in the human genome reported in the existing databases is far from complete [32]. Increasing numbers of CNVs have recently been identified around repetitive sequences such as segmental duplications or low copy repeats. In fact, these repeat-rich regions were found to be 13-fold enriched in CNV over the average genomic coverage in the reference assembly [23]. Probes for conventional genome-wide SNP genotyping platforms are likely to be underrepresented; that is, only 25% and 40% of CNV are [...] [23]. These recent findings may partly explain why earlier genome-wide association studies for CNVs in the T2DM population failed to detect CNV loci strongly associated with T2DM [20,21,33,34]. We found copy number losses among early-onset Japanese T2DM patients in a region surrounding a genome gap (gap-177) in the subtelomere of chromosome 4p16.3. The physical map of the human genome still contains a significant number of genome gaps; over 300 gaps still remain in the human draft genome sequence [35] that are considered inaccessible by most existing genotyping and sequencing technologies [36]. These gap regions are estimated to harbor ∼1000 genes, which comprise approximately 5% of the human genome (∼200 Mb). In particular, they are abundant in subtelomeres and pericentromeric regions of chromosomes [37,38]. Some of these gaps are thought to be susceptible sites for meiotic recombination and for deletion breakpoints [39,40].
Many CNVs were recently identified in repeat-rich regions [41], which are predisposed to the generation of deletion/duplication events [22]. It is intriguing to note that the locus-specific mutation rates for CNV or structural rearrangements were estimated to be between 10^-6 and 10^-4: two to four orders of magnitude greater than nucleotide-specific rates for base substitutions or point mutations [19].
The deleted CNV region found in the present study and its flanking region contained 34 genes (Table 1). Gene(s) in this region may predispose to T2DM.
Recently, through whole-genome screening of copy number variation using a CNV BeadChip and real-time quantitative polymerase chain reaction (qPCR), Kato et al. identified a segmental copy number gain within a 40-kb region in the 10p15.3 subtelomere in patients with sporadic amyotrophic lateral sclerosis (SALS) [62]. They demonstrated the copy number gain in 46 out of 83 SALS patients, as compared with 10 out of 99 controls. The copy number gain region they identified is rather small (40 kb) and harbors two genes encoding isopentenyl diphosphate isomerase 1 (IDI1) and IDI2. Thus, they suggested that the copy number gain in the region of these genes may play a significant role in the pathogenesis of SALS. The present study shares a similar genome abnormality in that we found frequent copy number alterations in a subtelomeric region in a sporadic adult-onset disease of unknown cause. The copy number alterations we identified here in early-onset T2DM were copy number losses in the subtelomere of a different chromosome (4p16.3), and the affected region is rather large (up to 1.3 Mb). In our case, we suspect that multiple genes in the region may be involved in diabetes pathogenesis through impairments caused by copy number losses.

Conclusion
These results suggested that copy number losses of the candidate genes in the deleted region surrounding the genome gap in 4p16.3 may play significant roles in the etiology of T2DM. Further functional study, as well as investigation of 4p16.3 loss in a large panel of early-onset T2DM patients in different ethnic populations and geographical regions, is warranted.