Unique AGG Interruption in the CGG Repeats of the FMR1 Gene Exclusively Found in Asians Linked to a Specific SNP Haplotype

Fragile X syndrome (FXS) is the most common inherited intellectual disability. It is caused by the occurrence of more than 200 pure CGG repeats in the FMR1 gene. Normal individuals have 6–54 CGG repeats with two or more stabilizing AGG interruptions occurring once every 9- or 10-CGG-repeat blocks in various populations. However, the unique (CGG)6AGG pattern, designated as 6A, has been exclusively reported in Asians. To examine the genetic background of AGG interruptions in the CGG repeats of the FMR1 gene, we studied 8 SNPs near the CGG repeats in 176 unrelated Thai males with 19–56 CGG repeats. Of these 176 samples, we identified AGG interruption patterns from 95 samples using direct DNA sequencing. We found that the common CGG repeat groups (29, 30, and 36) were associated with 3 common haplotypes, GCGGATAA (Hap A), TTCATCGC (Hap C), and GCCGTTAA (Hap B), respectively. The configurations of 9A9A9, 10A9A9, and 9A9A6A9 were commonly found in chromosomes with 29, 30, and 36 CGG repeats, respectively. Almost all chromosomes with Hap B (22/23) carried at least one 6A pattern, suggesting that the 6A pattern is linked to Hap B and may have originally occurred in the ancestors of Asian populations.


Introduction
The cause of fragile X syndrome (FXS) is the expansion of CGG repeats in the 5′ UTR of the FMR1 gene and subsequent hypermethylation of the CpG island in the promoter region of this gene, leading to transcriptional silencing and the absence of FMRP [1,2]. Affected full mutation individuals have >200 pure CGG repeats. Premutation carriers have 55-200 CGG repeats with one or no AGG interruptions, resulting in a longer stretch of pure CGG repeats at the 3′ end of the CGG repeat tract. Normal individuals have 6-54 CGG repeats with two or more stabilizing AGG interruptions occurring once every 9 or 10 CGG repeats [3,4]. The common patterns are (CGG)9AGG and (CGG)10AGG, found in various populations. However, the (CGG)6AGG pattern (designated as 6A) has been reported exclusively in Asian populations [5-11], leading to the possibility that this 6A pattern may have originated in Asia.
To explore the evolution of the 6A pattern, we studied 176 unrelated Thai males with 19-56 CGG repeats using 8 SNPs near the CGG repeats of the FMR1 gene. Of these 176 samples, we identified AGG interruption patterns in 95 samples with different CGG repeat lengths using direct DNA sequencing. We found a specific SNP haplotype linked to the 6A pattern. We also found that the SNP haplotypes were strongly associated with the common CGG repeat groups (29, 30, and 36) and with AGG interruption patterns, suggesting different evolutionary lineages among the common CGG repeat alleles of the FMR1 gene.

Materials and Methods
DNA Samples. DNA was extracted from whole blood using the standard phenol/chloroform method. PCR of the FMR1 CGG repeats and methylation-specific PCR were performed with minor modifications, as previously reported [15,16]. We selected 176 unrelated Thai males for this study, with alleles ranging from 19 to 56 CGG repeats. The Thai population is known to have three common alleles: 29, 30, and 36 CGG repeats [15]. For the analysis, samples were divided into 6 groups corresponding to common and uncommon CGG repeats: 19-28, 29, 30, 31-35, 36, and 37-56. The study protocol was approved by the Institutional Ethics Committee.
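The six-group division described above can be sketched in a few lines of Python. This is only an illustration of the binning rule, not the authors' analysis code; the function name `cgg_group` and the example repeat counts are hypothetical and are not data from the study.

```python
from collections import Counter

# The three common alleles reported for the Thai population.
COMMON_ALLELES = (29, 30, 36)


def cgg_group(repeats: int) -> str:
    """Return the analysis group label for a CGG repeat count in 19-56."""
    if not 19 <= repeats <= 56:
        raise ValueError(f"repeat count {repeats} is outside the studied range 19-56")
    if repeats in COMMON_ALLELES:
        return str(repeats)      # each common allele forms its own group
    if repeats <= 28:
        return "19-28"
    if repeats <= 35:
        return "31-35"           # 29 and 30 were already handled above
    return "37-56"               # 36 was already handled above


# Hypothetical repeat counts for illustration only; NOT data from the study.
example_counts = [29, 30, 36, 27, 33, 43, 29]
group_tally = Counter(cgg_group(n) for n in example_counts)
print(dict(group_tally))
```

Keeping the common alleles in a single tuple makes it straightforward to adapt the sketch to a population with different common alleles.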

Haplotype Analysis.
The high linkage disequilibrium found among the 8 SNPs studied is shown in Figure 1(b). Allele frequencies of all SNPs are shown in Table 2. Analysis of the SNP haplotypes revealed three major haplotypes: GCGGATAA (Hap A), GCCGTTAA (Hap B), and TTCATCGC (Hap C). The rare haplotypes (Hap D) comprised 11 different haplotypes with frequencies of less than 5% each. Hap A was similar to Hap B, differing at only 2 SNP loci (rs1805420 and rs25731), whereas Hap A differed from Hap C at all 8 SNPs.
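The pairwise differences among the three major haplotypes can be checked directly from the haplotype strings given above. The following is a minimal sketch; the helper name `differing_positions` is hypothetical, and positions are reported as 1-based indices into the 8-letter string. Because the SNP order within the string is not stated in this section, the indices are not mapped to rs identifiers.

```python
from itertools import combinations

# Major haplotypes as reported in the text (8 SNP alleles each).
HAPLOTYPES = {
    "Hap A": "GCGGATAA",
    "Hap B": "GCCGTTAA",
    "Hap C": "TTCATCGC",
}


def differing_positions(hap1: str, hap2: str) -> list:
    """Return the 1-based positions at which two equal-length haplotypes differ."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same number of SNPs")
    return [i for i, (a, b) in enumerate(zip(hap1, hap2), start=1) if a != b]


# Compare every pair of major haplotypes.
pairwise = {
    (name1, name2): differing_positions(HAPLOTYPES[name1], HAPLOTYPES[name2])
    for name1, name2 in combinations(HAPLOTYPES, 2)
}

for (name1, name2), positions in pairwise.items():
    print(f"{name1} vs {name2}: {len(positions)} differences at positions {positions}")
```

Run on these strings, the sketch gives 2 differences between Hap A and Hap B and 8 between Hap A and Hap C, in agreement with the text; Hap B and Hap C differ at 6 positions.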

Association of SNP Haplotypes and CGG Repeats.
We divided the 176 samples into 6 groups based on common and uncommon CGG repeats, from small to large alleles (19-28, 29, 30, 31-35, 36, and 37-56), as shown in Table 3. Strikingly, we found statistically significant associations between haplotypes and the common CGG repeat groups (Fisher's exact test, P < 0.001), but no statistically significant association in the uncommon CGG repeat groups (Fisher's exact test, P = 0.0955). The 29-CGG-repeat group was associated with Hap A, the 30-CGG-repeat group with Hap C, and the 36-CGG-repeat group with Hap B.

Association of SNP Haplotypes and AGG Interruption Patterns.
We randomly selected 95 X chromosomes from the 176 samples (54%) for DNA sequencing, including uncommon and common alleles. The results revealed variety in both the number of AGG interruptions and the AGG interruption patterns in the CGG repeats of the FMR1 gene (Figure 2).

Discussion
The haplotype analysis using 8 SNPs in the present study provided more information than previous studies [9,17], which could not distinguish haplotypes with 29 CGG repeats from those with 36 CGG repeats (the third common allele, exclusively found in Asians). Most chromosomes with 29 and 36 CGG repeats in Thai, Chinese, and Malay populations have the G-T ATL1-IVS10 haplotype, while the A-C haplotype is linked to chromosomes with 30 CGG repeats in Thai, Malay, Chinese, and Indian populations [9,17]. Table 2 shows that the haplotypes of the 29 and 36 CGG repeat groups differed at two SNPs (rs1805420, rs25731). Analysis of haplotypes using 8 SNPs in our study showed significant associations between haplotypes and the common CGG repeats (29, 30, and 36). The 29-CGG-repeat group was associated with haplotype GCGGATAA (Hap A), the 30-CGG-repeat group with haplotype TTCATCGC (Hap C), and the 36-CGG-repeat group with haplotype GCCGTTAA (Hap B). The uncommon CGG repeats of the 19-28, 31-35, and 37-56 groups were not associated with any haplotype and had similar haplotype distributions. These findings suggest that uncommon CGG repeats occur randomly across the three common haplotypes and the rare haplotypes.
Most chromosomes with 36 CGG repeats and Hap B had an AGG configuration of 9A9A6A9, which might be derived from chromosomes with 29 CGG repeats and Hap A (9A9A9) by 6A insertion [5]. A similar configuration was also found in chromosomes with 43 CGG repeats and Hap B (9A9A6A6A9), which might be derived from chromosomes with 36 CGG repeats and Hap B by 6A insertion (Figure 3). However, a few Hap B chromosomes with 27 and 29 CGG repeats had AGG configurations of 10A6A9 and 12A6A9, respectively, which might be derived from Hap C chromosomes with 20 (10A9) and 22 (12A9) CGG repeats by insertion of the 6A pattern (Figure 3).
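The arithmetic behind these proposed derivations can be verified with a short sketch. It assumes the shorthand used in the text, in which each number is a block of CGG repeats, each "A" is one AGG interruption, and every triplet (CGG or AGG) counts as one repeat. The function names are hypothetical. Placing the new (CGG)6AGG unit immediately before the final CGG block is an assumption chosen because it reproduces the configurations listed above; it is not a claim about the molecular mechanism.

```python
from typing import List


def parse_blocks(config: str) -> List[int]:
    """Split a configuration such as '9A9A6A9' into its CGG block lengths."""
    return [int(block) for block in config.split("A")]


def total_repeats(config: str) -> int:
    """Total repeat count: all CGG triplets plus one per AGG interruption."""
    blocks = parse_blocks(config)
    return sum(blocks) + (len(blocks) - 1)


def has_6a(config: str) -> bool:
    """True if some 6-CGG block is followed by an AGG (the 6A pattern).

    The last block is the 3' pure CGG tract and is not followed by an AGG.
    """
    return 6 in parse_blocks(config)[:-1]


def insert_6a(config: str) -> str:
    """Insert one (CGG)6AGG unit just before the final CGG block (assumed position)."""
    blocks = parse_blocks(config)
    blocks.insert(len(blocks) - 1, 6)
    return "A".join(str(block) for block in blocks)


# Each proposed derivation adds 7 repeats (6 CGG + 1 AGG).
for ancestor in ("9A9A9", "9A9A6A9", "10A9", "12A9"):
    derived = insert_6a(ancestor)
    print(f"{ancestor} ({total_repeats(ancestor)}) -> {derived} ({total_repeats(derived)})")
```

Under these assumptions, 9A9A9 (29 repeats) becomes 9A9A6A9 (36), 9A9A6A9 becomes 9A9A6A6A9 (43), and 10A9 (20) and 12A9 (22) become 10A6A9 (27) and 12A6A9 (29), matching the repeat numbers given in the text.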
Hap A and Hap C had different alleles at all SNPs, which suggests that they may have followed different evolutionary pathways. In contrast, Hap A and Hap B are likely evolutionarily related, since they had similar SNP haplotypes (Table 3) and both carried the 9A pattern at the 5′ end of the CGG repeat tract (Figures 2 and 3). CGG repeats have likely evolved from primitive small repeats to larger ones. An evolutionary study of the CGG repeats of the FMR1 gene showed that most nonprimate mammals have a small number of uninterrupted CGG repeats with a mean of ∼8 repeats, while the repeats of primates are larger, with a mean of ∼20 repeats, and have more highly specific interruptions [20]. Therefore, we hypothesize that our findings may reflect two distinct pathways. First, chromosomes with 29 and 30 CGG repeats may have arisen independently from Hap A and Hap C by gradual replication slippage or recombination via smaller alleles [20] and were stabilized by the 9A9A9 and 10A9A9 patterns, respectively [11,21]. Second, the 6A pattern was linked to chromosomes with Hap B, possibly derived from chromosomes with Hap A (major pathway) or Hap C (minor pathway). Simplified pathways of this hypothesis are shown in Figure 3. In addition, the 6A pattern may enhance the stability of CGG repeat tracts [22,23]. This could explain why chromosomes with 36 CGG repeats linked to the 6A pattern have become the third most common allele only in Asian populations. It is also relevant to note that, to date, the 6A pattern has been exclusively found in Asians [5-11]. A study based on an Eskimo population indicated that the 6A pattern has been stably conserved for 15,000-30,000 years, since this group migrated from Asia to North America [7].
It has been proposed that AGG interruptions play a crucial role in maintaining the stability of the CGG repeats, since premutation alleles often contain only one AGG interruption or none [3,4,24-26]. Haplotype analysis using microsatellites near the FMR1 gene (DXS548-FRAXAC1-FRAXAC2) found that specific haplotypes were associated with the loss of AGG interruptions in the CGG repeats in Caucasians [27] and Jewish Tunisians [28]. In contrast, findings in African Americans using those three microsatellites and the ATL1 SNP did not show a haplotype association with CGG repeat instability [29]. Our findings support earlier studies in which the association between nearby SNP haplotypes and AGG interruption patterns in the CGG repeats of the FMR1 gene likely reflects linkage disequilibrium in each population [9,17,30]. Therefore, it is difficult to determine whether an associated haplotype is a real factor in CGG repeat instability or merely reflects linkage disequilibrium in a specific population [31].

Conclusion
Our study provided new evidence that a specific haplotype (Hap B) was strongly linked to the 6A pattern in Thai subjects, since almost all chromosomes with Hap B had at least one 6A configuration regardless of the number of CGG repeats (e.g., 10A6A9, 12A6A9, 9A9A6A6A9, and 9A9A9A6A8A9). The 6A pattern and Hap B may have originally occurred in the ancestors of Asian populations. However, we could not completely exclude the possibility that the findings arose by chance or from sample selection bias. Further studies of SNP haplotypes and AGG interruption patterns in other Asian populations are warranted to confirm and expand on our findings.