Genome Wide Screening of CAG Trinucleotide Repeat Lengths in Breast Cancer

Trinucleotide repeat sequences are widely present in the human genome. The expansion of CAG repeats has been studied extensively and shown to be the causative mechanism of more than 40 neuromuscular and neurodegenerative diseases. In the present study, we performed a genome-wide screening of CAG repeat expansions in non-neoplastic tissues of 212 breast cancer cases and 196 healthy population controls using the Repeat Expansion Detection (RED) method. The distribution of CAG repeat lengths in cases was not significantly different from that in controls. However, dramatically expanded CAG repeats were detected in 2.4% (n = 5) of breast cancer cases, whereas no repeats of similar size were detected in any of the healthy population controls. Although this trend shows only borderline significance (p = 0.06), the finding suggests a potential involvement of CAG repeat expansion in breast cancer susceptibility. These repeats may affect the function of cancer predisposition genes through a mechanism similar to that in neurodegenerative and neuromuscular disorders.


Introduction
Breast cancer is one of the most common causes of cancer death in women worldwide [1]. To date, two highly penetrant breast cancer predisposition genes, BRCA1 and BRCA2, have been identified [2][3][4]. Mutations in these genes are associated with a high risk of developing breast and/or ovarian cancers [5]. Mutations in a number of other genes, such as p53 [6], ATM [7], and CHEK2 [8], have also been shown to contribute to breast cancer risk in a very small fraction of breast cancer cases. Overall, these highly penetrant mutations are responsible for only a small proportion of breast cancer cases; the genetic factors associated with the majority of breast cancers therefore remain unknown. Other studies show evidence for the presence of additional high-penetrance, as well as low-penetrance, breast cancer alleles [9,10].
Microsatellites are widely present in the human genome in the form of mono-, di-, tri-, and up to hexanucleotide repeats [11]. These sequences are highly polymorphic and serve as important genetic markers. Defects in the DNA mismatch repair mechanism are associated with instability of microsatellite repeats, which in turn has been associated with multiple tumor types [12]. A specific form of repeat instability, known as trinucleotide repeat expansion (TRE), has been shown to be associated with more than 40 neurological and neuromuscular disorders [13]. The mechanisms by which TREs disrupt the function of disease-related proteins depend on their location in relation to the gene sequence. TREs in coding regions alter the amino acid composition of the protein, thereby affecting its structure and function. TREs in non-coding regions may affect the regulation of transcription, translation, or posttranslational modification, resulting in defective proteins [14]. Expansion of CAG repeats is more common than any other type of TRE. The majority of known CAG expansions occur in the coding regions of genes, and only a few are located in non-coding sequences. Diseases associated with CAG repeat expansions in coding sequences are known as polyglutamine diseases. The threshold for most CAG expansion diseases is a repeat length of 35-40 triplets, and the pathogenic repeat length ranges from 26 to as many as 1700 triplets [14].
The Repeat Expansion Detection (RED) methodology was first introduced by Schalling et al. in 1993 [15], and since then has been used widely to study the lengths of TRE in the human genome. This method allows the detection of expanded repeats in the genome without prior knowledge of their genomic location. Although microsatellite instability and alterations in specific repeat regions have been widely studied, genome-wide investigations of TRE in cancer have been limited. One of the main limitations of the RED method is the fact that it requires the use of large amounts of genomic DNA for screening and this may have restricted its application to other diseases, specifically when biomaterials were limited. Recently, we have developed a modified version of the RED method, which can be performed using significantly less genomic DNA [16]. To date, TREs have been shown extensively in neurodegenerative and neuromuscular disorders; however, this mechanism is likely to be related to other genetic disorders, such as cancer.
In the present study, we have used the modified version of the RED method to perform a genome wide screening for CAG repeat expansions in the peripheral blood samples of 212 breast cancer patients and 196 matching healthy population control samples.

Study population
The breast cancer cases and population control samples were selected from the population-based Ontario Familial Breast Cancer Registry (OFBCR) [17,18]. The case control design included breast cancer patients under the age of 55 with differing familial histories of breast cancer who were selected from the registry to represent the normal distribution of breast cancer in the population. These women had been diagnosed with breast cancer between the years of 1996 and 1998, and identified through the population-based Ontario Cancer Registry. Approximately 90% of the breast cancer patients were invited to enroll in the study, and approximately 68% responded with a completed family history questionnaire. Further participation included completing epidemiological risk factor and diet questionnaires, and providing a blood sample with the opportunity for genetic counseling. Approximately 90% of probands who responded to the family history questionnaire identified themselves as Caucasian. Since the prevalence of genetic variants may differ greatly among different populations, we have limited case selection to those who were Caucasian. Among breast cancer cases studied in this report, 77.8% (165/212) did not have any breast cancer history in first-degree relatives, whereas the remaining 22.2% (47/212) did.
Healthy population controls frequency-matched by five-year age groups to female cases in the OFBCR were recruited into the registry. Population controls were recruited by calling randomly selected residential telephone numbers. Eligible women who agreed to participate were mailed a package that included standard OFBCR questionnaires for family history, epidemiology and diet. Approximately 75% of those responding to the questionnaires expressed a willingness to provide a blood sample.

RED analysis
In this study, we used a modified RED technique that allows the use of a small amount of starting genomic DNA from the case and the control samples. RED analyses of cases and controls were performed blindly to avoid any systematic error or bias in the results obtained from the two study groups. RED analysis was performed using a PAGE-purified (CAG)8 probe (0.125 µM) (Invitrogen Canada, Burlington, Ontario), end-labeled in a 20 µl volume containing 1X T4 kinase buffer, 15 µCi of [γ-32P] ATP (3000 Ci/mmole) (Perkin Elmer Life Sciences, Boston, MA), 0.175 µM rATP, and 10 units of T4 kinase (Invitrogen Canada, Burlington, Ontario), incubated at 37 °C for 30 minutes. The ligation reaction was carried out in a 20 µl total volume containing 0.5 µg of genomic DNA template, 5 µl of the labeling mixture, 1X ligase buffer, and 15 units of Ampligase (Epicentre Technologies, Madison, WI). The reaction was performed in a PTC-100 MJ thermocycler (MJ Research, Waltham, MA) by applying an initial denaturation for 5 minutes at 95 °C, followed by 500 cycles of ligation at 65 °C for 30 seconds and denaturation at 95 °C for 10 seconds. Ligation products were run on a 6% denaturing polyacrylamide gel (1 mm thick) at 500 V for two hours. Gels were dried and exposed to X-ray film for 12-24 hours.
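As a rough illustration of the product sizes expected from this reaction, the sketch below lists the ladder produced when (CAG)8 probes are ligated end to end. Each product is a whole number of probes, so its size is a multiple of 8 repeats (24 nucleotides). The function name and the default upper limit of 120 repeats (the largest size assigned exactly in the Results) are illustrative choices and not part of the published protocol.

```python
PROBE_REPEATS = 8      # the (CAG)8 probe carries eight CAG units
NT_PER_REPEAT = 3      # one trinucleotide repeat = 3 nucleotides


def red_ladder(max_repeats=120):
    """Return (repeat units, nucleotides) for each expected ligation product.

    Products start at two ligated probes, (CAG)16, and increase in steps of
    one probe up to max_repeats (an illustrative cutoff).
    """
    ladder = []
    for n_probes in range(2, max_repeats // PROBE_REPEATS + 1):
        repeats = n_probes * PROBE_REPEATS
        ladder.append((repeats, repeats * NT_PER_REPEAT))
    return ladder


if __name__ == "__main__":
    for repeats, nt in red_ladder():
        print(f"(CAG){repeats}: {nt} nt")
```

Under these assumptions the ladder runs from (CAG)16 at 48 nt to (CAG)120 at 360 nt, in 24-nt steps.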

Analysis of specific CAG repeat regions
The ERDA-1 and SEF2-1a loci were genotyped using the primers ERDA1-1 and ERDA1-2 [19], and SEF2-1a and SEF2-1b [20], respectively. The CAG locus at 13q21 was genotyped using the primers 7aCAG and 7aCTG [21]. PCR reactions were carried out in a 25 µl reaction volume containing 100 ng of genomic DNA, 1 mM MgCl2, 200 µM dNTPs, 0.25 µM of each primer with one primer end-labeled with [γ-32P] ATP (3000 Ci/mmole) (Perkin Elmer Life Sciences, Boston, MA), 1 U of Platinum Taq polymerase (Invitrogen Canada, Burlington, Ontario), and 5% glycerol. The reactions were carried out in an MJ-Dyad thermocycler (MJ Research, Waltham, MA) for 35 cycles (30 seconds denaturation at 95 °C, 30 seconds annealing at 57 °C for the ERDA-1 locus and 53 °C for the SEF2-1 and 13q21 loci, and 30 seconds elongation at 72 °C). PCR products were run on a 6% denaturing polyacrylamide gel for about 2 hours at 1500 V. The gel was dried and exposed to X-ray film for 12-24 hours. To determine the repeat sizes accurately, a panel of samples at each locus was sequenced to establish their exact repeat size. These samples were used as markers to estimate the number of repeats in other samples.
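The marker-based sizing described above can be sketched as follows, assuming that two PCR products from the same locus differ in length only by whole CAG units. The function name and the example fragment sizes are hypothetical and are not taken from the study data.

```python
def estimate_repeats(fragment_nt, marker_nt, marker_repeats):
    """Estimate the repeat number of a PCR product from a sequenced marker.

    fragment_nt    -- size of the sample's PCR product (nucleotides)
    marker_nt      -- size of the sequenced marker's PCR product (nucleotides)
    marker_repeats -- repeat number of the marker, known from sequencing
    """
    diff = fragment_nt - marker_nt
    if diff % 3 != 0:
        # A size difference that is not a multiple of 3 cannot be explained
        # by gain or loss of whole trinucleotide units.
        raise ValueError("size difference is not a whole number of repeats")
    return marker_repeats + diff // 3


if __name__ == "__main__":
    # Hypothetical example: a marker of 150 nt carrying 20 repeats.
    print(estimate_repeats(165, 150, 20))
```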

Statistical analysis
The Mann-Whitney U test, which is equivalent to the Wilcoxon rank sum test, was used to compare the CAG repeat length distributions of cases and control samples. The Fisher exact test was used to test the hypothesis of an association between dramatically expanded CAG repeats and breast cancer predisposition. P values were calculated with one degree of freedom (DF = 1).
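For readers unfamiliar with the rank-based comparison, the following is a minimal sketch of a Mann-Whitney U test using only the Python standard library. It uses midranks for ties and a tie-corrected normal approximation without continuity correction, which is one common variant and not necessarily the exact procedure used in this study. The sample values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using a normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)

    rank_sum_x = 0.0   # sum of midranks belonging to sample x
    tie_term = 0       # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:                      # every observation is tied
        return u_x, u_y, 1.0
    z = (u_x - n1 * n2 / 2) / sqrt(variance)
    return u_x, u_y, erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    # Hypothetical largest RED product (in repeat units) per sample.
    cases = [40, 40, 48, 56, 40, 144]
    controls = [40, 40, 48, 40, 64]
    print(mann_whitney_u(cases, controls))
```

RED product sizes take only a few discrete values, so ties are frequent, which is why the sketch applies the tie correction to the variance.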

Results
We applied the modified RED method to investigate the distribution of CAG repeat lengths in 212 breast cancer cases and 196 population controls obtained from the OFBCR. Since the (CAG)8 probe used for the ligation reaction contained eight CAG repeats, the resulting RED fragments were multiples of (CAG)8, such as (CAG)16, (CAG)24, (CAG)32, etc. The RED products and their distribution in case and control samples are shown in Fig. 1. The sensitivity of the RED method in determining the length of CAG repeats decreases for long repeats. We were therefore able to assign an exact fragment length only for fragments up to (CAG)120. Larger repeats were identified as dramatically expanded repeats, which are presented as (CAG)144 in this study. We identified CAG repeats with a length of 40 triplets ((CAG)40) in 147 (69%) cases and 143 (73%) controls. Expanded CAG repeats ranging in size between (CAG)48 and (CAG)120 were detected in 60 (28%) cases and 53 (27%) controls. In five (2.4%) breast cancer samples, dramatically expanded CAG repeats were identified, whereas no repeats of this length were detected in any of the control samples. For validation purposes, we randomly selected and repeated the RED analysis for 5% of all samples, as well as for the samples presenting dramatically expanded CAG repeats. Statistical comparison of the CAG repeat length distributions in cases and controls, using the Mann-Whitney U test, showed that the two distributions are not significantly different from each other (U = 21556 and p = 0.25). To test the association of dramatically expanded CAG repeats with breast cancer, we performed a Fisher exact test. The P value was calculated using a 2 × 2 table for the presence or absence of dramatically expanded CAG repeats in cases and controls (5 versus 207 in cases and 0 versus 196 in controls). Although not statistically significant, the Fisher exact test suggests a strong trend with borderline significance (p = 0.0619).
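The Fisher exact calculation can be reproduced from the counts reported above (5 of 212 cases and 0 of 196 controls with dramatically expanded repeats). The sketch below is a standard-library implementation of the two-sided test that sums the probabilities of all tables no more likely than the observed one. The function name is an illustrative choice, and the summation rule is one common definition of the two-sided P value, assumed here rather than confirmed by the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d          # e.g. cases, controls
    col1 = a + c                       # e.g. all samples with the feature
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k feature-positive samples in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as probable as the observed table;
    # the small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


if __name__ == "__main__":
    # Cases: 5 with and 207 without dramatic expansions; controls: 0 and 196.
    p = fisher_exact_two_sided(5, 207, 0, 196)
    print(f"{p:.4f}")   # prints 0.0619
```

With these counts, only the two most extreme tables (all five expansions in cases, or all five in controls) contribute to the sum, giving a value consistent with the reported p = 0.0619.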
CAG repeat expansions in the ERDA1, SEF2-1, and 13q21 loci account for most of the dramatically expanded CAG repeats detected using the RED method [19][20][21]. To confirm that the large CAG repeats detected in this study were not due to expansions in these three loci, we genotyped these loci in the breast cancer samples with dramatically expanded repeats. None of the repeat lengths found at these loci were dramatically expanded, and they therefore could not account for the dramatic expansions detected by RED (Table 1).
We also examined the pedigrees of the five breast cancer cases bearing dramatically expanded CAG repeats (Fig. 2). The age of onset for these cases ranged from 45 to 54 years. All of the cases had a family history of at least one cancer type. Only one of the probands had a first-degree relative (mother) with breast cancer (Fig. 3c), whereas another proband had two second-degree relatives (aunts) with breast cancer (Fig. 3a). Comparison with the family history of the other cases in the study did not show any significant correlation between the presence or absence of dramatically expanded CAG repeats and family history.

Discussion
CAG repeat expansions account for the molecular basis of at least 40 neurodegenerative diseases [13]. In this study, we used the RED method to screen the length distribution of CAG repeats in breast cancer cases and matching healthy population controls in order to investigate the contribution of CAG repeat expansions to breast cancer predisposition. In our study, the distribution of CAG repeat length in control samples was similar to what has been presented in previous studies [22,23].
The association of dramatically expanded CAG repeats with breast cancer was not statistically significant; however, a trend with borderline significance suggested the involvement of repeat expansion in breast cancer. This finding may point to a novel mutational mechanism for breast cancer susceptibility, but studies with larger sample sizes are required to further investigate its significance.
We have studied the cases with and without dramatically expanded repeats for the personal and familial characteristics of breast cancer. We have not observed any relationships with respect to age of diagnosis, and number of breast or other cancers in the family.
Since RED results do not provide information on the location of expansions in the genome, we cannot predict the mechanism of their possible functional involvement in breast cancer.

Conclusions
In conclusion, our study has shown a trend of dramatic repeat expansions occurring in breast cancer cases; however, these findings need to be validated in other populations. Given the strong evidence that CAG expansions are the major cause of many genetic diseases, it is plausible that this mechanism is responsible for a portion of breast cancer cases in the population. Repeat expansion mechanisms have not been extensively explored in cancer, and to our knowledge this is the first study suggesting the presence of CAG expansion in breast cancer.