Associations of Genetic Variants at Nongenic Susceptibility Loci with Breast Cancer Risk and Heterogeneity by Tumor Subtype in Southern Han Chinese Women

Current understanding of cancer genomes is mainly “gene centric.” However, genome-wide association studies (GWAS) have identified several nongenic breast cancer susceptibility loci, and validation studies have shown inconsistent results among different populations. To further explore this inconsistency and to investigate associations by intrinsic subtype (Luminal-A, Luminal-B, ER−&PR−&HER2+, and triple negative) among Southern Han Chinese women, we genotyped five nongenic polymorphisms (2q35: rs13387042; 5p12: rs981782 and rs4415084; and 8q24: rs1562430 and rs13281615) using the MassARRAY IPLEX platform in 609 patients and 882 controls. Significant associations with breast cancer were observed for rs13387042 and rs4415084, with per-allele ORs (95% CI) of 1.29 (1.00–1.66) and 0.83 (0.71–0.97), respectively. In subtype-specific analysis, rs13387042 (per-allele adjusted OR = 1.36, 95% CI = 1.00–1.87) and rs4415084 (per-allele adjusted OR = 0.82, 95% CI = 0.66–1.00) showed marginally significant associations with the Luminal-A subtype; however, only rs13387042 was associated with ER−&PR−&HER2+ tumors (per-allele adjusted OR = 1.55, 95% CI = 1.00–2.40), and none of the SNPs was associated with the Luminal-B or triple negative subtype. Collectively, the associations of nongenic SNPs with breast cancer were heterogeneous across intrinsic subtypes. Further studies with larger datasets and intrinsic subtype categorization should explore and confirm the role of these variants in breast cancer risk.


Introduction
Detection and characterization of the genetic diversity of disease-associated loci are a major emphasis of current cancer research. Rapid technological advances have enabled us to explore increasingly complex genetic architectures and their relationship to cancer. However, current understanding of cancer genomes is primarily "gene centric" [1], because genic variants are frequently judged to be more likely than variants located outside genes to alter gene function and affect disease risk [2].
Focusing only on genic variants is not a comprehensive strategy for research into the genetics of cancer, because the majority of variants lie in nongenic regions [3]. Consistent with this view, several recent large-scale genome-wide association studies (GWAS) of breast cancer [4][5][6] have identified three novel genetic susceptibility loci (8q24, 2q35, and 5p12) that are associated with the risk of breast cancer. None of these loci lies in a coding region of a gene, and three variants (8q24: rs13281615, 2q35: rs13387042, and 5p12: rs10941679), each of which reflects a genetically independent locus, show independent associations with risk of breast cancer, although statistical gene-gene interactions resulting in larger joint effects than expected from their individual relative risks could exist.
Associations between SNPs at these three nongenic loci and breast cancer risk have been independently replicated by subsequent studies in recent years among East Asians, Africans, and some other ethnic populations; however, a proportion of these studies have yielded conflicting or inconclusive results [7][8][9]. The reasons for these differences remain to be determined. Growing evidence suggests substantial heterogeneity in the associations of such polymorphisms across hormone-receptor-defined subtypes of breast cancer [10].
In addition, the GWAS that identified these loci were conducted in populations of European ancestry (EA), which differ from other ethnic groups in certain aspects of their genetic architecture. It is therefore important to validate these findings in other ethnic populations and, where possible, to use the different linkage disequilibrium (LD) patterns observed in non-European ancestral populations to refine the associated genomic regions. In our previous studies [11][12][13], most genic single nucleotide polymorphisms (SNPs)/loci identified by previous GWAS in populations of European descent were evaluated in Southern Han Chinese women with an urbanized lifestyle, whose breast cancer rates are approaching those of the West [14]. The ethnic Chinese in Southern China (mainly Teochew and Cantonese) represent a geographically distinct population. To date, no report has associated the abovementioned nongenic loci with breast cancer risk in Southern Han Chinese women.
Herein, we evaluated the association of three nongenic loci with breast cancer risk in Southern Han Chinese women. Furthermore, the associations of these loci with four breast cancer subtypes defined by four markers (estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor-2 (HER-2) status, and Ki-67 expression status) were also evaluated. This work extends and refines our previous reports, which focused mainly on genic loci.

Study Population.
Individuals included in the current analysis were Han Chinese women who participated in the Southern China Breast Cancer Genetics Study (SCBCGS) [15]. The SCBCGS was a multicenter, hospital-based study of breast cancer conducted among Han Chinese women from three areas of Southern China, namely, Canton, Chongqing, and Nanchang. Database review of the SCBCGS identified 609 breast cancer patients with detailed and complete information on ER, PR, HER2, and Ki-67 expression status and 882 controls matched on ethnicity, age (±5 years), and community of residence for the present study. Detailed information on histories of menstrual and reproductive factors, hormone therapy (HT), weight, height, and family history of breast cancer for each participant was collected during in-person interviews conducted as part of the SCBCGS. After written informed consent was obtained, a peripheral blood sample was collected from each participant. The study was approved by the Ethical Committee of Southern Medical University and was conducted in accordance with the Declaration of Helsinki.
Our previous study described the specific characteristics of the controls and of the cases by intrinsic subtype [15]. Briefly, compared with controls, cases were older and more likely to be parous with a first full-term pregnancy at ≥30 years and to be postmenopausal HT nonusers. Notably, no significant differences in basic characteristics were seen between subtypes.

SNP Selection and Genotyping.
From 2009 to 2013, we collaborated with two other research groups to validate breast cancer susceptibility genes (including nongenic loci) in Chinese cohorts. To date, GWAS have identified more than 20 different intergenic loci associated with breast cancer risk, which can be viewed in the NHGRI GWAS Catalog. However, the seven GWAS [4,5,16–20] published before 2013 identified a total of only 8 SNPs in 6 nongenic loci associated with breast cancer, almost exclusively in populations of European descent. Three SNPs from the GWAS of Murabito et al. [18] were excluded because they were not replicated in an independent cohort in Stage 2 of the GWAS and their risk alleles were unknown. Thus, only 5 SNPs showing statistically significant associations with breast cancer were selected for analysis in this study. These 5 SNPs represent 3 independent loci in intergenic regions, specifically, one SNP at 2q35 (rs13387042), two SNPs at 5p12 (rs981782 and rs4415084), and two SNPs at 8q24 (rs1562430 and rs13281615). These index SNPs had reached significance in one or more replication and refinement studies of European, Asian, and/or African ancestry populations [7][8][9]. Genotyping was performed using the SEQUENOM MassARRAY matrix-assisted laser desorption ionization time-of-flight mass spectrometry platform [12,21].

Classification of Biologic Subtype.
Four subtypes were defined based on the receptor status of the primary tumor: (i) triple negative (ER−, PR−, HER2−); (ii) ER−&PR−&HER2+ (ER−, PR−, HER2+); (iii) Luminal-B (ER+ and/or PR+, and HER2+ and/or Ki-67 high); and (iv) Luminal-A (ER+ and/or PR+, and neither HER2+ nor Ki-67 high). The number of cases with the above-mentioned tumor marker data available, the classification scheme based on combinations of the markers, and the specific characteristics of the controls and of the cases classified by intrinsic subtype have been described in our previous report [15].
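The four-way classification rule above can be written as a short decision function. The following is a minimal sketch, not the authors' implementation: the function name and boolean inputs are illustrative assumptions, and the Ki-67 cutoff that defines "high" is not stated in this section, so it is taken as an already-dichotomized input.

```python
def classify_subtype(er_pos: bool, pr_pos: bool, her2_pos: bool, ki67_high: bool) -> str:
    """Assign an intrinsic subtype from four dichotomized markers.

    Hypothetical helper: the marker cutoffs (e.g., what counts as
    Ki-67 "high") are assumed to have been applied upstream.
    """
    hormone_receptor_pos = er_pos or pr_pos  # ER+ and/or PR+
    if hormone_receptor_pos:
        # Luminal-B if HER2+ and/or Ki-67 high; otherwise Luminal-A
        if her2_pos or ki67_high:
            return "Luminal-B"
        return "Luminal-A"
    # ER- and PR-: split on HER2 status (Ki-67 is not used here)
    if her2_pos:
        return "ER-&PR-&HER2+"
    return "triple negative"
```

For example, `classify_subtype(True, False, False, False)` falls into the Luminal-A branch, whereas the same tumor with high Ki-67 would be assigned to Luminal-B.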

Statistical Analysis.
For each SNP, deviation of genotype frequencies in controls from Hardy-Weinberg equilibrium (HWE) was assessed by a goodness-of-fit χ² test. Differences in the frequencies of SNP alleles and genotypes between cases and controls were evaluated using the chi-square test or Fisher's exact test as appropriate. Breast cancer risk was estimated as odds ratios (ORs) and 95% confidence intervals (95% CIs), based on unconditional logistic regression and adjusted for potential confounders including age, age at first full-term pregnancy, menopausal status, and hormonal therapy use [15]. Analyses were carried out assuming dominant, codominant, and additive allelic effects for each polymorphism. Stratified analysis according to the 4 breast cancer subtypes was additionally conducted. To correct for multiple testing, we estimated the adjusted significance by applying the Bonferroni correction for all the SNPs tested in the analysis. All statistical tests were two-sided, and P < 0.05 was considered statistically significant. Statistical analysis was performed using SPSS version 19.0 (IBM SPSS Statistics for Windows, IBM Corporation, Somers, NY) unless otherwise specified.
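As an illustration of the HWE check described above, the goodness-of-fit χ² test for a biallelic SNP can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure actually run in SPSS: the function name and the genotype counts passed to it are hypothetical, and the test uses 1 degree of freedom (three genotype classes minus one, minus one estimated allele frequency).

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Inputs are observed genotype counts (AA, AB, BB) in controls.
    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expected count (monomorphic SNP)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP whose control genotype counts yield a very small P value under this test would be flagged as deviating from HWE and set aside before association testing.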

Hardy Weinberg Equilibrium Testing.
We genotyped five nongenic SNPs in 609 breast cancer cases and 882 controls. The minor allele frequencies of all tested SNPs in Southern Han Chinese were broadly similar to the corresponding frequencies in the HapMap HCB (Chinese) and JPT (Japanese) populations. All observed genotype frequencies were in agreement with HWE in the controls except for rs13281615, which deviated from HWE (P < 1 × 10⁻⁴) and was thus excluded from the subsequent analyses (Table 1). Table 2 shows the allele and genotype distributions of the nongenic rs13387042, rs981782, rs4415084, and rs1562430 polymorphisms in the combined sample. Univariate analysis showed that rs13387042 (per-allele OR = 1.34, 95% CI = 1.05–1.72) and rs4415084 (per-allele OR = 0.83, 95% CI = 0.72–0.97) were significantly associated with the risk of breast cancer. After adjustment for age, age at first full-term pregnancy, menopausal status, and hormonal therapy use, logistic regression analysis confirmed these associations, which remained significant in the per-allele model for rs13387042 (OR = 1.29, 95% CI = 1.00–1.66) and rs4415084 (OR = 0.83, 95% CI = 0.71–0.97).
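To make the form of the reported estimates concrete, an unadjusted allelic OR with a Wald (Woolf) 95% CI can be computed from a 2 × 2 table of allele counts. This is only a rough sketch: the reported per-allele ORs came from logistic regression, which this calculation approximates but does not reproduce, and the function name and any counts supplied to it are hypothetical rather than taken from Table 2.

```python
import math
from typing import Tuple


def allelic_odds_ratio(
    case_test: int,
    case_ref: int,
    control_test: int,
    control_ref: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted allelic odds ratio with a Wald (Woolf) confidence interval.

    Arguments are allele counts (not genotype counts): test and reference
    alleles in cases and controls. All four counts must be positive.
    Returns (odds ratio, lower bound, upper bound); z = 1.96 gives a 95% CI.
    """
    odds_ratio = (case_test * control_ref) / (case_ref * control_test)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(
        1 / case_test + 1 / case_ref + 1 / control_test + 1 / control_ref
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval whose lower bound sits at or near 1.00, as for rs13387042 above, corresponds to a borderline result under this kind of calculation.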

Discussion
Understanding the genetic basis of disease can transform medicine by elucidating relevant biochemical pathways for drug targets and by enabling personalized risk assessments, but medical research has focused primarily on genic variants, owing to the difficulty of interpreting nongenic mutations. However, a survey of human trait-associated SNPs found that most are located in noncoding regions (43% from nongenic regions and 45% from introns), suggesting that the search for functional polymorphisms should extend beyond genic regions [22].
Furthermore, studies investigating the association between common nongenic polymorphisms and breast cancer susceptibility have yielded inconsistent results [23]. Several factors may explain this inconsistency. First, ethnic differences may contribute, since the distributions of the studied polymorphisms differ between ethnic populations. For instance, the MAF differs among Chinese, White, and African-descent populations (Table 1). In addition, a polymorphism may be in close linkage with a nearby causal variant in one ethnic population but not in another. Furthermore, study design, small sample size, or environmental factors may also affect the results. Notably, none of these studies considered the intrinsic subtypes of breast cancer in their design. It is possible that a SNP associated with breast cancer in a study population composed mainly of one specific subtype shows no association in another study population consisting mainly of a different subtype [24]. Thus, the present study investigated whether 5 common nongenic SNPs located in "gene desert" regions were associated with specific tumor subtypes defined by four markers (ER, PR, HER2, and Ki67). To our knowledge, this is the first study in Chinese women to evaluate heterogeneity in the strength of the associations of nongenic susceptibility loci across tumor subtypes. Stratification of tumors also provided further insights into etiological heterogeneity.
First, this study confirmed that two nongenic SNPs (rs13387042 and rs4415084) were significantly associated with breast cancer risk. Rs13387042 was first identified as a breast cancer susceptibility SNP in two GWAS conducted among Europeans [5,19]. Significant associations were subsequently confirmed in later studies of European and African American women [25,26]. However, the findings have been inconsistent in Chinese women. For example, Dai et al. reported a significant association with increased breast cancer risk [27], whereas Zheng et al. did not find a significant association [28].
For rs4415084, the association with breast cancer has been evaluated by three studies in Chinese women since it was first identified through the GWAS approach [6]; however, all three studies yielded nonsignificant results [27,29,30]. The findings of a systematic review [23] that included these three studies, however, are consistent with our present results.
Further subtype-stratified analyses showed that rs13387042 and rs4415084 were marginally associated with Luminal-A breast cancer even after adjusting for potential confounders including age, age at first full-term pregnancy, menopausal status, and hormonal therapy use (Table 3). However, only rs13387042 was significantly associated with HER2-overexpressing breast cancer (Table 4). Beyond these associations, no significant associations were detected between any of the four nongenic SNPs and the Luminal-B (Table S1) or triple negative (Table S2) subtypes in the Southern Han Chinese population.
Identifying the potential biological functions of such SNPs is an important step towards further study. We found that 2q35-rs13387042 and 5p12-rs4415084 are located in a 90-kb and a 100-kb LD block, respectively, neither of which contains known genes or noncoding RNAs. Furthermore, both SNPs are located more than 100 kb from the nearest gene, TNP1 and MRPS30, respectively. The causal variants in these regions have not been determined. Thus, functional studies of these regions are likely to lead to a better understanding of the mechanisms of carcinogenesis and progression of breast cancer. In addition, the ORs we obtained were small with narrow CIs, indicating that, when considered alone as a genetic factor, each polymorphism has a very small effect on susceptibility to breast cancer.
A strength of our study was that ER, PR, HER-2, and Ki-67 status were all assessed using the same processing protocols and criteria for pathology review for all cases. However, several important limitations of this study must be considered. First, we could not confirm that the other SNPs lacked an association with specific breast cancer subtypes, because the limited sample size left us without sufficient power to detect a true association. For example, although the current study has sufficient power (>90%) to detect a log-additive OR of 1.30 with allele frequencies >27%, the exact powers to detect a log-additive OR of 1.30 for the four selected SNPs (rs13387042, rs981782, rs4415084, and rs1562430) with the Luminal-B subtype, using the MAF in Table 1, were 20.6%, 44.5%, 45.8%, and 32.2%, respectively. Furthermore, considering the sample size and the marginal effect sizes, correction for multiple comparisons was also performed, and none of these associations remained significant after correction (all Bonferroni-adjusted P > 0.05). Thus, larger sample sizes would improve power and help determine whether these SNPs are associated with specific breast cancer subtypes.
Second, we cannot rule out that other nongenic loci may be risk factors for breast cancer and specific subtypes, because there may be different functional variants among different populations and specific subtypes. For example, one GWAS among Chinese women by Zheng et al. in 2010 identified a novel SNP, rs2046210, at a nongenic locus at 6q25.1 that exhibited a strong and consistent association with breast cancer across all three stages in Chinese women only [28]. Differences in LD patterns between the associated SNPs and hidden functional variants are another possibility that should not be ignored. For example, in the HapMap CEU population, rs4415084 resides in an LD block of nearly 100 kb that covers some gaps of low LD. In the HapMap CHB population, this large LD block is split into smaller blocks (66 kb). Thus, genotyping other known SNPs in these nongenic regions according to HapMap data may help elucidate the basis for potential ethnicity- or subtype-related disparities and provide a better understanding of the genetic basis of specific breast cancer subtypes.
Third, breast cancer subtypes may have been misclassified; however, such misclassification is likely to be independent of susceptibility loci and thus would tend to lead to underestimated association strengths rather than spurious associations [29]. For example, a recent study showed a high discordance between HER2 expression based on IHC and mRNA; 60% of the tumors classified as HER2+ by IHC did not display elevated levels by mRNA expression [30]. Finally, only subtype-specific analysis was conducted in the present study, but there may be interactions between genetic and environmental factors. Thus, further studies of gene-environment interactions in breast cancer could be useful.
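The Bonferroni adjustment referred to in the limitations above multiplies each raw P value by the number of tests and caps the result at 1. The following is a minimal sketch: the function name is illustrative, and the number of tests that should be counted (for example, the four SNPs that passed HWE filtering, or SNPs by subtypes) is an assumption left to the caller.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Bonferroni-adjusted P values.

    Each raw P value is multiplied by the number of tests (the length of
    the input) and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With four tests, a nominal P value close to 0.05 is adjusted to roughly 0.2, which illustrates why marginally significant per-allele associations do not survive this correction.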

Conclusions
Collectively, our study demonstrated marginally significant associations of nongenic SNPs with breast cancer risk in the Southern Han Chinese population and added evidence for differential susceptibility according to intrinsic subtype. Further investigation in larger data sets, together with intrinsic subtype categorization and functional studies, is required to determine how and to what degree these variants influence breast cancer pathogenesis.