Breast cancer is the second most diagnosed malignancy and the second leading cause of cancer-related deaths among women in the US [
Breast cancer development is driven by acquired somatic driver mutations; however, germline genetic variants also play a role in tumorigenesis by participating in critical biological and cellular processes. For decades, germline mutations, contained within the heritable genome, and somatic mutations, acquired de novo by breast cancer cells, have been treated as separate research endeavours, each with its own clinical applications and implications for patient care. A critical challenge faced by clinicians and patients is identifying patients at high risk of developing aggressive disease, which could guide the application of precision medicine and precision prevention in triple-negative breast cancer (TNBC) and non-TNBC. Achieving that goal requires understanding the germline-somatic mutation interaction landscape and discovering the molecular markers that drive each disease and distinguish the two types of breast cancer.
Advances in microarray technology have enabled molecular classification of TNBC and non-TNBC [
The recent surge of next-generation sequencing of the cancer genomes has opened new options in clinical oncology, from discovery of driver mutations to implementation of precision medicine [
The objective of this study was to delineate the germline-somatic mutation interaction landscape in TNBC and non-TNBC and to determine whether gene expression and somatic mutation burden differ between the two types of breast cancer. Our working hypotheses were that (1) genomic alterations in genes containing germline and somatic variations could lead to measurable changes associating genetic predisposition with tumorigenesis and distinguishing TNBC from non-TNBC and (2) integrative analysis combining germline and somatic mutation information at the gene level will uncover molecular networks and signalling pathways through which germline and somatic variations interact and cooperate to drive TNBC and non-TNBC. We addressed these hypotheses using an integrative genomic approach that combines germline variation information from genome-wide association studies (GWAS) with somatic mutation information from next-generation sequencing of TNBC and non-TNBC samples from The Cancer Genome Atlas (TCGA), using gene expression data from TCGA as the intermediate phenotype. Our modelling approach focuses on genes, gene regulatory networks, and signalling pathways rather than on individual mutations. This approach was designed to establish the potential causal association between genetic predisposition and tumorigenesis and to provide insights into the broader biological context in which germline and somatic mutations interact and cooperate to drive TNBC and non-TNBC. It is worth noting that each of the two types of breast cancer comprises several subtypes, which we did not consider here; we acknowledge this as a limitation that is beyond the scope of this investigation. As pointed out earlier in this section, our focus on TNBC and non-TNBC was motivated by evidence from both clinical and epidemiological studies showing that TNBC has poorer outcomes and poorer survival rates when compared to non-TNBC [
Advances in high-throughput genotyping and next-generation sequencing technologies enabled discovery and creation of comprehensive catalogues of germline and somatic mutations. These discoveries have increased our understanding of the genetic susceptibility landscape and the molecular taxonomy of breast cancer. However, analyses of germline and somatic mutations have historically been considered as separate endeavours in breast cancer research. With the availability of germline, somatic, and gene expression variation data and powerful bioinformatics tools, we are now well-positioned to understand the causal association between genetic susceptibility and tumorigenesis through integrative analysis. Here, we integrated data on germline, somatic, and gene expression variation to delineate the germline-somatic mutation interaction landscape in TNBC and non-TNBC. The overall study design and execution strategy used in this study is presented in Figure
Project design, data processing, and analysis workflow for integrative analysis combining germline with somatic mutation information in TNBC and non-TNBC using gene expression data as the intermediate phenotype. RNA-seq read count data and somatic information were downloaded from the TCGA via the GDC. Germline mutation information was manually curated from GWAS studies and supplemented with information from the GWAS catalogue. LIMMA (R) package was used for the discovery of differentially expressed (DE) mutated and nonmutated genes. Ingenuity Pathway Analysis (IPA) was used for the discovery of molecular networks and biological pathways enriched for germline and somatic mutations.
We used population-level GWAS discoveries, specifically single-nucleotide polymorphisms (SNPs) (herein referred to as germline mutations) and genes associated with an increased risk of developing breast cancer from a comprehensive catalogue that we have developed and published [
Somatic mutation and gene expression along with clinical information were obtained from TCGA via the Genomics Data Commons (GDC) using the data transfer tool
The data processing and analysis steps are shown in the project design and execution workflow presented in Figure
The genes were ranked on
To discover significantly differentially expressed and differentially somatic mutated genes distinguishing TNBC from non-TNBC, we compared gene expression levels and number of mutation events per gene between the two types of breast cancer. Genes associated with both types of breast cancer were not included in this analysis to avoid confounding of the results. Differentially somatic mutated genes were identified by counting the number of mutation events per gene in each type of breast cancer. If the gene had somatic mutations in only one type of breast cancer, it was considered differentially mutated. To identify genes containing germline and somatic mutations, we evaluated all the 754 genes containing germline mutations for the presence of somatic mutations and their association with each type of breast cancer measured by their expression. Germline mutated genes significantly associated with each type of breast cancer were further evaluated for differences in their expression levels and somatic mutations between the two types of breast cancer.
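The counting rule described above can be illustrated with a minimal sketch. It assumes a simplified input of (cancer type, gene) pairs, one per somatic mutation event; the record layout, the function name `differentially_mutated`, and the placeholder gene labels are hypothetical and do not reflect the TCGA file format or the pipeline actually used in this study.

```python
from collections import Counter


def differentially_mutated(records):
    """Count mutation events per gene in each cancer type and flag genes
    that are mutated in only one type.

    records: iterable of (cancer_type, gene) pairs, one per mutation event,
             with cancer_type equal to "TNBC" or "non-TNBC" (assumed layout).
    Returns a dict mapping gene -> (tnbc_events, non_tnbc_events, only_in),
    where only_in is "TNBC", "non-TNBC", or None if mutated in both types.
    """
    records = list(records)  # materialize so the input can be scanned twice
    tnbc = Counter(gene for ctype, gene in records if ctype == "TNBC")
    non_tnbc = Counter(gene for ctype, gene in records if ctype == "non-TNBC")

    summary = {}
    for gene in sorted(set(tnbc) | set(non_tnbc)):
        n_tnbc, n_non = tnbc[gene], non_tnbc[gene]
        if n_tnbc and not n_non:
            only_in = "TNBC"
        elif n_non and not n_tnbc:
            only_in = "non-TNBC"
        else:
            only_in = None  # mutated in both types
        summary[gene] = (n_tnbc, n_non, only_in)
    return summary


# Toy input with placeholder gene labels (not real data).
events = [
    ("TNBC", "GENE_A"), ("TNBC", "GENE_A"),
    ("non-TNBC", "GENE_B"),
    ("TNBC", "GENE_C"), ("non-TNBC", "GENE_C"),
]
summary = differentially_mutated(events)
```

In this toy input, `GENE_A` and `GENE_B` would be treated as differentially mutated because each carries events in only one type, whereas `GENE_C` would not.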
We used the Core Analysis and pathways build modules implemented in the Ingenuity Pathway Analysis (IPA) software platform, QIAGEN Inc., USA [
To test whether the genes containing germline and somatic mutations discovered in this investigation have clinical utility and to validate them as potential clinically actionable biomarkers, we evaluated them against two clinically validated assays as described below:
For the first assay, we used Prosigna (PAM50), a 50-gene signature that has gained prominence in clinical applications as a prognostic gene signature in breast cancer [
For the second assay, we used MammaPrint, a clinically validated 70-gene assay developed by Agendia Corporation [
We chose the two assays because both the PAM50 and MammaPrint were developed using gene expression, which is also used in this investigation as the intermediate phenotype. For these validation analyses, we used several approaches: First, we investigated whether the genes containing both germline and somatic mutations are present in the PAM50 and the MammaPrint assays. Second, we evaluated the genes in these assays against highly somatic mutated genes significantly associated with each disease to eliminate the bias imposed by the limited number of genes containing germline mutations. Third, we investigated whether the genes containing germline and/or somatic mutations significantly associated with each disease are functionally related and interact with genes in the PAM50 and/or MammaPrint assays. The third approach was necessitated by the limited number of the genes in each assay. We reasoned that genes in these clinically validated assays may be regulated or may be regulating other genes which are altered in the germline, somatic, or both genomes.
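The first and third validation approaches amount to a membership check followed by a neighbour lookup in an interaction network. The sketch below is illustrative only: the function name `evaluate_against_assay`, the edge-list representation of functional relationships, and all gene labels are hypothetical placeholders. In the study itself, functional relationships were derived with IPA rather than from a plain edge list.

```python
def evaluate_against_assay(candidates, assay_genes, interactions):
    """Compare candidate genes with a clinical assay gene panel.

    candidates:   genes carrying germline and/or somatic mutations
    assay_genes:  genes in the assay panel (e.g. a 50- or 70-gene signature)
    interactions: iterable of (gene_1, gene_2) pairs describing functional
                  relationships (assumed, simplified representation)
    Returns (direct, indirect): candidates that are in the panel, and
    candidates outside the panel that interact with at least one panel gene.
    """
    candidates = set(candidates)
    assay = set(assay_genes)

    direct = candidates & assay

    indirect = set()
    for gene_1, gene_2 in interactions:
        if gene_1 in assay and gene_2 in candidates:
            indirect.add(gene_2)
        if gene_2 in assay and gene_1 in candidates:
            indirect.add(gene_1)
    indirect -= direct  # report each candidate in one category only
    return direct, indirect


# Toy example with placeholder labels (not real assay content).
candidate_genes = {"GENE_A", "GENE_B", "GENE_C"}
panel = {"GENE_A", "GENE_X"}
edges = [("GENE_B", "GENE_X"), ("GENE_C", "GENE_Y")]
direct, indirect = evaluate_against_assay(candidate_genes, panel, edges)
```

Separating direct membership from indirect interaction mirrors the reasoning above: a small panel rarely contains the candidate genes themselves, but its genes may regulate or be regulated by them.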
We compared gene expression levels between TNBC and controls and between non-TNBC and controls to discover and characterize signatures of mutated and nonmutated genes associated with the two types of breast cancer. Genes were ranked and selected using estimates of
To discover gene signatures uniquely associated with each type of breast cancer and gene signatures associated with both types of breast cancer, we evaluated mutated and nonmutated genes using adjusted
Venn diagrams showing the distribution of genes containing somatic mutations (a) and genes without somatic mutations (b) significantly differentially expressed between cases and control samples in TNBC and non-TNBC. Genes in the intersections were significantly associated with both types of breast cancer.
Among the somatic mutated genes (Figure
Having discovered signatures of mutated and nonmutated genes associated with each type and/or both types of breast cancer, we performed additional analysis to investigate the differences in gene expression and mutation burden between TNBC and non-TNBC. For this analysis, we created and analysed a new data set of 8,875 genes, which was generated by combining the 1,489 genes containing somatic mutations significantly associated with TNBC only and the 7,386 genes containing somatic mutations significantly associated with non-TNBC only. Genes associated with both types of breast cancer were not included in this analysis to eliminate confounding of the results.
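The construction of this combined data set can be sketched as a partition of two gene sets into the regions of a two-set Venn diagram. The function name `partition` and the gene labels are hypothetical; only the three counts in the final check are taken from the text.

```python
def partition(tnbc_genes, non_tnbc_genes):
    """Split two gene sets into the three regions of a two-set Venn diagram."""
    tnbc, non_tnbc = set(tnbc_genes), set(non_tnbc_genes)
    return {
        "tnbc_only": tnbc - non_tnbc,
        "non_tnbc_only": non_tnbc - tnbc,
        "both": tnbc & non_tnbc,
    }


# Toy example with placeholder labels (not real data).
parts = partition({"G1", "G2", "G3"}, {"G3", "G4"})

# Genes shared by both types are excluded; the two unique regions are pooled.
combined = parts["tnbc_only"] | parts["non_tnbc_only"]

# Consistency check on the counts reported in the text:
# 1,489 TNBC-only genes plus 7,386 non-TNBC-only genes give 8,875 genes.
reported_total = 1489 + 7386
```

Because the two unique regions are disjoint by construction, the size of the pooled set is simply the sum of their sizes, which matches the 8,875 genes reported.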
The analysis revealed a signature of 6,887 significantly differentially expressed genes distinguishing TNBC from non-TNBC. The signature included 290 genes somatic mutated in TNBC, 4,957 genes somatic mutated in non-TNBC, and 1,640 genes somatic mutated in both types of breast cancer. A list of the top 30 highly significantly differentially expressed somatic mutated genes between TNBC and non-TNBC with high somatic mutation events per gene is presented in Table
List of 30 significantly differentially expressed genes mutated in TNBC and non-TNBC with high somatic mutation events per gene.
| Genes | Chromosome position | Adjusted | TNBC somatic mutation events | Non-TNBC somatic mutation events |
|---|---|---|---|---|
| | 19p13.11 | | 3 | |
| | 6p21.1 | | 3 | |
| | 5q33.1 | | 3 | |
| | 4q21.23 | | 2 | |
| | 4q13.3 | | 2 | |
| | 19p13.12 | | 2 | |
| | 14q22.2 | | 2 | |
| | 3p21.31 | | 2 | |
| | 16q22.2 | | 2 | |
| | 1p34.2 | | 2 | |
| | 14q32.33 | | 2 | |
| | 11p14.3 | | 2 | |
| | Xp11.22 | | 2 | |
| | 1p34.3 | | 2 | |
| | 6p21.32 | | 2 | |
| | 10p14 | | | 99 |
| | 14q21.1 | | | 23 |
| | Xp22.2 | | | 21 |
| | Xp11.22 | | | 18 |
| | Xp22.13 | | | 17 |
| | 12q24.11 | | | 16 |
| | 7q31.1 | | | 16 |
| | 1q41 | | | 16 |
| | 9q34.13 | | | 16 |
| | 3q13.11 | | | 15 |
| | Xq22.3 | | | 15 |
| | 3p21.2 | | | 15 |
| | 2q31.1 | | | 14 |
| | 16q21 | | | 14 |
| | 9p11.2 | | | 14 |
Note: blank cells in the 4th and 5th columns indicate that the gene is not mutated in that type of breast cancer.
This confirmed our hypothesis that there are differences in gene expression and somatic mutation burden between TNBC and non-TNBC. Additionally, the results showed that some of the differentially expressed genes tend to be somatic mutated in both types of breast cancer. Overall, there was significant variation in the number of somatic mutations per gene for genes mutated in each type and/or both types of breast cancer. The number of somatic mutation events per gene for the genes mutated in TNBC ranged from 1 to 3. The most highly mutated genes were
As noted earlier in the Introduction, breast cancer develops through somatic driver mutations; however, germline mutations can potentiate tumorigenesis via diverse mechanisms. To establish the association between germline and somatic mutation information, we performed additional analysis. We hypothesized that genes containing germline mutations also contain somatic mutations and that these genes are associated with TNBC, non-TNBC, or both. As the first step in addressing this hypothesis, we evaluated all 754 genes containing germline mutations associated with an increased risk of developing breast cancer for association with each type or both types of breast cancer using gene expression
The results showing the distribution of germline and somatic mutated genes and nonmutated genes from these analyses are presented in Venn diagrams in Figure
Venn diagram showing the distribution of genes containing both germline and somatic mutations, germline mutations only, and somatic mutations only and nonmutated in (a) TNBC and (b) non-TNBC. (c) Venn diagram showing the overlap in genes containing both germline and somatic mutations in TNBC and non-TNBC.
When we evaluated germline mutated genes for the presence of somatic mutations and association with non-TNBC, we discovered 531 genes containing both germline and somatic mutations (Figure
Following the discovery of genes containing both germline and somatic mutations associated with each type and both types of breast cancer, we performed additional evaluation to discover genes containing both germline and somatic mutations uniquely associated with TNBC and non-TNBC or both. This evaluation was restricted to 661 genes (i.e., 237 genes containing both germline and somatic mutations associated with TNBC plus 424 genes containing both germline and somatic mutations associated with non-TNBC). The results of this evaluation are presented in Figure
Having discovered gene signatures enriched for germline and somatic mutations associated with each type of breast cancer, we evaluated the genes in the signatures for the number of mutation events per gene, focusing on genes containing both germline and somatic mutations and associated with each type of breast cancer. The results showing a list of the top 30 highly somatic mutated genes out of the 237 genes containing both germline and somatic mutations associated with TNBC are presented in Table
Top 30 genes containing both germline and somatic mutations among genes significantly associated with TNBC
| Genes | Chromosome position | Genetic variant | GWAS | Expression | Mutation events |
|---|---|---|---|---|---|
| | 16p13.3 | rs12920416 | | | 7 |
| | 6q25.3 | rs140842923 | | | 6 |
| | 17q21.31 | rs1799950 | | | 5 |
| | 2q34 | rs13393577 | | | 5 |
| | 18q12 | rs9956546 | | | 5 |
| | 22q13.1 | rs12483853 | | | 5 |
| | 4q21.23 | rs71599425 | | | 4 |
| | 14q12 | rs140783387 | | | 4 |
| | 7q35 | rs10487920 | | | 4 |
| | Xp21.1 | rs1293906 | | | 4 |
| | 2p23.3 | rs1971136 | | | 4 |
| | 1q22 | rs11406084 | | | 4 |
| | 5q11-q12 | rs6151904 | | | 4 |
| | 5p15.1-p14.3 | rs2562343 | | | 4 |
| | 20q13.33 | rs6062356 | | | 4 |
| | 7q22 | rs17157903 | | | 4 |
| | 1p12 | rs1962373 | | | 4 |
| | 1q22 | rs4971059 | | | 4 |
| | 8q23.1 | rs12546444 | | | 4 |
| | 16p13.3 | rs11076805 | | | 3 |
| | 7q21.2 | rs10644111 | | | 3 |
| | 1q22 | rs10796944 | | | 3 |
| | 2p23.3 | rs144079028 | | | 3 |
| | 11q22-q23 | rs1801516 | | | 3 |
| | 6p22.3 | rs3819405 | | | 3 |
| | 17q25.3 | rs8074440 | | | 3 |
| | 1p36.22 | rs199867187 | | | 3 |
| | 18q11.2 | rs1436904 | | | 3 |
| | 17q21.2 | rs72826962 | | | 3 |
| | 7p15.3 | rs7971 | | | 3 |
Top 30 genes containing germline and somatic mutations significantly associated with non-TNBC
| Genes | Chromosome position | Genetic variant | GWAS | Expression | Mutation events |
|---|---|---|---|---|---|
| | Xp21.1 | rs1293906 | | | 41 |
| | 1p12 | rs372562666 | | | 27 |
| | 7q22 | rs17157903 | | | 22 |
| | 11q22-q23 | rs1801516 | | | 21 |
| | 13q14.2 | rs2854344 | | | 20 |
| | 6p21.3 | rs1801201 | | | 19 |
| | 1q22 | rs10796944 | | | 18 |
| | 11p11.2 | rs11039183 | | | 18 |
| | 3p26.1 | rs6787391 | | | 17 |
| | 10p15.1 | rs55910451 | | | 16 |
| | 17q24.2 | rs36059695 | | | 15 |
| | 2p23.3 | rs144079028 | | | 15 |
| | 7q35 | rs10487920 | | | 15 |
| | 9q31.1 | rs10512287 | | | 15 |
| | 5q13.1 | rs184886 | | | 15 |
| | 22q13.1 | rs12483853 | | | 15 |
| | 1p36.22 | rs199867187 | | | 14 |
| | 10p13 | rs10906522 | | | 14 |
| | 2q24.3 | rs148760487 | | | 14 |
| | 10q26.13 | rs2253762 | | | 14 |
| | 13q32.1 | rs1926657 | | | 13 |
| | 16p13.3 | rs11076805 | | | 13 |
| | 14q32.11 | rs941764 | | | 13 |
| | 10q26.13 | rs35054928 | | | 13 |
| | 11q13.2 | rs55908905 | | | 13 |
| | 3q23 | rs1802904 | | | 12 |
| | 17q21.31 | rs1799950 | | | 12 |
| | 13q13.1 | rs11571833 | | | 12 |
| | 2q33.1 | rs3769821 | | | 12 |
| | 5p14.3 | rs66783663 | | | 12 |
Top 30 genes with both germline and somatic mutations distinguishing TNBC from non-TNBC
| Gene name | Chromosome position | SNP_ID | GWAS | Expression | GWAS event | TNBC mutation event | Non-TNBC mutation event |
|---|---|---|---|---|---|---|---|
| | 4q21.23 | rs1963045 | | | 1 | 2 | |
| | 6p21.32 | rs169494 | | | 1 | 2 | |
| | 1p36.13 | rs2992756 | | | 1 | 2 | |
| | 19p13.11 | rs8170 | | | 1 | 1 | |
| | 20p12.3 | rs16991615 | | | 1 | 1 | |
| | 8q24.21 | rs11780156 | | | 1 | 1 | |
| | 19p13.13 | rs78269692 | | | 1 | 1 | |
| | 10p13 | rs10906522 | | | 1 | | 14 |
| | 2q24.2 | rs148760487 | | | 1 | | 14 |
| | 13q32.1 | rs1926657 | | | 1 | | 13 |
| | 14q32.11 | rs941764 | | | 1 | | 13 |
| | 5p14.3 | rs66783663 | | | 1 | | 12 |
| | 1p13.2 | rs1230666 | | | 1 | | 12 |
| | 2q35 | rs6436017 | | | 1 | | 11 |
| | 15q26.1 | rs8037430 | | | 2 | | 10 |
| | 1q24.3 | rs1894633 | | | 1 | | 10 |
| | 6q23.1 | rs6569648 | | | 1 | | 10 |
| | 17q23.2 | Deletion | | | 2 | | 9 |
| | 3p13 | rs6805189 | | | 1 | | 9 |
| | 6p21.33 | rs3132610 | | | 1 | | 8 |
| | 2p23.3 | rs6725517 | | | 1 | | 8 |
| | 17q21.33 | rs2075555 | | | 1 | | 8 |
| | 7q22.1 | rs71559437 | | | 1 | | 8 |
| | 22q12.2 | rs132390 | | | 1 | | 8 |
| | 5p15.33 | rs190811224 | | | 1 | | 8 |
| | 15q26.3 | rs1546713 | | | 2 | | 8 |
The results showing a list of the top 30 most highly somatic mutated genes out of the 424 genes containing both germline and somatic mutations associated with non-TNBC are presented in Table
To address the hypothesis that the 56 genes containing both germline and somatic mutations uniquely associated with TNBC and the 243 genes uniquely associated with non-TNBC (Figure
The most highly mutated genes in TNBC were
To delineate the possible oncogenic interactions and cooperation between genes containing germline and somatic mutations, we performed network and pathway analyses separately for each type of breast cancer, as described in Material and Methods. For TNBC, we used the 56 genes containing both germline and somatic mutations uniquely associated with TNBC and the 99 highly somatic mutated genes (i.e., ≥5 somatic mutation events per gene) that were highly significantly associated with TNBC. Likewise, for non-TNBC, we used the 243 genes containing both germline and somatic mutations uniquely associated with the disease and the 246 highly somatic mutated genes associated with the disease. Highly somatic mutated genes without germline mutations were included because GWAS discoveries explain only a small proportion of the phenotypic variation. Crucially, genetic variants from GWAS may not necessarily be causal but may be interacting and cooperating with highly somatic mutated oncogenes involved in the causal mechanisms through
The results showing molecular networks enriched for germline and somatic mutations in TNBC are presented in Figure
Molecular networks enriched for germline and somatic mutations in TNBC. Genes in red font contain germline and somatic mutations, and genes in blue font contain germline mutations only. Nodes represent the genes, and edges represent functional relationships. Genes in black font are functionally mutated genes.
The results showing molecular networks enriched for germline and somatic mutations in non-TNBC are presented in Figure
Molecular networks enriched for germline and somatic mutations in non-TNBC. Genes in red font contain germline and somatic mutations, and genes in purple font contain germline mutations only. Nodes represent the genes, and edges represent functional relationships.
Overall, there was overlap in molecular networks and signalling pathways discovered in TNBC and non-TNBC. For example, the signalling pathways involved in DNA repair and DNA damage were discovered in both types of breast cancer. Interestingly, in both TNBC and non-TNBC, genes containing germline mutations strongly associated with breast cancer were functionally related and interacting with highly somatic mutated genes in gene regulatory networks and signalling pathways. Taken together, the results of this investigation confirmed our hypothesis that in the context of breast cancer, TNBC and non-TNBC can be considered as emergent properties of molecular networks and signalling pathways influenced by both germline and somatic mutations. The investigation revealed that integrating germline with somatic mutation information holds promise for discovering the molecular mechanisms through which germline and somatic mutations interact and cooperate to drive TNBC and non-TNBC.
To validate and investigate the potential clinical utility of the discovered germline-somatic mutated genes, we performed
Evaluation using PAM50 revealed the
Evaluation using MammaPrint did not reveal genes containing both germline and somatic mutations significantly associated with TNBC or non-TNBC. However, the analysis revealed 3 somatic mutated genes:
We used an integrative genomic approach combining data on germline and somatic variation using gene expression data as the intermediate phenotype to delineate possible oncogenic interactions and cooperation between genes containing germline and somatic mutations in TNBC and non-TNBC and to investigate the difference in mutation burden between the two types of breast cancer. The investigation revealed that genes containing germline mutations also contain somatic mutations. The investigation also revealed differences in gene expression and mutation burden between TNBC and non-TNBC. Most notably, the investigation revealed multiple gene regulatory networks and signalling pathways enriched for germline and somatic mutations in each type of breast cancer. To our knowledge, this is the first study to comprehensively characterize the germline-somatic mutation interaction landscape in TNBC and non-TNBC. The link between germline and somatic mutations in breast cancer has been explored [
The discovery of highly significantly differentially somatic mutated gene signatures between TNBC and non-TNBC suggests that breast cancer may be amenable to mutation-based classification [
The discovery of functionally related genes containing both germline and somatic mutations is of particular interest. Clinically, this finding provides a rational basis for predictive modelling to identify patients at high risk of developing aggressive disease such as TNBC, a key step in the realization of precision prevention strategies. This discovery may also provide insights into how and when cancer cells are likely to undergo malignant transformation into lethal disease.
The discovery of gene regulatory networks and signalling pathways enriched for germline and somatic mutations is highly significant. It suggests that breast cancer is an emergent property of molecular networks and signalling pathways enriched for germline and somatic mutations. The investigation further revealed that interaction and cooperation between germline and somatic mutations during tumorigenesis occur through gene regulatory networks and signalling pathways. The clinical significance of these findings is that such signalling pathways could be used as therapeutic targets.
The majority of the germline mutations discovered thus far through GWAS map to noncoding regions, such as intronic regions with undefined functions, and their causal relationship with the disease has not been characterized. This investigation demonstrates that integrating germline with somatic mutation information provides a rational basis for establishing a causal relationship between germline mutations and tumorigenesis. This is important given the limited evidence showing that cancer susceptibility variants are preferential targets for somatic mutations [
As noted earlier in this report, genetic variants are currently being incorporated into risk prediction models such as polygenic risk scores [
In this study, we used PAM50 and MammaPrint, two clinically validated and FDA-approved prognostic assays [
This study delineated the germline-somatic mutation interaction landscape in TNBC and non-TNBC. However, limitations must be acknowledged. Both the GWAS and TCGA data sets lack diversity in ethnic population and clinical phenotype representation that would further inform these results; GWAS and TCGA studies have focused almost exclusively on women of European ancestry. There is a need for similar studies including women from underrepresented ethnic populations to ensure equitable use of genomic information to improve human health and eliminate health disparities [
The investigation revealed oncogenic interactions and cooperation between genes containing germline and somatic mutations and showed that these complex arrays of interacting genetic factors occur through molecular networks and signalling pathways driving TNBC and non-TNBC. The investigation revealed differences in gene expression and somatic mutation burden between TNBC and non-TNBC. Further research is recommended to validate and ascertain the specificity of germline mutations to TNBC and non-TNBC in different ethnic populations including African American women to ensure equitable use of genomic information to improve human health.
GWAS data are provided in Supplementary Table SG in the supplementary materials to this report. Additional GWAS information is available at the GWAS catalogue managed by the European Bioinformatics Institute:
The content in this report is solely the responsibility of the authors and does not represent the official views of the National Institutes of Health or any funding source.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
CH, JW, TM, and LZ conceived, designed, and drafted the manuscript. All four coauthors participated in data processing, analysis, integration and visualization, data interpretation, manuscript writing, and preparation. All authors read and approved the final draft of the manuscript.
The authors wish to thank Louisiana State University, School of Medicine, for providing funding in support of this research and the patients who volunteered and provided the tumor samples used to generate both the GWAS and TCGA data. We also thank the NCI Office of Cancer Genomics. This research was supported by Louisiana State University, School of Medicine, in New Orleans and by National Institutes of Health grant numbers LSUHSC # U54 GM12254691 and UAB # UL1TR001417.
All additional results from the analysis are shown in the supplementary tables described. Supplementary Table SG: GWAS information-genetic variants and genes associated with an increased risk of developing breast cancer. Supplementary Table SM: somatic mutated genes significantly associated with TNBC vs. control and non-TNBC vs. control. Supplementary Table SN: nonsomatic mutated genes significantly associated with TNBC vs. control and non-TNBC vs. control. Supplementary Table S2A1: list of all somatic mutated genes significantly associated with TNBC. Supplementary Table S2A2: list of all somatic mutated genes significantly associated with non-TNBC. Supplementary Table S1: differentially expressed and uniquely somatic mutated genes between TNBC and non-TNBC. Supplementary Table SA3: significantly differentially expressed germline mutated genes with and without somatic mutations in TNBC. Supplementary Table SB3: significantly differentially expressed germline mutated genes with and without somatic mutations in non-TNBC. Supplementary Table S3A: genes containing both germline and somatic mutations significantly associated with TNBC. Supplementary Table S3B: genes containing both germline and somatic mutations significantly associated with non-TNBC. Supplementary Table S3C: a complete list of 251 genes containing both germline and somatic mutations distinguishing TNBC from non-TNBC.