SEMA6D Expression and Patient Survival in Breast Invasive Carcinoma

Breast cancer (BC) is the second most common cancer diagnosed in American women and is also the second leading cause of cancer death in women. Research has focused heavily on BC metastasis. Multiple signaling pathways have been implicated in regulating BC metastasis. Our knowledge of regulation of BC metastasis is, however, far from complete. Identification of new factors during metastasis is an essential step towards future therapy. Our labs have focused on Semaphorin 6D (SEMA6D), which was implicated in immune responses, heart development, and neurogenesis. It will be interesting to know SEMA6D-related genomic expression profile and its implications in clinical outcome. In this study, we examined the public datasets of breast invasive carcinoma from The Cancer Genome Atlas (TCGA). We analyzed the expression of SEMA6D along with its related genes, their functions, pathways, and potential as copredictors for BC patients' survival. We found 6-gene expression profile that can be used as such predictors. Our study provides evidences for the first time that breast invasive carcinoma may contain a subtype based on SEMA6D expression. The expression of SEMA6D gene may play an important role in promoting patient survival, especially among triple negative breast cancer patients.


Introduction
Breast cancer (BC) is the second most common cancer diagnosed in American women and is also the second leading cause of cancer death in women [1,2]. It is estimated that, in the developed world, one in eight women will develop breast cancer in her lifetime [3,4]. BC lethality is mainly caused by metastasis, which accounts for approximately 90% of BC deaths [5][6][7][8][9][10]. Metastatic BC can be treated, sometimes for many years, but cannot be cured. BC primarily metastasizes to the bone, lungs, regional lymph nodes, liver, and brain, with the most common site being the bone. Research has focused heavily on BC metastasis for many years. Multiple signaling pathways, such as TGF , Wnt, Notch, and EGF, have been implicated in regulating metastasis of BC cells [5][6][7][8][9][10]. However, our knowledge of regulation of BC metastasis is far from complete. Identification of new factors that play critical roles in driving/inhibiting metastatic progression is an essential step toward fully understanding BC metastasis and will also provide novel therapeutic targets/reagents against BC. Our labs have focused on Semaphorins (SEMAs), especially SEMA6D. SEMAs were implicated in immune responses, heart development, and neurogenesis [11][12][13] and recently in BC metastases [14][15][16][17].
Semaphorins were initially recognized as phylogenetically conserved neuronal guidance cues, and their critical regulatory roles in BC metastasis have rapidly emerged in recent years. Based on their sequence similarity, Semaphorins are classified into eight classes: classes 1-2 are found in invertebrates, classes 3-7 comprise the vertebrate Semaphorins, and class V is encoded by viruses. Class 2, 3, and V Semaphorins are secreted, while all other members are membrane tethered through a single transmembrane domain [17][18][19][20][21][22][23][24][25]. The signature structure of Semaphorins is the ∼500 amino acid (aa) Sema domain, which is a variant of the -propeller fold revealed from structural studies [22]. All Semaphorins, 2 International Journal of Breast Cancer except class V, contain a PSI (Plexin-semaphorin-integrin) domain immediately to the C-terminal side of the Sema domain. Different classes of Semaphorins may also contain additional functional motifs.
Published studies regarding functions of Semaphorins in BC have mainly focused on class 3 secreted Semaphorins and SEMA4D [14][15][16][17]. Our recent bioinformatic analysis by using public datasets reveals for the first time the potential role of SEMA6D in regulating BC pathogenesis. The extracellular region of SEMA6D, which contains the Sema and PSI domains, can be released from the cell surface to act as a secreted cytokine with unknown molecular mechanisms [26,27]. Thus, SEMA6D may act both locally through cellcell contacts and more distantly through diffusion of its cleaved ectodomain. The primary receptors of SEMA6D are PLXNA1 and PLXNA4, which belong to the Plexin receptor family. The extracellular domains of Plexins interact with Semaphorins, while the C-terminal tails of Plexins mediate intracellular signal transduction. Unlike many signaling pathways (such as the TGF /SMAD pathway), there is no "canonical" intracellular transduction cascade mediating activities of Semaphorin-Plexin signaling. Many intracellular signaling molecules, such as GTPase activating proteins, GTP/GDP exchange factors, and various tyrosine kinases, can be activated and/or inactivated by Semaphorin-Plexin signaling in a context-dependent manner [18][19][20][21][22][23][24][25].
In this study, we examine the public datasets from The Cancer Genome Atlas (TCGA), National Cancer Institute (NCI) for expression of SEMA6D along with genes that interact with SEMA6D. Other genes coregulated with SEMA6D were analyzed for their function, pathway, and potential as copredictors for BC patients' survival. We found 6-gene expression profile that can be used as such predictors. We also found that SEMA6D expression correlated with the cancer status of triple negative (TNBC) markers (ER, PR, and Her2 genes). The study shows the role of SEMA6D as potential survival predictor especially in TNBC patients.

Datasets.
The Cancer Genome Atlas (TCGA) Data Portal was used to download breast invasive carcinoma (BRCA) samples ( = 1, 100). The RNAseqV2 level 3 data, which includes fragments per kilobase of exon per million fragments mapped-(FPKM-) normalized gene level data, were used before statistics. In addition, idf and sdrf files were also downloaded for sample mapping and annotation. Clinical outcomes data were downloaded for correlation and survival analysis.

Gene Expression Data Analysis and Annotation.
Genelevel normalized expression data were used in Partek Genomic Suite (PGS, St. Louis, MO) for additional normalization, statistics, and annotation. The analysis of variance (ANOVA) methods were used for group comparisons. False discovery rate (FDR) correction (Benjamini-Hochberg methods) was applied for multiple hypothesis testing purpose. Other statistical tools such as SAS (Cary, NC) and Ingenuity Pathway Analysis (IPA, Redwood City, CA) were used for pathway analysis and building gene-gene interaction network. Heatmap was generated by using hierarchical clustering methods after z-normalization.

Survival Analysis.
A total of 140 patients with clinical outcomes data available (survival status, months of survival, demographics, and ER, PR, and HER2 status, etc.) were included in the analysis. Among significant genes after SEMA6D-high versus SEMA6D-low expression comparisons, we selected top 20 genes with the highest or lowest expression levels to correlate with clinical outcomes. Logarithm 2 based transformation of each gene was performed prior to any analysis. The correlation among these 20 genes was evaluated using Pearson correlation coefficient, and summary statistics were presented including mean with standard deviation, median, and range. Associations between level of genes and overall survival (OS) were assessed with Kaplan-Meier (K-M) curves and log-rank tests. Each gene was dichotomized as above or below median level of expression in the survival analysis. Significant association was determined at 5% type I error level. Multiple comparisons were not explicitly controlled for due to the small sample size and exploratory nature of the analysis.

Results and Discussion
Semaphorins, including members in subclass 3 and SEMA4D, have emerged as critical signaling molecules in regulating BC pathology [14][15][16][17]. The potential roles of other members of the Semaphorin family in BC have not been well addressed in the literature. Our ongoing studies suggest that SEMA6D plays a critical role in mediating Bone Morphogenetic Protein (BMP) signaling to regulate epithelial-mesenchymal-transition (EMT) by endocardial cells in developing hearts (Kai Jiao et al. 's manuscript in preparation). EMT is an essential step for initiating metastasis in BC and other cancers. We thus decided to apply a bioinformatic approach to examine the potential role of SEMA6D in BC tumorigenesis and progression using publically available datasets. After testing a few smaller datasets form Gene Ontology Omnibus (GEO), we chose to use a breast invasive carcinoma (BRCA) dataset from TCGA as it represents a major public data source with clinical outcomes information, such as overall survival after diagnosis. We divided all samples ( = 1, 100) into three roughly equal sized groups based on SEMA6D expression (high, medium, and low). The comparison of SEMA6D-high versus SEMA6D-low expression group will reveal genes that are coregulated with SEMA6D.

Gene Expression Profile by Principle Components Analysis (PCA) and Hierarchical Clustering.
To examine overall gene expression profile and sample similarities, we perform the PCA analysis of all samples. As showed in Figure 1, the PCA showed a clear separation among SEMA6D-high (H), SEMA6D-medium (M), and SEMA6D-low (L) groups. This indicates different gene expression profiles among the three groups.
Based on the genes that are differentially expressed in SEMA6D-high versus SEMA6D-low expression groups, we We further examined SEMA6D levels by including SEMA6D-medium versus SEMA6D-low expression group comparison. As shown in the Venn diagram in Figure 3, 58 unique genes are significant between SEMA6D-medium and SEMA6D-low patients, while 2,357 genes are significant in SEMA6D-high versus SEMA6D-low expression comparison. This suggests that higher level of SEMA6D expression may lead to more significant changes, as evidenced by an increased number of genes with significant changes. This also suggests that the function of SEMA6D may be dependent on the expression level or that SEMA6D-induced functional effects are dose-dependent.
Among significant genes of SEMA6D-high versus SEMA6D-low comparison, Gene Ontology (GO) analysis for biological processes revealed that multicellular organismal development and G-protein coupled receptor protein signaling pathway are among the top changed GO biological processes (Table 1). It has been shown that SEMA6D may play a role not only during heart development [26,27] but also during development of retina [28] and axon in zebra fish [29]. Our results support these findings and showed that SEMA6D played a major role in organismal development in addition to the G-protein coupled receptor signaling ( Table 1).
The semaphorins and their receptors, the neuropilins and the Plexins, are constituents of a complex regulatory system that controls axonal guidance [30]. It was suggested that  SEMA6D may bind to different receptor components and thus exert distinct functions during cardiac morphogenesis [26]. Our results suggested a broad function of SEMA6D to initiate signaling events that link to G-protein coupled receptor (GPCR) signaling (Table 1) and overall receptor activities ( Table 2). GPCRs represent a super family of cell surface signaling proteins and play essential roles in cancer metastasis [2,31,32] and are one of the most promising targets of metastatic breast cancer therapy [33][34][35][36][37].
In cell adhesion, cell-cell interactions between cancer cells with endothelium determine the metastatic spread. There are two major cell adhesions, including selectin and integrin, and accumulating evidence confirms that tumor cell interactions through them actively contribute to the metastatic spread of tumor cells [38]. Our results suggest a pivotal role for SEMA6D in tumor metastasis especially receptor activities ( Table 2) and GPCR signaling (Table 1).  The GO-molecular functions also reveal that receptor activity, sequence-specific DNA binding, and voltage-gated sodium channel activities are among top affected molecular functions when SEMA6D level is high ( Table 2). These results suggest that SEMA6D may initiate member receptors activation as a ligand. The gene-gene interactions among those genes that directly or indirectly interact with SEMA6D partially confirmed this hypothesis. As shown in Figure 4, elevated PLXNA4 may lead to an increase of SEMA6D expression and trigger transcriptions by the general transcription factors FOS and FOXO1. This may lead to a cascade of activations of membrane receptors including G-protein coupled receptors ( Table 2).
As reported, Plexin-B1 is a receptor for the transmembrane semaphorin SEMA4D (CD100) [39], and PLXNA4 negatively regulates T lymphocyte responses [40]. It has been shown that SEMA6D induces NF-B transcriptional activity in nonmalignant mesothelial cells [30]. Two potential targets of SEMA6D, the general transcription factors FOS and FOXO1, were both increased in SEME6D-high patients. FOXO1 has been widely reported in tumor oncogenesis and metastasis [41][42][43]. This suggests an important role for SEMA6D in promoting general transcription through FOS coupled with FOXO1 as previously reported [44]. The balance of transcriptions of both tumor suppressors and oncogenes may be the key to understand the underlining mechanism.
SEMA6D and Tumor Metastasis. SEMA6D plays an important role in tissue development and differentiation, a process involving epithelial-mesenchymal-transition (EMT); it will be interesting to know if EMT-related genes are coregulated in SEMA6D-high patients. As shown in Table 3, major tumor metastatic promoter-(MMP-) 9 was dramatically reduced among SEMA6D-high samples, while several tumor metastatic promoters such as TGF--related factors, ZEB2, ZEB1, and GNG11, however, were elevated corresponding to a high level of SEMA6D. In addition, the expressions of all these genes (except for DSC2) are highly correlated with the expression of SEMA6D (Table 4). As SEMA6D was implicated in VEGF-dependent and anchorage-independent cell growth [30], we also included VEGF genes in the correlation analysis. High levels of correlations between SEMA6D level and VEGFs were found as well (Table 4) suggesting a role of VEGF family genes in mediating SEMA6D signaling. MMP family proteins, especially MMP9, were suggested to be involved in the process of metastasis of breast cancer to the brain [45], CD147-mediated metastasis in MCF7 cells [19], TGF -mediated signaling at the tumor-bone interface [46], and L2-mediated matrix remodeling in metastasis and mammary gland involution [47]. Decreased autocrine EGFR signaling in metastatic breast cancer cells inhibits tumor 6 International Journal of Breast Cancer growth in bone and mammary fat pad through MMP9dependent pathways [48]. By using an RNA interference approach, the reduced levels of MMP-9 mRNA and protein correlated with inhibited phenotype of tumor invasion and metastasis [14]. Our results are in line with these findings and suggest a tumor suppressor function for SEMA6D. On the other hand, our results also showed an increased expression among SEMA6D-high samples of some important tumor promoters such as ZEB1/2, which had been reported to promote EMT by modulating Zeb1/2 and TGF expression [49]. Our results thus strongly suggest that the balance between tumor suppressors and promoters is the key to understand the role of SEMA6D during EMT. Another explanation is that the increased expression of SEMA6D may be the results, not the cause ZEB1/2 changes and vice versa.

SEMA6D Expression and Affected Signaling Pathways.
Although roles of SEMAs have been suggested in breast cancer [14][15][16][17], prostate cancer [50], and malignant mesothelioma [30], the underlying functional mechanisms including pathways are largely unknown. Nevertheless, SEMA6D has been reported to play a role in immune responses [12], NF-B signaling [30], and stromal expression of SEMA6D [50]. Our results implicated top canonical pathways (Table 5), which partially confirmed previous reports such as cAMP-mediated signaling in cervical cancer cell migration [51] and in lung cancer [52], G-protein coupled receptor signaling in breast cancer [34,53], and adhesion and diapedesis in a breast cancer cell line [54]. Therefore SEMA6D may play multiple roles during these processes although additional studies may be needed to further delineate SEMA6D functions in these pathways.

SEMA6D Expression Correlates with Patients' Survival.
In order to determine if SEMA6D and SEMA6D-related genes are correlated with overall patient survival, we filtered the significant gene list after SEMA6D-high versus SEMA6Dlow expression comparison by choosing the top 10 most upregulated genes and top 10 most downregulated genes and conducted a survival analysis by using K-M methods. We found that 6 candidate genes were significantly associated with overall survival. These genes are SEMA6D, CLEC9A,    C10orf107, DONSON, CHAC1, and TUBA1C. Two more genes COL4A6 and CBX2 genes are in borderline to be significant. A summary of these 8 genes is listed in Table 6 and Figure 5.
In addition, increased expressions of SEMA6D, CLEC9A, COL4A6, and C10orf107 are associated with better survival while decreased expressions of DONSON, CHAC1, TUBA1C, and CBX2 also correlate to better survival. Figure 5 showed survival probability based on SEMA6D expression (≥median or ≤median expressions). Similar significant separation trends were also observed for both CLEC9A and C10orf107 as positive predictors and DONSON, CHAC1, and TUBA1C as negative predictors (data not shown).

Correlation of Expressions of SEMA6D and Other Genes with TNBC Status.
As only 30% of women with metastases survive five years and virtually all TNBC women will ultimately die of their disease despite systemic therapy [55], we further explore the role of SEMA6D in promoting survival in TNBC patients. We found not only that these genes associated with survival, but also that they are interacting with TNBC status (Yes or No) significantly for SEMA6D, for example, with a log-rank = 0.0083, as shown in Figure 6. It is clearly shown that TNBC patients (Figure 6, SEMA6D-high in brown relative to SEMA6D-low in green) show larger survival differences as compared with non-TNBC patients (SEMA6D-high in red relative to SEMA6D-low in blue). Other genes such as CLEC9A ( = 0.0083) and C10orf107 ( = 0.0083) are similarly associated with TNBC status (data not shown).
These results strongly suggest that SEMA6D expression levels correlate with overall survival (Figure 5), especially in TNBC patients ( Figure 6).

Conclusions
Our study provides evidences that breast invasive carcinoma (BRCA) may contain a subtype based on SEMA6D expression. The expression of SEMA6D gene may play an important role in promoting patient survival, especially among triple negative breast cancer (TNBC) patients.