Integrated Bioinformatics Analysis for the Screening of Hub Genes and Therapeutic Drugs in Androgen Receptor-Positive TNBC

As the most invasive and lethal subtype of breast cancer (BC), triple-negative breast carcinoma (TNBC) is of increasing interest. However, the androgen receptor (AR) still has an unclear role in TNBC. The current study is aimed at testing the diagnostic and therapeutic performance of novel biomarkers for AR-positive TNBC. The GSE76124 dataset was analyzed by combining WGCNA and other bioinformatics methods. Subsequently, function enrichment analysis was applied to identify the relationships between these differential expression genes (DEGs). Subsequently, the protein-protein interaction network was established, and the hub genes were identified by Cytoscape software. Eventually, the miRNA-hub gene modulate network was developed and the Drug-Gene Interaction Database (DGIdb) was applied to verify the potential drugs for AR-positive TNBC. In the current research, 88 DEGs in total were selected from the intersection of the purple module genes identified by WGCNA and limma package. TFF1, FOXA1, ESR1, AGR2, TFF3, AGR3, GATA3, XBP1, SPDEF, and TOX3 were selected as hub genes by the MCC method, which were all upregulated. The survival analysis suggested that TFF1 was the only one related to significant lower survival rate in TNBC. Ultimately, hsa-miR-520g-3p and hsa-miR-520h were found taking part in the regulation of TFF1, and 2 small molecules were identified as the potential targets for AR-positive TNBC treatment. As a result, our study suggested that hsa-miR-520g-3p, hsa-miR-520h, and TFF1 might have significant potential values for AR-positive TNBC diagnosis and prognosis prediction. TFF1, hsa-miR-520g-3, and hsa-miR-520h may serve as the novel therapeutic targets, and our findings offer further insights into the therapy of AR-positive TNBC.


Introduction
The incidence of BC is the highest of all female malignancies worldwide. In 2020, US statistics indicated that BC accounted for 30% of female malignancies and had a 15% mortality rate [1]. At present, the prognosis of breast cancer was improved by several clinical treatment methods, mainly including chemotherapy, radiation therapy, surgery, immunotherapy, and targeted therapy [2]. Although surgery is still the mainstay of early BC treatment, chemotherapy and radiotherapy are important in reducing recurrence and improving prognosis. Recently, more and more drugs targeting HER2 have been developed and identified to improve the prognosis of HER2-positive BC patents. As for the hormone receptor-(HR-) positive BC, in addition to the classic aro-matase inhibitors and estrogen receptor antagonists, CDK4/6 inhibitors are used extensively in clinical practice in recent years [3]. Regrettably, the lack of biomarkers for early detection and identified targets for treatment meant that patients with TNBC were diagnosed late and benefited little from targeted or hormonal treatments [4]. Consequently, TNBC patients generally faced high risks of metastasis as well as recurrence and had a worse prognosis, with reduced overall survival (OS) and disease-free survival (DFS) [5][6][7]. Notably, a number of gene mutations have been previously described over the years as being highly significantly associated with an increased risk of BC. In particular, breast cancer 1 (BRCA1) and BRCA2 are tumor suppressor genes with high penetrance. They are identified to take part in the processes of activating the cell-cycle GSM1974618      Scale-free fit index (left) and the mean connectivity (right) for soft-thresholding powers. When β was set at 8, the scale-free network was constructed in the GSE76124 database. (c) Clustering dendrograms of genes based on dissimilarity topological overlap and module colors in GSE76124 database. 24 coexpression modules were established and marked by different colors. (d) Visualizing the gene network using a heat map plot. The module eigengene dendrogram and heat map verified that the purple module was positively correlated with AR-positive TNBC. (e) Analysis of module-trait relationships of TNBC based on the dataset GSE76124. Pearson correlation coefficient matrix was calculated between traits and modules. The correlation coefficient of each module and the corresponding P value were displayed. A positive correlation between the purple module (containing 227genes) and the ARpositive TNBC was indicated with a P < 0:05 (correlation coefficient = 0:87, P < 0:01). (f) A scatter plot of GS for AR-positive TNBC and the MM in the purple module. Intramodular analysis indicated that the genes in the purple module had a high correlation with ARpositive TNBC, with P = 4:6e − 54 and correlation = 0:81.

4
Disease Markers checkpoints and DNA repair to further respond to DNA damage. Consequently, the application of poly[ADP-ribose] polymerase (PARP) inhibitors targeted to PARP proteins associated with DNA repair mechanisms is shown to be efficient in BC with BRCA1/2 gene mutation [8]. Additionally, the circulating tumor DNA (ctDNA) was reported to be useful in the diagnosis and surveillance of BC. ctDNA is detected by "liquid biopsies," which are noninvasive means by simply obtaining blood instead of tumor tissue biopsies. To date, a number of ctDNA biomarkers have been identified and used for the diagnosis and prognosis of BC, including tumor protein p53 (TP53), AKT1, and phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (PIK3CA) etc., but they still lack specificity for TNBC [9]. The molecular subtyping of TNBC is important for the correct classification of cancer lesions and for predicting patient prognosis. Therefore, an increasing amount of research is focused on identifying new molecules critical to TNBC in order to provide improved diagnosis, prognostic prediction, and treatment strategies for this malignant tumor. Breast carcinoma is classified into four subtypes based on the expression of genes and receptor proteins [10]. TNBC accounts for 15-20% of all subtypes of BC and is usually described as ER-negative, PR-negative, or HER2-negative [11]. In 2011, Lehmann et al. performed a cluster analysis on the gene expression profiles of 587 patients with TNBC from 21 public databases and proposed that TNBC could be divided into 6 subtypes based on the gene expression level. They were named basal-like-1 (BL-1), basal-like-2 (BL-2), mesenchymal (M), immunomodulatory (IM), mesenchymal stem-like (MSL), and luminal androgen receptor (LAR) [12]. In 2015, Burstein et al. conducted the genomewide analysis of 198 TNBC samples to determine 4 TNBC subtypes: basal-like immune-suppressed type (BLIS), basallike immune-activated (BLIA), mesenchymal (MES), and LAR. At the same time, they pointed out that the prognosis of BLIS-type TNBC was poor, while BLIA-type TNBC had a good prognosis (P < 0:05) [13]. AR was identified as a steroid receptor superfamily member and was expressed in over 70% of BC as well as approximately 25%-35% of TNBC [14][15][16][17][18][19]. Some studies showed that AR-positive patients were notably related to a low risk of cancer recurrence and patient mortality. Nevertheless, in other studies, positive AR expression of TNBC patients was related to poorer clinical performance, and therapeutic AR blockade was worth considering as a possible endocrine therapy [14,[20][21][22][23]. Consequently, the potential therapeutic strategies for ARpositive TNBC may be provided by identification of novel biomolecules and signaling pathways.

Disease Markers
Some oncological academics suggested that diagnostic, prognostic, and predictive values were generally essential for good biomarkers. In addition, the application of bioinformatics approaches to integrate biomarker data will provide us with new insights in biological pathways and regulatory mechanisms with disorders. Hong and colleagues used systemic and comprehensive bioinformatics to identify an 8-miRNA signature that can improve the current TNM staging system and provide a molecular assay to forecast recurrence in TNBC patients after surgery [10]. Moreover, the study conducted by Burstein et al. found novel subtype-specific targets that could be targeted for the effective treatment of TNBCs through RNA and DNA profiling analysis for the datasets from the GEO database and Baylor College of Medicine [13]. Besides, Candido et al. revealed the roles that IL6, IL6R, and IL6ST played in epigenetic regula-tions in cancer by use of cancer genomic and epigenomic datasets from TCGA [24]. The bioinformatics and computational analysis were applied not only in the oncology field but also in other diseases. Giambò and colleagues identified the vital genetic and epigenetic alterations related to pesticide exposure by a series of computational analyses of gene expression, miRNA expression, and DNA methylation datasets from the GEO database [25].
Our study was aimed at identifying novel effective biomarkers for TNBC especially for AR-positive TNBC. For this purpose, a series of continuous bioinformatics methods and computational statistical analysis are applied to miRNA profiling from public database.
WGCNA (weighted gene coexpression network analysis) was a systematic biological method to identify the modules of highly associated genes to establish a coexpression 7 Disease Markers network on the basis of gene expression data [26]. Genes were expressed and functionally similar in the same coexpression module.
The total expression of genes in the coexpression was reflected by the first principal component, named module vector [27]. The WGCNA was employed to define the hub genes of the cluster, which could play as the potential biomarkers of disease or targets for therapy. Moreover, the molecular mechanism of BC development was able to be illustrated through the modulatory networks of the genes involved [28,29]. WGCNA had been adopted to find several potential biomarkers in different fields, including neurodegenerative diseases, cancers, and immune disease [30,31].
In this study, microarray data of the GSE76124 dataset was collected from the Gene Expression Omnibus database (GEO database, https://www.ncbi.nlm.nih.gov/geo/). These samples were defined as the AR-positive subtype TNBC group and the other-three-subtype (MES, BLIA, and BLIS) TNBC group [13]. WGCNA was used to establish coexpression networks for both groups, identify modules associated with AR positivity, and obtain the key genes in the modules. Subsequently, the limma package was applied to recognize the differentially expressed genes (DEGs) between the ARpositive subtype and the other three subtypes of TNBC tissues. The candidate genes related to AR-positive TNBC were finally selected by combining DEGs and WGCNA  , and the correlation between TFF1 and RNA expression was verified. After that, the regulatory miRNAs of TFF1 were identified, has-miR-520g-3p and hsa-miR-520h, which were considered to be associated with the regulatory mechanism of AR-positive TNBC development. Eventually, the miRNA-hub gene network was further established. The Drug-Gene Interaction Database (DGIdb) was utilized to verify and find the candidate drugs for AR-positive TNBC. Therefore, our work was aimed at illustrating the potential molecularly mechanisms in promoting the prognosis of AR-positive TNBC. The results may further provide insights into the diagnosis and therapies of AR-positive TNBC.

Public Datasets and Data
Preprocessing. The gene expression microarray datasets (GSE76124 and GSE167213) were retrieved from GEO, which were processed on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). Inclusion criteria were as follows: (a) mRNA expression data were available; (b) more than 100 TNBC samples were available with complete clinical information; (c) the AR state for each sample was exact. 9 Disease Markers calculation to further assess the dataset. Subsequently, the quality of the dataset was evaluated by drawing a weight map, relative logarithmic expression map, residual symbol map, and RNA degradation map. Besides, the KNN method was adopted to add the missing values [32]. Probes with gene annotation and matched only one genetic symbol were included in the current study. Eventually, 23,519 genes in 198 samples from the GEO database were screened out for the coexpression network establishment after ranking the variance of the descending alignments.

Weighted Gene Coexpression
Network Analysis (WGCNA). As an approach for gene set expression analysis, WGCNA was adopted to establish a network, in which the genes and the interrelationships between genes were represented as the points and lines, respectively. The coexpression network was established by use of the R package WGCNA (http://www.r-project.org/) in the R environment [33]. Generally, matrices of paired Pearson correlation coefficients were created to assess the similarities between genes in TNBC patients. Subsequently, the power adjacency function was applied to realize the conversion of the similarity matrix and the adjacency matrix. According to the scale-free network, we further built the topology of the coexpression network. And the function of soft connectivity from the WGCNA package was employed to select the softthreshold power β. With a low power (<30) scale-free Topology Fit Index (TFI) of 0.9 or more, the topology of 10 Disease Markers the gene coexpression network was considered scale-free and there were no batch effects. Thus, the power β = 8 was chosen [34]. After that, the Topological Overlap Measure (TOM) was adopted to detect network modules [35]. The minimum module size was fixed at 30, and the other parameters were fixed at their default values. At the same time, the first principal component of a given module was measured to calculate the module eigengene to represent each module. Different modules were indicated by different colors. The gray module was used to indicate the group of genes that was not categorized into any modules. Subsequently, Module-Trait Relationships (MTRs) were applied to establish the vital relationship among module eigengenes and TNBC subtypes categorized in the GSE76124 database. Gene Significance (GS) was calculated to identify the relevance of traits and genes. Module Membership (MM) was evaluated to confirm the relevance of the expression profile and every module eigengene. At last, the genes with high GS and significant MM were identified in the TNBC subtype.

Differentially Expressed Gene
Screening. The DEGs between the AR-positive subtype and the other 3 subtypes of TNBC tissues were identified by use of the R package limma. The DEGs were defined as the gene that met the cut-off criteria of jlog 2fold change ðFCÞj > 1 and P value < 0.05. Afterward, Venn diagrams were adopted to select the overlapped DEGs. Eventually, the final overlapping DEGs were selected from the intersection of the WGCNAidentified module genes (purple module genes) and aforementioned common DEGs for subsequent function analysis.

Gene Functional Annotation Analysis.
The final DEGs were selected to conduct a functional enrichment analysis. The online database DAVID (https://david.ncifcrf.gov/) was employed to perform the GO and KEGG pathway enrichment analyses. Three categories were included in GO analysis, cellular component (CC), biological process (BP), and molecular functions (MF). Various pathway information of the genes were contained in KEGG analysis [36].

Protein-Protein Interaction (PPI) Analysis.
The PPI network for final overlapping DEGs was established by the STRING database with a combined interaction score > 0:4 and visualized via Cytoscape software (version 3.8.2). Next, the cytoHubba was employed to select the hub genes according to the network. The first 10 genes selected with the MCC method were defined as core genes [37,38].

Survival Analysis and Validation of the Hub Genes.
The prognostic value of the identified hub genes in TNBC was assessed by the online database KM plotter that included the gene expression profile and corresponding prognostic information of patients from the TCGA and GEO databases. In the current study, the parameters were arranged as follows: (1) the negative expression of ER, PR, and HER-2; (2) only JetSet best probe set. Besides, the GEPIA platform was employed to confirm the mRNA expression levels of the core genes in tumor and normal breast tissues (jlog 2FCj cut − off > 1 and P value cut-off < 0.01) [39]. Another dataset from the GEO database, GSE167213, was used as the validation group. The expression of hub genes was calculated between AR-positive TNBC samples and other subtypes of TNBC samples. 59 5 TargetScan ENCOR 3 Figure 7: Validation of the overlapped miRNAs in the two databases (ENCORI and TargetScan) visualized by Venn diagram. A total of 8 miRNAs regulating TFF1 were predicted by ENCORI and 64 miRNAs were predicted by TargetScan. 5 common miRNAs regulating TFF1 were identified via the Venn diagrams.

Survival
Analysis and Validation of the Candidate miRNAs. The relevance of TFF1 and its candidate miRNAs was confirmed by ENCORI. Meanwhile, the selected miRNA expression levels were compared among tumor and normal breast tissues. TFF1 was found to have significant overexpression in AR-positive TNBC, which was associated with the unfavorable prognosis. Hence, the miRNAs moderating TFF1 were hypothesized to be related to better prognosis of TNBC. Furthermore, the prognostic correlation of the selected miRNAs was evaluated by the KM plotter (the paraments were set as TCGA, TNBC, and OS).
2.9. The Interaction of Drug-Hub Gene. DGIdb (http://www .dgidb.org/search_interactions; version 3.0.2) was adopted to select the drugs on the basis of the core genes served as potential therapeutic targets. The interaction network of the hub genes and possible drugs was created by means of the Cytoscape software [41].

Establishment of the Coexpression Module and
Identification of the Core Module in TNBC. A weighted coexpression network was built through R package WGCNA. The 198 samples of the GSE76124 database were clustered to filter outliers for follow-up study, and 2 outlier samples (GSM1974605 and GSM1974616) were removed by setting the height line at 150; then, the new cluster was proposed and a characteristic heat map was exhibited based on the subtypes of TNBC (Figure 1(a)). Subsequently, the power of β = 8 (scale-free R2 = 0:90) was selected as the softthresholding parameter to make sure of a scale-free network (Figure 1(b)). From 23,519 genes, 24 modules were distinguished and every one was represented with an individual color in the hierarchical clustering dendrogram (Figure 1(c)). The module-trait association was evaluated by the relevance between the module eigengene and the clinical characteristics including TNBC subtypes. Interestingly, the positive correlation between the purple module (containing 227 genes) and the AR-positive TNBC was indicated with the P < 0:05 (correlation coefficient = 0:87, P < 0:01) (Figure 1(e)). Subsequently, the module eigengene dendrogram and heat map indicated that the purple module was positively related to AR-positive TNBC (Figure 1(d)). After that, the scatterplot of GS vs. MM was drawn based on the coexpression purple module (Figure 1(f)). Consequently, the purple module was selected as the candidate module for further analysis.  Figure 2). Afterwards, Venn diagrams were used to identify the overlapped DEGs. As a result, 201 overlapped DEGs were discovered, in which 64 were downregulated (P < 0:05, log 2FC < −1) and 137 were upregulated (P < 0:05, log 2FC > 1) (Figures 3(a) and 3(b)). Eventually, 88 DEGs were selected from the intersection of the purple module genes identified by WGCNA and aforementioned common DEGs (Figure 3(c)).  13 Disease Markers to find the potential biological functions. As shown in Figure 4(a) (P value < 0.05), biological processes involved in DEGs are positive regulation of apoptotic cell clearance, cellular response to tumor necrosis factor, regulation of complement activation, detection of molecule of bacterial origin, regulation of intracellular transport, and lung goblet cell differentiation. Cellular components of DEGs are integral component of plasma membrane, extracellular exosome, axon, dendrite, apical plasma membrane, and other organism cells. Molecular functions involved in DEGs are carbohydrate binding, epidermal growth factor receptor binding, endopeptidase inhibitor activity, estrogen response element binding, complement binding, and dystroglycan binding. KEGG pathway enrichment analysis was applied to the 88 DEGs, indicating that DEGs was mostly enriched in the metabolic pathways.

Establishment of PPI Network and Identification of Hub
Genes. The PPI network for the 88 overlapping DEGs was established by STRING and displayed in Cytoscape, containing 81 nodes and 63 edges. Subsequently, the topological analysis methods were employed to pick out hub genes, and the top 10 genes were identified by the MCC method. As a result, TFF1, FOXA1, ESR1, AGR2, TFF3, AGR3, GATA3, XBP1, SPDEF, and TOX3 were selected, and they were all upregulated (Figure 4(b)).

Expression and Survival
Analysis for Hub Genes. GEPIA was employed to detect the expression of 10 hub genes (the paraments were set as jlog 2FCj cut − off value = 1 and P value cut-off value = 0.01). The outcomes manifested that the 10 core gene expression levels were statistically higher in tumor tissues than in normal breast tissues ( Figure 5). Besides, the survival analysis for the aforementioned 10 hub genes were performed via KM plotter, indicating that only TFF1 was related to significantly poorer survival outcome (P < 0:05) in TNBC ( Figure 6). Therefore, TFF1 was marked as the key hub gene. In the validation group, the GSE167213 dataset, the expression of TFF1 was calculated between AR-positive and other subtypes of TNBC samples. As shown in Supplemental Figure 1, the expression level of TFF1 in AR-positive TNBC was significantly higher than that in other subtypes.

Candidate miRNA Survival Analysis and Expression
Analysis. The KM plotter was applied to the 4 candidate miRNAs for survival analysis. hsa-miR-520g-3p, hsa-miR-520h, and hsa-miR-2278 were statistically related to poor prognostic outcomes of TNBC (P < 0:05). Although hsa-miR-187-3p was correlated with the poor prognosis, it was excluded without statistical significance (Figure 8). Eventually, ENCORI pan-cancer analysis was conducted to indicate the differences of hsa-miR-520g-3p, hsa-miR-520h, and hsa-miR-2278 expression between tumor and normal breast tissues. Figures 9(a)-9(c) illustrate that hsa-miR-520g-3p and hsa-miR-520h were significantly downregulated in breast cancer samples. There was no noticeable difference in the expression of has-miR-2278 in breast cancer and normal samples.
3.8. The Interaction of Drug-Gene Network. Two potential drugs for AR-positive TNBC patients were suggested by drug-gene interactions. In the current study, based on the 14 Disease Markers significant outcomes of survival analysis, FTT1 was selected as the hub gene, meanwhile AFIMOXIFENE (4-4-hydroxytamoxifen) and raloxifene were identified as the potential targeted drugs. However, only raloxifene was approved by the FDA. The drug-gene network was visualized by Cytoscape (Figure 9(d)).

Discussion
As a malignant disease whose pathogenesis is not fully understood, breast cancer is highly heterogeneous in terms of patient prognosis and tumor genetics. TNBC is more aggressive than other subtypes of BC, and patients suffering from TNBC showed a higher mortality rate [11,42]. The heterogeneous nature of TNBC makes the treatment of tumors more challenging. It is essential to understand the regulatory mechanisms behind the development of TNBC so as to enhance the therapeutic response of tumors. In some studies, a subgroup of TNBC had been established with AR expression, finding that AR was expressed in 15% to 35% of all TNBC, indicating that AR is able to be a possible target of TNBC treatment. Observations from these studies also revealed the vital role of AR in promoting the migration and invasion of TNBC cells. Actually, AR is able to perform multiple roles in BC progression and serve as an effective target for the management of AR-positive TNBC patients in the clinical setting [12,21,[43][44][45][46]. As a new tool that is based on complex algorithms, WGCNA for network modelling enables the identification of multiple biological associations of biological networks with their phenotypes. Recently, WGCNA was employed in several studies of refractory diseases to further clarify the regulatory mechanisms, including Alzheimer's disease [47], familial combined hyperlipidemia [48], and BC [29]. In the current study, WGCNA methods and DEG analysis were applied to detect the differences between AR-positive TNBC and non-AR-positive TNBC samples, respectively.
The results of the WGCNA analysis identified critical modules of clinical significance and were screened for purple modules by conservation assessment. Subsequently, the overlapped genes of DEGs and the purple module were selected for further study.
After that, GO and KEGG analyses were applied to study the chiefly relevant biological pathway of the intersection genes, and a PPI network was created. GO analysis suggested that the intersection genes principally participated in such pathways, including positive regulation of apoptotic cell clearance, cellular response to tumor necrosis factor, regulation of complement activation, detection of molecule of bacterial origin, integral component of plasma membrane, extracellular exosome, carbohydrate binding, epidermal growth factor receptor binding, endopeptidase inhibitor activity, and estrogen response element binding. Several biological pathways have been confirmed in previous studies [49][50][51]. KEGG analysis suggested that metabolic pathways were markedly enriched. Interestingly, the study of TNBC conducted by Jia et al. also proposed vital enrichment pathways of cellular senescence [52].
Finally, TFF1, FOXA1, ESR1, AGR2, TFF3, AGR3, GATA3, XBP1, SPDEF, and TOX3 were selected as hub genes; these were all upregulated. The survival analysis for the aforementioned 10 genes was performed via KM plotter, indicating that only TFF1 was related to statistically poorer survival in TNBC. ENCORI and TargetScan were adopted to identify the candidate miRNA of TFF1. According to survival and expression analyses, hsa-miR-520g-3p and hsa-miR-520h were selected as the candidate miRNAs of TFF1 (P < 0:05).
Mammalian trefoil factors consisted of 3 stable secretory proteins, TFF1, TFF2 and TFF3, which were coexpressed together with mucins through the epithelial cells of the gastrointestinal tract [53]. TFF1 belongs to the trefoil factor family, that is, a classic secreted peptide released from gastric surface mucous cells. 60 amino acid residues made up human TFF1, including 7 cysteine residues [54]. However, TFF1, TFF2, and TFF3 were initially recognized as estrogen-responsive gene products in BC cells [55]. The study conducted by Yi et al. demonstrated that TFF1 expression was much lower in TNBC and positively correlated with breast cancer survival. Moreover, they found that serum concentrations of TFF1 were lower in TNBC sufferers compared to non-TNBC sufferers, which correlated with the clinical features of BC sufferers, for instance, ER, PR, and HER2 status [56], whereas another study of BC reported that TFF1 was positively related to Circ-TFF1, and both of them were upregulated. In vitro, knockdown of Circ-TFF1 blocked BC cell growth, invasion, migration, and EMT while in vivo limiting tumor proliferation [57]. In addition, TFF1 was also considered as a biomarker of metastatic colon carcinoma [58]. Moreover, some studies indicated that TFF1 played an important role in the interacting of H. pylori and epithelial cells and related to gastric cancer [54,59].
Although several studies manifested that TFF1 was related to different kinds of carcinoma, few studies on TFF1 in AR-positive TNBC have been reported.
Afimoxifene (4-hydroxytamoxifen) and raloxifene were selected as the potential drugs, but only raloxifene was approved by FDA. Meanwhile, the mechanisms of these 2 drugs were still unknown in AR-positive TNBC. Afimoxifene (4-hydroxytamoxifen, tradename TamoGel) was a novel estrogen inhibitor being investigated for various estrogen-dependent conditions, such as gynecomastia and cyclic breast pain. A previous study about estrogen response element (ERE) indicated that across a wide range of 4hydroxytamoxifen (OHT) concentrations, OHT-hERα was closely related to the pS2 ERE and weakly to the PI-9 ERU [60]. Raloxifene was an oral selective estrogen receptor modulator (SERM) with estrogenic effects on the bones and antiestrogenic effects on the uterus and mammary gland. The observations of some studies demonstrated that the risk of invasive BC was decreased among postmenopausal women with osteoporosis during treatment of raloxifene [61]. Meanwhile, raloxifene was reported to be associated with vascular relaxing properties and treatment of postmenopausal women with schizophrenia [62,63]. Wu et al. found that tamoxifen was associated with the induction of autophagy in TNBC cells, which was related to the endoplasmic 15 Disease Markers reticulum stress and AMPK/mTOR [64]. The TNBC mouse models were used by Taurin and colleagues to evaluate the therapeutic value of raloxifene, suggesting that raloxifene (0.85 mg/kg) prevented tumor proliferation and led to tumor regression. Moreover, raloxifene was reported to promote EGFR translocation into endosomes in vitro, thereby reducing cell migration, invasiveness, and tumorigenicity [65]. Besides, another study in SERMs indicated that tamoxifen inhibited cell migration and enhanced chemosensitivity of mesenchymal TNBC cells by reversing their EMT-like property [66]. Nevertheless, to our knowledge, there are few studies in AR-positive TNBC. Further experiment should be conducted to explore the mechanisms of the candidate drugs in AR-positive TNBC.
According to the features of hub genes in terms of expression, biological function, signaling pathway, and previous associated studies, we considered that TFF1, hsa-miR-520g-3p, and hsa-miR-520h were likely to play vital roles in AR-positive TNBC and could be considered as potential biomarkers. Nevertheless, several limitations of our work should be noticed. First of all, the shared sources of data from the GEO and TCGA databases were only analyzed through a series of bioinformatics methods and no in vivo or in vitro experiments were performed. Secondly, this research only initially revealed the expression levels of TFF1 and hsa-miR-520g-3p and hsa-miR-520h in ARpositive TNBC but rarely addressed the signaling pathways and functional mechanisms. We only revealed the modulation relationship between them without information of details and regulation mechanisms. Hence, more prospective research is needed to validate the value of TFF1, hsa-miR-520g-3p, and hsa-miR-520h in AR-positive TNBC, and their relationships should be further investigated by wet assays.

Conclusion
In summary, our study focused on AR-positive relevant genes in TNBC. The vital gene modules and candidate genes related to AR-positive TNBC were identified by WGCNA and other bioinformatics methods. This study suggested that hsa-miR-520g-3p, hsa-miR-520h, and TFF1 could have remarkably potential diagnostic and prognostic values in AR-positive TNBC. TFF1, hsa-miR-520g-3p, and hsa-miR-520h are able to be novel therapeutic targets. Our findings offer further insights into the therapy of AR-positive TNBC. In the future, deeper molecular mechanism studies of novel core genes in AR-positive TNBC are required, and associated experimental models based on core genes should be established for early detection, risk estimation, prognosis determination, and targeted treatment of AR-positive TNBC.

Data Availability
The RNA-seq data and corresponding clinical information used to support the findings of this study are available on the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Disease Markers