Identification of Novel Diagnostic Biomarkers in Prostate Adenocarcinoma Based on the Stromal-Immune Score and Analysis of the WGCNA and ceRNA Network

Prostate cancer is still a significant global health burden in the coming decade. Novel biomarkers for detection and prognosis are needed to improve the survival of distant and advanced stage prostate cancer patients. The tumor microenvironment is an important driving factor for tumor biological functions. To investigate RNA prognostic biomarkers for prostate cancer in the tumor microenvironment, we obtained relevant data from The Cancer Genome Atlas (TCGA) database. We used the bioinformatics tools Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) algorithm and weighted coexpression network analysis (WGCNA) to construct tumor microenvironment stromal-immune score-based competitive endogenous RNA (ceRNA) networks. Then, the Cox regression model was performed to screen RNAs associated with prostate cancer survival. The differentially expressed gene profile in tumor stroma was significantly enriched in microenvironment functions, like immune response, cancer-related pathways, and cell adhesion-related pathways. Based on these differentially expressed genes, we constructed three ceRNA networks with 152 RNAs associated with the prostate cancer tumor microenvironment. Cox regression analysis screened 31 RNAs as the potential prognostic biomarkers for prostate cancer. The most interesting 8 prognostic biomarkers for prostate cancer included lncRNA LINC01082, miRNA hsa-miR-133a-3p, and genes TTLL12, PTGDS, GAS6, CYP27A1, PKP3, and ZG16B. In this systematic study for ceRNA networks in the tumor environment, we screened out potential biomarkers to predict prognosis for prostate cancer. Our findings might apply a valuable tool to improve prostate cancer clinical management and the new target for mechanism study and therapy.


Introduction
Prostate cancer (PRAD) is one of the most common types of cancers and the third leading cause of death from cancer in men worldwide [1,2]. The number of new prostate cancer cases globally in 2030 is estimated to be 1.7 million [2]. Prostate cancer is still a significant global health burden in the coming decade. Surgery, chemotherapy, radiation therapy, hormone therapy, and immunotherapy such as Sipuleucel-T and PD-1/PD-L1 immune checkpoint inhibitors are the available treatment for prostate cancer patients nowadays. Although the five-year survival is optimistic for most of the early stage, the survival for a distant and advanced stage of prostate cancer still needs improvement [3]. The therapeutic resistant and metastatic progression of cancer cells was significantly associated with their surrounding tumor microenvironment [4,5]. The tumor microenvironment consists of cancer cells, stromal cells (including endothelial cells, fibroblasts, and various types of immune cells) and the extracellular matrix (ECM) produced by stromal cells [5]. The crosstalk between cancer cells, stromal cells, and immune cells is the major driver for the biological functions of tumors. Prostate cancer is the solid malignancy having a highly immunosuppressive microenvironment [6,7]. Therefore, the molecular events associated with the dynamic regulation in tumor microenvironment are valuable for screening prognosis biomarkers for prostate cancer.
Competitive endogenous RNA (ceRNA) means the RNA in the complex transcriptional regulation network, including messenger RNAs (mRNA), transcribed pseudogenes, circular RNAs, and long-chain noncoding RNAs (lncRNA). The ceRNA hypothesis suggested that the mutual regulation of RNAs is achieved by competing for microRNA (miRNA) response elements (MRE) of miRNAs [8]. lncRNAs can bind to miRNAs by MRE and prevent the regulatory function of miRNA in mRNA. ceRNA/miRNA axis posttranscriptional regulation is one of the current hot topics in cancer research. The crosstalk in the ceRNA network regulates essential biological processes in cancer, indicating the possibilities of ceRNAs as diagnostic and prognosis biomarkers for cancers [9,10].
Weighted gene co-expression network analysis (WGNA) is an analysis method for exploring co-expressed gene modules, which can discover the relationship between gene networks and phenotypes of interest, and focus on core genes in the networks [11,12]. WGCNA can identify highly related genes and cluster them into the same module and provide clinical characteristics of related modules [13], which is very helpful in identifying the candidate biomarkers and a commonly used method in tumor-related research including liver hepatocellular carcinoma [14], breast cancer [15], and lung cancer [16]. However, there is no report yet on the application of WGCNA to find the biomarkers of PRAD.
Our study was aimed at investigating microenvironmentrelated prognostic biomarkers for prostate cancer targeting the ceRNA network using The Cancer Genome Atlas (TCGA) database. We employed the bioinformatics tools Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) [17] and WGCNA [11] to construct a ceRNA network based on the stromalimmune score and screen prognosis lncRNA, mRNA, and miRNA biomarkers for prostate cancer.

Materials and Methods
2.1. Data Acquisition. The genetic test data (RNA-seqv2) were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov, accessed on 15 May 2020) and organized into standardized raw data for subsequent analysis. We excluded samples diagnosed with other cancers and prostate cancer samples missing any lncRNA, miRNA, or mRNA data. Finally, 462 prostate cancer cases were included for further analysis.

Tumor Stromal-Related Differentially Expressed Gene
Profile Based on the ESTIMATE Algorithm. ESTIMATE can predict tumor purity and tumor microenvironment (whether it is infiltrated by stromal cells and immune cells) using gene expression data [17]. Briefly, ESTIMATE generates three scores: (1) the stromal score (SS) reflects the level of stroma content in tumor tissue, (2) the immune score (IS) represents the infiltration of immune cells in tumor tissue, and (3) the estimate score infers tumor purity. SS and IS for each of the prostate cancer samples were derived by the R language "estimate" package. Samples are divided into 4 groups according to SI and SS, and the median value is the cut-off value. The differentially expressed mRNA, lncRNA, and miRNA in the SS-high group compared to the SS-low group and the IS-high group compared to the IS-low group were analyzed. P value < 0.05, false discovery rate < 0:05, and log2 fold change > 1:2 were considered to indicate significance.
2.3. WGCNA. The "WGCNA" package in the R language was used for WGCNA analysis, as described previously [11]. We constructed the lncRNA/mRNA coexpression network and miRNA coexpression network separately. We firstly got Pearson's correlation matrices cor ði, jÞ to indicate the correlation of genes and then calculated the weighted adjacency matrix as follows: a ij = (0.5 × (1 + cor (i, j))β. a ij refers to the correlation between genes i and j. β as a soft thresholding parameter strengthens the strong correlation while weakening the weak correlation and the negative correlation, making the correlation value more in line with the characteristics of the scale-free network, and has more biological significance [18]. Based on these β values, we constructed a topological overlap matrix (TOM) and a hierarchical clustering tree between genes for module detection.
Then key coexpression modules were further screened by the number of coexpressed genes, the clinical characteristic correlation analysis, and biological function analysis. It should be noted that in our study, we performed WGCNA preanalysis on the data obtained by SS and IS and finally select the SS group as the data for subsequent analysis.

Correlation Analysis between Coexpression Modules and
Clinical Characteristics. Module eigengene (ME) was the first principal component of a given module, which can represent the gene expression profile of the entire module [11,19]. After we got MEs, the Pearson correlation test was performed to analyze the association between the coexpression gene module and the clinical characteristics. The clinical characteristics included in the analysis have age, Gleason score, stroma score, T stage, N stage, tumor grade, and survival. A significant correlation was considered when P value was < 0.05.

Potential Molecular Mechanism and Pathway Analysis.
To investigate the molecular mechanisms, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genome (KEGG) functional enrichment analysis, applying the Visualization and Integrated Discovery (DAVID). We got multiple results, using the criterion of P < 0:05 to get the target pathways.
2.6. ceRNA Network Construction. We selected three lncRNA/mRNA modules (turquoise, blue, and brown) and two miRNA modules (turquoise and blue) to construct a We carried out RFS survival analysis on the mRNA, lncRNA, and miRNA in the module we selected and found mRNA, which has a significant impact on the survival of RFS lncRNA and miRNA. The P value set in the analysis results is 0.05.

Differentially Expressed Gene Profile in Tumor Stroma.
A total of 462 tumor samples were used for further analysis, which we get from TCGA database. As shown in Figure 1, we found 781 mRNA, 237 lncRNA, and 60 miRNA differentially expressed in the IS-high group compared to the IS-low group and 765 mRNA, 207 lncRNA, and 116 miRNA differentially expressed in the SS-high group compared to the SSlow group. The significant functions of the GO and KEGG signaling pathway were screened ( Table 1). The differentially expressed genes from comparison of both IS and SS were significantly enriched in immune response. The metabolism pathway, PI3k-AKT signaling pathway, cancer-related pathways, and cell adhesion-related pathways were the top pathways in differentially expressed genes from comparison of both IS and SS.

Construction of the Weighted Coexpression Network.
After the above analysis, the differentially expressed genes of IS and SS were obtained, and then, WGCNA preanalysis was performed, respectively. However, no meaningful results were obtained in the IS group and the subsequent analysis could not be performed, so we selected the SS group as the research data. We chose the 765 mRNAs, 207 lncRNAs, and 116 miR-NAs differentially expressed in the SS-high group for coexpression network construction. In order to make the connections between genes in the network obey the scalefree network distribution while taking into account the average connectivity, the β values we finally chose in the coexpression network analysis of lncRNAs/mRNAs (Figure 2(a)) and miRNAs (Figure 2(b)) were 4 and 9, respectively. Next, the hierarchical clustering tree was obtained through the correlation coefficient between genes (Figures 2(c) and 2(d)).
Then, using Pearson's correlation test analysis method, the correlation between gene modules and clinical phenotypes was calculated, and trait-related modules were identified. According to Figures 2(e) and 2(f), the turquoise miRNA module was significantly associated with Gleason score, tumor clinical stage, and survival; the blue lncRNA and mRNA module was associated with N stage, and the brown lncRNA and mRNA module was associated with Gleason score, N stage, and tumor clinical stage.
Regarding GO functional analysis, we noticed that the significant functions in the turquoise module were signal transduction, cell adhesion, cell shape regulation, and several stromal cell-related functions. For the blue module, mRNAs were significantly enriched in signal transduction, oxidativereduction process, cell differentiation, and gene expression regulation. The related functions of mRNA in the brown module are mainly related to several metabolic-related processes (Table 2). KEGG pathway analysis showed that the mRNAs in the turquoise module were mainly related to focal adhesion, pathway in cancer, cGMP-PKG, cAMP, and MAPK signaling pathway. In the blue module, the mRNAs were associated with metabolic pathway, tight junction, and focal adhesion. In the brown module, the mRNAs were associated with metabolic pathways and PI3K-Akt signaling pathway.
Combined with the number of differential expressions of lncRNA and miRNA, the correlation between module and traits, the biological function, and signaling pathway analysis, we chose turquoise, blue, and brown lncRNA and mRNA modules and turquoise and blue miRNA modules as the key modules for the next ceRNA network analysis.
3.3. Module-ceRNA Analysis. Based on the key modules we identified above, we have three combinations for ceRNA network analysis: group 1: turquoise mRNA and lncRNA module and turquoise miRNA module, group 2: blue mRNA and lncRNA module and turquoise miRNA module, and group 3: brown mRNA and lncRNA module and blue miRNA module. Then, we used the predictive miRNA-mRNA and miRNA-lncRNA pair to build the internally competitive ceRNA network ( Figure 3). The network for group 1 includes 45 mRNAs, 18 lncRNAs, and 24 miRNAs; that for group 2 includes 16 mRNAs, 15 lncRNAs, and 12 miRNAs; and that for group 3 had 9 mRNAs, 6 lncRNAs, and 7 miRNAs.

Discussions
The importance of having tumor microenvironment factors to predict the therapy and prognosis has been strengthened due to their essential role in tumor development. This study investigated prostate cancer prognosis biomarkers based on stromal-immune score-based ceRNA network in the tumor microenvironment using bioinformatics tools ESTIMATE and WCGA. Finally, we screened out a panel of 8 RNAs as the potential prognosis biomarkers for prostate cancer. As far as we know, this is the first study systematically that investigated the ceRNA network in the tumor environment in prostate cancer.
Most of the tumor microenvironment and immune scores are based on the Immunohistochemistry (IHC) and hematoxylin-eosin (HE) staining from Formalin-Fixed and Paraffin-Embedded (FFPE) tissue slides, such as microenvironment cell population counter [19], Glasgow microenvironment score [20], and tumor microenvironment of metastasis score [21]. Generally, TCGA samples are not appropriate to evaluate the microenvironment factors. However, due to the development of the bioinformatics tool ESTIMATE, we can empirically quantitate the stromal and immune cells in tumor samples using gene expression data from whole tumor tissue [17]. The ESTIMATE method enormously expands the database used for microenvironment biomarker screening. In addition, the immune score and stromal score combined with their genomic fingerprint can be used to identify tumor microenvironment stromal cells and characterize cancer immunologic landscape [22]. In our study, the ESTIMATE method was used to evaluate differently expressed genes in prostate cancer microenvironment. The enriched biological function and pathways, immune response, PI3k-AKT, cancer pathway, and cell adhesion-related pathway found in differentially expressed genes from ESTIMATE analysis are significant roles in tumor microenvironment. These findings supported our next ceRNA network construction, and prognosis biomarker screening was based on the prostate cancer tumor microenvironment.
A number of studies have shown that the ceRNA network is related to the occurrence and development of prostate cancer. For example, NEAT1 can regulate the epigenetics of target gene promoters to play the role of oncogenes, by increasing ACSL4 via sponging miR-34a-5p and miR-204-5p, or HMGA2 via sponging miR-98-5p, and was found significantly associated with prostate cancer prognosis [23,24]. PCAT1 promotes prostate cancer proliferation through c-MYC via sponging miR-3667-3p and FSCN1 via sponging miR-145-5p [25]. To conduct our research, the lncRNA-miRNA-mRNA axis was systematically screened using WGCNA bioinformatics tool. WGCNA has the advantage of finding coexpressed gene modules and probing the relationship between each element and clinical characteristics. WGCNA is valuable for investigating candidate biomarkers and has been widely used in several cancers [14].
We constructed three ceRNA modules, including 153 RNAs associated with the prostate cancer tumor microenvironment. From them, we screened out 31 RNAs significantly associated with RFS survival. lncRNA LINC01082, miRNA hsa-miR-133a-3p, and genes TTLL12, PTGDS, GAS6,   Disease Markers after radical prostatectomy, hsa-miR-133a-3p was found as a new prognostic biomarker [28]. We found that miR-133a-3p constructed a network with miR-133b, lncRNA RP11-44B10.1, and gene QPCTL and NME4. TTLL12, a serum autoantibody, was overexpressed in prostate cancer patients and regulate cytoskeleton, tubulin modification, and chromosome number stability in prostate cancer [29]. PTGDS was downexpressed and has the potential to predict  biochemical relapse in prostate cancer [30]. Its biomarker potential for prostate cancer was also found in a proteomic analysis [31]. GAS6 was one of the genes in an early-stage prostate cancer diagnosis model [32]. GAS6 can promote prostate cancer survival by cell cycle arrest and apoptosis inhibition [33]. CYP27A1 was one of the vitamin D pathway genes. It has a certain potential for predicting the prognosis of PRAD patients [34] and shows some effects on prostate cancer chemoprevention based on vitamin D metabolism. The transcription level of CYP27A1 is positively correlated with disease-free survival and negatively correlated with tumor grade [35]. PKP3 is related to the carcinogenicity and aggressiveness of prostate cancer [36]. This result is consistent with our findings that high expression of PKP3 was associated with a worse prognosis. PKP3 plays some roles in the tumor microenvironment, such as regulating cell invasion and tumor formation via MMP7 proteins [37] and regulating adhering junctions and mesenchymalepithelial transitions by interaction with desmoglein and desmocollin [38]. ZG16B can regulate the Wnt/β-catenin pathway and enhance the immunosuppressive activity of myeloid-derived suppressor cells in the tumor microenvironment [39]. Moreover, ZG16B was found as a potential predictor of prostate cancer biochemical recurrence [40]. The above eight lncRNAs, microRNAs, and genes have the biological function in PRAD development and its microenvironment. Our research showed that they have the potential to predict prostate cancer diagnosis and prognosis of PRAD. Regarding other RNAs, their research in prostate cancer is limited, which provides new research ideas and directions for the carcinogenesis and prognosis of prostate cancer.

Conclusions
We constructed ceRNA networks in the prostate cancer microenvironment and identified lncRNA LINC01082, miRNA hsa-miR-133a-3p, and genes TTLL12, PTGDS, GAS6, CYP27A1, PKP3, and ZG16B as the potential biomarkers to predict prognosis for prostate cancer. Our findings might apply a valuable tool to improve prostate cancer clinical management and the new target for mechanism study and therapy.

Data Availability
The datasets generated and/or analyzed during the current study are available in the following databases:

Conflicts of Interest
The authors declare that they have no competing interests.

Authors' Contributions
Tengfei Zhang and Yaxuan Wang have contributed equally to this work, and they are co-first authors of this paper.