Identification of the Key Genes Involved in the Tumorigenesis and Prognosis of Prostate Cancer

Background. Prostate cancer (PCa) is a malignant tumor in males, with a majority of the cases advancing to metastatic castration resistance. Metastasis is the leading cause of mortality in PCa. Traditional early detection and prediction approaches cannot differentiate between the stages of PCa. New biomarkers are therefore needed for early detection and clear differentiation of PCa stages so that therapeutic intervention can be precise. Methods. Differentially expressed genes (DEGs) were identified by combining three GEO datasets with the TCGA-PRAD dataset. Weighted gene coexpression network analysis (WGCNA), using the TCGA-PRAD clinical feature data, identified the gene module most correlated with PCa clinical features. The genes of this module were rescreened with bioinformatics tools, yielding three hub genes (TOP2A, NCAPG, and BUB1B) for verification. Finally, internal (TCGA) and external (GSE32571, GSE70770) validation datasets were used to validate the final hub genes and assess their predictive value. Results. The hub genes were abnormally upregulated in PCa samples during verification. The expression of each gene was positively correlated with the Gleason score and the T and N stages in clinical samples but negatively correlated with overall survival. The expression of these genes was linked to naive CD8 T cells and macrophages, among other cells. Among antitumor immune cells, NK and NKT cell infiltration correlated positively and negatively with hub gene expression, respectively. GSVA and GSEA indicated that the hub genes are connected with cell proliferation, cell death, and androgen receptor signaling, among other pathways. These genes could therefore influence the incidence and progression of PCa by participating in or modulating various signaling pathways. Furthermore, using the CMap online tool, we screened candidate drugs for the hub genes and determined that tipifarnib could be useful for the clinical therapy of PCa. Conclusion. TOP2A, NCAPG, and BUB1B are important genes closely linked to the clinical prognosis of PCa and can be employed as biomarkers for early diagnosis and prognosis. Moreover, these genes can provide a theoretical basis for the precise differentiation and treatment of PCa.


Introduction
PCa is one of the most frequent cancers in males and, according to the 2020 global cancer statistics, was the most prevalent male malignant tumor in 112 nations. Moreover, its fatality rate is second only to that of lung cancer [1]. The Gleason grading system [2] is used to determine the aggressiveness of prostate cancer and to judge the prognosis of patients [3,4]. Prostate-specific antigen (PSA) testing and needle biopsy are the most common early screening methods. PSA can detect well-differentiated prostate cancer, but it distinguishes poorly between poorly differentiated and advanced prostate cancer. In addition, PSA is also elevated in benign prostatic hyperplasia and prostatitis. This gray zone creates a challenge in prostate cancer diagnosis.
The increase or decrease in prostate cancer mortality due to PSA screening does not accurately represent the survival rate of patients [5]. Furthermore, the Gleason score is subjective and imprecise [6], and the biopsy scores of different pathologists could vary by 30-50% [7]. Neither PSA nor the Gleason score has therefore provided accurate differentiation or strong predictive performance. Finding early diagnostic and prognostic biomarkers, precisely distinguishing the various stages of PCa, and determining the appropriate therapy for PCa are thus critical.
With the fast development and deployment of gene chips and second-generation sequencing technologies, the combined analysis of multiple datasets from numerous platforms with substantial sample sizes has shown advantages in screening tumor markers.
The current work used GEO and TCGA gene expression datasets to filter DEGs and used WGCNA, the Cytoscape software, and the cytoHubba plug-in with the MCC method to identify BUB1B, NCAPG, and TOP2A as hub genes. In addition, online tools were used to study the potential biological functions of the genes through gene set enrichment analysis (GSEA), gene set variation analysis (GSVA), and tumor immune infiltration analysis (GSCA). The cancer genomics database cBioPortal was used to study genetic changes in the genes, and the pharmacogenomics database CMap was used to screen prostate cancer-related small-molecule drugs.

PCa Gene Expression Dataset and Related Clinical Information. The GSE38241, GSE3325, and GSE46602 microarray datasets (Supplementary Table 1), which contain tumor and normal prostate samples, were downloaded from the NCBI GEO database. In addition, the TCGA-PRAD RNA-seq count data and clinical information, covering 498 tumor and 52 normal prostate samples, were downloaded from the TCGA database.
The differentially expressed genes from the four datasets were intersected and visualized using a Venn diagram.
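The intersection step can be illustrated with a minimal Python sketch. The gene lists below are hypothetical toy values, not the study's DEG lists, and the helper name `common_degs` is introduced here only for illustration.

```python
def common_degs(*deg_lists):
    """Return the genes present in every DEG list, sorted for stable output."""
    gene_sets = [set(genes) for genes in deg_lists]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Hypothetical toy DEG lists (assumption: NOT the real per-dataset results).
toy_degs = {
    "GSE38241": ["TOP2A", "NCAPG", "BUB1B", "GENE_A"],
    "GSE3325": ["TOP2A", "NCAPG", "BUB1B", "GENE_B"],
    "GSE46602": ["TOP2A", "BUB1B", "NCAPG", "GENE_A"],
    "TCGA-PRAD": ["BUB1B", "TOP2A", "NCAPG", "GENE_C"],
}

shared = common_degs(*toy_degs.values())
print(shared)
```

Genes that appear in only some of the lists (here the placeholder names GENE_A, GENE_B, and GENE_C) drop out, which is what the overlapping region of a Venn diagram shows.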

WGCNA Analyzes Candidate DEGs and Identifies Vital Modules. WGCNA was used to establish the relationships between distinct sample groups, gene modules, and genes having similar expression patterns.
The analysis used the TCGA-PRAD dataset. The WGCNA online analysis tool (http://sangerbox.com/) examined the candidate DEGs to find hub genes in the modules associated with clinical features. The soft-threshold power was set at 12 (scale-free R² = 0.85), and the minimum module size was set to three.
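To show what a soft-threshold power does, the sketch below computes a WGCNA-style adjacency by raising the absolute Pearson correlation between two genes to the power 12. It assumes an unsigned network, since the text does not state which network type the online tool used, and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=12):
    """Unsigned adjacency |cor|**beta for every ordered pair of distinct genes."""
    genes = list(expr)
    return {
        (gi, gj): abs(pearson(expr[gi], expr[gj])) ** beta
        for gi in genes
        for gj in genes
        if gi != gj
    }


# Hypothetical toy expression profiles across five samples (assumption).
toy_expr = {
    "GENE_X": [1, 2, 3, 4, 5],
    "GENE_Y": [2, 4, 6, 8, 10],  # perfectly correlated with GENE_X
    "GENE_Z": [5, 1, 4, 2, 3],   # weakly (negatively) correlated with GENE_X
}

adj = soft_threshold_adjacency(toy_expr, beta=12)
print(adj[("GENE_X", "GENE_Y")], adj[("GENE_X", "GENE_Z")])
```

A high power keeps strong correlations close to 1 and pushes weak ones toward 0, which is how the soft threshold suppresses noisy connections without a hard cutoff.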

Hub Gene Protein Network Interaction. Using the Cytoscape v3.8.2 [8] software, a protein-protein interaction (PPI) network was built for the module having the highest association with clinical features, and its nodes were ranked with the cytoHubba plug-in using the maximal clique centrality (MCC) method [9]. The top-ranked genes of this network were taken as the PPI hub genes, while the MMTC hub genes were screened using TC > 0.25 and MM > 0.4. The final hub genes were the genes shared by the PPI hub genes and the MMTC hub genes.
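The two-filter selection can be sketched as follows. This is only an illustration under stated assumptions: MM is read as module membership and TC as a gene-trait correlation, the per-gene values and the ranking are hypothetical, and `select_hub_genes` is a name introduced here rather than a function of Cytoscape or cytoHubba.

```python
def select_hub_genes(gene_stats, ppi_ranked, mm_cutoff=0.4, tc_cutoff=0.25, top_n=12):
    """Keep top-ranked PPI genes that also pass the MM and TC cutoffs.

    gene_stats: dict gene -> (MM, TC)
    ppi_ranked: genes ordered from highest to lowest network score
    """
    mmtc_hubs = {
        gene
        for gene, (mm, tc) in gene_stats.items()
        if mm > mm_cutoff and tc > tc_cutoff
    }
    ppi_hubs = ppi_ranked[:top_n]
    # Preserve the PPI ranking order in the final list.
    return [gene for gene in ppi_hubs if gene in mmtc_hubs]


# Hypothetical toy values (assumption: not the study's MM/TC or MCC results).
toy_stats = {
    "TOP2A": (0.90, 0.35),
    "NCAPG": (0.85, 0.30),
    "BUB1B": (0.80, 0.28),
    "GENE_A": (0.50, 0.10),  # fails the TC cutoff
    "GENE_B": (0.30, 0.40),  # fails the MM cutoff
}
toy_ppi_rank = ["TOP2A", "GENE_B", "BUB1B", "NCAPG", "GENE_A"]

final_hubs = select_hub_genes(toy_stats, toy_ppi_rank, top_n=4)
print(final_hubs)
```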

Hub Gene Clinical Characteristic Analysis. After the final hub genes were screened, internal validation (TCGA dataset) and external validation (GSE70770 and GSE32571 datasets) were performed.
The Between stats module of the HIPLOT online tool (https://hiplot.com.cn/) was used with the TCGA-PRAD data to compare the expression of the hub genes between PCa and normal prostate tissue and across Gleason scores and tumor TNM stages [10]. Analysis of variance (ANOVA) or Student's t-test was used to determine statistical significance.
Survival analysis, ROC curve plotting, and AUC calculation were performed with the HIPLOT online tool to evaluate the diagnostic value of the hub genes.
2.6. Prognostic Analysis of Hub Genes. The online tool for univariate survival analysis (http://sangerbox.com/) generated the KM survival curve, the ROC curve, and the area under the ROC curve (AUC) for evaluating the diagnostic value of the hub genes.
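For readers unfamiliar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The expression values and labels are hypothetical, and this is not the implementation used by the online tools.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical expression values: label 1 = tumor, label 0 = normal (assumption).
toy_scores = [5.1, 4.8, 3.9, 2.0, 3.5, 4.0]
toy_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no discrimination between the two groups, and 1.0 to perfect separation.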
2.7. GSEA and GSVA. The 498 PCa samples in the TCGA-PRAD RNA-seq data were divided into high-expression and low-expression groups based on the median expression of each hub gene and analyzed with the GSEA software [11] (GSEA 3.0, https://www.gsea-msigdb.org/gsea/index.jsp). The "c2.cp.kegg.v6.2.symbols.gmt" gene set, downloaded from MSigDB [12], was used as the reference gene set. P ≤ 0.05 was considered statistically significant. In addition, the online database (http://bioinfo.life.hust.edu.cn/) was used for pathway analysis based on reverse-phase protein array (RPPA) data [13] and the "GSVA" [14] R package. The pathways associated with PCa were scored, and the pathways most related to the hub genes were determined.
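The median split that precedes GSEA can be sketched in a few lines. The sample IDs and expression values are hypothetical, and assigning samples exactly at the median to the low group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into high and low groups around the median expression.

    expression: dict sample_id -> expression value of one hub gene
    Samples equal to the median go to the low group (assumption).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low


# Hypothetical expression of one hub gene in four samples (assumption).
toy_expression = {"S1": 2.0, "S2": 8.5, "S3": 5.0, "S4": 7.1}

cutoff, high_group, low_group = split_by_median(toy_expression)
print(cutoff, high_group, low_group)
```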
2.8. Analysis of Tumor-Infiltrating Immune Cells. The immune infiltration associated with the hub genes in PCa was examined with the web tool GSCA [15] (http://bioinfo.life.hust.edu.cn/), using 550 samples from the TCGA-PRAD data. The relationship between the expression of the hub genes and 24 types of tumor-infiltrating immune cells (including B cells, CD4 and CD8 T cells, NK cells, NKT cells, gamma delta T cells, neutrophils, macrophages, monocytes, and dendritic cells) was analyzed using the online tool.
2.9. Genetic Changes in the Hub Gene. The cancer genomics database cBioPortal (http://cbioportal.org) [16] was used to study genetic changes in the hub genes.
DEGs of the green module were compared with CMap data to propose small-molecule therapies that could reverse the biological state of PCa. The standardized connection score of CMap ranges from -100 to +100. A positive connection score means that a drug could induce an expression signature similar to that of the disease, whereas a negative connection score indicates that the drug could reverse that signature. In general, a score greater than or equal to +90 or less than or equal to -90 is required for a drug to be considered for further study.
The Touchstone and PCL tools were then used to find the most closely related small-molecule drugs, based on genes from the same gene family or genes targeted by the same compound.
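The score cutoff described for CMap can be expressed as a simple filter. The sketch below uses hypothetical drug names and scores and does not call any CMap API; it only shows the thresholding at plus or minus 90 and the selection of negatively scoring candidates.

```python
def filter_by_connection_score(scores, cutoff=90.0):
    """Keep drugs whose connection score is >= +cutoff or <= -cutoff."""
    return {drug: s for drug, s in scores.items() if abs(s) >= cutoff}


def reversal_candidates(scores, cutoff=90.0):
    """Drugs with strongly negative scores, most negative first."""
    kept = filter_by_connection_score(scores, cutoff)
    negatives = [(drug, s) for drug, s in kept.items() if s < 0]
    return sorted(negatives, key=lambda item: item[1])


# Hypothetical connection scores on the -100 to +100 scale (assumption).
toy_scores = {"drug_a": -99.0, "drug_b": 45.2, "drug_c": 92.3, "drug_d": -90.0}

kept = filter_by_connection_score(toy_scores)
candidates = reversal_candidates(toy_scores)
print(sorted(kept), candidates)
```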
Finally, the 62 common genes derived from the intersection of GEO-DEGs and TCGA-DEGs were used as candidate DEGs (Figure 1(b), Supplementary Table 4).

Candidate DEG WGCNA Analysis and Identification of Key Modules. We performed WGCNA on the candidate DEGs to find the gene modules most relevant to the clinical characteristics of PCa based on the TCGA-PRAD dataset (Figures 2 and 3). The clinical characteristics of PCa matched to the candidate DEGs mainly included the Gleason score, PSA, and TNM stages (Figure 2(b)). The soft-threshold power was 12 (scale-free R² = 0.85), the cluster cut height was 0.25, and genes with similar expression profiles were clustered with the dynamic tree cut algorithm (Figure 2(a)). Five modules were identified (Figures 2(b) and 2(c)); each branch of the hierarchical tree, or each vertical line in the colored bar, represents a gene. Genes not assigned to any module are shown in gray. The module-feature correlation heatmap showed that the green module correlated with clinical features, most significantly with the Gleason score (Pearson correlation = 0.38, P = 2.66E-18, Figure 2(c)). A heatmap of gene expression in the green module (Figure 2(d)) showed a close relationship between these genes.

Key Modules MMTC and PPI Network to Screen Hub Genes. We selected 10 pivot genes from the green module by setting the module membership threshold to MM > 0.4 and TC > 0.25 (Figure 3(b)). Then, we built a protein-protein interaction network (PPI network) of the green module DEGs in the Cytoscape software and ranked its nodes using the cytoHubba plug-in and the MCC screening approach [9] (Figure 3(a)). The first 12 hub genes are listed in Figure 3(c). Finally, 10 hub genes were obtained by combining the MM/TC and PPI network results (EZH2, BUB1B, MKI67, CENPF, NCAPG, TK1, CENPU, TOP2A, BIRC5, and RRM2; Figure 3(d)). Among these 10 hub genes, we selected TOP2A, NCAPG, and BUB1B as the final key genes on the basis of their MCC scores (Supplementary Table 5).

Internal and External Validation of Last Hub Gene Expression. Among the 10 central genes screened above, we selected 3 genes (TOP2A, NCAPG, and BUB1B) as the key central genes for the next step.
Based on the clinical samples, the expression of these genes was significantly higher in the cancer group than in the matching normal control group, with P values of 3.6E[…].
Furthermore, the KM and ROC curves based on the TCGA-PRAD dataset showed that these genes were closely associated with clinical prognosis in terms of overall survival (Figure 6(a)), with TOP2A AUC = 0.76, NCAPG AUC = 0.80, and BUB1B AUC = 0.85 (Figure 6(b)), indicating their diagnostic and prognostic potential as PCa biomarkers.
In addition, based on the UALCAN online database, the protein levels of these 3 genes were significantly higher in tumor tissues than in normal tissues and were positively correlated with the Gleason score and the T and N stages of PCa (Supplementary Figure 2).

Hub Gene GSVA and GSEA. GSVA and GSEA were used to investigate the potential functions of the hub genes. According to the GSVA analysis, the apoptosis, cell cycle, DNA damage, epithelial-mesenchymal transition (EMT), and hormone androgen receptor (AR) pathways, among others, were associated with the hub genes in prostate cancer (Figure 7(a), Supplementary Table 6). Furthermore, the expression of these genes was positively correlated with the activation of the above pathways (Figure 7(b), Supplementary Table 7). It was therefore hypothesized that these hub genes could be linked to proliferation and drug resistance in PCa and castration-resistant prostate cancer (CRPC).
Furthermore, the GSEA analysis of the hub genes based on the TCGA data revealed that pathways including "apoptosis" and "cell cycle" had higher enrichment scores in the high-expression group. These genes were therefore associated with the activation of proliferation-related processes (Figures 8(a)-8(c), Supplementary Figure 3).

Hub Gene and Tumor Immune Infiltration Analysis. Having analyzed the relationship between hub gene expression and the signaling pathways involved in prostate cancer, we next evaluated the association between hub gene expression and the infiltrating immune cells in the samples. The results showed that the

Hub Gene Mutation. We analyzed the TCGA-PRAD mRNA expression data from the cBioPortal database and found that the alteration types of the three hub genes were mostly amplification (AMP), diploid, and deep deletion. BUB1B had the highest alteration rate (6%), whereas NCAPG and TOP2A each had a rate of 4%. Overall, at least one of the three hub genes was altered in 47 (9%) of 498 individuals (Figures 9(a) and 9(b)).
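The percentages above are simple frequencies over the cohort. The sketch below shows how per-gene and any-gene alteration frequencies could be tallied from a sample-to-genes mapping; the patient records are hypothetical, and the final line only checks that 47 of 498 rounds to the reported 9%.

```python
def alteration_frequencies(alterations, n_samples):
    """Per-gene and any-gene alteration frequencies.

    alterations: dict sample_id -> set of altered genes
    n_samples: total cohort size, including unaltered samples
    """
    per_gene_counts = {}
    for genes in alterations.values():
        for gene in genes:
            per_gene_counts[gene] = per_gene_counts.get(gene, 0) + 1
    any_altered = sum(1 for genes in alterations.values() if genes)
    per_gene = {g: c / n_samples for g, c in per_gene_counts.items()}
    return per_gene, any_altered / n_samples


# Hypothetical records for a cohort of 10 patients (assumption).
toy_alterations = {
    "P1": {"BUB1B"},
    "P2": {"BUB1B", "TOP2A"},
    "P3": {"NCAPG"},
    "P4": set(),
}

per_gene_freq, any_freq = alteration_frequencies(toy_alterations, n_samples=10)
reported_percent = round(47 / 498 * 100)  # figure quoted in the text
print(per_gene_freq, any_freq, reported_percent)
```

Because one patient can carry alterations in more than one gene, the any-gene frequency is at most the sum of the per-gene frequencies.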

Hub Gene-Related Small-Molecule Drug Screening. We used the CMap online database to assess the DEGs in the green module and screen small-molecule drugs closely associated with PCa. The analysis returned 90 small-molecule drugs with a connection score of |CS| > 95 and an n-sample ≥ 3 (Supplementary Table 8). Because all of them had a negative connection score, they were presumed able to slow or stop PCa. An MDM inhibitor with a connection score of -99 and a target protein of MDM2/TP53 was chosen for further investigation. Using the PCL filtering approach with the PC3 and VCAP prostate cancer cell lines, four small-molecule drugs closely related to the MDM inhibitor (median tau score > 90) were obtained with the Touchstone tool: tipifarnib, a farnesyltransferase inhibitor, angiogenesis inhibitor, and apoptosis promoter; AMT, a nitric oxide synthase inhibitor; xaliproden, a serotonin receptor agonist; and BAY-K8644, an L-type calcium channel activator. Tipifarnib had the highest median tau score, 94.82, among these small-molecule drugs (Figure 10). These candidate drugs could reverse PCa-induced gene expression and help develop targeted molecular therapies against PCa.

Discussion
The pathogenesis of prostate cancer is complicated, and metastatic disease often becomes drug resistant and is challenging to treat. Accurate identification is therefore critical to therapy. In recent years, many bioinformatics studies have screened biomarkers for malignant tumors [19][20][21]. However, only a few have made it into clinical practice. To screen the 10 hub genes (EZH2, BUB1B, MKI67, CENPF, NCAPG, TK1, CENPU, TOP2A, BIRC5, and RRM2), we used GEO and TCGA gene expression datasets, clinical information such as PSA, Gleason score, and TNM staging, and screening approaches such as WGCNA. The three genes BUB1B, NCAPG, and TOP2A have clinical diagnostic and predictive value. These genes are not only upregulated in prostate cancer tissues, but their expression levels are also associated with the Gleason score, T and N staging, and overall survival. The 5-year AUC values of the ROC curve were 0.6, 0.61, and 0.61, respectively. Furthermore, these genes could be linked to infiltrating immune cells in prostate cancer and to tumor therapy resistance.
BUB1B (BUBR1), also known as mitotic checkpoint serine/threonine kinase B, belongs to the Bub1 family. Through a "destruction" box, targeted proteins can be degraded during mitosis of the cell cycle [22]. BUB1B is abnormally expressed in cancers of the liver, pancreas, lung, breast, and other organs, and its expression is linked to clinical prognosis, especially poor survival [23][24][25][26]. Based on our findings, the expression of BUB1B in PCa tissue is significantly higher than in normal prostate tissue. Its expression is positively associated with the Gleason score and T and N staging and negatively correlated with overall clinical survival in the TCGA and GEO datasets. These findings are consistent with those of Zhong et al. [27]. Our results indicate that BUB1B plays a critical role in the invasion and proliferation of PCa and is linked to various clinical outcomes.
NCAPG, a regulatory subunit of the condensin complex, is essential for chromosomal condensation and stabilization in mitosis and meiosis. During mitosis, mutation of two threonine residues in the CAP-G subunit can affect the localization of CAP-G on the chromosome, and such localization defects have been associated with birth deformities and cancer [28]. Current research on NCAPG focuses on how it affects the cell cycle to enhance the invasion, progression, and metastasis of liver cancer [29,30]. Furthermore, NCAPG has been correlated with a poor clinical outcome in breast and lung cancer [30,31]. Our results support the findings of Feng et al. and Arai et al. [32,33], who observed that NCAPG expression is substantially associated with tumor stage and overall clinical survival rate.
TOP2A is a DNA topoisomerase II isoenzyme that regulates essential biological functions by modifying the topological structure of chromosomal DNA [34]. Type II topoisomerases are therapeutic targets of anticancer and antibacterial drugs [35,36], and research on TOP2A is well advanced. The topoisomerase II inhibitor etoposide phosphate (VP-16) has clear activity in individuals with metastatic castration-resistant prostate cancer (mCRPC), as shown by Cattrini et al. Furthermore, TOP2A overexpression could be a biomarker for predicting mCRPC with an excellent response to VP-16 [37]. TOP2A therefore has diagnostic and prognostic value, as demonstrated in this study.
Figure 10: Small-molecule drugs were analyzed and screened in the green module DEGs. The small-molecule drugs tipifarnib, AMT, xaliproden, and BAY-K8644 were closely related to MDM inhibitors (median tau score > 90). Tipifarnib had the highest median tau score (94.82).

Computational and Mathematical Methods in Medicine
The tumor microenvironment (TME) includes tumor cells, stroma, and infiltrating immune cells. Many studies have observed that tumor-infiltrating immune cells (TIICs) can modulate tumor prognosis, immunotherapy response rates, and chemotherapeutic efficacy [38,39]. TIICs are also crucial in prostate cancer progression. According to one study, the degree of PCa malignancy is positively associated with the infiltration of quiescent NK cells, memory B cells, M2 macrophages, and activated dendritic cells and negatively associated with naive B cells, active NK cells, and quiescent dendritic cells [40].
Furthermore, iNKT cells could slow PCa progression by decreasing proangiogenic macrophages and boosting proinflammatory M1-like macrophages [41]. In the present study, a strong relationship was observed between hub gene mRNA expression and immune cell infiltration in the study samples. TOP2A, NCAPG, and BUB1B are therefore possible prognostic indicators associated with tumor-infiltrating immune cells in the tumor microenvironment. They could be evaluated as potential immunotherapy targets to improve the clinical prognosis of PCa patients, especially CRPC patients.
The functional analysis by GSVA and GSEA revealed that the hub genes are primarily enriched in the apoptosis, cell cycle, DNA damage, EMT, hormone AR, and other signaling pathways. They are therefore linked to tumor growth and treatment resistance in the late stages of the tumor.
In this investigation, we used the CMap database to find four related small-molecule drugs that could prevent PCa progression: tipifarnib, AMT, xaliproden, and BAY-K8644. Tipifarnib showed the highest association among the four and was regulated by the three hub genes. Tipifarnib is a potent and selective farnesyltransferase inhibitor that has been used against various solid cancers. It has been used to treat HRAS-mutated non-small-cell lung cancer [42] and pancreatic cancer [43] and is also being tested as a novel anticancer therapy for cervical squamous cell carcinoma [44].
Thus, tipifarnib could be beneficial in treating PCa, particularly CRPC, although additional in vivo and in vitro testing is required.
However, no significant relationship was found between the selected genes and the M stage or PSA level of the clinical samples. This lack of association could be due to the small number of patients with M-stage tumors (4 cases). Moreover, various confounding factors, such as sexual activity, could affect the PSA test value. The current work used open databases for research and verification. In vivo and in vitro research is still required to ascertain the accuracy of these findings and to better understand the specific roles and molecular mechanisms of the three identified biomarkers in the evolution of PCa.

Conclusion
We employed a combination of datasets, including GEO and TCGA, and bioinformatics methods such as WGCNA and cytoHubba to screen for three hub genes (TOP2A, NCAPG, and BUB1B). The hub genes were confirmed and analyzed using GSVA, GSEA, cBioPortal, and CMap. TOP2A, NCAPG, and BUB1B could be exploited as potential PCa biomarkers. However, their reliability and specific mode of action require further investigation.

Supplementary Materials
Supplementary Table 1: three GEO datasets. The GSE38241, GSE3325, and GSE46602 microarray datasets were downloaded from the NCBI GEO database and contain 67 tumor samples and 41 normal prostate samples. The table lists the dataset ID, references, and Gene Expression Omnibus platform information of each GEO dataset. Supplementary Table 2: the 115 DEG_GEO. The NCBI web analysis tool "GEO2R" was used to evaluate the differences in the selected GEO datasets, and the data matrix of 115 differentially expressed genes was generated with P ≤ 0.05 and |log FC| ≥ 1 as the criteria. Adj.P.Val: adjusted P value. Log FC: logarithmic fold change. Supplementary Table 3: top 100 DEGs of the TCGA_PRAD data. DEGs between normal and cancer samples in the TCGA-PRAD RNA-seq count data were screened using the "edgeR" R package, which is based on the overdispersed Poisson model. Using P ≤ 0.05 and |log FC| ≥ 1 as criteria, the differential gene expression data matrix was obtained, and the top 100 target genes (14 upregulated and 86 downregulated) were selected for further analysis. Supplementary Table 4: the 62 common genes derived from the intersection of GEO-DEGs and TCGA-DEGs. The 62 upregulated genes were obtained from the overlap of the GEO-DEGs and TCGA-DEGs datasets for further analysis. Supplementary Table 5: top 10 network nodes ranked by the MCC method. Supplementary Table 6: the pathways associated with the hub genes in prostate cancer. Supplementary Table 7: the expression of the 3 hub genes was positively correlated with the activation of the pathways. Supplementary Table 8: the hub gene-drug-TOP 90 (|Raw-cs| > 95, N ≥ 3). Ninety small-molecule drugs closely related to PCa were screened from the DEGs in the green module using the CMap online database. Raw-cs ranged from -100 to +100; negative values indicated that the signal biology could be prevented or slowed down. Pert_id: perturbagen ID; Raw-cs: raw connection score; Cell iname: the type of cell line treated with the drug; Moa: CMap mode of action analysis. Supplementary Table 9