ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG May Serve as Diagnostic and Prognostic Biomarkers in Endometrial Carcinoma

Uterine Corpus Endometrial Carcinoma (UCEC), the most common gynecologic malignancy in developed countries, remains a major public health problem. Further studies are needed to elucidate the tumorigenesis of UCEC. Herein, 203 intersecting differentially expressed genes (DEGs) were identified from the GSE17025, GSE63678, and The Cancer Genome Atlas-UCEC datasets. Gene Ontology/Kyoto Encyclopedia of Genes and Genomes functional enrichment analysis and protein-protein interaction (PPI) network analysis were performed on those 203 DEGs. Intriguingly, 6 of the top 10 nodes in the PPI network were related to unfavorable prognosis, namely ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG. The mRNA and protein expression levels of the 6 hub genes were elevated in UCEC tissues compared to normal tissues. Higher expression of the 6 hub genes was associated with poor prognostic clinicopathological characteristics. The receiver operating characteristic curve suggested a significant diagnostic ability of the 6 hub genes for UCEC. Then, underlying pathogeneses of UCEC, including promoter methylation level, TP53 mutation status, genomic genetic variation, and immune cell infiltration, were analyzed. The mRNA expression level of the 6 hub genes was also higher in cervical squamous cell carcinoma and endocervical adenocarcinoma, uterine carcinosarcoma, and ovarian serous cystadenocarcinoma tissues than in corresponding normal tissues. In conclusion, ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG may be considered diagnostic and prognostic biomarkers in UCEC.


Introduction
Uterine Corpus Endometrial Carcinoma (UCEC) is reported as the most common gynecologic malignancy in developed countries [1] and has significant negative impacts on women's physical and mental health.
In 2013, The Cancer Genome Atlas (TCGA) classified UCEC into four molecular subtypes: POLE-ultramutant, microsatellite instability, low copy number variation, and high copy number variation [2]. The risk stratification of UCEC based on the molecular subtypes was the prerequisite for prognostic evaluation. However, it was far from optimizing the treatment guidelines [3]. Further studies are needed to explore suitable biomarkers for the purpose of developing more effective treatments and improving the outcome for patients.
Then we verified the differential mRNA and protein expression levels of those hub genes in tumor and normal tissues. Next, we analyzed the hub genes' clinical significance, such as the correlation with clinicopathological features, and diagnostic and prognostic values. Then, underlying pathogeneses of UCEC, including promoter methylation level, TP53 mutation status, genomic genetic variation, and immune cell infiltration, were analyzed. Finally, we investigated the hub genes' mRNA expression in cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), uterine carcinosarcoma (UCS), and ovarian serous cystadenocarcinoma (OV). We make the case that this study provides good diagnostic and prognostic biomarkers and relevant pathogeneses for UCEC in a new light.

Data Processing and Identification of DEGs.
Raw data in GSE17025 and GSE63678 were pre-processed with the "affy" R language package, including Robust Multichip Average background correction and log2 transformation. The probes were converted into the corresponding gene symbols according to the annotation information on the platform. After data pre-processing and standardization, the "limma" R language package was utilized to screen for DEGs between endometrial cancer samples and normal endometrial samples, with adjusted P value < 0.05 and |log2 fold change (FC)| > 1 as the thresholds for identifying DEGs. "Perl" software and "GTF dumps" (downloaded from Ensembl Data (http://grch37.ensembl.org/index.html)) were used for TCGA-UCEC RNA-seq HTSeq-Counts data extraction, integration, and conversion. The "edgeR" R language package was applied to discover DEGs, again with adjusted P value < 0.05 and |log2 FC| > 1 as the thresholds. A Venn diagram tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to identify the co-up-regulated and co-down-regulated DEGs among the above 3 datasets.
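The thresholding and intersection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R pipeline: the function names (`split_degs`, `shared_degs`), the gene labels, and all numeric values are hypothetical, and the per-dataset statistics are assumed to have been computed already (for example by limma or edgeR).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-dataset results: gene -> (log2 fold change, adjusted P value).
Results = Dict[str, Tuple[float, float]]


def split_degs(results: Results, lfc_cutoff: float = 1.0,
               padj_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (up-regulated, down-regulated) genes passing both thresholds."""
    up, down = set(), set()
    for gene, (log2fc, padj) in results.items():
        if padj < padj_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).add(gene)
    return up, down


def shared_degs(datasets: List[Results]) -> Tuple[Set[str], Set[str]]:
    """Intersect up and down sets across datasets (the Venn-diagram step)."""
    ups, downs = zip(*(split_degs(r) for r in datasets))
    return set.intersection(*ups), set.intersection(*downs)


# Toy values, invented purely for illustration.
dataset_1 = {"GENE_A": (2.1, 0.001), "GENE_B": (1.8, 0.01),
             "GENE_C": (-1.5, 0.02), "GENE_D": (0.4, 0.001)}
dataset_2 = {"GENE_A": (1.6, 0.004), "GENE_B": (0.9, 0.01),
             "GENE_C": (-2.0, 0.03), "GENE_D": (1.2, 0.2)}
dataset_3 = {"GENE_A": (3.0, 1e-6), "GENE_B": (2.5, 1e-5),
             "GENE_C": (-1.1, 0.04), "GENE_D": (1.3, 0.01)}

co_up, co_down = shared_degs([dataset_1, dataset_2, dataset_3])
print(sorted(co_up), sorted(co_down))
```

Keeping the up and down sets separate before intersecting means a gene that is up-regulated in one dataset and down-regulated in another is never counted as shared.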

Protein-Protein Interaction (PPI) Network Construction and Identification of Hub Genes.
The Search Tool for the Retrieval of Interacting Genes (STRING) (https://string-db.org) online database was used to predict the PPI network of DEGs. The obtained PPI network was imported into "Cytoscape" software (V3.9.0), and interactions with a combined score ≥0.9 were considered to be statistically significant. The CytoHubba plugin tool (Degree method) was applied to identify hub genes.
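The degree-based ranking can be illustrated with a short Python sketch. It assumes an edge list with combined scores on a 0 to 1 scale and counts, for each node, the number of distinct partners retained after the score filter. The function name `top_hubs` and the toy edges are hypothetical, and this is not the cytoHubba implementation itself.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, combined_score)


def top_hubs(edges: Iterable[Edge], min_score: float = 0.9,
             k: int = 10) -> List[Tuple[str, int]]:
    """Rank nodes by degree after keeping edges with score >= min_score."""
    degree: Counter = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        # Skip self-loops, low-confidence edges, and duplicated pairs.
        if a == b or score < min_score or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name for a deterministic order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy network, invented for illustration only.
toy_edges = [
    ("G1", "G2", 0.95), ("G1", "G3", 0.99), ("G1", "G4", 0.91),
    ("G2", "G3", 0.92), ("G3", "G4", 0.50), ("G2", "G1", 0.95),
]
hubs = top_hubs(toy_edges, k=2)
print(hubs)
```

Ties in degree are broken alphabetically here only to make the output reproducible; the source does not state how ties were handled.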

Data Processing and Validation of the Hub Genes Differential Expression between UCEC Tissues and Normal Tissues.
Level 3 HTSeq-FPKM RNA-seq data (including 35 normal tissues and 552 tumor tissues) obtained from the TCGA-UCEC database were transformed into transcripts per million (TPM) reads for further analyses. Scatter plots were generated by the "ggplot2" R language package (V3.3.3) to present differential expression of the hub genes in unpaired samples (35 normal tissues and 552 tumor tissues) and paired samples (23 pairs of tumor/paracancerous tissues), respectively. In addition, the GEPIA (https://gepia.cancer-pku.cn/) database, which included 174 tumor tissues and 91 normal tissues (matched TCGA normal and GTEx data as normal data), was also applied for validation.
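The FPKM-to-TPM transformation rescales each sample so that its values sum to one million: TPM_i = FPKM_i / Σ_j FPKM_j × 10^6. The Python sketch below illustrates this for a single sample. The function names and toy values are hypothetical, and the log2(TPM + 1) step is an assumption about how the values were placed on a log scale for plotting.

```python
import math
from typing import Dict


def fpkm_to_tpm(fpkm: Dict[str, float]) -> Dict[str, float]:
    """Convert one sample's FPKM values to TPM (values sum to 1e6)."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


def log2_tpm_plus_one(tpm: Dict[str, float]) -> Dict[str, float]:
    """Log-scale transform with a pseudocount of 1 (an assumption here)."""
    return {gene: math.log2(value + 1.0) for gene, value in tpm.items()}


# Toy sample, invented for illustration only.
sample_fpkm = {"GENE_A": 10.0, "GENE_B": 30.0, "GENE_C": 60.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
sample_log = log2_tpm_plus_one(sample_tpm)
print(sample_tpm)
```

Because the conversion is applied per sample, relative expression within a sample is unchanged while totals become comparable across samples.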

Clinical Significance of the Hub Genes Expression.
Kaplan-Meier (KM) survival curve analysis was employed for prognostic analysis, including overall survival (OS) and disease-specific survival (DSS). Hub gene expression above or below the median value was defined as high or low, respectively. Survival data were statistically analyzed by the "Survival" R language package (V3.2-10) and visualized by the "Survminer" R language package (V0.4.9). Unknown follow-up time and outcome were regarded as missing values. The supplementary prognostic data were obtained from a published study [28]. The receiver operating characteristic (ROC) curve was plotted to test the diagnostic performance of the hub genes in UCEC, using the "pROC" R language package (V1.17.0.1) for analysis and the "ggplot2" R language package (V3.3.3) for visualization. Each predictive variable was evaluated at its optimal cut-off value.
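To make the ROC step concrete, the following Python sketch computes the area under the curve through its rank-based (Mann-Whitney) formulation and picks a cut-off by maximizing Youden's J (sensitivity + specificity − 1). The text does not state how the optimal cut-off was chosen, so the Youden criterion is an assumption, and the function names and expression values are hypothetical.

```python
from typing import List, Tuple


def roc_auc(tumor: List[float], normal: List[float]) -> float:
    """AUC as the probability that a tumor value exceeds a normal value."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5  # ties count as half
    return wins / (len(tumor) * len(normal))


def youden_cutoff(tumor: List[float], normal: List[float]) -> Tuple[float, float]:
    """Return (cut-off, J) maximizing J; values >= cut-off are called tumor."""
    best_j, best_cut = -1.0, float("nan")
    for cut in sorted(set(tumor) | set(normal)):
        sensitivity = sum(t >= cut for t in tumor) / len(tumor)
        specificity = sum(n < cut for n in normal) / len(normal)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j


# Toy expression values, invented for illustration only.
tumor_expr = [5.1, 6.3, 4.8, 7.0, 3.9]
normal_expr = [2.0, 3.1, 4.0, 5.5]

auc = roc_auc(tumor_expr, normal_expr)
cutoff, j_stat = youden_cutoff(tumor_expr, normal_expr)
print(auc, cutoff, j_stat)
```

The pairwise AUC is quadratic in sample size, which is acceptable for cohorts of a few hundred samples; a sort-based version would be preferable for larger data.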

Protein Expression Analysis of the Hub Genes in the Human Protein Atlas (HPA) and Clinical Proteomic Tumor Analysis Consortium (CPTAC).
The HPA (https://www.proteinatlas.org/) website was used to compare the protein expression of the hub genes between normal endometrial tissue and endometrial cancer tissue based on immunohistochemistry (IHC). Additionally, we used UALCAN (https://ualcan.path.uab.edu/analysis.html)-CPTAC to present a thorough analysis of the expression profiles of the hub genes at the protein level.

Correlation Analysis between the Hub Genes Expression and Clinicopathological Features.
We performed a stratified analysis based on clinicopathological features (including clinical stage, primary therapy outcome, race, age, weight, height, BMI, histological type, residual tumor, histologic grade, tumor invasion, menopause status, hormone therapy, diabetes, radiation therapy, surgical approach, OS event, DSS event, and progression-free interval (PFI) event), and determined the relationship between them and the expression level of the hub genes. The "ggplot2" R language package (V3.3.3) was employed to visualize the correlation between hub gene expression and clinical variables (clinical stage and histologic grade). Additionally, single-gene logistic regression analysis was employed. Hub gene expression above or below the median value was defined as high or low, respectively.

The Intrinsic Pathogeneses of UCEC.
The UALCAN online tool was applied to present the promoter methylation level of the hub genes based on sample types and the expression of the hub genes based on TP53 mutation status. The cBioPortal web platform (https://www.cbioportal.org/) was designed for comprehensive genomic analysis. The study "Uterine Corpus Endometrial Carcinoma (TCGA, Nature 2013)" and the genomic profiles (including "Mutations," "Putative copy number alterations from GISTIC", and "mRNA expression z-scores relative to diploid samples (RNA Seq V2 RSEM)") were chosen for analysis. We performed the analysis on mutation spectrum, mutation count, methylation cluster, subtype, and genetic alteration in a total of 232 UCEC samples.

Association between the Hub Genes Expression and Immune Cells Infiltration.
The single-sample Gene Set Enrichment Analysis method from the "GSVA" R language package [29] was applied to present the association between the mRNA expression level of each hub gene and the infiltration enrichment of 24 types of immune cells, including dendritic cell (DC), activated DC (aDC), B cells, CD8+ T cells, cytotoxic cells, eosinophils, immature DC (iDC), macrophages, mast cells, neutrophils, NK CD56bright cells, NK CD56dim cells, NK cells, plasmacytoid DC (pDC), T cells, T helper cells, T central memory (Tcm), T effector memory (Tem), T follicular helper (Tfh), T gamma delta (Tgd), Th1 cells, Th17 cells, Th2 cells, and Treg. The immune cell markers were derived from Immunity [30].

Data Processing and Analysis of the Hub Genes Differential Expression between CESC, UCS, OV Tissues and Corresponding Normal Tissues.
RNA-seq data in TPM format were processed uniformly by the Toil process [31] and log2-transformed before expression comparison between samples. Differential expression of the hub genes between normal tissues and tumor tissues was visualized by the "ggplot2" R language package (V3.3.3).

Statistical Analysis.
All statistical analyses were performed in R (V3.6.3), with P values less than 0.05 considered significant. The Wilcoxon rank-sum test and paired-samples T test were used to analyze the expression of the hub genes in nonpaired samples and paired samples, respectively. The Cox regression analysis and the KM method were used to evaluate the role of hub gene expression in UCEC prognosis. Clinicopathological features were compared between the high and low hub gene expression groups using the Chi-square test, Wilcoxon rank-sum test, T test, and Fisher's exact test. The binary logistic model was used for single-gene logistic regression analysis. The Bonferroni-Dunn test was used to evaluate the relationship between hub gene expression and clinical stage/histologic grade. Spearman's analysis was used to evaluate the association between hub gene expression and immune infiltration.
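As an illustration of the Spearman analysis, the coefficient can be computed as the Pearson correlation of average ranks, which handles tied values. The Python sketch below is a minimal version under stated assumptions: the function names and the toy expression and enrichment values are hypothetical, and no P value is computed.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy values, invented for illustration only.
gene_expression = [1.2, 2.4, 3.1, 4.8, 5.0]
enrichment_score = [0.9, 0.7, 0.6, 0.2, 0.3]
rho = spearman_rho(gene_expression, enrichment_score)
print(rho)
```

A negative coefficient in this setting would correspond to lower immune cell enrichment in samples with higher hub gene expression.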

GO and KEGG Enrichment Analyses of DEGs.
A total of 196 Entrez IDs had been successfully converted by "org.Hs.eg.db" R language package (V3.10.0), with a conversion rate of 96.6% (196/203). As shown in Table 1, under the GO functional enrichment analysis, the DEGs were mainly enriched in nuclear division (ontology: BP), the spindle (ontology: CC), and ATPase activity (ontology: MF).
Additionally, under the KEGG pathway analysis, the DEGs were enriched in the cell cycle (hsa04110), oocyte meiosis (hsa04114), p53 signaling pathway (hsa04115), and cysteine and methionine metabolism (hsa00270) (Table 2). Part of the GO/KEGG functional enrichment results is presented as bubble diagrams (Figure 2).

PPI Network and Identification of the Hub Genes.
As shown in Figure 3, more than 50 genes interacted closely in the PPI network. The degree of each node was calculated by the cytoHubba plugin tool. The top 10 nodes ranked by the "Degree" method were identified, which included CDK1, ASPM, CCNB1, TOP2A, CDC20, DLGAP5, KIF11, BUB1B, CDCA8, and NCAPG (Table 3).

Abnormal Protein Expression of the Hub Genes in UCEC.
We explored the protein expression of the hub genes on the HPA website, and representative images are presented in Figure 6(a). By IHC, CDC20 showed high staining in UCEC tissues while it was not detected in normal tissues (antibody CAB004525); DLGAP5 showed medium staining in UCEC tissues and low staining in normal tissues (antibody HPA005546); CDCA8 showed high staining in UCEC tissues and low staining in normal tissues (antibody HPA028120); NCAPG showed high staining in UCEC tissues and medium staining in normal tissues (antibody HPA039613). IHC images of ASPM and BUB1B in UCEC tissues and normal tissues were not found on the HPA website. Furthermore, we used the UALCAN-CPTAC platform to verify the results of protein differential expression obtained from the HPA website. As shown in Figure 6(b), the protein expression level of CDC20 (P < 0.001), DLGAP5 (P < 0.001), BUB1B (P < 0.001), CDCA8 (P < 0.001), and NCAPG (P < 0.001) was higher in UCEC tissues (n = 100) than in normal tissues (n = 31), which was consistent with the results obtained from HPA. Interestingly, we discovered that ASPM (P < 0.05) exhibited a down-regulated pattern at the protein level in UCEC tissues (n = 100) compared with normal tissues (n = 31). We address this contradictory trend in the Discussion section in combination with the results of the genetic alteration analysis.

Association between the Hub Genes Expression and Clinicopathological Variables in UCEC.
To better understand the role of the hub genes, the clinical information of 552 UCEC samples from TCGA was analyzed. The association between detailed clinicopathologic characteristics of the patients and the expression of the 6 hub genes is listed in Table S1.
Next, we specifically focused on clinical stages (normal (n = 23), stage I (n = 342), stage II (n = 51), stage III (n = 130), stage IV (n = 29)) and histologic grades (normal (n = 23), G1 (n = 98), G2 (n = 120), G3 (n = 323)). As shown in Figure 7(a), patients in more advanced stages tended to show higher mRNA expression of the 6 hub genes (stage III vs. stage I, P < 0.05), and the highest mRNA expression of the 6 hub genes was found in stage III. The lower number of included samples in stage IV may be a limitation. In addition, Figure 7(b) shows that mRNA expression of the 6 hub genes was significantly related to histologic grade, and the highest mRNA expression of the 6 hub genes was found in G3.
Single-gene logistic regression analysis illustrated that the expression of the hub genes, as an independent variable, was associated with poor prognostic clinicopathological characteristics (Table S2).
[Figure caption fragment: expression is shown as (TPM + 1) on a log scale; red represents UCEC tissues and gray represents normal tissues. UCEC, uterine corpus endometrial carcinoma.]
The area under the curve (AUC) value ranged from 0.5 to 1. The closer the AUC was to 1, the better the diagnostic effect. As shown in Figure 8, the AUC of the 6 hub genes was all above 0.95.

Analysis of the Intrinsic Mechanisms of the 6 Hub Genes in UCEC.
First, we investigated the promoter methylation level of the 6 hub genes in 438 UCEC tissues and 46 normal tissues. We found that the promoter methylation level of ASPM, CDC20, BUB1B, CDCA8, and NCAPG was significantly lower in UCEC tissues than in normal tissues (P < 0.01), while DLGAP5 showed no statistically significant difference (Figures 9(a)-9(f)). Next, as shown in Figures 9(g)-9(l), the expression of the 6 hub genes in UCEC TP53-mutant tissues (n = 196) was dramatically higher (P < 0.001) than in UCEC TP53-nonmutant tissues (n = 345) or normal tissues (n = 35), which indicated a close relationship between hub gene expression and TP53 mutation status.
Lastly, by inputting the 6 hub genes into the cBioPortal website, we found that the genetic alterations of ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG among 232 UCEC samples were 18%, 10%, 5%, 6%, 7%, and 10%, respectively (Figure 9(m)). Among the 6 hub genes, ASPM was the most frequently altered gene. In particular, missense mutation, truncating mutation, and amplification were identified as the primary types of genetic alteration of ASPM in UCEC. It was worth noting that the types of UCEC samples with large mutation counts were mainly microsatellite instability-high or POLE-ultramutant types and the primary mutation spectrum type was C > T (Figure 9(m)).

Discussion
In the present study, a total of 650 UCEC samples and 40 normal samples were collected from GSE17025, GSE63678, and TCGA-UCEC databases. Through analysis, we ended up with 125 co-up-regulated DEGs and 78 co-down-regulated DEGs. Under the GO/KEGG functional enrichment analysis, we found a plethora of DEGs were associated with the cell division process, for example, nuclear division, chromosome segregation, tubulin binding, microtubule motor activity, ATPase activity, and cell cycle.
The results were consistent with the theory that uncontrolled DNA replication, abnormal proliferation, and dysregulated cell cycle control were essential molecular mechanisms in carcinogenesis [35].
By constructing the PPI network, we found that more than 50 DEGs interacted closely with each other. We therefore hypothesized that those DEGs might be of paramount importance in UCEC tumorigenesis. Using the "Degree" method of the cytoHubba plugin tool, the top 10 nodes (including CDK1, ASPM, CCNB1, TOP2A, CDC20, DLGAP5, KIF11, BUB1B, CDCA8, and NCAPG) were identified, and we considered them as hub genes for further research.
In the TCGA-UCEC database, we observed that the 10 hub genes were significantly overexpressed at the mRNA level in tumor tissues compared to normal tissues, both in unpaired samples and in paired samples. The same results were confirmed in the GEPIA database. Nevertheless, over-expression of only some of the hub genes was associated with an unfavorable prognosis. Compared with low gene expression, high expression of ASPM, BUB1B, CDCA8, and NCAPG was significantly correlated with a poor OS. In addition, high expression of ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG was markedly associated with a poor DSS.
These results suggested that ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG might serve as biomarkers for poor prognosis in UCEC. Similarly, previous studies had stated that ASPM, CDC20, BUB1B, and CDCA8 expression could be potential poor survival prognostic biomarkers in lung adenocarcinoma [14], prostate/breast cancer [18,19], glioblastoma [12], respectively. Hence, we sought a more thorough and in-depth understanding of the 6 hub genes in UCEC. First, we explored the protein expression of the 6 hub genes on the HPA website. Compared with normal tissues, we found that CDC20, DLGAP5, CDCA8, and NCAPG showed higher staining in UCEC tissues. Because ASPM and BUB1B information was lacking, we then turned to the UALCAN-CPTAC platform for further validation. It showed that the protein expression level of CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG was higher in UCEC tissues than in normal tissues.
This was consistent with the result from HPA.
Next, we analyzed the association between the 6 hub genes and clinicopathological features. Notably, over-expression of CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG was associated with poor prognostic clinicopathological characteristics (clinical stage, histological type, and histologic grade) by single-gene logistic regression analysis, whereas over-expression of ASPM was associated with age, histological type, and histologic grade. As tumor grade or stage increased, the mRNA expression of the 6 hub genes tended to be higher. The above results suggested that the 6 hub genes functioned as oncogenes in UCEC. It should be mentioned that type I UCEC shared risk factors as exemplified by metabolic abnormalities such as obesity and diabetes [36]. Nonetheless, the expression of ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG was not related to BMI or diabetes. Furthermore, the AUC of the 6 hub genes was all above 0.95.
This strongly indicated a high discriminative power of the 6 hub genes for distinguishing UCEC tissues from normal tissues. Then, it was worth mentioning that the lower level of promoter methylation and the higher level of TP53 mutation were tied to the mechanisms of 5 (DLGAP5 excepted) or 6 hub genes in UCEC, respectively. In terms of genetic alterations, which encompassed missense mutation, truncating mutation, amplification, and mRNA high, ASPM was the most frequently altered gene among the 6 hub genes. We hypothesized that the contradictory trend of the ASPM protein expression level might be related to its genetic alterations such as amplification and mutation. Gene expression is regulated in many ways. Transcriptional/post-transcriptional regulation and translational/post-translational regulation all play roles in the final protein expression [37]. Moreover, factors such as mRNA degradation and protein degradation might lead to the inconsistency between mRNA abundance and protein expression level [38].
UCEC therapy needs to attend not only to the intrinsic characteristics of UCEC cells but also to their dynamic communication with the various components of the tumor microenvironment (TME). It had been established that tumor-infiltrating immune cells played essential roles in the TME, with their composition and distribution considered to be linked with tumorigenesis and development [32][33][34]. We discovered that the expression of the 6 hub genes was positively correlated with the abundance of Th2 cells, and was negatively correlated with the abundance of Th17 cells, NK CD56bright cells, pDC, NK cells, cytotoxic cells, neutrophils, eosinophils, mast cells, T cells, and Tfh. Indeed, there were limited studies on the association between the 6 hub genes and immune infiltration. James et al. observed that enhanced expansion of Treg cells was accompanied by elevated expression of CDC20 and inflammatory tissue migratory markers (ITGA4, CXCR1) [39]. Similarly, Seon et al. found that responses resulting from hypoxic stress, including upregulation of CDC20, were accountable for the superior expansion of NK cells via ERK/STAT3 activation in patients with advanced cancer [40]. A previous study stated that both chemokine (CXCL12, IP-10, and CCL27) and cytokine (IL-1β and IL-6) profiles in the TME suppressed the expression and function of NK cells (major anti-tumoral effector cells), thereby promoting UCEC progression [41]. It is important to understand how the 6 hub genes interact with their surrounding infiltrating immune cells during oncogenesis. Once the underlying immune-related mechanisms are clarified experimentally, the 6 hub genes might be useful for novel immunotherapy.
Finally, we explored the role of the 6 hub genes in the female reproductive system. Intriguingly, the mRNA level of the 6 hub genes was up-regulated in CESC, UCS, and OV tissues compared to corresponding normal tissues. The results suggested that the 6 hub genes might play a pivotal role in the tumorigenesis of the female reproductive system, and their verification could shed some light on the current research.
However, there are still deficiencies in this study. Clinical samples may need to be collected to validate the results. Further experimental verification is needed to dissect more carefully the biological functions of the 6 hub genes in vitro and in vivo.

Conclusions
ASPM, CDC20, DLGAP5, BUB1B, CDCA8, and NCAPG may be considered diagnostic and prognostic biomarkers for UCEC. Promoter methylation level, TP53-mutation status, genomic genetic variation, and immune infiltration were involved in the UCEC pathogenesis of the 6 hub genes.

Data Availability
Datasets used to support the findings of the study can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.