Whole Genome Analysis and Prognostic Model Construction Based on Alternative Splicing Events in Endometrial Cancer

Objectives A growing body of evidence has shown that aberrant alternative splicing (AS) is closely related to the occurrence and development of cancer. However, prior studies mainly have concentrated on a few genes that exhibit aberrant AS. This study aimed to determine AS events through whole genome analysis and construct a prognostic model of endometrial cancer (EC). Methods We downloaded gene expression RNAseq data from UCSC Xena, and seven types of AS events from TCGA SpliceSeq. Univariate Cox regression was employed to analyze the prognostic-related alternative splicing events (PASEs) and splicing factors; multivariate Cox regression was conducted to analyze the effect of risk score (All) and clinicopathological parameters on EC prognosis. An underlying interaction network of PASEs of EC was constructed by Cytoscape Reactome FI, GO, and KEGG pathway enrichment was performed by DAVID. ROC curves and Kaplan-Meir analysis were used to assess the diagnostic value of prognostic model. The correlation between PASEs and splicing factors was analyzed by GraphPad Prism; then a network was constructed using Cytoscape. Results In total, 28,281 AS events in EC were identified, which consisted of 1166 PASEs. RNPS1, NEK2, and CTNNB1 were the hub genes in the network of the top 600 PASEs. The area under the curve (AUC) of risk score (All) reached 0.819. Risk score (All) together with FIGO stage, cancer status, and primary therapy outcome success was risk factors of the prognosis of EC patients. Splicing factors YBX1, HNRNPDL, and HNRNPA1 were significantly related to the overall survival (OS). The splicing network indicated that the expression of splicing factors was significantly correlated with percent-splice-in (PSI) value of PASEs. Conclusion We constructed a model for predicting the prognosis of EC patients based on PASEs using whole genome analysis of AS events and thereby provided a reliable theoretical basis for EC clinical prognosis evaluation.


Introduction
Alternative splicing (AS) is an important regulation mechanism in the process of mRNAs after transcription. Pre-mRNA produces different mRNAs through different splicing methods to translate into different proteins with versatile functions, which contributes to protein diversity [1]. AS is a ubiquitous biological process, approximately 95% of human genes undergo AS in physiological processes [2]. In recent years, studies have shown that AS plays an important role in the occurrence and development of cancer, and AS participates in the process of proliferation, apoptosis, and metastasis of tumor cells [3,4]. AS is mainly regulated by the spliceosome, which is the complex proteins composed of small nuclear ribonucleic acids (snRNA) and splicing factors [5]. A splicing factor is an important accessory protein that regulates pre-mRNA AS. Mutation and/or abnormal expression of splicing factors are closely related to the occurrence of abnormal AS [6]. In total, there are seven types of AS events listed in the SpliceSeq database, i.e., alternate acceptor site (AA), alternate donor site (AD), alternate terminator (AT), alternate promoter (AP), exon skip (ES), mutually exclusive exons (ME), and retained intron (RI) [7].
Endometrial cancer (EC) is one of the common gynecological malignant tumors. In recent years, older age (⩾55 years old), obesity (⩾25 kg/m 2 ), diabetes, and tamoxifen 2 BioMed Research International adjuvant therapy have led to an increase in the incidence of endometrial cancer (EC) [8]. According to cancer statistics in 2019, an estimated 61,880 new cases of EC will occur in the United States, proceeding to the first in gynecological malignant tumors [9]. EC is a hormone-dependent cancer: research has shown that AS of estrogen receptor (ER ) and progesterone receptor (PR) is closely related to the occurrence and development of EC [10]. A newly defined splicing factor, YT521, was found to promote the AS of vascular endothelial growth factor A (VEGF-A), thereby leading to an increase in splicing variant VEGF-165 and ultimately promoting EC invasion [11]. Until now, monitoring the diagnosis and prognosis of EC has been limited by a lack of biomarkers with both sensitivity and specificity. Therefore, the purpose of our study was to elucidate aberrant AS in EC, which would provide a theoretical basis for finding prognostic biomarkers and therapeutic targets for EC. We also analyzed PASEs through whole genome analysis and constructed prognostic model for EC.
In the present study, we obtained normalized RNAseq data from UCSC Xena and seven types of AS events from TCGA SpliceSeq database, then performed a systematic analysis to figure out PASEs in EC, then we constructed prognostic model risk score (All) based on PASEs, and evaluated the diagnostic value of risk score (All) in assessing the prognosis of EC. In addition, splicing factors that were associated with prognosis of EC were identified using univariate Cox regression and a correlation network between splicing factors and PASEs was constructed. In summary, our study developed a new prognostic model for EC.

Materials and Methods
. . Data Extraction and Preprocessing. RNAseq-HTSeq-FPKM-UQ data was downloaded from UCSC Xena (https://xenabrowser.net/), which provided normalized RNAseq data. Except for 35 cases of adjacent normal samples, seven types of AS events of 545 ECs were downloaded from the TCGA SpliceSeq (https://bioinformatics.mdanderson .org/TCGASpliceSeq/index.jsp) database, and PSI value ranges from 0 to 1, which is used to quantify AS events. AS events with PSI range ≥ 0.5 were included in this study.
. . Building a Prognostic Model Based on PASEs. Patients with overall survival (OS) ≥ 30 days were included in this study. According to its median number, each parameter was divided into high-risk (≥ median number) and low-risk (<median number) groups. Univariate Cox regression was performed on all seven types of AS events, and lnHR was used as the -coefficient to construct a prognostic model for EC [12]. The prognostic model was constructed according to . . Evaluation of the Diagnostic Value of Prognostic Models. A Kaplan-Meier survival curve was plotted using the "survival" package (version 2.43-3) to assess the effect of the prognostic model on the 5-year OS of EC. A ROC curve was drawn using the "survivalROC" package (version 1.0.3) to assess the diagnostic value of prognostic models of EC . . Functional and Pathway Enrichment Analysis. DAVID (https://david.ncifcrf.gov/) is a database that integrates biological data and analysis tools and provides comprehensive annotation information. PASEs were analyzed using DAVID (version 6.8), including GO terms (Biological Process/Cellular Component/Molecular Function, BP/CC/MF) and KEGG pathways.
. . Construction of an Interaction and Correlation Network. To further explore the interaction of PASEs in EC, we constructed an interaction network using Cytoscape Reactome FI (version 3.6.0), and hub genes were selected according to the number of degrees. The splicing factors were downloaded from the TCGA SplieAid2 (http:// 193.206.120.249/splicing tissue.html) database (Table S1), and a scatter plot between prognostic-related splicing factor expression profiles and the PSI values of the selected PASEs was constructed in GraphPad Prism8.0. The correlation network between splicing factors and PASEs was constructed in Cytoscape (version 3.7.1).

Results
. . Overview of AS Events in TCGA-EC. In total, there were 8140 genes, and 28281 AS events in 545 EC patients and the median number of AS events per gene was 3.474. Among the seven types of AS events, ES was the most common, followed by AT and AP. Specifically, there were 4604 genes in the 9744 ES events, 3411 genes in the 7796 ATs event, 1792 genes in the 4458 AP events, 1691 genes in the 2270 AA events, 1413 genes in the 2050 RI events, 1386 genes in the 1877 AD events, and 85 genes in the 86 ME events (Figure 1(a)).
It is worth noting that one gene can undergo multiple types AS events; therefore, UpSet picture was used to match the genes with AS events. The results showed that the most common event was ES followed by AT and AP; ME events occurred the least. Three (AA, AD, and ES) AS events observed were DMKN and ATXN2L (Figure 1(b)).
To better track the AS events, we named AS events using the gene name, AS type, and a unique ID number in AS event. For example, in RPLP0-ES-24731, RPLP0 represents the gene name, ES represents the type of AS event, and 24731 was the unique ID number of the AS event.
. . AS Events Associated with Prognosis of EC. To build a prognostic model, we needed to find AS events with good discrimination; thus, AS events with PSI range ≥ 0.5 were included in this study. Ultimately, we obtained 5015 genes and 11709 AS events. To explore the relationship between AS events and EC prognosis, we evaluated prognostic AS events using univariate Cox regression. The results showed that 1166 AS events were significantly associated with OS in EC (P<0.05; Table  S2). We selected top 600 PASEs and constructed an interaction network using Cytoscape Reactome FI. The results showed that the RNPS1, NEK2, and CTNNB1 genes were the hub nodes of this network (Figure 1(c)). Then we performed GO (BP/CC/MF) and KEGG pathway analysis using the DAVID database and found that 19 BPs were enriched, such as the G2/M transition of mitotic cell cycle, mitotic nuclear division, and translation, 12 CCs, 12 MFs (P<0.05), and 2 KEGG signaling pathways (P<0.05; Table S3), such as ribosome and cell cycle, and plotted the bubbles of Top5 GO (BP/CC/MF) and the KEGG pathways ( Figure 1(d)).

. . Construction of Prognostic Model of EC.
To investigate the relationship between AS events and EC survival outcomes, we constructed a prognostic model of EC based on PASEs. The P values were sorted from small to large, and the top 4 genes for each AS event were selected to construct a prognostic model, while the top 6 genes were used to construct the risk score (All) prognostic model. We chose lnHR to construct a formula of prognostic models (Table 1).
Risk score (All) = (PSI value of RPLP0-ES-24731 × -9.093) According to the median number of risk score, EC patients were divided into high and low-risk groups, which indicated that the mortality rate was higher for EC patients in the high-risk group (red spots), which was associated with the lower OS (red and green spots; Figures 2(a)-2(h)). A ROC curve was used to evaluate the diagnostic value of the prognostic model; results suggest that the AUC of the risk score (All) is 0.819 followed by 0.805 for the risk score (ES) and 0.731 for the risk score (AT) (Figures 3(a)-3(h)). Compared with the prognostic model constructed by a single AS event, risk score (All) better predicted EC patients' prognosis. We also plotted survival curves for all prognostic models (Figures 3(i)-3(p)) and found that the high-risk groups of all prognostic models were associated with poor prognosis in EC patients (P<0.05).
In addition, we analyzed the relationship between clinicopathological parameters and EC prognosis. Age more than 55 years old and BMI more than 25 kg/m 2 were considered as high-risk factors. Univariate Cox regression analysis indicated that pathological type, FIGO stage, grade, cancer status, new tumor event after initial treatment, primary therapy outcome success, and risk score (All) were associated with poor prognosis of EC patients (P<0.05). Multivariate Cox regression showed that only FIGO stage, cancer status, primary therapy outcome success, and risk score (All) were risk factors for the poor prognosis of EC (P<0.05) ( Table 2).  (Table S4). A correlation network constructed by Cytoscape indicated that 10 splicing factors (blue dots) were negatively (green lines) correlated with 125 favorable prognostic AS events (green dots), and positively correlated (red lines) with 143 AS events with poor prognosis (red dots; Figure 4(g); Table S5).
We constructed Kaplan-Meier survival curves of 10 splicing factors and found that only YBX1, HNRNPDL, and HNRNPA1 had significant effects on OS in EC patients ( Figure S1). We plotted representative K-M survival curves of YBX1 and HNRNPDL (P<0.05; Figures 4(a) and 4(b)). In addition, we generated a scatter plot between the expression of splicing factors and PSI value of AS events and found that splicing factor YBX1 was positively correlated with PSI value of DNAH9-AT-39292 and negatively correlated with PSI value of DNAH9-AT-39293 (Figures 4(c) and 4(e)). Splicing factor HNRNPDL showed different correlation between different splicing event types of the same gene CD33 (P<0.005;

Discussion
AS is an important biological process for producing protein diversity. Abnormal AS events in tumors are closely related to tumor initiation and tumor progression. A gene can undergo different types of AS events and can be regulated by a variety of splicing factors, thus complicating the study of the regulatory networks between AS events and splicing factors. At present, research mainly detected AS of EC by using small sample sequencing [13]. However, studies employing whole genome analysis of AS events in EC have not been reported.
In recent years, high-throughput sequencing technology can better characterize abnormally mutated splices and splice sites; therefore, it is important to use bioinformatics techniques to analyze abnormal AS events in EC.
In the present study, we downloaded seven types of AS events from the TCGA SpliceSeq database. There were 8140 genes and 28281 AS events in EC, which indicates that AS events are ubiquitous in EC. 1166 PASEs were identified using univariate Cox regression analysis, and the underlying regulatory network was constructed using Cytoscape based on the top 600 PASEs. The results indicated that the RNPS1, NEK2, and CTNNB1 genes were hub nodes of the network. RNPS1 is a member of the SR protein family and functions as an activator in the AS process [14,15]. RNPS1 can inhibit abnormal splicing of pre-mRNA and plays an important role in quality control during pre-mRNA splicing [15,16]. Nek2 is a serine/threonine protein kinase that is involved in the regulation of the cell cycle, and there exist three alternative splice variants, Nek2A, Nek2B, and Nek2C [17]. A prior study showed that Nek2C was highly expressed in breast cancer cells and promoted breast cancer cells invasion and migration [18]. In addition, we performed GO (BP/CC/MF) and KEGG pathway enrichment using DAVID and found that these genes were closely related to transition of mitotic cell cycle, mitotic nuclear division, ribosome, cell cycle, etc. From the above perspective, RNPS1, NEK2, and CTNNB1 were the hub genes in the identified network and might therefore be novel targets for EC treatment.
Recently, a prognostic model was constructed based on four miRNAs (miR-4758, miR-876, miR-142, and miR-190b) to evaluate the prognosis of EC patients [19]. However, the construction of a prognostic model for EC based on AS events has not been reported. To assess the diagnostic value of aberrant AS events in the prognosis of EC, we constructed prognostic models based on risk score (All) and seven types of AS event types (AA, AD, AP, AT, ES, ME, and RI). EC patients were divided into high-risk and low-risk groups according to the median number of each AS event, then we plotted Kaplan-Meier survival curves of risk score (All) and the risk scores of each of the seven types of AS events. The results showed that risk score (All) and seven types of AS events were associated with poor prognosis in EC patients (P<0.05). In addition, we also plotted ROC curves. The results show that the AUC of the risk score (All) is 0.819 followed by 0.805 for the risk score (ES) and 0.731 for the risk score (AT). Univariate Cox regression was used to analyze the effects of clinicopathological parameters and risk score (All) on the prognosis of EC patients, and multivariate Cox regression was performed to analyze the risk factors that were associated with prognosis. The results show that FIGO stage, cancer status, primary therapy outcome success, and risk score (All) were risk factors for EC patients. These results suggested that the risk score (All) model played an important role in the assessment of EC prognosis.
Abnormalities in splicing factor expression are also related to aberrant AS events. In this study, we obtained 10 splicing factors that are related to the OS of EC patients using univariate Cox regression. By plotting Kaplan-Meier survival curves, we found that only splicing factors YBX1, HNRNPDL, and HNRNPA1 were significantly associated with EC prognosis (P<0.05). The constructed splicing network analyzes the association between splicing factors and PASEs in EC. The results suggested that poor prognostic AS events were positively correlated with the expression of splicing factors, while favorable prognostic AS events was negatively correlated with the expression of splicing factors. A splicing factor can precisely regulate the pre-mRNA splicing process by binding to the splicing regulatory sequence elements of a particular gene [20]. Y-box binding protein (YBX1) is a DNA/RNA-binding protein that is involved in gene transcriptional regulation and pre-mRNA splicing [21][22][23]. Allemand E [24] found that YBX1 interacted with the active form of PP2Cgamma, which is involved in spliceosome assembly, thus regulating CD44 splicing. In breast cancer, it has been shown that there is a significant difference in the expression of YBX1 between triple-negative (TN) and ER+ subtypes [25]. Additionally, another study found that high expression of YBX1 in ER+ breast cancer patients was associated with poor prognosis [26]. It is well-known that EC is a hormone-dependent tumor. EC is mainly divided into estrogen-dependent (type I) and non-estrogen-dependent (type II), and studies have shown that an ER-and PRtype is associated with poor prognosis of EC patients [27]. Therefore, we hypothesized that the differential expression of splicing factor YBX1 was closely related to the different types of endometrial cancer. Interestingly, our study showed that YBX1 was both positively and negatively correlated with different splicing patterns of axonemal dynein heavy chain (DNAH9), which indicated that YBX1 might regulate the splicing of DNAH9. In summary, YBX1 might be diagnostic biomarker and therapeutic target for EC. HNRNPDL and HNRNPA1 are members of the hnRNP family of RNA-binding proteins that are involved in RNA maturation and translation and in pre-mRNA AS [28,29]. Studies have shown that the inhibition of HNRNPDL expression can result in increased expression of genes that are involved in cell proliferation and migration [28]. Consistent with this research, our results indicated that HNRNPDL served as a potent tumor suppressor gene. However, studies have shown that HNRNPA1 is highly expressed in gastric cancer cells, thus promoting gastric cancer proliferation, invasion, migration, and EMT [1,30]. HNRNPA1 regulated AS of the androgen receptor (AR) and gave rise to a 8 BioMed Research International splice variant AR-V7, which conferred drug resistance toward enzalutamide in cancer cells [31]. The aforementioned studies collectively indicated that high expression of HNRNPA1 could promote tumor progression. Our results indicated that high expression of the splicing factors HNRNPA1 was a protective factor for the prognosis of EC patients. We hypothesized that HNRNPA1 may have different expression patterns at the tissue level and might exert different biological functions. Further experimental verification of HNRNPA1 should be pursued in EC.

Conclusions
In summary, we analyzed the effects of abnormal AS events and splicing factors on the prognosis of EC through whole genome analysis and constructed a new clinical prognosis prediction model for EC based on risk score (All). In addition, we built a correlation network between splicing factors and PASEs, which was important to investigate the potential regulatory mechanisms between splicing factors and PASEs. Altogether, our study provided a new theoretical basis for prognostic evaluation and targeted therapy for EC.

Data Availability
The data used to support the findings of this study are included within the article.