Identification of Biomarkers to Construct a Competing Endogenous RNA Network and Establishment of a Genomic-Clinicopathologic Nomogram to Predict Survival for Children with Rhabdoid Tumors of the Kidney

Rhabdoid tumor of the kidney (RTK) is a rare and severely malignant tumor occurring in infancy and early childhood, with the overall outcomes remain poor. Neither gene regulatory networks nor biomarkers to predict the prognostic outcomes have been elucidated in RTK. In this study, RNA sequencing data were obtained to identify differentially expressed messenger RNAs (mRNAs), long noncoding RNAs (lncRNAs), and microRNAs (miRNAs) between RTK samples and normal samples. A total of 4217 mRNAs, 284 lncRNAs, and 286 miRNAs were screened out. Of those, 103 mRNAs, 80 lncRNAs, and 45 miRNAs were identified for a competing endogenous RNA (ceRNA) regulatory network, in which three significant modules were identified. A protein-protein interaction (PPI) network was constructed, and the hub-gene cluster consisted of four core genes (EXOSC2, PAK1IP1, WDR43, and POLR1D) was selected. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were also performed to analyze the functional characteristics of differentially expressed mRNAs. Subsequently, among 211 mRNAs, 8 lncRNAs, and 12 miRNAs associated with overall survival (OS) obtained by univariate Cox analysis, 5 mRNAs, 7 lncRNAs, and 7 miRNAs were identified and the risk score formulas were constructed correspondingly using the least absolute shrinkage and selection operator (LASSO) Cox regression model analysis. The log-rank tests and Kaplan-Meier analyses were performed to confirm the predictive value of the risk scores for OS in RTK patients. A genomic-clinicopathologic nomogram integrating the stage and risk scores based on RNAs was established and demonstrated high predictive accuracy and clinical value, which was validated through calibration curves, time-dependent receiver operating characteristic (ROC) curve analyses, and decision curve analysis (DCA). In conclusion, this study not only provided potential insights into the mechanisms underlying RTK, but also presented a practicable tool for predicting the prognosis in children with RTK.


Introduction
As rare and extremely aggressive malignancies, rhabdoid tumors primarily affect infants and young children. These tumors predominantly arise in the kidney and the central nervous system [1,2]. Rhabdoid tumor of the kidney (RTK) generally metastasizes to the brain and lung, and patients with RTK continue to have a poor prognosis [1,3,4]. The loss of function of the SMARCB1 (INI1/SNF5/-BAF47) gene is the common genetic abnormality in rhabdoid tumors, regardless of the anatomic origin [5]. However, there are few reports on the development of biomarkers to predict the prognostic outcomes in children with RTK.
In recent years, advanced RNA sequencing analysis gained a lot of attention and revealed the complexity of the human genome [6]. Under such circumstances, competing endogenous RNA (ceRNA) hypothesis has stated that long noncoding RNAs (lncRNAs) can act as microRNAs (miRNAs) sponges and inhibit miRNAs functions by sharing miRNA response elements, thereby indirectly regulating messenger RNAs (mRNAs) expression levels [7,8]. Some previous studies have explored the ceRNA regulatory network associated with tumor progression [9][10][11]. However, the specific ceRNA regulatory network remains unelucidated in RTK.
In the present study, RNA sequencing data were used to identify differentially expressed mRNAs, lncRNAs, and miR-NAs between RTK samples and normal samples. Subsequently, a series of analyses, including the ceRNA network construction, protein-protein interaction (PPI) analyses, and functional enrichment analyses, were performed. Furthermore, risk scores based on mRNAs, lncRNAs, and miR-NAs to predict the outcomes in patients with RTK were calculated, and a genomic-clinicopathologic nomogram, integrating the risk scores and traditional clinicopathological factors, was developed and validated. All of these might not only provide insights into the molecular mechanisms that participate in the progression and tumorigenesis of RTK but also provide an efficient method based on biomarkers to predict the outcomes in children with RTK.

Study Population and RNA Sequencing Data Processing.
Expression profiles (mRNA-Seq and miRNA-Seq), clinical characteristics, and survival data were downloaded for subsequent analysis from the National Cancer Institute (NCI) Genomic Data Commons (GDC) Data Portal (https://portal .gdc.cancer.gov/repository), Therapeutically Applicable Research to Generate Effective Treatments (TARGET)   Program in April 2020, through structured queries [12]. mRNAs and lncRNAs expression data were acquired from 6 normal samples and 57 RTK samples, and miRNAs expression data were acquired from 6 normal samples and 58 RTK samples. ENSEMBL (htps://http://www.ensembl.org/) was used to annotate mRNAs and lncRNAs [13], and miRBase 21 (http://http://www.mirbase.org/blog/2014/06/mirbase-21-finally-arrives/) was used to annotate mature miRNAs with arms features through the R software (version 3.6.2) [14][15][16][17][18][19]. No ethical approval was required for the study as all patient data were acquired from the GDC Data Portal. A flow chart of the analysis procedure is shown in Figure 1.

2.2.
Identification of DEMs, DELs, and DEMis. At first, RNAs (mRNAs, lncRNAs, and miRNAs) that did not have a worthwhile number of reads in any samples were filtered out. To identify the differentially expressed mRNAs (DEMs), lncRNAs (DELs), and miRNAs (DEMis) between RTK samples and normal samples, the "edgeR" package (version 3.28.1) was used to analyze high-throughput sequencing data on differentially expressed RNAs [20,21]. The screening conditions for RNAs (mRNAs, lncRNAs, and miRNAs) differential expression were |log 2 fold change | >1, and a false discovery rate (FDR) or adjusted P value < 0.05. These differentially expressed RNAs were further subjected to the ceRNA network construction, gene enrichment analysis, and RNAbased prognostic model construction combined with clinicopathologic features.
2.3. Construction of ceRNA Network. Basing on DEMs, DELs, and DEMis, a ceRNA network was built through the "GDCRNATools" package (version 3.28.1) [22]. The major criteria for building ceRNA networks included the following: (1) The mRNAs and lncRNAs must share a significant          [22]. Following the pipelines of GDCRNA-Tools, miRcode was chosen to collect predicted and experimentally validated miRNA-lncRNA interactions [23], and StarBase v2.0 was used to predict miRNA-mRNA interactions [24]. Also, the P value of both the hypergeometric test and Pearson correlation analysis <0.01 was considered statistically significant. Visualization of the ceRNA network was performed by Cytoscape software (version 3.7.1) [25], and the subnetwork was generated using the Molecular Complex Detection (MCODE) plug-in Cytoscape with the default cri-teria (degree cutoff = 2, node score cutoff = 0:2, Max depth = 100, and k score = 2) [26].

PPI Network
Construction. The construction of PPI network based on the DEMs involved in the ceRNA network was performed through the Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org) online database [27]. Required interaction score > 0:4 was considered statistically significant. Subsequently, the data from STRING were downloaded to model the PPI network through Cytoscape software. MCODE plug-in Cytoscape with the default criteria was adopted to identify densely connected regions in the PPI network.    7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 [28,29] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEMs [30][31][32] were performed through the "clusterProfiler" R package (v3.6.0) [33]. The P value < 0.05 was considered statistically significant.

Construction of Risk Scores Based on RNAs for Survival
Analyses. Due to the lack of important clinical information such as the survival time of some patients, clinical data of 50 RTK patients were finally collected. In this study, the clinical features include age at diagnosis, gender, cooperative group protocol, and stage ( Table 1). The normalization of the high-throughput sequencing raw data had already been carried out by "edgeR", and the normalized expression data were converted to log 2 counts per million (log 2 CPM) values using the "cpm" function in edgeR. The "survival" package in R was used to identify the prognosis-associated RNAs (mRNAs, lncRNAs, and miRNAs) by univariate Cox regression analysis, and P < 0:05 was used as the cutoff criterion [34]. Subsequently, through the "glmnet" package (version 3.0-2) in R, the least absolute shrinkage and selection operator (LASSO) method was adopted to identify the key RNAs from RNAs which were significant in univariate Cox analysis [35,36]. Ten-time cross-validations were utilized to detect the best penalty parameter lambda. Then, the risk scores were calculated based on the formulas generated through the LASSO Cox regression model, respectively. The optimal cutoff values for risk scores were generated through the "surv_ cutpoint" function in the "survminer" R package (version 0.4.6). Based on cutoff values, RTK patients in the data set were divided into low-and high-risk groups correspond-ingly. Differences in overall survival (OS) between the lowand high-risk groups were compared via the log-rank test and Kaplan-Meier analysis. The survival curves were also constructed through the "survival" package. A P < 0:05 denoted statistical significance.

Establishment and
Validation of a Genomic-Clinicopathologic Nomogram. The Cox regression analysis was performed to determine whether the risk scores based on RNAs and the clinicopathologic features could be predictors associated with OS for RTK patients. Subsequently, based on the results of Cox regression analysis, a predictive genomic-clinicopathologic nomogram was generated to predict 1-, 3-, and 5-year OS through the "rms" package (version 5.1-4) in R. Furthermore, calibration curves were achieved to visualize the consistency between the actual probability of OS and the nomogram-predicted probability of OS. Moreover, time-dependent receiver operating characteristic (ROC) curve analyses were performed to assess the accuracy by calculating the area under the curve (AUC) through the "time-ROC" package (version 0.4) in R [37]. Additionally, decision curve analyses (DCA) were conducted to examine the clinical value via "stdca.R" statistical code in R [38].

PPI Network Construction.
The PPI network of the 103 DEMs including 52 upregulated and 51 downregulated involved in the ceRNA network was established (Figure 4(a)). Also following MCODE analysis, one hubgene cluster consisting of four core genes (EXOSC2, PAK1IP1, WDR43, and POLR1D) was identified (Figure 4(b)). These four key genes, which were all upregulated interestingly, may play an essential role in RTK progression.

20
BioMed Research International the patients were divided into high-and low-groups, respectively, based on the optimal cutoff values for risk scores (Figure 7, Table 2). Patients in each high-risk group had a significantly worse OS than those in the corresponding low-risk group (Figure 8(a), Table 3). Notably, the subgroup analyses based on stage revealed that the prognostic value of risk scores was independent of stage, except for miRNAs risk score in stage I-II subgroup (P = 0:1444) (Figures 8(b) and 8(c), Table 3).

Establishment and Validation of a Genomic-Clinicopathologic Nomogram.
To more accurately determine the prognostic value of risk scores based on RNAs (mRNAs, lncRNAs, and miRNAs), univariate and multivariate Cox regression analyses were performed. According to the results from univariate Cox analysis, the stage and risk scores were all significantly associated with OS for RTK patients ( Table 4). The subsequent multivariate Cox analysis further demonstrated that the risk scores based on mRNAs and miR-NAs remained powerful and independent prognostic factors (P = 0:01078 and P = 0:00816) ( Table 4). Through integrating the stage and risk scores, a genomic-clinicopathologic predictive nomogram (combined model) was developed (Figure 9(a)). The calibration curves also showed high consistency between the actual proportion of 1-, 3-, and 5-year OS and the nomogram-predicted probability (Figures 9(b)-9(d)). Furthermore, the predictive accuracy of nomogram was evaluated by time-dependent ROC curves among three models including stage, RNAs combined, and RNAs and stage combined. The results illustrated the RNAs and stage combined model had a significantly greater AUC than that of the stage model (1-year, P = 1:05e − 6; 3-year, P = 5:17e − 5; 5-year P = 0:0117) (Figures 9(e)-9(g), Table 5, Table 6). Additionally, DCA curves illustrated that the net benefit for the combined model was higher than that for the stage model, verifying the clinical value of the genomicclinicopathologic nomogram (Figures 9(h)-9(j)).

Discussion
RTK is a rare, highly aggressive type of cancer accounting for only 2% of all renal tumors in childhood [39].      23 BioMed Research International [5], potential biomarkers to predict the prognostic outcomes in children with RTK have rarely been identified.
Lately, the essential roles of the ceRNA network in gene expression regulation have aroused interest of researchers. In the present study, we first identified differentially expressed mRNAs, lncRNAs, and miRNAs in patients with RTK. Then, 103 DEMs, 80 DELs, and 45 DEMis were selected to construct a ceRNA regulatory network, and three significant modules were revealed through cluster analyses. Subsequently, we constructed the PPI network of the 103 DEMs including 52 upregulated and 51 downregulated involved in the ceRNA network, and a hub-gene cluster consisting of four core genes (EXOSC2, PAK1IP1, WDR43, and POLR1D) was identified. GO and KEGG pathway

24
BioMed Research International enrichment analyses also revealed the functional characteristics of differentially expressed mRNAs, which demonstrated that the variations in biological processes, cellular components, molecular functions, and pathways may play an essential role in the pathogenic mechanism of RTK, necessitating further research to verify. As far as we know, biomarker-based prognostic models or combined models built for predicting the outcomes of children with RTK have not been reported. In the present study, reasonable normalization of the raw data was achieved to eliminate the influence of differences in the platforms, prior to screening the prognosis-associated RNAs (mRNAs, lncRNAs, and miRNAs) for modeling. Moreover, the procedures of identifying the optimal RNAs for modeling were precise, such as the selection of the differentially expressed RNAs at step one, the selection through univariate Cox regression analysis at step two, and the selection through LASSO analysis at step three. Ultimately, 5 mRNAs, 7 lncRNAs, and 7 miRNAs were identified and the risk score formulas were constructed correspondingly. The following evaluations further confirmed the predictive value of risk scores for OS in children with RTK.
Nomograms, which are commonly used tools to estimate prognosis in oncology and medicine, can generate an individual probability of clinical events by incorporating multiple prognostic characteristics [40]. In the present study, a genomic-clinicopathologic nomogram was constructed by integrating the stage and risk scores based on RNAs. The combined model demonstrated significantly higher predictive accuracy than that of the stage model, which was further validated through calibration curves and time-dependent ROC curve analyses. The following DCA also verified the higher clinical value of the combined model compared with that of the stage model. However, there were still several limitations to the present study. First of all, the sample size of RTK was small because RTK is a rare tumor. Second, for the same reason, neither an external validation cohort for the prognostic model nor the collection of RTK samples to verify the results of bioinformatics through RT-qPCR was available, and all the data we analyzed were collected from the GDC Data Portal, which might result in some bias. Finally, the results were not further validated in cell lines.

Conclusions
In summary, this study for the first time identified a ceRNA network that potentially regulated RTK progression, revealed probable genes and pathways associated with RTK, and constructed a genomic-clinicopathologic nomogram by integrating the stage and RNAs-based prognostic index, which might present a practicable tool for predicting the prognosis in children with RTK.

Data Availability
The data used to support the findings of this study are included in the article. AUC: the area under the time-dependent receiver operating characteristic curve; CI: confidence interval.