A Novel 7-Methylguanosine (m7G)-Related Gene Signature for Overall Survival Prediction in Patient with Clear Cell Renal Cell Carcinoma

Clear cell renal cell carcinoma (ccRCC) is the most common pathology type of renal cancer that has an abysmal prognosis. Although a crucial role for 7-methylguanosine modification in cancer cell development has been reported, its role in ccRCC remains uncertain. This study was conducted to determine the efficacy of predictive biomarkers based on m7G-related genes in ccRCC. Firstly, we extracted clinical data and gene expression profiles of ccRCC patients from publicly accessible databases. It identified that 22 of the m7G-related 34 genes were related to overall survival, and 5 of the 22 genes were significantly expressed differently in tumor tissues. Based on Lasso regression analysis, five optimal genes (CYFIP2, EIF4A1, NUDT1, NUDT10, and NUDT4) were chosen to build a new predictive risk model in the TCGA cohort. Validation was carried out with the E-MTAB-1980 cohort. Then, a prognostic nomogram was erected, including the m7G-related gene risk score, age, histological grade, and stage status. Further studies and analysis showed that immune cell infiltration might be associated with the m7G-related risk genes. In addition, the relationship between gene expression and drug response was evaluated by the Pearson correlation test. Therefore, the risk signature with five selected m7G-related genes may be a promising prognostic biomarker and contribute to standardized prognostic assessment for ccRCC.


Introduction
Ninety percent of all renal malignancies are renal cell carcinomas (RCC), a prevalent malignancy in the urinary system [1]. ccRCC, the most common subtype of RCC, accounts for approximately 70% of RCC and is also one of the most aggressive subtypes [2]. Te early symptoms of renal clear cell carcinoma are not obvious, with only 6-10% of patients presenting with typical symptoms such as backache, an abdominal mass, or hematuria [3]. More than about 30% of ccRCC patients already have metastasis or local progression at the frst diagnosis [4], so it is necessary to diagnose ccRCC at an early stage. Although surgery is the best therapy method for ccRCC, nearly 30% of patients develop local recurrence and metastasis after surgically removing local ccRCC [5]. Despite the advent of targeted drugs and immune checkpoint inhibitors, patients with metastatic renal cancer still have a low overall survival rate. Hence, it is crucial to discover efective and novel biomarkers to provide novel molecular targets for adjuvant therapies and improve patient outcomes.
We frst downloaded mRNA expression and clinical data from public databases. Ten, selected fve optimal genes and constructed a predictive model. Te model's predictive value has been confrmed via various survival analyses. Moreover, we also validated the accuracy of the model in the E-MTAB-1980 cohorts. Furthermore, we set up a nomogram to enhance individualized prognosis assessment. Finally, we elucidate the underlying mechanism of ccRCC through functional analysis and explore the relationship between risk genes and chemosensitivity.

Data Acquisition.
We acquired the RNA sequence data of 539 tumors and 72 normal tissues of ccRCC and clinical information (n � 537) in the TCGA database. In addition, 101 patients in the E-MTAB-1980 cohort with clinicopathological information and RNA sequence data were also downloaded from the ArrayExpress database. Reference [20]. Our analysis excluded patients who had no follow-up days during the survival period. Tirty-four m7G-related genes were obtained from earlier studies [21], and gene sets: gomf-m7G-5-pppn-diphosphatase-activity, gomf-RNA-7methylguanosine-cap-binding, and gomf-RNA-cap-binding (https://www.gsea-msigdb.org/gsea/index.jsp). Basic clinical information is summarized in Table 1. A fow chart of our overall study is displayed in Figure 1.

Prognostic Model Construction and Validation.
Diferentially expressed genes (DEGs) in ccRCC normal samples and tumor samples in the TCGA cohort were processed using the "Limma" R package, and the flter conditions were set (fdrFilter <0.05, logFCflter >1). Ten, a univariate Cox analysis of survival outcomes was performed to screen the genes with prognostic values. Te "Venn" R package was used to select m7G-related DEGs with predictive values through DEGs and prognostically valuable genes. Furthermore, we used the "pheatmap" package to display heatmaps and the "Survival" package to show forest maps to represent diferences between groups. In the TCGA cohort, the "Survival" and the "glmnet" packages were applied for prognostic risk characteristics. Te risk score of each patient was estimated as follows: Risk score � (gene expression × corresponding coefcient. Ten, we calculated the risk score of each patient according to the above formula and divided all patients into high-risk and low-risk groups with the median score. "Survminer" and "Survival" of the R language were introduced to evaluate overall survival (OS) based on the Kaplan-Meier (K-M) method. Te "timeRoc" of the R package was applied to assess the accuracy of the related gene model. Te "pheatmap" of the R package was used to describe the risk scores and corresponding survival times. Te R packages "Rtsne" and "GGplot2" were used to investigate whether there was a diference in distribution between the two groups of at-risk patients. We further processed the same analyses to validate the model's predictive performance in the E-MTAB-1980 cohorts.

Te Construction of a Predictive Nomogram Using Risk Scores and Clinical Factors.
We performed the analysis of univariate and multivariate Cox regression further to examine the relationship between OS and clinical factors. Ten, establishing a nomogram including a risk score and the independent prognostic clinical characteristics, we can assess the probability of survival at 1, 3, and 5 years for ccRCC patients. In order to measure this nomogram's predictive precision, calibrate curve analysis was used.

Functional and Immune Infltration Enrichment Analyses.
Using the package ClusterProfler, we used GO and KEGG to analyze risk-related DEGs. Te ESTIMATE algorithm assessed the association between risk score and stromal score or immune score. "GSVA" and "GSEABASE" were utilized to assess immune functions and cells among riskrelated DEGs.

Generation of the Predictive Risk Model and Survival
Analyses in the TCGA Database. By lasso regression analysis, we discovered that fve m7G-related genes ft the model (Figure 3(a) and 3(b)). Among them, EIF4A1, NUDT1, and NUDT10 are high-risk genes, while CYFIP2 and NUDT4 are assigned to low-risk genes. In order to establish the prognosis model based on the expression levels of the selected genes, we computed risk scores using the following formula: risk score � (−0.532106574444019 * CYFIP2) + (0.25728474 2647963 * EIF4A1) + (0.210867606644546 * NUDT1) + (0.027 3145419183592 * NUDT10) + (−0.254461172952311 * NUDT 4). Five m7G-related genes model was identifed in the TCGA cohort. We categorized all patients based on the median risk scores into two groups: high-risk (n � 262) and low-risk (n � 263). (Figure 4(a)). Te results of PCA and t-SNE analysis of MRGs indicated that diferent risk groups of patients were entirely distributed diferently (Figures 4(b) and 4(c)). Furthermore, we noticed that the high-risk patients died more frequently than the low-risk patients when we took survival outcomes into consideration (Figure 4(d)). Terefore, the predictive capacity for the clinical prognosis of the model was evaluated using the K-M survival and timedependent ROC analyses, respectively. Te results are as follows: survival rates for patients assigned to the high-risk group were signifcantly worse than those in the low-risk group according to the K-M survival curve (P

Survival Analyses of the E-MTAB-1980 Cohorts for Verifcation.
We also calculated the risk score for the E-MTAB-1980 cohorts for validation (n � 101). In light of the median risk score, patients were divided into two sets: those at high-risk (n � 50) and those at low-risk (n � 51).
( Figure 5(a)). MRGs were reduced in dimension by PCA and T-SNE analysis, and then we found that diferent risk groups were distributed in two diferent directions (Figures 5(b) and 5(c)). Additionally, the death rates in the high-risk group were higher than those in the low-risk group. (Figure 5(d)).
In addition, we performed analyses of K-M survival and time-dependent ROC curves, which suggested that both were signifcant in assessing the prognostic value of the risk score model. Survival curves showed that patients in the high-risk group had signifcantly worse OS than patients in the low-risk group (P � 1.53e − 03) ( Figure 5

Te Independent Prognostic Evaluation of Five Genes.
We depicted to use a heatmap to illustrate the expression levels of the fve selected m7G genes and clinical features in the two low-high groups, which suggested the signifcant diferences in clinical features between the low and high groups ( Figure 6(a)). Toward a better understanding of the predictive value of risk scores and other clinical features in ccRCC patients, the TCGA cohort was subjected to both    univariate and multivariate Cox regression analyses (Figures 6(b) and 6(c)). We omitted the N-stage in our further analyses because most patients lacked clinical information on the N-stage. A prediction model was developed using four independent prognostic factors: age, risk score, grade, and stage. A nomogram predicting the OS probability over the course of 1, 3, and 5 years was estimated using the fve m7G-related genes ( Figure 6(d)) and assessed its predictive power (Figures 6(e) and 6(g)). Te calibration curves proved that the nomogram had a positive efect on prognosis.

Functional Analyses in TCGA Cohort.
To address the potential biological diferences, we conducted GO and KEGG enrichment analyses on the DEGs between the different risk groups. Te GO enrichment analysis of riskrelated DEGs revealed signifcant correlations between humoral immune response, immunoglobulin complex, and antigen binding (Figure 7(a), P adjust <0.05). Te KEGG pathway analysis demonstrated that DEGs had signifcantly enriched the following pathways: cytokine-cytokine receptor interaction, viral protein interaction with cytokine, and cytokine receptor and mineral absorption (Figure 7(b), P adjust <0.05).

Expressions of MRGs Correlated with Terapeutic
Response. We incorporated the data on cancer cell expression and the efectiveness of FDA-approved drugs in our study in order to investigate the clinical applications of the m7G-related gene signature. CYFIP2-overexpressed cancer cells were demonstrated to be more sensitive to the following drugs: nelarabine, melphalan, idarubicin, and so on. Te elevated NUDT4 expression levels in cancer cells increased the sensitivity to selumetinib, but drug resistance to everolimus was correlated with them. Overexpression of NUDT1 in cancer cells makes them more susceptible to carboplatin and triethylenemelamine, and cancer cells expressing higher levels of EIF4A1 were more sensitive to cladribine (Figure 9).

Discussion
Clear cell renal cell carcinoma is a fatal adult renal cancer [22], with a higher risk of recurrence and metastasis [23]. m7G and m6A are both common types of internal RNA modifcations. Interestingly, numerous studies have revealed that m6A RNA methylation regulators are connected with the development and progression of cancer in humans [24], such as bladder cancer [25], hepatocellular carcinoma [26], and breast cancer [27]. However, there are few similar studies on m7G. Terefore, this study explored the impact of m7G RNA methylation regulators on ccRCC and constructed an m7G-related signature and nomogram for prognostic prediction of ccRCC.
First, we fltered out the diferentially expressed m7Grelated genes with prognostic values. Ten we constructed the signature of fve m7G-related genes, including CYFIP2, EIF4A1, NUDT1, NUDT10, and NUDT4. Te expression of CYFIP2 and NUDT4 was downregulated, whereas EIF4A1, NUDT1, and NUDT10 expressions were upregulated in the ccRCC high-risk group. Te cytoplasmic FMR1-interacting protein (CYFIP) family was initially recognized as a protein associated with brain diseases [28]. Recent research demonstrated that Cyfp 2 might play a critical role and have potential functions in cancers, such as gastric cancer [29] and colorectal adenocarcinoma [30]. Previous studies indicated CYFIP2 was downregulated in ccRCC patients due to its DNA methylation, and it was involved in the metabolic reprogramming, the EMT pathway, and immune infltration processes in ccRCC by its DNA methylation [31]. Te eukaryotic initiation factor 4A (eIF4A) family plays a vital role in many cancers [32][33][34]. In human cancers like gastric cancers and breast cancers, ZBP1 expression is increased, which also correlates with poor prognoses [35,36]. EIF4A1 is a necessary component of EIF4F, which was considered a direct connection between essential steps in cancer development and translation initiation [37]. Moreover, accumulating studies have linked EIF4A1 to malignant phenotypes of tumor cells and tumor-specifc survival [38,39]. NUDT1, NUDT4, and NUDT10 belong to the nudix hydroxylase (NUDT) family. Te expression of NUDT1 and NUDT10 is increased and the expression of NUDT4 is decreased in ccRCC [40]. Up to now, the association between the NUDTfamily and tumorigenesis has been unclear, and studies about their role in ccRCC are rare [41][42][43]. Previous studies indicated that NUDT1 might provide diagnostic and prognostic value for KIRC [44]. A study revealed that NUDT1 is activated by HIF2αtranscription, thereby inhibiting oxidative stress and promoting the progression of ccRCC [45]. Te roles of NUDT4 and NUDT10 remain ambiguous in ccRCC tumor progression and metastasis.
An analysis using Cox regression displayed that risk score, age, stage, and grade could act as the independent predictive factors for patients with ccRCC. Tese independent predictive factors were used to build a nomogram to improve the ability to predict the prognosis for ccRCC. Te calibration curves demonstrated that the nomogram was fairly accurate in predicting the actual OS.
After performing GO, KEGG, and immune infltration analyses on diferential risk DEG genes, many biological pathways and functions were identifed as contributing to immunity. It might be reasonable to conclude that m7G modifcation is closely related to tumor immunity. Lines of evidence demonstrate that the progression of ccRCC is associated with the presence of diferent types of immune cells and immune functions [46]. It has been proven that RNA methylation contributes to tumor immunity [47]. But the relationship between m7G modifcation and tumor immunity is still developing, and a deeper exploration of the mechanism is needed.
To facilitate the clinical transformation of the fve genes model, we further identifed FDA-approved sensitive drugs that express high levels of CYFIP2, EIF4A1, NUDT1, NUDT4, and NUDT10 in multiple cancer cell types. A high expression of CYFIP2 is related to cancer cells' sensitivity to nelarabine, Melphalan, and so on. Nelarabine is a process for obtaining the deoxyguanosine analog 9-β-Darabinofuranosylguanine(Ara-G) [48,49]. Nelarabine has preferential cytotoxicity to T lymphoblastic cells through an  accumulation of Ara-GTP, thereby inhibiting ribonucleotide reductase and subsequent DNA synthesis [50,51].
According to the score comparisons of immune cells and immune function, we observed that high-risk patients possessed higher scores in helper T cells 2 (T2 cells), CD8+ T cells, follicular helper T cells (Tfh), helper T cells 1 (T1 cells), and T cell co-stimulation, suggesting a potential therapeutic agent in ccRCC with nelarabine. However, no literature has been found on the treatment of kidney cancer by nelarabine. Melphalan plus fudarabine is a feasible and efective RIC regimen for allogeneic SCT in metastatic RCC [52]. Increasing NUDT4 expression has been related to the resistance of tumor cells to everolimus. Nevertheless, it has been associated with increased sensitivity to selumetinib. Everolimus, a mammalian target of rapamycin (mTOR) inhibitor, is the standard second or third line therapy in patients with ccRCC [53]. SELUMETINIB inhibits MEK1/2, a mitogen-activated protein kinase, which can abrogate   Figure 8: Results of ssGSEA immune infltration in the TCGA cohort. Te association between risk score and (a) immune score or (b) stromal score. (c) 16 immune cell ssGSEA scores between diferent risk groups. (d) 13 immune-related functional ssGSEA scores between diferent risk groups. Adjusted P values are: ns, not signifcant; * P < 0.05, * * P < 0.01, and * * * P < 0.001. resistance, leading to improved antitumor efcacy in renal cell carcinoma [54]. NUDT1 expression was positively correlated with carboplatin and triethylenemelamine sensitivity in cancer cells. Te increased expression of EIF4A1 was connected with higher sensitivity to cladribine in cancer cells. Carboplatin, triethylenemelamine, and cladribine are some of the most widely used cancer chemotherapeutics, but ccRCC is not sensitive to chemotherapy [55].
Currently, a large amount of literature on gene markers predicting the prognosis of ccRCC patients can be found. Some studies have constructed models without internal or external validation [56,57] or only constructed prediction models without nomogram diagrams [57,58]. Wenwei Chen and colleagues included age, sex, grade, and stage in univariate regression analysis but did not include TNM clinical indicators [59]. Zhao et al. constructed a nomograph diagram, including age, grade, stage, and risk score. Risk-score is classifed as low and high rather than specifc numerical values [56]. Compared with other gene predictive models of ccRCC patients, our research makes up for these defciencies and has specifc innovations and advantages. First of all, it is undeniable that our study is the frst comprehensive study of m7G-related mRNA in ccRCC. In addition, the nomogram diagrams contain risk scores and clinical indicators that can further efectively predict the outcome of patients with ccRCC. Furthermore, Zhao et al. only evaluated the accuracy of the nomogram diagram, but not the accuracy of the gene-related model [56]. However, we evaluated the accuracy of the model using ROC. Te closer the AUC is to 1.0, the higher the authenticity of the predictive model. Wang et al. also estimated the model's accuracy using ROC, and the risk AUC was 0.704 [60]. Our study demonstrates that the risk AUC is 0.748 in the TCGA cohort and the risk AUC is 0.811 in the E-MTAB-1980 cohort. It shows that our model is more accurate and efective. Tis study still has some faws in its process and methodology. Firstly, the ccRCC cohort is relatively small, and the samples lack complete clinical information. Secondly, how those m7G-related genes interact with each other and which pathway is involved in ccRCC needs further study. Finally, the molecular mechanism of m7Grelated genes in ccRCC remains to be conducted both in internal and external experiments.

Conclusion
Collectively, we constructed fve m7G-related gene predictive models, which performed well in predicting survival prognosis, immune microenvironment, and sensitivity to drugs in ccRCC. Ten, a prognostic nomogram for survival prediction based on fve m7G-related genes could improve survival estimates for ccRCC patients. However, the potential mechanism of m7G-related genes in ccRCC remains unclear and needs further exploration.

Conflicts of Interest
In this work, none of the authors has conficts of interest.