Identification and Validation of a m5C RNA Modification-Related Gene Signature for Predicting Prognosis and Immunotherapeutic Efficiency of Gastric Cancer

Background 5-methylcytosine (m5C) is a major site of RNA methylation modification, and its abnormal modification is associated with the development of gastric cancer (GC). This study aimed to explore the value of m5C-related genes on the prognosis of GC patients through bioinformatics. Methods First, m5C-related genes were obtained by nonnegative matrix factorization (NMF) analysis and differentially expressed analysis. The m5C-related model was established and validated in distinct datasets. Moreover, a differential analysis of risk scores according to clinical characteristics was performed. The enrichment analysis was carried out to elucidate the underlying molecular mechanisms. Furthermore, we calculated the differences in immunotherapy and chemotherapy sensitivity between the high- and low-risk groups. Finally, we validated the expression levels of identified model genes by quantitative real-time polymerase chain reaction (qRT-PCR). Results A total of five m5C-related subtypes of GC patients in the TCGA database were identified. The m5C-related model was constructed based on APOD, ASCL2, MFAP2, and CREB3L3. Functional enrichment revealed that the m5C-related model might involve in the cell cycle and cell adhesion. Moreover, the high-risk group had a higher abundance of stromal and immune cells in malignant tumor tissues and a lower tumor purity than the low-risk group. The patients in the high-risk group were more sensitive to chemotherapy and had better sensitivity to CTLA4 inhibitors. Furthermore, qRT-PCR results from our specimens verified an over-expression of ASCL2, CREB3L3, and MFAP2 in the cancer cells compared with the normal cells. Conclusion A total of five GC subtypes were identified, and a risk model was constructed based on m5C modification.


Introduction
GC is the ffth most common malignancy worldwide and the third leading cause of global cancer-related mortality [1,2]. Although clinical and surgical conditions improved significantly, the 5-year survival rate for GC patients remains very low, as more than 80% of patients are diagnosed at an advanced stage [3,4]. Now, surgical resection is still the most efective treatment for early GC. Besides, chemotherapy, radiotherapy, immunotherapy, and molecular targeted therapy also play essential roles in the prognosis for GC [5,6]. However, the mechanism of GC progression and metastasis is still unclear, and the prognosis leading to metastasis, recurrence, and advanced GC is not yet satisfactory. Terefore, it is urgent to study the mechanism of GC progression to develop new therapeutic strategies.
RNA modifcations, such as N6 methyladenosine (m6A), play a visible role in epigenetic gene regulation and cell function and are closely related to many human diseases such as cancer, neurological diseases, and immune disorders [7][8][9][10][11]. As another important RNA modifcation, m5C has attracted more and more attention, and like m6A, m5C has its methyltransferase, demethylase, and binding proteins [12]. Members of the NOP2/Sun domain family 1-7 (NSUN1-7) and DNA methyltransferase (DNMT) homolog DNMT2 act as m5C writers in mammals and catalyze methylation at the C5 site of RNA [13,14]. In contrast, TET2 oxidizes m5C to 5-hmC and then removes the methyl group [15,16]. Subsequently, the Aly/REF output factor (ALYREF) and Y-box binding protein 1 (YBX1), which are characterized by readers, recognize and bind the m5C motif and then perform diferent biological functions [17,18]. In addition, these regulatory factors are known to be synergistically involved in multiple tumor progressions with m5C modifcation. Chen et al. [19] found TRDMT1, an RNA methyltransferase known to methylate tRNA, is a writer of RNA m5C at sites of DNA damage and contributes to the resistance of cancer cells to radiotherapy and PARP inhibitors. Breast tumors expressing low levels of TRDMT1 are more responsive to radiotherapy. Du et al. [20] analyzed the clinical relevance of m5C regulators in pan-cancer. Liu et al. [21] wrote that the RNA m5C modifcation and its regulators have been shown to be involved in the progression of various cancers, including hepatocellular carcinoma, bladder cancer, glioblastoma multiforme, breast cancer, and head and neck carcinoma, indicating that RNA m5C might play an important role in tumorigenesis and progression.
In the present study, the efect of m5C on the prognosis of GC patients was explored by bioinformatics methods, which identifed fve m5C-related subtypes and mined four m5C-related genes as biomarkers, and based on the relationship of the prognosis model, patient survival, therapies, and the role of m5C in GC were demonstrated roundly.

Data
Source. GC-related datasets were obtained from Te Cancer Genome Atlas (TCGA) database (https://portal. gdc.cancer.gov/) and the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/gds). Te TCGA-GC dataset contains 32 normal cases and 373 cancer cases. Te 345 cancer samples that have complete survival data were split into a training set (242 cases) and a testing set (103 cases) according to a ratio of 7 : 3. Te t-test was used to compare the diferent characteristics between patients in training and testing sets. Te results are shown in Table 1. Moreover, the GSE15459 dataset containing 192 cancer cases was obtained from the GEO database as a validation set. Te 13 m5C RNA regulators (NOP2, NSUN2, NSUN3, NSUN4, NSUN5, NSUN6, NSUN7, DNMT1, TRDMT1, DNMT3A, DNMT3B, TET2, and ALYREF) were obtained from the previous literature [22].

Identifcation of m5C-Related
Subtypes. 373 GC samples from the TCGA database and the expression of 13 m5C RNA genes from the previous study were used for the nonnegative matrix factorization (NMF) analysis (R language, Version 0.23.0) [23] to identify m5C-related subtypes for GC patients. Ten, overall survival (OS) and disease-specifc survival (DSS) analyses of diferent subtypes were performed to screen the two subtypes with the most signifcant prognostic diferences. Tese two subtypes were then used in subsequent analyses. Moreover, the clinical characteristics of the two subtypes were analyzed, and the results were visualized by ggplot2 (R package, Version 3.3.5) [24]. Te immune cell infltration of the two subtypes was calculated using the ssGSEA algorithm in the GSVA (R package) based on 24 immune cell types [25] and the MCPcounter algorithm by immunedeconv (R package, Version 2.0.4) based on 8 immune cell types and 2 stromal cell types.

Construction and Validation of an m5C-Related Model.
Te edgeR (R package) (Version 4.1) is used to perform diferential expression analysis [26,27]. P < 0.05 and | log2FC| > 1 were considered as a diference. Te DEGs between the two subtypes with the most signifcant diferences were detected, and the DEGs between the GC samples (n � 373) and para-cancerous samples (n � 32) in the TCGA dataset were also screened. By overlapping DEGs selected above, the DEm5CRGs were fnally screened. Ten, Cox regression analyses and the LASSO algorithm were adopted to construct the risk signature. Te threshold was P < 0.05. Te risk score of each sample was calculated by the following formula: risk score � h0(t) × exp (β 1 X 1 + β 2 X 2 + ... β n X n ). Te h0(t) was the baseline hazard function, and the β was the regression coefcient. GC patients in the training set were split into high-and low-risk groups based on the median risk score. At last, R package Survminer and survival ROC were used to plot the Kaplan-Meier (KM) and ROC curves to evaluate the risk model in the training set, and then the testing and validation sets were used to validate [28,29].

Diferential Analysis of Risk Values in Clinical
Characteristics. Te stratifcation survival analysis was performed to confrm whether the risk model could apply in diferent clinicopathological characteristics (including age, gender, radiotherapy, and chemotherapy). Meanwhile, the clinicopathological data were involved in variance analysis to investigate diferences between clinicopathological features and risk values.

Construction of a Nomogram.
Te risk score and clinicopathological data were merged into Cox regression analyses to detect the independent prognosis factors. Ten, the selected independent prognostic factors were integrated to establish a nomogram. Furthermore, the calibration curves and the decision curve analysis (DCA) were plotted to assess the nomogram.
2.6. Diference Analysis and GSEA. Te DEGs between highand low-risk groups were detected by the "limma" package. P < 0.05 and |log 2 FC | > 1 were considered as a diference. R package clusterProfler (Version 4.0.2) was selected to perform GO enrichment and KEGG pathway analyses on these DEGs. Moreover, to further explore the related signaling pathways and potential biological mechanisms, R package clusterProfler (Version 3.18.1) [30] was adopted to perform GSEA enrichment analysis. Te signifcance 2 Journal of Oncology thresholds for GSEA were |NES| > 1, q < 0.25, and NOM P < 0.05.

Analysis of Immunotherapy and Chemotherapy.
Te immune cell infltration situations of the sample are inferred by the ESTIMATE and the CIBERSORT algorithms, and diferences were analyzed between the high-and low-risk groups from the training set [31]. Te tumor purity of the two groups was assessed using ABSOLUTE software. Te expression of targeted immune checkpoints and the sensitivity to immunotherapy were analyzed in the two groups, and the prediction of susceptibility to PD-1 and CTLA4 inhibitors was analyzed in the two groups using the SubMap algorithm. We used oncoPredict (Version 0.2) in R language to analyze the sensitivity of commonly used chemotherapy drugs of GC samples [32].

Identifcation of m5C-Related
Subtypes. NMF analysis fnally identifed fve m5C-related subtypes (Figures S1 and 1(a)). OS and DSS analyses showed that survival diferences between group3 and group4 were the most signifcant (P < 0.05; Figure S2). Te distribution features of the clinical characteristics and the infltration of immune cell types in group3 and group4 are shown (Figures 1(b) and 1(c)). Te two groups were quite diferent in 5-cell concentrations ( Figure 1(d)). Nine 5mC genes were signifcantly diferent between them (Figure 1(e)).
Te risk scores, patient survival status, survival time, and gene expression pattern are shown in Figures S4(a)-S4(c).

Diferential Analysis of Risk Values.
To implore the clinicopathological characteristics and the survival of cases in the two groups, a hierarchical analysis of the km curve in the TCGA cases was performed. High score patients younger than 60 years old or whose pathological stage were T3 or T4 had a worse prognosis ( Figure 3). Diferences analysis between clinicopathological features and risk values showed that M0 and M1 and Stage II, Stage III, and Stage IV had signifcant diferences ( Figure S5).

Construction of a Nomogram.
Score and treatment type were associated with GC cases prognosis and were the factor that were independent prognostic (Figures 4(a) and 4(b)). Ten, the nomogram model was constructed to predict the survival of GC patients (Figure 4(c)). Te calibration curves (C-index � 0.6547) and DCA curves of the nomogram were also plotted (Figures 4(d) and 4(e)).  3.6. Analysis of Immunotherapy and Chemotherapy. Te stromal score, the immune score, and the ESTIMATE composite score were obtained, and there were diferences in the ESTIMATE composite score and the stromal score between high-and low-risk groups (Figure 6(a); P < 0.0001). Te high-risk group has lower tumor purity (Figure 6(b)).
Tere are eight immune cell (macrophages M1, mast cells resting, etc.) abundances that difer between high-and lowrisk groups (Figure 6(c)). Te results of the correlation analysis between the risk score and immune cell abundance suggest that the abundance of monocytes, mast cells resting, and T cells CD4 memory resting was positively correlated with a risk score and the abundance of NK cells resting, T cells follicular helper, and T cells CD4 memory activated was negatively correlated with the risk score ( Figure 6(d)). Te immune checkpoint PD-L1 expression levels differed signifcantly between high-and low-risk groups (Figure 7(a)). Te expression of routine immune checkpoints in high-and low-risk groups is shown in supplementary table 2. Te high-risk group was more sensitive to the overall immune checkpoint and had better sensitivity to CTLA4 inhibitors (Figure 7(b)).
Among 198 commonly used drugs for the treatment of GC, 182 species showed signifcant diferences between high-and low-risk groups, and most high-risk groups were more sensitive to these drugs than low-risk groups (Figure 7(c)).

Expression Validation of Prognostic lncRNAs.
Te qRT-PCR results from our specimens verifed an over-expression of ASCL2, CREB3L3, and MFAP2 in GC cells compared with the human immortalized normal gastric cells (Figure 8).

Discussion
It is well known that GC is one of the leading causes of cancer-related deaths globally [33]. Although signifcant advancements in the treatments for GC have been acquired in recent years, the overall prognosis of GC patients is still poor [34]. m5C, in which the methyl group is attached to the ffth position of the cytosine ring, is catalyzed by RNA methyltransferase. m5C modifcation has also been closely related to cancer progression [35]. Meanwhile, bioinformatic studies have shown that m5C regulators could be    Journal of Oncology used as a prognostic factor for lung adenocarcinoma (LUAD), head and neck squamous cell carcinoma (NHSCC), and hepatocellular carcinoma (HCC) [36][37][38].
With the development of molecular biology and clinical treatment with precision therapy, researchers have been exploring new prognostic markers of GC at the molecular level. Zhu et al. [3] revealed the expression, prognostic value, potential functional networks, protein interactions, and immune infltration of MTFR2 (mitochondrial fssion regulator 2) in GC, concluding that MTFR2 may be a potential prognostic marker and therapeutic target for GC patients. Zhu et al. [34] explored the association between VEGFR-2 and the prognosis of GC. Tey showed that the high expression of VEGFR-2 as well as the VEGFR-2 rs1870377 A > T genetic polymorphism may be prognostic factors for patients with resected GC. Zu et al. [39] considered that the preoperative prealbumin level was an independent prognostic factor for GC patients, and it is essential to predict the prognosis of patients with GC. Here, we established a prognosis model for GC based on fve m5C-related subtypes and four DEm5CRGs (APOD, ASCL2, MFAP2, and CREB3L3) as biomarkers, employing 405 GC samples about second-generation sequencing data, clinical information, and copy gene variation information from the TCGA database, and at last, verifying the four biomarkers in GC cells compared with the human immortalized normal gastric cells by the RT-qPCR method, which is usually missing in bioinformatic analysis.
Te four m5C-related genes based on 2 m5C-related subtypes afect the occurrence and development of cancer. Firstly, APOD (apolipoprotein D) is a lipocalin that participates in various cellular processes, including       Journal of Oncology cytoprotection, and is a biomarker positively correlated with the prognosis of breast and prostate cancer [40]. APOD was also reported to be the prognostic factor of GC. Patients with high expression of APOD might have a shorter OS time, correlating with worse prognosis [41]. Second, ASCL2 (Achaete-scute homolog 2) is an essential helix-loop-helix transcription factor and a cancer stem cell marker, and specifc reports have revealed that ASCL2 promotes cell proliferation and migration in colon cancer [42,43]. In the meantime, ASCL2 also serves an essential role in the growth of GC. It was able to downregulate the expression level of miR223, contribute to EMT (the epithelial-mesenchymal transition), and promote gastric tumor metastasis, which indicated that ASCL2 might serve as a therapeutic target in the treatment of GC [44]. Tird, MFAP2 (microfbrilassociated protein 2) plays a vital role in the regulation of the integrin signal pathway in cancer cell-ECM (extracellular matrix) interaction. Te intracellular form of MFAP2 can induce the transcription of integrin α4 in human osteosarcoma cell line SAOS-2 in vascular development [45]. Scholars also validated that MFAP2 was up-regulated in GC tissue, and it was implicated in the malignant behavior of GC cells, such as proliferation, migration, and invasion [46]. Te fourth biomarker is CREB3L3, a member of the basic leucine zipper family and the AMP-dependent transcription factor family. It can link to acute infammatory response and hepatocellular carcinoma [47]. Dewaele et al. illustrated that EWSR1-CREB3L3 gene fusion is associated with a mesenteric sclerosing epithelioid fbrosarcoma [48]. In GC, CREB3L3 is related to the OS derived from univariate and multivariate Cox regression analysis and is highly expressed in cancer tissues [49]. In a word, the four biomarkers can afect the occurrence and development of cancer in various degrees, including GC, and the guiding signifcance is great to analyze the relationship between the prognosis model and the survival of GC patients.  Moreover, GO and KEGG function analysis indicated that DEGs among the four gene biomarkers were closely correlated with biological processes and signaling pathways, such as ECM organization, extracellular structure organization, external encapsulating structure organization, complement and coagulation cascades, vascular smooth muscle contraction, and focal adhesion.
Te m5C locus has been reported to be involved in a variety of biological processes, including structural stability and metabolism of RNA, tRNA recognition, and stress response [8]. A recent study has shown that in human urothelial cell carcinoma of the bladder, m5C regulators bound to the 3′UTR of oncogene mRNA, stabilizing its expression, thereby promoting cancer progression [50]. Yang et al. [17] found that NSUN2 (NOP2/Sun domain family, member 2; MYC-induced SUN domain-containing protein, Misu) was the main enzyme catalyzing m5C formation, while the Aly/ REF export factor (ALYREF, an mRNA transport adaptor, also named THOC4) functioned as a specifc mRNA m5Cbinding protein regulating mRNA export. In addition, p57 Kip2 was an important downstream gene regulated by NSUN2 in GC. p57 Kip2 is the recently found CDK inhibitor of the Cip/Kip family and has been involved in many biological processes, including cell cycle control, diferentiation, apoptosis, tumorigenesis, and development, which is in accordance with GO terms and KEGG pathways of 4 m5Crelated genes [51,52]. Previous studies found that the expression level of NSUN2 was negatively correlated with p57 Kip2 , and the ability of NSUN2 to knockdown cells proliferation was enhanced after p57 Kip2 silencing in GC. It revealed another regulatory mechanism that NSUN2 plays an oncogenic role by repressing p57 Kip2 expression in GC. Te cause may be NSUN2 destabilizing the p57 Kip2 mRNA relying on its methyltransferase activity and m5C modifcations in the 3′-untranslated region (UTR) of p57 Kip2 mRNA [53].
It has been reported that m5C modifcation is involved in immune microenvironment regulation, and the tumor immune microenvironment plays a role in the efect of m5C regulators on patient prognosis [54]. ALYREF, the Aly/REF nuclear export factor, functions as an m5C reader; its expression levels were signifcantly associated with immune infltrating cells, such as B cells, macrophages, NK cells, and dendritic cells [55]. In an eight-lncRNAm5C-related prognostic signature, monocytes, memory B cells, activated mast cells, and naïve CD4 T cells presented a signifcant diferences in high-and low-risk groups [56]. In the present study, signifcant diferences existed in 5 types immune infltrating cells obtained by the MCP counter algorithm, including NK cells, monocytic lineage, myeloid dendritic cells, cytotoxic lymphocytes, and neutrophils, which have similarities with previous studies.

Conclusion
Four DEm5CRGs were identifed as biomarkers of the prognostic model in GC using three cohort profle datasets and integrated bioinformatics analysis. Te expression pattern and prognostic value of m5C genes in GC were determined, and a novel m5C gene-based risk scoring system was established to predict the clinical outcomes of GC patients. It was found that m5C genes can reliably predict the OS of GC patients, providing a new target for the treatment of GC. However, to provide patients with a better prognosis and fnd the ideal individualized and targeted therapy, further prospective trials to test clinical efcacy are necessary.

Data Availability
Te datasets generated and/or analyzed during the current study are available in the TCGA database (https://portal.gdc. cancer.gov/) and GEO database (https://www.ncbi.nlm.nih. gov/geo/).

Ethical Approval
All the obtained data were used according to the GEO and TCGA data access policies, as well as publication guidelines.

Conflicts of Interest
Te authors declare that there are no conficts of interest.

Authors' Contributions
All authors made positive contributions to the conception and design of the bioinformatic analysis.