A Prognostic Signature for Colon Adenocarcinoma Patients Based on m6A-Related lncRNAs

N6-methyladenosine (m6A) modification is a common epigenetic modification. It is reported that lncRNA can be regulated by m6A modification. Previous studies have shown that lncRNAs associated with m6A regulation (m6A-lncRNAs) serve as ideal prognostic biomarkers. However, whether lncRNAs are involved in m6A modification in colon adenocarcinoma (COAD) needs further exploration. The objective of this study was to construct an m6A-lncRNAs-based signature for patients with COAD. We obtained the RNA sequencing data and clinical information from The Cancer Genome Atlas (TCGA). Pearson correlation analysis was employed to recognize lncRNAs associated with m6A regulation (m6A-lncRNAs). 24 prognostic m6A-lncRNAs was identified by univariate Cox regression analysis. Gene set enrichment analysis (GSAE) was used to investigate the potential cellular pathways and biological processes. We have also explored the relationship between immune infiltrate levels and m6A-lncRNAs. Then, a predictive signature based on the expression of 13 m6A-lncRNAs was constructed by the Lasso regression algorithm, including UBA6-AS1, AC139149.1, U91328.1, AC138207.5, AC025171.4, AC008760.1, ITGB1-DT, AP001619.1, AL391422.4, AC104532.2, ZEB1-AS1, AC156455.1, and AC104819.3. ROC curves and K M survival curves have shown that the risk score has a well-predictive ability. We also set up a quantitative nomogram on the basis of risk score and prognosis-related clinical characteristics. In summary, we have identified some m6A-lncRNAs that correlated with prognosis and tumor immune microenvironment in COAD. In addition, a potential alternative signature based on the expression of m6A-lncRNAs was provided for the management of COAD patients.


Introduction
Colon adenocarcinoma is a common pathological type of colon cancer, and its prognosis is poor [1,2]. Terapies for colon adenocarcinoma (COAD) include surgery, chemotherapy, and radiation therapy [3]. Surgery can cure about half of the COAD patients, but the recurrence rate stays high after surgery. Chemotherapy and radiation are not efective due to their side efects and drug resistance. Besides, the signifcant heterogeneity of COAD limits the utilization of traditional methods [4]. With the approach of the era of personalized therapy managements, traditional diagnosing failed to satisfy advanced diagnoses and therapies. It is expected to build more useful prognostic signatures to help improve personalized treatment management.
lncRNA is a kind of RNA molecule that does not encode a protein, with a length of more than 200 bp, and plays important roles in the carcinogenesis and progression of cancers, including COAD [19,20]. It has been proven that m6A modifcation can regulate the physiological functions of lncRNAs [21,22]. For example, the structure of lncRNA can be regulated by binding to m6A readers, allowing m6A residues to be accessed by specifc RNA-binding proteins [23,24]. m6A modifcation modulates the structure of lncRNA MALAT1 to play the function of the structural switch, which is related with cancer malignancies [25]. METTL16 (writer) was identifed as an RNA-binding protein of lncRNA MALAT1 [26]. m6A modifcation can stabilize lncRNA FAM225A that served as a sponge for miR-1275 and miR-590-3p in nasopharyngeal carcinoma [27]. METTL3 (writer) can stabilize and upregulate LINC00958, which is involved with the malignancy of liver cancer progression by sponging miR-3619-5p [28]. Previous studies have shown that m6A-lncRNAs serve as ideal prognostic biomarkers. Nevertheless, whether lncRNAs are involved in the regulation of m6A modifcation in COAD still need to be elucidated. Te objective was to identify prognostic m6A-lncRNAs and construct an m6A-lncRNAs-based prognostic signature for patients with COAD.
In this study, we obtained the RNA sequencing data in TCGA and identifed 24 prognostic m6A-lncRNAs and 13 m6A-lncRNAs were selected by the Lasso regression algorithm to construct prognostic signature. We have also verifed the reliability of the prognostic signature. In addition, a quantitative prognostic nomogram was constructed based on signature and clinical features.

Data Source and Preparation.
RNA sequencing data of 398 COAD patients with clinical information were obtained from TCGA. 19 COAD patients were excluded due to a lack of necessary clinical information. Subsequently, RNA sequencing data were divided into mRNAs and lncRNAs using the Ensembl Genome Browser [29]. Te corresponding clinical data included age, gender, tumor-node-metastasis (TNM) stage, pathological stage, and survival time. We randomly divided COAD patients at a ratio of 7 : 3 into the training cohort (267 patients) and validation cohort (112 patients). Related m6A-lncRNAs. 23 m6A  regulators, based on published articles, including 8 writers,  13 readers, and 2 erasers, were collected in this study (Table 1) [30,31]. Pearson correlation analysis was applied to the expression of lncRNAs and the 23 m6A regulators to recognize m6A-associated lncRNAs. Univariate Cox regression analysis was applied to recognize prognostic m6A-lncRNAs. We used the "limma" R software package to analyse the diferential expression of prognosis-related m6A lncRNA.

Consensus Clustering Analysis.
For a better understanding of the role of m6A-lncRNAs, consensus clustering was performed by the "ConsensusClusterPlus" R package to divide all samples into diferent clusters based on the expression of prognosis-relatedm6A-lncRNAs [32]. Subsequently, Kaplan Meier analysis was performed in diferent clusters. We also applied the CIBERSORT algorithm and the Wilcoxon test to analyse diferent immune cell infltration between clusters. Besides, immune, stromal, and ESTIMATE scores were calculated by the "ESTIMATE" R package. GSEA software was applied to investigate the potential cellular pathways and biological processes in different clusters.

Development and Evaluation of Prognostic Signature.
After prognosis-related m6A-lncRNAs were identifed, LASSO regression analysis was applied to setup a risk model by "glmnet" R package, which could avoid overftting by disposing of highly correlated lncRNAs [33,34]. Te signature was calculated in the following format: (1) Ten, COAD patients were classifed into high-and lowrisk subgroups. K M survival analysis was employed to compare whether there were diferences in survival between the two subgroups using "survival" R packages. To testify the prediction efcacy of the risk model, we employed ROC curves and measured the AUC values by R package "timeROC" [35].

Establishment of Prognostic Nomogram.
To further evaluate the reliability of the signature, a comprehensive analysis of risk score and clinical features was performed. Subsequently, a quantitative nomogram was developed on the basis of risk score and clinical features using the "rms" R package [36]. We applied the calibration curves to outline the accuracy of the nomogram.
2.6. Statistical Analysis. All statistical analysis in this study was performed using R software (V 4.0.4). Te Wilcoxon's test was employed to compare the diference. Kaplan Meier (K M) survival analysis was performed by using the log-rank

Consensus Clustering Analysis.
For further understanding the roles of prognostic m6A-lncRNAs, patients were clustered according to the expression of m6A-lncRNAs by consensus clustering analysis. As displayed in the consensus matrix map and cumulative distribution function (CDF) plot for k � 2, the interference was the smallest (Figures 2(a) and 2(b)). Te overall survival results of patients in cluster 1 are better than those in cluster 2 ( Figure 2(c)). Figure 2(d) displays the correlation between clinicopathological features and clusters.

Gene Set Enrichment Analysis (GSAE).
To explore the potential cellular pathways and biological processes of prognostic m6A-lncRNAs, the GSEA was employed between two clusters. As displayed in Figure 3, genes in cluster 2 were enriched in the p53-signaling-pathway, proteasome, cell cycle, and peroxisome. Besides, genes in cluster 1 were enriched in TGF-betasignaling-pathway, ERBB-signalingpathway, ECM-receptor interaction, MAPK-signalingpathway, and JAK-STAT-signaling-pathway.

Immune Cell Infltration and Distribution of Immunity.
To explore the roles of m6A-lncRNAs in the tumor immune microenvironment (TIME), we compared the scores of 22 diferent immune cell types in two clusters. As shown in Figure 4(a), activated memory CD4 T cells are rich in cluster 2 (P < 0.05). Figure 4(b) shows the positive correlation between the m6A-lncRNAs and PD-L1. Furthermore, we analysed the distribution of immunity in two clusters. Cluster 1 have a higher stromal score, immune score, and ESTIMATE score (P < 0.05, Figures 4(c)-4(e)).

Construction and Validation of m6A-lncRNAs Signature.
Te LASSO regression algorithm was used to avoid overftting and for constructing risk scores. As displayed in Figures 5(a) and 5(b), at penalty factor (λ) 13, the coefcients of some variables are near to 0 [37]. Ultimately, 13 m6A-lncRNAs were identifed as independent prognostic factors. Te m6A-lncRNAs risk model for predicting prognosis in COAD was established on the basis of coefcients of 13 m6A-lncRNAs in the following format: risk score = UBA6-AS1 * * 0.7684476 + AC139149.1 * 0.7685387 Figure  5(c)). Te K M curves showed that the survival outcome of the low-risk subgroup was better (P < 0.05) (Figures 5(d) and 5(g)). Te time-dependent ROC curves indicated that, in the training cohort, the AUC value at 1 year was 0.845, 0.797 at 3 years, and 0.813 at 5 years ( Figure 5(e)). In the validation cohort, the AUC value at 1 year was 0.750, 0.821 at 3 years, and 0.935 at 5 years ( Figure 5(h)). Furthermore, the risk score distribution plot and scatter plot also showed that the survival outcome of the high-risk subgroup was worse. Te heatmap illustrated that the expressions of risky lncRNAs were upregulated in the high-risk group, while protective lncRNA was downregulated (Figures 5(f ) and 5(i)).
To further validate the risk model, the correlation between risk score and clinical characteristics was analysed. As shown in Figure 6(a), the risk score was signifcantly correlated with pathological stage and immune score (P < 0.05). Besides, the scatter plot showed that the risk score was also signifcantly correlated with PD-L1 expression and immune score (Figures 6(b) and 6(d)).

Development of Survival Prognostic Nomogram.
We comprehensively analysed the risk score and clinical characteristics. Risk scores, age, and pathological stage were identifed as signifcant independent prognostic variables (P < 0.05, Figures 7(a) and 7(b)), which revealed that the risk model served as a reliable tool for COAD patients. Subsequently, we constructed a quantitative nomogram on the basis of risk score and clinical characteristics (Figure 7(c)). Te calibration curves displayed the concordance between observed and predicted overall survival (Figures 7(d)-7(f )).

Discussions
N6-methyladenosine (m6A) modifcation is a common epigenetic modifcation in eukaryotes, involving many biological processes, such as RNA splicing, translation, and expression. M6A modifcation and lncRNAs play critical roles in the biological processes of COAD. For example, m6A regulators, such as METTL3, METTL14, and YTHDF2, have been proven to be involved in regulating the pathological process of COAD [16,17]. It is reported that lncRNA can be regulated by m6A modifcation. However, whether lncRNAs are involved in the regulation of m6A modifcation in the progression of COAD needs further exploration.
With the approach of the era of personalized therapy managements, traditional diagnosing failed to satisfy advanced diagnoses and therapies. It is expected to build more useful prognostic signatures to help improve personalized treatment management. In this study, we focused on identifying prognosis-related lncRNAs associated with m6A modifcation (m6A-lncRNAs) and used bioinformatics methods to establish a reliable risk model for patients with COAD.
We downloaded RNA sequencing data in TCGA. M6A-lncRNAs were recognized by Pearson correlation analysis according to the expression of lncRNAs and m6A regulators. Tere are 24 prognostic m6A-lncRNAs identifed by univariate Cox regression analysis. For exploring the biological features of these 24 prognostic m6A-lncRNAs, we divided the COAD patients into two clusters. GSAE has been applied to investigate the potential cellular pathways and biological processes. Accumulated studies have illustrated that tumor immune microenvironment (TIME) is correlated with tumorigenesis and the development of COAD [38][39][40][41]. In this study, we have explored the relationship between immune infltrate levels and m6A-lncRNAs. Tere are also signifcant diferences between the two clusters in ESTIMATE score, stromal score, and immune Score.

Journal of Oncology
Te Lasso regression algorithm is widely used to construct risk models. Te most important diference between lasso regression analysis and traditional stepwise Cox regression analysis is that it can process all variables simultaneously, instead of step by step [34], which greatly improves the stability of the model. In this study, LASSO regression analysis was conducted to avoid overftting and to construct risk scores. Ultimately, 13 m6A-lncRNAs, including UBA6-AS1, AC139149.1, U91328.1, AC138207.5, AC025171.4, AC008760.1, ITGB1-DT, AP001619.1, AL391422.4, AC104532.2, ZEB1-AS1, AC156455.1, and AC104819.3 were selected to construct the signature. To testify the reliability of the signature, COAD patients were divided, at a ratio of 7 : 3, into a training set and validation set. K M survival analysis and the ROC curves showed well discrimination of the signatures. In addition, risk score and clinicopathological characteristics were integrated into the analysis. Age, risk score, and pathological stage were recognized as independent prognostic factors, which also demonstrated the reliability of the risk model. Ten,             Tere were also limitations in our study. First, the m6A-lncRNAs risk model was constructed and validated based on the TCGA database. We did not verify the prognostic signatures in external independent cohorts. Second, the interaction between prognosis-related m6A-lncRNA and m6A regulator lacks experimental confrmation.

Conclusions
To sum up, we have identifed some m6A-related lncRNAs which were correlated with prognosis and tumor immune microenvironment. A reliable alternative prognostic signature was provided for the management of COAD patients. We also combined risk scores with clinical characteristics to establish a quantitative prognostic nomogram.

Data Availability
Te data analysed in this study are free available in Te Cancer Genome Atlas (TCGA) database (https://portal.gdc. cancer.gov).

Conflicts of Interest
Te authors declare that they have no conficts of interest.

Authors' Contributions
Te fnal version of the manuscript was approved by all the authors. SZZ designed the study and drafted the manuscript. YLP, QCD, CJY, and DJZ were in charge of data obtainment and statistical analysis. XJW, JZ, and MLL supervised this study and revised the manuscript. Su-Zhe Zhou, Ying-Lian Pan, Qing-Chun Deng, and Chang-Jun Yin contributed equally to this research and should be regarded as co-frst authors.