Apoptosis-Associated Gene Expression Profiling Is One New Prognosis Risk Predictor of Human Rectal Cancer

Background Prior research has revealed the predictive significance of a series of genetic markers in the prognosis of rectal cancer (RC), but the roles of apoptosis-associated genes in RC are rarely studied. Methods The RNA-seq data as well as clinical data about patients with rectum adenocarcinoma (READ) were downloaded from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Additionally, 87 apoptosis-associated genes were downloaded and acquired from Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Comprehensive bioinformatics analysis was carried out for deep exploration of the expression and prognostic significance of these genes. Least absolute shrinkage and selection operator (LASSO) and multivariate Cox regression analysis was performed for the establishment of a risk scoring equation for the prognosis model and construction of a survival prognosis model. ROC curves were drawn for evaluating the accuracy of the model. A real-time quantitative PCR assay was conducted for quantification of apoptosis-associated proteins related to prognosis. Results Eight genes were identified as hub genes associated with the prognosis of PFS. A risk model of prognosis prediction based on four gene signatures (CYCS, IKBKB, NFKB1, and TRADD) was constructed. According to further analysis of this model, the high-risk group experienced worse overall survival than the other. The prognosis model demonstrated a favorable predictive ability, with areas under the receiver operating characteristic curves (AUC) of 0.720, 0.641, and 0.677 in forecasting the 1-, 2-, and 3-year prognosis, respectively. In addition, CYCS and NFKB1 presented low expression, while IKBKB and TRADD presented high expression in TCGA and clinical tumor samples. Conclusions A four-gene signature risk model for prognosis forecasting of RC has been constructed, which possesses favorable predictive ability, which offers ideas and breakthrough points to the apoptosis-associated development of RC.


Introduction
According to the World Cancer Report 2020, colorectal cancer (CRC) ranks second in frequently seen cancers among both men and women [1]. Appropriately 30%-35% of CRC patients are rectal cancer (RC) patients, most of which are rectal adenocarcinoma (READ) [2]. It is the tumor from the dentate line to the border of the rectum and sigmoid colon, which can be easily confirmed through sigmoid colonoscopy and digital rectal examination, and mainly afflicts individuals > 45 years old [3,4]. RC can be treated by surgery, radiotherapy, chemotherapy, molecular targeted therapy, etc. [5][6][7][8]. Clinically, surgical resection is still the primary means for it. However, many patients with cancer miss the optimal surgery timing because of the lack of notable symptoms in the early phase, so the treatment outcome of advanced RC is still unsatisfactory [9]. CRC is classified into colon cancer and RC [10,11]. Despite their various common characteristics, the two have some crucial differences, including the high postoperative local recurrence rate of appropriately 15%-45% in RC [12]. Through years of efforts, the number of oncogenes and tumor suppressors and markers of RC has been clarified [13,14]. However, the risk model for prognosis prediction of READ requires further research. Besides, there are abundant studies on colon cancer than rectal cancer which motivated us to perform this study to construct a prediction model of prognosis for rectal cancer patients. The innovation of this study is to construct a new and promising risk model for predicting the prognosis of rectal cancer. Accordingly, it is of crucial significance to find and establish a prognosis model of READ for developing strong diagnosis and therapy methods and evaluating the prognosis.
Apoptosis, or programmed cell death, also takes a pivotal part in the development and maturation of normal tissues. It maintains the homeostasis in vivo by controlling the immune system, as the main cellular mechanism that eliminates DNA-damaged cells and maintains the homeostasis of tissues for mammals [15][16][17]. There exist two primary ways to activate apoptosis, the external way and the internal way. Tumors can escape apoptosis in many means. For instance, the increase in BCL-2 protein and deletion of BAX/BAK suppress the apoptosis and thus trigger tumor [18], and suppression of caspase function also hinders apoptosis function [19]. In addition, in vivo and in vitro assays have verified the roles of many chemopreventive drugs in inducing transformed cells' apoptosis [20,21].
Zhang et al. [22] have confirmed some metabolic genes associated with RC prognosis. Huang et al. [23] have identified a risk model based on 10 M6A gene markers for prognosis prediction of RC patients. However, the application of apoptosis-associated genes in forecasting the prognosis of RC is yet to be reported. Accordingly, this study analyzed the expression and associations of 87 apoptosis-associated genes in RC and used LASSO regression and Cox regression analyses to screen out prognosis-associated gene markers. Finally, a prognostic risk prediction model composed of four gene signatures was constructed, which had high accuracy.

Data
Source. RNA-seq as well as clinical data about READ cases were acquired from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) and RNA-seq data of healthy tissues from the Genotype-Tissue Expression (GTEx, https://gtexportal.org/) project, removing batch effects from normalized data and corresponding to the corresponding clinical samples, removing duplicates and deleted samples from the downloaded data and cases with missing clinical outcomes. For the analysis of patients' clinical data, the data of those with unknown survival time and those with survival time of "0" were deleted. 87 apoptosis-associated genes were acquired from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

Specimen Collection.
Totally, 33 RC tissue specimens and 33 corresponding paracarcinoma tissue specimens were acquired from RC patients who underwent surgery in the Gastroenterology Department in the 1 st Affiliated Hospital of Wenzhou Medical University and the Large Intestine Surgery Department in Wenzhou Central Hospital between January 2019 and July 2020. The resected RC tissues and paracarcinoma tissues were immediately placed in liquid nitrogen for later analyses. All the postoperative specimens were confirmed by pathological examination, and the site that cancer tissues were taken was a cancer-rich site, and no obvious necrotic tissue was found in the specimens. Our study was performed after ratification by the hospital Medical Ethics Committee.
2.3. Gene Expression Analysis. The Limma software package of R software (version 4.0.3) was adopted for studying the differential expression of mRNA. The adjusted P value was analyzed based on TCGA or GTEx to correct the false positive result. Under the condition of jlogFCj ≧ 1 and adjusted P < 0:05, genes with aberrant expression were screened, and the heat map was drawn via R package ggplot2.

Protein-Protein Interaction (PPI) Network Construction.
Genes with differential levels were evaluated using Metascape (https://metascape.org/gp/index.html#/main/step1), and PPI network was thus constructed. The MCODE plugin in Cytoscape software (version 3.8.2) was adopted for further exploration of gene interaction.

Kaplan-Meier Survival
Analyses. Survival in R package was utilized for survival analyses. P value and hazard ratio (95% CI) in KM curves were acquired through both logrank test and univariate Cox proportional hazard regression.

LASSO Model Construction.
With the LASSO regression algorithm, characteristics were selected, and 10-fold crossvalidation was carried out for parameter determination to acquire a suitable model. Then, the genes acquired by LASSO regression were treated with multivariate Cox regression, and the multivariate regression coefficient of every gene was calculated, on which a risk scoring equation was constructed. Patients were grouped into the high-/lowrisk groups in the light of the median risk score. KM survival curves were used for analyzing and comparing the two groups' overall survival (OS), and time-associated receiver operator characteristic curve (ROC) was adopted for predictive value evaluation of gene markers.

Protein Expression Verification.
Immunohistochemical staining maps about protein levels of gene signatures in CRC tissues and normal rectum tissues were acquired from the Human Protein Atlas (HPA).
2.9. Gene Set Enrichment Analysis (GSEA). To observe the effect of gene expression on tumors, the samples were divided into two groups of high and low expression based on the median expression level of genes, and the enrichment of KEGG and HALLMARK pathways in the high-and lowexpression group was analyzed using GSEA, respectively. . Shanghai Sangon Biotech Co., Ltd. completed the synthesis of specific primers adopted in qRT-PCR (Table 1: primer sequences). The calculation of relative expression was carried out using 2 -ΔΔCt (internal reference: GAPDH). The assay was conducted in triplicate, and the obtained results were averaged.
2.11. Statistical Analyses. SPSS 22.0 (IBM Corp., Armonk, NY, USA) was adopted for data analyses and GraphPad Prism 8.0 (GraphPad Software, La Jolla California USA) for visualization of data into figures. Measurement data were presented by (mean ± SD), and every assay was performed in triplicate. The independent-samples t-test was adopted for intergroup comparison of measurement data in normal distribution. Differentially expressed genes between normal tissues and tumor tissues were identified through the Wilcoxon test. P < 0:05 implies a notable difference.

Expression and Interaction of Apoptosis-Associated
Genes. Based on observation on the expression of 87 apoptosis-associated genes in 165 RC specimens and 10 corresponding paracancerous specimens from TCGA and 799 normal tissue specimens from GTEx, 77 genes with differential levels were acquired via variation analysis (Figure 1(a): the heat map). For exploring the interaction of these apoptosis-associated genes, PPI was conducted, and the MCODE plugin in Cytoscape was adopted for the calculation of characteristics of each node in the network diagram and their visualization ( Figure 1(b)). The functional enrichment of all MCODE modules is presented in Table 2.
According to the results, 5 MCODEs were mainly enriched in IL-1 signaling pathway, apoptosis, c-FLIP regulation, and amyotrophic lateral sclerosis (ALS).

Establishment of a Risk Model for Prognosis Prediction
According to the median value of risk scores, these specimens were grouped into the high-/low-risk groups. As the distribution of survival time showed in Figure 3(c), more patients died in the high-risk group than in the low-risk group, and there were more patients with shorter overall survival (OS) in the high-risk group than in the low-risk group. KM analysis revealed notably worse prognosis in the high-risk group than in the other (Figure 3(d)). Additionally, the sensitivity and specificity of this model for forecasting the OS of high-/low-risk cases were verified via ROC curves. The AUC of the risk model was 0.720, 0.641, and 0.677 in forecasting the 1-, 2-, and 3-year prognosis, respectively, demonstrating relatively high accuracy in forecasting the prognosis and survival of READ patients (Figure 3(e)).

Independent Predictive Significance of the Risk Model.
We conducted multivariate and univariate Cox regression analyses for evaluating the feasibility of adopting the risk scores of four gene signatures as one independent factor for prognosis. According to the latter analysis, in TCGA cohort, risk score and TMN staging were independent factors for forecasting adverse survival (Table 3). According to the former analysis, risk score was one independent prognostic predictor after adjustment or other confounding factors (Table 4).

Discussion
RC is one frequently seen cancer, with an unfavorable postoperative cure rate and OS and a high local recurrence rate. Over the past few years, searching for gene markers associated with RC prognosis and developing other prediction methods for prognosis have become a hot focus [13,24]. Our study acquired the information of apoptosis-associated genes and corresponding data from TCGA-READ dataset. Through LASSO regression analysis, a risk model for forecasting RC prognosis was constructed, which delivered high accuracy in forecasting patients' prognosis in 1, 2, and 3 years (AUC: 0.720, 0.641, and 0.677), with four apoptosisassociated gene signatures. Subsequently, multivariate and univariate Cox regression analyses demonstrated that the risk scores given to the 4 gene signatures could serve as independent prognostic markers.
The risk model for prognosis prediction of RC based on 4 apoptosis-related genes was composed of CYCS, IKBKB, NFKB1, and TRADD. Few studies were conducted to construct prediction model for the prognosis of rectal cancer. Zuo et al. [25] identified a 6-gene signature predicting prognosis for colorectal cancer including rectal cancer. Thus, our prediction model of prognosis in rectal cancer is of significant. Whereafter, verification results by HPA database and qPCR of clinical tissue specimens revealed lowly expressed CYCS and NFKB in RC tissues and highly expressed IKBKB and TRADD in them.
CYCS, released by mitochondria, forms a large protein complex called apoptotic bodies by interacting with Apaf-1. The complex recruits and activates caspase-9 to initiate caspase cascade and apoptosis and thus supports the synthesis of ATP in mitochondria. CYCS is also bound up with cancers involving apoptosis and p53 tumor inhibition pathway [26]. Liu et al. [27] have revealed that the upregulation of CYCS triggers the cascade activation of caspase-9 and caspase-3 and then gives rise to apoptosis. However, in our study, CYCS presented low expression in RC cases, and the low expression was strongly bound up with the short PFS. Similar to prior research, patients with a low CYCS level show weaker apoptosis of cancer cells. As everyone knows, the IKK complex is composed of IKKα, IKKγ, and IKKβ (also referred as IKBKB) [28]. The last one has been verified to be implicated in tumor growth through NF-κB activation and phosphorylation-dependent suppression of tumor suppressor [29,30]. It has been confirmed to have increased     10 Disease Markers expression and/or abnormal activity in various kinds of human cancers including osteosarcoma [31]. Gong et al. [32] have discovered the association of IKBKB rs2272736 with the survival rate of gastric cancer. Our study analyzed the PFS time of TCGA specimens and revealed a shorter PFS time of patients with high IKBKB expression. NFKB1 is one subunit of NF-κB [33]. It has been verified to be one pathway-specific tumor inhibitor, which can prevent hematological malignancies after cell damage mediated by alkylating agent (N-methyl-N-nitrosourea) [34]. Prior research has revealed that NFKB1 acts via p50 homodimer and is a suppressor of neutrophil-driven hepatocarcinoma [35]. The significance of NFKB1 function could be observed in models of mouse: Nfkb1−/− mouse showed intensified inflammatory response and susceptibility to DNA injury in some forms and finally suffered cancer and quick aging phenotype [36]. However, in our study, NFKB1 presented low expression in RC cases, and the low expression was also strongly bound up short PFS. The results also indirectly reflect that downregulated NFKB1 could not act as a tumor suppressor gene, which induces RC development. TRADD is one crucial TNF signal transduction medium mediated by TNFR1. Its mechanism of action is primarily responsible for recruiting other effector proteins, thus activating MAPK, NF-κB, and other signal pathways including cell death [37,38]. On the one hand, TRADD takes a pivotal part in TNF-α-induced proinflammatory reaction by interacting with TNFR1 [39]. On the other hand, TNF-α also initiates apoptosis and necrosis by recruiting FADD protein [40]. Based on our study results, IKBKB, NFKB1, and TRADD are all bound up with the NF-κB signaling pathway, which offers ideas to the further study of RC. Based on the above research results, we constructed a model for the prediction of RC prognosis based on 4 apoptosis-associated genes and further revealed the feasibility of using risk score based on this model as one independent predictor of RC patients' prognosis. The results provide a potential novel target for RC therapy and also a novel target and idea for targeted therapy of tumors. The study also has some limitations. Genes in tumors are studied    14 Disease Markers less, and their mechanism of action needs deeper exploration. Besides, the clinical tissue samples we collected for this study do not include the follow-up survival information which cannot be used for the validation of the risk model. Whether the conclusions already reached can be verified equally in vivo or in vitro needs to be proved by more follow-up basic biological experiments. In addition, due to the rare public RNA-seq datasets on rectal cancer, it is urgent to collect more clinical rectal cancer sample to construct a transcriptome library.

Data Availability
Bioinformatics data from the TCGA and GEO public databases were used to support this study. These previous datasets have been attributed where relevant in the text (https://portal.gdc.cancer.gov/) (https://gtexportal.org/).