Construction of a CXC Chemokine-Based Prediction Model for the Prognosis of Colon Cancer.

Colon cancer is the third most common cancer, with a high incidence and mortality. Construction of a specific and sensitive prediction model for prognosis is urgently needed. In this study, profiles of patients with colon cancer with clinical and gene expression data were downloaded from Gene Expression Omnibus and The Cancer Genome Atlas (TCGA). CXC chemokines in patients with colon cancer were investigated by differential expression gene analysis, overall survival analysis, receiver operating characteristic analysis, gene set enrichment analysis (GSEA), and weighted gene coexpression network analysis. CXCL1, CXCL2, CXCL3, and CXCL11 were upregulated in patients with colon cancer and significantly correlated with prognosis. The area under curve (AUC) of the multigene forecast model of CXCL1, CXCL11, CXCL2, and CXCL3 was 0.705 in the GSE41258 dataset and 0.624 in TCGA. The prediction model was constructed using the risk score of the multigene model and three clinicopathological risk factors and exhibited 92.6% and 91.8% accuracy in predicting 3-year and 5-year overall survival of patients with colon cancer, respectively. In addition, by GSEA, expression of CXCL1, CXCL11, CXCL2, and CXCL3 was correlated with several signaling pathways, including NOD-like receptor, oxidative phosphorylation, mTORC1, interferon-gamma response, and IL6/JAK/STAT3 pathways. Patients with colon cancer will benefit from this prediction model for prognosis, and this will pave the way to improve the survival rate and optimize treatment for colon cancer.


Introduction
Colon cancer is one of the most common tumors observed in the world [1]. In the United States, colon cancer is the third most commonly diagnosed cancer, and the second most common cause of cancer-related death [2]. In China, colon cancer is the fifth most common cause of cancer-related death [3]. As a result of improvements in treatment and earlier detection, from the mid-1970s to the most recent time period (2006)(2007)(2008)(2009)(2010)(2011)(2012), the 5-year relative survival rate for all stages of colon cancer increased from 51% to 66% [2]. Despite dramatic reductions in colorectal cancer incidence and mortality, striking disparities by age, race, and tumor subsite remain [2,4]. Colorectal cancer incidence rates are about threefold higher in transitioned versus transitioning countries [4]. Novel biomarkers with clinical value are thus essential to improve compliance rates and predict poor prognoses for colon cancer.
CXC chemokines (CXCLs 1-16) are heparin-binding proteins that display disparate roles in the regulation of angiogenesis, angiostasis, and metastasis in cancer [5]. CXCLs are widely expressed in gastrointestinal cancers and are correlated with prognosis [6][7][8]. Recently, CXCLs have emerged as putative plasma biomarkers for pancreatic cancer diagnosis [9,10]. Overexpression of CXCL1 is associated with tumor progression and poor prognosis in hepatocellular carcinoma [11]. CXCL4 is a predictor of tumor angiogenic activity and a prognostic biomarker in patients with non-small-cell lung cancer (NSCLC) undergoing surgical treatment [12]. CXCL5 favors tumor progression by attracting neutrophils [13]. CXCL12 is associated with gallbladder carcinoma progression [14]. Highly expressed CXCL16 is associated with good prognosis and increases tumor-infiltrating lymphocytes in colon cancer [15]. In this study, we investigated the potential of CXCLs as prognostic biomarkers for colon cancer.
This study is the first to report that the prediction model based on the risk score of the multigene model and three clinicopathological risk factors can predict the survival of patients with colon cancer, indicating that patients with colon cancer will benefit from this prediction model to improve survival rate. and GSE44076 have no survival data and were used solely for differential expression gene analysis. The associations of overall survival and clinic pathological information of the patients were analyzed by univariate and multivariate Cox regression analyses. Correlations between the expression of CXCLs and clinical characteristics of patients with colon cancer were investigated using Pearson's correlation coefficient. Statistics were performed using IBM SPSS Statistics for Windows, version 23.0 (IBM Corp., Armonk, N.Y., USA).

Differential Expression Gene
Analysis. Differential expression gene analysis was performed to estimate the difference in gene expression between tumor samples and healthy controls using the "limma" and "edgeR" packages for GEO and TCGA data, respectively, using R (R Core Team, Vienna, Austria) [16,17]. Consequently, log 2 foldchange (logFC), P value, and the false discovery rate (FDR) (or adjusted P value) of each gene were obtained. Expression patterns of each CXC chemokine were illustrated by heat map. CXCLs with |logFC| > 1, P value < 0.05, and FDR < :05 were considered as differentially expressed genes (DEGs). A Venn diagram was drawn to show overlapping DEGs from the four datasets. The expression differences of each overlapping DEG were presented in boxplots. 2.5. Nomogram Construction and Assessment. Nomograms for individualized prediction were generated based on risk scores from the multigene models and clinical risk factors to predict 3-year and 5-year overall survival (OS) using the "rms" package in R. Concordance index (C-index), ROC curve (AUC), and calibration plots were obtained using R to evaluate the performance of the nomograms.
2.6. Pathway Analysis. The potential biological pathways of CXCLs were investigated by gene set enrichment analysis (GSEA) [18], a computational method that determines whether an a priori defined set of genes shows statistically significant differences between two biological states. Gene sets enriched in low-and high-risk patient groups were obtained using the expression profiles of patients' tumor samples by java GSEA. KEGG gene sets (v6.2), oncogenic signature gene sets (v6.2), and hallmark gene sets (v6.2) were chosen as references in this study. Gene sets whose results are P < 0:01 and FDR < 0:25 were considered significant.

Coexpression Network Analysis. Genes coexpressed with
CXCLs were screened by performing weighted gene coexpression network analysis (WCGNA) [19], a biological method for describing the correlation patterns among genes across microarray samples. The network was drawn via Cytoscape (v3.6.1).

Clinical Characteristics of Patients with Colon Cancer.
Relationships between the clinical characteristics and OS of patients with colon cancer in GSE41258 and TCGA were clarified by performing univariate and multivariate Cox regression analyses. In univariate analysis, poor OS of patients was significantly related to advanced tumor-nodemetastasis (TNM) stage, T3 and T4 stages, N2 and N3 stages, and M1 stage in both GSE41258 and TCGA (Tables 1 and 2,).

BioMed Research International
Characteristics with significant P values from the univariate analysis were screened using multivariate analysis. Multivariate analysis revealed that N stage and M stage in GSE41258 and T stage and M stage in TCGA might be independent prognostic factors for patients with colon cancer (Tables 1  and 2). Additionally, the correlations between the clinical characteristics of colon cancer and expression of CXCLs were also investigated. The expression of several CXCLs was significantly related to TNM stage, N stage, M stage, and p53 mutants in GSE41258 (Table 3) and associated with age, TNM stage, N stage, and M stage in TCGA (Table 4).

Identification of CXCLs Differentially Expressed between
Tumor and Normal Samples. To systematically identify CXC chemokine DEGs in colon cancer, we compared their expression levels between tumor and normal samples. In the GSE41258, TCGA, GSE68468, and GSE44076 datasets, 8/14, 12/16, 9/14, and 11/15 CXC chemokine genes, respectively, were found significantly aberrantly expressed in colon cancer (Figures 1(a)-1(d)). Furthermore, the Venn diagram demonstrated that a total of six DEGs, including CXCL1, CXCL11, CXCL12, CXCL2, CXCL3, and CXCL5, overlapped in the aforementioned datasets (Figure 1(e)). Among these, CXCL1, CXCL11, CXCL2, CXCL3, and CXCL5 were all upregulated, whereas CXCL12 was downregulated in tumor samples compared to normal tissue. The expression differences between tumor and normal tissues from each dataset are shown by boxplot (Figures 1(f)-1(i)). Expression difference analysis revealed that many CXCLs, especially the overlapping DEGs (CXCL1, CXCL11, CXCL12, CXCL2, CXCL3,    We also analyzed the effects of the expression of the overlapping DEGs on patients' survival by univariate Cox analysis and the Kaplan-Meier method in patients with colon cancer. In univariate Cox analysis and overall survival curves, expression of CXCL11, CXCL2, and CXCL3 in GSE41258 and CXCL1, CXCL2, and CXCL3 in TCGA had a strong correlation with the progression of colon cancer ( Figure S1). Results showed that single-gene models of CXCL11, CXCL2, and CXCL3 in GSE41258 and single-gene models of CXCL1 and CXCL3 in TCGA exhibited the potential ability to predict 5-year OS for patients with colon cancer (AUC > 0:60) (Figures 2(a) and 2(b)). ROC curves of each gene for 3-year OS are shown in Figure S2. To assess the joint effects of CXCL1, CXCL11, CXCL2, and CXCL3 on patients' survival, a multigene forecast model was established. Using R package, risk scores of patients were calculated according to the below formulas: risk score ðGSE41258Þ = ð0:486 * CXCL1 Exp Þ + ð−0:278 * CXCL11 Exp Þ + ð−0:727 * CXCL2 Exp Þ + ð0:128 * CXCL3 Exp Þ and risk score ðTCGAÞ = ð−0:124 * CXCL1 Exp Þ + ð−0:063 * CXCL11 Exp Þ + ð−0:038 * CXCL2 Exp Þ + ð0:006 * CXCL3 Exp Þ. As a result, AUCs from the multigene forecast model in GSE41258 and TCGA were both >0.60 (0.705 in GSE41258 and 0.624 in TCGA) (Figures 2(c) and 2(d)). ROC curves of multigene analysis for 3-year OS are shown in Figure S2. These results suggest that the forecast model possessed moderate specificity and sensitivity in colon cancer survival prediction. Further, according to the median risk score, patients were divided into low-risk and high-risk groups and survival curves were plotted. Low-risk patients had better survival than that of the high-risk group (P < 0:001 in GSE41258, P = 0:003 in TCGA; Figures 2(c) and 2(d)). The risk score distribution of patients in the order of ascending risk score is presented (Figures 2(e) and 2(f)). Survival times and status figures showed that the number of deceased patients in the high-risk group was higher than that in the low-risk group (Figures 2(g) and 2(h)), which was reflected by the survival curves. Collectively, these findings showed that the forecast model based on the expression of CXCL1, CXCL11, CXCL2, and CXCL3 could have a high prognostic value for the survival of patients with colon cancer.

Construction of Nomograms Based on the Risk Scores of Multigene Models and Clinical Risk Factors.
For a more sensitive predictive tool in clinical practice, we constructed nomograms integrating the risk scores of multigene models and three clinicopathological risk factors (T stage, N stage, and M stage) (Figures 3(a) and 3(b)). The C-indices of nomograms from GSE41258 and TCGA were 0.812 and 0.737, respectively. For GSE41258, the 3-year and 5-year true positive rates of the nomogram could reach up to 92.6% and 91.8%, respectively (Figure 3(c)), demonstrating that the nomogram was highly accurate in predicting individual OS for colon cancer. The 3-year and 5-year AUCs of the nomogram for TCGA were 0.774 and 0.727, respectively (Figure 3(d)), indicating that this nomogram possesses moderate predictive accuracy for patients' OS. Additionally, the calibration curves for predicting 3-year and 5-year OS also indicated that the nomogram-predicted survival closely corresponded with actual survival outcomes in both GSE41258 and TCGA (Figures 3(e) and 3(f)).

3.5.
Mechanism of the Effect of CXCL1, CXCL11, CXCL2, and CXCL3 on Colon Cancer Progression. To identify the mechanism of the effect of CXCL1, CXCL11, CXCL2, and CXCL3 on colon cancer, we performed GSEA and WGCNA. For GSEA, the expression profiles of tumor samples were divided into the low-risk and high-risk groups based on the risk scores of the multigene forecast. Then, the expression profile was analyzed using KEGG gene sets (c2), oncogenic  signatures gene sets (c6), and Hallmark gene sets (h) as references. The gene sets of NOD-like receptor signaling pathways, oxidative phosphorylation, and Parkinson's disease and the proteasome were significantly enriched according to c2 (Figure 4(a)). Based on c6, the enriched gene sets were CAMP, CSR/LATE, MTOR, and SNF5 (Figure 4(b)). Using h for reference, mTORC1 signaling, interferon-gamma response, and IL6/JAK/STAT3 signaling were significantly enriched (Figure 4(c)). For WGCNA, coexpressed genes with weights > 0:4 were selected and shown in visualized networks (Figures 5(a) and 5(b)). These results indicate that CXCLs play important roles in the progression of colon cancer.

Conclusions
Colon cancer is one of the most common and aggressive human malignancies [20,21]. Despite advances in systemic therapy for colon cancer, successful therapeutic strategies are limited because of the poor prognosis and high recurrence rate [22,23]. In this study, we constructed a prediction model for the prognosis of patients with colon cancer. In addition, we analyzed the underlying mechanisms of CXCLs by GSEA and built a regulatory network of these chemokines in colon cancer progression.
A few genes were identified to predict the diagnosis and prognosis of colorectal cancer, and the regulatory network was constructed [24][25][26]. In addition, the DNA methylation was analyzed in colon cancer, and several genes were identified [27]. In this study, we applied a bioinformatics approach to the discovery of prognostic biomarkers in human colon cancer. We assembled gene expression data involving human colon cancers from TCGA and GEO and then searched for differentially expressed genes. Genes associated with patient survival of colon cancer could be identified as single prognostic biomarkers. Using this approach, we identified CXCL1, CXCL11, CXCL2, and CXCL3 as potential biomarkers; we then established a multigene forecast model combining these chemokines. Results showed that our forecast model exhibited the potential ability to predict 5-year OS for patients with  colon cancer accurately. We further constructed nomograms integrating the risk scores of multigene models and three clinicopathological risk factors. Results showed that the nomograms have high accuracy in predicting individual OS for colon cancer. We then performed GSEA to find signaling pathways related to CXCLs. This revealed that CXCLs were correlated with the development and progression of tumors. We finally set up a regulatory network of CXCLs in colon cancer. However, the underlying mechanisms need to be further elucidated in future work.
Previous studies indicated that CXCL1 promotes tumor growth and is associated with poor survival in gastric cancer, breast cancer, and hepatocellular carcinoma [11,28,29]. However, in the TCGA database, highly expressed CXCL1 is associated with better survival in colon cancer, and this is consistent with a previous report that overexpression of CXCL1 positively correlates with improved survival [30]. CXCL2 is correlated with prognosis in bladder cancer [31]. In our study, CXCL2 was found to be highly expressed and correlated with the survival of patients with colon cancer in GSE41258. CXCL3 plays a predominant role in the tumorigenicity of prostate cancer cells and is upregulated in prostate cancer [32,33]. It is also involved in the migration, invasion, proliferation, and tubule formation of trophoblasts [34]. CXCL5 is overexpressed in pancreatic cancer, and it is associated with poor survival in hepatocellular carcinoma, pancreatic cancer, and late-stage gastric cancer [35][36][37]. Interestingly, it has been reported that low expression of CXCL5 is significantly associated with poor prognosis for patients with colorectal cancer [38]. However, CXCL5 had no significant correlation with the survival of patients with colon cancer in TCGA and GSE41258 in our study. CXCL8 has the potential to be a prognostic marker for breast cancer and colorectal cancer [39,40]. As CXCL8 was not included in GSE44076, it was not referred to in the prediction model in our work. Neuroendocrine-like cell-derived CXCL10 and CXCL11 induce the infiltration of tumor-associated macrophages and lead to the poor prognosis of colorectal cancer [41]. Downregulation of CXCL11 inhibits colorectal cancer cell growth and epithelial-

10
BioMed Research International mesenchymal transition [42]. However, highly expressed CXCL11 was found to be related to better survival in GSE41258, but not in TCGA in this study. A high level of CXCL12 is an independent predictor of poor survival in ovarian cancer [43]. Our results showed that CXCL1, CXCL2, CXCL3, and CXCL11 were all upregulated in colon cancer compared with healthy tissues, and in the colon cancer group, a high level of CXCL1, CXCL2, CXCL3, and CXCL11 was correlated with better survival in TCGA or GEO. The differences between this result and previous reports may be due to the differences in patient numbers, age, sex, races, metastasis, complications, or clinical stages. Using a single gene to predict prognosis is incomplete and limited. Our results indicate that a prediction model using multiple genes and clinical risk factors successfully predicts the prognosis of patients with colon cancer. Patients with colon cancer will benefit from this prediction model to improve treatment options and prognosis.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.