A Novel Cancer Stemness-Related Signature for Predicting Prognosis in Patients with Colon Adenocarcinoma

Objective To explore the cancer stemness features and develop a novel cancer stemness-related prognostic signature for colon adenocarcinoma (COAD). Methods We downloaded the mRNA expression data and clinical data of COAD from TCGA database and GEO database. Stemness index, mRNAsi, was utilized to investigate cancer stemness features. Weighted gene coexpression network analysis (WGCNA) was used to identify cancer stemness-related genes. Univariate and multivariate Cox regression analyses were applied to construct a prognostic risk cancer stemness-related signature. We then performed internal and external validation. The relationship between cancer stemness and COAD immune microenvironment was investigated. Results COAD patients with higher mRNAsi score or EREG-mRNAsi score have significant longer overall survival (OS). We identified 483 differently expressed genes (DEGs) between the high and low mRNAsi score groups. We developed a cancer stemness-related signature using fifteen genes (including RAB31, COL6A3, COL5A2, CCDC80, ADAM12, VGLL3, ECM2, POSTN, DPYSL3, PCDH7, CRISPLD2, COLEC12, NRP2, ISLR, and CCDC8) for prognosis prediction of COAD. Low-risk score was associated with significantly preferable OS in comparison with high-risk score, and the area under the ROC curve (AUC) for OS prediction was 0.705. The prognostic signature was an independent predictor for OS of COAD. Macrophages, mast cells, and T helper cells were the vital infiltration immune cells, and APC costimulation and type II IFN response were the vital immune pathways in COAD. Conclusions We developed and validated a novel cancer stemness-related prognostic signature for COAD, which would contribute to understanding of molecular mechanism in COAD.


Introduction
As is known to us, colon adenocarcinoma (COAD) is regarded as one of the most frequent malignant tumors in the gastrointestinal system [1][2][3]. It was reported that there was an increasing incidence of more than 4% in COAD per year all over the world [4]. Cancer heterogeneity results in variable prognosis in patients with COAD [5,6]. In spite of optimal operative treatment‚ the recurrence rate of COAD is high [6]. Hence, numerous researchers devoted themselves to exploring molecular biomarkers or circulating tumor biomarkers (especially circulating tumor DNA) for identifying minimal residual disease or early relapse [6,7]. Besides, better characterization of the transcriptomic subtypes and molecular risk stratifications of COAD has greatly improved our understanding of the molecular mechanism of COAD [5,8,9]. However, there were no recognized accurate clinical risk stratification methods and biomarkers for COAD [10]. It is of great importance to find out biomarkers or risk stratification models for COAD.
It has been speculated that cancer stem cells (CSCs) played a vital role in expansion, progression, relapse, and therapeutic resistance of solid malignancies [11,12]. Shiokawa et al. also demonstrated that the slow-cycling subpopulation of CSCs was rather significant in COAD development and chemoresistance [13]. Recently, a novel caner stemness index (named mRNAsi), which was developed by deep learning methods [14], has been extensively researched in various types of cancer, including bladder cancer [15], gastric carcinoma [16], lung cancer [17], and glioma [18]. However, there was not an integrated investigation of COAD stemness features using high-throughput data.
Weighted gene coexpression network analysis (WGCNA) is a recently widespread approach to estimate the distance between genes and identify network topologies [19,20]. This study utilized WGCNA to investigate cancer stemness features and identify cancer stemness-related genes. Besides, we developed a novel cancer stemnessrelated prognostic signature for COAD by univariate and multivariate Cox regression analysis. Internal and external validation of this signature was then performed. The relationship between cancer stemness and COAD immune microenvironment was particularly investigated.

Materials and Methods
2.1. Data Source. The transcriptome data and clinical data of COAD were obtained from TCGA database (https://portal .gdc.cancer.gov) [21]. We combined all these data into a matrix file utilizing Perl language (http://www.perl.org/). Ensembl IDs were converted into gene symbols using the Ensembl database (http://asia.ensembl.org/signature.html). We downloaded the gene expression-based stemness index, named mRNAsi, from Malta et al.'s study [14]. Besides, GSE17538 dataset and GSE39582 dataset were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov).

Relationships between Stemness Indices and
Clinicopathologic Feature. According to the median value of mRNAsi score, EREG-mRNAsi score, mDNAsi score, and EREG-mDNAsi score, all COAD cases in TCGA database were segmented into low-score group and high-score group, respectively. We then used R package "survival" and "survminer" to explore the difference of overall survival (OS) between the high and low stemness score groups. Besides, the relationships between stemness indices and clinicopathologic features, including age, gender, AJCC stage, T stage, N stage, and M stage, were also investigated using R package "beeswarm."

Relationships between Stemness Indices and Tumor
Microenvironment. Single-sample gene set enrichment analysis (ssGSEA) was performed to calculate the infiltrating score of 16 immune cells and the activity of 13 immune-related function using R package "gsva." The associations of the stemness indices with immune infiltration cells and immune function activity were then investigated.
There were various components in the tumor microenvironment including tumor cells, immune cells, stromal cells, and extracellular matrix [22]. The ESTIMATE algorithm, which was developed by Yoshihara et al., was recently widely used for exploring tumor microenvironment components [22][23][24]. We calculated stromal score, immune score, ESTI-MATE score, and tumor purity of COAD tumor microenvironment using the ESTIMATE algorithm. The correlation of COAD stemness indices with tumor microenvironment was then explored.

Coexpression Analysis Screening Stemness-Related Genes.
The coexpression network targeting these DEGs was established using "WGCNA" package. All paired genes adopted the average linkage method and Pearson's correlation matrices. Moreover, the coexpression similarity matrix was built using the absolute values of the correlations between transcription data. Besides, a power function a mn = jc mn j β (a mn = adjacency between gene m and gene n; c mn = Pearso n ' s correlation between gene m and gene n) was used to construct a weighted adjacency matrix. β, a soft thresholding parameter, was applied to emphasize strong relations between genes and penalize the weak correlation. Then, an appropriate power of β was selected based on the mean connectivity. Next, the network connectivity was measured by converting this adjacency into a topological overlap matrix (TOM). The TOM summed up the adjacent genes for the network gene ratio and calculated the corresponding dissimilarity. Modules were identified using the dynamic tree cut method and named using various colors. The main component for each module was defined as module eigengene (ME). We calculated the correlation between cancer stemness indices and each ME to identify the most significant module. Finally, the modules most highly correlated with mRNAsi and genes in this module were selected for further analysis.

Development of a Cancer Stemness-Related Novel
Prognostic Signature. We randomly divided all COAD cases in the GSE39582 cohort into train cohort and test cohort. Univariate and multivariate Cox regression analyses were conducted in the train cohort to develop a cancer stemness-related novel prognostic signature for predicting OS in patients with COAD. Next, all patients in the train cohort were split into two groups, low-risk group and high-risk group, according to median value of the risk score.

Internal and External Validation of This Cancer
Stemness-Related Signature. Firstly, we conducted survival analysis and time-dependent receiver operating characteristic (ROC) curve to validate the performance of this cancer stemness-related novel prognostic signature. Then, univariate and multivariate independent prognostic analyses were used to explore whether this signature was an independent risk factor for OS in COAD patients.

2
Stem Cells International    Stem Cells International Moreover, we utilized the test cohort to perform internal validation for this novel cancer stemness-related prognostic signature. The risk score for each patient in the test cohort was calculated using the risk formula. Next, TCGA cohort and GSE17538 cohort were used as external validation. We calculate the risk score for each case in TCGA cohort and GSE17538 cohort by the risk formula. Survival analysis was then performed.

2.7.
Associations of the Cancer Stemness-Related Signature with Immune Cell Infiltration and Immune Function. We calculated the infiltrating scores of 16 immune cells and the activity of 13 immune-related pathways of TCGA cohort and GSE39582 cohort using the ssGSEA method. Then, the associations of the cancer stemness-related signature with immune cell infiltration and immune-related pathway activity were explored.

Identification and Validation of Hub Biomarkers in
Multidatabase. We selected two genes (including NRP2 or ADAM12) as hub biomarkers for COAD cancer stemness features. The mRNA expression levels between normal and tumor tissues were demonstrated using the GEPIA (http:// gepia.cancer-pku.cn/signature.html) database. The UAL-CAN database (http://ualcan.path.uab.edu/) is a portal for facilitating tumor subgroup gene expression and survival analyses. Promoter methylation levels of hub genes were revealed using the UALCAN database. Survival analysis based on the UALCAN database was performed to evaluate the prognostic value of these two hub biomarkers.

Associations of COAD Stemness Features with
Clinicopathologic Feature. We eventually included 398 COAD samples with complete transcriptome data from TCGA database, 238 COAD cases with transcriptome data from GSE17538 dataset, and 585 COAD cases with transcriptome data from GSE39582 dataset. However, there were merely 385 COAD samples with clinical data in TCGA database and 232 cases with clinical data in GSE17538 dataset. The clinicopathologic data of TCGA cohort, GSE17538 cohort, and GSE39582 cohort are presented in Table 1. Survival curve suggested that the COAD patients with higher mRNAsi score or EREG-mRNAsi score have significantly longer overall survival compared with those with low mRNAsi score or EREG-mRNAsi score. However, there was no significant difference in overall survival between the high and low mDNAsi score and EREG-mDNAsi groups. Moreover, we found that mRNAsi score was significantly associated with AJCC stage and N stage while EREG-mRNAsi score was related to age. However, mDNAsi score was not related to any clinical features. Therefore, we  Stem Cells International selected mRNAsi to explore the cancer stemness of COAD in this study ( Figure 1).

Associations of COAD Stemness Features with Tumor
Immune Microenvironments. The ssGSEA analysis revealed that higher infiltrating percentages of B cells, DCs, iDCs, macrophages, mast cells, neutrophils, pDCs, T helper cells, Tfh, TIL, and Treg were significantly associated with low mRNAsi score of COAD. Besides, patients with lower mRNAsi have higher scores of immune function or pathways than those with higher mRNAsi, including APC coinhibition, APC costimulation, CCR, checkpoint, HLA, parainflammation, T cell coinhibition, T cell costimulation, type I IFN response, and type II IFN response. We also used the ESTIMATE algorithm to explore the COAD immune environment. The results showed that high mRNAsi score had an association with lower immune score, stromal score, and ESTIMATE score of COAD. However, high mRNAsi score had a positive association with higher tumor purity of COAD ( Figure 2).

WGCNA in Identifying Cancer Stemness-Related
Genes and Functional Enrichment. There were a total of 386 patients in TCGA database with complete transcriptome data and mRNAsi score. We divided all patients in TCGA database into the low mRNAsi score group (187 cases) and the high mRNAsi score group (199 cases). We identified 483 differently expressed genes between these two groups, including 480 downregulated DEGs and 3 upregulated DEGs. Downregulated DEGs represented genes highly expressed in the low mRNAsi score patients while upregulated DEGs represented genes highly expressed in the high mRNAsi score patients. These genes were imported into coexpression analysis. WGCNA analysis identified a total of 2 coexpression modules, which were shown using various colors. The correlation between each module eigengene (ME) and mRNAsi score was explored to identify the relevant modules. Finally, the turquoise module and blue module were considered as the most significant module and selected for further analysis. We set GS as >0.5 and MM > 0:8 and identified a total of 110 cancer stemness-related genes (Figures 3(a)-3(f)). Functional analysis demonstrated that these cancer stemness-related genes were mainly enriched in cellular response to vascular endothelial growth factor stimulus, face morphogenesis, kidney development, intramembranous ossification, heart development, NABA MATRISOME ASSOCIATED, NABA PROTEOGLYCANS, response to growth factor, cell-substrate adhesion, collagen metabolic process, sensory organ development, ossification, blood vessel development, endodermal cell differentiation, supramolecular fiber organization, collagen fibril organization, skeletal system development, NABA ECM GLYCOPRO-TEINS, NABA COLLAGENS, and NABA CORE MATRI-SOME (Figures 3(g) and 3(h)).

Development of a Novel Cancer Stemness-Based
Prognostic Signature. After we excluded those OS time < 30 days, 573 cases in GSE39582 dataset, 355 cases in TCGA database, and 229 cases in GSE17538 dataset were finally included in the development and validation of a novel stemness-based prognostic signature. The GSE39582 dataset has the maximum sample size among these three cohorts. Hence, we used the GSE39582 dataset to develop a cancer stemness-related signature for predicting prognosis of COAD patients. TCGA cohort and GSE17538 cohort were used for external validation. Firstly, we divided all samples in the GSE39582 dataset into train cohort and test cohort. There were a total of 288 cases in the train cohort and 285 cases in the test cohort. Multivariate Cox regression analysis was conducted to develop a novel signature using fifteen cancer stemness-related genes (including RAB31, COL6A3, COL5A2, CCDC80, ADAM12, VGLL3, ECM2, POSTN, DPYSL3, PCDH7, CRISPLD2, COLEC12, NRP2, ISLR, and CCDC8) for prognosis prediction of COAD using the train cohort. The calculation formula of risk score is shown as follows: Risk score = coef 1 * Exp1 + coef 2 * Exp2 + coef 3 * Exp3 + ⋯+coefx * Expx (Table 2).

Validation of This Novel Cancer Stemness-Based
Prognostic Signature. In the train cohort, the difference in    9 Stem Cells International OS between the low-risk and high-risk groups was statistically significant (P < 0:001); low-risk score was associated with significantly preferable OS in comparison with highrisk score. The area under the ROC curve (AUC) for OS prediction was 0.705, suggesting the promising value of this novel cancer stemness-based prognostic signature for prognosis prediction of COAD (Figure 4). Univariate independent prognostic analysis showed that age, stage, N stage, M stage, and this prognostic signature were associated with OS of COAD. Multivariate analysis demonstrated that age, N stage, M stage, and this prognostic signature were independent predictors for OS of COAD, indicating the great performance of this signature (P < 0:05, Table 3).
Moreover, internal and external validation was performed. We calculated the risk score of each patient in train cohort, test cohort, TCGA cohort, and GSE17538 cohort. Survival analysis revealed that the difference in OS between the low-risk and high-risk groups was statistically significant in the test cohort (P = 0:015), whole GSE39582 cohort (P < 0:001), TCGA cohort (P < 0:001), and GSE17538    11 Stem Cells International cohort (P = 0:029), respectively. The expression heat map, the distribution of risk score, and survival time of train cohort, test cohort, TCGA cohort, and GSE17538 cohort are presented in Figure 5.

Associations of Cancer Stemness-Based Prognostic
Signature with Immune Cell Infiltration and Immune Function. The immune functions and immune cell infiltra-tion were quantified using ssGSEA in TCGA cohort and GSE39582 cohort. For the GSE39582 cohort, the infiltrating proportion of DCs, macrophages, mast cells, neutrophils, and T helper cells in the high-risk group was significantly increased compared with that in the low-risk group; patients with high-risk score have higher scores of immune function or pathways than those with low-risk score including APC costimulation, CCR, parainflammation, and type II IFN ). For TCGA cohort, the infiltrating proportion of macrophages, mast cells, pDCs, and T helper cells in the low-risk group was significantly decreased in comparison with that in the high-risk group; the infiltrating proportion of Th2 cells in the low-risk group was significantly increased in comparison with that in the high-risk group; patients with low-risk score have lower scores of immune function or pathways than those with high-risk score including APC costimulation, HLA, type I IFN response, and type II IFN response (Figures 6(c) and 6(d)). These results showed that macrophages, mast cells, and T helper cells were the vital infiltration immune cells and associated with this cancer stemness-related signature and that APC costimulation and type II IFN response were the vital immune functions and associated with this cancer stemness-related signature.

Validation of Two Hub Genes Significantly Associated with Stemness
Features. It was also found that the mRNA expression level of NRP2 was significantly decreased in COAD tissues compared with that in normal tissues while the mRNA expression level of ADAM12 was significantly increased in COAD tissues compared with that in normal tissues. Moreover, the promoter methylation level of NRP2 and ADAM12 was significantly higher in COAD tissues compared with that in normal tissues. Survival analysis reveals that the OS of patients with high expression of NRP2 or ADAM12 was decreased significantly compared with those with low expression (Figure 7).

Discussions
Numerous studies have revealed that tumors harbored subclones with respect to the sensitivity of various therapy methods, and CSCs were thought to be determining intratumor heterogeneity [25]. Besides, CSCs have been reported to have a vital role in cancer initiation, progression, and chemoresistance, and epigenetic alterations have been regarded as brilliant factors related to CSC phenotype [12]. For example, La Noce et al. revealed for the first time that HDAC2 could serve as a key factor in regulating CSC phenotype and a potential therapeutic target in human osteosarcoma [12]. Over the past decade, colon CSCs also have been reported to be critical in COAD proliferation, invasion, recurrence, and chemoresistance [26]. A deep understanding into cancer stemness features of COAD would contribute to making clinical decision and improving outcomes once CSC targeted therapy is available [27]. In this study, comprehensive bioinformatics analysis was performed to identify stem cell-associated genes for discovering the underlying compounds and developing stemness-related signature for predicting prognosis in COAD. In particular, we explored the association between COAD stemness and tumor immune environment.
In recent years, the tumor immune microenvironment was found to be vital in maintaining tumor stemness [28]. Crespo et al. suggested in their study that introducing/promoting T cell stemness and targeting T cell dysfunctional mechanisms would be of great importance in treating malignant tumors [29]. Su et al. demonstrated that targeting the CD10+GPR77+ cancer-associated fibroblast subpopulation could be an effective therapeutic strategy for CSC-driven solid tumors [30]. In this study, based on the ESTIMATE tumor immune environment evaluation algorithm, we revealed that high mRNAsi score had an association with lower immune score, stromal score, and ESTIMATE score of COAD. Besides, the ssGSEA revealed that macrophages, mast cells, and T helper cells were the vital infiltration immune cells and APC costimulation and type II IFN response were the vital immune pathways in COAD. Previous study also revealed the close associations of COAD stemness with tumor environment and immune cells. Fang et al. demonstrated that IL33 played a vital role in accelerating COAD cell stemness via activating JNK and recruiting macrophages [31]. Hwang et al. indicated that cancer stem-like cell could prime neutrophils for facilitating the occurrence and development of COAD [32]. Yu et al. found that mast cells might be vital in promoting COAD genesis and progression via bidirectional crosstalk [33]. Perez et al. reported that TGF-β signaling in Th17 cells was crucial in accelerating IL-22 production and colitis-associated COAD [34]. However, exploration of APC costimulation and type II IFN response in COAD developments is rare. Further research is required.       16 Stem Cells International Moreover, an interesting signature related to cancer stemness of COAD was developed in this study. There have been several prognostic signatures of COAD in previous study. Xu et al. identified a signature using 15 genes based on support vector machine, which could be used to classify COAD patients with various outcomes [10]. Yin et al. established an effective lncRNA signature based on the genome instability as a potential tool for predicting survival of COAD [35]. Nie et al. developed a ferroptosis-related prognostic signature to predict survival of COAD and found that STING might be a new promising immune target [35]. Jiang et al. constructed a vital signature as a novel marker to pre-dict COAD prognosis using lipid metabolism-related genes [36]. Chang et al. used five RNA-binding proteins to develop a superior prognostic and diagnostic signature, which provided new possibilities for individualized treatment of COAD patients [37]. However, no previous studies constructed prognostic signature of COAD from the point of cancer stemness. In this study, we developed a novel cancer stemness-related prognostic signature for COAD using bioinformatics methods for the first time. Univariate and multivariate independent prognostic analyses demonstrated that this cancer stemness-related signature was a vital independent predictor for OS of COAD . Internal and external   Type  RAB31  COL6A3  COL5A2  CCDC80  ADAM12  VGLL3  ECM2  POSTN  DPYSL3  PCDH7  CRISPLD2  COLEC12  NRP2  ISLR  CCDC8   0  50  100  150 ns  ns ns  ns  ns  ns  ns  ns  ns  ns  ns  ⁎  ⁎⁎⁎ ⁎⁎⁎ ⁎⁎⁎  ⁎⁎⁎  ns  ns ns s s s s  ns  ns  ns  ns  ns  ns  ns  ns  ⁎  ⁎⁎⁎ ⁎⁎⁎  18 Stem Cells International validation indicated the potential role of this cancer stemness-related signature. Finally, we identified two cancer stemness-related genes as potential hub biomarkers for COAD, including NRP2 and ADAM12. Fujii et al. found that miR-331-3p could promote the differentiation of keratinocyte through inhibiting NRP2 in cervical cancer cell [38]. Wang et al. revealed that circ-LDLRAD3 was important in promoting the progression of gastric cancer by regulating the miR-224-5p/NRP2 axis [39]. Polavaram et al. demonstrated that targeting NRP2 in prostate cancer might be beneficial in treating bone metasta-sis [40]. Huang et al. reported that ADAM12 and lnc015192 could serve as a ceRNA by regulating miR-34a in breast cancer [41]. Veenstra et al. proved that ADAM12 could serve as a serum marker for stromal activation and predict chemotherapy sensibility in pancreatic cancer [42]. Shimura et al. found that ADAM12 and MMP-9/NGAL complex in urine could serve as a detective biomarker for gastric cancer [43]. However, researches focusing on the vital role of NRP2 and ADAM12 in COAD were rare. Further exploration into the mechanism of NRP2 and ADAM12 in COAD is required. However, several inescapable limitations should be noted in this study. The retrospective nature and merely bioinformatics analysis are the major weaknesses of this study. Prospective sequencing data is required. Secondly, these 15 cancer stemness-related genes might be of great importance in COAD tumorigenesis. Further in vitro and in vivo experiment into the underlying mechanism should be performed.

Conclusion
In our study, we identified differently expressed genes involved in cancer stemness for COAD by comprehensive bioinformatics analysis. More importantly, we successfully developed and validated a novel cancer stemness-related prognostic signature for COAD, which would contribute to further understanding of molecular mechanism in COAD.

Data Availability
All data generated or analyzed during the present study were downloaded from TCGA database and GEO database.

Consent
Consent is not applicable.