COL8A1 Predicts the Clinical Prognosis of Gastric Cancer and Is Related to Epithelial-Mesenchymal Transition

Background. Gastric cancer (GC) is the fifth most common malignant tumor and the third leading cause of cancer-related deaths. Because GC is highly heterogeneous, its mechanism is unclear, treatment options are limited, and the five-year survival rate is low, it is necessary to find prognostic biomarkers of GC and to explore its mechanism. Methods. We first identified differentially expressed genes (DEGs) between gastric cancer and normal gastric cells through expression analysis. A protein-protein interaction (PPI) network was constructed to find tightly connected modules. We performed survival analysis on the DEGs in the modules to identify genes with prognostic significance. Gene set enrichment analysis (GSEA) was used to identify enriched pathways. Finally, we performed immunohistochemical (IHC) staining on our own clinical samples, 119 gastric adenocarcinoma (STAD) tissues and 40 normal gastric tissues, to verify the differential expression of COL8A1 between STAD and normal gastric tissues and its correlation with epithelial-mesenchymal transition- (EMT-) related factors. Results. We identified 356 DEGs through differential expression analysis. Through PPI analysis and survival analysis, we determined that the collagen type VIII alpha-1 chain (COL8A1) gene has prognostic significance. GSEA showed that COL8A1 was significantly enriched in the EMT pathway. IHC results showed that COL8A1 was upregulated in STAD tissues, could be used as an independent prognostic factor, and was related to EMT. Conclusion. This study shows that COL8A1 is related to the prognosis of GC patients and might affect the progression of GC through the EMT pathway. Therefore, COL8A1 may be a biomarker for predicting the prognosis of GC.


Introduction
Worldwide, gastric cancer (GC) is the fifth most common malignant tumor and the third leading cause of cancer-related deaths [1]. Although progress has been made in surgery, chemotherapy, targeted therapy, and immunotherapy, the 5-year overall survival (OS) rate of GC is only 20% because sensitive and specific biomarkers are lacking and the disease is often advanced at diagnosis [2,3]. Stomach adenocarcinoma (STAD) is the main type of gastric cancer [4]. Therefore, it is extremely important to explore the mechanisms underlying the occurrence and development of gastric cancer and to seek new potential biomarkers for early diagnosis and prognostic evaluation.
Type VIII collagen was originally identified as a biosynthesis product of bovine aorta and rabbit corneal endothelial cells. Collagen type VIII alpha-1 chain (COL8A1) is responsible for encoding the type VIII collagen α1 chain and plays a role in the proliferation and migration of different cells [5]. COL8A1 has been implicated in vascular injury, angiogenesis, and protumorigenic processes. COL8A1 is involved in the angiogenesis of certain brain tumors [6]. Silencing of COL8A1 significantly inhibited the proliferation and invasion of hepatocellular carcinoma cell lines and increased the sensitivity to D-limonene in the treatment of hepatocellular carcinoma [7]. Vastatin, a fragment of collagen type VIII, is increased in the serum of colorectal cancer patients and is associated with stromal responses [8]. However, the role of COL8A1 in gastric cancer remains unclear.
Epithelial-mesenchymal transition (EMT) is the process by which epithelial cells transform into cells with a mesenchymal phenotype, which is associated with tumor invasion and metastasis [9]. EMT contributes to the transition of gastric cancer from early to mid-late stage because it influences the aggressiveness of gastric cancer cells [10]. A variety of factors can affect the EMT process of the tumor either directly or through crosstalk. For example, multiple intracellular signaling pathways coordinate to induce EMT, and various factors secreted by cells in the tumor microenvironment can induce EMT [11]. Upregulation of vimentin and downregulation of E-cadherin are hallmarks of EMT [12,13].
In recent years, bioinformatics has brought a turning point in tumor research. It facilitates the collection and organization of tumor research results from different perspectives and allows investigators to build various databases with different functions according to different needs [14]. The Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) are commonly used databases in cancer research. There have been many studies using bioinformatics to analyze gene expression and clinical characteristics to identify molecular biomarkers, predict prognosis, or predict drug resistance [15-17].
In this study, we downloaded four STAD-related gene sets from GEO and sequencing data from TCGA. Differentially expressed genes (DEGs) were identified by R software. Then, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and protein-protein interaction (PPI) network analyses were performed. Finally, we identified the COL8A1 gene as having prognostic significance through survival analysis. We evaluated COL8A1 enrichment in the EMT pathway through gene set enrichment analysis (GSEA). To clarify the prognostic significance and possible carcinogenic mechanism of COL8A1 in GC, we used immunohistochemistry and tissue chip technology to verify the prognostic significance of COL8A1 and its relationship with EMT.

Data Processing of Microarray Datasets.
In R, the "limma" package was used to screen for DEGs [18]. The "RobustRankAggreg" package was used to identify DEGs common to the four datasets. Because this robust rank aggregation (RRA) method is based on the null hypothesis of irrelevant input, its screening results are reported to improve on those of prior methods [19]. The selection criteria for DEGs were |log2 fold change| ≥ 1 and P value < 0.05.
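The selection criteria above can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce limma or RobustRankAggreg; the function name `select_degs`, its parameters, and the input layout (gene symbol mapped to a log2 fold change and a P value) are assumptions made for illustration only.

```python
def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    `results` maps a gene symbol to a (log2 fold change, P value) pair.
    A gene is kept when |log2 fold change| >= lfc_cutoff and P < p_cutoff.
    """
    up, down = [], []
    for gene, (log2fc, p_value) in results.items():
        if p_value >= p_cutoff or abs(log2fc) < lfc_cutoff:
            continue  # fails the significance or the effect-size criterion
        (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Hypothetical values for illustration only (not results from this study).
example = {
    "GENE_A": (2.3, 0.001),   # kept, upregulated
    "GENE_B": (-1.4, 0.01),   # kept, downregulated
    "GENE_C": (0.6, 0.0001),  # significant, but fold change too small
    "GENE_D": (1.8, 0.20),    # large fold change, but not significant
}
up_genes, down_genes = select_degs(example)
```

Both thresholds are exposed as parameters so that the same filter can be reused for the TCGA validation step.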

Validation of DEGs.
The RNA-sequencing data obtained from TCGA were used to verify the results of the integrated analysis of the GEO datasets. Genes with |log2 fold change| ≥ 1 and P value < 0.05 were considered statistically significant. We retained the DEGs that overlapped between the TCGA RNA-sequencing analysis and the GEO integrated analysis for further analysis.
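The overlap step can be sketched as follows. The function name `overlapping_degs` and the input layout are hypothetical, and requiring the same direction of change in both analyses is an assumption of this sketch that the text does not state.

```python
def overlapping_degs(geo_degs, tcga_degs):
    """Return genes called DEGs in both analyses with the same direction.

    Each argument maps a gene symbol to "up" or "down".
    Assumption: a gene counts as overlapping only if the direction agrees.
    """
    return {
        gene: direction
        for gene, direction in geo_degs.items()
        if tcga_degs.get(gene) == direction
    }


# Hypothetical calls for illustration only.
geo = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
tcga = {"GENE_A": "up", "GENE_B": "up", "GENE_D": "down"}
shared = overlapping_degs(geo, tcga)  # only GENE_A agrees in both analyses
```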

GO and KEGG Enrichment Analyses of DEGs.
We used the "clusterProfiler" package in R to perform GO and KEGG analyses on the overlapping DEGs and visualized the enriched cellular components (CC), biological processes (BP), molecular functions (MF), and pathways.
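Over-representation analyses of this kind are commonly based on a hypergeometric test. Assuming that model, the P value for a single GO term or KEGG pathway can be sketched in Python as below. The function name `enrichment_p_value` and its arguments are hypothetical, and the sketch omits the multiple-testing correction that a full analysis would apply.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of genes in the background
    K: background genes annotated to the term
    n: number of DEGs tested
    k: DEGs annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of seeing k, k + 1, ..., upper annotated DEGs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers for illustration: 10 background genes, 4 annotated,
# 3 DEGs of which 2 are annotated.
p = enrichment_p_value(k=2, n=3, K=4, N=10)
```

A small P value indicates that the term contains more DEGs than expected by chance given the background.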

PPI Analysis.
STRING (https://string-db.org/) is a database for exploring known and predicted protein-protein interactions. To evaluate the interactions among these DEGs, we mapped the DEGs to STRING, retained PPIs with confidence scores ≥ 0.7, and imported them into Cytoscape software. We used the cytoHubba application in Cytoscape to build a PPI network. The Cytoscape Molecular Complex Detection (MCODE) plug-in was used to select the most closely connected module from the PPI network, with the filter condition set to degree cutoff = 2, for further functional analysis [20].
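The thresholding and degree-counting steps can be sketched as below. This is an illustration only: it does not call STRING, Cytoscape, cytoHubba, or MCODE, and it does not implement MCODE's module detection. The function names, the protein labels, and the assumption that confidence scores are on a 0 to 1 scale are all hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_confidence=0.7):
    """Keep interactions at or above the confidence cutoff.

    `edges` is an iterable of (protein_a, protein_b, confidence) tuples,
    with confidence assumed to lie between 0 and 1.
    Returns a mapping from each protein to the set of its neighbors.
    """
    neighbors = defaultdict(set)
    for a, b, score in edges:
        if score >= min_confidence and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degree_ranking(neighbors):
    """Rank nodes by degree (highest first), breaking ties by name."""
    return sorted(
        ((node, len(adjacent)) for node, adjacent in neighbors.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical interactions for illustration only.
edges = [("P1", "P2", 0.90), ("P1", "P3", 0.75),
         ("P2", "P3", 0.40), ("P3", "P4", 0.70)]
ranking = degree_ranking(build_network(edges))
```

Proteins whose interactions all fall below the cutoff never enter the mapping, which mirrors the exclusion of unconnected nodes from the network.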

Survival Analysis.
Clinical information on 326 GC patients was used for survival analysis. We used the "survival" package to analyze the clinical information of these patients and to find genes that were closely related to survival.
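The survival curves behind this kind of analysis are typically Kaplan-Meier estimates. The sketch below shows the estimator itself in plain Python. It does not reproduce the R "survival" package or the group comparison, and the function name `kaplan_meier` and the input encoding (1 = death observed, 0 = censored) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up data for illustration only.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the curve only steps down at observed deaths.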

Gene Set Enrichment Analysis.
Based on the selected candidate genes, we performed GSEA using GSEA software to determine the potential function of the candidate genes. The enrichment score indicated the degree of enrichment of genes in the pathway. The annotated
Because the patients were anonymous, this study was exempt from signed informed consent.

Preparation of Tissue Microarray.
Formalin-fixed and paraffin-embedded (FFPE) blocks and corresponding hematoxylin and eosin (H&E) sections of all cases were examined by two experienced pathologists. Two pathologists reevaluated the STAD specimens based on the 2010 Eighth Edition of the American Joint Committee on Cancer (AJCC) Tumor, Lymph Node, and Metastasis (TNM) classification. In addition, the pathologists circled the STAD and normal gastric tissue areas on the H&E section of the corresponding FFPE block.

Immunohistochemistry (IHC).
Based on the circled area of the H&E slice, we took the tumor core from the corresponding FFPE block and transferred it to the blank FFPE block to construct a tissue microarray. Then, 4 μm sections were used for immunohistochemistry.
The sections were dewaxed in xylene and hydrated via treatment with increasingly dilute ethanol solutions. The sections then were placed in ethylenediaminetetraacetic acid (EDTA) stock solution (1:50 dilution, pH = 9.0) for high temperature and pressure antigen retrieval. The sections were incubated with a peroxidase blocking agent for 10 minutes to block endogenous peroxidase activity and blocked with normal unimmunized animal serum according to the manufacturer's protocol. Anti-COL8A1 antibody was diluted 1:100 after which the sections were incubated with

Identification of DEGs.
Detailed information on the four GEO datasets included in our study is shown in Table 1.
After a comprehensive analysis of the four datasets, a total of 528 DEGs were obtained, including 195 upregulated genes

GO and KEGG Enrichment Analyses.
We performed GO and KEGG pathway enrichment analyses on the 356 overlapping DEGs. In the CC (Figure 1(a)) and BP (Figure 1(b)) categories, the DEGs were closely related to the collagen-rich extracellular matrix. In the MF category, the DEGs were significantly enriched in extracellular matrix structural constituent and extracellular matrix structural constituent conferring tensile strength (Figure 1(c)). These results indicate that the DEGs are closely related to collagen components of the extracellular matrix, and studies have shown that collagen itself participates in many aspects of tumor transformation [21]. KEGG pathway analysis indicated that the DEGs were significantly enriched in protein digestion and absorption and in metabolism of xenobiotics by cytochrome P450 (Figure 1(d)). Together, these results show that the DEGs were enriched in pathways involved in the occurrence and development of GC [22,23].

PPI Analysis.
We analyzed the PPI network of overlapping DEGs to identify key genes and their interactions in the progression of gastric cancer. There were 117 nodes in the PPI network, and we excluded the unconnected nodes. The nodes' degrees were calculated to identify candidate central nodes (Figure 2(a)). Then, we used the MCODE plug-in to select the most closely connected module from the constructed PPI network for further functional analysis. The results showed that the most compact module in the cluster contained 16 genes, namely, COL10A1, COL6A5, SERPINH1, COL5A2, THBS2, COL5A1, BGN, COL4A1, COL3A1, COL11A1, COL12A1, SPARC, COL1A2, SPP1, COL8A1, and COL1A1 (Figure 2(b)).

Survival Analysis.
We performed Kaplan-Meier (KM) curve analysis on the above 16 genes and identified 9 genes with prognostic significance (Table 2). Among these 9 genes, the expression of COL10A1, SPP1, and COL8A1 varied considerably (Table 2). In addition, we found no large-scale cohort study demonstrating the prognostic value of COL8A1 in STAD patients. Therefore, we chose COL8A1 for further verification. We used the clinical information downloaded from TCGA to analyze the correlation between COL8A1 expression and the clinicopathological characteristics and molecular subtypes of STAD patients. Our results show that the genomically stable (GS) subtype has the highest correlation with COL8A1 expression (Figure 3(a)). Analysis of clinicopathological characteristics indicated that COL8A1 expression was associated with pathological grade and tumor stage, but not with age, gender, or lymph node status (Figures 3(b)-3(g)).

COL8A1 Was Highly Expressed in STAD and Associated with Poor Prognosis.
Based on the above analysis, we performed IHC staining of COL8A1 protein on our own collected specimens to explore the expression of COL8A1 in STAD. COL8A1 was highly expressed in STAD and weakly expressed in normal gastric tissues (Table 3, Figure 4(a)). Table 4 shows the relationship between COL8A1 and the clinicopathological characteristics of the patients. The results indicate that COL8A1 expression was related to tumor stage and lymph node status. Based on COL8A1 expression, we divided all patients into a positive expression group and a negative expression group. Survival analysis showed that patients with high COL8A1 expression had poorer survival (Figure 4(b)). These results show that COL8A1 is associated with the progression of GC and could be used to predict prognosis.

Gene Set Enrichment Analysis of COL8A1.
To investigate the carcinogenic mechanism of COL8A1 in GC, we used GSEA to analyze the signaling pathways enriched in samples with high COL8A1 expression. Twenty gene sets were identified (FDR < 0.25); ranked by enrichment score, the top six were "epithelial mesenchymal transition," "angiogenesis," "myogenesis," "hedgehog signaling," "uv response dn (uv response down)," and "coagulation" (Figure 5). These gene sets are all closely related to tumor development [24-26].
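The enrichment score used to rank these gene sets is conventionally a weighted running-sum statistic computed over a ranked gene list. The Python sketch below illustrates that statistic under stated assumptions: it is not the GSEA software, it omits the permutation testing and FDR estimation, and the function name `enrichment_score`, the `weight` parameter, and the toy gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: list of (gene, score) pairs sorted from the most
                  positively to the most negatively associated gene
    gene_set:     set of gene symbols belonging to the pathway
    Returns the running-sum value with the largest absolute deviation
    from zero.
    """
    in_set = [gene in gene_set for gene, _ in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    hit_norm = sum(
        abs(score) ** weight
        for (_, score), hit in zip(ranked_genes, in_set) if hit
    )
    if n_hits == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partially overlap the ranked list")

    running = 0.0
    best = 0.0
    for (_, score), hit in zip(ranked_genes, in_set):
        if hit:
            running += abs(score) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                     # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking for illustration only.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0), ("g5", -2.0)]
top_heavy = enrichment_score(ranked, {"g1", "g2"})
```

A gene set concentrated at the top of the ranking yields a score near 1, whereas one concentrated at the bottom yields a score near -1.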

The Relationship between COL8A1 and EMT-Related Protein Expression.
Based on the GSEA results, COL8A1 had the highest enrichment score in the EMT pathway. Therefore, we performed IHC staining of EMT-related proteins to explore the relationship between COL8A1 and EMT. The expression of E-cadherin and vimentin proteins is shown in Figures 4(c) and 4(d). The correlation analysis showed that the expression of the E-cadherin protein was negatively correlated with the expression of the COL8A1 protein, and the expression of the vimentin protein was positively correlated with the expression of the COL8A1 protein (Table 5). These results suggest that COL8A1 might promote the progression of STAD through the EMT pathway. The combined prognostic survival analysis showed that COL8A1 and vimentin could be used in combination to predict the prognosis of GC patients (Figures 4(e) and 4(f)). In addition, the expression of E-cadherin and vimentin had no relationship with the clinicopathological characteristics of patients (Supplementary Table 1 and Table 2). We also analyzed the effect of E-cadherin and vimentin protein expression on patient survival, but the difference was not statistically significant (Supplementary Figure 1A-B).
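The text does not name the statistic used for this correlation analysis. As one common choice for ordinal IHC scores, a Spearman rank correlation can be sketched in Python as below; the choice of Spearman, the function names, and the toy scores are assumptions of this sketch, not the authors' stated method.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average position of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical staining scores for illustration only.
col8a1_scores = [0, 1, 2, 3]
ecadherin_scores = [3, 2, 1, 0]
rho = spearman(col8a1_scores, ecadherin_scores)  # perfectly inverse ranks
```

A negative coefficient corresponds to the inverse relationship described for E-cadherin, and a positive coefficient to the direct relationship described for vimentin.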

Discussion
GC is one of the most common causes of cancer death worldwide. Early diagnosis, timely treatment, and prognosis assessment are of great significance for increasing patient survival rates. In our study, a total of 356 differentially expressed genes were obtained through differential gene expression analysis. To explore the biological functions of these DEGs, we performed GO and KEGG enrichment analyses. The results showed that the DEGs were mainly related to extracellular matrix components and the physiological functions of the stomach. We identified the most closely connected module, containing 16 genes, from the PPI network. To find biomarkers related to the prognosis of GC, we performed survival analysis on these 16 genes. We then selected COL8A1, which was related to the prognosis of GC, for subsequent analysis. GSEA showed that COL8A1 had the highest enrichment score in the EMT pathway. Based on the results of this bioinformatics analysis, we used our own clinically collected gastric cancer samples for immunohistochemical verification to explore the expression of COL8A1 in GC patients and its relationship with EMT.
COL8A1 encodes the α1 chain of type VIII collagen and is involved in the formation of the vascular endothelium [27]. Recent research has shown that COL8A1 is dysregulated in a variety of cancers. COL8A1 may affect the progression of colorectal cancer and the prognosis of patients by regulating focal adhesion-related pathways, and the expression of COL8A1 in colorectal cancer is related to the expression of Wnt2 and is linked to poor patient survival [28,29]. COL8A1 may promote breast cancer migration by affecting ECM receptor interactions and cooperating with other genes [30]. There are very few studies of COL8A1 in gastric cancer. Experiments on gastric cancer cells show that silencing COL8A1 can markedly inhibit cell proliferation, migration, and invasion in GC [31]. We have verified through immunohistochemical analysis that the expression of COL8A1 in GC tissues is distinctly higher than that in normal tissues and that COL8A1 is significantly related to pathological T staging and lymph node metastasis, which suggests that COL8A1 expression may play a part in the progression of gastric cancer. Our survival analysis results showed that COL8A1 could be a biomarker for predicting the prognosis of GC.
Twenty-eight types of collagen are currently known, and they are divided into four families based on their supramolecular structure [32]. Several collagen genes take part in the regulation of EMT in tumors. The downregulation of COL11A1 may affect the migration and invasion cascade of esophageal squamous cell carcinoma (ESCC) through the downregulation of EMT [33]. COL6A3 silencing suppresses the expression of MMP-2, MMP-9, and vimentin and thereby contributes to the inhibition of EMT in bladder cancer cells [34]. Knocking out COL1A1 can inhibit hepatocellular carcinoma cell migration and invasion through its effect on EMT in vitro [35]. The upregulation of COL2A1 may be a biomarker for partial EMT [36].

BioMed Research International
In bladder cancer, COL1A1 is upregulated. When COL1A1 was knocked down, the EMT process and apoptosis were inhibited [37]. In addition, COL10A1 has a molecular structure similar to COL8A1, and COL10A1 may be an effective costimulator of TGF-β1-induced EMT in GC [38]. However, there is currently no study in GC on the relationship between COL8A1 and EMT. Our IHC staining results indicated that COL8A1 was negatively correlated with E-cadherin protein expression and positively correlated with vimentin protein expression. The combined prognostic analysis of COL8A1, E-cadherin, and vimentin indicated that COL8A1 may affect the prognosis of patients through EMT. EMT can promote tumor occurrence, invasion, and metastasis [11]. Many genes can influence the occurrence and development of tumors through EMT and thus affect the prognosis [39-41]. Taken together with the above research on the collagen family and EMT, our experimental results indicate that COL8A1 might also affect the progression of GC through EMT.
There are some limitations to this study. First, we had very few STAD tissue samples for IHC experiments, so more clinical tissue samples are needed for validation. Second, the mechanism by which COL8A1 affects the occurrence and development of GC through EMT remains unclear and needs to be verified by further in vitro and in vivo experiments.

Conclusions
In summary, this study showed that COL8A1 was upregulated in GC and related to the prognosis of GC patients, indicating that COL8A1 could be a biomarker for predicting the prognosis of GC. In addition, COL8A1 may affect the progression of GC through EMT.

GC: Gastric cancer
OS: Overall survival
STAD: Stomach adenocarcinoma
COL8A1: Collagen type VIII alpha-1 chain
EMT: Epithelial-mesenchymal transition
GEO: Gene Expression Omnibus
TCGA: The Cancer Genome Atlas