Identification of the Key Genes and Pathways in Esophageal Carcinoma

Objective. Esophageal carcinoma (EC) is a frequently common malignancy of gastrointestinal cancer in the world. This study aims to screen key genes and pathways in EC and elucidate the mechanism of it. Methods. 5 microarray datasets of EC were downloaded from Gene Expression Omnibus. Differentially expressed genes (DEGs) were screened by bioinformatics analysis. Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, and protein-protein interaction (PPI) network construction were performed to obtain the biological roles of DEGs in EC. Quantitative real-time polymerase chain reaction (qRT-PCR) was used to verify the expression level of DEGs in EC. Results. A total of 1955 genes were filtered as DEGs in EC. The upregulated genes were significantly enriched in cell cycle and the downregulated genes significantly enriched in Endocytosis. PPI network displayed CDK4 and CCT3 were hub proteins in the network. The expression level of 8 dysregulated DEGs including CDK4, CCT3, THSD4, SIM2, MYBL2, CENPF, CDCA3, and CDKN3 was validated in EC compared to adjacent nontumor tissues and the results were matched with the microarray analysis. Conclusion. The significantly DEGs including CDK4, CCT3, THSD4, and SIM2 may play key roles in tumorigenesis and development of EC involved in cell cycle and Endocytosis.


Introduction
Esophageal carcinoma (EC) is the sixth leading cause of cancer mortality in males and the ninth leading cause of cancer mortality in females in 2012 worldwide [1]. The highest incident rates of EC are found in Eastern Asia, Southern Africa, and Eastern Africa and the lowest incidence rate of EC is found in Western Africa [1]. Esophageal carcinoma is usually 3 to 4 times more common among men than women. The 5-year overall survival ranges from 15% to 25% [2]. In China, it is predicted that EC is the fourth leading cause of cancer deaths in males and females after lung and bronchus, stomach, and liver in 2015 [3].
EC is classified as esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) according to histological type and ESCC is the predominant histological type of EC in the world [2]. It is reported that tobacco consumption, alcohol consumption, and low intake of fruits and vegetables are major risk factors for ESCC [4]. Overweight, obesity, gastroesophagus reflux disease (GERD), and Barrett's esophagus increase incidence risk of EAC [1,5].
In addition to the above-mentioned environmental factors, abnormal expression of miRNA and genes and methylation of genes and SNPs are associated with EC tumorigenesis and development. miR-219-1 rs107822G > A polymorphism might significantly decrease ESCC risk through changing individual susceptibility to Chinese Kazakhs [5]. The cases carrying the GG variant homozygote have a significant 2.81fold increased risk of EC [6]. miR-330-3p promotes cell growth, cell migration, and invasion and inhibits cisplatininduced apoptosis in ESCC cells via suppression of PDCD4 expression [7]. miR-199a-5p downregulation contributes to enhancing EC cell proliferation through upregulation of mitogen-activated protein kinase kinase kinase-11 [8]. DACT2 is frequently methylated in human esophageal cancer; methylated DATC2 accelerates esophageal cancer development by activating Wnt signaling [9]. RUNX3 methylation is associated with an increased risk, progression, and poor survival in EC [10].
Currently, the molecular mechanism of EC was unclear. In this study, we used bioinformatics methods to analyze the mRNA expression data of EC, which were available on the GEO database, to identify key genes and pathways in EC, aiming to provide valuable information for further pathogenesis mechanism elucidation and provide ground work for therapeutic targets identification for EC.

Expression Profile Microarray.
Gene expression profiles data were downloaded from the Gene Expression Omnibus (GEO) data repository (http://www.ncbi.nlm.nih.gov/geo/). The datasets of patients receiving preoperative treatment before oesophagectomy and cell lines receiving drug stimulus were excluded. Total of 5 mRNA expression datasets of EC tissues/cell lines comprising GSE53625, GSE33810, GSE17351, GSE9982, and GSE12737 were included in our study.

Identification of DEGs.
The raw data of the mRNA expression profiles were downloaded and analyzed by R language software [11]. Background correction, quartile data normalization, and probe summarization were applied for the original data. The limma [12] method in Bioconductor (http://www.bioconductor.org/) was used to identify genes which were differentially expressed between EC and normal controls; the significance of DEGs was calculated by t-test and was represented by value. To reduce the risk of false positives, values were adjusted for multiple testing using the Benjamini-Hochberg False Discovery Rate (FDR) method. The corrected value was represented by FDR [13]. FDR < 0.05 were considered as the cutoff values for DEG screening.

Gene Ontology
Analysis. GO is a useful tool for collecting a large number of gene annotation terms [14]. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) [15], is bioinformatics resources consisting of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological functional annotation from large gene/protein lists, such as being derived from high-throughput genomic experiments. To gain the in-depth understanding of the biological functions of DEGs, DAVID tool was used to obtain the enriched GO terms of DEGs based on the hypergeometric distribution to compute values, which were corrected by the Benjamini and Hochberg FDR method for multiple hypothesis testing. FDR < 0.05 was set as the threshold value.

KEGG Enrichment
Pathways. KEGG is a database resource for understanding functions of genes list from molecular level [16]. GeneCoDis3 is a valuable tool to functionally interpret results from experimental techniques in genomics [17]. This web-based application integrates different sources of information for finding groups of genes with similar biological meaning. The enrichment analysis of GeneCoDis3 is essential in the interpretation of high-throughput experiments. In the study, GeneCoDis3 software was used to test the statistical enrichment of DEGs in KEGG pathways. < 0.05 was set as the threshold value.
2.5. PPI Interaction Network. The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid .org/) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species including budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe, and the model plant Arabidopsis thaliana. In a word, BioGRID is a depository for genetic and protein interactions based on experimental verification [18]. The top 10 upregulated genes and top 10 downregulated genes between EC and normal controls were subjected to BioGRID database to get the predicted PPIs of these DEGs. The PPIs were visualized in Cytoscape [17].

qRT-PCR Validation.
Total RNA of fresh paired EC tumor and adjacent nontumor specimens were extracted using TRIzol reagent (Invitrogen, CA, USA). The SuperScript III Reverse Transcription Kit (Invitrogen, CA, USA) was used to synthesize the cDNA. qRT-PCR reactions were performed using Power SYBR Green PCR Master Mix (Applied Biosystems, Foster City, CA) on the Applied Biosystems 7500 (Foster City, CA, USA). -actin was used as internal control for mRNA detected. The relative expression of genes was calculated using the comparative Ct methods [19]. The PCR primers were used as shown in supplementary

Identification of DEGs.
Five mRNA expression profiles including 208 EC samples and 195 normal controls were downloaded and analyzed, as shown in Table 1. 208 EC samples comprised 207 squamous cell carcinoma samples and 1 adenocarcinoma sample. 1955 DEGs were identified in EC compared to normal control, including 919 upregulated and 1036 downregulated genes. The top 10 significantly upregulated and downregulated genes were listed in Table 2. The most significantly up-and downregulated genes were CDK4 and THSD4, respectively. The full list of DEGs in EC was shown in supplementary Table S1.

GO Analysis of DEGs.
Following GO analyses for upand downregulated DEGs, significant GO terms including biological process, cellular component, and molecular function were collected. For upregulated DEGs, cell cycle was the most significant enrichment of biological process; membrane-enclosed lumen was the highest enrichment of cellular component; nucleotide binding was the highest enrichment of molecular function, as shown in Table 3. For downregulated DEGs, response to wounding was the most significant enrichment of biological process; actin cytoskeleton was the highest enrichment of cellular component and  cytoskeletal protein binding was the highest enrichment of molecular function, as shown in Table 4.

KEGG Enrichment Pathways of DEGs.
Following KEGG enrichment analysis for DEGs, significant KEGG terms were collected. The pathways enriched by 919 upregulated DEGs were mainly related to cell cycle, RNA transport, and p53 signaling pathway (Table 5). 1036 downregulated DEGs were significantly enriched in Endocytosis, focal adhesion, and vascular smooth muscle contraction, as shown in Table 6.

PPI Network Construction.
Based on data from the BioGRID database, the PPI network was the top 10 upregulated and downregulated DEGs which were constructed by Cytoscape software (Figure 1). The network consisted of 451 nodes and 499 edges. In the PPI networks the nodes with high degree are defined as hub proteins. The most significant hub proteins in the PPI network were CDK4 (degree = 132) and CCT3 (degree = 127); as shown in Figure 1, the red circular nodes represent upregulated DEGs and green circular nodes represent downregulated DEGs, respectively.  (Figures 2(c)-2(f)), respectively. SIM2 was significantly downregulated in ESCC ( Figure 2(g)). THSD4 had the downregulation tendency in ESCC (Figure 2(h)). The qRT-PCR results were matched with the microarray analysis.

Discussion
CDK4 was identified as the most significantly upregulated gene in our microarray analysis and it had an upregulated tendency in EC tissues through the qRT-PCR validation. CDK4 was the hub protein and interacted with 132 genes in the regulatory network. CDK4 was significantly enriched in cell cycle, measles, small cell lung cancer, and pathways in cancer. CDK4 encodes cyclin-dependent kinase 4, a member of the Ser/Thr protein kinase family, which plays an important role in cell cycle G1 phase progression and G1/S transition. In our study, CDK1, CDK6, and CDK10 showed upregulation in EC. CDK1, CDK6, and CDK4 were significantly enriched in cell cycle pathway. CDK4 is overexpression in several cancer comprising of breast cancer, pancreas cancer, clear cell renal cell carcinoma, and colorectal cancer [20][21][22][23]. Downregulation of MALAT1 (long noncoding RNA metastasis-associated lung adenocarcinoma transcript 1) inhibits breast cancer cell proliferation and cell cycle progression in vitro and in vivo through miR-124 downregulation and CDK4 upregulation [20,24]. Overexpression of cyclin D1/CDK4 is regulated by CEACAM6 and promotes cell proliferation in human pancreatic carcinoma [21]. CDK4 and CDK6 expression are decreased by miR-1 and contribute to inhibition of cell cycle progression and metastasis in clear cell renal cell carcinoma [22]. CCT3 was the top 3 upregulation DEGs in EC ( Table 2). The qRT-PCR displayed that CCT3 was significantly upregulated in EC, which was in accordance with our microarray analysis ( Figure 2). CCT3 interacted with 127 genes in the PPI network ( Figure 1). CCT3 encodes chaperonin containing TCP1 subunit 3, a molecular chaperone, which is a member of the chaperonin containing TCP1 complex (CCT). In our study, CCT2, CCT4, CCT5, and CCT7 were upregulated in EC compared to normal controls, respectively. CCT3 depletion suppresses cell proliferation by inducing mitotic arrest at prometaphase and apoptosis eventually in HCC in vitro. Clinically, overexpression of CCT3 predicts poor prognosis in hepatocellular carcinoma patients after hepatectomy [25,26]. CCT3 is significantly associated with carboplatin resistance in ovarian cancer patients after surgery treatment [27]. The proteomic-based study shows that patients with cholangiocarcinoma (CCA) which are positive for CCT3 and CCT3 might be potential biomarker for the diagnosis of CCA [28]. To our knowledge, this is the first report about CCT3 expressed status in EC and the biological function of upregulated CCT3 in EC needs further exploration.
THSD4 was the most downregulated DGE in EC through microarray analysis. The expression level of THSD4 had no significance in EC compared to normal controls but had the downregulated tendency in EC. THSD4 encodes thrombospondin type 1 domain containing 4. The methylated status of THSD4 shows positive correlation with short survival in glioblastoma patients and hypermethylation of THSD4 indicates poor survival [29]. The expression of THSD4 is regulated by GATA3 and mediates transformation of normal cells into breast cancer through deregulation of THSD4 [30]. The role of downregulated THSD4 in EC is unclear, and the investigation needs to be carried out in the future.
SIM2 was significantly downregulated in EC (Figure 2). SIM2 encodes single-minded family bHLH transcription factor 2. SIM2-s was dysregulated in glioma, prostate cancer, breast cancer, colorectal cancer, and ESCC [31][32][33][34][35]. SIM2s is downregulated in human breast cancer samples and it suppresses tumor activity through decreased expression of matrix metalloprotease-3. In breast cancer, SIM2s is downregulated. It is a key regulator of mammary-ductal development. SIM2s inhibition is associated with cell invasive and EMT-like phenotype through regulating matrix metalloprotease-3 expression [34,36] It is reported that SIM2s is downregulated in 70% ESCC tissues, which is consistent with our qRT-PCR verification [35]. SIM2 overexpression results in increase of drug-and radio-sensitivities in ESCC in vivo and in vitro and patients with high expression level of SIM2 are associated with favorable prognosis before chemotherapy [35]. It is suggested that SIM2 plays vital roles in EC onset and progression.   MYBL2, CENPF, CDKN3, and CDCA3 were upregulated in EC tissues ( Figure 2). MYBL2 is frequently amplified in gastroesophageal cancer cell lines and Barrett's adenocarcinoma [37,38]. CENPF is frequently amplified in region around 1q32-q41 and is overexpressed in ESCC cell line [39]. CDKN3 is upregulated in 68.0% of the epithelial ovarian cancer samples and lung adenocarcinoma patients and is correlated with poor patient survival [40,41]. CDCA3 expression status in EC was firstly reported in our study. The molecular mechanism of MYBL2, CENPF, CDKN3, and CDCA3 in EC is needed to be explored.

Conclusions
We identified 1955 DEGs comprising 919 upregulated genes and 1036 downregulated genes in EC. DEGs including CDK4, CCT3, THSD4, and SIM2 were verified in EC tissues through qRT-PCR. CDK4 and CCT3 were hub proteins in the PPI interaction network. We found that some genes including CDK4, CCT3, THSD4, and SIM2 may play essential roles in EC through cell cycle, RNA transport, Endocytosis, and focal adhesion signaling pathways. The genes could also be considered as potential candidate biomarkers for therapeutic targets for this malignancy. Furthermore, our study would shed light on the molecular mechanism underlying tumorigenesis of EC. (h) THSD4. EC: esophageal carcinoma; CON: adjacent nontumor tissues of ESCC. At least three independent experiments were performed for statistical evaluation. qRT-PCR experimental data were expressed as means ± SD. The statistical significance was evaluated using Student's t-test and < 0.05 was considered as a significant difference.