Identification of Common Genes Refers to Colorectal Carcinogenesis with Paired Cancer and Noncancer Samples

Colorectal cancer is a malignant tumor which harmed human beings' health. The aim of this study was to explore common biomarkers associated with colorectal carcinogenesis in paired cancer and noncancer samples. At first, fifty-nine pairs of colorectal cancer and noncancer samples from three gene expression datasets were collected and analyzed. Then, 181 upregulation and 282 downregulation common differential expression genes (DEGs) were found. Next, functional annotation was performed in the DAVID database with the DEGs. Finally, real-time polymerase chain reaction (PCR) assay was conducted to verify the analyses in sixteen colorectal cancer and individual-matched adjacent mucosa samples. Real-time PCR showed that MCM2, RNASEH2A, and TOP2A were upregulated in colorectal cancer compared with adjacent mucosa samples (MCM2, P < 0.001; RNASEH2A, P < 0.001; TOP2A, P = 0.001). These suggested that 463 DEGs might contribute to colorectal carcinogenesis.


Introduction
Colorectal cancer is one of the most commonly diagnosed cancers worldwide, with approximately 1.4 million cases and 693,900 deaths in 2012 [1]. Some risk factors may lead to colorectal cancer such as family history [2,3], inflammatory bowel disease [4], and smoking [5,6]. In the molecular level, colorectal cancer is a heterogeneous disease. Mutations, epigenetic changes, and expression differences of multiple genes are well-known colorectal cancer contributors. However, the underlying molecular mechanism of colorectal carcinogenesis has not been fully understood yet.
More than twenty years ago, complementary DNA microarray has been used to analyze the gene expression patterns in human cancer [7]. With the development of technology, more and more expression profile platforms and researches on colorectal cancer have been released in recent years. Currently, a large number of expression profile datasets were uploaded and shared publically on several public databases, and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo) and ArrayExpress database (http://www.ebi.ac.uk/arrayexpress) were included. The public shared data of the databases could benefit the researchers greatly in finding interesting research targets or verify their ideas easily.
In this study, to further understand the mechanism of colorectal carcinogenesis from the gene expression level, we searched the GEO database for paired colorectal cancer and noncancer samples. Fifty-nine paired samples from GSE21510, GSE23878, and GSE32323 were selected in total. The DEGs were screened from three datasets, and 463 common DEGs (181 upregulation and 282 downregulation genes) were detected and sent to the DAVID database (https://david.ncifcrf.gov) for functional annotation. Three upregulated genes were chosen to verify the analysis results in sixteen pairs of colorectal cancer and adjacent mucosa tissues. Real-time PCR results showed that MCM2, RNA-SEH2A, and TOP2A were associated with colorectal carcinogenesis.

Materials and Methods
2.1. Microarray Data. We searched the NCBI-GEO database with the following keywords: Human Genome U133 Plus 2.0, colorectal cancer, and normal (Human Genome U133 Plus 2.0 Array is a platform of Affymetrix). Fifty-nine paired colorectal cancer and noncancerous samples were selected from three gene expression datasets (twenty-three pairs in GSE21510 from Japan, seventeen pairs in GSE23878 from Saudi Arabia, and nineteen pairs in GSE32323 from Japan). The cancer and noncancerous samples were divided into cancer and noncancer groups, and the noncancer group was used as control.

Identification of DEGs.
The selected samples of GSE21510, GSE23878, and GSE32323 were separately analyzed by R software (https://www.r-project.org). At first, raw data files of the three datasets were downloaded from the GEO database. Then, the Robust Multichip Average algorithm was applied to perform background correction and quantile normalization with the R package Affy [8,9]. Next, the probability of genes being differentially expressed between cancer and noncancer groups were calculated by the Linear Models for Microarray Data (LIMMA) package. Finally, the DEGs were selected under corrected P < 0 05 and |fold change| ≥ 2.0 criteria. The volcano plots of differentially expressed genes were also performed by R software [10]. The DEGs from each dataset were intersected to identify common DEGs.

Gene Ontology and Pathway Enrichment Analysis of
DEGs. Gene Ontology (GO, http://www.geneontology.org) was a framework for the model of biology that describes the attributes of gene products. It was classified into three aspects: molecular function, biological process, and cellular component [11]. Pathway analysis was a popular method to analyze microarray data in a more detailed, specific way [12,13]. DAVID database was a free online bioinformatics resource that aimed to provide functional interpretation of large lists of genes derived from genomic studies [14]. The common DEGs were input in the DAVID database to perform the GO and pathway analysis [15]. P < 0 05 was considered statistically significant. 2.5. RNA Isolation, Reverse Transcription, and Real-Time PCR. Tissue specimens were grounded and added with TRIzol reagent (Takara). Then the total RNA was isolated, and 1 μg of RNA was reverse-transcripted with PrimeScript 1st Strand complementary DNA Synthesis kit (Takara). Realtime PCR assay was performed on ABI 7500 platform. SYBR Premix Dimer Eraser kit (Takara) was used in 20 μl reaction volume, and the cycling conditions were as follows: an initial 30 s denaturation at 95°C and 45 cycles (5 s at 95°C, 30 s at 55°C, and 34 s at 72°C). PPIA and B2M genes were set as internal controls. MCM2, RNASEH2A, and TOP2A expression level was detected in sixteen pairs of colorectal cancer and adjacent mucosa samples. The primer sequences were shown in Supplementary Table 1. 2.6. Statistical Analysis. Statistical analyses were performed with SPSS version 18.0 (IBM Corporation) and GraphPad Prism 6.0 software (GraphPad Software Inc.). DEGs were determined using t-statistics from the LIMMA Bioconductor package. The real-time PCR data was analyzed with 2 −ΔCt and the significance of the difference between the cancer and noncancer groups was evaluated by two-tailed Student's t-test. P values less than 0.05 were considered statistically significant. 3.2. GO Term Enrichment Analysis. The 181 upregulated genes were uploaded to the DAVID database for GO analysis. The results showed that upregulated DEGs were significantly enriched in biological process, including 22 DEGs in cell division (P = 1 48E − 11) and 19 in mitotic nuclear division (P = 2 06E − 11) GO terms. In the cellular component, nucleoplasm (P = 4 95E − 07), spindle microtubule (P = 2 49 E − 06), and spindle pole (P = 6 09E − 06) were the top three GO terms, and there were 51 DEGs enriched in the nucleoplasm. In molecular function, the top three GO terms were frizzled binding (P = 3 70E − 04), protein binding (P = 4 65E − 04), and microtubule binding (P = 8 73E − 04) ( Table 1).

KEGG Pathway Analysis.
After all the DEGs were input into the DAVID database, KEGG pathway analysis results were also acquired. In Table 2, the most significantly enriched pathway of the upregulated and downregulated pathways was set out.

Discussion
Colorectal cancer was a complex disease. To get more information for colorectal cancer occurrence, the gene expression data of fifty-nine paired colorectal cancer and noncancer tissues was extracted from the GSE21510, GSE23878, and GSE32323 datasets. The fifty-nine paired samples were from two countries: Japan (40 samples from SE21510 and GSE32323) and Saudi Arabia (19 samples from SE32323). The regional divergence of the samples might contribute to finding the common DEGs from two the ethnic groups. After analysis, 181 upregulated DEGs and 282 downregulated common DEGs were screened. Pathway analyses showed that the upregulated DEGs were mainly involved in cell cycle pathway (n = 11), p53 signaling pathway (n = 4), and DNA replication pathway (n = 3).
Cell cycle plays a crucial role in tumor evolution and progression, so it acts as an important target for antitumor drugs, such as paclitaxel, vincristine, and Adriamycin. In this study, 3 genes (CDK1, CDK4, and CCNB1) refer to cell cycle pathway were also included in the p53 signaling pathway. CDK1 and CDK4 were key protein kinase for cell cycle control [16,17]. CCNB1 was also a regulatory protein involved in mitosis [18]. In the p53 signaling pathway, the RRM2 gene was an oncogene that overexpressed in colorectal cancer, with its elevated expression correlated with invasion depth, poorly differentiated type, and tumor node metastasis stage [19]. Transcription factor E2F1 could promote RRM2 expression in colorectal cancer cell lines [20].
DNA replication is an important pathway in carcinogenesis, which ranked the third in upregulated DEGs. Three upregulated DEGs refer to DNA replication were MCM2, RFC3, and RNASEH2A. MCM2 was a member of the MCM family (MCM2-7), and all 6 members of this family could form a hexameric protein complex with each other. This complex worked as a DNA helicase to untie the DNA double helix at the initiation stage of DNA synthesis [21]. MCM2 expression was reported to be associated with colorectal cancer stage and prognosis [22] and used to detect colorectal cancer in stool [23].
Eukaryotic RNase H2 was a heterotrimeric enzyme formed by RNASEH2A, RNASEH2B, and RNASEH2C. RNA-SEH2A was a catalytic subunit that could hydrolyze RNA/ DNA hybrid substrate with the structural support from RNASEH2B and RNASEH2C subunits [24]. It had been reported that RNASEH2A showed higher expression level in colorectal cancer [25], and this was validated in our research.
TOP2A was a gene that involves copy number variation and chromosomal instability in many cancers [26][27][28][29]. In colorectal cancer, protein expression level of TOP2A was related to aggressive tumor phenotype and advanced tumor stage [30]. In our research, we found that TOP2A mRNA expression level was upregulated in colorectal cancer.
The downregulated DEGs are mainly involved in mineral absorption, nitrogen metabolism, bile secretion, retinol metabolism, proximal tubule bicarbonate reclamation, pancreatic secretion and so on, which may signal that the tumor cells lose some function for metabolism of the normal colorectal epithelial cell.
In real-time PCR assay, the commonly used internal genes were GAPDH, β-actin, tubulin, and so forth. However,  in colorectal cancer tissues, GAPDH was not a good internal control gene since it showed higher transcription level than in normal mucosa, nor was β-actin [31,32]. The combined application of B2M and PPIA were better internal control than others in colorectal cancer [33]. Therefore, B2M and PPIA were used as internal control in our study. In summary, these gene expression profile analysis results suggested 463 candidate biomarkers for early screening and diagnosis of colorectal cancer. Our study confirmed that MCM2, RNASEH2A, and TOP2A were upregulated in colorectal cancer. The protein expression level and functional studies of these markers were warranted to reveal the molecular mechanism of colorectal cancer development.

Conflicts of Interest
The authors declare that they have no competing interests.