The Identification of Key Genes and Pathways in Glioma by Bioinformatics Analysis

Glioma is the most common malignant tumor in the central nervous system. This study aims to explore the potential mechanism and identify gene signatures of glioma. The glioma gene expression profile GSE4290 was analyzed for differentially expressed genes (DEGs). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were applied for the enriched pathways. A protein-protein interaction (PPI) network was constructed to find the hub genes. Survival analysis was conducted to screen and validate critical genes. In this study, 775 downregulated DEGs were identified. GO analysis demonstrated that the DEGs were enriched in cellular protein modification, regulation of cell communication, and regulation of signaling. KEGG analysis indicated that the DEGs were enriched in the MAPK signaling pathway, endocytosis, oxytocin signaling, and calcium signaling. PPI network and module analysis found 12 hub genes, which were enriched in synaptic vesicle cycling rheumatoid arthritis and collecting duct acid secretion. The four key genes CDK17, GNA13, PHF21A, and MTHFD2 were identified in both generation (GSE4412) and validation (GSE4271) dataset, respectively. Regression analysis showed that CDK13, PHF21A, and MTHFD2 were independent predictors. The results suggested that CDK17, GNA13, PHF21A, and MTHFD2 might play important roles and potentially be valuable in the prognosis and treatment of glioma.


Introduction
Among the various histological subtypes of brain tumor, glioma is the most common malignant tumor in the central nervous system [1]. Established by the World Health Organization (WHO), it can be classified from grade I to grade IV based on histopathological and clinical criteria [2]. During invasive growth, most gliomas extend processes, resulting in a lack of clear borders between tumor and normal brain tissue, making surgical resection of the entire carcinoma difficult. Currently, imageological examination is the most important diagnostic method, as well as the evaluation of the postoperative curative effect. However, imaging is influenced by many factors, such as radiation injury and surgery that result in poor specificity. It is difficult to achieve early diagnosis and treatment of glioma due to a lack of specificity of auxiliary examination indices, so that many patients can lose the chance for radical excision, thereby increasing the risk for poor prognosis. The 5-year overall survival (OS) of patients with glioblastoma is less than 10% [3]. Therefore, the identification of sensitive and specific biological markers that would help identify patients at a higher or lower risk of death from glioma is of vital importance, not only for a better understanding of the molecular and cellular processes involved in tumorigenesis but also for more effective diagnosis, suitable treatment, and improved prognosis.
Gene expression profiling analysis is a useful method with broad clinical application for identifying tumor-related genes in various types of cancer, from molecular diagnosis to pathological classification, from therapeutic evaluation to prognosis prediction, and from drug sensitivity to neoplasm recurrence [4][5][6]. However, the use of microarrays in clinical practice is limited by the overwhelming number of genes identified by gene profiling, lack of both repeatability and independent validation, and need for complicated statistical analyses [7]. Therefore, in order to put these expression profiles in clinical practice, it is necessary to identify a suitable number of genes and develop a method that can be operated by routine assay. In this study, we downloaded original data from the glioma microarray in the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), an online public collection database for registration, which is not only for saving microarray data but also for helping the user query and download. We compare gene expression profiles of tumor cells with normal brain cells in order to identify differentially expressed genes (DEGs). Subsequently, the identified DEGs were screened by using Morpheus online software, followed by gene ontology (GO) and pathway enrichment analysis. After analyzing their biological functions and pathways, we further explored the potential biomarkers for diagnosis and prognosis by survival analysis in two independent datasets in order to gain insight on glioma development and progression at the molecular level.

Microarray Data.
We downloaded the gene expression profiles in GSE4290, GSE4412, and GSE4271 from the GEO database. GSE4290 has a total of 180 samples, including 157 cases of glioma (26 astrocytomas, 50 oligodendrogliomas, and 81 glioblastomas) and 23 cases of normal brain tissue, based on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) by Fine HA et al. Using the GPL96 platform (Affymetrix Human Genome U133A Array), the GES4412 dataset containing 85 cases of glioma was submitted by Nelson SF; and the GES4271 dataset, containing 100 samples that included 77 cases of primary tumor samples and 23 cases of recurrence, was submitted by Phillips HS et al.

Screen
Genes of Differential Expression. The analysis was carried out by using GEO2R, an online analysis tool, for the GEO database, based on R language. We applied analysis to classify the sample into two groups that had similar expression patterns in glioma and normal brain tissue. We defined DEGs as differentially expressed with logFC > 2 or logFC < −2, a criteria as described in the references [8,9]. An adj. P value < 0.05 was considered statistically significant. In addition, we used visual hierarchical cluster analysis to show the two groups by Morpheus online analysis software (https://software.broadinstitute.org/morpheus/) after the relative raw data of TXT files was downloaded.

Gene Ontology and KEGG Pathway Analysis of DEGs.
With functions including molecular function, biological pathways, and cellular component, gene ontology (GO) analysis annotates genes and gene products [10]. KEGG comprises a set of genome and enzymatic approaches and a biological chemical energy online database [11]. It is a resource for systematic analysis of gene function and related high-level genome functional information. DAVID (https:// david.ncifcrf.gov/) can provide systematic and comprehensive biological function annotation information for highthroughput gene expression [12]. Therefore, we applied GO and KEGG pathway analyses to the DEGs by using DAVID online tools at the functional level. A P < 0 05 was considered to have significant differences.

2.4.
Integration of Protein-Protein Interaction (PPI) Network and Module Analysis. The STRING database is an online tool for assessment and integration of protein-protein interactions, including direct (physical), as well as indirect (functional) associations. STRING version 10.0 covers 9,643,763 proteins from 2031 organisms [13]. We drew DEGs using STRING in order to assess the interactional relationships among the DEGs, then used the Cytoscape software to build a PPI network, employed the plug-in Molecular Complex Detection (MCODE) to screen PPI network modules, and established scores > 3 and nodes > 4 in the MCODE module, function, and pathway enrichment analysis. A P < 0 05 was considered statistically significant.

Identification of Biomarkers.
Based on the information in the individual MCODE modules, the node with the highest score was selected as the hub gene in GSE4290. Every hub gene was also found in two independent datasets (generation dataset GSE4412, primary tumor samples n = 85, and validation dataset GSE4271, primary tumor samples n = 77) based on the downloaded raw data files, including the information of gene expression value, overall survival time (OS), and survival state. Statistical analyses were performed using SPSS version 20.0 for Windows (IBM, Chicago, IL). We divided expression values into two groups, high expression and low expression, according to X-tile [14]. The Kaplan-Meier method was used to determine the probability of survival and analyzed by the log-rank test. A P < 0 05 was considered statistically significant.

Identification of DEGs.
A comparison of 157 cases of glioma samples with 23 cases of normal brain tissue in GSE4290 by using the GEO2R online analysis tool resulted in the identification of the DEGs listed in Figure 1. Based on GEO2R analysis, using an adjusted P < 0 05 and log (fold change) (logFC) ≥ 2.0 criteria, there were 775 downregulated genes identified. We further validated the results by using the Morpheus online tool, resulting in a DEG expression heat map, of the top 50 downregulated genes, shown in Figure 1.

Gene Ontology Enrichment Analysis.
We used the DAVID online analysis tool to identify statistically significantly enriched GO terms and KEGG pathways after uploading all downregulated genes. GO analysis results showed that downregulated DEGs were significantly enriched in molecular function (MF), including small molecule binding, nucleoside phosphate binding, and carbohydrate derivative binding (Table 1). For biological processes (BP), the downregulated genes were enriched in cellular protein modification, regulation of cell communication, and regulation of signaling (Table 1). In addition, GO cell component (CC) analysis also displayed that the downregulated DEGs were significantly enriched in the cytosol, membrane-bounded vesicles, and nucleoplasm (Table 1).

Module Analysis and Hub Gene Selection in the PPI
Network. Based on the information in the STRING database, the highest module was shown by using the MCODE plug-in, and the functional annotation of the genes involved in the module was analyzed (Figure 2). Enrichment pathway analysis showed that the genes in the module were related to synaptic vesicle cycling, rheumatoid arthritis, and collecting duct acid secretion. Moreover, the 12 hub nodes of the highest score were screened in all modules. The hub genes included PDS5 cohesin associated factor B (PDS5B), chromodomain helicase DNA binding protein 5 (CHD5), cyclin-dependent kinase 17 (CDK17), eukaryotic translation initiation factor 3 subunit E (EIF3E), ATPase H+ transporting V1 subunit H (ATP6V1H), G protein subunit alpha 13 (GNA13), PHD  finger protein 21A (PHF21A), methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2 (MTHFD2), lipoprotein lipase (LPL), adenylosuccinate synthase (ADSS), Wnt family member 10B (WNT10B), and serine and argininerich splicing factor 1 (SRSF1). The enriched pathways for genes in the highest module were shown in Table 3.

Identification of Biomarkers.
In order to identify biomarkers, we calculated the survival rate for two groups of 12 hub genes in the generation dataset (GSE4412) and compared the result with the validation dataset (GSE4271) through Kaplan-Meier analysis and log-rank test. According to the analysis, we found that only the downregulation of CDK17, GNA13, PHF21A, and MTHFD2 was closely associated with a decreased overall survival among patients with glioma (Figures 3 and 4). The remaining 8 biomarkers had no statistical significance between gene expression and clinical outcome of glioma or no recoverability in the validation dataset. Furthermore, using the Cox proportional hazards model, a multivariate analysis was performed identifying that expression levels of CDK17, PHF21A, and MTHFD2 were independent prognostic factors ( Table 4).

Discussion
In the present study, we identified DEGs between glioma and normal samples and used a series of bioinformatics analyses to screen key gene and pathways associated with cancer. However, GSE4290 dataset contains only limited number of control samples, 23 out of 180 samples. In order to improve the statistical power of DEG, we defined that the absolute value of the logarithm (base 2) fold change (logFC) greater than 2 and 775 DEGs was obtained. Bioinformatics analysis on DEGs, including GO enrichment, KEGG pathway, PPI network, and survival analyses, found glioma-related genes and pathways that play an important role in cancer initiation and progression.
GO term enrichment analysis demonstrated that 775 downregulated DEGs were significantly enriched in functions involving cellular protein modification, regulation of cell communication, and regulation of signaling. Many studies found that the cellular protein modification, including phosphorylation, ubiquitination, and acetylation, can change the cell biology function, influence disease phenotype, affect glioma cell proliferation, invasion, and apoptosis, and regulate the development of glioma [15][16][17]. And glioma cells can regulate cell communication, through the information passed to the target cells, interact with the receptor, resulting in specific biological effect such as cell proliferation and cytoskeleton changes, and promote glioma progression and angiogenesis [18]. KEGG pathway analysis indicated that the functions of the downregulated genes were enriched in MAPK signaling, endocytosis, oxytocin signaling, and calcium signaling. Zhang et al. [19] demonstrated that the MAPK signaling pathway induces cell apoptosis in glioma cells and the calcium signaling pathway is involved in quiescence, maintenance, proliferation, and migration in glioma cells [20,21]. PPI network and module analysis found that the first gene module significantly was enriched in synaptic vesicle cycling. Some results indicate that interference synaptic vesicle cycling could disrupt synaptic function and homeostasis, which would lead to cognitive decline and neurodegeneration in Alzheimer's disease [22]. Therefore, monitoring these signaling pathways may help in the prediction of tumor occurrence and progression.
Since no survival data about GSE4290 could be available, two independent glioma datasets GSE4412 and GSE4271 were applied to detect whether the hub gene could affect the survival time of glioma patients. Survival analysis found that CDK17, GNA13, PHF21A, and MTHFD2 are closely associated with glioma. CDK17 is a member of the cyclindependent kinase family. Chaput et al. [23] found that the expression levels of CDK17 are significantly increased in Alzheimer's disease and are associated with the mechanism to promote AD neurodegeneration and may inhibit the pathology development in AD, and Demirkan et al. [24], through a genome-wide association study, found that the CDK17 can be mapped to the glycerophospholipid metabolism pathway. GNA13, one member of the G protein family, is involved in metastasis of tumor cells [25], angiogenesis, and cellular responses to chemokines [26]. In neuronal cells, GNA13 affects neurite outgrowth together with Rho, which is closely related with cell motility and differentiation [27]. Furthermore, GNA13 is coupled to brain-specific angiogenesis inhibitor-1 (BAI1), which is an adhesion-related GPCR, and regulates synaptic function via Rho signaling [28]. PHF21A (also known as BHC80), a plant homeodomain finger-containing protein, can affect the neurofacial and craniofacial development and suppression of the latter and lead to both craniofacial abnormalities and neuronal apoptosis [29]. Moreover, PHF21A specifically binds H3K4me, which is a transcribed genomic locus of regulated posttranslational modification, and implicated the development and maintenance of neural connections [30]. MTHFD2 (methylenetetrahydrofolate dehydrogenase 2) is a mitochondrial enzyme with methylenetetrahydrofolate dehydrogenase and methenyltetrahydrofolate cyclohydrolase activities and has an effect on cancer cell proliferation [31], migration, and invasion [32]. In the expression level of human tumors, MTHFD2 is overexpressed in most cancer types, but exceptions are found in glioma [33], similar to our results. Up to now, the biological functions of CDK17, GNA13, PHF21A, and MTHFD2 in glioma have remained unclear. However, our study shows that the expression level of four key genes are all downregulated in glioma, after comparison with normal brain tissue, and the downregulation is associated with poorer prognosis, as the patients with extended survival time have increased expression of these genes. At the moment, there are some related bioinformatic research reports of GSE4290 in glioma. Some studies have shown that different enrichment pathway analyses of DEGs can be classified according to their degrees of differential expression during tumor progression in order to explore the deterioration of low into high grade glioma [34]. Some research finds that the DEGs are regulated by transcription factors in glioblastoma [35] and microarray technology has been used to identify the DEGs and their functions in the development of three types of glioma (astrocytoma, glioblastoma, and oligodendroglioma) [36]. Different from these, our study selects the node of the highest score from each module as hub genes in MCODE after comparing nontumor samples with glioma samples. These hub nodes are the key genes of interaction, in the PPI network, that may play important roles in the occurrence and development of glioma. Moreover, hub gene identification is more persuasive, since we validate the association of hub genes and glioma by using survival analysis in two independent datasets to identify four genes that may be cancer biomarkers for glioma. Though not all hub genes associated with the survival of glioma patients, but some hub genes play important roles in immune or inflammation. For example, WNT10B plays an important role in regulating asthmatic airway inflammation through modification of the T cell response [37].
In conclusion, we presumed these key genes identified by a series of bioinformatics analyses on DEGs between tumor samples and normal samples, probably related to the development of glioma. These hub genes could also affect the survival time of glioma patients as validated from two independent datasets. These identified genes and pathways provide a more detailed molecular mechanism for underlying glioma initiation and development. According to the study, downregulation of CDK17, GNA13, PHF21A, and MTHFD2 can be considered as biomarkers or therapeutic targets for glioma. However, further molecular and biological   experiments are required to confirm the functions of the key genes in glioma.