Candidate Biomarkers and Molecular Mechanism Investigation for Glioblastoma Multiforme Utilizing WGCNA

To reveal the potential molecular mechanism of glioblastoma multiforme (GBM) and provide the candidate biomarkers for GBM gene therapy. Microarray dataset GSE50161 was obtained from GEO database. The differentially expressed genes (DEGs) were identified between GBM samples and control samples, followed by the module partition analysis based on WGCNA. Then, the pathway and functional enrichment analyses of DEGs were performed. The hub genes were further investigated, followed by the survival analysis and data validation. A total of 1913 DEGs were investigated between two groups, followed by analysis of 5 modules using WGCNA. These DEGs were mainly enriched in functions like inflammatory response. The hub genes including upregulated N-Myc and STAT Interactor (NMI), Capping Actin Protein-Gelsolin Like (CAPG), and Proteasome Subunit Beta 8 (PSMB8) were revealed as potential liquid biopsy molecules for GBM diagnose. Moreover, Nucleolar and Spindle Associated Protein 1 (NUSAP1) and G Protein-Coupled Receptor 65 (GPR65) were outstanding genes in survival analysis. Our results suggested that CPNE6, HAPLN2, CMTM3, NMI, CAPG, and PSMB8 might be used as potential molecules for liquid biopsy of GBM. NUSAP1 and GPR65 might be novel prognostic targets for GBM gene therapy. Furthermore, the upregulated NMI might play an important role in GBM progression via inflammatory response.


Introduction
Glioblastoma multiforme (GBM) is the most aggressive cancer that represent 15% of all brain tumors [1]. The survival rate for GBM patients is less than 15 months [2]. Although surgery is commonly used for the treatment of GBM [3], the cancer usually recurs due to lack of effective prevention method for this disease [4].
Understanding the mechanisms of glioblastoma at the molecular and structural level is valuable for clinical treatment [5]. Bioinformatics can be effectively used to analyze GBM microarray data to provide theoretical reference for further exploration of tumorigenesis mechanism and help search for potential target genes [6]. Based on bioinformatics study, some differentially expressed genes (DEGs) such as Transforming Growth Factor Beta Induced (TGFBI) and SRY-Box 4 (SOX4) were explored as the potential therapy targets for GBM [7]. Coexpression analysis has emerged as a powerful technique for obtaining novel insights into complex mechanisms and multigene analysis of large-scale data sets, especially for identifying functional modules. As an approach of bioinformatics study, weighted gene coexpression network analysis (WGCNA) is commonly used for revealing the correlation between genes in different samples [8]. Previous WGCNA shows the epigenetic events in GBM development and prognosis based on The Cancer 2 BioMed Research International Genome Atlas (TCGA) database [9]. Thus, WGCNA can be used to predict genes associated with cancer development [10]. In pervious study, Griesinger et al. indicated that the different pediatric brain tumor types (including GBM) exhibited distinct immunophenotypes, implying that specific immunotherapeutic approaches may be most effective for each tumor type [11]. However, candidate biomarkers for the clinical gene therapy of GBM were still unclear.
In current study, GBM gene expression data deposited by Griesinger et al. [11] was downloaded and reanalyzed by WGCNA. The DEGs between GBM samples and control samples were investigated, followed by the functional and pathway enrichment analysis. Then, TCGA survival analysis and data validation as well as literature verification were performed to confirm the effect of biomarker for GBM. We hoped to reveal the potential molecular mechanism of GBM and provide the candidate biomarkers for GBM gene therapy.

Microarray Data.
The gene expression profile of GSE50161 [11] was downloaded from Gene Expression Omnibus (GEO) database [12]. A total of 47 tissue samples including 34 surgical brain tissue samples (GBM group) obtained from patients who were diagnosed with GBM and 13 normal brain samples (control group) obtained from pediatric epilepsy patients at the time of surgical intervention were used for the follow-up analysis.

Data Preprocessing and DEG Analysis.
There were totally 17049 probes in the present dataset. The knn function in the impute package (version: 1.48.0) [13] was used to impute missing value. Normalization was performed using Limma Linear Models for Microarray Data (limma, version: 3.30.13) package [14]. The median value was considered as the gene expression value when different probes linked to the same gene (16812 genes). The eBayes analysis [15] was used to analyze the DEGs between GBM group and control group. P-value <0.05 and |log-fold change (LFC)| >2 were selected as the thresholds for DEGs screening.

WGCNA Analysis.
The coexpression network analysis was performed using WGCNA (version:1.61) [8]. First, the soft threshold for network construction was selected. The soft threshold makes the adjacency matrix to be the continuous value between 0 and 1, so that the constructed network conforms to the power-law distribution and is closer to the real biological network state. Second, the scale-free network was constructed using blockwiseModules function, followed by the module partition analysis to identify gene coexpression modules, which could group genes with similar patterns of expression. The modules were defined by cutting the clustering tree into branches using a dynamic tree cutting algorithm and assigned to different colors for visualization [16]. Then, the module eigengene (ME) of each module was calculated. ME represents the expression level for each module. Then, the correlation between ME and clinical trait in each module was calculated. Finally, the gene significance (GS) of gene in the module, which represented the correlation between gene and sample, was further calculated.

Function and the Pathway Enrichment
Analysis. The DAVID 6.8 (https://david.ncifcrf.gov) [17] software was used for the GO-biological function (GO-BP) [18] and KEGG pathway analysis [19] of genes in each module. P false discovery rate (FDR) <0.05 was selected as the threshold for the identification of significant GO-BP terms and pathways.
2.5. The Hub Gene Investigation. The network topological index is defined as the number of links incident upon a node [20]. The key node (hub gene) was determined by high intramodule connectivity of genes by summing the connection strengths with other module genes. In this study, the intramodular connectivity of genes was identified by WGCNA [8]. According to the intramodule connectivity, the top 20 hub genes in modules were visualized using Cytoscape (version 3.5.1) software [21].

Survival Analysis.
To reveal the prognostic value of hub genes on GBM patients, the survival analysis was performed. The expression value and clinical data of GBM were downloaded from the TCGA [22] database. A total of 148 samples were included. The samples in the data were divided into high expression group (up group) and low expression group (down group) according to the median value of each hub protein.
The survival package (version: 2.41-3) in R software was used for the current analysis. Then, the survival rate estimation and statistical significance were performed using Kaplan-Meier (KM) method [23] and log-rank test [24], respectively. P < 0.05 was considered statistically significant.

Data Validation.
In order to verify the expression of hub genes in serum samples in other dataset, the microarray data of GSE24084 [25] (platform: GPL8755 PF-Agilent-014850 Whole Human Genome Microarray 4x44K), which included 16 serum samples (9 cancer samples and 7 normal samples), was obtained from GEO database. Using ROCR package [26] in R software (version: 1.0-7), the area under curve (AUC) from the receiver operating characteristic (ROC) curve study was performed on the expression data of each hub gene. In the present study, AUC >0.5 represented upregulated genes, while AUC <0.5 represented downregulated genes. Larger |AUC-0.5| value of a gene indicated that this genes can well distinguish GBM from the control samples [27]. Based on the AUC values, the diagnose effect of hub genes (as liquid biopsy molecules) in each module was further investigated.

DEGs between GBM Group and Control Group.
With P FDR < 0.05 and |LFC| > 2, a total of 1913 DEGs including 776 upregulated DEGs and 1137 downregulated DEGs were identified between GBM group and control group. The heat map and volcano plot are shown in Figures 1(a) and 1(b), respectively. The results of heat map showed that these DEGs could be used to well distinguish the GBM from the control samples.

WGCNA Analysis.
The WGCNA analysis was performed on 1913 DEGs. The soft threshold for network construction was selected as 18 (Figure 1(c)) [28]. Meanwhile, the fitting degree of scale-free topological model was 0.85. Thus, this network conformed to the power-law distribution and was closer to the real biological network state.
A total of 5 modules (Figure 1(d)) including turquoise (867 DEGs), grey (643 DEGs), blue (238 DEGs), brown (137 DEGs), and yellow (28 DEGs) were obtained in current study. The DEGs in grey were not included in any module, so the subsequent analysis was no long performed on grey. ME was in accordance with the expression pattern of DEGs in each module. The turquoise module and yellow module were downregulated, while blue module and brown module were upregulated. Furthermore, the turquoise module (correlation index: -0.9, P =7.0E-18) and yellow module (correlation index: -0.69, P =9.0E-8) were negatively correlated with the disease. Meanwhile, blue module (correlation index: 0.86, P =1.0E-14) and brown module (correlation index: -0.78, P =1.0E-10) were positively correlated with the disease (Figure 2). The GS value for blue module, turquoise module, brown module, and yellow module were 0.75, 0.73, 0.64, and 0.59, respectively, which indicated a close relationship with the disease (Figure 3).

Hub Genes in Modules.
According to the network topological index, a total of 80 hub genes were investigated from 4 modules. The detailed information was showed in Figure 4.

Survival Analysis and Data Validation.
To validate our initial finding of hub genes, we examined an independent TCGA dataset. Based on the expression and clinical data of 148 GBM samples from TCGA database, the survival analysis was performed on totally 80 hub genes. The KM curve for gene with the minimal P value in each module was showed in Figure 5. Three genes including FXYD Domain Containing Ion Transport Regulator 1 (FXYD1, P =1.92E-02), Nucleolar and Spindle Associated Protein 1 (NUSAP1, P =4.44E-02), and G Protein-Coupled Receptor 65 (GPR65, P =2.01E-02) were outstanding in the current survival analysis ( Figure 5), which had significant association with prognosis. Moreover, to explore the expression pattern of hub genes in serum samples from GBM patients, the validation data GSE24084 was downloaded from the GEO database, followed by ROC curve analysis using ROCR software. The result showed that Copine 6 (CPNE6), Hyaluronan and Proteoglycan Link Protein 2 (HAPLN2), Ribonucleotide Reductase Regulatory Subunit M2 (RRM2), and CKLF Like MARVEL Transmembrane Domain Containing 3 (CMTM3) were four outstanding genes with the largest |AUC-0.5| in each module, respectively ( Figure 6). Furthermore, the other genes including N-Myc and STAT Interactor (NMI), Capping Actin Protein-Gelsolin Like (CAPG), Proteasome Subunit Beta 8 (PSMB8), CKLF Like MARVEL Transmembrane Domain Containing 3 (CMTM3), Sodium Voltage-Gated Channel Beta Subunit 2 (SCN2B), Copine 6 (CPNE6), and Hyaluronan and Proteoglycan Link Protein 2 (HAPLN2) were genes with |AUC-0.5| > 0.4.

Discussion
GBM is a kind of brain tumor in adults [29]. Targeted gene therapy is a strategy for GBM [30]. However, the effective candidate biomarkers for the gene therapy of GBM was still unclear. A gene expression data of GBM was analyzed by  WGCNA in this study. The results showed that a total of 1913 DEGs were investigated between GBM samples and control samples, followed by 5 modules explored. These DEGs were mainly enriched in functions like inflammatory response and pathways including cell cycle. The hub genes including CPNE6, HAPLN2, CMTM3, NMI, CAPG, and PSMB8 were revealed as potential liquid biopsy molecules for GBM diagnose. Moreover, FXYD1, NUSAP1, and GPR65 were three outstanding genes in survival analysis.
Evidence demonstrated that CPNE6 terminated their expression in glioblastomas [31]. Sim and colleagues found that HAPLN2, also known as BRAL1, was virtually absent in malignant gliomas [32]. A previous study had reported that CMTM3 promoted cell invasion of glioblastoma and was significantly correlated with shorter overall survival [33]. NMI, is a protein that play critical roles in tumor growth, progression, and metastasis [34]. Previous study indicates that NMI polymorphisms are closely related to the genetic susceptibility of glioma in Chinese Han population [35]. CAPG is a ubiquitous gelsolin-family actin-modulating protein involved in the control of cell migration via immune and inflammatory pathways [36]. Yun et al. showed that the downregulation of CAPG significantly inhibited GBM cell proliferation by blocking the cell cycle in G1/S transition [37]. PSMB8, a protein contributes to the complete assembly of 20S proteasome complex, regulates glioma cell migration, proliferation, and apoptosis [38]. Previous study shows that autoinflammatory disorder caused by PSMB8 mutation leads to the proteasome assembly defection [39]. In the present study, the |AUC-0.5| value for hub genes like CPNE6 (downregulated), HAPLN2 (downregulated), CMTM3 (upregulated), NMI (upregulated), CAPG (upregulated), and PSMB8 (upregulated) was all larger than 0.4. Based on the study of Long et al. [27], we suggested that hub genes like CPNE6, HAPLN2, CMTM3, NMI, CAPG, and PSMB8 might be used as potential molecules for liquid biopsy of GBM. Furthermore the inflammatory pathway has previously been linked to chemotherapy resistance in glioma tumors [40]. Actually, the suppression role in inflammation response was closely related with gliomas development and progression [41]. In this study, the function analysis in this study showed that inflammatory response was one of the most outstanding functions enriched by DEGs including upregulated NMI. Thus, we speculated that the upregulated NMI might play an important role in GBM progression via inflammatory response.
Identification of hub genes or regulatory factors is the first step for GBM gene therapy [42]. Previous study shows that NUSAP1 expression is correlated not only with glioma grade but also with prognosis of glioma patients [43]. Knockdown of NUSAP1 expressing suppress glioma cells growth and promote glioma cell apoptosis [44]. Despite NUSAP1, GPR65 proved to promote tumor growth in cancer [45]. Ail et al. indicated that GPR65 may take part in a mechanism that is for photoreceptors survival in the degenerating retina [46]. However, whether GPR65 contribute to the process of glioma or BGM is still unclear. In the present study, NUSAP1 and GPR65 were not only DEGs between GBM samples and control samples, but also two most outstanding hub genes in survival analysis. Thus, we speculated that NUSAP1 and GPR65 might be novel prognostic targets for GBM gene therapy. However, there were some limitations in this study such as small sample size and lack of verification test; thus, a large sample size with a wide verification analysis is needed in the further investigation.

Conclusions
In conclusion, CPNE6, HAPLN2, CMTM3, NMI, CAPG, and PSMB8 might be used as potential molecules for liquid biopsy of GBM. NUSAP1 and GPR65 might be novel prognostic targets for GBM gene therapy. Furthermore, the upregulated NMI might play an important role in GBM progression via inflammatory response.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
Qi Yang and Rui Wang are co-first authors.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.