Establishment of an Immune-Related Gene Signature for Risk Stratification for Patients with Glioma

Glioma is a frequently seen primary malignant intracranial tumor, characterized by poor prognosis. The study is aimed at constructing a prognostic model for risk stratification in patients suffering from glioma. Weighted gene coexpression network analysis (WGCNA), integrated transcriptome analysis, and combining immune-related genes (IRGs) were used to identify core differentially expressed IRGs (DE IRGs). Subsequently, univariate and multivariate Cox regression analyses were utilized to establish an immune-related risk score (IRRS) model for risk stratification for glioma patients. Furthermore, a nomogram was developed for predicting glioma patients' overall survival (OS). The turquoise module (cor = 0.67; P < 0.001) and its genes (n = 1092) were significantly pertinent to glioma progression. Ultimately, multivariate Cox regression analysis constructed an IRRS model based on VEGFA, SOCS3, SPP1, and TGFB2 core DE IRGs, with a C-index of 0.811 (95% CI: 0.786-0.836). Then, Kaplan-Meier (KM) survival curves revealed that patients presenting high risk had a dismal outcome (P < 0.0001). Also, this IRRS model was found to be an independent prognostic indicator of gliomas' survival prediction, with HR of 1.89 (95% CI: 1.252-2.85) and 2.17 (95% CI: 1.493-3.14) in the Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA) datasets, respectively. We established the IRRS prognostic model, capable of effectively stratifying glioma population, convenient for decision-making in clinical practice.


Introduction
Glioma is a common primary malignant brain tumor [1], with the average incidence of approximately six cases per 100,000 people in the world [2]. According to the WHO classification, glioma can be divided into four grades, including grades I, II, III, and IV, where grades I and II are lowgrade glioma (LGG), including astrocytoma, oligodendroglioma, and mixed oligodendrogliomas, whereas grades III and IV are high-grade gliomas (HGG), such as glioblastoma (GBM), anaplastic astrocytoma, and anaplastic oligodendroglioma [3]. However, as time goes by, LGG will progress to invasive HGG [4][5][6]. On the basis of the statistical data of the Chinese Glioma Genome Atlas (CGGA), the overall survival (OS) of LGG is 78.1 months, with a 5-year survival rate of 67% [7] . Nevertheless, once patients progress to HGG, their OS will shorten to 14.4 months and the 5-year survival rate is roughly 9%. Even though they adopt positive treat-ments, such as surgery and chemotherapy, as well as radiotherapy, their 2-year survival rate is merely 43% [8].
Immunotherapeutic strategies in glioma arouse unprecedented attention in the public, where long-term tumor remission can be achieved with minimal side-effects [9]. The immune therapeutic methods included immune checkpoint inhibitors (ICI), peptide vaccines, dendritic cell vaccines, chimeric antigen receptor T cells, and oncolytic viruses [10]. The ICIs contain the programmed cell death protein (PD-1) and its ligand (PD-L1), considered as the main factors hindering immune response [11]. However, due to low and variable levels of PD-L1 in glioma cells [12,13], most patients showed no significant increase in OS for anti-PD-1 therapy [14]. Thus, it is imperative to develop some potential consistently expressed immune-related biomarkers of the glioma. In recent years, some researchers propose that immune-related gene (IRG) signatures are involved in tumor prognosis, including pancreatic ductal adenocarcinoma [15], lung adenocarcinoma [16], neuroblastoma [17], and head and neck squamous cell carcinoma [18], as well as LGG [19]. In a previous study, Wang et al. have identified an immune-related lncRNA signature via construction of immune-lncRNAs and an immune-gene coexpression network, for predicting its prognostic value in the anaplastic gliomas [20]. Moreover, another previous study has established an immune-related risk signature for GBM, which can independently distinguish high-risk patients, and they also elucidated the relationship between local immune response and prognosis in GBM [21]. In addition, Zhang et al. constructed an IRG signature for risk stratification and developed a prognostic nomogram for survival prediction in LGG patients [19]. However, most researches established prognostic models based on IRG signatures [19,[21][22][23], but they have yet to illuminate whether progression-related differentially expressed IRGs (DE IRGs) can effectively perform risk stratification for patients subjected to glioma.
Hence, this present study was undertaken to develop a prognostic model via screening progression-related core DE IRGs [24], for stratifying glioma patients by risk score and providing potential immunotherapeutic targets for inhibition of glioma progression and improvement of patients' prognosis.

Weighted Gene Coexpression Network Analysis.
Genes with top-ranking 5000 expression data of the GSE4290 dataset were used to implement WGCNA. Then, coexpression modules were constructed using the "WGCNA" package in the RStudio software (Version 3.5.0). To eliminate outliers, the parameters of height and minsize were set to 110 and 30, respectively. Using the power function a mn = jc mn j β (a mn : adjacency between gene m and gene n, c mn : Pearson correlation, β: the soft threshold), a weighted adjacency matrix was constructed [32,33]. Then, the topological overlap matrix (TOM) was converted to the adjacency matrix to measure the network connectivity of these genes [32]. Subsequently, genes with absolute high correlation were classified into gene modules according to the TOM-based dissimilarity metric [33]. In addition, coexpression modules were uncovered using the blockwiseModules function of the "WGCNA" package [34,35]. Through establishment of a hierarchical clustering tree diagram of selected gene expression values and analysis of adaptive branch cutting [36], functional modules were visualized with different colors [37], where genes without being enriched in any module were assigned to the gray module [38]. Module eigengene (ME) is defined as the main component of the first standardized expression profile [39] and also is thought as the representative gene expression profile in modules [33,40]. Furthermore, module significance tends to be highly correlated with the correlations between ME and clinical traits [41]. And gene significance (GS) is considered to be the absolute values of the correlation coefficients between genes and traits, and module membership (MM) is defined as the correlations between the eigengene modules and gene expression profile [34,42]. Accordingly, a preliminary correlation between modules and clinical traits was revealed via the WGCNA method.

Identification of Differentially Expressed Genes (DEGs)
and Their IRGs (DE IRGs). The original data from four GEO datasets (including GSE45921, GSE15824, GSE43378 and GSE4290) were decompressed and implemented normalization, background correction, and log2 transformation via the "affy" package [43,44]. And missing values were filled using the impute.knn function [45]. Additionally, the ComBat algorithm in the "sva" package was adopted for removing batch-batch difference [46]. Subsequently, differential expression analysis between HGG and LGG samples was performed using the "limma" package. If multiple probes matched to the same gene symbol, the median expression values were selected as the expression levels of these genes. Besides, cutoff values of DEGs were set to | log 2 FoldChange ðlog 2 FCÞ | ≥1:0 and adjusted P < 0:05. To further discover glioma progression-related DE IRGs, a Venn diagram was used based on an overlapping region of DEGs and IRGs, as well as genes in the significant progression-related modules.

Functional Enrichment Analyses for DE IRGs. The Gene
Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses were performed for DE IRGs via the "ClusterProfiler" [47] package. GO term enrichment analysis contained three categories, namely, molecular function (MF), cellular component (CC), and biological processes (BP) [48]. Additionally, KEGG (http://www.genome.jp/kegg/) is a bioinformatics resource that links genome or molecular datasets to 2 Computational and Mathematical Methods in Medicine networks [49]. The P value less than 0.05 was regarded as the threshold.

Construction of a PPI Network and Identification of Clusters.
A protein-protein interaction (PPI) network of DE IRGs was constructed using the STRING (https:// string-db.org/) [50] website via setting the minimum required interaction score to 0.900 and the max number of interactors to show to no more than 50, and visualized in the Cytoscape software [51]. Furthermore, the median degree was calculated by the CentiScaPe 2.0 plug-in, a tool for determining the most significant nodes [52]. And the molecular complex detection (MCODE) [53] plug-in was capable of visualizing the highly connected clusters of the PPI network. Thus, significant nodes were selected as core genes for the next analysis. A GOChord function in the "GOplot" package was employed to display the relationships that genes were coupled with GO terms through ribbons [54].
2.6. Survival Analysis. To disclose the predictive value of expression levels of DE IRGs for glioma survival, we adopted a log rank test to perform Kaplan-Meier (KM) survival analyses.

Establishment of an Immune-Related Risk Score (IRRS)
Model. We screened out the most significant nodes via the previous methods. The TCGA samples were used to construct and assess the IRRS model. Firstly, univariate Cox regression analysis was performed to screen those genes with P < 0:05. Subsequently, the IRRS model was established by multivariate Cox regression analysis based on the Akaike Information Criterion (AIC) algorithm [55]. The risk score formula was erected via the coefficients (β) of each gene from the multivariate Cox regression analysis and their expression levels. The calculation formula of the risk score was as follows: risk score = expression level of mRNA1 × β 1 mRNA1 + expression level of mRNA2 × β2 mRNA2 + ⋯ + expression level of mRNAn × βn mRNAn . Then, the timedependent receiver operating characteristic (ROC) curves for predicting the 1-year, 3-year, and 5-year OS showed the predictive performance of this IRRS model via the "sur-vivalROC" package [56]. In addition, the concordance index (C-index), ranging from 0.5 to 1.0, is an indicator for evaluating this model's predictive ability. The closer to 1.0 the Cindex, the better the discriminatory ability of the IRRS model [57]. Next, considering that 5 years is a key time point for evaluating the prognosis of glioma patients, we have stratified patients into high-and low-risk groups via adopting the optimal cutoff value of the risk score at the 5-year time point in ROC curves [58]. Subsequently, survival analysis was implemented using the "survival" and "survminer" packages via log rank test, and the 5-year survival rate also was acquired.  [59]. Moreover, calibration curves were utilized to evaluate the agreement of predictive and observed probabilities of 1-year, 3-year, and 5-year OS from the nomogram.
2.9. Statistical Analysis. Statistical analysis was conducted in the RStudio software (Version 3.5.0) and GraphPad Prism software (Version 5.0). Survival curves were established via the KM method and log-rank test. Calculation of the area under ROC curves of 1-year, 3-year, and 5-year survival rates was implemented based on the "survivalROC" package [56]. The "rms" package was applied to perform nomogram analysis. P < 0:05 denoted the presence of statistically significant difference.

Data Processing.
The flowchart of the present study design is presented in Figure 1.

Construction of a Weight Gene Coexpression Network and Identification of Grade Progression-Related Modules.
First, we used a hierarchical clustering method to detect the outliers, enabling two samples to be eliminated ("GSM97895" and "GSM97932"). The soft threshold value was equal to 12 via the pickSoftThreshold function, shown in Figure 2(a). Second, we employed the blockwiseModules function of the "WGCNA" package to acquire six modules, including brown, turquoise, green, blue, yellow and gray modules, with genes ranging from 149 to 1552, where the gray module represented genes without being enriched, which is visualized in Figure 2 Table 1).

Differentially Expressed Genes (DEGs) and
Their IRGs (DE IRGs), as well as Functional Enrichment Analyses. Differential expression analysis between HGG (n = 179) and LGG (n = 72) samples was completed via the "limma" package. According to the presetting cutoff value, 389 DEGs (Supplementary Table 2) were obtained for further analysis, with upregulation of 213 and downregulation of 176 DEGs. Figure 3(a) shows a clustering heat map that reveals the expression levels of five core DE IRGs. We selected the 41 overlapping genes in the DEG and IRG groups via Venn plot as DE IRGs. Thus, as shown in Figure 3(b), 26 progression-related DE IRGs located in the intersection region among the DEG, IRG, and turquoise module's gene groups. Subsequently, Figures 3(c) and 3(d) present the results of GO term and KEGG pathway enrichment analyses, respectively, which clarified that these DE IRGs were mainly enriched in receptor ligand activity, growth factor activity, extracellular matrix binding, etc. (GO term), as well as focal adhesion (KEGG pathway).

Construction of a PPI Network and Identification of Clusters.
A PPI network of 26 progression-related DE IRGs was constructed by the STRING tool and was visualized via the Cytoscape software, with 118 nodes and 705 edges (Figure 4(a)). The median degree, as the threshold of PPI network, was 11.94 calculated by the CentiScaPe 2.2 plugin. Subsequently, a total of four closely connected clusters were discovered (Table 1), with a score more than 6.0 using the MCODE algorithm, where the parameters of degree cutoff, node score cutoff, k-score, and max. depth were set to 2.0, 0.2, 2.0, and 100, respectively (Figures 4(b)-4(e)). Ulti-mately, we have identified five core DE IRGs (including VEGFA, SOCS3, THBS1, SPP1, and TGFB2 genes) with a degree more than 12 in clusters A and B, which met the requirements mentioned above. Additionally, GOChord diagrams disclosed that most of core DE IRGs were enriched in the positive regulation of epithelial cell migration (BP, Figure 4(f)) and extracellular matrix binding and cytokine activity, as well as receptor ligand activity (MF, Figure 4(g)).
3.6. Establishment of the IRRS Model. These core DE IRGs were used to perform univariate Cox regression analysis, screening genes with a P value less than 0.05, which were included for conducting multivariate Cox regression analysis via the AIC algorithm (the smallest value of 2753.8) for establishment of a prognostic model. Finally, we constructed an IRRS model composed of four DE IRGs via the following formula: risk score = 0:20365 × VEGFA + 0:12448 × SOCS3 + 0:26231 × SPP1 + 0:12930 × TGFB2. And C-index was 0.811 (95% CI: 0.786-0.836) for the IRRS model, showing the superiority of this model for predicting the probability of OS in glioma population (Table 3). Subsequently, we used diagrams of survival status' distribution and risk score of each glioma patient, as well as four gene expression heat  ). The optimal cutoff value of the risk score for forecasting the 5year survival in ROC curves was 0.87894, which was used to classify glioma patients into high-and low-risk populations and then generate a survival curve, revealing that the 5-year survival rate was merely 14.30% in the high-risk group (Figure 6(c)). As is shown in Table 4, the correlations between the risk score and clinicopathological characteristics were described. Also, univariate and multivariate Cox regression analyses based on the risk score and clinical characteristics indicated that this IRRS model was able to be an independent prognostic factor ( Figure 6(d)), with an HR value of 1.89 (95% CI: 1.252-2.85, P = 0:0024) (  5 Computational and Mathematical Methods in Medicine then, the survival curve elucidated that the 5-year survival rate was merely 29.80% in high-risk glioma patients (Figure 7(c)). Meanwhile, the IRRS model was demonstrated that it can become an independent prognostic indicator (HR: 2.17, 95% CI: 1.493-3.14, P < 0:0001, Figure 7(d)).

Development of a Nomogram.
A nomogram was erected to predict 1-year, 3-year, and 5-year OS of glioma patients via combing the IRRS prognostic model and significant clinical characteristics (Figure 8(a)), which appeared to be conducive to clinical use. Furthermore, calibration curves substantiated a superior consistency between predicted and observed values at probabilities of 1-year, 3-year, and 5year OS of the nomogram, as described in Figure 8(b).

TGFB2
Transforming growth factor beta 2 0.1293 1.14 1.02-1.27 0:0168 * HR: hazard ratio; * * * and * shows P value less than 0.01 and 0.05, respectively; bold texts indicate statistically significant difference. 9 Computational and Mathematical Methods in Medicine more than ten genes, which restricted their widespread application.

Discussion
With the increasing significance of IRG signatures on tumors' prognosis [29,77,78], it is imperative to throw light on their prognostic value for tumors. In our present study, four progression-related DE IRG (including VEGFA, SOCS3, SPP1, and TGFB2 genes)constituting a signature can perform risk stratification for glioma patients and disclosed that patients with high risk seemed to have an approximate 14.3%-29.8% 5-year survival rate. Besides, we speculated that these overexpressed core genes might participate in the    [79].
VEGFA, called vascular endothelial growth factor A, belongs to the PDGF/VEGF growth factor family and plays

11
Computational and Mathematical Methods in Medicine an essential part in inducing neovascularization [80] and promoting endothelial cell proliferation and migration. For example, the enzyme for degrading ECM can release active fragments from the matrix substance and activate growth factors, including the heparin-binding epidermal growth factor, insulin-like growth factor, epidermal growth factor, VEGF, and fibroblast growth factor-2, not only promoting the growth but also propelling invasion and neovascularization of tumors [81,82]. Also, VEGFA can act as a crucial regulator of the cancer-immune circulation via generating considerable modifications, inducing immune tolerance and causing tumor immune evasion. Furthermore, soluble proteins coded by the VEGFA gene were involved in the mechanism of inducing angiogenesis and immunosuppressive responses in gastric cancer [83]. In addition, Hatva et al. [84] measured the expression of VEGF in both the normal brain vessel system and glioma cells; then, they concluded that VEGF was manifestly overexpressed in malignant glioma cells and their corresponding receptors were induced, which played a cardinal role in the angiogenesis of tumors. Besides, Zhang et al. [85] found that escalation of VEGFA participated in the progression of glioma and lowered patients' OS. Additionally, a high level of VEGFA was relevant to hypoxia, angiogenesis, and immune suppression of the tumor microenvironment (TME) [86], which elaborated the mechanism of tumors' innate resistance to ICIs and uncovered that upregulated VEGFA enhanced the malignancy of tumor cells [87]. Some evidence indicated that a high level of VEGFA had the potentiality of facilitating treatment efficacy based on target therapy. Compared to monotherapy, in the bearing-tumor mice without high expression of VEGFA, combination treatment showed no synergistic effect in antitumor efficacy. Surprisingly, in treatment of tumors with overexpressed VEGFA, antiangiogenesis therapies have transformed the immunosuppressive TME and enhanced the effects of ICIs [87]. Thus, VEGFA can serve as not only a prognostic biomarker for predicting gliomas' prognosis but also a potential immunotherapeutic target in the near future.
SOCS3, suppressor of cytokine signaling 3, is one of the most potent members of the SOCS family and encodes a group of STAT inhibitors induced by STAT, which contains a motif of the kinase-inhibitory region (KIR) and directly inhibits signal transduction via suppressing the Janus kinase (JAK) catalytic activity [88,89]. To our knowledge, suppressive SOCS3 is a negative modulator of cytokine signaling via directly inhibiting JAKs, functioning as a key regulator of the immune system [90]. Besides, in the central nervous system, the expressed process of IFN-β-induced SOCS3 in astrocytes depended on the activation of STAT3. Destruction of the expression of SOCS3 caused a great deal of inflammatory responses and promoted the migration of microglial and T cells [91]. Shi et al. [92] revealed that the signaling pathway of ER-JAK2/STAT3/SOCS3 was activated by bisphenol F, which propelled the polarization of macrophages to the proinflammatory M1 subtype. Additionally, McFarland et al. [93] found that myeloid cell population without SOCS3 delayed the growth of intracranial tumors and raised survival rates in the orthotopic glioma-bearing mice. Also, Sheu et al. have found a novel transcriptional mechanism of ECM accumulation induced by high glucose. Specifically, high glucose enhanced SOCS3 expression via activating PI3K and STAT1/3, which produced a multitude of cascade reactions to form the ECM [94]. Therefore, we believe that SOCS3 will be a promising prognostic biomarker of patients with glioma.
SPP1, a secreted phosphoprotein 1, also called osteopontin (OPN), encodes the chemokine-like, calcified ECMassociated proteins. It plays a pivotal role in regulating immune functions and participating in adhesion, remodeling ECM, proliferation, and angiogenesis, as well as metastasis of tumors [95,96]. Saitoh et al. [97] demonstrated that OPN had a close relation to glioma's malignancy. OPN is one of the most significant components of the ECM, which is involved in the regulation of matrix interactions and cell adhesion. Also, OPN was able to interact with ECM elements, such as collagen, fibronectin, and calcium ion [98]. Besides, Friedmann-Morvinski et al. [99] indicated that silencing of OPN significantly exerted influence on the cell cycle and WNT as well as focal adhesion signaling pathways in GBM patients. At the same time, they have drawn a conclusion that OPN serves a crucial role in cell dedifferentiation during the formation of tumors. As a consequence, inhibition of OPN might be the potential target for the treatment of GBM. Also, OPN was overexpressed in almost 90% GBM patients, and a high level of OPN was associated with tumor malignancy, where OPN recruited neutrophils and macrophages, inducing tumor cell and leukocyte migration [100]. Similarly, macrophages were recruited by OPN to the site of GBM, implying that OPN acted as a crosstalk between glioma cells and the innate immune system. Thus, OPN could be considered to be an outstanding therapeutic target [101].
TGFB2, transforming growth factor beta 2, encodes the ligands of the TGF-beta super-family's proteins. These ligands bind a variety of TGF-beta receptors, leading to recruiting SMAD transcription factors, which modulate gene expressions. Chen et al. [102] found that malignant glioma cells were able to regulate some adhesion molecules (such as VCAM-1) via secreting TGF-beta2 and releasing the tumor necrosis factor (TNF) receptor. Of note, Corbet et al. [103] demonstrated that TGF-beta2 was the main   14 Computational and Mathematical Methods in Medicine LGG: low-grade glioma; GBM: glioblastoma; IDH: isocitrate dehydrogenase; TCGA: The Cancer Genome Atlas; CGGA: Chinese Glioma Genome Atlas; CI: confidence interval; NA: not available; * univariable analysis.
15 Computational and Mathematical Methods in Medicine driving force of the reconstruction of lipid metabolism promoted by tumor acidosis and was essential to support energy requirements of cancer cells' invasion. Moreover, they also found that acidosis-induced TGF-beta2 activation promoted partial epithelial-to-mesenchymal transition (EMT) and fatty acid metabolism in various original tumor cells, where the latter supported the acetylation of Smad2. Similarly, TGF-beta2 activated the autophagy of human glioma cell lines via the Smad and non-Smad pathways, facilitating glioma cell invasion, where epithelial-interstitial transformation and metabolic alterations were vital for glioma progression [104]. Additionally, Xiao et al. [105] showed that TGF-beta2 functioned as a key factor influencing immune cell recruitment and infiltration to the gastric tumors' site. Therefore, TGF-beta2 may be regarded as a valuable prognostic biomarker for tumors.
There were some advantages in the present study. First, we adopted the top-ranking 5000 gene expression values based on the MAD algorithm for performing WGCNA, avoiding utilizing DEGs, which was not recommended by the official website. Second, we also removed the batchbatch difference and performed differential expression analysis using the integrated transcriptome analysis based on four GEO datasets, which enabled our results to be more reliable. However, the limitations also need to be given more importance. Given that glioma tissues were difficult to collect, it is challenging to perform experimental validation in gene expressions. Furthermore, it is also rather restricted to construct a model of glioma progression using glioma cell lines for our laboratory. Finally, the specific mechanism and interrelationship of core IRGs in the IRRS model deserve to be further investigated.

Conclusions
Taken together, in the present study, we established an IRRS prognostic model, composed of VEGFA, SOCS3, SPP1, and TGFB2 core DE IRGs, for risk stratification and survival prediction for glioma patients. Additionally, glioma patients with high risk stratified by this IRRS prognostic model presented a short survival time, which might be attributed to the ECM signal pathway where those genes participate in. However, a multitude of prospective studies and experiments will be needed to verify our findings in the near future.

Disclosure
The funding agencies had no role in the study design, analysis, interpretation of results, decision to publish, or preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agency.

Conflicts of Interest
The authors declare that there is no conflict of interest.