Identification and Validation of an Energy Metabolism-Related lncRNA-mRNA Signature for Lower-Grade Glioma

Energy metabolic processes play important roles for tumor malignancy, indicating that related protein-coding genes and regulatory upstream genes (such as long noncoding RNAs (lncRNAs)) may represent potential biomarkers for prognostic prediction. This study will develop a new energy metabolism-related lncRNA-mRNA prognostic signature for lower-grade glioma (LGG) patients. A GSE4290 dataset obtained from Gene Expression Omnibus was used for screening the differentially expressed genes (DEGs) and lncRNAs (DELs). The Cancer Genome Atlas (TCGA) dataset was used as the prognosis training set, while the Chinese Glioma Genome Atlas (CGGA) was for the validation set. Energy metabolism-related genes were collected from the Molecular Signatures Database (MsigDB), and a coexpression network was established between energy metabolism-related DEGs and DELs to identify energy metabolism-related DELs. Least absolute shrinkage and selection operator (LASSO) analysis was performed to filter the prognostic signature which underwent survival analysis and nomogram construction. A total of 1613 DEGs and 37 DELs were identified between LGG and normal brain tissues. One hundred and ten DEGs were overlapped with energy metabolism-related genes. Twenty-seven DELs could coexpress with 67 metabolism-related DEGs. LASSO regression analysis showed that 9 genes in the coexpression network were the optimal signature and used to construct the risk score. Kaplan-Meier curve analysis showed that patients with a high risk score had significantly worse OS than those with a low risk score (TCGA: HR = 3.192, 95%CI = 2.182‐4.670; CGGA: HR = 1.922, 95%CI = 1.431‐2.583). The predictive accuracy of the risk score was also high according to the AUC of the ROC curve (TCGA: 0.827; CGGA: 0.806). Multivariate Cox regression analyses revealed age, IDH1 mutation, and risk score as independent prognostic factors, and thus, a prognostic nomogram was established based on these three variables. The excellent prognostic performance of the nomogram was confirmed by calibration and discrimination analyses. In conclusion, our findings provided a new biomarker for the stratification of LGG patients with poor prognosis.


Introduction
Lower-grade gliomas (LGG) that include World Health Organization (WHO) grade II and III diffuse gliomas are common infiltrative brain tumors in adults [1]. Although advances have been made for the treatment of LGG, including neurosurgical resection, chemotherapy, and radiotherapy, a considerable proportion of patients still experience recurrence and malignant transformation to high-grade glioblastoma multiforme (GBM; WHO grade IV) [2], leading to declines in their health-related quality of life [3] and eventual death [2]. This heterogeneity in the prognosis of patients with LGG highlights the necessity to develop effective biomarkers to early stratify the patients at high risk for poor outcomes and give preventative therapy.
In order to maintain the malignant characteristics (rapid proliferation, migration, and invasion), tumor cells (including gliomas) need to produce a large amount of energy [4]. It is well known that carbohydrate, lipid, and amino acid metabolic processes are the main sources for the production of adenosine triphosphate (ATP) [5]. Therefore, the expression changes in genes involved in these metabolic processes may be important molecular mechanisms for the progression of gliomas, and the genes may represent potential biomarkers for prognostic prediction. This theory has been demonstrated by some scholars. For example, Qi et al. extracted the fatty acid catabolic metabolism-related genes from Molecular Signatures Database (MsigDB) and then identified an 8-gene risk signature using the Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis based on RNA-seq data from the Chinese Glioma Genome Atlas (CGGA) dataset and The Cancer Genome Atlas (TCGA) dataset. This risk signature was found to be an independent prognostic factor for patients with all grade gliomas (CGGA: hazard ratios ðHRÞ = 4:0044, 95%confidence intervals ðCIÞ = 2:7634-5:8028; TCGA: HR = 1:7382, 95%CI = 1:0577-2:8567) [6]. Wu et al. used the Cox proportional hazards model to screen a prognostic signature from the differential genes of lipid metabolism between LGG and GBM. Consequently, a nine-gene signature was obtained as a classifier, which was demonstrated to significantly distinguish the overall survival (OS) between the high-and low-risk group of CGGA and TCGA cohorts [7]. Univariate Cox regression analysis performed by Zhao et al. generated a signature including 45 glucose-related genes. This risk score was associated with the OS of patients in the CGGA (HR = 2:293, 95%CI = 1:471-3:576) training dataset and TCGA (HR = 1:227, 95% CI = 1:000-1:504) and GSE16011 (HR = 1:440; 95%CI = 1:016-2:039) validation datasets [8]. Based on glioma datasets from TCGA, REMBRANDT (Repository for Molecular Brain Neoplasia Data), and GSE16011, Chen et al. established a glycolytic gene expression signature score by incorporating ten glycolytic genes. A high risk score was reported to predict poor prognosis for patients with GBM [9]. Zhou et al. obtained 587 energy metabolism-related genes (including 41 carbohydrate, 73 lipid, and 144 protein metabolisms) from MsigDB and overlapped with the 463 differentially expressed genes between LGG and GBM to develop a risk score. As a result, a 29-gene signature was identified, and its predictive accuracy can reach 87.2% [10]. However, energy metabolism-related prognostic biomarkers for LGG remain rarely reported.
In addition, long noncoding RNAs (lncRNAs), a class of noncoding transcripts > 200 nucleotides in length, had also been demonstrated to affect cancer progression by regulation of the energy metabolism genes and then the related processes [11]. For example, Cheng et al. identified that highly expressed lncRNA X-inactive specific transcript (XIST) may promote cell viability, migration, invasion, and resistance to apoptosis by increasing glucose uptake, with the mechanism referring to upregulation of glucose transporters GLUT1 and GLUT3 [12]. The study of He et al. revealed that upregulated lncRNA UCA1 may induce glycolysis and invasion in glioma cells by competitively binding to miR-182 and then influencing the downstream target (fructose-2,6-biphosphatase) of miR-182 [13]. These findings indicated that the metabolism-related lncRNAs may also have underlying prognostic values for LGG; nevertheless, no studies focused on the lncRNA signature until now. Thus, the goal of this study was to construct a new energy metabolism-related lncRNA-mRNA prognostic signature for LGG patients.

Materials and Methods
2.1. Dataset Mining. Three datasets obtained from Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/ geo/), TCGA (https://gdc-portal.nci.nih.gov/; level 3), and CGGA (http://www.cgga.org.cn; level 3) were included in our study. GSE4290 in GEO was enrolled because it met the following inclusion criteria: (1) studying RNA expression profiles, (2) studying brain tissue samples, (3) the number of samples more than 50, and (4) consisting of LGG (n = 76) and normal controls (n = 23). GSE4290 was used for screening the differentially expressed RNAs (DERs). Samples of TCGA (n = 520) and CGGA (n = 431) were eligible if they (1) belonged to the LGG type and (2) provided the clinical survival outcomes because TCGA was used as the training set for constructing the prognosis model, while CGGA was used as the validation set to confirm the prognosis value of the established model.

Identification of Energy
Metabolism-Related Genes. The mRNAs and lncRNAs in the GSE4290 dataset were reannotated by the HUGO Gene Nomenclature Committee (HGNC; http://www.genenames.org/) that includes the standard nomenclature for 4516 lncRNAs and 19,200 protein-coding genes [14].

Identification of Energy Metabolism-Related lncRNAs.
Energy metabolism-related lncRNAs were determined by constructing a coexpression network between DELs and energy metabolism-related DEGs. The coexpression pairs were selected by calculation of Pearson correlation coefficients (PCC) between lncRNAs and DEGs by cor.test function (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor .test.html) in R. PCC > 0:6 revealed that a significant correlation existed. The network was visualized in the Cytoscape software (version 3.6.1; http://www.cytoscape.org/).

Function Enrichment Analysis.
To illustrate the specific metabolic processes involved of the lncRNAs in the network,  3 BioMed Research International coexpression network using "survival" package in R (version, 2.41-1; http://bioconductor.org/packages/survivalr/), with p < 0:05 tested by log-rank testing as the statistical threshold. A Cox proportional hazards model based on the L1-penalized regularization regression algorithm in the penalized package (version, 0.9-5; http://bioconductor.org/packages/penalized/) [18,19] was subsequently conducted in DELs and DEGs still significant after multivariate analysis to further screen the optimal signature combination. The expression differences of signature genes between LGG and GBM were subsequently identified in GSE4290 (71 vs. 81), CGGA (443 vs. 249), and TCGA (524 vs. 155) datasets via an unpaired t-test to verify their specificity, with p < 0:05 set as a statistical difference. Finally, the risk score was established as follows: Prognostic risk score = ∑β DERs × Exp DERs , where Exp DERs is the expression levels of prognostic DERs and ∑β DERs is the prognostic coefficients of DERs after LASSO analysis.
The LGG patients were divided into the high-risk group and low-risk group using the median risk score as the cutoff. The Kaplan-Meier curve and receiver operating characteristic (ROC) curve were used to assess the predictive ability of the energy metabolism-related signature. These analyses were performed for the training dataset (TCGA) and validation dataset (CGGA), respectively.
To validate if the risk score could be independent of other clinicopathological parameters, univariate and multivariate Cox regression analyses were performed using the training dataset, followed by the stratification analysis for clinical variables with p < 0:05 in multivariate analysis. Furthermore, a nomogram using the result of multivariate Cox regression analysis was constructed to predict the 3-year and 5-year OS. The performance of the nomogram was assessed by discrimination and calibration. The discrimination was determined by the area under the curve (AUC) of the ROC curve and concordance index (C-index). The calibration was evaluated by calibration curves, which shows the agreement between the predicted and observed survival probabilities.

Identification of Energy Metabolism-Related DERs.
A total of 15,183 protein-encoding mRNAs and 576 lncRNAs were annotated in three included datasets. Based on the LIMMA method, 1613 mRNAs and 37 lncRNAs were found to be differentially expressed between LGG and normal brain tissues in the GSE4290 dataset (Figure 1(a)). Hierarchical clustering analysis showed that LGG samples could be clearly    (Figure 1(b)). By searching the MsigDB database, a total of 1403 energy metabolism-related genes were downloaded, including 293 for carbohydrate, 738 for lipid, and 372 for amino acids and derivative metabolism. These energy metabolism-related genes were then overlapped with the above 1613 DEGs, with 110 shown to be shared (Figure 1(c)), indicating that they were energy metabolism-related DEGs.

Development of Energy Metabolism-Related DER-Based
Risk Score. Univariate Cox regression analysis revealed that 13 out of 27 energy metabolism-related DELs and 27 out of 67 energy metabolism-related DEGs were significantly correlated with OS of LGG patients in the TCGA dataset. Then, they were entered into multivariate Cox regression. Five energy metabolism-related DELs and four energy metabolism-related DEGs were filtered as significant, independent prognostic factors. These 9 genes were further suggested to be the powerful prognostic indicators after LASSO regression analysis (Table 2). Also, there were significant expression differences in these 9 genes between LGG and GBM, indicating that they were specific signatures for LGG (Table 2) After calculation of the risk score for each patient in TCGA and CGGA datasets, the patients were dichotomized to the low-risk (<median) group and high-risk (≥median) group. Kaplan-Meier curve analysis showed that patients with a high risk score had significantly worse OS than those with a low risk score (TCGA: HR = 3:192, 95%CI = 2:182-4:670, p = 1:989e − 10, Figure 4(a); CGGA: HR = 1:922, 95% CI = 1:431-2:583, p = 1:039e − 05, Figure 4(c)). The predictive accuracy of the prognostic risk score was also demonstrated to be high according to the AUC of ROC curve (TCGA: 0.827, Figure 4(b); CGGA: 0.806, Figure 4(d)).
To construct a prognostic nomogram, univariate and multivariate Cox regression analyses were performed for risk score and several clinical factors to explore all independent risk factors for OS. Univariate analysis demonstrated that age, histological type, isocitrate dehydrogenase 1 (IDH1) mutation, neoplasm histologic grade, radiation therapy, and risk score were significant prognostic factors. These significant variables further underwent multivariate analysis. As a result, age    (Table 3). Also, stratification analysis showed that the risk score could further distinguish the prognosis of patients with the same age (<42 years, p = 2:529e − 04, Figure 5(b); ≥42 years, p = 3:071e − 09, Figure 5(c)) and IDH status (without mutation, p = 5:918e − 03, Figure 5(e); with mutation, p = 1:032e − 01, Figure 5(f)), implying that it is necessary to integrate the risk score into the clinical prognostic factors. Thus, a nomogram was then constructed based on these independent prognostic factors ( Figure 6(a)). The excellent prognostic performance of the nomogram was confirmed by calibration (approximate to the 45-degree line for 3-and 5-year OS prediction) (Figure 6(b)) and discrimination (AUC = 0:845 and C-index = 0:928; both higher than age, IDH status, and risk score model alone) ( Table 4; Figure 6(c)).

Discussion
In this study, we, for the first time, attempted to develop an energy metabolism-related prognostic signature for LGG patients based on the lncRNAs and mRNAs. As a result, a prognostic risk score established by 9 energy metabolismassociated lncRNAs (GABPB1-AS1, HAR1A, LINC00599, SNAI3-AS1, and SNHG1)-mRNAs (FABP6, MBOAT7, SLC25A1, and UST) was generated. This risk score was demonstrated to be an independent prognostic factor for OS prediction, with the predictive accuracy reaching 82.7% for TCGA and 80.6% for the CGGA dataset, respectively, which seemed to be higher than the signature established by lncRNAs and mRNAs alone previously for LGG, such as Zhou et al. . Furthermore, the poor prognosis of LGG was traditionally determined according to clinical characteristics, including older age [23] and IDH1 nonmutational status [24,25]. Thus, whether the risk score was superior or added additional prognostic value to these current clinical systems for prognosis prediction was also an important focus in the signature studies [10,22,26]. In the present study, we performed the stratification, AUC, C-index calculation, and nomogram analyses to confirm it. In line with previous studies [10,22,26], our results showed that the OS of patients with the same age and IDH1 mutation status can be further stratified by the risk score. Also, the AUC and C-index of risk score were obviously higher than age if age, IDH1 mutation, and the risk score were combined. These findings suggested that our identified combination (risk score+age+IDH1 mutation) may be a promising biomarker for clinical prediction of the outcomes in LGG patients. Although our molecular prognostic signature was new for LGG patients, several lncRNAs included had been demonstrated to be related to the progression and prognosis for glioma. For example, the study of Luan et al. revealed that GABPB1-AS1 was a protective factor for the poor OS in patients with glioma (HR = 0:668, 95%CI = 0:494-0:904) and GABPB1-AS1 may be involved in glioma by regulating autophagy-related genes [27]. Zou et al. reported that HAR1A was significantly downregulated in patients with GBM compared with nontumor controls (logFC = −2:873, p = 2:98e − 11). Multivariate analysis showed that low HAR1A expression was an independent prognosis factor for OS of glioma patients (HR = 1:6, p = 0:021) [28]. Fu et al. found that the expression of LINC00599 was reduced in LGG and GBM tissues as well as glioma cell lines compared with  Figure 5: Stratification analysis based on age (a) and IDH1 mutation (d) which were also independent prognostic factors for overall survival in addition to the risk score. The Kaplan-Meier curve showed significant differences in overall survival between the high-risk group and the low-risk group in different ages (b, c) and IDH1 mutation status (e, f). HR: hazard ratio; y: year; IDH: isocitrate dehydrogenase. 9 BioMed Research International normal brain tissues or human astrocytes. Low LINC00599 expression was associated with poor disease-free survival and OS of glioma patients. In vitro study implied that overexpression of LINC00599 inhibited cell migration and invasion through blocking the epithelial-mesenchymal transition process [29]. Liu et al. observed that SNHG1 was overexpressed in glioma tissues and cell lines. In vitro and in vivo assays suggested that SNHG1 promoted glioma progression by functioning as a sponge for miR-194 and then inducing the high expression of pleckstrin homology like domain family A, member 1 (PHLDA1) [30]. Li et al. elucidated that SNHG1 may regulate the malignant behavior of glioma cells by binding to miR-154-5p or miR-376b-3p and then enhancing the expression of downstream target of both miR-154-5p and miR-376b-3p FOXP2 [31]. Kaplan-Meier analysis of Wang et al. showed that high expression of SNHG1 was significantly associated with poor OS. Functional studies demonstrated that knockdown of SNHG1 suppressed glioma cell proliferation and cell invasion and increased cell apoptosis [32]. In line with these studies, we also demonstrated that GABPB1-AS1 and SNHG1 were highly expressed, while HAR1A and LINC00599 were lowly expressed in LGG compared with normal brain tissues. They were all OS-related genes for LGG. However, their functions in glioma remain not well understood. In this study, we predicted that these four lncRNAs may play crucial roles in LGG by regulating the transcription of energy metabolismrelated genes, including GABPB1-AS1-PON2, HAR1A-CYP46A1/HK1, LINC00599-INPP5J, and SNHG1-GPC2. The roles of these energy metabolism-related genes had been implicated for cancers. PON2 is a member of the paraoxonase family and had been demonstrated to exert an antioxidative function by improving mitochondrial efficiency to reduce reactive oxygen species production. Theoretically, PON2 should be downregulated in cancers exposed to anoxia. However, several studies showed that PON2 expression was obviously increased in gastric cancer [33] and bladder cancer [34] tissues compared with normal tissue samples. Overexpression of PON2 led to a significant increase in tumor cell proliferation and resistance to oxidative stress [34], while silencing of PON2 expression inhibited cancer cell proliferation, migration, and invasion [24]. Patients with high PON2 expression had shorter OS compared with those having low PON2 expression [23]. These findings indicated that high expression of PON2 may represent a protective stress response. It was reported that CYP46A1, a brain-specific enzyme that converts the cholesterol into 24(S)-hydroxycholesterol (24OHC), was

10
BioMed Research International significantly decreased in GBM samples compared with normal brain tissue. Low expression of CYP46A1 was associated with increasing tumor grade and poor prognosis in human gliomas [35]. Overexpression of CYP46A1 or the use of activator suppressed cell proliferation and in vivo tumor growth by increasing 24OHC levels [25]. Malignant tumors often rely on glycolysis to produce ATP (that is, Warburg effect), but not tricarboxylic acid cycle. Thus, glycolytic enzymes may play important roles for cancer. Hexokinase (HK) is the first rate-limiting enzyme to phosphorylate glucose to form glucose-6-phosphate (G-6-P). Theoretically, all hexokinase isozymes (HK1 to HK4) should be highly expressed for tumor initiation and maintenance. However, a study showed that HK2 was upregulated [36] but HK1 was downregulated to increase glycolysis and accelerate tumor growth and metastasis [37]. This result may be attributed to the underlying regulation  Points   0  10  20  30  40  50  60  70  80  90  100   90  80  70  60  50  40  30  20  10   Total points  0  20  40  60  80  100  120  140  160  180 3   [37]. INPP5J is a negative regulator for PI3K/AKT signaling and thus may function as a tumor suppressor, which was demonstrated in breast cancer [38], ovarian cancer [39], hepatocellular carcinoma [40], and oesophageal squamous cell carcinoma [41]. GPC2 is a member of the human glypican family of proteins that mediate neuronal cell adhesion and neurite outgrowth by attaching to the cell surface via a GPI anchor. Hereby, GPC2 protein is highly expressed in brain tumors, which had been validated in neuroblastoma [42,43]. In accordance with these studies, we also identified that PON2 and GPC2 were upregulated, while CYP46A1, HK1, and INPP5J were downregulated in LGG compared with normal control (Supplementary Table 1).
The published study on SNAI3-AS1 was rarely reported, except one for hepatocellular carcinoma, in which highly expressed SNAI3-AS1 was shown to be correlated with shorter OS; knockdown of SNAI3-AS1 inhibited cell proliferation and metastasis, whereas inverse conclusions were obtained with overexpression of SNAI3-AS1 in vitro. Functional investigations showed that SNAI3-AS1 may affect tumorigenesis by inducing tumor epithelial to mesenchymal transition via regulating the UPF1/Smad7 signaling pathway [44]. Unfortunately, our results showed that SNAI3-AS1 was downregulated in LGG, suggesting that SNAI3-AS1 may be tissue-specifically expressed and a new target for LGG. Furthermore, we speculated that SNAI3-AS1 may function in LGG by coexpressing with HK1 and INPP5J as described above.
In addition to lncRNAs, four energy metabolism-related genes (FABP6, MBOAT7, SLC25A1, and UST) were also included into the prognostic signature, indicating their importance for LGG. Some genes had been demonstrated to be related to the development and progression of cancer previously. For example, a gastric cancer risk allele carrier was observed to have downregulated expressions of MBOAT7 [45]. Overexpression of mitochondrial citrate carrier (SLC25A1) was proved to be associated with reduced survival of lung cancer patients [46]. The mechanisms of SLC25A1 in cancer referred to its antioxidant defense and maintenance of the self-renewal capability of cancer stem cells [46,47]. Knockdown of UST could significantly reduce migration and adhesion in mouse melanoma cells and pulmonary metastasis in mice [48]. However, no studies focused on glioma, suggesting they may also represent new biomarkers for LGG.
There were some limitations in this study. First, the expression of crucial lncRNAs and mRNAs should be verified with quantitative PCR, western blotting, or immunohistochemistry in LGG and normal brain tissues with larger sample size. Second, the prognosis ability of our risk score and combined model should be validated in newly hospitalized LGG patients. Third, in vitro and in vivo experiments are also essential to confirm the coexpression mechanisms of our identified lncRNAs (GABPB1-AS1-PON2, HAR1A-CYP46A1/HK1, LINC00599-INPP5J, and SNHG1-GPC2). Fourth, lncRNAs can function as competing endogenous RNAs (ceRNAs) to target mRNAs by sponging miRNAs. It may be a potential direction in the future to explore a signature established by lncRNA-miRNA-mRNA for prognosis prediction and reveal possible ceRNA mechanism axes for LGG as reported in other cancers [49]. Fifth, proteomics analysis also should be conducted in LGG and normal brain tissues in order to identify a protein alone or lncRNA-protein signature for prognosis prediction [50].

Conclusion
Our present study provided a new prognostic biomarker for LGG based on energy metabolism mechanisms. This prognostic signature consisted of five lncRNAs (GABPB1-AS1, HAR1A, LINC00599, SNAI3-AS1, and SNHG1) and four mRNAs (FABP6, MBOAT7, SLC25A1, and UST), which could classify patients into high-and low-risk subgroups exhibiting significantly different OS. Furthermore, this risk score was also combined with clinical characteristics (age, IDH1 mutation) to establish a prognostic nomogram, which may be more applicable for clinical use.

Conflicts of Interest
The authors declare that they have no competing interests.