Identification of Hypoxia-Related Differentially Expressed Genes and Construction of the Clinical Prognostic Predictor in Hepatocellular Carcinoma by Bioinformatic Analysis

Background . Hypoxia closely relates to malignant progression and appears to be prognostic for outcome in hepatocellular carcinoma (HCC). Our research is aimed at mining the hypoxic-related genes (HRGs) and constructing a prognostic predictor (PP) model on clinical prognosis in HCC patients. Methods . RNA-sequencing data about HRGs and clinical data of patients with HCC were obtained from The Cancer Genome Atlas (TCGA) database portal. Di ﬀ erentially expressed HRGs between HCC and para-carcinoma tissue samples were obtained by applying the Wilcox analysis in R statistical software. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes were used for gene functional enrichment analyses. Then, the patients who were asked to follow up for at least one month were enrolled in the following study. Cox proportional risk regression model was applied to obtain key HRGs which related to overall survival (OS) in HCC. PP was constructed and de ﬁ ned, and the accuracy of PP was validated by constructing the signature in a training set and validation set. Connectivity map (CMap) was used to ﬁ nd potential drugs, and gene set cancer analysis (GSCA) was also performed to explore the underlying molecular mechanisms. Results . Thirty-seven di ﬀ erentially expressed HRGs were obtained. It contained 28 upregulated and 9 downregulated genes. After the univariate Cox regression model analysis, we obtained 27 prognosis-related HRGs. Of these, 25 genes were risk factors for cancer, and 2 genes were protective factors. The PP was composed by 12 key genes (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, HMOX1, PGK1, ERO1A, LDHA, ENO2, SLC6A6, and TPI1). GSCA results showed the overall activity of these 12 key genes in 10 cancer-related pathways. Besides, CMap identi ﬁ ed deferoxamine, crotamiton, talampicillin, and lycorine might have e ﬀ ects with HCC. Conclusions . This study ﬁ rstly reported 12 prognostic HRGs and constructed the model of the PP. This comprehensive research of multiple databases helps us gain insight into the biological properties of HCC and provides deferoxamine, crotamiton, talampicillin, and lycorine as potential drugs to ﬁ ght against HCC.


Introduction
Hypoxia is a hallmark of the tumor microenvironment and contributes to tumor malignant phenotypes. It presents in the majority of tumors, and the distribution of hypoxic regions in tumors is heterogeneous [1]. Hypoxia has been considered as a vital sign of poor prognosis in almost all solid tumors, including the hepatocellular carcinoma (HCC) [2]. Hypoxia can lead to changes in the proteome and/or genome.
The mechanism may be related to cells' continued access to nutrients, escape from unsatisfactory microenvironments, and promote unrestricted growth to accelerate tumor progression [3]. Tumor cells in hypoxic regions are closely associated with treatment resistance, local recurrence, and distant metastasis [4]. All these can be regarded as important factors for poor prognosis of patients.
Liver cancer is the fourth leading cause of cancer death in 2018. It kills about 782,000 people annually [5,6]. It has been predicted that liver cancer may become the sixth most commonly diagnosed cancer in the future. HCC comprises 75%-85% of primary cancer of the liver. Unfortunately, the exact mechanism and pathways leading to HCC development are still unclear explanation [7]. In particular, the absence of indicators is related to the prognosis of HCC. The 5-year survival rate is still very low [8,9]. Therefore, finding the key molecular biomarkers focused on hypoxia which is related to the occurrence and development of HCC patients has certain reference value for reliable estimation of the deterioration of HCC and may be an effective measure to against HCC.
In recent years, progress using high-throughput sequencing technologies has greatly expanded the research of cancer genome. Here, we compared the expression profile of hypoxic-related genes (HRGs) between HCC tissue samples and para-carcinoma tissue samples which were collected from The Cancer Genome Atlas (TCGA) database [10]. Furthermore, in order to clarify the correlation between the HRGs and clinical outcomes in HCC patients, the prognostic predictor (PP) was developed and validated as an independent indicator to assess the overall survival (OS) of patients. We also tried to find potential molecular drugs to fight against HCC. These findings may provide new research targets and therapeutic strategies for HCC patients and provide a reliable theoretical basis for judging treatment outcomes and assessing prognosis.

Materials and Methods
2.1. Data Acquisition. The Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/ index.jsp) is a collection of annotated gene sets for GSEA software. A variety of HRGs were extracted from the Hallmark hypoxic gene sets in the MSigDB [11,12], which contains a list of 200 HRGs involved in hypoxic regulatory pathways reported in the literature. RNA-sequencing data about HRGs and clinical data of patients with HCC were obtained from the TCGA data portal. The dataset consists of 374 HCC tissue samples and 50 para-carcinoma tissue samples. We performed on these samples to obtain the differentially expressed HRGs. Then, in order to build a model related to the prognosis of HCC, a total of 349 patients who were asked to follow up for at least one month were enrolled in our study. The follow-up periods of included patients were for a range of 30 days to 3675 days.

Differentially Expressed HRG Enrichment Analysis.
We screened for differences in HRG expression between HCC tissue samples and para-carcinoma tissue samples by applying the Wilcox analysis in R statistical software. The significantly differentially expressed HRGs were defined as the p value is less than 0.05; the gene expression changes at least 1.5-fold. To explain the main biological properties of HRGs, the tools of Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were used for gene functional enrichment analyses. GO and KEGG terms with a p value and q value both smaller than 0.05 were considered significant categories. To take full advantage of the well-known GO and KEGG themes and highlight the most relevant GO terms associated with a given gene list, these results of gene enrichment were visualized by R language clustering package [13].

Construction of the Individualized Prognostic Predictor
Model Based on HRGs. We extracted the expression profile of HRGs downloaded from TCGA in the FPKM format. Univariate Cox regression analysis was performed to screen HRGs whose expression profile was significantly correlated with OS in HCC patients from 200 hypoxic-related genes (p < 0:01). Multivariate Cox regression analysis was performed on these survival-related genes to remove genes that might not be independent indicators of prognosis. We got the optimal solution of the model by narrowing down the relevant genes step by step. Finally, the PP was constructed and significantly divided patients of HCC into high-and low-risk groups according to OS. Kaplan-Meier (K-M) method was used to plot the survival curve. The survival rates of the two groups were tested by log-rank test.
To further verify the hypoxia-related gene model and whether the PP could be used as an independent predictor of prognosis for HCC patients, we divided the TCGA data set into a training set and a validation set, and performed survival analysis on the hypoxia-related signature in the two data sets to illustrate the robustness of the PP. Besides, we introduced several following indicators closely related to the clinic into the model as covariables for multivariate Cox regression analysis: PP and age were coded as continuous variables, gender (male or female), tumor grade (high or low), AJCC TNM stage (I = 1, II = 2, III = 3, and IV = 4), tumor stage [1][2][3][4], lymph node metastasis (positive or negative), and distant metastasis (positive or negative).

Identification of Potential Therapeutic
Agents. Connectivity map (CMap) was used to identify potential therapeutic agents for HCC. It is a program that predicts potential drugs and produces biological states by encoding specific gene expression characteristics [14]. 28 upregulated and 9 downregulated HRGs were used to check the CMap database. Finally, the enrichment score representing similarity was calculated, ranging from -1 to 1. The positive connectivity score indicates that drugs can induce the biological phenomena queried in human cell lines. In contrast, a negative connectivity score indicates that the drug reverses the requested biological characteristics and has potential therapeutic value. With p < 0:05 as the cut-off value, the connectivity scores of different instances were filtered out. We carried out studies on these potentially effective drugs in PubChem.

Gene Set Cancer Analysis (GSCA).
To further explore the underlying molecular mechanisms involved in these modeling genes, GSCA (http://bioinfo.life.hust.edu.cn/web/GSCALite/) [15] was used to obtain the activity of 10 cancer-related pathways. Genes were divided into high and low group by median expression. Student's T test was used to define the difference of pathway activity score (PAS) between groups. p value was adjusted by FDR (FDR ≤ 0:05 is considered as significant).     BioMed Research International 2.6. Statistical Analysis. All statistical analyses were conducted using SPSS 22.0 (Chicago, IL, USA) and R 3.6.0 (https://www.r-project.org/); Adobe Illustrator CS 5 (San Jose, California, USA) was performed to draw plots. Univariate Cox regression analysis was used for illustrating the correlation between genes expression profiles and OS. The multivariate Cox proportional hazard regression model was used for performing multivariate analysis and construct prognostic models. Receiver-operating characteristic (ROC) curve and the corresponding area under the ROC curve (AUC) were applied for verifying that PP was accurate. All tests were performed bilaterally, p < 0:05 indicates a difference with statistical significance.

Differentially Expressed HRGs.
A total of 374 HCC tissue samples and 50 para-carcinoma tissue specimens were obtained for RNA-seq and clinical data from TCGA. A list of 200 HRGs involved in hypoxic regulatory pathways reported in the literature was extracted from the Hallmark hypoxic gene sets in the MSigDB database. According to the limitation of a FDR < 0:05 and |log2 ðFold ChangeÞ | > 1:5, our study screened 37 genes with significant differential expression, of which 28 were upregulated and 9 were downregulated (Figures 1(a) and 1(b)). Then, we visualized the expression patterns of these HRGs between HCC and normal 10 Gene expression 5

Functional Annotation and Enrichment Analysis of Differentially Expressed HRGs.
To further understand the biologically relevant information of these 37 differential HRGs, we performed functional enrichment and enrichment analysis. The GO term function ( Figure 2) and KEGG pathway enrichment ( Figure 3) of these genes were reviewed, respectively. We found the results that the GO terms for biological processes were carbohydrate biosynthetic process, NADH regeneration, canonical glycolysis, glucose catabolic process to pyruvate, and so on. The detailed results of all the above gene enrichment are shown in Figure 2.
Through the enrichment analysis function in the KEGG pathway, we found many important pathways associated with these genes such as HIF-1 signaling pathway, glycolysis/gluconeogenesis, carbon metabolism, MAPK signaling pathway, and PI3K-Akt signaling pathway ( Figure 3). As shown in the circle plot, the change from the Z-score indicated mostly related signaling pathways were more inclined to be increased.
In order to better evaluate the prognosis and survival time of patients, an optimal equation for the prognostic predictor (PP) was further conducted by multivariate Cox proportional hazard regression model. Finally, 12 genes (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, HMOX1, PGK1, ERO1A, LDHA, ENO2, SLC6A6, and TPI1) were obtained and incorporated into the final model. Among these genes, 9 genes (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, HMOX1, ERO1A, LDHA, and TPI1) were risk factors for HCC patients, and 3 genes (PGK1, ENO2, and SLC6A6) were protective factors (     OS of patients with HCC. Conversely, the negative coefficient of the gene means that these genes were beneficial for increasing the OS of HCC patients.

Validation the Value of PP in Evaluating Patients'
Clinical Prognosis. To validate the performance of PP in evaluating patients' clinical prognosis, we divided patients into the high-and low-risk group with the median risk score as the cut-off point according to the risk score formula ( Figure 5(a)). The K-M plots were plotted to analyze the different survival time between the two groups. The K-M analysis showed that patients in the high-risk group suffered significantly worse survival than those in the low-risk group ( Figure 5(b)).
All these results indicated that the prediction ability of the equation was very well. Furthermore, the distribution of patients according to their prognostic risk was also showed (Figures 5(c) and 5(d)). We could see intuitively that the high-risk group had a much higher number of deaths than the low-risk group.
3.6. The Accuracy of PP Was Verified by Combining with Clinical Parameters. To further validate the accuracy of PP in predicting OS in HCC patients, our research performed the 1-, 3-, and 5-year ROC curves by integrating age, gender, tumor grade, and AJCC TNM stage. We got the AUC value of the 1-, 3-, and 5-year risk score were all >0.7, which were much bigger than other clinical parameters (Figure 6(a)). Thus, compared with other indicators, PP had important reference value in predicting patients' prognosis.
Furthermore, we used four-fifths of the samples that met the inclusion criteria from the TCGA data set as the training set, and the remaining one-fifth samples as the validation set. The results of the two data sets with survival analysis showed that the survival probability of the high-risk group was significantly lower than that of the low-risk group, and the difference was significant (Figures 6(b) and 6(c)). Risk score remained as an independent prognostic indicator for HCC patient in univariate and multivariate analyses, after adjusting for clinicopathological features such as age, gender, tumor grade, and AJCC TNM stage (HR = 1:484, 95%CI = 1:342 -1:642, p < 0:001, and HR = 1:414, 95%CI = 1:258 -1:588, p < 0:001, Figures 6(d) and 6(e)).

Relationship between PP and Clinical Parameters.
To further clarify the important value of PP in clinical application, we explored the relationship between risk score and clinical indicators by using the "Beeswarm" software package. The relationship between risk score of the high-and low-risk group of HCC patients and the six clinical indicators (age, gender, tumor grade, and AJCC TNM stage) was verified by the "Beeswarm" software package; p < 0:05 was considered to be statistically significant. The final results showed that the risk score calculated by the PP model was consistent with    (Figure 7(a)) and T stage (Figure 7(b)). The higher the patient's risk score, the higher the corresponding tumor stage and T stage, and the patient's prognosis was poor.
3.8. Potential Therapeutic Drug Screening. In order to screen out potential therapeutic drugs, we analyzed 28 upregulated and 9 downregulated HRGs with CMap. We got the top 4 of small molecules: deferoxamine, crotamiton, and talampicillin showed a highly positive correlation with HCC; lycorine showed a highly negative correlation with HCC. They all might become effective drugs for treating HCC. The 2D chemical structural formulas of the top 4 most promising molecular drugs from PubChem were shown in Figures 8(a)-8(d).
3.9. Gene Set Cancer Analysis (GSCA). In order to further explore the underlying molecular mechanisms of the 12 core genes involved in the PP, GSCA was used to obtain the activity of 10 cancer-related pathways. Figure 9 showed the overall activity of these 12 key genes in 10 cancer-related pathways.

Discussion
HCC is a major contributor of death caused by cancer. Previously, many progresses have been made in understanding the epidemiology, risk factors, and molecular profiles of HCC [16]. However, incidence and HCC-specific mortality still continue to increase [17][18][19]. Although some progress has been made in molecular targeted therapy for HCC, the results are still unsatisfactory [20]. It is urgently needed to better understand the molecular mechanism which leads to the development of HCC and to search for targeted genes that play a key role in the prognosis of HCC for early intervention.
Exploration of hypoxia mechanism opens new perspectives for HCC [21]. In particular, the announcement of the 2019 Nobel Prize in physiology or medicine has increased researchers' enthusiasm for exploring the mechanism of  BioMed Research International hypoxia in the field of cancer. Increasing evidence indicates that hypoxia is closely related to the migration and malignant progression of HCC [8,22]. However, most studies have only focused on hypoxia by studying signaling genes [23,24]. With the development of bioinformatics and the continuous improvement of high-throughput sequencing technology, some large-scale databases, including TCGA and GEO, have emerged. These databases provide effective means for sorting out gene signatures. In the current study, we deeply mined all the differential genes which are significantly expressed in tumors and normal tissues. By integrating them with a large sample of clinical data, we aimed to select molecular biomarkers for detecting the prognosis of HCC patients. We first screened 37 differentially expressed HRGs between HCC and para-carcinoma tissues. To discover the role of genes in HCC progression, GO and KEGG analysis of the differential expression genes was performed. Gene functional enrichment analysis suggested that these genes were mainly involved in carbohydrate biosynthetic process and HIF-1 signaling pathway. The process of carbohydrate biosynthesis was mainly related to the energy metabolism of cancer cells. HIF-1 and its related genes actively promoted HCC growth, HCC cell proliferation, and aggressive behavior, which was positively associated with the presence of intrahepatic metastasis and the histological grade of HCC [8,25]. Our analysis confirmed that HIF-1 signaling pathway was the most critical process for hypoxia leading to poor prognosis of HCC, which is consistent with the findings in most literatures [26,27]. As such, these genes could also act as clinical biomarkers for monitoring metastasis, assessing survival, and potential drug targets. Bioinformatics analysis provided several clues intervening the occurrence and development of HCC via several hypoxic pathways. The univariate Cox regression analysis revealed that 27 HRGs were associated with OS in the TCGA database. Further multivariate Cox regression analysis helped us determine 12 vital prognostic HRGs (HDLBP, SAP30, PFKP, DPYSL4, SLC2A1, HMOX1, PGK1, ERO1A, LDHA, ENO2, SLC6A6, and TPI1) to construct of the PP. There are already several existed reports about partial of these genes.
HDLBP (high-density lipoprotein-binding protein), a positively regulated gene, is closely related to the cancer process [28]. HDLBP can promote HCC cell proliferation and tumorigenesis. Consistent with many other studies, HDLBP was overexpressed in other types of cancers [29]. Intriguingly, in breast cancer, HDLBP may be a tumor suppressor to accelerate the degradation and inhibit the translation of the c-fms protooncogene mRNA [30]. These observations suggested that the mechanism of HDLBP (either inhibit or promote carcinogenesis) was still unclear. SAP30 (serum amyloid P30) was a very sensitive indicator of liver disease [31]. It has been demonstrated that SAP upregulated in serum samples from HCC patients. HMOX1 (heme oxygenase 1) is an isoenzyme of heme oxygenase (HO). HO is a microsomal enzyme system that degrades heme. Their main biological role is to degrade heme to produce carbon 11 BioMed Research International monoxide (CO) and bilivertin and release iron ions. HMOX1 has been proved to be an oncogene and plays an important role in the development of various cancers. Inhibition of HMOX1 degradation can definitely promote HCC proliferation. LDHA (lactate dehydrogenase A) is a vital enzyme responsible for cancer growth and energy metabolism via the aerobic glycolytic pathway. Inhibition of LDHA could inhibit tumor-initiating cell survival and proliferation, which indicated that LDHA may be a potential therapeutics target [32]. PGK1 (phosphoglycerate kinase 1) is an important enzyme in the metabolic glycolysis pathway. Many articles have observed a significant overexpression of PGK1 in HCC tissues and a negative correlation between PGK1 expression and HCC patient survival. Depletion of PGK1 dramatically reduced cancer cell proliferation and tumorigenesis [33,34]. These findings demonstrated a novel role of PGK1 in HCC progression. ENO2 (Enolase 2) encodes an enolase isoenzyme which is regarded as a sensitive biomarker for neuroendocrine tumors [35]. Additionally, ENO2 was believed to be an indicator in the diagnosis and prognosis rather than a potential target for therapy in renal tumor patients [36]. In our study, ENO2 was defined as a negative regulatory gene. It provided new therapeutic targets and biomarkers for HCC prognosis through inhibiting its expression to improve the prognosis of HCC. Other genes were rarely reported; their roles and underlying mechanism in cancer still need to be further explored.
According to the current study, some features of HCC prognosis based on expression profile have been found by large public databases. However, the purpose of these studies was often limited to mining new molecular markers but neglecting routine clinical parameters [37,38]. Thus, we integrated patient clinical information to establish a PP model for assessing prognosis. Firstly, we verified the accuracy of the model through the K-M survival analysis. The risk score was calculated by PP formula, and the median was taken as the cut-off point dividing the high-and low-risk group, and the survival difference was significant ( Figure 5(a)). Secondly, 12 BioMed Research International we performed the 1-, 3-, and 5-year risk score ROC curve analysis by integrating age, gender, tumor grade, and AJCC TNM stage and found that the AUC value of 1-, 3-, and 5year risk score were all much bigger than other clinical parameters ( Figure 6(a)). Besides, we divided the TCGA data set into a training set and a validation set and performed survival analysis on the hypoxia-related signature in the two data sets to illustrate the robustness of the PP. The results of the two data sets with survival analysis showed that the survival probability of the high-risk group was significantly lower than that of the low-risk group, and the difference was significant (Figures 6(b) and 6(c)). Similarly, we found that the risk score was still an independent indicator for evaluating the prognosis of patients with HCC through the process of univariate and multivariate analysis (Figures 6(d) and 6(e)). At last, by comprehensively analyzing the risk score and the patient's tumor TNM stage, we found that the greater the risk score calculated by PP, the higher the patient's tumor grade and the worse the prognosis. All of the above fully demonstrated the accuracy of our constructed model of PP and its important value in evaluating patients' prognosis.
The results of CMap demonstrated the therapeutic agents closely associated with HCC. Substantial evidence suggests that iron overload enhances cancer growth and metastasis. As a commonly used iron chelator, high-dose deferoxamine therapy could inhibit tumor migration by destroying iron homeostasis, arresting the cell cycle and enhancing apoptosis of hepatocellular carcinoma and hepatoblastoma cell lines [39]. The iron chelator deferoxamine has been proved that it could prevent liver injury as well as the development of preneoplastic lesions in rats. In particular, as a potential anticancer drug, it has been used in clinical studies to confirm its efficacy in patients with advanced hepatocellular carcinoma [40]. Lycorine is a multifunctional bioactive compound and powerful anticancer agent against various cancer cell lines. Focusing on hepatocellular carcinoma, lycorine can significantly induce apoptosis and autophagy of hepatocellular carcinoma cells both in vivo and in vitro [41]. The possible molecular mechanism involved the TCRP1/Akt/mTOR pathway. Lycorine inhibited TCRP1 to reduce the phosphorylation level of Akt and inhibited Akt/mTOR signal transduction and finally mediated the apoptosis and autophagy of xenograft hepatocellular tumors without remarkable toxicity. Similarly, research by Hu et al. suggested that lycorine could be used to treat colorectal cancer by inducing autophagy-associated apoptosis [42]. Additionally, lycorine could inhibit the proliferation of SGC-7901 gastric cancer cells and promote cell apoptosis [43]. The specific mechanism of treatment was exactly consistent with the results of GSCA for modeling gene pathway analysis. However, there were few reports about crotamiton and talampicillin in the treatment of cancer. Their anticancer mechanisms need to be further explored and studied.
In conclusion, through a comprehensive bioinformatics analysis with HRG expression profiles, our study firstly reported 12 prognostic HRGs and constructed the model of the PP. This comprehensive research of multiple databases helps us gain insight into the biological properties of HCC and provides deferoxamine, crotamiton, talampicillin, and lycorine as potential molecular drugs to fight against HCC.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.