Identification of Glucose Metabolism-Related Genes in the Progression from Nonalcoholic Fatty Liver Disease to Hepatocellular Carcinoma

Nonalcoholic fatty liver disease (NAFLD) is a manifestation of hepatic metabolic syndrome that varies in severity. Hepatocellular carcinoma progresses from NAFLD when there is heterogeneity in the infiltration of immune cells and molecules. A precise molecular classification of NAFLD remains lacking, allowing further exploration of the link between NAFLD and hepatocellular carcinoma. In this work, a weighted gene coexpression network analysis was used to identify two coexpression modules based on multiple omics data used to differentiate NAFLD subtypes. Additionally, key genes in the process of glucose metabolism and NAFLD were used to construct a prognostic model in a cohort of patients with hepatocellular carcinoma. Furthermore, the specific expression of signature genes in hepatocellular carcinoma cells was analyzed using a single-cell RNA sequencing approach. A total of 19 liver tissues of NAFLD patients were obtained from the GEO database, and 81 glucose metabolism-related genes were downloaded from the CTD database. In addition, based on nine signature genes, we constructed a prognostic model to divide the HCC cohort into high and low-risk groups. We also demonstrated a significant correlation between prognostic models and clinical phenotypes. Furthermore, we integrated single-cell RNA-sequencing data and immunology data to assess potential relationships between different molecular subtypes and hepatocellular carcinoma. Finally, our study discovered that the glucose metabolism pathway may play an important role in the process of NAFLD-hepatocellular carcinoma. In addition, three glucose metabolism-related genes (SERPINE1, VCAN, and TFPI2) may be the potential targets for the immunotherapy of patients with NAFLD-hepatocellular carcinoma.


Introduction
Globally, nonalcoholic fatty liver disease (NAFLD) afects approximately 25% of the adult population, making it the most common chronic liver disease [1]. As part of NAFLD, there are several types of liver disease, such as simple steatosis and nonalcoholic steatohepatitis with varying levels of fbrosis and even cirrhosis [2]. Since obesity and metabolic syndrome are becoming more prevalent, NAFLD has become the leading cause of abnormal liver enzymes in the United States [3]. About 25% of the world's population may sufer from NAFLD, which afects 1 billion people worldwide [4]. A substantial diference exists in the prevalence of NAFLD in diferent parts of the world. NAFLD prevalence is highest in the Middle East and South America and lowest in Africa [5]. As many as 80 million people in the U.S. may have NAFLD. An individual with 5% hepatocyte infltration with steatosis is considered to have NAFLD when they undergo imaging or liver biopsy testing [6]. Te majority of people with NAFLD are asymptomatic, and they may remain silent until they develop cirrhosis [7]. Patients with NAFLD often sufer from fatigue and pain in the right upper quadrant when they are initially referred. Individuals with NAFLD may have an echogenic liver on ultrasound or evidence of liver fat on imaging studies [8]. Cardiovascular disease is the leading cause of death among NAFLD patients, followed by cancer and liver disease [9]. Te adjusted hazard ratio for cardiovascular disease in nonobese people with NAFLD was approximately 10 times higher than in individuals without NAFLD in a Japanese cohort [10].
Glucose metabolism in the liver is critical to protein and lipid glycosylation. Diabetes and other chronic diseases may have metabolic changes due to alterations in glucose metabolism in the human liver. Understanding the glucose metabolism pathways in the healthy liver may help to shed light on these changes [11]. It is believed that NAFLD results from the imbalance in the hepatic energy metabolism, where excessive energy enters the liver relative to its ability to oxidize it into carbon dioxide or very low-density lipoprotein [12]. Terefore, energy is accumulated in the liver in the form of triglycerides, which may explain the common occurrence of NAFLD in obese and lipodystrophic patients [13]. Although excessive consumption of any food can lead to the development of NAFLD, monosaccharides and disaccharides, especially fructose, sucrose, and high fructose corn syrup, which are prevalent in processed foods, can further exacerbate NAFLD by activating de novo lipogenesis programs in the liver [14]. Moreover, fructose is almost entirely metabolized by the liver, and dietary fructose is converted into triglycerides by de novo lipogenesis [15].
NAFLD has become the leading cause of hepatocellular carcinoma and end-stage liver disease in the past decade. It is now well established that hepatocellular carcinoma can develop in NAFLD without cirrhosis, even though it has previously been considered the end stage of liver disease progression [16]. It is estimated that liver cancer cells consume an enormous amount of energy during proliferation and escape from apoptosis [17]. Glucose metabolism and fatty acid oxidation are altered to support proliferation and escape apoptosis [18]. It is also possible that altered glucose metabolism can result in elevated levels of saturated and monounsaturated fatty acids, which may prevent oxidative damage to cancer cells [19].

Datasets Downloaded.
Genome-wide analysis of gene expression in NAFLD patients and healthy livers is downloaded from GSE89632 (https://www.ncbi.nlm.nih.gov/, GEO). In addition, glucose metabolism-related genes were downloaded from the Comparative Toxicogenomics Database (CTD, https://ctdbase.org/). Te gene expression data as well as the clinical information of hepatocellular carcinoma patients were downloaded from the Cancer Genome Atlas database (https://portal.gdc.cancer.gov/, TCGA). Single-cell RNA expression data from multiregional sampling in hepatocellular carcinoma were downloaded from GSE112271 in the GEO database.

Exploration of the Diferential Expression
Genes. Data on gene expression were obtained from the TCGA and GEO databases, and diferential expression of mRNA was investigated using the Limma package in R. An adjusted P value of 0.05 in TCGA or GEO was defned as a threshold to distinguish between mRNAs, while |log 2 (fold change) | > 1 was defned as a threshold for mRNA diferential expression screening. A gene annotation tool, the Gene Ontology (GO), is widely used to annotate genes with functions, particularly molecular functions (MFs), biological pathways (BPs), and cellular components (CCs). A KEGG enrichment analysis can be efective for analyzing gene function and related genomic functional information at a high level. An analysis of the KEGG pathway enrichment and GO function of underlying mRNAs was conducted using the ClusterProfler package in R to better understand the oncogenic potential of target genes.

Subtype of the Expression Data.
A consistency analysis was performed using the package ConsensusClusterPlus (v1.54.0), and heatmaps for gene expression were generated using genes with a variance greater than 0.1. R is used to implement all the above analysis methods.

Timer Database
Analysis. An analysis of the correlation between immune infltrating cells and tumor immunity was performed with the TIMER module (https://cistrome.shinyapps. io/timer/). Additionally, we used CellMarker to search for immune gene markers (https://biocc.hrbmu.edu.cn/CellMarker/). Te correlations between gene expression levels and markers for immune genes can be visualized using expression plots.

Construction of the Prognostic Prediction Model Based on
Glucose Metabolism Related Genes. Data and clinical information on hepatocellular carcinoma are downloaded from the TCGA dataset repository (https://portal.gdc.com). After extracting the data in TPM format from it and normalizing it to log2(TPM + 1), we retained samples with RNAseq data and clinical information. A KM survival analysis was conducted using the log rank to determine whether there was a statistically signifcant diference between the groups above in terms of survival. For the prediction model's accuracy, a timeROC analysis was performed. Te least absolute shrinkage and selection operator (LASSO) regression algorithm was used for feature selection, and 10-fold cross-validation was used. Te logrank test and univariate Cox regression were used for calculating P-values and hazard ratios (HR) with 95% confdence intervals (CI) for Kaplan-Meier curves. Statistical signifcance was defned as a P < 0.05 for all of the above analysis methods and R packages, which were performed using R software version 4.2.1.

Gene Set Enrichment Analysis (GSEA).
MSigDB was used to retrieve gene sets. GSEA was performed on the gene sets to identify enriched GO terms and KEGG pathways. Te 50 best terms were selected from each subtype based on their signifcance.  Genetics Research 3 2.7. Immune Scores, Immune Checkpoints, and Immunotherapy Responses. In order to explore the immune scores, we used immunedeconv, which is an R package integrating six state-of-the-art algorithms, including TIMER, xCell, MCP-counter, CIBERSORT, EPIC, and quanTIseq. Based on the TCGA dataset, we obtained clinical information about patients with hepatocellular carcinoma. SIGLEC15, TIGIT, CD274, HAVCR2, PDCD1, CTLA4, LAG3, and PDCD1LG2 are genes related to immune checkpoints, and the expression of genes related to immune checkpoints was evaluated in R. In addition, the TIDE algorithm is used to predict possible immunotherapy responses.

Preprocessing and Quality Control of Single Cell RNA-Seq
Data (10× Genomics). Te single-cell RNA-seq dataset was derived from the GEO database's supplementary fle. In addition to fltering out poor-quality cells using the Seurat package, standard data preprocessing pipelines were used to generate the objects. Genes with fewer than three cells detected were fltered, as were genes with fewer than 200 genes detected. A minimum of 10,000 cells were used in the analysis, and cells with fewer than 200 or more than 2,500 genes detected, as well as cells with a high mitochondrial content, were fltered out. By adjusting the scale factor to 10,000, we normalized each cell. Te ScaleData function from Seurat is used to normalize the data after it has been log-transformed. A normalized set of data measures was applied to standard analyses, as described in the Seurat R package. In UMAP, the frst 30 principal components are used for visualization and clustering. A cell clustering procedure was performed using the FindClusters function (resolution � 0.2) in the Seurat R package.  tissues and 19 liver tissues of NAFLD patients were involved in the GSE89632 cohort ( Figure 1(a)). Te diferential expression analysis between NAFLD patients and control groups was performed in R. Te results demonstrated that 925 genes were upregulated and 1158 genes were downregulated in the NAFLD patients compared with normal people (Figures 1(b)-1(c)). Te GO and KEGG enrichment analysis revealed that many pathways were closely correlated with NAFLD ( Figure 1(d)). In addition, in order to explore the role of glucose metabolism in the NAFLD patients, we then obtained a total of 81 glucose metabolism-related genes were downloaded from the CTD database. Te Venn diagram demonstrated that 9 key genes are involved in both the NAFLD and glucose metabolism pathways, including GCK, PPP1R3C, NHLRC1, ENO3, PPP2R5D, PFKFB3, PGM2, SLC25A12, and PFKP ( Figure 1(e)).

Exploration the Role of Key of Immune-Related Genes in the NAFLD Cohort.
Subsequently, based on the expression level of immune-related genes, the expression data of the NAFLD cohort were divided into high-and low-immune score groups (Figures 2(a)-2(b)). Te results revealed that the immune cells were diferentially expressed between the G1 and G2 groups. In addition, the G2 group shows a higher stromal score compared with the G1 group ( Figure 2(c)). While the immune score and estimate score show no difference between the G1 and G2 groups ( Figure 2(d)).

Te Subtype Based on 9 Key Genes Was Closely Associated with the Prognosis of Hepatocellular Carcinoma Patients.
In order to explore the relationship between NAFLD and hepatocellular carcinoma and fgure out the role of the glucose metabolism pathway in hepatocellular carcinoma induced by NAFLD, the patients involved in hepatocellular carcinoma were divided into C1 and C2 groups based on the expression level of 9 key genes. For concordance clustering, delta area curves indicate the change in the area under the cumulative distribution function (CDF) curve for each category number k compared to k − 1 (Figure 3(a)). Te ConsensusClusterPlus consistent clustering heat map shows red for high expressions and blue for low expressions when k � 2 (Figures 3(b)-3(d)). Tere are signifcant diferences between the overall survival rates of the C1 group and the C2 group according to the KM survival curves of diferent

Construction of the Prognostic Prediction Model Based on Glucose Metabolism Related Genes Involved in NAFLD in the Hepatocellular Carcinoma
Cohort. Subsequently, in order to further obtain the genes that are closely associated with the prognosis of hepatocellular carcinoma patients, we then performed the lasso regression analysis. Te lasso regression analysis revealed that three glucose metabolism-related genes involved in NAFLD were applied to the prognosis prediction model (the risk score � (0.1177) × SERPINE1 + (0.0046) × VCAN + (0.0141) × TFPI2) (Figure 4(a)). Depending on the median risk score, patients were categorized as either low-risk or high-risk groups. In addition, the Kaplan-Meier curve showed that the prognostic model was closely related to the prognosis of hepatocellular carcinoma patients. Furthermore, the ROC curve results show that the AUCs are all greater than 0.6 at 1, 3, and 5 years, which indicates that the model is of good predictive value (Figure 4(b)). Te expression level of B cells and CD4 + T cells is positively correlated with the risk score. In addition, the expression levels of endothelial cells, macrophages, and NK cells were negatively correlated with the risk score (Figure 4(c)). Te clinical correlation analysis revealed that the risk score is closely related to the T stage, stage, and grade of hepatocellular carcinoma patients (Figure 4(d)). We then evaluated the expression of SER-PINE1, VCAN, and TFPI2 in the hepatocellular carcinoma cohort of the TCGA cohort. Te results demonstrated that VCAN was downregulated in the hepatocellular carcinoma samples compared with normal samples (Figure 5(c)). While SERPINE1 and TFPI2 were upregulated in hepatocellular carcinoma samples compared with normal samples (Figures 5(a)-5(b)). Te KM survival curve revealed that VCAN is associated with the prognosis of hepatocellular carcinoma patients (P < 0.05) (Figures 5(d)-5(f)). Te timedependent ROC curve showed that the AUC value for TFPI2, SERPINE1, and VCAN was 0.866, 0.791, and 0.637, respectively ( Figure 5(g)). Our next step was to examine differences in immune checkpoint expression between the groups. A signifcant diference was observed between highand low-risk groups in the expression of CD274, CTLA4, HAVCR, LAG3, PDCD1, and TIGIT, which may be the potential targets for immunotherapy ( Figure 6(a)). An assessment of tumor immune escape mechanisms was conducted using the TIDE score. According to the TIDE score results, the low-risk group received immune checkpoint blockade therapy with low efcacy, indicating that they received an immune checkpoint blockade therapy that was not efective (Figure 6(b)). According to the immune cell scores, high-risk and low-risk groups had signifcantly diferent scores for B cells, CD4+ T cells, neutrophils, macrophages, and myeloid dendritic cells (Figure 6(c)).   (Figure 8). Figure 9 shows the distribution of cell proportions in diferent groups. Ten, we evaluated the expression level of SERPINE1, VCAN, and TFPI2 in human hepatocellular carcinoma cells. Te results demonstrated that SERPINE1 is rarely expressed in hepatocellular carcinoma cells. VCAN is specifcally expressed in B cells of hepatocellular carcinoma. In addition, TFPI2 is specifcally expressed in the monocytes of hepatocellular carcinoma.

Exploration of the Potential Function of SERPINE1, VCAN, and TFPI2 in the Hepatocellular Carcinoma Cohort.
Finally, in order to explore the function of 3 key genes (SERPINE1, VCAN, and TFPI2) in hepatocellular carcinoma patients, we then performed the GSVA enrichment analysis. Te results revealed that SERPINE1 is mainly enriched in a structural constituent of ribosome, ribosomal subunit, sensory perception of smell, organic acid catabolic process, and oxidative phosphorylation (Figure 10(a)). For VCAN, the GSEA enrichment analysis demonstrated that VCAN is closely associated with external encapsulating structure organization, collagen-containing extracellular matrix, extracellular matrix structural constituent, plasma membrane signaling receptor complex, skeletal system development, T cell receptor complex, and immune response regulating signaling pathway (Figure 10(c)). In terms of TFPI2, the results of GSEA enrichment analysis revealed that many pathways are involved in TFPI2, including immunoglobulin complex, structural constituent of ribosome, external encapsulating structure organization, antigen binding, large ribosomal subunit, T cell receptor complex, complement activation, humoral immune response, and ribosomal subunit (Figure 10(b)).

Discussion
Approximately 25% of the world's adult population sufers from NAFLD, which is the most common chronic liver disease [20]. Te prevalence of NAFLD has been found to increase with age and may even lead to cirrhosis or hepatocellular carcinoma in some studies [21]. Individuals maintain health by maintaining glucose homeostasis in order to meet the energy requirements of vital organs [22]. In addition to glycogenesis, glycogenolysis, glycolysis, and gluconeogenesis, the liver plays a vital role in controlling glucose homeostasis [23]. However, few studies focused on the role of glucose metabolism in hepatocellular carcinoma induced by NAFLD. In this work, we frst explore the genes that are closely related to NAFLD and glucose metabolism. Te results revealed that a total of 9 genes were closely correlated with NAFLD and glucose metabolism, including GCK, PPP1R3C, NHLRC1, ENO3, PPP2R5D, PFKFB3, PGM2, SLC25A12, and PFKP.
Te underlying problem with NAFLD is insulin resistance, a key factor in metabolic syndrome, which is also linked to type 2 diabetes and hypertriglyceridemia [24]. Patients with obesity may be at risk for NAFLD due to abnormal lipid and glucose metabolism [25]. Currently, most basic research appears to focus on insulin resistance as well as the failure of the liver to process glucose loads from a pathophysiological perspective [26]. A former study has discovered that the JKW modulates insulin signaling and glucose metabolism to alleviate NAFLD [27]. In addition, this study identifes scientifc evidence supporting the potential efcacy of JKW for the prevention and treatment of NAFLD [28].
In order to further explore the role of glucose metabolism in hepatocellular carcinoma induced by NAFLD, we then constructed a prognostic prediction model based on 9 key genes. We fnally discovered that SERPINE1, VCAN, and TFPI2 play an important role in hepatocellular carcinoma.
Recent studies have discovered that SERPINE1, VCAN, and TFPI2 are associated with many human tumors. Te former study revealed that sh-TARBP2 cells with miR-145 overexpression were rescued from SERPINE1 inhibition and functional hepatoma cells were restored, which could be an important new intervention target in aggressive hepatocellular carcinoma. Many studies have found that VCAN may be a risk factor in gastric cancer, breast cancer, and colorectal cancer [29]. In addition, VCAN is a promising biomarker for the prognostic prediction of gastric cancer patients, breast cancer patients, and colorectal cancer patients [30]. Zhao et al.have discovered that TFPI2 inhibits breast cancer progression by inhibiting the TWIST-integrin pathways, presenting a new therapeutic target [31]. As a biomarker used in the colorectal cancer cohort, VCAN may assist in identifying patients at high risk for postoperative complications during stages II and III [32]. According to another study, TFPI2 gene methylation is an independent predictor of poor prognosis in nonsmall cell lung cancer patients [33]. In addition, our further research has revealed that 3 key genes are associated with immune checkpoint blockade therapy and immunotherapy of hepatocellular carcinoma, which may suggest that immunotherapy could be an efective way to treat hepatocellular carcinoma induced by NAFLD [34].
Previous studies have focused on the screening of diferentially expressed biomarkers between tumor and nontumor tissues. It is possible to lose important genes when analyzing bulk transcriptome data from cell populations. Single cell-RNA sequencing analysis is therefore more useful in elucidating the underlying mechanisms of NAFLD and hepatocellular carcinoma. In this work, in order to explore the expression level of key genes in the diferent cells of hepatocellular carcinoma, we then performed single cell-RNA sequencing of hepatocellular carcinoma samples. Te results demonstrated that VCAN is specifcally expressed in B cells and is specifcally expressed in monocytes. Liu et al. discovered that    hepatocellular carcinoma is more responsive to immunotherapy by targeting monocyte-intrinsic enhancer reprogramming. Furthermore, an assessment of the lymphocyte-to-monocyte ratio predicts prognosis in hepatocellular carcinoma patients undergoing radiofrequency ablation and transcatheter arterial chemoembolization. Additionally, we also evaluated the potential function of SERPINE1, VCAN, and TFPI2 in a hepatocellular carcinoma cohort. Te results revealed that the humoral immune response is closely associated with TFPI2. According to growing evidence, the peripheral immune response to hepatocellular carcinoma afects how the disease develops, how it responds to therapy, and how long patients live. Furthermore, an immune-suppressive response was also found among patients with NAFLD-hepatocellular carcinoma, as determined by functional and metabolomic evidence [35]. An additional study demonstrated that AKR1B10 and SPP1 were closely related to NAFLD and NAFLD-hepatocellular carcinoma immune cell infltration and immunosuppressive cytokine expression [36]. SERPINE1 is closely associated with immune checkpoint molecule expression in the GC cohort as a hypoxia-related gene [37]. In addition, there is a good correlation between VCAN and immune checkpoint blockade response [38].
In recent years, many studies have focused on the role of bioinformatics analysis methods in human health [39]. Te bioinformatics analysis could lead to higher-quality research and provide new directions for researchers. However, there are also some limitations to bioinformatics analysis. First, without experimental verifcation, the results need to be verifed by experiments [40]. In addition, high heterogeneity often leads to large bioinformatics analysis errors, so unifying the methods is essential to reducing errors [41]. Terefore, corresponding experimental validations are needed to be performed to further confrm the accuracy of our results.
Taken together, our study discovered that the glucose metabolism pathway may play an important role in the process of NAFLD-hepatocellular carcinoma. In addition, three glucose metabolism-related genes (SERPINE1, VCAN, and TFPI2) may be potential targets for the immunotherapy of patients with NAFLD-hepatocellular carcinoma.

Data Availability
Te data used to support the fndings of this study were supplied by Peng Yao under license and so cannot be made freely available. Requests for access to these data should be made to Peng Yao, email: yp113065@outlook.com.

Conflicts of Interest
Te authors declare that they have no conficts of interest.