Profiles of Acetylation Regulation Genes Contribute to Malignant Progression and Have a Clinical Prognostic Impact on Liver Cancer

Background Several studies have demonstrated that acetylation was involved in the process of liver cancer. This study aimed to establish an effective predictive prognostic model using acetylation regulation genes in liver cancer. Methods Two datasets were downloaded from the Cancer Genome Atlas (TCGA) database and International Cancer Genome Consortium (ICGC) database. Differentially expressed acetylation regulation genes were identified in the TCGA-LIHC dataset, and then, Gene Ontology (GO) functional annotation analysis was used to investigate the molecular mechanism. After grouping the patients into clusters based on consensus clustering, we explored the correlation between clusters and clinical characteristics. A risk model was constructed by the least absolute shrinkage and selection operator (LASSO) regression analysis to calculate the risk score. Patients were divided into high-risk and low-risk groups according to the risk score using the acetylation regulation genes. Data downloaded from LIRI-JP were used for external validation. Univariate and multivariate Cox regressions were performed to identify independent risk factors. A prognostic nomogram was constructed according to the TCGA-LIHC dataset. The effect of HDAC11 expression on the proliferation and migration of liver cancer was detected by the CCK-8 method and cell scratch test, respectively. Results Eleven of 29 acetylation regulation genes were identified as upregulated differentially expressed genes. Go enrichment analysis showed that they were involved in “protein and histone deacylation and deacetylation.” Patients were categorized into two clusters according to the expression of 29 acetylation regulation genes. Compared with cluster 2, cluster 1 correlated with shorter overall survival (OS) and higher expression. Stage, T stage, grade, gender, age, and follow-up state were significantly different between two clusters. Pathways involved in DNA repair were significantly enriched in cluster 1. The risk score was calculated by HDAC1, HDAC2, HDAC4, HDAC11, HAT1, and SIRT6. Patients in the high-risk group had a worse prognosis in both datasets. Risk score was not only an independent prognostic marker but could also predict the clinicopathological features of liver cancer. A nomogram containing risk score, T stage, and M stage was built to predict overall survival. After transfection with HDAC11 overexpression plasmid, the proliferation ability of HepG2 cells increased, while the migration ability had no change. Conclusions Our findings suggested that acetylation regulation genes contribute to malignant progression and have a clinical prognostic impact on liver cancer.


Introduction
Primary liver cancer comprises hepatocellular carcinoma, intrahepatic cholangiocarcinoma, and other rare tumors [1]. The prognosis for liver cancer is poor [2]. Neither cur-rent ablation therapies nor chemotherapy is appreciably effective in improving outcomes of this devastating disease [2]. Therefore, it is a priority for us to identify high-risk patients and ascertain novel therapeutic targets as well as effective treatment options for this disease.
Since being proposed, considerable evidence supports the pivotal role of cancer stem-like cells (CSCs) in pathological self-renewal, drug resistance, and cellular heterogeneity of cancer. Meanwhile, the epigenetic aberration in normal developmental processes is a key driver of CSC-like properties [3,4]. Epigenetic modifications, including DNA methylation, histone modification, nucleosome remodeling, and RNA-mediated targeting, regulate many biological processes that are fundamental to the genesis of cancer [5]. Acetylation is one of the most important protein modifications, which contributes to several chromatin-dependent processes, such as DNA replication, damage and repair, transcriptional activation, cell cycle, and gene regulation [3], and in turn modulates cellular activities like proliferation, differentiation, and migration [6]. The dynamic process is regulated by the balance between histone acetyltransferases (HATs) and deacetylases (HDACs) [6]. Previous studies have revealed that several members in HATs and HDACs are involved in liver cancer progression [7][8][9], growth [10], proliferation, migration [7,11], and chemoresistance [12]. It has been observed that in different databases, several HDACs increase with a strong expression variant in liver cancer compared to adjacent normal tissues [13]. Also, some HDACs inhibitor can activate apoptosis in liver cancer [14] and has been approved as a potential treatment for early-stage liver cancer in animal trials [15] and cell experiments [13]. However, most previous studies focus on the effect of a single gene, while the precise role of each HATs and HDACs in regulating tumorigenesis remains unclear. Because of the overlap of the target genes of acetylation regulation genes and the existence of joint regulation by the same factor [16], the profile of cellular acetylation status and gene expression level is of great importance to exploring new treatments and assessing the potential benefit of the existing inhibitors.
Therefore, through this novel study, we attempt to establish an effective predictive prognostic model using acetylation regulation genes in liver cancer. We detected 11 different expression genes (DEGs) between liver tumor tissue and adjacent normal tissues by systematically analyzing the expression of 29 acetylation regulation genes in TCGA-LIHC cohort. Gene Ontology (GO) functional annotation analysis was performed using these 11 DEGs to explore the underlying molecular mechanism. Next, we divided patients into two clusters according to 29 acetylation regulation genes' expression and confirmed the relation between clustering and malignant progression. Gene set enrichment analysis (GSEA) was used to find various Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that might explain the mechanism of different prognoses enriched in different clusters. Besides, we built a risk model including HDAC1, HDAC2, HDAC4, HDAC11, HAT1, and SIRT6 through the least absolute shrinkage and selection operator (LASSO) regression analysis. Patients included in the highrisk group had a worse prognosis. The prediction value was validated in TCGA-LIHC and LIRI-JP datasets using Kaplan-Meier curve and receiver operating curve (ROC). Association between risk score and clinical characteristics was also investigated. Univariate and multivariate Cox regressions testified risk score as an independent prognostic risk factor. The nomogram consisting of risk score, T, and M, was built to predict overall survival. Cell experiment confirmed overexpression of HDAC11 accelerated cell proliferation. Therefore, we put forward that acetylation regulation genes may contribute to malignant progression, and these six genes have a clinical prognostic impact on liver cancer.
We present the following article in accordance with the MDAR reporting checklist.

Materials and Methods
2.1. Data Acquisition. In this study, the RNA-seq transcriptome data of 374 tumor tissues and 50 adjacent normal tissues and the clinical information of 348 patients in the TCGA-LIHC dataset were downloaded from the Cancer Genome Atlas (TCGA) data portal (https://portal.gdc .cancer.gov/). LIRI-JP dataset was obtained from International Cancer Genome Consortium (ICGC) data portal (https://dcc.icgc.org/), containing the RNA-seq transcriptome data of 202 normal tissues and 243 liver tumor tissues and the clinical information of 232 patients. RNAsequencing data were normalized as fragments per kilobase million (FPKM). The extracted clinical information, such as gender, age, grade, stage, T stage, M stage, N stage, follow-up time (futime), and follow-up state (fustate), was used in the following analysis.
To investigate the DEGs of acetylation regulation genes between liver cancer and adjacent normal tissue, we used the "Limma" package to analyze the expression of 29 genes in samples of TCGA-LIHC. After log 2 -transformed, these genes expression was compared between normal tissue and tumor tissue by using the Wilcoxon test. After calculating the mean value and log 2 FC (fold change) (log 2 FC = log 2 ðtumor mean valueÞ/ðnormal mean valueÞ) of each target gene, the genes with p < 0:05 and jlog 2 FCj ≥ 1 were considered as DEGs and the result was visualized by a vioplot.

Bioinformatic Analysis
2.3.1. GO Functional Annotation Analysis. Functional analyses of GO pathway were conducted to determine the major pathways regulated by these DEGs, using several packages such as "clusterProfiler," "enrichplot," "ggplot2," and "GOplot." The p value cutoff and q value cutoff were equal to 0.05. The top results were presented by barplot and GO circle plot. Identification of Consensus Clusters. The correlation analysis was conducted by "corrplot" package to identify the correlation among 29 target genes' expression in liver cancer. And then, we used the "ConsensusClusterPlus" package to group liver cancer patients into different clusters and used principal component analysis (PCA) to verify the grouping results. Next, we drew a Kaplan-Meier survival curve of patients in different clusters using the "survival" package, and the correlation between clinical characteristics and clusters was conducted to determine the relationship between clustering and malignant progression. GSEA was used to find out various KEGG pathways enriched in different clusters.

Identification of Risk Model and External Validation.
We performed a univariate Cox regression analysis of the target genes' expression matrix to find genes related to a worse prognosis. Then, we used LASSO regression to generate a risk model to delete redundant genes predicting clinical prognosis and used ROC curve to determine the cutoff value. Patients were divided into high-risk and low-risk groups accordingly. Meanwhile, according to the risk model formula originating from the TCGA-LIHC dataset, the risk scores of patients in the LIRI-JP dataset were calculated. Besides, patients were rated into high-risk and low-risk groups using the cutoff determined from their own ROC. The p value of the Kaplan-Meier survival curve and AUC of ROC were used to evaluate the prediction value in both datasets. Finally, univariate Cox regression analysis and mul-tivariate Cox stepwise regression were used to identify independent prognostic risk factors in the TCGA-LIHC dataset.

Predictive Nomogram Construction and Evaluation.
Variables identified in multivariate Cox stepwise regression were used to construct a predictive nomogram via "rms" package in the TCGA-LIHC dataset. The fit model was built by the function of "cph." Calibration curve and Harrell's concordance index (C-index) were used to evaluate nomogram discrimination. Both of them were calculated using a bootstrap method with 1000 resamples. Sixty-five patients per group compared the concordance between nomogram predicted OS and observed OS at different time points. Harrell's C-index was calculated by internal sampling validation. The mean value of C-index evaluated the nomogram discrimination.   Figure 1: Flowchart presenting the process in this study.   RIPA lysis buffer (Fdbio, China) with protease inhibitors was used to obtain total protein extraction. Equal amounts of protein were separated by 12% SDS-PAGE followed by transfer to PVDF membranes (Millipore, USA). Primary antibodies used included mouse anti-HDAC11 (Santa Cruz, USA) and anti-GAPDH (CST, USA). Signal was developed using an ECL Kit (Fdbio, China) with detection on a ChemiScope system (Clinx Science, China).

CCK-8 Assay.
Cell survival rates were estimated by the CCK-8 assay (APExBIO, USA). After transfection, approximately 10 3 cells were seeded in 96-well plates with 100 μl medium for each well. At 0, 24, 48, 72, and 96 h, the original medium was removed, and a medium supplemented with 10% CCK-8 solution was added and incubated for 2 h. The absorbance at 450 nm was measured. The proliferation curve was plotted according to log (ODt1/OD0h, 2).

Cell Scratch Test.
A single scratch was made using a sterile 10-μl pipette tip, while the cells seeded in a 6-well plate reached a confluent state. Then, cells were washed with PBS 2 times, and the medium was replaced by FBS-free DMEM and incubated for another 24 h. Images of the scratches were captured at 0 and 24 h with Olympus IX73. The width of the scratch was analyzed using the Image Pro Plus software.
2.5. Statistical Analysis. The statistical and bioinformatic analysis was conducted by using R software (version 4.0.3), and related packages ("limma," "ggplot2," "survival," "survminer," "forestplot," and "glmnet") were needed to install and load during analysis. GSEA was conducted by using GSEA software (version 4.1.0). Wilcoxon tests were used to compare the expression level between tumor tissues and adjacent normal tissues, and Fisher tests were used to   SIRT3   KAT2B  HDAC6  SIRT5  HDAC9  HDAC7  KAT7  KAT5  KAT6A  SIRT1  KAT6B  EP300  CREBBP  HDAC8  HDAC4  HDAC1  HAT1  HDAC2  KAT8  HDAC3  HDAC5  SIRT7  KAT2A  NAA60  SIRT6  SIRT4  SIRT2  HDAC10  HDAC11   SIRT3  KAT2B  HDAC6  SIRT5 HDAC9 HDAC7 After constructing the risk model, we obtained the optimal cutoff value depending on ROC curve and divided the patients into high-risk and low-risk groups. The significant prognostic risk factor was identified by univariate Cox regression analysis and multivariate Cox stepwise regression analysis (p < 0:05). All cell experimental results were inde-pendently repeated at least three times. T-test or one-way ANOVA test was used to analyze proliferation and migration ability. p < 0:05 was statistically significant.

The Distinction of Acetylation Regulation Genes in Liver
Cancer and Normal Tissue. This study is performed    (Figure 2(a)). The heat map and vioplot showed that there were 25 genes (Figures 2(a) and 2(b)), the expression levels of which were significantly different between tumor tissue and normal tissue. Among them, the expression of HDAC6 and KAT2B decreased in tumor tissue, while the rest increased. HDAC1, HDAC4, HDAC5, HDAC7, HDAC10, HDAC11, KAT2A, KAT7, SIRT4, SIRT6, and SIRT7 were upregulated and identified as significant DEGs between normal and tumor tissues (Table 1). Among them, HDAC7, HDAC4, SIRT7, KAT2A, and HDAC11 were the top five most upregulated genes, indicating that the expression of deacetylation-related genes played a dominant role in liver cancer compared with acetylation-related genes.
GO analysis was conducted to analyze the function of these 11 genes in tumorigenesis and progression of liver cancer. The bubble plot (Figure 2(c)) showed the result listed in descending order by gene ratio in each pathway. In terms of biological process (BP), pathways such as "histone modification," "covalent chromatin modification," "protein deacetylation," and "protein deacylation" increased. Besides, "histone deacetylase complex" and "transcription regulator complex" increased as cellular component (CC), and the molecular function (MF) of "hydrolase activity," "histone deacetylase activity," and "protein deacetylase activity" was upregulated in tumor tissue (Figures 2(c) and 2(d)). The top 5 results in BP, CC, and MF in ascending order by p value are listed in Supplementary Table S1.

Consensus Clustering of Acetylation Regulation Genes
Identified Two Clusters of Liver Cancer. We supposed the expression of acetylation regulation genes might correlate to different clinical characteristics in liver cancer. Firstly, we investigated the correlation analysis of 29 acetylation regulation genes in liver cancer. Figure 3(a) shows that a large proportion of target genes were weakly to moderately correlated, which meant those genes could not be considered as a single independent variable during subsequent analysis. Next, we used "ConsensusClusterPlus" package to group patients into different clusters. The fit k value needed to meet three criteria: (1) The ideal k value should be chosen as the cumulative distribution function (CDF) reaching a maximum approximation, (2) the number of patients in each cluster should not be too small, and (3) the gene expression should be a strong correlation in each cluster, while the correlation between clusters should be as weak as possible. Based on the results of Figures 3(b) and 3(c), k = 4 seems to have a minor cumulative distribution function (CDF) increment, but taking the number of patients in each cluster and the correlation among different clusters into consideration (Figures 3(d)-3(f)), k = 2 was more appropriate in our study. To confirm whether our classification was correct, we performed PCA using the whole RNA-seq data and found that patients in respective clusters could gather together (Figure 3(g)). This result indicated that the classification of liver cancer patients into two clusters by target genes was correct.    Figure 4(a) shows that stage (stage I-IV), T stage (T1-T4), grade (G1-G4), gender (male/female), age (≤65, >65), and follow-up state (alive/dead) were significantly different in two clusters. Besides, we found that most of the target genes had higher expression in cluster 1. The OS of patients in cluster 2 was much longer than that in cluster 1 (Figure 4(b)), and the difference of 3-year survival rates was statistically significant (cluster 1 vs cluster 2 =40.5% vs 67.6%). In other words, the expression profile of acetylation regulation genes was expected to predict prognosis in liver cancer.

Correlation between Clusters and Clinical
To investigate the potential mechanisms caused this survival difference, we used GSEA to find out different KEGG

Prognostic Value of Risk Model Consisting of Acetylation
Regulation Genes. We performed a univariate Cox regression analysis to investigate better the prognostic role of acetylation regulation genes in liver cancer. The results indicated that HDAC1, HDAC2, HDAC4, HDAC5, HDAC11, HAT1, SIRT6, SIRT7, and KAT7 were prognostic risk factors, with hazard ratios (HR) >1 and lower 95% confidence intervals (CI) >1 ( Figure 5(a), Table 2). Thirteen genes (HDAC1, HDAC2, HDAC3, HDAC4, HDAC5, HDAC7, HDAC11, HAT1, SIRT6, SIRT7, KAT5, KAT7, and EP300) with p < 0:1 entered LASSO Cox regression algorithm to delete redundant genes and constructed a risk model. Six genes were selected based on the minimum criteria, and the coefficients obtained from the LASSO algorithm were used to calculate the risk score for every patient included in our study (Figures 5(b) and 5(c)). "SurvivalROC" package was used to evaluate the performance of the risk model and to determine the optional cutoff when grouping the patients into high-risk and low-risk groups. According to 3-year survival rates, the AUC was equal to 0.683 ( Figure 5(d)), and the cutoff was equal to 2.301581.
To confirm the grouping result, we drew Kaplan-Meier survival curves. Figure 5(e) indicates that the patients divided into the high-risk group had a shorter OS (p < 0:05), especially when follow-up was less than 3 years. Besides, patients with higher LASSO risk scores tended to have shorter survival (Figures 5(f) and 5(g)). We also found that these 6 genes included in the LASSO algorithm had higher expression in the high-risk group (Figure 5(h)).

External
Validation of Risk Model. The mRNA expression matrix and clinical information of the six genes were extracted in the LIRI-JP dataset. The RNA-seq transcriptome data originated from 202 normal tissues and 243 liver tumor tissues in this dataset. After matching with clinical characteristics, a total of 232 patients were included in this study. The risk score of each patient was calculated using the formula mentioned above. According to the 3year survival rates, the AUC was 0.699 (Figure 6(a)) with a threshold of 6.168746.
Furthermore, the AUC of 5-year survival rates was 0.741 ( Figure 6(b)). One hundred and thirty-three patients were rated as high risk, while 99 patients were low risk. Kaplan-Meier survival curves showed that the OS in the high-risk group was much shorter than in the low-risk group (Figure 6(c)). These results confirmed that the risk score calculated by six genes effectively predicted liver cancer prognosis.

Associations between Risk and Clinicopathological Features in Liver Cancer and Independent Risk Factor
Identification. The correlation between the risk model and clinical features was analyzed. We found that stage (stage I-IV), T stage (T1-T4), N stage (N0/N1/NX), grade (G1-G4), gender (male/female), age (≤65, >65), and follow-up state (alive/dead) were significantly differential distribution in high-risk and low-risk groups (Figure 7(a)).
Next, we used age, gender, grade, stage, T stage, M stage, N stage, and risk score in a univariate Cox regression analysis (Figure 7(b)). Three variables stage, T stage, and risk score with p < 0:05 were identified as risk factors. Furthermore, Cox regression stepwise analysis was used to build a fit model to predict prognosis with independent risk factors. Eventually, risk score, T, and M were included as independent prognostic risk factors in our study (p < 0:05) (Figure 7(c)).

Predictive Nomogram and Evaluation.
The TCGA-LIHC dataset was used to construct the predictive nomogram. Independent risk factors identified by stepwise COX regression analysis, such as risk score, T, and M, were contained in predicting 1-year, 2-year, and 3-year OS (Figure 8(a)). Total points calculated by respective points based on risk score, T, and M were used to predict corresponding 1-year, 2-year, and 3-year OS. Calibration curves showed the ratio of nomogram predicted OS and observed OS always fluctuated around 1 (Figure 8(b)), indicating that the nomogram performed well. C-index was calculated by the internal bootstrap method with 1,000 resamples. The mean of C-index was 0.672.
3.8. HDAC11 Upregulated Cell Proliferation in HepG2. CCK-8 proliferation curve indicated that the proliferation rate of HepG2 cells was significantly accelerated after transfection 13 Disease Markers with HDAC11 plasmid compared with the control group (Figures 9(a) and 9(b)). After 48 h, the cell densities were much higher in the over-HDAC11 group compared with the over-con group (Figure 9(b)). The cell scratch test result showed no difference in the travel distance between the two groups (Figures 9(c) and 9(d)). Therefore, a high level of HDAC11 could upregulate cell proliferation ability.

Discussion
As the second leading cause of cancer-related death and a major public health challenge worldwide, the burden of liver cancer is increasing globally [1]. Liver cancer patients' outcomes are improved due to the optimization of individual treatment strategies and the development more complex therapeutic modalities [2].
Acetylation, considered one of the most critical protein modifications, plays an essential role in tumorigenesis and tumor growth, proliferation, migration, and chemoresistance in various cancer. Previous studies have demonstrated that acetylation and deacetylation influence the plasticity of chromatin structure by changing the electrical property of acetylated sites of histone and improving the stability of many nonhistone proteins by covering ubiquitination sites [16]. Protein acetylation modification regulated by HATs and HDACs exhibits different biological effects, leading to promotion [25] or suppression [26] in liver cancer, depending on the effect of protein. Even the same acetylation regulation gene can cause the opposite effect in liver cancer through regulating different target proteins [25,26], let alone the genes in the same family [27,28]. Considering these complex situations, integration of expression profiles of known acetylation regulation genes, which influence malignant progression and clinical prognostic in liver cancer, has great value and necessity. This study aimed to clarify the effect of acetylation regulation genes on liver cancer and attempted to construct a predictive prognostic model.
Twenty-five of 29 target genes had significant differences between tumor and normal tissues, and 11 were identified as DEGs. Nine of 11 DEGs were HDACs, including HDAC1, HDAC4, HDAC5, HDAC7, HDAC10, HDAC11, SIRT4, SIRT6, and SIRT7, and only KAT2A, and KAT7 in HATs     were DEGs. GO analysis showed the activity of "protein deacetylation," "histone H3 deacetylation," "protein deacylation," "macromolecule deacylation," and "histone deacetylation" upregulated in tumor tissue. Therefore, we hypothesized that deacetylation of important proteins played an important role in developing liver cancer. To explore the relationship between acetylation regulation genes and prognosis in liver cancer, we identified two clusters of liver cancer by applying consensus clustering to 29 genes. We found that cluster 1 with higher expression of most target genes was correlated with a poorer prognosis. Stage (stage I-IV), T stage (T1-T4), grade (G1-G4), gender (male/female), age (≤65, >65), and follow-up state (alive/ dead) were significantly different in two clusters. The KEGG pathways involved in "spliceosome," "RNA degradation," "pyrimidine metabolism," "base excision repair," and "nucleotide excision repair" were significantly enriched in cluster 1. It was genomic repair and stability that we supposed endowed the tumor cells with better survival and subsequently caused a worse prognosis.
Also, we constructed a risk model by LASSO regression analysis. The risk score was calculated using the expression value of 6 acetylation regulation genes. All patients were rated into the high-risk and the low-risk groups accordingly. The ROC AUC of the risk score to predict clinical prognosis was 0.683. Generally, a model with AUROC =68.3% was not an ideal perfect one. But considering the complexity of tumorigenesis and its multiple factors contributed to prognosis, we supposed the risk prediction model using gene expression level has specific values. Furthermore, we explored other significant risk factors in the following analysis. We found that stage (stage I-IV), T stage (T1-T4), N stage (N0/N1/NX), grade (G1-G4), gender (male/female), age (≤65, >65), and follow-up state (alive/dead) were significantly differential distribution in high-risk and low-risk groups. Information extracted from the LIRI-JP dataset validated the predicted value of this risk model. Stepwise Cox regression showed that risk score, T, and M were independent risk factors in predicting prognosis in liver cancer. Finally, we construct a