Identifying a Six-Gene Signature Predicting Response to TACE in Hepatocellular Carcinoma by Bioinformatics Analysis

Background and Aim. With regard to patients with intermediate-stage, irresectable hepatocellular carcinoma (HCC), transcatheter arterial chemoembolization (TACE) is the mainstay of treatment. There is an urgent clinical requirement to identify reliable biomarkers to predict the response of HCC patients to TACE treatment. We aimed to identify a gene signature for predicting TACE response in HCC patients based on bioinformatics analysis. Methods. We downloaded the gene expression profile GSE104580 based on 147 tumor samples from 81 responders to TACE and 66 nonresponders from the Gene Expression Omnibus (GEO) database. Then, we randomly divided the 147 tumor samples into a training set (
 
 n
 =
 89
 
 ) and a validation set (
 
 n
 =
 58
 
 ) and screened differentially expressed genes (DEGs) in the training set. Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to annotate functions of the DEGs. The DEGs were mapped into the STRING website for constructing protein-protein interaction (PPI). The predictive value of the candidate genes by receiver-operating characteristic (ROC) curves was further verified in the validation set. Results. We totally found 158 DEGs (92 upregulated genes and 66 downregulated genes) in the training set. The GO enrichment analysis revealed that DEGs were significantly enriched in metabolic and catabolic processes, such as drug metabolic process, fatty acid metabolic process, and small molecule catabolic process. The KEGG pathway analysis revealed that the DEGs were mainly concentrated in drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, and chemical carcinogenesis. We identified 6 candidate genes (CXCL8, AFP, CYP1A1, MMP9, CYP3A4, and SERPINC1) based on the PPI network of the DEGs, which had high predictive value in HCC response to TACE with an area under the curve (AUC) value of 0.875 and 0.897 for the training set and validation set, respectively. Conclusion. We identified a six-gene signature which might be biomarkers for predicting HCC response to TACE by a comprehensive bioinformatics analysis. However, the actual functions of these genes required verification.


Introduction
Hepatocellular carcinoma (HCC) represents a malignant tumor predominantly arising in the setting of cirrhosis and will be responsible for an estimated one million death on a global scale in 2030 [1]. As per the Global Cancer Statistics in 2018, the mortality rate of HCC is only second to lung cancer in males by gender stratification [2]. Most of HCC cases at their initial diagnosis have reached the advanced stage, which results in an unsatisfactory overall survival [3]. Transcatheter arterial chemoembolization (TACE) induces ischemia-hypoxia and local chemotherapy-induced cytotoxicity which destroys cancerous cells, which is the first-line treatment for unresectable or intermediate stage HCC [4]. However, as a palliative therapy, TACE also has several limitations. Firstly, the response rate to TACE may vary widely among HCC patients. Although TACE did offer a survival advantage over supportive treatment alone in some patients, there were still up to 60% of HCC patients who did not benefit from it in spite of multiple treatments.
Secondly, TACE has multiple side effects (AEs). A recent literature reported a total of 21,461 AEs in 15,351 patients treated with TACE [5]. Abnormal liver enzyme was the most common AE, with an incidence of 52%, followed by postembolism syndrome such as intestinal obstruction, abdominal pain, and fever, with an incidence of 47.7% [5]. In addition, fever, nausea, and fatigue were also the typical postembolization syndrome. Some serious complications including liver failure, gastroduodenal ulceration, and death may also occur [4][5][6]. Thirdly, TACE treatment is costly. Therefore, early identifying which patients benefit most from TACE may be of great clinical relevance and is the key goal of modern personalized medicine.
The response to TACE in HCC patients is conventionally assessed by radiologically enhanced criteria which focus on tumor size [7]. However, the use of radiologic criteria to measure response has some limitations, such as interobserver subjectivity, high interobserver variability, increased patient's radiation exposure, and misjudgment resulting from dysplastic or regenerative nodules, or perfusion abnormalities [8]. Recently, more and more researchers investigate novel biomarkers to predict patient response to TACE in HCC. For example, Mao et al. found that ASPP2 expression in cancer tissue following TACE is an independent risk factor for HCC recurrence as well as overall survival [9]. Ma et al. demonstrated serum STIP1 is a promising biomarker for outcome evaluation, therapeutic response assessment, and microvascular invasion prediction in HCC following TACE [10]. Given this, screening reliable and novel biomarkers for predicting HCC response to TACE treatment is an urgent clinical need. Due to the complicated molecular and cellular heterogeneity in HCC, we speculated that the difference in tumor gene expression may be related to the TACE response. With the development of biomedicine, high-throughput gene chip technology has begun to be popularized in the exploration of mechanism and identifying potential biomarkers.
Therefore, in this study, we downloaded and initially analyzed the GSE104580 dataset from the Gene Expression Omnibus (GEO) database to explore the DEGs associated with TACE response between TACE responders and nonresponders. We then annotated the functions of these DEGs by using the GO term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Finally, we performed Protein-Protein Interaction (PPI) network analysis to discover candidate genes. Based on the bioinformatics analysis, novel biomarkers for predicting HCC response may be provided.

Microarray Data and Preprocessing.
We downloaded the gene expression dataset of GSE104580 which contains microarray data from 147 tumor samples from 81 TACE responders and 66 TACE nonresponders from the GEO database and then randomly divided the 147 samples into a training set (n = 89) and a validation set (n = 58). The training set was then used to screen the DEGs associated with TACE response.

Differential Expression Analysis.
We used the limma package in R software to analyze the DEGs (|logFC | ≥1:2 and the adj:p value < 0:05) associated with TACE response by comparing HCC tissues between TACE responders and nonresponders in the training set.

GO and KEGG Pathway Analysis.
To explore the significantly enriched functions of the DEGs and better understand the important pathways of the DEGs participation, we used clusterProfile packages for GO and KEGG analysis. The Fisher's exact test was applied to evaluate the significance of GO and KEGG enrichments, and a q value < 0.05 indicated a statistically significant difference.
2.4. Construction of the PPI Network. The Search Tool for the Retrieval of Interacting Genes (STRING) database was applied to construct the PPI network of DEGs, and a confidence > 0:4 was set as statistically significant. Then, the Cytoscape software (version 3.6.3) was employed to visualize the obtained PPI network. Moreover, to find out the most important nodes in the PPI network, the CentiScaPe 2.2 plug-in was employed to calculate the degree, closeness, and betweenness of the network. We operationally defined genes with degree value in the top 10% as hub genes, while the betweenness value in the top 10% as bottleneck genes. The Venn diagram was then applied to screen the genes in the intersection of these two gene sets as candidate genes for predicting HCC response to TACE treatment.

Prediction and Evaluation of Predictive Biomarkers for
HCC Response to TACE Treatment. To explore predictive biomarkers for HCC response to TACE treatment, we used the above hub genes as candidates to find their predictive value based on ROC curve analysis. In brief, 89 samples (TACE responders = 52, TACE nonresponders = 37) were randomly distributed as the training set, which was used to build a ROC curve. The remaining set (n = 58) was used as the validation set. The area under the curve (AUC) was computed to estimate the predictive accuracy of the classifier.

Identification of DEGs between Responders to TACE and
Nonresponders for HCC Patients. After differential expression analysis, we identified a total of 158 DEGs according to the thresholds of |log FC | ≥1:2 and adj:p value < 0:05, including 92 upregulated genes and 66 downregulated genes ( Table 1). The volcano plot is presented in Figure 1(a). In addition, Figure 1(b) shows the heat map of expression diversity of DEGs, from which we can see that tumor samples from the TACE responders were distinguished clearly from that from the TACE nonresponders by the identified DEGs. Figure 2 shows the results of Go enrichment, from which we found that the DEGs were the strongest enrichment in metabolic and catabolic processes, such as drug metabolic process, fatty acid metabolic process, and small molecule catabolic process ( Figure 2). In addition, the KEGG pathway analysis showed the DEGs mainly concentrated in drug metabolism-2

Integrative Bioinformatics Analysis of DEGs.
Journal of Nanomaterials cytochrome P450, metabolism of xenobiotics by cytochrome P450, and chemical carcinogenesis ( Figure 3).

The Value of Candidate Genes in Predicting HCC
Response to TACE Treatment. At last, we examined the predictive value of CXCL8, AFP, CYP1A1, MMP9, CYP3A4, and SERPINC1 in predicting HCC response to TACE treatment in the training set and confirmed in the validation set.
The ROC curves revealed that the 6-gene signature had good performance in predicting HCC response to TACE treatment with AUC of 0.875 for the training set ( Figure 6(a)) and 0.897 for the validation set ( Figure 6(b)). These results suggested that the 6 differentially expressed hub genes can be used as potential biomarkers for predicting HCC response to TACE.

Discussion
Response to TACE treatment varies greatly among patients with HCC, and over 40% of patients do not respond to it.
At present, we cannot accurately predict the efficacy of TACE because there is no effective predictor of markers. Therefore, it is urgent to determine the key genes that affect the effect of TACE treatment and actively invest in the research of multigene signatures for predicting the response of TACE treatment. In the current study, we screened out 158 significant DEGs between HCC samples from TACE responders and nonresponders. These DEGs might play decisive roles in TACE response through metabolic and catabolic processes. Also, these DEGs were significantly concentrated in pathways like drug metabolism-cytochrome P450 pathway, and chemical carcinogenesis, which may be the underlying mechanism by which the DEGs affect the TACE response. Furthermore, by constructing the PPI network, we established a six-gene signature (including CYP1A1, CYP3A4, SERPINC1, CXCL8, AFP, and MMP9) for predicting TACE response. The 6-gene signature had good performance in predicting TACE response, demonstrating that the 6-gene signature provided new insights into the biological behavior of HCC and a basis for individualized management of HCC. The cytochrome P450 (CYP) enzyme family mainly participate in the metabolism of environmental toxicants, cancer-promoting substances, and antitumor drugs. Its function strongly correlates with the appearance and progression of tumors and the response to anticancer drugs [11,12]. Cytochrome P450 1A1 (CYP1A1) is a subtype of the cytochrome P 450 family, which is highly expressed in human liver tissues [13,14]. Numerous studies have confirmed the relationship between CYP1A1 gene polymorphisms and HCC risk. Moreover, it has been reported that CYP1A1 is an important regulator of the conversion of tobacco-derived polycyclic aromatic hydrocarbons (PAHs) to carcinogenic metabolites, and its polymorphism may exert an effect on raising the susceptibility to smoking-related HCC [15].
Cytochrome P450 3A4 (CYP3A4) is also a member of the CYP 450 superfamily. It is expressed mainly in human liver as well as extrahepatic tissues including lung, kidney, and intestine. The expression of CYP3A4 is induced by    IFI27  SPARCL1  FAS  CES2  AKR7A3  ADH1C  UGT2B15  CLDN2  CYP1A2  CD5L  CYP3A4  CYP1A1  THRSP  C3P1  RDH16  LINC00844  HSD11B1  ETNPPL  NUDT6  LDHD  SLCO1B1  CFHR4  HPR  FETUB  RBP5  ADHFE1  GLYAT  GYS2  C6  CYP8B1  HAO2  FGGY  ANXA10  SRD5A2  CYP2A6  CYP2A7  CYP2C9  FLJ22763  OGDHL  GNMT  PON3  LINC01018  ACSM2A  ACSM2B  CYP4F2  DAO  UPB1  SLC6A1  ASPOA5  EPHX2  AGXT  HPX  HRG  PIPOX  SERPINC1  AFM  HSD17B6  SLC22A1  SLC10A1  C4BPA  F9  CYP7A1  BHMT  AASS  NR113  IGFBP2  CYP4A11  SPP2  CCL16  RTP3  AQP9  HGFAC  PGLYRP2  ZG16  PCK1  HPD  ADH4  ANGPTL3  ACOT12  LECT2  ADH6  ADH1A  ADH1B  ABCC691  KLHDC9  CYP2C18  IL32  FMO1  RP11-274H2.5  GAL3ST1  VNN2  CA9  NDRG1  SLC6A8  KISS1R  ENO2  HILPDA  SPAG4  SULT1C2  ZIC2  ASRGL1  CLGN  MAP7D2  CTHRC1  EPCAM  FOXQ1  DKK1  MMP9  COLEC12  SPINK1  CCL20  SPP1  CXCL8  HK2  MMP12  DUSP9  TRIM71  AFP  TET1  ENPP3  RFX6  KANKA  RBP2  MEP1A  XK  SLAIN1  IL13RA2  FKBP1B  STEAP2  STEAP1  FDCSP  NTS  QPCT  S100A9  S100A8  ARG2  LIN28B  POPDC3  TSPAN13  AGR2  MMP1  GCNT3  ENPP5  PKIB  CD24  MAPK13  IGF2BP3  PLBD1  C15orf48  HCAR3  BCL2A1  CA12  TREM1 Type Non-responder   Figure 3: KEGG pathway enrichment analysis of 158 DEGs between HCC samples from TACE responders (n = 52) and nonresponders (n = 37) by analyzing the GSE104580 dataset; The DEGs were significantly enriched in drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, and chemical carcinogenesis. Fatty acid metabolic process Small molecule catabolic process Steroid metabolic process Cellular hormone metabolic process Hormone metabolic process Carboxylic acid biosynthetic process Xenobiotic metabolic process Cellular response to xenobiotic stimulus Response to xenobiotic stimulus Organic acid catabolic process Carboxylic acid catabolic process Terpenoid metabolic process Isoprenoid metabolic process Fatty acid derivative metabolic process Retinoid metabolic process Diterpenoid metabolic process Olefinic compound metabolic process Icosanoid metabolic process Drug metabolic process Long−chain fatty acid metabolic process Unsaturated fatty acid metabolic process Epoxygenase P450 pathway Retinol metabolic process Arachidonic acid metabolic process Primary alcohol metabolic process Retinoic acid metabolic process Exogenous drug catabolic process Drug catabolic process Steroid catabolic process Ethanol oxidation 15 20 5 10 Figure 2: GO term enrichment analysis of 158 DEGs between HCC samples from TACE responders (n = 52) and nonresponders (n = 37) by analyzing the GSE104580 dataset; The DEGs were significantly enriched in metabolic and catabolic processes, such as drug metabolic process, fatty acid metabolic process, and small molecule catabolic process. 6 Journal of Nanomaterials glucocorticoids and mainly participates in drug metabolism and lipid composition synthesis. CYP3A4 has a bidirectional function of activating some drugs and inactivating some drugs. Previous studies showed that cytochrome P450 enzymes metabolized approximately 70-80% of the drugs used in clinic, and CYP3A4 played a role in the metabolism of half of these drugs, including many cancer chemotherapeutic agents [16,17]. Recently, several studies revealed that CYP3A4 was significantly downregulated in HCC tissues, which was associated with a poor prognosis of HCC [18,19]. Moreover, another investigation found that CYP3A4 may be a new tumor suppressor gene, which is related to the prognosis of HCC [20]. SERPINC1 is a serine protease inhibitor. It controls the blood coagulation process of the body. Previous study found that the expression levels of SERPINC1 in the serum of HCC patients and healthy subjects were significantly different [21].  7 Journal of Nanomaterials A major event in tumor growth and progression is the tumor angiogenesis. Cancer cells can secrete not only angiogenesis stimulating factors but also antiangiogenesis factors. The cleavage conformation of SERPINC1 inhibits the proliferation and migration of cancer cells through antiangiogenesis and anti-inflammatory [22,23]. Its expression decreased in HCC [24,25]. In the current study, upregulated expression of SERPINC1was associated with a favorable response to TACE. We therefore speculated that the high expression of SERPINC1 may contribute to the host's anticancer defense.
The chemokine CXCL8 has tumorigenic and angiogenic effects. Both inflammatory cells and tumor cells can secrete it. Studies have shown that CXCL8 played a role in the angiogenesis and metastasis of many types of tumors and is related to the poor prognosis of these tumors [26].
Specifically, CXCL8 and its receptors CXCR1 and CXCR2 may be involved in the occurrence and progression of HCC [27]. Like alpha-fetoprotein (AFP), CXCL8 also has a high predictive value in identifying HCC, and it could also independently predict the survival prognosis of patients with early HCC [28]. Interestingly, the size of HCC was positively linked to CXCL8 [27]. CXCL8 overexpressed in HCC tissues, and it may be related to the pathological staging, microvascular infiltration, and metastasis of HCC, indicating that CXCL8 may play a role in the progression and metastasis of HCC by participating in tumor proliferation and angiogenesis [29].
About 70% of HCCs secreted AFP, and it is a commonly used biomarker for diagnosing HCC in the clinic [30]. Furthermore, AFP has been confirmed to be the most common prognostic indicators of HCC in clinic. Several prognostics scores for HCC patients treated with TACE have included AFP as an important scoring factor [6,31,32]. Serum levels of AFP before TACE treatment and changes in AFP levels after TACE treatment were good predictors of response rate to TACE treatment and overall survival for HCC patients, respectively. The serum levels of AFP were negatively correlated with the response rate to TACE, and HCC patients with AFP level > 999,999 ng/ mL would have no response to TACE. Consistent with previous studies, we found that the downregulated expression of AFP was associated with a good clinical response to TACE treatment. These results may be explained by the fact that higher levels of AFP are usually caused by late tumor stage, large tumor burden, and portal vein tumor thrombosis [6,30].
The matrix metalloproteinase (MMP) family may degrade the extracellular matrix, destroy the normal tissue structure of cell adhesion molecules, and combine with other related enzymes to degrade the matric around the blood vessel, thereby facilitating the infiltration and metastasis of liver cancer cells. MMP 9 is a member of the MMP family and seems to exert a major effect in tumor angiogenesis through its key intervention in regulating growth plate angiogenesis and recruiting endothelial stem cells. M2 macrophages could promote the invasion and migration of HCC cells by means of changing miR-149-5p and therefore increasing the expression of MMP9 in HCC. Furthermore, the overexpression of MMP 9 in liver cancer leads to a higher TNM staging by increasing lymph node invasion and promoting metastasis, resulting in poor differentiation and an overall prognosis.
The current study still had several limitations need to be noted. Firstly, the six-gene signature lacked external validation. In future studies, external validation of the six-gene signature in more independent cohorts is necessary. Secondly, the 6 genes were obtained only by bioinformatics analysis; therefore, further experiments need to validate them and to clarify the underlying mechanism of them.
In conclusion, we identified a six-gene signature that could be used to predict the response of HCC to TACE

Data Availability
The profiles of GSE104580 can be downloaded from GEO datasets.

Conflicts of Interest
The authors declare there is no conflict of interest.