Construction of a Novel Diagnostic Model Based on Ferroptosis-Related Genes for Hepatocellular Carcinoma Using Machine and Deep Learning Methods

Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Ferroptosis, an oxidative, iron-dependent form of necrotic cell death, is strongly correlated with tumor development and cancer progression. The present study was designed to identify potential diagnostic ferroptosis-related genes (FRGs) using machine learning. Two publicly available gene expression profiles of HCC and nontumor tissues (GSE65372 and GSE84402) were retrieved from the GEO database. The GSE65372 dataset was used to screen for FRGs differentially expressed between HCC and nontumor specimens, and a pathway enrichment analysis of these FRGs was then carried out. To locate potential biomarkers, support vector machine recursive feature elimination (SVM-RFE) and LASSO regression models were applied. The expression of the novel biomarkers was further validated using the GSE84402 and TCGA datasets. Of the 237 FRGs, 40 were dysregulated between HCC and nontumor specimens in GSE65372, including 27 upregulated and 13 downregulated genes. KEGG analysis indicated that the 40 differentially expressed FRGs were mainly enriched in the longevity regulating pathway, the AMPK signaling pathway, the mTOR signaling pathway, and hepatocellular carcinoma. Subsequently, HSPB1, CDKN2A, LPIN1, MTDH, DCAF7, TRIM26, PIR, BCAT2, EZH2, and ADAMTS13 were identified as potential diagnostic biomarkers. ROC analysis confirmed the diagnostic value of the new model. The expression of several of the 11 FRGs was further confirmed in the GSE84402 and TCGA datasets. Overall, our findings provide a novel FRG-based diagnostic model for HCC. Further research is needed to evaluate its diagnostic value before it can be applied in a clinical context.


Introduction
According to the Global Cancer Statistics 2018, there were approximately 841,000 new cases of liver cancer and 782,000 deaths from liver cancer worldwide, with China alone accounting for about 50% of both cases and deaths [1][2][3]. Hepatocellular carcinoma (HCC) accounts for an estimated 75% to 80% of all liver cancers; it is an aggressive malignancy that is typically diagnosed at a late stage, when treatment is no longer effective [4,5]. Although there has been significant progress in the treatment of HCC in recent years, including surgical procedures, chemotherapeutic agents, and targeted drugs, HCC still has very high incidence and mortality rates and poses a serious threat to human health [6,7]. The most widely used blood biomarker for HCC, alpha-fetoprotein (AFP), performs poorly as a serological test in HCC surveillance because its sensitivity in early-stage HCC is only 10% to 20% and its levels fluctuate during hepatitis flares [8,9]. Therefore, patients with early-stage HCC who are at high risk of recurrence need to be identified as quickly as possible so that therapeutic options can be tailored and patient survival improved.
In recent years, microarray technology combined with integrated bioinformatics analysis has been used to identify novel genes linked to a range of diseases [10,11]. These genes may serve as diagnostic and prognostic biomarkers. For instance, Lan et al. reported that KIAA1429 expression was markedly increased in HCC specimens and that high KIAA1429 expression was associated with poor prognosis in patients with HCC. Knockdown of KIAA1429 reduced cell proliferation and metastasis both in vitro and in vivo through N6-methyladenosine-dependent post-transcriptional modification of GATA3 [12]. Zhang et al. showed that DDX39 was upregulated in HCC tissues and cells, that its expression was positively associated with advanced clinical stage, and that patients with high DDX39 levels had poor outcomes. Functional analysis showed that increased DDX39 in HCC cells promoted motility, migration, growth, and invasion by regulating the Wnt/β-catenin pathway [13]. In addition, several genes, such as serum IL27, HMMR, NXPH4, PITX1, and THBS4, have been reported to have important diagnostic value in the blood of HCC patients [14,15].
Ferroptosis is a form of regulated cell death (RCD) that is iron-dependent and triggered by the accumulation of toxic lipid peroxides [16]. In recent years, inducing ferroptosis has emerged as a promising therapeutic strategy to suppress tumor proliferation and growth, especially in advanced tumors that are resistant to surgery, radiotherapy, and chemotherapy [17,18]. Ferroptosis plays an important role in the regulation of metabolism and redox biology, with implications for the development and treatment of cancers, including HCC [19][20][21]. Shan et al. showed that UBA1 promoted HCC progression by increasing the activity of the Nrf2 signaling pathway and lowering the concentration of ferric ions, thereby inhibiting ferroptosis [22]. Several studies have also reported the prognostic value of many ferroptosis-related genes (FRGs). However, a diagnostic model for HCC based on FRGs has not yet been investigated. In this study, we aimed to develop such a model using machine and deep learning methods.

Microarray Data Source.
The GEO database was searched for mRNA expression datasets of HCC using the following keywords: "hepatocellular carcinoma," "Homo sapiens" (porgn: txid9606), and "expression profiling by array." After careful review, two datasets (GSE65372 and GSE84402) were selected and downloaded. GSE65372 and GSE84402 were based on the GPL14951 and GPL570 platforms, respectively. GSE65372 comprised 39 HCC specimens and 15 nontumor specimens, and GSE84402 comprised 14 HCC specimens and 14 nontumor specimens. All data were freely accessible, and the present study did not involve any human or animal experimentation.

Differential Expression Analysis.
We began by retrieving expression data for ferroptosis-related genes from the GSE65372 dataset, in which only 237 FRGs were detected. The expression of these genes was then compared between normal and HCC samples using Student's t-test in R to identify differentially expressed FRGs. Genes with a p value of less than 0.001 were considered significant.
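The per-gene comparison described above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: it assumes a genes × samples matrix of log-scale expression values and a boolean tumor label per sample, and the helper name `differential_ferroptosis_genes` and the simulated data are hypothetical. It applies the classical equal-variance two-sample Student's t-test to each gene and keeps genes with p < 0.001, labeling each as up- or downregulated in HCC.

```python
import numpy as np
from scipy import stats


def differential_ferroptosis_genes(expr, gene_names, is_tumor, alpha=0.001):
    """Per-gene two-sample Student's t-test, HCC vs nontumor (hypothetical helper).

    expr     : genes x samples array of log-scale expression values
    is_tumor : boolean array, True for HCC samples, False for nontumor samples
    Returns a list of (gene, t statistic, p value, direction) for p < alpha.
    """
    expr = np.asarray(expr, dtype=float)
    is_tumor = np.asarray(is_tumor, dtype=bool)
    tumor, normal = expr[:, is_tumor], expr[:, ~is_tumor]
    # equal_var=True gives the classical Student's t-test (not Welch's)
    t_stat, p_val = stats.ttest_ind(tumor, normal, axis=1, equal_var=True)
    hits = []
    for gene, t, p in zip(gene_names, t_stat, p_val):
        if p < alpha:
            # a positive t means higher mean expression in HCC
            hits.append((gene, float(t), float(p), "up" if t > 0 else "down"))
    return hits


# Usage on simulated data (not GSE65372): 39 HCC and 15 nontumor samples
rng = np.random.default_rng(0)
is_tumor = np.array([True] * 39 + [False] * 15)
expr = rng.normal(size=(5, is_tumor.size))
expr[0, is_tumor] += 3.0  # simulated upregulated gene
expr[1, is_tumor] -= 3.0  # simulated downregulated gene
genes = [f"G{i}" for i in range(5)]
for gene, t, p, direction in differential_ferroptosis_genes(expr, genes, is_tumor):
    print(gene, direction, f"p={p:.2e}")
```

Counting the "up" and "down" labels in the returned list gives the numbers of increased and decreased genes reported in the Results.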

Pathway Analysis.
The R packages "clusterProfiler," "enrichplot," and "ggplot2" were used to perform GO and KEGG pathway enrichment analyses to characterize the biological features of the ferroptosis-related differentially expressed genes (DEGs). Enrichment results with a false discovery rate (FDR) < 0.05 were considered significant functional categories.
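As we understand it, clusterProfiler's GO and KEGG enrichment functions score each term with a hypergeometric over-representation test and, by default, adjust the p-values with the Benjamini-Hochberg (BH) method to obtain the FDR. The sketch below reproduces that logic in Python with SciPy. It is an illustration under that assumption, not a substitute for clusterProfiler's curated annotations; the helper names and the toy gene sets are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # a running minimum from the largest p-value down keeps q-values monotone
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


def over_representation(de_genes, gene_sets, universe):
    """Hypergeometric over-representation test with BH correction.

    de_genes  : differentially expressed genes
    gene_sets : dict mapping a term/pathway name to its member genes
    universe  : background genes measured on the array
    Returns a dict term -> FDR.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    n_universe, n_de = len(universe), len(de)
    names, pvals = [], []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(members & de)
        # P(X >= overlap) for X ~ Hypergeometric(n_universe, len(members), n_de)
        pvals.append(hypergeom.sf(overlap - 1, n_universe, len(members), n_de))
        names.append(name)
    return dict(zip(names, benjamini_hochberg(pvals)))


# Toy example with made-up gene sets (not real GO/KEGG terms)
universe = [f"G{i}" for i in range(100)]
de_genes = universe[:10]
gene_sets = {
    "set_A": universe[:8] + ["G50", "G51"],  # 8 of 10 members are DE
    "set_B": universe[60:70],                # no DE members
}
fdr = over_representation(de_genes, gene_sets, universe)
significant = [term for term, q in fdr.items() if q < 0.05]
print(significant)  # ['set_A']
```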

Candidate Diagnostic Biomarker Screening.
Two different machine learning methods were employed to identify the genes that best distinguish HCC from nontumor samples. The least absolute shrinkage and selection operator (LASSO) is a regression method that applies L1 regularization, shrinking the coefficients of uninformative variables to zero; this performs variable selection and improves prediction accuracy. LASSO regression was implemented in R with the "glmnet" package to identify the genes most strongly associated with distinguishing HCC samples from normal samples. The support vector machine (SVM) is a widely used supervised machine learning method for both classification and regression. Support vector machine recursive feature elimination (SVM-RFE), which iteratively removes the features with the smallest SVM weights, was used to select the relevant features, that is, the set of genes that differentiated between the groups most effectively.
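A rough Python analogue of the LASSO step is sketched below, assuming scikit-learn is available. `LogisticRegressionCV` with an L1 penalty stands in for glmnet's LASSO logistic regression with cross-validated penalty tuning; scikit-learn parameterizes the penalty by C, the inverse of the regularization strength, and refits at the C with the best mean CV score (closer to glmnet's `lambda.min` than `lambda.1se`). The helper name, grid size, log-loss scoring, and simulated data are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, gene_names, folds=10, seed=0):
    """L1-penalized logistic regression with the penalty tuned by k-fold CV.

    X : samples x genes expression matrix; y : 1 for HCC, 0 for nontumor.
    Returns the genes whose coefficients are nonzero at the selected penalty.
    """
    # glmnet standardizes predictors by default; do the same here
    Xs = StandardScaler().fit_transform(X)
    model = LogisticRegressionCV(
        Cs=30,                   # grid of inverse penalty strengths (C = 1/lambda)
        cv=folds,                # stratified k-fold CV for classifiers
        penalty="l1",
        solver="liblinear",      # a solver that supports the L1 penalty
        scoring="neg_log_loss",  # analogous to glmnet's binomial deviance
        max_iter=5000,
        random_state=seed,
    )
    model.fit(Xs, y)
    coef = model.coef_.ravel()
    return [g for g, c in zip(gene_names, coef) if c != 0.0]


# Simulated data (not GSE65372): 39 HCC and 15 nontumor samples, 20 genes
rng = np.random.default_rng(1)
y = np.array([1] * 39 + [0] * 15)
X = rng.normal(size=(y.size, 20))
X[y == 1, 0] += 2.0  # informative gene G0
X[y == 1, 1] -= 2.0  # informative gene G1
genes = [f"G{i}" for i in range(20)]
print(lasso_select(X, y, genes))
```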

Statistical Analysis.
All statistical analyses were conducted using R (version 3.6.3). Unless otherwise stated, p < 0.05 was considered statistically significant.

Identification of Differentially Expressed FRGs in the GSE65372 Dataset.
In the GSE65372 dataset, 40 of the 237 FRGs were dysregulated between HCC and nontumor specimens, including 27 upregulated and 13 downregulated genes. The clustering heatmap shows the expression patterns of these differentially expressed FRGs across samples (Figure 1(a)), and Figure 1(b) illustrates the correlations among these genes.

Functional Analyses of the Differentially Expressed FRGs.
To explore the functional roles of the differentially expressed FRGs, we performed GO and KEGG enrichment analyses. As shown in Figures 2(a) and 2(b), the 40 differentially expressed FRGs were mainly associated with response to oxidative stress, cellular response to oxidative stress, regulation of autophagy, cellular response to chemical stress, mitochondrial outer membrane, organelle outer membrane, outer membrane, TOR complex, transcription coregulator activity, DNA-binding transcription factor binding, and antioxidant activity. KEGG analysis indicated that the 40 differentially expressed FRGs were mainly enriched in the longevity regulating pathway, the AMPK signaling pathway, the mTOR signaling pathway, and hepatocellular carcinoma (Figure 2(c)).

Differentially Expressed FRGs Were Identified as Diagnostic Genes for HCC.
Our goal was to estimate the ability of the differentially expressed FRGs to distinguish HCC from nontumor tissue. We therefore applied two separate machine learning algorithms to the GSE65372 dataset to identify the differentially expressed FRGs that best differentiated HCC from normal specimens. HCC-related features were first selected with the LASSO logistic regression algorithm, with the penalty parameter tuned by 10-fold cross-validation (Figures 3(a) and 3(b)). We then screened the 17 differentially expressed FRGs with the SVM-RFE algorithm to find the optimal combination of feature genes, and seven genes were selected as the best candidate feature genes (Figures 3(c) and 3(d)). By intersecting the marker genes obtained from the LASSO and SVM-RFE models, 11 new markers (HSPB1, CDKN2A, LPIN1, MTDH, DCAF7, TRIM26, PIR, BCAT2, EZH2, and ADAMTS13) were identified for further investigation (Figure 3(e)).
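The SVM-RFE step and the final intersection can be sketched with scikit-learn's `RFECV`, again as an assumption-laden analogue rather than the authors' R implementation. Here a linear-kernel SVM supplies the feature weights, one gene is removed per iteration, and the number of retained genes is chosen by 10-fold cross-validated accuracy; the helper names and simulated data are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svm_rfe_select(X, y, gene_names, folds=10, seed=0):
    """SVM-RFE: repeatedly drop the gene with the smallest |weight| of a
    linear SVM, choosing how many genes to keep by cross-validated accuracy."""
    Xs = StandardScaler().fit_transform(X)
    selector = RFECV(
        estimator=SVC(kernel="linear", C=1.0),  # linear kernel exposes coef_
        step=1,                                 # remove one gene per iteration
        cv=StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed),
        scoring="accuracy",
    )
    selector.fit(Xs, y)
    return [g for g, keep in zip(gene_names, selector.support_) if keep]


def intersect_markers(lasso_genes, svm_genes):
    """Genes retained by both methods, in the order of the LASSO list."""
    svm_set = set(svm_genes)
    return [g for g in lasso_genes if g in svm_set]


# Simulated data (not GSE65372): 39 HCC and 15 nontumor samples, 17 genes
rng = np.random.default_rng(2)
y = np.array([1] * 39 + [0] * 15)
X = rng.normal(size=(y.size, 17))
X[y == 1, 0] += 3.0  # strongest simulated marker
X[y == 1, 1] -= 1.5  # weaker simulated marker
genes = [f"G{i}" for i in range(17)]
svm_genes = svm_rfe_select(X, y, genes)
print(svm_genes)
print(intersect_markers(["G0", "G1", "G5"], svm_genes))
```

Because the final step is a set intersection, the resulting marker list can never be longer than either of the two input lists.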

The Identification of the Diagnostic Value of the New Model for HCC.
Using the glm function in R, we developed a logistic regression model based on the 11 marker genes described above. ROC analysis showed that this model distinguished normal samples from HCC samples with an area under the curve (AUC) of 1.000 (Figure 4(a)). ROC curves were also constructed for each of the 11 marker genes to shed light on the ability of individual genes to differentiate normal samples from HCC samples; the AUC exceeded 0.7 for every gene (Figure 4(b)). These results suggest that the logistic regression model discriminates HCC samples from normal samples more accurately than any individual marker gene.
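The combined-model and per-gene ROC analysis can be illustrated with the Python sketch below. It stands in for R's `glm` with scikit-learn's `LogisticRegression` (which, unlike `glm`, applies an L2 penalty by default) and computes the AUC through its rank-based Mann-Whitney formulation. The helper names and simulated data are hypothetical. Because the model is scored on the same samples it was fitted on, the AUC here is an in-sample estimate and tends to be optimistic; cross-validation or an external dataset gives a fairer estimate.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.linear_model import LogisticRegression


def roc_auc(y, score):
    """AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen HCC sample scores higher than a randomly chosen nontumor sample
    (ties count one half)."""
    y = np.asarray(y).astype(bool)
    ranks = rankdata(score)  # average ranks handle ties
    n_pos, n_neg = y.sum(), (~y).sum()
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def model_and_gene_aucs(X, y, gene_names):
    """In-sample AUC of a multi-gene logistic model and of each gene alone."""
    model = LogisticRegression(max_iter=5000).fit(X, y)
    combined = roc_auc(y, model.predict_proba(X)[:, 1])
    single = {}
    for j, gene in enumerate(gene_names):
        auc = roc_auc(y, X[:, j])
        # a downregulated gene discriminates in the opposite direction
        single[gene] = max(auc, 1.0 - auc)
    return combined, single


# Simulated data (not GSE65372)
rng = np.random.default_rng(3)
y = np.array([1] * 39 + [0] * 15)
X = rng.normal(size=(y.size, 3))
X[y == 1, 0] += 3.0
X[y == 1, 1] -= 1.0
combined, single = model_and_gene_aucs(X, y, ["G0", "G1", "G2"])
print(round(combined, 3), {g: round(a, 3) for g, a in single.items()})
```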

Expression of the Novel Diagnostic Genes in the GSE84402 and TCGA Datasets.
Finally, we examined the expression of the marker genes in the GSE84402 dataset. The expression patterns of ADAMTS13, DCAF7, EZH2, HSPB1, and CDKN2A in GSE84402 were consistent with those in the discovery dataset (Figure 5): DCAF7, EZH2, HSPB1, and CDKN2A were markedly upregulated in HCC specimens compared with normal specimens, whereas ADAMTS13 was markedly downregulated. In addition, 10 of the genes were dysregulated in HCC in the TCGA datasets (Figure 6).
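One simple way to check whether a marker behaves consistently across cohorts is to compare the sign of its fold change in the discovery and validation datasets. The sketch below assumes log2-scale expression matrices and uses hypothetical helper names and made-up fold changes; a fuller validation would also test significance within the validation set (for example, with the same t-test used for discovery), which is omitted here for brevity.

```python
import numpy as np


def log2_fold_change(expr, is_tumor):
    """Per-gene difference in mean log2 expression, HCC minus nontumor."""
    expr = np.asarray(expr, dtype=float)
    is_tumor = np.asarray(is_tumor, dtype=bool)
    return expr[:, is_tumor].mean(axis=1) - expr[:, ~is_tumor].mean(axis=1)


def concordant_genes(gene_names, lfc_discovery, lfc_validation):
    """Genes whose direction of change (up or down in HCC) agrees between a
    discovery dataset and a validation dataset."""
    return [
        gene
        for gene, a, b in zip(gene_names, lfc_discovery, lfc_validation)
        if a != 0 and np.sign(a) == np.sign(b)
    ]


# Hypothetical fold changes for three genes (not results from the study)
genes = ["G1", "G2", "G3"]
print(concordant_genes(genes, [1.2, -0.8, 0.5], [0.9, -1.1, -0.3]))  # ['G1', 'G2']
```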

Discussion
HCC is the most prevalent primary malignancy of the liver, accounting for about 90% of all malignant cases. It is also the most curable form of primary liver cancer [23,24]. Extensive research has established that HCC develops through a multistep process and is a highly heterogeneous malignancy induced by alterations in multiple genes [25,26]. Hepatitis B, hepatitis C, alcoholism, steatohepatitis, and obesity have all been identified as etiologic factors [27,28]. Recent molecular studies have indicated that specific gene mutations play an important role in the progression of HCC. Ferroptosis, which can be induced by modulating iron metabolism, amino acid and glutathione metabolism, and reactive oxygen species (ROS) metabolism, has shown promising results in inducing cancer cell death in recent years, especially in eliminating aggressive malignancies that are resistant to conventional therapies [29,30]. Ferroptosis may therefore be a potent target for cancer therapy. However, the relationship between ferroptosis-related genes and HCC progression remains largely unknown, which makes developing ferroptosis-based therapy for HCC challenging. Thanks to the development of high-throughput technologies, gene microarray analysis has become a powerful tool for detecting DEGs and, by extension, putative biomarkers in a wide range of disorders. A number of studies have used gene microarray analysis to identify key genes in the etiology of HCC, and integrating multiple microarray analyses may help find more reliable gene biomarkers. Machine learning algorithms have shown great potential for screening sensitive diagnostic biomarkers in a variety of diseases, and such research has grown rapidly in the last few years [31,32].
In this study, we screened for differentially expressed FRGs and found that 40 of the 237 FRGs were dysregulated between HCC and nontumor samples, including 27 upregulated and 13 downregulated genes. Recent studies have shown that ferroptosis plays a crucial role in suppressing carcinogenesis by eliminating cells that lack vital nutrients in their environment. Functional studies of FRGs as tumor promoters or suppressors in HCC have also increased. KEGG analysis indicated that the 40 differentially expressed FRGs were mainly enriched in the longevity regulating pathway, the AMPK signaling pathway, the mTOR signaling pathway, and hepatocellular carcinoma, highlighting their roles in HCC progression. Our findings suggest that the 40 differentially expressed FRGs may play an important role in the progression of HCC.
Based on the 40 differentially expressed FRGs, we applied LASSO and SVM-RFE and identified 11 novel marker genes (HSPB1, CDKN2A, LPIN1, MTDH, DCAF7, TRIM26, PIR, BCAT2, EZH2, and ADAMTS13). The AUC for each of the 11 genes was greater than 0.75, indicating that they can reliably and accurately separate HCC specimens from nontumor specimens. Some of these genes have been functionally studied in HCC. For instance, He et al. reported that MTDH expression was markedly elevated in HCC specimens. In HCC patients, MTDH expression was predictive of short overall survival, without any heterogeneity. In addition, MTDH expression was linked to high-grade histological differentiation, nonvascular invasion, and HCC metastasis. In vitro investigations showed that MTDH can limit cell growth in all four HCC cell lines tested, in addition to activating caspase-3/7 activity and cell death [33]. Wang et al. showed that TRIM26 expression was much lower in HCC tissue than in normal liver tissue and that this was associated with advanced T stage and poor prognosis. In vitro studies showed that inhibiting TRIM26 increased HCC cell proliferation and metastasis [34]. These findings are consistent with ours. Our ROC curves showed that the logistic regression model based on these 11 marker genes, fitted with the glm function in R, successfully distinguished between normal and HCC samples (AUC = 1.000). These findings suggest that the novel diagnostic model based on the 11 marker genes has considerable clinical reference value. Finally, we examined the expression of the 11 marker genes in the independent GSE84402 and TCGA datasets, where the expression of several genes was consistent with our findings. However, more samples are needed to further confirm our findings. Our study also has several limitations.
First, the sample size was small, although our findings were derived from and validated in two separate datasets; validation of this model in larger prospective clinical studies is required. Second, additional biological research is required to understand the molecular functions of the 11 critical genes.

Conclusion
We developed a novel diagnostic model for HCC based on 11 FRGs. This model may also help promote patient compliance, assist healthcare providers in managing patients, and ultimately improve patients' overall health status and quality of life.

Data Availability
The data used to support this study are available from the corresponding author upon request.

Journal of Oncology