Survival Prediction Model for Patients with Esophageal Squamous Cell Carcinoma Based on the Parameter-Optimized Deep Belief Network Using the Improved Archimedes Optimization Algorithm

Esophageal squamous cell carcinoma (ESCC) is one of the cancers with the highest incidence and mortality in the world. An effective survival prediction model can help improve patients' survival. Therefore, a parameter-optimized deep belief network (DBN) based on the improved Archimedes optimization algorithm is proposed in this paper for the survival prediction of patients with ESCC. Firstly, a combination of features significantly associated with patient survival is found by the minimum redundancy and maximum relevancy (MRMR) algorithm. Secondly, a DBN is introduced to predict the survival of patients. To address the problem that the DBN model is sensitive to its parameter settings during construction, this paper uses the Archimedes optimization algorithm (AOA) to optimize the learning rate α and batch size β of the DBN. To overcome the problems that AOA is prone to falling into local optima and has low search accuracy, an improved Archimedes optimization algorithm (IAOA) is proposed. On this basis, a survival prediction model for patients with ESCC is constructed. Finally, accuracy comparison tests are carried out on the IAOA-DBN, AOA-DBN, SSA-DBN, PSO-DBN, BES-DBN, IAOA-SVM, and IAOA-BPNN models. The results show that the IAOA-DBN model can effectively predict the five-year survival rate of patients and provide a reference for the clinical judgment of patients with ESCC.


Introduction
Cancer is the second leading cause of death in the world and poses a great danger to human health [1,2]. In 2021, there were approximately 19.29 million new cancer cases and 9.95 million cancer deaths worldwide [3]. Esophageal cancer, which includes esophageal squamous cell carcinoma and esophageal adenocarcinoma, is the sixth most common cause of cancer-related death worldwide [4]. More than 90% of esophageal cancers are esophageal squamous cell carcinoma (ESCC). The pathology of esophageal squamous cell carcinoma is complex, and it is often found at an advanced stage, which places a huge burden on the patient's family [5,6]. In recent years, the incidence of esophageal squamous cell carcinoma has been increasing [7], and the mortality rate remains high [8,9].
One of the most fundamental difficulties in the treatment of ESCC is the lack of effective methods for predicting survival risk [10,11]. Currently, with the more in-depth research on ESCC and the continuous development of medical technology [12], the use of various types of intelligent systems in esophageal cancer diagnosis is increasing [13].
Treatment methods and concepts for patients with ESCC have continued to advance [14]. However, as with other malignancies, the incidence of ESCC is increasing. Even for professional doctors, it is difficult to judge a patient's ultimate survival risk [15].
Generally, blood indicators, age, and TNM stage information are considered related to the survival rate of cancer patients, and they are often used to predict the survival status of patients [16][17][18]. In recent years, with the continuous progress of machine learning technology, more and more intelligent algorithms have been proposed and applied in multiple fields [19][20][21]. In the medical field, the survival risk of cancer patients has become a popular research topic [22]. A reasonable survival prediction model will effectively improve the survival of cancer patients. Essentially, the cancer patient survival prediction model is a classification problem [23], including the screening of datasets and the analysis of the connections between the data. So far, many data mining methods have been proposed in the literature to predict the survival status of esophageal cancer patients [24,25]. In [26], 90 breast cancer risk miRNAs are predicted based on the proposed DMTN by using an SVM classifier, which obtained an AUC of 0.9633. In [27], a backpropagation artificial neural network is adopted to predict whether postoperative fatigue occurs in patients undergoing gastrointestinal tumor surgery, and the accuracy rate reached 0.872.
The above approaches based on shallow architectures achieve good performance in cancer prediction problems. However, since the classification accuracy of shallow learning depends largely on the quality of the extracted features, it may cause problems when dealing with more complex applications [28]. In fact, for high-dimensional and complex cancer patient data, a simple traditional shallow architecture is not sufficient [29]. Correspondingly, the deep learning model has multiple nonlinear network structures, which enable it to extract the features of the original data from the hidden layers step by step and improve the classification and prediction accuracy of the model [30,31]. Therefore, a network structure with deeper layers is preferred.
Deep learning is a new direction in the field of machine learning that models high-level abstractions in input data with hierarchies and multiple layers [32,33]. Through the establishment of artificial neural network with a network hierarchy, multiple layers gradually extract higher-level features from the original input for learning. Different types of deep neural networks for classification prediction have been used in multiple literatures [34][35][36]. DBN is a probabilistic generative network, which is considered more suitable for prediction of cancer classification with high feature similarity and complexity [37]. However, in the process of building DBN, improper parameter setting will lead to the instability of the model and the problem of poor classification accuracy. Often, the selection of parameters still relies on the experience of experts to be manually tuned. Aiming at the above problems, a cancer patient survival prediction model based on the improved Archimedes optimization algorithm (IAOA) to optimize DBN parameters is proposed.
In this paper, seventeen blood indicators, age, and TNM staging information of 298 patients with ESCC are studied. Firstly, the clinical data of the cancer patients are screened by the minimum redundancy and maximum relevancy algorithm, and the feature indexes are sorted according to their importance. A combination of eleven indicators that is significantly associated with patient survival is selected, which is verified by the Cox regression method in the SPSS software. Secondly, the IAOA is introduced to optimize the parameters in the DBN network training process to improve the stability and classification accuracy of the DBN model. Finally, a survival prediction model of patients with ESCC based on IAOA-DBN is established. The above eleven related indicators are used as inputs, and the five-year survival rate of the patient is used as output. The prediction accuracy of IAOA-DBN is better than that of the existing AOA-DBN, SSA-DBN, PSO-DBN, BES-DBN, IAOA-SVM, and IAOA-BPNN models. Therefore, the method for survival diagnosis of patients with ESCC proposed in this paper can accurately predict the survival level of patients. The main contributions of this article can be summarized as follows: (1) A combination of eleven indicators is found based on minimum redundancy and maximum relevancy feature selection, which is verified to be significantly associated with survival in patients with ESCC. (2) The proposed method uses the IAOA to optimize the parameters of the DBN, which effectively improves the stability and classification accuracy of the DBN network. The problems that AOA tends to fall into local optima and has low convergence accuracy are effectively alleviated by the IAOA. Through the establishment of the IAOA-DBN model, the five-year survival rate of patients with ESCC is effectively predicted. This work is organized as follows.
In Section 2, the original data are analyzed, and a combination of multiple indicators that is significantly related to patient survival is found based on the minimum redundancy and maximum relevancy algorithm. An improved Archimedes algorithm is proposed in Section 3, which can effectively improve the optimization accuracy and stability of AOA. In Section 4, a survival prediction model based on IAOA-DBN is proposed, which can effectively predict the five-year survival rate of patients with ESCC. In Section 5, the conclusions of this article are presented.

The basic information of the patients is shown in Table 1, and information on the seventeen blood indicators is shown in Table 2. Among all patients, 147 patients survived more than five years and 151 patients survived less than five years, so the data are evenly distributed. The ages of the patients ranged from 38 to 82 years, including 190 male patients and 108 female patients. In addition, the selected patients were required to have complete treatment records and to have been followed up for more than six months.

Minimum Redundancy and Maximum Relevancy
Algorithm. The minimum redundancy and maximum relevancy (MRMR) algorithm [38] is a typical feature selection method. The purpose of MRMR is to select the features with the minimal redundancy among themselves and the maximal relevance with the class label. The relevance between features and class labels is represented by mutual information. The mutual information is calculated as Equation (1):

I(x; y) = ∬ p(x, y) log [p(x, y) / (p(x)p(y))] dx dy, (1)
where x and y are two given random variables, p(x, y) is the joint probability density function of x and y, and p(x) and p(y) are the marginal probability density functions of x and y, respectively. The minimum redundancy and maximum relevancy are calculated as follows, respectively:

D = (1/|S|) Σ_{x_i ∈ S} I(x_i; C), (2)

R = (1/|S|²) Σ_{x_i, x_j ∈ S} I(x_i; x_j), (3)
where S and |S| are the feature subset and the number of features contained therein, respectively, C is the class label, I(x_i; C) is the mutual information between feature i and the class label C, I(x_i; x_j) is the mutual information between feature i and feature j, D is the mean mutual information between each feature in the feature set S and the class label C, indicating the relevance between the feature set and the corresponding class label, and R is the mean mutual information between the features in the feature set S, which represents the redundancy between the features.
The goal of the MRMR algorithm is to maximize the classification performance of the selected feature subset while minimizing the feature dimension. Therefore, it is required that the relevance between the feature subset and the label is the largest and the redundancy between the features is the least. The minimum redundancy and maximum relevancy criterion is constructed as follows:

max Φ(D, R), Φ = D − R. (4)
The main process of minimum redundancy and maximum relevancy (MRMR) algorithm is as follows.

2.2.1. Step 1: The First Feature Is Selected. The mutual information between all candidate variables and the target variable in the clinical data of esophageal cancer patients is calculated. The feature variable with the largest mutual information is selected as the first feature.

2.2.2. Step 2: The Second Feature Is Selected. The redundancy between the selected first feature and each of the other features is calculated. The feature variable with the least redundancy is selected as the second feature.

2.2.3. Step 3: Sequential Selection of Other Features. Based on the two selected feature variables, the next feature variable must be chosen so that the selected feature subset has the largest relevance with the target variable and the least redundancy with the already selected features. Therefore, it is necessary to satisfy the minimum redundancy and maximum relevancy criterion of Equation (4). The criterion of Equation (4) is evaluated repeatedly, and the variables that meet the requirements are added to the selected feature subset in turn. When the number of selected features meets the requirement, the algorithm ends.
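The greedy three-step procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: mutual information is estimated here with a simple 8-bin histogram (an assumed discretization), and the incremental criterion scores each candidate by its relevance minus its mean redundancy with the already-selected features, as in Equation (4).

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of I(x; y); bins=8 is an assumed discretization."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy MRMR: at each step add the feature maximizing D - R, i.e.,
    relevance I(f; y) minus mean redundancy with already-chosen features."""
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]           # Step 1: most relevant feature
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy        # Equation (4) criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)                      # Steps 2-3: greedy additions
    return selected
```

On the clinical dataset, `mrmr_select` would return the indices of the top-k indicators in importance order, which is the ranking used in the next subsection.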
In order to clearly express the MRMR process, the framework of MRMR is shown in Algorithm 1.

Selection of Optimal Subset
Combinations. The patients' seventeen blood indicators, age, and TNM staging information are used as input, and the five-year survival status is used as output. The patients' indicators are reordered according to their importance by the MRMR method. The reordered dataset is put into a BP neural network [39], and the classification accuracy of each feature combination is verified by tenfold cross-validation. Among the combinations achieving the highest classification accuracy, the one with the smallest number of features is the optimal feature combination. The result is shown in Figure 1. The highest classification accuracy is achieved with eleven features. Therefore, the features selected in this paper are the first eleven features: TNM stage, BASO, Age, PT, FIB, LYMPH, RBC, TT, PLT, T stage, and GLOB.
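The prefix search described above (growing the MRMR-ranked feature list one feature at a time and scoring each prefix by tenfold cross-validation) can be sketched as follows. For self-containment, a nearest-centroid classifier stands in for the paper's BP neural network; the fold count and seed are illustrative.

```python
import numpy as np

def cv_accuracy(X, y, folds=10, seed=0):
    """Tenfold cross-validation accuracy of a nearest-centroid classifier
    (a lightweight stand-in for the BP network used in the paper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    scores = []
    for part in np.array_split(idx, folds):
        train = np.setdiff1d(idx, part)
        c0 = X[train][y[train] == 0].mean(axis=0)    # class-0 centroid
        c1 = X[train][y[train] == 1].mean(axis=0)    # class-1 centroid
        pred = (np.linalg.norm(X[part] - c1, axis=1) <
                np.linalg.norm(X[part] - c0, axis=1)).astype(int)
        scores.append(float((pred == y[part]).mean()))
    return float(np.mean(scores))

def best_prefix(X_ranked, y):
    """Smallest prefix of the MRMR-ranked features with the highest CV accuracy."""
    accs = [cv_accuracy(X_ranked[:, :k], y) for k in range(1, X_ranked.shape[1] + 1)]
    best = max(accs)
    return accs.index(best) + 1, best    # index() gives the first (smallest) such k
```

Applied to the reordered clinical data, `best_prefix` mirrors the procedure that selects eleven features in Figure 1.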

Cox Regression Analysis to Verify the Correlation of
Indicators. Cox regression models [40] are widely used in the medical field to analyze the effects of multiple variables on survival status and survival time. In this section, Cox regression models are used to further validate the correlation of the selected features with the 5-year survival status and survival time of patients with ESCC. The SPSS 26.0 statistical software is used to build the Cox model, with survival time and survival status as the dependent variables. The results are shown in Figure 2. The p value of the overall score of the eleven indicators is much less than 0.05, so the combination of these eleven indicators is significantly related to the survival rate of patients.

Algorithm 1 (MRMR, recovered steps):
8: The redundancy between f_max and the rest of the features is calculated by Equation (2) and stored in the Redundancy set.
9: f_min = min sort{Redundancy set}
10: F ← F \ f_min; S ← S ∪ f_min
11: end for
12: for f = 1 : |F| − 2
13: The minimum redundancy and maximum relevancy of each feature is calculated by Equation (4) and sorted into the MRMR set.
14: S ← S ∪ MRMR
15: end for
16: The sorted set S is output.

Improving the Archimedes Optimization Algorithm
3.1. Basic Archimedes Optimization Algorithm. The Archimedes optimization algorithm (AOA) [41] is a new metaheuristic algorithm proposed in 2020. In this algorithm, the population individuals are objects submerged in a liquid, and the population positions are updated by adjusting the density, volume, and acceleration of the objects. According to whether the objects collide in the liquid, AOA is divided into a global exploration stage and a local exploitation stage. If the objects collide, the global exploration phase is performed; otherwise, the local exploitation phase is performed.
3.1.1. Initial Stage. In the initialization phase, AOA randomly initializes the density (den), volume (vol), and acceleration (acc) of the individuals in the population. The current optimal individual (x_best), optimal density (den_best), optimal volume (vol_best), and optimal acceleration (acc_best) are selected. In the AOA, the individual density, volume, and transfer factor TF are calculated as Equations (5)-(7), respectively:

den_i^{t+1} = den_i^t + rand × (den_best − den_i^t), (5)

vol_i^{t+1} = vol_i^t + rand × (vol_best − vol_i^t), (6)

TF = exp[(t − t_max) / t_max], (7)
where rand is a random number in (0, 1), den_i^t and den_i^{t+1} are the densities of individual i in generation t and generation t + 1, respectively, and vol_i^t and vol_i^{t+1} are the volumes of individual i in generation t and generation t + 1, respectively,
where t is the current iteration number and t_max is the maximum iteration number. When TF ≤ 0.5, AOA performs the global search, and the individual acceleration is updated as follows:

acc_i^{t+1} = (den_mr + vol_mr × acc_mr) / (den_i^{t+1} × vol_i^{t+1}), (8)

where mr denotes a randomly selected individual (material) with which the collision occurs.
When TF > 0.5, AOA performs local exploitation, and the individual acceleration is updated as follows:

acc_i^{t+1} = (den_best + vol_best × acc_best) / (den_i^{t+1} × vol_i^{t+1}). (9)

The acceleration of the individual is normalized to obtain Equation (10):

acc_{i-norm}^{t+1} = u × (acc_i^{t+1} − min(acc)) / (max(acc) − min(acc)) + l, (10)
where acc_{i-norm}^{t+1} is the normalized acceleration of individual i in generation t + 1, and u and l are parameters that adjust the normalization range.
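The normalization of Equation (10) can be sketched as below. The values u = 0.9 and l = 0.1 are assumed for illustration (the text does not state them); note that, taken literally, the formula maps accelerations into the range [l, u + l].

```python
import numpy as np

def normalized_acceleration(acc, u=0.9, l=0.1):
    """Equation (10): rescale raw accelerations; with the assumed u = 0.9 and
    l = 0.1 the result lies in [l, u + l]. The small epsilon guards against a
    degenerate population in which all accelerations are equal."""
    amin, amax = acc.min(), acc.max()
    return u * (acc - amin) / (amax - amin + 1e-12) + l
```

A large normalized acceleration keeps an individual in an exploratory mood, while a small one makes its movement local, which is why the same quantity appears in both position updates (11) and (13).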
During the global search phase, the individual positions are updated by Equation (11):

x_i^{t+1} = x_i^t + C_1 × rand × acc_{i-norm}^{t+1} × d × (x_rand − x_i^t), (11)
where x_i^{t+1} and x_i^t are the positions of individual i in generations t + 1 and t, x_rand is the position of a random individual in generation t, rand ∈ (0, 1) is a random number, and C_1 is a fixed constant. d is the density factor, which is calculated as follows:

d^{t+1} = exp[(t_max − t) / t_max] − (t / t_max). (12)
During the local exploitation stage, the individual position is updated by Equation (13):

x_i^{t+1} = x_best + F × C_2 × rand × acc_{i-norm}^{t+1} × d × (T × x_best − x_i^t), (13)
where C_2 is a fixed constant and F is the direction factor that determines the update direction of the individual position, which is constructed as follows:

F = +1 if p ≤ 0.5, F = −1 if p > 0.5,

where p = 2 × rand − C_4 and C_4 is a fixed constant. T = C_3 × TF increases with the number of iterations, and T ∈ [C_3 × 0.3, 1].

3.2. Improved Archimedes Optimization Algorithm. In the basic AOA, the update of the optimal individual of the population depends on the update of the population in each iteration. After each iteration, the optimal individual is replaced by the individual with the best fitness, and the algorithm does not actively perturb the optimal individual. When the optimal individual of the population falls into a local extremum region, the algorithm falls into a local optimum, and premature convergence occurs [42]. Therefore, this paper introduces corresponding improvement strategies to remedy these defects of the basic AOA. Firstly, Sine chaos mapping and a reverse learning strategy are used to initialize the population, which enhances population diversity and improves the solving efficiency. Secondly, Gaussian mutation and a superior selection strategy are used to perturb the position of the optimal individual, which enhances the global search ability and helps the population jump out of local optima. In this paper, the improved AOA is called IAOA. The specific strategies are as follows.

Sine Chaos Reverse Learning Initialization Strategy.
The population of AOA is initialized by random generation. This leads to an uneven distribution of individuals in the initial population, which affects the later iterative optimization. The Sine chaotic model [43] is a chaotic model with good randomness and ergodicity and an infinite number of mapping folds. Reverse learning [44] obtains the corresponding reverse solution from the current solution; a better initial solution can then be obtained by comparing the two and selecting the better one. In this paper, the Sine chaotic strategy is first used to generate an initial population with better diversity. Second, the reverse population is generated according to reverse learning. Finally, the fitness of the combined population is calculated, and the solutions with the best fitness values are selected as the initial population, which improves the probability of obtaining a good initial solution. The one-dimensional mapping expression of Sine chaos is calculated as follows:

x_{n+1} = (a / 4) × sin(π × x_n), a ∈ (0, 4],
The population X = {X_i, i = 1, 2, ⋯, T}, X_i = {X_ij, j = 1, 2, ⋯, dim} is obtained by mapping the Sine chaotic sequence into the solution space. The population individuals are generated by iterating the Sine map dimension by dimension, X_{i+1,j} = (a / 4) × sin(π × X_{i,j}),
where X_{i+1,j} is the value of dimension j of individual i + 1.
The reverse population can be represented as X* = {X_i*, i = 1, 2, ⋯, T}, X_i* = {X_ij*, j = 1, 2, ⋯, dim}. The reverse population individual X_ij* can be calculated by the following:

X_ij* = X_min,j + X_max,j − X_ij,
where [X_min,j, X_max,j] is the dynamic search boundary of the population.
The new population {X ∪ X*} is formed from the Sine chaotic population X and the reverse population X*. The fitness values of the new population are ranked, and the N individuals with the best fitness values are selected to form the initial population.
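The initialization strategy above can be sketched as follows. This is a minimal sketch under stated assumptions: the Sine-map coefficient is taken as a = 4, the chaotic seed interval is chosen to avoid the map's fixed point at 0, and the sphere function used in the test is purely illustrative.

```python
import numpy as np

def sine_opposition_init(f, n, dim, lb, ub, seed=0):
    """Sine-chaos population plus its reverse (opposition) population; the n
    fittest of the 2n candidates become the initial population."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.1, 0.9, dim)        # seed away from the map's fixed point 0
    pop = np.empty((n, dim))
    for i in range(n):
        z = np.sin(np.pi * z)             # Sine map with a = 4: z <- (a/4)*sin(pi*z)
        pop[i] = lb + z * (ub - lb)       # map chaotic values into the search space
    opp = lb + ub - pop                   # reverse individuals: X* = X_min + X_max - X
    both = np.vstack([pop, opp])
    fit = np.apply_along_axis(f, 1, both)
    return both[np.argsort(fit)[:n]]      # keep the n best of population + reverse
```

Because each chaotic individual is paired with its mirror image across the search interval, at least one of the pair tends to land near a promising region, which is the intuition behind the improved initial diversity.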

Gaussian Operator and Superior Selection Strategy.
The Gaussian operator [45,46] is introduced in this paper to prevent AOA from falling into local optima and to maintain the diversity of individuals in the population. The current optimal solution X_best^t is subjected to Gaussian mutation with a certain probability p, and a superior (greedy) selection strategy is adopted. The expression of the Gaussian mutation operator is calculated as follows:

X_i^{t+1} = X_best^t × (1 + Gauss(δ)),

where X_i^{t+1} denotes the individual position after mutation and Gauss(δ) is a random variable satisfying a Gaussian distribution. The global optimal solution position is updated as follows,
where rand_1 is a random variable in [0, 1], p is the probability of superior selection, and f(·) is the individual fitness value. Mutation of the global optimal solution therefore helps the algorithm avoid falling into a local optimum and effectively improves the search efficiency. In order to clearly express the IAOA process, the framework of IAOA is shown in Algorithm 2.
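The mutation-plus-superior-selection rule can be sketched as below. The mutation probability p and scale sigma are assumed values chosen for illustration, not taken from the paper, and the sphere function in the test is likewise illustrative.

```python
import numpy as np

def gaussian_perturb_best(f, x_best, p=0.5, sigma=0.1, seed=0):
    """Gaussian mutation of the current best with superior (greedy) selection:
    the mutant replaces x_best only when its fitness is better (minimization)."""
    rng = np.random.default_rng(seed)
    if rng.random() < p:                                   # mutate with probability p
        # X' = X_best * (1 + Gauss(delta)), the Gaussian mutation operator
        trial = x_best * (1.0 + rng.normal(0.0, sigma, x_best.shape))
        if f(trial) < f(x_best):                           # keep only an improvement
            return trial
    return x_best
```

Because the mutant is accepted only on improvement, the perturbation can never worsen the incumbent best, which is exactly the superior selection behavior described above.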

IAOA Validation and Comparison.
In order to fully verify the effectiveness of the IAOA proposed in this paper, the improved Archimedes optimization algorithm, the Archimedes optimization algorithm, the sparrow search algorithm [47], and the bald eagle search algorithm [48] are compared under thirteen benchmark functions. The selected benchmark functions fall into three categories. The first category comprises the single-peak benchmark functions F1-F5 in Table 3. The second category comprises the multipeak benchmark functions F6-F10 in Table 3. The third category comprises the fixed-dimension multimodal benchmark functions F11-F13 in Table 3. The basic parameters of the algorithms are as follows: the population size is 30, and the maximum number of iterations is 500. The other parameters of each algorithm are shown in Table 4. The experimental results are presented in Tables 5 and 6. The optimization ability of an algorithm is reflected by the optimal value and the average value, and its stability is reflected by the standard deviation. Firstly, for the five single-peak functions, IAOA has higher convergence accuracy and stability than the other algorithms. Secondly, for the multipeak functions F6 and F8, IAOA is able to reach the theoretical optimum. For the other multipeak functions, IAOA has the best search accuracy and stability. For the fixed-dimension functions, IAOA is also better than the other algorithms. Therefore, the improvement strategies proposed in this paper improve the performance of the algorithm to some extent.

Algorithm 2 (IAOA). Input: Initialize the algorithm-related parameters: the maximum number of iterations T_max, the population search boundary [ub, lb], the parameters C_1, C_2, C_3, C_4, density (den), volume (vol), and acceleration (acc).
1: The population is initialized by using the Sine chaos reverse learning strategy
2: while (t < T_max)
3: for i = 1 to N do
4: The den and vol are updated by Equations (5) and (6);
5: The TF is calculated by Equation (7);
6: The d is calculated by Equation (12);

A deep belief network (DBN) is a deep network formed by stacking multiple restricted Boltzmann machines (RBMs) [49]. The learning process of a DBN can be divided into pretraining and fine-tuning. During pretraining, each RBM is trained individually by an unsupervised learning algorithm in turn, and the network parameters of each layer are gradually adjusted. In the fine-tuning process, the classification labels are used as the output layer of the DBN. A BP neural network is trained from top to bottom, and the training error is propagated back to the RBMs to fine-tune the parameters of all layers and reach the global optimum.

The structure of an RBM is shown in Figure 4. There are bidirectional connections between the visible and hidden layers, while there are no connections between units in the same layer. In an RBM, a weight ω between any two connected neurons in the visible layer and the hidden layer represents the connection strength, and each neuron has a bias coefficient a (for the neurons in the visible layer) or b (for the neurons in the hidden layer). Therefore, the energy function of each RBM is calculated as follows:

E(v, h; θ) = −Σ_{i=1}^{n} a_i v_i − Σ_{j=1}^{m} b_j h_j − Σ_{i=1}^{n} Σ_{j=1}^{m} v_i ω_ij h_j,

where θ represents the parameter set of the RBM, including the state v_i and bias a_i of the visible layer and the state h_j and bias b_j of the hidden layer, ω_ij is the connection weight between visible-layer node i and hidden-layer node j, and n and m represent the number of neurons in the visible layer and the hidden layer, respectively.
According to the energy function of the RBM, the joint distribution of the visible layer and the hidden layer is calculated as follows:

p(v, h; θ) = e^{−E(v, h; θ)} / T(θ),
where T(θ) = Σ_{v,h} e^{−E(v, h; θ)} is the normalization factor (partition function). The marginal probability distribution of the visible layer is calculated as follows:

p(v; θ) = (1 / T(θ)) Σ_h e^{−E(v, h; θ)}.
There are no connections between nodes in the same layer of an RBM, so the conditional probability distributions of the neurons in the visible layer and the hidden layer are as follows:

p(h_j = 1 | v) = ε(b_j + Σ_i v_i ω_ij), p(v_i = 1 | h) = ε(a_i + Σ_j h_j ω_ij),

where ε(x) = 1 / (1 + exp(−x)) is the sigmoid function. The goal of RBM training is to make the Gibbs distribution represented by the RBM network as close as possible to the distribution of the original data, i.e., to maximize p(v).
The network structure parameters θ = {a_i, b_j, ω_ij} of the RBM can be obtained using the maximum likelihood estimation method, and the parameter set θ can be updated by the contrastive divergence method, as expressed by the following,
where ⟨·⟩_{p(v|θ)} represents the expected value of the partial derivative under the distribution p(v | θ).
The model parameters are updated as follows:

ω_ij ← ω_ij + η(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_recon),
a_i ← a_i + η(⟨v_i⟩_data − ⟨v_i⟩_recon),
b_j ← b_j + η(⟨h_j⟩_data − ⟨h_j⟩_recon),

where η is the learning rate, ⟨·⟩_data is the expectation over the training data, and ⟨·⟩_recon is the expectation over the reconstruction.

The established survival prediction model for patients with ESCC is shown in Figure 6: the IAOA initializes its population, updates accelerations and positions by Equations (8)-(13), perturbs the best individual by Gaussian mutation, and finally outputs the optimal DBN weight parameters. To verify the validity of this model, the Archimedes optimization algorithm-deep belief network (AOA-DBN), sparrow search algorithm-deep belief network (SSA-DBN), particle swarm optimization-deep belief network (PSO-DBN) [50], bald eagle search-deep belief network (BES-DBN), improved Archimedes optimization algorithm-support vector machine (IAOA-SVM), and improved Archimedes optimization algorithm-backpropagation neural network (IAOA-BPNN) are used for comparison. The initial population of AOA, SSA, PSO, and BES is uniformly set to 20, and the maximum number of iterations is 500. The dataset is divided into ten parts, and the tenfold cross-validation method is used to verify the classification accuracy of each model. The prediction results of the DBNs optimized by the five optimization algorithms and of the IAOA-SVM and IAOA-BPNN models are shown in Table 7.
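The parameter update rules above correspond to one step of contrastive divergence (CD-1), which can be sketched as follows. This is a minimal sketch assuming binary visible and hidden units; the learning rate eta and all shapes are illustrative, and in the full model the learning rate is one of the parameters the IAOA tunes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, eta=0.1, seed=0):
    """One contrastive-divergence (CD-1) update of RBM parameters for a batch
    of binary visible vectors v0, matching the <.>_data - <.>_recon rules above.
    Shapes: v0 (batch, n), W (n, m), a (n,), b (m,); eta is the learning rate."""
    rng = np.random.default_rng(seed)
    ph0 = sigmoid(v0 @ W + b)                         # p(h = 1 | v0): data phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    pv1 = sigmoid(h0 @ W.T + a)                       # reconstruction p(v = 1 | h0)
    ph1 = sigmoid(pv1 @ W + b)                        # hidden probabilities of the reconstruction
    batch = len(v0)
    W = W + eta * (v0.T @ ph0 - pv1.T @ ph1) / batch  # <v h>_data - <v h>_recon
    a = a + eta * (v0 - pv1).mean(axis=0)             # visible-bias update
    b = b + eta * (ph0 - ph1).mean(axis=0)            # hidden-bias update
    return W, a, b
```

Stacking several such RBMs and fine-tuning with backpropagation yields the DBN whose learning rate and batch size the IAOA then optimizes.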
When the eleven patient indicators are used as input, the IAOA-DBN model achieves the highest prediction accuracy among the compared models, as shown in Table 7. To better demonstrate the effectiveness of the proposed model, the Wisconsin Diagnostic Breast Cancer (WBCD) dataset is also used for testing. In this dataset, 30 indicators of each patient are used as input, and the benign or malignant status of the tumor is used as output. The dataset is divided into ten parts, and the tenfold cross-validation method is used to verify the performance of the model. The test results are shown in Table 8. From the test results, it can be seen that IAOA-DBN has higher prediction accuracy than the other models. Therefore, the survival prediction model proposed in this paper can effectively predict the prognosis of cancer patients.

Conclusions
A novel survival prediction model for patients with ESCC is presented in this paper. Firstly, the minimum redundancy and maximum relevancy algorithm is used to screen out indicators significantly correlated with patient survival, which is validated by Cox regression analysis. Secondly, an IAOA-DBN model is proposed. The model uses the IAOA to optimize the parameters in the DBN training process, which improves the stability and classification accuracy of the DBN model. Finally, the model is applied to the survival prediction of patients with ESCC. Comparisons with the other models verify the validity and superiority of the proposed model. The key conclusions are as follows.
(1) The patients' clinical indicators are ranked by importance using the minimum redundancy and maximum relevancy algorithm, and a new subset of features is selected. The experimental results show that the new feature subset yields better prediction results than the full feature set. (2) Aiming at the problems of the poor convergence accuracy of AOA and its tendency to fall into local optima, an improved AOA (IAOA) is proposed in this paper.
The experimental results show that the improvement strategies proposed in this paper enhance the performance of AOA to a certain extent. (3) The learning rate α and batch size β of the DBN are optimized using the IAOA to obtain the optimal parameters, which improves the classification prediction accuracy and stability of the DBN model. Compared with AOA-DBN, SSA-DBN, PSO-DBN, and BES-DBN, the results verify the effectiveness and superiority of the IAOA-DBN model.

Data Availability
The datasets presented in this article are not readily available because the data used in the study are private and confidential data. Requests to access the datasets should be directed to Junwei Sun, junweisun@yeah.net.

Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.