Multistrategy Improved Sparrow Search Algorithm Optimized Deep Neural Network for Esophageal Cancer

Deep neural networks are complex pattern recognition systems widely favored by scholars for their strong nonlinear fitting ability. However, deep neural network models trained on small datasets typically perform worse than shallow neural networks. In this study, a strategy for improving the sparrow search algorithm based on the iterative map, iterative perturbation, and Gaussian mutation is developed. Validated on fourteen benchmark functions, the improved sparrow search algorithm achieves the best search accuracy and the fastest convergence speed. This improved sparrow search algorithm is then designed to optimize deep neural networks: it searches for the optimal connection weights of the deep neural network. The resulting algorithm is applied to an esophageal cancer dataset together with six other algorithms. The proposed model achieves 0.92 under all eight scoring criteria, outperforming the other six algorithms. Therefore, a deep neural network optimized by a sparrow search algorithm improved with the iterative map, iterative perturbation, and Gaussian mutation is an effective approach for predicting the survival rate of esophageal cancer patients.


Introduction
Esophageal cancer is a malignant tumor with a high incidence, high mortality, and high recurrence rate [1]. The treatment of cancer patients is mainly based on physicians' experience, which inevitably leads to treatment errors [2]. How to effectively forecast the survival time of esophageal cancer patients to decrease the misdiagnosis rate is a current research hotspot [3][4][5][6]. In recent years, the rapid advancement of artificial intelligence has made the construction of intelligent systems a simple and feasible task [7]. Artificial intelligence can now simulate human intelligence accurately enough to learn from and make predictions on medical datasets [8,9]. In particular, deep learning excels at complex machine learning tasks by building multilayer neural networks [10,11]. Deep learning has made remarkable progress and achieved excellent performance in areas such as medical image processing, biological image processing, and target detection [12][13][14][15]. Since deep learning is adept at handling complex nonlinear problems, it can perform comparably to or even better than professional physicians in disease identification and disease prediction [16,17]. Therefore, survival prediction models based on deep learning will hopefully provide a scientific basis for clinical medical decisions in esophageal cancer.
The neural network [18] is the basis of deep learning, a mathematical simplification of the single-layer perception of human nerve cells. The deep neural network (DNN), with multiple hidden layers, has been continuously developed by researchers exploring the human nervous system [19,20]. DNNs have been used to identify different ECG abnormalities, with identification results better than those of cardiologists and emergency physicians [21]. A DNN model is effective in distinguishing tumors from hyperplastic polyps, and the identification time of the model is shorter than that of endoscopy [22]. A 3D-DNN has achieved progress in automated lung cancer diagnosis from computed tomography: the model can identify all suspicious lung pathologies and evaluate the grades of lung malignancies [23]. DNNs have exhibited state-of-the-art performance in areas such as medical image recognition [24], cancer diagnosis, pathology examination [25], stock price prediction [26], and daily plant transpiration estimation [27]. However, DNN models trained on small datasets typically exhibit poorer performance than traditional machine learning approaches, such as shallow neural networks and support vector machines.
Gradient descent is a common optimization approach for neural network learning, which cleverly utilizes gradients to find function minima [28]. Nevertheless, when gradient descent is used as the optimization method for a DNN, it usually struggles to escape local minima and converges slowly. Metaheuristic algorithms can effectively eliminate these problems by obtaining the optimal solution through global search [29]. Metaheuristic algorithms are extensively applied in function optimization, fuzzy logic systems, and image processing [30][31][32]. A DNN model optimized by the Gray Wolf Optimization (GWO) algorithm has been applied to extract and classify features from a CAD image dataset of diabetic retinopathy [33]; the model has advantages in terms of accuracy, precision, recall, sensitivity, and specificity. The gravitational search algorithm (GSA) plays an essential role in enhancing the prediction accuracy of DNN models [34]; the resulting model more precisely differentiates between benign and malignant nodules in CT lung images. The Whale Optimization Algorithm (WOA) applied to DNNs has also achieved significant results [35]. Dimensionality reduction by principal component analysis (PCA) and the firefly algorithm has been performed on a diabetic retinopathy dataset [36]; the simplified dataset is then sent to a DNN model for classification, and the DNN model outperforms other machine learning algorithms in accuracy, precision, recall, sensitivity, and specificity. Using an optimization algorithm for feature dimensionality reduction is a common neural network optimization method. However, dimensionality reduction has a significant impact on the results, and it may make the DNN perform even worse than traditional methods on small-sample datasets [37]. This makes the application of DNNs to small-sample datasets a challenge.
In this study, a metaheuristic-based method that reduces the mean square error of DNN models by changing the interlayer connection weights is developed, which improves the accuracy of models on small datasets.
In this study, a chaotic map is introduced into the sparrow search algorithm (SSA), which increases the sample diversity of the sparrow population and improves its even distribution. This gives the SSA the ability to escape local optima and a higher convergence speed. Gaussian mutation and chaotic perturbation are also introduced into the SSA, which makes it possible to adjust aggregated sparrow individuals; the local search ability of SSA in the focal search region is thereby enhanced. The iterative map is employed in the SSA, and its optimal performance is verified on fourteen benchmark functions. The iterative map, iterative perturbation, and Gaussian mutation optimized SSA (IIGSSA) is proposed, and IIGSSA is adopted to find the connection weights of the DNN. To evaluate the algorithm performance, IIGSSA-DNN is applied to the esophageal cancer dataset along with eight predictive classification algorithms: DNN, PSO-DNN, GSA-DNN, GWO-DNN, WOA-DNN, SSA-DNN, IIGSSA-K Nearest Neighbor (KNN), and IIGSSA-Support Vector Machine (SVM). These models use 26 features as input and survival time as output. Because of the small sample size used in this paper, five-fold cross-validation is selected to evaluate the authenticity of the model performance. Nine scoring criteria are used as judging criteria: accuracy (Acc), false positive rate (FPR), recall (REC), true positive rate (TPR), precision (PRE), true negative rate (TNR), area under the curve (AUC), F1-measure (F1-M), and pooled mean (G-M) [38]. The IIGSSA-DNN model has an FPR value of 0.1, a P-value of 0.01, and a value of 0.92 under all other scoring criteria. The model is thus shown to possess superior predictive accuracy and statistical value. Therefore, IIGSSA-DNN is an algorithm that accurately predicts the survival time of esophageal cancer patients and is expected to become a novel approach for the future clinical treatment of esophageal cancer.
A mixed metaheuristic algorithm is formulated to improve the prediction algorithm of the DNN network structure. The iterative map, iterative perturbation, and Gaussian mutation optimized SSA (IIGSSA) is developed, and the optimal network structure of the DNN can be determined by IIGSSA. IIGSSA-DNN is expected to be an enabling instrument for the clinical diagnosis and treatment of esophageal cancer; the application of IIGSSA-DNN to nonimage esophageal cancer datasets is also an innovation. The primary contributions of this study can be summarized as follows: (1) The iterative map, iterative perturbation, and Gaussian mutation optimized SSA, which has the fastest convergence rate and the best accuracy, is validated on fourteen benchmark functions. (2) IIGSSA is used as an optimization strategy to improve model accuracy by finding the optimal connection weights of the DNN model. (3) IIGSSA-DNN is applied to the esophageal cancer dataset, and it predicts the survival time of esophageal cancer patients more precisely than the other algorithms.
The remainder of this study is structured as follows. Section 2 introduces the SSA and the improvement ideas and identifies the iterative map, iterative perturbation, and Gaussian mutation as the best optimization strategy on fourteen benchmark functions. Section 3 describes the principles and improvement ideas of the DNN and applies IIGSSA-DNN to the esophageal cancer dataset. The conclusions are described in Section 4.

Development and Validation of an Improved SSA Based on Iterative Map, Iterative Perturbation, and Gaussian Mutation

2.1. SSA. SSA [39,40] is a swarm intelligence optimization algorithm that simulates the foraging and antipredation behavior of sparrows. The sparrow population achieves foraging behavior through three task divisions: discovery, following, and alert. Discoverers are leaders in the population due to their high fitness, which is associated with their capacity to seek out and provide the location and direction of food resources. Epigones follow and forage around the discoverers for greater fitness. Epigones in a population supervise the behavior of other individuals, and they compete with high-intake peers for food resources to improve their predation rates. When the whole population is threatened by predators or perceives danger, the sparrows immediately take countermeasures. The sparrows on the outer ring of the population are vulnerable to predators, and they need to constantly relocate toward the center of the population. A sparrow in the center of the population adjusts its position to keep its distance from the others as short as possible. SSA simulates this foraging process to obtain the solution of the optimization problem.
Suppose a population of N sparrows searches in a D-dimensional search space; then, the position of the i-th sparrow is X_i = [x_i1, x_i2, ..., x_id, ..., x_iD], i = 1, 2, ..., N, where x_id stands for the position of the i-th sparrow in the d-th dimension. SSA takes 10%–20% of the sparrow population as discoverers, and their position update equation is as follows:

x_id^(t+1) = x_id^t · exp(−i / (α · T)),  if R_2 < S_T;
x_id^(t+1) = x_id^t + Q · L,              if R_2 ≥ S_T,   (1)

where t is the current iteration number, T is the maximum number of iterations, α is a uniform random number in (0, 1], Q is a random number following the standard normal distribution, and L is a 1 × D matrix whose elements are all 1. R_2 ∈ [0, 1] is the warning value, and S_T ∈ [0.5, 1] is the safety value. If R_2 < S_T, no predators or other hazards are near the population and the current search environment is safe, so the discoverers conduct extensive searches to guide the population toward higher fitness. If R_2 ≥ S_T, the epigones have spotted predators and quickly release danger signals to remind the population to act immediately against predation; the population adjusts its search strategy and quickly moves toward the safe area. The update equation for the epigone (follower) position is as follows:

x_id^(t+1) = Q · exp((xw_d^t − x_id^t) / i^2),             if i > n/2;
x_id^(t+1) = xb_d^(t+1) + |x_id^t − xb_d^(t+1)| · A+ · L,  otherwise,   (2)

where A+ = A^T (A A^T)^(−1) and A is a 1 × D vector whose elements are randomly assigned the value 1 or −1. xw_d^t is the worst position of the sparrows in the d-th dimension at the t-th iteration, and xb_d^(t+1) is the best position of the sparrows in the d-th dimension at the (t + 1)-th iteration. If i > n/2, the i-th epigone is starving without food and shifts to another area to obtain higher fitness. If i ≤ n/2, the i-th epigone has better fitness at the current best location and randomly seeks a position near the best location for foraging.
10%–20% of the population is responsible for reconnaissance, and their positions are updated as follows:

x_id^(t+1) = xb_d^t + β · |x_id^t − xb_d^t|,                      if f_i ≠ f_g;
x_id^(t+1) = x_id^t + K · (|x_id^t − xw_d^t| / (f_i − f_w + ε)),  if f_i = f_g,   (3)

where β is a step size control parameter, a normally distributed random number with mean 0 and variance 1. K is a random number in the interval [−1, 1] indicating the direction of the sparrow's flight. ε is a small constant that prevents the denominator from being zero. f_i is the fitness value of the i-th sparrow, f_g is the best fitness value of the current sparrow population, and f_w is the worst fitness value of the current sparrow population. If f_i ≠ f_g, the sparrow is on the edge of the population and vulnerable to predator attack, so it repositions itself to avoid the attack. If f_i = f_g, the sparrow is in the middle of the population; to avoid being attacked, sparrows in this area adjust their search strategy by approaching other sparrows in time after realizing the threat of predators.
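The three role updates above can be sketched in Python. This is an illustrative sketch of the standard SSA rules with variable names chosen by us, not the authors' implementation:

```python
import numpy as np

def update_discoverer(x, i, T, R2, ST, rng):
    """Discoverer update, eq. (1): wide search when safe (R2 < ST), jump when alarmed."""
    alpha = rng.uniform(1e-8, 1.0)              # uniform random in (0, 1]
    Q = rng.standard_normal()                   # standard normal scalar
    if R2 < ST:
        return x * np.exp(-(i + 1) / (alpha * T))   # i+1 because Python indexes from 0
    return x + Q * np.ones_like(x)              # Q * L, with L a row of ones

def update_follower(x, i, n, xw, xb, rng):
    """Follower update, eq. (2): starving followers fly elsewhere, others forage near the best."""
    Q = rng.standard_normal()
    if i > n / 2:
        return Q * np.exp((xw - x) / i ** 2)
    A = rng.choice([-1.0, 1.0], size=x.shape)   # entries of A are randomly +1 or -1
    A_plus = A / np.sum(A * A)                  # A+ = A^T (A A^T)^(-1) for a row vector
    return xb + np.abs(x - xb) * A_plus

def update_scout(x, xb, xw, fi, fg, fw, rng, eps=1e-50):
    """Scout update, eq. (3): edge sparrows move toward the best, central ones step away from the worst."""
    if fi != fg:
        beta = rng.standard_normal()            # step control ~ N(0, 1)
        return xb + beta * np.abs(x - xb)
    K = rng.uniform(-1.0, 1.0)
    return x + K * np.abs(x - xw) / (fi - fw + eps)
```

Each function maps one sparrow's current position to its next position; a full SSA loop applies them to the discoverer, follower, and scout subsets of the population in turn.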

2.2. Chaotic Map, Chaotic Perturbation, and Gaussian Mutation Strategies. Chaos [41] is a complex nonlinear motion ubiquitous in nature. This nonlinear phenomenon usually occurs under certain conditions, making an ordered trajectory deviate suddenly from its original path into a disordered form. The chaotic map is favored by scholars for its randomness, ergodicity, and regularity, and for its capacity to sustain rich population variety. Optimizing a metaheuristic algorithm with a chaotic map enables the algorithm to escape local optima while gaining higher global search capability. Nine common chaotic maps are examined in this study to optimize SSA.
Chaotic perturbation introduces a random perturbation quantity obeying a chaotic distribution into the original solution. A chaotic variable is derived from the chaotic map, and the chaotic variable is brought into the solution space by (13):

X_n^d = m_min + C · (m_max − m_min),   (13)
where m_min is the minimum value of the variable X_n^d in the d-th dimension, m_max is the maximum value of the variable X_n^d in the d-th dimension, X_n^d is the amount of chaotic perturbation generated for the solution in the d-th dimension, and C is the chaotic variable. The chaotic perturbation equation is

X_n' = (X' + X_n) / 2,   (14)

where X' is the individual requiring chaotic perturbation, X_n is the amount of chaotic perturbation generated, and X_n' is the individual after chaotic perturbation. Gaussian mutation [51] is an optimization strategy adapted from the mutation operation of genetic algorithms. The Gaussian mutation operation generates random numbers obeying a normal distribution and applies them to the original position vector to produce new positions. It performs a neighborhood search in a small range, so that most of the mutated positions are distributed around the original position. This gives the optimization algorithm high accuracy and makes it difficult to fall into local optima. The small fraction of mutation operators far from the current position makes the search of potential regions more effective and the population diversity richer. Therefore, Gaussian mutation is exploited to modify the algorithm, which results in a faster search speed and a faster convergence trend. The Gaussian mutation equation is

x' = x · (1 + G),   (15)

where x is the initial parameter value and G is a Gaussian random number with an expectation of 0 and a standard deviation of 1.
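These operators are small enough to write out directly. The following sketch assumes the common functional forms of the iterative map and of equations (13)–(15); the parameter a = 0.7 for the iterative map is a conventional choice, not stated in the text:

```python
import numpy as np

def iterative_map(c, a=0.7):
    """One step of the iterative chaotic map: c_{k+1} = sin(a * pi / c_k)."""
    return np.sin(a * np.pi / c)

def chaotic_to_solution_space(c, m_min, m_max):
    """Eq. (13): map a chaotic variable C onto the variable range [m_min, m_max]."""
    return m_min + np.abs(c) * (m_max - m_min)

def chaotic_perturbation(x, x_n):
    """Eq. (14): blend an individual X' with the chaotic perturbation amount X_n."""
    return (x + x_n) / 2.0

def gaussian_mutation(x, rng):
    """Eq. (15): scale the position by (1 + G), with G ~ N(0, 1) per component."""
    return x * (1.0 + rng.standard_normal(size=np.shape(x)))
```

Iterating `iterative_map` from a nonzero seed in (0, 1) produces the chaotic sequence used both to initialize the population and to drive the perturbation.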

2.3. An Improved SSA. SSA is an algorithm with a simple structure, easy implementation, few control parameters, and strong local search ability. It obtains the initial positions of the sparrows by random initialization. Although this approach ensures the randomness of the initial positions, the initial positions of some individuals may differ too much from the actual optimum, which reduces the convergence speed and the accuracy of the solution. Blind generation of initial positions is also prone to overlapping aggregation of initial solutions, leading to low coverage of the solution space and a low rate of change among population individuals. A pseudorandom number generator is an ideal information source with excellent statistical and stochastic properties. The chaotic map has high randomness and is easy to implement, and it can randomly generate chaotic numbers between 0 and 1; chaotic maps are therefore well suited as pseudorandom number generators. Introducing a chaotic map into SSA can effectively mitigate the blind-initialization problem and increase the global search capability of SSA. The Gaussian mutation strategy strengthens the local search capability of the population and improves search accuracy. To guard against the stagnation caused by local optima, the chaotic perturbation strategy is introduced into SSA: some locally optimal individuals are endowed by chaotic perturbation with a "new dynamism" capable of stepping outside the local optimum. The optimization strategy in SSA directly affects convergence precision, search capability, and speed, so strategy selection is crucial to the performance of SSA.
In this study, a chaotic Gaussian sparrow search algorithm (CGSSA) based on a multistrategy fusion mechanism is developed by introducing the chaotic map, Gaussian mutation, and chaotic perturbation strategies. The detailed steps of the CGSSA execution are described below.
Step 1. Initialize the population size N, the number of discoverers P a , the number of scouting warning sparrows S a , the dimensionality of the objective function D, the upper bound ub, lower bound lb of the initial value, and the maximum number of iterations T.
Step 2. Initialize the sparrow population with chaotic sequences to generate an N × D dimensional matrix Z. Each component of Z is mapped into the defined range of values by equations (4)-(12).
Step 3. Calculate the fitness f_i of each sparrow, and select the current optimal fitness f_b and its corresponding position X_b.
Step 4. Select the top P a sparrows with the best fitness as discoverers and the rest as followers. Update the positions of discoverer and follower according to (1) and (2).
Step 5. Randomly select S a sparrows from the sparrow population as reconnaissance alerts. Update their positions according to (3).
Step 6. Recompute the fitness value of individual sparrows and the average fitness value of the sparrow population after each iteration.
Step 7. If f_i < f_a, perform a Gaussian mutation operation on the aggregated sparrow individuals according to (15). Compare the post-mutation individuals with the pre-mutation individuals and determine whether to accept the post-mutation positions. If f_i ≥ f_a, perform a chaotic perturbation operation on the dispersed sparrow individuals by (4)-(12). Compare post-perturbation individuals with pre-perturbation individuals and determine whether to accept the post-perturbation positions.
Step 8. Get the current state of the sparrow population. Update the optimal position X b and its fitness f b by the whole sparrow population.
Step 9. If the algorithm runs to the maximum number of iterations, end the loop and output the search results. Otherwise, return to Step 4.
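Steps 1–9 can be condensed into a runnable sketch. This is a simplified reading of the procedure on a toy objective: the scout rule is abbreviated, the chaotic perturbation is approximated by blending with a random point, and all parameter choices are illustrative rather than the authors' implementation:

```python
import numpy as np

def cgssa_minimize(f, dim, lb, ub, n=30, T=200, ST=0.8, seed=0):
    """Simplified CGSSA loop following Steps 1-9 (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: iterative-map chaotic initialization mapped into [lb, ub]
    c = rng.uniform(0.1, 0.9, size=(n, dim))
    for _ in range(10):
        c = np.sin(0.7 * np.pi / c)
    X = lb + np.abs(c) * (ub - lb)
    fit = np.apply_along_axis(f, 1, X)
    best, fbest = X[fit.argmin()].copy(), fit.min()           # Step 3
    for t in range(T):
        order = np.argsort(fit)                               # rank by fitness
        X, fit = X[order], fit[order]
        pd = max(1, int(0.2 * n))
        R2 = rng.uniform()
        for i in range(pd):                                   # Step 4: discoverers
            if R2 < ST:
                X[i] = X[i] * np.exp(-(i + 1) / (rng.uniform(1e-8, 1.0) * T))
            else:
                X[i] = X[i] + rng.standard_normal()
        for i in range(pd, n):                                # Step 4: followers
            if i > n / 2:
                X[i] = rng.standard_normal() * np.exp((X[n - 1] - X[i]) / (i + 1) ** 2)
            else:
                X[i] = X[0] + np.abs(X[i] - X[0]) * rng.choice([-1.0, 1.0], size=dim) / dim
        for i in rng.choice(n, size=max(1, n // 10), replace=False):
            X[i] = best + rng.standard_normal(dim) * np.abs(X[i] - best)  # Step 5: scouts
        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(f, 1, X)                    # Step 6: recompute fitness
        fa = fit.mean()
        for i in range(n):                                    # Step 7: mutate or perturb
            if fit[i] < fa:
                cand = X[i] * (1.0 + rng.standard_normal(dim))       # Gaussian mutation
            else:
                chaos = lb + rng.uniform(size=dim) * (ub - lb)       # chaotic-style perturbation
                cand = (X[i] + chaos) / 2.0
            cand = np.clip(cand, lb, ub)
            fc = f(cand)
            if fc < fit[i]:                                   # greedy acceptance
                X[i], fit[i] = cand, fc
        if fit.min() < fbest:                                 # Step 8: update global best
            fbest, best = fit.min(), X[fit.argmin()].copy()
    return best, fbest                                        # Step 9: output after T iterations
```

Run on a sphere function, the sketch converges toward the origin, which illustrates how the greedy mutation/perturbation step (Step 7) refines the swarm between role updates.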
The chaotic map strategy is used by CGSSA to initialize the population and improve population diversity. Both the Gaussian mutation and chaotic perturbation strategies are introduced into the SSA to solve the sparrow dispersion and aggregation problems; the local search ability of SSA in the focal search region is enhanced by Gaussian mutation and chaotic perturbation. To find the best chaotic map and chaotic perturbation strategies, this study tests the performance of nine chaotic map and chaotic perturbation strategies, each combined with the Gaussian mutation strategy, in optimizing SSA. The benchmark functions are chosen as the test functions.

2.4. Benchmark Functions Test.
To select the chaotic map best adapted to the SSA, fourteen benchmark functions [52][53][54] are selected in this study, and the original SSA and the nine improved algorithms are validated on them. The fourteen benchmark functions are given in Table 1, and the solution space diagrams of the fourteen benchmark functions are illustrated in Figure 1. The parameters of the 10 algorithms are set as follows. The population size is 30. The maximum number of iterations is 500. The dimension of the objective function and the range of initial values are kept consistent with Table 1. The numbers of discoverers and followers are each set to 20% of the sparrow population size. To avoid contingency in the search outcomes, each benchmark function is tested 20 times independently. The optimal value, mean, and standard deviation of the run results are assessed to determine the robustness of each algorithm. The test outcomes on the fourteen benchmark functions are listed in Table 2, where the bolded data indicate the best value for each function. A comparison of the convergence curves of the 10 algorithms on the benchmark functions is illustrated in Figure 2.
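The 20-run evaluation protocol can be expressed compactly. In this sketch, `optimizer` stands for any of the ten algorithms, taken here as a callable that returns the best fitness found for a given random seed:

```python
import statistics

def benchmark(optimizer, runs=20):
    """Run an optimizer `runs` times and report best value, mean, and standard deviation."""
    results = [optimizer(seed) for seed in range(runs)]
    return {
        "best": min(results),
        "mean": statistics.mean(results),
        "std": statistics.pstdev(results),   # population std; sample std is also common
    }
```

Applying this to each of the fourteen functions for each of the ten algorithms produces exactly the best/mean/std triples tabulated in Table 2.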
From Table 2 and Figure 2, no matter which chaotic strategy is chosen to improve SSA, its convergence accuracy is better than that of the traditional SSA; CGSSA is thus shown to be effective and feasible. Seven functions obtained theoretical optimal values in the ten algorithm tests: F1, F2, F3, F7, F9, F11, and F14. The ten algorithms fall into the same local optimum when testing functions F8 and F10. In addition, the Chebyshev map achieves the minimum optimum on function F6. The iterative map obtains the minimum optimum on functions F4 and F5. The sine map reaches the minimum optimum on two functions, F12 and F13. The iterative map and sine map strategies obtain the highest numbers of minimum optima, which means these two strategies perform best in terms of optimal solutions. The mean and standard deviation are a pair of statistical indicators describing the overall characteristics of the data: the mean reflects the central tendency of the dataset, and the standard deviation reflects its dispersion. Together they characterize the data more comprehensively and accurately. From the mean and standard deviation analysis, the ten algorithms acquire the smallest mean and standard deviation when testing function F10. Except for the sinusoidal map, the remaining nine algorithms obtain the smallest standard deviation on function F11. Seven algorithms obtained the smallest standard deviation on F1: SSA, Chebyshev map, circle map, iterative map, sine map, singer map, and logistic map. Five algorithms obtained the smallest standard deviation on F9: Chebyshev map, circle map, iterative map, logistic map, and cubic map.
Six algorithms obtained the smallest standard deviation on F14: SSA, Chebyshev map, iterative map, sine map, logistic map, and cubic map. In addition, the Chebyshev map reaches the smallest standard deviation on F8. The sine map acquires the smallest standard deviation on two functions, F7 and F11. The iterative map acquires the smallest standard deviation on seven functions: F2, F3, F4, F5, F6, F12, and F13. Eight algorithms reach the same mean when testing function F8: Chebyshev map, circle map, iterative map, sine map, singer map, sinusoidal map, logistic map, and cubic map. In addition, the Chebyshev map reaches the smallest mean on F11. The circle map acquires the smallest mean on F1. The sine map acquires the smallest mean on F7. The iterative map acquires the smallest mean on nine functions: F2, F3, F4, F5, F6, F9, F12, F13, and F14.
The iterative map strategy achieves the highest numbers of minimum means and minimum standard deviations, performing best on both indicators. Therefore, the iterative map, iterative perturbation, and Gaussian mutation optimized SSA (IIGSSA) is adopted for the subsequent studies.

Deep Neural Network. DNN is a method that learns from the neural structure of the brain to mimic its information processing [55,56]. The DNN structure is composed of multiple perceptrons and is also known as a multilayer feedforward neural network. DNN has strong learning, self-learning, and self-adaptive capabilities. DNN is a complex pattern recognition network system capable of simulating more complex models and representing more abstract relationships between things. DNN is appropriate for big data analysis, as reflected in disease prediction, medical image recognition, cancer diagnosis, etc. Structurally, a DNN can be divided into an input layer, multiple hidden layers, and an output layer, as illustrated in Figure 3. I_1, I_2, ..., I_h denote the input layer, which accepts messages from external devices or systems. The j hidden layers effectively assume the role of the computational engine of the entire network. O_1, O_2, ..., O_z denote the output layer, which makes decisions on the inputs. h indicates the number of neurons in the input layer, and z the number of neurons in the output layer. The output of each vector in a network layer is expressed by (16):
Y_m,n = f(W_m,n^T · X_(m−1) + B),   (16)

where Y_m,n is the output value of the n-th biased neuron at layer m, W_m,n is the weight vector of the n-th biased neuron at layer m, X_(m−1) is the output of all neurons of layer m − 1, B is the biased neuron of layer m − 1, and f(·) is the activation function.
To enhance the representativeness and versatility of the model, activation functions are introduced into the DNN to perform nonlinear transformations on the inputs. In this study, the Sigmoid function is selected as the activation function of the DNN to prevent data scattering during transmission [57]. With x as the input, the Sigmoid function is expressed in (17):

f(x) = 1 / (1 + e^(−x)).   (17)
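Equations (16) and (17) translate directly into code (a minimal sketch):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, eq. (17): f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(W, X_prev, B):
    """One layer of eq. (16): Y = f(W^T X_{m-1} + B)."""
    return sigmoid(W.T @ X_prev + B)
```

Stacking `layer_forward` calls, one per hidden layer plus the output layer, gives the full DNN forward pass.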
Training proceeds until the loss function is minimized without overfitting the training data, at which point the optimal model is obtained.
Therefore, weights play a crucial role in DNN training. The loss function is the fundamental criterion for judging the performance of a neural network. The loss function is sensitive to small changes in weights and biases; this sensitivity makes it well suited to guiding weight adjustment [58]. The interlayer weights of the DNN model are optimized by adjusting the error rate, which tunes the adaptability of the model to the current dataset and substantially improves the accuracy of the DNN model output. Mean square error, cross-entropy error, and root mean square error are common loss functions. In this study, the mean square error (MSE) is employed to calculate the error rate, as shown in (18):

MSE = (1/m) Σ_(k=1)^m (o_k − A_k)^2,   (18)

where m is the number of samples in the training dataset, o_k is the model output produced by the k-th input, and A_k is the k-th actual output. The DNN model is trained on the input data, and the weight values obtained are used to predict the output results.
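Equation (18) in code (a minimal sketch):

```python
import numpy as np

def mse(outputs, actuals):
    """Mean square error over m training samples, eq. (18)."""
    outputs = np.asarray(outputs, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    return float(np.mean((outputs - actuals) ** 2))
```

This is the quantity the optimizer drives down: lower MSE on the training set means the current weights fit the data better.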

An Improved DNN Based on IIGSSA
The optimal weights of a DNN have a direct correlation with the prediction results of the model. The accuracy of IIGSSA has been verified in Section 2, and thus a method to improve the DNN based on IIGSSA is proposed. The DNN is mixed with the proposed IIGSSA, which is applied to determine the optimal weights of the DNN and thereby enhance the efficiency of the model. The optimal solution over all sparrows in the IIGSSA algorithm is exploited to update the connection weights of the DNN. The metrics of the esophageal cancer dataset are listed in Table 3, and the discrete metrics in the dataset are indicated in Table 4.
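For IIGSSA to search the weight space, each sparrow's flat position vector must be decoded into the per-layer weight matrices and bias vectors of the DNN. The following sketch shows one way to do this; the layer sizes are hypothetical and the encoding scheme is our illustration, not stated in the paper:

```python
import numpy as np

def n_weights(layer_sizes):
    """Dimension of the search space = total number of weights and biases."""
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

def unflatten_weights(vec, layer_sizes):
    """Slice a sparrow's flat position vector into per-layer (W, b) pairs."""
    params, idx = [], 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = vec[idx: idx + fan_in * fan_out].reshape(fan_in, fan_out)
        idx += fan_in * fan_out
        b = vec[idx: idx + fan_out]
        idx += fan_out
        params.append((W, b))
    return params
```

Each fitness evaluation then decodes the position, runs the DNN forward pass with those weights, and returns the MSE, which is what IIGSSA minimizes.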

Performance Evaluation.
In this study, a modeling method that optimizes the DNN through IIGSSA is proposed. Seventeen blood indicators are provided in the existing esophageal cancer dataset, including WBC, LY, MONO, NEUT, EOS, BASO, RBC, HB, PLT, TP, ALB, GLB, PT, APTT, TT, and FIB. Seven items of tumor information are supplied: tumor length, tumor width, tumor thickness, degree of differentiation, tumor location, transfer situation, and TNM stage. The two physical characteristics are age and gender. These 26 features are used as the input dataset, and the IIGSSA-DNN algorithm is used to build a prognostic model. The overall flow chart of IIGSSA-DNN is illustrated in Figure 5. The robustness and accuracy of IIGSSA-DNN are validated in this study: IIGSSA-DNN is evaluated against existing scoring criteria to measure the strengths and weaknesses of the proposed algorithm. The most commonly used rubrics [59] are Acc, FPR, REC, TPR, PRE, TNR, AUC, F1-M, and G-M. To avoid the limitations and specificity of a fixed dataset division, five-fold cross-validation is employed for objective evaluation of the model. The survival time of esophageal cancer patients is predicted by the proposed IIGSSA-DNN. Five optimization algorithms have performed well in the field of neural networks [60][61][62][63]: the particle swarm optimization (PSO) algorithm, GWO, GSA, WOA, and SSA. These five algorithms are also used to optimize the DNN and are tested on the esophageal cancer dataset in comparison with IIGSSA-DNN [64]. Python is the operating platform for the algorithms mentioned in this paper, and the deep learning framework PyTorch is employed to implement the DNN training. The 26 features are set as the input dataset, survival time and survival state as the output dataset, and a DNN model with 10 layers and 128 nodes is constructed.
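The five-fold protocol can be sketched as follows. This is a simplified sketch in which `fit_predict` stands for any train-and-predict routine (such as the IIGSSA-DNN pipeline) and only accuracy is averaged; the other rubrics follow the same pattern:

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into five folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, 5)

def cross_validate(fit_predict, X, y):
    """Five-fold CV: train on four folds, test on the held-out fold, average accuracy."""
    folds = five_fold_indices(len(X))
    accs = []
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        preds = fit_predict(X[train], y[train], X[test])
        accs.append(float(np.mean(preds == y[test])))
    return float(np.mean(accs))
```

Every sample appears in exactly one test fold, so the averaged score reflects performance on data unseen during training for each fold.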
The Sigmoid function is adopted as the activation function of the DNN. The learning rate of the model is initialized to 0.001 [65] and is adaptively and randomly adjusted by the Adam algorithm. The first-order and second-order moment estimates of the gradient are calculated by the Adam algorithm [66] to adjust the learning rates of the different parameters. In each iteration, the learning rate is limited to a rough range, which makes the parameters more stable. IIGSSA is employed to minimize the MSE and find the optimal weight values. The population size is set to 30, and the maximum number of iterations is set to 1000.
The results of the predictive model assessment are shown in Table 5. The five ROC curves of each of the nine models are displayed in Figure 6: DNN in Figure 6(a), PSO-DNN in Figure 6(b), GSA-DNN in Figure 6(c), GWO-DNN in Figure 6(d), WOA-DNN in Figure 6(e), SSA-DNN in Figure 6(f), IIGSSA-KNN in Figure 6(g), IIGSSA-SVM in Figure 6(h), and IIGSSA-DNN in Figure 6(i). The test accuracy of the nine models is shown in Figure 7.
From Table 5, the performance of IIGSSA-DNN trained on the esophageal cancer dataset is shown to be better than that of the other optimization algorithms. The accuracy of the proposed IIGSSA-DNN is 0.92, outperforming the other hybrid DNN optimization algorithms, so the patient's survival time is well predicted by IIGSSA-DNN. The value of IIGSSA-DNN is 0.92 on all these rubrics, confirming that the model has strong performance in distinguishing between negative and positive samples. F1-M is a synthetic assessment index of model quality; the F1-M value of IIGSSA-DNN is 0.92, higher than that of the other models, confirming that the model has higher quality than the other models listed in this study. The AUC and P-values are represented in Figure 6, where the average ROC curve of each algorithm model is plotted as the blue curve. The AUC of IIGSSA-DNN approaches 1 and the P value is less than 0.05, so the model has strong statistical significance and better classification performance. Consequently, IIGSSA-DNN is shown to be a reliable classification model with much higher performance than the other models listed in this study.
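The scoring criteria above can be computed from confusion-matrix counts. In this sketch, we read G-M as the geometric mean of recall and specificity, which is a common reading but an assumption on our part:

```python
def binary_metrics(tp, fp, tn, fn):
    """Scoring criteria from confusion-matrix counts (binary classification)."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    rec = tp / (tp + fn)                 # recall, identical to TPR
    pre = tp / (tp + fp)                 # precision
    tnr = tn / (tn + fp)                 # specificity
    fpr = fp / (fp + tn)                 # false positive rate
    f1 = 2 * pre * rec / (pre + rec)     # harmonic mean of precision and recall
    gm = (rec * tnr) ** 0.5              # geometric mean of recall and specificity
    return {"Acc": acc, "REC": rec, "PRE": pre, "TNR": tnr,
            "FPR": fpr, "F1-M": f1, "G-M": gm}
```

AUC is computed separately from the ranked prediction scores underlying the ROC curves in Figure 6 rather than from a single confusion matrix.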

Conclusion
DNN is a complex pattern recognition network system. The network structure and optimal weights have a drastic impact on the classification results of a DNN, and how to choose the network structure and connection weights is a daunting task. In this study, a new combination of chaotic map and Gaussian mutation strategies is proposed to improve the search strategy of SSA. The iterative chaotic map is validated on fourteen benchmark functions, and it improves SSA more than the other chaotic maps. IIGSSA is identified as the best strategy to find the optimal fully connected weights of the DNN. IIGSSA-DNN is contrasted with eight algorithms on the esophageal cancer dataset: DNN, PSO-DNN, GSA-DNN, GWO-DNN, WOA-DNN, SSA-DNN, IIGSSA-KNN, and IIGSSA-SVM. IIGSSA-DNN is shown to possess the best performance in predicting the survival time of esophageal cancer patients. The accuracy of IIGSSA-DNN is substantially improved by the self-learning capability of the DNN and the efficient search capability of IIGSSA.
Therefore, IIGSSA-DNN outperforms traditional machine learning algorithms in dealing with complex issues and solves classification and recognition problems more accurately. With this model, doctors can monitor the progress of each patient more keenly, thereby establishing a more benign diagnosis and treatment system. The parameters of the IIGSSA-DNN model rely on empirical selection, which limits the accuracy of prediction. The dataset used in this study is relatively small, and the model should be verified on large datasets and in clinical trials in the future. Taking the IIGSSA-DNN model as an opportunity, a secure smart device application is expected to be developed by researchers in the future. The application could bind the preliminary and retraining information of each patient, enabling doctors to customize personalized diagnosis and treatment schemes for each patient.
Data Availability

The datasets presented in this article are not readily available because the data used in the study are private and confidential. Access to the datasets can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.