Feature Selection and Dwarf Mongoose Optimization Enabled Deep Learning for Heart Disease Detection

Heart disease is a leading cause of death across the globe. Hence, heart disease prediction is a vital part of medical data analysis. Recently, various data mining and machine learning practices have been utilized to detect heart disease. However, these techniques are inadequate for effective heart disease prediction because of deficient test data. To improve detection performance, this research introduces a hybrid feature selection method for selecting the best features. The input data is first preprocessed with quantile normalization, and missing values are filled by missing data imputation. The best features relevant to disease detection are then selected through the proposed hybrid congruence coefficient Kumar-Hassebrook similarity. Finally, heart disease is predicted using SqueezeNet, which is tuned by the dwarf mongoose optimization algorithm (DMOA), an algorithm that mimics the foraging behavior of the dwarf mongoose. The experimental results reveal that the DMOA-SqueezeNet method attained a maximum accuracy of 0.925, sensitivity of 0.926, and specificity of 0.918.


Introduction
Heart disease damages the function and structure of the heart and is a major cause of death around the globe. Several heart diseases lead to heart attacks, the most difficult cardiovascular disease [1]. The heart is a major organ of the human body, pumping blood to every other organ. If the heart does not function properly, the other organs stop working, which leads to death. Hence, the regular functioning of the heart is very important. Heart disease is considered the most important cause of death worldwide [2]. Moreover, heart disease occurs in both women and men. Hence, an efficient heart disease prediction technique helps to reduce the death rate [3,4]. In the medical field, heart disease diagnosis is a complex task, which contributes to the increasing mortality rate. Hence, researchers have introduced automatic disease diagnosis techniques for detecting heart disease. To detect heart disease, researchers gather clinical data from clinical experience, and the detection is done through decision-making methods and the doctor's diagnosis [1].
Recently, various researchers have utilized machine learning, data mining, and deep learning techniques in healthcare for predicting heart disease [5]. Deep learning is an extension of machine learning that is widely used for image and data processing in numerous medical fields [6][7][8][9]. Data mining methods are normally utilized to compute the relationships among numerous factors and to uncover hidden information in the input data [10,11]. Various deep learning models [12] have been applied to achieve significant performance in heart disease prediction [13,14]. For heart disease prediction, feature selection is considered a significant step [13,15]. Classification methods with high reliability and precision offer more assistance in recognizing prospective patients. The commonly used heart disease prediction techniques are logistic regression, clustering algorithms, Naïve Bayes, neural networks, and support vector machines (SVM), which offer substantial performance in heart disease prediction [16,17]. However, missing and uncertain data degrade the performance of the prediction method [4,18]. Deep learning methods, in contrast, perform effectively on massive and unclear datasets. In addition, deep learning techniques help to classify the absence or presence of heart disease [2,19].
This paper explains a heart disease prediction technique based on a newly devised model. The input data is preprocessed using quantile normalization and missing data imputation. Then, the preprocessed data undergoes feature selection, which chooses the relevant features based on the congruence coefficient and the Kumar-Hassebrook similarity. SqueezeNet performs the heart disease prediction, wherein the weights of SqueezeNet are learned by the DMOA, and the detected outcome labels each patient as either normal or abnormal.
The novelty of this research is the proposed hybrid congruence coefficient Kumar-Hassebrook similarity for feature selection. In this research, the best features of the input data are chosen by the hybrid congruence coefficient Kumar-Hassebrook similarity. Here, the preprocessed data is first given to the congruence coefficient, which selects the top-scoring features. Then, these features are passed to the Kumar-Hassebrook similarity, which selects the most appropriate features. In addition, the heart disease prediction is completed by SqueezeNet, which is trained by the DMOA.
The rest of this paper is structured as follows. Section 2 describes the literature survey of heart disease detection, Section 3 explains the proposed methodology, Section 4 presents the results and discussion of the introduced model, and Section 5 concludes the paper.

Literature Survey
The survey of numerous heart disease prediction methods is given as follows. Kora et al. [20] introduced bacterial foraging particle swarm optimization (BF-PSO) for detecting heart disease. Here, the hybrid BF-PSO is designed by integrating bacterial foraging optimization (BFO) with particle swarm optimization (PSO). Although the model provides improved detection accuracy by extracting more relevant features, it required a long training time. Manur et al. [4] modeled the bidirectional long short-term memory with conditional random field (Bi-LSTM-CRF) to predict heart disease. Here, the medical data was examined by the bidirectional LSTM, and the CRF model was employed to compute the relationships among various features. The computation cost of this method was high. Budholiya et al. [21] introduced the XGBoost model for diagnosing heart disease. However, the model failed to process complicated datasets. Oliver et al. [1] introduced the regressive learning-based neural network classifier (RLNNC) for predicting heart disease. This method provided a better detection result, but its computation cost was high.

Challenges
The complications of various prevailing heart disease prediction techniques are given as follows: (i) In [20], the BF-PSO model was introduced to predict heart disease. However, the scheme produced low detection accuracy on very large databases. (ii) In [4], the Bi-LSTM-CRF model was devised to detect heart disease early. However, this method provided poor detection performance because it did not utilize any algorithm for training the classifier. (iii) The limitation of the method in [21] is that it only detects heart disease and was not applied to other similar tasks. (iv) The major challenging step of heart disease prediction is feature extraction. Moreover, using high-dimensional data increases the training time of classifiers.

Proposed Congruence Coefficient Kumar-Hassebrook Enabled Feature Selection and DMOA-SqueezeNet for Heart Disease Detection
This research introduces an effective heart disease detection approach, namely, DMOA-SqueezeNet. Initially, input data is taken from a specific dataset [22] and given to the preprocessing phase, where the data is preprocessed using quantile normalization [23] and missing data imputation. After that, feature selection is done to select the suitable features using the proposed hybrid feature selection scheme, namely, the congruence coefficient Kumar-Hassebrook similarity. Finally, heart disease prediction is performed using SqueezeNet [22], which is trained using an optimization algorithm, namely, DMOA [24]. The block diagram of the newly modeled heart disease detection technique is shown in Figure 1.

Get the Input Data.
The input data is taken from the heart disease dataset $C$, which consists of $d$ heart disease records, and is formulated as

$$C = \{C_1, C_2, \ldots, C_a, \ldots, C_d\}, \tag{1}$$

where $d$ denotes the total number of medical data and $C_a$ specifies the $a$th data, which is considered for forecasting heart disease in this research. The dimension of the original data is $k \times n$.

Computational Intelligence and Neuroscience

Preprocessing.
This step explains the preprocessing of the input data $C_a$ of size $k \times n$ using quantile normalization [23] and missing data imputation. Preprocessing is used to remove redundant data from the input data $C_a$.

Quantile Normalization.
The input data $(C_a)_{k \times n}$ is subjected to quantile normalization [23], a simple procedure for normalizing the input data. The first step is to rank the input data by magnitude and then compute the average of the values that share the same rank. After that, every value occupying a given rank is replaced with the average value for that rank. The final step is to rearrange the data into its original order. The outcome of quantile normalization is indicated as $Q_{k \times n}$.
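The four steps above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: it assumes that each column of the $k \times n$ table is ranked separately, that rank averages are taken across columns, and that ties are broken by row order. The function name `quantile_normalize` and the sample values are hypothetical.

```python
from typing import List


def quantile_normalize(data: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize the columns of a k x n table given as a list of rows."""
    k, n = len(data), len(data[0])
    columns = [[row[j] for row in data] for j in range(n)]

    # Step 1: for every column, list the row indices ordered by magnitude.
    # Ties keep their original row order (an assumption of this sketch).
    order = [sorted(range(k), key=lambda i, c=col: c[i]) for col in columns]

    # Step 2: average the values that share the same rank across columns.
    rank_means = [
        sum(columns[j][order[j][r]] for j in range(n)) / n for r in range(k)
    ]

    # Steps 3 and 4: replace each value by the mean of its rank, writing the
    # result back at the value's original row position.
    result = [[0.0] * n for _ in range(k)]
    for j in range(n):
        for r, i in enumerate(order[j]):
            result[i][j] = rank_means[r]
    return result


# Hypothetical 4 x 3 input table.
C_a = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
Q = quantile_normalize(C_a)
```

After normalization every column holds the same set of values, only arranged according to that column's original ordering.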

Missing Data Imputation.
After the quantile normalization, missing data imputation is performed, which replaces the missing entries of the normalized data $Q_{k \times n}$ with substituted values. Here, the missing values are substituted in two ways: numerical attribute substitution and categorical attribute substitution. For numerical attribute substitution, the mean value of the numerical data is computed and substituted for the missing values. For categorical attribute substitution, the most frequent category is substituted for the missing values. The outcome of missing data imputation is indicated as $X_{k \times n}$.
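A short sketch of the two substitution rules follows. It assumes that missing entries are represented as `None`, that numerical attributes are filled with the column mean, and that categorical attributes are filled with the most frequent observed category. The helper name `impute_column` and the example columns are illustrative only.

```python
from collections import Counter
from typing import List, Union

Value = Union[float, str, None]


def impute_column(column: List[Value]) -> List[Value]:
    """Fill None entries with the mean (numerical) or the mode (categorical)."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # no observed value to derive a substitute from
    if all(isinstance(v, (int, float)) for v in observed):
        # Numerical attribute: substitute the mean of the observed values.
        fill: Value = sum(observed) / len(observed)
    else:
        # Categorical attribute: substitute the most frequent category.
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


# Hypothetical columns with one missing entry each.
ages = impute_column([63.0, None, 41.0, 56.0])
chest_pain = impute_column(["typical", None, "atypical", "typical"])
```

Applying the helper column by column to the normalized table yields a complete table with no missing entries.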

Feature Selection.
After the preprocessing, the processed data contains both relevant and irrelevant features. Not all of these features are necessary for heart disease prediction; the prediction process requires only the meaningful ones. Thus, feature selection is required to select the appropriate and meaningful features. In this research, the suitable features are selected by the proposed hybrid congruence coefficient Kumar-Hassebrook similarity. Initially, the preprocessed data $X_{k \times n}$ is sent to the congruence coefficient, and then the outcome of the congruence coefficient is passed to the Kumar-Hassebrook similarity so that the best features are selected.

Congruence Coefficient.
The congruence coefficient [25] is utilized to select the features from $X_{k \times n}$ by comparing each candidate feature with the target values. It evaluates the similarity of two configurations, and selecting features with it increases the prediction accuracy of the model. The expression for the congruence coefficient is given by

$$CC(P, R) = \frac{\sum_{i=1}^{k} P_i R_i}{\sqrt{\sum_{i=1}^{k} P_i^2 \, \sum_{i=1}^{k} R_i^2}}, \tag{2}$$

where $P$ denotes the candidate feature, $R$ specifies the target values, and $P_i$ and $R_i$ are their $i$th entries. After calculating the congruence coefficient, the top $o$ features with the highest scores are selected as the best features, and the features selected by the congruence coefficient are denoted as $Y_{k \times o}$, where $n > o$.
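The first selection stage can be illustrated with a small sketch. It assumes the usual definition of the congruence coefficient (the sum of products of the two vectors divided by the square root of the product of their sums of squares) and assumes that features are ranked by the raw score. The names `congruence_coefficient` and `top_features`, and the toy data, are hypothetical.

```python
import math
from typing import List, Sequence


def congruence_coefficient(p: Sequence[float], r: Sequence[float]) -> float:
    """Congruence coefficient between a candidate feature p and the target r."""
    numerator = sum(a * b for a, b in zip(p, r))
    denominator = math.sqrt(sum(a * a for a in p) * sum(b * b for b in r))
    return numerator / denominator if denominator else 0.0


def top_features(columns: List[Sequence[float]], target: Sequence[float], o: int) -> List[int]:
    """Return the indices of the o highest-scoring feature columns."""
    scores = [congruence_coefficient(col, target) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:o])


# Toy data: three candidate features over four records, plus the target.
target = [1.0, 0.0, 1.0, 1.0]
features = [
    [2.0, 0.0, 2.0, 2.0],  # proportional to the target
    [0.0, 3.0, 0.0, 0.0],  # orthogonal to the target
    [1.0, 1.0, 1.0, 0.0],  # partially aligned with the target
]
selected = top_features(features, target, o=2)
```

Because the coefficient is invariant to the scale of a feature, the first column scores as high as an exact copy of the target would.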

Kumar-Hassebrook Similarity.
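After selecting the best features $Y_{k \times o}$ using the congruence coefficient, the Kumar-Hassebrook similarity [26] is applied to them to select the most appropriate features. In the Kumar-Hassebrook similarity, the best feature is picked by comparing the candidate feature with the target value, and the expression becomes

$$KH(P, R) = \frac{\sum_{i=1}^{k} P_i R_i}{\sum_{i=1}^{k} P_i^2 + \sum_{i=1}^{k} R_i^2 - \sum_{i=1}^{k} P_i R_i}, \tag{3}$$

where $P$ denotes the candidate feature and $R$ specifies the target value.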
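The second selection stage can be sketched in the same way. The sketch assumes the usual form of the Kumar-Hassebrook measure, namely the inner product of the two vectors divided by the sum of their squared norms minus that inner product. The function names and the toy columns are hypothetical, and the number of retained features `z` is chosen arbitrarily.

```python
from typing import List, Sequence


def kumar_hassebrook(p: Sequence[float], r: Sequence[float]) -> float:
    """Kumar-Hassebrook similarity between a candidate feature p and the target r."""
    dot = sum(a * b for a, b in zip(p, r))
    denominator = sum(a * a for a in p) + sum(b * b for b in r) - dot
    return dot / denominator if denominator else 0.0


def select_by_kh(columns: List[Sequence[float]], target: Sequence[float], z: int) -> List[int]:
    """Return the indices of the z columns most similar to the target."""
    scores = [kumar_hassebrook(col, target) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:z])


# Toy data: two columns assumed to have survived the first stage.
target = [1.0, 0.0, 1.0, 1.0]
candidates = [
    [2.0, 0.0, 2.0, 2.0],  # same direction as the target, different scale
    [1.0, 1.0, 1.0, 0.0],  # partially aligned with the target
]
kept = select_by_kh(candidates, target, z=1)
```

Unlike the congruence coefficient, this measure is sensitive to scale: a feature that is merely proportional to the target scores below 1, so the two stages rank features by different criteria.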

Heart Disease Prediction Using SqueezeNet.
Heart disease is a dysfunction of the normal operation of the heart. Heart diseases are generally identified through various deep learning techniques. In this research, the heart disease prediction is done using SqueezeNet [22], which is trained by the DMOA method. Here, the SqueezeNet model takes $V_{k \times z}$ as its input for heart disease prediction. The advantage of the SqueezeNet model is that it provides better detection results at a low construction cost. The structure of SqueezeNet is explained in the succeeding section.

Structure of SqueezeNet.
SqueezeNet [22] comprises several fire modules, each of which contains a squeeze convolution layer and an expand layer. Within a fire module, the output of the squeeze convolution layer is sent to the expand layer. SqueezeNet starts with a standalone convolution layer, followed by 8 fire modules, and ends with a final convolution layer. The outcome of the SqueezeNet model is represented as $B_m$. In addition, SqueezeNet performs max-pooling with a stride of 2, as shown in Figure 2.
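To illustrate why a fire module is inexpensive, the sketch below counts its trainable parameters. It relies on details of the original SqueezeNet design that are not stated in this paper: the squeeze layer uses 1 x 1 filters, and the expand layer mixes 1 x 1 and 3 x 3 filters. The channel counts are illustrative, and how the tabular features are shaped for the network is not specified here.

```python
def conv_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus biases of a square convolution layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


def fire_module_params(in_channels: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    """Parameters of one fire module: a squeeze layer feeding an expand layer."""
    squeeze_layer = conv_params(in_channels, squeeze, 1)
    expand_layer = conv_params(squeeze, expand1x1, 1) + conv_params(squeeze, expand3x3, 3)
    return squeeze_layer + expand_layer


# Illustrative sizes: 96 input channels, 16 squeeze filters, 64 + 64 expand filters.
fire = fire_module_params(96, squeeze=16, expand1x1=64, expand3x3=64)

# A plain 3 x 3 convolution producing the same 128 output channels.
plain = conv_params(96, 128, 3)
```

With these sizes the fire module needs 11,920 parameters, whereas the plain 3 x 3 convolution needs 110,720, because the squeeze layer reduces the number of channels seen by the 3 x 3 filters.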

SqueezeNet Training Using DMOA.
The SqueezeNet used in this research is trained with the DMOA, which is elaborated in this section. The basic principle of DMOA is based on the foraging characteristics of the dwarf mongoose. DMOA [24] is a metaheuristic model for solving optimization problems. DMOA has the ability to generate and improve candidate solutions for the specified optimization problem. In this algorithm, the dwarf mongooses explore different areas of the problem search space by moving from one food source to another. Moreover, DMOA utilizes only one parameter for tuning. The algorithmic steps of DMOA are explained as follows:
(1) Initialization. The algorithmic constraints and candidate solutions are initialized in the first step, from which the optimal solution is generated.
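(2) Fitness Measure. The optimal solution is chosen based on the mean squared error (MSE), which is formulated as

$$MSE = \frac{1}{w} \sum_{m=1}^{w} \left( B_m^{*} - B_m \right)^2, \tag{4}$$

where $w$ denotes the total sample count, $B_m^{*}$ denotes the expected outcome, and $B_m$ denotes the classified outcome of SqueezeNet.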
(3) Alpha Group. After the population initialization, the effectiveness of every solution is determined. In this step, the alpha female is selected with respect to the likelihood value computed by equation (5), in which $t$ specifies the mongoose count in $k$ and $b_{\min}$ specifies the fitness function. The solution is then updated by equation (6), in which $\alpha_p$ signifies a distributed random number, $Q$ denotes the vocalization of the leading female, which sustains the family, and $s_p$ specifies the solution of the present iteration. After every iteration, the sleeping mound is computed by equation (7), and the average value of the sleeping mound is formulated by equation (8), in which $L_r$ denotes the sleeping mound and $t$ specifies the total number of sleeping mounds. After the babysitting exchange criterion is fulfilled, the DMOA enters the scouting stage.
(4) Scout Group. In this step, the mongooses move to the optimal sleeping mound while the family explores over a long distance. The position of the scout mongoose is updated using a random value $z$ in $(0, 1)$ together with the quantities $D$ and $\vec{S}$; Algorithm 1 refers to the movement vector as equation (9) and to the scout update as equation (10). Babysitters are subordinate group members, usually youngsters, whose role frees the alpha female to lead the daily foraging.

(5) Re-Evaluation of Feasibility. The feasibility of each solution is determined by recomputing its fitness value. Here, the solution with the smallest MSE is considered the best, so poorer solutions are iteratively replaced by better ones.
(6) Termination. All the above-mentioned steps are repeated until the optimum solution is attained. Algorithm 1 displays the pseudocode of the DMOA algorithm.
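The overall loop can be illustrated with a deliberately simplified sketch. It is not the full DMOA: it keeps only initialization, the MSE fitness, a perturbation of each solution by a random number times the vocalization, greedy re-evaluation, and termination after a fixed number of iterations. Alpha selection, the sleeping mound, the babysitter exchange, and the scout phase are omitted. A two-weight linear model stands in for SqueezeNet, and every name, parameter value, and data point below is an assumption made for illustration.

```python
import random
from typing import Callable, List, Sequence


def mse(expected: Sequence[float], classified: Sequence[float]) -> float:
    """Fitness of step (2): mean squared error over w samples."""
    w = len(expected)
    return sum((e - c) ** 2 for e, c in zip(expected, classified)) / w


def simplified_search(
    fitness: Callable[[List[float]], float],
    dim: int,
    population: int = 8,
    iterations: int = 200,
    vocalization: float = 0.5,  # Q in the text; this value is an assumption
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    # Step (1): random initial solutions.
    solutions = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(population)]
    scores = [fitness(s) for s in solutions]
    # Step (6): stop after a fixed number of iterations (assumed criterion).
    for _ in range(iterations):
        for p in range(population):
            # Step (3), simplified: s_p + alpha_p * Q with alpha_p in [-1, 1].
            candidate = [x + rng.uniform(-1, 1) * vocalization for x in solutions[p]]
            score = fitness(candidate)
            # Step (5): keep the candidate only if its MSE is smaller.
            if score < scores[p]:
                solutions[p], scores[p] = candidate, score
    best = min(range(population), key=lambda p: scores[p])
    return solutions[best]


# Toy stand-in for the network: a linear model with two weights.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
expected = [1.0, 0.0, 1.0, 0.0]


def toy_fitness(weights: List[float]) -> float:
    outputs = [sum(w * x for w, x in zip(weights, row)) for row in inputs]
    return mse(expected, outputs)


best_weights = simplified_search(toy_fitness, dim=2)
```

Because a candidate replaces a solution only when its MSE is smaller, the best fitness in the population never gets worse from one iteration to the next.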

Results and Discussion
The results and discussion of the proposed DMOA-SqueezeNet for heart disease prediction are elucidated in this section.

Experimental Setup.
The introduced model is implemented in Python on a PC with the Windows 10 OS and an Intel Core i3 processor.

Description of Dataset.
The datasets used for the proposed scheme are the heart disease dataset (Cleveland) [24] and the Z-Alizadeh Sani dataset [27]. The Cleveland dataset contains 76 attributes. The Z-Alizadeh Sani dataset contains a total of 303 patient records with 54 attributes. Specifically, this dataset is utilized to detect heart disease, wherein the integer values vary between 0 and 4.

Performance Metrics.
The metrics used to assess the efficiency of DMOA-SqueezeNet are accuracy, sensitivity, and specificity, which are given in the next section.

Accuracy.
Testing accuracy is used to quantify the effectiveness of the detection results, which is given by

$$u_1 = \frac{g_p + g_n}{g_p + g_n + h_p + h_n}, \tag{11}$$

where $g_p$ defines the true positive, $g_n$ indicates the true negative, $h_p$ expresses the false positive, and $h_n$ states the false negative.

Sensitivity.
Sensitivity measures the true positive rate, which is defined by

$$u_2 = \frac{g_p}{g_p + h_n}, \tag{12}$$

where $g_p$ is the true positive and $h_n$ is the false negative.

Specificity.
Specificity measures the true negative rate, which is defined by

$$u_3 = \frac{g_n}{g_n + h_p}, \tag{13}$$

where $g_n$ is the true negative and $h_p$ is the false positive.
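The three metrics can be computed directly from the confusion-matrix counts. The sketch below assumes the standard definitions: accuracy is the share of correct decisions, sensitivity is the true positive rate, and specificity is the true negative rate. The function name and the counts are hypothetical and are not taken from the reported experiments.

```python
from typing import Dict


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for 100 test records.
metrics = detection_metrics(tp=45, tn=40, fp=10, fn=5)
```

For these counts the accuracy is 0.85, the sensitivity is 0.9, and the specificity is 0.8.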

Comparative Analysis.
The analysis of the novel heart disease prediction method is carried out by varying two settings: the training data and the k value.
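The k-fold setting can be illustrated with a small index splitter. This is only a sketch of the general protocol: it assumes contiguous folds without shuffling or stratification, which may differ from the splitting actually used in the experiments. The record count of 303 and the value of 9 folds are taken from the text, while the function name is hypothetical.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs."""
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# 303 records split into 9 folds.
folds = k_fold_indices(303, 9)
```

Each record appears in exactly one test fold, so every record is used for testing once and for training in the remaining folds.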

Initiate the algorithmic constraints
while (U < H_U) do
    for (p = 1 to k) do
        Estimate the fitness of mongoose
        Set the time counter
        Estimate the value of alpha by equation (5)
        Compute the best solution by equation (6)
        Evaluate the sleeping mound using equation (7)
        Evaluate the mean value of the sleeping mound using equation (8)
        Compute the movement vector using equation (9)
        Execute the scout mongoose for a successive solution using equation (10)
    end for
    c = c + 1
end while
Get the best solution
ALGORITHM 1: Pseudocode of DMOA.

The sensitivity of DMOA-SqueezeNet is 0.918 for K-Fold = 9, which is 2.41%, 1.43%, 1.40%, 0.67%, and 0.382% higher than the prevailing methods. Figure 4(c) shows the specificity graph of DMOA-SqueezeNet. The specificity of DMOA-SqueezeNet is 0.90, which is 6.22% better than BF-PSO, 4.83% better than Bi-LSTM-CRF, 3.69% better than XGBoost, 2.08% better than RLNNC, and 1.3% better than DMOA-SqueezeNet (without feature selection) for K-Fold = 9.
(2) Analysis with Respect to K-Fold. The K-Fold analysis using the Z-Alizadeh Sani dataset is shown in Figure 6. The accuracy graph of DMOA-SqueezeNet is exhibited in Figure 6(a).
Here, the accuracy of DMOA-SqueezeNet is 0.902 for [...] model used for the prediction provides a better detection result at a low construction cost. Thus, the performance of the proposed method is better than that of the conventional approaches.

Conclusion and Future Directions
The heart disease prediction technique, namely, DMOA-SqueezeNet, is presented in this research. For heart disease prediction, the input data is preprocessed, and the appropriate features are selected by a hybrid method. The heart disease prediction is done by the SqueezeNet model, wherein the DMOA trains the weights and biases of SqueezeNet. DMOA is modeled on the feeding behavior of dwarf mongooses and contains only one parameter for finding the optimal solution. The preprocessing method uses quantile normalization and missing data imputation. The feature selection is done by the hybrid congruence coefficient Kumar-Hassebrook similarity. Here, the features selected by the congruence coefficient are passed to the Kumar-Hassebrook similarity, which again selects the higher-scoring features for heart disease prediction. The experimental results reveal that the DMOA-SqueezeNet method attained a higher accuracy of 0.925, a sensitivity of 0.926, and a specificity of 0.918. However, the performance of the proposed method was evaluated using only a limited set of metrics. In the future, the effectiveness of the proposed model can be improved by combining various optimization techniques into an efficient hybrid optimization scheme. It will also be extended to classify heart diseases, and its performance will be evaluated with more metrics.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.