Heart Failure Risk Prediction Using a Novel Feature Selection Method for Feature Refinement and Neural Network for Classification

School of Information and Software Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
Raptor Interactive (Pty) Ltd., Eco Boulevard, Witch Hazel Ave, Centurion 0157, South Africa
Department of CS&IT, University of Azad Jammu and Kashmir, Muzaffarabad 13100, Pakistan
Department of Electronics, University of Buner, Buner, Pakistan
Department of Computer Engineering, Kangwon National University, Samcheok 25806, Republic of Korea


Introduction
The heart is a vital organ of the human body that is responsible for blood circulation. It supplies oxygen and energy to all organs of the body, including itself. Heart disease causes abnormal blood circulation in the body, which can be fatal: if the heart stops functioning normally, the whole system fails. From the literature, various risk factors are identified that cause heart disease. These risk factors are classified into two major types: those that can be altered, e.g., smoking and physical exercise, and those that cannot, e.g., gender, age, and the patient's family history [1]. Diagnosing heart disease through conventional medical methods is difficult, complex, time consuming, and costly. The situation is worse in developing countries due to a lack of state-of-the-art examination tools and medical experts [2,3]. Additionally, the invasive medical procedure for examining heart failure is based on various tests suggested by physicians after studying the medical history of the patient and analyzing the relevant symptoms [4]. Angiography is considered the gold standard among medical tests for the diagnosis of heart failure, and heart disease cases are confirmed through it. However, angiography has side effects, a high cost, and demands extraordinary technical expertise [5,6]. Therefore, machine learning and data mining techniques are needed to design expert systems that avoid the problems of angiography.
To address the abovementioned problems, researchers have designed different noninvasive diagnosis systems by exploiting machine learning based predictive models. These models include logistic regression, naive Bayes, k-nearest neighbor (KNN), decision tree, support vector machine (SVM), artificial neural network (ANN), and ensembles of ANNs for heart failure disease classification [1,[7][8][9][10][11][12][13][14][15][16][17][18]. Robert Detrano utilized logistic regression for heart failure risk prediction and attained a classification accuracy of 77%. Newton Cheung utilized various predictive models comprising the C4.5, naive Bayes, BNND, and BNNF algorithms; their accuracies were 81.11%, 81.48%, 81.11%, and 80.95%, respectively, for classifying patients and healthy subjects. A. Khemphila and V. Boonijing proposed a classification technique based on a multilayer perceptron (MLP) with the backpropagation learning algorithm and biomedical test values for diagnosing heart disease through a feature selection algorithm. Information gain is utilized to filter features by eliminating those that do not contribute to precise results, reducing the total of thirteen features to eight. For classification, an ANN is used. The accuracy on the training dataset was 89.56%, while a validation accuracy of 80.99% was reported.
Recently, Paul et al. proposed a fuzzy decision support system (FDSS) to detect heart disease [19]. Their genetic algorithm based FDSS has five key components: preprocessing of the dataset, effective feature selection through diverse methods, weighted fuzzy rules set up through the genetic algorithm, generated fuzzy knowledge used to build the FDSS, and heart disease prediction. The proposed system obtained an accuracy of 80%. Verma et al. proposed a hybrid model for coronary artery disease (CAD) diagnosis [20]. Their method identifies risk factors by adopting correlation-based feature subset (CFS) selection with a particle swarm optimization (PSO) search model and K-means. Supervised learning algorithms such as multilayer perceptron (MLP), multinomial logistic regression (MLR), fuzzy unordered rule induction algorithm (FURIA), and C4.5 are then utilized to classify CAD cases. The accuracy of the proposed approach was 88.4%, which improved the efficiency of the classification techniques by 8.3% to 11.4% on the Cleveland dataset. Shah et al. proposed a feature extraction technique for reducing feature dimensionality [21]. The approach uses probabilistic principal component analysis (PPCA): projection vectors capturing high covariance are extracted through PPCA, which helps to reduce the feature dimension, and parallel analysis (PA) assists in selecting the projection vectors. The reduced feature vector is input to radial basis function (RBF) kernel-based support vector machines (SVMs), which classify subjects into two categories: heart patient (HP) and normal subject (NS). The proposed model was evaluated for accuracy, specificity, and sensitivity on the UCI Cleveland dataset, yielding 82.18%, 85.82%, and 91.30%, respectively.
Most recently, Dwivedi tested the performance of different machine learning methods for the prediction of heart disease; the highest classification accuracy of 85% was reported using logistic regression [22]. Amin et al. evaluated different data mining methods and identified the significant features for predicting heart disease [23]. Predictive models were built from different combinations of features and well-known classification methods, e.g., LR, SVM, and K-NN; the best classification accuracy of these data mining techniques for heart disease prediction was 87.4%. Özşen and Güneş proposed an expert system based on an artificial immune system (AIS) and achieved an accuracy of 87% [24]. Polat et al. developed a similar system and obtained 84.5% accuracy [25]. Das et al. utilized a neural network ensemble model with the purpose of improving classification accuracy; their ensemble model obtained a classification accuracy of 89.01% [1]. Recently, Samuel et al. proposed a diagnostic system developed from an ANN and Fuzzy AHP that achieved a prediction accuracy of 91.10% [4].
As is clear from the literature survey, ANN-based diagnostic systems have shown better performance on heart disease data. The development of various noninvasive diagnostic systems for heart disease detection therefore motivated us to design an expert diagnostic system based on neural networks. The empirical results show that the proposed model delivers promising performance; hence, it can be used in clinics to support accurate decisions while diagnosing heart failure.

Materials and Methods
In previous studies, researchers used feature sets without eliminating irrelevant or noisy features. In this study, we propose a novel feature elimination method that removes noisy or irrelevant feature vectors and thus selects an optimal subset of features before feeding them to an ANN or DNN. The proposed algorithm uses a window with adaptive size. The window size is initialized to one, and the window is placed at the first feature of the feature vector. The feature or features to which the window points are eliminated, while the remaining features constitute the subset supplied to the neural network for classification. To find the optimal configuration of the neural network for each subset of features, a grid search algorithm is used. It is noteworthy that previous studies utilized a conventional ANN with only one hidden layer for the heart failure detection problem. In this study, however, we found that deep neural networks with more than one hidden layer, trained with new learning algorithms, show better performance. Additionally, this study evaluates the feasibility of the feature selection algorithm at the input level of the DNN. The working of the proposed diagnostic system is shown in Algorithm 1 and Figure 1.

Dataset Description.
For this research, the Cleveland heart disease dataset from the University of California, Irvine (UCI) machine learning and data mining repository was used. The data were gathered from the V.A. Medical Center, Long Beach, and the Cleveland Clinic Foundation by Dr. Robert Detrano [26]. The dataset comprises 303 subjects, of which 6 have missing values; hence, the 297 subjects with complete data values are used for the experiments. Each subject in the dataset has 76 raw features. In previous work, researchers mostly used 13 prominent features out of the 76 raw features for the diagnosis of heart disease; therefore, these 13 features are also considered in this study. Table 1 depicts the 13 most commonly used features for heart disease.

The Proposed Method.
The proposed diagnostic system has two main components that are hybridized into one black-box model. The main reason for hybridizing the two components into one block is that they work in connection with each other. The first component of the system is a feature selection module, while the second is a predictive model. Feature selection methods use data mining concepts to improve the performance of machine learning models [27,28]. The feature selection module uses a search strategy to find the optimal subset of features, which is applied to the DNN acting as a predictive model. The module uses a window that scans the feature vector. The working of the proposed method is depicted in the algorithm.
Initially, the window size is set to 1 and the window is placed at the leftmost side of the feature vector of size n, i.e., having n features. The feature on which the window is placed is eliminated from the feature vector, and the remaining features constitute the subset supplied to the DNN for classification. The performance of this subset of features is saved. In the next step, the window floats one position to the right. Again, the feature on which the window is placed is eliminated, and the remaining features constitute a feature subset whose performance is checked by the DNN model. The same process is repeated until the window reaches the last feature, i.e., the nth feature. With this, the first round of window floating is completed. It is important to note that the feature subset size is n − 1 throughout the first round.
In the next round, the window size is updated to 2; hence, the window points to two features at a time. Again, the window starts the floating process from the leftmost side of the feature vector and eliminates the first two features. The remaining n − 2 features constitute the feature subset that is applied as input to the DNN model for classification, and the results are compared with the best performance achieved on any previous subset of features. If the performance is better than the previous best, the best performance and the optimal subset of features are updated. In the next iteration, the window floats to the right, and the two features on which the window is placed are eliminated; the remaining features constitute the subset applied to the DNN. The same process is repeated until the window reaches the rightmost side of the feature vector, which marks the end of the second round. In the third round, the window size is made 3, and the same process is repeated. Finally, in round n − 1, the window size is n − 1 and the window can float just once to the right, after which the whole process ends. The subset of features that gives the best result is declared the optimal subset.
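The rounds described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's implementation: `evaluate` is a hypothetical callback that trains the classifier on a candidate subset and returns its test accuracy, standing in for the grid-search-optimized DNN.

```python
def fwafe(features, evaluate):
    """Floating-window adaptive feature elimination (sketch).

    features : list of feature indices.
    evaluate : callback returning the classifier's accuracy for a subset
               (hypothetical name; stands in for the optimized DNN).
    """
    n = len(features)
    best_subset, best_score = list(features), evaluate(features)
    # Round w uses a window of size w (1 .. n-1).
    for w in range(1, n):
        # Slide the window left to right, eliminating the w covered features.
        for start in range(0, n - w + 1):
            subset = features[:start] + features[start + w:]
            score = evaluate(subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score
```

With n = 13 features, this evaluates 13 + 12 + ... + 1 = 91 candidate subsets, each requiring a full grid search over DNN architectures, which accounts for the high time complexity noted later in the paper.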
The whole process of feature selection through the adaptive floating window is clearly illustrated in Figure 2. Each time a subset of features is supplied to the DNN, the DNN architecture is optimized using the grid search algorithm, since the performance of a DNN is highly dependent on its architecture [29]. An inappropriate DNN architecture will result in poor performance even when the DNN is applied to an optimal subset of features. If the architecture selected for classification has insufficient capacity, it will underfit [30,31], and the DNN will show poor performance on both the training and testing data. If the architecture has excessive capacity, it will overfit the training data, showing better performance on the training data but poor performance on the testing data. Hence, we need to search for the optimal DNN architecture that shows good performance on both training and testing data.

To understand the relationship between the architecture and capacity of a DNN, we need to understand its formulation. Neural networks are computational systems based on mathematical models that simulate the human brain. The key element in a neural network model is the perceptron, or node [32]. Nodes are shaped into groups called layers. An artificial neuron works on the same principle as a biological neuron: it receives one or more inputs from adjoining neurons, processes the information, and transfers the output to the next perceptron. Artificial neurons are connected through links known as weights. The input information χ_i is weighted either positively or negatively during the computation of the output.
An internal threshold value ϱ and weights are assigned for the solution of the problem under consideration. At every node, the result is calculated by multiplying the input values χ_n by their associated weights ω_n and fine-tuning the sum with the threshold value ϱ:

δ_i = Σ_n χ_n ω_n − ϱ.    (1)

The output is then calculated through an activation (transfer) function α. The transfer function can be linear or nonlinear; in the nonlinear case, a tangent hyperbolic or radial basis form is applied. Here, the sigmoid function α(δ_i) is applied at the following layer to produce the output value:

α(δ_i) = 1 / (1 + e^(−c·δ_i)),    (2)

where c is related to the shape of the sigmoid function; increasing the parameter c strengthens the nonlinearity of the sigmoid. The neural network is obtained by connecting the artificial neurons. If the constructed neural network model has only one hidden layer, we name it an ANN [17]; if it has more than one hidden layer, we name it a DNN [17].
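Equations (1) and (2) can be checked with a minimal single-neuron sketch. The function name and default value of c are illustrative, not from the paper:

```python
import math

def neuron_output(inputs, weights, threshold, c=1.0):
    # Weighted sum fine-tuned by the internal threshold, equation (1).
    delta = sum(x * w for x, w in zip(inputs, weights)) - threshold
    # Sigmoid transfer function, equation (2); c shapes its nonlinearity.
    return 1.0 / (1.0 + math.exp(-c * delta))
```

For example, with inputs [1, 1], weights [0.5, 0.5], and threshold 1, the weighted sum δ is 0 and the sigmoid output is exactly 0.5; increasing c pushes the output toward 0 or 1 for the same nonzero δ.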

Validation Scheme.
In earlier works, the performance of expert diagnosis systems has been evaluated through holdout validation schemes. The dataset is partitioned into two parts: one for training and the other for testing. Researchers have used various train-test split percentages for data partitioning. Furthermore, Das et al. used a holdout validation scheme in their research, partitioning the dataset into a 70%-30% ratio, where 70% of the dataset is utilized for training the predictive model and 30% for testing its performance. Therefore, we adopted the same data partitioning criteria for train-test purposes.
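A minimal sketch of the 70%-30% holdout partition follows. The paper does not state its shuffling procedure, so a seeded shuffle is assumed here; the function name and seed are illustrative.

```python
import random

def holdout_split(samples, train_ratio=0.7, seed=42):
    """Partition samples into a 70%-30% train-test holdout split
    (seeded shuffle assumed; not specified in the paper)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test
```

Applied to the 297 complete-record Cleveland subjects, this yields 207 training and 90 testing subjects.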

Evaluation Metrics.
Various evaluation metrics, such as accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC), are utilized for evaluating the performance and efficiency of the proposed model. Accuracy is the percentage of correctly classified subjects. Sensitivity is the correct classification of patients, whereas specificity is the correct classification of healthy subjects. The evaluation metrics are formulated in equations (3)-(6):

Accuracy = (TP + TN) / (TP + TN + FP + FN),    (3)
Sensitivity = TP / (TP + FN),    (4)
Specificity = TN / (TN + FP),    (5)
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),    (6)

where TP stands for true positives, TN for true negatives, FP for false positives, and FN for false negatives.
The quality of binary classification is assessed using MCC in machine learning and statistics. The value of MCC ranges between −1 and 1: a value of −1 denotes total conflict between prediction and observation, 1 indicates exact prediction, and 0 describes the classification as random prediction. Moreover, in this study, another evaluation metric, the receiver operating characteristic (ROC) curve, was also exploited. The ROC curve is a well-known metric used to statistically evaluate the quality of a predictive model; it provides the area under the curve (AUC), and a model is considered better if its AUC is higher.
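The four metrics of equations (3)-(6) can be computed directly from the confusion counts. This sketch uses illustrative names; the guard against a zero MCC denominator is a standard convention, not from the paper:

```python
import math

def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC per equations (3)-(6)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # correct classification of patients
    specificity = tn / (tn + fp)   # correct classification of healthy subjects
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc
```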

Experimental Results and Discussion
In this section, two kinds of diagnostic systems are proposed, and experiments are conducted to test their performance. In the first experiment, FWAFE-ANN is developed and simulated, while in the second experiment, FWAFE-DNN is utilized. In the first experiment, the FWAFE algorithm is used to construct a subset of features, which is then applied to an ANN used as the predictive model. In the second experiment, FWAFE is used to construct a subset of features, whereas a DNN is utilized for classification. All the experiments were simulated using the Python programming language.

Experiment No. 1: Feature Selection by FWAFE and Classification by ANN.

In this experiment, FWAFE is used at the first stage, while an ANN is used at the second stage. The feature selection module eliminates noisy and irrelevant features by exploiting a search strategy, whereas the second component is deployed as a predictive model. The proposed diagnostic system achieves an accuracy of 91.11% using only a subset of features. The optimal subset of features is obtained for n = 6, n = 7, and n = 11, where n stands for the size of the feature subset. The simulation results are reported in Table 2.
In the table, the last record displays the case where all the features are used, i.e., no feature selection is performed. It can be noticed that the best accuracy of 90% is achieved after optimizing the architecture of the ANN with the grid search algorithm using all the features. Thus, the proposed model is evidently competent, as it delivers better performance with fewer features. The best performance of the proposed model is observed at 11 features for the peak training accuracy. Additionally, the feature selection module increases the performance of the optimized ANN by 1.11%. Moreover, F_e denotes the features that are eliminated from the feature space during the feature selection process. The results for distinct subsets of features and diverse hyperparameters are displayed in Table 2.

Experiment No. 2: Feature Selection by FWAFE and Classification by DNN.

In this experiment, FWAFE is used at the first stage, while a DNN is employed at the second stage. The feature selection module eliminates noisy and irrelevant features by exploiting a search strategy, whereas the second component is utilized as a predictive model. The proposed diagnostic system achieves an accuracy of 93.33% using only a subset of features. The optimal subset of features is obtained for n = 11, which includes FC1, FC2, FC3, FC4, FC7, FC8, FC9, FC10, FC11, FC12, and FC13, i.e., by eliminating features 5 and 6. The experimental outcomes are displayed in Table 3. To validate the effectiveness of the proposed feature selection method, i.e., FWAFE, the experiment is also performed using the DNN model on the full feature set without the feature selection module. The DNN architecture was optimized using the grid search algorithm. The best accuracy of 90% was obtained using a neural network with four layers: the size of the 1st layer is equal to the number of features, the 2nd layer consists of 50 neurons, the 3rd layer contains 2 neurons, and the output layer has only one neuron. In Table 3, the last row represents the case where all features are utilized. Hence, it is evident that the feature selection module boosts the performance of the DNN by 3.33%. Moreover, FWAFE-DNN shows better performance than FWAFE-ANN. The results for distinct subsets of features and various hyperparameters are shown in Table 3. ROC charts are utilized to analyze the performance of the proposed model. A method whose ROC chart has the maximum area beneath the curve, with points toward the upper left corner, is considered the best. Figure 3(a) shows the ROC chart of the proposed FWAFE-ANN diagnostic system, while Figure 3(b) shows the ROC chart of the ANN-based diagnostic system.
From the figure, it is evident that the feature selection module increases the performance of the ANN model, as reflected by the larger area beneath the curve. Similarly, Figure 4(a) represents the ROC chart of the proposed FWAFE-DNN diagnostic system, while Figure 4(b) depicts the ROC chart of the DNN-based diagnostic system. From the figure, it is clearly observed that the feature selection module also increases the performance of the DNN model.
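The AUC summarized by a ROC chart can also be computed directly as the probability that a randomly chosen patient receives a higher score than a randomly chosen healthy subject (the Mann-Whitney rank formulation). This sketch is illustrative and not part of the paper's pipeline:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: the fraction of
    (patient, healthy) pairs ranked correctly, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # patients (HP)
    neg = [s for s, y in zip(scores, labels) if y == 0]  # healthy (NS)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 corresponds to a ROC curve hugging the upper left corner, while 0.5 corresponds to random prediction, mirroring the interpretation of the ROC charts in Figures 3 and 4.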

Experiment No. 3: Results of Other State-of-the-Art Machine Learning Models.
In this section, a comparative analysis is performed between our proposed model and other state-of-the-art machine learning models on the biomedical dataset. The classifiers selected for comparison are the random forest (RF) classifier, randomized decision tree classifier, AdaBoost ensemble classifier, SVM with radial basis function (RBF) kernel, and linear support vector machine (SVM).

Comparative Study with Previously Reported Methods.
In this section, the experimental results of the proposed method are compared with those of the other methods discussed in the literature. The performance comparison is based on prediction accuracy; Table 5 tabulates the prediction accuracies of our proposed method and previously proposed methods from the literature. From the experimental outcomes, it is evident that the proposed hybrid method shows promising performance on heart disease data, while its main limitation is its high time complexity. From Table 5, it can be seen that many studies proposed numerous methods for automated detection of HF. For example, Ali et al. developed a two-stage system using a linear SVM at the first stage for feature selection and a linear discriminant analysis model for classification at the second stage and obtained 90% accuracy. In another study, Verma et al. [20] utilized the correlation-based feature subset (CFS) method for feature selection and the particle swarm optimization (PSO) algorithm with k-means clustering.
Their method produced an accuracy of 88.4%. Saqlain et al. [21] proposed probabilistic principal component analysis and obtained an accuracy of 91.30%. Ali et al. [34] proposed a novel hybrid method for improving heart disease prediction accuracy, utilizing a linear SVM for feature selection and another SVM (with linear and nonlinear kernels) for classification; it produced 92.22% heart disease detection accuracy. Hence, based on the comparison with these methods, it is clear that our proposed method is a step forward in improving heart disease detection accuracy.

Conclusions
In this paper, an effort has been made to design a two-stage diagnostic system that improves the accuracy of heart failure risk prediction. Two types of systems were developed. Both systems used the same feature selection method, while the first used an ANN for classification and the second used a DNN. A classification accuracy of 91.11% was achieved with the ANN-based system, while an accuracy of 93.33% was obtained with the DNN-based diagnostic system. It was also observed that the proposed diagnostic system shows better performance than other state-of-the-art machine learning models. From the experimental results, it can be safely concluded that the proposed system can help physicians make accurate decisions while diagnosing heart disease.

Data Availability
All the data used in this study are available at the UCI machine learning repository.