Identification of Cardiac Patients Based on the Medical Conditions Using Machine Learning Models

Chronic diseases are among the most severe health concerns today, and heart disease is one of them. Coronary artery disease (CAD) affects blood flow to the heart and is the most common type of heart disease that causes a heart attack. High blood pressure, high cholesterol, and smoking significantly increase the risk of heart disease. Estimating the risk of heart disease is a complex process because it depends on various input parameters. Linear and analytical models have failed because of their assumptions and limited datasets. Existing studies have used medical data for classification purposes, which helps to identify the exact condition of the patient, but none has developed a correlation equation that can be used directly to identify patients. In this paper, mathematical models have been developed using a medical database of patients suffering from heart disease. Curve fitting and an artificial neural network (ANN) have been applied to model the condition of patients and determine whether a patient is suffering from heart disease. The developed curve fitting model identifies cardiac patients with a coefficient of determination ($R^2$-value) of 0.6337, a mean absolute error (MAE) of 0.293, and a root mean square error (RMSE) of 0.3688. The ANN-based model identifies cardiac patients with an $R^2$-value of 0.8491, an MAE of 0.20, and an RMSE of 0.267. The ANN therefore provides better mathematical modeling than the curve fitting method for identifying heart disease patients. Medical professionals can utilize this model to identify heart patients without an angiography or computed tomography angiography test.


Introduction
Predicting cardiac disease accurately may save a life, while an incorrect diagnosis can be deadly. Heart disease is made more likely by a host of risk factors, including high cholesterol, obesity, elevated triglyceride levels, and hypertension. Heart failure occurs when the heart's muscles fail to pump blood as efficiently as they should [1]. Shortness of breath may result from blood clots in the lungs. The heart weakens or stiffens over time due to certain cardiac conditions, such as restricted arteries in the heart or high blood pressure. People with heart disease may live longer if they get the proper treatment. A low-fat, low-sodium diet, at least 30 minutes of moderate exercise five days a week, limiting alcohol, and quitting smoking may all help lower the risk of heart disease. Medication may be prescribed if lifestyle changes are not enough to keep the disease under control, and it is tailored to the patient's specific needs. If medication fails, certain therapies or surgery may be recommended. The kind of heart illness and the degree of cardiac damage dictate the type of surgery. Classic risk factors, such as a family history of early coronary artery disease, dyslipidemia, and age, are well documented in the etiological route of ischemic heart disease in women [1]. The prevention, diagnosis, and rehabilitation of patients with ischemic heart disease continue to be a major concern. A complex interplay of variables, including unique risk factors and disease pathogenesis for ischemic heart disease in women with nonobstructive coronary artery disease and disorders of the coronary microvasculature and endothelium, contributes to this conundrum. Chronic disease prediction is essential in healthcare informatics [2]. It is important to detect the disease as early as possible [3].
Heart disease and diabetes risk may be estimated using machine learning algorithms that analyze the data for these diseases. Medical, industrial, and educational domains can benefit from extracting valuable information from vast datasets via data mining. Machine learning (ML) is one of the fastest-growing fields of artificial intelligence [1,4].
Cardiovascular disease is one of the leading causes of death. The current study focuses on detecting, altering, and addressing risk variables on an individual basis. Although the incidence of various cardiovascular risk factors is increasing at different rates around the world, the magnitude of the increase has prompted researchers to look into the causes of the risk factors. The main purpose of this study is to develop a model for detecting cardiac patients without using angiography or computed tomography angiography.
Heart disease may be detected in a variety of persons using machine-learning techniques. Neural networks (NN), decision trees (DT), CN2 rule inducers, stochastic gradient descent, and support vector machines (SVM) have been utilized, and the DT and SVM algorithms produced the best results, reaching 87.69% in the 20-fold and 10-fold cross-validation tests [5]. Na et al. [6] developed an algorithm based on heart rate variability (HRV) to distinguish panic disorder from other types of anxiety. Because panic disorder and other anxiety disorders have similar causes and symptoms, the aim is a machine-learning classification model that distinguishes panic disorder from other anxiety illnesses [6]. By finding complex, nonlinear patterns of expression and linkages in data sets, ML techniques can extract underlying information; random forest models yielded AUC values of more than 0.80 for families and more than 0.98 for species [7]. Advanced tools that detect the illness early may reduce fatality rates. AI and data mining offer many methods that could help predict cardiovascular disease (CVD) before it occurs and uncover behavioral patterns in large volumes of data. The results of these forecasts will help doctors make decisions and examine patients early, which will help patients live longer. Different models can be utilized in several classification approaches [8].
Various researchers have developed a layered biometric identification system resistant to presentation attacks (PAs) by fusing fingerprint and heart-signal data. Artifact attacks are blocked in the first layer using a convolutional neural network (CNN). In the second layer, a lightweight CNN uses an electrocardiography (ECG) image to prevent corpse attacks. Next, fingerprint matches at a predetermined threshold are utilized to prevent attacks by conformists. A score-level fusion of the fingerprint and the cardiac signal is used in the last layer of biometric authentication to ensure security. Two freely available online databases of fingerprints and cardiac signals were used to evaluate the proposed system against different authentication and attack scenarios. The experiments showed a false match rate (FMR) of zero and a satisfactory false nonmatch rate (FNMR) [9]. A data-driven strategy with a fuzzy rule-based classification system for cardiac disease detection outperforms other models while balancing interpretability and accuracy [10]. The prevalence of heart failure has been rising in lockstep with population growth [11]. A Python-based application has been built for healthcare because it is more reliable and helps with tracking and setting up different types of health-monitoring apps. A random forest classification system has been developed to diagnose cardiac problems. This method has an average accuracy of 83% over the training data [12]. Dynamic systems (MLDS) have been developed to increase their existing knowledge at each layer. For feature selection, the model employs the correlation attribute evaluator (CAE), extra trees classifier (ETC), information gain attribute evaluator (IGAE), gain ratio attribute evaluator (GRAE), and Lasso. The ensemble approach for classification in the model was built using random forest (RF), gradient boosting (GB), and naive Bayes (NB) classifiers [13].
Several statistical approaches, including principal component analysis, were used to find the essential parameters for stroke prediction. They found that the most critical criteria for diagnosing stroke in patients are age, average glucose level, heart disease, and hypertension. Furthermore, compared with using all available input characteristics and with various benchmarking methodologies, a feed-forward neural network with these four attributes has the greatest accuracy rate and the lowest miss rate [14]. Three alternative goals have been chosen for chronic heart failure (CHF) modeling: CHF identification as the primary diagnosis, prediction of blood pressure, and classification of CHF stages. Several machine-learning algorithms were applied to three sorts of features for each task: static, dynamic, and the entire feature set. The findings suggest that the models perform better when temporal and nontemporal information are both included [15]. Newly developed and optimized algorithms allow heart disease to be detected earlier. A system for predicting cardiovascular illness has been implemented using a variety of classifiers, including NB, a salp swarm optimized neural network (SSA-NN), a Bayesian optimized SVM (BO-SVM), and K-nearest neighbors (KNN) [16]. Heart failure models were used to analyze the severity of heart failure and predict the occurrence of adverse events such as destabilizations, re-hospitalizations, and death [17]. An ear-worn monitor for long-term heart rate (HR) and blood pressure (BP) measurement has been proposed to improve wearability, with an SVM classifier that learns to recognize raw heartbeats in data affected by motion artifacts [18]. With a single mechanocardiography measurement, atrial fibrillation (AFib) can be correctly identified and acute decompensated heart failure can be diagnosed with a reasonable level of accuracy [19].
Machine learning offers new techniques for detecting significant features that enhance the accuracy of cardiovascular disease prediction. A hybrid random forest and linear model approach improves performance, reaching an accuracy of 88.7% in predicting heart disease [20]. An enhanced deep learning assisted convolutional neural network (EDCNN) algorithm was used to improve patient prognostics in heart disease. It has been added to the Internet of Medical Things (IoMT) for expert systems that help clinicians diagnose cardiac patients' information quickly and efficiently on cloud platforms worldwide. Compared with standard techniques, the test findings suggest that, with flexible tuning of the EDCNN hyperparameters, an accuracy of up to 99.1% can be reached. A fast conditional mutual information feature selection approach has been developed to overcome feature selection challenges [21]. Feature selection methods were used to improve classification accuracy and reduce the execution time of the classification system. The experimental findings suggest that this feature selection method (FCMIM) may be combined with a support vector machine classifier to create a high-level intelligent system for detecting heart disease [22]. Heart rate variability is a powerful predictor of which hypertensive individuals are more likely to experience cardiovascular events. In contrast to the standard methodologies utilized for the same purpose, the supervised learning model is simple, efficient, and cost-effective, and it can be used for cardiac monitoring analysis [23]. Various researchers have used machine-learning algorithms to predict ECG signals [24,25]. As mentioned above, ML techniques can be applied in different applications and are used especially in medical identification [26-28].
The medical field has a massive amount of patient data. This data must be mined using different machine-learning methods. Healthcare experts analyze this data in order to make effective diagnostic decisions. Clinical help can be provided by analyzing medical data using classification algorithms. Existing studies have used medical data for classification purposes, which helps to identify the exact condition of the patient, but none has developed a correlation equation that can be used directly to identify patients. In this study, basic information together with some important clinical data has been used to identify cardiac patients at an early stage without angiography or CT angiography. The major contributions of this study are the following: (i) A correlation has been developed using curve fitting and artificial neural network (ANN) methods. (ii) An ANN model has been developed that professionals can use to identify cardiac patients; the ANN-based model is the more accurate of the two. (iii) A detailed discussion of heart disease and a selective literature review have been carried out to identify the issues and parameters related to cardiac disease for testing and identification purposes. (iv) The data has been collected from the Kaggle database. The performance of the models has been compared. The results show that these correlations can help identify cardiac patients easily and with higher precision. The rest of the paper is organized as follows. Section 2 presents the details of data collection and data preparation used for modeling. The correlation models using curve fitting and artificial neural network methods are presented in Section 3. The major findings and the performance of the curve fitting and ANN models are summarized in Section 4.

Cardiac Patients Identification
Identifying cardiac patients in the early stages is important to reduce the risk of complications. To address this issue, correlations are developed that can be utilized to identify cardiac patients. The proposed methodology is shown in Figure 1. The data sets have been collected from an online database, and data filtration and standardization operations have been performed to remove outliers and make the data dimensionless. The curve fitting and ANN methods have been used to develop models, and the developed models have been tested on various performance parameters to select the best-fitted model.

Data Collection and Data Preparation.
The clinical parameters of heart disease patients were collected from an open-source repository (https://github.com/g-shreekant/Heart-Disease-Prediction-using-Machine-Learning) and used for the development of the correlation [29]. These parameters are listed in Table 1. For modeling, all parameter values are standardized to the range of 0 to 1 using equation (1). The statistical properties of the parameters used for modeling are listed in Table 2 to describe the features of the data. Figure 2 shows the correlation plot between the input (X) and output (Y) variables.
$$Y = \frac{x - X_{\min}}{X_{\max} - X_{\min}}, \tag{1}$$

where $Y$ is the normalized output value, $x$ is the value to be normalized, $X_{\min}$ is the minimum value in the selected dataset, and $X_{\max}$ is the maximum value in the selected dataset. Shapley additive explanations (SHAP) of the input parameters considered for modeling are shown in Figure 3. SHAP is used to determine the contribution of each input parameter to the final predicted output. It shows that the thalassemia value ($T_h$) is the most important parameter and fasting blood sugar ($F_b$) is the least important parameter in predicting a heart patient. The relevance of every variable can be determined from its SHAP values. A red dot indicates a high feature value, which leads to a higher SHAP value.
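The min-max standardization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `min_max_normalize`, the sample ages, and the handling of a constant column are hypothetical and do not come from the study.

```python
def min_max_normalize(values):
    """Scale numbers to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column makes the formula undefined; returning zeros
        # is an assumption of this sketch, not something the paper states.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical patient ages, used only to illustrate the scaling.
ages = [29, 45, 54, 61, 77]
normalized_ages = min_max_normalize(ages)
```

The smallest value maps to 0 and the largest to 1, so every input becomes dimensionless and comparable in scale.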
The detailed methodology of the proposed work for the identification of cardiac patients based on medical conditions using machine-learning models is given in Figure 3.

Modeling of Parameters
The predicted value for the heart disease patient is expressed by equation (2).

Curve Fitting Technique.
The relationship between the predicted heart disease value ($H_d$) and patient age ($A_c$) is shown in Figure 4 and expressed in equation (3). Let $A_0 = 1.2152$; the value of $A_0$ depends on other input parameters such as $S_c$ and $C_p$. Equation (3) can be written as equation (4), which can be rewritten as equation (5). Figure 5 shows the plot between $A_0$ and $S_c$, and the relationship between $A_0$ and $S_c$ is expressed in equation (6).

Computational Intelligence and Neuroscience
Let $B_0 = 0.7647$; the value of $B_0$ depends on other input parameters such as $C_p$. Equation (6) can be written as equation (7), which can be rewritten as equation (8). Figure 6 shows the plot between $B_0$ and $C_p$, and the relationship between $B_0$ and $C_p$ is expressed in equation (9). Consider $C_0 = 0.5688$, which also depends on other input parameters. Equation (9) can be written as equation (10), which can be rewritten as equation (11). Figure 7 shows the plot between $C_0$ and $B_p$, and the relationship between $C_0$ and $B_p$ is expressed in equation (12).

Consider $D_0 = 0.7619$; this value also depends on other input parameters such as $C_h$, $F_b$, $R_e$, $H_r$, $E_x$, $O_p$, $S_p$, $C_a$, and $T_h$. Equation (12) can be written as equation (13), which can be rewritten as equation (14). Figure 8 shows the plot between $D_0$ and $C_h$, and the relationship between $D_0$ and $C_h$ is expressed in equation (15). Let $E_0 = 0.87$; the constant value $E_0$ also depends on other input parameters such as $F_b$, $R_e$, $H_r$, $E_x$, $O_p$, $S_p$, $C_a$, and $T_h$. Equation (15) can be written as equation (16), which can be rewritten as equation (17). Figure 9 shows the plot between $E_0$ and $F_b$, and the relationship between $E_0$ and $F_b$ is expressed in equation (18). Consider $F_0 = 0.875$; this constant value also depends on other input parameters such as $R_e$, $H_r$, $E_x$, $O_p$, $S_p$, $C_a$, and $T_h$.
Equation (18) can be written as equation (19), which can be rewritten as equation (20). Figure 10 shows the plot between $F_0$ and $R_e$, and the relationship between $F_0$ and $R_e$ is expressed in equation (21). Consider $G_0 = 0.8439$; this constant value also depends on other input parameters such as $H_r$, $E_x$, $O_p$, $S_p$, $C_a$, and $T_h$.
Equation (21) can be written as equation (22), which can be rewritten as equation (23). Figure 11 shows the plot between $G_0$ and $H_r$, and the relationship between $G_0$ and $H_r$ is expressed in equation (24). Consider $H_0 = 0.3717$; this constant value also depends on other input parameters such as $E_x$, $O_p$, $S_p$, $C_a$, and $T_h$.
Equation (24) can be written as equation (25), which can be rewritten as equation (26). Figure 12 shows the plot between $H_0$ and $E_x$, and the relationship between $H_0$ and $E_x$ is expressed in equation (27). Consider $I_0 = 0.408$; this constant value also depends on other input parameters such as $O_p$, $S_p$, $C_a$, and $T_h$.
Equation (27) can be written as equation (28), which can be rewritten as equation (29). Figure 13 shows the plot between $I_0$ and $O_p$, and the relationship between $I_0$ and $O_p$ is expressed in equation (30). Consider $J_0 = 0.4871$; this constant value also depends on other input parameters such as $S_p$, $C_a$, and $T_h$. Equation (30) can be written as equation (31), which can be rewritten as equation (32). Figure 14 shows the plot between $J_0$ and $S_p$, and the relationship between $J_0$ and $S_p$ is expressed in equation (33):

$$J_0 = 0.0258\,S_p + 0.469. \tag{33}$$
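Each step of the curve fitting procedure relates one intermediate coefficient to one input parameter. Equation (33) is a straight line, so the step can be illustrated with an ordinary least-squares line fit. The sketch below is an assumption-laden illustration rather than the authors' procedure: the helper `fit_line` is hypothetical, the sample points are synthetic values generated from equation (33), and the functional forms used in the other steps are not recoverable from the text.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on equation (33); not measured data.
s_p = [0.0, 0.5, 1.0]
j_0 = [0.0258 * s + 0.469 for s in s_p]

slope, intercept = fit_line(s_p, j_0)
```

The fitted intercept recovers 0.469, the value that the text carries into the next step as the following coefficient.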
Let $K_0 = 0.469$; the constant value $K_0$ also depends on other input parameters such as $C_a$ and $T_h$. Equation (33) can be written as equation (34), which can be rewritten as equation (35). Figure 15 shows the plot between $K_0$ and $C_a$, and the relationship between $K_0$ and $C_a$ is expressed in equation (36). Let $L_0 = 0.5168$; the constant value $L_0$ depends on only one input parameter, $T_h$.
Equation (42) can be utilized for identifying the heart disease patient. Figure 17 shows a comparison between the predicted behavior and that reported through the medical test. At an average value of $M_0$ of 0.6881, the curve fitting method can predict heart disease patients with an $R^2$-value of 0.6337, a mean absolute error of 0.293, and an RMSE of 0.3688. The curve fitting method performs poorly; hence, it cannot be used for prediction purposes.
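The three performance measures quoted above can be computed as in the following sketch. It assumes the common definition of $R^2$ as one minus the ratio of residual to total sum of squares; the paper does not state which definition it uses. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumed definition: R^2 = 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Hypothetical labels (0 or 1) and continuous model outputs.
y_true = [0, 1, 1, 0]
y_pred = [0.2, 0.7, 0.9, 0.4]
r2, mae, rmse = regression_metrics(y_true, y_pred)
```

A lower MAE and RMSE together with a higher $R^2$ indicate a better fit, which is the basis on which the two models are compared in this paper.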

ANN Technique.
In 1943, Warren McCulloch and Walter Pitts introduced a new type of network called the neural network. One of the most extensively used machine-learning approaches is the artificial neural network (ANN) model, which is inspired by biological neurons. The ANN is a commonly used statistical model for detecting the relationship between input and output via a set of interconnected data structures with multiple neurons capable of extensive calculations for information representation and data processing. An ANN model can be trained to forecast the required output from the supplied input. ANN is a type of artificial intelligence loosely modeled on the human brain: it is made up of a sequence of linked neurons stacked in layers. The weights linking the neurons determine the capacity of ANN structures to process the provided information. The ANN structure can be either feed-forward or recurrent; the feedforward structure is the most commonly employed in engineering and is also used in this study. The feedforward network is made up of three layers: input, hidden, and output, as shown in Figure 18. Neurons in the same layer are not connected with each other, but they are connected to the neurons in adjacent layers, and each connection has its own weight. Gradient descent and backpropagation are generally implemented to decrease errors. This method separates the data into three sections on a random basis: 90% for training, 5% for validation, and 5% for testing; the same approach is employed in this investigation. TANSIG (43) and PURELIN (34) were chosen as the activation functions in the hidden and output layers, respectively.
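The forward pass of a feedforward network with one hidden layer can be sketched as follows. TANSIG is the hyperbolic tangent sigmoid, which is mathematically equal to tanh, and PURELIN is the identity function. The weights, biases, and layer sizes below are hypothetical placeholders chosen for illustration; they are not the trained values of the model in this paper.

```python
import math


def tansig(x):
    # Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, equal to tanh(x).
    return math.tanh(x)


def purelin(x):
    # Linear (identity) transfer function.
    return x


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with tansig, then a single purelin output neuron."""
    hidden = [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return purelin(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical parameters: 2 inputs, 2 hidden neurons, 1 output.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [1.0, 1.0]
output_bias = 0.5

h_d_zero = forward([0.0, 0.0], hidden_weights, hidden_biases, output_weights, output_bias)
h_d_ones = forward([1.0, 1.0], hidden_weights, hidden_biases, output_weights, output_bias)
```

Training by gradient descent and backpropagation would adjust these weights and biases to reduce the error; that step is omitted here to keep the sketch short.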
ANN is one of the popular machine-learning techniques utilized to predict heart disease patients. A total of 303 samples were employed to model the parameters, with 90% of the data used for training, 5% for validation, and 5% for testing. Only a single hidden layer, with 5 to 25 neurons, was used to obtain the best network. A trial-and-error approach was applied to the performance indices (R and MSE) to rank the training, testing, and validation datasets. The ranking results reveal that 10 neurons in the hidden layer give the best performance. The error ratio plot (Figure 19(a)) and the performance plot (Figure 19(b)) show that the ANN model reached a minimum MSE of 0.08781 at 12 iterations. Figure 19(c) depicts the learning process of the best-analyzed neural network: gradient, momentum, and validation check.
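A random 90%/5%/5% partition of the 303 samples can be sketched as below. The rounding rule, the fixed seed, and the function name are assumptions of this illustration; the exact subset sizes obtained by the authors are not stated in the text.

```python
import random


def split_indices(n_samples, train_frac=0.90, val_frac=0.05, seed=0):
    """Shuffle sample indices and cut them into train, validation, and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for repeatability
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


train_idx, val_idx, test_idx = split_indices(303)
```

With this rounding, the 303 samples divide into 273 for training and 15 each for validation and testing, and every sample appears in exactly one subset.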
The detailed histogram of the training, validation, and testing of input data is shown in Figure 20. The mathematical expression between the standardized input parameters and the output is given by equation (43), which can be used to predict the heart disease patient. A value of $H_d$ equal to 1 indicates that the patient has heart disease, and a value of 0 indicates that the patient is not suffering from heart disease. The hidden neuron responses $A_i$ ($i = 1$ to 10) are fed to the network output and can be calculated with equation (47).
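Because the network output is continuous while the label is 0 or 1, a cutoff is needed to turn a predicted $H_d$ into a decision. The text does not state one, so the value 0.5 below is purely an assumption, as are the function name and the sample outputs.

```python
def classify_patient(h_d, cutoff=0.5):
    """Map a continuous prediction to 1 (heart disease) or 0 (no heart disease).

    The cutoff of 0.5 is an assumed value, not one reported in the paper.
    """
    return 1 if h_d >= cutoff else 0


# Hypothetical model outputs.
labels = [classify_patient(v) for v in [0.08, 0.49, 0.5, 0.93]]
```

A different cutoff would trade missed patients against false alarms, so in practice it would be chosen on validation data.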
As shown in Figure 20, the final correlation can predict the patient with an $R^2$-value of 0.8491, an average mean absolute error of 0.20, and an RMSE of 0.267. As a result, the generated correlation given by equation (44) has been determined to be the best for predicting the heart disease patient. As shown in Figure 21, most values of the output parameter lie at two points, 0 and 1. The proposed model is suitable for identifying heart disease patients.

Conclusion
Heart disease is a dangerous chronic disease in which the patient lives at risk of heart attack or even death. The current study generated an efficient correlation for identifying heart disease patients. Curve fitting and ANN were applied to the normalized medical results to develop the correlations. The key finding of this investigation is that the curve fitting-based correlation is not suitable for identifying the heart disease patient because its accuracy is low. The curve fitting method predicts with an $R^2$-value of 0.6337, a mean absolute error of 0.293, and an RMSE of 0.3688. The ANN-based correlation can identify the heart disease patient with a coefficient of determination of 0.8491, an average MAE of 0.20, and an RMSE of 0.267. The ANN-based correlation is the more accurate of the two for identifying the heart disease patient. This model can be utilized to identify the heart disease patient without the need for an angiography or computed tomography angiography test.