Detecting Congestive Heart Failure by Extracting Multimodal Features and Employing Machine Learning Techniques

The adaptability of the heart to external and internal stimuli is reflected by heart rate variability (HRV). Reduced HRV can be a predictor of negative cardiovascular outcomes. Because the controlling mechanism of the cardiovascular system exhibits nonlinear, nonstationary, and highly complex dynamics, linear HRV measures have limited capability to accurately analyze the underlying dynamics. In this study, we propose an automated system to analyze HRV signals by extracting multimodal features to capture temporal, spectral, and complex dynamics. Robust machine learning techniques, such as the support vector machine (SVM) with different kernels (linear, Gaussian, radial basis function, and polynomial), decision tree (DT), k-nearest neighbor (KNN), and ensemble classifiers, were employed to evaluate the detection performance. Performance was evaluated in terms of specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), total accuracy (TA), and area under the receiver operating characteristic curve (AUC). The highest performance was obtained using the SVM linear kernel (TA = 93.1%, AUC = 0.97, 95% CI [lower bound = 0.04, upper bound = 0.89]), followed by the ensemble subspace discriminant (TA = 91.4%, AUC = 0.96, 95% CI [lower bound = 0.07, upper bound = 0.81]) and the SVM medium Gaussian kernel (TA = 90.5%, AUC = 0.95, 95% CI [lower bound = 0.07, upper bound = 0.86]). The results reveal that the proposed approach can provide an effective and computationally efficient tool for automatic detection of congestive heart failure patients.


Introduction
Heart rate variability (HRV) signals are extracted from the electrocardiogram (ECG) [1], which is a noninvasive marker for monitoring an individual's health. The time interval between two consecutive R-peaks in an ECG is called an RR interval or interbeat interval. The analysis of variations in the interbeat intervals is called HRV analysis, which has diverse applications in various fields of clinical research to examine a wide range of cardiac and noncardiac diseases, including myocardial infarction (MI) [2], hypertension [3], sudden cardiac death (SCD) and ventricular arrhythmias [4], and diabetes mellitus (DM) [5]. A low or depressed HRV is seen in congestive heart failure (CHF) patients. It is hard to visually identify the minute variations in HRV signals because ECG signals contain noise and baseline shift. Thus, analysis of such signals using traditional methods and visual detection is challenging, inappropriate, and time-consuming. Moreover, the parameters of HRV are affected by respiration [6], instantaneous variation [7], and motion artifacts [8]. Thus, to minimize these obstacles of visual and manual interpretation, researchers have developed computer-aided diagnostic (CAD) techniques for HRV analysis.
About 26 million people are suffering from CHF around the world [9]. This is the pathophysiological condition in which the heart cannot provide enough blood to meet the body's requirements [9], resulting in a reduction in the ventricle's ability to pump blood [10]. The most common indications of CHF include dyspnea, edema, fatigue [9,10], heart valve disease, myocardial infarction (MI), and dilated cardiomyopathy [11]. CHF patients are more susceptible to sudden cardiac death [12]. Hence, CHF must be detected at the early stages. In this work, we aim to develop a system that can automatically distinguish between normal persons and CHF patients using heart rate variability (HRV) signals.
Interbeat intervals cannot be easily analyzed using visual detection, which may lead to inaccurate classification of normal and diseased subjects. In this regard, various techniques [1] have been developed for automated detection and prediction of normal and abnormal HRV signals, including the discrete wavelet transform (DWT) and empirical mode decomposition (EMD). HRV signals have been used to diagnose coronary artery disease (CAD) automatically [13]. Likewise, these signals have also been used to detect arrhythmia [14], risk of cardiovascular diseases [15], post-myocardial infarction (MI) patients, hypertension [16], diabetes [17], and sudden cardiac death [4].
Researchers [18] used time domain analysis techniques to analyze HRV signals and observed that CHF is associated with autonomic dysfunction. Frequency domain measures from the HRV signals, such as low frequency (LF), very low frequency (VLF), high frequency (HF), the LF/HF ratio, and total power, have been used for assessing cardiac autonomic control [17]. It was observed that VLF power is an independent risk predictor in CHF patients. A decrease in HRV has been observed in CHF patients in comparison to healthy persons [19]. Likewise, researchers [20] computed the standard deviation of normal-to-normal beat intervals (SDNN) and used it for discriminating normal and CHF subjects. The researchers [21] analyzed the HRV signals of low-risk patients (LRPs) and high-risk patients (HRPs) of CHF using time and frequency domain measures. It was observed that the frequency domain parameters calculated from HRV signals were low in HRPs, except the LF/HF ratio. Moreover, researchers [21] studied the dynamics of HRV in CHF patients and found lower values of the standard HRV measures, except HF power. The lower values of HRV parameters correlate with the functional severity of heart failure [21]. Kumar et al. [22] proposed an automated method to diagnose CHF using HRV signals. This method is based on the flexible analytic wavelet transform (FAWT), which decomposes the HRV signals into different sub-band signals. Further, accumulated permutation entropy (APEnt) and accumulated fuzzy entropy (AFEnt) are computed over cumulative sums of these sub-band signals. Soni et al. [23] proposed data mining techniques for predicting heart diseases. They observed that data mining techniques such as the decision tree (DT) and Bayesian network (BN) approaches outperformed other predictive methods such as KNN and neural networks. Applying a genetic algorithm to reduce the data dimension and obtain an optimal subset of attributes improved the classification accuracy of DT and BN for heart disease prediction [24].
Heart rate signals are nonlinear, nonstationary, complex, and time variant. Based on these characteristics, we extracted multimodal features from these signals and used robust machine learning to distinguish NSR and CHF subjects. We used jack-knife 10-fold cross-validation and evaluated the performance in terms of sensitivity, specificity, positive predictive value, negative predictive value, and total accuracy. Figure 1 shows a schematic diagram to illustrate the procedure used for the classification of NSR and CHF subjects by using multimodal features.

Dataset.
The RR interval time series data were taken from the Physionet databases [25]. The fluctuations in the cardiac interbeat interval (RR interval) time series data of normal sinus rhythm (NSR) subjects, congestive heart failure (CHF) subjects, and atrial fibrillation (AF) subjects were studied [25]. The data of NSR subjects were taken from 24-hour Holter monitor recordings of 72 subjects consisting of 35 men and 37 women (54 from the RR interval normal sinus rhythm database and 18 from the MIT-BIH normal sinus rhythm database). The age of the measured group was 54.6 ± 16.2 years (mean ± SD), range 20-78 years. ECG data were sampled at 128 Hz. The CHF group comprised 44 subjects, 29 men and 15 women, aged 55.5 ± 11.4 years, range 22-78 years. The data of 29 CHF subjects were obtained from the RR interval congestive heart failure data and 15 from the MIT-BIH BIDMC congestive heart failure database [25]. CHF subjects can be classified into four groups according to the New York Heart Association (NYHA) functional classification system.
This system classifies patients according to the effect of symptoms on everyday activity and on patients' quality of life. In this study, we considered 20,000 samples for all subjects, including both CHF and NSR subjects, while extracting features.

Feature Extraction.
In most classification and regression problems, the first and foremost step is to extract the most relevant features. To predict colon cancer, researchers in the past [26] extracted hybrid and geometric features. Moreover, to detect breast cancer, Dheeba et al. [27] extracted texture features. Wang et al. [28] extracted multimodal features from multiple domains, such as time domain, frequency domain, and complexity-based features, to detect epileptic seizures. This gives a unified framework that includes the advantages of the varying characteristics of EEG signals. Moreover, nonlinear dynamics based on the KD tree algorithm (fast sample entropy) provide better results than the traditional entropy methods.
To capture the temporal short-, medium-, and long-term dynamics of physiological signals and systems, we computed the time domain features from the CHF and normal subjects. For spectral dynamics, we extracted the frequency domain features. The statistical features were also computed to capture basic statistical properties of these signals. Moreover, most physiological signals are nonlinear in nature and contain complex hidden dynamics, which can best be detected using entropy-based computational features. Thus, in this study, we extracted linear features, such as time domain, frequency domain, and statistical features, and nonlinear features, such as entropy-based complexity features and wavelet entropy features, to differentiate normal subjects from CHF subjects. To judge the efficiency of the features, we applied the t-test and ROC curve analysis, as previously employed by researchers using different rank tests [29][30][31].

Linear Methods.
To measure the variability in physiological signals (i.e., EEG or ECG) affected by different pathologies, time and frequency domain methods are widely used to capture the temporal and spectral dynamics in these signals. The time domain methods capture the short-, medium-, and long-term variations present in physiological signals and systems, whereas frequency domain features are computed to capture the dynamics present in different spectra. There is evidence in the literature [32,33] of patients who suffered from different variability dysfunctions [34][35][36][37][38][39][40], including heart rate variability, breathing, depression, pulse variability, insomnia problems, and epilepsy.
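As a concrete illustration of the time domain measures discussed above and later reported in the results (SDNN, SDSD, RMSSD), the following sketch computes them from an RR-interval series with NumPy. This is a minimal implementation of the standard definitions, not the authors' code; the sample RR values are invented for demonstration.

```python
import numpy as np

def time_domain_features(rr_ms):
    """Standard time-domain HRV measures from an RR-interval series (in ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)                       # successive RR differences
    return {
        "SDNN": rr.std(ddof=1),              # overall variability
        "SDSD": diff.std(ddof=1),            # SD of successive differences
        "RMSSD": np.sqrt(np.mean(diff**2)),  # root mean square of successive diffs
        "pNN50": 100.0 * np.mean(np.abs(diff) > 50),  # % of diffs > 50 ms
    }

# Hypothetical RR intervals in milliseconds
rr = [800, 810, 790, 850, 820, 780, 805, 815]
feats = time_domain_features(rr)
```

SDANN (the SD of 5-minute segment means) would additionally require segmenting a long recording, which is omitted here for brevity.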

Nonlinear Methods.
Biological signals are the output of multiple interacting components and exhibit complicated patterns and rhythms. These rhythmical changes and patterns contain very useful hidden information for studying the underlying dynamics of these systems. It is unrealistic to extract valuable information using traditional data analysis techniques. The complexity of physiological systems arises from their structural components and the coupling among them, which is degraded with aging and disease. The following are the most commonly used complexity-based measures, as detailed in [28]. The complexity of healthy subjects computed using entropy methods is higher than that of diseased subjects. The reason is that all the structural components and the coupling functions among them in healthy subjects are properly working and connected for communication, thereby increasing their entropy values and complexity. On the other hand, the entropy and complexity of diseased subjects are reduced because of the degradation of the coupling among the structural components.

Approximate Entropy.
Pincus in 1991 proposed approximate entropy (ApEn) [41] to quantify the regularity present in time series data. This entropy measure quantifies the likelihood that similar patterns of observations will not be followed by additional similar observations. Mathematically,

ApEn(m, r, N) = Φ^m(r) − Φ^(m+1)(r),  (1)

where Φ^m(r) is the average of the logarithms of the fractions of length-m template pairs that remain within the tolerance r. To compute the approximate entropy, two parameters must be set: m, the window (template) length, and r, the similarity criterion. In this study, we chose m = 3 and r = 0.15 times the standard deviation of the data, as suggested in [41].

Fast Sample Entropy with KD Tree Approach.
Sample entropy (SampEn), as proposed by [42], is a modified form of approximate entropy. Sample entropy is more robust than approximate entropy because it is less dependent on data length and is simpler to implement.
Bentley in 1975 developed a binary tree structure known as the kd tree. Each node v is associated with a rectangle B_v. If B_v does not contain any point in its interior, v is a leaf node. Otherwise, B_v is partitioned into two rectangles by drawing a horizontal or a vertical line such that each rectangle contains at most half of the points. The computation of the kd tree algorithm is detailed in [28]. The time and space complexity is reduced using the following steps.
Step 1. Form the N − m template vectors of dimension m from the time series.
Step 2. The d-dimensional kd tree is constructed using the N − m points, for which the total cost is O(N log N) and the memory is O(N).
Step 3. Range query: for a d-dimensional kd-tree search, the time cost is N · O(N^(1−1/d)) for N queries, and the memory cost is O(N).
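The steps above can be sketched with SciPy's kd tree, which supports range queries under the Chebyshev metric (p = ∞) used by SampEn. This is a minimal illustration of the fast-sample-entropy idea, not the exact algorithm of [28]; the default m = 2 and r = 0.2 · SD here are common SampEn choices, assumed for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def sample_entropy_kdtree(x, m=2, r_factor=0.2):
    """SampEn(m, r) via kd-tree range queries, Chebyshev metric,
    self-matches excluded (query_pairs counts each pair once)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    N = len(x)

    def count_pairs(dim):
        # N - m templates of length dim (same count for m and m+1)
        templ = np.array([x[i:i + dim] for i in range(N - m)])
        tree = cKDTree(templ)
        return len(tree.query_pairs(r, p=np.inf))

    B = count_pairs(m)       # matching template pairs of length m
    A = count_pairs(m + 1)   # matching template pairs of length m+1
    return -np.log(A / B)

t = np.linspace(0, 20 * np.pi, 300)
se_sine = sample_entropy_kdtree(np.sin(t))
se_noise = sample_entropy_kdtree(np.random.default_rng(0).standard_normal(300))
```

A regular signal (the sine) should yield a markedly lower SampEn than white noise, reflecting the complexity argument made above.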

Wavelet Entropy.
Researchers in the past also computed wavelet-based entropy measures to capture the nonlinearity present in the data. The most common wavelet entropy methods [43] include Shannon, Threshold, Log Energy, Sure, and Norm. Shannon entropy [43] was employed to measure signal complexity by computing the wavelet coefficients generated from the wavelet packet transform (WPT),

BioMed Research International
where larger values indicate a more uncertain process and, therefore, higher complexity. Moreover, Rosso et al. [44] employed wavelet entropy to capture the underlying dynamical process associated with the signal. The entropy E must be an additive information cost function, such that E(0) = 0 and E(S) = Σ_i E(s_i), where S is the signal on an orthonormal basis, (s_i) are the signal coefficients, and P is the threshold, which is always greater than or equal to 0. The wentropy method was used to compute the wavelet entropy. Figure 2 depicts the flow of computing wavelet entropy by selecting different wavelet entropy functions, such as threshold, norm, sure, and log energy. The computation of the wavelet entropy packets (Shannon, norm, log energy, threshold, and sure), as reflected in equations (3)-(9), is detailed in [45][46][47].

Shannon Entropy.
In 1948, Claude Shannon first proposed the Shannon entropy [48], which is the most widely used entropy measure in the information sciences. It measures the uncertainty associated with a random variable; specifically, Shannon entropy quantifies the expected value of the information contained in a message. The Shannon entropy of a random variable X can be defined as

H(X) = − Σ_{i=1}^{n} P_i log2 P_i,

where P_i is defined in equation (3), with x_i indicating the ith possible value of X out of n symbols and P_i denoting the probability that X = x_i.
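The Shannon entropy of an empirical symbol distribution can be sketched directly from the definition above; this small helper estimates the probabilities P_i from symbol counts (a standard plug-in estimate, assumed for illustration).

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """H(X) = -sum_i P_i * log2(P_i) over the empirical distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, a two-symbol sequence with equal frequencies ("aabb") has exactly 1 bit of entropy, while a constant sequence has 0 bits.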

Wavelet Norm Entropy.
This entropy measure, proposed by [49], can be mathematically expressed as

E(s) = Σ_i |s_i|^p,

where p is the power with 1 ≤ p < 2, s is the terminal node signal, and (s_i) are the coefficients of the terminal node signal.

Threshold Entropy.
The threshold entropy counts the number of coefficients whose absolute value exceeds the threshold P. In this study, the threshold entropy was computed with the threshold set to 0.2.

Sure Entropy.
The parameter P is used as the threshold, with P ≥ 0. In this study, the Sure entropy was computed with the threshold set to 3.

Norm Entropy.
In norm entropy, p is used as the power, with p ≥ 1. The concentration in the l^p norm entropy is

E(s) = Σ_i |s_i|^p.

In this study, the norm entropy was computed with the power set to 1.1.

Log Energy Entropy.
The log energy entropy is a logarithmic sum of the squares of the probability distribution,

E = Σ_i log(P_i(B)^2),

where P_i(B) denotes the probability distribution function.
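The wavelet entropy variants above (Shannon, log energy, threshold, sure, norm) can be sketched for a single coefficient vector as follows. These follow the standard wentropy-style definitions the text cites; zero coefficients are skipped in the logarithmic forms to avoid log(0), an implementation choice assumed here.

```python
import numpy as np

def wentropy(s, kind, p=None):
    """Entropy of a coefficient vector s, per the wentropy-style
    definitions cited in the text (zeros skipped for log forms)."""
    s = np.asarray(s, dtype=float)
    nz = s[s != 0]
    if kind == "shannon":
        return -np.sum(nz**2 * np.log(nz**2))
    if kind == "logenergy":
        return np.sum(np.log(nz**2))
    if kind == "threshold":      # count of |s_i| strictly above threshold p
        return int(np.sum(np.abs(s) > p))
    if kind == "sure":
        n = len(s)
        return n - np.sum(np.abs(s) <= p) + np.sum(np.minimum(s**2, p**2))
    if kind == "norm":           # l^p concentration, 1 <= p < 2
        return np.sum(np.abs(s)**p)
    raise ValueError(f"unknown entropy kind: {kind}")
```

In the study's setting, `s` would be the wavelet packet coefficients of an HRV segment, with `p` = 0.2 (threshold), 3 (sure), and 1.1 (norm) as stated above.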

Classification
Classification is the process of categorizing samples based on the extracted features. In machine learning, there are different types of learning techniques, such as supervised, unsupervised, and reinforcement learning. Researchers in the past employed robust machine learning classifiers, such as the support vector machine (SVM), decision tree (DT), K-nearest neighbors (KNN), Naïve Bayes, and ensemble classifiers, to detect and predict colon cancer [26,50]. Thus, in this study, we employed supervised learning based on labeled class data, including the support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN), and ensemble classifiers.
3.1. Support Vector Machine. The support vector machine (SVM) is one of the most important supervised learning techniques and is widely used for classification. SVMs have recently been applied to problems in pattern recognition [51], medical analysis [52,53], and machine learning [54]. Furthermore, SVMs are used in many other fields, such as object detection and recognition, text recognition, content-based image retrieval, biometric systems, and speech recognition. An SVM constructs a single hyperplane, or a set of hyperplanes, in a high-dimensional or infinite-dimensional space, which can then be used for classification. The algorithm seeks the hyperplane that has the greatest distance to the nearest training points of any class; in SVM theory, this distance is termed the margin. A larger margin usually yields a lower generalization error of the classifier.
There are additional significant characteristics of SVMs that provide better generalization results. The SVM is fundamentally a binary classifier that maps the data into a higher-dimensional space when the data are nonlinear. The SVM hyperplane, margin maximization, and the kernel trick, as reflected in equations (10)-(17), are detailed in [55][56][57].
Let us express a hyperplane by

w · x + b = 0,

where w is the normal vector to the hyperplane. Data that are linearly separable are labeled such that

y_i (w · x_i + b) ≥ 1, i = 1, ..., N,

where y_i ∈ {−1, +1} is the two-class SVM class label. The boundary with the greatest margin is obtained by maximizing the margin 2/||w||, i.e., by minimizing the objective

E = ||w||^2 / 2,

subject to the set of inequalities y_i (w · x_i + b) ≥ 1. When the data are not linearly separable, a slack variable ξ_i is introduced to represent the amount of misclassification, as reflected in Figures 3(a) and 3(b). The objective function in this case can be defined as

min ||w||^2 / 2 + C Σ_i L(ξ_i),

subject to y_i (w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0. On the right-hand side, the first term is the regularization term, which gives the SVM the ability to generalize on sparse data, whereas the second term represents the empirical risk for the points that lie within the margin or are misclassified. Here, L represents the cost function, and C denotes the hyperparameter that trades off minimizing the empirical risk against maximizing the margin. A linear error cost function is used to handle outliers. The dual formulation is

max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j),

subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0, where α = (α_1, α_2, α_3, ..., α_i) is the set of Lagrange multipliers of the constraints in the primal optimization problem. The optimal decision boundary is now given by

f(x) = sign(Σ_i α_i y_i (x_i · x) + b).

For nonlinear problems, a nonlinear mapping function transforms the input space into a higher-dimensional feature space. Thus, the dot product between two vectors in the input space is replaced by a kernel function in the feature space. The commonly used kernel functions are as follows.

SVM Fine Gaussian (RBF) Kernel
The polynomial kernel is K(x_i, x_j) = (x_i · x_j + 1)^n, and the Gaussian (RBF) kernel is K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2)), where n is the order of the polynomial kernel and σ is the width of the RBF. The dual formulation for the nonlinear case is obtained by replacing the dot product x_i · x_j in the dual problem with K(x_i, x_j), subject to the same constraints. The performance of SVM classifiers depends on several parameters. One well-known tuning method is the grid search, which selects the optimal parameter values by carefully setting the grid range and step size. The linear kernel has only one parameter, the soft-margin constant c, which represents the cost of constraint violation for a data point lying on the wrong side of the decision boundary. The SVM with a Gaussian or RBF kernel has two training parameters: the cost c, which controls overfitting of the model, and sigma (σ), which controls the degree of nonlinearity of the model. In this study, we used the default values of both the cost parameter and sigma. For SVM fine Gaussian, the default kernel scale was 0.61; for medium Gaussian, the kernel scale was 2.4; and for coarse Gaussian, the kernel scale was 9.8.
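A minimal sketch of training SVMs with the kernel families discussed above, using scikit-learn with default `C` and kernel-scale (`gamma`) values, in the spirit of the study's default-parameter setting. The synthetic two-class data here merely stand in for the NSR/CHF feature matrix; the sample count of 116 mirrors the 72 + 44 subjects, but the data themselves are invented.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the NSR/CHF feature matrix (116 subjects assumed)
X, y = make_classification(n_samples=116, n_features=10, random_state=0)

# Kernels analogous to those in the study; C and gamma left at their
# library defaults, as the text reports default values were used
accs = {}
for kernel in ("linear", "rbf", "poly"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    accs[kernel] = cross_val_score(clf, X, y, cv=10).mean()  # 10-fold CV
```

A grid search over `C` and `gamma` (e.g., with `GridSearchCV`) would replace the defaults when tuning, as the grid-search method described above suggests.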

Decision Tree (DT).
The DT classifier examines the similarity in the given dataset and classifies it into separate classes. Decision trees build classifiers by choosing, at each step, the feature that best divides the data. These attributes are separated into different branches until the stopping criteria are met. The mathematical formulations for equations (23) and (24) are described in [59]. The decision tree classifier is based on a supervised learning technique, which uses a recursive approach that divides a dataset in order to reach a similar classification of a goal, as shown in Figure 4.
Mathematically, the following formulation is used to compute the DT: m denotes the available number of observations, n denotes the number of independent variables, S denotes the m-dimensional vector of the variable predicted through X, and X_i is the ith element of the n-dimensional independent variables. The independent variable vector is X_i = (x_i1, x_i2, ..., x_in)^T, where T denotes transposition.
The main aim of the DT is to predict S by using X. From X, different DTs may be constructed with different levels of accuracy and correctness; however, finding an optimum DT is challenging because the search space has a large dimension.
To find the trade-off between correctness and complexity for decision trees, appropriate algorithms can be created. In this situation, a sequence of locally optimum decisions about the feature parameters is used by the DT algorithms to partition the dataset X. The optimum DT, T_k0, is created according to the following optimization problem,
where R(T_k) symbolizes the misclassification error of tree T_k, and T_k0 indicates the optimum decision tree that minimizes the misclassification error among the binary trees T_k ∈ {T_1, T_2, T_3, ..., T_K}. The tree index is represented by k; t stands for a node of the tree; t_1 stands for the root node; r(t) is the resubstitution error that misclassifies node t; p(t) represents the probability that any case falls into node t; and T_L and T_R represent the left and right subtrees of the partition. The tree T is built by partitioning the feature space.
Most classification problems with large datasets are complex and contain errors, and the decision tree algorithm is most appropriate in these situations. The decision tree takes objects as input and gives a yes/no decision as output. Decision trees use sample selection [60] and can also represent Boolean functions [61]. Decision trees are quick and effective methods for large classification datasets and provide strong decision support capabilities. There are many applications of DTs, such as medical problems and economic and other scientific situations [62].
There are several parameters that can be used to tune the decision tree. In this study, we used the default parameters to obtain a baseline. The min-samples-per-leaf node was set to 1 by default, which can make a tree overfit by learning from all the data points, including outliers. Another parameter is the maximum depth of the tree, which indicates how deep the tree can grow. A deeper tree has more splits and can capture more information about the data. The decision tree in this study was fit with depths ranging from 1 to 32. Another important parameter is the minimum number of samples required to split an internal node. This ranges from considering at least one sample at each node to considering all samples at each node. Increasing this parameter makes the tree more constrained, because it must consider more samples at each node. In this study, we varied this parameter from 10% to 100% of the samples. A similar approach was adopted for the minimum samples per leaf.
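The depth sweep described above can be sketched as follows with scikit-learn, again on synthetic stand-in data (the dataset and the choice of 10-fold cross-validation for scoring each depth are assumptions for illustration, not the authors' exact protocol).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted HRV feature matrix
X, y = make_classification(n_samples=116, n_features=10, random_state=0)

# Sweep max_depth over 1..32 as described; other parameters left at defaults
scores = {
    d: cross_val_score(DecisionTreeClassifier(max_depth=d, random_state=0),
                       X, y, cv=10).mean()
    for d in range(1, 33)
}
best_depth = max(scores, key=scores.get)  # depth with highest mean CV accuracy
```

`min_samples_split` and `min_samples_leaf` can be swept the same way, e.g., as fractions from 0.1 to 1.0 of the samples, matching the 10%-100% range mentioned above.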

K-Nearest Neighbor (KNN).
In the fields of pattern recognition, machine learning, and related areas, the K-nearest neighbor algorithm is regularly used. KNN is a nonparametric method used for both classification and regression problems. In both cases, the input consists of the k closest training samples in the feature space. The output depends on whether KNN is used for regression or classification. For the KNN classification method, the output is a class membership. An object is classified by a majority vote of its neighboring data points, with the object being assigned to the class that is most common among its K nearest neighbors (where K is a positive integer, typically small). If K = 1, then the object is simply assigned to the class of that single nearest neighbor.
We used the default parameters during training/testing of the data using the KNN algorithm. KNN was used for classification problems in [63]. KNN is also termed a lazy learning algorithm: a classifier is not constructed immediately; instead, all training data samples are stored and held until new observations need to be classified. This characteristic distinguishes lazy learning from eager learning, which builds a classifier before new observations need to be classified. It was shown in [64] that KNN is also advantageous when the data are dynamic and must be updated rapidly. Different distance metrics can be employed for KNN. The following are the steps of the algorithm, in which the Euclidean distance formula, reflected in equation (25), is used (also described in [65]).
Step I. In the first step, prepare the framework and provide the feature space to KNN.
Step II. Find the distance using the Euclidean distance formula, d(x, y) = sqrt(Σ_{j=1}^{n} (x_j − y_j)^2).
Step III. Sort the calculated distances so that d_i ≤ d_{i+1}, where i = 1, 2, 3, ..., k.
Step IV. According to the nature of the data, apply an appropriate averaging or voting (polling) scheme.
Step V. The value of K (i.e., the number of nearest neighbors) depends on the volume and nature of the data delivered to KNN. For smaller datasets, the value of k is kept small, and for large datasets, the value of k is kept large.
In this study, we selected K � 3, distance metrics as Euclidean distance, and distance weight as equal weight.
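The steps above, with the study's settings (K = 3, Euclidean distance, equal weights), can be sketched in a few lines; the tiny training set here is invented for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Plain K-nearest-neighbour majority vote with Euclidean distance
    and equal weights (K = 3, matching the settings in the text)."""
    X_train = np.asarray(X_train, dtype=float)
    # Step II: Euclidean distance from the query point to every sample
    d = np.sqrt(((X_train - np.asarray(x, dtype=float)) ** 2).sum(axis=1))
    # Step III: indices of the k smallest distances
    nearest = np.argsort(d)[:k]
    # Step IV: equal-weight majority vote among the k neighbours
    labels, votes = np.unique(np.asarray(y_train)[nearest], return_counts=True)
    return labels[np.argmax(votes)]

# Hypothetical 2-D feature vectors for two classes
X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = [0, 0, 1, 1]
```

For example, a query near the origin is voted into class 0, and one near (5, 5) into class 1.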

Ensemble Classifiers.
The ensemble classifiers comprise a set of individually trained classifiers whose predictions are combined, using different approaches, when classifying novel instances. These learning algorithms construct a set of classifiers and classify new data points by taking a weighted vote of their predictions. Based on these capabilities, such algorithms have successfully been used to enhance prediction power in a variety of applications, such as predicting signal peptides [66], protein subcellular location, and enzyme subfamily classes [67]. In many applications, ensemble classifiers give relatively better performance than individual classifiers. The researchers [68] reported that individual classifiers can produce different errors during classification; these errors can be minimized by combining the classifiers, because the error produced by one classifier can be compensated by another.
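The error-cancellation idea above can be sketched with a bagged-tree ensemble, one of the ensemble types used in the study (bagged tree). Each tree is trained on a bootstrap sample and the ensemble votes; the synthetic data and the choice of 30 estimators are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the NSR/CHF feature matrix
X, y = make_classification(n_samples=116, n_features=10, random_state=0)

# Bagged trees: each tree sees a bootstrap sample; the ensemble votes,
# so errors made by one tree can be compensated by the others
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                        random_state=0)
acc = cross_val_score(bag, X, y, cv=10).mean()
```

Subspace-discriminant and boosted ensembles (also used in the study) differ mainly in how the member classifiers and their training sets are chosen, not in this vote-combining principle.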

Performance Evaluation Measures.
To detect CHF, the following measures were used: the true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), total accuracy (TA), and area under the receiver operating characteristic curve (AUC), as depicted in equations (21)-(25) and detailed in [69,70].

True Positive Rate (TPR).
The TPR measure, also known as sensitivity or recall, is the proportion of people who test positive for the disease among those who actually have the disease. Mathematically, it is expressed as TPR = True Positive / Condition Positive, i.e., the probability of a positive test given that the patient has the disease.

True Negative Rate (TNR).
The TNR measure, also known as specificity, is the proportion of negatives that are correctly identified. Mathematically, it is expressed as TNR = True Negative / Condition Negative, i.e., the probability of a negative test given that the patient is well.

Positive Predictive Value (PPV).
PPV is mathematically expressed as PPV = TP / (TP + FP), where TP denotes the event that the test makes a positive prediction and the subject has a positive result under the gold standard, while FP is the event that the test makes a positive prediction and the subject has a negative result.

Negative Predictive Value (NPV).
NPV can be computed as NPV = True Negative / Predicted Condition Negative, where TN indicates that the test makes a negative prediction and the subject also has a negative result, while FN indicates that the test makes a negative prediction and the subject has a positive result.

Total Accuracy (TA).
The total accuracy is computed as TA = (TP + TN) / (TP + TN + FP + FN).
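The evaluation measures above reduce to a few ratios of confusion-matrix counts; a minimal sketch (the counts in the usage example are hypothetical, though the total of 116 mirrors the 44 + 72 subjects):

```python
def classification_metrics(tp, fn, fp, tn):
    """TPR, TNR, PPV, NPV and total accuracy from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                  # sensitivity / recall
        "TNR": tn / (tn + fp),                  # specificity
        "PPV": tp / (tp + fp),                  # positive predictive value
        "NPV": tn / (tn + fn),                  # negative predictive value
        "TA":  (tp + tn) / (tp + tn + fp + fn), # total accuracy
    }

# Hypothetical counts: 44 CHF (positives) and 72 NSR (negatives)
m = classification_metrics(tp=40, fn=4, fp=5, tn=67)
```

AUC, by contrast, requires the classifier's continuous scores rather than hard predictions and is therefore not derivable from these four counts alone.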

The 95% Confidence Interval (CI).
For the mean μ_X, a common confidence interval is the 95% CI. For normally distributed sample means, the z-statistics z1 and z2 are such that P(z1 < Z < z2) = 0.95. The margin of error is computed by multiplying the critical value z2, denoted by Z*, by the standard deviation of the sample mean, σ_X̄ = σ_X / √n. That is, the margin of error is Z* · σ_X / √n. The lower and upper bounds, as reflected in equations (31) and (32), are detailed in [71,72].

Lower Bound (LB) of 95% CI.
The lower bound of the 95% CI for μ_X is computed by subtracting the margin of error from the point estimate X̄: LB = X̄ − Z* · σ_X / √n.

Upper Bound (UB) of 95% CI.
The upper bound of the 95% CI for μ_X is computed by adding the margin of error to the point estimate X̄: UB = X̄ + Z* · σ_X / √n.
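The two bounds above can be sketched together; Z* = 1.96 is the standard critical value for a 95% CI under the normal approximation, and the inputs in the usage line are hypothetical.

```python
import math

def ci95(xbar, s, n, z_star=1.96):
    """95% CI for the mean: xbar +/- Z* * s / sqrt(n) (normal approximation)."""
    margin = z_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 10, SD 2, n = 100 -> margin 0.392
lb, ub = ci95(10.0, 2.0, 100)
```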

Results
In this study, we extracted multimodal features, such as time domain, frequency domain, statistical, and complexity-based features, from congestive heart failure (CHF) and normal sinus rhythm (NSR) subjects. We computed the performance based on single features and hybrid features. Robust machine learning classification methods, such as decision tree (DT), support vector machine (SVM) and its kernels, K-nearest neighbors (KNN), and ensemble methods, were employed. The performance was computed using the true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), total accuracy (TA), and area under the receiver operating characteristic curve (AUC). Performance based on single features is reflected in Tables 1-4, whereas performance based on combinations of features is reflected in Figures 5-7.

We extracted the time domain features, such as SDANN, SDNN, SDSD, and RMSSD, and applied machine learning classifiers such as decision tree (DT); support vector machine (SVM) with its linear, quadratic, cubic, and medium Gaussian kernels; K-nearest neighbor (KNN) with fine, medium, and cosine KNN; and ensemble classifiers such as bagged tree, subspace discriminant, and RUSBoosted tree, as reflected in Table 1.

By extracting the frequency domain features, such as TP, ULF, VLF, LF, HF, and LF/HF, from CHF and normal subjects, as reflected in Table 2, we applied different machine learning classifiers to distinguish these conditions. Using the decision tree, the highest detection performance was obtained with coarse DT, with TA (81.9%) and AUC (0.81), followed by fine DT with TA (80.2%) and AUC (0.84). Using SVM, the highest detection accuracy was obtained using SVM medium Gaussian with TA (85.

Figure 5: Heart failure rate detection performance using decision tree and KNN methods.
To discriminate the CHF from normal subjects, we extracted statistical features such as RMS, variance, skewness, smoothness, and kurtosis, as reflected in Table 3, and applied robust machine learning techniques. Based on the decision tree, the highest detection performance was obtained using coarse DT with TA (77.6%), AUC (0.

The entropy-based features were computed based on complexity measures, such as sample entropy using the KD tree approach, approximate entropy, and wavelet entropy measures (Shannon, threshold, log energy, sure, and norm); machine learning classifiers such as DT, SVM, KNN, and ensemble classifiers were then applied, as reflected in Table 4. By applying the decision tree, the highest detection performance was obtained using coarse DT with TA (69.8%), AUC (0.65), followed by fine DT with TA (62.9%), AUC (0.65). Likewise, using SVM, the highest detection accuracy was obtained using SVM quadratic with TA (73.3%), AUC (0.74), followed by SVM cubic with TA (70.7%), AUC (0.73); SVM medium Gaussian with TA (69.8%), AUC (0.75); and SVM linear with TA (69.0%), AUC (0.71). By applying KNN, the highest detection performance was obtained using medium KNN with TA (71.6%), AUC (0.69).

Based on a combination of features, the detection performance using DT and KNN is shown in Figure 5, and the performance using SVM and ensemble methods (including the ensemble RUSBoosted tree) is shown in Figure 6.

Figure 6: Heart failure rate detection performance using SVM and ensemble methods.

Figure 7 depicts the heart failure rate detection performance using the area under the receiver operating characteristic (ROC) curve. Multimodal features based on entropy methods, wavelets, statistical, time, and frequency domain features were extracted from congestive heart failure and normal subjects. Based on the combined features, the highest AUC was obtained using SVM RBF with AUC (0.9359), followed by SVM Gaussian with AUC (0.9293), Naïve Bayes and decision tree with AUC (0.9287), and SVM polynomial with AUC (0.9258).
The AUC values based on the single features are reflected in Tables 1-4.
In Figures 8 and 9, the blue color denotes the means of the CHF subjects and the red color denotes the NSR subjects. The lines denote the correctly classified subjects, while x denotes the incorrectly classified samples using the SVM linear and quadratic kernels. There is a total of 44 CHF subjects and 72 NSR subjects. SVM with the linear kernel provides the highest performance with accuracy (93.1%), with the quadratic kernel having more incorrectly classified results than the SVM linear kernel, as reflected in Figure 9.
We computed the mean ± std for CHF and normal subjects by extracting different time domain, frequency domain, statistical, and entropy-based features, as reflected in Table 5. To discriminate these subjects, the P-value is reflected in the last column. All the extracted features provided highly significant results for discriminating the CHF subjects from the NSR subjects. The significance level is represented by *** (10^−100 < P-value < 10^−50), ** (10^−49 < P-value < 10^−25), and * (10^−24 < P-value < 0.01). Most of the computed features give higher mean values for NSR than for CHF subjects. The lowest standard deviations
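The moment-based statistical features listed in Table 3 can be sketched in plain Python as follows; population (biased) estimators are our assumption, and "smoothness" is omitted because its definition is not specified in this excerpt:

```python
import math

def statistical_features(x):
    """RMS, variance, skewness, and kurtosis of a signal x."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3)   # third moment
    kurt = sum((v - mean) ** 4 for v in x) / (n * var ** 2)   # fourth moment
    return {"RMS": rms, "variance": var, "skewness": skew, "kurtosis": kurt}
```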

Discussion
The dynamics of heart signals are highly complex and nonlinear in nature. The temporal dynamics present in heart rate variability, based on short-, medium-, and long-term variations, are best captured by extracting time domain features. Heart failure dynamics can also be captured by extracting spectral components, which are computed using frequency domain features. The complex dynamics of a dynamical system can be measured based on its structural components and the coupling among them. The complexity degrades when any of the structural or functional components is lost.
This loss of complexity also results from pathological conditions and aging.
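To make the spectral idea concrete: with the RR series evenly resampled (4 Hz is a common choice), band powers in the conventional LF (0.04-0.15 Hz) and HF (0.15-0.4 Hz) ranges can be estimated with a naive DFT. This is an illustrative sketch under those assumptions, not the study's implementation:

```python
import math

def band_power(x, fs, f_lo, f_hi):
    """Power of x (evenly sampled at fs Hz) in the band [f_lo, f_hi) Hz."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]              # remove DC before the transform
    power = 0.0
    for k in range(1, n // 2):
        f = k * fs / n                      # frequency of DFT bin k
        if f_lo <= f < f_hi:
            re = sum(xc[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
            im = sum(xc[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
            power += 2.0 * (re * re + im * im) / (n * n)
    return power
```

The LF/HF ratio then follows directly, e.g. `band_power(x, 4.0, 0.04, 0.15) / band_power(x, 4.0, 0.15, 0.40)`.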
Recently, Kumari et al. [73] concluded that patients with coronary heart disease and diabetes mellitus show significant improvements in clinical symptoms and quality of life. They employed SVM with a radial basis function kernel and decision support systems to predict heart rate variability [74]. The results obtained using these methods showed good detection performance. The classification accuracy, sensitivity, and specificity of SVM with the RBF kernel have been found to be high, making it a good option for diagnosis [75].
Based on the varying dynamics of physiological systems, researchers have employed different feature extraction methods. Wang et al. [28] extracted discrete wavelet transform (DWT), nonlinear, and multidomain features. Recently, Hussain et al. [76] extracted multimodal features to detect arrhythmia and applied machine learning techniques. Data on CHF and NSR ECG signals were taken from the Beth Israel Deaconess Medical Center (BIDMC) CHF database and the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia database, respectively. By extracting the frequency domain features (TP, ULF, VLF, LF, HF, and LF/HF), the highest detection performance was obtained using SVM cubic with total accuracy (80.3%) and AUC (0.76). By extracting entropy-based features (sample entropy with KD tree; approximate entropy; and the wavelet entropies Shannon, threshold, sure, log energy, and norm), the highest detection accuracy was obtained using SVM medium Gaussian and fine KNN with total accuracy (100%) and AUC (1.0). Likewise, by extracting time domain and statistical features (SDANN, SDNN, SDSD, RMSSD, RMS, variance, skewness, kurtosis, and smoothness), the highest arrhythmia detection performance was obtained using fine KNN with total accuracy (100%) and AUC (1.0), followed by ensemble bagged tree and subspace discriminant with total accuracy (98.5%) and AUC (0.99 and 1.0, respectively). Moreover, by extracting the entropy-based features, the highest detection performance was obtained using SVM medium Gaussian, fine KNN, and ensemble subspace discriminant with sensitivity (100%), specificity (100%), total accuracy (100%), and AUC (1.0), followed by SVM cubic and medium KNN with sensitivity (98%), specificity (100%), total accuracy (98.5%), and AUC (1.0). Most recently, Tripathy et al.
[77] used a similar dataset, extracting time-frequency entropy features and applying a hybrid classifier with mean metric (HCMM); the highest detection accuracy was obtained with sensitivity (98.48%), specificity (99.09%), and accuracy (98.78%). The results reveal that our approach of multimodal features from the time domain, frequency domain, statistical, and entropy-based features gives higher detection performance than the feature extraction and classification approach employed by [77] for a similar dataset.
Recently, many studies have provided different methods to discriminate CHF patients from normal subjects. Isler et al. [78] proposed a structure of multistage classifiers for discriminating CHF patients and obtained a specificity of 98.1% and a sensitivity of 100%. A recent study [74] investigated the effect of the number of folds on discriminating patients with CHF from normal subjects using five popular classifiers. It was shown that average performance was enhanced and the variability of performance was decreased when the number of data sections used in the cross-validation method was increased. The highest performance was obtained using KNN with the leave-one-out (LOO) method, with accuracy (80.9%), sensitivity (52.1%), and specificity (96.3%).
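The fold-count effect discussed in [74] is straightforward to experiment with. A minimal k-fold index generator might look like the following (our sketch; the cited studies' exact partitioning scheme, e.g. whether it was stratified by class, is not specified here):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle n sample indices and return k disjoint (train, test) splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # round-robin fold assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds if f is not folds[i] for j in f]
        splits.append((train, test))
    return splits
```

With k equal to the number of samples, this reduces to the leave-one-out (LOO) scheme mentioned above.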
Narin et al. [79] investigated statistical feature selection methods to improve classifier performance on CHF using HRV analysis. Isler and Kuntalp [80] investigated the effect of heart rate normalization on classifier performance for CHF patients using HRV analysis. They employed KNN with and without HR normalization by selecting K = 1, 3, 5, 7, 9, 11, and 13, with a maximum performance of 93.98%. Isler and Kuntalp [81] showed the importance of wavelet-based features in the diagnosis of CHF using HRV signals. They obtained the highest discriminating powers in terms of sensitivity and specificity. Other researchers [82] employed different machine learning techniques. Table 6 reflects the findings of previous studies.

The present study aimed to analyze the dynamics of heart rate variability with a multimodal feature extraction strategy and robust machine learning techniques. We extracted time domain features (to capture short-, medium-, and long-term variations), frequency domain features (to capture spectral components), and entropy features (to capture complex dynamics), and applied machine learning classifiers such as support vector machine (SVM) and its kernels, decision tree (DT), K-nearest neighbor (KNN), and ensemble classifiers. Coarse DT gives the highest performance with TPR (85%), and fine DT with PPV (88%). SVM linear gives performance with TA (93%), TPR (96%), and AUC (0.97), and SVM cubic with TPR (97%), PPV (94%), TA (89.7%), and AUC (0.91). Moreover, medium KNN gives TPR (99%), PPV (96%), TA (81%), and AUC (0.92). The ensemble subspace discriminant gives TPR (93%), PPV (89%), TA (91.4%), and AUC (0.96). The results reveal that extracting multimodal features based on time variations, temporal dynamics, and complex dynamics can improve the early detection of heart failure and the survival rate.
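All of the reported metrics follow from the four confusion-matrix counts; the sketch below shows the standard definitions (the example counts in the test are hypothetical, chosen only to be consistent with a 44 CHF / 72 NSR cohort, not taken from the paper's tables):

```python
def performance_metrics(tp, fn, tn, fp):
    """TPR, TNR, PPV, NPV, and total accuracy from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                  # sensitivity (recall)
        "TNR": tn / (tn + fp),                  # specificity
        "PPV": tp / (tp + fp),                  # positive predictive value
        "NPV": tn / (tn + fn),                  # negative predictive value
        "TA": (tp + tn) / (tp + tn + fp + fn),  # total accuracy
    }
```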

Conclusion
Heart rate variability analysis is a noninvasive tool for assessing the cardiac autonomic control of the nervous system. Various kinds of defects can be detected by analyzing the oscillations between consecutive heartbeats. The analysis of HRV is the subject of clinical studies investigating a wide spectrum of cardiological and noncardiological diseases and clinical conditions. A depressed HRV has also been observed in patients suffering from other conditions, such as dilated cardiomyopathy. In this study, we aimed to discriminate CHF patients from normal subjects by extracting multimodal features. We extracted time domain, frequency domain, statistical, and entropy-based features from CHF and normal subjects and employed robust machine learning techniques. A 10-fold cross-validation was applied for training and testing data validation. The performance was evaluated in terms of sensitivity, specificity, PPV, NPV, TA, and AUC. We evaluated the CHF detection performance based on single and hybrid features. The highest performance using a decision tree was obtained with sensitivity (82%), specificity (82%), and accuracy (81.9%). Using SVM, the highest detection performance was obtained with SVM linear, with sensitivity (96%), specificity (89%), and accuracy (93.1%). Moreover, using the ensemble methods, the highest detection performance was obtained using subspace discriminant, with sensitivity (93%), specificity (89%), and accuracy (91.4%). The results reveal that considering temporal, spectral, and nonlinear dynamics improves CHF detection performance, which can be very helpful in the early diagnosis and prognosis of heart failure patients.
In the present study, we extracted multimodal features from CHF and NSR subjects and employed machine learning techniques to detect congestive heart failure. In the future, we will extract features by considering the clinical information of patients and the severity levels of congestive heart failure classes. We will also apply deep convolutional neural networks (CNNs) using a transfer learning approach with pretrained networks, such as GoogleNet, AlexNet, and Inception V3, as CNNs do not depend on handcrafted features and can be fine-tuned. These directions will enable more detailed and comprehensive studies for further performance improvement.
Data Availability
The data are publicly available on PhysioNet.

Conflicts of Interest
The authors declare that they have no conflicts of interest.