Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

The paper presents results of machine learning approach accuracy applied analysis of cardiac activity. The study evaluates the diagnostics possibilities of the arterial hypertension by means of the short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from the arterial hypertension of II-III degree. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, in the study, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of noncorrelated feature set search achieved higher results than data set based on the principal components.


Introduction
According to the World Health data, hypertension affects more than 1 billion people worldwide. Many factors can conduce to hypertension, including occipital stress and job strain [1]. One of the main problems concerning the treatments for the arterial hypertension conditions is the late detection for apparently healthy people. Some studies had been shown that among the individuals with hypertension, more than 35% were unaware of their condition [2].
The heart rate variability (HRV) is among one of the widely used biomedical signals, due to ease of record the electrical heart activity [3]. The HRV analysis can be applied in the task of arterial hypertension diagnostics, since it is well known that various features of the HRV reflect behavior of the different modules of the autonomic nervous system (ANS) [4].
Common HRV analysis implies the application of a variety of analysis methods: statistical, spectral, and nonlinear analysis. Generally, in a single study, a limited number of features are extracted. Such as in [5], 13 nonlinear features were studied for efficacy of stress state detection. The current paper comprises a study of 53 different features. Usually during one study, feature sets of a particular method are used. For example in [6], sets of time-domain features, nonlinear features, and spectral features were studied separately for automatic sleep staging by means of HRV signal analyses. In this study, combinations of different methods were analyzed.
The common uses of machine learning approaches for condition classification based on HRV information imply usage of several available methods: support vector machine (SVM), discriminant analysis (DA), and ordinal pattern statistics (OPS) [7][8][9]. However, the selection of a particular approach is not always justified. In current study, linear and quadratic DA, SVM, k-nearest neighbors, decision trees, and Naïve Bayes approaches were investigated.
In one of the previous works, the investigation of the linear and quadratic discriminant analysis was carried out, implying the study of arterial hypertension diagnostic using single features of short-term HRV signals. In that work, the evaluation of the features and the evaluation of the classifier efficacy were carried out by means of an inhouse software produced in MATLAB [10]. In the present paper, the machine learning methods were implemented in the python.
In summary, the goal of the present work is to study the efficacy of different machine learning approaches for diagnostic of the arterial hypertension by means of the short-term HRV, using combinations of statistical, spectral (Fourier and Wavelet transforms), and nonlinear features. By applying feature combinations of different methods, we aim to build more robust and accurate classifiers. Participants of this study were 30 healthy volunteers and 41 patients suffering from the arterial hypertension of II and III degree. The electrocardiography (ECG) signals were recorded in two functional states: functional rest (state F) and passive orthostatic load (state O). The length of the signal in the mentioned state was about 300 seconds. The HRV signals were consequently derived from the ECG signals automatically by the "Encephalan-131-03" software. Figure 1 presents diagram of the functional states.

Heart Rate Variability Features.
Prior to the processing, the original time series were cleaned from the artifacts. By artifacts, in this study, we considered values of the R-R intervals that differed from the HR mean by more than three standard deviations. NN is the abbreviation for the "normal to normal" time series, that is, without artifacts. Among all studied time series, less than 2% of data was removed. For spectral and multifractal analyses, NN time series were interpolated using cubic spline interpolation with the 10 Hz sampling frequency.
The feature dataset is the same as used previously [11], where it was shown that features of the HRV signals recorded in the state O have better classification accuracy for arterial hypertension diagnostics. Therefore, in this study, we analyze data only in the state O. The used features were separated into statistical, geometric, spectral (Fourier based), wavelet, nonlinear, and multifractal. Their description will be given below. (i) M is the mean value of the R-R intervals after artifact rejection: where N is the number of elements in the NN and NN i is the ith element in R-R time series.
(ii) HR, the heart rate, is an inverse ratio to M: (iii) SDNN is the standard deviation of the NN intervals: (iv) CV, the coefficient of variation, is defined as ratio of standard deviation SDNN to the mean M, expressed in percent; (v) RMSSD is the square root of mean of squares of differences between successive elements in NN; (vi) NN50 is the number of pairs of successive elements in NN that differ by more than 50 ms [12].

Geometric
Features. The geometric methods analyze the distribution of the R-R intervals as a random numbers. The common features of these methods are as follows: (i) M 0 , the mode, is the most frequent value in the R-R interval. In case of the normal distribution, the mode is close to the mean M.
(ii) VR, the variation range, is the difference between the lowest R-R interval and the highest R-R interval in the time series. VR shows variability of the R-R interval values and reflects activity of the parasympathetic system of the ANS.
(iii) AM 0 , the amplitude of the mode, is a number of the R-R intervals that correspond to the mode value. AM 0 shows the stabilizing effect of the heart rate management, mainly caused by the sympathetic activity [12].
The following indexes are derived from common geometric features: (i) SI, the stress index, reflects centralization degree of the heart rate and mostly characterizes the activity of the sympathetic department of the ANS: (ii) IAB, the index of the autonomic balance, depends on the relation between activities of the sympathetic and parasympathetic department of the ANS: (iii) ARI, the autonomic rhythm index, shows parasympathetic shifts of the autonomic balance: smaller values of the ARI correspond to the shift of the autonomic balance to the parasympathetic activity: (iv) IARP, the index of adequate regulation processes, reflects accordance of the autonomic function changes of the sinus node as a reaction of the sympathetic regulatory effects on the heart: , and ultralow frequency-ULF (lower than 0.003 Hz) [12,13]. For smaller than 300 seconds, shortterm time series ULF spectral component is not analyzed. HF spectral component characterizes activity of the parasympathetic system of the ANS and activity of the autonomic regulation loop. High frequencies of the heart rate in HRV spectrum are associated with the breathing and determined by the connection and influences of the vagus with the sinus node.
LF spectral component mainly characterizes activity of the sympathetic vascular tone regulation center. Low frequencies reflect modulation of the heart rate by the sympathetic nervous system [4].
VLF spectral component is defined by the suprasegmental regulation of the heart rate, as the amplitude of the VLF waves and is related to the psycho-emotional strain and functional state of the cortex. The genesis of the very low frequencies is still the matter of debates. Most likely, it is influenced by the suprasegmental centers of the autonomic regulation that generates slow rhythms. These rhythms are directed to the heart by the sympathetic nervous system, humoral factors on the sinus node. Biologic rhythms in the same frequency band are connected with the mechanisms of the thermoregulation, fluctuations of the vascular tone, the renin activity, and the secretion of the leptin [14]. The similarity of the frequencies implies the participation of these mechanisms in the genesis of the VLF spectral component. There are evidences of the increase of the VLF activity in case of the central nervous system damage, anxiety, and depression disorders [15].
The studied quantitative features of spectral analysis are (i) spectral power of the HF, LF, and VLF components, (ii) total power of the spectrum-TP, (iii) normalized values of the spectral components by the total power-HF n , LF n , and VLF n,

10
(iv) the LF/HF ratio, also known as the autonomic balance exponent, (v) IC, the index of centralization, (vi) IAS, the index of the subcortical nervous centers activation, (vii) HF max , the maximal power of the HF spectral components, (viii) RF, the respiration frequency, frequency that corresponds to the HF max [16].

Wavelet Transform.
For nonstationary time series, one can also use the wavelet transform (wt), to simultaneously study time-frequency patterns. The general equation for continuous wavelet transform is as follows: where a is the scale, b is the shift, ψ is the wavelet basis, and s t is the analyzed signal [17]. Moreover, the connection between the scale and the analyzed frequency is in accordance with the following: where f c is the central frequency of the wavelet basis, called by the centfrq function, f s is the sampling frequency of the analyzed signal, and f is the analyzed frequency. For wavelet transform computation in this work, we used wavelet Coiflet of the fifth order [18]. It is possible to acquire same spectral features by means of the wavelet transform: (i) Spectral power of the HF, LF, and VLF components (ii) Normalized values of the spectral components by the total power-HF n , LF n , and VLF n (iii) The LF/HF ratio.
Additionally, standard deviations SDHF(wt), SDLF(wt), and SDVLF(wt) of the HF wt (t), LF wt (t), and VLF wt (t) time series were tested as features. HF wt (t), LF wt (t), and VLF wt (t) are time series of the HF, LF, and VLF spectral components, respectively, acquired by means of the wavelet transform.
Moreover, one can study informational characteristics of the wavelet transform by analyzing the F LF wt /HF wt t function ( Figure 2). F LF wt /HF wt t is the continuous function of the LF/HF ratio. This function was not a smooth morphology. Its "excursions" (local dysfunctions) varies in case of functional loads, as the features of F LF wt /HF wt t is possible to use the number of dysfunctions Nd, the maximal value of dysfunction (LF/HF) max , and the intensity of dysfunction (LF/HF) int . By the dysfunction, we consider values of function that suppress decision threshold Δ according to previous studies of our research group Δ = 10 [19].
2.2.5. Nonlinear Feature. As the nonlinear feature in this study, we have used the Hurst exponent calculated by the aggregated variance method. The variance can be written as follows: where H is the Hurst exponent and X is a time-series vector.
H can be defined as the slope exponent in the following equation: where σ rms ΔX is the standard deviation of the ΔX increments, corresponding to the time period s, and с is a constant [20].
Note that H > 0.5 corresponds to the process with trend, so-called persistent process, contrary H < 0.5 corresponds to antipersistent processes that have a tendency for trend change, and H = 0.5 is the random process [21].
2.2.6. Multifractal Features. As nonlinear methods, we adopted the multifractal detrended fluctuation analysis (MFDFA) [22]. The algorithm and application features of  the MFDFA method to estimation of short-term time series are described in details in [23].
The main steps of the method include the following: (i) The detrending procedure with second degree polynomial on nonoverlapping segments where the length of the segments corresponds to the studied time scale boundaries.
In current study, we investigated time scale boundaries that correspond to the LF and VLF frequency bands: 6-25 sec and 25-300 sec, respectively. In our earlier works and by other authors, it was noted that multifractal analysis of the HF component is not informative because of the noising [24].
(ii) Determination of the fluctuation functions for q in range q = [−5, 5]: where NN v is the local trend in the segment ν, N s is the number of segments, and s is the scale.
(iii) Estimation of the slope exponent H x in log-log plot of the fluctuation function against scale s for each q: (iv) Calculation of the scaling exponent τ q : (v) The Legendre transform application for the probability distribution of the spectrum estimation: Figure 3 represents the main features of the multifractal spectrum estimated by the MFDFA method. Here, H 0 is the height of the spectrum and represents the most probable fluctuations in the investigated time scale boundary of the signal; H 2 is the generalized Hurst exponent (also known as correlation degree); αmin represents behavior of the smallest fluctuations in the spectrum; αmax represents behavior of the greatest fluctuations in the spectrum; and W = α max − α min is the width of multifractal spectrum that shows the variability of fluctuations in the spectrum. Multifractal characteristics are quantitative measures of the self-similarity and may characterize functional changes in the regulatory processes of the organism. In addition, we also tested the so-called 1/2width measure of the spectrum, which is defined as Table 1 presents summary of all features used in this study.

Machine Learning Approaches.
For the machine learning evaluation, the respective functions of the sklearn library were used [26]. The current paper describes supervised machine learning methods used. At first, the classifiers are trained using a training dataset. For that dataset, the class labels are known. After that, the efficacy of the classification is evaluated using test set of data. The efficacy is evaluated by comparing true labels of test set with those predicted by the model.

Discriminant Analysis (DA).
In this work, two variants of the discriminant analysis were tested-linear and quadratic discriminant analyses (LDA and QDA). The LDA aims to find the best linear combination of the input features to properly separate studied classes. In the case of the QDA, the studied classes are separated by a quadratic function [27].

k-Nearest Neighbors (kNN).
The k-nearest neighbors is one of the nonparametric machine learning approaches. In order to predict the class of the object, method chose the class, which is the most common among k "neighbors" of the object. Examples of the "neighbors" are picked from the training dataset. In the present study, different values of the k are tested-3, 4, and 5 [28].

Support Vector Machine (SVM).
The base idea of the support vector machine methods is creation of the decision hyperplane which would separate different classes. In that case, the margin between two nearest points on the different sides of the hyperplane is maximal. In present study, the radial basis function (RBF) is used. For implementation in python, one have to specify the following: SVC gamma = 2, C = 1 [29].

Decision Trees (DT).
The decision trees classification model is built around a sequence of the Boolean queries. The sequence of such queries forms the "trees" structure. In the present work, variations of the classifier were analyzed-with fixed value of the maximal tree depth (max_depth = 5). The maximal depth feature points the maximal number of queries that is allowed to use before reaching leaf. The leaf node is the node that has no "children" [30].

Naive Bayes (NB)
. This method is based on the application of the Bayes' theorem with assumptions that data has strong (or naive) independence. In current study, the Gaussian distribution of data is assumed [31].  Square root of mean of squares of differences between successive R-R (5) NN50 Variation higher than 50 ms in R-R signal -  in machine learning may lead to misleading results. Therefore, the first step in this investigation is to sort uncorrelated combinations. For this task, we compute the correlation coefficient. The whole flowchart of the script for noncorrelated feature combination selection is presented in Figure 4. The threshold correlation value was set to 0.25. Usually, correlation more than 0.75 is considered to be high. Therefore, a value lower than 0.25 is a good benchmark for low correlation. In the current work, two to five feature combinations were made. In case of more than two feature combinations, the correlation was checked pairwise. When all calculation was finished, the noncorrelated features were saved to a file for future purposes. Table 2 presents the total number of n-combinations for 53 features in case of n = [2,3,4,5] and number of selected noncorrelated combinations. Table 2 shows that application of such selection leads to both, more appropriate results and significant reduction of the analyzed combinations set. Figure 5 presents a complete flowchart of the implemented algorithm for classifier efficacy evaluation.

Cross-Validation.
Cross-validation implies division of the original datasets into m subsets, when m-1 subsets are used for the classifier training. The remaining part is used for the classifier test. The procedure is repeated m times. Such approach allows one to use dataset evenly [32].
In the current investigation, the number of random folds l was set to be 5. For the implementation of 5fold cross-validation, we randomly divide the original dataset into 5 subsets. The division is implemented for both groups simultaneously. As the result, each subset included 6 healthy volunteers and 8 patients diagnosed with hypertension.
Many machine learning methods are sensitive to train set selection, so, in order to remove such influence, the cross-validation procedure was repeated 100 times with different folds. The repeated cross-validation allows to increase number of classification accuracy estimates [33]. Table 3 presents calculation times spent for each machine learning approach for different number of features in combinations. Calculation times are presented for all noncorrelated combinations. In accordance with Table 3, the fastest approach is the decision trees. The knearest neighbors approach is the slowest one.

Results
The classifier performance was averaged over 5 crossvalidations and over 100 implementations. Figures 6-9 show overview of the classifier performance for all combinations for different numbers of features in combination. All color bars on Figures 5-8 have the same range-from 50 to 100%. Figure 10 presents maximal accuracy achieved by each classifier for different number of features in set.
It is worthy to mention that generally classification accuracy rises as the number of features in the feature set increases. For 4-feature sets, the maximum is achieved-accuracy for 5-feature sets is lower for all machine learning approaches. It drops significantly in case of support vector machine approach. Table 4 presents best results achieved by all machine learning approaches for 4-feature set.
Data in Table 4 shows that linear and quadratic DA not only achieve higher classification score but also have better stability of the results. Naïve Bayes classifier also has relatively high classification score and low deviation.
Among 53 studied features, 36 form combinations that have the classification score higher than 85. Table 5 presents   occurrences of the features among the combinations. The highest occurrences are noted for different spectral features, associated with VLF spectral band, LF/HF ratio, and statistical feature heart rate. Table 6 presents 7 features that form combinations with accuracy higher than 90%. All these combinations consist of heart rate, one feature associated with LF/HF ratio, and two features associated with VLF spectral band.

Discussion
For discussion purposes, a comparison of the results of the current study with results of one of the commonly used procedure, principal components analysis (PCA), was executed.
The PCA is a statistical procedure used to reveal the internal structure of the dataset [34]. In our case, features of different amplitude are used; PCA is known to be sensitive to the relative scaling of the feature dataset. Therefore, prior to the PCA application, the standardization procedure was implemented for each of 53 different features-subtraction of the mean value and after that division by the standard deviation. Table 7 presents explained variance as well as the cumulative variance for the first 15 principal components. First 10 principal components explain 93% of the variance. Consequent principal components add 1% of the variance or less.
In order to compare results of the semioptimal search of the noncorrelated feature space with PCA, combinations of the first 10 components were consequently tested for all    Figure 10: Maximal scores achieved by each learning machine approach. machine learning approaches using 100 repeated 5-fold cross-validation. Figure 11 presents the maximal results of classification accuracy achieved by each machine learning approach using combinations of the principal components.
Comparing the results of Figures 10 and 11, one can note that features found by the semioptimal search of the noncorrelated feature space reach higher classification accuracies than combinations of the principal components for all tested machine learning approaches.

Conclusions
In this work, various machine learning approaches were tested in task of the arterial hypertension diagnostics. In earlier works, the same datasets were used for investigation of the linear and quadratic DA methods [11]. The present work implies comparison of the DA methods with other machine learning approaches, like support vector machine, k-nearest neighbors, Naive Bayes, and decision trees.
The results of the current investigation showed that for the studied task, the application of the discriminant    analysis (linear and quadratic) revealed to be the most appropriate classifiers. These approaches have high classification score and low deviations over different realizations. A set of four features in combination seems to be the optimal number, as the classification accuracy score is higher and more consistent than those for two, three, and five features in combination. Prevalence of the VLF and LF/HF spectral features among best combinations might indicate that sympathetic nervous system takes an important part in the initialization of the arterial hypertension and maintenance of the increased vascular tone as well as increased cardiac output. These results are in accordance with scientists' interpretation of the arterial hypertension development [35,36].
The results of the suggested approach were compared with data set prepared by the commonly used procedure of principal component analysis. Results of the n-feature noncorrelated sets have achieved higher classification accuracies than ones based on the dataset of the selected principal components.
In future works, our research group will continue to improve results on this problem. One of the investigations that are planned is to analyze robustness of the classifiers based on multiple signals recorded simultaneously. Among the other perspective directions of future investigation is usage of the advanced neural networks [37] and genetic algorithms [38] for feature extraction and classification.  Figure 11: Scores of the PCA achieved by each learning machine approach.