Heartbeat Classification Using Normalized RR Intervals and Morphological Features

This study developed an automatic heartbeat classification system for identifying normal beats, supraventricular ectopic beats, and ventricular ectopic beats based on normalized RR intervals and morphological features. The proposed heartbeat classification system consists of signal preprocessing, feature extraction, and linear discriminant classification. First, signal preprocessing removes the high-frequency noise and baseline drift of the original ECG signal. Feature extraction then derives the normalized RR intervals and two types of morphological features using wavelet analysis and linear prediction modeling. Finally, a linear discriminant classifier combines the extracted features to classify the heartbeats. A total of 99,827 heartbeats from the MIT-BIH Arrhythmia Database were divided into three datasets for training and testing the optimized heartbeat classification system. The results show that, compared with nonnormalized RR interval features, normalized RR interval features greatly improve the positive predictive accuracy for identifying normal heartbeats and the sensitivity for identifying supraventricular ectopic heartbeats. In addition, combining the wavelet and linear prediction morphological features yields higher global performance than using either the wavelet features or the linear prediction features alone.


Introduction
The ambulatory electrocardiogram (ECG) is a powerful noninvasive tool that provides long-term cardiac information for diagnosing cardiac function. Because manual classification of heartbeats is very costly and time consuming, many studies have designed automatic classification systems to identify normal beats, supraventricular ectopic beats, ventricular ectopic beats, fusion beats, and other abnormal heartbeats [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. The classification results also provide valuable information for assessing the risk of arrhythmias or sudden cardiac death, such as the presence of ventricular premature beats and nonsustained ventricular tachycardia, and for further analyses such as long-term heart rate variability and heart rate turbulence [1].
One of the main limitations of current heartbeat classification methods is the low positive prediction accuracy for identifying supraventricular ectopic beats [5], because their QRS waveforms are very similar to those of normal beats. A supraventricular ectopic beat mainly alters the RR interval, so most previous studies combined RR intervals with other features to identify supraventricular ectopic beats and other arrhythmic beats [2][3][4][5][6]. However, RR intervals are affected not only by arrhythmic beats but, predominantly, by the heart rate. Inconsistent heart rates among ECG recordings therefore reduce the performance of RR intervals for classifying supraventricular ectopic beats.
To reduce the effect of inconsistent heart rates among ECG recordings, this study normalizes the RR interval features by the mean of all RR intervals within the same ECG recording. The normalized RR interval features are then combined with morphological features extracted by wavelet analysis and linear prediction modeling, and a linear discriminant classifier is applied to identify normal beats, supraventricular ectopic beats, and ventricular ectopic beats. The MIT-BIH Arrhythmia Database [16] was used to test the performance of the proposed heartbeat classification system. This study has two aims. The first is to evaluate the heartbeat classification performance of combining two types of morphological features, extracted using wavelet analysis [5] and linear prediction modeling [17]. The second is to evaluate whether normalized RR interval features improve the heartbeat classification performance.
The rest of this paper is organized as follows. Section 2 describes the ECG recordings obtained from the MIT-BIH Arrhythmia Database and the proposed automatic heartbeat classification system. The classification results are summarized in Section 3 and discussed in Section 4. Finally, Section 5 concludes this study.

Materials and Method
Figure 1 is a block diagram of the proposed heartbeat classification system. Signal preprocessing removes the high-frequency noise and the baseline drift using a second-order low-pass filter and two median filters, respectively. Either the nonnormalized or the normalized RR interval features are then combined with the morphological features extracted by wavelet analysis and linear prediction modeling, and a linear discriminant classifier performs the heartbeat classification. The details are described in the following sections.

ECG Recordings.
All ECG data used in this study were obtained from the MIT-BIH Arrhythmia Database [16], which contains common and life-threatening arrhythmic heartbeats. The database contains 48 two-channel ambulatory ECG recordings, each 30 minutes long, sampled at 360 Hz with 11-bit resolution over a 10 mV range. In most recordings, the upper lead is a modified limb lead II (MLII), and the lower lead is usually a modified lead V1 (occasionally V2 or V5, and in one instance V4). Over 109,000 beats are individually labeled as one of 15 possible heartbeat classes. In accordance with the standards recommended by the Association for the Advancement of Medical Instrumentation (AAMI) [18], the four recordings containing paced beats were excluded from this study. Following the AAMI recommendations, the heartbeat classes included in this study are class N (normal and bundle branch block beats), class S (supraventricular ectopic beats), and class V (ventricular ectopic beats). Three datasets were used for training and testing the proposed heartbeat classification system. The selection of the imbalanced training and testing datasets is identical to that of previous studies [2][3][4][5]. The balanced testing dataset was randomly selected from the imbalanced testing dataset and contains 1,000 heartbeats from each of the three classes. Table 1 lists the heartbeat classes and numbers in the training and testing datasets.

Signal Preprocessing.
The input ECG signal is first filtered by a second-order integer low-pass filter to remove high-frequency noise. The filter is specified by its z-transform system function H(z) and the corresponding difference equation. Because it has a narrower transition band, this second-order filter attenuates high-frequency noise more strongly than the first-order filter used in a previous study [19]. Figure 2 shows an ECG recording before (top) and after (bottom) filtering with the second-order integer low-pass filter. The low-amplitude, high-frequency noise is removed by the low-pass filtering.
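The coefficients of the paper's filter did not survive extraction. The sketch below therefore assumes, for illustration only, the well-known Pan-Tompkins integer low-pass filter H(z) = (1 − z^{−6})^2/(1 − z^{−1})^2. That filter is also a second-order recursive filter with integer coefficients, but it was designed for 200 Hz sampling and is not claimed to be the filter used in this study. The function name `integer_lowpass` is hypothetical.

```python
def integer_lowpass(x):
    """Second-order integer low-pass filter (assumed Pan-Tompkins design).

    Implements y[n] = 2*y[n-1] - y[n-2] + x[n] - 2*x[n-6] + x[n-12]
    with zero initial conditions. Only integer additions and shifts are
    needed; the DC gain is 36 and the group delay is 5 samples.
    """
    y = [0] * len(x)
    for n in range(len(x)):
        x6 = x[n - 6] if n >= 6 else 0
        x12 = x[n - 12] if n >= 12 else 0
        y1 = y[n - 1] if n >= 1 else 0
        y2 = y[n - 2] if n >= 2 else 0
        y[n] = 2 * y1 - y2 + x[n] - 2 * x6 + x12
    return y


impulse = [1] + [0] * 14
print(integer_lowpass(impulse))
# prints [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0]
```

The triangular impulse response shows that the recursion is equivalent to an 11-tap FIR filter. This is why integer inputs stay integer throughout.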
Two median filters were then applied to remove the low-frequency baseline drift [2]. Each recording was first filtered by a median filter with a width of 200 ms (72 samples at a sampling rate of 360 Hz) to remove the QRS complexes and P waves. The result was then filtered by a median filter with a width of 600 ms (216 samples) to remove the T waves. The output of the second median filter is therefore an estimate of the baseline drift. The baseline-corrected ECG recording is obtained by subtracting this estimate from the original ECG signal. Figure 3 shows an ECG recording with a large baseline drift (bottom) and the same recording after baseline removal with the two median filters (top). Most of the baseline drift is removed by the median filtering.
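The two-stage baseline estimate can be sketched with `scipy.ndimage.median_filter`. This is a minimal sketch. The edge handling (`mode="nearest"`) is an assumption, because the paper does not say how the ends of each recording are treated. The function name `remove_baseline` is hypothetical.

```python
import numpy as np
from scipy.ndimage import median_filter


def remove_baseline(ecg, fs=360):
    """Two-stage median-filter baseline removal.

    Stage 1 (200 ms) suppresses QRS complexes and P waves; stage 2 (600 ms)
    suppresses T waves. The stage-2 output is the baseline estimate.
    """
    ecg = np.asarray(ecg, dtype=float)
    width1 = int(round(0.200 * fs))  # 72 samples at 360 Hz
    width2 = int(round(0.600 * fs))  # 216 samples at 360 Hz
    stage1 = median_filter(ecg, size=width1, mode="nearest")
    baseline = median_filter(stage1, size=width2, mode="nearest")
    return ecg - baseline, baseline


# Synthetic check: constant 0.5 mV offset plus narrow 1 mV "QRS" spikes.
fs = 360
ecg = np.full(10 * fs, 0.5)
for start in range(100, len(ecg), fs):
    ecg[start:start + 10] += 1.0
corrected, baseline = remove_baseline(ecg, fs)
print(np.allclose(baseline, 0.5))  # True: the spikes do not leak into the baseline
```

A median is used instead of a linear low-pass filter because it ignores short, large deflections such as the QRS complex rather than smearing them into the baseline.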

RR Interval Features.
The RR interval is defined as the interval between two successive R waves. Following previous studies [2][3][4][5][6], four RR interval features were extracted in this study, defined as follows: the previous RR interval, the post-RR interval, the averaged 1 min RR interval, and the averaged 20 min RR interval. Figures 4(a) and 4(b) show atrial premature beats (class S) and premature ventricular contraction beats (class V), respectively. The normal, atrial premature, and premature ventricular contraction beats are marked N, A, and V, respectively. Arrhythmic heartbeats shorten or prolong the previous or post-RR intervals. However, the RR interval features are also directly related to the heart rate, so inconsistent heart rates between ECG recordings reduce their classification performance. The four RR interval features were therefore normalized by the mean of all RR intervals within the same ECG recording. The normalized RR interval between two normal heartbeats is close to one. A prolonged or shortened RR interval yields a normalized value greater than or less than one, respectively.
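The four RR features and their normalization can be sketched as follows. The paper does not say whether the 1 min and 20 min averages use windows centred on the beat or windows preceding it. This sketch assumes centred windows, an assumption also noted in the code. The function name `rr_features` is hypothetical.

```python
import numpy as np


def rr_features(r_peaks, fs=360.0, win1=60.0, win20=1200.0):
    """Four RR features per beat, raw and normalized by the recording mean RR.

    Returns arrays of shape (n_beats - 2, 4) with columns
    [previous RR, post RR, averaged 1 min RR, averaged 20 min RR] in seconds
    for beats 1 .. n_beats-2 (the first and last beats lack a neighbour).
    """
    t = np.asarray(r_peaks, dtype=float) / fs  # R-peak times (s)
    rr = np.diff(t)                             # rr[k] = t[k+1] - t[k]
    rr_end = t[1:]                              # time at which each RR ends
    beats = np.arange(1, len(t) - 1)

    def local_mean(width):
        # Assumption: average of the RR intervals ending within +/- width/2 of the beat.
        return np.array([rr[np.abs(rr_end - t[i]) <= width / 2].mean() for i in beats])

    raw = np.column_stack([
        rr[beats - 1],      # previous RR: beat i-1 -> beat i
        rr[beats],          # post RR: beat i -> beat i+1
        local_mean(win1),
        local_mean(win20),
    ])
    return raw, raw / rr.mean()                 # normalize by the recording mean RR


# Regular rhythm at 75 bpm (RR = 0.8 s) with one premature beat at index 50.
peaks = np.arange(100) * 288
peaks[50] -= 100
raw, norm = rr_features(peaks)
print(np.round(norm[49, :2], 3))  # prints [0.653 1.347]
```

The premature beat has a normalized previous RR below one and a normalized post RR above one, whatever the patient's average heart rate.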

Morphology Features Extracted Using the Wavelet Analysis.
The wavelet transform was applied in this study to extract the morphological features of the QRS wave. The wavelet transform of a continuous signal $f(t)$ is defined as

$$W_a f(b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t-b}{a}\right) dt,$$

where $a$ and $b$ denote the scaling and translation parameters, respectively. A small scaling parameter $a$ helps the wavelet transform locate details or fast transitions, and the translation parameter $b$ indicates their location. The selected prototype wavelet $\psi(t)$ is a quadratic spline, which has been applied to ECG signals in a previous study [20]. The Fourier transform of the quadratic spline wavelet is

$$\Psi(\Omega) = j\Omega \left(\frac{\sin(\Omega/4)}{\Omega/4}\right)^{4}.$$

The corresponding discrete-time wavelet transform can be performed with the low-pass filter $H(e^{j\omega})$ and the high-pass filter $G(e^{j\omega})$ defined as follows [20]:

$$H(e^{j\omega}) = e^{j\omega/2}\cos^{3}(\omega/2), \qquad G(e^{j\omega}) = 4j\, e^{j\omega/2}\sin(\omega/2),$$

which have the following impulse responses:

$$h[n] = \tfrac{1}{8}\left(\delta[n+2] + 3\delta[n+1] + 3\delta[n] + \delta[n-1]\right), \qquad g[n] = 2\left(\delta[n+1] - \delta[n]\right).$$

The autocorrelation signal of the fourth scale of the discrete-time wavelet transform was then calculated in a time window starting 130 ms before the R peak and ending 200 ms after the R peak, as follows [5]:

$$R[m] = \frac{1}{N}\sum_{n=0}^{N-1-m} w_4[n]\, w_4[n+m], \qquad m = 0, 1, \ldots, N-1,$$

where $w_4[n]$ is the fourth scale of the discrete-time wavelet transform, $N$ is the length of the time window, and $m$ is the time lag. Based on the autocorrelation signal, the morphological features were defined as the first zero-cross position and the maximum position, that is, the position of the absolute maximum. Figures 5, 6, and 7 show the results of the wavelet analysis for a normal beat from recording 101 of the MIT-BIH Arrhythmia Database, an atrial premature beat from recording 209, and a premature ventricular beat from recording 119, respectively. The circle and rectangle indicate the zero-cross and maximum positions in the autocorrelation signal of the fourth scale, respectively. The QRS waveform and wavelet morphological features of the normal beat are similar to those of the atrial premature beat but very different from those of the premature ventricular beat. The first zero-cross positions are 9, 8, and 18, and the maximum positions are 16, 14, and 30, for the normal, atrial premature, and premature ventricular beats, respectively. Both the first zero-cross and maximum positions are therefore postponed for the premature ventricular beat.
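The wavelet features can be sketched with the undecimated (à trous) algorithm, using the quadratic-spline filters h[n] and g[n] given above with 2^{k−1} − 1 zeros inserted between taps at scale 2^k. Several details are assumptions rather than the authors' code:

- the boundary handling (`np.convolve` with `mode="same"`, zero padding);
- the biased autocorrelation estimate;
- reading "maximum position" as the largest |R[m]| after the first zero cross, since R[0] is always the global maximum.

The function names are hypothetical.

```python
import numpy as np

FS = 360  # MIT-BIH sampling rate (Hz)
H_TAPS = np.array([1.0, 3.0, 3.0, 1.0]) / 8.0  # h[n], n = -2..1 (quadratic spline)
G_TAPS = np.array([2.0, -2.0])                 # g[n], n = -1..0


def dilate(taps, k):
    """Insert 2**(k-1) - 1 zeros between taps (a trous filter for scale 2**k)."""
    step = 2 ** (k - 1)
    out = np.zeros((len(taps) - 1) * step + 1)
    out[::step] = taps
    return out


def wavelet_scale(x, scale=4):
    """Detail signal at scale 2**scale via the undecimated a trous algorithm."""
    approx = np.asarray(x, dtype=float)
    for k in range(1, scale + 1):
        detail = np.convolve(approx, dilate(G_TAPS, k), mode="same")
        approx = np.convolve(approx, dilate(H_TAPS, k), mode="same")
    return detail


def wavelet_features(ecg, r_idx, fs=FS):
    """First zero-cross and maximum positions (in lags) of the scale-4 autocorrelation."""
    w4 = wavelet_scale(ecg, 4)
    start = r_idx - int(round(0.130 * fs))  # 130 ms before the R peak
    stop = r_idx + int(round(0.200 * fs))   # 200 ms after the R peak
    seg = w4[start:stop + 1]
    n = len(seg)
    acf = np.correlate(seg, seg, mode="full")[n - 1:] / n  # R[m], m = 0..n-1
    crossings = np.nonzero(acf <= 0)[0]
    if crossings.size == 0:
        raise ValueError("autocorrelation never crosses zero in the window")
    zc = int(crossings[0])
    # Assumption: maximum position = largest |R[m]| after the first zero cross.
    mx = zc + int(np.argmax(np.abs(acf[zc:])))
    return zc, mx


# Synthetic QRS-like pulses: a narrow one and a wide (PVC-like) one.
t = np.arange(2 * FS)


def gaussian_pulse(sigma_ms):
    sigma = sigma_ms / 1000 * FS
    return np.exp(-0.5 * ((t - FS) / sigma) ** 2)


print(wavelet_features(gaussian_pulse(10), FS), wavelet_features(gaussian_pulse(40), FS))
```

The wider pulse gives later zero-cross and maximum positions. This mirrors, qualitatively, the postponement reported above for the premature ventricular beat.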

Morphological Features Extracted Using the Linear Prediction Modelling.
For the linear prediction model in Figure 8, the reference input is the delayed input QRS wave, $x(n) = d(n - \Delta)$. The prediction output of the Wiener filter with order $p - 1$ can be represented as [17]

$$y(n) = w(n) \otimes x(n) = \sum_{k=0}^{p-1} w(k)\, x(n-k),$$

where ⊗ denotes the convolution sum and $w(k)$, $k = 0, \ldots, p-1$, are the filter coefficients. The Wiener filter design problem requires finding the filter coefficients $w(k)$ that minimize the mean-square error

$$\xi = E\{|e(n)|^2\} = E\{|d(n) - y(n)|^2\}.$$

The necessary and sufficient condition for a set of filter coefficients to minimize $\xi$ is that the derivative of $\xi$ with respect to $w^*(k)$ be equal to zero for $k = 0, 1, \ldots, p-1$:

$$\frac{\partial \xi}{\partial w^*(k)} = -E\{e(n)\, x^*(n-k)\} = 0, \qquad k = 0, 1, \ldots, p-1,$$

from which the well-known Wiener-Hopf equations can be derived [21]:

$$\sum_{l=0}^{p-1} w(l)\, r_x(k-l) = r_{dx}(k), \qquad k = 0, 1, \ldots, p-1.$$

These are a set of $p$ linear equations in the $p$ unknowns $w(k)$, $k = 0, 1, \ldots, p-1$. The matrix form of the Wiener-Hopf equations can be written as

$$\mathbf{R}_x \mathbf{w}_o = \mathbf{r}_{dx}, \qquad (11)$$

where $\mathbf{R}_x$ is the $p \times p$ autocorrelation matrix of the reference input $x(n)$, $\mathbf{w}_o$ is the $p \times 1$ vector of the optimal filter coefficients, and $\mathbf{r}_{dx}$ is the $p \times 1$ vector of cross-correlations between the desired input $d(n)$ and the reference input $x(n)$. This study uses the general Levinson recursion [21] to solve the Wiener-Hopf equations recursively, since they form a Hermitian Toeplitz system of the form given in (11). The morphological features of the QRS wave are the two optimal filter coefficients, $w_o(0)$ and $w_o(1)$, of a first-order linear prediction model with prediction depth $\Delta = 1$. Figures 9, 10, and 11 show the results of the linear prediction modeling for a normal beat from recording 101, an atrial premature beat from recording 209, and a premature ventricular beat from recording 119, respectively. The solid and dashed lines denote the input QRS wave and the output of the linear prediction filter, respectively. The heartbeats in Figures 9, 10, and 11 are the same as those in Figures 5, 6, and 7. Because their QRS waveforms are similar, the two optimal filter coefficients differ only slightly between the normal and atrial premature beats ($w_o(0)$ = 1.85 versus 1.88 and $w_o(1)$ = −0.89 versus −0.94). The premature ventricular beat, by contrast, changes the coefficients to $w_o(0)$ = 1.37 and $w_o(1)$ = −0.38.
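For real-valued QRS samples, the Wiener-Hopf system (11) is symmetric Toeplitz. With x(n) = d(n − Δ), its entries are r_x(k) = r_d(k) and r_dx(k) = r_d(k + Δ). The sketch below builds the system from biased autocorrelation estimates (our assumption) and solves it with `scipy.linalg.solve_toeplitz`, which SciPy documents as a Levinson-recursion solver. The function name `lp_coefficients` is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lp_coefficients(qrs, p=2, delta=1):
    """Optimal Wiener predictor coefficients w_o(0..p-1) for prediction depth delta.

    The desired input is d(n) = qrs(n) and the reference input is
    x(n) = d(n - delta), so r_x(k) = r_d(k) and r_dx(k) = r_d(k + delta).
    """
    d = np.asarray(qrs, dtype=float)
    n = len(d)
    # Biased autocorrelation estimates r_d(k), k = 0 .. p + delta - 1
    r = np.array([np.dot(d[k:], d[:n - k]) / n for k in range(p + delta)])
    first_col = r[:p]          # R_x is symmetric Toeplitz with this first column
    rhs = r[delta:delta + p]   # cross-correlation vector r_dx
    return solve_toeplitz(first_col, rhs)  # Levinson recursion


# A sinusoid obeys d(n) = 2cos(w) d(n-1) - d(n-2), so w_o should be near [2cos(w), -1].
samples = np.arange(10000)
print(np.round(lp_coefficients(np.cos(1.0 * samples)), 2))
```

With p = 2 and Δ = 1 this is the "first-order" two-coefficient predictor described above. A smooth waveform pushes the coefficients toward the linear-extrapolation pair [2, −1], which is consistent with the normal-beat values 1.85 and −0.89.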

Linear Discriminant Classification.
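This study used linear discriminant classification to combine the extracted RR interval and morphological features and to classify the normal, supraventricular ectopic, and ventricular ectopic heartbeats. Assume the number of classes is $K$ and the number of heartbeats in class $k$ is $n_k$. The feature vector $\mathbf{x}_{ki}$ consists of all features extracted from the $i$th heartbeat in class $k$. The discriminant value of a feature vector $\mathbf{x}$ for class $k$ can be derived as [2]

$$y_k = \boldsymbol{\mu}_k^{T}\boldsymbol{\Sigma}^{-1}\mathbf{x} - \tfrac{1}{2}\boldsymbol{\mu}_k^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k + \ln \pi_k, \qquad (13)$$

where $\pi_k$ denotes the prior probability of class $k$. The mean vector $\boldsymbol{\mu}_k$ of class $k$ is calculated by

$$\boldsymbol{\mu}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\mathbf{x}_{ki}, \qquad (14)$$

and the covariance matrix $\boldsymbol{\Sigma}$, under the assumptions of normality and homogeneity of variances, is defined as

$$\boldsymbol{\Sigma} = \frac{1}{n-K}\sum_{k=1}^{K}\sum_{i=1}^{n_k}(\mathbf{x}_{ki}-\boldsymbol{\mu}_k)(\mathbf{x}_{ki}-\boldsymbol{\mu}_k)^{T}, \qquad n = \sum_{k=1}^{K} n_k. \qquad (15)$$

The prior probabilities of classes N, S, and V were all set to 1/3. After the discriminant values of all classes are determined, the posterior probability $P(k \mid \mathbf{x})$ for class $k$ can be estimated by

$$P(k \mid \mathbf{x}) = \frac{\exp(y_k)}{\sum_{l=1}^{K}\exp(y_l)}. \qquad (16)$$

The feature vector is assigned to the class with the highest posterior probability estimated by (16).

The performance measures follow [5]. Let $c_k$ denote the number of heartbeats correctly classified as class $k$, $t_k$ the total number of heartbeats in class $k$, $p_k$ the number of heartbeats classified as class $k$, and $M$ the total number of heartbeats in the dataset. The performance parameters for class $k$ are then defined as

$$Se_k = \frac{c_k}{t_k}, \qquad +P_k = \frac{c_k}{p_k},$$

and the global performance parameters are defined by

$$Se = \frac{1}{K}\sum_{k=1}^{K} Se_k, \qquad +P = \frac{1}{K}\sum_{k=1}^{K} {+P_k}, \qquad Acc = \frac{1}{M}\sum_{k=1}^{K} c_k.$$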
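A minimal NumPy sketch of (13)-(16) and of the performance measures follows. It makes two assumptions: the pooled covariance uses the 1/(n − K) normalization, and the global Se and +P are averages of the class values. The function names and the toy two-dimensional data are hypothetical and are not the paper's features.

```python
import numpy as np


def fit_lda(X, y, classes, priors=None):
    """Class means (14), pooled covariance (15) and log priors for (13)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    K = len(classes)
    priors = np.full(K, 1.0 / K) if priors is None else np.asarray(priors, dtype=float)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centred = np.vstack([X[y == c] - means[k] for k, c in enumerate(classes)])
    cov = centred.T @ centred / (len(X) - K)  # pooled within-class covariance
    return means, np.linalg.inv(cov), np.log(priors)


def posteriors(X, means, cov_inv, log_priors):
    """Discriminant values (13) turned into posteriors (16) by a softmax."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    quad = np.einsum("kd,de,ke->k", means, cov_inv, means)  # mu_k^T S^-1 mu_k
    scores = X @ cov_inv @ means.T - 0.5 * quad + log_priors
    scores -= scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)


def performance(y_true, y_pred, classes):
    """Per-class Se_k and +P_k, class-averaged global Se and +P, and Acc."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    se, ppv = {}, {}
    for c in classes:
        correct = np.sum((y_true == c) & (y_pred == c))
        se[c] = correct / np.sum(y_true == c)
        ppv[c] = correct / max(np.sum(y_pred == c), 1)
    acc = np.mean(y_true == y_pred)
    return se, ppv, np.mean(list(se.values())), np.mean(list(ppv.values())), acc


# Toy 2-D features for classes N, S, V drawn around well-separated means.
rng = np.random.default_rng(0)
classes = np.array(["N", "S", "V"])
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centres])
y = np.repeat(classes, 200)
model = fit_lda(X, y, classes)
y_hat = classes[np.argmax(posteriors(X, *model), axis=1)]
print(performance(y, y_hat, classes)[4])  # global accuracy on the toy data
```

Equal priors of 1/3 are the default here, matching the setting stated above. Subtracting the row maximum before the exponential leaves (16) unchanged but avoids overflow.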

Results
Table 2 summarizes the features described in the previous section for the classification of normal, supraventricular ectopic, and ventricular ectopic heartbeats, including the RR interval features and the morphological features extracted using wavelet analysis and linear prediction modeling. The length of each feature vector depends on the feature configuration. The best linear discriminant classifier was determined from the imbalanced training dataset according to (13), (14), and (15). It was then applied to the imbalanced and balanced testing datasets to test the classification performance of the different feature configurations. Tables 3, 4, and 5 summarize the classification results for the imbalanced training, imbalanced testing, and balanced testing datasets, respectively, and compare the performance of the different feature configurations. Two RR interval configurations (nonnormalized and normalized) and three morphological configurations (wavelet features, linear prediction features, and their combination) give six feature configurations in total. Table 3 shows no marked differences in classification performance between the nonnormalized and normalized RR interval configurations in the training dataset. When combined with the normalized RR interval features, the linear prediction morphological features had a lower positive prediction accuracy for class V than the wavelet features (63.4% versus 78.3%). The global accuracies of the six feature configurations in the training dataset were high, ranging from 91.3% to 93.0%, but the positive prediction accuracies for class S ranged only from 20.6% to 25.2%.
Table 4 gives the classification results for the imbalanced testing dataset. Compared with the nonnormalized RR interval features, the normalized RR intervals increased the sensitivity and positive predictive accuracy of class S as follows:

- by 21.0% and 6.7% when combined with the wavelet features;
- by 17.0% and 6.1% when combined with the linear prediction features;
- by 22.0% and 7.9% when combined with both the wavelet and linear prediction features.

The global sensitivity with the normalized RR intervals increased by 7.0%, 6.1%, and 7.4% for the same three morphological configurations, respectively. The positive prediction accuracies of the linear prediction morphological features for class V were lower than those of the wavelet features: 54.4% and 57.7% versus 77.3% and 75.3% when combined with the nonnormalized and normalized RR interval features, respectively. The combination of the wavelet and linear prediction features had higher global performance than either the wavelet features or the linear prediction features alone.

Table 5 gives the classification results for the balanced testing dataset. Compared with the nonnormalized RR interval features, the normalized RR interval features increased the sensitivity and positive predictive accuracy of class S as follows:

- by 21.6% and 4.9% when combined with the wavelet features;
- by 17.1% and 3.2% when combined with the linear prediction features;
- by 22.4% and 4.4% when combined with both the wavelet and linear prediction features.

The positive predictive accuracy of class N with the normalized RR interval features also increased by 17.3%, 15.5%, and 17.0% for the three morphological feature configurations, respectively. The normalized RR interval features also increased the global performance parameters by 6.0% to 7.6% across the feature configurations.

Discussion
This study proposes an automatic classification system for identifying normal beats, supraventricular ectopic beats, and ventricular ectopic beats based on nonnormalized and normalized RR intervals and on morphological features extracted by wavelet analysis and linear prediction modeling. The signal preprocessing uses a second-order integer low-pass filter to attenuate high-frequency noise and two median filters to remove the baseline drift. A linear discriminant classifier then combines the extracted features to classify the heartbeats.
Because the QRS morphology of supraventricular ectopic heartbeats is similar to that of normal beats, identifying a supraventricular ectopic heartbeat depends mainly on the shortened RR interval due to the absence of a P wave. Four RR interval features are commonly used to identify supraventricular ectopic heartbeats [2][3][4][5][6]: the previous and post-RR intervals and the averaged 1 min and 20 min RR intervals. However, the RR interval is also dominated by the heart rate, and inconsistent heart rates among ECG recordings decrease the performance of the RR interval features for identifying supraventricular ectopic heartbeats. Korürek and Nizam [22] proposed normalizing the RR interval features by the averaged RR interval of the preceding 8 normal beats. However, the normal beats are not easy to determine in advance, because heartbeat types are unknown until they have been classified. Instead of finding the normal RR intervals before heartbeat classification, this study normalized the RR interval features by the mean of all RR intervals within the same ECG recording to reduce the effect of inconsistent heart rates. This mean may also include RR intervals that are shortened or prolonged by arrhythmic heartbeats. The study further compared the classification performance of the normalized RR interval features with that of the nonnormalized RR interval features. The results for the imbalanced and balanced testing datasets cover all three morphological feature configurations. For identifying supraventricular ectopic beats, the normalized RR interval features improve the sensitivity by 17.0% to 22.4% and the positive predictive accuracy by 3.2% to 7.9%. They also improve the global sensitivity by 6.1% to 7.4%. In the balanced testing dataset, the normalized RR interval features additionally improve the positive predictive accuracy for identifying normal beats by 15.5% to 17.3%.
Morphological features are mainly used to identify ventricular ectopic heartbeats, because their waveform shapes differ from those of normal and supraventricular ectopic heartbeats. This study adopted two types of heartbeat morphological features, extracted using wavelet analysis and linear prediction modeling. The wavelet morphological features were proposed by Llamedo and Martínez [5]. They are the first zero-cross and maximum positions of the autocorrelation signal of the fourth scale of the discrete-time wavelet transform. As shown in Figures 5, 6, and 7, premature ventricular contraction beats postpone the first zero-cross and maximum positions compared with normal and atrial premature beats. The linear prediction morphological features are the two optimal filter coefficients of a first-order linear prediction model with a one-step prediction depth [17]. Linear prediction estimates the predictable, smooth part of the input QRS wave. The prediction error represents the unpredictable part and has been used to detect sudden slope changes within the QRS wave [17]. Figures 9, 10, and 11 show that this low-order linear prediction filter accurately predicts most of the input QRS wave. The classification performance of the linear prediction morphological features is similar to that of the wavelet features, with one exception: in the imbalanced training and testing datasets, their positive prediction accuracies for identifying ventricular ectopic beats were much lower. However, the global performance parameters of the linear prediction features were slightly higher than those of the wavelet features. The results further demonstrate that combining the wavelet and linear prediction morphological features improves heartbeat classification performance further.
Several classification performance parameters of the proposed system, using the normalized RR interval features together with the wavelet and linear prediction morphological features, are better than those of previous studies. Table 6 compares the classification results of this study with those of previous studies that used the MIT-BIH Arrhythmia Database. de Chazal et al. [2] combined RR interval, heartbeat interval, and morphology features based on two time-sampling methods and used a linear discriminant classifier to classify arrhythmic heartbeats. Fusion beats are included in their classification results. de Chazal and Reilly [3] further proposed an adaptive heartbeat classification system based on their previous work [2]. Their classification performance is the best in Table 6, but their system is not fully automatic: it needs an expert to validate and correct a fraction of the beats of each recording during classification. Llamedo and Martínez [4] developed an ECG classification model based on RR intervals and morphological features extracted from different scales of the wavelet decomposition. Their model applies linear discriminant analysis and a Mahalanobis distance classifier to classify the heartbeats. Their more recent study [5] greatly enhanced the classification performance by proposing a floating feature selection algorithm to obtain the best-performing and best-generalizing models in the training and validation sets for different search configurations. However, the sensitivity for identifying supraventricular ectopic beats decreased from 86.0% to 77.0%. The sensitivity for identifying ventricular ectopic beats was also lower than that of this study (81.0% versus 86.2%).
The main limitation of the proposed heartbeat classification system is the low positive prediction accuracy for identifying supraventricular ectopic beats, which ranged only from 20.6% to 31.6% in the training and testing datasets. This is caused by the imbalanced proportions of normal, supraventricular ectopic, and ventricular ectopic heartbeats. Supraventricular ectopic beats make up only 1.9% of the training dataset and 3.9% of the testing dataset. Even if only a small proportion of the normal and ventricular ectopic heartbeats are misclassified as supraventricular ectopic beats, the positive prediction accuracy decreases greatly. In the balanced testing dataset, the positive prediction accuracy for identifying supraventricular ectopic beats was substantially higher, ranging from 77.5% to 86.0%.
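A short calculation makes the imbalance argument concrete. The 3.9% S share comes from the text. The operating point is hypothetical and is not taken from Tables 3 to 5: Se_S = 80%, with 2% of non-S beats labeled as S. Holding this operating point fixed, the only change between the two cases is the class prevalence.

```python
def positive_predictivity(prevalence, sensitivity, false_positive_rate):
    """+P of a class from its prevalence, its Se, and the fraction of
    other-class beats wrongly assigned to it."""
    tp = prevalence * sensitivity
    fp = (1.0 - prevalence) * false_positive_rate
    return tp / (tp + fp)


# Hypothetical operating point: Se_S = 80%, 2% of non-S beats labeled as S.
imbalanced = positive_predictivity(0.039, 0.80, 0.02)  # S share of the testing set
balanced = positive_predictivity(1 / 3, 0.80, 0.02)    # 1,000 beats per class
print(f"{imbalanced:.3f} {balanced:.3f}")  # prints 0.619 0.952
```

The same classifier loses about a third of its +P for class S simply because S beats are rare. This is consistent with the gap between the imbalanced and balanced testing results.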

Conclusions
This study has demonstrated that, compared with nonnormalized RR interval features, normalized RR interval features greatly improve the positive predictive accuracy for identifying normal heartbeats and the sensitivity for identifying supraventricular ectopic heartbeats. In addition, combining the wavelet and linear prediction features achieves higher global performance than using either the wavelet features or the linear prediction features alone.

Figure 1: Block diagram of the proposed heartbeat classification system.
(a) Previous RR interval, RR[i]: the interval between the ith R wave and the previous R wave.
(b) Post-RR interval, RR[i + 1]: the interval between the ith R wave and the next R wave.
(c) Averaged 1 min RR interval, RR_1: the average of the RR intervals over 1 minute of ECG recording.
(d) Averaged 20 min RR interval, RR_20: the average of the RR intervals over 20 minutes of ECG recording.

Figure 4: Illustrations of the presence of (a) atrial premature beats in class S and (b) premature ventricular contraction beats in class V. N denotes the normal beat, A denotes the atrial premature beat, and V denotes the premature ventricular contraction beat.
Figure 8 is a block diagram of a linear prediction model for the input QRS wave, where Δ is the prediction depth and W(z) denotes the z-transform system function of the Wiener filter with finite impulse response. The desired input d(n) is the input QRS wave, and the reference input x(n) is the delayed version of the input QRS wave, x(n) = d(n − Δ).

Figure 5: Illustration of a normal QRS wave (a), the fourth scale (b), and the autocorrelation signal (c). The first zero-cross and maximum positions are 9 and 16, respectively.

Figure 7: Illustration of a premature ventricular contraction QRS wave (a), the fourth scale (b), and the autocorrelation signal (c). The first zero-cross and maximum positions are 18 and 30, respectively. PVC: premature ventricular contraction.

Figure 8: Block diagram of a linear prediction model for modeling the input QRS wave.

Table 1: Summary of heartbeat types and numbers for the training and testing datasets.
Class N: normal beats; Class S: supraventricular ectopic beats; Class V: ventricular ectopic beats.

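Table 2: List of the features extracted for the heartbeat classification.
RR interval features (nonnormalized or normalized): previous RR interval; post-RR interval; averaged 1 min RR interval; averaged 20 min RR interval.
Wavelet features: first zero-cross position in lead 1; first zero-cross position in lead 2; maximum position in lead 1; maximum position in lead 2.
Linear prediction features: optimal filter coefficient 0 in lead 1; optimal filter coefficient 1 in lead 1; optimal filter coefficient 0 in lead 2; optimal filter coefficient 1 in lead 2.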

Table 3: Classification results for the imbalanced training dataset.

Table 4: Classification results for the imbalanced testing dataset.

Table 5: Classification results for the balanced testing dataset.
This study evaluated the classification performance by the class sensitivity Se_k and class positive prediction accuracy +P_k, and by the global sensitivity Se, global positive prediction accuracy +P, and global accuracy Acc.

Table 6: Comparisons of the classification results with the previous studies.