An Effective LSTM Recurrent Network to Detect Arrhythmia on Imbalanced ECG Dataset

Analysis of the electrocardiogram (ECG) beat plays a significant role in computer-aided arrhythmia diagnosis systems, which aim to reduce the high mortality rate from cardiovascular disease (CVD). However, the complex variations and imbalance of ECG beats make this a challenging task. Since ECG beat data are heavily imbalanced across categories, an effective long short-term memory (LSTM) recurrent network model with focal loss (FL) is proposed. The LSTM network can disentangle the timing features in complex ECG signals, while the FL is used to resolve the category imbalance by downweighting easily identified normal ECG examples. The advantages of the proposed network have been verified on the MIT-BIH arrhythmia database. Experimental results show that the LSTM network with FL achieved a reliable solution to the problem of imbalanced datasets in ECG beat classification and was not sensitive to the quality of ECG signals. The proposed method can be deployed in telemedicine scenarios to assist cardiologists in diagnosing ECG signals more accurately and objectively.


Introduction
Cardiovascular diseases (CVDs) are the leading cause of death worldwide [1]. According to the World Health Organization, about 17.9 million people died of CVD in 2016, accounting for 31% of all deaths. Arrhythmia is caused by improper intracardiac conduction or pulse formation, which can affect heart shape or disrupt the heart rate [2]. An electrocardiogram (ECG) is a comprehensive manifestation of the electrical signal activity of the human heart. Obtaining the detailed physiological state of various parts of the heart by collecting signals is an indispensable means of clinical objective diagnosis. Automated analysis and diagnosis based on ECG data have a reliable clinical diagnostic reference value for arrhythmia [3].
Many methods for automatic classification of ECGs have been proposed. The type of ECG beat can be distinguished by the time-domain [4], wavelet transform [5], genetic algorithm [6], support vector machine (SVM) [7], Bayesian [8], or other methods. Although the above classification methods achieve high accuracy on experimental datasets, their performance is highly dependent on features extracted by fixed or manually designed methods. Manually designed feature extraction may increase computational complexity throughout the process, especially in the transform domain.
Deep learning constitutes the mainstream of machine learning and pattern recognition. It provides a structure in which feature extraction and classification are performed together [9]. Deep learning has been widely used in many fields, such as image classification [10], target detection [11], and disease prediction [12]. It is also effectively used to analyze bioinformatics signals [13][14][15][16][17]. Acharya et al. [13] proposed a nine-layer convolutional neural network (CNN) to automatically identify five ECG beat types. Yildirim et al. [15] designed an end-to-end 1D-convolutional neural network (1D-CNN) model for arrhythmia detection. Hannun et al. [16] developed a deep neural network (DNN) to detect 12 rhythm ECG classes. Oh et al. [17] used a U-Net autoencoder to detect five arrhythmias. Through its weight-sharing mechanism, a CNN is suited to inputs that vary spatially, so it performs well on spatial data, of which images are the typical example. However, recurrent neural networks (RNNs) are more appropriate for sample sequences that change over time.
The long short-term memory (LSTM) network is a special type of RNN that is widely used for time series analysis. It can effectively retain historical information and learn long-term dependencies in sequences such as text. It has been used in many fields, such as natural language processing [18] and speech recognition [19]. LSTM is also used for the detection of ECG arrhythmias [20][21][22][23]. Yildirim [20] proposed a new wavelet sequence (WS) model based on a deep bidirectional LSTM (BLSTM) network to classify electrocardiogram (ECG) signals. Oh et al. [22] proposed a combined network model using CNN and LSTM for ECG arrhythmia diagnosis. Hou et al. [23] introduced a new algorithm based on deep learning that combines LSTM with SVM for ECG arrhythmia classification.
The imbalance of the ECG dataset is an additional challenge to accurately classifying ECG beats. There are two problems in the training process: (1) low training efficiency, because normal ECG beats, which occupy a large proportion of the dataset, tend to be easy negatives that contribute little to learning, and (2) degeneration of the model when normal ECG beats overwhelm training. Some researchers have attempted to address imbalance in the ECG beat data when diagnosing arrhythmia. Sanabila et al. [24] used the generated oversampling method (GenOMe) to solve the problem of imbalanced arrhythmias, which generated new data points with specific distributions (beta, gamma, and Gaussian) as constraints. Rajesh and Dhuli [25] employed three data-level preprocessing techniques on an extracted feature set to balance the distribution of ECG heartbeats. These were random oversampling and undersampling (ROU), synthetic minority oversampling technique with random undersampling (SMOTE + RU), and distribution-based balancing (DBB).
As an alternative to resampling the input ECG beat data or feature set, focal loss addresses imbalanced dataset classification by downweighting easy normal ECG beat examples so that their contribution to the loss is small even if their number is large. In other words, focal loss concentrates network training on hard ECG beat types, which may constitute a small part of the dataset.
Inspired by the use of FL to solve imbalanced category classification and by the widespread adoption of LSTM, an effective LSTM with FL is proposed to handle imbalanced ECG beat data on the MIT-BIH arrhythmia database. LSTM automatically extracts the timing characteristics of complex ECG signals, and FL mitigates the imbalanced ECG class distribution faced by the LSTM network, enabling the network to train effectively on all categories. The experimental results show that the proposed model achieved state-of-the-art performance on imbalanced ECG beat data and outperformed previous results. Furthermore, we conducted experiments on both denoised and non-denoised ECG datasets, and the results demonstrate that the proposed model is not sensitive to the quality of ECG signals.

Methodology
Arrhythmia classification using deep learning generally includes two basic stages: preprocessing and classification. In the preprocessing stage, the Daubechies 6 (db6) discrete wavelet transform is used to remove noise from the ECG signal. The ECG heartbeat is then extracted using the sliding window search method, and the data are normalized using Z-score. The LSTM network is proposed for ECG heartbeat classification. The details and theoretical background of these methods are discussed in the following sections.

Preprocessing.
Preprocessing includes denoising and segmentation of ECG signals.

Noise Removal.
We denoised the raw data with the Daubechies 6 (db6) discrete wavelet transform [26], and the denoised ECG signals were input to the LSTM network. The original and denoised ECG signals are shown in Figure 1.

ECG Beat Segmentation.
We used the sliding window search method to extract the beat samples (see Figure 2). The MIT-BIH arrhythmia database provides annotations of ECG beat class information verified by independent experts. Since R-peak detection algorithms have achieved more than 99% specificity and sensitivity [27][28][29], we used the R-peak annotation file directly. All ECG signals were segmented into sequences that were 250 samples long and centered on the annotated R-peaks. Note that we used an ECG beat length of 250 points by default, but there is no common standard for this length.
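The segmentation and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `segment_beats` and `z_score`, the toy sine-wave signal, the peak indices, the choice to skip beats that run past the record boundaries, and the per-beat application of the Z-score are all assumptions made for the example.

```python
import math

def segment_beats(signal, r_peaks, window=250):
    """Cut fixed-length beats centred on annotated R-peak indices."""
    half = window // 2
    beats = []
    for r in r_peaks:
        start, end = r - half, r + half
        # Assumption: peaks too close to the record boundaries are skipped;
        # the paper does not state how such beats were handled.
        if start < 0 or end > len(signal):
            continue
        beats.append(signal[start:end])
    return beats

def z_score(beat):
    """Normalise one beat to zero mean and unit standard deviation."""
    mean = sum(beat) / len(beat)
    var = sum((v - mean) ** 2 for v in beat) / len(beat)
    std = math.sqrt(var) or 1.0  # guard against a completely flat segment
    return [(v - mean) / std for v in beat]

# Toy record: a synthetic sine wave stands in for a real ECG lead.
signal = [math.sin(2 * math.pi * n / 360) for n in range(2000)]
r_peaks = [90, 450, 810, 1170, 1950]  # hypothetical annotation indices
beats = [z_score(b) for b in segment_beats(signal, r_peaks)]
```

With real data, `r_peaks` would come from the database annotation file instead of a hand-written list, and each resulting 250-sample beat would be one input sequence for the network.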

Problem Description.
To detect arrhythmia, a softmax regression model is used as the last layer of the LSTM network structure. The input training set is
$$R = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(i)}, y^{(i)}), \ldots, (x^{(n)}, y^{(n)})\},$$
where $n$ is the number of ECG beats with class labels, $x^{(i)}$ is an ECG beat, and $y^{(i)} \in \{0, 1, 2, 3, 4, 5, 6, 7\}$ is the category label of $x^{(i)}$. The labels 0, 1, 2, 3, 4, 5, 6, and 7 represent N, LBBB, RBBB, APC, NESC, ABERR, NPC, and AESC, respectively. If $y^{(i)} = 0$, $x^{(i)}$ is an N (normal) beat; otherwise, $x^{(i)}$ is one of the arrhythmia types. For an ECG beat $x^{(i)}$, the output through the LSTM network is $z^{(i)}$, as shown in equation (1):
$$z^{(i)} = g(x^{(i)}; \theta), \qquad (1)$$
where $g(\cdot)$ is a process function describing the path of an ECG signal from the input layer to the last fully connected layer, and $\theta$ denotes the relevant parameters of the LSTM network. The output vector $z^{(i)}$ of the last fully connected layer is the ECG signal feature extracted by the LSTM network. It is fed to the softmax layer, which calculates the probability of each ECG beat category. Equation (2) is the softmax function used in the proposed network:
$$\hat{y}^{(i)}_j = \frac{\exp\left(z^{(i)}_j\right)}{\sum_{k=0}^{C-1} \exp\left(z^{(i)}_k\right)}, \quad j = 0, \ldots, C-1, \qquad (2)$$
where $C$ is the number of ECG beat categories and $\hat{y}^{(i)}$ is the vector of class probabilities that the LSTM gives to the input feature vector $z^{(i)}$.
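As a small illustration of the softmax layer described above, the following sketch maps an eight-element score vector to class probabilities and picks the most probable beat type. The score values are hypothetical, and subtracting the maximum score is a common numerical-stability measure that the paper does not mention.

```python
import math

def softmax(z):
    """Map a vector of scores z to class probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["N", "LBBB", "RBBB", "APC", "NESC", "ABERR", "NPC", "AESC"]

# Hypothetical output of the last fully connected layer for one beat.
z = [2.0, 0.5, 0.1, 1.2, -0.3, -1.0, 0.0, -0.5]
probs = softmax(z)
predicted = CLASSES[probs.index(max(probs))]
```

Here the largest score belongs to index 0, so the beat would be labeled N.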

LSTM Recurrent Network.
Long short-term memory (LSTM) is a time-recurrent neural network. It is suited to processing and predicting important events in time series in which the intervals and delays are relatively long [30]. The network can effectively retain historical information and learn long-term dependencies in sequences such as text. The LSTM network consists of an input gate, forget gate, output gate, and cell unit to update and retain historical information. Figure 3 shows an LSTM block.
The forget gate $f_t$ in the LSTM memory block is controlled by a simple single neuron. It determines which information must be retained or discarded to enable the storage of historical information. The input gate $i_t$ is the section of the LSTM block formed by a neuron and the effects of the previous memory unit. It is activated to determine whether to update the historical information in the LSTM block. The candidate update content $c_{in}$ is calculated by a tanh neuron. The memory cell state value at the current time, $c_t$, is calculated from the current candidate cell $c_{in}$, the previous state $c_{t-1}$, the input gate information $i_t$, and the forget gate information $f_t$. The output gate generates $o_t$ of the LSTM block at the current time. Finally, $a_t$ determines the amount of information about the current cell state that will be output. The activation of each gate and the update of the current cell state are calculated in this way at every time step.
After calculating the hidden vector for each position, we took the last hidden vector as the ECG signal representation. We fed it to a linear layer whose output length equals the number of classes and added a softmax output layer to classify the ECG beat as N, LBBB, RBBB, APC, NESC, ABERR, NPC, or AESC.
In this paper, we use a four-layer LSTM architecture comprising an input layer, an LSTM layer, and two fully connected layers. The structure of the proposed LSTM for imbalanced ECG signal feature extraction and classification tasks is shown in Figure 4.
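The gate computations described above can be sketched with the standard LSTM cell formulation. This is an assumption-laden illustration, not the authors' code: the weight names (`Wf`, `Wi`, `Wc`, `Wo`), the random initialization, the hidden size of 4, and feeding one ECG sample per time step are all choices made for the example, and the variable names follow the symbols used in the text ($f_t$, $i_t$, $c_{in}$, $c_t$, $o_t$, $a_t$).

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Return W v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, a_prev, c_prev, p):
    """One time step of a standard LSTM cell; p holds the gate weights."""
    v = a_prev + x_t  # concatenate previous hidden state and current input
    f_t = [sigmoid(u) for u in affine(p["Wf"], p["bf"], v)]     # forget gate
    i_t = [sigmoid(u) for u in affine(p["Wi"], p["bi"], v)]     # input gate
    c_in = [math.tanh(u) for u in affine(p["Wc"], p["bc"], v)]  # candidate
    o_t = [sigmoid(u) for u in affine(p["Wo"], p["bo"], v)]     # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_in)]
    # Hidden output: the output gate scales the squashed cell state.
    a_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return a_t, c_t

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (hypothetical initialization)."""
    rng = random.Random(seed)
    p = {}
    for g in "fico":
        p["W" + g] = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_in)]
                      for _ in range(n_hidden)]
        p["b" + g] = [0.0] * n_hidden
    return p

# Run a toy five-sample "beat" through the cell, one sample per time step.
n_hidden = 4
params = init_params(n_in=1, n_hidden=n_hidden)
a_t, c_t = [0.0] * n_hidden, [0.0] * n_hidden
for sample in [0.0, 0.2, 1.0, 0.3, -0.1]:
    a_t, c_t = lstm_step([sample], a_t, c_t, params)
last_hidden = a_t  # the last hidden vector serves as the beat representation
```

In a full model, `last_hidden` would be passed through the fully connected layers and the softmax layer to obtain the class probabilities.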

Focal Loss for Imbalanced ECG Beat Data.
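Focal loss is a more effective way to deal with imbalanced datasets. It is obtained by transforming the cross-entropy (CE) loss function. The CE is calculated by
$$\mathrm{CE}(\hat{y}) = -\log(\hat{y}),$$
where $\hat{y}$ is the probability that the network assigns to the true class of the ECG beat. Focal loss multiplies the CE by a modulating factor:
$$\mathrm{FL}(\hat{y}) = -(1 - \hat{y})^{\gamma} \log(\hat{y}),$$
where $(1 - \hat{y})^{\gamma}$ is the modulating factor and $\gamma \geq 0$ is a focusing parameter. With $\gamma = 0$, FL reduces to CE. The purpose of the modulating factor is to reduce the weights of easily categorizable ECG beats so that the model is more focused on ECG beats that are difficult to classify during training. When an ECG beat is misclassified and $\hat{y}$ is small, the value of the modulating factor is close to 1 and the loss is barely affected. Conversely, when $\hat{y}$ approaches 1, the factor approaches 0 and the loss of the well-classified beat is downweighted. The loss value is calculated using FL according to the block diagram in Figure 5.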
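To make the effect of the modulating factor concrete, the sketch below evaluates CE and FL for a single beat at three hypothetical confidence levels. It assumes the plain form of focal loss without an additional class-balancing weight; the function names and the probability values are illustrative only.

```python
import math

def cross_entropy(p_true):
    """CE loss given the predicted probability of the true class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0):
    """FL: the CE loss scaled by the modulating factor (1 - p)^gamma."""
    return (1.0 - p_true) ** gamma * cross_entropy(p_true)

# Easy, ambiguous, and hard examples (hypothetical probabilities).
for p in (0.9, 0.5, 0.1):
    print(f"p={p:.1f}  CE={cross_entropy(p):.4f}  FL={focal_loss(p):.4f}")
```

With $\gamma = 2$, a beat classified with probability 0.9 has its loss scaled by $(1 - 0.9)^2 = 0.01$, roughly a hundredfold reduction, whereas a beat with probability 0.1 keeps a factor of $0.81$. This is how a large number of easy normal beats ends up contributing little to the total loss.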
Optimization of the network parameters is important. There are many types of gradient descent optimization algorithms, such as Adagrad, Adadelta, Adam, and Nadam.
This work uses the Nadam algorithm. This is an effective gradient descent optimization algorithm that combines the Adam and NAG algorithms to calculate adaptive learning rates for different parameters. Overall, Nadam performs better than other gradient descent optimization methods in practical applications [32].

Experiment Setup.
The LSTM network proposed in this study ran on the deep learning framework TensorFlow 1.12.0 in the Microsoft Windows 10 64-bit operating system. The computer server was configured with an 8-GB Intel(R) Core(TM) i5-7000 processor. Considering the effectiveness of the classification results, we set the number of epochs to 350. The loss curve and accuracy curve during the training and validation process of the LSTM network using FL (γ = 2) are shown in Figure 6. The curves in Figure 6 show that, after 350 epochs, the network converged and the overall classification accuracy was stable. The average time required to train the model for one epoch was approximately 191 s. Please note that this epoch setting was only used to easily evaluate the impact of other learning parameters on the network classification results and is not guaranteed to be the best configuration for the LSTM network.

Materials.
We used the MIT-BIH arrhythmia database provided by the Massachusetts Institute of Technology [33]. It comes from 47 clinical patients and contains 48 annotated ECG records. Each record is approximately 30 minutes long, was band-pass filtered at 0.1-100 Hz, and was sampled at a rate of 360 Hz, for a total of approximately 650,000 sample points per record.
There are more than 109,000 marker beats from 16 heartbeat categories. All beats are marked by two or more cardiologists. The normal category has the largest data volume, and the category with the least data is supraventricular premature beats (only two samples). This study used eight ECG beat types: N, LBBB, RBBB, APC, NESC, ABERR, NPC, and AESC. These beat types and their statistics are listed in Table 1.
Table 1 shows that there is a heavy imbalance between normal and abnormal ECG beats. Because the ECG beat data are imbalanced, the network model tends to learn the distribution of the majority ECG beat data, while the minority ECG beat data are learned insufficiently, even though the less frequent categories of abnormal ECG beats are often the ones of greatest concern.
The dataset had a total of 93,371 ECG beats. We used 10% of all ECG data as the testing set. Of the remaining ECG data, 90% were used as the training set and 10% as the validation set. The training and validation sets were used to adjust the parameters and determine the optimal number of elements of the designed model. The model performance was evaluated using a testing set that was not previously used.
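The split described above can be illustrated with a short sketch. The paper does not say whether the split was random, stratified by class, or how fractional counts were rounded, so the shuffling, the seed, the rounding with `round`, and the function name `split_indices` are assumptions made for the example.

```python
import random

def split_indices(n_total, test_frac=0.10, val_frac=0.10, seed=0):
    """Shuffle beat indices, hold out a test set, then split the remainder."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    n_test = round(n_total * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    # The validation fraction applies to the data left after the test split.
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

train, val, test = split_indices(93371)
print(len(train), len(val), len(test))
```

Under these assumptions the three sets contain roughly 75,600, 8,400, and 9,300 beats, so the training set amounts to about 81% of the whole dataset.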

Evaluation Metrics.
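We used five metrics to evaluate the performance of the proposed network: accuracy (ACC), recall (RE), precision (PR), specificity (SP), and F1 score. Accuracy is the proportion of correctly classified ECG beats among all ECG beats, which reflects the consistency between test results and real results. However, recall, precision, and specificity are less biased in evaluating the performance of the classifier on the imbalanced dataset. The F1 score is the harmonic mean of precision and recall. The five evaluation metrics can be calculated as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\mathrm{RE} = \frac{TP}{TP + FN},$$
$$\mathrm{PR} = \frac{TP}{TP + FP},$$
$$\mathrm{SP} = \frac{TN}{TN + FP},$$
$$\mathrm{F1} = \frac{2 \cdot \mathrm{PR} \cdot \mathrm{RE}}{\mathrm{PR} + \mathrm{RE}},$$
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively. The classification categories in this study are not binary, so we use the confusion matrix to express the TP, FP, TN, and FN counts built for a classification test. The confusion matrix makes it easy to generate the above four counts.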
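The following sketch shows one way to derive the four counts and the five metrics for a single class from a multi-class confusion matrix, treating that class as positive and all others as negative. The three-class matrix is a toy example, the function name `per_class_metrics` is hypothetical, and the paper does not state how per-class values were averaged into the overall figures.

```python
def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k from a confusion matrix cm,
    where cm[i][j] counts beats of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                        # class k beats predicted as other
    fp = sum(cm[i][k] for i in range(n)) - tp   # other beats predicted as class k
    tn = total - tp - fn - fp
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "specificity": specificity, "f1": f1}

# Toy three-class confusion matrix (rows: true class, columns: predicted).
cm = [[90, 5, 5],
      [4, 15, 1],
      [2, 0, 8]]
m = per_class_metrics(cm, 1)
```

For class 1 in this toy matrix, TP = 15, FN = 5, FP = 5, and TN = 105, so recall and precision are both 0.75.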

Network Parameter Configuration.
To obtain the best learning parameters of our proposed LSTM network, we quantitatively analyzed the impact of different learning parameters on the experimental results. The optimal parameter value was determined by evaluating the classification accuracy of the experimental results of multiple cases on the testing set.
After 350 epochs, the LSTM network converged and the classification accuracy was stable. The settings of the LSTM network parameters to obtain the best classification accuracy are shown in Table 2.
In this experiment, we analyzed the impact of various learning parameters on the classification performance of the proposed LSTM network with FL. The primary network parameters included the dropout, batch size, and γ parameter of FL.
We evaluated different dropouts for the proposed network with an increasing dropout proportion. e other learning parameter settings took the default values in Table 2. Table 3 shows the classification accuracy on the testing set with different dropout proportions after 350 epochs.
By comparing the results in Table 3, we can see that the performance of our proposed LSTM network is not improved by increasing the dropout proportion. Therefore, the optimal dropout value of the LSTM network structure is around zero. Then, we studied how changing the initial batch size setting affects the LSTM network performance. We evaluated the performance of five different batch sizes, as shown in Table 4.
Based on the results in Table 4, increasing or decreasing the batch size does not necessarily improve the performance of our proposed LSTM network. A larger batch size allows for more accurate estimation of the gradient, but it is prone to overfitting. A small batch size has a regularizing effect, but there is a risk of inefficiency, and training may not work well with an early-stopping strategy. For the dataset and network structure used in this paper, the optimal batch size is 128.
The γ parameter is the most critical parameter for FL. The effect of changing γ on the performance of our proposed LSTM network was investigated. The effect on the distribution of the loss for abnormal ECG beats was minor. For normal beats, however, increasing the value of γ heavily reduced the loss of correctly classified normal beats, allowing the model to focus on the misclassified abnormal ECG beats. After 350 epochs, the classification accuracy on the testing dataset was calculated and is given in Table 5 for six γ values. The other learning parameters are the same as in Table 2.
From the results shown in Table 5, we can see that neither increasing nor decreasing γ from 2 improved the performance of the LSTM network with FL. The best γ parameter value is 2 for the proposed network.

Results and Discussion
In this study, we proposed an LSTM network structure to classify imbalanced ECG signals. The ECG beat data were classified by the LSTM network, which we trained using FL. By setting the CE as the benchmark, the feasibility of using the FL to classify the imbalanced ECG beats was demonstrated. We verified the effectiveness of the LSTM network structure by comparing it with state-of-the-art methods.
Performance measures of the model were evaluated using a confusion matrix. The confusion matrix on the testing set for the LSTM network with CE as the cost function is shown in Figure 7(a). The diagonal values in the confusion matrix represent the correctly classified ECG beats. Other LSTM network structure parameters (except γ) are the same as in Table 2. The confusion matrix on the testing set for the LSTM network with FL as the cost function is shown in Figure 7(b). Other LSTM network parameters are the same as in Table 2.
By comparing the confusion matrices of the LSTM network with CE and the LSTM network with FL in Figure 7, we can see that the LSTM network with FL performs better on the imbalanced ECG dataset than the LSTM network with CE. With the FL, the LSTM network provides better recognition performance over most classes, whereas with the CE it provides lower recognition performance over most classes. For example, with the CE, 41 APC beats are misclassified as N beats, while with the FL (γ = 2), 32 APC beats are misclassified as N beats. These misclassifications occur because the two beat types differ little in shape, apart from a specific wave anomaly that is difficult to locate (e.g., an extended PR segment). Table 6 shows the PR, RE, SP, and F1 of the LSTM network with CE and the LSTM network with FL on the testing set.
The results in Table 6 verify the validity of the LSTM network with FL on imbalanced ECG data. From this table, it can be observed that the LSTM network with FL (γ = 2) achieves an ACC of 99.26%, an RE of 99.26%, a PR of 99.30%, an SP of 99.14%, and an F1 score of 99.27%. The LSTM network with CE achieves an ACC of 98.70%, an RE of 98.70%, a PR of 98.05%, an SP of 98.75%, and an F1 score of 98.36%. Although the performance improvement of the LSTM network with FL over the LSTM network with CE does not seem large, in a real diagnosis, even a minor accuracy improvement can hold great value for human health and life.
To compare the effectiveness of the above two methods (CE and FL) more intuitively, we next analyze the results using the precision-recall curve (PR curve). For the category imbalance problem, the PR curve is considered to be superior to the receiver operating characteristic curve (ROC curve) [34]. As shown in Figure 8, for the imbalanced ECG data, the PR curve of each category is drawn from the classification results using the CE (Figure 8(a)) and the FL (Figure 8(b)). Compared with the CE, when the LSTM network proposed in this paper uses the FL, most categories obtain a relatively high area under the PR curve (AUC). Therefore, our proposed LSTM network with FL is effective on the category-imbalanced ECG dataset.
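As a brief illustration of how a per-category PR curve can be obtained, the sketch below sweeps the decision threshold over the predicted probabilities of one class in a one-vs-rest setting. The scores and labels are made-up values, the function name `pr_curve` is hypothetical, and ties between scores are not treated specially; it is not the procedure used to produce Figure 8.

```python
def pr_curve(scores, labels):
    """Precision-recall points for one class (one-vs-rest).

    scores: predicted probability of the class for each beat.
    labels: 1 if the beat truly belongs to the class, else 0.
    Returns a list of (recall, precision) pairs, one per threshold.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:  # lower the threshold one beat at a time
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

# Hypothetical scores for a minority class among six beats.
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 0, 1, 0, 0, 0]
points = pr_curve(scores, labels)
```

Because precision ignores true negatives, the curve is not inflated by the large number of correctly rejected normal beats, which is why it is informative on imbalanced data.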
To verify the robustness of the proposed LSTM network with FL in a noisy environment, the network was also evaluated on ECG recordings without denoising, and the results are listed in Table 7. The performance measurements in Table 7 show that the LSTM network with FL (γ = 2) achieved a classification result close to that obtained on denoised ECG recordings. This shows the advantage of denoising and also illustrates the robustness of the network.
The proposed network can be deployed in telemedicine scenarios. The ECG data of heart patients are collected through wearable devices and transmitted to the cloud via the Internet. Data analysis is carried out by the model proposed in this study to assist cardiologists in diagnosing ECG signals more accurately and objectively.
The proposed model was primarily studied on the MIT-BIH arrhythmia database. According to the AAMI standards (ANSI/AAMI EC57: 1998), all the beats in the MIT-BIH arrhythmia database are grouped into five main classes. However, this is not always desirable. The type of arrhythmia can be judged by the specific ECG beat and the regularity of the beat type. Repeated APC beats can become dangerous arrhythmias such as atrial fibrillation when a patient has a potential structural heart problem. Bundle branch blocks impede the normal pathway of electrical impulses through the conduction system to the ventricles. This causes asynchronous ventricular contractions and heart function deterioration, which may lead to life-threatening situations.
To assess the performance of the proposed network, we compared it to some state-of-the-art methods in the literature. We record the performance of the proposed network model (in bold) and the recent representative techniques for ECG beat classification using the MIT-BIH arrhythmia database in Table 8.
From Table 8, it is evident that our proposed LSTM network with FL achieved good performance. The difference between our study and other studies in the literature is that we used deep learning to classify category-imbalanced ECG beat data. For the classification of class-imbalanced ECG arrhythmias, we proposed an LSTM network with FL. There are also studies in the literature on the classification of imbalanced ECG data [24,25]. The main difference is that our study uses FL, which modifies the loss function so that the LSTM network focuses more on feature learning for abnormal ECG beats that are prone to misclassification, improving the accuracy of arrhythmia classification. Regarding the RE, our proposed LSTM network with FL achieved the best result on the testing set. This means that it has a smaller number of false negatives, i.e., abnormal ECG beats that are erroneously classified as normal ECG beats. Furthermore, this method avoids the loss of useful information caused by undersampling and the increase in network training time caused by oversampling.
The highlights of our proposed network are as follows:
(i) Feature extraction and selection techniques are not needed
(ii) Our important finding is that the proposed method can improve the classification accuracy rate of categories with arrhythmia
(iii) Our proposed method is robust on ECG recordings without denoising
The limitations of our proposed network are as follows:
(i) This study is conducted only on eight ECG beat types
(ii) The proposed network has a high time cost in the training phase

Conclusions and Future Work
In this study, we proposed an LSTM network with FL to improve the training effect by inhibiting the impact of a large number of easy normal ECG beat data on model training. The results show that the LSTM network with FL achieved an accuracy, recall, precision, specificity, and F1 score of 99.26%, 99.26%, 99.30%, 99.14%, and 99.27%, respectively. Experimental results on the MIT-BIH arrhythmia database demonstrate the effectiveness and robustness of the proposed network. The proposed method can be deployed in telemedicine scenarios to assist cardiologists in diagnosing ECG signals more accurately and objectively. The study was conducted only on eight ECG beat types. To generalize the results, more beat types and a larger number of beats should be incorporated in future research. We also plan to add different levels of noise to ECG signals to examine the performance of the LSTM with FL model.

Data Availability
The data used to support the findings of this study are included in the article. Further data can be requested from the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Journal of Healthcare Engineering