A Deep Neural Network Ensemble Classifier with Focal Loss for Automatic Arrhythmia Classification

Automated electrocardiogram (ECG) classification techniques play an important role in assisting physicians in diagnosing arrhythmia. Among these, the automatic classification of single-lead heartbeats has received wider attention due to the urgent need for portable ECG monitoring devices. Although many heartbeat classification studies perform well in intrapatient assessment, they do not perform as well in interpatient assessment. In particular, most models classify supraventricular ectopic heartbeats (S) poorly. To address these challenges, this article presents an automated arrhythmia classification algorithm with three key components. First, a new heartbeat segmentation method is used, which substantially improves the algorithm's capacity to classify S. Second, to overcome the problems created by data imbalance, a combination of traditional sampling and focal loss is applied. Finally, a deep convolutional neural network ensemble classifier is built and validated under the interpatient evaluation paradigm. The experimental results show that the overall accuracy of the method is 91.89%, the sensitivity is 85.37%, the positive predictivity is 59.51%, and the specificity is 93.15%. In particular, for supraventricular ectopic heartbeats (S), the method achieved a sensitivity of 80.23%, a positive predictivity of 49.40%, and a specificity of 96.85%, exceeding most existing studies. Even without any manually extracted features or heartbeat preprocessing, the technique achieved high classification performance in the interpatient assessment paradigm.


Introduction
An electrocardiogram (ECG) is a sequence that records the electrical activity of the heart [1]. With the rising incidence of heart disease [2] and the rapid development of computer technology such as deep learning, there is growing interest in using computer technology to aid the automatic diagnosis of heart diseases [3,4]. The automated diagnosis of arrhythmias, one of the most common cardiac diseases, has been a popular area of research in computer-aided diagnosis [5]. It poses two main challenges. The first is the dataset's high imbalance, with normal beats taking up the majority of the ECG signal. Normal beats (N) account for more than 90% of all beats in the MIT-BIH arrhythmia dataset. Second, the training and testing sets used in the interpatient evaluation paradigm come from different populations. This is more in line with the actual requirements for automatic arrhythmia diagnosis, but the prediction difficulty is greatly increased by the individual variability between populations. Existing arrhythmia classification algorithms fall into two main types, depending on the input data. One is arrhythmia classification based on manual feature extraction, and the other is arrhythmia classification based on automatic feature extraction [5].
In methods based on manual feature extraction, the manually extracted features [3,6,7,8,9,10,11,12,13,14] mainly include the RR interval, short-time Fourier transform, morphological features, empirical mode decomposition, higher-order statistics, and entropy metrics. Then machine learning classifiers are adopted to classify the extracted features, including the weighted linear discriminator [15,16,17,18], support vector machine [4,19,20,21,22,23], multilayer perceptron [24], and convolutional neural network [25,26,27,28]. Chazal et al. [29] extracted five groups of features, including R-R, HOS, wavelet, morphology, and LBP, from ECG signals and used these features to classify ECG signals with a linear classifier. H. Shi et al. [30] extracted six groups of features, including RR interval, morphology, statistics, higher-order statistics, wavelet transform, and wavelet packet entropy, then used a hierarchical XGBoost classifier for classification. Dias et al. [31] manually extracted the RR interval, morphology features, and higher-order features of ECG signals and used an LD classifier to classify arrhythmia. Yang et al. [32] constructed a hybrid kernel-based extreme learning machine to compare different combinations of feature inputs, which yielded the best classification results when ten randomly selected combinations of feature inputs were used. However, these methods rely excessively on manual feature selection, which increases the complexity of the computational process and the time required to extract features upfront.
With the rapid growth of deep learning, there is growing interest in automatic heartbeat classification in which the raw ECG data are simply fed to a deep learning algorithm that learns the features itself and provides the final classification result [33,34,35]. Garcia et al. [36] explored a vector-ECG-based deep convolutional neural network method for arrhythmia classification to classify three beat types, N, SVEB (S), and VEB (V). Takalo-Mattila et al. [37,38] used a high-pass filter, band-stop filter, and low-pass filter to remove noise from the ECG signal, then sliced the ECG signal according to the location of the marked R peaks in the database, and finally used a convolutional neural network to classify the four beat types N, S, V, and F. Li et al. [39] used equal-time (5 s) slicing of the raw ECG, the discrete wavelet transform for noise removal, and a deep residual convolutional neural network for classification. However, the sensitivity of S was often low in these studies, making them challenging to apply in real-life situations. Sellami et al. [40] used a robust deep convolutional network to classify arrhythmia. The authors created a batch-weighted loss function to alleviate the data imbalance problem and used three different heartbeat input patterns for experimental comparison. In their classification model, S had a high sensitivity, whereas N had a low sensitivity when compared to other algorithms. Niu et al. [41] used the SBCX method to process the input beats, effectively removing the effect of baseline drift noise. At the same time, the authors combined the processed heartbeat signal and RR interval features as the input data for classification. The classification effect was good, but too much preprocessing was carried out, and the classification process was not automated enough.
Based on these problems, this article proposes a new heartbeat segmentation method and constructs a deep neural network ensemble classifier with focal loss. It performs effective arrhythmia classification without any preprocessing using only raw heartbeat data.
We present the ECG dataset used and discuss the details of the implementation of the heartbeat classification algorithm in this article in the next section. In Section 3, we compare the algorithm suggested in this article to some existing algorithms and conduct ablation studies on the algorithm's primary components. We conclude this article in Section 4.

Methods
The overall structure of the proposed classification system is shown in Figure 1. The publicly available MIT-BIH arrhythmia dataset is used as the source of input heartbeats in this article. After heartbeat sampling, the input heartbeat data are segmented using a special heartbeat segmentation method and used as the training set. The training process of the deep convolutional neural network ensemble classifier uses focal loss as the loss function, and the base classifiers vote to obtain the final classification results. The method proposed in this paper is validated under the interpatient assessment paradigm.

ECG Database.
The MIT-BIH arrhythmia database, which contains 48 30-minute records from 47 patients, is used for the raw data. The dataset includes detailed annotations from cardiologists giving the type of each heartbeat and the location of each R peak [42].
For comparison, the ECG dataset is partitioned into DS1 and DS2, as described in [29]. In the interpatient evaluation paradigm, the modified limb lead II (MLII) is used as the input signal for the model, with DS1 used for training and DS2 for experimental testing. As shown in Table 1, the heartbeat types are classified into five categories according to the Association for the Advancement of Medical Instrumentation (AAMI). Four records containing paced beats are excluded, as suggested by AAMI. The specific division is shown in Table 2.

Segmentation of ECG Signals.
Following the database division, each heartbeat record is segmented based on the R-peak annotation locations provided by the dataset. Table 3 shows the segmentation lengths of the different methods and the sensitivity of S in the final classification results. It can be seen that most methods have a heartbeat segmentation length of 300 samples or less. Due to the high degree of similarity between S and N, and the small proportion of S in the overall heartbeats, S is easily overfitted during the training process, resulting in low sensitivity of S. Under the interpatient assessment paradigm, Garcia (2017) and Takalo-Mattila (2018) had a classification sensitivity of only around 60% for S [36,37]. Jinghao Niu (2020) and Haojie Zhang (2021) had higher sensitivities for S (77.35% and 88.24%, respectively), but their training data contained not only the original heartbeat signal but also the manually extracted R-R interval features [39,41]. After removing the R-R interval features, Haojie Zhang (2021) and Jinghao Niu (2020) both had extremely low sensitivity (38.7% and 8.06%).
In this article, we want the model to classify heartbeats automatically using only the raw heartbeat signal, with no data preprocessing or manual feature input. We therefore propose a new heartbeat segmentation strategy. Figure 2 shows the difference between the traditional single heartbeat segmentation approach and the approach in this article. Figure 2(a) shows the traditional single heartbeat segmentation method [41]. Figure 2(b) shows the segmentation method in this article, with each segment consisting of 250 samples before and 257 samples after the R-peak. Figure 2(c) shows a comparison of N and S in the two segmentation methods. It can be seen that the morphology of N and S is highly similar in the traditional segmentation method, which makes it difficult for the classifier to distinguish N and S. Comparing Figure 2(c)(ii) and Figure 2(c)(iv), it can be seen that the segmentation method in this article increases the window length of the heartbeat, which is conducive to extracting more neighbourhood features from the original signal. The morphological distinctions between N and S are larger under the segmentation method in this article, making it easier for the classifier to distinguish between the two. To make the classification more automated, this article does not apply any filtering to the segmented heartbeat signals and only inputs the raw heartbeat data for classification. Table 4 shows the proportions of each class of heartbeats after segmentation, which are severely unbalanced. To prevent overfitting during training, the effect of data imbalance needs to be further weakened.
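The windowing described above can be illustrated with a minimal Python sketch. It assumes the signal is a plain sequence of samples and that R-peak positions are given as sample indices. The function name `segment_beats` and the choice to skip beats whose window would run past the record boundaries are illustrative assumptions, not details taken from this article.

```python
def segment_beats(signal, r_peaks, before=250, after=257):
    """Cut one fixed-length window around each annotated R peak.

    Each window holds `before` samples, the R-peak sample itself, and
    `after` samples, i.e. 250 + 1 + 257 = 508 samples by default.
    Returns a list of (r_peak_index, window) pairs.
    """
    beats = []
    for r in r_peaks:
        start = r - before
        stop = r + after + 1  # slice end is exclusive
        # ASSUMPTION: beats too close to the record edges are skipped
        # rather than padded; the article does not state how they are handled.
        if start < 0 or stop > len(signal):
            continue
        beats.append((r, signal[start:stop]))
    return beats
```

Returning the R-peak index together with each window keeps the segments aligned with their annotation labels when edge beats are dropped.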

Overall Structure of the Algorithm and Heartbeat Sampling.
Figure 3 shows the general structure of the method. Based on the interpatient paradigm, the MIT-BIH arrhythmia database is divided into DS1 and DS2. Then, random sampling is used for N, and incremental sampling is used for the remaining classes [43]. The neural network is trained using the samples obtained from DS1. Six focal loss deep neural network classifiers are trained, and their votes give the final classification result. Compared with previous studies [37,39,40,41,44], this article raises the sampling size for class S to 7544 and for class V to 4592, randomly samples class N with a size of 11188, and leaves the sizes of F and Q unchanged.
There are two main reasons why such a unique approach to data sampling is chosen. Firstly, we want the classifier to focus more on the classification of S, so S accounts for the largest proportion of abnormal heartbeats in the training set. Secondly, the final result is a vote among the focal loss deep neural network classifiers. In the training set of each classifier, only N is a random sample from DS1; the rest of the samples are the same. According to the theory of ensemble learning [45,46], when the base classifiers have the same classification performance, the higher the independence of each base classifier, the better the overall classification performance. As a result, N makes up the largest share (47.11%) of each training set.
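The sampling scheme can be sketched as follows, under stated assumptions. The function name `build_training_sets` and the seed handling are hypothetical. "Incremental sampling" is approximated here by drawing extra copies with replacement, which may differ from the procedure in [43]. The target sizes are the ones quoted in the text.

```python
import random


def build_training_sets(beats_by_class, n_sets=6, n_normal=11188,
                        target_sizes=None, seed=0):
    """Build one training set per base classifier.

    beats_by_class maps a class label ("N", "S", "V", "F", "Q") to a list
    of beats. Only the N subset differs between the returned sets.
    """
    if target_sizes is None:
        target_sizes = {"S": 7544, "V": 4592}  # F and Q keep their size
    rng = random.Random(seed)

    # Abnormal classes are prepared once and shared by every training set.
    shared = []
    for label, beats in beats_by_class.items():
        if label == "N" or not beats:
            continue
        extra = max(target_sizes.get(label, len(beats)) - len(beats), 0)
        # ASSUMPTION: oversampling duplicates existing beats at random.
        resampled = list(beats) + [rng.choice(beats) for _ in range(extra)]
        shared.extend((beat, label) for beat in resampled)

    normal_pool = list(beats_by_class["N"])
    training_sets = []
    for _ in range(n_sets):
        # A fresh random subset of N for each base classifier.
        subset = rng.sample(normal_pool, min(n_normal, len(normal_pool)))
        training_sets.append([(beat, "N") for beat in subset] + shared)
    return training_sets
```

Because the abnormal beats are identical across sets, any diversity between the base classifiers in this sketch comes only from the N subset and from the randomness of training.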

Focal Loss Function.
After random sampling and incremental sampling of DS1, the impact of data imbalance is mitigated. To further reduce the problem of overfitting, the focal loss function is adopted. The focal loss was proposed by Lin et al. [47] to address the imbalance between easy and hard examples in dense object detection. In object detection, the amount of data containing objects is much smaller than the amount of data without objects, and data without objects are easy to classify, so they contribute very little to improving the model. Focal loss reduces the weight of easily classifiable samples, allowing the model to focus more on the hard-to-classify samples.
$$\mathrm{CE} = -\log(P_t), \tag{1}$$
$$L_f = -\alpha_t (1 - P_t)^{\gamma} \log(P_t). \tag{2}$$
In (1) and (2), $P_t$ represents the predicted probability of the true class. Equation (1) is the traditional cross-entropy loss function, where the closer $P_t$ is to 1, the smaller the loss value (CE) is, so that training proceeds by decreasing the total loss value of the model. Equation (2) is the focal loss function, which adds the factors $\alpha_t$ and $(1 - P_t)^{\gamma}$ to the traditional cross-entropy loss function. The proportion of loss value weights for different samples can be adjusted by $\alpha_t$, which helps to alleviate the problems caused by data imbalance. The factor $(1 - P_t)^{\gamma}$ becomes smaller as $P_t$ approaches 1. Therefore, we can make the model pay more attention to data with smaller $P_t$ values, thus increasing the model's attention to the hard-to-classify samples. In this study, such a focus is very important. We can adjust the size of $\gamma$ to make the model more focused on the distinction between N and S.
Journal of Healthcare Engineering
In this article, we choose $\gamma = 2.35$ and use different weighting ratios for different heartbeat categories. The specific weight ratio is N : S : V : F : Q = 1.6 : 1.8 : 0.8 : 1.0 : 1.0.
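As an illustration, the per-sample focal loss with the exponent and class weights quoted above might be computed as in the following sketch. It assumes the stated weight ratio is used directly as the per-class weighting factor, which the text does not confirm. The function names and the `eps` clamp are hypothetical.

```python
import math

# Values quoted in the text; using the ratio directly as weights is an ASSUMPTION.
CLASS_WEIGHTS = {"N": 1.6, "S": 1.8, "V": 0.8, "F": 1.0, "Q": 1.0}
GAMMA = 2.35


def cross_entropy(probs, true_label, eps=1e-12):
    """Cross-entropy for one sample: -log(P_t)."""
    return -math.log(max(probs[true_label], eps))


def focal_loss(probs, true_label, gamma=GAMMA, weights=CLASS_WEIGHTS, eps=1e-12):
    """Focal loss for one sample: -alpha_t * (1 - P_t)**gamma * log(P_t).

    probs maps each class label to its predicted probability, and P_t is
    the probability assigned to the true class.
    """
    p_t = min(max(probs[true_label], eps), 1.0)  # clamp to avoid log(0)
    return -weights[true_label] * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and a weight of 1 the expression reduces to the cross-entropy. For a confidently correct prediction such as $P_t = 0.9$, the modulating factor $(1 - P_t)^{\gamma}$ is below 0.01, so easy samples contribute little to the total loss.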

Structure of the Deep Residual Convolutional Neural Network.
Figure 4 shows the classification model structure for each base classifier. The target heartbeats and their category labels comprise the input data. The network consists of 9 convolutional layers, each with a convolutional kernel size of 17. The first five convolutional layers have 20 convolutional kernels each, while the final four convolutional layers have 40 convolutional kernels each. Batch normalization [48] is added after each convolutional layer to speed up the training process. The use of "Tanh" after the first convolutional layer and "ReLU" after the other convolutional layers allows the model to better adapt to the data. To prevent overfitting during the deep convolutional neural network training, a 40% dropout is added after convolutional layers 3 to 9 [49]. To enable the network to be trained at a deeper level, the residual structure proposed by Kaiming He et al. [50] is used. The addition of skip connections helps to weaken the problem of information loss with depth. After the last skip connection, batch normalization and "ReLU" processing are again performed. Finally, the base classifier results are obtained after global average pooling and a fully connected layer.

Performance Metrics.
To evaluate the performance of the model, the confusion matrix of the model is given, and four statistical performance metrics are adopted to evaluate our method as a whole according to the guidelines provided by AAMI [29], namely, accuracy (Acc), sensitivity (Sen), positive predictivity (+P), and specificity (Spe). The calculation of each metric is defined in the following equations:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\mathrm{Sen} = \frac{TP}{TP + FN},$$
$$+P = \frac{TP}{TP + FP},$$
$$\mathrm{Spe} = \frac{TN}{TN + FP}.$$
When calculating the overall model performance metrics, TP is defined as the number of correctly classified abnormal heartbeats, TN is defined as the number of correctly classified normal heartbeats, FP indicates the number of normal heartbeats classified as abnormal, and FN indicates the number of abnormal heartbeats classified as normal.
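The overall metrics can be computed from a confusion matrix with a short sketch such as the one below, using the standard definitions of accuracy, sensitivity, positive predictivity, and specificity. The helper names are hypothetical. The sketch also assumes that an abnormal beat predicted as any abnormal class counts as a true positive, which is one possible reading of the TP definition and is not confirmed by the text.

```python
def binary_counts(confusion, normal="N"):
    """Collapse a multi-class confusion matrix into TP, TN, FP, FN.

    confusion[true_label][predicted_label] holds a beat count.
    ASSUMPTION: abnormal beats predicted as any abnormal class are TP.
    """
    tp = tn = fp = fn = 0
    for true_label, row in confusion.items():
        for predicted, count in row.items():
            if true_label == normal and predicted == normal:
                tn += count
            elif true_label == normal:
                fp += count  # normal beat flagged as abnormal
            elif predicted == normal:
                fn += count  # abnormal beat missed
            else:
                tp += count
    return tp, tn, fp, fn


def overall_metrics(tp, tn, fp, fn):
    """Standard definitions of Acc, Sen, +P, and Spe."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),
        "+P": tp / (tp + fp),
        "Spe": tn / (tn + fp),
    }
```

For example, `overall_metrics(*binary_counts(confusion))` returns the four values as fractions, which can be multiplied by 100 to express them as percentages.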

Experimental Results and Discussion.
The model performance for the six base classifiers and the final ensemble classifier is shown in Table 5. The overall accuracy (Acc) of every base classifier is above 83%, with the majority exceeding 88%. Most of the sensitivities (Sen) are around 85%. The final classification result is obtained by voting among the six base classifiers.
There are two reasons for the differences in performance among the base classifiers. First, neural network training involves randomness. Second, in the focal loss function in this article, S has a high weight, which may cause the classifier to overfocus on S during the training process. The confusion matrix of the final classification results is given in Table 6. Table 7 shows the sensitivity (Sen), positive predictivity (+P), and specificity (Spe) for classes N, S, and V and compares them with six classical arrhythmia classification algorithms. Table 8 shows the overall performance of the model in terms of accuracy (Acc), positive predictivity (+P), sensitivity (Sen), and specificity (Spe) and compares it with the classical algorithms.
Among the compared algorithms, the accuracy of Garcia (2017) is slightly higher than that of the method proposed in this article (92.38% versus 91.89%), and its sensitivity for N is also slightly higher (93.99% versus 93.15%). However, the sensitivity of S in this article is much higher than that of Garcia (2017) (80.23% versus 61.96%). The sensitivity of V in this article is slightly higher than that of Garcia (2017) (90.99% versus 87.34%), and the positive predictivity (+P) of V is much higher (83.09% versus 59.44%). Notably, Garcia (2017) only performed 3-class classification, discarding F and Q in the training process, whereas this article reports results for 5-class classification, which is much more difficult. The recognition of abnormal heartbeats is especially significant in classifying arrhythmia, as it determines the model's effectiveness in real-world applications. In terms of the overall metrics in Table 8, the sensitivity (Sen) of the algorithm in this article is higher than that of Garcia (2017) (85.37% versus 81.73%). Thus, although the method proposed in this article is slightly lower in terms of overall accuracy (Acc), the model is more effective in practical applications [36].
Takalo-Mattila (2018) used a convolutional neural network with three convolutional layers to classify the heartbeat signal. Although a smaller network structure is advantageous in terms of reducing training time, it has a significantly lower classification effect than the approach in this study. Although our model structure is more complex, our input heartbeats do not need to be denoised. This also reduces the overall time and better reflects the model's automation [37]. The accuracy of the proposed algorithm is higher than that of Sellami (2019) (91.89% versus 89.91%), and the sensitivity of N is also higher (93.15% versus 88.52%). The sensitivity of S in this article is slightly lower than that of Sellami (2019) (80.23% versus 82.04%), but the positive predictivity (+P) is higher (49.40% versus 30.44%). The sensitivity of V in this article is slightly lower than that of Sellami (2019) (90.99% versus 92.05%), but the positive predictivity (+P) is higher (83.09% versus 72.13%). Sellami (2019) has a high classification performance for all types of heartbeats. However, its sensitivity to N, at only 88.52%, is slightly lower than that of other methods. The algorithm in this article has comparable classification results for S and V but has a higher sensitivity to N (93.15%) [40].
Yuanlu Li (2021) sought a more automated classification algorithm and therefore used an equal-length segmentation method that did not require R-peak locations. However, the sensitivity of S in their classification results is too low, at only 35.22%. This causes their classification model to almost fail in identifying S. In Table 8, the overall sensitivity of their model is the lowest at just 52.10%, making it difficult to apply in real-life situations [39].
De Chazal (2004) and Jinghao Niu (2020) used manually extracted features in their input data. De Chazal (2004) extracted many domain-specific features from the two-lead ECG signal to construct the classifier [29]. Due to the time-consuming nature of manual feature extraction and the low sensitivity of S, this approach struggles to meet the current demand for real-time classification of arrhythmia. Jinghao Niu (2020) used a combination of raw heartbeats and RR interval features as input data. To improve the classification performance of the classifier, Jinghao Niu (2020) applied SBCX to the input heartbeats [41]. The accuracy of this article's method is lower than that of Jinghao Niu (2020) (91.89% versus 95.87%), and the sensitivity of N is also lower (93.15% versus 98.28%). However, the algorithm in this article has a higher sensitivity for S (80.23% versus 77.35%) and for V (90.99% versus 85.08%). We also compared our data input with that of Jinghao Niu (2020). The input data in this article are 508-point raw heartbeats that are not filtered or preprocessed in any way. The input data in Jinghao Niu (2020) are a combination of 256-point heartbeats and RR interval features, preprocessed using SBCX. Therefore, the method proposed in this article is more automated.

Ablation Studies.
We conduct ablation studies on the different components of the framework. In Table 9, signal-508 represents the heartbeat segmentation method proposed in this article. Signal-256 represents the conventional 256-sample heartbeat segmentation method [41,44]. Focal loss represents the loss function used in this article. Ensemble represents the model ensemble module.
When the heartbeat segmentation method is replaced with the traditional 256-sample heartbeat segmentation method, the sensitivity of S is greatly reduced (80.23% to 16.34%), and the ability to classify S is almost lost. This strongly suggests that the heartbeat segmentation approach adopted in this article helps to improve the model's ability to classify heartbeats. The medical diagnosis of arrhythmia is not based on a single heartbeat alone but is determined by combining multiple consecutive heartbeats. For example, S will have a shorter RR interval relative to N. The heartbeat segmentation approach in this article makes it easier for the classifier to obtain distinct features to distinguish between N and S, preventing the overfitting of S during training. When the focal loss is replaced with the traditional cross-entropy loss function, the sensitivity of S decreases (80.23% to 61.60%). The focal loss function helps the model focus more on the hard-to-discriminate categories during training, and S, as the class with generally lower sensitivity across the various methods, is more likely to receive attention. After removing the ensemble module, the sensitivity of N decreased by 3.78 percentage points (93.15% to 89.37%), and that of S decreased by 8.61 percentage points (80.23% to 71.62%). In this research, six base classifiers are trained for voting, resulting in a more robust classification model with increased performance. The comparisons in Table 9 demonstrate the effectiveness of our strategy.
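The voting step of the ensemble module could be sketched as follows. This is a minimal illustration that assumes hard (label-level) majority voting. The function names and the tie-breaking rule are hypothetical, since the text does not state how ties among six classifiers are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    ASSUMPTION: a tie is resolved in favour of the tied label that
    appears first in the list of classifier outputs.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_predictions):
    """Combine per-classifier label lists into one label per heartbeat.

    per_classifier_predictions holds one list of labels per base
    classifier, all of the same length and in the same beat order.
    """
    return [majority_vote(list(votes))
            for votes in zip(*per_classifier_predictions)]
```

With an even number of base classifiers, ties are possible, so any real implementation needs an explicit rule such as the one assumed here or a vote over averaged class probabilities.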

Conclusion
This article proposes a brand-new system for classifying arrhythmias. In order to achieve a more automatic classification effect, we use the original heartbeat signal as the data input in this article.
Three key characteristics define the classification model put forth in this article. First, we employ a novel heartbeat segmentation strategy to assist the model in automatically extracting more features. Second, in order to aid in model training, we use the focal loss as the loss function. Finally, an ensemble classifier is used to produce more reliable classification results. According to the analysis of the experimental data, increasing the input heartbeat window length enhances the model's classification performance, particularly in terms of its sensitivity to S. In the meantime, the ensemble training strategy helps to reduce the overfitting brought on by data imbalance. The classification method proposed in this article is still limited by low classification performance for F and Q. This is because the MIT-BIH dataset contains far too few F and Q samples. As a result, we intend to collect additional arrhythmia data to help us better train the classification model in future work.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest or personal relationships that could have appeared to influence the work reported in this article.