Combining Rhythm Information between Heartbeats and BiLSTM-Treg Algorithm for Intelligent Beat Classification of Arrhythmia

Arrhythmia is a cardiovascular disease that seriously affects human health, and identifying and diagnosing it is an effective means of preventing most heart diseases. In this paper, a BiLSTM-Treg algorithm that integrates rhythm information is proposed to classify arrhythmia automatically. First, the discrete wavelet transform is used to denoise the ECG signal, after which the signal is segmented into heartbeats while the timing relationship between heartbeats is preserved. Then, multiple experiments with different heartbeat segment lengths and the BiLSTM network model are conducted to select the optimal heartbeat segment length. Finally, the tree regularization method is used to optimize the BiLSTM network model and improve classification accuracy, and the interpretability of the neural network model is examined through the simulated decision tree generated by tree regularization. The method divides heartbeats into five categories (nonectopic (N), supraventricular ectopic (S), ventricular ectopic (V), fused heartbeats (F), and unknown heartbeats (Q)) and is validated on the MIT-BIH arrhythmia database. The results show that the overall classification accuracy of the algorithm is 99.32%. Compared with other heartbeat classification methods, the proposed BiLSTM-Treg algorithm not only improves classification accuracy and obtains higher sensitivity and positive predictive value but also offers higher interpretability.


Introduction
With the improvement of people's living standards, the incidence and mortality of cardiovascular diseases are increasing year by year, and patients are trending younger [1]. Arrhythmia is a common cardiovascular disease that can be life-threatening in serious cases [2]. Therefore, accurate detection of arrhythmia is of great significance for preventing heart disease. The electrocardiogram (ECG), as a comprehensive expression of cardiac electrical activity on the body surface, contains a wealth of physiological and pathological information reflecting cardiac rhythm and electrical conduction and is one of the important bases for diagnosing heart disease and evaluating cardiac function [3]. Different types of arrhythmias can be identified and diagnosed by analyzing the ECG waveform. Traditional ECG waveform analysis is performed manually by medical personnel, who give a diagnosis based on cardiovascular disease diagnosis rules and personal experience. Because of individual differences between patients and the complexity of diseases, there are many types of ECG. In addition, some arrhythmias occur only occasionally in the daily life of patients, so ECG data need to be recorded over a long period.
Therefore, the amount of collected ECG data is huge, which places a heavy burden on doctors. Under these circumstances, mistakes, missed detections, and misdetections easily occur. With the rapid development of computer technology and electronic information technology, the computer has become an indispensable tool of medical modernization, and computer-aided medical treatment has penetrated into every corner of medical service [4]. In recent years, increasing attention has been paid to computer-aided analysis algorithms for electrocardiography, particularly those that can accurately and rapidly identify and diagnose arrhythmias. Automatic classification and diagnosis of ECG signals can save doctors' time by helping them judge the symptoms of arrhythmia quickly. In addition, it can provide good healthcare in areas where medical resources are scarce.
This paper presents a beat classification method based on a time-series network that integrates the rhythm information between heartbeats. The method is based on tree regularization constraints and the BiLSTM neural network model, and it improves the accuracy of heartbeat classification. The interpretability of the proposed algorithm is analyzed through tree regularization constraints and feature analysis. The main contributions of this work are as follows:
(1) A time-series BiLSTM-Treg algorithm is designed to classify beats. It combines the information of neighboring beats so that the deep neural network can learn more rhythm information between heartbeats.
(2) A tree regularization method for the heartbeat classification model is proposed to optimize the BiLSTM-Treg algorithm and improve the generalization ability of the neural network model.
(3) By analyzing the key nodes of the simulated decision tree in tree regularization, the features that the BiLSTM-Treg algorithm attends to during learning are identified, and the interpretability of the model is analyzed to a certain extent.
(4) Compared with other deep learning methods, the proposed BiLSTM-Treg algorithm improves the accuracy of heartbeat classification and reduces doctors' misdiagnosis rate to a certain extent.

Related Work
Arrhythmia was originally diagnosed mainly through doctors' manual analysis of the ECG waveform, which requires a professional medical background and rich clinical experience. Because of the diversity of arrhythmias and the complexity of the ECG waveform, this approach cannot meet the needs of patients. With the development of artificial intelligence, the classification of arrhythmia using intelligent processing technology has become a hot topic in recent years.
Automatic analysis of ECG signals first appeared in the field of ECG research in the 1950s. At first, only ECG filtering technology was relatively mature. Later, as the technology developed, researchers also began to explore automatic detection and diagnosis of arrhythmia. In the past decades, ECG researchers worldwide have proposed a variety of heartbeat classification methods. These methods can be divided into two categories according to whether manual feature extraction from ECG signals is needed: feature engineering-based classification methods and deep learning-based methods [5]. Traditional rule-based and machine-learning-based heartbeat classification methods both require manual feature extraction.

Heartbeat Classification Method Based on Feature Engineering.
Feature engineering processes the original data and extracts features that serve as the input of the model in order to improve its performance. It mainly includes three aspects: feature selection, feature extraction, and feature construction. Feature extraction is the key step of ECG signal classification and recognition, and the quality of the extracted features affects the accuracy of classification and recognition [6]. Generally, the features of ECG signals extracted by researchers mainly include morphological features [7], interphase features [8,9], wavelet transform features [10], higher-order statistics (HOS) [9,11], Hermite basis function (HBF) [12], QRS amplitude vector [13], and QRS composite wave area [14]. Machine-learning algorithms are then used for classification, such as the KNN algorithm [15], support vector machine (SVM) [7], and random forest [9]. Zhu et al. [7] extracted ECG morphological features and used the SVM algorithm to classify heartbeats, achieving a high classification accuracy. Yang et al. [9] extracted a variety of features, including RR interval, wavelet coefficients, and higher-order statistics, and then used a random forest classifier based on an extreme learning machine to detect arrhythmias. Ji et al. [15] proposed a multifeature combination and stacked DWKNN algorithm to classify arrhythmias and analyzed the effects of different feature combinations on heartbeat classification.

Although methods based on feature engineering can achieve relatively high classification accuracy, the complex waveform and poor anti-interference ability of the ECG signal mean that hand-extracted features often introduce human error. Manually designed features also depend heavily on the prior knowledge of the researcher. Deep learning has the advantage of extracting features and classifying automatically, which addresses the problems caused by manual feature extraction.

Heartbeat Classification Method Based on Deep Learning.
The deep learning model has become a common model for ECG data classification [16]. Compared with feature engineering-based ECG classification, the deep learning method, which uses original data rather than manually extracted features as input, can achieve better classification performance. In the deep learning method, researchers use the nonlinear transformation of hidden layers in the network to obtain effective features automatically and transform the original features into new feature spaces by changing the structure of the hidden layers and the way they are stacked [17], so as to make full use of the rich hidden information in the data and improve classification accuracy.
Recently, some researchers [18,19] have used deep neural network models for automatic classification of ECG signals. Ji et al. [20] proposed an ECG classification system based on Faster R-CNN, in which the one-dimensional ECG signal is converted into a two-dimensional image as the input of the neural network to classify arrhythmias. Akarya et al. [21] proposed a 9-layer deep convolutional neural network (CNN) for automatic recognition of ECG signals. The original ECG signal and the ECG signal with high-frequency noise filtered out were used to classify heartbeats, and the accuracy rates were 94.03% and 93.47%, respectively. Khan et al. [22] used the long short-term memory network (LSTM) to automatically identify 16 different types of arrhythmias. Wu et al. [23] proposed a heartbeat classification algorithm that integrates CNN and BiLSTM deep learning models, using the CNN to extract morphological features and the BiLSTM to extract temporal features of heartbeats. Li et al. [24] proposed a BiLSTM-Attention network model to distinguish different types of arrhythmias. Pandey et al. [25] fed extracted wavelet, RR interval, morphological, and higher-order statistical features to a BiLSTM to classify heartbeats automatically. Yildirim et al. [26] proposed a heartbeat classification model based on wavelet transform and a BiLSTM network, which used wavelets to decompose ECG signals into signals of different frequency scales and used those signals as the input sequence of the BiLSTM model. Deep learning-based classification of ECG signals realizes an "end-to-end" learning mode, eliminates the manual feature design process, saves manpower, and makes ECG classification simpler and more efficient.
Although all the above studies cleverly used the deep neural network to classify ECG signals, the rhythm information between heartbeats has not been fully considered, the interpretability of the network has not been analyzed, and the classification accuracy needs to be improved.

Method
The heartbeat classification method proposed in this paper, a BiLSTM-Treg algorithm that integrates rhythm information between heartbeats, mainly includes the following steps. First, the data are preprocessed to filter out the noise in the ECG signal and to segment the ECG signal into heartbeats. Second, consecutive single heartbeats are combined into heartbeat segments so that the rhythm information between heartbeats is retained. Then, the BiLSTM-Treg model is constructed and optimized. Finally, the heartbeats are classified. Section 3.1 covers preprocessing, Section 3.2 covers the representation of rhythm information, and Section 3.3 covers model building and optimization.

ECG Signal Preprocessing.
The preprocessing stage consists mainly of denoising and segmentation of ECG signals. Collected ECG signals inevitably contain noise due to the influence of the equipment and the human body itself [27], mainly baseline drift, power frequency interference, and EMG interference. It is important to remove as much noise as possible from ECG signals before classifying them. The wavelet transform is a generalization of the short-time Fourier transform (STFT) [28] and is well suited to time-frequency analysis of ECG signals. Compared with the equally spaced time-frequency localization of the STFT, the wavelet transform provides higher frequency resolution at low frequencies and higher time resolution at high frequencies. In this paper, the discrete wavelet transform is used to denoise ECG signals, which avoids losing important physiological details and better retains the characteristics of the ECG signal. Because of the high regularity of the Daubechies wavelet, the reconstructed signal is relatively smooth, and the energy spectrum of the DB6 wavelet [29,30] is concentrated at low frequencies. Its moderate filter length and moderate coefficient values, compared with other wavelets, provide more smoothing and less shift in the ECG fiducials. Therefore, in order to obtain good classification accuracy, this paper uses the DB6 wavelet from the Daubechies wavelet family to process ECG signals. In terms of implementation, we use pywt, Python's open-source wavelet transform package. The discrete wavelet transform formula [31] is shown in (1) and (2).
where $W_\Psi(j, k)$ is the wavelet coefficient, $\Psi_{j,k}(x)$ is the discrete wavelet function at different scales and locations, $f(x)$ is the input ECG signal, Ψ(k) is the wavelet basis function, and $j$ is the order of the scale. The larger $j$ is, the smaller the scale, which means the higher the frequency and the closer to the details. $k$ is the position offset, $a_0$ is the scale parameter, and $b_0$ is the position parameter. The signal before and after preprocessing with the discrete wavelet transform is shown in Figure 1 and Figure 2.
Heartbeat segmentation divides an ECG record into units of one complete heartbeat [32]. A complete heartbeat should contain the P wave, the QRS complex, and the T wave [33], as shown in Figure 3(a). In this paper, the R-wave peak annotated in the MIT-BIH database is used as the reference point for heartbeat segmentation, and 0.25 s before and 0.4 s after the R peak are extracted, as shown in Figure 3(b). We take this 0.65 s of data as the sample of a single heartbeat. For MIT-BIH ECG data with a sampling rate of 360 Hz, we extract 90 points before the R peak and 144 points after it. Therefore, including the R peak itself, each reconstructed sample has 235 points.
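The segmentation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `segment_beats`, the toy signal, and the choice to skip beats whose window runs past the ends of the record are all assumptions.

```python
# Hypothetical sketch of R-peak-centred heartbeat segmentation.
PRE = 90    # samples before the R peak (0.25 s at 360 Hz)
POST = 144  # samples after the R peak (0.40 s at 360 Hz)


def segment_beats(signal, r_peaks, pre=PRE, post=POST):
    """Cut one fixed-length window around each annotated R peak.

    Beats whose window would run past either end of the record are
    skipped (an assumption; the text does not say how edges are handled).
    """
    beats = []
    for r in r_peaks:
        start, stop = r - pre, r + post + 1  # +1 keeps the R-peak sample
        if start < 0 or stop > len(signal):
            continue
        beats.append(signal[start:stop])
    return beats


# Toy record: a flat signal with unit spikes standing in for R peaks.
signal = [0.0] * 1000
r_peaks = [100, 400, 950]
for r in r_peaks:
    signal[r] = 1.0

beats = segment_beats(signal, r_peaks)
print(len(beats), len(beats[0]))
```

Each returned window holds 90 + 1 + 144 = 235 samples, with the R peak at index 90; the third toy peak is dropped because its window would extend beyond the record.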

Rhythm Information between Heartbeats.
The rhythm information between heartbeats contained in the ECG is an important basis for doctors to diagnose heart diseases. Changes in ECG rhythm can reflect problems in different parts of the heart, which can help medical staff design more rational treatment plans. Common rhythm types are bigeminy, trigeminy, ventricular tachycardia, and atrial tachycardia.

Bigeminy.
Every normal heartbeat is followed by a premature beat, and the occurrence of three or more such groups in a row is called bigeminy. According to the type of premature beat, it can be divided into ventricular bigeminy and atrial bigeminy. For example, the rhythm change N-V-N-V-N-V is ventricular bigeminy, and the rhythm change N-S-N-S-N-S is atrial bigeminy.

Trigeminy.
A premature beat occurs after every two normal heartbeats, and the occurrence of three or more such groups in a row is called trigeminy. According to the type of premature beat, it can be divided into ventricular trigeminy and atrial trigeminy. For example, the rhythm change N-N-V-N-N-V-N-N-V is ventricular trigeminy, and the rhythm change N-N-S-N-N-S-N-N-S is atrial trigeminy. An ECG signal with ventricular trigeminy is shown in Figure 4.

Ventricular Tachycardia.
Three or more consecutive ventricular premature beats are called ventricular tachycardia, such as the rhythm change V-V-V.

Atrial Tachycardia.
Three or more consecutive atrial premature beats are called atrial tachycardia, such as the rhythm change S-S-S.
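The four rhythm definitions above translate directly into patterns over a sequence of beat labels. The sketch below is only an illustration of those definitions; the name `find_rhythms`, the pattern table, and the encoding of each beat as a single AAMI letter are assumptions and are not part of the proposed model, which learns rhythm information from the signal rather than from rules.

```python
import re

# Illustrative patterns, one per rhythm type defined in the text.
# Each beat is assumed to be encoded as a single AAMI class letter.
RHYTHM_PATTERNS = {
    "ventricular bigeminy": r"(?:NV){3,}",    # N-V repeated 3+ times
    "atrial bigeminy": r"(?:NS){3,}",         # N-S repeated 3+ times
    "ventricular trigeminy": r"(?:NNV){3,}",  # N-N-V repeated 3+ times
    "atrial trigeminy": r"(?:NNS){3,}",       # N-N-S repeated 3+ times
    "ventricular tachycardia": r"V{3,}",      # 3+ consecutive V beats
    "atrial tachycardia": r"S{3,}",           # 3+ consecutive S beats
}


def find_rhythms(labels):
    """Return (rhythm name, start index, end index) for every match."""
    sequence = "".join(labels)
    found = []
    for name, pattern in RHYTHM_PATTERNS.items():
        for match in re.finditer(pattern, sequence):
            found.append((name, match.start(), match.end()))
    return found


print(find_rhythms(list("NNNVNVNVNN")))
print(find_rhythms(list("NNVNNVNNV")))
print(find_rhythms(list("NVVVN")))
```

The end index is exclusive, following Python's slicing convention.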
In addition, the appearance of certain types of heartbeats also reflects changes in ECG rhythm. For example, after sustained ventricular tachycardia, a ventricular fusion heartbeat is often generated by electrical signals from the sinus node, followed by ventricular capture. Therefore, ventricular fusion heartbeats and ventricular capture are important characteristics of ventricular tachycardia.
In this paper, this rhythm information, which is beneficial to heartbeat classification, is integrated into the model. Specifically, when processing the dataset, successive single beats are grouped into segments, which preserves the rhythm information between beats. The ECG data are then fed into the neural network model in units of heartbeat segments, which enables the model to make full use of the rhythm information contained in a segment when identifying the heartbeat type and improves classification accuracy. The length of the heartbeat segment is one of the key points of our study.
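Grouping beats into segments can be sketched as below. The text does not say whether segments overlap or how a short tail is handled, so the non-overlapping windows and the dropped tail here are assumptions, as is the helper name `build_segments`.

```python
def build_segments(beats, labels, segment_len):
    """Group consecutive beats into fixed-length, non-overlapping segments.

    Assumptions (not stated in the text): segments do not overlap, and a
    trailing group shorter than segment_len is dropped.
    """
    if len(beats) != len(labels):
        raise ValueError("beats and labels must have the same length")
    segments, segment_labels = [], []
    last_start = len(beats) - segment_len
    for start in range(0, last_start + 1, segment_len):
        segments.append(beats[start:start + segment_len])
        segment_labels.append(labels[start:start + segment_len])
    return segments, segment_labels


# Toy data: 32 beats of 235 samples each, all labelled N.
toy_beats = [[float(i)] * 235 for i in range(32)]
toy_labels = ["N"] * 32

segments, segment_labels = build_segments(toy_beats, toy_labels, 15)
print(len(segments), len(segments[0]), len(segments[0][0]))
```

Each segment then has shape (segment length, 235), so the segment length plays the role of the number of timesteps seen by a recurrent network, and every beat in the segment keeps its own label.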

BiLSTM-Treg Algorithm.
The recurrent neural network (RNN) is a kind of neural network with short-term memory, which is very effective in processing sequential data. However, in deep neural networks, the gradient is unstable: the gradient close to the input layer is calculated as the product of the gradients of the subsequent layers [34]. When the neural network has too many hidden layers or the input sequence of the RNN is too long, the gradient near the input layer vanishes or explodes, which degrades the performance of the RNN. To solve this problem, Hochreiter et al. [35] proposed the long short-term memory network (LSTM) in 1997. By adding gating units to the RNN, the network can choose whether to retain historical information, which mitigates the vanishing and exploding gradients caused by long-term dependencies in the RNN.

BiLSTM Neural Network Structure.
Compared with the RNN, the LSTM adds three gating units: the input gate, the forget gate, and the output gate. In addition, the LSTM has two other important parts, namely, the memory unit and the hidden state. The forget gate controls whether the information in the memory unit is discarded, the input gate controls whether the information of the current signal and hidden state is added to the memory unit, and the output gate determines which information in the memory unit is output. Figure 5 shows the unit structure of the LSTM, where $f_t$, $i_t$, and $o_t$, respectively, represent the forget gate, the input gate, and the output gate at the current moment; $C_{t-1}$ and $C_t$, respectively, represent the state value of the memory unit at the previous moment and the current moment; $h_{t-1}$ and $h_t$, respectively, represent the hidden state at the previous moment and the current moment. $x_t$ represents the input at the current moment, and $\tilde{C}_t$ is the candidate value of the memory unit at the current moment. $\sigma$ and $\tanh$ represent the sigmoid activation function and the tanh activation function, respectively. The calculation process of the LSTM can be expressed as equations (3)-(8):

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{3}$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{4}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{5}$$
$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{6}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{7}$$
$$h_t = o_t \odot \tanh(C_t) \tag{8}$$

Formulas (3)-(6), respectively, give the input gate $i_t$, the forget gate $f_t$, the output gate $o_t$, and the candidate value $\tilde{C}_t$ of the memory unit. They are all determined by the input data $x_t$ at the current moment, the hidden state $h_{t-1}$ at the previous moment, and the corresponding weight matrices, where $W_i$, $W_f$, $W_o$, and $W_c$ are the weight matrices of the current input $x_t$; $U_i$, $U_f$, $U_o$, and $U_c$ are the weight matrices of the hidden state $h_{t-1}$ at the previous moment; and $b_i$, $b_f$, $b_o$, and $b_c$ are the corresponding bias terms. These weight matrices and bias terms are trained by gradient descent.
Formula (7) indicates that the memory unit $C_t$ at the current moment is determined by the current candidate value $\tilde{C}_t$ and its own previous state $C_{t-1}$, weighted by the input gate and the forget gate, respectively. Finally, formula (8) indicates that the output at the current moment, that is, the hidden state $h_t$, is determined by the current memory unit $C_t$ and the output gate.
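A single LSTM step following the description of formulas (3)-(8) can be sketched in plain Python. This is a didactic sketch under stated assumptions, not the TensorFlow implementation used in the experiments: the function names, the `params` layout (one `(W, U, b)` triple per gate), and the use of nested lists instead of tensors are all illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for matrices stored as nested lists."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'i', 'f', 'o', 'c' to (W, U, b)."""
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]      # input gate
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]      # forget gate
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]      # output gate
    c_hat = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]  # candidate
    # Memory update: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Hidden state: gated, squashed view of the memory unit.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy check with all-zero weights: every gate equals 0.5, the candidate is 0.
zero_gate = ([[0.0] * 3 for _ in range(2)], [[0.0] * 2 for _ in range(2)], [0.0, 0.0])
params = {gate: zero_gate for gate in "ifoc"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
print(h_t, c_t)
```

With zero weights the forget gate halves the previous memory, so `c_t` is `[0.5, -0.5]`, which makes the role of each gate easy to check by hand.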
One disadvantage of the LSTM is that it encodes the sequence in one direction only: it can use its past context but not its future context. In heartbeat classification, if the information of both the preceding and the following heartbeats can be obtained when classifying the current heartbeat, the rhythm information will be captured more accurately, thus improving the classification accuracy of the current heartbeat. BiLSTM solves this problem well [36]. In each BiLSTM layer, two independent LSTMs process the sequence in the two directions, respectively. The specific formulas are shown in (9)-(11). At time $t$, the hidden layer state $H_t$ of the BiLSTM obtains the heartbeat information $\overrightarrow{h_t}$ before time $t$ through the forward LSTM and the heartbeat information $\overleftarrow{h_t}$ after time $t$ through the backward LSTM.

BiLSTM Network Based on Tree Regularization.
In machine learning, there are many strategies designed to reduce model generalization error, which are collectively referred to as regularization. The form of regularization is very simple: an additional term is added to the objective function to influence the selection of its optimal point. The common regularization methods are L1 regularization and L2 regularization. The objective function is shown in equation (12), where $\lambda\Psi(W)$ is the regularization term.

$$\min_{W} \sum_{n=1}^{N} \mathrm{loss}\left(y_n, \hat{y}_n(x_n, W)\right) + \lambda\Psi(W) \tag{12}$$
Figure 5: The LSTM cell structure.

Tree regularization is a new regularization method proposed by Wu et al. [37], which can not only effectively improve the generalization ability of the model but also be used to analyze its interpretability. As an interpretability method for deep network models, tree regularization is a post hoc method, that is, model analysis is applied after training to make the model interpretable. The method looks for a decision tree representation of the deep network model and makes the prediction results of the network understandable to humans by improving the human simulability of the network. Tree regularization is implemented in two stages. First, the deep neural network is trained while being closely modeled by a decision tree, so that the decision tree can accurately simulate the prediction process of the network. Second, the complexity metric of the decision tree, the average path length (APL), is taken as the penalty term for model optimization. In this way, the neural network is encouraged to generate simple decision trees and discouraged from generating complex ones, which makes the generated decision trees easier for humans to simulate. The decision tree generation formula can be expressed by (13) and (14), where $x_n$ is the sample feature of the training set, $\hat{y}_n(x_n, W)$ is the prediction label of the deep model, $W$ is the weight matrix of the deep model, and $y_{tn}$ is the prediction label of the decision tree. $\hat{y}_n$ is used as the input of the decision tree so that $y_{tn}$ and $\hat{y}_n$ are as similar as possible, which allows the decision tree to simulate the deep network.

$$\mathrm{Tree} = \mathrm{TrainTree}\left(\{x_n, \hat{y}_n(x_n, W)\}\right) \tag{13}$$

The calculation formula of tree regularization is shown in (15), where $\mathrm{PathLength}(\mathrm{tree}, x_n)$ is the path length of the $n$-th sample and $\Omega(W)$ is the average path length, namely, the penalty term.
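To make the average path length concrete, the sketch below computes it for a small hand-built tree. Everything here is hypothetical: the nested-dictionary tree format, the feature indices and thresholds, and the convention of counting decision nodes along the path are assumptions chosen for illustration, whereas in tree regularization the tree is fitted to the network's predictions.

```python
def path_length(tree, x):
    """Count the decision nodes a sample passes before reaching a leaf."""
    length = 0
    node = tree
    while "leaf" not in node:
        length += 1
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return length


def average_path_length(tree, samples):
    """Mean path length over all samples: the APL penalty value."""
    return sum(path_length(tree, x) for x in samples) / len(samples)


# Hypothetical surrogate tree over two made-up features.
toy_tree = {
    "feature": 0,
    "threshold": 0.5,
    "left": {"leaf": "N"},
    "right": {
        "feature": 1,
        "threshold": 0.2,
        "left": {"leaf": "V"},
        "right": {"leaf": "S"},
    },
}

toy_samples = [[0.1, 0.0], [0.9, 0.1], [0.9, 0.7], [0.3, 0.9]]
apl = average_path_length(toy_tree, toy_samples)
print(apl)
```

Two samples stop at the first leaf (length 1) and two need a second decision (length 2), so the APL of this toy tree is 1.5. A network whose surrogate tree needs fewer decisions per sample incurs a smaller penalty.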
It can be seen from equation (15) that $\Omega(W)$ is not differentiable with respect to the network parameters $W$. Therefore, in order to use gradient descent during network optimization, Wu et al. [37] proposed the surrogate regularization function $\hat{\Omega}(W)$, which replaces the direct APL calculation, as shown in equations (16) and (17). By training a multilayer perceptron (MLP), a mapping between the parameter vector $W$ of the neural network model and the APL is established. With $W$ and the APL as the training data of the MLP, the objective function of the MLP is shown in equation (17), where $\xi$ represents the weight matrix of the MLP model, $\varepsilon$ represents the regularization strength, $\{W_j, \Omega(W_j)\}$ represents the dataset of known parameter vectors and their corresponding true APL values, and $J$ represents the size of that dataset. After introducing the surrogate model, the objective function of the BiLSTM network is as shown in equation (18).

$$\min_{W} \sum_{n=1}^{N} \mathrm{loss}\left(y_n, \hat{y}_n(x_n, W)\right) + \lambda\hat{\Omega}(W) \tag{18}$$
In this paper, tree regularization is used in the BiLSTM model to optimize the model, reduce the generalization error of the model, and improve the classification accuracy. At the same time, the generated simulated decision tree is used to analyze and understand how the BiLSTM model carries out heartbeat classification.
The BiLSTM model using tree regularization is shown in Figure 6. Specifically, $x_t = [x_{t1}, x_{t2}, \ldots, x_{t235}]$ represents a single heartbeat sample. The heartbeat segment composed of consecutive single heartbeats is used as the input of the network, and the number of single heartbeats in the heartbeat segment t is the timestep of the network. The model first uses the BiLSTM to classify heartbeats. Next, a decision tree is used to simulate the BiLSTM, and the APL is calculated. Then, the MLP model is trained to obtain the surrogate regularization function $\hat{\Omega}(W)$, which is added to the objective function of the BiLSTM model for the next round of training. Algorithm 1 describes the BiLSTM-Treg algorithm.

Experiment
The processing and analysis of the ECG signal are very important to heartbeat classification. The research focus of this paper is the construction and optimization of a model that integrates rhythm information. According to the ANSI/AAMI EC57:2012 standard of the Association for the Advancement of Medical Instrumentation (AAMI), heartbeats can be divided into five categories: N (normal or bundle branch block), S (supraventricular ectopic beat), V (ventricular ectopic beat), F (fusion beat), and Q (beat not specified). On the basis of extracted continuous heartbeat segments, this experiment constructs a time-series network that integrates rhythm information between heartbeats and divides heartbeats into the above five types.

Experimental Environment.
The model proposed in this paper is trained and tested on a PC workstation with an Intel Xeon Silver 4114 CPU, 32 GB of memory, and a GeForce RTX 2080 Ti graphics card. The workstation runs Ubuntu 18.04, and the algorithm runs under the TensorFlow-GPU v2.2.0 framework.

Experimental Data.
A unified and authoritative standard database is the basis of the automatic analysis of ECG signals. In ECG research, the MIT-BIH arrhythmia database is the database most widely used by researchers [38]. The database contains 48 records, each about 30 minutes long, with about 650,000 sampling points and a sampling frequency of 360 Hz. Heartbeats in the MIT-BIH arrhythmia database are labeled with fifteen categories. Table 1 shows the correspondence between the two heartbeat classification schemes.
In this paper, we classified 109,454 heartbeats from the MIT-BIH arrhythmia database, including 90,595 heartbeats in category N, 2,781 in category S, 7,235 in category V, only 802 in category F, and 8,041 in category Q. Ninety percent of the heartbeat data were randomly selected from the dataset as the training set, and the remaining 10% were used as the test set. The specific distribution of the data is shown in Table 2.
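A random 90/10 split of this kind can be sketched as follows. The helper `random_split`, the fixed seed, and the choice to split over item indices are assumptions; the text does not state the seed or whether the split is made per beat or per heartbeat segment.

```python
import random

# Class counts as reported in the text.
CLASS_COUNTS = {"N": 90595, "S": 2781, "V": 7235, "F": 802, "Q": 8041}


def random_split(n_items, train_fraction=0.9, seed=0):
    """Shuffle item indices and split them into training and test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = int(n_items * train_fraction)
    return indices[:cut], indices[cut:]


total = sum(CLASS_COUNTS.values())
train_idx, test_idx = random_split(total)
print(total, len(train_idx), len(test_idx))
```

Summing the per-class counts reproduces the 109,454 heartbeats stated above. Because the classes are heavily imbalanced (category F has only 802 beats), a purely random split gives the test set roughly 80 F beats, which is worth keeping in mind when reading per-class metrics.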

Evaluation Metrics.
To evaluate the performance of the model for heartbeat classification, the classification results are divided into four categories: TP, FP, TN, and FN. Taking type N as an example, formulas (19)-(22), respectively, give the calculation of type N true positive heartbeats ($TP_N$), type N false-positive heartbeats ($FP_N$), type N true negative heartbeats ($TN_N$), and type N false-negative heartbeats ($FN_N$). Table 3 shows the confusion matrix of the classification results.
In this paper, sensitivity, specificity, positive predictive value, and accuracy are used as indicators of classifier performance. Sensitivity (Se), also known as recall, is the proportion of actually positive samples that are correctly judged to be positive. The higher the sensitivity, the greater the proportion of positive samples correctly identified. Specificity (Sp) is the proportion of actually negative samples that are correctly judged to be negative. The positive predictive value (+p) is the proportion of correctly judged positive samples among all samples judged positive. Accuracy (Acc) is the ratio of the sum of true positives and true negatives to the total number of samples, reflecting the consistency between test results and actual results. The calculation formulas of these four evaluation metrics are shown in (23)-(26).
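The four metrics can be computed per class from a multiclass confusion matrix in a one-vs-rest fashion, as the definitions above describe. The sketch below uses a made-up 3-class matrix; the function name, the row-is-true-class convention, and the toy numbers are assumptions, not results from the paper.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest Se, Sp, +P, and Acc for every class.

    confusion[i][j] is the number of beats of true class i predicted as
    class j (row = true class, an assumed convention). Classes with no
    true or no predicted samples would divide by zero and are not handled.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k beats missed
        fp = sum(row[k] for row in confusion) - tp  # other beats called k
        tn = total - tp - fn - fp
        metrics[name] = {
            "Se": tp / (tp + fn),
            "Sp": tn / (tn + fp),
            "+P": tp / (tp + fp),
            "Acc": (tp + tn) / total,
        }
    return metrics


# Made-up confusion matrix for three classes (not data from the paper).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
results = per_class_metrics(toy_confusion, ["N", "S", "V"])
print(results["N"])
```

For class N in this toy matrix, TP = 8, FN = 2, FP = 2, and TN = 18, giving Se = 0.8, Sp = 0.9, +P = 0.8, and Acc = 26/30.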

Results and Analysis
In order to build the time-series network model most suitable for heartbeat classification and to distinguish the categories of arrhythmias more accurately, we conducted the following five groups of experiments. In this section, we first compare and analyze the performance of RNN, GRU, and LSTM in heartbeat classification (Section 5.1). Second, the network is changed to bidirectional, and the classification results of BiRNN, BiGRU, and BiLSTM are compared (Section 5.2). Third, by comparing the effects of different heartbeat segment lengths on the classification performance of the BiLSTM model, the optimal segment length is selected (Section 5.3). Then, tree regularization is used to optimize the BiLSTM model; adding tree regularization improves the generalization ability and the classification accuracy of the BiLSTM compared with traditional L1 and L2 regularization (Section 5.4). Next, the important features of the simulated decision tree are analyzed and verified by experiments (Section 5.5). Finally, the results are compared with those of other studies (Section 5.6).

Analysis of Experimental Results of Different Time-Series Networks.
In order to select the optimal time-series network model, Experiment 1 used three network models, namely, RNN, GRU, and LSTM, for heartbeat classification. The experimental results show that the overall classification accuracy of the RNN model and the GRU model is 98.98% and 98.97%, respectively. The overall classification accuracy of the LSTM model is 99.09%, which is better than that of the RNN and GRU models. However, a one-way recurrent neural network cannot fully consider the rhythm information when classifying heartbeats. Table 4 shows the classification results and performance of the three one-way recurrent neural networks.

Analysis of Experimental Results of Different Bidirectional Time-Series Networks.
The one-way recurrent neural network can only learn the heartbeat information before the current moment when classifying heartbeats. Therefore, we extend the selected LSTM network to a BiLSTM so that the network can consider both previous and future heartbeat information. The BiRNN and BiGRU networks are used for comparison and verification. The experimental results show that the overall classification accuracy of the BiRNN model and the BiGRU model is 99.13% and 98.92%, respectively. The overall classification accuracy of the BiLSTM model is 99.18%, which is better than that of the BiRNN and BiGRU models. Table 5 shows the classification results and performance of the three bidirectional recurrent neural networks.

Select the Optimal Length of Heartbeat Segment.
In order to select the optimal length of the heartbeat segment, a total of 7 experiments were conducted. The heartbeat segment lengths we selected are 1, 5, 10, 15, 20, 25, and 30, and the corresponding timestep of the BiLSTM is likewise 1, 5, 10, 15, 20, 25, and 30. The experimental results show that the classification accuracy of the network gradually improves as the heartbeat segment length increases up to 15. However, when the heartbeat segment length is greater than 15, the classification performance of the network decreases rapidly. The main reason is that the rhythm information of heartbeats, such as bigeminy, trigeminy, atrial tachycardia, and ventricular tachycardia, can be shown within 15 beats. When the heartbeat segment is too long, the heartbeat information considered by the network becomes redundant, which degrades the network performance. Table 6 shows the classification results of the BiLSTM network with different heartbeat segment lengths.
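The construction of heartbeat segments of a given length can be sketched as follows. This is an illustrative assumption rather than the exact procedure of this paper: the sketch uses a sliding window with a stride of one beat, and the function name `make_segments` and the toy recording are hypothetical.

```python
def make_segments(beats, length):
    """Slide a window of `length` consecutive beats over a recording.

    The beats inside each segment keep their temporal order, so the
    rhythm between neighbouring beats is preserved.
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    return [beats[i:i + length] for i in range(len(beats) - length + 1)]

# Toy recording of 20 beats; each entry stands for one segmented heartbeat.
recording = [f"beat{i}" for i in range(20)]

for length in (1, 5, 10, 15, 20, 25, 30):
    segments = make_segments(recording, length)
    # The segment length is also the number of timesteps fed to the network.
    print(length, len(segments))
```

A recording shorter than the requested length yields no segments in this sketch, so longer segments also reduce the number of training samples available from each record.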

Analyze the Experimental Results of Different Regularization Methods.
In order to improve the generalization ability of BiLSTM and further improve the classification accuracy, we chose tree regularization to constrain the weights of the network and used the traditional L1 and L2 regularization for comparison. The experimental results verify the feasibility and effectiveness of the proposed model, whose overall classification accuracy is 99.32%. The overall classification accuracy of the models using L1 regularization and L2 regularization was 99.26% and 99.23%, respectively. Compared with Experiment 2, the overall accuracy of Experiment 4 improved by 0.14% (from 99.18% to 99.32%), and the precision of class S, class V, and class F all improved, among which the precision of class F improved most obviously, by 5.62%.
Through the above analysis, it is concluded that tree regularization can effectively improve the classification accuracy of the network and is better than the traditional L1 and L2 regularization. Table 7 shows the classification results of the BiLSTM models under different regularization methods. Figure 7 shows the confusion matrix of the heartbeat classification results based on the BiLSTM-Treg model.
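The three penalties compared above measure different quantities. The sketch below assumes the common formulation of tree regularization, in which the penalty is the average decision path length of a decision tree fitted to the predictions of the network, whereas L1 and L2 penalize the magnitudes of the weights directly. The tree, samples, and weights are hypothetical, and in actual training the path length has to be estimated by a differentiable surrogate, which is omitted here.

```python
def l1_penalty(weights):
    """Sum of absolute weight values."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weight values."""
    return sum(w * w for w in weights)

def path_length(tree, x):
    """Count the decision nodes visited when routing sample x to a leaf."""
    depth = 0
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
        depth += 1
    return depth

def average_path_length(tree, samples):
    """Tree-regularization style penalty: mean path length over the samples."""
    return sum(path_length(tree, x) for x in samples) / len(samples)

# Hypothetical surrogate tree; internal nodes are dicts, leaves are class labels.
toy_tree = {
    "feature": 0, "threshold": 0.5,
    "left": "N",
    "right": {"feature": 1, "threshold": 0.0, "left": "V", "right": "S"},
}
samples = [[0.2, 0.3], [0.9, -0.4], [0.7, 0.6], [0.1, -1.0]]
weights = [0.5, -1.5, 2.0]

l1 = l1_penalty(weights)
l2 = l2_penalty(weights)
apl = average_path_length(toy_tree, samples)
print(l1, l2, apl)
```

A small average path length means that the decisions of the network can be mimicked by a shallow tree, which is the property that makes the simulated decision tree readable.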

Analyze the Key Nodes of the Simulated Decision Tree.
The tree regularization method used in this paper looks for a decision tree representation of the model during the training of the network. The generated decision tree simulates the decision process of the BiLSTM network model. We call this decision tree a simulated decision tree (SDT). Since a single heartbeat contains many feature points, the generated SDT is too large to display in full, so we show only the tree generated by the top 10 important feature points of the SDT, as shown in Figure 8. The top 10 important feature points are 126, 112, 162, 121, 153, 80, 224, 93, 100, and 120. The positions of these feature points on the ECG waveform are as follows: sampling points 126, 120, 121, and 153 correspond to the ST segment, sampling point 112 corresponds to the J point, sampling point 224 corresponds to the endpoint of the T wave, sampling point 162 corresponds to the beginning point of the T wave, sampling point 80 corresponds to the peak of the Q wave, sampling point 93 corresponds to the peak of the R wave, and sampling point 100 corresponds to the peak of the S wave, as shown in Figure 9.
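Reducing a heartbeat to the ten sampling points listed above can be sketched as follows. The indexing convention (0-based) and the beat length of 250 samples are assumptions made only for illustration, and `select_feature_points` is a hypothetical helper; the landmark names are taken from the description above.

```python
# Sampling points in the importance order given in the text.
TOP_FEATURE_POINTS = [126, 112, 162, 121, 153, 80, 224, 93, 100, 120]

# Waveform landmark of each sampling point, as described in the text.
LANDMARKS = {
    126: "ST segment", 120: "ST segment", 121: "ST segment", 153: "ST segment",
    112: "J point", 224: "T wave end", 162: "T wave start",
    80: "Q peak", 93: "R peak", 100: "S peak",
}

def select_feature_points(beat, points=TOP_FEATURE_POINTS):
    """Keep only the listed sampling points of one beat, in importance order."""
    if max(points) >= len(beat):
        raise ValueError("beat is shorter than the largest sampling point")
    return [beat[p] for p in points]

# Hypothetical beat of 250 samples whose value equals its index (assumption),
# which makes the selection easy to inspect.
toy_beat = list(range(250))
reduced = select_feature_points(toy_beat)

for point, value in zip(TOP_FEATURE_POINTS, reduced):
    print(point, LANDMARKS[point], value)
```

Each reduced beat has 10 values instead of the full waveform, so a heartbeat segment built from reduced beats keeps the rhythm between beats while discarding most of the morphology.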
In Figure 8, we have modified the representation of the value field in the decision tree nodes. Each value is expressed as the percentage of the heartbeats of class N, S, V, F, or Q in the node relative to the total number of heartbeats of that class. Taking node 2 as an example, 0.08 in the value field means that the class S heartbeats in this node account for 0.08% of all class S heartbeats, that is, this node contains almost no class S heartbeats. Therefore, according to this simulated decision tree, we have the following analysis:
(1) [...] between ventricular depolarization and ventricular repolarization [39]. The normal ST segment is smooth and flush with the baseline.
(2) Nodes 11 and 12 show that node 2 distinguishes the F heartbeats from the Q heartbeats according to the value of sampling point 224. There is only 0.38% of the class Q heartbeats in node 11 and 0% of the class S heartbeats in node 12. The reason is as follows: sampling point 224 is the endpoint of the T wave in the ECG waveform. The T wave is a wave with a larger amplitude and longer duration after the QRS complex, which reflects the process of ventricular repolarization. [...] [40].
To verify that the BiLSTM-Treg algorithm focuses on and learns from these medically significant feature points, in Experiment 5 we used only these 10 important feature points as the features of a single heartbeat and classified the heartbeats with the BiLSTM-Treg algorithm. The experimental results are shown in Table 8, and the overall classification accuracy is 98.45%. Compared with Experiment 4, Experiment 5 showed no significant decrease in any metric except the sensitivity of class S. The experimental results validate the importance of these medically significant feature points in the model.
Table 9 compares the classification performance of this method with that of methods from the literature. The experimental data in the other studies also come from the MIT-BIH arrhythmia database. It can be seen from Table 9 that the method proposed in this paper has the best classification accuracy, with an overall classification accuracy of 99.32%. The classification methods in literature [23,25] both use the BiLSTM model. The results show that the proposed method has obvious advantages in all metrics except for the low sensitivity of class F, and its classification accuracy is 2.03% and 0.74% higher than that of the two methods, respectively. From the perspective of heartbeat type, the sensitivity of class S is significantly improved by the method presented in this paper compared with other methods. Compared with the literature [21], the method presented in this paper improves all metrics of the Q heartbeat; in particular, the sensitivity of the Q heartbeat increased by 2.25%.
In this paper, a classification method is proposed that integrates the rhythm information between heartbeats, which doctors are concerned about, into the time-series network so that the network can learn this information effectively.
Moreover, the bidirectional time-series network model can more conveniently obtain the context information of the heartbeat segment, so the algorithm in this paper can have better classification performance in the heartbeat classification problem.

Conclusion
In this paper, an intelligent heartbeat classification method based on the BiLSTM-Treg algorithm is proposed, which integrates rhythm information between heartbeats. This method fully considers the heart rhythm information that doctors pay attention to when diagnosing heart disease and realizes the automatic classification of heartbeats. The influence of different heartbeat segment lengths on the classification results of the model is analyzed to select the best heartbeat segment length. On this basis, the BiLSTM-Treg algorithm is used for heartbeat classification. Experiments were carried out on the MIT-BIH arrhythmia database, and the results showed that the method can effectively distinguish five types of heartbeats, N, S, V, F, and Q, with an overall classification accuracy of 99.32%. The significance of this study is to provide patients with more accurate medical care services. The highlights of this study are as follows:
(1) The heartbeat segment containing rhythm information between heartbeats was selected as the feature of the heartbeat sample, and the BiLSTM-Treg algorithm was used to automatically learn the potential rhythm information of individuals
(2) A tree regularization method is proposed to optimize the BiLSTM-Treg algorithm and improve the accuracy of heartbeat classification
(3) By analyzing the key nodes of the simulated decision tree, the interpretability of the BiLSTM-Treg algorithm is analyzed
(4) The experimental results show that the algorithm proposed in this paper can effectively realize the classification of arrhythmia
In future work, we will collect more class F heartbeat data for pretraining of the model so as to obtain more accurate intelligent ECG diagnosis results.

Data Availability
(1) All datasets used to support the findings of this study are included within the paper. (2) All datasets used to support the findings of this study were supplied by the publicly available MIT-BIH database from the Massachusetts Institute of Technology. The URL to access this data is https://archive.physionet.org/cgi-bin/atm/ATM. (3) The code used to support the findings of this study has not been made available because the source code is part of a national project and is a trade secret.

Conflicts of Interest
The authors declare no conflicts of interest.