KecNet: A Light Neural Network for Arrhythmia Classification Based on Knowledge Reinforcement

Acquiring electrocardiographic (ECG) signals and performing arrhythmia classification in mobile device scenarios have the advantages of short response time, almost no network bandwidth consumption, and human resource savings. In recent years, deep neural networks have become a popular method to efficiently and accurately simulate nonlinear patterns of ECG data in a data-driven manner but require more resources. Therefore, it is crucial to design deep learning (DL) algorithms that are more suitable for resource-constrained mobile devices. In this paper, KecNet, a lightweight neural network construction scheme based on domain knowledge, is proposed to model ECG data by effectively leveraging signal analysis and medical knowledge. To evaluate the performance of KecNet, we use the Association for the Advancement of Medical Instrumentation (AAMI) protocol and the MIT-BIH arrhythmia database to classify five arrhythmia categories. The result shows that the ACC, SEN, and PRE achieve 99.31%, 99.45%, and 98.78%, respectively. In addition, it also possesses high robustness to noisy environments, low memory usage, and physical interpretability advantages. Benefiting from these advantages, KecNet can be applied in practice, especially wearable and lightweight mobile devices for arrhythmia classification.


Introduction
Arrhythmias are the most common cardiovascular disease and the leading cause of stroke and sudden cardiac death. Electrocardiogram (ECG) is a common tool for detecting arrhythmias because of its noninvasive and easy-to-perform nature. Because of the randomness of arrhythmia onset, it is necessary for patients to be monitored for a long period, causing the difficulty of processing the resulting large number of ECG data. A computer-aided arrhythmia diagnosis system running on mobile devices can reduce labor and responding time, thereby improving the efficiency of daily arrhythmia classification [1,2].
Most of the existing algorithms for automatic ECG recognition of arrhythmia are based on the assessment of morphological features of single or fewer heartbeats. Due to individual differences, the methods based on short-term features are prone to errors. In addition, numerous clinical studies have demonstrated the importance of long-term rhythm features for detecting arrhythmia associated with many diseases such as tachycardia, atrial fibrillation, and premature beats. is motivated us to study a new solution to arrhythmia classification based on a long period of continuous ECG signals.
In recent years, ECG classification algorithms based on deep neural networks (DNNs) have demonstrated excellent performance. DNNs use raw ECG data as input to build an end-to-end model for feature learning and classification to achieve comparable accuracy for both noise-free data and noisy data. Among them, convolutional neural networks (CNNs) [3][4][5] and recurrent neural networks (RNNs) [6][7][8] are the two most frequently used classes of neural nets.
One of the major problems with DNNs is that the performance of the algorithm is highly dependent on the network's scale. Most of the current mobile devices are still limited in both computation power and memory capacity and thus unfit for existing DNN approaches. erefore, it is crucial to building lightweight networks that can operate in resource-constrained environments. Currently, there exist two main approaches to build lightweight networks. (1) e first approach uses network pruning techniques [9] or knowledge distillation [10] to achieve model compression and inference acceleration by removing redundant structures and parameters. Because the accuracy-focused models contain strategies that help overcome various problems encountered during training, such as overfitting, it is difficult to scale such a model down sufficiently without sacrificing accuracy. (2) e second approach designs models specifically for resource-constrained environments. Such efficiency-focused models include MobileNet [11], ShuffleNet [12], and LiteNet [13]. is approach sacrifices model performance to achieve a more efficient network structure [14].
On the other hand, DNNs fail to take advantage of useful features established by cardiologists in the past decades; therefore, we need an immense amount of high-quality labeled data to learn the potential causal relations between ECG data and diseases. Such data are often difficult to obtain and expensive. erefore, only limited data sets can be used for model training. Because of the inevitable contamination by various noises in practical application, models built on limited data sets may be at risk of low robustness. erefore, it is still a challenge to design a lightweight and robust DL algorithm for resource-constrained mobile devices.
Recent studies have shown that neural networks approximate the mapping between input and output data. It is believed that the mapping can be simplified by using domain knowledge or explicit features of the raw data, and the network presentation capability and scale can be optimized accordingly [15]. In ECG signals, the information reflecting diseases corresponds to a certain frequency range. Based on these findings, a lightweight DL architecture named KecNet is implemented to recognize 10-second ECG signal fragments in this study. 10 seconds is the typical duration of the central rhythm strip acquisition on a routine 12-lead ECG [16]. e developed scheme includes the following features. Firstly, a CNN network containing a customized filter bank is designed based on digital signal processing to precisely fit each application. By imposing constraints on filter shapes, it effectively extracts components in a specific frequency range from complex signals. Because the filter design is incorporated with the knowledge of digital signal processing, the proposed approach performs more physical interpretability than the conventional CNN [17]. It is also much more robust against noisy environments and less consuming in terms of implementation cost. Secondly, the long-duration correlation features of the ECG fragments are represented symbolically based on the clinical knowledge, which is used as an additional parameter to further constrain the decision-making process and improve the performance of the model. We evaluate the performance of the proposed KecNet on the MIT-BIH arrhythmia database. According to the Association for the Advancement of Medical Instrumentation (AAMI) protocol [18], heartbeat types in the MIT-BIH database [19] can be sorted into 5 main classes: normal beat (N), supraventricular ectopic beat (S), ventricular ectopic beat (V), fusion beat (F), and unknown (Q). Experimental results show that the proposed method has better performance and robustness for the arrhythmia classification task.

Problem Definition
Most deep learning-based approaches to arrhythmias adopt the form of supervised learning to learn mapping functions. Mathematically, we denote an ECG data set containing N samples as X � (x (i) , y (i) )|i ∈ N , where x (i) is the i th sample and y (i) is the label. e procedure for the neural network learning about data can be viewed as a parameter optimization problem, as formulated in where f( * , θ f ) is the function that we will design to simulate the mapping between data and labels and θ f is the parameters associated with the mapping f( * ). L( * ) is a loss function describing the loss of assigning a prediction category for a sample x (i) with label y (i) . Due to the complexity of arrhythmia pathology, the conventional approach for improving the performance of classification models is to increase the number of layers of the network and enhance the representation capability of the model by adding nonlinear operations. However, this poses three problems. Firstly, the increase in the number of layers causes the increase of parameters in the network, which intensifies the storage difficulty and computational complexity of the model. Secondly, the increase in model depth may cause the risk of vanishing gradient [20], resulting in the inability to effectively update the parameters of the shallow convolution kernel. irdly, more training data is needed to prevent overfitting of the model [21].
Based on the above analysis, our goal is to achieve better classification performance using shallow networks without increasing the model complexity and training data. To achieve this goal, we try to incorporate relevant domain knowledge into the design process of convolutional neural networks, mainly including using the amplitude-frequency characteristics of bandpass filters to filter the noise in ECG signals to improve the model's ability to grasp valid information, and extracting dominant features of ECG data as additional parameters to prevent the loss of temporal information due to the pooling mechanism of CNN.

Methods
e workflow of the proposed method is shown in Figure 1. First, the data is segmented and normalized. e segmented data are fed into the KecNet model, which contains a CNN structure with a customized convolutional layer and a symbolic parameter extraction structure in the feature extraction part of the model. Finally, the fused feature vectors are fed into the softmax classifier for classification.

Sinc-Convolution Layer.
e ECG signal is a mixture of electrical activity from various parts of the myocardium. Depending on the quality of the data, it may also include multiple types of noise, such as baseline drift, motion artifacts, and electromyographic interference [22]. e filter information learned by traditional CNNs usually contains a mixture of noise and multiband modes, which reduces the representation capability and readability of the ECG signal [17].
In order to overcome this defect, it is very important to optimize the first convolution layer of CNN because this layer directly processes the original ECG samples containing rich underlying information and helps the subsequent convolution layer to perform complex nonlinear representation of data. is study introduces the Sinc-convolution layer, a structure based on the interpretable CNN developed by Ravanelli and Bengio for speech recognition [17]. e core idea of this structure is based on parameterized cardinal-sine (Sinc) functions for bandpass filter design. While the conventional CNN learns all the parameters of the filters, the Sinc-convolution defines a tunable filter bank g with clear physical explanation in advance to replace the filter in traditional CNN, as formulated in where θ g is the parameter to be learned. For ECG signal analysis, the bandpass filter is an effective choice for designing tunable filter banks [23], because the time domain signal is divided into different subspaces by frequency band transformation so that the filter can be activated by the information of specific frequency band to achieve more effective and reliable filtering.
In the frequency domain, the amplitude of a universal bandpass filter can be written as the difference between two rectangular filters. After returning to the time domain using the Fourier transform [23], the function g becomes as where f L and f H are the learned low and high cut-off frequencies, and sin c(x) � sin(x)/x.
To mitigate the spectral leakage, a popular solution is windowing by multiplying the truncated function g with a window function: In this work, we use the Hamming [15] window as follows: In fact, as shown in Figure 2, similar results are obtained when training arrhythmia classification models, regardless of whether the window function type is Hamming, Hanning [16], or Kaiser [22]. In addition, since filter g is symmetric, the computational efficiency can be improved with one side of the filter inheriting the results from the other side. e experimental results in Section 5.3 show that the Sinc-convolution layer is more selective in frequency response compared to the CNN. Because the filter effectively extracts components from complex signals over a specific frequency range, it improves the model's robustness and readability [24]. After the Sinc filter self-adaptively classifies the frequency band of the raw ECG data, the standard CNN structure is used to extract the time domain features.

Symbolic Representation of Rhythmic Features.
When analyzing the discrete-time series of the data, converting sequences into symbols with practical meaning is a common method to simplify the analysis process [25]. In fact, some quantitative features based on clinical knowledge (e.g., heart rate variability [26], RR interval [27], etc.) are considered to be more relevant to the underlying pathological mechanisms. Among them, the coefficient of variation (CV) describes the degree of dispersion of the RR interval and is commonly used to measure the regularity of the RR interval [26]. Since the R-peak is the most obvious waveform in the ECG, features based on the RR interval have stronger noise immunity and are one of the most important features for analyzing the rhythm variation [28].  Journal of Healthcare Engineering e model is optimized based on the above findings. In addition to the CNN structure designed to extract spatial morphological features, the CVs were added to the network as symbolic representations of rhythm features. ECG fragments with anomalous rhythms are identified more easily. e computational procedure is as follows/ Step 1. Detect the R-peak of ECG fragment x (i) based on the Pan-Tompkins algorithm [29] to obtain the sequence of R-peak positions Step 2. Calculate the RR interval series Step 3. Calculate the mean value t � n i�1 t i /n and standard deviation σ � ����������� n i�1 (t i − t)/n of the sequence T (i) .

e Architecture of the Network.
e proposed KecNet is based on a 1D convolutional neural network, including one Sinc-convolution layer, two standard convolution layers, three pooling layers, two batch standardization layers, three dropout layers, one global average pooling layer, and three dense layers. Table 1 summarizes the basic structure of KecNet. e key characteristics of different layers in basic KecNet are detailed as follows: (i) Sinc-convolution layer: e length of the filter significantly affects the classification accuracy. With the increase of filter length, the accuracy is improved. rough experiments, the length of convolution kernel L � 251 is selected. It should be noted that since a Sinc filter has only two parameters, no matter how long the filter length is selected, the parameters will not increase (ii) Standard convolution layer: Two 1D convolution layers are added after the Sinc-convolution layer and convolve with a filter size of 5. All convolution layers (including Sinc-convolution layer) adopt ReLU activation function [30] (iii) Max-pooling layer: KecNet performs a maxpooling operation after the Sinc-convolution layer and two standard convolution layers. e max-pooling operation reduces the computation cost between the convolution layers while achieving translation invariance of the neural network (iv) Dropout [31] layer: e dropout layer reduces the complex coadaptation relationships between neurons by randomly dropping a fraction of the network nodes and overcomes the overfitting problem at the same time. We build the dropout layer after each of the two groups of pooling layers, with a ratio of 0.2. We also add a dropout layer after the first full connection layer, with a ratio of 0.3 (v) Batch Normalization (BN) [32] layer: e BN technique ensures the validity of the gradient by adjusting the distribution of the output data, smoothing the loss plane, and speeding up the convergence of the network. We apply the BN layer after the two dropout layers (vi) Global Average Pooling (GAP) [33] layer: GAP averages the values of all elements of the feature map to reduce the number of parameters. We use the GAP layer before the dense layer. Also, the rhythm feature notation described in Section 3.2 is added to the GAP layer to maximize the impact of this factor on the network (vii) Dense layer: Two fully connected layers are used in the basic KecNet. e first layer consists of 16 units. e second layer consists of 8 units. After dense layers, the softmax function is used as a classifier to predict five classes

Materials and Preprocessing.
We used the MIT-BIH arrhythmia database to evaluate the performance of the proposed method. e database contains 48 dynamic ECG recordings, each with 30-minute long, 360 Hz sampling rate, 11-bit resolution, and ±10 mv dynamic range. Each recording contains two lead configurations (usually MLII and V1). Lead II is commonly used for wearable single-lead ECG sensors. According to the AAMI protocol, we merged the original 18 categories of heartbeat types in the MIT-BIH database into 5 major categories. e heartbeat classes' mapping between the AAMI protocol and the MIT-BIH database is shown in Table 2.
After data merging, a sliding window with length M � 3600 is set to segment the data. Because the data sampling rate is 360 Hz, it is equivalent to using about 10 s of data as input. To overcome the data imbalance problem, the data were synthesized by translating the start point for small-size data. After the enhancement, each category has 55000 samples. Because each signal in the MIT-BIH dataset is labeled with a disease class accurate to the second, the class with the largest percentage is used as the label for that ECG segment. Finally, the ECG segments are normalized by Z-score to solve the problems of amplitude scaling and eliminate the offset effect. e data set is divided into two mutually exclusive sets: training set (80%) and testing set (20%), and 12.5% of the data in the training set is used as the validation set. To ensure the consistency of the data distribution, each class is randomly sampled separately according to the proportion of the data set. e data set division is shown in Figure 3.

Evaluation Criteria.
e following indicators are used to evaluate the proposed method:  Figure 4 shows the accuracy and loss of the training and validation set of KecNet at each training phase. ese curves indicate that the accuracy and loss of the model are stable, and the network basically converges after 60 epochs of training. Considering the model's validity and training efficiency, the epoch for the training model was set to 60. e hyperparameters are set by optimizing the model through trial and error, as shown in Table 3. Table 4 shows the performance of the proposed method on the data set. A standard CNN architecture is used as the baseline model, and the length of the first layer convolution kernel is set to 32. It can be seen that the performance of Sinc convolution is better than that of standard convolution. Moreover, the parameters are reduced by about 80% compared with the same structure of CNN. Table 5 shows the effect of the coefficients of rhythm variation on the model performance. e performance of the model is improved by 1-1.5%. However, the increase in time is negligible.

Implementation PC Reduction.
For wearables and mobile devices, improving the accuracy of deep learning algorithms is not the only problem. Most devices have difficulties in deploying complex, high-performance models due to limited computing and storage capacity. erefore, it is equally crucial to reduce the memory footprint of the model. PC represents the spatial complexity of the model, which is an important indicator of the model size and memory usage.
Having a low PC with guaranteed classification accuracy is an advantage of KecNet, because the Sinc-convolution structure greatly reduces the number of parameters in the first convolution layer. Since the low and high cut-off frequencies  Conv1D  32  251  1  Max-pooling  -2  2  Dropout  ---BN  ---Conv module   Conv1D  16  5  1  16  5  1  Max-pooling  -2  2  Dropout  ---BN - Journal of Healthcare Engineering 5 are the only parameters of the filter learned from ECG data, the number of parameters is only dependent on the number of filters. For example, for a layer with F filters of length L, a standard CNN contains F × L parameters, while a Sincconvolution layer only requires F × 2. e parameter gain of the Sinc-convolution layer compared to CNN is shown in where PC is the number of parameters. PC of the CNN increases with the increase of L, but the Sinc-convolution layer always remains the same. Accordingly, the longer the L is, the more gain the KecNet has in reducing PC in comparison with the standard CNN. For instance, the PC of KecNet is reduced by 80% compared to the standard CNN in Table 4.
In addition, we compare the classification performance of KecNet with the classical CNNs: GoogleNet [34], MobileNet, and SqueezeNet. e hyperparameters are set as in Table 3. Notably, MobileNet and SqueezeNet are also lightweight models and have been successfully deployed on resource-constrained mobile devices, such as cell phones, robots, and self-driving cars [35]. e experimental results are shown in Table 6. In terms of classification effect, KecNet outperforms SqueezeNet and MobileNets and is slightly lower than GoogleNet. However, in terms of PC, KecNet reduces the amount of PC by about 50% compared to SqueezeNet and MobileNet and by about 80% compared to GoogleNet.

Filter Analysis.
For the feature mapping obtained from the first convolutional layer, KecNet has some advantages over the standard CNN in terms of interpretability and readability. is is mainly because the Sinc-convolutional structure used in the model is functionally equivalent to a bandpass filter, which learns parameters with a clear physical meaning, i.e., the high and low cut-off frequencies of the ECG signal [24]. Figure 5 shows examples of filters learned by KecNet (Figure 5(a)) and standard CNN ( Figure 5(b)) using the MIT-BIH arrhythmia dataset. From the figure, it can be seen that the filtering result of standard CNN is noisy and not very readable. In contrast, the processing result of KecNet for ECG signal is significantly better than CNN and more regular.
In addition to comparing the effects, it is necessary to analyze which bands are covered by the filters. Figure 6 shows the cumulative frequency response of the filters learned by KecNet and CNN on the arrhythmia classification task.
e cumulative frequency response is obtained by normalizing the accumulation of all the filters in the convolution layer. By analyzing the cumulative frequency response, the importance of frequency-specific information for the arrhythmia classification task can be found, since the frequency bands with small normalized cumulative values are less important. It can be seen that the KecNet plot has two distinct peaks. e first one mainly concentrated in the 0 Hz-15 Hz range, which corresponds to the frequency domain range of the P and T waves in the ECG fragment. e second peak corresponds to the QRS wave in the ECG segment, with a frequency range of 20 Hz to 50 Hz. is result shows that KecNet has successfully adapted its characteristics to classify ECG signals. In contrast, the standard CNN does not exhibit a similarly meaningful pattern: its frequency response curve does not clearly represent the frequency band peaks of the corresponding bands.
at is, the filters learned by KecNet are, on average, more selective than CNNs and thus can better capture the valid information in ECG signals. Table 7 shows several examples of frequency bandwidths extracted with KecNet. It is worth mentioning that the ability to acquire the principal frequency component of the ECG signal enhances the robustness of the network.

Generalization and Robustness in Noisy Environment.
Generally speaking, the data sets used to prove the effectiveness of the proposed method are usually collected under the same conditions. However, there are usually various interferences in real applications. In this case, the effectiveness of the automatic classification method will be  greatly reduced since it is difficult to collect labeled data for the training purpose in all environments. In order to overcome this shortcoming, a robust method is needed. We discuss the robustness of KecNet in various environments by adding white noise to the data. All the models are trained with the original data sets. en, they are tested with noisy signals. Table 8 shows the variation in the accuracy of the models with the signal-to-noise ratio (SNR) from 0 to 60 dB. It can be seen that the accuracy of all models is higher at higher SNR. However, as the SNR decreases, the accuracy (%) of the standard CNN model decreases significantly. e accuracy is below 80% when SNR is 20 dB or less. In contrast, the accuracy of KecNet is still higher than 98% and is stable. It is demonstrated that the proposed method is more robust than traditional CNN. e main reason for this result is due to the fact that the proposed method has a clear filter with a well-defined spectral shape at the first convolution layer, whereas the filter learned by the traditional CNN model is highly correlated with the training data and susceptible to noise interference.

Conclusion
In this work, we propose a lightweight end-to-end solution for resource-constrained mobile devices that leverages domain knowledge to optimize neural networks for arrhythmia classification. Firstly, we introduce a physically interpretable Sinc-convolutional layer to learn customized filters with clear bandwidths as features to improve the feature extraction ability of CNNs and reduce the number of parameters. Secondly, the rhythm variation coefficient is added to the network as a symbolic representation of time series to compensate for the difficulty of grasping long-duration correlation features in shallow CNNs and to improve network performance and clinical usability. We trained and tested on the MIT-BIH arrhythmia data set using raw ECG data. e ACC, SEN, and PRE reached as high as 99.31%, 99.45%, and 98.78%, respectively. In addition, neural network size is reduced, and robustness to noise was increased.
In the future, we will collect and annotate ECG recordings from real patients and study the classification of more different types of diseases. In addition, multilead ECG recordings will be used for training models. On the clinical    side, we will develop an ECG system that can be deployed on wearable medical devices and low-cost ECG devices, test, and improve its performance.

Data Availability
All data sets used to support the findings of this study are included within the article. All data sets used to support the findings of this study were supplied by the publicly available MIT-BIH database from the Massachusetts Institute of Technology. e URL to access the data is https://www. physionet.org/content/mitdb/1.0.0/

Conflicts of Interest
e authors declare that they have no conflicts of interest.