An ECG Denoising Method Based on the Generative Adversarial Residual Network

High-quality, high-fidelity removal of noise from the Electrocardiogram (ECG) signal is of great significance to the auxiliary diagnosis of heart diseases. Traditional denoising methods each target a single type of noise and preserve signal details poorly after denoising. To address this, a new ECG denoising method that combines the Generative Adversarial Network (GAN) with the Residual Network is proposed. The method is based on the GAN structure and redesigns both the generator and the discriminator. In the generator network, residual blocks and Skip-Connecting are used to deepen the network and better capture the deep information in the ECG signal. In the discriminator network, the ResNet framework is used. To optimize the noise reduction process and to account for both the local and the global correlation of the ECG signal, an overall difference function and a maximum local difference function are added to the loss function. Experimental results show that the proposed method outperforms the S-Transform (S-T) algorithm, the Wavelet Transform (WT) algorithm, the Stacked Denoising Autoencoder (S-DAE) algorithm, and the Improved Denoising Autoencoder (I-DAE) algorithm. On the Massachusetts Institute of Technology and Beth Israel Hospital (MIT-BIH) Noise Stress Test Database, the method achieves a Root Mean Square Error (RMSE) of 0.0102 and a Signal-to-Noise Ratio (SNR) of 40.8526 dB; compared with the most advanced of these methods, it improves the SNR by 88.57% on average. In addition to the three noise intensities used for the comparison experiments, further noise reduction experiments are performed under four additional noise intensities. The results indicate that the method effectively retains the important information conveyed by the original signal.


Introduction
As one of the main components of cardiovascular diseases, heart disease is extremely harmful, affecting patients' normal life, and can be fatal. Electrocardiogram (ECG) is one of the main techniques for heart disease diagnosis [1,2], which mainly reflects the electrical activity of the heart. When the heart is in good condition, it will show a regular ECG signal curve. Doctors can make a quick judgment on the heart condition by observing the shape, amplitude, and interval of continuous heartbeats of the waveform. This is the most effective and quickest method of monitoring [3,4], classification [5,6], and treatment of heart diseases. In practical application, we find that the collected ECG signal is often mixed with a lot of noise, which is not conducive to signal analysis. Therefore, the most important step in data processing is to denoise the collected signal so as to improve the usability of the signal.
Traditional noise removal methods have clear limitations in practical applications. A noise reduction method and the type of noise it removes are often in a one-to-one relationship, which cannot meet the actual needs of ECG signal noise reduction. For example, the Fourier Transform method [7] is usually used for the overall analysis of a signal. It reveals the correlation between the time-domain and frequency-domain representations of the signal, but it cannot meet the requirements of local, detailed analysis of the ECG signal. The Filtering method [7] can eliminate noise to a certain extent, but its results are sometimes unsatisfactory because the denoised ECG signal loses its medical value. Traditional ECG noise reduction methods include Spectrum Decomposition [8][9][10], Fourier Decomposition [11,12], Filtering [13][14][15], Empirical Mode Decomposition [16], and Wavelet Transform (WT) [17,18]. These traditional methods adapt poorly to different types of noise, and their ability to capture local signal features is weak. Compared with other techniques, Filtering has a more limited noise reduction capability [19]. For example, a low-pass filter only passes components below a certain cutoff frequency, so its actual noise reduction capability is limited.
Deep learning has attracted increasing attention in the field of ECG noise reduction. Representative methods are the Functional Link Neural Network (FLNN) [20], Wavelet Neural Network (WNN) [21], and Denoising Autoencoder (DAE) [22]. Among them, FLNN and WNN are the most widely used, but they can remove only one type of ECG noise. The combination of the Improved Wavelet Neural Network and WT addresses this single-function limitation and can remove the three kinds of ECG noise. However, from the experimental figures, we found that this method makes the ECG signal lose its medical value, so it is not worth further study. In addition, the Three-Hidden-Layer Feedforward Neural Network has achieved good results, but its experiments were conducted only on ECG signals containing electromagnetic noise, and the adaptability of the network to other noises was not discussed.
Our survey shows that adversarial denoising is currently applied mainly to image noise reduction [23][24][25]. The more mature methods include the Conditional Generative Adversarial Network (CGAN) [26], Wasserstein Generative Adversarial Network (WGAN) [27], and Cyclic Generative Adversarial Network (Cycle-GAN) [28]. Research institutions have adopted the adversarial idea to build image noise models. Representative examples are the works of Divakar and Babu on GAN-based image denoising [29,30]. Based on GAN, Fu et al. [31] and Zhao et al. [32] conducted in-depth research on image super-resolution, a topic that has been studied repeatedly in recent years. Through experimental analysis, the advantages and disadvantages of different networks have been summarized, and a large body of work on visual images has been produced. DAE is an Autoencoder (AE) [33] that imposes sparse constraints on its hidden units. The advantage of deep learning methods is that they learn features from the data itself without manual intervention, which is worth drawing on.
The Residual Network was proposed by He et al. [34], who used residual blocks to successfully train neural networks with 152 layers. The main idea of ResNet is to add a direct connection channel, named Skip-Connecting [35]. Skip-Connecting alleviates the vanishing gradient problem in deep networks and, at the same time, helps the backpropagation of the gradient and speeds up the training process. The entire model is trained end to end, which simplifies model training.
In terms of current research, existing noise reduction methods have achieved satisfactory results, but three shortcomings remain. First, existing denoising methods have a single function, and their adaptability to various noises is not good enough. Second, existing denoising methods tend to treat pathological information as noise in the denoising process, which may lose a lot of useful information and cause serious signal distortion. Third, existing noise reduction networks ignore the importance of the local correlation and global correlation of the ECG signal.

Computational and Mathematical Methods in Medicine
This article is organized as follows. The second part lists the contributions. The third part formulates the ECG denoising problem. The fourth part introduces the Generative Adversarial Residual Network denoising method in detail. The fifth part presents and discusses the experimental results, and the sixth part concludes the paper.

Contributions of This Article
This article summarizes the three shortcomings of current noise reduction technology and takes them as the starting point for the following research.
The contributions of this paper are as follows. First, a new method of ECG noise reduction is proposed. Building on the ability of adversarial networks to learn differences, we use adversarial training to establish a new view of ECG noise reduction. The network designed with the Residual Network can greatly improve the computing speed of GAN, and the Skip-Connecting structure in the residual block is used to alleviate gradient disappearance during GAN training.
Secondly, we have designed a new model that can remove multiple mixed noises, which overcomes the problem of the single function of traditional denoising methods. The proposed model has strong adaptability to various noises. It can remove the three common noise interferences and mixed noises in the ECG signal.
Finally, existing denoising methods tend to treat pathological information as noise during denoising. To address this shortcoming, an overall (global) difference function and a maximum local difference function are added to the loss function. The maximum local difference function captures the original local characteristics of the signal, so that the medical features of the signal being denoised are preserved, while the overall difference function captures the global features with good, stable performance.

Problem
The performance of a complete heartbeat cycle on the ECG is shown in Figure 1. The ECG signal contains six different types of waveforms: P, Q, R, S, T, and U [36]. However, noises are often included in the process of ECG signal acquisition, which will affect doctors' diagnosis.
Common ECG noises include Electrode Motion Artifacts (EM), Muscle Artifacts (MA), and Baseline Wander (BW) [37,38]. Figure 2 is a waveform diagram of a normal ECG signal (signal 213 in the MIT-BIH database as an example). Figure 3 shows the ECG signal image after adding three common noises.
Inspired by [39], the problem of one-dimensional signal noise reduction can be described as

$$X(i) = S(i) + N(i),$$

where $X(i)$ represents a noisy signal, $S(i)$ is a clean signal, and $N(i)$ represents the noise. The purpose of noise reduction is to suppress the noise $N(i)$ so that the signal component in $X(i)$ tends to $S(i)$. Since the ECG signal is a one-dimensional signal, we can derive the following equation:

$$X[i] = S[i] + N[i],$$

where $X[i] = (x_1, x_2, \cdots, x_k)$, $S[i] = (s_1, s_2, \cdots, s_k)$, and $N[i] = (n_1, n_2, \cdots, n_k)$. They represent the noisy signal, the clean signal, and the noise as one-dimensional vectors. $i$ denotes the sample number, and $k$ represents the length of a sample. The idea of ECG noise reduction in this paper is to obtain high-quality clean signals $S[i]$ from noisy signals $X[i]$.
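To make the additive model above concrete (noisy signal = clean signal + noise), the following minimal Python sketch scales a noise record so that the mixture reaches a target input SNR, such as the 0 dB, 1.25 dB, and 5 dB levels used in the experiments. The function names `mean_power` and `mix_at_snr` and the toy sine and cosine signals are illustrative assumptions, not the authors' code or real MIT-BIH data; the SNR is assumed to be the usual ratio of average powers in decibels.

```python
import math

def mean_power(signal):
    """Average power (mean of squared samples) of a 1-D signal."""
    return sum(v * v for v in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Return X = S + gain * N, with gain chosen so that X has the given SNR.

    Assumption: SNR_dB = 10 * log10(P_clean / P_scaled_noise).
    """
    gain = math.sqrt(mean_power(clean) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]

# Toy stand-ins for a 512-point ECG segment and a noise record (not real data).
clean = [math.sin(2 * math.pi * j / 64) for j in range(512)]
noise = [math.cos(2 * math.pi * j / 7) + 0.3 for j in range(512)]

# One noisy copy of the segment per input SNR level.
noisy_by_level = {level: mix_at_snr(clean, noise, level) for level in (0.0, 1.25, 5.0)}
```

A lower target SNR gives a larger gain, so the 0 dB copy carries as much noise power as signal power.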

Methodology
In this section, we introduce the method of this article in three parts. Section 4.1 describes the overall network framework used in this article. Section 4.2 introduces the loss function. Section 4.3 systematically introduces the Generative Adversarial Residual Network. Table 1 provides a summary for reference.

Overall Structure.
GAN was proposed by Goodfellow et al. [39]. It consists of a generator network and a discriminator network [37]. The noisy signal is fed into the generator, which generates samples that are as close as possible to the true clean signal. The discriminator then judges whether a sample is real or was produced by the generator. The overall structure is shown in Figure 4. As in the original GAN paper [39], the green line represents the distribution of the denoised ECG data, the black dashed line represents the . . .
Figure 4: The adversarial learning process of the GAN model. (a) An adversarial pair close to convergence, in which $P_g$ is similar to $P_{data}$; the discriminator is used to judge whether the two distributions are the same. (b) The discriminator is trained and continuously improves its ability to distinguish the clean ECG signal from the denoised ECG signal. (c) As the adversarial training progresses, the signal generated by the generator gradually approaches the original clean signal received by the discriminator. (d) After enough feedback, the generator is able to compete with the discriminator. We use the figure (Figure 2) in [39] to explain the principle of signal noise reduction in this article. $S$ is the original clean signal, and $S'$ is the denoised ECG data generated by the generator. $P_g$ denotes the distribution of the generated data, and $P_{data}$ denotes the distribution of the clean raw data.
In Figure 5, we show this article's proposed method. In this article, we add the residual block structure to the generator network, and in the discriminator network, we use the full ResNet architecture.

Loss Function.
The smaller the difference between the original clean signal $S$ and the signal $S'$ obtained after noise reduction by the generator, the more similar the denoised signal is to the original clean signal, which can be expressed as

$$S' \approx S = X - N,$$

where $X$ is the signal with noise, $S$ is the original clean signal, $N$ is the noise, and $S'$ is the signal generated by the generator. According to the characteristics of the adversarial network, the generator and the discriminator learn the distribution of ECG noise through a continuous game, and the denoising of the ECG signal is completed through adversarial training. Then, the above formula is rewritten as

$$G(X) = S' \approx S.$$

Figure 6: The overall structure of the generator network.
Figure 7: The overall structure of the discriminator network.

The original loss function of the generator is expressed as

$$\min_{G} \; \mathbb{E}_{X}\left[\log\left(1 - D(G(X))\right)\right],$$

where $G(X)$ is the output of the generator when the noisy signal $X$ is used as the input signal, and $D(G(X))$ represents the probability that the discriminator network judges $G(X)$ to come from the original clean signals rather than from the generator.
Considering that the difference between the signal generated by the generator network and the original clean signal reflects the noise reduction effect, it can be added to the loss function. $L_{dist}$ represents the overall difference between the generated signal and the original signal. The overall difference function can be expressed as

$$L_{dist} = \frac{1}{k} \sum_{j=1}^{k} \left(s_j - s'_j\right)^2,$$

where $s_j$ and $s'_j$ are the $j$-th sample points of $S$ and $S' = G(X)$, and $k$ is the length of a sample. Local features are an important factor in measuring the effectiveness of denoising. The maximum difference between the generated signal and the original clean signal is defined as the local difference. The local maximum difference function, denoted $L_{dist-max}$, is added to the loss function of the generator network:

$$L_{dist-max} = \max_{1 \le j \le k} \left| s_j - s'_j \right|.$$

The smaller $L_{dist-max}$ is, the more of the original useful information the denoised ECG signal retains. Therefore, the loss function $L_G$ of the generator is defined as

$$L_G = \mathbb{E}_{X}\left[\log\left(1 - D(G(X))\right)\right] + \alpha L_{dist} + \beta L_{dist-max},$$

where $\alpha = 0.7$ and $\beta = 0.2$. The discriminator network is used to distinguish the generated signal from the original signal. Therefore, the cross-entropy loss function is used as the loss function of the discriminator network, which is expressed as

$$L_D = -\mathbb{E}_{S}\left[\log D(S)\right] - \mathbb{E}_{X}\left[\log\left(1 - D(G(X))\right)\right],$$

where $D(S)$ represents the probability that the discriminator judges the signal $S$ to be the original clean signal. In the generator loss function, the local maximum difference function captures the local features of the signal and preserves medically useful information, while the overall difference function captures the global features with good, stable performance. Using both during training addresses the problem that ECG noise reduction considers the global correlation of the signal but lacks local correlation. The generator can thus better learn both the overall distribution of the noise in the signal and the local noise, while the denoised ECG signal is kept as close to the original clean signal as possible. Figures 6 and 7 show the network structures of the generator and the discriminator, respectively.
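The generator and discriminator losses described above can be sketched for a single sample in plain Python. This is only an illustration under stated assumptions: the overall difference is taken to be the mean squared difference, the local maximum difference is taken to be the largest absolute pointwise difference, and the adversarial terms follow the standard GAN form with log(1 - D(G(X))). The weights 0.7 and 0.2 come from the text; all function names are hypothetical and not the authors' implementation.

```python
import math

ALPHA = 0.7  # weight of the overall difference term (from the text)
BETA = 0.2   # weight of the local maximum difference term (from the text)

def overall_difference(clean, denoised):
    """Assumed form of L_dist: mean squared difference over the sample."""
    return sum((s - d) ** 2 for s, d in zip(clean, denoised)) / len(clean)

def max_local_difference(clean, denoised):
    """Assumed form of L_dist-max: largest absolute pointwise difference."""
    return max(abs(s - d) for s, d in zip(clean, denoised))

def generator_loss(clean, denoised, d_fake, eps=1e-12):
    """Adversarial term plus the two weighted difference terms.

    d_fake is D(G(X)), the discriminator's probability that the denoised
    signal is a clean one; eps guards against log(0).
    """
    adversarial = math.log(1.0 - d_fake + eps)
    return (adversarial
            + ALPHA * overall_difference(clean, denoised)
            + BETA * max_local_difference(clean, denoised))

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: clean signals labeled 1, denoised signals labeled 0."""
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))
```

Because the maximum term reacts to a single badly reconstructed point, such as a flattened R peak, it penalizes local distortion that the averaged term would dilute over the whole sample.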
In this article, noise is added to ten original clean signals at SNRs of 0 dB, 1.25 dB, and 5 dB to generate the noisy data needed for our experiment. For the entire Residual Network of the discriminator, the first convolutional layer uses a 3 * 3 kernel with a stride of 2 and a padding of 3, followed by Batch Normalization (BN), a Rectified Linear Unit (ReLU), and max pooling. These constitute the first convolution block. From the first layer to the fourth layer, each layer is configured differently, as shown in detail in Figure 6.

Generative Adversarial Residual Network.
The dashed connection indicates that the number of channels differs. The feature map changes from 3 * 3 * 64 to 3 * 3 * 128. Since the number of channels is different, the calculation method is

$$H(x) = F(x) + Wx.$$

What is particularly important here is that the input of the residual block is $x$. The first layer applies a linear transformation followed by an activation. The second layer applies a linear transformation, which yields the residual $F(x)$; before activation, $F(x)$ is added to the input $x$ of the block, and the sum is then activated to produce the output. The path that adds $x$ before the activation of the second layer's output is what we call Skip-Connecting. The underlying mapping of the expected output is $H(x)$, so the residual is $F(x) = H(x) - x$. When the network has reached a saturated accuracy, the residual $F(x)$ tends to zero, the output $H(x)$ is approximately equal to the input $x$, and the learning goal becomes learning an identity mapping. In the formula for different numbers of channels, $W$ is a 1 * 1 convolution operation, which is used to adjust the dimension of $x$.
The Residual Network forms a basic residual block by adding Skip-Connecting to the original convolutional layers, so that the learning of $H(x)$ is expressed as $H(x) = F(x) + Wx$. At the same time, stacking residual blocks alleviates the problem of vanishing or exploding gradients to a certain extent. The Skip-Connecting in the Residual Network breaks the layer-by-layer transfer characteristic of the traditional network. It provides a new direction for the problem that the error rate of the entire model rises rather than falls as more layers are stacked.
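The skip connection described above can be illustrated with a minimal pure-Python sketch. It is a simplification under stated assumptions: small dense (matrix) layers stand in for the convolutions, ReLU is used as the activation, and the optional matrix `w_proj` plays the role of the 1 * 1 convolution $W$ that adjusts the dimension of $x$. The function names are hypothetical.

```python
def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]

def matvec(m, v):
    """Matrix-vector product; m is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def residual_block(x, w1, w2, w_proj=None):
    """Compute relu(F(x) + W x) with a two-layer residual branch.

    F(x) = w2 @ relu(w1 @ x) is the residual branch. The shortcut is x
    itself, or w_proj @ x when the branch changes the dimension.
    """
    f = matvec(w2, relu(matvec(w1, x)))        # layer 1 + activation, layer 2 pre-activation
    shortcut = x if w_proj is None else matvec(w_proj, x)
    summed = [a + b for a, b in zip(f, shortcut)]  # Skip-Connecting: add before activation
    return relu(summed)

# With an all-zero residual branch the block reduces to an identity mapping
# for non-negative inputs, which is the limiting case discussed above.
zeros_2x2 = [[0.0, 0.0], [0.0, 0.0]]
identity_output = residual_block([1.0, 2.0], zeros_2x2, zeros_2x2)
```

The zero-weight case shows why an extra block should not make the model worse: the block only has to drive $F(x)$ toward zero to pass its input through unchanged.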

Results and Discussion
In this section, we introduce the experiments in four parts. Section 5.1 describes the experimental environment, experimental parameters, and experimental datasets. In Section 5.2, we conduct experiments with different types and intensities of noise. Section 5.3 compares the method of this paper with four state-of-the-art methods. In Section 5.4, we carry out a verification experiment to show that our method can denoise efficiently and accurately.

Experimental Configuration.
This experiment is conducted on a workstation. There are 8 GPU nodes on this workstation, and each node is configured with 1 Tesla V100 (32 GB) and 48 GB of memory. Each node runs a Compute Unified Device Architecture (CUDA) 10.1 environment. The code runs in the py3.5-pytorch1.5.1-gpu environment. The computer used in the experiment was a Lenovo Legion Y7000P2020H with an Intel i7-10750H 2.60 GHz CPU, an RTX2060 GPU, 16.0 GB of RAM, a 1 TB SSD, and a 4-cell battery. It has an x64 processor and runs the 64-bit home version of Windows 10.

Performance Metrics.
The evaluation indices are the Root Mean Square Error (RMSE) and the Signal-to-Noise Ratio (SNR), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(S_i - S'_i\right)^2},$$

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{n} S_i^2}{\sum_{i=1}^{n} \left(S'_i - S_i\right)^2},$$

where $S_i$ represents the original clean signal, $S'_i$ is the ECG signal after noise reduction processing, $i$ indexes the sample points, and $n$ is the number of sample points. RMSE describes the degree of difference between the two signals. The smaller the RMSE value, the smaller the difference between the two. SNR represents the ratio between the signal and the noise contained in the signal. The larger the SNR, the better the noise reduction effect. Taking into account the learning characteristics of the neural network, the ECG signal is normalized as

$$x^{*} = \frac{x - X_{min}}{X_{max} - X_{min}},$$

where $X_{max}$ and $X_{min}$ represent the maximum and minimum values of $x$, respectively.
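The two evaluation indices and the normalization step above can be sketched in a few lines of Python. The sketch assumes the usual definitions: RMSE as the root of the mean squared difference, SNR as ten times the base-10 logarithm of clean-signal energy over residual energy, and min-max scaling to the range [0, 1]. The function names are illustrative assumptions.

```python
import math

def rmse(clean, denoised):
    """Root mean square error between the clean and denoised signals."""
    n = len(clean)
    return math.sqrt(sum((s - d) ** 2 for s, d in zip(clean, denoised)) / n)

def snr_db(clean, denoised):
    """Output SNR in dB: clean-signal energy over residual energy.

    Undefined (division by zero) if the denoised signal equals the clean one.
    """
    signal_energy = sum(s * s for s in clean)
    residual_energy = sum((d - s) ** 2 for s, d in zip(clean, denoised))
    return 10.0 * math.log10(signal_energy / residual_energy)

def min_max_normalize(x):
    """Scale a signal to [0, 1]; assumes the signal is not constant."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]
```

For example, a denoised signal that is off by 0.1 at every point of `[1, 2, 3, 4]` has an RMSE of about 0.1 and an SNR of about 28.75 dB under these definitions.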
According to the periodicity of the ECG signal, the recordings can be reasonably divided into segments. We select 10 groups of original ECG signal data; each group has 2 * 650000 data points. In this article, we use the upper signal of each record for analysis and verification. To ensure that each sample contains at least one ECG cycle (one ECG cycle has 310 sample points), the data is divided into segments of 512 sample points. EM, MA, BW, EM+MA, EM+BW, BW+MA, and BW+EM+MA noise is added to the segmented data at noise intensities of 0 dB, 1.25 dB, and 5 dB to form the training and test data required for the experiment. A total of 26649 samples are used, split as (training set + validation set) : test set = 9 : 1, with training set : validation set = 9 : 1.
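The segmentation and the nested 9 : 1 split described above can be sketched as follows. The sketch assumes non-overlapping 512-point windows with the trailing partial window discarded, and a simple ordered split without shuffling; the text does not specify either detail, and the function names are hypothetical.

```python
def segment(record, window=512):
    """Cut a record into non-overlapping windows; drop the trailing remainder."""
    return [record[i:i + window] for i in range(0, len(record) - window + 1, window)]

def split_9_to_1(items):
    """Split a list into a 90% part and a 10% part (integer rounding down)."""
    cut = (len(items) * 9) // 10
    return items[:cut], items[cut:]

# One 650000-point channel, filled with zeros as a placeholder for real data.
segments = segment([0.0] * 650000)

# Nested split on sample indices: (training + validation) : test = 9 : 1,
# then training : validation = 9 : 1.
sample_ids = list(range(26649))
train_val_ids, test_ids = split_9_to_1(sample_ids)
train_ids, val_ids = split_9_to_1(train_val_ids)
```

Under these assumptions one 650000-point channel yields 1269 segments, and 1269 segments times 7 noise types times 3 intensities equals 26649, which coincides with the total reported above, although the text does not state that the total was obtained this way.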

Training Parameters.
After each convolutional layer in this article, BN is applied before the activation function. The batch size is 64, and the initial learning rate is 0.1. When the training error stops decreasing, the learning rate is reduced to one-tenth of its current value and training continues. Training runs for 200 epochs. When designing the building block for the residual function, we use a three-layer stack of 1 * 1, 3 * 3, and 1 * 1 convolutions. The 1 * 1 convolutions first reduce and then restore the dimensionality; that is, they handle the mismatch between dimensions. The 3 * 3 convolution thus acts as a bottleneck with smaller input and output dimensions. In the experiment, the number of training samples of each type of noise is 23984; the total number of test samples is 2664.
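The learning-rate rule described above (start at 0.1 and divide by ten whenever the training error stops shrinking) can be sketched as a small helper. This is an assumed reading of the rule: an epoch counts as "not shrinking" when its error is not strictly below the best error seen so far, and no patience window is used. The function name `plateau_schedule` is hypothetical.

```python
def plateau_schedule(errors, initial_lr=0.1, factor=0.1):
    """Return the learning rate to use after each epoch.

    errors: training error recorded at the end of each epoch.
    The rate is multiplied by `factor` whenever an epoch fails to improve
    on the best error seen so far (an assumed reading of the rule).
    """
    lr = initial_lr
    best = float("inf")
    rates = []
    for err in errors:
        if err < best:
            best = err       # still improving: keep the current rate
        else:
            lr *= factor     # plateau: reduce to one-tenth
        rates.append(lr)
    return rates

# Toy error curve: the third and fifth epochs fail to improve.
rates = plateau_schedule([1.0, 0.8, 0.8, 0.7, 0.75])
```

In practice a patience of several epochs is often used before reducing the rate, so that a single noisy epoch does not trigger a reduction; the text does not say whether the authors did so.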

Figure 9 shows the results of ECG denoising with our Generative Adversarial Residual Network model for EM noise, BW noise, and MA noise. In each subfigure, the first row is the noisy ECG signal, the second row is the original clean ECG signal, and the third row is the denoised signal. For a more intuitive comparison, the signals are superimposed in the fourth row. It can be seen from the figure that the denoised ECG signal coincides well with the original ECG signal. These results show that our method is very effective at removing single noise.
Figure 10 depicts the results of ECG noise reduction for mixed noise. The noisy ECG signals are divided into seven types in this article according to the noise they contain. The top row is the noisy signal, the next is the original clean ECG signal, the third is the denoised ECG signal, and the last row is the superimposed comparison. As shown in Figure 10, the signal after noise reduction coincides closely with the original signal. The experimental results show that our method is effective at removing mixed noise.
Table 2 shows the average SNR and RMSE of the denoising results for different noise types. Seven kinds of noise are added to ten groups of ECG signals at 0 dB, 1.25 dB, and 5 dB. It can be seen from Table 2 that our model achieves impressive results across different noise types and noise intensities. For a signal containing only one type of noise, our method can achieve an SNR of up to 66.9607 dB. For signals with mixed noises, it can achieve an SNR of 62.5019 dB. When 0 dB noise is added, the highest SNR after noise reduction reaches 36.4455 dB. When 1.25 dB noise is added, the highest SNR after noise reduction reaches 33.7478 dB. When 5 dB noise is added, the highest SNR after noise reduction reaches 66.9607 dB.

Comparison with Existing Methods.
The S-T algorithm [40], WT algorithm [41], S-DAE algorithm, and I-DAE algorithm [22] are compared with our proposed method (P-M). Tables 3-5 show the SNR and RMSE results for the different noise types (i.e., MA noise, EM noise, and BW noise) on the 10 records mentioned in Section 5.1. Our proposed method achieves the highest average SNR and the lowest average RMSE in all cases. The average SNR for MA and BW noise is above 30 dB; for EM noise, the average SNR is more than 28 dB.
In Figures 11-13, the input signal contains MA noise, EM noise, and BW noise, and the input SNR is 1.25 dB. From Figures 9, 10, and 14, we can see that our proposed method achieves the highest SNR in all cases. Specifically, most of the average SNR values reach 28 dB.
To verify the network model, the following two sets of verification experiments are carried out. The experimental results are shown in Figures 15 and 16. Figure 15 shows the noise reduction effect for different ECG signals with the same noise type under different noise intensities. Figure 16 shows the noise reduction effect for the same ECG signal with different types and different intensities of noise. As shown in Figures 15 and 16, our network model performs well in removing both single noises and mixed noises, across various noise types and multiple noise intensities. Our model overcomes the single-function limitation of traditional noise reduction methods, and the method proposed in this article can remove multiple noises.

Model Noise Reduction Effect Verification.
In addition to the conventional noise levels of 0 dB, 1.25 dB, and 5 dB, this article also uses data at 1 dB, 2 dB, 3 dB, and 4 dB for verification, and the experimental results are all good. The experimental data is shown in Table 6. It can be seen that the model in this article has a good noise reduction effect on noise with different SNRs and is applicable to data with different noise types.
Previous work used Decomposition, Transformation, and Filtering methods to denoise the ECG signal. However, these studies have three shortcomings. First, the existing noise reduction methods do not take into account the local and global correlation of ECG signals. Second, their adaptability to various noises is not good enough. Third, they may cause severe signal distortion during denoising. We designed our method to address these three shortcomings. This paper establishes a new point of view with adversarial methods; that is, adversarial methods have the ability to learn the difference between input and output. This view makes it possible to denoise ECG signals with adversarial methods. We added $L_{dist}$ and $L_{dist-max}$ to the loss function of the generator network to improve both the local and the overall fidelity of the denoised signal. We propose an adversarial ECG signal denoising method, and our method has an excellent denoising effect for all kinds of noisy signals. A large number of experiments verify the superiority of the model's noise reduction effect. As shown in Table 7, our method improves the SNR by 88.57% on average.

Conclusion
This article describes a new method for ECG signal denoising. High-quality noise reduction of the ECG signal is realized, and the characteristic distribution of the noise is learned through the adversarial network. First, this article improves the original adversarial network structure: it adds the residual block structure to the generator network and uses the ResNet structure in the discriminator network. These improvements accelerate the training process, improve the stability of network optimization, and give stronger generalization capabilities than general networks.
Second, for feature layer normalization, this article applies Batch Normalization. It makes it convenient to adjust the data distribution in the hidden layers, making the network easier to train.
Third, the residual block and Skip-Connecting structure are added to the network model in this paper. The improved network structure effectively suppresses the tendency of gradients to explode or vanish during the training of the adversarial network itself, thereby improving the stability of network optimization.
Fourth, we have improved the traditional loss function. Through the network model noise reduction effect verification experiment, it is known that the loss function used in this paper has achieved excellent results under different types of noises with different intensities.
But for now, the theory of learning differences based on adversarial methods has not been clearly established. The classification operation can be added to the output part after noise reduction to further verify the noise reduction effect of the network model, which will be the focus of our future work.
Certain achievements have been made in noise reduction of ECG signals, yet further research is still needed, which will focus on the following aspects:
(1) A good signal noise reduction method performs better in ECG signal classification. Inspired by [42], we will combine the denoising and classification of ECG signals in further research. Noise reduction can improve the classification accuracy, and the classification result can be used to verify the performance of the noise reduction method.
(2) The research results in ECG noise reduction can provide support for the development of other industries. For example, progress in noise reduction can improve the accuracy of ECG measurement devices. In the next step, the proposed noise reduction method will be applied to the ECG signal acquisition process of the twelve-lead wearable ECG monitor that is currently under research and development. In this way, the practicality and feasibility of the research results can be further verified. Because the 12-lead wearable ECG monitoring suit developed by our team exchanges the collected ECG signals through a cloud platform, the blockchain-based medical data protection proposed in [43] is also one of our future research directions.
(3) The data noise reduction method proposed in this paper is not limited to the field of ECG signals. The proposed method can also be applied to noise reduction of other one-dimensional signals, such as the processing of noisy ocean data [44]. Data denoising methods are also applicable to big data preprocessing. For example, as stated in [45], carrying out big data processing according to different service requirements can provide more accurate and valuable data. Such data processing ideas are consistent with the ideas of this article.

Data Availability
All the data utilized in our research can be accessed from http://ecg.mit.edu/dbinfo.html.

Conflicts of Interest
There is no conflict of interest regarding the publication of this paper.