A Hierarchical Method for Removal of Baseline Drift from Biomedical Signals: Application in ECG Analysis

Noise can compromise the extraction of some fundamental and important features from biomedical signals and hence prohibit accurate analysis of these signals. Baseline wander in electrocardiogram (ECG) signals is one such example, which can be caused by factors such as respiration, variations in electrode impedance, and excessive body movements. Unless baseline wander is effectively removed, the accuracy of any feature extracted from the ECG, such as timing and duration of the ST-segment, is compromised. This paper approaches this filtering task from a novel standpoint by assuming that the ECG baseline wander comes from an independent and unknown source. The technique utilizes a hierarchical method including a blind source separation (BSS) step, in particular independent component analysis, to eliminate the effect of the baseline wander. We examine the specifics of the components causing the baseline wander and the factors that affect the separation process. Experimental results reveal the superiority of the proposed algorithm in removing the baseline wander.


Introduction
The electrocardiogram (ECG) is an important physiological signal that helps determine the state of the cardiovascular system; however, this signal is often corrupted by interfering noise. Baseline wander is a commonly seen noise in ECG recordings and can be caused by respiration, changes in electrode impedance, and motion. Baseline wander can mask important information from the ECG, and if it is not properly removed, crucial diagnostic information contained in the ECG will be lost or corrupted. Therefore, it is vital to effectively eliminate baseline wander before any further processing of ECG such as feature extraction.
The simplest method of baseline wander (drift) removal is the use of a high-pass filter that blocks the drift and passes all main components of the ECG through the filter. The main components of the ECG include the P-wave, QRS-complex, and T-wave. Specifically, the PR-segment, ST-segment, PR-interval, and QT-interval are considered the main segments of the ECG. Each of these intervals/segments has its corresponding frequency components, and according to the American Heart Association (AHA), the lowest frequency component in the ECG signal is at about 0.05 Hz [1]. However, a complete baseline removal requires that the cut-off frequency of the high-pass filter be set higher than the lowest frequency in the ECG; otherwise some of the baseline drift will pass through the filter. The cut-off frequency of the baseline wander high-pass filter is usually set slightly below 0.5 Hz. Therefore, knowing that the actual ECG signal has components between 0.05 Hz and 0.5 Hz, the aforementioned simple approach for baseline removal distorts and deforms the ECG signal. In particular, it affects the ST-segment, which has very low frequency components. Furthermore, ectopic beats occurring in the ECG during the course of different types of diseases and injuries change the frequency spectrum of both the baseline wander and the ECG waveforms. All the above-mentioned characteristics demand a more comprehensive approach that works for a wider range of applications and avoids distorting the main ECG waves when removing the baseline drift.
Digital filters are a commonly employed method to eliminate baseline wander. Cut-off frequency and phase response characteristics are the two main factors considered in the majority of these designs. The use of linear phase filters prevents the issue of phase distortion [2]. For finite impulse response (FIR) filters, it is rather straightforward to achieve a linear phase response directly. Feed-forward and feed-back structures such as infinite impulse response (IIR) filters can also be designed to keep phase distortion to a minimum [3]. In all of these methods, the cut-off frequency should be chosen so that the information in the ECG signals remains undistorted while the baseline wander is removed, which results in a trade-off. Usually, the cut-off frequency is set according to the slowest detected (or assumed) heart rate. However, if there are ectopic beats in the ECG signal, it is even more difficult to find this particular frequency. The overlap between the baseline wander and the low frequency components of the ECG is prevalent, and it compromises the accuracy of the extracted features.
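The cut-off trade-off described above can be illustrated with a small sketch. It is not the filter of any cited reference: it assumes one of the simplest linear-phase FIR high-pass filters, obtained by subtracting a centred moving average from the signal. The function name `moving_average_highpass`, the 1 s window, and the synthetic test signal are all illustrative assumptions.

```python
import numpy as np

def moving_average_highpass(x, window):
    """Subtract a centred moving average (a symmetric, linear-phase FIR low-pass)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that the kernel is centred")
    kernel = np.ones(window) / window
    baseline = np.convolve(x, kernel, mode="same")  # estimated drift
    return x - baseline

# Synthetic example (assumed values): slow drift at 0.1 Hz plus a 10 Hz component.
fs = 500                                     # sampling rate in Hz
t = np.arange(0, 20, 1 / fs)
drift = 2.0 * np.sin(2 * np.pi * 0.1 * t)    # stands in for baseline wander
fast = np.sin(2 * np.pi * 10 * t)            # stands in for ECG content
filtered = moving_average_highpass(drift + fast, window=501)
```

A longer window lowers the effective cut-off and preserves more low-frequency ECG content, but lets more of the drift through. That is the trade-off the paragraph above refers to.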
Time-variant filters are designed to increase flexibility in the adjustment and control of the cut-off frequency. In such methods, the cut-off frequency of the filter is controlled by the low frequency characteristics of the ECG signal [4]. Cubic spline curve fitting [5], linear spline curve fitting [6], and nonlinear spline curve fitting [7] belong to another family of filters that remove the baseline wander but often require some reference points. For instance, the linear spline curve fitting method [5] forms a sub-signal of the ECG for a single cardiac cycle, starting 60 ms before the P-wave and ending 60 ms after the T-wave, and fits a first order polynomial to this sub-signal after subtracting its mean. Multirate systems based on the wavelet transform have also been utilized for ECG baseline wander removal. The approach using the wavelet adaptive filter (WAF) [8] consists of two steps. First, a wavelet transform decomposes the ECG signal into seven frequency bands. The second step is an adaptive filter that uses the signal of the seventh (lowest) frequency band as the primary input and a constant as the reference input for filtering. Another multirate system, empirical mode decomposition (EMD) [9], has also been adopted to eliminate the baseline wander. Compared with the wavelet technique, which uses predefined basis functions to represent a signal, EMD relies on a fully data-driven mechanism; that is, EMD does not require any basis known a priori.
Adaptive filters arranged as a cascade structure [10] have also been used for this application. The first step of this approach uses an adaptive notch filter to eliminate the DC component of the ECG. The second step forms a comb filter, assuming that the signal is an event-related signal. Blind source separation (BSS), in particular independent component analysis (ICA) [11][12][13], is another choice to remove the baseline wander. As a specific type of BSS method, ICA has been extensively used on biomedical signals [14][15][16], such as the ECG and the EEG. It has been used as an effective method to decompose multichannel signals into fundamental components. As more applications of ICA are recognized, newer variations of ICA are being introduced. Standard ICA [17] (sICA) is a technique used to estimate source signals when several mixtures of the signals are available. Both the source signals and the mixing process are unknown, and the sources are estimated only on the assumption that they are statistically independent. Compared with the formulation of the standard ICA, convolutive ICA (fICA) assumes that the mixing process acts as a finite impulse response filter, so that each mixture is a weighted and delayed combination of the sources [18,19]. Fast and robust fixed-point ICA [20] is based on the idea that a contrast function can be used to approximate negentropy. Through a fixed-point algorithm, the contrast function is maximized to extract latent sources with high speed. Temporally constrained ICA [21,22] is a more flexible model for separating latent sources. By using prior knowledge or additional constraints, the targeted latent source is extracted. Moreover, there are many other forms of ICA for different applications, such as topographic independent component analysis [23] and spatial and temporal independent component analysis [24].
In summary, the traditional methods are limited in either frequency delineation or reference choice, and the applications of BSS mentioned previously do not provide sufficient evidence of its effectiveness in noise removal. Based on these points, the proposed method unifies an adaptive notch filter and BSS for baseline drift removal. Specifically, multichannel signals are constructed from a single-channel signal, and ICA is applied to the ECG. The main contributions of our work lie in combining the capabilities of adaptive filters and BSS, and in expanding the capabilities of the independent components for this application by customizing the ICA method towards the removal of the ECG baseline wander. Furthermore, the factors affecting the performance of the separation process are explored and improved in this paper.
The rest of the paper is organized as follows. The overall structure of the proposed method is illustrated in Section 2. The adaptive notch filter, as employed in the paper, is described in Section 3. The concepts and formulation of the ICA, the fast and robust fixed-point ICA, and the customized form are introduced in Section 4. Section 5 introduces the process of detecting the components that cause the baseline wander and verifies this process. This section also explores the factors that affect the separation results. Finally, Section 6 concludes the paper.
Figure 1 shows the framework of the proposed method. As can be seen in Figure 1, the first step of the proposed method is an adaptive notch filter, designed to form sub-signals of the ECG, as described later. Next, the proposed method utilizes ICA to remove the baseline drift. Considering the noisy nature of the typical raw ECG signal, in this study sub-signals in the low frequencies of the ECG are formed by an adaptive notch filter and then used as the input to the ICA algorithm. Moreover, with regard to the inputs fed to the ICA algorithm, only a single-channel ECG signal is available in this study. Since ICA requires multi-channel signals as its input, one needs to build multi-channel signals from the single-channel ECG in order to use ICA to remove the baseline wander. To address this issue, the proposed method uses a systematic process in which delayed versions of the ECG are stacked to form the multi-channel signal. In addition, as shown in Figure 1, the independent component produced by the ICA that is initially labeled as the baseline wander needs to be further adjusted to form a better estimate of the baseline wander. This is because, while one of the components resembles the baseline drift, it is unlikely that any of the original components detected by the ICA is "purely" the baseline wander. The specific steps shown in Figure 1 are further described below.

Method
(a) Form sub-signals of ECG using an adaptive notch filter: as shown in Figure 1, the adaptive notch filter [25,26] is designed and customized to form the sub-signal. The reason for using the adaptive notch filter is its flexibility as well as its relatively superior performance compared with other filters. As mentioned above, applying the ICA algorithm to a sub-signal of the ECG has the advantage of reducing the errors coming from the multi-channel signals in estimating the baseline wander.
(b) Construct multi-channel signals: applying ICA requires multi-channel signals. However, in many ECG processing applications only a single-channel ECG signal is available and/or processed. The proposed method applies the methodology in [11] to construct multi-channel signals by delaying the single-channel signal. In our study, the multi-channel signals are constructed using sixty signals, each of which is delayed by 10 sample points (∼83 ms) relative to the previous one.
(c) Adjust the baseline wander extracted by ICA: the baseline wander extracted by ICA is an approximation of the true baseline wander because (1) there will be some errors in the resulting component due to the fact that the estimation process used in the ICA (in particular in the first few attempts) may be nonoptimal; (2) in the ICA analysis there may be more than one maximum in the estimation function and, therefore, the true baseline wander may not be located accurately; (3) the constructed multi-channel signals cannot convey all information about the baseline wander and, as such, the proposed adjustment may alleviate the issues associated with the non-optimal construction of multi-channel signals. The 10-sample shift of the signals provides large enough variations between the channels of the multi-channel signal to alleviate the issues concerning dependencies for ICA processing.
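Step (b) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `build_delayed_channels` and the handling of the signal edges (rows without a full set of delayed samples are dropped) are assumptions. The defaults of sixty channels and a 10-sample delay follow the text.

```python
import numpy as np

def build_delayed_channels(signal, n_channels=60, delay=10):
    """Stack successively delayed copies of a 1-D signal as columns.

    Column c holds the signal delayed by c * delay samples. Only rows for which
    every delayed copy is available are kept (an assumed edge-handling choice).
    """
    signal = np.asarray(signal, dtype=float)
    n_rows = len(signal) - (n_channels - 1) * delay
    if n_rows <= 0:
        raise ValueError("signal is too short for the requested channels and delay")
    columns = []
    for c in range(n_channels):
        start = (n_channels - 1 - c) * delay  # a later start means a smaller delay
        columns.append(signal[start:start + n_rows])
    return np.stack(columns, axis=1)          # shape: (n_rows, n_channels)

# Toy input: the sample index itself, so the delays are easy to read off.
x = np.arange(1000, dtype=float)
channels = build_delayed_channels(x)
```

With this toy input, the first row contains 590, 580, ..., 0, so each column lags the previous one by exactly 10 samples.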

Adaptive Notch Filter
The adaptive notch filter [26] is based on the same theoretical foundations as adaptive noise cancellation [25]. The adaptive noise canceller has two inputs. One is the primary input, containing the signal and the noise; the other is the reference input, a signal related to the noise in the primary input. Using the least mean square (LMS) criterion, the filtered reference signal is gradually adapted toward the noise in the primary input. Once the filter has converged, the output is obtained by subtracting the filtered reference input from the primary input. This type of filter can deal with inputs that are deterministic or stochastic, stationary or time-variant. If the inputs are stationary and stochastic, the solution of the adaptive noise canceller closely approaches the Wiener filter [25]. In the adaptive notch filter, the reference signal consists of one or more fixed frequencies, which are treated as the frequencies to be excluded. The advantages of adaptive notch filters lie in the following aspects: (1) if the frequency of the interference is not precisely known or the interference drifts in frequency, the exact excluded frequency can be measured/adapted during the filtering process; (2) the filter is tunable, since the null point moves with the reference frequencies; (3) the adaptive notch filter can be made very sharp at the reference frequency; (4) by adjusting the parameters so that the influence of the time-varying components is lessened, the adaptive notch filter can be treated as a time-invariant filter. The derivation of the adaptive notch filter is described in [25,26]. The diagram of adaptive noise cancelling is shown in Figure 2. The system is an $N$-stage tapped delay line (TDL).
The output and the weight of the filter are updated according to the following equations:
$$y(k) = \mathbf{w}^{T}(k)\,\mathbf{x}(k),$$
$$\mathbf{w}(k+1) = \mathbf{w}(k) + 2\mu\,[d(k) - y(k)]\,\mathbf{x}(k),$$
where $\mathbf{x}$ is the reference input, $d$ is the desired response, $y$ is the output of the filter, $\mathbf{w}$ is the weight of the filter, $\mu$ is the adaptation constant, and $k$ is the time index. As described in [26], the response of the filter includes two parts, one of which is time-varying. In practical applications, it is feasible to make the time-varying component insignificant by changing the value of $N$ and setting it according to the frequency of the interference [26]. If the reference input is considered to be a sinusoid of amplitude $C$ at the frequency of the interference, the transfer function of the adaptive notch filter can be expressed in closed form in terms of $N$, $\mu$, and $C$ [26]. Therefore, the parameter $N$ can be set to the fixed value as described above. It can be seen that the above-mentioned filter is very flexible and can be adjusted using the constants $\mu$ and $C$ to provide the desired bandwidth and depth of a suitable notch filter.
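The LMS noise-cancelling structure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter configured in the paper: it assumes the Widrow-style update with a factor of 2 on the adaptation constant, a sinusoidal reference at the interference frequency, and small illustrative values (`n_taps=32`, `mu=0.001`) so that the example runs quickly. The function name `lms_notch` and the synthetic signals are hypothetical.

```python
import numpy as np

def lms_notch(primary, reference, n_taps, mu):
    """Adaptive noise canceller: returns primary minus the LMS-filtered reference."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)                 # filter weights
    out = np.zeros(len(primary))
    for k in range(len(primary)):
        # Tapped delay line: the n_taps most recent reference samples, newest first,
        # zero-padded while the line is still filling up.
        x = reference[max(0, k - n_taps + 1):k + 1][::-1]
        x = np.pad(x, (0, n_taps - len(x)))
        y = w @ x                        # filter output (estimate of the interference)
        e = primary[k] - y               # canceller output
        w += 2 * mu * e * x              # LMS weight update (assumed factor of 2)
        out[k] = e
    return out

# Synthetic example (assumed values): 1 Hz "signal" plus 60 Hz interference.
fs = 500
t = np.arange(5000) / fs
clean = np.sin(2 * np.pi * 1.0 * t)
interference = np.sin(2 * np.pi * 60 * t + 0.7)   # unknown phase
reference = np.cos(2 * np.pi * 60 * t)            # reference at the notch frequency
filtered = lms_notch(clean + interference, reference, n_taps=32, mu=0.001)
```

Replacing the sinusoidal reference with a constant would place the notch at 0 Hz, which corresponds to the high-pass behaviour described for baseline removal.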

Independent Component Analysis
After applying the notch filter, the main step used is ICA. First, the "standard" ICA is described. ICA can be briefly explained using a simple example of separating two source signals $s_1(t)$ and $s_2(t)$ that were mixed by an unknown linear process. Two different linear mixtures, $x_1(t)$ and $x_2(t)$, are given as follows:
$$x_1(t) = a_{11}s_1(t) + a_{12}s_2(t),$$
$$x_2(t) = a_{21}s_1(t) + a_{22}s_2(t),$$
where $a_{11}$, $a_{12}$, $a_{21}$, and $a_{22}$ are unknown coefficients. The objective of the problem is to recover the signals $s_1(t)$ and $s_2(t)$ from the mixture signals $x_1(t)$ and $x_2(t)$ without knowing any prior information about the source signals $s_1(t)$ and $s_2(t)$ and the mixing process (i.e., $a_{11}$, $a_{12}$, $a_{21}$, and $a_{22}$), except that $s_1(t)$ and $s_2(t)$ are statistically independent.
In the generalized case, where there are more latent sources and more mixture signals, the formal definition of ICA is as follows:
$$x_i(t) = \sum_{j=1}^{n} a_{ij}\,s_j(t), \quad i = 1, \ldots, n,$$
where $s_j(t)$ is called the latent source, $x_i(t)$ is the mixture signal, $a_{ij}$ is the mixing coefficient between $x_i(t)$ and $s_j(t)$, and $n$ is the number of latent sources and mixture signals. The above formulation can be expressed in the following matrix form:
$$\mathbf{X} = \mathbf{S}\mathbf{A}^{T},$$
where $\mathbf{X}$ is the matrix of mixture signals, in which each column is one mixture signal; $\mathbf{S}$ is the matrix of latent signals, in which each column is one latent signal; and $\mathbf{A}_{n \times n}$ is the matrix of mixing coefficients $a_{ij}$.
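The linear mixing model can be made concrete with a toy example. The sketch below assumes two synthetic sources stored one per column and an arbitrary 2 x 2 mixing matrix. The names `S`, `A`, and `X` and all numerical values are illustrative assumptions. It only shows how mixtures are formed, and that they could be unmixed exactly if the mixing matrix were known, which it is not in the ICA setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two independent toy sources, one per column (assumed, not ECG data).
S = np.column_stack([
    np.sin(2 * np.pi * np.arange(n_samples) / 200),   # slow oscillation
    rng.uniform(-1, 1, n_samples),                    # uniform noise
])

# Mixing coefficients: row i holds the weights of mixture i on each source.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

# Each column of X is one mixture: X[:, i] = sum_j A[i, j] * S[:, j].
X = S @ A.T

# With A known, the sources are recovered exactly. ICA must estimate this
# unmixing from X alone, using only the independence of the sources.
S_recovered = X @ np.linalg.inv(A).T
```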
The feasibility of solving the ICA problem lies in the condition that the latent sources are independent of each other. According to the Central Limit Theorem, the distribution of a sum of independent random variables approaches a Gaussian distribution. A mixture of independent sources is therefore closer to Gaussian than the sources themselves, which implies that the solution of ICA can be found by searching for the components whose distributions deviate the most from Gaussianity. The deviation from Gaussianity can be determined using measures such as negentropy.
Negentropy is one measure of non-Gaussianity defined based on the concept of entropy, which is the fundamental concept of information theory. Entropy, $H$, as a measure of information in random variables, is defined for a discrete random variable $Y$ as follows:
$$H(Y) = -\sum_{i} P(Y = a_i)\,\log P(Y = a_i),$$
where $a_i$ are the possible values of $Y$ and $P(Y = a_i)$ is the probability that the value of $Y$ is $a_i$. For a continuous random variable $y$, entropy is defined by the following equation:
$$H(y) = -\int f(y)\,\log f(y)\,dy,$$
where $f$ is the probability density function. Negentropy, $J$, is then defined as follows:
$$J(y) = H(y_{\mathrm{gauss}}) - H(y),$$
where $y_{\mathrm{gauss}}$ is a Gaussian random variable with the same covariance matrix as $y$. A fundamental conclusion in information theory is that a Gaussian variable has the largest entropy among all random variables of equal variance. Hence, negentropy is always nonnegative, and it is zero only if $y$ has a Gaussian distribution. The exact calculation of negentropy requires an accurate estimation of the probability density function, which may be computationally costly or data intensive. Therefore, it is often preferred to find simple approximations of negentropy. Simple approximations of negentropy have been introduced [27], which are based on the maximum entropy principle. In general, the following family of approximations is the most commonly used group:
$$J(y) \approx \sum_{i=1}^{p} k_i\,\bigl[E\{G_i(y)\} - E\{G_i(\nu)\}\bigr]^2,$$
where $k_i$ are constants and $\nu$ is a Gaussian random variable with zero mean and unit variance. Often, the values of $k$ and $p$ can be set to one. Therefore, the above formulation becomes as follows:
$$J(y) \approx \bigl[E\{G(y)\} - E\{G(\nu)\}\bigr]^2.$$
The following formulations of the function $G$ have proved very useful in practical applications:
$$G_1(u) = \frac{1}{a_1}\log\cosh(a_1 u), \quad g_1(u) = \tanh(a_1 u),$$
$$G_2(u) = -\frac{1}{a_2}\exp\!\left(-\frac{a_2 u^2}{2}\right), \quad g_2(u) = u\,\exp\!\left(-\frac{a_2 u^2}{2}\right),$$
where $1 \le a_1 \le 2$, $a_2 \approx 1$, and $g$ is the first derivative of the function $G$. Before applying the main processing operations of the ICA, it is often necessary to perform some preprocessing. Usually, two different operations are conducted: centering and whitening. Centering requires that the random variable is a zero-mean random variable, and it is performed by subtracting its mean vector.
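As a rough numerical illustration of the single-function negentropy approximation, the sketch below assumes the log-cosh contrast with unit scale. It estimates the Gaussian expectation by Monte Carlo rather than from a closed-form constant. The function name `negentropy_logcosh`, the sample sizes, and the seeds are illustrative assumptions.

```python
import numpy as np

def negentropy_logcosh(y, n_gauss=200_000, seed=0):
    """Approximate negentropy as (E{G(y)} - E{G(nu)})^2 with G(u) = log cosh(u).

    y is standardised to zero mean and unit variance first. The Gaussian
    expectation is estimated from a standard normal sample (a Monte Carlo
    shortcut assumed for this sketch).
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    nu = np.random.default_rng(seed).standard_normal(n_gauss)

    def G(u):
        return np.log(np.cosh(u))

    return float((G(y).mean() - G(nu).mean()) ** 2)

rng = np.random.default_rng(1)
j_gauss = negentropy_logcosh(rng.standard_normal(100_000))    # close to zero
j_uniform = negentropy_logcosh(rng.uniform(-1, 1, 100_000))   # clearly larger
```

The Gaussian sample yields a value near zero and the uniform sample a larger one, in line with the statement that negentropy vanishes only for Gaussian variables.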
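Whitening makes the components of the random variable uncorrelated and sets their variances equal to unity by using the eigenvalue decomposition of the covariance matrix:
$$E\{\mathbf{x}\mathbf{x}^{T}\} = \mathbf{V}\mathbf{D}\mathbf{V}^{T},$$
where $\mathbf{V}$ is the orthogonal matrix of eigenvectors and $\mathbf{D}$ is the diagonal matrix of eigenvalues. Now, assuming that $\tilde{\mathbf{x}}$ is a new random variable after whitening, consider the following:
$$\tilde{\mathbf{x}} = \mathbf{V}\mathbf{D}^{-1/2}\mathbf{V}^{T}\mathbf{x}.$$
Whitening changes the problem from estimating the mixing matrix $\mathbf{A}$ to estimating a new one, $\tilde{\mathbf{A}}$:
$$\tilde{\mathbf{x}} = \mathbf{V}\mathbf{D}^{-1/2}\mathbf{V}^{T}\mathbf{A}\mathbf{s} = \tilde{\mathbf{A}}\mathbf{s}.$$
Among several improvements of ICA, the fast and robust fixed-point ICA [20], as a direct extension of the standard ICA, was developed for calculating latent sources with high speed. The basic rule of the fast and robust fixed-point ICA is to find a direction $\mathbf{w}$ that maximizes the non-Gaussianity of $\mathbf{w}^{T}\tilde{\mathbf{x}}$. Non-Gaussianity is measured according to the approximation of negentropy mentioned above. The following is the basic description of the algorithm.
(1) Choose an initial (e.g., random) weight vector $\mathbf{w}$.
(2) Let $\mathbf{w}^{+} = E\{\tilde{\mathbf{x}}\,g(\mathbf{w}^{T}\tilde{\mathbf{x}})\} - E\{g'(\mathbf{w}^{T}\tilde{\mathbf{x}})\}\,\mathbf{w}$.
(3) Let $\mathbf{w} = \mathbf{w}^{+} / \lVert\mathbf{w}^{+}\rVert$.
(4) If not converged, go back to step (2).
Here $\mathbf{w}$ is the weight vector used to calculate the latent source $s = \mathbf{w}^{T}\tilde{\mathbf{x}}$, and convergence means that the old weight vector and the new weight vector point in the same direction.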
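The preprocessing and fixed-point iteration can be put together in a compact sketch. It is a generic, textbook-style one-unit FastICA with deflation and a tanh nonlinearity, written under stated assumptions, and is not the customized implementation used in this study. The function names `whiten` and `fastica`, the tolerance, and the toy sources are hypothetical.

```python
import numpy as np

def whiten(X):
    """Centre X (one signal per column) and whiten it via eigendecomposition."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # symmetric matrix
    return Xc @ whitening

def fastica(Z, n_components, max_iter=200, tol=1e-8, seed=0):
    """Deflation FastICA on whitened data Z using g(u) = tanh(u)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, Z.shape[1]))
    for p in range(n_components):
        w = rng.standard_normal(Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(Z @ w)
            g_prime = 1.0 - g ** 2
            # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w
            w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            # Deflation: remove projections onto the components already found.
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol   # same direction up to sign
            w = w_new
            if converged:
                break
        W[p] = w
    return Z @ W.T   # estimated sources, one per column

# Toy example (assumed values): slow sinusoid mixed with Laplacian noise.
rng = np.random.default_rng(0)
t = np.arange(5000) / 500.0
S = np.column_stack([np.sin(2 * np.pi * 0.3 * t), rng.laplace(size=t.size)])
A = np.array([[1.0, 0.5], [0.6, 1.0]])
Y = fastica(whiten(S @ A.T), n_components=2)
```

The recovered columns of `Y` match the sources only up to order, sign, and scale, which is the usual ambiguity of ICA.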
In this study, the fast and robust fixed-point ICA [20] is used as the implementation of the ICA block shown in Figure 1.

Results
An ECG dataset of human volunteers undergoing lower body negative pressure (LBNP) [28] as a surrogate of hemorrhage was employed to verify the effectiveness of the baseline wander removal. This dataset was created under Institutional Review Board approval. The LBNP dataset consisted of a total of 91 subjects. Each subject had a single vector lead ECG recording collected at a sampling rate of 500 Hz. The baseline wander in the ECG signals showed significant variations in amplitude over the course of the LBNP experiment. During LBNP, subjects are exposed to increasing negative pressure on their lower bodies. This causes a redistribution of blood volume to the lower extremities and abdomen, which decreases blood pressure and cardiac output and results in an increased respiratory rate.
The results of the proposed method are compared with a reference method, called robust locally weighted regression [29], which is often treated as one of the most robust and commonly used methods to remove baseline drift. The robust locally weighted regression method employs two techniques: the local fitting of polynomials and an adaptation of iterated weighted least squares to remove the baseline drift.

Results of Adaptive Notch Filter.
One objective of the proposed system is the removal of unwanted frequencies around 0 Hz as well as 60 Hz. As the frequencies around zero are excluded, the filter acts as a high-pass filter. In order to lessen the influence of the time-varying components, one needs to first set a suitable parameter $N$ to obtain a desirable level of the time-varying component. Figure 3 shows the influence of $N$ on the filter. In general, with the increase in the value of $N$, this influence decreases gradually. In this study, the value of $N$ was set to 10,000. The parameter $\mu$ determines whether or not the adaptation converges [25]. The value of $\mu$ should be greater than 0 but less than the reciprocal of the largest eigenvalue, $\lambda_{\max}$, of the matrix $\mathbf{R}$, which is defined as the correlation matrix of the signal $\mathbf{x}$ [25]. In this study, the value of $\mu$ was set to 0.0001. The bandwidth of the filter can be approximated from these parameters as described in [26]. Figure 4 shows the transfer function of the resulting adaptive notch filter, and, as expected, this filter acts as a high-pass filter. Note that the value of $C$ provides yet another degree of freedom for this filter design process, and, hence, Figure 4 presents the transfer function for two different filters formed using two different values of $C$, each resulting in a very different bandwidth. A main advantage of the adaptive notch filter used here is that changing the values of the parameters $N$, $\mu$, and $C$ can provide a wide spectrum of desired filters with diverse shapes of transfer function.
An adaptive notch filter for frequencies around 60 Hz is designed similarly. The parameter $N$ was set to 2048, $\mu$ to 0.001, and $C$ to 0.1. Figure 5 depicts the transfer function of the resulting adaptive notch filter.
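The convergence condition on the adaptation constant (greater than zero and below the reciprocal of the largest eigenvalue of the reference correlation matrix) can be checked numerically. The sketch below assumes a sinusoidal 60 Hz reference of amplitude 0.1 sampled at 500 Hz. To keep the eigenvalue computation small it uses 64 taps instead of the 2048 mentioned above. The function name `max_stable_mu` is hypothetical.

```python
import numpy as np

def max_stable_mu(reference, n_taps):
    """Return 1 / lambda_max of the sample correlation matrix of the tap vector."""
    reference = np.asarray(reference, dtype=float)
    n_rows = len(reference) - n_taps + 1
    # Each row is one tapped-delay-line snapshot of the reference signal.
    taps = np.stack([reference[i:i + n_rows] for i in range(n_taps)], axis=1)
    R = taps.T @ taps / n_rows                  # sample correlation matrix
    lam_max = np.linalg.eigvalsh(R)[-1]         # eigenvalues come in ascending order
    return 1.0 / lam_max

fs = 500
t = np.arange(5000) / fs
reference = 0.1 * np.cos(2 * np.pi * 60 * t)    # assumed amplitude of 0.1
bound = max_stable_mu(reference, n_taps=64)     # reduced tap count for the sketch
mu = 0.001
```

Because the largest eigenvalue grows roughly in proportion to the number of taps for a sinusoidal reference, the admissible range for the adaptation constant shrinks as the tapped delay line gets longer.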

Experimental Results and Problems Analysis.
The results of both methods, that is, the proposed and the reference methods, are examined and compared in all 91 subjects. A unified "span" value, described in the reference method [29], which is designed to assess the quality of the methods in removing the baseline wander, is calculated for all cases. This value for all experimental results was 1500, which is the level identifying a very high quality of baseline removal. The 91 cases, based on the closeness of the results of the two methods, are divided into two groups. The details of the results are shown for 72 out of 91 subjects in Table 1; for these subjects the proposed algorithm achieves results almost identical to those of the reference method. The results of the remaining 19 subjects, which will be discussed separately, show that the proposed method is not able to remove the baseline drift optimally.
In Table 1, "shift" and "elevation" are the values for adjustments to the original independent component (baseline wander) to form the new baseline wander in the horizontal and vertical directions; "error$_1$" represents the difference between the old baseline wander (sig$_1$) before the shift and the baseline wander (sig) from the reference method, calculated over all sample points of the baseline wander; and finally "error$_2$" represents the difference between the new baseline wander (sig$_2$) and the baseline wander (sig) from the reference method, calculated in the same way. As can be seen in Table 1, for all cases error$_2$ is significantly smaller than error$_1$, which shows the impact of the adjustment in "purifying" the baseline wander and creating a better estimate of the drift. In order to better assess the performance of the proposed method in removing the baseline wander, more analyses are conducted on the results. Figures 6 and 7 show the shift and elevation for all 72 subjects. As can be seen, both of these variables are almost the same for all subjects; they do not change across different subjects ($x$-axis) or vary only within a small range. This observation illustrates the reason to adjust the parameters between the old baseline wander and the new baseline wander. Figure 8 shows the error reduction in the 72 subjects after adjusting the shift and elevation values. It can be seen that in all of these cases the errors decrease significantly after adjusting the baseline wander compared with the unadjusted baseline wander. The average percentage of error reduction, Aver, is 90.13%. The formulation of the average percentage of error reduction is as follows:
$$\mathrm{Aver} = \frac{1}{M}\sum_{j=1}^{M}\frac{\mathrm{error}_1^{(j)} - \mathrm{error}_2^{(j)}}{\mathrm{error}_1^{(j)}} \times 100\%,$$
where $j$ is the index of the subject and $M$ is the total number of subjects. Sample signals before baseline removal and after baseline removal with the proposed method as well as the reference method are shown in Figure 9. As shown in Figure 9, the results of the two methods in all above-mentioned 72 subjects are very similar. In addition, as can be seen, both methods are very effective in removing the baseline drift.
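The shift and elevation adjustment and the resulting error reduction can be illustrated with a sketch. The text does not state how the two values are obtained or which error formula is used, so everything here is an assumption: the shift is found by a brute-force search over integer lags, the elevation is the mean offset on the overlapping samples, and the error is the mean absolute difference. The function names `mean_abs_error` and `adjust_baseline` and the synthetic baselines are hypothetical.

```python
import numpy as np

def mean_abs_error(a, b):
    """Mean absolute difference (an assumed error measure)."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))

def adjust_baseline(estimate, reference, max_shift):
    """Search integer shifts; for each one, the elevation is the mean offset.

    Returns (shift, elevation, error) for the shift with the smallest error,
    comparing estimate[i + shift] + elevation with reference[i] on the overlap.
    """
    n = len(reference)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -shift), min(n, n - shift)
        est = estimate[lo + shift:hi + shift]
        ref = reference[lo:hi]
        elevation = float(np.mean(ref - est))
        err = mean_abs_error(est + elevation, ref)
        if best is None or err < best[2]:
            best = (shift, elevation, err)
    return best

# Synthetic baselines (assumed): the estimate lags by 25 samples and sits 0.2 too low.
t = np.arange(5000) / 500.0
reference = np.sin(2 * np.pi * 0.3 * t)
estimate = np.roll(reference, 25) - 0.2

error_1 = mean_abs_error(estimate, reference)              # before adjustment
shift, elevation, error_2 = adjust_baseline(estimate, reference, max_shift=50)
reduction_percent = 100.0 * (error_1 - error_2) / error_1  # per-subject error reduction
```

Averaging `reduction_percent` over subjects would give a figure of the same kind as the reported average error reduction, although the value obtained here reflects only the synthetic data.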
However, as mentioned above, on the ECG of the remaining 19 subjects, the results of the proposed method and the reference method are not as similar; that is, the value of error$_2$ (which shows the difference between the two methods) is significant. This is because in these signals the inherent pattern of the ECG is highly distorted, which leads to spurious estimations. All 91 cases were visually inspected. By examining the signals for these 19 cases, it was discovered that the high value of error$_2$ does not seem to come from an inability of the proposed method to remove the baseline wander. The possible reason and improvement for such cases are discussed in the following part.
As a comparison between the proposed method and the reference method, some sample results for these cases are shown in Figure 10. In these cases, due to the presence of significantly stronger baseline drifts, the reference method does not seem to eliminate all of the baseline drift. The reason for this might lie in the fact that the reference method relies heavily on its parameter settings, which may work very well for some ECG signals but not for others. As shown in Figure 10, our proposed method shows more effective performance in removing the baseline around times such as 4700, 5500, 7500, and 9500. Another major advantage is that the proposed method is computationally faster than the reference method while achieving the same quality of results.

Further Experimental Analysis of Method.
As mentioned above, in the experiment, multi-channel signals are constructed from a single-channel signal. The multi-channel signals are constructed using sixty signals, each delayed by 10 sample points relative to the previous one. By observation, the number of constructed signals greatly impacts the success of finding the true baseline wander. Moreover, the degree of delay has a close relationship with the smoothness of the baseline wander. The experiments suggest that more channels and smaller delays may achieve better results, meaning that the constructed multi-channel signals then convey enough information to accurately extract the baseline wander.
In addition, as discussed above, the LBNP dataset shows a significant level of variations in the baseline drift. Therefore, in further analysis of the method, the sub-signals were segmented to verify whether the slow changes in the trend of the baseline wander affect the results of the proposed method in separating the baseline wander. The sub-signals were chosen to be only 10,000 sample points long from the beginning of the original signal in LBNP dataset. Experimental results showed that the slow changing trend of the baseline wander did not affect the performance of the proposed method in extracting the baseline wander. In other words, the baseline drift with slow changing trends can also be successfully extracted using the proposed method.

Conclusion
Using the blind source separation paradigm, the ECG baseline wander or drift can be removed. The present paper demonstrates a hierarchical method utilizing ICA that significantly improves the performance of this process. Compared with the existing methods, the proposed method has the following advantages.
(1) The proposed method provides more flexibility with regard to parameter estimation and selection.
(2) When following the steps proposed for adjustment of the ICA process, the fundamental assumption of baseline noise coming from an independent source can be further verified, which supports the validity of using the method for ECG baseline removal. Such an assumption, verified by additional experimental results, would present a chance to remove other types of noise.
(3) The filtering process proposed for forming the multi-channel signals provides a highly flexible method to form the input to ICA.