Empirical Mode Decomposition Combined with Local Linear Quantile Regression for Automatic Boundary Correction



Introduction
We consider the following general nonparametric regression model:

y_i = f(x_i) + ε_i,  i = 1, …, n,

where y_i is the response variable, x_i is a covariate, f(x) = E(y | x) is assumed to be a smooth nonparametric function, and the ε_i represent independent and identically distributed random errors with mean 0 and variance σ². Empirical mode decomposition (EMD) is a form of analysis based on nonparametric methods [1]. This technique is particularly useful for analyzing nonlinear and nonstationary time series and has been widely applied over the last few years to data in different disciplines, such as biology, finance, engineering, and climatology. EMD can enhance estimation performance: exploiting its capabilities as a fully adaptive method and its advantages in handling nonlinear and nonstationary signal behavior leads to better results. However, EMD suffers from problems of boundary extension, curve fitting, and stopping criteria [2]. Such problems may corrupt the entire data set and result in misleading conclusions [3]. Given that finite data are involved, the algorithms must be adjusted to use certain boundary conditions. In EMD, the end points are therefore a problem: their influence propagates into the data range during sifting. Data extension (or data prediction) is a risky procedure even for linear and stationary processes and is more difficult for nonlinear and nonstationary processes. The work in [1] indicated that only the values and locations of the next several extrema, and not all extended data, need to be predicted for EMD. Widely used approaches, such as the characteristic wave extending method, mirror extending method [4], data extending method [5], data reconstruction method [6], and similarity searching method [7], were proposed to overcome the problem and generate a more reasonable solution. The work in [8] introduced quantile regression, a significant extension of traditional parametric and nonparametric regression methods. Quantile regression has been widely used in statistics since its
introduction because of its ease of interpretation, robustness, and numerous applications in important areas such as medicine, economics, environmental modeling, toxicology, and engineering [9,10]. A robust version of classical local linear regression (LLR), known as local linear quantile regression (LLQ) and introduced by [11,12], has increasingly drawn interest. With its robust behavior, LLQ exhibits excellent boundary adjustment. This characteristic can more efficiently distinguish systematic differences in dispersion, tail behavior, and other features with respect to covariates [12,13].
The current study aims to use the advantages of LLQ to automatically reduce the boundary effects of EMD, instead of relying on the classical boundary solutions mentioned previously. The proposed method consists of two stages that automatically decrease the boundary effects of EMD. At the first stage, LLQ is applied to the corrupted and noisy data; the remaining series is then expected to be hidden in the residuals. At the second stage, EMD is applied to the residuals. The final estimate is the summation of the fitting estimates from LLQ and EMD. Compared with EMD alone, this combination obtains more accurate estimates.
The remainder of this study is organized as follows. In Section 2, we present a brief background of EMD and LLQ. Section 3 introduces the proposed method. Section 4 compares the results of the original EMD algorithm and the proposed new boundary adjustment through simulation experiments. Conclusions are drawn in Section 5.

History of Boundary Treatment in Nonparametric Estimators.
Most nonparametric techniques, such as kernel regression, wavelet thresholding, and empirical mode decomposition, show a sharp increase in variance and bias at points near the boundary. Many works have been reported in the literature on reducing the effects of the boundary problem. For kernel regression solutions, see [14,15]. For wavelet thresholding, in addition to using periodic or symmetric assumptions, the authors in [16,17] used polynomial regression to mitigate the boundary problem. For empirical mode decomposition, the authors in [18] provided a new idea about boundary extension: instead of the traditional mirror extension at the boundary, they proposed a ratio extension. The authors in [19] applied a neural network to each IMF to restrain the end effect. The work in [2] provided an algorithm based on a sigma-pi neural network, which is used to extend signals before applying EMD. The authors in [20] proposed an approach that couples mirror expansion with extrapolation prediction of the regression function to solve the boundary problem. Their algorithm includes two steps: the signal is first extrapolated through support vector (SV) regression at both end points to form the primary expansion signal; the primary signal is then further expanded through extrema mirror expansion, and EMD is performed on the resulting signal to reduce the end limitations.
In this paper we follow the strategies of [16,17] to handle the end effects of the boundary problem in EMD. Instead of classical polynomial nonparametric regression, we use a more robust nonparametric estimator called local linear quantile regression. Practical justifications for choosing this estimator are given in Section 2.4.

Empirical Mode Decomposition (EMD)
EMD [1] has proven to be a natural extension of, and an alternative technique to, traditional methods for analyzing nonlinear and nonstationary signals, such as wavelet methods, Fourier methods, and empirical orthogonal functions [21]. In this section, we briefly describe the EMD algorithm. The main objective of EMD is to decompose the data x(t) into small signals called intrinsic mode functions (IMFs). An IMF is a function in which the upper and the lower envelopes are symmetric and, in addition, the number of zero-crossings and the number of extrema are equal or differ by at most one [22]. The algorithm for extracting the IMFs of a given time series x(t) is called sifting and consists of the following steps.
(I) Identifying all the local minima and maxima of the signal.
(II) Constructing the lower envelope e_min(t) and the upper envelope e_max(t) of the signal by the cubic spline method.
(III) Computing the mean values m(t) by averaging the upper and the lower envelopes, m(t) = (e_max(t) + e_min(t))/2.
(IV) Subtracting the mean from the original signal, h(t) = x(t) − m(t). Steps (II) to (IV) are repeated on h(t) until it becomes an IMF. If so, the kth IMF is given by IMF_k(t) = h(t).
(V) Updating the residue r(t) = x(t) − IMF_k(t). This residual component is treated as new data and subjected to the process described above to calculate the next IMF_{k+1}.
(VI) Repeating the steps above until the final residual component becomes a monotonic function, which is then taken as the final residue r(t).
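The sifting steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it uses cubic-spline envelopes and a fixed number of sifting passes as a simple stopping rule (the stopping criteria and envelope end conditions discussed in the paper are more involved), and all function names and parameters are our own.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: subtract the mean of the upper/lower spline envelopes."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:
        return None  # too few extrema to build spline envelopes: treat as residue
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - (upper + lower) / 2.0

def emd(x, t, max_imfs=8, n_sifts=10):
    """Decompose x(t) into IMFs plus a final residue (fixed sift count as stop rule)."""
    imfs, residue = [], x.copy()
    for _ in range(max_imfs):
        h = residue.copy()
        stopped = False
        for _ in range(n_sifts):
            h_new = sift_once(h, t)
            if h_new is None:  # residue is (near-)monotonic: stop decomposing
                stopped = True
                break
            h = h_new
        if stopped:
            break
        imfs.append(h)
        residue = residue - h
    return imfs, residue
```

By construction, the signal is recovered exactly as the sum of the extracted IMFs and the final residue.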
Many methods have been presented to extract trends from a time series. The freehand and least squares methods are the most commonly used techniques; the former depends on the experience of the user, and the latter is difficult to use when the original series is very irregular [23]. EMD is another effective method for extracting trends [19].

Local Linear Quantile Regression (LLQ).
The seminal study [8] introduced parametric quantile regression, which can be considered an alternative to classical regression in both parametric and nonparametric fields. Many models for the nonparametric approach, including locally polynomial quantile regression [11] and kernel methods [24], have since been introduced into the statistical literature. In this paper we adopt local linear quantile regression (LLQ), introduced by [12]. Let {(x_i, y_i), i = 1, …, n} be bivariate observations. To estimate the τth conditional quantile function q_τ(x) of the response y given x, let K be a positive symmetric unimodal kernel function and consider the following weighted quantile regression problem:

min over (β_0, β_1) of Σ_{i=1}^{n} ρ_τ(y_i − β_0 − β_1(x_i − x)) K_h(x_i − x),   (2)

where K_h(u) = K(u/h)/h and ρ_τ(u) = u(τ − I(u < 0)) is the quantile check function. Once the covariate observations are centered at the point x, the estimate of q_τ(x) is simply β̂_0, the first component of the minimizer of (2); β̂_1 determines an estimate of the slope of the function q_τ at the point x.
The higher-order LLQ estimate of degree p is the minimizer of

Σ_{i=1}^{n} ρ_τ(y_i − Σ_{j=0}^{p} β_j(x_i − x)^j) K_h(x_i − x).

The choice of the bandwidth parameter h significantly influences all nonparametric estimation. An excessively large h obscures too much local structure by oversmoothing, while an excessively small h introduces too much variability by relying on very few observations in the local polynomial fitting [13].
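As a concrete illustration of minimizing (2), the sketch below fits a local linear quantile estimate at a single point by direct minimization of the kernel-weighted check loss. The names (llq, rho), the Gaussian kernel, and the general-purpose optimizer are our own choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, tau):
    """Quantile check (loss) function: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def llq(x0, X, Y, tau=0.5, h=0.2):
    """Local linear tau-quantile fit at x0; returns (level, slope) estimates."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)  # Gaussian kernel weights K_h(X_i - x0)
    def objective(beta):
        resid = Y - beta[0] - beta[1] * (X - x0)
        return np.sum(w * rho(resid, tau))
    start = np.array([np.median(Y), 0.0])
    fit = minimize(objective, start, method="Nelder-Mead")
    return fit.x[0], fit.x[1]  # beta0_hat = q_tau(x0), beta1_hat = slope at x0
```

In practice the check-loss problem is a linear program and is solved more reliably by dedicated quantile-regression routines; Nelder-Mead is used here only to keep the sketch short.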

Bandwidth Selection.
The practical performance of q̂_τ(x) depends strongly on the selected bandwidth parameter.
In this study we adopt the strategy of [12]. In summary, the automatic bandwidth selection strategy for smoothing conditional quantiles is as follows.
(1) Use ready-made and sophisticated methods to select the mean-regression bandwidth h_mean; we use the technique of [25].
(2) Set h_τ = h_mean [τ(1 − τ)/φ(Φ^{−1}(τ))²]^{1/5}.
Here, φ and Φ are the standard normal density and distribution functions, and h_mean is a bandwidth parameter for regression mean estimation, obtainable by various existing methods. As can be seen, this procedure leads to identical bandwidths for the τ and (1 − τ) quantiles.
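This strategy scales a mean-regression bandwidth h_mean to the quantile level. A minimal sketch, assuming the standard Yu-Jones rule h_τ = h_mean [τ(1 − τ)/φ(Φ^{−1}(τ))²]^{1/5} (function name ours):

```python
from scipy.stats import norm

def quantile_bandwidth(h_mean, tau):
    """Scale a mean-regression bandwidth to the tau-quantile (Yu-Jones rule)."""
    phi = norm.pdf(norm.ppf(tau))  # standard normal density at the tau-quantile
    return h_mean * (tau * (1.0 - tau) / phi ** 2) ** 0.2
```

Because both τ(1 − τ) and φ(Φ^{−1}(τ))² are symmetric about τ = 0.5, the rule yields identical bandwidths for the τ and (1 − τ) quantiles, as noted above.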

The Behavior of Local Linear Quantile Estimator at Boundary Region.
To examine the asymptotic behavior of the local linear quantile estimators at the boundaries, we state the following theorem, which has been discussed in detail in [26]; here we omit the proofs and summarize the key points. Without loss of generality, one can consider only the left boundary point x_0 = ch, 0 < c < 1, if x takes values only in [0, 1]; a similar result holds for the right boundary point x_0 = 1 − ch. Theorem 1 (see [26]). Consider the following assumptions.
(1) q_τ(x) is twice continuously differentiable in a neighborhood of x_0, for any x_0.

The theorem establishes the asymptotic normality of the local linear quantile estimator at the left boundary point, together with the asymptotic normality of the local constant quantile estimator at the left boundary point x_0 = ch for 0 < c < 1; the explicit expressions are given in [26] and omitted here.

From the theorem one can deduce that, at the boundaries, the asymptotic bias term of the local constant quantile estimate is of order h, compared to order h² for the local linear quantile estimate. Hence, local linear estimation behaves well at the boundaries, and no boundary correction is needed. In other words, the local linear quantile estimate does not suffer from boundary effects, but the local constant quantile estimate does. Therefore, local linear quantile regression is preferable in practice.

Table 1: Formulas of the test functions used in the simulation.

Test function	Formula
1	f_1(x) = sin(x) − sin(2x) + 0.5
2	f_2(x) = 10^{−10} + 2x for x ≤ 0.5; 3cos(10x) for 0.5 < x < 1
3	f_3(x) = 5 exp(−10(x − 0.5)²) + exp(−10x)

Proposed Method
This section elaborates on the proposed method, which combines EMD and LLQ (EMD-LLQ). Since local linear quantile regression produces excellent boundary treatment [27], adding this component to empirical mode decomposition is expected to result in equally well-behaved boundary properties. Results from numerical experiments strongly support this claim. The basic idea behind the proposed method is to estimate the underlying function f by the sum of an EMD component, f_EMD, and an LLQ component, f_LLQ; that is,

f_EMD-LLQ = f_EMD + f_LLQ.

We need to estimate the two components f_EMD and f_LLQ to obtain our proposed estimate f_EMD-LLQ by the following steps.
(1) Apply LLQ to the corrupted and noisy data y_i and obtain the trend estimate f_LLQ.
(2) Compute the residuals from LLQ, r_i = y_i − f_LLQ(x_i).
(3) Apply EMD to the residuals r_i, given that the remaining series is expected to be hidden in them.
(4) Take the final estimate as the summation of the fitting estimates from LLQ and EMD, f_EMD-LLQ = f_LLQ + f_EMD.
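Putting the two stages together, the structure of the EMD-LLQ estimate can be sketched as below. The llq_fit and emd_decompose callables stand for any LLQ smoother and EMD routine; which IMFs enter f_EMD was not recoverable from the text here, so discarding the first (highest-frequency) IMF is our assumption, not the authors' stated rule.

```python
import numpy as np

def emd_llq(y, t, llq_fit, emd_decompose):
    """Two-stage estimate: LLQ trend plus EMD reconstruction of the residuals."""
    f_llq = llq_fit(t, y)                # stage 1: robust LLQ trend estimate
    r = y - f_llq                        # remaining series hidden in the residuals
    imfs, residue = emd_decompose(r, t)  # stage 2: EMD applied to the residuals
    # Assumed denoising rule: discard the first (noisiest) IMF, keep the rest.
    f_emd = sum(imfs[1:], residue)
    return f_llq + f_emd                 # final estimate f_LLQ + f_EMD
```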

Simulation Study
In this simulation, statistical software was employed to evaluate the classical EMD of [1] and the proposed combined method, EMD-LLQ. The following conditions were set.
(2) Three different values of the quantile τ (0.25, 0.50, and 0.75).
(3) Three different kinds of noise structure for the errors, namely:
(a) normal distribution with zero mean and unit variance;
(b) correlated noise from the first-order autoregressive model AR(1) with parameter 0.5;
(c) heavy-tailed noise from the t distribution with three degrees of freedom.
Datasets were simulated from each of the three test functions with a sample size of n = 100 (Figure 1). For each simulated dataset, the two methods above were applied to estimate the test function. In each case, 1,000 replications of the sample of size n = 100 were made. The mean squared error (MSE) was used as the numerical measure to assess the quality of the estimate. The MSE was calculated over those observations that were at most Δ = 10 sample points away from the boundaries of the test functions:

MSE = (1/|I(Δ)|) Σ_{i ∈ I(Δ)} (f(x_i) − f̂(x_i))²,

where I(Δ) = {1, …, Δ, n − Δ + 1, …, n}. To compare the methods, Tables 2, 3, and 4 present the numerical results of the classical EMD with respect to the proposed method.
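The boundary-restricted MSE over the index set I(Δ) can be computed as follows (a sketch with our own names, Δ = 10 as in the text, and 0-based indices):

```python
import numpy as np

def boundary_mse(f_true, f_hat, delta=10):
    """MSE over the delta observations nearest each boundary, i.e. over I(delta)."""
    f_true, f_hat = np.asarray(f_true), np.asarray(f_hat)
    n = len(f_true)
    idx = np.r_[0:delta, n - delta:n]  # first and last delta sample points
    return float(np.mean((f_true[idx] - f_hat[idx]) ** 2))
```

Errors in the interior of the design interval do not contribute, so the measure isolates each estimator's behavior near the end points.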

Results.
From the simulation results reported in Tables 2, 3, and 4, we observed the following. Regardless of the boundary assumptions, test functions, noise structures, and values of the quantile, the proposed method is consistently superior to the classical EMD under the periodic, symmetric (mirror), and wave conditions.
To ensure that the improvement in mean squared error is due to our proposed treatment and not to something else, we also evaluated the classical method and the proposed one with no boundary treatment at all. From the simulation results, we observed that even though the classical boundary solutions help improve the mean squared error, our improvement is considerably larger. Finally, to dispel any suspicion that the differences might not be significant, we applied the Wilcoxon signed-rank test, which provided evidence that the proposed method achieves better performance near the boundaries than EMD; all p values for the Wilcoxon test are less than 0.05.
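This kind of significance check can be reproduced with the paired Wilcoxon signed-rank test from SciPy. The data below are synthetic stand-ins for the per-replication MSEs of the two methods, not the paper's results:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
mse_emd = rng.normal(1.0, 0.1, size=50)        # hypothetical classical-EMD MSEs
gain = np.abs(rng.normal(0.2, 0.05, size=50))  # a systematic improvement
mse_emd_llq = mse_emd - gain                   # proposed method: lower MSE

# One-sided paired test: is the classical MSE stochastically larger?
stat, p = wilcoxon(mse_emd, mse_emd_llq, alternative="greater")
```

A p value below 0.05 rejects the hypothesis of no difference in favor of the proposed method.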

Conclusions
In this study, a new two-stage method is introduced to decrease the effects of the boundary problem in EMD. The proposed method is based on a coupling of LLQ at the first stage and classical EMD at the second stage. The empirical

Figure 1: Three test functions used in the simulation.

Table 2 :
The MSE of the classical EMD and the proposed method under a variety of boundary solutions and noise structures, different values of the quantile τ (0.25, 0.50, and 0.75), and sample size n = 100.

Table 3 :
The MSE of the classical EMD and the proposed method under a variety of boundary solutions and noise structures, different values of the quantile τ (0.25, 0.50, and 0.75), and sample size n = 100.

Table 4 :
The MSE of the classical EMD and the proposed method under a variety of boundary solutions and noise structures, different values of the quantile τ (0.25, 0.50, and 0.75), and sample size n = 100.

performance of the proposed method was tested in different numerical experiments by simulation and a real-data application. The results of these experiments illustrate the improvement of the EMD estimation in terms of MSE.