Robust Wavelet Estimation to Eliminate Simultaneously the Effects of Boundary Problems, Outliers, and Correlated Noise

Classical wavelet thresholding methods suffer from boundary problems caused by the application of wavelet transforms to a finite signal. As a result, large bias at the edges and artificial wiggles occur when the classical boundary assumptions are not satisfied. Although polynomial wavelet regression and local polynomial wavelet regression effectively reduce the risk of this problem, the estimates from these two methods can easily be affected by the presence of correlated noise and outliers, giving inaccurate estimates. This paper introduces two robust methods in which the effects of boundary problems, outliers, and correlated noise are taken into account simultaneously. The proposed methods combine a thresholding estimator with either a local polynomial model or a polynomial model, using the generalized least squares method instead of the ordinary one. A preliminary step that removes outlying observations through a statistical function is included as well. The practical performance of the proposed methods has been evaluated through simulation experiments and real data examples. The results are strong evidence that the proposed methods are extremely effective in correcting the boundary bias and eliminating the effects of outliers and correlated noise.


y_i = f(x_i) + ε_i,  i = 1, . . ., n. (1.1)

The classical model (1.1) assumes that the unknown function, f, is square integrable on the interval [0, 1]. The sequence {ε_i}_{i=1}^n is independent and normally distributed with zero mean and common variance σ². Wavelet methods have been used intensively over the last two decades for estimating an unknown function observed in the presence of noise, following the pioneering work presented in the seminal papers of Donoho and Johnstone [1, 2], where the concept of wavelet thresholding was introduced to the statistical literature.
Researchers have begun looking at situations where the usual assumptions are no longer satisfied. When the noise contains a certain amount of structure in the form of correlation, the variances of the wavelet coefficients depend on the resolution level of the wavelet decomposition but are constant within each level. As a result, global thresholding breaks down, since it is very difficult to provide a single global threshold value for the wavelet coefficients at all the desired levels. To overcome this limitation, Johnstone and Silverman [3] recommended level-by-level thresholding, that is, not one threshold value for all coefficients at all resolution levels but one depending on the level. Different thresholding methods have been considered, such as the universal threshold of Donoho and Johnstone [1], SURE of Donoho and Johnstone [2], and the translation-invariant denoising algorithm of Coifman and Donoho [4]; details can be found in Johnstone and Silverman [3]. The level-dependent thresholding method proposed by [3] considered the case where the noise is correlated but stationary. If the noise is correlated but not stationary, the DWT coefficients will be heteroscedastic and correlated. Kovac and Silverman [5] took these circumstances into consideration and proposed a more general thresholding method, which can be used for both stationary and nonstationary noise. Another method was proposed by Wang and Wood [6], in which the covariance structure of the data is taken into account; it is based on estimating the correlation structure of the noise in two different domains, namely, the time domain and the wavelet domain. As for the presence of outliers, much research has been carried out to eliminate their effects. Outliers in the data typically lead to large coefficients in the wavelet decomposition, and these large coefficients will not be removed by the usual thresholding approaches. One of the early robust wavelet
thresholding estimators dates back to Kovac [7], where thresholding was applied to the output of a median filter. Additionally, Kovac and Silverman [5] proposed a robust procedure involving the following steps. First, the problematic outliers are identified using a statistical test. Then, these outliers are removed from the data. The final step is to apply wavelet thresholding to the remaining irregularly spaced data to estimate the underlying function. Sardy et al. [8] proposed a robust M-type wavelet denoising method. Averkamp and Houdré [9] extended the minimax theory for wavelet thresholding to some known symmetric heavy-tail noises. In a similar way to Kovac [7], Wei et al. [10] used a two-stage robust denoising approach based on a median filter and wavelet thresholding. At the first stage, the corrupt data is passed through the median filter, which suppresses the outliers. At the second stage, the data is shrunk with a soft wavelet threshold, and the final reconstructed signal is obtained. More recently, new methods have been proposed by Oh et al. [11], based on the concept of pseudo data, and by Oh et al. [12], based on the concept of robust thresholding within robust estimation. Altaher and Ismail [13] showed that the practical performance of these methods can be improved by taking into consideration the automatic boundary treatment proposed by Oh et al.
[14] and Oh and Lee [15]. In all the literature discussed above, Kovac and Silverman [5] were the only ones to take into account the effects of correlated noise and outliers at the same time, but the limitation of their method is its use of the classical boundary assumptions. The method performs well when the boundary assumptions are satisfied, but its practical performance can be greatly improved by using the idea of polynomial wavelet regression (PWR) or local polynomial wavelet regression (LPWR) for automatic boundary correction. We conclude that there is a real need to find a proper way to take into simultaneous consideration the effects of the boundary problem, correlated noise, and outliers with one estimator; this is the main contribution of the present paper.
The rest of this paper is organized as follows. In Section 2, we give a brief background on automatic boundary treatments in wavelet regression. Section 3 reviews the wavelet estimation of Kovac and Silverman [5] and generalized least squares, followed by our proposed methods. A simulation experiment and real data examples are presented in Section 4. Conclusions are drawn in Section 5.

Boundary Treatment in Wavelet Thresholding
Boundary effects have been taken into account in classical wavelet thresholding by imposing on f boundary assumptions such as symmetry or periodicity. Unfortunately, such assumptions are not always valid, and the problem persists in some cases. The boundary problem can be treated automatically using the principles of either polynomial wavelet regression (PWR) or local polynomial wavelet regression (LPWR).

Polynomial Wavelet Regression (PWR)
The polynomial wavelet regression method (PWR) considered by Oh et al. [14] is based on a combination of wavelet functions f_W(x) and low-order polynomials f_P(x). The estimator of f, f_PW, is therefore

f_PW(x) = f_P(x) + f_W(x). (2.1)

The optimal order for the polynomial model enables the polynomial estimator to remove the "nonperiodicity" in the data. The remaining signal can then be well estimated using wavelet thresholding with, say, a periodic boundary assumption. Different approaches have been proposed for finding proper choices of the polynomial order and the threshold value. For the threshold value, the EBayesThresh procedure of [16] is the most popular, for its good theoretical properties and good performance in simulations and in practice. To find f_PW(x), it is proposed to regress the observed data {y_i}_{i=1}^n on {x, . . ., x^d} for a fixed order value, d. Once f_P(x) has been estimated, the remaining signal is expected to be hidden in the residuals, {e_i = y_i − f_P(x_i)}_{i=1}^n. The second step is therefore to apply wavelet regression to {e_i}_{i=1}^n. The final estimate of f is the sum of f_P(x) and f_W(x), as in (2.1). For more details, see, for example, Oh et al. [14] or Lee and Oh [17].
The use of PWR to resolve boundary problems works efficiently if the polynomial estimator f_P is able to remove the "nonperiodicity" in the data, and this certainly requires using a proper order with an appropriate threshold. Different criteria have been proposed to aid in choosing the order of the polynomial model that should be incorporated into the wavelet functions in order to obtain successful corrections of the bias in the boundary region. Here we give a brief introduction to the common ones.
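The two steps just described (fit a polynomial, then wavelet-threshold the residuals) can be sketched in Python. This is a minimal illustration, not the authors' implementation: a plain Haar transform stands in for the paper's wavelet machinery, and the polynomial order d and the MAD-based universal threshold are illustrative choices.

```python
import numpy as np

def haar_dwt(y):
    """Full Haar decomposition of a length-2^J signal.
    Returns (final approximation, [finest details, ..., coarsest details])."""
    details, a = [], y.astype(float)
    while len(a) > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a, details

def haar_idwt(a, details):
    """Invert haar_dwt."""
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

def soft(x, lam):
    """Soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def pwr_fit(x, y, d=2):
    """PWR sketch: f_PW = f_P + f_W, as in (2.1)."""
    coef = np.polyfit(x, y, d)                 # polynomial part f_P
    f_p = np.polyval(coef, x)
    resid = y - f_p                            # wavelets estimate what is left
    a, details = haar_dwt(resid)
    sigma = np.median(np.abs(details[0])) / 0.6745   # MAD scale, finest level
    lam = sigma * np.sqrt(2.0 * np.log(len(y)))      # universal threshold
    f_w = haar_idwt(a, [soft(dd, lam) for dd in details])
    return f_p + f_w
```

Because the polynomial absorbs the nonperiodic trend, the residuals passed to the wavelet step behave like a (roughly) boundary-compatible signal.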
(1) Lee and Oh [17] proposed two criteria. The first criterion uses the value of d that maximizes a goodness-of-fit quantity defined in terms of the parameter coefficients {a_i}_{i=0}^d of the polynomial model. In this method, a model of this order d is used with the SURE threshold.
(2) The second criterion is based on the Bayesian information criterion (BIC), which is employed to select the d that minimizes the BIC score. In this method, a model of order d should be used with the EBayesThresh of Johnstone and Silverman [16].
(3) Oh and Kim [18] proposed three different Bayesian methods based on integrated likelihood, conditional empirical Bayes, and reversible jump Markov chain Monte Carlo (MCMC). For the mathematical details regarding these methods, see Oh and Kim [18].
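The extracted text does not reproduce the BIC formula itself; the sketch below assumes the standard Gaussian-likelihood form BIC(d) = n log(RSS_d / n) + (d + 1) log n, which may differ in detail from the exact criterion of Lee and Oh [17].

```python
import numpy as np

def bic_order(x, y, d_max=6):
    """Pick the polynomial order d minimizing an assumed-standard BIC:
    n * log(RSS_d / n) + (d + 1) * log(n)."""
    n = len(y)
    scores = []
    for d in range(d_max + 1):
        coef = np.polyfit(x, y, d)
        rss = np.sum((y - np.polyval(coef, x)) ** 2)
        scores.append(n * np.log(rss / n) + (d + 1) * np.log(n))
    return int(np.argmin(scores))
```

The log n penalty keeps the selected order low unless a higher-order term buys a genuine drop in residual sum of squares.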

Local Polynomial Wavelet Regression (LPWR)
The local polynomial wavelet regression method was introduced by Oh and Lee [15] as an improved boundary adjustment in wavelet regression. Instead of using a global polynomial fit as in Oh et al. [14], it was proposed to use a local polynomial fit, f_LP. The local polynomial wavelet regression estimator, f_LPWR(x), can therefore be written as

f_LPWR(x) = f_LP(x) + f_W(x).

As shown in Oh and Lee [15], f_LPWR(x) can be computed through an iterative algorithm inspired by the backfitting algorithm; see Johnstone and Silverman [16]. The following steps summarize the key points for finding the final local polynomial wavelet regression estimate, f_LPWR(x).
(1) Select an initial estimate f^(0) for f and let f_LPWR = f^(0).
International Journal of Mathematics and Mathematical Sciences

Wavelet Estimation of Kovac and Silverman [5]
The level-dependent thresholding method proposed by Johnstone and Silverman [3] considers the case where the noise is correlated but stationary. In this case, the variances of the wavelet coefficients differ between levels but are identical within each level, so the coefficients can be thresholded level by level; see Altaher and Ismail [19] for numerical simulation results. On the other hand, if the noise is correlated but not stationary, the discrete wavelet transform (DWT) coefficients will be heteroscedastic and correlated. Kovac and Silverman [5] took these circumstances into consideration and proposed a more general thresholding method which can be used for both stationary and nonstationary noise.
As shown in Kovac and Silverman [5], an algorithm can be used to find all the variances and within-level covariances in the wavelet table of a sequence with a given covariance structure. If the covariance matrix of the original data is band limited, then the algorithm is linear in the length of the sequence. It has also been shown that the variance-calculation algorithm allows data on any set of independent variable values to be treated as follows: first, the noisy data are interpolated to a new equally spaced grid of a suitable length (the function makegrid, available in R, can be used); then, wavelet thresholding is applied to the gridded data.
Based on the information given in Kovac and Silverman [5], the key points of this method are as follows.
Given a discrete sequence y = (y_1, . . ., y_n) observed at x_1, . . ., x_n, a new equally spaced grid t_1, . . ., t_N on [0, 1] is generated, where N is a suitable power of two. Based on this grid, the original data values y_i at x_i are linearly interpolated to new data values y* on the new grid:

y*_j = y_i + (t_j − x_i)(y_{i+1} − y_i)/(x_{i+1} − x_i),  x_i ≤ t_j ≤ x_{i+1}. (3.1)
This interpolation can be written in matrix form as y* = Ry, where R is the linear transformation matrix. The first step of wavelet regression is to apply the discrete wavelet transform, which can be written in matrix form as w = Wy*, where W is the N × N wavelet matrix. For all j and k, thresholds of the form λ_{j,k} = σ_{j,k} λ are considered, where λ = √(2 log n) is the universal threshold and σ_{j,k} is the standard deviation of the corresponding wavelet coefficient. Although this method works fine for many different kinds of functions, only the classical treatment of the boundary problem is considered (such as assumptions of periodicity or symmetry). Now, as a motivation for our proposed methods, which will be described later, we present a very popular function, called fg1, of Donoho and Johnstone [1] to show how badly this traditional method behaves when the regression function is neither periodic nor symmetric. We then show how the estimation quality can be improved by combining this method with a polynomial model or a local polynomial model. Figure 1 depicts the weakness and the improvement in the estimation method of [5]. In this figure, it can be seen that the best fit is given by our proposed methods, that is, when we combine the thresholding method of Kovac and Silverman [5] with polynomial or local polynomial models.
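The gridding and level-dependent thresholding ingredients can be sketched as follows. This is an illustration under stated assumptions: np.interp stands in for R's makegrid, and a per-level MAD scale estimate replaces the exact Kovac-Silverman variance computation, which propagates the covariance RΣRᵀ of the gridded data through the wavelet transform.

```python
import numpy as np

def make_grid(x, y, N=128):
    """Linearly interpolate irregular (x, y) onto an equally spaced grid of
    dyadic length N on [0, 1] (a numpy stand-in for R's makegrid)."""
    t = np.linspace(0.0, 1.0, N)
    return t, np.interp(t, x, y)

def level_thresholds(details, n):
    """Level-dependent universal thresholds lambda_j = sigma_j * sqrt(2 log n),
    with sigma_j estimated by the MAD of the level-j detail coefficients."""
    lams = []
    for d in details:
        sigma_j = np.median(np.abs(d)) / 0.6745
        lams.append(sigma_j * np.sqrt(2.0 * np.log(n)))
    return lams
```

Estimating a separate scale per level is what lets the thresholds adapt when the noise variance differs across resolution levels.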
Details of these methods will be given next, but first we introduce the method of generalized least squares, which will be used in the first stage of our proposed methods.

Ordinary Least Squares versus Generalized Least Squares
Consider the standard linear model

Y = Xβ + ε, (3.6)

where Y is the n × 1 response vector; X is an n × p model matrix; β is a p × 1 vector of parameters to be estimated; and ε is an n × 1 vector of errors. The errors could have constant variance and be uncorrelated, ε ∼ N(0, σ²I_n), or be correlated, ε ∼ N(0, Ω), where Ω is the variance-covariance matrix of the errors. Based on these two assumptions, we can distinguish two formulas for the parameter estimates.
The presence of correlated noise has several effects on OLS.Although the regression coefficient estimates are still unbiased, they are no longer minimum-variance estimates.
In the case of positively correlated noise, the residual mean squared error may seriously underestimate the error variance, σ². Consequently, the standard errors of the regression coefficients may be too small, and the confidence intervals are shorter than they should be; see Montgomery et al. [20].
An alternative way to estimate the regression coefficients in the presence of correlated noise is to use the generalized least squares estimate

β̂_GLS = (XᵀΩ⁻¹X)⁻¹ XᵀΩ⁻¹Y. (3.8)
Using the Cholesky decomposition, the matrix Ω can be written in terms of a triangular matrix S such that Ω = SSᵀ. The model in (3.6) can then be written as

Y* = X*β + ε*, (3.10)

where Y* = S⁻¹Y, X* = S⁻¹X, and ε* = S⁻¹ε. In the new model (3.10), one can easily show that var(ε*) = var(S⁻¹ε) = I_n.
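The whitening argument above translates directly into code: factor Ω = SSᵀ by Cholesky, transform X and Y by S⁻¹, and solve the transformed model by ordinary least squares. A minimal sketch (the AR(1) covariance used in the demonstration is an illustrative choice):

```python
import numpy as np

def gls(X, Y, Omega):
    """Generalized least squares via the Cholesky factor of Omega:
    whiten the model with S^{-1} (Omega = S S^T), then run OLS."""
    S = np.linalg.cholesky(Omega)        # lower triangular, Omega = S S^T
    Xs = np.linalg.solve(S, X)           # X* = S^{-1} X
    Ys = np.linalg.solve(S, Y)           # Y* = S^{-1} Y
    beta, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
    return beta
```

Solving triangular systems instead of forming Ω⁻¹ explicitly is both cheaper and numerically safer.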

Proposed Robust Estimation Method for Data with Outliers and Correlated Noise
In this section, we propose two two-stage combined methods which can be used in the presence of outliers and correlated noise. The classical approach in PWR and LPWR is to use OLS to obtain the residuals. Because of the serious effects of correlated noise on OLS, we suggest using the method of generalized least squares (GLS) as an effective alternative to remove, or at least reduce, these effects. A preliminary step for removing outlying observations is employed before applying GLS.
At the second stage, we employ the wavelet thresholding estimator illustrated in Section 3.1 to remove the remaining noise. The final estimate is therefore the sum of the function estimate from the polynomial (or local polynomial) model at the first stage and the estimate of the Kovac and Silverman [5] estimator at the second stage.
The following two algorithms describe our two proposed methods; the first is for PWR and the second for LPWR.

Algorithm 3.1. Let {x_i, y_i} be the original noisy data sample. The following steps describe the first proposed method.
(1) Remove any outliers in {y_i}. The function get.outliers, available in R, can be used to determine probable problematic outliers.
(2) Interpolate the remaining data to a new equally spaced design and use the interpolated data in the next steps. The function makegrid, available in R, can be used.
(3) Apply the generalized least squares method, selecting the optimal polynomial order, to obtain f_P and then the residuals e_i. The function gls, available in R in the nlme package, is used to apply GLS.
(4) Apply the wavelet thresholding procedure of [5] to the residuals and find f_W.
(5) The final estimate is f_P + f_W.
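The five steps above can be sketched end to end with numpy stand-ins: a MAD rule in place of get.outliers, np.interp in place of makegrid, a Cholesky-whitened polynomial fit in place of gls, and Haar universal soft thresholding in place of the Kovac-Silverman estimator. The AR(1) parameter rho, the polynomial order d, and the grid length N are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def robust_pwr(x, y, d=2, rho=0.5, N=128):
    """Sketch of Algorithm 3.1 (assumed AR(1) noise covariance)."""
    # (1) remove gross outliers by a MAD rule (stand-in for get.outliers)
    med = np.median(y)
    mad = np.median(np.abs(y - med)) / 0.6745
    keep = np.abs(y - med) <= 3.5 * mad
    x, y = x[keep], y[keep]
    # (2) interpolate to an equally spaced dyadic grid (stand-in for makegrid)
    t = np.linspace(0.0, 1.0, N)
    yg = np.interp(t, x, y)
    # (3) GLS polynomial fit under the assumed AR(1) covariance
    X = np.vander(t, d + 1)
    Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    S = np.linalg.cholesky(Omega)
    beta, *_ = np.linalg.lstsq(np.linalg.solve(S, X),
                               np.linalg.solve(S, yg), rcond=None)
    f_p = X @ beta
    # (4) Haar soft thresholding of the residuals
    a, dets = yg - f_p, []
    while len(a) > 1:
        dets.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    lam = np.median(np.abs(dets[0])) / 0.6745 * np.sqrt(2.0 * np.log(N))
    dets = [np.sign(dd) * np.maximum(np.abs(dd) - lam, 0.0) for dd in dets]
    f_w = a
    for dd in reversed(dets):
        out = np.empty(2 * len(f_w))
        out[0::2] = (f_w + dd) / np.sqrt(2.0)
        out[1::2] = (f_w - dd) / np.sqrt(2.0)
        f_w = out
    # (5) final estimate on the grid
    return t, f_p + f_w
```

Each numbered comment mirrors the corresponding step of the algorithm, so the R implementation can be swapped in step by step.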
Algorithm 3.2. In a similar way to Algorithm 3.1, the second proposed method involves the following steps.
(1) Remove any outliers in {y_i}. The function get.outliers is used.
(2) The function makegrid is used to interpolate the remaining data to a new equally spaced design; the interpolated data are used in the next steps.
(3) Apply a local polynomial estimator to the interpolated data to obtain f_LP and then the residuals.
(4) Apply the wavelet thresholding procedure of Kovac and Silverman [5] to the residuals and find f_W.
(5) The final estimate is f_LP + f_W.
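Algorithm 3.2 differs only in step (3). A minimal local polynomial smoother (here local linear, with a Gaussian kernel and bandwidth h as illustrative choices) that could play the role of f_LP:

```python
import numpy as np

def local_linear(x, y, h=0.1):
    """Local linear fit with a Gaussian kernel: at each point solve a
    weighted least-squares line and keep its fitted value there."""
    fhat = np.empty_like(y, dtype=float)
    for i, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # kernel weights around x0
        X = np.column_stack([np.ones_like(x), x - x0])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        fhat[i] = beta[0]                        # intercept = fit at x0
    return fhat
```

A local linear fit is exact on straight lines, which is precisely the property that reduces boundary bias relative to a kernel average.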

Simulation Experiment
The R statistical package was used to carry out a simulation study comparing the numerical performance of the following three methods:
(I) the wavelet regression method of Kovac and Silverman [5];
(II) the proposed method based on a polynomial model;
(III) the proposed method based on a local polynomial model.
We simulated 100 time series of length 128 using three common correlated process models with different parameters: first- and second-order autoregressive models and a first-order moving average model; see Table 1. Generating time series data ensures that we have data with correlated noise, while the added noise settings play the role of introducing outliers. For example, in the second noise setting, a small fraction of the clean observations is replaced by symmetric outliers (SOs), and in the third noise setting, by asymmetric outliers (AOs). The last noise setting is a heavy-tail noise, a common example of non-Gaussian noise with outliers.
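A sketch of how such noise settings can be generated; the AR coefficient, outlier fraction, and outlier size below are illustrative, not the exact design values of the paper (which follows Gelper et al. [21]).

```python
import numpy as np

def ar1_noise(n, phi, sigma=1.0, rng=None):
    """Generate AR(1) noise e_t = phi * e_{t-1} + z_t."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.normal(0.0, sigma, n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + z[t]
    return e

def add_symmetric_outliers(e, frac=0.05, size=5.0, rng=None):
    """Shift a small fraction of values by +/- size (the SO setting)."""
    rng = np.random.default_rng(1) if rng is None else rng
    e = e.copy()
    idx = rng.choice(len(e), max(1, int(frac * len(e))), replace=False)
    e[idx] += rng.choice([-size, size], len(idx))
    return e
```

Asymmetric outliers (AOs) would use a one-sided shift, and heavy-tail noise could be drawn from, say, a t-distribution with few degrees of freedom.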
The numerical results for the global mean squared error and local mean squared error are given in Tables 3, 4, and 5. Having examined these simulation results, the following empirical remarks can be made.
(i) In terms of the global mean squared error: (1) in the case of clean data, the global mean squared error shows that the proposed method (III) outperforms the other two methods for all three models with their four different parameters; among the other two methods, the proposed method (II) outperforms the traditional method (I); (4) in the case of fat-tail data, the global mean squared error shows that the proposed method (III) performs much better than the other two methods for all three models with their four different parameters.
(ii) In terms of the local mean squared error: (1) in the case of clean data, the local mean squared error shows that the proposed method (III) outperforms the remaining two methods for all three models and their four different parameters; the only exceptions were AR(1) with parameter 0.3 and AR(2) with parameters (0.1, 0.5). (2) In the case of symmetric outliers, the local mean squared error shows that the two proposed methods (II) and (III) outperform the traditional method of Kovac and Silverman [5]; for AR(1) and AR(2), the proposed method (II) achieved better results except for the parameters 0.3 and (0.1, 0.5). (3) In the case of asymmetric outliers, the local mean squared error shows that the proposed method (II) outperforms the other two methods except for AR(1) with parameter 0.9 and MA(1) with parameter 3.8. (4) In the case of long-tail data, the local mean squared error shows that the proposed method (III) performs better than the other two methods. (5) Generally, in terms of the global mean squared error, out of the 48 different model/parameter combinations used in this simulation, the first proposed method won 25 times and the second proposed method won 23 times, while the traditional method of Kovac and Silverman [5] did not win even once. In terms of the local mean squared error, out of the same 48 combinations, the first proposed method won 19 times and the second proposed method won 22 times, while the traditional method of Kovac and Silverman [5] won only 7 times.
Overall, this simulation experiment revealed that the two proposed methods are preferable to that of Kovac and Silverman [5] for fitting data with correlated noise and outliers.

Evaluation through Real Data Examples
In this section, we apply the traditional method of Kovac and Silverman [5] and the two newly proposed methods to two data sets. The first data set is the US monthly influenza mortality data taken from Shumway and Stoffer [22]. The response variable denotes the mortality caused by pneumonia and influenza in month t. Figure 2 is a plot of the monthly pneumonia and influenza deaths per 10,000 in the US over the 11 years from 1968 to 1978. Many interesting features can be seen in these data. The number of deaths typically tends to increase more slowly than it decreases. There is no reason to think that this time series follows a linear Gaussian process; if that were the case, we would not see such large bursts of positive and negative changes occurring periodically.
The second data set, also from Shumway and Stoffer [22], consists of the average global temperature deviations, in degrees centigrade, for n = 64 years (1880–1943); see Figure 3.
We first study the correlation structure to make sure that we have data with correlated noise. Figure 4 shows the autocorrelation functions and their corresponding partial autocorrelation functions. We then follow up by computing the Durbin-Watson statistics for the OLS regression estimates; see Table 2. The results are clear evidence of significant correlations at lags 1, 4, and 5 for both data sets. Figure 5 shows the estimated curves for the first data set using the two proposed methods (panels (c) and (d)) and the traditional method of Kovac and Silverman [5] (panel (b)). Visual evaluation is not sufficient for determining which method is most accurate, so we used the mean squared error for judgement. The mean squared error values are 0.01127487, 0.01046806, and 0.004596079 for the traditional method, the first proposed method, and the second proposed method, respectively. This means that the best fit is given by the second proposed method, then by the first proposed method, and finally by the traditional method. The fitting of the second data set is depicted in Figure 6. The mean squared error values are 0.008085288, 0.006734333, and 0.005019866, so again the best fit is given by the second proposed method, then by the first proposed method, and finally by the traditional method.
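The Durbin-Watson check used above is simple to reproduce: the statistic is d = Σ_t (e_t − e_{t−1})² / Σ_t e_t², with values near 2 indicating no lag-1 correlation and values well below 2 indicating positive correlation. A minimal sketch:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by their sum of squares (roughly 2 * (1 - rho_hat))."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```

Applying this to the OLS residuals of each data set reproduces the kind of evidence summarized in Table 2.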
A similar conclusion is reached when we examine the local mean squared error. The local mean squared error values for the first data set are 0.01767953, 0.01677273, and 0.00516326 and, for the second data set, 0.007946142, 0.002644399, and 0.002241301.

Conclusion
In this paper, the boundary problem and the effects of correlated noise and outliers have been considered for wavelet thresholding estimation. Two combined methods have been proposed to eliminate simultaneously the effects of the boundary problem, the outliers, and the correlated noise. The proposed methods combine the thresholding estimator of [5] with either a local polynomial model or a polynomial model, through generalized least squares estimation. A simulation experiment and real data examples have been used to evaluate the practical performance of the proposed methods. Overall, the results reveal that the two proposed methods are preferable to that of Kovac and Silverman [5] for fitting data with correlated noise and outliers.

(2) For j = 1, 2, . . ., iterate the following steps: (a) apply wavelet thresholding to the residuals y_i − f_LPWR(x_i) and obtain f_W^(j); (b) estimate f_LP^(j) by fitting a local polynomial regression to the current residuals y_i − f_W^(j)(x_i).

Figure 1: Curve estimation of data with correlated noise: (a) fitted by level-dependent thresholding, (b) fitted by the method of Kovac and Silverman [5], (c) fitted by the first proposed method, and (d) fitted by the second proposed method. The black line is the fitted curve, while the red line is the true function.

Figure 4: Autocorrelation and partial autocorrelation functions for the residuals from the OLS method. The first row is for the first data set, the second row for the second.

Figure 5: Curve estimation for the first data set: (a) noisy data, (b) the fit by [5], (c) the fit by the first proposed method, and (d) the fit by the second proposed method.

Figure 6: Curve estimation for the second data set: (a) noisy data, (b) the fit by [5], (c) the fit by the first proposed method, and (d) the fit by the second proposed method. The green line is the fit.

Table 1: Autoregressive models used in the simulations.

For each model, we added four different kinds of noise, similar to Gelper et al. [21]. Then the three wavelet regression methods (I, II, and III) were applied, and finally the global and local mean squared errors were computed for comparison. The four kinds of noise were as follows.

Table 2: Summary of the Durbin-Watson test.

Table 3: Simulation results of global and local mean squared errors using AR(1) with different parameters. The first half of the table displays the global mean squared error.

Table 4: Simulation results of global and local mean squared errors using AR(2) with different parameters. The first half of the table displays the global mean squared error.

Table 5: Simulation results of global and local mean squared errors using MA(1) with different parameters. The first half of the table displays the global mean squared error.