Robust Wild Bootstrap for Stabilizing the Variance of Parameter Estimates in Heteroscedastic Regression Models in the Presence of Outliers

Bootstrap techniques are nowadays used for data analysis in many fields, such as engineering, physics, meteorology, medicine, biology, and chemistry. In this paper, the robustness of the wild bootstrap techniques of Wu (1986) and Liu (1988) is examined. The empirical evidence indicates that these techniques yield efficient estimates in the presence of heteroscedasticity. However, in the presence of outliers, these estimates are no longer efficient. To remedy this problem, we propose a robust wild bootstrap for stabilizing the variance of the regression estimates when heteroscedasticity and outliers occur at the same time. The proposed method is based on weighted residuals that incorporate the MM estimator, robust location and scale estimates, and the bootstrap sampling schemes of Wu (1986) and Liu (1988). The results of this study show that the proposed method outperforms the existing ones in every respect.


Introduction
The bootstrap technique was first proposed by Efron [1]. It is a computer-intensive method that can replace theoretical formulation with extensive use of the computer. The attractive feature of the bootstrap technique is that it does not rely on normality or any other distributional assumption and is able to estimate the standard error of any complicated estimator without theoretical calculations. These interesting properties have to be traded off against computational cost and time. A considerable number of papers deal with bootstrap methods in the literature (see [2-5]). The classical bootstrap methods are known to be good general procedures for estimating a sampling distribution under independent and identically distributed (i.i.d.) models. Let us consider the standard linear regression model

Y = Xβ + ε,

where Y = (y_1, y_2, ..., y_n)ᵀ, X = (x_1, x_2, ..., x_n)ᵀ, and ε = (ε_1, ε_2, ..., ε_n)ᵀ. In this equation β is a k × 1 vector of unknown parameters, Y is an n × 1 vector, X is an n × k data matrix of full rank k ≤ n, and ε is an n × 1 vector of unobservable random errors with E(ε) = 0 and Var(ε) = σ²I. In practice the i.i.d. setup is often violated; for example, the homoscedasticity assumption Var(ε_i) = σ² frequently fails. Wu [6] proposed a weighted bootstrap technique that gives better performance under both the homoscedastic and heteroscedastic models. A further alternative approximation was developed by Liu [7], following suggestions of Wu [6] and Beran [8]. This type of weighted bootstrap is called the wild bootstrap in the literature. Several attempts have been made to use the Wu and Liu wild bootstrap techniques to remedy the problem of heteroscedasticity (see [6, 7, 9, 10]).
Salibian-Barrera and Zamar [11] pointed out that a problem with the classical bootstrap is that the proportion of outliers in a bootstrap sample might be greater than that in the original data; hence the entire bootstrap inferential procedure can be erroneous in the presence of outliers. As an alternative, robust bootstrap techniques have drawn greater attention from statisticians (see [11-15]). However, not much work has been devoted to bootstrap techniques when both outliers and heteroscedasticity are present in the data. The wild bootstrap techniques can rectify only the problem of heteroscedasticity and are not resistant to outliers; moreover, these procedures are based on the OLS estimate, which is very sensitive to outliers. We introduce the classical wild bootstrap in Section 2. In Section 3, we discuss the newly proposed robust wild bootstrap methods. A numerical example and a simulation study are presented in Sections 4 and 5, respectively. The conclusion of the study is given in Section 6.

Wild Bootstrap Techniques
In regression analysis, the most popular and widely used bootstrap technique is fixed-x resampling, or bootstrapping the residuals [2]. This procedure is based on the ordinary least squares (OLS) residuals and is summarized as follows.
Step 1. Fit the model y_i = f(x_i, β) by the OLS method to the original sample of observations to get β̂_ols; the fitted model is ŷ_i = f(x_i, β̂_ols).
Step 2. Compute the OLS residuals ε̂_i = y_i − ŷ_i; each residual ε̂_i has equal probability 1/n.

Step 3. Draw a random sample ε*_1, ε*_2, ..., ε*_n from the ε̂_i by simple random sampling with replacement, and attach the draws to the ŷ_i to obtain the fixed-x bootstrap values y*_i = f(x_i, β̂_ols) + ε*_i.

Step 4. Fit the OLS to the bootstrapped values y*_i on the fixed x to obtain β̂*_ols.
We call this bootstrap scheme Boot_ols since it is based on the OLS method. When heteroscedasticity is present, the variances of the observations differ and this bootstrap scheme cannot yield efficient estimates of the parameters; Wu [6] showed that the resulting estimates are inconsistent and asymptotically biased under heteroscedasticity. Wu [6] proposed a wild (weighted) bootstrap that can be used to obtain standard errors that are asymptotically correct under heteroscedasticity of unknown form. Wu slightly modified Step 3 of the OLS bootstrap and kept the other steps unchanged: for each i, draw a value t*_i, with replacement, from a distribution with zero mean and unit variance, and form the fixed-x bootstrap values y*_i = ŷ_i + t*_i ε̂_i/√(1 − h_i), where h_i = x_iᵀ(XᵀX)⁻¹x_i is the ith leverage. Note that the variance of t*_i ε̂_i is not constant when the original errors are not homoscedastic, so this bootstrap scheme takes the non-constancy of the error variances into account. As an alternative [6], t*_i can be chosen, with replacement, from a_1, a_2, ..., a_n, where a_i = (ε̂_i − ε̄̂)/{(1/n) Σ_j (ε̂_j − ε̄̂)²}^(1/2). For a regression model with an intercept term, ε̄̂ approximately equals zero. This is a nonparametric implementation of Wu's bootstrap, since the resampling is done from the empirical distribution of the normalized residuals. We call this method Wu's bootstrap and denote it by Boot_wu.
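Wu's scheme can be sketched in a few lines. The implementation below is an illustrative reconstruction (the authors' own code was written in S-Plus), using the nonparametric variant in which t*_i is resampled from the normalized residuals a_i; the toy heteroscedastic data at the end are purely for demonstration.

```python
import numpy as np

def wu_wild_bootstrap(X, y, B=500, rng=None):
    """Fixed-x wild bootstrap of Wu (1986), nonparametric variant.

    Returns a (B, k) array of bootstrap OLS estimates."""
    rng = np.random.default_rng(rng)
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS fit
    fitted = X @ beta
    resid = y - fitted
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)      # leverages h_i
    # normalized residuals a_i: centered, scaled to unit variance
    a = (resid - resid.mean()) / np.sqrt(np.mean((resid - resid.mean()) ** 2))
    boot = np.empty((B, k))
    for b in range(B):
        t_star = rng.choice(a, size=n, replace=True)   # draw t*_i with replacement
        y_star = fitted + t_star * resid / np.sqrt(1.0 - h)
        boot[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return boot

# Usage: bootstrap standard errors under heteroscedastic errors
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 1.0]) + np.exp(X[:, 1]) * rng.normal(size=n)
se = wu_wild_bootstrap(X, y, B=500, rng=2).std(axis=0, ddof=1)
```

The key difference from Boot_ols is that each residual stays attached to its own design point, so unequal error variances are preserved in the bootstrap sample.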
Following the idea of Wu [6], another wild bootstrap technique was proposed by Liu [7], in which t*_i is randomly selected from a population with zero mean, unit variance, and third central moment equal to one. This choice corrects the skewness term in the Edgeworth expansion of the sampling distribution of linear combinations of β̂. Liu's bootstrap can be conducted by drawing the random numbers t*_i in two ways; selecting the t*_i by either procedure of Liu [7] produces a third central moment equal to one. Following Cribari-Neto and Zarkos [16], we consider the second procedure for drawing the random sample t*_i. We call this bootstrap scheme Boot_liu.
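The second procedure is not reproduced above, so the sketch below implements the product-of-two-normals construction commonly attributed to Liu [7]: t*_i = H_iG_i − E(H_i)E(G_i), with H_i and G_i independent normals whose means and common variance 1/2 are chosen so that t*_i has mean 0, variance 1, and third central moment 1. The specific constants are our reconstruction, not a quotation from the paper.

```python
import numpy as np

def liu_t_star(n, rng=None):
    """Draw n wild-bootstrap multipliers with E t = 0, Var t = 1,
    and third central moment E t^3 = 1 (Liu-type construction)."""
    rng = np.random.default_rng(rng)
    # means chosen so that mu_h^2 + mu_g^2 = 3/2 and mu_h * mu_g = 2/3
    mu_h = 0.5 * (np.sqrt(17 / 6) + np.sqrt(1 / 6))
    mu_g = 0.5 * (np.sqrt(17 / 6) - np.sqrt(1 / 6))
    H = rng.normal(mu_h, np.sqrt(0.5), size=n)   # variance 1/2
    G = rng.normal(mu_g, np.sqrt(0.5), size=n)   # variance 1/2
    return H * G - mu_h * mu_g

# Empirical check of the first three moments
t = liu_t_star(400_000, rng=0)
moments = (t.mean(), t.var(), ((t - t.mean()) ** 3).mean())
```

With these constants, Var(t) = (μ_H² + μ_G²)/2 + 1/4 = 1 and E t³ = 6 μ_H μ_G (1/2)(1/2) = 1, which is exactly the moment condition Liu's scheme requires.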

Proposed Robust Wild Bootstrap Techniques
We have discussed the classical wild bootstrap procedures, which are based on the OLS residuals. It is now evident that the OLS suffers a huge setback in the presence of outliers, since it has a 0% breakdown point [17]; because the wild bootstrap samples are based on the OLS residuals, they are not resistant to outliers. Hence, in this article we propose to use the high-breakdown, high-efficiency robust MM estimator [18] to obtain robust residuals. For good data points, the residuals of the MM estimator are expected to be approximately the same as the OLS residuals, while for outlying observations the MM residuals will be larger. We then assign weights to the MM residuals. The standardized residuals |ε̂_MM,i|/σ̂_MM are computed, where σ̂_MM is the square root of the mean squared error of the residuals of the MM estimates (see [19]). Following the idea of Furno [20], the weights equal one when |ε̂_MM,i|/σ̂_MM ≤ c and c σ̂_MM/|ε̂_MM,i| otherwise, where c is an arbitrary constant chosen between 2 and 3. We multiply these weights by the residuals of the MM estimates; the resulting weighted residuals are denoted ε̂_WMM,i.
It is now expected that the weighted residuals ε̂_WMM,i corresponding to both the good and the bad data points behave like OLS residuals from data containing no outliers. Based on these new weighted residuals, we propose to robustify Boot_ols, Boot_wu, and Boot_liu; we call the resulting robust bootstraps RBoot_ols, RBoot_wu, and RBoot_liu.
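As a concrete illustration, the weighting step can be written as follows. The Huber-type form of the weight function (one inside the band, c·σ̂/|residual| outside) is our reading of Furno [20], and c = 2.5 is one choice from the stated range 2 to 3; the robust scale σ̂_MM is taken as given (in the paper it comes from the MM fit).

```python
import numpy as np

def weight_mm_residuals(resid_mm, sigma_mm, c=2.5):
    """Furno-type weights: w = 1 for |e|/sigma <= c, else w = c*sigma/|e|.
    Returns the weighted residuals e_WMM = w * e."""
    z = np.abs(resid_mm) / sigma_mm           # standardized residuals
    w = np.where(z <= c, 1.0, c / z)          # downweight only large residuals
    return w * resid_mm

# Usage: good residuals pass through unchanged; an outlying residual
# is shrunk so its magnitude equals c * sigma
e = np.array([0.4, -1.1, 0.7, 12.0])
e_w = weight_mm_residuals(e, sigma_mm=1.0, c=2.5)   # last entry becomes 2.5
```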
In RBoot_ols we propose to replace the OLS residuals by the weighted MM residuals ε̂_WMM,i in the residual-resampling scheme (simple random sampling with replacement), leaving the other steps unchanged. We now discuss the formulation of the robust wild bootstrap based on Wu's procedure; the algorithm is summarized as follows.
Step 1. Fit the model y_i = x_iᵀβ + ε_i by the MM estimator to the original sample of observations to get the robust parameter estimates β̂_MM; the fitted model is ŷ_i = x_iᵀβ̂_MM.
Step 2. Compute the residuals of the MM estimate, ε̂_MM,i = y_i − ŷ_i, and assign to each residual ε̂_MM,i the weight w_i = 1 if |ε̂_MM,i|/σ̂_MM ≤ c and w_i = c σ̂_MM/|ε̂_MM,i| otherwise.

Step 3. The final weighted residuals of the MM estimates, denoted ε̂_WMM,i, are formed by multiplying the weights obtained in Step 2 by the residuals of the MM estimates; that is, ε̂_WMM,i = w_i ε̂_MM,i.

Step 4. Construct a bootstrap sample (y*_i, X), where y*_i = x_iᵀβ̂_MM + t*_i ε̂_WMM,i/√(1 − h_i) and t*_i is a random sample drawn following Wu's [6] procedure.
Step 5. The OLS procedure is then applied to the bootstrap sample (y*_i, X), and the resulting estimate is denoted β̂*. Here the robust estimates are very reliable, since the bootstrap sample is constructed from the robust weighted residuals ε̂_WMM,i.
Step 6. Repeat Steps 4 and 5 B times, where B is the number of bootstrap replications.
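Putting Steps 1-6 together, a minimal sketch of RBoot_wu is given below. Since a full MM estimator is lengthy, Step 1 is replaced here by a Huber M-estimate computed by iteratively reweighted least squares as a stand-in; the NMAD normalization of the resampling pool and the leverage adjustment mirroring Wu's scheme are assumptions of this sketch, not code from the paper.

```python
import numpy as np

def huber_m_fit(X, y, c=1.345, iters=50):
    """Huber M-estimate via IRLS (a stand-in for the MM estimator)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale
        z = r / max(s, 1e-12)
        w = np.where(np.abs(z) <= c, 1.0, c / np.abs(z))   # Huber weights
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)         # weighted normal eqs
    return beta

def rboot_wu(X, y, B=500, c=2.5, rng=None):
    """Robust wild bootstrap sketch (RBoot_wu): weighted robust residuals
    plus Wu-type resampling of NMAD-normalized residuals."""
    rng = np.random.default_rng(rng)
    n, k = X.shape
    beta_r = huber_m_fit(X, y)                     # Step 1: robust fit
    fitted = X @ beta_r
    e = y - fitted                                 # Step 2: robust residuals
    sigma = np.sqrt(np.mean(e ** 2))
    z = np.abs(e) / sigma
    e_wmm = np.where(z <= c, 1.0, c / z) * e       # Step 3: weighted residuals
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages
    med = np.median(e_wmm)
    nmad = np.median(np.abs(e_wmm - med)) / 0.6745
    a = (e_wmm - med) / nmad                       # NMAD-normalized pool
    boot = np.empty((B, k))
    for b in range(B):                             # Steps 4-6
        t_star = rng.choice(a, size=n, replace=True)
        y_star = fitted + t_star * e_wmm / np.sqrt(1.0 - h)
        boot[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return boot

# Usage: data with a few planted outliers
rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
y[:3] += 15.0                                      # plant outliers
se_robust = rboot_wu(X, y, B=300, rng=4).std(axis=0, ddof=1)
```

Because the resampled residual pool is built from downweighted robust residuals, planted outliers inflate the bootstrap standard errors far less than they would under Boot_wu.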
As discussed earlier, in the classical Wu bootstrap scheme the quantity t*_i is drawn from a population with mean zero and variance one; alternatively, t*_i can be drawn from the normalized residuals a_1, a_2, ..., a_n, that is, a_i = (ε̂_i − ε̄̂)/{(1/n) Σ_j (ε̂_j − ε̄̂)²}^(1/2). In this paper we also robustify the wild bootstrap based on the Liu [7] algorithm. It is important to note that the only difference between the Wu and Liu implementations of the wild bootstrap is the choice of the random sample t*_i. In the proposed robust bootstrap based on the Liu wild bootstrap, we choose the random sample t*_i in exactly the same manner as in the classical Liu bootstrap. We call this bootstrap scheme RBoot_liu.

Numerical Example
In this section, a numerical example is presented to assess the performance of the robust wild bootstrap methods. To compare the robustness of the classical and robust wild bootstrap in the presence of outliers, the Concrete Compressive Strength data are taken from Yeh [22]. Concrete is the most important material in civil engineering. The concrete compressive strength is a function of eight input variables: cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate (all in kg/m³), and the age of testing (days). The residuals versus fitted values are plotted in Figure 1; the funnel shape suggests heterogeneous error variances for these data (see [19]).
We checked whether this data set contains any outliers by using the least trimmed squares (LTS) residuals. It was found that 61 observations (about 6% of the sample of size 1030) appear to be outliers. The robust and non-robust (classical) wild bootstrap methods were then applied to the data under two situations: the data with outliers and the data without outliers (the outlying data points omitted). The results are based on 500 bootstrap replications and are given in Table 1.
The standard errors of the parameter estimates from the robust and non-robust wild bootstrap methods are exhibited in Table 1, together with the average standard errors of the parameter estimates. When there are no outliers, the standard errors of the classical wild bootstrap are reasonably close to those of the robust wild bootstrap. It is interesting to note that the classical wild bootstrap methods provide larger standard errors than the robust wild bootstrap methods when outliers are present in the data. We cannot draw a final conclusion just by observing the results of the real data, but a reasonable interpretation at this stage is that the classical wild bootstrap is affected by outliers.

Simulation Study
In this section, the performance of the proposed robust wild bootstrap estimators is evaluated by a simulation study. We first generate artificial data to examine the performance of the proposed bootstrap techniques; the final investigation of their performance is then carried out through a simulation approach on bootstrap samples.

Artificial Data
We follow the data generation technique of Cribari-Neto and Zarkos [16] and MacKinnon and White [23]. The design of this experiment involves a linear model with two covariates:

y_i = β_0 + β_1 x_1i + β_2 x_2i + σ_i ε_i.  (5.1)

We consider the sample sizes n = 20, 60, 100. For n = 20 the covariate values x_1i were obtained from U(0, 1) and the covariate values x_2i were obtained from N(0, 1). These observations were replicated three and five times to create the samples of size n = 60 and n = 100, respectively. The data generation was performed using β_0 = β_1 = β_2 = 1. Under homoscedasticity, σ_i = 1 for all i. However, the main interest here is the heteroscedastic model, so we create a heteroscedastic generating mechanism following the work of Cribari-Neto [24]. The degree of heteroscedasticity is measured by ℘ = max_i σ²_i / min_i σ²_i; it remains constant across the different sample sizes since the covariate values are replicated. In our study the degree of heterogeneity was approximately ℘ = 4. We focus on the situation where the regression design includes outliers. To generate a certain percentage of outliers in model (5.1), some of the i.i.d. normal errors ε_i were replaced by draws from N(5, 10). Hence the errors of the contaminated heteroscedastic model follow ε_i,cont ~ α N(0, 1) + (1 − α) N(5, 10), where α is chosen according to the desired percentage of outliers. In this study we choose 5%, 10%, 15%, and 20% outliers in the model; that is, α = 0.95, 0.90, 0.85, and 0.80, respectively. For each sample size, the OLS, the classical wild bootstrap, and the proposed robust wild bootstrap methods were then applied to the data, with 500 bootstrap replications in each model. It is noteworthy that the bootstrap is extremely computer intensive; the S-Plus programming language was used to compute the bootstrap estimates.
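The contamination scheme just described can be sketched as follows. Two assumptions are made explicit: the exponential variance function stands in for the unreproduced mechanism of [24], and N(5, 10) is read as mean 5, variance 10.

```python
import numpy as np

def make_contaminated_data(n_base=20, reps=1, alpha=0.95, rng=None):
    """Heteroscedastic regression data with a fraction (1 - alpha) of
    outlying errors drawn from N(5, 10) (mean 5, variance 10: an assumption)."""
    rng = np.random.default_rng(rng)
    x1 = np.tile(rng.uniform(size=n_base), reps)   # replicate covariates so the
    x2 = np.tile(rng.normal(size=n_base), reps)    # degree of heteroscedasticity
    n = n_base * reps                              # is the same for every n
    sigma2 = np.exp(0.7 * (x1 + x2))               # assumed variance function
    eps = rng.normal(size=n)
    n_out = int(round((1 - alpha) * n))
    out_idx = rng.choice(n, size=n_out, replace=False)
    eps[out_idx] = rng.normal(5.0, np.sqrt(10.0), size=n_out)
    y = 1.0 + x1 + x2 + np.sqrt(sigma2) * eps      # beta0 = beta1 = beta2 = 1
    X = np.column_stack([np.ones(n), x1, x2])
    degree = sigma2.max() / sigma2.min()           # the measure called "wp"
    return X, y, degree

# Usage: n = 60 built by replicating the n = 20 design three times, 5% outliers
X, y, degree = make_contaminated_data(n_base=20, reps=3, alpha=0.95, rng=5)
```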
The wild bootstrap standard errors of the estimates for the different sample sizes and different percentages of contamination are computed. The bootstrap standard errors of Boot_ols, Boot_wu, and Boot_liu are obtained by taking the square roots of the main diagonal of the covariance matrix

Cov(β̂*) = (1/(B − 1)) Σ_{b=1}^{B} (β̂*_b − β̄*)(β̂*_b − β̄*)ᵀ,  (5.5)

where β̄* = (1/B) Σ_{b=1}^{B} β̂*_b. The bootstrap standard errors of RBoot_ols, RBoot_wu, and RBoot_liu are obtained by taking the square roots of the main diagonal of the same covariance matrix (5.5); the only essential difference is that the usual bootstrap estimates are replaced by the robust bootstrap estimates.
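In code, the covariance computation just described is a one-liner over the (B, k) array of bootstrap estimates (the B − 1 divisor is our reading of the sample covariance in (5.5)):

```python
import numpy as np

def bootstrap_se(boot):
    """Standard errors from a (B, k) array of bootstrap estimates:
    square roots of the diagonal of
    cov = (1/(B-1)) * sum_b (beta_b - mean)(beta_b - mean)^T."""
    B = boot.shape[0]
    centered = boot - boot.mean(axis=0)
    cov = centered.T @ centered / (B - 1)
    return np.sqrt(np.diag(cov))

# Usage: agrees with the per-column sample standard deviation of the draws
rng = np.random.default_rng(6)
draws = rng.normal(size=(500, 3))
se = bootstrap_se(draws)
```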
The influence of outliers on the standard errors of the estimates is visible in Figures 2, 3, and 4, in which the average standard errors of the parameter estimates are plotted at different levels of outliers for the different bootstrap methods. The results in Figures 2-4 show that the performance of the robust wild bootstrap estimates is fairly close to that of the classical estimates at the 0% level of contamination. It also emerges that the average standard errors of RBoot_wu and RBoot_liu remain close to the average standard errors that the classical Boot_wu and Boot_liu attain on "clean" data, regardless of the percentage of outliers. At the 5%, 10%, 15%, and 20% levels of contamination, however, the standard errors of the classical bootstrap estimates become unduly large, whereas little influence is visible for the robust wild bootstrap techniques RBoot_wu and RBoot_liu at the different percentage levels of outliers. It is also observed that RBoot_liu performs best overall, followed by RBoot_wu.

Simulation Approach on Bootstrap Sample
In the previous section, we used artificial data sets for different sample sizes. Now we investigate the performance of the different bootstrap estimators when the data sets are generated by Monte Carlo simulation. Let us consider the heteroscedastic model y_i = β_0 + β_1 x_1i + β_2 x_2i + σ_i ε_i. The covariate values x_1i and x_2i are generated from U(0, 1) for sample sizes 20, 60, and 100. We again take β_0 = β_1 = β_2 = 1 as the true parameters, and the heteroscedasticity-generating function is σ²_i = exp(0.4 x_1i + 0.4 x_2i). In this study the level of heteroscedasticity is set as ℘ = max σ²_i / min σ²_i = 4. In each simulation run and for each sample size, the ε_i were generated from N(0, 1) for the data with no outliers. To generate 5% and 10% outliers, 95% and 90% of the ε_i were generated from N(0, 1) and the remaining 5% and 10% from N(0, 20). It is worth mentioning that, although such simulations are extremely computer intensive, the simulation for each sample size entails a total of 250,000 replications, with 500 simulation replications and 500 bootstrap resamples each, following Cribari-Neto and Zarkos [16] and Furno [20]. The simulation results for the different bootstrap methods are presented in Tables 2-4. Table 2 shows the biasness measures of the non-robust and robust wild bootstrap techniques. It is observed that, for the different sample sizes, the bias of Boot_ols, Boot_liu, and Boot_wu increases with the increase in the percentage of outliers. On the other hand, RBoot_wu and RBoot_liu are only slightly biased as the percentage of outliers increases. We can draw the same conclusion from the mean of the bias of the estimates. The standard errors of the non-robust and robust wild bootstrap are presented in Table 3. It is observed that the standard errors of the classical bootstrap estimates increase with the percentage of outliers for the different sample sizes, whereas the robust bootstrap estimates are only slightly affected by these outliers. By investigating the average standard errors of the estimates, it is also observed that the robust wild bootstrap techniques give smaller standard errors of the estimates in the presence of outliers. Finally, the robustness of the different bootstrapping techniques is evaluated using a robustness measure: the percentage robustness measure, that is, the ratio of the RMSE of each estimator to the RMSE of the OLS estimator on good data, is presented in Table 4. From this table we see that the OLS and the classical bootstrap methods perform poorly; in the presence of outliers, the efficiency of the classical bootstrap estimates is very low, whereas the efficiency of the robust bootstrap estimates is fairly close to 100%.
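The percentage robustness measure used in Table 4 can be computed as below. The orientation of the ratio (benchmark RMSE over estimator RMSE, so that 100% means "as good as OLS on clean data") is our interpretation of the measure described above.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean squared error, per coefficient, of a (B, k) array."""
    return np.sqrt(np.mean((estimates - truth) ** 2, axis=0))

def percent_robustness(estimates, benchmark, truth):
    """100 * RMSE(benchmark: OLS on clean data) / RMSE(estimator)."""
    return 100.0 * rmse(benchmark, truth) / rmse(estimates, truth)

# Usage with hypothetical Monte Carlo draws around the true (1, 1)
truth = np.array([1.0, 1.0])
rng = np.random.default_rng(7)
clean_ols = truth + 0.10 * rng.normal(size=(1000, 2))   # benchmark estimator
robust = truth + 0.11 * rng.normal(size=(1000, 2))      # slightly noisier one
eff = percent_robustness(robust, clean_ols, truth)      # close to 100 percent
```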

Concluding Remarks
This paper examines the performance of the classical wild bootstrap techniques proposed by Wu [6] and Liu [7] in the presence of heteroscedasticity and outliers. Both the numerical example and the simulation study show that the classical bootstrap techniques perform poorly in the presence of outliers in the heteroscedastic model, although they perform superbly for "clean" data. We robustify these classical bootstrap techniques to gain better efficiency in the presence of outliers. The numerical results show that the newly proposed robust wild bootstrap techniques, RBoot_wu and RBoot_liu, outperform the classical wild bootstrap techniques when both outliers and heteroscedasticity are present in the data, with RBoot_liu performing slightly better than RBoot_wu. Another advantage of RBoot_wu and RBoot_liu is that no diagnostics for the data are required before these methods are applied.


Figure 1: Residuals versus fitted values plot of the Concrete Compressive Strength data.

Figure 2: The average effect of outliers on standard errors of parameters for sample size n = 20.

Figure 3: The average effect of outliers on standard errors of parameters for sample size n = 60.

Figure 4: The average effect of outliers on standard errors of parameters for sample size n = 100.

However, following Maronna et al. [21], we suggest computing the robust normalized residuals based on the median and the normalized median absolute deviation (NMAD) instead of the mean and standard deviation, which are not robust. Thus, a_i = (ε̂_WMM,i − median(ε̂_WMM,i))/NMAD, where NMAD = median{|ε̂_WMM,i − median(ε̂_WMM,i)|}/0.6745. We call this proposed robust nonparametric bootstrap RBoot_wu.
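In code, this median/NMAD normalization of the residual pool reads:

```python
import numpy as np

def nmad_normalize(e):
    """Center by the median and scale by the normalized MAD,
    NMAD = median(|e - median(e)|) / 0.6745."""
    med = np.median(e)
    nmad = np.median(np.abs(e - med)) / 0.6745
    return (e - med) / nmad

# Usage: the outlying value 9.0 barely moves the center and scale,
# unlike a mean/standard-deviation normalization
a = nmad_normalize(np.array([0.2, -0.5, 0.1, 0.4, 9.0]))
```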

Table 1: Wild bootstrap standard errors of the parameters for the Concrete Compressive Strength data.

Table 2: Biasness measures of the non-robust and robust wild bootstrap.

Table 3: Standard errors of the non-robust and robust wild bootstrap.

Table 4: Robustness measure of RMSE of the non-robust and robust wild bootstrap.