Robust SiZer Approach for Varying Coefficient Models

Varying coefficient models have widely been applied to many practical fields for exploring dynamic patterns of the regression relationships. In this study, we propose a robust scenario of SiZer (significant zero crossing of derivatives) inference approach based on the local least absolute deviation fitting procedure and the bootstrap confidence interval to uncover the statistically significant features of the coefficient functions in a varying coefficient model under different smoothing scales. The simulation study shows that the proposed SiZer approach is quite robust to outliers and performs well in finding the significant features of the coefficient functions. Furthermore, a real environmental data set is analyzed to demonstrate the application of the proposed approach.


Introduction
Varying coefficient models, introduced by Cleveland et al. [1] and Hastie and Tibshirani [2], are a useful extension of the classical linear regression models.By allowing the regression parameters in the linear models to be some nonparametric functions of a covariate, the models can be used to explore dynamic patterns of the regression relationship.Many approaches, including the kernel method [1], splines method [2][3][4], the local polynomial method [5,6], and local maximum likelihood estimation method [7,8], have been proposed to fit the varying coefficient models.Furthermore, Fan and Zhang [9] and Zhang and Peng [10] developed the simultaneous confidence bands of the coefficients in the varying coefficient models and the generalized varying coefficient models, respectively.
Most of the aforementioned methods are based on the mean regression and employ the least-square procedure to estimate the coefficients.As it is well known, the least-square procedure may not be a proper choice in the presence of heavy tailed errors or extreme values in a dataset and the resulting coefficient estimates consequently suffer from a lack of robustness.The effect of outliers may distort the model fitting process and create spurious patterns in the estimates of the coefficients [11].Therefore, the robust procedures, such as  1 estimation [12], M-estimation [13,14], and quantile regression [15][16][17], have been proposed to handle the effect of outliers in the fits of the varying coefficient models.Although the above studies developed the robust methods to estimate the coefficient functions and established the pointwise asymptotic normality of the estimators, they mainly focused on the construction of the pointwise confidence intervals of the coefficients.This is inadequate for many applications.For instance, in the process of exploring the dynamic features of the regression relationship in the presence of outliers in the varying coefficient models, one often wants to know whether a coefficient function really varies, and if so, how it varies in detail.
The foregoing two questions involve the crucial problem of choosing an optimal bandwidth which may seriously affect the estimates of and the inferences on the regression coefficients.The issue of optimal bandwidth section is far from being solved satisfactorily, although there have been some practical robust criterion of choosing the bandwidth, such as the absolute cross-validation [12], the Schwarz information criterion [16], and the Akaike information criterion [17].Furthermore, when the coefficient functions in a varying coefficient model possess different degrees of smoothness, the situation becomes more complicated because different bandwidths have to be, respectively, selected for the coefficients with different degrees of smoothness [5,6,8,9].It is also worth noting that the selected optimal bandwidth for the estimation may not be proper for hypothesis testing [18,19].
The SiZer (significant zero crossing of derivatives) method, proposed by Chaudhuri and Marron [20,21], provides a powerful framework for exploring the features in a family of smooths indexed by different smoothing levels and allows one to handle the bandwidth selection problem in a new way.The SiZer method is based on the idea that the estimated curves under different smoothing levels may provide different information on the variation of the curve and highlights the full family of smooths instead of the "true underlying curve." Due to its flexibility and interpretability, the SiZer method has extensively been studied in its methodology and applications.Hannig and Marron [22] have proposed an improved simultaneous inference version of SiZer.The Bayesian scenario of SiZer has been developed by Erästö and Holmström [23], Godtliebsen and Øigård [24], and Øigård et al. [25].Ganguli and Wand [26] and Godtliebsen et al. [27,28] have considered the bivariate smoothing technique in SiZer.SiZer for smoothing splines method has been investigated by Marron and Zhang [29].The other specific SiZer tools include SiZer for additive model [19], comparison of curves [30], time series analysis [31], SiZer for regression quantiles [32], analysis of random signals [33], and spatially dependent images [34], among others.Recently, Zhang and Mei [35] developed a SiZer approach for the varying coefficient models to explore the statistically significant features of the coefficient functions under different bandwidths.Although the study shows that the SiZer method is efficient at uncovering the genuine features such as monotonicity, peaks, valleys, and even the degree of smoothness of the coefficients, it did not consider the situation where outliers are present in the data.In fact, as pointed out by Hannig and Lee [36], outliers can seriously inflate the estimate of the model variance in the least-squaresbased SiZer methods, which accordingly deflate the test statistics.As a result, some important underlying features in the coefficient functions may be missed, which can mislead the varying patterns of the regression relationship.
In this paper, We focus on developing the robust scenario of the SiZer inferential approach to the varying coefficient models in the presence of heavy tailed errors or extreme values in a dataset.The varying coefficient model is of the form where   and (  ,  1 ,  2 , . . .,   ) are the observations of the response variable and the explanatory variables, respectively and   (⋅) ( = 1, 2, . . ., ) are unknown nonparametric functions and   's are the independent random disturbances with zero median.Note that the model can include a varying intercept by setting  1 ≡ 1.The proposed robust SiZer approach simultaneously considers a family of the median estimates of the coefficient functions indexed by a series of bandwidths, and the estimates are obtained by the local least absolute deviation (LAD) procedure [37].The corresponding SiZer maps summarize a plenty of results of the multiple slope tests based on the bootstrap confidence intervals, which can provide the information on the variation patterns of the coefficients at different scales of smoothing.The remainder of this paper is organized as follows.Section 2 derives the robust version of the SiZer approach based on the LAD median estimation of the varying coefficient model.Section 3 investigates the performance of the proposed method by simulations.The robust SiZer approach is further used to analyze a real-world dataset in Section 4. Concluding remarks are given in Section 5.

Robust SiZer Approach for the
Varying Coefficient Model

LAD Estimates of the Coefficients and Their Derivatives.
The local LAD method for nonparametric regression, studied comprehensively by Wang and Scott [37], is much more robust than the local least-squares procedures.Furthermore, the local LAD method can be easily generalized to the case of multiple regression because solving the LAD problem is equivalent to solving a linear programming that is easy to be implemented even for a large-scale linear programming.Tang and Wang [12] have adopted the local LAD method to calibrate the varying coefficient model.Considering that the estimates of the derivatives of the coefficient functions take a key role in the robust SiZer inference for varying coefficient model, we use the local linear LAD fitting procedure to simultaneously obtain the estimates of the coefficients and their derivatives.Suppose that each coefficient   () in the model in (1) has continuous derivative in the domain of  and let {(  ,  1 , . . .,   ,   )}  =1 be a sample from the model.For each focal point  0 , the coefficient functions   () ( = 1, . . ., ) can be approximated in the neighborhood of  0 by where    ( 0 ) is the derivative of   () at  0 .Given  0 and a bandwidth ℎ, the local linear LAD estimates of   ( 0 ) and    ( 0 ) are, respectively, denoted as βℎ ( 0 ) and β ℎ ( 0 ) ( = 1, . . ., ), which are the solution of the following locally weighted LAD problem: minimize where ( 0 ) = ( 1 ( 0 ), . . .,   ( 0 ),   1 ( 0 ), . . .,    ( 0 ))  , and  ℎ (⋅) = (⋅/ℎ)/ℎ with (⋅) being a kernel function which is taken to be the Gaussian kernel.Wagner [38] showed that solving an LAD problem is equivalent to solving a linear programming.For the weighted LAD problem in (3), the corresponding linear programming can be formulated as follows. Let for  = 1, 2, . . ., .We have Then the weighted LAD problem in (3) can equivalently be expressed as the following linear programming: The solutions for   ( 0 ) and    ( 0 ) ( = 1, . . ., ) in the linear programming in ( 5) are the corresponding local linear LAD estimates βℎ ( 0 ) and β ℎ ( 0 ) ( = 1, . . ., ), respectively.
The above linear programming can be solved by the so-called simplex method or interior point method (see [39, pages 181-202]).Currently, many computer software packages (e.g., the R software) are available for solving the linear programming problems.

Robust SiZer Maps for the Coefficient Functions.
In order to uncover statistically significant features of a smooth, Chaudhuri and Marron [20] constructed a color map, called the SiZer map, to summarize the inferential results of the confidence intervals of the smooths of the derivatives at each location and each bandwidth.In the SiZer map, the horizontal axis associates with a certain covariable and the vertical axis is the logarithmic value of the bandwidth.When the confidence interval for the smooth of the derivative is completely above (below) zero at the given location and bandwidth, the corresponding pixel in the SiZer map will be colored blue (red) to illustrate that the smooth of the curve has a significantly increasing (decreasing, resp.)trend around the pixel.Furthermore, a purple pixel in the SiZer map means that the corresponding confidence interval for the smooth of the derivative contains zero, which suggests that the smooth of the curve does not significantly vary around that pixel.This graphical device can display the significant features of the smooth of the curve such as monotonicity, peaks, and valleys.

Bootstrap Confidence Intervals for the LAD Derivative
Estimates.In order to build the robust SiZer maps for the varying coefficient model, we need to firstly construct the confidence interval for β ℎ () ( = 1, . . ., ) and the LAD derivative estimates of the coefficient functions in model (1) with the bandwidth ℎ.The asymptotic normality of the LAD estimator derived, for example, in the context of nonparametric regression [37] and in the context of the varying coefficient model [12], makes it possible to construct the asymptotic confidence interval.However, the nominal coverage probability of the confidence interval for LAD estimators may be very different from the true coverage probability in the case of the finite sample [40][41][42].Thus, the wild bootstrap techniques have been proposed in the literature to improve the finite-sample performance of the tests in the median regression (e.g., [43,44]).In this paper, we use a modified version of the residual-based wild bootstrap procedure [44,45] to construct the confidence interval for the LAD estimator for median regression.Given the i.i.d.sample {(  ,  1 , . . .,   ,   )}  =1 , the method can be described in what follows.
Step 2. For each , draw the bootstrap residual  * ℎ =   |ε ℎ |, where the weight   is independently generated from the Bernoulli distribution with equal probabilities at −1 and 1.

Construction of the Robust SiZer Maps of the Coefficient
Functions.Suppose that the grid points {ũ  ;  = 1, 2, . . ., } are equally spaced in the whole range of the covariate .
Empirically, according to the suggestion by Chaudhuri and Marron [20], ℎ 1 can be taken as 2 min and ℎ  can be taken as 1.5 max , where  min and  max are, respectively, the minimal and maximal distances among all of the distances between each pair of sample points of covariate .As a result, a SiZer map is covered by  ×  grid pixels denoted by (ũ  , ℎ  ) ( = 1, . . ., ,  = 1, . . ., ).
In practice, for each of the coefficient functions   () ( = 1, 2, . . ., ) and the given bandwidth ℎ  ∈ , we can perform the test according to the confidence interval in (7) at each pixel (ũ  , ℎ  ) and classify the confidence intervals into three categories: completely above zero, completely below zero, and containing zero.Then the corresponding pixel in the SiZer map is, respectively, colored in blue, red, and purple to illustrate that the smooth of each coefficient has a significantly increasing, significantly decreasing, and no significantly varying trend around each pixel.

Simulation Study
In this section, we conduct the simulation experiments to evaluate the performance of the proposed robust SiZer method on discovering the significant features of the coefficient functions in the varying coefficient model under the presence of outliers in the data.

Design of the Experiments.
The experimental data are generated by the following varying coefficient model: where the observations   ( = 1, . . ., ) of the covariate  are independently drawn from the uniform distribution on the interval (0, 1), and the observations  1 and  2 ( = 1, . . ., ) of the explanatory variables are randomly drawn from the uniform distributions on the intervals (0, 2) and (0, 3), respectively.The coefficients  1 (  ) and  2 (  ) are, respectively, chosen from the following three groups of the functions.
Group 1: Group 2: Group 3: The coefficients  1 () and  2 () in each of the above groups have different degrees of smoothness and they have been used in Zhang and Mei [35] to demonstrate the local least-squares-based SiZer analysis for the varying coefficient model.Figure 1 depicts the true curves of the coefficient functions  1 () and  2 () in Groups 1-3.
The random errors  2 ( = 1, . . ., ) in the model in (8) are independently drawn from one of the following distributions.
Case 2. The Cauchy distribution (0, 0.2) with the location parameter 0 and the scale parameter 0.2.
In order to give an impression of the signal to noise ratio in the simulation model, Figure 2 depicts the generated data with three error distributions and the coefficients in Group 2. When the model errors are drawn from the distribution (0, 0.5 2 ) in Case 1, theoretically, no outliers will be included in the data and this can be seen from Figure 2(a).The Cauchy distribution in Case 2 is of such heavy tails that its mean and variance do not exist.Therefore, the errors from this distribution generally include some extreme values or severe outliers.Indeed, Figure 2(b) displays the simulated data which contains extreme values.Similarly, when the model errors are drawn from the contaminated normal distribution in Case 3, about 30% of them come from (0, 8 2 ) and may include some excessively large values.Obviously, the simulated data in Figure 2(c) display a characteristic of fat tail.In summary, when the model errors are drawn from the distribution in Case 2 or 3, the data generated by the model in (8) will include outliers.
For each group of the above coefficient functions, we conduct the simulation with the sample size  = 1000.We choose the set of the bandwidth grids as  = {ℎ  ;  = 1, 2, . . ., 50}, where each log 10 (ℎ  ) is located at the equally spaced 50 points on the interval [log 10 (0.01), log 10 (1.5)].The Gaussian kernel is used in the simulation.For each bandwidth ℎ  ∈ , the LAD estimates of the coefficients and their derivatives are obtained by solving the linear programming in (5) at equally spaced 200 points on the range (0, 1) of the covariate .Given the confidence level  = 0.05, the simultaneous bootstrap confidence interval is computed by (7) at each location (  , ℎ  ) ( = 1, 2, . . ., 200;  = 1, 2, . . ., 50) and the consequent SiZer maps are shown in Figures 2, 3, and 5.The white lines in the SiZer maps show the effective window widths, as the distance between the two white lines along the horizontal direction is 2ℎ.

Simulation Results with Analysis. Figures
We know from Figure 1(a) that the true coefficient  1 () in Group 1 is constant, and therefore the correct SiZer maps should display no significant features no matter which distribution (the normal distribution, the Cauchy distribution, or the contaminated normal distribution) the model errors are drawn from.It can be seen from the Figures 3(a), 3(c), and 3(e) that the SiZer maps of  1 () with three types of model errors show only purple color, indicating, as expected, that the coefficient is not varying.Figures 3(b), 3(d) and 3(f) display the SiZer maps of  2 () with three types of model errors.We can infer from the SiZer maps that the coefficient  2 () shows a significant increasing trend (blue) across the range of the covariate  for larger values of the bandwidth.Furthermore, it can be observed that, for smaller values of the bandwidth, the SiZer maps firstly show a significant increasing trend (blue) of  2 () for smaller values of , then no significant varying pattern (purple) for medium values of , and a significant increasing trend (blue) for the larger values of .Clearly, the significant features in each SiZer map correctly reveal the varying behaviors of a cubic function at the different scales.
It is also observed that when the model errors come from the Cauchy distribution and the contaminated normal distribution, the corresponding SiZer maps of both the coefficient functions are almost the same as those with the normally distributed model errors, which indicates that the outliers exert nearly no effects on the local LAD-based SiZer inference for the features of the coefficients.
Figure 4 shows the robust SiZer maps of the coefficients in Group 2 with the model errors drawn from the normal distribution, the Cauchy distribution and the contaminated normal distribution, respectively.We know from Figure 1(b) that  1 () is a cosine curve and  2 () is a parabola.The SiZer maps in Figures 4(a), 4(c), and 4(e) display the regular color changes with red-blue or blue-red for a quite wide range of the bandwidths.It is common knowledge that a peak on a smooth curve has the positive derivative (blue) on the left and the negative derivative (red) on the right, while the situation of a valley is reverse.Therefore, the SiZer maps of  1 () correctly shows all of the peaks and valleys of the cosine curve with all three cases of model error distribution.The SiZer maps in Figures 4(b), 4(d), and 4(f) display only one significant peak because of a significant increasing trend (blue) on the left half interval of  and a significant decreasing trend (red) on the right half interval of .These characteristics shown by the SiZer maps of  2 () are entirely consistent with the true features of the parabola.Moreover, it can also be observed that the peaks and valleys in Figure 4(b) are much fewer than those in Figure 4(a), meaning that  2 () is smoother than  1 ().Therefore, the degrees of smoothness of the coefficients can be clearly visualized by the SiZer maps.
Additionally, in order to make a comparison of the robustness between the local LAD-based SiZer approach and the local least-squares-based SiZer method suggested by Zhang and Mei [35], Figure 5 displays the local leastsquares-based SiZer maps and the families of the smooths of the coefficients in Group 2 with the model errors drawn, respectively, from the Cauchy distribution and the contaminated normal distribution.It can be seen that the SiZer maps cannot correctly reflect the features of either the cosine curve or the parabola.The smooths families in Figure 5 show that the estimated coefficient curves are seriously distorted by outliers.The reason is that outliers can inflate the estimate of the model variance and accordingly inflate the lengths of the corresponding confidence intervals.As a result, too many confidence intervals contain zero leading to the SiZer maps missing the important underlying features of coefficient functions.6(c) and 6(e) when outliers appear in the data.In addition, it can be observed that the more permanent significant characteristics along ℎ indicate the larger amplitudes of the peaks and the valleys.This can also be seen from the SiZer maps of the bimodal curve in Figures 6(b), 6(d) and 6(f), in which the significant features on the boundary areas of the maps are more permanent than those in the middle area.
A comparison with the local least-square-based SiZer method in Zhang and Mei [35] is also made in this case to demonstrate the robustness of the proposed SiZer approach.Like the situation of the second group coefficient functions, the local least-squares-based SiZer maps of the coefficient functions are seriously distorted by outliers.The corresponding SiZer maps provided in supplementary material available online at http://dx.doi.org/10.1155/2013/547874.In summary, when the data set does not include outliers, the local LAD-based SiZer approach performs as well as the local least-squares-based SiZer method in revealing the features such as monotonicity, peaks, valleys, and the degrees of smoothness of the coefficients in a varying coefficient model.When the data set does include outliers, the local least-squares-based SiZer method is very sensitive to the outliers and produces distorted SiZer maps.In contrast, the proposed robust scenario of the SiZer approach is very robust to outliers even if severe outliers exist in the data.

A Real-Data Example
In this section, we demonstrate the application of the proposed robust SiZer approach via applying the environmental data set that has been analyzed by Fan and Zhang [5,9], and Cai et al. [7] for different purposes.The data were collected in Hong Kong from January 1 1994 to December 31, 1995 and consists of daily measurements of air pollutants along with the daily number of hospital admissions for circulatory and respiratory problems.The study aims at examining the association between the levels of the pollutants and the number of daily total hospital admissions for circulatory and respiratory problems.Like the model in Fan and Zhang [5,9], we consider the following varying coefficient model: where the response variable  is the number of daily hospital admissions, the explanatory variables  2 ,  3 , and  4 are the levels of sulphur dioxide (SO 2 , in g/m 3 ), Nitrogen Dioxide (NO 2 , in g/m 3 ), and dust (in g/m 3 ), respectively, and the covariate  is the time that the observations were collected.Additionally, each level of the three pollutants is centered by their respective averages.We use the SiZer maps settings described in Section 2.2.2 and the proposed SiZer approach is performed to explore the significant features of the coefficients in the model in (13) Figure 7 shows the robust SiZer maps of the four coefficients.
The SiZer maps of the intercept  1 () in Figure 7(a) and the coefficient  2 () of the variable  2 (i.e., SO 2 ) in Figure 7(b) show a significant increasing trend (blue) for larger bandwidths across the time range.However, for smaller bandwidths, the SiZer map of  1 () displays the color changes with blue-red or red-blue meaning significant peaks and valleys, whereas the SiZer map of  2 () does not show any permanent characteristics.
The SiZer map in Figure 7(c) displays a few weak and irregular features in the coefficient  3 () of the variable  3 (i.e., NO 2 ) for medium bandwidths, as the SiZer map shows a weak decrease trend (red) on the left but an irregular increase trend (blue) at the center.For the coefficient  4 () of the variable  4 (i.e., dust), its SiZer map in Figure 7(d) shows significantly decreasing (red) characteristics across the range of  for larger values of the bandwidth and a significant increase trend (blue) for smaller values of  at medium scales.
The above findings are very similar to those obtained by the local least-squares-based SiZer method in Zhang and Mei [35] except the SiZer map of the coefficient  3 ().For intermediate bandwidths, the local least-squares-based SiZer map of  3 () displays a strong increase trend on the time range (250, 450), but this significant characteristic does not appear in the robust SiZer map in Figure 7(c), which may indicate that there are outliers in the data.The outliers can be identified, with the help of the scatter plot of standardized data of response variable  and explanatory variable  3 , which is shown in Figure 8.The smoothed curves under particularly different bandwidths will represent different information in the data and the influences of potential outliers will also be reduced.The results indicate that the robust SiZer approach is useful for mining relatively full information in data and drawing more comprehensive conclusions.

Concluding Remarks
The varying coefficient model is a useful statistical tool to explore dynamic patterns of a regression relationship, in which the variation features of the regression coefficients are taken as the main evidence to reflect the dynamic relationship between the response and the explanatory variables.However, outliers commonly exist in data and may lead to the distorted estimates of the coefficients and misleading inference on the underlying regression relationship.In this paper, we propose a robust scenario of the SiZer approach based on the local linear LAD procedure and the wild bootstrap confidence interval to uncover the genuine features such as monotonicity, peaks, valleys, and the degree of smoothness of the coefficients under different smoothing scales.The simulation study and the real environmental data analysis demonstrate that the proposed SiZer approach is very robust to outliers and has good performance in uncovering significant features of the coefficient functions in the varying coefficient model.

Figure 1 :
Figure 1: The true curves of the coefficients in (a) Group 1; (b) Group 2; and (c) Group 3. The solid curves are for  1 () and the dashed curves are for  2 ().

Figure 6
Figure 6 depicts the robust SiZer maps for the coefficients in Group 3 with normal, Cauchy, and contaminated normal model errors, respectively.It is known from Figure 1(c) that the two coefficients in Group 3 are respectively an amplitude decay function and a bimodal function.For the amplitude decay function, the robust SiZer maps in Figures 6(a), 6(c), and 6(e) precisely show the decay characteristics of the peaks and valleys from the central to the both sides of the range of , although the minimal amplitudes are almost missed in Figures 6(c) and 6(e) when outliers appear in
Figure 8 displays a cluster of outlier on the time range (350, 450).The outliers distort the smoothing curve of  3 (), and the local least-squaresbased SiZer map consequently shows the spurious increase trend.

Figure 8 :
Figure 8: The scatter plot of standardize data of response variable  and explanatory variable  3 in the environmental data.