Least Absolute Deviation Estimate for Functional Coefficient Partially Linear Regression Models

The functional coefficient partially linear regression model is a useful generalization of the nonparametric model, partial linear model, and varying coefficient model. In this paper, the local linear technique and the L1 method are employed to estimate all the functions in the functional coefficient partially linear regression model. The asymptotic properties of the proposed estimators are studied. Simulation studies are conducted to show the validity of the estimate procedure.


Introduction
In this paper, we are concerned with a functional coefficient partially linear regression model FCPLR , that is, where X and U are random explanatory variables, Z Z 1 , . . ., Z p is a random vector, and a j • is some measurable function from R to R for j 0, . . ., p.We call a 0 • the intercept function, and {a j • }, j 1, . . ., p, the coefficient functions.As usual, ε denotes the errors with zero-mean and fixed variance.
The FCPLR model, first introduced by Wong et al. 1 , is a generalization of the nonparametric model, partial linear model, and varying coefficient model.Zhu et al. 2 studied a similar functional coefficient model, a functional mixed model, using a new Bayesian method.Model 1.1 reduces to a varying coefficient regression mode if the intercept function a 0 • is a constant function and a partially linear regression model when the coefficient functions {a j • } p j 1 are constants.Many researchers, for example, Aneiros-Pérez and Vieu 3 , made contributions to studying this kind of model.When X and U in model 1.1 coincide, the model becomes a semiparametric varying coefficient model discussed by Ahmad et al. 4 .Since the FCPLR model combines the nonparametric and functional coefficient regression model, its flexibility makes it attractable in various regression problems 1 .
Statistical inference for the FCPLR model mainly includes the estimations of the intercept function a 0 • and the coefficient functions {a j • } p j 1 .To estimate the unknown functions in the nonparametric/semiparametric regression models, many statistical inference methods have been proposed over the past decades, such as the kernel estimate method 5-7 , spline smoothing 8, 9 , and two-step estimation method 10, 11 .Wong et al. 1 employed local linear regression method and integrated method to give the initial estimates of all functions in the FCPLR model.All the papers mentioned above used the least-squares technique to obtain the estimators of the unknown coefficient functions.The least-squares estimators L 2 method , of course, have some good properties, especially for the normal random errors case.It is well known that, however, the least-squares method will perform poor when the random errors have a heavy-tailed distribution in that it is highly sensitive to extreme values and outliers.This motivates us to find more robust estimation methods instead of the aforementioned inference methods for model 1.1 .
Local linear approximation is a good method for nonparametric regression problems 12 , and the L 1 method based on the least absolute deviations overcomes the sensitivity caused by outliers.As noted in Wang and Scott 13 and Fan and Gijbels 12 , among many robust estimation methods, the L 1 method based on the local least absolute deviations behaves quite well.In this paper, we adopt the L 1 method, accompany with the local linear technique and the integrated method to estimate all the unknown functions in model 1.1 .Furthermore, the estimating problem can be reduced to a linear programming problem, and the numerical solutions are obtained quickly by some available softwares subsequently e.g., Matlab is very useful for this kind of problems .The main difficulty encountered in the proof of the asymptotic normalities is that the L 1 estimates have no closed form.This paper shows the asymptotic normalities of L 1 estimators through a method completely different from those based on the L 2 method, and the simulation results show that the L 1 method is a robust method indeed.
The rest of this paper is organized as follows.In Section 2, we describe the estimation method and the associated bandwidth selection procedure.Section 3 gives the the asymptotic theories of the estimators.Simulation studies are conducted in Section 4. A real application is given in Section 5. Section 6 gives the proofs of the main results.

Least Absolute Deviation Estimate
This section gives the main idea of the proposed estimation method; that is, local linear polynomials are used to approximate the nonparametric function and the functional coefficients, and the least absolute deviation technique is used to find the best approximation.Bandwidth selection technique is also discussed in this section.Throughout this paper, we suppose that { Y i , X i , U i , Z i } n i 1 is an i.i.d.sample from model 1.1 and assume the following conditions.
5 All functions a j u , j 1, . . ., p and a 0 x are twice continuously differentiable in neighborhoods of u 0 and x 0 , respectively.
To simplify typesetting, we introduce the following symbols:

Local Linear Estimate Based on Least Absolute Deviation
The main idea is to approximate the functional coefficients a j • by linear functions for j 0, . . ., p, that is, a 0 x can be approximated by for x in a neighborhood of x 0 within the closed support of X, and a j u by for u in a neighborhood of u 0 within the closed support of U.For simplicity, denoting a 0 x 0 , a 0 x 0 as a 0 , b 0 , and a j u 0 , a j u 0 as a j , b j for j 1, . . ., p.The local linear least absolute deviation estimate L 1 estimate of the unknown parameters a 0 , b 0 , a : a 1 , . . ., a p and b : b 1 , . . ., b p , denoted, respectively, by a 0 , b 0 , a, b, is the optimal solution of the minimization problem as follows: where 1, 2 are the given kernel functions and h is the chosen bandwidth.The optimization problem is equivalent to the following linear programming problem:

2.5
There are many algorithms available for the optimal solution of problem 2.5 ; for example, the feasible direction method can be directly used to compute the optimal solution 14 , and the numerical solution of 2.5 can be quickly computed by a series of Matlab functions.By the integrated method 1 , the estimator of the intercept function a 0 x is defined by and the estimators of the coefficient functions a j u 0 , j 1, . . ., p are defined by a j X k , u 0 , j 1, . . ., p.

2.7
We focus our main task in establishing the asymptotic distributions of the estimators a 0 x 0 and a j u 0 for j 1, . . ., p.

Selection of Bandwidth
It is well known that the choice of the bandwidth strongly influences the adequacy of the estimators.We use an automatic bandwidth choice procedure in this paper; that is, the absolute cross-validation ACV method, and the ACV bandwidth h ACV is defined as

Asymptotic Theory
This section gives the asymptotic distribution theories of the estimators.Using Taylor's expansion, for where ξ i is between x 0 and X i , and η ij is between U i and u 0 .Let a u 0 a 1 u 0 , . . ., a p u 0 , b u 0 a 1 u 0 , . . ., a p u 0 , c u 0 a 1 u 0 , . . ., a p u 0 , then we have The aim of this paper is to study the asymptotic behavior of √ nh a 0 − a 0 x 0 and √ nh a − a u 0 .Combining some technique reasons, we first introduce the new variables where S n is the objective function of the equality above and sgn • is the sign function.
Since the L 1 estimators have no closed forms, we first give the limit form of the function F n , which is critical to obtain the asymptotic properties of the estimators.Theorem 3.1.Suppose Assumptions 1 -7 hold, and n → ∞, then for any fixed α 0 , β 0 , α, β, F n converges to F α 0 , β 0 , α, β , which is defined as • are symmetric about zero and Lipschitz continuous, the limit form of F α 0 , β 0 , α, β can be simplified as

3.6
Now we are in the position to state the asymptotic properties of the estimators.
Theorem 3.3.Suppose Assumptions 1 -7 hold and n → ∞, then one has where • are symmetric about zero and Lipschitz continuous, the results of Theorem 3.3 can be simplified as

3.10
Remark 3.5.Here, we have considered estimation method and asymptotic distributions for the case that two bandwidths are same.It is important to note that similar asymptotic theories can be obtained for the case that two bandwidths with the same order are different.
Remark 3.6.If we consider different bandwidth h 1 , h 2 for kernel functions K 1 and K 2 .Furthermore, suppose the Assumptions 2 and 7 are replaced by , nh 1 h 2 } , respectively.Similar results also will be obtained except that all the second-order derivatives in the results will disappear.
Remark 3.7.This paper restricts the study to one-dimensional variable X.The ideas used here can be adapted to higher dimensional variable X, for example, consider d-dimensional variable X for the case of same bandwidths.Similar asymptotic distribution results for √ nh d a x 0 − a x 0 and √ nh a u 0 − a u 0 can be obtained under the assumptions with Assumptions 2 and 7 being replaced by h ∼ n −1/ 5 d and max Z i o p n 2/ 5 d , respectively.

Simulations
In this section, we carry out some simulations to illustrate the performance of L 1 -method, and compare the performance of our L 1 -method with that of the L 2 -method.All the following simulations are conducted for sample size n 100.
The following example is considered: , the marginal distributions of Z 1 and Z 2 are standard normal, ε ∼ N 0, 0.2 2 , and ε, X, U and Z 1 , Z 2 are mutually independent.
In each simulation, the L 1 estimators of a 0 x , a 1 u , a 2 u were computed by solving the minimization problem 2.5 and using the integrated method described in 2.6 and 2.7 .We use the Epanechnikov kernel, K u All bandwidths in a model are selected by the method proposed in Section 2.
To evaluate the asymptotic results given in Theorem 3.3, the quantile-quantile plots of the estimators are constructed.Figure 1 presents the quantile-quantile plots for a 0 0 , a 1 0 , a 2 0 with sample size n 100 and 100 replications, respectively, and these plots reveal that the asymptotic approximation is reasonable.
Figure 2 displays the true function curves of a 0 x , a 1 u , and a 2 u and their estimated curves with sample size n 100 and one replication.We can see from the figure that the L 1 estimates perform well.
In order to illustrate that the L 1 method is a robust method.Figure 3 displays the estimated curves with four outliers, that is, −0.4538, −0.4402, −10 , 0.13377, 0.1393, 20 , 0.2703, 0.3085, 20 , 0.4174, 0.4398, 18 for the three element array x, u, y .From Figure 3, we can see that the L 1 estimate also has a good performance even in the presence of four large singular points of Y .The fact that outliers have little influence on the L 1 estimates is displayed in Figure 3, so it is a robust method.
By solving the following minimization problem: we can obtain similarly the L 2 estimators of the functions a 0 x , a 1 u , a 2 u by the equations 2.6 and 2.7 .For comparing the L 1 -method with the L 2 method.We simulated the function a 0 x by L 2 method and display the fitted curves with without outliers for sample sizes n 100 and 1000 replications in Figure 4. We can see that L 2 estimate cannot perform well for the data sparsity and singularity, and three small outlying data points make the estimated curve by the L 2 method deviate from the true curve significantly.Combining Figures 2, 3, and 4, we conclude that the L 1 method performs better than the L 2 method, the L 1 method is a robust method.
Finally, for further comparing the L 1 -estimate with the L 2 -estimate method, we also assess their performance via the weighted average squared error WASE , which is defined as where range a k for k 0, 1, 2 are the ranges of the functions a 0 x , a 1 u , and a 2 u .The weights are introduced to account for the different scales of the functions.We conducted 200 replications with sample size n 100.For the bandwidths and the Epanechnikov kernels used in the simulations, we obtain the mean and standard deviation of the WASE are 0.1201 and 0.0183 for the L 1 method, and 1.6613 and 0.8936 for the L 2 method.We can see that L 1 method outperforms L 2 method.

Application
A real data is analyzed by the proposed L 1 -method in this section.The classic gas furnace data was studied recently by Wong et   In the proposed method, Epanechnikov kernel is used and all the bandwidths h j j 0, . . ., 4 are selected as 0.14 via cross validation for simplicity.The mean absolute error MAE and the mean squared error MSE for fitting and forecasting are listed as follows.
Fitting: MAE 0.3189, MSE 0.1584; forecasting: MAE 0.3705, MSE 0.2127.Since the model is chosen based on the L 2 errors, the results are not perfect as that in Wong et al. 1 .Moreover, the data set does not contain obvious outliers, so the advantage of the L 1 estimation method is not apparent.Compared to the results showed in Wong et 5.These results indicate that the estimated results are reasonable.

Proofs of the Main Results
Before completing the proofs of the main results, we give the following useful lemma first.Lemma 6.1.Suppose Assumptions 1 -7 hold, then for any fixed α 0 , β 0 , α, β, R n converges to 0 in probability, that is, R n o p 1 as n → ∞.

6.3
It can be easily seen that Hence we have where I • is the indicator function.For any > 0, δ > 0, we have Since E sgn ε i 0, we have Journal of Probability and Statistics here L n n i 1 L ni .Combining 6.2 and 6.4 , we have when δ → 0. Combining 6.2 , 6.5 , and the argument E I {d n <δ} R 2 n → 0 as δ → 0 and n → ∞, the desired conclusion follows by using the Chebyshev's inequality.This completes the proof of Lemma 6.1.

6.8
By Lemma 6.1, under Assumptions 1 − 7 , A X i , U i , Z i and B X i , U i , Z i converge to zero as n → ∞.
Start from the equality, 6.9 Journal of Probability and Statistics 15 We first give the limit form of F n , for fixed α 0 , β 0 , α, β, we have

6.10
By the Integral Mean Value Theorem refer to the Appendix , we have where G •|• is the conditional probability distribution function of ε, and τ 1 , τ 2 , τ 3 , τ 4 converge to zero as δ → 0 and n → ∞.Then by Assumptions 1 -5 , for any small enough δ > 0, we have Note that, for any fixed α 0 , β 0 , α, β,d n → 0 as n → ∞, we draw the desired conclusion.This completes the proof of Theorem 3.1.
Proof of Theorem 3.3.Since the proofs of 3.7 and 3.8 are quite similar, we only give the proof of 3.8 .
Start from the equality

6.16
We obtain that L n is bounded in probability for any fixed α 0 , β 0 , α, β.Thus, for any fixed α 0 , β 0 , α, β, the random convex function S n L n converges to F α 0 , β 0 , α, β .According to the convexity lemma 17 , we can deduce that, for any compact set K, when δ → 0. By the proof of Theorem 2 in Wang 18 , we obtain that the "limit" here is not only in the sense of the limit of a sequence of random variables, but also in the sense of the limit of a sequence of stochastic process, and the minimizer of S n converges to the minimizers of F α 0 , β 0 , α, β − L n .

Journal of Probability and Statistics
By the convexity of the function F α 0 , β 0 , α, β , we have .

6.19
By interchanging summation signs and noting that A 1 can be rewritten as .

6.22
By Assumptions 1 -7 , and using the methods used in the proof of Theorem 1 in Tang and Wang 19 , we can easily check that the Lindeberg-Feller's condition for t A 1 is held.So Cramer-Wold theorem tells us 3.8 follows.This completes the proof of Theorem 3.3.

Appendix Integral Mean Value Theorem
If f and g are continuous functions defined on a, b , and g x ≥ 0 or g x ≤ 0 , then there exists a number ξ ∈ a, b such that

and β √ nh 2 b
− b u 0 h 16 and form a new equivalent problem as follows:

Figure 2 :
Figure 2: Simulation results, for example.Dotted curves are L 1 estimators; solid curve are the true curves.

Figure 3 :
Figure 3: Simulation results, for example, with outliers.Dotted curves are L 1 estimators; solid curve are the true curves.
fit the data.The first 250 samples are used to establish the model, and the remained 46 samples are used for prediction.
Z k .According to Wang and Scott 13 , this bandwidth is better than cross-validation CV bandwidth h CV .The latter one is suggested by Rice and Silverman 15 and often used in curve regression as in Hoover et al. 6 and Wu et al. 7 .
al. 1 , the difference between the fitting MAE/MSE and the forecasting MAE/MSE is small.
b Figure 5: a Fitted values of O t t 5, . .., 250 for gas data; b Predictive values O t t 251, . .., 296 for gas data.Dotted lines are fitted or predicted; solid lines are observed.The reason is that L 1 -methods are employed in our method.The fitted values { O t }