The Oracle Inequalities on Simultaneous Lasso and Dantzig Selector in High-Dimensional Nonparametric Regression

During the last few years, a great deal of attention has been focused on the Lasso and the Dantzig selector in high-dimensional linear regression when the number of variables can be much larger than the sample size. Under a sparsity scenario, many authors (see, e.g., Bickel et al., 2009; Bunea et al., 2007; Candes and Tao, 2007; Donoho et al., 2006; Koltchinskii, 2009; Meinshausen and Yu, 2009; Rosenbaum and Tsybakov, 2010; Tsybakov, 2006; van de Geer, 2008; and Zhang and Huang, 2008) have discussed the relations between the Lasso and the Dantzig selector and derived sparsity oracle inequalities for the prediction risk and bounds on the $\ell_p$ estimation loss. In this paper, we point out that some of these authors overemphasize the role of a particular sparsity condition and that assumptions built on this condition may lead to needlessly loose bounds. We give better assumptions and methods that avoid using this sparsity condition. In comparison with the results of Bickel et al. (2009), more precise oracle inequalities for the prediction risk and bounds on the $\ell_p$ estimation loss are derived when the number of variables can be much larger than the sample size.


Introduction
During the last few years, a great deal of attention has been focused on the $\ell_1$-penalized least squares (Lasso) estimator of parameters in high-dimensional linear regression when the number of variables can be much larger than the sample size (see, e.g., [1–12]). Quite recently, Candes and Tao [13] proposed the Dantzig selector for such linear models, and other authors [1, 6, 14–22] have discussed the Dantzig selector and established its properties under a sparsity scenario, that is, when the number of nonzero components of the true vector of parameters is small. Lasso estimators have also been studied in the nonparametric regression setup (see [23–26]). In particular, Bunea et al. [23, 24] obtain sparsity oracle inequalities for the prediction loss in this context and point out the implications for minimax estimation in classical nonparametric regression settings, as well as for the problem of aggregation of estimators. Modified versions of Lasso estimators (with nonquadratic loss terms and/or penalties slightly different from $\ell_1$) for nonparametric regression with random design are suggested and studied under prediction loss in Koltchinskii [27] and van de Geer [28]. Sparsity oracle inequalities for the Dantzig selector with random design are obtained by Koltchinskii [29]. In linear fixed-design regression, Meinshausen and Yu [7] establish a bound on the $\ell_2$ loss for the coefficients of the Lasso that is quite different from the bound on the same loss for the Dantzig selector proven in Candes and Tao [13]. Bickel et al. [15] show that, under a sparsity scenario, the Lasso and the Dantzig selector exhibit similar behavior, both for linear regression and for nonparametric regression models, for $\ell_2$ prediction loss and for $\ell_p$ loss in the coefficients for $1 \le p \le 2$. In the nonparametric regression model, they prove sparsity oracle inequalities for the Lasso and the Dantzig selector; moreover, the Lasso and the Dantzig selector are approximately equivalent in terms of the prediction loss. They develop geometrical assumptions that are considerably weaker than those of Candes and Tao [13] for the Dantzig selector and of Bunea et al. [23] for the Lasso.
We give assumptions equivalent to those of Bickel et al. [15] and derive oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the $\ell_p$ estimation loss in the linear model, that are more precise than those of Bickel et al. [15] when the number of variables can be much larger than the sample size. We begin, in the next section, by defining the Lasso and Dantzig procedures and fixing the notation. In Section 3, we present our three key assumptions and discuss their relations to the assumptions of Bickel et al. [15]. In Section 4, we give some equivalence results and sparsity oracle inequalities for the Lasso and Dantzig estimators in the general nonparametric regression model, improving the corresponding results of Bickel et al. [15]. Concluding remarks are given in Section 5.

Definitions and Notations
Unless stated otherwise, all of our notations, definitions, and terminologies follow Bickel et al. [15]. Let $(Z_1, Y_1), \ldots, (Z_n, Y_n)$ be a sample of independent random pairs with
$$Y_i = f(Z_i) + W_i, \qquad i = 1, \ldots, n,$$
where $f : \mathcal{Z} \to \mathbb{R}$ is an unknown regression function to be estimated, $\mathcal{Z}$ is a Borel subset of $\mathbb{R}^d$, the $Z_i$'s are fixed elements in $\mathcal{Z}$, and the regression errors $W_i$ are Gaussian. Let $\mathcal{F}_M = \{f_1, \ldots, f_M\}$ be a finite dictionary of functions $f_j : \mathcal{Z} \to \mathbb{R}$, $j = 1, \ldots, M$. We assume throughout that $M \ge 2$.
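For completeness, we sketch the two procedures in the form used by Bickel et al. [15]. Here $f_\beta = \sum_{j=1}^M \beta_j f_j$ for $\beta = (\beta_1, \ldots, \beta_M)^\top$, $\|g\|_n^2 = n^{-1}\sum_{i=1}^n g^2(Z_i)$ denotes the empirical norm, and $r = A\sigma\sqrt{\log M / n}$ for a sufficiently large constant $A$; the display summarizes the standard definitions and should not be read as a verbatim quotation of [15]:
$$\hat\beta_L \in \arg\min_{\beta \in \mathbb{R}^M} \Big\{ \frac{1}{n}\sum_{i=1}^{n} \big(Y_i - f_\beta(Z_i)\big)^2 + 2r \sum_{j=1}^{M} \|f_j\|_n\, |\beta_j| \Big\} \qquad \text{(Lasso)},$$
$$\hat\beta_D \in \arg\min \Big\{ |\beta|_1 : \Big| \frac{1}{n}\sum_{i=1}^{n} f_j(Z_i)\big(Y_i - f_\beta(Z_i)\big) \Big| \le r \|f_j\|_n,\ j = 1, \ldots, M \Big\} \qquad \text{(Dantzig selector)}.$$
Both procedures use the same threshold $r$, which is what makes a simultaneous analysis of the two estimators possible.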
Finally, for any $n \ge 1$ and $M \ge 2$, we consider the Gram matrix
$$\Psi_n = \frac{1}{n} X^\top X, \qquad X = \big(f_j(Z_i)\big)_{1 \le i \le n,\ 1 \le j \le M},$$
and let $\phi_{\max}$ denote the maximal eigenvalue of $\Psi_n$.

Discussion of the Assumptions
Under the sparsity scenario, we are typically interested in the case where $M > n$ and even $M \gg n$. Here, sparsity specifies that the high-dimensional vector $\beta$ has coefficients that are mostly $0$. Clearly, the matrix $\Psi_n$ is then degenerate, and ordinary least squares does not work in this case, since it requires positive definiteness of $\Psi_n$, that is,
$$\min_{\delta \ne 0} \frac{|X\delta|_2}{\sqrt{n}\,|\delta|_2} > 0. \qquad (12)$$
It turns out that the Lasso and Dantzig selector require much weaker assumptions. The idea of Bickel et al. [15] is that the minimum in (12) be replaced by the minimum over a restricted set of vectors, and the norm $|\delta|_2$ in the denominator of the condition be replaced by the $\ell_2$ norm of only a part of $\delta$. This is feasible because, for the linear regression model, the residuals $\delta = \hat\beta_L - \beta$ and $\delta = \hat\beta_D - \beta$ satisfy
$$|\delta_{J_0^c}|_1 \le c_0 |\delta_{J_0}|_1, \qquad (13)$$
with $c_0 = 1$ by Candes and Tao [13] and $c_0 = 3$ by Bickel et al. [15], respectively, where $c_0 > 0$ and $J_0 = J(\beta)$ is the set of nonzero coefficients of the true parameter $\beta$ of the model. Therefore, for any $\delta$ satisfying (13), we have
$$\delta^\top \Psi_n \delta \;\ge\; \delta^\top \Psi \delta - \varepsilon |\delta|_1^2 \;\ge\; \big(\phi_{\min}(\Psi) - \varepsilon (1+c_0)^2 |J_0|\big)\, |\delta_{J_0}|_2^2,$$
where $\Psi$ is a positive definite matrix with minimal eigenvalue $\phi_{\min}(\Psi) > 0$ and
$$\varepsilon = \max_{i,j} \big|(\Psi_n - \Psi)_{i,j}\big|.$$
Thus, we have a kind of "restricted" positive definiteness if $\varepsilon |J_0|$ is small enough. This results in the following restricted eigenvalue (RE) assumption.
Assumption RE($s$, $c_0$) (Bickel et al. [15]). For some integer $s$ such that $1 \le s \le M$ and a positive number $c_0$, the following condition holds:
$$\kappa(s, c_0) \;=\; \min_{\substack{J_0 \subseteq \{1,\ldots,M\} \\ |J_0| \le s}} \; \min_{\substack{\delta \ne 0 \\ |\delta_{J_0^c}|_1 \le c_0 |\delta_{J_0}|_1}} \frac{|X\delta|_2}{\sqrt{n}\,|\delta_{J_0}|_2} \;>\; 0.$$
The purpose of this assumption appears to be to make available the bound $|\delta_{J_0}|_2 \le |X\delta|_2 / (\sqrt{n}\,\kappa(s, c_0))$; Bickel et al. [15] use it frequently in the proofs of their theorems, and so do Candes and Tao [13].
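Spelled out, under the cone condition (13) and Assumption RE($s$, $c_0$), this yields the chain of inequalities that drives those proofs (a routine consequence of the Cauchy–Schwarz inequality, recorded here for later reference):
$$|\delta|_1 \;=\; |\delta_{J_0}|_1 + |\delta_{J_0^c}|_1 \;\le\; (1+c_0)\,|\delta_{J_0}|_1 \;\le\; (1+c_0)\sqrt{s}\,|\delta_{J_0}|_2 \;\le\; \frac{(1+c_0)\sqrt{s}}{\kappa(s, c_0)} \cdot \frac{|X\delta|_2}{\sqrt{n}}.$$
Each inequality in this chain is a potential source of slack in the final constants.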
Note that the role of (13) is only to restrict the set of vectors $\delta$ over which the minimum is taken. Therefore, it is not necessary that the norm $|\delta|_2$ in the denominator of (12) be replaced by the $\ell_2$ norm of only a part of $\delta$. We give the following assumptions.
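In the spirit of this discussion, one natural formalization of the first of these assumptions keeps the cone constraint of (13) while retaining the full norm $|\delta|_2$ in the denominator:
$$\text{Assumption RE}_1(s, c_0):\qquad \kappa_1(s, c_0) \;=\; \min_{\substack{J_0 \subseteq \{1,\ldots,M\} \\ |J_0| \le s}} \; \min_{\substack{\delta \ne 0 \\ |\delta_{J_0^c}|_1 \le c_0 |\delta_{J_0}|_1}} \frac{|X\delta|_2}{\sqrt{n}\,|\delta|_2} \;>\; 0.$$
Note that on the cone (13) one has $|\delta_{J_0}|_2 \le |\delta|_2 \le (1 + c_0^2 s)^{1/2}\,|\delta_{J_0}|_2$ (using $|\delta_{J_0^c}|_2 \le |\delta_{J_0^c}|_1 \le c_0\sqrt{s}\,|\delta_{J_0}|_2$), so that $\kappa(s, c_0) \ge \kappa_1(s, c_0) \ge \kappa(s, c_0)/(1 + c_0^2 s)^{1/2}$: as positivity conditions, RE$_1$($s$, $c_0$) and RE($s$, $c_0$) are equivalent, while $\kappa_1$ bounds the full norm $|\delta|_2$ directly. Assumption RE$_2$($s$, $c_0$) may be read as the analogous modification of Assumption RE($s$, $m$, $c_0$) of Bickel et al. [15], which they use for the $\ell_p$ estimation bounds.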
In Section 4, we will see that RE$_1$($s$, $c_0$) and RE$_2$($s$, $c_0$) are both better suited than RE($s$, $c_0$), since they invoke the enlargement step $|\delta_{J_0^c}|_1 \le c_0 |\delta_{J_0}|_1$ as little as possible. The inequalities obtained under them are therefore more precise.

Comparisons with the Results by Bickel et al.
In the following, we give bounds on the prediction loss when the number of nonzero components of the Lasso or the Dantzig selector is small compared to the sample size.
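For orientation, the benchmark bounds of Bickel et al. [15] for the linear model have, up to multiplicative constants depending on $A$ and with probability close to one (polynomially in $M$), the following shape; we suppress the explicit constants, which is precisely where the improvements of this section are located:
$$\frac{1}{n}\,|X(\hat\beta - \beta)|_2^2 \;\lesssim\; \frac{s\,\sigma^2 \log M}{n\,\kappa^2(s, c_0)}, \qquad |\hat\beta - \beta|_p^p \;\lesssim\; s \left( \frac{\sigma}{\kappa^2(s, c_0)} \sqrt{\frac{\log M}{n}} \right)^{p}, \quad 1 \le p \le 2,$$
valid for both the Lasso and the Dantzig selector under the corresponding RE assumptions.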

In fact, the corresponding results have already been enlarged through the use of $|\delta_{J_0^c}|_1 \le c_0 |\delta_{J_0}|_1$ when solving the Lasso and Dantzig selector problems. When proving sparsity oracle inequalities for the prediction loss and bounds on the $\ell_p$ estimation loss, using $|\delta_{J_0^c}|_1 \le c_0 |\delta_{J_0}|_1$ a second time necessarily enlarges the inequalities again and results in reduced accuracy.
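To make the mechanism explicit: each application of the cone condition (13) trades the full $\ell_1$ norm for its restriction to $J_0$ at the price of a factor $1 + c_0$, so that two passes give, schematically,
$$|\delta|_1^2 \;\le\; (1+c_0)^2\, |\delta_{J_0}|_1^2 \;\le\; (1+c_0)^2\, s\, |\delta_{J_0}|_2^2,$$
with $(1+c_0)^2 = 16$ when $c_0 = 3$. An assumption such as RE$_1$($s$, $c_0$), which controls $|\delta|_2$ directly, makes the second invocation unnecessary; this is the source of the sharper constants announced above.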