Journal of Applied Mathematics, Hindawi Publishing Corporation, Volume 2014, Article ID 946241. doi:10.1155/2014/946241

Research Article

Weaker Regularity Conditions and Sparse Recovery in High-Dimensional Regression

Shiqing Wang (1), Yan Shi (2), Limin Su (1), and Yuesheng Xu (1)
(1) College of Mathematics and Information Sciences, North China University of Water Resources and Electric Power, Zhengzhou 450045, China
(2) Institute of Environmental and Municipal Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, China

Received 27 October 2013; Accepted 7 July 2014; Published 17 July 2014

Copyright © 2014 Shiqing Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Regularity conditions play a pivotal role in sparse recovery in high-dimensional regression. In this paper, we present a weaker regularity condition and discuss its relationship to other regularity conditions, such as the restricted eigenvalue condition. We study the behavior of our new condition for design matrices with independent random columns uniformly drawn on the unit sphere. Moreover, we show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. Based on both methods, we derive, in parallel, more precise bounds for the estimation loss and the prediction risk in the linear regression model when the number of variables can be much larger than the sample size.

1. Introduction

In recent years, problems of statistical inference in the high-dimensional setting, in which the dimension p of the data exceeds the sample size n, have attracted a great deal of attention. One concrete instance of a high-dimensional inference problem concerns the standard linear regression model (1) $y = X\beta + W$, where $X \in \mathbb{R}^{n\times p}$ is called the design matrix, $\beta \in \mathbb{R}^p$ is an unknown target vector, and $W \in \mathbb{R}^n$ is a stochastic error term; the goal is to estimate the vector $\beta$ based on the response $y$ and the covariates $X = (X_1, \ldots, X_p)$. In the setting $p \gg n$, the classical linear regression model is unidentifiable, so that it is not meaningful to estimate the parameter vector $\beta \in \mathbb{R}^p$ without further structure.

However, many high-dimensional regression problems exhibit special structure that can lead to an identifiable model. In particular, sparsity in the regression vector β is an archetypal example of such structure; that is, only a few components of β, say s of them, are different from zero, and β is then said to be s-sparse. There has been great interest in the study of this problem recently. The use of the $l_1$-norm penalty to enforce sparsity has been very successful, and several methods build on it, such as the Lasso [1], basis pursuit [2], and the Dantzig selector [3]. Sparsity has also been exploited in a number of other problems, for instance, instrumental variable regression in the presence of endogeneity [4].

There is now a well-developed theory of the conditions required on the design matrix $X \in \mathbb{R}^{n\times p}$ for such $l_1$-based relaxations to reliably estimate β; see, for example, [5, 8]. The restricted eigenvalue (RE) condition due to Bickel et al. [10] is among the weakest of the conditions mentioned above. Wang and Su [7, 13] presented conditions equivalent to it. There is also a large body of work in the high-dimensional setting, for example, [3, 6, 12, 17–19], based on a uniform uncertainty principle (UUP), a condition that is stronger than the RE condition; see [10, 20]. In this paper, we consider a restricted eigenvalue condition that is weaker than the RE conditions in [7, 10, 13] under a certain setting.

Thus, in the setting of high-dimensional linear regression, the interesting question is how to accurately estimate the regression vector β and the response Xβ from few and corrupted observations. In the standard form, under assumptions on the matrix X and with high probability, the estimation bounds are of the form $C\|\beta\|_0(\log p/n)^{q/2}$ (e.g., see [7, 8, 13, 21]), and the prediction errors are bounded by $C\log(p)\|\beta\|_0$ (e.g., see [1, 7, 21]), where C is a positive constant.

The main contribution of this paper is the following: we present a restricted eigenvalue assumption that is weaker, in a certain setting, than the RE conditions of previous papers. Using the $l_1$-norm penalty, our results are more precise than the existing ones. It remains an open question to find a weaker assumption that yields better results in all settings.

The remainder of this paper is organized as follows. We begin in Section 2 with some notations and definitions. In Section 3, we introduce some assumptions and discuss the relation between our assumptions and the existing ones. Section 4 contains our main results, and we also show the approximate equivalence between the Lasso and the Dantzig selector. We give three lemmas and the proofs of the theorems in Section 5.

2. Preliminaries

In this section, we introduce some notations and definitions.

Let $\beta \in \mathbb{R}^p$. We denote by (2) $M(\beta) = \sum_{j=1}^{p} I\{\beta_j \neq 0\} = |J(\beta)|$ the number of nonzero coordinates of β, where $I\{\cdot\}$ denotes the indicator function, (3) $J(\beta) = \{j \in \{1,\ldots,p\} : \beta_j \neq 0\}$, and $|J|$ the cardinality of $J$. We use the standard notation (4) $\|\beta\|_q = (\sum_{i=1}^{p} |\beta_i|^q)^{1/q}$ for the $l_q$-norm of the vector β. Moreover, a vector β is said to be k-sparse if $\|\beta\|_0 \le k$; that is, it has at most k nonzero entries. For a vector $\Delta \in \mathbb{R}^p$ and a subset $J \subseteq \{1,\ldots,p\}$, we denote by $\Delta_J$ the vector in $\mathbb{R}^p$ that has the same coordinates as Δ on J and zero coordinates on the complement $J^c$ of J.
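As a quick illustration of this notation, the quantities $M(\beta)$, $\|\beta\|_1$, and the restricted vector $\Delta_J$ can be computed in a few lines; this is only an illustrative sketch using made-up vectors:

```python
import numpy as np

# illustrative (hypothetical) vectors
beta = np.array([0.0, 3.0, 0.0, -1.5, 0.0])
J = np.flatnonzero(beta)          # J(beta), the support: here {1, 3}
M = J.size                        # M(beta), the number of nonzero coordinates
l1 = np.sum(np.abs(beta))         # ||beta||_1

delta = np.array([1.0, -2.0, 0.5, 0.0, 4.0])
delta_J = np.zeros_like(delta)
delta_J[J] = delta[J]             # delta_J: same as delta on J, zero on J^c
```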

For the linear regression model (1), regularized estimation with the $l_1$-norm penalty, also known as the Lasso [1] or basis pursuit [2], refers to the convex optimization problem (5) $\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \{\frac{1}{n}\|X\beta - y\|_2^2 + 2\lambda\|\beta\|_1\}$, where $\lambda > 0$ is a penalization parameter. The Dantzig selector has been introduced by Candes and Tao [3] as (6) $\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \|\beta\|_1$ subject to $\frac{1}{n}\|X^T(y - X\beta)\|_\infty \le \lambda$, where $\lambda > 0$ is a tuning parameter. It is known that (6) can be recast as a linear program; hence, it is also computationally tractable.
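For readers who wish to experiment, the convex program (5) can be solved numerically by proximal gradient descent (ISTA). The sketch below is not part of this paper's theory, just a standard first-order method for the same objective, here taken to be $(1/n)\|Xb - y\|_2^2 + 2\lambda\|b\|_1$, run on synthetic data with an illustrative choice of λ:

```python
import numpy as np

def soft_threshold(v, t):
    # componentwise soft-thresholding: the proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize (1/n)||X b - y||_2^2 + 2*lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1/L, L = Lipschitz const. of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ b - y) / n          # gradient of the smooth part
        b = soft_threshold(b - step * grad, step * 2.0 * lam)
    return b

# synthetic sparse recovery problem (illustrative sizes and signal)
rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)
lam = 0.5 * np.sqrt(np.log(p) / n)                  # lambda of order sigma*sqrt(log p / n)
b_hat = lasso_ista(X, y, lam)
```

On this well-conditioned instance, the estimate recovers the three-element support and is close to β in the $l_\infty$-norm.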

For an integer $1 \le s \le p/2$ and an s-sparse vector $\beta \in \mathbb{R}^p$, let $\beta_{J_0} \in \mathbb{R}^{|J_0|}$ be the subvector of $\beta \in \mathbb{R}^p$ confined to $J_0$. One of the common properties of the Lasso and the Dantzig selector is that, for an appropriately chosen λ and the vector $\delta = \hat\beta - \beta$, where $\hat\beta$ is the solution of either the Lasso or the Dantzig selector, it holds with high probability (cf. Lemmas 11 and 12) that (7) $\|\delta_{J_0^c}\|_1 \le c_0\|\delta_{J_0}\|_1$, where $c_0 > 0$, with $c_0 = 1$ for the Dantzig selector by Candes and Tao [3] and $c_0 = 3$ for the Lasso by Bickel et al. [10], and (8) $J_0 = J(\beta) \subseteq \{1, 2, \ldots, p\}$ is the set of nonzero coefficients of the true parameter β of the model.

Finally, for any $n \ge 1$, $p \ge 2$, we consider the Gram matrix (9) $\Psi_n = \frac{1}{n}X^T X$, where X is the design matrix in model (1) and $X^T \in \mathbb{R}^{p\times n}$ denotes the transpose of X.

3. Discussion of the Assumption

Under the sparsity scenario, we are typically interested in the case where $p > n$, and even $p \gg n$. Here, sparsity specifies that the high-dimensional vector β has coefficients that are mostly 0. Clearly, the matrix $\Psi_n$ is degenerate, and ordinary least squares does not work in this case, since it requires positive definiteness of $\Psi_n$; that is, (10) $\min_{\delta \in \mathbb{R}^p,\, \delta \neq 0} \frac{\|X\delta\|_2}{\sqrt{n}\,\|\delta\|_2} > 0$. It turns out that the Lasso and the Dantzig selector require much weaker assumptions. The idea of Bickel et al. [10] is that the minimum in (10) be taken over a restricted set of vectors and that the norm $\|\delta\|_2$ in the denominator of the condition be replaced by the $l_2$-norm of only a part of δ. Note that the role of (7) is to restrict the set of vectors $\{\delta \in \mathbb{R}^p : \delta \neq 0\}$ to (11) $\{\delta \in \mathbb{R}^p : \delta \neq 0,\ \|\delta_{J_0^c}\|_1 \le c_0\|\delta_{J_0}\|_1\}$.
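The degeneracy of $\Psi_n$ when $p > n$ is easy to see numerically: a $p \times p$ Gram matrix built from only $n < p$ rows has rank at most n. A minimal sketch with an arbitrary Gaussian design:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                       # more variables than observations
X = rng.standard_normal((n, p))
Psi = X.T @ X / n                   # the p x p Gram matrix (9)
rank = np.linalg.matrix_rank(Psi)   # rank(Psi) <= n < p, so Psi is singular
```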

Assumption 1 (RE$(s, c_0)$, Bickel et al. [10]).

For some integer s such that $1 \le s \le p$ and a positive number $c_0$, the following condition holds: (12) $\kappa(s, c_0) \triangleq \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s}\ \min_{\delta \neq 0,\, \|\delta_{J_0^c}\|_1 \le c_0\|\delta_{J_0}\|_1} \frac{\|X\delta\|_2}{\sqrt{n}\,\|\delta_{J_0}\|_2} > 0$.

Bickel et al. [10] showed that the bounds on the estimation error and the prediction error are $C\|\beta\|_0(\log p/n)^{q/2}$ and $C\log(p)\|\beta\|_0$, respectively, for both the Lasso and the Dantzig selector, where C is a positive constant and $\|\beta\|_0$ is the sparsity level. Next, we describe the RE$\tau_2(s, c_0)$ assumption presented by Wang and Su [7], which is obtained by replacing $\|\delta\|_2$ with its upper bound $\|\delta\|_1$ in (10).

Assumption 2 (RE$\tau_2(s, c_0)$, Wang and Su [7]).

For some integer s such that $1 \le s \le p$ and a positive number $c_0$, the following condition holds: (13) $\tau_2(s, c_0) \triangleq \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s}\ \min_{\delta \neq 0,\, \|\delta_{J_0^c}\|_1 \le c_0\|\delta_{J_0}\|_1} \frac{\|X\delta\|_2}{\sqrt{n}\,\|\delta\|_1} > 0$.

The two conditions are very similar; the only difference is the $l_1$- versus $l_2$-norm of a part of δ in the denominator. The RE$\tau_2(s, c_0)$ condition is equivalent to RE$(s, c_0)$; see [7, 13] for the discussion of equivalence. The bounds on estimation and prediction in [7, 13] are more precise than those derived in Bickel et al. [10] and do not rely on the sparsity level $\|\beta\|_0$.

In order to obtain our regularity condition in this paper, we decompose δ into a set of vectors $\delta_{S_0}, \delta_{S_1}, \delta_{S_2}, \ldots, \delta_{S_K}$, such that $S_0$ corresponds to the locations of the s largest coefficients of δ in absolute value, $S_1$ corresponds to the locations of the next s largest coefficients of $\delta_{S_0^c}$ in absolute value, and so on. Hence, we have $S_0^c = \bigcup_{k=1}^{K} S_k$, where $K \ge 1$, $|S_k| = s$ for all $k = 1, \ldots, K-1$, and $|S_K| \le s$.

Now, for each $j \ge 1$, we have (14) $\|\delta_{S_j}\|_2 \le \sqrt{s}\,\|\delta_{S_j}\|_\infty \le \frac{1}{\sqrt{s}}\|\delta_{S_{j-1}}\|_1$, where $\|\cdot\|_\infty$ denotes the largest entry of a vector in absolute value, and hence (15) $\|\delta_{S_0^c}\|_2 \le \sum_{k \ge 1} \|\delta_{S_k}\|_2 \le s^{-1/2}\left(\|\delta_{S_0}\|_1 + \|\delta_{S_1}\|_1 + \|\delta_{S_2}\|_1 + \cdots\right) \le s^{-1/2}\left(\|\delta_{S_0}\|_1 + \|\delta_{S_0^c}\|_1\right) = s^{-1/2}\|\delta\|_1$.
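Inequality (15) can be checked numerically; the sketch below draws a random δ, takes $S_0$ to be the positions of the s largest entries in absolute value, and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(0)
p, s = 200, 10
delta = rng.standard_normal(p)

order = np.argsort(-np.abs(delta))           # indices by decreasing |delta_j|
S0, S0c = order[:s], order[s:]               # S_0: the s largest entries in absolute value
lhs = np.linalg.norm(delta[S0c])             # ||delta_{S_0^c}||_2
rhs = np.linalg.norm(delta, 1) / np.sqrt(s)  # s^{-1/2} * ||delta||_1
```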

Replacing $\|\delta\|_1$ with $\sqrt{s}\,\|\delta_{S_0^c}\|_2$ in (13), we get the following assumption.

Assumption 3 (LR$\varphi_1(s, c_0)$).

For some integer s such that $1 \le s \le p$ and a positive number $c_0$, the following condition holds: (16) $\varphi_1(s, c_0) \triangleq \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s}\ \min_{\delta \neq 0,\, \|\delta_{J_0^c}\|_1 \le c_0\|\delta_{J_0}\|_1} \frac{\|X\delta\|_2}{\sqrt{ns}\,\|\delta_{J_0^c}\|_2} > 0$.

The inequality $\sqrt{s}\,\|\delta_{S_0^c}\|_2 \le \|\delta\|_1$ immediately implies that the assumption LR$\varphi_1(s, c_0)$ is weaker than the assumptions RE$\tau_2(s, c_0)$ and RE$(s, c_0)$. However, the norm $\|\delta_{J_0^c}\|_2$ in the denominator of (16) makes the proofs more complicated. For the sake of simplicity, we therefore use an equivalent form of LR$\varphi_1(s, c_0)$, obtained along the lines of the equivalence discussions in [7, 13].

Assumption 4 (LR$\varphi_2(s, c_0)$).

For some integer s such that $1 \le s \le p$ and a positive number $c_0$, the following condition holds: (17) $\varphi_2(s, c_0) \triangleq \min_{J_0 \subseteq \{1,\ldots,p\},\, |J_0| \le s}\ \min_{\delta \neq 0,\, \|\delta_{J_0^c}\|_1 \le c_0\|\delta_{J_0}\|_1} \frac{\sqrt{s}\,\|X\delta\|_2}{\sqrt{n}\,\|\delta_{J_0}\|_1} > 0$.
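The constant $\varphi_2(s, c_0)$ in (17) is defined by a minimum over a cone and is hard to compute exactly; sampling directions in the cone only gives an upper estimate of the minimum (and only for one fixed candidate support $J_0$), as in this illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, s, c0 = 50, 100, 5, 3
X = rng.standard_normal((n, p))
J0 = np.arange(s)                  # one fixed candidate support (illustrative)
J0c = np.arange(s, p)

def cone_ratio():
    # draw a direction with the cone constraint ||d_{J0^c}||_1 <= c0*||d_{J0}||_1
    # active with equality, and return sqrt(s)*||X d||_2 / (sqrt(n)*||d_{J0}||_1)
    d = rng.standard_normal(p)
    d[J0c] *= c0 * np.abs(d[J0]).sum() / np.abs(d[J0c]).sum()
    return np.sqrt(s) * np.linalg.norm(X @ d) / (np.sqrt(n) * np.abs(d[J0]).sum())

# minimum over random samples: an UPPER estimate of the min in (17) for this J0
phi2_upper = min(cone_ratio() for _ in range(500))
```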

Either of the two conditions above can be used for the sparse recovery problems in high-dimensional regression considered here. For technical reasons, we only give the results when LR$\varphi_2(s, c_0)$ is satisfied.

4. Main Results of Sparse Recovery for Regression Model

In order to provide performance guarantees for the $l_1$-norm penalty applied to sparse linear models, it suffices to assume that the regularity conditions are satisfied. In this section, we present our main results under the assumption LR$\varphi_2(s, c_0)$. For convenience, we assume that all the diagonal elements of the matrix $X^T X/n$ are equal to 1.

We first prove a type of approximate equivalence between the Lasso and the Dantzig selector. Similar results on equivalence can be found in [7, 10, 13]. It is expressed as closeness of the prediction losses $\|X\beta - X\hat\beta_D\|_2^2$ and $\|X\beta - X\hat\beta_L\|_2^2$ when the number of nonzero components of the Lasso or the Dantzig selector is small as compared to the sample size.

Theorem 5.

For the linear model (1), let $W_i \sim \mathcal{N}(0, \sigma^2)$ be independent random variables with $\sigma^2 > 0$. Consider the Lasso estimator $\hat\beta_L$ and the Dantzig estimator $\hat\beta_D$ defined by (5) and (6) with the same λ. If LR$\varphi_2(s, 1)$ is satisfied, then, with probability at least $1 - p^{1-A^2/8}$, one has (18) $\left|\frac{1}{n}\|X\beta - X\hat\beta_D\|_2^2 - \frac{1}{n}\|X\beta - X\hat\beta_L\|_2^2\right| \le \frac{16\lambda^2 s}{\varphi_2^2}$, where $\varphi_2 = \varphi_2(s, 1)$.

Next, we give bounds on the rate of convergence of the Lasso and the Dantzig selector.

Theorem 6.

For the linear model (1), let $W_i \sim \mathcal{N}(0, \sigma^2)$ be independent random variables with $\sigma^2 > 0$. Consider the Lasso estimator $\hat\beta_L$ defined by (5) with $\lambda > 2\lambda_0 > 0$. If LR$\varphi_2(s, 3)$ is satisfied, then, with probability at least $1 - p\exp(-\lambda^2 n/(8\sigma^2))$, one has (19) $\|\hat\beta_L - \beta\|_1 \le \frac{4\lambda s}{\varphi_2^2}$, (20) $\|\hat\beta_L - \beta\|_2 \le \frac{4\lambda s}{\varphi_2^2}\left(\frac{3}{5} + \frac{1}{\sqrt{s}}\right)$, (21) $\frac{1}{n}\|X(\hat\beta_L - \beta)\|_2^2 \le \frac{144\lambda^2 s}{25\varphi_2^2}$, where $\varphi_2 = \varphi_2(s, 3)$.

Theorem 7.

For the linear model (1), let $W_i \sim \mathcal{N}(0, \sigma^2)$ be independent random variables with $\sigma^2 > 0$. Consider the Dantzig selector $\hat\beta_D$ defined by (6) with $\lambda > \lambda_0 > 0$. If LR$\varphi_2(s, 1)$ is satisfied, then, with probability at least $1 - p\exp(-\lambda^2 n/(2\sigma^2))$, one has (22) $\|\hat\beta_D - \beta\|_1 \le \frac{8\lambda s}{\varphi_2^2}$, (23) $\|\hat\beta_D - \beta\|_2 \le \frac{4\lambda s}{\varphi_2^2}\left(1 + \frac{2}{\sqrt{s}}\right)$, (24) $\frac{1}{n}\|X(\hat\beta_D - \beta)\|_2^2 \le \frac{16\lambda^2 s}{\varphi_2^2}$, where $\varphi_2 = \varphi_2(s, 1)$.

Remark 8.

We have imposed no conditions on the parameter λ. As in [10], we can rewrite λ in terms of another parameter A in order to clarify the notation: (25) $\lambda = A\sigma\sqrt{\frac{\log p}{n}}$, $A > 2\sqrt{2}$. Then, the results of Theorems 5–7 read as follows: (26) $\left|\frac{1}{n}\|X\beta - X\hat\beta_D\|_2^2 - \frac{1}{n}\|X\beta - X\hat\beta_L\|_2^2\right| \le \frac{16A^2\sigma^2 s\log p}{n\,\varphi_2^2(s,1)}$, $\|\hat\beta_L - \beta\|_1 \le \frac{4A\sigma s}{\varphi_2^2(s,3)}\sqrt{\frac{\log p}{n}}$, $\|\hat\beta_L - \beta\|_2 \le \frac{4A\sigma s}{\varphi_2^2(s,3)}\left(\frac{3}{5} + \frac{1}{\sqrt{s}}\right)\sqrt{\frac{\log p}{n}}$, $\frac{1}{n}\|X(\hat\beta_L - \beta)\|_2^2 \le \frac{144A^2}{25\varphi_2^2(s,3)}\,\frac{\sigma^2 s\log p}{n}$, $\|\hat\beta_D - \beta\|_1 \le \frac{8A\sigma s}{\varphi_2^2(s,1)}\sqrt{\frac{\log p}{n}}$, $\|\hat\beta_D - \beta\|_2 \le \frac{4A\sigma s}{\varphi_2^2(s,1)}\left(1 + \frac{2}{\sqrt{s}}\right)\sqrt{\frac{\log p}{n}}$, $\frac{1}{n}\|X(\hat\beta_D - \beta)\|_2^2 \le \frac{16A^2}{\varphi_2^2(s,1)}\,\frac{\sigma^2 s\log p}{n}$. The corresponding results of Theorems 7.1 and 7.2 in Bickel et al. [10], stated with the RE constant κ of (12), are (27) $\|\hat\beta_L - \beta\|_1 \le \frac{16A\sigma s}{\kappa^2(s,3)}\sqrt{\frac{\log p}{n}}$, $\frac{1}{n}\|X(\hat\beta_L - \beta)\|_2^2 \le \frac{16A^2}{\kappa^2(s,3)}\,\frac{\sigma^2 s\log p}{n}$, $\|\hat\beta_D - \beta\|_1 \le \frac{8A\sigma s}{\kappa^2(s,1)}\sqrt{\frac{\log p}{n}}$, $\frac{1}{n}\|X(\hat\beta_D - \beta)\|_2^2 \le \frac{16A^2}{\kappa^2(s,1)}\,\frac{\sigma^2 s\log p}{n}$. Comparing the bounds above, and noting that $\varphi_2(s, c_0) \ge \kappa(s, c_0)$ (since $\|\delta_{J_0}\|_1 \le \sqrt{s}\,\|\delta_{J_0}\|_2$), our results improve those in Bickel et al. [10].
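Plugging illustrative numbers into (25) and the bounds (19) and (21) shows the scale of the guarantees. The value of $\varphi_2^2$ below is an assumption for illustration only, since it depends on the design matrix:

```python
import numpy as np

# hypothetical problem sizes and constants, for illustration only
n, p, s = 200, 1000, 5
A, sigma = 3.0, 1.0            # A = 3 satisfies A > 2*sqrt(2)
phi2_sq = 1.0                  # assumed value of phi_2^2(s, 3); depends on X in practice

lam = A * sigma * np.sqrt(np.log(p) / n)        # lambda as in (25)
l1_bound = 4 * lam * s / phi2_sq                # right-hand side of (19)
pred_bound = 144 * lam**2 * s / (25 * phi2_sq)  # right-hand side of (21)
```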

Additionally, similar results for the Lasso can be found in Wang and Su [7]: (28) $\|\hat\beta_L - \beta\|_1 \le \frac{4A}{\tau_1^2(s,1)}\sigma\sqrt{\frac{\log p}{n}}$, $\frac{1}{n}\|X(\hat\beta_L - \beta)\|_2^2 \le \frac{144A^2}{25\tau_1^2(s,1)}\,\frac{\sigma^2\log p}{n}$.

It is clear that our results are more precise than the existing ones; see, for example, [7, 10].

Remark 9.

The assumptions LR$\varphi_1(s, c_0)$ and LR$\varphi_2(s, c_0)$ are weaker than the assumptions RE$\tau_2(s, c_0)$ and RE$(s, c_0)$, since $\sqrt{s}\,\|\delta_{S_0^c}\|_2 \le \|\delta\|_1$. Note that this inequality holds under the setting discussed in Section 3. That is, our weaker assumptions hold under certain conditions, but they cannot be considered better than those of previous papers in every setting.

5. Lemmas and the Proofs of the Results

In this section, we give three lemmas and the proofs of the theorems.

Lemma 10.

Let $W_i \sim \mathcal{N}(0, \sigma^2)$ be independent random variables with $\sigma^2 > 0$. Then, for any $\lambda_0 > 0$, (29) $P\left(\frac{1}{n}\|X^T W\|_\infty \ge \lambda_0\right) \le p\exp\left(-\frac{\lambda_0^2 n}{2\sigma^2}\right)$.

Proof.

Since $W_i \sim \mathcal{N}(0, \sigma^2)$ and the diagonal elements of $X^T X/n$ equal 1, the variable $n^{-1/2}\sigma^{-1}\sum_i x_{i,j}w_i$ is standard Gaussian for each j, and it immediately follows that (30) $P\left(\frac{1}{n}\|X^T W\|_\infty \ge \lambda_0\right) \le \sum_j P\left(\frac{1}{n}\left|\sum_i x_{i,j}w_i\right| \ge \lambda_0\right) = \sum_j P\left(\left|n^{-1/2}\sigma^{-1}\sum_i x_{i,j}w_i\right| \ge \frac{\lambda_0\sqrt{n}}{\sigma}\right) \le p\,P\left(|\eta| \ge \frac{\lambda_0\sqrt{n}}{\sigma}\right) \le p\exp\left(-\frac{\lambda_0^2 n}{2\sigma^2}\right)$, where $\eta \sim \mathcal{N}(0, 1)$.
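Lemma 10 combines a union bound with a Gaussian tail bound; it can be checked by Monte Carlo simulation, assuming (as in Section 4) columns normalized so that the diagonal of $X^T X/n$ is 1. The sizes and threshold below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma, lam0 = 100, 20, 1.0, 0.3
trials = 2000

# scale columns so that diag(X^T X / n) = 1, matching the normalization of Section 4
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)

hits = 0
for _ in range(trials):
    W = sigma * rng.standard_normal(n)
    if np.max(np.abs(X.T @ W)) / n >= lam0:
        hits += 1

emp = hits / trials                                # empirical exceedance frequency
bound = p * np.exp(-lam0**2 * n / (2 * sigma**2))  # right-hand side of (29)
```

On this instance the empirical frequency sits well below the theoretical bound, as the lemma predicts.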

Lemma 11.

Let $W_i \sim \mathcal{N}(0, \sigma^2)$ be independent random variables with $\sigma^2 > 0$, and let $\hat\beta_L$ be the Lasso estimator defined by (5). Then, with probability at least $1 - p\exp(-\lambda^2 n/(8\sigma^2))$, one has, simultaneously for all $\beta \in \mathbb{R}^p$ and $\lambda > 2\lambda_0$, (31) $\frac{1}{n}\|X\hat\beta_L - X\beta\|_2^2 + \lambda\sum_{j=1}^{p}|\beta_j - \hat\beta_{L,j}| \le 4\lambda\sum_{j \in J(\beta)}|\beta_j - \hat\beta_{L,j}|$, (32) $\frac{1}{n}\|X^T X(\beta - \hat\beta_L)\|_\infty \le \frac{3\lambda}{2}$.

Proof.

By the definition of $\hat\beta_L$, (33) $\frac{1}{n}\|Y - X\hat\beta_L\|_2^2 + 2\lambda\|\hat\beta_L\|_1 \le \frac{1}{n}\|Y - X\beta\|_2^2 + 2\lambda\|\beta\|_1$ for all $\beta \in \mathbb{R}^p$, which is equivalent to (34) $\frac{1}{n}\|X\hat\beta_L - X\beta\|_2^2 + 2\lambda\|\hat\beta_L\|_1 \le 2\lambda\|\beta\|_1 + \frac{2}{n}W^T X(\hat\beta_L - \beta)$. From Lemma 10, with probability at least $1 - p\exp(-\lambda^2 n/(8\sigma^2))$ we have $\frac{1}{n}\|X^T W\|_\infty \le \lambda/2$, so that (35) $\frac{1}{n}\|X\hat\beta_L - X\beta\|_2^2 \le 2\lambda\sum_{j=1}^{p}|\beta_j| - 2\lambda\sum_{j=1}^{p}|\hat\beta_{L,j}| + \lambda\sum_{j=1}^{p}|\beta_j - \hat\beta_{L,j}|$.

Adding the term $\lambda\sum_{j=1}^{p}|\beta_j - \hat\beta_{L,j}|$ to both sides of this inequality yields (36) $\frac{1}{n}\|X\hat\beta_L - X\beta\|_2^2 + \lambda\sum_{j=1}^{p}|\beta_j - \hat\beta_{L,j}| \le 2\lambda\sum_{j=1}^{p}|\beta_j| - 2\lambda\sum_{j=1}^{p}|\hat\beta_{L,j}| + 2\lambda\sum_{j=1}^{p}|\beta_j - \hat\beta_{L,j}| = 2\lambda\sum_{j=1}^{p}\left(|\beta_j - \hat\beta_{L,j}| + |\beta_j| - |\hat\beta_{L,j}|\right)$. Now, note that, for $j \notin J(\beta)$, we have $\beta_j = 0$, so (37) $|\beta_j - \hat\beta_{L,j}| + |\beta_j| - |\hat\beta_{L,j}| = 0$. So, using $|\beta_j| - |\hat\beta_{L,j}| \le |\beta_j - \hat\beta_{L,j}|$, we get (38) $\frac{1}{n}\|X\hat\beta_L - X\beta\|_2^2 + \lambda\sum_{j=1}^{p}|\beta_j - \hat\beta_{L,j}| \le 2\lambda\sum_{j \in J(\beta)}\left(|\beta_j - \hat\beta_{L,j}| + |\beta_j| - |\hat\beta_{L,j}|\right) \le 4\lambda\sum_{j \in J(\beta)}|\beta_j - \hat\beta_{L,j}|$.

To prove (32), it suffices to note that, from Lemma 10 and $\lambda > 2\lambda_0$, we have (39) $\frac{1}{n}\|X^T W\|_\infty \le \frac{\lambda}{2}$. Then, since the Karush-Kuhn-Tucker conditions of (5) give $\frac{1}{n}\|X^T(Y - X\hat\beta_L)\|_\infty \le \lambda$, (40) $\frac{1}{n}\|X^T X(\beta - \hat\beta_L)\|_\infty = \frac{1}{n}\|X^T(Y - W - X\hat\beta_L)\|_\infty \le \frac{1}{n}\|X^T(Y - X\hat\beta_L)\|_\infty + \frac{1}{n}\|X^T W\|_\infty \le \lambda + \frac{\lambda}{2} = \frac{3\lambda}{2}$.

Lemma 12.

Let $\beta \in \mathbb{R}^p$ satisfy the Dantzig constraint (41) $\frac{1}{n}\|X^T(y - X\beta)\|_\infty \le \lambda$ and set $\delta = \hat\beta_D - \beta$, $J_0 = J(\beta)$. Then (42) $\|\delta_{J_0^c}\|_1 \le \|\delta_{J_0}\|_1$. Further, let the assumptions of Lemma 11 be satisfied. Then, with probability at least $1 - p\exp(-\lambda^2 n/(2\sigma^2))$, one has, for $\lambda > \lambda_0$, (43) $\frac{1}{n}\|X^T(X\beta - X\hat\beta_D)\|_\infty \le 2\lambda$.

Proof.

Inequality (42) follows immediately from the definition of the Dantzig selector: since β is feasible for (6), $\|\hat\beta_D\|_1 \le \|\beta\|_1$, and splitting both norms over $J_0$ and $J_0^c$ gives (42).

Next, we prove (43). From Lemma 10 and analogously to (32), using the definition of the Dantzig selector, we get (44) $\frac{1}{n}\|X^T X(\beta - \hat\beta_D)\|_\infty = \frac{1}{n}\|X^T(Y - W - X\hat\beta_D)\|_\infty \le \frac{1}{n}\|X^T(Y - X\hat\beta_D)\|_\infty + \frac{1}{n}\|X^T W\|_\infty \le \lambda + \lambda = 2\lambda$.

Proof of Theorem 5.

Set $\delta = \hat\beta_L - \hat\beta_D$. We start the calculation with a simple matrix identity: (45) $\|X\beta - X\hat\beta_D\|_2^2 - \|X\beta - X\hat\beta_L\|_2^2 = 2(\hat\beta_L - \hat\beta_D)^T X^T(X\beta - X\hat\beta_D) - \|X(\hat\beta_L - \hat\beta_D)\|_2^2 \le 2\|\delta\|_1\|X^T(X\beta - X\hat\beta_D)\|_\infty - \|X\delta\|_2^2 \le 4n\lambda\|\delta\|_1 - \|X\delta\|_2^2$, where the last inequality holds with probability at least $1 - p\exp(-\lambda^2 n/(2\sigma^2))$ by (43).

By the assumption LR$\varphi_2(s, 1)$ and (42), we get (46) $\|X\beta - X\hat\beta_D\|_2^2 - \|X\beta - X\hat\beta_L\|_2^2 \le 8n\lambda\|\delta_{J_0}\|_1 - \frac{n\varphi_2^2}{s}\|\delta_{J_0}\|_1^2 \le \frac{16n\lambda^2 s}{\varphi_2^2}$. From (32), a nearly identical argument yields (47) $\|X\beta - X\hat\beta_L\|_2^2 - \|X\beta - X\hat\beta_D\|_2^2 \le 2\|\delta\|_1\|X^T X(\beta - \hat\beta_L)\|_\infty - \|X\delta\|_2^2$ (48) $\le 3n\lambda\|\delta\|_1 - \|X\delta\|_2^2$ (49) $\le \frac{9n\lambda^2 s}{\varphi_2^2}$.

The theorem follows by combining (46) and (49) and dividing by n.

Proof of Theorem 6.

Set $\delta = \hat\beta_L - \beta$. Using (31), with probability at least $1 - p\exp(-\lambda^2 n/(8\sigma^2))$, (50) $\frac{1}{n}\|X\delta\|_2^2 \le 4\lambda\|\delta_{J_0}\|_1 - \lambda\|\delta\|_1$. From (32), we have (51) $\frac{2}{n}\|X\delta\|_2^2 \le 3\lambda\|\delta\|_1$. Combining (50) and (51), (52) $\frac{1}{n}\|X\delta\|_2^2 \le \frac{12}{5}\lambda\|\delta_{J_0}\|_1$. By the assumption LR$\varphi_2(s, 3)$, we obtain (53) $\frac{\varphi_2^2}{s}\|\delta_{J_0}\|_1^2 \le \frac{1}{n}\|X\delta\|_2^2 \le \frac{12}{5}\lambda\|\delta_{J_0}\|_1$, where $\varphi_2 = \varphi_2(s, 3)$. Thus, (54) $\|\delta_{J_0}\|_1 \le \frac{12\lambda s}{5\varphi_2^2}$, (55) $\frac{1}{n}\|X\delta\|_2^2 \le \frac{144\lambda^2 s}{25\varphi_2^2}$. From (50), we have (56) $\lambda\|\delta\|_1 \le 4\lambda\|\delta_{J_0}\|_1 - \frac{1}{n}\|X\delta\|_2^2 \le 4\lambda\|\delta_{J_0}\|_1 - \frac{\varphi_2^2}{s}\|\delta_{J_0}\|_1^2 \le \frac{4\lambda^2 s}{\varphi_2^2}$. Thus, (57) $\|\delta\|_1 \le \frac{4\lambda s}{\varphi_2^2}$. Inequalities (55) and (57) coincide with (21) and (19), respectively.

Finally, to prove (20), we decompose δ into a set of vectors $\delta_{J_0}, \delta_{J_1}, \delta_{J_2}, \ldots, \delta_{J_K}$, such that $J_0$ corresponds to the locations of the s largest coefficients of δ in absolute value, $J_1$ corresponds to the locations of the next s largest coefficients of $\delta_{J_0^c}$ in absolute value, and so on. Hence, we have $J_0^c = \bigcup_{k=1}^{K} J_k$, where $K \ge 1$, $|J_k| = s$ for all $k = 1, \ldots, K-1$, and $|J_K| \le s$.

It immediately follows, as in (15), that (58) $\|\delta_{J_0^c}\|_2 \le \frac{1}{\sqrt{s}}\|\delta\|_1$. On the other hand, from (54), we have (59) $\|\delta_{J_0}\|_2 \le \|\delta_{J_0}\|_1 \le \frac{12\lambda s}{5\varphi_2^2}$. Therefore, (60) $\|\delta\|_2 \le \|\delta_{J_0}\|_2 + \|\delta_{J_0^c}\|_2 \le \frac{12\lambda s}{5\varphi_2^2} + \frac{1}{\sqrt{s}}\|\delta\|_1 \le \frac{12\lambda s}{5\varphi_2^2} + \frac{4\lambda\sqrt{s}}{\varphi_2^2} = \frac{4\lambda s}{\varphi_2^2}\left(\frac{3}{5} + \frac{1}{\sqrt{s}}\right)$, and the theorem follows.

Proof of Theorem 7.

Set $\delta = \hat\beta_D - \beta$. Using (42) and (43), with probability at least $1 - p\exp(-\lambda^2 n/(2\sigma^2))$, we have (61) $\frac{1}{n}\|X\delta\|_2^2 = \frac{1}{n}\delta^T X^T X\delta \le \frac{1}{n}\|X^T X\delta\|_\infty\|\delta\|_1 \le 2\lambda\left(\|\delta_{J_0}\|_1 + \|\delta_{J_0^c}\|_1\right) \le 4\lambda\|\delta_{J_0}\|_1$. From the assumption LR$\varphi_2(s, 1)$, we get (62) $\frac{1}{n}\|X\delta\|_2^2 \ge \frac{\varphi_2^2}{s}\|\delta_{J_0}\|_1^2$, where $\varphi_2 = \varphi_2(s, 1)$. This and (61) yield (63) $\frac{1}{n}\|X\delta\|_2^2 \le \frac{16\lambda^2 s}{\varphi_2^2}$, $\|\delta_{J_0}\|_1 \le \frac{4\lambda s}{\varphi_2^2}$. The first inequality in (63) implies (24). Next, (22) is straightforward in view of the second inequality in (63) and relation (42). The proof of (23) follows that of (20) in Theorem 6: from (22) and (58), we get (64) $\|\delta_{J_0^c}\|_2 \le \frac{1}{\sqrt{s}}\cdot\frac{8\lambda s}{\varphi_2^2} = \frac{8\lambda\sqrt{s}}{\varphi_2^2}$. Then (65) $\|\delta\|_2 \le \|\delta_{J_0}\|_2 + \|\delta_{J_0^c}\|_2 \le \frac{4\lambda s}{\varphi_2^2} + \frac{8\lambda\sqrt{s}}{\varphi_2^2} = \frac{4\lambda s}{\varphi_2^2}\left(1 + \frac{2}{\sqrt{s}}\right)$, where the second inequality uses the second inequality in (63) together with $\|\delta_{J_0}\|_2 \le \|\delta_{J_0}\|_1$.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[3] E. Candes and T. Tao, "The Dantzig selector: statistical estimation when p is much larger than n," The Annals of Statistics, vol. 35, no. 6, pp. 2313–2351, 2007.
[4] E. Gautier and A. B. Tsybakov, "High-dimensional instrumental variables regression and confidence sets," Working Paper, 2011.
[5] P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data, Springer Series in Statistics, Springer, New York, NY, USA, 2011.
[6] E. J. Candes, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[7] S. Q. Wang and L. M. Su, "The oracle inequalities on simultaneous Lasso and Dantzig selector in high-dimensional nonparametric regression," Mathematical Problems in Engineering, vol. 2013, Article ID 571361, 2013.
[8] S. A. van de Geer and P. Bühlmann, "On the conditions used to prove oracle results for the Lasso," Electronic Journal of Statistics, vol. 3, pp. 1360–1392, 2009.
[9] S. Q. Wang and L. M. Su, "Recovery of high-dimensional sparse signals via l1-minimization," Journal of Applied Mathematics, vol. 2013, Article ID 636094, 2013.
[10] P. J. Bickel, Y. Ritov, and A. B. Tsybakov, "Simultaneous analysis of lasso and Dantzig selector," The Annals of Statistics, vol. 37, no. 4, pp. 1705–1732, 2009.
[11] S. van de Geer, The Deterministic Lasso, Seminar für Statistik, Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland, 2007.
[12] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[13] S. Q. Wang and L. M. Su, "Simultaneous Lasso and Dantzig selector in high dimensional nonparametric regression," International Journal of Applied Mathematics and Statistics, vol. 42, no. 12, pp. 103–118, 2013.
[14] S. Q. Wang and L. M. Su, "New bounds of mutual incoherence property on sparse signals recovery," International Journal of Applied Mathematics and Statistics, vol. 47, no. 17, pp. 462–477, 2013.
[15] P. Alquier and M. Hebiri, "Generalization of l1 constraints for high dimensional regression problems," Statistics and Probability Letters, vol. 81, no. 12, pp. 1760–1765, 2011.
[16] P. Zhao and B. Yu, "On model selection consistency of Lasso," Journal of Machine Learning Research, vol. 7, pp. 2541–2563, 2006.
[17] R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann, "Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling," Constructive Approximation, vol. 34, no. 1, pp. 61–88, 2011.
[18] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann, "Uniform uncertainty principle for Bernoulli and subgaussian ensembles," Constructive Approximation, vol. 28, no. 3, pp. 277–289, 2008.
[19] R. G. Baraniuk, R. A. DeVore, and M. B. Davenport, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
[20] V. Koltchinskii, "The Dantzig selector and sparsity oracle inequalities," Bernoulli, vol. 15, no. 3, pp. 799–828, 2009.
[21] Y. de Castro, "A remark on the lasso and the Dantzig selector," Statistics and Probability Letters, vol. 83, no. 1, pp. 304–314, 2013.