We introduce the Q-lasso, which generalizes the well-known lasso of Tibshirani (1996)
with Q a closed convex subset of a Euclidean m-space for some integer m≥1. The set Q can be interpreted as the set of errors within a given tolerance level when linear measurements
are taken to recover a signal/image via the lasso. Solutions of the Q-lasso depend on a tuning parameter γ. In this paper, we obtain basic properties of the solutions as a function of γ. Because of ill-posedness, we also apply ℓ1-ℓ2 regularization to the Q-lasso. In addition, we discuss iterative methods for solving the Q-lasso, which include the proximal-gradient algorithm and the projection-gradient algorithm.

1. Introduction

The lasso of Tibshirani [1] is the minimization problem:
(1)minx∈ℝn12∥Ax-b∥22+γ∥x∥1,
where A is an m×n (real) matrix, b∈ℝm, and γ>0 is a tuning parameter. It is equivalent to the basis pursuit (BP) of Chen et al. [2]:
(2)minx∈ℝn∥x∥1 subject to Ax=b.
It is well known that both the lasso and BP model a number of applied problems arising in machine learning, signal/image processing, and statistics, due to the fact that they promote sparsity of a signal x∈ℝn. Sparsity is a common phenomenon in practical problems, since a solution often has a sparse representation in terms of an appropriate basis, and it has therefore received much attention.

Observe that both the lasso (1) and BP (2) can be viewed as the ℓ1 regularization applied to the inverse linear system in ℝn:
(3)Ax=b.
In sparse recovery, the system (3) is underdetermined (i.e., m<n, and often m≪n). The theory of compressed sensing of Donoho [3] and Candès et al. [4, 5] made the breakthrough that, under certain conditions, the underdetermined system (3) can determine a unique k-sparse solution. (Recall that a signal x∈ℝn is said to be k-sparse if the number of nonzero entries of x is no bigger than k.)

However, due to measurement errors, the system (3) is actually inexact: Ax≈b. It turns out that the BP (2) is reformulated as
(4)minx∈ℝn∥x∥1 subject to ∥Ax-b∥≤ε,
where ε>0 is the tolerance level of errors and ∥·∥ is a norm on ℝm (often the ℓp norm ∥·∥p for p=1,2,∞; a solution to (4) when the tolerance is measured by the ℓ∞ norm ∥·∥∞ is known as the Dantzig selector of Candès and Tao [6]; see also [7]).

Note that if we let Q:=Bε(b) be the closed ball in ℝm around b and with radius of ε, then (4) is rewritten as
(5)minx∈ℝn∥x∥1 subject to Ax∈Q.

Now let Q be a nonempty closed convex subset of ℝm and let PQ be the projection from ℝm onto Q. Noting that the condition Ax∈Q is equivalent to the condition Ax-PQ(Ax)=0, we see that problem (5) is solved via
(6)minx∈ℝn∥x∥1 subject to (I-PQ)Ax=0.
Applying the Lagrangian method, we arrive at the following equivalent minimization:
(7)minx∈ℝn12∥(I-PQ)Ax∥22+γ∥x∥1,
where γ>0 is a Lagrangian multiplier (also interpreted as a regularization parameter).

Alternatively, we may view (7) as the ℓ1 regularization of the inclusion
(8)Ax∈Q (equivalently, the equation (I-PQ)Ax=0)
which extends the linear system (3) in an obvious way. We refer to the problem (7) as the Q-lasso since it is the ℓ1 regularization of inclusion (8) as lasso (1) is the ℓ1 regularization of the linear system (3). Throughout the rest of this paper, we always assume that (8) is consistent (i.e., solvable).
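As a quick illustration of definition (7), the following sketch evaluates the Q-lasso objective when Q is the closed ball Bε(b), in which case PQ has an explicit formula; the matrix A, the data b, and the parameters here are made-up example values, not from the paper.

```python
import numpy as np

# Illustrative sketch of the Q-lasso objective (7) with Q = B_eps(b),
# the closed ball around b, so that P_Q has a closed form.
def proj_ball(y, b, eps):
    """Project y onto the closed ball of radius eps centered at b."""
    d = y - b
    nd = np.linalg.norm(d)
    return y if nd <= eps else b + (eps / nd) * d

def q_lasso_objective(x, A, b, eps, gamma):
    """(1/2) * ||(I - P_Q) A x||_2^2 + gamma * ||x||_1."""
    Ax = A @ x
    r = Ax - proj_ball(Ax, b, eps)
    return 0.5 * (r @ r) + gamma * np.abs(x).sum()

A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x_feasible = np.array([1.0, 0.5])   # Ax = b, so Ax lies in Q and the quadratic term vanishes
val = q_lasso_objective(x_feasible, A, b, eps=0.1, gamma=0.5)
print(val)  # 0.75 = gamma * ||x||_1
```

When Ax∈Q the residual (I-PQ)Ax vanishes and only the ℓ1 term remains, which is exactly why (7) promotes sparse points satisfying the inclusion (8).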

Q-lasso (7) is also connected with the so-called split feasibility problem (SFP) of Censor and Elfving [8] (see also [9]) which is stated as finding a point x with the property
(9)x∈C,Ax∈Q,
where C and Q are closed convex subsets of ℝn and ℝm, respectively. An equivalent minimization formulation of the SFP (9) is given as
(10)minx∈C12∥Ax-PQAx∥22.
Its ℓ1 regularization is given as the minimization
(11)minx∈C12∥Ax-PQAx∥22+γ∥x∥1,
where γ>0 is a regularization parameter. Problem (7) is a special case of (11) when the set of constraints, C, is taken to be the entire space ℝn.

The purpose of this paper is to study the behavior, in terms of γ>0, of solutions to the regularized problem (7). (We leave the more general problem (11) to further work, due to the fact that the involvement of another closed convex set C brings some technical difficulties which are not easy to overcome.) We discuss iterative methods for solving the Q-lasso, including the proximal-gradient method and the projection-gradient method, the latter being derived via a duality technique. Due to ill posedness, we also apply the ℓ1-ℓ2 regularization to the Q-lasso.

2. Preliminaries

Let n≥1 be an integer and let ℝn be the Euclidean n-space. If p≥1, we use ∥·∥p to denote the ℓp norm on ℝn. Namely, for x=(xj)t∈ℝn,
(12)∥x∥p=(∑j=1n|xj|p)1/p(1≤p<∞),∥x∥∞=max1≤j≤n|xj|.
Let K be a closed convex subset of ℝn. Recall that the projection from ℝn to K is defined as the operator
(13)PK(x)=argminu∈K∥x-u∥22,x∈ℝn.
The projection PK is characterized as follows:
(14)given x∈ℝn and z∈K: z=PKx⟺〈x-z,y-z〉≤0 ∀y∈K.
Projections are nonexpansive. Namely, we have the following.

Proposition 1.

One has that PK is firmly nonexpansive in the sense that
(15)〈x-y,PKx-PKy〉≥∥PKx-PKy∥22,x,y∈ℝn.
In particular, PK is nonexpansive; that is, ∥PKx-PKy∥≤∥x-y∥ for all x,y∈ℝn.
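Proposition 1 can be checked numerically. The sketch below takes K to be the box [0,1]³ (an illustrative choice, so that PK is a componentwise clip) and verifies the firm-nonexpansiveness inequality (15) on random pairs of points.

```python
import numpy as np

# Numerical check of Proposition 1: the projection onto a closed convex set
# is firmly nonexpansive. Here K = [0,1]^3, so P_K is a componentwise clip.
def P_box(x):
    return np.clip(x, 0.0, 1.0)

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    Px, Py = P_box(x), P_box(y)
    lhs = (x - y) @ (Px - Py)            # <x - y, P_K x - P_K y>
    rhs = np.linalg.norm(Px - Py) ** 2   # ||P_K x - P_K y||^2
    ok = ok and (lhs >= rhs - 1e-12)
print(ok)  # True
```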

Recall that a function f:ℝn→ℝ is convex if
(16)f((1-λ)x+λy)≤(1-λ)f(x)+λf(y)
for all λ∈(0,1) and x,y∈ℝn. (Note that we only consider finite-valued functions.)

The subdifferential of a convex function f is defined as the operator ∂f given by
(17)∂f(x)={ξ∈ℝn:f(y)≥f(x)+〈ξ,y-x〉,y∈ℝn}.
The inequality in (17) is referred to as the subdifferential inequality of f at x. We say that f is subdifferentiable at x if ∂f(x) is nonempty. It is well known that, for an everywhere finite-valued convex function f on ℝn, f is everywhere subdifferentiable.

Examples. (i) If f(x)=|x| for x∈ℝ, then ∂f(0)=[-1,1]; (ii) if f(x)=∥x∥1 for x∈ℝn, then ∂f(x) is given componentwise by
(18)(∂f(x))j={sgn(xj),ifxj≠0,ξj∈[-1,1],ifxj=0,1≤j≤n.
Here sgn is the sign function; that is, for a∈ℝ,
(19)sgn(a)={1,ifa>0,0,ifa=0,-1,ifa<0.

Consider the unconstrained minimization problem
(20)minx∈ℝnf(x).
The following are well known.

Proposition 2.

Let f be everywhere finite-valued on ℝn.

If f is strictly convex, then (20) admits at most one solution.

If f is convex and satisfies the coercivity condition
(21)∥x∥⟶∞⟹f(x)⟶∞,

then there exists at least one solution to (20). Therefore, if f is both strictly convex and coercive, there exists one and only one solution to (20).

Proposition 3.

Let f be everywhere finite-valued convex on ℝn and z∈ℝn. Suppose f is bounded below (i.e., inf{f(x):x∈ℝn}>-∞). Then z is a solution to minimization (20) if and only if it satisfies the first-order optimality condition:
(22)0∈∂f(z).

3. Properties of the Q-Lasso

We study some basic properties of the Q-lasso, which we restate here:
(23)minx∈ℝnφγ(x)∶=12∥Ax-PQAx∥22+γ∥x∥1,
where γ>0 is a regularization parameter. We also consider the following minimization (we call it Q-least squares problem):
(24)minx∈ℝn12∥Ax-PQAx∥22.
Denote by S and Sγ the solution sets of (24) and (23), respectively. Since φγ is continuous, convex, and coercive (i.e., φγ(x)→∞ as ∥x∥2→∞), Sγ is closed, convex, and nonempty. Notice also that since we assume the consistency of (8), we have S≠∅; moreover, the solution sets of (8) and (24) coincide.

Observe that the assumption that S≠∅ actually implies that Sγ is uniformly bounded in γ>0, as shown by the lemma below.

Lemma 4.

Assume that (24) is consistent (i.e., S≠∅). Then, for γ>0 and xγ∈Sγ, one has ∥xγ∥1≤infx∈S∥x∥1.

Proof.

Let xγ∈Sγ. In the relation
(25)12∥(I-PQ)Axγ∥22+γ∥xγ∥1≤12∥(I-PQ)Ax∥22+γ∥x∥1,x∈ℝn,
taking x∈S yields (since (I-PQ)Ax=0 for x∈S)
(26)12∥(I-PQ)Axγ∥22+γ∥xγ∥1≤γ∥x∥1,x∈S.
It follows that
(27)∥xγ∥1≤∥x∥1,x∈S.
This proves the conclusion of the lemma.

Proposition 5.

One has the following.

The functions
(28)ρ(γ)∶=∥xγ∥1,η(γ)∶=12∥(I-PQ)Axγ∥22
are well defined for γ>0. That is, they do not depend upon the particular choice of xγ∈Sγ.

The function ρ(γ) is nonincreasing in γ>0.

The function η(γ) is nondecreasing in γ>0.

(I-PQ)Axγ is continuous in γ>0.

Proof.

For xγ∈Sγ, we have the optimality condition:
(29)0∈∂φγ(xγ)=At(I-PQ)Axγ+γ∂∥xγ∥1.
Here At is the transpose of A and ∂ stands for the subdifferential in the sense of convex analysis. Equivalently,
(30)-1γAt(I-PQ)Axγ∈∂∥xγ∥1.
It follows by the subdifferential inequality that
(31)γ∥x∥1≥γ∥xγ∥1-〈At(I-PQ)Axγ,x-xγ〉∀x∈ℝn.
In particular, for x^γ∈Sγ,
(32)γ∥x^γ∥1≥γ∥xγ∥1-〈At(I-PQ)Axγ,x^γ-xγ〉.
Interchange xγ and x^γ to get
(33)γ∥xγ∥1≥γ∥x^γ∥1-〈At(I-PQ)Ax^γ,xγ-x^γ〉.
Adding up (32) and (33) yields
(34)0≥〈(I-PQ)Ax^γ-(I-PQ)Axγ,Ax^γ-Axγ〉≥∥(I-PQ)Ax^γ-(I-PQ)Axγ∥22.
Consequently, (I-PQ)Ax^γ=(I-PQ)Axγ, and the characterization (14) of projections then shows that 〈(I-PQ)Axγ,Ax^γ-Axγ〉=0. Hence (32) and (33) imply that ∥x^γ∥1≥∥xγ∥1 and ∥xγ∥1≥∥x^γ∥1, respectively. Therefore ∥x^γ∥1=∥xγ∥1, and it follows that the functions
(35)ρ(γ)∶=∥xγ∥1,η(γ)∶=12∥(I-PQ)Axγ∥22(xγ∈Sγ)
are well defined for γ>0.

Now substituting xβ∈Sβ for x in (31), we get
(36)γ∥xβ∥1≥γ∥xγ∥1-〈At(I-PQ)Axγ,xβ-xγ〉.
Interchange γ and β and xγ and xβ to find
(37)β∥xγ∥1≥β∥xβ∥1-〈At(I-PQ)Axβ,xγ-xβ〉.
Adding up (36) and (37) and using the fact that (I-PQ) is firmly nonexpansive, we deduce that
(38)(γ-β)(∥xβ∥1-∥xγ∥1)≥〈(I-PQ)Axγ-(I-PQ)Axβ,Axγ-Axβ〉≥∥(I-PQ)Axγ-(I-PQ)Axβ∥22.
We therefore find that if γ>β, then ∥xβ∥1≥∥xγ∥1. This proves that ρ(γ) is nonincreasing in γ>0. From (38) it also follows that (I-PQ)Axγ is continuous for γ>0, which implies the continuity of η(γ) for γ>0.

To see that η(γ) is nondecreasing, we use the inequality (as xγ∈Sγ)
(39)12∥(I-PQ)Axγ∥22+γ∥xγ∥1≤12∥(I-PQ)Axβ∥22+γ∥xβ∥1
which implies that
(40)η(γ)≤η(β)+γ(∥xβ∥1-∥xγ∥1).
Now if β>γ>0, then, as ∥xβ∥1≤∥xγ∥1, we immediately get η(γ)≤η(β), which proves that η is nondecreasing.
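Proposition 5 can be illustrated numerically. The sketch below takes Q={b} (the ordinary lasso), computes xγ by soft-thresholded gradient steps (the proximal-gradient method discussed in Section 4), and checks the monotonicity of ρ and η along a grid of γ values; the data are random and purely illustrative.

```python
import numpy as np

# Illustration of Proposition 5 for Q = {b}: as gamma grows,
# rho(gamma) = ||x_gamma||_1 does not increase and
# eta(gamma) = (1/2)||A x_gamma - b||_2^2 does not decrease.
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_solve(A, b, gamma, iters=5000):
    lam = 1.0 / np.linalg.norm(A, 2) ** 2   # step size in (0, 2/||A||_2^2)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - lam * (A.T @ (A @ x - b)), lam * gamma)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 5))
b = rng.normal(size=8)
gammas = [0.01, 0.1, 0.5, 1.0, 2.0]
xs = [lasso_solve(A, b, g) for g in gammas]
rho = [np.abs(x).sum() for x in xs]
eta = [0.5 * np.linalg.norm(A @ x - b) ** 2 for x in xs]
print(all(r1 >= r2 - 1e-8 for r1, r2 in zip(rho, rho[1:])))  # True: rho nonincreasing
print(all(e1 <= e2 + 1e-8 for e1, e2 in zip(eta, eta[1:])))  # True: eta nondecreasing
```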

Proposition 6.

One has the following.

(i) limγ→0η(γ)=infx∈ℝn(1/2)∥(I-PQ)Ax∥22.

(ii) limγ→0ρ(γ)=minx∈S∥x∥1.

Proof.

(i) Taking the limit as γ→0 in the inequality (and using the boundedness of (xγ))
(41)12∥(I-PQ)Axγ∥22+γ∥xγ∥1≤12∥(I-PQ)Ax∥22+γ∥x∥1,∀x∈ℝn,
yields
(42)limγ→0η(γ)≤12∥(I-PQ)Ax∥22,∀x∈ℝn.
The result in (i) then follows.

As for (ii), we have, by (27), ∥xγ∥1≤∥x~∥1 for any x~∈S. In particular, ∥xγ∥1≤∥x†∥1, where x† is an ℓ1 minimum-norm element of S; that is, ∥x†∥1=minx∈S∥x∥1.

Let γk→0 be such that xγk→x^ (such a subsequence exists because (xγ) is bounded by Lemma 4). Then, for any x,
(43)12∥(I-PQ)Ax^∥22=limk→∞12∥(I-PQ)Axγk∥22=limk→∞12∥(I-PQ)Axγk∥22+γk∥xγk∥1≤limk→∞12∥(I-PQ)Ax∥22+γk∥x∥1=12∥(I-PQ)Ax∥22.
It follows that x^ solves the minimization problem: minx(1/2)∥(I-PQ)Ax∥22; that is, x^∈S. Consequently,
(44)limγ→0ρ(γ)=limk→∞ρ(γk)=limk→∞∥xγk∥1=∥x^∥1≤∥x†∥1=minx∈S∥x∥1.
This suffices to ensure that the conclusion of (ii) holds.

How to select the tuning (i.e., regularization) parameter γ in lasso (1) and Q-lasso (7) is a challenging problem. There is no universal rule; γ should instead be selected on a case-by-case basis. The following result, however, points out that γ cannot be too large.

Proposition 7.

Let Q be a nonempty closed convex subset of ℝm and assume that Q-lasso (7) is consistent (i.e., solvable). If γ>max{∥AtPQAx∥∞:∥x∥1≤minv∈S∥v∥1} (note that this condition is reduced to γ>∥Atb∥∞ for lasso (1) for which Q={b}), then xγ=0. (Here S is, as before, the solution set of the Q-least squares problem (24).)

Proof.

Let xγ∈Sγ. The optimality condition
(45)-At(I-PQ)Axγ∈γ∂∥xγ∥1
implies that
(46)-(At(I-PQ)Axγ)j=γ·sgn[(xγ)j],if(xγ)j≠0,|(At(I-PQ)Axγ)j|≤γ,if(xγ)j=0.
Taking x=2xγ in subdifferential inequality (31) yields
(47)γ∥xγ∥1≥-〈At(I-PQ)Axγ,xγ〉=-∑(xγ)j≠0(At(I-PQ)Axγ)j(xγ)j=∑(xγ)j≠0γ·sgn[(xγ)j](xγ)j=γ∑(xγ)j≠0|(xγ)j|=γ∥xγ∥1.
It follows that
(48)γ∥xγ∥1=-〈At(I-PQ)Axγ,xγ〉=-〈(I-PQ)Axγ,Axγ〉,(49)=-∥Axγ∥22+〈PQAxγ,Axγ〉≤〈PQAxγ,Axγ〉=〈AtPQAxγ,xγ〉≤∥AtPQAxγ∥∞∥xγ∥1.

Now by Lemma 4, we have ∥xγ∥1≤minv∈S∥v∥1. Hence, from (49) it follows that if xγ≠0, we must have γ≤max{∥AtPQAx∥∞:∥x∥1≤minv∈S∥v∥1}. This completes the proof.
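Proposition 7 is easy to observe numerically for the ordinary lasso (Q={b}), where the threshold is ∥Atb∥∞: once γ exceeds it, the proximal-gradient iterates collapse to 0. The data below are random and illustrative.

```python
import numpy as np

# Illustration of Proposition 7 for the ordinary lasso (Q = {b}): if
# gamma > ||A^t b||_inf, the lasso solution is x_gamma = 0.
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))
b = rng.normal(size=6)
gamma = 1.01 * np.abs(A.T @ b).max()     # just above the threshold ||A^t b||_inf

lam = 1.0 / np.linalg.norm(A, 2) ** 2
x = rng.normal(size=4)                   # deliberately nonzero starting point
for _ in range(2000):
    x = soft(x - lam * (A.T @ (A @ x - b)), lam * gamma)
print(np.allclose(x, 0.0))  # True
```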

Notice that (48) shows that ρ(γ)=∥xγ∥1 is determined by Axγ. Hence, we arrive at the following characterization of the solutions of Q-lasso (23).

Proposition 8.

Let Q be a nonempty closed convex subset of ℝm and let γ>0 and xγ∈Sγ. Then x^∈ℝn is a solution of Q-lasso (23) if and only if Ax^=Axγ and ∥x^∥1≤∥xγ∥1. It turns out that
(50)Sγ=(xγ+N(A))∩Bρ(γ),
where N(A)={x∈ℝn:Ax=0} is the null space of A and Br denotes the closed ℓ1-ball centered at the origin with radius r>0. This shows that once one solution of Q-lasso (23) is found, all solutions are given by (50).

Proof.

If Ax^=Axγ, then from the relations
(51)φγ(xγ)=12∥(I-PQ)Axγ∥22+γ∥xγ∥1≤12∥(I-PQ)Ax^∥22+γ∥x^∥1=12∥(I-PQ)Axγ∥22+γ∥x^∥1,
we obtain ∥xγ∥1≤∥x^∥1. This together with the assumption of ∥x^∥1≤∥xγ∥1 yields that ∥x^∥1=∥xγ∥1 which in turn implies that φγ(x^)=φγ(xγ) and hence x^∈Sγ.

4. Iterative Methods

In this section we discuss proximal iterative methods for solving Q-lasso (7). The basics are Moreau's concept of proximal operators and their fundamental properties, which are briefly reviewed below. (For our purposes, we confine ourselves to the finite-dimensional setting.)

4.1. Proximal Operators

Let Γ0(ℝn) be the space of functions on ℝn that are proper, lower semicontinuous, and convex.

Definition 9 (see [10, 11]).

The proximal operator of φ∈Γ0(ℝn) is defined by
(52)proxφ(x)∶=argminv∈ℝn{φ(v)+12∥v-x∥22},x∈ℝn.
The proximal operator of φ of order λ>0 is defined as the proximal operator of λφ; that is,
(53)proxλφ(x)∶=argminv∈ℝn{φ(v)+12λ∥v-x∥22},x∈ℝn.

For fundamental properties of proximal operators, the reader is referred to [12, 13] for details. Here we only mention the fact that the proximal operator proxλφ can have a closed-form expression in some important cases as shown in the examples below [12].

If we take φ to be any norm ∥·∥ on ℝn, then
(54)proxλ∥·∥(x)={(1-λ/∥x∥)x,if∥x∥>λ,0,if∥x∥≤λ.

In particular, if we take φ to be the absolute value function of the real line ℝ, we get
(55)proxλ|·|(x)=sgn(x)max{|x|-λ,0}

which is also known as the scalar soft-thresholding operator.

Let {ek}k=1n be an orthonormal basis of ℝn and let {ωk}k=1n be real positive numbers. Define φ by
(56)φ(x)=∑k=1nωk|〈x,ek〉|.

Then proxφ(x)=∑k=1nαkek, where
(57)αk=sgn(〈x,ek〉)max{|〈x,ek〉|-ωk,0}.

In particular, if φ(x)=∥x∥1 for x∈ℝn, then
(58)proxλ∥·∥1(x)=(proxλ|·|(x1),…,proxλ|·|(xn))=(α1,…,αn),

where αk=sgn(xk)max{|xk|-λ,0} for k=1,…,n.
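The closed forms (55) and (58) can be sketched and sanity-checked in a few lines; the brute-force grid check against definition (52) is only illustrative.

```python
import numpy as np

# Sketch of the scalar soft-thresholding operator (55) and its
# componentwise extension (58), with a brute-force check against
# the prox definition (52) for one scalar instance.
def prox_l1(x, lam):
    """prox of lam*||.||_1 : componentwise sgn(x_j) * max(|x_j| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

res = prox_l1(np.array([3.0, -0.5, 0.0, -2.0]), 1.0)
print(np.allclose(res, [2.0, 0.0, 0.0, -1.0]))  # True

# Brute-force check of (52)-(53) on a grid: prox of |.| at x = 3 with lam = 1.
grid = np.linspace(-5.0, 5.0, 100001)
v = grid[np.argmin(1.0 * np.abs(grid) + 0.5 * (grid - 3.0) ** 2)]
print(abs(v - 2.0) < 1e-3)  # True: the grid minimizer is (numerically) 2
```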

4.2. Proximal-Gradient Algorithm

The proximal operators can be used to minimize the sum of two convex functions:
(59)minx∈ℝnf(x)+g(x),
where f,g∈Γ0(ℝn). It is often the case that one of them is differentiable. The following is an equivalent fixed point formulation of (59).

Proposition 10 (see [12, 14]).

Let f,g∈Γ0(ℝn). Let x*∈ℝn and λ>0. Assume f is finite valued and differentiable on ℝn. Then x* is a solution to (59) if and only if x* solves the fixed point equation:
(60)x*=(proxλg∘(I-λ∇f))x*.

Fixed point equation (60) immediately yields the following fixed point algorithm which is also known as the proximal-gradient algorithm for solving (59).

Initialize x0∈ℝn and iterate
(61)xk+1=(proxλkg∘(I-λk∇f))xk,
where {λk} is a sequence of positive real numbers.

Theorem 11 (see [12, 14]).

Let f,g∈Γ0(ℝn) and assume (59) is consistent. Assume in addition the following.

(i) ∇f is Lipschitz continuous on ℝn:
(62)∥∇f(x)-∇f(y)∥≤L∥x-y∥,x,y∈ℝn.

(ii) 0<liminfk→∞λk≤limsupk→∞λk<2/L.

Then the sequence (xk) generated by proximal-gradient algorithm (61) converges to a solution of (59).
4.3. The Relaxed Proximal-Gradient Algorithm

The relaxed proximal-gradient algorithm generates a sequence (xk) by the following iteration process.

Initialize x0∈ℝn and iterate
(63)xk+1=(1-αk)xk+αk(proxλkg∘(I-λk∇f))xk,
where {αk} is the sequence of relaxation parameters and {λk} is a sequence of positive real numbers.

Theorem 12 (see [14]).

Let f,g∈Γ0(ℝn) and assume (59) is consistent. Assume in addition the following.

(i) ∇f is Lipschitz continuous on ℝn:
(64)∥∇f(x)-∇f(y)∥≤L∥x-y∥,x,y∈ℝn.

(ii) 0<liminfk→∞λk≤limsupk→∞λk<2/L.

(iii) 0<liminfk→∞αk≤limsupk→∞αk<4/(2+L·limsupk→∞λk).

Then the sequence (xk) generated by relaxed proximal-gradient algorithm (63) converges to a solution of (59).

If we take λk≡λ∈(0,2/L), then the relaxation parameters αk can be chosen from a larger pool; in particular, they are allowed to be close to zero. More precisely, we have the following theorem.

Theorem 13 (see [14]).

Let f,g∈Γ0(ℝn) and assume (59) is consistent. Define the sequence (xk) by the following relaxed proximal algorithm:
(65)xk+1=(1-αk)xk+αkproxλg(xk-λ∇f(xk)).
Suppose that

(i) ∇f satisfies the Lipschitz continuity condition (i) in Theorem 12;

(ii) 0<λ<2/L and 0≤αk≤4/(2+λL) for all k;

(iii) ∑k=1∞αk((4/(2+λL))-αk)=∞.

Then (xk) converges to a solution of (59).
4.4. Proximal-Gradient Algorithms Applied to Lasso

For Q-lasso (7), we take f(x)=(1/2)∥(I-PQ)Ax∥22 and g(x)=γ∥x∥1. Noticing that ∇f(x)=At(I-PQ)Ax, which is Lipschitz continuous with constant L=∥A∥22 since I-PQ is nonexpansive, we find that proximal-gradient algorithm (61) reduces to the following algorithm for solving Q-lasso (7):
(66)xk+1=proxλkγ∥·∥1(I-λkAt(I-PQ)A)xk.

The convergence theorem for general proximal-gradient algorithm (61) reads as follows for Q-lasso (7).

Theorem 14.

Assume 0<liminfk→∞λk≤limsupk→∞λk<2/∥A∥22. Then the sequence (xk) generated by proximal-gradient algorithm (66) converges to a solution of Q-lasso (7).
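A minimal sketch of algorithm (66), assuming Q is the closed ball Bε(b) so that PQ is explicit; the problem data below are synthetic and illustrative. The final check compares the objective at the computed point with its value at a feasible sparse point, which the theory guarantees to be an upper bound at optimality.

```python
import numpy as np

# Sketch of proximal-gradient algorithm (66) for the Q-lasso with
# Q = B_eps(b), so that P_Q has a closed form.
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proj_ball(y, b, eps):
    d = y - b
    nd = np.linalg.norm(d)
    return y if nd <= eps else b + (eps / nd) * d

def q_lasso_pg(A, b, eps, gamma, iters=20000):
    lam = 1.0 / np.linalg.norm(A, 2) ** 2           # constant step in (0, 2/||A||_2^2)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ (Ax - proj_ball(Ax, b, eps))   # A^t (I - P_Q) A x
        x = soft(x - lam * grad, lam * gamma)       # prox of lam*gamma*||.||_1
    return x

rng = np.random.default_rng(3)
A = rng.normal(size=(10, 6))
x_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0, 0.0])
b = A @ x_true                                      # x_true is feasible: A x_true lies in Q
eps, gamma = 0.05, 0.01

def objective(x):
    Ax = A @ x
    r = Ax - proj_ball(Ax, b, eps)
    return 0.5 * (r @ r) + gamma * np.abs(x).sum()

x_hat = q_lasso_pg(A, b, eps, gamma)
print(objective(x_hat) <= objective(x_true) + 1e-2)  # True: near-optimal
```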

Remark 15.

Relaxed proximal-gradient algorithms (63) and (65) also apply to Q-lasso (7). We however do not elaborate on them in detail.

Remark 16.

Proximal-gradient algorithm (61) reduces to a projection-gradient algorithm in the case where the convex function g is homogeneous (i.e., g(tx)=tg(x) for t≥0 and x∈ℝn), because the homogeneity of g implies that the proximal operator of g is of the form I minus a projection; more precisely, we have
(67)proxλg=I-PλK,λ>0,
where K=∂g(0). As a result, proximal-gradient algorithm (61) is reduced to the following projection-gradient algorithm:
(68)xk+1=(I-PλkK)(I-λk∇f)xk.

Now we apply projection-gradient algorithm (68) to Q-lasso (7). In this case, we have f(x)=(1/2)∥(I-PQ)Ax∥22 and g(x)=γ∥x∥1 (which is homogeneous). Thus ∇f(x)=At(I-PQ)Ax, and the convex set K=∂g(0) is given by K=γ∂(∥z∥1)|z=0=γ[-1,1]n. We find that, for each λ>0, PλK is the projection of ℝn onto the ℓ∞ ball of radius λγ, that is, onto {x∈ℝn:∥x∥∞≤λγ}. It turns out that proximal-gradient algorithm (66) can be rewritten as the projection algorithm
(69)xk+1=(I-Pλkγ[-1,1]n)(I-λkAt(I-PQ)A)xk.
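The projection form (69) rests on the identity prox of λγ∥·∥1 = I - Pλγ[-1,1]n (soft-thresholding equals the identity minus a clip onto the ℓ∞ ball); the following sketch checks this numerically on random data.

```python
import numpy as np

# Check of the identity behind (69): soft-thresholding at level r equals
# I minus the projection onto the l_inf ball of radius r, i.e. onto r*[-1,1]^n.
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proj_linf_ball(x, r):
    return np.clip(x, -r, r)   # projection onto {y : ||y||_inf <= r}

rng = np.random.default_rng(4)
x = rng.normal(size=7)
r = 0.3                        # plays the role of lam * gamma
ok = np.allclose(soft(x, r), x - proj_linf_ball(x, r))
print(ok)  # True
```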

5. An ℓ1-ℓ2 Regularization for the Q-Lasso

Q-lasso (7) may be ill posed and therefore needs to be regularized. Inspired by the elastic net [15], which regularizes lasso (1), we introduce an ℓ1-ℓ2 regularization for the Q-lasso as the minimization
(70)minx∈ℝn12∥(I-PQ)Ax∥22+γ∥x∥1+(δ/2)∥x∥22=:φγ,δ(x),
where γ>0 and δ>0 are regularization parameters. This is indeed the traditional Tikhonov regularization applied to Q-lasso (7).

Let xγ,δ be the unique solution of (70) and set
(71)φγ(x)∶=12∥(I-PQ)Ax∥22+γ∥x∥1,ψδ(x)∶=12∥(I-PQ)Ax∥22+(δ/2)∥x∥22
which are the limits of φγ,δ(x) as δ→0 and γ→0, respectively. Let
(72)Sγ=argminx∈ℝnφγ(x),x^δ=argminx∈ℝnψδ(x).

Proposition 17.

Assume the Q-least-squares problem
(73)minx∈ℝn12∥(I-PQ)Ax∥22
is consistent (i.e., solvable) and let S be its nonempty set of solutions.

(i) As δ→0 (for each fixed γ>0), xγ,δ→xγ†, which is the (ℓ2) minimum-norm solution to Q-lasso (7). Moreover, as γ→0, every cluster point of xγ† is an (ℓ1) minimum-norm solution of Q-least squares problem (73), that is, a point in the set argminx∈S∥x∥1.

(ii) As γ→0 (for each fixed δ>0), xγ,δ→x^δ, which is the unique solution to the ℓ2 regularized problem:
(74)minx∈ℝn12∥(I-PQ)Ax∥22+(δ/2)∥x∥22.
Moreover, as δ→0, x^δ→x^, which is the ℓ2 minimum-norm solution of (73); that is, x^=argminx∈S∥x∥2.

Proof.

We have that xγ,δ satisfies the optimality condition:
(75)0∈∂φγ,δ(xγ,δ),
where the subdifferential of φγ,δ is given by
(76)∂φγ,δ(x)=At(I-PQ)Ax+δx+γ∂∥x∥1.
It turns out that the above optimality condition is reduced to
(77)-1γ(At(I-PQ)Axγ,δ+δxγ,δ)∈∂∥xγ,δ∥1.
Using the subdifferential inequality, we obtain
(78)γ∥x∥1≥γ∥xγ,δ∥1-〈At(I-PQ)Axγ,δ+δxγ,δ,x-xγ,δ〉
for x∈ℝn. Replacing x with xγ′,δ′ for γ′>0 and δ′>0 yields
(79)γ∥xγ′,δ′∥1≥γ∥xγ,δ∥1-〈At(I-PQ)Axγ,δ+δxγ,δ,xγ′,δ′-xγ,δ〉.
Interchange γ and γ′ and δ and δ′ to get
(80)γ′∥xγ,δ∥1≥γ′∥xγ′,δ′∥1-〈At(I-PQ)Axγ′,δ′+δ′xγ′,δ′,xγ,δ-xγ′,δ′〉.
Adding up (79) and (80) results in
(81)(γ′-γ)(∥xγ,δ∥1-∥xγ′,δ′∥1)≥〈(I-PQ)Axγ′,δ′-(I-PQ)Axγ,δ,Axγ′,δ′-Axγ,δ〉+〈δ′xγ′,δ′-δxγ,δ,xγ′,δ′-xγ,δ〉≥∥(I-PQ)Axγ,δ-(I-PQ)Axγ′,δ′∥22+(δ′-δ)〈xγ′,δ′,xγ′,δ′-xγ,δ〉+δ∥xγ′,δ′-xγ,δ∥22.
Since ℓ1-ℓ2 regularization (70) is the Tikhonov regularization of Q-lasso (7), we get
(82)∥xγ,δ∥2≤∥xγ∥2≤c∥xγ∥1≤c∥x∥1,xγ∈Sγ,x∈S.
Here c>0 is a constant. It follows that {xγ,δ} is bounded.

For fixed γ>0, we can use the theory of Tikhonov regularization to conclude that xγ,δ is continuous in δ>0 and converges, as δ→0, to xγ† which is the (ℓ2) minimum-norm solution to Q-lasso (7), that is, the unique element xγ†:=argminx∈Sγ∥x∥2. By Proposition 6, we also find that every cluster point of xγ†, as γ→0, lies in the set argminx∈S∥x∥1.

Fix δ>0 and use Proposition 6 to see that xγ,δ→x^δ as γ→0. Now the standard property of Tikhonov's regularization ensures that x^δ→argminx∈S∥x∥2 as δ→0.

ℓ1-ℓ2 regularization (70) can be solved by proximal-gradient algorithm (61). Take f(x)=(1/2)∥(I-PQ)Ax∥22+(1/2)δ∥x∥22 and g(x)=γ∥x∥1; then algorithm (61) is reduced to
(83)xk+1=proxλkγ∥·∥1(xk-λk[At(I-PQ)Axk+δxk]).
The convergence of this algorithm is given as follows.

Theorem 18.

Assume
(84)0<liminfk→∞λk≤limsupk→∞λk<2/(∥A∥22+δ).
Then the sequence (xk) generated by algorithm (83) converges to the solution of ℓ1-ℓ2 regularization (70).
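A sketch of algorithm (83), assuming Q={b} for simplicity (so that PQ(y)≡b is explicit); the data and parameters are illustrative. At the end, the optimality condition (77) is checked: the vector -(At(I-PQ)Ax+δx)/γ must lie in ∂∥x∥1, hence have ℓ∞ norm at most 1.

```python
import numpy as np

# Sketch of algorithm (83) for the l1-l2 regularized Q-lasso with Q = {b}.
# The step size obeys Theorem 18: lam < 2/(||A||_2^2 + delta).
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_l2_pg(A, b, gamma, delta, iters=20000):
    lam = 1.0 / (np.linalg.norm(A, 2) ** 2 + delta)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b) + delta * x   # gradient of the smooth part
        x = soft(x - lam * grad, lam * gamma)  # prox of lam*gamma*||.||_1
    return x

rng = np.random.default_rng(5)
A = rng.normal(size=(8, 5))
b = rng.normal(size=8)
gamma, delta = 0.1, 0.05
x_hat = l1_l2_pg(A, b, gamma, delta)

# Optimality check (77): -(A^t(Ax-b) + delta*x)/gamma lies in the
# subdifferential of ||.||_1 at x_hat, so its sup-norm is at most 1.
g = -(A.T @ (A @ x_hat - b) + delta * x_hat) / gamma
print(np.abs(g).max() <= 1.0 + 1e-6)  # True
```

The δ-term makes the smooth part strongly convex, so the iteration converges linearly and the optimality condition is met to high accuracy.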

We can also take f(x)=(1/2)∥(I-PQ)Ax∥22 and g(x)=γ∥x∥1+(δ/2)∥x∥22. Then proxμg(x)=proxν∥·∥1((1/(1+μδ))x) with ν=μγ/(1+μδ), and proximal algorithm (61) reduces to
(85)xk+1=proxνk∥·∥1(11+δλk(xk-λkAt(I-PQ)Axk)).
Here νk=γλk/(1+δλk). Convergence of this algorithm is given below.

Theorem 19.

Assume
(86)0<liminfk→∞λk≤limsupk→∞λk<2/∥A∥22.
Then the sequence (xk) generated by the algorithm (85) converges to the solution of ℓ1-ℓ2 regularization (70).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are grateful to the anonymous referees for their helpful comments and suggestions, which improved the presentation of this paper. This work was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, under Grant no. 2-363-1433-HiCi. The authors therefore acknowledge the technical and financial support of KAU.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso."
[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit."
[3] D. L. Donoho, "Compressed sensing."
[4] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information."
[5] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements."
[6] E. Candès and T. Tao, "The Dantzig selector: statistical estimation when p is much larger than n."
[7] T. T. Cai, G. Xu, and J. Zhang, "On recovery of sparse signals via ℓ1 minimization."
[8] Y. Censor and T. Elfving, "A multiprojection algorithm using Bregman projections in a product space."
[9] H.-K. Xu, "Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces."
[10] J.-J. Moreau, "Propriétés des applications 'prox'."
[11] J.-J. Moreau, "Proximité et dualité dans un espace hilbertien."
[12] P. L. Combettes and V. R. Wajs, "Signal recovery by proximal forward-backward splitting."
[13] C. A. Micchelli, L. Shen, and Y. Xu, "Proximity algorithms for image models: denoising."
[14] H.-K. Xu, "Properties and iterative methods for the Lasso and its variants."
[15] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net."