Properties and Iterative Methods for the Q-Lasso



Introduction
The lasso of Tibshirani [1] is the minimization problem
$$\min_{x\in\mathbb{R}^n}\frac{1}{2}\|Ax-b\|_2^2+\gamma\|x\|_1, \qquad (1)$$
where $A$ is an $m\times n$ (real) matrix, $b\in\mathbb{R}^m$, and $\gamma>0$ is a tuning parameter. It is equivalent to the basis pursuit (BP) of Chen et al. [2]:
$$\min_{x\in\mathbb{R}^n}\|x\|_1 \quad\text{subject to}\quad Ax=b. \qquad (2)$$
It is well known that both the lasso and BP model a number of applied problems arising in machine learning, signal/image processing, and statistics, because they promote sparsity of a signal $x\in\mathbb{R}^n$. Sparsity is a common phenomenon in practical problems, since a solution often has a sparse representation in terms of an appropriate basis, and it has therefore received much attention.
Observe that both the lasso (1) and BP (2) can be viewed as the $\ell_1$ regularization applied to the inverse linear system in $\mathbb{R}^m$:
$$Ax=b. \qquad (3)$$
In sparse recovery, the system (3) is underdetermined (i.e., $m<n$, and often $m\ll n$). The theory of compressed sensing of Donoho [3] and Candès et al. [4, 5] makes the breakthrough that, under certain conditions, the underdetermined system (3) determines a unique $k$-sparse solution. (Recall that a signal $x\in\mathbb{R}^n$ is said to be $k$-sparse if the number of nonzero entries of $x$ is no bigger than $k$.) However, due to measurement errors, the system (3) is actually inexact: $Ax\approx b$. It turns out that BP (2) is reformulated as
$$\min_{x\in\mathbb{R}^n}\|x\|_1 \quad\text{subject to}\quad \|Ax-b\|\le\varepsilon, \qquad (4)$$
where $\varepsilon>0$ is the tolerance level of errors and $\|\cdot\|$ is a norm on $\mathbb{R}^m$ (often the $\ell_p$ norm $\|\cdot\|_p$ for $p=1,2,\infty$; a solution to (4) when the tolerance is measured by the $\ell_\infty$ norm $\|\cdot\|_\infty$ is known as the Dantzig selector of Candès and Tao [6]; see also [7]).
Note that if we let $Q:=B_\varepsilon(b)$ be the closed ball in $\mathbb{R}^m$ centered at $b$ with radius $\varepsilon$, then (4) is rewritten as
$$\min_{x\in\mathbb{R}^n}\|x\|_1 \quad\text{subject to}\quad Ax\in Q. \qquad (5)$$
Now let $Q$ be a nonempty closed convex subset of $\mathbb{R}^m$ and let $P_Q$ be the projection from $\mathbb{R}^m$ onto $Q$. Noticing that the condition $Ax\in Q$ is equivalent to the condition $Ax-P_Q(Ax)=0$, we see that problem (5) is solved via
$$\min_{x\in\mathbb{R}^n}\|x\|_1 \quad\text{subject to}\quad Ax-P_Q(Ax)=0. \qquad (6)$$
Applying the Lagrangian method, we arrive at the equivalent minimization
$$\min_{x\in\mathbb{R}^n}\frac{1}{2}\|(I-P_Q)Ax\|_2^2+\gamma\|x\|_1, \qquad (7)$$
where $\gamma>0$ is a Lagrangian multiplier (also interpreted as a regularization parameter). Alternatively, we may view (7) as the $\ell_1$ regularization of the inclusion
$$Ax\in Q \quad\text{(equivalently, the equation } (I-P_Q)Ax=0\text{)}, \qquad (8)$$
which extends the linear system (3) in an obvious way. We refer to problem (7) as the $Q$-lasso, since it is the $\ell_1$ regularization of the inclusion (8), just as the lasso (1) is the $\ell_1$ regularization of the linear system (3). Throughout the rest of this paper, we always assume that (8) is consistent (i.e., solvable).
The $Q$-lasso (7) is also connected with the so-called split feasibility problem (SFP) of Censor and Elfving [8] (see also [9]), which is stated as finding a point $x$ with the property
$$x\in C, \quad Ax\in Q, \qquad (9)$$
where $C$ and $Q$ are closed convex subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. An equivalent minimization formulation of the SFP (9) is
$$\min_{x\in C}\frac{1}{2}\|(I-P_Q)Ax\|_2^2. \qquad (10)$$
Its $\ell_1$ regularization is the minimization
$$\min_{x\in C}\frac{1}{2}\|(I-P_Q)Ax\|_2^2+\gamma\|x\|_1, \qquad (11)$$
where $\gamma>0$ is a regularization parameter. Problem (7) is the special case of (11) in which the constraint set $C$ is the entire space $\mathbb{R}^n$.
The purpose of this paper is to study the behavior, in terms of $\gamma>0$, of solutions to the regularized problem (7). (We leave the more general problem (11) to future work, since the involvement of another closed convex set $C$ brings technical difficulties that are not easy to overcome.) We discuss iterative methods for solving the $Q$-lasso, including the proximal-gradient method and the projection-gradient method, the latter derived via a duality technique. Due to ill-posedness, we also apply $\ell_1$-$\ell_2$ regularization to the $Q$-lasso.
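For readers who wish to experiment, the following numpy sketch evaluates the $Q$-lasso objective (7) for the ball $Q=B_\varepsilon(b)$ of (5), for which $P_Q$ has a closed form. The data, dimensions, and parameter values are illustrative only and are not taken from the paper.

```python
import numpy as np

def project_ball(y, center, radius):
    """Projection P_Q onto the closed ball Q = B_radius(center)."""
    d = y - center
    dist = np.linalg.norm(d)
    return y if dist <= radius else center + (radius / dist) * d

def q_lasso_objective(x, A, center, radius, gamma):
    """Q-lasso objective (7): 0.5*||(I - P_Q)Ax||_2^2 + gamma*||x||_1."""
    Ax = A @ x
    r = Ax - project_ball(Ax, center, radius)
    return 0.5 * r @ r + gamma * np.linalg.norm(x, 1)

# Illustrative data only: an underdetermined system with a sparse
# ground truth, and Q the epsilon-ball around b as in (5).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))                  # m = 20 < n = 50
x_true = np.where(rng.random(50) < 0.1, 1.0, 0.0)  # k-sparse signal
b = A @ x_true
print(q_lasso_objective(x_true, A, b, radius=0.1, gamma=1.0))
```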

Preliminaries
Let $d\ge 1$ be an integer and let $\mathbb{R}^d$ be the Euclidean $d$-space. For $p\ge 1$, we use $\|\cdot\|_p$ to denote the $\ell_p$ norm on $\mathbb{R}^d$; namely, for $x=(x_i)\in\mathbb{R}^d$,
$$\|x\|_p=\Big(\sum_{i=1}^{d}|x_i|^p\Big)^{1/p}. \qquad (12)$$
Let $C$ be a closed convex subset of $\mathbb{R}^d$. Recall that the projection from $\mathbb{R}^d$ onto $C$ is defined as the operator
$$P_C(x):=\arg\min_{y\in C}\|x-y\|_2, \quad x\in\mathbb{R}^d. \qquad (13)$$
The projection $P_C$ is characterized as follows: given $x\in\mathbb{R}^d$ and $z\in C$,
$$z=P_C x \iff \langle x-z,\,y-z\rangle\le 0 \quad\text{for all } y\in C. \qquad (14)$$
Projections are nonexpansive; in fact, they are firmly nonexpansive. Namely, we have the following.

Lemma 1. For all $x,y\in\mathbb{R}^d$,
$$\|P_C x-P_C y\|_2^2\le\langle P_C x-P_C y,\,x-y\rangle; \qquad (15)$$
consequently, $\|P_C x-P_C y\|_2\le\|x-y\|_2$. The complement $I-P_C$ is firmly nonexpansive as well.
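The characterization (14) and Lemma 1 are easy to check numerically for a concrete set. The sketch below uses the closed ball as an illustrative choice of $C$; the points and tolerances are made up.

```python
import numpy as np

def project_ball(y, center, radius):
    # P_C for the closed ball C = B_radius(center)
    d = y - center
    dist = np.linalg.norm(d)
    return y if dist <= radius else center + (radius / dist) * d

rng = np.random.default_rng(1)
c, r = np.zeros(3), 1.0
x, y = 3.0 * rng.standard_normal(3), 3.0 * rng.standard_normal(3)
px, py = project_ball(x, c, r), project_ball(y, c, r)

# Characterization (14): <x - P_C x, z - P_C x> <= 0 for every z in C.
v = rng.standard_normal(3)
z = c + r * v / np.linalg.norm(v)       # a point of C (on its boundary)
print((x - px) @ (z - px) <= 1e-12)

# Lemma 1 (firm nonexpansiveness): ||P_C x - P_C y||^2 <= <P_C x - P_C y, x - y>.
print((px - py) @ (px - py) <= (px - py) @ (x - y) + 1e-12)
```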
Recall that a function $f:\mathbb{R}^d\to\mathbb{R}$ is convex if
$$f(\lambda x+(1-\lambda)y)\le\lambda f(x)+(1-\lambda)f(y) \qquad (16)$$
for all $\lambda\in(0,1)$ and $x,y\in\mathbb{R}^d$. (Note that we only consider finite-valued functions.) The subdifferential of a convex function $f$ is defined as the operator $\partial f$ given by
$$\partial f(x)=\{\xi\in\mathbb{R}^d:\ f(y)\ge f(x)+\langle\xi,\,y-x\rangle \text{ for all } y\in\mathbb{R}^d\}. \qquad (17)$$
The inequality in (17) is referred to as the subdifferential inequality of $f$ at $x$. We say that $f$ is subdifferentiable at $x$ if $\partial f(x)$ is nonempty. It is well known that an everywhere finite-valued convex function $f$ on $\mathbb{R}^d$ is everywhere subdifferentiable.

Examples. For the absolute value function on $\mathbb{R}$,
$$\partial|a|=\begin{cases}\{\operatorname{sgn}(a)\},& a\ne 0,\\ [-1,1],& a=0.\end{cases} \qquad (18)$$
Componentwise, for the $\ell_1$ norm on $\mathbb{R}^d$, $\xi\in\partial\|x\|_1$ if and only if $\xi_i=\operatorname{sgn}(x_i)$ when $x_i\ne 0$ and $\xi_i\in[-1,1]$ when $x_i=0$. Here $\operatorname{sgn}$ is the sign function; that is, for $a\in\mathbb{R}$,
$$\operatorname{sgn}(a)=\begin{cases}1,& a>0,\\ 0,& a=0,\\ -1,& a<0.\end{cases} \qquad (19)$$
Consider the unconstrained minimization problem
$$\min_{x\in\mathbb{R}^d} f(x). \qquad (20)$$
The following results are well known.
Proposition 2. Let $f$ be everywhere finite-valued on $\mathbb{R}^d$.

(i) If $f$ is strictly convex, then (20) admits at most one solution.

(ii) If $f$ is convex and satisfies the coercivity condition
$$f(x)\to\infty \quad\text{as } \|x\|_2\to\infty, \qquad (21)$$
then (20) admits at least one solution. Therefore, if $f$ is both strictly convex and coercive, (20) has one and only one solution.

Proposition 3.
Let $f$ be everywhere finite-valued and convex on $\mathbb{R}^d$. Then $x^*$ is a solution of the minimization (20) if and only if it satisfies the first-order optimality condition
$$0\in\partial f(x^*). \qquad (22)$$

Properties of the 𝑄-Lasso
We study some basic properties of the $Q$-lasso, which we repeat below:
$$\min_{x\in\mathbb{R}^n} f_\gamma(x):=\frac{1}{2}\|(I-P_Q)Ax\|_2^2+\gamma\|x\|_1, \qquad (23)$$
where $\gamma>0$ is a regularization parameter. We also consider the following minimization (we call it the $Q$-least squares problem):
$$\min_{x\in\mathbb{R}^n} f_Q(x):=\frac{1}{2}\|(I-P_Q)Ax\|_2^2. \qquad (24)$$
Denote by $S$ and $S_\gamma$ the solution sets of (24) and (23), respectively. Since $f_\gamma$ is continuous, convex, and coercive (i.e., $f_\gamma(x)\to\infty$ as $\|x\|_2\to\infty$), $S_\gamma$ is closed, convex, and nonempty. Notice also that, since we assume the consistency of (8), we have $S\ne\emptyset$; moreover, the solution sets of (8) and (24) coincide.
Observe that the assumption $S\ne\emptyset$ actually implies that $S_\gamma$ is uniformly bounded in $\gamma>0$, as the following lemma shows.

Lemma 4. Assume $S\ne\emptyset$. Then $\|x_\gamma\|_1\le\min_{v\in S}\|v\|_1$ for every $\gamma>0$ and every $x_\gamma\in S_\gamma$.

Proof. For $\hat{x}\in S$ we have $f_Q(\hat{x})=0$, so the optimality of $x_\gamma$ yields $\gamma\|x_\gamma\|_1\le f_Q(x_\gamma)+\gamma\|x_\gamma\|_1\le f_Q(\hat{x})+\gamma\|\hat{x}\|_1=\gamma\|\hat{x}\|_1$.

Lemma 5. For $\gamma>0$ and $x_\gamma\in S_\gamma$, set $\ell(\gamma):=\|x_\gamma\|_1$ and $h(\gamma):=\|(I-P_Q)Ax_\gamma\|_2$ (these values do not depend on the particular choice of $x_\gamma\in S_\gamma$). Then $\ell(\gamma)$ is nonincreasing, and $h(\gamma)$ is nondecreasing and continuous, in $\gamma>0$.
Proof. For $x_\gamma\in S_\gamma$, we have the optimality condition
$$0\in A^{\top}(I-P_Q)Ax_\gamma+\gamma\,\partial\|x_\gamma\|_1. \qquad (35)$$
Here $A^{\top}$ is the transpose of $A$ and $\partial$ stands for the subdifferential in the sense of convex analysis. Equivalently, $-A^{\top}(I-P_Q)Ax_\gamma\in\gamma\,\partial\|x_\gamma\|_1$. Let $\beta>0$ and $x_\beta\in S_\beta$. It follows from the subdifferential inequality (17) that
$$\gamma\,(\|x_\beta\|_1-\|x_\gamma\|_1)\ge\langle (I-P_Q)Ax_\gamma,\,Ax_\gamma-Ax_\beta\rangle, \qquad (36)$$
$$\beta\,(\|x_\gamma\|_1-\|x_\beta\|_1)\ge\langle (I-P_Q)Ax_\beta,\,Ax_\beta-Ax_\gamma\rangle. \qquad (37)$$
Adding up (36) and (37) and using the fact that $I-P_Q$ is firmly nonexpansive, we deduce that
$$(\gamma-\beta)\,(\|x_\beta\|_1-\|x_\gamma\|_1)\ge\|(I-P_Q)Ax_\gamma-(I-P_Q)Ax_\beta\|_2^2. \qquad (38)$$
We therefore find that if $\gamma>\beta$, then $\|x_\beta\|_1\ge\|x_\gamma\|_1$. This proves that $\ell(\gamma)$ is nonincreasing in $\gamma>0$. From (38) it also follows that $(I-P_Q)Ax_\gamma$ is continuous in $\gamma>0$, which implies the continuity of $h(\gamma)$ for $\gamma>0$.
To see that $h(\gamma)$ is nondecreasing, we use the inequality (as $x_\beta$ minimizes $f_\beta$)
$$\frac{1}{2}\|(I-P_Q)Ax_\beta\|_2^2+\beta\|x_\beta\|_1\le\frac{1}{2}\|(I-P_Q)Ax_\gamma\|_2^2+\beta\|x_\gamma\|_1. \qquad (39)$$
Now if $\gamma>\beta>0$, then, as $\|x_\gamma\|_1\le\|x_\beta\|_1$, we immediately get $h(\beta)\le h(\gamma)$, and the monotonicity of $h$ is proven.
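Lemma 5 is easy to observe numerically. The sketch below uses the lasso case $Q=\{b\}$ with made-up data and a plain proximal-gradient loop (anticipating Section 4); the step-size and iteration counts are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def soft(x, t):
    """Componentwise soft-thresholding, the prox of t*||.||_1 (see Section 4)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_solve(A, b, gamma, steps=5000):
    """Plain proximal-gradient (ISTA) loop for the lasso, i.e., Q = {b}."""
    lam = 1.0 / np.linalg.norm(A, 2) ** 2       # step size 1 / ||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft(x - lam * A.T @ (A @ x - b), lam * gamma)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 60))
b = A @ soft(rng.standard_normal(60), 1.5)      # consistent: b = A x_true
for gamma in (0.01, 0.1, 1.0, 5.0):             # l(gamma) falls, h(gamma) rises
    x = lasso_solve(A, b, gamma)
    print(f"gamma={gamma:5.2f}  l(gamma)={np.linalg.norm(x, 1):8.4f}  "
          f"h(gamma)={np.linalg.norm(A @ x - b):8.4f}")
```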
Proposition 6. One has the following.

(i) $h(\gamma)\to 0$ as $\gamma\to 0$.

(ii) Every cluster point of $(x_\gamma)$, as $\gamma\to 0$, lies in $\arg\min_{v\in S}\|v\|_1$.

Proof. (i) Taking the limit as $\gamma\to 0$ in the inequality
$$\frac{1}{2}h(\gamma)^2+\gamma\,\ell(\gamma)\le\gamma\|v\|_1, \quad v\in S,$$
(and using the boundedness of $(x_\gamma)$) yields $h(\gamma)\to 0$. (ii) If $\bar{x}$ is a cluster point of $(x_\gamma)$ as $\gamma\to 0$, then $f_Q(\bar{x})=\lim_{\gamma\to 0}\frac{1}{2}h(\gamma)^2=0$, so $\bar{x}\in S$; moreover, $\|\bar{x}\|_1\le\min_{v\in S}\|v\|_1$ by Lemma 4. This suffices to ensure that the conclusion of (ii) holds.
How to select the tuning (i.e., regularization) parameter $\gamma$ in the lasso (1) and the $Q$-lasso (7) is a challenging problem. There is no rule for selecting $\gamma$ universally; it should instead be selected on a case-by-case basis. The following result, however, points out that $\gamma$ cannot be too large.

Proposition 7.
Let $Q$ be a nonempty closed convex subset of $\mathbb{R}^m$ and assume that the $Q$-lasso (7) is consistent (i.e., solvable). If
$$\gamma>\max\{\|A^{\top}(I-P_Q)Ax\|_\infty:\ \|x\|_1\le\min_{v\in S}\|v\|_1\}$$
(note that this condition reduces to $\gamma>\|A^{\top}b\|_\infty$ for the lasso (1), for which $Q=\{b\}$), then $x_\gamma=0$. (Here $S$ is, as before, the solution set of the $Q$-least squares problem (24).)

Notice that (48) shows that $\ell(\gamma)=\|x_\gamma\|_1$ can be determined by any single $x_\gamma\in S_\gamma$. Hence, we arrive at the following characterization of the solutions of the $Q$-lasso (23).

Proposition 8. Let $Q$ be a nonempty closed convex subset of $\mathbb{R}^m$, let $\gamma>0$, and let $x_\gamma\in S_\gamma$. Then $\tilde{x}\in\mathbb{R}^n$ is a solution of the $Q$-lasso (23) if and only if $A\tilde{x}=Ax_\gamma$ and $\|\tilde{x}\|_1\le\|x_\gamma\|_1$. It turns out that
$$S_\gamma=(x_\gamma+N(A))\cap B_{\|x_\gamma\|_1}, \qquad (50)$$
where $N(A)=\{x\in\mathbb{R}^n:\ Ax=0\}$ is the null space of $A$ and $B_r=\{x\in\mathbb{R}^n:\ \|x\|_1\le r\}$ denotes the closed $\ell_1$ ball centered at the origin with radius $r>0$. This shows that if one solution of the $Q$-lasso (23) is found, then all solutions are given by (50).
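The lasso case of Proposition 7 (the threshold $\|A^{\top}b\|_\infty$) can be checked numerically. Below is a sketch with made-up data; the inner solver is the same plain proximal-gradient loop used earlier, which is an illustrative choice rather than the paper's prescribed method.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_solve(A, b, gamma, steps=5000):
    # plain proximal-gradient loop for the lasso (Q = {b}); see Section 4
    lam = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft(x - lam * A.T @ (A @ x - b), lam * gamma)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((15, 40))   # full row rank, so Ax = b is consistent
b = rng.standard_normal(15)
thresh = np.linalg.norm(A.T @ b, np.inf)      # ||A^T b||_inf
x_above = lasso_solve(A, b, 1.01 * thresh)    # gamma above the threshold
x_below = lasso_solve(A, b, 0.50 * thresh)    # gamma below the threshold
print(np.allclose(x_above, 0.0))              # True: x_gamma = 0
print(np.linalg.norm(x_below, 1) > 0.0)       # True: nonzero solution
```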

Iterative Methods
In this section we discuss proximal iterative methods for solving the $Q$-lasso (7). The basics are Moreau's concept of proximal operators and their fundamental properties, which are briefly reviewed below. (For our purposes, we confine ourselves to the finite-dimensional setting.)

4.1. Proximal Operators. Let $\Gamma_0(\mathbb{R}^d)$ be the space of functions on $\mathbb{R}^d$ that are proper, lower semicontinuous, and convex.
Definition 9 (see [10, 11]). The proximal operator of $\varphi\in\Gamma_0(\mathbb{R}^d)$ is defined by
$$\operatorname{prox}_\varphi(x):=\arg\min_{v\in\mathbb{R}^d}\Big\{\varphi(v)+\frac{1}{2}\|v-x\|_2^2\Big\}, \quad x\in\mathbb{R}^d.$$
The proximal operator of $\varphi$ of order $\lambda>0$ is defined as the proximal operator of $\lambda\varphi$; that is,
$$\operatorname{prox}_{\lambda\varphi}(x):=\arg\min_{v\in\mathbb{R}^d}\Big\{\varphi(v)+\frac{1}{2\lambda}\|v-x\|_2^2\Big\}, \quad x\in\mathbb{R}^d.$$
For the fundamental properties of proximal operators, the reader is referred to [12, 13]. Here we only mention that $\operatorname{prox}_{\lambda\varphi}$ has a closed-form expression in some important cases, as shown in the examples below [12].
(a) If we take $\varphi$ to be any norm $\|\cdot\|$ on $\mathbb{R}^d$, then
$$\operatorname{prox}_{\lambda\|\cdot\|}(x)=x-P_{\lambda B}(x),$$
where $B$ is the unit ball of the dual norm. In particular, if we take $\varphi$ to be the absolute value function of the real line $\mathbb{R}$, we get
$$\operatorname{prox}_{\lambda|\cdot|}(a)=\operatorname{sgn}(a)\max\{|a|-\lambda,0\},$$
which is also known as the scalar soft-thresholding operator.

(b) If we take $\varphi=\|\cdot\|_1$, the $\ell_1$ norm on $\mathbb{R}^d$, then $\operatorname{prox}_{\lambda\|\cdot\|_1}(x)=y$, where $y_i=\operatorname{sgn}(x_i)\max\{|x_i|-\lambda,0\}$ for $i=1,\dots,d$; that is, $\operatorname{prox}_{\lambda\|\cdot\|_1}$ applies soft-thresholding componentwise.
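The closed forms in (a) and (b) translate directly into code. The sketch below checks (a) for the $\ell_1$ norm (whose dual ball is the $\ell_\infty$ ball, so $P_{\lambda B}$ is clipping) and then runs a minimal proximal-gradient iteration for the $Q$-lasso (7) with $Q$ a closed ball; the data, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def prox_l1(x, lam):
    """prox of lam*||.||_1: componentwise soft-thresholding, as in (b)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Check of (a) for the l1 norm: prox_{lam||.||}(x) = x - P_{lam B}(x), where B
# is the unit ball of the dual norm; the dual of l1 is l_inf, so P is clipping.
x = np.array([1.2, -0.1, 0.4, -2.0])
assert np.allclose(prox_l1(x, 0.3), x - np.clip(x, -0.3, 0.3))

def project_ball(y, center, radius):
    d = y - center
    dist = np.linalg.norm(d)
    return y if dist <= radius else center + (radius / dist) * d

def q_lasso_prox_grad(A, center, radius, gamma, steps=3000):
    """Proximal-gradient iteration for the Q-lasso (7) with Q a closed ball:
    x_{k+1} = prox_{lam*gamma*||.||_1}(x_k - lam * A^T (I - P_Q) A x_k)."""
    lam = 1.0 / np.linalg.norm(A, 2) ** 2  # grad of f_Q is ||A||^2-Lipschitz
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        Ax = A @ x
        grad = A.T @ (Ax - project_ball(Ax, center, radius))
        x = prox_l1(x - lam * grad, lam * gamma)
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 50))
b = A @ prox_l1(rng.standard_normal(50), 1.0)   # sparse, consistent data
x_hat = q_lasso_prox_grad(A, b, radius=0.1, gamma=0.5)
print(np.linalg.norm(x_hat, 1), np.linalg.norm(A @ x_hat - b))
```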
We now turn to the $\ell_1$-$\ell_2$ regularization of the $Q$-lasso:
$$\min_{x\in\mathbb{R}^n}\frac{1}{2}\|(I-P_Q)Ax\|_2^2+\gamma\|x\|_1+\frac{\delta}{2}\|x\|_2^2, \qquad (70)$$
where $\gamma,\delta>0$ are regularization parameters. Since the objective of (70) is strictly convex and coercive, (70) admits a unique solution. Let $x_{\gamma,\delta}$ be the unique solution of (70), and recall that the $Q$-lasso (7) is assumed to be consistent (i.e., solvable), with nonempty solution set $S_\gamma$.
(i) For fixed $\gamma>0$, we can use the theory of Tikhonov regularization to conclude that $x_{\gamma,\delta}$ is continuous in $\delta>0$ and converges, as $\delta\to 0$, to $x_\gamma^{\dagger}$, the ($\ell_2$) minimum-norm solution of the $Q$-lasso (7); that is, the unique element
$$x_\gamma^{\dagger}:=\arg\min_{x\in S_\gamma}\|x\|_2.$$
By Proposition 6, we also find that every cluster point of $x_\gamma^{\dagger}$, as $\gamma\to 0$, lies in the set $\arg\min_{x\in S}\|x\|_1$.
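Algorithmically, the extra $\ell_2$ term changes only the prox step: the prox of $\lambda(\gamma\|\cdot\|_1+\frac{\delta}{2}\|\cdot\|_2^2)$ is soft-thresholding followed by the shrinkage factor $1/(1+\lambda\delta)$, which follows from the scalar optimality condition. The sketch below (made-up data, $Q=\{b\}$) illustrates the behavior in (i) as $\delta\to 0$.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_l2_solve(A, b, gamma, delta, steps=5000):
    """Proximal gradient for (70) with Q = {b}. The prox of
    lam*(gamma*||.||_1 + (delta/2)*||.||_2^2) is soft-thresholding
    followed by the shrinkage factor 1/(1 + lam*delta)."""
    lam = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft(x - lam * A.T @ (A @ x - b), lam * gamma) / (1.0 + lam * delta)
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((25, 60))
b = A @ soft(rng.standard_normal(60), 1.2)
x_lasso = l1_l2_solve(A, b, gamma=0.5, delta=0.0)   # delta = 0: Q-lasso itself
for delta in (1.0, 0.1, 0.01):                      # x_{gamma,delta} approaches it
    x = l1_l2_solve(A, b, gamma=0.5, delta=delta)
    print(f"delta={delta:5.2f}  distance to Q-lasso solution = "
          f"{np.linalg.norm(x - x_lasso):.4f}")
```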