New Nonsmooth Equations-Based Algorithms for 1-Norm Minimization and Applications

Lei Wu (1) and Zhe Sun (2)

(1) College of Mathematics and Econometrics, Hunan University, Changsha 410082, China
(2) College of Mathematics and Information Science, Jiangxi Normal University, Nanchang 330022, China

Research Article. Journal of Applied Mathematics (ISSN 1687-0042, 1110-757X), Hindawi Publishing Corporation, Volume 2012, Article ID 139609, doi:10.1155/2012/139609

Academic Editor: Changbum Chun

Received 18 September 2012; Revised 8 December 2012; Accepted 9 December 2012; Published 26 December 2012

Copyright © 2012 Lei Wu and Zhe Sun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recently, Xiao et al. proposed a nonsmooth equations-based method to solve the 1-norm minimization problem (2011). The advantages of this method are its simplicity and low storage requirement. In this paper, based on a new nonsmooth equations reformulation, we investigate new nonsmooth equations-based algorithms for solving 1-norm minimization problems. Under mild conditions, we show that the proposed algorithms are globally convergent. Preliminary numerical results demonstrate the effectiveness of the proposed algorithms.

1. Introduction

We consider the 1-norm minimization problem
$$\min_{x} f(x) \triangleq \frac{1}{2}\|Ax-b\|^2+\rho\|x\|_1, \tag{1.1}$$
where $x\in\mathbb{R}^n$, $b\in\mathbb{R}^m$, $A\in\mathbb{R}^{m\times n}$, and $\rho$ is a nonnegative parameter. Throughout the paper, we use $\|v\|=\sqrt{\sum_{i=1}^n|v_i|^2}$ and $\|v\|_1=\sum_{i=1}^n|v_i|$ to denote the Euclidean norm and the 1-norm of a vector $v\in\mathbb{R}^n$, respectively. Problem (1.1) has many important practical applications, particularly in compressed sensing (abbreviated as CS) and image restoration. It can also be viewed as a regularization technique to overcome the ill-conditioned, or even singular, nature of the matrix $A$ when trying to infer $x$ from noiseless observations $b=Ax$ or from noisy observations $b=Ax+\xi$, where $\xi$ is white Gaussian noise of variance $\sigma^2$.

The convex optimization problem (1.1) can be cast as a second-order cone programming problem and thus could be solved via interior point methods. However, in many applications, the problem is not only large scale but also involves dense matrix data, which often precludes the use and potential advantage of sophisticated interior point methods. This motivated the search for simpler first-order algorithms for solving (1.1), where the dominant computational effort is a relatively cheap matrix-vector multiplication involving $A$ and $A^T$. In the past few years, several first-order algorithms have been proposed. One of the most popular algorithms falls into the iterative shrinkage/thresholding (IST) class [6, 7]. It was first designed for wavelet-based image deconvolution problems and analyzed subsequently by many authors. Figueiredo et al. [12] studied the gradient projection and Barzilai-Borwein method [13] (denoted by GPSR-BB) for solving (1.1). They reformulated problem (1.1) as a box-constrained quadratic program and solved it by a gradient projection and Barzilai-Borwein method. Wright et al. [14] presented a sparse reconstruction algorithm (denoted by SPARSA) to solve (1.1). Yun and Toh [15] proposed a block coordinate gradient descent algorithm for solving (1.1). Yang and Zhang [16] investigated alternating direction algorithms for solving (1.1).

Quite recently, Xiao et al. [17] developed a nonsmooth equations-based algorithm (called SGCS) for solving 1-norm minimization problems in CS. They reformulated the box-constrained quadratic program obtained by Figueiredo et al. [12] into a system of nonsmooth equations and then applied the spectral gradient projection method [18] to solve the nonsmooth equation. The main advantages of SGCS are its simplicity and low storage requirement. The difference between the above algorithms and SGCS is that SGCS does not use a line search to decrease the value of the objective function at each iteration and instead uses a projection step to accelerate the iterative process. However, each projection step in SGCS requires two matrix-vector multiplications involving $A$ or $A^T$, which means that each iteration requires four matrix-vector multiplications involving $A$ or $A^T$, while each iteration of GPSR-BB and IST requires only two. This may increase the computational cost. In addition, the dimension of the system of nonsmooth equations is $2n$, which is twice that of the original problem. These drawbacks motivate us to study new nonsmooth equations-based algorithms for the 1-norm minimization problem.

In this paper, we first reformulate problem (1.1) into a system of nonsmooth equations. This system is Lipschitz continuous and monotone, and many effective algorithms (see, e.g., [19-22]) can be used to solve it. We then apply the spectral gradient projection (denoted by SGP) method [18] to solve the resulting system. Similar to SGCS, each iteration of SGP requires four matrix-vector multiplications involving $A$ or $A^T$. In order to reduce the computational cost, we also propose a modified SGP (denoted by MSGP) method to solve the resulting system. Under mild conditions, the global convergence of the proposed algorithms is ensured.

The remainder of the paper is organized as follows. In Section 2, we first review some existing results of nonsmooth analysis and then derive an equivalent system of nonsmooth equations to problem (1.1). We verify some nice properties of the resulting system in this section. In Section 3, we propose the algorithms and establish their global convergence. In Section 4, we apply the proposed algorithms to some practical problems arising from compressed sensing and image restoration and compare their performance with that of SGCS, SPARSA, and GPSR-BB.

Throughout the paper, we use $\langle\cdot,\cdot\rangle$ to denote the inner product of two vectors in $\mathbb{R}^n$.

2. Preliminaries

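By nonsmooth analysis, a necessary condition for a vector $x\in\mathbb{R}^n$ to be a local minimizer of a nonsmooth function $f:\mathbb{R}^n\to\mathbb{R}$ is that
$$(0,\ldots,0)^T\in\partial f(x), \tag{2.1}$$
where $\partial f(x)$ denotes the subdifferential of $f$ at $x$ [23]. If $f$ is convex, then (2.1) is also sufficient for $x$ to be a solution of (1.1). The subdifferential of the absolute value function $|t|$ is given by the signum function $\operatorname{sign}(t)$, that is,
$$\partial|t|=\operatorname{sign}(t):=\begin{cases}\{1\}, & t>0,\\ [-1,1], & t=0,\\ \{-1\}, & t<0.\end{cases} \tag{2.2}$$
For problem (1.1), the optimality conditions therefore translate to
$$\nabla_i f(x)+\rho\,\operatorname{sign}(x_i)=0,\quad |x_i|>0,\qquad |\nabla_i f(x)|\le\rho,\quad x_i=0, \tag{2.3}$$
where $\nabla_i f(x)=\partial f(x)/\partial x_i$, $i=1,\ldots,n$. Here and in the rest of the paper, $\nabla f$ refers to the gradient of the smooth term $\frac{1}{2}\|Ax-b\|^2$ of (1.1), so that $\nabla f(x)=A^T(Ax-b)$. It is clear that the function defined by (1.1) is convex. Therefore a point $x^*\in\mathbb{R}^n$ is a solution of problem (1.1) if and only if it satisfies
$$\begin{aligned}\nabla_i f(x^*)+\rho&=0, && \text{if } x_i^*>0,\\ \nabla_i f(x^*)-\rho&=0, && \text{if } x_i^*<0,\\ -\rho\le\nabla_i f(x^*)&\le\rho, && \text{if } x_i^*=0.\end{aligned} \tag{2.4}$$
Formally, we call the above conditions the optimality conditions for problem (1.1).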

For any given $\tau>0$, we define a mapping $H_\tau=(H_1^\tau,H_2^\tau,\ldots,H_n^\tau)^T:\mathbb{R}^n\to\mathbb{R}^n$ by
$$H_i^\tau(x)\triangleq\max\{\tau(\nabla_i f(x)-\rho),\ \min\{x_i,\ \tau(\nabla_i f(x)+\rho)\}\}. \tag{2.5}$$
Then $H_\tau$ is a continuous mapping and is closely related to problem (1.1). It is generally not differentiable in the sense of the Fréchet derivative but is semismooth in the sense of Qi and Sun [24]. The following proposition shows that the 1-norm minimization problem (1.1) is equivalent to a nonsmooth equation. It can be easily obtained by the use of the optimality conditions and the convexity of the function $f$ defined by (1.1).

Proposition 2.1.

Let $\tau>0$ be any given constant. A point $x^*\in\mathbb{R}^n$ is a solution of problem (1.1) if and only if it satisfies
$$H_\tau(x^*)=0. \tag{2.6}$$

The above proposition reformulates problem (1.1) as a system of nonsmooth equations. Compared with the nonsmooth equation reformulation in [17], the dimension of (2.6) is only half of the dimension of the equation in [17].
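As an illustration of Proposition 2.1, the following is a minimal NumPy sketch of the mapping (2.5). The function names `H_tau` and `soft_threshold` and the toy data are assumptions made for this example only. It relies on the standard fact that, for $A=I$, the minimizer of (1.1) is the componentwise soft-thresholding of $b$ at level $\rho$, so the residual should vanish there and be nonzero elsewhere.

```python
import numpy as np

def soft_threshold(u, c):
    """Componentwise soft-thresholding: sign(u) * max(|u| - c, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - c, 0.0)

def H_tau(x, A, b, rho, tau):
    """Mapping (2.5), with g = A^T (A x - b) the gradient of the smooth term."""
    g = A.T @ (A @ x - b)
    return np.maximum(tau * (g - rho), np.minimum(x, tau * (g + rho)))

# Toy instance (hypothetical data): A = I, so the solution is soft_threshold(b, rho).
A = np.eye(4)
b = np.array([3.0, -0.2, 0.5, -2.0])
rho, tau = 1.0, 0.5

x_star = soft_threshold(b, rho)                          # [2, 0, 0, -1]
residual_at_solution = H_tau(x_star, A, b, rho, tau)     # vanishes, as (2.6) requires
residual_at_zero = H_tau(np.zeros(4), A, b, rho, tau)    # nonzero: x = 0 is not a solution
```

Each evaluation of `H_tau` costs one product with $A$ and one with $A^T$, which is the unit of work counted in the complexity remarks of this paper.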

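Given $a,b,c,d\in\mathbb{R}$, it is easy to verify that (see, e.g., [25])
$$\min\{a,b\}-\min\{c,d\}=(1-s)(a-c)+s(b-d),\qquad \max\{a,b\}-\max\{c,d\}=(1-t)(a-c)+t(b-d) \tag{2.7}$$
with
$$s=\begin{cases}0, & a\le b,\ c\le d;\\ 1, & a>b,\ c>d;\\ \dfrac{\min\{a,b\}-\min\{c,d\}+c-a}{b-d+c-a}, & \text{otherwise},\end{cases}\qquad t=\begin{cases}0, & a\ge b,\ c\ge d;\\ 1, & a<b,\ c<d;\\ \dfrac{\max\{a,b\}-\max\{c,d\}+c-a}{b-d+c-a}, & \text{otherwise}.\end{cases} \tag{2.8}$$
It is clear that $0\le s,t\le 1$. By (2.5), for any $x,y\in\mathbb{R}^n$ we have
$$\begin{aligned}H_i^\tau(x)-H_i^\tau(y)&=\max\{\tau(\nabla_i f(x)-\rho),\min\{x_i,\tau(\nabla_i f(x)+\rho)\}\}-\max\{\tau(\nabla_i f(y)-\rho),\min\{y_i,\tau(\nabla_i f(y)+\rho)\}\}\\&=\tau(1-t_i)(\nabla_i f(x)-\nabla_i f(y))+t_i\big(\min\{x_i,\tau(\nabla_i f(x)+\rho)\}-\min\{y_i,\tau(\nabla_i f(y)+\rho)\}\big)\\&=\tau(1-t_i)(\nabla_i f(x)-\nabla_i f(y))+t_i\big((1-s_i)(x_i-y_i)+\tau s_i(\nabla_i f(x)-\nabla_i f(y))\big)\\&=t_i(1-s_i)(x_i-y_i)+\tau(1-t_i+t_is_i)(\nabla_i f(x)-\nabla_i f(y)),\end{aligned} \tag{2.9}$$
where $0\le s_i,t_i\le 1$. Define two diagonal matrices $S$ and $T$ by
$$S=\operatorname{diag}\{s_1,s_2,\ldots,s_n\},\qquad T=\operatorname{diag}\{t_1,t_2,\ldots,t_n\}. \tag{2.10}$$
Then we obtain
$$H_\tau(x)-H_\tau(y)=T(I-S)(x-y)+\tau(I-T+TS)(\nabla f(x)-\nabla f(y)). \tag{2.11}$$
Since $\nabla f(x)=A^T(Ax-b)$, we get
$$H_\tau(x)-H_\tau(y)=\big(T(I-S)+\tau(I-T+TS)A^TA\big)(x-y). \tag{2.12}$$
The next proposition shows the Lipschitz continuity of $H_\tau$ defined by (2.5).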

Proposition 2.2.

For each $\tau>0$, there exists a positive constant $L(\tau)$ such that
$$\|H_\tau(x)-H_\tau(y)\|\le L(\tau)\|x-y\|,\quad \forall x,y\in\mathbb{R}^n. \tag{2.13}$$

Proof.

By (2.10) and (2.12), we have
$$\|H_\tau(x)-H_\tau(y)\|\le\|T(I-S)+\tau(I-T+TS)A^TA\|\,\|x-y\|\le(1+\tau\|A^TA\|)\|x-y\|. \tag{2.14}$$
Let $L(\tau)\triangleq 1+\tau\|A^TA\|$. Then (2.13) holds. The proof is complete.
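The bound (2.13) can be checked numerically. The sketch below uses hypothetical random data; the sizes, seed, and parameter values are arbitrary choices and not taken from the experiments in this paper. It compares the observed ratio $\|H_\tau(x)-H_\tau(y)\|/\|x-y\|$ with $L(\tau)=1+\tau\|A^TA\|$, where the matrix norm is taken to be the spectral norm.

```python
import numpy as np

def H_tau(x, A, b, rho, tau):
    """Mapping (2.5), with g = A^T (A x - b)."""
    g = A.T @ (A @ x - b)
    return np.maximum(tau * (g - rho), np.minimum(x, tau * (g + rho)))

rng = np.random.default_rng(0)          # fixed seed, for reproducibility only
m, n = 20, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
rho, tau = 0.1, 0.05

# L(tau) = 1 + tau * ||A^T A||, with the spectral norm (largest singular value).
L = 1.0 + tau * np.linalg.norm(A.T @ A, 2)

worst_ratio = 0.0
for _ in range(200):
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    diff = H_tau(x, A, b, rho, tau) - H_tau(y, A, b, rho, tau)
    worst_ratio = max(worst_ratio, np.linalg.norm(diff) / np.linalg.norm(x - y))
```

On sampled pairs, `worst_ratio` should never exceed `L`; how close it comes depends on the data.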

The following proposition shows another good property of the system of nonsmooth equations (2.6).

Proposition 2.3.

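There exists a constant $\tau^*>0$ such that for any $0<\tau\le\tau^*$, the mapping $H_\tau:\mathbb{R}^n\to\mathbb{R}^n$ is monotone, that is,
$$\langle H_\tau(x)-H_\tau(y),\,x-y\rangle\ge 0,\quad \forall x,y\in\mathbb{R}^n. \tag{2.15}$$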

Proof.

Let $D_{ii}$ be the $i$th diagonal element of $A^TA$. It is clear that $D_{ii}>0$, $i=1,\ldots,n$. Set $\tau^*\triangleq\min_i\{1/D_{ii}\}$. Note that $A^TA$ is symmetric and positive semidefinite. Consequently, for any $\tau\in(0,\tau^*]$, the matrix $T(I-S)+\tau(I-T+TS)A^TA$ is also positive semidefinite. Therefore, it follows from (2.12) that
$$\langle H_\tau(x)-H_\tau(y),\,x-y\rangle\ge 0. \tag{2.16}$$
This completes the proof.
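A quick numerical sanity check of (2.15) is sketched below on hypothetical random data. Two assumptions go beyond the text. First, the step is set to the more conservative value $\tau=1/\lambda_{\max}(A^TA)$, the range later used in Lemma 3.7, rather than the $\tau^*$ of the proof. Second, the code also checks the observation that (2.5) coincides with the residual $x-\operatorname{soft}(x-\tau\nabla f(x),\tau\rho)$, where soft denotes componentwise soft-thresholding; for this conservative $\tau$ the map being subtracted is nonexpansive, which is one way to see why the inner product is nonnegative.

```python
import numpy as np

def soft_threshold(u, c):
    """Componentwise soft-thresholding: sign(u) * max(|u| - c, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - c, 0.0)

def H_tau(x, A, b, rho, tau):
    """Mapping (2.5), with g = A^T (A x - b)."""
    g = A.T @ (A @ x - b)
    return np.maximum(tau * (g - rho), np.minimum(x, tau * (g + rho)))

rng = np.random.default_rng(1)
A = rng.standard_normal((15, 40))
b = rng.standard_normal(15)
rho = 0.1
tau = 1.0 / np.linalg.eigvalsh(A.T @ A).max()   # assumption: conservative step 1/lambda_max

min_inner = np.inf    # smallest sampled value of <H(x) - H(y), x - y>
max_gap = 0.0         # largest deviation between (2.5) and the soft-thresholding form
for _ in range(500):
    x = rng.standard_normal(40)
    y = rng.standard_normal(40)
    hx = H_tau(x, A, b, rho, tau)
    hy = H_tau(y, A, b, rho, tau)
    min_inner = min(min_inner, float((hx - hy) @ (x - y)))
    prox_form = x - soft_threshold(x - tau * (A.T @ (A @ x - b)), tau * rho)
    max_gap = max(max_gap, float(np.abs(hx - prox_form).max()))
```

A sampled check like this cannot prove monotonicity; it only fails to find a counterexample.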

3. Algorithms and Their Convergence

In this section, we describe the proposed algorithms in detail and establish their convergence. Let $\tau>0$ be given. For simplicity, we omit $\tau$ and abbreviate $H_\tau(\cdot)$ as $H(\cdot)$.

Algorithm 3.1 (spectral gradient projection method (abbreviated as SGP)).

Given initial point $x_0\in\mathbb{R}^n$ and constants $\theta_0=1$, $r>0$, $\nu\ge 0$, $\sigma>0$, $\gamma\in(0,1)$. Set $k:=0$.

Step 1.

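Compute $d_k$ by
$$d_k=-\theta_kH(x_k), \tag{3.1}$$
where for each $k\ge 1$, $\theta_k$ is defined by
$$\theta_k=\frac{s_{k-1}^Ts_{k-1}}{y_{k-1}^Ts_{k-1}} \tag{3.2}$$
with $s_{k-1}=x_k-x_{k-1}$ and $y_{k-1}=H(x_k)-H(x_{k-1})+r\|H(x_k)\|^{\nu}s_{k-1}$. Stop if $d_k=0$.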

Step 2.

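Determine the steplength $\alpha_k=\gamma^{m_k}$ with $m_k$ being the smallest nonnegative integer $m$ such that
$$-\langle H(x_k+\gamma^md_k),\,d_k\rangle\ge\sigma\gamma^m\|H(x_k+\gamma^md_k)\|\,\|d_k\|. \tag{3.3}$$
Set $z_k:=x_k+\alpha_kd_k$. Stop if $H(z_k)=0$.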

Step 3.

Compute
$$x_{k+1}=x_k-\frac{\langle H(z_k),\,x_k-z_k\rangle}{\|H(z_k)\|^2}H(z_k). \tag{3.4}$$
Set $k:=k+1$, and go to Step 1.
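Steps 1 to 3 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' Matlab implementation. The tolerance `tol` replaces the exact tests $d_k=0$ and $H(z_k)=0$. The caps `max_iter` and `max_backtracks`, and the guard that keeps the previous $\theta$ when $y^Ts\le 0$, are safeguards added for this example. The toy data at the bottom are hypothetical.

```python
import numpy as np

def H_tau(x, A, b, rho, tau):
    """Mapping (2.5), with g = A^T (A x - b)."""
    g = A.T @ (A @ x - b)
    return np.maximum(tau * (g - rho), np.minimum(x, tau * (g + rho)))

def sgp(H, x0, r=0.8, nu=1.0, sigma=1.0, gamma=0.5, tol=1e-8,
        max_iter=500, max_backtracks=60):
    """Sketch of Algorithm 3.1; returns the final point and the projected iterates x_k."""
    x = np.asarray(x0, dtype=float).copy()
    Hx = H(x)
    theta = 1.0                                    # theta_0 = 1
    iterates = [x.copy()]
    for k in range(max_iter):
        if np.linalg.norm(Hx) <= tol:              # tolerance stands in for "d_k = 0"
            break
        d = -theta * Hx                            # Step 1, direction (3.1)
        alpha = 1.0
        for _ in range(max_backtracks):            # Step 2, line search (3.3)
            z = x + alpha * d
            Hz = H(z)
            if -(Hz @ d) >= sigma * alpha * np.linalg.norm(Hz) * np.linalg.norm(d):
                break
            alpha *= gamma
        if np.linalg.norm(Hz) <= tol:              # tolerance stands in for "H(z_k) = 0"
            return z, iterates
        x_new = x - (Hz @ (x - z)) / (Hz @ Hz) * Hz    # Step 3, projection (3.4)
        H_new = H(x_new)
        s = x_new - x
        y = H_new - Hx + r * np.linalg.norm(H_new) ** nu * s
        if y @ s > 0:                              # safeguard, not part of Algorithm 3.1
            theta = (s @ s) / (y @ s)              # spectral parameter (3.2)
        x, Hx = x_new, H_new
        iterates.append(x.copy())
    return x, iterates

# Toy instance (hypothetical data): A = I, whose solution is soft-thresholding of b.
A = np.eye(4)
b = np.array([3.0, -0.2, 0.5, -2.0])
rho, tau = 1.0, 0.5
x_star = np.sign(b) * np.maximum(np.abs(b) - rho, 0.0)

x_final, iterates = sgp(lambda v: H_tau(v, A, b, rho, tau), np.zeros(4))
dists = [np.linalg.norm(v - x_star) for v in iterates]
```

The list `dists` records the distance of each projected iterate to the known solution. Because the mapping is monotone for this choice of $\tau$, the projection step should never increase that distance.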

Remark 3.2.

(i) The idea of the above algorithm comes from [18]. The major difference between Algorithm 3.1 and the method in [18] lies in the definition of $y_{k-1}$. The choice of $y_{k-1}$ in Step 1 follows from the modified BFGS method [26]. The purpose of the term $r\|H(x_k)\|^{\nu}s_{k-1}$ is to make $y_{k-1}$ closer to $H(x_k)-H(x_{k-1})$ as $x_k$ tends to a solution of (2.6).

(ii) Step 3 is called the projection step. It originated in [20]. The advantage of the projection step is that it makes $x_{k+1}$ closer to the solution set of (2.6) than $x_k$. We refer to [20] for details.

(iii) Since $-\langle H(x_k),d_k\rangle=\|H(x_k)\|\,\|d_k\|$, by the continuity of $H$, it is easy to see that inequality (3.3) holds for all $m$ sufficiently large. Therefore Step 2 is well defined and so is Algorithm 3.1.

The following lemma comes from .

Lemma 3.3.

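Let $H:\mathbb{R}^n\to\mathbb{R}^n$ be monotone and let $x,y\in\mathbb{R}^n$ satisfy $\langle H(y),x-y\rangle>0$. Let
$$x^+=x-\frac{\langle H(y),\,x-y\rangle}{\|H(y)\|^2}H(y). \tag{3.5}$$
Then for any $x^*\in\mathbb{R}^n$ satisfying $H(x^*)=0$, it holds that
$$\|x^+-x^*\|^2\le\|x-x^*\|^2-\|x^+-x\|^2. \tag{3.6}$$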
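Inequality (3.6) can be illustrated numerically. The sketch below assumes a simple monotone test mapping $H(v)=M(v-x^*)$ with $M$ symmetric positive semidefinite; this mapping is chosen only for the illustration and is not the $H_\tau$ of (2.5). The points $x$ are built so that $\langle H(y),x-y\rangle>0$ always holds.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
B = rng.standard_normal((n, n))
M = B.T @ B                            # symmetric positive semidefinite
x_star = rng.standard_normal(n)

def H(v):
    """Monotone test mapping with H(x_star) = 0 (an assumption of this sketch)."""
    return M @ (v - x_star)

def project(x, y):
    """Projection step (3.5): project x onto the hyperplane <H(y), v - y> = 0."""
    Hy = H(y)
    return x - (Hy @ (x - y)) / (Hy @ Hy) * Hy

worst_slack = np.inf                   # smallest value of RHS - LHS in (3.6)
for _ in range(200):
    y = rng.standard_normal(n)
    Hy = H(y)
    u = rng.standard_normal(n)
    u *= 0.5 * np.linalg.norm(Hy) / np.linalg.norm(u)
    x = y + Hy + u                     # ensures <H(y), x - y> >= 0.5 * ||H(y)||^2 > 0
    x_plus = project(x, y)
    lhs = np.sum((x_plus - x_star) ** 2)
    rhs = np.sum((x - x_star) ** 2) - np.sum((x_plus - x) ** 2)
    worst_slack = min(worst_slack, rhs - lhs)
```

A nonnegative `worst_slack` means that (3.6) held for every sampled pair.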

The following theorem establishes the global convergence for Algorithm 3.1.

Theorem 3.4.

Let $\{x_k\}$ be generated by Algorithm 3.1 and let $x^*$ be a solution of (2.6). Then one has
$$\|x_{k+1}-x^*\|^2\le\|x_k-x^*\|^2-\|x_{k+1}-x_k\|^2. \tag{3.7}$$
In particular, $\{x_k\}$ is bounded. Furthermore, it holds that either $\{x_k\}$ is finite and the last iterate is a solution of the system of nonsmooth equations (2.6), or the sequence is infinite and $\lim_{k\to\infty}\|x_{k+1}-x_k\|=0$. Moreover, $\{x_k\}$ converges to some solution of (2.6).

Proof.

The proof is similar to that in . We omit it here.

Remark 3.5.

The computational cost of each of SGP's steps is clear. In large-scale problems, most of the work is matrix-vector multiplication involving $A$ and $A^T$. Steps 1 and 2 of SGP each require two matrix-vector multiplications involving $A$ or $A^T$, while each iteration of GPSR-BB involves only two matrix-vector multiplications. This may increase the computational cost. Therefore, we give a modification of SGP. The modified algorithm, which will be called MSGP in the rest of the paper, coincides with SGP except at Step 3, whose description is given below.

Algorithm 3.6 (modified spectral gradient projection method (abbreviated as MSGP)).

Given initial point $x_0\in\mathbb{R}^n$, constants $\theta_0=1$, $r>0$, $\nu\ge 0$, $\sigma>0$, $\gamma\in(0,1)$, and a positive integer $M$. Set $k:=0$.

Step 3.

Let $m=k/M$. If $m$ is a positive integer, compute
$$x_{k+1}=x_k-\frac{\langle H(z_k),\,x_k-z_k\rangle}{\|H(z_k)\|^2}H(z_k); \tag{3.8}$$
otherwise, let $x_{k+1}=z_k$. Set $k:=k+1$, and go to Step 1.
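The modified Step 3 can be sketched as a small helper. The name `msgp_step3` and the sample vectors are hypothetical; Steps 1 and 2 are assumed to have already produced $z_k$ and $H(z_k)$. When the cheap branch is taken, $H(x_{k+1})=H(z_k)$ is already available, which is where the saving in matrix-vector products comes from.

```python
import numpy as np

def msgp_step3(k, M, x, z, Hz):
    """Step 3 of Algorithm 3.6: project only when k/M is a positive integer."""
    if k > 0 and k % M == 0:
        return x - (Hz @ (x - z)) / (Hz @ Hz) * Hz   # projection (3.8)
    return z                                         # cheap update x_{k+1} = z_k

# Hypothetical values of x_k, z_k, and H(z_k), only to exercise both branches.
x = np.array([1.0, 2.0, 0.0])
z = np.array([0.5, 1.0, 0.5])
Hz = np.array([1.0, 1.0, -1.0])

x_cheap = msgp_step3(3, 10, x, z, Hz)    # k/M = 0.3 is not an integer: returns z
x_proj = msgp_step3(10, 10, x, z, Hz)    # k/M = 1: returns the projection of x
```

The projected point lies on the hyperplane $\{v:\langle H(z_k),v-z_k\rangle=0\}$, so `Hz @ (x_proj - z)` is zero up to rounding.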

Lemma 3.7.

Assume that $\{x_k\}$ is a sequence generated by Algorithm 3.6 and $x^*\in\mathbb{R}^n$ satisfies $H(x^*)=0$. Let $\lambda_{\max}(A^TA)$ be the maximum eigenvalue of $A^TA$ and $\tau\in(0,1/\lambda_{\max}(A^TA)]$. Then it holds that
$$\|x_{k+1}-x^*\|\le\|x_k-x^*\|,\quad k=0,1,2,\ldots. \tag{3.9}$$

Proof.

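Let $x_{k+1}$ be generated by (3.8). It follows from Lemma 3.3 that (3.9) holds. In the following, we assume that $x_{k+1}=z_k$. Then, we obtain
$$x_{k+1}-x^*=x_k+\alpha_kd_k-x^*=x_k-\alpha_k\theta_kH(x_k)-x^*+\alpha_k\theta_kH(x^*)=(x_k-x^*)-\alpha_k\theta_k[H(x_k)-H(x^*)]. \tag{3.10}$$
This together with (2.12) implies that
$$\begin{aligned}\|x_{k+1}-x^*\|&=\big\|x_k-x^*-\alpha_k\theta_k\big[T_k(I-S_k)+\tau(I-T_k+T_kS_k)A^TA\big](x_k-x^*)\big\|\\&=\big\|(1-\alpha_k\theta_k)(x_k-x^*)+\alpha_k\theta_k(I-T_k+T_kS_k)(I-\tau A^TA)(x_k-x^*)\big\|\\&\le(1-\alpha_k\theta_k)\|x_k-x^*\|+\alpha_k\theta_k\|I-\tau A^TA\|\,\|x_k-x^*\|.\end{aligned} \tag{3.11}$$
Let $\tau\in(0,1/\lambda_{\max}(A^TA)]$. Then the eigenvalues of $I-\tau A^TA$ lie in $[0,1]$, so $\|I-\tau A^TA\|\le 1$, and we get
$$\|x_{k+1}-x^*\|\le\|x_k-x^*\|. \tag{3.12}$$
This completes the proof.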

Now we establish a global convergence theorem for Algorithm 3.6.

Theorem 3.8.

Let $\lambda_{\max}(A^TA)$ be the maximum eigenvalue of $A^TA$ and $\tau\in(0,1/\lambda_{\max}(A^TA)]$. Assume that $\{x_k\}$ is generated by Algorithm 3.6 and $x^*$ is a solution of (2.6). Then one has
$$\|x_{k+1}-x^*\|\le\|x_k-x^*\|,\quad k=0,1,2,\ldots. \tag{3.13}$$
In particular, $\{x_k\}$ is bounded. Furthermore, it holds that either $\{x_k\}$ is finite and the last iterate is a solution of the system of nonsmooth equations (2.6), or the sequence is infinite and $\lim_{k\to\infty}\|x_{k+1}-x_k\|=0$. Moreover, $\{x_k\}$ converges to some solution of (2.6).

Proof.

We first note that if the algorithm terminates at some iteration $k$, then we have $d_k=0$ or $H(z_k)=0$. By the definition of $\theta_k$, we have $H(x_k)=0$ if $d_k=0$. This shows that either $x_k$ or $z_k$ is a solution of (2.6).

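Suppose that $d_k\ne 0$ and $H(z_k)\ne 0$ for all $k$. Then an infinite sequence $\{x_k\}$ is generated. It follows from (3.3) that
$$\langle H(z_k),\,x_k-z_k\rangle=-\alpha_k\langle H(z_k),\,d_k\rangle\ge\sigma\alpha_k^2\|H(z_k)\|\,\|d_k\|>0. \tag{3.14}$$
Let $x^*$ be an arbitrary solution of (2.6). By Lemmas 3.7 and 3.3, we obtain
$$\|x_{k+1}-x^*\|\le\|x_k-x^*\|,\qquad \|x_{mM+1}-x^*\|^2\le\|x_{mM}-x^*\|^2-\|x_{mM+1}-x_{mM}\|^2, \tag{3.15}$$
where $m$ is a nonnegative integer. In particular, the sequence $\{\|x_k-x^*\|\}$ is nonincreasing and hence convergent. Moreover, the sequence $\{x_k\}$ is bounded, and
$$\lim_{m\to\infty}\|x_{mM+1}-x_{mM}\|=0. \tag{3.16}$$
From (3.8) and (3.14), we have
$$\|x_{mM+1}-x_{mM}\|=\frac{\langle H(z_{mM}),\,x_{mM}-z_{mM}\rangle}{\|H(z_{mM})\|}\ge\sigma\alpha_{mM}^2\|d_{mM}\|. \tag{3.17}$$
This together with (3.16) yields
$$\lim_{m\to\infty}\alpha_{mM}\|d_{mM}\|=0. \tag{3.18}$$
Now we consider the following two possible cases:

(i) $\liminf_{m\to\infty}\|H(x_{mM})\|=0$;

(ii) $\liminf_{m\to\infty}\|H(x_{mM})\|=\epsilon>0$.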

If (i) holds, then by the continuity of $H$ and the boundedness of $\{x_{mM}\}$, it is clear that the sequence $\{x_{mM}\}$ has some accumulation point $x^*$ such that $H(x^*)=0$. Since the sequence $\{\|x_k-x^*\|\}$ converges, it must hold that $\{x_k\}$ converges to $x^*$.

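If (ii) holds, then by the boundedness of $\{x_{mM}\}$ and the continuity of $H$, there exist a positive constant $C$ and a positive integer $m_0$ such that
$$\frac{1}{2}\epsilon\le\|H(x_{mM})\|\le C,\quad \forall m\ge m_0. \tag{3.19}$$
On the other hand, from (3.2) and the definitions of $s_{k-1}$ and $y_{k-1}$, we have
$$\theta_{mM}=\frac{s_{mM-1}^Ts_{mM-1}}{y_{mM-1}^Ts_{mM-1}}=\frac{s_{mM-1}^Ts_{mM-1}}{\langle H(x_{mM})-H(x_{mM-1}),\,x_{mM}-x_{mM-1}\rangle+r\|H(x_{mM})\|^{\nu}s_{mM-1}^Ts_{mM-1}}, \tag{3.20}$$
which together with (3.19) and Propositions 2.2 and 2.3 implies
$$\frac{1}{L+rC^{\nu}}\le\theta_{mM}\le\frac{2^{\nu}}{r\epsilon^{\nu}},\quad \forall m\ge m_0. \tag{3.21}$$
Consequently, we obtain from (3.1), (3.19), and (3.21)
$$\|d_{mM}\|=\theta_{mM}\|H(x_{mM})\|\ge\frac{\epsilon}{2(L+rC^{\nu})},\qquad \|d_{mM}\|\le\frac{2^{\nu}C}{r\epsilon^{\nu}},\quad \forall m\ge m_0. \tag{3.22}$$
Therefore, it follows from (3.18) that $\lim_{m\to\infty}\alpha_{mM}=0$. By the line search rule, for all $m$ sufficiently large, the exponent $m_k-1$ (with $k=mM$) does not satisfy (3.3). This means
$$-\langle H(x_{mM}+\gamma^{m_k-1}d_{mM}),\,d_{mM}\rangle<\sigma\gamma^{m_k-1}\|H(x_{mM}+\gamma^{m_k-1}d_{mM})\|\,\|d_{mM}\|. \tag{3.23}$$
Since $\{x_{mM}\}$ and $\{d_{mM}\}$ are bounded, we can choose subsequences of $\{x_{mM}\}$ and $\{d_{mM}\}$ converging to $x^{**}$ and $d^{**}$, respectively. Taking the limit in (3.23) for the subsequence, we obtain
$$-\langle H(x^{**}),\,d^{**}\rangle\le 0. \tag{3.24}$$
However, it is not difficult to deduce from (3.1), (3.19), and (3.21) that
$$-\langle H(x^{**}),\,d^{**}\rangle>0. \tag{3.25}$$
This yields a contradiction. Consequently, $\liminf_{m\to\infty}\|H(x_{mM})\|=\epsilon>0$ is not possible. The proof is then complete.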

4. Applications to Compressed Sensing and Image Restoration

In this section, we apply the proposed algorithms, that is, SGP and MSGP, to solve some practical problems arising from compressed sensing and image restoration. We compare the proposed algorithms with SGCS, SPARSA, and GPSR-BB. The system of nonsmooth equations in SGCS is
$$F(z)\triangleq\min\{z,\ \tau(Hz+c)\}=0, \tag{4.1}$$
where $z$, $c$, $H$ are defined as in [17]. The test problems are associated with applications in the areas of compressed sensing and image restoration. All experiments were carried out on a Lenovo PC (2.53 GHz, 2.00 GB of RAM) using Matlab 7.8. The parameters in SGCS are specified as follows:
$$\tau=7,\quad \beta=1,\quad \gamma=1.2,\quad \xi=10^{-4},\quad \rho=0.1. \tag{4.2}$$
The parameters in SGP and MSGP are specified as follows:
$$\tau=7,\quad \sigma=1,\quad r=0.8,\quad \gamma=0.5,\quad M=10. \tag{4.3}$$
Throughout the experiments, we choose the initial iterate to be $x_0=0$.

In our first experiment, we consider a typical CS scenario, where the goal is to reconstruct a length-$n$ sparse signal (in the canonical basis) from $m$ observations, where $m<n$. The $m\times n$ matrix $A$ is obtained by first filling it with independent samples of the standard Gaussian distribution and then orthonormalizing the rows. Due to the storage limitations of the PC, we test a small-size signal with $m=1024$, $n=4096$. The observed vector is $b=Ax_{\mathrm{orig}}+\xi$, where $\xi$ is Gaussian white noise with variance $\sigma^2=10^{-4}$ and $x_{\mathrm{orig}}$ is the original signal with 50 randomly placed $\pm 1$ spikes and with zeros in the remaining elements. The regularization parameter is chosen as $\rho=0.05\|A^Tb\|_\infty$. We compare the performance of SGP and MSGP with that of SGCS, SPARSA, and GPSR-BB by solving the problem and choose $\nu=1$ in the SGP and MSGP algorithms. We measure the quality of restoration by means of the mean squared error (MSE) to the original signal $x_{\mathrm{orig}}$ defined by
$$\mathrm{MSE}=\frac{1}{n}\|x-x_{\mathrm{orig}}\|^2, \tag{4.4}$$
where $x$ is the restored signal. To perform this comparison, we first run the SGCS algorithm and stop the algorithm if the following inequality is satisfied:
$$\frac{\|x_{k+1}-x_k\|}{\|x_k\|}<10^{-5}, \tag{4.5}$$
and then run each of the other algorithms until each reaches the same value of the objective function reached by SGCS.
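The test problem and the quality measure (4.4) can be set up along the following lines. This is a sketch under assumptions: the original experiments were run in Matlab, the function names are hypothetical, the rows are orthonormalized here through a QR factorization of the transposed Gaussian matrix (one of several equivalent ways to do it), and the sizes used at the bottom are deliberately much smaller than $m=1024$, $n=4096$.

```python
import numpy as np

def make_cs_problem(m, n, n_spikes, noise_var=1e-4, seed=0):
    """Random CS instance: orthonormal-row A, sparse +/-1 signal, noisy observations."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, n))
    Q, _ = np.linalg.qr(G.T)               # Q is n x m with orthonormal columns
    A = Q.T                                # so the rows of A are orthonormal
    x_orig = np.zeros(n)
    support = rng.choice(n, size=n_spikes, replace=False)
    x_orig[support] = rng.choice([-1.0, 1.0], size=n_spikes)
    b = A @ x_orig + np.sqrt(noise_var) * rng.standard_normal(m)
    rho = 0.05 * np.linalg.norm(A.T @ b, np.inf)   # rho = 0.05 * ||A^T b||_inf
    return A, b, x_orig, rho

def mse(x, x_orig):
    """Mean squared error (4.4)."""
    return np.sum((x - x_orig) ** 2) / x_orig.size

def relative_change(x_new, x_old):
    """Left-hand side of the stopping test (4.5); requires x_old to be nonzero."""
    return np.linalg.norm(x_new - x_old) / np.linalg.norm(x_old)

# Small instance for illustration only (the paper uses m = 1024, n = 4096, 50 spikes).
A, b, x_orig, rho = make_cs_problem(m=32, n=128, n_spikes=5)
baseline_mse = mse(np.zeros(128), x_orig)  # error of the all-zero estimate: 5 / 128
```

Since the initial iterate is $x_0=0$, the ratio in (4.5) is undefined at the first iteration; an implementation would need to skip the test there or guard the denominator.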

The original signal and the estimate obtained by solving (1.1) using the MSGP method are shown in Figure 1. We can see from Figure 1 that MSGP does an excellent job of locating the spikes of the original signal. In Figure 2, we plot the evolution of the objective function versus iteration number and CPU time for these algorithms. It is readily seen that MSGP is faster than the other algorithms.

Figure 1: From top to bottom: original signal, observation, and reconstruction obtained by MSGP.

Figure 2: The objective function plotted against iteration number and CPU time for SGCS, SPARSA, GPSR-BB, SGP, and MSGP.

In the second experiment, we test MSGP on three image restoration problems based on the images House, Cameraman, and Barbara. The House and Cameraman images are of size $256\times 256$ and the Barbara image is of size $512\times 512$. All the pixels are blurred and contaminated by Gaussian noise with a standard deviation of 0.05. The blurring function is chosen to be a two-dimensional Gaussian,
$$h(i,j)=\frac{1}{1+i^2+j^2}, \tag{4.6}$$
truncated such that the function has a support of $9\times 9$. The image restoration problem has the form (1.1), where $\rho=0.0005$ and $A=HW$ is the composition of the $9\times 9$ uniform blur matrix and the Haar discrete wavelet transform (DWT) operator. We compare the performance of MSGP with that of SGCS, SPARSA, and GPSR-BB by solving the problem and choose $\nu=0$ in the MSGP method. As usual, we measure the quality of restoration by the signal-to-noise ratio (SNR) defined as
$$\mathrm{SNR}=10\times\log_{10}\frac{\|x_{\mathrm{orig}}\|^2}{\|x_{\mathrm{orig}}-x\|^2}, \tag{4.7}$$
where $x_{\mathrm{orig}}$ and $x$ are the original and restored images, respectively. We first run SGCS and stop the process if the following inequality is satisfied:
$$\frac{\|x_{k+1}-x_k\|}{\|x_k\|}<10^{-5}, \tag{4.8}$$
and then run the other algorithms until their objective function values reach the value attained by SGCS. Table 1 reports the number of iterations (Iter), the CPU time in seconds (Time), and the SNR to the original images (SNR).
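The two formulas of this experiment are easy to express in NumPy. In the sketch below the function names are hypothetical. The kernel is assumed to be indexed symmetrically, with $i,j\in\{-4,\ldots,4\}$ for the $9\times 9$ support, and it is left unnormalized because the text does not state a normalization.

```python
import numpy as np

def blur_kernel(size=9):
    """Truncated kernel (4.6), h(i, j) = 1 / (1 + i^2 + j^2), on a size x size grid."""
    half = size // 2
    i, j = np.mgrid[-half:half + 1, -half:half + 1]   # assumption: centered indices
    return 1.0 / (1.0 + i ** 2 + j ** 2)

def snr_db(x, x_orig):
    """Signal-to-noise ratio (4.7), in decibels."""
    return 10.0 * np.log10(np.sum(x_orig ** 2) / np.sum((x_orig - x) ** 2))

h = blur_kernel(9)

# Hypothetical two-pixel example: ||x_orig||^2 = 25 and ||x_orig - x||^2 = 1.
x_orig = np.array([3.0, 4.0])
x = np.array([3.0, 3.0])
example_snr = snr_db(x, x_orig)            # equals 10 * log10(25)
```

A restored image that differs from the original by a 10 percent uniform scaling gives an SNR of 20 dB under (4.7), which is a convenient reference point when reading Table 1.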

Table 1: Test results for SGCS, SPARSA, GPSR-BB, SGP, and MSGP in image restoration (Iter: number of iterations; Time: CPU time in seconds; SNR: signal-to-noise ratio (4.7)).

Image     |        SGCS         |       SPARSA        |       GPSR-BB       |         SGP         |        MSGP
          | Iter   Time    SNR  | Iter   Time    SNR  | Iter   Time    SNR  | Iter   Time    SNR  | Iter   Time    SNR
House     |   53   12.15  30.68 |   19    0.85  30.35 |   25    1.28  30.44 |   38    5.27  30.61 |   16    0.78  30.82
Cameraman |   59    9.55  22.42 |   19    0.87  23.64 |   23    1.06  22.67 |   48    4.81  22.50 |   17    0.79  23.36
Barbara   |  150  234.05  22.95 |   29    7.08  23.72 |   41   12.78  22.93 |   62   42.12  23.06 |   26    6.92  23.09

It is easy to see from Table 1 that MSGP is competitive with the well-known algorithms SPARSA and GPSR-BB in computing time and number of iterations, and that it improves on SGCS greatly. Therefore we conclude that MSGP provides a valid approach for solving 1-norm minimization problems arising from image restoration.

Preliminary numerical experiments show that the SGP and MSGP algorithms improve on the SGCS algorithm greatly. This may be because the system of nonsmooth equations solved here has a lower dimension than that in [17] and because the modification that we made to the projection steps reduces the computational cost.

Acknowledgments

The authors would like to thank the anonymous referee for the valuable suggestions and comments. L. Wu was supported by the NNSF of China under Grant 11071087; Z. Sun was supported by the NNSF of China under Grants 11126147 and 11201197.

References

[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006. doi:10.1109/TIT.2006.871582
[2] M. Elad, B. Matalon, and M. Zibulevsky, "Image denoising with shrinkage and redundant representations," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 1924-1931, New York, NY, USA, June 2006. doi:10.1109/CVPR.2006.143
[3] S. Alliney and S. A. Ruzinsky, "An algorithm for the minimization of mixed ℓ1 and ℓ2 norms with application to Bayesian estimation," IEEE Transactions on Signal Processing, vol. 42, no. 3, pp. 618-627, 1994. doi:10.1109/78.277854
[4] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, 2006. doi:10.1002/cpa.20124
[5] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution," Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797-829, 2006. doi:10.1002/cpa.20132
[6] I. Daubechies, M. Defrise, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413-1457, 2004. doi:10.1002/cpa.20042
[7] C. De Mol and M. Defrise, "A note on wavelet-based inversion algorithms," Contemporary Mathematics, vol. 313, pp. 85-96, 2002.
[8] A. Chambolle, R. A. DeVore, N. Y. Lee, and B. J. Lucier, "Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 319-335, 1998. doi:10.1109/83.661182
[9] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009. doi:10.1137/080716542
[10] E. T. Hale, W. Yin, and Y. Zhang, "Fixed-point continuation for ℓ1-minimization: methodology and convergence," SIAM Journal on Optimization, vol. 19, no. 3, pp. 1107-1130, 2008. doi:10.1137/070698920
[11] Z. Wen, W. Yin, D. Goldfarb, and Y. Zhang, "A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation," SIAM Journal on Scientific Computing, vol. 32, no. 4, pp. 1832-1857, 2010. doi:10.1137/090747695
[12] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586-597, 2007. doi:10.1109/JSTSP.2007.910281
[13] J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141-148, 1988. doi:10.1093/imanum/8.1.141
[14] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479-2493, 2009. doi:10.1109/TSP.2009.2016892
[15] S. Yun and K.-C. Toh, "A coordinate gradient descent method for ℓ1-regularized convex minimization," Computational Optimization and Applications, vol. 48, no. 2, pp. 273-307, 2011. doi:10.1007/s10589-009-9251-8
[16] J. Yang and Y. Zhang, "Alternating direction algorithms for ℓ1-problems in compressive sensing," SIAM Journal on Scientific Computing, vol. 33, no. 1, pp. 250-278, 2011. doi:10.1137/090777761
[17] Y. Xiao, Q. Wang, and Q. Hu, "Non-smooth equations based method for ℓ1-norm problems with applications to compressed sensing," Nonlinear Analysis: Theory, Methods & Applications, vol. 74, no. 11, pp. 3570-3577, 2011. doi:10.1016/j.na.2011.02.040
[18] L. Zhang and W. J. Zhou, "Spectral gradient projection method for solving nonlinear monotone equations," Journal of Computational and Applied Mathematics, vol. 196, no. 2, pp. 478-484, 2006. doi:10.1016/j.cam.2005.10.002
[19] Q. N. Li and D. H. Li, "A class of derivative-free methods for large-scale nonlinear monotone equations," IMA Journal of Numerical Analysis, vol. 31, no. 4, pp. 1625-1635, 2011. doi:10.1093/imanum/drq015
[20] M. V. Solodov and B. F. Svaiter, "A globally convergent inexact Newton method for systems of monotone equations," in Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, M. Fukushima and L. Qi, Eds., vol. 22, pp. 355-369, Kluwer Academic Publishers, 1998.
[21] W. J. Zhou and D. H. Li, "Limited memory BFGS method for nonlinear monotone equations," Journal of Computational Mathematics, vol. 25, no. 1, pp. 89-96, 2007.
[22] W. J. Zhou and D. H. Li, "A globally convergent BFGS method for nonlinear monotone equations without any merit functions," Mathematics of Computation, vol. 77, no. 264, pp. 2231-2240, 2008. doi:10.1090/S0025-5718-08-02121-2
[23] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, NY, USA, 1983.
[24] L. Q. Qi and J. Sun, "A nonsmooth version of Newton's method," Mathematical Programming A, vol. 58, no. 3, pp. 353-367, 1993. doi:10.1007/BF01581275
[25] X. Chen and S. Xiang, "Computation of error bounds for P-matrix linear complementarity problems," Mathematical Programming A, vol. 106, no. 3, pp. 513-525, 2006. doi:10.1007/s10107-005-0645-9
[26] D. H. Li and M. Fukushima, "A modified BFGS method and its global convergence in nonconvex minimization," Journal of Computational and Applied Mathematics, vol. 129, no. 1-2, pp. 15-35, 2001. doi:10.1016/S0377-0427(00)00540-9