We propose a computationally simple and efficient method for sparse recovery, termed semi-iterative hard thresholding (SIHT). Unlike existing iterative-shrinkage algorithms, which rely crucially on the negative gradient as the search direction, the proposed algorithm uses a linear combination of the current gradient and the directions of a few previous steps as the search direction. Compared with other iterative-shrinkage algorithms, the proposed method shows a clear improvement in both the number of iterations and the reconstruction error in the noiseless setting, while the computational complexity per iteration does not increase.
1. Introduction
Compressed sensing (CS) [1–3] is a new framework for acquiring sparse signals based on the observation that a small number of linear measurements of the signal contain enough information for its reconstruction. CS relies on the fact that many natural signals are sparse or compressible when expressed in a proper basis or frame. The CS model can be written as a linear sampling operator, given by a matrix Φ, yielding a measurement vector
(1)y=Φx,
where Φ is an M×N matrix, x is an S-sparse vector, and M≪N. Since the linear sampling operator Φ is not injective, the system has infinitely many solutions, and efficient algorithms for finding sparse solutions become very important. This leads to solving the l0-minimization problem
(2) min ∥x∥_0 s.t. y = Φx.
Unfortunately, this minimization problem is NP-hard [2]. As alternatives, approximation algorithms are often considered. Approximation algorithms for finding sparse solutions may be classified into greedy pursuit algorithms, convex relaxation algorithms, Bayesian frameworks, and nonconvex optimization. In this paper, we focus on greedy pursuit and convex relaxation algorithms; more details on Bayesian and nonconvex optimization methods can be found in [4, 5]. Greedy pursuit algorithms include orthogonal matching pursuit (OMP) [6], stagewise OMP (StOMP) [7], regularized OMP (ROMP) [8], compressive sampling matching pursuit (CoSaMP) [9], iterative hard thresholding (IHT) [10], and gradient descent with sparsification (GraDeS) [11]. Convex relaxation algorithms include gradient projection for sparse reconstruction (GPSR) [12] and sparse reconstruction by separable approximation (SpaRSA) [13]; for more details about convex relaxation algorithms, see, for example, [14]. Convex relaxation algorithms succeed with a very small number of measurements, but they tend to be computationally burdensome [15]. An alternative family of numerical algorithms, the iterative-shrinkage algorithms, has gradually been built to address these optimization problems very effectively [15]. Iterative-shrinkage algorithms include IHT [10], GraDeS [11], parallel coordinate descent (PCD) [16], and the fast iterative shrinkage-thresholding algorithm (FISTA) [17]. In these methods, each iteration consists of multiplications by Φ and its transpose, along with a scalar shrinkage step on the obtained x. Among the iterative-shrinkage algorithms, IHT and GraDeS use the negative gradient as the search direction, that is, the Landweber iteration [18]; the main drawback of the Landweber iteration is its slow performance, that is, a large number of iterations is needed to obtain the optimal convergence rate [18].
Inspired by the semi-iterative method [18] and hard thresholding, we present an algorithm for sparse recovery that requires less time and fewer iterations.
2. Background on Compressed Sensing
2.1. Sensing Matrix
Without further information, it is impossible to recover x from y, since y=Φx is highly underdetermined. In order to recover a good estimate of x from M measurements, the measurement matrix Φ must obey the restricted isometry property (RIP) [2],
(3) (1 − δ_S)∥x∥_2^2 ≤ ∥Φx∥_2^2 ≤ (1 + δ_S)∥x∥_2^2,
for all x ∈ Σ_S, where Σ_S = {x ∈ R^N : ∥x∥_0 ≤ S} denotes the set of S-sparse vectors, δ_S ∈ (0,1) is the restricted isometry constant, and S ∈ {1, 2, …, N}; the property holds provided that M ≥ C·S·log(N/S), where C is a constant depending on the instance. It is difficult to verify the RIP conditions for a given matrix. A widely used technique for avoiding a direct check of the RIP is to generate the matrix randomly, for example, a Gaussian matrix, a symmetric Bernoulli matrix, or a partial Fourier matrix [1–3], and to show that the resulting random matrix satisfies the RIP with high probability. In this paper, we use a Gaussian matrix as the measurement matrix.
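Since the RIP cannot be checked directly, a quick numerical sanity check is often instructive. The following sketch (Python/NumPy, our illustration rather than anything from the paper) draws a Gaussian matrix scaled by 1/√M and measures ∥Φx∥²/∥x∥² over random S-sparse vectors; for M well above S·log(N/S), the ratios concentrate near 1, consistent with (3).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, S = 128, 512, 10

# Gaussian matrix scaled by 1/sqrt(M) so that E||Phi @ x||^2 = ||x||^2.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Measure ||Phi x||^2 / ||x||^2 over random S-sparse vectors; by (3) the
# ratios should lie in [1 - delta_S, 1 + delta_S] with high probability.
ratios = []
for _ in range(1000):
    x = np.zeros(N)
    x[rng.choice(N, size=S, replace=False)] = rng.standard_normal(S)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

print(min(ratios), max(ratios))  # both close to 1 when M >> S log(N/S)
```

This only samples a few sparse vectors, so it is evidence, not a proof, that the matrix satisfies the RIP.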
2.2. Sparse Recovery
One approach to sparse signal recovery is based on the idea of iterative greedy pursuit and tries to approximate the solution of (2) directly. In this case, problem (2) is closely related to the following optimization problem:
(4) min (1/2)∥y − Φx∥_2^2 s.t. ∥x∥_0 ≤ S,
where S denotes the sparsity level of the vector x.
The second one is convex relaxation. In this case, the problem (2) is closely related to the following optimization problem:
(5) min ∥x∥_1 s.t. y = Φx.
However, these methods are often inefficient, requiring many iterations and excessive central processing unit time to reach their solutions [15].
An alternative family of numerical algorithms has gradually been built to address the above optimization problems very effectively [15]: the iterative-shrinkage algorithms. We discuss them in the next section.
3. Semi-Iterative Hard Thresholding
The main drawback of the Landweber iteration is its comparatively slow rate of convergence, since only information about the last iterate x[k−1] is used to construct the new approximation x[k]. In order to overcome this drawback, more sophisticated iteration methods have been developed on the basis of the so-called semi-iterative methods (polynomial acceleration methods). A basic step of a semi-iterative method consists of one Landweber step, followed by an averaging over all or some of the previously obtained approximations, and has the form
(6) x[k] = μ_{1,k} x[k−1] + μ_{2,k} x[k−2] + ⋯ + μ_{k,k} x[0] + ω_k Φ^T(y − Φx[k−1]),
where ∑_{i=1}^{k} μ_{i,k} = 1, ω_k ≠ 0, k ≥ 1. An example of semi-iterative methods with an optimal rate of convergence is given by the γ-methods (two-step methods) of [18], which are defined by
(7) x[k] = x[k−1] + μ_k(x[k−1] − x[k−2]) + ω_k Φ^T(y − Φx[k−1]),
where
(8) μ_1 = 0, ω_1 = (4γ + 2)/(4γ + 1),
μ_k = ((k − 1)(2k − 3)(2k + 2γ − 1)) / ((k + 2γ − 1)(2k + 4γ − 1)(2k + 2γ − 3)), k ≥ 2,
ω_k = (4(2k + 2γ − 1)(k + γ − 1)) / ((k + 2γ − 1)(2k + 4γ − 1)), k ≥ 2,
x[−1] = x[0] = x_0.
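The recursion (8) is easy to tabulate. The following Python sketch (an illustration, not the authors' code; the function name is ours) computes μ_k and ω_k in exact rational arithmetic:

```python
from fractions import Fraction

def gamma_coefficients(k, gamma):
    """mu_k and omega_k of the gamma-method, following (8), as exact rationals."""
    if k == 1:
        return Fraction(0), Fraction(4 * gamma + 2, 4 * gamma + 1)
    mu = Fraction((k - 1) * (2 * k - 3) * (2 * k + 2 * gamma - 1),
                  (k + 2 * gamma - 1) * (2 * k + 4 * gamma - 1) * (2 * k + 2 * gamma - 3))
    omega = Fraction(4 * (2 * k + 2 * gamma - 1) * (k + gamma - 1),
                     (k + 2 * gamma - 1) * (2 * k + 4 * gamma - 1))
    return mu, omega

# First few coefficients for gamma = 7, printed as floats.
for k in range(1, 5):
    mu, omega = gamma_coefficients(k, 7)
    print(k, float(mu), float(omega))
```

Note that μ_k grows toward 1 and ω_k toward 4 as k increases, which is what drives the acceleration over the plain Landweber step.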
From (4), the gradient of the cost function f(x) = (1/2)∥y − Φx∥_2^2 is ∇f(x) = Φ^T(Φx − y), and it is easy to compute the step length α that minimizes f(x[k] − α∇f_k). Differentiating φ(α) = f(x[k] − α∇f_k) with respect to α, we obtain
(9) φ′(α) = −∇f(x[k] − α∇f_k)^T ∇f_k.
By setting the derivative to zero, we obtain
(10) α_k = (∇f_k^T ∇f_k) / (∇f_k^T Φ^T Φ ∇f_k).
If we choose the step length by (10), then ∇f(x[k] − α_k∇f_k)^T ∇f_k = 0; that is, the next gradient ∇f_{k+1} = ∇f(x[k] − α_k∇f_k) is orthogonal to the gradient ∇f_k (the previous search direction). In this case, the sequence of iterates zigzags. IHT and GraDeS use the negative gradient of the cost function f(x) = (1/2)∥y − Φx∥_2^2 as the search direction, and the sampling matrix Φ must obey the RIP, that is, ∥Φ∥_2^2 ≈ 1, which means α_k ≈ 1; thus the iterates zigzag toward the solution, and a large number of iterations is needed to obtain the optimal solution.
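The orthogonality behind this zigzag behavior can be checked numerically. This Python sketch (our illustration on arbitrary random data, not from the paper) takes one exact line-search step with the step length (10) and verifies that the new gradient is orthogonal to the previous search direction:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 30, 30
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = rng.standard_normal(M)

x = np.zeros(N)
g = Phi.T @ (Phi @ x - y)                    # gradient of f(x) = 0.5||y - Phi x||^2
alpha = (g @ g) / (g @ (Phi.T @ (Phi @ g)))  # exact line-search step (10)
g_next = Phi.T @ (Phi @ (x - alpha * g) - y) # gradient at the next iterate

# The new gradient is orthogonal to the previous search direction,
# so successive steepest-descent steps turn by 90 degrees: the zigzag.
print(g @ g_next)  # ~0 up to floating-point error
```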
In order to avoid zigzagging toward the solution and to find a sparse solution of (4), inspired by the γ-methods [18] mentioned above, we present the semi-iterative hard thresholding method, which has the form
(11) x[k] = P_S(x[k−1] + μ_k(x[k−1] − x[k−2]) + ω_k Φ^T(y − Φx[k−1])),
where P_S(·) is the nonlinear operator that sets all but the S largest-magnitude elements of a vector to zero. In (11), we use the linear combination of the current negative gradient Φ^T(y − Φx[k−1]) and the search direction of the previous step, x[k−1] − x[k−2], as the new search direction. In this case, the search direction μ_k(x[k−1] − x[k−2]) + ω_k Φ^T(y − Φx[k−1]) does not tend to become orthogonal to the gradient ∇f_k; thus SIHT avoids zigzagging toward the solution. The algorithm is summarized in Algorithm 1.
Algorithm 1: Semi-iterative hard thresholding algorithm.
Step 1. Initialize x[−1] = x[0] = x_0 and set k = 1.
Step 2. Compute μ_k and ω_k by (8).
Step 3. Compute x[k] by (11) and set k = k + 1.
Step 4. Repeat Steps 2–3 until the stopping criterion ∥y − Φx[k]∥_2 ≤ ε∥y∥_2 is satisfied.
As mentioned above, the semi-iterative hard thresholding algorithm is easy to implement. Each iteration involves one application each of Φ and Φ^T as well as two vector additions. The storage requirements are small: apart from y, we only need to store the vectors x[k−1] and x[k−2], each of which has at most S nonzero elements. The choice of the parameter γ is discussed in the next section.
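As a concrete illustration, the update (11) with the coefficients (8) can be sketched in a few lines of Python/NumPy. This is our reading of the algorithm, not the authors' MATLAB implementation; the function names and the defaults (γ = 7, ε = 10⁻¹⁰ on squared norms, as in the experimental section) are our choices based on the text.

```python
import numpy as np

def hard_threshold(x, S):
    """P_S: keep the S largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-S:]
    out[keep] = x[keep]
    return out

def siht(y, Phi, S, gamma=7, eps=1e-10, max_iter=500):
    """Semi-iterative hard thresholding: update (11) with coefficients (8).

    Stops when ||y - Phi x||^2 <= eps * ||y||^2.
    """
    N = Phi.shape[1]
    x_prev = np.zeros(N)          # x[k-2]
    x_curr = np.zeros(N)          # x[k-1]
    yy = y @ y
    for k in range(1, max_iter + 1):
        if k == 1:
            mu, omega = 0.0, (4 * gamma + 2) / (4 * gamma + 1)
        else:
            mu = ((k - 1) * (2 * k - 3) * (2 * k + 2 * gamma - 1)) / (
                 (k + 2 * gamma - 1) * (2 * k + 4 * gamma - 1) * (2 * k + 2 * gamma - 3))
            omega = 4 * (2 * k + 2 * gamma - 1) * (k + gamma - 1) / (
                    (k + 2 * gamma - 1) * (2 * k + 4 * gamma - 1))
        step = x_curr + mu * (x_curr - x_prev) + omega * (Phi.T @ (y - Phi @ x_curr))
        x_prev, x_curr = x_curr, hard_threshold(step, S)
        r = y - Phi @ x_curr
        if r @ r <= eps * yy:
            break
    return x_curr
```

A typical call is `siht(y, Phi, S)` with a sensing matrix whose rows have been orthogonalized (so that ∥Φ∥_2 ≤ 1), mirroring the setup of the experiments.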
4. Experimental Results
This section describes some experiments testifying to the performances of the proposed algorithm. All the experiments were carried out on HP z600 workstation with eight Intel Xeon 2.13 GHz processors and 16 GB of memory, using a MATLAB implementation under Windows XP.
4.1. Choice of the Parameter γ
In our experiments, we consider a typical CS scenario, where the goal is to reconstruct a length-N sparse vector from M measurements. First, the M×N random matrix Φ is created by filling it with independent and identically distributed entries and then orthogonalizing the rows. Second, the original vector x contains S randomly placed ±1 spikes, and the measurement y is generated according to (1). Unless otherwise stated, we terminate the iteration when ∥y − Φx[k]∥_2^2 ≤ ε∥y∥_2^2, with ε = 10^−10.
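The problem instances described above can be generated as follows. This Python sketch is our reconstruction of the setup (Gaussian matrix with orthogonalized rows, randomly placed ±1 spikes); the variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, S = 256, 512, 20

# Gaussian matrix with orthogonalized rows: Phi @ Phi.T = I_M.
Phi = np.linalg.qr(rng.standard_normal((N, M)))[0].T

# Original vector: S randomly placed +/-1 spikes.
x = np.zeros(N)
x[rng.choice(N, size=S, replace=False)] = rng.choice([-1.0, 1.0], size=S)

y = Phi @ x  # measurements, as in (1)
```

Row orthogonalization keeps the largest singular value of Φ at 1, which matches the step-length discussion in Section 3.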
The experiment assesses how the running time of the proposed algorithm varies with the parameter γ. In order to find a good value of γ, we set γ to each value in {2, 3, …, 20} and measured the running time of our method. Figure 1 shows the running time of our algorithm as γ is varied; the label M/N/S stands for M measurements and an S-sparse length-N vector. A careful examination reveals that the running time is minimized at γ = 7 and, for 4 ≤ γ ≤ 20, increases only marginally as γ grows; that is, this choice of γ appears to give good performance for a wide range of problems.
Figure 1: Influence of the parameter γ on running time.
4.2. Comparison in Recovery Rate
In this experiment, we compared the empirical performance of GraDes, IHT, SpaRSA, FISTA, and SIHT on sparse recovery. We generated a Gaussian N(0,1) random matrix Φ ∈ R^256×512 and an S-sparse ±1 spikes vector. The reconstruction is considered exact when the 2-norm of the difference between the reconstruction and the original vector is below 10^−2. We repeated the experiment 100 times for each value of S from 2 to 128 (in steps of 2). Figure 2 shows that the SIHT algorithm provides a higher probability of perfect recovery than GraDes, IHT, SpaRSA, and FISTA when the sparse vectors are drawn as ±1 spikes. Furthermore, in the perfect recovery regime, we observe that the GraDes and SpaRSA algorithms perform similarly. The figure also reveals that, for a given N and S, SIHT requires fewer measurements than IHT, GraDes, SpaRSA, and FISTA to recover the sparse vector.
Figure 2: Simulation of the exact recovery rate.
4.3. Comparison in Running Time
In order to evaluate the running time of the proposed algorithm, these experiments include comparisons with OMP, StOMP, ROMP, IHT, and GraDeS. The sampling matrix Φ, the measurement vector y, and the sparsity level S are given to each algorithm as inputs. For the proposed algorithm, we set γ = 7; the performance is insensitive to this choice. Table 1 compares the running times of the MATLAB implementations of SIHT and the five existing methods. The symbol “∞” indicates that the algorithm fails.
Table 1: The running time of the different algorithms (in seconds).

M (rows)  N (columns)  S (sparsity)  OMP    StOMP  ROMP  IHT   GraDes  SIHT (γ = 7)
3000      8000         500           17.14  34.47  7.93  2.81  5.22    2.06
3000      10000        300           8.83   ∞      2.77  3.52  5.69    2.72
3000      10000        600           19.39  ∞      ∞     9.33  ∞       2.86
3000      10000        1050          ∞      ∞      ∞     ∞     ∞       ∞
4000      10000        500           25.33  ∞      9.35  3.11  4.94    2.93
Table 1 shows that the iterative-shrinkage algorithms are significantly faster than the matching pursuit algorithms. For simplicity, we compare the performance of SIHT with IHT and GraDes in the next experiments.
4.4. Comparison in Sparsity
In this experiment, we show the dependence of the 2-norm errors of the different algorithms on the sparsity level S. Figure 3 compares the 2-norm errors of SIHT with those of IHT, GraDes, SpaRSA, and FISTA at different sparsity levels. We generated a Gaussian N(0,1) random matrix Φ ∈ R^512×1024 and an S-sparse ±1 spikes or Gaussian vector. We repeated the experiment 100 times for each value of S from 2 to 120 (in steps of 10). Both GraDes and SpaRSA begin to fail when the sparsity level is above 120; the failed results are therefore omitted from the figure.
Figure 3: Recovery error versus sparsity with fixed M = 512 and N = 1024: (a) Gaussian sparse vectors and (b) sparse ±1 spikes vectors.
Figure 3 shows that the GraDes, IHT, and SIHT algorithms perform similarly for Gaussian sparse vectors, while the GraDes algorithm fails to recover sparse ±1 spikes vectors when the sparsity level S is above 70; that is, GraDes requires more measurements to recover such sparse vectors. The figure also reveals that the FISTA, IHT, and SIHT algorithms are insensitive to the sparsity level S, whilst the GraDes and SpaRSA algorithms are sensitive to it. In addition, the SIHT algorithm outperforms the other algorithms in 2-norm error for both sparse ±1 spikes and Gaussian vectors.
4.5. Comparison in Number of Iterations
In this experiment, we show the number of iterations required by the SIHT algorithm in comparison with four algorithms, namely, IHT, GraDes, SpaRSA, and FISTA, for sparse ±1 spikes or Gaussian vectors. We generated a Gaussian N(0,1) random matrix Φ ∈ R^512×1024 and an S-sparse ±1 spikes or Gaussian vector. Figures 4 and 5 show the number of iterations needed by the above algorithms for M = 512, N = 1024, and S = 20.
Figure 4: Recovery error versus number of iterations with fixed M = 512, N = 1024, and S = 20 for Gaussian sparse vectors: (a) iterations 2 to 120 and (b) iterations 2 to 20.
Figure 5: Recovery error versus number of iterations with fixed M = 512, N = 1024, and S = 20 for sparse ±1 spikes vectors: (a) iterations 1 to 120 and (b) iterations 1 to 20.
Figures 4 and 5 show that the IHT and GraDes algorithms converge faster when the number of iterations is below 4. However, beyond 6 iterations, owing to polynomial acceleration, the FISTA and SIHT algorithms converge faster than the others. In addition, from Figures 4 and 5, beyond about 7 iterations the FISTA and IHT algorithms behave roughly similarly. Because the SIHT algorithm uses the linear combination of the current gradient and the directions of a few previous steps as the new search direction, it converges faster than the others, while the GraDes algorithm exhibits the poorest rate of convergence.
From Figures 4 and 5, the 2-norm errors of all algorithms except SpaRSA decrease strictly monotonically as the number of iterations increases.
As expected, these results suggest that SIHT outperforms the other iterative-shrinkage algorithms in both the number of iterations and the 2-norm error.
5. Conclusions
In this paper, a semi-iterative hard thresholding algorithm for sparse recovery was proposed. The algorithm uses the linear combination of the current gradient and the directions of a few previous steps as the new search direction and thereby avoids zigzagging toward the solution. Owing to this new search direction, the performance of SIHT is improved compared with existing iterative-shrinkage algorithms.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant no. 61271294). The authors would like to thank Arvind Ganesh, Allen Y. Yang, and Zihan Zhou for sharing their software packages (L1benchmark) with us.
References
[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, 2008.
[3] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: universal encoding strategies?" IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
[4] Z. Zhang and B. D. Rao, "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 912–926, 2011.
[5] R. Chartrand, "Exact reconstruction of sparse signals via nonconvex minimization," IEEE Signal Processing Letters, vol. 14, no. 10, pp. 707–710, 2007.
[6] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[7] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit," Technical Report 2006-2, 2006.
[8] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Foundations of Computational Mathematics, vol. 9, no. 3, pp. 317–334, 2009.
[9] D. Needell and J. A. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
[10] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp. 265–274, 2009.
[11] R. Garg and R. Khandekar, "Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property," in Proceedings of the 26th International Conference on Machine Learning, pp. 337–344, 2009.
[12] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–597, 2007.
[13] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479–2493, 2009.
[14] A. Y. Yang, Z. Zhou, A. Ganesh, S. S. Sastry, and Y. Ma, "Fast l1-minimization algorithms for robust face recognition," in press, http://arxiv.org/abs/1007.3753.
[15] M. Zibulevsky and M. Elad, "L1-L2 optimization in signal and image processing," IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 76–88, 2010.
[16] M. Elad, B. Matalon, and M. Zibulevsky, "Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization," Applied and Computational Harmonic Analysis, vol. 23, no. 3, pp. 346–367, 2007.
[17] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
[18] M. Hanke, "Accelerated Landweber iterations for the solution of ill-posed equations," Numerische Mathematik, vol. 60, no. 3, pp. 341–373, 1991.