Sparse Recovery by Semi-Iterative Hard Thresholding Algorithm

We propose a computationally simple and efficient method for sparse recovery, termed the semi-iterative hard thresholding (SIHT) algorithm. Unlike existing iterative-shrinkage algorithms, which rely crucially on using the negative gradient as the search direction, the proposed algorithm uses a linear combination of the current gradient and the directions of a few previous steps as the search direction. Compared to other iterative-shrinkage algorithms, the proposed method shows a clear improvement in the number of iterations and in reconstruction error in the noiseless case, while its computational complexity does not increase.


Introduction
Compressed sensing (CS) [1][2][3] is a new framework for acquiring sparse signals based on the revelation that a small number of linear measurements of a signal contain enough information for its reconstruction. CS relies on the fact that many natural signals are sparse or compressible when expressed in a proper basis or frame. The CS model applies a linear sampling operator, represented by a matrix Φ, to yield a measurement vector

y = Φx, (1)

where Φ is an m × n matrix, x is a k-sparse vector of length n, and m ≪ n. Since the linear sampling operator Φ is not a bijection, the system has infinitely many solutions, and efficient algorithms to find sparse solutions become very important. This leads to solving the ℓ0-minimization problem

min ‖x‖₀ s.t. y = Φx. (2)
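As a concrete illustration of the measurement model (1), the sampling process can be sketched in a few lines of numpy; the problem sizes below are illustrative choices of ours, not values from the paper.

```python
import numpy as np

# Sketch of the CS measurement model y = Phi x (sizes are illustrative).
rng = np.random.default_rng(0)
m, n, k = 64, 256, 8                     # m measurements, length n, sparsity k

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix

x = np.zeros(n)                          # k-sparse signal with +-1 spikes
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k)

y = Phi @ x                              # m linear measurements, m << n
```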
Unfortunately, this minimization problem is NP-hard [2]. As alternatives, approximation algorithms are often considered. Approximation algorithms for finding sparse solutions may be classified into greedy pursuit algorithms, convex relaxation algorithms, Bayesian frameworks, and nonconvex optimization. In this paper, we focus on greedy pursuit algorithms and convex relaxation algorithms; details of Bayesian frameworks and nonconvex optimization methods can be found in [4, 5]. Greedy pursuit algorithms include orthogonal matching pursuit (OMP) [6], stagewise OMP (StOMP) [7], regularized OMP (ROMP) [8], compressive sampling matching pursuit (CoSaMP) [9], iterative hard thresholding (IHT) [10], and gradient descent with sparsification (GraDeS) [11]. Convex relaxation algorithms include gradient projection for sparse reconstruction (GPSR) [12] and sparse reconstruction by separable approximation (SpaRSA) [13]; for more details about convex relaxation algorithms, see, for example, [14]. Convex relaxation algorithms succeed with a very small number of measurements, but they tend to be computationally burdensome [15]. An alternative family of numerical algorithms, the iterative-shrinkage algorithms, has gradually been built up to address these optimization problems very effectively [15]. Iterative-shrinkage algorithms include iterative hard thresholding (IHT) [10], gradient descent with sparsification (GraDeS) [11], parallel coordinate descent (PCD) [16], and the fast iterative-shrinkage thresholding algorithm (FISTA) [17]. In these methods, each iteration consists of a multiplication by Φ and its transpose, along with a scalar shrinkage step on the obtained x. Among the iterative-shrinkage algorithms, IHT and GraDeS use the negative gradient as the search direction, that is, the Landweber iteration [18]. The main drawback of the Landweber iteration is its slow performance: a large number of iterations is needed to obtain the optimal solution [18]. Inspired by the semi-iterative methods [18] and hard thresholding, we present an algorithm for solving the sparse recovery problem that requires less time and fewer iterations.

Background on Compressed Sensing
2.1. Sensing Matrix. Without further information, it is impossible to recover x from y, since y = Φx is highly underdetermined. In order to recover a good estimate of x from m measurements, the measurement matrix Φ must obey the restricted isometry property (RIP) [2]:

(1 − δ_k)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ_k)‖x‖₂² (3)

for all x ∈ Σ_k, where Σ_k = {x ∈ ℝⁿ : ‖x‖₀ ≤ k} denotes the set of k-sparse vectors and δ_k ∈ (0, 1) is the restricted isometry constant, provided that m ≥ C · k · log(n/k), where C is a constant depending on each instance. It is difficult to verify the RIP conditions for a given matrix. A widely used technique for avoiding checking the RIP directly is to generate the matrix randomly, for example as a Gaussian matrix, a symmetric Bernoulli matrix, or a partial Fourier matrix [1][2][3], and to show that the resulting random matrix satisfies the RIP with high probability. In this paper, we use a Gaussian matrix as the measurement matrix.
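The RIP itself cannot be verified efficiently, but the near-isometry it asserts is easy to probe empirically for a random Gaussian matrix. The following numpy sketch (sizes are our own illustrative choices) draws random k-sparse test vectors and checks that ‖Φx‖₂²/‖x‖₂² stays concentrated around 1.

```python
import numpy as np

# Empirical probe of the near-isometry asserted by the RIP for a Gaussian
# matrix with N(0, 1/m) entries; sizes are our own illustrative choices.
rng = np.random.default_rng(1)
m, n, k = 128, 512, 10
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

ratios = []
for _ in range(200):
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

# ||Phi x||^2 / ||x||^2 stays close to 1 over all random k-sparse test vectors
print(min(ratios), max(ratios))
```

This is only a sample-based check on a few hundred sparse directions, not a certificate that the RIP holds, which is precisely why the random-matrix arguments cited above are used instead.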

2.2. Sparse Recovery. The first approach to sparse signal recovery is based on the idea of iterative greedy pursuit, which tries to approximate the solution to (2) directly. In this case, the problem (2) is closely related to the following optimization problem:

min (1/2)‖y − Φx‖₂² s.t. ‖x‖₀ ≤ k, (4)

where k denotes the sparsity level of the vector x.
The second approach is convex relaxation. In this case, the problem (2) is closely related to the following optimization problem:

min ‖x‖₁ s.t. y = Φx. (5)

However, these methods are often inefficient, requiring many iterations and excessive central processing unit time to reach their solutions [15]. An alternative family of numerical algorithms, the iterative-shrinkage algorithms, has gradually been built up to address the above optimization problems very effectively [15]. We discuss iterative-shrinkage algorithms in the next section.
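To see why the relaxation in (5) uses the ℓ1 norm rather than the more familiar ℓ2 norm, note that the minimum-ℓ2-norm solution of y = Φx, given by the pseudoinverse, satisfies the constraint but is dense rather than sparse. A small numpy illustration (sizes are our own):

```python
import numpy as np

# Why the l1 norm in (5): the minimum-l2-norm solution pinv(Phi) @ y satisfies
# y = Phi x exactly, yet is dense rather than sparse (sizes are our own).
rng = np.random.default_rng(4)
m, n, k = 25, 50, 3
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = Phi @ x_true

x_l2 = np.linalg.pinv(Phi) @ y                 # minimum-l2-norm solution
print(np.allclose(Phi @ x_l2, y))              # feasible for the constraint
print(np.count_nonzero(np.abs(x_l2) > 1e-8))   # dense: many more than k nonzeros
```

The ℓ1 objective in (5), by contrast, is the tightest convex surrogate of ‖x‖₀ and tends to return sparse minimizers, which is what makes the relaxation useful.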

Semi-Iterative Hard Thresholding
The main drawback of the Landweber iteration is its comparatively slow rate of convergence, since only information about the last iterate x[n−1] is used to construct the new approximation x[n]. In order to overcome this drawback, more sophisticated iteration methods have been developed on the basis of the so-called semi-iterative methods. A basic step of a semi-iterative method (a polynomial acceleration method) consists of one iteration step followed by an averaging process over all or some of the previously obtained approximations. A basic two-step semi-iterative method has the form

x[n] = x[n−1] + μ_n (x[n−1] − x[n−2]) + ω_n Φ^T (y − Φx[n−1]),

where μ_n and ω_n are acceleration coefficients. An example of semi-iterative methods with optimal rate of convergence are the two-step ν-methods of [18], in which the coefficients μ_n and ω_n are given explicitly in terms of a parameter.

From (4), the gradient of the cost function f(x) = (1/2)‖y − Φx‖₂² is given by ∇f(x) = Φ^T(Φx − y), and it is easy to compute the step length α_n that minimizes f(x[n] − α∇f_n). Differentiating the function φ(α) = f(x[n] − α∇f_n) with respect to α, we obtain

φ′(α) = −∇f(x[n] − α∇f_n)^T ∇f_n.

Setting the derivative to zero, we obtain

α_n = ‖∇f_n‖₂² / ‖Φ∇f_n‖₂². (10)

If we choose the step lengths by (10), then ∇f(x[n] − α_n∇f_n)^T ∇f_n = 0; that is, the next search direction ∇f_{n+1} = ∇f(x[n] − α_n∇f_n) is orthogonal to the previous search direction ∇f_n. In this case, the sequence of iterates is subject to zigzagging. IHT and GraDeS use the negative gradient of the cost function f(x) = (1/2)‖y − Φx‖₂² as the search direction, and the sampling matrix Φ must obey the RIP, that is, ‖Φ‖₂² ≈ 1, which implies α_n ≈ 1; thus the iterates zigzag toward the solution. As a result, a large number of iterations is needed to obtain the optimal solution.
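The closed-form step length (10) and the resulting orthogonality of consecutive search directions can be checked numerically. The following sketch (data and sizes are our own) verifies both facts for the cost f(x) = (1/2)‖y − Φx‖₂².

```python
import numpy as np

# Numerical check of the step length (10) for f(x) = 0.5*||y - Phi x||_2^2
# (data are random; sizes are our own).
rng = np.random.default_rng(2)
m, n = 32, 64
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = rng.standard_normal(m)
x = rng.standard_normal(n)

g = Phi.T @ (Phi @ x - y)                         # gradient of f at x
alpha = (g @ g) / np.linalg.norm(Phi @ g) ** 2    # step length from (10)

f = lambda a: 0.5 * np.linalg.norm(y - Phi @ (x - a * g)) ** 2
assert f(alpha) <= f(0.9 * alpha) and f(alpha) <= f(1.1 * alpha)  # minimizer

g_next = Phi.T @ (Phi @ (x - alpha * g) - y)      # gradient at the new iterate
print(abs(g_next @ g))                            # ~0: orthogonal directions
```

Because f is quadratic along the search ray, the one-dimensional minimizer is exact, and the new gradient is orthogonal to the old one up to floating-point error, which is precisely the zigzagging mechanism described above.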
In order to avoid zigzagging toward the solution and to find the sparse solution of (4), inspired by the ν-methods [18] mentioned above, we present the semi-iterative hard thresholding method, which has the form

x[n] = H_k( x[n−1] + μ_n (x[n−1] − x[n−2]) + ω_n Φ^T (y − Φx[n−1]) ), (11)

where H_k(⋅) is the nonlinear operator that sets all but the k largest (in magnitude) elements of a vector to zero. In (11), we use the linear combination of the current negative gradient Φ^T(y − Φx[n−1]) and the search direction of the previous step (x[n−1] − x[n−2]) as the new search direction. In this case, the search direction μ_n(x[n−1] − x[n−2]) + ω_n Φ^T(y − Φx[n−1]) does not tend to become orthogonal to the gradient ∇f_n; thus SIHT avoids zigzagging toward the solution. The algorithm is summarized in Algorithm 1.
As mentioned above, the semi-iterative hard thresholding algorithm is easy to implement. Each iteration involves one application of Φ and one of Φ^T, as well as two vector additions. The storage requirements are small: apart from the storage of y, we only require the storage of the vectors x[n−1] and x[n−2], which amount to 2n elements. The choice of the parameter γ will be discussed in the next section.
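A minimal runnable sketch of the recursion (11) is given below. For simplicity it uses constant acceleration coefficients μ and ω, which is our own simplification: the actual SIHT takes iteration-dependent coefficients μ_n and ω_n from the two-step semi-iterative methods of [18], controlled by the parameter γ. Problem sizes are likewise illustrative, not the paper's.

```python
import numpy as np

def hard_threshold(v, k):
    """H_k: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -k)[-k:]
    out[keep] = v[keep]
    return out

def siht(Phi, y, k, mu=0.5, omega=1.0, max_iter=200, tol=1e-10):
    """Two-step hard-thresholding recursion (11) with constant coefficients
    mu, omega (a simplification; the paper's mu_n, omega_n follow the
    semi-iterative methods of [18])."""
    x_prev = np.zeros(Phi.shape[1])
    x = x_prev.copy()
    y_norm2 = y @ y
    for _ in range(max_iter):
        step = mu * (x - x_prev) + omega * (Phi.T @ (y - Phi @ x))
        x, x_prev = hard_threshold(x + step, k), x
        r = y - Phi @ x
        if r @ r <= tol * y_norm2:          # stopping rule used in the paper
            break
    return x

# Usage on a synthetic instance: Phi has orthonormalized rows, x has +-1 spikes.
rng = np.random.default_rng(3)
m, n, k = 128, 256, 8
Phi = np.linalg.qr(rng.standard_normal((n, m)))[0].T   # m x n, orthonormal rows
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = Phi @ x_true

x_hat = siht(Phi, y, k)
print(np.linalg.norm(x_hat - x_true))    # small reconstruction error
```

On this easy instance the two-step recursion recovers the ±1 spikes to within the stopping tolerance; choosing μ_n and ω_n as in [18] is what yields the acceleration reported in the paper.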

Experimental Results
This section describes some experiments demonstrating the performance of the proposed algorithm. All the experiments were carried out on an HP z600 workstation with eight Intel Xeon 2.13 GHz processors and 16 GB of memory, using a MATLAB implementation under Windows XP.

Choice of the Parameter 𝛾.
In our experiments, we consider a typical CS scenario, where the goal is to reconstruct a length-n sparse vector from m measurements. First, the m × n random matrix Φ is created by filling it with independent and identically distributed Gaussian entries and then orthogonalizing the rows. Second, the original vector x contains k randomly placed ±1 spikes, and the measurement y is generated according to (1). Unless otherwise stated, we terminate the iteration when ‖y − Φx[n]‖₂² ≤ ε‖y‖₂², with ε = 10⁻¹⁰.
This experiment assesses how the running time of the proposed algorithm varies with the parameter γ. In order to find a good value of γ, we set γ to each value in {2, 3, . . ., 19, 20} and measured the running time of our method. Figure 1 shows the running time of our algorithm as γ is varied. The label m/n/k stands for m measurements and a k-sparse length-n vector in our experiments. A careful examination reveals that, as γ is increased, the running time of our method is minimized at γ = 7. For 4 ≤ γ ≤ 20, the running time varies only marginally as γ is increased; that is, the choice of the parameter γ appears to give good performance for a wide range of problems.

Comparison in Recovery Rate.
In this experiment, we compare the empirical performance of GraDeS, IHT, SpaRSA, FISTA, and SIHT for sparse recovery. We generated a Gaussian N(0, 1) random matrix Φ ∈ ℝ^{256×512} and a k-sparse ±1 spikes vector. The reconstruction is considered exact when the ℓ2 norm of the difference between the reconstruction and the original vector is below 10⁻². We repeated the experiment 100 times for each value of k from 2 to 128 (in steps of 2). Figure 2 shows that the SIHT algorithm provides a higher probability of perfect recovery than GraDeS, IHT, SpaRSA, and FISTA when the sparse vectors are drawn as ±1 spikes. Furthermore, in the perfect recovery case, we observe that the GraDeS and SpaRSA algorithms perform similarly. This reveals that SIHT requires fewer measurements than IHT, GraDeS, SpaRSA, and FISTA to recover a sparse vector for given m and n.

Comparison in Running Time.
In order to evaluate the running time of the proposed algorithm, these experiments include comparisons with OMP, StOMP, ROMP, IHT, and GraDeS. The sampling matrix Φ, the measurement vector y, and the sparsity level k are given to each of the algorithms as inputs. For the proposed algorithm, we set γ = 7; the performance is insensitive to this choice. Table 1 compares the running time of the MATLAB implementations of SIHT and the five existing methods. The symbol "∞" indicates that the algorithm fails.
Table 1 shows that the iterative-shrinkage algorithms are significantly faster than the matching pursuit algorithms. For simplicity, we will compare the performance of SIHT with IHT and GraDeS in the next experiments.

Comparison in Sparsity.
In this experiment, we show the dependence of the ℓ2-norm errors of the different algorithms on the sparsity level k. In Figure 3, we show the ℓ2-norm errors of SIHT in comparison with IHT, GraDeS, SpaRSA, and FISTA at different sparsity levels. We generated a Gaussian N(0, 1) random matrix Φ ∈ ℝ^{512×1024} and k-sparse ±1 spikes or Gaussian vectors. We repeated the experiment 100 times for each value of k from 2 to 120 (in steps of 10). Both GraDeS and SpaRSA begin to fail when the sparsity level is above 120; thus the failed results are omitted from the figure.
Figure 3 shows that the GraDeS, IHT, and SIHT algorithms perform similarly for Gaussian sparse vectors, and that the GraDeS algorithm begins to fail to recover sparse ±1 spikes vectors when the sparsity level k is above 70; that is, the GraDeS algorithm requires more measurements to recover such sparse vectors. The figure reveals that the FISTA, IHT, and SIHT algorithms are insensitive to the sparsity level k, while the GraDeS and SpaRSA algorithms are sensitive to it. In addition, the SIHT algorithm outperforms the other algorithms in ℓ2-norm error for both sparse ±1 spikes and Gaussian vectors.

Comparison in Number of Iterations.
In this experiment, we show the number of iterations required by the SIHT algorithm in comparison with four algorithms, namely IHT, GraDeS, SpaRSA, and FISTA, for sparse ±1 spikes or Gaussian vectors. We generated a Gaussian N(0, 1) random matrix Φ ∈ ℝ^{512×1024} and k-sparse ±1 spikes or Gaussian vectors.
Algorithm 1: Semi-iterative hard thresholding algorithm.

Figures 4 and 5 show that the IHT and GraDeS algorithms exhibit a faster rate of convergence when the number of iterations is less than 4. However, when the number of iterations is above 6, owing to polynomial acceleration, the FISTA and SIHT algorithms show a faster rate of convergence than the others. In addition, from Figures 4 and 5, beyond 7 iterations the FISTA and IHT algorithms are roughly similar in terms of the number of iterations. Because the SIHT algorithm uses a linear combination of the current gradient and the directions of a few previous steps as the new search direction, it shows a faster rate of convergence than the others, while the GraDeS algorithm exhibits the poorest rate of convergence.
From Figures 4 and 5, the ℓ2-norm errors of all the algorithms except SpaRSA decrease monotonically with the number of iterations; that is, the ℓ2-norm errors are strictly reduced as the iteration count increases.
As expected, these results suggest that SIHT outperforms the other iterative-shrinkage algorithms in the number of iterations and in ℓ2-norm error.

Conclusions
In this paper, a semi-iterative hard thresholding algorithm for sparse recovery was proposed. The proposed algorithm uses a linear combination of the current gradient and the directions of a few previous steps as the new search direction and thereby avoids zigzagging toward the solution. Owing to this new search direction, the performance of SIHT is improved compared with other iterative-shrinkage algorithms.

Acknowledgments
The authors would like to thank Arvind Ganesh, Allen Y. Yang, and Zihan Zhou for sharing their software packages (L1benchmark) with us.

Figure 1: Influence of the parameter γ on running time.

Figures 4 and 5: Number of iterations needed by the algorithms mentioned above for m = 512, n = 1024, and k = 20.

Figure 2: Simulation of the exact recovery rate.

Table 1: The running time of the different algorithms.