Nonlinear compressive sensing (NCS) is an extension of classical compressive sensing (CS), and the iterative hard thresholding (IHT) algorithm is a popular greedy-type method for solving CS. Normalized iterative hard thresholding (NIHT) is a modification of IHT and is more effective than IHT. In this paper, we propose an approximately normalized iterative hard thresholding (ANIHT) algorithm for NCS, which combines an approximately optimal stepsize with an Armijo stepsize rule at each iteration. Under a condition similar to the restricted isometry property (RIP), we analyze conditions under which the iterative support sets are identified in a finite number of iterations. Numerical experiments show the good performance of the new algorithm for NCS.
1. Introduction
Compressed sensing (CS) [1, 2] deals with the problem of recovering sparse signals from underdetermined linear measurements. In recent years, it has attracted considerable attention in signal processing, electrical engineering, computer science, and applied mathematics; see [3, 4]. However, many real-life applications in physics and the biomedical sciences carry a strongly nonlinear structure, so the linear model is no longer suitable, even as an approximation. It is therefore of great interest to investigate compressive sensing with nonlinear measurements, which is called nonlinear compressive sensing (NCS) [5, 6]. We thus consider the following NCS problem: find a vector $x \in \mathbb{R}^n$ from observations $b \in \mathbb{R}^m$ given by
(1) $\Phi(x) + \eta = b$, s.t. $\|x\|_0 \le s$,
where $\Phi(x) = (\phi_1(x), \ldots, \phi_m(x))^\top : \mathbb{R}^n \to \mathbb{R}^m$ is nonlinear, $\eta$ is a noise term, $s < m < n$, and $\|x\|_0$ is the $\ell_0$-norm of $x$, that is, the number of nonzero elements of $x$. The optimization problem associated with (1) is
(2) $\min f(x) \triangleq \tfrac{1}{2}\|\Phi(x) - b\|^2$, s.t. $\|x\|_0 \le s$,
where $\|\cdot\|$ is the $\ell_2$-norm and $r(x) \triangleq \Phi(x) - b$ is the residual function. Let $\Phi(x)$ be continuously differentiable and let $J(x) \in \mathbb{R}^{m \times n}$ denote its Jacobian matrix. Then $\nabla f(x) = J(x)^\top r(x)$. Clearly, if $\Phi(x)$ is a linear function, model (1) and the optimization problem (2) reduce to classical CS,
(3) $Ax + e = b$, s.t. $\|x\|_0 \le s$,
and the problem
(4) $\min \tfrac{1}{2}\|Ax - b\|^2$, s.t. $\|x\|_0 \le s$,
respectively, where $A \in \mathbb{R}^{m \times n}$.
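As a concreteness check of the notation $\nabla f(x) = J(x)^\top r(x)$, the following NumPy sketch builds a toy nonlinear map $\phi_i(x) = \langle p_i, x\rangle^2$ (a hypothetical choice used only for illustration; all names are ours, not from the paper) and verifies the gradient formula against finite differences.

```python
import numpy as np

# Toy sketch of the NCS objective (2): f(x) = 0.5*||Phi(x) - b||^2 with
# gradient J(x)^T r(x), for an illustrative quadratic map phi_i(x) = <p_i, x>^2.
rng = np.random.default_rng(0)
m, n = 5, 8
P = rng.standard_normal((m, n))   # rows p_i define phi_i(x) = <p_i, x>^2
b = rng.standard_normal(m)

def Phi(x):
    return (P @ x) ** 2

def jacobian(x):
    # d phi_i / dx = 2 <p_i, x> p_i, stacked row-wise
    return 2 * (P @ x)[:, None] * P

def f(x):
    r = Phi(x) - b
    return 0.5 * r @ r

def grad_f(x):
    return jacobian(x).T @ (Phi(x) - b)

# Check grad_f against a central finite-difference approximation.
x = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(f(x + eps * np.eye(n)[i]) - f(x - eps * np.eye(n)[i])) / (2 * eps)
               for i in range(n)])
assert np.allclose(grad_f(x), fd, rtol=1e-5, atol=1e-4)
```

The same pattern (a residual callable plus its Jacobian) is all the algorithm below needs from a concrete NCS instance.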
Greedy methods have already proven useful and efficient for tackling (4) [7]. A variety of greedy methods have been proposed to solve (4), such as matching pursuit (MP) [8], orthogonal MP (OMP) [9], compressive sampling matching pursuit (CoSaMP) [10], subspace pursuit (SP) [11], hard thresholding pursuit (HTP) [12], and conjugate gradient iterative hard thresholding (CGIHT) [13].
Another greedy method for problem (4) is the iterative hard thresholding (IHT) algorithm, proposed by Blumensath and Davies in [14, 15]. When the matrix $A$ has full row rank and its spectral norm satisfies $\|A\|_2 < 1$, IHT converges to a local minimum of (4) [14]. In [16], the authors showed that the numerical behavior of IHT is not very promising and that the algorithm often fails to converge when these conditions do not hold. They then proposed normalized IHT (NIHT), with an adaptive stepsize and line search, and proved that it converges to a local minimum if $A$ has full row rank and is $s$-regular, where $s$-regular means that any $s$ columns of $A$ are linearly independent [17]. Cartis and Thompson [18] showed that NIHT converges to a local minimum if $A$ is $2s$-regular. Blumensath [5] showed that IHT can recover signals in NCS under conditions similar to those required in CS.
Inspired by these works, we propose an approximately NIHT (ANIHT) algorithm to solve the NCS problem (2). Since problem (2) is in general a nonconvex program, we can only expect to find a stationary point rather than a local minimizer, which is different from NIHT for CS. We then show that any accumulation point of the ANIHT iterates is a stationary point. Under nondegeneracy and strict complementarity of the stationary point, the support sets of the sequence are identified in a finite number of iterations. Finally, we carry out several experiments to demonstrate the effectiveness of the algorithm.
This paper is organized as follows. Section 2 gives the ANIHT algorithm for (2) and proves its convergence properties. Numerical results are given in Section 3. The last section makes some concluding remarks.
2. Algorithm
In this section, we present the approximately normalized iterative hard thresholding (ANIHT) algorithm for (2) and then analyze its convergence properties. Denote
(5) $S \triangleq \{x \in \mathbb{R}^n \mid \|x\|_0 \le s\}$.
For $x \in \mathbb{R}^n$ and a set $A \subseteq \mathbb{R}^n$, the projector onto $A$ is $P_A(x) \triangleq \arg\min_{y \in A} \|x - y\|$. Note that the projection onto the sparse set $S$, written as $P_S(\cdot)$, sets all but the $s$ largest-in-magnitude components of $x$ to zero. The following definition of stationarity (there called $L$-stationarity) was proposed in [17], based on the notion of a fixed-point equation.
Definition 1.
The point $x^* \in S$ is called a stationary point of problem (2) if, for some $\alpha > 0$, it holds that
(6) $x^* \in P_S\bigl(x^* - \alpha \nabla f(x^*)\bigr)$.
Note that $x^* \in S$ is a stationary point of problem (2) if and only if [17]
(7) $\bigl|\nabla_i f(x^*)\bigr| \begin{cases} = 0, & i \in \Gamma^*, \\ \le \frac{1}{\alpha} M_s(x^*), & i \notin \Gamma^*, \end{cases}$
where $M_i(x)$ denotes the $i$th largest element in absolute value of $x \in S$ and $\Gamma^* = \mathrm{supp}(x^*) = \{i \in \{1, \ldots, n\} \mid x_i^* \neq 0\}$.
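For concreteness, the projection $P_S$ used throughout can be sketched in a few lines of NumPy. The helper name `project_sparse` is ours; ties among equal-magnitude entries are broken arbitrarily, as in any implementation of $P_S$.

```python
import numpy as np

def project_sparse(x, s):
    """P_S(x): keep the s largest-in-magnitude entries of x, zero the rest."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]   # indices of the s largest |x_i|
    out[keep] = x[keep]
    return out

print(project_sparse([3.0, -0.5, 0.1, -4.0, 1.0], 2))  # -> [ 3.  0.  0. -4.  0.]
```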
In NIHT for CS [16], to guarantee a sufficient decrease of the objective function at each iteration, the authors added a stepsize strategy based on the restricted isometry property (RIP) [1]. In ANIHT for NCS (2), we use an approximately optimal stepsize to accelerate convergence and an Armijo-type stepsize to obtain a sufficient decrease of the objective function directly, without RIP. Here $(\cdot)_\Gamma$ is the subvector (submatrix) obtained by discarding all but the elements (columns) indexed by $\Gamma$. The framework of ANIHT is described as follows.
Step 1.
Initialize $x^0$, $\alpha^0 > 0$, $\sigma > 0$, $0 < \beta < 1$.
Step 2.
Let $\Gamma_k = \mathrm{supp}(x^k)$ and compute
(8) $\tilde{x}^{k+1} = P_S\bigl(x^k - \alpha_k^0 J(x^k)^\top r(x^k)\bigr)$,
where
(9) $\alpha_k^0 = \min\left\{\alpha^0, \ \frac{\|(J(x^k)^\top r(x^k))_{\Gamma_k}\|^2}{\|J(x^k)_{\Gamma_k} (J(x^k)^\top r(x^k))_{\Gamma_k}\|^2}\right\}$.
Step 3.
If $\mathrm{supp}(\tilde{x}^{k+1}) = \Gamma_k$, then $x^{k+1} = \tilde{x}^{k+1}$, $\Gamma_{k+1} = \mathrm{supp}(x^{k+1})$, and $\alpha_k = \alpha_k^0$; else compute
(10) $x^{k+1} = P_S\bigl(x^k - \alpha_k J(x^k)^\top r(x^k)\bigr)$
and $\Gamma_{k+1} = \mathrm{supp}(x^{k+1})$, where $\alpha_k = \alpha_k^0 \beta^{m_k}$ and $m_k$ is the smallest positive integer $m$ such that
(11) $\|r(x^k(\alpha_k^0 \beta^m))\|^2 \le \|r(x^k)\|^2 - \sigma \|x^k(\alpha_k^0 \beta^m) - x^k\|^2$,
and $x^k(\alpha) = P_S\bigl(x^k - \alpha J(x^k)^\top r(x^k)\bigr)$.
Step 4.
If the stopping criterion is met, then stop.
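Steps 1–4 above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: `Phi` and `jac` are user-supplied callables for $\Phi$ and $J$, all default parameters are illustrative choices, and the initial support for $x^0 = 0$ is picked heuristically from the gradient.

```python
import numpy as np

def project_sparse(x, s):
    """P_S(x): keep the s largest-in-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def aniht(Phi, jac, b, n, s, alpha0=1.0, sigma=1e-4, beta=0.5,
          max_iter=500, tol=1e-10):
    """Sketch of Steps 1-4; Phi and jac are callables for the map and its
    Jacobian. All defaults are illustrative, not values from the paper."""
    x = np.zeros(n)
    for _ in range(max_iter):
        r = Phi(x) - b
        g = jac(x).T @ r                       # gradient J(x)^T r(x)
        # support Gamma_k (heuristic top-s choice for the all-zero start)
        Gamma = np.flatnonzero(x) if x.any() else np.argsort(np.abs(g))[-s:]
        gG = np.zeros_like(g)
        gG[Gamma] = g[Gamma]
        Jg = jac(x) @ gG
        # Step 2: approximately optimal initial stepsize, cf. (9)
        ak = alpha0 if Jg @ Jg == 0 else min(alpha0, (gG @ gG) / (Jg @ Jg))
        x_new = project_sparse(x - ak * g, s)
        # Step 3: accept if the support is unchanged, else backtrack, cf. (11)
        if set(np.flatnonzero(x_new)) != set(np.flatnonzero(x)):
            while True:
                ak *= beta
                x_new = project_sparse(x - ak * g, s)
                rn = Phi(x_new) - b
                if rn @ rn <= r @ r - sigma * np.sum((x_new - x) ** 2):
                    break
        if np.linalg.norm(x_new - x) < tol:    # Step 4: stopping criterion
            return x_new
        x = x_new
    return x
```

In the linear case `Phi = lambda x: A @ x`, `jac = lambda x: A`, the stepsize (9) reduces to the exact optimal restricted stepsize of NIHT.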
Remark 2.
We now briefly illustrate the algorithm:
In Step 2, a proposal point $\tilde{x}^{k+1}$ is calculated; whether it is accepted depends on the relationship between its support set and that of the previous point; see Step 3. In addition, to accelerate the rate of convergence, we use the approximately optimal initial stepsize to compute $\tilde{x}^{k+1}$. The linear approximation of $\Phi(x^k - \alpha J(x^k)^\top r(x^k))$ at $x^k$ is
(12) $\Phi\bigl(x^k - \alpha J(x^k)^\top r(x^k)\bigr) \approx \Phi(x^k) - \alpha J(x^k) J(x^k)^\top r(x^k)$.
$\Gamma_k$ is the support of the best $s$-term approximation to $b$ at the current iteration. So we obtain that
(13) $\arg\min_{\alpha > 0} \bigl\|\Phi(x^k) - \alpha J(x^k)_{\Gamma_k} (J(x^k)^\top r(x^k))_{\Gamma_k} - b\bigr\|^2 = \frac{\|(J(x^k)^\top r(x^k))_{\Gamma_k}\|^2}{\|J(x^k)_{\Gamma_k} (J(x^k)^\top r(x^k))_{\Gamma_k}\|^2}$.
This stepsize is in accordance with the optimal stepsize in NIHT for CS in [16]. Furthermore, by Assumption 4, when α is relatively small, the error introduced in the linear approximation is small; then the objective function decreases if the support set is not changed.
The Armijo-type stepsize rule in Step 3 chooses the stepsize and the support set adaptively while ensuring a sufficient decrease of the objective function at each iteration. It is well defined by Lemma 6.
The following assumptions are chosen to ensure the descent property (14) of the objective function $\tfrac{1}{2}\|r(x)\|^2$.
Assumption 3.
There exists a constant $U_{2s} > 0$ such that $\|J(x)(x - y)\| \le U_{2s}\|x - y\|$ whenever $|\mathrm{supp}(x) \cup \mathrm{supp}(y)| \le 2s$.
We also need the assumption that the Jacobian $J(\cdot)$ of the residual $r(\cdot)$ is restricted Lipschitz continuous on $\mathbb{R}^n$.
Assumption 4.
There exists a constant $J_{2s} > 0$ such that $\|J(x) - J(y)\| \le J_{2s}\|x - y\|$ whenever $|\mathrm{supp}(x) \cup \mathrm{supp}(y)| \le 2s$.
Lemma 5.
Suppose that Assumptions 3 and 4 hold. If there exists $\rho > 0$ such that $x$ and the ANIHT iterate $x^k$ satisfy $|\mathrm{supp}(x) \cup \mathrm{supp}(x^k)| \le 2s$ and $\|x - x^k\| \le 2\rho$, then
(14) $\|r(x)\|^2 \le \|r(x^k)\|^2 + 2\langle J(x^k)^\top r(x^k), x - x^k\rangle + \tilde{L}_{2s}\|x - x^k\|^2$,
where $\tilde{L}_{2s} \triangleq U_{2s}^2 + J_{2s}\|r(x^0)\| + \rho U_{2s} J_{2s}$ and $x^0$ is the initial point of ANIHT.
Proof.
We first show
(15) $\langle J(x)^\top r(x) - J(x^k)^\top r(x^k), x - x^k\rangle \le \tilde{L}_{2s}\|x - x^k\|^2$.
Since the ANIHT algorithm generates monotonically decreasing function values, $\|r(x^k)\| \le \|r(x^0)\|$ for all $k$. Direct calculation yields that
(16) $\langle J(x)^\top r(x) - J(x^k)^\top r(x^k), x - x^k\rangle$
$= \langle J(x)^\top r(x) - J(x)^\top r(x^k) + J(x)^\top r(x^k) - J(x^k)^\top r(x^k), x - x^k\rangle$
$\le \langle J(x)^\top (r(x) - r(x^k)), x - x^k\rangle + \langle r(x^k), (J(x) - J(x^k))(x - x^k)\rangle$
$= \Bigl\langle J(x)^\top \int_0^1 J\bigl(x^k + t(x - x^k)\bigr)\,dt\,(x - x^k), x - x^k\Bigr\rangle + \langle r(x^k), (J(x) - J(x^k))(x - x^k)\rangle$
$\le \langle J(x)^\top J(x)(x - x^k), x - x^k\rangle + \Bigl\langle J(x)^\top \int_0^1 \bigl(J(x^k + t(x - x^k)) - J(x)\bigr)\,dt\,(x - x^k), x - x^k\Bigr\rangle + \|r(x^0)\| J_{2s}\|x - x^k\|^2$
$= \langle J(x)(x - x^k), J(x)(x - x^k)\rangle + \Bigl\langle \int_0^1 \bigl(J(x^k + t(x - x^k)) - J(x)\bigr)\,dt\,(x - x^k), J(x)(x - x^k)\Bigr\rangle + J_{2s}\|r(x^0)\|\|x - x^k\|^2$
$\le U_{2s}^2\|x - x^k\|^2 + \int_0^1 (1 - t) J_{2s}\|x - x^k\|^2\,dt \cdot U_{2s}\|x - x^k\| + J_{2s}\|r(x^0)\|\|x - x^k\|^2$
$\le \bigl(U_{2s}^2 + J_{2s}\|r(x^0)\| + \rho U_{2s} J_{2s}\bigr)\|x - x^k\|^2 = \tilde{L}_{2s}\|x - x^k\|^2$,
where the last inequality uses $\int_0^1 (1 - t)\,dt = \tfrac{1}{2}$ and $\|x - x^k\| \le 2\rho$. It follows from (15) that
(17) $\|r(x)\|^2 - \|r(x^k)\|^2 - 2\langle J(x^k)^\top r(x^k), x - x^k\rangle = 2\int_0^1 \bigl\langle J(x^k + t(x - x^k))^\top r(x^k + t(x - x^k)) - J(x^k)^\top r(x^k), x - x^k\bigr\rangle\,dt \le 2\int_0^1 t\tilde{L}_{2s}\|x - x^k\|^2\,dt = \tilde{L}_{2s}\|x - x^k\|^2$,
which completes the proof.
Lemma 6.
Suppose that Assumptions 3 and 4 hold and $x^k$ is an ANIHT iterate with $0 < \alpha^0 < 1/\tilde{L}_{2s}$. Then
(18) $\|r(x^k(\alpha))\|^2 \le \begin{cases} \|r(x^k)\|^2 - \bigl(\frac{1}{\alpha^0} - \tilde{L}_{2s}\bigr)\|x^k(\alpha) - x^k\|^2, & \text{if } 0 < \alpha \le \alpha^0, \\ \|r(x^k)\|^2 - \sigma\|x^k(\alpha) - x^k\|^2, & \text{if } 0 < \alpha \le \frac{1}{\tilde{L}_{2s} + \sigma}. \end{cases}$
Therefore, $\alpha_k$ is well defined.
Proof.
According to the computation in Step 2, we have
(19) $x^k(\alpha) \in \arg\min\bigl\{\|x - x^k + \alpha J(x^k)^\top r(x^k)\|^2 : \|x\|_0 \le s\bigr\}$,
which implies (taking $x = x^k$, which is feasible) that
(20) $\|x^k(\alpha) - x^k + \alpha J(x^k)^\top r(x^k)\|^2 \le \|\alpha J(x^k)^\top r(x^k)\|^2$;
that is,
(21) $\|x^k(\alpha) - x^k\|^2 \le -2\alpha\langle J(x^k)^\top r(x^k), x^k(\alpha) - x^k\rangle$.
If $\mathrm{supp}(\tilde{x}^{k+1}) = \mathrm{supp}(x^k)$, then the above inequality and the monotonicity of $\{\|r(x^k)\|\}$ yield that
(22) $\|x^k(\alpha) - x^k\|^2 \le -2\alpha\langle r(x^k), J(x^k)(x^k(\alpha) - x^k)\rangle \le 2\alpha\|r(x^k)\| \cdot \|J(x^k)(x^k(\alpha) - x^k)\| \le 2\alpha^0\|r(x^0)\| U_{2s}\|x^k(\alpha) - x^k\|$.
Otherwise, by the Armijo-type stepsize rule and the monotonicity of $\{\|r(x^k)\|\}$, we have
(23) $\|x^k(\alpha) - x^k\|^2 \le \frac{1}{\sigma}\bigl(\|r(x^k)\|^2 - \|r(x^k(\alpha))\|^2\bigr) \le \frac{1}{\sigma}\|r(x^0)\|^2$.
Hence $\|x^k(\alpha) - x^k\|$ is bounded above by $\max\{2\alpha^0 U_{2s}, 1/\sqrt{\sigma}\} \cdot \|r(x^0)\|$.
By Lemma 5 and (21), we get that
(24) $\|r(x^k(\alpha))\|^2 \le \|r(x^k)\|^2 + 2\langle J(x^k)^\top r(x^k), x^k(\alpha) - x^k\rangle + \tilde{L}_{2s}\|x^k(\alpha) - x^k\|^2 \le \|r(x^k)\|^2 - \frac{1}{\alpha}\|x^k(\alpha) - x^k\|^2 + \tilde{L}_{2s}\|x^k(\alpha) - x^k\|^2 = \|r(x^k)\|^2 - \Bigl(\frac{1}{\alpha} - \tilde{L}_{2s}\Bigr)\|x^k(\alpha) - x^k\|^2$.
If $\mathrm{supp}(\tilde{x}^{k+1}) = \Gamma_k$, then $\alpha \le \alpha^0 < 1/\tilde{L}_{2s}$, so $\frac{1}{\alpha} - \tilde{L}_{2s} \ge \frac{1}{\alpha^0} - \tilde{L}_{2s} > 0$. Otherwise, letting $\frac{1}{\alpha} - \tilde{L}_{2s} \ge \sigma$, we obtain the desired result from the definition of $\alpha$.
2.1. Convergence
By combining Assumptions 3 and 4 with Lemma 6, the convergence of ANIHT can be established in this subsection.
Theorem 7.
Let Assumptions 3 and 4 hold and let $\{x^k\}$ be generated by ANIHT with $0 < \alpha^0 < 1/\tilde{L}_{2s}$. Then
(i) $\lim_{k \to \infty} \|x^{k+1} - x^k\|/\alpha_k = 0$;
(ii) any accumulation point of $\{x^k\}$ is a stationary point of (2).
Proof.
(i) It follows from (18) that $\|r(x^k)\|^2 - \|r(x^{k+1})\|^2 \ge c\|x^{k+1} - x^k\|^2$, where $c = \min\{1/\alpha^0 - \tilde{L}_{2s}, \sigma\} > 0$. Then
(25) $\sum_{k=0}^{\infty}\|x^{k+1} - x^k\|^2 \le \frac{1}{c}\sum_{k=0}^{\infty}\bigl(\|r(x^k)\|^2 - \|r(x^{k+1})\|^2\bigr) \le \frac{1}{c}\|r(x^0)\|^2 < +\infty$,
which implies $\lim_{k \to \infty}\|x^{k+1} - x^k\| = 0$.
It follows from Assumption 3 that
(26) $\frac{\|(J(x^k)^\top r(x^k))_{\Gamma_k}\|^2}{\|J(x^k)_{\Gamma_k}(J(x^k)^\top r(x^k))_{\Gamma_k}\|^2} \ge \frac{1}{U_{2s}^2}$,
and then $\alpha_k^0 \ge \min\{\alpha^0, 1/U_{2s}^2\}$. By Lemma 6 and the definition of $\alpha_k$ in the algorithm, we have
(27) $\alpha_k \ge \min\Bigl\{\alpha_k^0, \frac{\beta}{\tilde{L}_{2s} + \sigma}\Bigr\}$.
Therefore, $\alpha_k$ is bounded from below by the positive constant $\min\{\alpha^0, \beta/(\tilde{L}_{2s} + \sigma)\}$ (since $\beta/(\tilde{L}_{2s} + \sigma) < 1/U_{2s}^2$). We can conclude that
(28) $\lim_{k \to \infty} \frac{\|x^{k+1} - x^k\|}{\alpha_k} = 0$.
(ii) Suppose that $x^*$ is an accumulation point of the sequence $\{x^k\}$; then there exists a subsequence $\{x^{k_j}\}$ converging to $x^*$, and $\lim_{j \to \infty} x^{k_j+1} = x^*$ by (i). To simplify notation, since $\{\alpha_{k_j}\}$ is bounded, we may pass to a further subsequence along which it converges and denote its limit by $\alpha$; let $\Gamma^* = \mathrm{supp}(x^*)$. Based on
(29) $x^{k_j+1} = P_S\bigl(x^{k_j} - \alpha J(x^{k_j})^\top r(x^{k_j})\bigr)$
in Step 2, we consider two cases.
Case 1 ($i \in \Gamma^*$). The convergence of $\{x^{k_j}\}$ and $\{x^{k_j+1}\}$ guarantees that, for some $n_1 > 0$ and all $j > n_1$, $x_i^{k_j} \neq 0$ and $x_i^{k_j+1} \neq 0$. The definition of the projection onto $S$ shows that
(30) $x_i^{k_j+1} = x_i^{k_j} - \alpha\bigl(J(x^{k_j})^\top r(x^{k_j})\bigr)_i$.
Taking $j \to \infty$, we have $\bigl(J(x^*)^\top r(x^*)\bigr)_i = 0$.
Case 2 ($i \notin \Gamma^*$). If there exists $n_2 > 0$ such that $x_i^{k_j+1} = 0$ for all $j > n_2$, the projection implies that
(31) $\bigl|x_i^{k_j} - \alpha\bigl(J(x^{k_j})^\top r(x^{k_j})\bigr)_i\bigr| \le M_s(x^{k_j+1})$.
Letting $j \to \infty$ and exploiting the continuity of the function $M_s$, we obtain that
(32) $\alpha\bigl|\bigl(J(x^*)^\top r(x^*)\bigr)_i\bigr| \le M_s(x^*)$.
On the other hand, if $x_i^{k_j+1} \neq 0$ for infinitely many indices $k_j$, then, as in the proof of Case 1, it follows that $\bigl(J(x^*)^\top r(x^*)\bigr)_i = 0$. Since $\alpha_k$ is bounded from below by a positive constant, we have
(33) $\bigl|\bigl(J(x^*)^\top r(x^*)\bigr)_i\bigr| \begin{cases} = 0, & \text{if } i \in \Gamma^*, \\ \le \frac{1}{\alpha} M_s(x^*), & \text{if } i \notin \Gamma^*, \end{cases}$
which means that $x^*$ is a stationary point of (2).
We are now ready to show that, under suitable conditions, the support set of a point is identified in a finite number of iterations. One can easily verify that if $\|x^*\|_0 = s$, then the support set of any $x$ in a sufficiently small neighborhood of $x^*$ is identified. For $\|x^*\|_0 < s$, we introduce the concept of strict complementarity to identify the support set.
Definition 8.
The point $x^* \in S$ is called nondegenerate if $\|x^*\|_0 = s$. The condition
(34) $\bigl(J(x^*)^\top r(x^*)\bigr)_i \begin{cases} = 0, & i \in \Gamma^*, \\ \neq 0, & i \notin \Gamma^*, \end{cases}$
is called the strict complementarity condition of (2), where $\Gamma^* = \mathrm{supp}(x^*)$.
Theorem 9.
For any sequence $\{z^k\}$ converging to $z^*$, we have the following:
(i) if $z^*$ is a nondegenerate point, then
(35) $\mathrm{supp}(z^k) = \mathrm{supp}(z^*)$
for all $k$ sufficiently large and
(36) $\lim_{k \to \infty}\bigl(J(z^k)^\top r(z^k)\bigr)_{\Gamma_{z^k}} = 0$,
where $\Gamma_{z^k} = \mathrm{supp}(z^k)$ and $\Gamma_{z^*} = \mathrm{supp}(z^*)$;
(ii) if $z^*$ satisfies strict complementarity, then (35) holds if and only if (36) holds.
Proof.
(i) Suppose that $z^*$ is a nondegenerate point. Then (35) holds when $z^k$ is sufficiently close to $z^*$. By (35) and the continuity of $J(\cdot)^\top r(\cdot)$, we easily get that
(37) $\lim_{k \to \infty}\bigl(J(z^k)^\top r(z^k)\bigr)_{\Gamma_{z^k}} = \lim_{k \to \infty}\bigl(J(z^k)^\top r(z^k)\bigr)_{\Gamma_{z^*}} = \bigl(J(z^*)^\top r(z^*)\bigr)_{\Gamma_{z^*}} = 0$.
(ii) Suppose that $z^*$ satisfies the strict complementarity condition. The "only if" part follows by an argument similar to (i). Now let (36) hold. Since $z^k \to z^*$, $\Gamma_{z^*} \subseteq \Gamma_{z^k}$ for all $k$ large enough. Assume that there is an infinite subsequence $K$ and an index $j$ such that $j \notin \Gamma_{z^*}$ and $j \in \Gamma_{z^k}$ for all $k \in K$. By (36),
(38) $\bigl(J(z^*)^\top r(z^*)\bigr)_j = \lim_{k \to \infty,\, k \in K}\bigl(J(z^k)^\top r(z^k)\bigr)_j = 0$,
while the strict complementarity condition (34) implies that $\bigl(J(z^*)^\top r(z^*)\bigr)_j \neq 0$. This contradiction proves that $\Gamma_{z^*} = \Gamma_{z^k}$ for all $k$ sufficiently large.
3. Numerical Experiments
In this part, the sensor localization problem and the phase retrieval problem are simulated. In both examples, the stopping criterion is $\|(J(x^k)^\top r(x^k))_{\Gamma_k}\| \le \epsilon_2$, where $\epsilon_2$ is a small tolerance chosen per case, or the maximum number of iterations, 5000, being reached.
The sensor localization problem can be described as follows: given $M$ known anchors $p_1, p_2, \ldots, p_M \in \mathbb{R}^n$, the purpose is to find a sensor $x \in \mathbb{R}^n$ satisfying
(39) $\|x - p_i\|^2 + \eta_i = b_i$, $i = 1, \ldots, M$, $\|x\|_0 \le s$,
where $\eta_i$, $i = 1, \ldots, M$, is noise (here normally distributed with zero mean and variance $\sigma_0^2$). Finding an $x \in \mathbb{R}^n$ satisfying the above equalities is the same as finding an optimal solution of the optimization problem (2) with $f(x) = \sum_{i=1}^{M}(\|x - p_i\|^2 - b_i)^2$. We first test ANIHT on an example with $M = 80$, $N = 120$, and $s = 1, 2, \ldots, 10$. Each component of the $M$ vectors $p_1, \ldots, p_M$ was randomly and independently generated from a standard normal distribution. The true vector $x_{\mathrm{orig}}$ and $b$ are generated by the following MATLAB code:
(40) xorig = zeros(n,1); T = randperm(n); xorig(T(1:s)) = 10*rand(s,1); b(i) = norm(xorig - p_i)^2 + sigma0*randn(1,1).
For each value of $s$ ($s = 1, 2, \ldots, 10$), we ran the ANIHT algorithm from 100 different, randomly generated initial data sets. The numbers of runs out of 100 in which the method found the "correct" solution are given in Table 1. Here, a "correct" solution $x$ ($\|x\|_0 \le s$) means that $f(x) \le f(x_{\mathrm{orig}})$, or that $x$ and $x_{\mathrm{orig}}$ are very close, say
(41) $\|x - x_{\mathrm{orig}}\|/\|x\| < 10^{-3}$.
As can be clearly seen from the results in the table, ANIHT performs well in terms of success probability. In more detail, when the "true" solutions are quite sparse compared to the dimension $n$, ANIHT can recover almost all of them, while the performance degrades as $s$ rises.
The success numbers over 100 runs under different noise.
s            1     2     3     4     5     6     7     8     9     10
σ0 = 0.00    100   99    100   99    94    90    86    82    82    75
σ0 = 0.01    100   100   99    96    99    93    91    91    82    75
σ0 = 0.10    100   98    94    83    81    68    66    56    54    46
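The MATLAB pseudo-code in (40) can be mirrored in NumPy as below. This is an illustrative re-creation of the experimental setup, not the authors' code; the objective/gradient pair corresponds to $f(x) = \sum_i(\|x - p_i\|^2 - b_i)^2$, and all names are ours.

```python
import numpy as np

# Hypothetical NumPy re-creation of the sensor localization setup (39)-(40).
rng = np.random.default_rng(0)
M, n, s, sigma0 = 80, 120, 3, 0.01
P = rng.standard_normal((M, n))            # anchors p_1..p_M as rows

x_orig = np.zeros(n)
T = rng.permutation(n)
x_orig[T[:s]] = 10 * rng.random(s)         # s nonzero entries, as in (40)
b = np.sum((x_orig - P) ** 2, axis=1)      # b_i = ||xorig - p_i||^2
b += sigma0 * rng.standard_normal(M)       # additive Gaussian noise

def f(x):
    """f(x) = sum_i (||x - p_i||^2 - b_i)^2."""
    return np.sum((np.sum((x - P) ** 2, axis=1) - b) ** 2)

def grad_f(x):
    # d/dx (||x - p_i||^2 - b_i)^2 = 4 (||x - p_i||^2 - b_i)(x - p_i)
    res = np.sum((x - P) ** 2, axis=1) - b
    return 4 * (res[:, None] * (x - P)).sum(axis=0)
```

The pair `f`/`grad_f` is exactly the residual-and-Jacobian interface the ANIHT sketch in Section 2 consumes.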
Then we run the ANIHT algorithm on higher-dimensional data sets, where $N = 2M$, $s = \lceil 0.01N \rceil$, and $M = 100, 200, \ldots, 2000$. For each data set, we run 40 times and record the average results (excluding unsuccessful recoveries). Figure 1 shows the performance of ANIHT on this problem.
Average results over 40 simulations with different noise.
Phase retrieval is the recovery of a signal from the magnitude of its Fourier transform, or of any other linear transform. Due to the loss of Fourier phase information, the problem is generally ill-posed. The phase retrieval problem can be described as follows: given $M$ known measurement vectors $p_1, p_2, \ldots, p_M \in \mathbb{R}^n$, the purpose is to reconstruct a signal $x \in \mathbb{R}^n$ satisfying
(42) $\langle x, p_i\rangle^2 + \eta_i = b_i$, $i = 1, \ldots, M$, $\|x\|_0 \le s$,
where $p_i$ is the $i$th column of a general matrix or of the discrete Fourier transform (DFT) matrix and $\eta_i$, $i = 1, \ldots, M$, is noise (here normally distributed with zero mean and variance $\sigma_0^2$). This problem is equivalent to recovering an optimal solution of the optimization problem (2) with $f(x) = \sum_{i=1}^{M}(x^\top A_i x - b_i)^2$, where $A_i = p_i p_i^\top$, $i = 1, \ldots, M$.
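The objective $f(x) = \sum_i(x^\top A_i x - b_i)^2$ never needs the matrices $A_i$ explicitly, since $x^\top A_i x = \langle p_i, x\rangle^2$. A hedged NumPy sketch of this objective and its gradient (illustrative names and sizes, not the authors' code):

```python
import numpy as np

# Sketch of the phase retrieval objective f(x) = sum_i (x^T A_i x - b_i)^2
# with A_i = p_i p_i^T, so x^T A_i x = <p_i, x>^2.
rng = np.random.default_rng(0)
M, n = 80, 120
P = rng.standard_normal((M, n))            # measurement vectors p_i as rows

def f(x, b):
    return np.sum(((P @ x) ** 2 - b) ** 2)

def grad_f(x, b):
    # d/dx (<p_i, x>^2 - b_i)^2 = 4 (<p_i, x>^2 - b_i) <p_i, x> p_i
    inner = P @ x
    return 4 * P.T @ ((inner ** 2 - b) * inner)

x0 = rng.standard_normal(n)
b = (P @ x0) ** 2                          # noiseless measurements of x0
assert f(x0, b) == 0.0                     # x0 attains the global minimum
```

Note that any sign flip of $x_0$ attains the same objective value, which reflects the inherent sign ambiguity of phase retrieval.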
There are other methods for sparse phase retrieval whose codes are available, so we can compare our ANIHT algorithm with them. We first compare ANIHT with the partial sparse-simplex (PSS) method and the greedy sparse-simplex (GSS) method in [17], with $M = 80$, $N = 120$, and $s = 2, 3, \ldots, 9$, settings identical to those in [17]. The true vector $x_{\mathrm{orig}}$ and the measurement vectors $p_1, p_2, \ldots, p_M$ are generated as in the sensor localization problem; $b$ is generated by the following MATLAB code:
(43) b(i) = dot(xorig, p_i)^2 + sigma0*randn(1,1).
For each value of $s$ ($s = 2, 3, \ldots, 9$), we ran the ANIHT algorithm from 100 different, randomly generated initial data sets. The numbers of runs out of 100 in which the methods found the "correct" solution are given in Table 2. As can be clearly seen from the results in the table, ANIHT outperforms PSS and GSS in terms of success probability. Moreover, the numbers in the rows with $\sigma_0 = 0$ are higher than those with $\sigma_0 = 0.01$ and $\sigma_0 = 0.1$.
The success numbers of the three methods over 100 runs under different noise.
s                  2     3     4     5     6     7     8     9
ANIHT  σ0 = 0.00   100   98    91    85    76    70    52    41
       σ0 = 0.01   92    98    93    88    68    60    47    34
       σ0 = 0.10   78    85    82    85    68    54    43    31
PSS    σ0 = 0.00   35    28    22    9     5     9     4     3
       σ0 = 0.01   33    25    20    7     8     7     5     3
       σ0 = 0.10   30    24    16    4     5     6     3     2
GSS    σ0 = 0.00   80    75    69    20    17    13    7     6
       σ0 = 0.01   57    52    35    30    27    15    6     5
       σ0 = 0.10   65    54    27    60    25    12    5     3
We also compare our ANIHT algorithm with GESPAR [19] for recovering a signal from the magnitude of its Fourier transform. Namely, the task is to find a real-valued discrete-time signal $x \in \mathbb{R}^n$ from the magnitude-squared values of its $N$-point discrete Fourier transform (DFT):
(44) $b_j = \Bigl|\sum_{k=1}^{n} x_k e^{-2\pi i (j-1)(k-1)/N}\Bigr|^2$, $j = 1, \ldots, N$.
We denote by $F$ the DFT matrix with elements $F_{jk} = e^{-2\pi i (j-1)(k-1)/N}$; then $b = |Fx|^2$, where $|\cdot|^2$ denotes the element-wise absolute-squared value. We get $b$ by the pseudo MATLAB code:
(45) b = abs(fft(xorig)).^2 + sigma0*randn(N,1).
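A NumPy analogue of the MATLAB pseudo-code (45), generating squared-magnitude $N$-point DFT measurements of a sparse $n$-sample signal zero-padded to $N = 2n$ (illustrative setup; noiseless by default):

```python
import numpy as np

# NumPy analogue of (44)-(45): b_j = |N-point DFT of x|_j^2 (+ noise).
rng = np.random.default_rng(0)
n, sigma0 = 512, 0.0
N, s = 2 * n, max(1, round(0.01 * n))      # N = 2n, s = 1% of n

x_orig = np.zeros(n)
x_orig[rng.permutation(n)[:s]] = rng.standard_normal(s)
b = np.abs(np.fft.fft(x_orig, N)) ** 2     # zero-pads x_orig to length N
b += sigma0 * rng.standard_normal(N)

# Sanity check: Parseval ties sum(b) to N * ||x||^2 in the noiseless case.
assert np.isclose(b.sum(), N * np.sum(x_orig ** 2))
```

The second argument of `np.fft.fft` performs the zero-padding to $N$ samples implicit in summing (44) only over $k = 1, \ldots, n$.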
To assess the accuracy of the solutions and the speed of these two methods, we run them for $n$ increasing from 256 to 3072, keeping $N = 2n$ and $s = \lceil 0.01n \rceil$. We also test them in the noiseless case and at two noise levels, $\sigma_0 = 0.01$ and $\sigma_0 = 0.1$. From Table 3, we can see that ANIHT outperforms GESPAR in terms of both average CPU time and average relative error for large $n$ ($n \ge 2048$).
Average results with $N = 2n$, $s = \lceil 0.01n \rceil$.

σ0 = 0
n       Time (ANIHT)   Time (GESPAR)   Relative error (ANIHT)   Relative error (GESPAR)
256     0.1869         0.09            3.013e-09                4.97e-06
512     0.541          0.18            2.816e-11                1.936e-06
1024    1.021          1.03            9.234e-11                2.235e-06
2048    2.732          21.83           1.587e-06                4.683e-06
3072    5.443          491.22          1.067e-06                1.300e-06

σ0 = 0.01
n       Time (ANIHT)   Time (GESPAR)   Relative error (ANIHT)   Relative error (GESPAR)
256     0.183          0.0960          1.756e-06                1.6470e-05
512     0.456          0.2022          5.404e-06                1.0901e-06
1024    0.976          1.0755          1.663e-06                1.0517e-06
2048    2.701          22.64           1.184e-06                1.0579e-06
3072    5.251          472.48          3.780e-07                1.3010e-06

σ0 = 0.1
n       Time (ANIHT)   Time (GESPAR)   Relative error (ANIHT)   Relative error (GESPAR)
256     0.179          0.0914          5.912e-06                4.0600e-05
512     0.167          0.2056          7.974e-05                1.0901e-06
1024    1.013          0.7589          3.178e-05                1.0517e-06
2048    2.750          37.36           9.063e-06                9.9184e-06
3072    5.417          299.62          4.403e-06                1.3010e-06
4. Conclusion
Nonlinear CS (NCS) is not only of academic interest but may also be important in many real-world applications where the measurements cannot be designed to be perfectly linear. In this paper, we have proposed an ANIHT algorithm for NCS and studied its convergence. We have shown that any accumulation point of the algorithm is a stationary point. The support set of the sequence can be identified under the assumptions of nondegeneracy and strict complementarity of the stationary point. The numerical experiments show that the ANIHT algorithm is effective for NCS. In the future, we will consider other methods for the nonlinear least squares problem to improve the rate of convergence, such as the Levenberg-Marquardt method or cubic regularization methods [20].
Competing Interests
The author declares that there are no competing interests regarding the publication of this paper.
Acknowledgments
This research was supported by National Natural Science Foundation of China (11271233) and Shandong Province Natural Science Foundation (ZR2012AM016).
References
[1] E. J. Candes and T. Tao, "Decoding by linear programming."
[2] D. L. Donoho, "Compressed sensing."
[3] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications.
[4] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing.
[5] T. Blumensath, "Compressed sensing with nonlinear observations and related nonlinear optimization problems."
[6] H. Ohlsson, A. Y. Yang, R. Dong, and S. S. Sastry, "Nonlinear basis pursuit," in Proceedings of the 47th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Calif, USA, November 2013, pp. 115-119, doi: 10.1109/acssc.2013.6810285.
[7] V. N. Temlyakov and P. Zheltov, "On performance of greedy algorithms."
[8] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries."
[9] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approximations."
[10] D. Needell and J. A. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples."
[11] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction."
[12] S. Foucart, "Hard thresholding pursuit: an algorithm for compressive sensing."
[13] J. D. Blanchard, J. Tanner, and K. Wei, "CGIHT: conjugate gradient iterative hard thresholding for compressed sensing and matrix completion."
[14] T. Blumensath and M. E. Davies, "Iterative thresholding for sparse approximations."
[15] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing."
[16] T. Blumensath and M. E. Davies, "Normalized iterative hard thresholding: guaranteed stability and performance."
[17] A. Beck and Y. C. Eldar, "Sparsity constrained nonlinear optimization: optimality conditions and algorithms."
[18] C. Cartis and A. Thompson, "A new and improved quantitative recovery analysis for iterative hard thresholding algorithms in compressed sensing."
[19] Y. Shechtman, A. Beck, and Y. C. Eldar, "GESPAR: efficient phase retrieval of sparse signals."
[20] C. Cartis, N. I. Gould, and P. L. Toint, "On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization."