An Adaptive Gradient Projection Algorithm for Piecewise Convex Optimization and Its Application in Compressed Spectrum Sensing

Signal sparse representation has attracted much attention in a wide range of application fields. A central aim of signal sparse representation is to find the sparse solution with the fewest nonzero entries of an underdetermined linear system, which leads to various optimization problems. In this paper, we propose an Adaptive Gradient Projection (AGP) algorithm to solve the piecewise convex optimization problem arising in signal sparse representation. To find a sparser solution, AGP uses an adaptive stepsize to move the iteration solution out of the attraction basin of a suboptimal sparse solution and into the attraction basin of a sparser one. Theoretical analysis establishes its fast convergence. Experimental results on a real-world application, compressed spectrum sensing, show that AGP outperforms traditional detection algorithms in low signal-to-noise ratio environments.


Introduction
The marked advances in signal processing in recent years have been driven by the emergence of new signal models and their applications. Signal sparse representation is an effective model for solving real-world problems, such as brain signal processing [1], face recognition [2], compressed spectrum sensing [3], and singing voice separation [4].
In this paper, we propose a novel Adaptive Gradient Projection (AGP) algorithm for the piecewise convex optimization problem (2). The algorithm moves the iteration solution out of the attraction basin of a suboptimal sparse solution and finds a sparser solution in another attraction basin. The convergence analysis reveals that AGP performs better than AST in finding the global optimal sparse solution. Experimental results show that the detection performance of compressed spectrum sensing based on AGP is greatly improved compared with other algorithms.
The remainder of the paper is organized as follows. Section 2 derives the Adaptive Gradient Projection algorithm, which can find a sparser solution than AST. Section 3 presents the application of AGP to compressed spectrum sensing and compares its detection performance with that of traditional spectrum sensing methods. Finally, conclusions are presented in Section 4.

Adaptive Gradient Projection Algorithm for Piecewise Convex Optimization

Note that the objective function is nondifferentiable at points with zero components. AST uses an affine scaling transformation to solve problem (3). For the (k + 1)th iteration, it defines a symmetric scaling matrix W_{k+1} = diag(|x_k(i)|^{1-p/2}) and a scaled variable z = W_{k+1}^{-1} x. Thus, problem (3) in x is transformed into a problem in z, whose projected-gradient update gives the new solution x_{k+1} in (6), where I ∈ R^{m×m} is an identity matrix, A_{k+1}^+ = A_{k+1}^T (A_{k+1} A_{k+1}^T)^{-1} is the Moore-Penrose pseudoinverse of A_{k+1} = A W_{k+1}, and λ_k is a stepsize.
Using the fixed stepsize λ_k = 1/p, AST is summarized as follows. Its convergence theorem is stated next.
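To make the scaled iteration concrete, here is a minimal FOCUSS-style sketch of one affine scaling step, assuming problem (3) is min ‖x‖_p^p subject to Ax = y with 0 < p < 1; the matrix sizes, the use of a full projection in place of the paper's fixed-stepsize update, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def ast_step(A, y, x, p=0.5):
    """One affine scaling step: scale by W = diag(|x(i)|^{1-p/2}), then take
    the minimum-norm solution of (A W) z = y and map back via x = W z."""
    w = np.abs(x) ** (1.0 - p / 2.0)   # scaling elements; small entries shrink
    return w * (np.linalg.pinv(A * w) @ y)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 20))
x_true = np.zeros(20)
x_true[[3, 11]] = [1.0, -0.5]          # a 2-sparse ground truth
y = A @ x_true

x = np.linalg.pinv(A) @ y              # least-squares initial point
for _ in range(50):
    x = ast_step(A, y, x)
# x now satisfies A x = y while only a few entries remain significant
```

Each step stays feasible (Ax = y) while the scaling elements compress small entries toward zero, which is exactly the behavior the convergence discussion below describes.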
From (9), we see that some small entries of the iteration solution converge to zero, because they are sequentially compressed by the scaling elements in W_{k+1}. Thus, the sequence of iteration solutions of AST converges to a sparse solution x*, which may be close to the initial point x_0. However, this solution may not be the sparsest solution of problem (3). Making the iteration solution enter the attraction basin of another sparse solution is therefore important to reduce the effect of the initial point. Furthermore, Theorem 1 shows that AST obtains x* only in the limit of infinitely many iterations, which limits the convergence rate. How to enhance the convergence speed of AST is another problem to be solved.
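The attraction-basin behavior can be illustrated numerically: along a segment between two sparse points (assumed feasible, i.e., Ax_1 = Ax_2 = y), the piecewise convex objective ‖x‖_p^p rises in the middle, so each sparse point is a local minimizer in its own basin. The vectors below are illustrative, not taken from the paper.

```python
import numpy as np

# Evaluate ||x||_p^p along the segment between two sparse points. If both are
# feasible, every point on the segment is feasible too, and the concave rise
# of |t|^p away from zero makes each endpoint a local minimizer.
p = 0.5
x1 = np.array([1.0, 0.0, 0.5])      # a suboptimal sparse point (2 nonzeros)
x2 = np.array([0.0, 1.2, 0.0])      # a sparser point (1 nonzero)
ts = np.linspace(0.0, 1.0, 101)
obj = np.array([np.sum(np.abs((1 - t) * x1 + t * x2) ** p) for t in ts])
# obj is larger in the middle than at either endpoint, i.e.,
# x1 and x2 sit in separate attraction basins
```

A fixed-stepsize method started near x1 descends into x1's basin and stays there, which is the limitation the adaptive stepsize is designed to overcome.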

Derivation of Adaptive Gradient Projection Algorithm.
To solve the above two problems, we first consider the convergence process of AST if an iteration solution has some zero entries.
Lemma 2. Given a block matrix M = (M_1, M_2; M_3, M_4), we get M^{-1} = (M_1^{-1} + M_1^{-1} M_2 S^{-1} M_3 M_1^{-1}, -M_1^{-1} M_2 S^{-1}; -S^{-1} M_3 M_1^{-1}, S^{-1}), where S = M_4 - M_3 M_1^{-1} M_2 is the Schur complement and M_1 and S are assumed invertible.

Lemma 3. If x_k with t zero entries is a solution of problem (3), then these zero entries do not change in the remaining iterations.
For simplicity, let the front t entries of x_k be zero, i.e., x_k = (x_k^1; x_k^2) = (0; x_k^2), where x_k^1 = (x_k(1), ..., x_k(t))^T and x_k^2 = (x_k(t+1), ..., x_k(n))^T. Then x_{k+1} can be computed by (11). Partitioning the matrices accordingly, we calculate A_{k+1} in (11) as (12) and its Moore-Penrose pseudoinverse as (13). Because A_{k+1} is of full row rank, A_{k+1} (A_{k+1})^T is also of full rank and has an invertible submatrix. Substituting (12) and (15) into (13), we obtain (16); hence, the front t entries of x_{k+1} in (17) are still zero, and by induction they cannot become nonzero in the remaining iterations. Lemma 3 implies that an unknown sparse solution can be identified in a smaller and smaller subspace. This motivates us to accelerate convergence by sequentially shrinking the search range. Meanwhile, the choice of subspace should not be limited by the initial point; that is, the iteration solution must be able to enter a subspace that does not contain the initial point. AST uses a fixed stepsize, so it cannot move the iteration solution from one octant to another distant octant, and its iteration solutions concentrate in the attraction basin of a suboptimal sparse solution. Moving the iteration solutions out of the current attraction basin is the goal of the Adaptive Gradient Projection algorithm.
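Lemma 3 can be checked numerically for a FOCUSS-style scaled step (an illustrative stand-in for iteration (11)): a zero entry yields a zero scaling element, hence a zero column of AW, and the minimum-norm pseudoinverse solution keeps that coordinate at exactly zero.

```python
import numpy as np

def scaled_projection_step(A, y, x, p=0.5):
    """Scaled step with W = diag(|x(i)|^{1-p/2}); a zero entry gives a zero
    scaling element, a zero column in A W, and the minimum-norm pseudoinverse
    solution assigns exactly zero to that coordinate."""
    w = np.abs(x) ** (1.0 - p / 2.0)
    return w * (np.linalg.pinv(A * w) @ y)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 10))
x = rng.standard_normal(10)
x[:3] = 0.0                        # front t = 3 entries are zero, as in Lemma 3
y = A @ x
for _ in range(5):
    x = scaled_projection_step(A, y, x)
    # the zero pattern of the first three entries is preserved at every step
```

This is why the search can be restricted to a shrinking subspace: coordinates that reach zero never need to be revisited.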
For the (k + 1)th iteration, let W̃_{k+1} = diag(|x̃_k(i)|^{1-p/2}) and Ã_{k+1} = A_{I_k} W̃_{k+1}; the gradient g and the search direction l are given by (20) and (21), where I_k = (i_1, ..., i_{n_k}) records the locations of the entries with x̃_k(i) ≠ 0 (i = 1, ..., n_k), the column vectors of A_{I_k} are selected from A according to I_k, and Ĩ ∈ R^{n_k × n_k} is an identity matrix. In the span space of A_{I_k}, the new solution x̃_{k+1} is defined, following (6), by (22), where λ is a stepsize. The major challenge in solving problem (3) is to identify the most appropriate subspace in which the sparse solution lies. Equation (22) states that λ is important for finding an appropriate subspace; thus, a function f^{(p)}(x̃_{k+1}) with respect to λ is defined to investigate the property of the stepsize. Figure 1(b) shows that this piecewise convex function has nonunique minimum points that correspond to the zero entries of x̃_{k+1}. We can compute these extreme points. Without loss of generality, let an entry of x̃_{k+1} equal zero (i.e., x̃_{k+1}(i) = 0). Then the extreme point λ_{k,i} is computed by (24), where i = 1, ..., n_k and ṽ_{k+1} = W̃_{k+1} l. An adaptive stepsize is chosen to attain the minimal objective function value, and the new solution x̃_{k+1} is written as in (26). The minimum point (λ*_k, f^{(p)}(x̃_{k+1})), marked "*" in Figure 1(b), is determined by comparing these extreme values. Using the adaptive stepsize to obtain an iteration solution accelerates convergence. In contrast, AST cannot obtain a minimum point along the search direction in (6); the corresponding point (1/p, f^{(p)}(x_{k+1})) is marked "+" in Figure 1(a). The fixed stepsize makes the iteration solutions gather in the region adjacent to the initial point. In addition, we set x̃_{k+1}(i) = 0 whenever |x̃_{k+1}(i)| is smaller than a given threshold (i = 1, ..., n_k). AGP can thus determine multiple zero entries at each iteration, so it quickly identifies the subspace in which the sparse solution lies.
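The stepsize selection can be sketched as follows: every candidate λ_i = x̃_k(i)/ṽ_{k+1}(i) zeroes one entry of the trial solution, the candidate with the smallest piecewise convex objective is kept, and near-zero entries are thresholded. The direction and the numbers below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def adaptive_stepsize(x, v, p=0.5, tol=1e-12):
    """Each candidate lam_i = x(i)/v(i) zeroes entry i of x - lam*v; keep the
    candidate minimizing the piecewise convex objective ||x - lam*v||_p^p,
    then lock in any entry whose magnitude falls below the threshold."""
    idx = np.nonzero(v)[0]
    candidates = x[idx] / v[idx]
    objective = lambda lam: np.sum(np.abs(x - lam * v) ** p)
    best = min(candidates, key=objective)
    x_new = x - best * v
    x_new[np.abs(x_new) < tol] = 0.0
    return best, x_new

x = np.array([0.8, -0.3, 1.5, 0.05])   # current iterate (illustrative)
v = np.array([0.4, -0.15, 0.2, 0.05])  # search direction (illustrative)
lam, x_new = adaptive_stepsize(x, v)
# lam = 2.0 zeroes the first two entries simultaneously
```

With these numbers the chosen stepsize zeroes two entries at once, which matches the second case discussed in Lemma 5 below.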
As discussed above, AGP is summarized as follows.

Convergence Analysis.
The convergence property of AGP is discussed as follows.

Lemma 5. AGP can determine at least one zero entry at each iteration.
By (26), there are three cases in which AGP determines zero entries at each iteration. In the first case, one entry of x̃_{k+1} is set to zero when the iteration solution moves from one octant onto the coordinate surface of another distant octant. In the second case, more than one entry of x̃_{k+1} becomes zero when the new iteration solution lies exactly on a coordinate axis of another distant octant. In the third case, if x̃_{k+1} lies just beside a coordinate axis of another distant octant, the entries of x̃_{k+1} whose magnitudes are smaller than the threshold are set to zero. Therefore, AGP determines at least one zero entry at each iteration.

Theorem 6. Let x* ∈ R^n with K nonzero entries be a sparse solution of problem (3). Then the number of iterations of AGP is no more than n - K.

If x* has K nonzero entries, then the locations of n - K zero entries need to be determined. According to Lemma 5, AGP obtains at least one zero entry at each iteration, and these zero entries do not change in the remaining iterations by the definition in (20). Therefore, the number of iterations of AGP is no more than n - K.
Remark 7. Theorem 6 shows that AGP obtains a sparse solution within a finite number of iterations, whereas Theorem 1 shows that AST requires an infinite number of iterations to obtain a sparse solution. In theory, the number of iterations of AGP is therefore smaller than that of AST. Figure 2 gives an example of the improved convergence of AGP compared with AST. There are three sparse solutions x*_1, x*_2, x*_3 ∈ R^3, where x*_3 is the global optimal sparse solution. To clearly display the iteration process of AGP, we map every x̃_k back to the three-dimensional space, forming a sequence {x_k}, where x_k(I_k) ← x̃_k means that the entries of x_k located at I_k are assigned the values of x̃_k.
Starting from x_0, the sequence {x_k} (k = 1, ..., 6) solved by AST converges to x*_1 in Figure 2(a). Figure 2(c) shows that all of these iteration solutions concentrate in the attraction basin of x*_1 in the contour map. By contrast, the second iteration solution of AGP already leaves the basin of x*_1 and reaches x*_3. In Figure 2(d), the iteration solution moves out of the attraction basin of x*_1 and enters the attraction basin of x*_3, in which the adaptive stepsize plays an important role. This example verifies that AGP can find a sparser solution than AST by calculating the minimum point along the search direction at each iteration. In a second example, there exist two sparse solutions x*_1 = (0, 0.9453, 0, 0, 0, 0.4061, 0.4310, 0)^T and x*_2 = (0, 0, 0, 1.2, 0, 0.7, 0, 0)^T, where x*_2 is the global optimal sparse solution. We choose x_0 = A^+ y as the initial point. Comparing the results in Table 1 with those in Table 2, we see that AGP quickly finds x*_2 in smaller and smaller subspaces, while AST, limited by the initial point, obtains only the suboptimal solution x*_1. The computing times of AST and AGP are 0.0063 s and 0.0089 s, respectively. AGP spends some extra time finding the adaptive stepsize at each iteration, but it obtains the global minimizer; obviously, finding the global optimal sparse solution of problem (3) is more important.
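For the two solutions above, the ranking can be checked directly against the objective of problem (3); p = 0.5 is an assumption here, since the exponent used in this example is not stated.

```python
import numpy as np

p = 0.5
x1 = np.array([0.0, 0.9453, 0.0, 0.0, 0.0, 0.4061, 0.4310, 0.0])  # suboptimal
x2 = np.array([0.0, 0.0, 0.0, 1.2, 0.0, 0.7, 0.0, 0.0])           # global optimum
lp = lambda x: np.sum(np.abs(x) ** p)   # the ||x||_p^p objective of problem (3)
# x2 has both fewer nonzero entries and the smaller objective value
```

This confirms that the sparser solution x*_2 is also the better one under the piecewise convex objective, so escaping x*_1's basin is worthwhile.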

Application of Adaptive Gradient Projection Algorithm in Compressed Spectrum Sensing
Compressed Spectrum Sensing (CSS) is considered in this study because it performs the same task as signal sparse representation. In [15, 16], the model of CSS is formulated as follows:

H_0: y = Φn, PU absent; H_1: y = Φ(s + n), PU present, (30)

where y is a measurement, s is a Primary User (PU) signal, n is Gaussian noise, s + n is the Secondary User (SU) received signal, and Φ ∈ R^{m×n} is a Gaussian random matrix. Assume that s and n can be represented on the discrete cosine basis Ψ (i.e., s = Ψθ_s and n = Ψθ_n), where θ_s and θ_n are spectrum coefficients. Let Θ = ΦΨ; then the model in (30) can be reformulated in terms of the spectrum coefficients. CSS intends to reconstruct θ*_x from y = Θθ_x, so the reconstruction error ρ = ‖Φx* - Φx‖_2 / ‖Φx‖_2 is used to evaluate the reconstruction performance.
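The measurement model can be sketched as follows, with an explicitly constructed orthonormal DCT-II basis as a stand-in for Ψ and a Gaussian Φ; the dimensions m = 20, n = 64, and K = 4 match the experiment described next, while the coefficient values and noise level are illustrative assumptions.

```python
import numpy as np

m, n, K = 20, 64, 4
rng = np.random.default_rng(2)

# orthonormal DCT-II basis: column k samples cos(pi*(2t+1)*k/(2n))
t = np.arange(n)
Psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t[:, None] + 1) * t[None, :] / (2 * n))
Psi[:, 0] /= np.sqrt(2.0)                         # k = 0 column normalization

Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # Gaussian measurement matrix

theta_s = np.zeros(n)
theta_s[[5, 12, 30, 47]] = [1.2, -0.8, 0.6, 0.9]  # K = 4 spectrum coefficients
s = Psi @ theta_s                                 # PU signal on the DCT basis
noise = 0.05 * rng.standard_normal(n)             # Gaussian noise n
y = Phi @ (s + noise)                             # compressed SU measurement
Theta = Phi @ Psi                                 # so that y = Theta @ theta_x

def recon_error(theta_hat):
    """rho = ||Phi x* - Phi x||_2 / ||Phi x||_2 with x = s + noise here."""
    x = s + noise
    return np.linalg.norm(Phi @ (Psi @ theta_hat) - Phi @ x) / np.linalg.norm(Phi @ x)
```

Any sparse solver (AST, AGP, IRL1, ITM) can then be applied to y = Θθ_x, and its output scored with `recon_error`.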
Corresponding to s and s + n in Figure 3(a), Figure 3(b) shows that θ_s has K = 4 nonzero entries, while θ_x is not sparse, where Θ ∈ R^{20×64}. Next, we consider the detection performance using the SU signal reconstructed from the denoised coefficients. Let P_f be the false alarm probability and γ be a judgment threshold. A binary hypothesis test then determines whether the PU is present, where E(θ_d) = ‖x_d‖_2^2 denotes the energy of the detection signal x_d = Ψθ_d. N_mc Monte Carlo experiments are performed to estimate the detection probability P_d = N_d / N_mc, where N_d is the number of experiments in which the PU signal is accurately detected. Given P_f = 0.05 and N_mc = 100, Figure 5 shows that the Energy Detection (ED) method [17] exhibits high error probabilities in low signal-to-noise ratio (SNR) environments. However, the detection probabilities of AST, AGP, IRL1, and ITM are greatly enhanced when the reconstructed SU received signals are obtained from the denoised spectrum coefficients. For example, at an SNR of -5 dB, the detection probabilities using AST, AGP, IRL1, and ITM improve by 75.47%, 79.25%, 58.49%, and 47.17%, respectively, compared with ED. Furthermore, as the SNR changes from -15 dB to -1 dB, AGP shows better detection performance than AST, IRL1, and ITM because of its improved reconstruction performance. Note that the sparsity of the spectrum coefficient vector affects the reconstruction performance of ℓ_p-norm minimization. Given an SNR of -7 dB, Figure 7 shows that the detection probabilities of AST, AGP, IRL1, and ITM fall as K increases from 2 to 10. This is because the measurement vector y ∈ R^20 in (32) cannot capture the whole information of a SU received signal when K is larger than 6. Therefore, the unsatisfactory reconstruction results of ℓ_p-norm minimization greatly affect the detection performance of CSS via AST, IRL1, and ITM, while the performance degradation of CSS via AGP is slower than that of the other three reconstruction algorithms. Meanwhile, AST, AGP, IRL1, and ITM cost more computing time to find the sparse solution, as shown in Figure 8. These results reveal that the detection performance of CSS needs to be improved when the number of measurements is insufficient. ED determines the state of the PU by measuring the energy of the SU received signal, so the sparsity has little impact on its detection result and computing time.
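The energy detector and the Monte Carlo estimate P_d = N_d/N_mc can be sketched as follows; the threshold is calibrated empirically from the false-alarm probability under the noise-only hypothesis, and the PU waveform, the SNR definition, and the seed are illustrative assumptions.

```python
import numpy as np

def detection_probability(snr_db, n=64, n_mc=100, pfa=0.05, seed=3):
    """Monte Carlo estimate of P_d = N_d / N_mc for an energy detector:
    decide PU present when ||x_d||_2^2 exceeds the judgment threshold."""
    rng = np.random.default_rng(seed)
    # calibrate the threshold gamma from the false-alarm probability P_f
    # using simulated noise-only (H0) energies
    h0_energy = np.sum(rng.standard_normal((2000, n)) ** 2, axis=1)
    gamma = np.quantile(h0_energy, 1.0 - pfa)
    # toy PU waveform whose average power is approximately 10^(snr/10)
    amp = np.sqrt(2.0 * 10.0 ** (snr_db / 10.0))
    s = amp * np.sin(2.0 * np.pi * 0.1 * np.arange(n))
    hits = 0
    for _ in range(n_mc):
        x = s + rng.standard_normal(n)        # SU received signal under H1
        hits += np.sum(x ** 2) > gamma        # energy exceeds the threshold?
    return hits / n_mc
```

Sweeping `snr_db` reproduces the qualitative shape of the P_d-versus-SNR curves: near-perfect detection at high SNR and a probability approaching P_f at very low SNR.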

Conclusions
Signal sparse representation has become a fundamental tool embedded in various application systems. One of its fundamental problems is finding a sparse coefficient vector. In this paper, we develop a novel AGP algorithm to solve the ℓ_p-norm minimization problem. Theoretical analysis demonstrates that AGP can find a sparser solution than AST, because it prevents the iteration solutions from concentrating in the attraction basin of a suboptimal sparse solution. Applied to compressed spectrum sensing, AGP obtains better detection performance than ED, AST, IRL1, and ITM at the cost of a little more computing time. Future research will extend AGP to more scenarios.

Figure 2: Iteration processes of AST and AGP. (a) Sequence solved by AST converges to x*_1. (b) Sequence solved by AGP converges to x*_3. (c) Iteration process of AST. (d) Iteration process of AGP.


Figure 3: Reconstruction of the SU received signal using AST and AGP, respectively. (a) PU signal s and SU received signal s + n. (b) Coefficients θ_s and θ_x. (c) Reconstructed SU received signals solved by AST and AGP. (d) Reconstructed coefficients solved by AST and AGP.

Figure 4: SU received signals reconstructed with the denoised spectrum coefficients solved by AST and AGP, respectively. (a) Reconstructed SU received signals based on AST and AGP. (b) Denoised spectrum coefficients solved by AST and AGP.
After 21 iterations, the coefficient θ*_AST solved by AST in Figure 3(d) is unsatisfactory, while θ*_AGP solved by AGP converges to a satisfactory solution after 17 iterations. The corresponding computing times of AST and AGP are 0.0154 s and 0.0227 s, respectively. The reconstructed signals Φx*_AST and Φx*_AGP are shown in Figure 3(c), and the reconstruction errors are 27.22% and 9.99%, respectively. At the cost of a little computing time, the reconstruction performance of AGP is improved compared with AST in a noisy environment, so AGP exhibits better noise suppression than AST. Note that this noise-suppressing characteristic can greatly improve the detection performance of CSS, especially when the number of nonzero entries K is not given in advance. We can reconstruct the SU received signal by keeping the larger nonzero entries of θ*_x with a threshold. Setting the threshold to 10, Figure 4(b) shows that the denoised spectrum coefficients are sparser than the coefficients in Figure 3(d). The corresponding reconstructed SU received signals are shown in Figure 4(a), and the reconstruction errors reduce to 21.71% and 8.50%. Meanwhile, the variance σ_w^2 of the white noise w reduces to σ_ŵ^2.

Figure 5: Comparison of the detection performance using ED, ITM, AST, AGP, and IRL1 for different SNRs.

Figure 6 shows the corresponding computing times of the five reconstruction algorithms, in which the time consumption of AGP increases by at most 43.66% and 23.39% compared with ITM and AST, respectively. By spending a little more computing time, AGP attains the best spectrum sensing result, especially in low SNR environments.

Figure 7: Comparison of the detection performance using ED, ITM, AST, AGP, and IRL1 for different sparsity of the spectrum coefficient vector.

Table 1: List of the partial iteration solutions solved by AST.

Table 2: List of the partial iteration solutions solved by AGP.