Efficient LED-SAC Sparse Estimator Using Fast Sequential Adaptive Coordinate-Wise Optimization (LED-2SAC)

Solving an underdetermined system of linear equations is of great interest in signal processing applications, particularly when the underlying signal to be estimated is sparse. Recently, a new sparsity-encouraging penalty function, the Linearized Exponentially Decaying (LED) penalty, was introduced; it yields the sparsest solution of an underdetermined system of equations subject to minimization of the least squares loss function. A sequential solution is available for the LED-based objective function, denoted the LED-SAC algorithm. This solution, which solves the LED-based objective function sequentially, ignores the sparsity of the solution in its solution path. In this paper, we present a new sparse solution. The new method benefits from the sparsity of the signal both in the optimization criterion (LED) and in its solution path, denoted Sparse SAC (2SAC). The new reconstruction method, denoted LED-2SAC (LED-Sparse SAC), is consequently more efficient and considerably faster than the LED-SAC algorithm in terms of adaptability and convergence rate. In addition, the computational complexity of both LED-SAC and LED-2SAC is shown to be of order $O(d^2)$, which is better than that of batch solutions such as LARS. The LARS algorithm has complexity of order $O(d^3 + nd^2)$, where $d$ is the dimension of the sparse signal and $n$ is the number of observations.


Introduction
Compressed sensing (CS) signal processing has gained considerable popularity because it requires a lower sampling rate for reconstruction of the original signal than the Nyquist lower bound on the sampling rate. Consequently, the signal is sampled in a compressed manner by the sampler, and there is no need for an additional compression procedure (as is common in conventional signal processing).
The main challenge in CS signal processing is how to recover the original signal from the few samples delivered by the sampler. Suppose the sparse signal $\theta^* \in \mathbb{R}^d$, $\|\theta^*\|_0 = s \ll d$, is to be estimated via the following linear regression model:
$$y_i = x_i^T \theta^* + v_i, \tag{1}$$
where $y_i$, $x_i \in \mathbb{R}^d$, and $v_i$ are the observation, regressor, and observation noise at time index $i$, respectively. The noise $v_i$ is assumed to be additive white Gaussian noise with mean 0 and variance $\sigma^2$. In matrix form, the linear regression model (1) becomes
$$y = X\theta^* + v, \tag{2}$$
where $y = [y_1, \ldots, y_n]^T$, $X = [x_1, \ldots, x_n]^T$, and $v = [v_1, \ldots, v_n]^T$ are the observation vector, regression matrix (also called the measurement matrix in CS), and noise vector, respectively.
Variable selection and high prediction accuracy are two major issues in sparse signal estimation. Variable selection is necessary for sparsity-aware signal estimation. The common approach is to add a penalty to the overall objective function that guarantees sparsity in the estimated signal. To ensure high prediction accuracy, a suitable loss function must be introduced. The most common choice is the $\ell_2$ loss function, owing to its convexity, which results in the well-known least squares solution. A general form of the objective function to be minimized in order to recover the sparse solution, which is also adopted throughout this paper, is as follows:
$$\hat{\theta} = \arg\min_{\theta} \left\{ \|y - X\theta\|_2^2 + \lambda P(\theta) \right\}, \tag{3}$$
where $P(\cdot)$ is the penalty function and $\lambda$ is a tuning parameter that balances prediction accuracy and sparsity. Depending on the choice of the loss and sparsity-encouraging functions and on the techniques used to solve the optimization problem, extensive work has been presented in the literature. Some sophisticated and relatively high-precision approaches are basis pursuit [1] and greedy [2,3] reconstruction algorithms, which perform batch-based estimation. Another class of batch-based sparse estimation algorithms uses the $\ell_p$ norm ($p < 1$) as the sparsity-encouraging factor [4][5][6][7][8]. As reported there, using the $\ell_p$ norm in the objective function effectively reduces the number of required measurements compared to $\ell_1$ norm reconstruction. However, there is no analytic guarantee.
Adaptive signal reconstruction is of interest in applications where the sparse signal of interest undergoes variations in its support as well as in the magnitude of its nonzero entries. Furthermore, in most signal acquisition devices, the observations are obtained sequentially. Thus, variations in the support of the unknown signal can be sensed by processing the observations sequentially rather than in batch.
In [9], observations are received in sequence and without any prior assumptions on the signal sparsity; the reconstruction error is computed between observations in order to decide whether enough samples have been obtained. Variational adaptive filters are extensively used for sparse signal reconstruction, as in [10][11][12][13][14][15][16][17][18]. The resulting estimators are sequential but too slow. Furthermore, because they lack a direct variable selection stage, such as a thresholding rule, exact reconstruction of the zero parameters is impossible.
Recently, the family of Least Absolute Shrinkage and Selection Operator (Lasso) objective functions has gained considerable popularity in the sparse signal reconstruction context [19]. Lasso in its standard form uses the $\ell_1$ penalized least squares error criterion, which continuously shrinks the parameters toward zero. Least Angle Regression (LARS) is the best-known batch-based algorithm for solving the Lasso problem [20] (a similar solution was proposed earlier by Osborne et al. [21]).
It has been shown that the standard Lasso objective function leads to a biased estimator, owing to the pure soft thresholding stage used in the estimation procedure [22][23][24][25]. SCAD (Smoothly Clipped Absolute Deviation) and adaptive Lasso are two alternatives, presented in [22,24], respectively, that make the standard Lasso an unbiased estimator. We presented the LED objective function in [26] as another alternative to the Lasso objective function and demonstrated that the resulting estimator outperforms the SCAD and adaptive Lasso estimators.
Based on the SCAD and adaptive Lasso objective functions, the TNWL and AdaLasso algorithms are presented in [27,28,32], respectively; these are adaptive and sequential implementations of those objective functions. We have also developed an adaptive and sequential solution for the LED objective function in [29], namely LED-SAC, by solving the objective function in a coordinate-wise manner. It is shown there that the algorithm satisfies the oracle properties of asymptotic normality and consistency in variable selection and reaches better tracking performance than TNWL and AdaLasso.
Although LED-SAC is a sparse reconstruction algorithm, it does not take advantage of the sparsity of the signal to be estimated in its solution path. In this paper, we first study the complexity of the LED-SAC algorithm. Then, a solution path is presented for the LED-SAC algorithm that exploits the sparsity of the signal to be estimated. More specifically, the most effective coordinates of the sparse signal are detected, and the update procedure is carried out only for those coordinates. Consequently, the resulting estimator, LED-2SAC, is more efficient and converges faster than the original one, and it is shown to have the same order of complexity as LED-SAC.

Summary of LED-SAC Reconstruction Algorithm
The exponentially decaying sparsity-encouraging penalty function presented in [26], referred to here as (4), has two parameters that define its shape: $\lambda_n$ and a second parameter. With the penalty function in (4), the penalization rate falls from a constant value to 0 (unlike the standard Lasso), and the transition is smooth and controlled by the second parameter, in contrast to the SCAD penalty, whose decay is rough and linear. The overall objective function, obtained by adding the penalty function in (4) to the least squares loss function, is nonconvex and cannot be solved with well-established convex optimization tools. Moreover, solving this nonconvex optimization problem may lead to a local minimum that is not the sparsest solution. In [26], we obtained a convex approximation of this objective function by locally linearizing it, using a Taylor series expansion around a consistent estimate of $\theta^*$. The resulting objective function, called the LED objective function and referred to here as (5), depends on a consistent estimator $\tilde{\theta}$ of $\theta^*$, such as the ordinary least squares solution. (In [29] we demonstrated the effect of decreasing the number of observations, $n$, when $\tilde{\theta}$ is the ordinary least squares solution. The results show that decreasing the number of observations degrades the performance of the sparse reconstruction algorithm. In this case, another suitable estimator, such as the ridge regression estimator, can be used, which is experimentally shown to perform better.) The properties of asymptotic normality and consistency hold for the LED objective function as long as the two constraints $\lim_{n \to \infty} \lambda_n = 0$ and $\lim_{n \to \infty} \sqrt{n}\,\lambda_n = \infty$ are met and the observation noise has finite variance [26].
Solving the approximated objective function in (5) directly requires a multivariate optimization, which is computationally expensive, especially for sparse signals of higher dimension. In [29], a sequential and adaptive solution path, called LED-SAC, is developed to solve the LED objective function in (5). The basic idea rests on the fact that the objective function is convex and separable in each variable of the sparse signal $\theta$. Therefore, the optimization problem is well suited to the coordinate-wise optimization method. The LED-SAC algorithm is capable of tracking time variations in the support of the underlying sparse signal. Its distinguishing feature is that it uses a novel sparsity-encouraging penalty and solves the overall objective function sequentially. As reported in [29], the LED-SAC estimator outperforms the TNWL and AdaLasso estimators in terms of mean squared error as well as tracking capability.
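The coordinate-wise idea can be illustrated with a minimal batch sketch. It assumes a weighted $\ell_1$ penalized least squares objective, $\frac{1}{2}\|y - X\theta\|_2^2 + \sum_j w_j |\theta_j|$, as a stand-in for the LED objective (5); the function names and the weights `weights` are hypothetical, and this is not the sequential LED-SAC recursion of [29].

```python
import numpy as np

def soft_threshold(a, b):
    """Return sign(a) * max(|a| - b, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def cyclic_coordinate_descent(X, y, weights, n_sweeps=100):
    """Minimize 0.5*||y - X theta||^2 + sum_j weights[j]*|theta_j|
    by cycling through all coordinates (assumed stand-in objective)."""
    n, d = X.shape
    theta = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)          # squared norm of each column
    residual = y.astype(float).copy()        # equals y - X @ theta for theta = 0
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = X[:, j] @ residual + col_sq[j] * theta[j]
            new_value = soft_threshold(rho, weights[j]) / col_sq[j]
            # Keep the residual consistent with the updated coordinate.
            residual += X[:, j] * (theta[j] - new_value)
            theta[j] = new_value
    return theta
```

Each one-dimensional subproblem has a closed-form solution, which is what makes the separable structure attractive; note that every sweep visits all coordinates, including those that remain zero.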
In what follows, we explore the possibility of improving the convergence rate of the solution path of the LED-SAC algorithm. To this end, a solution path is given for the objective function in (5) that is itself sparse, in the sense that it keeps the sparsity of the signal to be estimated in mind while pursuing the solution. The optimization method is therefore itself sparse, and it also leads to the sparsest solution. Consequently, the tracking capability of the algorithm is increased, which is vital for online implementation, particularly in time-varying sparse signal scenarios.

The Proposed LED-Sparse SAC (LED-2SAC) Reconstruction Algorithm
Although the LED-SAC estimator restricts the solution of the underdetermined system of equations in (5) to the sparsest solution, it ignores the sparsity of the signal to be estimated in the solution path. This is because, at each iteration, it updates all of the coordinates of the sparse signal in a cyclic manner, regardless of which of them belong to the support of the true signal. This significantly increases the number of observations needed to reach the desired precision in the reconstructed signal. As the signal of interest is sparse, most of its coordinates are zero and need not be updated. Furthermore, in a time-varying sparse signal scenario, detecting and updating the coordinate whose value varies the most significantly increases the tracking ability of the estimator. Therefore, despite its advantages, updating the whole coordinate set in a cyclic manner while most coordinates are zero remains the main shortcoming of the LED-SAC estimator. Processing the coordinates blindly in this way makes the reconstruction algorithm inefficient and rather slow to converge.
In order to consider the sparsity of the signal in the solution path and implement LED-SAC more efficiently, one needs a procedure that detects the most effective coordinate or coordinates at each iteration and performs the update procedure for those coordinates. The objective function in (5) can be decomposed into two parts as follows:
$$J(\theta) = J(\theta_j) + \text{constant}, \tag{6}$$
where $J(\theta_j)$ is the portion of $J(\theta)$ that depends only on $\theta_j$, and the term "constant" is a function of all entries of the parameter vector $\theta$ except $\theta_j$, so it can be considered constant with respect to $\theta_j$. Let us define $\Delta J(\theta_j)$ as the difference between $J(\theta_j^{(n)})$ and $J(\theta_j^{(n-1)})$, such that
$$\Delta J(\theta_j) = J(\theta_j^{(n)}) - J(\theta_j^{(n-1)}), \tag{7}$$
where $\theta_j^{(n)}$ is the $j$th entry of the parameter vector $\theta$ at time $n$. One can select the most effective coordinate as the coordinate that produces the largest change $\Delta J(\theta_j)$, as in [30]:
$$j^* = \arg\max_{j} \left| \Delta J(\theta_j) \right|. \tag{8}$$
Selecting the most effective coordinate as in (8) leads to the best tracking performance for the adaptive estimator; however, it is computationally expensive to some extent. This is because, before obtaining $j^*$ using (8), one has to obtain the estimated values of all elements of the sparse vector $\theta$ at the current time step, that is $\theta^{(n)}$, which is not a suitable strategy for online implementation of the reconstruction algorithm.
An alternative that is much more straightforward, and has a lower computational burden, is to use the directional derivatives of the objective function $J(\theta^{(n-1)})$ to detect the most effective coordinate at each iteration. This approach was initially introduced in [31] for Lasso penalized regressions. To proceed in this way, we define the forward and backward derivatives of $J(\theta^{(n-1)})$ with respect to $\theta_j^{(n-1)}$ as follows:
$$d_j^{+} J(\theta) = \lim_{\sigma \downarrow 0} \frac{J(\theta + \sigma e_j) - J(\theta)}{\sigma}, \qquad d_j^{-} J(\theta) = \lim_{\sigma \downarrow 0} \frac{J(\theta - \sigma e_j) - J(\theta)}{\sigma},$$
where $e_j$ is the coordinate direction along which $\theta_j$ varies. Considering the directional derivatives of $J(\theta^{(n-1)})$ with respect to $\theta_j^{(n-1)}$ in (10), the most effective coordinate is the one with the most negative value of either the forward or the backward derivative:
$$j^* = \arg\min_{j} \min\left\{ d_j^{+} J(\theta^{(n-1)}),\; d_j^{-} J(\theta^{(n-1)}) \right\}. \tag{11}$$
It is noteworthy that one can obtain the $q$ most effective coordinates by finding the $q$ coordinates that have the most negative values among the $d$ coordinates. This is of interest in situations in which the signal sparsity, $s$, is known or an approximate value for it exists. Furthermore, in the case of unknown sparsity, increasing the value of $q$ increases the convergence rate of the estimator. However, this comes at the cost of an increased computational burden.
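The selection rule based on directional derivatives can be sketched as follows. This is an illustration under an assumed objective, $\frac{1}{2}\|y - X\theta\|_2^2 + \sum_j w_j |\theta_j|$, for which the forward and backward derivatives have the closed forms used in [31] for Lasso penalized regression; the weights `weights`, the function names, and the batch form are assumptions, not the recursions of the LED-2SAC algorithm.

```python
import numpy as np

def directional_derivatives(X, y, theta, weights):
    """Forward and backward directional derivatives along each coordinate
    for 0.5*||y - X theta||^2 + sum_j weights[j]*|theta_j| (assumed objective)."""
    grad = -X.T @ (y - X @ theta)            # gradient of the smooth part
    # Moving along +e_j increases |theta_j| when theta_j >= 0.
    forward = grad + np.where(theta >= 0, weights, -weights)
    # Moving along -e_j increases |theta_j| when theta_j <= 0.
    backward = -grad + np.where(theta <= 0, weights, -weights)
    return forward, backward

def most_effective_coordinates(X, y, theta, weights, q=1):
    """Indices of up to q coordinates with the most negative directional
    derivative; coordinates with no descent direction are dropped."""
    forward, backward = directional_derivatives(X, y, theta, weights)
    score = np.minimum(forward, backward)
    order = np.argsort(score)[:q]
    return order[score[order] < 0]
```

A coordinate whose forward and backward derivatives are both nonnegative cannot decrease the objective, so it is excluded even when fewer than `q` candidates remain.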
In the next section, we will give the complexity analysis for both LED-SAC and LED-2SAC algorithms and show that they both have the same order of complexity.

Complexity Analysis of LED-SAC and LED-2SAC Algorithms
The solution flow of the LED-SAC algorithm for one element of the sparse signal, that is $\hat{\theta}_j^{(n)}$, at time index $n$, is shown in Figure 1, where the shrinkage and update procedures are given in (12), (13), and (14), respectively. The shrink operator in (12) is defined as $\mathrm{shrink}(a, b) = (|a| - b)_{+}\,\mathrm{sgn}(a)$, where $(x)_{+} = x$ if $x > 0$ and 0 otherwise. According to Figure 1, the input parameters are the observation and the corresponding regressor at time index $n$. To obtain $\hat{\theta}_j^{(n)}$, the estimate of the $j$th coordinate at time index $n$, using the shrinkage procedure in (12), one needs the $j$th simple least squares coefficient at time $n$ as well as $C(n)$ at the current time index. At each iteration, updating $C(n)$, given $C(n-1)$, needs $O(d^2)$ algebraic operations. Given $C(n)$, obtaining the $j$th simple least squares coefficient via (7) needs $O(d)$ algebraic operations. Because the recursive least squares estimate is used as the consistent estimate $\tilde{\theta}$ of the sparse signal in (14), and the recursive least squares algorithm has complexity of order $O(d^2)$, the overall complexity of the LED-SAC algorithm at each time step is $O(d^2)$, whereas the computational complexity of a well-known batch-based algorithm such as LARS, which performs multivariate optimization, is of order $O(d^3 + nd^2)$. Thus, the LED-SAC algorithm is cheaper in terms of computational burden, especially for signals of higher dimensionality, while it is capable of tracking variations in the sparse signal, contrary to the LARS algorithm.
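The two building blocks named above can be sketched in a few lines. The `shrink` function follows the definition given in the text; `rls_update` is the textbook recursive least squares step, shown only to make the $O(d^2)$ per-step cost visible. Its name, signature, and `forgetting` parameter are assumptions and are not taken from (13) or (14).

```python
import numpy as np

def shrink(a, b):
    """shrink(a, b) = (|a| - b)_+ * sgn(a)."""
    return max(abs(a) - b, 0.0) * np.sign(a)

def rls_update(theta, P, x, y, forgetting=1.0):
    """One textbook recursive least squares step (assumed form).

    theta: current estimate (d,), P: inverse correlation matrix (d, d),
    x: regressor (d,), y: scalar observation.
    """
    Px = P @ x                                    # matrix-vector product, O(d^2)
    gain = Px / (forgetting + x @ Px)             # O(d)
    theta_new = theta + gain * (y - x @ theta)    # O(d)
    P_new = (P - np.outer(gain, Px)) / forgetting # rank-one update, O(d^2)
    return theta_new, P_new
```

No matrix inversion appears in the step, so the cost per observation is dominated by the matrix-vector product and the rank-one update, both of order $d^2$.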

Computational Complexity of the LED-2SAC Algorithm.
Taking into account the sparse solution path given in Section 3 and the recursive relations in (13) and (14), the pseudocode for an efficient and fast implementation of the LED-SAC algorithm using the directional derivatives is given as the LED-2SAC algorithm in Algorithm 1. The implementation is carried out for the $q$ most effective coordinates of the underlying sparse signal.
According to Algorithm 1, the computational complexity of the LED-2SAC algorithm is the same as that of LED-SAC, except for the stage in which the indices of the most effective coordinates are estimated. For the case of $q = 1$, one needs to compute the forward and backward derivatives of $J(\theta^{(n-1)})$ via (10), given the estimate of the sparse signal from the previous time step, that is $\theta^{(n-1)}$, and the simple least squares coefficients for $j = 1, \ldots, d$. Computing the directional derivatives $d_j^{+} J$ and $d_j^{-} J$ for all coordinates and searching for the most negative value require additional operations, which do not affect the complexity order of the original algorithm, LED-SAC, which is $O(d^2)$. Therefore, the presented algorithm, LED-2SAC, reaches a higher convergence rate and tracking capability than the LED-SAC algorithm with only a small increase in computational burden.

Simulation Results
In what follows, the simulation results for the presented LED-2SAC reconstruction algorithm are given and compared to the former version, that is LED-SAC in [29], as well as TNWL reconstruction algorithm in [32] and AdaLasso algorithm presented in [24,28].
As in [26,29], the data set is generated according to the model in (1) with $d = 30$ and $v_i \sim \mathcal{N}(0, 1)$. The regressors $x_i$ are also assumed to be samples from a Gaussian density of the form $x_i \sim \mathcal{N}(0_d, I_d)$. The parameter vector $\theta^*$ comprises $s$ randomly allocated $\pm 1$ entries, while all the remaining entries are set to 0. The tuning parameters, including those that define $\lambda_n$ (whose expression involves the factor $\sqrt{2\log(\cdot)}$), are set to the same values as in [26], where they were obtained by cross validation, namely (2.5, 2.7, 0.25).
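For readers who wish to reproduce a comparable setup, the data model described above can be sketched as follows. The function name, the seed handling, and the default arguments are assumptions made for illustration; only the dimension, the $\pm 1$ support entries, and the Gaussian regressors and noise come from the text.

```python
import numpy as np

def generate_sparse_data(n, d=30, s=3, noise_var=1.0, seed=0):
    """Draw n observations y_i = x_i^T theta + v_i with an s-sparse theta."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)    # random support
    theta[support] = rng.choice([-1.0, 1.0], size=s)  # entries equal to +1 or -1
    X = rng.standard_normal((n, d))                   # rows x_i ~ N(0, I_d)
    v = np.sqrt(noise_var) * rng.standard_normal(n)   # white Gaussian noise
    y = X @ theta + v
    return X, y, theta
```

Calling `generate_sparse_data(200, s=10, noise_var=0.5)` would give a less sparse signal with lower noise power, similar in spirit to the settings varied in the experiments below.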
In Figure 2, the learning curves of the algorithms are given in terms of MSE plots. The MSE plots are obtained over 50 repetitions of the experiment. The sparsity of the underlying signal to be estimated is set to 3, and the observation noise variance is set to 0.5. Comparing the convergence rate and the steady-state error of the presented algorithm with those of the others reveals the superiority of the LED-2SAC algorithm. As can be seen, the LED-2SAC algorithm significantly improves the MSE performance, even compared with its former version, LED-SAC. This is due to the efficient implementation of the presented algorithm, in which the most effective coordinate is detected and updated first at each iteration.
In the next experiment, the sparsity of the signal is set to 10, and the results are shown in Figure 3. As can be seen, the LED-2SAC algorithm is less affected by the decrease in the sparsity of the signal, which is an interesting property for a sparsity-aware reconstruction algorithm, since most existing methods fail to maintain their performance at lower sparsity levels.
In order to compare the performance of the algorithms in terms of variable selection capability, the percentage of exact model selection is given in Figure 4 for two different observation noise powers versus the number of iterations (or the number of observations in online implementation). As can be seen, the LED-2SAC algorithm reaches the best model selection rates in the different situations. The TNWL and AdaLasso algorithms have almost the same performance in terms of variable selection, as reported in [29], and both are outperformed by the LED-2SAC and LED-SAC algorithms. The plots of the percentage of exact model selection are also given in Figure 5 for two different sparsity levels. Again, the presented LED-2SAC retains the superior performance. As expected, the performance of the algorithms improves as the sparsity of the signal increases, and vice versa.
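One plausible way to compute such a metric is sketched below, assuming that "exact model selection" means the estimated support coincides with the true support. That reading, the function names, and the tolerance `tol` are assumptions; the text does not spell out the definition.

```python
import numpy as np

def exact_model_selection(theta_hat, theta_true, tol=0.0):
    """True if the estimated support equals the true support (assumed definition)."""
    est_support = np.abs(np.asarray(theta_hat)) > tol
    true_support = np.abs(np.asarray(theta_true)) > 0
    return bool(np.array_equal(est_support, true_support))

def selection_percentage(estimates, theta_true, tol=0.0):
    """Percentage of repetitions in which the support is recovered exactly."""
    hits = [exact_model_selection(t, theta_true, tol) for t in estimates]
    return 100.0 * float(np.mean(hits))
```

With `tol=0.0`, an estimator must set the inactive coordinates to exactly zero to score a hit, which is why a thresholding stage matters for this metric.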
The presented LED-2SAC algorithm is a sequential and adaptive reconstruction algorithm, like LED-SAC, TNWL, and AdaLasso. We therefore have to demonstrate its ability to track variations in the support of the sparse signal. In the following, the adaptation capabilities of the algorithms are compared. For this purpose, at an arbitrary iteration (after all of the algorithms have settled into their steady-state performance), the support of the sparse signal is changed in the following fashion: one of the active coordinates is set to zero (inactive) and one of the inactive coordinates is set to 1 (active). The MSE curves are given in Figure 6 for sparsity $s = 3$.
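The support change used in this tracking experiment can be sketched as follows. The function name and the random choice of which coordinates are swapped are assumptions; the text only states that one active coordinate becomes inactive and one inactive coordinate is set to 1.

```python
import numpy as np

def swap_support(theta, seed=0):
    """Deactivate one active coordinate and activate one inactive coordinate."""
    rng = np.random.default_rng(seed)
    theta_new = theta.copy()
    active = np.flatnonzero(theta)         # indices of nonzero entries
    inactive = np.flatnonzero(theta == 0)  # indices of zero entries
    theta_new[rng.choice(active)] = 0.0    # active -> inactive
    theta_new[rng.choice(inactive)] = 1.0  # inactive -> active
    return theta_new
```

The sparsity level is unchanged by the swap, so any change in MSE after the switch reflects tracking behaviour rather than a change in problem difficulty.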
As shown in Figure 6, the presented LED-2SAC algorithm significantly improves the tracking capability of the former version, LED-SAC, and both algorithms outperform the TNWL and AdaLasso algorithms. Figure 7 shows the same results for the case of $s = 10$. As expected from the results of Figures 2 and 3, the LED-2SAC algorithm maintains its performance as the sparsity of the underlying signal is decreased. Finally, the estimation trajectories for an active coordinate becoming inactive and an inactive coordinate becoming active are shown in Figures 8 and 9, respectively. The LED-2SAC algorithm outperforms all of the competitors, as expected from the results of Figures 6 and 7. The interesting point about Figure 8 is that the LED-2SAC algorithm detects and updates the inactive element almost abruptly. Although this is not the case for every coordinate, it happens more often than not.

Conclusion
In this paper, an efficient solution is proposed to sequentially solve the LED-based objective function; unlike the existing SAC solution, it takes into account the sparsity of the signal to be estimated. Consequently, the proposed algorithm, denoted LED-2SAC, leads to a significant improvement in the convergence rate and tracking capability of the original LED-SAC algorithm. Moreover, the complexity analysis of the LED-SAC and LED-2SAC algorithms shows that both methods have the same order of complexity, with improved convergence and adaptability behavior in LED-2SAC. Finally, simulation results are given for the proposed algorithm.
Comparing the performance of the presented algorithm with the original one, as well as with two other existing methods, confirms the superiority of LED-2SAC in terms of convergence rate and adaptation capability.

Figure 1: Block diagram of the estimation of one element of the underlying sparse signal at the current time index, $\hat{\theta}_j^{(n)}$, for the LED-SAC algorithm.

Figure 5: Percentage of exact model selection (different sparsity levels are denoted in the brackets).

Figure 7: MSE when one of the active elements is set to zero and one of the inactive elements is set to one, $d = 30$, $s = 10$, and $\sigma^2 = 0.5$.

Figure 9: Estimation trajectory of the estimate of an inactive parameter becoming active.