On a Gradient-Based Algorithm for Sparse Signal Reconstruction in the Signal / Measurements Domain

Sparse signals can be recovered from a reduced set of samples by using compressive sensing algorithms. In common compressive sensing methods the signal is recovered in the sparsity domain. A method for the reconstruction of sparse signals that reconstructs the missing/unavailable samples/measurements has recently been proposed. This method can be efficiently used in signal processing applications where a complete set of signal samples is required. The missing samples are considered as the minimization variables, while the available samples are fixed. The reconstruction of the unavailable signal samples/measurements is performed using a gradient-based algorithm in the time domain, with an adaptive step. The performance of this algorithm with respect to the step size and the convergence is analyzed, and a criterion for the step-size adaptation is proposed in this paper. The step adaptation is based on the gradient direction angles. Illustrative examples and a statistical study are presented. The computational efficiency of this algorithm is compared with that of two other commonly used gradient algorithms that reconstruct the signal in the sparsity domain. The uniqueness of the recovered signal is checked using a recently introduced theorem. The algorithm application to the reconstruction of highly corrupted images is presented as well.


Introduction
A signal is sparse in a transformation domain if the number of nonzero coefficients is much lower than the number of signal samples. For linear signal transforms the signal samples can be considered as linear combinations (measurements) of the signal transform coefficients. Sparse signals can be fully recovered from a reduced set of samples/measurements. Compressive sensing theory deals with sparse signal reconstruction. In common signal processing problems, for signals sparse in a transformation domain, the signal samples can be considered as observations, since they are linear combinations of the transformation domain coefficients. The goal of compressive sensing is to reduce the set of signal measurements while preserving the complete information about the whole signal. In some applications the unavailable signal samples result from physical constraints of the system, including the inability to perform all signal measurements. Signals may also contain arbitrarily positioned samples that are heavily corrupted. In these cases it is better to remove such samples and consider them as unavailable in the signal recovery [6]. Reconstruction algorithms based on the compressive sensing theory may also be used when the unavailable signal samples are not a result of an intentional strategy to compress the data. Note that one of the initial successes of the compressive sensing theory in applications (computed tomography reconstruction) was not related to an intentional compressive sensing strategy but to the physical problem constraints, restricting the set and the positions of the available data.
The topic of this paper is the reconstruction of signals with some arbitrarily positioned unavailable samples. These samples may result from a compressive sensing strategy or may be unavailable for various other reasons. A method for the reconstruction of the unavailable signal samples, considering them as variables, has been proposed in [22]. In contrast to the other common methods, which recover sparse signals in their sparsity domain, this method reconstructs the missing samples/measurements to make the set of samples/measurements complete. The available samples are fixed (invariant) during the reconstruction process. An analysis of the algorithm performance is presented in this paper. A relation between the reconstructed signal bias and the algorithm step-size is derived and statistically confirmed. A new criterion for the algorithm step-size adaptation is proposed. The improved version of this algorithm is compared with two other common gradient-based algorithms, proving its accuracy and computational efficiency. The discrete Fourier transform domain is used as a case study, although the algorithm application is not restricted to this transform [23]. The solution uniqueness is checked after the signal reconstruction by using the recently proposed theorem [24].
The paper is organized as follows. After the definitions in the next section, the adaptive gradient algorithm is presented. In the comments on the algorithm the step-size and the bias are considered and illustrated. A new criterion for the algorithm parameter adaptation is proposed in Section 3 as well. The reconstruction is illustrated on examples and the uniqueness of the obtained solution is analyzed. The statistical performance is checked in Section 4. An illustration of the algorithm application to image reconstruction is given as well.
If the number of nonzero coefficients X(k) ≠ 0 for k ∈ {k_1, k_2, ..., k_K} is such that K ≪ N, then the signal x(n) is sparse in this transformation domain. Next assume that only M < N signal samples are available in the time domain, at the discrete-time positions
n ∈ N_A = {n_1, n_2, ..., n_M}. (1)
Under some conditions, studied within the compressive sensing theory [1-3], full signal reconstruction is possible based on the reduced set of available samples if the signal is sparse in a transformation domain. In the signal theory this problem can be formulated as a full recovery of the unavailable samples as well. In general, a reconstruction algorithm is based on the sparsity measure minimization. Counting the nonzero transform coefficients is the simplest sparsity measure. The counting can be written as ‖X‖_0, referred to as the "ℓ_0-norm" [1, 20]. The signal reconstruction problem statement is then
min ‖X‖_0 subject to y = AX. (2)
Vector y = [x(n_1) x(n_2) ⋯ x(n_M)]^T contains the available signal samples, while the elements of the transform vector X = [X(0) X(1) ⋯ X(N − 1)]^T are the unknown transform coefficients. Matrix A is the measurement matrix. For common signal transforms it corresponds to the inverse transformation matrix with the rows corresponding to the unavailable samples omitted. If x = ΨX, then the matrix A is the submatrix of Ψ containing its rows that correspond to the discrete-time instants (1).
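As a concrete illustration of this measurement model, a minimal sketch (with assumed values for N, the sparsity pattern, and the available positions; none of these come from the paper) builds a DFT-sparse signal and verifies that each available sample is a linear combination y = AX of the transform coefficients:

```python
import cmath

# Assumed illustrative setup: N = 16 samples, K = 2 nonzero DFT coefficients,
# M = 8 available sample positions (all values hypothetical).
N = 16
nonzero = {3: 4.0, 9: 2.0}                 # nonzero X(k); sparsity K = 2

def x(n):
    # inverse DFT: x(n) = (1/N) * sum_k X(k) e^{j 2 pi k n / N}
    return sum(Xk * cmath.exp(2j * cmath.pi * k * n / N)
               for k, Xk in nonzero.items()) / N

available = [0, 1, 4, 6, 7, 10, 13, 15]    # M = 8 available instants
y = [x(n) for n in available]              # available samples (measurements)

def A_row(n):
    # row of the measurement matrix A for instant n: the row of the inverse
    # DFT matrix that is kept because the sample at n is available
    return [cmath.exp(2j * cmath.pi * k * n / N) / N for k in range(N)]

X_full = [nonzero.get(k, 0.0) for k in range(N)]
y_check = [sum(a * Xk for a, Xk in zip(A_row(n), X_full)) for n in available]
# y_check matches y, i.e. y = A X holds for the available samples
```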
Although there are some approaches that reconstruct the signal using the ℓ_0-norm based formulation [25], in principle this is an NP-hard combinatorial optimization problem. This is the reason why the closest convex norm, the ℓ_1-norm of the signal transform, is used instead of the "ℓ_0-norm" in the minimization process:
min ‖X‖_1 subject to y = AX, (3)
where ‖X‖_1 = ∑_{k=0}^{N−1} |X(k)|. Under the conditions defined using the restricted isometry property (RIP) [3, 4], minimization of the ℓ_1-norm can produce the same result as (2) [1, 22, 26]. Common gradient-based algorithms reformulate the problem defined by (3) into a form suitable for the application of standard minimization tools. The minimization is done in the sparsity domain. One such method is briefly reviewed in the Appendix.

Algorithm.
A gradient-based algorithm that minimizes the sparsity measure by varying the missing sample values is presented next. In this gradient-based minimization approach the missing samples are considered as variables [22]. The missing sample values are varied in an iterative way so as to produce lower sparsity measure values and finally to approach the minimum of the convex ℓ_1-norm based sparsity measure (3) with an acceptable accuracy. Note that if the reconstruction conditions for the ℓ_1-norm are met [3], then the position of the ℓ_1-norm minimum will correspond to the true values of the missing samples.
Consider the signal y^(m)(n) in the mth iteration, equal to the available sample values x(n) for n ∈ N_A and to the current values of the minimization variables at the missing positions n ∈ N_Q, and denote its transform by X_m. In order to find the position of the minimum of the function J = ‖X_m‖_1, the relation for the iterative calculation of the missing samples (variables) is defined by using the gradient-based descent on the sparsity measure ‖X_m‖_1 as
y^(m+1)(n_i) = y^(m)(n_i) − μ g^(m)(n_i), n_i ∈ N_Q. (6)
Vector y^(m) contains the minimization variables (missing signal sample values) in the mth iteration. The gradient vector coordinates g(n_i) = ∂‖X_m‖_1/∂y(n_i) can be estimated, in the mth iteration, by using the finite differences of the sparsity measure. They are calculated for each missing sample at the instants n_i ∈ N_Q as
g^(m)(n_i) = (‖X_1‖_1 − ‖X_2‖_1)/(2Δ),
where X_1 and X_2 are the transforms of the signals
y_1(n) = y^(m)(n) + Δδ(n − n_i), y_2(n) = y^(m)(n) − Δδ(n − n_i). (9)
Again, for n_i ∈ N_A the available signal values are not changed, g(n_i) = 0. All gradient vector values in the mth iteration, including the zero and nonzero coordinates, are denoted by g^(m)(n). Through the statistical study in [22] it has been concluded that an appropriate step parameter value in (6) is related to the finite-difference step as μ = 2Δ. (ii) For the DFT, instead of using the signals defined by (9) and their transforms for each n_i ∈ N_Q, it is possible to calculate the measure differences directly, using the precomputed transforms of the unit impulses at the missing-sample positions.
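The iteration described above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the DFT is computed directly, the ℓ_1 measure is scaled by 1/N here (an assumed normalization that keeps the fixed step μ = 2Δ stable in this sketch), and all parameter values are hypothetical.

```python
import cmath

def dft_l1(x):
    # l1-norm of the DFT of x, scaled by 1/N (assumed normalization that
    # keeps the fixed step mu = 2*delta stable in this sketch)
    N = len(x)
    return sum(abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                       for n in range(N)))
               for k in range(N)) / N

def gradient_iterations(y, missing, delta, iters):
    # missing samples are the variables; available samples in y stay fixed
    y = list(y)
    for _ in range(iters):
        grad = {}
        for ni in missing:
            # finite-difference estimate of the measure gradient at ni
            y1 = list(y); y1[ni] += delta
            y2 = list(y); y2[ni] -= delta
            grad[ni] = (dft_l1(y1) - dft_l1(y2)) / (2 * delta)
        for ni in missing:
            y[ni] -= 2 * delta * grad[ni]   # gradient descent, mu = 2*delta
    return y
```

For example, for the signal x(n) = cos(2πn/8) with the samples at positions 1 and 6 set to zero, running gradient_iterations with the step schedule Δ = 1, 0.1, 0.01, 0.001 (e.g., 100 iterations each) drives the two variables close to the true values; the residual error is of the order of the bias for the final Δ.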

Comments on the Algorithm
The transforms of the unit impulses at the missing-sample positions do not depend on the signal or on the iteration number m; they can be calculated only once. A similar simplification can be done for any linear signal transform T[⋅].

Adaptive Step.
The influence of the step-size Δ on the precision of the solution (the minimum position) is analyzed next. Assume that the exact solution for the missing samples has been obtained. The change of the sparsity measure is tested by changing the signal sample x(n_0) for ±Δ. For a signal of sparsity K, with the DFT coefficients X(k_l) = A_l for l = 1, 2, ..., K and zero elsewhere, the signals with the steps used for the gradient estimate calculation are x_Δ^+(n) = x(n) + Δδ(n − n_0) and x_Δ^−(n) = x(n) − Δδ(n − n_0). Their sparsity measures are
‖X_Δ^±‖_1 = ∑_{k=0}^{N−1} |X(k) ± Δe^{−j2πkn_0/N}|.
In the worst case, the amplitudes A_l are in phase with e^{−j2πk_l n_0/N} and, for Δ ≤ |A_l|,
‖X_Δ^±‖_1 = ∑_{l=1}^{K} |A_l| ± KΔ + (N − K)Δ.
It means that g(n_0) = (‖X_Δ^+‖_1 − ‖X_Δ^−‖_1)/(2Δ) = K ≠ 0. This is an important fact, leading to the conclusion that the stationary state of the iterative algorithm is biased. The algorithm moves the solution x(n_0) to a biased value x(n_0) − b in order to produce g(n_0) = 0 in the algorithm stationary point. For the zero value of the gradient we have
‖X_Δ^+‖_1 − ‖X_Δ^−‖_1 = 2KΔ − 2(N − K)b = 0.
In this case the bias value b can be obtained from
b = KΔ/(N − K). (13)
The bias limit is proportional to the step-size Δ. Obviously, the bias can be reduced by using small values of the step Δ.
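The nonzero worst-case gradient at the exact solution can be checked numerically. The sketch below (all values assumed, not from the paper) uses a single complex sinusoid, K = 1, perturbed at n_0 = 0, where the real amplitude is in phase with e^{−j2πk·0/N} = 1:

```python
import cmath

N, A, k0, delta = 16, 2.0, 3, 0.25   # assumed values, with delta <= |A|

def dft(x):
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n in range(n_len)) for k in range(n_len)]

def l1(X):
    return sum(abs(c) for c in X)

# exact sparse signal with X(k0) = A and all other DFT coefficients zero
x = [A * cmath.exp(2j * cmath.pi * k0 * n / N) / N for n in range(N)]
xp = list(x); xp[0] += delta         # x(n) + delta * d(n - 0)
xm = list(x); xm[0] -= delta         # x(n) - delta * d(n - 0)

g = (l1(dft(xp)) - l1(dft(xm))) / (2 * delta)
# g equals K = 1 although the solution is exact: the stationary point is biased
```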
Calculation with a very small step Δ would be inefficient and time consuming, with an extremely large number of iterations. An efficient way to use the gradient-based algorithm is to adapt its step-size. In the initial iteration a step-size of the order of the signal amplitude is used, for example Δ = max_{n∈N_A} |x(n)|. When the algorithm reaches a stationary point with this step Δ, the bias will dominate and the mean squared error will be almost constant. The algorithm can then improve the solution precision by reducing the step-size. This simple relation between the bias and the step-size has been confirmed through a statistical analysis on more complex cases, with a large number of missing samples.

Adaptation Criterion.
The improvement of the gradient-based algorithm convergence by changing the step is analyzed in detail in [22]. A criterion that can efficiently detect the event that the algorithm has reached the stationary point with respect to the mean squared error (the vicinity of the sparsity measure minimum defined by the bias), described in the previous subsection, is proposed next. The presented criterion is based on the change of the gradient vector direction. When the vicinity of the sparsity measure minimum is reached, within the region defined by the bias value, the gradient estimate oscillates and changes its direction by almost 180 degrees. The angle β_m between two successive gradient vector estimates g^(m−1) and g^(m) can be calculated as
β_m = arccos( ⟨g^(m−1), g^(m)⟩ / (‖g^(m−1)‖_2 ‖g^(m)‖_2) ).
If this angle β_m is, for example, above 170°, then we can conclude that the missing signal samples (variables) have reached an oscillatory state around the minimal sparsity measure position. When such angle values are detected, the algorithm step Δ is reduced, for example, as Δ/10 → Δ or Δ/√10 → Δ. The iterations are continued starting from the reconstructed signal values reached in the previous step.
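The angle test can be sketched as follows (function and parameter names are illustrative assumptions; the 170 degree threshold and the reduction factor follow the examples given in the text):

```python
import math

def gradient_angle_deg(g_prev, g_curr):
    # angle, in degrees, between two successive gradient estimates
    dot = sum(a * b for a, b in zip(g_prev, g_curr))
    norm = (math.sqrt(sum(a * a for a in g_prev))
            * math.sqrt(sum(b * b for b in g_curr)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def adapt_step(delta, g_prev, g_curr, threshold_deg=170.0, factor=10.0):
    # near the measure minimum the gradient flips direction by ~180 degrees,
    # so a large angle signals that the bias-dominated region is reached
    if gradient_angle_deg(g_prev, g_curr) > threshold_deg:
        return delta / factor
    return delta
```

For instance, adapt_step(1.0, [1.0, 0.01], [-1.0, -0.01]) returns the reduced step 0.1, while two nearly aligned gradients leave the step unchanged.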
When the minimum of the sparsity measure is reached with a sufficiently small Δ, the value of Δ can also be used as an indicator of the solution precision. For example, if the signal amplitudes are of order 1 and K/N = 0.1, taking Δ = 1 in the first iteration will produce a solution with a precision better than 20 [dB]. If the step is then reduced to Δ = 0.1, a precision better than 40 [dB] will be obtained, and so on. The value of Δ can thus be used as the algorithm stopping criterion.
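Numerically, the 20 [dB] per step-decade behavior follows directly from the linear bias-step relation; the sketch below uses illustrative values and the worst-case bias bound from the previous subsection:

```python
import math

# worst-case bias bound b = K * delta / (N - K) from the bias analysis
bias = lambda K, N, delta: K * delta / (N - K)

# reducing the step by a factor of 10 improves the bias level by
# 20*log10(10) = 20 dB (K = 2, N = 8 are illustrative values)
improvement_db = 20.0 * math.log10(bias(2, 8, 1.0) / bias(2, 8, 0.1))
# improvement_db is 20.0
```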

Stopping Criterion.
The value of
T = 10 log_10 ( ∑_{n∈N_Q} |y^(m)(n) − y_p(n)|^2 / ∑_{n∈N_Q} |y^(m)(n)|^2 )
can be used as a rough estimate of the reconstruction-error-to-signal ratio. In this relation y_p(n) is the signal reconstructed before the reduction of Δ and y^(m)(n) is the signal reconstructed after m iterations done with the reduced step. The value of T can be used as the algorithm stopping criterion. If the value of T is above the required precision T_max (e.g., if T > −100 dB), the calculation procedure should be continued with reduced values of the step Δ.
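A sketch of this rough estimate (names are assumed: y_prev is the reconstruction before the step reduction, y_curr the one obtained after the iterations with the reduced step, and missing indexes the reconstructed sample positions):

```python
import math

def precision_db(y_prev, y_curr, missing):
    # rough reconstruction-error-to-signal ratio, in dB, computed over the
    # reconstructed (previously missing) sample positions
    num = sum((y_curr[n] - y_prev[n]) ** 2 for n in missing)
    den = sum(y_curr[n] ** 2 for n in missing)
    return 10.0 * math.log10(num / den)

# keep reducing the step while precision_db(...) stays above the required
# precision, e.g. -100 dB
```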

Large Step Influence.
A possible divergence of the gradient-based algorithm is related to large steps μ = 2Δ. For a large Δ the finite difference of the sparsity measure, per coefficient, is bounded: the lower limit 0 is obtained if the corresponding term is imaginary-valued, while the upper limit 2|X_m(k)| follows if it is real-valued. If Δ is large, the missing signal values will be changed by an amount independent of Δ. The missing sample values will oscillate within the range of the original signal values until the step Δ is reduced through the iterations. The variable values will change arbitrarily, within the order of the signal amplitude, as long as Δ is too large. This will not influence the further convergence of the algorithm once the step Δ assumes appropriately small values.
A pseudocode of this algorithm is presented in Algorithm 1.
Illustrative Example 1. The effects analyzed within the comments on the algorithm will be illustrated on a simple sparse signal, defined in the sequel. The bias upper limit in the stationary state for the step Δ = 1 is, according to (13), (K/(N − K))Δ = 1. After each reduction of the step-size from Δ to Δ/10 the bias-caused MSE will be reduced by almost 20 [dB]. The results of the signal reconstruction and the obtained MSE for the calculated missing values x(1) and x(6) are presented in Figure 1.
After the signal is reconstructed, the problem is to establish whether this reconstruction is unique, in the sense that there is no other signal (set of missing sample values) with the same or lower sparsity. A very simple check of the solution uniqueness, using the positions of the missing samples and the positions of the nonzero components of the reconstructed signal in the sparsity domain, is given in [24] by the following theorem.
Theorem 1. Consider a signal x(n) that is sparse in the DFT domain, with unknown sparsity. Assume that the signal length is N = 2^r samples and that Q samples are missing at the instants n_i ∈ N_Q. Also assume that the reconstruction is performed and that the DFT of the reconstructed signal is of sparsity K. The reconstruction is unique if the condition from [24], relating the sparsity K to the integers Q_{2^h} and Q_{2^{r−h}}, holds. These integers are calculated from the positions of the missing samples and the positions of the reconstructed nonzero values in the DFT. The signal-independent uniqueness corresponds to the worst-case signal form, when Q_{2^{r−h}} = 0.
The answer is obtained almost immediately, since the computational complexity of the check in the Theorem is of order O(N). The proof is given in [24]. For the considered set of missing samples n ∈ N_Q = {0, 1, 2, 5, 7, 8, 11, 15, 18, 27, 28, 30} from the previous example, the required integers are easily calculated. For the signal-independent uniqueness it means that K < 8; with the considered N_Q, this guarantees the uniqueness of the reconstructed signal for any signal parameters. When a sparse signal is reconstructed, for some sets N_Q the uniqueness may be guaranteed for higher values of the sparsity K. Additional details and examples may be found in [24].

Statistical Analysis and Comparison
Consider a general real-valued form of a signal sparse in the DFT domain,
x(n) = ∑_{l=1}^{K/2} A_l cos(2πk_l n/N + φ_l).
The total number of signal samples is N.
Bright colors indicate the region where the algorithm fully recovered the missing samples (compared to the original samples) in all realizations, while dark colors indicate the region where the algorithm could not recover the missing samples in any realization. In the transition region, for M slightly greater than 2K, we have both cases when the signal recovery is not achieved and cases of full signal recovery. A stopping criterion for the accuracy of 120 [dB] is used. It corresponds to a precision in the recovered signal of the same order as in the input samples, if they are acquired by a 20-bit A/D converter. The case with N = 64 is repeated with an additive input Gaussian noise such that the input signal-to-noise ratio is 20 [dB] in each realization (Figure 3(b)). The reconstruction error in this case is limited by the input signal-to-noise ratio.
The average reconstruction error in the noise-free cases is related to the number of the full recovery events. For N = 64 the number of the full recovery events is checked and presented in Figures 4(a) and 4(b). The efficiency of the presented algorithm is compared with the standard routines used for the ℓ_1-norm based minimization problems. The performance of the proposed algorithm is compared with an algorithm that recasts the recovery problem (3) into the regression framework (LASSO-ISTA) and with a linear programming framework with the primal-dual interior point method (the L1-magic code in MATLAB). A short review of the LASSO-ISTA algorithm is presented in the Appendix. All three algorithms are run using 100 sparse signals with random parameters. The algorithm parameters are set in such a way that the average computational time for all cases is about 25 ms. The results are presented in Table 1. The columns' notation in the table is K for the sparsity, Q = N − M for the number of missing samples, and MAE for the mean absolute error. The calculation time using MATLAB is presented in all cases.
We can conclude that in most of the considered cases the proposed algorithm outperforms the other two algorithms in both the calculation time and the accuracy.
Example 3. The algorithm can be applied to other signal transforms as well [23]. An example with missing (salt-and-pepper noise) pixels in an image, with the 2D-DCT as its sparsity domain, will be presented. Consider the standard MATLAB test image canoe.tif (Figure 5) and assume that 50% of the samples are corrupted (noisy image). The pixels with extreme values are removed and considered as unavailable. Since the coefficients in the 2D-DCT converge fast, we will assume that the sparsity condition is satisfied (without imposing it in an explicit way by setting some 2D-DCT coefficients to zero). The image is reconstructed by minimizing the 2D-DCT sparsity measure and varying the missing pixel values according to the presented algorithm.
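The detection of the corrupted pixels can be sketched as follows (a simplified illustration with assumed 8-bit thresholds; real salt-and-pepper detection may be more involved):

```python
# Salt-and-pepper pixels take extreme values (assumed here: 0 or 255 for an
# 8-bit image), so they are detected and marked as unavailable before the
# sparsity-based reconstruction.
def missing_pixel_mask(image, low=0, high=255):
    return [[pix <= low or pix >= high for pix in row] for row in image]

# tiny hypothetical 3x3 image with four corrupted pixels
noisy = [[120, 255, 130],
         [0,   140, 255],
         [125, 0,   135]]
mask = missing_pixel_mask(noisy)   # True marks a corrupted (unavailable) pixel
```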
(i) The algorithm inputs are the signal length N, the set of the available sample positions N_A, and the available signal values x(n_i), n_i ∈ N_A.

The signal in this example has N = 8 samples and sparsity K = 4. Assume that the samples at the positions n ∈ N_Q = {1, 6} are missing. The reconstruction of the signal is done in 60 iterations. The initial value of the step, Δ = 1, and the initial values of the missing samples, y(1) = 0 and y(6) = 0, are used. The missing sample values in the first 20 iterations are given in Figure 1(a) by dots (connected by a line). After 6 iterations the algorithm does not significantly change the missing sample values with the given step-size Δ = 1 (the lower subplot within the figure presents the changes of the variables on a highly zoomed scale). When the vicinity of the stationary point is reached with Δ = 1, the gradient estimate is almost zero-valued and its direction changes by almost 180 degrees. The measure values lie on a contour with approximately the same value (circles on the contour plot). The algorithm resumes its fast approach toward the exact missing sample values after the algorithm step is reduced to Δ = 0.1 in the 20th iteration. When a new stationary state is reached, the step is changed to Δ = 0.01 and the approach to the correct missing sample values is continued.

Figure 1 :
Figure 1: (a) Illustration of a signal reconstruction using the variable step gradient algorithm. (b) Original signal. (c) Signal with two missing samples. (d) Reconstructed signal. (e) Reconstructed signal MSE as a function of the iteration number.

Figure 2 :
Figure 2: Gradient-based reconstruction of a sparse signal, with the reconstruction MSE (a). The angle β_m between successive gradient estimates and the MSE as functions of the number of iterations, with the step changes based on the gradient angles (b).

Figure 3 :
Figure 3: Signal-to-reconstruction-error ratio (SRR) obtained by using Algorithm 1, averaged over 100 realizations, for various values of the sparsity K and the number of available samples M. (a) The total number of samples is N = 128. (b) With a Gaussian noise in the input signal, SNR = 20 [dB] and N = 64.
The average number of the algorithm iterations needed to produce the required precision, as a function of the number of missing samples and the signal sparsity K, is presented as well (Figure 4(c)), along with the corresponding average computation time in seconds, measured on a Windows PC with an Intel Dual Core processor (Figure 4(d)). The average computation time is proportional to the average number of iterations multiplied by the number of missing samples (variables) Q = N − M.

Figure 4 :
Figure 4: ((a), (b)) The percentage of the full recovery events as a function of the number of available samples M and the sparsity K in the case of N = 64. (c) The average number of iterations as a function of the number of missing samples and the sparsity. (d) The average computation time.
The precision of the result in iterative algorithms is commonly estimated from the change of the result values over the last iterations; an average of the changes over a large number of variables is therefore a good estimate of the achieved precision. A possible divergence of the gradient-based algorithm is related to large steps μ = 2Δ.
The signal sparsity K is varied from K = 2 to K = N/2. The signal amplitudes A_l, frequencies k_l, and phases φ_l are random variables. The amplitudes A_l are Gaussian random variables with unit variance; the frequency indices k_l are random numbers within 1 ≤ k_l ≤ N − 1. The phases are uniform random variables, 0 ≤ φ_l ≤ 2π, in each signal realization. The signal is reconstructed in 100 different realizations for each sparsity K and for random positions of the missing samples. The statistical calculations are performed for N = 128. The reconstructed signals are denoted by x_r(n). The reconstruction results are presented in Figure 3.
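One realization of such a random test signal can be generated as in the sketch below (conventions assumed: K/2 real sinusoids with distinct frequency indices give sparsity K in the DFT domain, and the seed value is arbitrary):

```python
import math
import random

def random_sparse_signal(N, K, rng):
    # K/2 real sinusoids: Gaussian amplitudes (unit variance), distinct
    # frequency indices within [1, N-1], uniform phases in [0, 2*pi]
    P = K // 2
    amps = [rng.gauss(0.0, 1.0) for _ in range(P)]
    freqs = rng.sample(range(1, N), P)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(P)]
    return [sum(A * math.cos(2.0 * math.pi * k * n / N + ph)
                for A, k, ph in zip(amps, freqs, phases))
            for n in range(N)]

x = random_sparse_signal(128, 8, random.Random(0))   # one realization, N = 128
```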

Table 1 :
MAE and computational time for the L1-magic (LP-DP), LASSO-ISTA, and the proposed algorithm.