Sparse Signal Inversion with Impulsive Noise by Dual Spectral Projected Gradient Method

Abstract. We consider sparse signal inversion with impulsive noise. There are three major ingredients. The first is regularizing properties; we discuss the convergence rate of regularized solutions. The second is devoted to numerical solutions, which are challenging because both the fidelity and the regularization term lack differentiability; moreover, for ill-conditioned problems, sparsity regularization is often unstable. We propose a novel dual spectral projected gradient (DSPG) method, which combines the dual problem of multiparameter regularization with the spectral projected gradient method, to solve the nonsmooth $\ell^1+\ell^1$ optimization functional. We show that one can overcome the nondifferentiability and instability by adding a smooth $\ell^2$ regularization term to the original optimization functional. The advantage of the proposed functional is that its convex dual reduces to a constrained smooth functional; moreover, it is stable even for ill-conditioned problems. A spectral projected gradient algorithm is used to compute the minimizers, and we prove its convergence. The third is numerical simulation. Experiments on compressed sensing and image inpainting demonstrate the efficiency of the proposed approach.


Introduction
In the present manuscript we are concerned with the ill-posed linear operator equation
$Ax = y$, (1)
where $x$ is sparse with respect to an orthonormal basis and $A : \mathcal{D}(A) \subset X \to Y$ is a bounded linear operator. In practice, the exact data $y$ are not known precisely; only an approximation $y^\delta$ with $\|y - y^\delta\| \le \delta$ is available. We call $y^\delta$ the noisy data and $\delta$ the noise level.
It is well known that a conventional method for solving (1) is sparsity regularization, which provides an efficient way to extract the essential features of sparse solutions, in contrast to oversmoothing classical Tikhonov regularization.
In the past ten years, sparsity regularization has become an important concept in inverse problems. The theory of sparse recovery has largely been driven by the needs of applications in compressed sensing [1,2], bioluminescence tomography [3], seismic tomography [4], parameter identification [5], and so forth. For accounts of the regularizing properties and computational techniques of sparsity regularization we refer the reader to [5-7] and the references given there. In general, sparsity regularization is given by
$\min_x \|Ax - y^\delta\|^2_{\ell^2} + \alpha \|x\|^p_{w,p}$, (3)
where $\alpha$ is the regularization parameter balancing the fidelity $\|Ax - y^\delta\|^2_{\ell^2}$ and the regularization term $\|x\|^p_{w,p}$. The functional in (3) is not convex if $p < 1$, and it is then challenging to investigate the regularizing properties and to compute minimizers numerically. Limited work has been done for $p < 1$; we refer the reader to [8-11] for a recent account of the theory. In this paper, we focus our main attention on the situation $p = 1$.
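To fix ideas, the following minimal Python/NumPy sketch (an illustration, not the code used in this paper) evaluates the functional in (3) for given coefficients, weights, and exponent:

```python
import numpy as np

def sparsity_objective(A, x, y_delta, alpha, w, p=1.0):
    """Evaluate ||A x - y_delta||_{l2}^2 + alpha * sum_g w_g |x_g|^p, cf. (3).
    Here x holds the coefficients <phi_g, x> in the orthonormal basis."""
    fidelity = np.sum((A @ x - y_delta) ** 2)
    penalty = alpha * np.sum(w * np.abs(x) ** p)
    return fidelity + penalty
```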
The aim of this paper is to consider a regularization functional of the form
$\min_x \|Ax - y^\delta\|_{\ell^1} + \sum_{\gamma\in\Lambda} w_\gamma |\langle \varphi_\gamma, x\rangle|$. (4)
We call (4) the $\ell^1+\ell^1$ problem. A main motivation for investigating the $\ell^1+\ell^1$ problem is that the noisy data $y^\delta$ often contain impulsive noise. For Gaussian noise, the $\ell^2$ fidelity is a natural choice. However, a typical nondifferentiable fidelity used in applications involving impulsive noise is the $\ell^1$ fidelity, which is more robust than the $\ell^2$ fidelity [12]. The $\ell^1$ fidelity technique was motivated by [13] for signal processing and later attracted considerable attention in image processing, especially in image denoising, where the $\ell^1$ fidelity is typically combined with a TV regularization term. For details of $\ell^1$+TV problems, we refer the reader to [14-17]. Nowadays the $\ell^1$ fidelity has received growing interest in inverse problems whose solutions are sparse with respect to an orthonormal basis. Minimizers of cost functions involving the $\ell^1$ fidelity combined with sparsity regularization have been studied; we refer the reader to [18-23] and the references given there.
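The robustness of the $\ell^1$ fidelity against impulsive noise is visible already in a toy computation (an illustration, not from the paper): a single corrupted entry inflates the squared $\ell^2$ fidelity quadratically, while it enters the $\ell^1$ fidelity only linearly.

```python
import numpy as np

residual = np.zeros(100)
residual[0] = 10.0                   # a single impulsive outlier in the residual
print(np.sum(residual ** 2))         # squared l2 fidelity: 100.0, dominated by the outlier
print(np.sum(np.abs(residual)))      # l1 fidelity: 10.0, grows only linearly
```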
For the $\ell^1$ fidelity, regularizing properties must be handled specifically. Burger and Osher [24] proved a convergence rate when the regularization parameter is chosen arbitrarily fixed but sufficiently small; the authors call this phenomenon exact penalization. The most curious situation is that, in [25], the optimal convergence rate requires the regularization parameter to be equal to a fixed value. In [26], Grasmair et al. proved that the source condition together with the finite basis injectivity (FBI) condition is weaker than the restricted isometry property and obtained the linear convergence rate $O(\delta)$. Flemming and Hegland discussed convergence rates in $\ell^1$-regularization when the basis is not smooth enough [27]. König et al. obtained higher order polynomial rates of convergence in a special corrupted domain, even though the underlying problem is exponentially ill-posed in the classical sense [28].
Though the $\ell^1$ fidelity is robust, many researchers prefer the $\ell^2$ fidelity because of its differentiability. Hence a key issue for the $\ell^1$ fidelity is the numerical computing method. In the past few years, numerous algorithms have been systematically proposed for $\ell^1$+TV problems. On the other hand, in spite of the growing interest in the $\ell^1$ fidelity, only limited work has been done on numerical methods for $\ell^1+\ell^1$ problems. For sparsity regularization, the popular algorithms, for example, the homotopy (LARS) method [29], the iteratively reweighted least squares (IRLS) method [30], and iterative thresholding algorithms [31,32], cannot be directly applied to the $\ell^1+\ell^1$ problem because both the fidelity and the regularization term lack differentiability. There are only a few papers in which numerical algorithms for $\ell^1+\ell^1$ problems are discussed systematically. References [20,22] proposed new models based on statistical methods. Reference [21] discussed the prior sparse representation together with the data-fidelity term and proposed a two-phase approach. In [23], the authors propose a robust bisparsity model (RBSM) to effectively exploit prior knowledge about the similarities and distinctions of signals. However, these works are mostly devoted to compressive sensing problems, where the random matrices are well-conditioned; for ill-conditioned problems, the methods are often unstable [33]. Moreover, they assume that the solution is sparse itself, which differs from the more general assumption that the solution is sparse with respect to an orthonormal basis. In [19], a primal-dual interior point method (PD-IPM) was proposed for the EIT problem; it deals efficiently with the nondifferentiability, but no convergence proof was given. Yang and Zhang reformulated the $\ell^1+\ell^1$ problem into a basis pursuit model which can be solved effectively by the ADM method [18], a competitive method compared with other algorithms for compressive sensing. In [34], Xiao et al. applied the ADM method to the $\ell^1+\ell^1$ problem directly; numerical results illustrated that their algorithm performs better than YALL1 [18].
In this paper, we investigate the regularizing properties of the $\ell^1+\ell^1$ problem. The convergence rate of $\|x^\delta_\alpha - x^\dagger\|_{\ell^2}$ can be shown to be $O(\delta^{1-\varepsilon})$ and $O(\delta)$ under a priori and a posteriori parameter choices, respectively, given the additional source condition and FBI condition. As mentioned above, duality is a conventional technique for solving Tikhonov regularization with the $\ell^1$ fidelity; however, this approach has limitations for $\ell^1+\ell^1$ problems because their dual formulation is difficult to obtain. Inspired by [14] and by multiparameter regularization theory [35-37], we add a smooth $\ell^2$ term to the original regularization functional. The dual problem of this new cost function reduces to a constrained smooth functional; moreover, the smooth $\ell^2$ regularization term improves stability. The conventional method for solving the constrained smooth functional is the projected gradient algorithm; we use the spectral projected gradient (SPG) method to compute the minimizers of the constrained functional and prove its convergence.
An outline of this paper is as follows. We devote Section 2 to a discussion of regularizing properties, including well-posedness and convergence rates; the convergence rate for a priori and a posteriori choices is established. In Section 3, inspired by the theory of multiparameter regularization, we construct a new functional which, by Fenchel duality, is equivalent to a constrained smooth functional. In Section 4, the spectral projected gradient method is applied to compute the minimizers. Section 5 provides a detailed exposition of the multiparameter choice rule based on the balancing principle. Numerical experiments involving compressed sensing and image inpainting are presented in Section 6, showing that the proposed approaches are robust and efficient.

Regularization Properties
where $\Phi(x) := \sum_{\gamma\in\Lambda} w_\gamma |\langle \varphi_\gamma, x\rangle|$ and the subdifferential of $\Phi$ at $x$ is denoted by $\partial\Phi(x)$. Throughout this paper, $X$ denotes a Hilbert space which is a subspace of $\ell^2$, and $\langle\cdot,\cdot\rangle$ denotes its inner product; $Y$ denotes a Banach space which is a subspace of $\ell^1$. $A : \mathrm{dom}(A) \subseteq X \to Y$ is a bounded linear operator with $\mathrm{dom}(A) \cap \mathrm{dom}(\Phi) \neq \emptyset$, and $(\varphi_\gamma)_{\gamma\in\Lambda} \subset X$ is an orthonormal basis, where $\Lambda$ is a countable index set. From now on, we denote the functional in (4) by $T_{\alpha,\delta}(x)$. In order to prove convergence rate results, we denote by $x^\delta_\alpha$ the minimizer of the regularization functional $T_{\alpha,\delta}(x)$ for every $\alpha > 0$ and use the following definition of a $\Phi$-minimum norm solution.
We define the sparsity as follows.
The FBI condition was originally introduced in [8]. In [26], Grasmair et al. used a slightly weaker condition based on the FBI condition and showed that Assumption 3 is in some sense the weakest possible condition that guarantees the linear convergence rate. In fact, if $x^\dagger$ is the unique $\Phi$-minimizing solution, then Assumption 3 is satisfied [26]. For $0 \le p < 1$, $x^\dagger$ being the unique $\Phi$-minimizing solution cannot guarantee that Assumption 3 is satisfied. Even if Assumption 3 is satisfied for $0 \le p < 1$, we do not know whether linear convergence can be obtained, because the traditional proof of linear convergence needs the convexity of $\Phi$.
We now turn to convergence rates. For the $\ell^2+\ell^1$ problem, Lorenz proved the convergence rate $O(\delta^{1/2})$ under the source condition and the FBI condition [7]. The linear convergence rate $O(\delta)$ was proved by Grasmair et al. for nonlinear equations under two additional nonlinearity conditions [8]. In [26], Grasmair et al. proved that the source condition together with the FBI condition is weaker than the restricted isometry property and obtained the linear convergence rate $O(\delta)$. Flemming discussed convergence rates in $\ell^1$-regularization when the basis is not smooth enough [27].
Remark 5. We note that the Bregman distance cannot be used as an error measure in this section because $\Phi(x) := \sum_\gamma w_\gamma |\langle \varphi_\gamma, x\rangle|$ fails to be strictly convex; we refer the reader to [8,39] for details. In this section we use the $\ell^2$ norm as the error measure.
The next results show a convergence rate for an a priori parameter choice rule.
Lemma 6. Assume that Assumption 3 holds. Then there exists a constant $c > 0$ such that the following estimate holds for all $x \in X$.
Proof. Together with the definition of $\Gamma$, (10) implies that $\Gamma$ is a finite set. From Assumption 3, the restriction of $A$ to $\mathrm{span}\{\varphi_\gamma : \gamma \in \Gamma\}$ is injective, so (11) holds with some constant $c > 0$ for all $x$. By the definition of $\Gamma$ and of $x^\dagger$, we have $\langle \varphi_\gamma, x^\dagger\rangle = 0$ for every $\gamma \notin \Gamma$. From (11), the asserted estimate follows, and the proof is completed.
Next we turn our attention to the a posteriori parameter choice known as the discrepancy principle, due to Morozov [40]. The regularization parameter defined via the discrepancy principle is chosen such that (24) holds for some $\tau \ge 1$. The Morozov discrepancy principle is a widely used rule that is easy to implement numerically. Furthermore, for $\tau = 1$, (4) is equivalent to the constrained minimization problem [26]
$\Phi(x) \to \min$ subject to $\|Ax - y^\delta\|_{\ell^1} \le \delta$,
which is widely used in compressive sensing. The next results show a convergence rate for the Morozov discrepancy principle.
Theorem 9. Assume that Assumption 3 holds and that the regularization parameter is determined by (24). Then the linear convergence rate $\|x^\delta_\alpha - x^\dagger\|_{\ell^2} = O(\delta)$ holds.
Proof. By (24), the discrepancy $\|Ax^\delta_\alpha - y^\delta\|_{\ell^1}$ is bounded by a multiple of $\delta$. This, together with Assumption 3 and proceeding as in the proof of the preceding theorem, yields the assertion. The proof is completed.
Remark 10. The linear convergence rate $O(\delta)$ can also be obtained when the a priori choice and the Morozov discrepancy principle are applied to the fidelity $\|Ax^\delta_\alpha - y^\delta\|^p_{\ell^p}$ ($1 < p \le 2$). However, in the a priori parameter choice rule one must then couple $\alpha$ to $\delta$ appropriately.
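In practice, a parameter satisfying the discrepancy principle (24) can be located by bisection, since the discrepancy is monotonically nondecreasing in $\alpha$. The sketch below is schematic: `solve` stands for an assumed user-supplied routine returning a minimizer of (4) for a given $\alpha$.

```python
import numpy as np

def morozov_alpha(solve, A, y_delta, delta, tau=1.0,
                  a_lo=1e-6, a_hi=1e2, iters=40):
    """Bisection sketch for the discrepancy principle (24): find alpha with
    ||A x_alpha - y_delta||_{l1} close to tau * delta. `solve(alpha)` is an
    assumed routine returning a minimizer x_alpha of the l1+l1 functional."""
    a_mid = np.sqrt(a_lo * a_hi)
    for _ in range(iters):
        a_mid = np.sqrt(a_lo * a_hi)            # bisect on a logarithmic scale
        disc = np.sum(np.abs(A @ solve(a_mid) - y_delta))
        if disc > tau * delta:
            a_hi = a_mid                        # discrepancy too large: decrease alpha
        else:
            a_lo = a_mid
    return a_mid
```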

Dual Problem
In this section, let the weights $w_\gamma$ in (4) take the same value; that is, $w_\gamma = w > 0$ for all $\gamma \in \Lambda$. This is reasonable because convergence can be obtained when $\delta/\alpha \to 0$ [6]. Let $\alpha := w = w_\gamma$; then (4) is equivalent to (31). Let $u = (u_1, u_2, \dots, u_\gamma, \dots) \in \ell^2$, where $u_\gamma = \langle \varphi_\gamma, x\rangle$. In addition, we denote by $W : \ell^2 \to \ell^2$ a dictionary satisfying $Wx = u$ and $W^T u = x$; for example, in the field of the wavelet transform, $W$ is a wavelet decomposition operator and $W^T$ is a wavelet reconstruction operator [41,42]. Let $K := A \circ W^T$; then (4) is equivalent to
$\min_u \|Ku - y^\delta\|_{\ell^1} + \alpha\|u\|_{\ell^1}$. (32)
Duality is a conventional method for solving Tikhonov regularization with the $\ell^1$ fidelity. However, this approach faces limitations when applied to (32): both the $\ell^1$ fidelity and the $\ell^1$ regularization term are nondifferentiable, and, for ill-conditioned problems, sparsity regularization is often unstable. We therefore add the smooth penalty $(\beta/2)\|u\|^2_{\ell^2}$ to (32) and construct the functional
$P_\beta: \min_u \|Ku - y^\delta\|_{\ell^1} + \alpha\|u\|_{\ell^1} + \frac{\beta}{2}\|u\|^2_{\ell^2}$. (33)
The advantage of problem (33) over (32) is that the dual problem of (33) is a constrained smooth functional, so a projected gradient algorithm can be used to compute the minimizers; moreover, the regularization effect of the $\ell^1$ penalty alone is weak, and the $\ell^2$ penalty improves the stability of (32). Next we investigate the convergence of the minimizers of the functional $P_\beta$ as $\beta$ tends to zero.
Theorem 11. Let $\alpha$ be fixed and let $\{\beta_n\}$ be a sequence tending to zero. Then the minimizers $u^{\beta_n}_\alpha$ of the functionals $P_{\beta_n}$ have a subsequence converging to $u^*$, a minimizer of the functional $P$ in (32).
Proof. By the definition of $u^{\beta_n}_\alpha$, we have $P_{\beta_n}(u^{\beta_n}_\alpha) \le P_{\beta_n}(u)$ for all $u \in \mathrm{dom}(P)$. Letting $n \to \infty$ along a convergent subsequence yields $P(u^*) \le P(u)$ for all $u \in \mathrm{dom}(P)$. Therefore, $u^*$ is a minimizer of the functional $P$ and the proof is completed.
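For concreteness, the smoothed functional (33) can be evaluated as follows (a minimal sketch; `K` stands for the composed operator $A \circ W^T$ given as a matrix):

```python
import numpy as np

def P_beta(K, u, y_delta, alpha, beta):
    """Evaluate (33): ||K u - y_delta||_{l1} + alpha ||u||_{l1} + (beta/2) ||u||_{l2}^2."""
    return (np.sum(np.abs(K @ u - y_delta))
            + alpha * np.sum(np.abs(u))
            + 0.5 * beta * np.sum(u ** 2))
```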
Next we consider the dual problem of $P_\beta$. We will show that the constrained smooth minimization problem
$P^*_\beta: \min \{f(v) : v \in \ell^2,\ \|v\|_{\ell^\infty} \le \min\{\alpha, 1\}\}$
is the dual problem of $P_\beta$.
Theorem 12. $P^*_\beta$ is the dual problem of $P_\beta$. The solution $u^\beta_\alpha$ of $P_\beta$ can be recovered from the solution $v^\beta_\alpha$ of $P^*_\beta$ through the soft thresholding function $S_\alpha$, defined by $S_\alpha(z)_\gamma = \mathrm{sign}(z_\gamma)\max\{|z_\gamma| - \alpha, 0\}$; here the dual variable ranges over all $v \in \ell^2$ with $\|v\|_{\ell^\infty} \le \min\{\alpha, 1\}$.
Proof. Let $F(u) := \alpha\|u\|_{\ell^1} + (\beta/2)\|u\|^2_{\ell^2}$ and $G(z) := \|z - y^\delta\|_{\ell^1}$; then problem $P_\beta$ can be rewritten as $\min_u F(u) + G(Ku)$. Let us denote by $F^*$ and $G^*$ the conjugate functions of $F$ and $G$. The functionals $F$ and $G$ are proper, convex, and lower semicontinuous, so by the Fenchel duality theorem [43, Chap. 3, Chap. 10] strong duality holds between $P_\beta$ and $P^*_\beta$. Since $u^\beta_\alpha$ and $v^\beta_\alpha$ are minimizers of $P_\beta$ and $P^*_\beta$, the extremality condition (48) holds; moreover, (48) is equivalent to the Kuhn-Tucker conditions (49). By the definition of the subgradient and the inclusion $-K^T v^\beta_\alpha \in \partial F(u^\beta_\alpha)$ in (49), the asserted relation follows.
Algorithm 1: Spectral projected gradient method for $P^*_\beta$.
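The soft thresholding function in Theorem 12 acts componentwise; a minimal sketch (the precise primal-dual relation is the one stated in Theorem 12):

```python
import numpy as np

def soft_threshold(z, alpha):
    """Componentwise soft thresholding: sign(z_g) * max(|z_g| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)
```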
Remark 13. In [39, Chap. 1], numerical experiments showed that a choice of $p$ slightly larger than 1 gives even better results than $p = 1$. It is therefore necessary to discuss the functional (55) with $1 < p < 2$; we call (55) the $\ell^1+\ell^p$ problem. We do not have to add an additional $\ell^2$ penalty to (55); its dual problem can be obtained by Fenchel duality theory directly.

Computation of Minimizers
As mentioned above, the functional (33) can be transformed via Fenchel duality into a smooth functional with a box constraint, which can be solved effectively by projected gradient-type methods. The spectral projected gradient (SPG) method [44] has proved effective for the minimization of differentiable functionals with box constraints. The function $f(v)$ is the objective of the dual problem $P^*_\beta$, with gradient $\nabla f(v)$. Setting $c = \min\{\alpha, 1\}$, we denote by $P_{B(c)}(v)$ the orthogonal projection onto the ball $B(c) = \{v \in \ell^2 : \|v\|_{\ell^\infty} \le c\}$. Given an initial value $v_0$ and a constraint on the step length $\lambda_k$, that is, $0 < \lambda_{\min} \le \lambda_k \le \lambda_{\max}$, let the parameters satisfy $\gamma \in (0,1)$, $0 < \sigma_1 < \sigma_2 < 1$, and $M > 1$. We use the convergence criterion based on the projected gradient residual $\|P_{B(c)}(v_k - \nabla f(v_k)) - v_k\|$ (see Algorithm 1). We have the following convergence result: the sequence generated by Algorithm 1 converges to a minimizer of $P^*_\beta$.
Proof. Since $f(v)$ is convex and continuously differentiable, the convergence follows from standard results (refer to Theorem 2.1 in [44]).
Algorithm 2: Line search.
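The overall structure of Algorithms 1 and 2 can be sketched as follows (a schematic Python rendering with a nonmonotone backtracking line search and a Barzilai-Borwein spectral step, in the spirit of [44]; `f` and `grad` are assumed user-supplied callables for the dual objective of $P^*_\beta$ and its gradient, and all parameter names are illustrative):

```python
import numpy as np

def spg_box(f, grad, v0, c, lam_min=1e-10, lam_max=1e10,
            gamma=1e-4, M=10, tol=1e-6, max_iter=500):
    """Spectral projected gradient sketch for min f(v) s.t. ||v||_inf <= c."""
    proj = lambda v: np.clip(v, -c, c)              # projection onto the box B(c)
    v = proj(v0)
    g = grad(v)
    lam = 1.0
    f_hist = [f(v)]
    for _ in range(max_iter):
        if np.max(np.abs(proj(v - g) - v)) <= tol:  # projected gradient residual
            break
        d = proj(v - lam * g) - v                   # spectral projected direction
        f_ref = max(f_hist[-M:])                    # nonmonotone reference value
        t = 1.0
        while f(v + t * d) > f_ref + gamma * t * np.dot(g, d):
            t *= 0.5                                # backtracking (Armijo) line search
        v_new = v + t * d
        g_new = grad(v_new)
        s, z = v_new - v, g_new - g
        sz = np.dot(s, z)                           # Barzilai-Borwein step length
        lam = np.clip(np.dot(s, s) / sz, lam_min, lam_max) if sz > 0 else lam_max
        v, g = v_new, g_new
        f_hist.append(f(v))
    return v
```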
Algorithm 3: Fixed point algorithm for $\alpha$ and $\beta$.

Choice of Parameters α and β
The solution $u^\beta_\alpha$ of $P_\beta$ converges theoretically to the solution $u^\delta_\alpha$ of $P$ as $\beta \to 0$, so a smaller $\beta$ appears preferable. However, the dual term $(1/(2\beta))\|v^*\|^2_{\ell^2}$ tends to infinity as $\beta \to 0$; moreover, a smaller $\beta$ weakens the regularization effect of the $\ell^2$ penalty, which leads to instability. If the solution is sparse, we may regard the nonzero coefficients as the impulsive part and the zero coefficients as the smooth part. Multiparameter regularization is a conventional method for ill-posed problems whose solutions contain several different structures. In [35,37], numerical experiments show that multiparameter regularization can effectively recover the different parts of the solution; if the solution contains only a single structure, multiparameter regularization also performs well. Hence the parameter $\beta$ need not tend to zero when we choose the parameters $\alpha$ and $\beta$. A conventional method for choosing $\alpha$ and $\beta$ is a multiparameter regularization choice principle, for example, the discrepancy principle [35] or the balancing principle [37]. In this paper, we use the multiparameter balancing principle to choose the regularization parameters $\alpha$ and $\beta$.
The balancing principle computes minimizers of the function in (62), where $c_\gamma$ is a fixed constant, and (62) is equivalent to (63) [37]. A fixed point algorithm for $\alpha$ and $\beta$ according to (63) is given in Algorithm 3; a schematic sketch follows below.
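Since the updates (62)-(63) are given above only by reference, the following is a purely schematic sketch of such a fixed point iteration: each parameter is rebalanced against the fidelity in every step. The specific update formulas below are illustrative assumptions, not the paper's exact rule (63); `solve(alpha, beta)` is an assumed routine minimizing $P_\beta$.

```python
import numpy as np

def balance_fixed_point(solve, K, y_delta, c_gamma=1.5,
                        alpha0=1e-2, beta0=1e-2, iters=20):
    """Schematic fixed point iteration for (alpha, beta), cf. Algorithm 3.
    The balancing-type updates below are illustrative placeholders for (63)."""
    alpha, beta = alpha0, beta0
    for _ in range(iters):
        u = solve(alpha, beta)
        fid = np.sum(np.abs(K @ u - y_delta))     # l1 fidelity of current iterate
        alpha = fid / ((c_gamma - 1.0) * max(np.sum(np.abs(u)), 1e-15))
        beta = fid / ((c_gamma - 1.0) * max(0.5 * np.sum(u ** 2), 1e-15))
    return alpha, beta
```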

Numerical Simulations
In this section, we present some numerical experiments to illustrate the efficiency of the proposed method. In Section 6.1, the experiments involve compressive sensing; we aim to demonstrate that the $\ell^1$ fidelity is more stable than the $\ell^2$ fidelity and is capable of handling both impulsive and Gaussian noise. In Section 6.2, we first compare the performance of the DSPG method with the alternating direction method (ADM) and the TNIP method on well-conditioned compressive sensing problems; in the second example, we discuss an ill-posed problem where the condition number of the linear operator $A$ is 255, aiming to demonstrate that the proposed method is stable even for large condition numbers. In Section 6.3, we consider image inpainting. As can be seen from Figure 2, the $\ell^1$ fidelity is more stable under impulsive noise and always offers high-quality restoration, even from poor data; in contrast, the quality of the restorations obtained with the $\ell^2$ fidelity is always poor.

Comparison of DSPG with ADM-$\ell^1$, ADM-$\ell^2$, and TNIP.
In this subsection, we present comparison results for DSPG with ADM-$\ell^1$, ADM-$\ell^2$, and TNIP. ADM-$\ell^1$ [18] is an efficient alternating direction method for the $\ell^1+\ell^1$ problem; ADM-$\ell^2$ [18] and TNIP [45] (truncated Newton interior point) are for the $\ell^2+\ell^1$ problem. In the first experiment, we test well-conditioned compressive sensing problems at several sparsity levels $k/n$ (cf. Figure 3, where $k/n$ increases from top to bottom row). As can be seen from Figure 3, ADM-$\ell^1$ converges faster than DSPG and its accuracy is better when $k/n = 0.4$. As $k/n$ decreases, the DSPG method performs more competitively; its accuracy is even better than that of ADM-$\ell^1$ when $k/n = 0.1$. Though the optimal relative error of ADM-$\ell^1$ is better than that of the DSPG method, the corresponding optimal iteration number or stopping tolerance of ADM-$\ell^1$ is difficult to estimate in practice. The other $\ell^2$ fidelity algorithms, especially the TNIP method, converge obviously faster than ADM-$\ell^1$ and DSPG; however, the accuracy of these two algorithms is worse than that of ADM-$\ell^1$ and DSPG no matter what value $k/n$ takes. Next, in order to test the stability of the DSPG method for ill-conditioned problems, we use a matrix $A \in \mathbb{R}^{n\times n}$ ($n = 200$) whose condition number is 255. This problem was discussed by Lorenz in [8], where the ill-conditioned matrix is generated by the Matlab code A = tril(ones(200)). The signal is $k$-sparse with $k/n = 0.1$ and $0.2$, and we add 1% impulsive noise to the data. As can be seen from Figure 4, the DSPG method converges obviously faster than ADM-$\ell^1$, and its relative error is also better; this shows that the DSPG method is stable even for matrices with large condition numbers. In Table 1, the data contain impulsive noise with corruption Rerr($\delta$) = 0.1%, 0.3%, 0.5%, 1%, 3%, 5%, 10%. As can be seen from Table 1, the quality of restoration improves as the noise level $\delta$ decreases. Theoretically, the ADM-$\ell^1$ and DSPG methods are adept at processing impulsive noise; however, ADM-$\ell^1$ is sensitive to noise when the operators are ill-conditioned and then cannot obtain a reasonable restoration. The ADM-$\ell^2$ and DSPG methods are more stable with respect to the noise level $\delta$ even if the matrix $A$ has a large condition number. For low noise levels, ADM-$\ell^2$ and DSPG have an advantage over the other two methods. For high noise levels, the restoration results of ADM-$\ell^1$ are very close to those of ADM-$\ell^2$ and TNIP, while the restoration results of the DSPG method are obviously better than those of the other three methods.
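The ill-conditioned test problem of this subsection can be reproduced as follows (a sketch; the exact amplitude and placement of the impulsive corruption in our experiments may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
A = np.tril(np.ones((n, n)))              # Matlab: A = tril(ones(200))
print(np.linalg.cond(A))                  # about 255

k = n // 10                               # sparsity level k/n = 0.1
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

y = A @ x
hit = rng.random(n) < 0.01                # corrupt 1% of the data entries
noise = np.zeros(n)
noise[hit] = np.max(np.abs(y)) * rng.choice([-1.0, 1.0], size=int(hit.sum()))
y_delta = y + noise                       # data with impulsive noise
```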

Image Inpainting.
We consider 2D image inpainting problems. The image is Barbara ($n = 512$; cf. Figure 5). We randomly remove 40% of the pixels of Barbara to create an incomplete image. In this case, image inpainting is a moderately ill-posed problem; the condition number of the matrix is around 4000. For our purpose, we use the Daubechies 4 wavelet basis as the dictionary, with four scales, for a total of $8192 \times 512$ wavelet and scaling coefficients (cf. Figure 6). As seen from Figure 6, the representation of the image with respect to the Daubechies 4 basis is sparse. We add salt-and-pepper noise by the Matlab code imnoise(image, 'salt & pepper', d); in this example, $d = 0.05$. The restoration results are shown in Figure 5(d), and restoration results for four images with different noise levels are given in Table 2. The results show that if images have a sparse representation with respect to an orthonormal basis, the DSPG method can obtain satisfactory results even though image inpainting is a moderately ill-posed problem.
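The degradation model of this experiment can be sketched as follows (assuming the PyWavelets package for the Daubechies 4 dictionary; the helper names and the [0, 1] intensity scaling are illustrative assumptions):

```python
import numpy as np
import pywt  # assumed available; supplies the Daubechies 4 wavelets

rng = np.random.default_rng(0)

def degrade(image, missing=0.4, d=0.05):
    """Remove `missing` of the pixels and add salt-and-pepper noise of density
    `d`, the analogue of imnoise(image, 'salt & pepper', d) for a float image
    scaled to [0, 1]."""
    img = image.copy()
    sp = rng.random(img.shape) < d
    img[sp] = rng.choice([0.0, 1.0], size=int(sp.sum()))  # salt & pepper
    mask = rng.random(img.shape) >= missing               # observed pixels
    return img * mask, mask

# Daubechies 4 analysis and synthesis play the roles of W and W^T in Section 3
W = lambda img: pywt.wavedec2(img, 'db4', level=4)
WT = lambda coeffs: pywt.waverec2(coeffs, 'db4')
```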

Conclusion
With the source and FBI conditions, we have proved that the $\ell^1+\ell^1$ regularization method yields convergence rates of order $\delta^{1-\epsilon}$ and $\delta$. For the numerical solution, we have proposed a novel DSPG approach for sparsity regularization with the $\ell^1$ fidelity. Numerical results indicate that the proposed DSPG algorithm performs competitively with several state-of-the-art algorithms such as the ADM method. On various classes of test problems with different condition numbers, the proposed DSPG method has exhibited the following advantages: (i) compared with other algorithms, the dual problem of our method is simpler; it is a box-constrained smooth functional and can be solved effectively by projected gradient methods; (ii) for ill-conditioned problems, the DSPG method is more stable with respect to the noise level. We remark that for well-conditioned problems, for example, compressive sensing, the optimal accuracy of ADM-$\ell^1$ is better than that of the DSPG method; however, the optimal accuracy of ADM-$\ell^1$ is strongly dependent on stopping tolerance values, which can be difficult to estimate in practice. To the best of the author's