TV + TV 2 Regularization with Nonconvex Sparseness-Inducing Penalty for Image Restoration

In order to restore the high quality image, we propose a compound regularization method which combines a new higherorder extension of total variation (TV+TV) and a nonconvex sparseness-inducing penalty. Considering the presence of varying directional features in images, we employ the shearlet transform to preserve the abundant geometrical information of the image. The nonconvex sparseness-inducing penalty approach increases robustness to noise and image nonsparsity. In what follows, we present the numerical solution of the proposed model by employing the split Bregman iteration and a novel p-shrinkage operator. And finally, we perform numerical experiments for image denoising, image deblurring, and image reconstructing from incomplete spectral samples. The experimental results demonstrate the efficiency of the proposed restoration method for preserving the structure details and the sharp edges of image.


Introduction
Image restoration is one of the most classical problems in image processing.During the formation, transmission, or recording process, images may be degraded by noise and blur.Thus, an important problem in image processing is the restoration of the original true image from an observed image.Mathematically, it amounts to estimate an image  ∈ R  from a measurement obtained through a noninvertible linear degradation operator A : R  → R  (i.e., an  ×  matrix) and contaminated by an additive noise  ∈ R  .Typical degradation operators include identity as in image denoising, incomplete linear projector as in compressive sensing, convolution operator as in image deconvolution problems, 0-1 mask as in image inpainting, and so on.The recovery of  from the observed image  ∈ R  can be considered as an inverse problem which is ill-posed in general.The information provided by  is not sufficient to ensure existence, uniqueness, and stability of a solution .It is therefore necessary to regularize the problem by adding a prior constraint on the solution.One typical regularization approach takes the following form: where the first term is the fidelity function (or fitting function) that forces closeness to data according to (1), R : R  → R is the so-called regularization function (or penalty function) that embodies the priors about the unknown image  with certain characteristics (such as piecewise smoothness or sparseness), and  > 0 is a parameter that controls the trade-off between these two functions.The intuitive meaning of the linear inverse problem (2) is clear: its minimizers reach a compromise between lack of fitness to the observed image (as measured by the fidelity function) and a degree of "undesirability" (as quantified by R(⋅)).
The choice of the regularization function R(⋅) significantly affects the quality of the restored image.The majority of the existing regularization functions favor simpler images with piecewise constant intensities.Details and fine features in images are not necessarily more complicated.The existing state-of-the-art image restoration models usually are designed for single convex regularization function such 2 Mathematical Problems in Engineering as total variation (TV) seminorm or sparseness-inducing penalty.The success and widespread use of the TV seminorm in the past two decades can be mainly attributed to its ability to produce well-preserved and sharp edges.Moreover, its convexity permits the design of efficient algorithms.Although the TV seminorm is extremely popular in a variety of applications, it also gives rise to some undesired effects.In particular, it leads to the well-known staircase effect when applied to images that are not necessarily piecewise constant [1].To attenuate the staircase effect, there is a growing interest in the literature for replacing TV seminorm by combining a second-order regularization function with the TV seminorm [2][3][4][5] or employing a ℓ 1 norm of some higherorder differential operator in a standalone way [6][7][8].The motivations behind these attempts are to restore potentially a wider range of images, which comprise more than merely piecewise constant regions.More recently, Papafitsoros and Schönlieb constitute a straightforward higher-order extension of the TV seminorm by adding a nonsmooth secondorder regularization function; it is referred to as TV+TV 2 regularization [9].TV+TV 2 regularization can compete with total generalized variation (TGV) [10], infimal convolution [11], and Euler's elastic [12], three other state-of-the-art higher-order regularizations.But most of all, TV+TV 2 is simple and efficiently numerically solvable.
On the other hand, problems of taking R(⋅) as sparsenessinducing penalty have become familiar over the past three decades, particularly in statistical and signal processing contexts.One typical sparseness-inducing penalty problem takes the following form: where ‖ ⋅ ‖ 1 denotes the ℓ 1 norm and Ψ is a sparsity operator, such as a wavelet transform.When Ψ is a discrete gradient operator, the second term leads to TV regularization.In the 1990s, seminal work on the use of ℓ 1 sparseness-inducing penalty appeared in the statistics literature, such as the famous basis pursuit denoising (BPDN) criterion [13] and the least absolute shrinkage and selection operator (LASSO, [14]).
For brief historical accounts on the use of the ℓ 1 penalty in statistics and signal processing, see [15].Another intriguing new application for the ℓ 1 sparseness-inducing penalty is compressive sensing (CS) [16][17][18] when A is the measurement or sampling operator in the optimization problem (3).
In addition, it is worth mentioning the numerical results in [19,20] and theoretical works [21][22][23].These results have justified that the robustness of restored images to noise and the nonsparsity can be increased by replacing ℓ 1 norm in (3) with the nonconvex nonsmooth ℓ  (0 <  < 1) quasinorm.Moreover, recent researches show that ℓ  (0 <  < 1) quasinorm offers much richer possibilities to restore high quality images with neat edges.
To restore high quality image, in the regularization approach (2), one may be desired to encourage the solution to exhibit characteristics that are piecewise smooth and sparse simultaneously.The resulting optimization problem cannot be dealt with by single regularization.It is most naturally expressed by the combination of more than one regularization function, such as compounding total variation and sparseness-inducing penalty [24].
In this paper, we consider a new compound regularization, that is, linear combination of TV+TV 2 regularization and sparseness-inducing ℓ  (0 <  < 1) quasinorm.Considering the presence of varying directional features in image, we employ the shearlet transform to preserve the abundant geometrical information of the image.As mentioned above, TV+TV 2 regularization can effectively attenuate the staircase effect of TV regularization.The nonconvex sparsenessinducing penalty approach increases robustness to noise and image nonsparsity.

Preliminary
In this section, we will briefly review TV+TV 2 regularization and shearlet transform to make this paper self-contained.

TV+TV 2
Regularization.In image restoration model (2), the most frequently chosen regularization term is TV seminorm where |∇| =: √|  | 2 + |  | 2 .It is well-known that TV seminorm is tuned towards the preservation of edges and performs very well if the reconstructed image is piecewise constant.However, as an image does not only consist of flat regions and jumps but also possess slanted regions, that is, piecewise linear parts, such regularization procedure will cause undesired staircase effect.One way to reduce this staircase effect is in fact to "round off " TV seminorm by using Huber regularization [25].However, such procedure can only eliminate the staircase effect in areas with small gradient.Another way of improving TV seminorm is the introduction of higher-order derivatives in the regularization function.Especially in recent years, higher-order versions of nonsmooth image enhancing methods have been considered.
To reduce the staircase effect of TV regularization, Chambolle and Lions [4] understood the image  as an addition of two image layers, that is,  =  1 +  2 ( 1 is the piecewise constant part of  and  2 is the piecewise smooth part), and proposed the inf-convolution regularization function: where The former is TV regularization and the latter is TV 2 regularization, which is total variation of the gradient.In [3,7,[26][27][28] the authors also consider using TV 2 regularization.
Following the idea of the inf-convolution regularization, Chan et al. proposed the following regularization [29]: where the Hessian ∇ 2 used in the inf-convolution regularization is replaced by the Laplacian Δ.
Another attempt to combine first-and second-order regularization originates from Chan et al. [2], who consider TV regularization together with weighted versions of the Laplacian.More precisely, they proposed the following regularization: where (⋅) must be a function with certain growth conditions at infinity in order to allow jumps.It is particularly worth mentioning that, in [10], Bredies et al. proposed the following total generalized variation (TGV), which is recognized as the currently best higher-order regularization function: where the variable V = (V 1 , V 2 ) varies in the space of symmetric tensors of order 2 and The TGV regularization generalizes the inf-convolution regularization in the sense that it coincides with the infconvolution regularization when V only varies in the range of ∇.The TGV regularization improves the results for image denoising over TV regularization and inf-convolution regularization.The interested reader can consult [10] for details.
More recently, in [9], the authors choose the following combined first-and second-order regularization function: where  1 : R 2 → R + and  2 : R 4 → R + are all convex function with at most linear growth at infinity.When one considers  1 () = || and  2 () = ||, the regularization procedure essentially is a combination of the total variation and the total variation of the gradient; it is referred to as TV+TV 2 regularization.The idea of TV+TV 2 regularization is to regularize with a fairly large weight  in the firstorder term-preserving the jumps as good as possible-and use a small weight  for the second-order term such that staircase artifacts created by the first-order regularization are eliminated without introducing any serious blur in the reconstructed image.
In [9], the authors show that for image denoising, deblurring, and inpainting TV+TV 2 regularization offers solutions whose quality is not far off from the ones produced by TGV regularization.Moreover, the computational effort needed for its numerical solution is not much more than the one needed for solving TV regularization [30].For comparison the numerical solution for TGV regularization is in general about ten times slower than TV+TV 2 regularization.
In addition, it is well-known that TV regularization is not optimal for the restoration of textured regions.In recent years directional representation systems were proposed to cope with curved singularities in images.In particular, curvelets and shearlets provide an optimally sparse approximation in the class of piecewise smooth functions with  2 singularity boundaries.In the next subsection, we will briefly review the discrete shearlet transform that is suited as sparsity operator for the restoration of curved structures.

Shearlet Transform.
The classical wavelet transform is unable to provide additional information about the geometry of the set of singularities of image.This is due to the fact that this transform is isotropic.As a result, it has a very limited ability to resolve edges and other distributed discontinuities which usually are very important for image.Shearlet transform is a directional representation system that provides more geometrical information for multidimensional functions such as images [31].
is generated by applying the operations of dilation, shear transformation, and translation on : The continuous shearlet transform of function  ∈  2 (R 2 ) is defined as The shearlet transform is invertible if the function  satisfies the admissibility property where ψ is Fourier transform of .

Mathematical Problems in Engineering
With these notations the shearlet becomes  ,, :=    ,  ,  () = ( −1    −1  , ( −   )).We select shearlet in this paper due to its directional sensitivity, availability of efficient implementation.Shearlet transform is by far the most optimal sparsifying transform for piecewise smooth image containing rich geometric information such as edges, corners, and spikes.It combines the power of multiscale methods with the ability of extracting geometry of image.

Our Model and Algorithm
where 0 <  < 1.This is a nonconvex optimization problem.One approach to handle this kind of nonconvex optimization problem is the iteratively reweighted least-squares (IRLS) approach which was previously taken by Rao and Kreutz-Delgado [32], but with unimpressive results due to getting trapped in local minima.The alternative herein proposed is to use an operator splitting method and the Bregman iteration method, as well as a novel -shrinkage operator to solve (15).
To facilitate the discussion of our algorithm, without the loss of generality, we represent a gray image as  ×  matrix, and the Euclidean space R × is denoted as , so  ∈ ,  ∈ .The linear operator A is a mapping A :  → .Using the above notations, the model (15) We also need the discrete divergence operator div :  2 →  that has the adjointness property This property forms the discrete analogue of integration by parts.For  = ( 1 ,  2 ) ∈  2 we define where  −  and  −  are the following backward difference operators: Analogously we define the second-order discrete differential operators   ,   , and   .These are just the corresponding compositions of the first-order operators: One can easily check that As before, we also need the discrete second-order divergence operator div 2 :  4 →  with the adjointness property For a  = ( where  −  =  +  ,  −  =  +  , and  −  =  −  with Remark 1.By choosing periodic boundary conditions, the action of each of the discrete differential operators can be regarded as a circular convolution of u and allows the use of fast Fourier transform, which our algorithm is relying on.

Constrained Formulation for Our Model.
In the following we use what is frequently called the split Bregman iteration to solve (16); see [33,34].To do so, we firstly formulate the unconstraint minimization problem ( 16) as a constraint problem and then apply the Bregman iteration to solve it.We introduce auxiliary variables Here, we define Note that the problems ( 16) and ( 27) are equivalent.The Bregman iteration that corresponds to (27) is the following: where ( Subproblem 2: Subproblem 3: Subproblem 4: We now describe how we solve each of the four subproblems ( 33)-(36).

To
2 ) = 0. ( The th subband of shearlet transform of  is able to be efficiently implemented as component-wise multiplication with a mask matrix denoted by   in frequency domain: with    = F −1   F.Here F and F −1 are the 2D discrete Fourier transform and inverse Fourier transform, respectively.Now by solving the above optimality condition, we get Taking into account that the discrete differential operators act like circular convolutions, we can use the 2D discrete Fourier transform to solve system (39) which has a closedform solution where "⋅" means pointwise multiplication of matrices.Consider and   is the right hand side of (39).The term FD needs to be calculated only once at the beginning of the algorithm.Thus, we have 3.4.To Solve Subproblems 2 and 3. Subproblems 2 and 3 are similar and the solutions can be explicitly expressed in the form of shrinkage where Shink(, ) = sgn() max(|| − , 0).

3.5.
To Solve Subproblem 4. Now we present the -shrinkage operator to solve subproblem 4. To avoid nondifferentiability caused by the ℓ  penalty term in subproblem 4, we consider its smooth approximation problem where   (⋅) is defined by for  > 0 and any real .Here the parameters  and  are chosen to make   (⋅) a  1 function, and  will be chosen shortly.For convenience we adopt the unusual convention that, when  = 0, ||  / will mean log(||).
Next, we show that there is a function   satisfying Let () is a smooth approximation of ‖‖   ; it still is nonconvex, but by taking  ≥  1/(−2) , above the auxiliary function   () is convex.Simple manipulation shows that, for   and   to satisfy (46), it is necessary and sufficient to have   =  *  , where  *  is the conjugate (or Legendre-Fenchel transform) of   defined as which shows how to construct   from   through computing   from   .Simple manipulations then show that (46) is satisfied by Moreover, the minimizer of (46) will be given by  * = ∇  (), which can be shown to be given by what we call the shrinkage operator as follows: This shrinkage operator will not be necessary to compute   () explicitly.We now incorporate the above into our algorithm.By using the half-quadratic technique originally introduced by Geman and Yang in [35], the approximation problem (45) becomes an equivalent augmented problem as follows: where  = ( 1 ,  2 , . . .,   ) ∈   .By letting  → ∞,  is forced to approach , making the solution to (51) approach that of (45).There is no interaction between  and  in (51), so (51) can become the following two subproblems which are alternately computed: Ma et al. [36] use a continuation approach to solve (52), starting with small  values and then increasing them geometrically, using the solution at each stage to initialize the next stage.Instead, we adopt the Bregman iteration approach of Goldstein and Osher [33].

Numerical Experiments
In this section, we will perform some image restoration experiments using our proposed TV+TV 2 +ℓ  sparsenessinducing penalty model (16).The image restoration scenarios will be denoising, deblurring, and reconstruction from incomplete data.We will use Algorithm 1 to solve (16).We compare our method with the one proposed in [30] (TV method) and the one proposed in [9] (TV+TV The quality of the restoration results with different methods is compared quantitatively by using the peak signalto-noise ratio (PSNR), the mean-squared error (MSE), and the structural similarity index (SSIM).They are defined as follows: PSNR = 10 ⋅ log 10 ( 255 × 255 MSE ) , where  and û are the expected image of size  ×  and the restored one, respectively, and where  1 and  2 are averages of  and û, respectively,  1 and  2 are the variance of  and û, respectively, and  12 is the covariance of  and û.The positive constants  1 and  2 can be thought of as stabilizing constants for near-zero denominator values.The SSIM is a well-known quality metric used to measure the similarity between two images.The SSIM metric was developed by Wang et al. [37] and is based on three specific statistical measures that are much closer to how the human eye perceives differences between images.The SSIM index also assesses the conservation of the structural information of the reconstructed image.Note that the perfect reconstruction would have SSIM value equal to 1.In the following experiments, we will use SSIM map to reveal areas of high or low similarity between two images, the whiter SSIM map, and the closer between the two images.

Denoising Experiments.
In our denoising implementation, the linear operator A equals the identity.We perform     experiments to images that are corrupted a white Gaussian noise with standard deviation .Note that, for  ̸ = 0,  = 0, and  = 0, our model ( 16) corresponds to the classical Rudin-Osher-Fatemi denoising model [30], and, for  ̸ = 0,  ̸ = 0, and  = 0, it corresponds to the TV+TV 2 restoration model [9].In the following we will examine whether the introduction of the higher-order term ( ̸ = 0) and the nonconvex sparsenessinducing penalty term ( ̸ = 0) in the denoising procedure produces results of higher quality.
We test our method on the cropped Lena image of size 256 × 256.The image is contaminated with Gaussian noise of variance 0.05.For comparison, we report the results of our proposed method (taking parameters  = 0.02,  = 0.01,  = 0.05,  1 = 0.2,  2 = 0.1,  3 = 0.0001, and  = 1000), the TV regularization method (taking parameters  = 0.02,  = 0,  = 0,  1 = 0.2,  2 = 0, and  3 = 0), and the TV+TV 2 regularization method (taking parameters  = 0.02,  = 0.01,  = 0,  1 = 0.2,  2 = 0.1, and  3 = 0).In general, we have found that  1 = 10,  2 = 10, and  ≥ 0.5 usually result in good convergence.Iterations are terminated when the condition ‖ +1 −   ‖ 2 /‖ +1 ‖ 2 ≤ 10 −3 is met.The denoising results by all three methods are shown in Figures 1 and 2. For better visualization, we include the middle row slices and the patch 3D profile.Note that the nonconvex parameter is  = 0.5 for our proposed method.We also have found that the convergence speed will become slow if  < 0.5, and the results are roughly the same if  ≥ 0.5.The following deblurring and reconstructing experiments are also this kind of circumstance.Compared with the other methods, the proposed method yields better visual results since it is able to preserve edges and fine features and provide more "natural-looking" images.

Deblurring Experiments.
In our deblurring implementation, A denotes a circular convolution with a discrete approximation of a Gaussian kernel of size 11 × 11.As an additional degradation factor, the initial image is corrupted by additive Gaussian noise of variance 0.001.The quality of the resulting images is measured in terms of an increase in the signal-to-noise ratio (ISNR) measured in dB.The ISNR is defined as where MSE 1 and MSE 2 are the mean-squared error between the degraded and the original image and the restored and the original image, respectively.The test image takes the Goldhill image of size 256 × 256.In this experiment, we take  = 0.00001,  = 0.00002,  = 0.001,  1 = 0.0001,  2 = 0.0002,  3 = 0.0001, and  = 5000 for our method,  = 0.00001,  = 0,  = 0,  1 = 0.0001,  2 = 0, and  3 = 0 for the TV method, and  = 0.00001,  = 0.00002,  = 0,  1 = 0.0001,  2 = 0.0002, and  3 = 0 for the TV+TV 2 model.As the previous experiments, in these deblurring experiments, we still take  = 0.5 for our proposed method.For visual appreciation, the corresponding restoration results are reported in Figures 3 and 4. The results show the consistent advantage of the proposed method over all the others.The TV solution is characterized by sharper edges, as expected, but also results in staircase effects appearing in the smoother regions of the image.The TV+TV 2 solution produces the smoother results and loses some details of the image.The result of our method achieves better visual effect and quantitative index than other methods.

Reconstructing Experiments from Incomplete
Data.Now we show the performance of the proposed method on the incomplete spectral data.Here, the linear operator A denotes the incomplete measurements on the Fourier transform domain of image.To simulate the incomplete measurements,

Figure 7 :
Figure 7: Reconstruction performance (Iterations versus MSE) comparisons on two test images.
, . ..,   3, ) ∈   , and  1 ,  2 ,  3 are positive constants.It is difficult to solve the minimization problem (29) directly; thus we split it into four separate subproblems for , V, , and , each of which can be solved fast as follows.
Solve Subproblem 1. Subproblem 1 can be solved through its optimality condition.The optimality condition reads as follows: 2 method).All computations of the present paper are carried out in Matlab 7.0.The results are obtained by running the Matlab codes on an Intel(R) Core (TM) i3 CPU (3.20 GHz, 3.19 GHz) computer with RAM of 1.92 GB.