Image Denoising via L0 Gradient Minimization with Effective Fidelity Term

The L0 gradient minimization (LGM) method has recently been proposed for image smoothing. As an improvement over the total variation (TV) model, which employs the L1 norm of the gradient, the LGM model yields much better results for piecewise constant images. However, like the TV model, the LGM model also suffers, even more severely, from the staircase effect, and it is inefficient at preserving image texture. To overcome these drawbacks, in this paper we propose to introduce an effective fidelity term into the LGM model. The fidelity term is an exemplar of the moving least squares method using a steering kernel. Under this framework, the two methods benefit from each other and produce better results. Experimental results show that the proposed scheme is promising compared with state-of-the-art methods.


Introduction
Noise is inevitable in the process of image acquisition and transmission, and it greatly hampers subsequent image analysis; image denoising is therefore one of the most fundamental research topics in image processing and computer vision. However, denoising algorithms always face a dilemma: removing noise while preserving edges. Almost all methods focus on striking a tradeoff between smoothing noise and blurring edges. State-of-the-art denoising algorithms can be categorized as (1) those exploiting the nonlocal similarity of patches in the image, such as the nonlocal means (NL-Mean) [1], BM3D [2], and PLOW [3]; in [4], the author presented a tutorial on these state-of-the-art denoising methods, showing that LARK [5] takes the bilateral filter [6] and the nonlocal means (NLM) [1] as special cases and that they are closely related to anisotropic diffusion [7]; (2) variational and partial differential equation (PDE) based methods, such as anisotropic diffusion [7], total variation [8], and related works [9-14]; and (3) sparse-representation-based methods, such as K-SVD [15] and nonlocal sparse models [16, 17]. There has also been a flurry of work generalizing the variational methods in a nonlocal manner, such as [18, 19].
Partial differential equations (PDEs) have proved to be effective tools for image smoothing over the last two decades, achieving a good tradeoff between noise removal and edge preservation. These approaches take the form of an unconstrained regularized data-fitting model, where the desired image is obtained as the minimizer of a functional containing both regularization and fidelity terms. One of the most popular methods in this framework is the total variation (TV) method [8, 11, 13, 14, 20, 21], which can be described as an unconstrained problem with the L1 norm of the gradient as the regularization term. Recently, Xu et al. proposed a modified scheme of the TV model that replaces the L1 norm of the gradient with the L0 norm in the optimization framework for image smoothing, that is, L0 gradient minimization (LGM) [22]. Compared with the TV model using the L1 norm, the LGM model has proved very effective at preserving edges because it avoids local filtering and averaging operations. However, the LGM scheme tends to smooth the observed image towards a piecewise constant image, so it suffers, even more severely than the TV model, from the staircase effect, and it removes most of the texture when applied to textured images.
Meanwhile, a nonparametric data-fitting approach based on the localized least squares method has been proposed for image processing [23, 24]. This framework is represented by the moving least squares (MLS) model [23], where two kinds of robust estimators were applied to MLS for image processing. In [5], a steering kernel regression was proposed that steers the local kernels along the directions of the local edge structure; the method is typically called locally adaptive kernel regression (LARK). It is the exemplar of the moving least squares method using a steering kernel. In [24], Lee et al. combined MLS and total variation for image denoising. The MLS method using a steering kernel can preserve texture and edges well. However, such models are weak against outliers, which is the main reason that the denoising performance of MLS-based models is usually not good.
In order to overcome the drawbacks of the LGM model, in this paper we introduce the MLS model with a steering kernel into the LGM model as the fidelity term. The proposed scheme provides a better solution than the conventional LGM scheme for overcoming the staircase effect and preserving texture. The proposed model also endows the MLS model with better denoising performance, since LGM is robust against outliers.
The remainder of this paper is organized as follows. In Section 2, the LGM and MLS models are briefly introduced. In Section 3, the proposed model is presented and its numerical solution is given. In Section 4, the performance of the proposed model is demonstrated by experiments and comparisons. Finally, conclusions are drawn in Section 5.

Background
In this section, we briefly review the related works, namely, LGM [22] and MLS with a steering kernel (LARK) [5]; our description follows that of [5, 22].

2.1. LGM: L0 Gradient Minimization. Suppose g is the input image and S(x) the smoothed image, and let ∂_x S(x_p) and ∂_y S(x_p) denote the partial derivatives computed between neighboring pixels along the x and y directions, respectively. Then the gradient ∇S(x_p) = (∂_x S(x_p), ∂_y S(x_p))^T can be obtained for each pixel. The gradient measure of L0 gradient minimization is expressed as follows:

$$ C(S) = \#\left\{ p : \left| \partial_x S_p \right| + \left| \partial_y S_p \right| \neq 0 \right\}, \qquad (1) $$

where #{·} is the counting operator: it counts the pixels x_p whose gradient magnitude |∂_x S_p| + |∂_y S_p| is not zero, that is, the L0 norm of the gradient. Note that the measure C(S) should be combined with a general constraint: the result S(x) should be structurally similar to the input image g(x) [22]. The specific objective function is expressed as

$$ \min_{S}\ \sum_{p} \left( S_p - g_p \right)^2 \quad \text{s.t.} \quad C(S) = k, \qquad (2) $$

where C(S) = k indicates that k nonzero gradients exist in the result. Equation (2) can be further written as an unconstrained optimization problem as follows:

$$ \min_{S}\ \sum_{p} \left( S_p - g_p \right)^2 + \lambda\, C(S), \qquad (3) $$

where λ is a weight directly controlling the significance of C(S).
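As a concrete illustration, the L0 gradient measure and the unconstrained LGM objective can be evaluated directly. Below is a minimal NumPy sketch; the function names and the forward-difference scheme are our assumptions, not code from [22]:

```python
import numpy as np

def l0_gradient_measure(S):
    """Count pixels with a nonzero forward-difference gradient (the C(S) measure)."""
    dx = np.diff(S, axis=1, append=S[:, -1:])  # horizontal differences
    dy = np.diff(S, axis=0, append=S[-1:, :])  # vertical differences
    return np.count_nonzero(np.abs(dx) + np.abs(dy))

def lgm_objective(S, g, lam):
    """Unconstrained LGM objective: quadratic data fit plus lambda * C(S)."""
    return np.sum((S - g) ** 2) + lam * l0_gradient_measure(S)

# A piecewise constant image has few nonzero gradients:
img = np.zeros((8, 8))
img[:, 4:] = 1.0
print(l0_gradient_measure(img))  # only the column of edge pixels counts
```

Note that C(S) depends only on how many pixels carry a nonzero gradient, not on the gradient magnitudes, which is why LGM favors piecewise constant results.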

2.2. MLS: Moving Least Squares. Suppose that the observed image and the estimated image are, respectively, discrete samplings of functions g(x): Ω → ℝ and f(x): Ω → ℝ at an equally spaced point set in a rectangle Ω ⊂ ℝ². The task of image restoration is to estimate the image f(x) at each point x ∈ Ω given the low-quality observations g(x). In the MLS method, f(x) is drawn from a polynomial space of degree n and dimension m, denoted by Π_n^m. The relationship between degree and dimension is m = (1/2)(n + 2)(n + 1), where the dimension m determines the number of basis functions {p_j}_{j=0}^{m-1} of the polynomial space Π_n^m. For example, if n = 2, then m = 6, and we take this combination in our implementation; the corresponding basis is {p_j}_{j=0}^{5} = {1, x, y, x², xy, y²}. Therefore, the regression function at any point x is given by

$$ f(\mathbf{x}) = \sum_{j=0}^{5} \beta_j\, p_j(\mathbf{x}), \qquad (4) $$

where {β_j}_{j=0}^{5} are the coefficients. The MLS method operates locally: the value of the estimated image at a location is influenced only by the pixels within a small neighborhood of that position. As such, MLS provides a rich mechanism for computing pointwise estimates of the function with minimal assumptions about the global signal or the noise model. Let i = 1, 2, …, N index the sampling points around the estimated point x; usually N has to be larger than m so that the problem is well posed. Then, for any point x, the polynomial approximation f(x) is obtained by solving the following quadratic minimization problem:

$$ \min_{\{\beta_j\}}\ \sum_{i=1}^{N} K(\mathbf{x}_i - \mathbf{x}) \left( g(\mathbf{x}_i) - f(\mathbf{x}_i) \right)^2, \qquad (5) $$

where K(x_i − x) denotes the kernel function, which decides how much the data point g(x_i) contributes to the estimated pixel value: if K(x_i − x) is large, g(x_i) contributes strongly. In our proposed method, we employ the steering kernel function [5], detailed in the next subsection.
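To make the MLS estimate concrete, the following sketch fits a degree-2 bivariate polynomial at one point by weighted least squares, using a plain isotropic Gaussian kernel as a stand-in for the steering kernel; the function and variable names are ours, not from [23]:

```python
import numpy as np

def mls_estimate(coords, values, x0, h=1.0):
    """Weighted least-squares fit of a degree-2 bivariate polynomial at x0.

    coords: (N, 2) sample positions; values: (N,) observations; N must exceed 6.
    Returns the fitted value f(x0) = beta_0 (the constant coefficient).
    """
    d = coords - x0                       # shift so x0 is the origin
    x, y = d[:, 0], d[:, 1]
    # Basis {1, x, y, x^2, xy, y^2}: degree n = 2 gives m = 6 functions.
    Phi = np.stack([np.ones_like(x), x, y, x**2, x*y, y**2], axis=1)
    w = np.exp(-np.sum(d**2, axis=1) / (2 * h**2))  # isotropic Gaussian weights
    W = np.diag(w)
    beta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ values)
    return beta[0]                        # only p_0 = 1 is nonzero at the origin
```

Because any degree-2 polynomial lies in the fitting space, such a signal is reproduced exactly at x0 regardless of the choice of weights.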

2.3. The Steering Kernel. A typical choice of kernel is the isotropic Gaussian kernel, which computes a weighted average of pixel values in the neighborhood, with weights decreasing with distance from the neighborhood center. However, the isotropic Gaussian kernel suffers from a severe limitation: it usually blurs the edge structures in the image. The bilateral kernel, a typical data-adapted kernel, preserves edge structures better by introducing the pixel value into the weight. However, by factoring into separate spatial and radiometric terms, the bilateral kernel ignores correlations between the positions of the pixels and their values, which weakens its performance. The steering kernel proposed in [5] measures the local structure of the data by making use of an estimate of the local geodesic distance between nearby samples, so it can preserve image edges and texture well. The steering kernel is represented as

$$ K(\mathbf{x}_i - \mathbf{x}) = \frac{\sqrt{\det \mathbf{C}_i}}{2 \pi h^2 \mu_i^2} \exp\!\left( - \frac{ (\mathbf{x}_i - \mathbf{x})^T \mathbf{C}_i (\mathbf{x}_i - \mathbf{x}) }{ 2 h^2 \mu_i^2 } \right), \qquad (6) $$

where μ_i is a scalar that captures the local density of data samples (nominally set to μ_i = 1), h is the global smoothing parameter, and C_i is the symmetric covariance matrix of the gradients of the sample values estimated from the given data, yielding an approximation of the local geodesic distance in the exponent of the kernel. This kernel is closely related to, but somewhat more general than, the Beltrami kernel [25] and the coherence-enhancing diffusion approach [26, 27].
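The steering matrix C_i is typically built from the covariance of local gradients. The sketch below is our simplified version of that construction (the regularized decomposition of [5] is omitted; names and the regularization constant are our assumptions):

```python
import numpy as np

def steering_matrix(gx_patch, gy_patch, reg=1e-2):
    """Estimate C_i as the (regularized) covariance of gradients in a local window."""
    G = np.stack([gx_patch.ravel(), gy_patch.ravel()], axis=1)
    return G.T @ G / len(G) + reg * np.eye(2)  # regularize to keep C invertible

def steering_weight(dx, C, h=2.0, mu=1.0):
    """Steering kernel value for a spatial offset dx (a 2-vector) under matrix C."""
    norm = np.sqrt(np.linalg.det(C)) / (2 * np.pi * h**2 * mu**2)
    return norm * np.exp(-dx @ C @ dx / (2 * h**2 * mu**2))
```

Near a strong vertical edge the gradients point horizontally, so C penalizes horizontal offsets: the kernel stretches along the edge direction and decays quickly across it, which is exactly the edge-preserving behavior described above.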

Proposed Method
Since LGM suffers from a serious staircase effect and removes most details, we introduce an effective fidelity term that is an exemplar of the moving least squares method using a steering kernel. The proposed model is then written as

$$ \min_{S}\ \lambda\, C(S) + \sum_{\mathbf{x}} \sum_{i=1}^{N} K(\mathbf{x}_i - \mathbf{x}) \left( g(\mathbf{x}_i) - f(\mathbf{x}) \right)^2, \qquad (7) $$

where λ and C(S) are identical to those in (3) and the second part of (7) is the same as that in (5); that is, we replace the fidelity term in (3) with the weighted sum in (5), so that the LGM in (3) can preserve texture better. The numerical methods are borrowed from [5, 22, 28]. Specifically, two auxiliary variables h_p and v_p are introduced for the gradient components ∂_x S_p and ∂_y S_p and are iteratively updated; therefore, (7) is reformulated as

$$ \min_{S, h, v}\ \lambda\, \#\left\{ p : |h_p| + |v_p| \neq 0 \right\} + \sum_{\mathbf{x}} \sum_{i=1}^{N} K(\mathbf{x}_i - \mathbf{x}) \left( g(\mathbf{x}_i) - f(\mathbf{x}) \right)^2 + \beta \sum_{p} \left( \left( \partial_x S_p - h_p \right)^2 + \left( \partial_y S_p - v_p \right)^2 \right), \qquad (8) $$

where β is an automatically adapting parameter that controls the similarity between the variables (h, v) and the corresponding gradients. The specific solution is to split (8) into two subproblems, finding (h, v) and S, respectively, in an alternating minimization manner.
Subproblem 1 (Computing (h, v)). Removing from (8) the terms not involving (h, v), the objective function is written in the following form:

$$ \min_{h, v}\ \sum_{p} \left( \left( \partial_x S_p - h_p \right)^2 + \left( \partial_y S_p - v_p \right)^2 \right) + \frac{\lambda}{\beta}\, \#\left\{ p : |h_p| + |v_p| \neq 0 \right\}. \qquad (9) $$

This subproblem is similar to that in LGM and can be easily solved by spatial decomposition, so that each element (h_p, v_p) can be estimated individually. Since (9) decomposes over pixels, it can be further written as

$$ \min_{h, v}\ \sum_{p} E_p(h_p, v_p), \qquad (10) $$

where

$$ H(z) = \begin{cases} 1, & z \neq 0, \\ 0, & z = 0. \end{cases} \qquad (11) $$

The energy for each element (h_p, v_p) reads

$$ E_p(h_p, v_p) = \left( \partial_x S_p - h_p \right)^2 + \left( \partial_y S_p - v_p \right)^2 + \frac{\lambda}{\beta}\, H\!\left( |h_p| + |v_p| \right), \qquad (12) $$

and each element (h_p, v_p) has the following closed form:

$$ (h_p, v_p) = \begin{cases} (0, 0), & \left( \partial_x S_p \right)^2 + \left( \partial_y S_p \right)^2 \le \lambda / \beta, \\ \left( \partial_x S_p, \partial_y S_p \right), & \text{otherwise}. \end{cases} \qquad (13) $$

Subproblem 2 (Computing S). When calculating S, the terms not involving S are removed from (8). Then the functional is written in the following form:

$$ \min_{S}\ \sum_{\mathbf{x}} \sum_{i=1}^{N} K(\mathbf{x}_i - \mathbf{x}) \left( g(\mathbf{x}_i) - f(\mathbf{x}) \right)^2 + \beta \sum_{p} \left( \left( \partial_x S_p - h_p \right)^2 + \left( \partial_y S_p - v_p \right)^2 \right). \qquad (14) $$

Since f(x) takes the form in (4), our goal is to obtain the estimated regression coefficients b = [β_0, β_1, …, β_5]^T, and it is possible to express (14) as a weighted least squares optimization problem [23, 24]. Let

$$ \mathbf{w}_{\mathbf{x}} = \operatorname{diag}\!\left( K(\mathbf{x}_1 - \mathbf{x}), \ldots, K(\mathbf{x}_N - \mathbf{x}) \right), \qquad \Phi_{\mathbf{x}} = \left[ p_j(\mathbf{x}_i - \mathbf{x}) \right]_{i = 1, \ldots, N;\ j = 0, \ldots, 5}, \qquad \mathbf{g}_{\mathbf{x}} = \left[ g(\mathbf{x}_1), \ldots, g(\mathbf{x}_N) \right]^T. \qquad (15) $$

Noting that the polynomial gradients at the center point reduce to ∂_x f(x) = β_1 and ∂_y f(x) = β_2, the local energy E(b) can be rewritten in matrix form as follows:

$$ E(\mathbf{b}) = \left( \mathbf{g}_{\mathbf{x}} - \Phi_{\mathbf{x}} \mathbf{b} \right)^T \mathbf{w}_{\mathbf{x}} \left( \mathbf{g}_{\mathbf{x}} - \Phi_{\mathbf{x}} \mathbf{b} \right) + \beta \left( \mathbf{b} - \mathbf{c}_{\mathbf{x}} \right)^T \mathbf{D} \left( \mathbf{b} - \mathbf{c}_{\mathbf{x}} \right), \qquad (16) $$

where

$$ \mathbf{D} = \operatorname{diag}(0, 1, 1, 0, 0, 0), \qquad \mathbf{c}_{\mathbf{x}} = \left[ 0, h_{\mathbf{x}}, v_{\mathbf{x}}, 0, 0, 0 \right]^T. \qquad (17) $$

Taking the derivative of E(b) with respect to b,

$$ \frac{\partial E(\mathbf{b})}{\partial \mathbf{b}} = -2\, \Phi_{\mathbf{x}}^T \mathbf{w}_{\mathbf{x}} \left( \mathbf{g}_{\mathbf{x}} - \Phi_{\mathbf{x}} \mathbf{b} \right) + 2 \beta\, \mathbf{D} \left( \mathbf{b} - \mathbf{c}_{\mathbf{x}} \right), \qquad (18) $$

and setting it to zero, we have

$$ \mathbf{b} = \left( \Phi_{\mathbf{x}}^T \mathbf{w}_{\mathbf{x}} \Phi_{\mathbf{x}} + \beta \mathbf{D} \right)^{-1} \left( \Phi_{\mathbf{x}}^T \mathbf{w}_{\mathbf{x}} \mathbf{g}_{\mathbf{x}} + \beta \mathbf{D} \mathbf{c}_{\mathbf{x}} \right). \qquad (19) $$

Then, we can get the estimate of S(x) as follows:

$$ S(\mathbf{x}) = \mathbf{e}_1^T \mathbf{b} = \beta_0, \qquad (20) $$

since all basis functions except p_0 vanish at the center point. The algorithm is presented in Algorithm 1.

Algorithm 1.
(1) Input: noisy image g, parameters λ and κ, iteration number IT.
(2) Initialization: S^(0) = g, β = 2λ; calculate ∇S^(0).
(3) For t = 1, …, IT, do
 Step 1. Estimate h^(t) and v^(t) in (13) with S^(t−1).
 Step 2. With h^(t) and v^(t), solve for S^(t): for each pixel location x, do (i) construct the weight matrix w_x; (ii) calculate the regression coefficients with (19) and update the estimate S^(t)(x) with (20).
 Step 3. Update β ← κβ.
(4) Output: the filtered image S^(IT).
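The closed-form (h, v) update in Subproblem 1 is a per-pixel hard threshold on the squared gradient magnitude. A minimal sketch follows; the function name and the forward-difference scheme are our assumptions:

```python
import numpy as np

def update_hv(S, lam, beta):
    """Closed-form (h, v) update: keep the gradient where its squared
    magnitude exceeds lam/beta, otherwise set it to zero."""
    dx = np.diff(S, axis=1, append=S[:, -1:])  # horizontal forward differences
    dy = np.diff(S, axis=0, append=S[-1:, :])  # vertical forward differences
    keep = (dx**2 + dy**2) > lam / beta        # per-pixel threshold test
    return np.where(keep, dx, 0.0), np.where(keep, dy, 0.0)
```

Small noise-induced gradients fall below λ/β and are zeroed, while strong edges survive; as β grows over the iterations, the threshold λ/β shrinks and (h, v) gradually approaches the true gradient of S.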

Experimental Results
In this section, we demonstrate the performance of the proposed method and compare it with several state-of-the-art methods, including NL-Mean [1], LARK [5], BLF [6], TV [8], K-SVD [15], LGM [22], and LMMSE [29]. Eight images are employed for testing, as shown in Figure 1, and a noisy image is generated by adding white Gaussian noise with standard deviation σ = 20 to the clean one. The peak signal-to-noise ratio (PSNR) and the mean structural similarity (MSSIM) [30] are employed as objective indices to evaluate the quality of the filtered images. The MSSIM ranges from 0 to 1 and equals 1 if the filtered image is identical to the noise-free one. Since λ is a weight directly controlling the significance of C(⋅), it is in fact a smoothing parameter: a large λ makes the result have few edges. Since the "House" and "Test" images are nearly piecewise constant, the parameter λ is relatively small and set to 1.0e2; the other images possess texture, and for them λ is set to 1.0e4. The parameter β is multiplied by the parameter κ in each iteration to speed up convergence [28], and κ is 1.05 in all experiments. All the parameters of the other methods are set to the values suggested as optimal in their original papers.
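For reference, the PSNR index used above can be computed as follows; this is the standard definition for images with a given peak value, not code from the paper:

```python
import numpy as np

def psnr(clean, noisy, peak=255.0):
    """Peak signal-to-noise ratio in decibels between two images."""
    diff = clean.astype(np.float64) - noisy.astype(np.float64)
    mse = np.mean(diff ** 2)               # mean squared error
    return 10.0 * np.log10(peak**2 / mse)  # higher is better
```

A difference of the full dynamic range at every pixel gives 0 dB, and the value grows by 20 dB for every tenfold reduction in the per-pixel error.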
The PSNR and MSSIM indices of the eight models on the eight images are reported in Table 1, from which one can conclude that the proposed method performs the best in terms of both indices. This observation clearly verifies the effectiveness of incorporating the MLS-based fidelity term into the framework of the LGM model. The proposed method performs comparably to the LARK and K-SVD methods in terms of PSNR and MSSIM; however, the LARK method is visually inferior to the proposed one, so let us further inspect the filtered images. The results of the eight methods on the House image are shown in Figure 2; only a part of the House image is displayed for the sake of clarity. Visually, the TV model suffers from blocky artifacts (see Figure 2(b)), and the LMMSE method blurs image edges (Figure 2(c)). The results of the BLF and NL-Mean models are somewhat foggy and misty, the BLF model more severely so (see Figure 2(d)), and the foggy results are visually unpleasant. The result of the LARK method is shown in Figure 2(f), where there are serious flow-like structures around the edges. The results of the K-SVD and LGM methods are presented in Figures 2(g) and 2(h), respectively; there is impulse-like noise in the result of the LGM method. The result of the proposed method in Figure 2(i) is more satisfactory than that of the LARK method, although the corresponding PSNR and MSSIM indices are comparable.
Since the proposed method employs the MLS-based model as the fidelity term in the LGM model, as seen in Table 1 and Figure 2, this strategy not only improves the PSNR and MSSIM indices but also makes the filtered results closer to the original image. Two line segments are marked in Figure 2(a), and the intensity profiles along them for the noise-free image and for the results of the LARK, LGM, and proposed methods are shown in Figure 3. The intensity profiles of the LARK results follow a trend similar to the noise-free ones; however, the intensity values are much smaller than those of the original image, which implies that the LARK method reduces image contrast. The LGM method tends to make the image intensity piecewise constant and therefore cannot preserve the original profile. In contrast to the LARK and LGM methods, the proposed one yields a result that is very close to the original profile while remaining smooth. This observation means the proposed method yields smooth results of high fidelity, so it can preserve texture very well. The filtered Eagle images and the corresponding residuals are shown in Figure 4. From the residuals of the LARK, LGM, and proposed methods, the proposed method performs the best at preserving texture.
In order to demonstrate the performance of the proposed method at different noise levels, the Pepper image is contaminated by additive white Gaussian noise with standard deviations of 20, 30, and 40, and the eight models are applied to the noisy Pepper images. The PSNR and MSSIM indices are reported in Table 2 (see Table 1 for the case of standard deviation 20). The filtered images for standard deviation 40 are shown in Figure 5. From Table 2, it is clear that the proposed method, LARK, and K-SVD perform comparably, with the proposed method the best. Moreover, the results of the proposed method are visually the best, since the LARK method suffers from flow-like artifacts and the K-SVD method suffers from artifacts due to inaccurate sparse representation.

Conclusion
In this paper, we have proposed an improvement of the L0 gradient minimization (LGM) model. The proposed model is constructed by introducing an MLS-based fidelity term into the LGM model, where the fidelity term is an exemplar of the moving least squares method using a steering kernel (LARK). The main result is that the proposed method combines the advantages of LGM and LARK, so that it can simultaneously preserve texture and resist flow-like artifacts.
Experiments have been conducted on both synthetic and real images, and comparisons have been made with classical and state-of-the-art models, namely, the TV, LMMSE, NL-Mean, BLF, LARK, K-SVD, and LGM models. We have evaluated these models in terms of the PSNR and MSSIM indices and by visual inspection. Overall, the proposed model yields promising results, and we believe that the MLS-based fidelity can also be combined with other variational methods for image filtering.
Note that the overall computational load of the proposed method is heavier than that of the model in [22], since the steering kernel has to be calculated.

Figure 1: From left to right, top to bottom, the noise-free images are Lena, Woman, Lady, Pepper, Eagle, Bird, Test, and House.

Figure 2: Demonstration and comparison on the House image: (a) noisy image and filtered images by (b) the TV model, (c) the LMMSE, (d) the BLF model, (e) the NL-Mean model, (f) the LARK model, (g) the K-SVD model, (h) the LGM model, and (i) proposed model.

Table 1: Denoising results measured in terms of PSNR (top) and MSSIM (bottom).