New Regularization Models for Image Denoising with a Spatially Dependent Regularization Parameter

We consider simultaneously estimating the restored image and the spatially dependent regularization parameter which mutually benefit from each other. Based on this idea, we refresh two well-known image denoising models: the LLT model proposed by Lysaker et al. (2003) and the hybrid model proposed by Li et al. (2007). The resulting models have the advantage of better preserving image regions containing textures and fine details while still sufficiently smoothing homogeneous features. To efficiently solve the proposed models, we consider an alternating minimization scheme to resolve the original nonconvex problem into two strictly convex ones. Preliminary convergence properties are also presented. Numerical experiments are reported to demonstrate the effectiveness of the proposed models and the efficiency of our numerical scheme.


Introduction
Image denoising is a fundamental problem in image processing and computer vision. In many real-world applications, it forms a significant preliminary step for subsequent image processing operations, such as object recognition, medical image analysis, surveillance, and many more. During acquisition and transmission, images are often corrupted by Gaussian noise. In this problem, the degradation process is modeled as where , exact , and represent the observed image, the original image, and the additive white Gaussian noise, respectively. The objective of image denoising is to compute a good estimate of exact from . To obtain a reasonable approximated solution from (1), the regularization method, generally used as a numerical technique for stabilizing inverse problems, has been increasingly applied to image denoising over the past decades. A large class of regularization based denoising methods unifies under the following framework: where denotes the restored image to be estimated. In the right-hand side of (2), ( ) represents the fidelity term, which measures the closeness of the estimate to the data. The functional ( ) is called the regularization term pushing to exhibit some a priori expected features. Parameter in (2) is known as the regularization parameter, which balances the tradeoff between the two terms.
One key topic in regularization methods is the choice of the regularizer. Among many regularization based denoising models, the total variation regularization, proposed by Rudin, Osher, and Fatemi (ROF) [1], has won tremendous success due to its edge-preserving property. In the ROF model, the restoration result is generated by solving the following minimization problem: where the TV-term ROF ( ) = ∫ Ω | | denotes the total variation of (see [2] for more details) and Ω denotes the image domain. A remarkable aspect of the ROF model is that the TV-term does not penalize the discontinuities in ; see, for example, [3]. This property allows us to restore the edges of the original image. However, the main disadvantage of the ROF model is the so-called staircase effect (smooth regions are transformed into piecewise constant regions), a phenomenon long observed in the literature [4][5][6]. As a consequence, the restoration result is unsatisfactory to the eye due to the loss of texture details and the generation of artificial edges that do not exist in the true image. To overcome this effect, models based on high-order PDEs have been proposed in the literature; see, for example, [7][8][9][10][11]. For instance, Lysaker, Lundervold, and Tai [8] proposed a fourthorder PDE model (termed as the LLT model) with the form where LLT ( ) = ∫ Ω |∇ 2 | is a fourth-order filter. From atheoretical point of view [12], it has been shown that fourthorder PDEs are superior to second-order PDEs in some aspects, including avoiding the staircase effect. However, this type of filters usually blurs the edges of the original image and suffers from the speckle effect in homogeneous regions. Since the ROF model and the LLT model are of both merits and drawbacks, it may be desirable to promote solutions that simultaneously exhibit properties that are enforced by both regularizers. This is the basic idea of [13] proposed by Lysaker and Tai, in which they studied an iterative algorithm based on combining the results generated by (3) and (4). But their method needs to solve two separate PDEs and their combination is not quite intuitive. Li et al. [14] proposed a hybrid model in the following form: where the function is chosen as an edge detector 1/(1 + + |∇ * | 2 ) to control the relative weights of the two regularizers (see [14] for details). By their selection of , the second-order filter plays the dominant role where |∇ * | is large (regions including sharp features), whereas the fourthorder filter plays the dominant role where |∇ * | is small (homogeneous regions). Therefore, (5) reaps the benefits of both regularizers. More related works on the combination of the ROF model and the LLT model can be found in [15][16][17].
Another crucial issue in the regularization process is the suitable selection of the regularization parameter. In regularization models, the regularization parameter controls the relative weights of the data fidelity and regularization terms. However, due to the inhomogeneous distribution of cartoon, textures, and small details in an image (in terms of variance), a global constant parameter may not be suitable for these features of different scales. According to the cartoon pyramid model studied in [18], as the regularization parameter goes from small to large values,      the corresponding solutions generated by the ROF model range from undersmoothed (textures are preserved, while noise remains almost unchanged in homogeneous regions) to oversmoothed (noise is reduced well, but significant details are lost). This suggests that a spatially dependent regularization parameter including, instead of a single constraint, a group of constraints adapted to different regions of the image is desired to obtain higher quality results. To this end, denoising methods with a spatially dependent regularization parameter have been extensively studied; see, for example, [18][19][20][21][22]. In these works, the automated selection strategies of the regularization parameter are based on local variance measures [18,[20][21][22] and the local statistical characteristics of the noise [21,22].
Note that, in models (3), (4), and (5), the selection of the regularization parameter never accommodates the information of the current restored image. This triggers us to seek a new approach for regularization parameter estimation. Our main motivation is to simultaneously estimate both the restored image and the regularization parameter which mutually benefit from each other during the denoising procedure. We propose a general model with the following form: where denotes a spatially dependent regularization parameter, ( ) and ( ) are, respectively, fidelity terms for the two variables, and the binary function ( , ) represents a regularizer. The advantages of (6) are presented as follows.
First, instead of a global constant regularization parameter, a spatially dependent regularization parameter is more flexible to image features of different scales. Second, since the estimation of the two variables is derived simultaneously, the regularization parameter is changing more reasonablly by exploiting the more accuracy restored image instead of Abstract and Applied Analysis the observed images corrupted by noise. Here we remark that (6) is a general-purpose model which can incorporate various classical methods. Specially we focus on the fourthorder filter in (4) and the hybrid regularizer in (5). The refresh models are named as Models 1 and 2, respectively. Model 1 can suppress the speckle effect caused by the LLT model while less overregularizing textures. Model 2 has the advantage of better restoring textures and homogeneous regions while preserving edges. To overcome the nonconvexity of our models, we utilize an alternating minimization scheme to resolve the original nonconvex problem into two strictly convex ones. Thus, our models can be asymptotically solved. The outline of the rest of the paper is as follows. In the next section, we give notations and discretizations. In Section 3, we introduce our models with some discussions. The following Section 4 presents the numerical scheme for solving the proposed models. In Section 5, numerical experiments are given to demonstrate the performance of the proposed models. Finally, we conclude the paper in Section 6.

Notations and Discretizations
From now on, we will restrict our attention to the discrete setting. We first introduce some notations. Without loss of generality, we assume that all the images in this paper are grayscale and have a square domain. Then we represent an image as an × matrix, where , represents the intensity value of at pixel ( , ), for , = 1, . . . , . For the sake of simplicity, we assume that the image is periodically extended, and then the FFT can be adopted in our algorithm. It should be pointed out that adaption to other boundary conditions is not difficult in principle. In the rest of this paper, we let ‖ ⋅ ‖ 2 , ‖ ⋅ ‖ F , and ∘ denote the 2-norm, the Frobenius norm, and the Hadamard product, respectively. Let denote the Euclidean space R × . The usual inner product and Euclidean norm of are denoted as ⟨⋅, ⋅⟩ and ‖ ⋅ ‖ , respectively. Denote by the space × equipped with the inner product ⟨⋅, ⋅⟩ leading canonically to the norm ‖ ⋅ ‖ , that is; for = ( 1 , 2 ) ∈ and = ( 1 , 2 ) ∈ , Moreover, for = ( 1 , 2 ) ∈ , | | denotes the × matrix whose element | | , is equal to ‖ , ‖ 2 with , = ( 1 , , 2 , ). We denote the space × as . The definitions of the inner product ⟨⋅, ⋅⟩ , the norm ‖ ⋅ ‖ , and | | are analogous to those of .

6
Abstract and Applied Analysis Now we introduce a discretized version of some necessary operators. For ∈ , we introduce forward and backward difference operators as follows: Second-order difference operators can be expressed by using a recursive application of first-order difference operators; that is, the operator Other second order difference operators used in this paper such as can be similarly defined. Based on the definitions above, we can introduce the discrete gradient operator. For ∈ , ∇ is a vector of , which is given by The discrete total variation of is then given by To discretize the fourth-order filter in the LLT model (4), we have to introduce the discrete Hessian operator. Here we adopt the definition in [23]. The discrete Hessian operator is a mapping : → , and for ∈ , is defined by Then, LLT ( ) can be discretized as We also introduce two important operators: div : → and * : → , that is, the adjoint operator of −∇ and Abstract and Applied Analysis , respectively. By analogy with the continuous setting, for ∈ , ∈ , and ∈ , we want them to satisfy ⟨div , ⟩ = ⟨ , −∇ ⟩ , Then they are formulated as follows: Finally, the composite operators Δ = div ⋅∇ and * are also used.

The Proposed Models and Discussion
3.1. The Proposed Models. Arising from model (6) and exploiting the benefits of the LLT model (4) and the hybrid model (5), we consider and study the following models.
Model 2. One has In the above two models, and are positive parameters, 1 represents the matrix whose elements are equal to 1, and : → denotes a discrete mean filter. More precisely, we choose an odd integer and define an -by-window 8 Abstract and Applied Analysis where , = 1, . . . , . Then for ∈ , ( ) is given by It is immediately clear that if = 1, coincides with the identity operator. We start our investigations of the proposed models by the following lemma.
Proof. We define the following function : R 2 → R: From the definition of , it is immediate to see that ( , ) = (− , − ). Thus, we have According to Lemma 1, we can rewrite the regularizers in our models in another equivalent form. In (16), for example, we rewrite Proof. First, under our assumption, both models are bounded from below, which implies that the minimization problems are the correct setting.
It is immediate to see that the functions 1 : × → R and 2 : × × → R are proper, coercive, and continuous. Then, the existence of a global minimizer is deduced by Weierstrass' theorem [24].
We end this subsection with the following proposition which describes the relationship between the variables in a solution of our models.
Proof. (i) Under our assumption, there exists some > 0 such that Then, we have which implies that * is a local minimizer of the following minimization problem: According to the nonnegativity of ( ( * ∘ * )) , , we can deduce that the objective function of (28) is strictly convex. Then, * is the unique global minimizer of (28). Similarly, we can obtain that * is the unique global minimizer of the following strictly convex minimization problem: Hence we complete the proof of (i).
(ii) Using the same technique, (ii) can be similarly proved, and we omit the details.

Discussion on the Proposed Models.
In this subsection we discuss the strengths of the proposed models by investigating the numerical behavior of the solution. Proposition 3 implies that there is an interaction between the restored image and the regularization parameter in a solution of our models, which can achieve joint optimality. In other words, both variables benefit from each other. Assume that ( * , * ) and ( * , * 1 , * 2 ) are solutions of Models 1 and 2, respectively. According to Proposition 3, * , * 1 , and * 2 can be calculated directly from * . Since the minimization subproblems with respect to * and * 2 are actually the same, we just consider * 1 and * 2 for the sake of brevity. They are given by We are interested in the numerical behavior of * 1 and * 2 corresponding to regions of * with different scales (e.g., texture, flat, and ramp regions). Figure 1(a) shows a 256 × 256 synthetic image. The corresponding * 1 and * 2 are shown in Figures 1(b) and 1(c), respectively, where * is set to be the original image. Form Figure 1, numerical behavior of * 1 and * 2 is summarized as follows.
(i) For texture regions of * , * 1 and * 2 are both small (darker regions indicate smaller value).
(ii) For flat regions of * , * 1 and * 2 are both large. (iii) For ramp regions of * , * 1 is inversely proportional to the gradient of * , whereas * 2 is large.
Below we pay attention to the restored image * . As we have mentioned in the Introduction, values of the regularization parameter control the relative weights of the fidelity and regularization terms. More precisely, small values lead to little regularization, whereas large values result in overregularization. Based on the behavior analysis above, in Model 1, textures of * are not compromised due to the corresponding small values of * , and the speckle effect caused by the LLT model is removed from flat and ramp regions of * due to the corresponding large values of * . Therefore, Model 1 can produce higher quality restoration results than the LLT model. Model 2 inherits the advantages of Model 1 and exhibits some new ones. Edges of * are well preserved by total variation regularization. At the same time, the staircase effect is suppressed in ramp regions due to the corresponding small values of * 1 (the fourth-order filter plays the dominate role). Therefore, Model 2 incorporates the strengths of both regularizers while avoiding their drawbacks. Finally, we remark that, compared with the hybrid model (5), Model 2 is more reasonable. Observe that (5) utilizes a fixed weighting function to combine the two regularizers. However, the weighting function is computed from the noisy image, which leads to inaccuracy. On the other hand, the regularization parameter in Model 2 depends on the restored image, which is less affected by noise.

Algorithms
In this section, we formulate the numerical scheme for solving our models. Our basic idea is to use the alternating minimization scheme which is described in Algorithms 1 and 2.
Next we would like to show the convergence of our algorithms.  Proof. (i) First, we can easily deduce the following inequality from Algorithm 1: Then the sequence { 1 ( , )} is bounded, and there exists a constant ≥ 0 such that The above inequality reads as which implies that {( , )} is uniformly bounded in × . Then we can find a subsequence {( , )} ⊂ {( , )} and (̂,̂) ∈ × such that they satisfy On the other hand, for any ∈ , we have Recall that the function 1 : × → R is continuous; we obtain by letting tend to infinity. Similarly, for any ∈ , we have Hence we complete the proof of (i).
(ii) Using the same technique, (ii) can be similarly proofed, and we omit the details.
It is obvious that (b) in Algorithm 1 and (b) in Algorithm 2 can be easily solved. To solve (a) in Algorithm 1 and (a) in Algorithm 2, we use the alternating direction method (ADM) [25], which is closely related to the augmented Lagrangian method [23], the Douglas-Rachford splitting algorithm [26], and the split Bregman method [27]. For the sake of brevity, we only present the ADM procedure for solving (a) in Algorithm 2. The ADM procedure for solving (a) in Algorithm 1 can be analogously derived. We rewrite (a) in Algorithm 2 by introducing auxiliary variables and as follows: Then the problem fits the framework of the alternating direction method. By using the Lagrangian multipliers 1 ∈ and 2 ∈ , the augmented Lagrangian function of (40) is given by where > 0 is the penalty parameter for the linear constraints. Then the minimization of (40) is achieved by an iterative process: in each iteration, we minimize the augmented Lagrangian function (41) with respect to , , and , given the other two updated in previous iteration, and then update 1 and 2 . The computational procedure is presented in Algorithm 3. Now we make some remarks on the ADM procedure. First, we observe that every step in Algorithm 3 has a closedform solution, and then the alternating direction method can be efficiently implemented. Moreover, for a convex objective function with linear constraints, the convergence of the alternating direction method is guaranteed; see, for example, [28,29]. Second, to obtain an exact solution of (a) in Algorithm 2, one needs to let (the maximum iteration number of the ADM procedure) tend to infinity; that is, = ,∞ . Since the regularization parameters are not optimal, in practice, it is not necessary to compute exactly. For the sake of computational efficiency, only several ADM iterations are implemented. Third, aiming at faster convergence of Input: , , , and . Output: . Initialization: 0 = 0.
while not converged do Step 1. Given −1 , computing by solving Step 2. Given , computing by solving Algorithm 1: The alternating minimization method for solving Model 1.
Input: , , , and . Output: . Initialization: 0 1 = 0 and 0 2 = 0. while not converged do Step 1. Given −1 1 and −1 2 , computing by solving min Step 2. Given , computing 1 and 2 by solving the ADM procedure, we initialize each iteration with the auxiliary variables and the Lagrangian multipliers updated in the previous iteration. Numerical examples will be given in Section 5.2 to demonstrate the efficiency of our numerical scheme.

Numerical Experiments
In this section, we provide numerical results to illustrate the effectiveness of the proposed models. In Section 5.1, we compare the performance of the proposed models with the ROF model (3), the LLT model (4), and the hybrid model The quality of the restored images is assessed using the relative error (ReErr) and the signal-to-noise ratio (SNR). They are defined as where , , and are the original image, the mean intensity value of , and the restored image, respectively. Now we present implementation details of the comparative experiment. In our algorithms, parameters and are fixed and set to be 0.07 and 10, respectively. For parameters and , their values are reported in the corresponding figures. For solving models (3), (4), and (5), we use the alternating direction method (ADM) [22]. For a fair comparison, all of the parameters in these three models are optimized to achieve the best restoration with respect to the ReErr and the SNR values. In all the ADM procedures, we set the penalty parameter to be 1 as a default value for most of digital images where F and F −1 denote the discrete Fourier transform and the inverse discrete Fourier transform, respectively, and Fourier transforms of operators Δ and * are regarded as the transforms of their corresponding convolution kernels.
Step 3. Given , , , , and , , update , 1 and , 2 by with intensity range in [0, 1]. We terminate all the algorithms by the following stopping criterion: In Figures 3, 4, 5, and 6, we display the resulting images restored by different models. The corresponding relative errors, SNR values (dB), and computational time (in seconds) are listed in Table 1. From these results, we find that our models restore better images than the ROF model (3), the LLT model (4), and the hybrid model (5), both visually and quantitatively. In the images restored by the ROF model, we observe that sharp edges are preserved, but the staircase effect is clearly present (e.g., the homogeneous regions in Figures 5 and 6). For the restored results of the LLT model, the staircase effect is suppressed, but some speckle effect is introduced in homogeneous regions (e.g., the background of Figures 3, 4, 5, and 6). We also note that both models (3) and (4) compromise details and textures due to using the global constant regularization parameter. For the images restored by the hybrid model, visual artifacts are almost suppressed, but details are not recovered well (e.g., the ramp region in Figure 3). This is due to the inaccurate estimation of their weighting function from the observed image. Our methods, on the other hand, restore the homogeneous regions significantly better while preserving more details. To better understand the behavior of our methods, some zoomedin local results and contour plots are shown in Figures 7,   8, and 9. We can find that the homogeneous regions are almost smooth (e.g., the background of Figure 3, the flat and ramp regions in Figure 7, and the face of Barbara in Figure 9). The contour plots given in Figure 8 also visually illustrate the above observation. At the same time, details and textures are preserved clearly without being overregularized (e.g., the ramp region in Figure 3, the textures in Figure 7, and the textures on the scarf in Figure 9). Furthermore, we find that Model 2 performs better than Model 1 with respect to preserving edges (e.g., the contours around the lips and nose of Lena in Figure 8). In Figure 10 Moreover, we observe that the values of the regularization parameters are small in detail and texture regions, and they are large in homogeneous regions. Therefore, our methods are superior with respect to removing noise while preserving details.

Parameter Study.
In this section, we present some criterions on choosing the parameters which are necessary to start up our algorithms. In our experiments, the parameter settings are the same as those in Section 5.1, except the testing parameters.
The window size is used to control the smoothness of the spatially dependent regularization parameter. We test our algorithms for different values of varying from 1 to 19. Figure 11 shows the plots of the SNR values of the restored images. We see from the plots that a small value of (leads to a sharp regularization parameter) is suitable for images with sharp features (e.g., = 1 for Figure 2(a) and = 3 for Figure 2(c)), whereas a moderate value of (leads to a relatively smooth regularization parameter) is needed for images containing textures (e.g., = 7 for Figures 2(b) and 2(d)). For large values of ( ≥ 9), we find that the changes of the SNR values are not significant. However, if becomes too large, then the regularization parameter is oversmooth, which compromises image details.
The parameters and jointly measure the values of the spatially dependent regularization parameter. The difficulty of tuning these parameters is that they depend not only on the noise level but also on the images. For this reason, trial by error for the two parameters is used. More precisely, we simplify the tuning process by fixing and searching for the best of each image.
We also study the behavior of the proposed numerical scheme with respect to the setting of and our initialization. For the setting of Figure 2(c), Table 2 shows the summarized results of our models for different values of with our initialization and zero initialization (the auxiliary variables and the Lagrangian multipliers are initialized to zero). We see from Table 2 that when using zero initialization, only sufficiently large (e.g., = 30 and 40) can provide good restoration result with increasing computational time. Our initialization, on the other hand, produces results as effectively as zero initialization with slightly less computation time. In addition, there is no considerable difference between the relative errors and the SNR values for different values of . Since a too small or too large yields comparatively high computational cost, a moderate (e.g., = 10) is recommended in our algorithms.

Concluding Remarks
We have proposed two regularization models for image denoising, in which both the restored image and the spatially dependent regularization parameter are estimated simultaneously. By the construction of our models, both variables mutually benefit from each other during the denoising process, which can achieve joint optimality. The numerical behavior of the regularization parameter and the strengths of our models have been discussed in detail. The proposed models, which are nonconvex, can be asymptotically solved by the proposed alternating minimization scheme. Finally, the numerical results indicated that our methods outperform several popular methods with respect to both noise removal and detail preservation.