A Novel Directionlet-Based Image Denoising Method Using MMSE Estimator and Laplacian Mixture Distribution

A novel method based on directionlet transform is proposed for image denoising under Bayesian framework. In order to achieve noise removal, the directionlet coefficients of the uncorrupted image are modeled independently and identically by a two-state Laplacian mixture model with zero mean. The expectation-maximization algorithm is used to estimate the parameters that characterize the assumed prior model. Within the framework of Bayesian theory, the directionlet coefficients of noise-free image are estimated by a nonlinear shrinkage function based on weighted average of the minimum mean square error estimator. We demonstrate through simulations with images contaminated by additive white Gaussian noise that the proposed method is very competitive when compared with other methods in terms of both peak signal-to-noise ratio and visual quality.


Introduction
The recent advancement in multiscale geometric analysis has promoted an enormous amount of research in the field of image processing, including image denoising.The distortion of images by additive white Gaussian noise is common during the images acquisition, processing, compression, and transmission.The goal of image restoration is to reconstruct a plausible estimate of the original image from the distorted image or observation and keep the image features as much as possible at the same time.
As a representative of multiscale geometric analysis tools, wavelet transform (WT) [1] played an important role in the application of image denoising and has achieved great success over the last decade.There are plenty of works on image denoising using wavelet thresholding since Donoho [2,3] has done the pioneering work for signal denoising using WT.The method described in [3] has near-optimal properties in the minimax sense and performs well in simulation studies of one-dimensional curve estimation.The image denoising methods using threshold mainly consist of three steps: (1) representing the image sparsely using WT; (2) determining an appropriate threshold and comparing the coefficients with it, preserving or modifying the few high-magnitude transform coefficients that convey mostly the true-signal energy and discarding the rest which are mainly due to noise; (3) reconstructing the denoised image using inverse WT based on the thresholded coefficients.Due to their effectiveness and simplicity that operate on one wavelet coefficient at a time, various wavelet-based thresholding methods were proposed for image denoising [4][5][6].Although the denoising performance of the wavelet thresholding is better than that of spatial filters (such as the filters proposed in [7][8][9]), the problem of how to develop an optimal threshold used for distinguishing between the coefficients that are mainly due to signal and those mainly due to noise has not been solved well.Thus, for the issue of image denoising, the wavelet-based thresholding algorithms with threshold are not an optimal strategy.
To improve the denoised results, the prior probabilistic knowledge of the noise-free image and noise image in wavelet domain is exploited in the process of image denoising.In the statistical wavelet-based denoising approaches, instead of using an arbitrary threshold to shrink the noisy wavelet coefficients, the shrinkage function is designed by minimizing a Bayesian risk, typically under the minimum mean square error (MMSE) criterion [10], minimum mean absolute error (MMAE) criterion [11], or maximum a posteriori (MAP) criterion [12].In general, modeling the statistics of natural images is a challenging task, partly because of the high dimension of the signal.However, the power of statistical image models can be substantially improved by representing the spatial image using WT, since the WT nearly decorrelates the dependencies between image pixels.By exploiting the nature of the histogram of the wavelet coefficients, different noise reduction algorithms try to incorporate the prior model that best matches the true image coefficients.In [13], Mallat proposed that the wavelet coefficients show non-Gaussian behavior and sharp peak centering around zero with heavytailed distribution and used the generalized Gaussian distribution (GGD) with the shape parameter  to fit the coefficients.Since then, the GGD has been commonly used to model the image wavelet coefficients and accordingly derived different image denoising algorithms based on some criteria [14,15].For example, Moulin and Liu [15] first assumed that the wavelet coefficients follow a simple independent and identically distributed GGD prior, then estimated the shape parameter using Mallat's algorithm proposed in [13], and finally developed a shrinkage estimator using MAP criterion based on the assumed prior.In addition to the GGD prior, several other PDFs have also been proposed to model the wavelet coefficients of the image, such as Gaussian scale mixture (GSM) distribution [16,17], -stable distribution [18], and Bessel -form distribution [19].Among these proposed models used for the purpose of denoising, the GSM model accounts for both marginal and pairwise joint distributions of the wavelet coefficients, and thus the corresponding denoising methods outperform the methods using other prior models.Although the -stable model provides a more accurate fitness to the histogram of the WT, this probability density function (PDF) has no closedform expression even though its characteristic function has one, and its parameter estimation is poor, specially in the presence of noise [20].This disadvantage hampers its wide application for image denoising in practice.It has been shown that the application of prior knowledge in the noise reduction algorithms results in a performance better than that of the wavelet-based denoising methods using thresholds.However, due to the inherent drawbacks of the wavelet bases which lack anisotropy and have poor directional selectivity, wavelets fail to capture the geometric regularity along the singularities of surfaces (e.g., the edges and contours), thus seriously affecting the performances of the wavelet-based methods.
To exploit the anisotropic regularity of a surface along edges, several image representations with elongated basis functions have been proposed to capture the geometric regularity of a given image, including curvelets [21], bandelets [22], and contourlets [23].They were also introduced into the field of image denoising because of their ability of efficiently representing the anisotropic regularity [24,25].For example, in [25], Eslami and Radha modeled the joint PDF of parents and children of the coefficients in semi-translation-invariant contourlet transform and implemented image denoising by constructing a bivariate shrinkage function under MAP criterion.However, since the directional filter banks used in these representation tools are inseparable, the construction of directional filters is very complex, which is the major disadvantage of these transforms.Moreover, the process of image decomposition using directional filters is more time consuming than decomposition using wavelets.
Apart from these image representation tools mentioned above, another related image representation scheme called sparse representation [26,27] was proposed and extended to the image denoising community.Sparse representation accounts for most or all information of a signal with a linear combination of a small number of elementary signals called atoms which are chosen from a so-called overcomplete dictionary.The results reported in [27] demonstrated that the denoising performance based on learned dictionaries surpasses that of wavelet-based methods.However, the algorithm imposes a very high computational burden, since it utilizes highly overcomplete dictionaries obtained via a preliminary training procedure.
In this paper, a new denoising method based on directionlet transform is proposed.Because of anisotropy and multidirectionality of directionlets, they allow us to capture more image features which can enhance the denoising performance.To recover the noise-free image using the Bayesian technique, we model the directionlet coefficients to be a two-state Laplacian mixture PDF with zero mean and use the expectation-maximization (EM) algorithm which is a specialization to the mixture density estimation problem to estimate the parameters involved in the assumed PDF.The noise-free coefficients are estimated by using average MMSE estimator.
The rest of this paper is organized as follows.The directionlet transform is briefly reviewed in Section 2. The statistical model of the directionlet coefficients of the noise-free image is studied in Section 3 as well as the discussion on the goodness of fit.Moreover, the design of the MMSE estimator is also presented in this section.The performance of the proposed algorithm is evaluated by using experimental results and compared with other existing denoising methods in Section 4. Finally, some conclusions are drawn in Section 5.

Directionlet Transform
Discontinuity curves (i.e., the edges and contours) presented in the images are highly anisotropic and they are characterized by a geometrical coherence.These features are not properly captured by the standard two-dimension (2D) WT that uses isotropic basis functions.Because of this, when an image is decomposed by the standard 2D WT, many wavelet bases intersect the discontinuity curves and result in many large magnitude coefficients.Obviously, in such cases, WT is not an optimal sparse representation for images.To efficiently capture these elongated structures characterized by geometric regularity along different directions (not only the horizontal and vertical), the directionlet transform [28] based on lattice is proposed recently.This transform is constructed by using two concepts: directionality and anisotropy, and they are realized through two steps, respectively.First, use an integer lattice which is represented by a nonunique generator matrix to partition the discrete space, where the two vectors d 1 (with slope  1 =  1 / 1 ) and d 2 (with slope  2 =  2 / 2 ) determine two directions which are called transform direction and alignment direction, respectively.By sampling the image with generator matrix  Λ , one can get | det( Λ )| cosets.Second, achieve the anisotropy by using an unequal number of 1D wavelet transform steps along the two directions; that is, the 1D transform is applied more along one direction than the other direction.It should be noticed that if then the iterated 1D transform should be applied in each coset separately.For notational simplicity, the directionlet transform is denoted as SAWT( Λ ,  1 ,  2 ) in the rest of this paper, where  1 and  2 are the numbers of 1D WT along transform direction and alignment direction in each iteration, respectively.The basis functions of SAWT are called directionlets, which include elongated functions that are very suitable for image representation.The frequency decomposition of SAWT( Λ , 2, 1) along 0 ∘ for images is shown in Figure 1.
From the construction principle of directionlets, it can be seen that the SAWT( Λ ,  1 ,  2 ) retains the separable filtering and subsampling, the simplicity of computations and filter design from the standard 2D WT, unlike the case for some other directional transform constructions (e.g., curvelets, contourlets, or bandelets).The corresponding asymmetric basis functions have directional vanishing moments along any two directions with rational slopes, which allow for a sparser representation of elongated and oriented features.As a consequence of the improved sparsity, directionlets provide an efficient tool for nonlinear approximation of images, significantly outperforming the standard 2D WT.

PDF Modeling and MMSE Estimator
We consider the problem of image restoration as an MMSE estimate of the noisy image.Let us denote the observed data by  =  + , where  is the clean data and  is the additive Gaussian noise.Applying the SAWT( Λ ,  1 ,  2 ) to the observed data, we have where , , and  signify the directionlet coefficients of , , and , respectively.Since we would like to use MMSE estimator to recover the noise-free image, the PDF of variable  is needed.

Statistical Model of Directionlet Coefficients.
In this subsection, we investigate the statistical distribution of directionlet coefficients of the image and show some results on modeling natural images.By casually observing natural images, one can find that they are highly inhomogeneous; that is, they typically contain many regions that are smooth, interspersed with "features" such as contours, or surface marking.After applying the standard 2D WT on the images, the marginal distributions of the detail coefficients which reflect the inhomogeneous characteristics show a large peak at zero and tails that fall significantly slower than a Gaussian of the same variance [13] (i.e., non-Gaussianity).Based on the observation, various PDFs [15,[17][18][19] have been proposed to model this non-Gaussianity in the past decades.Due to the directionlets including elongated basis functions that are nearly parallel to the edges, the coefficients obtained by using SAWT( Λ ,  1 ,  2 ) contain more zeros than those obtained by using WT.Correspondingly, the marginal distributions of the detail directionlet subbands at every scale will exhibit stronger non-Gaussianity with heavier tails.Thus, in this paper, a mixture density of two Laplacian distributions with zero mean and heavy tails is proposed to model the directionlet coefficients of the images.This prior mixture PDF can be expressed as where   ≥ 0 and ∑ 1 =0   = 1 and   () represents the th Laplacian PDF denoted as Laplace (;   ) and where   is the standard deviation of the th Laplacian PDF.To determine whether the assumed prior PDF can provide an appropriate fit for the directionlet coefficients, we visually examine the degree of match between the histogram of the coefficients and the adopted Laplacian mixture PDF whose parameters are estimated from the coefficients.Figure 2(a) displays a natural image called "Couple" (http://www.cipr.rpi.edu/resource/stills/misc1.html).Figure 2(b) displays the histogram of the  1 subband of the image shown in Figure 2(a) and a Laplacian mixture PDF fitted to this subband.Also the corresponding Gaussian density function with the variance estimated from the coefficients is displayed in Figure 2(b).It is clear that the Laplacian mixture PDF tends to fit the directionlet coefficients far better than the Gaussian PDF.
Naturally, another real question is whether the Laplacian mixture model describes the directionlet coefficients more accurately than other PDFs proposed in the literature.Here, we compare the assumed PDF fit with Gaussian PDF and Gaussian mixture PDF proposed in [29] and [16], respectively.The comparison is conducted by using a distance [30] between the modeled CDF and the empirical CDF of a given directionlet subband of the test image to determine which prior PDF approaches the empirical PDF more precisely.The distance was originally defined as where () and () denote the modeled CDF and the empirical CDF, respectively.To get the data samples, we use the specified SAWT( Λ , 2, 1) with a Daubechies' Symlet 4 wavelet to decompose the test image along two directions 0 ∘ and 45 ∘ , respectively.Then, in each detail subband, the histogram of the coefficients is computed to generate the empirical CDF, and the modeled CDF is computed by estimating the corresponding parameters from the directionlet coefficients.It should be noted that when the transform direction 45 ∘ is adopted in SAWT( Λ , 2, 1), two cosets will be produced in image decomposition and there will be too many detail subbands.But here, we only compute the distance on the subbands belonging to one of the two cosets due to the limited space.The resultant distances computed from Gaussian, Gaussian mixture, and the proposed PDFs for image "Couple" are reported in Table 1.It can be observed that the Laplacian mixture density provides smaller  values than the other compared models in most cases.In a few cases, even though the Gaussian mixture distribution produces a slightly smaller value of  than the proposed model, the latter, in fact, is already statistically very close to the histogram of the directionlet coefficients.This provides a strong evidence that the Laplacian mixture PDF fits to the directionlet subbands of the image more accurately than others.A similar conclusion applies to the other detail subbands at different composition levels.

MMSE Estimator.
Having known the prior model of the directionlet coefficients of the noise-free image, now we derive the shrinkage function based on (2).Since the directionlet transform is orthogonal and linear, the variable  is still following a Gaussian distribution with the same variance as .In the Bayesian framework, the denoising task is performed by statistical estimation.For a given noisy observation , there are infinitely many possible signals  that may have generated the observation.These candidates are not all equally likely, however, and follow the posterior probability distribution  | ( | ).One can obtain the most possible candidate ĝ() under a given criterion.A common criterion used for getting ĝ() is the MMSE which minimizes the expected squared error Δ = ∫  | ( | )‖ ĝ() − ‖ 2 .
The minimum error (MMSE estimator) is achieved for where {⋅} represents the mathematical expectation, the a posterior mean (i.e., the optimal value for ).
Using Bayes' rule, the MMSE estimator of  given noisy observation  can be easily derived as where   () is the convolution of   () with   () = (1/√2 a Laplacian distribution, then  follows a Laplacian-Gaussian distribution, which can be expressed as where * denotes the convolution operator and erfcx() = 2/√ ∫ ∞ 0  − 2 −2  is the complementary error function.For notational simplicity, throughout this paper we use LapGauss(; ,   ) to denote the Laplacian-Gaussian distribution expressed in (7) and also use erfcx( + , ,   ) and erfcx( − , ,   ) to denote erfcx(  / + /( √ 2  )) and erfcx(  / − /( √ 2  )), respectively.Finally, the corresponding MMSE estimator can be derived as However, it is usually difficult to derive the expression of the MMSE estimator for mixture PDFs.To get a closed-form estimator for Laplacian mixture prior, we introduce a hidden variable into (3).
Once the hidden variable is introduced, (3) can be rewritten as where ( |  = ) ∼ Laplace(;   ) and  is a binary hidden variable which can take on the value 0 (insignificant coefficient) or 1 (significant coefficient).Obviously,   ( = 0) and   ( = 1) are equal to  0 and  1 , respectively.The posterior probability  | ( | ) in ( 5) becomes Inserting (10) into the expression of the MMSE estimator ( 6) and exchanging the order of integration and summation yielded From (11), one can see that the integral term corresponds to a Laplacian MMSE estimator ĝ= () defined in (8) under a given hidden state .In fact, this is a weighted average of two MMSE estimates, where the weighting is determined by  | ( | ).Using Bayes' rule, the weightings  | ( | ) can be computed as where  | ( | ) follows a Laplacian-Gaussian distribution defined in (7) under hidden state .Substituting ( 7), (8), and ( 12) into ( 11) yields the full Bayes MMSE estimator for the Laplacian mixture prior where   ≥ 0 and ∑ 1 =0   = 1,  is the directionlet coefficient of noisy image, and   and   are the standard deviations of the th Laplacian PDF and the Gaussian noise, respectively.
In order to implement image denoising using ( 13), we must compute the parameters involved in the estimator, including the standard deviation   and other three independent parameters  1 ,  2 , and  0 (or  1 ) that characterize the Laplacian mixture prior.In this paper, we assume that the standard deviation   is known.If this is not the case, we use a robust median estimator proposed in [31] to estimate   from the finest decomposition scale of the noisy wavelet coefficients; that is,   = Median(| , |)/0.6745,where Median represents the median absolute deviation operator and  , represents the noisy wavelet coefficients in the finest decomposition level.The other three unknown parameters  1 ,  2 , and  0 can be estimated with the EM algorithm described in [32].Here, we only give the corresponding estimation formulas which are as follows: where is the responsibility factor,  denotes the th iteration of the algorithm, and  is the number of samples.It can be seen that once the initial values of  1 ,  2 ,  0 , and  1 are given, all the parameters can be iteratively estimated.For a more detailed discussion on EM algorithm, please refer to [33].

Experimental Results
In this section, we present some simulation results obtained by processing several test images using the proposed MMSE estimator in directionlet domain.To assess the performance of our method, we compare our algorithm with other commonly used methods including SureShrink [3], BayesShrink [31], BLS-GSM [17], and K-SVD [27] which is not a waveletbased method.The directionlet used in our method is constructed using Haar wavelet, and the image is decomposed into three levels.To obtain better results, the specified SAWT( Λ , 2, 1) is implemented along 0 ∘ , 45 ∘ , 90 ∘ , and −45 ∘ , respectively, and the final denoised image is obtained by averaging the four results.For all other algorithms used for comparison, if not stated otherwise, the used wavelets and the free parameters are set as suggested in the reference papers.Since both the decimated wavelet transform and directionlet transform lack shift invariance, all the wavelet-based and directionlet-based algorithms are conducted in undecimated transform domain to minimize the side effects like pseudo-Gibbs phenomena.

Results
Using Additive Gaussian Noise.Five test images, "Boat" and "Peppers" from http://sipi.usc.edu/database/database.php?volume=misc and "Couple, " "Cameraman, " and "Lena" from http://www.cipr.rpi.edu/resource/stills/misc1.html,are used for gauging the performance of the proposed denoising algorithm.The five images are all of size 512 × 512.In order to obtain the noisy images, we add white Gaussian noise on the original test images.A series of noise levels   ranging from 5 to 100 are selected in our experiments.To quantify the achieved performance improvement, a measure called peak signal-to-noise ratio (PSNR) is computed, which is defined as where MSE is the mean square error between the original and the denoised images.
The PSNR values obtained from all the used methods are listed in Table 2, with the best one highlighted in bold font.Even though, in general, the wavelet-based thresholding methods (SureShrink and BayesShrink) work well with an improvement of several decibels with respect to the noisy images, more sophisticated denoising methods prove definitely superior in all cases.From the table, it can be seen that ( 1) the PSNR values provided by the proposed method are larger than those of other methods.This indicates that the directionlet transform achieves a more sparse representation for images than other transforms such as wavelet transform and steerable pyramid transform; (2) the K-SVD achieves a comparable performance with our approach and is even better in some cases at low noise levels, but the performance degrades faster than the proposed method at high noise levels.This is because the increase in noise intensity results in a decrease in the number of effective sparse atoms, which further results in the substantial degradation of the denoising performance; (3) the behavior of BLS-GSM is similar to our algorithm but with a high degradation rate of denoising performance when the noise level is high.
For visual comparison, the noisy image (  = 20) of "Couple" and the denoised images obtained by various methods are illustrated in Figures 3(a)-3(f), respectively.By observing these figures, we find that the visual evaluation of the denoised results is consistent with the quantitative evaluation in terms of PSNR.Specifically, the results with SureShrink and BayesShrink exhibit plenty of visual artifacts and Gibbs phenomena near discontinuities; thus they achieve bad visual effects.Comparing with other algorithms, the proposed method provides fewer artifacts as well as better preservation of edges and other details.Even though the K-SVD and BLS-GSM can obtain satisfactory visual results, the former blurs the edges and other details which can be seen clearly from the floor and the desk, and the latter also introduces some artifacts while removing noise.
In addition to the PSNR and visual comparison, we also conduct a significance test to further justify the improvement.To examine whether there is a significant difference between the performances of the proposed method and the other methods, the Mann-Whitney test was applied on the obtained PSNRs.In this test, additional 25 images from the USC-SIPI Image Database (http://sipi.usc.edu/database/database.php?volume=misc) as well as the above 5 test images were used to compute the PSNR values.The differences between the proposed method and each of the other four methods were considered statistically significant if  values are less than 0.05.Statistical analyses were performed using SPSS 17.0.The test results are listed in Table 3.
From the table, it can be seen that all the  values are less than 0.05.Thus, the difference between the proposed method and other methods is significant; that is, the improvement is significant.

Results
Using Speckle Noise.To make the proposed method more convincing, we also conducted the experiments  1 SD: standard deviation of the PSNR series; 2 #: one of the other four methods (i.e., SureShrink, BayesShrink, BLS-GSM, and K-SVD).on the speckle noise which is a multiplicative noise with a  distribution where  ≥ 1 and Γ(⋅) denotes the gamma function.It should be noted that the smaller the value of , the higher the level of the speckle noise.To adapt the above methods to speckle noise, we must modify some steps involved in each algorithm.The specific modifications are as follows: (1) for the waveletbased methods including the proposed one, we first take logarithmic operation on the noisy images to convert the multiplicative noise into additive one.Then the denoising algorithm as used in the case of additive noise experiment is performed.Finally, the denoised result is obtained by taking an inverse logarithmic operation on the result of the denoising algorithm; (2) since the original K-SVD denoising algorithm is designed for homogeneous white Gaussian noise and the speckle noise cannot be assumed to be white and Gaussian, we use the local noise variance instead of a constant global noise level to train the dictionary.Then, compute the denoised output image using the trained overlapping patches.For more details about the modification on K-SVD, please see [34].
Three images "Couple, " "Cameraman, " and "Lena" are used as the test images in this experiment.For each image, five noise levels, that is,  = 1, 6, 12, 20, 32, are considered in this experiment.The obtained PSNR values are listed in Table 4. From the table, it can be seen that the performances of the algorithms are consistent with the case of additive noise except for the K-SVD.This is because when  is small, the speckle noise is strongly non-Gaussian, which degrades the performance of the K-SVD, and the speckle gradually approaches Gaussian distribution with the increase of .It can also be found that the proposed method obtains larger PSNR values in most cases, which indicates that the proposed method outperforms the other methods.For visual comparison, the speckled Lena image ( = 12) and the corresponding denoised results are illustrated in Figure 4.It can be observed that the results are consistent with the case of additive Gaussian noise.

Conclusion
The main issues regarding image denoising were addressed in this paper.The directionlet coefficients of the image were modeled as Laplacian mixture distribution with zero mean and at the same time the goodness of fit was analyzed by using a distance between the CDF of the assumed prior and the CDF of the directionlet coefficients.To recover the noise-free image, the MMSE estimator was constructed based on the assumed models.The experimental results confirm that the proposed MMSE approach allows efficient removal of both white Gaussian noise and speckle noise, outperforming the denoising methods published in the literature in terms of PSNR.Furthermore, our method is seen to generate fewer artifacts as well as better preservation of edges and other details.Although the computational cost of the proposed method is higher than the wavelet-based methods due to the directionlet transform producing more subbands, it is still less than the steerable pyramid transform due to the separable filters of directionlets and far less than the K-SVD that needs training highly overcomplete dictionaries.

Figure 1 :
Figure 1: Frequency decomposition of SAWT( Λ , 2, 1) along 0 ∘ , where two decomposition levels are used. and  stand for high pass and low pass, respectively.

Figure 2 :
Figure 2: A natural image and its corresponding fitting models in directionlet domain.(a) Couple image.(b) Histogram of the normalized directionlet coefficients for image and the Gaussian and Gaussian mixture PDFs used to model the coefficients.

Table 1 :
The cumulative distribution distance  values of Gaussian, Gaussian mixture, and Laplacian mixture PDFs, respectively, and the corresponding parameters estimated from two level subbands along two directions of Couple image.

Table 3 :
The statistical comparison (Gaussian noise   = 20) between the proposed method and the other methods.

Table 4 :
PSNR comparison (speckle noise) of the proposed method with the other methods (in dB).