Image Denoising Using Sparsifying Transform Learning and Weighted Singular Values Minimization

In image denoising (IDN), the low-rank property is usually considered an important image prior. As a convex relaxation of low rank, nuclear norm based algorithms and their variants have attracted significant attention. These algorithms can be collectively called image domain based methods, and their common drawback is that they require a large number of iterations to reach an acceptable solution. Meanwhile, the sparsity of images in a certain transform domain has also been exploited in image denoising. Sparsifying transform learning algorithms can achieve extremely fast computation as well as desirable performance. By taking advantage of both the image domain and the transform domain in a general framework, we propose a sparsifying transform learning and weighted singular values minimization method (STLWSM) for IDN problems. The proposed method makes full use of the strengths of both domains. To solve the nonconvex cost function, we also present an efficient alternating solution for acceleration. Experimental results show that the proposed STLWSM achieves both visual and quantitative improvements, by a large margin, over state-of-the-art approaches based on a single domain. It also needs far fewer iterations than all the image domain algorithms.


Introduction
Noise inevitably arises during the acquisition of real-world scenes because of physical limitations, which makes image denoising (IDN) a fundamental task in image processing. Recent IDN methods can be categorized into data-driven and prior-driven approaches.
Data-driven methods rely on deep convolutional neural networks, such as the Universal Denoising Net (UDN) [1] and the Fractional Optimal Control Net [2]. Although these CNN models have achieved great success when sufficient training samples are available, they may not perform well in small-scale data applications. For example, acceptable network parameters cannot be obtained from a single corrupted image, which is the case considered in this study. Prior-driven methods aim to restore the degraded image using image priors or other properties, such as local smoothness, nonlocal similarity and low-rank structure [3][4][5]. More specifically, prior-based image denoising seeks the underlying clean image from the degraded one by extracting a few significant factors and excluding the noise. It is a typical ill-posed linear inverse problem, and a widely used image degradation model can be formulated as [6][7][8][9]:
$$Y = HX + N, \tag{1}$$
where X and Y are matrices representing the original image and the degraded one, respectively.
H is also a matrix denoting the non-invertible degradation operator, and N is the additive noise.
To cope with the ill-posed problem, the general image denoising problem can be formulated as [9,10]
$$\hat{X} = \arg\min_{X}\; \|Y - HX\|_F^2 + \lambda F(X), \tag{2}$$
where F(X) is a regularization term and λ > 0 balances the two terms.
To be specific, given an image x, the synthesis sparse coding problem is to find a sparse κ that minimizes $\|x - D\kappa\|_2^2$, where D is the synthesis dictionary. Various algorithms have been proposed [12][13][14][15][16] to approximately solve this NP-hard problem. Numerous researchers learn the synthesis dictionary and update the nonzero coefficients simultaneously to represent the underlying high-quality image well, and these methods have been demonstrated to be useful in image denoising. Specifically, these synthesis models typically alternate between two steps: sparse coding and dictionary learning. However, the practical operation of synthesis models requires some rigorous conditions, which are often violated in applications.
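The NP-hard synthesis problem above is commonly approximated greedily. As a hedged illustration (not necessarily the algorithm used in [12]-[16]), the sketch below implements Orthogonal Matching Pursuit in NumPy, assuming D has unit-norm columns and a target sparsity m; the function name `omp` is our own.

```python
import numpy as np

def omp(D, x, m, tol=1e-10):
    """Orthogonal Matching Pursuit for min ||x - D @ kappa||_2^2 s.t. ||kappa||_0 <= m.

    D is assumed to have unit-norm columns (atoms); m is the target sparsity.
    """
    kappa = np.zeros(D.shape[1])
    support = []
    coef = np.zeros(0)
    residual = x.astype(float).copy()
    for _ in range(m):
        if np.linalg.norm(residual) <= tol:
            break                                    # x is already represented exactly
        j = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
        if j not in support:
            support.append(j)
        # least-squares refit of x on all atoms selected so far
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        kappa[support] = coef
    return kappa
```

Each iteration adds one atom and refits all selected coefficients, so the residual never grows; this is why OMP is a common baseline for the sparse coding step of synthesis models.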
While the synthesis model has attracted extensive attention, the analysis model has also been gaining attention recently [17,18]. In the analysis model, a matrix Ω is regarded as an analysis dictionary, since it 'analyzes' the image x into a sparse form Ωx. The zero pattern of Ωx defines the subspace to which the image belongs. The noisy observation is modeled as y = x + ξ, with ξ representing noise, and the denoising problem is to find x by minimizing the sparsity of Ωx while keeping x consistent with y. This problem is also NP-hard and resembles sparse coding in the synthesis model. Approximation algorithms for learning analysis dictionaries have been proposed in recent years; as in the synthesis case, they are computationally expensive.
More recently, a generalized analysis model named the transform learning model has been proposed. It follows the intuition that images are essentially sparse in a certain transform domain and can be expressed as $Wx = \mu + \varepsilon$, where W is the transform matrix, μ is the sparse coefficient vector, and ε is the approximation error [19]. The distinguishing feature from the synthesis and analysis models is that the approximation error ε of the transform learning model lies in the transform domain and is likely to be small. Another advantage of the transform model over image domain models is that it allows exact and extremely fast computation.
Instead of learning a synthesis or analysis dictionary, the transform learning model learns the transform matrix that minimizes the approximation error ε. After the transform W is learned, the original image is recovered by $W^{\dagger}\mu$, where $W^{\dagger}$ is the pseudo-inverse of W. The transform learning model has achieved great success in image denoising in terms of both efficiency and effectiveness [19][20][21][22].
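To make the transform model concrete, the following sketch codes a signal by hard-thresholding its transform coefficients at a level τ and then recovers it with the pseudo-inverse. The transform W here is a hypothetical random overcomplete matrix, and the names `transform_sparse_code` and `recover` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def transform_sparse_code(W, x, tau):
    """Sparse coding in the transform model W x = mu + eps by hard-thresholding at level tau."""
    z = W @ x
    mu = np.where(np.abs(z) >= tau, z, 0.0)   # sparse transform coefficients
    eps = z - mu                               # approximation error, in the transform domain
    return mu, eps

def recover(W, mu):
    """Image-domain estimate W^dagger mu via the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(W) @ mu

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 4))     # hypothetical overcomplete transform (8 x 4)
x = rng.standard_normal(4)
mu, eps = transform_sparse_code(W, x, tau=0.5)
x_hat = recover(W, mu)
```

Sparse coding costs one matrix-vector product plus a comparison per coefficient, which illustrates why the transform model is so much cheaper than synthesis sparse coding.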
Nonetheless, a remaining drawback is that the transform model overemphasizes the transform domain and ignores the primary image domain. There is always a connection between the image domain and the transform domain, and this connection can be treated as a regularization term in image denoising.
To take full advantage of both the image domain and the transform domain for the single-image denoising problem, this study focuses on sparsifying transform learning and the essential sparsity property of images, and proposes a novel algorithm named Sparsifying Transform Learning and Weighted Singular Values Minimization (STLWSM). Specifically, our model simultaneously considers sparsifying transform learning and weighted singular values minimization of image patches.
The remainder of this paper is organized as follows. Section 2 briefly reviews transform domain and image domain methods for IDN. In Section 3, we propose our method and derive an efficient solution. Section 4 provides experimental results on gray and color images. Conclusions are drawn in Section 5.

Related Works

Transform domain for IDN
As mentioned in the previous section, the transform model can exploit the sparsity of images in a transform domain to increase efficiency. Therefore, analytical transforms such as wavelets and the discrete cosine transform (DCT) are widely used in practical applications, for instance, in the image compression standard JPEG2000. As a classical and effective tool, transform models have been increasingly used in image denoising. Inspired by dictionary learning, Saiprasad et al. [20] proposed learning a sparsifying transform (LST) from data; they solved the resulting problem by alternately updating W and μ, and proved its convergence. Building on this work, they further proposed Learning Doubly Sparse Transforms (LDST) for IDN [22]. Specifically, W = BΦ is adopted to replace the original W, where B and Φ are square matrices of the same size. B is a transform constrained to be sparse, and Φ is an analysis transform with an efficient implementation.
Saiprasad et al. used the doubly sparse transform model in image denoising and obtained faster and better results than with unstructured transforms. Later, Wen et al. [19,21] proposed a Structured Overcomplete Sparsifying Transform Learning (SOSTL) model. Its main difference from the aforementioned transform models is that Wen et al. cluster image patches and learn a different W for each patch group. This process is formulated as problem (4), in which Q(W_k) is a regularization term that prevents trivial solutions, {C_k} indicates the classes of the image patches X̃, K is the number of classes, and G is the set of all classes.
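Problem (4) is not reproduced here, so the sketch below rests on two stated assumptions. First, Q(W) takes the form Q(W) = ‖W‖_F² − log|det W| used in Saiprasad et al.'s transform learning work; it is infinite for singular W, which is how it rules out the trivial solution W = 0. Second, the clustering step is simplified to assigning each patch to the transform that sparsifies it best, ignoring any other terms the original formulation may include. All function names are hypothetical.

```python
import numpy as np

def q_reg(W):
    """Assumed regularizer Q(W) = ||W||_F^2 - log|det W|; +inf when W is singular."""
    sign, logabsdet = np.linalg.slogdet(W)      # logabsdet is -inf for singular W
    return float(np.sum(W ** 2) - logabsdet)

def sparsification_error(W, x, s):
    """||W x - H_s(W x)||_2^2: energy outside the s largest-magnitude coefficients."""
    z = np.abs(W @ x)
    if s <= 0:
        return float(np.sum(z ** 2))
    return float(np.sum(np.sort(z)[:-s] ** 2))

def assign_clusters(patches, transforms, s):
    """Simplified clustering: each patch (column) goes to the W_k that sparsifies it best."""
    errs = np.array([[sparsification_error(W, x, s) for W in transforms]
                     for x in patches.T])
    return errs.argmin(axis=1)
```

For W = αI of size n, Q(W) = nα² − n log α is minimized at α = 1/√2, so the regularizer also keeps the learned transform from growing without bound.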

Image domain for IDN
While transform learning models have achieved great success, various algorithms for IDN have also been proposed in the image domain. As mentioned before, F(X) in the general image denoising model is an additional regularization. Widely studied regularizations include the ℓ1, ℓ2 and ℓ1/2 norms, the nuclear norm, the low-rank property and so on [23][24][25]. Focusing on patch (matrix) form instead of vector form, the low-rank property has been attracting significant research interest. As a convex relaxation of the low-rank matrix factorization problem (LRFM), nuclear norm minimization (NNM) has attracted increasing attention [4,25-27]. The nuclear norm of an image X is defined as $\|X\|_* = \sum_i \sigma_i(X)$, where σ_i(X) is the i-th singular value of X. However, many researchers hold that different singular values should be minimized separately. Gu et al. [4] proposed weighted nuclear norm minimization (WNNM) for image denoising problems. The weighted nuclear norm is defined as $\|X\|_{w,*} = \sum_i w_i\,\sigma_i(X)$, where w = [w_1, w_2, …, w_n] is non-negative. In this case, F(X) can be taken as $\|X\|_{w,*}$, and the denoising model (5) follows by substituting it into the general model. By taking different singular values as well as image structure into consideration, WNNM shows strong denoising capability. Meanwhile, Hu et al. [27] proposed Truncated Nuclear Norm Regularization (TNNR) for matrix completion. They argued that minimizing only the smallest min(m, n) − r singular values, while keeping the r largest nonzero singular values fixed, preserves the rank of the original matrix. Using the truncated nuclear norm $\|X\|_r = \sum_{i=r+1}^{\min(m,n)} \sigma_i(X)$, the TNNR-constrained model (6) minimizes $\|X\|_r$ subject to agreement with the observed entries. TNNR gives a better approximation to the rank function than nuclear norm based approaches. Inspired by both WNNM and TNNR, Liu et al. [28] improved the previous algorithms by reweighting the residual error separately and simultaneously minimizing the truncated nuclear norm of the error matrix (TNNR-WRE). In their work, F(X) is defined through the truncated nuclear norm of the error matrix, as given in (7), where H = X − Y, U and V are the left and right singular vector matrices of the singular value decomposition (SVD) of H, and r is the truncation parameter. TNNR-WRE further achieves higher accuracy than TNNR.
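The three matrix norms above differ only in how they weight the singular values. The short NumPy sketch below computes each from one SVD; the function names are our own and are meant only to illustrate the definitions.

```python
import numpy as np

def singular_values(X):
    return np.linalg.svd(X, compute_uv=False)        # returned in descending order

def nuclear_norm(X):
    """||X||_* = sum_i sigma_i(X)."""
    return float(singular_values(X).sum())

def weighted_nuclear_norm(X, w):
    """||X||_{w,*} = sum_i w_i sigma_i(X), with non-negative weights w."""
    s = singular_values(X)
    return float(np.dot(np.asarray(w, dtype=float)[: s.size], s))

def truncated_nuclear_norm(X, r):
    """||X||_r = sum_{i=r+1}^{min(m,n)} sigma_i(X); the r largest values are not penalized."""
    return float(singular_values(X)[r:].sum())

X = np.diag([3.0, 2.0, 1.0])
print(nuclear_norm(X), weighted_nuclear_norm(X, [0.0, 1.0, 2.0]), truncated_nuclear_norm(X, 1))
# 6.0 4.0 3.0
```

Setting all weights to 1 recovers the nuclear norm, and setting the first r weights to 0 and the rest to 1 recovers the truncated nuclear norm, so WNNM can be viewed as generalizing both.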
From the above, nuclear norm based algorithms can usually obtain considerable results because of the essential low-rank property in the image domain. To take advantage of both the transform domain and the image domain in IDN, a Sparsifying Transform Learning and Weighted Singular Values Minimization (STLWSM) method is proposed. In contrast to LST, LDST, SOLST, WNNM, TNNR and TNNR-WRE, the proposed STLWSM jointly considers sparsity in the transform domain and low-rankness in the image domain. The main contributions of our work are as follows: (i) We propose a general image processing framework in both the transform domain and the image domain, which combines sparsifying transform learning of image patches with the low-rank property of the original image.
(ii) As image patches can exploit the nonlocal similarity that inherently exists in images, we learn the sparsifying transform for each group of similar patches, grouped by Euclidean distance.
(iii) To solve the proposed NP-hard problem, we present an efficient alternating optimization algorithm. In practical applications, our method requires a limited number of iterations, mostly no more than 3, to reach the final solution.
(iv) We apply our model to IDN, and the results show that STLWSM achieves evident PSNR (Peak Signal-to-Noise Ratio) improvements over other state-of-the-art methods.

Proposed method
In this section, we propose a general framework in both transform domain and image domain.
Specifically, we perform sparsifying transform learning in the transform domain and weighted singular values minimization in the image domain simultaneously. To solve this NP-hard problem, an efficient solution is also derived.

Sparsifying Transform Learning and Weighted Singular Values Minimization (STLWSM)
In light of the observations above, we first introduce a sparsifying transform learned on image patches, and then use weighted singular values minimization to improve the image quality.

Given a noisy image Y of size h × l, nonlocal similarity is a well-known patch-based prior, meaning that a patch in an image has many similar patches [7][8][9]. Accordingly, overlapping image patches can be extracted with a sliding window of fixed step size. For each patch, we choose the M most similar patches by Euclidean distance [4,7,19-21] to expose the potential low-rank structure, and stack them into a matrix X̃_i whose columns are the vectorized patches. The total number N of groups X̃_i depends on the size of the original image, the patch size and the step size. Following the idea of transform learning algorithms [19-21], with the obtained X̃_i and some initialized W_i, our preliminary model is formulated on each group X̃_i with its transform W_i, where the definition of Q(W_i) is the same as in problem (4).
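The patch extraction and grouping step can be sketched as follows. This is a minimal illustration under our own assumptions: the similarity search runs over the whole image (practical implementations such as WNNM usually restrict it to a local search window), and `extract_patches` and `group_similar` are hypothetical names.

```python
import numpy as np

def extract_patches(img, p, step):
    """Overlapping p x p patches from a sliding window with the given step.
    Returns a (p*p, N) matrix of vectorized patches and their top-left corners."""
    h, l = img.shape
    cols, coords = [], []
    for r in range(0, h - p + 1, step):
        for c in range(0, l - p + 1, step):
            cols.append(img[r:r + p, c:c + p].ravel())
            coords.append((r, c))
    return np.array(cols).T, coords

def group_similar(patches, ref, M):
    """Indices of the M patches closest to column `ref` in Euclidean distance
    (the reference patch itself has distance 0)."""
    d = np.sum((patches - patches[:, [ref]]) ** 2, axis=0)
    return np.argsort(d, kind="stable")[:M]

img = np.arange(36, dtype=float).reshape(6, 6)     # toy 6 x 6 "image"
P, coords = extract_patches(img, p=2, step=2)      # 9 patches of length 4
idx = group_similar(P, ref=0, M=3)
X_i = P[:, idx]                                    # p^2 x M group matrix
```

Each column of `X_i` is one vectorized patch, so similar patches produce nearly collinear columns; this is the low-rank structure that the image domain step later exploits.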

Efficient optimization of the proposed model
In this subsection, we introduce an efficient solution for the nonconvex Sparsifying Transform Learning and Weighted Singular Values Minimization problem. According to [17][18][19][20], the transform learning process is not sensitive to the initialization of W. As a result, with W_i given, the sub-problem of μ_i can be solved by cheap hard-thresholding:
$$\hat{\mu}_i = H_s(W_i\tilde{X}_i).$$
Here H_s(·) is the hard-thresholding operator, which keeps the s largest-magnitude entries of each column and sets the others to zero. Next comes the sub-problem of W_i. Because of the coupling term between the two domains, it is handled in two parts.
a. The first formula is the transform-domain part. After the matrices involved are decomposed by SVD into the form PΨQ and only their diagonal matrices are taken into consideration, the formula can be rewritten so that the remaining terms are constant and can be omitted. The revised problem is convex in the diagonal entries, so the optimal solution can be found by taking the partial derivative with respect to these entries and setting it to 0.
Therefore, excluding the non-positive results, the diagonal entries are obtained in closed form, and the transform update step is computed from them.
b. The second formula is the image-domain part. With the fixed Ŵ_i obtained in step a, this part can be seen as a weighted singular values minimization problem, in which $\hat{W}_i^{\dagger}\mu_i$ represents the denoised matrix. Following Ref. [4], a desirable weighting vector w_i in the image domain is
$$w_{i,j} = \frac{c\sqrt{M}}{\sigma_j(\hat{W}_i^{\dagger}\mu_i) + \epsilon},$$
where σ_j(·) denotes the j-th singular value, c > 0 is a constant, and ε = 10^{-16} avoids division by zero. The optimal solution of the second formula is
$$\tilde{X}_i = U\,\mathcal{S}_{w_i}(\Sigma)\,V^{T},$$
where $\hat{W}_i^{\dagger}\mu_i = U\Sigma V^{T}$ is its SVD and the soft-thresholding operator is defined as $\mathcal{S}_{w_i}(\Sigma)_{jj} = \max(\Sigma_{jj} - w_{i,j}, 0)$.

The summary of our optimization solution is presented in Algorithm 1, where the similar patches are determined by Euclidean distance.

Algorithm 1 STLWSM
Input: noisy image Y of size h × l; patch size p; number of similar patches M; initial sparsity s.
Output: denoised image X̂.
Initialization: W_i is the DCT matrix of size p × p; N' is the number of similar-patch groups.
For iteration = 1:3 do
For each group (i = 1:N'), calculate:
1) Transform domain:
a. Decompose the image X into patch form X'.
b. Compute μ_i by $\hat{\mu}_i = H_s(W_i\tilde{X}_i)$.
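The per-group update can be sketched in NumPy as below. This is a hedged sketch, not the authors' implementation. Assumptions: patches are vectorized p × p blocks, so the DCT initialization is taken as the 2-D DCT kron(C, C); the transform update is the known closed form for min ||W X − μ||_F² + λ(||W||_F² − log|det W|) from the sparsifying transform learning literature of Saiprasad et al., which ignores the image-domain coupling term of STLWSM; the ordering inside `group_step` is one plausible reading of Algorithm 1. All function names and the constants s, λ, c are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal 1-D DCT-II matrix; kron(C, C) is the 2-D DCT of a vectorized n x n patch."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def hard_threshold_columns(Z, s):
    """H_s: keep the s largest-magnitude entries of each column, zero the rest."""
    out = np.zeros_like(Z)
    if s <= 0:
        return out
    rows = np.argsort(-np.abs(Z), axis=0)[:s]       # (s, ncols) row indices
    cols = np.arange(Z.shape[1])
    out[rows, cols] = Z[rows, cols]
    return out

def transform_update(X, mu, lam):
    """Closed-form minimizer of ||W X - mu||_F^2 + lam * (||W||_F^2 - log|det W|)."""
    n = X.shape[0]
    L = np.linalg.cholesky(X @ X.T + lam * np.eye(n))    # X X^T + lam I = L L^T
    L_inv = np.linalg.inv(L)
    Qm, sig, Rt = np.linalg.svd(L_inv @ X @ mu.T)        # = Qm diag(sig) Rt
    d = 0.5 * (sig + np.sqrt(sig ** 2 + 2.0 * lam))      # positive root of 2d^2 - 2 sig d - lam = 0
    return Rt.T @ np.diag(d) @ Qm.T @ L_inv

def weighted_svt(X_hat, c, M, eps=1e-16):
    """U S_w(Sigma) V^T with WNNM-style weights w_j = c sqrt(M) / (sigma_j + eps)."""
    U, sig, Vt = np.linalg.svd(X_hat, full_matrices=False)
    w = c * np.sqrt(M) / (sig + eps)
    return U @ np.diag(np.maximum(sig - w, 0.0)) @ Vt

def group_step(X_g, W, s, lam, c):
    """One hypothetical pass over a group X_g (p^2 x M): sparse code, update W,
    map back with the pseudo-inverse, then shrink the singular values."""
    mu = hard_threshold_columns(W @ X_g, s)
    W = transform_update(X_g, mu, lam)
    mu = hard_threshold_columns(W @ X_g, s)
    X_hat = np.linalg.pinv(W) @ mu
    return weighted_svt(X_hat, c, X_g.shape[1]), W

# Usage on one synthetic group of M = 50 vectorized 2 x 2 patches.
rng = np.random.default_rng(0)
X_g = rng.standard_normal((4, 50))
W0 = np.kron(dct_matrix(2), dct_matrix(2))    # DCT initialization, as in Algorithm 1
X_new, W_new = group_step(X_g, W0, s=2, lam=0.5, c=0.1)
```

Because the weights are inversely proportional to the singular values, large singular values (dominant structure) are shrunk only slightly, while small ones (mostly noise) are set to zero.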

Experiment Results
In this section, we choose 25, 12, 15 and 10 reference images of size 256×256 from TID2008 [29], USC-SIPI¹, Live-IQAD [30] and IVC-SQDB [31], respectively, to test the denoising performance. Since six noise levels are applied to each test image, the total number of distorted images is 372. Some representative images from the USC-SIPI database are shown in Fig. 1 and Fig. 2. Four recently proposed methods are adopted for comparison: the patch-based algorithm GSR, the weighted nuclear norm method WNNM, the sparsifying transform learning scheme SOLST, and the sparsifying transform learning and low-rank model STROLLR. The noisy images are obtained by adding Gaussian noise with σ_n = 15, 20, 30, 40, 50, 75. All competing algorithms use their default settings, which have been finely tuned and verified in their original publications. Since our method is derived from both image domain and transform domain schemes, for fairness we set our parameters the same as the representative methods in these two domains, i.e., WNNM and SOLST. That is, for the image denoising application, the settings for σ_n ≤ 20 are listed in Table 1; for the other noise levels, p is 9, M is 140, and λ_i is 0.58. In addition, 6 images of size 512×512 from USC-SIPI (shown in Fig. 10) are used in the image inpainting application, for which we follow a similar setting rule; the balance parameters are also listed in Table 1. Table 1 shows the detailed parameter settings in our experiments, where values in brackets are used for the 512×512 images and the plain values are for the 256×256 images.
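The noisy test set can be reproduced in outline as follows. This sketch assumes additive white Gaussian noise on the 0-255 intensity scale without clipping, which is common practice but not stated in the text; the constant image is a placeholder, not one of the test images, and `add_gaussian_noise` is a hypothetical helper.

```python
import numpy as np

SIGMAS = [15, 20, 30, 40, 50, 75]           # noise levels sigma_n from the text
N_REFERENCE = 25 + 12 + 15 + 10             # TID2008, USC-SIPI, Live-IQAD, IVC-SQDB

def add_gaussian_noise(img, sigma, seed=0):
    """Additive white Gaussian noise on the 0-255 scale (no clipping; an assumption)."""
    rng = np.random.default_rng(seed)
    return img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)

clean = np.full((256, 256), 128.0)          # placeholder for a 256 x 256 test image
noisy = {s: add_gaussian_noise(clean, s, seed=s) for s in SIGMAS}
print(N_REFERENCE * len(SIGMAS))            # 372 distorted images
```

The count confirms the figure in the text: 62 reference images times 6 noise levels gives 372 distorted images.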
The Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) are used to evaluate the quality of the denoised images. PSNR is defined by
$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}},$$
where MSE is the mean squared error between the original image and the denoised one. SSIM is defined as [30,32]
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.$$
For a thorough comparison, we list the average denoising results over all 372 distorted images in Table 2. The results for all gray images of USC-SIPI are shown in Table 3. From these two tables, we observe the following. Among the competing algorithms, GSR also exploits nonlocal similarity by grouping image patches for low-rank structure. However, it requires too many iterations in practical applications, e.g., 100 or even up to 200. In contrast, WNNM needs fewer iterations, around 14, and achieves better results than the other three algorithms, at an average of 8.26 dB for gray images. Meanwhile, the proposed STLWSM needs the fewest iterations and achieves the best performance.
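Both metrics can be sketched directly from the formulas above. Assumptions: 8-bit images (peak 255), and the usual stabilizing constants C1 = (0.01·255)² and C2 = (0.03·255)². The SSIM here is computed once over the whole image, a simplification of the standard SSIM, which averages the same expression over local windows; library implementations should be preferred for reported numbers.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """PSNR = 10 log10(peak^2 / MSE) for images on a [0, peak] scale."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=255.0):
    """Single-window SSIM using global means, variances and covariance."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    C1, C2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

An error of one gray level at every pixel (MSE = 1) gives PSNR = 20 log10 255 ≈ 48.13 dB, and SSIM equals 1 only when the two images match.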
SOLST and STROLLR are both transform learning algorithms and offer efficiency that is hard to match. SOLST trains transform matrices for each group, while STROLLR combines nonlocal low-rank modeling and transform learning; STROLLR also achieves better results than SOLST, at an average of 5.54 dB. In Table 3, all numerical results of the proposed STLWSM are in bold, indicating the best among the five algorithms. The proposed method achieves a clear PSNR improvement under all noise levels, at an average of 13.61 dB. More visual results are shown in Fig. 3, in which our method clearly outperforms all other methods.
Moreover, since GSR needs too many iterations and pure transform learning algorithms are much faster, we compare our running time against WNNM only; the results are shown in Fig. 4. Our method spends much less time than WNNM, at an average of 55.46%. The algorithm also extends well to color images: when applied to RGB images for IDN, STLWSM still outperforms the other algorithms, and the numerical comparison is given in Table 4. Fig. 5 and Fig. 6 show the average PSNR and the elapsed time, respectively, which also demonstrate our advantage over the competitors. Fig. 7 and Fig. 8 show the average SSIM for gray images and color images, respectively; our method preserves the structure of the denoised image even at high noise levels. To show the efficiency of our algorithm in detail, we report its results versus the number of iterations (up to 10) in Fig. 9, where the PSNR values of all 12 images are averaged for each noise level. The PSNR of the original noisy images at the different noise levels is shown as the starting point: the top black line is the maximum value of 24.63, the bottom black line is the minimum value of 10.65, the red line is the median of 17.64, and the green star is the average of 17.72. Fig. 9 shows that our algorithm converges quickly and needs a limited number of iterations, mostly 3, to reach the final solution. The image inpainting results are shown in Table 5, and the original images are shown in Fig. 10. All methods achieve admirable results in filling in missing pixels, and the proposed STLWSM still outperforms all the other state-of-the-art algorithms. Taking the image denoising results into account as well, STLWSM is more robust, with much smaller PSNR changes than the other competing approaches.

Conclusions
In this paper, we have proposed a unified image denoising framework that uses knowledge from both the image domain and the transform domain, namely Sparsifying Transform Learning and Weighted Singular Values Minimization (STLWSM). Specifically, we learn a transform matrix for each group of patches with similar structure. After obtaining the optimized transform matrix and the sparse coefficients with an efficient optimization algorithm, we further restore the image patch groups through their low-rank prior. By applying STLWSM to all the groups, the denoised image is reconstructed. For both gray and color images, experimental results show that the proposed model achieves visible PSNR improvements over other state-of-the-art approaches. Our efficient optimization algorithm also costs much less running time than the typical image domain based method. Note that while pure transform learning methods run faster than STLWSM, they perform worse by a large margin. Further improving the efficiency of our framework will be our main work in the near future.


are regularization parameters, usually set empirically. This formulation simultaneously minimizes the residual in the transform domain and the rank of the recovered matrix X̃_i.

1 http://sipi.usc.edu/database/

where x and y represent the original image and the denoised one, respectively, μ_x and μ_y are the mean values of x and y, σ_x^2 and σ_y^2 are their variances, and σ_xy is their covariance. C_1 and C_2 are two stabilizing constants.

Fig. 3 PSNR AVG of gray images denoising results
Fig. 4 Elapsed time comparison in gray images
Fig. 7 SSIM

Fig. 10 Original images of size 512*512

term in image domain [11]. This model is called the synthesis model, and κ is assumed to be sparse (its ℓ0 norm ‖κ‖_0 is small relative to m).
transform domain, which is a matrix. Suppose the transform W X̃ also has low-rank structure; hence, we utilize weighted singular values minimization to approximate the matrix. The unified denoising minimization is:

Table 1
Parameter settings in our experiments

Table 4
Color images de-noising results (PSNR/SSIM) Image House image inpainting results are shown in Table

Table 5
Images inpainting results of size 512*512 Image