Optimizing Shrinkage Curves and Application in Image Denoising

1 School of Information & Software Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 College of Computer Engineering, Yangtze Normal University, Chongqing 408000, China
3 School of Electrical Engineering, Wuhan University, Wuhan 430072, China
4 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China


Introduction
Low rank matrix approximation has been attracting significant research interest in recent years. This approach aims to reconstruct the latent data from its degraded observation matrix and is frequently applied in many fields, such as machine learning [1], computer vision [2], recommendation systems [3], and image processing [4]. As a branch of this research, a regularized nuclear norm minimization problem is widely considered over matrices,

$$\min_{X \in \mathbb{R}^{m \times n}} \bigl( f(X) + \lambda \|X\|_* \bigr), \tag{1}$$

where X denotes a matrix, the scalar $\lambda > 0$ is a parameter, and $f(X)$ and $\|X\|_*$ are the data fidelity term and the data regularization term, respectively. In this formula, $\|X\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(X)$ is called the nuclear norm of X, where $\sigma_i(X)$ denotes the $i$th largest singular value of X.
If $f(X)$ is convex, then $f(X) + \lambda\|X\|_*$ is a convex function because the nuclear norm $\|X\|_*$ is also convex. Thus, problem (1) is a convex optimization problem and can be treated using various classic iterative optimization algorithms, including steepest-descent, conjugate-gradient, and interior-point algorithms. When the data fidelity term is $f(X) = \frac{1}{2}\|X - Y\|_F^2$, where Y is an observation matrix and $\|\cdot\|_F$ denotes the Frobenius norm operator, this is the well-known nuclear norm minimization (NNM) [5]. The NNM problem was proved to be solvable by applying a soft-threshold operation on the singular values of Y, and the solution can be achieved using the Singular Value Thresholding (SVT) algorithm [6].
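As a quick illustration, the SVT step can be sketched in a few lines of NumPy. This is a minimal sketch, not the implementation of [6]; the matrix size and threshold below are arbitrary.

```python
import numpy as np

def svt(Y, tau):
    """Singular Value Thresholding: soft-threshold the singular values
    of Y by tau (the proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)     # soft threshold each sigma_i
    return U @ np.diag(s_shrunk) @ Vt

# Example: a rank-1 matrix plus small Gaussian noise
rng = np.random.default_rng(0)
X = np.outer(rng.standard_normal(20), rng.standard_normal(20))
Y = X + 0.1 * rng.standard_normal((20, 20))
X_hat = svt(Y, tau=1.0)
```

Note that every singular value is reduced by the same amount tau, which is exactly the inflexibility discussed next.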
Despite the success of NNM, it is not flexible enough to handle more complex issues. To pursue the convexity property, NNM treats each singular value equally. As a result, the soft-thresholding operator shrinks each singular value by the same amount [6]. In principal component analysis, however, different principal directions quantify different information. For example, the large singular values deliver the major feature information such as edges and texture. This implies that, in image denoising, the larger a singular value is, the less it should be shrunk. Obviously, the NNM model and the corresponding solver cannot handle this issue.
To overcome this limitation, a regularized nuclear norm minimization with weights was put forward. These weights may enhance the representation capability of the original nuclear norm. Its form is as follows:

$$\min_{X \in \mathbb{R}^{m \times n}} \Bigl( f(X) + \sum_{i=1}^{\min(m,n)} w[i]\, \sigma_i(X) \Bigr), \tag{2}$$

where $w[i]$ is the weight assigned to $\sigma_i(X)$. Problem (2) is a nonconvex, nonsmooth low rank minimization problem. Of course, if $w[i]$ is replaced with $\lambda$, problem (2) reverts to problem (1). Solving problem (2) is challenging, or even NP-hard. To solve this problem, researchers have introduced additional assumptions. Gu et al. [7] assumed that the weights are nondescending, $0 \le w[1] \le \cdots \le w[\min(m,n)]$, under which problem (2) becomes tractable and can be solved by a soft-thresholding operation. Moreover, the authors devised a solver in which $w[i]$ is inversely proportional to $\sigma_i(X)$. Gasso et al. [8] argued that if the objective can be decomposed as a difference of two convex functions, problem (2) can be solved by DC (Difference of Convex functions) programming. Lu et al. [9] assumed that the penalty $g$ is concave, monotonically increasing on $[0, \infty)$, and Lipschitz continuous; the weights are then obtained at a supergradient point of the concave function $g$. Based on this assumption, the authors proposed the Iteratively Reweighted Nuclear Norm (IRNN) method. In addition, Hu et al. [10] reported their Truncated Nuclear Norm Regularization (TNNR) method, based on the same assumptions as in [9].
By applying these low rank matrix approximation theories, different image denoising methods have been reported. For example, a method coupling sparse denoising and unmixing with a low rank constraint is proposed for hyperspectral images in [11]; a scheme incorporating iterative support detection into TNNR is presented to reduce white Gaussian noise in [12]; the eigenvectors of the Laplacian are considered to suppress Gaussian noise in [13]; a weighted nuclear norm minimization model is presented in [14] and used in three applications, namely, image denoising, background subtraction, and image completion. These methods achieve high-quality results. A main reason is that all of them employ a powerful patch-based technique.
Inspired by weighted nuclear norm minimization and the patch-based technique, a parameter optimization method is proposed in this paper. The proposed method utilizes the difference between a latent matrix and its observation to design a penalty function and employs odd polynomials to shrink the singular values of the observation matrix. As a result, the coefficients of the polynomial fully characterize the shrinkage operator. Furthermore, the Frobenius norm of the penalty function is converted into a spectral norm. Thereby, the parameter optimization can be easily solved by using plain least-squares.
To validate its effectiveness, the optimization theory is applied to image denoising. Since the proposed method optimizes shrinkage curves, it is called the OSC method. In practical application, the OSC method does not work on the whole image at once, but rather on a series of matrices termed Rank-Ordered Similar Matrices (ROSM; see Definition 2). Thirty-two images were tested. Experimental results show that the OSC method achieves better results than the Bilateral Filter; when the noise standard deviation is less than 20, the results achieved by OSC are better than those of BM3D, and when the noise standard deviation varies from 20 to 40, the results of OSC are weaker than those of BM3D.
The contribution of this paper is twofold. Firstly, in the penalty function devised for weighted nuclear norm minimization, the weight representation is replaced by odd polynomials, so the coefficients of the polynomial fully characterize the role of the weights. Furthermore, the Frobenius norm of the penalty function is converted into a spectral norm. Secondly, the proposed optimization method is adapted to image denoising. Experimental results show that the proposed OSC method outperforms the Bilateral Filter and is also superior to the BM3D method in the case of low noise.
The rest of the paper is organized as follows. In Section 2, the shrinkage curve optimization is formulated. Section 3 describes the image denoising algorithm, and the corresponding analysis follows in Section 4. Section 5 reports the experimental results, and conclusions are drawn in Section 6.

Optimizing Shrinkage Curves
In this section, the problem to be discussed is formulated, followed by the method of optimizing the shrinkage curves.

Problem Formulation.
Let X be an unknown square matrix in $\mathbb{R}^{n \times n}$, and let Y be its observation. The observed matrix is corrupted by white Gaussian noise N with variance $\sigma^2$. This is expressed as

$$Y = X + N. \tag{3}$$

To reconstruct the original square matrix X from its noisy version, the following weighted nuclear norm minimization is considered:

$$\min_{X} \; \|Y - X\|_F^2 + \|X\|_{*,w}, \tag{4}$$

where

$$\|X\|_{*,w} = \sum_{i=1}^{n} w[i]\, \sigma_i(X).$$

In this formula, $\sigma_i(X)$ denotes the $i$th largest singular value of X, and $\|X\|_{*,w}$ and $\|Y - X\|_F$ are the weighted nuclear norm of X and the Frobenius norm of $(Y - X)$, respectively. Our aim is to use polynomial coefficients to characterize the weights and obtain the solution of the problem.

Optimization Method.
As proven in [7], the weighted nuclear norm minimization (4) can be solved by imposing a soft-threshold operation on the singular values of the observation matrix. The form is as follows:

$$\hat{X} = U \mathcal{S}_w(\Sigma) V^T, \tag{5}$$

where $Y = U \Sigma V^T$ is the Singular Value Decomposition (SVD) of Y and $\mathcal{S}_w$ is the soft-thresholding operator. This operator with weight vector w shrinks the singular values, $\bigl(\mathcal{S}_w(\Sigma)\bigr)_{ii} = \max(\Sigma_{ii} - w[i],\, 0)$. To obtain the thresholds, the following penalty function is employed:

$$E(w) = \bigl\| X - U \mathcal{S}_w(\Sigma) V^T \bigr\|_F^2. \tag{6}$$

Assuming that the shrinkage operation is applied differently to every singular value in the matrix Σ, the operator can be broken into per-value operators,

$$\mathcal{S}_w(\Sigma) = \operatorname{diag}\bigl( f_1(\Sigma[1]),\, f_2(\Sigma[2]),\, \ldots,\, f_n(\Sigma[n]) \bigr), \tag{7}$$

where Σ[i] is the $i$th largest singular value of Y and $f_i$ denotes the $i$th shrinkage operator. In our work, odd polynomials are taken to represent these shrinkage operators, so that the coefficients of the polynomial fully characterize the shrinkage operator. Thus, the $i$th shrinkage operator is expressed as

$$f_i(\Sigma[i]) = \sum_{j=1}^{k} c_i[j]\, \Sigma[i]^{\,2j-1}. \tag{8}$$

Substituting (8) into (7), the shrinkage operator can be rewritten as

$$\mathcal{S}_{\mathbf{c}}(\Sigma) = \operatorname{diag}\Bigl( \sum_{j=1}^{k} c_1[j]\, \Sigma[1]^{\,2j-1},\, \ldots,\, \sum_{j=1}^{k} c_n[j]\, \Sigma[n]^{\,2j-1} \Bigr). \tag{9}$$

Substituting (9) into (6) and considering that V is a unitary matrix, the penalty function can be rewritten as

$$E(\mathbf{c}) = \bigl\| XV - U \mathcal{S}_{\mathbf{c}}(\Sigma) \bigr\|_F^2. \tag{10}$$

The focus now shifts to the optimization of the penalty function. To obtain the optimal solution by using plain least-squares, the Frobenius norm in (10) is converted into a spectral norm. For ease of formulation, a vector $\mathbf{c} \in \mathbb{R}^{nk}$ is defined that contains all the coefficient series $\{c_i[j] : j = 1, \ldots, k\}$, for $i = 1, 2, \ldots, n$; that is,

$$\mathbf{c} = \bigl( c_1[1], \ldots, c_1[k],\; c_2[1], \ldots, c_2[k],\; \ldots,\; c_n[1], \ldots, c_n[k] \bigr)^T. \tag{11}$$

A matrix $Z \in \mathbb{R}^{n^2 \times nk}$ is also defined that is a block-diagonal matrix with n blocks,

$$Z = \operatorname{blockdiag}\bigl( z[1],\, z[2],\, \ldots,\, z[n] \bigr), \tag{12}$$

where each block $z[i]$ is of size $n \times k$, with every row containing the odd powers of the corresponding singular value,

$$z[i] = \begin{pmatrix} \Sigma[i] & \Sigma[i]^3 & \cdots & \Sigma[i]^{2k-1} \\ \vdots & \vdots & & \vdots \\ \Sigma[i] & \Sigma[i]^3 & \cdots & \Sigma[i]^{2k-1} \end{pmatrix}, \tag{13}$$

for $i = 1, 2, \ldots, n$.
In addition, two operators, vec and diag, are introduced. The vec operator returns a vector by concatenating the columns of a matrix; the diag operator returns a diagonal matrix by putting the elements of a vector on the main diagonal. For example,

$$\operatorname{vec}\begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}, \qquad \operatorname{diag}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}.$$

Using these notations and operators, the penalty function (10) can be rewritten as

$$E(\mathbf{c}) = \bigl\| \mathbf{b} - A Z \mathbf{c} \bigr\|_2^2, \tag{14}$$

where $\mathbf{b} = \operatorname{vec}(XV)$ and $A = \operatorname{diag}(\operatorname{vec}(U))$.
The optimal set of parameters that define the shrinkage curves is

$$\mathbf{c}_{\text{opt}} = \arg\min_{\mathbf{c}} \bigl\| \mathbf{b} - A Z \mathbf{c} \bigr\|_2^2, \tag{15}$$

which leads to the plain least-squares solution

$$\mathbf{c}_{\text{opt}} = \bigl( Z^T A^T A Z \bigr)^{-1} Z^T A^T \mathbf{b}. \tag{16}$$
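The least-squares fit described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: a single training pair (X, Y) of n × n matrices, polynomial order k, and the helper names `fit_shrinkage_coeffs` and `apply_shrinkage` are illustrative.

```python
import numpy as np

def fit_shrinkage_coeffs(X, Y, k=3):
    """Fit the odd-polynomial shrinkage coefficients by plain least-squares."""
    n = Y.shape[0]
    U, sigma, Vt = np.linalg.svd(Y)
    # Z: block-diagonal, block i holds the odd powers of the i-th singular value
    Z = np.zeros((n * n, n * k))
    for i in range(n):
        powers = sigma[i] ** (2 * np.arange(k) + 1)       # sigma, sigma^3, ...
        Z[i * n:(i + 1) * n, i * k:(i + 1) * k] = powers  # broadcast to n rows
    A = np.diag(U.flatten(order='F'))                     # diag(vec(U))
    b = (X @ Vt.T).flatten(order='F')                     # vec(X V)
    c, *_ = np.linalg.lstsq(A @ Z, b, rcond=None)         # plain least-squares
    return c

def apply_shrinkage(Y, c, k=3):
    """Shrink the singular values of Y with the fitted polynomial curves."""
    n = Y.shape[0]
    U, sigma, Vt = np.linalg.svd(Y)
    shrunk = [c[i * k:(i + 1) * k] @ (sigma[i] ** (2 * np.arange(k) + 1))
              for i in range(n)]
    return U @ np.diag(shrunk) @ Vt

# Sanity check: with X = Y (no noise), the fitted curves can reproduce
# the identity shrinkage, so the reconstruction matches Y
rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 8))
c = fit_shrinkage_coeffs(Y, Y, k=2)
```

Here `flatten(order='F')` implements column-major vec, so that A Z c equals vec(U S_c(Σ)) as in (14).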

Application in Image Denoising
In this section, the OSC method is introduced for image denoising, covering the denoising modeling data and the denoising algorithm.
Definition 2. Let $\mathbf{p}_r \in P$ be a reference patch and $\|\cdot\|_2$ denote the Euclidean norm; the distance between $\mathbf{p}_r$ and $\mathbf{p}_j$ can be calculated using the metric

$$d(\mathbf{p}_r, \mathbf{p}_j) = \bigl\| \mathbf{p}_r - \mathbf{p}_j \bigr\|_2.$$

These scalar distances are then sorted and the patches in P are correspondingly ordered,

$$d_{(1)} \le d_{(2)} \le \cdots \le d_{(T)}, \qquad \mathbf{p}_{(1)} \prec \mathbf{p}_{(2)} \prec \cdots \prec \mathbf{p}_{(T)},$$

where $d_{(i)}$ denotes the $i$th smallest distance value, ≺ denotes the order relation between patches, and $T = |P| = (n_1 - \sqrt{m} + 1) \times (n_2 - \sqrt{m} + 1)$ denotes the number of patches in P.
The denoising modeling data, termed Rank-Ordered Similar Matrix (ROSM), are defined as

$$Y_r = \bigl( \mathbf{p}_{(1)},\, \mathbf{p}_{(2)},\, \ldots,\, \mathbf{p}_{(m)} \bigr) \in \mathbb{R}^{m \times m}.$$

Obviously, the ROSM is a square matrix of size $m \times m$ and $\mathbf{p}_{(1)} = \mathbf{p}_r$. When all patches in P have been processed, a matrix-set in $\mathbb{R}^{m \times m}$ can be built, denoted by $M = \{Y_r : r = 1, 2, \ldots, T\}$.
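The ROSM construction of Definition 2 can be sketched as follows. This is a minimal NumPy sketch with assumed, illustrative names (`build_rosm`, `image`, reference pixel `(i, j)`, patch dimension `m`); ties in the distances are broken arbitrarily by the sort.

```python
import numpy as np

def build_rosm(image, i, j, m=16):
    """Build the Rank-Ordered Similar Matrix for the reference pixel (i, j):
    the m patches most similar to the reference patch, as columns."""
    s = int(np.sqrt(m))
    n1, n2 = image.shape
    # Extract every sqrt(m) x sqrt(m) patch as a column vector (column-major vec)
    patches = np.stack([image[r:r + s, c:c + s].flatten(order='F')
                        for r in range(n1 - s + 1)
                        for c in range(n2 - s + 1)], axis=1)
    ref = image[i:i + s, j:j + s].flatten(order='F')
    # Sort patches by Euclidean distance to the reference patch
    order = np.argsort(np.linalg.norm(patches - ref[:, None], axis=0))
    return patches[:, order[:m]]        # the m most similar patches

rosm = build_rosm(np.arange(64.0).reshape(8, 8), 2, 2, m=16)
```

By construction the first column is the reference patch itself, matching $\mathbf{p}_{(1)} = \mathbf{p}_r$.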

Denoising Algorithm.
The proposed denoising algorithm consists of Algorithms 1 and 2. The former trains the polynomial coefficients c opt , and the latter reduces noise.
Suppose Y is a noisy image and X is its noise-free version. Based on Definition 2, the matrix-set M of Y is built, and the corresponding noise-free matrix-set is also obtained, denoted by $M' = \{X_r : r = 1, \ldots, T\}$. Thereby, there exist T paired samples $\{(X_r, Y_r) : r = 1, \ldots, T\}$ to train $\mathbf{c}_{\text{opt}}$. The corresponding penalty function is as follows:

$$E(\mathbf{c}) = \sum_{r=1}^{T} \bigl\| \mathbf{b}_r - A_r Z_r \mathbf{c} \bigr\|_2^2,$$

where $Y_r = U_r \Sigma_r V_r^T$ is the SVD of $Y_r$. The optimum parameters are

$$\mathbf{c}_{\text{opt}} = \Bigl( \sum_{r=1}^{T} Z_r^T A_r^T A_r Z_r \Bigr)^{-1} \sum_{r=1}^{T} Z_r^T A_r^T \mathbf{b}_r,$$

where $\mathbf{b}_r = \operatorname{vec}(X_r V_r)$, $A_r = \operatorname{diag}(\operatorname{vec}(U_r))$, and $Z_r$ can be obtained by using (12) and (13).
Next, the two-iteration denoising algorithm is introduced. Suppose Y is a noisy image to be processed, of size $n_1 \times n_2$. Employing Definition 2, the matrix-set M of Y is built. For any ROSM $Y_r$ in M, the estimate of its noise-free version can be obtained by

$$\hat{X}_r = U_r\, \mathcal{S}_{\mathbf{c}_{\text{opt}}}(\Sigma_r)\, V_r^T,$$

where $Y_r = U_r \Sigma_r V_r^T$ is the SVD of $Y_r$ and $\mathcal{S}_{\mathbf{c}_{\text{opt}}}(\Sigma_r)$ is expressed as in (9). When all ROSMs in M have been processed, an estimate-set can be built, denoted by $\hat{M} = \{\hat{X}_r : r = 1, 2, \ldots, T\}$. Every estimate $\hat{X}_r$ is put back into the original image canvas, and the intensities of pixels falling at the same position in the canvas are averaged. Thus, the first estimate of the noise-free image is obtained, denoted by $\hat{X}(t = 1)$. Repeating the above operations, the second estimate $\hat{X}(t = 2)$ is also yielded.
In addition, a scaled version of the residual error, the difference between the noisy and estimated images, is considered. Let $P_t$ denote the operation putting the $t$th estimates $\hat{X}_r$ back into the original image canvas; the denoising method can then be expressed as the following pair of formulas:

$$\hat{Y}(t) = \hat{X}(t-1) + \delta\,\bigl(Y - \hat{X}(t-1)\bigr), \qquad \hat{X}(t) = P_t\bigl(\hat{Y}(t)\bigr) \oslash D,$$

where $\hat{X}(t)$ is the $t$th image estimate, $\hat{Y}(t)$ is the $t$th adjustment, δ is the scale factor of the residual error, ⊘ denotes entrywise division, and D denotes the matrix of the same size as the noisy image, in which each entry records the number of times the pixel at the same location is processed.

Algorithm 1 (fragment of the training procedure):
for t = 1 : …
(5) Initialize;
Get an image pair, (X, Y) ← (X(t), Y(t));
(8) Based on Definition 1, extract all patches from image Y and build the patch-set P = {p_r};
(9) for each patch p_r in P do
(10) Based on Definition 2, obtain a ROSM Y_r;
(11) Obtain the matrix X_r corresponding to Y_r;
(12) Singular value decomposition, Y_r = U_r Σ_r V_r^T; map U_r to a diagonal matrix and map Σ_r to a diagonal block matrix Z_r, according to (12) and (13);
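The two-pass loop with residual feedback can be sketched as below. This is a simplified stand-in, not the authors' code: the `shrink` callback abstracts the per-group singular-value shrinkage, each patch is processed individually rather than grouped into a ROSM, and `delta` plays the role of the scale factor of the residual error.

```python
import numpy as np

def osc_denoise(noisy, c_opt, shrink, m=16, delta=0.1, iters=2):
    """Two-pass patch-averaging loop with scaled residual feedback.
    shrink(patch, c_opt) -> denoised patch; patches are s x s, s = sqrt(m)."""
    s = int(np.sqrt(m))
    n1, n2 = noisy.shape
    x_hat = noisy.copy()
    for t in range(iters):
        # Y_hat(t) = X_hat(t-1) + delta * (Y - X_hat(t-1))
        y_adj = x_hat + delta * (noisy - x_hat)
        canvas = np.zeros_like(noisy, dtype=float)
        counts = np.zeros_like(noisy, dtype=float)   # the matrix D
        for i in range(n1 - s + 1):
            for j in range(n2 - s + 1):
                patch = shrink(y_adj[i:i + s, j:j + s], c_opt)
                canvas[i:i + s, j:j + s] += patch    # put back on the canvas
                counts[i:i + s, j:j + s] += 1
        x_hat = canvas / counts                      # entrywise division by D
    return x_hat
```

With an identity `shrink`, the overlap-averaging step returns the input unchanged, which is a convenient sanity check of the canvas bookkeeping.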

Algorithm Analysis
In this section, the influence of different levels of noise on the singular vectors is first discussed; then the shrinkage scales of the singular values are analyzed.
For two noisy patches $\mathbf{y}(t) = \mathbf{x}(t) + \mathbf{n}(t)$ and $\mathbf{y}(s) = \mathbf{x}(s) + \mathbf{n}(s)$, the squared distance can be expanded as

$$\|\mathbf{y}(t) - \mathbf{y}(s)\|_2^2 = \|\mathbf{x}(t) - \mathbf{x}(s)\|_2^2 + \|\mathbf{n}(t) - \mathbf{n}(s)\|_2^2 + 2\,\langle \mathbf{x}(t) - \mathbf{x}(s),\, \mathbf{n}(t) - \mathbf{n}(s) \rangle, \tag{17}$$

where the symbol ⟨·⟩ is the inner-product operator. The vector difference, $\mathbf{n}(t) - \mathbf{n}(s)$, is Gaussian noise with covariance matrix $2\sigma^2 I$, where I denotes the identity matrix. Based on the argument in [15], if the dimension m of y is large, the squared norm $\|\mathbf{n}(t) - \mathbf{n}(s)\|_2^2$ is concentrated around its mean $2m\sigma^2$ with high probability. Similarly, the projection $\langle \mathbf{x}(t) - \mathbf{x}(s), \mathbf{n}(t) - \mathbf{n}(s) \rangle$ of the noise on the deterministic vector $\mathbf{x}(t) - \mathbf{x}(s)$ is concentrated around 0 with high probability. Therefore, (17) can be interpreted as a translation-invariant procedure: sorting noisy patch distances approximately preserves the ordering of the corresponding noise-free distances.
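The concentration claim is easy to check numerically: for a large patch dimension m, the squared norm of the difference of two independent $\mathcal{N}(0, \sigma^2)$ noise vectors stays close to its mean $2m\sigma^2$. A small self-contained simulation (the sample count and dimension below are arbitrary):

```python
import numpy as np

# Draw many pairs of independent Gaussian noise vectors of dimension m
# and measure how far ||n(t) - n(s)||^2 deviates from its mean 2*m*sigma^2.
rng = np.random.default_rng(0)
m, sigma = 4096, 1.0
diffs = rng.normal(0, sigma, (1000, m)) - rng.normal(0, sigma, (1000, m))
sq_norms = np.sum(diffs**2, axis=1)
rel_dev = np.abs(sq_norms - 2 * m * sigma**2) / (2 * m * sigma**2)
```

The relative deviation scales like $\sqrt{2/m}$, so at m = 4096 it is on the order of a few percent.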
On the other hand, as the noise level increases, the singular vectors of $Y_r \in M$ are no longer aligned with the singular vectors of the noise-free matrix $X_r$. As a result, the number of reliable singular vectors of $Y_r$ becomes smaller. For example, when $X_r$ is corrupted by Gaussian noise of very small magnitude, nearly all the singular vectors of $Y_r$ track the noise-free data; when it is corrupted at a large magnitude, the number of singular vectors tracking the data becomes smaller. Therefore, at increasing noise levels, the singular vectors of $Y_r$ become more and more unreliable, resulting in worse restored features.

Experiments
In this section, the OSC denoising performance is evaluated. In Section 5.1, the parameter settings are described. Section 5.2 introduces two metrics, the Peak Signal-to-Noise Ratio (PSNR) and the Mean Structural Similarity Index Measure (MSSIM). Section 5.3 reports the experimental results, and the comparisons with other methods are discussed.

Setting Parameters.
A total of three parameters are set in the proposed method: the scale factor δ of the residual error, an integer k regulating the odd polynomial order, and the patch size $\sqrt{m} \times \sqrt{m}$. A large δ may lead to oscillatory effects; for all noise levels, δ is set to 0.1. To study the effect of the parameters $(k, \sqrt{m})$ on denoising, the coefficient vector $\mathbf{c}_{\text{opt}}$ was trained on the Lena image, and the trained $\mathbf{c}_{\text{opt}}$ was used to reduce noise in the Barbara image. The influence of k and $\sqrt{m}$ on the PSNR results is shown in the left and right insets of Figure 2, respectively. Based on this experience, the parameter values used in our experiments are shown in Table 1. During implementation, because the behavior of the coefficients $\mathbf{c}_{\text{opt}}$ on large entries may be completely distorted, an input value outside the training range may be violently magnified. For this case, the algorithm checks whether the input is within the learning-data range, and if not, it is not modified at all.

Two Metrics.
The PSNR measurement is based on pixel intensity errors between the noise-free and the restored images, given in decibels (dB). Higher PSNR means better denoising capability. Let X and $\hat{X}$ be the noise-free and the restored image, respectively, let |X| denote the cardinality (number of pixels) of X, and let $\|\cdot\|_F$ denote the Frobenius norm; the calculation of PSNR is as follows:

$$\mathrm{PSNR} = 20 \log_{10} \left( \frac{255}{\|X - \hat{X}\|_F / \sqrt{|X|}} \right). \tag{27}$$

The MSSIM measurement is the mean SSIM, which yields the mean value of the structural similarities. MSSIM values are bounded in the range [0, 1], and the closer the value is to 1, the better the denoising scheme. The calculation of SSIM involves paired blocks extracted from the noise-free image X and the restored image $\hat{X}$, respectively. Let $b_1$ and $b_2$ denote paired blocks, $\mu_{b_1}$ and $\mu_{b_2}$ be the mean values of $b_1$ and $b_2$, respectively, $\sigma_{b_1}^2$ and $\sigma_{b_2}^2$ be the respective variances, and $\sigma_{b_1 b_2}$ be their covariance. Assuming $C_1$ and $C_2$ are two stabilization variables, the SSIM value can be calculated by

$$\mathrm{SSIM}(b_1, b_2) = \frac{\bigl(2\mu_{b_1}\mu_{b_2} + C_1\bigr)\bigl(2\sigma_{b_1 b_2} + C_2\bigr)}{\bigl(\mu_{b_1}^2 + \mu_{b_2}^2 + C_1\bigr)\bigl(\sigma_{b_1}^2 + \sigma_{b_2}^2 + C_2\bigr)}. \tag{28}$$
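The PSNR computation can be written directly from the formula above (a minimal sketch; the name `psnr` is illustrative, and `peak` is the maximum intensity, 255 for 8-bit images):

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """PSNR in dB between a noise-free image x and a restored image x_hat."""
    rmse = np.linalg.norm(x - x_hat) / np.sqrt(x.size)  # Frobenius norm / sqrt(|X|)
    return 20 * np.log10(peak / rmse)
```

For example, a restored image that is uniformly off by the full intensity range of 255 scores 0 dB.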

Experimental Results and Comparisons.
A test set was built to evaluate the OSC method, which was a combination of two groups, denoted by Γ = {Γ1, Γ2}. Each group Γi contained noisy images with eight different levels of white noise. The Γ1 group is associated with sixteen widely used noise-free images, shown in Figure 3; the Γ2 group is associated with sixteen noise-free biomedical images, shown in Figure 5. These noise-free images were taken from the CVG-Granada database.

The corresponding training set was also built, which was a subset of the test set Γ, denoted by Ψ = {Ψ1, Ψ2}. The group Ψ1 contained the noisy versions of five noise-free images: the Einstein, Elaine, Barbara, Fingerprint, and Man images; the group Ψ2 included the noisy versions of six noise-free images: the C05c, Celulas, Cromo2, Mr2, Mr032, and Mr034 images. The OSC method was applied to the test set Γ. Using the parameter values shown in Table 1, the coefficients $\mathbf{c}_{\text{opt}}$ were trained by Algorithm 1; Algorithm 2 used the same parameter values and the trained $\mathbf{c}_{\text{opt}}$. The PSNR and MSSIM results associated with both groups are reported in Tables 2 and 3, respectively. Moreover, visual results and zoom-ins are shown in Figure 4 for three widely used images and in Figure 6 for three biomedical images.
To augment the performance evaluations, the OSC method was compared with the Bilateral Filter [16] and BM3D [17]. For the Bilateral Filter, three parameters need to be set: a sliding window of size ω × ω, a geometric spread standard deviation $\sigma_d$, and a photometric spread standard deviation $\sigma_r$. The three parameters ω, $\sigma_d$, and $\sigma_r$ were set to 5, 5, and 0.1, respectively. The source code of BM3D was taken from the original authors, and the parameter values used in the experiments were those recommended by the authors. The PSNR and MSSIM results from the two methods are reported in Table 2, and the visual results for three images are shown in Figure 4.
In addition, the mean PSNR and mean MSSIM results for a fixed noise level were calculated for the noisy images, the Bilateral Filter, BM3D, and the OSC method, respectively. The corresponding calculations are as follows:

$$\overline{\mathrm{PSNR}}_v = \frac{1}{|\Gamma_v|} \sum_{i=1}^{|\Gamma_v|} \mathrm{PSNR}_v(i), \tag{29}$$

$$\overline{\mathrm{MSSIM}}_v = \frac{1}{|\Gamma_v|} \sum_{i=1}^{|\Gamma_v|} \mathrm{MSSIM}_v(i). \tag{30}$$

In these two formulae, $\Gamma_v$ denotes all the images with the same noise level v in the test set Γ, and thus $|\Gamma_v| = 32$; $\mathrm{PSNR}_v(i)$ and $\mathrm{MSSIM}_v(i)$ denote the PSNR and MSSIM values of the $i$th image with noise level v, respectively. For example, if the OSC method employs formula (29), $\overline{\mathrm{PSNR}}_5$ denotes the arithmetical mean PSNR over all images with noise level v = 5, associated with the OSC method. The mean PSNR and MSSIM values for fixed noise levels for the four methods are plotted in Figure 7.
Some observations and conclusions can be drawn from the quantitative measurements and visual results. Firstly, the OSC method achieved desirable results. In PSNR, it matched the BM3D method on average and significantly outperformed the noisy images and the Bilateral Filter by 8.49 dB and 5.26 dB on average, respectively. In MSSIM, it likewise matched the BM3D method on average and significantly outperformed the noisy images and the Bilateral Filter by 0.417 and 0.275 on average, respectively. Secondly, the OSC method has a strong capability to preserve detail. It reconstructed more image details from noisy images than the Bilateral Filter and was almost on par with BM3D. As seen in Figure 4, the fine details of the moustache in the Baboon image were preserved, while the Bilateral Filter introduced excessive smoothing and resulted in visual feature blurring. Thirdly, in contrast to BM3D, the OSC method achieved decreasing results at increasing noise levels, as shown in Figure 7.

Conclusions
In this paper, a shrinkage curve optimization is proposed for weighted nuclear norm minimization. Based on the theory [7] that weighted nuclear norm minimization can be solved by imposing a soft-threshold operation on the singular values, the proposed optimization method designs a penalty function using the difference between a latent matrix and its observation and employs odd polynomials to shrink the singular values of the observation matrix. As a result, the coefficients of the polynomial fully characterize the shrinkage operator. Furthermore, the Frobenius norm of the penalty function is converted into the corresponding spectral norm, so the parameter optimization can be easily solved by off-the-shelf plain least-squares. To validate its effectiveness, the proposed parameter optimization method is adapted to image denoising. In our experiments, a total of 256 noisy images were tested, comprising the noisy versions of thirty-two noise-free images corrupted by eight different levels of noise. Experimental results show that the proposed denoising method is effective in terms of the comparative results. The proposed method achieves better results than the Bilateral Filter; when the noise standard deviation is less than 20, the proposed method slightly outperforms the BM3D method, and it is weaker than the BM3D method when the noise standard deviation varies from 20 to 40.

Figure 1 :
Figure 1: Illustration of the shrinkage scale of the $i$th largest singular value. From left to right and top to bottom, the $i$th inset is associated with the $i$th largest singular value. In each inset, the blue diagonal line is the ground-truth singular value, and the red line is its estimate. The parameters: patch size $\sqrt{m} \times \sqrt{m} = 9 \times 9$ and k = 1.

Figure 2 :
Figure 2: Illustration of the influence of different parameters on denoising performance. The left inset illustrates the dependence of the PSNR results on the parameter k when the patch size is fixed to 7 × 7. The right inset illustrates the dependence of the PSNR results on the patch size when the parameter k is fixed to 4.

Figure 3 :
Figure 3: Sixteen widely used noise-free images associated with the group Γ 1 .

Figure 5 :
Figure 5: Sixteen noise-free biomedical images associated with the group Γ 2 .

Figure 6 :
Figure 6: Visual results for four biomedical images. Row 1 corresponds to noisy images corrupted by noise (σ = 35). Row 2 corresponds to denoised images. Row 3 corresponds to zoomed-in views. Columns correspond to different images.

Figure 7 :
Figure 7: The mean PSNR and SSIM results for fixed noise from four methods.
The proposed OSC method does not work on the whole image at once, but rather on a matrix-set in which each matrix contains a fixed number of similar patches extracted from the original noisy image. A patch is first defined as follows.

Definition 1. Let Y denote an image of size $n_1 \times n_2$ pixels, and let $y = Y(i, j)$ be a reference pixel. A block of size $\sqrt{m} \times \sqrt{m}$ is extracted from Y, with y residing at its top-left corner. By applying vec to the block, the $\sqrt{m} \times \sqrt{m}$ block is identified with a vector in $\mathbb{R}^m$; this vector defines the corresponding patch $\mathbf{p}$.

3.1. Denoising Modeling Data.

Table: W containing the trained parameters $\mathbf{c}_{\text{opt}}$.

Algorithm 2 (fragment): Obtain a new Ŷ(t) ← average the pixels at fixed positions in the image canvas; obtain an estimated image, X̂(t) ← the obtained canvas image entrywise divided by D; (19) end for. Output: the final estimated image Ŷ(3) × 128 + 127.

Algorithm 2: Optimizing-shrinkage-curves-based denoising algorithm.

Table 1 :
Parameters used in training and denoising procedures.