Supervised and Unsupervised Subband Adaptive Denoising Frameworks with Polynomial Threshold Function

Unlike the inflexible structure of the soft and hard threshold functions, a unified linear matrix form with a flexible structure for the threshold function is proposed. Based on this unified linear flexible-structure threshold function, both supervised and unsupervised subband adaptive denoising frameworks are established. To determine the flexible coefficients, a direct mean-square error (MSE) minimization is conducted in supervised denoising, while Stein's unbiased risk estimate (SURE), an estimate of the MSE, is minimized in unsupervised denoising. The SURE rule requires no hypotheses or a priori knowledge about clean signals. Furthermore, we discuss conditions for obtaining optimal coefficients in both supervised and unsupervised subband adaptive denoising frameworks. Applying an Odd-Term Reserving Polynomial (OTRP) function as the concrete threshold function, simulations for polynomial order, denoising performance, and noise effect are conducted. Proper polynomial order and the noise effect are analyzed. Both proposed methods are compared with soft- and hard-based denoising technologies (VisuShrink, SureShrink, MiniMaxShrink, and BayesShrink) in the denoising performance simulation. Results show that the proposed approaches perform better in both the MSE and signal-to-noise ratio (SNR) sense.


Introduction
Signals are usually corrupted by noise in the capturing and transmission stages due to environmental disturbance and device error. Signal denoising has been an important research topic for a long time, and a wide variety of denoising methods have been proposed. Due to their effectiveness and good performance, wavelet threshold methods have become a powerful tool for denoising problems since the fundamental works of Donoho and several others. The main purpose of these methods is to estimate a wide class of functions in some smoothness spaces from their corrupted versions [1]. The different energy distribution properties of smoothness-space functions and of noise make these methods effective: the energy of a function in a smoothness space is often concentrated on a few coefficients, while noise is spread over all coefficients.
Of the various wavelet threshold schemes, soft- and hard-based threshold methods are the most popular and have been theoretically verified by Donoho and Johnstone [2]. They gave a near-optimal threshold value $\lambda = \sigma\sqrt{2\log N}$ in the minimax sense, where the threshold is a function of the noise standard deviation $\sigma$ and the number of samples $N$. This universal thresholding method is known as VisuShrink [3]. In the MSE sense, better threshold methods have been proposed, including SureShrink [4] and BayesShrink [5]. SureShrink obtains the threshold by minimizing Stein's unbiased risk estimate [6]. Assuming a generalized Gaussian distribution for wavelet coefficients, BayesShrink obtains the threshold through a Bayesian framework. Different a priori assumptions about the statistical distribution of wavelet coefficients, such as the Gaussian scale mixture [7], mixture of Laplacians [8], and alpha-stable [9] models, have also been proposed, but the dependency on a statistical distribution restricts their flexibility.
Soft- and hard-based threshold schemes suffer from their own flaws due to inherent defects of the soft and hard threshold functions. For soft threshold schemes, systematically biased estimation can occur, while hard threshold schemes are less biased but less sensitive to small perturbations in the data. In addition, a more important drawback is that the soft and hard threshold functions do not have continuous derivatives. Various improvements have been proposed by exploring new threshold functions [1, 10-17], but the nonnegative-garrote-like functions [10-14] are still not differentiable. Zhang [1, 15], Nasri and Nezamabadi-pour [16], and Wu et al. [17], respectively, proposed series of threshold functions with adjustable parameters. Their differentiability makes these threshold functions suitable for gradient-based minimization problems.

Figure 1: A three-level wavelet decomposition (each color represents a level). There are three detail subbands, each with length $N_j$ ($1 \le j \le 3$), and one approximation subband with length $N_3$.
Most of the methods mentioned above depend on a single threshold parameter, which makes them very sensitive to the threshold value and lacking in freedom. More flexible and convenient strategies have been proposed by Luisier et al. [18] and Smith et al. [19]. Both employ the idea of linear combination: the threshold function is formed as a linear combination governed by a set of parameters. The parameters are determined by minimizing an MSE problem. Compared to gradient-based minimization, this minimization is linear and easy to solve. In this paper, we adopt the linear combination idea and propose a unified matrix form with a flexible structure for the threshold function. Based on this unified flexible-structure threshold function, both supervised and unsupervised subband adaptive denoising frameworks are established. In supervised denoising, a direct MSE minimization is conducted, while in unsupervised denoising we apply Stein's unbiased risk estimate (SURE) as the MSE estimate. The SURE rule requires no hypotheses or a priori knowledge about clean signals. Furthermore, we discuss conditions for the optimal solution in both frameworks.
Our contributions can be summarized as follows: (1) a unified linear matrix form with a flexible structure for the threshold function and a concrete OTRP function are proposed; (2) both supervised and unsupervised denoising frameworks are established; (3) conditions guaranteeing an optimal solution for the minimization problems are discussed and provided.
The paper is organized as follows. In Section 2, both supervised and unsupervised denoising frameworks are introduced. In Section 3, the frameworks employing the proposed OTRP function are compared with soft- and hard-based denoising methods, and comparison results are presented. Conclusions are drawn in the final section.

Problem Settings and Denoising Scheme.
In the time domain, it is assumed that the clean signal x is additively corrupted by noise e to produce the noisy signal y in the form of

$$y = x + e, \quad (1)$$

where $x = \{x_i\}_{i=1}^{N}$, $y = \{y_i\}_{i=1}^{N}$, and $e = \{e_i\}_{i=1}^{N}$ are discrete time series. Only additive Gaussian white noise (AGWN) is considered, with zero mean and variance $\sigma^2$; that is, $e_i \sim \mathcal{N}(0, \sigma^2)$.
We consider only the orthonormal wavelet transform (OWT), which preserves energy. In the OWT domain, AGWN remains Gaussian. The relationship between the wavelet coefficients in vector form is

$$Y = X + E, \quad (2)$$

where $Y = Wy$, $X = Wx$, and $E = We$, with $W$ the discrete wavelet transform (DWT) matrix. Each vector $V$ ($V = Y, X, E$) is stacked from both detail and approximation wavelet coefficients, as shown in Figure 1. Each detail band is a subband. Let $V_j$ ($1 \le j \le J$) denote the $j$th detail subband, with vector length $N_j$; $J$ is the total number of detail subbands and also the decomposition level, and the relationship $N_j = N/2^j$ holds. We measure the quality of denoising by the mean-square error (MSE), defined as

$$\mathrm{MSE}(X, \hat{X}) = \frac{1}{N} E\{\|\hat{X} - X\|^2\}, \quad (3)$$

where $E(\cdot)$ is the expectation operator. An estimate of the MSE is given by

$$\varepsilon(X, \hat{X}) = \frac{1}{N} \|\hat{X} - X\|^2. \quad (4)$$

With the OWT, the energy conservation property guarantees that the total MSE is a weighted sum of the MSE in each individual subband:

$$\mathrm{MSE} = \sum_j w_j\, \mathrm{MSE}(X_j, \hat{X}_j), \quad (5)$$

where $w_j$ is the weight of the corresponding subband. Minimizing each individual subband MSE therefore minimizes the total MSE. In the rest of the paper, we drop the subband index $j$ and use the unified notation $\mathrm{MSE}(X, \hat{X})$ for the MSE in any subband.

Wavelet domain denoising generally follows a three-stage procedure. First, a DWT is performed on the noisy signal. Then a threshold function is applied to the wavelet domain coefficients. Finally, the denoised signal is obtained through an inverse DWT (IDWT). We adopt this three-stage procedure for the proposed denoising methods. Each subband is adaptively denoised by the supervised or unsupervised method as shown in Figure 2, so our approaches work in a subband adaptive manner. There are two input signals in Figure 2: d represents the desired signal for supervised denoising, and y denotes the noisy signal. The corresponding uppercase letters of d and y denote their subbands. Both methods output an estimate of the clean signal x.
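To make the three-stage procedure concrete, here is a minimal sketch in Python. The one-level Haar transform, the placeholder `threshold` argument, and the helper names are our illustrative stand-ins, not the paper's code or its "sym8" wavelet.

```python
# Minimal sketch of the three-stage wavelet denoising procedure:
# DWT -> subband-wise thresholding -> inverse DWT. A one-level
# orthonormal Haar transform stands in for the paper's OWT.
import math

def haar_dwt(y):
    """One-level orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [s * (y[i] + y[i + 1]) for i in range(0, len(y), 2)]
    detail = [s * (y[i] - y[i + 1]) for i in range(0, len(y), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = 1 / math.sqrt(2)
    y = []
    for a, d in zip(approx, detail):
        y.extend([s * (a + d), s * (a - d)])
    return y

def mse(u, v):
    """Empirical MSE estimate (1/N) * ||u - v||^2."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / len(u)

def denoise(y, threshold):
    """Three-stage procedure: transform, threshold the detail subband, invert."""
    approx, detail = haar_dwt(y)
    return haar_idwt(approx, [threshold(c) for c in detail])
```

Because the transform is orthonormal, applying the identity "threshold" reconstructs the input exactly, which is a quick check of the energy-conservation property used above.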

Proposed Threshold Function.
The general problem can be formulated as the construction of a threshold function that minimizes the MSE. Employing the linear combination idea, an estimate of the clean signal can be represented as a linear combination of atoms, where the atoms are the columns of a matrix. With the wavelet transform involved, an estimate of the clean wavelet coefficients can be represented as a linear combination of atoms determined by the noisy wavelet coefficients. A general threshold function can be written as

$$\hat{X} = \Psi a, \quad (6)$$

where $\hat{X}$ is the estimate of the clean wavelet coefficients, $\Psi = [\psi_1(Y)\ \psi_2(Y)\ \cdots\ \psi_P(Y)]$ is the atom matrix built from the noisy coefficients $Y$, and $a = (a_1\ a_2\ \cdots\ a_P)^T$ holds the flexible coefficients. So $\hat{X}$ is a weighted sum of atoms and can be denoted as

$$\hat{X} = \sum_{k=1}^{P} a_k \psi_k(Y). \quad (7)$$

In this paper, the adopted concrete threshold function is the OTRP function of order $P$, denoted as

$$\hat{X}_i = \sum_{k=1}^{P} a_k Y_i^{2k-1}, \quad 1 \le i \le N. \quad (8)$$

This polynomial function can fit almost any curve as the polynomial order varies. Its most important advantages over the soft and hard threshold functions are its differentiability and its linear flexible structure.
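As a concrete illustration of this linear structure, the following sketch (our notation; function names are ours) builds the OTRP atom matrix from elementwise odd powers of the noisy coefficients and evaluates the resulting pointwise polynomial.

```python
# Illustrative sketch of the OTRP structure: for order P, atom k of
# the matrix Psi is the elementwise odd power Y^(2k-1) of the noisy
# subband coefficients, so x_hat = Psi @ a is a pointwise odd
# polynomial in each coefficient.

def otrp_matrix(Y, P):
    """Row i of Psi is [Y_i^1, Y_i^3, ..., Y_i^(2P-1)]."""
    return [[y ** (2 * k - 1) for k in range(1, P + 1)] for y in Y]

def otrp_apply(Y, a):
    """Evaluate x_hat_i = sum_k a_k * Y_i^(2k-1) for coefficients a."""
    return [sum(ak * y ** (2 * k + 1) for k, ak in enumerate(a)) for y in Y]
```

With $a = (1, 0, \ldots, 0)^T$ the function reduces to the identity, and the higher odd powers let the curve bend away from it, which is what gives the structure its flexibility.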
Clearly, (7) conforms with (6), with atom $\psi_k(Y)$ being the $k$th column of $\Psi$. The soft and hard threshold functions and a realization of a third-order OTRP function are shown in Figure 3.

Supervised Denoising.
The optimal parameter $a_{\mathrm{opt}}$ can be obtained through $\partial\,\mathrm{MSE}(D, \hat{X})/\partial a = 0$, on the condition that $\Psi^T \Psi$ is positive semidefinite and invertible. These conditions are discussed in Section 2.5. We first derive the optimal solution form as

$$a_{\mathrm{opt}} = \left(E\{\Psi^T \Psi\}\right)^{-1} E\{\Psi^T D\}. \quad (10)$$

As seen from (10), the optimal parameter $a_{\mathrm{opt}}$ is determined by the matrix $\Psi$ and the desired subband D. The estimates of $E\{\Psi^T \Psi\}$ and $E\{\Psi^T D\}$ are obtained via the empirical expectation estimate in (4): since $\varepsilon\{\Psi^T \Psi\} = \Psi^T \Psi / N$ and $\varepsilon\{\Psi^T D\} = \Psi^T D / N$, the factors $1/N$ cancel in $\varepsilon\{\Psi^T \Psi\}^{-1} \varepsilon\{\Psi^T D\}$. So the estimate of $a_{\mathrm{opt}}$ is obtained as

$$\hat{a}_{\mathrm{opt}} = \left(\Psi^T \Psi\right)^{-1} \Psi^T D; \quad (12)$$

then $\hat{X} = \Psi\, \hat{a}_{\mathrm{opt}}$.
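A hedged sketch of the supervised rule $a_{\mathrm{opt}} = (\Psi^T\Psi)^{-1}\Psi^T D$: it is an ordinary least-squares fit of the atom matrix to the desired subband. The small Gauss-Jordan solver below is our stand-in for any linear solver; all names are ours.

```python
# Supervised coefficient determination as least squares:
# a = (Psi^T Psi)^(-1) Psi^T D, with Psi given as a list of rows.

def solve(A, b):
    """Solve A x = b for a small square system by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def supervised_coefficients(Psi, D):
    """Form the normal equations Psi^T Psi a = Psi^T D and solve for a."""
    P = len(Psi[0])
    PtP = [[sum(row[i] * row[j] for row in Psi) for j in range(P)]
           for i in range(P)]
    PtD = [sum(row[i] * d for row, d in zip(Psi, D)) for i in range(P)]
    return solve(PtP, PtD)
```

Because the system is only $P \times P$ (the number of atoms), this minimization is linear and cheap, which is the convenience the linear-combination idea buys over gradient-based threshold tuning.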

Stein's unbiased risk estimate of MSE(Y + h(Y), X) is
The proposed structure of the threshold function does not strictly satisfy the conditions on h(Y) in Theorem 1, because each element of $\hat{X}$ in (6) is an $\mathbb{R} \to \mathbb{R}$ mapping, while each element of h(Y) is an $\mathbb{R}^N \to \mathbb{R}$ mapping. Therefore, a tailored version of SURE for $\hat{X}$ is stated in the following theorem.
Theorem 2. The unbiased estimate of $\mathrm{MSE}(X, \hat{X})$ is

$$\varepsilon = \frac{1}{N}\left(\|\hat{X}\|^2 - 2\hat{X}^T Y + 2\sigma^2\,\nabla\cdot\hat{X} + \|Y\|^2\right) - \sigma^2, \quad (19)$$

where $\nabla\cdot\hat{X} = \sum_{i=1}^{N} \partial\hat{X}_i/\partial Y_i$.

Proof. Combining (2) and (8), the MSE is represented as

$$\mathrm{MSE}(X, \hat{X}) = \frac{1}{N} E\{\|\hat{X}\|^2 - 2\hat{X}^T (Y - E) + \|X\|^2\}.$$

The only unknown part in the above equation is $E\{\hat{X}^T E\}$, while the equality $E\{\hat{X}_i E_i\} = \sigma^2 E\{\partial\hat{X}_i/\partial Y_i\}$, $1 \le i \le N$, proved in [18], ensures that $E\{\hat{X}^T E\}$ can be transformed into a known part. The concrete deduction is

$$E\{\hat{X}^T E\} = \sum_{i=1}^{N} E\{\hat{X}_i E_i\} = \sigma^2 \sum_{i=1}^{N} E\left\{\frac{\partial\hat{X}_i}{\partial Y_i}\right\} = \sigma^2 E\{\nabla\cdot\hat{X}\}.$$

So the MSE can be rewritten as

$$\mathrm{MSE}(X, \hat{X}) = \frac{1}{N} E\{\|\hat{X}\|^2 - 2\hat{X}^T Y + 2\sigma^2\,\nabla\cdot\hat{X} + \|X\|^2\}. \quad (18)$$

Estimating the expectations of $\|\hat{X}\|^2$, $\hat{X}^T Y$, and $\nabla\cdot\hat{X}$ by their sample values, and replacing the unknown $\|X\|^2$ by its unbiased estimate $\|Y\|^2 - N\sigma^2$, yields (19). It is then obvious that (19) is an unbiased estimate of $\mathrm{MSE}(X, \hat{X})$. The proof is complete.
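The unbiased estimate can be evaluated directly. This sketch is our code and assumes the estimate has the standard SURE form $\varepsilon = \frac{1}{N}(\|\hat{X}\|^2 - 2\hat{X}^T Y + 2\sigma^2\,\nabla\cdot\hat{X} + \|Y\|^2) - \sigma^2$, with the divergence term supplied by the caller.

```python
# Sketch (our code) of the unbiased MSE estimate in standard SURE form.
# `div` is the divergence sum_i d x_hat_i / d Y_i, supplied by the caller.

def sure_estimate(x_hat, Y, div, sigma):
    """eps = (1/N)(||x_hat||^2 - 2 x_hat.Y + 2 sigma^2 div + ||Y||^2) - sigma^2"""
    n = len(Y)
    dot = sum(a * b for a, b in zip(x_hat, Y))
    val = (sum(v * v for v in x_hat) - 2 * dot
           + 2 * sigma ** 2 * div + sum(y * y for y in Y))
    return val / n - sigma ** 2
```

As a sanity check, for the identity estimator $\hat{X} = Y$ (whose divergence is $N$), the estimate equals $\sigma^2$ regardless of $Y$, which is indeed the risk of leaving the noisy coefficients untouched.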

Coefficients Determination.
From (6), we first obtain the elementwise derivative of $\hat{X}$ in the form of

$$\frac{\partial \hat{X}_i}{\partial Y_i} = \sum_{k=1}^{P} a_k \psi_k'(Y_i),$$

so the formula for $\nabla\cdot\hat{X}$ can be deduced as

$$\nabla\cdot\hat{X} = \sum_{i=1}^{N} \frac{\partial \hat{X}_i}{\partial Y_i} = c^T a, \qquad c_k = \sum_{i=1}^{N} \psi_k'(Y_i). \quad (26)$$

Combining (6), (18), and (26), $\mathrm{MSE}(X, \hat{X})$ can be rewritten as

$$\mathrm{MSE}(X, \hat{X}) = \frac{1}{N} E\{a^T \Psi^T \Psi a - 2 a^T \Psi^T Y + 2\sigma^2 c^T a + \|X\|^2\}.$$

$a_{\mathrm{opt}}$ is obtained through $\partial\,\mathrm{MSE}(X, \hat{X})/\partial a = 0$ on the condition that $\Psi^T \Psi$ is positive semidefinite and invertible. These conditions, the same as in supervised denoising, are discussed in Section 2.5. Here, we first derive the optimal solution form as

$$a_{\mathrm{opt}} = \left(E\{\Psi^T \Psi\}\right)^{-1} E\{\Psi^T Y - \sigma^2 c\}. \quad (29)$$

An estimate of $a_{\mathrm{opt}}$ is obtained via Theorem 2. From (6), the pointwise derivative of the OTRP function can be denoted as

$$\frac{\partial \hat{X}_i}{\partial Y_i} = \sum_{k=1}^{P} a_k (2k-1) Y_i^{2k-2},$$

so that $c_k = (2k-1) \sum_{i=1}^{N} Y_i^{2k-2} = N(2k-1)\,\overline{Y^{2k-2}}$. Then (19) can be rewritten with these terms, and the result is found by setting its derivative over a to zero; positive semidefiniteness and invertibility must still be satisfied for $\Psi^T \Psi$. The estimate of the optimal solution is achieved as

$$\hat{a}_{\mathrm{opt}} = \left(\Psi^T \Psi\right)^{-1}\left(\Psi^T Y - \sigma^2 c\right), \quad (33)$$

where $\overline{Y^{2k-2}}$ is the mean value of $Y_i^{2k-2}$ in the corresponding subband.

Figure 4: Clean "blocks," "quadchirp," "multitone," and "multiband" signals, each of length 8192, and an audio signal of length 65536.
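For the simplest first-order case $P = 1$ (a single atom $\psi_1(Y) = Y$, so $\hat{X} = a_1 Y$), the unsupervised solution collapses to a scalar with a closed form. The following sketch is our reduction under that assumption; the divergence term contributes $N$, giving a James-Stein-like shrinkage factor.

```python
# Our P = 1 reduction of the unsupervised SURE rule: with Psi = Y,
# Psi^T Psi = ||Y||^2, Psi^T Y = ||Y||^2, and the divergence term
# contributes c = N, so a1 = (||Y||^2 - N sigma^2) / ||Y||^2.

def sure_shrink_order1(Y, sigma):
    """Scalar shrinkage factor minimizing SURE for x_hat = a1 * Y."""
    energy = sum(y * y for y in Y)
    return (energy - sigma ** 2 * len(Y)) / energy
```

When the subband energy is dominated by noise ($\|Y\|^2 \approx N\sigma^2$), the factor approaches 0 and the subband is suppressed; when the signal dominates, it approaches 1 and the coefficients pass through.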

Optimal Solution Guarantee Discussion.
Optimal solution assurance conditions for both supervised and unsupervised denoising are discussed in this section. To guarantee an optimal solution for a by setting $\partial\,\mathrm{MSE}(X, \hat{X})/\partial a$ to zero, the matrix $\Psi^T \Psi$ must be positive semidefinite. By definition, a matrix A is positive semidefinite when $x^T A x \ge 0$ for any vector x. It is easy to verify positive semidefiniteness of $\Psi^T \Psi$: for any vector x, there always holds

$$x^T \Psi^T \Psi x = \|\Psi x\|^2 \ge 0.$$

Thus, $\Psi^T \Psi$ is always positive semidefinite. This property guarantees that $\mathrm{MSE}(X, \hat{X})$ is convex, while invertibility of $\Psi^T \Psi$ yields the optimal solutions of a in the forms of (10), (12) and (29), (33). $\Psi^T \Psi$ must be full rank to be invertible, so $\Psi$ must have full row or column rank because $\mathrm{rank}(\Psi) = \mathrm{rank}(\Psi^T \Psi)$. In practice, we do not use many atoms to form the matrix $\Psi$, and the number of noisy coefficients $N$ in a subband is much greater than the atom number $P$ (i.e., $N \gg P$). In other words, with its row count far greater than its column count, $\Psi$ is a very "thin" matrix. Thus, $\Psi$ tends to have full column rank and $\Psi^T \Psi$ tends to be full rank.

Experiment Settings.
Standard experiment settings for the following simulations are given here. The signals adopted in the simulations are "blocks," "quadchirp," "multitone," and "multiband" with AGWN. To demonstrate the superiority of the proposed methods, they are also tested on a real audio signal. All clean signals are shown in Figure 4; each of the first four signals has length 8192, and the audio signal has length 65536. The adopted orthogonal wavelet is "sym8" and the decomposition level is $J = 5$, so the $j$th detail subband has vector length $N/2^j$ ($1 \le j \le 5$), where $N$ is 8192 or 65536.
In supervised denoising, the coefficient vector a is determined by the matrix $\Psi$ and the desired subband D in each individual subband. Desired subbands come from the desired signal via the OWT. The desired signal can be clean or noisy. In practice, one may only have access to a noisy signal with known noise level; we therefore adopt it as the desired signal to determine the coefficients a, and then use a to derive the denoised version of a noisy signal with unknown noise level. Note that the desired signal and the noisy signal are the same clean signal corrupted by known and unknown noise levels, respectively. In unsupervised denoising, the coefficient vector a is determined by the matrix $\Psi$, the noisy subband Y, and the noise variance. There is no requirement for a clean or desired signal. For simplicity, the noise variance is estimated by the median absolute deviation (MAD) rule.
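The MAD noise estimate mentioned above is standard and can be sketched as follows; it is computed on the finest-scale detail coefficients, and the constant 0.6745 is the 0.75-quantile of the standard normal distribution.

```python
# Standard MAD estimate of the noise standard deviation:
# sigma_hat = median(|d_i|) / 0.6745, computed on the finest detail subband.
import statistics

def mad_sigma(finest_detail):
    """Robust noise-level estimate from finest-scale wavelet coefficients."""
    return statistics.median(abs(d) for d in finest_detail) / 0.6745
```

The median makes the estimate robust: a few large signal-bearing coefficients in the finest subband barely affect it, unlike a sample standard deviation.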
Applying the proposed OTRP function as the concrete threshold function, the supervised and unsupervised denoising methods are compared with several other available techniques, namely VisuShrink, SureShrink, MiniMaxShrink, and BayesShrink. WaveLab 850 is used to run the other four techniques; their noise variance estimates are also provided by MAD. Denoising quality is measured by both MSE and SNR, defined as

$$\mathrm{MSE} = \frac{1}{N}\|\hat{x} - x\|^2, \qquad \mathrm{SNR} = 10\log_{10}\frac{\|x\|^2}{\|\hat{x} - x\|^2}. \quad (36)$$
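The two quality measures can be computed directly; this sketch assumes the conventional definitions $\mathrm{MSE} = \frac{1}{N}\|\hat{x} - x\|^2$ and $\mathrm{SNR} = 10\log_{10}(\|x\|^2 / \|\hat{x} - x\|^2)$ in dB.

```python
# Quality measures for comparing a denoised signal x_hat to the clean x.
import math

def mse(x_hat, x):
    """Empirical mean-square error (1/N) * ||x_hat - x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

def snr_db(x_hat, x):
    """Output SNR in dB: 10 * log10(||x||^2 / ||x_hat - x||^2)."""
    err = sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return 10 * math.log10(sum(v * v for v in x) / err)
```

Lower MSE and higher SNR both indicate better denoising; plotting MSE on a log scale, as done later, keeps small differences visible across noise levels.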

Polynomial Order Simulation.
Before carrying out the final denoising, a proper polynomial order needs to be explored. Since the polynomial order equals the number of atoms in the matrix $\Psi$, a proper polynomial order must be selected to guarantee full column rank of $\Psi$ and thus full rank of $\Psi^T \Psi$. Although Section 2.5 argued that the number of subband wavelet coefficients is far greater than the polynomial order, so that $\Psi^T \Psi$ tends to be full rank, simulations are still needed. Whether $\Psi^T \Psi$ is full rank is determined by the polynomial order and the subband wavelet coefficients. Thus, a probability table was built through simulations. The resulting probabilities make up Table 1, where PO denotes polynomial order and SB denotes subband. The probabilities reveal how often $\Psi^T \Psi$ is full rank under different POs for different signals corrupted with different noise levels. Six POs are tested in Table 1. Due to the 5-level wavelet decomposition, there are five detail SBs to process. Each SB corresponds to a $\Psi^T \Psi$, and different POs correspond to different dimensions of $\Psi^T \Psi$; hence there are 30 combinations in the Table 1 header, where each PO contains five SBs. Horizontally, Table 1 reflects how PO influences the matrix full-rank probabilities. Vertically, four signals are tested, each corrupted by a range of noise standard deviations $\sigma$; this reflects how the noise level influences the matrix full-rank probabilities. For example, the italic number in Table 1 means the matrix full-rank probability is 100% for the matrix determined by PO2 and the "blocks" signal's SB3 at noise level $\sigma = 1.6$. All probabilities were obtained by counting matrix full-rank occurrences over 100 trials.
From Table 1, PO1-PO3 always ensure a 100% matrix full-rank probability for the four signals at all tested noise levels. (By 100% matrix full rank for a signal, we mean 100% for every subband of that signal.) PO4 fails only for "blocks," due to a nearly 0% probability for SB5 (bold underlined numbers; bold numbers indicate probabilities under 100%). For PO5 and PO6, there are more bold numbers than for PO4. Horizontally, the trend is that a higher PO gives a lower probability of a full-rank matrix. Vertically, a higher noise level tends to give a lower full-rank probability, but PO is the dominant factor. From this statistical analysis, we restrict our options to PO1-PO3.
Visual comparisons of PO1 to PO3 in supervised denoising are shown in Figure 5. As can be seen, in both the MSE and SNR sense, PO1 is inferior to PO2 and PO3 for all four signals.
PO2 is inferior to PO3 on "blocks" and almost equals PO3 on the three remaining signals. In unsupervised denoising, similar conclusions can be drawn and are not repeated here. In the end, we choose PO2 for the final denoising simulation.

From the global and local views of Figure 6, the supervised denoising method achieves better MSE and SNR quality than the unsupervised denoising method and BayesShrink for all four noisy signals. All three of these methods outperform VisuShrink, SureShrink, and MiniMaxShrink in both MSE and SNR. On noisy "blocks," BayesShrink is slightly better than the unsupervised denoising method when $\sigma$ is greater than 0.9 and poorer when $\sigma$ is less than 0.9, in both the MSE and SNR sense. However, the unsupervised denoising method has better MSE and SNR performance than BayesShrink on noisy "quadchirp," "multitone," and "multiband." Thus, it can be concluded that the proposed supervised and unsupervised denoising methods show their advantages.
Our approaches are also validated on real-world signals. We test the supervised and unsupervised denoising approaches on an AGWN-corrupted real audio signal. In both the MSE and SNR sense, comparisons with $\sigma$ varying in the range 0.1-0.5 are conducted.

3.3. Denoising Performance Comparison.
As mentioned above, comparisons are made among the supervised and unsupervised denoising methods and VisuShrink, SureShrink, MiniMaxShrink, and BayesShrink. In both the MSE and SNR sense, denoising performance comparisons in global and local views with different noise standard deviations $\sigma$ are presented in Figure 6. $\sigma$ varies in the range 0.5-2.0 with step 0.1, and each data point plotted in Figure 6 is a 50-trial average. Note that we adopt a clean signal corrupted by noise at the $\sigma = 0.5$ level as the desired signal for supervised denoising. Moreover, the MSEs in Figure 6 are plotted on a log scale and the SNRs on a linear scale.

Unsupervised Denoising.
In practice, when a desired signal is not available, a direct MSE minimization is impossible, and constructing an estimate of the MSE is a reasonable choice. A practical approach is Stein's unbiased risk estimate (SURE) [6]. This part mainly introduces SURE-based unsupervised denoising.
Theorem 1. Let $Y = (Y_1\ Y_2\ \cdots\ Y_N)^T$ be a normal random vector with mean $X = (X_1\ X_2\ \cdots\ X_N)^T$ and identity covariance matrix. Let $Y + h(Y)$ be an estimate of X, where the function $h: \mathbb{R}^N \to \mathbb{R}^N$ is weakly differentiable.