Incoherent Dictionary Learning Method Based on Unit Norm Tight Frame and Manifold Optimization for Sparse Representation



Introduction
In recent years, research on dictionary learning has attracted considerable attention because a learned dictionary captures intrinsic features of the training samples in many applications such as denoising [1], compressed sensing [2], pattern recognition, and classification tasks [3][4][5]. A learned dictionary allows a signal of interest to be represented as a linear combination of relatively few atoms, so that the representation coefficients are as sparse as possible. Hence, the problem of dictionary learning can be stated as follows [6]:

$$\min_{D \in \mathcal{D},\ X \in \mathcal{X}} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \|x_i\|_0 \le s, \quad i = 1, \dots, N, \tag{1}$$

where Y = {y_i}_{i=1}^N ∈ R^{n×N} is the matrix of training samples, 𝒟 is the admissible set of all column-normalized dictionaries, D = {d_j}_{j=1}^m ∈ R^{n×m} is an overcomplete dictionary (n < m), and each column of D is referred to as an atom. 𝒳 represents the admissible set of all sparse coefficient matrices (i.e., matrices whose entries are mostly zero or sufficiently small in magnitude), X = {x_i}_{i=1}^N ∈ R^{m×N}, and s bounds the number of nonzero entries in each x_i.
Equation (1) is not a convex problem with respect to the pair (D, X), so most dictionary learning methods employ alternating optimization over D and X. The following two stages are repeated until convergence: (1) sparse coding, in which X is updated with D fixed; and (2) dictionary update, in which some or all atoms of D are updated with X fixed.
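This alternating scheme can be sketched in a few lines. The following is a minimal NumPy illustration (the paper's experiments use MATLAB); `omp` implements a basic orthogonal matching pursuit for the sparse coding stage, and the dictionary update is a MOD-style least-squares step — both are standard choices, not the specific variants of any cited method:

```python
import numpy as np

def omp(D, y, s):
    """Greedy sparse coding: select up to s atoms of D to approximate y."""
    residual, support = y.copy(), []
    for _ in range(s):
        corr = np.abs(D.T @ residual)
        corr[support] = 0.0                      # do not re-select atoms
        support.append(int(np.argmax(corr)))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

def learn_dictionary(Y, m, s, iters=10, seed=0):
    """Alternate sparse coding (OMP) with a MOD-style dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], m))
    D /= np.linalg.norm(D, axis=0)               # column-normalized dictionary
    X = np.zeros((m, Y.shape[1]))
    for _ in range(iters):
        X = np.column_stack([omp(D, y, s) for y in Y.T])   # stage 1
        D = Y @ np.linalg.pinv(X)                           # stage 2 (MOD)
        norms = np.linalg.norm(D, axis=0)
        norms[norms == 0] = 1.0                  # guard unused atoms
        D /= norms
    return D, X
```

Note that this baseline ignores the coherence between atoms entirely, which is precisely the gap the methods discussed below address.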
(1) Related Work. Different applications tend to use different optimization algorithms for learning sparsifying dictionaries with the desired characteristics. Traditional dictionary learning methods, such as the method of optimal directions (MOD) [7] and K-means singular value decomposition (K-SVD) [8], aim at optimizing a dictionary to represent all training samples sparsely, but the coherence between atoms is ignored. However, many studies of compressed sensing focus on the mutual coherence of an effective dictionary (the product of a sensing matrix and a dictionary) [9][10][11][12], which is a key factor in controlling the support of solutions of ℓ1-penalized least-squares and greedy problems. Furthermore, highly incoherent dictionaries tend to avoid ambiguity and improve noise stability when sparse coding is enforced. Therefore, an incoherent frame is typically applied to optimize the sensing matrix in compressed sensing. Tsiligianni et al. [13] constructed an incoherent frame to optimize a sensing matrix by averaged projections onto a Gram matrix and obtained better sparse signal recovery performance. Rusu and González-Prelcic [14] directly optimized the maximum inner product between pairs of atoms to construct incoherent frames using convex optimization. Comparatively little work, however, has focused on learning an incoherent dictionary for sparse representation.
Much research has concentrated on reducing the coherence of learned dictionaries. Yaghoobi et al. [15, 16] introduced the optimization of dictionary coherence by imposing a minimal coherence constraint to design a parametric dictionary for a relatively well-known signal model. A penalty term on the coherence is added to the dictionary learning objective; therefore, (1) can be reformulated as follows [17][18][19][20]:

$$\min_{D \in \mathcal{D},\ X \in \mathcal{X}} \|Y - DX\|_F^2 + \lambda \sum_{i \ne j} |\langle d_i, d_j \rangle|^2, \tag{4}$$

where λ balances the representation error against the coherence penalty. Inspired by MOD [7], Ramirez et al. [17] proposed the method of optimal coherence-constrained directions (MOCOD) to learn a dictionary. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm was used to co-optimize the coherence between atoms and the performance of sparse representation [18]. Abolghasemi et al. [19] constructed an incoherent dictionary by using steepest-gradient descent in the iterations of K-SVD. Bao et al. [20] proposed a hybrid alternating proximal algorithm for incoherent dictionary learning, accompanied by a convergence analysis and proof. The abovementioned methods cannot reduce the coherence markedly, because the sum of the squared inner products of all atom pairs is minimized in the second term of (4). Meanwhile, a learned dictionary cannot attain an arbitrary target coherence. Mailhé et al.
[21] proposed an incoherent K-SVD (INKSVD) algorithm, in which each pair of atoms whose coherence exceeds the target value is decorrelated in the dictionary update after K-SVD is performed. The key idea is to cluster atoms and symmetrically decrease the correlation of each pair of atoms based on a greedy method. The main drawback is that if the target coherence is set too low, the method performs poorly on sparse representation (Figure 5), and the computation rises dramatically (Table 2). Barchiesi and Plumbley [22] proposed an incoherent dictionary learning method that enforces iterative projection (IP) onto the spectral and structural constraint sets in order to obtain the optimal Gram matrix. Dictionary optimization was then performed based on the orthogonal Procrustes problem (OPP) for better sparse representation performance. More recently, Rusu and González-Prelcic [14] directly constructed an incoherent dictionary, followed by orthogonal constraints as in [22], for sparse representation. Similar work on dictionary optimization has been done in [14, 22]. We report only results obtained using the methods from [21, 22], as they seem to provide the best performance in terms of incoherent dictionaries and sparse representation. Note that obtaining an arbitrarily low coherence is a difficult task, and the methods from the literature [21, 22] do not approximate the flat spectrum of an equiangular tight frame (ETF) (see Section 4). Additionally, our proposed methods improve on [21, 22]. Manopt is employed to solve the orthogonal Procrustes problem in the dictionary optimization, because only reducing the coherence of a learned dictionary degrades the performance of sparse representation; this is very different from the work done in [14, 21, 22] and is the major contribution of our work.
(2) Our Contributions.There are three specific characteristics of our proposed incoherent dictionary learning methods that distinguish them from prior methods.
(1) Rather than following general dictionary learning methods, an efficient framework based on a unit norm tight frame (UNTF) is developed for solving the incoherent dictionary learning problem, which constrains the dictionary to approximate an ETF.
(2) The mutual coherence of the dictionary is reduced by alternately restricting the tightness and the coherence, which gives a significantly lower coherence than those reported in [21, 22].
(3) We use manifold optimization (Manopt) to solve the optimization problem with orthogonal constraints, that is, (14), which aims to obtain better performance from incoherent dictionaries and sparse representation. Experiments are carried out on synthetic data and real audio data to illustrate the better performance of our proposed methods.
(3) Organization of Paper. The rest of this paper is organized as follows. Section 2 gives the definitions of mutual coherence and ETFs, after which our proposed algorithms are presented in Section 3. Section 4 gives the details of dictionary optimization employing Manopt. Section 5 reports on extensive experiments carried out on synthetic data and real audio data. Finally, conclusions are drawn in Section 6.

Incoherent Dictionary
2.1. The Mutual Coherence. The mutual coherence of a dictionary is defined as the maximum absolute inner product between distinct normalized atoms [23]:

$$\mu(D) = \max_{i \ne j} \frac{|\langle d_i, d_j \rangle|}{\|d_i\|_2 \|d_j\|_2}, \tag{5}$$

where d_i and d_j denote two different atoms.
The coherence measures the similarity between atoms. We have μ(D) ∈ (0, 1), and a dictionary is considered incoherent if μ(D) is small. Mutual coherence also provides an important sufficient condition guaranteeing exact sparse signal recovery.
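For unit-norm atoms, μ(D) is simply the largest off-diagonal entry of |D^T D|. A small NumPy check of this definition (our own helper, not code from the paper):

```python
import numpy as np

def mutual_coherence(D):
    """mu(D): maximum |<d_i, d_j>| over distinct normalized atoms."""
    D = D / np.linalg.norm(D, axis=0)   # normalize each atom
    G = np.abs(D.T @ D)                 # absolute Gram matrix
    np.fill_diagonal(G, 0.0)            # ignore self inner products
    return float(G.max())
```

For example, an orthonormal basis has coherence 0, while a dictionary containing two parallel atoms has coherence 1.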
Theorem 1 (see [9, 10]). Let D ∈ R^{n×m} be an overcomplete dictionary with mutual coherence μ(D), and suppose the condition

$$s < \frac{1}{2}\left(1 + \frac{1}{\mu(D)}\right) \tag{6}$$

is satisfied, where s is the number of nonzero entries in x. Then, for the system y = Dx, the vector x can be recovered using basis pursuit (BP) and orthogonal matching pursuit (OMP).

Theorem 1 shows that an incoherent dictionary is desirable; the best one can hope for is that the mutual coherence reaches the Welch bound.
Theorem 2 (see [25]). Consider an overcomplete dictionary D ∈ R^{n×m} with normalized columns. The coherence satisfies

$$\mu(D) \ge \sqrt{\frac{m - n}{n(m - 1)}}. \tag{7}$$

The bound is achieved if and only if the matrix D is an equiangular tight frame (ETF).
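The bound in Theorem 2 is easy to evaluate; for the n = 20, m = 50 and n = 64, m = 256 settings used later in the experiments, it gives roughly 0.1750 and 0.1085, matching the values quoted in Section 5 (a small helper of ours):

```python
import numpy as np

def welch_bound(n, m):
    """Welch lower bound on the coherence of an n x m dictionary (m > n)."""
    return float(np.sqrt((m - n) / (n * (m - 1))))
```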
Therefore, optimizing a dictionary to approximate ETF is an effective method to reduce the coherence in sparse representation.
An ETF satisfies three properties: (1) each column has unit norm, ‖d_j‖_2 = 1; (2) the columns are equiangular, that is, |⟨d_i, d_j⟩| is the same constant for all i ≠ j; (3) the columns form a tight frame, that is, DD^T = (m/n)I_n, where I_n is the n × n identity matrix. It follows that μ(D) attains the lowest possible coherence, the matrix D has full row rank, and its n nonzero singular values are all equal to √(m/n).

Our Proposed Incoherent Dictionary Learning
Frames are an overcomplete version of a basis, and tight frames are an overcomplete version of an orthogonal basis. ETFs generalize the geometric properties of an orthogonal basis [26]. However, ETFs are difficult to construct.
In particular, Tropp et al. [27] have demonstrated that the α-tight frame is the closest design, in the Frobenius-norm sense, to the solution of the relaxed problem.
Theorem 4 (see [27]). Given a matrix D ∈ R^{n×m} with n < m, suppose that it has singular value decomposition D = UΣV^T. Then the matrix UV^T is called the orthogonal polar factor. With regard to the Frobenius norm, αUV^T is the closest α-tight frame to the matrix D, and it can also be obtained by computing α(DD^T)^{−1/2}D.
We call a given α-tight frame a UNTF if all of its columns satisfy ‖d_j‖_2 = 1, in which case α = √(m/n). The UNTF is employed in our proposed methods because it is the closest tight frame, in the Frobenius-norm sense, to the computed low-coherence dictionary.
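Theorem 4 suggests a direct projection step: compute the polar factor from an SVD, scale by α = √(m/n), and (for a UNTF) renormalize the columns. A NumPy sketch of this projection (our own helper; the paper's implementation is in MATLAB):

```python
import numpy as np

def nearest_tight_frame(D, unit_norm=True):
    """Closest alpha-tight frame to D in Frobenius norm: alpha * U V^T,
    equivalently alpha * (D D^T)^(-1/2) D, with alpha = sqrt(m / n)."""
    n, m = D.shape
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    T = np.sqrt(m / n) * (U @ Vt)            # scaled orthogonal polar factor
    if unit_norm:
        T = T / np.linalg.norm(T, axis=0)    # unit norm tight frame step
    return T
```

Note that renormalizing the columns slightly perturbs exact tightness, which is one reason the proposed algorithms alternate between the tightness and coherence constraints.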

Our Proposed Incoherent Dictionary Learning Algorithms.
To constrain the coherence between atoms, (3) can be reformulated as

$$\underset{D \in \mathcal{D},\ X \in \mathcal{X}}{\arg\min}\ \|Y - DX\|_F^2 \quad \text{s.t.} \quad \max_{i \ne j} |\langle d_i, d_j \rangle| \le \mu_0, \tag{9}$$

where μ0 is the target coherence. Next, we modify the INKSVD [21] and IP [22] algorithms according to Theorem 4, in the expectation that the new dictionary will be proximal to an ETF. Following these modifications, the proposed algorithms are named UNTF-INKSVD and UNTF-IP, respectively, in order to distinguish our framework from the prior work.
In the first algorithm, the coherence of an initial dictionary is reduced sequentially by finding a new dictionary that has a lower coherence and is nearest to the previous one. Accordingly, we modify the objective function based on Theorem 4:

$$\underset{D}{\arg\min}\ \|D - \Phi\|_F^2 \quad \text{s.t.} \quad \max_{i \ne j} |\langle d_i, d_j \rangle| \le \mu_0, \tag{10}$$

where Φ is the reference UNTF of the previous dictionary. Equation (10) can be solved via local convex problems using the convex-optimization toolbox CVX (http://cvxr.com/cvx/doc/CVX.pdf). The proposed algorithm is called UNTF-INKSVD and is summarized in Algorithm 1. Firstly, we take the normalized Φ_0 as the initial UNTF; then (10) is used to seek a new dictionary with lower coherence that is proximal to the reference UNTF of the previous one. That is, Φ_{k−1} can be viewed as the reference UNTF in the kth iteration. Lastly, we project the new dictionary onto the UNTF manifold via √(m/n)(DD^T)^{−1/2}D, obtaining an incoherent tight frame. Thus, coherence reduction and projection onto the UNTF manifold are performed alternately in the iterative dictionary update, yielding tightness and lower coherence between atoms.

The Improvement of the IP Algorithm.
The off-diagonal entries g_{ij} of the Gram matrix G = D^T D represent the coherence between atoms, so another technique for reducing coherence is to operate on the entries of the Gram matrix so that they satisfy the following property:

$$|g_{ij}| \le \mu_0, \quad i \ne j, \tag{11}$$

where μ0 is the target coherence. Barchiesi and Plumbley [22] proposed iterative projections (IP) onto the Gram matrix to reduce the correlation between atoms. Shrinkage is performed on the off-diagonal entries of the Gram matrix based on the following function:

$$\hat{g}_{ij} = \begin{cases} g_{ij}, & |g_{ij}| \le \mu_0, \\ \mu_0 \operatorname{sign}(g_{ij}), & |g_{ij}| > \mu_0. \end{cases} \tag{12}$$

Unfortunately, the rank of the shrunken Gram matrix may be greater than n. Therefore, an SVD is used to keep the best rank-n approximation. The decomposition can then be used to extract the square root of the new Gram matrix, thus obtaining the updated dictionary D.
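One IP-style iteration on the Gram matrix can be sketched as follows: clip the off-diagonal entries to the target coherence, restore rank n, and extract a square-root factor. This is a hedged NumPy illustration of the operations described above (the function name and the eigendecomposition details are ours, not the reference implementation):

```python
import numpy as np

def shrink_gram(D, mu0):
    """Clip off-diagonal Gram entries to [-mu0, mu0], restore rank n,
    and return a unit-norm dictionary factor of the shrunken Gram matrix."""
    n = D.shape[0]
    G = D.T @ D
    H = np.clip(G, -mu0, mu0)           # shrink coherent atom pairs, cf. (12)
    np.fill_diagonal(H, 1.0)            # unit-norm atoms: diagonal stays 1
    w, V = np.linalg.eigh(H)            # eigenvalues in ascending order
    w[:-n] = 0.0                        # best rank-n approximation
    w = np.maximum(w, 0.0)              # keep the square root real
    Dnew = np.diag(np.sqrt(w[-n:])) @ V[:, -n:].T   # square root of Gram
    norms = np.linalg.norm(Dnew, axis=0)
    norms[norms == 0] = 1.0
    return Dnew / norms
```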
In the second algorithm, the coherence of the initial Gram matrix decreases sequentially upon finding a new Gram matrix that has a lower coherence and is nearest to the previous one:

$$\underset{G}{\arg\min}\ \|G - G_k\|_F^2 \quad \text{s.t.} \quad \max_{i \ne j} |g_{ij}| \le \mu_0, \tag{13}$$

where G_k = Φ_{k−1}^T Φ_{k−1} is called the reference Gram matrix. The core methodology is to operate on the reference Gram matrix G_k instead of D^T D. Our modified algorithm is referred to as UNTF-IP, and the optimization process is described as Algorithm 2.

Algorithm 2. Input: initial dictionary D, μ0, iterations; Output: D_opt.
(1) Initialize k = 1;
(2) [U, Σ, V] = SVD(D);
(3) Φ_0 = UV^T;
(4) normalize the columns of Φ_0;
(5) while k ≤ iterations do
(6) compute the reference Gram matrix G_k = Φ_{k−1}^T Φ_{k−1};
(7) apply (12) to G_k to decrease the coherence;
(8) apply SVD to G_k to obtain the best approximation whose rank equals n;
(9) build the square root of the rank-n Gram matrix to obtain the new dictionary;
Firstly, the closest tight frame Φ_0 is obtained. Normalization is then executed, after which the Gram matrix G_1 = Φ_0^T Φ_0 is computed. In the kth iteration, Φ_{k−1} can be viewed as the best coherence-reduced version of the current dictionary, by employing Theorem 4. The shrinkage operation (12) is performed, and an SVD enforces rank n. The updated Gram matrix is then decomposed to obtain the new dictionary. Lastly, we project the new dictionary onto the UNTF manifold via √(m/n)(DD^T)^{−1/2}D, obtaining the next reference UNTF. Consequently, we obtain an effectively tighter and lower-coherence dictionary than those obtained with the IP algorithm [22].

Dictionary Optimization with Manopt
Only reducing the coherence in (10) and (13) will result in poor sparse representation performance. Hence, after (10) and (13) are solved, we add a dictionary optimization step to maintain good sparse representation performance based on the OPP. Equation (1) can be formulated equivalently as an orthogonality-constrained minimization as follows:

$$\underset{W}{\arg\min}\ \|Y - WDX\|_F^2 \quad \text{s.t.} \quad W^T W = I. \tag{14}$$

It is clear that (WD)^T(WD) = D^T W^T WD = D^T D. So the dictionary optimization in (14) has two advantages: (I) good representation performance can be obtained; (II) the incoherence remains unchanged.
In [14, 22], dictionary rotation (DR) is employed to solve (14), but it is performed within the iterative dictionary updates of (10) and (13). As demonstrated in [28], Manopt provides efficient algorithms for finding an optimal solution of the OPP. In the next section, we introduce an optimization framework based on Manopt.
Let f(W) = (1/2)‖Y − WDX‖_F^2. We consider the orthogonal constraint as a Riemannian submanifold of R^{n×n}. Hence the purpose of manifold optimization is to find an optimal solution W of the following model:

$$\min_{W \in \mathcal{M}} f(W), \tag{15}$$

where the search space M is a Riemannian manifold that can be linearized locally at each point W as a tangent space T_W M.
The inner problem at the current iterate W_k ∈ M is defined as follows:

$$\min_{\xi \in T_{W_k}\mathcal{M}}\ f(W_k) + \langle \operatorname{grad} f(W_k), \xi \rangle + \frac{1}{2} \langle \operatorname{Hess} f(W_k)[\xi], \xi \rangle \quad \text{s.t.} \quad \|\xi\| \le \Delta_k, \tag{16}$$

where grad f(W_k) and Hess f(W_k) are the Riemannian gradient and the Hessian of the cost function at W_k, respectively. The Riemannian gradient of f at W is defined as follows:

$$\operatorname{grad} f(W) = W \operatorname{skew}(W^T \nabla f(W)), \tag{17}$$

where ∇f(W) = (WDX − Y)(DX)^T is the Euclidean gradient of f as a function on R^{n×n} and skew(A) = (A − A^T)/2. Correspondingly, the Riemannian Hessian of f at W along ξ is

$$\operatorname{Hess} f(W)[\xi] = W \operatorname{skew}(W^T \nabla^2 f(W)[\xi]) - \xi \operatorname{sym}(W^T \nabla f(W)), \tag{18}$$

where ∇²f(W)[ξ] = ξ(DX)(DX)^T is the Euclidean Hessian of f along ξ as a function on R^{n×n} and sym(A) = (A + A^T)/2.
Next, a tangent direction η_k is calculated by inner iterations with the Steihaug-Toint truncated conjugate gradient (tCG) method [29]; a candidate next iterate is then produced by

$$W_k^+ = R_{W_k}(\eta_k). \tag{19}$$

The term R_{W_k} is a retraction on the manifold M and describes the mapping between the tangent space T_{W_k}M and M at the point W_k. A simple mapping is selected, R_{W_k}(η_k) = W_k + W_k η_k, after which W_k + W_k η_k is orthogonalized. The decision whether to accept or discard the candidate, and the update of the trust-region radius, are based on the quotient

$$\rho_k = \frac{f(W_k) - f(R_{W_k}(\eta_k))}{m_k(0) - m_k(\eta_k)}, \tag{20}$$

where m_k denotes the inner model in (16). We optimize W using the Manopt toolbox [29] while D and X are fixed. Algorithm 3 presents the procedure for this optimization. Afterwards, the optimal dictionary is obtained as D_opt = WD. Better sparse representation performance can be achieved while the dictionary coherence remains unaffected. Furthermore, this optimization leads to a faster algorithm because it can be performed after the dictionary update process, in contrast to [22].
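For reference, the OPP in (14) also admits the classical closed-form solution via a single SVD (W = UV^T, where Y(DX)^T = UΣV^T); the paper instead solves it iteratively with Manopt's trust-region method. A NumPy sketch of the closed-form variant (our own helper), useful for checking that rotation leaves the Gram matrix, and hence the coherence, unchanged:

```python
import numpy as np

def procrustes_rotate(Y, D, X):
    """Solve min_W ||Y - W D X||_F s.t. W^T W = I in closed form,
    then return the rotated dictionary W D (same Gram matrix as D)."""
    U, _, Vt = np.linalg.svd(Y @ (D @ X).T)
    W = U @ Vt                          # optimal orthogonal matrix
    return W @ D
```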

Experiment Results
In this section, we report on experiments with synthetic data and real audio data intended to compare our proposed incoherent dictionary learning with the prior methods. All the experiments were performed on a Dell computer with 4 GB of memory and a 2-core 2.6 GHz Intel Pentium processor. All the code was written in MATLAB.

Algorithm 3. Input: Y, D, X, and iterations; Output: W.
(1) Initialize k = 0, W_0 = eye(n);
(2) ρ′ = 0.1, Δ_0 = 45.6889, Δ̄ = 365.5114, ε = 1e−9;
(3) while k < iterations and ‖ · − · ‖ > ε do …

Incoherent Dictionary Construction.
In this experiment, incoherent dictionaries are constructed without learning from training samples; we aim to reduce directly the mutual coherence of a given dictionary. The initial dictionary D ∈ R^{n×m} is set randomly, with n = 20 and m = 50 chosen so that n < m. Each atom is normalized to unit norm, and the Welch bound is 0.1750. To observe the benefits of our proposed methods, the dictionary update is performed by: (I) INKSVD [21]; (II) IP [22]; (III) UNTF-INKSVD; (IV) UNTF-IP. The INKSVD and IP implementations are taken from the web (http://code.soundsoftware.ac.uk/). Each algorithm is executed ten times, and average results are taken. Specifically, the same initial dictionary and number of iterations are used for all measurements, and we evaluate the mutual coherence of the dictionaries constructed by each algorithm.
Figure 1 shows the mutual coherence of the constructed dictionaries. We note that our proposed algorithms exhibit significantly lower coherence, with the performance of the UNTF-IP algorithm slightly exceeding those of IP, UNTF-INKSVD, and INKSVD. A reference line at √(m/n) = 1.5811 is drawn in Figure 2, indicating that an ETF has 20 equal nonzero singular values. As can be seen, the UNTF-IP and UNTF-INKSVD algorithms give approximately flat spectra and approximate the properties of an ETF. This is a better outcome than with the IP or INKSVD algorithms, because alternating the constraints on tightness and coherence has a beneficial effect on incoherent dictionary construction. As a result, the incoherent dictionary constructed with the UNTF-IP algorithm clearly approximates an ETF. The error bars in Figures 1 and 2 show the standard deviation over 10 runs of each test and demonstrate the consistency of the results. Table 1 summarizes the tested methods, which are executed for 30 iterations alternating between dictionary update and optimization. Each algorithm is executed 10 times, and average results are taken. The error bars in Figures 3 and 4 show the standard deviation over 10 runs and demonstrate the consistency of the results. Figure 3(a) shows the mutual coherence of each learned dictionary. The UNTF-IP and UNTF-INKSVD algorithms achieve better mutual coherence on average than the IP [22] and INKSVD [21] algorithms. In particular, the coherence of the dictionary learned with the UNTF-IP algorithm is closest to the Welch bound. Note that we have used Manopt to achieve better sparse representation performance. A signal-to-noise ratio (SNR) value of 20 log₁₀(‖Y‖_F / ‖Y − DX‖_F) is computed in order to evaluate the performance of sparse representation. The SNR values are shown in Figure 3(b), where it can be seen that Manopt gives better sparse representation performance compared to [21, 22], while the coherence is reduced.
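The SNR metric used above can be computed directly; assuming Frobenius norms (consistent with the formula in the text), a one-line helper of ours:

```python
import numpy as np

def snr_db(Y, D, X):
    """Representation SNR in dB: 20 * log10(||Y||_F / ||Y - D X||_F)."""
    return 20.0 * float(np.log10(np.linalg.norm(Y) / np.linalg.norm(Y - D @ X)))
```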
Figure 4 shows the ratio between the SNR and the coherence, r = SNR/μ(D). The experimental results show that our proposed algorithms achieve a good equilibrium between the coherence and the sparse representation, and exhibit generalized performance of the learned incoherent dictionaries.

Application on Audio Data.
To verify the efficiency of our proposed methods, experiments on real audio data are reported in this section, with noisy observations Y = Ỹ + V, where V is a predetermined noise. For the purposes of comparison and analysis, we use the audio dataset adopted by [21, 22], in which the data comprise an audio sample from a 16 kHz guitar recording. Furthermore, all columns in the initial dictionary are selected randomly from the training samples and normalized.
In this simulation, the target coherence μ0 is set in a range from 0.05 to 0.5 with a step size of 0.05. The tested methods are the same as those in Table 1; each is executed 10 times, and average results are taken. The termination criterion is that the target coherence is satisfied. We then evaluate our proposed incoherent dictionary learning methods by computing the mutual coherence and the SNR.
Figure 5 shows the standard deviation over 10 runs, demonstrating the consistency of the results across many tests. When the target coherence is less than 0.3, the proposed method II in Table 1 (employing UNTF-IP followed by Manopt) produces the best effect compared to the other methods and approaches the lower bound. However, if the target coherence is greater than 0.3, the SNR of [21] is the highest, followed by that of our proposed method I. Table 2 shows the computational running times. The key idea behind [21] is to symmetrically decrease the correlation of each pair of atoms having higher coherence based on a greedy method. Therefore, when the target coherence is higher, the number of atom pairs to be decorrelated decreases dramatically, and the computation decreases accordingly, as shown in the first row of Table 2. Unlike [22], the most important benefit of our proposed methods is better computational efficiency when the target mutual coherence is very low, because Manopt can be performed after the dictionary update process rather than during it.
Compared with the prior methods, the present experimental results indicate that our learned dictionaries have a lower coherence while maintaining a certain degree of sparse representation performance.

Conclusion
In this paper, we have proposed two methods for learning an incoherent dictionary for sparse representation, adding a dictionary update and a dictionary optimization step to traditional dictionary learning. First, the UNTF-INKSVD and UNTF-IP algorithms were developed to learn highly incoherent and tight dictionaries effectively. Unlike other dictionary learning algorithms, our proposed algorithms learn an incoherent dictionary based on a unit norm tight frame in the dictionary update. An efficient framework was developed for sequentially reducing the coherence of an initial dictionary (or Gram matrix) by finding a new dictionary (or Gram matrix) that has a lower coherence and is nearest to the previous one. Hence, our learned incoherent dictionaries approximate the properties of ETFs, and the support recoverable by sparse coding is maximized.
Second, Manopt was employed to solve the orthogonal Procrustes problem in the dictionary optimization, because only reducing the coherence of a learned dictionary degrades the performance of sparse representation. Meanwhile, we compared our proposed methods with the other methods, and the experimental results showed that our proposed methods balance incoherence against sparse representation performance. In particular, our proposed methods provide state-of-the-art results when μ0 is very low, and have higher running speeds and better representation performance compared to [21, 22]. This is because Manopt is performed after the dictionary update rather than during the dictionary update process.
However, a drawback is that our proposed methods are mainly suitable for learning an incoherent dictionary for sparse representation; traditional dictionary learning works well when the coherence of the learned dictionary is not restricted. In our work, more general objective functions are proposed (see (10) and (13)) to construct an incoherent dictionary in which tightness and coherence are restricted alternately at each iteration of the algorithm, similar to alternating minimization. The theoretical proof of convergence of alternating minimization over more than two sets is still an open issue [13, 15, 22]. Nevertheless, the experiments in our work show that the incoherent dictionary learning methods can converge to a set of accumulation points under certain conditions. Our proposed algorithms converge approximately to the target coherence values, as shown in Figures 1, 3(a), and 5. Dictionaries constructed with our proposed algorithms give the approximately flat spectrum of an ETF in Figure 2. The SNR values shown in Figures 3(b) and 5 demonstrate the effectiveness of our proposed algorithms compared to [21, 22]. The convergence of the objective value does not, however, prove the convergence of our proposed algorithms. Therefore, we will continue working to prove the convergence of our proposed algorithms and to apply our proposed methods to other domains.

Figure 1: The mutual coherence of constructed dictionaries.

Figure 2: The singular values of constructed dictionaries.

Figure 3: The mutual coherence and SNR with incoherent dictionary learning methods.
Figure 4: The ratio between the SNR and the coherence.

Figure 5: Comparing incoherent dictionary learning methods on audio data.

Table 1 :
Our proposed methods for learning an incoherent dictionary.

In this section, we have investigated the incoherent dictionary learning performance for sparse representation of synthetic data. The training samples are generated via the underdetermined system Y = DX, where D ∈ R^{n×m} and X ∈ R^{m×N} are generated randomly. The dictionary is column-normalized, with n = 64 and m = 256. The matrix X ∈ R^{m×N} is a sparse matrix with N = 20,000. The nonzero coefficients are distributed randomly, and their values are drawn from a standard Gaussian distribution. The target coherence μ0 is set to the Welch bound of 0.1085.

Table 2 :
The overall execution times (in seconds) with different incoherent dictionary learning methods.