The dictionary learning problem has been an active research topic for decades. Most existing learning methods train the dictionary to adapt to a particular class of signals. However, as the number of dictionary atoms is increased to represent the signals more sparsely, the coherence between the atoms becomes higher. According to greedy and compressed sensing theories, high coherence hampers the implementation of sparse coding. In this paper, a novel approach is proposed to learn a dictionary that minimizes the sparse representation error on the training signals while taking the coherence into consideration. The coherence is constrained by making the Gram matrix of the desired dictionary approximate an identity matrix of proper dimension. The method for handling the proposed model is based on an alternating minimization procedure and, in each step, a closed-form solution is derived. A series of experiments on synthetic data and audio signals demonstrates the promising performance of the learnt incoherent dictionary and the superiority of the learning method over existing ones.
1. Introduction
Sparse representation (SR) theory [1, 2] indicates that a signal can be represented by a linear combination of a few atoms of a prespecified dictionary. It is an evolving field, with state-of-the-art results in many signal processing tasks, such as coding, denoising, face recognition, deblurring, and compressed sensing [3–7].
A fundamental consideration in employing the above theory is the choice of the dictionary and this leads to the famous dictionary learning (DL) problem. DL has attracted a lot of attention since its introduction at the end of last century [8, 9]. Most of the research has been done to learn a data adaptive dictionary so that a particular class of signals can be sparsely represented in this dictionary with low approximation error.
Under the SR framework, a signal vector $y \in \mathbb{R}^{N\times 1}$ can be expressed in the form
\[ y \approx \sum_{k=1}^{K} x(k)\, D(:,k) \triangleq Dx, \tag{1} \]
where $D \in \mathbb{R}^{N\times K}$ is the dictionary with its columns $\{D(:,k)\}$ referred to as atoms (throughout this paper, MATLAB notation is used) and $x \in \mathbb{R}^{K\times 1}$ is the corresponding sparse coefficient vector.
Let $v \in \mathbb{R}^{K\times 1}$ with $v(k)$ being its $k$th element. The $l_p$-norm of the vector $v$ is defined as
\[ \|v\|_p \triangleq \Bigl(\sum_{k=1}^{K} |v(k)|^p\Bigr)^{1/p}, \quad p \ge 1. \tag{2} \]
Note that $\|v\|_p$ is not a norm in the strict sense for $0 \le p < 1$. For convenience, $\|v\|_0$ is used to denote the number of nonzero elements in $v$. A vector $y$ given by (1) is said to be $S$-sparse in $D$ if $\|x\|_0 = S$.
Let $\{y_m\}_{m=1}^{M}$ be a set of training samples from a class of signals to be considered. The basic problem of DL is to find a dictionary $D$ such that, for each $y_m$, there exists a sparse vector $x_m$ with $y_m \approx D x_m$. Such a problem has been widely investigated during the last decade or so [10–12] and can be formulated as
\[ \min_{D,\{x_m\}_{m=1}^{M}} \|Y - DX\|_F^2 + \sum_{m=1}^{M} \tau_m \|x_m\|_0, \tag{3} \]
where $\|\cdot\|_F$ denotes the Frobenius norm, $\{\tau_m\}$ are proper constants, $\|x_m\|_0$ is the sparsity of the sparse vector $x_m$, and
\[ Y \triangleq [\,y_1 \cdots y_m \cdots y_M\,] \in \mathbb{R}^{N\times M}, \quad X \triangleq [\,x_1 \cdots x_m \cdots x_M\,] \in \mathbb{R}^{K\times M}. \tag{4} \]
Such a problem is difficult to solve as it is nonconvex in $D$ and $X$, and $\|\cdot\|_0$ is nonsmooth and highly unstable. A popular approach is based on the alternating minimization strategy: a two-stage procedure is usually carried out for solving the above problem and also for avoiding the selection of $\{\tau_m\}$ [10–12]. The problem in the first stage is referred to as sparse coding, aiming at finding the (column-)sparse matrix $X$ with a given $D$; that is,
\[ \hat{x}_m \triangleq \arg\min_{x_m} \|y_m - Dx_m\|_2^2 \quad \text{s.t.} \ \|x_m\|_0 \le S, \ \forall m. \tag{5} \]
Note that the equivalent formulation of (5) in which the constraint is a fixed sparse representation error (SRE) level can also be stated as
\[ \hat{x}_m \triangleq \arg\min_{x_m} \|x_m\|_0 \quad \text{s.t.} \ \|y_m - Dx_m\|_2^2 \le \epsilon, \ \forall m, \tag{6} \]
with $\epsilon$ being the error threshold. Such a problem can be solved using orthogonal matching pursuit (OMP) based methods [13, 14]. Furthermore, it can be shown that the solution of the above problem coincides with that of the $l_1$-based minimization below:
\[ \hat{x}_m \triangleq \arg\min_{x_m} \|x_m\|_1 \quad \text{s.t.} \ \|y_m - Dx_m\|_2^2 \le \epsilon, \ \forall m, \tag{7} \]
while the latter can be addressed using algorithms such as basis pursuit (BP) [15] and $l_1$-based optimization techniques [16].
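The greedy sparse coding step (5) is easiest to grasp in code. The following is a minimal numpy sketch of OMP; the function name and interface are illustrative, not the implementation used in the paper, and it assumes unit-norm atoms.

```python
import numpy as np

def omp(D, y, S):
    """Greedy OMP sketch for problem (5): select S atoms of D to
    approximate y. Assumes the columns of D have unit norm."""
    N, K = D.shape
    residual = y.copy()
    support = []
    x = np.zeros(K)
    for _ in range(S):
        # atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # least-squares fit on the chosen support (the "orthogonal" step)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x
```

The residual is re-orthogonalized against all selected atoms at every step, which is what distinguishes OMP from plain matching pursuit.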
Many algorithms for solving (3) differ from each other mainly in the second stage, that is, dictionary updating. For the dictionary $D$, in order to code the signals of interest more sparsely, we usually set $K > N$, which means that $D$ is overcomplete. However, this redundancy increases the pairwise similarity of the dictionary atoms. According to the work in [13], such similarity has a direct influence on the dictionary's performance, especially on the accuracy of the sparse coding stage. If any two atoms degenerate to the same vector, overfitting to the training data results. Thus, an incoherent dictionary is expected to improve the performance of the SR model.
Yaghoobi et al. proposed a design method for parametric dictionaries [17]. The authors attempted to optimize the dictionary so that the corresponding Gram matrix approximates the Gram of an equiangular tight frame (ETF), which possesses good coherence behavior. However, this method relies heavily on a priori knowledge of an appropriate parameter-selection criterion for the given class of signals. A new algorithm named INK-SVD was developed in [18]: in each iteration of the K-SVD algorithm [11], the dictionary updating stage is followed by an additional decorrelation step, in which each pair of atoms whose coherence exceeds a threshold has the angle between them increased symmetrically so as to reduce their coherence. But this procedure implicitly degrades the SR result obtained by K-SVD. To compensate for this problem, the authors of [19] improved the work of [18] by incorporating a new decorrelation step (also related to the ETF owing to its low coherence) and a dictionary rotation operation into the update stage. In [20], a weighted model was formulated to balance the coherence of the dictionary against its sparse representation ability, and a gradient-based method was derived for solving the corresponding problem.
The main objective of this paper is to propose a new incoherent dictionary learning (IDL) method that constrains the coherence of the dictionary and minimizes the SRE. The contributions are threefold:
A novel model is proposed for learning the incoherent dictionary. The key novelty lies in the dictionary updating procedure: when minimizing the SRE, that is, $\min_D \|Y - DX\|_F^2$, $D$ is subjected to a coherence constraint by making the corresponding Gram matrix approximate an identity matrix of proper dimension.
An iterative algorithm that updates the sparse coefficients and the components of the dictionary alternately is put forward to solve the design problem. In every step of dictionary updating, the solution of each component of dictionary is derived analytically.
A series of experiments on synthetic data and audio signals is carried out to demonstrate the performance of each compared algorithm.
The remainder of this paper is arranged as follows. In Section 2, some preliminaries are provided and the main issue of learning incoherent dictionary is also formulated in this part. The algorithm proposed for addressing the corresponding design problem is investigated in Section 3. Simulations are carried out in Section 4 to examine the performance of the proposed algorithm and to compare with the existing ones. Some concluding remarks are given in Section 5.
2. Preliminaries and Problem Formulation
In this section, some preliminaries are introduced and the two main methods compared against in this paper are reviewed in detail. Based on these, we formulate the problem of incoherent dictionary learning, with the purpose of increasing the approximation performance of the dictionary for a particular class of signals under the coherence constraint.
The most fundamental quantity associated with a dictionary is the mutual coherence (MC) [21]. MC indicates the degree of similarity between different dictionary columns and equals the maximum absolute normalized inner product between two distinct atoms:
\[ \mu(D) \triangleq \max_{1 \le i \ne j \le K} \frac{|D(:,i)^T D(:,j)|}{\|D(:,i)\|_2 \|D(:,j)\|_2}, \tag{8} \]
where $(\cdot)^T$ denotes the transpose operator. As shown in [21], an $S$-sparse signal generated according to (1) can be exactly recovered with OMP as long as
\[ S < \frac{1}{2}\Bigl(1 + \frac{1}{\mu(D)}\Bigr). \tag{9} \]
Roughly speaking, MC measures how much two atoms can look alike. Inequality (9) is just a worst-case bound and only reflects the most extreme correlations in the dictionary. Nevertheless, MC is easy to manipulate and captures well the behavior of some dictionaries. Generally, a dictionary is called incoherent if the corresponding MC is small [18, 19]. Besides, as pointed out in [19], the coherence of a dictionary is related to the condition numbers of its subdictionaries, which implies that achieving a low MC value results in well-conditioned subdictionaries.
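The mutual coherence (8) is straightforward to compute from the normalized Gram matrix. A small numpy helper (the name `mutual_coherence` is illustrative) might look like:

```python
import numpy as np

def mutual_coherence(D):
    """Mutual coherence (8): the largest |<d_i, d_j>| over distinct
    column pairs after normalizing each column to unit norm."""
    Dn = D / np.linalg.norm(D, axis=0)   # column-normalize the atoms
    G = np.abs(Dn.T @ Dn)                # absolute normalized Gram
    np.fill_diagonal(G, 0.0)             # discard the i == j entries
    return G.max()
```

For an orthonormal basis the result is 0; a dictionary with two identical atoms attains the maximum value 1.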
Define the Gram matrix of $D$ as
\[ G \triangleq D^T D. \tag{10} \]
It is common to study the MC in (8) via the Gram matrix. Let $D_{sc}$ be the diagonal matrix whose $k$th diagonal element is $1/\sqrt{G(k,k)}$ for $k = 1, \dots, K$. The Gram matrix of $\bar{D} \triangleq D D_{sc}$, denoted $\bar{G}$, is then normalized such that $\bar{G}(k,k) = 1$, $\forall k$. Obviously, $\mu(D) = \max_{i \ne j} |\bar{G}(i,j)|$.
For $D \in \mathbb{R}^{N\times K}$, it has been shown in [22] that $\mu(D)$ is bounded as
\[ \underline{\mu} \triangleq \sqrt{\frac{K - N}{N(K - 1)}} \le \mu(D) \le 1, \tag{11} \]
with $\underline{\mu}$ being the Welch bound. If every pairwise atomic inner product meets this bound, the dictionary is called an ETF. An ETF has a very nice MC behavior and has been considered in optimal dictionary design [17, 19].
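The Welch bound (11) translates directly into code; the helper below is a hypothetical convenience function, shown only to make the formula concrete (for the experimental setting $N = 20$, $K = 80$ used later, the bound is roughly 0.195).

```python
import numpy as np

def welch_bound(N, K):
    """Lower bound (11) on the mutual coherence of any N x K dictionary:
    sqrt((K - N) / (N * (K - 1)))."""
    return np.sqrt((K - N) / (N * (K - 1)))
```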
2.1. Related Works
It is worth noting that ETFs only exist for matrices $D \in \mathbb{R}^{N\times K}$ with dimensionality constrained by $K \le N(N+1)/2$ if the atoms are real. So one usually replaces the set of ETF Grams with a relaxed version [17, 19] defined as
\[ \mathcal{S}_\eta \triangleq \bigl\{G \in \mathbb{R}^{K\times K} : G = G^T,\ G(k,k) = 1\ \forall k,\ \max_{i \ne j} |G(i,j)| \le \eta\bigr\}, \tag{12} \]
where $0 < \eta < 1$ is a constant that controls the search space. Clearly, when $\eta \ge \underline{\mu}$, $\mathcal{S}_\eta$ contains all the ETF Grams.
Besides the set $\mathcal{S}_\eta$, the authors of [19] define a spectral constraint set as
\[ \mathcal{F} \triangleq \bigl\{G \in \mathbb{R}^{K\times K} : G = G^T,\ \mathrm{eig}(G) \ge 0,\ \mathrm{rank}(G) \le N\bigr\}. \tag{13} \]
Here $\mathrm{eig}(\cdot)$ returns the vector of eigenvalues and $\mathrm{rank}(\cdot)$ is the rank operator. The algorithm for learning an incoherent dictionary proposed in [19] can be outlined as follows:
Sparse coding with OMP.
Dictionary updating employing K-SVD.
Atoms decorrelation through an iterative projection procedure.
Dictionary rotation to minimize the approximate error while keeping the MC unchanged.
The main contributions of [19] lie in the last two steps. The atom decorrelation is executed by iteratively projecting the Gram of the dictionary output by K-SVD between the sets $\mathcal{S}_\eta$ and $\mathcal{F}$ until a stopping criterion is met. With the singular value decomposition (SVD) of the resulting positive semidefinite Gram matrix $\hat{G}$ expressed as
\[ \hat{G} = V_G \begin{bmatrix} \Sigma_G & 0 \\ 0 & 0 \end{bmatrix} V_G^T, \tag{14} \]
where $V_G \in \mathbb{R}^{K\times K}$ is orthonormal and $\Sigma_G \in \mathbb{R}^{N\times N}$ is the diagonal singular value matrix with all its elements nonnegative, the incoherent dictionary can be obtained as
\[ \hat{D} = U \begin{bmatrix} \Sigma_G^{1/2} & 0 \end{bmatrix} V_G^T, \tag{15} \]
with $U$ an arbitrary orthonormal matrix. Finally, the authors exploit this degree of freedom to further reduce the SRE by solving
\[ \hat{U} = \arg\min_{U \in \mathcal{O}(N)} \Bigl\| Y - U \begin{bmatrix} \Sigma_G^{1/2} & 0 \end{bmatrix} V_G^T X \Bigr\|_F^2, \tag{16} \]
where $\mathcal{O}(N)$ is the set of $N \times N$ orthonormal matrices. This is the rotation procedure.
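The alternating projection between $\mathcal{S}_\eta$ (12) and $\mathcal{F}$ (13) can be sketched roughly as follows. This is a simplified reading of the procedure of [19], not their reference code: the $\mathcal{S}_\eta$ projection clips the off-diagonal Gram entries, the $\mathcal{F}$ projection keeps the top-$N$ nonnegative eigenpairs, and the rotation step (16) is omitted. The function name is illustrative.

```python
import numpy as np

def decorrelate(D, eta, iters=50):
    """Sketch of the iterative projection step of [19] (simplified)."""
    N, K = D.shape
    for _ in range(iters):
        D = D / np.linalg.norm(D, axis=0)     # unit-norm atoms
        G = D.T @ D
        # projection onto S_eta (12): clip off-diagonals, unit diagonal
        G = np.clip(G, -eta, eta)
        np.fill_diagonal(G, 1.0)
        # projection onto F (13): symmetric PSD with rank <= N
        w, V = np.linalg.eigh((G + G.T) / 2)  # eigenvalues ascending
        w = np.maximum(w, 0.0)
        w[:K - N] = 0.0                       # keep only the top N
        # refactor G ~= D^T D back into an N x K dictionary
        D = (V[:, K - N:] * np.sqrt(w[K - N:])).T
    return D / np.linalg.norm(D, axis=0)
```

Each pass trades some representation fidelity for a flatter Gram; on random dictionaries the coherence typically drops toward the chosen $\eta$.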
Remark 1.
Compared with the decorrelation operation in [18], the above-mentioned atom decorrelation can achieve a much smaller MC value. Besides, the additional rotation procedure can partially restore the SR ability. However, the approximation performance of the dictionary is significantly degraded by the iterative projections. Though the dictionary rotation procedure is carried out for compensation, the effect of the sole degree of freedom $U$ on the SR ability is quite limited.
In [20], the authors consider another strategy for IDL, where the dictionary's coherence is minimized along with the SRE. The cost function can be expressed as
\[ \min_D \|DX - Y\|_F^2 + \gamma \|D^T D - I_K\|_F^2 \triangleq \varrho(D), \tag{17} \]
with $I_K$ denoting the identity matrix of dimension $K$. It is clear that $I_K$ is the simplest ETF Gram (with $N = K$). The weighting factor $\gamma$ controls the trade-off between minimizing the SRE and minimizing the dictionary's coherence. With the gradient of $\varrho(D)$ calculated as
\[ \frac{d\varrho(D)}{dD} = 2\bigl(DXX^T - YX^T\bigr) + 4\gamma\bigl(DD^TD - D\bigr), \tag{18} \]
the dictionary is then updated by the steepest descent algorithm [20].
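One steepest-descent update using the gradient (18) can be written directly. This is a minimal numpy sketch, not the implementation of [20]; the function name and default step size are illustrative.

```python
import numpy as np

def gradient_idl_step(D, X, Y, gamma, step=0.1):
    """One steepest-descent update of the cost (17), using gradient (18):
    d rho/dD = 2(D X X^T - Y X^T) + 4 gamma (D D^T D - D)."""
    grad = 2 * (D @ X @ X.T - Y @ X.T) + 4 * gamma * (D @ (D.T @ D) - D)
    return D - step * grad
```

With a sufficiently small step size, one step is guaranteed to decrease the smooth cost (17).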
Remark 2.
(i) The choice of $\gamma$ remains open-ended and no selection criterion is introduced in [20]. The simulation results of [20] suggest that a larger $\gamma$ yields better performance.
(ii) It is well known that gradient-based algorithms may easily fall into a local minimum if the initialization is not properly set [23, 24]. As a gradient-based method is employed for solving (17), there is room to improve both efficiency and accuracy.
2.2. Problem Formulation
Let $D \in \mathbb{R}^{N\times K}$ be the dictionary, $Y = \{y_m\}_{m=1}^{M}$ the signal set with $y_m \in \mathbb{R}^{N\times 1}$, and $X = \{x_m\}_{m=1}^{M}$ the corresponding sparse coefficient matrix in $D$ with $x_m \in \mathbb{R}^{K\times 1}$, as defined previously. For the problem indicated in (3), we update $X$ and $D$ alternately. For a fixed $D$, $X$ can be calculated by greedy algorithms or $l_1$-based convex optimization methods. In the following, we focus our discussion on the dictionary updating stage. For the traditional case, that is,
\[ \min_D \|DX - Y\|_F^2, \tag{19} \]
the authors of [10] simply update the dictionary as $\hat{D} = YX^T(XX^T)^{-1}$. But when $X$ is not of full row rank, this method fails to work. The K-SVD algorithm [11] minimizes (19) for each atom separately; when updating the dictionary, the coefficients are renewed simultaneously. Each iteration between coefficients and dictionary requires $K$ SVD operations, so the algorithm is time-consuming, and it is not well suited to enforcing a coherence constraint, which is important for the implementation of sparse coding.
As an ETF achieves a small MC value, this motivates us to design a dictionary that is as close as possible to an ETF [17–20]. So the following constrained model is proposed:
\[ \min_{D,\{x_m\}_{m=1}^{M}} \|DX - Y\|_F^2 + \sum_{m=1}^{M} \tau_m \|x_m\|_0 \quad \text{s.t.} \ D \in \arg\min_D \|D^T D - I_K\|_F^2. \tag{20} \]
The closed-form solution set of $\min_D \|D^T D - I_K\|_F^2$ has been derived in [7, 24] as
\[ D = U \begin{bmatrix} I_N & 0 \end{bmatrix} V^T, \tag{21} \]
where $U$ and $V$ are arbitrary orthonormal matrices of dimensions $N$ and $K$, respectively. So (20) can be rewritten as
\[ \min_{U \in \mathcal{O}(N),\, V \in \mathcal{O}(K),\, \{x_m\}_{m=1}^{M}} \|DX - Y\|_F^2 + \sum_{m=1}^{M} \tau_m \|x_m\|_0 \quad \text{s.t.} \ D = U \begin{bmatrix} I_N & 0 \end{bmatrix} V^T. \tag{22} \]
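Any dictionary of the closed form (21) is a unit-norm tight frame: $DD^T = I_N$ and all its nonzero singular values equal one. A quick numpy check, with a hypothetical helper name, makes this concrete:

```python
import numpy as np

def tight_dictionary(N, K, rng):
    """Draw a random dictionary of the closed form (21):
    D = U [I_N 0] V^T with orthonormal U (N x N) and V (K x K).
    The function name is illustrative, not from the paper."""
    U, _ = np.linalg.qr(rng.standard_normal((N, N)))
    V, _ = np.linalg.qr(rng.standard_normal((K, K)))
    return U @ np.hstack([np.eye(N), np.zeros((N, K - N))]) @ V.T
```

The flat singular spectrum produced here is exactly the property invoked in Remark 3 to argue for good coherence behavior.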
Remark 3.
(i) Here we choose the identity matrix $I_K$ as the target Gram for the following reasons: it is easy to handle (avoiding the iterative projections between $\mathcal{S}_\eta$ and $\mathcal{F}$ carried out in [19]), and expression (21) contains more degrees of freedom than (15) for further minimizing the SRE.
(ii) As pointed out in [20], a flatter singular value spectrum of the dictionary indicates a less coherent dictionary. Our design strategy is under constraint (21) which means that the nonzero singular values of our designed dictionary are all equal (the same as (17) with γ→∞). Hence, better coherence performance can be expected.
3. Coherence Constrained Dictionary Learning
In this section, an alternating minimization algorithm is developed to address the dictionary learning problem (22).
3.1. Algorithm for IDL
To solve the above multivariate problem and also avoid the selections of {τm}, the alternating minimization strategy as introduced for addressing (3) seems a natural choice. The pseudocode of the proposed algorithm (named CCDL, standing for Coherence Constrained Dictionary Learning) is summarized as follows:
Initialization
D(0): initial random N×K dictionary;
Y: training data;
Iter1: number of iterations between sparse coding and dictionary updating;
Iter2: number of iterations between updating U and V.
Calculate the SVD of D(0) as D(0)=U(0)Σ(0)(V(0))T and set i=1.
For 1≤i≤Iter1, do the following.
Step 1.
Set $D^{(i-1)} = U^{(i-1)} [I_N \; 0] (V^{(i-1)})^T$, and update $X$ column by column by solving
\[ \hat{x}_m^{(i)} = \arg\min_{x_m} \|D^{(i-1)} x_m - y_m\|_2^2 + \tau_m \|x_m\|_0, \quad \forall m, \tag{23} \]
with an OMP based algorithm, obtaining the approximate solution $X^{(i)} = \{\hat{x}_m^{(i)}\}_{m=1}^{M}$.
Set $j = 1$, $U_{(0)}^{(i-1)} = U^{(i-1)}$, and $V_{(0)}^{(i-1)} = V^{(i-1)}$.
Step 2.
While 1≤j≤Iter2, do the following:
For fixed $V_{(j-1)}^{(i-1)}$, update $U_{(j)}^{(i-1)}$ by solving
\[ \min_{U \in \mathcal{O}(N)} \bigl\| U [I_N \; 0] (V_{(j-1)}^{(i-1)})^T X^{(i)} - Y \bigr\|_F^2. \tag{24} \]
The analytical solution will be given in the next subsection.
For fixed $U_{(j)}^{(i-1)}$, update $V_{(j)}^{(i-1)}$ by solving
\[ \min_{V \in \mathcal{O}(K)} \bigl\| U_{(j)}^{(i-1)} [I_N \; 0] V^T X^{(i)} - Y \bigr\|_F^2. \tag{25} \]
The solution will be derived in the next subsection.
Return to Step 2 with j→j+1.
Step 3.
Set $U^{(i)} = U_{(j-1)}^{(i-1)}$ and $V^{(i)} = V_{(j-1)}^{(i-1)}$. End for if $i > \mathrm{Iter1}$.
End. Output $D = U^{(i-1)} [I_N \; 0] (V^{(i-1)})^T$.
3.2. Update the Components of Dictionary
Now, let us focus on solving (24) and (25). For convenience, we omit the superscript $(i)$ in the expressions. As the sparse coefficient matrix $X$ is assumed fixed in Step 2, we can rewrite the cost function of (22) as
\[ \min_{U \in \mathcal{O}(N),\, V \in \mathcal{O}(K)} \bigl\| U [I_N \; 0] V^T X - Y \bigr\|_F^2 \triangleq \varrho(U, V), \tag{26} \]
where $U$ and $V$ can be updated alternately. Let $X$ have the following SVD (for an arbitrary matrix $M$, the general SVD form can be expressed as $M = U_m \Sigma_m (V_m)^T$):
\[ X = U_x \Sigma_x V_x^T. \tag{27} \]
Define
\[ Q(U) \triangleq U^T Y V_x, \quad P(V) \triangleq [I_N \; 0]\, V^T X, \quad \tilde{V} \triangleq U_x^T V = [\tilde{V}_1 \; \tilde{V}_2], \tag{28} \]
with $\tilde{V}_1 \in \mathbb{R}^{K\times N}$. We then have two alternative expressions for $\varrho(U, V)$:
\[ \varrho(U, V) = \|U P(V) - Y\|_F^2 = \|\tilde{V}_1^T \Sigma_x - Q(U)\|_F^2. \tag{29} \]
Assume that $U_{(j-1)}$ and $V_{(j-1)}$ are given. In what follows, we derive a procedure for updating $(U, V)$ such that
\[ \varrho(U_{(j)}, V_{(j)}) \le \varrho(U_{(j-1)}, V_{(j-1)}). \tag{30} \]
3.2.1. Update U
First of all, consider
\[ U_{(j)} \triangleq \arg\min_{U \in \mathcal{O}(N)} \varrho(U, V_{(j-1)}). \tag{31} \]
This problem can be solved via the following theorem [19].
Theorem 4.
For $B$ and $C$ both belonging to $\mathbb{R}^{N\times M}$, the solution of
\[ \min_{A \in \mathcal{O}(N)} \|AB - C\|_F^2 \tag{32} \]
is characterized as
\[ A = V_t U_t^T, \tag{33} \]
where $V_t$ and $U_t$ are the orthonormal matrices given by the following SVD:
\[ BC^T \triangleq T = U_t \Sigma_t V_t^T. \tag{34} \]
Let
\[ P(V_{(j-1)})\, Y^T = U_{(j-1)}^u \Sigma_{(j-1)}^u (V_{(j-1)}^u)^T. \tag{35} \]
The solution to (31) can then be derived as
\[ U_{(j)} = V_{(j-1)}^u (U_{(j-1)}^u)^T. \tag{36} \]
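Theorem 4 is the classical orthogonal Procrustes solution, and the $U$-update (35)-(36) is just an instance of it with $B = P(V_{(j-1)})$ and $C = Y$. A minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def procrustes(B, C):
    """Theorem 4: the orthonormal A minimizing ||A B - C||_F is
    A = Vt Ut^T, where B C^T = Ut St Vt^T is an SVD (32)-(34)."""
    Ut, _, Vt_T = np.linalg.svd(B @ C.T)
    return Vt_T.T @ Ut.T
```

When $C = QB$ for some orthonormal $Q$ and $BB^T$ is full rank, the minimizer recovers $Q$ exactly (zero residual).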
3.2.2. Update V
Now, let us consider
\[ V_{(j)} \triangleq \arg\min_{V \in \mathcal{O}(K)} \varrho(U_{(j)}, V). \tag{37} \]
Obviously, (30) is satisfied with $U_{(j)}$ and $V_{(j)}$ obtained as the solutions of (31) and (37), respectively.
Note that $\varrho(U_{(j)}, V) = \|\tilde{V}_1^T \Sigma_x - Q(U_{(j)})\|_F^2$. Define
\[ \tilde{\varrho}(\tilde{V}, W) \triangleq \Bigl\| \tilde{V}^T \Sigma_x - \begin{bmatrix} Q(U_{(j)}) \\ W \end{bmatrix} \Bigr\|_F^2 = \|\tilde{V}_1^T \Sigma_x - Q(U_{(j)})\|_F^2 + \|\tilde{V}_2^T \Sigma_x - W\|_F^2. \tag{38} \]
It is clear that such a function has the following properties:
\[ \tilde{\varrho}(\tilde{V}, \tilde{V}_2^T \Sigma_x) = \varrho(U_{(j)}, V), \quad \tilde{\varrho}(\tilde{V}, W) \ge \varrho(U_{(j)}, V), \tag{39} \]
where $\tilde{V}$ is defined in (28).
Denote
\[ W^{(k)} \triangleq (\tilde{V}_2^{(k)})^T \Sigma_x, \quad k = 0, 1, 2, \ldots, \tag{40} \]
where each $\tilde{V}^{(k)} = U_x^T V^{(k)}$, with $V^{(0)} \triangleq V_{(j-1)}$, is constructed via
\[ \tilde{V}^{(k)} = \arg\min_{\tilde{V} \in \mathcal{O}(K)} \tilde{\varrho}(\tilde{V}, W^{(k-1)}), \quad k = 1, 2, \ldots. \tag{41} \]
It follows from (38) and Theorem 4 that the solution to (41) is given by
\[ \tilde{V}^{(k)} = U_v^{(k-1)} (V_v^{(k-1)})^T, \tag{42} \]
where
\[ \Sigma_x \begin{bmatrix} Q(U_{(j)}) \\ W^{(k-1)} \end{bmatrix}^T = U_v^{(k-1)} \Sigma_v^{(k-1)} (V_v^{(k-1)})^T. \tag{43} \]
It then follows from (39) and (40) that
\[ \tilde{\varrho}(\tilde{V}^{(k)}, W^{(k-1)}) \ge \tilde{\varrho}(\tilde{V}^{(k)}, W^{(k)}) = \varrho(U_{(j)}, V^{(k)}), \tag{44} \]
while (41) indicates that
\[ \tilde{\varrho}(\tilde{V}^{(k)}, W^{(k-1)}) \le \tilde{\varrho}(\tilde{V}^{(k-1)}, W^{(k-1)}) = \varrho(U_{(j)}, V^{(k-1)}). \tag{45} \]
This implies that constructing $\{\tilde{V}^{(k)}\}$ using (42), and hence $\{V^{(k)}\} = \{U_x \tilde{V}^{(k)}\}$, makes $\varrho(U_{(j)}, V^{(k)})$ a decreasing sequence; therefore, the solution to (37) can be estimated as
\[ \hat{V}_{(j)} \triangleq \lim_{k \to \infty} V^{(k)} = U_x \lim_{k \to \infty} \tilde{V}^{(k)}. \tag{46} \]
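The inner fixed-point iteration (40)-(46) can be sketched in numpy as follows. This is a simplified reading of the derivation with the starting point $V^{(0)} = I_K$ (variable names follow the paper, the function name is illustrative), assuming $M \ge K$ as in the experiments:

```python
import numpy as np

def update_V(U, X, Y, iters=50):
    """Sketch of the inner loop (40)-(46): iterate toward the orthonormal
    V minimizing ||U [I_N 0] V^T X - Y||_F with U and X held fixed."""
    N = U.shape[0]
    K, M = X.shape
    Ux, sx, VxT = np.linalg.svd(X)           # X = Ux Sx Vx^T, as in (27)
    Sx = np.zeros((K, M))
    np.fill_diagonal(Sx, sx)
    Q = U.T @ Y @ VxT.T                      # Q(U) in (28), size N x M
    Vt = Ux.T                                # V-tilde for the start V = I_K
    for _ in range(iters):
        W = Vt[:, N:].T @ Sx                 # W(k-1) = V2~^T Sx, as in (40)
        T = Sx @ np.vstack([Q, W]).T         # the matrix factored in (43)
        Uv, _, VvT = np.linalg.svd(T)
        Vt = Uv @ VvT                        # V~(k) in (42), via Theorem 4
    return Ux @ Vt                           # V = Ux V~, as in (46)
```

By (44)-(45) the objective is nonincreasing over the iterations, so the returned $V$ can only improve on the starting point.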
Remark 5.
(i) It may be possible that V(j)≠V^(j), but ϱ(U(j),V^(j))≤ϱ(U(j-1),V(j-1)) is always true. Therefore, (U,V) can be updated with (U(j),V^(j)).
(ii) For the whole CCDL, there actually exist three loops (indexed by $i$, $j$, and $k$, resp.). For the loop indexed by $k$, (44) and (45) indicate that the procedure of updating $V$ makes $\varrho(U_{(j)}, V^{(k)})$ decrease as $k$ increases, so the solution (or an approximation) of (37) can be obtained. Besides, the solution of (31) is derived analytically as (36). All this results in the convergence of the second loop indexed by $j$, that is, dictionary updating. Assuming that OMP performs perfectly in the sparse coding stage, the nonincreasing trend of Step 1 is ensured. To sum up, the cost function (22) decreases in every step, and hence the convergence of CCDL is guaranteed.
4. Experiment Results
In this section, we evaluate the performance of the proposed model and algorithm with synthetic data and audio signals.
4.1. Convergence Performance
Firstly, several simulations will be carried out to verify the convergence performance of the proposed CCDL. As the main contributions of the new method lie in the dictionary updating stage, we focus on the performance of designing the orthonormal matrices U and V, that is, solving (26).
Set $N = 20$, $K = 80$, and the number of signals $M = 1000$. As introduced in the second point of Remark 5, there exist two loops in dictionary updating, indexed by $k$ and $j$, respectively. The maximum iteration numbers for both are fixed at 100.
4.1.1. For Synthetic Dictionary
$X$ is taken as a $K \times M$ Gaussian random matrix. Two orthonormal matrices $\check{U}$ and $\check{V}$ are generated to form the authentic dictionary $\check{D}$:
\[ \check{D} = \check{U} [I_N \; 0] \check{V}^T. \tag{47} \]
Then $Y$ is produced as $Y = \check{D} X$. The performance is evaluated by
\[ e \triangleq \|\check{D} - \hat{D}\|_F^2, \tag{48} \]
with $\hat{D}$ being the learnt dictionary.
Starting from an initial random dictionary, Figures 1 and 2 show the convergence performance of the loops indexed by k (with j=1) and j, respectively.
Convergence performance of loop indexed by k with synthetic Dˇ.
Convergence performance of loop indexed by j with synthetic Dˇ.
Remark 6.
(i) The minimum values in Figures 1 and 2 are very close to zero, indicating that the design processes of $U$ and $V$ can produce a dictionary that is almost identical to the authentic one, $\check{D}$.
(ii) The loop indexed by $k$ is embedded in the one indexed by $j$. When $j = 1$, that is, the case of Figure 1, $e$ already attains a minimum value very close to zero. This shows that the orthonormal matrix $V$ plays the more important role in minimizing $e$. The result of Figure 2 also verifies this conclusion, as the value of $e$ converges within one iteration of the loop indexed by $j$. Recall from Remarks 1 and 3 that one of the main differences between the proposed algorithm and the method in [19] is the extra degree of freedom $V$; better sparse representation ability of the proposed algorithm can therefore be expected.
Figures 1 and 2 show the efficiency of the proposed CCDL when the authentic dictionary is generated with the ideal format (47). In what follows, a randomly generated dictionary is considered.
4.1.2. For Random Dictionary
In this case, the matrix X, the authentic dictionary Dˇ, and the initial dictionary are all chosen randomly of proper dimensions without any correlation. Y is produced as Y=DˇX. Figures 3 and 4 depict the convergence performance of the loops indexed by k (with j=1) and j, respectively.
Convergence performance of loop indexed by k with random Dˇ.
Convergence performance of loop indexed by j with random Dˇ.
The phenomena observed in this case are similar to those in the previous one, except that the value of $e$ is larger than in the case with synthetic $\check{D}$. It should be pointed out that, for a random $\check{D} = \check{U} [\check{\Sigma} \; 0] \check{V}^T$,
\[ e \ge \|\check{\Sigma} - I_N\|_F^2 \tag{49} \]
always holds [25]. This explains why the minimum of $e$ cannot approach zero in this case.
4.2. Simulations with Synthetic Data
We now carry out experiments to illustrate the performance of dictionaries learnt with different approaches. For comparison, the algorithms of [11, 19, 20] are also performed. For convenience, the learning systems are denoted as Dictnew for the proposed CCDL and as Dictksvd [11], DictBP [19], and DictSDB [20] for the methods in the references just mentioned, respectively.
We generate two $N \times K$ dictionaries $D^{(0)}$ and $\check{D}$, both with normally distributed entries. $D^{(0)}$ is used as the initial condition for executing the different learning algorithms, and $\check{D}$ is the authentic dictionary. A set of $M$ $S$-sparse $K \times 1$ vectors $\{x_m\}_{m=1}^{M}$ is produced, where the nonzero elements of each $x_m$ are randomly positioned and drawn i.i.d. from a Gaussian distribution with zero mean and unit variance. With the authentic dictionary $\check{D}$, the set of signal vectors $\{y_m\}_{m=1}^{M}$ is generated by $y_m = \check{D} x_m$, $\forall m$, for training the dictionaries.
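The synthetic training-set generation just described can be sketched in a few lines of numpy (function and variable names are illustrative):

```python
import numpy as np

def sparse_training_set(D, M, S, rng):
    """Generate M training signals y_m = D x_m, each x_m S-sparse with
    randomly placed i.i.d. N(0, 1) nonzeros, as in the synthetic test."""
    N, K = D.shape
    X = np.zeros((K, M))
    for m in range(M):
        support = rng.choice(K, size=S, replace=False)  # random positions
        X[support, m] = rng.standard_normal(S)          # Gaussian values
    return D @ X, X
```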
Set $N = 20$, $K = 80$, $S = 5$, and $M = 5000$, and fix the number of dictionary learning iterations at $\mathrm{Iter1} = 100$ for all four methods. Besides, the number of iterative projections and rotations in [19] is set to 100, and the gradient descent in [20] is executed 100 times with step size equal to 0.1. For CCDL, the maximum iteration numbers for the loops indexed by $k$ and $j$ are fixed at 100 and 10, respectively.
The mutual coherence performances of the different dictionaries are compared and the results are shown in Figure 5. For DictBP, the horizontal axis refers to the constant $\eta$ which controls the search space $\mathcal{S}_\eta$ in (12), while for DictSDB the horizontal axis indicates the weighting factor $\gamma = 2^p$, with $p$ an integer varying within $[-10, 10]$. In order to have clear comparisons, some results beyond certain ranges have been omitted, mainly concerning DictBP with too small an $\eta$.
The mutual coherence performance. (a) Results of DictBP versus $\eta$. (b) Results of DictSDB versus $\gamma = 2^p$.
With synthetic data, we test the sparse representation abilities of the learnt dictionaries. The representation accuracy is usually quantified with the mean square error (MSE) defined as [11]
\[ \sigma_{mse} \triangleq \frac{1}{N \times M} \sum_{m=1}^{M} \|\hat{y}_m - y_m\|_2^2, \tag{50} \]
where $\hat{y}_m = \hat{D} \hat{x}_m$ is the reconstructed signal, $\hat{D}$ is the output dictionary of Dictnew, Dictksvd, DictBP, or DictSDB, and $\{\hat{x}_m\}_{m=1}^{M}$ are the corresponding coefficients of $\{y_m\}_{m=1}^{M}$ in the different dictionaries, calculated by OMP. Figure 6 depicts the MSE results of the different systems.
The MSE performance. (a) Results of DictBP versus $\eta$. (b) Results of DictSDB versus $\gamma = 2^p$.
Remark 7.
(i) As is known, for DictBP, when $\eta$ approaches 1, the system regresses to Dictksvd. The results in Figures 5 and 6 confirm this conclusion for the cases where $\eta = 1$. Besides, if too small an $\eta$ is chosen, the MSE performance of DictBP degenerates, though small mutual coherence values are achieved.
(ii) The results of DictSDB fluctuate considerably in both tests. Though some surprisingly good performance is achieved, this superiority is highly sensitive to the data. As will be seen in the next experiment, when musical audio signals are tested, the fluctuations of DictSDB become much milder.
(iii) Judging by the recovery accuracy, the indicator most crucial for evaluating the systems' performance (Figure 6), the results of Dictnew are superior to those of the other three methods in most cases.
4.3. Experiments with Musical Audio Signals
The effectiveness of all the algorithms is now evaluated on an audio signal coding task, which is popularly used for testing the performance of incoherent dictionaries [19, 20].
The audio signals are selected from the "testMusic16kHz" set of SMALLbox [26]. Following the operations in [19], we divide each recording into 50% overlapping blocks of 256 samples with rectangular windows and arrange the resulting time-domain signals as columns of the training data matrix $Y$. For each musical excerpt, the resulting $Y \in \mathbb{R}^{256\times 624}$; that is, $N = 256$ and $M = 624$. An overcomplete Gabor dictionary of size $256 \times 512$ is used as the initialization, which means $K = 512$. The sparsity level is fixed at $S = 12$ for all the tests. The iteration numbers are kept at the same settings as used for the synthetic data. When learning the dictionaries, the OMP algorithm is applied in the sparse coding stage.
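The blocking of the recordings can be reproduced with a few lines, assuming a 1-D numpy signal array (the function name is illustrative). For a 16 kHz excerpt of 80000 samples, 256-sample blocks with a 128-sample hop yield exactly the $256 \times 624$ matrix described above.

```python
import numpy as np

def overlapping_blocks(signal, block=256):
    """Cut a 1-D recording into 50%-overlapping, rectangular-windowed
    blocks of `block` samples, stacked as columns of Y."""
    hop = block // 2                                   # 50% overlap
    starts = range(0, len(signal) - block + 1, hop)
    return np.stack([signal[s:s + block] for s in starts], axis=1)
```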
The recovery accuracy is quantified with the signal-to-noise ratio (SNR) defined as [19]
\[ \sigma_{snr} \triangleq 20 \lg \frac{\|Y\|_F}{\|Y - D\hat{X}\|_F}, \tag{51} \]
with $D$ being the output dictionary of each learning method and $\hat{X} = \{\hat{x}_m\}_{m=1}^{M}$ being the sparse coefficient matrix corresponding to $Y = \{y_m\}_{m=1}^{M}$, calculated from
\[ \hat{x}_m \triangleq \arg\min_{x_m} \|x_m\|_1 \quad \text{s.t.} \ \|y_m - Dx_m\|_2^2 \le \epsilon, \ \forall m, \tag{52} \]
by the $l_1$-Homotopy algorithm [16] with $\epsilon = 0.001$.
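The SNR metric (51) translates directly into code; a small numpy helper with an illustrative name:

```python
import numpy as np

def recovery_snr(Y, D, X_hat):
    """SNR in dB as defined in (51):
    20 * log10(||Y||_F / ||Y - D X_hat||_F)."""
    return 20 * np.log10(np.linalg.norm(Y) / np.linalg.norm(Y - D @ X_hat))
```

A residual whose Frobenius norm is one tenth of $\|Y\|_F$ gives 20 dB, which is a handy sanity check.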
For all Dictnew, Dictksvd, DictBP, and DictSDB, we test each of the ten musical excerpts in “testMusic16kHz” set and keep the average results for comparison. The mutual coherence behavior for each of the learnt dictionaries is depicted in Figure 7.
The mutual coherence performance for each of the learnt dictionaries. (a) Results of DictBP versus η. (b) Results of DictSDB versus γ=2p.
When recovering with l1-Homotopy, the SNR performance is shown in Figure 8.
The recovery SNR performance with l1-Homotopy used for reconstruction. (a) Results of DictBP versus η. (b) Results of DictSDB versus γ=2p.
Remark 8.
In this case, where musical audio signals are tested and the $l_1$-Homotopy algorithm is applied for signal reconstruction, the fluctuations of DictSDB become milder for both the SNR and the mutual coherence performance versus the weighting factor $\gamma$. For DictBP, the relation between mutual coherence and SNR is more uniform; that is, smaller mutual coherence leads to a higher SNR value. The performance of Dictnew is superior to the others; note that these results are obtained by averaging over the ten musical excerpts.
5. Conclusion
In this paper, we have investigated the problem of learning an incoherent dictionary. The contributions are threefold. First, a novel model has been proposed for IDL that minimizes the sparse representation error on the training signals under a coherence constraint, imposed by making the Gram matrix of the dictionary approximate the identity matrix. Second, an alternating minimization algorithm named CCDL has been presented for solving the learning problem, and the solution for each component of the optimum dictionary has been derived analytically. Third, experiments on synthetic data and musical audio signals have been carried out to demonstrate the superiority of the proposed model and algorithm.
Competing Interests
The authors declare that there are no competing interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grants 61273195, 61473262, and 61503339, the Natural Science Foundation of Zhejiang Province under Grant LQ14F030008, and the Zhejiang Hua Yue Institute of Information and Data Processing.
References
[1] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[2] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, New York, NY, USA, 2010.
[3] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
[4] A. Li and H. Shouno, "Dictionary-based image denoising by fused-lasso atom selection," Mathematical Problems in Engineering, vol. 2014, Article ID 368602, 2014.
[5] Z. Jian, H. Luxi, J. Jian, and X. Yu, "A fast iterative pursuit algorithm in robust face recognition based on sparse representation," Mathematical Problems in Engineering, vol. 2014, Article ID 683494, 2014.
[6] X. Zhu and F. Kui, "Image recovery algorithm based on learned dictionary," Mathematical Problems in Engineering, vol. 2014, Article ID 964835, 2014.
[7] H. Bai, G. Li, S. Li, Q. Li, Q. Jiang, and L. Chang, "Alternating optimization of sensing matrix and sparsifying dictionary for compressed sensing," IEEE Transactions on Signal Processing, vol. 63, no. 6, pp. 1581–1594, 2015.
[8] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, pp. 337–365, 2000.
[9] I. Tosic and P. Frossard, "Dictionary learning: what is the right representation for my signals?" IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 27–38, 2011.
[10] K. Engan, S. O. Aase, and J. H. Hakon-Housoy, "Method of optimal directions for frame design," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 2443–2446, Phoenix, Ariz, USA, 1999.
[11] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[12] S. K. Sahoo and A. Makur, "Dictionary training for sparse representation as generalization of K-means clustering," IEEE Signal Processing Letters, vol. 20, no. 6, pp. 587–590, 2013.
[13] J. A. Tropp, "Greed is good: algorithmic results for sparse approximation," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231–2242, 2004.
[14] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[15] E. J. Candes and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, 2008.
[16] M. S. Asif and J. Romberg, "Sparse recovery of streaming signals using l1-homotopy," IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4209–4223, 2014.
[17] M. Yaghoobi, L. Daudet, and M. E. Davies, "Parametric dictionary design for sparse coding," IEEE Transactions on Signal Processing, vol. 57, no. 12, pp. 4800–4810, 2009.
[18] B. Mailhé, D. Barchiesi, and M. D. Plumbley, "INK-SVD: learning incoherent dictionaries for sparse representations," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 3573–3576, Kyoto, Japan, March 2012.
[19] D. Barchiesi and M. D. Plumbley, "Learning incoherent dictionaries for sparse approximation using iterative projections and rotations," IEEE Transactions on Signal Processing, vol. 61, no. 8, pp. 2055–2065, 2013.
[20] C. D. Sigg, T. Dikk, and J. M. Buhmann, "Learning dictionaries with bounded self-coherence," IEEE Signal Processing Letters, vol. 19, no. 12, pp. 861–864, 2012.
[21] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proceedings of the National Academy of Sciences, vol. 100, no. 5, pp. 2197–2202, 2003.
[22] T. Strohmer and R. W. Heath Jr., "Grassmannian frames with applications to coding and communication," Applied and Computational Harmonic Analysis, vol. 14, no. 3, pp. 257–275, 2003.
[23] V. Abolghasemi, S. Ferdowsi, and S. Sanei, "A gradient-based alternating minimization approach for optimization of the measurement matrix in compressive sensing," Signal Processing, vol. 92, no. 4, pp. 999–1009, 2012.
[24] G. Li, Z. Zhu, D. Yang, L. Chang, and H. Bai, "On projection matrix optimization for compressive sensing systems," IEEE Transactions on Signal Processing, vol. 61, no. 11, pp. 2887–2898, 2013.
[25] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd edition, Cambridge University Press, 2013.
[26] I. Damnjanovic, M. E. P. Davies, and M. D. Plumbley, "SMALLbox - an evaluation framework for sparse representations and dictionary learning algorithms," in Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA '10), pp. 418–425, St. Malo, France, 2010.