Customized Dictionary Learning for Subdatasets with Fine Granularity



Introduction
Sparse models are of great interest in machine learning and computer vision, owing to their applications in image denoising [1], face recognition [2][3][4], traffic sign recognition [5], visual-tactile fusion [6,7], and so forth. In sparse coding, samples or signals are represented as sparse linear combinations of the column vectors (called atoms) of a redundant dictionary. This dictionary can be predefined, such as the DCT bases or wavelets [8], or learned for a specific task or dataset of interest.
With sufficient samples, learning a specialized dictionary instead of using an "off-the-shelf" one has been shown to dramatically improve performance. Generally, the dictionary and the coefficients are estimated by minimizing the sum of least squared errors under a sparsity constraint. Batch algorithms such as MOD [9] and K-SVD [10] and nonparametric Bayesian methods [11] have shown state-of-the-art performance. Further, Mairal et al. [12] developed an online approach to handle large amounts of samples.
Recently, theoretical analysis of sparse dictionary learning has attracted much attention. Schnass [13] presented theoretical results on the dictionary identification problem. Sample complexity has been estimated in [14,15]. Gribonval et al. [16] analyzed the local minima of dictionary learning. Moreover, to extend its capacity, dictionary learning with specific motivations [17][18][19] has also attracted considerable interest. For instance, robust face recognition [3] is dedicated to a particular application, and Hawe et al. [20] require the dictionary to have a separable structure. While the learned dictionary has significant effects on a given dataset, attaining further specialized dictionaries for subdatasets with fine granularity is an interesting and useful concept as well. For instance, given a dictionary corresponding to facial images of all humans, we want to obtain a customized dictionary for each particular individual. However, in this case, standard dictionary learning approaches may be unwarranted or impractical: on one hand, samples for a particular individual (subject) are restricted and insufficient in most cases; on the other hand, even with enough data, learning so many dictionaries becomes inefficient in computation and storage. Further examples include customizing handwriting dictionaries to different styles, matching flower images to various species, or matching paper corpora to specific proceedings.
In terms of classification tasks, approaches such as Yang et al. [18] and Ma et al. [2] learn a structured dictionary consisting of K subdictionaries on behalf of K different subjects. However, they are often unfeasible: firstly, as a part of the global dictionary, the coding performance of a subdictionary is always worse than that of the global one. Secondly, the subdictionaries for the K subjects must be learned together, which becomes inflexible and demanding for a large K. Thirdly, once the global dictionary is obtained, specialization for a new (K + 1)-th subdataset would be impossible.
In this paper, we look for an effective, economical, and flexible dictionary customization approach, which is expected to have the following characteristics: (i) We specialize an existing global dictionary by utilizing auxiliary samples obtained from the target subdataset, valid for finer granularity and a small number of examples (hence fewer computations).
(ii) Compared with the global one, the customized dictionary has the same size but smaller reconstruction errors and better representation of the target subdataset.
(iii) The customization for each subdataset is independent; thus we can customize an arbitrary number of subdatasets or attain a particular one alone.
As depicted in Figure 1, we first observed that the corresponding dictionary atoms of the global and the particular subjects often look "similar." This is reasonable, as the dictionary atoms describe the sketches of the object, and the basic shapes of all the subjects are consistent. For a more rigorous theoretical analysis, we further considered dictionary identifiability [13] for mixed bounded signal models, that is, signals generated from more than one source (reference dictionary). We proved that if the reference dictionaries are close in the sense of the Frobenius norm, the global dictionary learned from the mixed signals is close to each of them. In fact, the global dictionary captures the common basic shapes of all the subdatasets, regarding characteristics of the subjects as noise and discarding them.
Thus, to formulate the dictionary customization problem, we introduce a regularizer penalizing the difference between the global and the customized dictionary. By minimizing the sum of the reconstruction error and the above regularizer under sparsity constraints, we exploit the characteristics of the target subdataset contained in the auxiliary samples while maintaining the basic shapes stored in the global dictionary. As a result, a better dictionary, close to the global one, is obtained. The solution is an asymptotically unbiased estimate of the underlying dictionary and can be seen as a trade-off between learning a new dictionary from data and using an existing one.
To minimize the objective function, we considered a general strategy similar to dictionary learning, that is, coding the samples and updating the atoms alternately in each iteration. Further, we present an algorithm that shares its core idea with K-SVD [10], which we call C-Ksvd. The flow chart of our methods is demonstrated in Figure 2. Experiments on tasks such as denoising and superresolution illustrate that our approach handles the customization problem effectively and efficiently, outperforming both the global dictionary and the normal dictionary learning approach. In addition, our model is also promising for further tasks, such as enhancing an insufficiently learned dictionary.

Notations
Throughout this paper, we write matrices as uppercase letters and vectors as lowercase letters. Given p > 0, the ℓ_p-norm of the vector v ∈ R^d is defined as ‖v‖_p ≜ (∑_{i=1}^{d} |v_i|^p)^{1/p}. In particular, the ℓ_0-norm ‖v‖_0 counts the nonzero entries of v. Let sign(v) denote the vector whose i-th entry [sign(v)]_i is equal to zero if v_i = 0 and to one (resp., minus one) if v_i > 0 (resp., v_i < 0).
The Frobenius norm of the matrix A is denoted ‖A‖_F ≜ (∑_j ‖a_j‖_2^2)^{1/2}, where a_j denotes the j-th column vector of A.
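As a quick sanity check on these notational conventions, the norms can be computed directly; a minimal NumPy sketch (the function names are ours, not from the paper):

```python
import numpy as np

def lp_norm(v, p):
    """l_p-norm of a vector v: (sum_i |v_i|^p)^(1/p), for p > 0."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def l0_norm(v):
    """l_0 'norm': the number of nonzero entries of v."""
    return int(np.count_nonzero(v))

def frobenius_norm(A):
    """Frobenius norm: the l_2 norm of the stacked column vectors of A."""
    return float(np.sqrt(sum(np.dot(a, a) for a in A.T)))

v = np.array([3.0, 0.0, -4.0])
# lp_norm(v, 2) = 5.0, l0_norm(v) = 2
```

Note that the Frobenius norm of a matrix coincides with the ℓ_2-norm of its entries viewed as one long vector, which is why it is the natural metric for comparing dictionaries atom by atom.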

Dictionary Learning with Mixed Signals
Dictionary identifiability [13], that is, recovering a reference dictionary that is assumed to generate the observed signals, is important for the interpretation of the learned atoms.
In particular, Gribonval et al. [16] proved that the loss function of dictionary learning admits a local minimum in the neighborhood of the dictionary generating the signals.
In this section, we consider the setting where there are multiple reference dictionaries and the signals generated from them are mixed. Further, we prove that if the reference dictionaries are close to each other in the sense of the Frobenius norm, dictionary learning with the mixed signals admits a local minimum near both reference dictionaries simultaneously.
Without loss of generality, we analyze the case of two signal sources S_1 and S_2. In particular, for the signal source S_i (i = 1 or 2), assume its signals y ∈ R^m are generated by the model y = D_i x + ε, where D_i ∈ R^{m×d} is the reference dictionary of S_i, x ∈ R^d is the coefficient, and ε ∈ R^m is the noise. In particular, the coefficient x is drawn on an index set J ⊂ {1, 2, . . . , d} such that x restricted to the complement of J is a zero vector and x_J is a random vector. Assume x and ε satisfy assumptions similar to those of [15], where we denote σ = sign(x). Assumption 1 (basic and bounded signal assumption). There exist random variables and bounds (on the coefficient energy, the minimum coefficient magnitude, and the norms of x and ε) such that the standard sparse signal conditions of [15] hold for both sources (i = 1, 2). Remark 2. Almost all sparse signal models, such as s-sparse Gaussians and Laplacians, satisfy the first five formulas, which can be seen as an abstraction and generalization of the basic sparse signal model.
Further, the additional assumptions that the signal is upper-bounded and lower-bounded are standard and mainly serve to keep the analysis simple and clear [15]. In practice, as digital data is gathered with sensors of limited dynamics and stored in float format with limited precision, the boundedness assumption seems reasonably relevant.
The index set J is called the support of x, and the sparsity s is defined as the number of elements in J. Thus the signal model is parameterized by the sparsity s, the expected coefficient energy E x^2, the minimum coefficient magnitude x_min, the maximum norm M_x, and the flatness κ_x ≜ E|x|/√(E x^2).
Note that these assumptions can easily be generalized to the multiple-source case, and thus we have the following definition. Definition 3 (mixed bounded signal source). A mixed signal source S_M is defined as the union of several signal sources S_1, S_2, S_3, . . .; that is, S_M = S_1 ∪ S_2 ∪ S_3 ∪ ⋅⋅⋅, where each source generates signals in the way described in (2). Further, if S_1, S_2, S_3, . . . satisfy the basic and bounded signal assumptions (3) simultaneously, we say that S_M is a mixed bounded signal source or satisfies a mixed bounded signal model.
Further, for the two-source case, assume D_1 and D_2 are close in the sense of the Frobenius distance; that is, there is a small δ ∈ R such that ‖D_1 − D_2‖_F ≤ δ. (As discussed in [15], a dictionary is invariant under sign flips and permutations of its atoms, and we simply assume the atoms have been aligned to attain the minimum distance.) Denoting by d_j the j-th column of D, the cumulative coherence of a dictionary D is defined as μ_s(D) ≜ max_{|J| ≤ s} max_{j ∉ J} ∑_{i ∈ J} |⟨d_i, d_j⟩|. The term μ_s(D) gives a measure of the level of correlation between columns of D. Moreover, the lower restricted isometry constant of a dictionary D, δ_s(D), is the smallest number satisfying ‖Dx‖_2^2 ≥ (1 − δ_s(D))‖x‖_2^2 for any s-sparse x ∈ R^d. Recall that, for a sample set Y = [y_1, . . . , y_n] ∈ R^{m×n}, the loss function of dictionary learning is F_Y(D) ≜ (1/n) ∑_{i=1}^{n} min_x (‖y_i − Dx‖_2^2 / 2 + λ g(x)), where g(x) is a penalty function promoting sparsity. Now consider a set of mixed signals Y = Y_1 ∪ Y_2, where Y_1 ⊂ S_1 and Y_2 ⊂ S_2; dictionary learning can then be formulated as min_{D ∈ 𝒟} F_Y(D), where 𝒟 denotes the set of dictionaries with unit ℓ_2-norm atoms. Further, we have the following asymptotic result.
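The cumulative coherence is straightforward to evaluate numerically: for each atom, sum the s largest absolute correlations with the other atoms and take the maximum. A small sketch, assuming unit-norm atoms and the standard definition used in [15] (the helper name is ours):

```python
import numpy as np

def cumulative_coherence(D, s):
    """mu_s(D): max over atoms d_j of the sum of the s largest
    absolute inner products |<d_i, d_j>|, i != j (unit-norm atoms)."""
    G = np.abs(D.T @ D)         # pairwise atom correlations
    np.fill_diagonal(G, 0.0)    # exclude self-correlation
    top_s = np.sort(G, axis=0)[-s:, :]  # s largest per column
    return float(top_s.sum(axis=0).max())

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)  # normalize atoms to unit l2-norm
```

In the analysis below, one would check conditions such as μ_s(D_i) ≤ 1/4 with this quantity; μ_1 is the plain mutual coherence.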
(i) The conditions μ_s(D_i) ≤ 1/4 and s ≤ d/(16(‖D_i‖_{2,2} + 1)^2) assume upper bounds on the correlation level between columns of D_i and on the sparsity s. This is common in the analysis of sparse learning [21]. (ii) The condition r_min < r_max would be satisfied for small s/d; the smaller s/d is, the larger r_max − r_min would be. (iii) The condition λ ≤ (1/4)x_min imposes an upper limit on the admissible regularization parameters. Note that limits on regularization parameters are also frequent [22].
(iv) In particular, the noiseless situation, that is, ε = 0, is a special case. Besides, the noise-dependent factor would be particularly close to one; for example, if the noise level is 30 dB, the factor equals 1.001.
(v) Consider r_min + δ < r < r_max; rewriting these conditions shows that they are satisfied for small δ, in line with the assumption that D_1 and D_2 are close.
To conclude, the assumptions hold for small cumulative coherence μ_s(D_i), sparsity s, noise level, dictionary distance δ, and chosen regularization parameter λ.
Remark 5. The radius r is lower-bounded in terms of r_min, δ, and s. When δ is fixed, if the sparsity s is particularly small, r_min will be very small as well, and the lower bound of r will be close to δ. When s is fixed and δ tends to zero, that is, the mixed signal model degenerates into the single-source case, the condition always holds and Theorem 4 degenerates into the case in [15], implying that the discussion in [15] can be seen as a special case of ours.
Moreover, the upper bound of r is implied to be less than 0.15, which can be concluded by a discussion similar to [15]. Remark 6. Theorem 4 can easily be generalized to the case of K (K > 2) sources by considering the loss over the union of all K sample sets, and the proof is similar.
Further, for a set of samples Y and two dictionaries D and D′, define the excess loss Δ_Y(D, D′) ≜ F_Y(D) − F_Y(D′). Note that 2Δ_Y(D, D′) = Δ_{Y_1}(D, D′) + Δ_{Y_2}(D, D′) for an equal mixture. When g(x) = ‖x‖_1, the function F_Y(D) is Lipschitz continuous with respect to the Frobenius metric on the compact constraint set 𝒟 ⊂ R^{m×d} [16]. Thus, by choosing a radius r such that Δ_Y(𝒯, D_1) > 0 on the sphere 𝒯, the compactness of the closed ball 𝒰 implies the existence of a local minimum D̂ of (18).
First note that, assuming μ_s(D_1) ≤ 1/4, s ≤ d/(16(‖D_1‖_{2,2} + 1)^2), λ ≤ (1/4)x_min, and a noise-to-signal ratio below (7/2)(r_max − δ), then, by the proof of Theorem 1 in [15], for any radius r ∈ (r_min, r_max) and any dictionary D with ‖D − D_1‖_F = r, a positive lower bound on the excess loss holds. Further, the bound E x^2/8 ⋅ s/d ⋅ (r − r_min) is monotonically increasing for r > r_min, and for D ∈ 𝒯 we have ‖D − D_1‖_F ≥ r − δ > r_min. Thus the first term Δ_{Y_1}(𝒯, D_1) is bounded below. For the second term Δ_{Y_2}(𝒯, D_2), we obtain the same lower bound similarly. Moreover, for the dictionary D_2 and any coefficient x with sparsity s, a corresponding bound holds. Then, by Theorem 2 and Lemma 6 in [16], when g(x) is the ℓ_1-norm, the losses with respect to D_1 and D_2 differ by an amount controlled by δ. Assume y is a sample in Y_2 with sparse coefficient x_0 and noise ε. As y = D_2 x_0 + ε, taking expectations on each side and using the assumptions in (2), the required bound follows. Combining (18), (20), and (26), as long as the stated conditions hold, E F_Y(D) admits a local minimum D̂ in 𝒰. The result is reasonable: when the reference dictionaries are similar, the dictionary learned from the mixed signals should be similar to each of them, in order to obtain smaller reconstruction errors for each subdataset and hence a lower total loss.

The Regularizer and Dictionary Customization Problem
Now we turn back to the dictionary customization problem.
In particular, the dataset S consists of several separable subdatasets S_1, S_2, S_3, . . .; that is, S = S_1 ∪ S_2 ∪ S_3 ∪ ⋅⋅⋅. Further, D_0 ∈ R^{m×d} (d ≫ m) is an existing global dictionary corresponding to S. This is common: a dictionary for facial images would typically be well trained, and the corresponding dataset can be divided by individual. We would then like to customize D_0 with some auxiliary samples Y = {y_i | y_i ∈ R^m}_{i=1}^{n} ⊂ S_1, requiring that the customized dictionary D has the same size but behaves better on S_1.
Obviously, D should yield sparse representations and small reconstruction errors on Y, which corresponds to minimizing ∑_{i=1}^{n} ‖y_i − D x_i‖_2^2 under the sparsity constraint ‖x_i‖_0 ≤ s. Further, note that S_1, S_2, S_3, . . . can be regarded as several signal sources and, hence, S as a mixed bounded model. Moreover, on account of the fine granularity, the differences between the subdatasets are small and their basic sketches are consistent, implying that the underlying dictionaries for all subdatasets are similar. Thus, by Theorem 4, D_0 should be close to our customized dictionary D as well, which is also in accordance with the practical observation. Considering the distance induced by the Frobenius norm, this leads directly to the regularizer ‖D − D_0‖_F.
Denote W = D − D_0; then the customization model can be formulated as the sum of the reconstruction errors and the above regularizer; that is, arg min_{W, X} ∑_{i=1}^{n} ‖y_i − (D_0 + W) x_i‖_2^2 + λ‖W‖_F^2 subject to ‖x_i‖_0 ≤ s, where x_i represents the i-th column vector of X, s ≪ d is the sparsity number, and λ ≥ 0 is the parameter balancing the prior knowledge in D_0 against the information in Y.
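The objective is easy to evaluate for a candidate pair (W, X); a minimal sketch (our notation: Y collects the auxiliary samples as columns, X the coefficient matrix) makes the roles of the two terms explicit:

```python
import numpy as np

def customization_loss(Y, D0, W, X, lam):
    """Reconstruction errors plus the Frobenius regularizer:
    sum_i ||y_i - (D0 + W) x_i||_2^2  +  lam * ||W||_F^2."""
    recon = np.linalg.norm(Y - (D0 + W) @ X, 'fro') ** 2
    return recon + lam * np.linalg.norm(W, 'fro') ** 2
```

A large λ makes any nonzero correction W expensive, pulling the solution back toward D_0, while λ = 0 leaves only the plain dictionary learning loss.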
It is worth noting that problem (29) is connected with the matrix version of total least squares (TLS) problems [23], which generalize least squares by assuming noise in both the dependent and independent variables. This is interpretable: as mentioned above, the atoms of the global dictionary capture only the main sketches; they regard the characteristics belonging to different subdatasets as noise and discard them. As a result, when considering a particular subdataset, the characteristic information is absent, and the corresponding atoms of D_0 can be seen as noisy. Unlike TLS, the tuning parameter λ is necessary, as the noise in D_0 and the noise in Y are of different natures and should be balanced. We further characterize model (29) with the following properties.

Theorem 7. Consider customization problem (29), where Y = {y_i}_{i=1}^{n} is the auxiliary data, D_0 is the global dictionary, D* is the true dictionary corresponding to the target subdataset, and D is the customized one attained from (29); then

(1) the objective value attained by D is no larger than that attained by D_0, with equality only when D = D_0;
(2) for a fixed λ, when n tends to infinity, D will converge to D*; in other words, the minimizer of (29) is an asymptotically unbiased estimator of D*;
(3) the tuning parameter λ reflects the confidence in D_0; in particular, if λ → ∞, then D = D_0; if λ = 0, (29) degrades into a common dictionary learning problem.
Proof. For (1), as D is the optimal solution of problem (29), its objective value cannot exceed that of D_0, and equality holds only when D = D_0. For (2), rescale the loss function by 1/n; when n tends to infinity, the penalty term λ‖W‖_F^2/n tends to zero, and the loss function degenerates into the common dictionary learning form, whose minimizer converges to D*. For (3), it is easy to see that λ reflects the weight of the penalty in the loss function, and the conclusion follows.
According to the third property of Theorem 7, customization can be seen as a trade-off between learning a dictionary and using an existing one, which fills the gap between the two and implies a more flexible dictionary selection strategy: for datasets with coarse granularity, use dictionary learning with large amounts of samples; for subjects with fine granularity, customize the existing dictionary with some auxiliary samples; and use a predefined dictionary if no sample is available.
We also emphasize that our model (29) remains valid as long as the assumption ‖D_0 − D*‖_F ≤ δ is satisfied. As demonstrated in the experiments, this enables further applications, such as improving an insufficiently learned dictionary or correcting a contaminated one. In addition, other matrix norms can be selected for the regularizer. For example, with the distance induced by the matrix ℓ_1-norm, W will be sparse, and the costs of storage and transmission will be greatly reduced.

Optimization
In this section, we first introduce a general optimization strategy and then devise a more straightforward dictionary updating strategy similar to K-SVD [10].

A General Strategy.
A general optimization strategy, not necessarily leading to a global optimum, can be found by splitting the problem into two parts which are alternately solved within an iterative loop: (i) a sparse coding stage, computing the coefficients X with the dictionary D = D_0 + W fixed, and (ii) a dictionary updating stage, updating the correction matrix W with the coefficients X frozen.
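A minimal sketch of this alternating scheme, assuming OMP for the coding stage and, for the updating stage, the ridge-regression closed form that minimizes the objective over W with X frozen, W = (Y − D_0 X) X^T (X X^T + λI)^{-1} (the function names are ours, and atom renormalization is omitted for brevity):

```python
import numpy as np

def omp(D, y, s):
    """Greedy orthogonal matching pursuit: select at most s atoms."""
    x = np.zeros(D.shape[1])
    support, residual = [], y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

def customize(Y, D0, s, lam, iters=5):
    """Alternate (i) sparse coding with D = D0 + W fixed and
    (ii) the closed-form ridge update of W with X frozen."""
    W = np.zeros_like(D0)
    for _ in range(iters):
        D = D0 + W
        X = np.column_stack([omp(D, y, s) for y in Y.T])
        G = X @ X.T + lam * np.eye(X.shape[0])
        W = (Y - D0 @ X) @ X.T @ np.linalg.inv(G)
    return D0 + W
```

With the coefficients frozen, the W subproblem is an ordinary ridge regression on the residual Y − D_0 X, which is why it has an exact solution; the C-Ksvd strategy below relaxes exactly this restriction.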

C-Ksvd Algorithm.
We now turn to a more involved dictionary updating strategy: rather than freezing the coefficient matrix X, we update D = D_0 + W together with the nonzero coefficients (i.e., only the support is fixed).
In particular, assume that both W and X are fixed except for one column w_j in the correction matrix W and the coefficients that correspond to it, the j-th row in X, denoted x_j^T. Then the loss function can be rewritten, restricted to the samples that use the j-th atom, as ‖E_j^R − (d_{0,j} + w_j) x_j^R‖_F^2 + λ‖w_j‖_2^2. The unique solution of problem (37) admits a closed form: substituting the optimal coefficients back into (37) turns it into a least squares problem, and the main computation is the top SVD of [√λ d_{0,j}, E_j^R]. In the dictionary updating stage, we can perform the minimization with respect to each column w_j (for simplicity, we directly use the atom d_j = d_{0,j} + w_j in the update) and the corresponding row x_j^T in sequence, keeping the support of the coefficients fixed. The complete algorithm, named "C-Ksvd," is described as Algorithm 9. Note that while K-SVD computes the top SVD of the error matrix alone, C-Ksvd computes that of the augmented matrix [√λ d_{0,j}, E_j^R]. Assuming that the sparse coding stage is performed perfectly, a local minimum is guaranteed, as the loss function is nonincreasing at the update step for w_j, and a series of such steps ensures a monotonic reduction. Compared with the general strategy, the update of w_j is more direct, as it allows tuning the values of the corresponding coefficients. In addition, each atom can have a unique parameter λ_j as well, representing the confidence level of the j-th atom d_{0,j}.
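The per-atom subproblem is min over (d_j, x_j^R) of ‖E_j^R − d_j (x_j^R)^T‖_F^2 + λ‖d_j − d_{0,j}‖_2^2. The paper solves it via the top SVD of the augmented matrix; as an illustrative alternative (not the paper's closed-form solver), the same subproblem can be driven downhill by alternating the two exact coordinate minimizers, sketched below with our own helper name:

```python
import numpy as np

def update_atom(E, d0, x, lam, steps=5):
    """Refine one atom d (= d0 + w) and its coefficient row x to reduce
    ||E - d x^T||_F^2 + lam * ||d - d0||_2^2. Each step is the exact
    single-variable minimizer, so the loss is monotonically nonincreasing."""
    d = d0.copy()
    for _ in range(steps):
        d = (E @ x + lam * d0) / (x @ x + lam)  # optimal d for fixed x
        x = (d @ E) / (d @ d)                   # optimal x for fixed d
    return d, x
```

A large λ keeps d_j glued to the global atom d_{0,j}, while λ = 0 recovers the unregularized rank-one update behavior of K-SVD.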
Output: the customized dictionary D = D_0 + W.

Experiments
We first showed the effectiveness of our approach on the denoising task, with an analysis of the customized dictionary and the tuning parameter λ. Further, a novel superresolution experiment was illustrated, sharing the idea of transferring knowledge from a related auxiliary data source. In addition, we conducted an experiment that enhances an insufficiently learned dictionary by C-Ksvd, illustrating that our model is also valid for further tasks.
6.1. Denoising. We demonstrated the customization results by denoising tasks on facial images drawn from the PIE database [28]. The denoising process was similar to [1], which included sparse coding of each patch of the noisy image. As the coding performance relies heavily on the dictionary, we could assess the dictionary by the denoising results, evaluated by PSNR (peak signal-to-noise ratio).
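PSNR compares a reconstructed image with the clean ground truth through the mean squared error; a minimal sketch for 8-bit images (peak value 255):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Higher is better; identical images give an infinite PSNR, so the function assumes the two inputs differ.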
In particular, the noisy images were produced by adding Gaussian noise with mean zero and different standard deviations σ. The patch size and the redundancy factor were set to 16 × 16 and 4. (We chose them for the best visual effect, while similar comparisons can be obtained for different values.) OMP was used for coding, and atoms were accumulated until the average error passed the threshold, chosen empirically as ε = 1.15 ⋅ σ. Results corresponding to three dictionaries were compared, that is, the global dictionary D_0, the one generated by K-SVD, and the one produced by our customization approach. In K-SVD and customization, D_0 was used as initialization, and the number of iterations was set to 10. Moreover, three kinds of D_0 were considered, denoted as "global I," "global II," and "DCT," respectively: (1) a dictionary learned by K-SVD with 40,000 noiseless patches picked from 100 individuals; (2) similar to (1), but learned with noisy patches (σ = 20); (3) the predefined DCT (discrete cosine transform) dictionary.
Each experiment was repeated 5 times, and the results are reported in Table 1 and Figure 3. Customization outperformed the global dictionary and K-SVD on both PSNR and visual effects, owing to the fact that both the common sketches in D_0 and the characteristics in Y had been utilized. In particular, note that denoising by D_0 tended to be too smooth, while results by K-SVD were likely to be too rough. Regarding DCT as a suboptimal global dictionary, the results also showed that our customization is valid for a wide range of D_0. Conducted on an i7-3770 CPU and processed with the same dataset Y, the average running times for K-SVD and customization were 173.34 s and 48.21 s, respectively, showing that our approach is competitive. In particular, for K-SVD, 119.31 s were spent removing identical atoms. We also display the three dictionaries as images in Figure 4, showing that the customized one was similar to the global one, while the one corresponding to K-SVD was not.
In addition, we plotted the relations between the tuning parameter λ, the average number (AN) of coefficients per patch, and the PSNR after denoising in Figure 5. It was shown that λ could be chosen as the value attaining the minimum average number of coefficients by a quick one-dimensional search. What is more, experimentally, for a fixed D_0, the best λ for different individuals was the same, which implies we only need to tune it once while customizing. 6.2. Superresolution. Yang et al. [29] proposed a scale-up algorithm via sparse signal representation, which contains two steps: dictionary learning and patch-pair construction. To reduce the dimension and speed up processing, Elad [30] applied PCA to the samples and used K-SVD for training. However, this learned dictionary is still a global dictionary, which means that we can further improve the performance of superresolution by customization.
Consider a global dictionary D_0 and patches Y = {y_i}_{i=1}^{n} sampled from related high-resolution images. We can customize this dictionary to a finer granularity. In particular, substituting for K-SVD, the low-resolution dictionary D_l and the coefficients X were customized by C-Ksvd, and the corresponding high-resolution dictionary D_h was attained from them, where D_h^0 and D_l^0 denote the initial high-resolution and low-resolution dictionaries, respectively. In this experiment, similar to the settings in [31], we evaluated the proposed approach on the Yale Face Database [32], which contains 11 different 100 × 100 facial images for each of the 15 individuals. A downscaled image was taken as the low-resolution object, with the downscale factor set to 3. Further, the other images of the same individual were considered as high-resolution auxiliary data. The patch size was set to 3 × 3. The global dictionary D_0 was trained on 34,650 patches sampled from 80% of the downscaled total dataset. (This is to highlight the relevance of the auxiliary data and simulate real conditions, as the total training set is relatively small and clean in our experiment.) Results produced by D_0, K-SVD (i.e., the original version with D_0 as initialization and Y as training data), and customization were compared. For customization, 225 patches were taken from each auxiliary image. For K-SVD, the total number of sampled patches was fixed at 6,000, to obtain its best results.
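Once the low-resolution coefficients have been customized, the paired high-resolution dictionary can be fit to the high-resolution patches with those coefficients held fixed. A hedged sketch of the least-squares construction commonly used in the Zeyde/Elad pipeline this paragraph builds on; we assume the simple form D_h = Y_h X^T (X X^T)^{-1}, which may differ in detail from the paper's exact formula:

```python
import numpy as np

def high_res_dictionary(Yh, X):
    """Least-squares fit of the high-resolution dictionary to the
    high-res patches Yh given fixed low-res coefficients X:
    Dh = argmin_D ||Yh - D X||_F^2 = Yh X^T (X X^T)^(-1)."""
    return Yh @ X.T @ np.linalg.pinv(X @ X.T)
```

Each low-resolution atom is thereby paired with a high-resolution atom that is coded by the same sparse coefficients.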
Varying the number of auxiliary images and repeating the experiments on different individuals, we evaluated the performance by PSNR. Some of the results are summarized in Table 2. "DL" and "Cus" represent K-SVD and customization, respectively, and "3," "6," and "9" denote the number of auxiliary images. "Bicubic," that is, simple bicubic interpolation, is shown as a baseline method.
It is seen that if the number of auxiliary images is small, the results produced by K-SVD are worse than those of the common dictionary, implying that the learning is meaningless. However, even when the auxiliary data is small (675 patches from 3 images), superresolution by customization shows significant improvements. Further, customization still outperforms, or is no worse than, the learning approach when new data is added. Recall that the number of patches K-SVD needs is much larger than that needed for customization, meaning more computation and time are required. Also note that once the dictionary has been customized, it is valid for all images of the person.

Enhancing.
As mentioned above, model (29) can be applied to further tasks, as long as the assumption that D_0 and the reference dictionary D* are close holds. In this subsection, we consider the case of enhancing an existing dictionary by C-Ksvd and evaluate the performance on classification.
In particular, LC-KSVD [33], one of the state-of-the-art methods for image classification, introduced a triple model (D, A, W), where D represents the dictionary, A stands for the parameters of the label-consistency term, and W denotes the linear classifier. Regarding the stacked matrix (D^T, √α A^T, √β W^T)^T as a new dictionary, the corresponding objective function can be solved by K-SVD. Sometimes the learned model M_0 = (D_0, A_0, W_0) is not good enough, due to the fact that the training data may be insufficient or too noisy; moreover, over time the past training information often becomes unavailable. In this case, we can further enhance it by our customization model, simply replacing the K-SVD procedure with C-Ksvd. In accordance with [33], we used the Extended Yale-B dataset [34] to demonstrate the performance, and the data were divided into three parts: training data to obtain the initial model, auxiliary data for model enhancement, and test data to evaluate it. The parameters α and β were tuned while training the initial models and then kept fixed. The initial model M_0, LC-KSVD, and C-Ksvd were compared, where LC-KSVD used M_0 as initialization and Y as training data.
Results were analyzed in three ways.
(1) Initial models of different levels were obtained by varying the number of training samples, and we attempted to improve each model with 800 auxiliary images. After repeating the experiment 5 times for each level, the averaged recognition accuracies are summarized in Table 3.
It is seen that C-Ksvd is valid over a wide range of initial models and always significantly outperforms LC-KSVD. Besides, the influence of the initial models on LC-KSVD is relatively small, in accordance with our previous analysis.
(2) For fixed initial models, varying the number of auxiliary images from 100 to 1100, we plotted the corresponding recognition results in Figure 6 and found that the accuracy increased significantly, even when the number of auxiliary images was relatively small. To gain a competitive result with LC-KSVD, large amounts of images were required, which was unaffordable. (3) In the previous discussion, the auxiliary images were uniformly sampled from all 38 individuals. We then considered the nonuniform case where only images of several classes (named "enhanced classes") were available. Setting the number of enhanced classes to 19 and taking 31 images from each class, we report the results in Table 4.
While C-Ksvd improved the accuracy on the enhanced classes, the accuracy on the remaining classes was only slightly reduced, owing to the similarity of the original and new dictionaries. LC-KSVD presented a sharp contrast.

Conclusion
In this paper, we considered the dictionary customization problem, which can be seen as a trade-off between learning a new dictionary from data and using an existing one. We investigated our hypothesis with theoretical analysis and formulated a model by introducing a specific regularizer. An efficient algorithm was proposed, and experiments on real-world data demonstrated that our approach is promising.

Figure 1: Three sorted dictionaries, corresponding to two individuals and the global one, are demonstrated as images. Each has a size of 64 × 256. The dictionaries for individuals 1 and 2 are trained from 40,000 patches picked from 24 of their corresponding facial images. The global one is trained from 80,000 patches sampled from 200 facial images belonging to 50 different individuals.

Figure 6 :
Figure 6: Varying the number of auxiliary images with fixed initial models.
Here E_j represents the error when the j-th dictionary atom is removed, E_j = Y − ∑_{i ≠ j} d_i x_i^T. Now we shrink the loss function to the support of the row vector x_j^T. Define ω_j as the group of indices pointing to the samples {y_i} that use the atom d_j = d_{0,j} + w_j; that is, ω_j = {i | 1 ≤ i ≤ n, x_j^T(i) ≠ 0}. Further, define Ω_j ∈ R^{n×|ω_j|} as ones on the (ω_j(i), i)-th entries and zeros elsewhere.
Algorithm 9 (C-Ksvd). (i) Sparse coding stage: use any sparse recovery algorithm to compute the coefficient x_i for each sample y_i by approximating the solution of min_{x_i} ‖y_i − (D_0 + W) x_i‖_2^2 subject to the sparsity constraint. (ii) Dictionary updating stage: for each column j = 1, 2, . . . , d in W, update it by: (a) compute E_j by E_j = Y − ∑_{i ≠ j} d_i x_i^T; (b) define the group of samples that use this atom as ω_j, and restrict E_j and x_j^T by choosing the columns corresponding to ω_j, obtaining E_j^R and x_j^R; (c) apply the top SVD decomposition to [√λ d_{0,j}, E_j^R] and update d_j and x_j^R accordingly.

Table 1 :
Denoising results (PSNR, dB) on facial images of different individuals with noise level σ = 30. For each image, three kinds of D_0 were considered.

Table 2 :
PSNR for superresolution on test images.

Table 3 :
Accuracy (%) for different level initial and fixed number auxiliary images.

Table 4 :
Accuracy (%) on enhanced classes, remainder classes, and all classes.