A Weighted Block Dictionary Learning Algorithm for Classification

Discriminative dictionary learning, which plays a critical role in sparse representation based classification, has led to state-of-the-art classification results. Among the existing discriminative dictionary learning methods, two different approaches have been studied: the shared dictionary and the class-specific dictionary, which associate each dictionary atom with all classes or with a single class, respectively. The shared dictionary is compact but lacks discriminative information; the class-specific dictionary contains discriminative information but includes redundant atoms across the different class dictionaries. To combine the advantages of both methods, we propose a new weighted block dictionary learning method. This method introduces a proto dictionary and class dictionaries. The proto dictionary is a base dictionary without label information. Each class dictionary is a class-specific dictionary, formed as a weighted proto dictionary. The weight value indicates the contribution of each proto dictionary block when constructing a class dictionary. These weight values can be computed conveniently as they are designed to adapt to the sparse coefficients. Different class dictionaries have different weight vectors but share the same proto dictionary, which results in higher discriminative power and lower redundancy. Experimental results demonstrate that the proposed algorithm achieves better classification results than several existing dictionary learning algorithms.


Introduction
Recently, sparse representation based classification methods have been extensively discussed, with encouraging results. In these methods, choosing a proper dictionary is the first and most important step. In the literature, there are two ways to design a dictionary: prespecified and adaptive. At the early stage, for the sake of simplicity, predetermined dictionaries (e.g., overcomplete DCT or wavelet dictionaries) were often used. Later on, dictionaries learned from training data [1] attracted more attention, because learned dictionaries usually lead to better representations and have achieved considerable success in applications such as classification.
In the last decades, many well-known dictionary learning approaches have been proposed. These approaches can be divided into two categories: unsupervised dictionary learning (UDL) approaches [2] and supervised dictionary learning (SDL) approaches [3-7]. UDL learns a dictionary from unlabeled training samples, whereas SDL learns a dictionary from labeled training samples. The K-SVD algorithm [2] is a popular UDL algorithm that learns a compact dictionary from a set of unlabeled samples by singular value decomposition. It has been widely applied to image processing tasks, such as image compression [8,9], image restoration [10,11], image deblurring [12,13], super-resolution [14,15], and visual tracking [16,17]. K-SVD mainly focuses on the representational power of sparse representation but ignores its discriminative power, which is critical for pattern classification. Representational power is the capability to sparsely reconstruct a sample from the sparse coefficients and the dictionary. Discriminative power is the capability of the sparse coefficients to be well distinguished when they belong to different categories, which is what allows these sparse coefficients to be used to classify the samples.
Depending on whether the training samples have been labeled, current dictionary learning approaches can be divided into two types: UDL approaches and SDL approaches. Depending on whether the atoms have been labeled, however, current dictionary learning approaches can also be divided into two main types: shared dictionary learning approaches [1, 19-23] and class-specific dictionary learning approaches [3-7]. In shared dictionary learning approaches, the atoms carry no label information and are shared by samples from all classes. Shared dictionary learning approaches can be UDL approaches, but they can also be SDL approaches. For example, the K-SVD algorithm [2] is a shared and unsupervised dictionary learning approach, whereas the D-KSVD algorithm [20], an extension of K-SVD, is a shared and supervised dictionary learning approach. D-KSVD learns a discriminative dictionary by incorporating the linear classification error into the objective function. In class-specific dictionary learning approaches, all atoms are labeled, and each atom can only be shared by samples of the same class. Class-specific dictionary learning approaches are therefore necessarily SDL approaches. With a class-specific dictionary, class-specific reconstruction errors can be utilized to classify samples. Moreover, discriminant criteria can be incorporated into the dictionary learning process. For example, Zhang et al. presented a low rank constraint [24], Yang et al. added a Fisher discrimination criterion [5], and Ramirez et al. proposed a structured incoherence constraint [4]. However, when there are a large number of classes, the learned dictionary becomes very large, and its redundancy can become serious. Recently, some hybrid methods which combine a shared dictionary and a class-specific dictionary have been proposed [25-27]. In these methods, the shared and class-specific parts need to be predefined, and balancing the two parts is not a trivial task; the balance is usually determined empirically.
Although the above-mentioned dictionary learning methods have achieved good classification results, the labels of their dictionary atoms are predefined and fixed, which may not accurately reflect the true structure of the data. Yang et al. [28] proposed a latent dictionary learning (LDL) method. LDL learns a latent matrix to build the relationship between dictionary atoms and class labels; this mechanism has achieved very high classification accuracy.
In this paper, we propose a new dictionary learning method, named the weighted block dictionary learning (WBDL) method. This method is a compromise between the shared dictionary and the class-specific dictionary. As shown in Figure 1(a), WBDL learns a proto dictionary which is shared by all class dictionaries. The proto dictionary $D$ contains $B$ blocks. Assuming that the training samples come from $C$ classes, the model learns $C$ subdictionaries, $D_1, D_2, \ldots, D_C$, where $D_c$ is the class-specific dictionary corresponding to class $c$. Each class dictionary is obtained by multiplying the proto dictionary with the corresponding weight vector. Each value of the weight vector indicates the contribution of one block when constructing the class dictionary. A class dictionary is a class-specific dictionary which represents samples coming from the same class. The sparse coefficients $Z_c$ are obtained by sparsely representing the samples of class $c$ over the class dictionary. As shown in Figure 1(b), a new test sample $x$ is represented over each class dictionary, which gives $C$ sparse coefficients, $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_C$. Those sparse coefficients can be utilized to classify the test sample. In the WBDL model, instead of predefining each block as belonging to a single class, each block of the proto dictionary can belong to all classes. The shared dictionary [1, 19-23] can be regarded as a special case of our WBDL model in which the weight matrix is an all-one matrix. The class-specific dictionary [3-7] can also be regarded as a special case of our WBDL model in which each weight vector has a single nonzero element equal to 1. Compared to the shared dictionary and the class-specific dictionary, our proposed model is more flexible, and it increases discriminability and reduces redundancy simultaneously.
Our specific contributions are listed below.
Firstly, for higher discriminative power and lower redundancy, we design a proto dictionary and several class dictionaries, where each class dictionary is a weighted proto dictionary. Our goal is to learn a compact proto dictionary and discriminative class dictionaries. A class dictionary can represent samples sparsely and discriminatively.
Secondly, the sparse coefficients obtained by sparse representation can be utilized for classification. Two classification algorithms are proposed: a local classification algorithm and a global classification algorithm. When the training samples for each class are sufficient, test samples are locally coded over all class dictionaries. Otherwise, test samples are globally coded over the total dictionary. The global classification algorithm is a simplification of the local classification algorithm.
Thirdly, the weight vector corresponding to each class dictionary is easy to learn, as it adapts to the sparse coefficients of the samples coming from the same category. These weights can be computed from the sparse coefficients directly. Compared to traditional dictionary learning algorithms, the WBDL algorithm does not significantly increase computational complexity. Experiments on several databases show that the WBDL algorithm is competitive with algorithms such as [2,3,5,20,23,29].
This paper is organized as follows. In Section 2, we review the related work, including the shared dictionary and the class-specific dictionary. In Section 3, the weighted block dictionary learning model is proposed and analyzed. Two WBDL classification approaches are also proposed in this section. In Section 4, the optimization of the WBDL model is described and its two classification algorithms are given. In Section 5, experiments are performed on face recognition and object classification datasets to compare our algorithm with several state-of-the-art methods. We end this paper with a conclusion in Section 6.

Related Work
In this section, we review two types of dictionaries: the shared dictionary and the class-specific dictionary.
2.1. Shared Dictionary. The K-SVD algorithm is a popular UDL algorithm, which learns a shared dictionary. K-SVD optimizes the following objective function:
$$\min_{D,Z} \|X - DZ\|_F^2 \quad \text{s.t.} \quad \|z_i\|_0 \le T \ \text{for all } i, \qquad (1)$$
where $X$ is the matrix of training samples, $D$ is the dictionary, $Z$ is the sparse coefficient matrix with columns $z_i$, and $T$ is the sparsity level. The minimization of (1) is solved by a two-step iterative algorithm. Firstly, the dictionary $D$ is fixed and the sparse coefficients $Z$ are found. This is a sparse coding problem, which can be solved by OMP [30], among other methods. Secondly, the sparse coefficient matrix $Z$ is fixed and the dictionary $D$ is updated one atom at a time while fixing all other atoms in $D$.
Compared to the K-SVD algorithm, the D-KSVD algorithm [20] adds a second term, the linear classification error, to the objective function. The D-KSVD dictionary learning method utilizes the class labels of the training samples, and its dictionary is more discriminative. However, the class labels of the atoms are not taken into account in the D-KSVD algorithm.
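The sparse coding step of the two-step scheme can be illustrated with a minimal NumPy sketch of orthogonal matching pursuit. This is a textbook form of OMP, not the implementation of [30] or of the authors; the function name `omp` and its arguments are chosen here for illustration only.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy OMP sketch: approximate x using at most n_nonzero columns of D."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []                      # indices of the selected atoms
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:              # no new atom can be added
            break
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z
```

In a K-SVD style loop, this routine would be called once per training sample while the dictionary is held fixed.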
2.2. Class-Specific Dictionary. A class-specific dictionary has to be learned using an SDL algorithm. Suppose that there are $C$ classes of samples; a class-specific dictionary is denoted as $D = [D_1, D_2, \ldots, D_C]$, where each $D_c$ is a subdictionary corresponding to class $c$. All atoms of the dictionary are labeled, and the subdictionaries are learned or constructed class by class.
The sparse representation based classification (SRC) [3] method is a popular method to construct a class-specific dictionary. Suppose that there are $C$ classes of samples; SRC uses the training samples of each class directly as the atoms of the corresponding subdictionary and classifies a test sample by its class-specific reconstruction error. Equation (4) can be regarded as a basic model of the class-specific dictionary learning method. This model does not consider the relationship between subdictionaries. Yang et al. proposed a Fisher discrimination dictionary learning method (FDDL) [5], which learns a subdictionary for each class. The FDDL objective consists of four terms: the first term is the data fidelity term, the second term is the sparsity penalty, the third term is the discrimination term, and the fourth term makes the function smooth and convex. The FDDL model makes the class-specific dictionary more distinctive.
In this paper, we integrate the shared dictionary and the class-specific dictionary into a new dictionary learning model, the WBDL model.
The block structure of the WBDL model is ensured by a mixed $\ell_{2,1}$ norm regularization. Most of the aforementioned methods simply adopt the $\ell_0$ norm or the $\ell_1$ norm for sparsity regularization. $\ell_1$ norm sparsity regularization is also referred to as the Lasso [31]. Inspired by the success of structured sparsity (Group Lasso) in the area of compressed sensing, several methods have been proposed for structured sparsity regularization. For example, Bengio et al. proposed group sparse coding (GSC) [21], which puts the training samples of each category into the same group and regularizes with the mixed $\ell_{2,1}$ norm. This method encourages samples of the same group to be encoded using the same dictionary atoms. Elhamifar and Vidal proposed block sparse coding (BSC) [32], which also uses the mixed $\ell_{2,1}$ norm for regularization, but applies it to a sparse coefficient vector rather than to a sparse coefficient matrix. The block sparsity regularization encourages a block structure in the learned dictionary. Chi et al. proposed block and group regularized sparse coding (BGSC) [33], which combines group sparse coding and block sparse coding. As these methods show, the mixed $\ell_{2,1}$ norm is a suitable tool for learning the block structure of a proto dictionary.
Generally, a dictionary learning model has two unknown variables, the dictionary and the sparse coefficients; WBDL introduces a new variable, the weight vector. We find that when a dictionary block is more similar to the $c$th class samples than the other dictionary blocks are, this block is more suitable for representing the $c$th class samples. Consequently, the sparse coefficients corresponding to this block are relatively larger than the others. Inspired by this observation, the weight vector corresponding to the $c$th class dictionary can be obtained from the sparse coefficients of the $c$th class samples. In order to avoid increasing computational complexity, the weight vector is constructed from the sparse coefficients directly in the WBDL model.

Weighted Block Dictionary Learning
Shared dictionary learning algorithms ignore the class labels of the dictionary atoms. Recently, various class-specific dictionary learning approaches [3-7] have been proposed. These approaches are based on the assumption that the class label of each atom does not change during the dictionary learning process. However, since the dictionary atoms are updated, the class labels of these atoms should be reassigned in accordance with the updates. The goal of our proposed weighted block dictionary learning model is to learn a labeled adaptive dictionary, which is composed of a proto dictionary and a weight matrix. Each column of the weight matrix is a weight vector that indicates the contribution of each proto dictionary block to the construction of a class dictionary. As a result, a class dictionary is obtained as the product of the proto dictionary and a weight vector. In this section, firstly, we propose the weighted block dictionary learning model. Secondly, we discuss the construction of the weight matrix. Thirdly, we compare our WBDL model with the BDL model. Finally, two classification approaches are proposed using the WBDL model.

Weighted Block Dictionary Learning
Let $X = [X_1, X_2, \ldots, X_C]$ denote the training samples, where $X_c$ is the subset associated with class $c$. We design a proto dictionary $D = [D_{[1]}, D_{[2]}, \ldots, D_{[B]}] = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{n \times K}$, where $D_{[r]}$ is the $r$th block of the proto dictionary, $B$ is the number of blocks, and $K$ is the total number of dictionary atoms. To better describe the relationship between the proto dictionary and the $C$ class dictionaries, a weight matrix $U = [u_1, u_2, \ldots, u_C] \in \mathbb{R}^{B \times C}$ is introduced into our WBDL model, where $u_c = [u_{1,c}, u_{2,c}, \ldots, u_{B,c}]^T \in \mathbb{R}^{B \times 1}$ is a vector that indicates the contribution of each proto dictionary block when constructing the $c$th class dictionary. For instance, $u_{r,c}$ is the weight value of the $r$th proto dictionary block when constructing the $c$th class dictionary. Correspondingly, the $c$th class dictionary is a weighted proto dictionary, and the WBDL model is formulated as
$$\min_{D,U,Z} \sum_{c=1}^{C} \sum_{i=1}^{N_c} \Big( \|x_i^c - D_c z_i^c\|_2^2 + \lambda \sum_{r=1}^{B} \|z_{i[r]}^c\|_2 \Big), \qquad (6)$$
where $D$ is the proto dictionary, $D_c = D\,\mathrm{diag}(u_c \to \tilde{u}_c)$ is the $c$th class dictionary (the block weight vector $u_c$ is expanded to an atom-level vector $\tilde{u}_c \in \mathbb{R}^{K}$ by repeating $u_{r,c}$ for every atom of block $r$), $x_i^c$ is the $i$th of the $N_c$ samples of class $c$, and $z_{i[r]}^c$ is the $r$th block sparse coefficient of the $i$th sample. The first term denotes the reconstruction error of all $c$th category samples. The second term is the block sparse regularization of all $c$th category samples. $\lambda$ is a scalar controlling the trade-off between reconstruction and sparsity. In order to avoid a trivial solution for the sparse coefficients, the norm of each dictionary atom is constrained.
As shown in (6), the WBDL model is a nonconvex optimization problem, in which three unknown variables $D$, $Z$, and $U$ need to be optimized. We propose a two-step iterative algorithm to solve this problem. In the first step, the weight matrix $U$ is fixed and the coefficient matrix $Z$ and dictionary $D$ are learned, which is a general dictionary learning problem. In the second step, the coefficient matrix $Z$ and dictionary $D$ are fixed and the weight matrix $U$ is constructed. Construction of the weight matrix $U$ is crucial for this new dictionary learning model.
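The construction of a class dictionary from the proto dictionary can be sketched as follows. This is a minimal NumPy illustration under the assumption that all blocks have the same number of atoms; the function name `class_dictionary` and the argument `block_size` are hypothetical and not taken from the paper.

```python
import numpy as np

def class_dictionary(D, u_c, block_size):
    """Return D_c = D diag(u_tilde), where u_tilde repeats each block weight
    of u_c once for every atom in that block (equal block sizes assumed)."""
    u_tilde = np.repeat(np.asarray(u_c, dtype=float), block_size)
    if u_tilde.size != D.shape[1]:
        raise ValueError("number of blocks times block_size must equal the number of atoms")
    # Scaling the columns is equivalent to D @ np.diag(u_tilde).
    return D * u_tilde
```

With an all-one weight vector the function returns the proto dictionary itself, which corresponds to the shared dictionary special case.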

Construction of Weight Matrix 𝑈.
Without loss of generality, each weight value is required to be nonnegative, and the weights of each class are required to sum to 1; that is, $u_{r,c} \ge 0$, $\sum_{r=1}^{B} u_{r,c} = 1$, $r = 1, 2, \ldots, B$; $c = 1, 2, \ldots, C$. When the proto dictionary and the sparse coefficients are fixed, the key question is how to calculate the weight value of every block. If we do not take the weight matrix into account, the block representation of the $c$th class samples can be rewritten so that the weight values are expressed through the block sparse coefficients. Evidently, the weight values can then be computed from the other variables directly, so the three variables to be optimized are reduced to two. The weight value in (8) is nonnegative and satisfies the former constraint, $\sum_{r=1}^{B} u_{r,c} = 1$, $c = 1, 2, \ldots, C$.

A Discussion about BDL Model and WBDL Model.
Compared to the block dictionary learning (BDL) model [32], our weighted block dictionary learning (WBDL) model introduces a weight vector into dictionary learning. The objective function of the original block dictionary learning model is given in (9), where $D_{[r]}$ is the $r$th block of the dictionary. The objective function of (9) can be rewritten as (10). Compared to the objective function of BDL [32] in (10), the WBDL model in (8) removes the weight $u_{r,c}$ from the block sparse regularization. When the weight value $u_{r,c}$ of the $r$th block is larger than the weights of the other blocks, the $r$th block represents the $c$th class samples better than the other blocks, so the sparse coefficients corresponding to this block should be even larger than the others. In the BDL model [32], a large $u_{r,c}$ value suppresses the $r$th coefficient block and forces the solution $z_{[r]}$ to be small. In our proposed WBDL model, the weight $u_{r,c}$ is removed from the regularization, which brings a relative increase of the block sparse coefficients $z_{[r]}$. These increased sparse coefficients improve the discriminative power compared with the original BDL model.

WBDL Classification
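After training on the labeled data, we obtain a weight matrix $U$ and an extended dictionary $(D^T, W^T)^T$, which is the concatenation of a proto dictionary $D$ and a linear classifier $W$. However, since $D$ and $W$ are normalized jointly in the learning process, $D$ cannot be used directly to compute the sparse code of a new test sample. As proposed in [20], the proto dictionary $D$ and the corresponding classifier $W$ are normalized as in (13), by dividing each atom $d_k$ and the corresponding classifier column $w_k$ by the $\ell_2$ norm of the atom: $d_k' = d_k / \|d_k\|_2$, $w_k' = w_k / \|d_k\|_2$, $k = 1, 2, \ldots, K$. When the training samples for each class are sufficient, a test sample $x$ is locally coded over every class dictionary, and its label is computed by (15). The label of test sample $x$ is determined by the class label of the sparse code which has the smallest reconstruction error.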
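The local decision rule, choosing the class whose dictionary yields the smallest reconstruction error, can be sketched as follows. The sparse codes are assumed to have been computed beforehand by a block sparse coder; the function name `local_label` and the argument names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def local_label(x, class_dicts, codes):
    """Pick the class with the smallest reconstruction error ||x - D_c z_c||_2.

    class_dicts: list of class dictionaries D_c (each n x K)
    codes:       list of sparse codes z_c of x over each D_c (each length K)
    """
    errors = [float(np.linalg.norm(x - D_c @ z_c))
              for D_c, z_c in zip(class_dicts, codes)]
    return int(np.argmin(errors)), errors
```

The returned index is zero-based, so class $c$ in the text corresponds to index $c - 1$.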
When the training samples for each class are not sufficient, a test sample $x$ can be globally coded. We define a total weight vector, denoted by $\bar{u} = \sum_{c=1}^{C} u_c$, which reflects the overall relationship between each block of the proto dictionary and all involved classes. A large value of $\bar{u}_r$ shows that the proto dictionary block $D_{[r]}$ is important for representing all classes. The global sparse code $\hat{z}$ of the test sample is computed by (16). Utilizing the learned linear classifier $W$, the final classification of test sample $x$ is obtained by the global classifier (17), $l = W\hat{z}$, where $l \in \mathbb{R}^{C \times 1}$ is a vector. The label of test sample $x$ is determined by the index of the largest element in $l$.
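The pieces of the global classification stage that do not depend on the sparse coder can be sketched in NumPy as follows. The atom-wise normalization follows the scheme described for D-KSVD [20]; the function names, the shape convention ($W$ of size $C \times K$), and the zero-based label are assumptions made for this illustration.

```python
import numpy as np

def normalize_pair(D, W):
    """Divide each atom of D and the matching column of W by the atom's l2 norm."""
    norms = np.linalg.norm(D, axis=0)
    return D / norms, W / norms

def total_weight(U):
    """Total weight vector: sum of the class weight vectors (columns of U)."""
    return U.sum(axis=1)

def global_label(W, z_hat):
    """Label = index of the largest entry of W @ z_hat (zero-based)."""
    scores = W @ z_hat
    return int(np.argmax(scores)), scores
```

A test sample would first be coded over the proto dictionary weighted by the total weight vector, and the resulting code passed to `global_label`.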

Optimization of WBDL Model
In the objective function of our proposed WBDL model, there are two unknown variables, $D$ and $Z$, and a variable $U$ which can be computed from $Z$ directly. We adopt alternating optimization to solve this multivariable problem and design a two-step iterative algorithm. Firstly, the weight matrix $U$ is fixed and the coefficient matrix $Z$ and proto dictionary $D$ are learned, which is a general dictionary learning problem. Secondly, the coefficient matrix $Z$ and proto dictionary $D$ are fixed and the weight matrix $U$ is updated. In this section, we describe the optimization of each step separately and then give the whole algorithm.

Dictionary Learning.
When the weight matrix $U$ is fixed, the coefficient matrix $Z$ and proto dictionary $D$ are learned. Firstly, we fix the proto dictionary $D$ and learn the coefficient matrix $Z$; this is block sparse coding. Secondly, we fix the coefficient matrix $Z$ and update the proto dictionary $D$; this is dictionary updating.

Block Sparse Coding.
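As an example, we compute the block sparse coefficient $z$ of a sample $x$ from the $c$th class. Firstly, we obtain the $c$th class dictionary, $D_c = D\,\mathrm{diag}(u_c \to \tilde{u}_c)$, and then we formulate the minimization of (8) only for the $r$th block $z_{[r]}$ of the sparse coefficient; this optimization is similar to that of the BGSC method [33]. By fixing the other blocks $z_{[j]}$, $j \ne r$, (8) can be written as
$$f(z_{[r]}) = z_{[r]}^T (D_c)_{[r]}^T (D_c)_{[r]} z_{[r]} - 2 z_{[r]}^T (D_c)_{[r]}^T \Big( x - \sum_{j \ne r} (D_c)_{[j]} z_{[j]} \Big) + \lambda \|z_{[r]}\|_2 + \mathrm{const}, \qquad (18)$$
where $(D_c)_{[r]}$ is the $r$th block of class dictionary $D_c$ and $\mathrm{const}$ is the term that does not depend on $z_{[r]}$. Computing the gradient of (18) with respect to $z_{[r]}$, we obtain the following condition:
$$-2 (D_c)_{[r]}^T x + 2 (D_c)_{[r]}^T \sum_{j \ne r} (D_c)_{[j]} z_{[j]} + 2 (D_c)_{[r]}^T (D_c)_{[r]} z_{[r]} + \lambda \frac{z_{[r]}}{\|z_{[r]}\|_2} = 0. \qquad (19)$$
Assuming $\|z_{[r]}\|_2 > 0$, denoting the first two terms by $-a$, substituting the positive semidefinite matrix $(D_c)_{[r]}^T (D_c)_{[r]}$ with its eigendecomposition $V \Sigma V^T$, and multiplying by $V^T$ on both sides, (19) can be formulated as
$$2 \Sigma V^T z_{[r]} + \lambda \frac{V^T z_{[r]}}{\|z_{[r]}\|_2} = V^T a. \qquad (20)$$
Denoting the new variable $e = V^T z_{[r]}$, so that $\|e\|_2 = \|z_{[r]}\|_2$, and $N = V^T a$, we have
$$2 \Sigma e + \lambda \frac{e}{\|e\|_2} = N. \qquad (21)$$
Setting $\kappa = \|e\|_2$ and $\hat{e} = e / \|e\|_2$, we have
$$\hat{e} = (2 \kappa \Sigma + \lambda I)^{-1} N, \qquad \|\hat{e}\|_2 = 1. \qquad (22)$$
The unit-norm condition in (22) is a scalar equation in $\kappa$, which we solve using Newton's method. Once $\kappa$ is known, we can compute $\hat{e}$ and $e = \kappa \hat{e}$. Finally, $z_{[r]}$ is obtained as $z_{[r]} = V e$.
When the solution for $\kappa$ is not positive, the above assumption $\|z_{[r]}\|_2 > 0$ does not hold. In this case, the optimal solution is $z_{[r]} = 0$. The proof can be found in [32].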
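The single-block update can be sketched in NumPy as below. The sketch assumes the block subproblem has the form $\min_z \|\rho - A z\|_2^2 + \lambda \|z\|_2$, where $A$ stands for the $r$th block of the class dictionary and $\rho$ for the sample minus the contribution of the other blocks; this scaling of the fidelity term, the function name `block_update`, and the stopping rule are assumptions of the illustration rather than details confirmed by the paper.

```python
import numpy as np

def block_update(A, rho, lam, n_iter=50, tol=1e-12):
    """Minimize ||rho - A z||_2^2 + lam * ||z||_2 over one coefficient block z."""
    a = 2.0 * A.T @ rho
    # Subgradient condition at z = 0: the block is switched off entirely.
    if np.linalg.norm(a) <= lam:
        return np.zeros(A.shape[1])
    # Eigendecomposition of the positive semidefinite Gram matrix A^T A.
    sigma, V = np.linalg.eigh(A.T @ A)
    sigma = np.clip(sigma, 0.0, None)   # remove tiny negative round-off
    N = V.T @ a
    # Newton iterations on g(kappa) = sum_i N_i^2 / (2 kappa sigma_i + lam)^2 - 1,
    # whose root is the norm of the optimal block.
    kappa = 0.0
    for _ in range(n_iter):
        denom = 2.0 * kappa * sigma + lam
        g = np.sum(N**2 / denom**2) - 1.0
        dg = -4.0 * np.sum(sigma * N**2 / denom**3)
        if abs(g) < tol or dg == 0.0:
            break
        kappa -= g / dg
    e_hat = N / (2.0 * kappa * sigma + lam)   # unit-norm direction
    return V @ (kappa * e_hat)                # rotate back to z
```

Because $g$ is convex and decreasing for $\kappa \ge 0$, starting Newton's method at $\kappa = 0$ approaches the root from the left without overshooting, which is why no line search is used here.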

Dictionary Updating.
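Let $\tilde{Z}_c = \mathrm{diag}(u_c \to \tilde{u}_c) Z_c$ denote the weighted sparse coefficients of class $c$; we fix $\tilde{Z} = [\tilde{Z}_1, \tilde{Z}_2, \ldots, \tilde{Z}_C]$ and update the proto dictionary $D$. The objective function with respect to $D$ is as follows:
$$\min_{D} \|X - D \tilde{Z}\|_F^2, \qquad (23)$$
subject to the norm constraint on each dictionary atom. We can minimize this objective function by the Lagrange dual method [34].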

Learning of Weight Matrix.
We find that the sparse coefficients inherit the weight information of the class dictionary. Motivated by this observation and the details described in Section 3.2, the weight matrix $U = [u_1, u_2, \ldots, u_C] \in \mathbb{R}^{B \times C}$ can be constructed as follows:
$$u_{r,c} = \frac{\|Z_{r,c}\|_F}{\sum_{j=1}^{B} \|Z_{j,c}\|_F}, \qquad (24)$$
where $Z_{r,c}$ is the $r$th block of matrix $Z_c$ and $Z_c$ is the sparse code matrix of the $c$th class samples. The adopted norm is the Frobenius norm. $u_{r,c}$ satisfies the following conditions: $u_{r,c} \ge 0$ and $\sum_{r=1}^{B} u_{r,c} = 1$. In the process of block sparse coding, the sparse coefficients are computed one sample at a time. When learning the weight vectors, each weight vector is computed from all the sparse coefficients of the same class. As a result, the computation of the weight vectors is not integrated into block sparse coding; the weight vectors are computed after all sparse coefficients have been obtained.
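The construction of the weight matrix from the block sparse coefficients can be sketched as follows, assuming each weight is the Frobenius norm of a coefficient block normalized over all blocks of the same class. The equal block size, the function name `weight_matrix`, and the uniform fallback for an all-zero code matrix are assumptions of this illustration.

```python
import numpy as np

def weight_matrix(Z_by_class, block_size):
    """Build U (B x C): column c holds the normalized Frobenius norms of the
    row blocks of Z_c, the K x N_c code matrix of the class-c samples."""
    columns = []
    for Z_c in Z_by_class:
        n_blocks = Z_c.shape[0] // block_size
        # Rows r*block_size .. (r+1)*block_size - 1 form block r.
        blocks = Z_c.reshape(n_blocks, block_size, -1)
        norms = np.sqrt((blocks**2).sum(axis=(1, 2)))
        total = norms.sum()
        if total > 0:
            columns.append(norms / total)
        else:
            # Hypothetical fallback: uniform weights when all codes are zero.
            columns.append(np.full(n_blocks, 1.0 / n_blocks))
    return np.column_stack(columns)
```

Each column of the result is nonnegative and sums to 1, matching the constraints stated for the weight values.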
In our proposed WBDL model, rather than assigning each proto dictionary block to only one class, we assign it $C$ weight values that indicate its relationship to all class dictionaries. The weight matrix therefore preserves more class-label information. The construction of the weight matrix adapts to the block sparse coefficients and does not significantly increase computational complexity.
The WBDL algorithm and its two classification algorithms, the local classification algorithm and the global classification algorithm, are described as follows.
Algorithm 1 (WBDL algorithm).
Output. Proto dictionary $D$ and weight matrix $U$.
Step 1. Initialize $U$ to an all-one matrix.
Step 2. Dictionary learning. Repeat:
Block sparse coding: compute the sparse coefficients $Z$ by minimizing (18) while fixing the corresponding class dictionary.
Dictionary updating: update the proto dictionary $D$ by minimizing (23) while fixing the weighted sparse coefficients.
Until convergence.
Step 3. Construct the weight matrix $U$ by the definition in (24).
Local classification algorithm.
Output. Proto dictionary $D$, weight matrix $U$, and the classification result of test sample $x$.
Step 1. Learn the proto dictionary and weight matrix using the WBDL algorithm (Algorithm 1).
Step 3. Compute the label of $x$ using (15).
Global classification algorithm.
Output. Proto dictionary $D$, weight matrix $U$, classifier $W$, and the classification result of test sample $x$.
Step 4. Separate the extended dictionary into $D$ and $W$; normalize $D$ and $W$ by (13).
Step 6. For a test sample $x$, compute the global code $\hat{z}$ by (16).

Experimental Results
In this section, the WBDL algorithm is evaluated on three tasks: a simulation experiment, face recognition, and object recognition. In the simulation experiment, we compared the Fisher values obtained when the block structure and the weight matrix were introduced separately. For face recognition, we experimented on two face databases: AR [18] and Extended Yale B [35]. For object recognition, the Caltech101 database [36] was adopted for evaluation. In all experiments, randomly selected samples from the same class were taken as the initialization of the proto dictionary, and an all-one matrix was taken as the initialization of the weight matrix. For the global classification algorithm, the scalar controlling the discriminant term was set to 1.

Simulation Experiment.
Compared to general dictionary learning algorithms, the WBDL model introduces a block structure and a weight matrix. In this section, we measure the discrimination of this model by the Fisher criterion, which is the ratio of the between-class variance to the within-class variance. The Fisher criterion is computed on the sparse codes, where $z_i^c$ is the sparse code of the $i$th sample belonging to the $c$th class. A larger Fisher value means a better classification result. We used 52 images from 2 random persons in the AR database for this simulation. For each person, we randomly selected 20 images for training and the remaining 6 images for testing. We used the same parameters for all of the following four methods: D-KSVD [20], WDL, BDL, and WBDL. WDL is the algorithm which only introduces the weight vector, and BDL is the algorithm which only introduces the block structure. WBDL introduces the weight vector and the block structure simultaneously. The obtained Fisher values are listed in Table 1. The results show that both the weight vector and the block structure improve the discriminant performance of dictionary learning, and that the WBDL algorithm is the most discriminative. In particular, the WBDL local classification algorithm is more competitive than the WBDL global classification algorithm: its Fisher value is 0.09 higher. This is because the local classification algorithm takes full advantage of the weight vectors whereas the global classification algorithm does not; the global classification algorithm can be regarded as a simplification of the local classification algorithm.
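One common way to evaluate such a ratio on sparse codes is the trace-ratio form sketched below. The exact normalization is an assumption made for this illustration and may differ from the definition used for Table 1; the function name `fisher_value` is likewise hypothetical.

```python
import numpy as np

def fisher_value(codes_by_class):
    """Ratio of between-class scatter to within-class scatter (trace form).

    codes_by_class: list of K x N_c arrays, one column per sparse code.
    """
    all_codes = np.hstack(codes_by_class)
    mean_all = all_codes.mean(axis=1, keepdims=True)
    s_between = 0.0
    s_within = 0.0
    for Z_c in codes_by_class:
        mean_c = Z_c.mean(axis=1, keepdims=True)
        # Class mean against the global mean, weighted by the class size.
        s_between += Z_c.shape[1] * float(np.sum((mean_c - mean_all) ** 2))
        # Spread of the class samples around their own mean.
        s_within += float(np.sum((Z_c - mean_c) ** 2))
    return s_between / s_within
```

Codes that cluster tightly within each class and lie far apart across classes give a large value.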

Face Recognition on AR.
Face recognition has been a popular application of computer vision and pattern recognition in recent years. In this section, the WBDL algorithm is evaluated on a face recognition task using the AR face database [18]. Figure 3 shows 10 face images of two different subjects. The images in the AR database include many facial variations, including expression, illumination, and facial disguises (sunglasses and scarves). The AR database consists of over 4,000 color images of 126 persons, and each person has 26 face images. A subset consisting of 2600 images from 50 male subjects and 50 female subjects was used in this experiment. For each person, we randomly selected 20 images for training and the remaining 6 images for testing. The average over six such random splits of training and testing images is taken as the final result.
In all experiments, each AR face image of size 192 × 168 was projected onto a vector in $\mathbb{R}^{540}$ with Randomface [3]. The learned proto dictionary had 500 atoms, with 5 atoms per block. The regularization parameter $\lambda$ was set to 0.03. The WBDL algorithm is compared with several recently proposed algorithms including SRC [3], K-SVD [2], D-KSVD [20], and LC-KSVD [23]. Recognition results are summarized in Table 2. As shown in Table 2, WBDL global classification algorithm ...
In addition, when the block structure and the weight vector were introduced, we recorded the decrease of the objective function value in (8) over the iterations. Figure 4 displays the values of the objective function for different numbers of iterations. As shown in Figure 4, after 10 iterations the objective function value decreases only very slowly, which indicates that the WBDL algorithm converges quickly.

Face Recognition on Extended Yale B.
In this section, we compare the WBDL algorithm with existing dictionary learning methods on the Extended Yale B face database [35]. The Extended Yale B database consists of 2,432 cropped frontal face images of 38 individuals. For each person, there are 64 face images captured under various lighting conditions. Figure 5 shows 12 face images of two different subjects. The key challenge of this database is the varying illumination and expression. Since the dimension 192 × 168 of the original face images is large, we reduce the dimension of the images to 132 using Randomface [3]. To compare the proposed algorithm with other methods, in all experiments we randomly chose half of the images of each subject for training and the rest for testing. For simplicity of analysis, we learned 38 dictionary blocks. Assuming that all proto dictionary blocks have the same number of atoms, we learned 9, 18, 25, or 32 atoms for each block. The regularization parameter $\lambda$ was set to 0.06, 0.07, 0.07, and 0.08 for these block sizes, respectively. Test samples were globally coded and locally coded, separately. Experiments were repeated 6 times with random splits of training and testing data; the average classification rates over all trials were taken as the final results.
The proposed WBDL algorithm is compared with several recently proposed algorithms including SRC [3], K-SVD [2], D-KSVD [20], LC-KSVD [23], Pl2/l1 [32], and SVGDL [29]. Recognition results are presented in Figure 6. As the results show, the WBDL algorithm always outperforms the other methods, especially when the dictionary size is small. When the dictionary size is larger, for example, with a block size of 32, the classification accuracies of the learned dictionaries (K-SVD, D-KSVD, LC-KSVD, and SVGDL) do not exceed those of the constructed dictionaries (SRC, Pl2/l1), whereas the classification accuracies of our two WBDL classification algorithms exceed that of SRC by 3% and 3.6%.

Object Classification on Caltech101.
The Caltech101 database [36] contains 101 object classes and a "background" class, with high shape variability. The number of images per category varies from 31 to 800. Most images are of medium resolution, about 300 × 300 pixels. Figure 7 shows 15 images from the Caltech101 database, drawn from 15 different categories. Firstly, we extracted SIFT [37] descriptors from 16 × 16 patches, which were densely sampled on a grid with a step size of 6 pixels. Secondly, we extracted the spatial pyramid features based on the SIFT features, with three grids of sizes 1 × 1, 2 × 2, and 4 × 4 in each spatial subregion of the spatial pyramid. Thirdly, the features were pooled together to form pooled features of size 128 × 21. Max pooling and $\ell_1$ normalization were used for pooling and normalization, respectively; both were shown in [38] to be superior to other pooling and normalization methods. Fourthly, we trained the codebook for the spatial pyramid features using standard $k$-means clustering with $k = 1024$; the spatial pyramid features were then reduced from 1024 × 21 dimensions to 3000 dimensions by PCA. Finally, we trained the class dictionary and learned the classifier on the final spatial pyramid features using the WBDL algorithm.
Following the common experimental settings, we trained on 5, 10, 15, 20, 25, and 30 samples per category and tested on the rest; the test samples were globally coded and locally coded, separately. We repeated the experiments 6 times with different random splits of training and testing images, and the results averaged over these runs were reported as the final recognition rates. The sparsity-controlling parameter used in all the experiments is 0.06. The results, compared with the popular ScSPM [38], SRC [3], K-SVD [2], D-KSVD [20], LC-KSVD [23], FDDL [5], and SVGDL [29] algorithms, are listed in Table 3. As shown in Table 3, the WBDL algorithm achieves the highest classification accuracy when trained on 5, 10, 20, 25, and 30 samples per category, while the SVGDL algorithm achieves the highest classification accuracy when 15 samples per category are used to train the dictionary. Overall, the WBDL algorithm achieves the highest accuracy in five of the six settings.
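The pooling step of the feature pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `spatial_pyramid_max_pool`, the random stand-in codes, and the convention that patch centres are scaled to [0, 1) are all assumptions made for the example. It shows why three grids of sizes 1 × 1, 2 × 2, and 4 × 4 give 21 cells, so that a 1024-word codebook yields a 1024 × 21 dimensional feature before PCA.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool per-patch codes over a spatial pyramid, then l1-normalize.

    codes: (n_patches, k) nonnegative code magnitudes, one row per patch
    xy:    (n_patches, 2) patch centres scaled to [0, 1) (assumed convention)
    Returns a vector of length k * sum(g * g for g in levels).
    """
    pooled = []
    for g in levels:
        # Grid cell that each patch falls into at this pyramid level.
        cell = np.minimum((xy * g).astype(int), g - 1)
        cell_id = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            mask = cell_id == c
            if mask.any():
                pooled.append(codes[mask].max(axis=0))  # max pooling
            else:
                pooled.append(np.zeros(codes.shape[1]))  # empty cell
    feature = np.concatenate(pooled)
    norm = np.abs(feature).sum()  # l1 norm
    return feature / norm if norm > 0 else feature

# Hypothetical stand-in data: 200 patches coded on a 1024-word codebook.
rng = np.random.default_rng(0)
codes = np.abs(rng.standard_normal((200, 1024)))
xy = rng.random((200, 2))
feature = spatial_pyramid_max_pool(codes, xy)
print(feature.shape)  # 1024 * (1 + 4 + 16) entries
```

The PCA reduction to 3000 dimensions would then be applied to the stacked feature vectors of all images.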
We also compare classification accuracy with SRC [3], K-SVD [2], D-KSVD [20], LC-KSVD [23], and SVGDL [29] using different dictionary sizes K = 510, 1020, 1530, 2040, 2550, and 3060 when 30 images per category are randomly selected as training data. As shown in Figure 8, the WBDL algorithm achieves the highest classification accuracy at every dictionary size compared with the other methods, including SVGDL. The results in Table 3 show that classification accuracy increases with the number of training samples, and the results in Figure 8 show that it increases with the dictionary size. The WBDL algorithm is more sensitive to the size of the dictionary block, whereas the SVGDL algorithm is more sensitive to the number of training samples. When the dictionary size is small, for example, 510, both the WBDL and SVGDL algorithms improve on the classification accuracy of the other dictionary learning algorithms. Owing to the weight vector, the dictionary learned by the WBDL algorithm is more discriminative and compact.

Figure 1: (a) Learning of proto dictionary and class dictionary. (b) Classification of a new test sample.
(a) and the weighted sparse coefficients of the same testing images in the WBDL model are shown in Figure 2(b). As shown in Figures 2(a) and 2(b), compared to the BDL model, the weighted sparse coefficients in the WBDL model are more compact and discriminative.

Figure 2: Performance comparisons of sparse coefficients from the BDL model and from the WBDL model. We selected 200 images from 10 random persons in the AR database [18] for training and the remaining 60 images for testing. The proto dictionary contains 10 blocks; each block contains 5 atoms. The sparse coefficients of the 60 testing samples in the BDL model are shown in (a), and the weighted sparse coefficients of the same 60 testing samples in the WBDL model are shown in (b).

Figure 5: Example images from Extended Yale B database.

Figure 6: Performance comparisons of recognition accuracies on Extended Yale B database with varying block size.
making $D$ overcomplete) is a dictionary with $K$ atoms. $Z = [z_1, z_2, \ldots, z_N] \in \mathbb{R}^{K \times N}$ are the $N$ sparse codes of the input signals $X$. $T$ is a constant that keeps the number of nonzero elements in each $z_i$ below $T$.
}, is denoted as $W_c = \operatorname{diag}(\vec{w}_c)$, where $\operatorname{diag}(\vec{w}_c)$ is a diagonal matrix with the vector $\vec{w}_c$ on its diagonal. In order to represent the weight value of each atom, the size of the diagonal matrix $\operatorname{diag}(\vec{w}_c)$ is $K \times K$, so the weight vector $\vec{w}_c$ must be resized from length $B$ to length $b_1 + b_2 + \cdots + b_B = K$, where $b_i$ is the number of atoms in the $i$th block. For example, when the number of atoms in each block is 2, a weight vector should be resized from $[1, 0, 0, 0]^T$ to $[1, 1, 0, 0, 0, 0, 0, 0]^T$. Finally, a sparse representation that encodes the data on the corresponding class dictionary is obtained. Taking the $c$th class data $X_c$ as an example, the $c$th class data can be represented as $X_c = D W_c Z_c$.
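The resizing of a block-level weight vector into a per-atom diagonal weight matrix can be illustrated with a short sketch. The helper name `expand_block_weights`, the random stand-in proto dictionary, and its 6 × 8 shape are assumptions made for the example; only the 4-block, 2-atoms-per-block weight vector comes from the text.

```python
import numpy as np

def expand_block_weights(block_weights, block_sizes):
    """Repeat each block weight once per atom in that block."""
    return np.repeat(np.asarray(block_weights, dtype=float), block_sizes)

# Example from the text: 4 blocks, 2 atoms per block.
block_weights = [1, 0, 0, 0]
block_sizes = [2, 2, 2, 2]
atom_weights = expand_block_weights(block_weights, block_sizes)
print(atom_weights)  # [1. 1. 0. 0. 0. 0. 0. 0.]

# Diagonal weight matrix, one entry per atom.
W_c = np.diag(atom_weights)

# Hypothetical proto dictionary with 8 atoms of dimension 6.
rng = np.random.default_rng(0)
D = rng.standard_normal((6, 8))

# Weighted proto dictionary: atoms of blocks with zero weight are switched off.
D_c = D @ W_c
```

With binary weights as in this example, the weighted dictionary simply keeps the atoms of the selected block and zeroes the rest; with real-valued weights, each block is scaled by its contribution.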
is the set of samples of the $c$th class, $D_i$ is the $i$th block of the proto dictionary, and $Z_{c,i}$ is the $i$th block of sparse coefficients, corresponding to the proto dictionary block $D_i$. Observing the sparse coefficients obtained by block sparse representation reveals a pattern. When $D_i$ is more similar to the $c$th class samples, $Z_{c,i}$ is expected to be larger than the other coefficient blocks $Z_{c,j}$ ($j \neq i$, $j = 1, 2, \ldots, B$), and we find that this is indeed the case. Inspired by this observation, the weight value $w_{c,i}$ can be calculated from the sparse coefficient block $Z_{c,i}$. Consistent with the Frobenius norm of the reconstruction error, the Frobenius norm of $Z_{c,i}$ is used to compute the weight. Thus, our objective function can be rewritten as follows:
For the normalized proto dictionary $D$ and the weight matrix $W$, we can obtain a sparse code for a new test sample $y$ by two coding strategies: local coding and global coding. When there are enough training samples for each class, a test sample $y$ is locally coded on each class dictionary in turn. Taking the $c$th class local code as an example, we have the following: $\lVert y - D W_c \hat{z}_c \rVert_2^2$, $c = 1, 2, \ldots, C$.
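Local coding can be sketched as follows, under stated assumptions. The sparse coder here is a plain orthogonal matching pursuit written for the example, not necessarily the solver used in the paper; assigning the label with the smallest reconstruction error is the usual SRC-style rule and is assumed rather than taken from the text. The names `omp`, `classify_local`, and `n_nonzero` are hypothetical.

```python
import numpy as np

def omp(A, y, n_nonzero):
    """Greedy orthogonal matching pursuit with at most n_nonzero atoms."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    z = np.zeros(A.shape[1])
    for _ in range(n_nonzero):
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = 0.0  # never pick an atom twice
        j = int(np.argmax(corr))
        if corr[j] < 1e-12:  # nothing left to explain
            break
        support.append(j)
        # Refit all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    if support:
        z[support] = coef
    return z

def classify_local(y, D, weights, n_nonzero=4):
    """Code y on every weighted dictionary D diag(w_c); return the class
    with the smallest squared reconstruction error, plus all errors."""
    errors = []
    for w_c in weights:
        D_c = D * w_c  # column scaling, same as D @ np.diag(w_c)
        z_c = omp(D_c, y, n_nonzero)
        errors.append(float(np.sum((y - D_c @ z_c) ** 2)))
    return int(np.argmin(errors)), errors

# Hypothetical setup: 8 unit-norm atoms, two classes owning 4 atoms each.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
D /= np.linalg.norm(D, axis=0)
weights = [np.array([1, 1, 1, 1, 0, 0, 0, 0.0]),
           np.array([0, 0, 0, 0, 1, 1, 1, 1.0])]
y = D[:, 4] + 0.5 * D[:, 6]  # built only from atoms of the second class
label, errors = classify_local(y, D, weights)
print(label, errors)
```

Because the test sample is built only from atoms weighted by the second class, its reconstruction error on that class dictionary is essentially zero, so the second class (index 1) is selected.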

Table 1: The values for Fisher criterion in simulation experiment.