Integrating Globality and Locality for Robust Representation Based Classification

The representation based classification method (RBCM) has shown great potential for face recognition since it first emerged. The linear regression classification (LRC) method and the collaborative representation classification (CRC) method are two well-known RBCMs. LRC represents the testing sample with the training samples of each class separately, whereas CRC represents it with all the training samples; both then classify on the basis of the representation residual. LRC can be viewed as a “locality representation” method because it uses only the training samples of each class to represent the testing sample, so it cannot exploit the benefit of “globality representation.” Conversely, CRC cannot exploit the benefit of locality. We therefore propose to integrate CRC and LRC to perform more robust representation based classification. Experimental results on benchmark face databases demonstrate that the proposed method achieves high classification accuracy.


Introduction
Face recognition has become a popular technique and one of the most promising branches of pattern recognition [1][2][3][4]. Recently, the representation based classification method (RBCM) was proposed for face recognition by researchers in computer vision and pattern recognition [5][6][7]. The RBCM is quite different from conventional classification methods such as principal component analysis (PCA) [8], kernel PCA [9], linear discriminant analysis (LDA) [10], kernel LDA [11], and Gaussian maximum likelihood [12]. The common procedure of these traditional methods is to first use all the training samples to obtain the transform axes and then use these predetermined axes to project all the samples into a new lower-dimensional space in which recognition is performed [4][8][9][10][11][12]. In contrast, the RBCM generally uses a linear combination of the training samples to represent the test sample and then uses the representation result to classify the test sample. The RBCM has been widely applied to face recognition [13][14][15], image categorization [16,17], and image superresolution [18,19].
Generally speaking, representation based classification methods fall into two categories: global representation methods and local representation methods. Hereafter, a global representation method uses all the training samples to represent the test sample, whereas a local representation method uses only a fraction of the training samples. Moreover, from the viewpoint of image restoration, a linear combination of training samples can be viewed as a new image of the same size as the samples. If a linear combination of the training samples from one class can perfectly reconstruct the probe image, the probe image can be classified as a member of that class. The corresponding residual can be regarded as the distance between the test sample and each class. Both global and local representation methods use the residual to perform the final classification.
Since the sparse representation based classification (SRC) method [5] was proposed, the sparse representation based [...] Thus many researchers have turned their attention to developing RBCMs with a looser sparsity restriction, namely, the $\ell_2$-norm minimization constraint. We briefly review some notable work in the following. Zhang et al. [30] presented an efficient representation method called the collaborative representation classification (CRC) method. CRC uses $\ell_2$-norm minimization, rather than $\ell_1$-norm minimization, to obtain a limited-sparse representation solution. Furthermore, it has been shown that CRC is far more computationally efficient than SRC while achieving comparable classification accuracy [30]. Xu et al. [4] put forward a two-step sparse representation method which first uses CRC to determine the $M$ nearest samples and then applies CRC to these $M$ samples to make the final classification. Liu et al. [31] used the reconstruction error and normalized distance of the RBCM to perform palmprint recognition. Xu et al. [32] applied the RBCM model to improve the nearest neighbor classifier. Yang et al. [26] presented a relaxed collaborative representation classification that uses the similar and distinctive features. Xu et al. [33] proposed to use bimodal biometric traits in a sparse representation method for biometrics and palmprint classification. The literature [25,26][30][31][32][33][34] also confirms the robustness and efficiency of CRC in comparison with SRC. Furthermore, CRC also has the property of limited sparsity [30,34].
Local recognition methods have been proposed to improve classification accuracy and have clear advantages over global recognition methods [35][36][37]. Local recognition methods use only part of the training samples, rather than all of them, to obtain an intermediate result and then use this result for classification. For example, Xu et al. [38] proposed a novel and efficient solution scheme for the locality preserving projection (LPP) method that can effectively overcome the small training sample size problem. Sugiyama [39] proposed a local LDA method that combines the advantages of LDA and LPP to perform supervised dimensionality reduction. Liu et al. [40] demonstrated that local principal component analysis (PCA) has clear advantages over global PCA. To obtain better face recognition results, local RBCMs were recently proposed; they use the training samples of each class to represent the test sample and classify the test sample based on the representation residual. The linear regression classification (LRC) method uses the training samples of each class to represent the test sample and assigns the test sample to the class that generates the minimum representation residual [35]. Naseem et al. [36] developed a robust regression method that uses a linear combination of the training samples of a specific class to represent the test sample. Tahir et al. [37] fused histogram and face features using the LRC classifier to build a texture descriptor, named the multiscale local phase quantization histogram (MLPQH) feature, to address the illumination problem in face recognition. The literature [35][36][37] confirmed the significance of locality in the RBCM and also demonstrated that local representation methods can greatly lower the recognition error rate.
In order to obtain the advantages of both global and local RBCMs, we integrate CRC and LRC to perform face recognition. CRC has the property of limited sparsity and can exploit low-complexity sparsity to perform global representation based classification. LRC is a local representation method and can exploit the local structure of patterns to perform classification. Section 4 discusses the rationale of the proposed method, and the experiments in Section 5 confirm its effectiveness.
The rest of this paper is organized as follows. We first briefly review the LRC and CRC methods in Section 2. In Section 3, we present the proposed method. In Section 4, we discuss the potential rationale of the proposed method. In Section 5, the proposed method is evaluated by extensive experiments on several well-known benchmark face databases. Finally, Section 6 concludes the paper.

Related Work
The linear regression classification (LRC) method [35] first uses a linear combination of the training samples of each class to represent the testing sample $y$; that is, the following equation is satisfied:

$$y = a_1^i x_1^i + a_2^i x_2^i + \cdots + a_n^i x_n^i, \tag{1}$$

where $x_1^i, x_2^i, \ldots, x_n^i$ are the training samples of the $i$th class and $a_j^i$ ($j = 1, 2, \ldots, n$) is the coefficient of $x_j^i$. For convenience, we can rewrite (1) as

$$y = X_i A_i, \tag{2}$$

where $A_i = [a_1^i\ a_2^i\ \cdots\ a_n^i]^T$ and the columns of $X_i = [x_1^i, x_2^i, \ldots, x_n^i]$ are the training samples of the $i$th class. The coefficient vector is obtained by least squares:

$$\hat{A}_i = (X_i^T X_i)^{-1} X_i^T y. \tag{3}$$

We refer to $\tilde{y}_i = X_i \hat{A}_i$ as the representation result of the $i$th class of the training samples using the preobtained $\hat{A}_i$, and we can transform $\tilde{y}_i$ into a two-dimensional image with the same size as the original sample image.
Finally, we calculate the representation residual of each class of the training samples and then use the representation residuals to conduct classification. The residual of representing the testing sample $y$ using the training samples of the $i$th class is

$$\mathrm{resi}(i) = \| y - X_i \hat{A}_i \|_2. \tag{4}$$

We regard $X_i \hat{A}_i$ as the representation contribution of the training samples from the $i$th class, and the RBCM presumes that the test sample $y$ can be adequately represented by only the training samples from the same class [41]. Thus, a smaller representation residual reflects a larger representation contribution, and the minimum representation residual, $\mathrm{resi}(i)$, means that the $i$th class is the nearest to the testing sample. Consequently, we classify the testing sample $y$ into the class which achieves the smallest representation residual.
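The LRC procedure reviewed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lrc_residuals`, the toy data, and the use of `np.linalg.lstsq` (instead of forming $(X_i^T X_i)^{-1}$ explicitly) are choices made here for illustration.

```python
import numpy as np

def lrc_residuals(class_mats, y):
    """Return one residual per class: ||y - X_i A_i||_2, with A_i fitted by least squares."""
    residuals = []
    for X_i in class_mats:  # X_i has shape (m, n): one column per training sample
        A_i, *_ = np.linalg.lstsq(X_i, y, rcond=None)
        residuals.append(np.linalg.norm(y - X_i @ A_i))
    return np.array(residuals)

# Toy data (hypothetical): 3 classes, 4 samples each, sample dimension 20.
rng = np.random.default_rng(0)
class_mats = [rng.normal(size=(20, 4)) for _ in range(3)]
# Build a test sample that lies in the span of class 1.
y = class_mats[1] @ np.array([0.5, 0.2, 0.2, 0.1])

resi = lrc_residuals(class_mats, y)
predicted = int(np.argmin(resi))  # class with the smallest residual
```

Because the toy test sample is an exact linear combination of the class 1 samples, its class 1 residual is numerically zero and LRC selects that class.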

Collaborative Representation Classification Algorithm.
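The collaborative representation classification (CRC) method [30] first uses all the training samples to represent the test sample; that is, it uses a linear combination of all the training samples to approximate the test sample:

$$y \approx a_1 x_1 + a_2 x_2 + \cdots + a_N x_N, \tag{5}$$

or

$$y \approx XA, \tag{6}$$

where $y$ is the testing sample, $x_j$ ($j = 1, 2, \ldots, N$) are all the training samples, $a_j$ is the coefficient of $x_j$, $X = [x_1, x_2, \ldots, x_N]$, and $A = [a_1\ a_2\ \cdots\ a_N]^T$. The solution $A$ can be calculated by solving the $\ell_2$-norm minimization problem

$$\min_A \|A\|_2 \quad \text{s.t.} \quad y = XA, \tag{7}$$

$$\hat{A} = \arg\min_A \left( \|y - XA\|_2^2 + \mu \|A\|_2^2 \right), \tag{8}$$

where $\mu$ is a small positive constant and (8) uses the Lagrangian method to obtain the solution of (7). We solve (8) by $\hat{A} = (X^T X + \mu I)^{-1} X^T y$, where $I$ is the identity matrix. Similarly, we evaluate the representation contribution and representation residual of each class of the training samples using

$$\mathrm{devi}(i) = \left\| y - \sum_{j=(i-1)\times n+1}^{i\times n} x_j \hat{a}_j \right\|_2, \tag{9}$$

where $n$ is the number of training samples per class, $\hat{a}_j$ is the $j$th entry of $\hat{A}$, and $\sum_{j=(i-1)\times n+1}^{i\times n} x_j \hat{a}_j$ and $\mathrm{devi}(i)$ are viewed as the representation contribution and the representation residual of the training samples from the $i$th class, respectively. We then classify the testing sample $y$ into the class which leads to the minimum representation residual.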
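The CRC computation described above, a single regularized solve over all training samples followed by class-wise residuals, can be sketched as follows. This is an illustrative sketch only: the function name `crc_residuals`, the `labels` array, the toy data, and the value of `mu` are assumptions introduced here, and the residual is taken as an unsquared 2-norm.

```python
import numpy as np

def crc_residuals(X, labels, y, mu=0.01):
    """Class-wise residuals of the ridge-regularized collaborative representation."""
    n_total = X.shape[1]
    # Solve (X^T X + mu I) A = X^T y instead of forming the inverse explicitly.
    A = np.linalg.solve(X.T @ X + mu * np.eye(n_total), X.T @ y)
    classes = np.unique(labels)
    # Residual of class c uses only the columns and coefficients belonging to c.
    devi = np.array([np.linalg.norm(y - X[:, labels == c] @ A[labels == c])
                     for c in classes])
    return classes, devi

# Toy data (hypothetical): 9 training samples of dimension 30, 3 per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 9))
labels = np.repeat([0, 1, 2], 3)
y = X[:, 6:9] @ np.array([0.6, 0.3, 0.1])  # built from class 2 only

classes, devi = crc_residuals(X, labels, y, mu=1e-3)
predicted = int(classes[np.argmin(devi)])
```

Unlike LRC, the coefficients of all classes are estimated jointly, so the classes compete for the representation before the per-class residuals are computed.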

The Proposed Method
This section describes the proposed method of integrating CRC and LRC in detail. As presented above, (1) uses only the training samples of each class to represent the test sample, whereas (5) uses all the training samples to represent the test sample.
The literature [30,34] has shown that CRC possesses a limited-sparse representation capability that makes it an excellent RBCM for face recognition. The literature [35,36] has also shown that LRC is able to address the challenges of varying facial expressions, illumination, and occlusion. Thus, the motivation of the proposed method is to combine LRC and CRC to achieve robust and effective face recognition.
Therefore, (10) is a joint optimization problem of CRC in (8) and LRC in (2). The main steps of the proposed method to solve (10) are described as follows.
Step 1. Normalize all the samples, including the training samples and the test samples, to unit vectors of length 1 using $x_j = x_j / \|x_j\|$ and $y = y / \|y\|$.
Step 2. Use the CRC method by solving (8) to obtain the representation residual of each class; that is, apply (9) to compute the representation residual of the $i$th class, $\mathrm{devi}(i)$.
Step 3. Use the LRC method by solving (2) to obtain the representation residual of each class; that is, apply (4) to calculate the representation residual of the $i$th class, $\mathrm{resi}(i)$.
Step 4. Normalize the representation residuals devi and resi. For the test sample $y$, devi and resi are normalized by

$$\mathrm{devi}(i) = \frac{\mathrm{devi}(i) - \mathrm{devi}_{\min}}{\mathrm{devi}_{\max} - \mathrm{devi}_{\min}}, \qquad \mathrm{resi}(i) = \frac{\mathrm{resi}(i) - \mathrm{resi}_{\min}}{\mathrm{resi}_{\max} - \mathrm{resi}_{\min}},$$

where $\mathrm{devi}_{\max}$ and $\mathrm{resi}_{\max}$ denote the maximum representation residuals of devi and resi, respectively, and $\mathrm{devi}_{\min}$ and $\mathrm{resi}_{\min}$ are the minimum representation residuals of devi and resi.
Step 5. Integrate the two residuals obtained by CRC and LRC. For the test sample $y$, we calculate the fusing score of the test sample with respect to the training samples of the $i$th class as a weighted combination of the normalized $\mathrm{devi}(i)$ and $\mathrm{resi}(i)$, in which a weight parameter controls the contribution of the representation residual.
Step 6. Output the identity of the test sample $y$ as the class with the smallest fusing score. We believe that a lower representation residual leads to a more accurate representation, so we assign the test sample to the class with the smallest final representation residual.
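The six steps above can be sketched end to end as follows. This is a hedged illustration, not the authors' code. In particular, the exact fusion formula is not recoverable from the text here, so the sketch ASSUMES the fused score has the form `devi + w * resi` on the min-max normalized residuals; the names `fused_classify`, `min_max`, `w`, and `mu`, and the toy data, are likewise hypothetical.

```python
import numpy as np

def min_max(v):
    """Rescale a residual vector to [0, 1]; assumes its entries are not all equal."""
    return (v - v.min()) / (v.max() - v.min())

def fused_classify(X, labels, y, w=1.0, mu=0.01):
    # Step 1: unit-normalize every training column and the test sample.
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    classes = np.unique(labels)
    # Step 2: CRC residuals from one ridge solve over all training samples.
    A = np.linalg.solve(X.T @ X + mu * np.eye(X.shape[1]), X.T @ y)
    devi = np.array([np.linalg.norm(y - X[:, labels == c] @ A[labels == c])
                     for c in classes])
    # Step 3: LRC residuals from one least-squares fit per class.
    resi = []
    for c in classes:
        X_c = X[:, labels == c]
        A_c, *_ = np.linalg.lstsq(X_c, y, rcond=None)
        resi.append(np.linalg.norm(y - X_c @ A_c))
    resi = np.array(resi)
    # Steps 4-5: min-max normalize, then fuse (ASSUMED form: devi + w * resi).
    score = min_max(devi) + w * min_max(resi)
    # Step 6: the class with the smallest fused score is the output identity.
    return classes[np.argmin(score)], score

# Toy data (hypothetical): 3 classes with 3 samples each, dimension 30.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 9))
labels = np.repeat([0, 1, 2], 3)
y = X[:, 3:6] @ np.array([0.5, 0.3, 0.2])  # built from class 1 only
pred, score = fused_classify(X, labels, y, w=1.0)
```

Min-max normalization puts the two residual vectors on a common [0, 1] scale before they are combined, so neither method dominates the fused score merely because its residuals are numerically larger.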

Potential Rationale of the Proposed Method
In this section, we analyze our method to explore its underlying characteristics. We discuss the effectiveness of our method from the following three aspects: the significance of the proposed method, the advantages of local representation based classification, and the advantages of global representation based classification.
CRC and LRC use different ideas to perform representation based classification, and the correlation coefficients can reflect the difference between the two methods. Figure 1 illustrates the correlation coefficients of these two methods. We treated the first three samples of each subject in the ORL database [42] as training samples and the rest as test samples and then calculated the representation residual of each class using LRC and CRC, respectively. The representation residuals from all the classes for each testing sample form a residual vector, and we then compute the correlation coefficient of the residual vectors obtained by CRC and LRC. From Figure 1, we can see that the correlation coefficients range from 0.2148 to 0.7765. That means that CRC and LRC are only weakly to moderately related to each other. Thus, combining the two methods is meaningful. For example, we conducted an experiment on the ORL database using the first sample from each subject as the training sample and the rest as test samples. The testing sample shown in Figure 2(a), from the 21st class, is classified into the 36th class (Figure 2(b)) by CRC, and LRC assigns it to the 19th class (Figure 2(c)). However, our method correctly classifies it into the 21st class (Figure 2(d)). Figure 3 presents the representation residuals obtained with CRC, LRC, and the proposed method, respectively. Moreover, the experimental results in Section 5 also verify that the proposed method is practical for effective classification.
The representation based classification method (RBCM) usually supposes that only the training samples from the same class can sufficiently represent the test sample [41]. However, if one subject closely resembles another subject, this assumption may not hold. For example, we performed an experiment on the FERET face database [43] using the first sample of each subject as the training sample and the remaining samples as testing samples. Figure 4 shows a test sample that is wrongly classified by CRC but accurately classified by LRC; our method also assigns this test sample to the right class. Figure 5 presents the coefficients of the representation solution, and it is apparent that the two largest coefficients belong to training samples from the 110th class and the 72nd class. CRC assigns the test sample to the 72nd class, which is the nearest class. In contrast, LRC and our method correctly recognize the test sample. Thus, the local representation based classification method can compensate for the imperfection of the global representation based recognition method. In this sense, our method exploits the characteristics of both the local representation based classification method, LRC, and the global representation based classification method, CRC.
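The correlation analysis behind Figure 1 treats the per-class residuals of one test sample as a vector for each method and correlates the two vectors. A minimal sketch is given below; it assumes the Pearson correlation coefficient is meant (the text does not name the coefficient), and the function name and the residual values are hypothetical.

```python
import numpy as np

def residual_correlation(devi, resi):
    """Pearson correlation between two per-class residual vectors."""
    return float(np.corrcoef(devi, resi)[0, 1])

# Hypothetical residual vectors for one test sample over five classes.
devi = np.array([0.91, 0.40, 0.88, 0.95, 0.79])  # CRC residuals
resi = np.array([0.62, 0.35, 0.80, 0.58, 0.71])  # LRC residuals
r = residual_correlation(devi, resi)
```

Repeating this for every test sample gives one coefficient per sample, which is the quantity a plot such as Figure 1 would display.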
It is clear that a global representation based classification method, which uses all the training samples to represent the test sample, can alleviate the small sample size problem. For example, if only one sample of each subject is available, LRC degenerates to the nearest neighbor (NN) classifier and loses the advantage of the RBCM. In that regard, score fusion of CRC and LRC is meaningful and can obtain better results. CRC and SRC are typical global RBCMs, and LRC is a typical local RBCM. However, SRC has very high computational complexity, whereas CRC is much faster than SRC with very competitive face recognition accuracy. Moreover, in some cases CRC or LRC alone cannot identify the class to which the test sample truly belongs and misclassifies the test sample. The proposed method lets the two methods compensate for each other's shortcomings. Moreover, the correlation coefficients between these two methods are low. Thus, integrating CRC and LRC can take full advantage of both methods. On this basis, we argue that the framework of integrating the advantages of the global RBCM and the local RBCM is reasonable and meaningful.
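The remark that LRC degenerates to the NN classifier with one training sample per class can be checked with a short derivation. The sketch below assumes unit-norm samples (as in Step 1), a single training sample $x_i$ per class, a nonnegative inner product $x_i^T y$ (as for nonnegative pixel intensities), and a Euclidean NN classifier; the text does not state which distance the NN classifier uses.

```latex
% Assumes \|x_i\|_2 = \|y\|_2 = 1 and x_i^T y \ge 0.
\begin{align*}
\hat{a}_i &= (x_i^T x_i)^{-1} x_i^T y = x_i^T y, \\
\mathrm{resi}(i)^2 &= \| y - x_i \hat{a}_i \|_2^2
  = 1 - 2 (x_i^T y)^2 + (x_i^T y)^2 = 1 - (x_i^T y)^2, \\
\| y - x_i \|_2^2 &= 2 - 2\, x_i^T y .
\end{align*}
```

Both the LRC residual and the Euclidean distance decrease as $x_i^T y$ grows on $[0, 1]$, so under these assumptions the class minimizing the LRC residual is the class of the nearest neighbor.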

Experimental Results
In this section, we conducted extensive experiments to evaluate the effectiveness and robustness of the proposed method on different face databases, including the ORL, AR, CMU PIE, and FERET databases. To assess the performance of the proposed method, several competitive face recognition methods are tested for comparison: CRC [30], LRC [35], LDA, the improved nearest neighbor classification (INNC) method [32], and the sparse representation based classification (SRC) algorithm proposed in [29]. Moreover, we also performed experiments on corrupted face databases and conducted contiguous occlusion experiments. The weight parameter of the proposed method is set to 0.01 for all the experiments.
[...] enumerates the variation of the classification error rates of the different methods. Our method with different values of the weight parameter also outperforms the other classification methods. For example, when we took the first four face images of each subject as training samples and the rest of the face images as test samples, the classification error rate of our method is lower than those of the CRC, LRC, INNC, LDA, and SRC algorithms by margins of 2.5, 5.42, 9.17, 6.25, and 4.17 percentage points, respectively. Moreover, the proposed method clearly improves on both CRC and LRC, which reflects the effectiveness of the framework of integrating the global RBCM and the local RBCM.

Experiments on the AR Face Database.
The AR face database [44] is composed of more than 3000 face images of 126 subjects; 26 facial images of each subject were taken in two separate sessions, and each AR image has been resized to a 40 by 50 image matrix. This face database contains many more variations, such as illumination conditions, facial expressions, and facial disguises. We treated the first 1, 2, . . ., 9 face images of each subject as the training samples and regarded the remaining samples as test samples, respectively. Figure 7 shows some face images of four subjects from the AR database. Table 2 shows the classification error rates of the different classification methods. Our method has clear advantages over the other classification algorithms. For example, when we took the first face image of each subject as the training sample and the rest of the face images as test samples, the classification error rates of our method with weights of 0.90, 0.75, and 0.65, CRC, LRC, INNC, and SRC are 29.06%, 28.90%, 28.87%, 30.17%, 38.07%, 30.17%, and 31.20%, respectively. That means that the error rate is reduced by 9.2 percentage points by using our method (with a weight of 0.65) instead of LRC. We can see that combining CRC and LRC is a favorable idea for face recognition.

Experiments on FERET Face Database.
The FERET database is one of the largest publicly available facial image databases. We selected a subset of 1400 images from 200 individuals, with seven images per individual, to conduct experiments [43]. The images were collected in a semicontrolled environment. We resized each image to a 40 by 40 image matrix using the same downsampling algorithm. For some individuals, over two years elapsed between the first and the last time they were photographed. Figure 8 presents some cropped face images from the FERET face database. The weight parameter of our method is set to 1.0 for this experiment.
We performed an experiment on the FERET face database using the first six samples of each subject as training samples and the remaining samples for testing. The classification error rates of our method, CRC, LRC, INNC, LDA, and SRC are 15.00%, 25.50%, 21.50%, 25.50%, 41.50%, and 18.50%, respectively. In other words, the proposed method outperforms the CRC, LRC, INNC, LDA, and SRC algorithms by margins of 10.5, 6.5, 10.5, 26.5, and 3.5 percentage points, respectively.
The first image of each subject was used as the training sample, while the remaining six served as test samples. Figure 9 shows the detailed experimental results compared with the other classification methods over a wide range of feature dimensions. From Figure 9, we can see that our method achieves the lowest classification error rate among the compared methods even when the dimension of the samples is extremely low. The classification error rates under different feature dimensions suggest that the proposed method is superior to the other classifiers.

Experiments on CMU PIE Database.
We chose a subset of the CMU PIE database to perform experiments. For each subject in this face database, we randomly selected 21 face images under different lighting conditions and expressions, giving 1428 images of 68 subjects for the experiments [45]. Each image is cropped to 32 by 32 pixels. We took the first 1, 2, and 3 face images of each subject as the training samples and the rest of the samples as test samples. The experimental results are shown in Table 3, and we can conclude that the proposed method achieves the minimum classification error rate. For example, when the number of training samples per subject is one, the classification error rates of our method with weights of 1.0, 0.85, and 0.70, CRC, LRC, INNC, and SRC are 10.80%, 11.17%, 11.32%, 16.10%, 13.82%, 16.10%, and 16.10%, respectively. Thus the proposed method substantially improves the classification accuracy compared with the CRC and LRC methods, and it obtains the lowest classification error rates.

Recognition on Random Pixel Corruption Databases.
We performed experiments on face images with random pixel corruption to test the robustness of the methods. The training and test sets are the same as those in Sections 5.1 and 5.2, respectively. To obtain face images with random pixel corruption, we add Gaussian white noise with zero mean and a variance of 0.01 to the ORL database and salt-and-pepper noise with a density of 0.1 to the AR database. Figure 10 shows some corrupted face images from the ORL database and the AR database. Table 4 and Figure 11 show the classification error rates on the ORL database and the AR database, respectively. The LDA algorithm is not available when the number of training samples for each class is one, so we set the classification error rate of the LDA algorithm to 0 for the corrupted AR database in Figure 11. We see again that our method has a clear advantage over all the other methods. For example, when each subject provides 3 corrupted ORL face images as training samples, the classification error rates of our method with a weight of 0.65, CRC, LRC, INNC, LDA, and SRC are 15.00%, 17.14%, 22.86%, 21.78%, 29.64%, and 16.79%, respectively. Thus the experimental results also demonstrate the robustness and effectiveness of the proposed method.
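The two corruption models described above can be sketched as follows. This is an illustrative sketch under assumptions that the text does not state: pixel intensities are scaled to [0, 1], noisy values are clipped back to that range, and salt-and-pepper noise splits the corrupted pixels evenly between 0 and 1. The function names are hypothetical.

```python
import numpy as np

def add_gaussian_noise(img, var=0.01, rng=None):
    """Add zero-mean Gaussian noise to an image scaled to [0, 1], then clip."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + rng.normal(0.0, np.sqrt(var), size=img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_pepper(img, density=0.1, rng=None):
    """Set roughly `density` of the pixels to 0 or 1 with equal probability."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0                        # pepper
    out[(u >= density / 2) & (u < density)] = 1.0     # salt
    return out

# Hypothetical 40 x 50 mid-gray image standing in for a face image.
img = np.full((40, 50), 0.5)
noisy_gauss = add_gaussian_noise(img, var=0.01, rng=np.random.default_rng(0))
noisy_sp = add_salt_pepper(img, density=0.1, rng=np.random.default_rng(0))
```

Passing an explicit random generator keeps the corruption reproducible, which matters when the same corrupted images must be reused across all compared classifiers.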

Conclusion
The proposed method combines the CRC and LRC methods to perform effective and robust face recognition. The significance of our method is that it exploits both the globality of CRC and the locality of LRC. The rationale of this method is that integrating LRC and CRC is meaningful because their residuals have the low correlation coefficients shown in Figure 1. Moreover, our method obtains promising performance on face databases with random pixel corruption, which supports its effectiveness under complex conditions.
The experiments in Section 5 show that the framework of integrating the globality and locality of training samples can substantially improve the classification accuracy. CRC, as a typical global RBCM, makes use of the globality of all the training samples, whereas LRC, as a typical local RBCM, makes the most of the locality of the training samples. The experiments also demonstrate that the proposed framework is reasonable and practical and that it improves the classification accuracy compared with state-of-the-art classification methods on different face databases. Furthermore, the proposed method is simple and computationally efficient. Besides the method description and the experimental analysis, the rationale of our method has been explained and its advantages have been illustrated visually in this paper.

Figure 1: The correlation coefficient of CRC and LRC. We use the first three samples of each subject in the ORL database as training samples and the remaining samples as test samples. The figure illustrates the correlation coefficient of the representation residuals of each testing sample using CRC and LRC.

Figure 2: A test sample that is erroneously classified by CRC and LRC and correctly classified by the proposed method. (a) The test sample. (b), (c) One sample from each of the classes wrongly assigned by CRC and LRC, respectively. (d) One sample from the class correctly assigned by our method.

Figure 4: A test sample that is falsely classified by CRC and accurately classified by LRC. (a) A test sample from the 110th class. (b) A sample from the 72nd class, which is the closest to the test sample. (c) One sample from the 110th class, correctly identified by the LRC method and the proposed method.

Figure 6: Some face images from the ORL face database.

Figure 7: Some face images from the AR face database.

Figure 8: Some face images from the FERET face database.

Figure 9: The recognition error rates of different methods under different feature dimensions. We treat the first image of each subject in the FERET face database as the training sample and use the remaining images as test samples. The figure shows the detailed experimental results compared with the other classification methods over a wide range of feature dimensions.

Figure 10: Some corrupted face images in the ORL database and the AR database. The first two rows show the corrupted face images used in the ORL database. The last two rows show the corrupted face images used in the AR face database.
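$X_i = [x_1^i, x_2^i, \ldots, x_n^i] \in \mathbb{R}^{m \times n}$, where $m$ is the dimension of the sample and $n$ is the number of training samples of the $i$th class. Let us denote a matrix $X = [X_1, X_2, \ldots, X_c] \in \mathbb{R}^{m \times N}$, where $c$ is the number of classes and $N = c \times n$; the matrix $X$ then contains all the training samples.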

Table 1: Rates of classification errors of different methods on the ORL face database.

Table 2: Rates of classification errors of different methods on the AR face database.

Table 3: Rates of classification errors of different methods on the illumination variation subset of the CMU PIE face database.

Table 4: Rates of classification errors of different methods on the corrupted ORL face database.