Patch Based Collaborative Representation with Gabor Feature and Measurement Matrix for Face Recognition

1Department of Medical Engineering, Wannan Medical College, Wuhu 241002, China
2Department of Biomedical Engineering, Hefei University of Technology, Hefei 230009, China
3School of Medical Information and Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu 241002, China
4Anhui Province Key Laboratory of Active Biological Macro-Molecules Research, Central Laboratory of Wannan Medical College, Wuhu 241002, China
5Department of Electronic Science and Technology, University of Science and Technology of China, Hefei 230027, China


Introduction
Face recognition (FR) is one of the most classical and challenging problems in pattern classification, computer vision, and machine learning [1]. Although face recognition technology has achieved a series of successes, it still confronts many challenges caused by real-world variations in illumination, pose, facial expression, and noise [2,3]. In real applications, the small sample size problem is an even more difficult issue for FR because of the limited availability of training samples.
In terms of classification schemes, several widespread pattern classification methods are used in FR. Generally, there are two types of pattern classification methods [4,5]: parametric methods and nonparametric methods. Parametric methods such as the support vector machine (SVM) [6,7] center on learning the parameters of a hypothesized classification model from the training samples and then using them to identify the class labels of test samples. In contrast, nonparametric methods, such as nearest neighbor (NN) [8] and nearest subspace (NS) [9], use the training samples directly to identify the class labels of test samples. Recent works have revealed an advantage of the nonparametric methods over the parametric ones [4,10,11]. Distance based classifiers are widely used nonparametric methods in FR [11], such as the nearest subspace classifier (NSC) [12]. A key issue in distance based nonparametric classifiers is how to represent the test sample [4]. Recently, Wright et al. pioneered the use of sparse representation based classification (SRC) for robust FR [13]: first the test sample is sparsely coded over the training samples; then its class label is identified by choosing the class that yields the smallest coding error. Although sparse representation related methods [13,14] achieved great success in FR, they focus on the role of the ℓ1-norm while ignoring the role of collaborative representation (CR) [15], which uses all classes of training samples to represent the target test sample. In [16], Zhu et al. argued that both SRC and the collaborative representation based classifier (CRC) suffer serious performance degradation when the training sample size is very small, because the test sample cannot be well represented. To solve the small sample size problem, they proposed to conduct CRC on patches and named the method patch based CRC (PCRC).
PCRC and some related works [17,18] have demonstrated their effectiveness on the small sample size problem of FR; however, some key issues remain to be optimized. On one hand, all the PCRC related works used the original face feature, which cannot effectively handle variations in illumination, pose, facial expression, and noise [19]. On the other hand, the data redundancy in these methods leads to poor classification accuracy and high computational cost. The face feature problem has been noticed by some recent works, in which efficient and effective image representations based on local and holistic features have been proposed. Eigenface [20,21], Randomface [22], and Fisherface [21] are all classical holistic features [23], but other works argued that holistic features are easily affected by variations such as illumination, pose, facial expression, and noise. Therefore, they introduced local features such as LBP [24] and the Gabor filter [25,26]. The Gabor filter has been successfully and widely used in FR [26,27]: the Gabor feature can effectively extract local face features at multiple scales and in multiple directions; however, this may lead to a sharp rise in data volume [28]. The key to solving the large data volume problem is dimensionality reduction. Numerous dimensionality reduction methods have been put forward to find projections that better separate the classes in low-dimensional spaces, among which linear subspace analysis has received more and more attention owing to its good properties; it includes principal component analysis (PCA) [28], linear discriminant analysis (LDA) [29], and independent component analysis (ICA) [30]. Many works showed that PCA has the best performance in FR [31-33].
In this paper, we first attempted to alleviate the influence of unreliable environments on small-sample-size FR; therefore, we proposed to use the Gabor feature and applied it to PCRC. Then, to improve the computational efficiency of the resulting GPCRC, we proposed to use PCA for dimension reduction, and to use measurement matrices, including Random Gaussian matrices [34], Toeplitz and cyclic matrices [35], and Deterministic sparse Toeplitz matrices (Δ = 4, 8, 16) [36], to reduce the dimension of the transformed signal and represent the face accurately with low-dimensional data. The experimental results showed that GPCRC and its improved methods are effective.
Section 2 briefly reviews SRC and CRC. Section 3 describes the proposed GPCRC and its improved methods. Section 4 illustrates the experiments and the results. Section 5 concludes the paper.

Sparse Representation Based Classification.
Recently, SRC was first reported by Wright et al. [13] for robust FR. In SRC, let X_i ∈ R^{d×n_i} denote the training set of the i-th class, each column of X_i being one sample of the face of the i-th individual. Assuming there are c classes of face samples, let X = [X_1, X_2, ..., X_c]. When identifying a target face test sample, the coding y ≈ Xα is used, with α = [α_1; α_2; ...; α_c], where the coefficient vector α_i is the encoding over the samples of the i-th individual. If y is from the i-th class, then y ≈ X_i α_i is the best representation, which means that in α a large number of coefficients α_j (j ≠ i) are close to zero and only α_i remains intact. Thus, the class (ID) of the target face test sample y can be decoded from the sparse nonzero coefficients in α.
The SRC method [13] is summarized as follows.

Step 1 (input). Given the c-class face training sample X and a test sample y.

Step 2 (dimension reduction). Project X and y onto the corresponding low-dimensional feature space X̂ and ŷ using a traditional dimensionality reduction technique (PCA).
Step 3 (normalization). Normalize the columns of X̂ and ŷ to unit ℓ2-norm.

Step 4 (sparse coding). Code ŷ over X̂ by solving an ℓ1-regularized minimization problem.

Step 5 (classification). Compute the class-wise residuals r_i(y) = ‖ŷ − X̂_i α̂_i‖_2 and identify the class of y as the one yielding the smallest residual.

Collaborative Representation Based Classification. CRC [15] represents the test sample collaboratively over all classes with the regularized least-squares model

ρ̂ = arg min_ρ { ‖ŷ − X̂ρ‖_2^2 + λ ‖ρ‖_2^2 },  (1)

where λ is a regularization parameter. Equation (1) plays dual roles: first, it stabilizes the least-squares solution; second, it imposes a "sparsity" on ρ̂ that is much weaker than the ℓ1-norm. The regularized least-squares problem (1) has the closed-form solution ρ̂ = P ŷ with P = (X̂^T X̂ + λ·I)^{-1} X̂^T. Associated with class i, the ℓ2-norm "sparsity" ‖ρ̂_i‖_2 also contains abundant information for classification. The CRC method [15] is summarized as follows.
Step 1 (input). Given the c classes of face training sample X and a test sample y.

Step 2 (dimension reduction). Use PCA to project X and y onto the low-dimensional feature space and obtain X̂ and ŷ.

Step 3 (normalization). Normalize the columns of X̂ and ŷ to unit ℓ2-norm.
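The CRC scheme reviewed above can be sketched in a few lines of code. The following is a minimal illustration (not the authors' implementation), assuming ℓ2-normalized training columns, the closed-form regularized least-squares solution, and the usual coefficient-normalized class residuals:

```python
import numpy as np

def crc_classify(X_list, y, lam=0.001):
    """Collaborative representation classifier (CRC-RLS) sketch.

    X_list: list of per-class matrices, each (d, n_i), with l2-normalized
    training samples as columns; y: (d,) test vector.
    """
    X = np.hstack(X_list)                                 # global dictionary
    n = X.shape[1]
    # closed-form ridge solution: rho = (X^T X + lam*I)^{-1} X^T y
    P = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)   # precomputable projection
    rho = P @ y
    # class-wise residuals, normalized by the coefficient energy of each class
    residuals, start = [], 0
    for Xi in X_list:
        ni = Xi.shape[1]
        rho_i = rho[start:start + ni]
        residuals.append(np.linalg.norm(y - Xi @ rho_i) /
                         (np.linalg.norm(rho_i) + 1e-12))
        start += ni
    return int(np.argmin(residuals))
```

Note that P does not depend on y, so it can be computed once and reused for every test sample, which is what makes CRC fast in practice.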

Patch Based Collaborative Representation.
From the formulations of sparse representation and collaborative representation, it can be seen that if the linear system determined by the training dictionary X is underdetermined, the linear representation of the target test sample y over X can be very accurate. In reality, however, the available samples of each subject are limited, so the sparse and collaborative representation methods may fail because the linear representation of the target test sample is not accurate enough. To alleviate this problem, Zhu et al. proposed the PCRC method for FR [16]. As shown in Figure 1, the target face test sample y is divided into a set of overlapping face image patches y_1, y_2, ..., y_q according to the patch size. Each divided face image patch y_j is collaboratively represented over the local dictionary M_j, which is extracted from X at the position corresponding to patch y_j. Since the linear system determined by the local dictionary M_j is usually underdetermined, the patch based representation is more accurate than the representation of the whole face image.
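The patch-division-plus-fusion idea can be sketched as follows; the patch size, step, and the majority-vote fusion of per-patch decisions are illustrative assumptions here, not necessarily the paper's exact settings:

```python
import numpy as np

def extract_patches(img, patch, step):
    """Split a 2-D image into overlapping patches, each flattened to a vector."""
    H, W = img.shape
    return [img[r:r + patch, c:c + patch].ravel()
            for r in range(0, H - patch + 1, step)
            for c in range(0, W - patch + 1, step)]

def pcrc_classify(train_imgs, labels, test_img, patch=4, step=2, lam=0.001):
    """Patch based CRC sketch: each test patch is coded over the local
    dictionary built from the same patch location in all training images,
    classified by the smallest class-wise residual, and the per-patch
    decisions are fused by majority vote."""
    labels = np.asarray(labels)
    n_cls = labels.max() + 1
    test_patches = extract_patches(test_img, patch, step)
    train_patches = [extract_patches(im, patch, step) for im in train_imgs]
    votes = np.zeros(n_cls)
    for j, yj in enumerate(test_patches):
        Mj = np.stack([tp[j] for tp in train_patches], axis=1)  # local dictionary
        rho = np.linalg.solve(Mj.T @ Mj + lam * np.eye(Mj.shape[1]), Mj.T @ yj)
        res = [np.linalg.norm(yj - Mj[:, labels == c] @ rho[labels == c])
               for c in range(n_cls)]
        votes[int(np.argmin(res))] += 1
    return int(np.argmax(votes))
```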

Patch Based Collaborative Representation Using Gabor Feature and Measurement Matrix for Face Recognition

Gabor Feature.
The Gabor feature has been widely used in FR because of its robustness to illumination, expression, and pose compared with holistic features. Yang and Zhang applied a multiscale and multidirectional Gabor dictionary to SRC for FR [19], which further improves the robustness of the algorithm. Inspired by these works, in this paper we integrate the Gabor feature into the PCRC framework to improve its robustness.
A Gabor filter with direction μ and scale ν is defined as follows [25,26]:

ψ_{μ,ν}(z) = (‖k_{μ,ν}‖^2 / σ^2) exp(−‖k_{μ,ν}‖^2 ‖z‖^2 / (2σ^2)) [exp(i k_{μ,ν} · z) − exp(−σ^2 / 2)],

where z = (x, y) denotes the pixel coordinates, k_{μ,ν} = k_ν e^{iφ_μ} with k_ν = k_max / f^ν and φ_μ = πμ/8, k_max is the maximum frequency, f is the spacing factor between kernels in the frequency domain, and σ determines the bandwidth of the filter. The convolution of the target image Img and the wavelet kernel ψ_{μ,ν} is expressed as

G_{μ,ν}(z) = Img(z) ∗ ψ_{μ,ν}(z) = M_{μ,ν}(z) exp(iθ_{μ,ν}(z)),

where M_{μ,ν}(z) (μ = 0, 1, ..., 7, ν = 0, 1, ..., 4) is the Gabor amplitude, θ_{μ,ν}(z) is the Gabor phase, and the local energy change in the image is expressed by the amplitude information.
Because the Gabor phase changes periodically with spatial position while the amplitude is relatively smooth and stable [19,25,26], only the Gabor magnitude is used in this paper, as shown in Figure 2.
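The 40 Gabor magnitude maps (8 orientations × 5 scales) can be computed as sketched below; the parameter values (k_max = π/2, f = √2, σ = 2π, 11×11 kernels) are common defaults and are assumptions here, not necessarily the paper's exact settings:

```python
import numpy as np

def gabor_kernel(mu, nu, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi, size=11):
    """Gabor wavelet kernel for orientation mu (0..7) and scale nu (0..4)."""
    k = (k_max / f**nu) * np.exp(1j * np.pi * mu / 8)      # wave vector k_{mu,nu}
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = xs**2 + ys**2
    kn2 = np.abs(k)**2
    # Gaussian envelope times DC-compensated complex carrier
    return (kn2 / sigma**2) * np.exp(-kn2 * z2 / (2 * sigma**2)) * (
        np.exp(1j * (k.real * xs + k.imag * ys)) - np.exp(-sigma**2 / 2))

def conv_same(img, ker):
    """'same'-sized 2-D convolution via FFT."""
    H, W = img.shape
    kh, kw = ker.shape
    full = np.fft.ifft2(np.fft.fft2(img, (H + kh - 1, W + kw - 1)) *
                        np.fft.fft2(ker, (H + kh - 1, W + kw - 1)))
    return full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def gabor_magnitudes(img):
    """Stack the 40 Gabor magnitude maps (8 orientations x 5 scales)."""
    return np.stack([np.abs(conv_same(img, gabor_kernel(mu, nu)))
                     for mu in range(8) for nu in range(5)])
```

For a 32 × 32 patch this yields a 40 × 32 × 32 magnitude tensor, which explains the sharp rise in feature dimension that the next subsection addresses.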

Measurement Matrix.
Unfortunately, although the Gabor feature enhances the robustness of the face image representation, it brings much higher dimensionality to the training sets than holistic features do; in other words, the computational cost and computation time increase. To address the high dimensionality of the training sets, a further dimension reduction is necessary, and we proposed to use PCA [31-33] in our method. The steps of PCA are as follows: assume n sample images x_i ∈ R^d, i = 1, 2, ..., n. Firstly, normalize each sample (subtract the mean, then divide by the standard deviation) to obtain vectors x_i^* that follow the normal distribution N(0, 1). Secondly, compute the eigenvectors of the covariance matrix C = AA^T, where A = [x_1^*, x_2^*, ..., x_n^*]: Cv = λv, where v is the eigenvector corresponding to the eigenvalue λ. Thirdly, sort the eigenvectors by the size of their eigenvalues and extract the first k eigenvectors to form a linear transformation matrix W_PCA ∈ R^{d×k}; we can then use z_i = W_PCA^T x_i^* to reduce the dimension. But if d ≫ n, the covariance matrix C ∈ R^{d×d} becomes very large; to solve this problem, singular value decomposition is usually used, obtaining the leading eigenvectors of AA^T from the much smaller matrix A^T A ∈ R^{n×n}. In summary, PCA and its related algorithms have two obvious shortcomings [37,38]. Firstly, the leading eigenvectors encode mostly illumination and expression variation rather than discriminative information. Secondly, in the actual calculation, the computational cost is very large, and the method fails in small-sample settings.
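A compact PCA sketch via SVD of the centered data matrix, which implicitly applies the small-sample trick just described (it never forms the d × d covariance matrix):

```python
import numpy as np

def pca_fit(X, k):
    """PCA via SVD.

    X: (d, n) data matrix with samples as columns. Returns (mean, W), where
    W: (d, k) holds the top-k eigenvectors of the covariance matrix.
    Using the thin SVD of the centered data avoids ever building the
    d x d covariance, which matters when d >> n (small sample sizes).
    """
    mean = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)  # U: (d, min(d, n))
    return mean, U[:, :k]

def pca_transform(X, mean, W):
    """Project columns of X onto the k-dimensional PCA subspace."""
    return W.T @ (X - mean)   # (k, n) low-dimensional features
```

Note that at most min(d, n) − 1 meaningful components exist, which is exactly the sample-size limitation the experiments later run into at higher feature dimensions.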
Inspired by the random-face feature extraction method in [38], we used a Random Gaussian matrix Φ ∈ R^{M×N} (M ≪ N) as a measurement matrix to measure face images. The measurement matrix Φ is applied to the redundant dictionary X = [x_1, x_2, ..., x_n] ∈ R^{N×n} to obtain V = ΦX ∈ R^{M×n}. In Figure 3(a), for any test image y, the measurements v are obtained by v = Φy ∈ R^{M×1}. In essence, using a measurement matrix to reduce the dimension of the image is different from sparse representation theory: the dimension M of the measurements v is determined by the measurement matrix Φ and is not limited by the number of training samples.
However, some works [35,39] pointed out that Random Gaussian matrices are nondeterministic, which limits their practical application, and proposed Toeplitz and cyclic matrices for signal reconstruction. Toeplitz and cyclic matrices generate all rows by rotating a single row vector. Usually, the entries of Toeplitz and cyclic matrices take the values ±1, each element independently following a Bernoulli distribution, so they are easy to implement in hardware. Based on the above analysis, we further used Toeplitz and cyclic matrices and their improved version (namely, the Deterministic sparse Toeplitz matrices [36]) in our method. The relevant measurement matrices used in this paper are described in detail below; the operating mechanism of each measurement matrix is shown in Figure 3.

Random Gaussian Matrices.
The form of the m × n Random Gaussian matrices is as follows [34,38]: each element φ_{i,j} is independently drawn from a Gaussian distribution with mean 0 and variance 1/m.
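Generating such a matrix and measuring a signal with it is a one-line operation; a sketch:

```python
import numpy as np

def gaussian_measurement(m, n, seed=None):
    """m x n Random Gaussian measurement matrix: entries i.i.d. N(0, 1/m)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
```

Reducing an N-dimensional Gabor patch vector x to m measurements is then a single matrix-vector product v = Phi @ x, independent of how many training samples are available.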

Toeplitz and Cyclic Matrices.
The concrete form of the m × n Toeplitz and cyclic matrices [35,39] is presented below:

Φ = ⎡ a_n       a_{n−1}  ...  a_1 ⎤
    ⎢ a_{n+1}   a_n      ...  a_2 ⎥
    ⎢ ...       ...      ...  ... ⎥
    ⎣ a_{n+m−1} a_{n+m−2} ... a_m ⎦.  (6)

Equation (6) is a Toeplitz matrix: along each diagonal, a_{i,j} = a_{i+1,j+1} is constant. If, in addition, a_i = a_{i+n}, the additional condition of (6) is satisfied and it becomes a cyclic matrix, whose elements follow a certain probability distribution P(a).
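A sketch of how an m × n Toeplitz (or cyclic) ±1 measurement matrix can be built from its generating vector; the Bernoulli ±1 entries follow the description above:

```python
import numpy as np

def bernoulli_toeplitz(m, n, seed=None, circulant=False):
    """m x n Toeplitz (or circulant) measurement matrix with +/-1 entries.

    A Toeplitz matrix is constant along each diagonal, so it is fully
    determined by its first row and first column (m + n - 1 independent
    entries); the circulant variant reuses a single row cyclically.
    """
    rng = np.random.default_rng(seed)
    if circulant:
        row = rng.choice([-1.0, 1.0], size=n)
        return np.stack([np.roll(row, i) for i in range(m)])
    t = rng.choice([-1.0, 1.0], size=m + n - 1)  # one entry per diagonal offset
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    return t[i - j + n - 1]                      # A[i, j] depends only on i - j
```

Because only m + n − 1 random values are needed (versus m·n for a Gaussian matrix), this structure is what makes the hardware implementation easy.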

Deterministic Sparse Toeplitz Matrices.
The construction of the Deterministic sparse Toeplitz matrices [36] is based on the Toeplitz and cyclic matrices; it is illustrated here by the example of randomly spaced Toeplitz matrices with an interval of Δ = 2. The independent elements in the first row and the first column of (6) constitute a vector t (equation (7)). Conducting random sparse spacing on (7), i.e., retaining entries only at the chosen spacing and zeroing the rest, yields a sparse vector that still contains all the independent elements of (8), from which the measurement matrix is built in Toeplitz form.
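The paper does not spell out the full construction, so the following sketch encodes one plausible reading of the Δ-spacing rule, in which every Δ-th independent element of the Toeplitz generating vector is kept and the rest are zeroed (an assumption, not the authors' exact recipe):

```python
import numpy as np

def sparse_toeplitz(m, n, delta=2, seed=None):
    """Sketch of a deterministic sparse Toeplitz measurement matrix.

    Assumed rule: keep every delta-th entry of the +/-1 generating vector
    and zero the others, so each diagonal of the matrix is either all
    +/-1 or all zero.
    """
    rng = np.random.default_rng(seed)
    t = rng.choice([-1.0, 1.0], size=m + n - 1)
    t[np.arange(m + n - 1) % delta != 0] = 0.0   # sparse spacing at interval delta
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    return t[i - j + n - 1]
```

Larger Δ makes the matrix sparser, which reduces the number of multiplications per measurement; this matches the speed advantage reported for Δ = 4, 8, 16 in the experiments.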

The Proposed Face Recognition Approach.
Although PCRC can indeed address the small sample size problem, it is still based on the original features of the patches, and its robustness and accuracy can be further improved. Based on the above analysis, the Gabor feature and the measurement matrix are infused into PCRC for FR, which not only solves the small sample size problem but also enhances the robustness and efficiency of the method. Our proposed method for FR is summarized as follows.
Step 1 (input). Face training sample X and test sample y.

Step 2 (patch). Divide each face training sample in X into q patches and divide the test sample y into q patches, where y_j and M_j denote the test patch and the local dictionary at the corresponding position.

Step 3 (extract and measure features). (1) Extract the Gabor feature from the patches of the training samples and the test sample, respectively, to obtain Gy_j . . .

Extended Yale B.
To test the robustness of the proposed method to illumination, we used the classic Extended Yale B database [40-42], whose faces were acquired under different illumination conditions. The Extended Yale B database contains 38 subjects under 9 poses and 64 illumination conditions. For a clear comparison, all frontal-face images marked with P00 were used in our experiment, and the face images were downsampled to 32 × 32. 10 or 20 faces from each individual were selected randomly as training samples; 30 others from each individual were selected as test samples. Figure 4 shows some P00 marked samples from the Extended Yale B database. The experimental results of each method are shown in Table 1. In Figure 5 we compared the recognition rates of the various methods with feature space dimensions of 32, 64, 128, 256, 512, and 1024 for each patch in GPCRC, corresponding to downsampling ratios of 1/320, 1/160, 1/80, 1/40, 1/20, and 1/10, respectively. Table 1 clearly shows that GPCRC achieves the highest recognition rate in all experiments. In Figure 6, we can see the performance of the various dimensionality reduction methods for GPCRC. In Figure 6(a), 10 samples of each individual were used as training samples. When the feature dimension is low (≤256), the best performance of PCA is 92.80% (at 256 dimensions), which is significantly higher than that of the measurement matrices at 256 dimensions: Random Gaussian matrices, 88.77%; Toeplitz and cyclic matrices, 85.17%; Deterministic sparse Toeplitz matrices (Δ = 4), 86.67%; (Δ = 8), 85.69%; (Δ = 16), 86.05%. However, none of these reaches the performance of the original dimension (10240). The reason why PCA significantly outperforms the measurement matrices at low dimensions can be analyzed through their operating mechanisms: PCA transforms the
original data into a set of linearly independent representations of each dimension through a linear transformation, which extracts the principal components of the data. These principal components can represent image signals more accurately [31-33]. The theory of compressed sensing [34-36] states that when the signal is sparse or compressible in some transform domain, a measurement matrix that is incoherent with the transform matrix can be used to project the transform coefficients onto a low-dimensional vector, and this projection preserves the information needed for signal reconstruction. Compressed sensing can thus achieve reconstruction with high accuracy or high probability from a small number of projections. At extremely low dimensions, the reconstruction information is insufficient and the original signal cannot be reconstructed accurately, so the recognition rate is not high; when the dimension increases and the measurements can reconstruct the original signal more accurately, the recognition rate improves. In the proposed measurement matrix algorithm, by contrast, the measurement matrix is generated from a deterministic distribution (e.g., a Gaussian distribution) and structure, according to the required dimension and the length of the signal, so the measurement matrix has low complexity [36,48,49]. The actual speed of each dimension reduction method for one patch of all samples (including all training samples and all testing samples) is listed in Table 2. Based on the above analysis, we can see that the PCA method has the most outstanding performance, but it is limited by the sample size and is time-consuming. The measurement matrices do not perform well at low dimensions but are not limited by the sample size; when the dimension reaches a certain value, their performance equals that of the original dimension. Thus, the dimension of the data can be reduced without loss of
recognition rate. Figure 6 shows some marked samples of the CMU PIE database. The experimental results of each method are shown in Table 3; GPCRC has the best results. In Figure 7 we compared the recognition rates of the various dimension reduction methods for GPCRC. Through the above analysis, we further validated the feasibility of the measurement matrix under the premise of preserving the recognition rate.

LFW.
In practice, the number of training samples is very limited; often only one or two can be obtained, for example from individual identity documents. To simulate this situation, we selected the LFW database. The LFW database includes frontal images of 5,749 different subjects acquired in unconstrained environments [45]. LFW-a is a version of LFW aligned using commercial face alignment software [46]. We chose 158 subjects from LFW-a, each with no fewer than ten samples. For each subject, we randomly chose 1 to 2 samples for training and another 5 samples for testing.

Conclusion
In order to alleviate the influence of unreliable environments on small-sample-size FR, in this paper we applied Gabor local features to PCRC and proposed to use measurement matrices to reduce the dimension of the transformed signal. Several important observations can be summarized as follows.
(1) The proposed GPCRC method can effectively deal with the influence of unreliable environments in small-sample FR. (2) The measurement matrices proposed to deal with the high dimensionality of the GPCRC method can effectively improve the computational efficiency and speed, and they overcome the limitations of the PCA method.

Figure 2: Patch based collaborative representation using Gabor feature.

Figure 3: The operating mechanism of each measurement matrix.
(≥512), PCA cannot work because of the sample size. At 512 dimensions, the measurement matrices (Deterministic sparse Toeplitz matrices (Δ = 8), 95.21%; (Δ = 16), 94.91%) reach the performance of the original dimension (10240), using only 1/20 of the original dimension. At 1/10 of the original dimension, the performance of the measurement matrices has basically reached that of the original dimension: Random Gaussian matrices, 95.60%; Deterministic sparse Toeplitz matrices (Δ = 4), 94.64%; (Δ = 8), 94.56%; (Δ = 16), 94.98%. In Figure 6(b), 20 samples of each individual were used as training samples. When the feature dimension is ≤512, the best performance is PCA's 97.50% (at 512 dimensions). At 1024 dimensions, PCA cannot work, while the performance of the measurement matrices has basically reached that of the original dimension: Random Gaussian matrices, 98.80%; Deterministic sparse Toeplitz matrices (Δ = 4), 98.68%; (Δ = 8), 99.01%; (Δ = 16), 98.77%. In Table 2, we fixed the dimension of each reduction method at 512 in the case of 20 training samples. In general, the complexity of PCA is O(n^3) [47], where n is the number of rows of the covariance matrix. The example we provided consists of 38 individuals, each with 20 samples. Since the number of Gabor features per patch (10240) far outweighs the total number of training samples (760), the number of rows of the covariance matrix is determined by the total number of training samples, and the complexity of PCA is approximately O(760^3).

Figure 7: The performance of each dimension reduction method for GPCRC on the CMU PIE database.

Figure 8: Some marked samples of the LFW database.

Figure 8 shows some marked samples of the LFW database. The experimental results of each method are shown in Table 4; it can be clearly seen that GPCRC achieves the highest recognition rate in all experiments with training sample sizes from 1 to 2. In Figure 9, we can also see the advantages of the measurement matrices at small sample sizes. At 1024 dimensions, the performance of the measurement matrices with a single training sample is as follows: Random Gaussian matrices, 26.18%; Toeplitz and cyclic matrices, 24.68%; Deterministic sparse Toeplitz matrices (Δ = 4), 24.43%; (Δ = 8), 25.85%; (Δ = 16), 25.28%. With two training samples: Random Gaussian matrices, 39.68%; Toeplitz and cyclic matrices, 37.72%; Deterministic sparse Toeplitz matrices (Δ = 4), 37.56%; (Δ = 8), 39.91%; (Δ = 16), 38.25%.

Figure 9: The performance of each dimension reduction method for GPCRC on the LFW database.
P = (X̂^T X̂ + λ·I)^{-1} X̂^T. Obviously, since P does not depend on y, it can be precalculated as a projection matrix; when a target test sample y comes in to be identified, ŷ is simply projected onto P via Pŷ, which makes CR very fast. Classification by ρ̂ is very similar to classification by α̂ in the SRC method: in addition to the class-wise reconstruction residuals ‖ŷ − X̂_i ρ̂_i‖_2, the coefficient vector ρ̂_i itself carries discriminative information.

Table 1: Recognition rate (%) on Extended Yale B database.

Table 2: The speed of each dimension reduction method.

Figure 6: Some marked samples of the CMU PIE database.
Similarly, we can see that PCA cannot achieve the best recognition rate because of the sample size limit, and at 1024 dimensions the performance of the measurement matrices has basically reached that of the original dimension; the recognition rate of each measurement matrix is shown in Figure 7.