A Fast Iterative Pursuit Algorithm in Robust Face Recognition Based on Sparse Representation



Introduction
Face recognition is one of the most active and challenging subjects in computer vision and artificial intelligence, with a wide range of applications such as personnel sign-in systems, image search engines, and criminal detection systems. It has been shown experimentally that various sparse representation methods perform well in face recognition [1]. Given sufficient face images of $k$ object members and a test image $y$ that belongs to one of the object classes, the problem of face recognition can be transformed into a classification problem [1,2]. The basic idea of this kind of algorithm is to find the sparsest solution that represents $y$ and to use it for classification [3,4].
However, current algorithms based on sparse representation also have drawbacks. On one hand, these methods converge relatively slowly. On the other hand, when the test image suffers a large percentage of random corruption or contiguous occlusion, the sparse solution becomes denser, so it is hard for the system to find the class to which $y$ belongs [5][6][7].
In this paper, a fast pursuit algorithm is proposed to address the problems mentioned above. A common problem of pursuit algorithms is that they are computationally slow. Targeting face recognition, we improve the greedy algorithm, making it much faster than comparable ones. By focusing on the information that is crucial to classification, namely the sparsity of errors, the algorithm becomes more robust to face misalignment and to large percentages of occlusion. The rest of this paper is organized as follows. First, we briefly review existing techniques for face recognition, including their advantages and their brittleness to occlusion. Then, an improved algorithm is proposed, and its feasibility and effectiveness are demonstrated. Finally, we present experiments on the ORL, Yale, and FERET face databases, as well as on a face database collected by ourselves, to verify the modified algorithm.

A Review of Pursuit Algorithms to Solve Face Recognition Issue
Pursuit algorithms solve the problem
$$(P_0):\quad \min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax,$$
where $y$ represents the test sample, $A$ is the training dictionary of the $k$ known classes, and $x$ is the sparse solution [8][9][10]. The core idea of these greedy algorithms is to update the support and the provisional solution iteratively so as to reduce the residual to a minimum [11]. Suppose the matrix $A = [A_1, A_2, \ldots, A_k]$ is composed of $n$ training face images of the $k$ subjects, where each $w \times h$ image is stacked into a column of length $m = w \times h$, $A_i$ represents the $i$th class of $A$, and $v_{i,j}$ represents the $j$th element of $A_i$. The given test sample $y$, if well aligned, can be seen as a linear combination of these $m$-dimensional column vectors:
$$y = \sum_{i=1}^{k} \sum_{j} x_{i,j}\, v_{i,j},$$
where $x_{i,j}$ is the coefficient corresponding to $v_{i,j}$. The task of these algorithms is to find the sparsest solution, so that the class of the largest coefficient can be taken as the class of $y$ [12]. Let us briefly review the procedures of OMP and MP. At iteration $t$, OMP first finds the column $a_j$ of $A$ that minimizes $\epsilon(j) = \min_{z_j} \|z_j a_j - r^{t-1}\|_2^2$, where $r^{t-1}$ is the current residual, adds this column to the support, and computes $x^t$ so that the residual is minimized. It then continues to update the support and the provisional solution iteratively until the residual is less than a preset threshold. MP is similar to OMP; the apparent difference is that the coefficients of the $t-1$ entries already in the support remain unchanged, rather than being reevaluated by solving a least-squares problem at every support-update stage.
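The OMP loop reviewed above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the columns of $A$ are normalized to unit $\ell_2$ norm, so minimizing $\epsilon(j)$ reduces to maximizing $|a_j^T r^{t-1}|$. The function name `omp` and the parameters `eps0` and `max_iter` are hypothetical.

```python
import numpy as np

def omp(A, y, eps0=1e-6, max_iter=None):
    """Orthogonal Matching Pursuit sketch for min ||x||_0 s.t. ||y - Ax||_2 < eps0.

    Assumes unit-norm columns in A (hypothetical preprocessing step).
    """
    m, n = A.shape
    max_iter = m if max_iter is None else max_iter
    x = np.zeros(n)
    support = []
    r = y.astype(float).copy()            # r^0 = y
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps0:      # basic OMP stopping rule
            break
        # Sweep: with unit-norm columns, eps(j) = ||r||^2 - (a_j^T r)^2,
        # so the best column maximizes |a_j^T r|.
        j = int(np.argmax(np.abs(A.T @ r)))
        if j not in support:
            support.append(j)
        # Update provisional solution: least squares over the whole support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        r = y - A @ x                     # residual is orthogonal to the support
    return x
```

Replacing the least-squares step with an update of the newly selected coefficient alone would turn this into plain MP.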
The greedy strategy expands the support set, initially empty, by one additional column per iteration. So if the optimization problem is known to have $k_0$ nonzero entries, the enumeration takes $k_0$ steps, which can be quite slow in many situations. For face recognition, however, the test sample $y$ belongs to one class in $A$; in other words, it is correlated with only a few columns of $A$, so $k_0$ is controllable and small. Many iterative-shrinkage algorithms, which "shrink" the entries associated with every column of $A$ to update the solution iteratively and make it "sparser," additionally process $k-1$ classes of useless information at every stage. Compared with them, pursuit algorithms can find the most probable class of the test image $y$ within the first few iterations. Because of this property, we hold that the pursuit method is best suited for classification via sparse representation.

A Fast and Robust Pursuit Algorithm
One apparent drawback of pursuit algorithms is that they are slow in many situations, since they must keep updating the support and the provisional solution until the result is accurate enough. In this section, we propose an improved pursuit algorithm for face recognition, which mainly involves two aspects: (1) general stopping rules that make the greedy algorithm faster in face recognition; (2) stopping rules for well-aligned and noiseless input samples.
Most state-of-the-art algorithms for sparse representation try to find a linear combination of the columns of $A$ that approximates the input vector $y$. The goal of many iterative-shrinkage algorithms is to "shrink" the sparse solution [13], in other words, to make the solution "sparser" [14]; in contrast, the pursuit strategy grows the solution from sparse to dense. The basic idea of a pursuit algorithm is to add columns to the support and update the provisional solution until the residual between the proposed representation and the input vector $y$ is small enough. When the test sample $y$ is occluded or corrupted, it is hard to represent $y$ accurately as a linear combination of the columns of $A$: the distance between $y$ and its corresponding columns in $A$ increases, and $y$ may become correlated with more columns of $A$ [15].

General Stopping Rules.
Since the test image $y$ is related to just a few columns of $A$, all belonging to the same class, only the coefficients of one class matter for the recognition result. It is therefore unnecessary and inefficient to recover $y$ precisely, which would unavoidably involve many irrelevant classes. Rather than recovering $y$ accurately, our goal in face recognition is to find the right class rapidly; therefore, additional stopping rules are needed to make the algorithm faster.

Maximum Number of Iterations.
Let us consider the best case first. If the test image $y$ is identical to one column $a_j$ of $A$, so that $y$ and $a_j$ are parallel, a single iteration, which computes $(a_j^T y)^2$ for every column and selects the maximum, is enough to identify the class. When the test image is under random corruption or varying levels of contiguous occlusion, the errors between $y$ and every column of $A$ become larger, and $y$ may be correlated with more classes in $A$. But the final recognition result depends on the largest coefficients of the sparse solution, which indicate the most probable class of $y$. Hence, even in the worst case, $k$ iterations are enough, because the iteration can be seen as an exhaustive pass over a classification problem in which only one class is valid.
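The best-case argument can be illustrated with a single sweep: when $y$ equals some column $a_j$, the score $(a_j^T y)^2$ is maximal for that column. This is a minimal sketch assuming unit-norm columns and a hypothetical `labels` array giving the class of each column; the helper names `sweep_class` and `max_iterations` are illustrative, and capping iterations at the number of classes $k$ follows the worst-case argument above.

```python
import numpy as np

def sweep_class(A, labels, y):
    """One OMP sweep: return the class of the column maximizing (a_j^T y)^2.

    A has unit-norm columns; labels[j] is the class index of column j.
    The product A^T y costs roughly 2mn flops for an m x n dictionary.
    """
    scores = (A.T @ y) ** 2
    return labels[int(np.argmax(scores))]

def max_iterations(labels):
    # Hypothetical iteration cap: one iteration per class (k distinct labels).
    return len(np.unique(labels))
```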

Resemblance of the Results of Two Successive Iterations.
As discussed above, our final identification result depends on the largest entries of the sparse solution, so the details of the solution can be neglected [6]. A minor change in the solution $x$ may affect the accuracy of the representation, but it will not influence the classification result. So if $\|x^t - x^{t-1}\|_2$ is smaller than some predetermined threshold, the iteration process can be stopped.
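The successive-iterate rule amounts to a one-line check that can be called at the end of each pursuit iteration. In this sketch the name `solutions_converged` and the default `tol` are hypothetical; the paper does not report the threshold it used.

```python
import numpy as np

def solutions_converged(x_prev, x_curr, tol=1e-3):
    """Stop when ||x^t - x^{t-1}||_2 < tol (tol is a hypothetical threshold)."""
    return float(np.linalg.norm(x_curr - x_prev)) < tol
```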

Stopping Rules and Processing Aiming at Different Image Samples.
Under most conditions, we would not want the number of iterations to reach its upper limit: on one hand, it may take quite a long time; on the other, since the solution becomes denser as the number of iterations increases, it becomes harder to identify the right class. Moreover, the basic stopping rule of OMP, $\|r\|_2 < \epsilon_0$, where $r = y - Ax$ and $\epsilon_0$ is the error threshold, is often hard to satisfy because the test image may contain various kinds of noise or be disguised. Hence, more stopping rules should be devised to reduce the number of iterations. We now discuss this issue in two contexts: well-aligned, noiseless input samples, and corrupted or occluded images.

Maximum Coefficient of the Sparse Solution.
Suppose the input image contains little noise and is well aligned; then it is easy for the system to classify the sample, and what we should do is increase the identification speed. Each coefficient $x_{i,j}$ of the sparse solution $x$ reflects the degree of resemblance between the test sample $y$ and the corresponding column $v_{i,j}$ of $A$. If the maximum coefficient is large enough, in other words, close to 1, we can consider that $y$ belongs to the class of that column $v_{i,j}$, since $y$ and $v_{i,j}$ are already sufficiently similar. We do not need to represent the minor remaining errors as a linear combination of other columns, because the small coefficients are meaningless for classification and doing so would only increase the number of iterations.
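The maximum-coefficient rule can be sketched as a check on the current solution. This assumes unit-norm columns and a unit-norm $y$, so that a near-exact match yields a coefficient close to 1; the helper name `max_coefficient_stop`, the `labels` array, and the default threshold of 0.9 are all hypothetical.

```python
import numpy as np

def max_coefficient_stop(x, labels, threshold=0.9):
    """Return (stop, class) for the maximum-coefficient rule.

    Assumes unit-norm columns and y; threshold=0.9 is a hypothetical value.
    """
    j = int(np.argmax(np.abs(x)))
    return bool(abs(x[j]) >= threshold), labels[j]
```

For example, a solution `[0.02, 0.97, 0.0, 0.01]` whose first two columns belong to class 0 stops immediately and reports class 0.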

Sparse Level of the Sparse Solution.
For corrupted or disguised test samples, however, large coefficients in the sparse solution are hard to reach owing to the relatively large amount of noise. But the special nature of recognition ensures that the system can find the class of $y$ before the representation error reaches the upper bound we have set. One of the key points is the sparse level of the sparse solution. We define the sparse level as the ratio of the two largest coefficients that belong to different classes,
$$s = \frac{x_a}{x_b},$$
where $x_a$ is the largest coefficient, which belongs to class $i$, and $x_b$ is the second largest coefficient whose class is different from class $i$.
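A minimal sketch of the sparse level computation follows. Note that the competitor is the largest coefficient from a *different* class, not simply the second largest overall. Comparing magnitudes rather than signed values is an assumption, as are the helper name `sparse_level` and the `labels` array.

```python
import numpy as np

def sparse_level(x, labels):
    """Ratio of the largest coefficient to the largest one from another class.

    Compares magnitudes (an assumption). Returns (ratio, class_of_largest);
    the ratio is infinite if no other class has a nonzero coefficient.
    """
    mags = np.abs(x)
    j = int(np.argmax(mags))
    cls = labels[j]
    other = mags[labels != cls]           # coefficients of all other classes
    second = other.max() if other.size else 0.0
    ratio = np.inf if second == 0 else mags[j] / second
    return float(ratio), cls
```

With `x = [0.6, 0.3, 0.2, 0.1]` and `labels = [0, 0, 1, 2]`, the value 0.3 is skipped because it shares class 0 with the maximum, giving $s = 0.6 / 0.2 = 3$.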

The Noise Level.
We define $\eta(y)$ as the maximal inner product between the columns of $A$ and the test sample $y$, which reflects the maximum correlation between $y$ and the face images in the training dictionary $A$. If $\eta(y)$ is smaller than some predefined threshold, the similarity between $y$ and every column of $A$ is below that threshold, which indicates that the input sample is either invalid or contains much additional noise. If the test sample is valid, it is inefficient and laborious for the system to process $y$ directly, since our goal is to find the image in the dictionary that resembles $y$, but $y$ itself is not precise enough. Both random noise and occluded regions can accordingly be regarded as noise in the test image. So $\eta(y)$ indicates the noise level of the input image.
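The noise-level measure can be sketched as the largest normalized correlation between $y$ and any training image. Normalizing both the columns and $y$ (so the value lies in $[0, 1]$) and taking absolute values are assumptions; the paper only speaks of the maximal inner product. The helper name `noise_level` is hypothetical.

```python
import numpy as np

def noise_level(A, y):
    """Maximal correlation between y and the columns of A.

    Columns of A and y are normalized to unit l2 norm first (an assumption),
    so the value lies in [0, 1]; small values suggest an invalid or noisy y.
    """
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    yn = y / np.linalg.norm(y)
    return float(np.max(np.abs(An.T @ yn)))
```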

Stopping Rules for Well-Aligned Input Samples with Little Noise.
The entries of the sparse solution $x$ reflect the respective contributions of the columns of $A$, since $x$ contains all coefficients of the linear combination. The matching pursuit algorithm updates the support and the solution by minimizing the errors
$$\epsilon(j) = \|z_j a_j - r^{t-1}\|_2^2,$$
where $a_j$ is the $j$th column of $A$, $r^{t-1} = y - Ax^{t-1}$, and $z_j = a_j^T r^{t-1} / \|a_j\|_2^2$. For noisy or misaligned input samples, the coefficients of $x$ decrease, so the largest-entry threshold cannot be reached. But if the corrupted percentage of the input image $y$ is not too large, the coefficients corresponding to the columns in the same class as $y$ are still relatively large compared with the others. This means that the solution is still quite sparse after a few iterations, and the class of the largest entry can be taken as the identification result. Using the sparse level defined above, the iteration can be stopped when a preset sparse-level threshold is reached after some iterations.
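Proof. It can be proved that for a system of linear equations $y = Ax$, if a solution $x$ exists obeying
$$\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right),$$
where $\mu(A)$ is the mutual coherence of $A$, then OMP run with threshold parameter $\epsilon_0 = 0$ is guaranteed to find it exactly. This theorem is valid only when the test sample $y$ can be represented exactly as a linear combination of the columns of $A$. Suppose that $\eta(y)$, the maximal inner product between $y$ and the columns of $A$, is equal to or greater than some value close to 1, that after some iterations the sparse level of the solution is still greater than the preset threshold, and that we reserve only the $(1 + 1/\mu(A))/2$ largest entries; hence $x = (0, 0, \ldots, x_{j_1}, 0, \ldots, x_{j_2}, \ldots)^T$.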
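It can also be proved that if a sparse vector $x$ satisfies the sparsity constraint $\|x\|_0 < (1 + 1/\mu(A))/2$ and gives a representation of $y$ to within error tolerance $\epsilon$ (i.e., $\|y - Ax\|_2 \le \epsilon$), every solution $\hat{x}$ of $(P_0^{\epsilon}): \min \|x\|_0$ subject to $\|y - Ax\|_2 \le \epsilon$ must obey
$$\|\hat{x} - x\|_2^2 \le \frac{4\epsilon^2}{1 - \mu(A)\left(2\|x\|_0 - 1\right)}.$$
Note that, as we reconstruct $y$ as one image in $A$, the smallest error between $y$ and the recovered image is determined by $\eta(y)$, the maximal inner product between $y$ and the columns of $A$, so $\epsilon$ is larger than $1 - \eta(y)$. Therefore,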

Processing for Images under Random Corruption and Contiguous Occlusion
Almost all sparse representation algorithms, being based on pixel-wise errors at corresponding positions, are brittle when coping with samples under large levels of occlusion or corruption [7,16].
Figure 2: Error between an occluded image and a "clean" one. We reshape the pixels of the two samples (a) into vectors and calculate the difference (b). The error distribution is comb-like, which indicates that the error is concentrated in a few pixels while the others are "clean."

Necessity of Preprocessing of Test Samples and Training Dictionary.
When the test sample $y$ is seriously occluded, the error between $y$ and each column $a_j$ of $A$ increases, which means that every error $e_j = y - a_j$ grows. Hence, both the largest coefficient and the sparse level after a fixed number of iterations can hardly reach their respective predetermined thresholds. If we use $y$ and $A$ directly, as shown in Figure 1, the solution we obtain can be dense, which undermines the advantage of sparse representation. It is therefore necessary to preprocess the input sample $y$ and the training dictionary $A$ first.
It has been proposed that this kind of problem can be solved by block partitioning, that is, partitioning the image into blocks and processing each block independently, after which the results for the individual blocks are aggregated. However, this method is valid only for images under contiguous occlusion, and the processing takes quite a long time, since it transforms the classification into several subproblems. Consider how the human brain handles face images disguised by a scarf or glasses: we recognize the scarf or glasses in the image and then ignore those parts, which are unrelated to the face, and our judgment of who the person is depends on the remaining parts, which we regard as the face. Imitating the way the human brain deals with such images, we propose to preprocess the input image and the training matrix before applying them to the OMP algorithm. Since both random corruption and contiguous occlusion can be regarded as noise in the image, we can uniformly reject these parts and attend only to the rest.

How to Identify a Corrupted or Occluded Test Sample Automatically.
First, the system should automatically determine whether the test image is partially occluded or corrupted. For both an occluded or corrupted image of a class in the training dictionary and an invalid image, the error with respect to the dictionary is quite large. But the error of an occluded image has a distinguishing feature: it is concentrated in only a few pixels, while the error at the other pixels is quite small. So if the error vector contains some elements that are close to 0 and its variance is large enough, $\sigma_e^2 > \sigma_0^2$, where $\sigma_0^2$ is a preset threshold, we can regard the test sample as partially occluded or corrupted, as shown in Figure 2.

Image Preprocessing for Corrupted or Occluded Samples.
Next, let us discuss how to extract the "clean" pixels and remove the invalid ones. To begin with, we define an error threshold $\tau$ between a particular pixel in the test sample $y$ and the respective pixels of the columns of $A$. If the minimal error between the pixel in $y$ and the respective pixel in one class of $A$ is larger than $\tau$, we regard this pixel as invalid and remove it from $y$, as well as the respective row from $A$. We thus obtain a new training matrix $A_1$ and test vector $y_1$ whose "noise pixels" have been filtered out, and the problem is transformed into finding a sparse solution $x_1$ subject to
$$y_1 = A_1 x_1.$$
The improved OMP algorithm can then be applied directly to this newly generated equation. One example is given in Figure 3. This preprocessing enhances the recognition rate of the system. Since the method does not impose any constraints on the noise distribution, random corruption and contiguous occlusion can be handled simultaneously. "Shrinking" $y$ and $A$ also makes the computation faster.

Experimental Results
In this section, we apply our algorithm to face recognition on the ORL, Yale, and FERET databases. We first test the recognition rate and elapsed time of the algorithm, comparing it with state-of-the-art algorithms for finding sparse solutions. We then examine its identification performance under corruption and occlusion. Finally, we simulate real situations and check its robustness under various disguises.

Performance on ORL Database and Yale Database.
The ORL database consists of 400 frontal images of 40 individuals. Samples in this database include facial variations such as different expressions and poses, which can make it challenging for the system to find the true class of the test images.
We test the face recognition rate and elapsed time of the algorithm by 10-fold cross-validation. In other words, for each test, the training dictionary consists of 40 classes of 360 images (9 samples per class), and the remaining 40 images are test samples. All images have simply been downsampled, without any particular feature extraction. We compared the results with the original OMP and with several state-of-the-art algorithms for sparse solutions: the primal augmented Lagrangian method (PALM), the dual augmented Lagrangian method (DALM), the fast iterative soft-thresholding algorithm (FISTA), and the truncated Newton interior-point method (TNIPM). The results of each test and the average are given in Table 1. As shown in Table 1, the recognition rate of the improved OMP is the highest, and its elapsed time per sample is also better than those of the others.
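The fold sizes quoted for ORL (and for Yale below) follow from holding out an equal share of every class in each fold. The paper does not say how samples were assigned to folds, so the ordering used here (the $f$th block of each class's samples goes to fold $f$) is an assumption, and `per_class_folds` is a hypothetical helper.

```python
import numpy as np

def per_class_folds(labels, n_folds):
    """Yield (train_idx, test_idx), holding out an equal share of every class.

    Assumes each class has a multiple of n_folds samples, taken in order.
    """
    labels = np.asarray(labels)
    for f in range(n_folds):
        test = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            per = len(idx) // n_folds            # test samples per class
            test.extend(idx[f * per:(f + 1) * per])
        test = np.array(test)
        train = np.setdiff1d(np.arange(len(labels)), test)
        yield train, test
```

For ORL (40 classes of 10 images, 10 folds) each fold has 360 training and 40 test images; for Yale (38 classes of 64 images, 8 folds) each fold has $38 \times 56 = 2128$ training and 304 test images.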
We also compared the algorithm with the others on the Yale database, which contains 2432 frontal images of 38 individuals captured under various laboratory-controlled lighting conditions. 8-fold cross-validation was used on this database, with 56 images of each class as training samples and the other 304 images as test samples. As in the tests on the ORL database, the images were only downsampled to construct the training dictionary, so that the problem $y = Ax$ is underdetermined. The results of the 8 tests and the average values are given in Table 2.
In Table 2, although the recognition rate of FISTA is higher than that of our algorithm, its run time is almost ten times that of the improved OMP.
Similar experiments were performed on the FERET database. Compared with the other face databases mentioned above, this database includes more variation in pose and facial expression. We ran 7-fold cross-validation, using 150 classes (6 samples per class) as the training set and the other 150 samples as test samples in each test. Table 3 gives the comparison of these algorithms.
In Table 3, we can see that the average recognition rate of the improved OMP is the highest on the FERET database and its run time is much shorter.
We obtained these data in MATLAB on a typical 2.40 GHz PC with a quad-core processor. To be fair, the training dictionary and the test samples are the same for all algorithms, and every identification result is the class of the largest coefficient in the sparse solution. The improved OMP algorithm greatly reduces the run time, and its recognition rates are relatively good.

Recognition under Random Corruption.
We first test the robust version of our algorithm on samples under random corruption. We add salt-and-pepper noise of different intensities to samples in the database to generate the test samples. Figure 4 plots the recognition results of the robust version of OMP and of applying the original OMP directly to the test samples.
As Figure 4 shows, when the face image is 90% corrupted by noise, although we can hardly identify it as a face, the algorithm still reconstructs the right image. The line graph on the right of Figure 4 indicates that the method performs quite well under large percentages of random corruption.
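Test images with a given corruption level can be generated along these lines. This is a sketch, not the authors' procedure: it assumes images are float arrays scaled to $[0, 1]$ and that each corrupted pixel is set to 0 or 1 with equal probability; `salt_and_pepper` is a hypothetical helper.

```python
import numpy as np

def salt_and_pepper(img, fraction, rng=None):
    """Corrupt `fraction` of the pixels, setting each to 0 or 1 at random.

    img is assumed to be a float array scaled to [0, 1]; the input is not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = img.astype(float).copy()
    flat = out.reshape(-1)                       # view into the contiguous copy
    n = int(round(fraction * flat.size))
    idx = rng.choice(flat.size, size=n, replace=False)
    flat[idx] = rng.integers(0, 2, size=n).astype(float)
    return out
```

Calling it with `fraction=0.3`, `0.6`, and `0.9` would reproduce the three corruption levels shown in Figure 4.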

Recognition under Continuous Occlusion.
We add an irrelevant image of different sizes to the samples in the database and use them as test images to evaluate robustness under continuous occlusion. Figure 5 shows the performance.
We can see on the right of Figure 5 that the improved OMP significantly outperforms the original one, which shows its robustness to occlusion.
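Contiguous occlusion of this kind can be simulated by pasting an unrelated image patch onto a copy of the test image. The helper name `occlude` and the explicit `top`/`left` placement are hypothetical; the paper does not state where the occluding image was placed.

```python
import numpy as np

def occlude(img, patch, top, left):
    """Paste `patch` (an unrelated image) onto a copy of `img` at (top, left)."""
    h, w = patch.shape
    if top + h > img.shape[0] or left + w > img.shape[1]:
        raise ValueError("patch does not fit inside the image")
    out = img.astype(float).copy()
    out[top:top + h, left:left + w] = patch
    return out
```

The occluded fraction is simply the patch area over the image area, so a 4 x 4 patch on an 8 x 8 image covers 25% of the pixels.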

Recognition despite Disguise.
Face photos taken in real-world scenarios often contain glasses and scarves, which makes it harder for the system to identify the right person. Let us now examine the performance of the algorithm in such situations. Our test images are again from the ORL database, and we add pictures of glasses and scarves to the samples.
Figure 6 shows that the algorithm also performs quite well in this realistic situation with disguised test samples. We constructed a disguised test set of 40 samples with sunglasses and scarves based on the ORL database, and the recognition rate reached 95%.

Conclusions
In this paper, we proposed a fast and robust algorithm based on the OMP algorithm. We first discussed the disadvantages of the OMP algorithm for solving the face recognition problem. Then an improved method was proposed that greatly shortens the elapsed time needed to identify a test image. We also enhanced robustness to occluded and corrupted test samples by extracting the "noiseless" pixels and reducing the elements of both the test image and the training dictionary accordingly. Finally, we validated this method by experiments on the ORL, Yale, and FERET databases.
One direction for further work is to enhance robustness under various kinds of misalignment and poses. We may also further relax the constraints and apply this method to object recognition.

Figure 1: (a) Input sample with contiguous occlusion. (b) Sparse solution obtained by the OMP algorithm. (c) Reconstructed image, whose class is that of the largest entry of the sparse solution. The solution is not sparse enough, and the occluded part of $y$ leads to a wrong answer.

Figure 3: Applying the occluded image (top left) directly to the algorithm, we get the wrong answer because part of the noise is too large for the system. By rejecting some pixels in the test image and shrinking the training matrix accordingly, the right person can be recognized.

Figure 4: Recognition under random corruption. Left: (a) test images $y$ from the ORL database, with random corruption. Top row: 30% of pixels corrupted; middle row: 60% corrupted; bottom row: 90% corrupted. (b) Estimated sparse coefficients. (c) Reconstructed images. Our algorithm outperforms the original OMP.

Figure 6: Recognition despite disguise such as sunglasses, scarves, or respirators. (a) Test images with sunglasses or a scarf. (b) Estimated sparse coefficients. (c) Reconstructed images. (d) Test images with sunglasses and a scarf. (e) Estimated sparse coefficients. (f) Reconstructed images.

Table 1: Comparison on ORL database.

Table 2: Comparison on Yale database.