Thresholded Two-Phase Test Sample Representation for Outlier Rejection in Biological Recognition

The two-phase test sample representation (TPTSR) method was proposed as a useful classifier for face recognition. However, the TPTSR method cannot reject impostors, so it must be modified for real-world applications. This paper introduces a thresholded TPTSR (T-TPTSR) method for complex object recognition with outliers and defines two criteria for assessing the performance of outlier rejection and member classification. The performance of the T-TPTSR method is compared with that of modified (thresholded) versions of the global representation, PCA, and LDA methods. The results show that the T-TPTSR method achieves the best performance among them according to both criteria.


Introduction
Object recognition has become an active topic in computer vision and pattern recognition in recent years, and many approaches have been proposed for classifying face images from a given database. One type of method reduces the dimensionality of the samples by extracting feature vectors with linear transformations, such as principal component analysis (PCA) [1][2][3] and linear discriminant analysis (LDA) [4,5]. In PCA, the training and testing samples are transformed from the original sample space into a space that retains the maximum variance of all the samples, whereas LDA maps the samples to a feature space in which the between-class scatter is maximized relative to the within-class scatter. In both methods, the training and testing samples have corresponding representations in the new feature space, and classification is based on the distance between the representations of the training samples and the testing sample.
Another type of transformation-based method focuses on local information in the training set. Instead of using the whole training set, such methods use only part of the samples, because the performance of a classifier is usually determined by a local region of the sample space. By concentrating on the local distribution of the training data, the design and testing of the classifier can be much more efficient than with global methods [6]. Typical examples of local LDA methods include the method for multimodal data projection [7,8] and the approach that uses local dependencies of samples for classification [9]. Local PCA has also been found to be more efficient than global PCA in feature extraction [10] and sample clustering [11].
In recent years, sparse representation theory has been applied to pattern recognition problems and has drawn considerable attention [12][13][14][15][16][17][18][19][20][21]. The sparse representation method also uses only part of the training data for classification: it linearly represents a testing sample with the training set while setting many of the linear combination coefficients to zero. The testing sample is then assigned to the class that makes the largest contribution to this linear representation.
In a recent study, a two-phase test sample representation (TPTSR) method was proposed for face recognition [22]. In this method, classification is divided into two steps: the first step selects the M nearest neighbors of the testing sample from the training set by using a linear representation, and the second step linearly represents the testing sample again using only the selected samples. The classification result is based on the linear contributions of the classes among the M nearest neighbors in the second phase of the TPTSR. By selecting the M closest neighbors from the training set for further processing, the TPTSR method identifies a local area that is likely to contain samples of the target class, reducing the risk of misclassification caused by a similar nontarget sample.
Although the TPTSR method has proven very useful in face classification, face recognition applications with outliers place a different emphasis on classification and require a new performance criterion. In face recognition problems with outliers, such as security registration systems, only a small, specific group of members must be classified, and they must be distinguished from a large population of irrelevant people or intruders. Similarly, when identifying wanted criminals at airports, train stations, and other public places, the classifier must identify a small number of target members among a large number of irrelevant passengers. Previous approaches to pattern classification with outliers fall into two main categories: training the classifier with member samples only, or treating a small number of outliers as a separate class in the training set [23]. However, neither approach can guarantee a low false alarm rate while maintaining a reasonable recognition rate for members.
In this paper, we further develop the TPTSR method by applying a threshold in the classification process for outlier rejection and member classification; we refer to the result as the thresholded TPTSR (T-TPTSR) method. In the T-TPTSR, the distance between the testing sample and the weighted contribution of the target class in the second-phase linear representation is measured and compared with a threshold, and a testing sample whose distance reaches or exceeds the threshold is identified as an outlier. We also propose two criteria for assessing the performance of a classifier in outlier rejection as well as member classification and, based on these criteria, we test the thresholded global representation (T-GR), thresholded PCA (T-PCA), and thresholded LDA (T-LDA) methods. The test results show that the T-TPTSR achieves better performance in rejecting outliers while maintaining a high classification rate for members.
In Sections 2 and 3 of this paper, we will introduce the theory of the T-TPTSR, T-GR, T-PCA, and T-LDA, respectively. Section 4 presents our experimental results with different face image databases, and finally a conclusion will be drawn in Section 5.

Thresholded Two-Phase Test Sample Representation (T-TPTSR)
In this section, the T-TPTSR method is introduced, with a threshold applied to the second-phase output in the classification process.

First Phase of the T-TPTSR with M-Nearest Neighbor Selection.
The first phase of the T-TPTSR selects the M nearest neighbors of the testing sample from all the training samples for further processing in the second phase, narrowing the sample space down to a local area around the target class [22]. The M nearest neighbors are selected by calculating a weighted distance between the testing sample and each of the training samples. Assume that there are L classes and N training images $x_1, x_2, \ldots, x_N$, and that if some of these images are from the jth class (j = 1, 2, ..., L), then j is their class label. It is also assumed that a test image y can be written as a linear combination of all the training samples:

$$y = a_1 x_1 + a_2 x_2 + \cdots + a_N x_N, \tag{1}$$

where $a_i$ (i = 1, 2, ..., N) is the coefficient of training image $x_i$. In matrix form, (1) can be written as

$$y = XA, \tag{2}$$

where $A = [a_1, \ldots, a_N]^T$ and $X = [x_1, \ldots, x_N]$. If $X^T X$ is nonsingular, (2) can be solved by $A = (X^T X)^{-1} X^T y$; otherwise, it can be solved by

$$A = (X^T X + \mu I)^{-1} X^T y,$$

where μ is a small positive constant and I is the identity matrix. In our experiments with the T-TPTSR method, μ is set to 0.01.
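The regularized first-phase solve can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `first_phase_coefficients`, the one-image-per-column layout of `X`, and the unit-norm columns in the toy example are assumptions; only the formula and μ = 0.01 come from the text.

```python
import numpy as np

def first_phase_coefficients(X, y, mu=0.01):
    """Coefficients A of y ~ X A, using A = (X^T X + mu I)^{-1} X^T y.

    X  : (n, N) array holding one vectorized training image per column.
    y  : (n,) vectorized test image.
    mu : small positive regularization constant (0.01 in the text).
    """
    N = X.shape[1]
    # Solve the regularized normal equations instead of forming the inverse.
    return np.linalg.solve(X.T @ X + mu * np.eye(N), X.T @ y)

# Toy usage: y is built from the first two training columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X /= np.linalg.norm(X, axis=0)   # unit-norm columns (hypothetical preprocessing)
y = 0.7 * X[:, 0] + 0.3 * X[:, 1]
A = first_phase_coefficients(X, y)
```

Because μ > 0, the matrix $X^T X + \mu I$ is always positive definite and hence invertible, which is why the regularized form is convenient when $X^T X$ is singular or ill-conditioned.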
By solving (2), we represent the testing image y as the linear combination of the training set in (1); that is, the testing image is approximated by the weighted sum of all the training images, and each weighted image $a_i x_i$ is one part of this approximation. To measure the distance between the training image $x_i$ and the testing image y, the following distance metric is defined:

$$e_i = \lVert y - a_i x_i \rVert^2, \tag{3}$$

where $e_i$ is called the distance function and gives the difference between the testing sample y and the training sample $x_i$. A smaller value of $e_i$ means that the ith training sample is closer to the testing sample and is more likely to be a member of the target class. The M training samples with the smallest $e_i$ are chosen as the M nearest neighbors and are processed further in the second phase of the T-TPTSR, where the final decision is made within a much smaller sample space. We denote the selected M nearest neighbors by $\tilde{x}_1, \ldots, \tilde{x}_M$ and their class labels by $C = \{c_1, \ldots, c_M\}$, where $c_i \in \{1, 2, \ldots, L\}$. In the second phase of the T-TPTSR, a class whose label does not belong to C is not considered as a target class; only a class in C is regarded as a potential target class.
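The neighbor selection based on $e_i = \lVert y - a_i x_i \rVert^2$ can be sketched as below. It is an illustrative reading of the first phase under assumed conventions (columns of `X` are samples, `labels` is a NumPy integer array, and `select_m_nearest` is a hypothetical name), not code from the paper.

```python
import numpy as np

def select_m_nearest(X, labels, y, M, mu=0.01):
    """First phase: indices and labels of the M samples with the smallest e_i.

    e_i = ||y - a_i x_i||^2, with a = (X^T X + mu I)^{-1} X^T y.
    Returns (indices, their labels, all e_i).
    """
    N = X.shape[1]
    a = np.linalg.solve(X.T @ X + mu * np.eye(N), X.T @ y)
    # Column i of X * a is a_i x_i, so e holds one deviation per training sample.
    e = np.sum((y[:, None] - X * a) ** 2, axis=0)
    nearest = np.argsort(e)[:M]          # M smallest deviations, best first
    return nearest, labels[nearest], e
```

Only the classes that appear in the returned labels (the set C) are candidates in the second phase.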

Second Phase of the T-TPTSR for Outlier Rejection.
In the second phase of the T-TPTSR method, the M nearest neighbors selected in the first phase are processed further to obtain the final decision for the recognition task. We again represent the testing sample as a linear combination of training samples, but now only with the M nearest neighbors $\tilde{x}_1, \ldots, \tilde{x}_M$ selected in the first phase:

$$y = b_1 \tilde{x}_1 + b_2 \tilde{x}_2 + \cdots + b_M \tilde{x}_M, \tag{4}$$

where $b_i$ (i = 1, 2, ..., M) are the coefficients. In matrix form, (4) can be written as

$$y = \tilde{X} B, \tag{5}$$

where $B = [b_1, \ldots, b_M]^T$ and $\tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_M]$. Following the same reasoning as above, if $\tilde{X}$ is a nonsingular square matrix, (5) can be solved by

$$B = \tilde{X}^{-1} y; \tag{6}$$

otherwise, B can be solved by

$$B = (\tilde{X}^T \tilde{X} + \gamma I)^{-1} \tilde{X}^T y, \tag{7}$$

where γ is a small positive constant, usually set to 0.01, and I is the identity matrix.
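The case distinction between (6) and (7) can be written as a small helper. This sketch assumes the neighbors are stored as the columns of a NumPy array and uses a numerical rank test for nonsingularity; the name `second_phase_coefficients` is hypothetical.

```python
import numpy as np

def second_phase_coefficients(X_nb, y, gamma=0.01):
    """Coefficients B of y ~ X_nb B over the M selected neighbors.

    Uses B = X_nb^{-1} y (Eq. (6)) when X_nb is square and nonsingular, and
    B = (X_nb^T X_nb + gamma I)^{-1} X_nb^T y (Eq. (7)) otherwise.
    """
    n, M = X_nb.shape
    if n == M and np.linalg.matrix_rank(X_nb) == M:
        return np.linalg.solve(X_nb, y)
    return np.linalg.solve(X_nb.T @ X_nb + gamma * np.eye(M), X_nb.T @ y)
```

For face images, $\tilde{X}$ is usually tall (for example, 60 neighbors of 1600-pixel images in the Feret test), so the regularized branch is the one normally taken.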
Once the coefficients of the M nearest neighbors are obtained, the contribution of each class to the testing image is measured, and the classification output is based on the distance between this contribution and the testing image. If the nearest neighbors $\tilde{x}_s, \ldots, \tilde{x}_t$ are from the rth class ($r \in C$), the linear contribution of this class to the approximation of the testing sample is defined as

$$g_r = b_s \tilde{x}_s + \cdots + b_t \tilde{x}_t. \tag{8}$$

The distance between the testing sample and the rth-class samples among the M nearest neighbors is measured by the deviation of $g_r$ from y:

$$D_r = \lVert y - g_r \rVert^2, \quad r \in C. \tag{9}$$

A smaller value of $D_r$ means that the training samples from the rth class approximate the testing sample better, so the rth class is more likely than the other classes to be the target class. However, when outliers are considered, a threshold must be applied to the classification output to distinguish members of class k from outliers:

$$D_k = \min_{r \in C} D_r < T, \tag{10}$$

where T is the threshold. If $D_k \ge T$, the testing sample is regarded as an outlier and rejected. Only when $D_k < T$ is the testing sample classified into the kth class, which has the smallest deviation from y.
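Putting (1) to (10) together, the whole T-TPTSR decision can be sketched as below. This is our illustration rather than the authors' code: for brevity it always uses the regularized second-phase solution (7), and the function name, the argument layout, and the convention of returning `None` for a rejected sample are hypothetical. It also assumes the samples are already normalized, as the text requires.

```python
import numpy as np

def t_tptsr_classify(X, labels, y, M, T, mu=0.01, gamma=0.01):
    """Sketch of T-TPTSR. Returns (label or None, D_min); None means outlier."""
    N = X.shape[1]
    # Phase 1: represent y with all training samples, keep the M smallest e_i.
    a = np.linalg.solve(X.T @ X + mu * np.eye(N), X.T @ y)
    e = np.sum((y[:, None] - X * a) ** 2, axis=0)
    nb = np.argsort(e)[:M]
    Xn, ln = X[:, nb], labels[nb]
    # Phase 2: represent y again using only the M neighbors (regularized form).
    b = np.linalg.solve(Xn.T @ Xn + gamma * np.eye(M), Xn.T @ y)
    # Class contribution g_r and deviation D_r = ||y - g_r||^2 for each r in C.
    D = {}
    for r in np.unique(ln):
        mask = ln == r
        g_r = Xn[:, mask] @ b[mask]
        D[r] = float(np.sum((y - g_r) ** 2))
    k = min(D, key=D.get)
    # Threshold rule: accept class k only if D_k < T.
    return (k if D[k] < T else None), D[k]
```

A test image orthogonal to every training image gives coefficients close to zero, so every $g_r$ is close to zero and $D_k \approx \lVert y \rVert^2 = 1$, which any threshold below 1 rejects.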
In the second phase of the T-TPTSR, the solution (6) or (7) gives the coefficients of the linear combination of the M nearest neighbors that approximates the testing sample, and the class with the minimum deviation is taken as the target class for the testing sample, provided that this minimum deviation is less than the threshold T. If the minimum distance between the testing sample and the class approximations is greater than or equal to T, the testing sample is classified as an outlier and rejected. Consequently, if the minimum deviation for an outlier is less than T, the outlier is classified into the member class with the minimum deviation, and a misclassification occurs. Likewise, if a testing image belongs to a member class but its minimum deviation is greater than or equal to T, the image is classified as an outlier, resulting in a false alarm. Since all the samples used in the T-TPTSR method are normalized in advance, the value of $D_r$ in (9) lies within a limited range, $0 \le D_r \le D_{\max}$ with $D_{\max} \approx 1$; it is therefore practical to determine a suitable threshold for the identification task before testing.

The T-GR, T-PCA, and T-LDA Methods for Outlier Rejection
For comparison with the T-TPTSR method, this section introduces modified versions of the GR, PCA, and LDA methods for outlier rejection and member classification in face recognition.

The T-GR Method.
The thresholded global representation (T-GR) method is essentially the T-TPTSR method in which all the training samples are selected as the nearest neighbors (M = N); it therefore finds the target class directly as the class that best represents the testing image.
In the T-GR method, the testing sample is represented by the linear combination of all the training samples, and classification depends not only on which class has the minimum deviation of its linear contribution from the testing sample but also on the value of that minimum deviation. If the minimum deviation is greater than or equal to the applied threshold, the testing sample is identified as an outlier.
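Because T-GR is T-TPTSR with M = N, the neighbor selection disappears and a single representation over all training samples suffices. The sketch below assumes the same conventions as a column-per-sample NumPy layout with a regularization constant of 0.01; the function name `t_gr_classify` is hypothetical.

```python
import numpy as np

def t_gr_classify(X, labels, y, T, mu=0.01):
    """Sketch of T-GR: one representation of y over all N training samples,
    followed by the thresholded class decision.

    Returns (label, D_min); label is None when D_min >= T (outlier).
    """
    N = X.shape[1]
    a = np.linalg.solve(X.T @ X + mu * np.eye(N), X.T @ y)
    D = {}
    for r in np.unique(labels):
        mask = labels == r
        # Deviation of y from the contribution of class r.
        D[r] = float(np.sum((y - X[:, mask] @ a[mask]) ** 2))
    k = min(D, key=D.get)
    return (k if D[k] < T else None), D[k]
```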

The T-PCA Method.
The PCA method linearly projects the image space onto a lower-dimensional feature space, and the projection directions are obtained by maximizing the total scatter across all the training classes [24,25]. Again, assume that there are L classes and N training images $x_1, x_2, \ldots, x_N$, each of which is n-dimensional, where N < n. If a linear transformation maps the original n-dimensional image space into an m-dimensional feature space, where m < n, the new feature vector $z_k \in \mathbb{R}^m$ can be written as

$$z_k = W^T x_k, \quad k = 1, 2, \ldots, N, \tag{11}$$

where $W \in \mathbb{R}^{n \times m}$ is a matrix with orthonormal columns. If the total scatter matrix is defined as

$$S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T, \tag{12}$$

where $\mu \in \mathbb{R}^n$ is the mean of all the training samples, then after the linear transformation W the scatter of the transformed feature vectors $z_1, z_2, \ldots, z_N$ is $W^T S_T W$, which is maximized by the projection

$$W_{\mathrm{opt}} = \arg\max_{W} \lvert W^T S_T W \rvert = [w_1, w_2, \ldots, w_m], \tag{13}$$

where $w_i$ (i = 1, ..., m) are the n-dimensional eigenvectors of $S_T$ corresponding to the m largest eigenvalues. During recognition, both the testing sample y and all the training samples are projected into the new feature space via $W_{\mathrm{opt}}$ before the distance between them is calculated:

$$d_k = \lVert W_{\mathrm{opt}}^T y - W_{\mathrm{opt}}^T x_k \rVert, \quad k = 1, 2, \ldots, N. \tag{14}$$

In the thresholded PCA method, the testing sample is classified into the class of the training sample with the minimum distance, but this distance must be less than the threshold T:

$$D = \min_{k} d_k < T, \quad k = 1, 2, \ldots, N;\ T \in [0, +\infty). \tag{15}$$

A testing sample whose minimum distance is greater than or equal to the threshold is classified as an outlier and rejected; otherwise, it is classified into the class of the training sample that attains the minimum distance.
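The T-PCA procedure can be sketched with NumPy as follows. The eigenvectors of $S_T$ are obtained here from an SVD of the mean-centred data, which yields the same directions without forming the n × n scatter matrix; this is an implementation choice of the sketch, not something stated in the text, and the function names are hypothetical.

```python
import numpy as np

def pca_projection(X, m):
    """W_opt: the top-m eigenvectors of the total scatter matrix S_T.

    The left singular vectors of the mean-centred data are eigenvectors of S_T,
    and NumPy returns singular values in descending order.
    """
    centred = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centred, full_matrices=False)
    return U[:, :m]

def t_pca_classify(X, labels, y, m, T):
    """Thresholded nearest neighbor in the PCA space.

    Returns (label of the nearest training sample, or None if rejected, D).
    """
    W = pca_projection(X, m)
    # d_k = ||W^T y - W^T x_k|| for every training sample x_k.
    d = np.linalg.norm(W.T @ X - (W.T @ y)[:, None], axis=0)
    k = int(np.argmin(d))
    return (labels[k] if d[k] < T else None), float(d[k])
```

Subtracting the mean before projection would not change $d_k$, because the same offset $W^T\mu$ cancels in every difference.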

The T-LDA Method.
LDA is a class-specific linear method for dimensionality reduction that allows simple classifiers to be used in the reduced feature space [26][27][28][29]. Like PCA, LDA finds directions onto which the training and testing images are projected into a lower-dimensional space, but it chooses them so that the ratio of the between-class scatter to the within-class scatter is maximized.
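Likewise, assume that there are L classes and N training images $x_1, x_2, \ldots, x_N$, each of which is n-dimensional, where N < n, and that the ith class contains $N_i$ samples (i = 1, 2, ..., L). The between-class scatter matrix can be written as

$$S_B = \sum_{i=1}^{L} N_i (\mu_i - \mu)(\mu_i - \mu)^T, \tag{16}$$

and the within-class scatter matrix can be defined as

$$S_W = \sum_{i=1}^{L} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T, \tag{17}$$

where $X_i$ is the set of samples in the ith class, $\mu_i$ is the mean image of the ith class, and μ is the mean of all the samples. Note that $S_W$ must be nonsingular in order to obtain the optimal projection matrix $W_{\mathrm{opt}}$ with orthonormal columns that maximizes the ratio of the determinant of the projected $S_B$ to that of the projected $S_W$:

$$W_{\mathrm{opt}} = \arg\max_{W} \frac{\lvert W^T S_B W \rvert}{\lvert W^T S_W W \rvert} = [w_1, w_2, \ldots, w_m], \tag{18}$$

where $w_i$ (i = 1, ..., m) are the n-dimensional generalized eigenvectors of $S_B$ and $S_W$ corresponding to the m largest generalized eigenvalues $\lambda_i$:

$$S_B w_i = \lambda_i S_W w_i, \quad i = 1, 2, \ldots, m. \tag{19}$$

Since there are at most L − 1 nonzero generalized eigenvalues, m can be at most L − 1. Because N < n, $S_W$ computed in the original image space has rank at most N − L and is therefore singular; in practice, the samples are usually first projected into a lower-dimensional PCA subspace, as in the Fisherface method, so that $S_W$ becomes nonsingular.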
The distance between the projections of the testing sample y and the training sample $x_k$ in the new feature space is calculated as

$$d_k = \lVert W_{\mathrm{opt}}^T y - W_{\mathrm{opt}}^T x_k \rVert, \quad k = 1, 2, \ldots, N. \tag{20}$$

If the projection of sample $x_k$ has the minimum distance from the projection of the testing sample y, the testing sample is classified into the same class as $x_k$, provided that

$$D = \min_{k} d_k < T, \quad k = 1, 2, \ldots, N;\ T \in [0, +\infty), \tag{21}$$

where T is a threshold used to screen out outliers. In the thresholded LDA method, the projection distance of a target member must be less than T; otherwise, the sample is classified as an outlier and rejected.
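A sketch of T-LDA is given below. It assumes a Fisherface-style pipeline (PCA to at most N − L dimensions so that $S_W$ is nonsingular, then LDA), which is our assumption rather than a detail given in the text, and it solves $S_B w = \lambda S_W w$ by whitening with the Cholesky factor of $S_W$. The function names are hypothetical. The resulting columns satisfy $w^T S_W w = 1$ rather than being Euclidean-orthonormal.

```python
import numpy as np

def lda_projection(X, labels, m):
    """Hypothetical Fisherface-style W_opt: PCA to at most N - L dims, then LDA.

    m should not exceed L - 1, the number of nonzero generalized eigenvalues.
    """
    n, N = X.shape
    classes = np.unique(labels)
    L = len(classes)
    centred = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centred, full_matrices=False)
    W_pca = U[:, : N - L]                      # fewer columns if n < N - L
    P = W_pca.T @ centred                      # samples in the PCA subspace (zero mean)
    k = W_pca.shape[1]
    S_B = np.zeros((k, k))
    S_W = np.zeros((k, k))
    for c in classes:
        Pc = P[:, labels == c]
        mc = Pc.mean(axis=1, keepdims=True)
        S_B += Pc.shape[1] * (mc @ mc.T)       # between-class scatter (overall mean is 0)
        S_W += (Pc - mc) @ (Pc - mc).T         # within-class scatter
    # S_B w = lambda S_W w  <=>  C^{-1} S_B C^{-T} v = lambda v, with S_W = C C^T, w = C^{-T} v.
    C_inv = np.linalg.inv(np.linalg.cholesky(S_W))
    evals, V = np.linalg.eigh(C_inv @ S_B @ C_inv.T)
    top = np.argsort(evals)[::-1][:m]          # m largest generalized eigenvalues
    return W_pca @ (C_inv.T @ V[:, top])

def t_lda_classify(X, labels, y, m, T):
    """Thresholded nearest neighbor in the LDA space; None means outlier."""
    W = lda_projection(X, labels, m)
    d = np.linalg.norm(W.T @ X - (W.T @ y)[:, None], axis=0)
    k = int(np.argmin(d))
    return (labels[k] if d[k] < T else None), float(d[k])
```

The threshold scale for T-LDA depends on how the discriminant vectors are normalized, so T has to be tuned per method rather than shared with T-TPTSR.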

Experimental Results
In this experiment, we test the performance of the T-TPTSR, T-GR, T-PCA, and T-LDA methods for outlier rejection and member classification. The first criterion for comparing these methods is the minimum overall classification error rate: for each method, an optimal threshold is found at which the overall classification error rate is minimized. The overall classification error rate is calculated from three types of classification errors: misclassifications among member classes (the testing sample is a member and its minimum deviation or distance D is less than T, but it is assigned to another class), misclassifications of a member into the outlier group (the testing sample is a member but D ≥ T), and misclassifications of outliers (the testing sample is an outlier but D < T, so it is wrongly accepted as a member). Let $\mathrm{ERR}_{\mathrm{overall}}(T)$ denote the overall classification error rate as a function of the threshold T; let $\mathrm{ERR}_{\mathrm{member}}(T)$ denote the error rate among members (the number of misclassified testing samples from the member group divided by the total number of testing samples from the member group); and let $\mathrm{ERR}_{\mathrm{outlier}}(T)$ denote the misclassification rate for outliers (the number of misclassified testing samples from the outlier group divided by the total number of testing outliers). Their relationship can be written as

$$\mathrm{ERR}_{\mathrm{overall}}(T) = \mathrm{ERR}_{\mathrm{member}}(T) + \mathrm{ERR}_{\mathrm{outlier}}(T). \tag{22}$$
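The three error types and their combination can be computed from per-sample classifier outputs as sketched below. The input format (minimum distance, label of the closest class, and true label with −1 marking outliers) is an assumed convention for illustration; `error_rates` is a hypothetical name.

```python
import numpy as np

def error_rates(d_min, nearest, truth, T):
    """ERR_member, ERR_outlier and ERR_overall (in %) at threshold T.

    d_min   : minimum deviation/distance of each test sample.
    nearest : label of the closest class for each test sample.
    truth   : true label of each test sample, with -1 marking outliers.
    """
    d_min, nearest, truth = map(np.asarray, (d_min, nearest, truth))
    accepted = d_min < T
    member = truth != -1
    # Member errors: rejected as an outlier, or accepted into the wrong class.
    member_err = member & (~accepted | (nearest != truth))
    # Outlier errors: an outlier accepted as a member.
    outlier_err = ~member & accepted
    err_member = 100.0 * member_err.sum() / member.sum()
    err_outlier = 100.0 * outlier_err.sum() / (~member).sum()
    return err_member, err_outlier, err_member + err_outlier
```

At T = 0 every sample is rejected, so this function returns 100% for members and 0% for outliers, matching the limiting behavior described for $\mathrm{ERR}_{\mathrm{member}}$ and $\mathrm{ERR}_{\mathrm{outlier}}$.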
Note that $\mathrm{ERR}_{\mathrm{member}}$ varies with the threshold T: when T = 0, $\mathrm{ERR}_{\mathrm{member}}$ is 100%, because every testing sample is rejected, and it generally decreases as T increases until it reaches a constant value. The classification error rate for outliers also depends on T; however, $\mathrm{ERR}_{\mathrm{outlier}} = 0\%$ when T = 0, and its value increases with T until it reaches 100%. The minimum of $\mathrm{ERR}_{\mathrm{overall}}(T)$ can therefore be found between T = 0 and the value of T at which $\mathrm{ERR}_{\mathrm{member}}(T)$ becomes constant or $\mathrm{ERR}_{\mathrm{outlier}}(T)$ reaches 100%:

$$\mathrm{ERR}_{\mathrm{opt}} = \min_{T} \mathrm{ERR}_{\mathrm{overall}}(T), \quad T \in [0, +\infty). \tag{23}$$
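In practice, $\mathrm{ERR}_{\mathrm{opt}}$ is found by sweeping T over a grid, as the experiments below do with steps of 0.1 or 0.5. The sketch assumes the same input convention as the error-rate computation (−1 marks an outlier) and a caller-supplied list of thresholds; `err_opt` is a hypothetical name.

```python
import numpy as np

def err_opt(d_min, nearest, truth, thresholds):
    """Return (ERR_opt, T_opt): the minimum ERR_overall over a threshold grid."""
    d_min, nearest, truth = map(np.asarray, (d_min, nearest, truth))
    member = truth != -1
    best = (np.inf, None)
    for T in thresholds:
        acc = d_min < T
        e_mem = 100.0 * (member & (~acc | (nearest != truth))).sum() / member.sum()
        e_out = 100.0 * (~member & acc).sum() / (~member).sum()
        if e_mem + e_out < best[0]:          # keep the first minimum found
            best = (e_mem + e_out, T)
    return best
```

For example, with minimum distances [0.1, 0.2, 0.4, 0.6, 0.9], correct nearest labels for the three members and two outliers at 0.6 and 0.9, the grid [0.0, 0.25, 0.5, 0.75, 1.0] gives $\mathrm{ERR}_{\mathrm{opt}} = 0$ at T = 0.5.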
$\mathrm{ERR}_{\mathrm{opt}}$ is an important criterion for the performance of a classifier in both outlier rejection and member recognition. Another criterion for measuring the performance of the thresholded classifiers is the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false alarm rate as the threshold T varies in thresholded classification for outlier rejection. We first define the true positive detection rate for outliers, $\mathrm{TPR}_{\mathrm{outlier}}(T)$, in terms of the classification error rate for outliers:

$$\mathrm{TPR}_{\mathrm{outlier}}(T) = 100\% - \mathrm{ERR}_{\mathrm{outlier}}(T), \quad T \in [0, +\infty). \tag{24}$$
We also define the false alarm rate in the member group as a function of the threshold, $\mathrm{ERR}_{\mathrm{FA}}(T)$, which is the number of members misclassified as outliers divided by the number of testing samples from the member group. An optimal classifier for outlier rejection and member classification needs a threshold at which $\mathrm{TPR}_{\mathrm{outlier}}(T)$ is maximized while $\mathrm{ERR}_{\mathrm{FA}}(T)$ is minimized. Therefore, the following function is defined for this measurement:

$$\Delta(T) = \mathrm{TPR}_{\mathrm{outlier}}(T) - \mathrm{ERR}_{\mathrm{FA}}(T), \quad T \in [0, +\infty). \tag{25}$$

Δ(T) should be maximized so that a classifier is optimized for both outlier rejection and member classification:

$$\Delta_{\mathrm{opt}} = \max_{T} \Delta(T), \quad T \in [0, +\infty), \tag{26}$$

and $\Delta_{\mathrm{opt}}$ is an important metric for comparing classifiers in outlier rejection. The minimum overall classification error rate $\mathrm{ERR}_{\mathrm{opt}}$ and the maximum difference $\Delta_{\mathrm{opt}}$ between the true positive outlier recognition rate and the false alarm rate are closely related performance metrics for a classifier with outlier rejection. The difference is that the overall classification error rate mainly reflects the efficiency of member classification, while Δ(T) and $\Delta_{\mathrm{opt}}$ show the performance of outlier rejection. In the following experiments, we test and compare $\mathrm{ERR}_{\mathrm{opt}}$ and $\Delta_{\mathrm{opt}}$ for the T-TPTSR, T-GR, T-PCA, and T-LDA methods and, based on these two criteria, identify the best classifier for outlier rejection and member classification.
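The difference between the outlier true positive rate and the member false alarm rate, and its maximum over a threshold grid, can be sketched as below. The input convention (−1 marks an outlier, a sample is rejected when its minimum distance is at least T) and the name `delta_opt` are assumptions for illustration.

```python
import numpy as np

def delta_opt(d_min, truth, thresholds):
    """Return (max of TPR_outlier - ERR_FA over the grid, T at the maximum), in %."""
    d_min, truth = np.asarray(d_min), np.asarray(truth)
    member = truth != -1
    best = (-np.inf, None)
    for T in thresholds:
        rejected = d_min >= T
        # TPR_outlier = 100% - ERR_outlier: fraction of outliers that are rejected.
        tpr_outlier = 100.0 * (~member & rejected).sum() / (~member).sum()
        # ERR_FA: fraction of members wrongly rejected as outliers.
        err_fa = 100.0 * (member & rejected).sum() / member.sum()
        delta = tpr_outlier - err_fa
        if delta > best[0]:
            best = (delta, T)
    return best
```

Unlike the overall error rate, this quantity ignores which member class is chosen, so it isolates the rejection behavior of the threshold.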
In our experiments, we test and compare the performance of the above methods on the online face image databases Feret [30,31], AR, and ORL. The training and testing sets are selected randomly for each individual. For each database, the people included are divided into two groups: a member group and an outlier group. For each individual in the member group, some images are selected as training samples, and the remaining images are used as testing samples. The outliers are assumed to lie outside the member group, so they have no training samples, and all of their images are included in the testing set.
We first test the above outlier rejection methods on the Feret database. From the 200 individuals in the database, 100 are randomly selected as members, and the remaining 100 individuals are outliers. For each of the 100 member classes, 4 of the 7 images are selected randomly as the training set, and the other 3 images are used for testing. For the 100 individuals in the outlier group, all 7 images of each person are used for testing. Therefore, there are 400 training images and 1000 testing images in this test; among the testing images, 300 are from the member group and 700 from the outlier group. Figure 1 shows some of the member and outlier images from the Feret database used in the test; all images have been resized to 40 × 40 pixels using a downsampling algorithm [34]. Because the Feret database contains many more classes than the ORL and AR databases, has fewer training images per class, and has lower image resolution, testing with the Feret database is more challenging, and its results are generally regarded as more convincing.
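The random member/outlier protocol can be sketched with the Python standard library. The function name, the (person, image index) representation, and the seed are hypothetical; only the counts come from the text (200 people, 100 members, 7 images per person, 4 for training). With the AR settings (120 people, 80 members, 26 images, 13 for training) the same function yields 1040 training, 1040 member-test, and 1040 outlier-test images.

```python
import random

def member_outlier_split(n_people, n_members, imgs_per_person, n_train, seed=0):
    """Randomly split people into members/outliers and member images into train/test.

    Returns three lists of (person, image_index) pairs; outliers contribute
    only test images, since they have no training samples.
    """
    rng = random.Random(seed)
    people = list(range(n_people))
    rng.shuffle(people)
    members, outliers = people[:n_members], people[n_members:]
    train, test_member, test_outlier = [], [], []
    for p in members:
        imgs = list(range(imgs_per_person))
        rng.shuffle(imgs)                      # random train/test split per member
        train += [(p, i) for i in imgs[:n_train]]
        test_member += [(p, i) for i in imgs[n_train:]]
    for p in outliers:
        test_outlier += [(p, i) for i in range(imgs_per_person)]
    return train, test_member, test_outlier

# Feret settings from the text: 400 training, 300 member-test, 700 outlier-test images.
train, test_member, test_outlier = member_outlier_split(200, 100, 7, 4)
```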
In the test of the T-TPTSR method with the Feret database, the number of nearest neighbors selected in the first phase is 60 (empirically, the optimal number is about 10-15% of the number of training samples; here 60/400 = 15%). For all methods, the threshold T is varied from 0, in steps of 0.1 or 0.5, up to a value at which $\mathrm{ERR}_{\mathrm{outlier}}$ reaches 100%, that is, at which all outliers are accepted as members. Figures 2(a) to 2(d) show the classification error rates of the four methods as functions of the threshold T. The $\mathrm{ERR}_{\mathrm{opt}}$ values of the T-TPTSR and T-GR methods are much lower than those of the T-PCA and T-LDA methods, and as T increases, the $\mathrm{ERR}_{\mathrm{member}}$ curves of the T-TPTSR and T-GR decrease from 100% to much lower constant values than those of the T-PCA and T-LDA. The second row of Table 1 lists the $\mathrm{ERR}_{\mathrm{opt}}$ values shown in Figure 2; the T-TPTSR method achieves the lowest overall classification error rate. Figure 3 shows the ROC curves of the T-TPTSR, T-GR, T-PCA, and T-LDA methods, and the third row of Table 1 gives the $\Delta_{\mathrm{opt}}$ values (the maximum difference between the outlier detection rate and the false alarm rate) shown in Figure 3. The T-TPTSR also has a higher $\Delta_{\mathrm{opt}}$ value than the other methods.

Computational and Mathematical Methods in Medicine

For the test with the AR database, we randomly selected 80 people as members, and the remaining 40 people are taken as outliers. For each member, 13 of the 26 images are selected randomly as the training set, and the other 13 images are included in the testing set. Hence, there are 1040 training images and 2080 testing images in this test; the testing set contains 1040 member images and 1040 outlier images. Figure 4 shows some of the member and outlier images from the AR database; the images for training and testing have been downsized to 40 × 50 pixels [34].
When we test the T-TPTSR method with the AR database, the number of nearest neighbors selected is 150. Tables 2 and 3 show the methods and the number of transform axes used in the same way.
It can be seen from the $\mathrm{ERR}_{\mathrm{opt}}$ and $\Delta_{\mathrm{opt}}$ values for the AR database that the T-TPTSR method outperforms the T-GR, T-PCA, and T-LDA methods in outlier rejection and member classification. We also test the above methods with the ORL face image database. The ORL database contains 40 classes in total; we randomly select 30 classes as members and use the remaining 10 classes as outliers. Figure 5 shows some sample images from the ORL database; the images used are resized to 46 × 56 pixels [34]. The number of nearest neighbors selected for the T-TPTSR method on the ORL database is 40. Table 3 gives the $\mathrm{ERR}_{\mathrm{opt}}$ and $\Delta_{\mathrm{opt}}$ values of the four methods. The T-TPTSR method again performs better than the T-GR, T-PCA, and T-LDA methods, which supports the conclusion that it is the best of the four methods for outlier rejection and member classification.
Note that in the tests with the AR and ORL databases, the performance of the T-TPTSR, T-GR, and T-PCA methods is comparable. This is because, when training samples are plentiful and image resolution is reasonable, the performance of the T-PCA method is close to that of the T-TPTSR and T-GR methods. However, when the methods are tested with a small number of training samples and low-resolution images, as in the Feret database, the advantage of the T-TPTSR method is evident.
The criterion we use to judge whether a sample is an outlier is the distance between the testing sample and the selected target class: if this distance is greater than or equal to the threshold, the sample is classified as an outlier. In the T-TPTSR method, the first phase finds a local distribution close to the testing sample within the wide sample space by selecting the M nearest samples. In the second phase, the testing sample is classified based on the distance between the testing sample and the closest class among the M nearest neighbors. If the testing sample is an outlier, the distance is therefore measured only within this local distribution, so the measurement is not confused by other training samples that happen to lie close to the outlier. By applying a suitable threshold, a classifier can reject outliers and classify members with the minimum overall classification error rate and the maximum gap between the outlier detection rate and the false alarm rate for members. The T-TPTSR method linearly represents the testing sample with the training samples, and the distance between the testing sample and the target class is measured as the difference between the testing sample and the weighted contribution of that class in the linear representation. In our tests, the T-TPTSR method achieves the best performance in both outlier rejection and member classification. This is because, in the T-TPTSR, the two-phase linear representation yields a closer approximation of the testing sample by the training samples. Thus, the distance between a member testing sample and its target class is reduced, while the distance between an outlier and the member classes is increased, leading to a lower overall classification error rate and a larger gap between the outlier recognition rate and the false alarm rate.

Conclusion
This paper introduces modified versions of four useful approaches in face recognition, the T-TPTSR, T-GR, T-PCA, and T-LDA methods, for outlier rejection as well as member classification. Their performance is tested on three online face image databases: Feret, AR, and ORL. The results show that the T-TPTSR method achieves the lowest overall classification error rate as well as the greatest difference between the outlier detection rate and the false alarm rate. Although the T-PCA method may achieve performance comparable to that of the T-TPTSR method under favorable sample conditions, its results are generally poor under unfavorable conditions, such as few training samples and low image resolution. The T-TPTSR method achieves the best performance in outlier rejection and member classification because of its two-phase linear representation of the testing sample with the training samples.