A Support Vector Data Description Committee for Face Detection

Face detection is a crucial prestage for face recognition and is often treated as a binary (face versus nonface) classification problem. While this strategy is simple to implement, face detection accuracy drops when the nonface training patterns are undersampled. To avoid this problem, we propose in this paper a one-class-learning-based face detector called the support vector data description (SVDD) committee, which consists of several SVDD members, each trained on a subset of the face patterns. Nonfaces are not required in the training of the SVDD committee; therefore, the face detection accuracy of the SVDD committee is independent of the nonface training patterns. Moreover, the proposed SVDD committee is also able to improve the generalization ability of the original SVDD when the face data set has a multicluster distribution. Experiments carried out on the extended MIT face data set show that the proposed SVDD committee achieves better face detection accuracy than the widely used SVM face detector and performs better than other one-class classifiers, including the original SVDD and kernel principal component analysis (Kernel PCA).


Introduction
Face detection plays a key role in human-computer interaction since it is a prior step to face recognition. Given an image, the objective of face detection is to locate the faces in the image and return the location of each face. Due to complex backgrounds, variations in facial details, and lighting conditions, face detection has been considered one of the most challenging pattern recognition problems. A large body of work has been presented to tackle this difficult problem over the past decades and has been nicely surveyed in [1][2][3].
1.1. Related Works. The appearance-based approach has dominated recent advances in face detection [3]. It consists of two main steps: first, a sliding window scans the whole image in a serial fashion [4,5]; then a preselected face detector performs a binary (face and nonface) classification task on each window image to verify whether a face is present in that window. Previous works based on this approach focused on issues such as (1) exploiting robust features, for example, Haar-like features [6], Bayesian features [7], spectral histograms [8], local binary pattern- (LBP-) based spatial histograms [9], and principal component analysis (PCA) and its nonlinear version, Kernel PCA [10]; (2) seeking face detectors with good generalization ability, such as neural networks (NNs) [11][12][13], Bayesian classifiers [7,14], and the support vector machine (SVM) [4,8,10,[15][16][17][18][19][20][21][22][23][24][25]; and (3) further improving the efficiency of a given face detector with boosting-based techniques, of which AdaBoost is probably the most famous and has been used in the Viola-Jones face detector [6,26]. In this work, we address the second issue, face detector design, and propose a novel face detector called the support vector data description (SVDD) committee.

1.2. Problem Description.
Appearance-based face detection typically treats the face detection task as a binary classification problem [2,3,5]: face versus nonface. Accordingly, two-class classifiers have been adopted; according to previous works, the two-class classifier SVM by Vapnik [27] has been the most frequently used face detector. The success of SVM in face detection can be attributed to the use of kernel tricks and its learning strategy based on structural risk minimization (SRM). However, SVM may still suffer from a critical problem when applied to face detection: a high false positive rate due to unrepresentative nonface training patterns, described as follows.
To train an SVM, one has to prepare, in advance, a training set containing face (positive) and nonface (negative) patterns [22]. The training set is then used to train an SVM to find an optimal separating hyperplane (OSH) with a maximal margin of separation in a kernel-induced feature space. However, compared with the face class, the distribution of the nonface class has relatively large variation because nonface patterns are so rich. It is easy to collect a set of face training patterns that represents the face class; collecting a representative set of nonface training patterns is difficult, however, because any pattern that does not belong to the face class is a nonface pattern. In other words, the nonface class is most likely undersampled: the distribution of the collected nonface patterns used for training is not identical to the true distribution of the nonface class. If an SVM is trained on such a training set, in which the nonface training set is unrepresentative, many nonface patterns will fall on the wrong side of the OSH in the testing stage, resulting in numerous false positives, as illustrated in Figure 1.

1.3. Presented Work.
To avoid this critical problem, we adopt the one-class learning strategy to deal with the aforementioned undersampling of nonface training patterns. One-class learning addresses conventional two-class classification problems in which one of the two classes is undersampled, or in which only data of one single class are available for training [28][29][30]. One-class classifiers find a compact description for a specific class (usually referred to as the target class) and can be built on just that single class. In this work, we treat faces as targets and nonfaces as outliers. The decision boundary of a one-class classifier is then used to distinguish targets from outliers.
SVDD is a kernel method for novelty detection. Given a target training set, SVDD maps the set into a higher-dimensional kernel-induced feature space and then finds a minimum-volume hypersphere that encloses all or most of the mapped target data in this feature space. Due to the use of kernel tricks, the sphere boundary in the feature space becomes a flexible boundary in the original input space and is thus able to fit irregularly shaped target data sets. This is particularly useful for face detection since face patterns are in general nonlinearly distributed [37,38]. Recently, the success of SVDD has been shown in a variety of novelty detection problems, such as anomaly detection in hyperspectral images [39], defect inspection [40], and novel percept detection for a vision-guided mobile robot [41].
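As a concrete, simplified illustration of this idea, the SVDD dual can be solved numerically for a small 2D data set. The sketch below is not the paper's implementation: it assumes a Gaussian kernel, solves the dual with naive SMO-style pairwise updates, and uses C ≥ 1 so that no training pattern is excluded from the sphere; `train_svdd` and `svdd_decision` are hypothetical helper names.

```python
import numpy as np

def gauss_kernel(A, B, sigma):
    """Gaussian kernel matrix: K[i, j] = exp(-||A_i - B_j||^2 / sigma^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def train_svdd(X, C=1.0, sigma=1.0, iters=4000, seed=0):
    """Solve the SVDD dual: minimise a^T K a subject to sum(a) = 1 and
    0 <= a_i <= C.  (For a Gaussian kernel K(x, x) = 1, so the linear
    term of the dual is constant and can be dropped.)"""
    rng = np.random.default_rng(seed)
    n = len(X)
    K = gauss_kernel(X, X, sigma)
    a = np.full(n, 1.0 / n)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        denom = K[i, i] + K[j, j] - 2.0 * K[i, j]
        if denom < 1e-12:
            continue
        g = K @ a
        t = (g[j] - g[i]) / denom                         # unconstrained step
        t = min(max(t, -a[i], a[j] - C), C - a[i], a[j])  # keep both in [0, C]
        a[i] += t
        a[j] -= t
    g = K @ a
    # With C >= 1 every pattern lies inside or on the sphere, so R^2 is the
    # largest squared feature-space distance from a training point to the centre.
    R2 = (1.0 - 2.0 * g + a @ g).max()
    return a, R2

def svdd_decision(x, X, a, R2, sigma=1.0):
    """f(x) = ||phi(x) - centre||^2 - R^2; accept x as a target if f(x) <= 0."""
    kx = gauss_kernel(x[None, :], X, sigma)[0]
    aKa = a @ gauss_kernel(X, X, sigma) @ a
    return 1.0 - 2.0 * (kx @ a) + aKa - R2
```

Trained on a single compact cluster, the decision function is negative near the data and positive far from it, which is the behaviour the SVDD committee exploits per cluster.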
However, SVDD still has its limits. When a target training set is not compact but is formed by a set of disjoint clusters in the data space, the generalization performance of SVDD drops significantly, as pointed out by Tax and Duin [34]. Unfortunately, face patterns from different individuals form a multicluster distribution in the pattern space. Thus, using one single SVDD to discriminate faces from nonfaces may not be adequate. To solve this problem, we propose in this paper an SVDD committee.
The training of the proposed SVDD committee consists of two stages. In the first stage, a given face training set is automatically partitioned into disjoint clusters using the fuzzy c-means (FCM) algorithm [42] and a partitioning entropy-based best-cluster-number selection criterion [43]. The face patterns in each cluster form a compact face subset. In the second stage, each face subset is used to train a unique SVDD. In addition, the decision boundary of SVDD often encloses the face training patterns tightly, which limits the generalization performance on test faces. To improve the performance, we also modify the original decision function of SVDD so that its decision boundary is enlarged; by doing so, the face acceptance rate can be improved. Finally, if there are c face clusters, c SVDDs (members) will be trained. In the testing stage, each trained SVDD serves as a committee member. An unseen pattern is classified as a face pattern if it is accepted by any of the c SVDDs. Details of the SVDD committee are given in Section 3.
The rest of this paper is organized as follows. In Section 2, we first review the basics of SVDD. Then, the SVDD committee is introduced in Section 3. Results and discussions are provided in Section 4. Finally, conclusions are drawn in Section 5.

SVDD
Let T = {x_i ∈ R^d}_{i=1}^{N} be a face training set, where the x_i are face training patterns. SVDD maps the training patterns into a higher-dimensional space F using a nonlinear map φ : R^d → F and then finds a minimum-enclosing hypersphere with center a and radius R in F, which can be formulated as the optimization problem

min_{R, a, ξ}  R² + C ∑_{i=1}^{N} ξ_i,  subject to ‖φ(x_i) − a‖² ≤ R² + ξ_i,  ξ_i ≥ 0,  i = 1, ..., N,  (1)

where the penalty weight C is user specified and the ξ_i are slack variables representing training errors. Taking the partial derivatives ∂L/∂R = 0, ∂L/∂a = 0, and ∂L/∂ξ_i = 0, where L is the Lagrangian function with nonnegative Lagrange multipliers α_i and β_i, and substituting the results back into L yields the dual constrained optimization problem

max_{α}  ∑_{i=1}^{N} α_i K(x_i, x_i) − ∑_{i=1}^{N} ∑_{j=1}^{N} α_i α_j K(x_i, x_j),  subject to ∑_{i=1}^{N} α_i = 1,  0 ≤ α_i ≤ C,  (3)

where K(x, y) = φ(x) ⋅ φ(y) is the kernel function. In this paper, the Gaussian function K(x, y) = exp(−‖x − y‖²/σ²) is used as the SVDD kernel, where σ is the parameter of the Gaussian kernel. According to the Kuhn-Tucker conditions, (1) the data points with α_i = 0 are inside the hypersphere, (2) the data points with 0 < α_i < C are on the sphere boundary, and (3) the data points with α_i = C fall outside the sphere and have nonzero ξ_i. The data points with α_i > 0 are support vectors (SVs). Further, the SVs with 0 < α_i < C are called unbounded SVs (UBSVs), while the SVs with α_i = C are called bounded SVs (BSVs). The center of the sphere is spanned by the mapped SVs, a = ∑_{i=1}^{N_SV} α_i φ(x_i), where N_SV is the number of SVs. The sphere radius R is determined by taking any x_k ∈ UBSVs and calculating the distance from its image to the center a as follows:

R² = K(x_k, x_k) − 2 ∑_i α_i K(x_i, x_k) + ∑_i ∑_j α_i α_j K(x_i, x_j).  (6)

For the Gaussian kernel, K(x, x) = φ(x) ⋅ φ(x) = 1 for all x ∈ R^d. Mapping the sphere boundary f(x) = 0 back into the original space R^d yields a flexible boundary that encloses the face training set. The free parameter σ controls the tightness of the boundary: the smaller σ is, the tighter the boundary is. However, σ cannot be too small; otherwise the boundary will be too tight to achieve a satisfactory face acceptance rate for unseen patterns. The decision function of SVDD is given by

f(x) = ‖φ(x) − a‖² − R² = 1 − 2 ∑_i α_i K(x_i, x) + ∑_i ∑_j α_i α_j K(x_i, x_j) − R².

If f(x) ≤ 0, the test pattern x is accepted as a face pattern; otherwise it is rejected as a nonface pattern. The partitioning task in the first training stage of the SVDD committee (Section 3) is accomplished with the fuzzy c-means (FCM) algorithm, which solves the following optimization problem:

SVDD Committee
minimize J_m(U, V) = ∑_{j=1}^{c} ∑_{i=1}^{N} u_{ij}^m ‖x_i − v_j‖², subject to the constraints u_{ij} ∈ [0, 1] and ∑_{j=1}^{c} u_{ij} = 1 for i = 1, ..., N, where U = [u_{ij}] is the partition matrix, u_{ij} ∈ [0, 1] is the membership degree of the i-th training pattern x_i in the j-th cluster, V = (v_1, ..., v_j, ..., v_c) is a c-tuple of cluster prototypes, v_j ∈ R^d is the centroid of the j-th cluster, c is the number of clusters (2 ≤ c ≤ N), and m ∈ (1, ∞) is the weight controlling the degree of fuzziness of the matrix U (we set m = 2 in this study). The FCM algorithm is performed in an iterative manner as follows.

Mathematical Problems in Engineering
Step 1. Initialize the partition matrix U. Then alternately update the cluster prototypes v_j and the memberships u_{ij} until the change in U falls below a preset tolerance.
Choosing a proper number c of clusters is of primary importance. The best number c_best of clusters can be determined using the partitioning entropy-based criterion [43], c_best = arg min_{c ∈ Ω} PE(c), where Ω is the set of solutions and PE(c) = −(1/N) ∑_{i=1}^{N} ∑_{j=1}^{c} u_{ij} ln u_{ij} is the partitioning entropy, providing a global validity measure for the clustering results. Finally, the training set T is partitioned into c_best subsets using the simple rule: the i-th training data point x_i is assigned to the j-th cluster if its membership degree in the j-th cluster is the highest, namely, j = arg max_{1 ≤ k ≤ c_best} u_{ik}. Each face training subset is then used to obtain the Lagrange multipliers of its SVDD by solving the quadratic programming problem formulated in (3), and (6) is used to calculate the sphere radius R_j. However, the trained decision boundary tightly encloses the face training subset, so some test face patterns located around the subset's distribution may fall outside the boundary and consequently be rejected as nonfaces. In this study, we deal with this problem by enlarging the sphere; that is, R_j ← R_j + ΔR_j, (15) where ΔR_j is a positive value. ΔR_j cannot be too large; otherwise, the outlier acceptance rate (nonfaces classified as faces) will increase significantly even though the face acceptance rate improves. Therefore, ΔR_j should be much smaller than R_j. According to our preliminary testing results, setting ΔR_j = R_j/15 improves the face acceptance rate without increasing the nonface acceptance rate. After updating the sphere radius by (15), the decision function f_j(x) for the j-th SVDD is obtained.
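Under the same caveats (a sketch, not the paper's code), the partitioning stage — FCM with m = 2 plus an entropy-based choice of the cluster number — can be written as follows. `partition_entropy` implements the standard partition-entropy validity index, which we assume matches the criterion of [43]; `best_partition` is a hypothetical helper name.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: returns the partition matrix U (U[i, j] = membership
    of pattern i in cluster j) and the cluster centroids V."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]            # centroid update
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
        U = 1.0 / d2 ** (1.0 / (m - 1.0))                 # membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, V

def partition_entropy(U):
    """PE(U) = -(1/N) sum_i sum_j u_ij ln u_ij; lower values indicate a
    crisper (better) partition."""
    return float(-np.mean(np.sum(U * np.log(U + 1e-12), axis=1)))

def best_partition(X, candidates=range(2, 6)):
    """Pick the cluster number with the lowest partition entropy and
    return it with the resulting hard cluster assignment."""
    runs = {c: fcm(X, c) for c in candidates}
    c_best = min(runs, key=lambda c: partition_entropy(runs[c][0]))
    return c_best, runs[c_best][0].argmax(axis=1)
```

Each of the resulting hard clusters would then be handed to its own SVDD, as described above.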

Testing.
For an unseen pattern x, it is rejected as a nonface if it is rejected by all the committee members and accepted as a face otherwise: x ∈ nonface if f_j(x) > 0 for all j = 1, ..., c_best; x ∈ face otherwise.
In other words, the decision-making strategy used in the SVDD committee is not based on the usual majority voting. Instead, the unseen pattern is classified as a face as long as at least one of the SVDDs accepts it.
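The test-time rule, together with the R/15 radius enlargement from the training stage, amounts to a few lines. In this sketch, `make_member` and `dist2_fn` are hypothetical names, and `dist2_fn` stands for member j's squared feature-space distance ‖φ(x) − a_j‖².

```python
def make_member(R2, dist2_fn, delta_ratio=1.0 / 15.0):
    """Wrap one trained SVDD as a committee member, enlarging the sphere
    radius by Delta_R = R / 15 (the paper's heuristic) before testing."""
    R = R2 ** 0.5
    R2_enlarged = (R + delta_ratio * R) ** 2
    return lambda x: dist2_fn(x) - R2_enlarged   # f(x) <= 0 -> accept

def committee_decision(x, members):
    """x is a face if ANY member accepts it (f_j(x) <= 0); it is a
    nonface only if every member rejects it -- not majority voting."""
    return any(f(x) <= 0.0 for f in members)
```

The OR-style rule is what lets each member stay tight around its own cluster while the committee as a whole covers the full multicluster face distribution.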

Results and Discussions
In Section 4.1, we first introduce the data set used in the experiments. Then, in Section 4.2, we give an illustrative example that shows, through a visualization analysis, how the proposed SVDD committee deals with the problem of a multicluster face distribution. Finally, we compare our method with SVM and other one-class classifiers in terms of face detection accuracy.

4.2. An Illustrative Example.
We randomly select 100 face patterns from the extended MIT face data set and perform principal component analysis (PCA) to reduce the dimensionality of the patterns, keeping the two leading eigenvectors of the pattern covariance matrix. The projections of the 100 face patterns onto the 2D PCA-based subspace are depicted in Figure 2.
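This 2D projection can be reproduced with ordinary PCA; a minimal sketch (`pca_2d` is a hypothetical helper name):

```python
import numpy as np

def pca_2d(X):
    """Project patterns onto the two leading eigenvectors of the sample
    covariance matrix, as done for the visualisation in Figure 2."""
    Xc = X - X.mean(axis=0)                            # centre the patterns
    w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))    # ascending eigenvalues
    W = V[:, ::-1][:, :2]                              # two leading eigenvectors
    return Xc @ W
```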
It can be observed from Figure 2(a) that the face patterns do not form a compact distribution but rather a multimodal (multicluster) distribution in the 2D space. We first use one single SVDD to learn a boundary that encloses most of the face patterns. When the kernel parameter σ is set to a very large value (σ = 100), the decision boundary of SVDD is nearly spherical in the original 2D space. As the value of σ decreases from 100 to 10, the SVDD boundary becomes tighter (see Figure 2(c)). However, the boundary is still not tight enough because there remain empty areas (the areas within the green circles). Nonface patterns falling into these empty areas will be accepted as faces since such patterns are also inside the SVDD boundary. One way to avoid this situation is to further decrease the value of σ. However, according to the authors' previous work [46], when σ is too small, all the mapped target training data become orthonormal to each other in the Gaussian kernel-induced feature space and become SVs. Moreover, when all or almost all the target training data become SVs, the boundary becomes too tight to obtain a good target acceptance rate [33,34]; this is what happens in Figure 2(d). Instead of using the whole face training set to train a single SVDD, the proposed SVDD committee partitions the whole training set into disjoint clusters and then utilizes the face patterns in each cluster to train an independent SVDD. By doing so, the performance drop due to the multicluster face distribution can be avoided. An illustrative example is shown in Figure 3.
First, the 100 face patterns are partitioned into three clusters (three disjoint subsets) using the FCM algorithm and the best-cluster-number selection criterion stated in Section 3. Compared with the whole training set, each subset forms a much more compact distribution. Then, each subset is utilized to train an SVDD. Finally, the three independent SVDDs constitute a committee. By comparing the results of Figure 3 with those of Figure 2, we can see that using multiple SVDDs to describe a multicluster face distribution is more suitable than using one single SVDD.

4.3. Comparison with Other Methods
4.3.1. Methods. We compare our proposed SVDD committee with the frequently adopted face detector SVM and with one-class learning methods including the regular SVDD (i.e., a single SVDD) and Kernel PCA [36].
Kernel PCA is a nonlinear version of PCA, which was originally designed for pattern representation [47]. Recently, Tsang et al. [44] further extended the original idea to novelty detection. Kernel PCA uses the reconstruction error in a kernel-induced feature space as a novelty measure (see Section 3 of [44] for details). A test data point is accepted as a target if its reconstruction error is below a predefined threshold and rejected as an outlier otherwise. Kernel PCA for novelty detection involves three free parameters: the kernel parameter, the number of chosen eigenvectors, and the threshold on the reconstruction error. The free parameters of all the methods are listed in Table 1.
Table 1: Free parameters of the methods to be compared.
For SVM, faces are treated as positive (target) data and nonfaces as negative (outlier) data. For the one-class classifiers (SVDD, the SVDD committee, and Kernel PCA), faces are treated as targets and nonfaces as outliers. In face detection problems, nonfaces greatly outnumber faces. When the data set is imbalanced, the usual classification error rate is not an appropriate performance measure [48]. Therefore, the balanced loss suggested in [48], defined in terms of the target acceptance rate (TAR) and the outlier rejection rate (ORR), is adopted as the performance measure in the following experiments.
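The exact formula from [48] is not reproduced here; the sketch below assumes the common form of the balanced loss, the average of the two class-wise error rates:

```python
def balanced_loss(tar, orr):
    """Balanced loss = 1 - (TAR + ORR) / 2, i.e., the mean of the face
    rejection rate (1 - TAR) and the nonface acceptance rate (1 - ORR).
    (Assumed form; see [48] for the original definition.)"""
    return 1.0 - (tar + orr) / 2.0
```

A degenerate detector that accepts everything (TAR = 1, ORR = 0) scores 0.5 under this measure, whereas it would look deceptively good under the plain error rate on an imbalanced test set.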
A good face detector should achieve a high TAR and a high ORR simultaneously. Therefore, the lower the balanced loss is, the better the face detector is. To facilitate the comparisons, the balanced loss is simply called the error rate hereafter.
(2) Step 2. Perform 10-run twofold cross-validation [30] on the new training set to optimize the methods.
(3) Step 3. Feed the new test set to the methods to obtain their error rates.
According to Table 2, we first observe that the average error rates of SVM in Exp A, Exp B, and Exp C are 20.12%, 18.24%, and 14.24%, respectively. These results indicate that increasing the number of nonface training patterns can improve the generalization performance of SVM in face detection. This could be due to the fact that when the number of nonface training patterns increases, their distribution better represents the true distribution of the nonface class. Moreover, all the one-class classifiers perform better than the two-class classifier SVM in all experiments except Exp C, where SVM (14.24%) is slightly better than SVDD (14.27%). The proposed SVDD committee gives the best results in all three experiments and greatly improves the accuracy of the original SVDD in face detection, as can be seen from Table 3, where the error reduction ratio (ERR) is defined as ERR = (E_SVDD − E_SVDD-Com)/E_SVDD × 100%, with E_SVDD and E_SVDD-Com denoting the average error rates of SVDD and the SVDD committee, respectively. The results reported in Table 5 indicate that SVDD has the fastest testing speed. Although the SVDD committee is slower than SVDD, its testing speed (1.12–1.54 ms/pattern) is acceptable for real-time face detection.
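The ERR used above reduces to a one-line computation; this sketch assumes the standard relative-reduction form of the definition:

```python
def error_reduction_ratio(e_svdd, e_committee):
    """ERR = (E_SVDD - E_SVDD_Com) / E_SVDD * 100, in percent
    (assumed standard relative-reduction form)."""
    return (e_svdd - e_committee) / e_svdd * 100.0
```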

Conclusion
In this paper, we have presented a novel face detector called the SVDD committee. The proposed SVDD committee is based on one-class learning and partitioning strategies and is thus able to improve the generalization performance of the original SVDD in face detection. Moreover, nonface patterns are not required in the training of the SVDD committee; therefore, its face detection accuracy is not affected by the chosen nonface patterns. In contrast, the frequently adopted face detector SVM is a two-class classifier: its face detection accuracy depends on the number of collected nonface training patterns, and its training time increases with that number. Experiments have demonstrated that the proposed SVDD committee not only performs better than SVM but also significantly improves the generalization performance of the original SVDD in face detection. Moreover, its testing speed is acceptable for applications requiring real-time face detection. This work does not apply any robust feature extraction methods, because its focus is the development of a novel face detector based on one-class learning. We believe that the face detection accuracy of the SVDD committee can be further improved if advanced feature extraction methods, such as LBP and Haar-like features, are applied. In addition, the presented work does not address other critical issues such as learning from large-scale data sets. These will be our future works.

Figure 1: If an SVM is trained on a training set in which the nonface training set does not represent the nonface class, some of the nonface test patterns lying inside the red boundary will be classified as face patterns (i.e., they fall on the wrong side of the SVM's OSH), resulting in a high false positive rate in the testing stage.

3.1. Training Stage 1 (partitioning). The first stage is to partition the face training set T = {x_i ∈ R^d}_{i=1}^{N} into disjoint clusters.

4.1. Data Set. The extended MIT face data set [44, 45] consists of a training set and a test set. The training set contains 489410 patterns, of which 17496 are faces and the remaining 471914 are nonfaces. The test set contains 472 faces and 23573 nonfaces. Each pattern is represented by a 361-dimensional vector.

Figure 2: (a) shows the distribution of the randomly selected face patterns in a 2D PCA-based subspace. (b), (c), and (d) show the SVDD training results using σ = 100, σ = 10, and σ = 1, respectively. The white curves are the SVDD decision boundaries. The blue crosses with white circles denote the UBSVs, while the blue crosses with red circles in the upper-right subfigure are BSVs. The penalty weight C is set to 0.3 in all the experiments.
In Figure 2(d), the boundary overfits the face training set, and most of the face training patterns become SVs when σ is set to 1. Although nonfaces can be successfully rejected by this extremely tight boundary, face test patterns are easily rejected as nonfaces as well, resulting in a poor face acceptance rate (i.e., face detection rate).

Figure 3: (a) shows that the 100 face patterns in the 2D PCA-based subspace are partitioned into three clusters. The face data sets in the three clusters are used to train three independent SVDDs, and the results are displayed in subfigures (b) (SVDD for cluster 1), (c) (SVDD for cluster 2), and (d) (SVDD for cluster 3), respectively.

The training of SVM has O(n³) time complexity, where n = N_face/2 + N_non/2 (for twofold cross-validation, 50% of the patterns are used for training). For the one-class classifiers, only half of the face training patterns are included in the training; for example, SVDD's training time complexity is O(n³) with n = N_face/2. Since N_face = 500 in all three experiments, the actual training time of SVDD is almost the same in the three cases. The SVDD committee takes more training time than SVDD. However, compared with SVM, the SVDD committee has a much faster training speed, especially when the number of training faces increases. The testing speeds are reported in Table 5.

3.2. Training Stage 2 (training SVDDs). After the training set T is partitioned into the c_best subsets T_j, j = 1, ..., c_best, where there is no overlap between any two subsets and |T| = ∑_{j=1}^{c_best} |T_j|, the second training stage is to train c_best SVDDs; namely, the j-th face training subset T_j is used to train the j-th SVDD.
(1) Step 1. Select N_face face and N_non nonface patterns from the training set at random. The collected N_face + N_non patterns form a new training set. In addition, randomly select 200 face and 400 nonface patterns from the test set. The 600 patterns form a new test set.

Table 2: Comparison of average error rates among different methods on the extended MIT face data set (in %).

4.3.4. Training and Testing Speeds. We further compare the computational complexities of the methods. The training time and testing time of each method were recorded during the experiments. A 3.40 GHz-CPU (i7-3770) computer with 8 GB RAM running Windows 8 was used. The methods are implemented in MATLAB 7.10 (64-bit).

Table 3: Error reduction ratios (ERRs) of the SVDD committee relative to SVDD in the three experiments (in %).

Table 4: Training time of each method (in s).

Table 5: Testing speed of each method (ms/pattern).
We can see from Table 4 that the training time of SVM increases with the size of the training set because SVM's training has O(n³) time complexity.