A Modified Adaboost Algorithm to Reduce False Positives in Face Detection

We present a modified Adaboost algorithm for face detection that aims to reduce the false-positive detection rate. We built a new Adaboost weighting system that considers both the total error of the weak classifiers and their classification probability. The probability was determined by computing both the positive and negative classification errors for each weak classifier. The new weighting system gives higher weights to weak classifiers with the best positive classifications, which reduces false positives during detection. Experimental results reveal that the original Adaboost and the proposed method have comparable face detection rates, and that the proposed method reduces the number of false positives by a factor of almost four.


Introduction
Face detection is a computer technology that determines the location, size, and posture of a face in a given image or video sequence [1]. It is an active research area in the computer vision community; locating a human face in an image plays a key role in applications such as face recognition, video surveillance, human-computer interfaces, database management, and querying image databases [2,3]. The most successful face detection method was developed by Viola and Jones based on the Adaboost learning algorithm [4]. Although their method is successful in face detection, it suffers from false alarms, which may increase in the presence of a complex background. False positives in an application can be a source of errors and require additional postprocessing to remove. As our contribution to reducing the number of false positives, we propose a probabilistic approach that modifies the weighting system of the Adaboost algorithm. This paper expands the key ideas and supporting experimental results of the preliminary version [5].

Face Detection and Adaboost
A number of promising face detection algorithms have been published. Among these, the Adaboost method stands out because it is often referred to by other face detection studies. In this section, we outline the main points of several face detection algorithms.

Face Detection. Madhuranath developed the "modified Adaboost for face detection." In this method, multiple strong classifiers based on different Haar-like types and trained on the same set of input images are combined into a single modified strong classifier [6]. Viola and Jones [4] presented the fundamentals of their face detection algorithm. This algorithm only detects frontal upright faces; however, a modified algorithm that detects profile and rotated views was presented in 2003 [7]. In "Face Detection Using a Neural Network" [8], the authors computed an image pyramid to detect faces at multiple scales. A fixed-size subwindow was moved across each image in the pyramid. The content of the subwindow is corrected for nonuniform lighting and subjected to histogram equalization. The processed content is passed to parallel neural networks that carry out the actual face detection. The outputs are combined using a logical AND to reduce the number of false detections. In its first form, this algorithm also detects only frontal upright faces.
Schneiderman and Kanade [9] calculated an image pyramid, and a fixed-size subwindow scans through each layer of this pyramid. The content of the subwindow was wavelet analyzed, and histograms were prepared for the different wavelet coefficients. These coefficients were fed to different trained parallel detectors sensitive to various orientations of the object. The orientation of the object is determined by the detector that yields the highest output. In contrast to the basic Viola-Jones algorithm, this algorithm detects profile views. AL-Allaf reviewed face detection studies based on different ANN approaches [10], whereas C. S. Patil and A. J. Patil combined skin color information and a Support Vector Machine to detect faces [11].
One of the fundamental problems with real-time object detection is that the size and position of a given object within an image are unknown. An image pyramid was computed to overcome this obstacle, and a detector scans each image in the pyramid. This process is rather time consuming; thus, Viola and Jones presented a new approach based on the Adaboost algorithm to solve the problem. One disadvantage of their approach, however, is a high false-positive rate.

Adaboost Learning Algorithm.
First introduced by Freund and Schapire [12], the Adaboost algorithm is short for Adaptive Boosting. This algorithm strengthens overall performance when used with other weak learning algorithms. A weak learning algorithm is a learning algorithm that classifies the input data better than random.
The Adaboost algorithm is adaptive in that data misclassified by previous classifiers are boosted during training by assigning them a higher weight than that of the correctly classified data. The training database consists of the input data set and the associated classification labels. Adaboost repeatedly calls a weak learning algorithm over the training data set. At each stage, the optimal parameters of the weak learning algorithm, which minimize the classification error, are computed. A weak learning classifier with optimal parameters at a given training stage is called a best weak classifier [4]. The input data set is initially weighted equally; however, the weak learning algorithm places more emphasis on the misclassified data than on the correctly classified data during the training process. This is accomplished by raising the weights of the misclassified data during each stage relative to the correctly classified data. The main steps of the Adaboost algorithm are presented in the following.

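Pseudocode for the Adaboost Learning Algorithm
(3.2) select the best weak classifier $h_t(x)$ in terms of the parameters $\theta_t$ of the $t$th stage, which minimize the classification error between the weak classifier output $h_t(x_i, \theta_t)$ over the $i$th index of the data and the corresponding label $y_i$, over all indices $i$ of the data, $i = 1, \ldots, n$:

$$\varepsilon_t = \min_{\theta} \sum_{i=1}^{n} w_{t,i} \left| h(x_i, \theta) - y_i \right| \tag{2}$$

(3.3) compute the $t$th stage exponent $\beta_t$; a weak classifier $h(x)$ that classifies the input data better than random will result in $\varepsilon_t$ less than 0.5; thus $\beta_t$ will be less than 1.0:

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \tag{3}$$

(3.4) classify the $i$th index of the data $x_i$ with this weak classifier $h_t(x)$, compare with the actual label $y_i$, and store the error in classification $e_i$ over all indices $i = 1, \ldots, n$ of the data:

$$e_i = \begin{cases} 0 & \text{if } h_t(x_i, \theta_t) = y_i \\ 1 & \text{otherwise} \end{cases} \tag{4}$$

(3.5) update the weights $w$ of the input data during this $t$th stage using the exponent $\beta_t$ computed in step (3.3):

$$w_{t+1,i} = w_{t,i}\, \beta_t^{\,1 - e_i} \tag{5}$$

Since the weights are normalized in step (3.1), the weights of the incorrectly classified data are boosted; this is the basic idea behind Adaboost.
(4) The final strong classifier, $C(x)$, is the weighted majority of the individual weak classifiers chosen in step (3.2) of each stage $t$:

$$C(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x, \theta_t) \geq \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

where $C(x)$ is the strong classifier, $\alpha_t = \log(1/\beta_t)$ is the weight, $h_t(x, \theta_t)$ is the weak classifier of the $t$th stage, $x$ is the input data, $\theta_t$ are the parameters, and $h_t(x)$ is the Haar-like feature type.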
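The stages of the pseudocode can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes each weak classifier is a threshold test on one precomputed feature column, that thresholds are searched exhaustively over the observed feature values, and that the initial weights are $1/(2n)$. The names `train_adaboost` and `predict` are hypothetical.

```python
import numpy as np

def train_adaboost(features, labels, n_stages):
    """features: (n_samples, n_features) array; labels: array of 0/1 integers."""
    n_samples, n_features = features.shape
    weights = np.full(n_samples, 1.0 / (2 * n_samples))  # step (2), assumed 1/(2n)
    stages = []
    for _ in range(n_stages):
        weights = weights / weights.sum()                 # step (3.1)
        best = None
        # Step (3.2): exhaustive search over feature, threshold, and polarity.
        for j in range(n_features):
            for theta in np.unique(features[:, j]):
                for polarity in (1, -1):
                    pred = (polarity * features[:, j] > polarity * theta).astype(int)
                    err = np.sum(weights * np.abs(pred - labels))
                    if best is None or err < best[0]:
                        best = (err, j, theta, polarity, pred)
        err, j, theta, polarity, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)              # guard against division by zero
        beta = err / (1.0 - err)                          # step (3.3)
        e = (pred != labels).astype(int)                  # step (3.4)
        weights = weights * beta ** (1 - e)               # step (3.5)
        stages.append((np.log(1.0 / beta), j, theta, polarity))
    return stages

def predict(stages, features):
    """Step (4): weighted majority vote of the selected weak classifiers."""
    alphas = np.array([stage[0] for stage in stages])
    votes = np.zeros(features.shape[0])
    for alpha, j, theta, polarity in stages:
        votes += alpha * (polarity * features[:, j] > polarity * theta)
    return (votes >= 0.5 * alphas.sum()).astype(int)
```

The exhaustive search is quadratic in the number of samples per feature, which is acceptable for a toy example but would need a sorted-threshold scan for a realistic feature dictionary.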
Freund and Schapire [12] showed that the training error of the final hypothesis is upper bounded and that, if the individual weak hypotheses classify the input data better than random, the training error decreases exponentially as the number of stages increases. The generalization of Adaboost as a gradient-descent method in the space of weak classifiers was shown by Schapire and Singer [13].

Face Classification Using Adaboost. Faces and nonfaces with their corresponding labels become the input data set for the Adaboost algorithm. Each Haar-like feature $h(I)$ is evaluated as the sum of the pixels in the image corresponding to the white portion subtracted from the sum of the pixels in the image corresponding to the black portion. A weak classifier classifies an image as positive if the value of the Haar-like feature at a particular location $(u, v)$ on the image is greater than the threshold $\theta$; the polarity $p$ determines the sign of this inequality, as in (7):

$$p\, h(u, v) > p\, \theta \tag{7}$$
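As an illustration of the feature evaluation and threshold test described above, the following sketch computes a two-rectangle Haar-like feature by direct pixel sums. The rectangle layout (left half white, right half black), the function names, and the argument order are assumptions made for this example; a practical detector would use an integral image rather than summing pixels for each window.

```python
import numpy as np

def two_rect_feature(image, u, v, height, width):
    """Two-rectangle Haar-like feature with top-left corner at column u, row v.

    Assumed layout: the left half is the white portion and the right half
    is the black portion. Returns (black sum) - (white sum).
    """
    image = np.asarray(image, dtype=np.float64)  # avoid unsigned wrap-around
    half = width // 2
    white = image[v:v + height, u:u + half].sum()
    black = image[v:v + height, u + half:u + 2 * half].sum()
    return float(black - white)

def weak_classify(feature_value, theta, polarity):
    """Return 1 if polarity * feature > polarity * theta, otherwise 0."""
    return int(polarity * feature_value > polarity * theta)

# A 4x4 image whose right half is bright gives a large positive feature value.
image = np.zeros((4, 4))
image[:, 2:] = 1.0
value = two_rect_feature(image, u=0, v=0, height=4, width=4)
```

With polarity 1 the classifier fires when the feature exceeds the threshold; with polarity -1 the inequality is reversed, so the same feature can detect either contrast direction.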
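The exponent $\beta_t$ used for updating the weights is computed from this weighted error, as shown in (9):

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \tag{9}$$

Typically, $\beta_t$ will be less than unity. The classification of the image $I_i$ by the optimal weak classifier $h_t(I_i, u_t, v_t, p_t, \theta_t)$ is performed as shown in (10):

$$e_i = \begin{cases} 0 & \text{if } h_t(I_i, u_t, v_t, p_t, \theta_t) = y_i \\ 1 & \text{otherwise} \end{cases} \tag{10}$$

The classification error $e_i$ is used to update the weights of the $(t + 1)$th stage, $w_{t+1,i}$, based on the weights of the $t$th stage, $w_{t,i}$, and $\beta_t$, according to the Adaboost method of updating the weights, as shown in (11):

$$w_{t+1,i} = w_{t,i}\, \beta_t^{\,1 - e_i} \tag{11}$$

As $\beta_t$ is less than unity, the correctly classified images are weighted lower than the misclassified images. Normalizing the weights in the next training stage then raises the weights of the misclassified images above those of the present stage. When all $T$ training stages are complete, the final strong classifier is the weighted majority of the optimal weak classifiers $h_t(I, u_t, v_t, p_t, \theta_t)$ of each stage, as shown in (12):

$$C(I) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(I, u_t, v_t, p_t, \theta_t) \geq \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

where $C(I)$ is the strong classifier, $\alpha_t = \log(1/\beta_t)$ is the weight, $h_t(I, u_t, v_t, p_t, \theta_t)$ is the weak classifier of the $t$th stage, $\theta_t$ is the threshold, and $h_t(I)$ is the Haar-like feature type.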
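A small numeric example may help make the effect of the weight update concrete. The sketch below applies one update followed by normalization to five equally weighted images, of which one is misclassified. The function name `update_weights` and the example values are illustrative assumptions, and the weighted error is assumed to lie strictly between 0 and 1.

```python
import numpy as np

def update_weights(weights, e):
    """One Adaboost weight update followed by normalization.

    weights: normalized weights of the current stage.
    e: 0 for correctly classified images, 1 for misclassified ones.
    Assumes the weighted error is strictly between 0 and 1.
    """
    weights = np.asarray(weights, dtype=float)
    e = np.asarray(e)
    epsilon = float(np.sum(weights * e))   # weighted error of the stage
    beta = epsilon / (1.0 - epsilon)       # less than 1 when epsilon < 0.5
    updated = weights * beta ** (1 - e)    # only correct images are scaled down
    return beta, updated / updated.sum()

beta, new_weights = update_weights(np.full(5, 0.2), [0, 0, 0, 0, 1])
print(beta)          # about 0.25
print(new_weights)   # about [0.125 0.125 0.125 0.125 0.5]
```

In this example the misclassified image moves from a weight of 0.2 to about 0.5 after normalization, while each correctly classified image drops to about 0.125.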

Probabilistic Weighting Adjusted Adaboost
The main objective of our proposed method is to reduce the number of false-positive results, that is, the number of regions in the image that are falsely classified as faces. Our contribution therefore changes the weighting system of the original Adaboost algorithm based on a probabilistic approach [14].

Best Weak Classifier Selection.
As in the Viola-Jones method, we used training data made of positive (cropped face) and negative (random images without faces) images and used Haar-like features to build the full dictionary of weak classifiers. Note that the weights are normalized in step (3.1) of the pseudocode for the Adaboost learning algorithm, so that they sum to 1 and the total weighted error lies between 0 and 1. As in the Viola-Jones method, a weak classifier is selected as the "best weak classifier" once its total weighted error $\varepsilon_t$ is less than 0.5 [4].
In the proposed procedure, a weak classifier was selected as the best one when its weighted positive error was less than 0.05, to keep the positive detection rate at about 95%. For a given pattern $x_i$, each best weak classifier $h_j$ provides $h_j(x_i) \in \{1, 0\}$, and the final decision of the committee of selected best weak classifiers is $C(x_i)$, which can be written as the weighted sum of the decisions of the best weak classifiers as follows:

$$C(x_i) = \alpha_1 h_1(x_i) + \alpha_2 h_2(x_i) + \cdots + \alpha_T h_T(x_i) \tag{13}$$

where $h_1, h_2, \ldots, h_T$ denote the $T$ best weak classifiers selected from the pool. The constants $\alpha_1, \alpha_2, \ldots, \alpha_T$ are the weights assigned to each classifier decision in the committee. Recall that every $h_j$ just answers "yes" (1) or "no" (0) to a classification problem, and the result is a linear combination of classifiers followed by a nonlinear decision (sign function).
In the Viola-Jones method, the weight given to each best classifier depends only on the total error that the classifier committed on the training set, as shown in (9). In the original Adaboost method, if two best weak classifiers have the same error, their opinions are given the same weight no matter how different their probabilities of classifying positive or negative images may be. We introduce a new weighting system in which the weight given to the opinion ("yes" or "no") of a best weak classifier considers its ability to classify positive images as well as negative images. This information is useful for building a system that reduces the false-positive rate by giving more weight to best weak classifiers with a high probability of classifying positive images correctly.

Best Weak Classifier Classification Probability and Weighting.
A weak classifier from the pool is voted to be the best weak classifier once it classifies the input data better than random [4]. Now, consider that the output of a given best weak classifier consists of a number of correctly classified images, false positives (negative images classified as positive), and false negatives (positive images classified as negative). The "false-positive error probability" $P_{FP}$ and the "false-negative error probability" $P_{FN}$ can be computed [14] from the total error $\varepsilon_{Total}$, the false-positive error $\varepsilon_{FP}$, and the false-negative error $\varepsilon_{FN}$.
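The split of a classifier's error into false-positive and false-negative parts can be sketched as follows. The defining equations are not reproduced in this text, so the sketch makes an explicit assumption: each probability is taken to be the share of the total weighted error contributed by that error type. The function name `error_breakdown` and the example data are hypothetical, and the sketch does not implement the probabilistic alpha itself.

```python
import numpy as np

def error_breakdown(pred, labels, weights):
    """Split the weighted error of a classifier into FP and FN parts.

    ASSUMPTION: the error probabilities are the shares of the total
    weighted error; the source's own definition may differ.
    """
    pred = np.asarray(pred)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    eps_fp = float(np.sum(weights[(pred == 1) & (labels == 0)]))
    eps_fn = float(np.sum(weights[(pred == 0) & (labels == 1)]))
    eps_total = eps_fp + eps_fn
    if eps_total == 0.0:
        return {"total": 0.0, "fp": 0.0, "fn": 0.0, "p_fp": 0.0, "p_fn": 0.0}
    return {"total": eps_total, "fp": eps_fp, "fn": eps_fn,
            "p_fp": eps_fp / eps_total, "p_fn": eps_fn / eps_total}

labels = [1, 1, 1, 0, 0, 0]
weights = np.full(6, 1.0 / 6.0)
# Two made-up classifiers with the same total error but opposite error types.
only_false_positives = error_breakdown([1, 1, 1, 1, 1, 0], labels, weights)
only_false_negatives = error_breakdown([0, 0, 1, 0, 0, 0], labels, weights)
```

Both example classifiers have the same total weighted error, so the original weighting would treat them identically, yet their false-positive shares are at opposite extremes. This is the distinction the proposed weighting is meant to exploit.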
After calculating the probabilities, we used them to build a new weighting system that considers both the error and the probabilities when weighting each classifier's opinion. Two classifiers with the same error but different classification probabilities will therefore have different weights, because the probability is considered when assigning weights to the classifiers. The weight in the original method (12) can be rewritten as

$$\alpha_t = \log\frac{1 - \varepsilon_t}{\varepsilon_t} \tag{15}$$

where $\varepsilon_t$ is the total error of the considered classifier at stage $t$. The new alpha, called the "probabilistic alpha" $\alpha_t^{p}$, is computed in (16) from the previously calculated probabilities. The proposed alpha is inversely proportional to the false-positive error probability; thus, when a given classifier has a high false-positive error rate, its weight is lowered, and when the rate is low, its weight is raised. Once a best weak classifier produces a high $P_{FP}$, it is given a relatively small weight, which produces a strong classifier that reduces the number of false positives. Note also that when a classifier produces an increasing $P_{FN}$, the numerator of the term after the multiplication sign in (16) decreases, which also lowers the proposed alpha. In the weight update, the probabilistic alpha allows a greater update from misclassified images than the original method, because the proposed alpha is always greater than the original alpha according to (16). Hence, misclassified images are weighted relatively highly under the proposed method, and accuracy is therefore higher for the best weak classifiers. The following shows the pseudocode for the proposed algorithm.
(3.2) select the best weak classifier $h_t(x)$ with respect to the weighted error;
(3.6) compute the probabilistic alpha $\alpha_t^{p}$;
(3.7) update the weights.
(4) The final strong classifier, $C(x)$, is the weighted combination of the weak classifiers of each stage, where $\alpha_t^{p}$ is the probabilistic weight and $h_t(x)$ is the weak classifier of the $t$th stage.

Experimental Classification Results and Analysis
We used the CMU/MIT face data set for training and the Georgia Tech face database, which contains images of 50 distinct persons (15 images per subject), for testing. The Georgia Tech images were taken at different times with different lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All images were taken against a complex background with the subjects in an upright and frontal position (with tolerance for some side movement). We evaluate the performance of the algorithms using precision. The precision for a class is the number of true positives divided by the sum of true positives and false positives, where false positives are items incorrectly labeled as belonging to the class. High precision means that an algorithm returned substantially more relevant results than irrelevant ones.
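The precision measure defined above can be written as a short function. The detection counts in the usage lines are made up purely to illustrate the arithmetic; they are not the values reported in Table 1.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns 0.0 when there are no detections."""
    detections = true_positives + false_positives
    if detections == 0:
        return 0.0
    return true_positives / detections

# Hypothetical counts: the same true positives with fewer false positives.
baseline = precision(true_positives=90, false_positives=40)
reduced = precision(true_positives=90, false_positives=10)
```

Because false positives appear only in the denominator, lowering them at a constant number of true positives raises precision without changing the detection rate.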
Experimental results obtained from the Viola-Jones method and the proposed probabilistic method are described in Table 1. The proposed method reduces the number of false-positive results by a factor of about four, and its precision is twice that of the Viola-Jones algorithm, without substantially degrading accuracy. The processing speed is around 30 fps for both methods.
The faces detected by the Viola-Jones method and the proposed method are shown in Figure 1.

Conclusion
The challenge in face detection using Adaboost is the number of false-positive results that accompany actual faces detected on a complex background. In this study, we presented a probabilistic weighting adjusted Adaboost algorithm for detecting faces in a complex background that reduces false-positive errors, that is, the number of regions in an image that are classified as faces but are not faces. To this end, we proposed a modified version of the Adaboost algorithm that uses the classification probabilities of the weak classifiers to weight the decisions of each best weak classifier. Classifiers with the same total error have the same weight in the original Adaboost algorithm. In our proposed probabilistic approach, however, classifiers with the same error can have different probabilities and therefore different weights. In this new weighting system, the classifier's weight is inversely proportional to the false-positive error probability; thus, when a given classifier has a high probability of false-positive errors, its weight is decreased. Experimental results reveal that the proposed algorithm reduces the number of false positives by a factor of almost four compared with the original Adaboost.

(1) Given data and corresponding labels $(x_1, y_1), \ldots, (x_n, y_n)$, where $x$ is the input data and $y = 1, 0$ are the labels for positive and negative examples, with the size of the input data being $n$.
(2) Initialize the weights $w$ of the input data equally: $w_{1,i} = 1/(2n)$, where $i = 1, \ldots, n$ and $i$ is the $i$th index of the data.
(3) For $t = 1, \ldots, T$, where $T$ is the number of stages of training,
(3.1) normalize the weights $w$ of the $t$th stage:

$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}} \tag{1}$$

Figure 1: Detected faces by the Viola-Jones method (a) and the proposed method (b).
Figure 1(a) shows a false-positive result and Figure 1(b) shows the same face detected without the false-positive by the proposed probabilistic weighting.
Classification of the weak classifier depends on the evaluation of the Haar-like feature $h(I)$ on the image $I$ at the location $(u, v)$, the threshold $\theta$, and the polarity $p$. Image $I$, location $(u, v)$, polarity $p$, and threshold $\theta$ are the weak classifier parameters, and the output of the weak classifier is represented as $h(I, u, v, p, \theta)$. Thus, the data in the Adaboost learning algorithm is the image $I$, and the parameters become the location, polarity, and threshold ($u$, $v$, $p$, and $\theta$). During the training stages of a single weak classifier, the Haar-like feature $h(I)$ is evaluated at each location $(u, v)$ across all $n$ images $I_i$, $i = 1, \ldots, n$. The weighted sum of the error between the correct and the actual classification, $|h_t(I_i, u, v, p, \theta) - y_i|$, is computed for all locations $(u, v)$. The location $(u_t, v_t)$, threshold $\theta_t$, and polarity $p_t$ of the Haar-like feature $h_t(I)$ of the $t$th stage, which minimize the weighted error, are chosen as the weak classifier parameters $h_t(I_i, u_t, v_t, p_t, \theta_t)$. This minimum weighted error for the $t$th stage, represented as $\varepsilon_t$, is shown in (8):

$$\varepsilon_t = \sum_{i=1}^{n} w_{t,i} \left| h_t(I_i, u_t, v_t, p_t, \theta_t) - y_i \right| \tag{8}$$