Face detection has been an important and active research topic in computer vision and image processing. In recent years, learning-based face detection algorithms have prevailed with successful applications. In this paper, we propose a new face detection algorithm that works directly in the wavelet compressed domain. In order to simplify the processes of image decompression and feature extraction, we modify the AdaBoost learning algorithm to select a set of complementary joint-coefficient classifiers and integrate them to achieve optimal face detection. Since face detection in the wavelet compressed domain is restricted by the limited discrimination power of the designated feature space, the proposed learning mechanism is developed to achieve the best discrimination from the restricted feature space. The major contributions of the proposed AdaBoost face detection learning algorithm are feature space warping, joint feature representation, ID3-like plane quantization, and weak probabilistic classifiers, which dramatically increase the discrimination power of the face classifier. Experimental results on the CBCL benchmark and the MIT + CMU real image dataset show that the proposed algorithm can detect faces in the wavelet compressed domain accurately and efficiently.
Automatically detecting specific objects in images has been a popular research topic in intelligent image analysis and understanding, with many applications including face recognition, face tracking, expression cloning, face pose estimation, and 3D head model reconstruction from images. These applications usually assume that the face regions are detected correctly as a first step. Researchers in computer vision and image processing have proposed many different approaches to this problem.
Most previous face detection methods focused on detecting faces from a single gray-scale image. The survey paper [
In recent years, with the popularity of digital cameras and camcorders, the demand for real-time face detection has been increasing. Detecting faces directly in a compressed domain, instead of the original image, is an interesting approach that can save time in the decompression process and reduce the complexity of hardware and software design, especially since most digital images in the world are stored in compressed form. However, little previous research has focused on detecting faces in a compressed domain [
Some previous face detection works proved that the wavelet representation [
Figure
The block diagram of face detection in the original image space and the wavelet compressed domain for JPEG2000 [
Increasing the discrimination in the limited wavelet compressed feature space is the main goal of our learning mechanism. We propose a space warping technique to increase the representation capability of each feature via reweighted learning of the sample distributions. In addition, the joint feature representation projects learning samples onto a paired-feature plane to improve the discrimination power. We propose an improved AdaBoost system based on learning with this joint feature representation.
In order to avoid information loss in the feature learning procedure, some modified components are developed to preserve more information. For example, the ID3-like quantization is applied after the joint feature space representation. Compared with traditional quantization methods, the proposed ID3-like quantization considers the positive and negative sample distributions in the 2D paired feature space and achieves the best discrimination by separating samples into different bins according to their labels. Moreover, instead of a binary classifier, Bayesian weak learners are adopted to compute the ratio between positive and negative samples in each bin as the classifier output. With simple prelearned look-up tables, the weak classifier can provide a more informative classification result than a hard decision.
Finally, the trained face classifier can be applied directly in the wavelet compressed domain with very efficient computation. The learning framework preserves the essential information in the quantization tree and a few look-up tables. The execution of the face detector is thus reduced to low-complexity operations, such as accessing the corresponding coefficients and querying look-up tables. Although the input features of the proposed algorithm are restricted to the wavelet compressed coefficients without normalization, our experimental results show that the accuracy of the proposed face detector is comparable to some state-of-the-art methods that detect faces in the uncompressed image domain.
The rest of this paper is organized as follows. Section
Automatic object detection is an important issue in computer vision and image processing. In addition to face detection, researchers have proposed algorithms for car detection [
Our focus in this paper is on frontal face detection from still gray-scale images. The survey paper [
In order to tolerate more scale and pose variations, Fleuret and Geman [
Although detecting faces in gray-scale images is the most general approach, some researchers also employed the color information to simplify the face detection problem. By using the color information, the face detection system can extract more discriminative information and increase the speed and accuracy dramatically. Traditional works [
Detecting faces in video [
Furthermore, multipose face detection techniques have been researched to extend the previous frontal face detection methods in recent years. Schneiderman and Kanade [
In addition, some more related works were developed for different problem settings and different applications. For example, detecting small faces from degraded images [
In addition to face detection, face identification and recognition is another challenging problem that has been widely studied in the computer vision research field. After the face region is detected precisely in the input image, the face recognition system analyzes the frontal facial image patch and determines or verifies the identity of the person. Zhao et al. [
AdaBoost, short for adaptive boosting, is a meta-algorithm that can cooperate with other machine learning techniques to improve their performance. The AdaBoost algorithm was originally proposed by Freund and Schapire [
Input: a sequence of N labeled training examples, an initial distribution over the examples, a weak learning algorithm, and an integer T specifying the number of iterations.

For t = 1, 2, ..., T:
(1) Set p_i^t = w_i^t / (sum_j w_j^t).
(2) Call the weak learning algorithm with distribution p^t to obtain a hypothesis h_t.
(3) Calculate the error of h_t: epsilon_t = sum_i p_i^t |h_t(x_i) - y_i|.
(4) Set beta_t = epsilon_t / (1 - epsilon_t).
(5) Set the new weight vector to w_i^{t+1} = w_i^t * beta_t^(1 - |h_t(x_i) - y_i|).
As shown in Algorithm
In the AdaBoost algorithm depicted in Algorithm
The input integer
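The boosting loop described above can be sketched in a few lines. This is a minimal illustration only: the decision-stump weak learner, 0/1 labels, and all function names are assumptions for the example, while the weight updates follow the beta_t = epsilon_t / (1 - epsilon_t) rule of the original algorithm.

```python
import numpy as np

def train_adaboost(X, y, T=20):
    """Minimal AdaBoost sketch with decision-stump weak learners.
    Labels y are 0/1; each hypothesis outputs 0/1."""
    n, d = X.shape
    w = np.ones(n) / n
    stumps = []  # (feature index, threshold, polarity, beta)
    for _ in range(T):
        p = w / w.sum()                       # step (1): normalize the weights
        best = None                           # step (2): weak learner = best stump
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (0, 1):
                    h = (X[:, j] >= thr).astype(int) if pol else (X[:, j] < thr).astype(int)
                    eps = np.sum(p * np.abs(h - y))   # step (3): weighted error
                    if best is None or eps < best[0]:
                        best = (eps, j, thr, pol, h)
        eps, j, thr, pol, h = best
        eps = max(eps, 1e-12)                 # avoid division by zero
        beta = eps / (1 - eps)                # step (4)
        w = w * beta ** (1 - np.abs(h - y))   # step (5): shrink weights of correct samples
        stumps.append((j, thr, pol, beta))
    return stumps

def predict_adaboost(stumps, X):
    """Final strong classifier: weighted majority vote with alpha = log(1/beta)."""
    score, total = np.zeros(len(X)), 0.0
    for j, thr, pol, beta in stumps:
        h = (X[:, j] >= thr).astype(int) if pol else (X[:, j] < thr).astype(int)
        alpha = np.log(1.0 / beta)
        score += alpha * h
        total += alpha
    return (score >= 0.5 * total).astype(int)
```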
AdaBoost has been very popular in computer vision and image processing research fields since the first real-time face detection method proposed by Viola and Jones [
At the end of this section, we list some improved versions of the AdaBoost algorithm in Table
Variants of the AdaBoost algorithm.
| Algorithm | Authors and references | Description |
|---|---|---|
| AdaBoost | Freund and Schapire [ | Original algorithm |
| RealBoost | Schapire and Singer [ | A real-valued version of the weak hypothesis |
| AdaBoost.M1/M2 | Freund and Schapire [ | Multiclass extensions of the original AdaBoost algorithm |
| AdaBoost.R | Freund and Schapire [ | Solving regression problems |
| AdaBoost.MO/MH/MR | Schapire and Singer [ | Multiclass, multilabel extensions |
| Kullback-Leibler Boosting | Liu and Shum [ | Incorporates the Kullback-Leibler divergence into AdaBoost |
| FloatBoost | Li and Zhang [ | Replaces the exponential error function by a backtrack mechanism |
The proposed learning system is an improved version of Viola and Jones’ face detector [
For the proposed face detector, a 3-level wavelet transform is applied to each training image to obtain the 576-dimensional features. The LL band of the highest level from
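The 576-dimensional feature count can be verified with a small sketch. Here a Haar filter stands in for the paper's actual wavelet filter bank (an assumption for illustration; JPEG2000 uses different filters): collecting every subband coefficient of a 3-level transform of a 24-by-24 block, including the top-level LL band, yields exactly 24 × 24 = 576 values.

```python
import numpy as np

def haar_dwt2(a):
    """One level of a 2D Haar transform (illustrative stand-in for the
    paper's wavelet filter): returns LL and the detail bands HL, LH, HH."""
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0   # low-pass along rows
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0   # high-pass along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, hl, lh, hh

def wavelet_features(block, levels=3):
    """Collect all subband coefficients of a multilevel transform of a
    24x24 block into one feature vector; the count equals the pixel count."""
    feats = []
    cur = block.astype(float)
    for _ in range(levels):
        cur, hl, lh, hh = haar_dwt2(cur)
        feats += [hl.ravel(), lh.ravel(), hh.ravel()]
    feats.append(cur.ravel())  # top-level LL band
    return np.concatenate(feats)
```

For a 24-by-24 block this gives 3 × 144 + 3 × 36 + 3 × 9 detail coefficients plus the 9 top-level LL coefficients, 576 in total.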
The goal of the improved AdaBoost learning algorithm is to learn an efficient face detector from a restricted wavelet feature space. Without wavelet decomposition and intensity-based normalization processes, the feature discrimination power for face detection is weak. The improved AdaBoost learning system contains four major improvements based on two principles: higher discrimination and more information preservation. Algorithm
(i) Given example images for positive examples, respectively.
(ii) Initialize weights the total numbers of negative and positive images, respectively.
(iii) For
 (1) Estimate feature space warping function and weights
 (2) For each possible feature pair ( warped feature value
 (3) Apply ID3-like tree method to each axis of paired feature plane in rotation, and find the quantization function samples into different bins.
 (4) Compute the conditional probability as the Bayesian classification result for each weak classifier
 (5) Estimate the error
 (6) Select the paired feature
 (7) Update the weights for all training samples as follows: where
 (8) Normalize the weights by
(iv) The final classifier is given by
The function of feature space warping is similar to histogram equalization for image enhancement. The basic idea is that, for feature quantization, more levels should be used in areas with dense data distribution and fewer levels in areas with sparse data. In the actual implementation, the distribution of the training samples is reweighted by the current weights. We thus need a nonlinear transformation for each feature to increase its representation capability.
A discrete cumulative density function is estimated to find some landmark points, such as the feature value located in 50% weighted distribution. After we have these landmarks, a simple space warping, which linearly interpolates samples between these points, is applied. Figure
The feature space warping representation: (a) the landmarks in the original distribution and (b) the weighted distribution in the warped feature space.
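The warping step can be sketched as follows, under the assumption that landmarks are placed at equally spaced quantiles of the weighted cumulative distribution and that feature values are piecewise-linearly interpolated between them (the landmark count and interpolation details are illustrative, not the paper's exact settings):

```python
import numpy as np

def warp_feature(values, weights, n_landmarks=4):
    """Feature-space warping sketch: landmarks at equally spaced quantiles
    of the weighted CDF, piecewise-linear mapping between them, in the
    spirit of histogram equalization."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)               # weighted cumulative distribution
    # landmark feature values at 1/k, 2/k, ... of the weighted mass
    qs = np.linspace(0.0, 1.0, n_landmarks + 1)
    landmarks = np.interp(qs, cdf, v)
    # warp: map each landmark to an equally spaced target level
    targets = np.linspace(v[0], v[-1], n_landmarks + 1)
    return np.interp(values, landmarks, targets)
```

Densely populated regions of the feature axis are stretched and sparse regions compressed, so a uniform quantizer applied afterwards spends its levels where the (reweighted) samples actually are.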
Schneiderman and Kanade [
The original feature dimension for a three-level wavelet transform of a 24-by-24 image block is 576. After the cross-level and cross-band combinations of pairs of wavelet coefficients, the feature dimension increases dramatically to 160,461 without extracting any additional information. Instead of the paired feature representation, a cubic or even higher-dimensional mapping is another, more aggressive possibility. However, the efficiency of the AdaBoost learning system must be considered, especially since the computational cost of AdaBoost learning on the paired feature spaces is already very high.
The major design principle for the feature space warping and the joint feature representation is to explore high discrimination from the limited set of wavelet coefficients. After the joint feature representation is created, the following steps are designed for preserving more information in the feature quantization before going into the AdaBoost learning procedure.
After the joint feature representation, we can estimate the positive and negative data distributions for each feature pair from the training data. To develop weak classifiers for all possible feature pairs in AdaBoost learning, we quantize the paired feature plane and compute the conditional probabilities of the joint features for the positive and negative cases. Our strategy is to segment the paired feature plane into several representative regions and use the ratio of the conditional probabilities in each quantized region for classification. In other words, we want to quantize the continuous paired feature plane such that each quantized segment has its own dominant sample label. With this strategy, the system can separate the positive and negative samples into different segments as well as possible to achieve high discrimination capability.
To achieve the above goal, we employ the ID3-like plane quantization on the paired feature space. The quantization for each feature is determined based on the distribution of the training data under the current weight function. Compared to other traditional quantization methods, our ID3-like approach can retain more information at a slightly higher computational cost. The main algorithm of the ID3 decision tree [
In the ID3 decision tree, we first define the entropy of a sample set S and the information gain of a candidate split A as follows:

Entropy(S) = -p_+ log2 p_+ - p_- log2 p_-,
Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) Entropy(S_v),

where p_+ and p_- are the weighted proportions of positive and negative samples in S, and the S_v are the subsets of S produced by the split A.
ID3-like plane quantization representation: (a) a 4-layer ID3 tree structure and (b) the corresponding quantization regions and boundaries in the feature plane.
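The quantization can be sketched as an axis-alternating binary tree on the paired feature plane, where each node picks the threshold with the highest weighted information gain. The strict axis alternation and the single threshold per node are assumptions for illustration; a 4-layer tree yields up to 16 leaf regions, matching the figure.

```python
import numpy as np

def entropy(w_pos, w_neg):
    """Weighted binary entropy of a region."""
    total = w_pos + w_neg
    if total == 0:
        return 0.0
    h = 0.0
    for p in (w_pos / total, w_neg / total):
        if p > 0:
            h -= p * np.log2(p)
    return h

def id3_split(points, labels, weights, depth=4, axis=0):
    """ID3-like plane quantization sketch: recursively split the 2D plane,
    alternating axes, choosing the threshold that maximizes weighted
    information gain. Returns None for a leaf (one quantization bin)."""
    if depth == 0 or len(points) < 2:
        return None
    wp = weights[labels == 1].sum()
    wn = weights[labels == 0].sum()
    parent_h = entropy(wp, wn)
    best_gain, best_thr = 0.0, None
    for thr in np.unique(points[:, axis])[1:]:
        left = points[:, axis] < thr
        gain = parent_h
        for side in (left, ~left):
            frac = weights[side].sum() / weights.sum()
            gain -= frac * entropy(weights[side & (labels == 1)].sum(),
                                   weights[side & (labels == 0)].sum())
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    if best_thr is None:
        return None
    left = points[:, axis] < best_thr
    return {"axis": axis, "thr": best_thr,
            "lo": id3_split(points[left], labels[left], weights[left], depth - 1, 1 - axis),
            "hi": id3_split(points[~left], labels[~left], weights[~left], depth - 1, 1 - axis)}

def bin_index(tree, p):
    """Map a 2D feature point to its leaf bin (path string as bin id)."""
    path, node = "", tree
    while node is not None:
        go_hi = p[node["axis"]] >= node["thr"]
        path += "1" if go_hi else "0"
        node = node["hi"] if go_hi else node["lo"]
    return path
```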
For each pair of features, we can train a weak classifier based on the corresponding joint conditional probability determined from the training data. The AdaBoost training algorithm is then used to select some powerful and complementary weak classifiers and combine them to form a final classifier for face detection. For each weak classifier, we apply the Bayes rule based on the conditional probability density function in each interval to decide the class. By thresholding the ratio of face to nonface conditional probabilities for a given paired feature vector in a quantized interval, we obtain a number of weak classifiers that can be used in the AdaBoost training algorithm.
Making a binary decision in the weak classifier does not fully exploit the information computed in the conditional probabilities. Therefore, we replace the binary decision in the weak classifier by the conditional probability in the AdaBoost learning algorithm.
By applying the Bayes rule, we can compute the conditional probability of the face class given a quantized bin b as follows:

P(face | b) = P(b | face) P(face) / (P(b | face) P(face) + P(b | nonface) P(nonface)).
In our implementation, we segment the paired feature plane into 16 regions and a look-up table containing 16 Bayesian probabilities is used for the AdaBoost learning and testing processes.
In this section, we describe how the trained AdaBoost face classifier is applied for face detection from the wavelet representation of the whole image. The previous AdaBoost face detector is featured for its simple computation. Although we add several complicated components into our learning system to increase the overall discrimination for face detection from a restricted feature space, the resulting AdaBoost classifier only needs a very small amount of computation with some look-up tables and quantization tree structures. Algorithm
(i) Given a test image represented in
(ii) Each layered-coefficient plane
(iii) Preprocessing:
 (1) Apply bilinear interpolation to downsample each subband to 1/1.25, 1/1.5, and 1/1.75 scales, respectively, and form three additional wavelet layer sets.
(iv) For each of these four sets of the wavelet-layer representation, run the sliding window face detection with the scale
 (1) Apply the AdaBoost face classifier to each sliding window which is constructed from the coefficients in the planes from
 (2) If the classifier determines the region is a face, calculate and save the position and size of the corresponding window in the original image space based on the shift, downsample, and layer information.
 (3) Repeat the previous two steps with the scale
(v) Postprocessing:
 (1) Eliminate the overlapped face regions based on the scores provided by the AdaBoost classifier.
 (2) Output the detected faces.
First, we describe how the trained AdaBoost classifier is applied in a sliding window search strategy for face detection from a wavelet compressed image. When focusing on a single block of a whole image to determine whether it is a face region, we can access the corresponding wavelet coefficients from HL, HH, and LH bands of three contiguous levels. Since the LL bands for all levels are not available without a complete decompression process, we do not include them into our feature space learning.
In each iteration
When detecting faces in a whole image, the position and scale of sliding windows should cover all the possible image blocks where faces could appear. The minimal detection window is as large as the training samples with
The corresponding coefficients in the compressed domain for different detection windows.
In addition to the shift, the various scales of the sliding windows in the compressed domain are not easy to implement without wavelet decomposition. The multilevel structure of wavelet decomposition provides a basis to find the corresponding features from different scales, but it is restricted to the detected windows with power-of-2 scales of the template face window. For example, if we can detect faces of size 48-by-48 from wavelet levels
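The interplay between the power-of-2 scales from the wavelet pyramid and the fractional 1/1.25, 1/1.5, 1/1.75 downsampling can be illustrated by enumerating the effective window sizes they cover together (the base size of 24 and the number of levels are assumptions for the example):

```python
def detection_scales(base=24, levels=4, factors=(1.0, 1.25, 1.5, 1.75)):
    """Enumerate the window sizes covered by combining power-of-2 wavelet
    levels with the fractionally downsampled subband sets."""
    sizes = []
    for level in range(levels):      # power-of-2 jumps from the wavelet pyramid
        for f in factors:            # intermediate scales from downsampling
            sizes.append(round(base * (2 ** level) * f))
    return sorted(sizes)
```

With the three extra downsampled sets, consecutive covered window sizes stay within a factor of 1.25 of each other, instead of the factor-of-2 gaps that the wavelet levels alone would leave.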
In this section, we present four sets of experiments to verify the improvements of the proposed AdaBoost learning system and to demonstrate the performance and efficiency of the proposed face detector working directly in the wavelet compressed domain. We first adopted the CBCL face database, which contains separate training and testing image datasets, for evaluating the AdaBoost learning results. In total, 24,045 clipped testing images are examined after one-time learning from 6,977 training image blocks. Then, the MIT + CMU face database is used for testing face detection from whole images with bootstrap learning. Our experimental results show that the proposed learning system significantly improves the accuracy of face detection in the restricted wavelet compressed domain.
To justify the improved AdaBoost-based learning system, we need a benchmark to evaluate the performance of the proposed method. The MIT CBCL face dataset provides a fair benchmarking database for comparing the performance of face classifiers. The training dataset contains 6,977 image blocks (2,429 face blocks and 4,548 nonface blocks), and the testing dataset is composed of 24,045 image blocks (472 face blocks and 23,573 nonface blocks).
All the experiments in this part have three different learning system settings: AdaBoost learning with Viola and Jones’ feature space [
The first benchmarking experiment is performed for each of the three learning systems with 10 weak classifiers only. The results are depicted in Figure
The benchmark comparison on the CBCL database: (a) three different learning systems with 10 weak classifiers adopted, (b) three different learning systems with 20 features adopted, and (c) three different learning systems with 200 features adopted.
In order to examine the limits of the learning systems, we increase the number of accessed features to 200. In other words, the standard AdaBoost classifiers contain 200 iterations, and the proposed paired feature learning system consists of 100 weak classifiers. As shown in Figure
The second experiment is mainly used to examine the performance of the proposed learning system after bootstrap learning. Our system is designed for face detection in wavelet compressed domain, so the input to our face detection system is the wavelet representation of the whole testing image.
The MIT + CMU face dataset is widely used in face detection research. There are 130 gray-scale images containing 507 faces in this dataset. Figure
The ROC curve of the final classifier on the MIT + CMU database.
The curve of Viola and Jones’ method was published in their paper [
In this experiment, we provide more results and comparisons between the proposed face detection system and other systems. The experimental setting is similar to that in the previous two experiments. The detection rates under different numbers of false detection of the proposed face detector and some previous methods on the MIT + CMU face dataset are depicted in Table
The detection rates under different false detections of several different face detection systems.
| Detection method | 10 | 31 | 50 | 65 | 78 | 95 | 167 | 422 |
|---|---|---|---|---|---|---|---|---|
| AdaBoost (compressed) | 56.1% | 62.5% | 63.9% | 65.9% | 66.5% | 67.9% | 70.9% | 75.9% |
| Proposed method (compressed) | 79.8% | 85.2% | 87.3% | 88.1% | 88.5% | 89.2% | 90.9% | 93.3% |
| Viola and Jones [ | 76.1% | 88.4% | 91.4% | 92.0% | 92.1% | 92.9% | 93.9% | 94.1% |
| Viola and Jones (voting) [ | 81.1% | 89.7% | 92.1% | 93.1% | 93.1% | 93.2% | 93.7% | — |
| Rowley et al. [ | 83.2% | 86.0% | — | — | — | 89.2% | 90.1% | 89.9% |

(Column headers give the number of false detections.)
In this comparison, we can see that the accuracy of the proposed method is about 1% to 5% lower than that of the Viola-Jones method under most rates of false positives. We consider this a reasonable accuracy trade-off, since the proposed face detection algorithm is restricted to the limited wavelet feature space and saves the decompression cost. Figure
Face detection results of the proposed method on the MIT + CMU dataset.
False positives and missed faces of the proposed face detection method on the MIT + CMU dataset.
Table
The execution time (ms) of the proposed face detector and the original AdaBoost face detector for detecting faces from wavelet compressed images.
| Image size (pixels) | Sliding windows | Tier 1 + Tier 2 | Method | IDWT | Integral images | Feature extraction | Classifiers | Processing time | Total time |
|---|---|---|---|---|---|---|---|---|---|
| | 8,411 | 27.6 | Original | 18.4 | 1.8 | 19.5 | 9.7 | 49.4 | 77.0 |
| | | | Proposed | — | 5.4 | — | 14.1 | 19.5 | 47.1 |
| | 41,243 | 95.7 | Original | 75.2 | 7.2 | 96.6 | 47.5 | 226.5 | 322.2 |
| | | | Proposed | — | 21.4 | — | 69.3 | 90.7 | 186.4 |
| | 113,789 | 207.4 | Original | 183.7 | 18.3 | 263.8 | 131.1 | 596.9 | 804.3 |
| | | | Proposed | — | 55.3 | — | 190.7 | 246.0 | 453.4 |
| | 291,580 | 488.1 | Original | 432.9 | 43.7 | 677.1 | 334.2 | 1,487.9 | 1,976.0 |
| | | | Proposed | — | 135.7 | — | 492.8 | 628.5 | 1,116.6 |
The proposed wavelet-domain face detector skips the IDWT procedure and the feature extraction process to achieve more efficient detection. Our experiments show that the processing time, which skips the tier 1 and tier 2 decoding time, on a
In this paper, we proposed a face detection system working directly in the wavelet compressed domain. The main contributions of the proposed face detection algorithm are an improved AdaBoost-based learning framework and a series of joint feature representation strategies, which can produce a strong face classifier on the restricted wavelet feature space.
The proposed face detection system involves a feature space warping process, a paired feature learning scheme, an ID3-like joint feature plane quantization method, and a weak Bayesian classifier. Although some complicated components which increase the power of classifiers are included in the face detection system, the execution of the final face classifier is quite simple. With tree structure quantization and look-up tables, the proposed face detector can work very efficiently and directly on the wavelet compressed domain. Our experimental results on the benchmarking face datasets showed that the proposed face detection system working in the compressed domain can achieve similar accuracy to that of Viola and Jones’ face detector [
The authors declare that there is no conflict of interest regarding the publication of this paper.
The authors would like to thank the National Science Council of the Republic of China, Taiwan, for partially financially supporting this research under Contract nos. NSC 102-2221-E-007-082 and NSC 102-2622-E-007-019-CC3. This work was also supported by the Advanced Manufacturing and Service Management Research Center (AMSMRC), National Tsing Hua University.