Color Face Recognition Based on Steerable Pyramid Transform and Extreme Learning Machines

This paper presents a novel color face recognition algorithm that fuses color and local information. The proposed algorithm fuses multiple features derived from different color spaces. Multiorientation and multiscale information relating to the color face features is extracted by applying the Steerable Pyramid Transform (SPT) to local face regions. First, three new hybrid color spaces, YSCr, ZnSCr, and BnSCr, are constructed using the Y and Cr component images of the YCbCr color space, the S color component of the HSV color space, and the Zn and Bn color components of the normalized XYZ and RGB color spaces. Second, the color component face images are partitioned into local patches. Third, SPT is applied to the local face regions and statistical features are extracted. Fourth, all features are fused within a decision fusion frame, and combinations of Extreme Learning Machine classifiers are applied to achieve fast and accurate color face recognition. The experiments show that the proposed Local Color Steerable Pyramid Transform (LCSPT) face recognition algorithm significantly improves face recognition performance by using the new color spaces, compared with conventional and some hybrid ones. Furthermore, it achieves faster recognition than state-of-the-art studies.


Introduction
Color information of face images is very important for face recognition [1]. In [1][2][3], it was demonstrated that facial color features could drastically improve recognition performance compared with gray-based cues. The RGB color space consists of a combination of the red, green, and blue components of images. The other color spaces are derived from the RGB color space by linear or nonlinear transformations. Many recent works on face recognition have used different color spaces in order to improve recognition performance [1][2][3][4][5][6][7][8].
Two normalized hybrid color space methods were developed in [5]. In [6], several conventional color spaces were evaluated comparatively with respect to each other and with respect to gray space by using Principal Component Analysis (PCA). In [7], the question of what kind of color space is suitable for color face recognition was surveyed, and a set of optimal coefficients for combining the three color components under a discriminant criterion was found. In [8], a new hybrid color space combining two color spaces was proposed. The results revealed that the hybrid color space is more powerful than gray space and even more powerful than the conventional color space. In [9], a new color space was generated by taking one color component from each of three color spaces in sequence. In this approach, Gabor Transform, Local Binary Patterns (LBP), and Discrete Cosine Transform (DCT) were applied to the three chromatic component images, respectively. All information obtained from the three color component images was fused by the weighted sum rule. In [10], a new Discriminative Color Features (DCF) method was presented on the color space obtained by subtracting primary color component images from one another. In [1], Canonical Correlation Analysis (CCA) was presented for face feature extraction and recognition. In [11], Gabor wavelet and LBP were individually applied to a color space and to the normalized color space proposed in [5], and the outputs of each classifier were combined by feature fusion and decision fusion. Although there are many studies, determining the best hybrid color space is still a challenging problem for face recognition.
This paper evaluates different hybrid color spaces for improving face recognition performance by proposing Local Color Steerable Pyramid Transform (LCSPT) algorithm. Steerable Pyramid Transform (SPT) is a linear multiscale, multiorientation image decomposition technique [12]. SPT aims to represent the original image at different resolutions. Thus, the face image is analyzed by enhancing and isolating image features. SPT was successfully applied on gray images for face recognition in [13].
In color face recognition, the novelties presented in this paper are threefold.
(1) Firstly, an effective color feature extraction algorithm that increases the performance of face recognition by using SPT is proposed. In the algorithm, the features relating to each color component of color face images are extracted by using SPT at different angles and different scales.
(2) Secondly, three novel hybrid color spaces are proposed.
(3) Thirdly, a group of classifiers is applied, within a decision fusion frame, to the efficient feature set obtained by using SPT. In this study, Extreme Learning Machines (ELMs) for Single Layer Feed-forward Neural Networks (SLFNNs) are employed as an efficient classification method in the color face recognition area. SLFNNs have been widely used in face recognition due to their approximation capabilities for nonlinear mappings using input samples. The weight and bias parameters of SLFNNs are usually adjusted iteratively by gradient-based learning algorithms. Past studies in the field of face recognition show that these algorithms are generally very slow due to improper learning steps, may easily converge to local minima, and need many iterative learning steps in order to obtain better learning performance [14][15][16][17][18]. To overcome these limitations of SLFNNs for color face recognition, the ELM proposed by Huang et al. [14] is adopted in this paper and the combination of ELM and SPT is used. In ELM, the weights and biases of the hidden nodes are randomly chosen and the output weights are analytically determined. ELM reaches a good generalization performance extremely fast [18]. ELM has been successfully applied to face recognition [19,20]. However, ELM has not been applied together with SPT in the face recognition literature. In this study, ELM and SPT are applied to color face recognition for the first time. Comparative and extensive experiments on the color FERET database [21] and the AR database [22] are presented to demonstrate the effectiveness of the new algorithm.
The rest of this paper is organized as follows. In Section 2, SPT is briefly introduced. The basic architecture of the ELM classifier is presented in Section 3. In Section 4, the types of color spaces are introduced. Section 5 describes the proposed LCSPT face recognition algorithm. In Section 6, the comparative experimental results are presented. The paper is concluded in Section 7.

Steerable Pyramid Transform
The Steerable Pyramid is a multiorientation, multiscale image decomposition method proposed by Freeman and Adelson as an alternative to the wavelet transform [12]. In SPT, an image is decomposed into noncorrelated frequency subbands localized in different orientations at different scales. The transform has steerable orientation subbands and is a tight frame, which is referred to as self-inverting. A steerable function is one whose rotated copies can each be represented as a linear combination of a fixed set of rotated versions of itself. A self-inverting transform is one whose synthesis functions are the same as its analysis functions. The combination of these two properties makes the subbands invariant to translation and rotation.
As shown in Figure 1, the input image is first decomposed into a highpass subband using a nonoriented highpass filter $H_0(\omega)$ and into a lowpass subband using a narrowband lowpass filter $L_0(\omega)$. Afterwards, this lowpass subband is decomposed into $K$ oriented portions using the bandpass filters $B_k(\omega)$ ($k = 0, 1, \ldots, K-1$) and into a lowpass subband $L_1(\omega)$ [12]. The decomposition is done recursively by subsampling the lower lowpass subband. In Figure 1, the small black boxes represent decomposed subband images, and 2↓ and 2↑ indicate downsampling and upsampling by a factor of 2 along the rows and columns. The recursive steps extract different directional information at each scale.
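The lowpass and highpass filters are defined in the Fourier domain by [12]

$$L_0(r, \theta) = \frac{1}{2}\, L\!\left(\frac{r}{2}, \theta\right), \qquad H_0(r, \theta) = H\!\left(\frac{r}{2}, \theta\right), \tag{1}$$

where $r$ and $\theta$ are the polar frequency coordinates. Consider

$$B_k(r, \theta) = H(r)\, G_k(\theta), \qquad k \in [0, K-1], \tag{2}$$

where $B_k(r, \theta)$ represents the $K$ directional bandpass filters used in the recursive steps with radial and angular parts, defined as

$$H(r) = \begin{cases} \cos\!\left(\dfrac{\pi}{2}\log_2\dfrac{2r}{\pi}\right), & \dfrac{\pi}{4} < r < \dfrac{\pi}{2}, \\ 1, & r \ge \dfrac{\pi}{2}, \\ 0, & r \le \dfrac{\pi}{4}, \end{cases} \qquad G_k(\theta) = \begin{cases} \alpha_K \left[\cos\!\left(\theta - \dfrac{\pi k}{K}\right)\right]^{K-1}, & \left|\theta - \dfrac{\pi k}{K}\right| < \dfrac{\pi}{2}, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

with $\alpha_K = 2^{K-1}(K-1)!/\sqrt{K\,[2(K-1)]!}$.

Figure 2 shows all filtered images at 3 scales (128 × 128, 64 × 64, and 32 × 32) and 4 orientation subbands (−π/4, 0, π/4, and π/2) on a color component image of a cropped original FERET image. The SPT can locally detect the multiscale edges of facial images [13]. The detected features are of the kind perceived by the first visual area of the human visual cortex. In SPT, the lowest spatial-frequency subbands include distinctive edge information, whereas the higher spatial-frequency subbands contain finer edge information. The SPT coefficients contain much redundant or irrelevant information, so a suitable combination of these subbands can provide superior results. In [23], facial expression recognition was carried out by using only one subband.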
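The radial and angular filter parts can be checked numerically. The following is a minimal NumPy sketch, assuming the commonly used raised-cosine steerable pyramid filters in polar frequency coordinates (highpass $H(r)$, lowpass $L(r)/2$, and an angular part $G_k(\theta)$ with $K$ orientations). The function names, the default 128 × 128 grid, and the use of magnitude-only masks are illustrative assumptions, not the implementation used in the experiments.

```python
import numpy as np
from math import factorial, sqrt

def polar_frequency_grid(n):
    """Polar frequency coordinates (r, theta) on an n x n FFT grid."""
    w = np.fft.fftfreq(n) * 2.0 * np.pi          # angular frequency in [-pi, pi)
    wx, wy = np.meshgrid(w, w)
    return np.hypot(wx, wy), np.arctan2(wy, wx)

def radial_high(r):
    """H(r): 0 for r <= pi/4, 1 for r >= pi/2, raised cosine in between."""
    x = np.clip(np.log2(np.maximum(r, 1e-12) * 2.0 / np.pi), -1.0, 0.0)
    return np.cos(np.pi / 2.0 * x)

def radial_low(r):
    """L(r)/2: 1 for r <= pi/4, 0 for r >= pi/2, raised cosine in between."""
    x = np.clip(np.log2(np.maximum(r, 1e-12) * 4.0 / np.pi), 0.0, 1.0)
    return np.cos(np.pi / 2.0 * x)

def angular(theta, k, K):
    """Magnitude of the angular part G_k(theta) for orientation k of K."""
    alpha = 2 ** (K - 1) * factorial(K - 1) / sqrt(K * factorial(2 * (K - 1)))
    return alpha * np.abs(np.cos(theta - np.pi * k / K)) ** (K - 1)

def first_scale_masks(n=128, K=4):
    """Frequency masks of the highpass residual and the K first-scale subbands."""
    r, theta = polar_frequency_grid(n)
    h0 = radial_high(r / 2.0)                    # H_0(r) = H(r/2)
    l0 = radial_low(r / 2.0)                     # L_0(r) = L(r/2)/2
    bands = [l0 * radial_high(r) * angular(theta, k, K) for k in range(K)]
    return h0, l0, bands

def decompose(image, K=4):
    """Filter a square image with the magnitude masks (zero-phase simplification)."""
    h0, _, bands = first_scale_masks(image.shape[0], K)
    spectrum = np.fft.fft2(image)
    return [np.real(np.fft.ifft2(spectrum * m)) for m in [h0] + bands]
```

With these definitions the squared radial responses and the squared angular responses each sum to one at every frequency, which is the property behind the self-inverting (tight frame) behavior; `decompose` returns the highpass residual followed by the $K$ oriented subbands of the first scale.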

Extreme Learning Machine
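The architecture of a simple conventional ELM proposed by Huang et al. [14][15][16][17][18], shown in Figure 3, is similar to that of SLFNNs with $\tilde{N}$ hidden neurons and common activation functions. For $N$ distinct samples $(x_j, t_j)$, with $x_j \in \mathbb{R}^n$ and $t_j \in \mathbb{R}^m$, the network is modeled as

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \qquad j = 1, \ldots, N,$$

where $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T \in \mathbb{R}^n$ means the weights between the input nodes and the $i$th hidden node, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T \in \mathbb{R}^m$ means the weights between the $i$th hidden node and the output nodes, and $b_i$ means the threshold of the $i$th hidden node.

The output of ELM can be written more compactly in matrix form as

$$H\beta = T, \tag{7}$$

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}. \tag{8}$$

In (8), $H$ is the hidden layer output matrix of ELM and $g$ is an infinitely differentiable activation function; the number of hidden nodes is chosen as $\tilde{N} \ll N$. Here, the parameters $\hat{w}_i$, $\hat{b}_i$, $\hat{\beta}$ ($i = 1, \ldots, \tilde{N}$) of a conventional SLFNN are adjusted by solving the primal optimization problem

$$\left\| H(\hat{w}_1, \ldots, \hat{w}_{\tilde{N}}, \hat{b}_1, \ldots, \hat{b}_{\tilde{N}})\, \hat{\beta} - T \right\| = \min_{w_i, b_i, \beta} \left\| H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\, \beta - T \right\|. \tag{9}$$

The objective function for the optimization problem in (9) is expressed as

$$E = \sum_{j=1}^{N} \left\| \sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) - t_j \right\|^2. \tag{10}$$

The parameters are optimized by following the negative gradient of the objective function in (10) with respect to $w_i$, $b_i$, and $\beta_i$. Consider

$$W_k = W_{k-1} - \eta\, \frac{\partial E(W)}{\partial W},$$

where $W$ collects the parameters $(w_i, b_i, \beta_i)$ and $\eta$ is the learning rate. The accuracy and learning speed of the gradient-based method depend particularly on the learning rate $\eta$. A small learning rate gives very slow convergence, whereas a larger learning rate suffers from the bad local minima effect. ELM uses the minimum norm least-squares solution to overcome these limitations. Unlike in SLFNNs, the hidden weight and bias values of ELM are randomly assigned. The output weights of ELM are analytically determined through a generalized inverse operation on the hidden layer output matrix, since the learning problem is converted into a simple linear system. The solution is therefore obtained extremely fast and with better generalization performance than that of a traditional SLFNN, for hidden layers with infinitely differentiable activation functions. The final ELM achieves not only the smallest training error but also the smallest generalization error, thanks to the smallest norm of the output weights, similar to Support Vector Machines (SVM) [24].

For randomly fixed weights in the hidden nodes, the learning of ELM is equivalent to finding a least-squares solution of (9).

$H$ is a nonsquare matrix for $\tilde{N} \ll N$. The ELM represented by the linear system in (7) gives the minimum norm least-squares solution as

$$\hat{\beta} = H^{\dagger} T,$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$. The smallest training error is achieved by using $\hat{\beta}$:

$$\left\| H\hat{\beta} - T \right\| = \left\| H H^{\dagger} T - T \right\| = \min_{\beta} \left\| H\beta - T \right\|.$$

The performance of ELMs having different activation functions has been presented for both regression and classification in [14][15][16][17][18]. In this paper, we are interested in ELM classifiers with a sigmoid activation function.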
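The training procedure described above reduces to a few lines of linear algebra. The sketch below is a minimal NumPy illustration, assuming standard normal random hidden weights and biases, a sigmoid activation, and one-hot class targets; the class name `ELMClassifier`, the toy two-blob data, and the choice of 20 hidden nodes are hypothetical and are not taken from the experiments.

```python
import numpy as np

class ELMClassifier:
    """Single-hidden-layer ELM with sigmoid activation (illustrative sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output matrix H, one row per sample.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]                      # one-hot target matrix T
        # Hidden weights and biases are drawn at random and never updated.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: minimum norm least-squares solution via pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return np.argmax(self.decision_function(X), axis=1)

# Toy data: two well-separated 2-D blobs (assumed only for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (40, 2))])
y = np.repeat([0, 1], 40)
elm = ELMClassifier(n_hidden=20).fit(X, y)
train_accuracy = np.mean(elm.predict(X) == y)
```

No iterative weight update appears anywhere in `fit`: the only learned quantity is `beta`, obtained in a single pseudoinverse step, which is why the parameter adjusting time of ELM is short compared with gradient-based training.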

Hybrid Color Spaces for Face Recognition
Hardware-oriented models, including those used in digital image processing, commonly use the RGB color space. Each pixel of a color image is represented in the hardware as binary values for the red, green, and blue color components. Different color spaces are used for different applications. Hence, the RGB color space can be converted to the desired color space by applying an application-specific transformation to the R, G, and B values. The color components of the RGB color space are highly correlated with each other. Hence, conventional color spaces such as HSV, YCbCr, L*a*b*, and YIQ are more effective than the original RGB color space for face recognition. In this paper, we investigated the color spaces in the literature, together with their color component combinations and hybrid color component combinations, for improving face recognition performance.
The HSV (hue, saturation, value) color space is defined as in [25]. In this color space, hue (H) is a measure of the spectral composition of a color, saturation (S) shows the relative purity or the amount of white light mixed with a hue, and value (V) refers to the luminance of the image. This model is commonly used for face detection and skin detection [26][27][28].
The YCbCr color space is obtained from the RGB values by a linear transformation with an offset, where Cb and Cr are the chrominance components and Y is the separate luminance component. This space is effective for skin color segmentation and face detection [26][27][28][30].
The YIQ color space is computed from the RGB values by a linear transformation, where I stands for in-phase and Q stands for quadrature, referring to the quadrature amplitude modulation on which the space is based.
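The L*a*b* color space is defined based on the XYZ tristimulus values by the following equations:

$$L^* = 116\, f\!\left(\frac{Y}{Y_0}\right) - 16, \qquad a^* = 500\left[f\!\left(\frac{X}{X_0}\right) - f\!\left(\frac{Y}{Y_0}\right)\right], \qquad b^* = 200\left[f\!\left(\frac{Y}{Y_0}\right) - f\!\left(\frac{Z}{Z_0}\right)\right],$$

where $X_0$, $Y_0$, and $Z_0$ are the tristimulus values of the reference white point. Consider

$$f(x) = \begin{cases} x^{1/3}, & x > 0.008856, \\ 7.787x + \dfrac{16}{116}, & \text{otherwise}. \end{cases}$$

In the L*a*b* color space, the L* component corresponds to brightness, ranging from black (0) to white (100). The a* component corresponds to the measurement of redness (positive values) or greenness (negative values). The b* component corresponds to the measurement of yellowness (positive values) or blueness (negative values). This color space was effectively used for color facial expression recognition in [31].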
The normalized RGB and XYZ color spaces are obtained by using the across-color-component normalization technique in [4]. In this paper, the normalized RGB and XYZ color spaces are named RGB-n and XYZ-n, respectively. The normalized color components R, G, B, X, Y, and Z are named Rn, Gn, Bn, Xn, Yn, and Zn, respectively.
The RGB-n and XYZ-n color spaces are defined by applying this normalization to the RGB and XYZ component images, respectively. In [10], a simple and effective model was generated by means of the subtraction of primary colors, following Ockham's razor principle. In this paper, this color space is named RGB-r, and its subtracted color components are renamed accordingly. Generally, the HSV and YCbCr color spaces are the best color spaces used for skin detection and face detection [3, 26-28, 30, 31]. Some component images show a fine face region [32], and others contain partial face contour information [9]. Certain individual components are powerful component images for color face recognition [6,29], whereas the hybrid color spaces of [8] and [9], the RGB-r color space [10], and the color space consisting of color components of the normalized color spaces [11] have been powerful color spaces for face recognition.
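As an illustration of how a hybrid color space is assembled from components of different conventional spaces, the sketch below builds a three-channel image from the Y and Cr components of YCbCr and the S component of HSV, following the name of the YSCr space. It assumes RGB values scaled to [0, 1] and the widely used ITU-R BT.601 YCbCr coefficients with 8-bit offsets; the function names are hypothetical, and the exact constants of the transformations used in this paper may differ.

```python
import numpy as np

def rgb_to_y_cr(rgb):
    """Y and Cr of YCbCr (BT.601, 8-bit offsets) from RGB in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
    cr = 128.0 + 112.0 * r - 93.786 * g - 18.214 * b
    return y, cr

def rgb_to_saturation(rgb):
    """S of HSV: (max - min) / max, defined as 0 for black pixels."""
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    safe_max = np.where(cmax > 0, cmax, 1.0)     # avoid division by zero
    return np.where(cmax > 0, (cmax - cmin) / safe_max, 0.0)

def rgb_to_yscr(rgb):
    """Stack Y, S, and Cr into one hybrid three-component image."""
    y, cr = rgb_to_y_cr(rgb)
    s = rgb_to_saturation(rgb)
    return np.stack([y, s, cr], axis=-1)

# Two example pixels: mid gray and pure red.
pixels = np.array([[[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]]])
yscr = rgb_to_yscr(pixels)
```

The three channels have different value ranges (Y and Cr are on an 8-bit scale, S lies in [0, 1]), so in practice each component image would be processed separately or rescaled before any joint use.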

Feature Extraction for Color Face Recognition
This section details the novel color feature extraction and multiple feature combination methods for the proposed LCSPT face recognition algorithm. The algorithm incorporates local spatial information and color information to improve face recognition performance. The color information is obtained by using novel hybrid color spaces derived from six conventional color spaces, RGB, HSV, YCbCr, YIQ, XYZ, and L*a*b*, and three hybrid color spaces, RGB-n [5], XYZ-n [5], and RGB-r [10]. The hybrid spaces in this paper are constructed from 3 components, as in [33], so that the dominant features of each component image are merged.
Illustration of the proposed LCSPT face recognition algorithm is given in Figure 4. The algorithm is applied in five steps.
(1) Three new color spaces, YSCr, ZnSCr, and BnSCr, are constructed. The new hybrid color spaces consist of the Y and Cr component images of the YCbCr color space, the S color component of the HSV color space, and the Zn and Bn color components of the normalized XYZ and RGB color spaces.
(2) The contrast of each component image is enhanced, and the image is divided into local partitions of an efficient pixel number. The efficient pixel number is determined by taking the resolution of the face image into consideration.
(3) By applying SPT at a specific scale and a specific orientation to each local image portion, the statistical features such as mean, entropy, and variance of the local face images are extracted.
(4) The group of ELM classifiers is employed to classify the statistical features relating to the color component images of each subband. A decision fusion system combines local decisions from each classifier in the group into a single decision. The combination is implemented by a product decision rule to generate a fused decision vector [34].
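The local feature extraction and the decision fusion in the steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: entropy is taken as the Shannon entropy of a 16-bin histogram of each patch, and classifier outputs are mapped to positive scores with a softmax before the product rule is applied. Neither choice is specified in the text, and the function names are hypothetical.

```python
import numpy as np

def local_statistics(subband, patch=8, bins=16):
    """Mean, variance, and histogram entropy of each patch x patch block."""
    h, w = subband.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            block = subband[i:i + patch, j:j + patch]
            counts, _ = np.histogram(block, bins=bins)
            p = counts[counts > 0] / block.size
            entropy = -np.sum(p * np.log2(p))
            feats.extend([block.mean(), block.var(), entropy])
    return np.asarray(feats)

def product_rule(score_list):
    """Fuse per-classifier class scores by elementwise product."""
    fused = np.ones_like(score_list[0], dtype=float)
    for scores in score_list:
        # Softmax maps raw outputs to positive values that sum to one.
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        fused *= e / e.sum(axis=-1, keepdims=True)
    return fused, np.argmax(fused, axis=-1)

# Example: a 16 x 16 subband gives 4 patches x 3 statistics = 12 features.
features = local_statistics(np.zeros((16, 16)))
# Two classifiers scoring three classes; the fused decision is class 1.
fused, label = product_rule([np.array([2.0, 1.0, 0.0]), np.array([0.0, 3.0, 0.0])])
```

For a 128 × 128 subband and 8 × 8 patches this yields 256 patches and 768 features per subband, which keeps the classifier input small compared with using raw SPT coefficients.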

Experiments and Results
This section evaluates the effectiveness of the proposed LCSPT algorithm on what are possibly the most representative datasets for color face recognition. We used the color FERET database [21] and the AR database [22] for the experiments. The experiments cover a wide range of facial variability under moderately controlled capturing conditions: facial expression (AR and color FERET), illumination changes (AR and color FERET), aging (color FERET), and slight changes in pose (color FERET). Experiments were performed using a single sample per class for the color FERET database and more than one sample per class for the AR database.
For the color FERET database, all face images were centered with respect to their ground truth face coordinates in [21], cropped, and scaled to a resolution of 128 × 128 pixels. We used the cropped AR face images in [35]. In particular, we applied SPT at 3 scales (128 × 128, 64 × 64, and 32 × 32) and 4 orientation subbands (−π/4, 0, π/4, and π/2) to the cropped images. The highpass subband is labeled as HS in all tables. In the experiments, only the subbands of the first scale were used, since the results of the first scale were better than those of the others. If the other scales were also used for face recognition, the recognition performance might increase further; however, the computational complexity would also increase since the input space dimension would grow. We tried several efficient pixel numbers, such as 4 × 4, 16 × 16, and 32 × 32. We obtained the best performance for both datasets by using an efficient pixel number of 8 × 8.
All the experiments were run on a personal notebook computer with a 2.4-GHz Intel(R) Core(TM)2 Duo processor, 3 GB of memory, and the Windows 7 operating system. Comparative studies of ELM, SVM, k-Nearest-Neighbors (k-NN), and Feed-forward Neural Networks (FNNs) for the proposed LCSPT face recognition algorithm were carried out. In order to validate both the classification accuracy and the training and testing speeds of SVM, the MATLAB interface of the LIBSVM 2.83 software (http://www.csie.ntu.edu.tw/~cjlin/libsvm), which implements the Sequential Minimal Optimization algorithm by decomposing the overall QP problem into QP subproblems, was used [36]. The values of the kernel and regularization parameters were taken as $1/(2\sigma^2) = [2^4, 2^3, 2^2, \ldots, 2^{-10}]$ and $C = [2^{12}, 2^{11}, 2^{10}, \ldots, 2^{-2}]$, respectively, so that 15 × 15 = 225 combinations of the parameters were generated and searched for the best combination [36,37]. The parameters exhibiting the best 10-fold cross-validation accuracy on the training dataset were accepted as the optimal ones, as in [24,36,37]. 10-fold cross-validation divides the training set into 10 subsets of equal size; each subset in turn is tested using the classifier trained on the remaining 9 subsets.
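The parameter grid and the cross-validation split described above can be written down directly. The sketch below is a minimal Python illustration: it generates the 15 × 15 grid of $(C, 1/(2\sigma^2))$ pairs and a 10-fold index split. The `evaluate` argument is a hypothetical callback that would train and test an SVM for one parameter pair on the given indices; no LIBSVM call is shown here.

```python
import itertools
import numpy as np

gammas = 2.0 ** np.arange(4, -11, -1)      # 1/(2*sigma^2): 2^4 ... 2^-10
costs = 2.0 ** np.arange(12, -3, -1)       # C: 2^12 ... 2^-2
grid = list(itertools.product(costs, gammas))

def k_fold_indices(n_samples, k=10, seed=0):
    """Split shuffled sample indices into k folds of (nearly) equal size."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(order, k)

def cross_validate(evaluate, n_samples, k=10):
    """Average score of evaluate(train_idx, test_idx) over k folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores))
```

A full search would call `cross_validate` once for each of the 225 entries of `grid` and keep the pair with the highest average accuracy, which is the reason the parameter adjusting time of SVM is much longer than a single training run.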
ELM, having fast learning and testing speed, allows us to repeat the experiments several times. We varied the hidden neuron number of the ELM with sigmoid activation function to find the best value: it was first set to 10 and then increased up to the input sample size in steps of 2, and the ELM with the best correctness was selected. ELM gives better results for a large hidden neuron number [15]. On the other hand, an FNN can give good results for a small hidden neuron number. The hidden neuron number of the FNNs with sigmoid function was searched in the range from 10 to the input sample size in steps of 2. The FNNs were trained using the conjugate gradient learning algorithm for 500 epochs. For k-NN, we varied the neighbor number from 1 to 5. We ran every experiment for each classifier 10 times. Average results are reported in the tables.
In addition, we compared the performance of our LCSPT face recognition algorithm with those of the color Local Binary Decision (LBD) method in [38] and the Local Color Vector Binary Patterns (LCVBP) method in [39]. We used the MATLAB source codes available in [38,39]. For LBD, we used the local standard deviation filter with a window size of 7 × 5 pixels for a normalization window size of 80 × 90, following the recommendations in [38]. For LCVBP, we rescaled the images to 112 × 112 pixels and then divided them into local regions of 18 × 21 pixels, as in [39]. Our method achieved the best recognition performance in all experiments.
We also tried the feature fusion frame for our algorithm. Decision fusion performs slightly better than feature fusion in many subbands. The results relating to the feature fusion frame are not included in the tables in order to keep the paper coherent. In addition, the feature fusion frame is generally used together with dimension reduction techniques because it produces a large number of features, which means an additional computational cost. If dimension reduction techniques are not used, the feature fusion frame requires a large computational time.

Evaluation of Proposed LCSPT on AR Database
The AR database [22] contains over 4,000 frontal view color face images of 126 subjects (70 men and 56 women). Each subject has up to 26 images taken in two sessions, separated by two weeks. Each session contains 13 images with different facial expressions, lighting conditions, and occlusions. The images of 100 subjects were used in our experiments [35]. Figure 5 shows the image samples relating to one person in the AR database used in our experiments. The images consist of neutral expression, smile, anger, scream, left light on, right light on, and all sides light on for both sessions under the same conditions. We planned two experiments on the AR database, as in Figure 6. Table 1 presents our results on 24 color component images. The left side of Table 1 lists the results of AR experiment 1. These results indicate that the correctness at all subbands exceeds 90% for seven of the color components, whereas the correctness of the other color components is below 90% in one or more subbands. Taking all the subbands of these seven color components into consideration, the best results are observed on one color component for the HS subband, one L*a*b* component for the −π/4 subband, two color components for the 0 subband, B for the π/4 subband, and three color components for the π/2 subband (see Table 1). From these results, we conclude that five of these color components perform better than the remaining ones in the facial expression experiment. For an experiment including variation of facial expression, the best color components are those with high correctness across their subbands, since the fusion of subbands with high correctness provides a higher correctness.
The right side of Table 1 lists the results of AR experiment 2. These results indicate that the correctness at all subbands exceeds 90% on only five color components. Taking all the subbands of these five color components into consideration, the best results are observed on two color components for the HS subband, two for the −π/4 subband, two for the 0 subband, B for the π/4 subband, and several for the π/2 subband (see Table 1). From these results, we conclude that one color component is specifically better than the others for the illumination experiment. The results agree with the literature [26][27][28][30]. From all the results in Table 1, we infer that five color components can be used to obtain acceptably good performance in both experiments. Hence, we can give priority to these components in an experiment including variation of illumination.
We compared our three new hybrid color spaces with 9 color spaces, RGB, HSV, YCbCr, YIQ, XYZ, L*a*b*, RGB-n [5], XYZ-n [5], and RGB-r [10], and with the hybrid color spaces of [8], [9], and [5], and we present their advantages with respect to the conventional RGB color space. To make our results clearer, we give only the best hybrid color spaces, although we tried all combinations of the 24 color components. Table 2 shows the recognition correctness of all hybrid color spaces. Specifically, we found that the hybrid color spaces generated by combining two particular color components with one of five other color components improve the face recognition performance most effectively. For the two AR experiments, the results of the YSCr, ZnSCr, and BnSCr hybrid color spaces outperform those of the powerful conventional color spaces, the other color spaces, and the individual color components. Moreover, we obtained the best result, a correctness of 99.45%, in one of these hybrid color spaces (see Table 2). On the other hand, if a higher correctness is wanted for each color component or each color space in Table 1 and Table 2, all their subbands could be fused by the decision or feature fusion method [13,20], at the cost of more computational complexity.
In Table 3, we compare our results on the hybrid color space with those of the LBD method in [38] and the LCVBP method in [39] in terms of training time, testing time, and recognition correctness. As can be seen from these results, our LCSPT-ELM face recognition algorithm is the best, especially in computing time. Moreover, the correctness of LCSPT-ELM exceeds that of the other methods.
We also compared the parameter adjusting time, the testing time, and the correctness of SVM, k-NN, FNN, and ELM. Figure 7 shows the correctness of all classifiers. ELM outperforms the others in terms of correctness. In Table 4, the results on the color component image are given. FNN is the most time-consuming method with regard to the parameter adjusting time, but it has the shortest testing time due to its highly compact network architecture [15]. The parameter adjusting time of k-NN is the shortest; however, its correctness and testing time are worse than those of ELM for both AR experiments. The advantage of ELM is clearly seen by taking both the correctness and the training time into consideration. ELM runs around 6 times faster than SVM and 130 times faster than FNN. After the parameter adjusting process, we obtained the optimum parameters for the AR experiments. The hidden neuron numbers of FNN and ELM, the neighbor number of k-NN, and the ($C$, $1/(2\sigma^2)$) parameters of SVM are given in Table 4.

Evaluation of Proposed LCSPT on Color FERET Database
The FERET database consists of 11,388 color facial images obtained from 994 subjects, captured in the course of 15 sessions. The images have a resolution of 512 × 768 pixels. The database is very challenging due to significant appearance changes in the individual subjects in terms of aging, facial expressions, glasses, hair, moustache, nonuniform illumination variations, and slight changes in pose. The database is divided into five subsets: fa, fb, fc, dup1, and dup2. The fa subset contains one frontal view per subject, for 1196 subjects in total. We conducted single-sample-per-class face recognition experiments on the color FERET database [21]. The FERET evaluation methodology requires that training be carried out using only the fa subset. We selected a subset of color face images of 204 persons whose ground truth face coordinates are available in the color FERET database. We generated the training set from a single portrait image of each of the 204 people in the fa subset, whereas the testing sets were taken from the fb subset and the dup1 subset. Figure 8 shows some sample images from the fa, fb, and dup1 subsets of the color FERET database used in our experiments.
The left side of Table 5 shows the results on the fb subset of the color FERET database. The best results are observed on one color component for the HS subband, one for the −π/4 subband, three for the 0 subband, one for the π/4 subband, and two for the π/2 subband. From these results, we conclude that eight color components are especially better than the others (see Table 5). The right side of Table 5 shows the results on the dup1 subset of the color FERET database, where four color components give the best results. Searching Table 5 for the best color components in both experiments on the color FERET database, we observe that three color components can achieve good correctness for the expression, illumination, and aging experiments.
From Table 6, it can be observed that the best hybrid color spaces for both the fb subset and the dup1 subset are YSCr, ZnSCr, and BnSCr. Taking both the AR database and the color FERET database into consideration, we infer that the two best individual color components and the YSCr, ZnSCr, and BnSCr hybrid color spaces are very effective for our face recognition algorithm.
The comparison results for the hybrid color space are described in Table 7. It is shown that the proposed LCSPT-ELM outperforms the other methods in terms of computing time and recognition correctness. In this paper, the number of extracted features is very small because only 3 features are used for each local portion of each color component image. Consequently, the training and testing times are very small. Moreover, the number of features could be reduced further by dimension reduction techniques such as PCA and LDA, which would reduce the computational complexity even more. Table 8 shows the average results of the LCSPT face recognition algorithm using k-NN, FNN, SVM, and ELM on the color component image. Figure 9 depicts the results of the LCSPT face recognition algorithm on the hybrid color space. From Table 8 and Figure 9, we observe that ELM outperforms the other classifiers.

The Scientific World Journal

Conclusions
This paper presents a novel face recognition algorithm that fuses color and local spatial information (see Supplementary Material available online at http://dx.doi.org/10.1155/2014/628494). The effectiveness of the proposed algorithm is assessed on 6 conventional color spaces, RGB, HSV, YCbCr, YIQ, XYZ, and L*a*b*, and 6 powerful hybrid color spaces developed in the literature: RGB-r [10], RGB-n [5], XYZ-n [5], a further hybrid space of [5], and the hybrid spaces of [8] and [9]. In addition, 3 new hybrid color spaces are constructed in this paper. In particular, the proposed hybrid color spaces, YSCr, ZnSCr, and BnSCr, are configured as combinations of the Y and Cr component images of the YCbCr color space, the S color component of the HSV color space, and the Zn and Bn color components of the normalized XYZ and RGB color spaces. Experiments are conducted on the challenging color FERET and AR databases.
The novelty of this paper is based on the following aspects: (i) a novel color feature extraction method is introduced by applying the SPT algorithm to each color component of color face images; (ii) new hybrid color spaces are presented that improve color face recognition performance when used together with the SPT algorithm; (iii) within a decision fusion frame, the fusion of ELM classifiers is developed for fast color face recognition.
Experimental results show that SPT is an effective tool for extracting information from color face images. The proposed LCSPT-ELM algorithm has very short training and testing times and good recognition correctness. The new hybrid color spaces YSCr, ZnSCr, and BnSCr give the best performance with our algorithm. The LCSPT-ELM algorithm can be used for real-time face recognition applications thanks to its short testing and parameter adjusting times.