The Performance of LBP and NSVC Combination Applied to Face Classification

The growing demand for security has led to the development of interesting approaches to face classification. From the outset, this line of work has aimed to extract invariant features of the face in order to build a single model that classification algorithms can identify easily. Our goal in this article is to develop more efficient practical methods for face detection. We present a new fast and accurate approach that uses local binary patterns (LBP) for feature extraction, combined with a new classifier, the Neighboring Support Vector Classifier (NSVC), for classification. The experimental results on different natural images show that the proposed method achieves very good results with a very short detection time. The best precision obtained by LBP-NSVC exceeds 99%.


Introduction
Researchers have shown that, to recognize a face, humans use different features such as the geometry, texture, and colors of different parts of the face: the eyes, mouth, nose, forehead, and cheeks. Based on this observation, several studies have investigated whether this behavior can be modeled computationally.
This article is devoted to the problem of computer-based face classification [1], which has become a popular and important research topic in recent years thanks to its many applications, such as image and video indexing and retrieval, secure access control, and video surveillance. Despite the efforts and progress made in recent years, it remains an open problem and is still considered one of the most difficult problems in the computer vision community, mainly because of similarities between classes and within-class variations such as occlusion, background clutter, and changes of perspective, pose, scale, and lighting. Popular detection approaches today are based on descriptors and classifiers: they extract visual descriptors from pictures and videos and then classify the extracted features with machine learning algorithms.
Generally, the size of the data can be measured along two dimensions: the number of variables and the number of examples. Both can take very high values, which can complicate the exploration and analysis of the data. In this context, it is essential to use data processing tools that help us better understand the information contained in the dataset. Dimensionality reduction is one of the oldest approaches to this problem. Its objective is to select or extract an optimal subset of relevant features according to a previously fixed criterion. This selection or extraction reduces the dimension of the example space and makes the data more representative of the problem.
This reduction has a dual purpose: first, it reduces redundancy; second, it facilitates subsequent processing (feature extraction reduces the required storage space and, accordingly, shortens the classifier training time and accelerates the pattern recognition process) and therefore the interpretation of the data.
The first step is to select the best feature extraction method [2,3] for face classification. In this context, we found that the LBP descriptor gives the best representation of the image. The principle of this descriptor is to compare each pixel (taken as the central pixel of a window of radius $R$ containing $P$ points) with its neighbors and to generate a binary code from this comparison [4]. To validate our choice, we compare it with other feature extraction descriptors such as the Discrete Wavelet Transform (DWT) [5] and the Histogram of Oriented Gradients (HOG) [6].
The second step concerns the selection of the classification function. We chose a semisupervised classifier, the NSVC. It combines two classifiers belonging to two different families (unsupervised classification: Fuzzy C-Means; supervised classification: SVM). The basic idea of the Neighboring Support Vector Classifier (NSVC) is to build new vicinal kernel functions, obtained by supervised clustering in the feature space. These vicinal kernel functions are then used for learning.
Finally, our experiments show that LBP-NSVC outperforms the other feature extraction and classification algorithms considered. The main criteria used for comparison are the classification accuracy and the execution time, together with the ability of the classifier to handle practical applications in which the training data may come from different environments.
The rest of the paper is organized as follows. A brief description of LBP is given in Section 2. Section 3 introduces the NSVC, based on supervised partitioning of the feature space. Experimental results are presented in Section 4, while Section 5 concludes the article.

Local Binary Patterns (LBP)
The LBP method can be regarded as an approach that unifies statistical and structural texture analysis. Instead of trying to explain the formation of the texture at the pixel level, local models are formed around each pixel. Each pixel is labeled with the code of the texture primitive that best matches its local neighborhood. Thus, each LBP code can be regarded as the code that best represents the local vicinity of the pixel. The LBP distribution therefore has both structural properties: texture primitives and the rules for placing these primitives. For these reasons, the LBP method can be used successfully to recognize a variety of textures to which structural and statistical methods have traditionally been applied separately.
Local binary patterns were originally proposed by Ojala et al. in 1996 [7]. The concept of the LBP is simple: it assigns a binary code to a pixel based on its neighborhood. This code, which describes the local texture of a region, is calculated by thresholding the neighborhood against the gray level of the central pixel. To generate a binary pattern, each neighbor takes the value "1" if its gray level is greater than or equal to that of the current pixel and "0" otherwise. The bits of this binary pattern are then multiplied by weights and summed to obtain the LBP code of the current pixel. For any image, we thus obtain codes between 0 and 255, as in an ordinary 8-bit image. Rather than describing the image by the sequence of LBP codes, one can use a 256-bin histogram of the codes as the texture descriptor (Figure 1). The LBP was later extended to different neighborhood sizes [8][9][10]. In this case, a circle of radius $R$ around the central pixel is considered. The values of $P$ points sampled on the circle are taken and compared with the value of the central pixel. The notation $(P, R)$ denotes a neighborhood of $P$ points of radius $R$ around a pixel, and $\mathrm{LBP}_{P,R}$ is the LBP code for radius $R$ and $P$ neighbors. The main difference from the basic operator is that the gray values must be interpolated to obtain the $P$ sample values on the circle for an arbitrary radius $R$. An important property of the LBP code is that it is invariant to uniform illumination changes, because the LBP of a pixel depends only on the differences between its gray level and those of its neighbors.
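The basic operator described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names `basic_lbp` and `lbp_histogram` are hypothetical, and the order in which the eight neighbors are visited (clockwise from the top-left corner) is an assumed convention, since the text does not fix one.

```python
import numpy as np

def basic_lbp(image):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D grayscale array."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbor offsets (row, column), clockwise from the top-left corner.
    # The starting point and direction are an assumed convention.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # A neighbor contributes the weight 2**bit when it is >= the center.
        codes += (neighbor >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(codes):
    """Normalized 256-bin histogram of 8-bit LBP codes."""
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

With this neighbor order, the 3 x 3 patch [[6, 5, 2], [7, 6, 1], [9, 8, 7]] gives the code 1 + 16 + 32 + 64 + 128 = 241 for its central pixel, and adding a constant to every gray level leaves the code unchanged, which illustrates the invariance to uniform illumination changes.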
To calculate the LBP code in a neighborhood of $P$ pixels with radius $R$, one thresholds each neighbor $g_p$ against the central value and sums the weighted results:

$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{1}$$

where $s(\cdot)$ is the sign function and where $g_p$ and $g_c$ are, respectively, the gray levels of a neighboring pixel and of the central pixel.

The concept of the multiscale LBP is based on the choice of the neighborhood used to calculate the LBP code, so that textures can be processed at different scales [11,12]. The neighborhood of a central pixel is distributed on a circle and is built from two parameters: the number of neighbors $P$ on the circle and the radius $R$, which defines the distance between the central pixel and its neighbors (Figure 2). The texture $T$ of an image $I$ is characterized by the joint distribution of the gray values of $P + 1$ pixels (where $P > 0$): $T = t(g_c, g_0, \ldots, g_{P-1})$, where $g_c$ corresponds to the value of the central pixel and $g_p$, with $p = 0, \ldots, P - 1$, corresponds to the gray levels of $P$ pixels regularly spaced on a circle of radius $R$. If the coordinates of $g_c$ are $(0, 0)$, then the coordinates of $g_p$ are given by

$$\left(-R \sin\frac{2\pi p}{P},\; R \cos\frac{2\pi p}{P}\right).$$

From this definition of the neighborhood, the authors first define a local binary pattern that is invariant to any monotonic transformation of the gray scale, $\mathrm{LBP}_{P,R}$. For each pixel $(x, y)$ (with $g_c = I(x, y)$), the central pixel itself is not used for the characterization of textures. Indeed, regardless of the vicinity of $g_c$, that pixel only describes a light intensity, which is not necessarily useful [11]. Subsequently, $g_c$ is used as a threshold in the following manner:

$$T \approx t\bigl(s(g_0 - g_c),\, s(g_1 - g_c),\, \ldots,\, s(g_{P-1} - g_c)\bigr).$$

Accordingly, the LBP code can be calculated in the same way as the basic LBP (see (1)).

LBP-based face representation: each face image can be considered a composition of micropatterns that can be effectively detected by the LBP operator. Hadid et al. [13] introduced an LBP-based face representation for facial recognition. To capture the shape information of the face, they divided the face images into $m$ small nonoverlapping regions $R_0, R_1, \ldots, R_m$ (as shown in Figure 3).
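The circular operator and the region-based face representation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the names `circular_lbp` and `regional_lbp_descriptor` are hypothetical, bilinear interpolation is assumed for the points that do not fall on pixel centers, the image x axis is taken along columns and the y axis along rows, and a 3 x 3 grid of regions is an arbitrary choice.

```python
import numpy as np

def circular_lbp(image, P=8, R=1.0):
    """LBP_{P,R} codes with bilinear interpolation of the circle samples."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    m = int(np.ceil(R))                      # margin keeping samples in bounds
    yc, xc = np.mgrid[m:h - m, m:w - m]
    center = img[m:h - m, m:w - m]
    codes = np.zeros(center.shape, dtype=np.int64)
    for p in range(P):
        # Neighbor p sits at (-R sin(2*pi*p/P), R cos(2*pi*p/P)) from the center.
        # Rounding removes tiny floating-point residues such as sin(pi) != 0.
        dx = np.round(-R * np.sin(2.0 * np.pi * p / P), 12)
        dy = np.round(R * np.cos(2.0 * np.pi * p / P), 12)
        x, y = xc + dx, yc + dy
        x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
        x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
        fx, fy = x - x0, y - y0
        g = (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x1] * fx * (1 - fy)
             + img[y1, x0] * (1 - fx) * fy + img[y1, x1] * fx * fy)
        # Small tolerance so that ties survive interpolation round-off.
        codes += (g >= center - 1e-9).astype(np.int64) << p
    return codes

def regional_lbp_descriptor(codes, grid=(3, 3), P=8):
    """Concatenate per-region normalized histograms of LBP codes."""
    n_bins = 2 ** P
    hists = []
    for rows in np.array_split(codes, grid[0], axis=0):
        for block in np.array_split(rows, grid[1], axis=1):
            hist = np.bincount(block.ravel(), minlength=n_bins)
            hists.append(hist / block.size)
    return np.concatenate(hists)
```

For a 30 x 30 face image with $P = 8$ and $R = 1$, `circular_lbp` returns a 28 x 28 array of codes, and a 3 x 3 grid yields a descriptor of 9 x 256 = 2304 values. Concatenating the regional histograms keeps coarse spatial information that a single global histogram would discard.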
The NSVC is a classifier that adapts to different datasets. It is based, on the one hand, on an unsupervised approach such as $k$-means or FCM and, on the other hand, on a supervised approach, the SVM.

Neighboring Support Vector Classifier (NSVC)
Support Vector Machines (SVMs), first introduced by Vapnik and colleagues for classification and regression problems, can be seen as a new training technique based on traditional polynomial and radial basis function (RBF) kernels. SVMs have attracted considerable attention because of their high generalization ability and their higher classification performance relative to other pattern recognition algorithms. However, the assumption that the training data are generated identically from an unknown probability distribution may limit the application of SVMs to real-world problems [14].
To relax the assumption of identical distribution, the NSVC [15][16][17] uses a set of vicinal kernel functions built by supervised clustering in the feature space induced by the kernel. These vicinal kernel functions are then used for SVM training.
This approach consists of two steps: a supervised clustering step followed by an SVM training step. Consider a set of input-output training pairs $(x_i, y_i)$, $i = 1, \ldots, n$, where $n$ is the number of input data points and $d$ is the dimension of the input space. The vicinity functions $v(x_i)$ of the data points $x_i$ are built under two assumptions:
(i) The unknown density function is smooth in the neighborhood of each point $x_i$.
(ii) The function minimizing the risk functional is also smooth and symmetric in the neighborhood of each point $x_i$.
The optimization problem based on the vicinal risk minimization (VRM) principle, named the vicinal linear SVM [18,19], can then be formulated as a constrained minimization in which $w$ is the weight vector, $C$ is the penalty constant for the slack variables $\xi_i$, $b$ is the offset, $v(x_i)$ is the vicinity associated with the point $x_i$, and $p(x \mid v(x_i))$ is the conditional probability of the respective vicinity in the input space.
The solution of the vicinal SVM is given by a theorem (see [18] for a proof): the decision function is expressed through a kernel $L(x, x_i)$, and the coefficients $\beta_i$ are obtained by maximizing a dual objective that involves a kernel $M(x_i, x_j)$, where $L(x, x_i)$ is called the monovicinal kernel and $M(x_i, x_j)$ is the bivicinal kernel of the vicinal SVM [18].
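For orientation, the vicinal SVM is commonly written in the following form. This is a sketch in assumed notation, obtained by combining the standard soft-margin SVM with the vicinal risk minimization principle; the symbols $n$, $\beta_i$, $K$, $L$, and $M$ are choices made here and may differ from those of [18]. For the linear case, $K(x, x') = x \cdot x'$.

```latex
\begin{aligned}
&\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
&\text{subject to}\quad
  y_i \int_{v(x_i)} \bigl(w\cdot x + b\bigr)\, p\bigl(x \mid v(x_i)\bigr)\,dx \;\ge\; 1-\xi_i,
  \qquad \xi_i \ge 0, \\[4pt]
&f(x) = \sum_{i=1}^{n} \beta_i\, y_i\, L(x, x_i) + b, \\
&W(\beta) = \sum_{i=1}^{n} \beta_i
  - \frac{1}{2}\sum_{i,j=1}^{n} \beta_i \beta_j\, y_i y_j\, M(x_i, x_j),
  \qquad 0 \le \beta_i \le C,\quad \sum_{i=1}^{n} \beta_i y_i = 0, \\[4pt]
&L(x, x_i) = \int_{v(x_i)} K(x, x')\, p\bigl(x' \mid v(x_i)\bigr)\,dx', \\
&M(x_i, x_j) = \int_{v(x_i)}\!\int_{v(x_j)} K(x, x')\,
  p\bigl(x \mid v(x_i)\bigr)\, p\bigl(x' \mid v(x_j)\bigr)\,dx\,dx'.
\end{aligned}
```

In this form the constraint only involves the mean of each vicinity, so the dual keeps the structure of the ordinary SVM dual, with the kernel replaced by its averages over one vicinity ($L$) or two vicinities ($M$).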

Supervised Kernel-Based Deterministic Annealing for NSVC.
The clustering of training data in the feature space is a well-documented subject [20,21]. It consists of nonlinearly mapping the observed data from a low-dimensional input space to a high-dimensional feature space by means of a kernel function, which facilitates the linear separation of the data. We denote by $\Phi$ the nonlinear transformation from the input space $X$ to the high-dimensional feature space $F$ induced by the kernel function, so that $\Phi(x_i)$ is the transformed point $x_i$.
All training data points are distributed over $K$ vicinities (clusters) in the feature space, where $m_k$ is the center of mass of the $k$th vicinity residing in $F$. This representation is similar to $k$-means clustering in the feature space: each center is expressed through the mapped training points, where $K$ is the number of clusters, $\gamma_{ki}$ are the parameters to be determined by the clustering technique (SKDA), and the mapped data points are labeled in the feature space. The classification problem is usually defined mathematically by a cost function to be minimized; for the NSVC, this function is the distortion function. Following the notation used in [22], we let $p(m_k \mid x_i)$ denote the probability of association of the mapped point $x_i$ with the cluster center $m_k$. The distortion function $D$ in the feature space is built from the squared distance $d_k(x_i)$ [15] between the center $m_k$ and the training vector $x_i$. Since no a priori knowledge of the data distribution is assumed, among all possible distributions that yield a given value of $D$ we choose the one that maximizes the conditional Shannon entropy in the feature space. The optimization problem can be reformulated as the minimization of a Lagrangian $F$, where $T$ is the Lagrange multiplier.
To determine the parameters $\gamma_{ki}$, we minimize the free energy function $F$ with respect to the association probabilities [22], which are related to the Gibbs distribution and to $p(m_k)$, the mass probability of the $k$th cluster. Taking the partial derivative of $F$ with respect to $m_k$, dividing by the normalization factor, and using (14), we finally obtain the expression of $\gamma_{ki}$ that will be used to construct the vicinal kernel functions for the NSVC.
3.2. NSVC with the Feature Space Partitioning. The optimization problem based on feature space partitioning is formulated as in [18], where $v(m_k)$ represents the $k$th vicinity associated with the mass center $m_k$ in the feature space and $p(\cdot \mid m_k)$ is the conditional probability of the respective vicinity in the feature space. This conditional probability is rewritten with Bayes' theorem; comparing (22) and (25) then gives the optimization constraint and the definitions of the mono- and bivicinal kernels, in which the parameters $\gamma_{ki}$ are obtained from the SKDA clustering step. The decision boundary is expressed through coefficients $\beta_k$ that maximize the dual function. In order to obtain a sparse solution at the cost of the extra clustering procedure, a good selection of the number of clusters is required.
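The role of the clustering parameters can be illustrated with a short NumPy sketch. It rests on assumptions that the text does not confirm: each center is taken to be a weighted combination of mapped points, $m_k = \sum_i \gamma_{ki} \Phi(x_i)$, an RBF kernel is used, and the association probabilities follow the mass-constrained Gibbs form of deterministic annealing. All function names are hypothetical.

```python
import numpy as np

def rbf_gram(X, Z, sigma=1.0):
    """RBF kernel matrix K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def vicinal_kernels(gamma, X_train, X_query, sigma=1.0):
    """Mono- and bivicinal kernels for centers m_k = sum_i gamma[k, i] Phi(x_i)."""
    G = rbf_gram(X_train, X_train, sigma)                  # K(x_i, x_j)
    mono = rbf_gram(X_query, X_train, sigma) @ gamma.T     # <Phi(x), m_k>
    bi = gamma @ G @ gamma.T                               # <m_k, m_l>
    return mono, bi

def squared_distances(gamma, G):
    """d_k(x_i) = ||Phi(x_i) - m_k||^2 computed from the Gram matrix G."""
    self_term = np.diag(G)[None, :]                        # K(x_i, x_i)
    cross = gamma @ G                                      # sum_j gamma_kj K(x_j, x_i)
    center = np.einsum('kj,jl,kl->k', gamma, G, gamma)[:, None]
    return self_term - 2.0 * cross + center                # shape (K, n)

def gibbs_association(d, mass, T):
    """p(m_k | x_i) proportional to p(m_k) * exp(-d_k(x_i) / T)."""
    logits = np.log(mass)[:, None] - d / T
    logits -= logits.max(axis=0, keepdims=True)            # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=0, keepdims=True)
```

Because the centers live in the feature space, every quantity is computed from kernel evaluations only. With `gamma` equal to the identity matrix, each center coincides with one mapped training point and the bivicinal kernel reduces to the ordinary Gram matrix, which gives a simple sanity check.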

Experimental Results
We now carry out a thorough evaluation of the classifiers mentioned in the previous sections. We start with a detailed description of the dataset and then present the classification results of all classifiers.
4.1. Dataset. Among the factors that affect the performance of a face detection system are scale, pose, lighting conditions, facial expression, and occlusion. For this reason, we built a robust database covering diverse illumination conditions, different color and texture variations, various facial expressions (neutral, anger, scream, sadness, sleepiness, surprise, wink, frontal smile, frontal smile with teeth, and open or closed eyes), and facial details (glasses/no glasses, hats/no hats, and caps/no caps). To use our database for face classification, the facial images were taken under different lighting conditions so that the classification model is invariant to illumination. The images were acquired in diverse unconstrained environments (Figure 4). We detected the faces in our database using the Viola-Jones cascade detector and normalized the detected faces to a fixed size of 30 × 30 pixels. Figure 5 presents some typical face images from the database.

Results of NSVC. The basic idea of the Neighboring Support Vector Classifier (NSVC) is to build new neighboring kernel functions, obtained by supervised clustering in the feature space. These neighboring kernel functions are then used in SVM-based learning. For the polynomial and RBF kernels, we used cross-validation to compute the optimal learning parameters.
We evaluate the accuracy of each feature extraction method with NSVC. The results obtained are shown in Tables 1 and 2.
To test the performance of the proposed approach, we compare the precision of the LBP-NSVC algorithm with other combinations such as HOG-NSVC and DWT-NSVC. Consequently, whenever we use LBP-NSVC in our experiments, we use the polynomial kernel to obtain more accurate results. The following sections compare the accuracies achieved in our experiments with those of other classifiers.

Results of Adaboost.
We also classified our database using AdaBoost. Boosting is particularly interesting because the number of weak classifiers can be chosen to achieve a desired error rate on the samples. Moreover, we observe that the error rate decreases exponentially with the number of weak classifiers used (Figures 6 and 7).
Figure 6 shows the classification error with respect to the number of weak classifiers for SLBP, ULBP, and HOG with AdaBoost, and Figure 7 shows it for SLBP + DWT, SLBP + HOG, and ULBP + DWT.
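The relation between the number of weak classifiers and the error can be reproduced on toy data with a compact AdaBoost sketch. The weak learner assumed here is a decision stump (a threshold on a single feature), which the text does not specify, and the function names are hypothetical; the sketch records the training error after each boosting round.

```python
import numpy as np

def stump_predict(X, j, thr, pol):
    """Predict +1 where pol * (X[:, j] - thr) >= 0 and -1 elsewhere."""
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)   # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, pol) != y])
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_training_errors(X, y, n_rounds):
    """Run AdaBoost and return the training error after each round."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # uniform initial sample weights
    score = np.zeros(n)              # weighted vote of the stumps so far
    errors = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1.0 - 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(X, j, thr, pol)
        w *= np.exp(-alpha * y * pred)   # raise the weight of the mistakes
        w /= w.sum()
        score += alpha * pred
        errors.append(float(np.mean(np.sign(score) != y)))
    return errors
```

On a one-dimensional problem in which the positive class occupies an interval, no single stump separates the classes, yet the combined vote of a few stumps does much better. The training error itself need not decrease at every round; it is the standard upper bound on it that shrinks geometrically as long as each weak classifier is better than chance.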

Classifiers Comparison. LBP-NSVC gave the best results on our dataset. We now seek to demonstrate the performance of this method in comparison with other classification methods.
The classifiers used in our comparison experiments are the following: Naive Bayes, Decision Tree, $k$ Nearest Neighbors ($k$NN), linear Support Vector Machines (linear SVM), and Random Forest. We compare the classification results of these five algorithms with those of our proposed NSVC on SLBP (Figure 8), ULBP (Figure 9), SLBP + DWT (Figure 10), SLBP + HOG (Figure 11), and ULBP + DWT (Figure 12). The proposed LBP-NSVC approach produced the best or an equal classification accuracy compared with the other methods.
In addition to its high performance, the NSVC is a new theoretical classification method that combines two methods belonging to two different families (unsupervised method: Fuzzy C-Means; supervised method: SVM).

Conclusion
We have proposed an original method for face detection. Our system combines two types of information: LBP descriptors and descriptors such as DWT and HOG. To manage these descriptors and combine them in an optimized way, we propose using an advanced learning system, the NSVC. It selects the most important information by weighting kernels according to their relevance. The experimental results on different real images show that the proposed method achieves very good results.
Our goal in the near future is to continue the study of LBP-NSVC, to test it on datasets from other research areas, and to find the best compromise between precision and execution time.
(i) A supervised clustering step based on the SKDA algorithm (supervised kernel-based deterministic annealing), used to partition the training data into different vicinal areas.
(ii) A training step in which the SVM technique is used to minimize the vicinal risk function (VRM) under the constraints defined in the SKDA-based clustering step.

Figure 4: Typical images of people from the database in varied environments.

Figure 10: Comparison of the results of different classifiers on SLBP + DWT.

Figures 8-12 show the percentage of classification accuracy of the different algorithms. They clearly show that the classification accuracy is best for the majority of algorithms.

Figure 11: Comparison of the results of different classifiers on SLBP + HOG.

Figure 12: Comparison of the results of different classifiers on ULBP + DWT.

Table 1: Accuracy of the feature extraction methods with the polynomial kernel of NSVC.

Table 2: Accuracy of the feature extraction methods with the polynomial kernel of NSVC.