A Revisit Histogram of Oriented Descriptor for Facial Color Image Classification Based on Fusion of Color Information

Histogram of Oriented Gradients (HOG) is a robust descriptor that is widely used in many real-life applications, including human detection, face recognition, object counting, and video surveillance. To extract the HOG descriptor from color images, which carry three times as much information as grayscale images, researchers currently apply the maximum magnitude selection method. This method reduces the information of the resulting image by keeping only the maximum magnitudes. However, when we extract HOG using the magnitudes that the maximum magnitude selection method discards, we observe that the performance is better than with the maximum magnitudes in several cases. Therefore, in this paper, we propose two novel approaches for extracting HOG from color images, namely, Color Component Selection and Color Component Fusion. We also propose extended kernels to improve the performance of HOG. With these approaches to color component analysis, the accuracy on several facial benchmark datasets increases by 3 to 10%. Specifically, an accuracy of 95.92% is achieved on the AR Face database and 75% on the Georgia Face database. The results are more than 10 times better than those of the original HOG approach.


Introduction
Nowadays, image classification is one of the most active fields in computer vision and attracts the attention of many researchers because of its wide range of real-life applications, such as human detection, facial recognition, object classification, and medical disease diagnosis. Many local and global image descriptors have been proposed to handle this task [1][2][3]. The key step is to find a robust descriptor that can discriminate between classes. A large number of descriptors have been proposed to extract features from images, including Color Local Binary Pattern (LBP) [4], Scale Invariant Feature Transform (SIFT) [5], Histogram of Oriented Gradients (HOG) [6], and GIST [7]. Among them, HOG is a successful descriptor with various real-life applications, including pedestrian detection, face recognition, object classification, security, and industrial inspection. For example, Ding et al. [8] fuse HOG features and a global normalized histogram for human detection with the AdaBoost classifier. Qi et al. [9] apply HOG to railway track detection using region-growing methods. Qingbo et al. [10] combine HOG features and the Discriminative Multimanifold Analysis Method (DMAM) for face recognition in a few-shot learning context. In this approach, facial features are extracted from the fusion of image patches, and DMAM is then applied to transform them into a lower-dimensional space. Chowdhury et al. [11] introduce an improved version of HOG for human detection. Nabila et al. [12] present an optimized version of HOG for road car detection, based on the concatenation of shape features and motion integration. Aytac et al. [13] extract LBP and HOG features from stomach cancer images and then apply multiple reduction techniques to select the most useful attributes. Hmood et al. [14] introduce an improved version of HOG, namely, Dynamic-HOG, by proposing an approximate window that can cover the whole object. This method requires less processing time and achieves a higher accuracy for coin classification. Jebril et al. [15] apply HOG to handwritten character recognition on grayscale images converted from the RGB color space, using both R-HOG and C-HOG with different windows. Uddin et al. [16] create another HOG version named T20-HOG, which extracts the textural features of seed varieties for identification. Chandrakala and Devi [17] propose a two-stage classification process in combination with HOG descriptors. Combining the robustness of deep neural networks with the advantages of HOG, Hung [18] creates a hybrid of HOG and CNN, named HOG-CNN, with promising results.
Various advanced versions of HOG have been proposed in recent years by incorporating convolutional neural networks (CNNs). Zaffar et al. [19] present CoHOG, which is based on CNN, to extract features from regions of interest (ROI) for visual place recognition. Xiong et al. [20] introduce a method for handling the depth information of images based on HOG, namely, Histogram of Oriented Depth (HOD) features. The proposed approach is applied to pedestrian detection by combining color image edge information and HOD. Wang et al. [21] present an approach based on HOG and multiorientation computation, namely, MO-HOG. [...] When we extract HOG from a color image, the performance usually degrades seriously. The reason is that a color image contains three times as much information as a grayscale image. Moreover, most descriptors were first designed to operate on grayscale images only. Therefore, color has been well investigated in recent years for extracting HOG features. For example, Hoang et al. [25] extract local image descriptors, including LBP, HOG, and GIST, for rice seed image recognition. In this approach, features extracted from the independent color components are fused to form the final feature vector. Aslan et al. [26] compare HOG-SVM and CNN for video-based human tracking under occlusion. Zhou et al. [27] introduce a method for extracting HOG features based on [...] [32,33].
The final features are obtained by concatenating all features extracted from each color component. Fekri-Ershad and Tajeripour analyzed the color information of color-texture images for classification using hybrid color LBP [34].
Color-texture can also be analyzed using weighted color order of LBP for classification [35].
Although current researchers apply the maximum magnitude selection method to selectively reduce the information of a color image so that it meets the requirements of the following stage of the HOG extraction process, the performance is not completely optimized. The impact of the color components on feature extraction was first mentioned in [36] for LBP descriptors. This issue has been extensively investigated in recent years through various methods for incorporating color information, which fuse features extracted from each color component either independently or jointly [4]. Specifically, we achieve better performance when we use the magnitudes discarded by the maximum magnitude selection method. We therefore propose two novel approaches for extracting HOG from color images: Color Component Selection and Color Component Fusion. Furthermore, we upgrade the kernels in the gradient computation stage by extending them in the horizontal and vertical dimensions. The intention is to examine the connection between the surrounding pixels and the pixel being computed, and whether the surrounding pixels affect the output performance.
The rest of this paper is organized as follows. Sections 2 and 3 introduce the HOG descriptor with Color Component Selection, Color Component Fusion, and the extended kernels. Then, experimental results are presented in Section 4. Finally, the conclusion and future work are discussed in Section 5.

HOG Descriptor with Color Component Selection and Color Component Fusion
This section briefly introduces HOG computation and the two proposed approaches with Color Component Selection (CCS) and Color Component Fusion (CCF) for HOG descriptor.
2.1. HOG Descriptor. Before extracting the HOG feature, an image I is split into three subimages I_C1, I_C2, and I_C3, which are the three color components of I. Next, several image processing algorithms are applied to these images to reduce noise and enhance the performance. After the preprocessing step, the gradient magnitude and direction of each pixel of each image are computed from the horizontal and vertical gradients. The gradient computation for the pixel located at coordinate (x, y) follows the definition in [6] (Equations (1)-(4)), where G is the grayscale value of the pixel being computed, Δx and Δy represent the horizontal and vertical gradients, and M_(x,y) and α_(x,y) respectively denote the gradient magnitude and gradient direction.
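For reference, the symbol definitions above match the standard centered-difference formulation of [6]. The block below is a reconstruction under that assumption, not a copy of the original typeset equations; the numbering is chosen to agree with the later references to Equations (1), (2), and (4), and the assignment of (3) to the magnitude is inferred.

```latex
\begin{align}
\Delta_x &= G(x+1,\, y) - G(x-1,\, y) \tag{1}\\
\Delta_y &= G(x,\, y+1) - G(x,\, y-1) \tag{2}\\
M_{(x,y)} &= \sqrt{\Delta_x^{2} + \Delta_y^{2}} \tag{3}\\
\alpha_{(x,y)} &= \arctan\!\left(\frac{\Delta_y}{\Delta_x}\right) \tag{4}
\end{align}
```

In this form, the angle would still need to be mapped to the range 0 to 180 degrees (unsigned) or 0 to 360 degrees (signed) before binning.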

2.2. Proposed Approaches. Each image then yields a pair of matrices: one gradient magnitude matrix and one gradient direction matrix. There are thus three pairs of matrices in total: M_C1 and α_C1, M_C2 and α_C2, and M_C3 and α_C3. As the following step requires only one pair of matrices, the task is to determine which pair should be selected to produce an optimal feature vector. The most popular solution is maximum magnitude selection. This method compares the three gradient magnitudes at each pixel and chooses the maximum as the final magnitude of that pixel. The final direction is taken from the same color component as the selected magnitude. The details of the maximum magnitude selection method are illustrated in Figure 1. After the selection step, two final matrices are obtained, which meets the requirement of the orientation binning step. In this step, the original image is divided into cells (8 × 8 pixels per cell), and a 9-bin histogram is built in each cell from the gradient features of its pixels. The bins range from 0 to 180 degrees for the unsigned gradient (α_unsigned) and from 0 to 360 degrees for the signed gradient (α_signed). The gradient magnitude of each pixel is added to the histogram bin with the corresponding index. The index B_idx(x,y) is computed by Equation (5) or (6), depending on whether the gradient is unsigned or signed, and the ceiling of the value is used. In these equations, B_num stands for the number of orientation bins of the histogram and is usually set to 9, and α_(x,y) is computed by Equation (4).
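The selection and binning steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the bin index formula `ceil(alpha * B_num / span)` is an assumed form of Equations (5) and (6) inferred from the text (ceiling value, bins spanning 180 or 360 degrees).

```python
import numpy as np

def max_magnitude_selection(mags, dirs):
    """Pick, per pixel, the magnitude/direction of the strongest color component.

    mags, dirs: arrays of shape (3, H, W), one slice per color component.
    """
    winner = np.argmax(mags, axis=0)  # (H, W) index of the winning component
    M = np.take_along_axis(mags, winner[None], axis=0)[0]
    A = np.take_along_axis(dirs, winner[None], axis=0)[0]
    return M, A

def bin_index(alpha_deg, b_num=9, span=180.0):
    """Ceiling-based bin index in 1..b_num (assumed form of Eq. (5)/(6))."""
    idx = np.ceil(np.asarray(alpha_deg) * b_num / span).astype(int)
    return np.clip(idx, 1, b_num)  # an angle of exactly 0 goes to the first bin

def cell_histogram(M_cell, A_cell, b_num=9, span=180.0):
    """Accumulate gradient magnitudes of one cell into its orientation bins."""
    idx = bin_index(A_cell, b_num, span)
    return np.bincount(idx.ravel() - 1, weights=M_cell.ravel(), minlength=b_num)
```

Passing `span=360.0` would correspond to the signed-gradient case.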

Journal of Sensors
For the normalization stage, the original image is divided into blocks; each block contains 2 × 2 cells, which equals 16 × 16 pixels (256 pixels). An effective normalization reduces noise and suppresses adverse effects such as local variations in illumination and contrast. Adjacent blocks overlap by 50%, so each cell is normalized more than once, except for the cells located at the corners of the image. The histograms of each block are concatenated and then normalized using the L1-norm, L2-norm, or L1-sqrt. Finally, all normalized histograms are combined into a feature vector.
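As an illustration of the block normalization just described, the following sketch slides a 2 × 2-cell block over the grid of cell histograms with a stride of one cell (50% overlap) and applies an L2-norm. The function name, the `eps` stabilizer, and the choice of the L2 variant are assumptions made for this example.

```python
import numpy as np

def normalize_blocks(cell_hists, eps=1e-6):
    """L2-normalize overlapping 2x2-cell blocks and concatenate them.

    cell_hists: array of shape (n_cells_y, n_cells_x, n_bins),
                with at least 2 cells along each axis.
    """
    ny, nx, _ = cell_hists.shape
    blocks = []
    for by in range(ny - 1):          # stride of one cell = 50% overlap
        for bx in range(nx - 1):
            v = cell_hists[by:by + 2, bx:bx + 2].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)
```

With 8 × 8-pixel cells and 9 bins, a 64 × 64 image gives an 8 × 8 grid of cells, hence 7 × 7 blocks of 36 values each, i.e., a 1764-dimensional vector per selected pair of matrices.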
In practice, we observe that the maximum magnitude selection method is not optimal, as it may cause information loss. When we use the values discarded by the maximum magnitude selection for HOG extraction, the results are higher than with the maximum magnitudes in several experiments. Therefore, we apply the Color Component Selection and Color Component Fusion methods to improve the performance. In the Color Component Selection method, instead of deciding per pixel which magnitude or direction should be passed to the next stage, we choose the final pair of matrices based on the color components. There are three matrices of each kind (magnitude and direction), so nine different pairs of matrices are obtained in total. The orientation binning stage processes these pairs one after another. In the end, nine feature vectors are obtained, one for each selected pair. These vectors are then evaluated to find the one with the best performance. The process of the Color Component Selection method is presented in Figure 2. The Color Component Fusion method simply concatenates all of the obtained vectors into a single fusion vector.
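A minimal sketch of the two methods, assuming the magnitude and direction matrices are stacked per color component: every magnitude matrix is paired with every direction matrix (3 × 3 = 9 pairs), and the fusion vector is the concatenation of the nine resulting vectors. The names `ccs_pairs`, `ccs_and_ccf`, and the `extract` callback (standing in for the binning and normalization stages) are hypothetical.

```python
import numpy as np
from itertools import product

def ccs_pairs(mags, dirs):
    """Yield ((i, j), M_Ci, alpha_Cj) for all nine magnitude/direction pairs."""
    for i, j in product(range(3), repeat=2):
        yield (i, j), mags[i], dirs[j]

def ccs_and_ccf(mags, dirs, extract):
    """Return the nine CCS feature vectors and the CCF fusion vector.

    extract(M, alpha) -> 1-D feature vector (hypothetical callback).
    """
    vectors = {key: extract(M, A) for key, M, A in ccs_pairs(mags, dirs)}
    fusion = np.concatenate([vectors[k] for k in sorted(vectors)])
    return vectors, fusion
```

Because the fusion vector is nine times as long as a single HOG vector, its memory footprint grows accordingly.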

Extended Kernels for HOG Extraction
To speed up the gradient computation, researchers use the kernels illustrated in Figure 3. By filtering the image with these kernels, the horizontal gradient Δx and vertical gradient Δy are computed faster than by evaluating Equations (1) and (2) directly. We define the original kernels as kernels with R = 1. As we extend the kernels step by step along the horizontal and vertical dimensions, the parameter R increases accordingly. Figure 3 shows the kernels for R ranging from 1 to 4. The resulting horizontal and vertical gradients are then divided by the parameter R.

[...] Georgia, CLV, and MUCT (see Figure 4). To evaluate the proposed approaches, these databases are split into 50% for training and 50% for testing. However, databases such as Georgia, CLV, and MUCT have an odd number of images per class. For instance, the MUCT dataset includes 3 images in each class, so we randomly select one image for training and use the other two for testing, which makes the task more challenging. The 1-NN classifier is employed for classification, and accuracy on the testing set is used as the performance metric. The summary of these databases is presented in Table 1.
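The evaluation protocol above can be sketched as a per-class random split followed by 1-NN classification. This is an illustrative sketch only: the function names are hypothetical, the Euclidean distance is an assumption (the distance measure is not stated in the text), and rounding the training share down reproduces the one-training, two-testing split described for classes with 3 images.

```python
import numpy as np

def split_per_class(labels, train_fraction=0.5, seed=0):
    """Randomly split sample indices class by class (training share rounded down)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = max(1, int(len(idx) * train_fraction))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-NN classifier with squared Euclidean distance (assumed)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_test))
```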

4.2. Experimental Setup. As our proposed approaches are designed to fit any three-component color image, we also evaluate them in several well-known color spaces, including HSV, ISH, I1I2I3, and YCbCr. These spaces are frequently applied in pattern recognition [4]. According to Section 2, each image yields 10 different feature vectors after Color Component Selection is applied: 9 vectors from the selected pairs and one fusion vector produced by the Color Component Fusion method. These vectors are then fed to a 1-NN classifier for evaluation. Moreover, we set the kernel parameter R from 1 to 5 for comparison. The experiments are implemented in Matlab 2017b and conducted on a PC with an Intel Core i3-8100 3.60 GHz CPU and 8 GB of RAM.
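To make the role of the kernel parameter R concrete, the sketch below builds a one-dimensional kernel for a given R and divides the filtered gradients by R, as described in Section 3. The kernel layout is an assumption, since Figure 3 is not reproduced here: R entries of -1, a central 0, and R entries of +1, which reduces to the usual [-1, 0, 1] kernel for R = 1. The function names and the edge-replicating padding are also choices made for this example.

```python
import numpy as np

def extended_kernel(R):
    """Assumed layout: R entries of -1, a central 0, R entries of +1."""
    return np.concatenate([-np.ones(R), [0.0], np.ones(R)])

def gradients(G, R=1):
    """Horizontal and vertical gradients of a 2-D image G, divided by R."""
    G = np.asarray(G, dtype=float)
    k = extended_kernel(R)
    P = np.pad(G, R, mode="edge")      # replicate borders (assumption)
    H, W = G.shape
    dx = np.zeros((H, W))
    dy = np.zeros((H, W))
    for t, w in enumerate(k):          # correlation, i.e., no kernel flip
        dx += w * P[R:R + H, t:t + W]  # neighbor at column offset t - R
        dy += w * P[t:t + H, R:R + W]  # neighbor at row offset t - R
    return dx / R, dy / R
```

Under this assumed layout, dividing by R turns the sum of R symmetric differences into their average.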

4.3. Results. The experimental results are presented as charts in Figures 5-8. Converting to different color spaces and increasing the parameter R improve the performance of the maximum magnitude selection method. Furthermore, according to the charts, Color Component Selection and Color Component Fusion outperform the maximum magnitude selection method. For the AR database, the highest accuracy is 95.92%, which is 0.54% higher than the best accuracy achieved with the maximum magnitude selection method; this result is obtained by Color Component Fusion with R = 5 kernels in the ISH color space. For the Georgia database, Color Component Fusion combined with R = 1 kernels and the YCbCr color space achieves 75.00%, the highest accuracy in all experiments on this database. Better performance is also obtained for the MUCT database and even for the CLV database, which is the most challenging image set. In general, the best results are mostly achieved with the Fusion approach, which indicates the promising performance of our proposed approaches.
Several of the best cases are reported in Tables 2-5. In these tables, "Max magnitude" stands for the maximum magnitude selection method and "Fusion" stands for the Color Component Fusion method. The best accuracy obtained for the AR dataset is 95.92%, using the Fusion approach. Similarly, this approach achieves 75.00%, 49.77%, and 93.47% for the Georgia, CLV, and MUCT datasets, respectively. Note that the performance of each color space differs depending on the feature extraction method. Moreover, increasing the parameter R of the extended kernels improves several experimental results. In most cases, the accuracy with R = 5 kernels is higher than with the others. However, there are also several cases where the highest accuracy is achieved with R = 1, so we cannot yet conclude that a higher R always yields a higher accuracy. Nevertheless, we believe that these results provide a foundation for studying the effect of kernel size on HOG performance, which we plan to investigate further in the future.

Conclusion
In this paper, we propose novel approaches to extract the HOG descriptor from color images, namely, the Color Component Selection and Color Component Fusion methods, in order to improve the classification performance. We observe that the proposed methods, especially Color Component Fusion, outperform the current maximum magnitude selection method in the face classification task. The color space conversion and the extended kernels also improve the classification accuracy. However, in several cases, the extended kernels cause the accuracy to decrease. The Color Component Selection method requires a long time to complete the extraction for every available case, and the Color Component Fusion method may cause memory issues due to its high dimensionality. Therefore, our future work is to determine the optimal pair of matrices without evaluating all cases, in order to improve the inference speed. Feature selection methods are also recommended to reduce the vector dimension. Several CNN-

Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.