Joint Subspace and Low-Rank Coding Method for Makeup Face Recognition

Facial makeup significantly changes the perceived appearance of the face and reduces the accuracy of face recognition. To meet the needs of smart-city applications, in this study we introduce a novel joint subspace and low-rank coding method for makeup face recognition. To exploit more discriminative information from face images, we use feature projection to find a proper subspace and learn a discriminative dictionary in that subspace. In addition, we impose a low-rank constraint in the dictionary learning. Then, we design a joint learning framework and use an iterative optimization strategy to obtain all parameters simultaneously. Experiments on a real-world dataset achieve good performance and demonstrate the validity of the proposed method.


Introduction
Digital technologies represented by artificial intelligence, the Internet of Things (IoT), and cloud computing are developing vigorously for smart cities. A smart city aims at using various kinds of information technology to integrate the systems and services of the city, which improves the utilization efficiency of resources and the quality of life of residents [1,2]. The number of IoT devices and sensors is expected to reach 40 billion by 2025 [3]. As the amount of data increases, the IoT industry is expanding from simple connectivity toward intelligence and autonomy. Simultaneously, artificial intelligence, as a powerful tool, provides intelligence for smart cities, and a large number of machine learning algorithms are put into practical application to realize the autonomy of equipment, which collects and processes data by itself. In this case, artificial intelligence helps to collect relevant data, identify alternatives, choose among alternatives, review decisions, and make predictions [4,5]. Automatic face recognition is considered one of the important techniques for realizing a smart city. It plays an interactive role in human-computer interaction and intelligent transportation, for example in access control systems, community management information systems, person-of-interest monitoring, and so on [6,7]. For example, based on face recognition technology, monitoring is carried out in crowded places such as passenger stations and railway stations: faces recognized in the video are compared in real time with a database of people of key concern to public security, and alarms can be raised immediately. In smart cities, face recognition technology can also be applied to examinations in schools: at the examination centre, candidates verify their identity through a face recognition system to ensure fairness and prevent test substitution.
Due to differences in illumination, face angle, posture, and cameras, face images belonging to the same person may look very different. In particular, in real-world applications, facial makeup significantly changes the perceived appearance of the face and reduces the accuracy of face recognition. The literature [8-10] indicates that facial makeup has a negative impact on the performance of the majority of face recognition algorithms. Figure 1 shows examples of face image pairs from the Disguised Faces in the Wild (DFW) dataset [11]: the left image in each pair is without makeup, and the right image is with makeup. Comparing the faces before and after makeup, the significant changes in facial appearance are readily apparent. For these reasons, makeup face recognition has become a difficult problem in facial classification. In order to develop a powerful face recognition system, the influence of cosmetics on face verification needs to be addressed. Yan [12] introduced multiple feature descriptors into metric learning, learning multiple distance metrics by combining different facial features from visual and audio information. Chen et al. [13] developed a method for the automatic detection of makeup in face images. This method extracts a feature vector to capture the shape, texture, and color features of face images and uses SVM and Adaboost to determine whether makeup is present. In addition to extracting features from the whole face, the method also uses the regions of the face associated with the left eye, right eye, and mouth. Kose et al. [14] developed a facial makeup detector to reduce the impact of makeup in face recognition.
This method exploits the shape and texture information of the face and uses SVM and Alligator as classifiers. Wang and Kumar [15] developed a framework for facial makeup detection and removal. In this framework, a locality-constrained low-rank dictionary learning method is used for makeup detection, and locality-constrained coupled dictionary learning is used for makeup removal. Although there have been some research results on makeup face recognition, the performance of these methods in real-world applications still needs to be improved.
Recently, dictionary learning has achieved great success in the field of face recognition. Traditional dictionary learning learns the sparse representation and the dictionary in the original data space. However, makeup face verification is affected not only by cosmetics but also by illumination and posture. In this study, we develop a joint subspace and low-rank coding method for makeup face recognition (JSLC). We find a feature projection space and project the face images into it. At the same time, we learn a discriminative dictionary in this feature subspace, and each face image is represented by a discriminative code. To solve for the subspace and the dictionary simultaneously, we build a joint learning model for them. In addition, to obtain more discriminative information in the subspace, we impose a low-rank constraint in the dictionary learning. The optimal solutions for the subspace projection matrix, the dictionary, and the sparse coefficients are obtained simultaneously by an alternating iterative optimization strategy.
We organize the rest of this paper as follows. Firstly, related work about makeup face recognition is reviewed in Section 2. Secondly, the proposed method is introduced in Section 3. After that, the results of comparison experiment are shown in Section 4. Finally, conclusions and future work are summarized in Section 5.

Related Work
From the viewpoint of AI, makeup face recognition consists of two stages: feature extraction and classification. The commonly used feature extraction methods for face recognition are geometric methods and appearance methods [9]. Geometric methods use the geometric shape of facial components, while appearance methods use the textures of facial images, including creases and furrows. Geometric methods use predefined geometric marker positions on salient facial features to represent facial characteristics. Since geometric methods express facial characteristics according to a limited set of fiducial points on the human face, they usually require accurate facial feature detection. Thus, appearance methods often perform better in face recognition. The commonly used local binary patterns (LBP) and Gabor filters are both appearance methods. There are many successful classification methods for face recognition, such as SVM, metric learning, dictionary learning, Adaboost, and so on [16,17]. Due to its sparsity and noise alleviation, dictionary learning has demonstrated its advantages in image processing tasks.
Dictionary learning methods approximate each sample by a linear combination of a few atoms from the learned dictionary [18,19]. Given training samples X = [x_1, ..., x_n] ∈ R^{d×n}, where x_i ∈ R^d (i = 1, 2, ..., n), the dictionary D ∈ R^{d×K} and the corresponding sparse coefficients A = [a_1, a_2, ..., a_n] ∈ R^{K×n} can be trained by the following formula:

\min_{D,A} \; \|X - DA\|_F^2 + \lambda \sum_{i=1}^{n} \|a_i\|_q,  (1)

where ‖·‖_F^2 is the Frobenius norm, ‖a_i‖_q is the sparsity regularization, and λ is the balance parameter. The original purpose of equation (1) is to complete reconstruction tasks. In order to use dictionary learning for classification tasks, more discriminative or supervision information is incorporated into the dictionary learning, and its optimization problem can be written as

\min_{D,A} \; \|X - DA\|_F^2 + \lambda \sum_{i=1}^{n} \|a_i\|_q + f_A(A),  (2)

where the function f_A(A) can be a classifier, a discrimination criterion, or a label-consistency term.
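As a point of reference, the following is a minimal sketch of the unsupervised dictionary-learning baseline in equation (1), written with scikit-learn rather than the authors' implementation; the toy sizes and the choice q = 1 (an l1 penalty) are assumptions for illustration only.

```python
# Minimal sketch of the unsupervised baseline in equation (1); not the authors' code.
import numpy as np
from sklearn.decomposition import DictionaryLearning

d, n, K, lam = 64, 200, 32, 1.0           # toy sizes; real x_i would be HOG/LBP vectors
X = np.random.randn(d, n)                 # columns are samples, as in the paper

# scikit-learn expects samples as rows, so X is transposed on the way in and out.
learner = DictionaryLearning(n_components=K, alpha=lam,
                             transform_algorithm="lasso_lars", max_iter=50)
A = learner.fit_transform(X.T).T          # sparse codes, shape (K, n)
D = learner.components_.T                 # dictionary, shape (d, K)

print("reconstruction error:", np.linalg.norm(X - D @ A, "fro"))
```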

Objective Function of JSLC.
Because the appearance of a person's face changes significantly after makeup, in this study we use subspace learning to project the original data samples and preserve the discriminative information in the feature subspace. The subspace learning embedded into the dictionary learning is formulated as the objective J_1, where W ∈ R^{p×d} is the projection matrix, p is the dimension of the subspace, and η and λ_1 are two positive parameters. J_1 has three terms: the first two are the dictionary learning terms in the subspace, whose goal is to minimize the representation error, and the third is a regularization term that plays the role of principal component analysis (PCA), by which the discriminant information in the original space can be preserved in the projection subspace [20].

Then, we consider using an affinity matrix Q to measure the discriminant ability of the sparse codes; i.e., if two face images are from the same person and look similar, the difference between their sparse codes is minimized, and if two face images are from different persons but look similar, the difference between their sparse codes is maximized, so that discriminative information can be exploited. This idea can be represented as

J_2 = \frac{\lambda_2}{2} \sum_{i,j} Q_{i,j} \|a_i - a_j\|_2^2.

The element Q_{i,j} of matrix Q can be written as

Q_{i,j} = 1 if x_j ∈ N_k(x_i) and y_i = y_j;  Q_{i,j} = −1 if x_j ∈ N_k(x_i) and y_i ≠ y_j;  Q_{i,j} = 0 otherwise,

where the function N_k(x_i) returns the k-nearest neighbors of image x_i, y_i = y_j means that images x_i and x_j are from the same person, and y_i ≠ y_j means that they are from different persons. We denote the diagonal matrix S = diag{s_1, s_2, ..., s_n} ∈ R^{n×n}, whose diagonal elements are the sums of the row elements of Q. The J_2 term can then be simplified as

J_2 = \lambda_2 \, tr(A L A^T),

where L = S − Q.
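A small sketch of how the affinity matrix Q and the Laplacian L = S − Q used in the J_2 term could be built is shown below; it is not the authors' code, and the use of scikit-learn's nearest-neighbor search and the symmetrization step are assumptions for illustration.

```python
# A minimal sketch (not the authors' code) of the affinity matrix Q and the Laplacian
# L = S - Q used in the J_2 term, assuming the columns of X are samples.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_affinity(X, y, k=5):
    """X: (d, n) feature matrix, y: (n,) identity labels."""
    n = X.shape[1]
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X.T)   # +1 because the point itself is returned
    _, idx = nbrs.kneighbors(X.T)
    Q = np.zeros((n, n))
    for i in range(n):
        for j in idx[i, 1:]:                               # skip the self-neighbor
            Q[i, j] = 1.0 if y[i] == y[j] else -1.0
    Q = (Q + Q.T) / 2.0                                    # symmetrize
    S = np.diag(Q.sum(axis=1))                             # row sums on the diagonal
    return Q, S - Q                                        # affinity Q and Laplacian L

# usage: Q, L = build_affinity(features, labels, k=5)
```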
In order to obtain more discriminative information in the subspace, we impose a low-rank constraint on A in the dictionary learning. Following [21], we use A ≈ EH to represent the rank, where E ∈ R^{K×m} and H ∈ R^{m×n}. Thus, the objective function of the rank minimization can be written as

J_3 = \lambda_3 \|A - EH\|_F^2.

Combining J_1, J_2, and J_3, we obtain the objective function of JSLC in equation (8) as J = J_1 + J_2 + J_3, minimized jointly over W, D, A, E, and H. Obviously, equation (8) is a joint learning function for subspace and dictionary learning: the subspace gradually enhances the discriminative ability of the learned dictionary during the optimization process, and the learned dictionary in turn improves the discriminative ability of the subspace.
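To make the joint objective concrete, the following sketch evaluates J = J_1 + J_2 + J_3 for fixed variables, which is useful for monitoring convergence of the alternating optimization. The exact form of the PCA-style regularizer in J_1 is not spelled out in the text above, so the common choice ||X − WᵀWX||_F² and an l1 sparsity term are assumptions of this sketch, not the paper's definitive formulation.

```python
# Sketch (under stated assumptions, not the paper's exact code) for monitoring the
# joint objective J = J_1 + J_2 + J_3 during the alternating optimization.
import numpy as np

def jslc_objective(X, W, D, A, E, H, L, eta, lam1, lam2, lam3):
    fro = lambda M: np.linalg.norm(M, "fro") ** 2
    # J_1: subspace dictionary learning; the PCA-style term ||X - W^T W X||_F^2 is an ASSUMPTION.
    J1 = fro(W @ X - D @ A) + eta * np.abs(A).sum() + lam1 * fro(X - W.T @ W @ X)
    J2 = lam2 * np.trace(A @ L @ A.T)          # graph regularization with L = S - Q
    J3 = lam3 * fro(A - E @ H)                 # low-rank factorization penalty
    return J1 + J2 + J3
```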

Optimization.
In this subsection, we solve equation (8) using the alternating optimization strategy. First, we denote W = (XR)^T and D = WXB for some R ∈ R^{n×p} and some B ∈ R^{n×K}. The objective function of JSLC can then be rewritten as equation (9), minimized over B, A, R, E, and H, where K = X^T X ∈ R^{n×n}.
(1) Update W: with B, A, E, and H fixed, we obtain the subproblem in equation (10). Equation (10) has a closed-form solution: R is obtained from equation (11), where K = CΛC^T is the eigendecomposition of K, and Z is solved from equation (12), which also admits a closed-form solution. Then, we obtain W from equation (13).

(2) Update D: with A, W, E, and H fixed, we obtain the subproblem in equation (14). We use the Lagrange dual approach to solve equation (14). The closed-form solution of D is given in equation (15), where Δ is a very small diagonal matrix. Then, we obtain matrix B by B = (R^T K)^† D, where (·)^† denotes the pseudo-inverse.

(3) Update A: with W, D, E, and H fixed, the objective function reduces to equation (16). Since each term in equation (16) is quadratic, setting the derivative with respect to A to zero yields equation (17). Equation (17) is a standard Sylvester equation, and we solve it with the Bartels-Stewart algorithm [22].
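Since the A-update is stated to be a standard Sylvester equation solved by the Bartels-Stewart algorithm, a small sketch using SciPy's implementation is given below; the coefficient matrices P, Q, and C are placeholders for illustration, not the paper's exact expressions from equation (17).

```python
# Sketch: the A-update has the Sylvester form P A + A Q = C.  SciPy's solve_sylvester
# implements the Bartels-Stewart algorithm referenced above; P, Q, C are placeholders.
import numpy as np
from scipy.linalg import solve_sylvester

K_atoms, n = 32, 200
P = np.random.randn(K_atoms, K_atoms); P = P @ P.T + np.eye(K_atoms)   # stand-in left coefficient
Q = np.random.randn(n, n); Q = Q @ Q.T                                  # stand-in right coefficient
C = np.random.randn(K_atoms, n)                                         # stand-in right-hand side

A = solve_sylvester(P, Q, C)                 # solves P @ A + A @ Q = C
print(np.allclose(P @ A + A @ Q, C))         # sanity check on the solution
```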
(4) Update E and H: with W, D, and A fixed, equation (9) can be written as min_{E,H} ‖A − EH‖_F^2. We can easily obtain the closed-form solutions of E and H from equations (18) and (19). Once the optimal dictionary D and projection matrix W are obtained, the sparse code a_i of a testing image x_i is computed by coding its projection Wx_i over the dictionary D. Finally, we use the closest-distance strategy to perform the testing task.
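One standard closed-form choice for min_{E,H} ‖A − EH‖_F^2 is the rank-m truncated SVD of A (Eckart-Young); whether equations (18) and (19) use exactly this split is an assumption of the sketch below.

```python
# Sketch of one standard closed-form (E, H)-update: the best rank-m approximation of A
# in Frobenius norm comes from its truncated SVD.  This split is an assumed choice,
# not necessarily the paper's equations (18)-(19).
import numpy as np

def update_EH(A, m):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    E = U[:, :m] * s[:m]          # shape (K, m)
    H = Vt[:m, :]                 # shape (m, n)
    return E, H

A = np.random.randn(32, 200)
E, H = update_EH(A, m=10)
print(np.linalg.norm(A - E @ H, "fro"))   # residual of the rank-10 fit
```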
Based on the above analysis, the proposed JSLC method is summarized in Algorithm 1.

ALGORITHM 1: The JSLC algorithm.
Input: a dataset of facial images {x_i}_{i=1}^{n}, including images with and without makeup.
Output: dictionary D and projection matrix W.
Initialization: random matrix B; construct the columns of R from the eigenvectors associated with the top p eigenvalues of C.
Repeat:
  Update W using equation (13) with D, A, E, and H fixed;
  Update D using equation (15) with A, W, E, and H fixed;
  Update A using equation (17) with W, D, E, and H fixed;
  Update E and H using equations (18) and (19) with W, D, and A fixed;
until convergence.

Datasets and Experimental Settings.
In the experiment, we use the widely used face dataset DFW [11]. The DFW dataset contains 11155 different images of 1000 people collected from the Internet, including face images of movie stars, singers, athletes, and politicians. Each person has one face image without makeup and multiple face images with makeup, and there are differences in posture, age, lighting, and expression. Wearing glasses and hats is also treated as a category of makeup. Example face images of the DFW dataset are shown in Figure 2. In this paper, we use the histogram of oriented gradients (HOG) [23], local binary pattern (LBP) [24], and three-patch LBP (TPLBP) [25] to extract feature vectors from the face images.

To validate the effectiveness of our approach, we compare its performance with the following methods: LLC [26], LMNN [27], PRDC [28], NCA [29], and RDML-CCPVL [30]. We set the subspace dimension in the grid {100, 200, 300, 400, 450} and the number of dictionary atoms in the grid {200, 300, ..., 600}. The parameters η, λ_1, λ_2, and λ_3 are set in the grid {0.5, 1, ..., 5}. All parameters of the compared methods are set according to their default settings. We use 5-fold cross-validation to obtain the optimal parameters, and the average results of the five runs are taken as the final result.

Table 1 shows the comparison of JSLC based on HOG features with the comparison algorithms in terms of matching rate. The results show the following: (1) JSLC achieves the best results at Rank 1, Rank 5, Rank 10, and Rank 15 of the matching rate. JSLC uses a dictionary learning framework and combines subspace and low-rank learning, which can effectively mine the discriminative information of different face images. (2) The comparison algorithm PRDC is mainly based on relative distance comparison, and LMNN mainly uses the large-margin information between samples; neither can make full use of the discriminative information of the images, so their performance remains weaker. Although RDML-CCPVL uses a deep discriminative metric learning method, its clustering step cannot exploit all the effective information in the images, so its performance does not reach the ideal results.

Tables 2 and 3 show the comparison of JSLC and the comparison algorithms in terms of matching rate based on LBP and TPLBP features, respectively. Similar to the results obtained with HOG features, JSLC obtains the best matching performance compared with the other four methods. The results in Tables 1-3 also indicate that HOG, LBP, and TPLBP features are suitable for extracting makeup face feature vectors. Bold values indicate the best result in each table.

Figures 3 and 4 show the Rank 1 values of JSLC using HOG features with different subspace dimensions and numbers of dictionary atoms. The results in Figures 3 and 4 show that setting the subspace dimension to 400 and the number of dictionary atoms to 450 is feasible. In the JSLC method, the parameters η, λ_1, λ_2, and λ_3 are related to the performance of the model; next, we analyze these four parameters. With LBP features and the other parameters fixed, Figure 5 shows the average Rank 1 value of the JSLC method for different values of η, λ_1, λ_2, and λ_3.
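For readers who want to reproduce the feature extraction step, the following is a small sketch of computing HOG and uniform-LBP descriptors with scikit-image; it is not the authors' pipeline, and the 128x128 crop size and the concatenation of the two descriptors are assumptions for illustration.

```python
# Sketch (not the authors' pipeline) of extracting HOG and LBP descriptors with
# scikit-image.  A random array stands in for an aligned grayscale face crop here.
import numpy as np
from skimage.feature import hog, local_binary_pattern

img = np.random.rand(128, 128)                      # placeholder for a 128x128 face crop

# HOG descriptor
hog_vec = hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Uniform LBP, summarized as a normalized 10-bin histogram (P + 2 patterns for P = 8)
lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
lbp_vec, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)

feature = np.concatenate([hog_vec, lbp_vec])        # one column of the feature matrix X
print(feature.shape)
```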

Experimental Results.
First, we discuss the effect of η in JSLC. The parameter η controls the role of the sparse regularization term. The results in Figure 5(a) show that when η = 1, the average Rank 1 achieves the best performance. In addition, the differences in model performance for different values of η are modest. The parameter λ_1 controls the role of the PCA regularization term: the larger the value of λ_1, the larger the proportion of the PCA term in the objective function. The results in Figure 5(b) show that different values of λ_1 lead to different performance of JSLC, but no clear relationship between λ_1 and the matching rate can be observed. Therefore, determining the optimal value by grid search is feasible. Next, we consider the effect of λ_2. The parameter λ_2 controls the role of the affinity matrix in JSLC. The results in Figure 5(c) show that the matching rate of JSLC is sensitive to λ_2; when λ_2 = 4, the matching rate is highest. Therefore, grid search for λ_2 is also feasible. Finally, we discuss the effect of λ_3 in JSLC. The parameter λ_3 controls the role of the low-rank term. The results in Figure 5(d) show that the matching rate of JSLC is also sensitive to λ_3: when λ_3 is too small or too large, the low-rank term cannot exploit the intrinsic data structure of the face images.
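For reference, the sketch below shows one common way to compute the Rank-k matching rate reported above, comparing probe codes with gallery codes by Euclidean distance; the exact matching protocol used in the paper (distance measure, gallery construction) is assumed here, not confirmed by the text.

```python
# Sketch (under assumptions about the matching step) of the Rank-k matching rate:
# a probe counts as matched at rank k if its true identity appears among the k
# closest gallery entries under Euclidean distance between codes.
import numpy as np

def rank_k_rate(probe_codes, probe_ids, gallery_codes, gallery_ids, k):
    """Codes are stored column-wise (one code per image); ids give the identities."""
    gallery_ids = np.asarray(gallery_ids)
    hits = 0
    for a, y in zip(probe_codes.T, probe_ids):
        dist = np.linalg.norm(gallery_codes - a[:, None], axis=0)
        hits += int(y in gallery_ids[np.argsort(dist)[:k]])
    return hits / len(probe_ids)

rng = np.random.default_rng(0)
A_gallery = rng.standard_normal((32, 50)); y = np.arange(50)     # toy gallery codes
A_probe = A_gallery + 0.1 * rng.standard_normal((32, 50))        # noisy probes, same identities
print(rank_k_rate(A_probe, y, A_gallery, y, k=1))                # close to 1.0 on this toy data
```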

Conclusion
In this study, a joint subspace and low-rank coding method is proposed for makeup face recognition. Based on the dictionary learning framework, the subspace learning and the low-rank coding are performed jointly, so that the discriminative information of face images can be exploited. Experimental results on DFW show the good performance of our method. In the future, we will carry out makeup face recognition and verification on more complex datasets and in more scenarios, such as under varying illumination, pose, and expression. Incorporating deep features of face images into our method is also part of our future work.

Data Availability
The labeled datasets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.