Analysis and Implementation of Optimization Techniques for Facial Recognition

Amidst the wide spectrum of recognition methods proposed, these algorithms still fail to yield optimal accuracy under variations in illumination, pose, and facial expression. In recent years, considerable attention has been given to swarm intelligence methods as a way to resolve some of these persistent issues. In this study, the principal component analysis (PCA) method, with its inherent property of dimensionality reduction, was adopted for feature selection. The resultant features were optimized using the particle swarm optimization (PSO) algorithm. For performance comparison, the resultant features were also optimized with the genetic algorithm (GA) and the artificial bee colony (ABC). The optimized features were used for recognition with Euclidean distance (EUD), K-nearest neighbor (KNN), and the support vector machine (SVM) as classifiers. Experimental results of these hybrid models on the ORL dataset reveal an accuracy of 99.25% for PSO and KNN, followed by ABC with 93.72% and GA with 87.50%. In contrast, experiments with PSO, GA, and ABC on the YaleB dataset yield 100% accuracy, demonstrating their efficiency over state-of-the-art methods.


Introduction
Automated biometric recognition is fast gaining recognition as one of the most trusted security technologies of the 21st century. This is perhaps attributable to recent significant advances in parallel processing techniques and to the search for more reliable security systems following the sharp increase in crime worldwide. The earliest biometric features to be automated for recognition include fingerprints, where the unique ridge skin patterns were utilized. Others include the retina, iris, palm, skin, and nose tip. Fingerprint, retina, and iris recognition systems are known to yield very accurate results [1], but hardened criminals, being acutely aware of the security implications, mostly avoid presenting their biometric features for capture into databases. Thus, automated face recognition systems are now the obvious choice [2] because people cannot hide their faces from installed CCTV cameras all the time. This makes the technology the least intrusive and a hotbed research area, as researchers continue to propose newer algorithms that outperform existing ones.
Since automated face recognition is new compared to fingerprint recognition and the other modalities already stated, the problems associated with it are still evident. For example, Zhang, Luo, Loy, and Tang [3] identified the problem of facial landmark detection, which is among the central concerns of system development. Most face detection algorithms are slow and produce poor recognition accuracies (Owusu, Zhan, and Mao, 2014). Other unresolved challenges in face recognition research concern occlusion, pose variation, illumination normalization, age, and gender [4]. In unconstrained environments, recognition accuracy decreases significantly, making it difficult to identify faces accurately. Therefore, there is a need for techniques that improve face recognition in these environments. Tu, Li, and Zhao [5] attempted to solve the problem of illumination and pose by using DL-Net and N-Net methods. However, this method could not adequately account for large-scale normalized albedo images and face recognition in the wild. Another challenge in face recognition research is testing the efficacy of experimental results. There is no standard dataset that is generally recognized by the research community for testing. The use of specific datasets depends on the individual researcher's choice. Most of the datasets are premeditated and therefore do not represent a real-world scenario. Ethnicity poses a further challenge: currently, there is no dataset that is well-balanced for race, gender, and age. The problem of nonuniform illumination also arises when the lighting conditions vary at different angles, so that the proportion of light reflected by the face differs. This phenomenon can lead to the misidentification of an individual [6]. Similarly, a random gyration due to individual movement can also lead to misclassifications in 4D recognition.
An input image and an interperson image could appear dissimilar due to the rotation of the image [7]. The main purpose of this study is to explore the popular techniques and bring forth an approach that is economical in computational cost. Moreover, this method takes into account illumination, pose, and facial expression. The proposed approach enhances the outcomes of the principal component analysis (PCA) technique using optimization techniques. Additionally, the improvement in accuracy in this research translates to a general improvement in the security and integrity of biometric locks.
In this study, we explored which optimization algorithm is best suited to maximizing recognition accuracy. The study also addresses which classifier suits the recommended approach and which method uses the least computational resources and time. The proposed method requires the preprocessing of images; features are then examined and extracted using PCA. This is followed by the optimization of these features using PSO, ABC, and GA, with classification culminating the entire process.

Related Works
Face recognition is mainly performed in four phases, viz., feature extraction, face detection, face synthesis, and recognition [8]. Chihaoui et al. [9] stated that face recognition techniques fall mainly into three categories. The first uses procedures that require the whole face as input. The second considers only some features or regions of the face, and the third uses global and local facial traits simultaneously. Furthermore, numerous datasets are geared towards the solution of specific face recognition problems, and these datasets are taken under laboratory conditions. However, some datasets attempt to address multiple problems and are taken under real-world conditions [9]. Fazilov, Mirzaev, and Mirzaeva [10] examined an algorithm to enhance the classification of objects in higher dimensions. The proposed algorithm formed a subset of correlated images, and then a feature representation was selected to build elementary transformation models in the representative features' subspace. The algorithm aims to improve recognition accuracy, learning time, and object recognition time. The problem of low face recognition accuracy due to large samples and limited availability of training samples was addressed by He, Wu, Sun, and Tan [11], who worked on cross-modality images in heterogeneous face recognition (HFR). The study proposed the Wasserstein CNN framework, which uses one network to project near-infrared (NIR) and visual (VIS) images to a Euclidean space. The proposed method is a modality-invariant deep feature learning architecture for NIR-VIS HFR. The Wasserstein distance that separates the NIR and VIS distributions is subsequently computed, and a correlation constraint is then imposed on the connected layers to mitigate overfitting on small NIR datasets.
Similarly, Rahimzadeh, Arashloo, and Kittler [12] solved the optimization problem of MAP inference using the Markov random field (MRF) model by utilizing the processing power of GPUs. A multiresolution analysis technique, an incremental subgradient approach, and an efficient message passing approach were used to obtain the maximum efficiency gain. Efficiency was enhanced by using multiresolution daisy features to attain invariance against occlusion and illumination. The proposed approach reportedly reduced the computational cost by 200% when compared to baseline methods. Likewise, Chan et al. [13] addressed the problem of training and adapting deep learning networks to different data and tasks. They offered a method of passing images through cascaded principal component analysis (PCA) filters for training PCANet. PCANet is subsequently used for feature extraction on the MultiPIE, Extended YaleB, AR, FERET, and LFW databases. Moreover, PCANet also serves as a reference for reviewing advanced deep learning architectures on a large number of image classification tasks. Also, Deng, Hu, Wu, and Guo [14] put forward the synthesis of face images to mitigate varying illumination and pose, using only one frontal face image to develop an extended generic elastic model (GEM) and a multidepth model. Pose-aware metric learning (PAML) was learned by means of linear regression to synthesize each pose in its corresponding metric space, and it yielded an accuracy of 100%. Chen et al. [15], on the other hand, proposed a residual-based deep face reconstruction neural network for the extraction of features from varying poses and illumination. This method converts images with varying illumination and pose to frontal face images with an average lighting condition. In a comparison of the proposed triplet loss with the Euclidean loss, the experiments showed better performance for the latter than for the former. However, only one database was used for this study, and there were no results against which to compare the proposed method.
Tu, Li, and Zhao [5] also addressed the problem of illumination, pose, and expression by using a DL-Net and a normalization network (N-Net). The DL-Net removes the illumination and then rebuilds the input image as an albedo image. The N-Net normalizes the albedo image and extracts features by supervised learning. Experiments on the MultiPIE database establish the efficiency of the proposed method in improving face recognition accuracy under varying illumination, expression, and pose. The study concludes by stating that the extracted features can improve conventional feature extraction methods. Zhang et al. [16] also proposed an emotion recognition model with better accuracy than the state-of-the-art (SOTA) model. They extracted the facial expressions of seven different emotions. The extracted image is filtered through a combination of Shannon entropy and multiscale feature extraction, and the result is classified using a fuzzy support vector machine (SVM). The study used stratified cross-validation for validation, and an overall accuracy of 96.77% was achieved. Ghazi and Ekenel [17] improved the accuracy under occlusion, variations in illumination, and misalignment of facial features by using two deep CNN models, VGG-Face and Lightened CNN, pretrained on large datasets. These models were then used to extract facial features. They also used five databases in their evaluation. The AR face dataset was used to analyze the effects of facial occlusion, and CMU PIE and the Extended Yale B dataset were used to analyze the variation in illumination. The color FERET database was used for impact analysis on view invariance, and last, the FRGC dataset was used for the evaluation of multiview catalogues. The authors then used the Facial Bounding Box Extension to scan the entire head and extract deep features, thus improving the results. They compared the results of the Facial Bounding Box Extension with those of other methods, and there was a significant improvement [18]. Zhang et al., however, optimized face landmark detection by taking advantage of supplementary data from the attributes of the features.
The study proposed feature extraction using four convolutional layers. Each of these layers produces several feature maps that are activated using rectified linear units. The layers are then coupled using max-pooling to produce a shared vector. The Multi-Attribute Facial Landmark (MAFL), AFLW, and Caltech Occluded Faces in the Wild (COFW) datasets are evaluated using mean error and failure rate. The study concluded that the auxiliary task is more efficient when the dynamic task coefficient is learned, and this, in turn, makes the proposed method more robust to occluded faces and large view variation [19].
This approach encouraged Ding and Tao [20] to propose a homography-based pose normalization approach that handles the loss of semantic correspondence, occlusion, and nonlinear facial texture warping in pose-invariant face recognition (PIFR). The proposed method first projects a lattice of three-dimensional facial landmarks onto a two-dimensional face for feature extraction. Second, an optimal warp is estimated using a homography to correct the texture deformation due to pose variation. This is performed on the local patch around each landmark. The restored occluded features are used for face recognition using established face descriptors [20]. Sharma and Patterh [21], however, proposed a technique whereby the face is detected by the Viola-Jones algorithm. Then, the eyes, nose, and mouth are located by means of the proposed hybrid PCA. The features are subsequently extracted using LBP for every part found. PCA is then applied to each extracted feature for recognition. The ORL face dataset was used, with the recognition rate as the evaluation metric. The study concluded that the proposed hybrid PCA approach achieves a higher recognition rate for varying facial expressions and pose when pitted against SOTA, PCA + wavelet, CA, 2DPCA + DWT, and local binary pattern algorithms. They claimed that this approach can be extended to illumination, age, or partial occlusion problems. Interestingly, Duong, Luu, Quach, and Bui [22] presented deep appearance models (DAM), which accurately capture shape and texture under large variations using the deep Boltzmann machine (DBM). DAM replaced the active appearance model (AAM). This method begins by using a DBM to learn the distribution of landmark points on the face data, and then the facial data are vectorized as a texture model. The two layers (shape and texture) are then interpreted by constructing and using a high-level layer. The LFW, Helen, and FG-NET databases were used for the experimentation. The RMSE values of the proposed method relative to the control methods (bicubic and AAM) showed a significant improvement in the recognition rate [22].
Duan and Tan [23] also proposed a low-complexity method of learning pose-invariant features without the need for prior pose information. The proposed approach removes the pose from a face image and, by so doing, extracts local features. Self-similarity features are first generated from a face image by evaluating the distance that separates the features of different nonoverlapping blocks. Then, the linear transformation is subtracted from the local features, and the transformation matrix is acquired by reducing the distance between pose-variant features. This matrix is created while discriminative information across persons is retained. Singh, Zaveri, and Raghuwanshi [24], in turn, proposed a rough membership classifier (RMF) for the classification of pose images. Feature extraction was performed using log-Gabor filters, and SVD is used for the reduction of redundant features. A KNN classifier is finally applied to the reduced Gabor features. The ORL, Georgian Face, CMU PIE, and Head Pose Image databases were used with performance metrics similar to those of Duan and Tan [23]. The study concluded that the proposed method is best suited for mug shots in law enforcement. Moreover, it improves the recognition of face images with occlusion, and the method is augmented using modeling techniques to gain improved results. However, the use of three methods for testing reduces the optimality of the proposed methods for substantial datasets with varying images. Zhao, Li, and Liu [25] proposed MSA + PCA for pose-invariant face recognition (FR). First, features are extracted using the affine-invariant multiscale autoconvolution (MSA) transformation. Next, the decorrelation of these features and the reduction of the MSA dimensions are performed using principal component analysis. Finally, the principal components with the highest eigenvalues are classified using KNN. The experimentation points out how computationally expensive the proposed method is during the MSA feature extraction phase.
Abdalhamid and Jeberson [26] presented a pose-invariant FR system via an artificial bee colony optimized K-nearest neighbor classifier (ABC-KNN). The method used video as input for conversion into frames. During the preprocessing of the converted images, the adaptive Lee filter (ALF) was applied for image enhancement by removing noise. The Viola-Jones (VJ) algorithm is then used to segment the face into eye, nose, and mouth regions. Complete LBP (CLBP), center-symmetric local binary pattern (CS-LBP), Gabor feature (GF), and patterns of oriented edge magnitudes (POEM) descriptors are used to extract features from the segmented image. ABC-KNN is applied to classify the image. Recognition accuracy was the performance evaluation metric. Subsequently, F. Zhang, Yu, Mao, Gou, and Zhan [27] proposed an approach for the PIFER framework based on feature learning using deep learning. The PCA-Net used unlabeled frontal images during feature learning. These features are then used by a CNN to learn a feature mapping across the space separating nonfrontal and frontal faces. The novel description generated by the maps is then used to describe nonfrontal faces, yielding a uniform representation for arbitrary faces. The multiview robust features are then used to train a single classifier for varying poses. The BU-3DFE and Static FEW datasets were used during the experimentation stage, with the recognition rate as the performance evaluation metric. Compared with other techniques and frameworks, the proposed process appears to outperform SOTA techniques. Additionally, once trained, this method can be used for pose-robust feature extraction instead of training a separate model for each pose variation.
Applied Computational Intelligence and Soft Computing
Finally, the PIFR method of Sang, Li, and Zhao [28] fuses texture and depth into a framework using joint Bayesian classifiers. The output is then identified using a similarity estimator between the input and the face database. However, recognizing face images in large face databases incurs a high computational cost. Furthermore, although experimentation was extensive for various poses, the method was not compared with multiple other methods.

Research Methodology
The research design for this study includes image preprocessing, feature extraction with PCA, the optimization of these features using PSO, ABC, and GA, and finally classification using KNN, SVM, and EUD. The datasets for the study are YaleB and AT&T, the latter popularly known as ORL. These datasets were selected because they present well-defined challenges necessary for validating a facial recognition algorithm. Subsequent sections explain in detail the major parts of the study design.

Feature Extraction.
This component of the design acquires relevant biometric descriptors from a given image. In the process, a high volume of data is obtained, making it necessary to select only the highly contributing descriptors. Several techniques exist for this task; however, PCA is adopted for this study due to its popularity and efficiency in this domain [29].

Principal Component Analysis.
The primary goal of principal component analysis for facial recognition is the transformation of higher dimensional data into a lower dimensional feature subspace known as the eigenface space. This eigenspace is derived from the covariance matrix of the feature landmarks. Despite its usefulness, PCA is computationally expensive for higher dimensional data. This necessitates the adoption of an alternate, relatively inexpensive algorithm with properties and structure similar to PCA [30], known as singular value decomposition (SVD). Taking a matrix $X$ with dimension $n \times m$, PCA can be defined as the eigendecomposition of the covariance matrix $X^{T}X$. This yields eigenvalues $\lambda$ with their corresponding eigenvectors $W$. These eigenvectors are used as the transformation operator on $X$ to obtain a new matrix $T$ with the same dimension as $X$, as shown in
$$T = XW. \tag{1}$$
Equation (1) assumes that all components (i.e., columns) in $W$ are principal. However, in practice, some of these components are expected to be redundant; hence, the columns of $W$ are ordered by decreasing $\lambda$. With the ordered $W$, truncation can be performed so that only the first $r$ components are used for analysis. By implication, $W_r$ is an $m \times r$ matrix, giving the new transformed matrix $T_r$ shown in
$$T_r = XW_r. \tag{2}$$
As stated earlier, the operations of PCA are expensive, and SVD, with properties mathematically identical to PCA, is preferred for implementation. Equation (3) shows the SVD of $X$:
$$X = U\Sigma V^{*}, \tag{3}$$
where $U$ contains the left singular vectors, $V^{*}$ is the conjugate transpose of the matrix of right singular vectors, and $\Sigma$ contains the singular values on its diagonal. Computing the eigenvalue decomposition of $X^{T}X$ with equation (3), for real $X$ (so that $V^{*} = V^{T}$), gives
$$X^{T}X = V\Sigma^{T}U^{T}U\Sigma V^{T} = V\Sigma^{T}\Sigma V^{T},$$
from which it becomes obvious that $W$ is identical to $V$, while the squares of the ordered singular values ($\sigma_1 \geq \sigma_2 \geq \sigma_3 \geq \ldots$) are the eigenvalues $\lambda$. Again, with the property that $U$ and $V$ are unitary matrices, we have
$$U^{T}U = I, \qquad V^{T}V = I,$$
where $I$ is the identity matrix. From equations (1) and (3), and noting that $W$ is identical to $V$, we have
$$T = XW = U\Sigma V^{T}V,$$
$$T = U\Sigma. \tag{7}$$
These equations further justify why SVD is computationally inexpensive compared to PCA, which computes the covariance $X^{T}X$. Taking the principal components of equation (7), we have
$$T_r = U_r\Sigma_r,$$
where $U_r$ holds the first $r$ columns of $U$ and $\Sigma_r$ is the leading $r \times r$ block of $\Sigma$. Finally, since the requirement is $W$ and not the eigendecomposition of $X^{T}X$, SVD can be used to efficiently compute $W$.
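The link between the eigendecomposition of X^T X and the projection onto the leading component can be illustrated with a minimal Python sketch. This is not the Matlab implementation used in the study: it assumes a tiny hypothetical data matrix whose points lie on the line y = 2x, and it swaps in power iteration (in place of a full SVD routine) to recover only the first component. The helper names `center`, `gram`, `leading_component`, and `project` are illustrative.

```python
import math


def center(rows):
    """Subtract the column means so that every column has zero mean."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def gram(X):
    """Return X^T X (m x m) for an n x m matrix X stored as a list of rows."""
    m = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(m)] for a in range(m)]


def leading_component(X, iters=200):
    """Power iteration on X^T X: returns (eigenvalue, unit eigenvector)."""
    C = gram(X)
    m = len(C)
    w = [1.0 / math.sqrt(m)] * m              # arbitrary non-zero start vector
    for _ in range(iters):
        z = [sum(C[a][b] * w[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(v * v for v in z))
        w = [v / norm for v in z]
    # Rayleigh quotient w^T C w gives the eigenvalue for the unit vector w.
    lam = sum(w[a] * sum(C[a][b] * w[b] for b in range(m)) for a in range(m))
    return lam, w


def project(X, w):
    """Scores t = X w, i.e. the single column of T_r when r = 1."""
    return [sum(x * c for x, c in zip(row, w)) for row in X]


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]   # hypothetical points on y = 2x
Xc = center(data)
lam, w1 = leading_component(Xc)
scores = project(Xc, w1)
sigma1 = math.sqrt(sum(s * s for s in scores))   # norm of X w1, the leading singular value
```

For this toy matrix, the leading eigenvalue of X^T X equals the squared norm of the scores, which is the relation between the eigenvalues and the singular values used in the derivation. A production implementation would call a library SVD routine rather than power iteration.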

Feature Optimization.
This section of the study describes the swarm intelligence algorithms used for feature optimization. Among these methods are the artificial bee colony, the genetic algorithm, and particle swarm optimization.

Artificial Bee Colony.
The artificial bee colony (ABC) is a swarm-based algorithm modeled on the foraging behavior of honeybees. The four components of the behavioral model of ABC are the food source, scout bees, onlooker bees, and employed bees. A food source denotes a possible solution to the clustering problem, and the scout bees carry out a global search. This search is performed stochastically, while the onlooker and employed bees search for adjacent solutions. The employed bees subsequently evaluate the quality of a solution against the solutions previously stored in memory. This information is then passed on to the onlooker bees in the dance area. This ensures that the best food source is chosen, and food sources that stagnate within a preset number of cycles are abandoned and replaced with new sources. This process is repeated until it converges to the optimal solution. Mathematically, we have the following steps.
Step one: randomly initialize the solutions
$$x_i, \quad i = 1, 2, \ldots, FS,$$
where $i$ indexes each food source and $FS$ represents the total number of food sources. Furthermore, initialize the onlooker and employed bees using a random function generator in
$$x_{ij} = x_{\min j} + \mathrm{rand}(0, 1)\,(x_{\max j} - x_{\min j}),$$
where $x_i$ is a vector of length $D$, with $x_{\max j}$ and $x_{\min j}$ denoting the maximum and minimum values of the $j$th dimension.
Step two: each employed bee iteratively finds new solutions using
$$v_{ij} = x_{ij} + \varphi_{ij}\,(x_{ij} - x_{kj}),$$
where $k \in \{1, 2, \ldots, FS\}$ with $k \neq i$, $j \in \{1, 2, \ldots, D\}$, and $\varphi_{ij} \in (-1, 1)$. The fitness value of each candidate source is inversely proportional to the sum of the Euclidean distances between the sample points and their cluster midpoints. In the selection of the sources, a greedy algorithm is employed by comparing the fitness values of the old and new positions.
Step three: the probability $p_i$ of the solution $x_i$ is computed using
$$p_i = \frac{fit_i}{\sum_{n=1}^{FS} fit_n},$$
where $fit_i$ is the fitness value of $x_i$. Onlooker bees use this probability to select new $x_i$ values by searching for local optima while following step two to calculate the fitness value.
Step four: if the onlooker and employed bees are unable to identify a new and better candidate solution through the local search after a predefined number of iterations, the solution $x_i$ is discarded and substituted with a scout bee's new solution. The scout bees then use random global selection to search for new solutions.
Step five: steps two to four are repeated until the defined stopping criterion is met, returning the optimal output.

Genetic Algorithm.
The genetic algorithm (GA), on the other hand, is based on genetics and the theory of natural selection. It is a stochastic algorithm that searches for the best solution by seeking the global optimum in a large search space. A nonnegative fitness value is obtained using the fitness function. This value summarizes how close a candidate solution is to the global best (Mahmud, Haque, Zuhori, and Pal, 2014). A GA begins by generating a random population of n candidate solutions (called chromosomes). Each chromosome has its fitness value computed, and the stopping criterion is checked. The GA operators, namely selection, crossover, and mutation, which drive the chromosomes toward convergence, are explained further.
Selection. This operator creates offspring from an existing population by using a process comparable to natural selection in biological lifeforms. Selection emphasizes the better-performing individuals in the population, in the expectation that their offspring will carry their genetic information on to successive generations. Consequently, convergence is greatly affected by the strength of the selection process. Hence, the selection criteria should prevent premature convergence by maintaining population diversity and balance with the crossover and mutation operations.
Crossover. The crossover operator mixes information between two parents in a manner resembling sexual reproduction. The objective of the crossover procedure is to give "birth" to an improved offspring. This is achieved by exploring different portions of the search space.
Mutation. The mutation procedure changes the value of a randomly selected bit within each string, thereby preventing the GA from becoming stuck at a local minimum by scattering genetic data and hence maintaining variation in the population. This process is repeated until the optimal solution is achieved or the predetermined number of generations elapses.
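The interplay of the three operators can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the configuration used in the study: the fitness function (number of 1-bits), tournament selection, single-point crossover, per-bit mutation rate, and elitism are all choices made here for the example, and every function name is hypothetical.

```python
import random


def fitness(bits):
    """Toy non-negative fitness: the number of 1-bits (an assumption)."""
    return sum(bits)


def tournament(pop, rng, k=3):
    """Selection: return the fittest of k randomly drawn chromosomes."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover mixing two parents into one child."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        elite = max(pop, key=fitness)          # elitism (assumption): keep the best unchanged
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        history.append(max(map(fitness, pop)))
        if history[-1] == n_bits:              # stopping criterion: optimum reached
            break
    return max(pop, key=fitness), history


best, history = run_ga()
```

Because the best chromosome is copied unchanged into each new generation, the best fitness recorded in `history` never decreases; without elitism, mutation could occasionally destroy the current best solution.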

Particle Swarm Optimization.
Particle swarm optimization (PSO) is also a biologically inspired optimization algorithm. It was derived by observing the collective behavior of flocks of birds and schools of fish [30]. The algorithm operates on candidate solutions known as particles, each having a series of parameters that represent a coordinate in a multidimensional space. A collection of these particles forms a population (swarm), with the particles probing the search space to find the optimal solution. Each particle keeps its best solution so far in memory as its personal best, and the best solution found by the whole swarm is labeled the global best. The position of the $i$th particle is then defined in the $D$-dimensional space as
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{iD}),$$
and the population of the swarm of $N$ particles as
$$X = \{x_1, x_2, \ldots, x_N\}.$$
The particles then iteratively update their respective positions in the parameter space when searching for the optimal solution using
$$x_i(t + 1) = x_i(t) + v_i(t + 1), \tag{14}$$
where $v_i$ is the velocity of the $i$th particle along the $D$ dimensions, with $t$ and $t + 1$ indicating two consecutive iterations of the process. The velocity of the $i$th particle is defined in equation (15) with three terms: the first is inertia, which prevents the particle from drastically changing direction; the second describes the tendency of the particle to return to its own previously known best position; and the last describes the tendency of the particle to move (swarm) closer to the best position found by the swarm:
$$v_i(t + 1) = v_i(t) + c_1 R_1\,(p_i - x_i(t)) + c_2 R_2\,(g - x_i(t)), \tag{15}$$
where $p_i$ is the personal best of the particle, $g$ is the global best, and $c_1$ and $c_2$, in the range $0 \leq c_1, c_2 \leq 4$, are the cognitive and social coefficients, respectively. Finally, $R_1$ and $R_2$ are two diagonal matrices of random numbers generated from a uniform distribution in $[0, 1]$. This ensures that the social and cognitive components have a random effect on the velocity update in equation (15). Because the two acceleration terms are stochastically weighted, the trajectories of the particles are semirandom, arising from the attraction toward the personal and global best solutions.
This requires that equations (14) and (15) are iterated until a stopping criterion is met. Algorithmically, we have the following pseudocode.

PSO Algorithm.
(1) Initialize the N particles
    (a) Initialize the position x_i(0) for all i in 1, ..., N
    (b) Initialize each particle's best position to its initial position: p_i(0) = x_i(0)
    (c) Calculate the fitness of each particle, and if f(x_j(0)) ≥ f(x_i(0)) for all i ≠ j, initialize the global best as g = x_j(0)
(2) Repeat until the stopping criterion is met
    (a) Update the particle velocity in accordance with equation (15)
    (b) Update the particle position using equation (14)
    (c) Evaluate the fitness of the particle, f(x_i(t + 1))
    (d) If f(x_i(t + 1)) ≥ f(p_i), update the personal best: p_i = x_i(t + 1)
    (e) If f(x_i(t + 1)) ≥ f(g), update the global best: g = x_i(t + 1)
(3) Take g as the best solution at the end of the iterative process.
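The pseudocode above can be turned into a short runnable Python sketch. It is offered as an illustration, not as the Matlab code used in the study. It maximizes the fitness, as the comparisons in the pseudocode do, and it follows the velocity update of equation (15) without an inertia weight. Several details are assumptions made here: the toy fitness function `neg_sphere`, the search bounds, the fixed iteration budget as the stopping criterion, and the velocity clamp `v_max`, which is added only to keep this basic update from diverging.

```python
import random


def pso(fitness, dim, n_particles=20, iters=100, c1=2.0, c2=2.0,
        bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    """Maximise `fitness` using the updates of equations (14) and (15)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: initialise positions, velocities, personal bests, and global best.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]
    p_fit = [fitness(xi) for xi in x]
    best = max(range(n_particles), key=lambda i: p_fit[i])
    g, g_fit = p[best][:], p_fit[best]
    history = [g_fit]
    # Step 2: iterate until the stopping criterion (a fixed budget here).
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # diagonal entries of R1, R2
                v[i][d] += (c1 * r1 * (p[i][d] - x[i][d])
                            + c2 * r2 * (g[d] - x[i][d]))     # equation (15)
                v[i][d] = max(-v_max, min(v_max, v[i][d]))    # clamp (assumption)
                x[i][d] += v[i][d]                            # equation (14)
            f = fitness(x[i])
            if f >= p_fit[i]:                                 # personal best
                p[i], p_fit[i] = x[i][:], f
            if f >= g_fit:                                    # global best
                g, g_fit = x[i][:], f
        history.append(g_fit)
    # Step 3: g holds the best solution found.
    return g, g_fit, history


def neg_sphere(x):
    """Toy fitness whose maximum, 0, lies at the origin (an assumption)."""
    return -sum(c * c for c in x)


g, g_fit, history = pso(neg_sphere, dim=3)
```

Since the global best is replaced only when a particle does at least as well, the values in `history` are non-decreasing. In a feature optimization setting, `fitness` would instead score a candidate feature weighting or subset, for example by classification accuracy on held-out data.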

Classification.
After the optimization of the extracted feature vectors, classification models are built to address the face recognition challenges. There are myriad predefined models for this task given the feature set. Among these are SVM, KNN, K-means, Euclidean distance, VGGNet, and CNN. Pretrained face classifiers such as VGG-Face also exist, which estimate the similarity between the face image of a subject and relevant features selected from the face images in the database. In this study, the Euclidean distance (EUD), K-nearest neighbor (KNN), and the support vector machine (SVM) were used.
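Two of the three classifiers can be illustrated with a small Python sketch. It assumes that EUD matching means assigning the label of the single closest training vector, which the text does not state explicitly, and it uses hypothetical two-dimensional feature vectors with labels "A" and "B"; the function names are illustrative. The SVM is omitted because it requires a numerical optimization library.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(train, labels, query):
    """EUD matching (assumed form): label of the closest training vector."""
    best = min(range(len(train)), key=lambda i: euclidean(train[i], query))
    return labels[best]


def knn_label(train, labels, query, k=3):
    """KNN: majority vote among the k closest training vectors."""
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for two subjects.
train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels = ["A", "A", "A", "B", "B", "B"]

label_eud = nearest_label(train, labels, [0.3, 0.3])
label_knn = knn_label(train, labels, [4.8, 5.0], k=3)
```

With k = 1 the KNN rule reduces to the nearest-match rule, so the two classifiers differ only when more than one neighbor is allowed to vote.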

Implementation Pipeline.
The implementation pipeline for this study is shown in Figure 1. As the figure shows, every image undergoes a series of preprocessing steps, followed by feature selection and finally feature optimization. These optimized features are then used to train the classifiers for feature matching.

Environmental Setup.
The face recognition system implemented in this study was developed, trained, and tested using Matlab R2018b on an HP desktop with an Intel® Core™ i7-770T CPU @ 2.90 GHz running the Linux Ubuntu 20.04 LTS operating system.

Image Preprocessing.
The first step taken in image analysis is the preprocessing of the image to remove undesirable noise. Noise components are detrimental to the examination of the image and are thus removed via preprocessing. All images with dimensions larger than 96-by-84 pixels are downsampled. This is followed by the conversion of all colored images to grayscale. The resulting images are separated into training and test sets: eighty percent of the images form the training set and 20 percent the test set. This preprocessing is implemented to reduce complexity and improve computational time.
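The grayscale conversion and the 80/20 split can be sketched in Python as follows. The sketch makes several assumptions not stated in the text: the luma weights used for the grayscale conversion, a random shuffle before splitting (the study may instead split per subject), and the tiny hypothetical 2-by-2 image. Downsampling is left out because the resampling method is not specified.

```python
import random


def to_gray(pixel):
    """Luma-weighted gray level of an (R, G, B) pixel; the weights are an assumption."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in an 80/20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical 2 x 2 color image: white, black, red, blue.
image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = [[to_gray(px) for px in row] for row in image]

# Ten hypothetical image indices split into training and test sets.
train, test = split_80_20(range(10))
```

Fixing the seed makes the split reproducible, which matters when accuracies from different optimizers are compared on the same partition.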

Feature Extraction.
This section further elaborates on the feature extraction approach used in this study. Among the objectives of this study is the implementation of an offline facial recognition system with an improved and robust feature extraction method using optimization techniques. This method is tested using the AT&T and YaleB face datasets, as they contain faces with varying illumination, different poses, occlusion, dissimilar expressions, or a combination of these. The mean of the features is computed, and the feature of the first principal component of each image is selected. The mean faces for the AT&T and YaleB datasets are shown in Figures 2 and 3, respectively.

Dimensionality Reduction and Feature Selection.
Given the computed mean face of the training data, the binary singleton expansion function (bsxfun in MATLAB) is applied as an element-wise operator. The resultant image is decomposed with the singular value decomposition function to reduce the number of coefficients used to characterize the image. The cumulative sum of the squares of the diagonal matrix entries is computed, and the principal components corresponding to the first k eigenvalues are selected. The eigenvectors are then normalized into eigenfaces. Sample outputs of this process on the AT&T and YaleB datasets are shown in Figures 4 and 5, respectively.
Once more, the binary singleton expansion function is used to transform the test data by using the mean face. These transformed train and test data are then optimized for better classification results.
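The centering and component-selection steps can be illustrated with a short sketch in plain Python. It assumes that the element-wise operation is subtraction of the mean face and that k is the smallest number of components whose squared singular values reach a chosen share of the total; both the subtraction and the threshold value are assumptions, as the text does not state them. The singular values are taken as given, since computing the decomposition itself would need a numerical library.

```python
def mean_face(images):
    """Column-wise mean of a list of flattened images (equal-length lists)."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]

def center(images, mean):
    """Subtract the mean face from every flattened image."""
    return [[p - m for p, m in zip(image, mean)] for image in images]

def choose_k(singular_values, energy=0.95):
    """Smallest k whose leading squared singular values reach the energy share."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    running = 0.0
    for k, square in enumerate(squares, start=1):
        running += square
        if running / total >= energy:
            return k
    return len(squares)

# Toy data: three flattened "images" of three pixels each.
train = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
mu = mean_face(train)
centered_train = center(train, mu)
# The test data are centered with the training mean, not their own mean.
centered_test = center([[4.0, 2.0, 0.0]], mu)
k = choose_k([4.0, 2.0, 1.0], energy=0.9)
```

Here the squared singular values are 16, 4, and 1, so the first two components carry 20/21 of the total and k is 2.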

Results and Discussion
This section describes in detail the results of the experiment and their analysis. Moreover, comparisons with other optimization methods using the same databases and three different classifiers are discussed.

Numerical Results.
Generally, in recording the performance of a facial recognition model, statistical metrics such as accuracy, recall, precision, and F-measure, among others, are used. For an efficient evaluation and a valid comparison with existing studies, the accuracy metric is selected. The recognition accuracy is computed for all the classification methods as applied to the different datasets with varying optimization methods. Tables 1-7 show the average, maximum, and minimum recognition accuracies for the datasets with different classification methods. The experiment was conducted with 1500 iterations, both with and without optimization of the extracted features.
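As an illustration of how the tabulated figures can be obtained, the sketch below computes the recognition accuracy of one run and then the average, maximum, and minimum over repeated runs. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from statistics import mean

def accuracy(predicted, actual):
    """Percentage of test samples whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def summarize(run_accuracies):
    """Average, maximum, and minimum accuracy over repeated runs."""
    return {
        "average": mean(run_accuracies),
        "maximum": max(run_accuracies),
        "minimum": min(run_accuracies),
    }

# Three toy runs against the same true labels.
truth = [1, 2, 3, 4]
runs = [
    accuracy([1, 2, 3, 4], truth),  # all four correct
    accuracy([1, 2, 3, 3], truth),  # three correct
    accuracy([1, 1, 3, 3], truth),  # two correct
]
summary = summarize(runs)
```

In the study the same summary would be taken over the 1500 iterations of each optimizer and classifier pair.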

Discussion.
From the results shown in Section 4.1, it is observed that the model's performance on the AT&T dataset is fairly low in general. This could be attributed to the occlusion, varying pose, and expression exhibited in the face images, which make them naturally difficult to model. In contrast, the model's performance on the YaleB dataset was relatively good, as it contains only images with varying illumination. From Table 1, it can be seen that the accuracy is highest for KNN and SVM at 100% each for the YaleB dataset. Nevertheless, optimizing the features with GA significantly decreased the accuracy of the KNN classifier to 70%, with a 9.8% reduction for the Euclidean distance method, as shown in Table 3. There is no loss for KNN and SVM when ABC and PSO optimization is performed, as shown in Tables 2 and 4. However, a 4% and 5% reduction for PSO and ABC, respectively, was noted [...] Table 1.
However, using the PSO optimization technique improved the average recognition accuracy to 80.46% for KNN. There is a significant degradation when using the SVM classifier, with an average recognition accuracy of 59.85%. Again, EUD saw 28.46% average recognition accuracy for PSO, as shown in Table 5. Conversely, the average recognition accuracy for GA and ABC reduced to 47.05% and 66.78%, respectively, when using KNN and to 54.86% and 35.55% when using SVM, as can be observed in Tables 6 and 7, respectively. The order of experimentation is given as follows. [...]
This demonstrates that over 99% of the results for PCA + PSO + SVM have recognition accuracy above 90%, with 95% of the results at 100%. Therefore, the 5% reduction relative to PCA + SVM can be considered insignificant. As a final point, PSO optimizes well for EUD and SVM. Similarly, PCA + GA + EUD has over 71% of its results above that of PCA + EUD. KNN and SVM, however, have 60% and 100% of the results greater than that of PCA + KNN and PCA + SVM, respectively. Yet still, PCA + GA + KNN shows significant decay of results from its default 100%.
SVM, on the other hand, displays a negligible reduction in average recognition accuracy. With this, SVM seems to produce better results than both KNN and EUD with respect to the use of the GA optimization algorithm.
In like manner, for PCA + ABC + EUD on the YaleB dataset, 28% of the results are above the default 79.83% of PCA + EUD. However, the ABC-optimized recognition for KNN and SVM revealed no significant loss, with average recognition accuracies of 100% and 99.6% for KNN and SVM, respectively. It can be established that results optimized by ABC and classified using KNN are appropriate for the YaleB dataset, and that ABC optimizes well for KNN and SVM on this dataset. Table 8 shows the first 20 results of the total experiments for the PSO, ABC, and GA optimization algorithms implemented on the YaleB dataset using the EUD, KNN, and SVM classifiers.
Nevertheless, the substantial reduction of the results observed when the AT&T dataset is used stems from the increase in parameters for recognition: the AT&T dataset contains occluded images, and it also has varying poses and expressions. PCA + PSO + EUD on the AT&T face dataset produced results that are on average below those of PCA + EUD. Of the 1500 results obtained, 51% were lower than the default 29.94% for PCA + EUD. The overall average recognition of PCA + PSO + EUD for the AT&T database, however, was 28.46%, as shown in Table 5. It is perceived that the deterioration of the average recognition is partly offset by the larger values of the other results. The 48% of the results above the default 29.94% is not insignificant, yet it is a small percentage for consideration. KNN, on the other hand, has 31% of the results above the 77.84% default recognition. Still, the average recognition accuracy achieved was about 3% higher than the default. Thus, the 80.46% average recognition accuracy for KNN with PSO-optimized features (PCA + PSO + KNN) makes it the best combination for the AT&T database, since none of the results for SVM was above its 82.05% baseline recognition.
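Statements of the form "x% of the results are above the default" can be reproduced from a list of per-run accuracies, as in the following sketch. Only the 29.94% baseline comes from the text; the run values are made up for illustration, and the helper name is hypothetical.

```python
from statistics import mean

def share_above(results, baseline):
    """Percentage of runs whose accuracy is strictly above the baseline."""
    return 100.0 * sum(r > baseline for r in results) / len(results)

baseline = 29.94                 # PCA + EUD default on AT&T, from the text
runs = [31.0, 30.5, 26.0, 25.0]  # made-up accuracies for illustration
above = share_above(runs, baseline)
average = mean(runs)
```

In this toy case half of the runs beat the baseline, yet the average (28.125) still falls below it, which mirrors the pattern reported for PCA + PSO + EUD.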
Again, PCA + GA + EUD yields 28.78% average recognition accuracy. This is similar to the average results obtained by all three optimization methods using EUD as the classifier. However, GA and ABC achieved 47.05% and 66.78% average recognition accuracy for the KNN classifier, respectively.
This illustrates a decline from the 77.84% baseline to 47.05% for GA and 66.78% for ABC. GA suffers a degradation of about 30%, while ABC saw an 11% reduction in average recognition accuracy. Moreover, the average recognition accuracy of both GA and ABC with the SVM classifier fell well below that with KNN. With SVM, GA shows a 27% reduction in average recognition accuracy, while ABC shows a larger reduction of 46.5%.
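The reductions quoted above can be checked against the averages reported in the text. The short computation below uses only those figures (the KNN and SVM baselines of 77.84% and 82.05% and the GA and ABC averages) and reports each drop both in percentage points and relative to the baseline; the helper names are hypothetical.

```python
baseline = {"KNN": 77.84, "SVM": 82.05}
optimized = {
    ("GA", "KNN"): 47.05,
    ("ABC", "KNN"): 66.78,
    ("GA", "SVM"): 54.86,
    ("ABC", "SVM"): 35.55,
}

def point_drop(before, after):
    """Reduction in percentage points."""
    return before - after

def relative_drop(before, after):
    """Reduction as a percentage of the baseline value."""
    return 100.0 * (before - after) / before

report = {
    key: (round(point_drop(baseline[key[1]], value), 2),
          round(relative_drop(baseline[key[1]], value), 2))
    for key, value in optimized.items()
}

for (optimizer, classifier), (points, relative) in report.items():
    print(f"{optimizer} + {classifier}: -{points} points ({relative}% of baseline)")
```

The point drops come out as 30.79, 11.06, 27.19, and 46.5, so the reductions quoted in the text appear to be differences in percentage points rather than relative changes.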
Thus, it can be concluded that GA and ABC with SVM as the classifier are not suitable for this approach. The first 20 results are shown in Table 9.
Again, the linear kernel was used for the SVM classifier when the experiment was performed. This kernel tends to reduce computational time compared to other SVM kernels, and it is suitable for high-dimensional data [31]. However, the linear kernel in this experiment appears to have sacrificed accuracy for computational time; thus, the kernel chosen does not produce good results. Other kernels such as the polynomial, Gaussian, radial basis function (RBF), or ANOVA kernels could be used for SVM in future research and the results compared with the proposed method. Similarly, Table 1 indicates that SVM is a better classifier when the linear kernel is used and when no optimization algorithms are utilized. Thus, both the AT&T and YaleB datasets produce the best results for SVM. Now, comparing Tables 2-4, it is perceived that a perfect maximum recognition accuracy of 100% is achieved for all metaheuristic algorithms and classifiers. This indicates that all optimization methods can be used for the YaleB database regardless of the classifier. Conversely, the maximum recognition accuracy for the algorithms used for augmentation gave the impression that the KNN classifier was better. This means that PSO + KNN, ABC + KNN, and GA + KNN have better recognition accuracy than their SVM counterparts.
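For reference, the kernels mentioned above differ only in how they score the similarity of two feature vectors. The sketch below gives the standard textbook forms of the linear, polynomial, and RBF kernels in plain Python. The default values of degree, coef0, and gamma are assumptions chosen for illustration and are not settings from this study.

```python
import math

def linear_kernel(x, y):
    """Inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); gamma is an illustrative default."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

u, v = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(u, v)
k_poly = polynomial_kernel(u, v)
k_rbf_same = rbf_kernel(u, u)
```

The linear kernel needs a single inner product per pair, which is consistent with its lower computational cost, whereas the nonlinear kernels add parameters that would have to be tuned.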
This shows that the optimization algorithms have degraded the results produced by the SVM classifier. Nonetheless, GA's maximum recognition was better than that of the default SVM (PCA + SVM). Therefore, GA should be preferred when an SVM classifier with a linear kernel is chosen. Furthermore, the algorithms improved the highest recognition accuracy achieved by the EUD classifier only. With this, PSO is selected as the ideal optimization algorithm for the YaleB and AT&T datasets. Comparing the proposed method with other approaches, Table 10 shows that it is more effective than other state-of-the-art (SOTA) methods. Finally, Table 11 presents the proposed optimization method and classifier for each dataset.
Finally, Table 12 shows the time taken for each experiment carried out. PSO has the lowest average time, with 1.594 s, 1.592 s, and 55.46 s for EUD, KNN, and SVM, respectively. PSO + SVM saw the highest computational cost, at 55.46 s. However, less than 2 seconds was required for the PSO + EUD and PSO + KNN trials. The ABC and GA metaheuristic algorithms produced results similar to PSO, but PSO is computationally less expensive than both.
Table 10 (rows recovered from the extracted text; the reference for the first row is missing).
Ref.   Method                                                      Accuracy (%)  Dataset
       Local nonlinear multilayer contrast patterns (LNLMCP)       97.50         YaleB
[35]   Discriminative sparse representation via l2 regularization  82.61         YaleB
[32]   GLRAM                                                       97.25         AT&T
[33]   Fisher discriminative dictionary learning (FDDL)            96.7          AT&T
[31]   PSO-KNN                                                     98.75         AT&T
[31]   PCA-LDA fusion algorithm                                    98.00         AT&T
[35]   Discriminative sparse representation via l2 regularization  95.00         AT&T

Conclusion
This study looks at how to augment PCA features with a selected optimization method to improve the accuracy of face recognition models. The proposed implementation shows that the choice of PSO as an optimization method works well in the unconstrained environment of the real world, where pose, occlusion, and expression are among the dominant face recognition problems. The default recognition accuracy on the YaleB dataset was 100% for both the SVM and KNN classifiers. However, the ORL (AT&T) database did not attain perfect recognition due to the inherent nature of the dataset. Nonetheless, the use of optimization algorithms on the selected features saw an increase in recognition accuracy from 82.63% to a maximum of 100% for EUD. This indicates that all three evolutionary algorithms can be used to improve the accuracy of results. However, because the ORL database caters for three parameters, the maximum recognition did not reach 100% but a promising 99.25%, obtained using the PSO algorithm and the KNN classifier. Lastly, the PCA + PSO + KNN approach is chosen for this study due to its ability to handle the increase in parameters, and it also outperforms other SOTA algorithms. These parametric increases move the recognition closer to real-world human face recognition. Moving forward, this study can be extended by looking at other recent swarm intelligence optimization models used in other fields that are computationally less expensive. Other private datasets with stricter challenges could also be used to further validate this model; this remains a limitation of this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.