Principal component analysis (PCA) has traditionally been used as a feature extraction technique in face recognition systems, yielding high accuracy while requiring only a small number of features. However, the covariance matrix and eigenvalue decomposition stages incur high computational complexity, especially for a large database. This research therefore presents an alternative approach that uses an Expectation-Maximization algorithm to reduce the determinant matrix manipulation and hence the complexity of these stages. To improve the computational time further, a novel parallel architecture exploits the parallelization of matrix computation during the feature extraction and classification stages, together with parallel preprocessing; the combination is called the Parallel Expectation-Maximization PCA architecture. Compared with a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems: the speed-up is over nine times relative to PCA and over three times relative to Parallel PCA.

Face recognition has recently attracted extensive attention in both research and commercial settings, especially as applications have been adopted in several areas, for example, human biometrics, pattern recognition, and computer vision, with practical uses such as access control, human identification, robotics, crowd surveillance, and criminal forensics [

Consider the face recognition stage. There are many approaches used to enhance a recognition precision, one of them is to compare the properly selected facial features’ images to their facial database [

Among those approaches, PCA and its derivatives are commonly used due to several distinctive features. For example, PCA is an optimal linear scheme in terms of mean squared error for compressing a set of high dimensional vectors into a set of lower dimensional vectors. In addition, the model parameters used in PCA can be directly computed from the data without additional processing steps. Moreover, given the model parameters, compression and decompression are computationally simple; in other words, PCA only requires matrix multiplication. Most importantly, PCA requires fewer features without reducing precision, and these advantages result in high recognition accuracy even with a small data set [

With PCA derivations, for years, there are many attempts to overcome the drawbacks of a traditional PCA to enhance the performance of PCA recognition scheme, for example, a symmetrical PCA and two-dimensional PCA [

For a computationally heavy task, one plausible way to relax the serial constraint is parallelism. Recently, the rapid growth of computer chips and advances in integrated circuit technology have resulted in affordable multicore CPUs and enhanced parallelization techniques [

This research paper is organized as follows. In Section

In general, a face recognition system consists of four components shown in Figure

Face recognition system.

The testing process is similar to the training process but with fewer steps. The testing image is processed to generate a properly normalized face image, which is passed to the face image classifier (classifier) to find the least feature matching distance between testing and trained features. Normally, there are several techniques for the feature extractor, that is, PCA, ICA, and LDA, as well as for the classifier (face image matching), for example, Euclidean distance (ED), support vector machine (SVM), and K-nearest neighbor [

Specifically consider PCA [

PCA Eigenspace Generation: there are five submodules in the PCA feature extraction computation, as follows: (1) estimating the mean vector of the training images; (2) centering the input data around the mean vector by taking the difference between each input image and the mean image; (3) calculating the covariance matrix and then applying SVD to it to obtain the eigenvectors and eigenvalues; (4) sorting the eigenvectors in descending order of eigenvalue and keeping those with nonzero eigenvalues; and finally (5) projecting the training images by calculating the dot product between each training image and the ordered eigenvectors.

PCA image identification: there are three submodules, as follows: (1) subtracting the mean vector from the testing image; (2) performing the Eigenspace projection by executing the dot-product computation; (3) comparing the projected testing image with the projected training images to retrieve the closest distance.
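The identification submodules above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the toy three-pixel "images", and the hand-picked eigenvectors are all assumptions made for the example, and a real system would obtain the eigenvectors from the Eigenspace generation stage.

```python
import math

def mean_vector(images):
    """Step (1) of training: element-wise mean of all training images."""
    count = len(images)
    return [sum(column) / count for column in zip(*images)]

def project(image, mean, eigenvectors):
    """Subtract the mean, then take one dot product per eigenvector."""
    centered = [pixel - m for pixel, m in zip(image, mean)]
    return [sum(c * e for c, e in zip(centered, vec)) for vec in eigenvectors]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def identify(test_image, mean, eigenvectors, trained_features):
    """Return the index of the closest training image and its distance."""
    features = project(test_image, mean, eigenvectors)
    distances = [euclidean(features, f) for f in trained_features]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

# Toy data (hypothetical): three flattened "images" of three pixels each.
training_images = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 4.0, 2.0]]
mean = mean_vector(training_images)
# Assumed eigenvectors, chosen by hand for illustration only.
eigenvectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
trained_features = [project(img, mean, eigenvectors) for img in training_images]

match, distance = identify([2.9, 2.1, 1.0], mean, eigenvectors, trained_features)
print(match, round(distance, 3))
```

The testing image here is a slightly perturbed copy of the second training image, so the smallest distance is expected at index 1.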

PCA Eigenspace Generation and PCA image identification.

As discussed previously, applying PCA for face recognition incurs several advantages; however, there are some limitations; for instance, PCA involves a complex mathematical procedure due to a transformation from a large number of correlated variables to a smaller number of uncorrelated ones. Thus, in particular, a high resolution image in addition to a large number of images produces high computational complexity, especially during the matrix manipulation, that is, multiplication, transpose, and division, among high dimensional vectors [

There are many face recognition proposals employing PCA [

Furthermore, in the same year, Yue [

Consider computational time complexity. In 2011, Chen et al. [

Specifically consider recognition stage complexity. Roweis [

Moreover, in 2010, Chan and Tsai [

In order to overcome the major limitations of single core processing, one of the promising approaches to speed up a computation is parallelism. Several parallel architectures including parallel algorithms and machines have been investigated. Most of parallel face recognition systems only applied computer hardware architectures; for example, each individual computer system is used to run each individual image or a subset of face images. Additionally, recently in 2013, Cavalcanti et al. [

Previously, in 2003, Jiang et al. [

To improve recognition accuracy, in 2005, Meng et al. [

Due to the advances of multicore-processors within a single computer system, Wang et al. [

Notice that all of the approaches discussed above achieve their parallelism only by running an individual face recognition algorithm on multiple virtual machines or multiple cores, with the number of CPU cores as the key limitation. In general, these approaches do not exploit the parallelism available within each face recognition stage, which would yield a higher degree of parallelism. This is the main focus of our research: a parallel architecture that exploits the parallelism of each face recognition stage.

Aside from the recognition stage, the other two stages, preprocessing and classification phases, are also important. For example, consider the first phase. Zhang [

Consider the classification stage. Many approaches are introduced, for example, ED, Manhattan distance, Mahalanobis distance, nearest neighbor, and SVM [

It should be noted that most of the face recognition systems have applied ED for face classification [

To sum up, PCA and its several derivations yield high face recognition precision; our proposal, however, incorporates an Expectation-Maximization (EM) algorithm into PCA to reduce the computational time complexity of the covariance calculation. To further speed up recognition, whereas many proposals focus on parallelism across whole recognition tasks, our proposal exploits parallelism within each stage by rearranging the matrix manipulation of our first enhancement, including the determinant and orthogonalization processes. Last but not least, the optimization of a parallel classification technique was also investigated. These three elements combine into a parallel architecture for face recognition called the Parallel Expectation-Maximization PCA architecture (PEM-PCA).

An overall architecture of Parallel Expectation-Maximization PCA (PEM-PCA) generally consists of three parts: parallel face preprocessing, parallel face feature extraction, and parallel face classification. To illustrate the detailed system, Figure

A Parallel EM-PCA Architecture (PEM-PCA).

In general, preprocessing is one of the major stages used to reduce computational complexity as well as to increase recognition precision through noise reduction. Although there are several preprocessing techniques, briefly stated in the related work section, only one necessary method is parallelized here to aid algorithm efficiency, that is, gray-scale conversion. In our previous experiment, the recognition precision of color and gray-scale images was not significantly different, whereas color images increased the computational time complexity [

Parallel preprocessing.
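As a rough sketch of how gray-scale conversion can be split across workers, the example below converts each image row independently. The luma weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 coefficients and are an assumption here, since the exact conversion formula is not given in this section; the function names are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed BT.601 luma weights for red, green, and blue.
LUMA = (0.299, 0.587, 0.114)

def gray_row(row):
    """Convert one row of (r, g, b) pixels to gray levels."""
    return [round(LUMA[0] * r + LUMA[1] * g + LUMA[2] * b) for r, g, b in row]

def to_gray_parallel(image, workers=4):
    """Rows do not depend on each other, so each row is one task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gray_row, image))

# 2 x 2 toy image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_gray_parallel(image)
print(gray)
```

In CPython, threads mainly illustrate the decomposition, because the interpreter lock limits the speed-up of pure-Python arithmetic; a process pool or a natively parallel runtime would be needed to observe real gains.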

As stated in the related work, several enhancements of PCA for reducing the number of dimensions have been introduced; however, some issues remain to be resolved, for example, outliers and noise, which lead to lower accuracy, and computational complexity. Thus, to lessen these limitations, our first proposal is to apply Expectation-Maximization (EM) to find the maximum likelihood estimate of the parameters derived from the covariance computational step [

To enhance face recognition rate, Figure

EM-Step derivation: given eigenvector matrix

Orthogonalization: at this stage, Gram-Schmidt Orthonormalization [

Data projection: at this stage, the input vector is projected using the transpose of the matrix produced by the M-step, which is multiplied by the result of the mean subtraction step.

PCA Eigenspace Generation: the final stage is performed as a final round using a traditional PCA: mean estimation, mean subtraction, covariance computation and SVD, eigenvalue selection, and trained image projection, respectively, as previously shown in Figure

EM-PCA face recognition stage.
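To make the E-step and M-step concrete, the sketch below runs the textbook EM iteration for PCA in the simplest case of a single principal component, where the matrix inverses reduce to scalar divisions. It assumes the standard updates (E-step: x = c·y / c·c for each sample y; M-step: c = Σ x_i y_i / Σ x_i²), which may differ in detail from the derivation used in this paper, and it treats `epsilon` as a convergence tolerance, which is also an assumption.

```python
import math

def em_first_component(samples, iterations=100, epsilon=1e-10):
    """Estimate the first principal direction of mean-centered samples.

    samples: list of equal-length lists, already mean-subtracted.
    """
    dims = len(samples[0])
    c = [1.0] * dims  # arbitrary non-zero starting guess
    for _ in range(iterations):
        # E-step: latent coordinate of every sample along the current guess.
        cc = sum(v * v for v in c)
        x = [sum(ci * yi for ci, yi in zip(c, y)) / cc for y in samples]
        # M-step: direction that best reconstructs the samples from x.
        xx = sum(v * v for v in x)
        new_c = [sum(xi * y[j] for xi, y in zip(x, samples)) / xx
                 for j in range(dims)]
        change = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if change < epsilon:
            break
    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c]

# Toy centered data lying close to the direction (1, 1).
samples = [[-2.0, -2.1], [-1.0, -0.9], [1.0, 1.1], [2.0, 1.9]]
component = em_first_component(samples)
print([round(v, 4) for v in component])
```

No covariance matrix is formed at any point: each iteration only needs dot products between the current guess and the samples, which is the property that makes the EM formulation attractive for large image sets. Extracting several components requires the general matrix form of the updates followed by an orthogonalization step.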

Consider algorithm complexity. As stated in Algorithm

To further enhance the speed-up, we propose PEM-PCA which introduces the parallelism in each computational stage. During EM-PCA face recognition, based on our observation, there are four different fundamental matrix manipulations: multiplication (matrix/constant), division (constant), transpose, and subtraction (matrix), each of which can utilize the parallelism depending on its distinctive characteristic. In addition to these four, three extra processes, determinant, cofactor, and orthogonalization, can also be performed in parallel in our architecture.
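As an illustration of how one of these manipulations can be decomposed, the sketch below computes a matrix product with one task per output row. It is a minimal example under stated assumptions: the function names are hypothetical, and Python threads stand in for whatever parallel runtime an actual implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_row(task):
    """Compute one output row: dot products of a row of A with columns of B."""
    row, b_columns = task
    return [sum(x * y for x, y in zip(row, col)) for col in b_columns]

def parallel_matmul(a, b, workers=4):
    """Each row of the result depends only on one row of A and all of B."""
    b_columns = list(zip(*b))  # transpose B once so columns are contiguous
    tasks = [(row, b_columns) for row in a]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(multiply_row, tasks))

a = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
b = [[7, 8, 9], [10, 11, 12]]       # 2 x 3
product = parallel_matmul(a, b)
print(product)
```

Subtraction, division by a constant, and transpose decompose in the same way, since every output element depends only on its own input elements.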

For example, for parallel matrix multiplication with

As discussed previously, our proposal combines these two techniques (Expectation-Maximization and parallelism) on top of a traditional PCA. Essentially, each distinct matrix manipulation is examined for its degree of parallelism, that is, subtraction, multiplication, division, transpose, determinant, cofactor, and orthogonalization, as stated in Algorithm

Consider the latter, multiplication by a constant. The procedure multiplies every element of the input matrix by the constant. The dimensions of the result matrix are equal to those of the input matrix. Each element value in the result matrix can be computed from (

Example:

Since the transformation from row to column is independent of each other, the parallelism can be utilized by using (

Second, since the matrix is symmetric, each element in the upper-triangular part is the same as the corresponding element in the lower-triangular part (see (

Optimized matrix multiplication with its transpose (upper/lower and diagonal). Consider

The calculation of each position can be computed at the same time, and so the parallelism can be utilized as shown in (
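The symmetry argument can be sketched as follows: when a matrix is multiplied by its own transpose, only the diagonal and upper-triangular positions need to be computed, and each of them is an independent task. The function names are hypothetical and the code is an illustration of the idea, not the authors' algorithm.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def times_own_transpose(a, workers=4):
    """Compute A * A^T using only the diagonal and upper-triangular positions."""
    n = len(a)
    positions = [(i, j) for i in range(n) for j in range(i, n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda p: dot(a[p[0]], a[p[1]]), positions))
    result = [[0] * n for _ in range(n)]
    for (i, j), value in zip(positions, values):
        result[i][j] = value
        result[j][i] = value  # mirror into the lower triangle
    return result

a = [[1, 2], [3, 4], [5, 6]]
symmetric = times_own_transpose(a)
print(symmetric)
```

For an n-row input this needs n(n+1)/2 dot products instead of n², which is 6 rather than 9 in the toy case above.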

Example:

It should be noted that the determinant is a function of a square matrix reduced into a single number. Finding the determinant of an

Given the

From (

Example:
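As an illustrative sketch (not the worked example from the paper), the determinant can be computed by cofactor expansion along the first row, with each top-level cofactor evaluated as an independent task. The function names are assumptions, and the expansion shown is the textbook Laplace formula.

```python
from concurrent.futures import ThreadPoolExecutor

def minor(matrix, row, col):
    """Remove one row and one column from a square matrix."""
    return [r[:col] + r[col + 1:] for i, r in enumerate(matrix) if i != row]

def determinant(matrix):
    """Serial cofactor expansion along the first row."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    if n == 2:
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return sum((-1) ** j * matrix[0][j] * determinant(minor(matrix, 0, j))
               for j in range(n))

def signed_term(task):
    """One term of the expansion: sign * element * determinant of its minor."""
    matrix, j = task
    return (-1) ** j * matrix[0][j] * determinant(minor(matrix, 0, j))

def parallel_determinant(matrix, workers=4):
    """Evaluate the first-row cofactors as independent tasks, then sum them."""
    if len(matrix) < 3:
        return determinant(matrix)
    tasks = [(matrix, j) for j in range(len(matrix))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(signed_term, tasks))

m = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
det = parallel_determinant(m)
print(det)
```

Cofactor expansion grows factorially with the matrix size, so it is practical only for small matrices; it is used here because its terms are easy to distribute.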

Example: weight calculation of the 5th vector.
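For the orthogonalization stage, a minimal sketch of classical Gram-Schmidt is given below. It is an assumption that this matches the variant used in the paper. In the classical form, the projection weights of a new vector onto the already accepted basis vectors are all computed from the original vector, so they are independent of one another and could be evaluated in parallel; the sketch computes them serially for brevity.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tolerance=1e-12):
    """Return an orthonormal basis spanning the input vectors."""
    basis = []
    for v in vectors:
        # Projection weights onto every accepted basis vector (independent).
        weights = [dot(v, q) for q in basis]
        residual = list(v)
        for weight, q in zip(weights, basis):
            residual = [r - weight * qi for r, qi in zip(residual, q)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tolerance:  # skip vectors that are linearly dependent
            basis.append([r / norm for r in residual])
    return basis

vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
basis = gram_schmidt(vectors)
for q in basis:
    print([round(x, 4) for x in q])
```

The modified Gram-Schmidt variant, which recomputes each weight from the running residual, is numerically more stable but gives up the independence of the weights.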

Algorithms

In our proposed parallel face recognition architecture, for testing purposes, the parallel classification is based on ED to find the closest face distance in parallel [

Parallel matrix operation computation for generalized Euclidean distance.
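A hedged sketch of ED-based classification in parallel follows: the distance from the testing feature vector to each trained feature vector is an independent task, and the smallest distance decides the label. The labels, feature values, and function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(test_features, trained_features, labels, workers=4):
    """Compute all distances concurrently, then pick the closest label."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        distances = list(pool.map(lambda f: euclidean(test_features, f),
                                  trained_features))
    best = min(range(len(distances)), key=distances.__getitem__)
    return labels[best], distances[best]

# Toy projected features for three enrolled faces (hypothetical values).
trained_features = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
labels = ["person_a", "person_b", "person_c"]
label, distance = classify([3.0, 3.0], trained_features, labels)
print(label, distance)
```

Only the final minimum search is serial, and it is linear in the number of trained images.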

To investigate the performance of our parallel architecture, this section comparatively evaluates the system in three main scenarios: to illustrate the optimal number of eigenvectors and epsilon values for the EM-based approaches, to state the recognition computational complexity and precision, and to evaluate the performance with respect to the degree of parallelization.

In general, our testbed is based on a standard configuration running the Windows 7 Ultimate operating system (64 bits): CPU Intel(R) Core(TM) i7-3770K, 8 cores, 3.50 GHz (8 MB L3 cache),

The performance evaluation process was implemented in .NET C# programming environment in order to emulate the real-world application and illustrate the computational time complexity. Public face database from FACE94 and FACE95 [

Three main scenarios are as follows: first, to illustrate the optimal number of eigenvectors and epsilon values for face recognition using EM-PCA and PEM-PCA in practice, the number of eigenvectors was varied by factor of 0.01, 0.03, 0.05, 0.07, and 0.09, respectively. In addition, various epsilon values,

Second, the optimal epsilon value and number of eigenvectors are selected based on the best performance in the first scenario. Then, for scalability in terms of the number of images, and to illustrate the performance of our PEM-PCA, the evaluation comparatively performs the recognition process over two matrices for four methods: a traditional PCA, our first enhancement (EM-PCA), Parallel PCA (P-PCA), and finally our PEM-PCA. Note that the selected proposal, P-PCA, is based on one of the promising previous works [

Finally, to evaluate the performance with respect to the degree of parallelization, especially of PEM-PCA alongside P-PCA and EM-PCA, the performance was compared over 1, 2, 4, and 8 cores, respectively. The computational time and accuracy were measured with 500 training images and

Consider the first scenario (EM-based approaches). Generally, Figure

Effect of epsilon values and Eigen numbers on computational time over EM-PCA and PEM-PCA.

Effect of epsilon values and Eigen numbers on recognition accuracy over EM-PCA and PEM-PCA.

Second, to explicitly show the performance improvement of PEM-PCA, Figure

Computation time over number of trained images (PCA, P-PCA, EM-PCA, and PEM-PCA).

Moreover, consider the recognition precision. Figure

Percentage of recognition accuracy over number of trained images (PCA, P-PCA, EM-PCA, and PEM-PCA).

Finally, to illustrate the scalability when increasing the number of cores, Figure

Computation time over number of CPU cores (P-PCA, EM-PCA, and PEM-PCA).

Percentage of recognition accuracy over number of CPU cores (P-PCA, EM-PCA, and PEM-PCA).

Eigenface decomposition ((a) traditional PCA, (b) P-PCA, (c) EM-PCA, and (d) PEM-PCA).

Although a traditional PCA can improve the recognition accuracy of a face recognition system, it has limitations. Therefore, in this research, several issues were investigated and evaluated, especially the computational time complexity of the covariance matrix computation stage. In addition, one possible enhancement of PCA, Expectation-Maximization (EM) PCA face recognition, was proposed to reduce the time complexity while the difference in recognition accuracy remains insignificant. Furthermore, building on advances in parallelism, a novel face recognition architecture was proposed that applies parallelism to large matrix manipulation, including parallel preprocessing, parallel recognition, and parallel classification, which together constitute Parallel Expectation-Maximization PCA, or PEM-PCA.

Based on our parallel algorithm implementation, PEM-PCA outperforms the others, namely, a traditional PCA, our first enhancement (EM-PCA), and Parallel PCA, by factors of nine, two, and three, respectively. It should be noted that the results also depend on the number of training images, with an insignificant difference in recognition precision. Although the proposed technique achieves a high speed-up over PCA, further investigation and more intensive evaluation are possible, for example, improving the preprocessing stages, increasing the degree of parallelism beyond matrix manipulation, reducing sensitivity to outliers, and testing on a larger and more varied set of images. In addition, to show the efficiency of parallelism, the recognition task could be separated autonomously across individual machines using a message passing interface. To complete the face recognition system, other aspects, such as face detection, can also be investigated further. These are left for future work.

The authors declare that there is no conflict of interests regarding the publication of this paper.