Assessing the Effect of Data Augmentation on Occluded Frontal Faces Using DWT-PCA/SVD Recognition Algorithm

The drift towards face-based recognition systems can be attributed to recent advances in supporting technology and emerging areas of application, including voting systems, access control, human-computer interaction, entertainment, and crime control. Despite the obvious advantages of such systems being less intrusive and requiring minimal cooperation from subjects, the performance of their underlying recognition algorithms is challenged by the quality of face images, which are usually acquired in uncontrolled environments with poor illumination, varying head poses, ageing, facial expressions, and occlusions. Although several researchers have leveraged the property of bilateral symmetry to reconstruct half-occluded face images, this approach becomes deficient in the presence of random occlusions. In this paper, we harness the benefits of the multiple imputation by chained equations (MICE) technique and image denoising using Discrete Wavelet Transforms (DWTs) to reconstruct degraded face images with random missing pixels. Numerical evaluation of the study algorithm gave a perfect (100%) average recognition rate each for recognition of occluded and augmented face images. The study also revealed that the average recognition distance for the augmented face images (75.5811) was significantly lower than the average recognition distance (430.7153) of the occluded face images. MICE augmentation is recommended as a suitable data enhancement mechanism for imputing missing data/pixels of occluded face images.


Introduction
Face recognition systems perform suboptimally when effective image preprocessing is lacking. Thus, the use of image enhancement techniques and their effects on the performance of face recognition algorithms have been studied by several researchers [1].
Rana et al. [2] assessed the effect of image enhancement techniques on the recognition rate of facial recognition algorithms under varying illumination, face orientations, and expressions. Their results showed up to 75% improvement in recognition rate when image enhancement was applied. They also found that the highest recognition rate was achieved under low-light conditions when image noise was reduced using a median smoothing filter.
Abdul-Jabbar [3] showed that preprocessing steps such as image adjustment, histogram equalization, and change in file format, when applied to enhance the contrast and quality of face images in different face recognition algorithms, improve the accuracy of recognition by up to 30% as compared to using the original database of face images. In the case where half of the face is degraded due to occlusions, several researchers [4,5] have leveraged the bilateral symmetry of face images to reconstruct the full-face images and have used different denoising techniques to enhance image quality. Asiedu et al. [4] reconstructed frontal face images from left and right half-face images using principal component analysis and singular value decomposition (FFT-PCA/SVD) and employed fast Fourier transforms in the preprocessing stage. They reported no statistically significant difference in the average recognition distances for the left and right reconstructed face images. However, numerical evaluation showed a higher average recognition rate for the left reconstructed face images (95%) than for the right reconstructed face images (90%). It is worth noting that the assumption of occluded half-face images is untenable in practice and becomes deficient in the presence of random occlusions. Also, the Fourier transform denoises an image only in the frequency domain of the original image. Wavelet-based denoising techniques, on the other hand, have the advantage of providing both spatial and frequency representations, which contribute more to noise reduction [6]. In addition, the scope of their work was limited to image degradation due to half-face occlusions. The problem of image degradation due to random missing pixels or patches was not addressed.
Li et al. [6] evaluated the denoising performance of the Sliding Window Average (SWA) and DWT in eliminating random fluctuations in sensor data for sensor fault detection and isolation in a nuclear power plant, based on the mean square error (MSE), signal-to-noise ratio (SNR), and correlation (COR) between the original data and the denoised measurements. Their results showed that the DWT technique was superior with respect to all test indexes.
When image degradation is due to missing pixels or patches, imputation techniques provide a means of approximating such pixel values or patches by assuming that pixels in the known and unknown portions of degraded face images share the same statistical properties or geometric structures [7]. Two such approaches are the diffusion-based inpainting methods [8] and the exemplar-based inpainting techniques [9], which have been successfully employed in restoring missing pixels or patches. According to Criminisi et al. [9], the main drawback of diffusion-based methods is that the diffusion process introduces some blurriness, which becomes noticeable when filling larger regions. Apart from that, such methods are best suited to filling holes or small patches.
Zhang et al. [10] applied an exemplar-based inpainting technique that uses surface fitting as prior knowledge together with angle-aware patch matching, and introduced a Jaccard similarity coefficient to improve the matching precision between patches, in order to restore missing blocks and large holes as well as to perform object removal. They asserted that their results outperformed many of the state-of-the-art methods in this domain. However, exemplar-based inpainting techniques are best suited to filling large textured areas.
Multiple statistical imputation methods have emerged as a vital approach to estimating random missing values. Such methods can account for uncertainty in imputations. The chained equations approach, in particular, is flexible and can handle both binary and continuous variables as well as complexities such as bounds or survey skip patterns [11].
In this study, we harness the benefits of the multiple imputation by the chained equation technique and image denoising using discrete wavelet transforms to reconstruct degraded face images with random missing pixels for recognition.
This work is motivated by the ever-growing number of applications of efficient and resilient intelligent systems. For more details on other application areas of intelligent systems, please refer to the work of Iwendi et al. [12], who performed an empirical analysis to determine the effectiveness and performance of deep-learning algorithms in detecting insults in social commentary, and the work of Gadekallu et al. [13], where a crow search-based convolutional neural network model was implemented for gesture recognition in the human-computer interaction (HCI) domain. The rest of the paper is organized as follows: Section 2 discusses the data acquisition, the adopted statistical or mathematical methods, the research design, and implementation. Section 3 presents and discusses the results of the algorithmic runs and the numerical and statistical evaluations. Section 4 examines the findings of the study in comparison with existing works in the literature and concludes by summarizing the overall achievements of the study. This section also presents some recommendations and directions for future developments.

Source of Data.
The Massachusetts Institute of Technology (MIT) (2003-2005) and Japanese Female Facial Expressions (JAFFE) databases were adopted to benchmark the face recognition algorithm. The MIT database contains frontal face images of ten individuals captured under different angular poses (0°, 4°, 8°, 12°, 16°, 24°, 28°, and 32°). For the purpose of this study, we used only the face images with a straight (0°) pose. The JAFFE database contains frontal face images of ten individuals captured along six universally accepted principal emotions (neutral, angry, disgust, sad, surprise, and happy). In this study, only the neutral expression was used. Figure 1 shows the face images of subjects in the train image database. Overall, the acquired train image database contains ten frontal face images, each with a straight (0°) pose, from the MIT database (shown in Figure 1(a)) and ten frontal face images, each with a neutral expression, from the JAFFE database (shown in Figure 1(b)). The images captured into the train image database are denoted as train images and are used to train the algorithm.
Two test image databases were used in the study. Test image database 1 was acquired by creating random missingness (10%) in each of the twenty frontal face images. The images in test image database 1 are shown in Figure 2.
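The creation of random missingness can be illustrated with a short sketch. It assumes that missing pixels are chosen uniformly at random without replacement and are marked with `None`; the function name `add_random_missingness`, the seed, and the toy 20 × 20 image are hypothetical and only serve the illustration.

```python
import random

def add_random_missingness(image, fraction=0.10, seed=0):
    """Return a copy of `image` (a list of rows) with `fraction` of its pixels set to None."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    n_missing = round(fraction * rows * cols)
    # Draw distinct flat pixel indices, i.e. sampling without replacement.
    missing = rng.sample(range(rows * cols), n_missing)
    degraded = [row[:] for row in image]
    for idx in missing:
        degraded[idx // cols][idx % cols] = None
    return degraded

# Toy 20 x 20 gray-scale image (hypothetical data, not a real face image).
image = [[(r * 20 + c) % 256 for c in range(20)] for r in range(20)]
degraded = add_random_missingness(image)
n_missing = sum(p is None for row in degraded for p in row)
```

With a 20 × 20 image and a 10% missingness level, 40 pixels are removed while the original image is left untouched.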

Multiple Imputation with Chained Equation (MICE).
Multiple Imputation (MI) uses the distribution of the observed data to estimate a set of plausible values for missing data [14]. Random components are incorporated into these estimated values to reflect their uncertainty.
MICE, also known as the sequential regression or fully conditional specification multiple imputation, is a very flexible method because it can handle different variable types such as discrete and continuous.
According to Van Buuren [11], the MICE operation is based on the assumption that data are missing at random (MAR), which implies that the probability of a value being missing depends only on the observed values and not on the unobserved values. Like any other multiple imputation method, MICE has three phases: imputation, analysis, and pooling. It creates multiple imputations to overcome the limitations of a single imputation.
In this study, we adopted the MICE algorithm to augment the occluded images due to its ability to handle large datasets through the use of chain equations as compared to other imputation methods that rely on joint models.
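The chained-equation idea can be sketched in a few lines. The sketch below is a deliberately simplified, deterministic variant and not the implementation used in this study: each column with missing entries is regressed on a single neighbouring column (an illustrative choice), no random draws are added, and only one completed dataset is produced, whereas full MICE conditions on all other variables, draws from the predictive distribution, and pools several imputed datasets. The function name `chained_imputation` and the toy pixel matrix are hypothetical.

```python
def chained_imputation(rows, n_iter=10):
    """Deterministic chained-equation imputation; None marks a missing value.

    Assumes every column has at least one observed value.
    """
    n, p = len(rows), len(rows[0])
    missing = [[rows[i][j] is None for j in range(p)] for i in range(n)]
    data = [row[:] for row in rows]
    # Step 1: initialise every missing entry with its column's observed mean.
    for j in range(p):
        observed = [rows[i][j] for i in range(n) if not missing[i][j]]
        col_mean = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][j]:
                data[i][j] = col_mean
    # Step 2: cycle through the columns, re-imputing each from a neighbour.
    for _ in range(n_iter):
        for j in range(p):
            k = (j - 1) % p  # predictor column (illustrative choice)
            fit = [i for i in range(n) if not missing[i][j]]
            if len(fit) == n:
                continue  # nothing to impute in this column
            mx = sum(data[i][k] for i in fit) / len(fit)
            my = sum(data[i][j] for i in fit) / len(fit)
            sxx = sum((data[i][k] - mx) ** 2 for i in fit)
            sxy = sum((data[i][k] - mx) * (data[i][j] - my) for i in fit)
            slope = sxy / sxx if sxx > 0 else 0.0
            # Replace only the originally missing entries with regression predictions.
            for i in range(n):
                if missing[i][j]:
                    data[i][j] = my + slope * (data[i][k] - mx)
    return data

# Toy pixel matrix with three missing entries (hypothetical values).
pixels = [
    [10.0, 12.0, 14.0],
    [20.0, None, 24.0],
    [30.0, 32.0, None],
    [None, 42.0, 44.0],
    [50.0, 52.0, 54.0],
]
completed = chained_imputation(pixels)
```

Observed pixels are never modified; only the entries that were originally missing are updated on each cycle.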
Ten frontal face images were acquired through augmentation of the images with missingness using the MICE algorithm. The images were captured into test image database 2, as shown in Figure 3.
For uniformity, the captured images were converted to gray scale and resized to 200 × 200 pixels.
The data types were also changed to double precision for preprocessing. This makes the matrices conformable and simplifies computation [4].

Research Design.
Face images sent to the recognition system/module are first preprocessed through mean centering and Discrete Wavelet Transformation (DWT) mechanisms. The images in the train image database are the first to be sent to the recognition module for preprocessing. The preprocessed images are then passed to the feature extraction unit, where the important features are extracted using the PCA/SVD algorithm. The extracted unique features are stored in memory as created knowledge for recognition.
As stated earlier, two test image databases, shown in Figures 2 and 3, were used in this study. The test images are also preprocessed using the mean centering and Discrete Wavelet Transform (DWT) mechanisms, and their unique features are extracted using PCA/SVD for recognition. The unique features are passed to the classifier/recognition unit, where they are matched with the stored knowledge created from the train images. In the classifier, the minimum recognition distance indicates a close match. It should be noted that only one test image is passed to the recognition module along with the train images at a time.
The design of the study recognition module/system is shown in Figure 4.

Preprocessing.
In digital image processing, the preprocessing phase serves as a data preparation step for contrast enhancement, noise reduction, or filtering. The main objective of image preprocessing is to improve the quality of the images by removing acquired noise and suppressing unwanted distortion of the image features [4].
Among the existing image enhancement procedures, filtering techniques have become very popular over the years for addressing the problem of noise removal and edge enhancement [15,16]. According to Bhattacharyya [17], other approaches, which include neuro-fuzzy-genetic and wavelet-based approaches, operate on the underlying data regardless of the distributions and operating parameters.
In this study, as indicated earlier, the mean centering and Discrete Wavelet Transformation (DWT) mechanisms were adopted for preprocessing. Details of the DWT and mean centering mechanisms are presented in Section 2.3.1 and Section 2.3.2, respectively.

Discrete Wavelet Transform (DWT).
A wavelet transform is an efficient tool for data approximation, compression, and noise removal [17,18]. Kociołek et al. [19] define the DWT as a linear transformation that operates on a data vector whose length is an integer power of two and transforms it into a numerically different vector of the same length.
The DWT has received considerable attention in various signal-processing applications, including image watermarking [20]. The primary objective of the DWT, as seen in multiresolution analysis [21], involves the decomposition of an image into frequency channels of constant bandwidth on a logarithmic scale. It provides a principled way of downsizing the range images and also captures both frequency and location information.
In the DWT cycle, an image is decomposed into four subbands denoted LL, LH, HL, and HH at the first level in the domain, where LH, HL, and HH represent the finest scale wavelet coefficients and LL stands for the coarse-level coefficients [20]. L denotes the low-frequency band and H the high-frequency band. Specifically, the LL subband represents the lower resolution estimate of the original image, while the mid-frequency and high-frequency detail subbands HL, LH, and HH represent horizontal edge, vertical edge, and diagonal edge details, respectively [22]. The LL subband can be decomposed further to obtain another level of decomposition. This is because most of the energy in the original image is concentrated in the low-frequency subband, which makes the LL subband relatively free from noise. The decomposition process continues on the LL subband until the desired number of levels, determined by the application, is reached. The HL and LH subbands contain the facial expression and face pose features, respectively.
The HH subband can easily be perturbed by noise, expressions, and poses. This makes the HH subband the most unstable among the four subbands.
Some types of wavelet used in the literature for data approximation, compression, and noise removal are Haar, Daubechies, Morlet, Coiflets, Biorthogonal, Mexican Hat, and Symlets. We adopted the Haar wavelet because it is the simplest wavelet and can efficiently support the interest of the study. In its operation, it applies a pair of low-pass and high-pass filters to the image vector $X_j$ to produce an approximation coefficient vector $U_1$ and a details coefficient vector $V_1$. We then concatenate $U_1$ and $V_1$ into another N-vector, $L_1$, which can be regarded as a linear matrix transformation of $X_j$.
We then filter the transformed vector $L_1$ with the Gaussian filter.
This is because the Gaussian mixture is isotropic and can represent data distributions by a mean vector and a covariance matrix [17]. Most importantly, Gaussian noise is the default noise acquired due to illumination variations. Please refer to the work of Bhattacharyya [17] for other types of mixtures for non-Gaussian and asymmetric distributions.
After filtering, the transformed vector $L_1$ is inverted componentwise to recover $X_j$. Figure 5 shows the DWT cycle using the Haar wavelet.
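One level of the Haar cycle described above can be sketched as follows. The sketch assumes the orthonormal Haar filters, that is, pairwise sums and differences scaled by 1/√2, which may differ from the normalization used in the study, and it omits the Gaussian filtering step between the forward and inverse transforms. The function names and the sample vector are hypothetical.

```python
import math

def haar_forward(x):
    """One-level Haar DWT of an even-length sequence: returns (approx, detail)."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / s for k in range(half)]  # low-pass output
    detail = [(x[2 * k] - x[2 * k + 1]) / s for k in range(half)]  # high-pass output
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, recovering the original sequence."""
    s = math.sqrt(2.0)
    x = []
    for u, v in zip(approx, detail):
        x.extend([(u + v) / s, (u - v) / s])
    return x

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical vectorized pixels
u1, v1 = haar_forward(signal)
l1 = u1 + v1                      # concatenated N-vector of coefficients
restored = haar_inverse(u1, v1)   # round trip without any filtering
```

Because the transform is orthonormal under this assumption, the coefficient vector has the same length and the same energy as the input, and the inverse reproduces the input up to floating-point error.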

Mean Centering.
Given an image space $X = (X_1, X_2, \ldots, X_n)$, whose elements are the vectorized forms of the individual images in the study database, we define $H$ as an $n \times n$ centering matrix given by $H = I - \frac{1}{n}J$, where $I$ is the $n \times n$ identity matrix and $J$ is an $n \times n$ matrix with all entries equal to 1. The mean centering of the j-th image is performed by subtracting the mean image from the individual images under study. Mathematically, the mean centered image $\omega_j$ is given by $\omega_j = X_j - \bar{X}_j$, where $\bar{X}_j = \frac{1}{n}(JX_j) = E(X_j)$, $j = 1, 2, \ldots, n$, is the mean image and $\Omega = (\omega_1, \omega_2, \ldots, \omega_n)$ is the mean centered matrix of the face space.
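As a small numerical illustration of mean centering, the sketch below subtracts the pixelwise mean image from each vectorized image. The three 3-pixel "images" and the function name `mean_center` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def mean_center(images):
    """Subtract the pixelwise mean image from every vectorized image."""
    n = len(images)
    n_pixels = len(images[0])
    mean_image = [sum(img[p] for img in images) / n for p in range(n_pixels)]
    centered = [[v - m for v, m in zip(img, mean_image)] for img in images]
    return mean_image, centered

# Three hypothetical vectorized images with three pixels each.
images = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 10.0]]
mean_image, omega = mean_center(images)
```

Here the mean image is (3, 4, 6), and every pixel position of the centered images sums to zero across the three images.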

Feature Extraction.
Feature extraction is the second step in digital image processing, next to preprocessing (the image preparation step). It aids in retrieving nonredundant and significant information from an image. The feature extraction phase is targeted at achieving time efficiency at the cost of data reduction [23], followed by object detection, localization, and recognition, which determine the position, location, and orientation of images [17]. According to Iwendi et al. [24], the main focus of feature optimization is not only to decrease the computational cost but also to find feature subsets that can work with different classifiers to produce better results.
Principal Component Analysis (PCA), also known as Karhunen-Loeve expansion, is a classical feature extraction and data representation technique widely used in the areas of pattern recognition and computer vision [25]. PCA can be used to find lower dimensional subspace which identifies the axes with maximum variance [26].
In a recent study, Reddy et al. [27] investigated two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), on four popular Machine Learning (ML) algorithms (Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier, and Random Forest Classifier) using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. Their experimental results showed that PCA outperformed LDA in all the measures. They also found that the performance of the classifiers was not much affected by using PCA and LDA.
In this study, we adopted Principal Component Analysis (PCA) as a dimensionality reduction algorithm to extract the most significant components or those components which are more informative and less redundant, from the original data.
As indicated earlier, the DWT-PCA/SVD algorithm was used to train the image database to extract unique face features for recognition. The primary objective of the PCA/SVD feature extraction mechanism is to find a set of n orthonormal vectors, $\xi_j$, which best describes the distribution of the image data [28]. The k-th vector $\xi_k$ is chosen such that $\lambda_k = \frac{1}{n}\sum_{j=1}^{n}(\xi_k^{T}\omega_j)^2$ is a maximum subject to the orthonormality constraints $\xi_l^{T}\xi_k = \delta_{lk}$, where $\xi_j$ and $\lambda_j$, $j = 1, 2, \ldots, n$, are the eigenvectors and their corresponding eigenvalues, respectively, of the dispersion matrix $C$, and are extracted through Singular Value Decomposition (SVD). The dispersion matrix $C$ is given by $C = \frac{1}{n}\sum_{j=1}^{n}\omega_j\omega_j^{T} = \frac{1}{n}\Omega\Omega^{T}$. The SVD gives two orthogonal matrices $U$ and $V$ and a diagonal matrix $\Sigma$. We obtain the eigenfaces from the column vectors of $U$, where $u_j$ is the j-th column vector of $U$.
From the train image database, the extracted features (principal components) for the j-th image are obtained by projecting its mean centered image $\omega_j$ onto the eigenfaces. The extracted features (principal components) of all the face images in the train image database are obtained in the same way and stored for recognition.

Recognition Process.
This is the last stage in the recognition module/system. Here, an unknown face image from either of the two test image databases (the occluded and augmented face image databases shown in Figures 2 and 3, respectively) is passed through the system for recognition.
Unique features of the j-th unknown face image $X^{*}_j$ are extracted in the same manner as for the train images, for all images in the i-th ($i = 1, 2$) test image database. Then, the recognition distances (Euclidean distances), $\theta$, between the extracted features of the unknown image and those of the train images are computed. The train image that corresponds to the minimum Euclidean distance, $\vartheta_{ji} = \min[\theta]$, $j = 1, 2, \ldots, n$ and $i = 1, 2$, is chosen as the closest match to the unknown test image.
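The feature extraction and matching steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the study implementation: images are stored as rows, the SVD is taken of the mean centered data matrix directly (its right singular vectors span the same principal directions as the eigenvectors of the dispersion matrix), the DWT preprocessing is omitted, and the random "faces", the function names `fit_eigenfaces` and `recognize`, and the choice of four components are hypothetical.

```python
import numpy as np

def fit_eigenfaces(train, k):
    """train: array of shape (n_images, n_pixels). Returns mean image, k eigenfaces, train features."""
    mean_image = train.mean(axis=0)
    omega = train - mean_image                      # mean centered images as rows
    # Thin SVD: rows of vt are orthonormal principal directions (eigenfaces).
    _, _, vt = np.linalg.svd(omega, full_matrices=False)
    eigenfaces = vt[:k]
    features = omega @ eigenfaces.T                 # principal components per train image
    return mean_image, eigenfaces, features

def recognize(test_image, mean_image, eigenfaces, features):
    """Return (index of the closest train image, its Euclidean recognition distance)."""
    probe = (test_image - mean_image) @ eigenfaces.T
    distances = np.linalg.norm(features - probe, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])

rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(5, 64))           # five hypothetical 8 x 8 "faces"
mean_image, eigenfaces, features = fit_eigenfaces(train, k=4)
noisy = train[2] + rng.normal(0, 1.0, size=64)      # lightly perturbed copy of image 2
match, distance = recognize(noisy, mean_image, eigenfaces, features)
```

In this toy setting the perturbed copy is matched to train image 2, since its projected features lie far closer to those of image 2 than to those of any other train image.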

Results and Discussion
The results of matching the two sets of test images (occluded and augmented) for the MIT and JAFFE databases are shown in Figures 6 and 7, respectively. It can be seen from Figures 6 and 7 that there was no mismatch when the occluded face images were used as test images for recognition. Also, there was no mismatch when the augmented face images were used as test images for recognition. It is evident from Figures 6 and 7 that the study algorithm (DWT-PCA/SVD) gave a perfect (100%) average recognition rate when used to recognize face images in both test image databases (the occluded face image and augmented face image databases). The average computational time for the recognition of all 20 images was 4 seconds.

Statistical Evaluations.
We begin the statistical assessment with a discussion of some descriptive statistics followed by a test of significant difference between the average recognition distance of the occluded and augmented face images.
From Table 1, the average recognition distance for the recognition of occluded face images (430.715), with a corresponding standard error of 70.858, is greater than the average recognition distance for augmented face images (75.5811), with a corresponding standard error of 13.3511. The median recognition distances for the occluded and augmented face image databases are 345.8750 and 48.4765, respectively. The median recognition distance is used as the average recognition distance in the presence of outlier observations. It is worth noting that a relatively lower recognition distance is always preferred, as it signifies a closer match. It can, therefore, be inferred from these results that the MICE augmentation of the face images with missingness enabled the recognition module to produce relatively lower average recognition distances.
Test of significant difference between the average recognition distances of occluded and augmented face images: we now assess whether there exists a statistically significant difference between the average recognition distances of occluded face images (test image database 1) and augmented face images (test image database 2) when they are used for recognition. The paired sample t-test is suitable for this test only if its underlying assumption is satisfied. The test is very sensitive to the assumption that the observed differences are normally distributed. The Shapiro-Wilk test of normality gave a test statistic value of 0.874 with a corresponding p value of 0.014, which is less than 0.05. This indicates that the distribution of the observed differences between the recognition distances of occluded and augmented images is not normal. Since the assumption of normality has been violated, we resort to the nonparametric counterpart of the paired sample t-test (the related-samples Wilcoxon signed-rank test). The exact distribution of the Wilcoxon signed-rank test gives accurate and reliable results for small sample sizes. The test is distribution free and, hence, does not require the satisfaction of any parametric assumption.
Let $d_{j1}$ denote the recognition distance recorded using occluded images as test images and $d_{j2}$ denote the recognition distance using the augmented images as test images for the j-th individual. Then, the observed differences should reflect the differential effects of the treatments. The test operates under the null hypothesis, $H_0$: the median difference between the recognition distances of occluded and augmented faces is zero. Table 2 contains the results of the Wilcoxon signed-rank test.
As shown in Table 2, the Wilcoxon signed-rank test gave a standardized statistic value of −3.920 with a corresponding p < 0.05. This indicates that the median observed difference in recognition distance between occluded and augmented face images is significantly different from zero. This means that the average recognition distance when occluded face images are used as test images is significantly different from the average recognition distance when augmented face images are used as test images. As stated earlier, evidence from Table 1 suggests that the average recognition distance for the augmented face images is lower than the average recognition distance for the occluded face images. Since a relatively lower recognition distance signifies a closer match, it can be inferred from the statistical evaluation that the MICE augmentation of the occluded face images improves the performance of the recognition algorithm and the recognition process at large.
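The standardized statistic can be reproduced with a short sketch of the normal approximation to the Wilcoxon signed-rank test. The sketch assumes 20 paired distances, differences taken as augmented minus occluded, no zero differences, and no tie correction to the variance; the distances themselves are hypothetical and only constructed so that every augmented distance is below its occluded partner.

```python
import math

def wilcoxon_signed_rank_z(first, second):
    """Return (W+, z) for paired samples using the normal approximation."""
    # Differences second - first; zero differences are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction applied
    return w_plus, (w_plus - mean) / sd

# Hypothetical distances: each augmented distance is lower than its occluded partner.
occluded = [100.0 + 10 * k for k in range(20)]
augmented = [d / 5 for d in occluded]
w_plus, z = wilcoxon_signed_rank_z(occluded, augmented)
```

With 20 pairs whose differences all share the same sign, the sum of positive ranks is 0, the expected value is 105, and the standard deviation is about 26.786, giving z ≈ −3.920. Under the stated assumptions, this matches the value in Table 2 and is consistent with every individual having a lower recognition distance for the augmented image than for the occluded image.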

Conclusions and Recommendation
The study successfully assessed the performance of the DWT-PCA/SVD recognition algorithm on occluded and augmented face images. Numerical evaluation of the study algorithm gave a perfect (100%) average recognition rate each for recognition of occluded and augmented face images. This rate is higher than that of Ayiah-Mensah et al. [29], who used the FFT-PCA/SVD recognition algorithm and obtained a 90% average recognition rate each on the same databases.
This shows that the adopted preprocessing mechanism (discrete wavelet transformation) has an edge over the Fast Fourier Transformation (FFT) mechanism used by Ayiah-Mensah et al. [29]. The perfect (100%) rate of recognition achieved cannot be guaranteed if the level of missingness in the face images increases. The statistical evaluation revealed that there exists a significant difference between the average recognition distances of occluded face images and augmented face images. From the descriptive statistics shown in Table 1, the average recognition distance for the augmented face images (75.5811) is lower than the average recognition distance (430.7153) of the occluded face images. This points to the fact that the MICE augmentation improved the recognition performance of the study algorithm.
This finding, although hidden from the numerical evaluation results, is evident from the statistical evaluation of the study algorithm.
According to Ayiah-Mensah et al. [29], the failure of the numerical evaluation exercise to uncover this finding can be attributed to the fact that the statistical evaluation mechanism is a more data-driven approach to assessing the performance of the recognition algorithm. The findings of the study are consistent with those of Min et al. [30], despite the differences in occlusion criteria (random occlusions; brow, eye, and mouth occlusions; and scarf and sunglass occlusions) and in the databases used to benchmark the recognition/classification systems. The study, therefore, recommends the use of discrete wavelet transformation as a preprocessing mechanism in a recognition module. MICE augmentation is also recommended as a suitable data enhancement mechanism for imputing missing data/pixels of occluded face images. Future work will focus on assessing the MICE data enhancement mechanism on occluded face images when the percentage of missingness is increased.

Data Availability
The image data supporting this study are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.