On Facial Expression Recognition Benchmarks

Facial expression is an important form of nonverbal communication; it has been noted that 55% of what humans communicate is expressed through facial expressions. There are several applications of facial expressions in diverse fields, including medicine, security, gaming, and business. Thus, automatic facial expression recognition is currently a very active research area that attracts considerable funding, and there is therefore a need to understand its trends well. This study aims to review selected published works in the domain and conduct an analysis to determine the most common and useful algorithms employed. We selected published works from 2010 to 2021 and extracted, analyzed, and summarized the findings based on the most used techniques in feature extraction, feature selection, validation, databases, and classification. The results indicate strongly that local binary pattern (LBP), principal component analysis (PCA), support vector machine (SVM), CK+, and 10-fold cross-validation are the most widely used feature extraction method, feature selection method, classifier, database, and validation method, respectively. In line with these findings, this study provides recommendations, particularly for new researchers with little or no background in the field, as to which methods they can employ and strive to improve.


Introduction
The discovery of expression and emotions in humans and animals by Darwin [1] in the nineteenth century served as the premise for research on emotions. In his work, Darwin indicated that both humans and animals exhibit emotions through similar behaviours [2]. Since then, there has been significant progress in research on emotions, with the past two decades witnessing immense contributions from multidisciplinary fields such as psychology, medicine, sociology, business, neuroscience, endocrinology, and computer science, resulting in a colossal number of algorithms for automatic facial expression recognition [3].
Emotions can be described as feelings caused by neurons firing along the tiny pathways inside the amygdala, the emotion centre of the brain. Emotion can also be described as a complex experience involving related feelings, which tends to move one out of a person's individuality [4,5]. Emotions come with physical and physiological changes, which regulate our behaviour as reactions to internal and external stimuli [6]. Emotion is a salient characteristic of humans. It plays a useful role in human communication, as well as in the growth and regulation of interpersonal relationships [7][8][9]. It also affects thoughts, actions, and decision-making [10]. Several sources of emotional information have been proposed for recognising emotions. These sources serve as the primary data from which emotions can be inferred. They can be broadly classified into three groups, namely, biological indicators, behavioural indicators, and physiological signals (see Figure 1) [3,11]. The biological indicators comprise facial expressions and body postures or gestures. The physiological signals are measurements based on recordings of electrical signals produced by the heart, brain, muscles, and skin. They include electroencephalography (EEG), electromyography (EMG), electrocardiography (ECG), respiration rate, skin conductance, electrooculography (EOG), blood pressure, Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), Magnetoencephalography (MEG), Functional Magnetic Resonance Imaging (fMRI), and Near-Infrared Spectroscopy (NIRS). Finally, speech signals and text represent the behavioural indicators for emotion recognition.
Research on facial expressions dates back to ancient times, making them a recognised and important modality among the nonverbal forms of communication. Moreover, it can be inferred from the literature that facial expression is the modality most commonly combined with others when performing emotion recognition [12]. Darwin's work on the universality of facial expressions of emotion across different cultures and tribes served as a foundation for the empirical study of facial expressions [1]. Thus, facial expression became the only measure with well-developed frameworks, as it has been researched thoroughly over the past few decades [13]. Additionally, among the indicators for emotion recognition, facial expressions are argued to be a significant and leading measure, as they convey 55% of what humans communicate, with only 7% and 38% conveyed through language and speech, respectively [14,15]. Emotions can be easily and accurately detected from the face [16,17]. Furthermore, the use of facial expressions for emotion recognition has several advantages, such as its noninvasiveness and relative cheapness: it does not involve any physical contact with the user through sensors, as collecting EEG signals does, nor does it require expensive hardware [18]. Facial expressions are useful in deciphering an individual's thoughts or state of mind during a conversation [19]. They also serve as the most genuine indicators of age, truthfulness, temperament, personality, and the emotional state of a person [20,21]. Hence, it can be concluded that the face is an important feature of the body, as it conveys an individual's personality, emotions, thoughts, and ideas even before they have been verbalized, playing a significant role in human communication and social interaction [3,22].
Darwin's research established the foundation for the conceptualisation of emotions, and thus it received attention among various psychologists. Ekman [23] validated Darwin's theory on the universality of emotions across tribes and cultures when he proposed the discrete theory of emotion, namely the basic emotions. From there, several psychologists have theorised variants of emotion based on the basic theories, for example, the models of Ortony et al. [24][25][26].
These conceptualised emotions vary in type and number, even though they are all borne out of Darwin's and Ekman's universality of emotions. Nonetheless, the emotions most employed in research based on these discrete theories are the basic emotions, which are modelled by six classes: happiness, disgust, fear, surprise, sadness, and anger [7,27]. The basic emotions are considered to be universal across different cultures and peoples and are used in describing the affective states of individuals [23,28]. Each basic emotion is characterised by a unique facial expression [29].
Advances in technology have contributed immensely to the analysis of emotions, giving rise to automated facial expression recognition. The general framework of facial expression classification or recognition involves the following stages (see Figure 2): image acquisition, preprocessing, feature extraction, feature selection, and classification [30].
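To make these stages concrete, the following is a minimal sketch of the framework as a chain of pluggable callables; the stage implementations here are toy placeholders for illustration only, not methods drawn from the reviewed studies:

```python
import numpy as np

def recognize_expression(image, preprocess, extract, select, classify):
    """Generic FER pipeline: each stage is a pluggable callable."""
    face = preprocess(image)      # e.g. crop, grayscale, normalise
    features = extract(face)      # e.g. LBP histogram, HOG, Gabor responses
    reduced = select(features)    # e.g. PCA projection
    return classify(reduced)      # e.g. SVM decision

# Toy stages, purely for illustration.
label = recognize_expression(
    np.zeros((48, 48)),                                        # blank "face"
    preprocess=lambda im: im / 255.0 if im.max() > 1 else im,  # scale to [0, 1]
    extract=lambda im: np.array([im.mean(), im.std()]),        # crude features
    select=lambda f: f[:1],                                    # keep one feature
    classify=lambda f: "neutral" if f[0] < 0.5 else "happiness",
)
print(label)
```

In a real system each lambda would be replaced by one of the techniques surveyed below.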
Although there is a considerable number of surveys and other literature on facial expression classification, to the best of our knowledge there is no comprehensive systematic review in this field. Having evaluated the above surveys and reviews using an evaluation checklist by Kitchenham [31], we observed that the majority of authors conducted narrative reviews rather than systematic reviews, providing general information on the various aspects of facial expression classification analysis [32][33][34][35][36][37][38][39][40][41][42]. The absence of a systematic review of the most utilized methods motivates our work. Although some authors did perform systematic reviews, they examined the general methods for the various facial expression recognition stages [43]. This background informed the decision to conduct a systematic review investigating the most utilized feature extraction methods, feature selection techniques, classification algorithms, validation methods, and the leading databases used from 2010 to the first half of 2021. The rest of this paper is structured as follows. Section 2 describes the methodology used in this review. Section 3 presents the results and discussion, and the conclusions and future work are in Section 4.

Method
The overall objective of this work is to summarize, analyze, and assess the domain of facial expression recognition, providing an up-to-date summary and review of (1) the most dominant feature extraction methods utilized for facial expression recognition, (2) the most employed feature selection technique, (3) the most used classification algorithm, (4) the most utilized database, and (5) the most dominant model validation method. Further, guidelines are provided to novices on the techniques to use when conducting facial expression classification research. The five research questions for this systematic literature review are presented in Table 1.
In this systematic review, we planned, conducted, and reported the review based on the procedures proposed by [31], namely planning, conducting, and reporting. In planning the review, a justification for the review was first established, along with the development of a review protocol. The development of the review protocol entails the definition of the research questions, search strategy design, study selection, quality assessment, data extraction, and data synthesis. Figure 3 shows the review protocol.
To start with, we formulated focused research questions based on the aim of this work. Then, the search strategy was designed, which involves determining search terms and selecting appropriate search engines to retrieve relevant literature for the subsequent search process. Subsequently, to select the studies that contribute to addressing the research questions, a study selection criterion was defined. To further polish the study selection criteria, a pilot study selection was first conducted. Afterwards, several quality checklists were established to assess relevant studies during the quality assessment process. A data extraction form was devised for the data extraction stage and later refined through piloting of the form to address technical issues, such as the ordering of the questions. Finally, the collected data was synthesized in the data synthesis stage. The appropriate methodologies for synthesis were determined based on the types of data, as well as the research questions addressed by the collected data. The details of the review protocol are presented in the following sections [44].

Search Strategy.
The search terms, search engines, and search process comprise the search strategy; each is detailed below.

Search Terms.
Our search terms were derived using the steps proposed by Kitchenham et al. [45]: (i) identifying relevant keywords from papers, (ii) using alternative synonyms for the keywords, and (iii) originating major terms from the research questions. The resulting search strategy was developed using keywords including facial expression, machine learning, deep learning, and classification. The search strings used on the search engines followed this protocol: (i) facial expression AND classifier AND facial expression databases and (ii) facial expression AND feature extraction AND feature selection. The generated search strings were created to trade off manageable size against coverage.

Search Engines.
After formulation of the search terms, the appropriate and relevant search engines were selected. The selection of the search engines was not restricted by their availability at the home university. The search for primary studies was done using the following databases: (i) Institute of Electrical and Electronics Engineers (IEEE), (ii) SpringerDirect, and (iii) ScienceDirect. The generated search strings were searched in the above databases for articles only, with the search restricted to the period from January 1, 2010, to June 2021, inclusive, because we wanted to investigate the latest developments and trends in the domain of facial expression classification.

Search Process.

An initial informal search was conducted to ascertain whether the task would yield enough literature for the study. After the search engines and search strings had been identified and defined, we searched the selected electronic databases separately for articles. The retrieved candidate articles were downloaded and exported into Excel. However, for ScienceDirect, the software package JabRef (https://www.jabref.org/) was used instead of Excel, since that database does not offer an export-to-Excel option and allows only 100 articles to be downloaded and exported at a time. The downloaded articles were later combined and exported to Excel for manual scanning and selection of the relevant articles. For storing and managing relevant articles, the software package Mendeley was utilized (https://www.mendeley.com/). Figure 4 presents the search process and the total number of papers identified at each phase.

Study Selection.
The study selection helps in filtering out candidate papers that provide no useful information for answering the research questions in this review. The selection was conducted in two phases: selection stage 1 and selection stage 2. Selection stage 1 eliminated all candidate articles irrelevant to answering the research questions, based on the inclusion and exclusion criteria. Then, selection stage 2 was used to select relevant papers based on the quality assessment criteria. The search process produced 361 candidate articles. Candidate articles were selected or eliminated after scrutiny of the titles, elimination of duplicates, selection of potentially relevant papers based on scrutiny of the abstracts and the inclusion criteria, and then a careful perusal of the papers selected in the previous stage for quality assessment review. The review process selected relevant papers of acceptable quality to be used for data extraction (see Figure 4).
To select relevant articles to be included in the systematic review, the following search limits were applied as formulated from the research questions.

Inclusion Criteria.
Articles were included according to the following criteria: (i) papers whose main objective is the discussion of facial expressions and the classification of the basic emotions using machine or deep learning algorithms were included; (ii) articles within the range of 2010 to 2021 were included; (iii) the databases utilized for the classification task in the methodology section should consist of human faces; and (iv) papers written in English were included.

Exclusion Criteria.
Papers whose content was an extended abstract or a PowerPoint presentation were excluded. (i) Books and magazines were excluded. (ii) Papers based on facial expression for pain analysis or conditions such as autism, depression, and Parkinson's disease were excluded. (iii) Review papers were also excluded.
Therefore, a total of 240 potentially relevant articles were obtained after selection stage 1 for quality assessment review in selection stage 2. After applying the quality assessment criteria in stage 2, we obtained 233 final relevant studies, as shown in the Appendix.

Study Quality Assessment.
Measures to ensure the quality of the search were carried out during the review. After the initial automated search, the articles were reviewed manually. Then, we verified papers to be included or excluded after evaluation and analysis of the titles and abstracts.
[Table 1, excerpt: RQ3 objective — to review the algorithms deployed for classifying facial expressions into the basic emotions. RQ4 — What is the most utilized database for facial expression classification? Objective: to explore the kind of database used (posed or spontaneous, 2D or 3D). RQ5 — What is the most dominant validation method for facial expression classification? Objective: to investigate the most employed technique for evaluating the classification, as well as partitioning the dataset into fractions.]

The automated initial search for articles was conducted in private browsing to avoid the influence of historical searches. Additionally, a quality questionnaire was formed to assess the relevance of the included studies. These questions were formulated to test the relevance, rigour, and credibility of the papers. Some of these questions were derived from Wen et al. [44,46]. Each question was scored with one of three optional answers: "yes" = 1, "partly yes" = 0.5, and "no" = 0. The scores from the answers to the quality assessment questions were summed to give the quality score for each included study.
Studies with a quality score above 5 were retained for the data extraction and synthesis processes.

Data Extraction.
We designed a data form in Excel to collect data answering the review questions from each included paper independently, using Table 2. We summarized information on the feature extraction and reduction techniques, the classification algorithms, the validation methods, and the reported databases. Standard information, such as publication details, date of publication, title, author names, and publication venue, was also collected. During the extraction process, we observed that not all included studies provided answers to all the review questions. Another issue encountered during data extraction was that some papers used different terminologies. For instance, dimensionality reduction was synonymous with feature selection in some papers. To avoid ambiguity, we adopted the terminology feature selection throughout.

Data Synthesis.

The collected data was saved for use during the data synthesis stage. The purpose of data synthesis is to aggregate and summarize the collected data from the included studies to provide answers to the formulated review questions. Answers from included studies with similar or comparable evidence were accumulated to provide conclusive answers. Quantitative data were extracted in this review, and thus after synthesis our outcomes were presented in a comparable way [31]. Additionally, we utilized a narrative synthesis method, owing to the nature of the data extracted for our review questions. Therefore, we used visualization techniques such as funnel graphs, clustered bar graphs, line graphs, clustered columns, and pie charts. We also employed tables for summarization and presentation of the results [44,47].

Description of the Included Studies.
In this section, a brief overview of the included studies is given. We identified 233 articles published in the period 2010 to 2021, inclusive, in the area of facial expression recognition. The research questions answered by each of the included studies are presented in Table 3.

Publication Year.
The distribution of the articles published from 2010 to 2021 is shown in Figure 5. Generally, the distribution shows an upward trend of research in the study domain. The line graph in Figure 5 shows a steep rise in 2011 and a fall in 2020. The 2020 fall could be attributed to a general recession due to the COVID-19 pandemic. The publications recorded in 2021 cover only the first half of the year, implying that the count is likely to rise considerably by the end of the year. Given the upward trend in publications, we anticipate many more articles as the applications of facial expression grow in areas such as human-computer interaction, medicine for pain detection, autism, and security [47,48].

Publication Source.
The information on the publications, along with the number of primary studies in the corresponding journal, is summarized in Table 4. The included studies were published in fifty-four different journals. The journal that recorded the most publications is Multimedia Tools and Applications, with a whopping sixty-three publications. This was followed by Neurocomputing (18) and Visual Computer (16). It was observed that the majority of our primary studies were obtained from ScienceDirect and SpringerDirect.

What Is the Most Used Feature Extraction Method for Facial Expression Classification (RQ1)?
To develop better models, a considerable number of techniques have been proposed and utilized over the years [47]. Feature extraction is generally considered the second and most important step in facial expression recognition, as the choice of features is a critical task. It helps in representing the facial image effectively by encoding the subtle changes of a facial image into a feature vector [40,49]. The results displayed in Figure 6 show that the local binary pattern (LBP) is the most commonly used feature extraction method, accounting for 22.9% of use across the twenty-nine (29) methods employed by researchers in the period. This is followed by geometry-related methods, which accounted for 14%. Though less reported than LBP, the study found 24 different geometric methods formulated within the 12 years of study. The third and fourth most frequently utilized techniques are the Histogram of Oriented Gradients (HOG, 12.9%) and Gabor filters, respectively (Figure 6). The study also finds that various variants of feature extraction methods are being developed; methods such as LBP, CNN, HOG, and Gabor all have advanced variants.
According to Shan et al. [50] and Zavaschi et al. [51], the original LBP proposed by Ojala et al. [52] frequently surpasses the widely adopted Gabor method because of its ability to save computational resources whilst retaining facial information, as well as its tolerance to illumination changes. Chengeta and Viriri [37] also state that LBP has been widely adopted because it possesses rotational and grayscale invariance properties. Gabor has been the third most adopted technique. In comparison to LBP, however, the authors of [41] mention that the Gabor filter usually attains a better accuracy, between 82.5% and 99%, and is less sophisticated. The feature extraction techniques are presented in Figure 6.
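As an illustration, the basic LBP operator of Ojala et al. [52] compares each pixel to its eight neighbours and histograms the resulting 8-bit codes; the following is a minimal numpy sketch of that basic, single-scale variant (not the uniform or multi-resolution extensions):

```python
import numpy as np

def lbp_image(img):
    """Encode each pixel as an 8-bit pattern of neighbour comparisons."""
    padded = np.pad(img, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    # eight neighbours, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = padded[1 + dy:padded.shape[0] - 1 + dy,
                       1 + dx:padded.shape[1] - 1 + dx]
        code |= (neigh >= center).astype(np.uint8) << bit
    return code

def lbp_histogram(img, bins=256):
    """Normalised histogram of LBP codes: the feature vector."""
    codes = lbp_image(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()
```

In practice the histogram is usually computed per image region and the regional histograms concatenated, which is what gives LBP its tolerance to illumination and its low computational cost.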

What Is the Most Employed Feature Selection or Reduction Technique for Facial Expression Classification (RQ2)?
In this section, the various feature selection techniques utilized within the twelve-year period are summarized. Feature selection helps in selecting the most important features while discarding the unimportant ones [47,53]. It was noticed from Figure 7 that only 33% of the included articles used a feature selection method. In comparison to the other methods, principal component analysis (PCA) dominated research attention over the years with a share of 30.6%, followed by linear discriminant analysis (LDA) at 18.1%. In affirmation of our result, other relevant reviews by Revina and Emmanuel [41] and Fan and Tjahjadi [54] stated that PCA was the most adopted feature selection technique among several algorithms, such as LDA and AdaBoost. As noted above, PCA can also be used for feature extraction, as it extracts both global and low-dimensional features. A likely reason why PCA has gained so much recognition as a feature selection algorithm in facial expression recognition is that it performs well in removing redundancy among correlated features and improves visualization; PCA is also known to reduce overfitting. These qualities improve facial expression recognition, making PCA well suited for adoption.
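As a sketch of the idea, PCA centres the feature matrix and projects it onto the directions of greatest variance; a minimal numpy implementation via the singular value decomposition might look as follows (the data here is random, purely for illustration):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature vectors onto the top principal components."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # scores in the reduced space

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))   # e.g. 100 face samples, 50 raw features
Z = pca_reduce(X, 10)            # keep the 10 most-variant directions
```

The components come out ordered by explained variance, so truncation keeps the most informative directions and discards the rest, which is the overfitting-reduction effect noted above.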

What Is the Most Dominant Algorithm Utilized for Facial Expression Recognition (RQ3)?
This section summarizes the various algorithms employed in facial expression recognition. The distribution of the most consistently used algorithms is shown in Figure 8. Ideally, classification is the final stage in facial expression recognition [40]. The classifier must be trained to categorize expressions into sadness, anger, fear, happiness, disgust, surprise, neutral, and sometimes other expressions such as joy and smiling [27,41]. The results show that the support vector machine (SVM) was the most dominant algorithm used, alone accounting for 48.6%. Variants of SVM, such as the iterative universum twin support vector machine, were also used. SVM is followed by the convolutional neural network (CNN), which accounts for 20.6%. The K-Nearest Neighbor (KNN) and the Hidden Markov Model (HMM) were the third and fourth most frequently employed algorithms, respectively. Other, less used algorithms summarized from the included studies are found in Figure 8. Additionally, it was observed that some algorithms were fused with others for classification; for instance, the naïve Bayes classifier was boosted with a neural network ensemble, and the random forest was combined with SVM labelers. In summary, SVM was the popular choice for classification of facial expressions. The authors in [41,55] affirm that SVM is the most usable classifier for facial expression classification, as it produces better classification and recognition accuracy.
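To illustrate the dominant classifier, the following sketch trains scikit-learn's SVC on synthetic two-class features standing in for expression descriptors; the data, labels, and kernel choice are illustrative assumptions, not a setup taken from any reviewed study:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated toy "expression" classes in a 5-D feature space
# (in a real system these would be LBP/HOG features after selection).
X0 = rng.normal(loc=-2.0, scale=0.5, size=(40, 5))
X1 = rng.normal(loc=+2.0, scale=0.5, size=(40, 5))
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)   # 0 = "sadness", 1 = "happiness"

clf = SVC(kernel="rbf", C=1.0)      # RBF kernel, a common choice in FER work
clf.fit(X, y)
print(clf.score(X, y))              # training accuracy on the toy data
```

Multi-class expression recognition is handled the same way: SVC applies a one-vs-one scheme internally when more than two labels are present.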

What Is the Most Utilized Database for Facial Expression Classification (RQ4)?
A summary of the frequency of the most utilized databases is presented in Figure 9. Databases are utilized for the validation of proposed methods. Our findings identified forty-three (43) different databases. Figure 9 shows the frequency of usage of the databases within the period.
From our results, CK+ and JAFFE performed better than the other databases, with accuracies between 90% and 100%.
The reason is perhaps their two-dimensional nature, as even now there are many studies in 2-dimensional facial expression recognition. Additionally, it was observed that in several experiments CK+ outperformed JAFFE. CK+ consists of both posed and spontaneous (only smile expressions) video sequences of African Americans, Euro-Americans, and others (6%) within the ages of 18-50, while JAFFE consists of 213 posed images from 10 Japanese females [56,57]. Although CK+ is a combination of posed and spontaneous smile expressions, it is normally classified as a posed database [34,36]. The variety of expressions in CK+ makes it useful for many kinds of facial expression study, namely, pose, emotion, age, 2D, and 3D. The mixture of subjects of different skin colours makes it even more realistic for facial expression studies.
A few years ago, Revina and Emmanuel [41] and Kumar and Sharma [55] reported that JAFFE and CK were the most utilized databases. However, this study has proved otherwise.

What Is the Most Dominant Validation Method for Model Evaluation for Facial Expression Classification (RQ5)?
Cross-validation is useful in evaluating a model's accuracy.
This is done by splitting the database into two sets: one for training the model and the other for testing [44,56]. The usual cross-validation methods are leave-one-out and k-fold validation, specifically ten-fold and five-fold. The published research papers that employed validation are shown in Table 3. Figure 10 shows how the various validation methods were used over the 12-year period. k-fold validation, by way of training and testing k times, helps to avoid overfitting and underfitting [34]. Similar to our results, Bengio and Lecun [57] affirm that k-fold is the most often adopted validation method.
Put simply, k = 10 is an appealing choice for cross-validation: lower k values, such as 2 or 3, are computationally cheaper but carry a large bias.
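The mechanics of k-fold splitting can be sketched as follows; this is a minimal numpy version that shuffles the sample indices once and partitions them (library implementations such as scikit-learn's KFold offer the same idea with more options):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)     # shuffle once, then partition
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]                  # fold i held out for testing
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each of the k rounds trains on k−1 folds and tests on the held-out fold, so every sample is tested exactly once and the k accuracies are averaged into the reported score.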

Conclusions
In this study, we systematically reviewed the domain of facial expression recognition by investigating the most dominant techniques utilized at the various phases, such as feature extraction, feature selection, and classification. First, we identified 233 papers published from 2010 to 2021 after executing a series of systematic steps and quality assessment. Then, we extracted, analyzed, and summarized the collated data from the included studies based on the most utilized feature extraction technique, feature selection method, classification algorithm, databases, and validation method. The relevant findings are as follows: twenty-nine (29) techniques were utilized in feature extraction, with LBP, PCA, SVM, CK+, and 10-fold cross-validation the most widely used feature extraction method, feature selection method, classifier, database, and validation method, respectively. This review provides recommendations and guidelines to researchers, especially new researchers without a strong background in facial expression classification analysis, as to which methods to adopt for their research, since these leading and most used techniques will be useful in the correct and efficient recognition of facial expressions [38].
For future work, we will investigate how a combination of these dominant methods and databases performs in terms of classification accuracy.

Conflicts of Interest
The authors declare that they have no conflicts of interest.