Emotion Recognition Based on Type-2 Recurrent Wavelet Fuzzy Brain Emotion Learning Network Model

Emotion recognition plays a crucial role in human-robot emotional interaction applications, and the brain emotional learning model is one of several emotion recognition methods; however, the learning rules of the original brain emotional learning model adapt poorly and do not perform well. In fact, existing facial emotion recognition methods do not achieve high accuracy and are not sufficiently practical in real-time applications. To solve this problem, this paper introduces an optimized model that merges an interval type-2 recurrent wavelet fuzzy system and a brain emotional learning network for emotion recognition. The proposed model takes advantage of type-2 recurrent wavelet fuzzy theory and the brain emotional neural network. There are no rules initially; the structure and parameters of the model are then tuned online simultaneously by the gradient approach and a Lyapunov function. The system input data streams are imported directly into the neural network through a type-2 recurrent wavelet fuzzy inference system; the results are subsequently piped into sensory and emotional channels, which jointly produce the final outputs of the network. The proposed model reduces uncertainty in terms of vagueness by using type-2 recurrent wavelet fuzzy theory and removing noise samples. Finally, the superior performance of the proposed method is demonstrated by comparison with several emotion recognition methods on five emotion databases.


Introduction
Emotion recognition is one of the most effective methods of obtaining human information to improve human-robot interaction, and it includes body-language emotion recognition [1], facial expression recognition [2], and speech emotion recognition [3]. It is therefore necessary for a robot to recognize humans' emotional states as accurately as possible.
One important part of providing effective and natural interaction between humans and computers is enabling computers to understand the emotional states expressed by human subjects. Rahul et al. [4] proposed one method for putting this theory into practical application. Cen et al. [5] introduced a new measure of authentic auditory emotion recognition and applied it to patients with schizophrenia. Bhandari and Pal [6] examined whether an explicit use of edges can help in emotion recognition from images using a convolutional neural network (CNN).
For speech emotion recognition, speaker-dependent (SD) and speaker-independent (SI) settings have been introduced in the recognition process [7,8], in which support vector machines [8], neural networks [9], deep convolutional neural networks, and deep belief networks have been used [10-13]. Facial expression recognition has been successfully applied in many scenarios, such as human-computer interaction systems and games, where the machine interprets user emotions through expression recognition, making the algorithm more intelligent and humanized. Single-label and multilabel learning paradigms have been used for facial expression recognition. Yang et al. [14] expanded the conditional probability neural network to a fuzzy form for predicting expressed emotions. The contributions of this paper include that the proposed type-2 recurrent wavelet fuzzy brain emotional learning network can be constructed automatically from an empty initial rule base and that numerical simulations have been made to demonstrate the performance of the proposed method for emotion recognition.
This paper is organized as follows: Section 2 presents the structure of the novel interval type-2 recurrent wavelet fuzzy brain emotional learning network model, the parameter learning, and the self-organizing structure learning algorithms. Section 3 shows the simulation results. Finally, the conclusions are given in Section 4.

Framework of the IT2RWFBELN
The IT2RWFBELN is composed of two parts: the amygdala network, which is responsible for emotional judgment, and the orbitofrontal cortex network, which is responsible for emotional control. The fuzzy inference part of the IT2RWFBELN adopts an interval type-2 recurrent wavelet system, so the two rule bases can be described as

if x_1 is μ^a_{1j} and · · · and x_i is μ^a_{ij} and · · · and x_{n_i} is μ^a_{n_i j} (amygdala network),
if x_1 is μ^o_{1j} and · · · and x_i is μ^o_{ij} and · · · and x_{n_i} is μ^o_{n_i j} (orbitofrontal cortex network),

where the membership-function grades and the weights for the two networks are described below. Figure 1 shows the structure of the IT2RWFBELN, which includes the amygdala network, the orbitofrontal cortex network, and the interval type-2 recurrent wavelet fuzzy sets. The proposed IT2RWFBELN is constructed with six layers: an input layer, a membership-function (MF) layer, a spatial firing layer, a weight memory layer, a defuzzification layer, and an output layer:
(1) Input space: the nodes in this space are the inputs x_i, i = 1, ..., n_i, where n_i represents the number of input signals; all data from this layer are transmitted to the next space without any computation.
(2) MF space: in this layer, fuzzification is performed with interval type-2 wavelet membership functions, which are adopted as the basis functions. Wavelet functions provide better approximation ability than triangular or Gaussian basis functions, so the learning speed can be increased. Furthermore, a recurrent term carrying previous information is inserted in this layer, which further improves the network performance. In the wavelet membership functions, the parameters c^l_ik, c^r_ik, and σ_ik represent the left and right centers and the variance of the type-2 wavelet membership functions of the fuzzy rules.
So the output of the ith input feature and kth rule can be represented by an interval MF [μ̲_ik, μ̄_ik], and x_ri ∈ [x̲_ri, x̄_ri] represents the recurrent inputs.
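As a hedged sketch of layer (2), the following computes one interval wavelet membership grade. The Mexican-hat mother wavelet, the min/max interval bound taken from the two center evaluations, and the additive recurrent term r * x_recurrent are all assumptions for illustration; the paper's exact formulas are not reproduced in the extracted text.

```python
import numpy as np

def it2_wavelet_mf(x, c_l, c_r, sigma, x_recurrent=0.0, r=0.0):
    """Interval type-2 wavelet membership grade for one input/rule pair.

    A sketch under stated assumptions: a Mexican-hat mother wavelet
    psi(z) = (1 - z^2) * exp(-z^2 / 2), an uncertain center [c_l, c_r],
    and a recurrent feedback term r * x_recurrent added to the input.
    A full IT2 construction treats the region between the centers
    specially; here the interval is simply bounded by the two
    evaluations at c_l and c_r.
    """
    xi = x + r * x_recurrent          # fold previous information back in
    psi = lambda z: (1.0 - z ** 2) * np.exp(-0.5 * z ** 2)
    g_l = psi((xi - c_l) / sigma)     # grade at the left center
    g_r = psi((xi - c_r) / sigma)     # grade at the right center
    return min(g_l, g_r), max(g_l, g_r)
```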

where r_i is the recurrent gain of the network.
(3) Spatial firing layer: each rule in this layer combines both the upper and lower firing strengths of the MFs. The firing strength F_k of an interval type-2 fuzzy rule is itself an interval, where F_k denotes the interval firing strength of the kth rule, and m and M represent the numbers of input signals and fuzzy rules, respectively.
(4) Weight memory space: this layer contains two memory spaces, an amygdala memory and an orbitofrontal memory, whose values are intervals because the firing strengths are intervals. The weights of the kth output of the amygdala network and the orbitofrontal network are ω_k and v_k, and their updating rules are introduced in derivative form, where β and λ indicate the learning rates of the updating rules and o_k and a_k represent the outputs associated with the ω_k and v_k values.
(5) Defuzzification space: the output of this space is calculated from the outputs of the firing space and the weight space, giving the left- and rightmost point values of the amygdala and orbitofrontal network outputs.
(6) Output space: the output of the defuzzification space is an interval value, so averaging is used to obtain the final amygdala network and orbitofrontal cortex network outputs.

Self-Organization of IT2RWFBELN

According to the theory of the kernel fuzzy rough set, the fuzzy upper approximation indicates the possibility of a sample belonging to a certain emotion, and the fuzzy lower approximation indicates the inevitability of a sample belonging to that emotion. In general, the classification decision for a sample has less uncertainty when the feature space is strongly discriminative, which means that the closer the upper fuzzy approximation is to the lower fuzzy approximation, the better.
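The computations of layers (3)-(6) can be sketched as one forward pass. The product t-norm, the weighted-average defuzzification, and the classic BEL combination output = amygdala − orbitofrontal are assumptions made for this sketch, not equations reproduced from the paper:

```python
import numpy as np

def it2rwfbeln_forward(mu_low, mu_up, w_amyg, w_orbit):
    """Hedged sketch of layers (3)-(6) of the IT2RWFBELN.

    mu_low, mu_up  : (n_inputs, n_rules) lower/upper membership grades
    w_amyg, w_orbit: (n_rules,) amygdala / orbitofrontal weights

    Assumptions: product t-norm for the firing strengths, normalized
    weighted-average defuzzification at each interval endpoint, and the
    classic BEL combination  output = amygdala - orbitofrontal.
    """
    f_low = np.prod(mu_low, axis=0)          # lower firing strengths
    f_up = np.prod(mu_up, axis=0)            # upper firing strengths

    def defuzz(f, w):                        # normalized weighted average
        return float(np.dot(f, w) / (np.sum(f) + 1e-12))

    # endpoint outputs for each channel, then the interval average
    a = 0.5 * (defuzz(f_low, w_amyg) + defuzz(f_up, w_amyg))
    o = 0.5 * (defuzz(f_low, w_orbit) + defuzz(f_up, w_orbit))
    return a - o
```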
Achieving a good structure for the IT2RWFBELN requires selecting an appropriate number of rules with a self-organizing algorithm. If the number of rules is too large, the computational load becomes heavy; if it is too small, the network cannot cover all cases, especially for data with large ranges. Initially, there are no rules or MFs in the first space; when the first input data stream arrives, the first MF is created. The self-organizing algorithm then determines whether to generate new rules and MFs or to delete inappropriate ones. In this paper, interval type-2 fuzzy c-means (IT2FCM) is used to choose the cluster centers of the membership functions for the fuzzy rules of the IT2RWFBELN. The IT2FCM [32] is an iterative optimization algorithm that minimizes an objective function in which d_ik = ‖x_k − v_i‖ denotes the distance between the cluster center v_i and an input pattern x_k. The main steps of the IT2FCM are as follows:
(1) Set the fuzzifiers and the number c of cluster prototypes, and initialize the cluster centers V using a GA.
(2) Calculate the distance between each cluster center v_i and input pattern x_k; the lower and upper partition functions are then computed.
(3) Update the cluster centers V′; the interval type-1 fuzzy set [c_L, c_R] is obtained during the iterative process by the improved EKM algorithm, which is adopted to estimate both ends of the interval fuzzy set.
(4) The new cluster center V′ is obtained by defuzzification as V′ = (c_L + c_R)/2; if convergence is reached, go to the next step; otherwise, set V = V′ and return to step (2).
(5) Finally, the type reduction of the type-2 fuzzy partition matrix is set as μ_ik = (μ^L_ik + μ^R_ik)/2.
The output of the IT2FCM algorithm is an interval type-2 fuzzy set that cannot be transformed into a crisp set directly by a defuzzifier; hence, a type-reduction process is needed.
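The IT2FCM steps above can be sketched as follows. Two hedges apply: the GA center initialization is replaced by a user-supplied (or random) seed, and the EKM-based type reduction is approximated by simple averaging of the lower and upper partition matrices:

```python
import numpy as np

def it2fcm(X, c, m1=1.5, m2=2.5, n_iter=50, V0=None, seed=0):
    """Interval type-2 fuzzy c-means, a minimal sketch.

    Two fuzzifiers m1 < m2 yield lower and upper partition matrices.
    V0 (or random samples) stands in for the paper's GA initialization,
    and the average below stands in for the improved EKM type reduction.
    """
    rng = np.random.default_rng(seed)
    V = np.array(V0, float) if V0 is not None \
        else X[rng.choice(len(X), c, replace=False)].astype(float)
    for _ in range(n_iter):
        # distances of every pattern to every center
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # standard FCM membership formula, once per fuzzifier
        u1 = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m1 - 1)), axis=2)
        u2 = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m2 - 1)), axis=2)
        u_low, u_up = np.minimum(u1, u2), np.maximum(u1, u2)
        u = 0.5 * (u_low + u_up)              # crude type reduction
        w = u ** (0.5 * (m1 + m2))            # fuzzified weights for the update
        V = (w.T @ X) / np.sum(w, axis=0)[:, None]
    return V, u
```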
The aim of type reduction is to compute the centroid of a type-2 fuzzy set. At present, the iterative Karnik-Mendel (KM) algorithm and the enhanced Karnik-Mendel (EKM) algorithm can compute the centroid of an interval type-2 fuzzy set efficiently. The improved EKM is used here; it changes the initialization of the switch points and improves the search method for the switch points.
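The Karnik-Mendel iteration that the improved EKM accelerates can be sketched for one centroid endpoint as follows (discrete domain, standard switch-point rule; the EKM refinements of switch-point initialization and search are omitted):

```python
def km_centroid(z, u_low, u_up, right=False, max_iter=100):
    """Karnik-Mendel iteration for one endpoint of the centroid of an
    interval type-2 fuzzy set, following the standard algorithm.

    z           : sorted sample points of the domain
    u_low, u_up : lower/upper membership grades at those points
    right       : False -> leftmost point c_L, True -> rightmost c_R
    """
    n = len(z)
    theta = [0.5 * (u_low[i] + u_up[i]) for i in range(n)]   # initial weights
    y = sum(t * x for t, x in zip(theta, z)) / sum(theta)
    for _ in range(max_iter):
        # switch point k: points up to k take one bound, the rest the other
        k = max(i for i in range(n) if z[i] <= y) if z[0] <= y else 0
        if right:   # push the centroid right: small z's get lower grades
            theta = list(u_low[:k + 1]) + list(u_up[k + 1:])
        else:       # push the centroid left: small z's get upper grades
            theta = list(u_up[:k + 1]) + list(u_low[k + 1:])
        y_new = sum(t * x for t, x in zip(theta, z)) / sum(theta)
        if abs(y_new - y) < 1e-10:
            return y_new
        y = y_new
    return y
```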

Parameter Learning Algorithm of IT2RWFBELN
Defining a Lyapunov cost function as V(E(t)) = (1/2)E²(t), so that V̇(E(t)) = E(t)Ė(t), and using the gradient descent method, the online tuning laws for the parameters of the IT2RWFBELN are obtained, where η_ω and η_v represent the learning rates for updating the weights of the orbitofrontal cortex network and the amygdala network, respectively; η^a_c, η^a_σ, η^o_c, and η^o_σ represent the learning rates for updating the means and the variances of the type-2 wavelet MFs, respectively; and η^a_r and η^o_r represent the learning rates for updating the recurrent gains. Applying the chain rule to the derivatives of the above terms yields the explicit update equations.
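A minimal sketch of the resulting weight updates, assuming the network output y = a − o and the weighted-average defuzzification of Section 2 (the learning-rate names and values here are illustrative, not from the paper):

```python
import numpy as np

def update_weights(w_orbit, w_amyg, error, firing, eta_v=0.05, eta_w=0.05):
    """One gradient-descent step on the two weight memories.

    With y = a - o, a = sum(F * w_amyg) / sum(F), o = sum(F * w_orbit) / sum(F),
    cost V = 0.5 * error^2, and error = target - y, the chain rule gives
    dV/dw_amyg = -error * F / sum(F)  and  dV/dw_orbit = +error * F / sum(F),
    so descending the gradient moves the two channels in opposite directions.
    """
    norm_f = firing / (np.sum(firing) + 1e-12)
    w_amyg = w_amyg + eta_w * error * norm_f    # amygdala weights move toward the target
    w_orbit = w_orbit - eta_v * error * norm_f  # orbitofrontal enters with opposite sign
    return w_orbit, w_amyg
```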

Convergence Analyses
Theorem 1. Let λ_m and λ_σ be the learning rates for the FBELC parameters m_ij and σ_ij, respectively. Then, stable convergence is guaranteed if λ_m and λ_σ are chosen as in (13) and (14).

Proof. A Lyapunov function is selected as L(k) = (1/2)e²(k) (15). The change of the Lyapunov function is ΔL(k) = L(k+1) − L(k). The predicted error can be represented in terms of Δm_ij, the change of m_ij. Using (11), the error difference is obtained, and substituting (11) and (18) into (17) yields a bound on ΔL(k). Thus, if λ_m is chosen as in (13), ΔL(k) in (20) is less than 0. Therefore, the Lyapunov stability conditions L(k) > 0 and ΔL(k) < 0 are guaranteed. The proof for λ_σ can be derived in a similar manner, with λ_σ chosen as in (14). This completes the proof.
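Learning-rate bounds of the kind stated in (13) and (14) typically take the following form in Lyapunov-based proofs. This is a sketch of the standard condition, written under the assumption that the output sensitivities are bounded over the training samples; it is not a reproduction of the paper's exact bounds:

```latex
0 < \lambda_m < \frac{2}{\max_k \left( \dfrac{\partial y(k)}{\partial m_{ij}} \right)^2},
\qquad
0 < \lambda_\sigma < \frac{2}{\max_k \left( \dfrac{\partial y(k)}{\partial \sigma_{ij}} \right)^2}.
```

With rates inside these intervals, each term of ΔL(k) in (20) stays negative, which is exactly what the proof requires.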

Experiments and Validation
This paper implements the IT2RWFBELN model in Python 2.7. The initialization parameters of the IT2RWFBELN are derived from the ImageNet data set. The model computations are performed on the GPU. The experiments were run on Windows with an i7-8700 CPU clocked at 3.20 GHz, 8 GB of RAM, and a GTX 1070 GPU.
In this part, to verify the performance of the IT2RWFBELN in facial emotion recognition, two groups of experiments have been conducted on five public expression data sets (JAFFE, BU-3DFE, CASIA, SAVEE, and FAU). The first group is tested on the Chinese corpus CASIA [36], the English corpus SAVEE [37], and the FAU emotion corpus [38], in which both speaker-dependent (SD) and speaker-independent (SI) speech emotion recognition are performed; the second group is tested on JAFFE [39] and BU-3DFE [40], in which six different metrics are used for evaluation. To verify the robustness and suitability of the proposed model, several conventional methods are used for comparison.

Emotion Databases.
The five data sets include the CASIA, SAVEE, FAU, JAFFE, and BU-3DFE databases, which are described as follows. The CASIA Chinese emotion corpus includes 300 emotional short utterances covering six basic emotions: surprise, happy, sad, angry, fear, and neutral. The SAVEE data set is recorded from four male speakers and contains seven basic emotions: surprise, happy, sad, angry, fear, disgust, and neutral. The FAU database is recorded from 30 females and 21 males and contains five emotional states: angry, emphatic, positive, neutral, and rest. The JAFFE database is produced by 10 different people with 213 facial images, which include seven expressions: anger, disgust, fear, happy, sad, surprise, and neutral. The BU-3DFE multiuser facial expression database is recorded from 56 females and 44 males and includes six facial expressions: anger, disgust, fear, happiness, sadness, and surprise.

Experiments on JAFFE and BU-3DFE Databases.
The first experiments focus on facial emotion recognition with the JAFFE and BU-3DFE databases. The BU-3DFE database has six emotions: anger, disgust, fear, happiness, sadness, and surprise.
The JAFFE database has seven emotions: fear, happy, sadness, anger, disgust, surprise, and neutral. Furthermore, six different metrics, namely Chebyshev, KL distance (kldist), cosine, Canberra, Clark, and intersection, are used to validate the effectiveness of the proposed method by comparing it with other methods: Local Binary Patterns (LBP); "AN(FC1)" and "AN(FC2)," the outputs of the first two fully connected layers of a pretrained AlexNet; "AN-FT(MSE)" and "AN-FT(KL)," the fuzzy classification results obtained by AlexNet fine-tuned with the mean square error and KL divergence as loss functions, respectively; and the fuzzy rough convolutional neural network (FRCNN). Table 1 gives the experimental results for the BU-3DFE data set. The features are obtained in the type-2 fuzzy sets convolutional neural network training task, and the fuzzy classification results are obtained in the fuzzy expression recognition task based on Algorithm Adaptation k-Nearest-Neighbors classification. It can be concluded that the type-2 fuzzy convolutional neural network effectively learns relevant knowledge from fuzzy multilabels; therefore, its fuzzy classification performance is better than that of "AN(FC1)" and "AN(FC2)." As shown in Table 2, using Algorithm Adaptation k-Nearest-Neighbors as the fuzzy classification algorithm, the features extracted by the emotional learning network outperform "LBP" under all six metrics. Compared with the other features, the type-2 fuzzy wavelet emotional learning neural network achieves good fuzzy classification accuracy under the various indicators. Since the performance of the Algorithm Adaptation k-Nearest-Neighbors algorithm depends on the distinguishability of the feature space, it can be concluded that, compared with the other algorithms, the type-2 fuzzy wavelet neural network model maps the original pictures into a space better suited to distinguishing facial expressions, that is, a space in which face pictures with similar expressions lie close together.
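For reference, the six metrics named above can be computed between two predicted emotion distributions as follows. The formulas are the standard textbook definitions and may differ from the paper's exact normalization:

```python
import numpy as np

def emotion_distances(p, q, eps=1e-12):
    """The six evaluation metrics, computed between two emotion
    distributions p and q (assumed non-negative and summing to 1).
    eps guards the logarithms and divisions against zeros.
    """
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    return {
        "chebyshev":    np.max(np.abs(p - q)),                 # max coordinate gap
        "kldist":       np.sum(p * np.log(p / q)),             # KL divergence
        "cosine":       1.0 - np.dot(p, q)
                              / (np.linalg.norm(p) * np.linalg.norm(q)),
        "canberra":     np.sum(np.abs(p - q) / (p + q)),
        "clark":        np.sqrt(np.sum(((p - q) / (p + q)) ** 2)),
        "intersection": 1.0 - np.sum(np.minimum(p, q)),        # histogram overlap gap
    }
```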
To evaluate the performance of the proposed algorithm more thoroughly, Tables 3 and 4 list the accuracies of the proposed model and other algorithms on the JAFFE and BU-3DFE databases. From the results, we can see that the accuracy of our algorithm is superior to that of most of the other advanced algorithms.

Experiments on CASIA, SAVEE, and FAU Databases.
To verify the performance of the IT2RWFBELN model for speech emotion recognition, the proposed model is compared with the conventional brain emotional learning (BEL), support vector machine (SVM), extreme learning machine (ELM), genetic algorithm-brain emotional learning (GA-BEL), BELFIS, and BELBLA methods on the CASIA, SAVEE, and FAU databases; the results are presented in Tables 5-7. Table 5 shows the recognition accuracy of the BEL, SVM, ELM, GA-BEL, BELFIS, and BELBLA methods on the CASIA database, illustrating that the average accuracy of the IT2RWFBELN model is improved for both SD and SI recognition compared with the BEL and GA-BEL models. Table 6 shows the recognition accuracy of these methods on the SAVEE database, showing that the IT2RWFBELN model has higher accuracy on SD and SI than the BEL and GA-BEL models and is also superior to SVM and ELM on SD, although its accuracy is lower than that of ELM on SI in some instances. Table 7 shows the recognition accuracy of these methods on the FAU database, showing that the IT2RWFBELN model has higher accuracy on SD and SI than the BEL and GA-BEL models and is also superior to SVM and ELM on SD, while its accuracy is similar to that of ELM.
According to the above two groups of experiments, we can conclude that the proposed model achieves better recognition performance than the other models, owing to the combination of the type-2 recurrent wavelet fuzzy system and the brain emotion learning network. As a result, the proposed method is feasible for both facial and speech emotion recognition.

Conclusions
This paper introduces an applicable model for emotion recognition, which is a vital part of communication between humans and machines. The proposed model is based on the combination of an interval type-2 recurrent wavelet fuzzy system and a brain emotional learning network (IT2RWFBELN), which takes advantage of the uncertainty-handling ability of the interval type-2 recurrent wavelet fuzzy system and the low computational cost of the brain emotional learning network. There are no rules initially; the structure and parameters of the model are then tuned online simultaneously by the gradient approach and a Lyapunov function.
The system input data streams are imported directly into the neural network through the type-2 recurrent wavelet fuzzy inference system, and the results are then piped into the sensory and emotional channels, which jointly produce the final outputs of the network. To demonstrate the performance of the IT2RWFBELN model, two groups of experiments were conducted, covering facial expression recognition and speech emotion recognition. The results illustrate the effectiveness of the proposed recognition model.

Data Availability
The data used in this paper have been cited in the article.