An Emotion Detection System Based on Multi Least Squares Twin Support Vector Machine

Posttraumatic stress disorder (PTSD), bipolar manic disorder (BMD), obsessive compulsive disorder (OCD), depression, and suicide are major problems in civilian and military life. Change in emotion is responsible for such diseases, so it is essential to develop a robust and reliable emotion detection system that is suitable for real-world applications. Apart from healthcare, the importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. Detection of emotion in speech can be applied in a variety of situations to allocate limited human resources to clients with the highest levels of distress or need, such as in automated call centers or in a nursing home. In this paper, we use a novel multi least squares twin support vector machine classifier to detect seven different emotions: anger, happiness, sadness, anxiety, disgust, panic, and neutral. The experimental results indicate better performance of the proposed technique over other existing approaches. The results suggest that the proposed emotion detection system may be used for screening of mental status.


Introduction
Stressful situations can cause major psychiatric problems such as depression, suicide, PTSD, BMD, and OCD in civilian as well as military life. Early treatment may be useful for such psychiatric problems [1]. So, there is a need to develop technology for recognizing early changes in human behavior. Several biomarkers have been reported by medical researchers for psychiatric diseases [1,2], but these biomarkers are not practical in military life because they require large and complicated machines for detecting psychiatric diseases. On the other hand, voice, speech, and emotion detection technologies are developing rapidly in the engineering field. These technologies provide human-machine interaction for emotion detection and further treatment of psychiatric problems [3][4][5]. Several studies have measured the level of fatigue and stress from speech [6], but the level of fatigue and stress does not lead to psychiatric disorder directly. A change in a person's emotion can cause mental diseases. Clinicians mostly recognize the mental state of a patient from his/her face and voice, which reflect his/her emotion. This suggests that an emotion detection system can be used for recognizing mental disorders or diseases in humans. Early detection of disease improves the prognosis and helps to provide effective treatment at early stages. An emotion detection system can support clinicians in performing the task of emotion detection more efficiently. In automated call centers or in a nursing home, where nursing staff may not be available to assist everyone, automated emotion detection can be used to "triage" a patient. An automated emotion detection system helps to recognize whether a patient has become angry or impatient, so that staff or treatment can be provided to that patient as soon as possible.
Nowadays, emotion detection from speech is an active research area and is useful for man-machine interaction [3][4][5][6][7]. Much research has addressed automated emotion detection from facial expressions, but this task is computationally expensive and complex because it requires high-quality cameras for capturing face images. Apart from facial expressions, emotions can also be detected from speech, which has proven to be a more promising modality. Since speech is the primary mode of human communication, the detection of emotion from speech is an important aspect.

Advances in Artificial Intelligence
Machine learning algorithms such as k-nearest neighbor (kNN), artificial neural network (ANN), and support vector machine (SVM) are widely used for emotion detection due to their excellent performance [8][9][10][11][12][13]. In this paper, the proposed emotion detection system recognizes seven different emotions: anger, anxiety, disgust, happiness, sadness, panic, and neutral. Different emotions can be seen as different classes, so emotion detection requires a multiclass classifier. In this paper, we propose a novel multi least squares twin support vector machine (MLSTSVM) classifier, which is an extension of the binary least squares twin support vector machine (LSTSVM). The proposed system thus predicts the class, that is, the emotion, for a given input. In order to check the validity of the proposed classifier, we evaluated its performance on five benchmark datasets.
The paper is organized as follows. The introduction describes the need for an emotion detection system. Section 2 provides the details of our novel classifier, the multi least squares twin support vector machine. The proposed framework for emotion detection and the dataset details are discussed in Section 3. The experimental results and the conclusion of the proposed emotion detection system are presented in Sections 4 and 5, respectively.

Multi Least Squares Twin Support Vector Machine
Kumar and Gopal proposed LSTSVM for binary classification, which solves two systems of linear equations and constructs two nonparallel hyperplanes, one for each class [14]. Since real-world data often contain multiple classes and require a classifier that works well for multiple classes, in this paper we propose a novel multiclass classifier termed MLSTSVM. This classifier is an extension of the binary LSTSVM and is based on the "one-versus-rest" strategy. We selected and extended the binary LSTSVM because it shows better generalization ability and is faster than other existing approaches [14,15]. The matrix $B_k$ includes all the data points except those of the $k$th class. The MLSTSVM classifier for both linear and nonlinear cases is formulated as follows.

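Linear Case. The equation of the $k$th hyperplane is obtained as

$$x^{T} w_k + b_k = 0, \quad k = 1, \ldots, K,$$

where $w_k \in \mathbb{R}^{d}$ and $b_k \in \mathbb{R}$ represent the normal vector and the bias term, respectively. With $A_k \in \mathbb{R}^{n_k \times d}$ holding the $n_k$ data points of the $k$th class and $B_k \in \mathbb{R}^{(N - n_k) \times d}$ holding the remaining data points ($N$ is the total number of data points and $d$ the number of features), the $k$th classifier is obtained from

$$\min_{w_k, b_k, \xi_k} \; \frac{1}{2}\left\| A_k w_k + e_1 b_k \right\|^{2} + \frac{c_k}{2}\, \xi_k^{T} \xi_k \quad \text{s.t.} \quad -\left(B_k w_k + e_2 b_k\right) + \xi_k = e_2, \tag{3}$$

where $e_1 \in \mathbb{R}^{n_k}$ and $e_2 \in \mathbb{R}^{(N - n_k)}$ are vectors of ones, $c_k$ is the penalty parameter, and $\xi_k$ is the slack variable. The Lagrangian function of (3) is

$$L(w_k, b_k, \xi_k, \alpha_k) = \frac{1}{2}\left\| A_k w_k + e_1 b_k \right\|^{2} + \frac{c_k}{2}\, \xi_k^{T} \xi_k - \alpha_k^{T}\left( -\left(B_k w_k + e_2 b_k\right) + \xi_k - e_2 \right),$$

where $\alpha_k$ represents the vector of Lagrangian multipliers. The Lagrangian function is optimized by differentiating it with respect to the normal vector, bias, slack variable, and Lagrangian multiplier, which gives the following Karush-Kuhn-Tucker (KKT) conditions:

$$A_k^{T}\left(A_k w_k + e_1 b_k\right) + B_k^{T} \alpha_k = 0, \tag{5}$$

$$e_1^{T}\left(A_k w_k + e_1 b_k\right) + e_2^{T} \alpha_k = 0, \tag{6}$$

$$c_k \xi_k - \alpha_k = 0, \tag{7}$$

$$-\left(B_k w_k + e_2 b_k\right) + \xi_k - e_2 = 0.$$

By combining (5) and (6), the following equation is obtained:

$$\begin{bmatrix} A_k^{T} \\ e_1^{T} \end{bmatrix} \begin{bmatrix} A_k & e_1 \end{bmatrix} \begin{bmatrix} w_k \\ b_k \end{bmatrix} + \begin{bmatrix} B_k^{T} \\ e_2^{T} \end{bmatrix} \alpha_k = 0. \tag{8}$$

Consider $E_k = [A_k \;\; e_1]$, $F_k = [B_k \;\; e_2]$, and $u_k = [w_k^{T} \;\; b_k]^{T}$. After putting these values in (8), it may be reformulated as

$$E_k^{T} E_k u_k + F_k^{T} \alpha_k = 0, \quad \text{that is,} \quad u_k = -\left(E_k^{T} E_k\right)^{-1} F_k^{T} \alpha_k.$$

The above equation requires the inverse of $E_k^{T} E_k$. Sometimes this matrix is singular or ill-conditioned, so that its inverse cannot be obtained. This situation may be avoided by adding a regularization term $\varepsilon I$ to the matrix, and the above equation is reformulated as

$$u_k = -\left(E_k^{T} E_k + \varepsilon I\right)^{-1} F_k^{T} \alpha_k, \tag{10}$$

where $\varepsilon$ is a very small positive number and $I$ is an identity matrix of suitable size. The Lagrangian multiplier is obtained from (7) and the equality constraint as

$$\alpha_k = c_k \xi_k = c_k \left(F_k u_k + e_2\right).$$

After substituting the value of $\alpha_k$ in (10), we obtain the normal vector and bias for the $k$th classifier as follows:

$$\begin{bmatrix} w_k \\ b_k \end{bmatrix} = -\left(F_k^{T} F_k + \frac{1}{c_k}\left(E_k^{T} E_k + \varepsilon I\right)\right)^{-1} F_k^{T} e_2. \tag{13}$$

For a new data point or test data sample $x$, the perpendicular distance from each hyperplane is measured, and the sample is assigned to the class whose hyperplane lies at minimum distance from it:

$$\text{class}(x) = \arg\min_{k = 1, \ldots, K} \frac{\left| x^{T} w_k + b_k \right|}{\left\| w_k \right\|}. \tag{14}$$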
(1) For $k = 1$ to $K$, where $K$ is the total number of classes,
(i) obtain the two matrices $A_k \in \mathbb{R}^{n_k \times d}$ and $B_k \in \mathbb{R}^{(N - n_k) \times d}$, where $A_k$ and $B_k$ denote the data points of the $k$th class and of the rest of the classes, respectively;
(ii) use a validation process to obtain the penalty parameters;
(iii) calculate the weight and bias for each class by using (13).
(2) Obtain the decision function by using (14). Use this function to assign a class to new data points.
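The linear training and classification steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB implementation: it assumes the standard least squares twin SVM closed form, a single penalty parameter `c` shared by all classes, and a small ridge term `eps`. The function names `fit_linear_mlstsvm` and `predict_linear_mlstsvm` are hypothetical.

```python
import numpy as np


def fit_linear_mlstsvm(X, y, c=1.0, eps=1e-6):
    """One-versus-rest linear MLSTSVM sketch: one hyperplane (w_k, b_k) per class.

    Assumption: the same penalty c is used for every class; eps is a small
    ridge term that keeps the linear system well conditioned.
    """
    classes = np.unique(y)
    W, b = [], []
    for k in classes:
        A = X[y == k]                                   # points of the k-th class
        B = X[y != k]                                   # points of all other classes
        E = np.hstack([A, np.ones((A.shape[0], 1))])    # E_k = [A_k  e_1]
        F = np.hstack([B, np.ones((B.shape[0], 1))])    # F_k = [B_k  e_2]
        e2 = np.ones(B.shape[0])
        # u_k = -(F^T F + (1/c)(E^T E + eps I))^{-1} F^T e_2
        M = F.T @ F + (E.T @ E + eps * np.eye(E.shape[1])) / c
        u = -np.linalg.solve(M, F.T @ e2)
        W.append(u[:-1])
        b.append(u[-1])
    return classes, np.array(W), np.array(b)


def predict_linear_mlstsvm(model, X):
    """Assign each row of X to the class whose hyperplane is nearest."""
    classes, W, b = model
    dist = np.abs(X @ W.T + b) / np.linalg.norm(W, axis=1)
    return classes[np.argmin(dist, axis=1)]
```

In practice the penalty parameter would be chosen per class by validation, as step (ii) describes; the fixed default here is only for brevity.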

Nonlinear Case.
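Most real datasets are nonlinear in nature; that is, the classes are separable only by nonlinear class boundaries. A classifier should therefore work well for both linearly and nonlinearly separable data points. In this section, we propose the formulation of the MLSTSVM classifier for nonlinear cases. First, kernel functions are used to map the input data points into a higher-dimensional feature space, and then the data points are classified by constructing nonlinear (kernel) surfaces in this space. In the higher-dimensional space, the equation of the $k$th kernel surface for any kernel function is obtained as

$$\mathrm{Ker}\left(x^{T}, D^{T}\right) \mu_k + \gamma_k = 0, \quad k = 1, \ldots, K,$$

where $D = [A_k^{T} \;\; B_k^{T}]^{T}$ and $\mathrm{Ker}$ is any suitable kernel function.
The optimization problem of MLSTSVM for nonlinear cases is formulated as

$$\min_{\mu_k, \gamma_k, \xi_k} \; \frac{1}{2}\left\| \mathrm{Ker}\left(A_k, D^{T}\right) \mu_k + e_1 \gamma_k \right\|^{2} + \frac{c_k}{2}\, \xi_k^{T} \xi_k \quad \text{s.t.} \quad -\left(\mathrm{Ker}\left(B_k, D^{T}\right) \mu_k + e_2 \gamma_k\right) + \xi_k = e_2.$$

The Lagrangian function of the above-mentioned problem is

$$L(\mu_k, \gamma_k, \xi_k, \alpha_k) = \frac{1}{2}\left\| \mathrm{Ker}\left(A_k, D^{T}\right) \mu_k + e_1 \gamma_k \right\|^{2} + \frac{c_k}{2}\, \xi_k^{T} \xi_k - \alpha_k^{T}\left( -\left(\mathrm{Ker}\left(B_k, D^{T}\right) \mu_k + e_2 \gamma_k\right) + \xi_k - e_2 \right).$$

The KKT conditions for nonlinear MLSTSVM are

$$\mathrm{Ker}\left(A_k, D^{T}\right)^{T}\left(\mathrm{Ker}\left(A_k, D^{T}\right) \mu_k + e_1 \gamma_k\right) + \mathrm{Ker}\left(B_k, D^{T}\right)^{T} \alpha_k = 0, \tag{19}$$

$$e_1^{T}\left(\mathrm{Ker}\left(A_k, D^{T}\right) \mu_k + e_1 \gamma_k\right) + e_2^{T} \alpha_k = 0, \tag{20}$$

$$c_k \xi_k - \alpha_k = 0, \tag{21}$$

$$-\left(\mathrm{Ker}\left(B_k, D^{T}\right) \mu_k + e_2 \gamma_k\right) + \xi_k - e_2 = 0. \tag{22}$$

Combining (19) and (20), we get

$$\begin{bmatrix} \mathrm{Ker}\left(A_k, D^{T}\right)^{T} \\ e_1^{T} \end{bmatrix} \begin{bmatrix} \mathrm{Ker}\left(A_k, D^{T}\right) & e_1 \end{bmatrix} \begin{bmatrix} \mu_k \\ \gamma_k \end{bmatrix} + \begin{bmatrix} \mathrm{Ker}\left(B_k, D^{T}\right)^{T} \\ e_2^{T} \end{bmatrix} \alpha_k = 0.$$

Let $G_k = [\mathrm{Ker}(A_k, D^{T}) \;\; e_1]$ and $H_k = [\mathrm{Ker}(B_k, D^{T}) \;\; e_2]$.
Then the combined equation can be rewritten as

$$G_k^{T} G_k \begin{bmatrix} \mu_k \\ \gamma_k \end{bmatrix} + H_k^{T} \alpha_k = 0.$$

The value of the normal vector and bias is obtained by solving this together with (21) and (22) as

$$\begin{bmatrix} \mu_k \\ \gamma_k \end{bmatrix} = -\left(H_k^{T} H_k + \frac{1}{c_k}\, G_k^{T} G_k\right)^{-1} H_k^{T} e_2. \tag{24}$$

The values of weight and bias are used to construct the kernel surface for each class. For a new data point, the perpendicular distance from each nonlinear surface is measured, and the point is assigned to the class whose surface lies at minimum distance from it. The decision function for nonlinear MLSTSVM is obtained as

$$\text{class}(x) = \arg\min_{k = 1, \ldots, K} \frac{\left| \mathrm{Ker}\left(x^{T}, D^{T}\right) \mu_k + \gamma_k \right|}{\left\| \mu_k \right\|}. \tag{25}$$

(1) Choose a kernel function $\mathrm{Ker}$.
(2) For $k = 1$ to $K$, where $K$ is the total number of classes,
(i) obtain the two matrices $A_k \in \mathbb{R}^{n_k \times d}$ and $B_k \in \mathbb{R}^{(N - n_k) \times d}$, where $A_k$ and $B_k$ denote the data points of the $k$th class and of the rest of the classes, respectively;
(ii) use a validation process to obtain the penalty parameters;
(iii) calculate the weight and bias for each class by using (24).
(3) Obtain the decision function by using (25) and assign a class to new data points by using this decision function.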
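The nonlinear procedure can be sketched in the same way. The code below is an illustrative assumption rather than the authors' implementation: it uses a Gaussian kernel with width `sigma`, takes $D$ to be the full training matrix in its original row order (the same set of points as stacking $A_k$ over $B_k$), adds a small ridge term `eps` for numerical stability, and normalizes the surface value by $\|\mu_k\|$ by analogy with the linear case. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, D, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||x_i - d_j||^2 / (2 sigma^2))."""
    sq = (X ** 2).sum(axis=1)[:, None] + (D ** 2).sum(axis=1)[None, :] - 2.0 * X @ D.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def fit_kernel_mlstsvm(X, y, c=1.0, sigma=1.0, eps=1e-6):
    """One-versus-rest kernel MLSTSVM sketch: one vector [mu_k; gamma_k] per class."""
    classes = np.unique(y)
    U = []
    for k in classes:
        KA = gaussian_kernel(X[y == k], X, sigma)          # Ker(A_k, D^T)
        KB = gaussian_kernel(X[y != k], X, sigma)          # Ker(B_k, D^T)
        G = np.hstack([KA, np.ones((KA.shape[0], 1))])     # G_k = [Ker(A_k, D^T)  e_1]
        H = np.hstack([KB, np.ones((KB.shape[0], 1))])     # H_k = [Ker(B_k, D^T)  e_2]
        e2 = np.ones(H.shape[0])
        # [mu_k; gamma_k] = -(H^T H + (1/c) G^T G + eps I)^{-1} H^T e_2
        # (eps I is an added assumption; the system is singular without it)
        M = H.T @ H + (G.T @ G) / c + eps * np.eye(G.shape[1])
        U.append(-np.linalg.solve(M, H.T @ e2))
    return classes, X, np.array(U), sigma


def predict_kernel_mlstsvm(model, X_new):
    """Assign each row of X_new to the class with the nearest kernel surface."""
    classes, D, U, sigma = model
    K = gaussian_kernel(X_new, D, sigma)
    f = K @ U[:, :-1].T + U[:, -1]                         # surface values, one column per class
    dist = np.abs(f) / np.linalg.norm(U[:, :-1], axis=1)
    return classes[np.argmin(dist, axis=1)]
```

The kernel width `sigma` and penalty `c` would normally be tuned by validation; fixed values are used here only to keep the sketch short.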
In order to assess the validity of the proposed MLSTSVM, we performed experiments on five benchmark datasets. All the datasets are taken from the UCI machine learning repository [16]. Table 1 shows the accuracy comparison of the proposed MLSTSVM classifier with other existing classifiers. Accuracy refers to the correct classification rate and is calculated by averaging the testing accuracies. The table shows that the proposed classifier achieved better accuracy on the Wine, Glass, Vehicle, and Teaching Evaluation datasets than kNN, ANN, and multi-SVM, while on the Iris dataset MLSTSVM obtained 97.75% accuracy, which is better than ANN and multi-SVM and comparable with kNN. The experiment is performed using the 10-fold cross validation method.

Description of Dataset and Proposed Model
The PRAAT tool generates a voice report from audio files and converts the audio files into text files. The voice report of these audio files contains different features of the voice such as pitch, intensity, shimmer, and jitter. Pitch, also known as the vibration rate of the vocal folds, is one of the most important and essential parts of the human voice. The sound of the voice varies according to the vibration rate. A high pitch corresponds to a high vibration rate and a higher-sounding voice, while a low pitch corresponds to a lower sound. The vibration rate depends on the duration and thickness of the vocal cords. Relaxation and tightening of the muscles around the vocal cords also affect the vibration rate. The emotion or mood of a person also has an effect on his/her pitch. During excitement or fright, the muscles put strain on the vocal cords, which produces a high-pitched voice. The tone of a person describes the way a statement is presented and can convey the emotion, psychological arousal, and mood of that person. Usually, a softer pitch and tone are seen as nonaggressive and indicate friendly behavior. Jitter and shimmer are other important attributes of a voice. Jitter and shimmer measure the irregularity percentage in the pitch and in the amplitude of the vocal note, respectively. Voice quality and signal-to-noise ratio can be estimated from harmonicity.
The second part includes selection of significant features from the voice report generated by PRAAT and classification of emotions using MLSTSVM. Feature selection (FS) is the process of selecting relevant and important attributes from a dataset and plays a significant role in the construction of a classification system [19][20][21]. FS is also termed attribute selection; it reduces the number of input attributes by selecting only the attributes that are important for a classifier in order to enhance its performance. In this paper, we used the combination of F-score and sequential forward selection approaches for feature selection.
F-Score. It is a significant FS approach and is widely used in machine learning. The F-score was initially defined to measure the discrimination between two sets of real numbers. Later, it was extended to measure the discrimination between more than two sets of real numbers [19][20][21]. Let the dataset contain $K$ classes and let each class contain $n_k$ data points, where $k = 1, 2, \ldots, K$. The F-score of the $j$th feature is obtained as [21,22]

$$F(j) = \frac{\sum_{k=1}^{K} \left( \bar{x}_j^{(k)} - \bar{x}_j \right)^{2}}{\sum_{k=1}^{K} \frac{1}{n_k - 1} \sum_{i=1}^{n_k} \left( x_{i,j}^{(k)} - \bar{x}_j^{(k)} \right)^{2}},$$

where $\bar{x}_j$ is the average of the $j$th feature over the whole dataset, $\bar{x}_j^{(k)}$ is the average of the $j$th feature over the $k$th class, and $x_{i,j}^{(k)}$ is the $j$th feature of the $i$th instance in the $k$th class. The F-score of a feature represents its discriminative ability; a larger value indicates that the corresponding feature separates the classes better.
Sequential Forward Selection. It works in a bottom-up fashion and begins with an empty set of features. It starts by adding the single best feature to the empty set. At each subsequent step, the best of the remaining original features is added to the previous feature set.
The features obtained from feature selection are given to the proposed classifier, and the usefulness of feature selection is assessed by comparing against the same classifier trained without feature selection. MLSTSVM is described in detail in Section 2.
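The feature selection step can be illustrated with a short sketch. It computes the multiclass F-score of every feature and then builds nested feature subsets by adding one feature at a time in decreasing F-score order. Using the F-score rank as the "best remaining feature" criterion for forward selection is an assumption made for this example, and the function names are hypothetical. Constant features would give a zero denominator and should be removed beforehand.

```python
import numpy as np


def f_score(X, y):
    """Multiclass F-score of each column of X given class labels y."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for k in classes:
        Xk = X[y == k]
        between += (Xk.mean(axis=0) - overall_mean) ** 2   # class mean vs overall mean
        within += Xk.var(axis=0, ddof=1)                    # 1/(n_k - 1) * sum of squares
    return between / within


def ranked_feature_subsets(scores):
    """Nested subsets: the m-th subset holds the m highest-scoring feature indices."""
    order = np.argsort(-scores, kind="stable")
    return [order[: m + 1].tolist() for m in range(len(order))]
```

Each subset returned by `ranked_feature_subsets` corresponds to one candidate model; a classifier is trained on every subset and the one with the best validation accuracy is kept.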

Proposed Framework.
Figure 2 shows the proposed framework for emotion detection from audio files using the novel MLSTSVM classifier. The proposed system consists of the following steps.
Step 1. Record voice samples of different emotions in audio format.
Step 2. Convert the audio files into text files using the PRAAT scripting tool.
Step 3. Generate a database containing different emotions.
Step 5. Partition the dataset into training and testing datasets by using k-fold cross validation.
Step 6. Select the relevant features with the help of the F-score and sequential forward selection approaches.
Step 7. Train and test the model with selected features and evaluate its performance with different features.
Step 8. Select the model with highest classification accuracy.

Experimental Results and Discussion
The benchmark and real data exist in the form of audio files. The experiment is performed on a time span of 2 seconds for each audio file; the tool Power Audio Cutter is used to cut the audio files to the required duration. The features of the audio files are extracted using the PRAAT scripting tool. Figure 3 shows the selection of an audio file for running the PRAAT script in order to extract features from it. The voice report of an audio file generated by PRAAT is shown in Figure 4. Table 2 shows the speech features extracted using PRAAT. PRAAT extracted six major attributes and 24 subattributes from the audio files.
The emotion detection benchmark dataset used in this research work contains 290 voice recordings with 24 features and 7 class labels; the details are given in Table 3. Figure 5 shows a snapshot of the emotion detection dataset. In this snapshot, 35 instances, 5 of each emotion, have been taken to give a complete view of the range of the various attributes for each class. The first attribute of the snapshot denotes the emotion. Here, "1," "2," "3," "4," "5," "6," and "7" are used for anger, anxiety, disgust, happiness, neutral, panic, and sadness, respectively.
Since the ranges of the attributes differ from each other, each attribute value is normalized to bring it within the specified range. Two feature selection techniques, F-score and SFS, are used for selecting significant features. Table 4 shows the average F-score of each attribute or feature using 10-fold cross validation. After calculating the F-score of each attribute, SFS is used for obtaining 24 feature subsets or models. The features ranked by F-score from high to low are 1, 2, 4, 14, 6, 7, 5, 3, 10, 12, 8, 17, 23, 20, 19, 11, 9, 13, 16, 15, 18, 22, 21, and 24. Table 5 shows the twenty-four feature subsets or models obtained on the basis of SFS. For each feature subset, an MLSTSVM classifier is constructed and its predictive accuracy is checked using the 10-fold cross validation method. The proposed MLSTSVM classifier is implemented using MATLAB R2012a.
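The preprocessing and model-generation steps described above can be sketched as follows. The feature ranking is copied from the text; everything else is an assumption made for illustration: normalization is taken to be min-max scaling to [0, 1] (the text does not state the target range), folds are drawn from a seeded random permutation, and the function names are hypothetical.

```python
import numpy as np

# Feature ranking reported in the text (1-based feature numbers, highest F-score first).
RANKING = [1, 2, 4, 14, 6, 7, 5, 3, 10, 12, 8, 17,
           23, 20, 19, 11, 9, 13, 16, 15, 18, 22, 21, 24]


def min_max_normalize(X):
    """Scale every column to [0, 1]; constant columns are mapped to 0 (assumed range)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def nested_models(ranking):
    """Model m uses the m top-ranked features, giving len(ranking) candidate models."""
    return [ranking[:m] for m in range(1, len(ranking) + 1)]


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index arrays for k-fold cross validation over n samples."""
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

Each model in `nested_models(RANKING)` would be evaluated by training a classifier on the `train` indices of every fold, restricted to that model's feature columns (after subtracting 1 for 0-based indexing), and averaging the accuracy on the `test` indices.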

Conclusion
The proposed emotion detection system can be used in automated call centres or in a nursing home where resources or nursing staff may not be available to aid everyone. An automated emotion detection system can be useful to identify changes in a patient's emotion and to trigger an alarm accordingly, so that effective treatment or facilities can be provided to patients as soon as possible. This system can assist clinicians in performing the task of emotion detection more efficiently. The proposed emotion detection system may serve as an important tool because change in emotion is responsible for several diseases such as PTSD, BMD, OCD, depression, and suicide. In this paper, emotions are detected by using a novel classifier, named MLSTSVM, and its performance is validated on five benchmark datasets. The PRAAT scripting tool is used for feature extraction and extracted 24 features from the voice recordings. The combination of F-score and SFS is used for selecting important features from the emotion detection dataset. It is found that the MLSTSVM classifier based emotion detection system with sixteen features achieved the best predictive accuracy: 87.28% for linear MLSTSVM, 92.89% for Gaussian MLSTSVM, and 88.87% for polynomial MLSTSVM. The performance of the proposed system is compared with kNN, ANN, and multi-SVM approaches. Experimental results indicate that emotion detection based on our proposed classifier performs well compared to the other existing approaches. The results of the proposed classifier are also verified on a real dataset containing the voices of 10 persons with different emotions, on which Gaussian MLSTSVM obtained 86.18% accuracy. The whole system may be adopted and extended as an intelligent personal assistant application for helping disabled people, autistic children, psychiatric patients, and elderly people. Apart from healthcare, automatic recognition of emotions from human speech, as achieved in the proposed system, may also be used as a part of human-computer interaction applications such as robotics, games, and intelligent tutoring systems. We have developed the emotion recognition system using MATLAB on the Windows operating system. The system has certain limitations; for example, it does not deal with background noise and is trained for male voices only. Hence, several enhancements are possible in the future. Better performance might be obtained by optimizing the values of certain parameters, such as sigma (for the Gaussian kernel function) and the cost parameters, by using a genetic algorithm, particle swarm optimization, or another optimization approach.

Figure 3: Selecting audio file and running PRAAT script for producing features.

Figure 4: Voice report of an audio file.

Figure 5: Snapshot of the dataset.

Figure 6: Accuracy comparison of various classifiers for different emotions.
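For a $K$-class problem, MLSTSVM constructs $K$ binary LSTSVM classifiers, each of which separates one class from the rest of the classes. The $k$th LSTSVM classifier treats the data points of the $k$th class as positive data points and the data points of the other classes as negative data points. The data points of the $k$th class are indicated by the matrix $A_k \in \mathbb{R}^{n_k \times d}$, where $n_k$ represents the number of data points in the $k$th class.
$w_k \in \mathbb{R}^{d}$ and $b_k \in \mathbb{R}$ represent the normal vector and bias term, respectively, in real space $\mathbb{R}^{d}$. The $k$th classifier is obtained from

$$\min_{w_k, b_k, \xi_k} \; \frac{1}{2}\left\| A_k w_k + e_1 b_k \right\|^{2} + \frac{c_k}{2}\, \xi_k^{T} \xi_k \quad \text{s.t.} \quad -\left(B_k w_k + e_2 b_k\right) + \xi_k = e_2, \tag{3}$$

where $e_1 \in \mathbb{R}^{n_k}$ and $e_2 \in \mathbb{R}^{(N - n_k)}$ denote vectors of ones ($N$ is the total number of data points), and $c_k$ and $\xi_k$ represent the penalty parameter and the slack variable, respectively. The first term of (3) denotes the sum of squared distances of the data points of the $k$th class from the hyperplane. Minimizing this term keeps the hyperplane close to the $k$th class. The second term of (3) minimizes the misclassification error of the data points of the remaining $K - 1$ classes. In this way, the hyperplane is kept close to the data points of the $k$th class and as far as possible from the data points of the other classes. The Lagrangian function of (3) is

$$L(w_k, b_k, \xi_k, \alpha_k) = \frac{1}{2}\left\| A_k w_k + e_1 b_k \right\|^{2} + \frac{c_k}{2}\, \xi_k^{T} \xi_k - \alpha_k^{T}\left( -\left(B_k w_k + e_2 b_k\right) + \xi_k - e_2 \right),$$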

Table 2: Extracted speech features after using PRAAT.

Table 4: Average value of F-score using 10-fold cross validation.

Table 5: Twenty-four feature subsets based on SFS.

Table 6: Average accuracy of MLSTSVM classifier on different models.

Table 8: Comparison of emotion detection of various classifiers.

Table 9: Performance comparison on real dataset.