Emotion Modeling in Speech Signals: Discrete Wavelet Transform and Machine Learning Tools for Emotion Recognition System

Speech emotion recognition (SER) is a challenging task due to the complex and subtle nature of emotions. This study proposes a novel approach for emotion modeling using speech signals by combining the discrete wavelet transform (DWT) with linear prediction coding (LPC). The performance of various classifiers, including support vector machine (SVM), K-Nearest Neighbors (KNN), Efficient Logistic Regression, Naive Bayes, Ensemble, and Neural Network, was evaluated for emotion classification using the EMO-DB dataset. Evaluation metrics such as area under the curve (AUC), average prediction accuracy, and cross-validation techniques were employed. The results indicate that the KNN and SVM classifiers exhibited high accuracy in distinguishing sadness from other emotions. Ensemble methods and Neural Networks also demonstrated strong performance in sadness classification. While the Efficient Logistic Regression and Naive Bayes classifiers showed competitive performance, they were slightly less accurate than the other classifiers. Furthermore, the proposed feature extraction method yielded the highest average accuracy, and its combination with formants or wavelet entropy further improved classification accuracy. On the other hand, Efficient Logistic Regression exhibited the lowest accuracies among the classifiers. The uniqueness of this study lies in its investigation of a combined feature extraction method, which is integrated and compared against various other combinations. The aims of the investigation include improved classifier performance, high system effectiveness, and demonstrated potential for emotion classification tasks. These findings can guide the selection of appropriate classifiers and feature extraction methods in future research and real-world applications. Further investigations can focus on refining classifiers and exploring additional feature extraction techniques to enhance emotion classification accuracy.


Introduction
The goal of emotion recognition is to understand and interpret human emotions accurately. Undoubtedly, this subject captures a great deal of attention due to its wide spectrum of uses spanning different domains. In human-computer interaction, for example, emotion recognition allows systems to adapt and respond to the emotional state of the user, which enhances the experience. Emotion recognition systems can also be of great assistance in psychiatry and psychology, and they have applications in market research, customer feedback analysis, and sentiment analysis on social media sites.
Emotion recognition typically involves three main steps: data acquisition, feature extraction, and classification. Data acquisition involves capturing emotional cues from sources such as images, audio recordings, or physiological sensors. Feature extraction transforms the acquired data into meaningful representations that capture relevant emotional information. Finally, classification algorithms assign the extracted features to specific emotional categories, such as happiness, sadness, anger, or surprise.
Emotion recognition is mostly based on machine learning algorithms such as SVM, neural networks, and clustering methods, which enable building accurate emotion-discerning models. The models are trained to learn the particular patterns in labeled datasets containing varied emotions, from which they can make predictions on new, unseen data. Overall, emotion recognition plays a vital role in understanding human behavior, enabling more sophisticated human-machine interactions, and facilitating applications that require an understanding of emotional states.
Because of its usefulness in areas like human-computer interaction, emotion-driven computing, and healthcare, speech emotion recognition (SER) has gained interest recently. The main goal of the SER task is identifying the emotional elements showcased in a person's speech pattern. Pitch, formants, and energy are examples of manually crafted characteristics that traditional SER methods extract from the speech signal; a classifier is then trained on these features. However, such hand-crafted features might not be able to convey the complex and subtle nuances in speech that indicate different emotions. Since wavelet transforms (WT) can automatically extract distinctive characteristics straight from the raw voice signal, they have shown considerable promise in speech recognition applications. With their widespread application in SER, the combination of WT and machine learning techniques has produced cutting-edge outcomes.
In this paper, we provide an enhanced SER method that integrates machine learning with the wavelet transform. The DWT has demonstrated success in SER and is a strong method for obtaining time-frequency information from speech data. We capture the intricate connections between the extracted features and the underlying emotions by combining LPC coefficients computed from the wavelet sub-band signals, classified by several machine learning models. To evaluate our proposed method, we conducted experiments using the EMO-DB dataset, a widely recognized benchmark dataset for SER.
The main significance of this study is as follows. We provide an improved SER approach that integrates machine learning methods with the wavelet transform (WT). Our proposed method is assessed using the EMO-DB dataset, yielding state-of-the-art results. We carry out ablation experiments to examine the effects of various elements within our suggested approach. The uniqueness of this study lies in its investigation of a combined feature extraction method, which is integrated and compared against various other combinations. The aims of the investigation include improved classifier performance, high system effectiveness, and demonstrated potential for emotion classification tasks.
This paper is organized as follows. An introduction and extensive survey of pertinent SER literature are provided in Section 1. Section 2 provides a thorough explanation of our suggested approach. Section 3 presents the discussion of the experimental results. Concluding remarks and suggestions for future work are presented in Section 4.
1.1. Motivation. Our motivations for researching the proposed method, which combines DWT, LPC, and machine learning classifiers for emotion recognition from speech signals, include the following: (1) Improved Accuracy. Enhancing the accuracy of emotion recognition is crucial for affective computing and human-computer interaction. Our goal is to enhance the accuracy and dependability of identifying emotional states from speech signals. The motivation behind our research stems from the need to address existing limitations in emotion recognition from speech signals. Despite the great deal of effort that has already been expended in this direction, further work is still needed, especially with respect to accuracy and feature representation. We develop a new explicit algorithm combining DWT, LPC, and machine learning classifiers to face the challenges mentioned above. Through this integration, we can model both the temporal and the spectral characteristics of the speech signal, providing a more detailed representation of emotion. By leveraging the advantages of linear prediction coding, wavelet transformation, and machine learning, our objective is to enhance the accuracy and reliability of emotion recognition, thereby advancing the field.
Given the innovation of the paper, its originality lies in the specific implementation of DWT, LPC, and machine learning algorithms for classifying audio signals to identify emotions. More recently, the use of DWT in face emotion recognition has gained momentum. Several studies employed DWT for emotion recognition from speech [3][4][5]. These studies use Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Coefficients (LPC), and wavelet-based parameters along with DWT coefficients. One of the most used classifiers for emotion classification is the SVM. Combined with other features (e.g., MFCC), the mixture increases the reliability of the results. Some studies report accuracy levels of up to 82.14% and 85% using DWT-based techniques for emotion detection. This shows that DWT may be an important tool for detecting emotional information in speech.
These areas find applications in psychology, education, and human-computer interaction [6]. Decision trees, support vector machines, and neural networks are some of the advanced machine learning approaches that have correctly recognized emotions from speech samples [7]. This process involves utilizing features like pitch and MFCCs, along with various techniques for feature extraction, selection, and classification [8]. Deep learning models such as CNNs and LSTMs have seen impressive advancements and demonstrated remarkable performance in emotion recognition; interestingly, MFCCs seem to be the most appropriate features for this purpose [9]. In the paper [10], titled "Speech emotion recognition using hybrid spectral-prosodic features of speech signal/glottal waveform, metaheuristic-based dimensionality reduction, and Gaussian elliptical basis function network classifier," the authors proposed a method that combined hybrid spectral-prosodic features extracted from the speech signal and glottal waveform with metaheuristic-based dimensionality reduction techniques and a Gaussian elliptical basis function network classifier. In the paper [11], titled "Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm," the authors presented a novel approach that leveraged a modified quantum-behaved particle swarm optimization (QPSO) algorithm for discriminative dimension reduction in speech emotion recognition.

Method
2.1. Feature Extraction Method. Several studies have applied signal processing using the discrete wavelet transform (DWT). One study applied DWT for signal noise reduction on an FPGA-based chip design platform and integrated an audio encoder on an FPGA board [1,2]. One study developed a new algorithm for DAW based on DWT, yielding better SNR and BER rates compared with other approaches [3]. Another study used DWT for the classification of heartbeats based on the static and dynamic features extracted from ECG signals; such classification was achieved with high accuracy using a Radial Basis Function Neural Network classifier [4]. DWT was also employed in compressing ECG time-series datasets, resulting in high information compression ratios and excellent signal reproduction [5].
In this paper, a combination of LPC and DWT for feature extraction is proposed. The DWT is a time-frequency analysis method that breaks the speech signal down into multiple sub-bands, each capturing specific temporal and spectral characteristics of the signal. The LPC is a speech analysis technique that represents the speech signal as a linear combination of previous samples. The LPC coefficients are estimated for each of five consecutive frames within each sub-band of the DWT decomposition.
The DWT is expressed as follows:

CA(j+1)[n] = Σ_k h[k] · CA(j)[2n − k],  (1)

CD(j+1)[n] = Σ_k g[k] · CA(j)[2n − k].  (2)

Equation (1) presents the approximation coefficients, and equation (2) presents the detail coefficients. In the above formulas, CA(j+1)[n] denotes the approximation coefficients at level j + 1, which capture the low-frequency components of the signal, and CD(j+1)[n] denotes the detail coefficients at level j + 1, which capture the high-frequency contents or details of the signal.
The coefficients h[k] and g[k] are the filter coefficients, also known as the scaling and wavelet filters, respectively. These filters are typically chosen from well-known wavelet families, such as Daubechies, Haar, or Symlets.
The summation is performed over the filter taps, indexed by k, and the coefficients CA(j)[n] represent the approximation coefficients at level j, obtained from the previous level of decomposition.
The DWT performs iterative decompositions, starting with the original signal at level j = 0 and computing the approximation and detail coefficients at each level until the desired level of decomposition is reached.
Once the signal has been decomposed using the DWT, it can be reconstructed by applying the inverse DWT (IDWT), which involves upsampling the coefficients and applying the appropriate synthesis filters [12][13][14].
The convergence property of the DWT refers to the ability of the transform to represent a signal accurately as the number of decomposition levels increases. In other words, as we perform more iterations of the DWT, the approximation and detail coefficients obtained approach the original signal.
This property is established using the energy conservation principle. The energy of a signal is the total power or magnitude of the signal over all its instantaneous values. For the DWT, energy preservation ensures that the energy of the original signal is maintained through the decomposition and reconstruction process.
The energy conservation property of the DWT can be expressed mathematically as

||x||² = ||CA_J||² + Σ_{j=1}^{J} ||CD_j||².

Here, ||x||² represents the energy of the original signal, while ||CA_J||² and ||CD_j||² represent the energy of the approximation and detail coefficients at each decomposition level. The convergence property can be observed by increasing the number of decomposition levels J: as J increases, the energy contribution of the detail coefficients decreases while the energy contribution of the approximation coefficients remains relatively high. This indicates that the approximation coefficients capture the low-frequency components of the signal, while the detail coefficients represent the high-frequency details.
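The decomposition step and the energy-conservation property can be checked numerically. The following is a minimal pure-NumPy sketch of a single level of equations (1) and (2) using the orthonormal Haar filter pair; the function names and the random test signal are illustrative stand-ins, not part of the paper's pipeline. For the two-tap Haar pair, the convolve-and-downsample step reduces to a dot product over each non-overlapping sample pair.

```python
import numpy as np

# Orthonormal Haar filters: h (scaling/low-pass), g (wavelet/high-pass)
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

def dwt_level(x):
    """One decomposition level: returns (CA, CD) at level j+1 from level j."""
    pairs = x.reshape(-1, 2)            # length of x assumed even
    return pairs @ h, pairs @ g

def idwt_level(ca, cd):
    """Inverse of dwt_level: upsample and apply the synthesis filters."""
    x = np.empty(2 * len(ca))
    x[0::2] = (ca + cd) / np.sqrt(2.0)
    x[1::2] = (ca - cd) / np.sqrt(2.0)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)           # stand-in for one speech segment
ca, cd = dwt_level(x)

# Energy conservation: ||x||^2 = ||CA||^2 + ||CD||^2 for an orthonormal pair
assert np.isclose(np.sum(x**2), np.sum(ca**2) + np.sum(cd**2))
# Perfect reconstruction through the inverse transform
assert np.allclose(idwt_level(ca, cd), x)
```

Applying `dwt_level` recursively to the approximation coefficients yields the multilevel decomposition described above.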
Convergence can be illustrated by reconstructing the signal with the inverse DWT (IDWT) from different layers of decomposition: as the number of decomposition levels changes, the reconstructed signal develops until it almost matches the original signal. The convergence behavior of the DWT can be strongly affected by the choice of wavelet functions as well as by implementation details. Different wavelet families may introduce certain trade-offs between time and frequency localization, which can impact the accuracy of the signal reconstruction.
The discrete wavelet transform is a scientifically established and stable transformation technique for signal processing. The DWT ensures boundedness and energy preservation through the careful design of the wavelet filters, whose properties guarantee its stability. The filters possess finite support, ensuring that the transformed coefficients remain bounded. Additionally, the filters exhibit good frequency localization, allowing the DWT to accurately capture signal details without excessive amplification or distortion. The decay of high-frequency coefficients further supports the stability of the DWT, indicating that the transform effectively represents the high-frequency components of the signal.
The LPC equation is given by

s_predicted(i) = Σ_{m=m_min}^{m_max} b(m) · s(i − m).  (3)

In this equation: (1) s_predicted(i) denotes the predicted sample at index i. For each sub-band, we extract five frames of the sub-band signal. For each frame, we estimate the LPC coefficients. Then, we average the LPC coefficients of the five frames to obtain a single feature vector for each sub-band. The averaged features for all subsignals are collected in one feature vector that represents the entire signal.
The advantage of this feature extraction technique is that it enables reliable and discriminative extraction of the speech signal's time-frequency and spectral characteristics simultaneously.
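As an illustration of the LPC model above, the following is a minimal NumPy sketch of the autocorrelation method with the Levinson-Durbin recursion. The `lpc` function and the synthetic second-order test signal are assumptions for demonstration only, since the paper does not specify which estimator it uses.

```python
import numpy as np

def lpc(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.
    Returns b(1..order) such that s_pred(i) = sum_m b(m) * s(i - m)."""
    n = len(frame)
    # Autocorrelation lags r[0..order]
    r = np.array([np.dot(frame[: n - m], frame[m:]) for m in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], e = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[1:i][::-1])) / e   # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]               # update previous coeffs
        a[i] = k
        e *= 1.0 - k * k                                 # prediction error energy
    return -a[1:]   # prediction coefficients b(m)

# Sanity check on a synthetic second-order autoregressive signal:
# s[n] = 0.75*s[n-1] - 0.5*s[n-2] + noise, so b should be close to (0.75, -0.5)
rng = np.random.default_rng(0)
s = np.zeros(20_000)
noise = rng.standard_normal(len(s))
for n in range(2, len(s)):
    s[n] = 0.75 * s[n - 1] - 0.5 * s[n - 2] + noise[n]
b = lpc(s, order=2)
assert np.allclose(b, [0.75, -0.5], atol=0.05)
```

In the paper's setting, the same estimator would be applied with order 12 to each of the five frames of every DWT sub-band, and the resulting coefficient vectors averaged.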

Database.
The EMO-DB database is a collection of German emotional speech recordings. It encompasses audio recordings of ten actors, each manifesting seven distinct emotional states, namely, anger, boredom, disgust, anxiety, happiness, sadness, and neutrality. Each actor recorded five repetitions of each emotion, resulting in a total of 535 utterances. The speech utterances were recorded with a high-quality microphone in a sound-proofed room. The signals were sampled at a rate of 16 kHz and saved in WAV format.

Experimental Setup.
For evaluating the proposed method, the EMO-DB dataset is used. EMO-DB is a well-known benchmark database for recognition tasks, which includes recordings of emotional speech by ten actors: five males and five females. To ensure that the emotional distribution remains consistent across sets, a 5-fold cross-validation technique is used.
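Keeping the emotional distribution consistent across folds is what stratified splitting provides. A minimal sketch using scikit-learn's `StratifiedKFold` follows; the per-class counts and the random features are illustrative stand-ins, not the exact EMO-DB distribution.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Stand-in labels: 7 emotion classes with unequal counts summing to 535
counts = [120, 80, 50, 70, 70, 65, 80]
y = np.repeat(np.arange(7), counts)
X = np.random.default_rng(0).standard_normal((len(y), 48))  # stand-in features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold preserves the per-class proportions of the full set
    classes, fold_counts = np.unique(y[test_idx], return_counts=True)
    assert len(classes) == 7                          # every emotion appears
    assert np.all(np.abs(fold_counts - np.array(counts) / 5.0) <= 1)
```

Each classifier is then trained on the four training folds and evaluated on the held-out fold, and the scores are averaged.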

Feature Extraction.
For each sub-band obtained, LPCs were extracted using five frames per sub-band with a step size of one frame. Twelve LPCs were computed for each frame. The resulting feature for each subsignal is a vector of twelve averaged coefficients. The features from all the subsignals are collected to form a single feature vector for each utterance.
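The per-utterance assembly described above can be sketched as follows. Here a simple least-squares LPC estimate stands in for the paper's estimator, and the four random sub-band signals are placeholders for the DWT outputs; `lpc_ls` and `utterance_features` are hypothetical names for illustration.

```python
import numpy as np

def lpc_ls(frame, order=12):
    """Least-squares LPC: fit b so that s(i) ~= sum_m b(m) * s(i - m)."""
    X = np.column_stack([frame[order - m : len(frame) - m]
                         for m in range(1, order + 1)])
    y = frame[order:]
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def utterance_features(subbands, order=12, n_frames=5):
    """Per sub-band: split into n_frames frames, average the frame LPCs,
    then concatenate the per-sub-band averages into one feature vector."""
    feats = []
    for sb in subbands:
        frames = np.array_split(sb, n_frames)
        feats.append(np.mean([lpc_ls(f, order) for f in frames], axis=0))
    return np.concatenate(feats)

rng = np.random.default_rng(1)
subbands = [rng.standard_normal(400) for _ in range(4)]  # stand-ins for 4 DWT sub-bands
fv = utterance_features(subbands)
assert fv.shape == (4 * 12,)   # 12 averaged LPCs per sub-band, concatenated
```

With four sub-bands and order 12, each utterance is thus represented by a 48-dimensional vector.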

Model Training and Evaluation.
To evaluate the proposed method, the SVM, KNN, Efficient Logistic Regression, Naive Bayes, Ensemble, and Neural Network classifiers are used. The classifiers are trained on the training set and tested on the testing set.
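A minimal training-and-scoring sketch with scikit-learn is shown below for two of the classifiers; the random 48-dimensional features and binary (one-vs-rest) labels are stand-ins for the extracted EMO-DB features, not the paper's data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 48))      # stand-in 48-dim feature vectors
y = rng.integers(0, 2, size=300)        # stand-in labels: target emotion vs rest
X[y == 1] += 1.0                        # shift class 1 so the classes separate

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

The same fit/score pattern extends to the remaining classifiers and to the cross-validated evaluation described above.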

True Positive Rate (TPR).
It is the ratio of true positive cases to the total number of true positive and false negative cases, calculated as TP/(TP + FN).

False Positive Rate (FPR).
It is the ratio of false positive cases to the total number of false positive and true negative cases, calculated as FP/(FP + TN).
To see how the classifier performed per class, the TPR or FNR can be examined. The TPR is the proportion of correctly classified observations per true class. The FNR is the proportion of incorrectly classified observations per true class. Figure 1 shows these summaries per true class in the last two columns on the right for sadness.
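These per-class rates can be read directly off a confusion matrix. A small NumPy sketch follows; the counts are illustrative, not taken from the paper's results.

```python
import numpy as np

# Confusion matrix: rows = true class, columns = predicted class
cm = np.array([[48, 2],
               [5, 45]])

tp = np.diag(cm)
tpr = tp / cm.sum(axis=1)                                   # TP / (TP + FN)
fnr = 1.0 - tpr                                             # FN / (TP + FN)
fpr = (cm.sum(axis=0) - tp) / (cm.sum() - cm.sum(axis=1))   # FP / (FP + TN)
```

For these counts, the per-class TPR is (0.96, 0.90) and the FNR is (0.04, 0.10), which is exactly the kind of per-true-class summary shown in the last two columns of Figure 1.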
2.6. Classification. The classification of emotions was conducted using the trained models. For each utterance, each emotion is discriminated from the remaining six emotions. A brief description of each classifier can be found in [15][16][17][18]. (5) Hyperparameters: Wide Neural Network with one fully connected layer of size 100, ReLU activation, an iteration limit of 1000, a regularization strength of 0, and standardized data.
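For reference, an approximate scikit-learn counterpart of the stated Wide Neural Network preset might look as follows. The mapping of the preset onto `MLPClassifier` is an assumption for illustration, not the authors' exact implementation; the two-cluster smoke-test data are likewise synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One fully connected hidden layer of 100 units, ReLU activation, iteration
# limit of 1000, regularization strength (alpha) of 0, standardized inputs
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                  max_iter=1000, alpha=0.0, random_state=0),
)

# Smoke test on two well-separated synthetic clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
clf.fit(X, y)
assert clf.score(X, y) > 0.95
```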

Results and Discussion
Table 1 shows that accuracy differs across the various types of emotions. The average prediction accuracy obtained from the SVM classifier was 89.82%, higher than any other model. Its classification accuracy for sadness, happiness, disgust, boredom, anxiety-fear, and anger was at least 88.59%; however, the "neutral" class was slightly lower at about 85.23%. The KNN classifier presented an overall average accuracy of 88.18%. It demonstrated classification accuracies exceeding 88% for sadness, disgust, boredom, anxiety-fear, and anger, while its accuracy for the neutral category stood at 85.42%. The Efficient Logistic Regression classifier scored an average accuracy of 87.47%. It was stable in most categories, except for happiness with an accuracy of 87.10%. The average accuracy for the Naive Bayes classifier was 86.75%; the other classifiers were more accurate than it across all emotion categories. The Ensemble classifier attained an average accuracy of 89.63%, achieving high performance in all the emotion classifications with accuracies between 87.48% and 94.76%. The Neural Network classifier exhibited an average prediction accuracy of 89.40%; it classified sadness, disgust, boredom, anxiety-fear, and anger well but had lower accuracy for the neutral category. Scrutinizing the results, it becomes apparent that certain classifiers possess a remarkable ability to accurately classify specific emotions compared to others. Notably, the SVM and KNN classifiers excelled across all measures, and the Ensemble classifier also exhibited a very high average accuracy. Nevertheless, the most appropriate classifier should be selected depending on the application's specifications and characteristics.

To establish a deeper understanding, detailed analysis and evaluation can be carried out using various cross-validation techniques and comparing performance with other machine learning classifiers applied to emotion classification tasks. Furthermore, investigating the proposed feature extraction method further, as well as refining the classifiers' parameters, might result in increased precision for certain emotion categories.
Based on the results in Table 2 from cross-validation, holdout validation, and resubstitution validation for sadness, we can observe the performance of the different classifiers: SVM, KNN, Efficient Logistic Regression, Naive Bayes, Ensemble, and Neural Network. SVM maintains high reliability across the various cross-validation folds, with cross-validation accuracies ranging from 94.20% to 95.89%, while holdout validation and resubstitution validation lie between 94.34% and 100.0%. This means that SVM is an efficient classifier because it works well in multiple validation contexts. KNN achieves comparable performance, with cross-validation results ranging from 94.76% to 95.89% and holdout/resubstitution validation results between 94.34% and 100%. KNN accuracy remains constant across the different validation methods, demonstrating its stability and reliability.
However, Efficient Logistic Regression is less accurate than the SVM and KNN algorithms. Its range is between 88.78% and 89.91% for cross-validation, and between 88.68% and 89.72% for holdout validation and resubstitution validation. Its performance is lower but still reasonable. Naive Bayes, according to cross-validation, has an average accuracy of 94.01-95.28%. It demonstrates excellent performance in multiple validation settings, demonstrating its reliability across different utterances.
Among the classification models examined during the cross-validation and holdout validation processes, Ensemble exhibits a highly competitive accuracy of 94.76% to 96.23% while maintaining consistency in performance. These results imply that ensemble methods are effective for increasing prediction accuracy when multiple classifiers are combined. With respect to cross-validation, the Neural Network also attains an accuracy of between 93.83% and 96.23%. These results show that neural networks can indeed tackle this classification problem thanks to their capacity to learn complex patterns. In summary, based on the provided results, the SVM, KNN, Naive Bayes, Ensemble, and Neural Network classifiers show competitive performance. SVM and KNN consistently achieve high accuracy, while Naive Bayes demonstrates robustness across different validation approaches. Ensemble methods and Neural Networks also perform well, indicating their potential to improve accuracy through combining multiple classifiers or learning complex patterns. Efficient Logistic Regression achieves lower accuracy compared to the other classifiers but still performs at a decent level.
Table 3 presents the classification accuracy obtained for sadness using different feature extraction methods and different classifiers, namely, SVM, KNN, Efficient Logistic Regression, Naive Bayes, Ensemble, and Neural Network. The examined feature extraction methods are the proposed method, formants, wavelet entropy, the proposed method combined with formants, and the proposed method combined with entropy.
The classifiers produced good results, with accuracies varying within the range of 88.41-95.88%. The proposed method combined with formants, employing SVM, attained the best accuracy of 95.88%. This implies that the combined use of the proposed method and formants as features provided the optimum classification performance.
Among the individual feature sets, the proposed method achieved the highest average accuracy of 93.39%. Formant features and wavelet entropy also yielded good performance, achieving average accuracies of 88.75% and 93.30%, respectively. When combining the proposed method with entropy, an average accuracy of 93.70% was achieved. The lowest average accuracy was obtained by Efficient Logistic Regression, suggesting that this particular classifier may not perform well with this dataset, regardless of the feature extraction technique used.
These results shed light on the significant role played by the features in obtaining correct classifications. The proposed method likely captures domain-specific information that allows it to consistently outperform the other methods. Moreover, combining the proposed method with either formants or entropy further enhanced the classification accuracy.
Table 4 shows the time consumption of our proposed method compared to the two reference methods (Hema et al. [19] and Ullah et al. [20]) on the EMO-DB dataset. The experiments were conducted on a computer with an Intel Core i7-11700K CPU, 32 GB of RAM, and an NVIDIA RTX 3090 GPU.
As can be seen from the table, our proposed method is generally faster than the compared methods. This is likely because our method uses a more efficient feature extraction method based on the wavelet transform.
The proposed method is compared with two published methods (Table 5): (1) MFCC-PCA-SVM [21]. The comparison of the three models for emotion recognition shows that the proposed method combined with entropy achieved the highest average accuracy of 90.24%. Its performance exceeded that of the LPC-PCA-SVM, which scored a mean accuracy of 90.07%. The MFCC + SVM model had the highest average error rate, 12.70%, of all the models applied. Regarding individual emotions, the proposed method combined with entropy provided the best results in recognizing sadness (94.40%), disgust (93.10%), and anger (91.80%); its advantage was less pronounced for the other emotions. MFCC + SVM gave the worst results across most emotions, though with relatively high performance for happiness (87.70%) and disgust (92.40%).
For further comparison of the proposed method with published works, two more methods were analyzed: (1) In the study regarding the automatic recognition of the anxiety emotional state on the EMO-DB dataset using KNN [23], the recognition rate for the emotion classes was about 70%. Our proposed method achieved an average accuracy of 90.24%; the accuracy for identifying the anxiety/fear emotion specifically was 89%, and accuracies ranging from 86% to 94.40% were achieved for differentiating sadness, neutral tones, happiness, disgust, boredom, and anger. Our proposed method therefore outperforms the published method, with results for the database of more than 80%. (2) In comparison with [24], which is based on MFCC and CNN on the same database, our method achieved an average accuracy of 90.24% over the whole database, slightly better than the 90.20% accuracy achieved in [24]. While the published method is a promising CNN-based approach, we found that our proposed method handles the database well and provides a competitive result.
The drawbacks of the proposed strategy, along with justifications for overcoming them, are as follows: Computational Complexity. The proposed strategy addresses computational complexity by leveraging efficient algorithms and optimization techniques specific to LPC and DWT. Justification. Despite the potentially higher complexity, the method incorporates measures to mitigate computational demands and improve efficiency. Sensitivity to Noise. The proposed strategy includes noise reduction and denoising techniques in conjunction with LPC and DWT to enhance robustness against noise. Justification. The approach considers noise sensitivity and uses preprocessing procedures to increase the quality of the extracted features, making it more noise resistant.
Limited Adaptability. The proposed strategy incorporates adaptive wavelet selection and feature fusion techniques, allowing it to adapt to different signal characteristics and effectively capture relevant information across diverse signal types. Justification. The method is designed to overcome this limitation by incorporating adaptability mechanisms, enhancing its applicability to various signal analysis tasks.
Lack of Comparative Evaluation. The study contains detailed comparative evaluations against cutting-edge feature extraction methods to demonstrate the superiority and efficacy of the suggested approach. Justification. This guarantees that the method is rigorously examined against known methodologies, demonstrating its performance and allowing for fair comparisons.
Future work will focus on exploring different feature extraction methods and deep learning architectures to further improve the performance of our proposed method. We will also investigate the use of transfer learning to leverage knowledge from related tasks.

Conclusion
In conclusion, the discrete wavelet transform combined with linear prediction coding was proposed for modeling emotions via speech signals. The study evaluated the performance of different classifiers for emotion classification using the EMO-DB dataset. The classifiers considered were SVM, KNN, Efficient Logistic Regression, Naive Bayes, Ensemble, and Neural Network. AUC, average prediction accuracy, and cross-validation approaches were used for evaluation.
The results demonstrated that the KNN and SVM classifiers have high discriminatory power in accurately identifying sadness among other emotions. Ensemble methods and Neural Networks also performed well in sadness classification. The Efficient Logistic Regression and Naive Bayes classifiers showed competitive performance but were slightly less accurate than the other classifiers.
The study also explored feature extraction methods and found that the proposed method yielded the highest average accuracy. Combining the proposed method with formants or wavelet entropy further improved the accuracy. Efficient Logistic Regression had the lowest accuracies among the classifiers.
The results indicate that the proposed method combined with entropy as a feature extraction technique has superior performance in accurately recognizing emotions, particularly sadness. The LPC-PCA-SVM model also performs well, though slightly less so, while the MFCC + SVM model has the lowest accuracy among the three models.
By evaluating the performance of different classifiers for emotion classification, this research aims to contribute to the field of emotion recognition. The results suggest that the KNN, SVM, Ensemble, and Neural Network classifiers effectively predict sadness, particularly when combined with the proposed feature extraction method. These findings can inform the selection of suitable classifiers and feature extraction techniques for designing emotion recognition systems. Future research could focus on improving classifier performance and exploring additional feature extraction methods to further enhance the accuracy of emotion categorization [25].
(2) b(m) denotes the LPC coefficients. (3) s(i − m) denotes the past samples of the speech signal. (4) m_min and m_max define the valid range of m values based on the order of the LPC model.

Figure 1: The classifier performance per class, summarizing TPR and FNR in the last two columns on the right, for sadness.

Figure 2: The ROC figure for the results of the proposed method for the SVM classifier (a) and KNN classifier (b).

Figure 3: The ROC figure for the results of the proposed method for the Efficient Logistic Regression classifier (a) and Naive Bayes classifier (b).

Figure 4: The ROC figure for the results of the proposed method for the Ensemble classifier (a) and Neural Network classifier (b).

Table 1: The average accuracy for each classifier across different emotion categories.

Table 2: The results of different cross-validation techniques and comparison with machine learning classifiers in emotion classification tasks.

Table 3: The proposed method, formants, wavelet entropy, and the proposed method combined with formants and with entropy, along with the average accuracy in the last row.

Table 4: The time consumption of our proposed method compared to the two reference methods (Hema et al. and Ullah et al.) on the EMO-DB dataset.

Table 5: The proposed method compared with two published methods.