Research on Intelligent Recognition and Classification Algorithm of Music Emotion in Complex System of Music Performance

In the complex system of music performance, listeners differ in how they perceive and express musical emotions, so it is of great significance to study the classification of different emotions under different audio signals. This paper proposes research on the intelligent recognition and classification of human emotion in the complex system of music performance. Through recognition with SVM, KNN, ANN, and ID3 classifiers, the accuracy of each single classifier is compared; the four classifiers are then combined to compare the classification accuracy of audio signals before and after preprocessing. The results show that the fusion of SVM and ANN achieves the highest accuracy. Finally, recall and F1 are comprehensively compared among the fusion algorithms, and the fused SVM and ANN classifier outperforms the other fused models.


Introduction
Music performance is a comprehensive expression of musical features, behavior, emotion, and other factors, and it is also an important way to express human emotions. Because music emotion carries many personal, experience-dependent expressions of feeling, music emotions that are deeply rooted in people's hearts can strengthen how human beings control, suppress, and encourage their own emotions through music. In human emotional expression, classifying music emotions makes it possible to distinguish how different people express emotion in music performance. Therefore, intelligently classifying human emotions from music performance is of great research significance. To distinguish music classification in the music performance system more effectively, music emotion recognition uses a computer to recognize the emotion contained in music itself; that is, the basic characteristic information of music is extracted and the emotional information is classified through a preset recognition mode. It is a relatively cutting-edge artificial intelligence technology. Music visualization is a design activity that aims at accurately and quickly conveying music content information and effectively realizing the value of music information: music content is transformed into visual elements, rearranged and combined, and finally presented in a visual way. The combination of music emotion recognition and visualization can help people reduce the cognitive burden of music understanding, thus promoting the application and development of music in fields such as work analysis, assisted teaching, visual retrieval, game entertainment, and stage performance. According to the existing research, scholars at home and abroad have made some achievements in music emotion recognition and music visualization and have put forward feasible models and solutions.
However, looking at the field as a whole, existing research still focuses on the technical level, while theoretical exploration remains relatively lacking. Although it is necessary to study music emotion and its visualization technology, it is particularly valuable to conduct interdisciplinary research grounded in information science that combines musicology, psychology, mathematics, and computer science. This paper starts from the perspective of information and carries out some innovative research on the recognition and visualization of music emotional information.
In the emotional analysis of music, different data forms must be digitized for further processing. Some scholars classify emotions by decomposing audio signals such as music rhythm and timbre, extracting acoustic features, and applying relevant algorithms. In existing music performance recommendation systems, human emotions are not fully considered in classification research. The traditional music performance system is basically based on the user's playing history and playlists, or targets characteristics of music performance such as sound quality and rhythm. Computing the distance between different performed pieces, using basic information such as genre, singer, lyrics, emotional characteristics, and beat to distinguish the emotional distance between different pieces, and classifying these characteristics are the key work of this paper. Literature [1] addresses some limitations of music performance recommendation methods; for example, a user who only listens to rock music rarely receives recommendations for hip hop or R&B with similar emotions. For the classification of songs in music performance (popular and unpopular), it proposes to assign weights to the emotional characteristics of each performance based on a factor analysis of musical factors, so as to quantitatively analyze emotion in music performance. Literature [2] studies classification and recognition algorithms for music ranking in the music performance system, analyzing posts and comments from Facebook together with the corresponding music emotions from human emotion data. The music emotions in the performance system are ranked, and the experimental results show that the accuracy based on music emotion analysis reaches 80%, which helps improve the competitive quality of music works.
In Literature [3], aiming at the problem of music emotion expression in the Internet-based music performance system, a fusion study of a music emotion recommendation algorithm based on user personalization and network tags is proposed. Feature users are used to measure the similarity of personalized tendencies in music emotion, and the music emotion recognition recommendation method is improved; similarity-based personalized recommendation is realized from three aspects: emotion acquisition, analysis, and aggregation. Literature [4] analyzed the emotional influence of music performance on human beings. Music emotion expresses human happiness, hope, and other feelings, and expressing emotion through music is a complex problem for intelligent recognition. To capture this relationship better, it proposed to extract features (for example, with random k-label sets and the multi-label k-nearest neighbor algorithm) and classify human emotions against different pieces in music expression. Literature [5] proposed a music emotion label prediction algorithm based on a music emotion vector space model to address the unsatisfactory prediction of music emotion labels in music performance: the emotion vectors in the music performance system are first extracted and the corresponding space model is established, and the vectors are then classified and recognized by SVM. The results show that the music emotion classification method based on the vector space model has clear advantages in recognition. Since music melody comes from human auditory perception, Paiva et al. [6] proposed a pitch tracking method based on an auditory model, computing the auditory spectrogram of the music signal and the pitch salience from the correlogram to obtain melody pitch candidates; after quantizing the pitch candidates into music notes, the minimum pitch interval is applied to them. Xia et al. [7] used a continuous emotional psychological model and a regression prediction model to generate robot dance movements driven by constructed emotional changes and music beat sequences. Schmidt et al. [8] chose a continuous emotion model, linked the music emotion content with an acoustic feature model, and established a regression model to study how music emotion changes over time. Sordo et al. [9] studied a variety of acoustic features, including low-level features as well as melody, tonality, and high-level genre and style features, then reduced these features to a D-dimensional space, linked them with semantic features, and used the k-nearest neighbor algorithm for automatic recognition of music emotions. Anders [10] invited 20 music experts to express the different emotions of happiness, sadness, fear, and calm by controlling numerical combinations of 7 feature quantities such as rhythm and timbre, obtaining the relationship between the feature quantities and music emotions. Shan et al. [11] established a recommendation model based on music emotion, mainly studying the emotions conveyed by movie music. Chen and Li [12] used a continuous emotional psychological model and a regression model to predict the emotional value of music and used two fuzzy classifiers to measure emotional intensity to identify the emotional content of music. Sarkar et al. [13] proposed using a convolutional neural network to identify music models and compared it with commonly used classifiers such as the BP neural network. Rao and Rao [13] proposed a two-way mismatch (TWM) pitch salience calculation method, based on the fact that the harmonic amplitudes of singing voices decay more slowly than those of musical instruments, and used a dynamic programming (DP) algorithm to track vocal melodies. Huang et al. [14] proposed a multimodal deep learning method that uses a double convolutional neural network to classify emotions.
A deep Boltzmann machine model is used to reveal the correlation between audio and lyrics. In the second part of this paper, the emotion classification methods in the music performance system are introduced, especially the related physical characteristics and music signal processing. In the third part, the emotion recognition algorithm is analyzed, eight kinds of emotion expression are put forward, and then SVM, KNN, and other models in the intelligent algorithm are explained. In the fourth part, the intelligent recognition method proposed above is applied to music emotion recognition; the experimental results show that the proposed method achieves a good recognition rate and accuracy.

Emotion Classification Method in Music Performance System
Emotion in music performance is based on the resonance of sound in musical instrument performance. The sounds emitted by different instruments produce different emotional expressions in the audience, and these can be classified intelligently according to the different emotions.

Physical Properties of Music.
In music performance, several instruments are usually played at the same time.
Different musical instruments produce different timbres, pitches, and other characteristics. To better study the basic nature of sound, we first consider the sound production principle of stringed instruments. Let the string tension be T0 (reflecting the intensity of the sound), the linear density be ρ, and the length be L; the string is divided into N + 1 segments by N separation points. The natural frequencies of the string can then be described as

f_n = (n / 2L) · sqrt(T0 / ρ), n = 1, 2, 3, . . .

When n = 1, f_1 is the fundamental frequency, representing the pitch of the performed music. When n takes 2, 3, 4, . . ., the components are called the n-th harmonics in the physical characteristics of music, which have an important influence on timbre in music performance. The length, density, and tension of the strings have an obvious influence on frequency: if the length of the string changes, the pitch also changes.
The tightness of the string likewise determines the natural frequency: a tighter string gives a higher pitch, and vice versa.
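The relationships above can be checked numerically. Below is a minimal pure-Python sketch of the standard ideal-string formula f_n = (n / 2L) · sqrt(T0 / ρ); the function name and the sample tension, density, and length values are illustrative, not taken from the paper.

```python
import math

def string_frequency(n, tension, linear_density, length):
    # n-th natural frequency of an ideal string; n = 1 gives the fundamental pitch
    return (n / (2.0 * length)) * math.sqrt(tension / linear_density)

# Illustrative values: T0 = 100 N, rho = 0.01 kg/m, L = 0.65 m
f1 = string_frequency(1, tension=100.0, linear_density=0.01, length=0.65)
# Halving the string length doubles the fundamental frequency (pitch rises an octave)
f2 = string_frequency(1, tension=100.0, linear_density=0.01, length=0.325)
```

The same function also shows the harmonic series: the n-th mode is exactly n times the fundamental, which is what shapes the timbre described above.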

Music Feature Extraction.
In the music performance system, the combination structure of music is the hierarchical relationship of notes (pitch, duration, and intensity), bars (note, note, . . ., note), segments (bar, bar, . . ., bar), and the whole piece (segment, segment, . . ., segment). This structure is described as follows: the smallest unit in music is the note; notes are combined into bars, bars into segments, and segments into the whole piece. This classification is reasonable. From the perspective of granularity, notes, bars, or segments can be selected as the unit for identifying emotion. Segment size is determined by the characteristics of the music itself, which is close to the granularity of whole-piece emotion classification.

Spectrum Transformation.
The purpose of spectrum transformation is to divide the music equipment signals in a performance into frames and windows electronically and to express the spectrum of each frame of the received signal. The Fourier transform is used here to realize the spectrum transformation, where x(n) is the windowed signal and ω(n) is the window function, and N, m, n, and H represent, respectively, the number of frames, the window length, the Fourier transform length, and the window shift.
To facilitate processing of the above signals, the transformation in equation (4) needs to be normalized. The processing is as follows: the above signals are averaged and normalized over different windows, and the signals tend to be stationary.
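As a rough illustration of the framing-and-windowing step described above, the sketch below computes a magnitude spectrum per frame using a Hann window and a naive DFT. The paper's exact window function and parameter values are not specified, so `stft_frames` and the numbers here are assumptions for illustration only.

```python
import cmath
import math

def stft_frames(x, win_len, hop):
    """Split a signal into hop-shifted frames, apply a Hann window,
    and return the magnitude spectrum of each frame (naive DFT)."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1)) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * hann[n] for n in range(win_len)]
        spectrum = []
        for k in range(win_len // 2 + 1):  # only the non-redundant half for a real signal
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len))
            spectrum.append(abs(s))
        frames.append(spectrum)
    return frames

# A 440 Hz sine sampled at 8 kHz: the spectral peak should land near bin 440/8000*64 = 3.52
sig = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(512)]
mags = stft_frames(sig, win_len=64, hop=32)
```

A real system would use an FFT library instead of the O(N²) DFT loop; the naive form is kept here only to make the windowed transform explicit.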

Peak Processing.
After the above processing, the corresponding peak frequencies and amplitudes can be obtained from the signal. The simplest method is to use the frequency-bin position and amplitude of the Fourier transform directly, but this method is limited by the frequency resolution of the Fourier transform. To overcome this limitation, various correction methods have been developed that achieve higher frequency accuracy and better amplitude estimates.
where f̂ is the estimated peak frequency, f_s is the sampling rate, and κ is the fractional offset within the window, calculated as follows: In the high-frequency region, however, the correction is significant and is expressed by the following formula: where α̂ = X_m(k_m) N (1 − z²) / (sinc(z) κ(k_m)), N and h denote the maximum number of harmonics, and i denotes the number of peaks.
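One widely used correction of the kind described here is parabolic interpolation of the log-magnitude spectrum around a peak bin. The sketch below is a generic version of that technique, not necessarily the paper's exact formula; the function name and arguments are assumptions.

```python
import math

def parabolic_peak(mags, k, fs, n_fft):
    """Refine a spectral peak at integer bin k by fitting a parabola through the
    log magnitudes of bins k-1, k, k+1; returns (frequency in Hz, peak amplitude)."""
    a = math.log(mags[k - 1])
    b = math.log(mags[k])
    c = math.log(mags[k + 1])
    kappa = 0.5 * (a - c) / (a - 2 * b + c)   # fractional bin offset, in [-0.5, 0.5]
    freq = (k + kappa) * fs / n_fft           # refined frequency estimate
    amp = math.exp(b - 0.25 * (a - c) * kappa)  # interpolated peak amplitude
    return freq, amp

# A Gaussian-shaped peak centered at fractional bin 5.3 is recovered exactly,
# because its log magnitude is a parabola
mags = [math.exp(-(k - 5.3) ** 2) for k in range(12)]
freq, amp = parabolic_peak(mags, 5, fs=8000, n_fft=12)
```

For window shapes other than Gaussian the fit is approximate, which is why window-specific corrections such as the α̂ term above exist.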

Music Signal Filtering.
In this paper, fractal wavelets are used to filter the equipment signals in music performance. In a complex environment with different equipment sounds, there are M sound source nodes [15], where node O is the source point that controls all equipment, node A_n is a music sound source, and traffic flows are sent from node A_n to node O and then forwarded through node O. Traditional models of traffic-flow behavior include the Poisson model and the Gaussian distribution, whose research basis is that music signals show short-range correlation. However, as research has deepened, results have shown that actual music signals exhibit self-similarity and long-range correlation. Based on the fractional Brownian motion (FBM) model, the following description model is proposed: where m is the average arrival rate, Γ(·) is the gamma function, H is the Hurst parameter (0 < H < 1), B_H is standard fractional Brownian motion, and ∇B_H(t) is a discrete fractional Gaussian noise model, with the function K_H(u) as follows: From equation (9), it can be seen that B_H(at) = a^H B_H(t), the mean is 0, the variance is σ²t^{2H}, and the autocorrelation function is shown in the following equation: The probability density function of FBM is given in equation (10); ∇B_H(t) obeys the Gaussian distribution N(0, σ²|τ|^{2H}), where σ² is as follows: The wavelet is reconstructed by scale contraction and the related scale function, and an appropriate wavelet function and scale function are selected as shown in the following equation: Equation (15) forms a pair of orthogonal bases, and the signal X(t) is expanded as shown in the following equation: The scale coefficients U_{j,k} and wavelet coefficients W_{j,k} are given in the following equation: where j represents the scale and k represents different audio sources. Owing to environmental factors or interference from other bands, the original music contains noise, so the music signal must be filtered.
Compared with existing filtering algorithms, wavelet filtering is well localized in time and fast in signal processing; other filtering algorithms are less efficient than wavelet filtering.
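The paper's fractal-wavelet filter is not fully specified, so the sketch below illustrates the general wavelet-denoising idea with a single-level Haar transform under assumed names and thresholds: small detail (wavelet) coefficients are treated as noise and zeroed before reconstruction, while the scale coefficients carry the signal.

```python
def haar_step(x):
    # One level of the Haar transform: pairwise averages are the scale
    # coefficients, pairwise half-differences are the wavelet coefficients
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    det = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return avg, det

def haar_inverse(avg, det):
    # Exact inverse of haar_step
    x = []
    for a, d in zip(avg, det):
        x.extend([a + d, a - d])
    return x

def denoise(x, threshold):
    """Single-level Haar wavelet denoising: zero out detail coefficients
    whose magnitude falls below the threshold, then reconstruct."""
    avg, det = haar_step(x)
    det = [d if abs(d) > threshold else 0.0 for d in det]
    return haar_inverse(avg, det)

# Small pairwise jitter (noise) is removed; the large step between the
# two halves of the signal survives
clean = denoise([1.0, 1.2, 5.0, 4.8], threshold=0.5)
```

A practical music filter would use several decomposition levels and a smoother wavelet, but the threshold-and-reconstruct structure is the same.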

Emotional Model.
In music performance, the audience experiences emotions and can express a series of emotional types, which the following emotional model explains. At present, Hevner's emotion ring model [16] is widely used; it divides music emotion into classes that have progressive connections with their adjacent classes. The emotions among the 8 classes can transition smoothly to one another, forming a ring-shaped emotional model that symbolizes the emotional response to music works, as shown in Figure 1.
Hevner's emotion ring model is a music emotion model composed of a series of discrete words, and the emotional attributes of these words differ considerably, which benefits the emotional identification of music works. To classify the emotional experience of music, Hevner asked experimenters to compare the eight emotions they felt under the stimulation of hundreds of music works. The model was tested by many experimenters on music works in different keys. It was found that major keys easily evoke happy, elegant, and lively feelings, while minor keys easily evoke sad, illusory, and sentimental emotional expressions. The tempo of music also has a certain impact on human emotions: music with a fast tempo and rhythm easily makes people happy and cheerful, while slow-tempo or slow-paced music tends to evoke negative emotions such as suffering and sadness.

SVM Model.
The principle of SVM is to find a hyperplane that completes the binary classification of samples.
That is, by the principle of margin maximization, the given training dataset is transformed into the corresponding convex quadratic programming problem to obtain an optimal separating hyperplane, which divides the dataset into two categories. Figure 2 is a schematic diagram of the hyperplane.
For a linearly separable SVM, the hyperplane is described by the normal vector w and the bias b:

w · x + b = 0. (18)

The classification function for equation (18) is defined as f(x) = sign(w · x + b). As can be seen from Figure 2, the size of the margin determines the effect of SVM classification, and the margin is calculated as margin = 2/‖w‖. The larger the margin, the more clearly the samples are separated. For a linear SVM classifier there is a minimum requirement on the margin; if it is too small, the classification effect of the SVM deteriorates, subject to the constraint y_i(w · x_i + b) ≥ 1. Generally, for the objective function of a linear SVM, Lagrange multipliers are introduced to transform the original optimization problem into its dual problem, simplifying the calculation. Linear separability is an idealized assumption about the samples; in the datasets used in real classification problems, most samples are linearly inseparable.
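The quantities above can be illustrated directly. The sketch below evaluates the decision function sign(w · x + b), the canonical margin 2/‖w‖, and the constraint y_i(w · x_i + b) ≥ 1 on a toy separable set; all names and values are illustrative, not the paper's.

```python
import math

def decision(w, b, x):
    # Classify by the sign of the hyperplane function w.x + b
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

def margin(w):
    # Geometric margin of a canonical separating hyperplane: 2 / ||w||
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def satisfies_constraints(w, b, samples):
    # Canonical-form constraint y_i (w.x_i + b) >= 1 for every (x_i, y_i)
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
               for x, y in samples)

# Toy separable data on the x-axis: the hyperplane x_1 = 0 separates it
samples = [((2.0, 0.0), 1), ((3.0, 1.0), 1), ((-2.0, 0.0), -1), ((-3.0, -1.0), -1)]
ok = satisfies_constraints([1.0, 0.0], 0.0, samples)
```

Minimizing ‖w‖ subject to these constraints is exactly the convex quadratic program mentioned above; here the weights are fixed by hand for illustration.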

k-Nearest Neighbor Algorithm.
The k-nearest neighbor (kNN) classification algorithm [17] is a common lazy supervised classification algorithm in the field of pattern recognition. The classification standard for the kNN algorithm is defined by the distance formula

dist(x, y) = sqrt(Σ_i (x_i − y_i)²). (22)

For equation (22), the smaller the dist value, the higher the similarity between samples, and the k samples with the highest similarity vote to decide the category.
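A minimal pure-Python version of the kNN rule described above, assuming Euclidean distance and majority voting; the emotion labels and feature points in the example are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Classify query by majority vote among the k nearest training samples."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D audio features tagged with emotion labels
train_set = [((0.0, 0.0), "calm"), ((0.0, 1.0), "calm"),
             ((5.0, 5.0), "joy"), ((5.0, 6.0), "joy")]
label = knn_predict(train_set, (0.5, 0.5), k=3)
```

Because kNN stores the training set and defers all work to query time, it needs no training phase, which is what "lazy" means above.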

Multiclassifier Fusion Method.
In music performance, the processed audio signal is divided into audio features and text features by the above algorithms, and multimodal fusion is then carried out to judge the overall emotion of the performance. Model fusion integrates multiple learners through a model combination strategy and finally achieves better prediction results than a single model. In this paper, the common averaging fusion strategy is selected to realize multiclassifier fusion: the classifiers are fused by simple averaging or weighted averaging, and the fused average of the two classifiers is taken as the final output of the model. For the SVM and ANN models, the corresponding average or weighted value is

P_fused = w_1 · P_SVM + w_2 · P_ANN, with w_1 + w_2 = 1 (simple averaging takes w_1 = w_2 = 1/2).

In this paper, SVM and ANN classifiers are used for fusion research. Considering the poor classification effect and low accuracy of a single classifier, the fused classifier can improve the classification of music features.
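The averaging fusion strategy described above can be sketched as follows; the per-class probability vectors and the weights are invented for illustration.

```python
def fuse(probs_a, probs_b, w_a=0.5, w_b=0.5):
    """Weighted-average fusion of two classifiers' class-probability vectors;
    returns the index of the class with the highest fused score."""
    fused = [w_a * pa + w_b * pb for pa, pb in zip(probs_a, probs_b)]
    return max(range(len(fused)), key=fused.__getitem__)

# Hypothetical per-class probabilities from an SVM and an ANN over 4 emotion classes
svm_p = [0.10, 0.60, 0.20, 0.10]
ann_p = [0.05, 0.55, 0.30, 0.10]
label = fuse(svm_p, ann_p)  # simple average: w_a = w_b = 0.5
```

With unequal weights the same function expresses weighted-average fusion, e.g. favoring the classifier that performed better on validation data.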

Evaluation Model.
The evaluation model is used to better evaluate the classification effect of the above classifiers. This paper adopts the standard formulas: Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 · Precision · Recall/(Precision + Recall), computed from the true-positive (TP), false-positive (FP), and false-negative (FN) counts.
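Assuming the standard definitions of precision, recall, and F1 (the paper does not reproduce its formulas), a minimal sketch:

```python
def evaluate(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 correct detections, 2 false alarms, 2 misses
p, r, f = evaluate(tp=8, fp=2, fn=2)
```

F1 is the harmonic mean of precision and recall, so it rewards classifiers that balance the two, which is why it is used alongside recall in the experiments below.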

Experimental Analysis
In this paper, the dataset for the song audio emotion classification experiment comes from the PEMO pop music emotion dataset. According to the Thayer emotion model, song lists with emotional tags are extracted from the website. The tags are as follows: Chanting, Sad, Longing, Lyric, Jumping, Joyful, Warm, and Majestic. For each emotional category, 100 songs are extracted, for a total of 800 songs in the dataset. The audio signal is divided into 15 s, 30 s, and 45 s original fragments, of which 80% form the training set and 20% the test set.
To better verify the proposed algorithm, the same audio is tested in segments so that the segmentation goal can be reflected in different segments of audio and the consistency of the test results can be checked. In the experiment, SVM, KNN, ANN, and ID3 are used as the basic classification algorithms, and the effects of different combinations are tested by weighting the different algorithms to find the optimal model among the combined classification algorithms. The flow of the classification algorithm adopted in this paper is shown in Figure 3. In this group of experiments, the original audio signal was preprocessed and compared with the unprocessed audio, and the classification effects of the different algorithms were analyzed on 8 indicators. The LLD features were input, and the above four classification algorithms were used to classify and output music emotions.
As can be seen from Tables 1 and 2, when the music segment is relatively short, the shorter the segment, the higher the accuracy. Similarly, the overall accuracy of emotion classification on the processed audio is higher than on the original audio, which shows that filtering and denoising the audio signal improves the results. The following is a comparison of the overall average values of the four algorithms before and after audio signal processing, as shown in Figures 4 and 5.
From the average values of the audio under the above four algorithms, the processed audio shows a certain overall improvement in accuracy under all four algorithms. Among the four single classifiers, SVM and ANN achieve satisfactory accuracy, while ID3 performs worst.
After the experimental comparison of the four single classifiers, the fusion of the above algorithms is studied. The classification accuracy of fusing different classifiers is shown in Figures 6 and 7.
As can be seen from Figures 6 and 7, the overall accuracy on the processed audio is relatively high, averaging over 70%. The highest is the accuracy of audio classification after SVM and ANN fusion, at 75% (15 s); the lowest is the accuracy of ANN and ID3 fusion, at only 55% (45 s). Even without preprocessing, the fused classifiers outperform a single classifier, but their accuracy is still about 5% lower than that of the classifiers with preprocessing.
Based on the above fusion algorithms, the two indexes of recall and F1 are analyzed to evaluate the retrieval efficiency and stability of the fused classifiers. The specific effect is shown in Figure 8. The fusion of the SVM and ANN algorithms works best, with the highest recall and a stable model; using SVM and ANN for audio signal classification yields a good classification effect. The individual performance of the four common classification algorithms is analyzed, and the two algorithms with the best classification effects are fused, achieving the best classification performance. If further combination is carried out, performance does not improve significantly, but the complexity of the algorithm increases. Therefore, the combined recognition method achieves the best performance.

Conclusion
In music performance, the audience's response to audio signals is a rich form of emotional expression. Classifying audio information by emotion serves the study of different music performances and brings the best audio-visual effect to listeners. In this paper, the single classifiers SVM, KNN, ANN, and ID3 are used to classify audio signals, and the classification accuracy before and after preprocessing is compared; the processed audio achieves the best classification effect. Further research on fusing different single classifiers to classify audio signals shows obvious advantages in accuracy, recall, and F1.
Data Availability
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.