Music Trend Prediction Based on Improved LSTM and Random Forest Algorithm

As an entertainment consumption product, pop music attracts more and more attention. In the context of big data, the large population of pop music listeners determines the development trend of pop music to a large extent. To predict this trend, the audience's preferences can be mined and analyzed in depth from massive user data. This paper proposes a music trend prediction method based on an improved LSTM and the random forest algorithm. The algorithm first performs abnormal data processing and normalization on the test data set. Then the important features are selected by the random forest algorithm and corrected by the rough set compensation system. Finally, the prediction is made by the improved LSTM. In the experiment, RMSE and MAE are used as the performance evaluation indexes of the algorithm, and the results show that the proposed algorithm predicts the music popularity trend better, with a clear improvement in both root mean square error and mean absolute error.


Introduction
With the continuous improvement of living standards in our country, more and more people like to listen to music, and the demand for online music is growing [1]. As a growing number of people enjoy electronic music, various electronic music platforms have emerged. These platforms provide customers with personalized services, such as music recommendations and social networking [2]. They also allow online artists to share their original music, providing an easy way for those artists to become online celebrities [3]. Nowadays, more and more original artists are willing to share their original music on electronic music platforms, with tens of thousands of new original songs appearing each month. At the same time, many users enjoy music on these platforms every day, resulting in millions of user records of listening, downloading, and collecting music [4]. Such massive data resources play a vital role in grasping music trends.
With the rapid development of the Internet, "Internet plus" products have emerged in the music field, including online music platforms with an increasing number of users [5]. The standardization of music copyright has affected the distribution of user groups. Retaining old users and attracting new users has become a central concern [6]. Which app users choose is influenced by many factors, and whether users like the songs on a music platform is an important one [7]. A diversified profit model can only be discussed on the basis of a user base. From the user perspective, the most popular songs are the most likely to be clicked on, regardless of personal preferences [8]. However, both historically and at present, the popularity of music has mainly been judged by managers' intuition, which lacks a scientific and practical basis, and the results are not satisfactory. Therefore, accurately predicting the popularity trend of songs has become the top priority for attracting users [9].
Correct decisions depend on scientific predictions. Many methods have been proposed for predicting data, and they can be divided by nature into quantitative and qualitative approaches [10]. Qualitative analysis generally proceeds by induction, deduction, analysis, synthesis, abstraction, and generalization. Quantitative analysis usually includes two aspects: causality study and statistical analysis. Regardless of which method is used, prediction accuracy is crucial [11]. To improve accuracy, it is usually necessary to compare various prediction methods and either choose the best one or combine several of them [12]. The commonly used statistical analysis models mainly include the exponential smoothing method, the trend extrapolation method, and the moving average method. Standard causality research mainly relies on the linear regression causality model [13].
Music trend prediction can be realized by time series regression and other prediction models. The autoregressive integrated moving average (ARIMA) model was proposed in the literature [14]. Although it predicts well from dynamic data and its characteristics, its choice of time periods and parameters is not universal, so the data of each artist must be preprocessed and classified and the parameters then adjusted one by one. A third-order exponential smoothing model was proposed in the literature [15]. It can predict time series with both trend and seasonality, but it is sensitive to the selection of data sets and periods. In addition, for curves that require fitting above second order, uncontrollable divergence will occur. The STL decomposition model (seasonal and trend decomposition using loess) was proposed in the literature [16]. Although universal and robust, it is only suitable for additive models and cannot automatically adjust to sudden data changes. The recurrent neural network (RNN) was proposed in the literature [17]. It can give feedback according to the previous data and the nonlinear dynamic system. Nevertheless, its convergence is poor even when the corresponding features are added, and the prediction effect is not ideal.
Various research methods have been used to predict the trend of pop music. The massive music library resources and user behaviors of various platforms can be collected and integrated to form a big data set of pop music. Accurate analysis of specific attributes of this data set makes it possible to grasp the trend of pop music accurately. This paper proposes pop music trend prediction based on an improved LSTM and the random forest algorithm. Firstly, the LSTM network is improved by adding an attention layer. Then, the random forest regression prediction model is constructed. In the experimental part, the collected data sets are first analyzed and processed, the important features of the data are extracted by the random forest model, and the improved LSTM is then used for prediction.

The Proposed Algorithm
2.1. Improved LSTM Algorithm

2.1.1. LSTM Network. Long short-term memory (LSTM) is a recurrent neural network (RNN) [18] designed for time series problems. It also effectively addresses the gradient explosion and vanishing gradient problems of RNNs. The contents of the cell state $c_n$ in LSTM are controlled by two gates. One is the forget gate, which determines how much of the cell state $c_{n-1}$ from the previous moment is retained in the current cell state $c_n$. The other is the input gate, which determines how much of the network's input $i_n$ is saved to the cell state $c_n$ at the current moment. The current output value $b_n$ of LSTM is determined by the output gate and the cell state $c_n$. The structural principle of LSTM is shown in Figure 1 [18].
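As an illustration of the gate mechanism just described, the following is a minimal sketch of one step of a scalar LSTM cell (hidden size 1). It follows the standard textbook LSTM formulation, which is an assumption here rather than the exact variant used in this paper; the function name `lstm_step` and the parameter dictionary `p` are hypothetical.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(i_n: float, b_prev: float, c_prev: float, p: dict) -> tuple:
    """One step of a scalar LSTM cell (standard formulation, for illustration).

    i_n: current input, b_prev: previous output b_{n-1},
    c_prev: previous cell state c_{n-1}, p: hypothetical weight dictionary.
    """
    # Forget gate: how much of c_{n-1} is retained.
    f = sigmoid(p["wf"] * i_n + p["uf"] * b_prev + p["bf"])
    # Input gate: how much of the new candidate is written to the cell state.
    g = sigmoid(p["wi"] * i_n + p["ui"] * b_prev + p["bi"])
    # Output gate: how much of the cell state is exposed as output.
    o = sigmoid(p["wo"] * i_n + p["uo"] * b_prev + p["bo"])
    # Candidate cell content computed from the current input.
    c_tilde = math.tanh(p["wc"] * i_n + p["uc"] * b_prev + p["bc"])
    c_n = f * c_prev + g * c_tilde
    b_n = o * math.tanh(c_n)
    return b_n, c_n
```

With all weights set to zero, every gate evaluates to 0.5, so the cell state is simply halved at each step; this makes the role of the forget gate easy to check by hand.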
The LSTM model is divided into an encoder and a decoder. The encoder realizes the isometric learning of the input feature data, and the decoder calculates the reconstruction error. Let the input samples $I_1, I_2, \ldots, I_T$ be written as $\{I_x\}_{x=1}^{t}$, where $t$ denotes the observation window size, so that each sample is $I_x = \{i_{x1}, i_{x2}, \ldots, i_{xt}\}$. For any sequence $I_x$, the encoder hidden state vector of column $x$ at time $n \in \{1, 2, \ldots, t\}$ is computed from the following quantities: $b_{(n-1)x} \in \mathbb{R}^{w}$ is the output state vector of the $x$-th coding unit at moment $n-1$; $i_{nx} \in \mathbb{R}^{w}$ is the input vector; $M$ and $R$ are coefficient weight matrices of order $w \times d$ and $w \times w$, respectively; and $z(\cdot)$ is an activation function, usually tanh. Each column vector in $I_x$ is taken in turn as input to the encoder.
Here, $b_{nx}$ refers to the output of the $x$-th coding unit at moment $n$, and $t$ is the parameter set of the encoding part. $z^{\mathrm{enc}}_{t}(\cdot)$ is usually set to the tanh activation function. After the whole sequence is input to the encoder, the output sequence set $\{b_{nx}\}_{x=1}^{t_x}$ is obtained. A pooling step is then performed, in which $y$ is the number of rows of $b_{nx}$.
After the pooling step, $b_x$ enters the decoder. The inputs can be reconstructed according to formulas (6) and (7), where $\tilde{i}_{nx}$ refers to the reconstructed data and $\hat{b}_{ti}$ refers to the decoder hidden state vector. $z^{\mathrm{dec}}_{\alpha}(\cdot)$ and $\rho(\cdot)$ are usually set to the tanh activation function. The LSTM model is finally obtained by minimizing the function $\sum_{x=1}^{t} \lVert i_{nx} - \tilde{i}_{nx} \rVert^{2}$. The LSTM network structure is shown in Figure 2.
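The encoder, pooling, and reconstruction-error steps described above can be sketched as follows. This is only an illustration under stated assumptions: the recurrence is taken to be a plain tanh update of the form tanh(M i + R b), pooling is taken to be a mean over time steps, and the decoder is left out, so `reconstruction_error` simply compares an input sequence with a reconstruction supplied by the caller. All function names are hypothetical.

```python
import math


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def encode(seq, M, R):
    """Run the assumed recurrence b_n = tanh(M i_n + R b_{n-1}) over a sequence.

    seq: list of input vectors, M: w x d matrix, R: w x w matrix.
    Returns the list of hidden state vectors, one per time step.
    """
    b = [0.0] * len(R)  # initial hidden state, assumed to be zero
    states = []
    for i_n in seq:
        pre = [m + r for m, r in zip(matvec(M, i_n), matvec(R, b))]
        b = [math.tanh(v) for v in pre]
        states.append(b)
    return states


def mean_pool(states):
    """Average the hidden states component-wise (assumed form of pooling)."""
    y = len(states)
    return [sum(col) / y for col in zip(*states)]


def reconstruction_error(seq, recon):
    """Sum of squared differences between inputs and their reconstructions."""
    return sum(
        (a - b) ** 2
        for i_n, r_n in zip(seq, recon)
        for a, b in zip(i_n, r_n)
    )
```

Training would then adjust the weights so that the reconstruction error over all samples is as small as possible, which corresponds to the minimization stated in the text.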

Attention Mechanism.
After the attention mechanism was proposed, it was first applied to visual images [18]. Its subsequent rapid adoption has influenced the development of many artificial intelligence fields based on deep learning algorithms. At present, the attention mechanism has been successfully applied to image processing, natural language processing, and data prediction. It has also been applied to neural machine translation with fruitful results. Taking the Ali music platform as an example, the output information is first obtained through the LSTM network. This output information is then processed, and finally the attention probability distribution over the feature information is obtained. In this way, the output state of the LSTM unit at each moment can be taken into account, which provides an important basis for predicting the music popularity trend.
2.1.3. Improved LSTM Model. The improved LSTM model adds an attention layer after the LSTM structure. The combination of LSTM and the attention mechanism has been applied to relation classification with good results. Its application to the prediction of time series data, especially public data, is still being improved. In this paper, the combination of LSTM and the attention mechanism is used to predict play counts for singers and songs on the Ali Cloud music platform. Based on the historical data, two dimensions, "daily broadcast volume" and "average broadcast volume of consecutive three days," were selected to predict the song broadcast volume of singers in the next 60 days. The overall model construction is shown in Figure 3.
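To make the idea of an attention probability distribution over LSTM outputs concrete, here is a minimal sketch assuming simple dot-product scoring followed by a softmax. The scoring function and the `query` vector are assumptions for illustration; the paper does not specify which scoring function its attention layer uses.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(outputs, query):
    """Weight LSTM outputs by attention and return (weights, context).

    outputs: list of LSTM output vectors, one per time step.
    query: hypothetical scoring vector with the same dimension.
    """
    # Dot-product score of each time step against the query (assumed scoring).
    scores = [sum(q * b for q, b in zip(query, b_n)) for b_n in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    # Context vector: attention-weighted sum of the outputs.
    context = [
        sum(w * b_n[k] for w, b_n in zip(weights, outputs))
        for k in range(dim)
    ]
    return weights, context
```

The weights sum to 1, so the context vector is a convex combination of the per-step outputs; time steps with higher scores contribute more to the final prediction input.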

Similar Day Data Sets Were Obtained by Fuzzy Clustering
(1) Fuzzy Clustering Principle of C-Means. For a given data set $I = \{i_1, i_2, \ldots, i_t\}$, C-means fuzzy clustering requires the number of categories $c$ and the center $w_y$ ($y = 1, 2, \ldots, c$) of each cluster as inputs. For each sample $i_z$, there is a corresponding membership function $\mu_{xz} = \mu_{I_x}(i_z)$, that is, the membership degree of sample $z$ to class $I_x$. The clustering loss index function based on the membership function can be expressed as Equation (8), where $h$ represents the weighting exponent, also known as the smoothing factor, which controls the degree to which samples are shared among fuzzy classes. The optimal value of $h$ is still debated in the academic community. Considering both the amount of calculation and the calculation accuracy, the weighting exponent is usually set to 2.
Setting the partial derivatives of $Y$ with respect to $w_y$ and $\mu_y(i_x)$ to 0 gives the necessary conditions for $Y$ to attain its minimum, namely formulae (9) and (10). The C-means fuzzy clustering procedure is therefore as follows: input the number of categories $c$ and initialize the center $w_y$ ($y = 1, 2, \ldots, c$) of each cluster; then, for each sample $i_z$ with membership function $\mu_{xz} = \mu_{I_x}(i_z)$, repeatedly apply formula (9) and formula (10) to update $w_y$ and $\mu_{xz}$. The iteration stops when the accuracy requirement is satisfied, at which point the cluster centers and membership functions are determined.
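The alternating update of centers and memberships can be sketched as below. Because Equations (8) to (10) are not reproduced in this text, the sketch uses the standard fuzzy C-means update rules with Euclidean distance, which is an assumption; the function name `fuzzy_c_means` and its parameters `tol`, `max_iter`, and `seed` are hypothetical.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fuzzy_c_means(data, c, h=2.0, tol=1e-6, max_iter=200, seed=0):
    """Standard fuzzy C-means; returns (centers, memberships).

    data: list of points, c: number of clusters, h: weighting exponent.
    memberships[z][x] is the degree to which sample z belongs to cluster x.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, c)]  # initialize from data
    u = []
    for _ in range(max_iter):
        # Membership update from the current centers.
        u = []
        for z in data:
            d = [max(sqdist(z, w), 1e-12) for w in centers]
            u.append([
                1.0 / sum((d[x] / d[k]) ** (1.0 / (h - 1.0)) for k in range(c))
                for x in range(c)
            ])
        # Center update: membership-weighted mean of the samples.
        new_centers = []
        for x in range(c):
            weights = [row[x] ** h for row in u]
            total = sum(weights)
            new_centers.append([
                sum(wt * z[k] for wt, z in zip(weights, data)) / total
                for k in range(len(data[0]))
            ])
        shift = max(sqdist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # accuracy requirement satisfied
            break
    return centers, u
```

For example, calling `fuzzy_c_means(points, c=2)` on two well-separated groups of daily play counts returns one center near each group, with each sample's memberships summing to 1.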
(2) Determination of the Optimal Cluster Number. When c-means fuzzy clustering is used for cluster analysis, the number of categories $c$ must be set in advance [19]. The value of $c$ has a profound influence on the clustering. If the number of clusters is too large, samples of the same kind will be divided into different classes. If it is too small, data of different classes may be grouped into the same class. A wrong cluster number leads to a wrong clustering result and can even prevent the iteration from converging. Therefore, it is necessary to calculate the number of clusters by setting optimization criteria.
The basic idea for calculating the number of clusters is to introduce outcome evaluation indicators. By increasing the number of clusters and observing the change in the evaluation indexes, the optimal number of clusters can be determined from the best index values. Clustering results are mainly evaluated from two aspects: similarity within the same class and difference between different classes. Therefore, the evaluation indexes can be taken as the intraclass similarity $X_g$ and the interclass similarity $X_r$ [20]. In the intraclass index, $t_x$ is the amount of data in the class, $i$ is the sample object, and $c_x$ is the center of class $I_x$. The smaller the value of $X_{gx}$ is, the more concentrated the data points are around the center of the class and the more similar the data samples are. In the interclass index, $c_x$ is the center of class $I_x$ and $c_y$ is the center of class $I_y$.
The larger the $X_{rxy}$ value is, the farther apart the centers of adjacent classes are, and the smaller the similarity between classes is. Therefore, comprehensive evaluation indicators can be obtained.
Therefore, the calculation and determination process of the optimal cluster number is as follows: (1) Determine the range of classification numbers. Generally, the classification number $c$ is set to $c \in [2, \sqrt{t}]$, where $t$ represents the number of all sample data [21].

Therefore, the critical step of random forest prediction is the formation of the decision trees and the forest, as shown in Figure 4. This paper uses the bootstrap algorithm to extract the training set and its features for music trend prediction. Then, the CART algorithm is selected as the regression decision tree generation algorithm. Therefore, the splitting of nodes is based on the minimum mean square error, and its calculation method is shown in Equation (14).
Here, $s$ represents the whole training set at the current node, and $G$ represents the feature set extracted at the current node. Training set $s$ is divided into subsets $D_1$ and $D_2$ according to feature $G$. By traversing the values of $G$, the minimum sum of the mean square deviations of the output values $j_x$ in subsets $D_1$ and $D_2$ can be calculated. Finally, all attributes are traversed to obtain the value $w$ of the minimum mean square deviation, the corresponding attribute and its value, and the growth information. After the decision tree is generated, a similar method is used to train on all the training sets obtained by sampling to form a random forest.
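The split search described above can be sketched in a few lines. This is a minimal illustration of the usual CART regression criterion (minimize the summed squared deviation of the two subsets), assuming numeric features and thresholds taken from the observed feature values; the function names are hypothetical, and Equation (14) itself is not reproduced here.

```python
def sse(values):
    """Sum of squared deviations of values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, feature_indices):
    """Find the (score, feature, threshold) minimizing SSE(D1) + SSE(D2).

    rows: list of feature vectors at the current node.
    targets: output values for those rows.
    feature_indices: the features sampled for this node.
    Returns None if no valid split exists.
    """
    best = None
    for g in feature_indices:
        # Try every observed value of feature g as a threshold.
        for threshold in sorted({row[g] for row in rows}):
            left = [j for row, j in zip(rows, targets) if row[g] <= threshold]
            right = [j for row, j in zip(rows, targets) if row[g] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty subsets
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, g, threshold)
    return best
```

Applying `best_split` recursively to the two resulting subsets, until a termination condition such as a minimum node size is reached, grows one regression tree.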
The steps of building the random forest prediction model are as follows:
(1) First, the c-means fuzzy clustering method processes the original data. After the similar-day data are obtained, they are divided into two parts: training set $D$ and test set $S$. Let the sample size of $D$ be $T$ and the number of attributes of each sample in $D$ be $W$.
(2) A sample of size $T$ is drawn from training set $D$ by bootstrap resampling. Repeating this $Z$ times gives $Z$ training sets $\theta_1, \theta_2, \ldots, \theta_Z$. Each training set generates a corresponding decision tree, giving $\{N_1(\theta_1), N_2(\theta_2), \ldots, N_Z(\theta_Z)\}$. Therefore, $Z$ is the number of decision trees in the random forest.
(3) Use training sets $\theta_1, \theta_2, \ldots, \theta_Z$ to generate the corresponding CART decision trees. In decision tree generation, $c$ attribute features are drawn without replacement from the $M$ attribute features at each split node. These attributes and their corresponding values are traversed, and the minimum mean square value is calculated to obtain the optimal splitting criterion.
(4) Train each decision tree until the termination condition is reached.

The established random forest model is then applied to the test set data:
(1) After normalization, the input data are matched with the similar-day data and substituted into the random forest prediction model $\{N_1(\theta_1), N_2(\theta_2), \ldots, N_Z(\theta_Z)\}$ formed from the corresponding similar days.
(2) The data are passed through each decision tree of the grown model, using the same splitting criterion as in the growth of the model, until the end. Then, the weight of each nonzero leaf node is calculated, as shown in Formula (15).
(3) After the weights of the leaf nodes are obtained, the weight of each decision tree is calculated according to Equation (16).
(4) After the root, nodes, weights, and other information of each decision tree are saved, the random forest prediction model is established.

After the random forest prediction model is established, the steps for prediction are as follows:
(1) After normalization, the input data are matched with the similar-day data and substituted into the random forest prediction model $\{N_1(\theta_1), N_2(\theta_2), \ldots, N_Z(\theta_Z)\}$ formed from the corresponding similar days.
(2) The data are passed through each decision tree, using the same splitting criterion as in the growth of the model. After the prediction result of each tree is obtained, the weights generated from the test set are used, and a weighted average gives the final prediction.
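The resampling and weighted-averaging steps above can be sketched as follows. The sketch covers only bootstrap resampling, per-node feature sampling, and the final weighted average; the tree weights are taken as given inputs because Formulas (15) and (16) are not reproduced in this text. All function names are hypothetical.

```python
import random


def bootstrap_sets(D, Z, seed=0):
    """Draw Z bootstrap training sets, each of size len(D), with replacement."""
    rng = random.Random(seed)
    T = len(D)
    return [[D[rng.randrange(T)] for _ in range(T)] for _ in range(Z)]


def sample_features(num_features, c, rng):
    """Pick c distinct feature indices for one split node (no replacement)."""
    return rng.sample(range(num_features), c)


def weighted_forest_prediction(tree_predictions, tree_weights):
    """Combine per-tree predictions into one value by weighted average."""
    total = sum(tree_weights)
    if total == 0:
        raise ValueError("tree weights must not sum to zero")
    return sum(p * w for p, w in zip(tree_predictions, tree_weights)) / total
```

With equal weights this reduces to the plain average used by an ordinary random forest regressor; unequal weights let trees that performed better on the test set contribute more.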

Prediction Result Evaluation and Rough Set Correction
(1) Evaluation Criteria for Prediction Results. This paper uses the random forest regression model to produce the predictions. The mean absolute error, the mean relative error, and the $R^2$ qualified rate are then used to judge the prediction result, where $R^2$ (the coefficient of determination) measures the degree of fit between the prediction result and the actual curve.
(2) Rough Set Theory. When random forest or other regression algorithms are used for regression prediction, the prediction results tend to be conservative and smooth because of the nature of regression. When the predicted curve has multiple peaks or the squared difference changes significantly, the prediction results show a conspicuous error near the peaks. The rough set can overcome this shortcoming. It is a mathematical tool for dealing with fuzzy and uncertain problems in big data [22]. It can effectively analyze and process information that is imprecise, inconsistent, or incomplete, including data with missing values, which makes it suitable for error compensation. In this way, hidden knowledge is mined and the underlying laws in the original data are revealed.

Journal of Sensors
(3) Construction of the Rough Set Compensation System. The prediction model of music popularity trend can be expressed by Formula (18).
Here, $j'_{n+1}$ represents the data after compensation at time $n+1$; before time $n+1$, this value represents the popularity of music. $j_{n+1}$ represents the result of random forest prediction at time $n+1$. $z_{n+1}$ and $z_n$ represent the difference between the two sides of the prediction function at time $n+1$. $s$ is the scale factor, which is the key to correcting the music prediction; it is calculated below using rough set theory.
According to the requirements of rough set theory, an information system should be constructed first. Here, the information system based on rough set theory is assumed to be $Z = (P, G)$, where set $P$ is the set of predicted values output by random forest prediction and $G = C \cup S$ is the attribute set. The conditional attribute set $C$ represents the consistent rule, and its data features are extracted from the set of predicted values. $S = \{s\}$ represents the decision attribute, that is, the scale factor to be obtained. Each attribute in $C$ can be represented by the information contained in $P$. Finally, after rule deletion, the representation of attribute $S$ is obtained.
The conditional attributes are $C = \{g, h, c\}$, defined here on the basis of previous research results. For the decision attribute $D$, the values before time $n$ can be calculated according to Equation (22), and the values after $n+1$ are the decision targets to be determined.

Description of Sample Data
3.1.1. Source of Sample Data. The data set used in this paper comes from the Ali Cloud music platform. Ali Cloud Music is a large music platform under Alibaba, which has accumulated many users and music library resources after years of development. In terms of the number of artists and the variety of songs, the platform is the best of its kind. More importantly, there is a wealth of user activity on the platform, such as uploading songs and downloading albums. Both the rich resources and the activity of users play a significant role in guiding the trend of music. The sample data provided by the platform consist of two tables. One is the user behavior table (time span 2020.03.01-2020.08.31) p2_mars_tianchi_users_action, which records the behavior of users on the Ali music platform. The other is p2_Mars_tianchi_songs, which contains information about each singer and the corresponding songs, such as the album and the initial popularity.
The data sets p2_mars_tianchi_user_actions.csv and p2_Mars_tianchi_songs.csv used in the paper cover the period 2020.03.01-2020.08.31, a total of 6 months of user behavior information and song information. According to the data set statistics, the total initial play popularity (historical plays) of the songs included in the data set was 261.07 million, and users recorded 4.78 million playback behaviors.

Data Preanalysis and Pretreatment.
The goal of this paper is to predict the number of plays of the corresponding singers' songs. Therefore, in processing the data set, the number of plays per singer must be extracted. From the 6-month data from 2020.03.01 to 2020.08.30 given in the data set, the trend chart of songs played by singers was made. The daily broadcast volume combined with the mean value over three consecutive days is selected as the sample value at a particular time point. The training set of the neural network is constructed by a sliding window, and its model parameters are set as follows: (1) Number of input nodes = 2

Figure 5 shows the average number of streams, downloads, and favorites for any artist in each region over the past six months. As can be seen from the chart, the number of plays, collections, and downloads of an artist's songs is the best predictor of whether an artist will be famous in a certain period.
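The construction of training samples by sliding over the play-count series, with the two input features named above, can be sketched as follows. The window length, the one-step horizon, and the use of a trailing three-day mean are assumptions for illustration; the paper does not state these details. Function names are hypothetical.

```python
def three_day_mean(plays):
    """Trailing mean over up to three consecutive days (assumed definition)."""
    out = []
    for n in range(len(plays)):
        window = plays[max(0, n - 2): n + 1]
        out.append(sum(window) / len(window))
    return out


def sliding_windows(plays, window, horizon=1):
    """Build (input, target) pairs from a daily play-count series.

    Each input is `window` consecutive days of two features:
    (daily plays, three-day mean). The target is the play count
    `horizon` days after the end of the window.
    """
    features = list(zip(plays, three_day_mean(plays)))
    samples = []
    for start in range(len(plays) - window - horizon + 1):
        x = features[start:start + window]
        y = plays[start + window + horizon - 1]
        samples.append((x, y))
    return samples
```

Each input step has exactly two values, which matches the stated setting of two input nodes.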
For the song playback data of all singers, mean removal, variance normalization, and scaling to [-1, 1] were carried out, and the length of the mean filter was set as ave filter = 4. The prediction curve shown in Figure 6 is obtained when the original broadcast volume is used for prediction. As can be seen from Figure 6, there are deviations in the prediction results for some singers based on the original data. Figure 7 shows the experimental results of predicting the number of songs played in the next 60 days by using the average number of songs played by singers (after scaling). As can be seen from Figure 7, the prediction using the average number of songs played is better than that using the original number of songs played, and it matches the number of songs played by artists over the next 60 days.
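The preprocessing chain described above (mean removal, variance normalization, scaling to [-1, 1], and a mean filter of length 4) can be sketched as follows. The order of operations and the use of a trailing window for the filter are assumptions, since the text does not specify them; function names are hypothetical.

```python
import math


def standardize(series):
    """Remove the mean and divide by the standard deviation."""
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant series
    return [(v - mean) / std for v in series]


def scale_to_unit_range(series):
    """Linearly rescale values to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]


def mean_filter(series, length=4):
    """Trailing moving average with the given window length."""
    out = []
    for n in range(len(series)):
        window = series[max(0, n - length + 1): n + 1]
        out.append(sum(window) / len(window))
    return out
```

A typical call chain under these assumptions would be `mean_filter(scale_to_unit_range(standardize(plays)), length=4)`, which smooths out single-day spikes before the series is fed to the network.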
RMSE and MAE are used to compare the proposed method with other forecasting methods. They are the most commonly used measures of accuracy for continuous variables and an important yardstick for evaluating a model. RMSE measures the average magnitude of the error, and MAE is the mean of the absolute errors. In the specific calculation, $j$ is the observed value, $\hat{j}$ is the predicted value, and $t$ is the number of observations. The experiment in this paper randomly selects ten artists in the data set and predicts the number of songs played for these artists from August 1 to August 31, 2020, by using SVM, LSTM, and the improved LSTM algorithm used in this paper. The predicted results, root mean square error and mean absolute error, are shown in Figures 8 and 9.
As can be seen from Figures 8 and 9, the prediction model proposed in this paper achieves better RMSE and MAE when predicting the number of songs played for the ten artists. Compared with the traditional LSTM and SVM methods, the errors are reduced from the original 0.08 and 0.067 to 0.048 and 0.035, respectively, and the error rates are reduced by 36.7% and 28.5%, respectively. This indicates that the model proposed in this paper is more suitable for predicting the trend of music popularity. Figure 10 shows the influence of different model parameters on RMSE and MAE. It is not difficult to find that both
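The two evaluation measures can be computed as in the following minimal sketch, which uses the standard definitions of RMSE (square root of the mean squared error) and MAE (mean of the absolute errors). The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    t = len(observed)
    return math.sqrt(
        sum((j - j_hat) ** 2 for j, j_hat in zip(observed, predicted)) / t
    )


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    t = len(observed)
    return sum(abs(j - j_hat) for j, j_hat in zip(observed, predicted)) / t
```

Because RMSE squares the errors before averaging, it penalizes a few large deviations more heavily than MAE does, so reporting both gives a fuller picture of the error distribution.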

Conclusion
The data in this paper come from the Ali music platform, and the popularity trend of music is predicted in a big data setting. Based on the historical playback data of users, this paper grasps music trends over a period by predicting the play counts of artists' songs in the target time. This paper proposes a music trend prediction method based on an improved LSTM and the random forest algorithm. The algorithm predicts whether a singer's songs will be popular in the future based on the average number of songs played in the past period. The experimental results show that the predictions of the proposed algorithm are consistent with the actual number of songs played by singers in the next 60 days. At the same time, RMSE and MAE were significantly improved. The results also show that the proposed algorithm is more stable, more accurate, and suitable for music trend prediction in a big data setting. In future work, other prediction models can be combined with this one to improve the prediction effect. In this paper, the factors affecting the number of songs played are not considered comprehensively when selecting the corresponding features, so the prediction effect needs to be further improved. Other characteristics affecting song playback volume can therefore be explored at a later stage.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The author declares no competing interests.