Quantitative Analysis of Comprehensive Influence of Music Network Based on Logistic Regression and Bidirectional Clustering



Introduction
As an important part of human civilization, music has a profound impact on the development of human society. In the process of creating music, artists are influenced by many factors, such as other musicians, music schools, music genres, social events, and political events. In turn, the continuous creative output of musicians promotes the development of music. In recent years, scholars have tried to use machine learning and deep learning models to explore how music develops and exerts influence. For this work, we surveyed the literature on music genre classification and recognition and on the evaluation of music influence. Research on music genre classification mainly analyzes the problem from the perspective of a single kind of music influence. Some studies take genre classification as a starting point, and evaluating music influence on that basis has greater practical significance.
Naser and Saha [1] analyzed the influence of music on emotion and creativity by establishing an egg model. Banerjee et al. [2] used deterministic and nondeterministic methods to study the influence of different music. Another study [3] analyzed music genres from the two dimensions of acoustics and musical characteristics. Suganda et al. [4] and Li et al. [5] constructed deep neural network models for music classification from the spectrum, evaluated and combined the fused segment feature concept proposed by Dai et al. [6], verified the effectiveness of music feature segmentation and extraction, and introduced long short-term memory (LSTM) into the field of music genre recognition. Sim et al. [7] used the multilevel MRA artificial neural network method to understand and predict the relationship between music influence and music influence. Building on the continued integration of traditional filter and wrapper methods, new search algorithms and evaluation criteria for classifiers keep emerging, such as the neural network pruning method [8] and fuzzy entropy evaluation of feature sets [9], which have informed work on support vector machines [10] and Gaussian mixture models [11]. Burred and Lerch [12] classified music types based on the CRNN algorithm. Cheng et al. [13] analyzed the advantages of the music genre in intensity tests. Nilson et al. [14] studied a music emotion recognition method based on data features and used Li's [15] LDA model to process music tags; the weighted processing improved the accuracy of classification. Zhen et al. [16] input MFCC features into the KNN model and conducted experiments on the GTZAN dataset [17], which verified the effectiveness of the KNN model in music genre recognition.
Tzanetakis and Cook [18] compared the ability of the gradient boosting model and the extra-trees model to extract multidimensional numerical features of music types and showed that both models can effectively identify the feature information of music types.
The Gauss-Seidel method proposed by Goldfarb [19] and the basic principle and idea of clustering proposed by Goldfarb et al. [20] also provide ideas for the creation of music impact indicators.
This paper constructs a comprehensive, multidimensional music influence network and, on this basis, builds a music similarity measurement model and discusses in detail the influence of music, the similarity of music genres, and the dynamic changes of music. The work covers the following aspects: (i) creating the music influence network: PageRank is applied to a directed network to reveal the types of interaction between musicians and music; (ii) constructing the music similarity measurement model: a supervised classification model is used to calculate the music similarity of samples; (iii) bidirectional clustering: the similarity and influence between and within genres are analyzed; and (iv) lasso regression: lasso is used to identify the indicators that significantly explain the dynamic factors and to explain how genres or artists change over time. The existing models are analyzed, and a general model that considers a variety of data is proposed. The overall idea is shown in Figure 1.

Data Sources and Basic Assumptions
The data of this paper come from Problem D of the 2021 Mathematical Contest in Modeling/Interdisciplinary Contest in Modeling (MCM/ICM). To make the problem tractable, we propose the following assumptions:
(i) The average characteristics of all music created by a musician in a year represent the musical characteristics of that musician throughout the year.
(ii) The average characteristics of all musicians in a year represent the musical characteristics of the whole year.
(iii) Missing data do not play an important role in the model.
(iv) There is no deviation in the internal relationships between the indicators in the model.

Exploratory Data Analysis
When we get the preliminary data set, we first use the normalization method to process the data, evaluate the existing data information through numerical processing and comparison, and mine the correlation and change trend between the data, so as to help us understand the original data set to the maximum extent. Here is what we found.

Exploration of Missing Values.
Through the analysis of the data sets, we found that no data set contains missing values, but the information is not fully consistent across data sets. The musician with ID 477787 has no information in "data_by_artist", so we deleted this musician. Some musicians in "data_by_artist" do not have data in "influence_data", so they lack information about the year and genre of influence.
Because the data cover a large number of musicians and only a few lack information, we removed these musicians from "data_by_artist" to facilitate subsequent analysis and modeling. After the deletion, we still have data on 5602 musicians. At the same time, many musicians account for only a small proportion of the music. For the convenience of analysis, we also deleted them. After this deletion, we still have 91731 music records.

Data Standardization.
Sometimes, due to the needs of the model, we need to standardize the continuous variables in the data before modeling, that is, using the following formula:

$$z_{ij} = \frac{x_{ij} - \bar{x}_{\cdot j}}{s_{\cdot j}},$$

where $x_{ij}$ is the element in row $i$ and column $j$ of the data set, $\bar{x}_{\cdot j}$ is the average of column $j$, and $s_{\cdot j}$ is the standard deviation of column $j$.
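The column-wise standardization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `standardize_columns` and the toy values are hypothetical, and the sample standard deviation is assumed because the text does not say whether the sample or the population version is used.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Return z-scores column by column: (x_ij - mean_j) / sd_j."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 in the denominator)
    sds = [stdev(col) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical toy data: two continuous features for three songs
toy = [[0.2, -12.0], [0.4, -8.0], [0.6, -4.0]]
z = standardize_columns(toy)
```

After standardization, every column has mean 0 and unit standard deviation, so features measured on different scales become comparable.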

Variable Analysis.
We calculate the mean, median, quantiles, maximum, and minimum values of the continuous variables in "full_music_data". The results are shown in Table 1.
The discrete variables are the explicit variable, the mode variable, and the key variable. For the explicit variable, there are 88361 observations of type 0 and 3370 of type 1, so type 0 is nearly 26 times as frequent as type 1. Given this imbalance, we can delete this variable in later analysis. For the mode variable, there are 25324 observations in category 0 and 66407 in category 1. The key variable has 12 categories, whose counts are shown as a histogram in Figure 2.
As can be seen from Figure 2, keys 0, 2, and 9 are more frequent in "full_music_data", whereas keys 3 and 6 are less frequent. The above is an exploratory analysis of each data set. In the models that follow, we select variables and observations according to the actual situation of the data set, the content of the problem, the background knowledge, and the application scope of the model.

PageRank Principles.
PageRank is a search engine technique based on hyperlinks. According to the links between nodes, the importance of each node is rated on a scale from 0 to 10, where 10 is the full score. A high PR value indicates that the node is very important [21]. A schematic of PageRank is shown in Figure 3.

Calculating Music Influence.
In this paper, the importance of influencers in the network is ranked by using the improved PageRank algorithm [22,23], and the influence of musicians is calculated. For the musicians' influencers and followers, the procedure is as follows.
First, we construct an $N \times N$ adjacency matrix $A$ from the relations among the $N$ musicians:

$$A_{ij} = \begin{cases} 1, & \text{if musician } j \text{ has an influence on musician } i, \\ 0, & \text{otherwise.} \end{cases}$$

That is, if the element in row $i$ and column $j$ equals 1, then musician $j$ has an influence on musician $i$; if it equals 0, then musician $j$ has no effect on musician $i$.
To avoid divergence, we normalize the columns of the matrix so that each column sums to 1:

$$\tilde{A}_{ij} = \frac{A_{ij}}{\sum_{k=1}^{N} A_{kj}}.$$

To prevent the algorithm from failing to converge because of dead ends, we follow the connections between musicians with probability $\beta$ and jump to an arbitrary musician with probability $1 - \beta$. In mathematical terms, we construct the matrix

$$M = \beta \tilde{A} + (1 - \beta) \left( \frac{1}{N} \right)_{N \times N},$$

where $(1/N)_{N \times N}$ is the $N \times N$ matrix whose elements all equal $1/N$. We set $\beta = 0.85$. Finally, we initialize the $N \times 1$ influence vector $r^{(0)} = \mathrm{random}(r_1, r_2, \ldots, r_N)$, where $r_i \in (0, 1)$, and iterate

$$r^{(t+1)} = M r^{(t)}$$

until convergence, which yields the final influence vector.
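The iteration described above can be illustrated with a short power-iteration sketch. It follows the convention stated in the text (entry in row i, column j equal to 1 means musician j influences musician i) and the damping value 0.85. Several details are assumptions made only for this sketch: the function name `pagerank`, the uniform starting vector (the text uses a random one), the uniform treatment of columns without any link, and the stopping tolerance.

```python
def pagerank(adjacency, beta=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on M = beta * A_norm + (1 - beta) / n."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Column-normalise; assumption: an all-zero column is spread uniformly
    a = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 1.0 / n
         for j in range(n)]
        for i in range(n)
    ]
    m = [[beta * a[i][j] + (1 - beta) / n for j in range(n)] for i in range(n)]
    r = [1.0 / n] * n  # assumption: uniform start for reproducibility
    for _ in range(max_iter):
        new_r = [sum(m[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(x - y) for x, y in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical toy network: musician 0 influences musicians 1 and 2
toy_adjacency = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]
scores = pagerank(toy_adjacency)
```

Because every column of the matrix sums to 1, the scores keep summing to 1 across iterations, which makes them easy to compare between musicians.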

Analysis of Calculation Results.
We used the PageRank formula to calculate the importance and obtained the influence of 5602 musicians. The results are shown in Table 2.
From Table 2, we find that the musicians with IDs 36106, 3495279, and 3480388 are the three most influential. Their careers began in the decades starting in 1960, 2010, and 2010, respectively. Pop/Rock musicians make up 75% of the top 20, and Latin musicians make up 15%. In terms of career start, 50% of the top 20 began in the 2010s and 25% in the 2000s, so 75% of the top 20 musicians started their careers in the 21st century.
To further explore the relationship between the influence, genre, and starting year of musicians in the network, we conducted an in-depth analysis of the influential musicians among the 5602 musicians. The influence of the 200th most influential musician is 10.12% of that of the most influential musician. Based on the assessment of musicians' influence in previous studies, we defined the top 200 musicians as influential. For these influential musicians, we compared their music genres and the time they started their careers. The results are shown in Figures 4 and 5.
From Figure 4, we find that the most influential music genre is pop/rock, accounting for 65.5%, followed by country, electronic, and R&B music, accounting for 11.5%, 5%, and 5%, respectively. In terms of career start, 45.5% of the influential musicians started in the 2010s and 21.5% in the 2000s, which means that 67% of the most influential musicians started their careers in the early 21st century. A further 26.5% started in the 1990s and 4.5% in the 1980s, so 31% of them began their careers at the end of the 20th century. Based on the above analysis, we can draw a preliminary conclusion: since the end of the 20th century, under the influence of economic and social stability, scientific and cultural development, and other factors, pop/rock, country, electronic, R&B, and other music genres have developed rapidly, and the number and influence of their musicians have increased.

The Principle of Supervised Classification Pattern.
When the response variable is binary, the logistic regression [24] model is usually used. The response variable is denoted by $G$ and takes values in $\{1, 2\}$; the labeling of the elements is arbitrary. The model represents the class-conditional probabilities through the logit (log-odds):

$$\log \frac{\Pr(G = 1 \mid x)}{\Pr(G = 2 \mid x)} = \beta_0 + x^{T} \beta.$$

The linear logistic regression model can be extended to a multinomial logistic model when the categorical response variable $G$ has $K > 2$ levels. The traditional approach is to extend the logit of the binary model to $K - 1$ logits:

$$\log \frac{\Pr(G = \ell \mid x)}{\Pr(G = K \mid x)} = \beta_{0\ell} + x^{T} \beta_{\ell}, \quad \ell = 1, \ldots, K - 1,$$

where $\beta_{\ell}$ is a $p$-dimensional vector of coefficients. We chose a more symmetric approach. Our model is as follows:

$$\Pr(G = \ell \mid x) = \frac{e^{\beta_{0\ell} + x^{T} \beta_{\ell}}}{\sum_{k=1}^{K} e^{\beta_{0k} + x^{T} \beta_{k}}}.$$

Without constraints, this parameterization is not estimable. Therefore, we fit the model by regularized maximum (multinomial) likelihood. In other words, let $p_{\ell}(x_i) = \Pr(G = \ell \mid x_i)$ and let $g_i \in \{1, 2, \ldots, K\}$ be the $i$-th response; we estimate the parameters by maximizing the penalized log-likelihood

$$\max_{\{\beta_{0\ell}, \beta_{\ell}\}_{1}^{K}} \left[ \frac{1}{N} \sum_{i=1}^{N} \sum_{\ell=1}^{K} y_{i\ell} \log p_{\ell}(x_i) - \lambda \sum_{\ell=1}^{K} P(\beta_{\ell}) \right],$$

where $\lambda \ge 0$ is the penalty coefficient, $P(\cdot)$ is the penalty function, and $y$ is the $N \times K$ indicator response matrix with elements

$$y_{i\ell} = I(g_i = \ell).$$
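To make the symmetric multi-class form concrete, the sketch below turns a feature vector and one intercept and coefficient vector per class into class probabilities that sum to 1. It is only an illustration: the function name `multinomial_probabilities` and all numbers are hypothetical, and the coefficients are simply given rather than estimated by penalized likelihood.

```python
import math


def multinomial_probabilities(x, intercepts, coefs):
    """Symmetric multinomial logistic probabilities for one observation."""
    scores = [
        b0 + sum(b * xi for b, xi in zip(beta, x))
        for b0, beta in zip(intercepts, coefs)
    ]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example with three classes and two features
toy_x = [1.0, 2.0]
toy_intercepts = [0.0, 0.0, 0.0]
toy_coefs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs = multinomial_probabilities(toy_x, toy_intercepts, toy_coefs)
```

Adding the same constant to every class score leaves the probabilities unchanged, which is why this parameterization is not identifiable without a constraint or a penalty.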

Construction of Music Similarity Model.
We use the multinomial logistic model [25] to build the music similarity index. The specific process is as follows:
(i) Taking the 20 music genres as the dependent variable and the music characteristics as independent variables, we fit a multinomial logistic model to obtain, for each piece of music, its probability of belonging to each of the 20 genres, $cp_i = (cp_{i1}, cp_{i2}, \ldots, cp_{i20})$, $i = 1, 2, \ldots, N$.
(ii) For each piece of music, we select the $k$ genres with the highest probability. We set $k = 6$; that is, we obtain the set $C_i$, $i = 1, 2, \ldots, N$, of the six genres to which piece $i$ most likely belongs.
(iii) For two pieces $i$ and $j$, we count the number of elements in the intersection, $num_{ij} = \mathrm{length}(\mathrm{intersection}(C_i, C_j))$, and finally obtain the similarity $s_{ij} = num_{ij}/k$.
This approach holds that each music genre has its own musical characteristics, so each genre can be regarded as a complete musical feature. If two pieces of music are very similar, their musical characteristics should be very similar, and the six genres with the highest probability obtained from the logistic model should largely coincide. We take two songs, Ernie and Wandering Eye, by musician 178301 as an example.
Through the multinomial logistic analysis, the six genres with the highest probability for each song are shown in Table 3.
The two sets coincide, so their similarity is 1; the two pieces of music are very similar.
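The overlap-based similarity can be sketched as follows: take the k most probable genres of each piece, intersect the two sets, and divide the size of the intersection by k. The function names and the genre probabilities below are hypothetical and chosen only to show the arithmetic; the paper uses k = 6 over 20 genres, whereas the toy example uses four genres and smaller k.

```python
def top_k_genres(probabilities, k=6):
    """Return the set of the k genres with the highest probability."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return set(ranked[:k])


def similarity(prob_i, prob_j, k=6):
    """Share of the top-k genres that two pieces have in common."""
    return len(top_k_genres(prob_i, k) & top_k_genres(prob_j, k)) / k


# Hypothetical genre probabilities for two songs
song_a = {"Pop/Rock": 0.50, "Country": 0.30, "Jazz": 0.15, "Latin": 0.05}
song_b = {"Pop/Rock": 0.40, "Jazz": 0.35, "Country": 0.20, "Latin": 0.05}

sim_k2 = similarity(song_a, song_b, k=2)  # top-2 sets share only Pop/Rock
sim_k3 = similarity(song_a, song_b, k=3)  # top-3 sets are identical
```

The measure ignores the order of the top-k genres, so two pieces with the same candidate genres in a different order still obtain similarity 1.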
In addition, the logistic-based music similarity measurement model can also be used to calculate the similarity between musicians. As shown in the left panel of Figure 6, the deeper the color between two musicians, the closer their music features are and the higher the similarity between their music.

Case Analysis.
The music characteristics of different musicians determine the distance between them, which in turn determines their similarity. So, are musicians within a genre necessarily more similar than musicians from different genres? To study this question, we randomly selected 100 of the 5602 musicians and analyzed the similarities between musicians of the same genre and of different genres.
The results are shown in the right panel of Figure 7. The similarity values between genres in Figure 7 show that, in general, the music similarity within the same genre is higher than that between different genres. The similarity within a genre is above 0.6. Similarity values as high as 1 arise because the small sample size leaves only a few musicians from some genres; this can be resolved by enlarging the sample. Some genres are more similar to another genre than to themselves. For example, the similarity between vocal musicians and country musicians is 0.65, which is greater than the similarity among country musicians (0.64). A tentative explanation is that the vocal genre became popular earlier than country music, so its musical characteristics were already established; as country music developed, it could draw on the style of the vocal genre, which led to the high similarity.

Bidirectional Clustering Principle.
The purpose of bidirectional clustering [26] is to find the submatrices of the gene expression data matrix that satisfy a given condition, such that the expression of the feature set in each submatrix is consistent across the corresponding observation set. This is similar to best subset selection in regression. Just as best subset selection is successfully tackled by solving a convex surrogate problem (the lasso), we use a convex relaxation of the combinatorial problem to select the row and column partitions.
The bidirectional clustering model is then equivalent to the chessboard mean model, which is exhaustive because each matrix element is assigned to exactly one bidirectional cluster. This differs from other bidirectional clustering models, which identify possibly overlapping row and column subsets but are not exhaustive. Parameter estimation for the chessboard model consists of estimating the partitions and the mean value of each partition.
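The chessboard model can be illustrated with a small sketch: once a row partition and a column partition are given, every matrix element belongs to exactly one block and is replaced by the mean of that block. The sketch covers only this averaging step and takes the partitions as given; it does not perform the convex optimization that finds them. The function name `checkerboard_means`, the labels, and the data are hypothetical.

```python
def checkerboard_means(x, row_labels, col_labels):
    """Replace each element by the mean of its (row group, column group) block."""
    sums, counts = {}, {}
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return [
        [sums[(row_labels[i], col_labels[j])] / counts[(row_labels[i], col_labels[j])]
         for j in range(len(row))]
        for i, row in enumerate(x)
    ]


# Hypothetical 3 x 3 matrix with two row groups and two column groups
toy_x = [
    [1, 3, 10],
    [3, 1, 10],
    [7, 7, 0],
]
u = checkerboard_means(toy_x, row_labels=[0, 0, 1], col_labels=[0, 0, 1])
```

The result is constant within each block, which is the chessboard pattern that the reconstructed matrix approaches as the penalty grows.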

Bidirectional Clustering Process.
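We determine the partition by minimizing the following convex criterion:

$$F_{\gamma}(U) = \frac{1}{2} \lVert X - U \rVert_F^2 + \gamma \left[ \sum_{j > i} \omega_{ij} \lVert U_{\cdot i} - U_{\cdot j} \rVert_2 + \sum_{j > i} \tilde{\omega}_{ij} \lVert U_{i \cdot} - U_{j \cdot} \rVert_2 \right],$$

where $U_{\cdot i}$ ($U_{i \cdot}$) represents the $i$-th column (row) of the matrix $U$. The quadratic term quantifies how well $U$ approximates $X$, and the regularization term penalizes deviations from the chessboard pattern. The parameter $\gamma \ge 0$ adjusts the trade-off between the two terms, and the weights are symmetric, $\omega_{ij} = \omega_{ji}$.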
In this paper, the alternating minimization algorithm (AMA) proposed by Chi and Lange [27] is used to solve the convex clustering problem. As the penalty coefficient increases (1, 5.62, 31.62, and 177.82), we obtain the clustering process shown in Figures 8(a)-8(d).
It can be seen that, as the penalty coefficient increases, the matrix U that reconstructs X gradually takes on the overall chessboard pattern. Next, we look only at the clustering result for a single selected penalty coefficient, shown in Figure 9.
According to the results of bidirectional clustering, the 20 music genres are divided into 5 groups. Comparing the rhythm-related characteristic values, Latin, country, children's, reggae, blues, pop/rock, R&B, electronic, and religious music are similar: all have a good sense of rhythm and positive vitality. There are obvious similarities among international, folk, and vocal music, all of which have a certain sense of rhythm and moderate overall loudness.
Through the comparison of sound, duration, and musical instruments, we can see obvious similarities among the classical, new age, and stage and screen genres, which have similar repertoire and a higher degree of instrumentalness. Similarly, easy listening is very similar to jazz in that both have specific vocals in their repertoire. Comedy/spoken forms a group of its own. Among the 19 music genres, only comedy/spoken has positive values of speechiness and liveness, which indicates that comedy/spoken is a genre of speaking or reciting for an audience.

The Screening Principle of Actual Influence.
A musician can list a dozen or more musicians who have influenced them. ICM provides a data set that records the influence relationships between musicians and their followers. But do these influencers really affect the music that their followers make? To further explore the influence of influencers on followers, we use the data set of influencer-follower relationships and the similarity matrix between musicians to construct a matrix that filters out the actual influence.
First, a 5602 × 5602 0-1 matrix M is constructed from the data set, where 0 indicates no influence relationship between two musicians and 1 indicates that a relationship exists. Then, the similarity matrix of the 5602 musicians is used to construct a 0-1 matrix P: when an element of the similarity matrix is greater than the threshold, the corresponding element of P is set to 1; otherwise, it is set to 0. Following previous studies, we set the similarity threshold to 0.8. Finally, a new matrix is obtained by multiplying the corresponding elements of M and P. Each element of this matrix combines the similarity and the influence relationship of two musicians, which we define as the actual influence. If there is no influence relationship between two musicians, the actual influence is 0. If there is an influence relationship, the actual influence depends on whether their similarity is greater than the threshold: when it is, there is a real influence relationship between the two musicians.
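The screening step amounts to an element-wise product of the 0-1 influence matrix and the thresholded similarity matrix. The following sketch shows this on a toy case with three musicians; the function name `actual_influence` and all matrix values are hypothetical, and only the threshold 0.8 is taken from the text.

```python
def actual_influence(influence, similarity, threshold=0.8):
    """Element-wise product of the influence matrix and the thresholded similarity."""
    n = len(influence)
    return [
        [influence[i][j] * (1 if similarity[i][j] > threshold else 0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data for three musicians
toy_influence = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
toy_similarity = [
    [1.00, 0.90, 0.50],
    [0.90, 1.00, 0.85],
    [0.50, 0.85, 1.00],
]
screened = actual_influence(toy_influence, toy_similarity)
```

In the toy case, the relationship between the first and third musicians is dropped because their similarity of 0.50 does not exceed the threshold, while the other two relationships are kept.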

Case Analysis.
From the 5602 musicians, we randomly selected the musician with ID 816890 (Johnny Cash) as an example. Based on the influence relationships and the similarity data between Johnny Cash and other musicians, the screening matrix of actual influence is constructed. We thus found the musicians who really influenced him, as shown in Table 4. After identifying the musicians who really influenced Johnny Cash, we asked a further question: do some of these musicians have more "infectious" musical characteristics, or do they all play a similar role in shaping Johnny Cash's music? To explore this problem, we proceeded as follows.
First, we obtain the normalized music characteristics of Johnny Cash and of the musicians who really influenced him, as shown in Table 5.
Then, for each of the nine music features, we calculate the sum of absolute distances between Johnny Cash and the musicians who really influenced him. For example, for danceability we get the sum of absolute distances

$$d_{\text{danceability}} = \sum_{k} \left| x_{k} - x_{0} \right|,$$

where $x_0$ is the danceability of Johnny Cash and $x_k$ is that of the $k$-th musician who really influenced him. The sums for the nine features are shown in Figure 10.
Finally, Figure 10 shows that, compared with the other musical features, the sums of absolute distances for acousticness and instrumentalness are very small. The real influence on the musician with ID 816890 is therefore concentrated in these two musical characteristics.
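The per-feature sum of absolute distances can be sketched as below. For each feature, the distance between the target musician and every musician who really influenced him is added up; a small sum means that the influencers are close to the target on that feature. The function name and the feature values are hypothetical and serve only to illustrate the calculation.

```python
def sum_abs_distances(target, influencers):
    """Sum of |influencer value - target value| for each feature."""
    return {
        feature: sum(abs(inf[feature] - value) for inf in influencers)
        for feature, value in target.items()
    }


# Hypothetical normalized feature values
toy_target = {"acousticness": 0.60, "danceability": 0.50}
toy_influencers = [
    {"acousticness": 0.62, "danceability": 0.30},
    {"acousticness": 0.58, "danceability": 0.80},
]
distances = sum_abs_distances(toy_target, toy_influencers)
```

In this toy case, the sum for acousticness is much smaller than the sum for danceability, which is the kind of pattern the case analysis reads as real influence on a feature.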

An Analysis of the Characteristics of Revolution.
In the process of music evolution, some revolutionary changes in music characteristics may lead to a significant leap in music evolution [28,29]. So, which music characteristics are revolutionary in the data? To find them, we did the following work. We use the "data_by_year" data set to analyze the fluctuation of the music characteristics from 1921 to 2020. To eliminate the effect of different value levels and measurement units on the measure of dispersion, we use the dispersion coefficient (coefficient of variation) to analyze the fluctuation of the music characteristics. The calculation formula is as follows:

$$CV = \frac{s}{\bar{x}},$$

where $s$ is the standard deviation and $\bar{x}$ is the mean of a music characteristic over the years. We calculate the dispersion coefficients of the ten musical characteristics and set the threshold to 0.5: when the dispersion coefficient of a musical feature is greater than 0.5, we regard the feature as revolutionary to a certain extent. The dispersion coefficient of each music feature is shown in Table 6.
From Table 6, we can see that the dispersion coefficients of acousticness, instrumentalness, and speechiness are greater than 0.5, so these are the revolutionary music features.
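The screening by dispersion coefficient can be sketched as follows: the standard deviation of a feature's yearly values is divided by its mean, and features whose ratio exceeds 0.5 are flagged. The yearly values below are hypothetical, and the sample standard deviation is an assumption, since the text does not specify which version is used.

```python
from statistics import mean, stdev


def dispersion_coefficient(values):
    """Standard deviation divided by the absolute mean (coefficient of variation)."""
    # Assumption: sample standard deviation; the mean must be nonzero
    return stdev(values) / abs(mean(values))


# Hypothetical yearly averages of two features
toy_yearly = {
    "acousticness": [0.9, 0.5, 0.1],
    "energy": [0.4, 0.5, 0.6],
}
coefficients = {name: dispersion_coefficient(v) for name, v in toy_yearly.items()}
revolutionary = [name for name, cv in coefficients.items() if cv > 0.5]
```

Because the ratio is unit-free, features measured on different scales can be compared against the same threshold.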

The Impact of Major Changes.
The characteristics of revolutionary music are often reflected in the songs created by musicians. So, in the directed network of musicians' influence, are there any highly influential musicians who drove the major changes? To find out, we did the following work.
First, we obtain the trends of acousticness, instrumentalness, and speechiness from 1921 to 2020.
Second, we take as the standard the values of the three revolutionary music characteristics in the years when they changed greatly and look for the songs whose music characteristics have the smallest absolute distance from the standard value in the corresponding years. For 1924, 1926, 1927, 1929, 1930, 1935, and 1946, a total of 10 tracks with revolutionary characteristics were found (as shown in Table 7). Among them, the track with revolutionary music characteristics in 1924 is not in the "influence_data" data set, so we looked for similar tracks in the preceding few years. In 1921, we found that a track created by the musician with ID 26350 is the closest to the revolutionary music.
Finally, based on the tracks with revolutionary music characteristics, we identify the musicians who created them and calculate their music influence through the directed influence network. The results are shown in Table 8.

The Coordinate Descent Method for Solving the Lasso Regression.
Lasso is a linear model that estimates sparse coefficients. It tends to prefer solutions with fewer nonzero coefficients, which effectively reduces the number of features on which a given solution depends. Under certain conditions, lasso can exactly recover the set of nonzero coefficients [30].
Mathematically, it consists of a linear model with an additional regularization term. The objective function to minimize is as follows:

$$\min_{w} \frac{1}{2n} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1,$$

where $n$ is the number of samples. The lasso estimate therefore solves the least-squares minimization with the penalty $\alpha \lVert w \rVert_1$ added, where $\alpha$ is a constant and $\lVert w \rVert_1$ is the L1 norm of the coefficient vector.
Lasso uses the coordinate descent method to fit the coefficients. Unlike gradient descent, coordinate descent moves along the direction of one coordinate axis at a time, iterating step by step to find the minimum of the function.
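A minimal coordinate descent sketch for the lasso is given below. It assumes the objective is the squared error divided by twice the number of samples plus alpha times the L1 norm of the coefficients, no intercept (centered data), and no all-zero feature column. Each coordinate update has a closed form based on soft-thresholding. The function names, the fixed number of sweeps, and the toy data are hypothetical; this illustrates the technique and is not the implementation used in the paper.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(x, y, alpha, n_sweeps=200):
    """Cyclic coordinate descent for (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = len(x), len(x[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes feature j
            rho = sum(
                x[i][j] * (y[i] - sum(x[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(x[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Hypothetical toy data with two orthogonal features
toy_x = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_y = [2.0, -2.0, 0.1, -0.1]
weights = lasso_coordinate_descent(toy_x, toy_y, alpha=0.1)
```

In the toy data, the weak second feature is set exactly to zero while the strong first feature is only shrunk, which is the sparsity effect that makes lasso useful for screening indicators.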

Lasso Regression.
To analyze how the influence of a music genre develops over time, we use lasso regression to screen out the indicators that reveal the dynamic influence factors and then study how music genres and musicians change over time [31].
First, the normalized artist data set is used. The independent variables are the musical characteristics of the musician, and the dependent variable is set as follows: The results of lasso regression are as follows: They show that the influence of three independent variables, namely, valence, loudness, and acousticness, on the dependent variable y is more significant, so these variables can reveal the dynamic influencers.
Second, we randomly select a musician and analyze the change over time according to the results of lasso regression. We selected the musician with ID 26350. Figures 14-16 show the changes of this musician's three musical characteristics, valence, loudness, and acousticness, over time.
As Figures 14-16 show, the valence of the musician with ID 26350 fluctuates between 0 and 0.8 but mostly stays between 0.2 and 0.5. His loudness gradually decreased over time but remained in a typical range. His acousticness fluctuates over time, but the fluctuations are generally small and the value is relatively stable, with only a few relatively large fluctuations.
Finally, we choose a genre and use the same method to analyze its changes over time. We choose country music; the changes of the three music features over time are shown in Figures 17-19. From Figures 17-19, we find that the valence of country music shows a downward trend over time, from 0.65 to 0.5, which indicates that the style of country music is shifting from happy toward sad. The loudness of country music increases over time, but the range of change is small, and it has stayed at about -10. The acousticness of country music has decreased substantially, from 0.7 to 0.2.

Sensitivity Analysis
In the lasso regression above, we screened variables with the penalty coefficient set to 0.1, and the resulting formula is as follows: Now, following the empirical study [32], we vary the penalty coefficient in steps of 0.01 over the values 0.08, 0.09, 0.1, 0.11, and 0.12 and examine how the regression coefficients change in the corresponding fitted formulas. Finally, we plot the results as a line chart, as shown in Figure 20.
It can be seen that, as the penalty coefficient increases, the regression coefficients change gently without violent oscillation, which indicates that our model is not sensitive to the penalty coefficient and has good stability.

Conclusion
Based on PageRank, this paper establishes a dynamic analysis network of music influence using 11 music characteristic indexes and analyzes the music influence of different genres and musicians. Using multinomial logistic regression, it establishes a music similarity measurement model, analyzes the music similarity between different genres and musicians, and combines this with the music influence network to analyze whether the interaction between musicians has a practical impact on their works. At the same time, from the perspective of music genres, bidirectional clustering is used to analyze the mutual influence and similarity between different music genres and within the same genre. Finally, lasso regression is used to select features, explore the factors of change in the process of music development, and analyze the dynamic change process of music [33-36].

In this paper, we choose multinomial logistic regression, bidirectional clustering, and other methods, which are accurate and easy to understand. We also use visualization tools repeatedly to assist the analysis, which helps the reader absorb the information intuitively. However, the analysis also has limitations. The construction of the similarity index has higher computational complexity than a direct distance calculation, so more computing time is needed for large amounts of data. In addition, because of limits on data availability and quantification, this paper does not make full use of some discrete music features.
In future work on model improvement, we will reduce the error by increasing the number of variables and samples and will analyze possible sources of error in data processing, model building, and model solving. To build a model that can analyze more music features (including virtual features), we will collect more information about musicians, expand the number of analysis objects, and make the model more general in practical application.
To sum up, we use network science to build a dynamic network to analyze the similarity of music, its evolution process, and the impact of music on culture. Our research results can provide a theoretical basis for evaluating the influence of different music genres and have research significance and practical value in the fields of music, history, social science, and practice.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.