Educational Evaluation of Piano Performance by the Deep Learning Neural Network Model

In recent years, the piano education industry has come to occupy a huge market. However, existing piano education products fall short in automatically evaluating piano performance. Deep learning (DL) algorithms and the recurrent neural network (RNN) structure can support such automatic evaluation. This paper proposes a Musical Instrument Digital Interface (MIDI) piano evaluation scheme based on the RNN structure and the Spark computing engine, using the Deeplearning4J DL framework. The Deeplearning4J framework runs on the Java Virtual Machine; therefore, the entire system does not require cross-platform development. The Spark distributed computing engine parallelizes music data preprocessing, feature extraction, and model training. Combined with the training user interface (UI) provided by Deeplearning4J, it improves development efficiency. Additionally, the RNN parameters are analyzed. The results demonstrate that the error value of the three-layer RNN structure is smaller than that of its closest rivals. Experimental samples are collected from a few piano training institutions and a MIDI website. The neural network is trained, and the performance of the evaluation model is tested. The results show that the outcomes of the designed piano performance evaluation model are fundamentally consistent with the real levels of the players, which supports its feasibility; after 3000 training cycles, the error of the RNN model is close to 0.01 and the network converges.


Introduction
In the 10th year of reform and opening, the piano examination system was restored, and China entered its first stage of explosive development in piano education [1]. In recent years, as material living standards have improved, people's awareness of their spiritual needs has deepened, and piano education has experienced a second explosive development. Whether students enter an art school for group study or hire a pianist for individual lessons, these teaching methods place high professional demands on piano educators. There is a serious shortage of traditional piano educators in China, so the imbalance between supply and demand is difficult to resolve in the short term [2]. Additionally, traditional educators generally charge high fees, and wooden pianos are expensive, which makes piano study unaffordable for ordinary families. High cost has become an important obstacle to the development of piano education. Therefore, reducing the cost of learning has become an inevitable trend in popularizing piano education. At present, piano teachers mainly rely on their own understanding to monitor, assess, and correct students' performances in class. Teachers and students have different perspectives on what constitutes playing and music. A performance depends not only on whether the notes are right or wrong but also on significant aspects such as rhythm and expressiveness [3]. As a result, the conventional approach to teaching piano music has flaws such as strong subjectivity, insufficient judgement, and high unpredictability [4].
The New Generation Artificial Intelligence Development Plan, which the State Council of the People's Republic of China formally released in July 2017, emphasised the use of intelligent technology to accelerate the reform of talent training models and teaching methods and to build a new education system that includes intelligent and interactive learning [5]. The Artificial Intelligence Innovation Action Plan for Colleges and Universities, published in April 2018 by the Ministry of Education, promoted the advancement of intelligent education; explored a new teaching model based on artificial intelligence (AI); rebuilt the teaching process; and utilised AI to monitor the teaching process, analyse learning situations, and diagnose academic level [6]. In recent years, the deep learning (DL) neural network, one of the more important branches of AI technology, has developed rapidly, especially in computer vision, machine learning, and related directions, and it is increasingly integrated with education. The application of the DL neural network structure in education is growing rapidly [7]. Since 2015, various educational applications of AI have emerged, along with a number of companies dedicated to empowering education with AI. Driven by both national policy and industry, a number of key DL neural network technologies are playing an increasingly important role in education and are gradually coming into wide use [8].
Under the requirements of the times, intelligent teaching evaluation has become a popular evaluation method that draws the attention of the education community. Integrating the DL neural network structure into the evaluation of piano performance education can help students obtain a more comprehensive and complete teaching evaluation [9]. This study proposes a recurrent neural network (RNN)-based Musical Instrument Digital Interface (MIDI) piano performance education evaluation method built on the DL neural network structure. The evaluation is divided into five grades (excellent, good, medium, poor, and poor) to help music teachers understand students' mastery. The goal of this study is to offer crucial technical support for addressing the shortage of piano teachers, lowering the workload of piano teachers, achieving automatic error correction and objective assessment of piano performance, and enhancing the effectiveness of piano music instruction. The main contributions of this work are summarized as follows: (I) the deep learning (DL) algorithm and the recurrent neural network (RNN) structure are analyzed; (II) a MIDI piano evaluation scheme is proposed that is based on the RNN structure and the Spark computing engine and uses the Deeplearning4J DL framework; and (III) the Spark distributed computing engine parallelizes music data preprocessing, feature extraction, and model training.
The remainder of the article is organized as follows. Section 2 discusses deep learning and neural network methods, including forward propagation and backpropagation (BP). Section 3 proposes and discusses the construction of a deep network model for piano performance evaluation based on the RNN method, together with the dataset. Section 4 presents the experimental outcomes and discussion. Finally, Section 5 summarizes the findings of the study and suggests directions for further research.

The DL Neural Network
Analysis. Geoffrey Hinton first proposed DL in Science magazine; it was then studied by later researchers and gradually gained prominence. DL extends the original foundations of machine learning [10]. Compared with a conventional machine learning network, the DL network optimizes the hierarchical data of the network structure, which makes the overall structure more complex. Furthermore, the internal operation algorithms have also made considerable progress [11]. The most common algorithms are classified into common machine learning algorithms and DL algorithms. The classification results are shown in Figure 1(a).
As Figure 1(b) shows, DL learns the inherent laws and representation levels of sample data and uses what it learns to help interpret data in other fields [12]. The original research goal of the DL algorithm is to apply it in the field of AI so that machines can analyze and learn like humans and recognize various forms of data. DL has achieved results in several fields. It should be noted that improving the analytical learning ability of AI through DL helps humans solve many complex data problems [13].
DL is a general term for a family of data research patterns and methods, which are classified according to their specific research content.
According to this classification, DL neural networks include the following: (i) the convolutional neural network (CNN), a neural network system based on the convolution operation; (ii) the deep belief network (DBN), which performs pretraining in the form of a multilayer self-encoding neural network and further optimizes the network's weights by combining discriminant information; and (iii) the recurrent neural network (RNN), a self-encoding neural network based on multilayer neurons, including autoencoder and sparse coding [14].

The RNN Structure Analysis.
Currently, the RNN is one of the three types of neural networks most commonly used in the field of artificial intelligence. Its defining characteristic is that the neurons in the hidden layer can communicate with each other, so that the previous output also affects how the next input is processed [15]. This memory capacity is beneficial for time series analysis. Therefore, the RNN approach is widely applied in natural language processing (NLP), speech synthesis, speech recognition, and other fields. The structure of the basic RNN is shown in Figure 2.
In Figure 2, the RNN receives an input $x_t$ at every moment. Then, based on the state $h_{t-1}$ of the RNN at the previous moment, it produces a new state $h_t$ and an output $O_t$. The current state $h_t$ is jointly determined by the previous state $h_{t-1}$ and the current input $x_t$. At time $t$, the state $h_{t-1}$ condenses the information of the preceding sequence as a reference for the output. The sequence can be extended indefinitely, but a state $h$ of limited dimension cannot store all the information in the sequence. For that reason, the model must learn to retain only the most significant information relevant to the subsequent tasks [16].
[Figure 2: Structure of the basic RNN, with input $x_t$, state $s$, and output $O$, iterating over the data.]

At time $t$, the output value $O_t$ is expressed as follows:

$$O_t = g(V s_t) \qquad (1)$$

In the above equation, $g$ denotes the activation function, $V$ represents the weight, and $s_t$ denotes the weighted sum at time $t$. The expression for $s_t$ at time $t$ is as follows:

$$s_t = f(U x_t + W s_{t-1}) \qquad (2)$$

In the above equation, $f$ denotes the activation function, $U$ represents the weight, $W$ is the state transition weight matrix from the previous moment to the next, $x_t$ represents the input at time $t$, and $s_{t-1}$ represents the weighted sum at time $t-1$.
Substituting (2) into (1) gives the expansion of $O_t$, which shows that the RNN has a strong memory for sequence information:

$$O_t = g\bigl(V f(U x_t + W f(U x_{t-1} + W f(U x_{t-2} + \cdots)))\bigr) \qquad (3)$$
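As a purely illustrative sketch of the recurrence in (1) and (2), the Python snippet below iterates a scalar RNN and compares its third output with the nested expansion obtained by repeated substitution. The weight values, the choice of tanh for f, the identity for g, and the zero initial state are assumptions made for this example and are not taken from the study.

```python
import math

def rnn_outputs(xs, U=0.5, W=0.8, V=1.0, s0=0.0):
    """Iterate s_t = tanh(U*x_t + W*s_{t-1}) and O_t = V*s_t over a sequence.

    U, W, and V are illustrative scalar weights; g is taken as the identity.
    """
    s = s0
    outputs = []
    for x in xs:
        s = math.tanh(U * x + W * s)  # state update, as in (2) with f = tanh
        outputs.append(V * s)         # output, as in (1) with g = identity
    return outputs

def expanded_output_3(x1, x2, x3, U=0.5, W=0.8, V=1.0):
    """O_3 written out by substituting (2) into (1) repeatedly, with s_0 = 0."""
    return V * math.tanh(U * x3 + W * math.tanh(U * x2 + W * math.tanh(U * x1)))

# The step-by-step iteration and the nested expansion give the same value.
step_by_step = rnn_outputs([1.0, 0.0, 0.0])[2]
nested = expanded_output_3(1.0, 0.0, 0.0)
```

Because the first input still influences the third output even though the later inputs are zero, the example shows in miniature the memory effect that the expansion expresses.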

Forward Propagation Algorithm.
The forward propagation algorithm propagates data through the network in the forward direction and computes the quantities defined below [17]. At time $t$, the hidden state $s_t$ is given by

$$s_t = \sigma(U x_t + W s_{t-1} + b) \qquad (4)$$

where $\sigma$ represents the activation function, generally taken as tanh, and $b$ represents the bias. At time $t$, the output $O_t$ is given by

$$O_t = V s_t + c \qquad (5)$$

where $c$ stands for the bias. At time $t$, the predicted output $\hat{y}_t$ is given by

$$\hat{y}_t = \mathrm{softmax}(O_t) \qquad (6)$$

At time $t$, the hidden state $h_t$ is given by

$$h_t = f(s_{t-1}, x_t, \theta) \qquad (7)$$

where $s$ is the internal state, $f$ is the excitation function, and $\theta$ is the weight coefficient inside the recurrent unit.
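The forward pass described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the hidden activation is tanh, the output activation is softmax (the choice the backpropagation discussion names), the initial state is zero, and the toy weight values are invented for the example.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(xs, U, W, V, b, c):
    """Return the predicted distribution at every time step of the sequence xs."""
    s = [0.0] * len(b)  # assumed zero initial hidden state
    predictions = []
    for x in xs:
        pre = [u + w + bi for u, w, bi in zip(matvec(U, x), matvec(W, s), b)]
        s = [math.tanh(p) for p in pre]                   # hidden state with tanh
        o = [vi + ci for vi, ci in zip(matvec(V, s), c)]  # linear output plus bias
        predictions.append(softmax(o))                    # predicted output
    return predictions

# Toy dimensions: 2 inputs, 2 hidden units, 3 output classes (illustrative values).
U = [[0.1, -0.2], [0.3, 0.4]]
W = [[0.5, 0.0], [0.0, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.1]
c = [0.0, 0.0, 0.0]
preds = rnn_forward([[1.0, 0.0], [0.0, 1.0]], U, W, V, b, c)
```

Each element of `preds` is a probability vector over the output classes, one per time step.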

Backpropagation Algorithm.
The RNN backpropagation algorithm builds on the forward propagation algorithm. It computes the gradient of each model parameter, that is, the gradients of $U$, $W$, $V$, $b$, and $c$, by propagating the error backward and applying gradient descent [18]. The loss function is set to the cross-entropy loss function $L$, the output activation function is the softmax function, and the activation in the hidden layer is the tanh function. The total loss function is as follows:

$$L = \sum_{t=1}^{T} L_t \qquad (8)$$

where $L_t$ is the loss at time $t$ and $T$ is the sequence length. The gradients of $V$ and $c$ are given by the following equations, respectively:

$$\frac{\partial L}{\partial V} = \sum_{t=1}^{T} (\hat{y}_t - y_t)\, s_t^{\mathsf{T}} \qquad (9)$$

$$\frac{\partial L}{\partial c} = \sum_{t=1}^{T} (\hat{y}_t - y_t) \qquad (10)$$

where $\hat{y}_t$ is the predicted output, $y_t$ is the target output, and $s_t$ is the hidden state at time $t$.

The Basic Framework of Spark.
Spark is a fast, general-purpose, and scalable in-memory computing engine for big data analysis. The following are the major features of the Spark framework.
(1) Rapidity: Spark supports memory-based computing, which effectively saves input/output (I/O) resources compared with MapReduce's disk-based computing engine. In iterative operations, it runs nearly 100 times faster than MapReduce. Even for disk-based computing, Spark is still nearly ten times faster than MapReduce, owing to the advantages of Scala as a functional language.
(2) Versatility: Spark provides standardized solutions to reduce enterprise development costs.
(3) Scalability: In addition to its own resource scheduler, Spark can also use the Hadoop Yarn or Mesos resource scheduler and can process all data supported by Hadoop [19].
The internal module structure of Spark is shown in Figure 3. The update of parameters between nodes is coordinated by the scheduler.
Data parallelization keeps a copy of the network model on each compute node, and each node trains on its own batch of data. Then, the global network parameters are updated according to a synchronous or asynchronous mechanism [20]. Deeplearning4J mainly uses data parallelization. Currently, Deeplearning4J supports two data parallelization strategies: synchronous parameter averaging and a decentralized gradient sharing scheme [21]. The principle of data parallelization is shown in Figure 4.
In Figure 4, the principle of synchronous parameter averaging is to define a parameter server, for instance, a Parameter Server instance in the Driver. This instance uses a weighted average algorithm to calculate the updated model parameters after collecting the training parameters from multiple nodes. Then, the updated parameters are broadcast to each node [22]. In the decentralized scheme, nodes are connected in pairs and can update parameters with each other. To relieve the bottleneck in network transmission, the decentralized gradient sharing scheme defines a threshold for gradient changes: a gradient parameter is updated only when its change is greater than the threshold.
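The two strategies can be illustrated with a small, framework-independent Python sketch. It does not use the Deeplearning4J API; the function names are hypothetical, weighting each worker by the number of examples it processed is an assumption, and so is the choice to accumulate sub-threshold gradient changes locally until they exceed the threshold.

```python
def weighted_average(worker_params, worker_sizes):
    """Average parameter vectors from several workers.

    Each worker is weighted by the number of examples it processed
    (an assumed weighting; the source only says "weighted average").
    """
    total = sum(worker_sizes)
    dim = len(worker_params[0])
    return [
        sum(p[i] * n for p, n in zip(worker_params, worker_sizes)) / total
        for i in range(dim)
    ]

def threshold_updates(gradient, residual, threshold):
    """Select the gradient components to share with other nodes.

    A component is shared only when its accumulated change exceeds the
    threshold; smaller changes stay in the local residual (assumed behaviour).
    Returns (updates to share as {index: value}, new residual).
    """
    updates = {}
    new_residual = []
    for i, (g, r) in enumerate(zip(gradient, residual)):
        accumulated = g + r
        if abs(accumulated) > threshold:
            updates[i] = accumulated
            new_residual.append(0.0)
        else:
            new_residual.append(accumulated)
    return updates, new_residual

# Parameter averaging: the second worker processed three times as much data.
averaged = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# Gradient sharing: only the large components are sent in the first round.
shared, residual = threshold_updates([0.5, 0.01, -0.3], [0.0, 0.0, 0.0], 0.1)
```

In the averaging case every node sends its full parameter vector each round, whereas in the threshold case only the entries in `shared` need to cross the network, which is the bandwidth saving the scheme aims for.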

Model Construction of Piano Performance Evaluation Based on RNN
This study adopts the Deeplearning4J DL framework, with training and execution on a cluster in Spark-Yarn run mode together with a Deeplearning4J cluster. The user interface (UI) provided by Deeplearning4J is used to monitor the training effect in real time and to adjust the model parameters at any time. Additionally, an RNN model with an attention mechanism can achieve efficient and accurate Musical Instrument Digital Interface (MIDI) piano performance evaluation. The framework of the evaluation model is shown in Figure 5.
In Figure 5, in the data acquisition module, the Sqoop tool is used to migrate data to the Hadoop Distributed File System (HDFS). In the data preprocessing module, raw data that are not suitable for training are filtered out, and the remaining raw data are transformed into an input matrix suitable for RNN model training. The dataset is divided into training, validation, and test datasets. In the music evaluation classification module, the Spark-Yarn cluster is built, and the RNN model is built on this distributed framework. The preprocessed data are fed into the model for training. The model parameters are adjusted in real time through the UI provided by Deeplearning4J, and the model parameters with better evaluation effects are obtained [23].

Data Preprocessing.
Before the model is trained, the MIDI music data are preprocessed. Timestamps in track chunks are unified to 1/16th notes, which simplifies otherwise tedious preprocessing. The preprocessing includes filtering out music data synthesized by multitrack and MIDI software, extracting feature vectors, and designing a more reasonable model input format. Music has two main types of characteristics. The first category is physical characteristics, including pitch and timbre. The second category is time-domain features, including short-term energy, short-term average zero-crossing rate, and short-term average amplitude. Since MIDI music is digital music, the MIDI format can completely record the required physical characteristics, whereas the time-domain characteristics are difficult to obtain [24]. Therefore, physical features are selected as the features to study. Because timbre depends only on the instrument played and the model evaluates only piano music, timbre can simply take a default value. The ordinate of the input feature matrix is set as the time series, the abscissa is set as the key information, and the matrix elements are set as the pitch information. Since the piano keyboard has 88 keys, the dimension of the abscissa is 88. The ordinate dimension is determined by the playing time of the music.
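One way to picture the input feature matrix is the following Python sketch, which quantizes note events to sixteenth-note steps and fills a matrix with one row per time step and one column per piano key. The note representation (start tick, end tick, MIDI pitch), the helper names, the default of 480 ticks per quarter note, and the convention of storing the MIDI pitch number in active cells and 0 elsewhere are assumptions made for illustration; the study does not specify its exact encoding.

```python
LOWEST_PIANO_PITCH = 21  # MIDI note number of A0, the lowest of the 88 piano keys
NUM_KEYS = 88

def to_step(tick, ticks_per_quarter):
    """Quantize a tick position to the nearest sixteenth-note step."""
    return int(round(tick * 4 / ticks_per_quarter))

def build_feature_matrix(notes, ticks_per_quarter=480):
    """Build a (time steps x 88 keys) matrix from (start_tick, end_tick, pitch) notes."""
    steps = []
    for start_tick, end_tick, pitch in notes:
        key = pitch - LOWEST_PIANO_PITCH
        if not 0 <= key < NUM_KEYS:
            continue  # ignore pitches outside the piano range
        start = to_step(start_tick, ticks_per_quarter)
        # Every note occupies at least one time step after quantization.
        end = max(to_step(end_tick, ticks_per_quarter), start + 1)
        steps.append((start, end, key, pitch))
    if not steps:
        return []
    n_steps = max(end for _, end, _, _ in steps)
    matrix = [[0] * NUM_KEYS for _ in range(n_steps)]
    for start, end, key, pitch in steps:
        for t in range(start, end):
            matrix[t][key] = pitch  # assumed encoding: pitch number while the key sounds
    return matrix

# Two eighth notes (middle C, then E above it) at 480 ticks per quarter note.
matrix = build_feature_matrix([(0, 240, 60), (240, 480, 64)])
```

The number of rows grows with the length of the piece while the number of columns stays fixed at 88, which matches the description of the ordinate and abscissa dimensions.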

MIDI-based Piano Evaluation Neural Network Model.
After a piece of piano playing audio is obtained, a data preprocessing algorithm is first used to obtain the start and end points of each note. At this point, the duration of each note is also determined. The functional requirement of this model module is to evaluate multiple pieces of MIDI music with high accuracy. The evaluation results are divided into five grades: excellent, good, medium, poor, and poor. This study uses the RNN structure and an attention layer in DL, with classification by the softmax function. The complete model structure design is shown in Figure 6:

Mobile Information Systems
In Figure 6, multiple subnet models are designed so that multiple pieces of piano music can be evaluated. Each subnet model is trained separately and evaluates one specific piano piece. The evaluation subnet model obtains a feature matrix by training on sample repertoires performed at different levels. Then, a classification algorithm is used for evaluation. The main components of the evaluation subnet model are the input layer, the bidirectional long short-term memory (LSTM) layer, the attention mechanism layer, and the output layer [25]. The input layer receives the sample repertoire of differing levels, obtains the input feature matrix through data preprocessing, and then feeds it into the RNN structure. After the sample passes through the attention mechanism layer, the evaluation is obtained through the softmax function of the output layer.
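The last two stages of the subnet (attention over the recurrent outputs, then a softmax over five grades) can be sketched as follows. The scoring rule used here, score_t = v · tanh(h_t), is one common form of attention and is an assumption, since the study does not state which form it uses; the hidden states stand in for bidirectional LSTM outputs, and all names and values are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, v):
    """Weight each time step and return (attention weights, context vector).

    score_t = v . tanh(h_t) is an assumed scoring rule for illustration.
    """
    scores = [sum(vi * math.tanh(hi) for vi, hi in zip(v, h)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]
    return weights, context

def classify(context, W_out, b_out):
    """Map the context vector to probabilities over the evaluation grades."""
    logits = [sum(w * x for w, x in zip(row, context)) + b for row, b in zip(W_out, b_out)]
    return softmax(logits)

# Stand-in recurrent outputs for two time steps with two features each.
hidden_states = [[0.0, 0.0], [1.0, 1.0]]
weights, context = attention_pool(hidden_states, v=[1.0, 1.0])

# Five output rows, one per evaluation grade (untrained, illustrative weights).
W_out = [[0.2, -0.1], [0.0, 0.3], [0.1, 0.1], [-0.2, 0.0], [0.0, -0.3]]
grade_probs = classify(context, W_out, [0.0] * 5)
predicted_grade = grade_probs.index(max(grade_probs))
```

The attention weights sum to one, so the context vector is a weighted average of the time steps, and the index of the largest probability gives the predicted grade.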

Data Acquisition.
The data for this experiment come mainly from two sources. One part comes from the database provided by five piano training courses in Guangxi, and the other part comes from the MIDI Show web page, a platform where MIDI enthusiasts upload their works and score them. The scores are divided into five intervals, which correspond to the five evaluation levels of the system. To simplify the implementation and analysis of the evaluation model, this study selects only piano music in 4/4 time [26]. The details of the experimental dataset are shown in Figure 7. In Figure 7, 30 piano pieces are selected for each evaluation level, for a total of 150 pieces. For each evaluation level, 15 pieces are selected as the training set, 5 as the validation set, and 10 as the test set.
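The per-level split described above (15 training, 5 validation, and 10 test pieces out of 30 for each of five levels) can be reproduced with a short Python sketch. The file names, the shuffling step, and the fixed seed are assumptions for the example; the study does not say how pieces were assigned to each subset.

```python
import random

def split_by_level(pieces_by_level, n_train=15, n_val=5, n_test=10, seed=0):
    """Split each level's pieces into train, validation, and test subsets."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    splits = {"train": [], "val": [], "test": []}
    for level, pieces in sorted(pieces_by_level.items()):
        if len(pieces) < n_train + n_val + n_test:
            raise ValueError(f"level {level} has too few pieces")
        shuffled = list(pieces)
        rng.shuffle(shuffled)
        splits["train"] += [(p, level) for p in shuffled[:n_train]]
        splits["val"] += [(p, level) for p in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(p, level) for p in shuffled[n_train + n_val:n_train + n_val + n_test]]
    return splits

# Hypothetical file names: five levels with 30 pieces each.
dataset = {
    level: [f"level{level}_piece{i:02d}.mid" for i in range(30)]
    for level in range(5)
}
splits = split_by_level(dataset)
```

Splitting within each level keeps the five grades equally represented in all three subsets, giving 75 training, 25 validation, and 50 test pieces in total.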
The quality of the music data in the databases is uneven, and neural network models cannot read raw data directly. Therefore, the data are preprocessed in two steps: (1) Music files synthesized by MIDI software are filtered out. The filtering process is shown in Figure 8; note that a MIDI file is filtered out when its number of pitch values is less than 40, or when its number of pitch values is greater than 80 while [...] is less than 20. (2) The filtered data are vectorized; that is, the data in MIDI form are converted into an input matrix that the RNN structure can read.

Analysis of RNN Parameters.
The number of layers in the RNN structure and the number of nodes in each layer directly affect the reliability and accuracy of the model. A reasonable choice of both makes the model optimal and minimizes the prediction error [27]. The value of the loss function $L$ for different numbers of layers and nodes of the RNN is analyzed separately, as shown in Figure 9.
In Figure 9, when the RNN structure has a single layer, the $L$ value decreases as the number of nodes increases. Once the number of nodes exceeds 352, the rate of decrease slows. When the RNN structure has two layers, the $L$ value likewise decreases as the number of nodes increases; once the number of nodes exceeds 176, the rate of decrease slows. When the RNN structure has three layers, the relationship between the number of nodes and the $L$ value shows no obvious trend. Therefore, a three-layer RNN structure is chosen [28], and the numbers of nodes in the three layers are set to 352, 176, and 88.

The RNN Training.
The mean square error (MSE) is used to assess the performance of the proposed RNN model. Once the input parameters that affect the piano performance education evaluation and the parameters and structure of the RNN have been determined, the neural network is trained until it reaches the required precision and accuracy. First, the standard information for each feature is obtained from the piano teacher's performance and MIDI file. Then, input features are extracted from piano performances at different levels. The training samples are obtained mainly by having two piano teachers and three students play the Minuet piano piece several times. The input data to the RNN are derived from each performance [29,30]. The overall sense of rhythm and the level of playing expression are then assessed separately. The supplied data range from 0 to 1. For this training, 10 samples were gathered. The parameters are as follows: a learning rate of 0.76, a momentum factor of 0.4, and an error of 0.1.
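To make the roles of these parameters concrete, the Python sketch below computes the MSE and applies a classical momentum update to a one-weight toy model. It is not the study's training procedure: the update rule (classical momentum), the reading of the 0.1 error value as a stopping threshold, the toy data, and the function names are all assumptions made for illustration.

```python
def mse(predictions, targets):
    """Mean square error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def momentum_step(weights, grads, velocity, lr=0.76, momentum=0.4):
    """One gradient descent step with classical momentum (assumed update rule)."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + nv for w, nv in zip(weights, new_velocity)]
    return new_weights, new_velocity

def fit_scale(xs, ys, target_error=0.1, max_epochs=3000):
    """Fit y = w * x with a single weight; stop once the MSE falls below target_error.

    Returns (weight, final error, number of update steps taken).
    """
    w, v = [0.0], [0.0]
    for step in range(max_epochs):
        preds = [w[0] * x for x in xs]
        error = mse(preds, ys)
        if error < target_error:
            return w[0], error, step
        # Gradient of the MSE with respect to the single weight.
        grad = [2 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)]
        w, v = momentum_step(w, grad, v)
    return w[0], mse([w[0] * x for x in xs], ys), max_epochs

# Toy data in the 0-1 range; the true relation is y = x.
weight, final_error, steps = fit_scale([0.2, 0.5, 0.9], [0.2, 0.5, 0.9])
```

The momentum factor carries part of the previous update into the next one, which smooths the descent, while the learning rate scales how far each gradient step moves the weights.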

Analysis of the RNN Training Results.
The training of the RNN structure model is shown in Figure 10.
In Figure 10, as the number of training iterations of the RNN increases, the resulting error value gradually decreases. After 3000 training iterations, the error value of the RNN structure essentially matches the standard error value, and the network reaches its convergence accuracy. Additionally, the correlation coefficient between the output of the RNN structure network and the target is as high as 0.99947, showing a high degree of fit. The performance of the designed RNN can therefore meet practical requirements. The designed teaching evaluation model is used to evaluate the piano teacher, student A (piano level 6), and student B (piano level 5), and the evaluation results are shown in Figure 11.
In Figure 11, the average value of the overall evaluation of the piano teachers is 0.92, the average value of the

Conclusions and Future Work
The proposed RNN-based MIDI piano performance education evaluation method makes up for the shortcomings of the rule-based evaluation method, which cannot take the coherence and expressiveness of music into account. First, raw data are selected from the MIDI database and the educational data of local piano training institutions. Then, the raw data are preprocessed. Next, the three-layer bidirectional LSTM neural network and the attention mechanism make it easier for the model to capture useful information. Additionally, the Spark cluster is built, and the Deeplearning4J DL framework is used to train the model. Work efficiency is improved by adjusting the model parameters through the UI provided by Deeplearning4J. The RNN parameters are also analyzed, and the results show that the error value of the three-layer RNN structure is smaller. Experimental samples collected from local piano training institutions and a MIDI website are used to train the neural network and to test the performance of the evaluation model. The findings demonstrate that (1) the evaluation outcomes of the developed piano performance evaluation model are largely consistent with the actual skill level of the players and have some degree of viability; and (2) after 3000 training cycles, the RNN error is close to 0.01, and the network converges. The disadvantage is that the research remains at the theoretical level.
DL networks continue to develop, so the evaluation model needs to be tested on a larger amount of data and updated in line with the latest developments. The purpose is to provide important technical support for improving the efficiency of piano music teaching. Likewise, deep neural networks and the impact of the model's activation functions should be analyzed in subsequent research. The Spark framework runs the deep learning algorithms in parallel; however, the computational time is still limited on a single online system. Therefore, big data technologies such as cloud and edge computing should be used to improve training efficiency in terms of computational time.

Data Availability
The experimental data used to support the findings of this study are available from the author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding this work.