A Factorization Deep Product Neural Network for Student Physical Performance Prediction

Sports are of great benefit to students. However, under growing academic pressure, teachers and parents pay little attention to students' physical education, so analyzing and predicting physical education (PE) performance has become significant work. This paper proposes a new method, the factorization deep product neural network, for PE course score prediction. The experimental results show that, compared with the existing performance prediction methods (LR, SVM, FM, and DNN), the proposed method achieves the best prediction effect on the sports education dataset. Compared with the best of the traditional methods (DNN), accuracy and AUC are both improved by 2%. There are also significant improvements in accuracy, recall, and F1. In addition, this study found that considering two or more features at the same time influences the prediction of students' grades. The proposed feature combination method learns feature combinations automatically, considers first-order, second-order, and high-order features simultaneously, and captures the relationship between each feature and performance. Compared with single-feature learning, the proposed method significantly improves prediction accuracy. Moreover, several dimensionality reduction methods are compared in this paper, and the model with PCA-based data processing outperformed all the benchmark models.


Background
In the information age, a large amount of data has been accumulated in all walks of life, and useful knowledge and valuable information are often hidden in it. At present, technologies related to machine learning and data mining are widely used in business, finance, medicine, and other fields.
With the fast development of the Internet, universities have steadily improved their digital campuses, and all kinds of educational data have been accumulated. This massive educational data often contains latent knowledge and information that can promote the development of education. Machine learning (ML), data mining, and other related technologies are used to provide valuable information for teachers, students, and educational researchers, so as to scientifically improve teaching methods and make comprehensive management decisions. Therefore, it is worth studying how to obtain better teaching efficiency and educational output from educational big data. From the 1990s to the beginning of this century, with the fast development of the Internet, education informatization gradually entered the network era, and distance education and online education attracted more and more attention from educators. The current mainstream education environment can be roughly divided into traditional classroom education and new online education, such as MOOCs. Educational data mining (EDM) deals with various problems in teaching, practice, and educational research through theories and technologies from multiple disciplines, including pedagogy, computer science, statistics, and psychology. Currently, EDM application scenarios can be mainly divided into four categories: student performance prediction, student modeling, recommendation systems, and visualization.

Related Works
At present, education quality has become the top priority in education, and improving it is one of the firm commitments of educators. Many scholars have studied the prediction of students' academic performance. Early research mainly focused on collecting student learning data (such as traditional classroom test scores) from the educational administration system; students' consumption behavior data have also been collected from campus cards to predict scores. Burman used records collected by questionnaires to classify learners into high, average, and low levels of academic performance with a multiclass support vector machine, based on students' psychological parameters, including personality, motivation, psychosocial background, learning strategies, learning methods, and socioeconomic status [1]. A team including Sweeney used SVD, SVD-KNN, factorization machines, and other recommendation system methods to predict the grades of the next semester and made a comprehensive analysis of the predicted results [2]. Sweeney proposed a hybrid of factorization machines and random forests to predict students' scores from the scores of courses they had already taken [3]. A team including Polyzou proposed sparse linear and low-rank matrix decomposition models to predict future course scores based on students' historical course scores [4]. Yi et al. predicted students' scores through a multikernel support vector machine combined with an optimization algorithm and then successfully evaluated the teaching quality [5]. AI-based methods play an increasingly important role in teaching quality evaluation [6,7] and student performance prediction [8,9].
Recently, with the development of the Internet and the continuous improvement of online learning platforms, data on MOOC students' learning have attracted more and more attention from researchers. A team including Jiang used learners' interaction records from their first week on the platform, together with homework performance data, to predict with a logistic regression model whether learners would eventually obtain certificates [10]. Brinton and Chiang developed an algorithmic model relying on factorization machines and K-nearest neighbors (KNN) to predict whether a student answers a question correctly on the first attempt in a MOOC [11]. Lorenzo and Gomez-Sanchez adopted logistic regression, stochastic gradient descent, random forest, and support vector machine models to predict whether each of three participation indicators (video, exercise, and assignment) would decline at the end of a chapter relative to its previous value [12]. Hlosta built a student performance prediction model based on machine learning methods (logistic regression, support vector machine, random forest, naive Bayes, and the ensemble learning method XGBoost), using data generated in the current course, to evaluate whether students are at risk of dropping out [13]. A team including Aljohani deployed a deep long short-term memory (LSTM) model on student interaction records from online platforms (such as clickstream data) to explore student performance prediction; the results showed that the model could predict pass/fail outcomes from the first 10 weeks of student interaction in a virtual learning environment with an accuracy of about 90% [14].
Beyond classical ML-based models applied to education, such as LR [15], SVM [16], Decision Tree [17], GBDT [11], and BNs [18], DL-based models such as RNN [19] and CNN [20] can also be used to improve results in the field of education.
To sum up, some achievements have been made in the study of performance prediction, but there are still some problems and shortcomings.
First, in the traditional classroom grade prediction problem, the main data come from the course itself, such as in-class assignment grades and unit test grades. These features are not available until the end of the course, so the prediction arrives late. The method therefore lags, and the data are sparse and of a single type, so it cannot provide effective technical support for teaching and management in the early stage of the course.
Second, lacking other relevant course grade information, existing research on online platform course grade prediction mainly focuses on learners' log data on the learning platform, such as learning time on the platform and the number of clicks on learning videos. In addition, existing research commonly relies on manual feature engineering, which depends heavily on the professional knowledge and experience of engineers and affects prediction accuracy to a certain extent.
Third, most of the data used in existing research on performance prediction come from datasets constructed by the researchers themselves, and the data are generally insufficient. Mainstream research methods such as machine learning algorithms place certain requirements on the amount of data. If the data are insufficient, it is difficult to train a good model, which lowers prediction accuracy to a certain extent. In view of the above problems, in this paper, two different kinds of data are used to put forward different performance prediction models, so as to improve the accuracy of performance prediction.
To sum up, against the background of educational data mining, this paper carries out in-depth and systematic research on student performance prediction from the perspectives of traditional classroom teaching scenarios and MOOC online platform courses. The research focus is mainly on improving the predictability and accuracy of the method. In the following sections, the main research content of this paper is briefly introduced.

Problem Definition.
Let the student feature set $F$ be determined by the student attribute features $\text{student\_attr} = \{s_1, s_2, \ldots, s_m\}$, the course attribute features $\text{course\_attr} = \{c_1, c_2, \ldots, c_n\}$, and the student learning behavior features $\text{behavior}$. Namely, $F = \{\text{student\_attr}, \text{course\_attr}, \text{behavior}\}$, where $m$, $n$, and $k$ are the numbers of student attribute, course attribute, and learning behavior features, respectively. For a student, the final course score is $y_t$. $y = \{0, 1\}$ is the class set divided by grades, where 0 indicates that the student fails and 1 indicates that the student passes. Student grade prediction aims to predict the grade category $y_i$ according to the student features $F$.

Model Framework.
This chapter aims at mining and analyzing the data related to students' learning with deep learning technology in order to predict students' academic performance accurately, so that timely help and guidance can be provided to students at risk of failing exams. Therefore, this chapter proposes a performance prediction model, the factorization deep product neural network (FDPN), based on feature combination, course attributes, and students' learning behavior features. The model framework is shown in Figure 1. The FDPN contains 3 layers: (1) Embedding layer: reduces the dimension of the original high-dimensional features and maps them to low-dimensional feature vectors. (2) Concatenate layer: composed of three parts, namely the factorization machine (FM), the DNN, and the product neural network (PNN). FM is used to express first-order and second-order features, and DNN and PNN are used to represent higher-order features. (3) Prediction layer: through concatenation, the low-order and high-order features are combined to obtain final features with richer information, so as to better predict students' performance.

Embedding Layer.
Because the raw data are relatively sparse, dimensionality reduction is applied to obtain a low-dimensional representation of the features. Converting each initial feature to a lower-dimensional vector makes the data relatively dense and reduces computational effort. Figure 2 shows the structure of the embedding layer.
The output of the embedding layer can be written as follows:
$$a = [e_1, e_2, \ldots, e_p],$$
where $a$ represents the embedded features, $e_i$ represents the $i$-th embedding feature, $p$ is the number of embedding features, and $p \le (m + n + k)$.
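The lookup performed by an embedding layer can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the field sizes, the embedding dimension, and the function names `build_embedding_tables` and `embed` are assumptions chosen for the example.

```python
import numpy as np

def build_embedding_tables(field_sizes, embed_dim, seed=0):
    """Create one randomly initialised lookup table per categorical field."""
    rng = np.random.default_rng(seed)
    return [rng.normal(0.0, 0.01, size=(size, embed_dim)) for size in field_sizes]

def embed(tables, field_indices):
    """Look up one dense vector e_i per field and stack them as a = [e_1, ..., e_p]."""
    return np.stack([table[idx] for table, idx in zip(tables, field_indices)])

# Hypothetical fields with 5, 3, and 13 distinct categories (illustrative only).
tables = build_embedding_tables(field_sizes=[5, 3, 13], embed_dim=4)
a = embed(tables, field_indices=[2, 0, 7])
print(a.shape)  # p = 3 embedding vectors, each of dimension 4
```

In a trained model the tables would be learned jointly with the rest of the network; here they are random so that only the shape of the mapping from sparse indices to dense vectors is shown.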
Computational Intelligence and Neuroscience

Concatenate Layer
(1) Factorization machine. The FM, as proposed by Rendle [21], learns feature interactions. In the FM formula, $w_0$, $w_i$, and $w_{ij}$ are the weights of each feature. The first-order part of the factorization machine is learned as in logistic regression, and the second-order feature information is accumulated through dot products of vectors. The output value $y_{fm}$ of the FM layer is passed as one part of the input to the prediction layer.
The DNN [22] has a stronger learning capacity. The output of the embedding layer is the input of the first hidden layer of the DNN, from which the first hidden layer is computed. Assuming that there are $l$ hidden layers, the final output value $y_{dnn}$ of the DNN is passed directly to the input part of the prediction layer. Here $f(\cdot)$ is the activation function of the hidden layers, which is the ReLU [23].
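The two components described above can be sketched in NumPy. The sketch assumes the standard second-order FM of Rendle, in which each pairwise weight is the dot product of two latent vectors, $w_{ij} = \langle v_i, v_j \rangle$, and a plain fully connected ReLU network for the DNN. The function names, shapes, and latent dimension are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fm_output(x, w0, w, V):
    """FM score: w0 + sum_i w_i x_i + sum_{i<j} <V[i], V[j]> x_i x_j.

    Uses the linear-time identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2).
    """
    first_order = w0 + w @ x
    xv = x @ V                     # shape (k,): sum_i x_i * V[i]
    x2v2 = (x ** 2) @ (V ** 2)     # shape (k,): sum_i x_i^2 * V[i]^2
    return first_order + 0.5 * np.sum(xv ** 2 - x2v2)

def fm_output_pairwise(x, w0, w, V):
    """Same score written as an explicit double loop, for comparison."""
    total = w0 + w @ x
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

def relu(z):
    return np.maximum(z, 0.0)

def dnn_forward(a, weights, biases):
    """Feed the flattened embedding output through l ReLU hidden layers."""
    h = a.reshape(-1)
    for W, b in zip(weights, biases):
        h = relu(W @ h + b)
    return h  # y_dnn: activations of the last hidden layer
```

The closed-form FM expression and the explicit pairwise loop give the same value; the former avoids the quadratic cost in the number of features.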
(2) Product neural network. The product neural network (PNN) is a feed-forward deep neural network [24] containing a product layer. In the PNN, the input contains not only first-order feature information but also second-order features, so the product layer enriches the information fed into the deep neural network. Its second-order features are calculated in formula (5). The input vector of the PNN is composed of the first-order feature vector output by the embedding layer and the second-order feature vector generated by interactions within the embedding layer. The final output value $y_{pnn}$ of the PNN is calculated as in formula (4); the PNN differs from the DNN only in the input feature vector passed from the embedding layer to the first hidden layer. The output values of the last hidden layer of the PNN are transmitted directly as one part of the input to the prediction layer.
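How a product layer assembles its input can be sketched as below. The text does not say whether inner or outer products are used, so this sketch assumes the inner-product variant: the second-order part is the inner product of every pair of embedding vectors. The name `pnn_input` is hypothetical.

```python
import numpy as np

def pnn_input(a):
    """Build the PNN input from embeddings a of shape (p, d).

    First-order part: the flattened embedding vectors.
    Second-order part: inner products <e_i, e_j> for all pairs i < j (assumed variant).
    """
    p = a.shape[0]
    gram = a @ a.T                       # gram[i, j] = <e_i, e_j>
    rows, cols = np.triu_indices(p, k=1) # index pairs with i < j
    second_order = gram[rows, cols]
    return np.concatenate([a.reshape(-1), second_order])

a = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 4.0]])
z = pnn_input(a)
print(z)  # six first-order values followed by three pairwise inner products
```

The resulting vector would then pass through hidden layers in the same way as the DNN input, which matches the statement that the two networks differ only in what enters the first hidden layer.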

Prediction Layer.
The prediction layer's primary task is to combine the low-order and higher-order feature representations output by FM, DNN, and PNN in the network layer and predict the grade category of the target student. Integrating these features allows more comprehensive and accurate prediction of students' performance.
In this paper, the features $y_{fm}$, $y_{dnn}$, and $y_{pnn}$ are integrated through concatenation [25]. This process can be formalized as follows:
$$f = [y_{fm}; y_{dnn}; y_{pnn}],$$
where $f$ is the final feature after integrating $y_{fm}$, $y_{dnn}$, and $y_{pnn}$. Finally, the feature $f$ is input into a perceptron with the Sigmoid [26,27] activation function to obtain the probability of the student's course grade category. From the above, FDPN includes three parts, FM, DNN, and PNN, and the final result is obtained by applying the Sigmoid perceptron to the concatenated feature $f$.
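The concatenation and Sigmoid step can be sketched as follows. The weight vector `w` and bias `b` of the output perceptron, and the function name, are assumptions introduced for illustration; the paper's exact parameterisation is not given in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_pass_probability(y_fm, y_dnn, y_pnn, w, b):
    """Concatenate f = [y_fm; y_dnn; y_pnn] and apply a Sigmoid perceptron."""
    f = np.concatenate([np.atleast_1d(y_fm), y_dnn, y_pnn])
    return sigmoid(w @ f + b)

# Illustrative shapes: scalar FM score, 3 DNN activations, 2 PNN activations.
prob = predict_pass_probability(0.7, np.ones(3), np.ones(2), w=np.ones(6), b=0.0)
label = int(prob >= 0.5)  # 1 = pass, 0 = fail, using an assumed 0.5 threshold
```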

Loss Function.
This paper employs the cross-entropy loss function with an L2 regularization term [27]. In the loss function of the model, $n$ is the total number of training samples, $y_i$ is the grade category of the $i$-th sample, $g$ is the predicted probability of the grade category of the $i$-th sample, $\lambda\|\theta\|^2$ is the $L_2$ regular term, and $\theta$ is the set of all parameters of the model.
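A loss of this kind can be sketched as follows, assuming the usual binary cross-entropy averaged over the $n$ samples plus $\lambda\|\theta\|^2$. Whether the paper averages or sums over samples is not stated, so the normalisation here is an assumption, as are the function name and the clipping constant `eps`.

```python
import numpy as np

def cross_entropy_l2_loss(y_true, y_prob, params, lam, eps=1e-12):
    """Mean binary cross-entropy plus lam * ||theta||^2 over all parameter arrays."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    ce = -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    l2 = sum(np.sum(p ** 2) for p in params)
    return ce + lam * l2

loss = cross_entropy_l2_loss([1, 0], [0.5, 0.5], params=[np.array([1.0, 2.0])], lam=0.1)
```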

Data Set.
In this study, part of the data comes from the Open University Learning Analytics Dataset (OULAD), which contains basic information, registration records, and learner activity records from seven sport online courses from 2013 to 2014. Figure 3 shows the learning process on the Open University platform. First, the Open University opens a course for registration, and then students begin their learning. Open University courses usually last for nine months, and learners are required to complete the corresponding learning tasks during the learning process. Finally, learners take the final examination, and the course ends. The data set is described in Table 1. Numbers 1 to 54 refer to the highest degree of students who register for the course, the environment index in school learning, age, times of trying a specific module, credits of students who are currently learning, . . . , and times of clicking additional information before the course, such as video, tape, website, environment index of learning the module, times of clicking shared information between staff before the course, times of clicking PDF resources such as books before the course, and times of clicking information on the website and related activities.
After preprocessing according to the above descriptions, valid data of size 22347 × 33 are obtained.

Evaluation Indicators.
The evaluation indexes in this paper include accuracy, precision, recall, F1, and AUC (area under the curve), which measure the classification performance of the model.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
The meanings of TP, FP, FN, and TN are shown in Table 2.
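The indexes listed above can be computed from confusion-matrix counts as sketched below. Accuracy follows the formula in the text; precision, recall, and F1 use their standard definitions, and AUC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half. The function names are illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def auc_score(y_true, scores):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise AUC is quadratic in the number of samples and is meant only to make the definition concrete; a sort-based implementation would be preferable for a dataset of the size used here.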

Number of Neurons in the Hidden Layer.
This model contains two deep neural networks. When a network contains multiple hidden layers and the number of neurons differs between hidden layers, a lot of experimentation is required. In this paper, the same number of neurons is set in the hidden layers of the two neural networks to simplify the experiment. The experimental results are shown in Figure 4 and Table 3. Six experiments are conducted in turn, each changing the number of neurons. The results show that when the number of neurons is 256, the recall rate is about 95%, the AUC is about 82%, the accuracy is about 86.6%, and the precision rate is 86.8%. Model performance is optimal at this setting because, as the number of neurons increases, the model can learn more feature information. However, once the number of neurons passes a certain point, the model learns no more effective information and may even pick up noise that degrades its prediction performance. Therefore, in deep neural network training, the number of neurons should be kept moderate, and the optimal number should be selected by comparing trained models.

Different Activation Functions.
The activation function of hidden layer neurons is related to the prediction effect. Because a binary classification model is used in this paper, the output unit makes its final prediction with the Sigmoid function, and the settings of the remaining activation functions are the same. Among ReLU, Tanh, and Sigmoid, ReLU and Tanh work better in deep learning models, so this experiment compares only the ReLU and Tanh activation functions for the hidden neurons. Table 4 introduces the experimental results. When the activation function is ReLU, the prediction accuracy, recall rate, F1, and AUC of the FDPN model increase by about 2%, so the hidden layer activation with ReLU predicts better than with Tanh.
In particular, this chapter makes use of two feed-forward neural networks, in which the ReLU activation function performs better than the Tanh function; ReLU is therefore used in the remaining part of our study.
Table 1 (continued):
Shared information: times of clicking on the course and faculties' shared information before class.
53. Sources: times of clicking on the PDF resources, such as books.
54. Related information: times of clicking on the information on the website and activities related to that information.

The Number of Layers in the Hidden Layer.
The model presented in this paper contains two feed-forward neural networks, DNN and PNN, and different numbers of hidden layers give the model different predictive power. To simplify the experiment, the number of hidden layers is the same in the two neural networks and is increased from 1 to 3. The experimental results are shown in Table 5. The results show that with 1 or 2 hidden layers the model performs well, whereas with 3 hidden layers the prediction effect is significantly reduced. As the number of layers increases, the evaluation indexes drop, mainly because more layers make the structure more complex and the computation larger, so over-fitting becomes more likely. Therefore, the number of hidden layers is still set to 1 in the subsequent experiments of this paper.

Effect of the Model Structure on the Performance.
The influencing factors mainly include the first-order and second-order feature representations learned by FM and the different higher-order feature representations learned by DNN and PNN. In this paper, the three structures are combined to predict performance. In the experiments of this section, different feature combination structures are compared to observe the effect of the structure on the model's performance. The experimental results are shown in Table 6. The single structures FM, DNN, and PNN perform slightly worse, while the two-structure combinations DeepFM, DNN + PNN, and FM + PNN, which learn combined features, are slightly better than the single structures. The FDPN model is the optimal one because it considers first-order, second-order, and two different higher-order feature representations, so more potentially effective information is used in performance prediction. In conclusion, the FDPN performance prediction model has a significant prediction effect and can improve prediction performance.

Experiment Comparison.
In this paper, LR, SVM, FM, DNN, DeepFM, PNN, and other deep learning models are used as comparative models. Comparative experiments are performed on the sports dataset, and the results are shown in Table 7 to verify that the proposed FDPN model has the best prediction performance.
The experimental results in Table 7 show that, compared with the existing performance prediction methods (LR, SVM, FM, and DNN), the proposed method achieves the best prediction effect on the sports education dataset. Compared with the best of the traditional methods (DNN), accuracy and AUC are both improved by 2%. In addition, there are also significant improvements in accuracy, recall, and F1. The method based on feature combination is better than the four traditional performance prediction methods. This is mainly because the traditional methods feed features directly into the model as classification inputs and consider only low-order or only high-order features, ignoring the different effects that combinations of low-order and high-order features have on the final performance. Compared with the other two feature combination methods (DeepFM and PNN), the proposed method extracts feature information that includes first-order and second-order features and two different higher-order features, so the prediction ability of the model is greatly improved and a good prediction effect is achieved. The experiment also confirms the effectiveness of the model. We use the LDA (Latent Dirichlet Allocation) and LPP (Locality Preserving Projections) methods for comparison. LDA is a generative model for document topics that infers the topic distribution of a document. This model can represent all topics in the document set in the form of a probability distribution and realize topic clustering and text classification through the probability distribution of each topic. LPP is a linear manifold learning algorithm that preserves the local manifold structure of the original dataset in a low-dimensional space. LPP is completely unsupervised; the eigenvectors of LPP are statistically correlated and not orthogonal.
This means that LPP does not introduce category labels in the process of feature extraction, although category labels are of great significance for guiding feature extraction in classification problems. Table 8 shows the performance under different dimensionality reduction methods. The FDPN model after PCA dimensionality reduction achieves the best experimental results. Compared with the FDPN model alone, the LDA method does not improve the final classification results, whereas the LPP method improves them but is not as good as the PCA method.
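PCA-based reduction of a feature matrix can be sketched with a singular value decomposition, as below. This is a generic illustration, not the paper's pipeline: the text does not state how many components were kept or where in the FDPN workflow PCA was applied, so the matrix shape and `n_components` are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2)[:n_components] / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio

# Synthetic stand-in for a preprocessed feature matrix with 33 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 33))
Z, ratio = pca_reduce(X, n_components=10)
print(Z.shape, ratio.sum())
```

Unlike LPP, which preserves local neighborhood structure, PCA keeps the directions of largest global variance; both are unsupervised and use no category labels.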

Conclusion
This paper presents a new feature combination and structure model to address the shortcomings of existing sports course performance prediction methods. We proposed a new method (factorization deep product neural network) for PE course score prediction. The experimental results show that, compared with the existing performance prediction methods (LR, SVM, FM, and DNN), the proposed method achieves the best prediction effect on the sports education dataset. Compared with the best of the traditional methods (DNN), accuracy and AUC are both improved by 2%. In addition, there are also significant improvements in accuracy, recall, and F1. The proposed model provides an effective method for predicting students' performance in physical education courses.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.