Research on PMF Model Based on BP Neural Network Ensemble Learning Bagging and Fuzzy Clustering

Probabilistic matrix factorization (PMF) can be used to address the high-dimensional sparsity of user-item rating data in recommender systems. However, most existing methods model item ratings from the user side alone and ignore the relationship between users and items, so the accuracy of user-item rating prediction remains low. This paper therefore proposes a probabilistic matrix factorization model based on BP neural network ensemble learning (bagging) and fuzzy clustering. First, the membership function of fuzzy clustering and the selection of cluster centers are used to compute the user-item rating matrix; second, a BP neural network is trained on the clustered user-item rating matrix, further improving the accuracy of rating prediction; finally, the bagging method from ensemble learning is introduced: base learners are built from sampled sets of user-item ratings, each base learner is trained by a BP neural network, and the final rating prediction is obtained by aggregating (voting over) their results, which improves the stability of the model. Compared with existing PMF models, the PMF model with fuzzy clustering improves the root mean square error (RMSE) by 9.27% and 3.95% and the mean absolute error (MAE) by 21.14% and 1.11%, respectively. After the ensemble method is introduced, RMSE improves by a further 4.02% and 0.42%, respectively, compared with the single model. Finally, training the weights of the base learners with a BP neural network further improves the accuracy of the model, which also verifies its universality.


Introduction
In recent years, matrix factorization, with its good scalability and high recommendation accuracy, has developed rapidly [1]. Since the Netflix Prize competition, matrix factorization has received even more attention.
The basic idea of matrix factorization is to assume that users' preferences and items' characteristics can be described by latent factors, and to find the factors that minimize the sum of squared differences between the original rating matrix and its reconstruction. Representative methods include probabilistic matrix factorization, Bayesian probabilistic matrix factorization, and fast parallel matrix factorization.
Koren proposed the SVD++ model by combining the matrix factorization model with the neighborhood-based recommendation method [2]. Salakhutdinov and Mnih analyzed matrix factorization from a probabilistic perspective and proposed the probabilistic matrix factorization (PMF) model [3], which casts matrix factorization as a probabilistic (maximum a posteriori) estimation problem. Bayesian probabilistic matrix factorization (BPMF) was proposed later [4]. Ensemble learning has also been adopted to improve the accuracy of recommender systems. Fang et al. [5] combined recommendation methods based on user similarity: they used different similarity measures to generate different recommendation models and took a weighted sum of their outputs as the final predicted rating, which improved prediction accuracy. Cui et al. [6] constructed a new dataset by combining the differences between user-based and item-based predicted ratings with the real ratings and then trained and predicted with an XGBoost model. All of the above ensemble methods are built on similarity-based recommendation algorithms, which suffer from high time complexity and relatively low prediction accuracy. On high-dimensional sparse data, users or items with zero similarity may appear, which further reduces the prediction accuracy of these algorithms.
Based on the above analysis, we conclude that probabilistic matrix factorization has inherent weaknesses on high-dimensional sparse data. In this paper, a probabilistic matrix factorization model that fuses a BP-neural-network-based bagging ensemble with fuzzy clustering is proposed. The main contributions are as follows: (1) The user-item rating matrix is computed using the membership function of fuzzy clustering and the selection of cluster centers, which is more accurate than the traditional probabilistic matrix factorization method and better constructs the user-item rating matrix. (2) The bagging method from ensemble learning is used to generate different training sets by bootstrap sampling, and ensemble learning is introduced into the model, which increases parallelism and improves the accuracy and stability of rating prediction.

System Model
In this section, we review the techniques on which our model builds: probabilistic matrix factorization, fuzzy c-means clustering, ensemble learning (bagging), and the BP neural network.

Probabilistic Matrix Factorization (PMF).
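Salakhutdinov et al. proposed PMF, a well-known approach for recommender systems. Table 1 summarizes the notation of PMF, and Figures 1 and 2 give an overview of the graphical model of PMF. Suppose there are M users and N items with a rating matrix $R \in \mathbb{R}^{M \times N}$; PMF learns a user latent matrix $U \in \mathbb{R}^{k \times M}$ and an item latent matrix $V \in \mathbb{R}^{k \times N}$ whose product $U^{T}V$ reconstructs the rating matrix R. The goal of PMF is to find the matrices U and V that minimize the loss function ε:

$$\varepsilon = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{N} I_{ij}\left(r_{ij} - U_i^{T} V_j\right)^{2} + \frac{\lambda_U}{2}\sum_{i=1}^{M}\lVert U_i\rVert^{2} + \frac{\lambda_V}{2}\sum_{j=1}^{N}\lVert V_j\rVert^{2},$$

where $I_{ij}$ equals 1 if user i has rated item j and 0 otherwise. After the objective function is determined, stochastic gradient descent is used to update U and V iteratively to minimize it: for each observed rating, with $e_{ij} = r_{ij} - U_i^{T} V_j$,

$$U_i \leftarrow U_i + \alpha\left(e_{ij} V_j - \lambda_U U_i\right), \qquad V_j \leftarrow V_j + \alpha\left(e_{ij} U_i - \lambda_V V_j\right),$$

where α is the learning rate. The iteration stops after a fixed number of iterations or when the change in the objective function falls below a threshold. Finally, the trained latent matrices U and V are used to predict ratings as $\hat{r}_{ij} = U_i^{T} V_j$.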
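The PMF training loop described above (squared error on the observed ratings plus L2 penalties, minimized by SGD) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `train_pmf`, the initialization scale, the epoch cap, and the stopping tolerance are hypothetical choices; the defaults α = 0.03 and λ_U = λ_V = 0.01 mirror the settings reported in the experiments.

```python
import numpy as np

def train_pmf(ratings, n_users, n_items, k=10, alpha=0.03, lam_u=0.01, lam_v=0.01,
              max_epochs=200, tol=1e-6, seed=0):
    """Fit PMF with SGD on observed (user, item, rating) triples.

    ratings: list of (i, j, r_ij) with 0-based indices; only observed entries
    are listed, so the indicator I_ij is implicit.
    Returns U (k x M) and V (k x N); the prediction is U[:, i] @ V[:, j].
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((k, n_users))   # small random init (assumption)
    V = 0.1 * rng.standard_normal((k, n_items))
    prev_loss = np.inf
    for _ in range(max_epochs):
        for idx in rng.permutation(len(ratings)):  # visit observed ratings in random order
            i, j, r = ratings[idx]
            e = r - U[:, i] @ V[:, j]              # e_ij = r_ij - U_i^T V_j
            u_i = U[:, i].copy()                   # keep old U_i so both updates use it
            U[:, i] += alpha * (e * V[:, j] - lam_u * U[:, i])
            V[:, j] += alpha * (e * u_i - lam_v * V[:, j])
        # loss epsilon: half squared error on observed entries plus L2 penalties
        sq_err = sum((r - U[:, i] @ V[:, j]) ** 2 for i, j, r in ratings)
        loss = 0.5 * sq_err + 0.5 * lam_u * np.sum(U ** 2) + 0.5 * lam_v * np.sum(V ** 2)
        if abs(prev_loss - loss) < tol:            # stop when the objective barely changes
            break
        prev_loss = loss
    return U, V

# Toy example: 4 users x 4 items, 12 observed ratings
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 2, 1), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 1, 1), (3, 2, 5), (3, 3, 4)]
U, V = train_pmf(ratings, n_users=4, n_items=4, k=3)
print(round(float(U[:, 0] @ V[:, 2]), 2))  # predicted rating for the unobserved pair (user 0, item 2)
```

Copying $U_i$ before its update makes the two SGD updates use the same pre-update values, which matches the simultaneous update rules.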

Fuzzy C-Means.
Fuzzy c-means (FCM) is an unsupervised clustering algorithm in which each point has a certain strength of association (membership) with each cluster rather than belonging to exactly one [7]. For n nodes and k clusters, FCM minimizes the objective function $J_f$:

$$J_f = \sum_{i=1}^{n}\sum_{j=1}^{k} u_{ij}^{f}\, d_{ij}^{2},$$

where $u_{ij}$ is the membership degree of the i-th node in the j-th cluster and $d_{ij} = \lVert x_i - c_j \rVert$ is the distance between the i-th node and the center of the j-th cluster. While optimizing $J_f$, the constraint $\sum_{j=1}^{k} u_{ij} = 1$ must be satisfied. The parameter $f > 1$ controls the fuzziness of the algorithm: the larger f is, the fuzzier the partition. The cluster center $c_j$ is computed as

$$c_j = \frac{\sum_{i=1}^{n} u_{ij}^{f}\, x_i}{\sum_{i=1}^{n} u_{ij}^{f}},$$

and the membership degree $u_{ij}$ as

$$u_{ij} = \frac{1}{\sum_{l=1}^{k}\left(d_{ij}/d_{il}\right)^{2/(f-1)}}.$$

$J_f$ can be minimized by iterative optimization that alternately updates the membership degrees $u_{ij}$ and the cluster centers $c_j$.

Table 1: Notation of PMF.
$r_{ij}$: rating of item j given by user i
$\hat{r}_{ij}$: predicted rating of item j given by user i
U: user latent factor
V: item latent factor
k: size of latent factor
I: indicator

Complexity
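The alternating FCM updates for the centers $c_j$ and memberships $u_{ij}$ can be sketched as follows. This is an assumed, minimal NumPy version of standard FCM (random initial memberships, a fixed iteration cap, and a small floor on distances to avoid division by zero); `fuzzy_c_means` is a hypothetical name and `f` is the fuzziness exponent from the text.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, f=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM: alternate the center and membership updates.

    X: array of shape (n_points, n_features).
    Returns memberships u (n_points x n_clusters) and centers c (n_clusters x n_features).
    """
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)              # enforce sum_j u_ij = 1
    for _ in range(max_iter):
        w = u ** f
        centers = (w.T @ X) / w.sum(axis=0)[:, None]   # c_j = sum_i u_ij^f x_i / sum_i u_ij^f
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # d_ij = ||x_i - c_j||
        d = np.fmax(d, 1e-12)                      # avoid division by zero when x_i == c_j
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (f - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)            # u_ij = 1 / sum_l (d_ij / d_il)^(2/(f-1))
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return u, centers

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
u, centers = fuzzy_c_means(X, n_clusters=2)
print(u.argmax(axis=1))  # hard labels derived from the fuzzy memberships
```

Each row of `u` stays a probability-like vector, so a point between two groups keeps partial membership in both instead of being forced into one.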

Ensemble Learning.
Ensemble learning trains a series of base learners [8] and then combines their results according to certain rules to obtain a better learner than any single one. Base learners usually differ from one another, either by using different algorithms or by using the same algorithm with different parameters or hyperparameters. Generally speaking, the greater the difference between base learners, the better the final result. Because ensemble learning can greatly improve performance, it is widely used in both theoretical research and practical applications. The classical ensemble learning methods mainly include bagging and boosting. This paper uses bagging, so its principle is introduced in detail.
Bagging (bootstrap aggregating) is a classic parallel ensemble learning algorithm based on bootstrap sampling. It can reduce prediction error and improve the accuracy of the recommendation algorithm. The general idea of the algorithm is as follows: given a dataset D containing K samples, a sample is drawn at random and put into the sampling set, and then returned to the original dataset so that it may be selected again in the next draw. Because sampling is done with replacement, a sample may appear several times in the sampling set or not at all. After K random draws, a sampling set D′ containing K samples is obtained. Since each sample is drawn with probability 1/K in each draw, the probability that it is never drawn in K draws is $(1 - 1/K)^{K}$, whose limit is

$$\lim_{K \to \infty}\left(1 - \frac{1}{K}\right)^{K} = \frac{1}{e} \approx 0.368.$$

Hence the probability that a given sample is drawn at least once is $1 - 1/e \approx 0.632$; in other words, each bootstrap sampling set contains about 63.2% of the distinct samples of the original dataset. Repeating this procedure G times yields G sampling sets $\{D_1', \ldots, D_G'\}$, each containing K samples; a base learner is trained on each sampling set, and the base learners are then combined to produce the model prediction. Figure 2 shows the structure of the bagging model.
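The 63.2% figure can be checked empirically by drawing bootstrap sampling sets and counting how many distinct original samples each one contains. The sketch below assumes samples are identified by their indices 0..K−1; `bootstrap_sets` is a hypothetical helper name, and K = 10,000 and G = 20 are arbitrary.

```python
import numpy as np

def bootstrap_sets(K, G, seed=0):
    """Draw G bootstrap sampling sets, each with K indices drawn with replacement from range(K)."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, K, size=K) for _ in range(G)]

K, G = 10_000, 20
sets = bootstrap_sets(K, G)
# Fraction of distinct original samples that appear in each set, averaged over the G sets
coverage = np.mean([np.unique(s).size / K for s in sets])
print(f"empirical: {coverage:.3f}  theory 1 - 1/e: {1 - np.exp(-1):.3f}")
```

The roughly 36.8% of samples left out of each set are what make the base learners differ from one another.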

BP Neural Network.
The BP neural network is composed of an input layer, a hidden layer, and an output layer and can realize a continuous nonlinear mapping [9]. It is a multilayer feed-forward neural network characterized by forward propagation of signals and backpropagation of errors. During forward propagation, the signal is processed layer by layer from the input layer through the hidden layer and finally reaches the output layer. Figure 3 shows the topological structure of the BP neural network.
BP neural network is a supervised learning algorithm, which completes the mapping from input to output by minimizing the objective function.
The main steps of the bagging-integrated BP neural network algorithm are shown in Algorithm 1.
The basic processing framework of the BP neural network is shown in Figure 3, where $X = (X_1, X_2, \ldots, X_n)$ is the set of n input values coming from outside or from the outputs of other neurons; $W = (W_1, W_2, \ldots, W_n)$ are the weights, representing the connection strengths between this neuron and other neurons; $WX = \sum_{i=1}^{n} W_i X_i$ is the activation value, equal to the total input of the artificial neuron; O is the output of the neuron; and b is the threshold of the neuron: if the weighted sum of the input signals exceeds b, the artificial neuron is activated. The output of the artificial neuron can therefore be described as

$$O = f\left(\sum_{i=1}^{n} W_i X_i - b\right). \qquad (8)$$

In equation (8), $f(\cdot)$ is the activation function. The activation function used in this paper is a nonlinear bipolar sigmoid function, the hyperbolic tangent tanh(x). Error backpropagation requires the derivative of the activation function; tanh(x) is smooth, so its derivative is continuous everywhere, and its output is zero-centered, so it is used as the activation function in this paper. It is defined as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$$

The basic processing framework of the BP neural network is shown in Figure 4.
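Equation (8) and the tanh derivative used during backpropagation can be written out directly. This is an illustrative sketch; the function names and the example inputs, weights, and thresholds are hypothetical.

```python
import numpy as np

def neuron_output(x, w, b):
    """Equation (8): O = f(sum_i W_i X_i - b), with f = tanh."""
    return np.tanh(np.dot(w, x) - b)

def tanh_derivative(z):
    """f'(z) = 1 - tanh(z)^2; smooth and defined everywhere, as backpropagation requires."""
    return 1.0 - np.tanh(z) ** 2

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
print(neuron_output(x, w, b=0.1))    # weighted sum 0.1 minus threshold 0.1, so about 0
print(neuron_output(x, w, b=-1.0))   # weighted sum exceeds the threshold: positive output
```

Because the derivative is expressed through tanh itself, backpropagation can reuse the activations already computed in the forward pass.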
This paper uses a three-layer BP neural network with a single hidden layer.
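A three-layer BP network (input layer, one tanh hidden layer, output layer) trained by error backpropagation can be sketched as below. Assumptions not stated in the paper: a linear output unit for rating regression, full-batch gradient descent on half the mean squared error, scaled normal weight initialization, and the learning rate and epoch count shown. The class name `ThreeLayerBP` is hypothetical. Biases are added here, which corresponds to a threshold of −b in equation (8).

```python
import numpy as np

class ThreeLayerBP:
    """Input layer -> one tanh hidden layer -> linear output, trained by error backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)   # forward propagation: hidden activations
        return self.h @ self.W2 + self.b2           # linear output layer (regression)

    def train(self, X, y, lr=0.1, epochs=3000):
        y = y.reshape(-1, 1)
        n = X.shape[0]
        for _ in range(epochs):
            err = self.forward(X) - y                        # gradient of 0.5 * squared error w.r.t. output
            dh = (err @ self.W2.T) * (1.0 - self.h ** 2)     # backpropagate the error through tanh
            self.W2 -= lr * self.h.T @ err / n
            self.b2 -= lr * err.mean(axis=0)
            self.W1 -= lr * X.T @ dh / n
            self.b1 -= lr * dh.mean(axis=0)
        return self

    def mse(self, X, y):
        return float(np.mean((self.forward(X).ravel() - y) ** 2))

X = np.linspace(-2, 2, 50).reshape(-1, 1)
y = np.sin(X).ravel()
net = ThreeLayerBP(n_in=1, n_hidden=10)
before = net.mse(X, y)
net.train(X, y)
print(before, net.mse(X, y))   # training error before and after backpropagation
```

The hidden-layer error `dh` is computed before `W2` is updated, so both weight updates use gradients from the same forward pass.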

Probability Matrix Factorization Model with Fuzzy Clustering
To further improve the prediction accuracy of probabilistic matrix factorization on high-dimensional sparse matrices, this paper uses FCM to fuzzily cluster the rating matrix. On the one hand, FCM is suitable for high-dimensional [10] and sparse data and scales well; on the other hand, it overcomes a shortcoming of hard clustering: instead of forcing each user into a single cluster, it expresses, through the membership function, the degree to which the user belongs to each cluster, which better partitions users whose rating patterns have no clear boundaries.

Algorithm Idea.
FCM is applied to the rating matrix [11], in which n users rate m items. Each element $x_{ik}$ of the matrix is the rating of user i on item k. Row $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, $i \in [1, n]$, holds the ratings of user i; column $x_j = (x_{1j}, x_{2j}, \ldots, x_{nj})$, $j \in [1, m]$, holds the ratings of item j.
Users are clustered according to their ratings: all users are divided into c clusters so that users in the same cluster have the most similar ratings, and the clustering result is expressed by the membership matrix U. The objective function $J_f$ of fuzzy clustering on the user-item rating matrix, which FCM minimizes, is

$$J_f = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{f}\,\lVert x_i - c_j \rVert^{2}.$$

The membership matrix is generated by the fuzzy clustering procedure in Algorithm 2, and the fuzzy similarity matrix is constructed from the similarity between the data in the matrix. Methods for constructing a fuzzy similarity matrix include the max-min method, the cosine (angle) method, and the correlation coefficient method; this paper mainly adopts the correlation coefficient method. Figure 5 shows the workflow of our method: first, the training dataset is loaded; then FCM clusters the users in the training dataset using the similarity of their rating vectors; finally, the rating predictions produced by the combined FCM and PMF models are delivered to the users.
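The paper builds the fuzzy similarity matrix with the correlation coefficient method but does not give the exact formula, so the sketch below is an assumption: Pearson correlation between two users computed over their co-rated items, rescaled from [−1, 1] to [0, 1] with (1 + ρ)/2 so that the matrix can serve as a fuzzy similarity matrix. The function name `correlation_similarity`, the use of 0 for "not rated", and the neutral value 0.5 for pairs with too little overlap are hypothetical choices.

```python
import numpy as np

def correlation_similarity(R):
    """Fuzzy similarity matrix from Pearson correlation between users' rating rows.

    R: (n_users, n_items) array, 0 = not rated. Correlation is computed on co-rated
    items only and mapped to [0, 1] with (1 + rho) / 2 (an assumption). Pairs with
    fewer than two co-rated items or zero variance get the neutral value 0.5.
    """
    n = R.shape[0]
    S = np.eye(n)                                   # each user is fully similar to itself
    for a in range(n):
        for b in range(a + 1, n):
            mask = (R[a] > 0) & (R[b] > 0)          # items rated by both users
            rho = 0.0
            if mask.sum() >= 2:
                xa = R[a, mask] - R[a, mask].mean()
                xb = R[b, mask] - R[b, mask].mean()
                denom = np.sqrt(np.sum(xa ** 2) * np.sum(xb ** 2))
                if denom > 0:
                    rho = float(np.sum(xa * xb) / denom)
            S[a, b] = S[b, a] = (1.0 + rho) / 2.0
    return S

R = np.array([[5, 4, 0, 1],
              [4, 5, 0, 2],
              [1, 2, 5, 5]], dtype=float)
S = correlation_similarity(R)
print(np.round(S, 2))  # users 0 and 1 rate alike; user 2 rates in the opposite direction
```

Restricting the correlation to co-rated items avoids treating missing ratings as zeros, which would otherwise dominate on sparse data.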

Probability Matrix Factorization Model Ensemble Learning Bagging with BP Neural Network
The probabilistic matrix factorization model and similarity-based recommendation algorithms have greatly improved efficiency and prediction accuracy. However, because of the high-dimensional sparsity of the data itself and the random setting of initial values, the model is unstable, so the predicted ratings have a large variance, which harms the accuracy of the recommendation.
Because the accuracy of a single weak learner is limited, we choose the bagging ensemble learning method. To further improve the generalization ability of the learner, we combine the probabilistic matrix factorization model with a bagging ensemble of BP neural networks to improve the accuracy of rating prediction.

Algorithm Idea.
First, FCM is used to initialize the sample dataset D, and G sampling sets $D_1', D_2', \ldots, D_G'$ are then obtained from D by bootstrap sampling. Unlike standard bagging, to ensure that every user and every item has training samples in each sampling set, each sampling round first randomly selects one rating of each user and one rating of each item as samples, giving a total of (m + n) samples (m + n ≪ K), and then performs bootstrap sampling on the whole training set to obtain a sampling set containing K samples.
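The modified sampling step can be sketched as follows. The text leaves open whether the (m + n) guaranteed samples count toward the K samples; this sketch assumes they do and fills the remaining K − (m + n) slots by ordinary sampling with replacement. The function name `covered_bootstrap` and the synthetic rating data are hypothetical.

```python
import numpy as np

def covered_bootstrap(ratings, K, seed=0):
    """One sampling set D'_g of size K in which every user and every item appears.

    ratings: list of (user, item, rating) triples from the training set.
    Step 1 picks one random rating per user and one per item (m + n picks);
    step 2 fills the set up to K by ordinary sampling with replacement.
    """
    rng = np.random.default_rng(seed)
    by_user, by_item = {}, {}
    for idx, (u, i, _) in enumerate(ratings):
        by_user.setdefault(u, []).append(idx)
        by_item.setdefault(i, []).append(idx)
    picked = [int(rng.choice(v)) for v in by_user.values()]    # one rating per user
    picked += [int(rng.choice(v)) for v in by_item.values()]   # one rating per item
    rest = rng.integers(0, len(ratings), size=max(K - len(picked), 0))
    return [ratings[j] for j in picked + rest.tolist()]

# Synthetic training set: 30 users, each rating 5 of 20 items with a score in 1..5
rng = np.random.default_rng(1)
ratings = [(u, int(i), int(rng.integers(1, 6)))
           for u in range(30) for i in rng.choice(20, size=5, replace=False)]
sample = covered_bootstrap(ratings, K=len(ratings))
print(len(sample), len({u for u, _, _ in sample}), len({i for _, i, _ in sample}))
```

Guaranteeing one rating per user and item means every latent vector receives at least one gradient update in every base learner, at the cost of making the sets slightly less random than plain bootstrap samples.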
Then, for each sampling set $D_g'$, a BP neural network is trained to obtain the optimal weights, and the PMF model is used to predict the ratings.
For a regression task, let (x, y) be a sample in dataset D, where x is the feature vector and y is the true value. Multiple regression models are trained on the dataset, and feeding the features into each model produces a predicted value Φ(x, D). The aggregated prediction is the average of the predictions of the models trained on the datasets D:

$$\Phi_A(x) = E_D\left[\Phi(x, D)\right]. \qquad (10)$$

For a fixed input x with output y, the expected squared error of a single model is

$$E_D\left[\left(y - \Phi(x, D)\right)^{2}\right] = y^{2} - 2y\,E_D\left[\Phi(x, D)\right] + E_D\left[\Phi^{2}(x, D)\right]. \qquad (11)$$

Applying equation (10) and the inequality $E Z^{2} \ge (E Z)^{2}$ to equation (11), we obtain

$$E_D\left[\left(y - \Phi(x, D)\right)^{2}\right] \ge \left(y - \Phi_A(x)\right)^{2}. \qquad (12)$$

Equation (12) shows that the squared prediction error of the aggregated predictor $\Phi_A(x)$ produced by the ensemble is no larger than the average squared error of the individual predictors Φ(x, D), and the more unstable Φ(x, D) is (that is, the more it varies with D), the greater the improvement the ensemble brings. Figure 6 shows the PMF model based on FCM and bagging-BP.
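The bound in equation (12) can be checked numerically. The sketch below is an illustration only: it swaps in one-dimensional 1-nearest-neighbour regression as a deliberately unstable base learner (not the BP networks used in this paper), uses synthetic data y = sin(x) plus noise, and picks G = 25 bags arbitrarily. For every test point, the average squared error of the individual predictors exceeds the squared error of their average by exactly the variance of the predictions across bags.

```python
import numpy as np

def one_nn_predict(x_train, y_train, x_query):
    """Unstable base learner (swapped in for illustration): 1-nearest-neighbour regression."""
    idx = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 6.0, 200)
y = np.sin(x) + 0.3 * rng.standard_normal(200)
x_test = np.linspace(0.5, 5.5, 50)
y_test = np.sin(x_test)

G = 25
preds = []
for _ in range(G):
    b = rng.integers(0, x.size, size=x.size)           # bootstrap sampling set D'_g
    preds.append(one_nn_predict(x[b], y[b], x_test))   # Phi(x, D'_g)
preds = np.array(preds)                                 # shape (G, number of test points)
phi_A = preds.mean(axis=0)                              # aggregated predictor, equation (10)

avg_single = np.mean((y_test[None, :] - preds) ** 2, axis=0)  # mean over g of (y - Phi(x, D'_g))^2
bagged = (y_test - phi_A) ** 2                                # (y - Phi_A(x))^2
print(avg_single.mean(), bagged.mean())
```

An unstable learner such as 1-NN is chosen precisely because its predictions vary strongly with the bootstrap set, which is the case in which equation (12) promises the largest gain.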

Algorithm Description.
The algorithm flow of the bagging ensemble of BP neural networks combined with the PMF model is given in Algorithm 3.

Experiments
In this part, we test our hypothesis through several groups of experiments: the FCM clustering method is applied to the PMF model in different ways to improve prediction accuracy, and the accuracy of the resulting method is verified by checking whether the mean absolute error (MAE) and root mean square error (RMSE) of the predictions decrease:

$$\mathrm{MAE} = \frac{1}{N}\sum_{(i,j) \in T}\left|r_{ij} - \hat{r}_{ij}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{(i,j) \in T}\left(r_{ij} - \hat{r}_{ij}\right)^{2}},$$

where $\hat{r}_{ij}$ is the predicted rating, $r_{ij}$ is the actual rating in the test set T, and N is the number of ratings in the test set. From these definitions, MAE directly reflects the average prediction error, while RMSE, because it squares the errors, is more sensitive to large errors. For both metrics, smaller values mean higher recommendation accuracy. When comparing with other models, this paper uses the evaluation metrics reported for the comparison models to evaluate the accuracy of rating prediction.
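The two metrics can be computed as below; the toy predictions are hypothetical and chosen so that both prediction sets have the same MAE while one concentrates its error in a single rating, which illustrates why RMSE is more sensitive to large errors.

```python
import numpy as np

def mae(r_true, r_pred):
    """Mean absolute error over the N test ratings."""
    r_true, r_pred = np.asarray(r_true, dtype=float), np.asarray(r_pred, dtype=float)
    return float(np.mean(np.abs(r_true - r_pred)))

def rmse(r_true, r_pred):
    """Root mean square error over the N test ratings."""
    r_true, r_pred = np.asarray(r_true, dtype=float), np.asarray(r_pred, dtype=float)
    return float(np.sqrt(np.mean((r_true - r_pred) ** 2)))

actual = [4, 3, 5, 2]
spread_errors = [3.5, 3.0, 4.0, 2.0]   # errors 0.5, 0, 1, 0
one_big_error = [4.0, 3.0, 5.0, 0.5]   # errors 0, 0, 0, 1.5
print(mae(actual, spread_errors), rmse(actual, spread_errors))   # 0.375 and about 0.559
print(mae(actual, one_big_error), rmse(actual, one_big_error))   # 0.375 and 0.75
```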

Relevant Parameter Settings.
Without loss of generality, we use 80% of the data, split according to the clustering results, as the training set and the remaining 20% as the test set for measuring recommendation accuracy. The regularization factors are set to $\lambda_U = \lambda_V = \lambda_{bu} = \lambda_{bi} = 0.01$, and the SGD learning rate is α = 0.03. The hidden layer of the BP network has 100 nodes. The datasets used in this paper are MovieLens and FilmTrust, on which the PMF, FCM-PMF, and FCM-bagging-BP-PMF models are compared.

Datasets Information.
This experiment is carried out on the MovieLens and FilmTrust datasets, both of which contain users' ratings of items. The ratings are discrete values from 1 to 5, and the rating densities (fractions of observed user-item pairs) are 4.47% and 1.04%, respectively, so both rating matrices are high-dimensional and sparse. Details of the datasets are given in Table 2.
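The percentages above are the fraction of user-item pairs that carry a rating. A minimal sketch of that computation follows; the function names are hypothetical, and the MovieLens 1M counts in the last line (1,000,209 ratings, 6,040 users, 3,706 rated movies) are the commonly reported figures for that public release, not values taken from Table 2.

```python
import numpy as np

def rating_density(R):
    """Fraction of observed (non-zero) entries in a user-item rating matrix."""
    R = np.asarray(R)
    return np.count_nonzero(R) / R.size

def density_from_counts(n_ratings, n_users, n_items):
    """Same quantity from summary counts: ratings / (users * items)."""
    return n_ratings / (n_users * n_items)

R = np.array([[5, 0, 0, 1],
              [0, 0, 3, 0],
              [4, 0, 0, 0]])
print(rating_density(R))   # 4 observed entries out of 12
print(f"{100 * density_from_counts(1_000_209, 6_040, 3_706):.2f}%")
```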
This paper also studies the number of clusters in the model; the experiments show that different numbers of clusters affect model performance differently. We set the number of clusters to 10, 20, 30, 40, and 50. The results on the MovieLens dataset are shown in Figure 7.

Comparison of Recommendation Accuracy.
To verify the accuracy of the proposed model, the PMF model based on FCM and bagging-BP is evaluated experimentally and compared with the existing MF and PMF models on the two datasets. The RMSE and MAE of the different models on the different datasets are shown in Table 3.
It can be seen from Table 3

Figure 7: Influence of the number of clusters on RMSE.

Conclusions
In this paper, a probabilistic matrix factorization model based on BP neural network ensemble learning and fuzzy clustering is proposed. Using the similarity of the rating matrix, fuzzy clustering partitions the users, which effectively addresses the problem of rating consistency; each base learner uses a BP neural network to find the optimal weights, and the base learners are then integrated into a strong learner. The PMF model is built on this strong learner to improve the accuracy of the predicted ratings.

Data Availability
We use the public MovieLens (1M) and FilmTrust datasets; our model and its hyperparameters are described in this paper.