Collaborative Filtering Recommendation Algorithm for MOOC Resources Based on Deep Learning



Introduction
At present, many universities follow the guidelines of the Ministry of Education and use the advantages of the Internet, artificial intelligence, and big data analysis technology to promote intelligent education. In the implementation of smart education, MOOC resources are the most important component [1][2][3]. Therefore, the design of a collaborative filtering recommendation algorithm for MOOC resources has very important research significance [4,5].
With the continuous development of big data technology, MOOC resource recommendation algorithms have emerged. Drawing on recommendation technologies from e-commerce, tourism routes, and social networks, many excellent MOOC resource recommendation algorithms have been proposed [6,7]. Traditional courses have only dozens or hundreds of students, whereas a single MOOC can enroll more than 100,000 people. Therefore, collaborative filtering is used on the Internet to recommend art learning MOOC resources [8,9]. As the volume of resource data grows, existing collaborative filtering recommendation algorithms for art learning MOOC resources capture only surface features of the data, resulting in higher MAE values. Deep learning can automatically extract deep features. Therefore, to solve the current problems in collaborative filtering recommendation for MOOC resources, this article designs a collaborative filtering recommendation algorithm for MOOC resources based on deep learning. The remainder of this article is organized as follows. Section 2 reviews the related work. Section 3 introduces the proposed methods. Section 4 reports the experiments and results. Section 5 concludes our work.

Related Works
Nowadays, recommendation systems are widely used in all walks of life and are quietly changing people's lives. According to statistics from the foreign media outlet VentureBeat, Amazon increased its revenue by about 35% in 2003, thanks to its introduction of an item-based collaborative filtering recommendation algorithm [10][11][12][13]. Traditional collaborative filtering methods use shallow machine learning models and are unable to learn the deep characteristics of users and items. Therefore, fusing side information for recommendation is receiving more and more attention, and hybrid recommendation is becoming more popular. However, auxiliary information often suffers from problems such as large scale, multiple types, inconsistent data types, and missing key data, so hybrid recommendation faces severe challenges [14,15].
In recent years, deep learning has made huge breakthroughs in many directions of artificial intelligence, but its application to recommendation systems is still in its infancy. The recommender system branch of the International Computer Society held a symposium on the application of deep learning in the recommendation field and pointed out that deep learning would be the next important direction of recommender system research [16]. Therefore, it is of great significance to apply deep learning to the research of recommender systems [17]. Nearest-neighbor algorithms make recommendations by calculating the similarity of users or items in the scoring matrix. Many scholars have improved and innovated on this approach. A typical representative is the item-based algorithm [18], which has been successfully applied to Amazon's e-commerce system. Wang et al. [19] proposed a collaborative filtering recommendation algorithm based on matrix factorization, which greatly improved the accuracy of recommendation. Subsequently, scholars continued to improve it and successively proposed the probabilistic matrix factorization (PMF) model [20], the SVD++ model [21], the factorization machines model [22], and so forth, achieving improvements to varying degrees. Hazrati et al. [23] proposed a collaborative filtering algorithm based on restricted Boltzmann machines (RBM), which introduced deep learning into the recommendation system for the first time to learn the hidden factors of users and items. Later, some scholars improved on this basis, but the RBM-based collaborative filtering algorithm has many drawbacks, such as long training time and a large number of weight parameters connecting the hidden layer and the visible layer, which make it difficult to implement in practical applications. Chen et al. [24] applied a deep belief network (DBN) to the recommendation system and proposed a new deep hybrid recommendation model. Before the introduction of deep learning, Liu et al. [25] proposed a topic model, collaborative topic regression (CTR). It models the abstract of the article and then learns the hidden feature representation of the article, which is used for article recommendation. However, when the auxiliary information is very sparse, the hidden features learned by the topic model are insufficient and ineffective, and this is where deep learning shows its effect. Huang et al. [26] directly used CNNs and DBNs to obtain hidden factors from content information. However, this approach only considers the hidden factors of the item and only applies to music data. Zhao et al. [27] proposed a collaborative deep learning (CDL) model to obtain hidden features from the textual information of the item. This model uses the SDAE [28] instead of LDA, which solves the problem of insufficient learning of hidden features when the data are sparse in collaborative topic regression. The CDL model quickly attracted a lot of attention and became a benchmark for a large number of researchers to improve and compare against. In the CDL model, the authors expressed the auxiliary information of the article through the bag-of-words (BoW) model. This model represents text as unordered words and cannot mine important information implicit in the order of words. Therefore, Zhang et al. [29] used the idea of a recurrent neural network [30] in the encoding and decoding stages to improve the SDAE model into a collaborative recurrent autoencoder, which allows the correlation information between words in the auxiliary information to be mined. Li et al. [31] proposed a collaborative variational autoencoder (CVAE) as an extension of the denoising SDAE. The model does not need to add noise to the input and can better learn the hidden factors of the item from the auxiliary information.
Since CDL only uses the superficial meaning of auxiliary information, Xiong et al. [32] introduced a CNN into text recommendation to mine the relevant information within the auxiliary information. They used word-embedding technology instead of the BoW model to represent the words in the auxiliary information. These word vectors were then concatenated to form a two-dimensional matrix. Finally, convolution, pooling, and fully connected mapping were performed on this two-dimensional matrix to obtain the hidden factor of the item.
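The convolution, pooling, and fully connected steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the implementation of [32]: the function names, the ReLU activation, the max-over-time pooling, and the random toy weights are all assumptions made for the example.

```python
import random

def conv1d_max_pool(word_matrix, kernel, bias=0.0):
    """Slide one kernel over consecutive word vectors, apply ReLU, then max-pool.

    Assumes len(word_matrix) >= len(kernel).
    """
    window = len(kernel)  # number of consecutive words one kernel covers
    activations = []
    for start in range(len(word_matrix) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = word_matrix[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        activations.append(max(0.0, total))  # ReLU (assumed activation)
    return max(activations)  # max-over-time pooling (assumed pooling)

def item_hidden_factor(word_matrix, kernels, fc_weights):
    """Convolution -> pooling -> fully connected mapping to a hidden factor."""
    pooled = [conv1d_max_pool(word_matrix, k) for k in kernels]
    return [sum(w * p for w, p in zip(row, pooled)) for row in fc_weights]

# Toy sizes and random weights, purely illustrative.
random.seed(0)
n_words, dim, window, n_kernels, n_factors = 6, 4, 3, 5, 2
word_matrix = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
kernels = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(window)]
           for _ in range(n_kernels)]
fc_weights = [[random.uniform(-1, 1) for _ in range(n_kernels)]
              for _ in range(n_factors)]
factor = item_hidden_factor(word_matrix, kernels, fc_weights)
```

Each kernel yields one pooled value, so the fully connected layer maps a vector of length `n_kernels` to a hidden factor of length `n_factors`.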
Under the trend of deep learning, vertical and horizontal recommendation, in which a deep learning model mines hidden features and a collaborative filtering algorithm combines them with different recommendation scenarios, is a promising direction [33][34][35][36].

Spark Architecture.
The advantage of the Spark architecture is that it is well suited to data-mining algorithms, which search for hidden information in large amounts of data. The Spark framework includes functional components such as SQL queries, text processing, and machine learning. These components are tightly integrated within Spark. Its computational performance is strong, especially in environments where large quantities of information are analyzed iteratively. Therefore, this article chose the Spark architecture.
With the widespread popularity of MOOCs, a large number of MOOC resources can be found on many Internet social platforms. To find the needed information among such massive resources, a collaborative filtering recommendation algorithm under the Spark architecture can be used: it builds a model of the user's own search history, records the user's preferences and interests, and actively pushes relevant MOOCs. From the point of view of the collaborative filtering algorithm, similar users have similar tastes. Therefore, the preferences of similar users can be used to make recommendations for the target user. The algorithm flow is shown in Figure 1. When the Spark architecture runs in a cluster, the driver first applies for resources through the resource manager. After the manager allocates resources, an Executor is started on each corresponding node. After a node completes the task submitted by the driver, it sends the result back to the driver program as feedback.
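The idea that similar users have similar tastes can be illustrated with a small user-based collaborative filtering sketch. It runs on a single machine in plain Python rather than on Spark, and the user names, course names, ratings, and the choice of cosine similarity are assumptions made only for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity over the courses that both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[c] * v[c] for c in common)
    norm_u = math.sqrt(sum(u[c] ** 2 for c in common))
    norm_v = math.sqrt(sum(v[c] ** 2 for c in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, ratings, k=2, top_n=3):
    """Recommend unseen courses using the k most similar users."""
    neighbours = sorted(
        ((cosine(ratings[target], r), name)
         for name, r in ratings.items() if name != target),
        reverse=True)[:k]
    scores, weights = {}, {}
    for sim, name in neighbours:
        for course, rating in ratings[name].items():
            if course in ratings[target]:
                continue  # skip courses the target user already rated
            scores[course] = scores.get(course, 0.0) + sim * rating
            weights[course] = weights.get(course, 0.0) + sim
    ranked = sorted(((scores[c] / weights[c], c) for c in scores if weights[c] > 0),
                    reverse=True)
    return [course for _, course in ranked[:top_n]]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"python": 5, "drawing": 1},
    "bob": {"python": 5, "drawing": 1, "spark": 4},
    "carol": {"python": 1, "drawing": 5, "sculpture": 5},
}
suggested = recommend("alice", ratings, k=1)
```

With `k=1`, the only neighbour of `alice` is `bob`, whose ratings match hers on the shared courses, so the suggestion comes from his remaining course.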

MOOC Resource Score Prediction Based on Dual-Channel CNN.
The input of a recommendation system based on deep learning is generally the relevant information between the MOOC classes; the deep neural network model automatically learns the implicit representations between the MOOC classes and recommends courses for users based on these learned representations. A basic deep learning recommendation system includes an input layer, a modeling layer, and an output layer. In the modeling layer, the most commonly used deep learning models include RBMs, convolutional neural networks, and recurrent neural networks. In the output layer, the deep network model learns a highly abstract representation between MOOCs and then generates an item recommendation list through steps such as inner product, Softmax conversion into probability values, and similarity calculation and ranking. The traditional matrix decomposition method only uses the scoring information, so it only learns the representation of the MOOC. In addition to scoring information, this model also uses other additional information. Therefore, in addition to the representation learning between classes, this model also explicitly learns the metapath-based context representation between the user and the course. As can be seen in Figure 2, this model is mainly composed of two modules. The first module is embedding vector learning based on the context of metapaths. The second module introduces the Laplacian matrix into the prior distribution of the hidden factor feature matrix, so that the relational network information is effectively integrated into the model.
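The output-layer steps mentioned above (inner product, Softmax conversion into probability values, and ranking) can be sketched as follows. The two-dimensional vectors and course names are hypothetical, and the sketch shows only the general pattern, not the exact output layer of the model in this article.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities (shifted for numerical stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_courses(user_vec, course_vecs):
    """Score each course by inner product, apply Softmax, and sort descending."""
    names = list(course_vecs)
    scores = [sum(u * c for u, c in zip(user_vec, course_vecs[n])) for n in names]
    probs = softmax(scores)
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)

# Hypothetical learned representations.
user_vec = [0.9, 0.1]
course_vecs = {
    "painting": [1.0, 0.0],
    "algebra": [0.0, 1.0],
    "design": [0.7, 0.3],
}
ranking = rank_courses(user_vec, course_vecs)
```

The ranked list pairs each course with its probability, so a top-N recommendation list is simply the first N entries.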
Path instances of different metapaths are input to the hierarchical neural network to learn their low-dimensional vector representation. The path instance vectors of each specific metapath are pooled to obtain the low-dimensional vector representation of the metapath itself. The context based on metapaths aggregates different metapath information, and different MOOCs may have different preferences for different metapaths. In order to capture this preference drift, the model introduces an attention mechanism. Attention mechanisms have been widely used in the field of natural language processing to learn the importance of different words or sentences. The introduction of an attention mechanism can not only produce better performance but also improve the interpretability of recommendation results. The model in this article can effectively integrate user scoring information, the content of the MOOC, and network information. In addition, the learned feature representations can be propagated through the MOOC's relationship network so that they become more precise and describe the implicit feature vectors of the MOOC more accurately. By introducing the Laplacian matrix into the prior distribution of the social hidden factor matrix, the relational network information is effectively integrated into the model.
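A minimal sketch of the pooling and attention steps described above is given below. Mean pooling, dot-product attention scores, the metapath names, and all vector values are assumptions chosen for illustration; the article does not specify these details.

```python
import math

def mean_pool(path_vectors):
    """Pool the path instance vectors of one metapath into a single vector."""
    dim = len(path_vectors[0])
    return [sum(v[d] for v in path_vectors) / len(path_vectors) for d in range(dim)]

def attention_pool(query, metapath_vectors):
    """Weight each metapath vector by softmax(query . vector) and sum them."""
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in metapath_vectors]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(metapath_vectors[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, metapath_vectors))
               for d in range(dim)]
    return weights, context

# Hypothetical metapaths, each with two path instance vectors.
path_instances = {
    "user-course-user-course": [[2.0, 0.0], [2.0, 0.0]],
    "user-course-teacher-course": [[0.0, 1.0], [0.0, 3.0]],
}
metapath_vectors = [mean_pool(v) for v in path_instances.values()]
weights, context = attention_pool([1.0, 0.0], metapath_vectors)
```

The attention weights can be read directly as the relative importance of each metapath for the given query, which is the source of the interpretability benefit mentioned in the text.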
It can be seen from the generation process of the model that the model in this article successfully integrates the course feature expression vector obtained through deep learning, the user's rating matrix β, and the network matrix α, so that the feature representation is more accurate and the MOOC can be described more accurately. The recommendation framework of this model is shown in Figure 3. First, the training dataset is preprocessed: the user's collection records of the courses are converted into the user-rating matrix, S_matrix; the BoW model is used to express the title and summary information of the MOOC as the content matrix, C_matrix; and the reference relationships between the MOOCs are expressed as the social adjacency matrix, J_matrix. Then, these three kinds of information are fused, and two neural networks with the same input and different outputs are trained simultaneously. The user implicit feature matrix and the MOOC implicit feature matrix are obtained, and the predicted results are finally produced. The current calculation formulas for the similarity between MOOC resource A and MOOC resource B mainly include cosine similarity, the Pearson correlation coefficient, and the constrained Pearson correlation coefficient. Their calculation formulas are given in formulas (1)-(3). The MOOC resource collaborative filtering recommendation algorithm proposed in this article selects formula (3) to calculate the similarity between MOOC resource A and MOOC resource B.
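For orientation, the three similarity measures named above are commonly written as follows. This is a sketch in standard textbook form and may differ in notation from formulas (1)-(3); the per-entry index i, the means \bar{F}_A and \bar{F}_B, and the scale midpoint r_med are assumptions introduced for this illustration.

```latex
\begin{align*}
\text{Cosine:}\quad
\mathrm{sim}(A,B) &= \frac{\sum_{i=1}^{n} F_{A,i}\,F_{B,i}}
  {\sqrt{\sum_{i=1}^{n} F_{A,i}^{2}}\;\sqrt{\sum_{i=1}^{n} F_{B,i}^{2}}} \\[4pt]
\text{Pearson:}\quad
\mathrm{sim}(A,B) &= \frac{\sum_{i=1}^{n} (F_{A,i}-\bar{F}_{A})(F_{B,i}-\bar{F}_{B})}
  {\sqrt{\sum_{i=1}^{n} (F_{A,i}-\bar{F}_{A})^{2}}\;\sqrt{\sum_{i=1}^{n} (F_{B,i}-\bar{F}_{B})^{2}}} \\[4pt]
\text{Constrained Pearson:}\quad
\mathrm{sim}(A,B) &= \frac{\sum_{i=1}^{n} (F_{A,i}-r_{\mathrm{med}})(F_{B,i}-r_{\mathrm{med}})}
  {\sqrt{\sum_{i=1}^{n} (F_{A,i}-r_{\mathrm{med}})^{2}}\;\sqrt{\sum_{i=1}^{n} (F_{B,i}-r_{\mathrm{med}})^{2}}}
\end{align*}
```

The constrained variant differs from the Pearson coefficient only in centering both resources on the fixed midpoint of the rating scale instead of on their own means.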
Here, F_A represents the predicted feature score of MOOC resource A, and n represents the number of MOOC resources.
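The three similarity measures can be compared on a small example. The sketch below uses their standard definitions in plain Python; the function names, the score vectors, and the rating-scale midpoint of 3 are assumptions made for illustration.

```python
import math

def _centered_cosine(a, b, center_a, center_b):
    """Cosine of the two vectors after subtracting the given centers."""
    xs = [x - center_a for x in a]
    ys = [y - center_b for y in b]
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0

def cosine_sim(a, b):
    """Cosine similarity: no centering."""
    return _centered_cosine(a, b, 0.0, 0.0)

def pearson_sim(a, b):
    """Pearson correlation: each vector is centered on its own mean."""
    return _centered_cosine(a, b, sum(a) / len(a), sum(b) / len(b))

def constrained_pearson_sim(a, b, midpoint):
    """Constrained Pearson: both vectors are centered on the scale midpoint."""
    return _centered_cosine(a, b, midpoint, midpoint)

# Hypothetical scores of two MOOC resources on a 1-5 scale.
scores_a = [5.0, 3.0, 4.0]
scores_b = [4.0, 2.0, 3.0]
cos_value = cosine_sim(scores_a, scores_b)
pearson_value = pearson_sim(scores_a, scores_b)
constrained_value = constrained_pearson_sim(scores_a, scores_b, midpoint=3.0)
```

In this example the two score vectors differ by a constant offset, so the Pearson coefficient treats them as perfectly correlated, while the constrained variant also takes into account whether each score lies above or below the scale midpoint.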

MOOC Recommendation Algorithm Based on Word-Embedding Vector.
In recent years, word-embedding vectors have been widely used in many applications of natural language processing, making model training an end-to-end process instead of a traditional pipeline. This approach does not rely on feature engineering and greatly improves the performance of the system. Through the word-embedding model, the long text is mapped to another space by a function G, namely, G: W -> W_m, where W is a dictionary composed of the words in the review text or description text and W_m is the m-dimensional vector space into which the function G maps them. This article uses this representation technology to mine the semantics of the review text and the MOOC description text. In the input layer of the model, the review text and the MOOC description text are each represented as a matrix of word-embedding vectors so that their semantic information can be learned. Specifically, all comments of user a are merged into a single document w, which contains a total of n words. Then, a word vector matrix E is constructed for user a by arranging the embedding vectors of the words in the order in which they appear in the document. Here, the variable w_1^a represents the first word in the document w. Through matrix E, the order of words is maintained.
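The construction of the word vector matrix can be sketched as follows. The embedding table, its three-dimensional vectors, the whitespace tokenizer, and the zero vector for unknown words are all assumptions for this example; in practice the embeddings would come from a trained word-embedding model.

```python
def build_embedding_matrix(document, embeddings, dim):
    """Map each word to its embedding, keeping the original word order."""
    unknown = [0.0] * dim  # assumed fallback for out-of-vocabulary words
    return [embeddings.get(word, unknown) for word in document.lower().split()]

# Hypothetical embedding table G: W -> W_m with m = 3.
embeddings = {
    "clear": [0.2, 0.7, 0.1],
    "lectures": [0.5, 0.1, 0.3],
    "hard": [0.9, 0.0, 0.4],
}

# All comments of one user are merged into a single document.
reviews = ["Clear lectures", "Hard exercises"]
document = " ".join(reviews)
E = build_embedding_matrix(document, embeddings, dim=3)
```

Row i of `E` is the embedding of the i-th word of the merged document, so a convolution kernel that spans several rows sees the words in their original order.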
After a series of operations in the CNN layer, the hidden features of users and MOOCs can be learned. However, these two features come from the comment content and the description text, respectively, and are not in the same feature space, so factorization and other operations cannot be performed on them directly. Therefore, a shared structure is needed to merge the two into the same feature space before subsequent processing can be performed, which is the purpose of the shared layer.
First, a separate correlation vector ε = (ε_1, ε_2) needs to be constructed to connect the user implicit feature ε_1 output by the CNN model with the MOOC implicit feature ε_2. Then, the model-based hidden factor model is used in the recommendation system to model the association vector u and train the final predictive score. For a given training sample, the loss function involves the following variables: the variable actual is the actual value of the score, variable ε_0 is the overall bias term of the entire model, and variable ε_1 is the weight of vector u_1.
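The role of the shared layer can be illustrated with a deliberately simplified stand-in. The sketch below concatenates the two hidden features, predicts the score with a global bias plus a weighted sum, and measures the squared error against the actual score. The linear form, the squared-error loss, and all numbers are assumptions for illustration and are not the loss function of this article.

```python
def predict_score(user_feature, mooc_feature, bias, weights):
    """Concatenate both hidden features and apply a bias plus weighted sum."""
    joint = user_feature + mooc_feature  # list concatenation into one space
    return bias + sum(w * x for w, x in zip(weights, joint))

def squared_loss(predicted, actual):
    """Squared error between the predicted and the actual score."""
    return (predicted - actual) ** 2

# Hypothetical hidden features and parameters.
user_feature = [1.0, 2.0]
mooc_feature = [3.0]
predicted = predict_score(user_feature, mooc_feature, bias=0.5,
                          weights=[0.1, 0.2, 0.3])
loss = squared_loss(predicted, actual=2.0)
```

Training would adjust the bias and weights to reduce this loss over all training samples; a factorization-style model would add pairwise interaction terms on top of the linear part.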

Datasets.
A large number of MOOC resources are selected as experimental objects and divided into ten categories. The number of MOOC resources in each category is shown in Table 1.

Experimental Parameter Settings.
In addition to MOOC content information, the model can also make recommendations using relational network information. If the scoring parameters α and β are set to 1, the model uses only MOOC scoring information or only network information for recommendation. For other values of β and α, the model proposed in this article combines scoring information and network content information at the same time. Figure 4(a) shows the effect of different values of parameter α on the recall rate when β is fixed. This article sets β = 10. It can be seen from the figure that the recommendation performance is relatively sensitive to the value of α. As the value of α increases, the recommendation performance gradually improves. When the value of α is 10, the model achieves the highest recall rate, after which the recall rate begins to decline. When the value of α exceeds 10, the recommendation performance becomes significantly worse. The reason is that an excessive α value causes related MOOC resources to be too close to each other, which distorts the prediction results. Figure 4(b) shows the effect of parameter β on the recall rate when α is fixed. This article sets α = 20. It can be seen from the figure that the recommendation performance is not sensitive to the value of β. As the β value increases, the recommendation performance slowly improves. When the value of β exceeds 20, the recommendation performance begins to decline slowly. This is because, for very small β, the model is approximately equivalent to CDL. As the value of β increases, the model incorporates more network information, which improves recommendation performance. An excessively large β value means that the model leans too heavily on relational network information. The parameters in Figure 5 are all trained ten times, and each training uses fivefold cross-validation.
The average MSE is the mean MSE over the ten experiments, and the number of iterations is the number of iterations required for the model loss value to stabilize. It can be seen from Figure 5(a) that the best effect is achieved when the word vector dimension is 100. The reason is that as the dimension of the word vector increases, the word vectors in the high-dimensional space change from dense to sparse, which weakens the association between words. In Figure 5(b), the number of convolution kernels in each of the one- to five-layer configurations is 16.
The experiments show that the best results are achieved with a 3-layer convolution module. The reason is that with a 1-layer module the number of convolution parameters is the largest, little dimensionality reduction takes place, the output matrix is large, the text feature extraction is insufficient, the final fully connected layer has many neural units, and the training speed is slow. With a 5-layer convolution module, the extracted features are too abstract. Because Figure 5(b) shows that the 3-layer convolution module performs best, the comparison experiments with different numbers of convolution kernels use a 3-layer convolution structure. From Figure 5(c), the optimal number of convolution kernels is 16. The analysis found that with four or eight convolution kernels the ability to abstract features is limited, and the deep features between the user and the MOOC are not fully extracted. With 32 or 64 convolution kernels, the extracted features are too detailed and noise is also abstracted, resulting in overfitting and long training time.
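The evaluation protocol described above (ten training runs, each with fivefold cross-validation, averaged into a single MSE) can be sketched as follows. The `fit` and `predict` callables, the mean-value baseline, and the toy samples are hypothetical placeholders for the actual model and data.

```python
import random

def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cv_mse(samples, fit, predict, repeats=10, k=5, seed=0):
    """Average MSE over `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    run_means = []
    for _ in range(repeats):
        fold_mses = []
        for held_out in k_fold_indices(len(samples), k, rng):
            held = set(held_out)
            train = [s for i, s in enumerate(samples) if i not in held]
            test = [samples[i] for i in held_out]
            model = fit(train)
            errors = [(predict(model, x) - y) ** 2 for x, y in test]
            fold_mses.append(sum(errors) / len(errors))
        run_means.append(sum(fold_mses) / k)
    return sum(run_means) / repeats

# Hypothetical baseline: always predict the mean training score.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)

def predict_mean(model, x):
    return model

samples = [(i, 3.0) for i in range(20)]  # (feature, score) pairs
average_mse = repeated_cv_mse(samples, fit_mean, predict_mean)
```

Replacing `fit_mean` and `predict_mean` with the training and prediction routines of a real model gives the averaged MSE for one parameter setting, for example one word vector dimension or one number of convolution kernels.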

The Impact of Similarity Function and Negative Sample Ratio on Recommendation Performance.
The similarity function can be used to measure the degree of similarity between MOOC classes. Figure 6 compares the impact of different similarity functions on the recall rate. Cos means using cosine similarity as the similarity function. Pearson means using the Pearson correlation coefficient as the similarity function. Constraint Pearson means using the constrained Pearson correlation coefficient as the similarity function. It can be seen from the figure that the choice of similarity function affects the recommendation performance and that the model using Constraint Pearson performs best. The objective function of the recommendation system can be divided into two categories: point-by-point and pairwise.

Table 1: Number of MOOC resources in each category.
Number  MOOC resource name  Quantity
1       Language            2000
2       Legal               1500
3       Biological          1000
4       Computer            1000
5       Art                 1000
6       Building            300
7       Logistics           200
8       Mathematics         1500
9       Materials science   500
10      Chemistry           800

Compared with the pairwise objective function, the point-by-point objective function allows more freedom in choosing the proportion of negative samples. To clarify the influence of the negative sample ratio on recommendation performance, this article conducts the following experiment. Figure 7 shows the performance of the model with different negative sample ratios on different datasets and different indicators. It can be seen from the figure that when the proportion of negative samples is below four, the performance improves greatly as the proportion increases. When the proportion exceeds four, the performance still increases with the proportion, but the gain is relatively small. In addition, the dataset size is N times the number of positive samples, where N is the proportion of negative samples plus one. Therefore, when there are many positive samples, a large negative sample ratio causes the training set, and with it the training time, to grow substantially. To balance performance and time complexity, the negative sample ratio should not be made too large. Therefore, the best choice for the negative sample ratio is between six and eight.
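The relationship between the negative sample ratio and the dataset size can be made concrete with a small sampling sketch. The function name, the uniform sampling of unseen courses, and the toy interactions are assumptions for this example.

```python
import random

def sample_training_set(positives, all_courses, neg_ratio, seed=0):
    """For each positive (user, course) pair, draw `neg_ratio` unseen courses."""
    rng = random.Random(seed)
    seen = set(positives)
    dataset = []
    for user, course in positives:
        dataset.append((user, course, 1))  # positive sample
        candidates = [c for c in all_courses if (user, c) not in seen]
        for negative in rng.sample(candidates, min(neg_ratio, len(candidates))):
            dataset.append((user, negative, 0))  # negative sample
    return dataset

# Hypothetical interactions and course catalogue.
positives = [("u1", "c0"), ("u2", "c1")]
all_courses = ["c%d" % i for i in range(10)]
dataset = sample_training_set(positives, all_courses, neg_ratio=4)
```

With a ratio of four, each positive sample contributes five rows in total, which matches the statement that the dataset is N times the number of positive samples with N equal to the ratio plus one.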

Comparison and Analysis of the Accuracy of MOOC Resource Recommendation.
Under the above experimental environment and parameters, the algorithm based on the cloud platform, the traditional algorithm based on shallow machine learning, and the algorithm in this article are each used to conduct experiments. The experimental results are shown in Table 2.
The experimental results show that, with the decrease of the training set and the increase of the test set, the MAE and RMSE values of the three algorithms all decrease, and the accuracy improves steadily. The MAE and RMSE values of the algorithm proposed in this article are lower than those of the other two algorithms at each proportion, and its accuracy is higher. The traditional algorithm adopts a shallow machine learning model and is unable to learn the deep features of users and items. This indicates that the Spark-based collaborative filtering recommendation algorithm for art MOOC resources has higher recommendation accuracy and better performance.
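For reference, the two error measures reported in Table 2 follow their standard definitions, sketched below. The predicted and actual scores are made-up values used only to show the calculation.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical scores on a 1-5 scale.
predicted = [3.0, 4.0, 5.0]
actual = [2.0, 4.0, 7.0]
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

Because RMSE squares each error before averaging, it penalizes large individual errors more strongly than MAE, which is why the two values are usually reported together.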
In order to verify the stability of the model in this article, fivefold cross-validation was applied to the dataset, and 50 rounds of experiments were conducted. The results are shown in Figure 8. The loss value in the 50-round experiment is negatively correlated with the number of training rounds, there is no large-scale jitter in the overall training process, and the lowest loss value can be reached with a large learning rate. The experimental results show that the model in this article has good stability.
The resource collaborative filtering recommendation algorithms of literature [19], literature [23], and literature [27] are selected for comparative experiments. For the same dataset, their MOOC resource recommendation accuracy is shown in Figure 9.

A comparative analysis of the recommendation accuracy of MOOC resources in Figure 9 shows that the accuracy of MOOC resource recommendation in this model is much higher than that of literature [19], literature [23], and literature [27]. It reduces the error of MOOC resource recommendation.
Figure 9: Recommendation accuracy versus MOOC resource number for literature [19], literature [23], literature [27], and this paper.

Conclusion
With the continuous development of big data technology, MOOC resource recommendation algorithms have emerged. Such a system analyzes students' learning interests based on their learning history and related materials. Researchers abroad have paid considerable attention to MOOC resource recommendation and have proposed many excellent MOOC resource recommendation algorithms with the help of recommendation technologies from e-commerce, tourism routes, and social networks. However, the current recommendation performance of these algorithms is still poor. Therefore, based on the Spark architecture, this article proposes a collaborative filtering recommendation model based on deep learning for art education resources. This model is mainly composed of two modules. The first module is embedding vector learning based on the context of metapaths. The second module introduces the Laplacian matrix into the prior distribution of the hidden factor feature matrix, so that the relational network information is effectively integrated into the model. Compared with the traditional model using the scoring matrix, the model using the text word vector effectively alleviates the impact of data sparsity and greatly improves the accuracy of prediction. The analysis of the experimental results shows that, compared with other algorithms, the resource collaborative filtering recommendation model proposed in this article achieves better recommendation results, with good stability and scalability.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares no conflicts of interest.