An Improved Recommendation Method Based on Content Filtering and Collaborative Filtering



Introduction
With the rapid development of Internet technology, there is more and more information on the Internet, making it difficult for users to select the information they are interested in. For this reason, personalized recommendation systems came into being to recommend relevant information from the Internet to users [1][2][3].
At present, personalized recommendation technologies are mainly divided into two types: collaborative filtering [4] and content-based [5]. Collaborative filtering recommendation technology can be divided into user-based and item-based recommendation technology [6,7]. User-based collaborative filtering recommendation technology predicts item ratings based on the ratings of other users to generate item recommendations [8]. However, its recommendation quality is easily affected by the sparseness of user evaluation data. Content-based recommendation technology analyses the characteristics of the item content information and calculates the matching degree with the user's interest to recommend items [9]. Therefore, compared with collaborative filtering recommendation, content-based recommendation is less dependent on scoring data. However, it places high requirements on the structure and feature extraction of item information, and the recommended items are usually frequently recommended items, so it cannot adapt to the recommendation of new items. In view of the respective shortcomings of the two recommendation technologies, some scholars have combined them. Literature [10] used a weight to integrate the scores from collaborative filtering and content-based recommendation so as to exploit their respective advantages. However, this weight needs to be adjusted, and no reasonable adjustment mechanism is given. Literature [11] first used content-based recommendation technology to predict user ratings and build an initial prediction error matrix. Then, collaborative filtering is used to supplement and refine the values in the matrix, and a final prediction score is made based on this matrix. Literature [12] first generated multiple preliminary recommendation items through collaborative filtering. Then, the initial recommended item set is pruned through content recommendation technology, and the most relevant recommended items are finally obtained.
To solve the above problems, this paper proposes a fusion recommendation method based on content and collaborative filtering. The user's existing interest and potential interest are fused to obtain a user interest model that is both personalized and diverse. By calculating the similarity between the marketing content and the fusion model, a user rating set combining the characteristics of both is constructed, so that more accurate recommendation information can be obtained. The main contributions of this paper are as follows:
(1) The method improves the traditional content-based method to obtain the user's existing interest and obtains the user's potential interest through collaborative filtering of feature words.
(2) The user's existing interest and potential interest are merged to obtain a fused user interest model. The fusion model is used to calculate the similarity of the candidate marketing content and to recommend content that may be of interest to different users.
(3) Compared with previous methods, this article takes into account the diversity and personalization needs of users browsing products and effectively avoids the time lag of the hybrid recommendation method.

Related Works
Personalized recommendation for e-commerce first emerged in the late 1990s, alongside the rapid development of e-commerce. In recent years, the technology has been continuously developed and improved. This development has greatly changed consumers' traditional consumption patterns, network marketing, and application patterns.
In order to achieve a better recommendation effect, many scholars have been working on various personalized recommendation design schemes for many years. Among them, collaborative filtering is the most widely used personalized recommendation technology. Kastner et al. [13] developed a system based on collaborative filtering technology, whose main purpose is to filter emails. Goyani et al. [14] developed GroupLens, which is mainly used for collaborative filtering in newsgroups. Its success has greatly promoted the rapid development of collaborative filtering technology in personalized recommendations. Li et al. [15] applied collaborative filtering technology to build a movie recommendation website, MovieLens. At present, many people use this data set to test and analyse their own algorithms. Tan and He [16] proposed to apply the principle of collaborative filtering to product recommendation. Jiang et al. [6] proposed to introduce the trust model into the collaborative filtering algorithm to increase the accuracy of recommendation. Since then, more and more scholars have proposed various personalized recommendation algorithms based on collaborative filtering, such as personalized recommendation based on adaptive collaborative filtering [17], collaborative filtering based on social psychology [18], and collaborative filtering personalized recommendation based on trust awareness [19].
Currently, collaborative filtering algorithms are widely used. However, the number of goods a user purchases is a very small proportion of the total number of goods, and, with the rapid development of the Internet, the number of users and commodities in the recommendation system is very large. The user-item scoring matrix is therefore not only high dimensional but also sparse. This leads to problems such as low timeliness, low recommendation precision, and cold start of new items [20]. Aiming at the problem of low solidity, Borlea et al. [21] proposed a solution to the problem of local optimization caused by the sensitivity of the k-means partitioning clustering algorithm to its initial cluster centres. Bhattacharjee and Mitra [22] addressed the problem that the k-means partitioning algorithm can only find spherical clusters and proposed combining the k-means partition-clustering algorithm with a density-based clustering algorithm. Huang et al. [23] proposed to rely on dynamic search to determine the value of k autonomously. Nevertheless, this scheme has one drawback: it is not easy to distinguish normal clusters from abnormal clusters during the clustering process, so it is prone to deviation. Ren et al. [24] proposed an improved k-means algorithm, which overcomes the limitation of the k-means partitioning algorithm on the attributes of the data it can handle. Idrees and Al-Yaseen [25] proposed an algorithm based on the genetic algorithm to find the initial clustering centre. Kolaja et al. [26] proposed multiple subset solutions in the data set for the problem of local optimization. Aiming at the problem of data sparsity, Wu and Li [27] introduced dimensionality reduction based on singular value decomposition, decomposing the high-dimensional user-item rating matrix into low-dimensional orthogonal matrices to better handle sparsity. Richa and Bedi [28] proposed to use the most frequent score value to predict and fill the sparse score matrix to alleviate the impact of data sparsity.
In response to these problems, some studies have combined the two technologies to propose a hybrid recommendation technology, and related research results have been reported [29,30]. The hybrid recommendation technology has higher recommendation accuracy than either of the two individual recommendation technologies.

Improved Network Marketing Recommendation Algorithm Based on Content Filtering and Collaborative Filtering
3.1. Recommended System. A complete recommendation system follows a model of data input, algorithm processing, and data output, and it is mainly composed of an input module, a recommendation algorithm module, and an output module. Each module has its specific function and role, and the modules cooperate with each other to complete the recommendation. The architecture is shown in Figure 1.
(1) Data Layer. The main function of this module is to make full use of different channels to collect and update user information and to provide a channel for interaction between the recommendation system and the user. It is the basis of the entire recommendation system. Data sources are mainly divided into individual customers and community groups, and feedback information is divided into explicit information and implicit information.
(2) Logic Layer. This module is the core part of the entire recommendation system. Its task is to analyse and process the information collected by the data layer module.
(3) Output Module. The function of this module is to return recommended products to the corresponding users after forming recommendation results in different forms. Timeliness and friendliness must be achieved, while ensuring the diversity of output methods.
Figure 2 shows the construction process of the fusion model proposed in this paper. Since the existing user interest model (EUIM) is derived from the direct behaviour of users, its feature words come from texts previously read or written by users and reflect users' interests and preferences. The potential user interest model (PUIM) uses collaborative filtering to extract feature words that similar users follow but the target user has not yet paid attention to, thus reflecting the potential interest of the target user. The fused user interest model (FUIM) is the result of integrating the two, so it takes into account both the existing and the potential interests of users.

Existing Interest Model Construction.
Given the product set $S = \{s_1, s_2, \ldots, s_p\}$ and the main feature word sequence $M = (m_1, m_2, \ldots, m_q)$, the product $s_i$ can be expressed in the vector space model as the vector $a_i = (a_{i1}, \ldots, a_{ij}, \ldots, a_{iq})$ corresponding to the main feature word sequence. Here, $a_{ij}$ represents the weight of the feature word $m_j$ in the product $s_i$, and $a_{ij} = 0$ means that the feature word $m_j$ does not appear in the product $s_i$. The entire product set can therefore be expressed as a weight matrix:

$$\begin{pmatrix} a_{11} & \cdots & a_{1q} \\ \vdots & \ddots & \vdots \\ a_{p1} & \cdots & a_{pq} \end{pmatrix}. \quad (1)$$

This article uses the TF-IDF weighting scheme. The lengths of the product texts differ greatly: some texts are particularly long and some contain only a few words. In order to prevent terms in long texts from receiving higher weights, formula (2) is used to calculate $a_{ij}$. In formula (2), count(i, j) is the number of times the feature word j appears in product i, max count(i, j) is the maximum number of times other feature words appear in product i, N is the total number of products, and Num(j) is the number of products in which the feature word j appears.
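The weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formula (2): it assumes the common form in which the term count is divided by the largest count in the same product and multiplied by $\log(N/\mathrm{Num}(j))$. The function name `tfidf_weights` and the toy products are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(products):
    """Build a product-by-feature-word weight matrix.

    products: list of token lists, one per product.
    Assumed weight: (count(i, j) / max count in product i) * log(N / Num(j)).
    """
    vocab = sorted({word for doc in products for word in doc})
    n_products = len(products)
    # Num(j): number of products in which feature word j appears
    doc_freq = Counter(word for doc in products for word in set(doc))
    matrix = []
    for doc in products:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        row = [
            (counts[word] / max_count) * math.log(n_products / doc_freq[word])
            for word in vocab
        ]
        matrix.append(row)
    return vocab, matrix

# Toy data (hypothetical): three short product descriptions
products = [
    ["phone", "camera", "camera", "battery"],
    ["laptop", "battery"],
    ["phone", "case"],
]
vocab, weights = tfidf_weights(products)
```

Dividing by the largest count within the product keeps the term-frequency factor in [0, 1], so a long description does not receive larger weights merely because it contains more words.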
User interests usually change over time. In this regard, this article proposes a time-weighted EUIM calculation method, which takes into account the time at which the user clicked each product when generating the EUIM.
Suppose user u browses the product set $S_u = (su_1, \ldots, su_l)$, where the time of browsing product $su_i$ is $t_{su_i}$ and the current time is $t$. The time influence factor $\lambda$ of product $su_i$ on user u is then defined by formula (3), where $S_u$ is a subset of $S$; its weight matrix is given by formula (4).
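As an illustration of time weighting, the sketch below builds an EUIM as a time-weighted average of the weight vectors of the browsed products. The exponential decay used for the time influence factor, the `decay` parameter, and the function names are assumptions made only for this example; the text does not reproduce the paper's own definition of $\lambda$.

```python
import math

def time_factor(t_now, t_browse, decay=0.1):
    # Assumed form (not from the paper): exponential decay in elapsed time,
    # so recently browsed products have a factor close to 1.
    return math.exp(-decay * (t_now - t_browse))

def build_euim(weight_rows, browse_times, t_now, decay=0.1):
    """Time-weighted average of the weight vectors of browsed products.

    weight_rows: one feature-word weight vector per browsed product.
    browse_times: browsing time of each product, in the same unit as t_now.
    """
    factors = [time_factor(t_now, t, decay) for t in browse_times]
    total = sum(factors)
    n_features = len(weight_rows[0])
    return [
        sum(f * row[j] for f, row in zip(factors, weight_rows)) / total
        for j in range(n_features)
    ]

# Two browsed products: the first viewed 10 time units ago, the second just now
rows = [[1.0, 0.0], [0.0, 1.0]]
euim = build_euim(rows, browse_times=[0.0, 10.0], t_now=10.0)
```

In this toy case the second feature word, which comes from the most recently browsed product, receives the larger weight in the resulting model.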

Potential Interest Model Construction.
PUIM differs from EUIM in that it cannot be directly obtained from the user's previous browsing content. Because products are updated in large volumes and varieties across different fields, some hot-spot commodities may attract sudden widespread attention. Therefore, the recommended list should include not only products that are of existing interest to users but also products that match their potential interests. In this regard, this paper proposes the use of collaborative filtering to recommend the interests of similar user groups to target users, thereby expressing the potential interests of target users.
Suppose user u browses the product set $S_u = (su_1, \ldots, su_l)$ with $\mathrm{EUIM}_{su} = (a^1_{u1}, \ldots, a^1_{ul})$, and user v reads the product set $S_v = (sv_1, \ldots, sv_k)$ with $\mathrm{EUIM}_{sv} = (a^1_{v1}, \ldots, a^1_{vk})$, where $S_u$ and $S_v$ are both subsets of $S$. Then the behaviour similarity of users u and v is given by formula (5). The operations in the formulas are operations between matrices.
Among them, $U(i)$ represents the set of users who have viewed the product $s_i$. The content similarity of users u and v is given by formula (6). Combining formulas (5) and (6), this paper proposes the hybrid similarity of formula (7), in which the coefficient $\alpha$ is a weighting factor determined by experiments. It is a similarity ratio parameter with value range [0, 1]. When $\alpha = 0$, the similarity calculation only considers content feature data. When $\alpha = 1$, the similarity calculation only considers behaviour characteristic data. The behaviour similarity and content similarity of the two users u and v are calculated, and then the weighting factor $\alpha$ is used to combine the two similarities to obtain a mixed user similarity.
The similarity between the target user and all other users is obtained through the above formula. The h users with the greatest similarity to the target user are selected as the similar user group of the target user, and collaborative filtering is used to recommend the feature words that are of interest to this group to the target user, yielding the PUIM of the target user.
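The combination of the two similarities can be sketched as below. Only the boundary behaviour of $\alpha$ is taken from the text ($\alpha = 0$ gives content similarity only, $\alpha = 1$ gives behaviour similarity only); the linear blend, the popularity-discounted co-browsing form of the behaviour similarity, and the cosine form of the content similarity are assumptions for illustration, and all function names are hypothetical.

```python
import math

def behaviour_similarity(items_u, items_v, viewers):
    """Co-browsing similarity that down-weights widely viewed products.

    viewers maps a product id to the set of users who viewed it, i.e. U(i).
    It is assumed to be consistent with items_u and items_v, so every shared
    product has at least two viewers and the logarithm is positive.
    """
    items_u, items_v = set(items_u), set(items_v)
    common = items_u & items_v
    if not common:
        return 0.0
    score = sum(1.0 / math.log(1 + len(viewers[i])) for i in common)
    return score / math.sqrt(len(items_u) * len(items_v))

def content_similarity(euim_u, euim_v):
    """Cosine similarity between two EUIM weight vectors (assumed form)."""
    dot = sum(a * b for a, b in zip(euim_u, euim_v))
    norm = math.sqrt(sum(a * a for a in euim_u)) * math.sqrt(sum(b * b for b in euim_v))
    return dot / norm if norm else 0.0

def hybrid_similarity(sim_behaviour, sim_content, alpha):
    """Linear blend: alpha = 1 keeps only behaviour, alpha = 0 only content."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_behaviour + (1.0 - alpha) * sim_content

viewers = {"a": {"u", "v"}, "b": {"u"}, "c": {"v", "w"}}
sim_b = behaviour_similarity({"a", "b"}, {"a", "c"}, viewers)
sim_c = content_similarity([1.0, 0.0], [1.0, 0.0])
sim_h = hybrid_similarity(sim_b, sim_c, alpha=0.7)
```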
Suppose the similar user group is $u = (us_1, us_2, \ldots, us_j, \ldots, us_f)$, the similarity between user u and any user $us_i$ in the similar user group is $\mathrm{similar}(u, us_i)$, and the EUIM of $us_i$ is $(a^1_{us_{i1}}, a^1_{us_{i2}}, \ldots, a^1_{us_{ij}}, \ldots, a^1_{us_{if}})$. The PUIM of user u is then calculated by equation (8), where $M_j$ represents the jth feature word in the PUIM of user u and n represents the number of feature words.
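One possible reading of this step is sketched below: the h most similar users are kept, and each feature word that the target user has not yet shown interest in receives the similarity-weighted average of the neighbours' weights. The weighted-average aggregation and the rule of skipping feature words already present in the target user's EUIM are assumptions for illustration; `build_puim` is a hypothetical name.

```python
def build_puim(target_euim, neighbours, h):
    """Estimate potential-interest weights for the target user.

    target_euim: the target user's existing-interest weight vector.
    neighbours: list of (similarity, euim_vector) pairs for other users.
    h: number of most similar users to keep.
    """
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:h]
    total_sim = sum(sim for sim, _ in top)
    puim = []
    for j, own_weight in enumerate(target_euim):
        if own_weight > 0 or total_sim == 0:
            # Already an existing interest (or no usable neighbours): skip it
            puim.append(0.0)
        else:
            puim.append(sum(sim * vec[j] for sim, vec in top) / total_sim)
    return puim

target = [0.5, 0.0, 0.0]
neighbours = [
    (0.9, [0.4, 0.6, 0.0]),
    (0.1, [0.0, 0.0, 1.0]),
    (0.5, [0.2, 0.2, 0.0]),
]
puim = build_puim(target, neighbours, h=2)
```

With h = 2 the least similar user is dropped, so the third feature word, which only that user cares about, does not enter the potential interest model.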

Build a Fusion Model.
After obtaining the EUIM and PUIM of the target user, the weights of the feature words of the two interest models are combined to obtain the FUIM of the target user. Then, the similarity between the main feature word weight vector of each candidate product and the FUIM is calculated.
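The fusion and matching step can be sketched as follows. The text does not state how the two sets of weights are combined or which similarity measure is used, so the element-wise sum and the cosine similarity below are assumptions, and the names `fuse` and `recommend` are hypothetical.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def fuse(euim, puim):
    # Assumed combination rule: element-wise sum of the two weight vectors
    return [e + p for e, p in zip(euim, puim)]

def recommend(fuim, candidates, top_n):
    """Rank candidate products by similarity to the fused interest model.

    candidates: dict mapping product id to its feature-word weight vector.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(fuim, item[1]),
        reverse=True,
    )
    return [product_id for product_id, _ in ranked[:top_n]]

fuim = fuse([0.5, 0.0, 0.0], [0.0, 0.4, 0.0])
candidates = {
    "p1": [1.0, 0.0, 0.0],  # matches the existing interest only
    "p2": [0.0, 0.0, 1.0],  # matches neither interest
    "p3": [1.0, 1.0, 0.0],  # matches both existing and potential interest
}
top = recommend(fuim, candidates, top_n=2)
```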
Clustering operations are performed on the above similarities to form clusters and determine cluster centres. The algorithm flow chart is shown in Figure 3. It can be seen from Figure 3 that the new fusion algorithm performs its clustering search on the user-feature preference matrix. The features derived from the fusion model, such as item characteristics and user-item ratings, are intermediate results that are further used in the clustering algorithm. The farthest distance principle uses the L2 distance. User preference means that when a user likes a product, the user's score for that product is higher. Then, when looking for the nearest-neighbour users, a user who also gives the product a higher score is given priority to become a nearest neighbour.
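A common way to apply a farthest-distance principle with the L2 distance is farthest-first selection of the initial cluster centres, sketched below. Whether Figure 3 uses exactly this procedure is not stated in the text, so the choice of the first point as the starting centre and the function names are assumptions for illustration.

```python
import math

def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def farthest_first_centres(points, k):
    """Pick k initial centres, each as far as possible from those already chosen."""
    centres = [points[0]]  # assumption: start from the first point
    while len(centres) < k:
        # The next centre is the point whose nearest chosen centre is farthest away
        next_centre = max(points, key=lambda p: min(l2(p, c) for c in centres))
        centres.append(next_centre)
    return centres

def assign(points, centres):
    """Index of the nearest centre (L2 distance) for every point."""
    return [
        min(range(len(centres)), key=lambda i: l2(p, centres[i]))
        for p in points
    ]

# Toy user-feature preference rows (hypothetical)
points = [[0, 0], [0, 1], [10, 10], [10, 11], [0, 10]]
centres = farthest_first_centres(points, k=3)
labels = assign(points, centres)
```

Spreading the initial centres apart in this way reduces the sensitivity of k-means-style clustering to its starting configuration.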

Experimental Data.
The data set provided by the MovieLens website is used as the experimental data for analysing and comparing the improved algorithm proposed in this paper. The GroupLens research team has published three data sets of different scales:
(1) The first data set includes 100,000 ratings of 1682 movies from 943 users.
(2) The second data set includes 1 million ratings of 3900 movies from 6040 users.
(3) The third data set includes 100,000 label records and 10 million ratings of 10681 movies from 71567 users.
The data set is collected by the MovieLens website; each user has scored at least 20 movies that he or she has watched, and each score is between one and five. The higher the score, the more the user prefers the movie. The data set mainly includes three data tables, namely, the rating data table, the user data table, and the movie data table, and the composition of each table is as follows. For the MovieLens datasets, we use the entire dataset for testing. This paper randomly selects 11560 scores of 1682 movies from 100 users as the data basis of the experiment and randomly hides about 10% of the scores in the data set to form a test set; the remaining approximately 90% of the scoring data are used as the training set. According to the needs of the test, 50 users were randomly selected from the training set three times to obtain four training sets with different sparsity. Based on this, the score of each user's hidden movie was predicted. The specific distribution of the experimental data set is shown in Table 1 and Figure 4. This article selects precision, recall, and hybrid similarity for evaluation. The calculation formulas of precision and recall are as follows:

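$$\mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}.$$

Among them, TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. The calculation formula of hybrid similarity will be explained in the following sections.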
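Using the TP, FP, and FN counts defined above, the evaluation of a single recommendation list can be sketched as follows. The F-measure used in the later experiments is not defined in the text; the balanced form (the harmonic mean of precision and recall) is assumed here, and the function name is hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and balanced F-measure for one recommendation list.

    recommended: products placed in the recommendation list.
    relevant: products the user actually browsed in the test data.
    """
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and actually browsed
    fp = len(recommended - relevant)   # recommended but not browsed
    fn = len(relevant - recommended)   # browsed but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = precision_recall_f(["a", "b", "c", "d"], ["b", "d", "e"])
```

In this toy case two of the four recommended products were actually browsed, and two of the three browsed products were recommended, so precision is 1/2 and recall is 2/3.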

Existing Interest Model with Time Weight.
The user interest model is first constructed through the traditional content-based recommendation method, and the similarity between the obtained user interest model and the candidate products is calculated to obtain the recommendation list. The recommendation list is compared with the user's actual browsing records in the test data to obtain the precision and recall of the recommendation results generated by the traditional user interest model construction method. Then, the method proposed in this paper is used to build an existing interest model, calculate the similarity with the candidate products, and obtain a recommendation list. This recommendation list is compared with the user's actual browsing records in the test data to obtain the precision and recall of the proposed method. Figure 5 shows the precision and recall of the recommendation results obtained by the traditional user interest model and by the time-weighted existing interest model proposed in this paper when the number of recommended products is 5, 10, 15, 20, 25, 30, 35, and 40, respectively.
Figure 5 compares the recommendation results generated by the time-weighted existing interest model proposed in this paper with those generated directly by the user interest model constructed with the traditional content-based recommendation method. In both precision and recall, the results generated by the time-weighted existing interest model are better than those generated by the user interest model obtained by direct weighted averaging in the traditional content-based recommendation method. This demonstrates the validity of the existing interest model with time weight.

Mixed Similarity.
The traditional collaborative filtering method only uses the users' behaviour similarity to find similar user groups and then recommends to the target user the products that the similar user group has browsed but the target user has not. This paper proposes a hybrid similarity representation that extends the original behaviour similarity. The content similarity between users is computed from the users' existing interest models. By mixing the behaviour similarity and the content similarity, the hybrid similarity is obtained.
For recommendations using collaborative filtering, the selection of the number of similar users is very important. If there are too few similar users, the resulting recommendation results are easily affected by the personal preferences of those users. If there are too many similar users, many users with very small similarity to the target user are also included in the similar user group, which interferes with the calculated user interest. Therefore, it is first necessary to find the optimal number of similar users through experiments. At the same time, when using mixed similarity, the value of the mixing parameter α also needs to be determined through experiments. In the experiment, α is first fixed to 0, 0.5, and 1, respectively, and the initial number of similar users is set to 10 to generate a recommendation result and calculate its F-measure. Then, the number of similar users is increased by 10 at a time and the F-measure of the recommendation result is calculated again, and so on, to find the number of similar users at which the F-measure reaches its maximum. Figure 6 shows the F-measure of the recommendation results when the number of product recommendations per user is 15, α is equal to 0, 0.5, and 1, and the number of similar users is 10, 20, 30, 40, 50, 60, 70, and 80.
Through the comparison of the F-measure results in Figure 6, 60 is taken as the best number of similar users. Then, the number of similar users is fixed at this optimal value, and α is set to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9, respectively, to calculate the F-measure of the recommendation result. Table 2 shows the F-measure of the recommendation results with 60 similar users and different α values.
Through the comparison of the F-measure results in Table 2, it can be seen that the optimal α is 0.7.
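The two-stage parameter selection described in this section can be sketched as a small search routine. The `evaluate` callable stands in for a full run of the recommender that returns an F-measure, and the toy scoring function below is purely hypothetical; it is shaped to peak at 60 similar users and α = 0.7 only to mirror the values reported here. Picking the neighbour count by the best F-measure over the coarse α settings is also an assumption about how the curves in Figure 6 are compared.

```python
def tune(evaluate, neighbour_counts, coarse_alphas, fine_alphas):
    """Two-stage search: first the number of similar users, then alpha.

    evaluate(h, alpha) must return the F-measure obtained with h similar
    users and mixing parameter alpha.
    """
    # Stage 1: choose the neighbour count with the best F-measure
    # over the coarse alpha settings.
    best_h = max(
        neighbour_counts,
        key=lambda h: max(evaluate(h, a) for a in coarse_alphas),
    )
    # Stage 2: with the neighbour count fixed, choose alpha on a finer grid.
    best_alpha = max(fine_alphas, key=lambda a: evaluate(best_h, a))
    return best_h, best_alpha

def toy_f_measure(h, alpha):
    # Hypothetical stand-in for a real experiment
    return 0.5 - (h - 60) ** 2 / 10000 - (alpha - 0.7) ** 2

best_h, best_alpha = tune(
    toy_f_measure,
    neighbour_counts=range(10, 90, 10),
    coarse_alphas=[0.0, 0.5, 1.0],
    fine_alphas=[i / 10 for i in range(1, 10)],
)
```

Searching the two parameters one after the other needs far fewer runs than a full grid, at the cost of possibly missing interactions between the neighbour count and α.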
After determining the optimal number of similar users and the value of the mixing parameter α, the behaviour similarity used in the traditional collaborative filtering method is first used to find similar user groups, and the products that similar users have browsed but the target user has not are recommended to the target user as the recommendation results. Then, the hybrid similarity proposed in this paper is used to find similar user groups and to recommend, in the same way, the products that similar users have browsed but the target user has not, and the precision and recall of the recommendation results are obtained. Figure 7 shows the precision and recall of the recommendation results obtained with behaviour similarity and with hybrid similarity when the number of recommended products is 5, 10, 15, 20, 25, 30, 35, and 40.
Figure 7 compares the results obtained when the hybrid similarity proposed in this paper and the behaviour similarity of the traditional collaborative filtering method are used to find similar user groups, with the products that similar users have browsed but the target user has not being recommended directly to the target user. In terms of precision and recall, the recommendation results of the hybrid similarity proposed in this paper are better than those of the behaviour similarity used in the traditional collaborative filtering method. This demonstrates the effectiveness of the hybrid similarity calculation.

Fusion Algorithm.
The existing interest model and the potential interest model are fused to obtain a fusion interest model, and the similarity between the fusion interest model and the candidate products is calculated to obtain a recommendation list. The recommendation list is compared with the user's actual browsing records in the test data, and the precision and recall of the fusion method are obtained. Figure 8 shows the precision and recall of the recommendation results of the fusion method proposed in this paper compared with literature [10], literature [12], literature [19], and literature [21] for different numbers of recommended products.
It can be seen from Figure 8 that, when the fusion method proposed in this paper and the comparison methods are used to generate recommendation lists, the fusion method achieves better precision and recall than the comparison methods.
Finally, this paper uses literature [26], literature [27], and literature [28] as baselines to compare with the proposed fusion method. The F-measure and diversity of the four methods are compared to illustrate the effectiveness of the proposed method. Figures 9 and 10 show the F-measure and diversity of the recommendation results obtained by the fusion method proposed in this paper and the other three methods for different numbers of recommended products. Figure 9 shows the F-measure of the recommendation results under the different methods. It can be seen that the proposed method significantly improves on the recommendation performance of the methods in literature [27] and literature [28] and is comparable to that of literature [26]. Figure 10 shows the diversity of the recommendation results under the different methods. It can be seen that the diversity of the proposed method is basically equivalent to that of literature [26] and literature [28] and is significantly higher than that of content-based recommendation. The proposed method uses the constructed fusion interest model to perform similarity matching with candidate products to generate recommendation results, so there is no cold start problem. Therefore, the actual recommendation performance of the proposed method is better than that of traditional collaborative filtering, content-based, and hybrid recommendation methods.

Conclusion
The rapid spread of the Internet has integrated online marketing into the lives of modern people, greatly changing the way users shop and providing them with the convenience of shopping without going out. However, with the continuous expansion of the scale of e-commerce, its structure is becoming more and more complex, users cannot become familiar with the massive amount of product information, and merchants have lost contact with users. The wide application of online marketing recommendation systems has alleviated many problems such as "information overload" and "information trek" and has given users more and better online shopping experiences. Such systems have become indispensable for helping e-commerce successfully implement online marketing. At the same time, the various types of online marketing recommendation systems also face many challenges, such as the new user problem of content filtering and the data sparseness and cold start problems of collaborative filtering. To solve the above problems, this paper proposes a fusion recommendation method based on content and collaborative filtering.
The method improves the traditional content-based method to obtain the user's existing interest and obtains the user's potential interest through collaborative filtering of feature words. In addition, the user's existing interest and potential interest are merged to obtain a fused user interest model. The fusion model is used to calculate the similarity of the candidate marketing content and to recommend content that may be of interest to different users. Experiments show that the method proposed in this paper achieves better results than traditional content-based methods in terms of precision, recall, and diversity, which shows the effectiveness of this method.

Data Availability
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Consent
Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.