A Recommendation Approach Based on Heterogeneous Network and Dynamic Knowledge Graph



Introduction
In recent years, the continuous development of the Internet and big data has led to the rapid growth of network resources. Information explosion and knowledge scarcity coexist. How to quickly find resources that match users' needs in massive data has become a hot topic. To provide users with the most appropriate resources, recommender systems have become popular in various scenarios, such as news recommendation [1,2], POI location recommendation [3,4], product recommendation [5], and learning resource recommendation [6].
Cold start [7,8] and data sparsity [9,10] are the main problems faced by recommender systems. The use of context information, social networks, hybrid algorithms, and other methods has greatly alleviated the traditional cold start and data sparsity problems. Some studies added auxiliary information to the knowledge graph (KG) to achieve accurate recommendations [11,12]. Other studies used RippleNet to extract user characteristics and expand user preferences [13]. To represent users' latent preferences more accurately and comprehensively, how to make full use of auxiliary information needs further study.
In addition, popular items are usually overrecommended in recommender systems, which leads to various data bias problems for items. Data bias includes exposure bias, selection bias, popularity bias, feedback loop bias, and conformity bias. Because of data bias, popular items are overrecommended while items that may interest users are ignored. This reduces the freshness and diversity of recommendations, ultimately having a negative impact on both users and product providers. Therefore, it is very important to properly mitigate and handle the bias problem [14,15]. For selection bias and exposure bias, common solutions include uniform data, inverse propensity scores, heuristic confidence weights, and sampling. These methods are effective in improving the precision of recommender systems, but their accuracy is still insufficient. Moreover, they depend heavily on expertise.
In this study, heterogeneous network graphs and dynamic knowledge graphs are designed for multidimensional user feature extraction [16]. A heterogeneous network is a special information network that contains multiple types of nodes [17][18][19]. Because its node types can be multimodal, a heterogeneous network retains more comprehensive semantic and structural information, which helps capture implicit relationships between items and reduces the dependence of recommender systems on rating data. Heterogeneous networks can help complete user preferences and user relationships and effectively address the cold start and data sparsity problems. Completing entities and entity relationships based on heterogeneous networks also helps reduce the impact of exposure bias [20,21] and selection bias [22,23].
In addition, most existing research focuses on users' long-term preferences. Long-term preferences are usually related to users' inherent characteristics and can be extracted from a large number of user behaviors over a certain period of time. However, with the rapid development of the Internet, popularity trends change quickly. User preferences often shift because of public opinion, frequent online communication, and other unexpected events, and short-term preferences may also develop into long-term ones. Attending to users' short-term preferences can therefore make recommendation results more accurate and diverse.
To extract user features and mine potential user preferences, attention mechanisms are widely used in recommender systems [24][25][26]. Unlike earlier graph neural networks based on the spectral domain, the graph attention network (GAT) aggregates neighbor nodes through an attention mechanism, adaptively assigning weights to different neighbors. It has advantages such as high efficiency and portability [27]. For example, a graph attention network can compute relationship weights between different nodes and improve recommendation accuracy according to users' intimacy and the interaction behavior of users participating in different activities [28,29].
Based on the above studies, we propose an approach that builds heterogeneous networks for recommendation. First, a heterogeneous network graph with multimodal nodes is established from cross-domain, multiplatform information, from which users' implicit preferences can be extracted. Data are then extracted from the heterogeneous multimodal node graph to construct a basic knowledge graph. Second, the timeliness of user preferences is considered, and a time warehouse triggering mechanism is set up; at the same time, a graph attention network component is used to extract users' short-term characteristics. Finally, according to the attention weight function, an improved RippleNet is used to calculate the click probability [22]. Thus, the accuracy and diversity of recommendations can be further improved.
The rest of this paper is organized as follows: Section 2 reviews research on heterogeneous networks, graph attention networks (GAT), and RippleNet. Section 3 details the proposed recommender system based on dynamic knowledge graphs in heterogeneous networks. Section 4 describes the experiments and analyzes the results. Section 5 summarizes this study and outlines future work.

Related Work
This section focuses on feature extraction and relationship enhancement for users and items in recommender systems and summarizes research on heterogeneous networks, graph attention networks, and RippleNet. The motivation of our study is also introduced.

Construction and Application of Heterogeneous Networks.
Much research has focused on cold start and data sparsity in recommender systems, using methods such as knowledge graphs, deep learning, and hybrid recommendation. To address selection bias [30,31], the most common schemes are data filling and propensity scores. Exposure bias [14,20] can be addressed by heuristic weighting, uniform data, and negative sampling, but these methods depend too much on human expertise and fall short in recommendation accuracy.
Heterogeneous networks can integrate not only different types of objects and their interactions but also information from heterogeneous data sources [32][33][34][35]. Because it contains multiple types of nodes and multiple types of edges, the heterogeneous network has become an effective information modelling method [36]. Wang et al. proposed an extensible-dimension recommendation model based on heterogeneous networks to address the lack of label models that simultaneously consider multidimensional information [37]. Their method can recommend labels for users across label data of different dimensions at the same time, but its real-time performance and recommendation efficiency need to be improved. Shi et al. proposed a recommendation method based on heterogeneous information network (HIN) embedding [17]. They targeted the difficulty of modelling complex auxiliary information and the data cold start problem. Their method effectively uses the auxiliary information in heterogeneous networks and designs a meta-path-based random walk strategy to obtain more meaningful node sequences for network embedding. However, the selection of effective auxiliary data still needs improvement. Hu et al. proposed a deep neural network model with a co-attention mechanism based on heterogeneous information networks to capture the mutual influence between meta paths and the related user pairs in interactions [38]. However, all edges in the heterogeneous information network are given the same weight, and the different weights of different edge types are not considered, which makes the recommendation results less than ideal. Many other studies have also proposed HIN-based solutions for recommendation, but these models fail to effectively consider the differences between meta paths [39][40][41].
International Journal of Intelligent Systems
The above studies complete missing information to a certain extent by establishing various heterogeneous graphs, and they also improve the accuracy of recommender systems. However, they do not use users' multiplatform behavior to complete the information. In this study, we integrate the interaction behavior of multimodal nodes and establish virtual nodes and virtual relationships to mine users' implicit preferences. Thus, the real-time performance and diversity of the recommender system are expected to improve.

Current Status of Graph Attention Application.
In 2018, Veličković et al. proposed the graph attention network for graph-structured data [27]. It uses self-attention to calculate the attention of a node relative to each of its adjacent nodes. The graph attention network is widely used in text classification, software detection, and other fields and has achieved good results. Its application in recommender systems has also become a popular topic [42]. A graph attention network does not need the whole graph structure in advance, and it can assign different weights to different neighbor nodes.
Wang et al. proposed the knowledge graph attention network (KGAT) to address the problem that some models ignore the relationships between data [42]. It explicitly models high-order connectivity in the knowledge graph in an end-to-end manner and uses the attention mechanism to distinguish the importance of neighbors. However, this method does not consider that relationships change over time, and its real-time performance is poor. Wang et al. proposed a relational metric social recommendation model based on graph attention networks to reduce excessive interfering information in recommender systems [43]. Dual graph attention networks are designed in the item domain and the social domain to adaptively aggregate the domain characteristics of users or items, and the complex interactions between corresponding neighbors are modelled as relation vectors by multilayer neural networks. However, this method does not model social relationships of different types or strengths. Zeng and Liu proposed a model that combines a knowledge graph with a graph attention network and adds an interest evolution module to the graph attention network to capture changes in user interest and generate Top-N recommendations [44]. However, the robustness of this method needs to be improved.
The above studies apply graph attention to recommender systems and alleviate data sparsity and cold start to some extent. However, how to use graph attention networks to improve the real-time performance of recommender systems still needs to be studied. In this paper, the time factor is integrated into the graph attention network to generate a dynamic knowledge graph. Moreover, to reduce the impact of exposure bias and selection bias, additional relationships of the knowledge graph are defined.

RippleNet Application Status.
The current typical method based on paths and knowledge graph embedding is RippleNet [13]. RippleNet is implemented through interest propagation, which makes full use of the user's historical data to obtain preferences and then expands the user's interests outward along the relationships of the knowledge graph [45]. However, RippleNet does not consider differences in the weights of relationships between data. It propagates from excellent seeds without differentiation, which aggravates the impact of selection bias and exposure bias.
Luo et al. put forward the CN-RippleNet method [46], which uses complex network theory to calculate the influence of each node and incorporates this influence into the original model. The model includes a user data processing module, a recall layer module, and a sorting layer module. However, the triple extraction of the improved method is inaccurate, and the node influence does not consider the relationship type. Shi fused two knowledge-graph-based RippleNet models to build a new recommendation model [47]. This model can discover the distribution of user interests and item features on the knowledge graph and the relationships between them. However, the attribute labels of each entity remain to be explored. Luo et al. focused on entity weights and proposed a RippleNet model that considers the influence of complex network nodes [48]. After a complex network is constructed from the knowledge graph, a maximum subnet model is established, and the node influence in the graph network is calculated and added to RippleNet as a weight. Wang et al. proposed a multitask feature learning method for knowledge graph enhancement based on RippleNet to mine potential preferences from the knowledge graph [49].
This paper proposes a recommendation method based on a dynamic knowledge graph. First, multiplatform information is used to generate heterogeneous networks. User and item knowledge is enhanced through the multimodal nodes of the knowledge graph, and virtual relationships are introduced based on these nodes. Then, the multihead attention mechanism of the graph attention network is used to set the weight of each relationship in the knowledge graph. Finally, the improved RippleNet model predicts the user-item click-through rate, and a Top-N list of the recommendations with the highest probability values is produced. The virtual nodes, virtual relationships, and advanced RippleNet mechanisms can effectively alleviate data sparsity and cold start and reduce the impact of data bias on the recommender system.

Adaptive Dynamic Knowledge Graph Recommender System
The overall framework of the proposed recommendation approach is shown in Figure 1. The method includes three steps: (1) build a heterogeneous network from multiplatform information and multimodal nodes and establish a basic knowledge graph; (2) integrate the time warehouse mechanism into GAT and use the graph attention network to extract users' short-term preferences, yielding a real-time knowledge graph network; and (3) cluster users and items, optimize the RippleNet model with excellent seed clusters, propagation blocking, and random seed mechanisms to predict click probabilities, and obtain a recommendation list.

Building the Heterogeneous Network.
In a recommendation environment, users' preferences on one platform can be supplemented and enhanced through their behaviors on multiple platforms. Considering the complexity of user behavior and the multidimensional characteristics of user preferences, we first build the heterogeneous network shown in Figure 2. Several definitions related to heterogeneous networks are introduced as follows.

Definition 1. Multimodal nodes.
Multimodal nodes are the multiple types of nodes in heterogeneous graphs, including user nodes and item nodes. Besides actual items, item nodes also include virtual nodes such as topics, emotions, styles, and habits, which are extracted from users' multiplatform information. Specifically, a virtual node is a user's preference or style in life, study, or work obtained by analyzing the user's multiplatform behaviors. In Figure 2, virtual nodes are represented by dotted ellipses, such as work fanatic, anxiety tendency, and financial sector. Virtual nodes help discover users' implicit preferences. Users and items in heterogeneous networks are represented by U and V, respectively:
$$U = \{u_1, u_2, \ldots, u_n\}, \qquad V = \{v_1, v_2, \ldots, v_m\},$$
where U is a set of n users and V is a set of m items.
Definition 2. Heterogeneous relationship.
The set of relationship types in a heterogeneous network is represented by R:
$$R = \{r_1, r_2, \ldots\},$$
where $r_1, r_2, \ldots$ are the individual relationship types.
In the weight function of a relationship $r_i$, $T_{r_i}$ represents the establishment time of $r_i$, $F_{r_i}$ is the interaction frequency under $r_i$, and $M_{r_i}$ is the number of nodes that interact with or are associated with both nodes of $r_i$. The longer a relationship between nodes has existed, the greater the influence between them. For example, two people who have been good friends for many years usually share similar interests and hobbies and are therefore easily influenced by each other. The higher the interaction frequency between nodes, the greater their mutual influence. The more nodes two nodes are jointly associated with, the closer their relationship and the greater the similarity weight of the user social relation or the item relation. c is a normalization coefficient used to reduce the deviation of the recommendation results caused by large differences in the weight function.
After a heterogeneous network with weight values is constructed, its entities and relationships are extracted to establish the knowledge graph G, which consists of triples (h, r, t), where h, r, and t denote the head, relation, and tail, respectively.
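To make the extraction step concrete, the following Python sketch stores typed edges of a heterogeneous network (user, item, and virtual nodes) together with the three quantities $T_{r_i}$, $F_{r_i}$, and $M_{r_i}$, and turns them into weighted (h, r, t) triples. The original weight function is not reproduced here: the log-sum form in `raw_score` and the max-scaling used for c are hypothetical choices that only respect the monotonic behavior described above. The `Edge` class and all node names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    head: str         # e.g. "user:u1"
    relation: str     # relationship type r_i
    tail: str         # e.g. "item:v7" or a virtual node such as "virtual:work_fanatic"
    age_days: float   # T_{r_i}: time since the relationship was established
    frequency: int    # F_{r_i}: interaction frequency under r_i
    shared: int       # M_{r_i}: number of nodes associated with both endpoints


def raw_score(e: Edge) -> float:
    """HYPOTHETICAL weight form: increases with T, F and M, as the text requires."""
    return math.log1p(e.age_days) + math.log1p(e.frequency) + math.log1p(e.shared)


def build_triples(edges):
    """Extract weighted (h, r, t, w) knowledge-graph triples from typed edges."""
    scores = [raw_score(e) for e in edges]
    top = max(scores, default=0.0)
    c = 1.0 / top if top > 0 else 0.0  # assumed normalization coefficient c: scales w into [0, 1]
    return [(e.head, e.relation, e.tail, c * s) for e, s in zip(edges, scores)]


edges = [
    Edge("user:u1", "friend_of", "user:u2", 1000, 50, 12),
    Edge("user:u1", "has_style", "virtual:work_fanatic", 30, 5, 0),
    Edge("user:u2", "purchased", "item:v7", 2, 1, 3),
]
for triple in build_triples(edges):
    print(triple)
```

Keeping the virtual node in the same triple format as real items is what lets later stages treat "work fanatic" like any other entity during propagation.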

GAT Integrated into the Time Warehouse Mechanism.
In a recommender system, user behavior is usually affected by network interactions, online public opinion, and emergencies, and the relationships of nodes in heterogeneous graphs change accordingly. This changes users' short-term preferences, which may in turn develop into long-term preferences. GAT focuses on the neighbors of the target user, that is, the local structure of the graph, so it can effectively extract users' short-term preferences. For this reason, we set up a time warehouse and a time warehouse trigger mechanism to help GAT extract users' short-term preferences. A time warehouse is a time segment used to observe user behavior. $TI_a$ denotes a time warehouse, and it can be represented as follows.
Usually, the time warehouse extracts user preferences periodically on a weekly basis. When online public opinion events or emergencies occur, the trigger function shortens the time span of each warehouse, increasing the number of warehouses and the frequency at which user preference features are calculated.
The sliding trigger function can be expressed as follows.
Here f(x) is the trigger function, whose value depends on several behavior parameters: Poe (public opinion events), Eme (sudden or hot events), Kes (changes in keyword searches), Pug (purchased items), Fot (focus topics), and Ffi (frequent friend interactions). ξ is a constant that can be set empirically to adjust the trigger frequency of the time warehouse. Figure 3 shows how the time warehouse embedding layer extracts user features. It can trigger the creation of multiple time warehouses and can also extract features in conventional time warehouses, which makes warehouse creation flexible, ensures system accuracy, and reduces computational cost.
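A minimal sketch of the time warehouse trigger, under stated assumptions: f(x) is modeled as a weighted sum of the six behavior signals (the weights in `WEIGHTS` are hypothetical), a regular warehouse lasts one week as stated above, and a triggered warehouse is assumed to last one day. The helper names `trigger_value`, `next_warehouse`, and `segment` are illustrative, not from the paper.

```python
from datetime import datetime, timedelta

# HYPOTHETICAL weights for the behavior parameters listed in the text;
# each signal is assumed to be pre-scaled to [0, 1].
WEIGHTS = {"Poe": 0.3, "Eme": 0.3, "Kes": 0.1, "Pug": 0.1, "Fot": 0.1, "Ffi": 0.1}


def trigger_value(signals):
    """f(x), assumed here to be a weighted sum of the behavior signals."""
    return sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())


def next_warehouse(start, signals, xi=0.5,
                   regular=timedelta(days=7), triggered=timedelta(days=1)):
    """Return the (start, end) bounds of the next time warehouse TI_a.

    By default a warehouse spans one week; when f(x) exceeds the constant xi,
    a shorter window is opened so preferences are recomputed more often.
    """
    span = triggered if trigger_value(signals) > xi else regular
    return start, start + span


def segment(start, end, signal_at, xi=0.5):
    """Cut [start, end) into consecutive warehouses, re-evaluating f(x) at each start."""
    windows, t = [], start
    while t < end:
        _, t_next = next_warehouse(t, signal_at(t), xi)
        windows.append((t, min(t_next, end)))
        t = t_next
    return windows


def busy(t):
    # Hypothetical signal: a public-opinion event from 8 to 10 January.
    return {"Poe": 1.0, "Eme": 1.0} if 8 <= t.day <= 10 else {}


windows = segment(datetime(2024, 1, 1), datetime(2024, 1, 15), busy)
print(len(windows))  # 5: one weekly warehouse, three daily ones, then a truncated regular one
```

Shortening the window only while f(x) exceeds ξ keeps the extra computation confined to eventful periods, in line with the stated goal of reducing calculation costs.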

Multihead Attention Network of GAT.
Considering the complex relationships among multiple nodes, we use the multihead attention mechanism to extract the relationships of the dynamic knowledge graph, making the attention weights more accurate and improving the accuracy of the recommendation results. We focus on user-user, user-item, item-item, and additional relationships, and the same logical structure is used to compute the functions for all four relationships.
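The multihead attention shared by the four relationship types can be illustrated with the standard GAT formulation of Veličković et al. [27]. The pure-Python sketch below is that generic formulation, not the paper's exact equations (which also carry the time warehouse index $TI_a$); the LeakyReLU slope of 0.2, the ELU output, concatenation of heads, and the toy weights are assumptions.

```python
import math


def matvec(W, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_head(h, neighbors, i, W, a, slope=0.2):
    """One attention head for node i, following the standard GAT formulation [27].

    e_ij = LeakyReLU(a^T [W h_i || W h_j]); alpha_ij = softmax over j in N(i);
    h_i' = ELU(sum_j alpha_ij W h_j).
    """
    wh_i = matvec(W, h[i])
    nbrs = neighbors[i]
    wh = {j: matvec(W, h[j]) for j in nbrs}
    raw = [sum(x * y for x, y in zip(a, wh_i + wh[j])) for j in nbrs]  # a^T [W h_i || W h_j]
    alpha = softmax([s if s > 0 else slope * s for s in raw])          # LeakyReLU, then softmax
    out = [sum(alpha[k] * wh[j][d] for k, j in enumerate(nbrs)) for d in range(len(wh_i))]
    return [x if x > 0 else math.exp(x) - 1.0 for x in out], dict(zip(nbrs, alpha))


def multihead(h, neighbors, i, heads):
    """Run independent heads (W, a) and concatenate their outputs."""
    outputs, weights = [], []
    for W, a in heads:
        out, alpha = gat_head(h, neighbors, i, W, a)
        outputs.extend(out)
        weights.append(alpha)
    return outputs, weights


# Toy user-user graph; the neighbor list includes the node itself, as in GAT.
h = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
neighbors = {"u1": ["u1", "u2", "u3"]}
heads = [
    ([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
    ([[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]], [-0.1, 0.0, 0.1, 0.2, -0.2, 0.3]),
]
features, attention = multihead(h, neighbors, "u1", heads)
print(len(features), [round(sum(w.values()), 6) for w in attention])
```

The per-head dictionaries of attention weights are the quantities that later drive seed clustering; the same `gat_head` could be reused for user-item, item-item, and additional relationships by changing the node sets.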

User-User GAT.
For interactions between users, graph attention networks can be used to enhance the characteristics of user relations and mine implicit friends according to the path relationships of the knowledge graph. The latent features between user $u_j$ and other users in time warehouse $TI_a$ are expressed as $\vec{h}_j^{TI_a}$, where σ is the nonlinear activation function. Finally, the user's low-dimensional feature vector is obtained. To improve the accuracy of this calculation, the multihead attention mechanism of the GAT model is used. The specific process of calculating $\vec{h}_j^{TI_a}$ is as follows.

User-Item GAT.
The items that users have interacted with can be divided into two categories. One is direct user-item interaction, such as evaluation, purchase, and collection; the other is indirect interaction between users and virtual items based on paths in the heterogeneous graph. The representational feature $q_j^{TI_a}$ of user-item interaction can be expressed as follows.
where σ is the nonlinear activation function. The output feature representation of user-item interaction is as follows.

Item-Item GAT.
For the interaction information between items, we focus on the degree of association between historically interacted items and neighboring items, so as to provide better recommendations among items of the same type. The information between items includes direct and indirect information. Direct information refers to relationships that can be established between items through keywords or other attributes. Indirect information refers to connections between items established through users' social interactions or through users' personalities or styles. The latent feature $e_j^{TI_a}$ between items is calculated by the following equation.
where σ is the nonlinear activation function, $AF_{v-v}$ is the aggregation function that fuses the information directly and indirectly related to the target item, b is the neural network bias (offset), and W is the neural network weight, which is learned by iterative training.
$Di_v$ denotes the items whose information is directly related to the target item, and $In_v$ denotes the items whose information is indirectly related to it. $\rho_{ia}^{TI_a}$ represents the interaction embedding of the target item with other items at time $TI_a$.
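The following sketch illustrates how the item-item latent feature could combine these inputs. Only the ingredients (σ, $AF_{v-v}$, W, b, $Di_v$, $In_v$, and the interaction embedding ρ) come from the text; the form σ(W·AF + b), the concatenation-of-means aggregator, and the use of tanh for σ are hypothetical.

```python
import math


def mean(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def af_vv(direct, indirect, rho):
    """HYPOTHETICAL AF_{v-v}: concatenate the mean embedding of Di_v,
    the mean embedding of In_v, and the interaction embedding rho."""
    dim = len(rho)
    return mean(direct, dim) + mean(indirect, dim) + list(rho)


def item_latent(direct, indirect, rho, W, b):
    """e_j = sigma(W . AF_{v-v}(Di_v, In_v, rho) + b), with tanh standing in for sigma."""
    x = af_vv(direct, indirect, rho)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


direct = [[1.0, 0.0], [0.0, 1.0]]  # items linked by keywords/attributes (Di_v)
indirect = []                      # no socially linked items (In_v) in this toy case
rho = [0.2, -0.2]                  # interaction embedding at time TI_a
W = [[1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, -1]]
b = [0.0, 0.0]
print(item_latent(direct, indirect, rho, W, b))
```

Keeping direct and indirect neighbors in separate slots of the aggregated vector lets W learn different weights for the two kinds of information instead of averaging them together.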
The attention coefficient of the item, $c_{ij}^{TI_a}$, is calculated from the latent features and then normalized.
The output feature representation of item-item relationships is as follows.

Additional Relationship GAT.
To improve the diversity of recommendation results and reduce the impact of data bias on them, additional relationships are introduced into the knowledge graph. In the corresponding formula, σ is the nonlinear activation function and AF is the aggregation function. The output characteristic of the additional nodes is represented as follows.
Figure 4 shows the structure of the attention network. It is divided into three layers: the propagation layer, the time warehouse embedding layer, and the aggregation layer. The attention network propagates the input for attention calculation over four types of relationships: user-user, user-item, item-item, and additional relationships. Through calculations over multiple time slots, dynamic attention weights are obtained. These weights not only mine users' potential preferences but also make it possible to extend user preferences, which helps improve the accuracy and diversity of recommendation results and user satisfaction.
Advanced RippleNet.
After GAT is used to extract users' short-term preference features, a real-time dynamic knowledge graph is obtained. The RippleNet model is then extended to propagate knowledge graph information and predict the user's click on an item. First, the weight coefficients generated by the graph attention network are used to quickly cluster multitype nodes, and each cluster serves as a seed cluster from which RippleNet propagates over the knowledge graph. Then, to reduce computational complexity, a propagation blocking mechanism is set up to determine the number of propagation hops. For isolated users or users with little historical access data, recommendations are made according to the weight reference function and the random seed mechanism, which alleviates data sparsity and yields more diverse recommendations.
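The quick clustering step above relies on DBSCAN (Definition 5 names the algorithm). A minimal pure-Python DBSCAN is sketched below, assuming each node is represented by a vector of its attention weights; the values of `eps` and `min_pts` are hypothetical and would be tuned per dataset. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would normally be used instead.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point, -1 for noise.

    points: feature vectors, e.g. each node's attention-weight profile (assumed).
    """
    labels = [None] * len(points)  # None = not yet visited

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(neighbors)
    return labels


# Toy attention-weight profiles for seven users (hypothetical values).
profiles = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 10]]
print(dbscan(profiles, eps=0.5, min_pts=2))
```

Points labeled -1 belong to no dense cluster; such nodes could be natural candidates for the random seed mechanism described for isolated users.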

Definition 5. Seed clustering.
According to the attention coefficient weights, users and items are clustered separately with the density-based clustering algorithm DBSCAN to generate node clusters. The seed clusters ensure both the compactness of node associations within a cluster and the diversity of nodes within it, and this optimality and diversity ensure the effectiveness of RippleNet. The number of nodes in each cluster depends on the size of the dataset. The user cluster is represented as follows.
User clusters interact directly or indirectly with item clusters. The threshold used to classify user clusters and item clusters can be set as needed or according to expert experience. This function increases the weight of virtual relationships and emphasizes the diversity of results. Here ς, ϵ, and Ω are normalization parameters that keep the terms on the same order of magnitude.
Definition 6. Propagation and blocking mechanisms.
To improve RippleNet, we set up excellent seed, random seed, and propagation blocking mechanisms to guarantee personalized recommendation, reduce the impact of exposure bias, and improve recommendation efficiency, respectively.
The interaction matrix between user clusters and item clusters is denoted by Y,
where $y_{C_u C_v}$ represents the interaction coefficient between user cluster $C_u$ and item cluster $C_v$. $y_{C_u C_v}$ takes one of three values:
(i) If $y_{C_u C_v} = 1$, the user cluster interacts with the item cluster directly or indirectly along a meta path of the graph data. The item cluster is then called an excellent seed cluster for the user cluster. The propagation of excellent seeds is indicated by blue arrows in Figure 5.
(ii) If $y_{C_u C_v} = 0$, there is no interaction information between the user cluster and the item cluster. Such item clusters can be used as candidate seed sets of the user cluster to establish random propagation relationships between users and items. The propagation of random seeds is indicated by green dotted arrows in Figure 5.
(iii) If $y_{C_u C_v} = -1$, the relationship between the user cluster and the item cluster is an inhibition relationship. The item cluster is set as an inhibition cluster, and propagation between its items and the users is blocked. The red cross represents this propagation interruption in Figure 5.
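The three cases above can be sketched as a routing step over the cluster interaction matrix Y. This is a hypothetical illustration: Y is stored as a dictionary keyed by (user cluster, item cluster), and the probability `random_rate` with which a y = 0 pair becomes a random seed is an assumption, since the text does not specify how random seeds are sampled. The eta/crit comparison is omitted.

```python
import random


def route_seeds(Y, random_rate=0.1, rng=None):
    """Sort item clusters into excellent, random and blocked seeds per user cluster.

    Y maps (user_cluster, item_cluster) -> y in {1, 0, -1}:
      y = 1  -> excellent seed cluster (propagate),
      y = 0  -> candidate random seed, kept with probability random_rate (assumed),
      y = -1 -> inhibition cluster (propagation blocked).
    """
    rng = rng or random.Random(0)
    plan = {}
    for (cu, cv), y in sorted(Y.items()):
        entry = plan.setdefault(cu, {"excellent": [], "random": [], "blocked": []})
        if y == 1:
            entry["excellent"].append(cv)
        elif y == -1:
            entry["blocked"].append(cv)
        elif y == 0 and rng.random() < random_rate:
            entry["random"].append(cv)
    return plan


Y = {("U1", "V1"): 1, ("U1", "V2"): -1, ("U1", "V3"): 0, ("U2", "V1"): 0}
print(route_seeds(Y, random_rate=1.0))
```

Passing a seeded `random.Random` keeps the random-seed selection reproducible between training runs.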
Specifically, the values in the interaction matrix are calculated from the weight function above.
If eta < crit and $y_{C_u C_v} = 1$, the prediction probability is calculated. If eta > crit and $y_{C_u C_v} = -1$, the propagation blocking mechanism is enabled and the prediction probability is not calculated. N′ is a fixed value that puts eta and crit on the same order of magnitude so that they can be compared. If $qa_x = 0$, there is no interaction between the two nodes; a virtual relationship is then randomly established and the prediction probability is calculated to find the user's potential preferences.
Definition 7. Node cluster set.
The set of entity node clusters consists of the user clusters and the item clusters associated with them after k hops, starting from the initial set $\varepsilon_0$. The potential interest of user clusters in item clusters keeps expanding, but as the scope of propagation grows, the strength of preference transmission gradually weakens. The framework of RippleNet is also shown in Figure 5.
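The k-hop expansion can be sketched as the ripple-set construction of the original RippleNet [13], applied here to cluster-level seeds: the triples of hop k are those whose head was reached at hop k-1. The `blocked` argument is a hypothetical way to express the propagation blocking mechanism (triples leading into inhibition-cluster entities are dropped); function and entity names are illustrative.

```python
def ripple_sets(kg, seeds, hops, blocked=frozenset()):
    """k-hop ripple sets in the style of RippleNet [13].

    kg: iterable of (h, r, t) triples; seeds: the starting entities
    (e.g. entities of the excellent seed clusters); blocked: entities of
    inhibition clusters; triples whose tail is blocked are dropped.
    Returns [S^1, ..., S^hops]; S^k holds the triples whose head was reached at hop k-1.
    """
    by_head = {}
    for h, r, t in kg:
        by_head.setdefault(h, []).append((h, r, t))
    frontier, result = set(seeds), []
    for _ in range(hops):
        s_k = [tr for h in sorted(frontier) for tr in by_head.get(h, []) if tr[2] not in blocked]
        result.append(s_k)
        frontier = {t for _, _, t in s_k}  # tails reached at this hop
    return result


kg = [("m1", "directed_by", "d1"), ("d1", "directed", "m2"),
      ("m1", "has_genre", "g1"), ("g1", "genre_of", "m3")]
for k, s in enumerate(ripple_sets(kg, {"m1"}, hops=2, blocked={"g1"}), start=1):
    print(k, s)
```

Limiting `hops` plays the role of the propagation blocking step that caps the number of hops, and dropping blocked tails stops preference from flowing through inhibition clusters.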
The user marked in red within a cluster is the target user. The interactions of the other users in the cluster, and the item clusters they have interacted with, can be used as seed clusters for calculating the prediction probability. Virtual relationships are randomly established for isolated users to enrich user information. At the same time, inhibition items are blocked, and no information is propagated from them.
By comparing the feature $C_v$ of the item cluster with the head node $h_i$ and relation $r_i$ of each triple $(h_i, r_i, t_i)$, the association probability of each triple in the ripple set $S^1_{C_u}$ can be obtained. The formula of $P_i$ is as follows.
where $R_i$ and $h_i$ are the features of relation $r_i$ and head node $h_i$, respectively. Then the weighted sum of the tail nodes in $S^1_{C_u}$ is calculated, using the association probabilities from (24) as weights, which gives the vector $O^1_{C_u}$:
where $t_i$ is the feature of tail node $t_i$, and $O^1_{C_u}$ represents the first-order response of user cluster $C_u$ in the seed set of the knowledge graph to the item cluster. Expanding in the same way, the higher-order responses are calculated and summed to obtain $C_u$, the response that integrates all orders.
Finally, user clusters and item clusters are combined to output the predicted click probability, which is calculated as follows.
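Because the equations for $P_i$, $O^1_{C_u}$, $C_u$, and the click probability are described above only in words, the sketch below follows the standard RippleNet formulation [13] that the description matches: $P_i = \mathrm{softmax}(C_v^\top R_i h_i)$, $O^1_{C_u} = \sum_i P_i t_i$, $C_u$ as the sum of the hop responses, and a sigmoid of $C_u^\top C_v$ as output. Treat it as an approximation rather than the paper's exact equations; in particular, $C_v$ is kept fixed as the query at every hop, whereas RippleNet implementations differ on whether the query is updated between hops.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hop_response(c_v, ripple_set):
    """One RippleNet hop: P_i = softmax_i(c_v^T R_i h_i) and O = sum_i P_i t_i.

    ripple_set: list of (h_vec, R_mat, t_vec) embeddings for the triples of S^k.
    """
    scores = [dot(c_v, matvec(R, h)) for h, R, _ in ripple_set]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    dim = len(ripple_set[0][2])
    o = [sum(p * t[d] for p, (_, _, t) in zip(probs, ripple_set)) for d in range(dim)]
    return o, probs


def click_probability(c_v, ripple_sets):
    """C_u = O^1 + ... + O^H (c_v kept fixed as the query); output sigmoid(C_u^T c_v)."""
    c_u = [0.0] * len(c_v)
    for s in ripple_sets:
        o, _ = hop_response(c_v, s)
        c_u = [a + b for a, b in zip(c_u, o)]
    return 1.0 / (1.0 + math.exp(-dot(c_u, c_v)))


R_id = [[1.0, 0.0], [0.0, 1.0]]          # toy relation matrix R_i
s1 = [([1.0, 0.0], R_id, [1.0, 0.0]),    # (h_i, R_i, t_i)
      ([0.0, 1.0], R_id, [0.0, 1.0])]
c_v = [1.0, 0.0]                         # item-cluster embedding C_v
print(click_probability(c_v, [s1]))
```

Subtracting the maximum score before exponentiating is the usual way to keep the softmax numerically stable; it does not change the probabilities.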

Algorithm Description.
The key steps of HN-DKG are extracting users' short-term preferences and predicting users' click probabilities. Hence, we introduce two algorithms: the GAT feature extraction algorithm and the RippleNet probability prediction algorithm. Algorithm 1 illustrates how these methods cooperate. In it, $acc_{th}$ and $loss_{th}$ are the thresholds for prediction accuracy and cross-entropy loss, respectively; their values can be determined from practical training needs or expert experience.

GAT Algorithm.
Algorithm 2 consists of three parts. The first part (lines 1-4) constructs time warehouses to preprocess data and extract user and item features. The second part (lines 5-7) calculates the attention coefficients and the low-dimensional representations of user and item characteristics. The third part (lines 8-12) adjusts the heterogeneous network according to the attention coefficient weights to obtain a dynamic network representation and output latent features.

RippleNet Algorithm Description.
Algorithm 3 consists of three parts. In the first part (lines 1-2), data preprocessing extracts user clusters and item clusters as the seeds of the RippleNet model. The second part (lines 3-9) calculates the multihop vectors of RippleNet and processes the triple relationships. The third part (lines 10-13) calculates the multiorder response and the prediction probability and finally gives a list of recommendation results. In Algorithm 3, $P_{th}$ is the threshold for the triple association probability and $K_{set}$ is the threshold on the number of iterations for calculating the optimal $P_i$. Their values can be determined from practical training needs or expert experience.

Experimental Analysis
This section introduces the experimental settings, datasets, comparison algorithms, and parameter settings, and then analyzes the experimental results.

Experimental Dataset.
In this study, the MovieLens-1M (movies) and Book-Crossing (books) datasets are used for performance testing. Table 1 shows the details of the two datasets.
MovieLens-1M (https://grouplens.org/datasets/movielens/1m/) is a standard dataset widely used in the field of movie recommendation, which includes 1000209 ratings from 6036 users on 2445 items. Each rating is a positive integer between 1 and 5. Book-Crossing (http://www2.informatik.uni-freiburg.de/~cziegler/BX/) is a standard dataset widely used in the field of book recommendation. This dataset contains 1149780 ratings from 70679 users on 24915 items. Each rating is a positive integer between 1 and 10. After removing fuzzy relations, screening the data, and removing duplicates, the knowledge graph of each dataset is extracted by similarity measurement [50] and sampling [51]. The knowledge graph corresponding to MovieLens-1M contains 182011 entities, 12 different relationships, and 1241995 pieces of knowledge (triples), while that of Book-Crossing contains 113487 entities, 80 different relationships, and 6420520 pieces of knowledge.
To test the performance of recommendation approaches across multiple platforms, we construct a new dataset named MBdata based on heterogeneous platforms. The premise of this dataset is that users who are interested in a certain field carry that interest over to other fields. The theoretical basis for this hypothesis is user interest transfer theory, which states that there is a connection between a user's interest preferences in the source and target domains [52,53].
The first construction method identifies highly matched users in the MovieLens-1M and Book-Crossing datasets as the same user, according to the distribution of their ratings and the similarity of the features of the item categories they like and dislike, and then constructs multi-platform interaction information to generate a multi-platform dataset. The second construction method collects users' interaction data on platforms such as Weibo, Taobao, and CSDN (for example, purchasing, browsing, and reviewing), derives ratings from interaction frequency and time, and establishes a relational dataset to obtain richer multi-platform data.
HN-DKG uses the movie, book, and MBdata datasets. To reflect multi-platform data, we integrate the user nodes of the movie and book datasets and then establish relationships between users and both movie and book nodes. This forms a multimodal node dataset.
The datasets were further processed for our experiments. To ensure the connectivity of the heterogeneous networks, we remove users who do not have multi-platform behaviors. The data were randomly divided into a training set (80%) and a test set (20%) in an inductive way.
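The filtering and splitting steps can be sketched as follows. The record format `(user_id, item_id, platform)` and the helper name are hypothetical; the sketch treats "multi-platform behavior" as interactions on at least two platforms and performs a plain random 80/20 split of interactions, which is only one possible reading of the inductive split described above.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Record = Tuple[str, str, str]  # (user_id, item_id, platform), hypothetical format


def filter_and_split(
    interactions: List[Record], train_ratio: float = 0.8, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    platforms = defaultdict(set)
    for user, _, platform in interactions:
        platforms[user].add(platform)
    # Keep only users observed on at least two platforms.
    kept = [rec for rec in interactions if len(platforms[rec[0]]) >= 2]
    rng = random.Random(seed)        # fixed seed for a reproducible split
    rng.shuffle(kept)
    cut = int(len(kept) * train_ratio)
    # Disjoint partitions: no record appears in both sets.
    return kept[:cut], kept[cut:]
```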

Comparison Methods.
The following approaches, summarized in Table 2, are used as baselines in the comparative experiments.
(i) EHCF [54] (efficient heterogeneous collaborative filtering) models fine-grained user-item relationships. (ii) CKE [55] (collaborative knowledge base embedding) uses heterogeneous network embedding and deep learning embedding methods to automatically extract semantic representations from the structural, textual, and visual knowledge in a knowledge base. (iii) RippleNet [13] represents user preferences through the large number of entities related to a user's click history; the knowledge graph representation method it uses is TransE [57]. (iv) KGAT [42] (knowledge graph attention network) explicitly models high-order connections in the KG in an end-to-end manner. (v) NFM [56] (neural factorization machine) is a neural factorization machine model designed for prediction under sparse settings.

Evaluation Metrics.
These recommendation methods aim to provide Top-N recommendations for users. To evaluate their performance, the commonly used precision, recall, and F1 score are adopted as evaluation criteria.
Precision (also referred to here as the accuracy rate) is the number of items in the recommendation list that have received positive feedback from users, as a percentage of the total number of items in the recommendation list. It is calculated as follows.
where U represents a set of users and u is a specific user in the set U.
Recall is the number of items in the recommendation list that have received positive feedback from users, as a proportion of the number of items in the test set. It is calculated as follows.
where TP(u) denotes the items of interest to user u that are correctly recommended, and FN(u) denotes the items of interest to user u that are not recommended. Precision is defined with respect to the prediction results: it indicates how many of the samples predicted to be positive are actually positive. Recall, on the other hand, is defined with respect to the original samples: it indicates how many of the positive samples are predicted correctly.
The F1 score combines precision and recall to measure the quality of recommendation; the higher the F1 score, the better the recommendation. The formula is as follows.
This metric is used to compare similar models and to analyze the advantages and disadvantages of the proposed method relative to the baseline methods.
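Since the precision, recall, and F1 formulas are not reproduced in this text, the following sketch uses a common formulation as an assumption: hits are summed over all users in U, precision divides them by the total length of the Top-N lists, recall divides them by the total number of test-set positives (TP(u) + FN(u)), and F1 is their harmonic mean. The function name and dictionary inputs are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple


def precision_recall_f1(
    recommended: Dict[Hashable, Set[Hashable]],
    relevant: Dict[Hashable, Set[Hashable]],
) -> Tuple[float, float, float]:
    """recommended: user -> Top-N items; relevant: user -> positive items in the test set."""
    # TP(u): recommended items that user u actually liked
    hits = sum(len(recommended[u] & relevant.get(u, set())) for u in recommended)
    n_rec = sum(len(recommended[u]) for u in recommended)             # sum of |Top-N lists|
    n_rel = sum(len(relevant.get(u, set())) for u in recommended)     # sum of TP(u) + FN(u)
    precision = hits / n_rec if n_rec else 0.0
    recall = hits / n_rel if n_rel else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Averaging per-user precision and recall instead (macro averaging) is an equally common choice and can give different numbers on the same data.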
Normalized discounted cumulative gain (NDCG) evaluates the accuracy of a ranked list by taking the positions of the items in the returned list into account. Its value lies in the range [0, 1], and a larger value indicates a better recommendation. AUC measures the probability that the model scores a randomly chosen positive sample higher than a randomly chosen negative sample. The formulas are as follows.
where DCG is the discounted cumulative gain, IDCG is the ideal (optimal) DCG, and NDCG is used to evaluate ranking accuracy. rank_{ins_i} denotes the rank (position) of positive sample ins_i when all samples are sorted in ascending order of predicted probability score, and M and N represent the numbers of positive and negative samples, respectively. AUC is used to evaluate the ranking quality of the predictions.
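NDCG@k and the rank-based AUC described above can be computed as in this sketch. It assumes binary relevance with the usual log2 position discount for NDCG, and uses the rank-sum form AUC = (Σ rank_{ins_i} − M(M+1)/2)/(M·N) over the positive samples, with tied scores given their average rank; these are common conventions rather than details confirmed by the text.

```python
import math
from typing import Hashable, List, Sequence, Set


def ndcg_at_k(ranked_items: List[Hashable], relevant: Set[Hashable], k: int = 10) -> float:
    # DCG: each relevant item at 0-based position i contributes 1 / log2(i + 2)
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked_items[:k]) if item in relevant)
    # IDCG: all relevant items placed at the top of the list
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    m = sum(1 for y in labels if y == 1)       # M positive samples
    n = len(labels) - m                        # N negative samples
    if m == 0 or n == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending scores
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1             # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)
```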

Experimental Environment.
The experiments in this study were conducted on a desktop computer with an Intel Core i5-8250 CPU and 8 GB of RAM, using Python 3.9. Other details of the experimental environment are shown in Table 3.

Setting Hyperparameters.
The hyperparameters of the recommendation approaches are determined experimentally: with the other parameters fixed, we vary the value of the parameter to be determined and select the value that achieves the best experimental results. We take the number of heads in multihead attention as an example to illustrate the process. The number of heads determines how many attention computations the model performs in parallel. In general, more heads allow the model to extract attention more accurately, but they also increase the computational cost and the risk of overfitting. Therefore, selecting an appropriate number of heads is crucial for maximizing the extraction of user features while minimizing the model's computation time. We test the performance of HN-DKG with K = 2, 4, 6, 8, and 10 heads.
Figure 6 compares the performance of HN-DKG for different K values on the MovieLens-1M dataset, where RMSE is the root mean square error and MAE is the mean absolute error. Figure 7 shows the precision of the recommendations under different K values. Figures 6 and 7 indicate that the proposed method achieves lower RMSE and higher precision when K is 4. In addition, we compared the FLOPs under different K values [58]. When K = 4, creating and applying the attention matrices accounts for only 0.14% of the computational cost of the proposed method, which is only 0.05% higher than at the suboptimal K = 2. The experiment indicates that, at the scale of the experimental data, the proposed method achieves a balance among precision, RMSE, and computational complexity when K is 4.
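The one-parameter-at-a-time procedure used to choose K can be sketched as below. The `evaluate` callback is hypothetical (it would train HN-DKG with the given configuration and return a validation error such as RMSE), and selecting by minimum error is an assumption that matches the use of RMSE above.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def select_hyperparameter(
    evaluate: Callable[[dict], float],
    fixed: dict,
    name: str,
    candidates: List[Hashable],
    lower_is_better: bool = True,
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Vary one hyperparameter while all others stay fixed; return the best value and all scores."""
    results: Dict[Hashable, float] = {}
    for value in candidates:
        config = {**fixed, name: value}        # new dict, so `fixed` is never mutated
        results[value] = evaluate(config)
    pick = min if lower_is_better else max     # e.g. minimize RMSE, maximize precision
    best = pick(results, key=results.get)
    return best, results
```

For the search above, this would be called with `name="heads"` and `candidates=[2, 4, 6, 8, 10]`, repeating the process for each hyperparameter in turn.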

Performance Comparison.
These recommendation models are tested on the MovieLens-1M dataset, with 70% of the data used for training and 30% for validation. The dataset is split in an inductive way, that is, the data in the test set do not appear in the training set.
The selected evaluation metrics are precision, recall, F1 score, NDCG@10, and AUC. The experimental results are shown in Table 4. These methods are also tested on the Amazon Book dataset, and the comparative results are shown in Table 5.
To test the performance of the proposed HN-DKG algorithm under extremely sparse data, we delete 2/3 of the rating data from the Amazon Book dataset to increase its sparsity. RippleNet and KGAT, which performed better in the above experiments, are selected for comparison with the proposed HN-DKG model.
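The sparsification step can be sketched as below. The text does not say how the deleted two-thirds were chosen, so uniform random sampling with a fixed seed is an assumption, as are the function name and the list-of-records input.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sparsify(ratings: Sequence[T], drop_fraction: float = 2 / 3, seed: int = 0) -> List[T]:
    """Return a random subset that keeps (1 - drop_fraction) of the rating records."""
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    rng = random.Random(seed)                           # fixed seed for reproducibility
    keep_count = round(len(ratings) * (1.0 - drop_fraction))
    kept_idx = sorted(rng.sample(range(len(ratings)), keep_count))
    return [ratings[i] for i in kept_idx]               # preserve the original record order
```

Sampling per user instead (so that every user keeps at least one rating) would be a reasonable variant if isolated users must be avoided.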
Figure 8 compares the performance of the three models under extremely sparse data. The model is iterated on the MovieLens-1M dataset to obtain more accurate weights. HN-DKG outperforms the other two methods on all three metrics.
The effects of the number of training iterations on the loss function and accuracy (acc) are compared experimentally. Figure 9 shows how the loss and accuracy change as the number of iterations increases. Both values lie in the range (0, 1). The lower the loss, the smaller the model's error and the better it performs; the larger the acc value, the higher the accuracy and performance of the model.

Ablation Experiment.
To evaluate the importance of the main innovations of the proposed algorithm, we designed ablation experiments that consider the virtual nodes, the time warehouse mechanism, and the user clustering factor. The results of the ablation studies are reported in Table 6.
Table 2 (partial): KGAT: Recursively propagates embeddings from a node's neighbors (which can be users, items, or attributes) to refine the node's embedding, and uses an attention mechanism to distinguish the importance of neighbors. NFM, He and Chua [56]: Combines the linearity of FM in modelling second-order feature interactions with the nonlinearity of neural networks in modelling higher-order feature interactions.
To test the effect of virtual nodes on the effectiveness of the approach, the virtual nodes are removed, forming the variant HN-DKG-drop1. The smaller MovieLens-1M dataset is selected, and the complete HN-DKG model is compared with HN-DKG-drop1 to test the impact of virtual nodes on the recommendations. The results of HN-DKG-drop1 are shown in Table 6 alongside those of HN-DKG.
To test the effect of the time warehouse mechanism on system effectiveness, the time warehouse is removed, forming the variant HN-DKG-drop2. The smaller MovieLens-1M dataset is used for this part of the experiment, and the complete HN-DKG model is compared with HN-DKG-drop2 to test the impact of the time warehouse mechanism on the recommendation results. The results of HN-DKG-drop2 are shown in Table 6 alongside those of HN-DKG.
To test the effect of clustering on the recommender system, the clustering factors are removed, forming the variant HN-DKG-drop3. The smaller MovieLens-1M dataset is again used, and the complete HN-DKG model is compared with HN-DKG-drop3 to test the impact of clustering on the recommendation results. During data preprocessing, a density-based clustering algorithm can be used to group the data according to weighted reference factors. The results of HN-DKG-drop3 are shown in Table 6 alongside those of HN-DKG.
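Since the density-based clustering algorithm is not named, the sketch below uses DBSCAN as an assumed example. It is a minimal NumPy version intended for small inputs (it builds the full pairwise distance matrix); `eps` and `min_samples` are DBSCAN's usual parameters, `min_samples` counts the point itself, and the weighting of reference factors is assumed to be already applied to the input feature vectors.

```python
import numpy as np


def dbscan(points: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Minimal DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]  # includes the point itself
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue                          # not a core point: noise unless reached later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster           # border or core point joins this cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])    # expand only through core points
        cluster += 1
    return labels
```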

Experimental Results.
When determining the number of heads K for multihead attention, a comparison of RMSE and MAE in Figure 6 shows that performance is best when K = 4, while the computation time remains relatively short, resulting in lower computational costs [59]. As K increases, the model's performance first reaches an optimum and then gradually declines. This indicates that adding attention heads initially helps the model mine user features and improves its performance; however, with too many heads, performance decreases due to overfitting and the computation time grows. Therefore, K = 4 was used in subsequent experiments.
From Table 4, it can be seen that the proposed HN-DKG has higher accuracy than the other algorithms, but slightly lower recall than some models. Looking at the results as a whole, the RippleNet model has a relatively high recall but slightly lower accuracy than the other models. This is consistent with the empirical observation that high accuracy and high recall are difficult to achieve simultaneously in recommendation models, so finding a balance between the two is the best way to optimize a model. The proposed model achieves a good balance between accuracy and recall, as reflected in its F1 score, which verifies the effectiveness of the proposed method and also improves accuracy to some degree. According to Table 5, the proposed model has the highest recall on the Amazon Book dataset, but its advantage in accuracy is less pronounced. This may be because the dataset is relatively large and the improvement is therefore less significant; nevertheless, the model retains an advantage in F1, achieving a relatively optimal balance. The NDCG@10 and AUC values in Tables 4 and 5 show that HN-DKG performs slightly better than the other models, which further supports the effectiveness of the proposed method.
According to Figure 8, on the MBdata dataset, the accuracy of the proposed model is 0.05% higher than that of RippleNet, which performs well, and 0.1% higher than that of EHCF, which performs similarly. At the same time, it achieves the best F1 among similar algorithms. This shows that the proposed method can produce more accurate recommendations by extracting effective auxiliary information from multi-platform interaction data for user/item completion and knowledge enhancement. HN-DKG thus has slight advantages in accuracy and recall and achieves a better balance in F1, indicating that the proposed method is effective in dealing with cold start and sparse data problems.
It can be seen from Figure 9 that, as the number of training iterations increases, the loss gradually decreases and the accuracy continuously improves. The loss function

Diversity Analysis.
To evaluate the diversity of the recommendation results, the attribute distribution of the recommended resources is first analyzed, and then long-tail resources with a low matching degree are recommended. The diversity measure of the Top-N strategy [60] is adopted to evaluate the diversity of the recommendation results, that is, diversity is calculated according to whether the attribute distribution of the resources is balanced. Since a diversity function based on item features can be seen as the complement of a similarity measure [61], we use the dispersion of the attribute distribution as the diversity evaluation index. Diversity is calculated as follows: where R is the set of resources recommended for user U, |R| is the number of resources in R, and div(l_i, l_j) is the complement of sim(l_i, l_j). Similarity is computed with cosine similarity. In general, improving diversity comes at the cost of reduced recommendation accuracy, so we compare both the diversity and the accuracy of the recommended results.
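A common way to realize this index is intra-list diversity: the average of div(l_i, l_j) = 1 − sim(l_i, l_j) over all ordered pairs i ≠ j in R, with cosine similarity on attribute vectors. The sketch below assumes that normalization, since the exact formula is not reproduced here; the function name and the attribute-vector input are hypothetical.

```python
import numpy as np


def intra_list_diversity(item_vectors) -> float:
    """Average pairwise dissimilarity (1 - cosine similarity) over a recommendation list.

    item_vectors: (|R|, F) attribute vectors of the recommended items (nonzero rows).
    """
    x = np.asarray(item_vectors, dtype=float)
    k = len(x)
    if k < 2:
        return 0.0                                # diversity is undefined for a single item
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T                           # cosine similarity matrix
    off_diag = ~np.eye(k, dtype=bool)             # exclude i == j pairs
    return float((1.0 - sim)[off_diag].sum() / (k * (k - 1)))
```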
Figure 10 shows the relationship between the accuracy and diversity of the recommendation results. The curves of EHCF, CKE, KGAT, and NFM in Figure 10 show that diversity decreases as accuracy improves. Compared with RippleNet, the diversity of HN-DKG declines more slowly and remains above 0.40. Therefore, as accuracy improves, the diversity loss of HN-DKG is not very significant. The bottom-up recommendation strategy helps to improve the diversity of the recommendation results, while enhancing the relationships among users' multi-platform data preserves their accuracy.
The above experiments show that applying the multihead attention-based GAT model and the improved RippleNet model is effective in improving the recommender system. Because the dataset size and the degree of training differ, the resulting weight files differ, which has some impact on accuracy and recall. Nevertheless, the model has a certain advantage in F1, which combines accuracy and recall. The balance between the two metrics is relatively stable, which improves the overall performance of the recommendation approach.

Analysis of Ablation Experiments.
As shown in Table 6, compared with the original HN-DKG, the evaluation metrics of HN-DKG-drop1 on both training and validation decreased, with an average decrease of 2.57% in accuracy and 5.49 in AUC. This shows that adding the virtual node factor to the recommender system improves system performance, and that the design and implementation of HN-DKG are effective.
Similarly, as shown in Table 6, compared with the original algorithm HN-DKG, it is found that the
This demonstrates that the time warehouse mechanism and the seed clustering method are useful for improving the performance of the recommender system. The comparative experiments on two datasets demonstrate the effectiveness of HN-DKG in improving recommendation performance, and the ablation experiments demonstrate the effectiveness of the virtual nodes, the time warehouse mechanism, and the node clustering factors. Therefore, it can be concluded that the proposed recommendation method based on multi-platform heterogeneous networks and dynamic knowledge graphs takes the relevant factors into account and significantly improves the accuracy and diversity of the results.

Conclusions and Future Work
Data sparsity, cold start, and data bias are the main factors affecting the performance of recommender systems. To improve the accuracy and diversity of recommendations, we propose a recommendation approach based on heterogeneous networks and dynamic knowledge graphs. First, heterogeneous networks are built from cross-domain and multi-platform data to mine users' implicit preferences. After knowledge graphs are extracted from the multimodal heterogeneous networks, a time warehouse mechanism is established. The GAT component is used to extract user preference features and calculate four types of attention weights. The additional relationships increase the diversity of the recommendation results and enhance the interaction relationships. The RippleNet algorithm is improved with mechanisms such as excellent seed clustering, random seeds, and propagation blocking, which improve accuracy, increase the diversity of the recommendations, and reduce the complexity of the algorithm.
The experimental results show that the proposed HN-DKG has the following characteristics: (1) The basic knowledge graph is established through multimodal heterogeneous networks, which reduces the impact of selection bias and helps mine users' potential preferences. (2) The GAT component calculates real-time multitask attention weights, extracts users' short-term preferences, enhances feature relationships, and makes the recommendations timely. The proposed HN-DKG can be applied in many recommendation-related scenarios, such as news feeds, one-stop tourism and dining, public opinion guidance, financial investment, and social circle participation. In future work, we will study more types of bias problems and continue to investigate how to mine users' social behavior and potential preference characteristics from their complex behavior across multiple platforms. In addition, combining dynamic knowledge graphs with deep learning algorithms is key to improving recommender systems.

Figure 2: Multimodal heterogeneous network relationship diagram based on multiple platforms.

Figure 4: Attention structure diagram constructed from four types of node relationships.

Figure 9: Experimental comparison of the effect of the number of iterations on the loss function and accuracy. (a) Comparison of loss. (b) Comparison of acc.

Figure 10: Relationship between accuracy and diversity of recommendation results under different models.
$R = \{r_1, r_2, \cdots, r_l\}$ represents the different types of relationships. Since there are multimodal nodes, the relationships between nodes are of many types. There are l types of relationships in set R, including users clicking on items, users interacting with other users, and items relating to other items. Among these l types, one part consists of the access or subordinate relationships between users and items based on explicit historical records, which are called inherent relationships. The other part consists of the relationships between user nodes and virtual nodes, which are called virtual relationships. Virtual relationships are shown by dotted lines in Figure 2. The relationships marked with question marks in Figure 2 are the ones to be predicted.
Definition 3. Relationship weight function. RW denotes the relationship weight function between users. The weight function of relationship $r_i$ is denoted $RW_{r_i}$. The specific relationship weight of relationship $r_i$ is calculated as follows.
$Ex_v$ denotes the items of interest that the user has interacted with historically. $Im_v$ denotes implicit items that the user interacts with indirectly through meta-paths. $AF_{u-v}$ is an aggregation function that combines the explicit items of interest that the user has directly interacted with and the implicit items that the user has indirectly interacted with through meta-paths. b is the neural network offset (bias). W represents the neural network weights, which can be obtained by iterative training.
Figure 3: Embedding variable time warehouse into a multihead attention network layer.
relationships are designed. There are two types of additional relationships. One is random recommendation to all users based on hot topics in the time warehouse, and the other is random recommendation of items that have no connection with the user's preferences. The nodes recommended to users through additional relationships are called additional nodes.
$u \cdots v$ is the aggregation function of users and additional nodes. b is the neural network offset (bias). W represents the neural network weights, which can be obtained by iterative training. $Vi_v$ represents the features of the additional nodes; their number can be capped by a threshold chosen according to the size of the dataset. $d_{ia}^{TI_a}$ represents the embedding of the user and additional nodes at time $TI_a$. The attention coefficient of additional nodes, $\varphi_{it}$ represents the collection of excellent seed clusters and random seed clusters.
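The two kinds of additional relationships can be sketched as a sampling step. Everything here is an assumption for illustration: the function and argument names are hypothetical, "no connection with the user's preferences" is approximated as items the user has not interacted with, and the `max_nodes` threshold is split evenly between hot-topic and unrelated items.

```python
import random
from typing import Hashable, Iterable, List


def sample_additional_nodes(
    user_items: Iterable[Hashable],
    hot_items: Iterable[Hashable],
    all_items: Iterable[Hashable],
    max_nodes: int,
    seed: int = 0,
) -> List[Hashable]:
    """Pick additional nodes for one user, capped at max_nodes (hypothetical helper)."""
    rng = random.Random(seed)
    seen = set(user_items)
    hot_list = list(hot_items)
    hot_set = set(hot_list)
    # Type 1: hot-topic items from the current time-warehouse window, not yet seen by the user
    hot = [i for i in hot_list if i not in seen]
    # Type 2: items without interaction with the user (and not already hot)
    unrelated = [i for i in all_items if i not in seen and i not in hot_set]
    n_hot = min(len(hot), max_nodes // 2)
    picked = rng.sample(hot, n_hot)
    n_rand = min(len(unrelated), max_nodes - n_hot)
    picked += rng.sample(unrelated, n_rand)
    return picked
```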

Table 2: Baseline methods used in the experiment.
than that of prediction data. However, when the number of training iterations is 100, both the loss and acc are abnormal. This may be because, with so few training iterations, the computed weight file is biased; the recommendation model therefore needs more training iterations to obtain a stable and accurate weight file.

Table 5: Experimental results on Amazon Book dataset.

Table 6: Ablation study of the key components of HN-DKG.