Node Importance Estimation with Multiview Contrastive Representation Learning

Node importance estimation is a fundamental task in graph analysis, which can be applied to various downstream applications such as recommendation and resource allocation. However, existing studies merely work under a single view, which neglects the rich information hidden in other aspects of the graph. Hence, in this work, we propose a Multiview Contrastive Representation Learning (MCRL) model to obtain representations of nodes from multiple perspectives and then infer the node importance. Specifically, we are the first to apply the contrastive learning technique to the node importance analysis task, which enhances the expressiveness of graph representations and lays the foundation for importance estimation. Moreover, based on the improved representations, we generate the entity importance score by attentively aggregating the scores from two different views, i.e., the node view and the node-edge interaction view. We conduct extensive experiments on real-world datasets, and the experimental results show that MCRL outperforms existing methods on all evaluation metrics.


Introduction
Knowledge graphs (KGs) are graph-based data structures consisting of nodes and edges [1,2]. Each node represents an "entity" and each edge represents a "relation" between two connected entities. In recent years, a lot of research has been devoted to solving problems related to graphs [3][4][5][6]. Estimating the importance of each node in a graph, network, or KG is a fundamental and crucial task, which is beneficial to many downstream applications, such as question answering, recommendation, web searching, and resource allocation [7][8][9][10].
For instance, Figure 1 shows an example of a movie knowledge graph, where "Suicide Squad" is a movie node, with the author node "David Ayer" and the actor node "Jared Leto" connecting to it via the edges "wrote" and "starred-in." Each node is also associated with texts such as biographies or movie plots. Before the two movies "Training Day" and "Suicide Squad" are released, we may use the information in the KG to estimate the potential popularity of these two films. As can be observed from the figure, the movie node "Training Day" might become more popular, since it stars more popular actors and is directed by the director with higher recognition in terms of professional reviews.
Thus, a number of works have attempted to estimate the importance of nodes in a graph, which can be divided into two main categories. The first category includes classical methods such as PageRank [7] and Personalized PageRank [11]. PageRank was originally designed for estimating the importance of websites. It assumes that more important nodes receive more links from other nodes. By counting the number and quality of edges linked to the nodes, the importance of the nodes can be roughly estimated, but this algorithm only considers the graph structure. Personalized PageRank improves PageRank by taking into account the user's estimation of the importance of the nodes in the graph, but it neglects the type of edges. In summary, this category of approaches cannot perform well when estimating the importance of nodes in large-scale complicated graphs, since they merely take into account the graph's topology while overlooking the substantial amount of semantic and latent structural information encoded in the graph.
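To make the "number and quality of incoming edges" idea concrete, the following is a minimal power-iteration sketch of PageRank in plain Python. The function name, the damping factor of 0.85, the uniform handling of dangling nodes, and the toy graph are illustrative assumptions, not details taken from this paper.

```python
def pagerank(edges, num_nodes, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    out_links = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        out_links[src].append(dst)

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        # Teleportation share that every node receives.
        new_rank = [(1.0 - damping) / num_nodes] * num_nodes
        for src, targets in enumerate(out_links):
            if targets:
                # A node splits its rank evenly among the nodes it links to.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node (assumption): spread its rank uniformly.
                share = damping * rank[src] / num_nodes
                for dst in range(num_nodes):
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Toy graph: nodes 1, 2, and 3 link to node 0; node 0 links to node 1.
toy_edges = [(1, 0), (2, 0), (3, 0), (0, 1)]
scores = pagerank(toy_edges, num_nodes=4)
```

In this toy graph, node 0 receives the most links and ends up with the highest score, while the edge labels and node texts play no role, which is exactly the limitation discussed above.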
Another category is machine learning-based strategies, such as GENI [12] and RGTN [13]. GENI acquires the node features through node2vec and then converts them to importance scores. The scores are flexibly aggregated via a predicate-aware attention method. Additionally, a centrality adjustment module is used at the end for fine-tuning. RGTN considers both structural and semantic information based on representation learning and then uses a common attention fusion mechanism to let structural features interact with semantic features; after that, these features are projected into importance values separately and then aggregated with attention weights to produce the final importance scores of nodes. These trainable methods outperform the traditional solutions owing to the advanced supervised learning framework and the flexible graph attention mechanism.
Nevertheless, there are still notable issues with current works: (1) To learn the graph representations, existing efforts usually adopt a single graph learning model to capture the structural information and generate the embeddings, which can be insufficient for accurately estimating the node importance.
(2) To aggregate the embeddings and produce the scores, existing efforts directly combine the node scores and edge embeddings, which might fail to make full use of the interaction between node and edge representations.
To tackle the aforementioned issues, we propose a Multiview Contrastive Representation Learning (MCRL) model to generate entity representations from multiple perspectives, which can help make more accurate entity importance estimation. Specifically, we adopt two graph encoders to characterize the entity representations in different views and perform cross-view contrasting to extract more useful signals using the contrastive learning strategy. Then, given the learned graph embeddings, we estimate the entity scores from two views: one based purely on the entity embeddings and one based on the interaction between entity and relation embeddings. Finally, the multiview scores are integrated using the attention mechanism. Comprehensive experiments on real-world knowledge graph datasets validate that MCRL can outperform existing methods in terms of all metrics.

Contribution.
The main contributions are summarized as follows: (1) We devise a multiview contrastive learning strategy to estimate entity importance, where the graph representations are first learned in the two views separately and then forwarded to the cross-view contrasting module to further enhance the expressiveness. (2) Based on the graph embeddings, we generate the entity importance score by attentively aggregating the scores in two views: one merely considering the entity embeddings and one modeling the interactions between entity and relation embeddings. (3) We conduct extensive experiments on real-world public datasets, and the results demonstrate that MCRL outperforms the baselines in all aspects.

1.2. Organization. The rest of this paper is organized as follows. Section 2 gives an overview of the literature that is relevant to this work. Section 3 provides the definitions of key concepts and the problem formulation. Section 4 elaborates the node importance estimation model. Section 5 reports and discusses the evaluation results on the mainstream node importance estimation experimental settings. Section 6 concludes the paper and provides future directions.

Figure 1: An example of a movie knowledge graph. Different colored boxes represent different types of nodes (e.g., movie and director), and different edge types represent different types of relations (e.g., wrote and starred-in). The numbers connected to the movies are the known number of votes (which can be used as a source of importance scores). The "plot" and "biography" are descriptions of the nodes.

Related Work
This section provides an overview of the literature related to this work, including node importance estimation methods, graph neural network methods, contrastive learning methods, and data augmentation methods.
2.1. Node Importance Estimation. There are numerous ways to estimate node importance. PageRank (PR) [7] is a random walk model that propagates the importance of each node by traversing the graph structure or transmitting it to random nodes with a fixed probability. Personalized PageRank (PPR) [11] adjusts node weights or edge weights to bias the random walk by considering specific topics. Recently, several works have begun to explore supervised machine learning algorithms with the development of deep learning on graph data. In addition to employing the random walk model, HAR [14] distinguishes between different types of predicates in KGs while being aware of importance scores to make better use of the rich information contained in KGs. GENI [12] is the earliest work to apply GNNs to node importance estimation, which classifies the neighbors of each node based on the type of edges and aggregates the importance values of neighboring nodes. Additionally, it adjusts the node importance in accordance with the nodes' degree of centrality. RGTN [13] proposes a representation learning-based framework that utilizes both graph topology and node semantic information and aggregates them via an attention mechanism, which in turn infers node importance scores.
Notably, unlike previous works that use a single graph learning model to extract the structural information and generate the embeddings, which may not be adequate for estimating the node importance precisely, in this work, the graph representations are initially learned in the two views independently before being submitted to the cross-view contrasting module to further improve expressiveness. Besides, existing studies directly combine the node scores and edge embeddings, which may not fully exploit the interaction between node and edge representations; in this work, we calculate the attention scores between nodes using the representations of nodes and edges, and we then combine the scores to predict the node importance scores while fully accounting for the information contained in nodes and edges.
There are also some research works performed on heterogeneous graphs. MultiImport [15] is an end-to-end framework that integrates information from both the KG and external signals, while dealing with challenges arising from the simultaneous use of multiple input signals, such as inferring node importance from sparse signals and potential conflicts among them. HIVEN [16] traces the local information of each node, employs the meta schema to alleviate the problem of node type dominance, and exploits the node similarity within each node type to overcome the limitation of GNN models in capturing global information. Taking the movie dataset as an example, our model requires only one type of label, the popularity of the movie, and ranks only the importance of the movie nodes. In contrast, the works on heterogeneous graphs also take as input information about other types of nodes, such as the director's box office.

Graph Neural Networks.
Graph neural networks (GNNs) apply deep learning ideas to graph data, and these methods have attracted great research attention in recent years [17][18][19]. The pioneering work of GNN is the graph convolution model GCN [20], which performs convolution in the Fourier domain by aggregating neighbor node features and has performed well in many applications. However, GCN training needs to use the adjacency matrix of the whole graph, which depends on the specific graph structure, so GraphSAGE [21] is proposed to solve this problem. GraphSAGE uses multiple layers of aggregation functions, and each layer aggregates the information of nodes and their neighbors to get the feature vector of the next layer, which uses the neighborhood information of nodes and does not depend on the global graph structure. In addition, GCN treats all neighboring nodes equally in convolution and cannot assign different weights to nodes according to their importance; graph attention networks (GATs) [22] are proposed to solve this problem. GATs adaptively aggregate neighboring information based on the attention mechanism and can assign different weights to different nodes, and they provide an efficient framework for integrating deep learning into graph mining. These GNN works have been widely used in recommender systems [23], knowledge graph inference [24], and graph classification [25].

Contrastive Learning on Graphs.
Contrastive learning (CL) has recently become recognized as an effective method for learning self-supervised graph representations [26][27][28][29]. CL can produce data representations by learning to encode similarities or dissimilarities between a set of unlabeled samples. Rich unlabeled data are used as a supervisory signal for model training. Since there are typically few labeled entities in knowledge graphs, in this work, we employ contrastive learning and make use of a large number of unlabeled entities to better obtain feature representations of nodes and get more precise node importance scores.

Problem Formulation
In this section, we provide the definitions of key concepts and introduce the formalization of the problem studied in our work.

Knowledge Graph.
A knowledge graph is a graph G = (V, E, P) that represents a network of real-world entities and illustrates the relationship between them, where V, E, and P represent the entities, relationships, and predicates, respectively. In the knowledge graph, there might be several different types of predicates between two entities, and hence each edge is linked to a particular predicate through a mapping function: E ⟶ P.

Node Importance.
An entity's importance or popularity in a knowledge graph is indicated by the node importance s ∈ R, which is a nonnegative real value.

Semantic Information of Nodes.
The semantic information of a node is the natural language text that provides a comprehensive description of the entity or the concept represented by the node.

Problem Definition.
Given a knowledge graph G = (V, E, P), a set of semantic information of nodes T, and importance scores S for a subset of nodes V_s ⊂ V, entity importance estimation aims to learn a function f: V ⟶ [0, ∞) that generates the importance score for each node in the knowledge graph.

Approach
In this section, we first describe the outline of the proposed model. Then, we introduce the details of the two components and training. Table 1 provides the definition of symbols used in this paper. As shown in Figure 2, the features of entities are first forwarded to the self-supervised contrastive learning module to generate node embeddings. By adjusting the encoders' hidden-size and out-size parameters, we produce a high-dimensional embedding and a low-dimensional embedding for each node, respectively. Then, the high-dimensional embeddings are mapped directly to the importance score (i.e., score 1), and the low-dimensional embeddings are concatenated with the edge features, which are mapped to another importance score (i.e., score 2) using the attention mechanism. The reason for this is that a node embedding with high dimensionality can preserve more information and is better suited for direct mapping to predict the importance score, whereas a node embedding with low dimensionality can be better combined with the edge embedding to get the attention weight between two nodes, after which the scores are aggregated to make the prediction. The scores from the two perspectives are combined to obtain the final predicted node importance score. Finally, we train the entire model by aggregating the self-supervised contrastive loss, the supervised root mean square error (RMSE) loss, and the learning-to-rank (LTR) loss.

Multiview Contrastive Learning.
In this work, we choose two popular GNN models as the encoders to produce the graph representations in different views.

GCN.
The first one is the graph convolutional network (GCN) [20]. The GCN model utilizes multiple convolutional layers to conduct the information passing by aggregating the features of nodes and their first-order neighborhoods. Given two message passing layers, GCN maps the input features to the output embeddings through the following quantities: X is the input feature matrix of the nodes, Â = A + I ∈ R^{N×N} is the scaled adjacency matrix of the graph G with added self-loops, W_0 and W_1 are trainable weight matrices, σ is the activation function, and E is the output node embedding matrix.
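For reference, the two-layer propagation described above is commonly written as follows. This is a sketch of the standard GCN form from [20] in the symbols of this section, not a formula recovered from this paper; it assumes that the scaling of Â is the usual symmetric degree normalization, with D̃ denoting the diagonal degree matrix of A + I.

```latex
E = \hat{A}\,\sigma\!\left(\hat{A}\, X\, W_0\right) W_1,
\qquad
\hat{A} = \tilde{D}^{-1/2}\,(A + I)\,\tilde{D}^{-1/2},
\qquad
\tilde{D}_{ii} = \sum_{j} (A + I)_{ij}.
```

Each multiplication by Â mixes a node's features with those of its first-order neighbors, so two layers let information travel over two hops.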

GAT.
The other model is the graph attention network (GAT) [22], which assigns attention weight coefficients to the neighboring nodes of the target node and uses a local aggregation function to generate node embeddings. Given the features of nodes, the influence of node j on node i is computed as an attention weight α_ij, where e(·) represents the feature of the node, || is a concatenation operator, σ is the activation function, N(i) represents the neighboring nodes of node i, v is a learnable weight vector, and W is a trainable weight matrix. Following the acquisition of the attention weights, GAT aggregates the feature representations of the node and its neighbors. The L-layer GAT aggregation with multihead attention starts from e_i^0, the input feature vector of node i in the graph.
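The attention step can be illustrated with a small single-head, single-layer sketch in plain Python. It follows the standard GAT recipe (transform with W, concatenate, project with v, apply the activation, normalize with a softmax over N(i)); the LeakyReLU activation, the omission of multihead attention and self-loops, and the toy values are assumptions made for illustration.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def gat_attention(features, neighbors, i, W, v):
    """Attention weights alpha_ij of node i over its neighbors N(i)."""
    h_i = matvec(W, features[i])
    logits = {}
    for j in neighbors[i]:
        h_j = matvec(W, features[j])
        concat = h_i + h_j  # list concatenation plays the role of ||
        logits[j] = leaky_relu(sum(a * b for a, b in zip(v, concat)))
    # Numerically stable softmax over the neighborhood.
    top = max(logits.values())
    exps = {j: math.exp(z - top) for j, z in logits.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def gat_aggregate(features, neighbors, i, W, v):
    """Attention-weighted sum of the transformed neighbor features."""
    alpha = gat_attention(features, neighbors, i, W, v)
    out = [0.0] * len(W)
    for j, a in alpha.items():
        h_j = matvec(W, features[j])
        out = [o + a * h for o, h in zip(out, h_j)]
    return out


# Toy example: node 0 attends to nodes 1 and 2.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity transform for readability
v = [0.5, 0.5, 1.0, -1.0]      # hypothetical attention vector
alpha = gat_attention(features, neighbors, 0, W, v)
embedding = gat_aggregate(features, neighbors, 0, W, v)
```

A multihead variant would repeat this computation with separate W and v per head and concatenate or average the resulting vectors.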

Cross-View Contrastive Learning.
After obtaining the representations in the two views, we use the cross-view contrastive learning strategy to help learn more expressive graph representations.
Given a node i, we denote its embedding generated by the first view as e_φ(i) and the embedding generated by the second view as e_ψ(i). These two embeddings form a positive sample. The pairs of embeddings consisting of e_φ(i) (or e_ψ(i)) and another node's embedding are the negative samples. The contrastive objective of node i is defined over these positive and negative pairs, where f is a score function that measures the similarity between two embeddings. Specifically, the two embeddings are first transformed using a multilayer perceptron (MLP) with nonlinear activation functions, and then the similarity between the transformed embeddings is evaluated by using the similarity metric. n is the number of nodes in the graph, and 1[·] is an indicator function that returns 1 if the argument included in the bracket holds true and 0 otherwise. In the denominator of the objective, the first term is the positive sample, the term N_cross refers to the cross-view negative samples, and N_intra represents the intraview negative samples. Finally, the overall self-supervised loss is obtained from the contrastive objectives of all n nodes.
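The structure of this objective (one positive pair in the numerator, and the positive pair plus cross-view and intraview negatives in the denominator) can be sketched as follows. This is an illustrative InfoNCE-style loss, not the paper's exact formula: the cosine similarity, the temperature `tau`, the symmetric averaging over both views, and the omission of the MLP projection are all assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def node_contrastive_loss(i, anchor_view, other_view, tau=0.5):
    """Contrastive objective of node i, with anchor_view as the anchor."""
    def f(u, v):
        # Score function: exponentiated, temperature-scaled similarity.
        return math.exp(cosine(u, v) / tau)

    n = len(anchor_view)
    positive = f(anchor_view[i], other_view[i])
    # Cross-view negatives: other nodes' embeddings in the other view.
    cross = sum(f(anchor_view[i], other_view[k]) for k in range(n) if k != i)
    # Intraview negatives: other nodes' embeddings in the same view.
    intra = sum(f(anchor_view[i], anchor_view[k]) for k in range(n) if k != i)
    return -math.log(positive / (positive + cross + intra))


def contrastive_loss(view_phi, view_psi, tau=0.5):
    """Average the objective over all nodes and both anchor directions."""
    n = len(view_phi)
    total = 0.0
    for i in range(n):
        total += node_contrastive_loss(i, view_phi, view_psi, tau)
        total += node_contrastive_loss(i, view_psi, view_phi, tau)
    return total / (2 * n)


# Toy embeddings for three nodes in two views.
phi = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
psi_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
psi_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
loss_aligned = contrastive_loss(phi, psi_aligned)
loss_shuffled = contrastive_loss(phi, psi_shuffled)
```

When the two views agree on every node, the loss is lower than when the embeddings of one view are shuffled, which is the signal the encoders are trained to exploit.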

Multiview Score Aggregation.
We devise a multiview strategy to produce and aggregate the entity importance scores.

Node View.
As the two encoders obtain the high-dimensional embeddings of each node i in the graph, i.e., e_φ(i) and e_ψ(i), respectively, we add the two embeddings and generate the node importance score of the node view, s_1(i) = F(e_φ(i) + e_ψ(i)), where F represents a fully connected neural network in our experiments.

Figure 2: The framework of our proposed MCRL.

Table 1: Symbols used in this paper.
Symbol              Definition
σ                   Activation function
e(·)                Feature of the node or edge
N(i)                Neighboring nodes of node i
f                   Score function that measures the similarity between two embeddings
1[·]                Indicator function
N_cross             Cross-view negative samples
e_φ(i), e_ψ(i)      Embeddings of node i generated by the first view and the second view
α_ij                Attention weight of node i to node j
s_1(i), s_2(i)      Scores of node i from the node view and the node-edge interaction view
s(i)                Valid ground truth importance value of node i
s*(i)               Predicted score of node i

International Journal of Intelligent Systems

Node-Edge Interaction View.
The node view merely focuses on the features of nodes. However, edge features also contain a wealth of information and play an essential role in estimating node importance. Thus, we concatenate the node and edge vectors to better model their interactions. Specifically, we use the attention mechanism, and the attention weight of node i to node j, α_ij, is calculated from the concatenated node and edge features, where e(·) represents the feature of the node (or edge), p_ij denotes the predicate between nodes i and j, || is a concatenation operator, σ is the activation function, N(i) represents the neighboring nodes of node i, and v is a learnable weight vector. We first convert the low-dimensional embeddings to scores, and the scores are then aggregated using the attention weights to obtain the score of the node-edge interaction view, s_2(i).
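A minimal sketch of this view is given below. It assumes that the attention logit is the activation of v applied to the concatenation e(i) || e(p_ij) || e(j), that weights are normalized with a softmax over N(i), and that neighbor embeddings are converted to scores by a linear map `w_score`; these choices, the helper names, and the toy values are hypothetical and only illustrate the mechanism.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def softmax(values):
    top = max(values)
    exps = [math.exp(z - top) for z in values]
    total = sum(exps)
    return [e / total for e in exps]


def interaction_view_score(i, node_emb, edge_emb, neighbors, v, w_score):
    """Score of node i from the node-edge interaction view (illustrative)."""
    logits, neighbor_scores = [], []
    for j, predicate in neighbors[i]:
        # Concatenate node i, the predicate p_ij, and node j.
        concat = node_emb[i] + edge_emb[predicate] + node_emb[j]
        logits.append(leaky_relu(dot(v, concat)))
        # Convert the low-dimensional embedding of neighbor j to a score.
        neighbor_scores.append(dot(w_score, node_emb[j]))
    alpha = softmax(logits)
    return sum(a * s for a, s in zip(alpha, neighbor_scores))


# Toy movie node 0 with a writer (node 1) and an actor (node 2).
node_emb = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [0.0, 4.0]}
edge_emb = {"wrote": [1.0], "starred-in": [0.0]}
neighbors = {0: [(1, "wrote"), (2, "starred-in")]}
w_score = [1.0, 1.0]

# With a zero attention vector, both neighbors get equal weight.
uniform = interaction_view_score(0, node_emb, edge_emb, neighbors,
                                 [0.0] * 5, w_score)
# A vector that attends to the predicate feature favors the "wrote" edge.
predicate_aware = interaction_view_score(0, node_emb, edge_emb, neighbors,
                                         [0.0, 0.0, 5.0, 0.0, 0.0], w_score)
```

Because the predicate embedding enters the attention logit directly, two neighbors with similar node features can still receive different weights when they are connected through different relations.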

Score Aggregation.
The final predicted scores s*(i) are formed by integrating the scores from the node view and the node-edge interaction view, where a is the hyperparameter that weights the two scores.
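One simple form consistent with this description is a convex combination of the two view scores. This is an assumed form given for illustration only; the source states that a weights the two scores but the exact expression is not reproduced here.

```latex
s^{*}(i) = a \, s_{1}(i) + (1 - a) \, s_{2}(i), \qquad a \in [0, 1].
```

Under this assumption, a = 1 reduces the model to the node view alone and a = 0 to the node-edge interaction view alone.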

Training.
We select the root mean square error (RMSE) and the learning-to-rank (LTR) loss as supervised loss functions. First, we establish the node set V_s using nodes with known importance ratings. RMSE measures the error between the predicted and labeled importance scores of the nodes in V_s, where s(i) is the valid ground truth importance value of node i and s*(i) represents the predicted score. In order to take the entire graph into account while rating the nodes' importance, we also use the learning-to-rank (LTR) loss in the training process: we sample n nodes for node i to form a node set N_i^n, over which the ranking loss is computed. By combining the supervised loss functions with the self-supervised contrastive loss function, we obtain the total loss function for model training.
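The following sketch shows how the three terms could be computed and combined. The RMSE follows its usual definition; the LTR term is written as a listwise cross-entropy between the softmax of the ground truth scores and the softmax of the predicted scores over a sampled node set, which is a common choice but an assumption here; the weights `beta` and `gamma` in the total loss are hypothetical.

```python
import math


def rmse_loss(pred, truth):
    """Root mean square error over the labeled nodes."""
    squared = sum((p - t) ** 2 for p, t in zip(pred, truth))
    return math.sqrt(squared / len(pred))


def softmax(values):
    top = max(values)
    exps = [math.exp(z - top) for z in values]
    total = sum(exps)
    return [e / total for e in exps]


def ltr_loss(pred, truth):
    """Listwise ranking loss over one sampled node set (assumed form)."""
    p_true = softmax(truth)
    p_pred = softmax(pred)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def total_loss(rmse, ltr, contrastive, beta=1.0, gamma=1.0):
    """Weighted sum of the supervised and self-supervised terms."""
    return rmse + beta * ltr + gamma * contrastive


truth = [3.0, 2.0, 1.0]
good_pred = [3.0, 2.0, 1.0]   # same ordering as the ground truth
bad_pred = [1.0, 2.0, 3.0]    # reversed ordering
good_ltr = ltr_loss(good_pred, truth)
bad_ltr = ltr_loss(bad_pred, truth)
loss = total_loss(rmse_loss(good_pred, truth), good_ltr, contrastive=0.3)
```

The RMSE term fits the absolute importance values, while the ranking term penalizes predictions whose relative order within the sampled set disagrees with the labels.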

Experiments
In this section, we conduct extensive experiments on real-world datasets to answer the following questions: (1) Does MCRL work better than the existing baselines and previous models? Are the contrastive learning and score aggregation modules useful? (2) Is MCRL generally valid for different encoders? Is it sensitive to hyperparameters?
We describe detailed information about the dataset and baseline in Section 5.1, answer the above questions in Sections 5.2 to 5.4, and perform a case study in Section 5.5.

Datasets.
Following previous works, we conduct comprehensive experiments on three public knowledge graphs with different features. More details can be found in Table 2.
FB15K [30] is a subset of the Freebase [31] database which contains knowledge base relation triples and textual mentions of entity pairs. The 30-day view count of the corresponding Wikipedia page is utilized as the node importance score for each entity in the graph, and the description of the entity in Wikidata is used as the node semantic information. Compared to the rest of the datasets, FB15K has more predicates and also a higher density.
TMDB5K is a movie knowledge graph generated from TMDB (https://www.kaggle.com/tmdb/tmdb-movie-metadata), and it contains information about movies as well as other closely related entities including actors, casts, crews, and countries. The popularity of the movie is used as the entity's importance score, while the movie summaries provide the nodes' semantic information.
IMDB is a movie knowledge graph created from the IMDB dataset (https://www.imdb.com/interfaces/), which includes entities for movies, casts, crews, genres, publishing companies, and countries. The importance scores are determined by the number of votes for each movie. The movie plot summaries and personal biographies are used as the semantic information for the nodes.

Competing Methods.
We compare MCRL with two primary kinds of methods that are readily available for ranking the importance of nodes in the graph. The first refers to the unsupervised approaches, including (1) PR [7]: a random walk-based algorithm for measuring the importance of web pages that can also be used to rank the importance of nodes in a graph. (2) PPR [11]: a variant of PageRank that considers the node's own feature information.
The second includes the supervised methods: (1) LR: linear regression, a simple machine learning technique that fits a least squares model by minimizing the mean squared error. (2) RF: random forest, another basic machine learning algorithm that uses the ensemble learning method based on decision trees. (3) GCN [20]: a GNN model that aggregates the neighbor node embeddings to conduct graph convolutions in the Fourier domain. (4) GAT [22]: a GNN model that uses the multihead attention mechanism to aggregate the features of the neighbor nodes. (5) GENI [12]: a model that aggregates scores using a predicate-aware attention mechanism and flexible centrality adjustment to perform node importance estimation. (6) RGTN [13]: a model that provides a representation learning-based framework for node importance estimation, which propagates the embedding of nodes in a relational graph transformer.

Detailed Settings.
For fair comparison, we maintain consistency with previous works [12,13] by concatenating semantic and structural features as node input features, except for GENI*, which uses the structure features only, following the setting in the original paper [12]. The structural features of the nodes are obtained by node2vec [32], and the semantic features are obtained from Transformer-XL [33]. To help the GNN model acquire the node representations more accurately, we employ two widely used graph data augmentation techniques during training. Given a graph, edge dropout [34] refers to randomly dropping some edges with the probability p, while node dropout [35] is the process of randomly discarding some nodes and their connected edges with the ratio q. The nodes with importance scores in the datasets are divided into training, validation, and testing parts with a ratio of 7 : 1 : 2. To obtain reliable and stable experimental results, we conducted five-fold cross-validation on each dataset to evaluate all the models. In order to avoid the overfitting problem, we apply early stopping if the performance on the validation set is not improved for 1000 consecutive epochs. For testing, the parameters that perform the best during validation are used. The experimental setup includes a Linux operating system, an NVIDIA GeForce RTX 3090 graphics card with 24 GB of memory, CUDA version 11.3, and the Python 3.8 programming language, and the model is built using the PyTorch framework version 1.11.0.
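The two augmentation techniques can be sketched on an edge list as follows. The function names and the toy graph are illustrative assumptions; the sketch treats p and q as independent per-edge and per-node drop probabilities and uses a seeded generator so that a run is reproducible.

```python
import random


def edge_dropout(edges, p, rng):
    """Drop each edge independently with probability p."""
    return [edge for edge in edges if rng.random() >= p]


def node_dropout(nodes, edges, q, rng):
    """Drop each node with probability q, together with its incident edges."""
    kept_nodes = {node for node in nodes if rng.random() >= q}
    kept_edges = [(u, v) for u, v in edges
                  if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


rng = random.Random(0)
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

augmented_edges = edge_dropout(edges, p=0.2, rng=rng)
kept_nodes, kept_edges = node_dropout(nodes, edges, q=0.4, rng=rng)
```

Applying a fresh random mask in every training epoch exposes the encoders to slightly different graphs, which is also the source of the extra variance across runs.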

Evaluation Metrics.
Following previous works [12,13], we use three evaluation metrics to give a thorough evaluation of the rank quality and importance relevance: normalized discounted cumulative gain (NDCG) [36], Spearman's rank correlation coefficient (SPEARMAN) [37], and Top-K Hit Ratio (HR). For all metrics, higher values are preferable. The definitions of the metrics are provided below.
(1) NDCG is a popular metric for evaluating ranking quality for the top k nodes. Given a list of k nodes ranked by predicted importance scores and their ground truth importance scores s(i), the discounted cumulative gain at position k (DCG@k) can be defined. The ideal DCG at rank position k (IDCG@k) is obtained by an ideal ordering of nodes based on their ground truth scores. Then, the normalized DCG at position k (NDCG@k) is the ratio of DCG@k to IDCG@k. (2) SPEARMAN measures the strength and direction of the correlation between two node rankings that are ordered in accordance with the predicted scores s*(i) and the ground truth scores s(i). (3) HR measures the fraction of the top-k predicted nodes that are contained in the top-k nodes by ground truth importance. HR@k is computed as HR@k = NumberOfHits@k / k.
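The three metrics can be sketched in plain Python as below. The DCG uses the common linear-gain form with a log2(position + 1) discount, and the Spearman coefficient uses the closed-form expression that assumes no tied scores; both are assumptions about the exact variant, and the toy scores are illustrative.

```python
import math


def top_indices(scores, k):
    """Indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def dcg_at_k(gains, k):
    """Discounted cumulative gain of a list already in ranked order."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(pred, truth, k):
    ranked_truth = [truth[i] for i in top_indices(pred, k)]
    ideal_truth = sorted(truth, reverse=True)
    ideal = dcg_at_k(ideal_truth, k)
    return dcg_at_k(ranked_truth, k) / ideal if ideal > 0 else 0.0


def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(pred, truth):
    n = len(pred)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(truth)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def hit_ratio_at_k(pred, truth, k):
    hits = set(top_indices(pred, k)) & set(top_indices(truth, k))
    return len(hits) / k


truth = [5.0, 3.0, 2.0, 1.0]
perfect = [0.9, 0.7, 0.4, 0.1]    # same ordering as the ground truth
reversed_pred = [0.1, 0.4, 0.7, 0.9]
```

NDCG and HR focus on the head of the ranking, whereas SPEARMAN compares the full orderings, so the three metrics are complementary.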

Analysis of Experimental Results.
The performance results are shown in Table 3, where the bolded results are the best and the italic results are the second best. Numbers after the ± symbol refer to the standard deviation from the cross-validation. The approach denoted by an asterisk (*) only employs structure features. It can be observed from the table that our proposal MCRL outperforms all the compared models in all metrics. Besides, the results also reveal the following: (1) Supervised methods typically perform better than unsupervised approaches and are more accurate in predicting node importance scores. (2) GENI prematurely maps node features into scores and calculates attention weights by simply splicing node scores with edge embeddings, which cannot fully utilize the interaction between nodes and edges. (3) RGTN learns graph representations from a single perspective, which is not flexible and accurate enough for node features. (4) The model proposed in this paper also has some shortcomings: because it uses data augmentation methods that randomly discard some nodes or edges in the graph, the uncertainty and instability increase, and the standard deviations from the cross-validation are slightly higher.

Ablation Study.
In this section, we perform ablation studies to prove the validity of each module in our proposed framework.

On Multiview Contrastive Learning.
To verify the effectiveness of contrastive learning, we conducted an ablation study on two datasets. Specifically, one variant merely uses GCN as the encoder and the other merely uses GAT as the encoder, both removing the contrastive loss component in the training. It can be observed from Figure 3 that using contrastive learning effectively increases the performance by improving the graph representations.

On Multiview Score Aggregation.
To validate the performance of the score aggregation module and also to demonstrate that splicing the features of nodes with those of edges to calculate attention weights is more effective than simply splicing the scores of nodes with the features of edges, we conduct the experiment on multiview score aggregation. Table 4 shows that employing score aggregation enhances the effectiveness and stability of model prediction. Besides, concatenating the node and edge representations can better capture their interactions and generate superior performance.

Choices of Encoders.
In this study, we select the GCN and GAT models as the encoders and employ contrastive learning approaches to produce the node representations. In fact, the encoders in the model can be replaced with any graph representation learning model, and we choose GraphSAGE for experiments for verification. From Figure 4, we can see that the outcomes of the experiments on the two datasets are not sensitive to the choices of encoders, and employing contrastive learning can consistently enhance the performance. Therefore, MCRL can be applied to a variety of encoders, and in this paper, we choose two popular encoders.

Parameter Sensitivity.
As mentioned in Section 4.3, we obtain the final prediction scores by assigning hyperparameter a as the weight to the scores from two perspectives. To show that the model is stable under hyperparameter perturbations, we conduct sensitivity analyses on this important hyperparameter on FB15K. Figure 5 demonstrates that changing the weights of the two scores has no appreciable impact on the experimental outcomes. Therefore, MCRL is robust to the perturbations of a.

Training Time.
To compare the efficiency of different methods, we report the overall training time on the FB15K dataset in Table 5. For each model, we ran the experiment five times and took the average of the run times to obtain the time cost. As shown in Table 5, the contrastive learning and dropout methods used in our proposed model increase the training time, but the slight increase is acceptable considering the improvement in prediction results shown in Table 3.

Comparison with Methods Proposed for Heterogeneous Graphs.
Recent years have also witnessed the emergence of importance estimation methods on heterogeneous graphs. Thus, for the comprehensiveness of the experiments, we compare our proposal with a state-of-the-art importance estimation method on heterogeneous graphs, i.e., HIVEN [16], and report the performance in Table 3. In order to compare the outcomes of the two models without taking into account the different node types, we apply HIVEN on the homogeneous graph with the same input data as our proposed model. Taking the movie dataset as an example, we only provide the models with labels of some of the movie nodes and estimate only the importance of the movie nodes. The experimental results demonstrate that our work outperforms the method proposed for heterogeneous graphs when evaluating on the homogeneous graph. This suggests that the methods proposed for heterogeneous graphs may not work well on homogeneous graphs where the nodes are of the same type.

Case Study Analysis.
To demonstrate the effectiveness of MCRL on the prediction task, we conduct a case study using the movie dataset IMDB as an example. Table 6 shows the top-10 movies with the highest importance scores predicted by MCRL, GENI, and RGTN along with the difference between their ground truth ranks and estimated ranks. The ground truth rank is calculated from the known importance scores of movies. In Table 6, the value "G-E" ("ground truth rank" minus "estimated rank") is shown for each prediction. From the table, we can see that the top-10 movies predicted by MCRL are qualitatively better than those of the two other methods, demonstrating our model's effectiveness.

Conclusion
Estimating the importance of nodes in KGs is a fundamental and crucial task in graph analysis, which is beneficial to many downstream applications. In this paper, we propose a multiview contrastive learning strategy to obtain representations of nodes from multiple perspectives and use a cross-view contrasting module to enhance the expressiveness. Additionally, we generate the entity importance score by attentively aggregating the scores in two views: one merely considering the entity embeddings and one modeling the interactions between entity and relation embeddings. Comprehensive experiments on real-world knowledge graphs show that our model outperforms existing methods in all measures. There are also some works on node importance estimation for heterogeneous graphs [15,16], so for future work, we intend to apply cutting-edge representation learning techniques to estimate node importance on heterogeneous knowledge graphs.

Data Availability
This study used the movie datasets from TMDB and IMDB. The TMDB dataset can be downloaded from https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata, and the IMDB dataset can be downloaded from the official IMDB website at https://www.imdb.com/interfaces/ or https://datasets.imdbws.com/. The datasets are available to customers for personal and noncommercial use.

Conflicts of Interest
The authors declare that they have no conflicts of interest.