UserRBPM: User Retweet Behavior Prediction with Graph Representation Learning

Social and information networks such as Facebook, Twitter, and Weibo have become the main platforms on which the public shares and exchanges information, where we can easily access friends' activities and are in turn influenced by them. Consequently, the analysis and modeling of user retweet behavior have important application value in areas such as information dissemination, public opinion monitoring, and product recommendation. Most existing solutions for user retweet behavior prediction are based either on network topology maps of information dissemination or on handcrafted rules that extract user-specific and network-specific features. However, these methods are complex or depend heavily on the knowledge of domain experts. Inspired by the successful use of neural networks in representation learning, we design a framework, UserRBPM, to explore potential driving factors and predictive signals in user retweet behavior. We use graph embedding to extract the structural attributes of the ego network, consider the drivers of social influence at the spatial and temporal levels, and use graph convolutional networks and the graph attention mechanism to learn latent social representations and predictive signals. Experimental results show that the proposed UserRBPM framework significantly improves prediction performance and expresses social influence better than traditional feature engineering-based approaches.


Introduction
Due to their convenient capability to share real-time information, social media sites (e.g., Weibo, Facebook, and Twitter) have grown rapidly in recent years. They have become the main platforms for the public to share and exchange information and, to a great extent, meet the social needs of users. Online social networks record a large amount of information generated by people through interactive activities, including various user behavior data. User behaviors (also called actions) in online social networks include posting messages, purchasing products, retweeting information, and establishing friendships. By analyzing the distribution and causality of these behaviors, we can evaluate the influence of the initiator and the communicator of a behavior, predict people's behaviors on social networks, and deepen our understanding of human social behavior [1,2]. The large amount of data generated by user interaction provides an opportunity to study user behavior patterns, and the analysis and modeling of retweet behavior prediction have become a research hotspot. Beyond the analysis of retweeting itself, retweet prediction can also help with a variety of tasks such as information spreading prediction [3,4], popularity prediction [5,6], and precision marketing [7,8].
Owing to the usefulness of prediction, a variety of studies have been conducted on automatic prediction in social networks. Previous research has investigated user retweet behavior prediction from different points of view. In the first approach, researchers build retweet behavior prediction models through network topology maps of information dissemination. Matsubara et al. [9] studied the dynamics of information diffusion in social media by extending an analysis model for information dissemination from the classic "Susceptible-Infected" (SI) model. Wang and Wang [10] proposed an improved SIR model, which used mean field theory to study the dynamic behavior in uniform and heterogeneous network models. Their experiment showed that the existence of the network would influence information communication. This kind of method studies retweeting behavior by modeling the propagation path of a message from a global perspective. The other approach is machine learning based on feature engineering. Liu et al. [11] proposed a retweeting behavior prediction model based on fuzzy theory and a neural network algorithm, which can effectively predict user retweeting behavior and dynamically perceive changes in hotspot topics. This kind of method relies on the knowledge of domain experts, and feature selection may take a long time. The methods above build user behavior prediction models from different perspectives. Their main purpose is to collect user behavior data in social networks, cluster and label the data, and then exploit machine learning models to predict retweeting behavior. However, in many online applications, such as personalized recommendation [12,13], advertising [8], and personalized services [14], it is critical to effectively analyze the social influence of each individual and further predict the retweeting behavior of users.
In this paper, we focus on user-level social influence. We aim to predict the action status of a target user according to the action statuses of her near neighbors and her local structural information. For example, in social networks, a person's behavior is affected by her neighbors. As shown in Figure 1, for the central user u, if some friends (red nodes) around her posted a microblog and other friends (white nodes) did not, whether user u will be influenced by the surrounding friends and forward this microblog can be regarded as a user retweeting behavior prediction problem. The social influence hidden behind the retweeting behavior depends not only on the number of active users but may also be related to the local network structure formed by "active" users. The problem described above is common in practical applications, such as presidential elections [15], innovation adoption [16], and e-commerce [17]. It has therefore inspired much research on user-level influence models, most of which [18][19][20] rely on complicated handcrafted features that require extensive knowledge of specific domains.
In recent years, graph convolutional networks (GCN) [21,22] have become one of the most effective choices for graph data learning tasks. Inspired by the successful application of neural networks [23] in representation learning [24,25], we design an end-to-end framework, UserRBPM, to explore potential driving factors and predictive signals in user retweeting behaviors. We expect deep learning frameworks to have better expressive capability and prediction performance. The designed solution represents both influence driving factors and network structures in a latent space, uses graph neural networks to extract spatial features for learning, and then constructs a user retweet behavior prediction model. To predict the status of a target user u, we first sample her k-order local neighbors through random walks with restart. After obtaining the r-ego network as shown in Figure 1, we leverage both graph convolution and the attention mechanism to learn latent predictive signals. We demonstrate the effectiveness and efficiency of the proposed framework on the Weibo social network. We compare UserRBPM with several conventional methods, and the experimental results show that the UserRBPM framework significantly improves prediction performance. The main contributions of this work can be summarized as follows:
(i) We design an end-to-end learning framework, UserRBPM, to explore potential driving factors and predictive signals in user retweeting behaviors.
(ii) We convert retweeting behavior prediction into a binary graph classification problem, which is more operable and comprehensible.
(iii) Experimental results demonstrate that the UserRBPM framework achieves better prediction performance than existing methods.
The rest of this paper is organized as follows. Section 2 summarizes related work. Section 3 formulates the user retweet behavior prediction problem. We detail the proposed framework in Section 4. In Section 5 and Section 6, we conduct extensive experiments and analyze the results. Finally, we conclude our work in Section 7.

Related Work
2.1. User Retweet Behavior Prediction. Many studies on user retweet behavior in social networks are based on the analysis and modeling of the dynamics of information dissemination. Current research on user behavior prediction in social networks primarily takes two approaches. In the first approach, Ota et al. [26] constructed a user topology network based on users' following relationships and discovered users who retweeted many tweets by overlapping the propagation paths of retweets. Yuan et al. [27] investigated the dynamics of friend relationships through online social interaction and proposed a model to predict repliers or retweeters of a particular tweet posted at a certain time in online social networks. Tang et al. [28] studied the conformity phenomenon of user behavior in social networks and proposed a probabilistic model called Confluence to predict user behavior. This model can distinguish and quantify the effects of different types of conformity. Zhang et al. [29] proposed three metrics, namely, user enthusiasm, user engine, and user duration, to describe user retweet behavior in the message spreading process and studied the relationship between these three metrics and the influence obtained by the user retweet behavior.

Wireless Communications and Mobile Computing

The other approach is machine learning based on feature engineering, which addresses user behavior analysis and prediction by manually formulating rules to extract basic user features and network structural features. Luo et al. [30] explored features such as followers' status, retweet history, followers' interests, and followers' active time with a learning-to-rank framework to discover who would retweet a tweet posted on Twitter. Zhang et al. [18] analyzed the influence of the number of active neighbors of a user on retweeting behavior, proposed two instantiation functions based on structural diversity and pairwise influence, and applied a classifier based on logistic regression to predict users' retweet behaviors. Jiang et al. [19] pointed out that retweeting prediction is a single-type setting problem. By analyzing the basic influence factors of retweet behavior in Weibo, they used a single-type collaborative filtering method to measure users' personal preferences and social influence in order to predict retweet behavior.
Recently, there have been efforts to detect those global patterns using deep learning. Li et al. [31] proposed an end-to-end predictor that incorporated a recurrent neural network (RNN) and representation learning to infer the cascade size. This method significantly improved the performance of cascade prediction. Zhang et al. [32] proposed a novel attention-based deep neural network to capture the user's attention interests. Wang et al. [33] transformed the social influence prediction problem into a neural network multilabel classification problem and proposed the NNMLInf social influence prediction model. The experimental results showed that the node2vec method is more effective than traditional manual feature extraction in obtaining representative features of the network structure.

2.2. Graph Representation Learning.
Graph representation learning has emerged as a powerful technique for solving real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification [34], similarity search [35], and graph classification [36,37]. Network embedding is a bridge connecting the original network data and network application tasks. It aims to represent the nodes in the network as low-dimensional, real-valued, and dense vectors, so that the resulting vectors can be represented and reasoned about in the vector space. Therefore, the primary challenge in this field is to find a way to represent or encode the structure of graphs so that it can be easily exploited by machine learning models.
Traditional machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings using techniques based on deep learning and nonlinear dimension reduction. Chen et al. [38] exploited graph attention networks (GAT) to learn user node representations by spreading information in heterogeneous graphs and then leveraged limited user labels to build an end-to-end semisupervised user profiling predictor. Zhang et al. [25] introduced the problem of heterogeneous graph representation learning and proposed a heterogeneous graph neural network model, HetGNN. Extensive experiments on various graph mining tasks, i.e., link prediction, recommendation, and node classification, demonstrated that HetGNN can outperform state-of-the-art methods. Fan et al. [39] first provided a principled approach to jointly capture interactions and opinions in the user-item graph and then presented a graph neural network model (GraphRec) to model social recommendation for rating prediction.

Problem Formulation
In this section, we introduce necessary definitions and then formulate the problem of user retweet behavior prediction.
Definition 1 (ego network). The ego network model is one of the important tools for studying human social behavior and social networks. Compared with the global network perspective, the ego network perspective pays more attention to individual users, which is in line with the need for personalized services in real application systems. The perspective adopted in this paper can also be extended to other scenarios that involve network relationships.
(r-neighbors) Let $G = (V, E)$ denote a social network, where $V$ is a set of user nodes and $E \subseteq V \times V$ is a set of relationships between users. We use $v_i \in V$ to represent a user and $e_{ij} \in E$ to represent a relationship between $v_i$ and $v_j$. In this work, we consider undirected relationships. For a user $u$, its r-neighbors are defined as $\Gamma_u^r = \{v : d(u, v) \le r\}$, where $d(u, v)$ is the shortest path distance (in terms of the number of hops) between $u$ and $v$ in the network $G$, and $r \ge 1$ is a tunable integer parameter that controls the scale of the ego network.
(r-ego network) The r-ego network of user $u$ is the subnetwork induced by $\Gamma_u^r$, denoted by $G_u^r$.
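The two definitions above can be illustrated with a short sketch. It is a minimal example under stated assumptions: the network is an undirected adjacency dictionary, and the names `r_neighbors`, `r_ego_network`, and `toy_adj` are hypothetical helpers introduced here, not part of the original work.

```python
from collections import deque

def r_neighbors(adj, u, r):
    """Return {v : d(u, v) <= r} via breadth-first search (u itself included)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == r:              # do not expand beyond r hops
            continue
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

def r_ego_network(adj, u, r):
    """Return the subnetwork induced by the r-neighbors of u as an adjacency dict."""
    nodes = r_neighbors(adj, u, r)
    return {v: sorted(w for w in adj.get(v, ()) if w in nodes) for v in nodes}

# Hypothetical toy network: a path u - a - b - c plus an extra edge u - d.
toy_adj = {
    "u": ["a", "d"],
    "a": ["u", "b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["u"],
}
ego = r_ego_network(toy_adj, "u", 2)
```

With r = 2, node `c` lies three hops from `u` and is therefore excluded from the induced subnetwork.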
Definition 2 (social action). In sociology, social action is an act that takes into account the actions and reactions of individuals. Users in social networks perform social actions, such as retweeting behaviors and citation behaviors. At each timestamp $t$, we observe a binary action status of user $u$, $s_u^t \in \{0, 1\}$, where $s_u^t = 1$ indicates that user $u$ has performed this action before or on the timestamp $t$, and $s_u^t = 0$ indicates that the user has not performed this action yet.
Definition 3 (historical behavior). Let $L$ denote a stream of action logs, where each log entry $l \in L$ is a triple $(v, a, t)$, representing that user $v \in V$ performed action $a \in A$ before or on the timestamp $t$. Here, $A$ is a set of action types, $A = \{(v_i, a, t) \mid t \in \psi, v_i \in V\}$, and $\psi$ denotes the time scope of historical behavior. For example, a retweeting behavior is an action in Twitter, and a citation is an action in academic social networks.
In this paper, the motivation for the user retweeting behavior prediction problem can be illustrated through the example shown in Figure 1. For a user $u$ in her 2-ego network (i.e., $r = 2$), if some users retweet the message $m$ before or on the timestamp $t$, they are considered to be active. We can observe the action statuses of $u$'s neighbors, such as $s_{v_1}^t = 1$, $s_{v_2}^t = 1$, and $s_{v_5}^t = 0$. Moreover, the set of active neighbors of user $u$ is represented by $\psi_u^t = \{v_1, v_2, v_3, v_4, v_6\}$. As shown in Figure 1, we study whether the action status of user $u$ will be influenced by the surrounding friends so that she forwards this message. Next, we formalize the problem of user retweet behavior prediction.
Problem 1 (user retweet behavior prediction, [18]). User retweet behavior prediction models the probability of $u$'s action status conditioned on her r-ego network and the action statuses of her r-neighbors. More formally, given $G_u^r$ and $S_u^t = \{s_v^t : v \in \Gamma_u^r \setminus \{u\}\}$, the retweet behavior of user $u$ after a given time interval $\Delta t$ is predicted as follows:

$$A_v = P\left(s_u^{t+\Delta t} \mid G_u^r, S_u^t\right) \tag{1}$$

Practically, $A_v$ denotes the predicted social action status of user $u$. Suppose we have $N$ instances, and each instance is a 3-tuple $(u, a, t)$, where $u$ is a user, $a$ is a social action, and $t$ is a timestamp. For such a 3-tuple $(u, a, t)$, we also know $u$'s r-ego network $G_u^r$, the action statuses of $u$'s r-neighbors $S_u^t$, and $u$'s future action status at $t + \Delta t$, i.e., $s_u^{t+\Delta t}$. We then formulate user retweet behavior prediction as a binary graph classification problem, which can be solved by minimizing the following negative log-likelihood objective w.r.t. the model parameter $\theta$:

$$\mathcal{L}(\theta) = -\sum_{i=1}^{N} \log P_\theta\left(s_{u_i}^{t_i+\Delta t} \mid G_{u_i}^r, S_{u_i}^{t_i}\right) \tag{2}$$
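The negative log-likelihood objective described above can be made concrete with a small numerical sketch. It assumes that the model already outputs, for each instance, a probability that the ego user becomes active; the function name `negative_log_likelihood` and the sample values are illustrative only.

```python
import math

def negative_log_likelihood(probs, labels):
    """Sum of -log P(observed status) over instances.

    probs[i]  : predicted probability that instance i becomes active (status 1)
    labels[i] : observed future status in {0, 1}
    """
    total = 0.0
    for p, s in zip(probs, labels):
        p_observed = p if s == 1 else 1.0 - p
        total -= math.log(p_observed)
    return total

# Hypothetical predictions for three instances (u, a, t).
loss = negative_log_likelihood([0.9, 0.2, 0.5], [1, 0, 1])
```

Confident correct predictions contribute little to the loss, while the uncertain third instance (probability 0.5) contributes the most.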

Model Framework
In this paper, we propose UserRBPM to address the problem of user retweet behavior prediction. The framework uses graph neural networks to parameterize the probability in equation (2) and to automatically detect the potential driving factors and predictive signals of user retweet behavior. As shown in Figure 2, UserRBPM consists of the pretrained network embedding layer, the input layer, the GCN/GAT layer, and the output layer.
4.1. Sampling Near Neighbors. Given a user u, the most straightforward way to extract her r-ego network is to perform a breadth-first search (BFS) starting from user u. However, the scale of the r-ego network (in terms of the number of nodes) may vary greatly across users, since each node has a different number of neighbors. Moreover, because of the small-world property of social networks [40], the size of user u's r-ego network can be very large. Such variable-sized data are not suitable for most deep learning models.
To address the above problem, we perform random walk with restart (RWR) [41] on the original r-ego network to fix the sample size. Inspired by [42,43], which suggested that people are more susceptible to being influenced by active neighbors than by inactive ones, we start the random walk from user $u$ or one of its active neighbors. The walk iteratively travels to a neighboring node with a probability proportional to the weight of each edge. In addition, at each step the walk returns to the starting vertex with a positive probability. In this way, a fixed number of vertices can be collected, denoted by $\overrightarrow{\Gamma_u^r}$ with $|\overrightarrow{\Gamma_u^r}| = n$. We then regard the subnetwork $\overrightarrow{G_u^r}$ induced by $\overrightarrow{\Gamma_u^r}$ as a proxy of the r-ego network $G_u^r$, and denote by $\overrightarrow{S_u^t} = \{s_v^t : v \in \overrightarrow{\Gamma_u^r} \setminus \{u\}\}$ the action statuses of $u$'s sampled neighbors.
When we use RWR, the starting node can be the ego user or one of its active neighbors. This setting serves two purposes. First, it keeps the sampled walk sequence close to the neighborhood surrounding the starting node, rather than following a single path away from it, which is consistent with the observation that people are more susceptible to active neighbors. The restart strategy meets this requirement. Second, because the starting nodes include both the ego user and her active neighbors, these nodes are more likely to participate in the subsequent embedding process, which also reflects that active neighbors affect their surrounding nodes.
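The sampling procedure described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: edges are treated as unweighted (so the next hop is chosen uniformly), the restart target is drawn uniformly from the ego user and its active neighbors, and the restart probability, step limit, and the names `rwr_sample` and `toy_adj` are assumptions made for the example.

```python
import random

def rwr_sample(adj, ego, active, n, restart_prob=0.8, seed=0, max_steps=10000):
    """Collect up to n vertices by random walk with restart.

    adj    : undirected adjacency dict, node -> list of neighbors
    ego    : the ego user (always included in the sample)
    active : active neighbors of the ego user, used as extra restart points
    """
    rng = random.Random(seed)
    starts = [ego] + [v for v in active if v in adj]
    sampled = {ego}
    current = ego
    for _ in range(max_steps):
        if len(sampled) >= n:
            break
        if rng.random() < restart_prob or not adj.get(current):
            current = rng.choice(starts)        # restart at ego or an active neighbor
        else:
            current = rng.choice(adj[current])  # move to a uniformly chosen neighbor
        sampled.add(current)
    return sampled

# Hypothetical toy ego network with one active neighbor "a".
toy_adj = {
    "u": ["a", "b"],
    "a": ["u", "c"],
    "b": ["u", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c"],
}
sample = rwr_sample(toy_adj, "u", ["a"], n=4, restart_prob=0.5, seed=0)
```

A high restart probability keeps the sample concentrated around the ego user and its active neighbors, while the fixed target size n gives every instance the same number of nodes.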

4.2. Graph Neural Network Model. We design an effective graph neural network model to incorporate both the structural properties in $\overrightarrow{G_u^r}$ and the action statuses in $\overrightarrow{S_u^t}$, learn a hidden embedding vector for each ego user, and then use it to predict the action status $s_u^{t+\Delta t}$ of the ego user in the next period of time. As shown in Figure 2, the graph neural network model includes the embedding layer, the instance normalization layer, the input layer, the graph neural network layer, and the output layer.

4.2.1. Embedding Layer.
Representation learning [44,45] has been a hot topic in the academic community in recent years. In the context of graph mining, there have been many studies on graph representation learning. For graph-structured data such as social networks, we want to learn users' social representations from their relationship network data; that is, our main purpose is to discover the structural properties of the network and encode them into a low-dimensional latent space. More formally, network embedding learns an embedding matrix $X \in \mathbb{R}^{d \times |V|}$, with each column corresponding to the representation of a vertex (user) in network $G$. In our scheme, we learn a low-dimensional dense real-valued vector $x_v \in \mathbb{R}^d$ for each node $v$ in the network, where $d \ll |V|$. The process of network representation learning can be unsupervised or semisupervised.
When considering structural information in social networks, we can take triadic closure patterns with strong ties as an example [46]. As shown in Figure 3, the graph on the left contains a triadic closure. After two rounds of neighborhood aggregation, the green node on the left is treated the same as the green node in the tree structure on the right, which contains no triadic closure, so the structural information of the triadic closure is ignored. Therefore, a graph representation learning method that can adapt to different local structures is needed.
In our work, we utilize the GraLSP model [47] for graph representation learning, which explicitly incorporates local structural patterns into neighborhood aggregation through random anonymous walks. Specifically, the framework captures local structural patterns via random anonymous walks, and these walk sequences are then fed into the feature aggregation, where various mechanisms are designed to address the impact of structural features, including adaptive receptive radius, attention, and amplification. In addition, GraLSP captures similarities between structures with objectives that are optimized jointly with the proximity objectives of nodes. The process of the GraLSP model for graph representation learning is shown in Figure 4. By making full use of structural patterns, GraLSP can outperform competitors in various prediction tasks on multiple datasets.
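To illustrate what an anonymous walk is, the sketch below maps a walk over concrete nodes to the pattern of first-appearance indices, so that walks over different users with the same local structure become identical. This shows only the anonymization step, not the GraLSP model itself; the function name `anonymize_walk`, the 0-based indexing, and the example walks are assumptions made for the illustration.

```python
def anonymize_walk(walk):
    """Replace each node by the index of its first appearance in the walk."""
    first_seen = {}
    pattern = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen)
        pattern.append(first_seen[node])
    return tuple(pattern)

# Two hypothetical walks over different users that both close a triangle.
walk_1 = ["alice", "bob", "carol", "alice"]
walk_2 = ["dave", "erin", "frank", "dave"]
pattern_1 = anonymize_walk(walk_1)
pattern_2 = anonymize_walk(walk_2)
```

Both walks yield the same pattern, which is why anonymous walks can describe local structures such as triadic closures independently of node identity.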

4.2.2. Instance Normalization Layer.
In the training process of the UserRBPM model, we apply Instance Normalization (IN) [48], a regularization technique that helps prevent overfitting and allows for greater generalization. For tasks that focus on each individual sample, the information from each sample is very important. Therefore, we adopt this technique in the task of retweet behavior prediction. After normalization, the features are on a comparable scale, which is suitable for comprehensive comparative analysis. Furthermore, normalization helps to speed up learning and reduces overfitting.
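As a rough illustration of instance normalization on one sampled ego network, the sketch below normalizes every embedding dimension across the sampled users by subtracting the mean and dividing by the square root of the variance plus a small constant. The function name `instance_normalize`, the default `eps`, and the toy embeddings are assumptions made for the example.

```python
import math

def instance_normalize(embeddings, eps=1e-5):
    """Normalize each embedding dimension across the nodes of one sampled ego network.

    embeddings: list of n vectors (one per sampled user), each of length D.
    """
    n = len(embeddings)
    dims = len(embeddings[0])
    normalized = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        column = [x[d] for x in embeddings]
        mean = sum(column) / n
        var = sum((value - mean) ** 2 for value in column) / n
        scale = math.sqrt(var + eps)          # eps guards against zero variance
        for i, value in enumerate(column):
            normalized[i][d] = (value - mean) / scale
    return normalized

# Hypothetical embeddings for three sampled users with two dimensions each.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y = instance_normalize(x)
```

After normalization the two dimensions have the same scale even though their raw values differ by a factor of ten, so only the relative positions of users within the instance remain.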
As illustrated in Figure 2(c), for each user $v \in \overrightarrow{\Gamma_u^r}$, after retrieving her representation $x_v$ from the embedding layer, the instance normalization $y_v$ is given by

$$y_{vd} = \frac{x_{vd} - \mu_d}{\sqrt{\sigma_d^2 + \varepsilon}}$$

for each embedding dimension $d = 1, 2, \cdots, D$, where

$$\mu_d = \frac{1}{n} \sum_{v \in \overrightarrow{\Gamma_u^r}} x_{vd}, \qquad \sigma_d^2 = \frac{1}{n} \sum_{v \in \overrightarrow{\Gamma_u^r}} \left(x_{vd} - \mu_d\right)^2.$$

Here, $\mu_d$ and $\sigma_d^2$ are the mean and variance, respectively, and $\varepsilon$ is a small number for numerical stability. Intuitively, such normalization removes the instance-specific mean and variance, which encourages the downstream model to focus on the relative positions of users in the latent embedding space rather than their absolute positions. As we will see later in Section […]

[…] neighbor nodes to supplement the information of the current node. In this way, we can obtain more complete information than from the characteristics of a single individual. Therefore, how to combine the characteristics of neighbors with the current node is a critical part of the design.
The recently developed GCN is a successful attempt to generalize the convolutional neural networks used in Euclidean space to the modeling of graph-structured data. The GCN model naturally integrates the connection patterns and feature attributes of graph-structured data, and it outperforms many earlier methods on benchmarks. GCN is a semisupervised learning algorithm for graph-structured data, which can effectively extract spatial features for machine learning on such a network topology. At the same time, it can perform end-to-end learning of node features and structure information, which makes it one of the best choices for graph data learning tasks at present. Suppose an undirected graph has $n$ nodes, each node has $d$-dimensional features, the adjacency matrix of the graph is denoted as $A$, and the matrix composed of all node features is denoted as $X$, where $X = [x_1, x_2, \cdots, x_n]^T \in \mathbb{R}^{n \times d}$ and $x_i \in \mathbb{R}^d$ is the $d$-dimensional feature vector of node $i$. If the labels of a set of nodes are given, the goal is to predict the labels of the remaining nodes. Thus, for the GCN network, the input is a node feature matrix $X \in \mathbb{R}^{n \times d}$, and the propagation between layers can be expressed as follows:

$$H^{l+1} = \sigma\left(L(G)\, H^{l} W^{l}\right)$$

where $H^l$ is the $l$th activation matrix, $W^l$ is the trainable weight matrix of the $l$th layer, and this layer functions as a feature map. $\sigma$ is a nonlinear activation function, $L(G)$ is an $n \times n$ matrix that captures the structure information of graph $G$, and GCN uses symmetric normalization to perform aggregation operations. The conventional graph convolution operation is as follows:

$$L(G) = D^{-1/2} A D^{-1/2}$$

where $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ is a nonnegative adjacency matrix, $D = \mathrm{diag}(d_1, d_2, \cdots, d_n)$ is the degree matrix of $A$, and $d_i = \sum_j a_{ij}$ is the degree of node $i$.
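The layer-wise propagation with symmetric normalization can be sketched in plain Python as follows. This is an illustrative example rather than the authors' implementation: ReLU is assumed as the activation function, the toy adjacency matrix includes self-loops so that each node keeps its own feature (an assumption not stated in the text), and the helper names are hypothetical.

```python
import math

def normalized_adjacency(adj):
    """Compute D^{-1/2} A D^{-1/2} for a nonnegative adjacency matrix (list of lists)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, h, w):
    """One propagation step: ReLU(L(G) H W)."""
    transformed = matmul(h, w)                                    # transformation
    aggregated = matmul(normalized_adjacency(adj), transformed)   # aggregation
    return [[max(0.0, v) for v in row] for row in aggregated]     # activation

# Hypothetical 3-node path graph with self-loops, 2 input features, 1 output feature.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
H = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W = [[1.0],
     [-1.0]]
out = gcn_layer(A, H, W)
```

The three commented steps correspond to transforming node features, aggregating over neighbors, and applying the nonlinearity.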
We can divide the above learning process into three parts. The first part is transformation: transform and learn the current node features. The second part is aggregation: aggregate the features of neighboring nodes to obtain new features. The third part is activation: use an activation function to increase nonlinearity.
4.2.5. GAT Layer. Essentially, both GCN and GAT are aggregation operations that aggregate the characteristics of neighbor nodes into the central node. GCN uses the Laplacian matrix to perform graph convolution operations, while GAT introduces the attention mechanism into GCN, which assigns weights to neighboring nodes and thereby differentiates their influence. GAT assigns different weights to each node, paying attention to nodes with greater effects while downweighting nodes with smaller effects. To a certain extent, the expressive capability of GAT is stronger, because the correlation between node features is better integrated into the model.
4.2.6. Output Layer. In the output layer, each node corresponds to a two-dimensional representation, which is used to represent the behavior state (retweet/unretweet, cite/uncite, etc.). The calculation process is shown in equation (8). By comparing the representation of the ego user with the ground truth, we then optimize the log-likelihood loss, in which $\Theta$ is a weight matrix and $b$ represents the bias term.
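To make the attention-based aggregation of the GAT layer more concrete, the sketch below follows the commonly used GAT formulation, in which a score is computed from the concatenated features of the central node and each neighbor, passed through a LeakyReLU, and normalized with a softmax. The text above does not give the exact formula used in UserRBPM, so this form, the function names, and the toy vectors are assumptions; node features are taken as already linearly transformed.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_weights(h_center, h_neighbors, a):
    """Softmax-normalized attention of a central node over its neighbors.

    a is the attention vector applied to the concatenation [h_center || h_neighbor].
    """
    scores = []
    for h_j in h_neighbors:
        concat = h_center + h_j                       # list concatenation
        scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
    top = max(scores)                                 # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(h_center, h_neighbors, a):
    """Attention-weighted sum of neighbor features."""
    alpha = attention_weights(h_center, h_neighbors, a)
    dims = len(h_neighbors[0])
    return [sum(alpha[j] * h_neighbors[j][d] for j in range(len(h_neighbors)))
            for d in range(dims)]

# Hypothetical features: the attention vector only looks at the neighbor's first feature.
center = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
a_vec = [0.0, 0.0, 1.0, 0.0]
alpha = attention_weights(center, neighbors, a_vec)
pooled = attend(center, neighbors, a_vec)
```

In contrast to the fixed normalization weights of GCN, the weights here depend on the node features, so the first neighbor receives a larger share of the aggregation.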

Experiment Setup
In this section, we first introduce the construction process of the dataset and the experimental hypothesis. Then, we […] and user $u_2$ is called the followee of user $u_1$. The dataset was crawled as follows. To start with, 100 users were randomly selected as seed users, and then the user information of their followers and followees was collected. A total of about 1.8 million users and 300 million social relationships were obtained during the crawling process. A total of 1 billion microblogs were collected in this process, and all user profiles were also crawled, which contain the name, gender, verification status, #bifollowing, #followers, #followees, and #microblogs. In addition, to protect the privacy of users, we have desensitized the user IDs. The statistical information of the raw data is shown in Table 1.

The Assumption of Experiments.
The main problem studied in our paper is as follows: when a certain microblog is visible to a certain user, we predict whether the user will retweet the microblog within a certain period, which is a supervised binary classification problem. The experiments in our work are based on the following assumptions:
(i) The assumption of visibility: as long as a user sends a microblog (an original post or a retweet), all his followers will see the microblog.
(ii) We aim to predict the action status of users at a specific time.
(iii) The assumption of timeliness: if the retweeting time of a user is more than 72 hours after the time that a microblog was first sent out, the sample will be discarded.
(iv) For a microblog, we consider whether a user will retweet it only when the user has active neighbors, that is, the user has followed the original creator or his followers have retweeted the microblog.
(v) For a microblog, if a user has appeared as a positive sample at time t, we will not predict the action status after time t.
Specifically, the visibility assumption is set because, with more than 100 million daily active users on Sina Weibo, each user receives a large amount of information every day. If a user follows a large number of users, some microblogs are likely to be buried under other messages before the user can decide whether to retweet them.
We restrict the prediction of whether a microblog will be retweeted to a certain time. For a given user and microblog, the user may not retweet the microblog the first time he sees it; at this time, we generate a negative sample. When he retweets the microblog the second time he sees it, we create a positive sample. In other words, users have different action statuses in different periods. Users who never retweeted are regarded as negative samples from the first time they saw the microblog. If the status at every time point were retained and predicted, many negative samples would be generated. Therefore, we make the second assumption. In addition, as Zhang et al. [18] pointed out, for most microblogs the number of retweets drops dramatically 72 hours after the first posting, and almost no users retweet afterwards. So, we set up the third assumption.
In real application scenarios, Sina Weibo does not restrict users from retweeting messages sent by users whom they do not follow. Users often actively search for microblogs and retweet them, or retweet popular microblogs. This situation is beyond the scope of our work, so we set up the fourth assumption. The last assumption reflects that once a user has retweeted, she would be regarded as a positive sample at every subsequent time point, so repeated predictions in the experiment are of little significance.
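The timeliness and active-neighbor assumptions can be expressed as a small labeling rule. The sketch below is only an illustration of assumptions (iii) and (iv): times are given in hours since an arbitrary reference point, and the function name `label_exposure`, its arguments, and the constant `WINDOW_HOURS` are hypothetical.

```python
WINDOW_HOURS = 72  # timeliness window from assumption (iii)

def label_exposure(post_time, retweet_time, n_active_neighbors):
    """Return 1 (positive), 0 (negative), or None (sample not used).

    post_time          : hour at which the microblog was first sent out
    retweet_time       : hour at which the user retweeted it, or None if never
    n_active_neighbors : number of the user's neighbors who posted or retweeted it
    """
    if n_active_neighbors == 0:
        return None                  # assumption (iv): out of scope without active neighbors
    if retweet_time is None:
        return 0                     # saw the microblog but did not retweet
    if retweet_time - post_time > WINDOW_HOURS:
        return None                  # assumption (iii): retweeted too late, discarded
    return 1

# Hypothetical cases.
on_time = label_exposure(post_time=0, retweet_time=5, n_active_neighbors=2)
too_late = label_exposure(post_time=0, retweet_time=100, n_active_neighbors=2)
never = label_exposure(post_time=0, retweet_time=None, n_active_neighbors=1)
```

Returning `None` for discarded cases keeps them distinct from negative samples, which matters when counting the positive-to-negative ratio later.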

The Generation of Samples.
In retweeting behavior prediction, since we can learn directly from the microblog records which users have retweeted a microblog, the extraction of positive samples is relatively simple. If a user v who is influenced by others performs a social action at a certain timestamp t, we generate a positive sample. In contrast, it is impossible to know directly from the microblog records which users saw a message but did not retweet it. Therefore, the extraction of negative samples is much more complicated. Our work is based on the assumption of visibility: we suppose that after a user posts a microblog, all of her followers can see it. If someone saw the microblog but did not retweet it, we create a negative sample.
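The sample extraction described above can be sketched as follows. The example is deliberately simplified and rests on assumptions: timestamps are ignored, every follower of a sender counts as exposed (the visibility assumption), and the data layout and the name `generate_samples` are hypothetical.

```python
def generate_samples(followers, retweet_log):
    """Build (user, microblog, label) samples under the visibility assumption.

    followers   : user -> set of users who follow that user
    retweet_log : microblog id -> list of senders in time order (first one is the author)
    """
    samples = []
    for mid, senders in retweet_log.items():
        author = senders[0]
        retweeters = set(senders[1:])
        exposed = set()
        for sender in senders:
            exposed |= followers.get(sender, set())   # all followers of a sender see it
        exposed.discard(author)
        for user in sorted(exposed):
            label = 1 if user in retweeters else 0    # exposed but silent -> negative
            samples.append((user, mid, label))
    return samples

# Hypothetical data: a posts m1, b (a follower of a) retweets it.
followers = {"a": {"b", "c"}, "b": {"d"}}
retweet_log = {"m1": ["a", "b"]}
samples = generate_samples(followers, retweet_log)
```

Users who retweet without having been exposed through a followee never enter `exposed`, which matches the scope restriction of the fourth assumption.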
Through the analysis of the experimental data, we found that only about 1.25% of the microblogs are original and most are retweets, and that 85.39% of the microblogs have been retweeted at least ten times, verifying that the retweeting behavior of users is pervasive. However, for our research scenario, there are two data imbalance problems in our dataset. The first one comes from the number of active neighbors. As Zhang et al. [18] observed, structural features are significantly related to user retweeting behavior when the ego user has a relatively large number of active neighbors. However, in most social influence datasets, although retweeting behavior is ubiquitous, the ratio between the number of active neighbors and the number of inactive neighbors is not balanced. For example, in the Weibo dataset, 80% of users have only one active neighbor, and users with more than 3 active neighbors account for only 8.57%. Therefore, when we train our model on such an imbalanced dataset, the model will be dominated by samples with few active neighbors. To address the issues caused by data imbalance and to illustrate the superiority of our proposed model in capturing local structural information, we established a balanced subdataset Edata (as shown in Table 2) for fair data analysis and a further training-test scheme. Specifically, we filter out samples in which the followers or followees did not have Weibo content. In addition, we only consider samples in which ego users have at least 3 active neighbors.
Imbalanced labels are the second problem. For instance, in our Weibo dataset, the ratio between positive and negative instances is about 1 : 300. Normally, a model is trained to optimize overall accuracy, and all types of misclassification are weighted equally when computing the overall error. As a result, the trained model tends to assign minority-class samples to the majority class; it generalizes poorly and cannot classify the minority class accurately. Our goal is to find the users who will retweet, rather than to pay attention to those who do not. Therefore, the datasets need to be balanced.
To address the above problem, the most direct way is to select a relatively balanced dataset, that is, to set the ratio of positive to negative samples to 1 : 3. In addition, we also used a global random downsampling method and a microblog granularity-based downsampling method to process the imbalanced datasets. With global random downsampling, the negative samples in the resulting dataset cover only a small number of microblogs, and for some microblogs the positive samples are kept while none of the corresponding negative samples are drawn. Downsampling at microblog granularity, in contrast, tries to ensure that each microblog contributes the same number of positive and negative samples. The data statistics of the balanced sample set are shown in Table 3. To verify from the results that the downsampling strategy adopted in our work is more suitable for our research scenarios, we conducted a comparative analysis in Section 6.
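The difference between the two downsampling strategies can be illustrated with a short sketch. It assumes samples are `(user, microblog, label)` tuples; the function names, the fixed seed, and the toy data are hypothetical and only meant to show the idea, not the exact procedure used in the experiments.

```python
import random
from collections import defaultdict

def global_random_downsample(samples, seed=0):
    """Keep all positives and draw an equal number of negatives from the whole pool."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[2] == 1]
    negatives = [s for s in samples if s[2] == 0]
    return positives + rng.sample(negatives, min(len(positives), len(negatives)))

def per_microblog_downsample(samples, seed=0):
    """Keep all positives and, for each microblog, draw as many negatives
    as that microblog has positives (or all of them if there are fewer)."""
    rng = random.Random(seed)
    groups = defaultdict(lambda: {1: [], 0: []})
    for sample in samples:
        groups[sample[1]][sample[2]].append(sample)
    balanced = []
    for microblog in sorted(groups):
        positives, negatives = groups[microblog][1], groups[microblog][0]
        balanced.extend(positives)
        balanced.extend(rng.sample(negatives, min(len(positives), len(negatives))))
    return balanced

# Toy data: microblog m1 has 2 positives and 5 negatives, m2 has 1 and 4.
samples = (
    [("u%d" % i, "m1", 1) for i in range(2)]
    + [("u%d" % i, "m1", 0) for i in range(2, 7)]
    + [("u7", "m2", 1)]
    + [("u%d" % i, "m2", 0) for i in range(8, 12)]
)
balanced = per_microblog_downsample(samples)
```

Both functions return six samples here, but only the per-microblog variant guarantees that every microblog with positive samples also keeps matching negative samples.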

The Features of Our Design.
We made detailed data observations and analyzed how, in addition to the structural attributes of social networks, the characteristics of users at the spatial and temporal levels influence their retweeting behavior. To visualize the observation results, we design several examples of statistical information, which represent spatial-level features and temporal-level features, respectively. These characteristics can be regarded as user node features. In our work, the spatial-level features are analyzed in terms of social roles: we studied how the social roles played by different users influence the prediction performance of retweeting behavior. Inspired by the previous research work of Wu et al. [50] and Yang et al. [51], we divide users into three groups according to their network attributes: opinion leaders (OpnLdr), structural hole spanners (StrHole), and ordinary users (OrdUsr). Specifically, we consider the 5% of users with the highest PageRank score to be opinion leaders, the 5% of users with the lowest Burt's constraint score to be structural hole spanners, and the rest to be ordinary users. A detailed analysis of users' social roles and social behaviors is shown in Table 4. For the temporal-level features, we mainly analyzed the content of the messages posted by users and defined the following features:
(i) Similarity: the TF-IDF similarity between the ego user's and followees' post content within a month
(ii) Exposure: the number of microblogs posted by followees within a month
(iii) Retweet rate: the retweet rate of ego users to their followees

5.2. Comparison Methods.
In order to verify the effectiveness of our proposed framework, we compared the prediction performance of UserRBPM with existing representative methods. Firstly, we compared UserRBPM with previous retweeting behavior prediction methods, which usually extract rule-based features. Secondly, by comparing the GraLSP method with other network embedding methods, we verify that local structure information plays a more important role in the prediction of retweeting behavior than global information. The comparison methods are as follows:
(i) Handcrafted features + Logistic Regression (LR): we use logistic regression to train the classification model. The features we constructed manually fall into two categories: one is the user node features designed in our work, including spatial-level and temporal-level features; the other is the ego network features designed by Qiu et al. [49]. The features we used are listed in Table 5
(ii) Handcrafted features + Support Vector Machine (SVM): we also use SVM as the classification model. The model uses the same features as the LR method
(iv) Node2vec: Node2vec [53] further extends the DeepWalk method by changing the way that the random walk sequence is generated. This is a network embedding method that designs a biased random walk that can trade off between homophily and structural equivalence of the network
(v) Our proposed method: in our proposed UserRBPM framework, we use GraLSP to extract the structural attributes of the r-ego network, design the user node features at the spatial level and temporal level, and finally apply GCN and GAT to learn latent predictive signals
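To make the biased random walk of the Node2vec baseline concrete, the following sketch computes the unnormalized transition weights for one step of a second-order walk. The return parameter `p` and in-out parameter `q` come from the Node2vec method itself and are not specified in this paper; the graph, the parameter values, and the function names are illustrative assumptions.

```python
import random

def node2vec_weights(graph, prev, curr, p, q):
    """Unnormalized weights for moving from curr, having arrived from prev.

    graph: dict mapping each node to the set of its neighbors (undirected).
    A candidate gets weight 1/p if it is prev (walk returns), 1 if it is
    also a neighbor of prev (walk stays close), and 1/q otherwise (walk
    moves farther away from prev).
    """
    weights = {}
    for candidate in graph[curr]:
        if candidate == prev:
            weights[candidate] = 1.0 / p
        elif candidate in graph[prev]:
            weights[candidate] = 1.0
        else:
            weights[candidate] = 1.0 / q
    return weights

def node2vec_step(graph, prev, curr, p, q, rng):
    """Sample the next node in proportion to the biased weights."""
    weights = node2vec_weights(graph, prev, curr, p, q)
    candidates = sorted(weights)
    return rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]

graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
weights = node2vec_weights(graph, prev="a", curr="b", p=2.0, q=0.5)
next_node = node2vec_step(graph, "a", "b", 2.0, 0.5, random.Random(0))
```

With these toy values the walk is discouraged from returning to "a" and encouraged to move outward to "d", which is how the two parameters steer the walk between local and exploratory behavior.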

Performance Metrics.
In order to quantitatively evaluate our proposed framework, we use the following metrics to evaluate the performance of retweeting behavior prediction. Specifically, we evaluate the performance of UserRBPM in terms of Area Under Curve (AUC) [35], Precision, Recall, and F1-score.
(i) Precision: it is defined with respect to the predicted results and measures the probability that a predicted positive instance is a true positive
(ii) Recall: it is defined with respect to the original samples and measures the probability that a true positive instance is predicted as positive
(iii) Area Under ROC Curve (AUC): it measures the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one
(iv) F1-score: it is a comprehensive evaluation metric that integrates precision and recall

5.3.2. Parameter Sensitivity.
In addition, parameter sensitivity is also considered in our work. We analyzed several hyperparameters in the model and tested how different hyperparameter choices affect the prediction performance.
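The four metrics listed in this section can be computed directly from their definitions. The sketch below is a plain-Python illustration on made-up labels and scores; the function names and the 0.5 decision threshold are assumptions, and F1 is taken as the usual harmonic mean of precision and recall.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = retweet)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Note that AUC is computed from the raw scores and is therefore independent of the threshold, whereas precision, recall, and F1 change with it.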

Implementation Details.
There are two stages for training our UserRBPM framework. In the first stage, we pretrain each module of UserRBPM, and in the second stage, we integrate the three modules of UserRBPM for fine-tuning.

Stage I: Pretraining of Each Module.
For our framework, UserRBPM, we first perform a random walk with a restart probability of 0.8 and set the size of the sampled subnetwork to 30. For the embedding layer, the embedding dimension of the GraLSP model is set to 32, 64, or 128, and we train GraLSP for 1000 epochs. Then, we use a three-layer GCN or GAT network structure; the first and second GCN/GAT layers both contain 128 hidden units, while the third layer (output layer) contains 2 hidden units for binary prediction. In particular, for UserRBPM with multihead graph attention, both the first and second layers consist of K = 8 attention heads, and each attention head computes 16 hidden units (total 8 × 16 = 128 hidden units). The network is optimized by the Adam optimizer with a learning rate of 0.1, a weight decay of 5e-4, and a dropout rate of 0.2. To evaluate the model performance and prevent information leakage, we performed fivefold cross-validation on our datasets. Specifically, we select 75% of the instances for training, 12.5% for validation, and 12.5% for testing. In addition, the minibatch size is set to 1024 in our experiments.
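The subnetwork sampling step can be sketched as a random walk with restart around the ego user. This is a minimal illustration that assumes the restart probability of 0.8 is the chance of jumping back to the ego user at each step; the function name, the step cap, and the toy graph are hypothetical.

```python
import random

def rwr_sample(graph, ego, size=30, restart_prob=0.8, max_steps=10000, seed=0):
    """Collect up to `size` nodes around `ego` by random walk with restart.

    graph: dict mapping each node to the set of its neighbors.
    At every step the walk jumps back to the ego user with probability
    restart_prob; otherwise it moves to a uniformly chosen neighbor,
    which is added to the sampled set.
    """
    rng = random.Random(seed)
    sampled = {ego}
    current = ego
    for _ in range(max_steps):
        if len(sampled) >= size:
            break
        if rng.random() < restart_prob or not graph[current]:
            current = ego
        else:
            current = rng.choice(sorted(graph[current]))
            sampled.add(current)
    return sampled

# Toy graph: 40 users who all follow each other.
nodes = list(range(40))
graph = {u: {v for v in nodes if v != u} for u in nodes}
subnetwork = rwr_sample(graph, ego=0)
```

A high restart probability keeps the walk close to the ego user, so the sampled subnetwork mostly contains near neighbors, which matches the focus on local structure.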

Stage II: Global Fine-Tuning.
In the global fine-tuning stage, if the dimension of the embedding layer is set too large, the training process will be too slow, while too small a setting will hurt the performance of our model. After fine-tuning the model, we found that the model performance is relatively stable when the embedding dimension is set to 64. Then, we fix the parameters of the pretrained embedding module and train the GCN/GAT layers with the Adam optimizer for 1000 epochs, with a learning rate of 0.001. A larger learning rate makes the model learn faster and thereby accelerates convergence, but the performance of the model will be affected to some extent. Therefore, we set a relatively large learning rate at the beginning and gradually decrease it during training. Finally, we select the best model by early stopping based on the loss on the validation sets. The training process of the UserRBPM model is shown in Algorithm 1.
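The combination of a decaying learning rate and early stopping on the validation loss can be sketched as follows. The decay factor, the patience value, and the list of validation losses are illustrative assumptions standing in for a real training loop; the paper does not report these settings.

```python
def train_with_early_stopping(val_losses, base_lr=0.001, decay=0.95, patience=3):
    """Track the best epoch while decaying the learning rate.

    val_losses: validation loss observed after each epoch (a stand-in for
    actually training the GCN/GAT layers).
    Training stops once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    lr = base_lr
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        lr_history.append(lr)
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
        lr *= decay  # gradually decrease the learning rate
    return best_epoch, best_loss, lr_history

val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
best_epoch, best_loss, lr_history = train_with_early_stopping(val_losses)
```

In this toy run training stops after three epochs without improvement, so the model from epoch 2 is kept even though a later epoch would have been better; a larger patience trades extra training time against that risk.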

Experiment Results
In this section, we give the quantitative and qualitative results of retweeting behavior prediction, then analyze the structural attributes and the interaction between spatial-level features and temporal-level features. Finally, the robustness of the UserRBPM framework is verified.

6.1. Prediction Performance Analysis
6.1.1. Overall Performance Analysis.
To verify the influence of the structural attributes of users' ego network and the characteristics of user nodes (extracted from the spatial and temporal levels) on the prediction performance, as well as the interaction between features at different levels, we made the [...] 6.
Based on the four evaluation metrics used in our work, UserRBPM performs better than the abovementioned benchmark methods, which demonstrates the effectiveness of our proposed framework. From the comparison among DeepWalk + ST&HC + GAT, Node2vec + ST&HC + GAT, and UserRBPM, we observe that UserRBPM, which uses the GraLSP model in the embedding layer, significantly outperforms the first two methods, confirming that GraLSP can indeed capture local structural patterns in retweeting behavior prediction. Experiment results show that UserRBPM outperforms DeepWalk + ST + GAT by 3.76% in terms of precision and by 0.40% in terms of AUC. Moreover, its Recall and F1-score are also better. Meanwhile, from the comparison among ST&HC + LR, ST&HC + SVM, and UserRBPM, we notice that UserRBPM achieves an improvement of 13.59% in terms of precision. Such improvement verifies that the end-to-end learning framework UserRBPM can effectively detect potential driving factors and predictive signals in retweeting behavior prediction.
Comparing the first four methods (HC + LR, ST&HC + LR, HC + SVM, and ST&HC + SVM) with our proposed UserRBPM, we can see that models which take handcrafted features as input can hardly represent interaction effects, while network embedding technology and graph attention can effectively extract high-dimensional structural attributes and express highly nonlinear interaction mechanisms. Furthermore, from the comparison between HC + LR and ST&HC + LR (and likewise between HC + SVM and ST&HC + SVM), Figure 5 shows that ST&HC + LR is notably better than HC + LR for retweeting behavior prediction. This reveals that users' spatial-level and temporal-level features are potential driving factors of retweeting behavior in social networks. Additionally, we observe that ST&HC + LR performs 4.42% better than HC + LR in terms of precision, verifying that the spatial-level and temporal-level features we designed improve the prediction performance to a certain extent.
Taken together, from the individual parts to the whole framework, the spatial-level and temporal-level features we designed improve prediction performance, and combining structural attributes with node features improves it further. This demonstrates the effectiveness of our UserRBPM framework for retweeting behavior prediction.

Prediction Performance of Different Sampling Strategies.
When dealing with data imbalance, we apply downsampling based on microblog granularity instead of completely random downsampling. This sampling method ensures that the number of positive and negative samples covered by the same microblog is the same and that the number of microblogs covered by negative samples is sufficient.
We use three sampling methods to obtain different training models. Among them, the direct sampling method (DSM) means that we directly extract relatively balanced samples from the original positive and negative samples, with the ratio between positive and negative samples set to 1 : 3. In completely random downsampling (CRDM) and our downsampling method (ODM), the numbers of positive and negative samples are the same. Experiment results are illustrated in Figure 6. Compared to the completely random downsampling method, the model trained with the samples obtained by our downsampling method has better prediction performance. The better a model trained on a dataset predicts, the more representative that dataset is and the stronger the generalization ability of the learned model. In the original imbalanced datasets, directly extracting positive and negative samples with a ratio of 1 : 3 is simple, but it ignores the difference in the number of microblogs covered by the positive and negative samples. Therefore, the downsampling method based on microblog granularity is more suitable for the user retweeting behavior prediction problem that we study.

Comparative Analysis of Graph Convolution and Graph Attention.
Table 7 reports the prediction performance of two variants of graph deep learning, that is, the experimental results of models built with the graph convolutional network (GCN) and the graph attention mechanism (GAT), respectively. Previous research has shown the success of GCN in classification tasks. However, in the application scenarios of our work, we observe that GCN generally performs worse than GAT across models constructed with different graph embedding technologies. We attribute this disadvantage to the homophily assumption of GCN, that is, connected vertices tend to be similar (for example, have the same label). Under this assumption, for a specific vertex, GCN computes its hidden representation by taking an unweighted average over its neighbors' representations. Such homophily exists in many real networks, but in our research scenario, different neighbor nodes may have different importance. Therefore, the graph attention mechanism (GAT) is introduced to assign different weights to different neighboring nodes. In essence, GAT is an aggregation function that focuses on the differences between neighbor nodes, rather than simple mean aggregation.
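The contrast between mean aggregation and attention-weighted aggregation can be shown on a toy neighborhood. The sketch below follows the description in the text (an unweighted average versus a softmax-weighted sum); the attention scores are supplied by hand here, whereas in GAT they are produced by a learned scoring function, and all names and values are illustrative assumptions.

```python
import math

def mean_aggregate(features, neighbors):
    """Unweighted average of neighbor representations (GCN-style, as described in the text)."""
    dim = len(features[neighbors[0]])
    return [sum(features[n][d] for n in neighbors) / len(neighbors) for d in range(dim)]

def attention_aggregate(features, neighbors, scores):
    """Softmax the raw scores into attention weights, then take a weighted sum."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(features[neighbors[0]])
    aggregated = [
        sum(a * features[n][d] for a, n in zip(alphas, neighbors)) for d in range(dim)
    ]
    return aggregated, alphas

features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
neighbors = ["u1", "u2", "u3"]
mean_repr = mean_aggregate(features, neighbors)
uniform_repr, uniform_alphas = attention_aggregate(features, neighbors, [0.0, 0.0, 0.0])
att_repr, alphas = attention_aggregate(features, neighbors, [2.0, 0.0, 0.0])
```

With equal scores the attention result collapses to the plain mean, while a higher score for "u1" pulls the aggregated representation toward that neighbor, which is the extra flexibility the text attributes to GAT.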
Besides, we wanted to avoid using handcrafted features and make UserRBPM a pure end-to-end learning framework, so we compared the prediction performance with and without additional vertex features. The comparison results are presented in Table 8. We observe that UserRBPM_GCN with handcrafted vertex features achieved an improvement of 2.18% in terms of precision, 1.46% in terms of recall, 0.44% in terms of F1-score, and [...]

Instance Normalization (IN) helps to improve the classification performance. For instance, the model can learn faster while maintaining or even increasing accuracy. Moreover, it also partially serves as a parameter tuning method. Therefore, we applied IN and obtained a boost in both performance and generalization.

Conclusion
In this work, we focus on user-level social influence in social networks and formulate the problem of user retweet behavior prediction from a deep learning perspective. Unlike previous work that built prediction models of retweet behavior based on network topology maps of information dissemination or on feature engineering, we proposed the UserRBPM framework to predict the action status of a user given the action statuses of her near neighbors and her local structural information. Experiments on a large-scale real-world dataset have shown that UserRBPM significantly outperforms baselines with handcrafted features in user retweet behavior prediction. This work explores the potential driving factors and predictable signals in user retweet behavior, in the hope that a deep learning framework offers better expressive ability and prediction performance. For future research, the experimental datasets in this field still contain rich social dynamics that are worth further exploration. We can study user behavior in a semisupervised manner, develop a generic solution based on heterogeneous graph learning, and then extend it to many network mining tasks, such as link prediction, social recommendation, and similarity search. Through such a learning scheme, we can leverage both unsupervised information and limited user labels to build the predictor and verify the effectiveness and rationality of user behavior analysis on real-world datasets.

Figure 1: A motivating example of user retweet behavior prediction.

Figure 5: Prediction performance of different features.

Figure 6: Prediction performance of different sampling strategies.

Figure 7: Prediction performance with different training and test data size.

[...] present the existing representative methods and evaluation metrics. Finally, we introduce the implementation details of the UserRBPM framework.

5.1. Dataset Presentation and Processing
5.1.1. The Presentation of Raw Datasets.
We use real-world datasets to quantitatively and qualitatively evaluate the proposed UserRBPM framework. We used the Weibo dataset from the work of Zhang et al. and Qiu et al. [18, 49], which Ulyanov et al. [48] also used in their work, and then we performed data preprocessing according to our research question. The microblogging network used in our research was crawled from Sina Weibo, which, similar to Twitter, is a broadcast-style social network platform that shares brief real-time information through the follow mechanism. The follow mechanism of Weibo is divided into one-way following and mutual following. Particularly, when user u_1 follows user u_2, u_2's activities (such as tweet and retweet) will be visible to u_1. User u_1 can choose to tweet or retweet user u_2. User u_1 is called the follower of user u_2,

Table 1: Statistics of raw data.

Table 3: Data statistics of the balanced sample set.

Table 4: The statistics of social roles and relation statuses.

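Table 5: List of features used in this work.
Temporal-level features: the TF-IDF similarity between an ego user and its followees' post content within a month (similarity); the number of microblogs posted by the followees within a month (exposure); the retweet rate of ego users to their followees (retweet rate)
Handcrafted ego network features [49]: the number/ratio of active neighbors; density of the subnetwork induced by active neighbors; connected components formed by active neighbors

Algorithm 1 (training process of the UserRBPM model).
Input: Datasets (train_loader, valid_loader, test_loader), Learning rate, Weight decay, Epochs, Batch size.
Output: The predictive value of the test samples and the prediction performance of the model.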
In addition, we consider parameter sensitivity in our work. We analyzed several hyperparameters in the model and tested how different hyperparameter choices affect the prediction performance.

6.2.1. Robustness Analysis.
To verify the robustness of the UserRBPM framework, we changed the proportions of the training set, validation set, and test set and then reran the experiments. The results in Figure 7 show that the model is effective under limited training data size. Even with a small training set (20%-40%), our model still achieves acceptable and steady performance.

6.2.2. Effect of Instance Normalization.
As mentioned in Section 4, this paper studied a technique used to accelerate model learning called Instance Normalization (IN).
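As a rough illustration of what Instance Normalization does, the sketch below normalizes one sampled ego network at a time. It assumes that each instance is a list of node feature vectors and that every feature dimension is standardized across the nodes of that instance; the learnable scale and shift parameters that IN layers usually carry are omitted, and the function name and toy values are hypothetical.

```python
import math

def instance_norm(instance, eps=1e-5):
    """Standardize each feature dimension across the nodes of one instance.

    instance: list of node feature vectors (n nodes, d features each).
    Returns a new list of vectors with per-feature zero mean and
    (approximately) unit variance within this instance.
    """
    n = len(instance)
    d = len(instance[0])
    means = [sum(row[j] for row in instance) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in instance) / n for j in range(d)]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
        for row in instance
    ]

# Toy ego network with three nodes and two features on very different scales.
ego_network = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
normalized = instance_norm(ego_network)
column_means = [sum(row[j] for row in normalized) / len(normalized) for j in range(2)]
```

Because the statistics are computed per instance rather than per minibatch, features on very different scales are brought into a comparable range for each ego network independently.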

Table 8: The effect of handcrafted vertex features on prediction performance (%).

[...] IN layer takes about 1892 seconds per epoch. The model with the IN layer increased the training time per epoch by about 87% compared to the model without the IN layer. Yet, we believe it is worthwhile to apply IN, as the additional training time is compensated by faster learning (fewer epochs are required to reach the same level of precision) and the model can ultimately achieve higher testing precision.