Heterogeneous Graph Convolutional Network-Based Dynamic Rumor Detection on Social Media

The development of social media has provided open and convenient platforms for people to express their opinions, which also allows rumors to circulate. Detecting rumors in massive volumes of information has therefore become essential. Previous methods for rumor detection focused on mining features from content and propagation patterns but neglected the dynamic features of joint content and propagation patterns. In this paper, we propose a novel heterogeneous GCN-based method for dynamic rumor detection (HDGCN), mainly composed of a joint content and propagation module and an ODE-based dynamic module. The joint content and propagation module constructs a content-propagation heterogeneous graph to obtain rumor representations by mining the interaction between post content and propagation structures in the rumor propagation process. The ODE-based dynamic module leverages a GCN integrated with an ordinary differential equation system to explore dynamic features of heterogeneous graphs. To evaluate the performance of our proposed HDGCN model, we have conducted extensive experiments on two real-world datasets from Twitter. The results show that our proposed model outperforms mainstream models.


Introduction
Recent years have witnessed the prevalence of mobile Internet technology and social networks, which provide a convenient platform for users to obtain a large amount of information and express their opinions. As more people get involved in hot topics and exchange their views on social media, rumors circulate as well. Meanwhile, rumors can be manipulated to spread widely and mislead users for certain purposes [1,2], which can damage the reputations of individuals and have a detrimental effect on society [3][4][5]. For example, during the COVID-19 outbreak, the world has been facing a large number of health-related rumors about coronavirus. One typical rumor is that a traditional Chinese medicine called Shuanghuanglian can prevent the novel coronavirus, which caused many citizens to queue overnight to buy it [6]. The presence of such rumors on social media seriously affected social stability. Therefore, it is urgent to come up with automatic detection methods to identify rumors accurately and in a timely manner.
Most of the early detection methods adopt handcrafted feature engineering to extract features from user profiles [7,8], content [9][10][11][12], and propagation patterns [13][14][15] to train supervised classifiers. Corresponding methods are used for training, such as Decision Tree [16], Random Forest [8], and Support Vector Machine [7]. However, these methods that rely on manual feature engineering consume both time and workforce. Moreover, those methods cannot capture the higher-dimensional feature representations from rumor propagation.
Recently, deep learning has been successfully applied in computer vision, natural language processing, and recommender systems. Given its ability to automatically extract low- and high-order feature representations, many studies have exploited deep learning methods in rumor detection. Deep learning models such as long short-term memory (LSTM) and the gated recurrent unit (GRU) [17] have been applied to capture sequential features of rumor propagation over time. This was the first work to introduce deep learning to obtain temporal representations of rumor propagation. Then, Ma et al. [18] designed a tree-based recursive neural network (RvNN) to learn the semantic sequence and propagation structure from two directions for rumor detection. These approaches focus on sequential propagation but neglect dispersion. Bruna et al. [19] pointed out that a Convolutional Neural Network (CNN) could obtain the relevant features of local neighbors. Therefore, CNN-based methods [20,21] were put forward to mine the dispersion structure of rumor propagation, obtaining local correlation feature representations from neighboring retweets. However, these methods cannot deal with the global structure of rumor propagation. In order to capture global structure feature representations during propagation, the graph convolutional network (GCN) was proposed [22]. Since GCN is widely used in complex networks such as social networks and physical systems, Wu et al. [15] proposed an undirected GCN model to obtain global structural features of rumor propagation. Still, they do not take the direction of propagation into account. Therefore, Bian et al. [23] leveraged a GCN with a bidirectional graph network to learn the global features of the patterns and diffusion of rumor propagation. The methods mentioned above only take propagation structure into consideration and ignore rumor content features, which are also vital for rumor detection [24,25].
Since the methods above do not take heterogeneous information into consideration, a heterogeneous network-based method [26] was proposed to tackle complex networks, and it performs well on link prediction. Huang et al. [27] proposed a heterogeneous method that constructs a tweet-word-user heterogeneous graph to make the most of content and propagation information. Besides, rumor propagation is a continuous-time dynamic system [28]. Recent studies on rumor detection do not pay attention to dynamic changes during rumor propagation. Nevertheless, it is essential to delve into dynamic changes of the rumor propagation structure for better performance. Shang et al. [29] found limitations of the static graph in analyzing social networks and explored the interaction evolution of the dynamic network. To date, some studies have focused on using ordinary differential equations to capture the continuous-time dynamics of complex networks [30][31][32] in other fields, such as fluid mechanics [33,34]. Yildiz [35] proposed a method based on ordinary differential equations and an autoencoder to learn high-dimensional complex continuous-time latent dynamics. Choi et al. [36] leveraged neural ordinary differential equations (NODEs) to learn user and product coevolving embeddings over time. NODEs treat the time variable t as continuous, which makes it possible to obtain the continuous-time dynamics [37]. The results of the above NODE-based methods demonstrate better performance than methods using only neural networks. This is because NODEs combine the high-capacity function approximation of deep learning with the high-efficiency dynamic acquisition of differential equations, which offers a best-of-both-worlds approach to obtaining dynamic features. The typical architecture of NODEs is shown in Figure 1.
To deal with both heterogeneous information and dynamic features in rumor detection, in this paper, we propose a novel heterogeneous graph convolutional network-based method for dynamic rumor detection (HDGCN), which integrates an ordinary differential equation system with GCNs to capture the dynamic evolution of the heterogeneous graph. We first construct a heterogeneous graph that combines content and propagation structures to learn rumor representations, as shown in Figure 2. Then, we design an ODE-based graph convolutional network to learn dynamic representations of nodes during the propagation process. Finally, we introduce a mean-pooling layer and a fully connected layer to predict rumor classes.
Compared with existing methods, the proposed approach HDGCN is capable of (1) exploiting rumor representation with joint content and propagation pattern; (2) precisely modeling continuous-time dynamic relevance by an ordinary differential equation system to obtain the instantaneous rate of change in the heterogeneous graph; (3) effectively capturing and leveraging propagation trends and content for rumor classification. To summarize, the main contributions of this work are as follows:
(1) This novel method constructs the joint post content and rumor propagation structure as a heterogeneous graph to obtain rumor representation, containing textual information and initial propagation information.
(2) We explore a novel ODE-based graph convolutional network method that integrates an ordinary differential equation system and GCNs to acquire dynamic changes of the heterogeneous graph. Moreover, this is the first work to leverage differential equations to obtain the dynamic evolution of a heterogeneous graph for rumor detection.
(3) We perform extensive experiments on the real-world Twitter15 and Twitter16 datasets. The results demonstrate that our proposed method HDGCN is more effective than mainstream methods.
The rest of the paper is organized as follows: in Section 2, related works are reviewed; in Section 3, details of our proposed model HDGCN are introduced; we present the experimental results and discussions in Section 4; finally, Section 5 is the conclusion.

Related Work
In recent years, automatic rumor detection on social media has attracted considerable attention. Previous works for rumor detection on social media have focused on handcrafted feature engineering, extracting rumor features from text content, user profiles, and propagation structures to design a classifier. Castillo et al. [16] pioneered feature engineering and classified rumors using classifiers such as Support Vector Machine, which became the starting point for many later works. Yang et al. [7] formulated rumor classification by extending emotional features to enhance the strength of rumor credibility assessment. Dayani et al. [38] proposed a method for obtaining user features and content features by K-Nearest Neighbor and Naive Bayes classifiers, respectively. Wu et al. [15] modeled the propagation structures by using the random walk method to extract features of the propagation process. Ma et al. [14] constructed a propagation tree kernel to detect rumors by evaluating similarities among the structures of the propagation tree. However, these methods all rely on manual feature engineering to obtain valid information, which is inefficient and cannot extract high-dimensional feature representations.
In order to automatically learn high-dimensional features, many rumor detection methods based on deep learning have been proposed. For propagation structures, Ma et al. [17] leveraged recurrent neural networks (RNN) to capture hidden feature representations of the variation of tweet content over time. Liu et al. [39] proposed a time-series classifier that incorporates both CNN and RNN, capturing the global and local variations of content features and user features, respectively. Ma et al. [18] designed a tree-based recursive neural network (RvNN) to learn the semantic sequence and propagation structure from two directions for rumor detection. Guo et al. [40] proposed a hierarchical bidirectional long short-term memory model combined with an attention mechanism for learning representations.
For content preprocessing, Varshney et al. [41] developed an automated system to distinguish fake and real news from content resemblance features. Kaliyar et al. [42] proposed the FakeBERT model based on TextCNN [24,25] and BERT [43] to learn news textual information for detection. Apart from text content, Meel et al. [44] proposed a multimodal fake news detection framework to analyze multimodal data, such as textual information and image features. Besides, Varshney et al. [45] also proposed a scheme to extract text information from the videos to classify fake videos with user profiling and human consensus features.
However, the deep learning models mentioned above cannot process the graph-structured data of rumor propagation, whereas graph neural networks lend themselves to learning graph features [22]. Wu et al. [15] proposed a GCN model to obtain global structural features of rumor propagation. Malhotra et al. [46] leveraged GCN to classify the constructed propagation graph and used RoBERTa-based word embedding [47] to understand linguistic cues. Additionally, Dou et al. [48] proposed a method based on BERT [43] and GCN, which captures various signals from user preferences. These methods, however, are inefficient in learning the dynamics of rumor propagation.
Compared to the GCN models mentioned above, neural ODEs (NODEs) help capture dynamic features within a neural network architecture. NODEs learn the derivative of the hidden state with neural networks and turn discrete hidden layers into continuous-depth networks and continuous-time latent variable models for supervised learning [49]. Building on neural ODEs, Ma et al. [50] leveraged entity recognition, sentence reconfiguration, and an ODE network to construct a rumor detection framework. This was the first work to introduce neural ODEs into the rumor detection field, and it achieved promising performance. Zang et al. [28] combined an ODE and GNNs to learn and predict continuous-time dynamics on complex networks. Yildiz [35] proposed a latent second-order ODE method integrated with an autoencoder to learn high-dimensional trajectories and complex continuous-time latent dynamics. In the recommendation field, Choi et al. [36] extended the linear GCN method with neural ordinary differential equations (NODEs) to learn user and item coevolving embeddings over time. Besides, Wang et al. [51] proposed a temporal influence kernel function based on the derivative and a decaying exponential function to obtain the dynamics of users' preferences from sequential records, which derives from the idea of differential equations.

Preliminaries.
We now introduce a formal description of our problem setting and the preliminary knowledge needed to understand our work. The notations and symbols used are summarized in Table 1.
Let $X = \{x_1, x_2, \ldots, x_{|X|}\}$ denote the set of rumor events, where each $x_i$ corresponds to the $i$-th event and $|X|$ is the number of all events. Each event comprises two independent sets, namely, rumor content and propagation structure. So the event $x_i$ can be denoted as $x_i = \{P_i, G_i\}$, where $P_i$ and $G_i$ represent the content and propagation structure of the $i$-th event. The $i$-th event propagation structure is denoted as $G_i = \langle C_i, E_i \rangle$, where $C_i = \{c_{i0}, c_{i1}, \ldots, c_{iN}\}$ represents the nodes in the propagation structure of event $x_i$; $c_{i0}$ is the original post, and $c_{ij}$ is the $j$-th responsive post during propagation. $E_i = \{e_{st} \mid s, t = 0, 1, 2, \ldots, N\}$ denotes the set of all links.
Then, according to the nodes and edges, $A \in \{0, 1\}^{N \times N}$ is defined as the adjacency matrix. Specifically, $A_{st} = 1$ means that the edge from node $c_{is}$ to $c_{it}$ exists. Then, combining content and propagation structure, we set $A'$ as the adjacency matrix of the heterogeneous graph, which is detailed in the next section.
Moreover, we formulate this task as a supervised classification problem. Each event is associated with a ground-truth label $y_i$, and a classifier $f$ is learned from the ground-truth labels; that is, $f: x_i \rightarrow y_i$, where $y_i$ is one of the four classes {NR, FR, TR, UR}, representing nonrumor, false rumor, true rumor, and unverified rumor, respectively. Our goal is to train a model $f(\cdot)$ to predict the label of a given post and its propagation.
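As a small illustration of the notation above, the following Python sketch builds a {0,1} adjacency matrix from a list of directed reply or retweet edges. The helper name `build_adjacency` and the toy event are hypothetical and are not taken from the paper.

```python
def build_adjacency(num_nodes, edges):
    """Return a {0,1} matrix A with A[s][t] = 1 when an edge s -> t exists."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for s, t in edges:
        A[s][t] = 1
    return A


# Hypothetical event: node 0 is the original post, nodes 1-3 are responsive posts.
edges = [(0, 1), (0, 2), (2, 3)]
A = build_adjacency(4, edges)
```

Here the matrix is directed, so `A[0][1]` is set while `A[1][0]` is not; an undirected variant would set both entries.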

Model Overview.
The basic idea of HDGCN is to learn dynamic changes from joint post content and rumor propagation structure for accurate rumor classification. In this section, we present the construction of our proposed framework for rumor detection based on the continuous-time dynamics of the rumor heterogeneous graph.
As shown in Figure 3, the proposed rumor detection model consists of three main components: rumor representation capturing, dynamic propagation modeling, and rumor classification. Firstly, HDGCN encodes rumor representations by constructing a joint content and propagation structure graph. In order to build the heterogeneous graph, we learn rumor content representations via TF-IDF [52], encode propagation node vectors via adjacency matrix, combine content and propagation information together, and then embed the union graph into a low-dimensional space. Secondly, we use an undirected GCN to learn the initial status of the heterogeneous graph. Meanwhile, we devise an ordinary differential equation system with GCNs to obtain continuous-time dynamics of the heterogeneous graph. Finally, a mean-pooling layer is used to concatenate the hidden dynamic features and predict rumor classification. Next, we will elaborate on the details of each component.

Rumor Representations Capturing.
We build a joint rumor content and propagation structure graph, named the content-propagation heterogeneous graph, to encode rumor representation. The heterogeneous graph combines rumor content information (the weights of edges) and rumor propagation structures, as shown in Figure 2.
To obtain the rumor content representation for each post, we adopt the TF-IDF method, which is used for text mining based on word frequency statistics and inverse document frequency. Firstly, we filter the stop words and construct the corpus. Then, we formulate the term frequency as $tf_{ij} = n_{ij} / \sum_k n_{kj}$, where $n_{ij}$ is the number of times the $i$-th word appears in post $j$ and the denominator is the total number of word occurrences in post $j$. The inverse document frequency is formulated as $idf_i = \log\left(|D| / (1 + |\{k : t_i \in p_k\}|)\right)$, where $|D|$ denotes the total number of posts in the corpus and $|\{k : t_i \in p_k\}|$ is the number of posts containing word $t_i$. The weight of word $t_i$ is computed as $t_i = tf_{ij} \cdot idf_i$. So the content representation of post $j$ can be denoted as $p_j = [t_1, t_2, \ldots, t_{|W|}]$, where $|W|$ is the number of words in the corpus vocabulary.
Then, because the representation $p_j$ is high-dimensional and sparse, we employ an embedding layer to map it into a low-dimensional space and acquire a dense real-valued vector. Formally, the dense vector $v_j$ of each rumor content is produced by the embedding layer from $p_j$, where $W_j$ denotes the weight of the embedding layer and $v_j$ is considered as the content representation of post $j$.
Next, we construct the rumor propagation structure $G_j = \langle C_j, E_j, A \rangle$ based on the spreading path of retweets and responses. Since the adjacency matrix $A \in \{0, 1\}^{N \times N}$ represents rumor spreading, we obtain the adjacency matrix $A'$ of the joint rumor content and propagation graph from $A$ and $v_j$, where $A_{st}$ is the value of the edge from node $c_{js}$ to $c_{jt}$. Then, we obtain the initial rumor representation $r_j$ from the joint graph, where $M$ is the number of edges in the propagation of post $j$.
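The TF-IDF weighting described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tfidf_vectors` and the toy posts are hypothetical, term frequency is normalized by the total number of word occurrences in the post, and the idf uses the smoothed form log(|D| / (1 + df)).

```python
import math
from collections import Counter


def tfidf_vectors(posts):
    """posts: list of token lists, with stop words already removed."""
    vocab = sorted({w for p in posts for w in p})
    n_docs = len(posts)
    # Document frequency: number of posts that contain each word.
    df = Counter(w for p in posts for w in set(p))
    idf = {w: math.log(n_docs / (1 + df[w])) for w in vocab}
    vectors = []
    for p in posts:
        counts = Counter(p)
        total = sum(counts.values())
        # tf = count of the word in this post / total word occurrences in the post.
        vectors.append([(counts[w] / total) * idf[w] if total else 0.0 for w in vocab])
    return vocab, vectors


# Hypothetical toy corpus of four tokenized posts.
posts = [["vaccine", "rumor", "rumor"], ["vaccine", "news"], ["weather", "news"], ["sports"]]
vocab, vectors = tfidf_vectors(posts)
```

Each row of `vectors` is a sparse, vocabulary-sized representation of one post. Note that with the "+1" smoothing, a word that appears in nearly every post receives a zero or negative weight.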

Initial State Acquiring.
After obtaining the rumor representation, we acquire the initial state of the heterogeneous graph with a GCN. We first build the graph convolution operator from $A'$ and $I$, the adjacency matrix and identity matrix of the heterogeneous graph, respectively, and from $D$, the corresponding degree matrix. Then, we obtain the initial state representation of the union graph by applying a GCN layer to the rumor representation, where $H_0$ represents the initial state of rumor propagation and $W_0$ is the filter weight. We adopt the ReLU function as the activation function $\sigma(\cdot)$. Dropout is applied in the GCN layer to avoid overfitting.
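To make the initial-state step concrete, the sketch below applies one graph convolution layer in plain Python. It assumes the widely used symmetric normalization D^{-1/2}(A' + I)D^{-1/2}, with D taken as the degree matrix of A' + I; the text names the ingredients (A', I, D) but the exact operator is an assumption here. Dropout is omitted, and all names and toy values are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def normalized_operator(A):
    """Assumed operator: D^{-1/2} (A + I) D^{-1/2}, D = degree matrix of A + I."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(op, X, W):
    """One GCN layer: ReLU(op @ X @ W)."""
    Z = matmul(matmul(op, X), W)
    return [[max(0.0, z) for z in row] for row in Z]


# Toy undirected graph with 3 nodes, 2 input features, and 2 hidden units.
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[0.5, -1.0], [0.25, 1.0]]
op = normalized_operator(A)
H0 = gcn_layer(op, X, W0)
```

In the paper's setting the input features would be the rumor representations and `A` would be the heterogeneous adjacency matrix; the result `H0` then plays the role of the initial state for the dynamic module.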

ODE-Based Dynamic Propagation Modeling.
The propagation of rumor events can be seen as a continuous-time dynamic complex network. To obtain the dynamic features in the joint rumor content and propagation graph, we combine the ordinary differential equation system and graph convolutional networks to learn the continuous-time dependency of the propagation process. Firstly, we describe the dynamic system of the propagation by a differential equation, formulated as [53] $dH(t)/dt = f(H, G, W, t)$, where $H(t)$ represents the state of the propagation network at time $t$, with $t \in [0, \infty)$. $G$ denotes the heterogeneous graph, capturing how nodes interact with each other. $W$ denotes the parameters, that is, the weights describing how the process evolves over time. $dH(t)/dt$ is the derivative with respect to time, denoting the instantaneous rate of change of the rumor propagation dynamic system. $H(0) = H_0$ is the initial state of the propagation process at time $t = 0$.
The nonlinear mapping function $f(H, G, W, t)$, capturing the dynamic changes, is modeled by a GCN, which is defined as $f(H, G, W, t) = \sigma(\Phi H W_1 + b)$, where $\Phi$ is the operator of the graph convolutional network, $W_1$ denotes the parameters of the GCN, $b$ is the bias, and $\sigma(\cdot)$ is the ReLU activation function. After getting the initial state, we encode the graph network node state $H_t$ from the original hidden state $H_0$ and obtain a continuous-time dynamic state at an arbitrary time by integrating $dH(t)/dt$ over time $t$, which is defined as $H_t = H_0 + \int_0^t f(H, G, W, \tau)\, d\tau$, where $H_t$ denotes the current state at an arbitrary time $t$ in rumor propagation. Rumor propagation is a time-varying dynamic system in the differential system when the coefficient $W$ changes over time. We can obtain the hidden state at any time by setting the time $t$.
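The integration step can be illustrated with a deliberately simple solver. The sketch below uses fixed-step forward Euler in plain Python, which is a swapped-in simplification: the experiments in this paper use the adaptive DOPRI5 solver instead. The derivative network is taken to be ReLU(Φ H W + b), and the operator `phi`, weights, and initial state are hypothetical toy values.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def f(H, phi, W, b):
    """Derivative network: ReLU(phi @ H @ W + b)."""
    Z = matmul(matmul(phi, H), W)
    return [[max(0.0, z + b_j) for z, b_j in zip(row, b)] for row in Z]


def integrate(H0, phi, W, b, t_end, steps):
    """Forward Euler approximation of H(t_end) = H0 + integral of f over [0, t_end]."""
    dt = t_end / steps
    H = [row[:] for row in H0]
    for _ in range(steps):
        dH = f(H, phi, W, b)
        H = [[h + dt * d for h, d in zip(h_row, d_row)] for h_row, d_row in zip(H, dH)]
    return H


# Toy 2-node graph with 2 hidden units; terminal time T = 2 as in the experiments.
phi = [[0.5, 0.5], [0.5, 0.5]]
H0 = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
H_T = integrate(H0, phi, W, b, t_end=2.0, steps=200)
```

Calling `integrate` with several increasing values of `t_end` yields the sequence of snapshots that the classification stage aggregates. An adaptive solver would choose its own step sizes subject to an error tolerance rather than a fixed `steps` count.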

Rumor Classification.
With the dynamic hidden representation of the rumor, we employ a mean-pooling operator to aggregate the hidden states produced by the ordinary differential equation system at each time point, averaging over the $T$ time points $t \in \{t_1, t_2, \ldots, t_T\}$ of the ODE solver. Then, we apply a fully connected layer and a softmax layer to the pooled representation for rumor classification, where $W_{FC}$ and $b_{FC}$ are the weight and bias of the last hidden layer. The output $\hat{y}$ denotes a vector of probabilities for all classes used to predict the label of the rumor event. We train our model by minimizing the cross-entropy between the predictions $\hat{y}$ and the ground-truth distributions $y$. An $L_2$ regularizer is applied in the loss function to prevent overfitting.
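The classification head can be sketched as follows, under the assumption that each ODE snapshot has already been reduced to one feature vector per event; how node states are reduced within a snapshot is not specified here. The snapshots are averaged over time, passed through a linear layer, and normalized with softmax. All weights and values are hypothetical, and the L2 term is omitted.

```python
import math


def mean_pool(snapshots):
    """Average a list of equally sized feature vectors over the time points."""
    T = len(snapshots)
    return [sum(vals) / T for vals in zip(*snapshots)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def classify(snapshots, W_fc, b_fc):
    """softmax(W_fc @ mean_pool(snapshots) + b_fc)."""
    pooled = mean_pool(snapshots)
    logits = [sum(w * x for w, x in zip(row, pooled)) + bias for row, bias in zip(W_fc, b_fc)]
    return softmax(logits)


def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class index."""
    return -math.log(probs[label])


CLASSES = ["NR", "FR", "TR", "UR"]
snapshots = [[0.2, 0.4], [0.4, 0.6], [0.6, 0.8]]  # T = 3 toy snapshots
W_fc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_fc = [0.0, 0.0, 0.0, 0.0]
probs = classify(snapshots, W_fc, b_fc)
predicted = CLASSES[probs.index(max(probs))]
loss = cross_entropy(probs, CLASSES.index("FR"))
```

The predicted class is the index of the largest probability, and the loss is small when that probability mass sits on the ground-truth class.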

Experiments
In this section, we conduct the experiment on two real-world datasets to evaluate the effectiveness of our proposed HDGCN method in comparison with several baseline models.

Data Description.
For experimental evaluation, we use two public datasets collected from real-world social media to verify our proposed method. They are Twitter15 and Twitter16 [14], which contain 1490 and 818 propagation graphs, respectively, and are annotated with four classes. In each graph, nodes represent users, and edges refer to replies and retweets. Each graph is labeled as one of the four classes, that is, nonrumor, true rumor, false rumor, and unverified rumor. The statistics of the two datasets are given in Table 2.

Baselines.
We compare our proposed model HDGCN with the following baselines on rumor detection:
DTC [16]: this is a method using a Decision Tree classifier based on manually engineered features to obtain the tweet information.
RFC [54]: this is a method using a Random Forest classifier that uses features related to user, linguistic, and structure characteristics of news for rumor detection.
SVM-TS [55]: this is a method based on an SVM classifier with a constructed time-series kernel of handcrafted features.
SVM-HK [15]: this is a rumor detection method with an SVM classifier using a graph kernel based on the similarity of propagation structures.
GRU-RNN [17]: this is a rumor detection method based on recurrent neural networks with GRU units to capture the variation of contextual information of relevant tweet posts over time.
BU-RvNN and TD-RvNN [14]: these are rumor detection methods that adopt tree-structured recursive neural models, based on a bottom-up and a top-down tree neural network with GRU units, respectively, to learn rumor information.
Rumor2vec [56]: this is a rumor detection method that adopts a CNN-based model to combine textual content with propagation structures, learning a joint representation for rumor detection.
ESODE [50]: this is a rumor detection method that integrates semantic information and an ordinary differential equation network to obtain statistical features for rumor detection.

Experiment Setup.
We implement all deep learning models using PyTorch and the GCN models with the PyTorch Geometric package. We use TF-IDF [52] with 5000 words for the content representation of the posts. Embedding sizes of nodes and hidden layers are searched in [64, 128, 256, 512]. We choose 128 as our model embedding size, and the batch size is also set to 128. The categorical cross-entropy is chosen as the loss function. The maximal tolerance of the ODE solver is 0.01, and the time intervals are set to 5 snapshots $\{H(t_1), H(t_2), \ldots, H(t_5) \mid 0 \le t_1 < t_2 < \cdots < t_5 \le T\}$ in $[0, T]$ of the continuous-time dynamic propagation graph. We use the DOPRI5 [57] method as the ODE solver to calculate the differential equation in our HDGCN model. We grid search the best terminal time $T \in [1.5, 2.5]$ and set our terminal time to $T = 2$. To make a fair comparison, we split the datasets into five parts and conduct 5-fold cross-validation to obtain robust results. For the Twitter15 and Twitter16 datasets, we evaluate accuracy over the four categories and $F_1$ on each class. Besides, we optimize the model with the Adam algorithm. The dropout rate is 0.5, and the learning rate is 0.005. Training runs for up to 100 epochs. To prevent overfitting, we apply early stopping with a patience of 10 epochs on the validation set.
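One common reading of the early-stopping rule in this setup is patience-based stopping on the validation loss: training halts once the loss has not improved for 10 consecutive epochs, within a budget of 100 epochs. The sketch below encodes that reading as an assumption; the function name and the loss curve are hypothetical.

```python
def early_stopping_epoch(val_losses, max_epochs=100, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    return min(len(val_losses), max_epochs) - 1, best_epoch


# Hypothetical curve: improves for three epochs, then plateaus at a worse value.
val_losses = [1.0, 0.8, 0.7] + [0.9] * 20
stop_epoch, best_epoch = early_stopping_epoch(val_losses)
```

In practice the model weights from `best_epoch` would be restored before evaluating on the held-out fold.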

Performance Comparison.
In this section, the result of our proposed method HDGCN is compared with the baselines on Twitter15 and Twitter16 datasets. From Table 3, we can draw the following observations.
Firstly, the baselines based on deep learning methods, such as GRU-RNN, BU-RvNN, and Rumor2vec, as well as our model, outperform the methods using traditional handcrafted features, such as DTC, RFC, SVM-TS, and SVM-HK. This is because deep learning methods can automatically learn effective and high-dimensional feature representations of rumors, which demonstrates that deep learning improves the effectiveness of rumor detection. Secondly, our proposed method HDGCN achieves the best accuracy among all baselines on the two datasets. Our model achieves an accuracy of 83.4% and 86.5%, respectively, increasing by 4.8% and 1.6% compared with the latest baseline, Rumor2vec. The comparison between Rumor2vec and our model demonstrates the effectiveness of incorporating the dynamic features into the model for rumor detection. Since CNN cannot process data with dynamic features, Rumor2vec ignores the importance of dynamic changes during rumor propagation, which leads to lower accuracy for rumor detection. Additionally, only for the nonrumor class is the performance of Rumor2vec better than that of our proposed model. Because nonrumors are usually released or responded to by users who have higher authority and credibility, the paths and content of nonrumor propagation are more likely to be fixed. So the dynamics of nonrumor propagation do not significantly affect the performance.
Finally, our method HDGCN is superior to the ESODE method, with a 1.2% and 1.6% improvement on the two datasets. Rumor propagation is naturally represented as a graph structure. ESODE only uses NODEs to get dynamic features of propagation and cannot process graph-structured data, which prevents it from obtaining effective propagation features and results in worse performance for rumor detection.
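For reference, the two metrics used in this comparison, overall accuracy and per-class F1, can be computed as in the following sketch. The labels below are hypothetical toy data and do not correspond to any result in Table 3.

```python
def accuracy(y_true, y_pred):
    """Fraction of events whose predicted class matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred, classes):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 when the class never occurs."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


CLASSES = ["NR", "FR", "TR", "UR"]
y_true = ["NR", "FR", "TR", "UR", "NR", "FR"]
y_pred = ["NR", "FR", "TR", "NR", "NR", "TR"]
acc = accuracy(y_true, y_pred)
f1 = f1_per_class(y_true, y_pred, CLASSES)
```

Under 5-fold cross-validation, these quantities would be computed on each held-out fold and then averaged.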

Impact of ODE-Dynamic.
To analyze the effect of the dynamic features of the heterogeneous graph, we compare the proposed rumor detection method with a variant without the dynamic system. The results are shown in Table 4. By capturing the continuous-time dynamics on the heterogeneous graph, our HDGCN achieves better performance than the variant without the dynamic module, increasing by 6.2% and 9.1% on Twitter15 and Twitter16, respectively, which indicates that the dynamic features of the heterogeneous graph play an important role in rumor detection. This is because the GCN model alone learns only static representations of graph structures and fails to capture the continuous-time dynamic features precisely. In contrast, the differential equation system can capture the dynamic features through its derivative operator and allows our method to pay more attention to dynamic changes over time, which helps to improve the performance of our model.
Moreover, the effect of dynamic features on the nonrumor class is smaller than on the other classes, which implies that the nonrumor propagation structure has fewer dynamic features. This is because nonrumors are usually released or responded to by an authority or government department, and the propagation paths and content are more likely to be fixed. Therefore, the dynamics of nonrumor propagation have less effect on the performance.

Ablation Study.
We perform ablation studies over our heterogeneous graph to analyze the importance of post content and propagation structure for rumor detection. The experimental results are presented in Figure 4. "Only content" refers to our model with only post content features in the network, while "only propagation" refers to our model with only the propagation structure in the heterogeneous graph.
As shown in Figure 4, the heterogeneous graph with both post content and propagation performs better than either "only content" or "only propagation" in terms of accuracy and $F_1$ on Twitter15 and Twitter16. This indicates that both content and propagation structure contribute to the accuracy of rumor classification. Moreover, in most cases "only content" performs better than "only propagation." Since dynamic changes in rumor content may change the meaning of a rumor, post content dynamics play a more critical role in rumor detection than propagation dynamics. In general, it is necessary to combine the content and propagation structure for rumor detection.

Conclusion
In this paper, we propose a graph convolutional network-based dynamic rumor detection method named HDGCN, which takes advantage of post content and rumor propagation structure as well as the dynamic changes of the heterogeneous graph while the rumor is spreading. Considering that rumor content and propagation structure play essential roles in rumor classification, we combine the two factors into a union graph to obtain more detailed rumor information. Besides, GCN has the ability to tackle graph-structured data and learn higher-level representations, and the ordinary differential equation can acquire continuous-time dynamics on a complex graph. We design the dynamic module using GCN as the derivative network to get the dynamic representation of the heterogeneous graph. Finally, we conduct extensive experiments on two real-world datasets (Twitter15 and Twitter16), and the experimental results demonstrate that our proposed method outperforms mainstream models in rumor detection.
In the future, we will explore multimodal graphs, which can integrate more information such as user profiles, images, and semantic knowledge for rumor classification. Moreover, we will continue to dive into the study of social media's rumor propagation dynamic system, building more efficient methods to explore the dynamic changes of rumor spreading to enhance the robustness of rumor classification.

Data Availability
The data underlying the results presented in the study are a public dataset from the paper "Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning" produced by Jing Ma, which is available at https://www.dropbox.com/home/rumdetect2017/rumor_detection_acl2017.

Conflicts of Interest
The authors declare that they have no conflicts of interest.