A New Early Rumor Detection Model Based on BiGRU Neural Network

With the progress of society and the rapid development of computer technology, rumors arise on social media and seriously affect the social economy. How to detect rumors accurately and rapidly has become a hot research topic. In this paper, a new early rumor detection model is proposed. The aim of this model is to increase the efficiency and the accuracy of rumor detection simultaneously. Specifically, in this model, the input data is first refined through account filtering and data standardization, then BiGRU is used to capture contextual relationships, and a reinforcement learning algorithm is applied to detection. Experimental results show that, compared with other early rumor detection models (e.g., checkpoints), the accuracy of the proposed model is improved by 0.5% at the same speed, which demonstrates the effectiveness of this model.


Introduction
Rumors are statements that have no factual basis but are fabricated and promoted through certain means. With today's highly developed information dissemination media, rumors can spread quickly through social media, and malicious rumors may affect the economy and society significantly. The negative impact of rumors may increase significantly when major events occur, such as the tracing of the origin of COVID-19 in 2019. This makes people realize that if malicious rumors are not discovered in time, they may continue to cause significant damage, so the timing of their detection is crucial. Figure 1 shows an example of a rumor propagating on Twitter, named "German Wings Crash." The source message claimed that the crashed aircraft could be a Germanwings Airbus A320: a Germanwings Airbus A320 with 150 people on board crashed near Barcelonnette, southern France, with no survivors. The message was retweeted by multiple users on Twitter, who reposted, commented on, or questioned the original source message. We extracted several related tweets within 24 hours.
Currently, most research on rumors takes the Twitter and Weibo social platforms as the main research objects. Considering that rumor detection research on the Weibo dataset is relatively mature, with the latest accuracy reaching about 95%, this paper uses the public standard Twitter dataset as the main research object. Unlike the funny videos and celebrity gossip that are popular on Weibo, hot social events are the most popular topics on Twitter.
Many researchers have studied rumor detection on the Twitter platform. While most researchers focus on improving the accuracy of rumor detection, Ma et al. [1] and Kwon et al. [2] have proposed the use of preset checkpoints in recent years. The method of fixed checkpoints can evaluate the timeliness of rumor discovery, but it cannot capture the differences in how different rumors spread. On this basis, Farajtabar et al. [3] proposed a combination of reinforcement learning and a point process network activity model to detect false news and achieved good results. This assessment of the timeliness of rumor detection is also one of the focuses of this paper.
At the same time, although data preprocessing has been widely used in NLP (Natural Language Processing) to improve performance, few researchers preprocess the target data for rumor detection. Moreover, this paper finds that most studies focus on extracting various features from the Twitter standard dataset to capture rumor indicators but ignore the data complexity of Twitter as a large social platform. Firstly, Twitter has a large number of spam accounts. The tweets sent by these accounts usually have a commercial purpose and are usually weakly relevant or invalid for rumor detection. They seriously interfere with the valuable information that this paper intends to extract from tweets, so it is necessary to filter them out. Secondly, the shortness and randomness of tweets make it difficult for people and computers to understand them accurately, so it is necessary to preprocess and standardize the tweets in the Twitter standard dataset.
The key contributions of our work are summarized as follows:
(1) For early rumor detection with checkpoints, this paper proposes to apply BiGRU to the early rumor detection model. BiGRU considers context in both directions of the sequence, so the posts before and after the current post are both included in the detection, which improves the effect of rumor detection.
(2) For early rumor detection with checkpoints, this paper proposes, for the first time, a data preprocessing method based on account filtering and text standardization. Account filtering is used to remove junk accounts. Text standardization is used to standardize tweets in the Twitter standard dataset so that the data better expresses the meaning of the text, which improves the accuracy of rumor detection.
(3) Q-learning, a reinforcement learning algorithm, is applied to rumor detection to dynamically determine checkpoints, thereby improving the timeliness of rumor detection.

Related Work
In the contemporary era of diverse social media, rumor detection has attracted wide attention, and current research has achieved initial results. Current research on rumor detection is mainly divided into two categories: one is based on traditional machine learning, and the other is based on feature learning, which extracts the main features for rumor detection.
Discrete Dynamics in Nature and Society
Rumor detection methods based on traditional machine learning mainly use decision trees, SVM, and other classifiers to classify events. Liang et al. [4] no longer use a single classifier, such as KNN or SVM, but propose a BP neural model with an improved excitation function and an added impulse term, making it possible to detect rumors in the propagation process. Lu et al. [5] noticed that an imbalance in the data affects rumor detection and proposed a Co-Forest algorithm to balance the data distribution. Mao et al. [6] used an integrated classifier to detect rumors based on characteristics such as emotional orientation and the communication process. In recent years, the rise of artificial intelligence has made the application of deep learning increasingly widespread, and deep learning also plays an important role in rumor detection. Takahashi and Igata [7] developed a system that detects rumors by studying how they spread, conducted experiments on Twitter, and found that it can detect rumors effectively, which opened a new chapter in rumor detection. Karamchandani and Franceschetti [8] proposed a method of detecting the source of rumors in order to control them, which extended the best estimator of rules and irregularities to achieve the purpose of detecting the source of rumors. Similarly, Wu et al. [9] also considered propagation and proposed a hybrid SVM classifier based on the graph kernel, which captures not only semantic features such as topics and emotions but also high-order propagation patterns, which improves the classification accuracy.
With the rise of artificial intelligence, rumor detection has entered a new stage. Rumor detection methods based on feature learning mainly use advanced artificial intelligence ideas. Li et al. [10] use the convolutional layer of a convolutional neural network to extract text features, process the features with a GRU network, and then judge whether the text is a rumor. Ren et al. [11] considered that Weibo text has a graph structure and that information such as the attitude of users' comments affects the spread of Weibo text, and proposed a rumor detection model based on time series. Liao et al. [12] considered the latent information of some Weibo texts and partial user information and proposed a social media rumor detection method based on a hierarchical attention network. Srinivasan and Dhinesh Babu [13] proposed a double convolutional neural network method with a new activation function for the sparse data, with little available information, that can be used to distinguish rumors at an early stage; this method generalizes faster, has higher precision, and performs very well in rumor detection. After summarizing many studies, Zhou et al. [14] found that existing research seldom considered the timeliness of rumor detection and proposed an early rumor detection model with two modules, a rumor detection module and a checkpoint module. The rumor detection module is used to extract features, and the checkpoint module addresses timeliness by triggering the rumor detection module, ensuring timely detection while maintaining accurate identification of rumors. Lin et al. [15] raised the issue of word independence and found that some common words appear in rumors; once these words appear, the text can be judged as a rumor. They proposed a deep sequence model that considers two aspects of rumors, falsehood and influence, using long short-term memory units to learn falsehood and combining deep sequences with social characteristics to learn influence. Asghar et al. [16] proposed a bidirectional long short-term memory model based on a convolutional neural network, which uses the convolutional neural network to extract post features and the bidirectional long short-term memory network to store points and consider contextual information, effectively detecting rumors on Weibo. Another study [17] focused on rumor detection in Arabic, extracting information from users and content and using semisupervised expectation maximization (EM) to train on newsworthy tweet topics to achieve the purpose of rumor detection.
In recent years, research on rumor detection has mainly focused on extracting and analyzing features. According to the text content of the given data, the main features expressed by the data are extracted and analyzed. Since the data comes from Twitter accounts, some spam accounts that steer public opinion interfere with the process of rumor detection and affect its results. At the same time, the longer a rumor spreads, the more harmful it is to society, so timely detection is an important aspect of rumor detection research. To address these two issues, this paper starts from two directions, preprocessing and timeliness. On the one hand, the accuracy of rumor detection is improved through account screening and tweet standardization. On the other hand, reinforcement learning algorithms are used to reduce the time needed for rumor detection as much as possible.

DDR Model Architecture
The model mainly consists of three submodels: a data preprocessing model based on account filtering and standardization (DP model), a rumor detection model based on deep learning (DL model), and a checkpoint model based on reinforcement learning (RL model). Therefore, the proposed model is called the DDR (data preprocessing and rumor detection based on deep learning and reinforcement learning) model for short, as shown in Figure 2. In the DP model, this paper uses users' basic information to analyze users, filters spam accounts out of the Twitter data to refine it, and standardizes the Twitter text to enhance the data for the purpose of accurate rumor detection. In the DL model, data characteristics are analyzed, and BiGRU is used to consider the context, analyze the Twitter text data, and detect rumors. The RL model mainly addresses the timeliness of rumor detection. The Q-learning algorithm is used to judge the detection results, and a reward and punishment mechanism is set so that the model trades off timeliness against accuracy, improving the timeliness of detection. At the same time, the accuracy of detection is improved.
DP Model.

Account Filtering.
For a Twitter account, the main features are generally divided into features based on user portraits and features based on tweets. User portrait features can be extracted directly from user information, such as the number of people the user follows and the number of fans. User tweet features are statistical features extracted from the user's tweets, such as the proportion of tweets containing URLs and the proportion of liked tweets among all tweets. Since the number of tweets from a single Twitter account in a rumor detection dataset is usually not large enough to extract user-based tweet text features, it is impossible to obtain complete user tweet features. At the same time, if a tweet of a single Twitter account has a high number of likes, the tweet can be considered valuable, which reduces the probability of the account being filtered.
Based on the above description, when a Twitter account meets certain conditions, the tweets belonging to this account are usually weakly relevant or invalid for rumor detection. In this paper, this type of account is defined as a "filtered account," or FA, and the user's portrait features and the number of likes on the user's tweets are used to determine whether an account is an FA. The definition of FA is as follows:
(1) An FA follows many people, while few users follow the FA. Based on this feature, the authenticity of a user account is defined in terms of FollowersCount, the number of the user's followers, and FriendsCount, the number of accounts the user follows.
(2) Generally, the personal information of spam accounts on Twitter is incomplete, and such accounts rarely fill in the user description and user location. Therefore, HasDesc and HasLoc are defined to indicate whether the user has description information and location information: HasDesc or HasLoc is 1 if the corresponding information is present; otherwise, it is 0. Based on the above analysis, the definition of user authority is proposed, as shown in the following:
authority = authenticity + 0.5 · (HasDesc + HasLoc)
where TweetsLike denotes the total number of likes corresponding to the current user's tweet and AvgLike refers to the average of the total number of likes of all users.
(3) The authority of all users is sorted in nonincreasing order. The bottom 5% of users in the ranking have the lowest authority, and these users are defined as FAs.
We identify the FAs in the Twitter standard dataset and remove the tweets belonging to these FAs.
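The ranking step above can be sketched as follows. This is only an illustration under stated assumptions: the `User` record and the function names are hypothetical, the authenticity ratio (followers divided by followers plus friends) is an assumed stand-in because the source formula is not reproduced in the text, and the like-based term involving TweetsLike and AvgLike is left out for the same reason.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    user_id: str
    followers_count: int  # FollowersCount
    friends_count: int    # FriendsCount (accounts this user follows)
    has_desc: bool        # profile description present
    has_loc: bool         # profile location present


def authenticity(u: User) -> float:
    # ASSUMPTION: share of followers among all connections; not the paper's formula.
    total = u.followers_count + u.friends_count
    return u.followers_count / total if total else 0.0


def authority(u: User) -> float:
    # Visible part of the source definition: authenticity + 0.5 * (HasDesc + HasLoc).
    return authenticity(u) + 0.5 * (int(u.has_desc) + int(u.has_loc))


def filtered_accounts(users: List[User], bottom_fraction: float = 0.05) -> List[str]:
    # Sort by authority in nonincreasing order and flag the bottom 5% as FAs.
    ranked = sorted(users, key=authority, reverse=True)
    n_filtered = int(len(ranked) * bottom_fraction)
    return [u.user_id for u in ranked[len(ranked) - n_filtered:]]
```

Tweets whose author ID appears in the returned list would then be dropped before standardization.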

Tweet Standardization.
The standardization of text is a part of text preprocessing. It mainly refers to correcting irregularities or errors in the text, thus converting it into text that can be understood correctly. Based on the characteristics of tweets in the Twitter standard dataset, a standardization method for tweet text is proposed in this section.
Tweets on Twitter are usually random and short. On the one hand, tweets are generally limited to 140 characters. On the other hand, compared with traditional standard texts, tweets contain many irregularities or errors in wording, grammar, format, and so on, such as spoken language, colloquialisms, acronyms, Internet terms, or emoji, which greatly increase the difficulty of computer understanding of the text. At the same time, tweets also contain symbols and network links that have no actual meaning, as well as other elements unrelated to the semantics of the text.
In view of the above characteristics of tweets, in order to strengthen the computer's understanding of tweets, this paper carries out the following standardized processing on tweets:
(1) Unit replacement: replacing units in the text with a unified format, such as replacing "4 kgs" with "4 kg"
(2) Acronym replacement: replacing shortened forms in the text with complete words, such as replacing "can't" with "can not"
(3) Spelling proofreading: replacing network terms or irregularly spelled words with their standard forms, such as replacing "rep" with "reply"
(4) Punctuation: adding spaces on both sides of all punctuation
(5) Symbol replacement: replacing all logical symbols with words, such as replacing "&" with the word "and"
(6) Redundant information processing: removing extra spaces, removing the "@" and "#" symbols in mentions and hashtags, and removing all hyperlinks
(7) Stop-word deletion: deleting stop words such as "if" and "to"
(8) Part-of-speech restoration: restoring an English word of any form to its base form, such as reducing "does," "did," and "done" to "do"
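The eight steps above can be sketched with the standard library alone. The lookup tables below are hypothetical toy examples (a real pipeline would need much larger resources and a proper lemmatizer), the lookups are case-sensitive, and the order of the steps is an assumption chosen so that hyperlinks are removed before punctuation is spaced out.

```python
import re

# Hypothetical, tiny lookup tables for illustration only.
CONTRACTIONS = {"can't": "can not", "won't": "will not"}
SLANG = {"rep": "reply"}
STOPWORDS = {"if", "to"}
LEMMAS = {"does": "do", "did": "do", "done": "do"}


def standardize(tweet: str) -> str:
    text = re.sub(r"https?://\S+", " ", tweet)           # (6) drop hyperlinks
    text = re.sub(r"[@#]", "", text)                      # (6) drop "@" and "#" symbols
    text = re.sub(r"(\d+)\s*kgs?\b", r"\1 kg", text)      # (1) unify units
    for short, full in CONTRACTIONS.items():              # (2) expand shortened forms
        text = text.replace(short, full)
    text = text.replace("&", " and ")                     # (5) symbols to words
    text = re.sub(r"([.,!?;:])", r" \1 ", text)           # (4) spaces around punctuation
    tokens = [SLANG.get(t, t) for t in text.split()]      # (3) spelling proofreading
    tokens = [t for t in tokens if t not in STOPWORDS]    # (7) stop-word deletion
    tokens = [LEMMAS.get(t, t) for t in tokens]           # (8) restore base forms
    return " ".join(tokens)                               # (6) split/join collapses extra spaces
```

For instance, `standardize("I can't rep to @bob & #news, he did 4kgs http://t.co/x")` yields `"I can not reply bob and news , he do 4 kg"`.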

DL Model.
The rumor detection model based on deep learning processes the tweets after tweet standardization. It is divided into a words embedding layer, a max-pooling and dropout layer, and a BiGRU layer. It transforms a piece of text into a final state and judges whether the text is a rumor through the softmax function.

Words Embedding Layer.
In the words embedding layer, this paper first performs word segmentation on the standardized text $\text{Input}_i$. Considering that simple splitting will destroy the semantics of compound words such as "eleven-years-old," this paper uses a word segmentation method based on phrase dictionary matching. After word segmentation, the words are mapped to word vectors $w_i^n$ according to word frequency. This paper sets $E_i = \{\text{Input}_1, \text{Input}_2, \ldots, \text{Input}_n\}$ to indicate that there are $n$ tweets at a time, where $\text{Input}_i = \{w_i^1, w_i^2, \ldots, w_i^n\}$ means that the tweet has $n$ word vectors. These word vectors are combined to obtain the vector matrix $e_i$ of the tweet, which is the output of the words embedding layer for $\text{Input}_i$.
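A minimal sketch of dictionary-based segmentation and frequency-based indexing is given below. The greedy longest-match strategy, the hyphen-joined phrase format, the `max_len` limit, and the reservation of index 0 for padding are all assumptions made for illustration; the paper does not specify these details.

```python
import re
from collections import Counter
from typing import Dict, List, Set


def segment(text: str, phrases: Set[str], max_len: int = 4) -> List[str]:
    # Naive split first, then greedily re-join the longest run found in the phrase dictionary.
    words = re.findall(r"[A-Za-z0-9']+", text)
    tokens: List[str] = []
    i = 0
    while i < len(words):
        for span in range(min(max_len, len(words) - i), 0, -1):
            candidate = "-".join(words[i:i + span])
            if span == 1 or candidate in phrases:
                tokens.append(candidate)
                i += span
                break
    return tokens


def build_vocab(tokenized: List[List[str]]) -> Dict[str, int]:
    # More frequent tokens get smaller indices; index 0 is left free for padding (assumption).
    counts = Counter(tok for toks in tokenized for tok in toks)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(ordered, start=1)}
```

The resulting indices would then be used to look up rows of a pretrained embedding matrix.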

Max-Pooling and Dropout Layer.
In order to obtain the most prominent features of posts, max-pooling is applied, so that the keyword or sentence features are condensed and the number of parameters is reduced. As a result, a fixed-size vector $m_i$ is generated. At the same time, a dropout layer is added to slow down overfitting and enhance the generalization ability of the model.
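The two operations can be illustrated in plain Python. This sketch assumes that pooling takes the element-wise maximum over all word vectors of a post and that dropout is the common "inverted" variant that rescales the kept values; neither detail is stated in the paper, and the function names are hypothetical.

```python
import random
from typing import List


def max_pool(word_vectors: List[List[float]]) -> List[float]:
    # Element-wise maximum over the word vectors of one post gives a fixed-size vector.
    return [max(column) for column in zip(*word_vectors)]


def dropout(vector: List[float], rate: float, rng: random.Random) -> List[float]:
    # Inverted dropout (assumption): zero each value with probability `rate`,
    # and rescale the kept values so the expected sum is unchanged.
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]
```

Whatever the number of words in a post, `max_pool` returns a vector whose length equals the embedding dimension, which is what allows posts of different lengths to feed the same BiGRU input size.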

BiGRU Layer.
In order to strengthen the model's understanding of contextual semantics, this paper uses BiGRU to combine the preceding and following sequences when making predictions. BiGRU is composed of two GRUs stacked on top of each other, and its main structure is a combination of two unidirectional GRUs. At each time $t$, the input is provided to the two GRUs in opposite directions at the same time, and the output is jointly determined by the two unidirectional GRUs. As shown in Figure 3, $x_t$ is the input data, $h_t$ is the output of the GRU unit, $z_t$ is the update gate, and $r_t$ is the reset gate. $z_t$ and $r_t$ jointly control the calculation from the hidden state $h_{t-1}$ to the hidden state $h_t$. The update gate takes the current input data and the previously memorized information $h_{t-1}$ and outputs a value $z_t$ between 0 and 1, which determines how much of $h_{t-1}$ is transferred to the next state. In the standard GRU formulation, the unit is calculated as follows:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t]) \tag{3}$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t]) \tag{4}$$
$$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t]) \tag{5}$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \tag{6}$$
where $\sigma$ is the Sigmoid function and $W_z$, $W_r$, $W$ are the weight matrices of the update gate, reset gate, and candidate hidden state, respectively. The reset gate controls the importance of $h_{t-1}$ to the result $h_t$. When the previous memory $h_{t-1}$ is completely related to the new memory, the reset gate can be used to increase the impact of the previous memory. According to the calculation results of the reset gate, update gate, and hidden state, the output $h_t$ at the current moment can be obtained by formula (6), thereby capturing the relationships among a large number of posts. Then, we use the final state $h_N$ ($N$ = the number of posts received so far) to judge the rumors through the softmax function, as given in formula (7).

RL Model.
In addition to the accuracy of detection, this paper also considers the timeliness of detection. This paper uses the Q-learning algorithm to dynamically determine the best checkpoint to improve the timeliness of rumor detection. The Q-learning algorithm consists of an action value calculation and a reward mechanism.
The action value function calculates the Q value according to the obtained state expression. As shown in formula (8), combining the Q value and the post state expression, the action value function is used to determine whether to terminate or continue. As shown in formula (9), the feature representations $h_{fi}$ and $h_{bi}$ are used as inputs to calculate the action value, where $\gamma$ is the discount rate, $r$ is the reward value, $a_0$ is rumor, and $a_1$ is nonrumor. According to the action value and state value, we get the maximum reward value, which is used to optimize the action value. The main features of the posts obtained in the words embedding layer are kept unchanged.
Through the BiGRU layer, the new state value is obtained by combining the main features of the current input post with the previously obtained state value, and it serves as the input of reinforcement learning. Then, the action value obtained by reinforcement learning decides whether to terminate or continue. If the model terminates and the prediction is correct, it receives a reward. If it terminates but the prediction is wrong, it receives a large penalty. If it continues, it receives a small penalty. The calculation is shown in formula (10). In this way, the model weighs whether to continue or terminate and, at the same time, makes a trade-off between accuracy and timeliness. In formula (10), $N$ is the number of correct predictions accumulated thus far, $P$ is a large value that penalizes an incorrect prediction, and $\varepsilon$ is a small penalty value for delaying the detection.
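The three reward cases can be sketched as below. Several things here are assumptions rather than the paper's definitions: the logarithmic reward for a correct prediction follows the convention of earlier checkpoint-based work and is not confirmed by the text, the default values of `penalty`, `epsilon`, `gamma`, and `lr` are hypothetical, and `q_update` is the textbook scalar temporal-difference update, whereas the paper computes action values with a network.

```python
import math


def reward(terminate: bool, correct: bool, n_correct: int,
           penalty: float = 100.0, epsilon: float = 0.1) -> float:
    if not terminate:
        return -epsilon                      # small penalty for delaying detection
    if correct:
        # ASSUMPTION: reward grows with N, the correct predictions accumulated so far.
        return math.log(max(n_correct, 1))
    return -penalty                          # large penalty P for a wrong prediction


def q_update(q: float, r: float, q_next_max: float,
             gamma: float = 0.95, lr: float = 0.1) -> float:
    # Textbook Q-learning step: move Q toward r + gamma * max_a' Q(s', a').
    return q + lr * (r + gamma * q_next_max - q)
```

Because continuing always costs `epsilon` while a wrong termination costs `penalty`, the ratio of the two values controls how long the agent waits before committing to a prediction.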

Dataset.
This paper uses the public standard Twitter dataset proposed by Ma et al. [18]. This dataset was proposed in 2016 and has been recognized by the academic community. It has since been widely used in the field of rumor detection and is a classic dataset for the problem. It contains 5802 events, each of which contains several Twitter-related JSON files. Each JSON file represents a tweet, including the information of the tweet and the basic information of the user who posted it. The specific data information of the dataset is shown in Table 1. This paper uses part of the information to perform account filtering and then standardizes the tweets, uses the creation time of the tweets as the basis for testing early detection, encapsulates the data in a fixed format, and divides the preprocessed dataset into a training set and a test set at a ratio of 8 : 2 for the subsequent training and testing of the model.
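The 8 : 2 division can be sketched at the event level as follows. Shuffling with a fixed seed and splitting by event ID are assumptions made for the illustration; the paper does not state how the split was drawn, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_events(event_ids: Sequence[str], train_ratio: float = 0.8,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    ids = list(event_ids)
    random.Random(seed).shuffle(ids)      # ASSUMPTION: random split with a fixed seed
    cut = int(len(ids) * train_ratio)     # first 80% of the shuffled events for training
    return ids[:cut], ids[cut:]
```

With 5802 events, this gives 4641 training events and 1161 test events.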

Evaluation Indicators.
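For the evaluation criteria of the model, this paper uses four indicators consistent with the literature [19,20], namely, Accuracy, Precision, Recall, and F1. The calculation formulae are as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
where TP, FN, FP, TN are shown in Table 2 [21].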
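The four indicators can be computed directly from the confusion-matrix counts. The sketch below uses their standard definitions; the function name and the choice to return 0.0 when a denominator is zero are illustrative assumptions.

```python
from typing import Dict


def evaluate(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with TP = 8, FN = 2, FP = 2, and TN = 8, all four indicators equal 0.8 up to floating-point rounding.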

Environment and Hyperparameters.
In this paper, the model is implemented on Linux and trained in a GPU environment with Python 3.6 and TensorFlow 1.13.1. In the training process, this paper uses pre-trained GloVe vectors, trained on 840 billion tokens, to initialize the input word vectors. The dimensionality of the word embedding is set to 300, and the dropout rate of the embedding layer is 0.5. The Adam optimiser (Kingma and Ba) [22], with a learning rate of 0.01 for the DL model and 0.001 for the RL model, is used as the optimization method. The batch size is set to 50, and the number of alternating training rounds of the DL model and RL model is set to 20.

Training Loss and Reward.
The training loss and reward values over iterations are presented in Figures 4 and 5. In this paper, the DL model and RL model are trained in an alternating manner. The training loss of the DL model stabilizes after about 5000 iterations, and the loss value drops below 0.2, achieving the optimal value. In addition, the reward curve fluctuates more because the reward is calculated based on the accuracy of the DL model. When training switches between the DL and RL models, the reward value tends to change abruptly. However, as accuracy improves over time, a consistent improvement in the reward value can be seen.

Detection Model and Comparison.
In order to verify the accuracy and effectiveness of the model, this paper compares the proposed model with the following models.
The RNN model mainly uses recurrent neural networks to analyze data and detect whether a text is a rumor. The LSTM model is an improvement of the recurrent neural network model, which considers the contextual relationship. The GRU-2 model [18] is an improvement of the LSTM model, which reduces the parameters and improves the efficiency. Specifically, this model first divides the event into time periods, then uses the tf-idf method to calculate the text representation of each time period, uses a two-layer GRU network to learn the hidden layer representation of each event, and finally classifies the event.
The HMM model [23] uses the group's point of view for analysis and finally achieves the classification effect. The GAN-GRU model [24] uses a generative adversarial network.
In this section, this paper evaluates DDR on the test set based on the four evaluation indicators, as shown in Table 3. This paper mainly improves on the ERD model proposed in 2019 by adding a preprocessing module based on account filtering and standardization. By processing the input of the model, the data is refined, and the accuracy of rumor detection is improved. It can be seen from Table 3 that the indicators of DDR are basically better than those of traditional models, such as the RNN model and the LSTM model. Comparing the three indicators of Precision, Recall, and F1, the model proposed in this paper is 0.027 lower than the ERD model on Precision, while Recall and F1 increase by 0.123 and 0.051, respectively. It can be seen that the model proposed in this paper performs well in rumor detection. However, once reinforcement learning is added, the model must consider the timeliness of detection and favors faster results when making trade-offs. Therefore, reinforcement learning has an impact on the accuracy of the model, compared with not considering timeliness. Compared with the RDM model, the accuracy of the model proposed in this paper is slightly lower, with a gap of around 0.01.
In summary, the model proposed in this paper improves the accuracy of rumor detection while considering the timeliness of detection. The results show that the proposed model is effective in detecting rumors.

Detection Timeliness.
Then, in order to evaluate the timeliness of detection, this paper compares the DDR model with the GRU-2 model on the Twitter standard dataset. The biggest difference between the two models is that GRU-2 uses fixed checkpoints, whereas DDR dynamically determines checkpoints through the RL model. Figure 6 presents the proportion of events classified by DDR and the classification accuracy over time (at 6-hour intervals). Firstly, 70% of rumors are discovered within 6 hours, while the best checkpoint of GRU-2 (vertical dashed line) is 12 hours, so DDR can detect most rumors earlier than GRU-2. Secondly, the classification accuracy of DDR is better than that of GRU-2 (horizontal dotted line) at all checkpoints.
In summary, it can be seen that DDR improves the timeliness of rumor detection compared with the GRU-2 model.
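A curve of the kind described for Figure 6 could be computed from per-event detection times as sketched below. The function name, the 24-hour horizon, and the input format (hours from the first post of an event until the model terminates) are assumptions for illustration and do not come from the paper.

```python
from typing import List, Sequence


def detected_proportion(detection_hours: Sequence[float],
                        interval: int = 6, horizon: int = 24) -> List[float]:
    # Cumulative share of events classified by each checkpoint (6 h, 12 h, ...).
    total = len(detection_hours)
    if total == 0:
        return []
    cutoffs = range(interval, horizon + 1, interval)
    return [sum(1 for h in detection_hours if h <= c) / total for c in cutoffs]
```

The first value of the returned list corresponds to the share of events classified within the first 6 hours.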

Conclusions
This paper presents a new early rumor detection model. The model is divided into three submodels: the DP model, the DL model, and the RL model. Building on the early rumor detection model, this paper proposes to filter accounts according to the user's portrait features and the number of likes on the user's tweets, and defines a text standardization procedure for tweets. The original data is thus processed more precisely to improve the accuracy of detection. At the same time, BiGRU is used to enhance the understanding of contextual semantics, fully consider the relationships among post texts, and improve the effect of model training. The model is trained with reinforcement learning to ensure the timeliness of rumor detection.
The results are compared with the early rumor detection model. The accuracy of the DDR model is 86.3% on the test set, 0.5% higher than that of the ERD model and 5.5% higher than that of the GRU-2 model, which shows that the model proposed in this paper achieves good results in rumor detection. In addition, 70% of rumors are found within 6 hours, whereas detection with the GRU-2 model takes 12 hours. Therefore, this paper improves the accuracy of rumor discovery on the premise of ensuring its timeliness. With the development of science and technology, there is an increasing trend toward investigating real-world problems in research fields such as Faults Detection [25], Vortex Theory [26], Periodic Orbit [27], and Dynamics [28]. It is hoped that the model proposed in this paper can be applied to these problems in the near future.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.