A Proposed Method for Predicting User Disinformation Forwarding Behavior



Introduction
At present, social network sites (SNSs), such as Facebook and Twitter, have become the major channels by which information is released and disseminated. SNSs create huge commercial value; however, they also increase the difficulty of preventing disinformation [1]. According to the Oxford Dictionary, disinformation refers to "false information that is given deliberately" [2]. Compared with credible information, the common feature of disinformation is "the intention to mislead and spread false information about various occasions in the world" [3]. Worse still, disinformation propagates much faster than credible information on social media [1, 4-6]. Without effective control of disinformation on social media, a serious threat to social stability may arise [7, 8].
Different from traditional media, forwarding has become the key approach to propagating information on social media. Forwarding refers to a user passing on a message written by someone else [9]. As shown in Figure 1, information is initially posted on social media by one or a few users. Next, their followers read these posts and forward them. Eventually, these posts are widely spread on social media after being forwarded multiple times. As a result, if disinformation is not forwarded multiple times, it cannot spread widely. Therefore, if we can identify the users who will forward disinformation and take corrective actions in advance, we can prevent them from forwarding it and minimize its harmful effects, as shown in Figure 2. To identify the users who will forward disinformation, we must predict the probability of an individual forwarding disinformation. This is of crucial importance for both businesses and governments.
If commercial organizations or governments can identify users who will forward disinformation in advance, they can spread denials of the false information to those target users. As a result, the probability of large-scale spread of disinformation can be decreased effectively, and the losses caused by disinformation can be reduced considerably.
Predicting the probability of an individual forwarding disinformation is also of academic importance. Disinformation forwarding is essentially a process of user behavior spreading. User behavior spreading on a social network refers to users following a behavior performed by someone else when they see it [10]. Studying the spread of behavior on an online social network "has long been a fundamental area in social sciences, particularly social computing" [11]. To better understand the spread of user behavior, a fundamental question is how to predict the future probability that an individual who has not yet adopted a behavior will do so [11].
However, few previous studies have directly addressed the prediction of individual disinformation forwarding on social networks.
This research issue mainly involves user information forwarding behavior on social networks and disinformation spreading across social networks. Current research on user information forwarding behavior mainly treats this problem as a classification task: it extracts the features that affect user information forwarding behavior and trains a classifier on the extracted features to predict that behavior. Therefore, the extracted features inevitably influence prediction accuracy. Current research on user information forwarding behavior mainly focuses on features related to user influence and textual content. In fact, the features affecting user information forwarding are not exactly the same as those affecting user disinformation forwarding [12-14].
The key difference is that the susceptibility of users to disinformation has a clear impact on user disinformation forwarding behavior, while it has a negligible impact on general information forwarding behavior. If users are able to recognize disinformation, they will not forward it. On the contrary, if users are deluded by disinformation, they are very likely to forward it. In addition, current research on user information forwarding behavior rarely takes unobserved features into consideration [10, 11, 15-17]. On the other hand, current research on the spread of disinformation on social networks mainly investigates its spread among groups. These studies focus on features related to the disinformation and the susceptible population while ignoring features related to individuals [7, 18-21]. As a result, they remain inappropriate for predicting individual disinformation forwarding.
To overcome the limitations mentioned above, we propose a novel method, the user disinformation forwarding behavior prediction (DFP) method, to predict the disinformation forwarding probability of individuals on a social network. The DFP method also treats this problem as a classification task: it extracts the features that affect individual disinformation forwarding, especially features related to the susceptibility of users to disinformation. By combining bootstrap sampling and the expectation-maximization (EM) algorithm to learn unobserved features, the DFP method utilizes both observed and unobserved features to predict the disinformation forwarding probability of individuals. Here, the bootstrap method is used for sampling training data, and the EM algorithm is used for learning the parameters of unobserved variables. The remainder of this paper is organized as follows. In Section 2, we review relevant previous research and discuss the differences between our proposed method and existing methods. In Section 3, we describe the method to predict the disinformation forwarding probability of individuals on social networks. To demonstrate the superiority of our proposed method, in Section 4, we evaluate its effectiveness on real data from social networks using representative existing methods as benchmarks. Finally, in Section 5, we summarize the findings of this study, discuss the theoretical and practical contributions, and conclude with the limitations of our research.

Literature Review
Previous researchers have rarely addressed the problem of predicting the disinformation forwarding probability on social networks directly, so we review related studies, including research on user forwarding behavior on social networks, disinformation spreading on social networks, and the spread of behavior on social networks.

Studies on User Forwarding Behavior on Social Networks.
Studies on user forwarding behavior on social networks aim at solving problems related to the information forwarding behavior of social network users. Researchers found that the features directly related to forwarding behavior mainly include the following: (1) intensity of the publisher's influence on forwarders: Wu et al. found that the strength of interaction between the publisher and forwarders correlated with the final volume of forwarding, Wang et al. used the trust between users to predict forwarding behavior [22, 23], and Asif Khan et al. used the interaction between leaders and their supporters to analyze the retweet network [24]. (2) Textual content: both Wu and Shen [23] and Firdaus et al. [25] found that emotion in textual content could affect user forwarding behavior. Wang et al. used the similarity between the content of Twitter posts to predict forwarding behavior [22, 23, 25]. Wang and Yang learned text features from textual content and combined them with social features to predict forwarding behavior using multitask deep learning [17]. (3) Network topology: studies indicated that when most active users belonged to the same social circle, the probability of forwarding information was reduced [26]. (4) Contextual information: Zhang et al. combined contextual information and social information to predict forwarding behavior [14]. (5) User behavior history: Lymperopoulos used users' historical behaviors and the popularity of tweets to predict forwarding behavior [16]. These studies demonstrated that the characteristics of the publisher, the social network, and the textual content all affect forwarding behavior. However, their results cannot be fully applied to predicting disinformation forwarding behavior, because they mainly focused on social network data and ignored the characteristics of the disinformation itself.

Studies on Disinformation Spreading on Social Networks.
The main way to spread information on social networks is forwarding; accordingly, most information spreading features are also disinformation forwarding features. Among them, the features identified as directly related to disinformation forwarding mainly include the following: (1) social homogeneity: M. Del Vicario et al.
found that disinformation receivers tended to make the same choice as users in the same social circle, that is, forwarding the disinformation or not [7]. Askarizadeh and Tork Ladani found that disinformation generally started from one or multiple users and spread between trusted friends [20]. Oh et al. further explained this phenomenon [27].
They found that people tended to emotionally trust information from acquaintances or people in the same social circle, even when its source was ambiguous.
(2) Network topology: Doerr et al. found that disinformation was quickly forwarded from connections between influential users to a large number of ordinary users around them [28].
(3) People's beliefs: Zimmermann and Kohring indicated that the less people trusted news media and politics, the more they trusted online disinformation [29]. F. B. Soares et al. suggested that informational characteristics of online disinformation and psychological mediators could promote online disinformation forwarding behavior [21]. It should be noted that these studies did not utilize their findings to predict individual forwarding behavior. (4) Attractiveness of disinformation: Zhang et al. mentioned that the attractiveness of disinformation had a marked impact on the transmission pattern of user forwarding [19]. (5) Characteristics of the disinformation receiver group: Liu et al. found that the ability of users to distinguish disinformation and the refutation of disinformation by users could interrupt its spread [30]. These studies demonstrated that the characteristics of social networks, disinformation, and users on social networks could all influence disinformation forwarding behavior. However, few in-depth studies have been conducted on individuals forwarding disinformation, and the results of the aforementioned studies cannot be directly applied to the prediction of individual disinformation forwarding behavior.

Studies on the Spread of Behavior on Social Networks.
Disinformation forwarding is a kind of user behavior spreading. The spread of user behavior on a social network has long been a fundamental study area in social computing [11]. In many relevant studies, researchers found that the features directly related to behavior spreading mainly include the following: (1) network topology: Centola found that the size, centrality, and density of social circles were positively related to the probability of behavior spreading [10]. Wang et al. also indicated that a multiplexing network would promote behavior spreading [31]. (2) Social influence among users: Centola et al. found that social reinforcement between members could promote behavior spreading, and Bond et al. found that behavior spreading among close friends was more prominent [10, 15]. (3) Social convergence: both Centola and Aral and Nicolaides found that social convergence had a positive impact on behavior spreading [32, 33]. (4) Social persuasiveness and unobserved features: Fang et al. found that, compared with social influence, social persuasiveness (composed of social interaction, entity similarity, and structural equivalence) and unobserved features could better predict the probability of user behavior spreading [11]. These studies demonstrated that the characteristics of social networks, the interaction between members, and social convergence should be taken into consideration when predicting disinformation forwarding. However, these studies rarely considered the susceptibility of individual users to disinformation. They are also not fully applicable to predicting disinformation forwarding behavior.
From the above analysis, current studies have contributed significantly to the prediction of disinformation forwarding. However, three main limitations remain. First, few studies address the susceptibility of users to disinformation. They mainly focus on social network data features, particularly the social influence of information publishers. However, the susceptibility of users to disinformation is an essential feature that affects disinformation forwarding. Second, few studies consider unobserved features when predicting user forwarding behavior. Most studies treat the prediction of user information forwarding as a classification task, and regular classification algorithms cannot handle unobserved features, which inevitably influences prediction accuracy. Finally, current research on the spread of disinformation on social networks mainly investigates its spread among groups, ignoring features related to individuals, and is therefore inappropriate for predicting individual disinformation forwarding.
To overcome the limitations mentioned above, we propose a novel method to predict the probability of individual disinformation forwarding on social networks. This method systematically explores the key features that affect the forwarding of disinformation and the measurement of each feature, and it combines bootstrap sampling and the EM algorithm to learn unobserved features.

A Proposed Method for Predicting User Disinformation Forwarding Behavior
We propose a novel method called DFP to predict the disinformation forwarding probability of individuals on a social network. As illustrated in Figure 3, the DFP method consists of three stages: (1) identifying features: we identify and measure the features that affect user disinformation forwarding behavior on social networks based on the theory of behavior spreading on social networks; (2) predicting the probability of a user being persuaded to forward disinformation: we predict the probability that a user is persuaded to forward a piece of disinformation; (3) predicting the probability of a user forwarding disinformation: we predict the probability of a user forwarding a piece of disinformation while considering unobserved features.

Identify Features That Affect User Disinformation Forwarding Behavior on Social Networks

3.1.1. Theoretical Basis. Aral et al. indicated that behavior spreading on social networks could be regarded as a result of the interaction between social influence, user susceptibility, and user self-publishing [34]. Users create social influence that affects opinions, attitudes, and behaviors within a social network. Chaiken et al. found that this influence is essentially social persuasiveness [35]. Fang et al. further suggested that social persuasiveness originates not only from social influence but from the combination of social influence, entity similarity, and social structural equivalence [11, 36]. According to social comparison theory, people adjust their behavior according to the information released by people around them to reduce uncertainty when they make decisions. Pfeffer et al., Friedkin, and Festinger further proposed the social influence network theory [37-39]. According to this theory, users on social networks usually have their own ideas about events from the beginning and then further consolidate or change their opinions or attitudes by interacting with other members of the social network. These theories all suggest that social influence is generated in the interaction between users on social networks [40] and that it can affect the views and behaviors of users [40-42]. Therefore, when people invest a great deal of time and energy in establishing solid social connections with each other, they tend to trust members with strong social connections. This means that social influence between members with strong social connections is much stronger than that between other members [43-45]. Social comparison theory and social influence network theory suggest that entity similarity also has an impact on behavior spreading. Researchers found that the behaviors or opinions of social network users were similar simply because their characteristics were similar [46, 47]. Generally, entity similarity refers to the degree to which two
entities in a social network are similar demographically and behaviorally [48].People with similar demographic characteristics usually have similar needs and preferences [49].People with similar behavior usually have similar views on products or services [33,49].McPherson et al. found that the similarity in behavior between members was even independent of their interaction behavior [50].Generally, the more similar the members are, the more likely they are to take concerted actions [11].
The characteristics of social network topology also affect the views and behaviors of users on the network [51, 52]. Studies have shown that network structural characteristics can affect the similarity of views and behaviors between users [52, 53]. Structural equivalence is a basic feature of social network topology. Structurally equivalent users tend to have similar views and behaviors because they tend to learn from structurally equivalent users when making decisions [51]. Additionally, the topological characteristics of communities and the characteristics of individual users also affect the degree of similarity of views and behaviors between members [10, 54].
Previous studies rarely took the susceptibility of users into consideration when predicting user behavior. Instead, they often used social influence to indirectly measure susceptibility [54, 55]. In fact, the awareness of received information is also essentially a type of user susceptibility, which is independent of social influence and also affects the views and behaviors of users [56-58]. Researchers have found that people believe disinformation and actively spread it on social networks only when they are susceptible to it, whereas other people do not respond to disinformation, either because they detect it or because they believe it but are not interested in it [59]. People are susceptible to disinformation when its content is consistent with their original cognition. Specifically, the degree of agreement between the characteristics of the disinformation content and their cognition determines their susceptibility to the disinformation [60].
This also means that measuring the susceptibility of people to individual disinformation features can help predict their susceptibility to a piece of disinformation.
The characteristics of a piece of disinformation include the following: its information form, such as hashtags (#), mentions (@), URLs, length, and pictures; its semantics; confounding characteristics, such as fuzziness and clear behavior; and its emotion [1].
It is difficult to measure the spontaneous behavior of users directly, so it can be regarded as an unobserved feature of disinformation forwarding. Additionally, the disinformation forwarding behavior of users may also be affected by other confounding features [46, 61], including the specific content of people's interactions and the spread of disinformation offline [62]. In addition to the features that can be traced to a clear source, there are many unobserved features that can significantly influence people's disinformation forwarding behavior [63].
Based on the above analysis, we first measure the features that affect the disinformation forwarding of social network users, apart from confounding features, and then use all the features that influence disinformation forwarding, including confounding features, to predict the disinformation forwarding probability of social network users.

Operationalization.
Based on the above discussion, we divide the features affecting the disinformation forwarding behavior of social network users into four categories: the strength of the social influence of the disinformation publisher on the disinformation forwarder, the entity similarity between the disinformation publisher and forwarder, the structural equivalence between the disinformation publisher and forwarder, and the susceptibility of the user to disinformation.
(1) Social Influence. Suppose V is a group of users on a social network. Users are connected by two-way or one-way links. The social influence of user v_i on user v_j (hereinafter referred to as the influence power) is denoted by I_ij. I_ij is measured as the standardized interaction intensity X_ij between user v_i and user v_j:

I_ij = (X_ij − X_min) / (X_max − X_min),

where X_max and X_min denote the maximum and minimum social connection strengths, respectively. This standardization prevents I_ij from depending on the measurement unit of X_ij. The interaction intensity X_ij is a three-dimensional vector that measures the interaction intensity between users through the three dimensions of thumbs up, forwarding, and comments:

X_ij = (L_ij, R_ij, C_ij).

See Table 1 for the calculation formulas for the interaction strength between user v_i and user v_j.
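The min-max standardization of the influence power described above can be sketched as follows. This is a minimal illustration: the function name is ours, and the input is simplified to one scalar intensity per user pair (the paper's per-dimension vector case is analogous).

```python
def influence_power(intensities, eps=1e-12):
    """Min-max standardize raw interaction intensities X_ij so that the
    resulting influence power I_ij does not depend on the measurement
    unit of X_ij. `intensities` holds one value per user pair."""
    x_min, x_max = min(intensities), max(intensities)
    # eps guards against division by zero when all intensities are equal.
    return [(x - x_min) / (x_max - x_min + eps) for x in intensities]
```

Because the output is scale-free, intensities measured in raw counts or in any rescaled unit yield the same I_ij values.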
(2) Entity Similarity. The entity similarity between user v_i and user v_j is denoted by S_ij. S_ij is measured as the similarity of each feature between user v_i and user v_j, that is, the normalized distance between each pair of corresponding features. The features that we use to measure the entity similarity of users can be divided into three categories: the profile similarity of users, the behavior similarity of users, and the comprehensive similarity of users. The corresponding calculation formulas for the entity similarity between users v_i and v_j are shown in Table 2.
(3) Structural Equivalence. The structural equivalence between user v_i and user v_j is denoted by E_ij. E_ij is measured as the similarity of each topological structure feature between user v_i and user v_j, that is, the normalized distance between each topological feature, where e_max1, e_max2, ..., e_max11 denote the maximum values of each topological feature distance among all users and e_min1, e_min2, ..., e_min11 denote the corresponding minimum values. The features that we use to measure structural equivalence can be divided into three categories: topological structure equivalence, the individual topological feature similarity between users, and the topological feature similarity for the communities of users. The corresponding calculation formulas for the structural equivalence of user v_i and user v_j are shown in Table 3.

(4) Susceptibility to Disinformation. The susceptibility of users to disinformation refers to their susceptibility to the various characteristics of disinformation. The susceptibility of user v_j to disinformation M_i is denoted by F_ij. F_ij is measured by combining the degree to which the disinformation satisfies each feature with the susceptibility of the user to that feature, where M_ip denotes the satisfaction degree of disinformation M_i with respect to a certain feature and YG_jp denotes the susceptibility of user v_j to that feature of the disinformation. The characteristics of disinformation mainly belong to three categories: formal features, popularity, and semantic features. The formula for calculating the feature satisfaction degree M_ip of disinformation M_i is shown in Table 4.
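The combination of feature satisfaction and per-feature susceptibility can be sketched as an inner product over features. The inner-product form is our assumption for illustration only; the paper's exact formula for F_ij is given by its own equation and Table 4.

```python
def susceptibility(satisfaction, user_susceptibility):
    """Illustrative F_ij: combine the degree M_ip to which disinformation
    M_i exhibits each feature p with user v_j's susceptibility YG_jp to
    that feature. The inner-product combination is an assumption."""
    assert len(satisfaction) == len(user_susceptibility)
    # Sum over features of (feature satisfaction) x (user susceptibility).
    return sum(m * yg for m, yg in zip(satisfaction, user_susceptibility))
```

A user who is highly susceptible to features the disinformation strongly exhibits thus receives a high F_ij, matching the intuition in the text.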
The susceptibility of a user to each disinformation feature is calculated as the normalized mean value of the corresponding feature over all Weibo posts forwarded by that user.

According to the analysis in the previous section, the features that influence whether user v_i forwards disinformation m_k published by friend v_j comprise four aspects: the social influence intensity I_ij of user v_j on user v_i, the entity similarity S_ij between user v_i and user v_j, the structural equivalence E_ij between user v_i and user v_j, and the susceptibility F_ik of user v_i to disinformation m_k. Therefore, p_ijk^persuade is estimated from these four features, where Retweet_ijk = 1 indicates that user v_i forwards disinformation m_k published by friend v_j and persuade_ijk = 1 indicates that user v_j persuades user v_i to forward disinformation m_k. We predict p_ijk^persuade using the random forest algorithm. To improve prediction accuracy, we predict p_ijk^persuade using the bagging method and M-estimation. In particular, we extract training samples from the original sample set using the bootstrap method and obtain 50 training sets after 50 rounds of extraction. Then, we train 50 decision trees on these sample data and calculate p_ijk^persuade as the average of the probability estimates of the 50 decision trees. The training and prediction processes are shown in Figures 4 and 5, respectively. Then, we calculate the probability p_ik^persuade of user v_i being persuaded to forward disinformation M_k by aggregating p_ijk^persuade over all friends v_j of user v_i, where persuade_ik = 1 indicates that user v_i is persuaded to forward disinformation M_k.
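The bagging procedure above (50 bootstrap training sets, averaged probability estimates) can be sketched as follows. The one-dimensional threshold learner and the single illustrative feature are our simplifications, standing in for the paper's decision trees and four-feature input.

```python
import random

def bagged_probability(train, x, n_rounds=50, seed=0):
    """Bagging sketch: draw n_rounds bootstrap training sets, fit one weak
    learner per set, and average the per-learner probability estimates.
    `train` is a list of (feature, label) pairs; labels are 0 or 1."""
    rng = random.Random(seed)
    n = len(train)
    estimates = []
    for _ in range(n_rounds):
        # Bootstrap: sample n pairs with replacement from the original set.
        sample = [train[rng.randrange(n)] for _ in range(n)]
        # Weak learner: positive rate among sampled points lying on the
        # same side of the sample median as the query point x.
        median = sorted(f for f, _ in sample)[n // 2]
        side = [y for f, y in sample if (f >= median) == (x >= median)]
        estimates.append(sum(side) / len(side) if side else 0.5)
    # Final estimate: average of the per-round probabilities.
    return sum(estimates) / len(estimates)
```

Averaging over bootstrap replicates reduces the variance of the individual learners, which is the motivation for using 50 trees rather than one.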

Predict the Probability p_ik^retweet of User v_i Forwarding Disinformation M_k.

We denote the unobserved variable that affects user v_i forwarding disinformation M_k by H_ik and combine it with the probability p_ik^persuade of user v_i being persuaded to forward disinformation M_k. Then, according to the Bayesian probability formula,

p_ik^retweet = P(Retweet_ik = 1 | H_ik, persuade_ik) = P(H_ik, persuade_ik | Retweet_ik = 1) P(Retweet_ik = 1) / P(H_ik, persuade_ik).

For the naive Bayesian estimation, we assume that H_ik and persuade_ik are conditionally independent given Retweet_ik, so that

P(H_ik, persuade_ik | Retweet_ik) = P(H_ik | Retweet_ik) P(persuade_ik | Retweet_ik).

In the naive Bayes algorithm, a common approach is to let the probability density of each input variable obey the exponential distribution [64].
Then, we can describe the probability density function of P(Retweet_ik = r) as f(Retweet_ik) = λ_Retweet_ik · e^(−λ_Retweet_ik · x), x ≥ 0, where λ_Retweet_ik is the parameter of the probability density function and r = 0, 1.

Table 1 defines the interaction-intensity components between user v_i and user v_j. Thumbs up: L_ij = zan_ij / Σ_{v_h ∈ V, h≠i} zan_hj, where zan_ij denotes the number of thumbs up received by user v_i from user v_j, zan_hj denotes the number received by user v_h (h ≠ i) from user v_j, and L_max and L_min denote the maximum and minimum values among all L_ij. Forwarding: R_ij = zhuan_ij / Σ_{v_h ∈ V, h≠i} zhuan_hj, where zhuan_ij denotes the number of microblogs of user v_i forwarded by user v_j, zhuan_hj denotes the number of microblogs of user v_h (h ≠ i) forwarded by user v_j, and R_max and R_min denote the maximum and minimum values among all R_ij. Comments: C_ij = comment_ij / Σ_{v_h ∈ V, h≠i} comment_hj, where comment_ij denotes the number of microblogs of user v_i commented on by user v_j, comment_hj denotes the number of microblogs of user v_h (h ≠ i) commented on by user v_j, and C_max and C_min denote the maximum and minimum values among all C_ij.
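The naive Bayes combination with exponential class-conditional densities described above can be sketched as follows. The parameter dictionary, prior, and all numeric values are illustrative assumptions, not values from the paper.

```python
import math

def exp_pdf(x, lam):
    """Exponential density f(x) = lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def naive_bayes_posterior(h, p_persuade, params, prior):
    """Posterior P(Retweet = 1 | H, persuade) under the naive assumption
    that the two inputs are conditionally independent given the class,
    each with an exponential class-conditional density.
    params[r] = (lam_h, lam_p) for class r; prior[r] = P(Retweet = r)."""
    like = {}
    for r in (0, 1):
        lam_h, lam_p = params[r]
        # Class-conditional likelihood factorizes under naive Bayes.
        like[r] = prior[r] * exp_pdf(h, lam_h) * exp_pdf(p_persuade, lam_p)
    return like[1] / (like[0] + like[1])
```

In practice the λ parameters are what the EM procedure in the following subsection must learn, since H_ik is never observed directly.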

Table 3 defines the structural-equivalence features between user v_i and user v_j: user topological structure equivalence e(v_i1, v_j1), based on the links between users (I_ab = 1 if a link exists from user v_a to user v_b, and 0 otherwise); individual topological feature similarities, namely degree centrality deg(v), closeness centrality cls(v), betweenness centrality btn(v), percolation centrality pc(v), eigenvector centrality Egn(v), PageRank Pr(v) (with attenuation factor d = 0.15 to guarantee global convergence), and the clustering coefficient c(v); and community-level features, namely the degree centralization DEG(v), closeness centralization CLS(v), betweenness centralization BTN(v), community clustering coefficient, and community size S(v) of the communities to which user v_i and user v_j belong. In each case, the corresponding equivalence is normalized by the maximum and minimum values of that feature among all users. Table 4 includes, among the disinformation features, the emotional intensity M_i11 of disinformation M_i, where the emotional intensity of a word is determined using an emotional dictionary.

Simultaneously, the confounding features are indispensable in a prediction. The problem is how to learn the parameter θ in the case of missing training data.
To overcome this obstacle, we combine bootstrap sampling and the EM algorithm to learn the parameter θ. The bootstrap method is used to sample the training data, whereas the EM algorithm is used to learn the parameters of the unobserved variables.
The EM algorithm is the most common latent-variable estimation method. It is an iterative process starting from an initial parameter estimate, and each iteration consists of an E-step and an M-step. In the E-step, the expected log-likelihood is computed using the current estimate of the unobserved variables; in the M-step, the parameters are updated by maximizing the expectation obtained in the E-step. The parameter estimates found in the M-step are then used in the next E-step, and the two steps alternate until convergence.
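The E/M alternation, combined with bootstrap sampling of the training data, can be made concrete with a self-contained toy example: a two-component Bernoulli mixture, where the hidden component identity plays the role of the unobserved variable. The mixture setup, initial guesses, and sample sizes below are illustrative assumptions, not the paper's actual model.

```python
import random

random.seed(0)

# Toy data: each record reports heads out of 20 tosses of one of two
# hidden coins; the coin identity is the unobserved variable.
TRUE_P = [0.8, 0.3]
data = []
for _ in range(300):
    z = random.randrange(2)
    data.append((sum(random.random() < TRUE_P[z] for _ in range(20)), 20))

# Bootstrap sampling of the training data (sampling with replacement).
sample = [random.choice(data) for _ in data]

def em(records, iters=60):
    p = [0.6, 0.4]                       # initial parameter guesses
    for _ in range(iters):
        # E-step: posterior responsibility of each coin for each record
        resp = []
        for heads, tosses in records:
            like = [pk**heads * (1 - pk)**(tosses - heads) for pk in p]
            total = sum(like)
            resp.append([l / total for l in like])
        # M-step: re-estimate each coin's bias from weighted counts
        for k in range(2):
            num = sum(r[k] * h for r, (h, t) in zip(resp, records))
            den = sum(r[k] * t for r, (h, t) in zip(resp, records))
            p[k] = num / den
    return sorted(p)

estimates = em(sample)   # approaches [0.3, 0.8] as iterations proceed
```

The paper's algorithm replaces the Bernoulli likelihood with its forwarding model, but the bootstrap-then-EM structure is the same.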
In our study, the specific steps for learning parameters are shown in Figure 6.
As mentioned above, we consider three cases when setting the initial value of the unobserved variable parameters.
Therefore, we perform cyclic training for the three cases in the iteration of the E- and M-steps. The training algorithm for parameter θ is shown in Figure 7.
We can predict the probability of disinformation forwarding after training the parameters. Given the parameter estimate θ̂, for user v_i and disinformation m_k, the forwarding probability involves the probability density function of the unobserved variable H_ik. Therefore, we use the expected value based on H_ik to approximately calculate the disinformation forwarding probability. As shown in Figure 8, the forwarding probability inference algorithm repeatedly generates H_ik samples and calculates the disinformation forwarding probability from the sample data and equation (12) until it converges. The output of this algorithm is the expected value of the disinformation forwarding probability based on H_q.
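The inference step can be sketched as a simple Monte Carlo loop: sample the unobserved variable repeatedly, average the resulting forwarding probabilities, and stop when the running mean stabilizes. The logistic link and the standard-normal prior for H below are illustrative assumptions; the paper's equation (12) and the actual distribution of H_ik would take their place.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forwarding_probability(theta, x, tol=1e-4, batch=2000, max_batches=100):
    """Estimate E_H[ P(forward | x, H; theta) ] by sampling H.

    Stops when the running mean changes by less than `tol` between
    batches (a simple convergence criterion)."""
    total, n, prev = 0.0, 0, None
    for _ in range(max_batches):
        for _ in range(batch):
            h = random.gauss(0.0, 1.0)   # assumed prior for H
            score = sum(t * xi for t, xi in zip(theta, x)) + h
            total += sigmoid(score)
            n += 1
        estimate = total / n
        if prev is not None and abs(estimate - prev) < tol:
            break
        prev = estimate
    return estimate

# Hypothetical learned weights and observed features for one (user, post) pair.
p_forward = forwarding_probability([0.5, -0.2], [1.0, 2.0])
```

Averaging over H smooths the prediction toward 0.5 relative to plugging in H = 0, which is exactly the effect of marginalizing out an unobserved feature.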

Empirical Evaluation and Results
We evaluate our proposed method on real-world social network data to demonstrate its superior prediction performance over several existing methods and then empirically analyze its important properties. In this section, we describe the data and benchmark methods, detail our evaluation design, and report the main evaluation results. The data used in this study come from Weibo, the largest microblogging platform in China; we were provided with real data by the Weibo data center as a result of our cooperation with them. We obtained a total of nine disinformation posts that were widely spread on social media between April 2018 and August 2018, covering common disinformation topics such as personal safety, health and wellness, death anxiety, and geomantic superstitions. In total, 38,079 users participated in forwarding this disinformation. Note that all the disinformation posts in our experiments were identified as disinformation by the Weibo platform. Information on the spread of these posts is shown in Table 5.

Benchmark Methods.
Compared with current research on predicting user information forwarding behavior, our proposed method systematically investigates the features affecting user disinformation forwarding and takes unobserved features into consideration. Therefore, to demonstrate the superiority of our proposed method, we design the following two groups of experiments to evaluate the features. The objective of the first group of experiments is to compare the effects of the features used in the proposed method and the features used in current methods on the prediction accuracy of user disinformation forwarding. The experimental group and control group use the same prediction algorithm but different features that influence disinformation forwarding. Because there is little existing research on features that influence disinformation forwarding, we compare not only the common features that influence disinformation forwarding but also the common features that affect the behavior of social network users. Specifically, in the first group of experiments, we do not take the unobserved features into consideration.
The objective of the second group of experiments is to evaluate the effect of the unobserved features used in the proposed method on the prediction accuracy of user disinformation forwarding. Both the experimental group and the control group use the observed features of the proposed method, but the experimental group uses our proposed prediction algorithm, which considers the unobserved features, whereas the control group uses regular classification prediction algorithms that do not.
Tables 6 and 7 summarize the methods compared in the evaluations.

Evaluation Design.
Our experiments are conducted using the following steps: Step 1: divide the training set and testing set. We obtain a total of 37,014 records. The ratio of records in which the user forwarded the disinformation (marked as 1) to records in which the user did not forward the disinformation (marked as 0) is 1 : 1. We randomly divide the data into a training set and a testing set at a ratio of 4 : 1. The data after division are shown in Table 8.
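Step 1 amounts to a plain shuffle-and-slice split. A stdlib sketch, using synthetic (record_id, label) pairs as stand-ins for the 37,014 real records:

```python
import random

random.seed(42)

# Synthetic stand-in for the balanced (1:1) labelled records.
records = [(i, i % 2) for i in range(1000)]   # (record_id, label)
random.shuffle(records)

cut = int(len(records) * 0.8)                 # 4 : 1 train : test ratio
train, test = records[:cut], records[cut:]
```

In practice one would also stratify the split by label so that each subset preserves the 1 : 1 forwarded/not-forwarded ratio.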
Step 2: calculate the features that influence disinformation forwarding. We calculate the features that influence disinformation forwarding used by each experimental group and control group. In the analysis of the social network topology, we divide the collected social network users into 28 communities using the Louvain algorithm and then further analyze the topology of each community and of individual users.
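The Louvain algorithm itself requires a modularity optimizer; in practice one would use an existing implementation such as NetworkX's `louvain_communities`. As a lightweight stand-in that illustrates the partitioning step, the sketch below runs label propagation (a different but related community detection method) on a toy graph of two disconnected triangles.

```python
import random

random.seed(3)

# Toy follower graph: two disconnected triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Label propagation: every node starts in its own community, then
# repeatedly adopts the most common label among its neighbours
# (ties broken by the smallest label).
labels = {n: n for n in adj}
for _ in range(20):
    nodes = list(adj)
    random.shuffle(nodes)                      # asynchronous update order
    for n in nodes:
        counts = {}
        for nb in adj[n]:
            counts[labels[nb]] = counts.get(labels[nb], 0) + 1
        best = max(counts.values())
        labels[n] = min(l for l, c in counts.items() if c == best)

# Group nodes by their final label.
communities = {}
for node, label in labels.items():
    communities.setdefault(label, set()).add(node)
```

On real Weibo-scale data, the community partition then feeds the community-level features (centralization, clustering coefficient, size) described earlier.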
Step 3: predict the disinformation forwarding probability of users. We predict the probability of each user forwarding each piece of disinformation by inputting the calculated influencing features into the classifier used by each experimental group and control group. We perform the above steps ten times, each time randomly dividing the training set and testing set, to ensure the stability of the evaluation results.

Prediction Results and Evaluation Analysis
4.4.1. Evaluation Index. For a binary classification problem, machine learning models usually model the probability P(Y|X), where X is the sample and Y is the category to which it belongs; P(Y = 1|X) denotes the probability that sample X belongs to category 1. We set a threshold k: a sample is classified as category 1 if P(Y = 1|X) > k and as category 0 otherwise. For example, the default threshold of a neural network and of logistic regression is 0.5. The choice of threshold directly affects the generalization ability of a machine learning model, and the ROC curve indicates the influence of threshold changes on the classifier.
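The threshold rule and the AUC are easy to make concrete: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties counted as half). A stdlib sketch with made-up scores:

```python
def roc_auc(scores, labels):
    """AUC via the rank (Mann-Whitney) statistic: the fraction of
    positive/negative pairs in which the positive is scored higher,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classify(scores, k=0.5):
    """Threshold rule from the text: category 1 iff P(Y=1|X) > k."""
    return [int(s > k) for s in scores]

scores = [0.9, 0.8, 0.35, 0.4, 0.1]   # predicted P(Y=1|X), made up
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)          # 5 of 6 pairs ranked correctly
preds = classify(scores)
```

Note that `preds` depends on the threshold k while `auc` does not, which is why AUC is the threshold-free comparison used in the evaluation.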
We plot the true positive rate (TPR) on the vertical axis and the false positive rate (FPR) on the horizontal axis, where TPR = TP/(TP + FN) and FPR = FP/(FP + TN). The area under the ROC curve (AUC) is used to compare two machine learning models. Figure 9 shows the ROC curves of the features generated from the different methods. Figure 10 shows the AUCs of our proposed features and of other commonly used disinformation forwarding features in each classification prediction algorithm. Cross-validation AUC values are shown in Table 9.
Proposed features (PF) refer to the features proposed in our method. Social influence features (SI) refer to features related to social influence. Social persuasion features 1 (SP1) and social persuasion features 2 (SP2) both refer to features related to social persuasion, which include social influence, structural equivalence, and entity similarity. The primary difference between SP1 and SP2 is how structural equivalence and entity similarity are calculated: SP1 considers only one kind of structural equivalence and computes entity similarity from profile similarity alone, whereas SP2 computes structural equivalence from both the social network topology and profile similarity, and computes entity similarity from profile similarity, behavior similarity, and comprehensive similarity.
From Figures 9 and 10, the following results can be concluded: (1) Regardless of the classifier, the best performance comes from our proposed features (PF), compared with the other features used in the control groups. This demonstrates that our proposed features, which consider both the social persuasion of publishers and the susceptibility of users, can greatly improve the prediction accuracy of classifiers.
(2) Regardless of the classifier, the worst performance comes from the features related to social influence (SI). Using features related to social persuasion (SP1) significantly improves the prediction accuracy of the PM, NB, RF, ADB, LR, KNN, and SVM classifiers. (3) SP2, which calculates structural equivalence from richer topology information than SP1, achieves higher AUC values across the classifiers. This demonstrates that the structural equivalence features calculated by our proposed method are more reasonable. (4) Our proposed method takes the susceptibility of users to disinformation into account. We compare the effects of features that consider the susceptibility of users to disinformation (PF) with those that do not (SP2) on the prediction accuracy of user disinformation forwarding. The results show that, when using the PM, NB, RF, ADB, LR, KNN, and SVM classifiers, the AUC value of PF increases by 2.12%, 1.25%, 3.28%, 3.54%, 4.31%, 4.93%, and 2.25%, respectively, compared with SP2. This demonstrates that the susceptibility of users to disinformation should be taken into consideration when predicting individual disinformation forwarding.
We further evaluate the prediction accuracy of our proposed method, which considers the unobserved features, by comparing it with other classification methods. Table 10 and Figure 11 show the AUC values of each classification method. Figure 12 shows the ROC curves of each classification method.
According to Table 10 and Figures 11 and 12, the prediction accuracy of our proposed method is the highest compared with the other common classification methods, including NB, SVM, LR, RF, ADB, and KNN. This demonstrates that unobserved features should be taken into consideration when predicting individual disinformation forwarding behavior.
In summary, the experimental results demonstrate that our proposed features can greatly improve the prediction accuracy of classifiers. They also demonstrate that our proposed method, which considers unobserved features, outperforms common classification methods.

Discussion and Conclusion
We propose a novel method to predict the disinformation forwarding probability of users on social networks and verify it using real data. Our proposed method makes two important contributions. First, we study disinformation forwarding behavior from the perspective of the spread of social network behavior and identify and calculate the key features that affect the disinformation forwarding behavior of social network users. Second, we develop an algorithm that combines Random Forest and EM to predict the probability of social network users forwarding disinformation based on these key features. The greatest advantage of this algorithm is that it can account for the influence of unobserved features on the behavior of social network users, which was not possible in previous studies of disinformation forwarding behavior.
The results demonstrate three important points: (1) the social persuasion of publishers, rather than their social influence, should be taken into consideration when predicting individual disinformation forwarding; (2) the susceptibility of users to disinformation and the unobserved features should also be taken into consideration; and (3) structural equivalence features should be calculated using richer SNS topology information.
Our study also has practical implications. First, governments can monitor the development of public opinion more efficiently using our proposed method, which can locate users who may believe and forward disinformation and refute it in a targeted manner. For example, during the outbreak of COVID-19, a large amount of disinformation spread easily on social networks and made people anxious. With the support of our method, the government could have identified the people likely to believe pandemic-related disinformation before they forwarded it and performed targeted refutation, which could have effectively curbed the spread of disinformation and helped reassure the public. Second, enterprises can mitigate the negative impact of disinformation on their brand image using our proposed method. For instance, disinformation damaging to the image of an enterprise often spreads on social networks when the enterprise encounters a major crisis. With the support of our method, enterprises could promptly identify the people likely to believe this disinformation and perform targeted refutation, eliminating the negative impact of the disinformation and helping the enterprise survive the public opinion crisis.
Our study could be extended in several directions. First, although our proposed method considers the features that affect the disinformation forwarding of social network users more comprehensively than existing methods, there may be more precise ways to calculate these features; our experimental results show that the original method for calculating structural equivalence between users can be further improved. Future research should explore more accurate methods for calculating the influencing features and evaluate their effectiveness accordingly. Second, our proposed method estimates the influence of unobserved features using the EM algorithm, but other algorithms can also estimate this influence. Future research should compare these algorithms and evaluate their effectiveness accordingly. Third, the influencing features, such as social influence and entity similarity, have interactive effects on the adoption decision [34,46]. Therefore, it is of great importance to investigate how to extend our approach by combining these features.

Data Availability
The data used to support the findings of this study have not been made available for user privacy reasons; these privacy concerns cannot be resolved by anonymizing the data.

Figure 1: The process of propagating information on social media.

Figure 3: The DFP method for predicting user disinformation forwarding behavior.

Note 1: Sen(v_ik) denotes the paragraph vector of the kth blog post forwarded by user v_i. Note 2: Sen_dis(v_ik, v_j) denotes the semantic similarity between the kth blog post forwarded by user v_i and all the blog posts forwarded by user v_j, where n denotes the number of blog posts forwarded by user v_j. Note 3: d(v_i3, v_j3) denotes the similarity of forwarding behavior between user v_i and user v_j, where m denotes the number of blog posts forwarded by user v_i.

Community clustering coefficient. Note 1: C(v_i) denotes the clustering coefficient of the communities which user v_i and user v_j belong to. Note 2: e(v_i9, v_j9) denotes the equivalence, in clustering coefficient, of the communities which user v_i and user v_j belong to. Note 3: max C(v) and min C(v) denote the maximum and minimum values among all C(v_i), respectively. Community size

Figure 4: Construction of the training data algorithm.

Figure 6: The learning parameter process of the DFP method. d_max1, d_max2, d_max3, and d_max4 denote the maximum values of each feature distance among all users, and d_min1, d_min2, d_min3, and d_min4 denote the minimum values of each feature distance among all users.

Table 1: The calculation formula for the interaction strength.

Table 2: The calculation formula for the entity similarity.

Table 3: The calculation formula for the feature types to measure structural equivalence.

Table 4: The susceptibility of users to disinformation features.

Table 4: Continued. Note 1: like_i denotes the like count of disinformation M_i, and max{like} and min{like} denote the maximum and minimum values among all like_i, respectively. Note 2: M_i8 denotes the relative like count of disinformation M_i among all disinformation, and …_i denotes the emotional intensity of words in disinformation.

Table 10: AUC values for the proposed method and benchmark methods.