Rumor Identification with Maximum Entropy in MicroNet

The widespread use of Microblog, WeChat, and other social networking platforms (which we collectively call MicroNet) shortens the period of information dissemination and expands its range, allowing rumors to cause greater harm and have more influence. How to identify and block rumors is a hot topic in the information dissemination field. Based on the maximum entropy model, this paper constructs a mechanism for recognizing rumor information in the MicroNet environment. First, based on information entropy theory, we obtained the characteristics of rumor information using the maximum entropy model. Next, we optimized the original classifier training set and the feature function to divide information into rumors and nonrumors. Finally, experimental simulation results show that this method identifies rumors better than the original classifier and other related classification methods.


Introduction
Rumors circulate rapidly on the Internet, which is harmful to society. Sina Microblog (a Twitter-like service) had 313 million monthly active users at the end of 2016. WeChat, a free application that provides instant messaging and lets users share information to a "friend's circle" visible only to their friends, reached 768 million users in September 2016 and covers more than 90% of mobile phones. As the time required to spread information keeps shrinking, rumors spread ever more easily. In this paper, a rumor is defined as "information that is inconsistent with the facts" or "information fabricated on the basis of certain facts and promoted through certain means." Internet rumors have a negative impact on people's lives: they can destroy personal reputations, social cohesion, and even national stability. It is therefore particularly important to identify and block rumors before they spread widely and cause serious harm. Social network rumor identification has become a hot research topic in the field of network information security.
Rumor identification is essentially a classification problem, which divides information into rumors and nonrumors.
However, the accuracy of such a hard classification is low, especially for information whose category is ambiguous. We therefore argue that rumor identification works best when it calculates the probability that a piece of information is a rumor. If the probability exceeds a threshold, official information can be released to block the rumor; information with a lower probability can be tracked over time.

Background
Social network rumors are a type of network public opinion. "Internet public opinion" refers to the emotions, attitudes, and opinions about public events that are expressed through the Internet [1]. "Network public opinion" is usually subjective and can therefore include false information, such as rumors. A rumor is an unauthorized exposition or interpretation of things, events, or issues that are of interest to the public [2]. Social network rumors are rumors that spread through the Internet.
There are three types of studies for social network rumor identification.
The first class comprises feature extraction methods for rumor identification [3]. Many researchers construct a training set and train a classifier using the characteristics of Internet rumors, then optimize the model or algorithm based on the classification results. Many models detect rumors using the characteristics of rumor makers and disseminators.
Qazvinian et al. [4] examined rumor identification on Twitter. They explored the impact of content, network, and Twitter-specific features on rumor identification. They collected more than 10,000 manually annotated tweets and identified general patterns of rumors by analyzing the shallow text features of tweets, the behavioral characteristics of users, and elemental characteristics. They constructed multiple bias classifiers and ensemble classifiers to detect rumors on Twitter, with an average precision above 0.95. In the prototype rumor identification system constructed by Wang [5], content features such as @, #, and other symbols, together with the content of URL (Uniform Resource Locator) links, were used to identify suspicious users. Drawing on the complex relationships among users, the system also uses the numbers of followers and friends, along with a user's credibility, as spam recognition features. The authors constructed a number of bias classifiers and ensemble classifiers on a large data set to analyze users' behavior with regard to rumors.
Building on these common features, scholars continue to add new ones to improve the accuracy and efficiency of rumor identification. However, the basic idea remains to treat rumor identification as a classification problem.
Ratkiewicz et al. [6] built "Truthy," a system for identifying political rumors that likewise uses symbols to detect political rumors on Twitter. However, the system relies more heavily on sentiment analysis of content features.
Yang et al. [7] used counter-rumors released by official accounts as the standard for corpus annotation and for training a text classifier. They introduced two novel features, based on the client program type and the location, studied rumor identification using an SVM (Support Vector Machine), and showed that these two features help improve identification accuracy. Some scholars have examined the number of retweets and the deletion of retweets as potential rumor identification features. Suzuki [8] held that a tweet is highly credible if it is forwarded frequently and its retweets are not deleted for a period of time. This author provided a method to calculate the credibility of tweets based on the difference between a tweet and its retweets: a tweet whose retweets retain most of its content is more credible.
Liu et al. [9] explored the impact of user-specific features on the rumor identification in social networks.They assumed the ability of a user to spread a piece of information depends on the features of the rumor and the properties of the user.Their novel method can detect rumors by noting the difference in patterns between rumors and credible information.
The second class is rumor identification focused on information after major events.
Takahashi and Igata [10] studied rumors about disasters on Twitter, focusing in particular on rumors related to the Japanese earthquake and tsunami, in order to find rumor identification indicators. They found that the breaking point, forwarding rate, and word distribution difference are all useful for rumor identification, and based on these three indicators, they designed a rumor identification system.
Gupta and Kumaraguru [11] collected a large amount of information about 14 high-impact events of 2011 on Twitter. They found that 30 percent of the tweets were related to these events; of these, 17 percent contained credible information and another 14 percent could not be trusted. The authors used regression analysis to determine which content and source features can be used to detect rumors, and they used supervised machine learning and feedback methods to rank the credibility of information.
Xing and Ruijie [12] focused on the spread of rumors after earthquakes. They designed a topic-focused crawler for earthquake rumors that collects rumors related to earthquakes by tracking earthquake topics.
The third category involves other related research. Sun et al. [13] proposed a novel method for detecting Microblog rumors, mainly those in which the text and the pictures do not match. They extracted pictures that do not match the text, used an external search engine to find the sources of the pictures, and then judged the credibility of the information by the reliability of those sources. They used several training sets to improve the accuracy of the classifier iteratively and then applied the classifier to rumor identification.
Kumar and Geethakumari [14] drew on cognitive psychology to develop a method of rumor identification. They proposed an algorithm to detect the deliberate spreading of rumors. This method uses the collaborative filtering characteristics of social networks to measure the credibility of information sources and the credibility of news.
Cai et al. [15] developed a rumor identification method by studying how different people respond to information. Morris et al. [16] studied the credibility of Microblog information through two experiments. Al-Khalifa and Al-Eidan [17] also designed a system to calculate the credibility of Twitter messages.
Zhang et al. [18] built a model for multiple rumor sources. They treated the rumor identification problem as a problem set and proposed a framework that effectively evaluates multiple independent rumor spreading models. Shah and Zaman [19] tried to find rumor sources using random trees and proposed a method that can effectively detect rumor sources. Building on Shah and Zaman's research, Fuchs and Yu [20] derived an asymptotic formula for random growth trees to detect rumor sources.
Cheng [21] studied rumor identification services for elderly people on social platforms. He analyzed the reading difficulties and points of interest of elderly people when they encounter information and argued that rumor identification services should be provided for them.
Wu et al. [22][23][24] proposed a model to investigate whether knowledge learned from historical data could help identify newly emerging rumors. They provided a principled way to leverage prior labeled data to detect emerging rumors, proposed a novel sparse learning method that jointly selects features and trains the rumor classifier, and evaluated the proposed framework extensively on real-world social media data. They also suggested using cross-modal information to further facilitate the detection of all kinds of rumors.
The rumor identification model based on maximum entropy in this paper belongs to the first category.

Rumor Identification Model Based on Maximum Entropy
3.1. Maximum Entropy. The maximum entropy principle is an objective criterion for choosing the probability distribution of a random variable: among all distributions consistent with the known information, choose the one with the largest entropy. Entropy is largest when the distribution is most uniform. The principle thus turns an estimation problem into an optimization problem under certain constraints, and the maximum entropy model applies it to probability estimation. Several studies [25,26] have applied the maximum entropy model to text classification and shown experimentally that it outperforms other classification methods. Rui-Hua et al. [27] compensated for the loss of features in the maximum entropy model, and Xue-Xiang [28] improved feature weighting to increase the accuracy of text classification.
After reviewing a large body of related literature, we found no prior research on rumor identification based on the maximum entropy model.

Constructing the Model.
We use the maximum entropy model for rumor identification: by computing the maximum entropy distribution for a text, we obtain the probability that the information is a rumor.
The vocabulary used in a message is an important characteristic of it, and a message contains a number of such features. Using a training set, we can calculate the probability that information containing the word $w$ belongs to class $a$. Given a training set, let $A = \{a_1, a_2\}$ be the set of information categories, where $a_1$ indicates that the information is a rumor and $a_2$ indicates that it is not. Let $W = \{w_1, w_2, \ldots, w_n\}$ be the set of feature words in a message $x$. Some words and phrases are used frequently in rumors, such as "break out," "harmful," "rapidly," and "cure," whereas "food safety," "pay attention," "somewhat," "experiment," and similar words and phrases appear more often in real information.
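The relative-frequency estimate of $p(a \mid w)$ described above can be sketched in a few lines of Python. This is an illustration, not the authors' code: the toy messages, feature words, and labels `a1`/`a2` are hypothetical, and messages are assumed to be already segmented into words. The output also exposes the zero-probability issue for word-class pairs that never occur in training.

```python
from collections import Counter, defaultdict


def class_given_word(training_set):
    """Estimate p(a | w) by relative frequency: among training messages that
    contain word w, the fraction labelled with class a."""
    word_class = defaultdict(Counter)  # word -> Counter of class labels
    for words, label in training_set:
        for w in set(words):  # count each word at most once per message
            word_class[w][label] += 1
    return {
        w: {a: n / sum(counts.values()) for a, n in counts.items()}
        for w, counts in word_class.items()
    }


# Hypothetical training data: (segmented message, class label)
train = [
    (["cure", "rapidly", "harmful"], "a1"),    # a1 = rumor
    (["cure", "break out"], "a1"),
    (["food safety", "pay attention"], "a2"),  # a2 = nonrumor
    (["experiment", "cure"], "a2"),
]
probs = class_given_word(train)
print(probs["cure"])                       # a1: 2/3, a2: 1/3
print(probs["food safety"].get("a1", 0.0)) # 0.0: an unseen (word, class) pair
```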
Because rumors are diverse, a considerable number of two-tuples $(w_i, a_j)$ never appear even in a large training set; this is known as the "sparse event" problem. Setting their probability to 0 is clearly unreasonable. The maximum entropy model solves this problem because it keeps the probability distribution over unobserved events as uniform as possible, that is, it tends toward maximum entropy.
Shannon [29] held that when the output of an information source is uncertain, this uncertainty can be measured by information entropy, which he defined as

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i),$$

where $p(x_i)$, $i = 1, 2, \ldots, n$, is the probability that the source emits the $i$-th symbol and $H(X)$ is the information entropy of the source.
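As a quick numerical check of this definition, the sketch below computes $H(X)$ in bits. The base-2 logarithm is our choice for illustration; changing the base only rescales $H$.

```python
import math


def shannon_entropy(probs, base=2.0):
    """H(X) = -sum_i p(x_i) log p(x_i); symbols with p = 0 contribute nothing."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)


print(shannon_entropy([0.5, 0.5]))  # 1.0 bit: a fair coin, maximally uncertain
print(shannon_entropy([0.9, 0.1]))  # about 0.469 bits: a skewed source
print(shannon_entropy([1.0, 0.0]))  # zero: no uncertainty at all
```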
In this paper, following the maximum entropy principle and measuring the uniformity of the conditional distribution, we use the following formula to calculate the information entropy of a text:

$$H(p) = -\sum_{x,\,a} \tilde{p}(x)\, p(a \mid x) \log p(a \mid x),$$

where $\tilde{p}(x)$ is the empirical distribution of $x$ in the training set and $p(a \mid x)$ is the probability that message $x$ belongs to class $a$.
We need to find the distribution that maximizes this entropy. By the maximum entropy principle, the target distribution is

$$p^* = \arg\max_{p} H(p),$$

that is, we seek the $p^*$ that makes the information entropy $H(p)$ as large as possible.
In the absence of any prior knowledge, entropy is largest when the distribution is most uniform. Thus, $H(p)$ reaches its maximum when

$$p(a_1 \mid x) = p(a_2 \mid x) = \frac{1}{2} \quad \text{for every } x.$$

However, a training set lets us calculate the probabilities of some two-tuples $(w_i, a_j)$. Rumor identification therefore becomes the problem of finding the maximum entropy distribution under partial information, that is, subject to certain constraints.
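The conditional entropy $H(p)$ and the claim that the uniform conditional distribution maximizes it can be checked numerically. In this sketch the two messages `m1`, `m2` and all probabilities are made up for illustration, and natural logarithms are used.

```python
import math


def conditional_entropy(p_x, p_a_given_x):
    """H(p) = -sum_x sum_a p~(x) p(a|x) log p(a|x).

    p_x: empirical distribution p~(x) over messages, as {x: prob}
    p_a_given_x: model conditional p(a|x), as {x: {a: prob}}
    """
    h = 0.0
    for x, px in p_x.items():
        for pa in p_a_given_x[x].values():
            if pa > 0:  # 0 log 0 is taken as 0
                h -= px * pa * math.log(pa)
    return h


p_x = {"m1": 0.5, "m2": 0.5}  # two hypothetical, equally frequent messages
uniform = {"m1": {"a1": 0.5, "a2": 0.5}, "m2": {"a1": 0.5, "a2": 0.5}}
skewed = {"m1": {"a1": 0.8, "a2": 0.2}, "m2": {"a1": 0.3, "a2": 0.7}}

print(conditional_entropy(p_x, uniform))  # ln 2, about 0.693: the two-class maximum
print(conditional_entropy(p_x, skewed))   # about 0.556: any skew lowers H(p)
```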
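To express the known information, we introduce feature functions. In general, a feature function is a binary function $f(x, a) \to \{0, 1\}$. For rumor identification, the feature function for a rumor feature word $w_i$ can be defined as

$$f_i(x, a) = \begin{cases} 1, & \text{if } w_i \text{ occurs in } x \text{ and } a = a_1, \\ 0, & \text{otherwise.} \end{cases}$$

We refined this function as follows:

$$f_i(x, a) = \begin{cases} N(w_i, x), & \text{if } a = a_1, \\ 0, & \text{otherwise,} \end{cases}$$

where $N(w_i, x)$ is the number of times $w_i$ appears in $x$.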
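The binary and count-based feature functions can be written as closures over a feature word and a class. This is a sketch under stated assumptions: messages are already segmented into token lists (the paper uses ICTCLAS for this step), and the helper names `binary_feature` and `count_feature` are hypothetical.

```python
def binary_feature(word, cls):
    """f(x, a) = 1 if `word` occurs in message x and a == cls, else 0."""
    def f(x_tokens, a):
        return 1 if (a == cls and word in x_tokens) else 0
    return f


def count_feature(word, cls):
    """f(x, a) = N(word, x) if a == cls, else 0, where N counts occurrences."""
    def f(x_tokens, a):
        return x_tokens.count(word) if a == cls else 0
    return f


msg = ["cure", "cancer", "cure", "rapidly"]  # hypothetical segmented message
f_bin = binary_feature("cure", "a1")
f_cnt = count_feature("cure", "a1")
print(f_bin(msg, "a1"), f_cnt(msg, "a1"))  # 1 2: presence vs. frequency
print(f_bin(msg, "a2"), f_cnt(msg, "a2"))  # 0 0: feature is tied to class a1
```

The count version keeps the information that a rumor word is repeated, which the binary version discards.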
Because the sample features are few and scattered, most feature function values for a given message are 0. We therefore used additive smoothing, which adds a fixed value to every event (whether or not it occurs in the training data) to avoid zero-probability events. We added 1 directly to all feature functions, giving the smoothed feature function

$$f_i(x, a) = \begin{cases} N(w_i, x) + 1, & \text{if } a = a_1, \\ 0, & \text{otherwise.} \end{cases}$$
Rumor classification is a two-class problem in which the content of information is the basis for identification. A message $x$ contains both rumor features and nonrumor features, but so far we have counted only the rumor features in $x$. In effect, when calculating $p^*(a_1 \mid x)$, a message with more rumor features receives a higher probability. However, the nonrumor features in $x$ should reduce the probability that $x$ is a rumor, so the feature function should account for them as well. We therefore include both rumor and nonrumor features when calculating $p^*(a_1 \mid x)$ and $p^*(a_2 \mid x)$, and we improved the feature function again as follows:

$$f_i(x, a) = \begin{cases} N(w_i, x) + 1, & \text{if } w_i \in W_1 \text{ and } a = a_1, \text{ or } w_i \in W_2 \text{ and } a = a_2, \\ 0, & \text{otherwise,} \end{cases}$$

where $W_1$ and $W_2$ are the sets of rumor and nonrumor feature words, respectively. The improved feature function thus captures the influence of both rumor and nonrumor features on rumor identification.
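Add-one smoothing and the improved feature set can be sketched together. The paper's exact improved function is not recoverable from this text, so pairing rumor words with `a1` and nonrumor words with `a2` is our assumption, as are the helper names and the word lists.

```python
def smoothed_feature(word, cls):
    """Add-one smoothed count: f(x, a) = N(word, x) + 1 if a == cls, else 0."""
    def f(x_tokens, a):
        return x_tokens.count(word) + 1 if a == cls else 0
    return f


def improved_features(rumor_words, nonrumor_words):
    """One smoothed feature per word: rumor words are paired with class a1 and
    nonrumor words with class a2, so both kinds of evidence enter p*(a|x)."""
    feats = [smoothed_feature(w, "a1") for w in rumor_words]
    feats += [smoothed_feature(w, "a2") for w in nonrumor_words]
    return feats


msg = ["cure", "rapidly", "experiment"]  # hypothetical segmented message
feats = improved_features(["cure", "rapidly"], ["experiment", "food safety"])
print([f(msg, "a1") for f in feats])  # [2, 2, 0, 0]
print([f(msg, "a2") for f in feats])  # [0, 0, 2, 1]
```

Note that "food safety" still yields 1 for class `a2` although it is absent from the message; this is the smoothing at work.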
The expected value of feature function $f_i$ under the empirical distribution $\tilde{p}(x, a)$ is

$$E_{\tilde{p}}(f_i) = \sum_{x,\,a} \tilde{p}(x, a)\, f_i(x, a),$$

and its expected value under the model $p(a \mid x)$ is

$$E_{p}(f_i) = \sum_{x,\,a} \tilde{p}(x)\, p(a \mid x)\, f_i(x, a).$$

We require the two expectations to be equal:

$$E_{p}(f_i) = E_{\tilde{p}}(f_i).$$

This equality is called the constraint condition. Given $k$ feature functions $f_1, f_2, \ldots, f_k$, we obtain $k$ constraints on the probability distribution we want to find.
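Both expectations can be computed directly from labelled samples. In this sketch the samples, the feature `cure_a1`, and the `uniform` model are hypothetical; the point is that the unconstrained uniform model does not satisfy the constraint, which is why training must move the model away from uniformity.

```python
def empirical_expectation(f, samples):
    """E_p~(f) = sum_{x,a} p~(x, a) f(x, a): average of f over labelled samples."""
    return sum(f(x, a) for x, a in samples) / len(samples)


def model_expectation(f, samples, classes, p_cond):
    """E_p(f) = sum_{x,a} p~(x) p(a|x) f(x, a).

    p~(x) is the empirical frequency of message x, so we average over the
    observed messages and weight each class by the model's p(a|x).
    """
    total = 0.0
    for x, _ in samples:
        probs = p_cond(x)  # {class: p(a|x)}
        total += sum(probs[a] * f(x, a) for a in classes)
    return total / len(samples)


def cure_a1(x, a):
    """Count feature N("cure", x) paired with the rumor class a1."""
    return x.count("cure") if a == "a1" else 0


def uniform(x):
    """The unconstrained maximum entropy model: p(a|x) = 1/2 for both classes."""
    return {"a1": 0.5, "a2": 0.5}


samples = [(("cure",), "a1"), (("cure", "rapidly"), "a1"), (("food safety",), "a2")]
print(empirical_expectation(cure_a1, samples))                     # about 0.667
print(model_expectation(cure_a1, samples, ["a1", "a2"], uniform))  # about 0.333
```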
Here $i = 1, 2, \ldots, k$. The problem thus becomes finding the optimal solution that satisfies this set of constraints:

$$p^* = \arg\max_{p \in \mathcal{P}} H(p), \qquad \mathcal{P} = \{\, p : E_{p}(f_i) = E_{\tilde{p}}(f_i),\ i = 1, \ldots, k \,\}.$$

Using the method of Lagrange multipliers, the optimal solution is

$$p^*(a \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{i=1}^{k} \lambda_i f_i(x, a)\Big), \qquad Z(x) = \sum_{a} \exp\Big(\sum_{i=1}^{k} \lambda_i f_i(x, a)\Big),$$

where $\lambda_i$ is the weight of feature function $f_i$, learned from the training set, and $Z(x)$ is a normalization factor. This probability distribution defines the maximum entropy model. Once the $\lambda_i$ are obtained, we can calculate the probabilities that information $x$ belongs to classes $a_1$ and $a_2$, and we assign $x$ to the class with the larger probability.
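The log-linear form of $p^*(a \mid x)$ and the learning of the weights $\lambda_i$ can be sketched with generalized iterative scaling (GIS), the training algorithm the experiments report using. This is our own minimal sketch of the textbook GIS update $\lambda_i \leftarrow \lambda_i + \frac{1}{C}\log\big(E_{\tilde{p}}(f_i)/E_p(f_i)\big)$, not the authors' implementation; the slack feature, the toy data, and all helper names are assumptions.

```python
import math


def p_star(x, classes, feats, lambdas):
    """p*(a|x) = exp(sum_i lambda_i f_i(x, a)) / Z(x)."""
    scores = {a: math.exp(sum(l * f(x, a) for l, f in zip(lambdas, feats)))
              for a in classes}
    z = sum(scores.values())  # normalizer Z(x)
    return {a: s / z for a, s in scores.items()}


def train_gis(samples, classes, features, iterations=200):
    """Fit the weights lambda_i by generalized iterative scaling.

    GIS needs non-negative features whose sum equals a constant C for every
    (x, a); a slack feature tops each sum up to C on the training data.
    """
    C = max(sum(f(x, a) for f in features) for x, _ in samples for a in classes)

    def slack(x, a):
        return C - sum(f(x, a) for f in features)

    feats = list(features) + [slack]
    n = len(samples)
    emp = [sum(f(x, a) for x, a in samples) / n for f in feats]  # E_p~(f_i)
    lambdas = [0.0] * len(feats)
    for _ in range(iterations):
        model = [0.0] * len(feats)  # E_p(f_i) under the current weights
        for x, _ in samples:
            probs = p_star(x, classes, feats, lambdas)
            for i, f in enumerate(feats):
                model[i] += sum(probs[a] * f(x, a) for a in classes) / n
        for i in range(len(feats)):
            # Zero-expectation features are left at 0 (GIS would push them to -inf).
            if emp[i] > 0 and model[i] > 0:
                lambdas[i] += math.log(emp[i] / model[i]) / C
    return feats, lambdas


def classify(x, classes, feats, lambdas):
    """Assign x to the class with the larger p*(a|x)."""
    probs = p_star(x, classes, feats, lambdas)
    return max(probs, key=probs.get), probs


def word_feature(word, cls):
    """Hypothetical count feature: N(word, x) if a == cls, else 0."""
    def f(x, a):
        return x.count(word) if a == cls else 0
    return f


classes = ["a1", "a2"]  # a1 = rumor, a2 = nonrumor
samples = [
    (("cure", "rapidly"), "a1"),
    (("cure", "harmful"), "a1"),
    (("food safety", "experiment"), "a2"),
    (("pay attention", "experiment"), "a2"),
]
features = [word_feature(w, "a1") for w in ("cure", "rapidly", "harmful")]
features += [word_feature(w, "a2") for w in ("experiment", "food safety", "pay attention")]
feats, lambdas = train_gis(samples, classes, features)
label, probs = classify(("cure",), classes, feats, lambdas)
print(label, probs)  # "a1" wins for a message containing only "cure"
```

Real implementations add regularization or a convergence test instead of a fixed iteration count; the fixed count keeps the sketch short.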

Experimental Design and Results Analysis
4.1. Experimental Process Design. The experimental process is divided into two parts, the training process and the testing process (as shown in Figure 1). We selected a number of rumors and nonrumors as a training set, derived text features from it, and selected an optimal subset of these features for calculation. Then, for a text to be classified, we extract its feature set and obtain the final classification result through the classifier.
To train a classifier, existing rumor classification models select rumor texts and normal texts as a training set. Because this classification problem involves an imbalanced data set, the training set and the validation set are also imbalanced. We collected 1430 rumors, 1430 corresponding rumor denials, and 5000 pieces of real information about civil life, the economy, and livelihood policies from Sina Microblog between 2012 and 2016. First, we selected 700 rumors and 2100 random pieces of real information as training set one to train a classifier. Because rumors have low similarity to randomly chosen real information, we improved the training set to better train the maximum entropy classifier: training set two consists of 700 rumors, the 700 corresponding denials of these rumors, and 2100 pieces of real information. The remaining information constituted the test set. We trained a classifier on training set one and on the improved training set two and compared the classification results.
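The split described above can be sketched as follows. Several details are our assumptions rather than statements from the text: `denials[i]` refutes `rumors[i]`, denials and real information are both labelled nonrumor, sampling is uniformly random with a fixed seed, and the test set is everything not used in training set two. The helper `split_dataset` and the placeholder strings are hypothetical.

```python
import random


def split_dataset(rumors, denials, real, n_train_rumors=700, n_train_real=2100, seed=0):
    """Hypothetical reconstruction of training sets one and two and the test set."""
    rng = random.Random(seed)
    rumor_idx = list(range(len(rumors)))
    rng.shuffle(rumor_idx)
    train_idx = set(rumor_idx[:n_train_rumors])

    real_shuffled = list(real)
    rng.shuffle(real_shuffled)
    real_train, real_rest = real_shuffled[:n_train_real], real_shuffled[n_train_real:]

    train_rumors = [(rumors[i], "rumor") for i in sorted(train_idx)]
    train_denials = [(denials[i], "nonrumor") for i in sorted(train_idx)]
    train_real = [(m, "nonrumor") for m in real_train]

    set_one = train_rumors + train_real                  # rumors + random real info
    set_two = train_rumors + train_denials + train_real  # adds the matching denials
    test = ([(rumors[i], "rumor") for i in range(len(rumors)) if i not in train_idx]
            + [(denials[i], "nonrumor") for i in range(len(denials)) if i not in train_idx]
            + [(m, "nonrumor") for m in real_rest])
    return set_one, set_two, test


rumors = [f"rumor-{i}" for i in range(1430)]
denials = [f"denial-{i}" for i in range(1430)]
real = [f"real-{i}" for i in range(5000)]
one, two, test = split_dataset(rumors, denials, real)
print(len(one), len(two), len(test))  # 2800 3500 4360
```

Under these assumptions the test set holds 730 rumors, 730 denials, and 2900 pieces of real information.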

Experimental Results and Analysis.
We used micro-averaged accuracy (computed by pooling the classification decisions on all test samples) as the evaluation index of the classifier. We studied the performance of the classifier from the following four aspects:
(1) The influence of different training sets on identification results
(2) The identification results of the classifier with different numbers of features
(3) The effect of an improved feature function on identification results
(4) A comparison of classification accuracy among the maximum entropy model classifier, SVM, BP (Backpropagation) neural network, Bayes, and decision tree classifiers
When training the classifier, we performed experiments with both training set one and the improved training set two. We generated features with ICTCLAS, a Chinese word segmentation tool, selected a number of features using the $\chi^2$ (chi-square) method, and trained the parameters with the GIS (generalized iterative scaling) algorithm.
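The $\chi^2$ feature selection step can be sketched with the standard 2×2 contingency-table statistic. Word segmentation (ICTCLAS in the experiments) is not reproduced here: each message is assumed to be a set of already-segmented words, and the data and helper names are hypothetical. Because $\chi^2$ measures association in either direction, a word tied to the other class ("experiment") ranks as high as a rumor word ("cure").

```python
def chi_square(A, B, C, D):
    """Chi-square score of a term t for class c from a 2x2 contingency table.

    A: messages in c containing t     B: messages not in c containing t
    C: messages in c without t        D: messages not in c without t
    """
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def select_features(messages, labels, target, k):
    """Return the k words with the highest chi-square score for class `target`."""
    vocab = {w for m in messages for w in m}
    scores = {}
    for w in vocab:
        A = sum(1 for m, y in zip(messages, labels) if w in m and y == target)
        B = sum(1 for m, y in zip(messages, labels) if w in m and y != target)
        C = sum(1 for m, y in zip(messages, labels) if w not in m and y == target)
        D = len(messages) - A - B - C
        scores[w] = chi_square(A, B, C, D)
    return sorted(scores, key=scores.get, reverse=True)[:k]


messages = [{"cure", "rapidly"}, {"cure", "harmful"},
            {"experiment", "food safety"}, {"experiment", "pay attention"}]
labels = ["a1", "a1", "a2", "a2"]
print(select_features(messages, labels, "a1", 2))  # "cure" and "experiment", in some order
```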
We used the two training sets (as shown in Table 1) to train the classifier and ran experiments with different numbers of features and with different feature functions, both original and improved. Figure 2 supports the following conclusions:
(1) As the number of features increases, classification accuracy rises gradually, but more features are not always better: once the number of features becomes large, classification accuracy decreases. Classification is best when 200 features are selected.
(2) The classification accuracy is clearly improved when using an improved training set.
(3) The classification accuracy is higher with an improved feature function.
To assess the validity of the maximum entropy model classifier, we drew an ROC (Receiver Operating Characteristic) curve for the experimental results (as shown in Table 2 and Figure 3), with the threshold set to 10 percent. We also calculated the AUC (Area Under the Curve) of this classifier: its value is 0.7954, which indicates that the classifier separates rumors from nonrumors reasonably well.
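An ROC curve and its AUC can be computed by sweeping a threshold over the classifier's scores and integrating with the trapezoidal rule. The scores and labels below are made up for illustration and do not reproduce the paper's AUC of 0.7954; the helper names are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold down through the scores.

    scores: estimated p*(a1|x) for each test message
    labels: 1 for rumor (positive class), 0 for nonrumor
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last message sharing this score (tie handling).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical classifier outputs
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))  # 8/9, about 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.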
To compare the rumor identification performance of different classifiers, we chose four commonly used methods (SVM, BP neural network, Bayes, and k-means, as shown in Table 3) and ran rumor identification experiments alongside the maximum entropy classifier. The following conclusions can be drawn from Figure 4:
(1) The rumor identification method based on the maximum entropy model clearly outperforms the BP neural network, Bayes, and decision tree identification methods.
(2) When the number of features is less than 200, the maximum entropy method with the improved training set and feature function is more accurate than SVM. When the number of features exceeds 200, the accuracy of both methods decreases, with a larger decrease for the method based on the maximum entropy model.

Conclusions
Research shows that rumor identification accuracy is relatively low for some information regardless of the identification method used, because the classification attributes of such information are fuzzy. When such information is classified into two categories, the difference between its probabilities of belonging to each class is relatively small, so its classification accuracy is relatively low. We calculated the classification accuracy of samples for which $|p^*(a_1 \mid x) - p^*(a_2 \mid x)|$ is large, that is, samples with a significant difference between the probabilities of belonging to $a_1$ and $a_2$; their classification accuracy is 92.3%. Based on these findings, we further grade information and make the following recommendations. When $p^*(a_1 \mid x) - p^*(a_2 \mid x) > 0.5$, the information can be considered a rumor, and official warnings should be released to alert users to it. When $0.2 < p^*(a_1 \mid x) - p^*(a_2 \mid x) < 0.5$, the probability that the information is a rumor is relatively small; we can track it to detect rumors in a timely manner. If $|p^*(a_1 \mid x) - p^*(a_2 \mid x)| < 0.2$, classification accuracy is low, and other rumor identification methods are needed to improve accuracy.

In this paper, we studied a rumor identification method for the social network environment. Based on information entropy, we used a maximum entropy classifier to detect rumors. This classifier uses Chinese word segmentation software to generate information features and improves the feature function of the ordinary maximum entropy model. We solved the problem of "sparse features" using additive smoothing. We performed experiments examining the impact of the training set, the feature function, and the number of features on the performance of this classifier, and we compared it with other commonly used text classifiers. Experiments show that the improved training set and feature function increase the accuracy of rumor identification, and that the accuracy of this method is higher than that of other common methods. However, the accuracy of the classifier clearly decreases when the model has more features, meaning that it is sensitive to the number of features. Our main task in further research is to reduce this sensitivity; we also hope to improve the feature function and expand the scale of our experiments.

Table 1: Micro-average accuracy comparison among different training sets.

Table 2: ROC statistics of the maximum entropy model.

Table 3: Micro-average accuracy comparison among different classification approaches.
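The three-band recommendation rule from the Conclusions can be sketched as a small function of the margin $p^*(a_1 \mid x) - p^*(a_2 \mid x)$. The text does not say what happens at exactly 0.2 or 0.5, or when the margin is at or below -0.2 (information leaning toward nonrumor), so this sketch leaves those cases unassigned rather than guessing. The function name `triage` and the action strings are hypothetical.

```python
def triage(p_rumor, p_nonrumor):
    """Map the margin p*(a1|x) - p*(a2|x) to the actions recommended in the text."""
    margin = p_rumor - p_nonrumor
    if margin > 0.5:
        return "rumor: release an official warning"
    if 0.2 < margin < 0.5:
        return "possible rumor: track its spread"
    if abs(margin) < 0.2:
        return "uncertain: use another identification method"
    # Boundary values and margins <= -0.2 are not covered by the stated rule.
    return "not covered by the rule"


for p in (0.8, 0.65, 0.55, 0.1):
    print(p, triage(p, 1 - p))  # two classes, so p*(a2|x) = 1 - p*(a1|x)
```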