Survey on Astroturfing Detection and Analysis from an Information Technology Perspective



Introduction
With the rapid development of the Internet, more people are communicating with each other through network applications and services. Recently, how people work, keep in contact, acquire information, and shop has been greatly influenced by e-commerce and, in particular, by the emergence of social networks (e.g., Facebook, Twitter, WeChat, and Weibo). Among these network applications and services, there widely exists a kind of suspicious online behavior, namely, astroturfing, which mostly appears in business and political events because others' opinions often have an important impact on an individual's impression of a certain subject [1]. In other words, the attitudes and opinions of online users are very likely to be affected by other users. Astroturfing is hard for ordinary users to be aware of, and it poses a challenge to distinguishing truth from falsehood. Therefore, it is urgent to figure out how to detect and combat astroturfing. Researchers from both IT and sociology are highly interested in studying astroturfing.
In the current paper, we study astroturfing mainly from an IT perspective rather than a sociology perspective. Generally speaking, IT covers any form of technology, i.e., any equipment or technique used by a company, institution, or any other organization that handles information. Thus, our main focus is to design algorithms and technologies that effectively detect online astroturfing and help users quickly identify potential online astroturfers. Furthermore, we give a comprehensive and practical discussion from a technical perspective rather than from a social analysis view.
Unfortunately, as a kind of suspicious online behavior, astroturfing is closely related to, and easily confused with, other suspicious online behaviors: traditional spam [4,5], fake reviews [6,7], social spam [8][9][10][11][12], and link farming [13,14]. Hence, we should first find out the essential characteristics of astroturfing. Then, we examine how to design a semiautomatic or automatic computer algorithm and how it can be carried out naturally for different data sizes. In short, from the IT perspective, we hope that the computer has the learning ability to effectively detect unknown astroturfing activity in big social data. In this paper, we summarize astroturfing detection and analysis in terms of the astroturfing features, the basic methods, evaluation, and applications. The structure of this article is as follows: Section 2 discusses the astroturfing features. Section 3 presents the learning-based astroturfing detection methods and describes the effective detection evaluations. Section 4 shows the astroturfing detection applications. Finally, future directions are envisaged in Section 5.

Feature in Definition.
The group or individual involved in the action of "astroturfing" is called an "astroturfer" or "Internet water army." It is very difficult to define astroturfing in a precise and accurate way. Therefore, several descriptive explanations of "astroturfing" are summarized as follows: (i) The behavior of covering up the sponsors of a particular organization, such as in public relations, advertising, and politics, so that the behavior seems to originate from grassroots participants [15]. (ii) The "fake" grassroots participants are established by public organizations [16]. Astroturfing occurs when a large number of people are employed to post, via various public social channels, employer-mandated statements or claims that they do not subjectively support [17,18]. (iii) The expensive fraud at the national level is explained as follows [19]: to convince the public, astroturfing propagates "fake" claims among the public. Furthermore, it is costly to employ astroturfers and spread the employer-mandated claims. Hence, the influence of astroturfing is constrained by financial support [20].
From these explanations of "astroturfing," we generalized the following feature keywords: (i) Money business: if astroturfing exists, there must be an employer who offers money indirectly to the lobbying firms or directly to the grassroots. (ii) Content effect: any astroturfing aims to achieve its effect in a timely manner by posting a large number of employer-mandated opinions to public channels or by improving the corresponding search rankings so that the employer-mandated claims obtain wider public attention. The content consists of two parts: claims and posts.
Therefore, the definition of the concept features of astroturfing should consist of two parts, namely content effect and money business. Furthermore, these two parts can be utilized to guide astroturfing detection. This can be simply represented as follows: (1) In the aforementioned formula, V(money business) and V(content effect) represent the feature sets of money business and content effect, respectively. However, from the IT perspective, the money business feature is very difficult to mine because much of the evidence is missing if we rely on open Web data alone, whereas the content effect feature can be obtained from the public Internet. Thus, all IT research studies on astroturfing belong to the category of mining and utilizing the content effect feature.

Feature in Suspicious Behavior Category.
In the category of suspicious online behaviors, astroturfing has some similarities to, as well as differences from, other known suspicious behaviors, such as traditional spam [4], fake reviews [6], social spam [8,9], and link farming [13]. We compare them in Table 1.
As observed in Table 1, we select five aspects for the comparison. The application indicates the carrier of a suspicious online behavior. Among various suspicious behaviors, astroturfing mainly appears in business-to-consumer (B2C) and customer-to-customer (C2C) e-commerce applications, such as TaoBao, Amazon, and Netflix. In addition, astroturfing also appears in various recent social networks, such as Weibo, Facebook, and Twitter. In comparison, traditional spam appears in email and short message service (SMS) applications alone.
As for the participant aspect, astroturfing is carried out by an artificial intelligence (AI) program or an individual, via an online astroturfing platform or in other online ways that can achieve the required effects.
Similarly, fake reviews, social spam, and link farming also utilize an AI program or an individual. The time denotes the participating time within the particular task cycle. In an astroturfing mission, generally speaking, the duration of astroturfing behavior is not long. Usually, it takes a few hours or a few days to generate sufficient public influence through a series of astroturfing operations. However, the link farming behavior usually needs more time.
The participant visual refers to whether the participation in the suspicious behavior is visible to normal social network users. To achieve the purpose of confusing the public, the participating behavior of astroturfing must be visible to the public. In addition, the scale represents the actual number of users participating in such abnormal network behaviors. In most cases, an effective astroturfing campaign involves at least a few hundred astroturfers.
Hence, from the suspicious online behavior category, astroturfing's main features cover those of the fake review and social spam behaviors.
This can be simply represented as follows: (2) Hence, we can exploit the detection approaches for fake reviews and social spam to detect astroturfing.

Feature in Social Application.
As mentioned above, exploiting the detection approaches for fake reviews and social spam helps detect astroturfing. We now discuss the features in the operable actions of social applications. We try to generalize a formula that calculates the astroturfing probability through an activation function $H(s)$, which is as follows: where $x_i$ is the $i$th feature and $\omega_i$ is the corresponding weight. It is noted that the like action $m_{like}$ and the comment action $n_{comment}$ are key to generating the content effect, and hence, these two action features can be composed and assigned bigger weights. The composed feature is calculated as follows: where $M_0$ and $N_0$ are the thresholds of the like action number $m_{like}$ and the comment number $n_{comment}$, respectively, for a specific poster. Then, we describe the common features $x_i$ in real social applications and list the corresponding notation for all features and their parameters in Table 2.

Like
2.4.1. Following Similarity. The following similarity $F_{sim}$ of the accounts $u_i$ and $u_j$ can be calculated as follows: where $F(u)$ denotes the original tweets set for the account $u$.
Astroturfing has low following similarities.

Low Followers.
Astroturfing accounts have fewer followers, while they follow more accounts. Thus, the index of low followers $F_{low}$ can be calculated as follows: Generally, astroturfing has a relatively high $F_{low}$.

Comment
2.5.1. Percentage of Replies. Astroturfers hardly read or reply to others' comments; they prefer posting new comments to reading and replying. Thus, we can calculate the ratio of the number of replies to the total number of comments from the same user as follows: $p = \frac{\text{number of replies}}{\text{total number of comments}}$, where $p$ represents the ratio. Astroturfing has a low value of $p$.
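As a minimal sketch of the reply ratio $p$, the snippet below counts replies in one user's comment history. The record layout and the `is_reply` field are hypothetical; a real platform would expose this information in its own way.

```python
def reply_ratio(comments):
    """Return p = replies / total comments for one user.

    `comments` is a list of dicts; the boolean key "is_reply" is a
    hypothetical field marking whether a comment answers another one.
    """
    if not comments:
        return 0.0  # no activity: define p as 0 to avoid division by zero
    replies = sum(1 for c in comments if c["is_reply"])
    return replies / len(comments)


# Toy history: three fresh posts and a single reply.
history = [
    {"text": "Great product!", "is_reply": False},
    {"text": "Best shop ever", "is_reply": False},
    {"text": "Buy it now", "is_reply": False},
    {"text": "@bob thanks", "is_reply": True},
]
print(reply_ratio(history))
```

A low value of `reply_ratio` is only one signal and would normally be combined with the other features in this section.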

The Ratio of Similar Comments.
To minimize their own workloads and obtain the maximum benefits, astroturfers often duplicate their own published comments or copy other people's comments when posting false comments. Their ratio of similar comments is therefore high.
Retweet
2.6.1. Retweet Similarity. The two retweets compared are for the same tweet and are created within a threshold time window. The retweet similarity $RT_{sim}$ between the two accounts $u_p$ and $u_q$ can be computed as follows: where $RT(u_p) = \{(u_p, T_1, tid_1), (u_p, T_2, tid_2), \ldots, (u_p, T_n, tid_n)\}$, $T_i$ refers to the moment of retweeting, and $tid_i$ represents the retweeted ID of $u_p$.

Retweet Time Distribution.
As the tweet is constantly exposed to astroturfers, it is continually retweeted. As a result, the average retweet time for astroturfing is much longer than that for normal Internet users. The astroturfing tweets have a higher standard deviation and the lowest kurtosis. A near-zero skewness is also noticed for most astroturfing tweets, suggesting that these tweets are retweeted evenly over time.
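The statistics named above (mean, standard deviation, skewness, and kurtosis of the retweet times) can be sketched as follows. The input, a list of delays in seconds between the original tweet and each retweet, is an assumed representation, and population moments with excess kurtosis are one possible convention.

```python
import math


def retweet_time_stats(delays):
    """Return mean, standard deviation, skewness, and excess kurtosis.

    `delays` holds the time (e.g., seconds) between the original tweet
    and each retweet. Population (biased) moments are used.
    """
    n = len(delays)
    if n == 0:
        raise ValueError("at least one retweet delay is required")
    mean = sum(delays) / n
    m2 = sum((d - mean) ** 2 for d in delays) / n
    m3 = sum((d - mean) ** 3 for d in delays) / n
    m4 = sum((d - mean) ** 4 for d in delays) / n
    if m2 == 0:
        # All retweets at the same moment: shape statistics are undefined,
        # so report zeros by convention.
        return {"mean": mean, "std": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (normal = 0)
    }


stats = retweet_time_stats([1, 2, 3, 4, 5])
print(stats)
```

Evenly spread delays, as in the toy input, give zero skewness and a negative excess kurtosis, which matches the pattern the text attributes to astroturfing tweets.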

The Most Dominant (MD) Application's Percentage.
Third-party Twitter applications are often used to produce retweets for collusion-based astroturfing services. The percentage of retweets generated by the MD application among all retweets can be calculated as follows: $RT_{ratio} = \frac{\text{dominant application retweets}}{\text{total retweets}}$.
On average, approximately 90% of the crowdturfing retweets and nearly 40% of the normal retweets are generated by dominant applications.
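A short sketch of the dominant application ratio is given below. It assumes that each retweet record carries the name of the client application that produced it; the application names are placeholders.

```python
from collections import Counter


def dominant_app_ratio(retweet_sources):
    """Share of retweets produced by the most frequently used application.

    `retweet_sources` lists, for one tweet, the client application that
    generated each retweet (an assumed input format).
    """
    if not retweet_sources:
        return 0.0
    counts = Counter(retweet_sources)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(retweet_sources)


sources = ["promo_app"] * 9 + ["web_client"]
print(dominant_app_ratio(sources))
```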

The Number of Unreachable Retweeters.
Most astroturfing users do not follow the original poster of a tweet they retweet, since the astroturfing services randomly recommend tweets to arbitrary users without considering the relationships between these users. Approximately 80% of the astroturfing tweets have about 80% "unreachable" retweeters. As an identifying feature, the proportion of "unreachable" retweeters for astroturfing is relatively high.

Average Interval Time of Posts.
Normal users are considered to be less aggressive when posting comments, whereas astroturfers are not interested in online discussion but are keen to complete the task in the least amount of time, which suggests a shorter average interval between posts from paid posters. Thus, for the same user, we calculate the mean interval between adjacent comments within each single episode and then average these means over all episodes. The result indicates that 60% of the potential paid posters post at a pace of 200 seconds per interval, whereas only 40% of the normal users post comments at this speed.
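The two-level averaging described above can be sketched as follows. How comments are grouped into episodes is not specified here, so the sketch simply assumes the episodes are already given as lists of posting timestamps in seconds.

```python
def mean_posting_interval(episodes):
    """Average, over episodes, of the mean gap between adjacent comments.

    `episodes` is a list of episodes for one user; each episode is a list
    of posting timestamps in seconds (assumed to be pre-segmented).
    Returns None when no episode contains at least two comments.
    """
    per_episode_means = []
    for timestamps in episodes:
        ordered = sorted(timestamps)
        if len(ordered) < 2:
            continue  # a single comment has no interval
        gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
        per_episode_means.append(sum(gaps) / len(gaps))
    if not per_episode_means:
        return None
    return sum(per_episode_means) / len(per_episode_means)


user_episodes = [[0, 100, 300], [1000, 1200], [5000]]
print(mean_posting_interval(user_episodes))
```

In the toy input, the first episode has a mean gap of 150 seconds and the second 200 seconds, so the user-level value is their average.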

The Posting Frequency.
It refers to the average number of posts generated per day and is computed as follows: $nb_{/day}(posts) = \frac{nb_{total}(posts)}{age_{days}(account)}$.
Astroturfing has a high posting frequency.

Notation | Meaning
$u_i$ | Account
$p$ | The percentage of replies; astroturfing has a relatively low value
$P$ | $P_{p/q}$ is the probability that the word $w_{p/q}$ appears, and $P_{p,q}$ is the joint probability
$s$ | The consecutive sentence
$v$ | Semantic vector representation
$k$ | The piece of a certain sentence; $K$ is the total number of pieces
$l$ | The total quantity of the sentences in the $k$th piece
$F_{sim}$ | Following similarity; $F_{sim}(u_i, u_j)$ denotes the following similarity of the accounts $u_i$ and $u_j$
$F$ | Original tweets set; $F(u)$ denotes the original tweets set for the account $u$
$F_{low}$ | Low followers; astroturfing has a relatively high value
$RT_{sim}$ | Retweet similarity; $RT_{sim}(u_p, u_q)$ denotes the retweet similarity between the accounts $u_p$ and $u_q$
$RT$ | $RT(u_p) = \{(u_p, T_1, tid_1), (u_p, T_2, tid_2), \ldots, (u_p, T_n, tid_n)\}$; $T_i$ is the retweeting time, and $tid_i$ is the retweet ID
$RT_{ratio}$ | The most dominant application's percentage
$nb_{/day}$ | The posting frequency; astroturfing has a relatively high value
$Cl_{rat}$ | The number of received clicks; astroturfing has a relatively low value
$P(w_q \mid w_p)$ | The transition probability from the word $w_p$ to $w_q$
$PTP_{i,:}$ | Sentence transition; astroturfing has a relatively high value
$O_{p,q}$ | Word cooccurrence
$sco_{as}(s_1, s_2)$ | The average score of two consecutive sentences $s_1$ and $s_2$
$sco_{bs}(s_1, s_2)$ | The best score of two consecutive sentences $s_1$ and $s_2$
$sim(s_i, s_{i+1})$ | Pairwise sentence similarity
$SD$ | Semantic dispersion; astroturfing has a relatively high value

[...] be small. We calculate the proportion of the total number of clicks to the retweet number per tweet as follows: $Cl_{rat} = \frac{\text{number of received clicks}}{\text{number of retweets per tweet}}$. (11) It is found that, for approximately 90% of the links within the tweets posted by astroturfers, the number of clicks received is smaller than the total number of retweets, implying that the retweets are performed by astroturfing accounts in other ways.

Active
The user ID usually changes in astroturfing. The user ID is normally discarded when a mission is completed, and another ID is adopted for a new mission. Thus, the number of active days of an online user is analyzed. The result indicates that the percentage of potential paid users is highly similar to that of common users in the groups with 1, 2, 3, and 4 active days. However, a few normal users participate in the discussion for 5 days or more, whereas no astroturfer is active for more than 4 days.

Sentence Transition.
Given $W$, which denotes the corresponding word set, we calculate the transition probabilities from the word $w_p$, which form the $p$th row of the matrix $PTP$; it can be calculated as follows: where $P(w_q \mid w_p)$ denotes the transition probability from the word $w_p$ to $w_q$. The $PTP$ value of the composite comments is larger than that of the real comments.

Word Cooccurrence.
There are cooccurrence patterns for words between two adjacent sentences. For the words $w_p$ and $w_q$, the cooccurrence score can be defined as $O_{p,q} = \log\left(\frac{P_{p,q}}{P_p P_q}\right)$, in which $P_{p/q}$ denotes the probability that the word $w_{p/q}$ appears, and $P_{p,q}$ denotes the joint probability.
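The cooccurrence score has the form of pointwise mutual information. The sketch below estimates it from adjacent sentence pairs; estimating the probabilities as relative frequencies over the pairs is an assumption of this example, since the estimation procedure is not stated here.

```python
import math
from collections import Counter


def cooccurrence_scores(adjacent_pairs):
    """Return {(w_p, w_q): O_pq} for words seen in adjacent sentences.

    `adjacent_pairs` is a list of (first_sentence, second_sentence)
    tuples, each sentence given as a list of tokens. P_p, P_q, and P_pq
    are estimated as relative frequencies over the pairs (assumption).
    """
    n = len(adjacent_pairs)
    first_counts, second_counts, joint_counts = Counter(), Counter(), Counter()
    for first, second in adjacent_pairs:
        first_words, second_words = set(first), set(second)
        first_counts.update(first_words)
        second_counts.update(second_words)
        joint_counts.update((p, q) for p in first_words for q in second_words)
    scores = {}
    for (p, q), joint in joint_counts.items():
        p_joint = joint / n
        p_first = first_counts[p] / n
        p_second = second_counts[q] / n
        scores[(p, q)] = math.log(p_joint / (p_first * p_second))
    return scores


pairs = [
    (["battery"], ["lasts"]),
    (["battery"], ["lasts"]),
    (["screen"], ["cracked"]),
    (["screen"], ["lasts"]),
]
scores = cooccurrence_scores(pairs)
print(scores[("battery", "lasts")])
```

Positive scores mark word pairs that appear together across adjacent sentences more often than independence would predict; negative scores mark the opposite.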

Average Score.
The average score of two consecutive sentences $s_1$ and $s_2$ can be represented as $sco_{as}(s_1, s_2)$, which can be calculated as follows:
2.10.4. Best Score. The best score of two consecutive sentences $s_1$ and $s_2$ can be represented as $sco_{bs}(s_1, s_2)$, which can be calculated as follows: For a reviewer with a sentence sequence $s_1, s_2, \ldots, s_n$, the coherence measure is defined based on the total average cooccurrence score over adjacent sentences, which can be calculated as $sco_r(r) = \frac{1}{n}\sum_{i=1}^{n-1} sco(s_i, s_{i+1})$. Synthetic comments are expected to yield SCO values that differ from those of normal users' comments.
2.10.5. Pairwise Sentence Similarity. This feature focuses on similar single words or topics that are shared by two adjacent sentences. For the user who gives the comment, the coherence score can be represented as the mean pairwise sentence similarity over all adjacent sentence pairs, which can be calculated as $\frac{1}{n-1}\sum_{i=1}^{n-1} sim(s_i, s_{i+1})$, with $sim(s_1, s_2) = \frac{2|s_1 \cap s_2|}{|s_1| + |s_2|}$.
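The similarity above is the Dice coefficient over the words of two sentences. A minimal sketch follows; it treats each sentence as a set of tokens, which is an assumption of this example (repeated words are counted once).

```python
def dice_similarity(s1, s2):
    """sim(s1, s2) = 2|s1 ∩ s2| / (|s1| + |s2|) on sets of tokens."""
    a, b = set(s1), set(s2)
    if not a and not b:
        return 0.0  # two empty sentences: define similarity as 0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_similarity(sentences):
    """Mean Dice similarity over the n-1 adjacent sentence pairs."""
    if len(sentences) < 2:
        return 0.0
    sims = [dice_similarity(x, y) for x, y in zip(sentences, sentences[1:])]
    return sum(sims) / (len(sentences) - 1)


review = [
    ["the", "phone", "is", "great"],
    ["the", "battery", "is", "weak"],
    ["shipping", "was", "fast"],
]
print(mean_pairwise_similarity(review))
```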

Semantic Dispersion.
Given the vectorized semantic representation of each sentence, the semantic dispersion quantifies how dispersed the content of a review is. Let $v_1, v_2, \ldots, v_n$ be the semantic vector representations of the sentences of a review; the semantic dispersion is defined as follows: where $centroid = \frac{1}{n}\sum_{i=1}^{n} v_i$. It is expected that synthetic reviews usually have a larger SD than truthful reviews.

Running length.
We count the total quantity of sentences in the $k$th piece, denoted as $l_k$, and the overall compactness of the review is measured by $\frac{1}{K}\sum_{k=1}^{K} l_k$, in which $K$ is the total quantity of pieces. This measure is denoted as the running length.

Learning-Based Astroturfing Detection Approach
To portray the character of spam completely, many researchers have started to use ensemble models, which use many kinds of information to train a classifier for spam detection. These approaches are mainly summarized into three parts: supervised, semisupervised, and unsupervised learning.

Supervised Learning-Based Detection Approach.
Supervised learning-based methods can be employed to detect review spam. The underlying mechanism is to treat review spam detection as a classification task that separates the reviews into spam and nonspam comments.

Expectation-Maximization (EM) model.
The underlying idea of this method is to utilize the EM algorithm to construct a label prediction approach for the untagged tweets. Based on a tweet set that contains labelled and unlabelled tweets, the EM method is trained for a text classification task. Firstly, this model trains a Naïve Bayes classifier using the labelled tweets. In the E step, this trained classifier predicts labels for the unlabelled tweets. In the M step, the model reestimates the word features' probabilities on the basis of the predicted labels. The classifier is thus updated in the M step, and the updated classifier gives new predictions for the tweets in the next E step. The overall iterative process continues until the number of changes in the predicted labels is below 0.01% of the total unlabelled tweets.
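The E and M steps above can be sketched in plain Python. This is an illustrative version under stated assumptions, not the implementation of the surveyed work: it uses a multinomial Naive Bayes model with Laplace smoothing, hard label assignments in the E step, and tokenized tweets as input; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, vocab, alpha=1.0):
    """Fit multinomial Naive Bayes with Laplace smoothing (M step)."""
    prior, word_logp = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        word_logp[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return prior, word_logp


def predict_nb(model, doc):
    """Return the most probable class for one tokenized document."""
    prior, word_logp = model
    scores = {
        c: prior[c] + sum(word_logp[c][w] for w in doc if w in word_logp[c])
        for c in prior
    }
    return max(scores, key=scores.get)


def em_naive_bayes(labelled, unlabelled, tol=0.0001, max_iter=50):
    """Iterate E and M steps until fewer than tol * N labels change."""
    docs_l = [d for d, _ in labelled]
    y_l = [y for _, y in labelled]
    vocab = {w for d in docs_l + unlabelled for w in d}
    model = train_nb(docs_l, y_l, vocab)  # initial classifier: labelled data only
    if not unlabelled:
        return model, []
    guesses = [None] * len(unlabelled)
    for _ in range(max_iter):
        # E step: predict labels for the unlabelled tweets.
        new_guesses = [predict_nb(model, d) for d in unlabelled]
        changed = sum(old != new for old, new in zip(guesses, new_guesses))
        guesses = new_guesses
        if changed < tol * len(unlabelled):
            break
        # M step: reestimate word probabilities from all current labels.
        model = train_nb(docs_l + unlabelled, y_l + guesses, vocab)
    return model, guesses


labelled = [
    (["buy", "cheap", "now"], "spam"),
    (["cheap", "deal", "now"], "spam"),
    (["lunch", "with", "friends"], "ham"),
    (["nice", "weather", "today"], "ham"),
]
unlabelled = [["cheap", "now", "buy"], ["friends", "weather", "lunch"], ["deal", "buy"]]
model, guesses = em_naive_bayes(labelled, unlabelled)
print(guesses)
```

The default `tol=0.0001` mirrors the 0.01% stopping threshold mentioned in the text; a soft-assignment E step (class probabilities instead of hard labels) would be an equally reasonable design.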

Support Vector Machines (SVM) and Naive Bayes.
The semantic flow difference between synthetic and truthful reviews can be leveraged to identify review spam. Thus, the SVM and Naive Bayes classifiers can be used to distinguish whether a review is truthful or synthetic.

Behavior Feature Learning
This well-known method can be regarded as an iterative approach whose underlying idea is a merging task. Firstly, it trains various "weak" classifiers on the same training set; it then combines the different "weak" classifiers into a "stronger" one. This is realized by changing the data distribution: the weight of each sample is determined according to whether the sample was classified correctly and to the overall classification accuracy of the previous round. The next weak classifier is then trained on the data set with the updated weights. Thus, the final decision classifier is obtained by merging the trained weak classifiers.
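The reweighting and merging procedure described above matches the general boosting scheme. The sketch below uses the standard AdaBoost update with threshold stumps as weak classifiers; both choices are assumptions made for illustration, since the specific algorithm and weak learner are not named here.

```python
import math


def stump_predict(x, feature, threshold, sign):
    """Weak classifier: `sign` if x[feature] <= threshold, else -sign."""
    return sign if x[feature] <= threshold else -sign


def best_stump(X, y, w):
    """Choose the stump with the lowest weighted training error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                err = sum(
                    wi
                    for wi, x, yi in zip(w, X, y)
                    if stump_predict(x, feature, threshold, sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def adaboost(X, y, rounds=3):
    """Train `rounds` stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start from a uniform sample distribution
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, feature, threshold, sign))
        # Raise the weights of misclassified samples, lower the others.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(x, feature, threshold, sign))
            for wi, x, yi in zip(w, X, y)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, x):
    """Merge the weak classifiers by a weighted vote."""
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1


# One-dimensional toy data that no single stump can separate.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
print([ensemble_predict(ensemble, x) for x in X])
```

On the toy data, each single stump misclassifies at least two samples, while the weighted vote of three stumps recovers all training labels.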

K-Nearest Neighbors.
This algorithm is a nonparametric method that can be utilized for tasks such as regression and classification. Its input consists of the $k$ training samples that are closest to the query sample in the feature space. The constant $k$ is defined by the user, and in the classification phase, the unlabelled sample is assigned the most frequent label among its $k$ closest samples. For the behavior-based detection of astroturfing, the aim is to differentiate between two types of tweets: those receiving retweets from astroturfing accounts and those receiving retweets from normal accounts.
Four novel retweet-based features were found by Song, Lee, et al. [21]. They enable us to discriminate astroturfing tweets from others. The four features are (1) the distribution of retweet time, (2) the most dominant application proportion, (3) the total quantity of unreachable retweeters, and (4) the amount of received clicks. Based on these retweet features, the authors constructed a K-Nearest Neighbors classifier and evaluated the accuracy of this model against the ground truth.
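A minimal K-Nearest Neighbors classifier over retweet-based features might look as follows. The three-dimensional feature vectors (dominant application ratio, unreachable retweeter ratio, click ratio) and all numeric values are invented for illustration and are not data from [21]; Euclidean distance is an assumed metric.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs. Distances are
    Euclidean, so features should be on comparable scales.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical features per tweet:
# [dominant application ratio, unreachable retweeter ratio, click ratio]
train = [
    ([0.90, 0.80, 0.10], "astroturfing"),
    ([0.95, 0.85, 0.05], "astroturfing"),
    ([0.85, 0.75, 0.20], "astroturfing"),
    ([0.40, 0.20, 1.50], "normal"),
    ([0.35, 0.10, 2.00], "normal"),
    ([0.50, 0.30, 1.20], "normal"),
]
print(knn_predict(train, [0.88, 0.80, 0.15]))
```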

Logistic Regression.
Using probabilities estimated by the logistic function, this algorithm measures the relationship between the categorical dependent variable and one or more independent variables. In this context, the logistic function is the cumulative distribution function of the logistic distribution. Therefore, this method can address the same group of problems as probit regression with a similar technique, where probit regression utilizes the cumulative normal distribution instead.
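The contrast between the two link functions can be made concrete with a short sketch. The weights, bias, and feature values below are placeholders rather than fitted parameters.

```python
import math


def logistic_cdf(z):
    """Cumulative logistic distribution, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probit_cdf(z):
    """Cumulative standard normal distribution (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict_probability(weights, bias, features, link=logistic_cdf):
    """Map a weighted feature sum to a probability through `link`."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return link(z)


# Placeholder parameters for two features.
weights, bias = [1.5, -0.5], -0.25
features = [0.8, 0.3]
print(predict_probability(weights, bias, features))
print(predict_probability(weights, bias, features, link=probit_cdf))
```

Both links map a score of zero to a probability of 0.5 and differ mainly in how quickly the tails approach 0 and 1.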

Support Vector Machines (SVM).
Chen et al. [22] investigated a web forum and discovered that comment spam and its senders have some common features, such as the proportion of spam publication, publisher ID for spam, first poster, reply comment, publish timing, and tweet activity. The relationship between different spam publishers is obtained in their research. They also built an SVM classifier with an RBF kernel to detect spammers.
There are two important hyperparameters in the radial basis function (RBF) kernel-based SVM that need to be optimized, namely, $C$ and $\gamma$. During model learning, a grid search over $C$ and $\gamma$ is carried out by applying repeated five-fold cross-validation to the training set with the F measure as the optimization criterion, which can be concluded as follows:

Random Forests.
This model can be regarded as a kind of ensemble learning approach designed for classification, regression, and other machine learning tasks. A random forest is realized by constructing a large number of decision trees during training. It outputs the class selected by most trees (classification) or the average prediction of the individual trees (regression).
Lee et al. [23] proposed a comprehensive analysis of astroturfing and trained a random forest-based classifier to detect astroturfers. They identified a few valuable features, such as profile features, content features, and social network features. Accordingly, the authors calculated the feature values of each user in the training and validation sets. With random forest, a popular classification algorithm, Lee et al. developed a classifier that can predict whether a user is normal.

Neural Autoencoder Decision Forest.
It has been proven that the autoencoder is a robust algorithm that can give an unsupervised description of the feature patterns. A random forest can be regarded as a set of multiple decision trees that effectively mitigates the problem of overfitting. Extensive experiments show that this method performs well in real-world practice.
Dong et al. [24] proposed a unified model that is trainable end-to-end, based on the characteristics of the autoencoder and the random forest. In this model, Dong et al. utilize the hidden representation of the features generated by the autoencoder. The whole model is jointly trained by two models, namely, the stochastic decision tree model and the differentiable one. The final prediction is generated by the decision forest.

Semisupervised Learning-Based Detection Approach
Evidence from other areas suggests that the learner accuracy can be considerably enhanced by combining the unlabelled data with a small amount of labelled data compared to methods that are completely supervised.

FakeGAN Model.
The lack of substantial ground truth poses a major challenge to classification methods for deceptive review detection. Hence, Aghakhani et al. [27] proposed the FakeGAN system, in which a learning approach based on a semisupervised neural network is employed for the first time to detect deceptive spam. Unlike standard GAN models, FakeGAN adopts three models in total: two discriminators and one generator. In reinforcement learning (RL) terms, the generator is modelled as a stochastic policy agent, and a Monte Carlo search algorithm is utilized in the discriminator to estimate and pass the intermediate action value, which acts as the RL reward for the generator.

Behavior Feature Learning
4.2.1. C4.5. For the given data set, each tuple can be represented as a group of attribute values, and each tuple belongs to one of several mutually exclusive classes. The purpose of C4.5 is to learn a mapping from the attributes to the classes, and this mapping can be utilized to classify new unlabelled objects.
In the detection of astroturfing, Xu et al. [28] presented an analysis of the whole system of spam attack from multiple perspectives. They used the profile attributes, QA attributes, and SN attributes of the users as features to train a C4.5 classifier for detecting spammers.

Unsupervised Learning-Based and Other Detection Approach.
Because accurately labelling a review spam dataset is challenging, supervised learning is sometimes inapplicable. Unsupervised text mining models have therefore been proposed.

Latent Dirichlet Allocation (LDA)-Based Model.
LDA is a document theme generation model used to analyze the topics studied by each user's cluster. This model takes a text corpus as the input and outputs $K$ themes. Each theme is a list of words ranked according to their relevance to this theme.
Dong et al. [29] realized the LDA model-based unsupervised topic-sentiment joint (UTSJ) probability model. To obtain the topic-sentiment joint probability distribution vector for each comment, UTSJ makes the first attempt to employ the Gibbs sampling algorithm to estimate the parameters of the maximum likelihood function offline. Furthermore, the UTSJ model is trained offline to obtain a random forest classifier and an SVM classifier separately. Extensive experiments show that the UTSJ model is clearly better than the other baseline models.
When detecting astroturfing with LDA, Yang et al. [30] used OpenCLAS and the words toolkit developed by Sogou to segment each message into individual words. Moreover, Yang et al. further improved the accuracy of LDA by combining each tweet and the corresponding review into an individual document. They also filtered out the ten most frequent words. Yang et al. applied LDA to a randomly sampled subset of the documents in the corpus. Extensive experiments show that LDA outperforms the state-of-the-art methods in this field.

MF Model.
Ma et al. [31] employed a matrix factorization model based on orthogonal nonnegative matrix trifactorization (ONMTF) to learn lexicon knowledge from the spam. They suggest learning external information at the topic level instead of at the word level. The underlying idea of this model is to cluster data instances based on the distribution of features. ONMTF can be realized by optimizing the following formula:

Security and Communication Networks
where $X$ denotes the context matrix, $U$ is the low-dimensional representation of the words, and $V$ is the low-dimensional representation of the users. They are nonnegative matrices. By adding a least squares penalty at the topic level to the ONMTF model, Ma et al. project the initial context information to the topic space.

Semantic Language Model (SLM).
Lau et al. [32] proposed the SLM, which can be regarded as an unsupervised model for text mining. SLM can be integrated into a semantic model to detect the fake reviews published by astroturfers. The underlying idea of this model is to establish an approximate computing method that indicates the degree to which a review is fake. More specifically, Lau et al. utilized SLM to evaluate the semantic overlap among different reviews. Beyond the unsupervised detection of review spam, Chen et al. also proposed high-order association mining to capture context-sensitive concept association knowledge. If the semantics of one review are extremely similar to those of another review, these two reviews are likely to be spam with high probability. Based on collected Amazon reviews, Lau et al. constructed a dataset on which an AUC of 99.87% is achieved.

DetectVC.
Liu et al. [33] employed a detection method named DetectVC that can realize robust and efficient detection of volowers and customers. DetectVC utilizes the inherent motivation and purpose of the volowers and the customers. This method combines the graph structure of the follower relationships among tweet users with the prior knowledge collected from the follower context.

Markov.
The Markov model can be regarded as a kind of stochastic model that can be utilized to model a randomly changing system. The Markov model assumes that the future state depends on the current state alone and has no relation to the events that happened before it. Generally speaking, this assumption enables reasoning and computation with the model that would otherwise be intractable. Thus, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property.
In the structure-based detection of astroturfing, Fakhraei et al. [34] proposed a method to detect abnormal users in multirelational social networks. A social network can be regarded as a time-stamped multirelational graph in which users are represented as vertices and the different activities between them are edges. The authors use a mixture of Markov models: Fakhraei et al. assumed that the actions of each user are generated by a mixture of Markov models. Specifically, each cluster of spam senders or nonspam senders is associated with a mixture component $y$. Conditioned on the cluster $y$ of a user, the authors assume that the action sequence of the user is generated by the corresponding Markov model. The joint probability can be calculated as follows: $y$ is the user's cluster, $x_i, \ldots, x_n$ is the action sequence, and $P(x_i \mid y)$ denotes the probability of $x_i$ when the cluster is $y$.
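To make the mixture idea concrete, the sketch below scores an action sequence under each cluster with a first-order Markov chain and picks the most likely cluster. The factorization (cluster prior, initial action probability, transition probabilities) is the textbook form and is an assumption here; it may differ from the exact formulation in [34]. All probabilities and action names are invented.

```python
import math


def sequence_log_joint(actions, prior, start, trans):
    """log P(y) + log P(x_1 | y) + sum_i log P(x_i | x_{i-1}, y)."""
    log_p = math.log(prior) + math.log(start[actions[0]])
    for previous, current in zip(actions, actions[1:]):
        log_p += math.log(trans[previous][current])
    return log_p


def most_likely_cluster(actions, clusters):
    """Return the cluster whose Markov chain best explains the sequence."""
    return max(
        clusters,
        key=lambda y: sequence_log_joint(actions, **clusters[y]),
    )


# Hypothetical two-cluster mixture over two action types.
clusters = {
    "spammer": {
        "prior": 0.2,
        "start": {"message": 0.8, "view": 0.2},
        "trans": {
            "message": {"message": 0.9, "view": 0.1},
            "view": {"message": 0.7, "view": 0.3},
        },
    },
    "legitimate": {
        "prior": 0.8,
        "start": {"message": 0.3, "view": 0.7},
        "trans": {
            "message": {"message": 0.2, "view": 0.8},
            "view": {"message": 0.2, "view": 0.8},
        },
    },
}
print(most_likely_cluster(["message", "message", "message"], clusters))
```

Working in log space avoids numerical underflow for long action sequences.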

Recurrent Convolutional Neural Network (RCNN) Model.
Zhang et al. [35] made the first attempt to propose a fake review detection method based on deceptive review identification by RCNN, namely DRI-RCNN. This method makes use of context and deep learning technology to identify fake reviews. Zhang et al. utilized the RCNN vector to represent each word in a review based on the deceptive and truthful context properties of reviews and on word embedding. Furthermore, the authors also developed a deep neural network that combines max pooling and a ReLU filter to detect fake reviews.

CrossSpot Model.
Jiang et al. [6] proposed CrossSpot, which can be regarded as a scalable search approach. CrossSpot finds dense suspicious areas in multimodal data and sorts them by their degree of suspiciousness. This method starts from a potentially suspicious block and iteratively updates it to determine the optimal set of values for mode $j$. Meanwhile, CrossSpot makes sure that the values included for all other modes stay the same. The updating process continues until convergence.

Graph-Model Based.
Numerous methods are based on graph models, especially the structure-based approaches in previous papers [8,13,34]. Graph-model-based techniques can be widely used in the detection of fake reviews, social spam [8,9], link farming [13], etc.
Ratkiewicz et al. [36] proposed a machine learning framework that combines features of the information-spreading network on Twitter, including topological, context-based, and crowdsourced features. This framework is designed to detect the spread of political misinformation at an early stage. To describe the information flow of the Twitter community, Ratkiewicz et al. constructed a directed graph in which each node denotes an individual user account and each edge indicates a retweet or follow operation.
Hu et al. [9] made the first attempt to analyze the sentiment differences between spammers and normal users, and designed an optimization formulation that incorporates sentiment information into the social spammer detection framework. They model content information under two constraints on the learned factor matrix U, and learn those constraints from social network information and from sentiment information. More specifically, Hu et al. constructed an undirected graph based on user sentiment information during sentiment modelling.
Shehnepoor et al. [37] realized a new framework named NetSpam. This framework uses spam features to model the review data set as a heterogeneous information network and maps spam detection to a classification problem within such networks.
Liu et al. [38] realized a classification method based on a complex probabilistic graph that addresses the problem of detecting abnormal users. To obtain an effective initial estimate for the nodes (reviews, authors, and products) in the graph, Liu et al. used an attention mechanism to train a neural network and learn multimodal embedding representations from the text and various other features. On the basis of this prior estimate for each node, the classification method constructs a heterogeneous graph to capture the relationships among different types of nodes.
The existing works in the astroturfing detection field take only one or two types of astroturfing objects into account, e.g., text comment, reviewer, reviewer group, and product. Noekhah et al. [39] proposed multi-iterative graph-based opinion spam detection (MGSD), a graph-based model. MGSD uses a multi-iterative algorithm that takes various factors into consideration to update the scores of the entities. Furthermore, to improve the detection accuracy of MGSD, Noekhah et al. combined feature fusion with machine learning algorithms to select more heavily weighted features and new features from various categories. Extensive experiments show that feature selection and feature fusion can improve the performance of astroturfing detection.
To summarize, we compare the learning mode, features, and underlying model of the different works in Table 3 to show "how" each work is executed.

Evaluation Criterion
There are many metrics that can be used to evaluate the performance of an astroturfing detection algorithm, for example, accuracy, F1 score, precision, recall, AUROC, and FPR. All of these metrics apply to classification models. When the ratio of spam posts is low, accuracy is not a strong metric because it is dominated by the majority nonspam class. Since a normal user's review should not be regarded as spam, a spam detection approach should have relatively high precision. In addition, to identify as much spam as possible, the approach should have high recall. For example, when a detection system uses manual classification to check the items flagged by an initial filter, the filter should favor high recall so that little spam is missed entirely, while normal reviews misclassified as spam can be corrected later. In general, neither precision nor recall alone takes priority. Thus, the evaluation metric is often taken to be the weighted harmonic mean of precision and recall, known as the F measure.
In Chen and Chen [22], the baseline for detecting confusing forum spam is generated by leveraging spreadsheets. Yang et al. [40] developed a measurement-based Sybil detector, for which the ground truth was provided by the Renren company. The Sybil detector is widely used on Renren, and it has detected over 100 thousand Sybil accounts. Sedhai and Sun [41] made use of 2104 manually labelled classes as the experimental ground truth. Moreover, KNN classification is used to expand the set of labels more efficiently.

Precision, Recall, and F1 Score.
Precision and recall are commonly used classification metrics. Precision is the proportion of samples classified as positive that are actually positive. Recall is the proportion of positive samples that are labelled correctly. The F1 score is the harmonic mean of precision and recall, which trades off these two metrics:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$
where TP denotes true positive, which means that a sample is classified as positive and is actually positive; FP denotes false positive, which means that a sample is classified as positive but is actually negative; and FN denotes false negative, which means that a sample is classified as negative but is actually positive. The weights of the two parts in the F1 score are equal. When a detection algorithm maximizes precision and recall at the same time, it performs well. Thus, an algorithm that performs reasonably well on both metrics is preferable to one that performs extremely well on one and poorly on the other [42].
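As a small worked example of these definitions, the sketch below computes precision, recall, and the F measure from confusion counts. The function name `precision_recall_fbeta` and the counts are hypothetical; `beta=1` gives the equally weighted F1 score, and other values of `beta` give the weighted F measure.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) computed from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Hypothetical detector A: balanced, precision = recall = 0.7.
balanced = precision_recall_fbeta(tp=70, fp=30, fn=30)

# Hypothetical detector B: perfect precision but recall of only 0.4.
lopsided = precision_recall_fbeta(tp=40, fp=0, fn=60)
```

With these made-up counts, the balanced detector reaches an F1 of 0.7, while the lopsided one reaches only about 0.57 despite its perfect precision, which is the behavior of the harmonic mean that the paragraph above describes.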

AUROC and AUPR.
AUROC is the area under the ROC curve, which plots how the TPR (true positive rate) changes as the FPR (false positive rate) changes; here, TPR denotes the proportion of positive samples that are correctly classified as positive, and FPR denotes the proportion of negative samples that are classified as positive. AUPR is the area under the precision-recall curve. In the work of Fakhraei et al. [34], in order to avoid overoptimistic estimates of the PR curve and the ROC, the authors used both AUROC and AUPR to estimate the performance of their method. However, Song et al. [21] and Wang et al. [48] used AUROC alone as the evaluation metric.
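The two areas can be computed directly from labels and classifier scores, as in the following sketch. It uses the rank interpretation of AUROC (the probability that a random positive is scored above a random negative) and average precision as one common step-wise estimate of AUPR; the function names and the toy data are assumptions for illustration only.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    # A tie between a positive and a negative counts as half a correct pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives, area = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            area += true_positives / rank  # precision at this recall step
    return area / total_positives


# Hypothetical imbalanced sample: 2 spam posts (label 1) among 8 posts.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
```

On this toy sample the AUROC is 11/12 while the average precision is 5/6, which illustrates why reporting both areas gives a less optimistic picture than AUROC alone when spam is rare.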

TPR, FPR, and FNR.
TPR and FPR have been explained above. FNR (false negative rate) indicates the proportion of positive samples that are classified as negative. In an actual experiment, we hope that the TPR is as large as possible, while the FPR and FNR are as small as possible.
Lee et al. [23], Barushka and Hajek [49], and Lee et al. [50] used the FPR and FNR as metrics to evaluate classifiers for spammer detection. Morales et al. [51] and Xu et al. [28] used the TPR and FPR as measures to evaluate the detection performance of their classifiers. Accuracy is defined as
$$\text{Accuracy} = \frac{TP + TN}{P + N},$$
where TN denotes true negative, and P and N denote the numbers of samples that are actually positive and actually negative, respectively. Morales et al. [51] and Wang et al. [52] used the ER as the experimental metric for evaluation. Meanwhile, Lee et al. [23], Dong et al. [24], Aghakhani et al. [27], Yang et al. [30], Zhang et al. [35], Li et al. [53], Dhingra and Yadav [54], and You et al. [47] used accuracy as a measure to evaluate the classifiers they built. Furthermore, as shown in Table 4, we compare several works under different evaluation criteria to analyze the evaluation standards in the field of astroturfing detection; the best performance under each criterion is shown in bold.
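The rate-based metrics in this subsection all follow from the four confusion counts, as the short sketch below shows. The function name `rates` and the two hypothetical detectors are assumptions; the example also shows why accuracy alone can mislead when spam is rare.

```python
def rates(tp, fp, tn, fn):
    """Return TPR, FPR, FNR and accuracy; assumes both classes are present."""
    positives = tp + fn  # samples that are actually positive (P)
    negatives = fp + tn  # samples that are actually negative (N)
    return {
        "TPR": tp / positives,
        "FPR": fp / negatives,
        "FNR": fn / positives,
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Hypothetical corpus of 1000 posts, 10 of which are spam.
# Detector A labels everything as nonspam; detector B catches 8 of the 10
# spam posts at the cost of 20 false alarms.
detector_a = rates(tp=0, fp=0, tn=990, fn=10)
detector_b = rates(tp=8, fp=20, tn=970, fn=2)
```

With these made-up counts, detector A has the higher accuracy (0.99 against 0.978) even though its TPR is zero, whereas detector B detects 80% of the spam with an FPR of about 2%.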

Applications
Astroturfing can disrupt the normal order of the network and bring many negative effects to society and people's lives. Hence, it is essential to design schemes that assist normal users, administrators, law enforcers, etc. Astroturfing detection can help users distinguish truth from falsehood and obtain the information that they really need. In social networks, astroturfing detection is important for multiple applications. This section covers several typical applications, for instance, the detection of astroturfing in social networks, astroturfing account recognition, and deceptive review identification.

Automatic Review Synthesis Model.
Morales et al. [51] leveraged the difference in semantic flow between synthetic and truthful reviews to identify review spam, and they used SVM and Naive Bayes classifiers to decide whether a review is truthful or synthetic. Positive reviews are generated automatically in their model by mixing up existing reviews.

Real-Time Detection System.
Detecting astroturfing by building a classifier is a typical application of astroturfing detection. Chen et al. [55] discuss the fundamental architecture and design of a detection system that identifies malicious behaviors and potential paid posters in real time. The purpose of their system is to identify potential paid posters and locate their user IDs during the information collection process.
This system can not only automatically collect data from different resources and websites and report the behavior of potential paid posters but also provide valuable information for analysts and online users. Four major components are involved: data crawler, scheduler, data analyzer, and database system.
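How the four components might fit together can be sketched as follows. This is only an architectural illustration under loud assumptions: the crawler is a stub that receives posts instead of fetching pages, the analyzer uses a made-up rule (flag users with at least `min_posts` posts in one batch), and all function names, table names, and thresholds are hypothetical rather than taken from [55].

```python
import sqlite3
from collections import Counter


def open_database():
    """Database system: an in-memory SQLite store for posts and suspects."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts(user_id TEXT, text TEXT)")
    db.execute("CREATE TABLE suspects(user_id TEXT PRIMARY KEY)")
    return db


def crawl(source):
    """Data crawler stub: here a source is already a list of (user_id, text)."""
    return list(source)


def analyze(posts, min_posts=3):
    """Data analyzer stub: flag users with at least min_posts posts in a batch."""
    counts = Counter(user for user, _ in posts)
    return sorted(user for user, count in counts.items() if count >= min_posts)


def run_once(sources, db):
    """Scheduler stub: visit every source once and store the results."""
    for source in sources:
        posts = crawl(source)
        db.executemany("INSERT INTO posts(user_id, text) VALUES (?, ?)", posts)
        for user in analyze(posts):
            db.execute("INSERT OR IGNORE INTO suspects(user_id) VALUES (?)", (user,))
    db.commit()
```

Keeping the analyzer behind a single function makes it straightforward to replace the toy rule with a trained classifier while the crawler, scheduler, and storage stay unchanged.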

Multiagent System.
Analyzing the distribution and behavior characteristics of astroturfing also helps to better understand and monitor astroturfing accounts. Zeng et al. [56] surveyed the behavior modes and policies of astroturfing on Internet forums. They constructed a multiagent system [57] and used a ground truth data set from online forums to conduct extensive experiments. Furthermore, they studied the factors that affect the influence of astroturfers. Zeng et al. found that astroturfers maximized their influence by adjusting their behavior policy dynamically, and that the effectiveness of their policy was highly related to the features of the users.
Zeng et al. developed a social network environment based on multiagent technology. This model has two different types of agents, namely astroturfers and users. These two types of agents use theme features and user features to evaluate the complex dynamics of cooperative coevolution within the multiagent system, which takes the accumulated polarities as its basis.

IWA Social Network.
Liu et al. [58] developed a new social network named the IWA social network. It consists of two types of nodes: core nodes, which are the IWAs themselves, and ordinary extended nodes that communicate with the IWAs. The IWA network is a kind of unnatural network, in which the core nodes communicate with others because of their own economic interests or other interactions.
Considering the features of the IWA, Liu et al. assume that the IWA social network consists of a group of members, each of whom controls several reputable accounts with huge numbers of followers. To preserve their ability to spread messages, each member disguises itself as a normal user.
Each astroturfing member has a main account through which it can contact others, including both astroturfing users and normal users. In this way, the IWA social network builds connections with normal users.

User Preference Graph.
Astroturfing users may provide deceptive reviews to interfere with the judgment of normal users. Identifying and filtering out deceptive reviews is therefore also important.
Li et al. [59] extracted both textual features and contextual features, and proposed a new user preference graph to describe the relationships among users. They incorporated both these features and the user preference relationships into a supervised learning framework and obtained more precise results when predicting deceptive answers. Figure 1(a) shows the general process in a question-answering thread: after a question is posted, other users give a few answers. Then, each user votes on each answer as "useful" or "unuseful" to indicate his or her evaluation. Furthermore, the questioner chooses an optimal answer. If two users show the same preference toward a target answer, then they have the same user preference and are linked by a user preference relationship. Li et al. extracted all the user preference relationships (shown in Figure 1(c), where each node denotes a user); an edge exists between two users if there is a user preference relationship between them.
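The construction of the user preference graph can be sketched from voting data alone. This is a simplified illustration: it links two users whenever they cast the same vote on the same answer, and it ignores the other preference signals of the thread (such as the questioner's choice of the optimal answer). The function name and the vote tuples are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_graph(votes):
    """Return undirected edges between users who voted alike on an answer.

    votes: iterable of (user, answer_id, label), label "useful" or "unuseful".
    """
    voters = defaultdict(set)
    for user, answer_id, label in votes:
        voters[(answer_id, label)].add(user)
    edges = set()
    for users in voters.values():
        # Sorting gives each undirected edge one canonical orientation.
        for first, second in combinations(sorted(users), 2):
            edges.add((first, second))
    return edges


# Hypothetical thread with two answers and three voters.
votes = [
    ("u1", "a1", "useful"),
    ("u2", "a1", "useful"),
    ("u3", "a1", "unuseful"),
    ("u1", "a2", "useful"),
    ("u3", "a2", "useful"),
]
graph = build_preference_graph(votes)
```

The resulting edge set can then be used as an additional relational signal next to the textual and contextual features of each answer.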

FakeGAN System.
The existing efforts in the field of spam detection mainly focus on constructing a supervised classifier according to the sentiment and lexical patterns of reviews. Aghakhani et al. [27], inspired by the application of neural networks to classification, developed the FakeGAN system. FakeGAN is the first attempt to employ a generative adversarial network (GAN) for a text classification task. Unlike the standard GAN model, which has a single discriminator, the FakeGAN model has two discriminators and one generator. Aghakhani et al. modelled the generator as a stochastic policy agent in reinforcement learning (RL) and used the Monte Carlo search algorithm to estimate the intermediate action value, which is passed to the generator as the RL reward.

Weakly Supervised Fake News Detection Framework.
Wang et al. [60] made the first attempt to propose a weakly supervised fake news detection framework (WeFEND), which can be regarded as a kind of reinforced weakly supervised learning. WeFEND obtains high-quality labelled samples, addressing the main challenge of detecting fake news with a deep learning model. WeFEND has three components: the annotator, the reinforced selector, and the fake news detector. Figure 2 demonstrates the framework of WeFEND.

Open Source or Prototype System
More applications have endorsed online reviews. To detect spammers in the network, the computer science department of California proposed a spammer detection system that uses a single real review as a template and replaces its sentences with other review sentences from the stored dataset. They tested the performance of the system using hotel reviews from New York City. The detection accuracy of this system is approximately 78%.
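The template-and-replace idea can be sketched in a few lines, for instance to build synthetic test data for a detector. The sketch is an assumption-laden illustration and not the system described above: the function name, the uniform random choice of positions and replacements, and the `seed` parameter are all hypothetical.

```python
import random


def synthesize_review(template_sentences, sentence_pool, n_replace, seed=0):
    """Replace n_replace sentences of a template with sentences from a pool."""
    if not 0 <= n_replace <= len(template_sentences):
        raise ValueError("n_replace must be between 0 and the template length")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    sentences = list(template_sentences)
    for position in rng.sample(range(len(sentences)), n_replace):
        sentences[position] = rng.choice(sentence_pool)
    return sentences
```

A detector can then be evaluated on how well it separates such synthesized reviews from the untouched templates.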
Fakhraei et al. [34] used k-gram features and probabilistic modelling with a mixture of Markov models to capture relational sequences. In addition, to improve reasoning and prediction over the noisy reporting system, Fakhraei et al. proposed a relational analysis model based on Hinge-Loss Markov Random Fields (HL-MRFs), a kind of highly scalable probabilistic graphical model. The authors used GraphLab Create and Probabilistic Soft Logic (PSL) to build their experimental prototype and ran extensive experiments to evaluate their solution. The experiments show that their model is effective and that integrating the multi-relational features of the social network improves prediction performance.
There are three components supporting this framework. The first component extracts graph features within each relation; the experimental results show that taking the multi-relational nature of the graph into account improves performance. For the second component, Fakhraei et al. took into account the action sequence of each user across these relations: they extracted k-gram features and used the mixture of Markov models to label spammers. Finally, for the third component, the authors developed an HL-MRF-based relational analysis model that performs reasoning on the basis of signals from the social network's reporting system.
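As a small illustration of the k-gram features used in the second component, the sketch below counts the contiguous subsequences of length k in a user's action sequence. The function name and the action labels are hypothetical, and the exact feature definition in [34] may differ.

```python
from collections import Counter


def kgram_counts(actions, k):
    """Count every contiguous subsequence of k actions in a sequence."""
    return Counter(tuple(actions[i:i + k]) for i in range(len(actions) - k + 1))


# Hypothetical action sequence of one user.
sequence = ["message", "message", "friend_request", "message", "message"]
bigrams = kgram_counts(sequence, 2)
```

The resulting counts can be laid out as a fixed-length vector over all observed k-grams and passed to any standard classifier.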

Crossing Domain.
The detection of astroturfing is an interdisciplinary research topic, and a main challenge is to apply a well-trained model to other target domains. Li et al. [61] trained their original model in the hotel domain and tested it in a restaurant domain and a hospital domain. However, compared with the performance in the original domain, the performance of their model decreased dramatically. Thus, the cross-domain detection of opinion astroturfing needs to be studied more deeply [62].

Missing Data.
Public astroturfing generally works underground, and the available data are not sufficient for studying astroturfers' behavior. Thus, research on astroturfers' behavior is difficult to carry out.

Optimization Method.
Recently, semisupervised approaches have mainly focused on co-training and PU learning. However, the classification performance of these methods is not good enough. Since constructing an astroturfing opinion spam dataset is difficult, the development of unsupervised learning approaches to identify opinion spam is very important [62]. In addition, there is now excellent work on applying neural network models to learn sentiment representations in Natural Language Processing (NLP) tasks. Thus, the effective design of neural network algorithms is a focus of future research.

Complexity of the Internet.
Astroturfing typically operates across platforms and channels, and it involves numerous sock puppets. This makes the behavior information of astroturfers fragmented and difficult to link together.

Evolutionary Astroturfers.
Online astroturfers not only imitate legitimate users' behaviors, such as the number and frequency of posted tweets, to avoid being identified [63], but also dynamically adjust their behavior policy to maximize their influence [56]; thus, learned astroturfer detection systems can quickly go stale.
There is extensive work in the field of astroturfing detection, but many challenges remain and many open tasks need to be researched in depth. Thus, we discuss possible directions for future research in the domain of astroturfing detection. The essence of astroturfing lies in the actors' profit-driven purpose and behavior patterns rather than in the content. As a result, behavior-driven analysis of suspicious patterns will be a significant driving force for detection methods. Combining behavior information from a wide range of domains will help to better understand suspicious behavior and distinguish it from normal behavior.
In short, different practical applications impose different requirements on the detection of suspicious behavior. Astroturfing detection can be realized by optimizing a suspiciousness score and finding the most suspicious part of large-scale behavior data. The principle is to prefer wrongly flagging many accounts over letting a single astroturfer escape (and to let the affected users complain), instead of excessively ensuring accuracy.

Cross-Domain Astroturfing Detection
Through finding the relationships among astroturfing behaviors across different domains, these correlations can be used to bring traditional e-mail spam and web spam detection into astroturfing detection algorithms and to further improve the detection effectiveness of social spam. For instance, web spam detection methods have widely studied the web link structure [64,65]. Moreover, applying such link-based methods to the social connections of online communities can clearly improve the effectiveness of social spam detection [56].

Utilized Temporal Information.
Temporal information is very significant in the detection of astroturfing, since astroturfing users usually post a huge number of messages within a few minutes, or post a large number of reviews with various accounts within a very short period of time. Thus, studying the behaviors correlated with astroturfing at the level of time features and predicting future astroturfing behavior is very meaningful. Meanwhile, this indicates that we need to adjust models according to temporal features and make them more comprehensive.
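A very simple temporal feature of this kind is whether an account posts many messages inside a short time window. The sketch below flags such bursts with a sliding window; the function name, the five-minute window, and the threshold of 20 posts are illustrative assumptions, and the sketch looks at each account separately rather than at coordinated posting across accounts.

```python
from collections import defaultdict, deque


def find_bursts(events, window_seconds=300, threshold=20):
    """Return accounts with at least `threshold` posts inside one time window.

    events: iterable of (account, unix_timestamp) pairs, in any order.
    """
    timestamps = defaultdict(list)
    for account, timestamp in events:
        timestamps[account].append(timestamp)
    flagged = set()
    for account, stamps in timestamps.items():
        window = deque()
        for timestamp in sorted(stamps):
            window.append(timestamp)
            # Drop posts that are older than the window length.
            while timestamp - window[0] > window_seconds:
                window.popleft()
            if len(window) >= threshold:
                flagged.add(account)
                break
    return flagged
```

In practice the window length and threshold would have to be tuned per platform, since legitimate activity levels differ widely between services.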

Protection of Privacy of Individual Online Posters.
In the Internet era, users' privacy is more easily compromised. Crowd workers on Amazon Mechanical Turk (MTurk) have called for "Our Privacy Needs to be protected at All Costs" [66]. How to prevent users' information from being leaked and how to protect the individual privacy of online posters will be a future challenge.
At present, several well-known websites have taken measures to protect user privacy. For example, Twitter selectively provides some data to the public (it provides only 1% of real user data for data mining [67]) in order to keep the exposure of sensitive user data within tolerable limits. Moreover, JD Mall and Taobao.com preprocess consumers' comments as follows. Before reviewing a product, users choose whether their comments are "public" or "anonymous." If the user chooses an anonymous comment, the system preprocesses the user information, replaces the user's real avatar with a default image, and masks the user's nickname with asterisks. For example, the system turns the user name "Hellosunshine" into "H***e". At the same time, the user's buyer show (the user's purchase lists and comments) is hidden from other users. In this way, the privacy of users can be protected while water army detection continues to work normally.
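The nickname masking in this example can be reproduced with a one-line rule: keep the first and last characters and replace everything in between with a fixed mask. The function name and the handling of very short names are assumptions, since the text only gives the "Hellosunshine" example.

```python
def mask_nickname(name, mask="***"):
    """Keep the first and last characters and hide the rest behind a mask."""
    if len(name) <= 1:
        # Assumed behavior for empty or one-character names: hide them entirely.
        return mask
    return name[0] + mask + name[-1]
```

Using a fixed-length mask, rather than one asterisk per hidden character, also avoids revealing the length of the original nickname.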
In addition, we find that a user who successfully forwards astroturfing messages may be a normal user who takes part in posting astroturfing messages unknowingly. In other words, such users have been deceived by the original astroturfing account. Therefore, further studies are required to design more generalized methods for solving the privacy protection problem.

Data Availability
All data generated or analyzed during this study are owned by all the authors and will be used for our future research. The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.