Mining Community-Level Influence in Microblogging Network: A Case Study on Sina Weibo

Social influence analysis is important for many social network applications, including recommendation and cybersecurity analysis. We observe that the influence of a community of multiple users outweighs individual influence. Existing models focus on individual influence, and few studies estimate community influence, which is ubiquitous in online social networks. A major challenge is that many factors, such as user influence, social trust, and user relationships, must be taken into account to model community-level influence. In this paper, aiming to assess community-level influence effectively and accurately, we formulate the problem of modeling community influence and construct a community-level influence analysis model. The model first eliminates zombie fans and then calculates the user influence. Next, it calculates the user final influence by combining the user influence with the willingness to diffuse theme information. Finally, it evaluates the community influence by jointly considering the user final influence, social trust, and the relationship tightness between users inside a community. To handle real-world applications, we propose a community-level influence analysis algorithm called CIAA. Empirical studies on a real-world dataset from Sina Weibo demonstrate the superiority of the proposed model.


Introduction
Community-level influence analysis is an emerging problem with applications in many fields, for example, recommendation systems [1,2], public opinion prediction [3], and cybersecurity analysis [4]. Many researchers are interested in analyzing social influence in social networks [5], but few assess influence at the community level. With the rapid spread of online social networks such as Twitter, Facebook, and Sina Weibo, large amounts of real-world data are produced, which support social influence analysis.
How to establish an effective model for analyzing community-level influence has become an important research question for online social networks. Community-level influence is greater than individual-level influence, but few researchers have studied it. Existing studies establish various social influence analysis models [6,7], but they study influence only at the individual level and mostly ignore the common influence pattern of a community that includes multiple nodes. Much has been achieved on individual-level influence, but most studies are based on static statistical methods [8][9][10][11], link analysis algorithms [12][13][14], or probabilistic models [15][16][17]. These studies do not consider whether a user is willing to receive or diffuse information or what role social trust between users plays, and they do not remove zombie fans. However, these factors are very important for analyzing social influence. Meanwhile, existing works on community-level influence focus on the influence strength between communities and ignore the problem of measuring the influence of a community itself. For example, Belák et al. [18] calculated the community-level influence by simply averaging the influence of all users in a community.
An important observation is that zombie fans contribute nothing to social influence, that the willingness of users to diffuse information affects the accuracy of the calculated social influence, and that social trust plays an important role in social influence. The degree to which user A trusts user B determines the influence of user B on user A: the more user A trusts user B, the more influence user B has on user A. Because user influence is the basis of community influence, a small error in the former leads to errors in the latter.
Aiming to assess community-level influence effectively and accurately, we construct a community-level influence analysis model. Based on this model, we propose a community-level influence analysis algorithm (CIAA for short). The main idea of our model is as follows. First, we eliminate the interference of zombie fans to make the results more accurate. Then, in calculating the user influence, we consider social trust and use the random walk method. In evaluating the user's theme information, the user mean willingness is calculated by exploring the content related to the user's theme information. We combine these two factors (the user influence and the user willingness to diffuse theme information) to calculate the user final influence. Finally, the community-level influence is calculated by jointly considering the user final influence, social trust, and the relationship tightness between users inside a community. Experiments are conducted on a real-world dataset crawled from Sina Weibo. Compared with the state-of-the-art algorithm (the averaging user influence algorithm [18]), the results show that our model evaluates community-level influence more effectively and accurately.
The contributions of this paper can be summarized as follows. (1) We formulate the problem of analyzing community-level influence and design a community-level influence analysis model. (2) We propose CIAA, a community-level influence analysis algorithm based on our model, which is effective and reliable for evaluating the community influence of microbloggers on Sina Weibo. (3) We conduct extensive experiments to assess the performance of the proposed model. Experimental results on the real-world dataset demonstrate the superiority of the proposed CIAA.
The rest of the paper is organized as follows. In Section 2, we summarize the related works. In Section 3, we propose the community-level influence analysis model, give an example to illustrate its working principle, and present the CIAA. In Section 4, we conduct experiments on the real-world dataset crawled from Sina Weibo and analyze the performance of the proposed approach. Finally, we state the conclusion and future work in Section 5.

Related Works
Since Katz and Lazarsfeld [19] found in the 1950s that social influence plays an important role in social life and decision-making, researchers in the computer field have spared no effort in studying the relevant problems. Popular users have been found to play an important role in innovation adoption, the propagation and guidance of public opinion, the formation and development of group behavior [5], and so on.
There has been a great deal of research on measuring individual-level influence [20,21], typically that of "opinion leaders." Existing methods can be categorized into three types: network structure based methods, user behavior based methods, and mutual information based methods. The network structure based methods include degree centrality [22], closeness centrality [23], betweenness centrality [24], eigenvector centrality [25], Katz centrality [26], PageRank [27], and clustering coefficient [28]. Node degree essentially captures the connection between a node and its neighbors. Methods based on node degree express this meaning intuitively, and their computational cost is lower than that of other methods [29]. These methods are widely used to measure users' influence in social networks. However, because they reflect only the connection between users and their neighbors, they capture local influence and cannot measure users' influence in the entire social network. For example, based on the community scale-sensitive maxdegree, Hao et al. [30] proposed an approach called CSSM for discovering influential users when placing advertisements. CSSM uses the degree centrality and the neighbors' degree to evaluate the influence of nodes (microbloggers). However, the algorithm does not consider the contribution of microblogs to user influence. Compared with the degree based methods, methods based on the shortest path (closeness centrality and betweenness centrality) can measure individual-level influence in the entire social network. Nevertheless, their computational complexity is higher than that of the degree centrality method. For example, based on text mining and social network analysis, Bodendorf and Kaiser [31] proposed an approach to detect opinion leaders in a directed graph of user communication relationships. It can predict the tendency of network opinion leaders via closeness centrality and betweenness centrality. Moreover, measuring individual-level influence along shortest paths assumes an ideal situation that is difficult to achieve in real-world application scenarios. Besides, methods based on random walks consider only the structural characteristics of a node and ignore its behavioral characteristics. For example, Xiang et al. [32] provided an understanding of PageRank and authority from an influence propagation perspective by performing random walks. However, they did not consider personal attributes in their account of PageRank or of the relationship between PageRank and social influence analysis. Zhu et al. [33] proposed a novel information diffusion model called CTMC-ICM, which introduces continuous-time Markov chain theory into the Independent Cascade Model. Based on the model, they proposed a new ranking metric called SpreadRank. Based on the continuous-time Markov process, Li et al. [34] proposed a dynamic information propagation model called IDM-CTMP to predict the influence dynamics of social network users. IDM-CTMP defines two other dynamic influence metrics and can predict the spreading coverage of a user within a given time period. Zhou et al. [35] established new upper bounds to significantly reduce the number of Monte Carlo simulations in greedy-based algorithms, especially at the initial step. Based on these bounds, they proposed a new upper bound based lazy forward algorithm for discovering the top-$k$ influential nodes in social networks.
The aforementioned models focus only on assessing the social influence of single individuals. A small number of works, however, attempt to build models for community influence analysis. Qi et al. [36] applied degree centrality, closeness centrality, and betweenness centrality to groups and classes as well as individuals. Latora and Marchiori [37] put forward a group information centrality to measure the importance of node sets. Mehmood et al. [38] exploited information diffusion records to calculate the influence strength between different communities. Although these works offer a preliminary study of community-level influence, none of them focuses on how to measure a community's influence. Belák et al. [18] assessed the community-level influence as the average influence of all users in the same community. Because the distribution of users' influence is uneven across communities, an average based method is unfair to bigger communities, while a summation based method is unfair to smaller ones. At present, community-level influence analysis is still a challenging problem.

Proposed Methodology
We construct our model and implement the corresponding algorithm in this section. First, we give the related definitions in Section 3.1. Then, we propose the community-level influence analysis model for microbloggers. Next, we describe the working principle of our model via an example in Section 3.2. Finally, the community-level influence analysis algorithm is proposed in Section 3.3.

Related Definitions and Community-Level Influence Analysis Model
3.1.1. Related Definitions. Social networks and communities are described as follows. A typical social network can be represented as a bipartite graph consisting of a set of nodes (users) and a set of edges that describe the relationships between nodes. A community can be represented as a subgraph of the social network: its users are a subset of the nodes, and its relationships are a subset of the edges. A node is defined as a user within the community if he/she belongs to the community; otherwise, he/she is defined as a user outside the community. The set of users outside the community is written as UOC. Modeling and calculating the community influence of a community $C_i$ are the basis of our work. The objective function of our model expresses CI($C_i$), the community influence of $C_i$, as an assessment based on the social network and on $C_i$. There are two entities (i.e., users and communities) that can produce influence. To study the community-level influence, we give the related definitions as follows.

Definition 1.
Trust. A node in a social network has a certain degree of trust in other nodes according to its past contact with them or their reputation [39,40]. According to the source of trust, we divide trust into direct trust and indirect trust.
(1) Direct Trust (DT). Assume that node $v$ is an entry node of node $u$, indicating that there is contact between $u$ and $v$. According to the previous contacts and the reputation of $u$, $v$ will have direct trust in $u$.
(2) Indirect Trust (IT). Assume that node $u$ is a reachable node of node $v$; $v$ will have indirect trust in $u$ because the reputation of $u$ can be transmitted to $v$.
Users not only trust each other but also influence each other. According to the source of influence, this paper divides influence into direct influence and indirect influence.

Definition 2.
(1) Direct Influence (DI). Assume that node $v$ is an entry node of node $u$; $u$ will have an influence on $v$: that is, $u$ produces direct influence on $v$.
(2) Indirect Influence (II). Assume that node $u$ is a reachable node of node $v$; $u$ will have an influence on $v$ through layer-by-layer transmission: that is, $u$ produces indirect influence on $v$.
In order to assess the overall influence of $u$ on $v$, we define the user combined influence.

Definition 3.
User Combined Influence (UCI). Because $v$ has direct or indirect trust in $u$, and $u$ has direct or indirect influence on $v$, we combine the four factors to calculate the combined influence of $u$ on $v$.

Definition 4.
(1) User Influence (UI). User influence refers to the influence of an individual on other users.
(2) Community Influence (CI). Community influence is the overall influence of the community, which is formed by the UI of all the users in the community and the community's own factors.

Definition 5.
Mean Willingness to Diffuse Theme Information (MW). In communities, some users who receive theme information may not diffuse it, some users prefer to post their own blogs, and some users prefer to forward others' blogs. We assess the community influence by taking into account the diffusion of information between users. MW represents a user's willingness to diffuse the information of a blog. The theme information of a user $u$ is stored in a set whose $i$th element is the user's $i$th piece of theme information. If a piece of theme information is diffused in a social network, a path graph is formed to describe its propagation path. The path graphs formed by the theme information of $u$ are stored in a second set.

Model Framework.
Our model consists of four modules: data preprocessing module, data source module, the user final influence module, and the community influence module.Figure 1 shows our model framework.
The data preprocessing module is used to eliminate zombie fans. We identify zombie fans along a behavior dimension and a time dimension. The behavior dimension is based on the amount of theme information posted by the user and the influence of the user's fans. The time dimension is based on the user login frequency and the frequency of diffusing theme information. Finally, the data preprocessing results are stored in the data source.
The data source module is responsible for providing the data needed for influence analysis. We establish the user information table, the microblog table, the user fans information table, and the user attention table to access the user's relevant information efficiently.
The user final influence module first calculates the mean willingness to diffuse theme information for each user in a community and then calculates the user's influence.Next, it combines these two results to get the user final influence.
The community influence module first calculates the community size, the tightness of user relationship, and the user-integrated influence in the community and then evaluates the community influence by integrating the three factors.

Working Principle.
In this subsection, we introduce the working principle of each module in the model framework in detail. We assume that $u$ and $v$ are two users in a community. Figure 2 shows the working principle after data preprocessing; the mathematical notation is described in detail in the following subsections.
The working principle can be described as the following steps.
Step 1. Calculate the diffusion indicator Diffu and the receiving state of $v$. Then calculate MW($v$). Finally, calculate UI($v$).
Step 2. Following Step 1, calculate MW($u$) and UI($u$) for $u$.
Step 3. Integrate MW and UI to calculate the UII of the community. Then calculate CS and RT of the community. Finally, combine the three factors to calculate the community influence.

Data Preprocessing.
In microblogging networks, some users create zombie fans for ulterior motives or business purposes. According to the definition in [41], zombie fans are fake fans generated and maintained mostly for economic purposes. Zombie fans certainly interfere with the analysis of social influence. A small number of empirical studies have addressed recognizing zombie fans [41][42][43], mostly on the Twitter platform.
Presently, researchers generally detect zombie fans based on the amount of attention, the number of fans, the frequencies of original and forwarded information, and other basic attributes. As zombie fans continue to evolve, they will exhibit more features [44], and the existing feature-based elimination methods may gradually fail. We observe that, because zombie fans are only occasionally managed by software programs or by a few people behind the scenes, they rarely post, seldom log in, or are abandoned altogether, and their profile information and contents can differ vastly from those of ordinary users. Moreover, no matter how the features of zombie fans change, they can be split into a time dimension and a behavior dimension. Thus, it is reasonable to recognize zombie fans along these two dimensions, an approach that adapts better to the needs of detecting zombie fans in microblogging networks.
According to expert knowledge criteria [45], in the time dimension we assess zombie fans by the user login frequency and the diffusing advertisement frequency. Thus, the time dimension includes login frequency (LF) and diffusing advertisement frequency (DAF). Login frequency refers to the number of logins in a period; it is calculated from LoginNumber, the number of logins. The lower the login frequency, the higher the probability that the user is a zombie fan. Conversely, the higher the diffusing advertisement frequency, the higher the probability that the user is a zombie fan. The diffusing advertisement frequency is calculated from NumberOfDiffusingAdvertisement, the number of advertisements diffused.
For the same reason, in the behavior dimension we assess zombie fans by the amount of user theme information and the individual influence of the user's fans. Thus, we take into account the number of pieces of user theme information (NUI), the number of attention users (NAU), and the number of the user's fans (NUF).
To ensure that the parameter criteria are reliable, they are obtained from prior knowledge, expert knowledge, or experimental trial. For example, we place into the set LF the users who are in the lowest 10% by login frequency and whose login time interval is greater than 7 days. To reduce the amount of calculation, we first filter all users in the microblogging network: if a user has a certified user among his/her fans, the user is not considered a zombie fan. For users without a certified user among their fans, the details of eliminating zombie fans are described in Algorithm 1.
Unlike classification and pattern recognition approaches, the proposed method for eliminating zombie fans requires neither labeled data nor a trained model. It is effective and easy to use in practice.
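The set-based elimination step of Algorithm 1 can be illustrated with a minimal Python sketch. It assumes the five candidate sets (LF, DAF, NUI, NAU, NUF) have already been built from the thresholds described above; the function name, the `exempt` argument (standing in for users who have a certified fan), and the representation of edges as pairs are hypothetical choices made for illustration, not part of the original algorithm listing.

```python
def eliminate_zombie_fans(users, edges, lf, daf, nui, nau, nuf, exempt=()):
    """Remove zombie fans from a network given the five candidate sets.

    users  : iterable of user identifiers
    edges  : iterable of (source, target) pairs
    exempt : hypothetical pre-filter, e.g. users who have a certified fan
    """
    # A user is flagged only if it appears in all five candidate sets
    # (ZF = LF ∩ DAF ∩ NUI ∩ NAU ∩ NUF, as in Algorithm 1).
    zombie_fans = set(lf) & set(daf) & set(nui) & set(nau) & set(nuf)
    # Assumption: exempt users are never treated as zombie fans.
    zombie_fans -= set(exempt)

    remaining_users = set(users) - zombie_fans
    # Drop every edge that touches a removed user.
    remaining_edges = {
        (src, dst)
        for src, dst in edges
        if src not in zombie_fans and dst not in zombie_fans
    }
    return remaining_users, remaining_edges, zombie_fans


if __name__ == "__main__":
    users = {"a", "b", "c", "d"}
    edges = {("a", "b"), ("b", "c"), ("c", "d")}
    kept_users, kept_edges, removed = eliminate_zombie_fans(
        users, edges,
        lf={"c", "d"}, daf={"c"}, nui={"a", "c"}, nau={"c"}, nuf={"b", "c"},
    )
    print(sorted(kept_users), sorted(kept_edges), sorted(removed))
```

Because a user must fall into every set to be removed, the intersection is conservative: a user who is inactive but does not diffuse advertisements, for instance, is kept.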

The User Final Influence.
Traditional models are simple: they take into account neither the degree of social trust between users nor the user's willingness to diffuse theme information. However, both factors are important to the user final influence (UFI). In this paper, the UFI is calculated by integrating MW and UI. Because the influence of a user on other users is related to the user's willingness to exert that influence, the larger the value of MW, the greater the probability that the user diffuses a piece of theme information. The UFI is given by (4).
Mean Willingness to Diffuse Theme Information. A higher frequency of diffusing theme information means a higher user influence, because more users will know the user. Therefore, MW reflects the probability that a user has high impact in a microblogging network. For each piece of theme information, a state parameter indicates whether user $v$ has received it: the parameter is 0 if the user has never received the theme information and 1 if the user has received it. Its initial value is set to 0. Meanwhile, to know whether $v$ diffuses the theme information, we observe the corresponding path graph. The parameter Diffu indicates whether $v$ diffuses the theme information that he/she received.
When the outdegree of $v$ is greater than 0, $v$ has already diffused the theme information; otherwise, $v$ has never diffused it. The number of users receiving theme information is written as NRTI, and the number of users diffusing theme information is written as NDTI.
In the calculation of MW, a weight in $[0, 1]$ is used. NP($v$) represents the total number of theme information posts by $v$. In($v$) is the set of indegree nodes of $v$. Each user in In($v$) is assigned a weight determined by his/her outdegree, and num denotes the corresponding total count. The initial value of MW($v$) is set to 1. We give an example of calculating MW in Figure 3.
Assume that the MW of all users is initially 1 and that the weight is 0.6; the MW is then calculated accordingly.
(1) Calculating Direct Trust and Direct Influence. If $v$ is an entry node of $u$, then $v$ has direct trust in $u$, where DT$_{vu}$ is the direct trust of $v$ in $u$, RU($u$) is the reputation of user $u$, In($u$) is the set of entry nodes of $u$, and RU($u \leftarrow w$) is the reputation of an entry neighbor $w$ of $u$.
The value of RU($u$) depends on the average reputation of all of $u$'s entry neighbors. Each node is given an initial direct trust value of 0.1. In Figure 3(a), the direct trust in $u_1$ from other nodes is based on
$$\mathrm{RU}(u_1) = \frac{0.1 + 0.1 + 0.1 + 0.1}{4 + 1} = 0.08.$$
In turn, $u$ has a direct influence on $v$, where DI$_{vu}$ is the direct influence of $u$ on $v$, the interest term is the degree of interest of $v$ in $u$, and $|\mathrm{theme}(v, u)|$ is the amount of theme information from $u$ among the theme information received by $v$.
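The reputation value in the Figure 3(a) example (four entry neighbors with reputation 0.1 each, divided by 4 + 1, giving 0.08) suggests a simple rule: sum the reputations of the entry neighbors and divide by the number of entry neighbors plus one. The sketch below encodes that rule in Python. It is inferred from the single worked example rather than taken from a stated formula, so the function name and the exact form of the denominator are assumptions.

```python
def reputation(entry_neighbor_reputations):
    """Reputation of a user from the reputations of its entry neighbors.

    Assumption (inferred from the worked example for Figure 3(a)):
    RU = sum of entry-neighbor reputations / (number of entry neighbors + 1).
    """
    values = list(entry_neighbor_reputations)
    # The "+ 1" keeps the result defined even when there are no entry neighbors.
    return sum(values) / (len(values) + 1)


if __name__ == "__main__":
    # Four entry neighbors, each with the initial value 0.1.
    print(round(reputation([0.1, 0.1, 0.1, 0.1]), 2))
```

Under this reading, a user with no entry neighbors has reputation 0, and the value approaches the plain average as the number of entry neighbors grows.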
In Figure 3, we calculate the direct influence on $u_1$ produced by other users.
(2) Indirect Trust and Indirect Influence. If $u$ is a reachable node of $v$, then $v$ has indirect trust in $u$, where IT$_{vu}$ is $v$'s indirect trust in $u$ and min$_{vu}$ is the length of the shortest path from $v$ to $u$.
In Figure 3(a), we calculate the indirect trust in $u_1$ gained from other nodes; for example, the value obtained for IT$_{u_3 u_1}$ is written as 0. Likewise, $u$ has an indirect influence on $v$. In Figure 3(a), the indirect influence of other nodes on $u_1$ is calculated in the same way as in the above formula.
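Indirect trust depends on the length of the shortest path between two users. As a minimal illustration of that one ingredient, the following Python sketch computes the shortest path length in an unweighted directed graph with breadth-first search. The adjacency-dictionary representation and the function name are assumptions made for this example; how the path length enters the indirect trust value is not reproduced here.

```python
from collections import deque


def shortest_path_length(adjacency, source, target):
    """Length (in edges) of the shortest directed path from source to target.

    adjacency : dict mapping a node to an iterable of its successors
    Returns None when target is not reachable from source.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    graph = {"v": ["a"], "a": ["u"], "u": []}
    print(shortest_path_length(graph, "v", "u"))  # two hops: v -> a -> u
```

Breadth-first search visits nodes in order of increasing distance, so the first time the target is seen its distance is minimal.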
The combined influence of $u$ on $v$ is defined separately for two cases: either $v$ is an entry node of $u$, or $v$ is not an entry node of $u$ but $u$ is a reachable node of $v$. Assume that the weight is 0.3. In Figure 3, we calculate the combined influence of other nodes on $u_1$ accordingly.
In this example, $u_2$ is an entry node. The user influence is then computed over SUCP, the set of users that can reach $u$ through a certain path. For example, in Figure 3, the user influence of $u_1$ is calculated in this way. Once MW($u_1$) and UI($u_1$) are obtained, the user final influence can be calculated according to (4).

Community Influence.
The community influence is composed of the users' interaction inside and outside the community.In this paper, we consider it from three factors, that is, the user-integrated influence, the community size, and the degree of relationship tightness among users inside the community.
User-integrated influence (UII) is integrated from the final influence of all users within the community.

Here UII($C_i$) is the UII of community $C_i$, computed over the set of users inside $C_i$. The community size (CS) is important to the calculation of the community-level influence: the larger the number of users in a community, the greater the influence of the community. CS is defined from the number of users in the community and the total number of users in the social network. The degree of relationship tightness (RT) represents the degree of closeness between users inside a community; we describe it in terms of the users' outdegree and indegree. Finally, we calculate the CI by combining the three factors, where two weights, each in $[0, 1]$, are used to distinguish the importance of the different factors.
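To make the combination of the three factors concrete, here is a hedged Python sketch. Two things in it are assumptions rather than the paper's stated formulas: CS is taken to be the ratio of community users to all users in the network, and CI is taken to be a convex combination of UII, CS, and RT with two weights (called `alpha` and `beta` here, with the remainder `1 - alpha - beta` going to RT). UII and RT are passed in as already-computed numbers.

```python
def community_size(num_community_users, num_network_users):
    """Assumed form of CS: share of the network's users that are in the community."""
    if num_network_users <= 0:
        raise ValueError("the network must contain at least one user")
    return num_community_users / num_network_users


def community_influence(uii, cs, rt, alpha, beta):
    """Assumed combination: CI = alpha*UII + beta*CS + (1 - alpha - beta)*RT.

    alpha and beta are hypothetical names for the two weights in [0, 1].
    """
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("weights must lie in [0, 1] and sum to at most 1")
    return alpha * uii + beta * cs + (1 - alpha - beta) * rt


if __name__ == "__main__":
    cs = community_size(50, 200)
    print(cs, community_influence(uii=0.6, cs=cs, rt=0.4, alpha=0.5, beta=0.2))
```

With this form, raising `alpha` gives more weight to the users' final influence relative to the structural factors (size and tightness); other ways of combining the factors are equally compatible with the description above.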

The Proposed Algorithm.
According to the above description, we propose a community-level influence analysis algorithm, called CIAA, in a pseudo-code format in Algorithm 2.
It can be seen from the algorithm that the total time complexity is ().This means that our algorithm can be applied on large-scale social dataset.

Experiments
We conduct experiments to validate the effectiveness of the proposed approach on a real-world microblogging network.
In this section, we describe the experimental setup followed by the discussion of experiment results.

Community Structure Analysis.
In order to study the characteristics of the community, we plot the outdegree distribution and the degree distribution of users in the community.
In a directed social network, the indegree of a node is the number of fans of the user, and the outdegree of a node is the number of users to whom the user pays attention. Figure 4 shows the outdegree and degree distributions of the data source.
As shown in Figure 4, the outdegree distribution and the degree distribution of the Sina Weibo dataset follow a power-law distribution, which indicates that the social network formed by the dataset is a scale-free network.
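A degree distribution of this kind can be tabulated directly from a follow list. The sketch below is a minimal Python illustration, assuming the data is available as (follower, followee) pairs; the function name and the edge format are hypothetical. It returns, for each degree value, how many users have that indegree or outdegree. Plotted on log-log axes, a power-law distribution appears as an approximately straight line.

```python
from collections import Counter


def degree_distributions(edges):
    """Tabulate indegree and outdegree distributions of a directed network.

    edges : iterable of (follower, followee) pairs, meaning that the
            follower pays attention to the followee.
    Returns (in_hist, out_hist), each mapping degree -> number of users.
    Users with degree 0 do not appear in the corresponding histogram.
    """
    out_degree = Counter()
    in_degree = Counter()
    for follower, followee in edges:
        out_degree[follower] += 1  # number of users the follower attends to
        in_degree[followee] += 1   # number of fans of the followee
    in_hist = Counter(in_degree.values())
    out_hist = Counter(out_degree.values())
    return in_hist, out_hist


if __name__ == "__main__":
    follows = [("a", "c"), ("b", "c"), ("a", "b")]
    in_hist, out_hist = degree_distributions(follows)
    print(dict(in_hist), dict(out_hist))
```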

Eliminating Zombie Fans.
In order to improve the accuracy of our model, we remove zombie fans. Using the elimination method in Algorithm 1, we finally remove 12 zombie fans, as shown in Table 6.
As shown in Table 6, the three sets are NUI, NAU, and NUF. The little black boxes in Table 6 mark the users shared by the three sets, and they are the same as the users shared by the time dimension and the behavior dimension. Therefore, these shared users are removed. We compare the user final influence computed without the zombie fans with the user final influence computed with the zombie fans, as shown in Table 7.
The comparison in Table 7 shows that the accuracy of the UFI with zombie fans, measured against the actual user ranking, is only 60%. We conclude that eliminating zombie fans is very important for the accuracy of the user final influence.

Accuracy Analysis of the User Final Influence.
We calculate the user final influence of all users in the community but, for simplicity, compare only the top ten users. The top 10 users by final influence and their related information are shown in Table 8.
According to the UFI ranking in Table 8, all of these users are authenticated users. We conclude that authenticated users are more influential in microblogging networks. There are two reasons for this phenomenon. First, the majority of well-known users are authenticated users, and the influence of well-known users is larger than the average user influence. Second, an authenticated user's identity is transparent, which gives the user higher social trust. Table 8 also shows that the user final influence needs to take into account the quality of the user's fans, the number of the user's microblogs, and user authentication. Table 9 and Figure 5 show the comparison between the UFI method and the microblog-fans ranking algorithm. Table 9 shows the UFI ranking and the corresponding ranking given by the microblog-fans ranking algorithm. Figure 5 shows the overall ranking order given by the microblog-fans ranking algorithm.
It can be seen from Table 9 and Figure 5 that the UFI ranking is almost completely different from the microblog-fans ranking. Overall, according to the UFI method, the number of microblogs and fans of the top users must reach a certain quantity to support individual influence. Thus, the number of microblogs and fans is one factor in measuring influence in the UFI method. However, social trust between users can also help improve individual influence in the UFI method.
The user final influence is an experimental evaluation of the user, and there is no existing dataset against which to compare it. We can only refer to the user influence rankings published by some organizations. We therefore verify the proposed calculation method against the user influence ranking provided officially by Sina Weibo. Because each microblogging platform has its own influence calculation method, we cannot compare the results numerically; instead, we compare the relative positions, that is, the rankings. If the influence rankings of the two methods are in a similar order, we consider the results of the influence analysis to be similar. The comparison between the official Sina Weibo ranking and the UFI ranking is shown in Table 10. In Table 10, the UFI ranking and the actual user ranking are mostly the same, except for the user pair 299****593 and 365****215. That is because the Sina Weibo ranking emphasizes the number of microblogs and fans, and these numbers differ greatly between user 299****593 and user 365****215. The UFI method, however, weighs the factors of influence more reasonably. Taking the official Sina Weibo results as the standard, the accuracy of the UFI method changes with the two weight parameters, as shown in Figure 6.
Figure 6 shows that the accuracy of the UFI method changes with the two weight parameters. The UFI method has the highest accuracy when the parameter pair is (0.3, 0.5), so this pair is used for the other experiments. We also find that the UFI method is more accurate than the microblog-fans ranking algorithm. Moreover, this experiment indicates the importance of the user's willingness to diffuse theme information for the accuracy of the user influence.
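One simple way to turn a ranking comparison like the one above into an accuracy figure is to count the positions at which two rankings list the same user. The Python sketch below does exactly that. It is an assumed reading of "accuracy" for illustration only (the exact measure used in the experiments is not spelled out), and the function name and user labels are hypothetical.

```python
def positional_accuracy(predicted, reference):
    """Fraction of rank positions at which two rankings agree.

    predicted, reference : sequences of user identifiers ordered by rank.
    Assumption: accuracy is measured position by position against the
    reference ranking; this is one plausible measure, not the only one.
    """
    if not reference:
        raise ValueError("the reference ranking must not be empty")
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)


if __name__ == "__main__":
    official = ["u1", "u2", "u3", "u4", "u5"]
    ufi_rank = ["u1", "u2", "u4", "u3", "u5"]  # one adjacent pair swapped
    print(positional_accuracy(ufi_rank, official))
```

A single swapped pair, as with the two users discussed above, lowers this score by two positions; rank correlation measures would penalize such a local swap less heavily.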

Accuracy Analysis of CIAA.
Because few studies of community influence exist, we compare the proposed algorithm CIAA with the averaging user influence algorithm (AI). We set different values of the two weight parameters to compare the two algorithms and calculate the corresponding community influence, as shown in Figure 7.
Figure 7 shows that the results of the CIAA change with the parameter values. When the two parameters are 0.5 and 0.2, the results of the two algorithms are closest. That is because the AI algorithm is essentially a weighted average of the user influence, whereas the CIAA integrates the user-integrated influence, the community size, and the degree of relationship tightness among users inside the community. The greater the proportion of the user final influence, the closer the results of the two algorithms. Therefore, the proposed algorithm outperforms the state-of-the-art baseline algorithm.

Conclusion
In this paper, we studied the emerging problem of modeling community-level influence. Online social networks, especially microblogging networks, are increasingly important in daily life. Previous works cope effectively with individual influence in microblogging networks, but they rarely evaluate social influence at the community level, which outweighs individual influence. We defined the related concepts for community-level influence and constructed a model that combines the user influence, social trust, and relationship tightness among intrausers of a community to reveal the community-level influence appropriately. We proposed the algorithm CIAA to cope with real-world applications. We conducted empirical studies on a real-world microblogging dataset crawled from Sina Weibo, where the CIAA outperformed the state-of-the-art baseline algorithm.
To the best of our knowledge, the proposed approach is effective for assessing community influence in microblogging networks. The highlights of this paper can be summarized as follows: (1) formulating the problem of analyzing community-level influence and designing a community-level influence analysis model; (2) proposing a community-level influence analysis algorithm called CIAA to cope with real-world microblogging applications; and (3) extensively demonstrating the superiority of the proposed method. In future work, we plan to extend the proposed method to assess community influence in dynamic online social networks.

Figure 1: The framework of the proposed model.

Figure 2: The working steps of the community-level influence analysis model.

(1) Input: , , LF, DAF, NUI, NAU, NUF
(2) Output:  = (, )
(3) Select the users who are in the last 10% by login frequency and whose login time interval is greater than 7 days into the set LF
(4) Put the users in the top 10% by diffusing advertisement frequency into the set DAF
(5) Select the users who are in the last 10% by number of user theme information into the set NUI
(6) Put the users in the top 10% by number of attention users into the set NAU
(7) Put the users whose number of fans is between 10 and 200 into the set NUF
(8) ZF = LF ∩ DAF ∩ NUI ∩ NAU ∩ NUF
(9) Update  =  − ZF and  =  −  ZF
(10) return , 
Algorithm 1: Eliminating zombie fans.
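The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under assumptions: the record field names (`login_freq`, `login_gap_days`, `ad_freq`, `num_posts`, `num_following`, `num_fans`), the representation of edges as (follower, followee) pairs, and the rounding of the 10% cutoff with `math.ceil` are all hypothetical choices not specified by the algorithm listing.

```python
import math


def _extreme_users(users, key, fraction, top):
    """Ids of the top (or bottom) `fraction` of users ranked by `key`."""
    k = math.ceil(fraction * len(users))  # assumed rounding of the cutoff
    ranked = sorted(users, key=lambda u: users[u][key], reverse=top)
    return set(ranked[:k])


def eliminate_zombie_fans(users, edges, fraction=0.10):
    """Remove suspected zombie fans from the user set and the edge set.

    users: {user_id: record dict}; edges: set of (follower, followee).
    Returns (remaining users, remaining edges, removed zombie ids).
    """
    # LF: lowest login frequency AND more than 7 days between logins
    lf = {u for u in _extreme_users(users, "login_freq", fraction, top=False)
          if users[u]["login_gap_days"] > 7}
    # DAF: highest advertisement-diffusion frequency
    daf = _extreme_users(users, "ad_freq", fraction, top=True)
    # NUI: fewest pieces of theme information (microblogs)
    nui = _extreme_users(users, "num_posts", fraction, top=False)
    # NAU: most attention users (accounts the user follows, by assumption)
    nau = _extreme_users(users, "num_following", fraction, top=True)
    # NUF: number of fans between 10 and 200 (inclusive, by assumption)
    nuf = {u for u, r in users.items() if 10 <= r["num_fans"] <= 200}

    zf = lf & daf & nui & nau & nuf  # a zombie fan must satisfy all criteria
    kept_users = {u: r for u, r in users.items() if u not in zf}
    kept_edges = {(a, b) for a, b in edges if a not in zf and b not in zf}
    return kept_users, kept_edges, zf
```

Taking the intersection means a user is removed only when every criterion holds, which keeps the filter conservative; a user who merely logs in rarely, for example, is retained.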

Figure 3: An example of calculating MW: there are five users inside a community, that is,  1 ,  2 ,  3 ,  4 , and  5 . There are three users outside the community, that is, V 1 , V 2 , and V 3 . (a) shows the relationship between these users. (b) shows the diffusion of theme information from  1 . (c) also shows the diffusion of theme information from  1 . (d) shows the diffusion of theme information from  2 .

Figure 4: (a) is the outdegree distribution and (b) is the degree distribution.

Table 6: Three user sets for eliminating zombie fans. The boxes represent zombie fans.

Figure 6: Comparison of accuracy of two methods with different parameter values.

Figure 7: The community-level influence by two measuring algorithms with different parameter pairs.

Table 1: Data structure and description of the user information.
sites in China. It has more than 33% of the Internet users in China, and its market penetration is equivalent to that of Twitter in the United States. According to figures released by Sina Weibo, as of June 2016 it had 282 million monthly and 86.8 million daily active users from different social and cultural backgrounds. Moreover, there are nearly 100 million new

Table 2: Data structure and description of the user theme information (microblogs).

Table 3: Data structure and description of the user fans.
The crawled data includes 20,151,129 microblogs, 932,578,467 comments, and 9,218 users. In this paper, we selected more than 1,000 users from the crawled dataset and divided the related information into Tables 1, 2, 3, and 4 as data sources according to our model framework. The data are stored in txt-formatted files.

Table 4: Data structure and description of the user attention.

Table 5: Parameters for experiments.

Table 7: Comparison of the user final influence.

Table 8: Top 10 user information of the UFI.

Table 9: Comparison of UFI method with microblog-fans ranking algorithm.

Table 10: Comparison of user actual ranking with UFI ranking.