Hot Topic Propagation Model and Opinion Leader Identifying Model in Microblog Network



Introduction
Microblog is another important platform for network information interaction and propagation after the blog. It is based on network and communication technology and offers considerable advantages in the speed and reach of information propagation as well as in the breadth and depth of reports. On this free and open platform, microblog opinion leaders rely on the quantity and quality of their microblogs to set discussion topics and trigger intense group debate. They can even shape or reverse attitudes and prompt others to act. According to the statistics, among Chinese Internet users, microblog users older than 19 accounted for 88.81% as of September 20, 2012, and the number of microblog users was about 327 million [1]. Microblog has become a crucial network tool for information propagation. Therefore, it is important to predict the law and trend of topic propagation in the microblog network and to study the opinion leaders of topics. This will help in designing mechanisms to guide and control the propagation process.
Research on the laws of topic diffusion has received much attention, much of it related to time-varying models [2,3]. Zhao et al. [2] put forward a discrete-time propagation model based on node popularity and liveness. Zhang et al. [3] drew on the epidemic model to derive topic propagation models for both BBS and blogs, including multimodal ones. Yan et al. [4] proposed an extended susceptible-infected (SI) propagation model that incorporates burstiness and limited attention. Chen and Gao [5] defined authority nodes that release anti-rumor information as a prevention strategy to control rumors in a directed microblog user network. Other works predicted diffusion probabilities with the independent cascade (IC) model [6,7]. Afrasiabi and Benyoucef [8] observed that people who are in neither a friendship network nor a subscription network have a greater effect on propagation than friends or subscribers. Yoganarasimhan [9] studied how the size and structure of the local network around a node affect the aggregate diffusion of products seeded by it.
The identification of opinion leaders has also attracted wide attention. Zhai et al. [10] presented many kinds of recognition methods in their work, and there are three research methods in opinion leader recognition: firstly, an analytical method based on the characteristic attribute, for instance, AHP

The Hot Topic Propagation Model
A hot topic is an issue that the public cares most about within a certain time and range. In recent years, most issues have come to public attention through the Internet. This paper takes Sina Microblog as the background and the hot topic as the research object. It observes the characteristics of the dynamic propagation process and mines the opinion leaders.

2.1. The Hot Topic Propagation. Hot topics propagate widely and quickly. In order to collect real-time and more complete microblog data, we use Rweibo to grab the Sina Microblog data automatically. Rweibo is a software development kit for the R language that implements the interface provided by Sina Microblog. The data are the numbers of posts discussing these hot events on Sina Microblog. We analyze how the quantity changes for 4 events over the 40 days after each occurred. The 1st event is Yuan Lihai's adoption of abandoned babies and orphans. The 2nd event is the PM2.5 haze in China. The 3rd event concerns the Diaoyu Islands. The 4th event concerns the 2012 Nobel Prize for literature, which was awarded to Mo Yan.
Figure 1 shows the rate of daily posting after these incidents. The horizontal axis shows the days since each event, and the vertical axis shows the amount of daily posting as a percentage of the total number of microblogs in which users discuss the topic in the network.
In event 1, the number of microblog posts peaks within a day, which shows the timeliness of microblog. The number of microblog postings on event 2 shows its first peak from the 5th day to the 7th day. The National Meteorological Center of CMA issued a haze alert, so the second peak occurred after the 22nd day. For event 3, the number of microblog postings peaked on the 16th day, since Japan deployed fighter planes to prevent Chinese planes from flying over the Diaoyu Islands on the 14th day, and the USA has long interfered in this event. After Mo Yan was awarded the 2012 Nobel Prize for literature, the number of postings on the event doubled. The daily number of microblogs thus reveals the development trend of an event, and the data we collected match the actual course of events. As shown in Figure 1, events 1, 3, and 4 are single-peak events. Each reaches one peak, after which the propagation rate declines slowly, so the topic dies out in about 30 days. In contrast, event 2 is a multipeak event. Its propagation rate has two peaks, and the first one is higher than the second one. Therefore, data collection for identifying opinion leaders needs to last for at least 30 days after the first peak appears.

The Hot Topic Propagation Model.
Let the undirected graph $G = \{V, E, W\}$ represent the actual propagation network, where $V$ is the set of microblog nodes, $E$ is the set of edges connecting the users, and $W$ is the set of authority values. We suppose that any two nodes can communicate with each other, so the microblog network is a fully connected undirected graph. Zhao et al. [2] proposed a discrete-time dynamic model for the bursty propagation of incidental events. We build a time-varying model based on Zhao's model, with a variable external field strength, to simulate the topic propagation process.
Assume that $N_{\mathrm{MAX}}$ represents the total number of microblogs that participate in discussing one topic in the network. Let $t_0$ be the initial time and let $t_n$ be the $n$th unit time. Let $N(t_n)$ be the number of microblogs posted by $t_n$ and let $\Delta N(t_n)$ be the number of new microblogs posted in $(t_{n-1}, t_n]$. Namely,

$$N(t_n) = N(t_{n-1}) + \Delta N(t_n). \tag{1}$$

We mainly discuss the statistical properties of $N(t_n)$ and the change trend of $\Delta N(t_n)$ by simulation. The authority value of a user in the actual network is the average of the normalized friends count, fans count, and microblog count. After checking the actual data of the four events, we know that the authority value follows a power-law distribution. Let the authority value of user $i$ be $w_i$. Its distribution $p(w)$ follows a power law whose exponent lies between −1.9 and −1.3. Therefore, the authority probability density function is defined as

$$p(w) = (1 + \beta w)^{-\alpha}, \tag{2}$$

where $\alpha$ is set to 1.5, within [1.3, 1.9], and $\beta$ is a parameter. The node state is divided into the published microblog and the unpublished microblog. The function $\delta_i(t_n)$ represents the state of microblog $i$ at $t_n$:

$$\delta_i(t_n) = \begin{cases} 1, & \text{the published microblog}; \\ 0, & \text{the unpublished microblog}. \end{cases} \tag{3}$$

The topic field strength formed by the internal nodes of the network is defined in terms of $w_i$, the authority value of node $i$.
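As an illustration of the authority distribution, the following minimal sketch draws authority values with density proportional to $(1 + \beta w)^{-\alpha}$ by inverse-transform sampling. It assumes that authority values are truncated to $[0, w_{\max}]$ (plausible for normalized counts, but not stated in the text); the function names and the default $\beta = 1$ are hypothetical.

```python
import random


def authority_quantile(u, alpha=1.5, beta=1.0, w_max=1.0):
    """Inverse CDF of the density proportional to (1 + beta*w)**(-alpha) on [0, w_max].

    Assumption: the density is truncated to [0, w_max]; alpha must differ from 1.
    """
    if alpha == 1.0 or beta <= 0.0 or w_max <= 0.0:
        raise ValueError("need alpha != 1, beta > 0 and w_max > 0")
    # Value of (1 + beta*w)**(1 - alpha) at the upper end of the support.
    tail = (1.0 + beta * w_max) ** (1.0 - alpha)
    # Solve (1 - (1 + beta*w)**(1 - alpha)) / (1 - tail) = u for w.
    return ((1.0 - u * (1.0 - tail)) ** (1.0 / (1.0 - alpha)) - 1.0) / beta


def sample_authority(n, alpha=1.5, beta=1.0, w_max=1.0, rng=random):
    """Draw n authority values by inverse-transform sampling."""
    return [authority_quantile(rng.random(), alpha, beta, w_max) for _ in range(n)]


if __name__ == "__main__":
    values = sample_authority(5, rng=random.Random(0))
    print([round(v, 3) for v in values])
```

Because the density is decreasing in $w$, most sampled users receive a small authority value and only a few receive a large one.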
In fact, the topic can also be obtained from information external to the network. As time passes, the external field strength rises above a fundamental level and then tends to be stable. Because the external field strength is limited by the environmental capacity, we assume that it partly follows the logistic model. The external field strength formula has two parameters: one related to the rate of change of the initial external field strength, and one giving the fundamental level.
In practice, some events contain two or more sub-events. For example, event 2 contains two sub-events: "The National Meteorological Center of CMA issued a yellow haze alert on the 5th day" and "haze is enshrouded in eastern and midland China on the 21st day." A sub-event can lead to a high propagation rate.
Therefore, on the day of a sub-event, the simulation system is reset by a certain proportion. Namely, we turn the state of some nodes from published to unpublished on the first day of the second sub-event of each event. Since the time at which a sub-event occurred is known from the actual situation, it is reasonable to set the sub-event time in the simulation.
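The reset step can be sketched as follows. This is a minimal illustration, assuming that the "certain proportion" applies to the currently published nodes and that the nodes to reset are chosen uniformly at random; neither detail is specified in the text, and the function name is hypothetical.

```python
import random


def reset_for_subevent(states, proportion, rng=random):
    """Return a copy of `states` with a share of published nodes set back to unpublished.

    states     -- list of 0/1 node states (1 = published, 0 = unpublished)
    proportion -- share of the published nodes to reset (assumption: uniform random choice)
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    published = [i for i, s in enumerate(states) if s == 1]
    k = round(proportion * len(published))
    new_states = list(states)
    for i in rng.sample(published, k):
        new_states[i] = 0
    return new_states


if __name__ == "__main__":
    before = [1] * 80 + [0] * 20
    after = reset_for_subevent(before, 0.25, random.Random(1))
    print(sum(before), sum(after))
```

Resetting nodes makes them eligible to publish again, which is how a second peak can appear in the simulated curve.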
If microblog $i$ receives the topic information from the network at $t_n$, it changes from the unpublished state to the published state with a probability defined by the model. Different topics differ in the number of microblogs. In order to see the trend, we normalize the propagation data. In order to judge the simulation effect, we define the mean square error between the simulated data and the actual normalized data as the error function.
2.3. Simulation. We set the following steps in Algorithm 1 to simulate the process of dynamic topic propagation.
After collecting the real data of events 1 to 4, we use a computer program to estimate the optimal parameters within a reasonable range. The result is shown in Table 1.
Zhao's algorithm [2] is aimed at sudden incidents that do not contain sub-events. Accordingly, we list the parameters used for this algorithm in Table 2.
We compute the average error and the minimum error of our algorithm and of Zhao's algorithm over 1000 tests. Figure 2 and Table 3 compare the algorithms on events 1, 3, and 4.
Both algorithms perform well on unimodal topic propagation. Event 2 contains a sub-event, so the results differ markedly. As shown in Figure 3 and Table 4, our algorithm is more precise.
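The error measure used in this comparison can be sketched as below. The sketch assumes that normalization divides each daily count by the total over the observation window (consistent with the vertical axis of Figure 1) and that the mean square error is averaged over days; the function names are hypothetical.

```python
def normalize(daily_counts):
    """Convert daily posting counts into shares of the total (assumed normalization)."""
    total = sum(daily_counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero series")
    return [c / total for c in daily_counts]


def mean_square_error(simulated, actual):
    """Mean of squared differences between two equally long normalized series."""
    if len(simulated) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum((s - a) ** 2 for s, a in zip(simulated, actual)) / len(actual)


def summarize_errors(simulated_runs, actual_counts):
    """Return (average error, minimum error) over repeated simulation runs."""
    target = normalize(actual_counts)
    errors = [mean_square_error(normalize(run), target) for run in simulated_runs]
    return sum(errors) / len(errors), min(errors)


if __name__ == "__main__":
    runs = [[1, 1, 2], [2, 1, 1]]
    print(summarize_errors(runs, [1, 1, 2]))
```

In the setting described above, `simulated_runs` would hold the 1000 simulated daily series for one event.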

Opinion Leader Identifying Model of Topics Network
Microblog, known as the most potent carrier of public opinion on the network, has created a new era of Internet media. With its emergence and prosperity, microblog not only provides a new platform for traditional opinion leaders but also provides fertile soil for the growth of emerging opinion leaders.
3.1. Microblog Dataset. The section above discussed how topics propagate in the microblog network. We know that a topic lasts for about 30 days, so opinion leaders may appear within 30 days after the incident occurs. Therefore, we only collect information from that period on the web. The data we use in this paper cover 3 hot topics in January 2013 and the event of Mo Yan being awarded the 2012 Nobel Prize for literature, as shown in Table 5.
The detailed information of each microblog is as follows:
(1) microblog: ID of the microblog, the number of comments, the number of forwards, the text of the microblog, the length of the microblog, and the posting time;
(2) author: ID of the user, the number of fans, the number of friends, and the number of microblogs;
(3) comment (collected additionally for event 4): ID of the comment, the text of the comment, the length of the comment, and the posting time.
Figure 4 shows that the number of forwards and the number of comments follow a power-law distribution with an exponent in [−1.55, −1.30]. This indicates that the communication networks of these events are scale-free networks in which only a few users receive much attention, so opinion leaders possibly exist.
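One simple way to read an exponent off a logarithmic graph such as Figure 4 is a least-squares line through the points in log-log coordinates. The sketch below shows that idea only; the text does not say how the exponent was estimated, and a log-log fit is a rough estimator. The function name and the synthetic data are hypothetical.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    Points with a non-positive coordinate are skipped because their
    logarithm is undefined.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points")
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    if sxx == 0.0:
        raise ValueError("all x values are identical")
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic frequencies that follow y = 1000 * x**(-1.4) exactly.
    xs = list(range(1, 51))
    ys = [1000.0 * x ** -1.4 for x in xs]
    print(round(loglog_slope(xs, ys), 3))
```

Here `xs` would be the comment or forward counts and `ys` the number of microblogs observed with each count.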

The Method of Identifying Opinion Leader.
Although the theory of opinion leaders has been widely used in different fields, the standards for judging opinion leaders diverge. There are three traditional methods of finding opinion leaders: questionnaire, self-report, and observation, but the cost of these methods is too high. Sina Microblog is a platform for exchanging information, on which users show their opinions to others and communicate with each other by commenting on and forwarding microblogs. This interaction provides a lot of data to support our research on opinion leaders. According to the definition of opinion leaders proposed by Paul Lazarsfeld, opinion leaders should be very active and have much influence on some topics. Therefore, we analyze microblog opinion leaders from three aspects: influence, support, and activity. The more influence users have, the more response they obtain by posting information, and the more they influence other users. In addition, opinion leaders should take an active part in discussing topics and interact with other users, which makes it more likely that they show their own ideas to others.
In this section, considering these three aspects and the characteristics of microblog spreading, we extract features of opinion leaders. Then, we identify and analyze opinion leaders using methods based on the PageRank algorithm and the analytic hierarchy process (AHP).

AHP.
In this assessment system, we set 3 one-class indexes and 7 two-class targets, as shown in Table 6. The value of each two-class target is a normalization of the actual data. Since the two-class targets under the same one-class index are equally important, each one-class index is computed from its two-class targets with equal weights. In the normalization formula, $x$ is the original data of the two-class target. Before normalizing the posting-time target, we measure it with the equation $T_i = e^{-\lambda |t(i) - t(0)|}$, where $t(0)$ is set to Jan. 4th, 2013, and the parameter $\lambda$ is set to 0.01. The assessment value of user $i$ is then computed with the weight vector $W$, which has one component for each of the three one-class indexes (see Algorithm 2).
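The scoring scheme described above can be sketched as follows. Several details are assumptions made for illustration, since Tables 6 and 8 and the normalization formula are not reproduced here: min-max normalization, an exponential decay for the posting-time measure with parameter 0.01, the grouping of indicators into indexes, and the example weights. All names are hypothetical.

```python
import math

LAMBDA = 0.01  # parameter value given in the text


def time_score(days_from_t0, lam=LAMBDA):
    """Posting-time measure, assumed to decay exponentially with distance from t(0)."""
    return math.exp(-lam * abs(days_from_t0))


def min_max(values):
    """Min-max normalization to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def ahp_scores(users, groups, weights):
    """Weighted sum of one-class indexes; two-class targets are averaged with equal weight.

    users   -- list of dicts mapping indicator name -> raw value
    groups  -- dict mapping one-class index -> list of its two-class indicator names
    weights -- dict mapping one-class index -> weight
    """
    names = [name for members in groups.values() for name in members]
    normalized = {name: min_max([u[name] for u in users]) for name in names}
    scores = []
    for k in range(len(users)):
        total = 0.0
        for index, members in groups.items():
            index_value = sum(normalized[name][k] for name in members) / len(members)
            total += weights[index] * index_value
        scores.append(total)
    return scores


def top_fraction(scores, fraction=0.01):
    """Indices of the top `fraction` of users by score (at least one user)."""
    k = max(1, round(fraction * len(scores)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    users = [
        {"fans": 100, "comments": 50, "time": time_score(0)},
        {"fans": 10, "comments": 5, "time": time_score(10)},
        {"fans": 0, "comments": 0, "time": time_score(30)},
    ]
    groups = {"influence": ["fans"], "support": ["comments"], "activity": ["time"]}
    weights = {"influence": 0.2, "support": 0.5, "activity": 0.3}  # hypothetical weights
    print(top_fraction(ahp_scores(users, groups, weights)))
```

The `top_fraction` helper mirrors the rule that the top 1% of users are taken as opinion leaders.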

Algorithm 2 (AHP) returns the top $K$ opinion leaders, where $K = 1\% \cdot N$.

PageRank Algorithm.
The PageRank-based recognition method considers both the comment relationships among microblog users and the influence of the microblog users themselves. Thus, the microblog opinion leaders are those users who have higher influence, get more comments on their microblogs, actively comment on others' microblogs, and interact frequently with the people around them.
According to the above description, a microblog network on a certain topic can be defined as an undirected network $G = (V, E, W, S)$ with edge weight $W$ and node strength $S$. A node in set $V$ is a microblog user. Set $E$ is the edge set, where an edge $\langle v_i, v_j \rangle \in E$ means a comment relationship between user $v_i$ and user $v_j$. The edge weight $w_{ij}$ between node $v_i$ and node $v_j$ is the number of comments between user $v_i$ and user $v_j$. Meanwhile, in the actual network, users differ in influential power, reflected in their friends count, fans count, and microblog count, so we add a node strength $S(i)$ to measure it, as shown in Figure 5.
The PageRank algorithm is one of the top ten classical algorithms in data mining. It assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. Assuming that user $i$ in the microblog network interacts with others, we define the user's opinion leader value (Microblog-Rank, MR) as follows.
The microblog interactive network is a weighted and undirected network. First, we give the weights of the links and nodes. In (13), $S(i)$ is the user's own influence value, obtained by normalizing the initial data, such as the number of fans, the number of friends, and the number of microblogs, and then summing the normalized values. $w_{ij}$ is the number of communications between user $i$ and user $j$. The Microblog-Rank value of any node $i$ is then computed from these weights. Because this section is based on the weighted network, we calculate the MR value by the weight. In addition, dangling nodes, which have no reply link, exist in the actual network and would prevent the algorithm from converging. Therefore, we add the damping factor $d$, which should be set between 0 and 1; $d$ is usually set to 0.85 (see [18]). By iteration, we obtain the MR values of all users (see Algorithm 3).
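Since the MR formulas are not reproduced here, the following is only a sketch of a generic weighted PageRank iteration on an undirected comment network, not the paper's exact definition. It assumes that rank flows along edges in proportion to the comment counts $w_{ij}$, that the teleport distribution is proportional to the node strength $S(i)$, and that the rank of isolated (dangling) users is redistributed through the teleport distribution. The function name is hypothetical.

```python
def microblog_rank(weights, strength, d=0.85, tol=1e-10, max_iter=1000):
    """Weighted PageRank-style scores on an undirected comment network.

    weights  -- dict {(i, j): number of comments between users i and j}
    strength -- dict {user: node strength S(i) > 0}; its keys define the user set
    d        -- damping factor in (0, 1)
    """
    nodes = list(strength)
    total_strength = sum(strength.values())
    teleport = {i: strength[i] / total_strength for i in nodes}

    # Build symmetric weighted adjacency lists.
    neighbors = {i: {} for i in nodes}
    for (i, j), w in weights.items():
        if i == j:
            continue  # ignore self-comments
        neighbors[i][j] = neighbors[i].get(j, 0.0) + w
        neighbors[j][i] = neighbors[j].get(i, 0.0) + w
    out_weight = {i: sum(neighbors[i].values()) for i in nodes}

    mr = dict(teleport)
    for _ in range(max_iter):
        # Rank held by isolated users is spread via the teleport distribution.
        dangling = sum(mr[i] for i in nodes if out_weight[i] == 0)
        new = {}
        for i in nodes:
            incoming = sum(mr[j] * w / out_weight[j] for j, w in neighbors[i].items())
            new[i] = (1.0 - d) * teleport[i] + d * (incoming + dangling * teleport[i])
        change = sum(abs(new[i] - mr[i]) for i in nodes)
        mr = new
        if change < tol:
            break
    return mr


if __name__ == "__main__":
    edges = {("a", "b"): 3, ("a", "c"): 1}
    s = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}  # user "d" is isolated
    ranks = microblog_rank(edges, s)
    print(sorted(ranks, key=ranks.get, reverse=True))
```

With this redistribution the scores always sum to 1, so the iteration converges even when many users have no reply link.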

The Event of "Mo Yan Being Awarded the Nobel Prize".
On October 11, 2012, at 19:00 Beijing time, the 2012 Nobel Prize for literature was announced, and it was awarded to the Chinese writer Mo Yan. This event received wide attention in China. We try to explore the influence of this emergency among college students. We collected 703 microblogs in total.
The data set covers 698 pairs of comment relationships and involves 1171 users. We then establish a microblog interactive network based on the reply relationship. Figure 6 shows the degree distribution of the network; the abscissa is the degree, and the ordinate is the percentage of nodes with each degree in the network.
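A degree distribution of this kind can be computed from the list of comment pairs as in the sketch below. It assumes that the degree of a user is the number of distinct users they exchanged comments with; the function name and the toy data are hypothetical.

```python
from collections import Counter


def degree_distribution(users, comment_pairs):
    """Map each degree to the percentage of users having that degree.

    users         -- iterable of all user IDs, including isolated users
    comment_pairs -- iterable of (user_a, user_b) comment relationships
    """
    partners = {u: set() for u in users}
    for a, b in comment_pairs:
        if a == b:
            continue  # a self-comment does not create an edge
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    counts = Counter(len(p) for p in partners.values())
    n = len(partners)
    return {deg: 100.0 * c / n for deg, c in sorted(counts.items())}


if __name__ == "__main__":
    users = ["a", "b", "c", "d"]
    pairs = [("a", "b"), ("a", "c"), ("b", "a")]
    print(degree_distribution(users, pairs))
```

Users with degree 0 in the result are the isolated users who did not take part in any reply.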
In Figure 6, we see that the microblog network of reply relationships is a scale-free network that satisfies the power-law distribution. Isolated users who did not participate in any replies account for nearly 45%, and only one person received 40 replies.
Using MATLAB R2009a, we calculate the MR value of each user and select as opinion leaders the users whose MR value is in the top 1%; the others are general users. Furthermore, the opinion leaders are visualized in the interactive network with UCINET 6.0. Table 7 lists the opinion leaders of the event "Mo Yan." In Figure 7, blue nodes represent general users, while red nodes represent opinion leaders.
In order to analyze the relationship between the number of opinion leaders and their influence, we plot it in Figure 9. The influence increases quickly when there are fewer than 15 opinion leaders. If the number is more than 30, the influence does not change noticeably.
From Figure 8, we know that, when $K$ is more than 25, the value of each parameter of the opinion leaders tends to be stable. When $K$ is less than 10, each parameter value changes greatly. Therefore, parameter $K$ should be from 10 to 20. Algorithm 3 (Microblog-Rank) returns the top $K$ opinion leaders with $K = 1\% \cdot N$, so it is reasonable to let $K$ be 12, and our results in Table 7 are reasonable.
Through the above analysis, we found that the opinion leaders of microblog in an incident should have high values for the number of fans, the number of forwards, and the number of comments. The more fans an author has, the more users can see the microblog, and high numbers of forwards and comments mean that the microblog gets much attention on the Internet. So the result is reasonable.

Three Hot Topics in January, 2013.
Opinion leaders must be those who can give guidance in topic discussions and attract more attention. Therefore, we give the Support index the largest weight. In addition, opinion leaders should be those who are active in the topic discussions. Therefore, we make activity the second most important parameter. The detailed weights are given in Table 8.
In order to measure the effectiveness of the algorithm, we use AHP and the TOPSIS method [12] to obtain the top 10 opinion leaders in these three events. The results are shown in Tables 9 and 10.
According to our analysis, the opinion leaders are all users who possess prominent values on one or more attributes (Figures 10, 11, 12, and 13). Their overall ranks are ahead of those of other users.
In event 1, the same users appear among the top 10 opinion leaders in both lists, although their ranks differ slightly. All opinion leaders perform outstandingly on more than one attribute. In event 2, the first and second leaders perform outstandingly on many attributes; however, the others merely possess high values on the last two attributes. Moreover, the parameter values of the last six leaders are close to each other.
In event 3, the opinion leaders that we obtained all perform outstandingly on the "release time" attribute and the "microblog length" attribute. From Figure 13, we conclude that current affairs such as "Diaoyu Islands" are more closely related to time and that opinion leaders often appear in the several days just after the topic occurs.
Overall, the results obtained by these two methods are similar, which indicates that the AHP method is cogent and effective. In TOPSIS, we first need to find the positive ideal solution and the negative ideal solution [5], but this is not needed in AHP. Therefore, AHP is simpler and more convenient.
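For comparison, the extra step that TOPSIS requires can be seen in the following sketch of the textbook procedure. It assumes vector normalization of each criterion and that all criteria are benefit-type (larger is better); the text does not specify which variant was used, and the function name is hypothetical.

```python
import math


def topsis(matrix, weights):
    """Closeness of each alternative to the positive ideal solution, in [0, 1].

    matrix  -- list of rows, one per user; columns are benefit-type criteria
    weights -- list of criterion weights, one per column
    """
    n_cols = len(weights)
    # Vector-normalize each column (a zero column is left unchanged).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # Positive and negative ideal solutions: best and worst value per criterion.
    best = [max(row[j] for row in weighted) for j in range(n_cols)]
    worst = [min(row[j] for row in weighted) for j in range(n_cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    data = [[10, 10], [5, 5], [0, 0]]
    print([round(s, 3) for s in topsis(data, [0.5, 0.5])])
```

AHP-style scoring needs only the weighted sum of normalized indicators, whereas TOPSIS additionally computes both ideal solutions and two distances per user.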

Opinion Leaders.
From the results we obtained, we know that opinion leaders consist of the following kinds of users.
(1) Official microblog users of mass media, including magazines, newspapers, and TV stations. Users such as "Youth Digest," "Entrepreneurial state magazine," "Oriental Morning Post," and "China News Weekly" all belong to the news media or the literature media. The mass media's understanding of events is more authoritative and deeper than that of others and can attract more attention from web surfers.
(2) Public figures, such as the radio program host "Guo Chendong," the chairman of the HIERSUN diamond agency "Li Houlin," the radio program host "1011 Zhang Chi," the magazine editor "Zhou Jiangong," and "Chongqing Weizi," the person at the center of the "post-90s girl who showed off her books" incident. They possess a certain social influence, and their statements in microblog attract more attention from others. Thus, they are much more likely to be opinion leaders than common users.
(3) Microblog users in fields related to the emergencies. "Yuan Lihai's adoption" is about public welfare assistance; therefore, the public welfare microblog user "powerful mouse v" appears among the opinion leaders. "PM2.5 haze in China" is an event about an environmental problem; thus, microblog users concerned with environmental protection, such as "Moruier Air Purifier" and "Sina Environmental Protection," appear among the opinion leaders. "Chinese Diaoyu Island" is a political and military hot topic; therefore, "Nothing God 2430" in the field of current affairs and "Nucleon Submarine Chaser" in the military field also come to

Figure 1: The rate of the amount of daily posting.

Figure 2: Comparison of the mean square error of simulation data and real data of events 1, 3, and 4.

Figure 3: Algorithm comparison: (a) demonstration of the simulation results of our algorithm and (b) demonstration of the simulation results of Zhao's algorithm.

Figure 4: The logarithmic graphs of comment amount and forward amount.

Figure 5: The relationship of comments among users.

Figure 6: The relationship of comments among users.

Figure 8: Diversity comparison for opinion leaders.

Table 1: The algorithm parameter settings.

Table 2: The parameter settings in Zhao's algorithm.

Table 6: The assessment system.
Notation of Algorithm 2: the number of nodes; for each user, the value of influence, the power of support, and the value of activity; and the weight vector, which has one component for each of these three indexes.
PageRank algorithm. The recognition method of the PageRank algorithm is a method based on graph theory. It identifies whether users are opinion leaders by studying the numbers of comments made and received among the microblog users.

Table 7: Top 12 ranked users by Microblog-Rank from Sina.

Table 8: Comparison table of the weights of the indicators.