The Finding and Dynamic Detection of Opinion Leaders in Social Network

Finding opinion leaders is valuable in practice. Because different data sources usually have different characteristics, no standard algorithm exists for finding and detecting opinion leaders across data sources: every data source has its own structural characteristics and requires its own detection algorithm. Experimental results show that opinion leaders and their characteristics can be found among the comments from Weibo, a Chinese social network comparable to Facebook or Twitter in the USA.


Introduction
With further study, the definition of an opinion leader has expanded. It involves not only the most influential person but also the most influential commentary. The finding and detection of opinion leaders in social networks have great commercial and political value. By identifying the most influential person, companies can target their selling and governments can guide public opinion. Additionally, detecting the most influential comments helps in understanding how public opinion forms. By building multiple topic networks, opinion leaders can be detected with the POLD (Positive Opinion Leader Detection) algorithm. Some researchers have found that ideas from the control field, such as data-driven methods [1][2][3][4][5][6] and robust control [7][8][9][10][11][12][13], could be brought into the study of the finding and dynamic detection of opinion leaders in social networks. However, this idea remains at the conceptual level. Therefore, this work proposes a Dynamic Opinion Rank algorithm to find the opinion leaders in the comments on Chinese news. With this methodology, the most influential comments can be found among all the network comments, and the most influential users can be found in the entire user network.

Problem Formulation
A single-theme network based on Weibo news consists of three levels: themes, comments, and users. There exist relationship mappings between those levels. For example, the mapping between themes and comments is $1 : n$, while the mapping between comments and users is $n : m$. This study analyzes a single topic, builds a single-topic view network and a mathematical model of users, and then finds the most influential comments and users. The structure of those three levels is shown in Figure 1.
As shown in Figure 1, there are three levels. "Layer 1" stands for all the themes of news, "Layer 2" denotes the single-topic comment network $G_{CN}(V, E)$, which is composed of comments, while "Layer 3" is the single-topic user network $G_{UN}(V, E)$. Based on $G_{CN}(V, E)$, it is possible to find the most influential comment $\mathrm{cmt}_3$. Then, by using the mapping between comments and users, the corresponding user $u_3$ can be found in $G_{UN}(V, E)$.

Attitude Stabilization Controller Design
3.1. Analysis of Emotions. Finding the positive and negative emotion links requires determining the emotional tendency of each comment. Based on the HowNet dictionary, this work first determines the emotional tendency of comments [16]. Comments are classified according to emotional bias as positive (1), negative (−1), or neutral (0); this is shown in Table 1.
According to the preceding definitions, any comment $c_i \in C$ ($1 \le i \le n$) can be divided into $m$ statements $\langle s_1, s_2, s_3, \ldots, s_m \rangle$. ICTCLAS is then applied to split each statement $s_j$ ($1 \le j \le m$) into words $\langle w_1, w_2, w_3, \ldots, w_l \rangle$ [17]. Emotion words are extracted with the emotion dictionary, and the number of negation words, such as "No," contained in the statement $s_j$ is counted. The emotional value of each word is set to 1, −1, or 0. Accumulating the values of all the emotional words in $s_j$ gives the emotional value of the statement. The parity of the negation-word count is then used to correct the emotional tendency of the statement, which yields a final statement value in {−1, 1, 0}; accumulating over all statements of the comment yields the final emotional tendency of the comment in {1, −1, 0}.
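The scoring procedure above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the three word lists are toy placeholders rather than HowNet entries, the input is assumed to be already segmented into words (the step ICTCLAS performs), and the function names are hypothetical.

```python
# Toy lexicons (assumptions for illustration; not taken from HowNet).
POSITIVE = {"good", "great", "support"}
NEGATIVE = {"bad", "terrible", "oppose"}
NEGATIONS = {"no", "not", "never"}


def sign(x):
    """Map any number to -1, 0, or 1."""
    return (x > 0) - (x < 0)


def statement_polarity(words):
    """Sum word polarities, then flip the sign if the negation count is odd."""
    total = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    negations = sum(w in NEGATIONS for w in words)
    if negations % 2 == 1:
        total = -total
    return sign(total)


def comment_polarity(statements):
    """Accumulate statement polarities into a comment value in {-1, 0, 1}."""
    return sign(sum(statement_polarity(s) for s in statements))
```

For example, `statement_polarity(["not", "good"])` evaluates to −1, because the single negation word reverses the positive word, while a comment made of one positive and one negative statement evaluates to 0.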

3.2. Modeling of Single-Topic Network. By using the explicit and implicit link algorithm, all of the link relationships of the set $C$ are found. Based on the sentiment analysis methods, the algorithm proposed in this work to establish a single-theme network is described in Algorithm 1.
In Algorithm 1, lines 1 to 7 traverse the set $C$ and find all of the links. Line 8 assigns a weight $wt_{i,j}$ to the positive and negative links; this weight is given by

$$wt_{i,j} = \mathrm{similarity}(c_i, c_j) \times \mathrm{tag}. \tag{1}$$

In this equation, the function $\mathrm{similarity}(c_i, c_j)$ represents the content similarity between the comments $c_i$ and $c_j$, and "tag" denotes the emotional consistency between the comments. For any reply relationship $c_i \to c_j$, if the two comments have a consistent tendency, then the link is viewed as a positive link; that is, tag = 1. Otherwise, the link is a negative link and tag = −1. The weight is thus assigned according to the following equation:

$$wt_{i,j} = \begin{cases} 1, & c_i \text{ explicitly links to } c_j \text{ and tag} = 1, \\ -1, & c_i \text{ explicitly links to } c_j \text{ and tag} = -1, \\ \mathrm{Sim}(c_i, c_j), & c_i \text{ implicitly links to } c_j \text{ and tag} = 1, \\ -\mathrm{Sim}(c_i, c_j), & c_i \text{ implicitly links to } c_j \text{ and tag} = -1. \end{cases} \tag{2}$$

In (2), if the link between $c_i$ and $c_j$ is explicit, then the similarity is equal to 1. If the link is implicit, the similarity is that between the texts of the two comments. If the emotional tendencies of $c_i$ and $c_j$ are consistent, then the weight $wt_{i,j}$ keeps its sign; otherwise, its sign is reversed. The above construction procedure is illustrated in Figure 2.
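The case analysis in (2) can be written as a small function. The sketch below is illustrative only; the function name `link_weight` and its argument names are hypothetical, and the similarity value is assumed to be computed elsewhere and to lie in [0, 1].

```python
def link_weight(explicit, tag, sim=None):
    """Return the signed weight of the edge c_i -> c_j following (2).

    explicit: True for an explicit (reply) link, False for an implicit one.
    tag:      1 if the two comments agree in sentiment, -1 otherwise.
    sim:      text similarity in [0, 1]; only used for implicit links.
    """
    if tag not in (1, -1):
        raise ValueError("tag must be 1 or -1")
    if explicit:
        magnitude = 1.0  # explicit links always have similarity 1
    else:
        if sim is None or not 0.0 <= sim <= 1.0:
            raise ValueError("implicit links need a similarity in [0, 1]")
        magnitude = sim
    return tag * magnitude
```

An explicit link between comments with opposite sentiment therefore gets weight −1, and an implicit link with similarity 0.4 between agreeing comments gets weight 0.4.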
In Figure 2(b), the structure of the set $C = \{c_1, c_2, c_3, c_4, c_5\}$ is shown explicitly. According to the chronological order of release, $c_1 < c_2 < c_3 < c_4 < c_5$, and the corresponding sequence of floors, $\mathrm{floor}_1$, $\mathrm{floor}_2$, $\mathrm{floor}_3$, $\mathrm{floor}_4$, and $\mathrm{floor}_5$, the corresponding single-topic view network is obtained as shown in Figure 2(a). The serial numbers of the edges are in ascending order and represent the link discovery order.

Dynamic Detection of Opinion Leaders
Opinion leaders are the most influential comments or persons. This paper presents an approach to find the most influential comments in a single-topic comment network $G_{CN}(V, E)$ and builds a user view network $G_{UN}(V, E)$ to find the most influential user.

The Factors of Time.
When a comment is read, the longer the interval before a reply, the weaker its influence. Hence, the impact of time should be considered [18]. Following the above analysis, this section proposes a model that describes the relationship between the time factor and the strength of a comment's influence; this impact of time is shown in Figure 3.
As shown in Figure 3, there is a comment set. Earlier comments influence later ones. For example, "B → A" represents that comment B is affected by comment A. The distance between two comments denotes the time interval between them. The larger the interval is, the weaker the influence is [19]. For example, the distance between A and C is greater than the distance between C and B; thus, the impact of B on C is greater than the impact of A on C. On the other hand, the influence of comment C will itself change over time. Therefore, the link weight between comments is related not only to the similarity but also to time, with which it gradually changes [20].
The above analysis shows an important relationship between the release time of a comment and the probability that the comment is chosen. To reduce the selection probability of a comment as time passes, a function $f$ of the time distance is defined in (3), where $f$ is a function of the times $t_1$ and $t_2$ and the damping $d$. The term $t_1$ is the time of the comment being replied to, and $t_2$ is the time when the replying person posted the reply. Hence, $f$ is a time-varying function. If the replied-to comment is far in the past, it has a smaller probability of being accessed. In (3), $d$ ($0 < d < 1$) is a time-dependent coefficient, and a control factor is also included; an appropriate value of the control factor can be chosen to enlarge or reduce the time scale. The larger the distance $|t_2 - t_1|$ is, the smaller the impact becomes. The change of $f$ with time is shown in Figure 4, where $d = 0.85$: the function $f$ changes gradually with the time interval. Thus, the function $f$ is defined reasonably.
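To make the role of the two parameters concrete, the following sketch uses one possible decay function. The closed form of (3) is not reproduced in this text, so the exponential form below is an assumption chosen only because it matches the stated properties: it depends on $|t_2 - t_1|$, uses a coefficient between 0 and 1, and has a control factor that stretches or compresses the time scale. The names `time_decay`, `d`, and `k` and the choice of hours as the time unit are hypothetical.

```python
def time_decay(t1, t2, d=0.85, k=2.0):
    """Assumed decay form d ** (|t2 - t1| / k); not the paper's exact (3).

    t1, t2: posting times of the original comment and the reply (hours).
    d:      coefficient with 0 < d < 1; smaller values decay faster.
    k:      control factor; larger values slow the decay down.
    """
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie strictly between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return d ** (abs(t2 - t1) / k)
```

Under this assumed form a reply posted immediately has a factor of 1, and the factor shrinks monotonically as the interval grows, which is the qualitative behavior described for Figure 4.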

The Detection of the Most Influential Comments.
In a single-theme network, if $c_i$ and $c_j$ are explicitly linked, then $|wt_{i,j}| = 1$; the impact thus practically exists. If the link relationship is implicit, then $0 \le |wt_{i,j}| \le 1$; that is, the impact may exist between the comments. For each $wt_{i,j}$, the effect of time is taken into account through the function $f$; writing $f_{i,j}$ for the value of $f$ at the posting times of $c_i$ and $c_j$, the time-adjusted weight is

$$wt_{i,j} \times f_{i,j}. \tag{4}$$

To obtain probabilities, these values are normalized by the sum $\mathrm{Sum}_i$:

$$\mathrm{Sum}_i = \sum_{j} wt_{i,j} \times f_{i,j}. \tag{5}$$

By using (5), the transformed probability can be obtained as

$$p_{i,j} = \frac{wt_{i,j} \times f_{i,j}}{\mathrm{Sum}_i}. \tag{6}$$

Then, the matrix of the improved finite Markov chain can be built from these probabilities.

[Figure: three panels, (a) Step 1, (b) Step 2, (c) Step 3.]

As shown above, the comments are sequentially linked, considering only the time factor and the normalized matrix.
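The normalization step can be sketched for one comment as follows. The sketch assumes that $\mathrm{Sum}_i$ is the total of the time-adjusted weights on the outgoing links of comment $c_i$; the function name `transition_row` is hypothetical. It is best read with non-negative weights, because negative links would produce negative entries that are not probabilities in the strict sense.

```python
def transition_row(weights, decays):
    """Normalize the time-adjusted weights of one comment's outgoing links.

    weights: link weights wt_ij for a fixed i.
    decays:  matching time factors f_ij.
    Returns the list p_ij = wt_ij * f_ij / Sum_i (all zeros if Sum_i is 0).
    """
    adjusted = [w * f for w, f in zip(weights, decays)]
    total = sum(adjusted)  # assumed definition of Sum_i
    if total == 0:
        return [0.0] * len(adjusted)
    return [a / total for a in adjusted]
```

With weights `[1.0, 0.5, 0.0]` and time factors `[0.5, 1.0, 1.0]`, the old explicit link and the recent implicit link end up with the same probability, `[0.5, 0.5, 0.0]`, which shows how the time factor can offset a stronger link.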

The Improved Model of the Finite Markov Chain.
In the field of information retrieval, the PageRank algorithm is widely used. Inspired by this algorithm, a random walk model called Dynamic Opinion Rank is proposed in this section. This algorithm takes into account not only the emotional factors but also the time factor [21].
From the standpoint of model use, if a comment gets more positive reviews, then it is more influential. Moreover, if this comment also replies to other comments, then, by the characteristics of the PageRank model, it is reasonable that its influence is passed on to them. Comments may be affected in the following two ways: (1) a comment raised by a user is affected by an opinion of interest with probability $d$; (2) a comment may also be subject to random effects with probability $1 - d$.
Based on the above analysis, an algorithm similar to PageRank is proposed, where $P$ is an $n \times n$ improved finite Markov chain transition matrix and $C$ represents the set of comments in the network.
Transposing $P$ yields $P^{T}$, where any row of $P^{T}$, denoted by $p_j$ ($1 \le j \le n$), represents all the cases in which $c_i$ links to $c_j$ ($1 \le i \ne j \le n$). For any element $p_{i,j} = (wt_{i,j} \times f_{i,j})/\mathrm{Sum}_i$, the ranking score of each comment, referred to as its authority value, can be calculated. Following the above method, the score over a period of time is eventually obtained. Then, the comment with the maximum authority value is chosen as the opinion leader. If a comment has a high in-degree, most of its incoming links are positive (emotionally consistent) links, and the time interval between the comments is not very long, then this comment obtains a higher ranking score.
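The ranking step can be illustrated with a plain power iteration. The update equation of the proposed algorithm is not reproduced in this text, so the sketch substitutes the standard PageRank update, in which each score is $(1 - d)/n$ plus $d$ times the score received through $P^{T}$. The function name `rank_scores`, the tolerance, and the three-comment example matrix are assumptions made for illustration.

```python
def rank_scores(P, d=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank-style power iteration (a stand-in, see lead-in).

    P[i][j] is the transition probability from comment i to comment j.
    Returns one score per comment.
    """
    n = len(P)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Comment j collects score from every comment i that links to it.
        new = [(1.0 - d) / n + d * sum(P[i][j] * scores[i] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Hypothetical network: comment 0 -> 1, comment 1 -> 2, comment 2 -> 1.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
scores = rank_scores(P)
leader = max(range(len(scores)), key=scores.__getitem__)
```

In this toy network comment 1 receives links from both other comments and obtains the highest score, so it would be reported as the opinion leader.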

Experimental Results and Analysis
To verify the proposed algorithm, an experimental analysis is conducted. The data for the experiment are obtained from Weibo news. A news item was tracked within two days (2012-08-17 07:55:17∼2012-08-18 05:42:43), and this period was divided into four time periods; each time period was analyzed to identify opinion leaders, and the dynamic change of the opinion leaders was analyzed.

The Result of Finding Out the Most Influential Comment.
By building a single-topic comment network $G_{CN}(V, E)$, setting the coefficient of the time function to 0.85 and the control factor to 2, and applying the algorithm proposed in Section 4, the experimental results shown in Figures 5 and 6 are obtained. As shown in Figure 5, there are 211 comments in the first time period, and comment Number 25 received the highest score. However, because the period is short, the relationships between comments are not yet clear, so the opinion leader identified at this stage may be inaccurate and may change with time. As shown in Figure 6, Number 25 is no longer the opinion leader, while Number 166 received the highest score. Hence, Number 166 is the opinion leader at this time. Some comments received low scores because their views are not accepted by others. Although the number of comments increased during the second time period, the relationships between comments still appear relatively sparse.
As illustrated in Figure 7, Number 25 becomes the opinion leader, while the score of Number 166 decreases with time. Additionally, as the number of comments increases, the relationships between comments become denser and the status of the comments converges toward stability.
Figure 8 shows that the number of comments is 560 in the fourth time period. Number 25 again received the highest score and is the opinion leader at this time. In comparison with Figure 7, the scores of newly published comments grow faster. The result in Figure 8 also demonstrates that new comments get more attention, which supports taking time into account. On the other hand, many comments' scores are growing. Owing to the characteristics of news comments, a news item gets less attention after a period of time, and the number of new comments also decreases. Hence, the leadership of Number 25 will be maintained for a long time.
The most influential comments and the sort scores are shown in Table 2.We find that opinion leaders are changing over time.Moreover, the rank of opinion leaders is also affected by the time.This also verifies that it is quite necessary to take time into consideration when developing the algorithm.
To evaluate the performance of the Dynamic Opinion Rank algorithm, a standard is needed: experts divide the comments in each time period into two categories, strong influence and weak influence. Then, the release time of the comment, the degree centrality, the degree of authority, and the F-score of several Opinion Rank algorithms are measured. The comparison results are shown in Figure 9. The Dynamic Opinion Rank algorithm is more accurate and more stable than the other approaches, which verifies the effectiveness of the proposed scheme.
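The comparison against the expert labels can be sketched as below, assuming that the score in question is the usual F-measure computed from precision and recall. The text does not define the score, so this reading, the function name `f_score`, and the comment numbers other than 25 and 166 in the example are assumptions.

```python
def f_score(predicted, actual, beta=1.0):
    """F-measure of a predicted set of strong-influence comments.

    predicted: set of comment ids the algorithm ranks as strongly influential.
    actual:    set of comment ids the experts labelled as strongly influential.
    """
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: the algorithm flags three comments, experts flag four.
score = f_score({25, 166, 3}, {25, 166, 40, 7})
```

Here two of the three flagged comments agree with the expert labels, giving a precision of 2/3, a recall of 1/2, and an F-score of 4/7.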

The Finding of the Most Influential Users.
To find the most influential users, the single-topic user network $G_{UN}(V, E)$ is constructed first, and then the algorithm proposed in Section 4 is applied to detect the most influential users. For the DBSCAN density-based clustering algorithm, the radius is set in the range from 0.06 to 0.12, and the initial MinPts is chosen as 1. Consequently, some clusters contain noise. With the proposed approach, 3∼5 clusters are finally obtained, as shown in Table 3.
Table 3 shows that the first clusters, which have the most elements, can be removed and replaced by clusters with fewer elements. As $G_{UN}(V, E)$ is sparse, the parameters with subscripts PD, CQ, and DC are set to 10, 0.3, and 10, respectively. Finally, opinion leaders are detected in each period; the result is given in the last row of Table 3. The experiment reveals that opinion leaders change dynamically with time.

Conclusions
This paper presents a Dynamic Opinion Rank algorithm to find the opinion leaders in Chinese news. Unlike the existing approaches, the proposed network model takes both explicit and implicit links into account. Moreover, the results obtained with the proposed algorithm show that the most influential comments and the opinion leaders vary with time. Experimental results further verified the effectiveness of the proposed strategy.

Figure 4: The response of the probability function $f$ changing with time.

Table 1: Sentiment analysis of forum comments.
Definition 1 (the most influential comment). There is a score $c_{i,\mathrm{score}}$ for any comment $c_i \in C$. Sorting the comments according to their scores, it can be assumed that $c_{1,\mathrm{score}} > c_{2,\mathrm{score}} > \cdots > c_{n,\mathrm{score}}$. Then, the comments with the highest score are defined as the most influential comments; these are also the opinion leaders of comments [14].

Definition 2 (the most influential user). As in Definition 1, for the user set $U = \{u_1, u_2, \ldots, u_m\}$, each user $u_i \in U$ has its own score $u_{i,\mathrm{score}}$. Sorting those scores, it follows that $u_{1,\mathrm{score}} > u_{2,\mathrm{score}} > \cdots > u_{m,\mathrm{score}}$. Then, the user with the highest score is defined as the most influential user.

Algorithm 1: The algorithm to build comment network.
Input: explicit links and implicit links in $C$, sentiment orientation $o_i$ of every $c_i \in C$;
Output: $G_{CN}(V, E)$ // Comment Network of $C$;
...
$c_i$ negative link to $c_j$;
(8) assign weight $wt_{ij}$ for edge $c_i \to c_j$;

Table 2: The changes of opinion leader in the comments.

Table 3: Clustering results for detection of opinion leader from users.