Public opinion emergencies have an important effect on social activities. Recognizing special communities such as opinion leaders contributes to a comprehensive understanding of how public opinion develops. In this paper, a network opinion leader recognition method based on relational data is put forward, and an opinion leader recognition system is designed that integrates a public opinion data acquisition module, a data characteristic selection and fusion module, and an opinion leader discovery module based on Markov Logic Networks. The designed system not only overcomes the incomplete data acquisition and isolated tasks of traditional methods, but also recognizes opinion leaders comprehensively by considering multiple problems jointly in the relational model. Experimental results demonstrate that, compared with traditional methods, the proposed method provides more accurate opinion leader recognition and has good noise immunity.
As the Internet enters the We-media era, every individual can be a message sender. However, because netizens vary widely in quality, public opinion emergencies may affect social activities significantly. To avoid adverse effects, it is necessary to have a comprehensive understanding of the development trend of public opinion and to recognize special communities such as opinion leaders.
The earliest domestic and foreign research on the discovery of network opinion leaders transplanted the opinion leader theory and research methods of traditional social science directly to the recognition of Internet leaders, but failed to achieve ideal results. Such studies often determined community opinion leaders through quantitative data analysis. These methods [
The behavior of every netizen involved in public opinion transmission can be described by three kinds of attributes: the inherent attribute, the content attribute, and the social network attribute.
The inherent attribute refers to attributes of a participant that are independent of the public opinion event concerned, such as career, position, Internet age, number of logins, community credits, number of fans, and number of follows.
The content attribute describes the behavior of a participant in a certain public opinion event, including posts, replies, comments received, reposts, number of mentions, number of words, and emotional tendency.
The social network attribute refers to the mutual relationships of participants in the network, mainly the fan and follow relations among participants.
Existing network opinion leader discovery is based on recognition models involving only one or two of these attributes. No network opinion leader recognition involving all three attributes has been reported yet, which limits the accuracy of existing methods.
Existing network opinion leader discovery also treats the attributes of participants as independent and identically distributed (IID) data. In probability theory, a sequence of mutually independent random variables that share the same probability distribution is called IID. However, the attribute data of participants are relational data: different attributes are mutually correlated rather than independent. For example, the number of fans (an inherent attribute) is often proportional to the number of comments received (a content attribute), and participants who receive much attention from an opinion leader are more likely to be opinion leaders themselves. Ignoring such relations leaves some opinion leaders unidentified. Furthermore, existing methods provide no modeling solution for relational data.
As a result, applying all three attributes simultaneously and exploiting the relationships within public opinion data can improve the performance of network opinion leader recognition.
Markov Logic Networks are a statistical relational learning method that combines Markov networks with first-order logic. They were proposed by Richardson and Domingos [
Markov network [
The characteristic function can be any real-valued function of the state. In this paper, the characteristic function takes binary values (
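As a minimal illustration of the log-linear form of a Markov network, P(X = x) = (1/Z) exp(Σ_j w_j f_j(x)), with binary characteristic functions: the variables, features, and weights below are assumptions for the sketch, not values from the paper.

```python
import itertools, math

# Minimal Markov network over three binary variables in log-linear form,
# P(X = x) = (1/Z) * exp(sum_j w_j * f_j(x)), where each characteristic
# function f_j takes binary values.
features = [
    (1.5, lambda x: x["A"] == x["B"]),  # potential: A and B tend to agree
    (0.8, lambda x: x["B"] == x["C"]),  # potential: B and C tend to agree
]

def unnormalised(x):
    # exp of the weighted sum of the satisfied characteristic functions
    return math.exp(sum(w for w, f in features if f(x)))

states = [dict(zip("ABC", bits)) for bits in itertools.product([0, 1], repeat=3)]
Z = sum(unnormalised(x) for x in states)  # partition function

def prob(x):
    return unnormalised(x) / Z

print(round(prob({"A": 1, "B": 1, "C": 1}), 3))  # -> 0.282
```

The fully agreeing states receive the highest probability because they satisfy both characteristic functions, which is exactly how weighted formulas bias an MLN's ground network.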
Markov Logic Networks are a first-order logic knowledge base in which every formula has a weight. This knowledge base can be viewed as a template for constructing Markov networks. Viewed from the probabilistic side, Markov Logic Networks provide a simple language for defining large Markov networks and for integrating abundant domain knowledge flexibly and modularly. Viewed from the first-order logic side, Markov Logic Networks handle knowledge bases containing uncertainty, defects, and even contradictions soundly, thus decreasing brittleness.
Take the Tianya data set as an example. In the simplest situation, suppose the knowledge base contains only the formula
Given an individual constant set
Closed Markov Logic Network.
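Grounding can be sketched as follows: a closed (ground) Markov Logic Network instantiates each first-order clause for every combination of constants. The clause used here is the leader-propagation rule that appears later in the learned results; the constant names are illustrative.

```python
import itertools

# Grounding sketch: instantiate one first-order clause over a small
# constant set, as constructing a closed Markov Logic Network would.
constants = ["Person1", "Person2"]

def ground_clause(x, y):
    # CNF of: reply(x, y) ^ act(x, Leader) => act(y, Leader)
    return f"!reply({x},{y}) v act({y},Leader) v !act({x},Leader)"

groundings = [ground_clause(x, y) for x, y in itertools.product(constants, repeat=2)]
for g in groundings:
    print(g)
# Two constants and two logical variables give 2^2 = 4 ground clauses.
```

Each ground clause becomes a feature of the ground Markov network, all sharing the weight of the first-order clause they were instantiated from.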
Markov logic reasoning is equivalent to probabilistic reasoning over complex relationships. The basic task of reasoning is to infer the most probable state of the world
There are two basic types of reasoning: finding the most probable state of the world that satisfies some evidence, and computing arbitrary conditional probabilities. Lazy inference and lifted inference can improve the performance of both types when processing more complicated relationships: lazy inference instantiates only the ground atoms that deviate from a "default" value, while lifted inference groups indistinguishable atoms together and treats each group as a single unit.
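The first reasoning task, finding the most probable state given evidence, can be sketched by brute force on a tiny ground network; the prior clause and its weight are assumptions added for illustration, while the high-weight clause is the leader-propagation rule from the learned results.

```python
import itertools

# Brute-force MPE sketch: the score of a world is the sum of the weights
# of the ground formulas it satisfies; we keep only worlds consistent
# with the evidence and return the highest-scoring one.
atoms = ["reply_p1_p2", "act_p1_Leader", "act_p2_Leader"]

weighted_formulas = [
    # 4.34903 !reply(a1,a2) v act(a2,Leader) v !act(a1,Leader)  (from the text)
    (4.34903, lambda w: (not w["reply_p1_p2"]) or w["act_p2_Leader"] or (not w["act_p1_Leader"])),
    # weak prior against being a leader (assumed for this sketch)
    (0.5, lambda w: not w["act_p2_Leader"]),
]

evidence = {"reply_p1_p2": True, "act_p1_Leader": True}

def score(world):
    return sum(wt for wt, f in weighted_formulas if f(world))

best = None
for bits in itertools.product([False, True], repeat=len(atoms)):
    world = dict(zip(atoms, bits))
    if any(world[a] != v for a, v in evidence.items()):
        continue  # discard worlds that contradict the evidence
    if best is None or score(world) > score(best):
        best = world

print(best["act_p2_Leader"])  # prints True
```

Because the leader-propagation clause far outweighs the prior, the most probable world labels the replied-to user a leader, which is the behavior exact MPE solvers such as MaxWalkSAT approximate on large networks.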
Markov Logic Networks learning includes structure learning and parameter learning. Structure learning learns the model structure (the network structure of the Markov Logic Networks) from data; learning the rules is the more difficult part. There are two structure learning methods based on inductive logic programming (ILP):
To summarize, Markov Logic Networks learning includes parameter learning and structure learning. Parameter learning comprises generative parameter learning and discriminative parameter learning, each with corresponding formulas and approximate algorithms. Structure learning includes top-down and bottom-up approaches.
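Generative parameter learning can be sketched for a single formula: the exact gradient of the log-likelihood with respect to a weight is the observed count of the formula minus its expected count under the current model. The tiny data set below is an assumption for illustration.

```python
import itertools, math

# Generative weight-learning sketch: gradient ascent on
#   d logP(data)/dw_i = n_i(data) - E_w[n_i]
# for one formula over two ground atoms, with exact expectations.
atoms = ["A", "B"]
formula = lambda world: world["A"] == world["B"]   # one formula: A <=> B

# Three observed worlds; the formula holds in two of them.
data = [{"A": True, "B": True}, {"A": True, "B": True}, {"A": True, "B": False}]

def worlds():
    for bits in itertools.product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

def expected_count(weight):
    scores = [(math.exp(weight * formula(wld)), formula(wld)) for wld in worlds()]
    Z = sum(s for s, _ in scores)                  # partition function
    return sum(s / Z for s, sat in scores if sat)

observed = sum(formula(wld) for wld in data) / len(data)
w, lr = 0.0, 0.5
for _ in range(500):                               # plain gradient ascent
    w += lr * (observed - expected_count(w))
print(round(w, 2))                                 # prints 0.69 (ln 2)
```

The weight converges to ln 2 because the model then satisfies the formula in exactly two thirds of its probability mass, matching the empirical count; real systems replace the exact expectation with approximations such as pseudo-likelihood.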
The overall structure of the network opinion leader recognition system based on the Markov Logic Networks is shown in Figure
Overall structure of the network opinion leader recognition system based on the Markov Logic Networks.
The designed network opinion leader recognition system includes three modules: the public opinion data acquisition module, the data characteristic selection and fusion module, and the opinion leader discovery module based on Markov Logic Networks. The public opinion data acquisition module collects data concerning a specific public opinion event. The data characteristic selection and fusion module processes and analyzes the collected data to disclose the relationships between core characteristics and attributes. The opinion leader discovery module based on Markov Logic Networks designs predicates, builds the knowledge base, and establishes the MLN model according to those relationships.
The technical route of opinion leader recognition based on Markov Logic Networks is presented in Figure
Technical route of opinion leader recognition based on Markov Logic Networks.
The primary task of the training module is to design predicates. Predicate design has two stages:
The initialization module converts the contents of the corpus into DB files according to the existing predicate design. The structure learning module conducts structure learning through Alchemy's learnstruct program; beam search is the default structure learning algorithm. Weight learning can be implemented with Alchemy's weight learning program. The weighted MLN clauses obtained through structure learning and weight learning are then used for reasoning; in the designed system, these weighted MLN clauses are used to deduce a user's identity.
In the model verification module, both the verification data and the training data are converted into DB files according to the predicate design. The verification module is mainly used to deduce a user's identity.
The reasoning results contain all possible ground predicates and their probabilities. Take the predicate teachby(course, teacher) for instance, which means that the course is taught by the teacher. If the course is Chinese, the possible teacher set is
Result extraction selects the reasoning result with the highest probability as the final result. In the above case, Tom will be selected as the teacher of Chinese. The data output module then calculates the AUC (area under the ROC curve) and CLL (conditional log-likelihood).
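As a sketch of this extraction step (the candidate probabilities below are illustrative, not from the paper's data):

```python
import math

# Result-extraction sketch: pick the highest-probability candidate as the
# final answer and compute its conditional log-likelihood (CLL).
inferred = {                       # P(teachby(Chinese, t)) for each candidate t
    "Tom": 0.81, "Ann": 0.12, "Bob": 0.07,
}
prediction = max(inferred, key=inferred.get)
print(prediction)                  # prints Tom

truth = "Tom"
cll = math.log(inferred[truth])    # averaged over all queries in a full evaluation
print(round(cll, 3))               # prints -0.211
```

A full evaluation averages the CLL over every query atom and builds the ROC curve from the same probabilities to obtain the AUC.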
Netizens were first classified using the nonrelational data models provided by Weka. To do so, the original data have to be converted into the ARFF format, which mainly includes attribute declarations and data [
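The conversion to ARFF can be sketched as follows; the attribute names and example rows are assumptions, not the paper's actual schema.

```python
# ARFF-conversion sketch: Weka expects a @relation line, @attribute
# declarations, and a @data section with one comma-separated row per instance.
rows = [
    ("Person1", 120, 340, "Leader"),
    ("Person2", 1, 0, "Passer"),
]

lines = ["@relation netizens", ""]
lines += [
    "@attribute name string",
    "@attribute totalPostNum numeric",
    "@attribute totalReplyNum numeric",
    "@attribute role {Leader,Normal,Passer,Waterarmy}",
    "",
    "@data",
]
lines += [f"{n},{p},{r},{role}" for n, p, r, role in rows]
arff = "\n".join(lines)
print(arff.splitlines()[0])   # prints @relation netizens
```

Note that this flat format is exactly what makes the Weka models nonrelational: each row is treated as an IID instance, with no way to express links between netizens.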
We chose SVM to classify the netizens involved in the "Xu-Ting Event" on the Tianya Legal Forum into Leader, Normal, Passer, and Waterarmy. In this experiment, every netizen was recognized independently. Experimental results are presented in Figure
Group recognition of nonrelational data model.
It can be seen from Figure
First, we design the predicates. The predicates designed according to the characteristic selection and prior personal knowledge are listed in Tables
Designed predicates of content attribute.
Predicate | Meaning |
---|---|
Post( | User who is represented by |
ReplyNumOfPost( | The reply number of the post which is represented by |
ClickNumOfPost( | The click number of the post which is represented by |
TotalPostNum( | The post number of the user who is represented by |
TotalReplyNum( | The reply number of the user who is represented by |
TotalBeReplyNum( | The number of replies to the user who is represented by |
Correlation( | The correlation level between the user who is represented by |
Sentiment( | The degree of the emotional tendencies based on the content published by the user who is represented by |
Designed predicates of social network attribute.
Predicate | Meaning |
---|---|
FansNum( | The fans number of the user who is represented by |
FollowNum( | The follow number of the user who is represented by |
Follow( | A user who is represented by the first |
Reply( | In the post which is represented by the |
Designed predicates of inherent attribute.
Predicate | Meaning |
---|---|
Gender( | The gender of the user who is represented by |
Age( | The age of the user who is represented by |
NetworkAge(people, | The network age of the user who is represented by |
LogNum( | The login number of the user who is represented by |
CommunityCredits( | The community credits of the user who is represented by |
HasPosition( | The user who is represented by |
Role( | The role of the user who is represented by |
After the predicate design, we have to convert the original data into DB files.
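This conversion can be sketched as follows: raw counts are discretized into level constants (names such as Level_totalpostnum_Lessthan2 mirror those appearing in the learned clauses, while the bin boundaries here are assumptions) and emitted as ground atoms.

```python
# DB-conversion sketch: turn raw per-user counts into Alchemy-style
# ground atoms with discretised level constants.
def level(prefix, n):
    # Assumed bin boundaries; the paper's exact discretisation is not given.
    if n < 2:
        return f"Level_{prefix}_Lessthan2"
    if n <= 9:
        return f"Level_{prefix}_2To9"
    if n <= 49:
        return f"Level_{prefix}_10To49"
    return f"Level_{prefix}_Morethan49"

def to_db(user, post_num, reply_num):
    return [
        f"totalpostnum({user},{level('totalpostnum', post_num)})",
        f"totalreplynum({user},{level('totalreplynum', reply_num)})",
    ]

for atom in to_db("Person1", 1, 35):
    print(atom)
```

One such line per ground atom is written into the DB file that the learning and inference programs consume.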
Next, we implement structure learning and weight learning. The input MLN file (predicate statements) for structure learning is shown in Box
follow(
reply(
post(
replynumofpost(post_id,
clicknumofpost(post_id,
totalpostnum(
totalreplynum(
totalbereplynum(
act(
repeat(
The structure learning results are shown in Box
10.4475
10.7128
10.4475
−6.70724 replynumofpost(a1,a2) v !replynumofpost(a1,a3)
5.77628
5.77628
5.83318
5.77628
5.83318
4.7222
5.83318
The input MLN file (predicate statements and design statements) for weight learning is shown in Box
repeat(a1,+a2)=>act(a1,+a3)
totalreplynum(a1,+a2)
post(a1,a2)
act(a1,+a2)
act(a1,+a2)
The design statements correspond to the recognition results of the four groups.
The weight learning results are shown in Box
5.8819 act(a1,Waterarmy) v !repeat(a1,Level_repeatnum_20To49)
5.28195
4.26896 !totalpostnum(a1,Level_totalpostnum_2To2) v !totalreplynum(a1,Level_totalreplynum_10To49) v !totalbereplynum(a1,Level_totalbereplynum_2To9) v act(a1,Normal)
4.93641 !totalpostnum(a1,Level_totalpostnum_Lessthan2) v !totalreplynum(a1,Level_totalreplynum_Lessthan2) v !totalbereplynum(a1,Level_totalbereplynum_Lessthan2) v act(a1,Passer)
3.76921 !totalpostnum(a1,Level_totalpostnum_Lessthan2) v !totalreplynum(a1,Level_totalreplynum_Lessthan2) v !totalbereplynum(a1,Level_totalbereplynum_Morethan200) v act(a1,Leader)
3.74401 !reply(a1,a2) v act(a2,Normal) v !act(a1,Passer)
4.34903 !reply(a1,a2) v act(a2,Leader) v !act(a1,Leader)
The valuable clauses learned are listed in Table. Take, for example, the clause 4.27395 !reply(a1,a2) v act(a2,Leader) v !act(a1,Leader).
Valuable clauses learned from the “Xu-Ting Event”.
Weight | Formula |
---|---|
2.76133 | !fansnum(a1,Level_fansnum_10To49) v !follownum(a1,Level_follownum_Lessthan10) |
2.63516 | gender(a1,Female) v !lognum(a1,Level_log_num_1000To4999) |
3.21897 | !fansnum(a1,Level_fansnum_10To49) v !follownum(a1,Level_follownum_Lessthan10) |
3.77246 | gender(a1,Female) v !lognum(a1,Level_log_num_Lessthan1000) |
4.27395 | !reply(a1,a2) v act(a2,Leader) v !act(a1,Leader) |
5.65332 | gender(a1,a2) v !age(a1,a3) v !age(a1,a4) v lognum(a1,a5) v lognum(a1,a6) |
6.06442 | !communitycredits(a1,a2) v !communitycredits(a1,a3) v !totalreplynum(a1,a4) |
6.06101 | !networkage(a1,a2) v !networkage(a1,a3) |
This clause means that if a1 is a leader and a1 replies to a2, then a2 is a leader too, which accords with real-world experience.
The learned valuable clauses were selected for reasoning on the test set. According to the experimental results (shown in Box
act(Person15,Waterarmy) 0.505
act(Person15,Normal) 0.813969
act(Person15,Leader) 0.495
act(Person30,Waterarmy) 0.99995
act(Person30,Normal) 0.99995
act(Person30,Leader) 0.99995
act(Person21,Waterarmy) 0.495
act(Person21,Normal) 0.823968
act(Person21,Leader) 0.504
act(Person55,Waterarmy) 0.99995
act(Person55,Normal) 0.99995
act(Person55,Leader) 0.99995
act(Person51,Waterarmy) 0.490001
act(Person51,Leader) 0.513999
act(Person51,Passer) 0.99995
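The result-extraction step applied to lines of this form can be sketched as follows; the sample lines are taken from the box above.

```python
import re
from collections import defaultdict

# Parsing sketch for Alchemy-style result lines such as
# "act(Person15,Normal) 0.813969": keep the highest-probability role
# per person, as the result-extraction step described earlier.
raw = """\
act(Person15,Waterarmy) 0.505
act(Person15,Normal) 0.813969
act(Person15,Leader) 0.495
act(Person21,Waterarmy) 0.495
act(Person21,Normal) 0.823968
act(Person21,Leader) 0.504
"""

best = defaultdict(lambda: ("", 0.0))
for line in raw.splitlines():
    m = re.match(r"act\((\w+),(\w+)\)\s+([\d.]+)", line)
    if not m:
        continue
    person, role, p = m.group(1), m.group(2), float(m.group(3))
    if p > best[person][1]:
        best[person] = (role, p)

for person, (role, p) in sorted(best.items()):
    print(f"{person}: {role}")
```

On these sample lines both Person15 and Person21 are assigned Normal, since that role has the highest inferred probability for each of them.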
The recognition accuracy comparison results of the relational and nonrelational data models on different events are listed in Table
Recognition accuracy comparison of relational data model and non-relational data model.
Event name | Forum | Accuracy of nonrelational model (%) | Accuracy of relational model (%) |
---|---|---|---|
Xu-Ting Event | Legal Forum | 79.5 | 82.5 |
Xu-Ting Event | Tianya By-talk | 77.4 | 80.6 |
Three years of great Chinese famine | Discussion about the history | 77.8 | 81.8 |
Three years of great Chinese famine | Tianya By-talk | 76.9 | 80.8 |
This paper first summarizes and evaluates the shortcomings of existing opinion leader recognition methods, describes the advantages of Markov Logic Networks for opinion leader recognition, and reviews the associated theory of Markov Logic Networks, including basic concepts and theoretical models (reasoning and learning). Markov Logic Networks combine probability theory and first-order logic, integrating logical/relational expression, uncertainty processing, and learning. Second, this paper designs and implements a network opinion leader recognition system based on these theories. The designed system first collects public opinion data as the training set for structure learning of Markov Logic Networks, and then uses the learned model to reason about the test data in the corresponding public opinion domain. The experimental results are compared and analyzed to evaluate their validity. Third, this paper carries out an experimental verification, which confirms the superiority of the designed network opinion leader recognition system.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant no. 61173145, the National Basic Research Program of China under Grant no. G2011CB302605, and the National High Technology Research and Development Program of China under Grant no. 2011AA010705.