In the era of big data, social networks have become an important reflection of human communication and interaction on the Internet. Identifying the influential spreaders in networks plays a crucial role in various areas, such as disease outbreak, virus propagation, and public opinion control. Based on three basic centrality measures, a comprehensive algorithm named PARWRank for evaluating node influences is proposed by applying preference relation analysis and the random walk technique. For each basic measure, the preference relation between every node pair in a network is analyzed to construct the partial preference graph (PPG). Then, the comprehensive preference graph (CPG) is generated by combining the preference relations with respect to the three basic measures. Finally, the ranking of nodes is determined by conducting a random walk on the CPG. Furthermore, six public social networks are used for comparative analysis. The experimental results show that our PARWRank algorithm achieves higher precision and better stability than the existing methods based on a single centrality measure.
With the rapid development of network and information technology, rich-media applications have become involved in all aspects of our lives. Accordingly, the interaction and communication between individuals have become more and more convenient and frequent. For example, platforms such as Facebook, WeChat, QQ, and WhatsApp help users share their messages, opinions, or pictures. As a result, individuals in society have been tied together in an invisible way, that is, the so-called
In the early stage, studies mainly focused on the static statistical properties that characterize the structure of social networks [
As reported in [
Recently, some hybrid approaches have been proposed for the influence maximization problem. In these solutions, several different measures, such as degree centrality, are usually taken into account to design a comprehensive model for evaluating the influence spread of a node. Typically, Jalayer et al. proposed a “greedy TOPSIS and community-based” (GTaCB) algorithm [
The remainder of this paper is organized as follows. In Section
During the spreading of diseases or rumors, the influence is usually sparked by one or several initial nodes in a social network. Because nodes occupy different locations in the network structure, they have different transmission abilities for diseases or rumors and thus exert different influences on the network. Therefore, it is necessary to evaluate the nodes’ influences and then rank them. This measurement supports scientific decision-making on social networks, such as monitoring the transmission of public opinion and controlling the propagation of disease.
In this paper, we assume that the initial source of spreading is only due to one node in a network. Then, the
It should be noted that the information or disease propagation may be caused by several original source nodes in a social network, and hence to identify multiple influential nodes is also an interesting problem [
In this study, our objective is to design a framework for evaluating node influences in a social network through comprehensively considering some basic measures. In the past, quite a few measures have been presented to capture the importance of each node in a network.
Degree centrality is the earliest and simplest method to depict the influence of a node in a network. For node
Degree centrality measures the node’s importance from the perspective of degree. Its inherent limitation lies in that it can only reflect the local structure around a given node, i.e., the node and its neighbors, but the reachability from it to the nodes beyond its neighborhood is completely ignored.
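As a small illustration of the degree measure discussed above, the following sketch counts the neighbors of each node in an undirected graph stored as an adjacency list. The function name `degree_centrality`, the optional `normalized` flag (division by n - 1, a common textbook convention), and the 5-node graph are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List


def degree_centrality(adj: Dict[int, List[int]], normalized: bool = False) -> Dict[int, float]:
    """Return the degree of every node; optionally divide by (n - 1)."""
    n = len(adj)
    scale = 1.0 / (n - 1) if normalized and n > 1 else 1.0
    # set() guards against duplicated neighbor entries in the input lists
    return {v: len(set(nbrs)) * scale for v, nbrs in adj.items()}


# Hypothetical undirected graph (not the running example of the paper)
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
dc = degree_centrality(adj)
dc_norm = degree_centrality(adj, normalized=True)
print(dc)
```

Only the immediate neighborhood enters the computation, which is exactly the locality limitation noted in the text.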
The betweenness centrality is used to capture how well situated a node is in terms of paths that it lies on. Specifically, for a node
It is easy to see that betweenness centrality is a measure to reflect the gateway feature of a node. But it has poor capability to express the strength of connections from the node of interest to its neighbors.
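To make the path-based nature of this measure concrete, the sketch below computes unnormalized shortest-path betweenness for an unweighted, undirected graph with Brandes' accumulation scheme. Brandes' algorithm is a standard technique that we substitute here for illustration; the paper does not state how betweenness was computed, and the function name and the small graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def betweenness_centrality(adj: Dict[int, List[int]]) -> Dict[int, float]:
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        while stack:                     # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical undirected graph (not the running example of the paper)
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
bc = betweenness_centrality(adj)
print(bc)
```

In this toy graph, node 3 lies on every shortest path between {1, 2} and {4, 5}, so it receives the largest value even though node 4 is also a cut vertex.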
The closeness centrality is a measure of tracking how close a given node is to any other nodes in a network. For node
According to the definition in (
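For illustration, closeness can be computed with one breadth-first search per node. The sketch below uses the common textbook form (n - 1) divided by the sum of shortest-path distances; this scaling is an assumption and may differ from the exact formula used in this paper, and the function name and graph are hypothetical.

```python
from collections import deque
from typing import Dict, List


def closeness_centrality(adj: Dict[int, List[int]]) -> Dict[int, float]:
    """Closeness as (n - 1) / sum of BFS distances; 0.0 if some node is unreachable."""
    n = len(adj)
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable_all = len(dist) == n
        result[s] = (n - 1) / total if reachable_all and total > 0 else 0.0
    return result


# Hypothetical undirected graph (not the running example of the paper)
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
cc = closeness_centrality(adj)
print(cc)
```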
Based on the analysis of the above three measures, we find that each measure has its own strength in reflecting information (or disease) propagation, but it also has shortcomings. Therefore, combining these representative measures into a comprehensive measure is probably a rational way to identify influential nodes. As mentioned earlier, the basic measures in our framework can be extended or replaced according to specific requirements. Besides the above three centrality measures, quite a few other measures have been presented in recent years, such as diffusion centrality [
The random walk model is a special case of a Markov chain, that is, a finite and time-reversible Markov chain. It arises in many models in mathematics and physics [
In fact, the random walk model can be easily applied to the directed graph [
In this paper, we attempt to design a comprehensive algorithm for evaluating node influences by jointly considering three basic and independent influence measures. Thus, the three basic measures are the input data for further processing in our algorithm. Here, assume that the basic measures, such as
As shown in Figure
The overall framework for ranking nodes according to their influences.
It should be noted that, in this paper, only three basic centrality measures are taken into account in the algorithm. However, the above framework is a scalable model for evaluating node influences. That is, besides the basic measures, some other advanced measures can also be adopted in it. Each measure has its own advantages and limitations in representing the influences of nodes, so the complementarity should be considered when choosing the measures for use in our framework.
In order to address the technical details of our proposed algorithm, a small network (graph) is used as a running example. As shown in Figure
A running example for evaluating node influences.
According to the definitions in Section
Three typical measures of node influences.
Measure  Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Node 7
Degree  3  2  4  2  2  2  1
Betweenness  5  0  19  3  1  10  0
Closeness  0.06  0.05454  0.075  0.05454  0.04615  0.05454  0.0375
With regard to the basic measure of betweenness centrality (
For the third measure, i.e., closeness centrality (
As mentioned above, each basic measure of node influence reflects merely one aspect of information (or disease) spreading features and behaviors. Moreover, it may be difficult to distinguish the order of some nodes because they share the same metric value. In this paper, we present a new ranking algorithm that comprehensively considers the above three basic measures. The whole algorithm consists of two key steps: partial preference analysis and random walk on the comprehensive preference graph.
To analyze the partial preference relations, the preference between every pair of nodes in a network must be judged for each basic measure.
(preference relation). Given a measure of node influence, the preference on a pair of nodes can be modeled in the form of function
Take the measure
The value of
Based on the above definition of preference relation, the partial preference graph for a given influence measure can be further defined as below.
(partial preference graph, PPG). Given a measure of node influence, if the preferences of all node pairs are analyzed, the partial preference graph
Since the ranges of different measures are not identical, it is hard to merge them directly into a comprehensive model for ranking node influences. Thus, we first normalize the values of each measure over all nodes and then generate the corresponding partial preference graph (PPG). Finally, based on the PPGs of all basic measures, the comprehensive preference graph can be built.
In our work, we adopt
The normalized data of three measures for the running example.
Measure  Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Node 7
Degree  0.6667  0.3333  1.0  0.3333  0.3333  0.3333  0.0
Betweenness  0.2632  0.0  1.0  0.1579  0.0526  0.5263  0.0
Closeness  0.6  0.4544  1.0  0.4544  0.2307  0.4544  0.0
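The normalized values in the table above are consistent with min-max scaling of the raw values of the running example, that is, (x - min) / (max - min) per measure. This is an inference from the two tables, since the name of the normalization is not legible in this text. A minimal sketch under that assumption (the helper name `min_max_normalize` is ours):

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # all nodes share the same value; map everything to 0.0 (assumed convention)
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Raw measures of the seven nodes in the running example (from the table of raw values)
raw = {
    "Degree": [3, 2, 4, 2, 2, 2, 1],
    "Betweenness": [5, 0, 19, 3, 1, 10, 0],
    "Closeness": [0.06, 0.05454, 0.075, 0.05454, 0.04615, 0.05454, 0.0375],
}
normalized = {name: min_max_normalize(vals) for name, vals in raw.items()}
for name, vals in normalized.items():
    print(name, [round(x, 4) for x in vals])
```

After this step all three measures lie in [0, 1], so preferences derived from different measures become comparable.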
For the measure of degree centrality (
The PPG and corresponding matrix about degree centrality for the example network.
The PPG w.r.t. degree centrality (
The matrix of PPG w.r.t. degree centrality
In a similar way, the partial preference graphs with respect to the other two measures (i.e., betweenness centrality and closeness centrality) are built and demonstrated in Figure
The PPGs w.r.t. other two measures for the example network.
The PPG for betweenness centrality (
The PPG for closeness centrality (
To perform the comprehensive evaluation of node influences, it is necessary to construct a model by combining three partial preference graphs together. Here, we define this model as a comprehensive preference graph.
(comprehensive preference graph, CPG). For several PPGs about the same social network, the comprehensive preference graph
To deeply understand the definition of CPG, we use the PPGs about three different measures to illustrate the construction of CPG. Here, we denote the preferences of edge
It is not hard to find, during the above construction procedure of CPG, that all three PPGs are regarded as equally important. In the real application scenarios, if some basic measures need to be considered differently, the overall preference of edge in CPG can be defined as
Obviously, in the generated CPG, the sum of the weights of the outgoing edges from a node may not be equal to 1.0. To facilitate the subsequent operations, we first regularize the CPG according to the following definition.
(regularized CPG,
Based on the above definition, the final regularized and comprehensive preference graph (i.e.,
The regularized CPG and the corresponding matrix for the example network.
The regularized CPG (
The matrix of regularized CPG
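The merging and regularization steps described above can be sketched on adjacency matrices. The sketch assumes that the CPG weight of an edge is the (optionally weighted) sum of the corresponding PPG weights and that regularization divides each row by its sum so that outgoing weights add up to 1.0; both are our reading of the text, not a confirmed reproduction of the paper's formulas. The function names, the treatment of all-zero rows, and the 3-node PPG matrices are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def combine_ppgs(ppgs: List[Matrix], weights: Optional[List[float]] = None) -> Matrix:
    """Entry-wise weighted sum of PPG matrices (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(ppgs)
    n = len(ppgs[0])
    return [
        [sum(w * p[i][j] for w, p in zip(weights, ppgs)) for j in range(n)]
        for i in range(n)
    ]


def regularize_rows(matrix: Matrix) -> Matrix:
    """Divide each row by its sum; rows without outgoing edges stay all-zero (assumption)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else [0.0] * len(row))
    return out


# Hypothetical PPG matrices for a 3-node network (illustrative values only)
ppg_a = [[0.0, 0.5, 1.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
ppg_b = [[0.0, 0.2, 0.8], [0.0, 0.0, 0.6], [0.0, 0.0, 0.0]]
cpg = combine_ppgs([ppg_a, ppg_b])
cpg_reg = regularize_rows(cpg)
print(cpg_reg)
```

Passing unequal `weights` corresponds to the case in which some basic measures should count more than others.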
As mentioned earlier, the random walk model can be applied to a directed graph to make scientific decisions about ranking. In this paper, we apply random walk to the comprehensive preference graph to rank the node influences in a social network. In this application, each node is attached with an importance factor, and the preference between two nodes is treated as the transition probability of node importance. Our goal is to obtain a relatively stable probability distribution over nodes through the iterations of the random walk, where the probability is interpreted as the importance or influence of each node. Since the probability of a node reflects its importance or influence, the final stable probability distribution can be used to rank the nodes.
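The iteration toward a stable distribution can be illustrated with plain power iteration on a row-stochastic matrix: starting from a uniform distribution, the probability vector is repeatedly multiplied by the transition matrix. This is a generic sketch, assuming the update p(t+1) = p(t) P; the exact update rule of PARWRank is not legible in this text, and the 3-node matrix and function names below are hypothetical.

```python
from typing import List, Optional


def random_walk(transition: List[List[float]], steps: int,
                start: Optional[List[float]] = None) -> List[float]:
    """Propagate a probability vector for a fixed number of steps: p <- p * P."""
    n = len(transition)
    p = list(start) if start is not None else [1.0 / n] * n
    for _ in range(steps):
        p = [sum(p[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return p


def rank_nodes(p: List[float]) -> List[int]:
    """Return node indices sorted by decreasing probability."""
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)


# Hypothetical row-stochastic matrix (each row sums to 1.0)
P = [
    [0.0, 0.3, 0.7],
    [0.2, 0.0, 0.8],
    [0.5, 0.5, 0.0],
]
p = random_walk(P, steps=50)
ranking = rank_nodes(p)
print([round(x, 4) for x in p], ranking)
```

Here the step count plays the role of the iteration budget; in practice one could also stop once successive vectors differ by less than a tolerance.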
For the example social network, the rank of seven nodes can be generated through applying random walk to its regularized
When step number
Based on the above technical framework and example illustration, here we further address the rank algorithm for node influences based on preference analysis and random
(2) the betweenness measures (
(3) the closeness measures (
(4) the step number (
1.
2. apply the
3. analyze the preference relation of each node pair;
4. build a partial preference graph (
5. represent current
6.
7. combine three basic PPGs together to form a
8. apply the regularization on each row in
9. set
10.
11. apply rule
12.
13. generate the ranking (
14.
The algorithm takes three basic measures and step number of random walk iteration as the input data and outputs the sorted sequence of nodes about their influences. In lines 1–6, the partial preference graph is generated according to the preference relation in each basic measure. For three basic measures
In order to validate the effectiveness of our proposed algorithm for evaluating node influences, six public social networks are adopted in the experimental analysis. The basic features of these networks are shown in Table
The basic statistical features of six real-life networks.
Network  Nodes  Edges  Average degree  Maximum degree
ARPA  21  26  2.48  4
ChenNet  23  40  3.48  8
Karate  34  78  4.47  15
PolBooks  105  441  8.4  25
Airlines  235  1297  11.04  130
Email  1133  5451  9.62  71
The ARPA (Advanced Research Projects Agency) network [
In order to perform comparative analysis, our algorithm and the three other algorithms based on basic measures were all implemented in the Java programming language on the Eclipse platform with JDK 1.7. The experiments were conducted on a machine with an Intel Core i5 3.2 GHz CPU and 4 GB RAM running Windows 7.
To verify the correctness and rationality of our algorithm, a reference ranking of nodes by influence is needed for the evaluation. Here, we also use the results of the Susceptible-Infected-Recovered (SIR) epidemic model [
At each step, for each infected node, one of its susceptible neighbors will be randomly infected with probability
For each time of simulation, the total number of infected and recovered nodes of a given initially infected node can be counted. After
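The simulation procedure above can be sketched as a small Monte Carlo routine. In each step, every infected node picks one susceptible neighbor at random and infects it with probability `beta`; the spreading influence of a source is the average number of infected plus recovered nodes over repeated runs. The recovery rule (each infected node recovers with probability `gamma` per step), the parameter names, the number of runs, and the path graph are assumptions for illustration, since those details are not legible in this text.

```python
import random
from typing import Dict, List


def sir_spread(adj: Dict[int, List[int]], source: int, beta: float,
               gamma: float, rng: random.Random, max_steps: int = 1000) -> int:
    """Run one SIR simulation and return the number of infected plus recovered nodes."""
    infected = {source}
    recovered = set()
    for _ in range(max_steps):
        if not infected:
            break
        newly = set()
        for u in sorted(infected):
            candidates = [v for v in adj[u]
                          if v not in infected and v not in recovered and v not in newly]
            # one randomly chosen susceptible neighbor is infected with probability beta
            if candidates and rng.random() < beta:
                newly.add(rng.choice(candidates))
        # assumed recovery rule: each infected node recovers with probability gamma
        just_recovered = {u for u in sorted(infected) if rng.random() < gamma}
        recovered |= just_recovered
        infected = (infected - just_recovered) | newly
    return len(infected) + len(recovered)


def sir_influence(adj: Dict[int, List[int]], source: int, beta: float,
                  gamma: float = 1.0, runs: int = 100, seed: int = 0) -> float:
    """Average spread size of a single source over independent simulations."""
    rng = random.Random(seed)
    return sum(sir_spread(adj, source, beta, gamma, rng) for _ in range(runs)) / runs


# Hypothetical path graph 1 - 2 - 3
path = {1: [2], 2: [1, 3], 3: [2]}
print(sir_influence(path, source=2, beta=0.5))
```

Ranking all nodes by `sir_influence` then yields a reference ordering against which a centrality-based ranking can be compared.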
When evaluating node influences, both our proposed algorithm and the three basic methods produce a ranked list of nodes. Hence, the precision of influence evaluation should be measured by analyzing the similarity between the generated ranking and the SIR model-based simulation result. Here, we borrow two metrics from the field of information retrieval to quantify the precision of each evaluation method.
Suppose
Further, the
Besides the measure of AP, we also adopt the
In our experiments, the ranking from the SIR model-based simulation was used as the expected result, and the above two metrics between the ranked list of each method and the expected ranking were calculated, respectively.
The results of two relatively simple networks are first addressed in detail, and then the ranking results of the other four networks are discussed. Finally, the effectiveness of our proposed evaluation algorithm for node influences is summarized.
For the ARPA network shown in Figure
Topological structure of the ARPA network.
The ranking results and precisions of four algorithms and the SIR model for the ARPA network.
Algorithm  Ranking result
SIR model  3, 14, 2, 15, 17, 19, 12, 16, 18, 13, 1, 4, 6, 20, 5, 11, 21, 7, 10, 8, 9  -  -
Degree  2, 3, 14, 6, 12, 15, 19, 1, 4, 5, 7, 8, 9, 10, 11, 13, 16, 17, 18, 20, 21  0.71  0.36
Betweenness  3, 12, 19, 6, 4, 14, 13, 5, 11, 2, 18, 10, 7, 20, 9, 21, 8, 17, 15, 16, 1  0.64  0.24
Closeness  3, 19, 12, 18, 4, 13, 14, 17, 2, 20, 5, 6, 11, 15, 16, 21, 1, 7, 10, 9, 8
PARWRank  3, 12, 19, 14, 2, 6, 4, 13, 18, 5, 20, 11, 21, 10, 7, 17, 15, 16, 9, 1, 8  0.45
Based on the results shown in Table
Briefly speaking, the result of the PARWRank algorithm is as good as or worse than that of the
The network in Figure
The network referred from literature [
The ranking results and precisions of four algorithms and SIR model for the ChenNet network.
Algorithm  Ranking result
SIR model  23, 11, 22, 18, 16, 20, 17, 14, 15, 13, 12, 21, 10, 19, 1, 6, 8, 3, 4, 7, 2, 9, 5  -  -
Degree  1, 23, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 3, 8, 10, 19, 2, 4, 6, 7, 9, 5  0.73  0.54
Betweenness  1, 10, 6, 23, 11, 22, 21, 20, 12, 14, 16, 18, 17, 15, 13, 19, 3, 8, 2, 4, 5, 7, 9  0.64  0.49
Closeness  10, 23, 11, 6, 22, 1, 20, 12, 14, 21, 16, 15, 17, 18, 13, 19, 3, 8, 2, 4, 7, 9, 5  0.73  0.55
PARWRank  23, 1, 11, 22, 20, 21, 12, 14, 10, 16, 18, 17, 15, 13, 19, 6, 3, 8, 2, 4, 7, 9, 5
For the metric of
While considering the other metric, i.e.,
In summary, for the network (denoted as ChenNet) in reference [
Similarly, the comparison is also performed on the remaining four social networks and the corresponding results are listed in Table
The ranking results and precisions for other four networks.
Network  Algorithm  Ranking result (Top 10 nodes)
Karate  Degree  1, 34, 33, 3, 2, 4, 31, 9, 14, 23, …  0.70  0.43
Karate  Betweenness  1, 34, 33, 3, 31, 9, 2, 14, 19, 6, …  0.67  0.40
Karate  Closeness  1, 3, 31, 9, 34, 14, 33, 19, 4, 30, …  0.42
Karate  PARWRank  1, 3, 34, 33, 31, 9, 14, 2, 4, 32, …
PolBooks  Degree  8, 12, 3, 87, 68, 70, 77, 29, 11, 38, …  0.54  0.08
PolBooks  Betweenness  29, 52, 9, 12, 68, 80, 3, 59, 8, 7, …  0.54  0.06
PolBooks  Closeness  29, 59, 7, 52, 9, 80, 14, 68, 4, 32, …  0.53  0.06
PolBooks  PARWRank  29, 68, 9, 12, 8, 3, 70, 59, 80, 87, …
Airlines  Degree  137, 51, 81, 131, 71, 42, 155, 85, 201, 193, …  0.60
Airlines  Betweenness  137, 51, 81, 201, 131, 71, 19, 174, 42, 119, …  0.60  0.30
Airlines  Closeness  137, 51, 81, 131, 71, 42, 155, 201, 85, 193, …  0.59  0.33
Airlines  PARWRank  137, 51, 81, 131, 71, 201, 42, 119, 193, 155, …
Email  Degree  105, 333, 16, 23, 42, 41, 196, 233, 21, 76, …
Email  Betweenness  333, 105, 23, 578, 76, 233, 135, 41, 355, 42, …  0.55  0.20
Email  Closeness  333, 23, 105, 42, 41, 76, 233, 52, 135, 378, …  0.21
Email  PARWRank  333, 105, 23, 42, 41, 233, 76, 135, 134, 52, 378, …
For network Karate, the
For network PolBooks, the best method for evaluating node influences is our PARWRank algorithm. The metrics
For the third network, i.e., Airlines, the best one is still the PARWRank algorithm. Two evaluation metrics (i.e.,
For the last network Email, the
Based on the above results, we can summarize the dominance relation of the four methods and demonstrate the results in Table
The summary of the effects of four methods for ranking node influences.
Network  Dominance relation of four methods  

ARPA  PARWRank 

ChenNet  PARWRank 
PARWRank 
Karate  PARWRank 
PARWRank 
PolBooks  PARWRank 
PARWRank 
Airlines  PARWRank 
PARWRank 
PARWRank 
PARWRank 
Specifically, for the metric of
For the other metric issue (
Based on the experimental analysis of the six real-life networks, we can conclude that our comprehensive ranking algorithm (PARWRank) is more effective than the three basic methods for evaluating node influences in a social network.
Threats to construct validity regard the relation between theory and observation. In this study, we focus on the design of a new algorithm for comprehensively evaluating the influences of nodes in a social network. The SIR epidemic model [
Threats to external validity regard the generalization of our results in other situations. As mentioned in Section
Threats to internal validity regard factors that could influence our experimental results. We have carefully inspected the implementation code of our algorithm to ensure the reliability of experimental results. In this study, we treat the three basic measures equally in the algorithm. In fact, each of these basic measures may play a different role in identifying the influential nodes. Assigning different weights to them may produce different results.
As pointed out in the above subsection, our algorithm faces a potential threat in scalability. The threat comes mainly from two aspects: one is the problem of computation overhead, and the other is the robustness of the computation result.
In our algorithm, both the PPG and the CPG are represented by matrices. For a large-sized social network, the corresponding matrix of the PPG or CPG has a correspondingly large dimension. In general, a high-dimensional matrix leads to heavy computation overhead in matrix manipulation. Since the subsequent random walk is performed on the matrix of the CPG, the computation cost obviously increases as the size of the social network grows. To keep the computation in our algorithm lightweight, it is necessary to build reduced versions of the PPG and CPG for large-sized social networks.
As shown in (
Here, we provide a preliminary solution for large-scale networks as follows. Suppose
Nowadays, the Internet has been applied to all aspects of our lives. Accordingly, the interactions between individuals on the Internet are becoming more and more frequent and plentiful, that is, the so-called
At the earlier stage, the research concerns mainly focused on the static features and structures of social network [
Besides the above static features, the dynamic issues, such as network evolution, information diffusion, and cascading failure, can help researchers to better explore the rules behind social networks. In recent years, the problem of identifying influential nodes has attracted wide attention [
Since the nodes with high betweenness often play the role of gateway in a social network, betweenness is viewed as an important indicator of the information-spreading ability of a node. Thus, the
As a classical algorithm for ranking Web pages, PageRank [
In recent years, some comprehensive ranking methods for evaluating the influences of node have been presented. Wei et al. [
Influence maximization is closely related to influential node identification. It aims at finding a subset of key users that maximizes the influence spread over a social network [
With the rapid development of Internet technology and Web media, social network, as a new communication platform, has penetrated into our lives and played an important role. It has affected all aspects of our lives, especially in the aspects of information diffusion, public sentiment analysis, and so on. Accordingly, it is very necessary to investigate the social network from both aspects of static structure and dynamic behavior. While considering the dynamic behaviors of a network, information diffusion between nodes is an important exemplification [
In order to identify the influential nodes in a social network with high precision, a comprehensive evaluation model is proposed in this paper. In our model, three basic and representative centralities are taken into consideration. For each basic centrality measure, a partial preference graph (PPG) is built according to the preference relations of node pairs. Then, the comprehensive preference graph (CPG) is generated by merging the three PPGs together. Thus, the linkage between two nodes in the CPG reflects the overall preference information of the three representative centralities. Subsequently, the random walk technique is performed on the CPG to rank the nodes in the network according to their influences. Besides the running example, six public social networks, such as ARPA, Karate, and PolBooks, are taken as benchmarks to validate the effectiveness of our proposed evaluation algorithm. The experimental results confirm that our comprehensive algorithm based on preference relation and random walk has obvious advantages over the three basic ranking methods.
Although our PARWRank algorithm has exhibited its good performance and robustness for identifying influential spreaders in a social network, there are still some valuable and interesting problems that deserve further exploration. For example, we will adapt our algorithm to rank the spreaders in the weighted social network. In addition, how to analyze the influences of nodes in a dynamic (or mobile) social network is also an attractive research topic.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that there are no conflicts of interest regarding the publication of this article.
This work was supported in part by National Natural Science Foundation of China (Grant Nos. 61462030 and 61762040), Jiangxi Social Science Research Project (Grant No. TQ2015202), Natural Science Foundation of Jiangxi Province (Grant Nos. 20162BCB23036 and 20171ACB21031), Science Foundation of Jiangxi Educational Committee (Grant No. GJJ150465), and the Education Science Project of Jiangxi Province (Grant No. YB2015026).