Information Entropy Based on Propagation Feature of Node for Identifying the Influential Nodes

For understanding and controlling spreading in complex networks, identifying the most influential nodes is of great importance, with applications in disease control, viral marketing, air traffic control, and many other fields. By taking into account the effect of the spreading rate on information entropy, we propose an improved information entropy (IIE) method. Compared with the benchmark methods on six empirical networks, the IIE method performs better in terms of Kendall's Tau and the imprecision function under the Susceptible-Infected-Recovered (SIR) model. In the Facebook network in particular, the improvement in Kendall's Tau over the original IE method reaches 120%. The IIE method performs equally well in the comparative analysis of the imprecision function: its values are smaller than those of the benchmark methods in all six networks.


Introduction
The phenomena of spreading can be seen everywhere in nature [1][2][3][4][5][6][7], and many activities can be described as spreading processes [8][9][10][11][12][13]. In recent years, many studies have focused on the spreading process because of its theoretical meaning and practical value [14,15], including rumor control [16][17][18], information diffusion [19][20][21][22], air traffic control [23][24][25], and viral marketing [26][27][28]. Among these topics, the identification of influential nodes in complex networks is a hotspot. Understanding the influence of nodes has provided new insight into applications such as mining key nodes [29][30][31][32][33][34] and designing effective strategies to prevent epidemics from spreading or to accelerate information diffusion. The identification of influential nodes is of great significance in epidemic and rumor control, targeted advertising, and air traffic planning [35,36]. Recently, researchers have put forward a variety of centrality methods to identify these nodes more efficiently. Degree centrality is a typical method that relies on local information [37,38]. Building on this idea, Chen et al. proposed the Local Rank method, which considers the 4th-order neighbors of a node [39]. Taking the location of nodes in the network into account through K-shell decomposition, Kitsak et al. [40] found that the most influential nodes are located in the core of the network. Since then, many improved methods based on K-shell decomposition [41][42][43] have been proposed to identify influential nodes. Closeness centrality [44] and betweenness centrality [45] are two path-based methods. Considering the neighbors' influence, Ren et al. [46] proposed the IRA method. Based on the IRA method, Zhong et al. [47] proposed the IIRA method by taking the propagation feature into account. Information entropy has also been used as an important centrality measure to evaluate the influence of nodes [48,49].
Most previous methods assume that a node's influence depends on its own importance. However, another key factor cannot be neglected, namely, the importance of its neighbors. Based on this idea, Guo et al. [50] proposed an information entropy (IE) method that considers the information quantity of the neighbors.
Nevertheless, the performance of the IE method is also affected by the propagation feature. In the example network presented in Figure 1, the IE method cannot accurately identify the influence of nodes 1 and 6: SIR simulations give node 1 the larger spreading influence (2.405 versus 2.101), whereas the IE method ranks node 6 higher (2.023 versus 1.875). We therefore expect that the number of neighbors and the spreading rate have a positive effect on the target node's influence. Based on this idea, we propose an improved information entropy (IIE) method in which the target node's information entropy is adjusted by the propagation feature. Compared with the benchmark methods on six real networks, the IIE method performs better in terms of Kendall's Tau and the imprecision function under the Susceptible-Infected-Recovered (SIR) model [51,52].

IIE Method
The original IE method assumes that the influence of a node can be obtained from the information entropy of its neighbors. In the IIE method, we argue that the spreading rate and the number of neighbors can adjust this initial information entropy. The resulting final information entropy, that is, the IIE value, is then used to identify the influential nodes. The details of the IIE method are given below.
In general, an undirected network $G = (N, E)$ can be described by an adjacency matrix $A = (a_{ij}) \in \mathbb{R}^{N \times N}$, where N represents the number of nodes and E represents the number of edges. If node i is connected to node j, $a_{ij} = 1$; otherwise, $a_{ij} = 0$. We argue that the spreading rate and the number of neighbors adjust the target node's information entropy. Thus, the IIE value of node j combines the information quantities $H_{ij}$ provided from each neighbor i to node j with a term $\psi_j$ that represents the influence of the propagation feature and depends on the spreading rate β and on the number of neighbors of node j,

$$k_j = \sum_{i=1}^{N} a_{ij}.$$

The information quantity provided from node i to node j is

$$H_{ij} = -p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{k_i}{\sum_{m \in \Gamma_j^{L}} k_m},$$

where $\Gamma_j^{L}$ denotes node j's Lth-order neighbors. If L = 1, $\Gamma_j^{L}$ contains node j's direct neighbors.
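The quantities above can be sketched in Python. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions: the graph is an adjacency dictionary `adj` mapping each node to the set of its neighbors, $\Gamma_j^L$ is read as all nodes within L hops of j, $H_{ij} = -p_{ij}\ln p_{ij}$ with $p_{ij} = k_i / \sum_{m \in \Gamma_j^L} k_m$, and, because the exact way ψ_j enters is not reproduced here, the hypothetical helper `iie_scores` simply multiplies the neighbor entropy by a caller-supplied `psi(beta, k_j)`. With the default `psi`, which returns 1, and L = 1, the score reduces to the neighbor-entropy sum of the original IE method.

```python
import math
from collections import deque


def neighbors_within(adj, j, L):
    """Return the nodes within L hops of j (assumed reading of Gamma_j^L)."""
    dist = {j: 0}
    queue = deque([j])
    while queue:
        u = queue.popleft()
        if dist[u] == L:  # do not expand beyond distance L
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [v for v in dist if v != j]


def neighbor_entropy(adj, j, L=1):
    """Sum of H_ij = -p_ij * ln(p_ij) over i in Gamma_j^L,
    with p_ij = k_i / sum_{m in Gamma_j^L} k_m and k_i the degree of i."""
    gamma = neighbors_within(adj, j, L)
    total = sum(len(adj[m]) for m in gamma)
    if total == 0:  # isolated node: no information from neighbors
        return 0.0
    entropy = 0.0
    for i in gamma:
        p = len(adj[i]) / total  # p > 0 because every reachable node has degree >= 1
        entropy -= p * math.log(p)
    return entropy


def iie_scores(adj, beta, L=1, psi=None):
    """Hypothetical IIE score: psi(beta, k_j) * neighbor_entropy(j).
    The multiplicative form and the default psi are assumptions, not the paper's formula."""
    if psi is None:
        psi = lambda beta, k: 1.0  # placeholder that reproduces the IE-style entropy
    return {j: psi(beta, len(adj[j])) * neighbor_entropy(adj, j, L) for j in adj}


# Hypothetical toy graph: node 0 has direct neighbors of degree 1, 2, and 3.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0, 4}, 3: {0, 5, 6}, 4: {2}, 5: {3}, 6: {3}}
scores = iie_scores(adj, beta=0.2, L=1)
```

For this toy graph, `scores[0]` equals −(1/6)ln(1/6) − (1/3)ln(1/3) − (1/2)ln(1/2) ≈ 1.0114. Passing a different `psi` lets one experiment with how a propagation term might rescale the entropy.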
To describe the IIE method in more detail, we set L = 1 and β = 0.2 and consider the example network in Figure 2. For the black node (node 1), the IIE value is then obtained by evaluating the expressions above with these parameter values.
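Because the numerical calculation for Figure 2 is not reproduced here, the following is a hypothetical illustration of the entropy part only. It assumes a node j with three direct neighbors (L = 1) of degrees 1, 2, and 3, uses $p_{ij} = k_i / \sum_{m \in \Gamma_j^{1}} k_m$ and $H_{ij} = -p_{ij}\ln p_{ij}$, and leaves out the propagation term ψ_j.

$$
\begin{aligned}
\sum_{m \in \Gamma_j^{1}} k_m &= 1 + 2 + 3 = 6,\\
p_{ij} &\in \left\{\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6}\right\},\\
\sum_{i \in \Gamma_j^{1}} H_{ij} &= -\tfrac{1}{6}\ln\tfrac{1}{6} - \tfrac{1}{3}\ln\tfrac{1}{3} - \tfrac{1}{2}\ln\tfrac{1}{2} \approx 0.2986 + 0.3662 + 0.3466 \approx 1.0114.
\end{aligned}
$$

Since the $p_{ij}$ sum to 1, this sum is a proper Shannon entropy; it is largest, ln 3 ≈ 1.0986, when the three neighbors have equal degrees.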

Data Description.
Six empirical networks are used to evaluate the performance of the IIE method. The US air network [53] is part of the US air traffic network. The Polblogs network is a network of political blogs in the United States linked by political relationships. The datasets are available on the web. The e-mail network [54] is the electronic mail network of a university in Spain.
The Soc-hamsterster network is a social network in which edges indicate friendship or family ties. The Facebook network was derived from the Facebook online social platform, and its edges indicate interpersonal relationships. The LastFM network [55] was derived from an FM broadcast platform for Asian users, and its edges represent friendships between users. The statistical attributes of the six networks are listed in Table 1.

Figure 1: An example network with 25 edges. The SIR model can be used to simulate the nodes' real spreading influence, that is, the number of infected nodes is taken as the spreading influence. With β = 0.2, μ = 1, and 10^3 simulations, the spreading influence of nodes 1 and 6 is 2.405 and 2.101, respectively. With the IE method, the influence of nodes 1 and 6 is 1.875 and 2.023, respectively.

Measurement.
In this paper, the spreading influence of each node is simulated with the SIR model [52]. The model has three compartments: susceptible (S), infected (I), and recovered (R) individuals. At each time step, every susceptible neighbor of an infected node becomes infected with probability β. Meanwhile, each infected node recovers with probability μ and can no longer be infected. The spreading influence of a node is the range of infected nodes X, that is, the number of nodes in the whole network that are eventually infected when this node is the initial infected node. X is averaged over 10^3 independent runs.
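The SIR procedure described above can be sketched as a discrete-time simulation. This is a minimal sketch under stated assumptions, not the authors' code: the graph is an adjacency dictionary, infections and recoveries within a step are applied synchronously, newly infected nodes start spreading in the next step, and X is counted as the number of nodes that end up recovered, including the seed. The function names `sir_spread` and `spreading_influence` are hypothetical.

```python
import random


def sir_spread(adj, seed, beta, mu=1.0, rng=None):
    """Run one discrete-time SIR simulation from a single seed node.

    Returns the number of nodes that were ever infected (all end up recovered).
    """
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu must be in (0, 1] so the epidemic terminates")
    if rng is None:
        rng = random.Random()
    infected = {seed}
    recovered = set()
    while infected:
        # Infection: each infected node tries each susceptible neighbor with prob. beta.
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered and v not in newly_infected
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        # Recovery: each node infected at the start of the step recovers with prob. mu.
        still_infected = set()
        for u in infected:
            if rng.random() < mu:
                recovered.add(u)
            else:
                still_infected.add(u)
        infected = still_infected | newly_infected
    return len(recovered)


def spreading_influence(adj, seed, beta, mu=1.0, runs=1000, rng_seed=None):
    """Average outbreak size X over independent runs (the text uses 10^3 runs)."""
    rng = random.Random(rng_seed)
    total = sum(sir_spread(adj, seed, beta, mu, rng) for _ in range(runs))
    return total / runs
```

With μ = 1, as in the experiments, every infected node gets exactly one step in which to infect its neighbors, so X depends only on β and the network structure around the seed.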
Kendall's Tau [56] and the imprecision function are used to evaluate the performance of the IIE method. Kendall's Tau takes values in [−1, 1] and measures the correlation between two ranking lists; the higher its value, the stronger the correlation. Kendall's Tau τ is defined as

$$\tau = \frac{2}{N(N-1)} \sum_{i<j} \operatorname{sgn}\left[(x_i - x_j)(y_i - y_j)\right],$$

where sgn(x) is the sign function: sgn(x) = 1 if x > 0, sgn(x) = −1 if x < 0, and sgn(x) = 0 if x = 0. N is the number of nodes in the lists, that is, in the network. $x_i$ and $x_j$ are the values of nodes i and j in the ranking list produced by the centrality method, and $y_i$ and $y_j$ are their values in the ranking list produced by the real spreading influence. If $(x_i - x_j)(y_i - y_j) > 0$, the pair (i, j) is ordered consistently by the two lists, and each such concordant pair increases τ. The imprecision function $\varepsilon_\vartheta(p)$ evaluates a centrality method ϑ by comparing the average spreading ability of the top nodes in its ranking list with that of the truly most influential nodes:

$$\varepsilon_\vartheta(p) = 1 - \frac{M_\vartheta(p)}{M_{\mathrm{eff}}(p)},$$

where p is the proportion of nodes selected, $M_\vartheta(p)$ is the average spreading influence of the pN nodes ranked highest by method ϑ, and $M_{\mathrm{eff}}(p)$ is the average spreading influence of the pN nodes with the largest real spreading influence. A smaller $\varepsilon_\vartheta(p)$ means that the selected nodes are closer to the truly most influential ones, which also indicates that the accuracy of the centrality method is higher.
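Both evaluation measures can be sketched directly from their definitions. This is an illustrative sketch with hypothetical function names: in `kendall_tau`, `x` and `y` are assumed to be equal-length lists of scores for the same nodes in the same order, and the O(N^2) double loop follows the formula literally with no tie correction; in `imprecision`, the two dictionaries map each node to its centrality value and to its simulated spreading influence, and pN is rounded to the nearest integer (at least 1).

```python
def kendall_tau(x, y):
    """tau = 2 / (N (N - 1)) * sum_{i<j} sgn[(x_i - x_j)(y_i - y_j)]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # sign function: +1, -1, or 0
    return 2.0 * s / (n * (n - 1))


def imprecision(centrality, influence, p):
    """epsilon(p) = 1 - M(p) / M_eff(p) for one centrality method."""
    top = max(1, int(round(p * len(influence))))  # number of selected nodes, about pN
    by_centrality = sorted(influence, key=lambda v: centrality[v], reverse=True)[:top]
    by_influence = sorted(influence, key=lambda v: influence[v], reverse=True)[:top]
    m = sum(influence[v] for v in by_centrality) / top
    m_eff = sum(influence[v] for v in by_influence) / top
    if m_eff == 0:
        raise ValueError("the most influential nodes have zero influence")
    return 1.0 - m / m_eff
```

If a centrality method orders nodes exactly as the simulated spreading influence does, `kendall_tau` returns 1.0 and `imprecision` returns 0.0.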

Simulation Results.
In this paper, we selected six real networks to test the IIE method. Depending on the network, we set β ∈ [0.1, 0.2] and μ = 1 in the SIR model.
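Table 1 reports an epidemic threshold $\beta_{thd}$ for each network, against which the chosen spreading rates can be compared. The text does not state the formula; a commonly used heterogeneous mean-field estimate for the SIR model, assumed here, is $\beta_{thd} \approx \langle k \rangle / (\langle k^2 \rangle - \langle k \rangle)$. A minimal sketch with a hypothetical function name:

```python
def epidemic_threshold(adj):
    """Mean-field SIR threshold estimate <k> / (<k^2> - <k>) from the degree sequence."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(d * d for d in degrees) / n
    if mean_k2 <= mean_k:
        raise ValueError("estimate undefined when <k^2> <= <k>")
    return mean_k / (mean_k2 - mean_k)


# Hypothetical example: the complete graph K5, where every node has degree 4.
k5 = {i: {j for j in range(5) if j != i} for i in range(5)}
threshold = epidemic_threshold(k5)  # 4 / (16 - 4)
```

For a regular graph of degree k the estimate simplifies to 1/(k − 1); heterogeneous degree distributions raise $\langle k^2 \rangle$ and therefore lower the threshold.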
First, we test how different values of L affect the performance of the IIE method. L is the distance between the target node and the neighbors that provide information to it. If L = 1, the information quantity of the direct neighbors is provided to the target node; if L = 2, the target node's information quantity is provided by its 2nd-order neighbors. The influence of parameter L ∈ [1, 4] on the IIE method in the six networks is shown in Figure 3.
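The text defines L as a distance but does not say whether $\Gamma_j^L$ contains the nodes at exactly L hops or all nodes within L hops of j. The sketch below, with hypothetical helper names, computes both readings with one breadth-first search. It also hints at why larger L is costlier: in many real networks the neighbor set, and hence the sum over it, grows quickly with L.

```python
from collections import deque


def hop_distances(adj, j, max_L):
    """Breadth-first search from j, stopping at distance max_L."""
    dist = {j: 0}
    queue = deque([j])
    while queue:
        u = queue.popleft()
        if dist[u] >= max_L:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def neighborhood(adj, j, L, exact=False):
    """Gamma_j^L under either reading: exactly L hops, or 1..L hops."""
    dist = hop_distances(adj, j, L)
    if exact:
        return {v for v, d in dist.items() if d == L}
    return {v for v, d in dist.items() if 1 <= d <= L}


# Hypothetical path graph 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
sizes = [len(neighborhood(path, 0, L)) for L in range(1, 5)]
```

On this path graph the neighborhood grows by one node per hop; on a network with branching factor b it grows roughly like b^L, which is consistent with the rising computation time noted for L = 3 and L = 4.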
Figure 3 shows the effect of L on Kendall's Tau τ of the IIE method in the different networks. When L = 2, Kendall's Tau τ reaches its maximum in the US air, Polblogs, e-mail, and LastFM networks, which indicates that in these four networks the IIE method is more accurate with L = 2 than with the other values of L. The Soc-hamsterster and Facebook networks behave differently: Kendall's Tau is largest when L = 3 or L = 4, but the computation time of the IIE method then increases dramatically. In addition, the TDI theory [57] indicates that individuals affect only a relatively small range of neighbors. Therefore, we set L = 2 in the subsequent experiments.
To check the efficiency of the IIE method, K-shell, degree centrality, closeness centrality, betweenness centrality, and the IE method are selected as benchmark methods and compared with the IIE method in the six networks. We set β ∈ [0.1, 0.2], μ = 1, and L = 2. As can be seen from Figure 4, in all six networks, Kendall's Tau τ obtained by the IIE method is much larger than that obtained by the benchmark methods, which indicates that the IIE method is superior to them. Figure 4 also shows that, in the US air and LastFM networks, τ of the IIE method gradually increases with the spreading rate β. In contrast, in the Soc-hamsterster and Facebook networks, τ of the IIE method decreases as β grows. The Polblogs and e-mail networks show yet another behavior: as β increases, τ of the IIE method first increases and then decreases.

Table 1: Six fundamental statistical attributes of the six networks: N, E, ⟨k⟩, $\beta_{thd}$, r, and C. N and E represent the number of nodes and edges, respectively; ⟨k⟩, $\beta_{thd}$, r, and C represent the average degree, epidemic threshold, degree assortativity, and clustering coefficient, respectively.

Complexity 3

Figure 5 illustrates the improvement ratio η of Kendall's Tau when the IIE method is compared with the benchmark methods. We define η as

$$\eta = \frac{\tau_{\mathrm{IIE}} - \tau_0}{\tau_0},$$

where $\tau_{\mathrm{IIE}}$ is Kendall's Tau obtained by the IIE method and $\tau_0$ is Kendall's Tau obtained by a benchmark method. If η > 0, the IIE method performs better than that benchmark. Figure 5 clearly shows that Kendall's Tau τ increases considerably when the IIE method is compared with the benchmark methods.
That is, in all six networks, the IIE method identifies the influential nodes more accurately than the benchmark methods. We can also find that, compared with the IE method, η reaches a maximum of 80%. Similarly, Kendall's Tau τ increases significantly when the IIE method is compared with the other benchmark methods in the US air network, which means that the IIE method is superior to them. The same phenomenon occurs in the other networks. In particular, in the Facebook network, η relative to the IE method reaches a maximum of 120% at β = 0.12.
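Assuming the improvement ratio is defined as η = (τ_IIE − τ_0)/τ_0, it can be computed for each benchmark as below. The function names are hypothetical, the τ values in the example are placeholders rather than results from the paper, and reading η > 0 as "IIE is better" assumes a positive benchmark τ_0.

```python
def improvement_ratio(tau_iie, tau_0):
    """eta = (tau_IIE - tau_0) / tau_0; eta > 0 means IIE outperforms the benchmark."""
    if tau_0 <= 0:
        raise ValueError("interpretation of eta assumes a positive benchmark tau_0")
    return (tau_iie - tau_0) / tau_0


def improvement_table(tau_iie, benchmark_taus):
    """Map each benchmark name to eta, given a dict name -> tau_0."""
    return {name: improvement_ratio(tau_iie, tau_0) for name, tau_0 in benchmark_taus.items()}


# Placeholder numbers for illustration only.
etas = improvement_table(0.6, {"IE": 0.5, "degree": 0.4})
```

An η of 1.2 corresponds to the "120%" phrasing used in the text, that is, τ_IIE = 2.2 τ_0.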
As shown in Figure 6, the IIE method achieves impressive results in terms of the imprecision function $\varepsilon_\vartheta(p)$ in the six networks. In small networks such as US air and e-mail, the IIE method is remarkably superior to the other benchmark methods: $\varepsilon_{\mathrm{IIE}}(p)$ is much lower than that of the benchmark methods, which means that the spreading outcome predicted by the IIE method is more reliable than that predicted by the benchmark methods. In the large LastFM network, $\varepsilon_{\mathrm{IIE}}(p)$ is much lower than $\varepsilon_{\mathrm{IE}}(p)$, which reveals that the IIE method identifies the most influential nodes more accurately than the original IE method. It is worth noting that when p is small, the IIE method performs much better than the other benchmark methods. These results support the rationale of the IIE method, which considers the propagation feature of the target node.

Conclusions
To control a spreading process, one of the basic tasks is to estimate the spreading influence of nodes and identify the influential ones. By considering the information entropy and the spreading rate of the target nodes, we proposed an improved information entropy (IIE) method. The IIE method takes the spreading rate and the number of the target node's neighbors into account, and these quantities determine the new information entropy. According to the simulation results, the IIE method performs better than the IE method, and the IIE method (O(N)) does not add any parameters or increase the computational complexity. In the six networks, the IIE method performs much better than the other benchmark methods, namely K-shell (O(N)), degree centrality (O(N)), closeness centrality (O(N^3)), betweenness centrality (O(N^3)), and the IE method. In particular, in the Facebook network, the maximum improvement ratio η relative to the IE method reaches 120%. The IIE method performs equally well in the comparative analysis of the imprecision function: in the six networks, $\varepsilon_{\mathrm{IIE}}(p)$ is much lower than that of the benchmark methods. These results demonstrate that the IIE method identifies the influential nodes more precisely than the benchmark methods. Moreover, the key component of the IIE method can be combined with other centralities; for example, the information entropy of the IIE method can also be computed from the neighbors' K-shell values.
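The last suggestion, computing the information entropy from the neighbors' K-shell values, can be sketched as follows. One reading of it, assumed here, replaces the degree $k_i$ in $p_{ij}$ with the K-shell index $ks_i$ of neighbor i, uses L = 1, and leaves out ψ_j. The K-shell indices come from the standard peeling procedure; the function names are hypothetical.

```python
import math


def k_shell(adj):
    """K-shell index of every node via iterative peeling (standard k-core decomposition)."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # no remaining node has residual degree <= k: move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:  # already removed earlier in this cascade
                continue
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


def kshell_entropy(adj, shell, j):
    """Entropy over j's direct neighbors with p_ij = ks_i / sum_m ks_m (degree replaced by K-shell)."""
    total = sum(shell[i] for i in adj[j])
    if total == 0:
        return 0.0
    entropy = 0.0
    for i in adj[j]:
        p = shell[i] / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
shell = k_shell(adj)  # triangle nodes in shell 2, pendant node in shell 1
```

For node 0 the neighbor weights are 2, 2, and 1, so p = 0.4, 0.4, 0.2 and the entropy is about 1.0549, slightly higher than a degree-based weighting of the same neighbors (3 is excluded here since node 0 is the target, giving degrees 2, 2, 1 as well in this case).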
Compared with the benchmark methods on the six networks, the IIE method identifies the influential nodes more accurately, but some challenges remain. First, the IIE method only takes into account the influence of the spreading rate on the target node and neglects the impact from the target node's neighbors. Second, the neighbor distance L deserves more attention, because its value affects the performance of the IIE method; the factors that determine a suitable value of L should be identified. Finally, temporal networks have attracted increasing attention, which requires designing an advanced information entropy method. This remains an interesting and open problem.
Data Availability
The datasets used in the present study are available from the first author upon reasonable request (googlezlf@163.com).

Conflicts of Interest
The authors declare that they have no conflicts of interest.