Ranking Spreaders in Complex Networks Based on the Most Influential Neighbors

1Department of Transportation Economics and Logistics Management, College of Economics, Shenzhen University, Nanhai Ave 3688, Shenzhen, Guangdong 518060, China 2Department of Risk Management and Insurance, College of Economics, Shenzhen University, Nanhai Ave 3688, Shenzhen, Guangdong 518060, China 3China Center for Special Economic Zone Research, Shenzhen University, Nanhai Ave 3688, Shenzhen, Guangdong 518060, China
Discrete Dynamics in Nature and Society

Kitsak et al. [1] argue that the most efficient spreaders are located at the core of a network. Using k-shell decomposition (KS) analysis [19][20][21], the location of each node is given by an integer index, its k-shell value, according to the successive layers of the network. A small k-shell value indicates the periphery of the network, whereas a large k-shell value indicates its core; however, many nodes with an identical k-shell value have different spreading influence. In other words, the k-shell method has relatively poor monotonicity. To address this issue, many methods have been proposed to improve on the k-shell method. For example, Zeng et al. designed a mixed degree decomposition method [22] that ranks spreaders according to their links to both the remaining nodes and the removed nodes. Bae et al. [23] defined a coreness centrality index by summing the k-shell values of all of a node's neighbors and found that it provides a more monotonic ranking than other ranking methods. Ma et al. [24] proposed a gravity model that treats the k-shell value of each node as its mass and the shortest-path distance between two nodes as their distance.
In this paper, we propose a novel influence measure, the most influential neighbors' k-shell (MINK) index, to quantify the spreading capability of a node. Here, we refer to the nodes with the largest k-shell value as the most influential neighbors. Inspired by the effect of leaders on social ties, we evaluate a spreader's influence by focusing on its interaction with the most influential neighbors. Because the most influential neighbors are defined through the k-shell method, our proposal also accounts for how a node is embedded in the network as a whole. It is worth noting that, unlike the gravity model, which considers a node's interactions with neighbors within a given distance r, MINK does not depend on any subjective parameter. According to structural holes theory [25] ("structural holes" are gaps between unconnected nodes that create opportunities for unique information access and control), structural equivalence arises among a node's neighbors in networks that lack structural holes; that is, the neighbors tend to have strong ties with each other and bring redundant information to the node. The MINK index builds on the idea that an influential neighbor generally holds more unique information and therefore exerts more influence on a node's spreading capacity than other neighbors. Because it excludes the redundant information carried by less influential neighbors, the MINK index is both more refined and computationally cheaper.
The rest of this paper is organized as follows. We briefly review previous studies and present our method in Section 2. In Section 3, we apply the Susceptible-Infected-Recovered (SIR) model to evaluate the performance of the proposed method on both real and synthetic networks. Conclusions are given in Section 4.

The MINK Index
Consider a network $G = (V, E)$ with $N = |V|$ nodes and $|E|$ edges, described by an adjacency matrix $A = (a_{ij})_{N \times N}$, where $a_{ij} = 1$ if node $i$ is connected to node $j$ and $a_{ij} = 0$ otherwise. For simplicity, $G$ is treated as an undirected, unweighted, simple network.
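As a minimal illustration of this representation, the Python sketch below builds $A$ from an edge list with nodes labelled $0, \dots, N-1$ (the labelling and the function name `adjacency_matrix` are our own, hypothetical choices). It rejects self-loops and duplicate edges, so the result is symmetric with a zero diagonal, as the simple-network assumption requires.

```python
def adjacency_matrix(n, edges):
    """Return the N x N adjacency matrix A = (a_ij) of a simple undirected graph.

    Nodes are labelled 0..n-1. Self-loops and duplicate edges are rejected, so
    the result is symmetric with a zero diagonal.
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError(f"self-loop at node {i}")
        if a[i][j]:
            raise ValueError(f"duplicate edge ({i}, {j})")
        a[i][j] = a[j][i] = 1  # undirected: a_ij = a_ji
    return a


# A 4-cycle 0-1-2-3-0; each row sum is the node's degree.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(A)  # [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
```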
The degree centrality, which uses only local information, defines the influence of a node as the number of its adjacent vertices. The degree of node $i$ is
$$DC(i) = \sum_{j=1}^{N} a_{ij}.$$
The betweenness centrality measures the fraction of shortest paths between node pairs that pass through the node under consideration. For node $i$ it is
$$BC(i) = \sum_{s \neq i \neq t} \frac{g_{st}(i)}{g_{st}},$$
where $g_{st}$ denotes the total number of shortest paths between vertices $s$ and $t$, and $g_{st}(i)$ is the number of those paths that pass through vertex $i$. The higher a node's betweenness, the more likely it is a hub, that is, an information transfer station in the network. The closeness centrality measures how quickly information held by a node can propagate through the network. The closeness centrality of node $i$ is the reciprocal of its average shortest-path distance to all other nodes:
$$CC(i) = \frac{N - 1}{\sum_{j \neq i} d_{ij}},$$
where $N$ is the number of nodes and $d_{ij}$ is the geodesic distance between vertices $i$ and $j$.
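The three classical centralities can be sketched in plain Python as below, with the graph stored as a dictionary mapping each node to the set of its neighbours (an assumed representation; the function names are hypothetical). Closeness assumes a connected graph, and betweenness uses Brandes' accumulation algorithm, summing over ordered pairs $(s, t)$ so that each unordered pair contributes twice; other conventions halve or normalize this value.

```python
from collections import deque


def degree_centrality(adj):
    """DC(i) = sum_j a_ij, i.e. the number of neighbours of i."""
    return {i: len(nbrs) for i, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Hop distances d_ij from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """CC(i) = (N - 1) / sum_{j != i} d_ij; assumes a connected graph."""
    n = len(adj)
    return {i: (n - 1) / sum(bfs_distances(adj, i).values()) for i in adj}


def betweenness_centrality(adj):
    """BC(i) = sum over ordered pairs s != i != t of g_st(i) / g_st (Brandes)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths, g_sv
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        for w in reversed(order):  # farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(betweenness_centrality(path))  # {0: 0.0, 1: 2.0, 2: 0.0}
```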
Bao et al. [18] put forward a semilocal centrality that simultaneously accounts for the shortest distance, the number of shortest paths, and the transmission rate. Its definition involves the number of shortest paths between node $i$ and each node $j$, the average degree $\langle k \rangle$ of the network, and the neighborhood set of nodes whose distance to node $i$ is less than or equal to a coverage radius; the coverage radius is set to 3 in [18].
The k-shell decomposition method [1] assigns every node a k-shell value by removing nodes iteratively as follows. First, remove all nodes with degree DC = 1, and keep removing nodes whose residual degree drops to 1 or less until no such node remains; all removed nodes are assigned the k-shell value KS = 1. Second, iteratively remove all nodes whose residual degree is at most 2 until none remains, and assign them KS = 2. Repeat this process with increasing thresholds until every node has been removed and assigned a k-shell value. In the end, each node's KS index reflects its relative topological location in the network.
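A minimal sketch of this pruning procedure, using a dictionary-of-sets graph representation (an assumption, as is the name `k_shell`). The inner loop repeats because removing one node can lower its neighbours' residual degrees to the current threshold.

```python
def k_shell(adj):
    """Return {node: KS} by iterative pruning with thresholds k = 1, 2, ...

    At threshold k, nodes whose residual degree is at most k are removed
    repeatedly and assigned KS = k (isolated nodes, if any, land in shell 1).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}  # residual degrees
    remaining = set(adj)
    ks = {}
    k = 1
    while remaining:
        while True:
            batch = [v for v in remaining if degree[v] <= k]
            if not batch:
                break  # nothing left at this threshold; move to k + 1
            for v in batch:
                remaining.discard(v)
                ks[v] = k
                for w in adj[v]:
                    if w in remaining:
                        degree[w] -= 1  # removing v lowers its neighbours' degree
        k += 1
    return ks


# A 4-clique (nodes 0-3) with a two-node tail 0-4-5 hanging off node 0.
graph = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0, 5}, 5: {4}}
print(k_shell(graph))  # nodes 4 and 5 get KS = 1; nodes 0-3 get KS = 3
```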
However, k-shell decomposition assigns the same KS index to too many nodes whose spreading influence differs. To improve the monotonicity of k-shell decomposition, we propose the most influential neighbors' k-shell (MINK) index, which is inspired by the effect of leaders on social ties. Specifically, leaders in social networks share information, provide advice, assign work, and collaborate with other members of the network, so the influence of other members is largely determined by their connection to the leaders. We therefore measure the spreading ability of a node based on its interaction with its most influential neighbors, characterized by the largest k-shell value. On the one hand, a node has greater influence if it or its most influential neighbors have higher KS values; on the other hand, the effect grows as the distance between them shrinks. Accordingly, the MINK index of node $i$ combines, over the nodes $j \in \Lambda$, a contribution proportional to KS(i) and KS(j) that decreases with $d_{ij}$, where $d_{ij}$ is the shortest-path distance between nodes $i$ and $j$ and $\Lambda$ is the set of nodes with the maximum k-shell value.
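Because only the qualitative behaviour of the MINK index is described above, the following sketch is hypothetical: it assumes a gravity-style term $KS(i)\,KS(j)/d_{ij}^{2}$, borrowing the inverse-square distance of the gravity model [24], summed over $j \in \Lambda$ with $j \neq i$. The function `mink_sketch` and its `exponent` parameter are our own illustrative choices, not the published formula. It runs one breadth-first search per member of $\Lambda$ rather than one per node, which reflects why restricting attention to the most influential neighbors keeps the computation cheap when $\Lambda$ is small.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances d_ij from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mink_sketch(adj, ks, exponent=2):
    """Hypothetical MINK-style score; the functional form is an assumption.

    Lambda = nodes with the maximum k-shell value. Each j in Lambda (j != i)
    adds ks[i] * ks[j] / d_ij ** exponent to node i's score, so the score grows
    with both k-shell values and shrinks with distance.
    """
    kmax = max(ks.values())
    lam = [j for j in adj if ks[j] == kmax]  # most influential neighbors
    scores = dict.fromkeys(adj, 0.0)
    for j in lam:  # one BFS per member of Lambda, not per node
        for i, d in bfs_distances(adj, j).items():
            if i != j:
                scores[i] += ks[i] * ks[j] / d ** exponent
    return scores


# A 4-clique with a tail 0-4-5, with KS values precomputed.
graph = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0, 5}, 5: {4}}
ks = {0: 3, 1: 3, 2: 3, 3: 3, 4: 1, 5: 1}
print(mink_sketch(graph, ks))
```

Nodes outside the component of every member of $\Lambda$ simply keep a score of 0 in this sketch, whereas a distance-based index is not defined for such nodes.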

Empirical Results
The Susceptible-Infected-Recovered (SIR) model [26] simulates epidemic spreading. It is widely used to assess the spreading capacity of nodes and has also been adopted in studies of vaccination strategies and infection control [27][28][29]. The SIR model serves as a benchmark for identifying influential vertices because key nodes play an indispensable role in information and viral transmission, so an effective ranking should agree with the actual spreading coverage. We therefore employ the standard SIR model to evaluate the performance of our proposed method. The process starts by setting one node as infected and all remaining nodes as susceptible. At each step, every infected node infects each of its susceptible neighbors with spreading rate $\beta$ and then recovers with probability $\mu$. The process continues until no infected nodes remain in the network. The spreading influence of a node is given by the number of nodes that have been infected, that is, the number of recovered nodes at the end of the process. In this paper, we set $\mu = 1$ and $\beta \in (0, 0.1]$. This relatively small infection probability prevents most nodes of a network from being infected easily, which would make the differences in the nodes' influence undetectable. To check the performance of our method, we use six real networks: Dolphins (friendship) [30], USAir97 (US air flights), C.elegans (neural) [31], Email (communication) [32], PGP (encrypted communication) [33], and Internet (router level). For simplicity, we treat all of them as simple, undirected, unweighted networks.
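The SIR benchmark described above can be sketched as a synchronous discrete-time simulation. The update order (all infection attempts before recoveries within a step), the function names, and the use of Python's `random.Random` are assumptions. Because the outcome is random, the sketch estimates a node's spreading influence by averaging the final size over repeated runs; the default number of runs is a hypothetical choice.

```python
import random


def sir_final_size(adj, seed_node, beta, mu=1.0, rng=None):
    """One discrete-time SIR run started from `seed_node`.

    Each step, every infected node infects each susceptible neighbour with
    probability beta; afterwards each infected node recovers with probability
    mu. Returns the number of recovered nodes once no infected nodes remain,
    i.e. how many nodes were ever infected.
    """
    rng = rng or random.Random()
    infected, recovered = {seed_node}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and v not in newly_infected and rng.random() < beta:
                    newly_infected.add(v)
        recovering = {u for u in infected if rng.random() < mu}
        recovered |= recovering
        infected = (infected - recovering) | newly_infected
    return len(recovered)


def spreading_influence(adj, node, beta, runs=100, seed=0):
    """Average final size over `runs` independent simulations."""
    rng = random.Random(seed)
    return sum(sir_final_size(adj, node, beta, rng=rng) for _ in range(runs)) / runs


# Path graph 0-1-2-3-4 with mu = 1, as in the text, and beta = 0.1.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 5} for i in range(5)}
print(spreading_influence(path, 2, beta=0.1))
```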
Table 1 lists the statistical properties of the six real networks: the numbers of nodes and edges, the degree heterogeneity, the degree assortativity, the average clustering coefficient, and the average shortest path length.
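The text does not state how these statistics are computed. Assuming the common definitions, degree heterogeneity $H = \langle k^2 \rangle / \langle k \rangle^2$ and the mean of the local clustering coefficients, two of them can be sketched as follows; the function names are hypothetical, and nodes with fewer than two neighbours are assumed to contribute 0 to the clustering average.

```python
from itertools import combinations


def degree_heterogeneity(adj):
    """H = <k^2> / <k>^2 (assumed definition; H = 1 for a regular graph)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k2 / mean_k ** 2


def average_clustering(adj):
    """<C>: mean local clustering; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among v's neighbours, each unordered pair once.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


# Triangle 0-1-2 plus a pendant node 3 attached to 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(degree_heterogeneity(g), average_clustering(g))  # H = 1.125, <C> = 7/12
```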
Next, using these real networks, we compare our proposed method with degree centrality, the k-shell method, betweenness centrality, and closeness centrality, studying both the resolution and the correctness of each ranking method.
First, following [24], we define the monotonicity index $M(R)$ to quantify the resolution of a ranking method:
$$M(R) = \left(1 - \frac{\sum_{r \in R} N_r (N_r - 1)}{N (N - 1)}\right)^2,$$
where $R$ is the ranking vector produced by the method, $N$ is the size of the network, and $N_r$ is the number of nodes sharing the same ranking value $r$. By definition, a ranking method whose monotonicity index is closer to 1 has a higher resolution for distinguishing nodes' influence. If $M(R) = 1$, the ranking method is perfectly monotonic: every node receives a different index value. Table 2 summarizes the monotonicity indices of the ranking methods. The results show that our method achieves higher resolution than degree centrality, the k-shell method, and betweenness centrality in all six real networks, and $M(R)$ is close to 1 in the C.elegans, Dolphins, and Internet networks. In addition, although the k-shell method may identify the most influential spreaders, its resolution is relatively low in all six networks, meaning that it fails to distinguish spreaders with different influence. This motivates alternative methods that overcome this weakness of the k-shell method.
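The monotonicity index, $M(R) = \left(1 - \sum_{r} N_r(N_r - 1) / (N(N-1))\right)^2$ with $N_r$ the number of nodes sharing ranking value $r$, can be transcribed directly. The sketch below takes a dictionary from node to ranking score (the representation and function name are assumptions). Ties are detected by exact equality, so floating-point scores may need rounding before they are passed in.

```python
from collections import Counter


def monotonicity(scores):
    """M(R) = (1 - sum_r N_r (N_r - 1) / (N (N - 1)))^2 for {node: score}.

    N_r counts nodes sharing ranking value r (exact equality). M = 1 when all
    values differ and M = 0 when every node is tied.
    """
    n = len(scores)
    ties = sum(c * (c - 1) for c in Counter(scores.values()).values())
    return (1 - ties / (n * (n - 1))) ** 2


print(monotonicity({"a": 3, "b": 1, "c": 2, "d": 4}))  # 1.0
print(monotonicity({"a": 1, "b": 1, "c": 2, "d": 3}))  # equals 25/36 up to rounding
```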
Secondly, Kendall's tau rank correlation coefficient $\tau$ [34] is used to quantify the correctness of the ranking methods. Let $(x_i, y_i)$ and $(x_j, y_j)$ be the scores that two ranking methods assign to nodes $i$ and $j$, with $i < j$. The pair is concordant if $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$; it is discordant if $x_i > x_j$ and $y_i < y_j$, or $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant. Kendall's tau is defined as
$$\tau = \frac{2 (n_1 - n_2)}{N (N - 1)},$$
where $n_1$ is the number of concordant pairs, $n_2$ is the number of discordant pairs, and $N$ is the size of the network. Kendall's tau lies in $[-1, 1]$, and larger values indicate a stronger correlation between the SIR model and the compared method. Because Kendall's tau depends on the infection rate, we vary $\beta \in (0, 0.1]$ and compute $\tau$ under each infection rate. The infection rate cannot be too large: with a large $\beta$, the whole network is easily infected and the influences of different nodes cannot be distinguished. Table 3 summarizes the average values of $\tau$ over $\beta \in (0, 0.1]$ for the different ranking methods. The results indicate that our proposed method generally outperforms existing methods and is especially effective in the USAir97 and C.elegans networks. Figure 1 shows how $\tau$ varies with the infection rate for the different methods; in most cases, our method performs better than the others. As the infection rate increases, $\tau$ for the k-shell method generally follows the same trend as $\tau$ for our method but remains lower, which implies that our method yields higher correctness than the k-shell method.
Besides real networks, we also check the effectiveness of our method on a typical synthetic network generated by the Barabási-Albert (BA) model [35]. The BA network starts with $m_0$ nodes. Then, at each step, a new node is added to the network and connected to $m$ ($m < m_0$) existing nodes according to the preferential attachment mechanism. In this paper, we set $m = 3$ and $m_0 = 1000$.
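Kendall's tau, $\tau = 2(n_1 - n_2)/(N(N-1))$, can be transcribed directly as an $O(N^2)$ loop over node pairs; the function name and the dictionary input format are assumptions. Tied pairs add to neither $n_1$ nor $n_2$ but remain in the denominator.

```python
from itertools import combinations


def kendall_tau(x, y):
    """tau = 2 (n1 - n2) / (N (N - 1)) for two {node: score} rankings.

    A pair is concordant when both rankings order it the same way, discordant
    when they disagree, and neither when either ranking ties it.
    """
    nodes = list(x)
    n1 = n2 = 0
    for i, j in combinations(nodes, 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            n1 += 1  # concordant
        elif s < 0:
            n2 += 1  # discordant
    n = len(nodes)
    return 2 * (n1 - n2) / (n * (n - 1))


sir = {0: 9.1, 1: 4.0, 2: 6.5}  # illustrative spreading influences
dc = {0: 3, 1: 1, 2: 2}
print(kendall_tau(dc, sir))  # 1.0
```

Library routines such as SciPy's `scipy.stats.kendalltau` compute the tie-adjusted tau-b by default, which can differ from this formula when rankings contain ties.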
For the BA network, we calculate Kendall's tau for DC, CC, BC, and our proposed method. Figure 2 shows that MINK performs better than CC and much better than DC and BC. Note that all nodes in the BA network have the same k-shell value, so the k-shell method is excluded from this comparison. Table 4 lists the average tau values of the different methods; the results indicate that our method outperforms the existing ones.
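The BA construction described earlier in this section can be sketched as follows. The seed graph is not specified in the text, so this sketch assumes a complete graph on $m_0$ nodes; the function name and `seed` parameter are hypothetical. Sampling uniformly from a list in which each node appears once per incident edge implements degree-proportional (preferential) attachment.

```python
import random


def barabasi_albert(n, m, m0, seed=0):
    """Grow a BA network on n nodes; returns {node: set(neighbours)}.

    Assumed seed: a complete graph on nodes 0..m0-1. Each later node links to
    m distinct existing nodes chosen with probability proportional to degree.
    """
    if not 0 < m < m0 <= n:
        raise ValueError("need 0 < m < m0 <= n")
    rng = random.Random(seed)
    adj = {v: set(range(m0)) - {v} for v in range(m0)}
    # Every node appears once per incident edge, so a uniform draw from this
    # list picks a node with probability proportional to its degree.
    targets = [v for v in range(m0) for _ in range(m0 - 1)]
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:  # redraw until m distinct targets are found
            chosen.add(rng.choice(targets))
        adj[new] = chosen
        for v in chosen:
            adj[v].add(new)
            targets.extend((new, v))
    return adj


g = barabasi_albert(n=200, m=3, m0=4)
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # 6 + 3 * 196 = 594 edges
```

With a small complete seed such as $m_0 = m + 1$, every node has degree at least $m$ and no subgraph has minimum degree above $m$, so all nodes fall into the same k-shell, consistent with the observation above that the k-shell method cannot separate BA nodes.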

Conclusion
In this paper, we propose the MINK index to measure the spreading ability of nodes in complex networks using the neighbors with the largest k-shell values. Our method rests on the premise that a node's spreading ability is proportional to the k-shell values of the node itself and its most influential neighbors and decreases with the distances between the node and these neighbors. Using six real networks and a synthetic BA network, we compare our method with degree centrality, betweenness centrality, closeness centrality, and the k-shell decomposition method. The empirical results show that our method produces a more monotonic ranking than degree centrality, the k-shell method, and betweenness centrality in all six real networks. Moreover, in most cases, the ranking produced by our method correlates more strongly with the epidemic spreading range than those of the other well-known methods.
Some limitations of our method should be noted. First, we investigated its performance only on a few typical networks, and the classical SIR model was used to mimic the epidemic spreading process. In practice, network structures and spreading dynamics can differ, so the effectiveness of the method needs to be tested more generally. Second, the MINK index is weighted by the distance between a node and its most influential neighbors, and this distance is undefined when the nodes lie in different components. Therefore, our method is not appropriate for identifying spreaders' influence in a disconnected network.

Data Availability
The Dolphins and Internet network data used to support the findings of this study are available from Mark Newman's network data repository (http://www-personal.umich.edu/~mejn/netdata/). The C.elegans, Email, and PGP network data used to support the findings of this study are available from Alex Arenas' data sets (http://deim.urv.cat/~alexandre.arenas/data/welcome.htm). The USAir97 network data used to support the findings of this study are available from the Pajek datasets of Vladimir Batagelj and Andrej Mrvar (2006) (http://vlado.fmf.uni-lj.si/pub/networks/data/).

Disclosure
Any errors in this work are our own; the funders bear no responsibility for them.

Conflicts of Interest
The authors declare that they have no conflicts of interest.