Identifying Vulnerable Nodes of Complex Networks in Cascading Failures Induced by Node-Based Attacks

1 College of Mathematics and Information Science, Shandong Institute of Business and Technology, Shandong, Yantai 264005, China
2 School of Computer Science, National University of Defense Technology, Hunan, Changsha 410073, China
3 Information Security Center, Beijing University of Posts and Telecommunications, P.O. Box 145, Beijing 100876, China
4 National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China


Introduction
In modern society, daily life depends more and more on infrastructure networks such as the power grid, the Internet, transportation networks, and financial networks. The overall efficiency of these systems keeps increasing, while their internal connections become tighter and their dynamical characteristics more complex. These trends make the networks more vulnerable and increase the possibility of system crashes. In particular, as systems become more heavily networked, a small incident can, through a cascade of reactions, lead to the collapse of a whole network and to great economic loss. A typical example is the accident in the North American power grid in 2003 [1]. The fault of three extra-high voltage transmission lines led to a chain reaction in the power system, spread to eight states in the northeastern U.S., affected about 50 million people, and finally resulted in an economic loss of 4 billion to 10 billion. Another example is the electrical collapse that occurred in Italy [2], which seriously affected the operation of the Internet.
These large-scale accidents have threatened network safety and attracted considerable attention from researchers [3]. On the one hand, the robustness and vulnerability of the topological structure of complex networks have been investigated carefully. Several indexes have been proposed to measure how robust or vulnerable complex networks are under different attacks [4][5][6]. The structural vulnerability of important real-world networks has also been probed, including the Italian electrical network [7], the US power grid [8], the European grid [9], and other electrical systems [10], as well as the robustness of computer networks under attacks [11], the stability analysis of uncertain systems [12][13][14], and cyber-physical networking systems [15].
On the other hand, real-world networks carry physical flows (also called load), such as the electric current in power grids, the data transmitted in communication networks, and the cars in transportation networks. This physical load is dynamical. The fault of some local components (nodes or edges) leads to the redistribution of load over the whole network. Overloaded components then fail when the new load on them exceeds their capacity (the maximum load), which triggers a further redistribution of load and so propagates the failure over the network. This evolving procedure is called "cascading failure"; it emerges locally and often results in the collapse of the whole network. Cascading failures of complex networks have therefore become one of the hottest topics in network safety. Motter explored the conditions under which cascading failures, induced by random breakdown and intentional attack, occur in complex networks [16]. Similar procedures of load redistribution are studied in [17][18][19][20]. For the local redistribution of load on nodes or edges, the cascading dynamics due to node overload breakdown in the US power grid [21] and edge overload breakdown in scale-free networks [22] have been investigated, respectively. The condition of cascading failures in weighted networks under edge-based attack has also been analyzed carefully [23], where the redistribution of load on edges is similar to the process in [22]. The influence of different definitions of load on cascading failures in weighted complex networks has been probed in order to reduce the possibility of cascading failures [24], and the features and time characteristics of cascading failures have been revealed [25]. In addition, considering that real-world networks are interdependent and internally connected, the cascading propagation in interdependent networks has recently been investigated [26][27][28][29], and the robustness and critical effects of such networks have also been explored carefully [30,31].
All of this previous research focuses mainly on the structural vulnerability or the cascading dynamics of complex networks as a whole. However, for cascading failures caused by overload breakdown, the following important questions have not been considered. What features do the nodes that easily break down or crash have? How should we describe the characteristics of a congested node in complex networks? These questions concern the correlations between the vulnerability of nodes and the cascading failures in complex networks. We argue that, by exploring them, we are able to identify the vulnerable nodes, analyze the potential safety hazards in networks, and find the bottlenecks in the dynamical change of network "flows". Ultimately, this can provide an important theoretical basis for protecting complex networks and improving the robustness of real-world networks.
This paper explores the correlations between the vulnerability of nodes and the cascading failures in complex networks. Firstly, by assigning the load to nodes, we model the cascading dynamics of complex networks induced by random attack and intentional attack on nodes. Secondly, we introduce four kinds of weighting methods to describe the characteristics of the nodes in complex networks, including BA scale-free networks (SF), WS small-world networks (WS), ER random networks (ER), and two real-world networks (the autonomous system network and the US airport network). Finally, in order to identify the features of the vulnerable nodes, we numerically compute, for each of the four kinds of weights, the ratio of the failed nodes whose weight is less than the corresponding average weight to the total number of failed nodes. As a result, we find that, for the SF network under intentional attack, the ratio of the failed nodes with a small value of the fourth kind of weight defined here is the highest. The autonomous system network, which has a power-law degree distribution, behaves similarly to the SF network. This reveals that the nodes with a small value of the fourth kind of weight are more vulnerable. Moreover, for the WS small-world network and the ER random network, under both random and intentional attack, when the tolerance ability of nodes is low, the ratio of the failed nodes with a small value of the fourth kind of weight is high and, at the same time, the ratio of the failed nodes with high degree is also high. This means that, in these networks, the nodes with a small value of the fourth kind of weight are vulnerable and the nodes with large degree are also vulnerable.
The rest of this paper is organized as follows. Section 2 develops the model of cascading dynamics of complex networks induced by random attack (RA) and highest load attack (HL) on nodes. In Section 3, we introduce four kinds of weighting methods to characterize and distinguish the nodes in complex networks. In Section 4, we describe the studied complex networks, including the BA scale-free network, WS small-world networks, the ER random network, and two real-world networks. In Section 5, we simulate and analyze the characteristics of the failed nodes of complex networks in cascading failures. Section 6 summarizes the most important contributions of this paper and points out the significance of this work.

Modelling the Cascading Failures in Complex Networks
In this section, we model the cascading dynamics of complex networks under node-based attacks. For a general undirected network comprising $N$ nodes, its adjacency matrix is defined as $A = (a_{ij})_{N \times N}$, where $a_{ij} = 1$ if node $i$ links to node $j$; otherwise, $a_{ij} = 0$. Usually, modelling the cascading dynamics of complex networks is based on the following three key points [19][20][21][22][23]: the definition of load on a node, the relationship between the load and the capacity, and the evolving procedure of cascading failures.
(1) The definition of load on a node: usually, the physical flows (data packets, energy, etc.) are transmitted in many networks according to the shortest path routing strategy [1,3,16,17]. For a given pair of nodes $(j, l)$, the physical flows are exchanged and transmitted along the shortest paths connecting them, and some of these shortest paths may pass through node $i$. In this case, it is natural to regard the total number of shortest paths passing through node $i$ between any pair of nodes in the network as the load on node $i$. Therefore, we define $L_i(t)$ as the load on node $i$, where $L_i(t)$ is the number of the shortest paths passing through node $i$ at some time $t$ after attacks ($t = 0$ gives the initial load $L_i(0)$ before the attack).
(2) The relationship between the load and the capacity: usually, there is some maximum load (the maximum capacity) that node $i$ can handle. So, we assume that the maximum load $C_i$ on node $i$ is proportional to its initial load $L_i(0)$; namely,
$$C_i = (1 + \alpha) L_i(0), \quad i = 1, 2, \ldots, N, \tag{1}$$
where the constant $0 \le \alpha \le 1$ is the tolerance parameter. A bigger $\alpha$ means a higher capacity of node $i$ and hence a higher ability to withstand failures.
This is a rational definition in the design of real-world networks, including power grids and the Internet, because the capacity of the components (nodes or links) in these networks is always limited by cost.
(3) The evolving procedure of cascading failures: beginning with the removal of some nodes in the network, the load on the other nodes changes and is redistributed over the whole network. At some time $t$, node $i$ fails if the new load $L_i(t)$ on $i$ exceeds its capacity $C_i$. This causes a new redistribution of load over the network. The process is iterated until no node exceeds its capacity, at which point the iterative process is regarded as completed. This iterative process is called "cascading failures" in complex networks and is described in Figure 1.
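The three key points above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used for the simulations: the graph is a plain dictionary mapping each node to the set of its neighbors, the load is counted over unordered pairs of nodes, the capacity is assumed to take the Motter-Lai form $(1 + \alpha) L_i(0)$, and the load is recomputed over all remaining nodes at each step. The function names `bfs_counts`, `node_load`, `remove_nodes`, and `cascade` are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def node_load(adj):
    """Load of each node: number of shortest paths passing through it,
    counted over unordered pairs of other nodes (an assumption)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    load = {v: 0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t lie in different components
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            # v lies on a shortest s-t path iff the distances add up
            if dist_s[v] + dist_t[v] == dist_s[t]:
                load[v] += sigma_s[v] * sigma_t[v]
    return load


def remove_nodes(adj, removed):
    """Return a copy of the graph without the given nodes."""
    removed = set(removed)
    return {u: {v for v in nbrs if v not in removed}
            for u, nbrs in adj.items() if u not in removed}


def cascade(adj, attacked, alpha):
    """Run the cascade; return (remaining graph, list of overloaded nodes)."""
    initial = node_load(adj)
    # Assumed capacity rule: proportional to the initial load.
    capacity = {v: (1 + alpha) * initial[v] for v in adj}
    current = remove_nodes(adj, attacked)
    failed = []
    while True:
        load = node_load(current)
        overloaded = [v for v in current if load[v] > capacity[v]]
        if not overloaded:
            return current, failed
        failed.extend(overloaded)
        current = remove_nodes(current, overloaded)
```

For example, on a ring of five nodes every node initially carries a load of 1. Removing one node turns the ring into a path whose two inner nodes carry a load of 2, so they fail for $\alpha = 0.5$ but survive for $\alpha = 1$.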
Here, we consider two kinds of attack strategies.
(1) Random attack (RA): we choose a proportion of nodes at random and remove them from the network; here we assume a proportion of 0.01. This attack simulates complex networks subject to random breakdown, such as natural disasters, misoperations, and random disturbances.
(2) Highest load attack (HL): first, we sort the nodes in descending order of their initial load $L_i(0)$ and then remove the proportion of nodes with the highest initial load; here, the proportion is again 0.01. This attack simulates an intentional attack.
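The two attack strategies can be illustrated as follows. This is a hedged sketch: the function name `choose_attacked`, its parameters, and the choice to round the number of attacked nodes and to remove at least one node are assumptions made for illustration. The initial load is passed in as a dictionary from node to load.

```python
import random


def choose_attacked(initial_load, fraction=0.01, strategy="HL", rng=None):
    """Pick the nodes to remove under random (RA) or highest load (HL) attack."""
    nodes = list(initial_load)
    # Assumption: round the proportion to a node count and remove at least one.
    count = max(1, round(fraction * len(nodes)))
    if strategy == "RA":
        if rng is None:
            rng = random.Random()
        return rng.sample(nodes, count)
    if strategy == "HL":
        ranked = sorted(nodes, key=lambda v: initial_load[v], reverse=True)
        return ranked[:count]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Passing a seeded `random.Random` instance makes the random attack reproducible, which is convenient when results are averaged over repeated runs.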

The Weighting Methods of Nodes in Complex Networks
In order to identify the vulnerable nodes of complex networks in cascading failures, we introduce four kinds of weighting methods, $w_i^{(1)}$, $w_i^{(2)}$, $w_i^{(3)}$, and $w_i^{(4)}$, to describe the characteristics of nodes in complex networks. These quantities can distinguish the failed nodes in cascading failures.
(1) The Weighting Method $w_i^{(1)}$. We assume that the load (physical flows) is transmitted according to the shortest path strategy, and the initial load $L_i(0)$ of node $i$ is the number of the shortest paths passing through it; thus the initial load can describe the characteristic of the node. Here, we define the first weighting method $w_i^{(1)}$ as
$$w_i^{(1)} = L_i(0). \tag{2}$$
(2) The Weighting Method $w_i^{(2)}$. In the research of complex networks, the degree $k_i$ of node $i$ is always used to describe its feature and can measure the importance of the node in the network. Usually, a node with higher degree is more important and becomes a hub of the network. Thus, we use the degree to describe the characteristic of node $i$; namely,
$$w_i^{(2)} = k_i = \sum_{j \in \Gamma_i} a_{ij}, \tag{3}$$
where $\Gamma_i$ is the set of the neighbors of node $i$.
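(3) The Weighting Method $w_i^{(3)}$. In the investigation of cascading dynamics, the product $(k_i k_j)^{\theta}$ of the degrees $k_i$ and $k_j$ of the two end nodes of an edge $e_{ij}$ can measure the weight of the edge $e_{ij}$. Wang shows that networks with $\theta = 1$ have the strongest robustness against cascading failures [23]. Thus, in this paper, we define the third weighting method as
$$w_i^{(3)} = \sum_{j \in \Gamma_i} (k_i k_j)^{\theta} = k_i^{\theta} \sum_{j \in \Gamma_i} k_j^{\theta}, \tag{4}$$
where $\Gamma_i$ is the set of the neighbors of node $i$, and here we assume the parameter $\theta = 1$. Furthermore, according to the theory of the degree of networks and probability [32], for $\theta = 1$, the second term on the right-hand side of (4) becomes
$$\sum_{j \in \Gamma_i} k_j = k_i \sum_{k_j = k_{\min}}^{k_{\max}} P(k_j \mid k_i)\, k_j, \tag{5}$$
where $k_{\min}$ and $k_{\max}$ are the minimum degree and the maximum degree in the network, respectively. $P(k_j \mid k_i)$ is the conditional probability that a node with degree $k_i$ links to a neighbouring node with degree $k_j$, and this conditional probability satisfies the normalization and balance conditions:
$$\sum_{k_j} P(k_j \mid k_i) = 1, \qquad k_i P(k_j \mid k_i) P(k_i) = k_j P(k_i \mid k_j) P(k_j). \tag{6}$$
Since BA scale-free networks, WS small-world networks, and ER random networks have no degree-degree correlation [33], according to the conditions in (6), one gets
$$P(k_j \mid k_i) = \frac{k_j P(k_j)}{\langle k \rangle}, \tag{7}$$
where $\langle k \rangle$ is the average degree of the network. Therefore, inserting (7) back into (5), we get
$$\sum_{j \in \Gamma_i} k_j = k_i \sum_{k_j = k_{\min}}^{k_{\max}} \frac{k_j^2 P(k_j)}{\langle k \rangle} = \frac{k_i \langle k^2 \rangle}{\langle k \rangle}. \tag{8}$$
Finally, (4) is simplified as
$$w_i^{(3)} = \frac{k_i^2 \langle k^2 \rangle}{\langle k \rangle}. \tag{9}$$
One can see that the weighting method $w_i^{(3)}$ is clearly different from $w_i^{(2)}$.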
(4) The Weighting Method $w_i^{(4)}$. Here, we introduce the fourth kind of weighting method, which is based on node betweenness centrality. In many real-world networks, a link is important when the two nodes at its ends are important. For example, packets tend to be transmitted along the links between important nodes, and the node betweenness centrality is used to describe the importance of nodes in networks [34]. Following this intuition, the product $(B_i B_j)^{\theta}$ is usually used to measure the weight of the edge $e_{ij}$ [24], where $B_i$ and $B_j$ are the node betweenness of nodes $i$ and $j$, respectively. The node betweenness of node $i$ is defined as
$$B_i = \sum_{j \neq l \neq i} \frac{\sigma_{jl}(i)}{\sigma_{jl}}, \tag{10}$$
where $\sigma_{jl}$ is the number of the shortest paths between nodes $j$ and $l$, and $\sigma_{jl}(i)$ is the number of those shortest paths that pass through node $i$. Therefore, we define the fourth kind of weighting method $w_i^{(4)}$ as follows:
$$w_i^{(4)} = \sum_{j \in \Gamma_i} (B_i B_j)^{\theta}, \tag{11}$$
where $\Gamma_i$ is the set of the neighbors of node $i$, and here we assume the parameter $\theta = 1$.
Using Bayes' rule [32], (11) becomes
$$w_i^{(4)} = B_i \sum_{j \in \Gamma_i} B_j = k_i B_i \sum_{B_j = B_{\min}}^{B_{\max}} P(B_j \mid B_i)\, B_j, \tag{12}$$
where $B_{\min}$ and $B_{\max}$ are the minimum and the maximum node betweenness in the network, respectively. $P(B_j \mid B_i)$ is the conditional probability that a node with node betweenness centrality $B_i$ links to a node with node betweenness centrality $B_j$.
Since it has been shown that small-world networks do not show betweenness-betweenness correlations [24], we can assume $P(B_j \mid B_i) = P(B_j)$. Then, (12) can be simplified as
$$w_i^{(4)} = k_i B_i \sum_{B_j = B_{\min}}^{B_{\max}} P(B_j)\, B_j = k_i B_i \langle B \rangle, \tag{13}$$
where $\langle B \rangle$ is the average node betweenness in the network. The four kinds of weighting methods introduced here thus describe different characteristics of the nodes and can be used to distinguish the nodes in a network.
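The four weights can be computed directly from the network structure. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets, with load and betweenness counted over unordered pairs of nodes; the weights use the exact neighbor sums rather than the mean-field simplifications, and the function names `bfs_counts`, `load_and_betweenness`, and `node_weights` are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def load_and_betweenness(adj):
    """Number of shortest paths through each node, and node betweenness."""
    info = {s: bfs_counts(adj, s) for s in adj}
    load = {v: 0 for v in adj}
    betw = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # no path between s and t
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                through = sigma_s[v] * sigma_t[v]
                load[v] += through                # paths through v
                betw[v] += through / sigma_s[t]   # fraction of s-t paths
    return load, betw


def node_weights(adj, theta=1.0):
    """Return {node: (w1, w2, w3, w4)} for the four weighting methods."""
    load, betw = load_and_betweenness(adj)
    degree = {v: len(adj[v]) for v in adj}
    weights = {}
    for i in adj:
        w3 = sum((degree[i] * degree[j]) ** theta for j in adj[i])
        w4 = sum((betw[i] * betw[j]) ** theta for j in adj[i])
        weights[i] = (load[i], degree[i], w3, w4)
    return weights


# Usage on a path 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
weights = node_weights(path)
```

On this path, the inner node 1 has load 2, degree 2, third weight 6, and fourth weight 4, while the end node 0 has load 0, degree 1, third weight 2, and fourth weight 0. Even on such a small graph the four weights rank the nodes differently in magnitude, which is what allows them to separate the failed nodes.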

The Studied Complex Networks
In this paper, to investigate the vulnerable nodes in networks subject to cascading failures, we mainly take the following typical complex networks into account: Barabasi-Albert scale-free networks (SF), Watts-Strogatz small-world networks (WS), and ER random networks (ER).
(1) Scale-free networks (SF): the SF network model in this paper is generated according to two rules: growth and preferential attachment [35]. The degree distribution of the generated SF network obeys the power-law distribution $P(k) \sim k^{-\gamma}$ ($\gamma = 3$), and the mean degree is $\langle k \rangle \approx 4$.
(2) WS small-world networks (WS): here, according to the Watts-Strogatz model [36], we generate the small-world network by changing the rewiring probability $p$. We mainly consider the two cases with rewiring probability $p = 0.1$ and $p = 0.5$. Note that for the rewiring probability $p = 1$ the WS network becomes a completely random network.
(3) ER random networks (ER): the random network model studied is generated according to the rules in [37], where we control the average degree $\langle k \rangle \approx 4$.
Also, in order to compare with the network models, we consider two real-world networks: the autonomous system network (AS) and US airport network (US airport).
(4) The autonomous system network (AS): at the AS-level topology, the Internet can be seen as a network of routers. Usually, data is transmitted between routers according to the BGP protocol. Thus, the routers (as nodes) and the links between them form the autonomous system network (AS) [38]. Here, we take as an example an AS network with 1470 nodes and mean degree $\langle k \rangle \approx 4.3$. By computation, we find that the degree distribution of the AS network clearly obeys a power-law distribution $P(k) \sim k^{-\gamma}$, where the index $\gamma \approx 0.005$.
(5) US airport network: as an example of transportation networks, we study the well-known US airport network with 500 airports and 2980 links [39]. The mean degree is $\langle k \rangle \approx 11.9$.
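Two of the model networks listed above can be generated with only the Python standard library. The sketch below is illustrative: the network size `n` is not specified in the text and is an assumption, as are the function names `er_graph`, `ba_graph`, and `average_degree`. The ER graph links each pair of nodes independently with a probability chosen to give the target mean degree, and the BA graph attaches each new node to `m = 2` existing nodes with probability proportional to degree, which yields a mean degree close to 4. The WS model and the two real-world data sets are not covered here.

```python
import random


def er_graph(n, mean_degree, rng):
    """ER random graph: each pair is linked with probability mean_degree/(n-1)."""
    p = mean_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def ba_graph(n, m, rng):
    """BA scale-free graph: growth plus preferential attachment."""
    # Start from a complete graph on m + 1 nodes (an assumed seed graph).
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` selects nodes with probability proportional to degree.
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend([t, new])
    return adj


def average_degree(adj):
    """Mean degree of an undirected graph given as neighbor sets."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


rng = random.Random(1)
sf = ba_graph(1000, 2, rng)   # mean degree close to 4
er = er_graph(500, 4.0, rng)  # mean degree close to 4 on average
```

With `m = 2` the BA graph has exactly `3 + 2 * (n - 3)` edges, so its mean degree approaches 4 as `n` grows, whereas the mean degree of the ER graph fluctuates around 4 from one realization to the next.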

The Simulation and Analysis
In this section, we investigate how to identify the vulnerable nodes of complex networks subject to cascading failures under the two kinds of node-based attacks. The studied networks include the SF, WS, and ER network models and two real-world networks. Firstly, we use the relative size $G$ of the largest component of the network to quantify the integral robustness of complex networks under cascading failures. The metric $G$ is defined as
$$G = \frac{N'}{N}, \tag{14}$$
where $N'$ and $N$ are the number of nodes in the largest component after attacks and the total number of nodes in the network, respectively. Obviously, the metric $G$ can be seen as a function of the tolerance parameter $\alpha$, and $0 \le G \le 1$. Note that a higher $G$ means that the network maintains higher connectivity and shows higher robustness against cascading failures. Secondly, to identify the features of the vulnerable nodes in complex networks, we numerically compute, for each of the four kinds of weights, the ratio of the failed nodes whose weight is less than the corresponding average weight to the total number of failed nodes when the iterative process in Figure 1 has stopped (at this time, the cascading failures are completed); namely,
$$\frac{\#\{\text{failed nodes } i : w_i^{(m)} < \bar{w}^{(m)}\}}{\sum_{t} F(t)}, \quad m = 1, 2, 3, 4, \tag{15}$$
where $\bar{w}^{(m)}$ is the average value over the network of the $m$th kind of weight defined in Section 3, and $F(t)$ is the number of failed nodes at time step $t$ after attacks.
The ratio in (15) therefore characterizes the failed nodes in complex networks: a ratio well above 50% for a given weight means that most failed nodes have a below-average value of that weight.
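Both measures used in this section are simple to evaluate once a cascade has finished. The sketch below is a minimal illustration under stated assumptions: the graph remaining after the cascade is a dictionary of neighbor sets, `weights` maps every node of the original network to one kind of weight, and `failed` is the list of failed nodes supplied by the caller (whether it includes the initially attacked nodes is left to the caller). The function names are hypothetical.

```python
from collections import deque


def largest_component_size(adj):
    """Number of nodes in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def relative_size(remaining_adj, n_original):
    """Relative size of the largest component after the cascade."""
    return largest_component_size(remaining_adj) / n_original


def low_weight_ratio(weights, failed):
    """Fraction of failed nodes whose weight is below the network average."""
    if not failed:
        return None  # the ratio is undefined when no node failed
    mean = sum(weights.values()) / len(weights)
    low = sum(1 for v in failed if weights[v] < mean)
    return low / len(failed)


# Usage: 5 original nodes; nodes 0 and 4 are gone, node 3 is isolated.
remaining = {1: {2}, 2: {1}, 3: set()}
print(relative_size(remaining, 5))                               # 0.4
print(low_weight_ratio({0: 1, 1: 2, 2: 3, 3: 10}, failed=[0, 3]))  # 0.5
```

Returning `None` when no node has failed avoids a division by zero for large tolerance values, where the cascade does not spread.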

The Analysis of Complex Networks Models.
In this part, we focus on analyzing the integral robustness and identifying the vulnerable nodes of three kinds of typical complex network models under random attack (RA) and highest load attack (HL): scale-free networks (SF), WS small-world networks (WS), and ER random networks (ER).
(1) From the relative size $G$ of the largest component, as shown in Figure 2, under intentional attack (HL), the SF network and the WS small-world network with $p = 0.1$ are more vulnerable, while the WS network with $p = 0.5$ and the ER random network are more robust. Under random attack (RA), the SF network model is more robust. This means that SF network models show the dual characteristics of both robustness and vulnerability.
(2) From the ratio in (15), under HL attack, as shown in Figure 3, for SF network models, the ratio of the failed nodes with small weighting value to the total failed ones is always more than 50%. In particular, the highest ratio is that of the failed nodes with small $w_i^{(4)}$ ($w_i^{(4)} < \bar{w}^{(4)}$), and it is always more than 90%. The second highest is the ratio of the failed nodes with small $w_i^{(3)}$. These results reveal that, under intentional attack, the nodes with small $w_i^{(4)}$ are more vulnerable.
For the WS small-world network under HL attack, Figure 3(b) shows that, for the rewiring probability $p = 0.1$, the ratio of the failed nodes with small $w_i^{(4)}$ is almost always more than 80%, and the ratio of the failed nodes with small $w_i^{(2)}$ is less than 20% (which implies that the ratio of the failed nodes with big $w_i^{(2)}$ is more than 80%). For $p = 0.5$, since $\alpha = 0.2$ is the transition point of the connectivity of the WS network from low to high (see the arrow in Figure 2(c)) and there are few failed nodes when $\alpha > 0.2$, we mainly focus on the case of $\alpha < 0.2$. As shown in Figure 3(c), for $\alpha < 0.2$, the ratio of the failed nodes with small $w_i^{(4)}$ is more than 70%, and the ratio of the failed nodes with small $w_i^{(2)}$ is less than 40%. Hence, under intentional attack, for the WS small-world network, the nodes with small $w_i^{(4)}$ are more vulnerable, and the ones with big $w_i^{(2)}$ (namely, the nodes with high degree) also break down easily.
For the ER random network under HL attack, for $\alpha < 0.2$ ($\alpha = 0.2$ is the turning point of the connectivity from low to high; see the arrow in Figure 2(d)), the ER random network behaves similarly to the WS small-world network; namely, the nodes with small $w_i^{(4)}$ are more vulnerable, and the ones with big $w_i^{(2)}$ also break down easily.
(3) Under RA attack, nodes fail only for small $\alpha$ (see the curves in Figure 2); thus, we only consider the case of small $\alpha$. Figure 4 shows that most of the failed nodes in the SF network are still the ones with small $w_i^{(4)}$. The WS small-world network and the ER random network behave as they do under HL attack; namely, the nodes with small $w_i^{(4)}$ are more vulnerable, and the ones with big $w_i^{(2)}$ (the nodes with high degree) also break down easily.

The Analysis of Real-World Networks.
In order to compare with the simulations of the network models, we also analyze two real-world networks: the autonomous system network (AS) and US airport network.
As shown in Figure 5, both the AS network and the US airport network are very robust under RA attack and vulnerable under HL attack. Of the two, the AS network is the more vulnerable under HL attack. Because of their strong robustness against random disturbance, in the following discussion we only consider the case of HL attack.
From Figure 6(a), the AS network, which has scale-free characteristics, behaves similarly to the SF network model: the ratio of the failed nodes with small weighting values to the total failed ones is always more than 50%. In particular, the ratio of the failed nodes with small $w_i^{(4)}$ ($w_i^{(4)} < \bar{w}^{(4)}$) is the highest and is always more than 80%. The second highest is the ratio of the failed nodes with small $w_i^{(3)}$. This reveals that, for SF networks under intentional attack, the nodes with small $w_i^{(4)}$ are more vulnerable than other nodes and break down easily.
For the US airport network, as shown in Figure 6(b), the ratio of the failed nodes with small $w_i^{(4)}$ is likewise the highest, and it is more than 90%. Thus, here too the nodes with small $w_i^{(4)}$ are more vulnerable under intentional attack.

Conclusions
In the research on cascading dynamics, finding and distinguishing the vulnerable nodes of networks is very important for the protection of infrastructure systems, but traditional research on the vulnerability of complex networks has not considered this. This paper probes the question of how to identify the vulnerable nodes of complex networks in cascading failures caused by the overload of nodes. We model the cascading dynamics of complex networks induced by deleting a proportion of nodes that are chosen randomly or intentionally. Then, four kinds of node weighting methods are introduced to distinguish the failed nodes of complex networks, including BA scale-free networks, WS small-world networks, ER random networks, and two real-world networks. The main contributions of this paper are as follows.
(1) For SF networks under HL attack, the nodes with small $w_i^{(4)}$ are the most vulnerable, and the ones with small $w_i^{(3)}$ also break down easily. The simulation of the autonomous system network (AS), which has a power-law degree distribution, also supports our findings. However, computing the weight $w_i^{(4)}$ involves the node betweenness, which requires knowledge of the whole structure of the network. In fact, the complexity of computing node betweenness is high, especially for large-scale networks, whereas computing the weight $w_i^{(3)}$ only requires the local structure of the network. Therefore, we should pay attention to the nodes with small $w_i^{(3)}$ when distinguishing the vulnerable components of large networks. It should be pointed out that the recent research of Ercsey demonstrates that local information can be used to approximately calculate the node betweenness of large-scale networks in order to reduce the complexity [40].
(2) For WS small-world networks and the ER random network, when the tolerance ability of nodes is low, under both RA attack and HL attack, the nodes with small $w_i^{(4)}$ are more vulnerable, and the ones with big $w_i^{(2)}$ also break down more easily. The findings of this paper provide an important theoretical basis for analyzing network security, uncovering the hidden risks of networks, and protecting various real-world networks with load assigned to nodes.

Figure 1: The iterative process of cascading failures in complex networks.

Figure 2: Under RA and HL attacks, the relative size $G$ of the largest component as a function of $\alpha$ for (a) SF network, (b) WS small-world network with the rewiring probability $p = 0.1$, (c) WS network with $p = 0.5$, and (d) ER random network. The simulations under RA attack are averaged over 20 runs.

Figure 3: Under HL attack, the ratio for different weighting methods as a function of $\alpha$ for (a) SF network, (b) WS small-world network with the rewiring probability $p = 0.1$, (c) WS network with $p = 0.5$, and (d) ER random network.

Figure 4: Under RA attack, the ratio for different weighting methods as a function of $\alpha$ for (a) SF network, (b) WS small-world network with the rewiring probability $p = 0.1$, (c) WS network with $p = 0.5$, and (d) ER random network. The simulations under RA attack are averaged over 20 runs.

Figure 5: Under RA and HL attacks, the relative size $G$ of the largest component as a function of $\alpha$ for (a) the autonomous system network (AS) and (b) US airport network. The simulations under RA attack are averaged over 20 runs.

Figure 6: Under HL attack, the ratio for different weighting methods as a function of $\alpha$ for (a) the autonomous system network (AS) and (b) US airport network.