Risk and Resilience Analysis of Complex Network Systems Considering Cascading Failure and Recovery Strategy Based on Coupled Map Lattices

Risk and resilience are important and challenging issues in complex network systems, since a single failure may trigger the collapse of a whole system through cascading effects. New theories, models, and methods are urgently needed to deal with this challenge. In this paper, a coupled map lattices (CML) based approach is adopted to analyze the risk of cascading failure in the Watts-Strogatz (WS) small-world network and the Barabási-Albert (BA) scale-free network, respectively. Then, to achieve an effective and robust system and to provide guidance for countering cascading failure, a modified CML model with a recovery strategy factor is proposed. Numerical simulations are carried out on the small-world CML and the scale-free CML. The simulation results reveal that appropriate recovery strategies significantly improve the resilience of networks.


Introduction
In modern society, many real-world systems, such as the Internet, transportation networks, and power grids [1][2][3], can be described as complex networks. They have made a great contribution to social development. However, despite these benefits, the risks of complex network systems are also serious, since a local failure may trigger a breakdown of the whole system through cascading effects. Many real-world examples have been witnessed around the world, for example, the 2003 North American power grid failure [4], in which a fault on three extra-high-voltage transmission lines led to a cascading effect in the power system that affected about 50 million people and caused an economic loss of 4 billion to 10 billion, the 2003 SARS epidemic [5], and the 2008 global financial crisis [6].
These traumatic experiences create an urgent need to study the risk and safety of complex network systems, which has been a hot topic for researchers in recent years [7][8][9][10]. Within this research, cascading failure in complex networks has been one of the most active topics. Many important aspects of cascading failure in complex networks have been discussed recently, and several studies address the assessment and modeling of cascading effects [11,12].
From the perspective of risk management and of safety and resilience engineering, hazards and risks should be identified and controlled in a timely and effective manner. However, as the scale and complexity of modern complex network systems increase, the responsibility of keeping a system effective and safe is heavier than ever before. New theories, models, and methods are urgently needed to protect these systems from unacceptable risk.
Coupled map lattices (CML) are dynamical systems with discrete time, discrete space, and continuous state variables [13]. They were initially used to study spatiotemporal chaos and have since been widely applied in various fields including, but not limited to, biology, mathematics, and engineering. In a CML, the dynamical elements are situated at discrete spatial points, time is discrete, the state is continuous, and each spatial element is coupled to its neighbors. Recently, CML has been widely used to model dynamical behavior and cascading failure in complex systems [14][15][16][17][18]. To counter the impact of cascading failure and prevent large-scale breakdown, strategies such as redistribution and restoration have been discussed [19][20][21][22]; however, few studies consider a recovery strategy for failed nodes during the cascading process. In other words, risk recognition and analysis are only a first step; actions can then be taken to achieve an acceptable resilience of the whole system [23].
The remainder of this paper is organized as follows. Section 2 presents the risk and resilience analysis approach based on CML and discusses a modified CML model with a recovery strategy factor. In Sections 3 and 4, the small-world CML and the scale-free CML are adopted, respectively, to analyze cascading failure and the effect of recovery strategies. Finally, Section 5 concludes the paper.

Risk and Resilience Analysis Based on Coupled Map Lattices
In (1), x_i(t) is the state variable of the ith node at the tth time step. The connection information of the nodes is given by an adjacency matrix A = (a_ij)_(N×N). If there is an edge between node i and node j, a_ij = a_ji = 1; otherwise, a_ij = a_ji = 0. k(i) denotes the degree of node i, and ε ∈ (0, 1) represents the coupling strength.
Then x_i(m) ≥ 1 at the mth time step, so the ith node fails; the other nodes are subsequently affected according to (1), and a cascading failure is triggered.
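As an illustration of the update rule referred to as (1), a minimal sketch is given below. It assumes the form commonly used in CML cascading-failure studies, x_i(t+1) = |(1 − ε) f(x_i(t)) + ε Σ_j a_ij f(x_j(t)) / k(i)|, with the chaotic logistic map f(x) = 4x(1 − x) as local dynamics. The map f, the adjacency-list representation, and the function names are assumptions made for illustration and are not taken from the text.

```python
def logistic(x):
    """Assumed local map f(x) = 4x(1 - x); chaotic on [0, 1]."""
    return 4.0 * x * (1.0 - x)


def cml_step(x, neighbors, eps, failed):
    """Return the states x_i(t + 1) computed from the states x_i(t).

    x         -- list of current states x_i(t)
    neighbors -- neighbors[i] lists the nodes adjacent to node i
    eps       -- coupling strength, 0 < eps < 1
    failed    -- set of nodes that failed at an earlier step (held at 0)
    """
    new_x = [0.0] * len(x)
    for i, nbrs in enumerate(neighbors):
        if i in failed:
            continue  # x_i(t) = 0 for every step after the failure
        k_i = len(nbrs)
        coupled = sum(logistic(x[j]) for j in nbrs) / k_i if k_i else 0.0
        new_x[i] = abs((1.0 - eps) * logistic(x[i]) + eps * coupled)
    return new_x


def newly_failed(x, failed):
    """Nodes whose state has reached 1 or more and that had not failed before."""
    return {i for i, v in enumerate(x) if v >= 1.0 and i not in failed}


# Usage on a hypothetical 4-node ring: perturb node 0 with R = 2 at the first step.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
x = [0.2, 0.4, 0.6, 0.8]
failed = set()
x = cml_step(x, ring, eps=0.3, failed=failed)
x[0] += 2.0  # external perturbation R added to node 0
failed |= newly_failed(x, failed)
```

In this sketch the perturbed node keeps its state of 1 or more for the step in which it fails, so that its neighbors are driven out of the normal range at the next step; it is held at 0 from then on.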
The topological structure (static) and the restoration strategy (dynamic) of a complex network are fundamental to the cascading effect, the spreading threshold, and the scope of failure [11,21,23]. In CML, the coupling strength ε and the external perturbation R have an important impact [16]. However, the coupling strength ε acts like a topological factor, and the external perturbation R acts like an external attack that cannot be controlled once it has happened. Although restoration strategies in weighted networks have been discussed, there are few discussions of recovery strategies in CML; that is, failed units are treated as permanently failed during the cascading process. A recovery strategy factor should therefore be added to the study of cascading failure in CML.

A Modified CML Model with Recovery Strategy Factor.
As illustrated above, if the ith node fails at the mth time step, that is, x_i(m) ≥ 1, then the state variable of the ith node is identically equal to 0 after the mth time step; that is, x_i(t) ≡ 0, t > m. In the real world, however, recovery strategies would be implemented at a proper time after the malfunction. As illustrated in Figure 1, when the ith node fails at the mth time step, it is recovered to normal after τ time steps, while the states of its neighbors, that is, the jth and kth nodes, have changed to failed at the (m + τ)th time step. The timeliness and effectiveness of recovery have a critical influence on the cascading process. We take the reduction of the failure rate and the recovery number of failed nodes as the two metrics of the effectiveness of recovery.
In the following sections, three different recovery strategies with fixed recovery times and one hybrid strategy with a random recovery time are adopted to illustrate the effect of recovery strategy.
(1) Strategy 1: the failed node is recovered immediately at the next time step (τ = 1).
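The recovery factor can be sketched as a small extension of a CML simulation loop. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the local map f(x) = 4x(1 − x), the update form, the choice to restart a recovered node from a random normal state in [0, 1), and all function and variable names are assumed for the example. The failure rate is taken as the fraction of nodes currently failed, and the recovery number as the count of recovery events.

```python
import random


def f(x):
    """Assumed local map f(x) = 4x(1 - x)."""
    return 4.0 * x * (1.0 - x)


def simulate(neighbors, eps, R, source, tau, steps, seed=0):
    """Run a CML cascade in which a failed node recovers after `tau` steps.

    tau=None reproduces the model without recovery (failed nodes stay at 0).
    Returns (failure_rate, recoveries): the fraction of nodes failed at each
    step and the total number of recovery events.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    x = [rng.random() for _ in range(n)]
    failed_at = {}  # node -> time step m at which it failed
    failure_rate, recoveries = [], 0
    for t in range(1, steps + 1):
        new_x = [0.0] * n  # failed nodes are held at 0
        for i, nbrs in enumerate(neighbors):
            if i not in failed_at:
                coupled = sum(f(x[j]) for j in nbrs) / len(nbrs) if nbrs else 0.0
                new_x[i] = abs((1.0 - eps) * f(x[i]) + eps * coupled)
        if t == 1:
            new_x[source] += R  # external perturbation at the first step
        if tau is not None:
            for i in [i for i, m in failed_at.items() if t - m >= tau]:
                del failed_at[i]
                new_x[i] = rng.random()  # assumed: restart in a normal state
                recoveries += 1
        for i, v in enumerate(new_x):
            if v >= 1.0 and i not in failed_at:
                failed_at[i] = t
        x = new_x
        failure_rate.append(len(failed_at) / n)
    return failure_rate, recoveries


# Hypothetical 10-node ring, perturbation R = 2 on node 0, eps = 0.3.
ring = [[(i - 1) % 10, (i + 1) % 10] for i in range(10)]
no_recovery, _ = simulate(ring, 0.3, 2.0, 0, tau=None, steps=10)
with_recovery, n_recovered = simulate(ring, 0.3, 2.0, 0, tau=1, steps=10)
```

Setting `tau=1` corresponds to Strategy 1; a random recovery time could be sketched by drawing `tau` per failed node instead of passing a single value.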

Analysis in Small-World Coupled Map Lattices
A small-world network is characterized by a short average path length and a high clustering coefficient; that is, most nodes connect only to their nearest neighbors, while a few nodes have long-range connections to relatively distant nodes [24]. Many real-world network systems have been shown to exhibit the small-world feature, such as transportation systems [25,26] and electrical power grids [27,28]. In this section, small-world CML is analyzed to investigate the effectiveness of recovery strategies under different conditions. First, the small-world network is generated according to the Watts-Strogatz (WS) model [24] by rewiring, with probability p, the edges of a regular network with N nodes arranged on a ring and 2K edges per node. Notably, the WS network is a random network when the rewiring probability p = 1 and a regular network when p = 0. Here we take N = 1000, K = 2, and p = 0.3. The node degree distribution of the generated small-world network is depicted in Figure 2.
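The WS construction described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under assumptions: the rewiring rule (keep one endpoint, pick the other uniformly among nodes not yet adjacent) is one common variant, and the function name `watts_strogatz` and the adjacency-set representation are hypothetical choices for the example. It assumes N > 2K.

```python
import random


def watts_strogatz(n, K, p, seed=0):
    """Build a WS graph as adjacency sets: ring lattice plus random rewiring."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    # Regular ring: each node is linked to its K nearest neighbors on each side,
    # which gives 2K edges per node.
    for i in range(n):
        for d in range(1, K + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewire each lattice edge (i, i + d) with probability p, keeping endpoint i.
    for d in range(1, K + 1):
        for i in range(n):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


# Parameters used in this section: N = 1000, K = 2, p = 0.3.
net = watts_strogatz(1000, 2, 0.3)
degrees = [len(nbrs) for nbrs in net]
mean_degree = sum(degrees) / len(degrees)
```

Rewiring moves edges without creating or deleting any, so the mean degree stays at 2K while individual degrees spread around it.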

Cascading Failure Affected by Random External Perturbation

When No Recovery Strategy Is Considered.
In the small-world CML, the coupling coefficient is set to ε = 0.3, and a random node failure is then triggered by 5 different external perturbations R ∈ (1, 5). As shown in Figure 3, a cascading failure is triggered after the initial failure when R ≥ 1, and the cascading process can be approximately divided into 3 phases: in phase 1, t ∈ (1, 3), the failure rate increases slowly; in phase 2, t ∈ (4, 6), the failure rate increases rapidly; and in phase 3, t ∈ (6, ∞), the number of failed nodes is stable. Notably, the failure rate is only about 10% when R = 2 and does not exceed 40% for the other values of R, implying that a few long-range connections are enough for a single node failure to trigger a cascade on the network within a few steps, but that the scale of the cascade is controllable to some extent.

When a Recovery Strategy Is Added.
In real life, however, it is unrealistic and irresponsible to let a disaster spread unchecked. For contrast, a small-world CML with a recovery strategy is simulated to illustrate the difference. Here, strategy 2, introduced in Section 2.2 with a fixed recovery time, is employed. The simulation result is shown in Figure 4.
Compared with the result in Figure 3, the peak failure rates are reduced, and there is an obvious decreasing phase at the 6th or 7th time step, after which all nodes return to normal within about 2 time steps. For each scenario, the recovery number of failed nodes is shown in Figure 5.
As calculated, the recovery number re_number = 122, 236, 267, and 273, respectively. The recovery number is almost the same, apart from a slight rising trend, when the external perturbation R ≥ 3, but it increases sharply (nearly doubles) when the external perturbation changes from R = 2 to R = 3.
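The size of these changes can be checked with a short calculation. The sketch below uses only the four recovery numbers reported above; assigning them to R = 2, 3, 4, and 5 is an inference from the surrounding text, which contrasts R = 2 with R ≥ 3, and the variable names are illustrative.

```python
# Recovery numbers reported in the text; the mapping to R = 2..5 is inferred.
re_number = {2: 122, 3: 236, 4: 267, 5: 273}

levels = sorted(re_number)
# Ratio of each recovery number to the one at the previous perturbation level.
growth = {
    (a, b): round(re_number[b] / re_number[a], 2)
    for a, b in zip(levels, levels[1:])
}
```

Under this mapping the ratio from R = 2 to R = 3 is about 1.93, while the later steps grow by roughly 13% and 2%, in line with the description of a near doubling followed by a slight rise.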

When No Recovery Strategy Is Considered.
The coupling coefficient ε represents the coupling strength of the CML. In this part, its effect on cascading failure is discussed. First, a perturbation R = 5 is added to a randomly selected node to trigger the cascading process, and ε takes the values 0.1, 0.3, 0.5, 0.7, and 0.9, respectively. As shown in Figure 6, the cascading process can again be approximately divided into 3 phases: in phase 1, t ∈ (1, 4), the failure rate increases slowly; in phase 2, t ∈ (4, 6), the failure rate increases rapidly; and in phase 3, t ∈ (6, ∞), the number of failed nodes is stable. However, the cascading failure process is little affected by the coupling coefficient ε: the final failure rates are all about 25% (within a range of less than 5%), and the number of failed nodes stabilizes at the 7th time step. This may be explained by the fact that the small-world model is intrinsically highly coupled, with a high clustering coefficient, so the coupling coefficient ε contributes little in the small-world CML model.

When a Recovery Strategy Is Added.
For contrast, to illustrate the effect of the recovery strategy on cascading failure under different coupling coefficients ε, the small-world CML with a recovery strategy is modeled; strategy 2, introduced in Section 2, with a fixed recovery time is applied. The simulation result is shown in Figure 7.
Compared with the result in Figure 6, the peak failure rates are all reduced, and all nodes return to normal before the 8th time step. However, the result in Figure 7 shows that the size of the effect differs between coupling coefficients ε under the recovery strategy. The failure rate is greatly reduced (by more than 50%) when ε = 0.5 and 0.7. Further, as shown in Figure 8, the recovery number re_number = 225, 258, 146, 124, and 185, respectively.
The recovery number is clearly also lower when ε = 0.5 and 0.7. This helps provide guidance for failure prevention in small-world CML.

Cascading Failure Affected by Recovery Strategy.
As illustrated above, adding a recovery strategy to small-world CML can significantly improve the robustness of the network; the effect of different recovery strategies is therefore studied further. As introduced in Section 2.2, four different recovery strategies are simulated, respectively (shown in Figure 9). For each simulation, the same perturbation R = 5 is added to a randomly selected node to trigger the cascading process, and the coupling coefficient takes the same value ε = 0.3.
Together with the recovery number re_number = 2344, 2599, 3172, and 2472 (shown in Figure 10), this shows that the small-world CML achieves the lowest peak failure rate with the lowest recovery number by adopting recovery strategy 1 (RS1); that is, the failed node is recovered immediately at the next time step. This result reveals that in small-world CML a timely recovery is the best protection against cascading failure, since failure can spread very fast owing to the small-world feature of the network.

Analysis in Scale-Free Coupled Map Lattices
Many real-world network systems, such as the Internet and the WWW [29], can be described by the scale-free network model. The degree distribution P(k) of a scale-free network follows a power law; that is, P(k) ∼ k^(−γ), where P(k) is the probability that the degree of a node in the network is equal to k and γ is the scale-free exponent, a positive real number. Barabási and Albert (BA) argued that the generation of scale-free networks is based on two rules, growth and preferential attachment [29], according to which the BA scale-free model is constructed with the parameters N = 1000, m_0 = 3, and Δm = 3; P(k) ∼ c k^(−γ) with c = 6.3 and γ = 2.5. The degree distribution of the adopted scale-free network model is depicted in Figure 11 on log-log scales.
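The two BA rules, growth and preferential attachment, can be sketched with the standard library alone. The code below is a minimal illustration under assumptions: the m_0 seed nodes are taken to be fully connected, each new node attaches to Δm distinct existing nodes chosen with probability proportional to degree, and the names `barabasi_albert`, `ends`, and `p_k` are hypothetical. It assumes 2 ≤ Δm ≤ m_0.

```python
import random
from collections import Counter


def barabasi_albert(n, m0, dm, seed=0):
    """Grow a BA graph (adjacency sets) from a fully connected seed of m0 nodes."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    # `ends` holds both endpoints of every edge, so a uniform draw from it
    # selects a node with probability proportional to its degree.
    ends = []
    for i in range(m0):
        for j in range(i + 1, m0):
            adj[i].add(j)
            adj[j].add(i)
            ends += [i, j]
    # Growth: add one node at a time, each bringing dm new edges.
    for new in range(m0, n):
        targets = set()
        while len(targets) < dm:
            targets.add(rng.choice(ends))  # preferential attachment
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends += [new, t]
    return adj


# Parameters used in this section: N = 1000, m_0 = 3, Delta m = 3.
net = barabasi_albert(1000, 3, 3)
degree_counts = Counter(len(nbrs) for nbrs in net)
# Empirical degree distribution P(k): fraction of nodes with degree k.
p_k = {k: c / len(net) for k, c in sorted(degree_counts.items())}
```

Plotting `p_k` on log-log axes would be one way to compare a generated network against a fitted power law; the fit itself is not attempted here.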

Cascading Failure Affected by Random External Perturbation

When No Recovery Strategy Is Considered.
In the scale-free CML, an external perturbation R ranging from 1 to 5 (five different levels) is added to a random node to trigger the cascading process, and the coupling coefficient is set to ε = 0.3. As shown in Figure 12, a cascading failure is triggered when R ≥ 1, and the cascading process can be approximately divided into 3 phases: in phase 1, t ∈ (1, 3∼4), the failure rate increases slowly; in phase 2, t ∈ (3, 6∼8), the failure rate increases rapidly; and in phase 3, t ∈ (6∼8, ∞), the number of failed nodes is stable. Moreover, the failure rate reaches 100% when R ≥ 4 and is more than 60% even when R ≥ 2, revealing that a few nodes with high degree are enough for a single node failure to cause a collapse of the whole network within a few steps.

When a Recovery Strategy Is Added.
In contrast, the scale-free CML with a recovery strategy is adopted to illustrate the difference. Here, strategy 2 with a fixed recovery time is employed. The simulation result is shown in Figure 13.
Compared with the result in Figure 12, the peak failure rates are all reduced, since the recovery strategy takes effect early, in the second phase of the cascading process, when the failure rate would otherwise increase rapidly. As calculated in Figure 14, re_number = 601, 991, 915, and 913, respectively. The recovery number is nearly the same when the external perturbation R ≥ 3, while it is much lower when R = 2. This can also be explained by the early effect of the recovery strategy in countering the cascading failure.

When No Recovery Strategy Is Considered.
In this section, the effect of the coupling coefficient ε on cascading failure in scale-free CML is studied. Here, ε takes the values 0.1, 0.3, 0.5, 0.7, and 0.9, respectively, and an external perturbation R = 5 triggers the cascading failure (shown in Figure 15). The cascading process again shows approximately 3 phases: in phase 1, t ∈ (1, 3), the failure rate increases slowly; in phase 2, t ∈ (3, 6), the failure rate increases rapidly; and in phase 3, t ∈ (6, ∞), the number of failed nodes is stable. Approximately, the failure rate is 100% when ε = 0.3 and 0.9, 90% ∼ 95% when ε = 0.5 and 0.7, and only a little higher than 80% when ε = 0.1, implying that the cascading failure process is affected to some degree by the coupling coefficient ε in scale-free CML.

When a Recovery Strategy Is Added.
In contrast, scale-free CML with recovery strategy 2 is modeled to illustrate the effect of the recovery strategy under different coupling coefficients ε. The simulation result is shown in Figure 16.
Compared with the result in Figure 15, the peak failure rates are reduced; the reduction is obvious (about 20%) when ε = 0.3 and 0.9, while the peak hardly changes when ε = 0.1, 0.5, and 0.7. All nodes return to normal before the 8th time step. Further, the recovery number of failed nodes is shown in Figure 17.
As calculated, re_number = 993, 1015, 1008, 950, and 877, respectively. The recovery number is nearly the same for the different coupling coefficients.

Cascading Failure Affected by Recovery Strategy.
In this section, the effects of different recovery strategies in scale-free CML are studied further. Four different recovery strategies are simulated, respectively, with the same perturbation R = 5 and the same coupling coefficient ε = 0.3. As shown in Figure 18, recovery strategy 1 (RS1) lowers the peak failure rate substantially, by about 40% compared with Figure 12. Further, as calculated in Figure 19, the recovery number re_number = 1226, 1010, 900, and 990, respectively, revealing that the recovery number increases as the recovery time decreases. A tradeoff between failure rate and recovery number is therefore necessary when selecting a recovery strategy in scale-free CML.
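The contrast between the two networks can be made explicit by ranking the strategies on the recovery numbers reported in the text. The sketch below assumes that the four values in each list correspond to RS1 through RS4 in the order given; the dictionary and function names are illustrative.

```python
# Recovery numbers reported in the text for R = 5 and coupling coefficient 0.3,
# assumed to be listed in the order RS1, RS2, RS3, RS4.
small_world = {"RS1": 2344, "RS2": 2599, "RS3": 3172, "RS4": 2472}
scale_free = {"RS1": 1226, "RS2": 1010, "RS3": 900, "RS4": 990}


def rank(recoveries):
    """Strategies ordered from the fewest to the most recoveries."""
    return sorted(recoveries, key=recoveries.get)


sw_order = rank(small_world)
sf_order = rank(scale_free)
```

RS1 comes first in the small-world ranking and last in the scale-free ranking, which is why the immediate-recovery strategy is cheapest in one network and most costly in the other.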

Conclusions
This paper analyzes cascading failure in the Watts-Strogatz (WS) small-world network and the Barabási-Albert (BA) scale-free network based on coupled map lattices (CML). A modified CML model with a recovery strategy factor is proposed to manage the risk of cascading failure. Cascading failure under a random external perturbation R, a coupling coefficient ε, and a recovery strategy RS is simulated and discussed, and the main findings and contributions of this paper are as follows:
(1) For both small-world CML and scale-free CML, a cascading failure is triggered after the initial failure when R ≥ 1, and the cascading process can be approximately divided into 3 phases; however, the failure rate is only about 10% when R = 2 and does not exceed 40% for the other values of R in small-world CML, while it is 100% when R ≥ 4 and 80% even when R ≥ 2 in scale-free CML.
(2) Cascading failure is little affected by the coupling coefficient ε in small-world CML, where the final failure rates are all about 25% (within a range of less than 5%), but somewhat more in scale-free CML (a range of about 20%).

Mathematical Problems in Engineering
(3) For both small-world CML and scale-free CML, adding a recovery strategy significantly affects cascading failure, with an obvious reduction of the failure rate in all conditions. The two networks differ, however: in small-world CML a timely recovery is the best strategy, achieving the lowest peak failure rate with the lowest recovery number, while in scale-free CML a tradeoff between failure rate and recovery number should be considered when selecting a recovery strategy, since the recovery number increases as the recovery time decreases.
The findings of this paper provide an important theoretical basis for countering cascading failure, managing the hidden potential risk of networks, and helping achieve an effective and robust system.

Figure 1: An example of the cascading process with recovery strategy.

Figure 2: Node degree distribution of the small-world network model.

Figure 3: The relationship between failure rate and time step with different perturbations R in small-world CML with no recovery strategy.

Figure 5: The relationship between the recovery number of failed nodes re_number and the perturbation R.

Figure 6: The relationship between failure rate and time step with different coupling coefficients ε in small-world CML with no recovery strategy.

Figure 7: The relationship between failure rate and time step with different coupling coefficients ε in small-world CML with a fixed recovery strategy.

Figure 9: The relationship between failure rate and time step with recovery strategies in small-world CML.

Figure 11: Node degree distribution of the scale-free network model.

Figure 12: The relationship between failure rate and time step with different perturbations R in scale-free CML with no recovery strategy.

Figure 14: The relationship between the recovery number of failed nodes re_number and the perturbation R.

Figure 16: The relationship between failure rate and time step with different coupling coefficients ε in scale-free CML with a fixed recovery strategy.

Figure 18: The relationship between failure rate and time step with recovery strategies in scale-free CML.