Measurement and Analysis of MANET Resilience with Fault Tolerance Strategies

Resilience is usually regarded as a network's ability to tolerate faults. To improve the resilience of a MANET, fault tolerance strategies such as routing protocols are usually employed, and these strategies in turn affect the resilience of the MANET. For both resilience measurement and the evaluation of the efficiency of fault tolerance strategies, the impact of these strategies deserves detailed study. However, general MANET resilience measurement methods do not treat fault tolerance strategies as individual resilience influence factors, let alone reflect the interplay among strategies deployed on different network layers. This limits the efficiency assessment of fault tolerance strategies. In this paper, we model fault tolerance strategies for MANET resilience measurement, treating the strategies as individual resilience influence factors. First, by analyzing the features of fault tolerance strategies deployed on the physical and logical layers of the network, we build a hierarchical network model to describe the resilience impact of the strategies. Then, based on this network model, we propose a fault tolerance strategy model to measure the resilience of a MANET. In particular, the model supports the study of interplay among different strategies by contrasting the quantitative values defined by the strategy model. Finally, a case study is given for verification and analysis.


Introduction
MANETs are playing increasingly important roles in areas such as disaster relief, environmental sensing and monitoring, intelligent transportation, and military tactical systems [1,2]. Their mobility and wireless features open up uses in a variety of areas; thus high reliability and fault tolerance are required. To meet this demand, several fault tolerance strategies based on the logical layer and on structural connectivity have been proposed. Structure based fault tolerance strategies ensure the connectivity of the physical layer. For example, Bahramgiri et al. [3] proposed a transmission power control algorithm for wireless networks that preserves $k$-connectivity; hence it can ensure the connectivity of the physical layer while the network is challenged by failures. In addition, scale-free networks, whose degrees follow a power-law distribution, were found to be more robust against random node failures [4,5]. This kind of fault tolerance strategy relies on the inhomogeneity of network structure, which is one of the features of scale-free networks [6][7][8][9][10][11]. As for logical fault tolerance strategies, routing protocols have to be considered as logic based fault tolerance strategies of MANET [12]. They can be categorized by the number of paths: multipath routing protocols create and maintain multiple routes and deliver multiple copies of packets through all possible (disjoint) routes between a source-destination pair [13]. Single-path routing protocols are less fault tolerant than multipath ones, but they can also detect faulty nodes and choose a good path by maintaining the route map [14,15]. Because many fault tolerance strategies for MANET are deployed on different network layers, and because they are usually combined into hybrid strategies to improve fault tolerance, network resilience assessment is one of the challenging problems in MANET. Since this problem is important for network design and reliability enhancement, various resilience measurement methods have been proposed in the literature.
Motivations.
In recent years, MANET resilience measurement has been studied by many scholars. Resilience is usually considered as the fault tolerance ability of the network [16]. However, one important aspect of the problem has not been carefully addressed: the interplay among fault tolerance strategies is not fully considered as an impact factor in the resilience measurement process [16][17][18]. Among the measuring methods for network resilience, performance based methods are poorly suited to large-scale networks because of their complexity and overhead; structure based methods solve the complexity problem but are weak on accuracy because they do not consider routing protocols as resilience impact factors. In particular, both approaches rarely take the resilience impact of the interplay among different strategies into account; for example, structure based fault tolerance strategies may affect the fault tolerance ability of routing protocols. Thus, the existing measurement methods are insufficient to help enhance network reliability through the design of fault tolerance strategies. To obtain a resilience measurement method that is both efficient and accurate enough, the modeling of fault tolerance strategies, considering the interplay between strategies based on network structure and those based on routing protocols (logical factors), deserves detailed research. Therefore, in this paper, we focus on a MANET resilience measurement method that considers logical fault tolerance strategies as an impact factor, and we give an approach that can analyze the interplay among different strategies based on the modeling of fault tolerance strategies.

Related Works.
General quantitative approaches to measuring network resilience, which include performance based methods and structure based methods [16], are the basis for assessing the fluctuation of performance and topology under network failures. Performance based methods consider the fluctuation of performance metrics such as packet delivery ratio; they depend on modeling the whole network operation, including the packet generation and delivery process. Moreover, performance based methods treat routing protocols merely as packet delivery rules rather than as resilience influence factors [17,19,20]. They therefore incur high complexity and overhead when measuring the resilience of large-scale networks and are inefficient in supporting network design and the development of routing protocols. To deal with these problems [21], a plethora of structure based approaches and metrics for measuring network resilience have been proposed in the past few years [16,21,22,23]. Common structure based methods depend on evaluating the robustness of the topology against attacks and failures. In network science, a series of metrics based on complex network robustness, together with corresponding methods, have been proposed: Sydney et al. [24] proposed a network resilience assessment method based on algebraic connectivity; Alenazi and Sterbenz [21] introduced a method to evaluate network resilience based upon spectral metrics, which is much more accurate than connectivity metrics for robustness assessment [25]. Although a number of MANET routing protocols are designed with fault tolerance features, they have not yet been considered as resilience impact factors in resilience measurement. Compared with methods based on performance metrics, methods based on topology and structure metrics largely simplify the packet generation and delivery process, which greatly reduces complexity; nevertheless, their accuracy suffers because they ignore routing protocols as resilience impact factors.
In recent years, some researchers studying structure based resilience evaluation methods [18,26] have become aware of the resilience impact of routing protocols, since routing is a main means of tolerating attacks and failures. It is noteworthy that Rohrer et al. [26] were the first to find that a network running a multipath routing protocol has higher resilience than a network running a single-path routing protocol. However, they did not further study the resilience impact of the interplay among different fault tolerance strategies. Although recent research has started to recognize resilience impact factors from both structure based and logic based fault tolerance strategies, the resilience impact of the interplay among different strategies is still ignored. In fact, the interplay of fault tolerance strategies deserves more attention when designing a network. Garzón et al. [27] showed that routing protocols can limit the fault tolerance ability of the Fat-Tree topology, and Bermudez Garzon et al. [28] proposed an adaptive routing protocol to maximize the fault tolerance ability of the Fat-Tree topology. Therefore, recognizing and modeling the interplay of different fault tolerance strategies is important for MANET resilience measurement. Such work not only enhances the effectiveness and accuracy of the resilience measurement method but also makes the measurement results more valuable for the network design process.

Contributions.
In this paper, we propose a network strategy model for MANET resilience measurement and analyze the interplay among strategies. First, to describe the network fault tolerance mechanism, we studied the time-spatial features of MANET and constructed a hierarchical network model; based on this model, we then modeled the fault tolerance strategies for each layer. In particular, the logical fault tolerance strategies model addresses the accuracy issue of previous resilience measurement methods, which ignored logical strategies as a resilience impact factor. Second, based on the proposed model, we proposed a resilience measurement method that considers not only the structural resilience impact but also the logical resilience impact, which come from different layers of the hierarchical model. The measurement method has higher accuracy than general structure based methods and higher efficiency than general performance based methods. The most important advantage of the proposed method is that the resilience of the physical layer and of the logical layer can be described and analyzed individually according to the proposed strategies model, so the interplay among fault tolerance strategies can be clearly observed and analyzed.
From the interplay analysis, the measurement result can help network designers find better combinations of fault tolerance strategies. Finally, a contrast test verified the accuracy and efficiency of the proposed model and method and provided an example of the interplay analysis.

System Modeling
Based on the time-spatial features of the routes running in a network, we propose a hierarchical network model that takes routing protocols into account; this model is both the basis and the object of resilience measurement.

Network Features Analysis.
Applications and functions operate over routes that are established according to routing protocols. The spatial features of these routes vary over time, as Figure 1 shows.
Thus we can abstract the routes established by a routing protocol as a network. Here we consider the physical connectivity exhibited on the physical layer and, built on it, the logical connectivity exhibited on the logical layer. Dynamic features can be observed in both time and space: because nodes move, the physical links on the physical layer and the routes on the logical layer are not fixed; meanwhile, the fault tolerance mechanism against failures is another factor that affects the spatial structure of the network. Therefore, routing protocols with different route discovery and maintenance mechanisms also lead to different structural behaviors of the route-abstracted network. This spatial structural dynamic unfolds along the time dimension.
We therefore regard the abstracted network formed by a routing protocol as possessing time-spatial features and propose the "time-spatial" network model. We use the time-spatial network model to describe the variation in time and space caused by routing protocols and fault tolerance principles, and we carry out the resilience measurement on this network model as well.
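The abstraction described above can be sketched in a few lines. This is only an illustration under stated assumptions: routes are given as node lists per time step, the function name `logical_layer_snapshots` and the toy routes are hypothetical, and no particular routing protocol is modeled.

```python
def logical_layer_snapshots(routes_by_step):
    """Map each time step to the set of directed logical edges used by its valid routes."""
    snapshots = {}
    for step, routes in routes_by_step.items():
        edges = set()
        for route in routes:
            # Consecutive nodes on a route form one directed edge of the logical layer.
            edges.update(zip(route, route[1:]))
        snapshots[step] = edges
    return snapshots


# Hypothetical routes: at step 1 node "b" has moved away or failed,
# so the route from "a" to "c" is rebuilt through "d".
routes_by_step = {
    0: [["a", "b", "c"], ["a", "d"]],
    1: [["a", "d", "c"]],
}
snapshots = logical_layer_snapshots(routes_by_step)
```

Comparing `snapshots[0]` with `snapshots[1]` shows how the logical-layer structure changes between time steps even when the set of nodes stays the same.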

Hierarchical Network Model.
Route discovery and maintenance according to routing protocols depend on the connectivity of the physical layer, so the upper time-spatial network model also takes the physical network model into account. Meanwhile, the connectivity of the MANET physical layer depends on the distance between nodes and the radius of signal transmission. Several random graph models are suitable for modeling this property of the MANET physical layer, such as Gilbert graphs, Waxman graphs, and Gabriel graphs [21,29]. To stay close to the real usage profile and to allow comparison, in this paper we chose the Waxman graph as the physical layer model; our proposed time-spatial network model sits above the physical layer as the logical layer.
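As an illustration of the physical layer model, the sketch below samples a Waxman graph using the commonly stated link probability $\beta \exp(-d/(\alpha L))$. The unit-square node placement, the function name `waxman_graph`, and the parameter values are assumptions made for this example, not the simulation setup of this paper.

```python
import math
import random


def waxman_graph(n, alpha, beta, seed=0):
    """Sample node positions in the unit square and Waxman links between them."""
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    dist = {
        (u, v): math.dist(pos[u], pos[v])
        for u in range(n)
        for v in range(u + 1, n)
    }
    max_dist = max(dist.values())  # L: maximum distance between any two nodes
    edges = {
        pair
        for pair, d in dist.items()
        # Link probability decays with distance; beta scales the overall density.
        if rng.random() < beta * math.exp(-d / (alpha * max_dist))
    }
    return pos, edges


positions, links = waxman_graph(30, alpha=0.5, beta=0.8)
```

Because the same seed reproduces the same positions and random draws, lowering `beta` with everything else fixed can only remove links, which makes the effect of the density parameter easy to inspect.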

[Figure: physical layer and logical layer (routes forming network).]

Definition 1 (physical layer network model). In the Waxman graph, two nodes $u$ and $v$ at distance $d(u, v)$ are connected with probability
$$P(u, v) = \beta \exp\left(-\frac{d(u, v)}{\alpha L}\right), \tag{1}$$
where $\alpha, \beta \in (0, 1]$, $L$ is the maximum distance between any two nodes, $\beta$ denotes the density of links, and a larger $\alpha$ corresponds to a higher ratio of long links to short links.

Definition 2 (logical layer network model (time-spatial network model)). Because the network constructed by the routing protocol has time-spatial dynamic features, we name the logical layer network model the "time-spatial network model." We propose the following definition of the logical layer network model to capture the features of logical fault tolerance strategies. A MANET can be considered as a directed graph $G = (V, E)$, where $V$ is the set of all nodes in the network and $|V|$ is the number of nodes. For $u, v \in V$, $\langle u, v \rangle$ denotes a valid path from $u$ to $v$, and $(u, v)$ denotes the link between $u$ and $v$. $E = \{\langle u, v \rangle \mid u \in V, v \in V\}$ is the set of valid edges, and the number of edges or links is $|E|$. Define the function $w: E \to R^+$, where $R^+$ is a subset of the positive real numbers and $w(\langle u, v \rangle)$, written $w(u, v)$, is the weight of edge $\langle u, v \rangle$. If $w(u, v) = +\infty$, the link $\langle u, v \rangle$ is broken. The ordered set $\{v_1, v_2, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n\}$ denotes a path whose source node is $v_1$, whose destination node is $v_n$, and whose number of nodes is $n$. For all $i$ with $2 \le i \le n$, $v_{i-1}$ is the upstream node of $v_i$ on the path; for all $i$ with $1 \le i \le n - 1$, $v_{i+1}$ is the downstream node of $v_i$. For $v_s \in V$ and $v_d \in V$, $\mathrm{Path}(v_s, v_d)$ denotes the set of paths whose source node is $v_s$ and whose destination node is $v_d$; in other words, it is the edge set of the time-spatial network. Assuming $\mathrm{path}(v_s, v_d) \in \mathrm{Path}(v_s, v_d)$, we define an evaluation function on such paths (2). The purpose of routing from source node $v_s$ to destination node $v_d$ is to find a path $\mathrm{path}^*(v_s, v_d)$ for which the evaluation function is optimal, that is, a path satisfying (3).

Since a network's structure and topology define its basic features, the time-spatial network model lets us evaluate resilience through structure metrics while considering the impact of routing protocols.
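To make the role of the evaluation function concrete, the sketch below assumes one simple possibility: the evaluation of a path is the sum of its edge weights, and "optimal" means minimal. Both are assumptions made for illustration, since other evaluation functions (for example hop count or delay) fit the definition equally well. The function name `best_path` and the toy weights are hypothetical; an infinite weight marks a broken link, as in the definition above.

```python
import heapq
import math


def best_path(weights, source, target):
    """Return (cost, path) minimizing the summed edge weight, or (inf, []) if unreachable."""
    adjacency = {}
    for (u, v), w in weights.items():
        if math.isfinite(w):  # links with weight +inf are broken and skipped
            adjacency.setdefault(u, []).append((v, w))

    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, w in adjacency.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return math.inf, []


# Hypothetical weights: the direct link s -> d is broken.
weights = {
    ("s", "a"): 1.0,
    ("a", "d"): 1.0,
    ("s", "d"): math.inf,
    ("s", "b"): 5.0,
    ("b", "d"): 1.0,
}
cost, path = best_path(weights, "s", "d")
```

With these weights the search avoids the broken direct link and returns the two-hop route through `a`, which is the kind of rerouting a fault-tolerant routing protocol performs.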

Fault Tolerance Strategies Modeling.
To study the interplay among fault tolerance strategies, the first and most important step is to model the strategies. In this paper we mainly focus on the interplay between logical fault tolerance strategies and physical layer fault tolerance strategies. The models are constructed by describing the fault tolerance features, namely, how the strategies achieve fault tolerance. The subsequent steps of resilience measurement and analysis can then take these models into account.

Physical Layer Fault Tolerance Strategies Model.
From network science perspective, the network structure or topology features can indicate certain operation behaviors.Therefore, if we consider the resilience as a part of the network's behaviors, the network's structure will affect resilience features.
Besides the metrics mentioned in Section 2.2, special metrics that evaluate network robustness and resilience from the network structure have also been proposed, for example, spectral metrics, natural connectivity, and algebraic connectivity [19,25,30]. Although all of them are suitable for measuring the robustness of the network structure, they cannot directly and completely reflect the impact of the routing mechanism on the network. In recent years, some scholars started to consider the impact of packet transmission and proposed the flow robustness metric [26]; we use it to measure the physical layer flow robustness (phyFR) as the resilience assessment of the network. Reference [10] compared flow robustness with other structure based metrics and showed that it has higher accuracy than the others:
$$\mathrm{phyFR} = \frac{\sum_{i=1}^{k} |C_i|\,(|C_i| - 1)}{|V|\,(|V| - 1)}, \tag{4}$$
where $G = (V, E)$ is the graph representing the given network and $\{C_i;\ 1 \le i \le k\}$ is the set of components in graph $G$. The main cost of calculating resilience with phyFR lies in finding the component sizes in the network topology graph. Hence, we may count edges instead of nodes, because the numbers of links and edges are easy to collect during a simulation that considers the routing mechanism. A detailed step-by-step description and proof are given in Section 3.
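The flow robustness computation can be sketched as follows, assuming the usual reading of the metric as the fraction of node pairs that remain in the same connected component. The union-find helper, the function names, and the small example topology are illustrative assumptions rather than the implementation used in the paper.

```python
def components(nodes, links):
    """Group nodes into connected components with a small union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in links:
        if u in parent and v in parent:  # ignore links that touch failed nodes
            parent[find(u)] = find(v)

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def flow_robustness(nodes, links):
    """Fraction of ordered node pairs that can still reach each other."""
    n = len(nodes)
    if n < 2:
        return 0.0
    reliable = sum(len(c) * (len(c) - 1) for c in components(nodes, links))
    return reliable / (n * (n - 1))


links = [(1, 2), (2, 3), (4, 5)]
before = flow_robustness([1, 2, 3, 4, 5], links)  # components {1,2,3} and {4,5}
after = flow_robustness([1, 3, 4, 5], links)      # node 2 has failed
```

Removing node 2 splits its component, so `after` is lower than `before`; repeating this for a growing set of failed nodes gives the kind of robustness curve discussed in the case study.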

Logical Layer Fault Tolerance Strategies Model.
We denote the source node and destination node as $s$ and $d$; a path connecting $s$ and $d$ is denoted as $p$, and the links and nodes included in $p$ are denoted as $e$ and $v$.
On these paths we define a function $f(p_i)$, where $p_0$ is the shortest path between $s$ and $d$ and $p_i$ is one of the paths between node $s$ and node $d$. The function $f(p_i)$ reflects the number of paths constructed by the routing protocol. If the same OD (origin-destination) pair has more efficient paths, the transmission between that pair can be considered more fault tolerant. Hence, we propose the metric LFT (logical fault tolerance) to measure fault tolerance:
$$\mathrm{LFT} = 1 - e^{-\sum_{i=1}^{n} f(p_i)},$$
where the sum runs over the paths $p_i$ between the OD pair.
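A small numeric sketch may help to see how such a metric behaves. It assumes that LFT has the saturating form $1 - \exp(-\sum_i f(p_i))$ and takes the per-path values $f(p_i)$ as given inputs, because their exact definition is not restated here; the function name `lft` and the example scores are hypothetical.

```python
import math


def lft(path_scores):
    """Saturating fault tolerance score: 0 with no usable path, approaching 1 as paths accumulate."""
    return 1.0 - math.exp(-sum(path_scores))


single_path = lft([1.0])        # one usable path between the OD pair
multi_path = lft([1.0, 1.0])    # two usable paths between the same pair
```

Under this assumed form, every additional usable path raises the score, but with diminishing returns, which matches the intuition that a second path adds more fault tolerance than a tenth.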

Resilience Measurement Method
Combining the metrics we introduced, here are the steps of our measuring method.
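Step 1. Based on the hierarchical network model, build a spatial profile of the network, expressed as a matrix; the profile describes the connection status of the nodes in the time-spatial network at each time step.
Step 2. Analyze the flow robustness of the time-spatial network model (TSFR). Following the definition of flow robustness [26], the TSFR of the time-spatial network at a given time step is
$$\mathrm{TSFR} = \frac{\sum_{i=1}^{k} |C_i|\,(|C_i| - 1)}{|V|\,(|V| - 1)}, \tag{8}$$
where $C_i$ ($1 \le i \le k$) are the components of the time-spatial network. To reduce complexity, we can also count edges or links instead of counting $|C_i|$. From the definition of flow robustness [26], $n$ nodes can construct $n(n - 1)/2$ edges; hence, if $e_i$ is the number of edges of $C_i$, (8) can be transformed to
$$\mathrm{TSFR} = \frac{\sum_{i=1}^{k} e_i}{|E|}, \tag{9}$$
where $|E|$ is the total number of possible network edges. The proof is simple: $e_i = |C_i|(|C_i| - 1)/2$ and $|E| = |V|(|V| - 1)/2$, and substituting these two expressions into (8) yields (9).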
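The first two steps can be illustrated with a per-time-step matrix profile. The sketch below is an assumption-laden example: the profile is taken to be a 0/1 matrix of logical links, directed routes are treated as undirected when components are formed, and the function names are hypothetical. It also checks numerically that counting node pairs and counting possible edges give the same value.

```python
def component_sizes(matrix):
    """Sizes of the connected components of a 0/1 link matrix (direction ignored)."""
    n = len(matrix)
    seen = [False] * n
    sizes = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in range(n):
                if not seen[v] and (matrix[u][v] or matrix[v][u]):
                    seen[v] = True
                    stack.append(v)
        sizes.append(size)
    return sizes


def tsfr_by_nodes(matrix):
    """Node-pair form: sum of |C|(|C|-1) over components, divided by |V|(|V|-1)."""
    n = len(matrix)
    return sum(c * (c - 1) for c in component_sizes(matrix)) / (n * (n - 1))


def tsfr_by_edges(matrix):
    """Edge form: possible edges inside components divided by all possible edges."""
    n = len(matrix)
    possible = n * (n - 1) // 2
    return sum(c * (c - 1) // 2 for c in component_sizes(matrix)) / possible


# Hypothetical profile at one time step: route 0 -> 1 -> 2, node 3 has no route.
profile = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
value_nodes = tsfr_by_nodes(profile)
value_edges = tsfr_by_edges(profile)
```

Both functions assume at least two nodes. In a simulation, one such matrix would be produced per time step, and the resulting values form the TSFR series.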
Step 3. Measure the resilience of the network. Following the resilience calculation method of Alenazi and Sterbenz [22], we use the sum of time-spatial flow robustness (STSFR) to denote the resilience: $\mathrm{STSFR}_t = \sum_{i=1}^{t} \mathrm{TSFR}_i$. For comparison with the traditional structure based measuring method, the sum of physical layer flow robustness (SphyFR) is defined analogously: $\mathrm{SphyFR}_t = \sum_{i=1}^{t} \mathrm{phyFR}_i$.
Step 4. After obtaining the resilience value of the whole network, we can further analyze the logical fault tolerance ability through the logical layer resilience, according to the logical fault tolerance strategies model: $\mathrm{LFT} = 1 - e^{-\sum_{i=1}^{n} f(p_i)}$. Meanwhile, the interplay among different strategies can be observed. Based on the analysis in this step, we can determine whether the fault tolerance strategies of the logical layer suit those of the physical layer, namely, whether the combination of fault tolerance strategies on the physical and logical layers of the hierarchical network model yields better fault tolerance performance. Understanding the fault tolerance ability of each individual layer is very helpful for making strategic decisions during the network design process.
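The summation in Step 3 amounts to a running sum over the per-step robustness values. The sketch below assumes that one robustness value is available per step; the function name `summed_robustness` and the numbers are illustrative only and are not results from the paper.

```python
from itertools import accumulate


def summed_robustness(per_step_values):
    """Running sum of per-step flow robustness, one entry per step."""
    return list(accumulate(per_step_values))


# Illustrative per-step values only; a real run would take them from the simulation.
tsfr_series = [1.0, 0.5, 0.25]
stsfr_series = summed_robustness(tsfr_series)
```

Applying the same function to a phyFR series gives the SphyFR series, so the two layers can be compared step by step on the same scale.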

Case Study and Discussion
Here we constructed a case to verify the validity of proposed model and method and analyze phyFR and TSFR metrics during network resilience evaluation process.

Case Introduction.
The case is based on a MANET running the AODV protocol. To stay close to the real usage profile, and inspired by the simulation design of Rohrer et al. [26], we set up the simulation as Table 1 shows.
Based on the AODV routing algorithm, the time-spatial network is generated from the valid routes and OD node pairs, as Figures 2 and 3 show.
After Step 1 is completed, we obtain the network topology graph. Observing the generated topology graphs at different time points shows that the topology varies over time and that the network structure metrics also change noticeably. Here we choose two metrics from complex network theory to observe the robustness changes in the time dimension: clustering coefficient and largest component size [5]. The results are shown in Figure 4. From this preliminary observation of network robustness, we find that the level of network robustness is dynamic although the fault tolerance strategies are static; namely, the fault tolerance ability of the strategies is not fixed. To further study the resilience impact of fault tolerance strategies, we collected the phyFR, TSFR, SphyFR, and STSFR metrics during the simulation; the results are shown in Figures 5 and 6. We selected 6 checkpoints to present the numerical simulation results, as Table 2 shows. From the results, we observe the following phenomena. Because of the different physical layer structure features, indicated by the different $\alpha$ and $\beta$ parameter values of the Waxman model used as the physical layer model, the robustness and resilience of the physical layer and of the time-spatial abstracted layer differ. To facilitate the study of the impact of different physical structure behaviors, we choose two groups of $\alpha$ and $\beta$ values following [10]: $\alpha = 0.5$, $\beta = 0.8$ and $\alpha = 0.8$, $\beta = 0.5$. The network generally has a higher robustness and resilience level where $\alpha = 0.5$, $\beta = 0.8$, which is consistent with the result in [10]. Meanwhile, under the same physical layer setup, the values of TSFR and STSFR are lower than those of phyFR and SphyFR, because the time-spatial abstract layer sits above the physical layer and the AODV routing protocol does not use all physical connections but chooses only some of them as routes.
As the number of failed nodes increases, the values of phyFR and SphyFR decrease more markedly than those of TSFR and STSFR, which take the routing protocol factor into account. The reason is that the fault tolerance ability of the routing protocol counteracts node failures. This also explains why the impact of the physical structure is much stronger on phyFR and SphyFR than on TSFR and STSFR (Table 2).

Validity Verification and Discussion.
Since packet transmission relies on routes, the TSFR metric should be much closer to the real robustness and resilience level than the phyFR metric. To test this assumption and to verify the accuracy, we compared the simulation results with experimental results from a general performance based approach on the NS-3 platform. The experimental scenario configuration is as Table 1 shows. First, we set up the scenario on the NS-3 platform following Table 1 and extended the number of nodes from 200 to 500. Once all source nodes had finished sending packets, we tracked and counted the number of packets successfully delivered to the destination nodes.
For contrast and accuracy validation, we adopted the packet delivery ratio as the robustness metric for the experimental results: the higher the packet delivery ratio, the stronger the fault tolerance ability. We then compared TSFR with the packet delivery ratio. It is noteworthy that phyFR is the metric that [10] reported as having higher accuracy than other structure based metrics for resilience evaluation. From Figure 7, we can observe that TSFR tracks the real fault tolerance ability (delivery ratio) more closely than the general structure based metric (phyFR). According to the Pearson correlation analysis, which is used to represent accuracy and was also adopted as the accuracy calculation method in [10], the accuracy of TSFR is 0.93 and that of phyFR is 0.83. We also fitted polynomial curves to both TSFR and phyFR; from the fitted curves in Figure 8, we can clearly observe that the robustness denoted by TSFR is closer to the delivery ratio than that denoted by phyFR. Hence, the resilience metric STSFR, which is based on TSFR, also has higher accuracy than general structure based metrics.
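The accuracy figure above is a Pearson correlation between a structural metric series and the measured delivery ratio. The following is a minimal sketch of that calculation; the function name `pearson` and the two short series are made-up placeholders that only show the call and are not data from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Placeholder series only: a metric that falls in step with the delivery ratio.
metric_series = [0.9, 0.7, 0.5, 0.3]
delivery_ratio = [0.95, 0.75, 0.55, 0.35]
accuracy = pearson(metric_series, delivery_ratio)
```

A value close to 1 means the metric rises and falls together with the delivery ratio, which is the sense in which the correlation is used as an accuracy score here.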
We then verified the efficiency of the proposed resilience measurement method by comparing its source code complexity with that of a general performance based method implemented in NS-3. The source code of the performance based method comprises all the classes and methods needed to measure the resilience of a MANET that adopts AODV as its routing protocol on the NS-3 platform (NS-3 version: ns-3.24). The source code of the program for the time-spatial model based method provides the same features as the performance based source code. The comparison result (Table 3) shows that our proposed method has clearly lower complexity than performance based methods and will therefore be more efficient.

Interplay Analysis of Fault Tolerance Strategies.
The interplay analysis compares logical fault tolerance strategies on the same physical layer; the detailed simulation setup is shown in Table 4. Apart from the network scale and node communication radius, the movement mode and node failure mode are the same as in the previous case study (Table 1). For the interplay analysis of fault tolerance strategies, we collected the routes constructed by AODV and DSR, which are common routing protocols serving as logical fault tolerance strategies but whose fault tolerance mechanisms differ. Then, according to the strategy models proposed in Section 2.3, the logical and physical fault tolerance strategies can be analyzed individually. Therefore, their impact on resilience can also be observed separately, and the interplay among different strategies can be clearly observed. The interplay analysis result is shown in Figure 8. The top panel of Figure 8 shows the logical layer fault tolerance ability (LFT) versus the physical layer fault tolerance ability (SphyFR), and the bottom panel shows the resistance of the logical fault tolerance strategies to physical layer failure.
From the analysis result, we can observe that, as the number of failed nodes increased, the logical layer's resilience under both AODV and DSR decreased, especially under DSR: after the available node ratio passed 20%, its fault tolerance ability decreased sharply, and its resistance to physical failure was clearly weaker than that of AODV. This suggests that, although combining any physical and logical fault tolerance strategies into a hybrid strategy could improve the resilience of the network, selecting a proper strategy according to the interplay analysis helps to improve network resilience efficiently. Moreover, this result cannot be shown by any general resilience measurement method.

Conclusion
In this paper, we proposed a hierarchical network model and modeled the fault tolerance strategies of each layer according to this model; then, on the foundation of this modeling work, we proposed a corresponding approach for MANET resilience measurement with fault tolerance strategies. Traditional measurement methods are limited in analyzing the interplay of strategies because they ignore logical fault tolerance strategies as an individual resilience impact factor. To solve this problem, we added a new upper layer above the physical layer adopted by traditional structure based resilience measuring methods and constructed a model describing logical fault tolerance strategies, in order to reflect the resilience impact of routing protocols. Meanwhile, the proposed measurement method, based on the fault tolerance strategy models, helps to further analyze the resilience impact of the interplay among different strategies. Compared with general performance based and structure based resilience measuring methods, our method shows higher accuracy than general structure based methods and higher efficiency than general performance based ones, and it gives more meaningful results to support network design. Furthermore, the case study and the discussion of the simulation results show that our measuring method clearly reflects the fault tolerance ability of routing protocols and the resilience impact of the interplay among strategies. This will be helpful for further study of MANET resilience that considers the combined impact of the physical and logical layers, as well as for network design and the development of routing protocols.

Figure 1: Different topology of routes and links in different time profiles.

Figure 4: Clustering coefficient and largest component varying in time dimension.

Figure 7: Accuracy validation by using NS-3 for TSFR under the same scenario within 500 time steps.
2. Analyze the flow robustness of time-spatial (TSFR) network model.

Table 3: Complexity analysis and comparison.