Network survivability—the ability to maintain operation when one or a few network components fail—is indispensable for present-day networks. In this paper, we characterize three main components in establishing network survivability for an existing network, namely, (1) determining network connectivity, (2) augmenting the network, and (3) finding disjoint paths. We present a concise overview of network survivability algorithms, where we focus on presenting a few polynomial-time algorithms that could be implemented by practitioners and give references to more involved algorithms.
Given the present-day importance of communications systems and infrastructures in general, networks should be designed and operated in such a way that failures can be mitigated. Network nodes and/or links might, for instance, fail due to malicious attacks, natural disasters, unintentional cable cuts, planned maintenance, equipment malfunction, and so forth. Resilient, fault-tolerant, survivable, reliable, robust, and dependable are terms that the networking community has used to capture the ability of a communications system to maintain operation when confronted with network failures. Unfortunately, these terms have overlapping meanings or contain ambiguities, as pointed out by Al-Kuwaiti et al. [
These three ingredients will be explained in the following sections.
A network is often represented as a graph
In 1927, Menger provided a theorem [
The maximum number of link/node-disjoint paths between A and B is equal to the minimum number of links/nodes that would separate A and B.
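The link version of the theorem can be checked numerically on a small undirected graph. The sketch below is only an illustration under our own assumptions: the function names and the five-link example graph are hypothetical, the disjoint paths are counted with plain unit-capacity augmenting paths, and the minimum separating set is found by brute force, which is only feasible for tiny graphs.

```python
from collections import deque
from itertools import combinations

def max_link_disjoint_paths(links, a, b):
    """Count link-disjoint a-b paths via unit-capacity augmenting paths (BFS)."""
    cap, adj = {}, {}
    for u, v in links:
        for x, y in ((u, v), (v, u)):          # undirected link: one unit each way
            cap[(x, y)] = cap.get((x, y), 0) + 1
            adj.setdefault(x, set()).add(y)
    count = 0
    while True:
        parent = {a: None}
        queue = deque([a])
        while queue and b not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if b not in parent:
            return count                        # no augmenting path is left
        v = b
        while parent[v] is not None:            # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        count += 1

def min_separating_links(links, a, b):
    """Smallest number of links whose removal disconnects a from b (brute force)."""
    def connected(remaining):
        adj = {}
        for u, v in remaining:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        seen, stack = {a}, [a]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return b in seen
    for k in range(len(links) + 1):
        for removed in combinations(range(len(links)), k):
            rest = [l for i, l in enumerate(links) if i not in removed]
            if not connected(rest):
                return k

# Hypothetical example: two parallel two-hop routes plus a cross link.
example_links = [("A", 1), ("A", 2), (1, "B"), (2, "B"), (1, 2)]
paths = max_link_disjoint_paths(example_links, "A", "B")
cut = min_separating_links(example_links, "A", "B")
```

For this example both quantities equal 2, as the theorem predicts.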
Menger's theorem clearly relates to the
A somewhat less intuitive notion of connectivity stems from the spectrum of the Laplacian matrix of a graph and is denoted as algebraic connectivity. The algebraic connectivity was introduced by Fiedler in 1973 [
The algebraic connectivity equals the value of the second smallest eigenvalue of
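As a small worked example (the symbol $Q$ for the Laplacian and the 3-node path graph are our own illustrative choices), consider the path 1 – 2 – 3. With $Q = \Delta - A$, where $\Delta$ is the diagonal degree matrix and $A$ is the adjacency matrix, we obtain:

```latex
Q =
\begin{pmatrix}
 1 & -1 &  0 \\
-1 &  2 & -1 \\
 0 & -1 &  1
\end{pmatrix},
\qquad
\det(Q - \lambda I) = -\lambda(\lambda - 1)(\lambda - 3),
\qquad
\lambda \in \{0, 1, 3\}.
```

The second smallest eigenvalue is 1, so the algebraic connectivity of this path graph is 1. Removing either link disconnects the graph, after which the eigenvalue 0 has multiplicity two and the algebraic connectivity drops to 0.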
The algebraic connectivity has many interesting properties that characterize how strongly a graph
Connectivity properties may be less obvious when applied to multilayered networks [
In probabilistic networks, links and/or nodes
The outcome of testing for network connectivity could be that the network is not sufficiently robust (connected). Possibly, rewiring the (overlay) network could improve its robustness properties [
Network protocols like OSPF are deployed in the Internet to obtain a correct view of the topology and, in case of changes (like the failure of a link), to converge the routing towards the new (perturbed) situation. Unfortunately, this process is not fast, and applications may still face unacceptable disruptions in performance. In conjunction with MPLS, an MPLS fast reroute mechanism can be used that, as the name suggests, provides the ability to switch over in subsecond time from a failed primary path to an alternate (backup) path. This fast reroute mechanism is specified in RFC 4090 [
Depending on whether backup paths are computed before or after a failure of the primary path, survivability techniques can be broadly classified into protection or restoration techniques, respectively.
In general, protection has a shorter recovery time since the backup path is precomputed, but it is less efficient in terms of capacity utilization and less flexible. Restoration, on the other hand, provides increased flexibility and efficient resource utilization, but it may take a longer time for recovery, and there is no guarantee that a backup path will be found. As a compromise between the two schemes, Banner and Orda [
Depending on how rerouting is done after a failure in the primary path, there are three categories of survivability techniques.
Depending on whether sharing of resources is allowed among backup paths, protection schemes can be of two types:
In general, path protection requires less capacity than link protection, while shared protection requires less capacity than dedicated protection. However, path protection is more vulnerable to multiple link failures than link protection, and so is shared protection compared to dedicated protection.
The remainder of this paper is structured as follows. In Section
Throughout the paper, the objective is not to list and explain all the relevant algorithms. Rather, we aim to briefly explain some fundamental concepts and some polynomial-time algorithms that could easily be deployed by practitioners or which can be (and have been) used as building blocks for more advanced algorithms, and to provide pointers to further reading.
In Section
A link cut refers to a set of links whose removal separates the graph into two disjoint subgraphs, and where all links in the removed cut-set have an end-point in both subgraphs.
The two subgraphs need not be connected themselves.
A node cut refers to a set of nodes whose removal separates the graph into two disjoint subgraphs, and where all nodes in the removed cut-set have at least one adjacent link to both subgraphs.
A minimum cut is a cut whose cardinality is not larger than that of any other cut in the network.
Definitions for a cut also have a variant in which a source node
An
Often, when referring to a cut, a link cut is meant. In the remainder of this paper, we will use the same convention and only specify the type of cut for node cuts.
A maximum cut is a cut whose cardinality is not exceeded by that of any other cut in the network.
The sparsest cut (sometimes also referred to as the (Cheeger) isoperimetric constant) is a cut for which the ratio of the number of links in the cut-set divided by the number of nodes in the smaller subgraph is not larger than that of any other cut in the network.
Finding a maximum or sparsest cut is a hard problem (the maximum-cut problem is APX-hard [
In the celebrated paper from Ford and Fulkerson [
Dinitz' algorithm, published in 1970 by Yefim Dinitz, was the first maximum-flow algorithm to run in polynomial time (contrary to the pseudopolynomial running time of the Ford-Fulkerson algorithm [
(2) While /*loop until the algorithm terminates in line (3) (4) (5) (6) (7) (8) (9)
The residual capacity
The residual graph
A blocking flow
A blocking flow could be obtained by repeatedly finding (via Depth-First-Search [
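The phases described above (build a level graph by breadth-first search, then saturate it with a blocking flow found by repeated depth-first searches) can be sketched as follows. This is a minimal illustration rather than a tuned implementation; the class name `Dinitz`, its methods, and the six-node example network are our own assumptions.

```python
from collections import deque

class Dinitz:
    """Sketch of Dinitz' maximum-flow algorithm on nodes 0..n-1."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]   # per node: indices of incident links
        self.to, self.cap = [], []            # link i and its reverse i ^ 1

    def add_link(self, u, v, capacity):
        self.graph[u].append(len(self.to)); self.to.append(v); self.cap.append(capacity)
        self.graph[v].append(len(self.to)); self.to.append(u); self.cap.append(0)

    def _levels(self, s):
        # BFS over links with positive residual capacity.
        level = [-1] * self.n
        level[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for e in self.graph[u]:
                v = self.to[e]
                if self.cap[e] > 0 and level[v] < 0:
                    level[v] = level[u] + 1
                    queue.append(v)
        return level

    def _push(self, u, t, pushed, level, nxt):
        # DFS restricted to the level graph; nxt[u] skips exhausted links.
        if u == t:
            return pushed
        while nxt[u] < len(self.graph[u]):
            e = self.graph[u][nxt[u]]
            v = self.to[e]
            if self.cap[e] > 0 and level[v] == level[u] + 1:
                d = self._push(v, t, min(pushed, self.cap[e]), level, nxt)
                if d > 0:
                    self.cap[e] -= d          # update residual capacities
                    self.cap[e ^ 1] += d
                    return d
            nxt[u] += 1
        return 0

    def max_flow(self, s, t):
        flow = 0
        while True:
            level = self._levels(s)
            if level[t] < 0:                  # t unreachable: flow is maximum
                return flow
            nxt = [0] * self.n
            while True:                       # augment until the flow is blocking
                d = self._push(s, t, float("inf"), level, nxt)
                if d == 0:
                    break
                flow += d

# Hypothetical example network with source 0 and sink 5.
net = Dinitz(6)
for u, v, c in [(0, 1, 16), (0, 2, 13), (1, 2, 10), (2, 1, 4), (1, 3, 12),
                (3, 2, 9), (2, 4, 14), (4, 3, 7), (3, 5, 20), (4, 5, 4)]:
    net.add_link(u, v, c)
max_flow_value = net.max_flow(0, 5)
```

For the example network the returned maximum flow is 23, which equals the capacity of the minimum cut separating node 0 from node 5.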
For further reference, in Table
Related work on computing minimum
Year¹ | Reference | Complexity | Description |
---|---|---|---|
1951 | Dantzig [ | | Linear programming, where |
1956 | Ford and Fulkerson [ | | Augmenting paths. |
1970 | Dinitz [ | | Resp. capacitated and unit-capacity graphs. Shortest augmenting paths. |
1974 | Karzanov [ | | Preflow-push (A simplification of Karzanov's algorithm has been presented by Tarjan [ |
1980 | Galil and Naamad [ | | Extension of Dinitz' algorithm. |
1982 | Shiloach and Vishkin [ | | Parallel algorithm for |
1983 | Sleator and Tarjan [ | | Dynamic tree data structure. |
1986 | Goldberg and Tarjan [ | | Highest-label preflow-push. |
1987 | Ahuja et al. [ | | Excess scaling. |
1989 | Cheriyan and Hagerup [ | | Randomized algorithm. |
1990 | Alon [ | | Deterministic version of Cheriyan and Hagerup's randomized algorithm. |
1998 | Goldberg and Rao [ | | Length function. |
2011 | Christiano et al. [ | | Resp. |
¹Throughout the paper, we follow the convention of listing the year of the first (conference) publication, while referring to the extended (journal) version where applicable.
In this section, we describe the algorithm from Matula [
Let
For subgraph
The algorithm of Matula (see Algorithm
(2) While (3) (4) (5) (6)
In the algorithm of Matula, an augmenting path is a path in the residual network, where a residual network is the network that remains after pruning the links of a previous augmenting path. There are no 1-hop paths from
For directed multigraphs, Shiloach [
Let
We refer to Mansour and Schieber [
For further reference, in Table
Related work on computing minimum link cuts.
Year | Reference | Complexity | Description |
---|---|---|---|
1971 | Podderyugin | | Undirected graphs. Variation of Ford-Fulkerson max-flow algorithm in how augmenting paths of one and two hops are handled. |
1971 | Tarjan [ | | Testing for 2-link connectivity in undirected graphs via DFS. |
1975 | Even and Tarjan [ | | Application of Dinitz' algorithm. |
1986 | Karzanov and Timofeev [ | | Undirected graphs. |
1987 | Matula [ | | Undirected graphs. It is also shown that the maximum subgraph link connectivity can be determined in |
1989 | Mansour and Schieber [ | | Directed graphs. Relation between minimum cut and dominating set. |
1990 | Nagamochi and Ibaraki [ | | Undirected graphs. Algorithm does not use a max-flow algorithm. |
1991 | Galil and Italiano [ | | Testing for 3-link-connectivity in undirected graphs. |
1991 | Gabow [ | | Directed, resp. undirected graphs. Matroid approach. |
1996 | Karger [ | | Randomized algorithm. |
Maximum-flow algorithms can also be used to determine the node connectivity, as demonstrated by Dantzig and Fulkerson [
For every node
The
By using Dinitz' algorithm, one may compute the
In the previous section, we provided an overview of several algorithms to determine the connectivity of a network. In this section, we give an overview of several network augmentation algorithms that can be deployed to increase the connectivity (or some other metric) of a network by adding links. Network augmentation problems seem closely related to network deletion problems (e.g., see [
In this section, we consider the following link augmentation problem.
Given a graph
We can discriminate several variants based on the graph (directed, simple, planar, etc.) or if link weights are used or not (i.e., in the unweighted case all links have weight 1). Let us start with the weighted link connectivity augmentation problem.
The weighted LCA problem is NP-hard.
We will use the proof due to Frederickson and JáJá [
Given a set
For a 3DM instance
Frederickson and JáJá also used the construction of this proof to prove that the node-connectivity and strong-connectivity variants of the weighted LCA problem are NP-hard (in a directed graph strong connectivity is used, which means that there is a directed path from each node to every other node in the graph). We remark that the unweighted simple graph preserving LCA problem was claimed to be NP-hard by Jordán (reproduced in [
Eswaran and Tarjan [
The algorithm of Eswaran and Tarjan as presented in Algorithm
(2) (3)
(2) (3) For each
(2) PostOrder (3) For (4) (5) (6)
(2) Condense connected components of
(3) Number the nodes in
(4) For
(5) Map the ends of each chosen link to an arbitrary node in the corresponding 2-link-connected component
We have assumed that the initial graph is connected. Eswaran and Tarjan's algorithm can also start from a disconnected graph, by first augmenting the forest of condensed 2-link-connected components to a tree.
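The preprocessing that such an augmentation relies on can be sketched as follows: find the bridges by Depth-First-Search, condense the 2-link-connected components into a tree, and count the leaves of that tree. For a connected graph, at least ⌈(number of leaves)/2⌉ links are needed to reach 2-link connectivity, and this number suffices. The sketch below only computes that number and does not select the links; the function names and the example graph are our own assumptions, and the recursive search is only suitable for small graphs.

```python
def find_bridges(n, links):
    """Return indices of links whose removal disconnects the graph (DFS low-points)."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(links):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc, low, bridges = [-1] * n, [0] * n, []
    timer = 0

    def dfs(u, parent_link):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, i in adj[u]:
            if i == parent_link:
                continue
            if disc[v] == -1:
                dfs(v, i)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:       # no back link bypasses link i
                    bridges.append(i)
            else:
                low[u] = min(low[u], disc[v])

    for root in range(n):
        if disc[root] == -1:
            dfs(root, -1)
    return bridges

def min_links_for_2_link_connectivity(n, links):
    """Number of links to add to a connected graph to make it 2-link connected."""
    bridge_ids = set(find_bridges(n, links))
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(links):
        if i not in bridge_ids:
            adj[u].append(v)
            adj[v].append(u)
    comp, count = [-1] * n, 0              # label 2-link-connected components
    for root in range(n):
        if comp[root] != -1:
            continue
        comp[root] = count
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = count
                    stack.append(v)
        count += 1
    degree = [0] * count                   # degrees in the condensed tree
    for i in bridge_ids:
        u, v = links[i]
        degree[comp[u]] += 1
        degree[comp[v]] += 1
    leaves = sum(1 for d in degree if d == 1)
    return (leaves + 1) // 2

# Hypothetical example: a triangle {0, 1, 2} with three pendant nodes attached to node 2.
example = [(0, 1), (1, 2), (2, 0), (2, 3), (2, 4), (2, 5)]
example_bridges = sorted(find_bridges(6, example))
links_needed = min_links_for_2_link_connectivity(6, example)
```

In the example, the three pendant links are bridges, the condensed tree is a star with three leaves, and two additional links are needed.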
The algorithm of Eswaran and Tarjan uses a tree representation of all the 2-link-connected components in
We will use the notation Each proper cut in For any link
A cactus graph without cycles is a tree, and if
Two cuts
Karzanov and Timofeev [
Karzanov and Timofeev [
(2) For (3) Replace the node If (4) Connect path be the set of nodes in node (5) Label the nodes of Update unchanged. (6) Remove all empty nodes of degree an adjacent tree link (a node is an
Figure
Example of a cactus construction for a 4-node ring topology. The top “row” gives the graphs
Naor et al. [
A node in a cactus representation
Similarly to a tree, if the cactus
The algorithm uses a Depth-First-Search-like procedure, see Algorithm
points [ (2) DFS traversal that starts at an arbitrary node and obeys the following rule: if a node for the first time via a cycle with some color, then traverse all other differently colored links adjacent to
(2) Cactus-DFS (3) Form the pairs that map to the leaf (4) For each pair and a node from a different leaf
For further reference, in Table
Related work on augmenting link connectivity in unweighted graphs.
Year | Reference | Complexity | Description |
---|---|---|---|
1976 | Eswaran and Tarjan [ | | Augmenting to 2-connectivity. |
1986 | Cai and Sun [ | NA | Splitting off links. |
1987 | Watanabe and Nakamura [ | | Based on a derived formula for the minimum number of links to |
1990 | Frank [ | | Different |
1990 | Naor et al. [ | | |
1991 | Gabow [ | | Poset representation of cuts applied to the Naor-Gusfield-Martel algorithm. |
1994 | Benczúr [ | | Resp. randomized and deterministic algorithms. |
1996 | Nagamochi and Ibaraki [ | | Splitting off links. |
1998 | Benczúr and Karger [ | | Randomized algorithm. |
2004 | Nagamochi and Ibaraki [ | | Maximum adjacency ordering². |
¹NP-hard variations of this problem and corresponding approximation results are provided by Nutov [
²Maximum adjacency ordering rule: add a new node
Splitting off a pair of links
Let
Mader's theorem has been used by for instance Cai and Sun [
As indicated by Theorem
Under specific conditions, the weighted LCA problem may be polynomially solvable, as shown by Frank [
In this section, we consider the following node augmentation problem.
The Node Connectivity Augmentation (NCA) problem. Given a graph
Like for the LCA problem.
The weighted NCA problem is NP-hard.
The proof of Theorem
The unweighted undirected NCA problem has received most attention. The specific case of making a graph 2-node connected was treated by Eswaran and Tarjan [
Augmenting the node connectivity of directed graphs has been treated by Frank and Jordán [
As the weighted NCA problem is NP-hard, special cases have been considered [
When a network is (made to be) robust, algorithms should be in place that can find link- or node-disjoint paths to protect against a link or node failure. There can be several objectives associated with finding link- or node-disjoint paths.
Given a graph
The total weight of the pair of disjoint paths is minimized.
The maximum path weight of the two disjoint paths is minimized.
The smallest path weight of the two disjoint paths is minimized.
The weight of the primary path should be less than or equal to
The smallest capacity over all links in the two paths is maximized.
The most common and simplest one is the
Beshir and Kuipers [
Li et al. [
Sherali et al. [
Finding min-sum disjoint paths is equivalent to finding a minimum-cost flow in unit-capacity networks [
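This equivalence can be illustrated by sending two units of flow at minimum cost through a directed network in which every link has unit capacity. The sketch below uses successive shortest paths with Bellman-Ford on the residual graph, which is a simpler but slower technique than the Suurballe-Tarjan algorithm; the function name, the integer node labels, and the example network are our own assumptions.

```python
def min_sum_disjoint_paths(n, links, s, t, k=2):
    """Return (total weight, paths) of k link-disjoint s-t paths of minimum total
    weight in a directed graph given as (u, v, weight) triples, or None."""
    INF = float("inf")
    to, cap, cost = [], [], []
    graph = [[] for _ in range(n)]
    for u, v, w in links:                      # link i and its residual twin i ^ 1
        graph[u].append(len(to)); to.append(v); cap.append(1); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    total = 0
    for _round in range(k):
        dist, pred = [INF] * n, [-1] * n
        dist[s] = 0
        for _pass in range(n - 1):             # Bellman-Ford handles negative costs
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    v = to[e]
                    if cap[e] > 0 and dist[u] + cost[e] < dist[v]:
                        dist[v], pred[v], changed = dist[u] + cost[e], e, True
            if not changed:
                break
        if dist[t] == INF:
            return None                        # fewer than k disjoint paths exist
        total += dist[t]
        v = t
        while v != s:                          # augment one unit along the path
            e = pred[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]
    flow_out = [[] for _ in range(n)]          # original links that carry flow
    for e in range(0, len(to), 2):
        if cap[e] == 0:
            flow_out[to[e ^ 1]].append(to[e])
    paths = []
    for _round in range(k):                    # decompose the flow into paths
        path, u = [s], s
        while u != t:
            u = flow_out[u].pop()
            path.append(u)
        paths.append(path)
    return total, paths

# Hypothetical example: the shortest path 0-1-2-3 blocks any disjoint second path.
example = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 2, 3), (1, 3, 3)]
total_weight, pair = min_sum_disjoint_paths(4, example, 0, 3)
```

In the example, simply removing the shortest path 0-1-2-3 leaves no second path, whereas the flow formulation returns the pair 0-1-3 and 0-2-3 with total weight 8.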
In directed networks, a link-disjoint paths algorithm can be used to compute node-disjoint paths, if we split each node
In undirected networks, a link-disjoint paths algorithm can be used to compute node-disjoint paths by the transformation described in Section
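The node-splitting step for directed networks can be sketched as follows. Each node is replaced by an "in" copy and an "out" copy joined by a single link, so that link-disjoint paths in the transformed graph cannot share an original node. The function names, the tuple labels for the split nodes, and the example network are our own assumptions; the path counter is a plain unit-capacity augmenting-path search included only to make the example self-contained.

```python
def split_nodes(nodes, links):
    """Replace each node v by (v, 'in') -> (v, 'out'); rewire link (u, v)
    to run from (u, 'out') to (v, 'in')."""
    split = [((v, "in"), (v, "out")) for v in nodes]
    split += [((u, "out"), (v, "in")) for u, v in links]
    return split

def count_link_disjoint(links, s, t):
    """Count link-disjoint directed s-t paths (unit-capacity augmenting paths)."""
    cap, adj = {}, {}
    for u, v in links:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)              # residual (reverse) capacity
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def augment(u, seen):
        if u == t:
            return True
        seen.add(u)
        for v in adj.get(u, ()):
            if v not in seen and cap[(u, v)] > 0 and augment(v, seen):
                cap[(u, v)] -= 1
                cap[(v, u)] += 1
                return True
        return False

    count = 0
    while augment(s, set()):
        count += 1
    return count

# Hypothetical example: two link-disjoint s-t paths that both pass through node c.
nodes = ["s", "a", "b", "c", "d", "e", "t"]
links = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", "t"), ("e", "t")]
link_disjoint = count_link_disjoint(links, "s", "t")
node_disjoint = count_link_disjoint(split_nodes(nodes, links), ("s", "out"), ("t", "in"))
```

In the example there are two link-disjoint paths but only one node-disjoint path, because the single link between the two copies of node c becomes the bottleneck in the transformed graph.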
We will present the Suurballe-Tarjan algorithm, see Algorithm
(2) Modify the weights of each link /* (3) For (4) (5) (6) (7) While (8) (9) (10) (11) If (12)
Instead of finding an augmenting path for each source-destination pair, Suurballe and Tarjan have found a way to combine these augmenting flow computations into two Dijkstra-like shortest-paths computations. First a shortest paths tree
(2) While (3) (4) (5) For (6) (7) While (8) (9) unmark (10) (11) (10) Else (11) (12)
Taft-Plotkin et al. [
For a distributed disjoint paths algorithm, we refer to the work of Ogier et al. [
Roskind and Tarjan [
When two disjoint primary and backup paths are reserved for a connection, any failure on the primary path can be survived by using the backup path. The backup path therefore provides a 100% survivability guarantee against a single failure. When no backup paths are available, that is, when unprotected paths are used, the communication along a path will fail if there is a failure on that path. Banner and Orda [
The approach by Banner and Orda to solve the
Graph transformation for the
In the transformed graph, a minimum-cost flow of
Luo et al. [
She et al. [
The single-link failure model has been most often considered in the literature, but multiple failures may occur as follows.
Due to lengthy repair times of network equipment, there is a fairly long time span in which new failures could occur.
In case of terrorist attacks, several targeted parts of the network could be damaged. With Suurballe's algorithm,
In layered networks, for instance IP-over-WDM, one failure on the lowest-layer network may cause multiple failures on higher-layer networks. Similarly, the links of a (single-layered) network may share the same duct, in which case damage to the duct may damage all the links inside. These links are often said to belong to the same shared risk link group (SRLG) (the node variant SRNG also exists. When both nodes and links can belong to a shared risk group, the term Shared Risk Resource Group (SRRG) is used, e.g., see [
Natural disasters may affect all nodes and links within a certain geographical area. Work on multilink geographic failures has mostly focused on determining the geographic max-flow and min-cut values of a network under geographic failures of circular shape (e.g., Sen et al. [
We have provided an overview of algorithms for network survivability. We have considered how to verify that a network has certain connectivity properties, how to augment an existing network to reach a given connectivity, and, lastly, how to find alternative paths in case network failures occur. Our focus has been on algorithms for general networks, although much work has also been done for specific networks, such as optical networks, where additional constraints like wavelength continuity and signal impairments induce an increased complexity, for example, see our work [
The author would like to thank Professor Piet Van Mieghem for his constructive comments on an earlier version of this paper.