We propose a multitree-based fast failover scheme for Ethernet networks. In our system, only a few spanning trees are used to carry working traffic in the normal state. When a failure occurs, the nodes adjacent to the failure redirect traffic to preplanned backup VLAN trees to achieve fast failure recovery. In the proposed scheme, a new leaf constraint is enforced on the backup trees. This constraint enables the network to provide 100% survivability against any single link failure and any single node failure. Besides fast failover, we also take load balancing into consideration. We model an Ethernet network as a two-layered graph and propose an Integer Linear Programming (ILP) formulation for the problem. We further propose a heuristic algorithm to provide solutions for large networks. The simulation results show that the proposed scheme can achieve high survivability while maintaining load balancing. In addition, we have implemented the proposed scheme in an FPGA system. The experimental results show that it takes only a few

Ethernet has been the dominant local area network technology for decades. It has also been extended to support metropolitan area networks (MANs) and even wide area networks (WANs). Nowadays, Ethernet is the major data center networking technology [

The IEEE 802.1d spanning tree protocol [

In recent years, several fast failure handling schemes have been proposed in the literature. These schemes can be classified into spanning tree reconstruction based approaches and multi-VLAN based approaches. Since a failed link breaks a working spanning tree into two parts, the tree reconstruction based approach selects a new link to reconnect the two subtrees. In the multi-VLAN based approaches, failure recovery is achieved by switching the traffic affected by a failure to a backup VLAN tree so as to bypass the failed device. The backup VLANs are preconfigured and stored in each switch. When a failure is detected, the switch adjacent to the failure performs the recovery process on its own, without exchanging protocol messages with other nodes. Since each switch performs failure recovery based on a local decision, the failover time is greatly reduced compared to a conventional Ethernet network.
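The local decision described above can be illustrated with a minimal sketch. It assumes a toy switch model in which each port has one preconfigured backup VLAN; the class name `Switch`, its fields, and the table layout are hypothetical and are not taken from any of the cited schemes.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy switch with preconfigured backup VLANs (hypothetical model)."""
    name: str
    # port -> backup VLAN ID used when that port loses signal (assumed layout)
    backup_vlan: dict = field(default_factory=dict)
    # (vlan, destination) -> egress port, preconfigured for every VLAN
    forwarding: dict = field(default_factory=dict)
    failed_ports: set = field(default_factory=set)

    def port_down(self, port):
        """Record a loss of signal on a local port."""
        self.failed_ports.add(port)

    def forward(self, vlan, dst):
        """Return (vlan, port) for a frame, retagging locally if needed.

        No message is exchanged with other switches: the decision uses
        only the local port state and the preconfigured tables.
        """
        port = self.forwarding[(vlan, dst)]
        if port in self.failed_ports:
            vlan = self.backup_vlan[port]        # switch to the backup VLAN
            port = self.forwarding[(vlan, dst)]  # look up its egress port
        return vlan, port
```

For example, a switch whose working VLAN 1 reaches destination `"D"` through port 1 would, after `port_down(1)`, retag the frame to the backup VLAN stored for port 1 and forward it on that VLAN's egress port.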

Fast Spanning Tree Reconnection (FSTR) [

IEEE 802.1s [

In [

Viking is also a VLAN based approach [

In [

Figure

Example of VLAN based protection scheme.

Since a link failure and a node failure produce the same symptom, namely loss of signal at the Ethernet switches adjacent to the failure point, it is difficult to identify the exact failure location within a very short time. In fact, the only way to identify the exact failure type is for multiple nodes in the network to cooperate through time-consuming message exchanges, which prevents a node from achieving fast protection switching.

To resolve this difficulty, we propose a novel fast local protection scheme. In the proposed scheme, we configure backup VLANs to protect the working VLANs on each link. We require both switches adjacent to a protected link to be leaf nodes in that link’s backup VLAN tree. This requirement is called the leaf constraint in this paper. Because removing a leaf node never disconnects the rest of a tree, the backup VLAN still provides a survivable path between all remaining nodes even if the failure is a node failure.
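The leaf constraint can be checked mechanically. The following is a minimal sketch, assuming a backup VLAN tree is given as a list of undirected edges; the function names are illustrative and are not part of the proposed scheme.

```python
from collections import defaultdict, deque


def satisfies_leaf_constraint(tree_edges, protected_link):
    """True if both endpoints of the protected link have degree 1 in the tree."""
    degree = defaultdict(int)
    for u, v in tree_edges:
        degree[u] += 1
        degree[v] += 1
    a, b = protected_link
    return degree[a] == 1 and degree[b] == 1


def connected_without(tree_edges, removed_node):
    """True if the tree still connects all other nodes once removed_node fails."""
    adj = defaultdict(set)
    nodes = set()
    for u, v in tree_edges:
        nodes.update((u, v))
        if removed_node not in (u, v):   # drop edges touching the failed node
            adj[u].add(v)
            adj[v].add(u)
    nodes.discard(removed_node)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:                          # breadth-first search
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == nodes
```

With the backup tree `[("A", "C"), ("C", "D"), ("D", "B")]` protecting link `("A", "B")`, both checks pass for a failure of either `A` or `B`. A tree such as `[("A", "C"), ("A", "D"), ("D", "B")]` violates the constraint, and the failure of `A` isolates `C`.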

Figure

Example of the proposed leaf constrained VLAN protection scheme. Panels: original input topology; normal state VLAN 1 (working); backup VLAN 2 for […]; backup VLAN 3 for […].

Backup VLANs that meet the leaf constraint can protect against not only any single link failure but also any single node failure. We use the same backup VLANs in Figure

In this paper, we jointly take traffic engineering (TE) and network survivability into consideration. Our goal is to guarantee that the provisioned VLANs not only protect against any single link failure and any single node failure but also avoid traffic congestion. We refer to leaf constrained VLAN provisioning with TE consideration as the LCP-TE problem. To facilitate problem formulation, we first use a graph transformation technique to transform the considered Ethernet network into a two-layered graph. We then propose an ILP model for this problem on the transformed graph. The objective of the ILP problem is to minimize the utilization of the most congested link. The output of the problem includes working and backup VLANs that guarantee 100% survivability against any single link failure and any single node failure.

The LCP-TE problem is considered in our previous work [

The remainder of this paper is organized as follows. In Section

In this work, we formulate the multi-VLAN LCP-TE problem as an ILP problem. Our algorithm determines the working VLANs for traffic routing in the normal state and the backup VLANs for fast failover in the failure states. The leaf constraint is applied in our formulation so that both single link failures and single node failures can be handled. The objective is to minimize the utilization of the most congested link.
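To make the objective concrete, the sketch below evaluates the utilization of the most congested link for one given working tree. It is only an evaluation helper under stated assumptions, not the ILP itself: the tree is a list of undirected edges, `demands` maps a (source, destination) pair to a traffic rate, and `capacity` maps each directed link to its capacity. All names are hypothetical.

```python
from collections import defaultdict, deque


def tree_path(tree_edges, src, dst):
    """Return the unique node sequence from src to dst in a spanning tree."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {src: None}
    queue = deque([src])
    while queue:                      # breadth-first search from src
        x = queue.popleft()
        if x == dst:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = []
    node = dst
    while node is not None:           # walk back from dst to src
        path.append(node)
        node = parent[node]
    return path[::-1]


def max_link_utilization(tree_edges, demands, capacity):
    """Route every demand on the tree and return the highest load/capacity."""
    load = defaultdict(float)
    for (src, dst), rate in demands.items():
        path = tree_path(tree_edges, src, dst)
        for u, v in zip(path, path[1:]):
            load[(u, v)] += rate      # directed link u -> v
    return max((load[e] / capacity[e] for e in load), default=0.0)
```

An optimization model would search over the candidate working and backup trees for the assignment that minimizes this value; the helper only scores one candidate.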

We apply a graph transformation technique to facilitate problem formulation. The input graph is transformed into a two-layered directed graph. The top layer is used to determine the working VLAN trees for traffic transmission in the normal state, while the bottom layer is used to determine the backup VLANs. The two layers are connected by artificial bridge edges. When a failure occurs, particular bridge edges are turned on to allow traffic to move from the top layer to the bottom layer. This idea reduces the difficulty of formulating the multi-VLAN LCP-TE problem.
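The transformation can be sketched as follows. The exact bridge edge construction is not reproduced in this text, so the sketch assumes one top-to-lower bridge edge per node and assumes that only the bridges at the two endpoints of the failed link are turned on. Both choices, and all function names, are illustrative assumptions.

```python
def build_two_layer_graph(nodes, links):
    """Build the directed top layer, lower (backup) layer, and bridge edges.

    Each undirected link (u, v) becomes two directed edges in each layer.
    Assumption: one bridge edge per node, from its top copy to its lower copy.
    """
    top, lower = set(), set()
    for u, v in links:
        for a, b in ((u, v), (v, u)):
            top.add(((a, "top"), (b, "top")))
            lower.add(((a, "lower"), (b, "lower")))
    bridges = {n: ((n, "top"), (n, "lower")) for n in nodes}
    return top, lower, bridges


def edges_after_failure(top, lower, bridges, failed_link):
    """Return the usable edges after a link failure.

    The failed link is removed from both layers, and the bridge edges at
    its two endpoints are turned on (assumed activation rule).
    """
    u, v = failed_link
    failed = {
        ((u, "top"), (v, "top")), ((v, "top"), (u, "top")),
        ((u, "lower"), (v, "lower")), ((v, "lower"), (u, "lower")),
    }
    active = (top | lower) - failed
    active |= {bridges[u], bridges[v]}
    return active
```

In a triangle network with nodes `A`, `B`, `C`, a failure of link `("A", "B")` leaves the bridge at `C` off, so traffic can only drop to the lower layer at the switches that detect the failure.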

To make the notation easier to follow, we use vertices and edges to denote switches and directional links in the transformed graph. Figure

Graph transformation and failure recovery. Panels: input network; illustration of graph transformation; example for failure recovery.

The problem formulation and notations are shown below.

The objective function is to minimize the link utilization on the most congested link. Constraints (

Problem

The Phase I problem has been studied in [

The algorithm for solving the multi-VLAN LCP-TE problem is described in Algorithm

/

(1) Use algorithm [

(2) The output of Step 1 is used to derive

/

(3)

(4)

(5) solve Sub-problem

(6)

(7)

(8) output routing

Multi-VLAN LCP-TE problem subproblem

In this section, we present the numerical results obtained from computer simulations and demonstrate the protection switching time measured on an FPGA-based testbed system.

We have carried out simulations on several randomly generated networks. Each network is denoted by

We first compare the survivability ratio under the single node failure scenario. The destination based algorithm [

Simulation results. Panels: survivability ratio under single node failure; maximum link utilization in small networks; performance comparisons in larger networks; impact of multiple instances.

Because the multi-VLAN LCP-TE is an NP-complete problem, we cannot directly solve problem

Besides the relaxed problem, we also implemented the unit weight heuristic algorithm. In this algorithm, the working spanning tree is obtained using a minimum spanning tree algorithm in which each link weight is set to 1. Then we apply CPLEX to solve the model of Section

We further perform simulations to evaluate several algorithms in large networks and show the results in Figure

In the final set of simulations, we run the proposed multi-VLAN LCP-TE heuristic algorithm to evaluate the performance when multiple spanning tree instances carry working traffic. The results are shown in Figure

We further set up an experiment shown in Figure

Experimental results. Panels: experimental setup; protection switching time.

In this paper, we have proposed a novel multi-VLAN based protection scheme for fast failure recovery and congestion avoidance in Ethernet networks. By requiring the two end nodes of each link to be leaf nodes of its backup VLAN tree, our scheme is able to protect against any single link failure and any single node failure. We have introduced a graph transformation technique to facilitate the problem formulation. In the proposed optimization model, we take load balancing into consideration by minimizing the utilization of the most congested link. Since the problem is NP-complete, we further propose a heuristic algorithm to provide solutions for large networks.

We have carried out extensive simulations on several randomly generated networks. The simulation results indicate that the proposed algorithm outperforms both the unit weight and the random heuristic algorithms. Although more working VLAN trees provide more routing paths and thus improve load balancing, we have observed that the improvement saturates as the number of working trees increases. Simulation results indicate that three working trees are enough to provide load-balanced routing. Since an Ethernet network supports up to 4096 VLANs, the proposed approach does not face a scalability issue.

To evaluate the failure recovery time in a real system, we also implemented our approach in an FPGA system. Experimental results show that the protection switching time is within 3

The authors declare that there is no conflict of interests regarding the publication of this paper.

This work is supported by the Ministry of Science and Technology, Taiwan, under Grant no. 103-2218-E-194-007.