Large-scale deployments of mission-critical services have placed stringent demands on Internet routing, yet frequent network failures can dramatically degrade network performance, and the Border Gateway Protocol (BGP) cannot react quickly enough to recover from them. Although extensive research has addressed this problem, multiple-failure scenarios have never been properly handled because of the limitations of the distributed control plane. In this paper, we propose a local fast reroute approach to effectively recover from multiple link failures in one administrative domain. The principle of Software Defined Networking (SDN) is used to achieve software defined AS-level fast rerouting. Considering AS relationships, efficient algorithms are proposed to automatically and dynamically find protection paths for multiple link failures; OpenFlow forwarding rules are then installed on routers to provide data forwarding continuity. Our approach applies flexibly to ASes and adapts to multiple link failures, thereby improving network performance. Experimental results show that our proposal provides effective failure recovery and does not introduce significant control overhead to the network.
With the rapid development of communication infrastructure and technologies, the Internet has become a critical information infrastructure. More and more mission-critical services, such as e-banking, online games, voice-over-IP (VoIP), and video conferencing, are deployed on it, and they place stringent demands on network reliability and availability. It is essential for the Internet to provide reliable services even in the presence of failures. Unfortunately, measurements [
Many attempts have been made in order to effectively address the interdomain routing failures [
As the remarkable growth in network scale places ever more stringent demands on network robustness, multiple failure scenarios have recently been gaining attention. Taking advantage of cellular graph embeddings, a novel contingent forwarding approach called packet recycling (PR) is proposed by Lor et al. [
Software Defined Networking (SDN) is an emerging network architecture with separation of the control plane and data plane, which enables it to provide a good platform for innovations. As one of the most common SDN southbound interfaces, OpenFlow [
In this paper, we propose software defined AS-level fast rerouting for multiple link failures by considering AS relationships (SDFRR-ASR) in one administrative domain. SDFRR-ASR is designed to provide fast failure recovery from various types of multiple link failures which include concurrent, continuous, centralized, and distributed link failures. Although our previous work [
The remainder of the paper is organized as follows. Section
The overall structure of SDFRR-ASR is shown in Figure
System architecture.
In the architecture, Network Controller is the core unit for system control and management. It includes four key components for constructing protection tunnels. Information Collector collects all the necessary network information and saves it in the centralized database, which gives Network Controller a global view of the whole system and the ability to exercise centralized control. Once a link failure occurs, Trigger notifies the upper layer application to compute protection paths. After the application finishes its procedure, Result Receiver receives the results from the application. Command Distribution sends commands to install OpenFlow forwarding rules on the underlying routers based on the received results; it is also used to remove OpenFlow forwarding rules. Network Controller integrates the functions of the above components and provides unified APIs to the upper layer applications, so that they can be implemented through the APIs regardless of the details of the underlying network. It is important to note that the application developed in this paper is fast rerouting for multiple link failures in interdomain routing with AS relationships.
In our system, OpenFlow technology is used to build the underlying protection tunnels; the BGP routing table and the OpenFlow flow table coexist in the routers along the protection paths. In this case, the flow table takes priority and forwards the matching packets according to its rules, which enables our approach to work well upon failures. As shown in Figure
In this section, the fast reroute application for multiple link failures with AS relationships is explained in detail. Figure
The overall process of the application.
The ultimate goal of our technique is to handle multiple link failures under different failure scenarios, which include concurrent failures and continuous failures. FSR plays a fundamental role in the design of our scheme. Different failure scenarios require different solutions; thus it is crucial to first identify the failure scenario accurately.
Concurrent failures are failures in which links fail at the same time. Upon interdomain link failures, Network Controller triggers the application to discover protection paths that detour around the failed links. Unfortunately, there is an inevitable propagation delay between the routers and Network Controller; thus concurrent failures may not trigger the application at the same time, which may render the computed protection paths unusable. In this case, the fast reroute application is expected to consider all the concurrent failures before computing the protection paths. We propose delayed failure handling in order to handle concurrent failures accurately; that is, when the first failure arrives, the application should wait a configurable time
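The delayed handling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `collect_concurrent_failures`, the use of a thread-safe queue to deliver failure reports, and the representation of a failed link as a pair of AS names are all assumptions made for illustration.

```python
import queue
import time


def collect_concurrent_failures(failure_queue, wait_seconds):
    """Wait for the first failure report, then keep collecting reports
    until the configurable wait window (hypothetical parameter) closes."""
    first = failure_queue.get()  # blocks until the first failure arrives
    batch = [first]
    deadline = time.monotonic() + wait_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Later reports that arrive inside the window join the same batch.
            batch.append(failure_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

The whole batch is then handed to the path computation at once, so that no protection path is computed over a link that has already failed but whose report was still in flight.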
Under continuous failure scenarios, the key point is to determine whether the subsequent failures affect the existing protection tunnels. If an existing protection tunnel is affected, it should be deactivated. After that, the application takes all the failures, including the failed links corresponding to the deactivated tunnels, to compute the protection paths. To cope with this condition, two special types of continuous failures should be considered carefully. One is that the subsequent failure lies on an existing protection path, and the other is that it affects the normal use of an existing protection path without lying on it. This model can be expressed as follows:
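The first of the two cases, a subsequent failure lying on an existing protection path, can be sketched in Python as below. This is an illustrative sketch rather than the model used in the paper: the function name, the dictionary that maps each protected link to the links of its protection path, and the treatment of links as undirected are assumptions. The second case (a failure that affects a tunnel without lying on it) is not modeled here.

```python
def handle_subsequent_failure(new_failure, tunnels):
    """Split existing tunnels into affected and surviving ones.

    tunnels: dict mapping a protected (failed) link to the list of links
             that make up its protection path (hypothetical layout).
    Returns (links_to_recompute, surviving_tunnels).
    """
    failed = frozenset(new_failure)  # treat a link as an unordered AS pair
    links_to_recompute = [new_failure]
    surviving = {}
    for protected_link, path_links in tunnels.items():
        if any(frozenset(link) == failed for link in path_links):
            # The tunnel is broken: deactivate it and recompute its link too.
            links_to_recompute.append(protected_link)
        else:
            surviving[protected_link] = path_links
    return links_to_recompute, surviving
```

The returned list contains the new failure together with every link whose tunnel was deactivated, which matches the requirement that all of them are considered together in the next path computation.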
To effectively find optimal protection paths in the face of multiple link failures, we design the protection path computation (PPC) process, which is the key function of SDFRR-ASR. As the key factor of our approach, AS relationships are taken into account, in addition to BGP policies and decision rules, so that the protection paths found are policy-compliant and conform to interdomain routing. The solutions to the main types of AS relationships are first discussed in Section
AS relationships are an important aspect of Internet structure. They have great impact on the interdomain routing [
Provider AS has all reachable routes.
For example, consider in Figure
Provider AS has some reachable routes.
As the key part of our approach, the protection path computation (PPC) aims to effectively find an optimal protection path that detours around multiple link failures. The highlight of this process is that AS relationships are taken into account in order to meet special requirements. In addition, BGP policies and simplified decision rules are used to help find policy-compliant protection paths, so that the protection paths conform to interdomain routing.
Figure
The protection path computation.
For the failure information received from FSR, ASPC selects the failed link to be handled at this time by considering the sequence of link failures.
With regard to the failed link, ASPC needs to check whether any prefixes are affected by the failed link. If no prefix is affected, there is no need to compute the protection path for it; otherwise, go to Step
Considering the AS relationships, ASPC must verify the relationship of the failed link before computing the protection path for it. It checks whether the failed link is a customer-to-provider edge, which is expressed as follows:
where Src_AS is the AS which contains one endpoint of the failed link and Dest_AS is the AS which contains the other endpoint of the failed link.
If not, the protection path should be computed according to the procedure of provider-to-customer relationship or peer-to-peer relationship; then go to Step
If the failed link is a customer-to-provider edge, ASPC checks whether the customer AS has any other available provider AS, which is described as follows:
If not, it means that the customer AS has no available provider AS to use and it should consider other links; then go to Step
Utilizing the BGP routing table, the process checks whether the available provider AS has reachable routes for all the affected prefixes. If it does, the specific router in the provider AS is the endpoint of the protection path, which will be saved in the database; otherwise, go to Step
ASPC further checks whether any failed links remain unhandled. If none remain, PPC sends the obtained protection paths to Network Controller, and the process ends; otherwise, ASPC selects the next failed link for protection path computation and goes back to Step
If there is no provider AS that has reachable routes for all the affected prefixes, ASPC continues to check whether any affected prefixes have reachable routes according to BGP routing tables. If the specific router exists, it is the endpoint of the protection path for these prefixes and will be saved in the database. As to the remaining prefixes, go to Step
ASPC checks whether any EBGP links are available based on the AS information saved in the database. If no EBGP links are available, PPC sends the protection path saved in the database to Network Controller, and the process ends; otherwise, go to Step
Using the information of ASes saved in the database, ASPC constructs a 2-dimensional matrix
The element
According to the failed links, ASPC deletes the connection relationship of ASes attached to them in the matrix.
Based on the information of failed links and the obtained matrix, ASPC uses the depth-first algorithm to find
The set of
The flow chart of the original ASPC is given in Figure
The flow chart of original ASPC.
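Two parts of the steps above can be sketched in Python: choosing another provider AS that reaches the affected prefixes, and searching the AS connectivity matrix depth-first once the failed links are removed. This is a simplified illustration, not the ASPC algorithm itself. All function and parameter names are hypothetical, endpoints are modeled at AS granularity rather than as specific routers, a matrix entry of 1 is assumed to mean "directly connected", and no policy filtering of the resulting paths is applied.

```python
def select_provider_endpoints(customer, failed_provider, providers_of,
                              routes_of, affected_prefixes):
    """Return (assignment, remaining): prefix -> chosen provider AS, and
    the set of prefixes that no other provider can reach."""
    candidates = [p for p in providers_of.get(customer, [])
                  if p != failed_provider]
    affected = set(affected_prefixes)
    # Prefer one provider that has routes for every affected prefix.
    for provider in candidates:
        if affected <= routes_of.get(provider, set()):
            return {prefix: provider for prefix in affected}, set()
    # Otherwise cover whatever prefixes each provider can reach.
    assignment = {}
    for provider in candidates:
        for prefix in affected & routes_of.get(provider, set()):
            assignment.setdefault(prefix, provider)
    return assignment, affected - set(assignment)


def find_as_paths(as_list, links, failed_links, src, dst):
    """Enumerate loop-free AS-level paths from src to dst by depth-first
    search over a connectivity matrix with the failed links deleted."""
    index = {asn: i for i, asn in enumerate(as_list)}
    n = len(as_list)
    matrix = [[0] * n for _ in range(n)]
    for a, b in links:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 1
    for a, b in failed_links:  # delete the connection of the failed links
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 0
    paths = []

    def dfs(node, path):
        if node == index[dst]:
            paths.append([as_list[i] for i in path])
            return
        for nxt in range(n):
            if matrix[node][nxt] and nxt not in path:
                dfs(nxt, path + [nxt])

    dfs(index[src], [index[src]])
    return paths
```

Prefixes left in `remaining` correspond to the case where no provider AS has a reachable route, which is where the matrix-based search over other EBGP links would take over.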
It should be noted that the design of the optimal path selection guarantees that our approach achieves loop-free routing. There are two situations to consider. One is that the optimal protection path does exist; that is, OPS finds an earlier special router that can shorten the protection tunnel, so the affected packets are forwarded along the shortened protection tunnel. After that, they are forwarded to the destination according to the BGP routing tables. The other is that the earlier special router does not exist; that is, the protection tunnel covers the entire protection path. As the protection path is a sequence of ASes between the AS associated with the failed links and the AS owning the affected prefix, the affected packets can be forwarded to their destinations along the entire AS-level protection path. In both situations, the affected packets are forwarded to their destinations along a complete path, which achieves loop-free routing.
To accurately deactivate the protection tunnels under multiple link failure scenarios, the application must consider the special condition that the constructed protection tunnels, or parts of them, may be shared by multiple link failures. In this case, it must make sure that a protection tunnel is no longer in use before deactivating it. To this end, we keep a counter that records how many times a protection tunnel is shared. The protection tunnel can be removed only when the value of the counter is equal to 0. The model can be expressed as follows:
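The sharing counter can be illustrated with a minimal reference-counting sketch in Python. The class name `TunnelRegistry`, its methods, and the use of an opaque tunnel identifier are hypothetical choices for illustration and are not taken from the paper.

```python
class TunnelRegistry:
    """Track how many link failures currently share each protection tunnel."""

    def __init__(self):
        self._share_count = {}

    def activate(self, tunnel_id):
        """Register one more failure that uses the tunnel."""
        self._share_count[tunnel_id] = self._share_count.get(tunnel_id, 0) + 1

    def release(self, tunnel_id):
        """Drop one use of the tunnel; return True only when the count
        reaches 0, meaning the tunnel may now be removed."""
        count = self._share_count[tunnel_id] - 1
        if count == 0:
            del self._share_count[tunnel_id]
            return True
        self._share_count[tunnel_id] = count
        return False
```

With this scheme, the command to remove the OpenFlow rules of a tunnel is issued only when `release` returns `True`, so a tunnel still used by another failure is never torn down.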
In this section, we evaluate the performance of SDFRR-ASR on the real-life topology as shown in Figure
Experimental topology.
Several metrics are used in our experiments to evaluate the performance of SDFRR-ASR, and the details are described as follows.
For the continuous failure scenario, the failure recovery time is the sum of the recovery times of all the failures. The formula can be represented as follows:
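In illustrative notation, which is an assumption here and not necessarily the symbols of the original formula, let $n$ be the number of continuous failures and $T_i$ the recovery time of the $i$-th failure. The sum described in the text then reads:

```latex
T_{\text{continuous}} = \sum_{i=1}^{n} T_i
```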
When Network Controller sends commands to install OpenFlow flow entries, we assume that the hop from the Network Controller to the router on the protection path is
When Network Controller sends commands to remove OpenFlow flow entries, we assume that the hop from the Network Controller to the router is
All in all, the total control overhead is the sum of
As we are interested in concurrent failure scenarios and continuous failure scenarios, two sets of experiments with AS relationships were conducted to evaluate the performance of our approach. Due to space limitations, results are shown for up to four failures only, but it is important to note that our approach is not limited to four link failures.
We first conduct experiments to compare the failure recovery time of SDFRR-ASR with that of standard BGP, for which the MRAI timer is set to 30 seconds for eBGP sessions and to 5 seconds for iBGP sessions. The failure recovery time in SDFRR-ASR is described in Section
Figures
Failure recovery time under concurrent failures. Panels: two, three, and four concurrent failures.
Failure recovery time under continuous failures. Panels: two, three, and four continuous failures.
To compare the packet loss rate of SDFRR-ASR with that of standard BGP, experiments were conducted under different failure scenarios. The size of the packets sent by Spirent TestCenter is 128 bytes.
Figures
Packet loss rate under concurrent failures. Panels: two, three, and four concurrent failures.
Packet loss rate under continuous failures. Panels: two, three, and four continuous failures.
We investigate the control overhead incurred by our approach upon concurrent failure scenarios and continuous failure scenarios. The results are presented in Tables
Control overhead incurred by concurrent failures.
| Failure scenario | Test ID | Establish tunnel (KB) | Remove tunnel (KB) | Total (KB) | Average (KB) |
|---|---|---|---|---|---|
| Two concurrent failures | 1 | 0.9140 | 0.8164 | 1.7304 | 1.7255 |
| | 2 | 0.9101 | 0.8125 | 1.7226 | |
| | 3 | 0.9121 | 0.8144 | 1.7265 | |
| | 4 | 0.9101 | 0.8125 | 1.7226 | |
| Three concurrent failures | 1 | 1.3691 | 1.2226 | 2.5917 | 2.5887 |
| | 2 | 1.3671 | 1.2207 | 2.5878 | |
| | 3 | 1.3691 | 1.2226 | 2.5917 | |
| | 4 | 1.3652 | 1.2187 | 2.5839 | |
| Four concurrent failures | 1 | 1.8222 | 1.6269 | 3.4491 | 3.4452 |
| | 2 | 1.8203 | 1.6250 | 3.4453 | |
| | 3 | 1.8183 | 1.6230 | 3.4413 | |
| | 4 | 1.8203 | 1.6250 | 3.4453 | |
Control overhead incurred by continuous failures.
| Failure scenario | Test ID | Establish tunnel (KB) | Remove tunnel (KB) | Total (KB) | Average (KB) |
|---|---|---|---|---|---|
| Two continuous failures | 1 | 0.9121 | 0.8144 | 1.7265 | 2.1532 |
| | 2 | 0.9101 | 0.8125 | 1.7226 | |
| | 3 | 1.3652 | 1.2167 | 2.5819 | |
| | 4 | 1.3632 | 1.2187 | 2.5819 | |
| Three continuous failures | 1 | 1.3671 | 1.2207 | 2.5878 | 3.0140 |
| | 2 | 1.3652 | 1.2187 | 2.5839 | |
| | 3 | 1.8183 | 1.6230 | 3.4413 | |
| | 4 | 1.8183 | 1.6250 | 3.4433 | |
| Four continuous failures | 1 | 1.8203 | 1.6250 | 3.4453 | 3.8797 |
| | 2 | 1.8417 | 1.6269 | 3.4686 | |
| | 3 | 2.2753 | 2.0312 | 4.3065 | |
| | 4 | 2.2714 | 2.0273 | 4.2987 | |
Table
Interdomain links play a critical role in the global Internet routing system, but their failures are common on the Internet, and BGP routers fail to react quickly enough to recover from them. In addition, AS relationships have an impact on interdomain routing. In this paper, we present software defined AS-level fast rerouting to handle multiple link failures in one administrative domain by considering AS relationships.
In the architecture of our scheme, the control plane is separated from the data plane, as in SDN. Our approach aims to provide AS-level fast rerouting with AS relationships under multiple link failures. Taking different failure scenarios into account, we design efficient algorithms to automatically and dynamically find policy-compliant protection paths by considering AS relationships, in addition to routing policies and BGP decision rules; OpenFlow forwarding rules are then installed on the routers along the protection paths to provide fast packet rerouting. Experimental results and analysis demonstrate the feasibility and effectiveness of the approach. Our future research will address how to establish protection tunnels when not all the ASes are SDN-capable. We will also consider the case in which each AS belongs to a separate ISP.
The authors declare that there is no conflict of interest regarding the publication of this paper.
The research presented in this paper is supported by the National Natural Science Foundation of China, Distinguished Young Scholar (61425012), and Fundamental Research Funds for the Central Universities (2014PTB-00-02).