Reaching Consensus with Byzantine Faulty Controllers in Software-Defined Networks

Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung City 202301, Taiwan
Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen 5063, Norway
Department of Mathematics & Computer Science, Brandon University, Brandon, Canada
Research Centre for Interneural Computing, China Medical University, Taichung, Taiwan
Hon Lin Technology Company Limited, Taipei 11492, Taiwan


Introduction
With the rapid development of cloud computing and the Internet of Things (IoT), the scale of the Internet has grown at a very rapid rate [1][2][3][4][5]. As a result, the current Internet Protocol (IP) network architecture is gradually becoming unable to forward such a large amount of network traffic. In the current IP network infrastructure, packets are forwarded through routers. The routing tables of these routers are constructed by traditional routing protocols, which are preconfigured and embedded in the hardware by the vendor and run on a specific model. Network administrators do not have control over packet forwarding, which results in poor utilization of network bandwidth [6]. Take multitenancy technology in data centers as an example: this cloud service is very convenient for users, but it is a heavy burden for the traditional IP network infrastructure [7]. The reason is that the routes for forwarding packets are calculated by dynamic routing protocols, and it is hard for the network administrator to identify which route the packets of a specific application have taken. Moreover, when customized adjustments of the network configuration are needed, the network administrator has to log in to each router or switch and manually change the settings via the Command-Line Interface (CLI). This method has a major drawback: configuring routers manually one by one carries a high risk of mistyping or giving inconsistent commands, which may cause the entire network to fail [8].
1.1. Software-Defined Networks (SDN). To address the poor utilization of network bandwidth in the classical IP network infrastructure, the software-defined networking (SDN) architecture was proposed. SDN divides a network into a control plane and a data plane. The controller in the control plane is responsible for controlling the network, while the switches in the data plane are responsible for packet forwarding. The controller communicates with the switches in the data plane via the southbound interface. Of the two SDN interfaces, the southbound interface is the better standardized; at present, OpenFlow is the most influential southbound interface [9]. Applications, on the other hand, communicate with the controller via the northbound interface, which has not yet been standardized. Common northbound interfaces include REST [10] and SNMP [11].
Simply put, SDN is a concept of networking architecture. It does not specify the use of any particular technology. The core ideas are to separate control from forwarding and to use software programs to control and flexibly manage the network [12]. OpenFlow is a concrete protocol that can be implemented as the southbound interface in SDN. SDN has the following characteristics. Applications can flexibly control network traffic, optimize load balancing according to user needs, and make better use of bandwidth. Moreover, SDN can save time and energy in network management according to user needs. Finally, SDN combined with virtualization technology can be used to run different services in a data center, thus sharing network equipment and reducing equipment investment.
Nowadays, the Open Networking Foundation (ONF) is the most influential organization for SDN. It was founded in 2011 by companies including Google, Facebook, and Microsoft [13]. Unlike other standards organizations, which were formed by manufacturers of network equipment, ONF was created by "users" of network equipment. The mission of ONF is to promote the OpenFlow protocol as the only standard for the southbound interface [14]. Established in 2013, OpenDaylight (ODL) is another well-known organization for SDN. It was initiated by 18 well-known IT companies [15]. The objective of ODL is to create an open SDN platform. As shown in Figure 1, this platform comprises three major layers. The top layer consists of the northbound interface and built-in applications and services. The middle layer is formed by network services and platform services. The bottom layer is the southbound interface. The northbound interface supports REST APIs. The southbound interface supports OpenFlow and many other protocols, such as SNMP, LISP, XMPP, PCEP, OF-Config, NETCONF, and BGP-LS, and even private interfaces defined by vendors [16].
However, attacks and threats are prevalent in today's networking environment [17][18][19]. A number of scholars have therefore investigated the security mechanisms of SDN in recent years. The security issues that have been discussed in the SDN literature include access control [20], authentication [21], and nonrepudiation [22]. For access control, Nayak et al. [20] constructed a system called Resonance in an OpenFlow-based network. This system enforces dynamic access control based on flow-level information and real-time alerts. For authentication, Porras et al. [21] designed a software extension for the NOX OpenFlow controller, called Fort-NOX, that supports role-based authorization as well as the detection and negotiation of conflicting flow-entries. For nonrepudiation, Yao et al. [22] proposed a source address validation mechanism, called the Virtual Source Address Validation Edge (VAVE), for the OpenFlow/NOX network architecture. VAVE has three major features: it supports on-demand filtering, reacts rapidly to route changes, and avoids unnecessary checks to reduce the load on the router.
Network Function Virtualization (NFV), in turn, uses software to implement the network functions of physical devices such as firewalls, routers, load balancers, and customer premises equipment (CPE). NFV breaks with the idea that network functions must reside on a particular hardware device. Once a network function is virtualized, it can be configured elastically, which speeds up the deployment of network services and reduces the cost of buying dedicated hardware. Because SDN and NFV complement each other, many personalized services can be introduced easily.

1.2. Distributed Control Architecture. The SDN controller is responsible for managing network resources and for planning flow-entries for the bottom-layer switches according to the demands of upper-layer applications. Figure 2 illustrates a centralized control architecture and a distributed control architecture. The earlier SDN architecture is based on centralized control, that is, the entire network is managed by only one controller [23]. However, a single controller is a single point of failure (SPOF). Moreover, scalability becomes an important issue if the network is huge, and it is then hard to ensure the stability and efficiency of the network. Therefore, later studies proposed using multiple controllers to deal with the control tasks collaboratively [24].
For example, Kyung et al. [25] proposed the Load Distribution (LD) algorithm to avoid overloading the controllers. The basic concept of the LD algorithm is that when the load of the default controller reaches a threshold, the tasks it has received are transferred to other controllers. Wang et al. [26] suggested segmenting a large network into several domains and using multiple controllers, one per domain, to reduce the load on each controller. Wang et al. pointed out that finding the optimal number of controllers for a multidomain SDN environment is an NP-hard problem. For this problem, they proposed an approximation algorithm called the Greedy Subgraph Cover Problem (GSGCP), which is capable of placing a smaller number of controllers to manage a multidomain SDN. In addition, Yannan et al. [27] suggested that considering the expected percentage of control path loss when placing controllers can effectively increase the reliability of SDN. They call this the Reliability-aware Controller Placement (RCP) problem and show that it is also NP-hard.
1.3. Fault-Tolerant Mechanism. As mentioned above, a distributed control structure involving the collaboration of multiple controllers can effectively enhance the performance and stability of SDNs. However, attacks and threats are prevalent in today's networking environment, and SDN controllers may not work normally when they are attacked by hackers or infected by viruses. Generally, the faults of SDN controllers can be classified into crash faults [28], omission faults [29], and Byzantine faults [30][31][32]. A crash fault means that the controller stops functioning. An omission fault occurs when messages are fully or partially omitted by the controller. Unlike a crash fault or an omission fault, a Byzantine fault may cause a controller to send wrong messages or to conspire with other faulty devices to interfere with the operation of nonfaulty controllers, so that incorrect computing results are delivered. Examples of such malicious behaviour include sending an incorrect flow-entry to a switch to prevent packets from being delivered to the correct destination, and sending a specific flow-entry to a switch so that the attacker receives a copy of all the packets delivered to or sent from a particular host (i.e., eavesdropping on packets).
To provide a reliable SDN with a multiple-controller architecture, we need a mechanism that ensures the correctness of the computing results even if a controller is faulty or under attack. That is, we have to design a protocol that takes advantage of the distributed system so that controllers work together to resist attacks from Byzantine controllers. The system will then still deliver correct computing results even in the presence of faulty controllers. In a distributed system, fault tolerance can be established by solving the consensus problem. Common applications of the consensus protocol include leader election in P2P networks [33], coordination of replicated file storage [34], cruise control in car platoons [35], and clock synchronization [36,37]. The requirements of the consensus protocol are listed in Table 1.

Table 1: The requirements of the consensus protocol.
Consensus: Each correct processor should compute a common value.
Validity: If the initial value of every processor is $v_i$, then each correct processor should obtain the value $v_i$.

Wireless Communications and Mobile Computing
The requirements in Table 1 follow [38]. The main contributions of this paper are summarized as follows: (1) We design a consensus protocol for SDNs with a multiple-controller architecture. (2) The proposed consensus protocol tolerates the most damaging type of fault (i.e., the Byzantine fault). (3) By solving the consensus problem in SDNs with a multiple-controller architecture, we can create a highly reliable networking environment.
The remainder of this paper is organized as follows. "Preliminaries" presents the assumptions and the problem definition. "Concept and Approach" describes the protocol for solving the consensus problem. "An Execution Example" shows an example of solving the consensus problem. Finally, conclusions are drawn in "Conclusion."

Preliminaries
In this study, we make the following environmental assumptions. The underlying SDN is synchronous, and a faulty controller may exhibit either a Byzantine fault or a dormant fault. In addition, the number of faulty controllers must be reasonably limited so that the system can work properly; the number of faulty controllers that the system can tolerate depends on the number of controllers in the network.
In this paper, a consensus protocol is designed for SDNs. The proposed protocol takes advantage of the distributed system so that controllers work together to resist attacks from Byzantine controllers. That is, all nonfaulty controllers in the network are able to compute an identical consensus value by using the proposed protocol. The problem is stated by the following equations. Equation (1) is the objective, where $C(n_i)$ is the consensus value computed by controller $n_i$. Equation (2) restricts each controller's initial value to the domain $\{0, 1\}$. Equation (3) constrains the number of faulty controllers allowed in the SDN (how this limit is derived is explained in "Concept and Approach").

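$$C(n_i) = C(n_j), \quad \forall \text{ non-faulty controllers } n_i, n_j \in N \text{ and } i \neq j, \tag{1}$$

subject to

$$v_i \in \{0, 1\}, \quad \forall n_i \in N, \tag{2}$$

$$n > \lfloor (n-1)/3 \rfloor + 2 f_b + f_d. \tag{3}$$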

Concept and Approach
Here, we present the concept and approach of the Consensus Protocol for SDN ($CP_{SDN}$). First, each controller selects its initial value from the domain $D = \{0, 1\}$. Next, each controller exchanges its initial value with all the other controllers. After the message exchange finishes, every controller computes its own consensus value from the collected messages. In other words, $CP_{SDN}$ consists of exactly two phases, the Msg_Exchanging phase and the Cons_Making phase, which are described below.
In the Msg_Exchanging phase, every controller first sends its initial value to every other controller in the SDN. The controllers then rely on adequate collaboration (i.e., exchanging the initial values collected from other controllers) to overcome the faults and attacks of a few controllers. Hence, the first task is to compute the number of rounds of message exchange required. According to Siu et al. [39], if messages are unauthenticated in a network with $m$ Byzantine processors and $b$ dormant processors, there must be at least $\lfloor (n-1)/3 \rfloor + 2m + b + 1$ processors, and the minimum number of rounds of message exchange is $\lfloor (n-1)/3 \rfloor + 1$. In the proposed protocol, only controllers are involved. Therefore, the constraint of our system model is $n > \lfloor (n-1)/3 \rfloor + 2f_b + f_d$, and the number of rounds of message exchange is at most $f_b + 1$, where $n = |N|$, $f_b = \lfloor (n-1)/3 \rfloor$ is the maximum number of Byzantine controllers, and $f_d$ is the number of dormant controllers.
Next, we explain how a controller stores the collected messages. As mentioned above, the number of rounds of message exchange is $f_b + 1$. In every round, each controller transmits the messages it collected in the previous round to all the other controllers in the network. This means that the number of messages grows exponentially with the number of rounds.
Therefore, using an appropriate data structure to store the collected messages is very important. Bar-Noy et al. [40] pointed out that a tree structure is well suited to storing data collected by this exchange method. In this paper, we also store the messages collected during the message exchange in a tree structure, called the SDN-tree. Figure 3 shows an example of a 2-level SDN-tree. The SDN-tree is organized and labelled as follows. Each vertex of an SDN-tree is labelled with a nonrepeating sequence $\alpha$ of controller names. To limit the influence of Byzantine controllers, no vertex whose label repeats a controller name is kept in an SDN-tree. The root of the SDN-tree is labelled "root", the parent of the vertex labelled $\alpha n_i$ (i.e., $\alpha$ concatenated with $n_i$, where $n_i$ is a single controller name) is the vertex labelled $\alpha$, and every vertex stores a value. In Figure 3, for example, the parent of the vertex $n_2 n_1$ (i.e., $n_2$ concatenated with $n_1$) is the vertex $n_2$, and the value stored in the vertex $n_2 n_1$ is "1."

Algorithm 1: The function $vote_{SDN}$, executed by each controller $n_i \in N$. Here $val(\sigma)$ is the value of vertex $\sigma$, and the function $maj_{child}(\sigma)$ finds the majority value among the child nodes of vertex $\sigma$.

Checksums and the time-out mechanism can detect messages contaminated by dormant controllers. In the proposed protocol, such contaminated messages are therefore marked with "$\Omega_0$." The value "$\Omega_0$" is used only to mark contaminated messages from dormant controllers. In the subsequent rounds of message exchange, all messages collected in the previous round are exchanged. If a received message is $\Omega_j$, then $\Omega_{j+1}$ is stored instead of $\Omega_j$ to indicate that the contaminated value came from a previous round. The purpose of the +1 is to record the number of rounds of message exchange that have passed. This simple procedure lets a controller determine whether a contaminated value originates from the controller that sent the current message or from a controller earlier in the exchange.
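The message exchange and the $\Omega$ marking described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `omega`, `relay`, and `exchange` are hypothetical, the SDN-tree is modelled as a dictionary keyed by tuples of controller names, dormant controllers are assumed to be detected in every round, and Byzantine behaviour is not modelled at all.

```python
def omega(j):
    """Marker for a contaminated value that has been relayed j times."""
    return f"Ω{j}"


def relay(value):
    """Value a nonfaulty controller forwards: Ω_j becomes Ω_(j+1)."""
    if isinstance(value, str) and value.startswith("Ω"):
        return omega(int(value[1:]) + 1)
    return value


def exchange(initial, dormant=frozenset(), rounds=2):
    """Return the SDN-tree of every non-dormant controller.

    Assumption: a tree is a dict mapping a nonrepeating tuple of
    controller names (the vertex label) to the value stored there.
    """
    names = sorted(initial)
    alive = [c for c in names if c not in dormant]
    trees = {c: {} for c in alive}
    # Level 0: each controller knows only its own initial value.
    level = {c: {(): initial[c]} for c in alive}
    for _ in range(rounds):
        new_level = {c: {} for c in alive}
        for receiver in alive:
            for sender in names:
                for path in level[receiver]:
                    if sender in path:
                        continue  # labels never repeat a controller name
                    if sender in dormant:
                        value = omega(0)  # caught by checksum or time-out
                    else:
                        value = relay(level[sender][path])
                    new_level[receiver][path + (sender,)] = value
        for c in alive:
            trees[c].update(new_level[c])
        level = new_level
    return trees


# Illustrative initial values only; n5 is dormant, all others act honestly.
trees = exchange({"n1": 0, "n2": 1, "n3": 1, "n4": 0, "n5": 1}, dormant={"n5"})
print(trees["n1"][("n5",)], trees["n1"][("n5", "n2")])
```

Keying the dictionary by the full label keeps the parent relation implicit: the parent of a vertex is simply its label with the last controller name removed.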

Cons_Making Phase.
Here, we describe the Cons_Making phase, in which the consensus value is computed. The function $vote_{SDN}$ is applied to the root of the SDN-tree, and the computation proceeds from the leaves up to the root. The function $vote_{SDN}$ contains five conditions. Condition 1: if vertex $\alpha$ is a leaf, there is only one value in vertex $\alpha$, so the majority value is the value of vertex $\alpha$. Condition 2 removes the influence of Byzantine controllers. Condition 3 removes the influence of dormant controllers. Condition 4 applies when a majority value exists, in which case the majority value is the output. Condition 5: if no majority value exists, the default value $\bot$ is output, where $\bot \notin V$. Conditions 1, 4, and 5 are similar to the conventional majority vote [32]. The function $vote_{SDN}$ is given in Algorithm 1.
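A bottom-up vote of this kind might look as follows in Python. This is a hedged sketch rather than Algorithm 1 itself: the name `vote_sdn` and the dictionary representation of the tree are hypothetical, "majority" is assumed to mean more than half of the child votes, Condition 3 is assumed to turn a majority of $\Omega_j$ into $\Omega_{j-1}$, and Condition 2 is omitted because the text does not spell out its rule.

```python
from collections import Counter

BOTTOM = "⊥"  # default value, outside the value domain


def vote_sdn(tree, sigma=()):
    """Bottom-up vote on the subtree rooted at vertex `sigma`.

    Assumption: `tree` maps nonrepeating tuples of controller names to
    stored values, and the empty tuple stands for the root.
    """
    depth = len(sigma) + 1
    children = [p for p in tree if len(p) == depth and p[:-1] == sigma]
    if not children:
        return tree[sigma]  # Condition 1: a leaf holds a single value
    votes = [vote_sdn(tree, child) for child in children]
    value, count = Counter(votes).most_common(1)[0]
    if 2 * count <= len(votes):
        return BOTTOM  # Condition 5: no strict majority exists
    if isinstance(value, str) and value.startswith("Ω"):
        # Condition 3 (assumed form): undo one relay round of marking.
        return f"Ω{max(int(value[1:]) - 1, 0)}"
    return value  # Condition 4: output the majority value


# Hypothetical subtree for a dormant controller n5.
subtree = {
    ("n5",): "Ω0",
    ("n5", "n1"): "Ω1",
    ("n5", "n2"): "Ω1",
    ("n5", "n3"): 1,
    ("n5", "n4"): "Ω1",
}
print(vote_sdn(subtree, ("n5",)))
```

With the hypothetical subtree above, three of the four child votes are $\Omega_1$, so the call returns $\Omega_0$ for the vertex $n_5$.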

An Execution Example
The flowchart of the proposed $CP_{SDN}$ is shown in Figure 4. This section presents an example that demonstrates how $CP_{SDN}$ helps controllers reach consensus.

Figure 4: Flowchart of the proposed $CP_{SDN}$. Msg_Exchanging phase: each controller sends its initial value or the received messages to all controllers, and each controller stores the other controllers' messages in its SDN-tree. Cons_Making phase: each controller computes the consensus value by using the function $vote_{SDN}$.

We assume that the network consists of five controllers, $N = \{n_1, n_2, n_3, n_4, n_5\}$. Among these controllers, controller $n_3$ is a Byzantine controller, and controller $n_5$ is a dormant controller. Table 2 shows the initial value of each nonfaulty controller.
In the first round of the Msg_Exchanging phase, each controller sends its initial value to all the other controllers in the network. When a nonfaulty controller receives the initial values from the other controllers, it stores them in level 1 of its SDN-tree. Controller $n_5$ is a dormant controller, so it is detected by the error-checking codes and the time-out mechanism (as shown in Figure 5(a)). Controller $n_3$, on the other hand, is a Byzantine controller and may send inconsistent initial values to the other controllers in the network. As shown in Figure 5(a), some controllers receive 0 as the initial value of controller $n_3$, and some receive 1.
Next, the controllers enter the second round of the Msg_Exchanging phase. In the second round, the controllers exchange the messages collected in the first round with each other. Byzantine controller $n_3$ is likely to continue interfering (i.e., to send arbitrary values to nonfaulty controllers). On the other hand, when forwarding a value received from a dormant controller, nonfaulty controllers mark the value with $\Omega_i$. The SDN-trees of nonfaulty controllers $n_1$, $n_2$, and $n_4$ after the second round of the Msg_Exchanging phase are shown in Figure 5(b).
In this example, the number of rounds required in the Msg_Exchanging phase is 2 ($f_b + 1 = \lfloor (n-1)/3 \rfloor + 1 = 2$). Hence, after two rounds of message exchange, each nonfaulty controller enters the Cons_Making phase and uses the function $vote_{SDN}$ to compute the consensus value. In this example, the consensus value computed by the nonfaulty controllers is "$\bot$" ($vote_{SDN}(root) = vote_{SDN}(0, 1, 1, 0, \Omega_0) = \bot$). The process of computing the consensus value with the function $vote_{SDN}$ is shown in Table 3.

Table 3 (step (v)): $vote_{SDN}(n_5) = vote_{SDN}(n_5 n_1, n_5 n_2, n_5 n_3, n_5 n_4) = vote_{SDN}(\Omega_1, \Omega_1, 1, \Omega_1) = \Omega_0$.
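The arithmetic behind the round count and the fault bound for this five-controller example can be checked with a few lines of Python. The function names `max_byzantine`, `rounds_needed`, and `within_bound` are hypothetical helpers introduced only for this sketch; the formulas are the ones stated in the text.

```python
def max_byzantine(n):
    """f_b = floor((n - 1) / 3) for a network of n controllers."""
    return (n - 1) // 3


def rounds_needed(n):
    """Rounds of message exchange: f_b + 1."""
    return max_byzantine(n) + 1


def within_bound(n, f_b, f_d):
    """Check the constraint n > floor((n - 1) / 3) + 2 * f_b + f_d."""
    return n > (n - 1) // 3 + 2 * f_b + f_d


# Five controllers with one Byzantine (n3) and one dormant (n5) controller.
print(rounds_needed(5), within_bound(5, f_b=1, f_d=1))
```

For $n = 5$ the bound evaluates to $5 > 1 + 2 + 1 = 4$, so one Byzantine and one dormant controller are within the tolerated limit, whereas a second dormant controller would violate it.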

Conclusion
A distributed control structure involving the collaboration of multiple controllers can effectively enhance the performance and stability of SDNs. However, attacks and threats are prevalent in today's networking environment, and SDN controllers may not work normally when they are attacked by hackers or infected by viruses. To provide a reliable SDN with a distributed control architecture, we need a mechanism that ensures the correctness of the computing results even if a controller is faulty or under attack. Hence, we discussed the fault-tolerant consensus problem in SDNs with a distributed control architecture. With the proposed $CP_{SDN}$ protocol, if the number of controllers $n$ is greater than $\lfloor (n-1)/3 \rfloor + 2f_b + f_d$, all nonfaulty controllers reach a common consensus value after $f_b + 1$ rounds of message exchange. For some applications that require very high reliability, reaching a consensus is not enough. We must therefore consider a related problem, called the fault diagnosis problem. This will be the direction of our future research on SDNs.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.