FAS: Using FPGA to Accelerate and Secure SDN Software Switches

Software-Defined Networking (SDN) promises more flexible and manageable networks but requires a certain level of programmability in the data plane to accommodate different forwarding abstractions. SDN software switches running on commodity multicore platforms are programmable and have a low deployment cost. However, the performance of SDN software switches is not satisfactory due to the complex forwarding operations on packets. Moreover, this may hinder the performance of real-time security on software switches. In this paper, we analyze the forwarding procedure and identify the performance bottleneck of SDN software switches. An FPGA-based mechanism for accelerating and securing SDN switches, named FAS (FPGA-Accelerated SDN software switch), is proposed to take advantage of the reconfigurability and high performance of FPGA. FAS improves the performance of SDN software switches, as well as their capacity against malicious traffic attacks, by offloading some functional modules. We validate FAS on an FPGA-based network processing platform. Experiment results demonstrate that the forwarding rate of FAS can be 44% higher than that of the original SDN software switch. In addition, FAS provides a new opportunity to enhance the security of SDN software switches by allowing the deployment of bump-in-the-wire security modules (such as packet detectors and filters) in FPGA.


Introduction
Software-Defined Networking (SDN) is a transformative networking design that simplifies network management and improves network programmability [1]. Its basic attributes include the separation of control and data planes, logically centralized control of networks, and a flexible and open interface to program the underlying network infrastructure. In the SDN architecture, the southbound interface is responsible for exchanging network states between the control and data planes. Furthermore, it defines the forwarding abstraction of the SDN data plane.
As SDN offers hope for rapid prototyping and deployment, the data plane must provide mechanisms to deploy new network protocols, header formats, and functions while still forwarding traffic as fast as possible. OpenFlow, a vendor-independent interface to switching elements, is the most widely used standard SDN southbound interface [2,3]. Most SDN switches are thus OpenFlow SDN switches.
There are mainly two ways to implement the SDN data plane: hardware and software. Some SDN hardware switches, developed by vendors such as HP, NEC, and Arista, use customized ASIC switching chips. Although the ASIC approach achieves high forwarding performance, it can hardly be extended for new protocols and functions. Programmable hardware switches [4,5] provide flexibility at the expense of a large amount of hardware resources, such as expensive TCAMs for flow entries.
SDN software switches are the most flexible in supporting new network services, protocols, and functions [6][7][8][9]. Unlike the TCAMs in hardware SDN switches, the memory available for flow rules is abundant in software SDN switches [10]. Thus, they are the primary choice for SDN researchers in laboratories and have been widely deployed as first-hop virtual switches in data centers. However, a purely software-based approach can hardly satisfy the strict performance and line-speed security requirements of most modern networks.

Security and Communication Networks
With the continuous improvement of the capacity and computing power of FPGAs (Field Programmable Gate Arrays), we explore the benefits of FPGA for packet processing. It is noteworthy that Microsoft has built an FPGA fabric attached to each server to accelerate large-scale data center services with customized function logic [11]. Due to its high performance and flexible reconfigurability, FPGA is also a good choice for accelerating and securing SDN software switches. We propose FAS (FPGA-Accelerated SDN software switch) to enable the offloading of time-consuming software functional modules and the implementation of real-time security modules in the SDN switch processing path. The mechanism addresses the performance shortfall while retaining the flexibility of software switches. In addition, it can enhance the security of software switches by deploying bump-in-the-wire security modules in FPGA. The contributions of this paper are summarized as follows.
(1) We make a comprehensive survey of current SDN switches in both academia and industry and classify the existing implementation models of SDN switches into different categories.
(2) We analyze the bottlenecks in SDN software switches on modern commodity multicore platforms.
(3) We design the FAS mechanism to offload functions in the forwarding path of the SDN software switch, including packet buffer management, packet parsing, and some action executions for packets, to FPGA hardware.
(4) We implement a prototype of FAS on NetMagic-Pro, an FPGA-based network processing platform, and compare its performance with the original SDN software switch on a commodity multicore platform.
The remainder of the paper is organized as follows. In Section 2, we review the evolution of the OpenFlow specification and introduce current SDN switches and their implementation models. Section 3 analyzes the overhead of OpenFlow forwarding in SDN software switches and points out the bottlenecks. In Section 4, we put forward a mechanism to offload software procedures of the OpenFlow forwarding path to FPGA. Section 5 describes the detailed design of the FAS mechanism implemented on an FPGA-based network processing platform (i.e., NetMagic-Pro). In Section 6, we compare the performance of FAS and the original SDN software switch. We summarize our work in Section 7.

Background and Related Work
In this section, we first briefly introduce the evolution of the OpenFlow protocol, which challenges the design of SDN switches. Then we describe the implementation models of SDN switches in both academia and industry.

Evolvement of OpenFlow.
OpenFlow is the first and prevailing standard among all SDN southbound interfaces and is defined by the Open Networking Foundation (ONF). It is widely supported by commercial switches from vendors including HP, NEC, Arista, and Pica 8, and the list is still growing.
Since the first version (v1.0) was released in December 2009, the OpenFlow specification has been updated to version 1.5 in 2015 and has grown increasingly complicated [3]. Although many features have been added, the core concept of OpenFlow has not changed. OpenFlow switches process and forward traffic on the basis of flows instead of individual packets. In the first version, the data plane abstraction is a single flow table of flow rules that can match packets on 12 header fields (e.g., MAC addresses, IP addresses, and TCP/UDP port numbers). In version 1.5, the OpenFlow switch has a pipeline of flow tables, where each flow rule has match fields (41 fields in the packet header), instructions (e.g., drop, flood, forward, or send the packet to the controller), a set of counters (to track the number of bytes and packets), a priority (to disambiguate between rules with overlapping patterns), timeouts (expiration time of flow rules), a cookie (opaque data value chosen by the controller), and flags (to alter the way flow entries are managed). Upon receiving a packet, an OpenFlow switch identifies the highest-priority matching rule, performs the associated actions, and increments the counters.
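The matching step described above (pick the highest-priority matching rule, apply its actions, increment its counters) can be illustrated with a minimal Python sketch. The class and field names (`FlowRule`, `packet_count`, `ip_dst`, and so on) are illustrative assumptions, not identifiers from the OpenFlow specification or from any particular switch, and a linear scan stands in for whatever classification structure a real switch uses.

```python
from dataclasses import dataclass


@dataclass
class FlowRule:
    match: dict            # header field -> required value; missing fields are wildcards
    actions: list          # e.g. ["output:2"] or ["drop"]
    priority: int = 0      # disambiguates overlapping rules
    packet_count: int = 0  # per-rule counters
    byte_count: int = 0


def lookup(table, headers, length):
    """Return the actions of the highest-priority rule matching `headers`."""
    candidates = [rule for rule in table
                  if all(headers.get(f) == v for f, v in rule.match.items())]
    if not candidates:
        return None        # table miss
    best = max(candidates, key=lambda rule: rule.priority)
    best.packet_count += 1
    best.byte_count += length
    return best.actions


# Two overlapping rules: the more specific one has the higher priority.
table = [
    FlowRule({"ip_dst": "10.0.0.2"}, ["output:2"], priority=10),
    FlowRule({"ip_dst": "10.0.0.2", "tcp_dst": 22}, ["drop"], priority=100),
]
packet = {"eth_type": 0x0800, "ip_dst": "10.0.0.2", "tcp_dst": 22}
actions = lookup(table, packet, 64)
```

Here both rules match the packet, and the priority field decides that the `drop` rule wins; only that rule's counters are incremented.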
As the OpenFlow specification has evolved, OpenFlow has been extended with more capabilities and functions. As a result, OpenFlow processing keeps getting more complicated, with no sign of stopping. This poses challenges to the implementation of SDN switches.

Implementation Models of SDN Switches.
According to the OpenFlow specification, an SDN switch usually consists of three components: the OpenFlow Channel (OFCh), the OpenFlow Forwarding pipeline (OFFw), and the physical port (Port). Besides, to accelerate the multiple-tuple classification procedure in OFFw, a commonly used technique, the Flow Cache (FCa), is introduced in SDN switches [12]. The four modules are described as follows.
Port. It is the interface between the network and the switch and is responsible for receiving and sending packets.
FCa. It is the fast forwarding path for OFFw, which caches the entries recently matched in OFFw. Since network traffic has sufficient locality to provide a high cache hit rate with a relatively small cache size, FCa can increase the forwarding rate by bypassing OFFw.
OFFw. It maintains a flow table of entries, in which each entry contains a set of packet fields to match and the corresponding actions to perform (e.g., forwarding, dropping, and modifying the header).
OFCh. When a switch does not find a match in OFFw, the packet is forwarded to the controller through OFCh. After deciding how to forward the new flow, the controller sends OpenFlow messages to the required switches. OFCh then resolves those messages, generates flow rules, and installs them in the flow table. Besides, OFCh is also responsible for exchanging states between the controller and the switch.
SDN switches have been implemented on different platforms (e.g., general CPUs, fixed-function switch ASICs, reconfigurable hardware, NPUs, and FPGAs). The platforms can be divided into two categories: hardware-based switches and software-based switches. Figure 1 shows different implementation models of the existing SDN switches. In the hardware-based switches, FCa or OFFw is implemented in hardware, such as ASICs and FPGAs. For example, Naous et al. implement an OpenFlow switch on Stanford's NetFPGA platform [13]. Pongrácz et al. devise an NP-based SDN switch to enhance the programmability of the data plane [14]. Some companies, such as Pica 8, supply commodity OpenFlow switches based on ASIC switch chips. These hardware switches are based on the Flow Cache hardware offloading model (Figure 1(c)). RMT [4] and FM6000 [5] devise reconfigurable match tables in the OpenFlow pipeline, and they conform to the hardware forwarding model in Figure 1(d). SDN hardware switches can provide sufficient forwarding capability, but they are costly and inflexible.
Recent improvements in the processing power of multicore CPUs have given reason to revisit software switching. Due to their high flexibility and short development cycle, SDN software switches are receiving increased attention. Software switches commonly use commodity off-the-shelf (COTS) PCs or servers equipped with multicore CPUs and multiple Network Interface Cards (NICs), running general-purpose operating systems (such as Linux). The general-purpose OS provides a comfortable environment for research and development. The OpenFlow reference switch (maintained by ONF) [6], OFSoftSwitch (maintained by CPqD) [7], and OpenFlow Click (maintained by Stanford) [8] are typical SDN software switches that follow the software forwarding model (Figure 1(a)). Open vSwitch [9] (OVS, maintained by VMware) is an SDN software switch belonging to the user space-kernel cooperation model shown in Figure 1(b). Table 1 gives detailed information on the abovementioned SDN switching platforms.

Problem Description and Analysis
3.1. OpenFlow Forwarding in SDN Software Switches. As OVS is the most advanced and widely used SDN software switch, we focus on the user space-kernel cooperation (UKC) model shown in Figure 1(b). Figure 2(a) illustrates the forwarding process of the UKC model, and Figure 2(b) shows its timeline. For simplicity, we first discuss the procedure on a multicore architecture with single-queue Network Interface Cards (NICs). The case with multiple-queue NICs is discussed in Section 3.2. Without loss of generality, we assume that the NET_RX and NET_TX hardware interrupts of NIC $i$ are both served by a fixed CPU core $c_i$. Figure 2(a) shows the cores $c_i$ and $c_j$ that serve the receiving NIC $i$ and the sending NIC $j$. If $i$ equals $j$, packets are sent and received through the same NIC and processed by the same CPU core.
When a packet arrives at the receiving NIC, it is attached to a descriptor in that NIC's receiving (RX) queue. The descriptor indicates the memory location where the incoming packet is stored via Direct Memory Access (DMA) transfer. There are two data structures in the network stack of the Linux kernel. The data_buff, of 2 KB size, holds the packet itself. The other data structure is the sk_buff, which carries packet metadata (e.g., the pointer to the packet data, MAC header, IP header, and packet states) used by the TCP/IP protocol stack. The size of the sk_buff is about 250 bytes. The sk_buff has a pointer to the data_buff, and the two together make up an Skb.
After the hardware interrupt for the incoming packet is served, Software Interrupts (SoftIRQs) are scheduled to accomplish the subsequent packet processing. The SoftIRQs are also bound to specific processors.
In OVS, the SoftIRQ handler on the receiving core parses the packet by extracting all related match fields from the Skb and stores them in a flow key.
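To make the parsing step concrete, the following Python sketch extracts a small flow key from a raw frame. It is a simplified illustration, not the OVS kernel code: it assumes an untagged Ethernet frame carrying IPv4 with TCP or UDP, and the key names (`eth_dst`, `tp_src`, and so on) are chosen here for readability.

```python
import socket
import struct


def parse_flow_key(frame: bytes) -> dict:
    """Extract match fields from an untagged Ethernet/IPv4/TCP-or-UDP frame."""
    eth_dst, eth_src, eth_type = struct.unpack("!6s6sH", frame[:14])
    key = {"eth_dst": eth_dst.hex(), "eth_src": eth_src.hex(), "eth_type": eth_type}
    if eth_type != 0x0800:                 # not IPv4: stop at layer 2
        return key
    ihl = (frame[14] & 0x0F) * 4           # IPv4 header length in bytes
    key["ip_proto"] = frame[14 + 9]
    key["ip_src"] = socket.inet_ntoa(frame[26:30])
    key["ip_dst"] = socket.inet_ntoa(frame[30:34])
    if key["ip_proto"] in (6, 17):         # TCP or UDP: ports open the L4 header
        l4 = 14 + ihl
        key["tp_src"], key["tp_dst"] = struct.unpack("!HH", frame[l4:l4 + 4])
    return key


# Build a minimal test frame: Ethernet + 20-byte IPv4 header + TCP ports.
eth = bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001") + struct.pack("!H", 0x0800)
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                 socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
tcp = struct.pack("!HH", 1234, 80) + bytes(16)
key = parse_flow_key(eth + ip + tcp)
```

The parse is a fixed sequence of offset reads with a few branches on protocol fields, which is the kind of work that maps naturally onto a hardware pipeline.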
The flow key is used to look up the Flow Cache, which is shared among all processors in the kernel. If the Flow Cache contains the flow key value, it returns the instructions for processing the packet. The handler then updates the corresponding states and executes the actions on the packet. If the packet is to be delivered through a NIC, it is placed in the sending (TX) queue of that NIC. The sending process is invoked in the NET_TX SoftIRQ handler of the core serving that NIC.
If the Flow Cache lookup misses, the packet is sent to the user space of Linux via a communication mechanism between the kernel and the user space, for example, netlink. Multiple handler threads are created in the user space when the OpenFlow software switch is set up. These threads are responsible for processing the missed packets coming from the kernel. The handler threads look up the shared multiflow tables. If the packet matches the tables, the handler thread computes a new Flow Cache entry and installs it in the Flow Cache. The handler also reinjects the packet into the kernel so that the corresponding actions are executed on it. If the packet does not match, a Packet-in message is generated and sent to the controller. The controller responds with a FlowMod message to install the corresponding flow rule for the packet in the multiflow tables.
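The three-level decision described above (Flow Cache, then user-space multiflow tables, then the controller) can be sketched in Python as follows. This is an illustrative model only: `UkcSwitch`, the `stats` counters, and the controller callback are hypothetical names, the controller is a plain function standing in for the Packet-in/FlowMod exchange, and details such as netlink transport and packet reinjection are omitted.

```python
class UkcSwitch:
    def __init__(self, flow_tables, controller):
        self.flow_cache = {}            # kernel fast path: exact match on the flow key
        self.flow_tables = flow_tables  # user-space slow path: list of (match, actions)
        self.controller = controller    # stands in for Packet-in / FlowMod
        self.stats = {"cache_hit": 0, "table_hit": 0, "packet_in": 0}

    def _table_lookup(self, fields):
        for match, actions in self.flow_tables:
            if all(fields.get(f) == v for f, v in match.items()):
                return actions
        return None

    def forward(self, fields):
        key = tuple(sorted(fields.items()))      # hashable flow key
        actions = self.flow_cache.get(key)
        if actions is not None:                  # fast path
            self.stats["cache_hit"] += 1
            return actions
        actions = self._table_lookup(fields)     # slow path in user space
        if actions is not None:
            self.stats["table_hit"] += 1
            self.flow_cache[key] = actions       # install a new cache entry
            return actions
        self.stats["packet_in"] += 1             # unknown flow: ask the controller
        match, actions = self.controller(fields)
        self.flow_tables.append((match, actions))
        return actions


def controller(fields):
    # Hypothetical policy: forward by destination IP to port 1.
    return {"ip_dst": fields["ip_dst"]}, ["output:1"]


switch = UkcSwitch([], controller)
packet = {"ip_dst": "10.0.0.2", "tp_dst": 80}
results = [switch.forward(packet) for _ in range(3)]
```

In this model the first packet of a flow reaches the controller, the second is resolved by the user-space tables and populates the cache, and every later packet of the flow stays on the fast path.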
Based on the analysis of the forwarding process, we evaluate the processing overhead in SDN software switches. We introduce some notation for the time periods in the procedure. The time period from the packet arriving at the NIC to the lookup of the cache is denoted as $T_1$. The time of the Flow Cache lookup is $T_2$. The time period from the start of matching the Flow Cache to the end of sending out the packet is $T_3$. We denote the cache hit rate of this lookup as $p_1$. The time period from the start to the end of looking up the multiflow tables in user space is $T_4$. The time period from the beginning of matching the multiflow tables to the end of sending out the packet is $T_5$. We denote the hit rate of the multiflow tables as $p_2$. The time period from the start of triggering the controller to the end of installing the rule is $T_6$. Finally, the time used for sending out the packet is $T_7$. Given these notations, the total processing time $T$ is formulated as follows:

$$T = T_1 + T_2 + p_1 T_3 + (1 - p_1)\left[T_4 + p_2 T_5 + (1 - p_2)(T_6 + T_7)\right]$$

Due to the high locality of network traffic and proactive flow rule setup, the Flow Cache can achieve a high hit rate. As verified by Pfaff et al., the overall cache hit rate of Open vSwitch was 97.7% (i.e., the value of $p_1$) in a real commercial multitenant data center [15]. Thus, in most instances, the total processing time $T$ of a packet in SDN software switches can be approximately calculated as

$$T \approx T_1 + T_2 + T_3$$

These three periods constitute the fast path of OVS. As depicted in Figure 2(a), the OpenFlow fast path consists of six functional modules: Receive, Parse Packet, Lookup, Update State, Execute Action, and Send Process. To understand the overhead of each functional module, we run Open vSwitch on a commodity PC as a software switch. We use Iperf [16] to generate input traffic over two 1 GbE NICs of the PC. We then use OProfile [17] to count the CPU cycles of each function in the fast path of Open vSwitch. We group all functions into the above six functional modules, and the experimental results are shown in Figure 3. As we can observe, the Lookup module is the major bottleneck in the fast path. The other modules consume up to 44% of the processing time in total.

Bottlenecks in SDN
Packet I/O. Unlike CPU-intensive tasks, packet I/O is a critical step in software packet forwarding. Packet receiving and sending consume a large number of CPU cycles. As pointed out in [18], buffer allocation and release are the two major overheads in packet I/O, representing up to 54% of the total overhead.
Researchers have proposed several optimization techniques, such as Skb recycle queues, memory mapping, batch processing, affinity, and software prefetching, to accelerate packet I/O. By combining some of these techniques, Intel developed a high-performance packet processing architecture for x86 platforms, named Data Plane Development Kit (DPDK) [19]. However, the DPDK framework needs the support of specific CPUs and NICs, so it is not general. Rizzo designed a novel framework for fast packet I/O in general-purpose OSs, named netmap [20]. However, netmap was originally designed to run in the user space of the OS, and its performance drops when it is applied to kernel-based SDN software switches.
Software Forwarding. A critical problem is contention for shared resources (caches, queues in NICs, and flow tables) when multiple threads run concurrently to forward packets [21]. To improve the processing performance of SMP Linux, NIC-based core affinity has been proposed to exploit the parallel packet processing capability of multicore architectures. By maintaining an affinity relation between a core and an NIC, the software and hardware interrupts of the NIC are handled by the specified core. As a result, it incurs fewer cache misses and improves the execution efficiency of packet processing. However, NIC-based core affinity is not efficient for packet forwarding, because the receiving and sending of a packet are usually handled by different cores, which causes mutual exclusion and cache coherence problems. To address this, Han et al. propose queue-based core affinity [22]. Each receiving and sending queue in an NIC maps to a core, and the corresponding CPU core accesses the queue exclusively, eliminating the cache bouncing and lock contention caused by shared data structures. However, the NICs must support multiple queues and Receive Side Scaling (RSS), which is hardly scalable.
OpenFlow Classification. As described in Figure 2 [11]. Moreover, some packet processing actions, such as field rewriting and packet encapsulation/decapsulation, are costly in software.
In addition, the performance of SDN software switches decreases considerably if complex security functions have to be integrated. Most processing functions involved in security are stateful and CPU-consuming. Methods to improve the security capacity of SDN software switches without hurting performance should therefore be investigated.

FAS Mechanism
4.1. Architecture of FAS. To address the above-mentioned bottlenecks of existing SDN software switches, we exploit programmable hardware, namely FPGA, to accelerate computation. The FAS mechanism is designed to offload time-consuming functional modules in the OpenFlow software fast path to FPGA.
As depicted in Figure 4, the FAS mechanism consists of three components in FPGA. The Self-Described Buffer Management module offloads some time-consuming parts of packet receiving and sending in the Linux kernel. The Metadata Generate module offloads the packet parsing procedure. The Execute Action module performs some actions after acquiring the packet and its actions from the Metadata Resolve module. The Metadata Resolve module receives packet processing metadata from software, resolves it, and notifies the Execute Action module.
Packet Buffer Management Offloading. The management of the Skb for each packet is a critical operation during packet I/O. It includes the conversion from the raw packet to the Skb, the initialization of the Skb, and the allocation and deallocation of the Skb. It consumes the majority of CPU cycles in packet I/O. Previously, our research group proposed Self-Described Buffer (SDB) management in hardware to eliminate the packet buffer management overhead in software [23]. In SDB, the originally separate parts of the Skb (the packet and its metadata) are merged into a contiguously stored packet buffer (we call it the SDB packet buffer). The software preallocates fixed-size space for SDB packet buffers in main memory in the initial phase. This enables the SDB hardware to dynamically allocate and recycle the circular addresses of the SDB packet buffers during packet I/O. The packet buffer management overhead in software is thus eliminated. FAS makes use of SDB to alleviate the bottleneck of packet I/O.
Packet Parsing Offload. The OpenFlow fast path extracts matching fields to generate the flow key by parsing the Skb. The parsing procedure is the second most costly part, as illustrated in Figure 3. We thus offload packet parsing to hardware and put the parsing result in the metadata of the packet. The parsing process is relatively straightforward in OpenFlow forwarding, as depicted in Figure 5, and can be easily implemented in FPGA hardware.
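The circular allocation used by the Self-Described Buffer scheme discussed earlier in this section can be illustrated with a small software model. The sketch below is an assumption-laden simplification rather than the SDB hardware design in [23]: the class name `RingBufferPool` is hypothetical, and it assumes that buffers are recycled in the same order in which they were allocated, which keeps the bookkeeping down to two indices.

```python
class RingBufferPool:
    """Fixed-size buffers handed out and recycled in circular order."""

    def __init__(self, base_addr, buf_size, count):
        self.base, self.size, self.count = base_addr, buf_size, count
        self.head = 0      # index of the next buffer to allocate
        self.tail = 0      # index of the next buffer expected back
        self.in_use = 0

    def alloc(self):
        if self.in_use == self.count:
            return None                      # pool exhausted: drop or wait
        addr = self.base + self.head * self.size
        self.head = (self.head + 1) % self.count
        self.in_use += 1
        return addr

    def recycle(self, addr):
        expected = self.base + self.tail * self.size
        if self.in_use == 0 or addr != expected:
            raise ValueError("buffers must be returned in allocation order")
        self.tail = (self.tail + 1) % self.count
        self.in_use -= 1


# Four preallocated 2 KB buffers starting at an arbitrary example address.
pool = RingBufferPool(0x10000000, 2048, 4)
addrs = [pool.alloc() for _ in range(5)]   # the fifth request finds the pool empty
pool.recycle(addrs[0])
reused = pool.alloc()                      # wraps around to the first buffer
```

Because the buffer space is preallocated once and addresses are derived arithmetically, no per-packet memory allocator call is needed, which is the source of the savings the scheme targets.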
Action Execution Offload. The Execute Action module in FPGA executes part of the actions for a packet according to the hardware capability. The fundamental operation is forwarding to the corresponding port. In the sending process, the hardware resolves the packet address written in the FPGA registers and reads the packet with its metadata by DMA. The parameters of the actions are carried in the metadata, which is specified by the software. For example, if a queue action in the metadata is resolved by the Metadata Resolve module, the packet is enqueued to the specified queue of the corresponding port in FPGA. Offloading action execution can eliminate complex and time-consuming software packet operations, such as packet field rewriting and encapsulation/decapsulation.

Security Function Extension Interface.
To support the extension of hardware security functions, we propose a well-defined module interface. The downstream and upstream interfaces of security modules are FIFO-like interfaces, and the control command is encoded in the metadata in front of the packet data. A security module can be easily embedded into the FPGA processing pipeline as long as it conforms to the module interface definition.
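As a software analogy of this interface, the sketch below chains pipeline stages that each consume (metadata, packet) pairs from an upstream FIFO and emit them downstream. It is only an illustration of the composition idea: the stage names, the `cmd` values `inspect` and `bypass`, and the `ip_src` metadata field are hypothetical and do not come from the FAS interface definition.

```python
from collections import deque


def fifo_reader(fifo):
    """Drain an upstream FIFO of (metadata, packet) pairs."""
    while fifo:
        yield fifo.popleft()


def source_filter(upstream, blocked_sources):
    """Security stage: drop inspected packets whose source is blocked."""
    for metadata, packet in upstream:
        if metadata.get("cmd") == "inspect" and metadata.get("ip_src") in blocked_sources:
            continue                       # packet is filtered out
        yield metadata, packet


def packet_counter(upstream, counters):
    """Monitoring stage: count what passes through, then hand it on."""
    for metadata, packet in upstream:
        counters["packets"] += 1
        counters["bytes"] += len(packet)
        yield metadata, packet


ingress = deque([
    ({"cmd": "inspect", "ip_src": "10.0.0.1"}, b"aaaa"),
    ({"cmd": "inspect", "ip_src": "10.0.0.66"}, b"bbbb"),
    ({"cmd": "bypass", "ip_src": "10.0.0.66"}, b"cc"),
])
counters = {"packets": 0, "bytes": 0}
# Every stage has the same upstream/downstream shape, so stages compose freely.
pipeline = packet_counter(source_filter(fifo_reader(ingress), {"10.0.0.66"}), counters)
egress = list(pipeline)
```

Because each stage only depends on the shared (metadata, packet) stream shape, a new module can be inserted anywhere in the chain without changing its neighbors.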

Comparisons between FAS and Existing Mechanisms.
Many mechanisms have been proposed to improve the performance of OpenFlow software switching. We list state-of-the-art mechanisms and show their differences from FAS in Table 2. DPDK and netmap provide high-performance I/O frameworks for OpenFlow software switching. Note that DPDK relies on specific CPUs and NICs (e.g., Intel 82599), while netmap can be used with general NICs [24]. Flow Director proposes using the packet classification hardware on the NIC as the OpenFlow fast forwarding path [25]. However, the cost of maintaining the Flow Cache entries in the NIC driver is relatively high. SSDP proposes enhancing the slow software forwarding path with a commodity switching chip including TCAM [26].
The common OpenFlow forwarding path is divided into two data planes in SSDP: macroflows in switch chip and microflows in CPU.Maintaining state consistency between the two data planes is intractable.

Preliminary Implementation
This section describes a design and preliminary implementation of FAS on NetMagic-Pro (NMP), which is an FPGA-based network processing platform with a multicore CPU.

Brief Introduction of NetMagic-Pro.
The hardware and software in NMP (as shown in Figure 6) are both programmable for packet processing. The total power consumption of NMP is 85 W. NMP consists of four parts: CPU board, FPGA board, line-card board, and power supply. The CPU board integrates an Intel i7-4700EQ CPU with 4 GB memory.
The FPGA board is equipped with an Altera EP4SGX180 and a Flash chip storing the configuration file of the FPGA. The software running on the multicore CPU communicates with the FPGA through the PCIe bus. PCIe v2.0 with 8 lanes offers a raw link bandwidth of 40 Gbps in each direction (32 Gbps after 8b/10b encoding). The line-card board provides eight 1-GigE Ethernet ports. Similar to NetFPGA, NMP enables researchers and students to experiment with Gigabit-rate networking hardware. The difference is that NMP provides better programmability in both hardware and software and higher software-hardware communication performance than NetFPGA. We implemented the FAS mechanism on the NMP platform.

FAS Driver.
A kernel driver is designed for FAS. It is responsible for transferring NMP packets (including metadata, flow key, and packet data) between the software and hardware. The data structure of the NMP packet is illustrated in Figure 7. Instead of packet descriptors, all control messages for the packet are stored in the metadata, including the DMA address, ingress port, next NMP packet address, and packet valid indicator. The flow key, whose size is 64 bytes, consists of all matching fields extracted by hardware. The packet field accommodates the packet header and body, which usually total less than 1518 bytes. Therefore, 2 KB of memory is preallocated for each NMP packet by software. The FAS driver uses two techniques to optimize the performance of packet I/O and reduce the cache miss rate in packet forwarding.
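A possible byte-level view of such an NMP packet slot is sketched below. Only the 64-byte flow key, the 1518-byte frame bound, the 2 KB slot size, and the list of metadata fields come from the description above; the 32-byte metadata size, the field widths, the field order, and the extra frame-length field are assumptions made purely for illustration, since Figure 7 is the authoritative layout.

```python
import struct

SLOT_SIZE = 2048          # memory preallocated per NMP packet
FLOW_KEY_SIZE = 64        # matching fields extracted by hardware
MAX_FRAME = 1518
# Assumed metadata layout: dma_addr (u64), next_addr (u64), ingress_port (u8),
# valid (u8), frame_len (u16, an assumed extra field), 12 bytes of padding.
META_FMT = "<QQBBH12x"
META_SIZE = struct.calcsize(META_FMT)


def pack_nmp(dma_addr, next_addr, ingress_port, flow_key, frame):
    """Lay out metadata, flow key, and frame contiguously in one slot."""
    if len(flow_key) != FLOW_KEY_SIZE:
        raise ValueError("flow key must be 64 bytes")
    if len(frame) > MAX_FRAME:
        raise ValueError("frame too large for one slot")
    meta = struct.pack(META_FMT, dma_addr, next_addr, ingress_port, 1, len(frame))
    return (meta + flow_key + frame).ljust(SLOT_SIZE, b"\x00")


def unpack_nmp(slot):
    """Recover the metadata fields, flow key, and frame from a slot."""
    dma_addr, next_addr, port, valid, frame_len = struct.unpack_from(META_FMT, slot, 0)
    key_start = META_SIZE
    frame_start = key_start + FLOW_KEY_SIZE
    return {
        "dma_addr": dma_addr, "next_addr": next_addr,
        "ingress_port": port, "valid": bool(valid),
        "flow_key": slot[key_start:frame_start],
        "frame": slot[frame_start:frame_start + frame_len],
    }


slot = pack_nmp(0x1000, 0x1800, 3, bytes(FLOW_KEY_SIZE), b"\xaa" * 60)
fields = unpack_nmp(slot)
```

Under these assumed sizes, metadata, flow key, and a maximum-size frame occupy 32 + 64 + 1518 = 1614 bytes, which fits within the 2 KB slot.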
Polling Instead of Interrupt. Unlike the hybrid of interrupt and polling in Linux NAPI (New API), we adopt a pure polling approach to fetch incoming NMP packets in the FAS driver. This is reasonable since NMP is a packet forwarding platform. The FAS driver polls NMP packets written by hardware DMA. The packet valid flag in the metadata indicates whether the packet is ready to be processed. The NMP packets are organized into a chain to simplify polling: the next NMP packet address points to the next packet to be processed.
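The polling loop over a packet chain can be modeled as follows. This is a sketch under stated assumptions, not the FAS driver: slots are modeled as dictionaries keyed by address, the `budget` parameter is a hypothetical per-call bound, and the model assumes that software clears the valid flag to hand a slot back, which the description above does not specify.

```python
def poll_chain(slots, start, budget):
    """Follow next pointers from `start`, consuming packets whose valid flag is set."""
    processed = []
    addr = start
    while budget > 0 and slots[addr]["valid"]:
        slot = slots[addr]
        processed.append(slot["data"])
        slot["valid"] = False          # assumed: software releases the slot
        addr = slot["next"]            # the chain tells us where to look next
        budget -= 1
    return processed, addr             # addr is where the next poll resumes


# A ring of four slots; hardware has filled the first two so far.
slots = {i: {"valid": i < 2, "next": (i + 1) % 4, "data": f"pkt{i}"} for i in range(4)}
first_batch, resume_at = poll_chain(slots, 0, budget=8)

# Later, hardware fills the slot the driver is waiting on.
slots[resume_at]["valid"] = True
second_batch, resume_at2 = poll_chain(slots, resume_at, budget=8)
```

The driver only ever inspects one valid flag at a known address, so an idle poll costs a single memory read rather than an interrupt and a descriptor walk.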
Core Affinity in Packet Dispatch. Each polling thread runs on only one CPU core. NMP packets are organized in chains, and each chain is assigned to only one thread, namely, one core, for processing. With the Run-to-Completion (RTC) mode of the thread, each packet is received and transmitted by one core. This reduces the context switching overhead among different cores during processing. It also reduces TLB updates when accessing the packet buffer.

Virtual Ethernet Port.
Virtual Ethernet ports are required for OpenFlow software switches to fully utilize the features of FAS. In NMP, the Ethernet ports are not standard commodity NICs. Therefore, we implement a customized network device for virtual Ethernet ports in NMP, which is used to exchange packets and states for OpenFlow software switches. The virtual Ethernet port provides basic functions for packet forwarding applications, including network device application, packet sending/receiving, port counting, MAC address configuration, and Maximum Transmission Unit (MTU) configuration. Thus, the virtual Ethernet ports of NMP transparently support all features of OpenFlow software switches running on NMP. The core operation of the customized network device is the conversion between the Skb data structure and the NMP packet. As shown in Section 6.2, the cost of conversion is quite low.
There are three steps to implement the virtual Ethernet port for NMP. In the first step, we allocate a new network device by calling alloc_etherdev in the kernel. In the second step, we implement the basic functions of a standard Ethernet port. The functions are listed in Table 3. All these functions are defined in the net_device_ops data structure.
Among these functions, the most important one is packet sending, as shown in Pseudocode 1. There are two types of packets to be handled. A packet received by the FAS driver that is to be forwarded to some port is converted from Skb to NMP packet. A packet generated by the software, for example, an OpenFlow message sent to the controller, has no preallocated hardware-maintained address or other related information. It requires a newly generated NMP packet marked with a "soft" tag. Both types of packets are then sent by writing the hardware registers with the corresponding metadata. The FPGA hardware executes operations on the packets according to their metadata.
In the third step, we register the new network device with the kernel. After configuring the related parameters of the virtual Ethernet port, the register_netdev function is called in the kernel to complete the registration of the network device.
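The two cases handled by the sending function can be summarized in a short Python model. It is a hedged illustration of the decision logic described above and is not the authors' Pseudocode 1: the names `nmp_xmit`, `nmp_addr`, `write_register`, and `alloc_soft_slot` are hypothetical, and the register write is modeled as a plain callback.

```python
def nmp_xmit(skb, write_register, alloc_soft_slot):
    """Send one packet through the virtual port, choosing between the two cases."""
    if skb.get("nmp_addr") is not None:
        # Case 1: the packet was received by the driver, so its buffer
        # already lives at a hardware-maintained address.
        meta = {"addr": skb["nmp_addr"], "out_port": skb["out_port"], "soft": False}
    else:
        # Case 2: the packet was generated by software (e.g. a message to
        # the controller), so a new NMP packet is built and tagged "soft".
        meta = {"addr": alloc_soft_slot(skb["data"]), "out_port": skb["out_port"], "soft": True}
    write_register(meta)               # hardware acts on the metadata
    return meta


register_log = []                      # stands in for the FPGA registers
soft_slots = {}


def alloc_soft_slot(data):
    addr = 0x9000 + len(soft_slots) * 2048   # arbitrary example address range
    soft_slots[addr] = data
    return addr


forwarded = nmp_xmit({"nmp_addr": 0x1000, "out_port": 2, "data": b"x"},
                     register_log.append, alloc_soft_slot)
generated = nmp_xmit({"nmp_addr": None, "out_port": 0, "data": b"packet-in"},
                     register_log.append, alloc_soft_slot)
```

Both paths end in the same register write, so the hardware needs only the "soft" tag in the metadata to tell the two kinds of packets apart.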
The virtual Ethernet port can then be accessed from the user space via the system command "ifconfig." At this point, we can create OpenFlow software switches on NMP. For example, the command "ovs-vsctl add-port br0 nmp1" is used to add port 1 of NMP to the Open vSwitch bridge.
Tester are used for traffic source and sink, respectively. The packet size of the test flows can be configured in the IXIA XM2 Tester. Table 4 lists the specifications of the experimental components. Both devices under test have almost the same hardware configuration except for the CPU. Note that the performance of the PC's CPU (Intel i7-3770) is a little better than that of NMP (Intel i7-4700EQ). In the experiments, we mainly evaluate the efficiency of the FAS mechanism from two aspects: forwarding rate and forwarding latency. We believe that the PCIe bandwidth for transmitting packets between the software and FPGA is not a bottleneck, because PCIe 2.0 x8 links provide 40 Gbps, which is far more than required for the unidirectional traffic generated in the experiment.

Forwarding Performance Evaluation.
The forwarding performance of OpenFlow software switching depends on the packet size and the flow rules. First, we test the forwarding rate of Open vSwitch with 1 flow rule. All tests last 60 seconds, and forwarding rates are sampled every second. The average forwarding rates are shown in Figure 8. For 64 B Ethernet packets, NMP with FAS achieves a throughput of 740 Mbps, about 97% of the theoretical wire-speed (762 Mbps). NMP with FAS achieves 44% higher performance than the commodity PC. When the packet size is 128 B, NMP with FAS achieves wire-speed forwarding, which is 18.5% higher than the commodity PC. The performance gap between FAS and the original OVS becomes smaller as the packet size increases, because both can achieve wire-speed for large packets. We can conclude that, in the case of 1 flow rule, NMP with FAS achieves nearly wire-speed forwarding even for minimum-size packets, which significantly outperforms the commodity PC.
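The theoretical wire-speed figure quoted above follows from standard Ethernet framing: on the wire, each frame is accompanied by 8 bytes of preamble and start-of-frame delimiter and a 12-byte inter-frame gap. The short Python calculation below reproduces the 762 Mbps value for 64 B frames on a 1 Gbps link; the function and constant names are chosen here for illustration.

```python
LINE_RATE_BPS = 1_000_000_000     # 1 GbE
OVERHEAD_BYTES = 8 + 12           # preamble + SFD, inter-frame gap


def wire_speed(frame_bytes):
    """Return (packets per second, frame throughput in Mbps) at line rate."""
    pps = LINE_RATE_BPS / ((frame_bytes + OVERHEAD_BYTES) * 8)
    mbps = pps * frame_bytes * 8 / 1e6
    return pps, mbps


pps_64, mbps_64 = wire_speed(64)          # about 1.488 Mpps and 762 Mbps
fas_share = 740 / mbps_64                 # measured 740 Mbps vs. theoretical
```

The ratio of the measured 740 Mbps to this bound is about 0.97, matching the 97% stated in the text. The same function shows why the gap closes for large frames: the fixed 20-byte overhead becomes a small fraction of each frame, and the packet rate that software must sustain drops sharply.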
Secondly, we make comparisons of the forwarding rate under different number of flow rules.The preinstalled flow rules are all mutually exclusive with others.The numbers of rules are 256, 8192, and 65536 in different experiments.The traffic is generated according to the rules to guarantee that every packet matches the corresponding flow rule.We also vary the size of packets.The experimental results are shown in Figure 9.For the cases of 512 B, 1024 B, and 1500 B Ethernet packet, the forwarding rates of NMP with FAS and the commodity PC have little difference and both approach the wire-speed under all different numbers of flow rules.The data of 512 B and 1024 B are omitted for simplicity.For the packet size of 64 B (Figure 9(a)), the forwarding rate goes down with increasing flow rules.When the number of flow rules reaches 65536, the forwarding rates of NMP with FAS and the commodity PC fall by 45% and 47%, respectively, when compared to one-rule case, but the forwarding rate of the NMP with FAS is still 44% higher than that of the commodity PC.When the packet size is 128 B (Figure 9(b)), with the number of flow rules growing, the forwarding rates of NMP with FAS and the commodity PC both vary little.And NMP with FAS is 18% higher than the commodity PC.In summary, the FAS can provide higher forwarding rate, especially for small-size packets.
We evaluate the forwarding latency with different packet sizes and different numbers of flow rules. In Figure 10, Skb forwarding denotes packet forwarding that includes the conversion between the NMP packet and the Skb data structure. The latency of Skb forwarding on NMP is close to the latency of NMP packet forwarding, which means that the cost of converting between the NMP packet and Skb is small. As the packet size grows, the latency gap between Skb forwarding and Open vSwitch decreases. This is because, when packets are small, more packets accumulate in the software processing queue, so each packet waits longer in the queue and the forwarding latency increases.
Finally, we compare the implementation complexity of FAS with that of a standard L2 Ethernet switch. The FPGA resource consumption is shown in Figure 11. The logic utilization of FAS, measured in ALMs, is about 40% lower than that of the L2 Ethernet switch. These results suggest that implementing FAS in FPGA is simple and feasible.
We conclude from the experiments that FAS provides considerable acceleration for OpenFlow software forwarding, especially for small packets, and that its implementation complexity in FPGA is acceptable.

Conclusions
SDN switches are the fundamental infrastructure for flexible control of flows. SDN software switches running on commodity multicore platforms are widely deployed due to their upgradability, programmability, and low cost. However, the forwarding performance and the security capacity provided by general-purpose SDN software switch platforms are usually unsatisfactory, and the situation is even worse for OpenFlow forwarding.
In this paper, we design and implement FAS, an FPGA-based mechanism for accelerating and securing SDN software switches. FAS provides a framework for offloading the time-consuming modules and real-time security modules of an SDN software switch, and it employs optimization techniques to relieve the performance bottleneck between software and FPGA hardware. Our experimental results show that FAS uses reasonable FPGA resources and outperforms commodity platforms, with a nearly 44% higher forwarding rate for small packets. FAS can also be used to enhance the security of SDN software switches by allowing bump-in-the-wire security modules to be integrated in FPGA.

Figure 1: Current implementation models of SDN switches.

Figure 2: UKC implementation models of SDN software switch.
Software Switch. There are three main bottlenecks (i.e., packet I/O, software forwarding, and OpenFlow classification) in the SDN software switch.

Figure 3: Packet forwarding process overhead breakdown. The processing for packets in the Execute Action module contains only the forwarding operation.

Figure 4: The framework of FAS mechanism.

Figure 8: The comparison of forwarding rate with one rule.

Figure 9: The forwarding rate with various numbers of flow rules.

Figure 10: Comparison of forwarding latency with various packet sizes.

Figure 11: The FPGA resource usages of FAS and L2 Ethernet switch.
(b) Timeline of UKC forwarding

Table 2: Comparisons of acceleration mechanisms for SDN software switches.

Table 3: Basic functions for the virtual Ethernet port.
Function | Description
… *netdev) | Open a port
static int nmp_close(struct net_device *netdev) | Close a port
static int nmp_xmit_frame(struct sk_buff *skb, struct net_device *netdev) | Send packet to a port
static struct net_device_stats *nmp_get_stats(struct net_device *netdev) | Acquire port statistics
static int nmp_set_mac(struct net_device *netdev, void *p) | …
… *netdev, int new_mtu) | …

Table 4: Components of platforms in experiment.