A Processor-Sharing Scheduling Strategy for NFV Nodes

The introduction of the two paradigms SDN and NFV to “softwarize” the current Internet is making management and resource allocation two key challenges in the evolution towards the Future Internet. In this context, this paper proposes Network-Aware Round Robin (NARR), a processor-sharing strategy that reduces delays in traversing SDN/NFV nodes. NARR alleviates the job of the Orchestrator by working automatically at the intranode level, dynamically assigning the processor slices to the virtual network functions (VNFs) according to the state of the queues associated with the output links of the network interface cards (NICs). An extensive set of simulations shows the improvements achieved with respect to two other processor-sharing strategies chosen as references.


Introduction
In the last few years, deploying new complex and efficient distributed services in the Internet has become increasingly difficult because of the ossification of the Internet protocols, the proprietary nature of existing hardware appliances, their costs, and the lack of skilled professionals to maintain and upgrade them.
In order to alleviate these problems, two new network paradigms, Software Defined Networks (SDN) [1][2][3][4][5] and Network Functions Virtualization (NFV) [6,7], have been recently proposed with the specific target of improving the flexibility of network service provisioning and reducing the time to market of new services.
SDN is an emerging architecture that aims at making the network dynamic, manageable, and cost-effective, by decoupling the system that makes decisions about where traffic is sent (the control plane) from the underlying system that forwards traffic to the selected destination (the data plane). In this way the network control becomes directly programmable and the underlying infrastructure is abstracted for applications and network services.
NFV is a core structural change in the way telecommunication infrastructure is deployed. The NFV initiative was started in late 2012 by some of the biggest telecommunications service providers, which formed an Industry Specification Group (ISG) within the European Telecommunications Standards Institute (ETSI). Interest has since grown, and the initiative today involves more than 28 network operators and over 150 technology providers from across the telecommunications industry [7]. The NFV paradigm leverages virtualization technologies and commercial off-the-shelf programmable hardware, such as general-purpose servers, storage, and switches, with the final target of decoupling the software implementation of network functions from the underlying hardware.
The coexistence and interaction of the NFV and SDN paradigms give network operators the possibility of achieving greater agility and faster deployment of new services, with a consequent considerable reduction of both Capital Expenditure (CAPEX) and Operational Expenditure (OPEX) [8].
One of the most challenging problems in deploying an SDN/NFV network is the efficient design of resource allocation and management, functions that are the responsibility of the network Orchestrator. Although this task is well covered in data center and cloud scenarios [9,10], it remains a challenging problem in a geographic network, where transmission delays cannot be neglected and the transmission capacities of the interconnection links are not comparable with those of the above scenarios. This is the reason why the problem of orchestrating an SDN/NFV network is still open and attracts a lot of research interest from both academia and industry.
More specifically, the whole problem of orchestrating an SDN/NFV network is very complex because it involves design work at both internode and intranode levels [11,12]. At the internode level, and in all the cases where the time between each execution is significantly greater than the time to collect, compute, and disseminate results, the application of a centralized approach is practicable [13,14]. In these cases, in fact, taking into consideration both traffic characterization and the required level of quality of service (QoS), the Orchestrator is able to decide in a centralized manner how many instances of the same function have to be run simultaneously, the network nodes that have to execute them, and the routing strategies that allow traffic flows to cross the nodes where the requested network functions are running.
Instead, a centralized strategy for operations that require dynamic reconfiguration of network resources cannot feasibly be executed by the Orchestrator, for reasons of resilience and scalability. Therefore, adaptive management operations with short timescales require a distributed approach. The authors of [15] propose a framework to support adaptive resource management operations, which involve short timescale reconfiguration of network resources, showing how requirements in terms of load-balancing and energy management [16][17][18] can be satisfied. Another work in the same direction is [19], which proposes a solution for the consolidation of VMs on local computing resources, exclusively based on local information. The work in [20] aims at alleviating the inter-VM network latency by defining a hypervisor scheduler algorithm that takes into consideration the allocation of the resources within a consolidated environment, scheduling VMs to reduce their waiting latency in the run queue. Another approach is applied in [21], which introduces a policy to manage the internal on/off switching of virtual network functions (VNFs) in NFV-compliant Customer Premises Equipment (CPE) devices.
The focus of this paper is on another fundamental problem that is inherent to resource allocation within an SDN/NFV node, that is, the decision of the percentage of CPU to be assigned to each VNF. If at first glance this could appear to be a classical problem of processor sharing, which has been widely explored in the past literature [15,22,23], it is actually much more complex because performance can be strongly improved by leveraging the correlation with the output queues associated with the network interface cards (NICs).
With all this in mind, the main contribution of this paper is the definition of a processor-sharing policy, in the following referred to as Network-Aware Round Robin (NARR), which is specific for SDN/NFV nodes. Starting from the consideration that packets that have received the service of a network function from a virtual machine (VM) running on a given node are enqueued to wait for transmission through a given NIC, the proposed strategy dynamically changes the slices of the CPU assigned to each VNF according to the state of the output NIC queues. More specifically, the NARR strategy gives a larger CPU slice to serve packets that will leave the node through the NIC that is currently less loaded, in such a way as to minimize waste of the NIC output link capacities, also minimizing the overall delay experienced by packets traversing nodes that implement NARR.
As a side contribution, the paper derives an on-off model for the traffic leaving the SDN/NFV node on each NIC output link. This model can be used as a building block for the design and performance evaluation of an entire network.
The paper is structured as follows. Section 2 describes the node architecture. Section 3 introduces the NARR strategy. Section 4 presents a case study and shows some numerical results. Finally, Section 5 draws some conclusions and discusses some future work.

System Description
The target of this section is the description of the system we consider in the rest of the paper. It is an SDN/NFV node like the one considered in [11,12], to which we will apply the NARR strategy. Its architecture, shown in Figure 1(a), is compliant with the ETSI Specifications [24]. It is composed of three different domains, namely, the Compute domain, the Hypervisor domain, and the Infrastructure Network domain. The Compute domain provides the computational and storage hardware resources that allow the node to host the VNFs. Thanks to the computing and storage virtualization provided by the Hypervisor domain, a VM can be created, migrated from one node to another, and halted, in order to optimize the deployment according to specific performance parameters. Communications among the VMs, and between the VMs and the external environment, are provided by the Infrastructure Network domain.
The SDN/NFV node is remotely controlled by the Orchestrator, whose architecture is shown in Figure 1(b). It is constituted by three main blocks. The Orchestration Engine executes all the algorithms and the strategies to manage and orchestrate the whole network. After each decision, the Orchestration Engine requests that the NFV Coordinator instantiates, migrates, or halts VMs and consequently requests that the SDN Controller modifies the flow tables of the SDN switches in the network in such a way that traffic flows can traverse VMs hosting the requested VNFs.
With this in mind, a functional architecture of the NFV node is represented in Figure 2. Its main components are the Processor, which manages the Compute domain and hosts the Hypervisor domain, and the network interface cards (NICs) with their queues, which constitute the "Network Hardware" block in Figure 1.
Let $M$ be the number of virtual network functions (VNFs) that are running in the node, and let $L$ be the number of output NICs. In order to simplify notation, in the following we will assume that all the NICs have the same characteristics in terms of buffer capacity and output rate. So, let $K^{(\mathrm{NIC})}$ be the size of the queue associated with each NIC, that is, the maximum number of packets that each queue can contain, and let $\mu^{(\mathrm{NIC})}$ be the transmission rate of the output link associated with each NIC, expressed in bit/s.
The Flow Distributor block has the task of routing each entering flow towards the function required by the flow. It is a software SDN switch that can be implemented, for example, with Open vSwitch [25]. It routes the flows to the VMs running the requested VNFs according to the control messages received from the SDN Controller residing in the Orchestrator. The most common protocol that can be used for the communications between the Flow Distributor and the SDN Controller is OpenFlow [26].
Let $\mu^{(P)}$ be the total processing rate of the processor, expressed in packets/s. This rate is shared among all the active functions according to a processor rate scheduling strategy. Let $\boldsymbol{\mu}^{(P)}$ be the array whose generic element, $\mu^{(P)}_{[i]}$, with $i \in \{1, \ldots, M\}$, is the portion of the processor rate assigned to the VM implementing the function $f_i$. Of course we have

$$\sum_{i=1}^{M} \mu^{(P)}_{[i]} = \mu^{(P)}. \tag{1}$$

Once a packet has been served by the required function, it is sent to one of the NICs to exit from the node. If the NIC is transmitting another packet, the arriving packets are enqueued in the NIC queue. We will indicate the queue associated with the generic NIC $l$ as $Q_l^{(\mathrm{NIC})}$.
In order to implement the NARR strategy proposed in this paper, we realize the block relative to each function with a set of $L$ parallel queues, in the following referred to as inside-function queues, as shown in Figure 3 for the generic function $f_i$. The generic $l$th inside-function queue of the function $f_i$, indicated as $Q_{i,l}$ in Figure 3, is used to enqueue packets that, after receiving the service of the function $f_i$, will leave the node through the NIC $l$. Let $K^{(P)}_{\mathrm{Ins}}$ be the size of each inside-function queue. Each inside-function queue of the generic function $f_i$ receives a portion of the processor rate assigned to that function. Let $\mu^{(P_{\mathrm{Ins}})}_{[i,l]}$ be the portion of the processor rate assigned to the queue $Q_{i,l}$ of the function $f_i$. Of course, we have

$$\sum_{l=1}^{L} \mu^{(P_{\mathrm{Ins}})}_{[i,l]} = \mu^{(P)}_{[i]}.$$

The portion of processor rate associated with each inside-function queue is dynamically changed by the Processor Arbiter according to the NARR strategy described in Section 3.
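The queue layout described above can be pictured as a grid with one bounded FIFO per (function, NIC) pair. The following Python sketch is only an illustration of that data structure, not the authors' simulator: the class name `InsideFunctionQueues`, its method names, the 0-based indices, and the tail-drop behavior on a full queue are all assumptions introduced here.

```python
from collections import deque


class InsideFunctionQueues:
    """Grid of bounded FIFO queues, one per (function, NIC) pair.

    Queue (i, l) holds packets that need function i and will leave the
    node through NIC l. Indices are 0-based in this sketch.
    """

    def __init__(self, num_functions, num_nics, capacity):
        self.capacity = capacity  # size of each inside-function queue
        self.queues = [[deque() for _ in range(num_nics)]
                       for _ in range(num_functions)]

    def enqueue(self, i, l, packet):
        """Store a packet; return False if the queue is full.

        Dropping the arriving packet when the queue is full is an
        assumption of this sketch.
        """
        queue = self.queues[i][l]
        if len(queue) >= self.capacity:
            return False
        queue.append(packet)
        return True

    def state(self, i, l):
        """Current length of the inside-function queue (i, l)."""
        return len(self.queues[i][l])

    def aggregate_state(self, l):
        """Packets waiting for NIC l, summed over all functions."""
        return sum(len(row[l]) for row in self.queues)
```

The `aggregate_state` method corresponds to viewing all the queues that feed the same NIC as a single virtual queue.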

NARR Processor-Sharing Strategy
The NARR (Network-Aware Round Robin) processor-sharing strategy observes the state of both the inside-function queues and the NIC queues, with the goal of reducing as far as possible the inactivity periods of the NIC output links.As already introduced in the Introduction, its definition starts from the fact that packets that have received the service of a VNF are enqueued to wait for transmission through a given NIC.So, in order to avoid output link capacity waste, NARR dynamically changes the slices of the CPU assigned to each VNF, and in particular to each inside-function queue, according to the state of the output NIC queues, assigning larger CPU slices to serve packets that will leave the node through less-loaded NICs.
More specifically, the Processor Arbiter decides the processor rate portions according to two different steps.
Step 1 (assignment of the processor rate portion to the aggregation of queues whose output is a specific NIC). This step meets the target of the proposed strategy, that is, to reduce as much as possible the underutilization of the NIC output links and, as a consequence, delays in the relative queues. To this purpose, let us consider a virtual queue that contains all the packets that are stored in all the inside-function queues $Q_{i,l}$, for each $i \in [1, M]$, that is, all the packets that will leave the node through the NIC $l$. Let us indicate this virtual queue as $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}$ and its service rate as $\mu_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}$. With this in mind, the idea is to give a higher processor slice to the inside-function queues whose flows are directed to the NICs that are emptying.
Taking into account the goal of privileging the flows that will leave the node through underloaded NICs, the Processor Arbiter calculates $\mu_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}$ as follows:

$$\mu_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)} = \mu^{(P)} \, \frac{q_{\mathrm{ref}} - S_l^{(\mathrm{NIC})}}{\sum_{k=1}^{L} \left( q_{\mathrm{ref}} - S_k^{(\mathrm{NIC})} \right)},$$

where $S_l^{(\mathrm{NIC})}$ represents the state of the queue associated with the NIC $l$, while $q_{\mathrm{ref}}$ is defined as follows:

$$q_{\mathrm{ref}} = \min \left\{ \alpha \cdot \max_{k} S_k^{(\mathrm{NIC})}, \; K^{(\mathrm{NIC})} \right\}.$$

The term $q_{\mathrm{ref}}$ is a reference target value calculated from the state of the NIC queue that has the highest length, amplified with a coefficient $\alpha$, and truncated to the maximum queue size $K^{(\mathrm{NIC})}$. It is determined in such a way that, if we consider $\alpha = 1$, the NIC queue that has the highest length does not receive packets from the inside-function queues because their service rate is set to zero; the other queues receive packets with a rate that is proportional to the distance between their length and the length of the most overloaded NIC queue. However, through an extensive set of simulations, we deduced that setting $\alpha = 1$ causes bad performance because there is always a group of inside-function queues that are not served. Instead, all the $\alpha$ values in the interval $]1, 2]$ give almost equivalent performance. For this reason, in the numerical analysis presented in Section 4, we have set $\alpha = 1.2$.
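Step 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' simulator code. It assumes that the aggregate rate for each NIC is the total processor rate weighted by the normalized distance between the reference value and that NIC's queue length, with the reference value taken as the longest NIC queue multiplied by the coefficient and capped at the NIC queue size. The function and parameter names are hypothetical, the equal split used when every weight is zero is an assumption, and the sketch does not model the exclusion of empty virtual queues from the sharing.

```python
def aggregate_rates(nic_state, mu_p, k_nic, alpha=1.2):
    """Split the processor rate mu_p among the per-NIC virtual queues.

    nic_state: current length of each NIC queue (packets)
    mu_p:      total processor rate (packets/s)
    k_nic:     maximum NIC queue size (packets)
    alpha:     amplification coefficient (1.2 in the case study)
    """
    # Reference value: longest NIC queue, amplified and capped at k_nic.
    q_ref = min(alpha * max(nic_state), k_nic)
    # Weight of each NIC: distance of its queue length from the reference.
    # The clamp at zero is purely defensive for alpha < 1.
    weights = [max(q_ref - s, 0.0) for s in nic_state]
    total = sum(weights)
    if total == 0:
        # Assumption of this sketch: equal split when no NIC can be
        # distinguished (for example, all NIC queues are empty).
        return [mu_p / len(nic_state)] * len(nic_state)
    return [mu_p * w / total for w in weights]
```

With NIC queue lengths of 100, 0, and 50 packets and a coefficient of 1, the most loaded NIC receives a zero rate and the other two receive rates in the ratio 2:1; with a coefficient of 1.2 every NIC receives a nonzero share.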
Step 2 (assignment of the processor rate portion to each inside-function queue). Let us consider the generic $l$th inside-function queue of the function $f_i$, that is, the queue $Q_{i,l}$. Its service rate is calculated as being proportional to the current state of this queue in comparison with the $l$th queues of the other functions. To this purpose, let us indicate the state of the queue $Q_{i,l}$ as $S_{i,l}$ and the state of the virtual queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}$ as $S_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}$. Of course, the latter can be calculated as the sum of the states of all the inside-function queues $Q_{i,l}$, for each $i \in [1, M]$: that is,

$$S_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)} = \sum_{i=1}^{M} S_{i,l}.$$

So, the service rate of the inside-function queue $Q_{i,l}$ is determined as a fraction of the service rate assigned at the first step to the virtual queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}$, as follows:

$$\mu^{(P_{\mathrm{Ins}})}_{[i,l]} = \mu_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)} \, \frac{S_{i,l}}{S_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_l)}}.$$

Of course, if at any time an inside-function queue becomes empty, the processor rate portion assigned to it is shared among the other queues proportionally to the processor portions previously assigned. Likewise, if at some instant an empty queue receives a new packet, the previous processor rate portion is reassigned to that queue.
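Step 2 can be sketched in the same spirit. The following Python fragment assumes that the rate of a virtual queue is divided among its inside-function queues in proportion to their current lengths; the function name `inside_queue_rates` and its parameters are hypothetical. It is an illustration under these assumptions, not the authors' implementation.

```python
def inside_queue_rates(states, mu_aggr):
    """Divide the rate of one virtual queue among its inside-function queues.

    states:  lengths of the inside-function queues that feed the same NIC,
             one entry per function
    mu_aggr: rate assigned to the virtual queue in Step 1 (packets/s)
    """
    total = sum(states)  # state of the virtual queue
    if total == 0:
        # Nothing to serve towards this NIC.
        return [0.0] * len(states)
    # An empty queue gets a zero share, so its portion is implicitly
    # redistributed to the non-empty queues.
    return [mu_aggr * s / total for s in states]
```

For example, with queue lengths of 10, 0, 30, and 60 packets and an aggregate rate of 100 packets/s, the empty queue receives nothing and the remaining queues receive 10, 30, and 60 packets/s.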

Case Study
In this section we present a numerical analysis of the behavior of an SDN/NFV node that applies the proposed NARR processor-sharing strategy, with the target of evaluating the achieved performance. To this purpose, we will consider two other processor-sharing strategies as reference, in the following referred to as round robin (RR) and queue-length weighted round robin (QLWRR). In both the reference cases, the node has the same $L$ NIC queues, but it has only $M$ processor queues, one for each function. Each of the $M$ queues has a size of $K^{(P)} = L \cdot K^{(P)}_{\mathrm{Ins}}$, where $K^{(P)}_{\mathrm{Ins}}$ represents the size of each inside-function queue, already defined for the proposed strategy.
The RR strategy applies the classical round robin scheduling policy to serve the $M$ function queues; that is, it serves each function queue with a rate $\mu^{(P)}_{\mathrm{RR}} = \mu^{(P)} / M$. The QLWRR strategy, on the other hand, serves each function queue with a rate that is proportional to the queue length; that is,

$$\mu^{(P)}_{\mathrm{QLWRR}[i]} = \mu^{(P)} \, \frac{S^{(f_i)}}{\sum_{k=1}^{M} S^{(f_k)}},$$

where $S^{(f_i)}$ is the state of the queue associated with the function $f_i$.
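The two reference strategies can be sketched in a few lines of Python. The function names are hypothetical, and the fallback to an equal split when all function queues are empty is an assumption of this sketch, since the text does not specify that case.

```python
def rr_rates(mu_p, num_functions):
    """Round robin: every function queue gets the same share of mu_p."""
    return [mu_p / num_functions] * num_functions


def qlwrr_rates(mu_p, states):
    """Queue-length weighted round robin.

    Each function queue is served at a rate proportional to its length.
    """
    total = sum(states)
    if total == 0:
        # Assumption: with all queues empty, fall back to an equal split.
        return rr_rates(mu_p, len(states))
    return [mu_p * s / total for s in states]
```

Neither strategy looks at the NIC queues, which is the information NARR adds.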

Parameter Settings.
In this numerical analysis, we consider a node with $M = 4$ VNFs and $L = 3$ output NICs. We loaded the node with balanced traffic constituted by $N_F = 12$ flows, each characterized by a different 2-tuple $\{f_i, \mathrm{NIC}_l\}$. Each flow has been generated with an on-off model characterized by exponentially distributed on and off periods. When a flow is in the off state, no packets arrive at the node from it; instead, when it is in the on state, packets arrive with an average rate $\lambda_{\mathrm{ON}}$. Each packet is assumed to have an exponentially distributed size with a mean of 1 kbyte. In all the simulations we have considered the same on-off cycle duration, $T = T_{\mathrm{OFF}} + T_{\mathrm{ON}} = 5$ ms, while we have varied the burstiness. Burstiness of on-off sources is defined as

$$b = \frac{\lambda_{\mathrm{ON}}}{\lambda_{\mathrm{Mean}}},$$

where $\lambda_{\mathrm{Mean}}$ is the mean emission rate. Now, indicating the probability of the ON state as $\pi_{\mathrm{ON}}$, and taking into account that $\lambda_{\mathrm{Mean}} = \lambda_{\mathrm{ON}} \cdot \pi_{\mathrm{ON}}$ and $\pi_{\mathrm{ON}} = T_{\mathrm{ON}} / (T_{\mathrm{OFF}} + T_{\mathrm{ON}})$, we have

$$b = \frac{1}{\pi_{\mathrm{ON}}} = \frac{T_{\mathrm{OFF}} + T_{\mathrm{ON}}}{T_{\mathrm{ON}}}.$$

In our analysis, the burstiness has been varied in the range $[2, 22]$. Consequently, we have derived the mean durations of the off and on periods, as follows:

$$T_{\mathrm{ON}} = \frac{T}{b}, \qquad T_{\mathrm{OFF}} = T - T_{\mathrm{ON}}.$$

Finally, in order to maintain the same mean packet rate for different values of $T_{\mathrm{OFF}}$ and $T_{\mathrm{ON}}$, we have assumed that 75 packets are transmitted, on average, during each on period: that is, $\lambda_{\mathrm{ON}} = 75 / T_{\mathrm{ON}}$. The resulting mean emission rate is $\lambda_{\mathrm{Mean}} = 122.9$ Mbit/s.
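The traffic parameters above can be checked with a short calculation. The Python sketch below derives the on and off durations and the emission rates from the burstiness, using the 5 ms cycle and the 75 packets per on period stated in the text. The function and constant names are hypothetical, and the conversion to bit/s assumes that 1 kbyte means 1024 bytes, which is the reading that reproduces the 122.9 Mbit/s figure.

```python
CYCLE_S = 5e-3          # on-off cycle duration in seconds (from the text)
PACKETS_PER_ON = 75     # mean packets emitted per on period (from the text)
PACKET_BITS = 1024 * 8  # assumption: mean packet size of 1 kbyte = 1024 bytes


def on_off_parameters(burstiness):
    """Return (t_on, t_off, lambda_on, lambda_mean) for one flow.

    Durations are in seconds and rates in packets/s.
    """
    t_on = CYCLE_S / burstiness        # burstiness = cycle / t_on
    t_off = CYCLE_S - t_on
    lambda_on = PACKETS_PER_ON / t_on  # rate while the source is on
    lambda_mean = lambda_on / burstiness
    return t_on, t_off, lambda_on, lambda_mean


def mean_bit_rate(burstiness):
    """Mean emission rate of one flow in bit/s."""
    return on_off_parameters(burstiness)[3] * PACKET_BITS
```

Under these assumptions the mean rate is 15,000 packets/s for every burstiness value, that is, 122.88 Mbit/s, in agreement with the 122.9 Mbit/s reported above.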
As far as the NICs are concerned, we have considered an output rate of $\mu^{(\mathrm{NIC})} = 980$ Mbit/s, so having a utilization coefficient on each NIC of $\rho^{(\mathrm{NIC})} = 0.5$, and a queue size $K^{(\mathrm{NIC})} = 3000$ packets. Regarding the processor, we considered a size of each inside-function queue of $K^{(P)}_{\mathrm{Ins}} = 3000$ packets. Finally, we have analyzed two different processor cases. In the first case we considered a processor that is able to process $\mu^{(P)} = 306$ kpackets/s, while in the second case we assumed a processor rate of $\mu^{(P)} = 204$ kpackets/s. Therefore, defining the processor utilization coefficient as

$$\rho^{(P)} = \frac{N_F \cdot \lambda_{\mathrm{Mean}}}{\mu^{(P)}},$$

with $N_F \cdot \lambda_{\mathrm{Mean}}$ being the total mean arrival rate to the node, the two considered cases are characterized by a processor utilization coefficient of $\rho^{(P)}_{\mathrm{Low}} = 0.6$ and $\rho^{(P)}_{\mathrm{High}} = 0.9$, respectively.
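The utilization coefficients quoted above follow from simple arithmetic, reproduced in the Python sketch below. The constant names are hypothetical. The calculation assumes that each flow emits 15,000 packets/s on average (75 packets per 5 ms cycle), that the 12 flows are spread evenly over the 3 NICs, and that 1 kbyte means 1024 bytes.

```python
N_FLOWS = 12                   # flows entering the node (from the text)
N_NICS = 3                     # output NICs (from the text)
LAMBDA_MEAN_PPS = 75 / 5e-3    # mean rate per flow: 75 packets per 5 ms cycle
PACKET_BITS = 1024 * 8         # assumption: 1 kbyte = 1024 bytes
MU_NIC_BPS = 980e6             # NIC output rate in bit/s (from the text)


def nic_utilization():
    """Utilization of one NIC, assuming flows are spread evenly over NICs."""
    flows_per_nic = N_FLOWS / N_NICS
    return flows_per_nic * LAMBDA_MEAN_PPS * PACKET_BITS / MU_NIC_BPS


def processor_utilization(mu_p_pps):
    """Total mean packet arrival rate divided by the processor rate."""
    return N_FLOWS * LAMBDA_MEAN_PPS / mu_p_pps


rho_nic = nic_utilization()               # about 0.50
rho_p_low = processor_utilization(306e3)  # about 0.59, reported as 0.6
rho_p_high = processor_utilization(204e3) # about 0.88, reported as 0.9
```

The values in the text are therefore rounded to one decimal place.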

Numerical Results.
In this section we present some results achieved by discrete-event simulations. The simulation tool used in the paper is publicly available at [27].
We first present a temporal analysis of the main variables characterizing the SDN/NFV node, and then we show a performance comparison of NARR with the two strategies RR and QLWRR, taken as a reference.
For the temporal analysis we focus on a short time interval of 720 μs, in order to clearly highlight the evolution of the considered processes. We have loaded the node with on-off traffic like that described in Section 4.1, with a burstiness $b = 7$.
Figures 4, 5, and 6 show the time evolution of the length of the queues associated with the NICs, the processor slice assigned to each virtual queue loading each NIC, and the length of the same virtual queues.We can subdivide the considered time interval into three different periods.
In the first period, between the instants 0.1063 and 0.1067, from Figure 4 we can notice that the NIC queue $Q_1^{(\mathrm{NIC})}$ has a greater length than the queue $Q_3^{(\mathrm{NIC})}$, while $Q_2^{(\mathrm{NIC})}$ is empty. For this reason in this period, as shown in Figure 5, the processor is shared between $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_1)}$ and $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$, and the slice assigned to serve $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ is higher, in such a way that the two queue lengths $Q_1^{(\mathrm{NIC})}$ and $Q_3^{(\mathrm{NIC})}$ reach the same value, which happens at the end of the first period, around the instant 0.1067. During this period, the behavior of both the virtual queues $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_1)}$ and $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ in Figure 6 remains flat, showing that the received processor rates are able to balance the arrival rates.
At the beginning of the second period, which ranges between the instants 0.1067 and 0.10693, the processor rate assigned to the virtual queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ is no longer sufficient to serve the amount of arriving traffic, and so the virtual queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ increases its length, as shown in Figure 6. During this second period, the processor slices assigned to the two queues $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_1)}$ and $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ are adjusted in such a way that $Q_1^{(\mathrm{NIC})}$ and $Q_3^{(\mathrm{NIC})}$ remain with comparable lengths. The last period starts at the instant 0.10693 and is characterized by the fact that the aggregated queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_2)}$ leaves the empty state, and therefore participates in the processor-sharing process. Since the NIC queue $Q_2^{(\mathrm{NIC})}$ is lightly loaded, as shown in Figure 4, the largest slice is assigned to $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_2)}$ in such a way that $Q_2^{(\mathrm{NIC})}$ can reach the same length as the other two NIC queues as soon as possible.
Now, in order to show how the second step of the proposed strategy works, we present the behavior of the inside-function queues whose output is sent to the NIC queue $Q_3^{(\mathrm{NIC})}$ during the same short time interval considered so far. More specifically, Figures 7 and 8 show the length of the considered inside-function queues and the processor slices assigned to them, respectively. The behavior of the queue $Q_{2,3}$ is not shown because it is empty in the considered period. As we can observe from the above figures, we can subdivide the interval into four periods:
(i) The first period, ranging in the interval [0.1063, 0.10657], is characterized by an empty state of the queue $Q_{3,3}$. Thus, in this period, the processor slice assigned to the aggregated queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ is shared by $Q_{1,3}$ and $Q_{4,3}$ only.
(ii) During the second period, ranging in the interval [0.10657, 0.1067], $Q_{1,3}$ is scarcely loaded (in particular it is empty in the second part of this period), and so the processor slice assigned to $Q_{3,3}$ is increased.
(iii) In the third period, ranging between 0.1067 and 0.10693, all the queues increase and equally share the processor.
(iv) Finally, in the last period, starting at the instant 0.10693, as already observed in Figure 5, the processor slice assigned to the aggregated queue $Q_{\mathrm{Aggr}}^{(\to \mathrm{NIC}_3)}$ is suddenly decreased, and consequently the slices assigned to the queues $Q_{1,3}$, $Q_{3,3}$, and $Q_{4,3}$ are decreased as well.
The steady-state analysis is presented in Figures 9, 10, and 11, which show the mean delay in the inside-function queues, the mean delay in the NIC queues, and the durations of the off- and on-states on the output links. The values reported in all the figures have been derived as the mean values of the results of many simulation experiments, using Student's distribution and with a 95% confidence interval. The number of experiments carried out to evaluate each numerical result has been automatically decided by the simulation tool with the requirement of achieving a confidence interval less than 0.001 of the estimated mean value. To this purpose, the confidence interval is calculated at the end of each run, and simulation is stopped only when the confidence interval matches the maximum error requirement. The duration of each run has been chosen in such a way that the sample standard deviation is so low that fewer than 30 runs are enough to match the requirement. The figures compare the results achieved with the proposed strategy with the ones obtained with the two strategies RR and QLWRR. As said so far, results have been obtained against the burstiness, and for two different values of the utilization coefficient: that is, $\rho^{(P)}_{\mathrm{Low}} = 0.6$ and $\rho^{(P)}_{\mathrm{High}} = 0.9$. As expected, the mean lengths of the processor queues and the NIC queues increase with both the burstiness and the utilization coefficient.
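The stopping rule described above can be written compactly. The following is a sketch under two assumptions: that the runs are independent, and that the "confidence interval" compared with the 0.001 threshold is the half-width of the two-sided 95% interval. The symbols $n$, $\bar{x}_n$, and $s_n$ (number of runs, sample mean, and sample standard deviation of the per-run results) are introduced here for illustration only.

```latex
\bar{x}_n \;\pm\; t_{0.975,\,n-1}\,\frac{s_n}{\sqrt{n}},
\qquad
\text{stop when}\quad
\frac{t_{0.975,\,n-1}\; s_n / \sqrt{n}}{\bar{x}_n} \;<\; 0.001 ,
```

where $t_{0.975,\,n-1}$ is the 0.975 quantile of Student's distribution with $n-1$ degrees of freedom.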
Instead, we can note that the mean length of the processor queues is not affected by the applied policy. In fact, packets requiring the same function are enqueued in a unique queue and served with a rate $\mu^{(P)}/4$ when the RR or QLWRR strategies are applied, while, when the NARR strategy is used, they are split into 12 different queues and served with a rate $\mu^{(P)}/12$. In classical queueing theory [28] the second case is worse than the first one because of the waste of service rate during the more probable periods of empty queues; this is not the case here because the processor capacity of an empty queue is dynamically reassigned to the other queues.
The advantage of the NARR strategy is evident in Figure 10, where the mean delay in the NIC queues is represented. In fact, we can observe that, by only giving more processor rate to the most loaded processor queues (with the QLWRR strategy), performance improvements are negligible, while by applying the NARR strategy we are able to obtain a delay reduction of about 12% in the case of a more lightly loaded processor ($\rho^{(P)}_{\mathrm{Low}} = 0.6$), reaching 50% when the processor works with $\rho^{(P)}_{\mathrm{High}} = 0.9$. The performance gain achieved with the NARR strategy increases with burstiness and with the processor load, conditions that are both likely. In fact, the first condition is due to the high burstiness of Internet traffic; the second one holds as well because the processor should not be overdimensioned, for economic reasons; otherwise, if overdimensioned, it can be controlled with a processor rate management policy like the one presented in [29] in order to save energy.
Finally, Figure 11 shows the mean durations of the on- and off-periods on the node output links. Only one curve is shown for each case because we have considered a utilization coefficient $\rho^{(\mathrm{NIC})} = 0.5$ on each NIC queue, and therefore in the considered case we have $T_{\mathrm{ON}} = T_{\mathrm{OFF}}$. As expected, the mean on-off durations are higher when the processor rate is higher (i.e., lower utilization coefficient). This is because, when the processor is slower, its output rate is lower, and therefore batches of packets in the NIC queues are served quickly. These results can be used to model the output traffic of each SDN/NFV node as an Interrupted Poisson Process (IPP) [30]. This model can be iteratively used to represent the input traffic of other nodes, with the final target of realizing the model of a whole SDN/NFV network.
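To illustrate how such an output model could be used as input traffic for another node, the following Python sketch generates packet arrival times from an IPP: on and off periods with exponentially distributed durations, and Poisson arrivals during the on periods only. The function name and parameters are hypothetical, and the values passed to it would have to be taken from measured on-off durations; none are supplied here.

```python
import random


def ipp_arrivals(mean_on, mean_off, rate_on, horizon, seed=0):
    """Generate arrival times of an Interrupted Poisson Process.

    mean_on, mean_off: mean durations of the on and off periods (s)
    rate_on:           Poisson arrival rate during on periods (packets/s)
    horizon:           length of the simulated interval (s)
    The process is assumed to start at the beginning of an on period.
    """
    rng = random.Random(seed)
    arrivals = []
    t = 0.0
    while t < horizon:
        # Exponentially distributed on period starting at time t.
        on_end = t + rng.expovariate(1.0 / mean_on)
        # Poisson arrivals: exponential gaps while the source is on.
        arrival = t + rng.expovariate(rate_on)
        while arrival < on_end and arrival < horizon:
            arrivals.append(arrival)
            arrival += rng.expovariate(rate_on)
        # Exponentially distributed off period with no arrivals.
        t = on_end + rng.expovariate(1.0 / mean_off)
    return arrivals
```

The returned list is sorted by construction, so it can be fed directly to an event queue of a downstream node model.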

Conclusions and Future Work
This paper addresses the problem of intranode resource allocation by introducing NARR, a processor-sharing strategy that leverages the observation that, in any SDN/NFV node, packets that have received the service of a VNF are enqueued to wait for transmission through one of the output NICs. Therefore, the idea at the base of NARR is to dynamically change the slices of the CPU assigned to each VNF according to the state of the output NIC queues, giving more CPU to serve packets that will leave the node through the less-loaded NICs. In this way, waste of the NIC output link capacities is minimized, and consequently the overall delay experienced by packets traversing the nodes that implement NARR is reduced.
SDN/NFV nodes that implement NARR can coexist in the same network with nodes that use other strategies, so facilitating a gradual introduction in the Future Internet.
As a side contribution, the simulator tool, which is publicly available on the web, gives the on-off model of the output links associated with each NIC as one of the results. This model can be used as a building block to realize a model for

Figure 3: Block diagram of the generic function $f_i$.

Figure 5: Packet rate assigned to the virtual queues.
