Proactive VNF Scaling with Heterogeneous Cloud Resources: Fusing Long Short-Term Memory Prediction and Cooperative Allocation



Introduction
Network function virtualization (NFV) plays an important role in future network innovation because it breaks the ossification and bloat of the network [1].
The purpose of NFV is to replace network functions (NFs) such as firewalls and load balancers with software running on existing commercial devices through virtualization technology, thus replacing the expensive, dedicated hardware (middleboxes) of traditional networks [2].
In recent years, with the emergence of smart devices and applications, communication networks have gradually merged into large-scale cloud architecture networks. More and more applications have transferred the necessary computing load to the cloud platform, which has greatly reduced the pressure on individual users [3,4]. Recently, NEC/Netcracker unveiled an NFV cloud marketplace by introducing a Network-as-a-Service solution [5]. While cloud-based applications can alleviate user hardware or software requirements and workloads, outsourcing the packet processing of network functions to the cloud leaves the complex scheduling and provisioning services to the NFV providers [6,7]. Hence, in the context of NFV, achieving high network resource utilization while meeting the Service Level Objective (SLO) is urgent for cloud service providers and means high profit returns [8,9].
Usually, in NFV service chains [10], NFs have diverse resource requirements: they may be traffic-intensive, computationally intensive, or memory-intensive. For example, intrusion detection systems and secure encryption systems require sufficient CPU resources, software routers rely more on memory resources, and network input and output devices rely more on bandwidth resources [11,12]. In addition to simple constraints, combinatorial constraints are also possible, and cloud services carry various job flows; for example, long-term and short-term job flows coexist, or two tasks may be required to be placed on two distinct machines [12,13]. Different NFs are now consolidated on the same physical server [14]. However, in the cloud center, the cluster machines that serve network flows are heterogeneous; one configuration example is shown in Table 1 [6]. Such heterogeneity makes accurate visibility into future resource demands difficult and reduces the effectiveness of traditional slot-based scheduling [6,7].
Several studies have sought better quality assurance and cost-effectiveness for traffic services. For example, the authors of [15] combined online learning and optimization to proactively forecast upcoming traffic demand and adjust VNF (virtual network function) deployment accordingly. The authors of [7] predict the service chain demand of flows through online learning, adjust the deployed instances of NFV cloud service providers according to the prediction, and solve the problems of new instance allocation and service chain routing. The researchers in [16] provided an online algorithm for dynamic VNF deployment and placement with minimal cost for the cloud data centers of NFV service providers. Another scheme [17] scales VNF resources vertically on the fly, which is not encouraged in most cases because it requires rebooting the system [7].
However, previous studies ignored the diversity of network function resource requirements and the heterogeneity of actual server resource configurations, which can easily result in resource fragmentation and hence low resource utilization [6,12]. Deep learning is a branch of machine learning that attempts to learn high-level features directly from the original data [18]. Inspired by the successful use of deep learning on traffic flow [19], in this paper we use a long short-term memory recurrent neural network (LSTM RNN) to predict customer service chain requests and the size of the flows in the NFV network. From the prediction results, we derive the traffic classification and the processing capacity requirement.
Then, we deploy virtual network functions on different virtual machines and collaboratively allocate multiple resources according to the demand for different types of resources.
We summarize the contributions of this work as follows: (1) We investigate the VNF provisioning and resource allocation problem using an LSTM RNN model to predict the type and amount of resource requirements. The LSTM RNN adapts well to time-varying demands; hence, we can dynamically allocate the correct amount of resources on the heterogeneous cluster machines for NFV cloud providers. (2) We offer a collaborative resource allocation approach across different resource types, which reduces resource fragmentation and increases resource utilization. (3) We evaluate our model on a real data set, and the correctness and efficiency of our method and analysis are validated by simulation studies.

Problem Description and Model
Problem Description.
The cluster machines are heterogeneous, and different NFs have different sensitivities to resource requirements; the goal is to improve the utilization of multiple resources. Therefore, we can describe the problem as follows: given a certain amount of resources (for example, CPU and memory) and the resource requirements of each NFV service chain job, how should network functions be configured and resources allocated to jobs to achieve high utilization under a heterogeneous resource configuration?
For each request flow, it is necessary to ensure a reasonable allocation of resources. Traditionally, simply balancing the traffic load results in inefficiencies and limits the processing power of the entire NFV cluster. Intuitively, the average load over all resources can be used as the allocation criterion. However, this approach has several problems: it does not take the actual configuration of the servers into account, so it causes resource fragmentation; moreover, much information about the request flows (required type, traffic volume, and so on) is neither learned nor considered [20,21].
For example, as shown in Figure 1, two VMs (virtual machines) in the server have three different resources: CPU, memory, and bandwidth. The angle brackets in the figure represent the service request vectors, whose values correspond to the three required resources, respectively. Because of the flexibility of NFV, different network functions can be deployed on the same server. In this example, the network function required for service 1 is deployed in VM1, the network function required for service 2 is deployed in VM2, and both VMs have the network function for flow 3. As shown in the figure, both VMs have the same average load of 0.25. Therefore, when the resources required by service 3 are allocated on the basis of average load, VM1 and VM2 are equally eligible. However, VM1 is under more pressure because its CPU resources are insufficient. Hence, it cannot accommodate the next request, and VM1 forms an unusable resource fragment. From a deployment point of view, if we can predict the functions and resource requirements of service 3, deploy the network functions required by service 3, and take complementary resource allocation into account, the resource utilization of VM2 will be improved while VM1 retains a higher share of available resources. Furthermore, the heterogeneity of server resources and the diversity of flow requirements increase the complexity of resource assignment. In Figure 2, the same value is used to represent the different resource capacities of the server, and t represents the unit time period.
The coordinated allocation of various resources is considered in Figure 2, where the resources are reserved for subsequent requests in a reasonable manner but without considering the characteristics of the flows. However, considering the life cycle of each service flow, if a service request needing more than 3 units of memory, such as request 〈3, 4, 2〉, arrives in the next eight time periods, the server will be unable to serve the request and will refuse it. Actually, after a period of time, the sum of the remaining resources of the server is sufficient to satisfy the user's request. So when we allocate resources while also considering the traffic flow characteristics, although one unit of CPU and bandwidth resources is wasted in server 1, the service provider can accept more services in the next 7 time periods. That is, the provider obtains more revenue while meeting customer service level goals.
In brief, cloud service resources are expensive and critical to the business. Different network function instances are used as intermediate processing nodes, and all applications need to meet specific requirements and constraints. How can we design solutions that dynamically scale VNF instances and coordinate them onto servers so as to adequately serve fluctuating input traffic and achieve long-term goals while meeting short-term resource needs over the long run of the system? The problem is a variant of the multidimensional bin packing problem, which is NP-hard [22].
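The weakness of average-load balancing described for Figure 1 can be illustrated with a small sketch. The usage and demand vectors below are hypothetical (the actual values in Figure 1 are not reproduced here); they are chosen only so that both VMs share an average load of 0.25 while differing in their CPU headroom.

```python
def average_load(used, capacity):
    """Mean utilization across resource types (CPU, memory, bandwidth)."""
    return sum(u / c for u, c in zip(used, capacity)) / len(capacity)


def fits(used, capacity, demand):
    """True only if every resource type has enough room for the demand."""
    return all(u + d <= c for u, c, d in zip(used, capacity, demand))


# Hypothetical numbers, in abstract resource units per VM.
capacity = (100, 100, 100)   # CPU, memory, bandwidth
vm1_used = (60, 10, 5)       # CPU-heavy usage
vm2_used = (25, 25, 25)      # evenly spread usage
demand = (50, 10, 10)        # a CPU-dominant request

for name, used in (("VM1", vm1_used), ("VM2", vm2_used)):
    print(name, average_load(used, capacity), fits(used, capacity, demand))
```

Both VMs look identical to an average-load criterion, yet only VM2 can host the CPU-dominant request, which is the per-resource view that the cooperative allocation relies on.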

Network and Flow Model.
In the problem model studied, we use $S$ to represent the set of network traffic flows. $|S|$ denotes the number of services in the set, and $[1, |S|]$ represents the set of integers from 1 to $|S|$; similar notation is used by analogy below. Thus, a service chain can be described as $s_i$, which corresponds to network service $i$. $F$ represents the virtual network function set, and $f_{ij} = 1$ indicates that virtual network function $i$ corresponds to network service $j$. In addition, for the server set $L$, $l \in [1, |L|]$ denotes the server node $l$.
The corresponding capacities of the various resource sets $R$ are represented by

Because different server nodes can be equipped with different virtual network functions, we define the node function set $V_l = \{v_u, v_w, \ldots, v_v\}$ equipped on server $l$ as a subset of the virtual network function set $F$. For example, the network function that can be processed by the server node $l$ is denoted as

Problem Model.
In the multiresource problem of NFV cloud service, for each network service request $s_i$, we define $f_{li} = 1$ to indicate that the required network functions are processed by server $l$, and we denote the amount of resource $r$ required as $d_{ijr}$. Therefore, the objective function of the resource utilization rate $u_l$ of server $l$ can be expressed as follows:

Hence, over the time periods $T = \{1, 2, \ldots, T\}$, the total demand of the service chain is (2)

Mathematical Problems in Engineering
When processing the function requests, in order to ensure that the scheduled resources of the network service meet the existing network conditions, the restrictions are expressed by equations (3) to (5). Constraint (3) ensures that the capacity of the various resources allocated to the service set does not exceed the total available resources. We define the binary variable $x_{ijk} = 1$ to indicate that function $j$ of service chain $i$ is processed by virtual machine $k$. Constraint (4) ensures that each network function required by the service flow $s_i$ is processed by at least one node. Constraint (5) introduces the variable $y^i_{uv} = 1$, which indicates that the directed edge $(u, v)$ belongs to the assigned source path; it ensures that the function nodes selected by the service chain form a loop-free path, where $z^i_{uv}$ is defined as the order number on the path. $N^s_p$ is a subset of the selected path node set $N_p$, and $h$ is an integer greater than or equal to 2 and less than $|N_p|$: (5)

Long Short-Term Memory (LSTM).
Neural network-based prediction can model nonlinear relationships and can theoretically approximate any continuous function to any desired accuracy [23]. With the development of artificial intelligence, deep learning has gradually been adopted in computational intelligence forecasting approaches. Among them, the RNN can capture the features of time series and map sequential input data to the output, which makes it naturally suitable for capturing the temporal evolution of flow. Later, long short-term memory (LSTM) was developed to solve the vanishing gradient problem along the RNN sequence [24]. The basic unit of LSTM is the memory block instead of the traditional neuron node. Each memory block contains a set of recurrently connected subnets and three multiplicative units, the input, output, and forget gates, which provide continuous analogues of write, read, and reset operations on the cells. Compared with a plain recurrent cell, the LSTM cell adds the forget gate, and the forget gates allow LSTM cells to remove or add information over long periods of time.
The evolution of flow can be considered as a temporal process. The incoming request traffic flow at the $t$th time interval is denoted by $a_i(t)$. At time $T$, the task is to predict the traffic flow $a_i(T + 1)$ based on the historical sequence $A = [a_i(t) \mid t = 1, 2, \ldots, T]$. Flows can be classified as long term or short term according to the definition. Suppose that the input historical traffic flow sequence is denoted as $X = (x_1, x_2, \ldots, x_T)$; the following equations are iterated by the LSTM network to compute the hidden vector sequence $h = (h_1, h_2, \ldots, h_T)$ and the output predicted flow sequence $Y = (y_1, y_2, \ldots, y_T)$:

where $H$ is the hidden layer function, $W$ denotes the input-hidden weight matrix, and $b$ denotes the bias vector. $H$ is implemented by the following composite function, which uses the sigmoid function:

The prediction evaluation metric is the common RMSE (root mean square error):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}$$
where $\hat{y}_i$ represents the predicted value, $y_i$ denotes the actual value, and $n$ is the number of predictions.
By training the model, we predict the type and size $a_i(t)$ of the flows that arrive in the next time slot $t + 1$ at each specified time point $t$. Because the source and destination of each flow are fixed, we can assume that the required functions can be predicted from the source and destination pairs. After that, the predicted result is used to deploy newly launched VNFs according to the current server resource configuration and usage.
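As a minimal sketch of the evaluation metric, the RMSE between predicted and observed flow values can be computed as below. The function name `rmse` and the sample values are illustrative assumptions; the study itself reports using scikit-learn for its metrics.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical predicted and observed flow sizes for four time slots.
predicted_flow = [2.0, 4.0, 6.0, 8.0]
observed_flow = [1.0, 5.0, 6.0, 10.0]
error = rmse(predicted_flow, observed_flow)
```

Because the errors are squared before averaging, a single badly missed traffic peak weighs more heavily than several small deviations.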
The allocation algorithm is used to solve the multidimensional bin packing problem in which the request instances and remaining resources change over time. In our deployment design, a heuristic approach selects, among the nodes that satisfy the demand, the one with the smallest remaining resource ratio, thereby leaving other nodes with more available capacity for allocating new instances. It is similar to relaxing the balance criteria to increase utilization levels in [25]; the difference is that our selection aims at minimizing resource fragmentation.
The selected reference standard for deployment can be expressed as
$$P_l = \sum_{r \in R} \frac{\mathit{Re}_l^r}{M_R^r} \qquad (9)$$
where $\mathit{Re}_l^r$ is the remaining amount of resource class $r$ on server $l$ and $M_R^r$ is the maximum remaining amount of resource class $r$ over the servers. The following is a simple example. Assume that the remaining resources of server A are 〈10, 6, 2〉, those of server B are 〈8, 4, 6〉, and those of server C are 〈7, 2, 9〉; the maximum value of each kind of resource is then 〈10, 6, 9〉, and the next requirement D is 〈6, 3, 4〉. Then $P_A = 10/10 + 6/6 + 2/9 \approx 2.22$, $P_B \approx 2.13$, and $P_C \approx 2.03$, so $P_C < P_B < P_A$. Server C has the smallest value but cannot meet the resource requirement, so server B is finally selected to deploy instance D.
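The worked example can be checked numerically. The sketch below assumes that the score P is the sum, over resource classes, of a server's remaining amount divided by the per-class maximum over all servers; this assumption reproduces the quoted values 2.22, 2.13, and 2.03. The function names are illustrative.

```python
def placement_score(remaining, max_remaining):
    """Sum of per-resource remaining ratios (assumed form of the score P)."""
    return sum(rem / top for rem, top in zip(remaining, max_remaining))


def select_server(servers, demand):
    """Pick the feasible server with the smallest score, or None if none fits."""
    max_remaining = [max(column) for column in zip(*servers.values())]
    feasible = {
        name: rem
        for name, rem in servers.items()
        if all(r >= d for r, d in zip(rem, demand))
    }
    if not feasible:
        return None
    return min(feasible, key=lambda n: placement_score(feasible[n], max_remaining))


servers = {"A": (10, 6, 2), "B": (8, 4, 6), "C": (7, 2, 9)}
demand_d = (6, 3, 4)

max_remaining = [max(column) for column in zip(*servers.values())]
scores = {n: round(placement_score(r, max_remaining), 2) for n, r in servers.items()}
chosen = select_server(servers, demand_d)
```

Filtering for feasibility before taking the minimum mirrors the example: server C scores lowest but lacks the second resource, so the choice falls to server B.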

Resource Coordination Allocation Algorithm.
Based on the predicted flows for a certain period $t$, the NF instances are dynamically adjusted according to the allocated and unused resources in the cluster servers, and for newly arrived workflows, the unallocated resources are allocated according to the resource requirements. Cooperative and complementary resource allocation heuristics aim at reducing resource fragmentation and achieving high resource utilization. In this problem, the overall utilization is driven by the arrival rate and scheduling of the jobs. Our algorithm is inspired by the packing strategy in [25] and the dominant resource idea in [11]. The algorithm first classifies the request flows by life cycle and then exploits the complementarity of dominant resources by maximizing the distance function over the resource requirements of the jobs; the distance function is shown in the following equation:

Figure 3 shows four similar requests together with the servers' available resources and network function deployment. It can be seen that the main resource requested by requests 1 and 2 is memory, and the main resource requested by requests 3 and 4 is CPU. The sum of the deviations of requests 1 and 3 is calculated to be 10, and that of requests 1 and 4 is 13; therefore, requests 1 and 4 are cooperatively packaged, and requests 2 and 3 are cooperatively allocated.
The allocation method adopts formulas (9) and (10). The detailed resource allocation algorithm for allocating unused resources to newly arrived jobs is listed below.
Input: network function service request set $S$, network function node set $F$, and the various resource capacities of the different nodes.
Step 1: pretreatment: iterate over the request service set $S$ to obtain each service $s_i$, its corresponding service function set $f_{ij}$, and the required resource quantities.
Step 2: classify the requests as long term or short term according to the life cycle of the flow. For each type of request flow, sort the service request set $S$ into $S_R$ in descending order according to the resource type $r \in R$.
Step 3: calculate the difference between the largest item in the resource type $S_r$ and the vector $S_{r'}$ of the other resource types, and calculate the distance of each request according to equation (10); package and pair the items with the largest deviation successively according to the calculation results.
Step 4: find the VM subset in the function node set that meets the matching resource requirements, calculate the corresponding value $P$ according to formula (9), take the node with the smallest value, and select the nodes that deploy the same type of stream for the allocation.
Step 5: if the job stream is long term and has a large percentage of jobs with small resource requirements, fit as many jobs as possible into the system at a time; otherwise, fill the largest jobs first according to equation (9).
Step 6: loop through the above steps until the assignment is complete.
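The pairing idea in Step 3 can be sketched as follows. Since the exact distance function is not reproduced here, the sketch assumes a simple stand-in: the sum of absolute differences between two demand vectors, with pairs restricted to requests whose dominant resources differ. The request names and demand values are hypothetical and are not those of Figure 3.

```python
from itertools import combinations


def dominant(demand):
    """Index of the resource type with the largest demand."""
    return max(range(len(demand)), key=lambda r: demand[r])


def distance(a, b):
    """Assumed stand-in for the distance function: sum of absolute deviations."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_complementary(requests):
    """Greedily pair requests with different dominant resources, largest distance first."""
    unpaired = dict(requests)
    pairs = []
    while len(unpaired) >= 2:
        candidates = [
            (distance(unpaired[u], unpaired[v]), u, v)
            for u, v in combinations(sorted(unpaired), 2)
            if dominant(unpaired[u]) != dominant(unpaired[v])
        ]
        if not candidates:
            break  # only same-dominant requests remain
        _, u, v = max(candidates)
        pairs.append((u, v))
        del unpaired[u], unpaired[v]
    return pairs, sorted(unpaired)


# Hypothetical (CPU, memory) demands: r1 and r2 are memory-dominant, r3 and r4 CPU-dominant.
requests = {"r1": (2, 8), "r2": (3, 6), "r3": (7, 2), "r4": (9, 1)}
pairs, leftover = pair_complementary(requests)
```

Pairing the most dissimilar demands first means a memory-heavy job shares a VM with a CPU-heavy one, so neither resource is exhausted while the other sits idle.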

Preliminary.
In this study, we use the online traffic data from [26] to train and test the model. Each record is derived from the .pcap files of the Skype traces, which contain the source address, destination address, timestamp, request stream size, protocol, and other information from real users. The traces contain network traffic captured on the main campus link [26], whose users can be administration staff, faculty members, or students. In each time slot, we regard the packets that have the same source and destination IP as a flow. The flow rate is calculated by dividing the total traffic size of the records in the flow by the length of a time slot. We extracted 290,148 network flows from the dataset. The data were split into training (70%), validation (10%), and testing (20%) sets. We used TensorFlow to implement all the models and the Python package scikit-learn to calculate performance metrics.
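The flow construction described above can be sketched in a few lines. The record layout `(timestamp, src_ip, dst_ip, size_bytes)`, the function name, and the sample packets are assumptions for illustration; parsing the .pcap files themselves is outside the scope of this sketch.

```python
from collections import defaultdict


def flow_rates(packets, slot_seconds):
    """Group packets by (time slot, source, destination) and return bytes per second."""
    totals = defaultdict(int)
    for timestamp, src_ip, dst_ip, size_bytes in packets:
        slot = int(timestamp // slot_seconds)   # index of the time slot
        totals[(slot, src_ip, dst_ip)] += size_bytes
    # Flow rate = total traffic size in the slot divided by the slot length.
    return {key: total / slot_seconds for key, total in totals.items()}


# Hypothetical packet records.
packets = [
    (0.2, "10.0.0.1", "10.0.0.2", 500),
    (0.7, "10.0.0.1", "10.0.0.2", 300),
    (1.1, "10.0.0.1", "10.0.0.2", 200),
    (0.4, "10.0.0.3", "10.0.0.2", 100),
]
rates = flow_rates(packets, slot_seconds=1.0)
```

Keying on the slot index as well as the address pair keeps the series ordered in time, so each source and destination pair yields one rate per slot.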
Research in [4] has shown that the traffic load on NF instances brings remarkable virtual machine overheads to the cloud proxy system. Moreover, a network flow consists of all packets sharing a unique bidirectional combination of source and destination IP addresses, transport protocol, and so on. Based on the flow records, the prediction of source and destination is used to infer the route of the request flow and the corresponding service chain. Hence, here we examine the prediction accuracy of traffic flow and service chain. The forecast results will be the basis for providing the VNF instances and aggregating service chains to better utilize the limited and valuable resources over time.
Unlike previous service prediction models, we need to distinguish the types of flows. In each time slot, we regard each source and destination pair as a type of NF service chain. In Figure 4, we replace the source and destination pairs of the requests with numbers to represent the request flows; the figure shows a good prediction for 200 requests. Although the result may deviate within a certain range because IP addresses are often divided according to area, the deviation range is acceptable and controllable. Since the data set is captured on the campus main link, the addresses are regular. In the future, we will study network flow classification further to extract more information about a current network flow [27]; here, we focus on examining the prediction accuracy of traffic flow, and the method can also be used for flow-type prediction. Before training, to gain an intuitive understanding of how the traffic of the target data fluctuates over time, we show part of the traffic fluctuation values collected from 10 am in Figure 5 and part of the statistics and variation of the number of requests per second in Figure 6. They show that the provider must decide where (or whether) to place runnable tasks tens of times per second and frequently even needs to restart tasks. Hence, it is important to effectively estimate upcoming traffic rates and adjust VNF deployment.
Following the research in [28], we know that, among previous forecasting approaches, the multilayer perceptron (MLP) performed well among the individual models. In preliminary tests, several experiments are designed to compare with MLP to validate the effectiveness of the proposed LSTM RNN model in traffic flow prediction. After many experiments, we use two hidden layers, with the numbers of hidden units set to 64 and 32, respectively. In addition, fully connected layers have been added to all models.

Validation and Analysis.
The excellent performance of LSTM RNN mainly benefits from its memory ability. We list the training results of MLP and LSTM in Table 2 to verify the ability of LSTM RNN to memorize historical data. As we can see from Table 2, as the input data length increases, the RMSE of the LSTM model initially decreases faster than that of MLP. As the input data length increases further, the decline of both slows down, but the RMSE of LSTM RNN is lower than that of MLP and remains relatively stable at a low level. Figure 7 compares the observed traffic flow values collected from 133.7 to 137.7 seconds, which contain more than 300 records, with the prediction values obtained by the LSTM RNN model. Intuitively, the prediction results are fairly good, and most of the fluctuations are captured. In contrast, the previous forecasting approach MLP [28] tends to fit the mean value. Figure 5 shows that the flow fluctuates over time, which means that the traffic flow in the preceding prediction interval has an impact on the current traffic flow, but the impact of long historical data is relatively small, and LSTM RNN benefits from its memory ability. It is worth noting that a longer input history means more training rounds and time.
The generalization capability is an important evaluation criterion for a prediction model. However, the statistics show about 10,000 flows per minute. We tried to aggregate the records that have the same source and destination IP and fall into the same time slot as one flow, but we found that this destroys the time series of the data and seriously affects the accuracy of the training. Limited by computing power, in this study we evaluated the generalization capability with different prediction intervals measured by the number of flows.
The prediction intervals used are 10, 20, 30, and 60, and the sliding window lengths are 32 and 64 [29]. The prediction results are shown in Table 3. From Table 3, we can see that the RMSE rises as the prediction horizon increases, but the RMSE of the LSTM model is lower and less sensitive to the prediction interval.
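The sliding-window setup can be sketched as below. The sketch assumes that a prediction interval of k means the target lies k steps after the last value in the input window; this interpretation, the function name `make_windows`, and the toy series are assumptions rather than details confirmed by the study.

```python
def make_windows(series, window, horizon=1):
    """Build (input window, target) pairs; the target lies `horizon` steps past the window."""
    inputs, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets


# Toy series standing in for per-flow traffic values.
toy_series = list(range(10))
toy_inputs, toy_targets = make_windows(toy_series, window=3, horizon=2)

# Sample counts for the window lengths and intervals named in the text,
# using a hypothetical series of 1,000 flow values.
long_series = [0.0] * 1000
sample_counts = {
    (w, k): len(make_windows(long_series, w, k)[0])
    for w in (32, 64)
    for k in (10, 20, 30, 60)
}
```

A longer window or a longer horizon leaves fewer training samples from the same trace, which is one practical cost of predicting further ahead.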
To evaluate the effectiveness of resource utilization, we compare with the schemes in [7] and [30]; the former adopts a single resource allocation method, and the latter aims at minimizing the standard deviation of all server loads. They minimize the system imbalance by prioritizing different resources, with the priority given based on their load distributions. Here, we used MATLAB to simulate the server CPU and memory resource configurations in Table 1 and randomly set a set of virtual machine nodes in each server. In the case shown in Figure 8, we fixed the lifetime of the flows and compared the average utilization ratio of multiple resources of the virtual machines to show the improvement in resource utilization. In the experiments, we … From the results compared with [7] (denoted as Single) and [30] (denoted as Priority), we find that the utilization improvement of the collaborative resource allocation mode is better in most cases. In the heterogeneous resource setting, the resource utilization of the cooperative mode showed a better performance distribution than that of the others. The reason is that they are designed to achieve balanced load distribution, while ours is devised to reduce resource fragmentation according to the complementary strategies of different resource-type requirements. Even though Jia et al. [30] migrate from the most heavily loaded node, the balance is based on the current load situation, without considering future resource requests, flow type, and dominant resources. In this study, LSTM is used to predict the resource usage, and the resources are then dynamically allocated to improve the resource utilization. This way of allocating collaborative resources to jobs is consistent with our intuition: when the cooperative resource consideration mode is adopted in the allocation, the probability of bottleneck resources arising in the virtual machines of a server is reduced, so that the remaining space can be fully allocated to other newly arrived jobs and the resource utilization is improved. Hence, cooperation results in a more balanced load distribution, and the performance of cooperation adapts to traffic variations.
We verified the advantages of considering dynamic deployment and complementary resource allocation above; here, we regard a VM as a slot. In Figure 9, a ratio of 1 corresponds to a perfect match between the task assignment and the slot, and a ratio less than 1 corresponds to underutilized resources. For more than 100 time periods, we use the method in [31] to set 10% of the business flows as long term. As shown in the figure, the multiservice flow approach provides higher resource utilization. This is because it better matches the resource allocation with the task demand cycle. A short-cycle flow typically consists of a single execution phase; in contrast, the completion time of a long-term workflow consists of many phases. Therefore, among the multiple service flows, small jobs complete quickly, which improves their response time without impacting the large jobs, and higher resource utilization can be achieved. Figure 10 depicts the utilization when VMs are allocated based on our scheme and on the method proposed in [32], in which dynamic scaling of resources in NFV is done by combining an offline method, which varies the window size along with different types of days, and the online method ESA (Exponential Smoothing Average) to detect the … [32] performs well in dealing with peaks, which benefits from ESA detecting the peaks quickly, but in the long run, our approach is more stable and performs better. This is partly because our approach considers multiple resource coordination and flow types, which reduces the resource fragmentation and the lifetime limitation of the flows, thereby improving the utilization of valuable network resources.

Conclusion
This paper proposes a deep learning-based forecasting strategy for the dynamic adjustment of network function instances in network function virtualization resource allocation in cloud services and then designs a collaborative strategy based on deep dynamic learning. The scheme realizes reasonable expansion and allocation of virtual network resources, effectively reducing resource fragmentation and flow life cycle constraints and thereby improving the utilization of valuable network resources. Finally, the experimental results based on actual data show that this research can effectively improve the ability to adjust virtualized network function resources and the utilization of network resources.

Figure 6: Statistics and variation of requests.

Figure 7: Comparison of observation and prediction traffic flow.

Figure 9: Comparison of different service flow scenarios

Table 1: Normalized CPU and memory configuration in Google Machine Cluster.

The input: $p_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$
The input gate: $i_t = \mathrm{sigmoid}(W_i [h_{t-1}, x_t] + b_i)$
The forget gate: $f_t = \mathrm{sigmoid}(W_f [h_{t-1}, x_t] + b_f)$
The output gate: $o_t = \mathrm{sigmoid}(W_o [h_{t-1}, x_t] + b_o)$
The state: $c_t = f_t \odot c_{t-1} + i_t \odot p_t$
The output: $h_t = o_t \odot \tanh(c_t)$
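The gate equations listed here can be traced step by step in a short NumPy sketch. This is an illustration of a single LSTM cell, not the TensorFlow model used in the study; the parameter layout (a dictionary keyed `W_c`, `b_c`, and so on), the helper names, and the random initialization are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the input, gate, state, and output equations."""
    z = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    p_t = np.tanh(params["W_c"] @ z + params["b_c"])     # candidate input
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])     # input gate
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])     # forget gate
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])     # output gate
    c_t = f_t * c_prev + i_t * p_t                       # cell state
    h_t = o_t * np.tanh(c_t)                             # hidden output
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("c", "i", "f", "o"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size))
        params[f"b_{gate}"] = np.zeros(hidden_size)
    return params


def run_sequence(xs, params, hidden_size):
    """Feed a scalar sequence through the cell and return the final hidden vector."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(np.atleast_1d(float(x)), h, c, params)
    return h


params = init_params(input_size=1, hidden_size=4)
final_hidden = run_sequence([0.1, 0.5, 0.3], params, hidden_size=4)
```

The forget gate scales the previous cell state while the input gate scales the new candidate, which is how the cell keeps or discards history over long sequences.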

Table 2: Prediction results of the RMSE.

Table 3: Prediction results of the RMSE.