Optimized Virtual Machine Placement with Traffic-Aware Balancing in Data Center Networks

Virtualization is an efficient method for fully utilizing computing resources such as servers. The way virtual machines (VMs) are placed among a large pool of servers greatly affects the performance of data center networks (DCNs). As network resources have become a main bottleneck of the performance of DCNs, we concentrate on VM placement with Traffic-Aware Balancing to evenly utilize the links in DCNs. In this paper, we first propose a Virtual Machine Placement Problem with Traffic-Aware Balancing (VMPPTB), prove it to be NP-hard, and design a Longest Processing Time Based Placement algorithm (LPTBP algorithm) to solve it. To take advantage of communication locality, we propose the Locality-Aware Virtual Machine Placement Problem with Traffic-Aware Balancing (LVMPPTB), a multiobjective optimization problem that simultaneously minimizes the maximum number of VM partitions of requests and the maximum bandwidth occupancy on uplinks of Top of Rack (ToR) switches. We also prove it to be NP-hard and design a heuristic algorithm (Least-Load First Based Placement algorithm, LLBP algorithm) to solve it. Extensive simulations show that the proposed heuristic algorithm significantly balances the bandwidth occupancy on uplinks of ToR switches while keeping the number of VM partitions of each request small.


Introduction
As virtualization technology [1] becomes the mainstream way to multiplex various physical resources in modern cloud data centers, the effective and efficient placement of virtual machines (VMs) becomes an important issue. Mechanisms such as VMware Capacity Planner [2] and Novell PlateSpin Recon [3] consolidate VMs such that the consumption of CPU, memory, and power is optimized. Owing to the increasing deployment of communication-intensive applications like MapReduce [4], the data center network (DCN) is becoming the bottleneck of application performance and scalability. Thus, mechanisms that do not consider network resources are not sufficient in cloud data centers.
A three-layer tree-like architecture is prevalent in modern data centers [5], as shown in Figure 1. This kind of architecture, however, inherently suffers from a scalability issue because links connected to the core layer switches usually carry more traffic from the lower layers. Physical machines (PMs) connected to the same Top of Rack (ToR) switch can communicate at full line speed, whereas the traffic between PMs connected to different ToR switches has to traverse links of the core layer, which is often the bottleneck of DCNs. In this paper, we place VMs on PMs by effectively balancing traffic in the core layer of DCNs to minimize the maximum bandwidth occupancy on uplinks of the ToR switches.
The scalability of DCNs has attracted great attention from academia recently. Several recent works address this issue by designing new DCN architectures, aiming at maximizing the network bisection bandwidth [6][7][8] and reducing the overall oversubscription ratio of the DCN. Other papers [9][10][11] address this issue by optimizing VM placements on PMs with different optimization goals. The bisection bandwidth can be improved; however, the available bandwidth in the core layer cannot be fully utilized. The share of highly utilized links in the core layer never exceeds 25% [12], which means a great number of links in the core layer are underutilized. Our proposed VM placement with Traffic-Aware Balancing can evenly spread traffic across links of the core layer to utilize the high bisection bandwidth. Since link utilizations in the core/aggregation layers are higher than those in the ToR layer and a significant fraction of the core links appear as hotspots persistently [12,13], we only consider the traffic across the core layer of DCNs; the traffic between VMs of the same request under the same ToR switch does not contribute any traffic towards the core layer.
We formally defined our Virtual Machine Placement Problem with Traffic Balancing (VMPPTB) as an optimization problem, which minimizes the maximum bandwidth occupancy on uplinks of each ToR switch. We proved its NP-hardness by a reduction from the Multiprocessor Scheduling Problem [14] and designed a Longest Processing Time Based Placement algorithm (LPTBP algorithm) to solve it. The LPTBP algorithm provides a near-optimal solution to the VMPPTB problem; however, the generated placement schema tends to evenly place VMs of each request under every ToR switch. Because VMs of a tenant's request only communicate with other VMs within the same request (communication locality property), we should reduce the number of VM partitions of each request at the same time (i.e., placing VMs of the same request on as few PMs as possible). We further proposed the Locality-Aware Virtual Machine Placement Problem with Traffic Balancing (LVMPPTB), which aims to minimize the maximum number of VM partitions of each request and the maximum bandwidth occupancy on uplinks of ToR switches simultaneously. We also proved it to be NP-hard by a reduction from VMPPTB and designed a Least-Load First Based Placement algorithm (LLBP algorithm) to solve it.
We summarize our contributions as follows: (i) (iii) We designed a Greedy Based Placement algorithm (GBP algorithm) as the baseline and took the LPTBP algorithm as the near-optimal reference for traffic balancing. We evaluated the performance of LLBP through extensive simulations and compared it with the GBP and LPTBP algorithms.
The rest of this paper is organized as follows. We briefly present related work in Section 2. Problem formulation and computational complexity proofs are presented in Section 3. In Section 4, we describe the LPTBP and LLBP algorithms. We evaluate our algorithms in Section 5. Finally, we conclude in Section 6.

Related Work
In [9], the authors addressed the network scalability issue by using traffic-aware virtual machine (VM) placement. They defined the placement problem as an optimization problem to minimize the communication costs of all the VMs, where communication cost is defined as the number of hops between each VM pair. They assumed that the traffic matrices between virtual machines are known in advance. The generated placement scheme can reduce the communication distance between VMs with large traffic and reduce the traffic aggregated into the higher levels of the data center network (DCN) architecture.
In [10], the authors proposed jointly optimizing virtual machine placement and route selection. They strove to minimize the average congestion rate of every link in DCNs. Their placement strategy performs better in topologies with rich connectivity and path diversity. Although the average congestion rate is minimized, the traffic in DCNs may not be significantly balanced. The traffic matrix is also assumed to be known in advance.
In [11], the authors presented a Min-Cut Ratio-Aware VM Placement (MCRVMP) problem. They tried to minimize the maximum ratio of demand to capacity across all cuts in the network, where a cut is defined as a set of links that partition the hosts into two disjoint connected components, the capacity is the sum of the capacities of the links, and the demand is the total traffic from either side of the hosts. In this way, each network cut may have spare capacity to absorb unpredicted traffic bursts. This work applies only to medium-sized data centers, and traffic rates are assumed to be known in advance.
Knowing the traffic matrix in advance is a very strong assumption. In [15], the authors proposed to adopt the product traffic pattern model to characterize the traffic rates between VMs, where the traffic rate is defined as the product of the activity levels of two communicating VMs. Similar to the objective of [9], they tended to place more active VMs into physical machines (PMs) with less communication cost. The proposed product traffic pattern is unrealistic to some degree. The generated placement scheme may also result in hotspots: when VMs with higher activity levels are placed in hosts connected to the same switch, the uplinks of that switch may become hotspots.
In [16], the authors formulated the placement problem as an optimization problem to minimize the total cost caused by network traffic and utilization of PMs. For a fixed number of PMs, the authors proposed three different traffic cost functions to solve the optimization problem. The data center topology, however, is not taken into consideration when designing the traffic cost functions.

Problem Formulation
In this section, we define the VM placement problem based on tree-like architectures such as fat-tree [6] and VL2 [7]. Data centers usually leverage three-layer tree-like architectures, in which physical machines (PMs) are directly connected with Top of Rack (ToR) switches, ToR switches are connected with aggregation switches, and aggregation switches are further connected with core switches. The tree-like topology, however, is often oversubscribed, because a higher level link has to carry traffic from several lower level links. As the higher level links are often the bottlenecks and hotspots [12], we seek to balance traffic on links of the core layer of the data center networks (DCNs).
In the VM placement problem, the traffic between VMs connected to the same ToR switch does not cross the core switch layer of DCNs. We only need to consider traffic on the uplinks of ToR switches, since the traffic can be dynamically load balanced among the aggregation layer and the core layer using load balancing mechanisms like Valiant Load Balancing [6]. By properly placing VMs of multiple requests, we can utilize every link of the core layer of the DCNs more evenly and effectively and thus minimize the maximum bandwidth occupancy on uplinks of ToR switches.

VMPPTB Problem.
We define the VM placement problem in the scenario where the DCN contains $N$ ToR switches, represented as a set $T = \{t_1, t_2, \ldots, t_N\}$. For each ToR switch, we can place up to $K$ VMs in one PM connected with it. Let $l_i$ be the uplink of the $i$th ToR switch, and let $B_{l_i}$ be the accumulated bandwidth occupancy on uplink $l_i$. Suppose there are $M$ requests $R = \{r_1, r_2, \ldots, r_M\}$ from different tenants in the cloud data center, and the corresponding numbers of requested VMs are $S = \{s_1, s_2, \ldots, s_M \mid \forall j,\ s_j > 0\}$.
If $s_j \le K$, the requested VMs could be placed under the same ToR switch, and when $s_j > K$, the requested VMs must be partitioned into several PMs. The bandwidth requirement of VM $k$ of request $r_j$ is denoted as $b_{jk}$. If VM $k$ was placed under a ToR switch $t_i$, it would contribute $b_{jk}$ bandwidth occupancy on uplink $l_i$. We should properly place all the VMs of requests under the $N$ ToR switches to minimize the maximum bandwidth occupancy on the uplinks of ToR switches with the constraint that each VM should be placed under some ToR switch and the number of VMs placed under one ToR switch should be no more than $K$.
Let $x_{jk}^{i}$ be a binary indicator of whether VM $k$ of request $r_j$ is placed under ToR $t_i$. Consider
$$x_{jk}^{i} = \begin{cases} 1, & \text{if VM } k \text{ of request } r_j \text{ is placed under ToR switch } t_i, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
The descriptions of the symbols used in this paper are summarized in Notations.
We formally define the Virtual Machine Placement Problem with Traffic Balancing (VMPPTB) as follows:
$$\min \max_{1 \le i \le N} B_{l_i} \tag{2}$$
s.t.
$$\sum_{i=1}^{N} x_{jk}^{i} = 1, \quad \forall j \in \{1, 2, \ldots, M\},\ \forall k \in \{1, 2, \ldots, s_j\}, \tag{3}$$
$$\sum_{j=1}^{M} \sum_{k=1}^{s_j} x_{jk}^{i} \le K, \quad \forall i \in \{1, 2, \ldots, N\}, \tag{4}$$
$$B_{l_i} = \sum_{j=1}^{M} \sum_{k=1}^{s_j} x_{jk}^{i}\, b_{jk}, \quad \forall i \in \{1, 2, \ldots, N\}. \tag{5}$$
The objective function (2) minimizes the maximum bandwidth occupancy on uplinks of ToR switches. Constraint (3) ensures that each VM of the requests is placed under some ToR switch. Constraint (4) guarantees that the number of VMs under every ToR switch is not more than the capacity of one PM. Constraint (5) updates the bandwidth occupancy on uplink $l_i$ when we place a VM under ToR switch $t_i$.
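To make the formulation concrete, the following minimal Python sketch checks a candidate placement against the constraints and evaluates the objective. The function name `evaluate_placement`, the list-of-lists encoding, and the zero-based indices are illustrative assumptions and do not appear in the paper.

```python
def evaluate_placement(bandwidth, placement, num_tors, capacity):
    """Check a placement and return (max uplink occupancy, per-uplink occupancy).

    bandwidth[j][k] -- bandwidth requirement of VM k of request j
    placement[j][k] -- index of the ToR switch hosting that VM (assumed encoding)
    num_tors        -- number of ToR switches
    capacity        -- maximum number of VMs under one ToR switch
    """
    load = [0] * num_tors    # accumulated bandwidth occupancy per uplink
    count = [0] * num_tors   # number of VMs placed under each ToR switch

    for j, demands in enumerate(bandwidth):
        # Every VM must be placed under exactly one ToR switch.
        if len(placement[j]) != len(demands):
            raise ValueError(f"request {j}: every VM needs exactly one ToR switch")
        for k, demand in enumerate(demands):
            tor = placement[j][k]
            if not 0 <= tor < num_tors:
                raise ValueError(f"request {j}, VM {k}: unknown ToR switch {tor}")
            load[tor] += demand   # occupancy is the sum of hosted demands
            count[tor] += 1

    # No ToR switch may host more VMs than its capacity.
    if any(c > capacity for c in count):
        raise ValueError("capacity constraint violated")

    return max(load), load


if __name__ == "__main__":
    best, per_uplink = evaluate_placement(
        bandwidth=[[3, 2], [4]], placement=[[0, 1], [1]], num_tors=2, capacity=2
    )
    print(best, per_uplink)
```

Representing the placement as one ToR index per VM makes the "exactly one ToR switch per VM" constraint hold by construction, so only the capacity constraint needs an explicit check.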
We now prove that VMPPTB is an NP-hard problem.

Theorem 1. For the Virtual Machine Placement Problem with Traffic Balancing (VMPPTB) defined above, finding its optimal solution is NP-hard.
Proof. This can be proven by a reduction from the Multiprocessor Scheduling Problem (MSP). Given a set $J$ of independent jobs and a number of processors $m$, and given that job $J_k$ has length $p_k$, what is the minimum possible time required to schedule all jobs in $J$ on $m$ processors such that none overlap? The MSP is known to be an NP-hard problem [14].
Given an MSP instance, we treat the set $J$ as a request $r_j$ of some tenant, job $J_k$ as VM $k$ of request $r_j$, the length $p_k$ of job $J_k$ as the bandwidth requirement $b_{jk}$ of VM $k$ of request $r_j$, and the $m$ processors as $N$ ToR switches. This is an instance of VMPPTB, except that the number of VMs placed under a ToR switch is limited to $K$ in VMPPTB. However, we can set $K$ large enough that the constraint holds in any placement schema. In this way, the MSP can be reduced to the VMPPTB. Thus, the VMPPTB is an NP-hard problem.
Because VMPPTB coincides with MSP when the capacity constraint is not binding, we can adapt approximation algorithms designed for MSP to solve VMPPTB. A simple but classical algorithm called the Longest Processing Time (LPT) algorithm achieves an upper bound of $(4/3 - 1/(3m))\,\mathrm{OPT}$, where $m$ is the number of processors [17]. It is therefore feasible to design a Longest Processing Time Based Placement (LPTBP) algorithm to solve VMPPTB. We first sort the bandwidth requirements of the VMs of all requests in nonincreasing order. Then we place the VM with the maximum bandwidth requirement under the ToR switch whose uplink has the minimum bandwidth occupancy. We repeat the process until the VMs of all requests are placed on PMs.
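As a quick numeric illustration of the classical LPT guarantee, the factor $4/3 - 1/(3m)$ can be evaluated for a few values of $m$. This is a sketch that assumes the capacity constraint is not binding, so that the MSP guarantee carries over with $m$ equal to the number of ToR switches.

```latex
\[
\left.\left(\frac{4}{3}-\frac{1}{3m}\right)\right|_{m=2}=\frac{7}{6},
\qquad
\left.\left(\frac{4}{3}-\frac{1}{3m}\right)\right|_{m=4}=\frac{5}{4},
\qquad
\lim_{m\to\infty}\left(\frac{4}{3}-\frac{1}{3m}\right)=\frac{4}{3}.
\]
```

The factor grows with $m$ but never reaches $4/3$, so under the stated assumption the maximum uplink occupancy stays within about one third above the optimum.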

LVMPPTB
We also prove that LVMPPTB is an NP-hard problem.
Theorem 2. For the Locality-Aware Virtual Machine Placement Problem with Traffic Balancing (LVMPPTB), finding its optimal solution is NP-hard.
Proof. Since we have proven that the VMPPTB is an NP-hard problem, we can prove LVMPPTB's NP-hardness by a reduction from the VMPPTB. Consider the special instance of LVMPPTB in which there is only one request in the DCN. In this case, the number of VM partitions of this request equals the number of VMs of this request divided by the maximum number of VMs placed under a ToR switch, rounded up. The instance then turns out to be the same as the VMPPTB. If we can find an optimal solution to the VMPPTB, we can also find an optimal solution to this special instance of LVMPPTB and vice versa. As the VMPPTB is NP-hard, the LVMPPTB's NP-hardness is proven.
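The two quantities that LVMPPTB trades off, the maximum number of VM partitions per request and the maximum uplink occupancy, can be computed for any given placement. The sketch below is illustrative only: the name `lvmpptb_objectives` and the dictionary encoding `(request, vm) -> ToR index` are assumptions made for this example.

```python
def lvmpptb_objectives(bandwidth, placement, num_tors):
    """Return (max VM partitions per request, max uplink occupancy).

    bandwidth[j][k]   -- bandwidth requirement of VM k of request j
    placement[(j, k)] -- index of the ToR switch hosting that VM (assumed encoding)
    """
    load = [0] * num_tors    # bandwidth occupancy per ToR uplink
    tors_of_request = {}     # request index -> set of ToR switches it uses

    for (j, k), tor in placement.items():
        load[tor] += bandwidth[j][k]
        tors_of_request.setdefault(j, set()).add(tor)

    # The number of partitions of a request is the number of distinct
    # ToR switches that host at least one of its VMs.
    max_partitions = max((len(t) for t in tors_of_request.values()), default=0)
    return max_partitions, max(load)


if __name__ == "__main__":
    demands = [[5, 4, 3], [2, 2], [6]]
    where = {(0, 0): 0, (0, 1): 1, (0, 2): 1, (1, 0): 2, (1, 1): 2, (2, 0): 0}
    print(lvmpptb_objectives(demands, where, num_tors=3))
```

A placement that lowers one of the two values often raises the other, which is why the problem is treated as multiobjective.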

Algorithms
We have proven that the Virtual Machine Placement Problem with Traffic Balancing (VMPPTB) is an NP-hard problem.
Because the Multiprocessor Scheduling Problem reduces to the VMPPTB, we designed a Longest Processing Time Based Placement (LPTBP) algorithm to solve the VMPPTB, as shown in Algorithm 1.
The LPTBP algorithm aims to achieve balanced bandwidth occupancy of the uplinks of all ToR switches by always placing the VM with the maximum bandwidth requirement under the ToR switch with the minimum bandwidth occupancy. First, the bandwidth requirements $\{b_{jk}\}$ are put into a two-dimensional array V. The uplink occupancies $\{B_{l_i}\}$ and the per-switch VM counts $\{c_i\}$ are initially 0. For the VM with the maximum bandwidth requirement, the LPTBP algorithm selects the ToR switch with the minimum bandwidth occupancy, places the VM under it, and then updates V, B, and C. The LPTBP algorithm repeats the process until no more VMs can be placed under any ToR switch and outputs a virtual machine placement schema.
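The placement loop described above can be sketched in Python with a min-heap keyed on uplink occupancy. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the function name `lptbp`, the tuple layout on the heap, and the tie-breaking by ToR index are all choices made for this example.

```python
import heapq


def lptbp(bandwidth, num_tors, capacity):
    """Longest-processing-time style placement (illustrative sketch).

    bandwidth[j][k] -- bandwidth requirement of VM k of request j
    Returns (placement, load): placement maps (j, k) to a ToR index and
    load[i] is the resulting occupancy on the uplink of ToR switch i.
    """
    # Flatten all VMs and sort them by bandwidth requirement, largest first.
    vms = sorted(
        ((b, j, k) for j, demands in enumerate(bandwidth)
         for k, b in enumerate(demands)),
        key=lambda vm: -vm[0],
    )
    if len(vms) > num_tors * capacity:
        raise ValueError("not enough capacity for all VMs")

    load = [0] * num_tors
    # Heap entries are (occupancy, ToR index, VM count); ties go to the lower index.
    heap = [(0, i, 0) for i in range(num_tors)]
    placement = {}

    for b, j, k in vms:
        occupancy, tor, count = heapq.heappop(heap)  # least-loaded non-full ToR
        placement[(j, k)] = tor
        load[tor] = occupancy + b
        if count + 1 < capacity:                     # full switches leave the heap
            heapq.heappush(heap, (load[tor], tor, count + 1))

    return placement, load


if __name__ == "__main__":
    placement, load = lptbp([[5, 4], [3, 3], [2]], num_tors=2, capacity=3)
    print(load)
```

Dropping full switches from the heap keeps each placement step at logarithmic cost in the number of ToR switches, and the up-front capacity check guarantees the heap is never empty while VMs remain.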
Though the LPTBP algorithm inherits the LPT upper bound of $(4/3 - 1/(3m))\,\mathrm{OPT}$ [17], the output placement schema does not take advantage of the communication locality property. Hence, we propose a heuristic algorithm named Least-Load First Based Placement (LLBP) to solve the Locality-Aware Virtual Machine Placement Problem with Traffic Balancing (LVMPPTB), as shown in Algorithm 2.
The LLBP algorithm places the requests in nonincreasing order of their numbers of VMs. For each request, LLBP tries to find a minimum empty ToR switch set that can hold all its VMs. The ToR switches in an empty ToR switch set are fully available and host no VMs. If such an empty ToR switch set exists, LLBP places all VMs of the request, in nonincreasing order of bandwidth requirement, under the empty ToR switch set in the least-load first way (placing the VM with the maximum bandwidth requirement under the ToR switch with the minimum bandwidth occupancy). If no such empty ToR switch set exists for a request, LLBP tries to find a minimum ToR switch set to hold its VMs and also places the VMs under that ToR switch set in the least-load first way.
LLBP first places the VMs of the request with the maximum number of VMs. While an empty ToR switch set exists, the communication locality property can be met and the number of VM partitions stays small. LLBP repeats the process until no such empty ToR switch set exists. From then on, for the requests with smaller numbers of VMs, LLBP tries to find ToR switch sets to hold their VMs, which are placed under several ToR switches, and the numbers of VM partitions rise. The worst case is that each VM of a request is placed under a different ToR switch, in which case the communication locality property is hard to meet. However, as the number of VMs is smaller, the number of VM partitions is smaller and the overhead of communication between these VMs is smaller. The VMs of all requests are placed in a least-load first way, so the bandwidth occupancy on uplinks of ToR switches can be balanced significantly. Therefore, LLBP achieves a better trade-off between the number of VM partitions of requests and the bandwidth occupancy of uplinks of the ToR switches.
Input: $\{b_{jk}\}$: Set of bandwidth requirements of VMs of all requests; $\{B_{l_i}\}$: Set of cumulative bandwidth occupancy of the uplinks of all ToR switches; $\{c_i\}$: Set of cumulative numbers of VMs under ToR switches; $B_{\max}$: Maximum bandwidth capacity of an uplink of a ToR switch; $K$: Maximum number of VMs under a ToR switch.
(9) find a minimum empty ToR switch set $T_{j+1}$ to hold all VMs of request $r_{j+1}$, $|T_{j+1}| = \lceil s_{j+1}/K \rceil$
(10) if $T_{j+1} = \emptyset$ then
(11) find a minimum ToR switch set $T'_{j+1}$ to hold all VMs of request $r_{j+1}$
(12) end if
(13) place all VMs of request $r_{j+1}$ in nonincreasing order of bandwidth requirement under $T_{j+1}$ or $T'_{j+1}$ in the least-load first way under the constraints
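The LLBP procedure described in this section can be sketched as follows. The sketch is hedged in two places where the text leaves room for interpretation: when several empty ToR switches are available it simply takes the lowest-indexed ones, and when no empty set exists it assumes that a "minimum ToR switch set" is built greedily from the switches with the most free slots. The function name `llbp` and the data layout are likewise illustrative assumptions.

```python
import math


def llbp(bandwidth, num_tors, capacity):
    """Least-load first placement with locality (illustrative sketch).

    bandwidth[j][k] -- bandwidth requirement of VM k of request j
    Returns (placement, load) with placement[(j, k)] = ToR index.
    """
    load = [0] * num_tors    # uplink occupancy per ToR switch
    count = [0] * num_tors   # VMs already hosted per ToR switch
    placement = {}

    # Requests with more VMs are placed first.
    for j in sorted(range(len(bandwidth)), key=lambda r: -len(bandwidth[r])):
        size = len(bandwidth[j])
        need = math.ceil(size / capacity)
        empty = [i for i in range(num_tors) if count[i] == 0]

        if len(empty) >= need:
            chosen = empty[:need]            # minimum empty ToR switch set
        else:
            # Assumption: take switches with the most free slots first,
            # breaking ties by lower occupancy, until the request fits.
            chosen, free = [], 0
            for i in sorted(range(num_tors), key=lambda t: (count[t], load[t])):
                if free >= size:
                    break
                if count[i] < capacity:
                    chosen.append(i)
                    free += capacity - count[i]
            if free < size:
                raise ValueError("not enough capacity for request %d" % j)

        # Within the chosen set: largest demand first, least-loaded switch first.
        for k in sorted(range(size), key=lambda v: -bandwidth[j][v]):
            tor = min((t for t in chosen if count[t] < capacity),
                      key=lambda t: load[t])
            placement[(j, k)] = tor
            load[tor] += bandwidth[j][k]
            count[tor] += 1

    return placement, load


if __name__ == "__main__":
    placement, load = llbp([[5, 4, 3], [2, 2], [6]], num_tors=3, capacity=2)
    print(load)
```

Restricting each request to its chosen switch set is what bounds the number of VM partitions, while the inner least-load rule balances occupancy only within that set.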
We also design a Greedy Based Placement (GBP) algorithm, as shown in Algorithm 3. The GBP algorithm places the VMs of all requests sequentially under ToR switches in ToR switch order, which means that each VM of a request is placed as close as possible to the other VMs of the same request.
[...] and the LLBP algorithm is in the middle. According to the simulation results, the value for GBP is larger than 3, the value for LPTBP is nearly 1, and the value for LLBP is about 1.5. These values are close to the results of the example in Section 4.
Although the GBP algorithm places as many VMs of the same request as possible under the same ToR switch, fully taking advantage of the communication locality property, the bandwidth occupancy on uplinks of the ToR switches varies significantly. The LPTBP algorithm can spread the traffic of VMs evenly across all the uplinks of the ToR switches; however, it distributes the VMs of a request under several ToR switches. The proposed LLBP algorithm simultaneously balances the bandwidth occupancy on uplinks of all the ToR switches and takes advantage of the communication locality property. From the simulation results and analyses, we can conclude that the LLBP algorithm performs effectively under different scales of DCNs.
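For comparison, the GBP baseline discussed above can be sketched in a few lines. The function name `gbp` and the data layout are illustrative assumptions; the sketch simply fills ToR switches in index order with the VMs in request order.

```python
def gbp(bandwidth, num_tors, capacity):
    """Greedy sequential placement baseline (illustrative sketch).

    bandwidth[j][k] -- bandwidth requirement of VM k of request j
    Returns (placement, load) with placement[(j, k)] = ToR index.
    """
    load = [0] * num_tors
    count = [0] * num_tors
    placement = {}
    tor = 0  # current ToR switch; advanced only once it is full

    for j, demands in enumerate(bandwidth):
        for k, b in enumerate(demands):
            while tor < num_tors and count[tor] >= capacity:
                tor += 1
            if tor == num_tors:
                raise ValueError("not enough capacity for all VMs")
            placement[(j, k)] = tor
            load[tor] += b
            count[tor] += 1

    return placement, load


if __name__ == "__main__":
    placement, load = gbp([[5, 4, 3], [2, 2], [6]], num_tors=3, capacity=2)
    print(load, max(load) - min(load))
```

Because the bandwidth requirements play no role in the choice of switch, the spread between the most and least occupied uplinks depends entirely on the arrival order of the VMs.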
Other topologies include unstructured [27,28] and wireless topologies [29,30]. In DCNs, locality can be defined at different levels. For example, in the fat-tree [6] in Figure 4, we can logically categorize locality into ToR level (nodes 0 and 1), pod level (nodes 0 and 2), and tree level (nodes 0 and 4). Locality also differs physically, for example at server level, rack level, and row level. We can set different values for the various locality levels in the optimization.

Different Optimization Goal.
In Section 3, we proposed the Locality-Aware Virtual Machine Placement Problem with Traffic-Aware Balancing (LVMPPTB), which is a multiobjective optimization problem of simultaneously minimizing the maximum number of VM partitions of requests and minimizing the maximum bandwidth occupancy on uplinks of ToR switches. We can also optimize one objective while the other objective is fixed. For example, we can restrict the maximum bandwidth on the uplink of a ToR switch to a fixed value and then optimize the number of VM partitions of requests.

Online Balancing.
The VM placement problem arises when applications are first deployed, and it is addressed by an offline algorithm. Once the VMs of applications are running, various problems can occur at different times, and we should use an online algorithm to perform VM migration for load balancing and fault tolerance.
We can also consider the correlations between several VMs to optimize the VM migration [31].

Fault Tolerance.
In LPTBP and LLBP, we assumed that all requests of tenants were running normally. If requests are lost, we need a fault-tolerant mechanism to retrieve the lost requests. Timeout retransmission may be one way of solving the problem.

Conclusion
In this paper, we formulated the Virtual Machine Placement Problem with Traffic Balancing (VMPPTB) to balance traffic in data centers. We proved its NP-hardness and designed a Longest Processing Time Based Placement algorithm. To take advantage of the communication locality property, we proposed the Locality-Aware Virtual Machine Placement Problem with Traffic Balancing (LVMPPTB), which simultaneously minimizes the maximum number of VM partitions of each request and the maximum bandwidth occupancy on uplinks of ToR switches. We proved its NP-hardness and designed a Least-Load First Based Placement heuristic algorithm. We conducted extensive simulations to evaluate the performance of the algorithms.

Notations
$K$: Maximum number of VMs under a ToR switch
$c_i$: Number of VMs under ToR switch $t_i$
$l_i$: Uplink of ToR switch $t_i$
$B_{l_i}$: Accumulated bandwidth occupancy on uplink $l_i$
$B_{\max}$: Maximum bandwidth capacity of uplink
$R = \{r_j\}$: Set of requests from tenants
$S = \{s_j\}$: Set of number of requested VMs of each request
$b_{jk}$: Bandwidth requirement of VM $k$ of request $r_j$
$x_{jk}^{i}$: Indicator of whether VM $k$ of request $r_j$ is under $t_i$
$T_{r_j}$: Set of ToR switches connected with VMs of request $r_j$

Figure 1: An example of three-layer tree-like architecture.

Figure 2: An example of comparison on the GBP, LLBP, and LPTBP algorithms.

Figure 3: Minimum and maximum bandwidth requirement of different algorithms under different simulation settings.

LVMPPTB Problem. The LPTBP algorithm can generate an approximately optimal solution to VMPPTB; however, it does not take the communication locality of the VMs of a request into consideration (i.e., VMs of a request only communicate with other VMs within the same request). If all VMs of a request are placed under as few ToR switches as possible, the traffic forwarded to the core layer is reduced and hotspots in the core layer are further mitigated. Let $T_{r_j} = \{t_i \mid \exists k \in \{1, 2, \ldots, s_j\} : x_{jk}^{i} = 1\}$ be a subset of $T$, which contains the ToR switches under which the VMs of request $r_j$ are placed. $|T_{r_j}|$ denotes the number of VM partitions of request $r_j$, which should be minimized to effectively take advantage of the communication locality property. The multiobjective optimization problem named Locality-Aware Virtual Machine Placement Problem with Traffic Balancing (LVMPPTB) aims to minimize the maximum bandwidth occupancy on the uplinks of ToR switches and minimize the maximum number of VM partitions of all the requests simultaneously, which is formally defined as follows:
$$\min \max_{1 \le i \le N} B_{l_i}, \qquad \min \max_{1 \le j \le M} |T_{r_j}|$$
s.t. the VMPPTB constraints on placement, capacity, and uplink occupancy for all $i \in \{1, 2, \ldots, N\}$, and
$$T_{r_j} = \{t_i \mid \exists k \in \{1, 2, \ldots, s_j\} : x_{jk}^{i} = 1\}, \quad \forall j \in \{1, 2, \ldots, M\}.$$