Location-Constrained Virtual Machine Placement (LCVP) Algorithm

Virtual machine (VM) placement is an active research topic in cloud computing. To impose location constraints on VMs during placement so that their requirements are met, this paper proposes the location-constrained VM placement (LCVP) algorithm. In LCVP, each VM can only be placed onto one of its specified candidate physical machines (PMs) with enough computing resources, and there must be sufficient bandwidth between the selected PMs to meet the communication requirements of the corresponding VMs. Simulation results show that LCVP is feasible and outperforms the benchmark algorithms in terms of computation time and blocking probability.


Introduction
The evolutionary advancements in the field of technology have led to the rise of cloud computing [1]. With the popularity of cloud computing, VM placement has received more and more attention. Jobs arrive from the cloud consumer and are analyzed to produce multiple corresponding subtasks that VMs can run directly. The physical resources provided by a data center are used to build VMs to support the execution of these subtasks. The part that connects the two is called VM placement, which is concerned with mapping VMs to PMs in a data center [2,3].
There has been some research on VM placement. The traditional approaches treat VM placement as the well-known bin-packing problem [4,5], which assumes that the performance of task execution can always satisfy cloud consumers as long as the requested computing resources are less than the total available resources in a data center. However, this assumption ignores factors such as available bandwidth resources and geographical distances between the selected PMs. Specifically, if the available physical resources in a data center are sufficient but belong to different PMs far away from each other, using these resources to build VMs may easily lead to network congestion, which further delays task execution [6]. Based on this consideration, the authors in [6] proposed a new algorithm for data-intensive distributed applications, which promotes the effective execution of these applications by giving bandwidth higher priority over PMs. Unfortunately, the performance of the algorithm cannot be guaranteed, and it may end up with worse solutions. In order to finish tasks on time, a backfilling algorithm to execute deadline-based tasks was proposed in [7]. By introducing the idea of the double auction, the mechanism in [8] bridges users' task requirements and providers' resources in two-sided cloud markets, so that the purchase prices of physical resources for building VMs are as close as possible to their true value. Mann et al. [2] found that VM placement and VM selection influence each other significantly and in a highly nontrivial way, and accordingly proposed a problem formulation for their joint optimization. However, they only gave some preliminary theory, and much work remains for further research. Based on [2], Pascual et al. proposed a new combined optimization model to study how a task affects others on the same PM [9], according to the sizes and types of user tasks.
The authors in [10] studied the design and implementation of virtual machine management strategies for energy-efficient cloud data centers and proposed a distributed approach to energy-efficient dynamic virtual machine consolidation. In order to enhance security, a virtual resource mapping algorithm was proposed in [11], which configures resources based on evaluating and detecting the threats and vulnerabilities of VMs. However, the algorithm misses the opportunity to place multiple VMs onto one PM, which reduces the resource utilization of a data center. Zhao et al. proposed a function model relating the performance of task executions, the resource costs, and the impact of multiple tasks on the same PM [12], and then proposed an optimization algorithm based on the model. Cortez et al. used machine learning to predict the resource costs of VMs in order to reserve resources and improve resource utilization more effectively [13].
In cloud computing, the data center acts as an infrastructure that provides physical resources for VMs [14], and one of the important problems is determining mappings of VMs to PMs with different objectives, such as optimizing costs, profits, or performance [15]. As the basic unit for supplying services to cloud consumers, VMs are placed very frequently. If multiple VMs that need to communicate with each other are placed onto PMs far away from each other, substantial bandwidth will inevitably be occupied to maintain communication between the VMs, which wastes network resources and may lead to network congestion [11]. Moreover, there are usually thousands of PMs in a data center. If the PMs are searched for numerous cloud user requests without any constraints, the complexity of resource management will increase exponentially. Therefore, considering the factors mentioned above and the requirements of functionality, security, and availability [16], it is necessary to impose constraints on VMs. In this article, the constraints come from location requirements.
In summary, the major contributions of this paper are as follows:
(i) We provide a new perspective for VM placement and formulate the problem of imposing location constraints on VMs in cloud computing.
(ii) Based on this new perspective, we propose an algorithm to generate the desired solution.
(iii) We conduct and analyze extensive simulations to demonstrate the effectiveness of LCVP. The results show that our algorithm achieves better performance and lower computation time.
In this paper, the LCVP algorithm, which takes location constraints into account, is proposed. Simulation results show that, compared with the existing algorithms, LCVP achieves its goal and outperforms the benchmarks in terms of computation time and blocking probability. The rest of this paper is organized as follows. Section 2 first describes the problem solved in this paper, then presents the models, and finally formulates the feasible solution. Section 3 describes the LCVP algorithm, and its performance is evaluated in Section 4. Finally, the conclusion is given in Section 5.

VM Placement
2.1. Problem Description. As mentioned above, it is necessary to impose location constraints on VMs, and the problem solved in this paper is mapping the requested VMs onto appropriate PMs under multiple constraints. Specifically, each VM should be placed onto one of the specified candidate PMs, which are defined by a preferred location and a radius (the location constraint). Each selected candidate PM should have sufficient computing resources to host the corresponding VM (the computing capacity constraint). In addition, there should be sufficient bandwidth between each pair of selected PMs to maintain the communication between the corresponding VMs (the bandwidth capacity constraint). The objective of LCVP is to serve as many customer requests as possible and to maximize the resource utilization of a data center subject to these constraints.

Model Definition.
The physical resources provided by a data center to supply cloud computing services are represented as an undirected graph called the PM-graph. The VMs needed by a cloud consumer to run his/her tasks are represented as an undirected graph called the VM-graph.

PM-Graph. It indicates the physical resources provided by a data center.
The PM-graph is defined as an undirected graph $G^s(V^s, E^s)$, where $V^s$ represents the set of PMs and $E^s$ represents the set of communication connections between PMs. Such a connection is called a physical link in the following sections. Each PM $v^s \in V^s$ has a computing capacity $c^s_{v^s}$ and each physical link $e^s \in E^s$ has a bandwidth capacity $b^s_{e^s}$. In addition, each PM is associated with a location $l^s_{v^s}$. $P^s$ is the set of loopless paths in $G^s(V^s, E^s)$ and $P^s_{v^s}$ is the set of loopless paths that start or end at node $v^s$.

VM-Graph.
It indicates a customer request, or in other words, the VMs needed by a cloud customer to finish his/her jobs before the deadline. The VM-graph is defined as an undirected graph $G^r(V^r, E^r)$, where $V^r$ is the set of VMs requested by a cloud consumer and $E^r$ is the set of communication connections that represent the bandwidth requirements for data flow between the corresponding VMs. Such a connection between VMs is called a virtual link (VL) in the following sections. Each VM $v^r \in V^r$ has a computing requirement $c^r_{v^r}$ and each VL $e^r \in E^r$ has a bandwidth requirement $b^r_{e^r}$. In addition, for LCVP, each VM $v^r \in V^r$ has a preferred location, denoted as $l^r_{v^r}$. VM $v^r$ can only be placed onto the candidate PM(s) defined by the preferred location $l^r_{v^r}$ and a radius $\rho$; this is the location constraint considered in this paper. The set of candidate PM(s) is denoted as $\Phi^s_{v^r}$, i.e.,
$$\Phi^s_{v^r} = \{ u^s \in V^s : \| l^s_{u^s} - l^r_{v^r} \| \le \rho \},$$

Scientific Programming
where $\| l^s_{u^s} - l^r_{v^r} \|$ is the distance between the two locations. Each VL $e^r \in E^r$ is also associated with a candidate physical path set $P^s_{e^r}$, which includes all the paths between the candidate PMs of the two end-nodes of $e^r$, i.e.,
$$P^s_{e^r} = \{ p^s \in P^s : p^s \text{ connects a PM in } \Phi^s_{e^{r+}} \text{ and a PM in } \Phi^s_{e^{r-}} \},$$
where $e^{r+}$ and $e^{r-}$ are the two end-nodes of $e^r$. $\bar{P}^s_{e^r}$ indicates the subset of $P^s_{e^r}$ with sufficient bandwidth to carry VL $e^r$. VM-graphs are derived from the customer requests. Specifically, a cloud consumer submits jobs, and the jobs are then analyzed to generate the corresponding subtasks that can be run directly on VMs. After this, the corresponding VM-graph is generated according to the degree of parallelism in the subtasks and the communication dependencies among them.
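As an illustration of the location constraint, the following minimal sketch builds the candidate PM set of one VM. It assumes planar coordinates and Euclidean distance, which the text does not fix explicitly; the function name `candidate_pms` and the sample coordinates are hypothetical.

```python
import math


def candidate_pms(pm_locations, preferred_location, rho):
    """Return the PMs located within distance rho of the VM's preferred location.

    pm_locations: dict mapping a PM name to its (x, y) location.
    Euclidean distance is an assumption made for this sketch.
    """
    px, py = preferred_location
    return {
        pm
        for pm, (x, y) in pm_locations.items()
        if math.hypot(x - px, y - py) <= rho
    }


# Hypothetical example: three PMs on a 100 x 100 grid, radius 20.
pm_locations = {"pm1": (10.0, 10.0), "pm2": (30.0, 10.0), "pm3": (80.0, 90.0)}
candidates = candidate_pms(pm_locations, (15.0, 10.0), 20.0)
```

Here `pm1` and `pm2` lie within the radius and `pm3` does not, so only the first two become candidates.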

Compatibility Graph.
The CG is a graph structure denoted as $G^c(V^c, E^c)$, in which each node $v^c \in V^c$ represents a candidate physical path for a VL $e^r \in E^r$, and the nodes in the same row represent all candidate physical paths for the same VL in the VM-graph. Here, $f_C(\cdot)$ is the function that returns the row number of a node in the CG, and $f_N^{-1}(\cdot)$ is the inverse function of $f_N(\cdot)$, which returns the virtual link that a candidate such as $e^s_1$ is a candidate for. Each link in a CG denotes that its two end-nodes are compatible. Specifically, if two physical paths are compatible, a link is inserted to connect their corresponding nodes in the CG; there is no link between incompatible nodes.
Here, "compatible" means that two compatible physical paths can carry two VLs in the same VM-graph simultaneously. Specifically, the compatible physical paths should satisfy the following: (1) they are the candidate paths for two adjacent VLs and have one PM as the common end-node or (2) they are the candidate paths for two VLs that are not adjacent.
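The compatibility rule can be sketched as a small predicate. This is a sketch under stated assumptions rather than the paper's implementation: a CG node is represented as a hypothetical pair `(vl, ends)`, where `vl` is a tuple of two VM names and `ends` maps each of those VMs to the PM at the matching end of the candidate path. The sketch also assumes that "one PM as the common end-node" means both paths terminate at the same PM for the VM that the two adjacent VLs share, and that two candidates for the same VL are never compatible.

```python
def compatible(node_a, node_b):
    """Check whether two candidate physical paths can coexist in one solution."""
    vl_a, ends_a = node_a
    vl_b, ends_b = node_b
    if set(vl_a) == set(vl_b):
        # Same row of the CG: two candidates for the same VL (assumption).
        return False
    shared = set(vl_a) & set(vl_b)
    if not shared:
        # Case (2): the two VLs are not adjacent.
        return True
    # Case (1): adjacent VLs must agree on the PM hosting the shared VM.
    vm = next(iter(shared))
    return ends_a[vm] == ends_b[vm]


# Hypothetical nodes: VL (vm1, vm2) and VL (vm2, vm3) share vm2.
a = (("vm1", "vm2"), {"vm1": "pmA", "vm2": "pmB"})
b = (("vm2", "vm3"), {"vm2": "pmB", "vm3": "pmC"})
c = (("vm2", "vm3"), {"vm2": "pmD", "vm3": "pmC"})
d = (("vm3", "vm4"), {"vm3": "pmC", "vm4": "pmE"})
```

In this example `a` and `b` are compatible because both place `vm2` on `pmB`, `a` and `c` are not, and `a` and `d` are compatible because their VLs are not adjacent.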

Feasible Solution.
A feasible solution should satisfy the following demands. Each PM selected to host a VM should satisfy the location constraint, the computing capacity constraint, and the one-to-one mapping constraint. Each physical link selected for a VL should satisfy the bandwidth capacity constraint and the link mapping constraint. On this basis, as many VMs as possible should be placed to maximize the resource utilization of a data center.
Specifically, each VM can only be placed onto one of its candidate PMs, i.e.,
$$f_N(v^r) \in \Phi^s_{v^r}, \quad \forall v^r \in V^r, \tag{5}$$
where $f_N(\cdot)$ denotes the mapping relation between VMs and their candidates. Each VM in the same VM-graph can only be placed onto a single PM, and any two different VMs in a single VM-graph cannot be placed onto the same PM, i.e.,
$$f_N(v^r_1) = f_N(v^r_2) \iff v^r_1 = v^r_2, \quad \forall v^r_1, v^r_2 \in V^r. \tag{6}$$
If a VM is split and placed onto multiple PMs, additional bandwidth will inevitably be occupied to support the VM's internal communication. If multiple VMs are placed onto one PM, the customer request will be vulnerable to physical resource failures [17]. In addition, multiple VMs sharing the same PM are vulnerable to resource competition, which may cause performance interference among VMs and thus lead to VM performance degradation [18]. Each selected PM should have sufficient resources for the corresponding VM, i.e.,
$$c^r_{v^r} \le c^s_{f_N(v^r)}, \quad \forall v^r \in V^r. \tag{7}$$
Furthermore, each VL should be placed onto one physical path connecting the PMs that its two end-nodes are placed onto, i.e.,
$$f_L(e^r) \in P^s \text{ with end-nodes } f_N(e^{r+}) \text{ and } f_N(e^{r-}), \quad \forall e^r \in E^r, \tag{8}$$
where $f_L(\cdot)$ is the link mapping relation between VLs and physical paths, and $e^{r+}$ and $e^{r-}$ are the two end-nodes of VL $e^r$. The allocated bandwidth on each physical link should not exceed its bandwidth capacity, i.e.,
$$\sum_{e^r \in E^r} \sum_{p^s \in P^s} b^{e^r}_{p^s} I^{e^s}_{p^s} \le b^s_{e^s}, \quad \forall e^s \in E^s, \tag{9}$$
where $b^{e^r}_{p^s}$ indicates the bandwidth allocated to accommodate VL $e^r$ on the physical path $p^s$, and $I^{e^s}_{p^s}$ indicates whether $p^s$ traverses $e^s$: $I^{e^s}_{p^s} = 1$ if $p^s$ traverses $e^s$, and $I^{e^s}_{p^s} = 0$ otherwise.
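The constraints above can be checked mechanically for a given placement. The following is a minimal sketch, assuming a hypothetical data layout that is not taken from the paper: capacities and requirements are dictionaries, a physical path is a list of PM names, and a physical link is keyed by the `frozenset` of its two end PMs.

```python
def is_feasible(vm_req, vl_req, pm_cap, link_cap, candidates, node_map, link_map):
    """Check location, one-to-one, computing, link-mapping and bandwidth constraints."""
    if set(node_map) != set(vm_req):
        return False  # every VM must be placed onto exactly one PM
    if len(set(node_map.values())) != len(node_map):
        return False  # one-to-one: no two VMs share a PM
    for vm, pm in node_map.items():
        if pm not in candidates[vm]:
            return False  # location constraint
        if vm_req[vm] > pm_cap[pm]:
            return False  # computing capacity constraint
    load = {}
    for (u, v), bandwidth in vl_req.items():
        path = link_map.get((u, v))
        if not path or {path[0], path[-1]} != {node_map[u], node_map[v]}:
            return False  # link mapping: path must join the two hosting PMs
        for hop in zip(path, path[1:]):
            link = frozenset(hop)
            if link not in link_cap:
                return False  # path uses a link that does not exist
            load[link] = load.get(link, 0) + bandwidth
    # Bandwidth capacity: total allocation per physical link within capacity.
    return all(load[link] <= link_cap[link] for link in load)


# Hypothetical example: two VMs, one VL, a three-PM line A - B - C.
pm_cap = {"A": 4, "B": 4, "C": 4}
link_cap = {frozenset(("A", "B")): 10, frozenset(("B", "C")): 10}
vm_req = {"v1": 2, "v2": 3}
candidates = {"v1": {"A"}, "v2": {"B", "C"}}
node_map = {"v1": "A", "v2": "C"}
link_map = {("v1", "v2"): ["A", "B", "C"]}
ok = is_feasible(vm_req, {("v1", "v2"): 5}, pm_cap, link_cap,
                 candidates, node_map, link_map)
too_much = is_feasible(vm_req, {("v1", "v2"): 15}, pm_cap, link_cap,
                       candidates, node_map, link_map)
```

The first call succeeds because 5 units fit on both links; the second fails the bandwidth check because 15 exceeds the link capacity of 10.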
It should be clearly noted that there may be one or more feasible solutions, but only one of them will be the desired one that LCVP finally provides to the corresponding cloud consumer. As illustrated in the previous section, the objective of LCVP is to serve as many customer requests as possible and maximize the resource utilization of a data center. Hence, LCVP will select one with lower blocking probability from feasible solutions for a cloud consumer, that is, the desired solution.

Preprocessing.
First of all, LCVP preprocesses all candidate PM sets upon receipt of the VM-graph and PM-graph. This is because of the one-to-one mapping between VMs and PMs. The detailed procedure is given in Algorithm 1.
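A possible reading of the preprocessing step is sketched below. It follows the comments that survive in Algorithm 1 (collect, for each PM, the VMs that list it as a candidate, and keep a contested PM only for the VM with the smallest candidate set). The first step, dropping PMs whose computing capacity is too small, is an assumption, as are the function name `preprocess`, the tie-breaking order, and the sample data.

```python
def preprocess(candidates, vm_req, pm_cap):
    """Prune candidate PM sets before CG construction (illustrative sketch)."""
    # Assumed step: drop candidate PMs that cannot host the VM at all.
    cands = {
        vm: {pm for pm in pms if pm_cap[pm] >= vm_req[vm]}
        for vm, pms in candidates.items()
    }
    # For each PM, record the VMs that consider it a candidate.
    wanted_by = {}
    for vm in sorted(cands):
        for pm in cands[vm]:
            wanted_by.setdefault(pm, []).append(vm)
    # A PM wanted by several VMs stays only with the VM that has the
    # smallest candidate set; it is removed from the other sets.
    for pm in sorted(wanted_by):
        vms = wanted_by[pm]
        if len(vms) > 1:
            keeper = min(vms, key=lambda vm: len(cands[vm]))
            for vm in vms:
                if vm != keeper:
                    cands[vm].discard(pm)
    return cands


# Hypothetical example with three VMs competing for PMs A and B.
pruned = preprocess(
    {"v1": {"A"}, "v2": {"A", "B"}, "v3": {"B", "C"}},
    {"v1": 1, "v2": 1, "v3": 1},
    {"A": 2, "B": 2, "C": 2},
)
```

In this example `v1` keeps `A` because it has no alternative, which pushes `v2` to `B` and `v3` to `C`.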

CG Construction.
After preprocessing, LCVP constructs the CG from the VM-graph and PM-graph. The CG structure has been described in the previous section, so it is not repeated here. The body of CG construction is shown in Algorithm 2.

Heuristic Maximum Clique Algorithm.
With the CG, the problem of imposing location constraints on VMs is transformed into the maximum clique problem. Since the nodes in the same row of the CG represent all candidate physical paths for a particular VL, selecting a node from a row means that the VL has found a feasible physical link. A feasible solution has been found once LCVP has selected one node from each row of the CG. The next step is therefore to find, in the constructed CG, the maximum clique that minimizes the total resource consumption, that is, the desired solution. The resource consumption is calculated from $f_L$, the function between the selected physical link and virtual link $e^r$, and $b^{e^r}_{p^s}$, the bandwidth allocated on $p^s$ for $e^r$ in the PM-graph. The CG has the useful property that a maximal clique in it is also a maximum one as long as there are sufficient bandwidth resources, which has been proven in [19]. Specifically, a feasible solution exists as long as at least one node can be selected in each row. With this property, the problem of finding the maximum clique is reduced to finding a maximal one, which reduces the computation time of LCVP.
For the purpose of load balancing, the weight $h$ of a candidate physical path is defined as its hop-count divided by its available bandwidth. According to the cannikin law, the available bandwidth of a physical path is determined by the physical link with the smallest available capacity:
$$h_{p^s} = \frac{|p^s|}{\min_{e^s \in p^s} b^s_{e^s} + \delta},$$
where $|p^s|$ denotes the hop-count of the physical path carrying virtual link $e^r$, the minimum is taken over the available bandwidth of the links on $p^s$, and $\delta$ is a small positive number that avoids a zero denominator.
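The weight definition translates directly into code. In this minimal sketch a path is a list of PM names and the available bandwidth of each physical link is stored under the `frozenset` of its two end PMs; this layout, the function name `path_weight`, and the default value of `delta` are assumptions made for illustration.

```python
def path_weight(path, available_bw, delta=1e-6):
    """Hop-count divided by the bottleneck available bandwidth (plus delta)."""
    hops = len(path) - 1
    bottleneck = min(available_bw[frozenset(hop)] for hop in zip(path, path[1:]))
    return hops / (bottleneck + delta)


# Hypothetical example: a two-hop path whose weakest link has 4 units left.
available_bw = {frozenset(("A", "B")): 10.0, frozenset(("B", "C")): 4.0}
weight = path_weight(["A", "B", "C"], available_bw)
```

A path with few hops and a wide bottleneck gets a small weight, so preferring small weights steers load toward short, lightly used paths.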
As mentioned previously, LCVP finds the desired maximal clique in the constructed CG by using the heuristic maximum clique algorithm (Algorithm 3). $f_{SP}$ is the function that returns the physical path that a node in the CG represents, while $f^{-1}_{SP}$ is its inverse function.
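The row-by-row selection can be sketched as a greedy procedure. This is an illustrative simplification, not a reproduction of Algorithm 3: it processes VLs in nonincreasing order of bandwidth requirement (as the surviving fragment of the algorithm indicates), picks the lowest-weight node that is compatible with every node chosen so far, and gives up if a row has no usable node. Bandwidth bookkeeping between selections is omitted, and the parameter names are hypothetical.

```python
def greedy_clique(rows, bandwidth, weight, compatible):
    """Pick one mutually compatible node per CG row, or return None if blocked.

    rows: dict mapping each VL to the list of its candidate nodes.
    bandwidth: dict mapping each VL to its bandwidth requirement.
    weight: function returning the weight of a node.
    compatible: function deciding whether two nodes are compatible.
    """
    clique = {}
    for vl in sorted(rows, key=lambda e: bandwidth[e], reverse=True):
        options = [
            node
            for node in rows[vl]
            if all(compatible(node, chosen) for chosen in clique.values())
        ]
        if not options:
            return None  # no node can be selected in this row
        clique[vl] = min(options, key=weight)
    return clique


# Hypothetical CG with two rows; p2 and q1 are incompatible.
rows = {"e1": ["p1", "p2"], "e2": ["q1", "q2"]}
bandwidth = {"e1": 5, "e2": 3}
weights = {"p1": 0.2, "p2": 0.1, "q1": 0.3, "q2": 0.4}
incompatible = {frozenset(("p2", "q1"))}
result = greedy_clique(
    rows,
    bandwidth,
    weights.get,
    lambda a, b: frozenset((a, b)) not in incompatible,
)
```

In this example `e1` is handled first and takes `p2`, its lowest-weight node; `q1` is then excluded for `e2`, so `q2` is selected.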

Desired Solution Generation.
In the end, since each row of a CG represents all the candidate physical paths of a particular VL and each node represents a candidate physical path for a VL, the desired solution can be generated by traversing the maximum clique found in the previous section.

Performance Metrics.
The objective of LCVP is to serve as many customer requests as possible and maximize the resource utilization of a data center.
The solution found through LCVP should satisfy the constraints (5)-(9), that is, it should be a feasible solution. Hence, the customer request blocking probability is used as the performance metric. A blocked request is one for which LCVP fails to generate a feasible solution because of insufficient physical resources in the data center.
Blocking Probability: it is defined as the ratio of blocked requests to total arrived requests, i.e.,
$$BP = \frac{|\Omega_b(T)|}{|\Omega_a(T)| + |\Omega_b(T)|},$$
where $\Omega_a(T)$ and $\Omega_b(T)$ denote the sets of accepted and blocked customer requests during $[0, T]$, respectively.
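The metric reduces to a simple ratio of counters. A minimal sketch follows; the function name and the sample counts are hypothetical, and returning 0 when no request has arrived is a convention chosen here, not one stated in the text.

```python
def blocking_probability(num_accepted, num_blocked):
    """Ratio of blocked requests to all arrived requests."""
    total = num_accepted + num_blocked
    if total == 0:
        return 0.0  # convention for an empty observation window (assumption)
    return num_blocked / total


# Hypothetical counts: 940 accepted and 60 blocked requests.
bp = blocking_probability(940, 60)
```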

Simulation Step.
In order to verify the feasibility and time-efficiency of LCVP, simulation experiments are conducted. The simulation environment is MATLAB 2012b running on a computer with a 3.10 GHz Intel Core i3-2100 CPU and 4.00 GB RAM. The PM-graph and all VM-graphs are randomly generated with GT-ITM [20], a tool for randomly generating network topologies. There are 50 PMs and 172 physical links in the PM-graph, and all the PMs are located within a 100 × 100 grid. The PM-graphs used in the simulations are shown in Figures 1 and 2.
In each VM-graph, the number of VMs is chosen randomly between 2 and 10, and the probability of connecting any two VMs is 0.5. The preferred location $l^r_{v^r}$ of each VM is also randomly located in the 100 × 100 grid, and the candidate PM set for each VM consists of the PMs located within the circle that is centered at $l^r_{v^r}$ and has a radius of $\rho$. The default value of $\rho$ is 20.
The VM-graphs mentioned above arrive according to a Poisson process. The average arrival rate of the Poisson process is $\lambda$ VM-graphs per time-unit and the average holding time of each VM-graph is $1/\mu$ time-units. Therefore, the traffic load of VM-graphs is $\lambda/\mu$ in Erlangs. To facilitate the following experiments, Yen's algorithm [21] is used to precalculate the K shortest physical paths between each PM-pair.
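The arrival model can be sketched with exponential inter-arrival times. Exponentially distributed holding times are an assumption here (the text only gives the mean $1/\mu$), and the function names, seed, and the values $\lambda = 2$ and $\mu = 0.1$ are hypothetical; they are chosen only so that the offered load equals 20 Erlangs.

```python
import random


def generate_requests(lam, mu, horizon, seed=0):
    """Return (arrival_time, departure_time) pairs for a Poisson arrival process."""
    rng = random.Random(seed)
    now = 0.0
    requests = []
    while True:
        now += rng.expovariate(lam)  # exponential inter-arrival time, mean 1/lam
        if now > horizon:
            break
        holding = rng.expovariate(mu)  # assumed exponential holding time, mean 1/mu
        requests.append((now, now + holding))
    return requests


def offered_load(lam, mu):
    """Traffic load in Erlangs."""
    return lam / mu


requests = generate_requests(lam=2.0, mu=0.1, horizon=100.0)
load = offered_load(2.0, 0.1)
```

Each pair gives the time at which a VM-graph arrives and the time at which its resources would be released if the request is accepted.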
The algorithms proposed in [22] and their modified versions are used as the benchmarks, which are denoted as DViNE, DViNE-LB, DViNE-KSP, and DViNE-LB-KSP, respectively. The algorithms ending with "KSP" are the modified versions that precalculate the K shortest physical paths, while those including "LB" are the benchmark algorithms that consider load balance. Apart from the algorithms ending with "KSP," the others are the original ones proposed in [22]. LCVP also precalculates the K shortest physical paths by using Yen's algorithm and uses these paths to construct the CG. Hence, there is no modified version ending with "KSP" for LCVP. The reason for using the modified versions is that the original ones do not limit the number of candidate physical paths for each virtual link, which is impractical for a data center with massive resources.

Simulation Results. On the PM-graph shown in Figures 1 and 2, the simulation experiment was conducted for 500000 time-units under a fixed traffic load of 20 Erlangs to evaluate the performance of the algorithms. The results are shown in Figure 3.
From Figure 3, it can be seen that the blocking probability of LCVP stays at about 6%, which is lower than that of the benchmark algorithms, because LCVP simultaneously considers the capacities of PMs, the bandwidth capacities, and the hop-count of physical links. Moreover, the algorithms that consider the network bandwidth load have a lower blocking probability than those that do not. The algorithms that consider the network bandwidth load tend to conserve sufficient bandwidth on each physical link to achieve load balance, which leaves more available physical links for subsequent customer requests. Experiments are also conducted to compare the computation times of LCVP and the benchmark algorithms. To present the results directly and briefly, the computation time of LCVP is used as the basis, and the normalized computation times are shown in Figure 4. The algorithms that precalculate the K shortest physical paths are faster than the others because they not only save the time needed to find the paths but also avoid an unlimited search for potential candidate paths, which would lead to a longer search time. In addition, it should be emphasized that the computation time of LCVP includes the time for running Yen's algorithm, preprocessing, constructing the CG, and finding the maximum clique. It can be seen that even when the radius $\rho$ is set to 40, the computation time of LCVP is still much lower than that of the benchmarks. This is due to the useful property of the CG, which reduces the complexity of LCVP.
Finally, the relationship between the blocking probability and the radius $\rho$ is examined. For different values of $\rho$, i.e., different sizes of the candidate PM sets, the results are shown in Figure 5. It can be seen that even when $\rho$ is 15, the blocking probability of LCVP is only 7%, which is much lower than that of the benchmarks. Furthermore, as $\rho$ increases, the blocking probability decreases to 1%. This is because when $\rho$ increases, the number of candidate PMs for each VM also increases, which gives LCVP more choices for finding the desired solution. However, the results of the benchmark algorithms do not change significantly as $\rho$ increases, because these algorithms have no performance guarantee and may generate worse or even infeasible solutions.
Algorithm 1: Preprocessing (fragment).
    …
                remove $v^s$ from $\Phi^s_{v^r}$;
            endif
        endfor
    endfor
    //for each PM, find out all VMs that consider the PM as a candidate and save the VMs in $\Psi_{v^s}$.
    …
        endfor
    endfor
    //for each PM whose $|\Psi_{v^s}| > 1$, only reserve it as the candidate for the VM whose candidate PM set has the smallest size and remove it from other candidate sets.
    …

Algorithm 2: CG construction (fragment).
    …
            if $\min_{e^s \in p} b^s_{e^s} \ge b^r_{e^r}$ then
                Insert a node in $V^c$ to represent $p$;
            else
                remove $p$ from $P^s_{e^r}$;
            endif
        endfor
        if $P^s_{e^r} = \emptyset$ then
            return (FALSE); //return construction failure status
        endif
    …

Algorithm 3: Heuristic maximum clique algorithm (fragment).
    input: CG $G^c$, PM-graph $G^s$, VM-graph $G^r$
    output: Maximum clique $M$
    $M \leftarrow \emptyset$;
    $P_N \leftarrow f_{SP}(V^c)$; //put all corresponding physical links of the nodes in CG into the set $P_N$
    for each $e^r \in E^r$ in nonincreasing order of $b^r_{e^r}$ do
        //obtain the physical links with sufficient bandwidth to carry $e^r$ and store them in P
    …

Conclusions
In cloud computing, VM placement is an important research direction. The LCVP algorithm proposed in this paper places VMs onto PMs under location constraints and ensures that there is at least one physical link between the selected PMs. The experimental results show that LCVP serves more customer requests and achieves a lower blocking probability than the benchmarks, with much lower computation time. Future work will focus on a modified version of LCVP that supports multipath-based link mapping.

Data Availability
The data used can be found at https://pan.baidu.com/s/1-bjl1P6TM_qJQB09ctfcXw.