AIMING: Resource Allocation with Latency Awareness for Federated-Cloud Applications

Federated clouds have been widely deployed as real-time applications grow in popularity, and allocating resources across clouds has therefore become nontrivial under stringent service requirements. The challenge lies in minimizing latency subject to virtual machine rental overhead and resource requirements, and it is further complicated by datacenter selection. To this end, we propose AIMING, a novel resource allocation approach that minimizes latency under a monetary overhead constraint in the federated-cloud context. Specifically, network resources are deployed and selected according to k-means clustering, and the total latency among datacenters is optimized with binary quadratic programming. The evaluation is conducted with real data traces. The results show that AIMING reduces total datacenter latency effectively compared with other approaches.


Introduction
Real-time applications continue to grow rapidly, with ever more functionality and ever more users around the globe. Because of this growth, major real-time application providers now use tens of geographically dispersed datacenters to support their services. Early real-time application providers have an urgent need to migrate their services from their own servers to distributed datacenters, both to ensure short latency and to compete with new application providers. Hence, the main unmet challenge of leveraging these datacenters is effectively allocating resources and optimizing latency for applications. The resources usually come in the form of virtual machines (VMs): application providers deploy VM configurations of different types, each consisting of a specific amount of CPU, memory, and disk resources.
It is challenging for application providers to determine the location and the number of VMs without any knowledge of the latency among datacenters (since application providers operate in the SaaS layer). However, cloud providers have their own fiber links interconnecting all major regions, and the latency among datacenters is stable and measurable.
This feature makes it feasible to determine the number of VMs and optimize delivery latency for application providers.
In this paper, we propose AIMING (k-meAns & bInary quadratic programMING), a resource allocation approach for VM placement that optimizes latency among datacenters for real-time application providers in a federated cloud. (A federated cloud is the deployment and management of multiple external and internal cloud computing services to meet business needs.) AIMING optimizes the total latency from the perspective of application providers and provides a VM rental strategy for real-time application providers, especially early providers who need to migrate their services to a federated cloud. We obtain the measured latency among datacenters and then select datacenters according to that latency. Several constraints have to be addressed.
(1) Choosing well-connected datacenters with low latency is essential, since there are dozens of datacenters around the world and the latency of the paths among them differs. Latency significantly affects the QoE of real-time application users, so ensuring low latency is the primary aim of application providers.
(2) Unlike a standalone datacenter, different datacenters in a federated cloud have different VM rental prices.

Related Work
A number of schemes have been proposed for latency and monetary overhead optimization in the cloud. In this section, we briefly review previous works related to ours from three aspects: standalone cloud resource allocation, federated-cloud resource allocation, and mobile cloud resource allocation.

Standalone Cloud Resource Allocation.
Many works allocate VMs in a standalone cloud to reduce latency and monetary overhead. Meng et al. [6] have proposed a two-tier approximation algorithm to efficiently place VMs. The algorithm consists of two components: SlotClustering and VMMinKcut. SlotClustering partitions n slots into k clusters, using the cost between slots as the partition criterion; VMMinKcut partitions n VMs into k VM clusters with minimum intercluster traffic. Fang et al. [7] have designed VMPlanner to optimize both VM placement and traffic flow routing so as to turn off unneeded network elements for power saving. An offline VM placement through emulated VM migration is proposed by Li et al. [8]: as long as the most suitable physical machine has enough capacity, the migration algorithm places the VM on that machine directly. Furthermore, they study a hybrid scheme that batches upcoming VMs for online scenarios. Ant colony optimization is deployed by Gao et al. for resource allocation in the cloud. Both works optimize the power consumption associated with CPU utilization.
In addition, Gao et al. [9, 10] also take the potential cost of wasted resources into consideration. A cloud resource allocation based on an imperfect information Stackelberg game (IISG) and a hidden Markov model (HMM) has been proposed by Wei et al. [11]. They first use the HMM to predict the service provider's current bid based on demand, and then establish the IISG to choose the optimal bidding strategy and achieve maximum profits. Li et al. [12] focus on cost optimization of network traffic and physical machine utilization. They solve the VM placement problem from two aspects: the first emphasizes the cost of network traffic with fixed physical machine cost for two cases (the numbers of requested VMs are the same or different); the second places VMs in the general case, taking both network cost and physical machine utilization cost into consideration. Bandwidth allocation is considered by Tian et al. and Yu et al. A distributed host-based bandwidth allocation design with an underlay congestion control layer and a private congestion control layer has been presented by Tian et al. [13]. To address the issue that existing works do not consider the impact of primary embedding on backup resource consumption, Yu et al. [14] have proposed a novel algorithm to compute the most efficient resource embedding for a given tenant request.

Federated-Cloud Resource Allocation.
Some notable schemes have been proposed for federated-cloud resource allocation. Jiao et al. and Wu and Madhyastha optimize network traffic through resource allocation in federated clouds. Multiobjective optimization for placing users' data over multiple clouds for socially aware services has been studied by Jiao et al. [15]. They build a framework that can accommodate different objectives, and an optimization approach based on graph cuts is leveraged to decompose the original problem. To minimize user-perceived latency, Wu and Madhyastha [16] have conducted a deep study of the latency benefits of deploying web services across AWS, Microsoft Azure, and Google Compute Engine. Their measurements show that replicating data is necessary to ensure low latency for users. The capacity of geodistributed cloud infrastructure has been studied by Narayanan et al. [17], who discuss several factors that influence capacity provisioning and illustrate the research challenges for software design.
A novel cloud brokering approach has been proposed by Tordsson et al. [18] to optimize the placement of virtual infrastructure across multiple clouds; they verify the feasibility of the approach on a high-throughput computing cluster. The communication overhead is considered by Lee et al. and Yi et al. Lee et al. [19] have proposed a distributed resource allocation approach for resource competition in federated clouds. In their approach, each job consists of tasks whose communication behavior can be profiled; according to this behavior, their approach minimizes the communication overhead and tries to balance grouped tasks under resource competition. From the perspective of customers, Yi et al. [20] focus on network-aware resource allocation and develop a mixed binary quadratic programming model to minimize the rental cost for each customer. Besides energy, bandwidth is also considered by Xu and Li [21].
To solve the large-scale optimization, a distributed algorithm based on the alternating direction method of multipliers has been proposed. To achieve energy efficiency and satisfy deadline constraints, Ding et al. [22] focus on dynamic scheduling of VMs. Mihailescu et al. and Wang et al. present pricing schemes for users according to their resource needs. Mihailescu and Teo [23] have discussed a strategy-proof dynamic pricing scheme suitable for allocating resources in federated clouds. A new cloud brokerage service is proposed by Wang et al. [24] that leverages both pricing benefits and multiplexing gains; the service reserves a large pool of instances and serves users with price discounts. Hung et al. [25] coordinate job scheduling across datacenters with low overhead while achieving near-optimal performance. A data hosting scheme has been proposed by Zhang et al. [26] to select several clouds to store data while minimizing monetary cost. Ardagna et al. [27] use game theory to manage runtime resources from multiple IaaS providers to multiple SaaS providers with a revenue and penalty cost model. Jiao et al. [28] allocate and reconfigure resources in a multitier resource pool. Palmieri et al. [29] optimize the overall communication and runtime resource utilization of cloud infrastructures by reoptimizing the communication paths between VMs and big data sources.

Mobile Cloud Resource Allocation.
The resource allocation problem is also a big challenge in the mobile cloud environment. Gai et al. [30] have proposed a dynamic energy-aware cloudlet-based mobile cloud computing model concentrating on the energy consumed during wireless communications; they solve the energy problem under a dynamic network and provide guidelines and theoretical support. To efficiently handle peak load and satisfy the requirements of remote program execution, Tong et al. [31] have deployed cloud servers at the network edge and designed the edge cloud as a tree hierarchy of geodistributed servers. Furthermore, they propose a workload placement algorithm to decide which edge cloud servers mobile programs should be placed on. There are also many works on mobile service optimization [32][33][34].
Different from the previous research, AIMING allocates resources by latency optimization under a monetary overhead constraint. The resources comprise CPU and memory. We combine k-means clustering and binary quadratic programming to solve the optimization problem.

Preliminary
To facilitate presenting the proposed approach, we first define the terms and notations used throughout the paper in Notations. Then, we illustrate the latency model and the monetary overhead model for AIMING.

Latency Model.
In this section, we introduce our latency model in detail. We first discuss the communication model of a real-time application between two users in a federated cloud through datacenters. Based on this communication model, we then model the latency of data transfer among VMs.
We assume that a real-time application is distributed on a federated cloud and provides service for users through the datacenter network. As shown in Figure 1, user1 and user2 are users of the real-time application, and server1 and server2 are deployed in datacenter1 and datacenter2, respectively. Server1 in datacenter1 is the nearest available application server to user1, and likewise for server2 and user2. The message source user1 connects with the destination user2 through datacenter1 and datacenter2.
In the federated-cloud scenario, any two VMs may contact each other and transfer data. We first show the latency model between two VMs for data transmission. Let VM_ijk denote the k-th VM of the j-th instance type in the i-th datacenter, and let x_ijk ∈ {0, 1} indicate whether VM_ijk is selected for rental (x_ijk = 1) or not (x_ijk = 0). The latency between two VMs approximately equals the latency between the datacenters where the VMs are located, since a datacenter is extremely large compared with its internal paths. Through t_ij(ping), the ping latency from the i-th to the j-th datacenter, and V(packet), the ping packet size, the latency between two VMs u and v placed in datacenters i and j is modeled as d(u, v) = V(u, v) · t_ij(ping)/V(packet), where V(u, v) is the transmission data volume between u and v. The total latency in formula (2) is then the sum of d(u, v) · x_u · x_v over all VM pairs, where N_dc is the number of datacenters where VMs are placed, N_type is the number of VM instance types, and N(VM_ij) is the number of VMs of the j-th instance type in the i-th datacenter. We consider the data communication among all VM pairs in formula (2). After finishing the latency model, we show the monetary overhead model in Section 3.2.
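As a concrete illustration, the total-latency objective can be sketched in a few lines of Python. The function name and data layout here are assumptions for illustration, not the paper's implementation: the binary x variables are represented as a set of selected VM ids, and each pair's contribution is the inter-datacenter latency weighted by the transmission volume.

```python
from itertools import combinations

def total_latency(vms, selected, dc_latency, volume):
    """Sketch of the total-latency objective (cf. formula (2)).

    vms:        list of (vm_id, datacenter) placements
    selected:   set of rented vm_ids (the binary x variables)
    dc_latency: dc_latency[a][b] = per-unit latency between datacenters a and b
    volume:     volume[(i, j)] = data volume transferred between VMs i and j
    """
    total = 0.0
    for (i, dc_i), (j, dc_j) in combinations(vms, 2):
        if i in selected and j in selected:  # contributes only when x_i * x_j == 1
            total += dc_latency[dc_i][dc_j] * volume.get((i, j), 0.0)
    return total
```

Only pairs in which both VMs are rented contribute, which is exactly the quadratic x_i·x_j term that makes the program binary quadratic rather than linear.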

Monetary Overhead Model.
We depict the monetary overhead model in this section. First, we discuss the data communication mode among VMs and the pricing mechanism of the federated cloud. Figure 2 shows the data communication model between two VMs. VM1 and VM2 have different rental prices, and the data communication between them also incurs expense. Thus, the total monetary overhead consists of two parts: VM rental overhead and data transmission overhead.
In addition, in the federated-cloud scenario, different cloud providers charge different prices. Even datacenters belonging to the same cloud provider may have different VM rental prices and data transmission prices. The pricing mechanism of the federated cloud is shown in Figure 3: there are three cloud providers A, B, and C, each with two datacenters. Users obtain the prices of the datacenters from the pricing module and choose the datacenters to rent VMs accordingly.
According to the above discussion, the total monetary overhead consists of two parts: VM rental overhead and data transmission overhead, and different datacenters have different VM rental prices and data transmission prices. We model the two parts separately: C_rental in formula (3) represents the rental monetary overhead, C_transmission in formula (4) represents the data transmission monetary overhead, and C_total in formula (5) is their sum.
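A minimal sketch of this two-part cost model might look as follows. All names are assumptions, and for simplicity prices are taken as flat per-hour rental rates and per-unit transmission rates:

```python
def monetary_overhead(rented, rental_price, hours, volume, tx_price):
    """Sketch of formulas (3)-(5): total cost = rental + transmission.

    rented:       set of rented VM ids
    rental_price: rental_price[v] = hourly rental price of VM v
    hours:        hours[v] = rental duration of VM v in hours
    volume:       volume[(u, v)] = data volume between VMs u and v
    tx_price:     tx_price[(u, v)] = per-unit data transmission price
    """
    # formula (3): rental overhead is price * duration over rented VMs
    rental = sum(rental_price[v] * hours[v] for v in rented)
    # formula (4): transmission overhead counts only pairs that are both rented
    transmission = sum(vol * tx_price[pair] for pair, vol in volume.items()
                       if pair[0] in rented and pair[1] in rented)
    # formula (5): the total is their sum
    return rental + transmission
```

The same budget expression later bounds the optimization in formula (12), so keeping the two terms separate makes it easy to see which one dominates as the VM count grows.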

Proposed Approach
In this section, we illustrate our resource allocation approach AIMING. In the real world, most real-time communications happen within a region, such as a country; thus, communications within a region are far more frequent than communications among regions. Based on this assumption, we optimize the total latency per region: we first divide the datacenters into k regions, and then resources are allocated within each region to minimize the total latency.

Datacenter Preprocess.
In this section, we preprocess the n datacenters by dividing them into k regions.
The distribution of datacenters is heterogeneous. Thus, for convenience, we divide the datacenters according to the end-to-end latency L_ij = t_ij(ping)/V(packet). We use k-means clustering to divide the datacenters into k regions. Given a set of datacenter latency observations (L_1, L_2, ..., L_n), where each observation is an n-dimensional real vector, k-means clustering aims to partition the n observations into k (k ⩽ n) sets S = {S_1, S_2, ..., S_k} so as to minimize the within-cluster sum of squares (WCSS). Hence, the objective is to find arg min_S Σ_{i=1}^{k} Σ_{L∈S_i} ||L − μ_i||², where μ_i is the mean of the points in S_i. Figure 4 is an example of datacenter clustering when the number of datacenters n equals 5 and the datacenters are divided into k = 2 regions. Users can specify the following types of deployment constraints: (1) VM hardware configuration constraints restrict the number of VMs of each instance type and the total infrastructure capacity need. Formula (8) is the per-instance-type VM number constraint, whose bound is the minimum number of VMs of instance type j in region R_i. Formula (9) is the total infrastructure capacity constraint of region R_i, whose bound is the minimum infrastructure capacity need of the region.
(2) Load balancing constraints are expressed as the VM number and infrastructure capacity need of each datacenter. Formula (10) is the per-datacenter VM number constraint: users specify, for each datacenter d with 1 ⩽ d ⩽ n_i, the minimum and maximum numbers of VMs it may host. Formula (11) is the per-datacenter infrastructure capacity constraint: users likewise specify the minimum and maximum capacity need of each datacenter. (3) The monetary overhead constraint is expressed as a budget on VM rental overhead and data transmission overhead; Budget_i in formula (12) is the monetary constraint of region R_i.
Moreover, the number of selected datacenters for region R_i is m_i. We should also note that each VM must be of exactly one instance type and placed in exactly one cloud.
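The k-means preprocessing step described above can be illustrated with a bare-bones implementation over latency vectors. This is a sketch assuming squared-Euclidean WCSS, not the authors' code:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over datacenter latency vectors.

    points: list of equal-length numeric tuples (one latency vector per datacenter)
    Returns (cluster assignment per point, final WCSS).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    wcss = sum(sum((a - b) ** 2 for a, b in zip(points[i], centers[assign[i]]))
               for i in range(len(points)))
    return assign, wcss
```

Running this for several values of k and plotting the resulting WCSS reproduces the elbow-style selection used later in the evaluation (Figure 5).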

Lemma 1. The region total latency optimization problem is NP-hard.
Proof. We prove NP-hardness by reduction from the 0-1 knapsack problem. Given a knapsack instance with m items, each with weight w_i and value v_i, along with a maximum weight capacity W, we construct a region total latency optimization instance as follows. The number of datacenters and the number of selected datacenters are both equal to 1. The CPU of each VM instance type is 0, which means there is no infrastructure capacity constraint, and the VM rental price is fixed. There are m links among VMs corresponding to the items. Letting the latency of each link between VMs be d_i, then −d_i corresponds to the value v_i, and the data transmission price corresponds to the weight w_i. The budget constraint is the only limitation of the datacenter and corresponds to the maximum weight capacity W. Hence, the 0-1 knapsack problem reduces to the region total latency optimization problem, and the latter is NP-hard.

There are multiple modelling languages and solvers that can be used for this optimization problem. We choose AMPL as the modelling language: AMPL has good support for sets, and its syntax is close to mathematical notation. Since AMPL also supports quadratic programming formulations such as the one described above, we use the CPLEX solver.
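The paper solves the model with AMPL and CPLEX. As a self-contained illustration of the binary quadratic program itself (not a practical solver, given the NP-hardness just shown), a toy instance can be solved by exhaustive search over the binary variables; all parameter names are assumptions:

```python
from itertools import product

def solve_toy_bqp(n, latency, cost, budget, min_vms):
    """Exhaustively minimize sum_{i<j} latency[i][j] * x_i * x_j.

    latency[i][j]: pairwise latency contribution if both VMs i and j are rented
    cost[i]:       rental cost of VM i
    budget:        monetary constraint (formula (12) analogue)
    min_vms:       minimum number of rented VMs (capacity-need analogue)
    Returns (best binary assignment, best objective value).
    """
    best_x, best_val = None, float("inf")
    for x in product((0, 1), repeat=n):  # 2^n candidate assignments
        if sum(x) < min_vms:
            continue  # violates the capacity requirement
        if sum(c * xi for c, xi in zip(cost, x)) > budget:
            continue  # violates the budget constraint
        val = sum(latency[i][j] * x[i] * x[j]
                  for i in range(n) for j in range(i + 1, n))
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

The enumeration makes the quadratic structure explicit: a pair's latency counts only when both of its binary variables are 1, which is exactly what CPLEX handles at scale.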

Performance Evaluation
AIMING targets a federated-cloud environment, which is hard to implement in practice due to its large network. We therefore evaluate AIMING by simulation and compare it with several other approaches. The evaluation reports total latency and rental overhead under various numbers of VMs and datacenters. The parameter analysis covers the budget and the deviation of the VM number. Extensive evaluation results show that AIMING is efficient in total latency reduction.

Experimental Setup.
(1) Using Python with PyCharm and a real-world latency data trace, the data preprocessing runs in a variety of realistic settings. We set up a federated-cloud environment with 22 datacenters from 3 cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The 9 Microsoft Azure datacenters are South USA, Central USA, Sao Paulo, Dublin, Amsterdam, East Asia, Japan West, Singapore, and East Australia. The 4 Google datacenters are Taiwan, Europe, Central USA, and East USA. The other 9 Amazon datacenters are Oregon, Virginia, California, Sao Paulo, Dublin, Frankfurt, Tokyo, Singapore, and Sydney. We use the inter-datacenter latency dataset measured by Mansouri and Buyya [35] from the Cloud Laboratory of The University of Melbourne, who measured the latency between the 22 datacenters for 8 hours; the packet size is between 64 bytes and 1 KB, and the interval between packets is 4 seconds. The latency dataset is stable and reliable because the cloud providers have their own fiber links interconnecting all major regions. The network topology of each datacenter is a fat-tree structure.
Figure 5 shows the WCSS for various values of k; the number of datacenters n is 22. In Figure 5, the WCSS drops below 5 once k reaches 4. Thus, we choose k = 4 and obtain the following clustering result.
Cluster 1 contains 5 datacenters: Azure Dublin, Azure Amsterdam, Google Europe, Amazon Dublin, and Amazon Frankfurt.
Cluster 2 contains 8 datacenters: Azure East Asia, Azure Japan West, Azure Singapore, Azure Australia East, Google Taiwan, Amazon Tokyo, Amazon Singapore, and Amazon Sydney.
Cluster 3 contains 7 datacenters: Azure South USA, Azure Central USA, Google Central USA, Google East USA, Amazon Oregon, Amazon Virginia, and Amazon California.
Cluster 4 contains 2 datacenters: Azure Sao Paulo and Amazon Sao Paulo.
Given that each region shows a similar trend, we conduct the latency optimization algorithm on region 1, which has 5 datacenters, in the following parts.
(2) We use AMPL with CPLEX to implement the latency optimization algorithm. Four VM instance types are considered in the evaluation; the performance parameters of the rental VM configurations are listed in Table 2. For the rental prices and data transmission prices, we use the prices from the providers' websites [36][37][38]. We conduct the latency optimization algorithm on the 5 datacenters of cluster 1. For convenience, we use the number of VMs of each instance type in place of the total VM number; since only 4 VM instance types are considered here, the total number of VMs is a multiple of 4.
In order to evaluate the performance of AIMING, we compare three approaches in terms of total latency and monetary overhead: (i) Random: a random placement method under the VM number and capacity need constraints. (ii) Greedy: this approach selects the m_i datacenters with the cheapest VM rental price to allocate VMs [39]. (iii) AIMING: our approach based on binary quadratic programming.
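For reference, the Greedy baseline as we understand it (our paraphrase; details may differ from [39]) simply ranks datacenters by VM rental price:

```python
def greedy_select(datacenters, rental_price, n_select):
    """Pick the n_select datacenters with the cheapest VM rental price.

    datacenters:  iterable of datacenter ids
    rental_price: rental_price[d] = representative VM rental price of datacenter d
    """
    return sorted(datacenters, key=lambda d: rental_price[d])[:n_select]
```

Because price is the only criterion, this baseline can pick cheap but poorly connected datacenters, which explains the latency gap reported below.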

Total Latency.
In this section, we show the optimization results for total latency with various numbers of VMs and datacenters. As shown in Figure 6, the total latency of AIMING is lower than that of the other two approaches. The total latency of AIMING is about 14906 ms when the number of each VM instance type is 10, whereas the other approaches are higher (Random at about 24198 ms and Greedy at about 15977.4 ms). We set the number of datacenters to 3. The total latency increases with the number of each VM instance type, which ranges from 1 to 10. Moreover, the gap between AIMING and Greedy also grows with the number of VMs: as the number of each VM instance type increases, the data transmission demand grows vastly, which widens the total-latency gap.
Figure 7 shows the relationship between total latency and the number of selected datacenters. The number of each VM instance type is 5, which means that there are 20 VMs to allocate. When the number of selected datacenters is 1, only one datacenter can be chosen, so AIMING, Greedy, and Random have the same total latency of 400 ms. Greedy ignores the transmission latency because it simply chooses the cheapest datacenters. AIMING reduces latency by up to 46.1% relative to the Greedy approach when the number of selected datacenters is 2.

Parameter Analysis
(1) Monetary Overhead and Number of Each VM Instance Type. Figure 8 describes the relationship between monetary overhead and the number of each VM instance type. We set the number of selected datacenters to 3, the same as in Figure 6. Greedy and AIMING have nearly the same rental overhead until the number of each VM instance type reaches 6. The reason is as follows: when the number of each VM instance type is not large, the VM rental overhead dominates the monetary overhead rather than the data transmission overhead. Once the number of each VM instance type exceeds 6, the data transmission overhead occupies a large proportion, and AIMING is clearly better than Greedy and Random. When the number of each VM instance type is 10, the monetary overheads of AIMING, Greedy, and Random are $85.275, $103.1247, and $120.8789, respectively.
(2) Monetary Overhead and Number of Selected Datacenters. The relationship between monetary overhead and the number of selected datacenters is shown in Figure 9. The number of VMs of each instance type is 5, the same as in Figure 7. When the number of selected datacenters reaches 3, AIMING starts to show its advantage in monetary reduction over Greedy: from that point on, frequent data interaction among VMs makes data transmission overhead the major expense. The monetary overhead of all three approaches increases with the number of selected datacenters. When the number of selected datacenters is 5, the monetary overheads of AIMING, Greedy, and Random are $30.1725, $32.538, and $71.6723, respectively.
(3) Total Latency and Budget. The relationship between total latency and budget is shown in Figure 10: the total latency decreases as the budget increases. The number of selected datacenters is set to 5, and the number of each VM instance type is 5, which means that there are 20 VMs to be allocated. As the budget ranges from $33 to $41, the total latency decreases from 5130.8 ms to 4542.9 ms. The decline is caused by relaxing the budget constraint: with a larger budget, the VM allocation solution space becomes larger, so better allocations appear and lead to lower total latency.
(4) Monetary Overhead and Deviation of VM Number. Using the deviation of the VM number, we discuss the VM number constraint of each datacenter. In Figures 11 and 12, the number of each VM instance type is 5 and the number of selected datacenters is 5, which means that there are 20 VMs to be allocated across 5 datacenters. To explain the deviation of the VM number, take an example: if the deviation is 0, each datacenter must host exactly 5 VMs; if the deviation is 1, each datacenter must host no more than 6 and no fewer than 4 VMs. In Figure 11, the monetary overhead decreases from $32.4114 to $27.3404 as the deviation increases from 0 to 4, because increasing the deviation enlarges the solution space; the curve shows a marked decline once the deviation reaches 2.
(5) Total Latency and Deviation of VM Number. In Figure 12, the total latency decreases as the VM number deviation increases. The reason is that a larger deviation enlarges the VM allocation solution space, so better allocations decrease the total latency; the curve declines after the deviation reaches 2. The total latency decreases from 4755 ms to 2967.8 ms as the deviation increases from 0 to 4.

Conclusion
In this paper, we propose an optimization approach named AIMING to minimize the total latency in a federated-cloud scenario. The approach consists of two steps: datacenter preprocessing and latency optimization. The first step is based on k-means clustering, and binary quadratic programming is used for the latter. With real inter-datacenter latency, VM rental prices, and data transmission prices as input, we use Python and AMPL to conduct the evaluation. The evaluation shows that AIMING reduces the total latency and monetary overhead efficiently. We also conduct extensive parameter analyses to show more details and impact factors.

Notations
V(packet): The packet size of ping
t_ij(ping): The latency of ping from the i-th datacenter to the j-th datacenter
N_dc: The number of datacenters taken into consideration
N_type: The number of VM instance types
C_total: Total monetary overhead
C_rental: VM rental monetary overhead
C_transmission: Data transmission monetary overhead
D_total: The total latency of datacenters
x_ijk: Whether VM_ijk is selected to be rented (binary)

Figure 1: Distributed clouds communication network model for real-time application users.

Figure 6: The total latency with various numbers of each VM instance type, which ranges from 1 to 10.
Figure 7: The total latency with various numbers of selected datacenters.

Figure 8: The monetary overhead with various numbers of each VM instance type, which ranges from 1 to 10.

Figure 9: The monetary overhead with various numbers of selected datacenters, which ranges from 1 to 5.

Figure 10: The total latency with the budget ranging from $33 to $41.

Figure 11: The monetary overhead with various deviations of VM number, ranging from 0 to 4.
Figure 12: The total latency with various deviations of VM number.
Through t_ij(ping) and V(packet), the latency between VM_u and VM_v is modeled as d(VM_u, VM_v) = V(VM_u, VM_v) · t_ij(ping)/V(packet), where VM_ijk denotes the k-th VM of the j-th instance type in the i-th datacenter, d(VM_u, VM_v) is the latency between VM_u and VM_v, and V(VM_u, VM_v) is the transmission data volume between them. We obtain the latency by ping: t_ij(ping) is the ping time, and V(packet) is the ping packet size.

Table 1: End-to-end latency of datacenters.
The latency between two datacenters is symmetric, i.e., L_ij = L_ji, 1 ⩽ i, j ⩽ 5. We obtain the sets S = {S_1, S_2}, which are the two parts shown in Figure 4. The latency set of the datacenters is shown in Table 1; each observation is a 5-dimensional real vector L_i = {L_i1, L_i2, L_i3, L_i4, L_i5}, 1 ⩽ i ⩽ 5.

Latency Optimization of Datacenter Region.
In this section, we minimize the total latency of each region by placing VMs. The region total latency optimization problem is modeled as binary quadratic programming. For region R_i, n_i represents the total number of datacenters of region R_i, and the number of selected datacenters is m_i ⩽ n_i. The purpose is to select m_i datacenters out of the n_i and place VMs so as to minimize the total latency of the region. We assume that the capacity need of each region for customer requests is known to us. The capacity of a given instance type j is denoted as c_j. Hence, the objective of region R_i can be shown as follows.
VM_ijk: The k-th VM of the j-th instance type in the i-th datacenter
N(VM_ij): The number of VMs of the j-th instance type in the i-th datacenter
p(VM_ijk): The rental price of the k-th VM of the j-th instance type in the i-th datacenter
p_t(VM_u, VM_v): The data transmission price between VM_u and VM_v
d(VM_u, VM_v): The latency between VM_u and VM_v
T(VM_u): The rental duration time of VM_u
V(VM_u, VM_v): The transmission data volume between VM_u and VM_v