Coarse-Grain QoS-Aware Dynamic Instance Provisioning for Interactive Workload in the Cloud
Jianxiong Wan

The cloud computing paradigm offers Internet service providers (ISPs) a new, lower-cost approach to delivering their services. ISPs can rent virtual machines (VMs) through the Infrastructure-as-a-Service (IaaS) offered by the cloud instead of purchasing physical servers. In addition, commercial cloud providers (CPs) offer diverse VM instance rental services at various time granularities, which gives ISPs another opportunity to reduce cost. We investigate a Coarse-grain QoS-aware Dynamic Instance Provisioning (CDIP) problem for interactive workload in the cloud from the perspective of ISPs. We formulate the CDIP problem as an optimization problem whose objective is to minimize the VM instance rental cost subject to a percentile delay bound. Since Internet traffic shows a strong self-similar property, it is hard to obtain an analytical form of the percentile delay constraint. To address this issue, we propose a lookup table structure together with a learning algorithm to estimate the performance of an instance provisioning policy. This approach is further extended with two function approximations to enhance the scalability of the learning algorithm. We also present an efficient dynamic instance provisioning algorithm, which takes full advantage of the diversity of rental services, to determine the instance rental policy. Extensive simulations are conducted to validate the effectiveness of the proposed algorithms.


Introduction
Before the advent of cloud computing, Internet service providers (ISPs) used to reserve massive amounts of resources to handle the peak workload; otherwise, the service response time could increase to an intolerable degree during a flash crowd and greatly degrade the user experience. However, this approach is energy-inefficient, since the peak resource utilization of a typical ISP is often three times the average utilization. Things get even worse in systems that provide interactive services, where the average utilization is only around 10% of the total capacity provisioned for the peak load [1]. Cloud computing provides a novel service paradigm called Infrastructure-as-a-Service (IaaS) that reduces hardware and maintenance costs. With IaaS, ISPs only need to rent resources (e.g., virtual servers and network bandwidth) from cloud providers (CPs) instead of purchasing a vast number of physical servers themselves. IaaS thus enables a more flexible and effective approach to resource provisioning. For example, users of the Amazon EC2 system can rent resources for a short period of time to cope with flash traffic.
This paper studies a Coarse-grain Dynamic Virtual Machine (VM) Instance Provisioning (CDIP) problem for interactive workload subject to a percentile delay constraint in the cloud, from the perspective of ISPs. More specifically, the problem is to find a dynamic VM rental policy that minimizes the ISP's resource rental cost while satisfying QoS constraints. A fine-grain (on the order of seconds or minutes) resource provisioning policy may be more effective in increasing resource utilization and reducing cost, but it is more complex and harder to implement. For example, the startup phase of a VM instance in EC2, which "typically takes less than 10 minutes" [2] (observed on November 2nd, 2013), is too long to support a fine-grain control policy. Further, a fine-grain policy can induce fluctuations and undermine system stability. In practice, CPs such as Amazon EC2 provide coarse-grain rather than fine-grain IaaS services. For example, EC2 offers IaaS service at two time scales. At the higher level, VMs can be rented for 1 or 3 years (denoted as the Reserved Instance Service, RIS); at the lower level, VM instances can also be acquired on an hourly basis (denoted as the Marginal Instance Service, MIS) to absorb instant flash traffic. Generally speaking, MIS instances cost much more than RIS instances (refer to Table 1 for the detailed pricing structure of Amazon's EC2 platform). How to combine these two services properly is one of the most important problems ISPs face in minimizing cost. Besides the VM instance rental cost, ISPs also care about the Quality-of-Service (QoS) delivered to their end users. For interactive workload, QoS is traditionally expressed as the mean queueing delay, which is easy to analyze using classic queueing theory. However, the self-similar nature of Internet traffic [3] defeats queueing-based analysis. In addition, the fact that interactive workload can tolerate some QoS violations has driven researchers to propose an alternative, percentile form of QoS specification:

$$\Pr(d \geq d_{th}) \leq \epsilon, \qquad (1)$$

where $d$ is the system response delay, and $d_{th}$ and $\epsilon$ are the desired threshold values determined by the Service Level Agreement (SLA). Unfortunately, there is no analytical form of (1) for self-similar traffic.
In this paper, we formulate the CDIP problem as an optimization problem in which the QoS constraints cannot be expressed analytically. We develop efficient algorithms to solve the CDIP problem and conduct numerical analysis to evaluate them. Our contributions are as follows: (i) we design a resource prediction algorithm to estimate the performance of a resource provisioning policy under self-similar traffic; (ii) we extend the resource prediction algorithm with function approximations to enhance its scalability; (iii) we present a VM instance provisioning algorithm for ISPs to determine the optimal numbers of RIS and MIS VM instances, which minimizes the VM instance rental cost.
The rest of this paper proceeds as follows. Section 2 discusses related work; Section 3 shows the opportunity for reducing rental cost by combining RIS and MIS; Section 4 presents a general optimization framework for the CDIP problem together with the solution algorithms; Section 5 extends the algorithms with function approximations to address the scalability issue; Section 6 evaluates the proposed algorithms in various settings; and Section 7 concludes the paper.

Related Works
To provision resources in a cloud computing environment, the first issue that must be addressed is accurately predicting future resource demand. Much research has been dedicated to this area. Chen et al. [4] used a multiplicative Seasonal Autoregressive Moving Average (S-ARMA) approach to predict the mean and standard deviation of interarrival times, and used a simple decomposed model as well as Winters' smoothing method to predict the mean and standard deviation of file sizes. Gmach et al. [5] developed a pattern prediction method for cyclic workloads based on a workload periodogram function and an autocorrelation function. Caron and Desprez [6] used pattern matching to forecast resource demand in the cloud. Niu et al. [7] proposed a channel interleaving scheme that can predict the demand for new videos that lack historical demand data.
A number of works aim to lower the operational cost for cloud providers (CPs). Ahmad and Vijaykumar [8] proposed a PowerTrade method to lower the total energy consumption of active servers, standby servers, and cooling facilities. They also developed a SurgeGuard method that maintains extra servers at two time granularities to absorb flash crowds. Meisner et al. [1] developed a PowerNap mechanism, which includes a sleep/active state scheduling component and a network interface card (NIC) supporting Wake-on-LAN functionality. The system is put into sleep mode when there is no workload, and the NIC can wake the system up within 1 ms whenever packets arrive from the network. Leverich and Kozyrakis [9] integrated the Hadoop system with an energy controller that recasts the data layout and task distribution so that significant portions of a cluster can be shut down. Our work, on the other hand, studies how to reduce cost from the perspective of Internet service providers (ISPs).
Several recent studies are closely related to our work. In [10], the author formulated the resource leasing problem as an Integer Programming Problem (IPP) and developed CoH, a family of heuristic policies to solve it. However, [10] treated batch jobs only and paid little attention to SLAs. Reference [11] also studied the instance provisioning problem and proposed a dynamic instance purchasing scheme based on the Central Limit Theorem to minimize cost. The SLA constraint considered there is the overload probability, which is not suitable for delay-sensitive interactive workload. The works in [12, 13] make resource provisioning decisions based on the Autoregressive Integrated Moving Average (ARIMA) prediction method, but they still do not consider a delay constraint. In contrast, [14] explicitly incorporated delay into the objective function of the optimization problem. However, the delay was derived from Markovian queueing theory, which does not hold for today's Internet, dominated by self-similar traffic.

Problem Statement
The structure of a data center in a cloud computing system is shown in Figure 1. Inside the data center, there are a number of physical servers. A physical server hosts one or more Virtual Machines (VMs) according to its resource capacity. Note that the figure shows only the VMs, not the physical servers. An ISP rents VMs from the cloud provider to serve its end users. To reduce the request response time, the data center often employs a shared queue structure. The arrival rate of end users varies over time, which induces a time-varying VM instance demand. Figure 2 presents an example that divides a day into 8 phases (3 hr/phase); the y-axis shows the VM instance demand needed to ensure the QoS requirement in each phase. The marginal rental cost in Amazon EC2 is given in Table 1. From Figure 2, we can see that there is a big gap between the maximum and the minimum instance demand. If the ISP only uses RIS instances, it must acquire 23 instances to satisfy the peak workload in the 6th phase, which wastes a lot of resources and raises the daily instance rental cost to 247.296$ (the product of the number of instances, the marginal cost, and the total of 24 hours: 23 × 0.448 × 24 = 247.296$). In contrast, if the ISP only adopts MIS instances, it obtains the highest resource utilization and can reduce the daily rental cost to 230.52$ (from Figure 2, the total number of MIS instances is 10 + 15 + 9 + 19 + 5 + 23 + 12 + 20 = 113; since a phase contains 3 hours, the rental cost for using only MIS instances is 113 × 3 × 0.680 = 230.52$).
If the ISP uses a hybrid approach that includes both RIS and MIS, on the other hand, the daily instance rental cost can be remarkably reduced. To see this, consider a resource provisioning policy that rents 10 RIS VM instances and acquires extra MIS instances whenever the RIS instances are insufficient. The number of MIS instances in phase n can be formally written as [D_n − 10]^+ = max(D_n − 10, 0), where D_n denotes the VM instance demand in phase n. The daily rental cost of this hybrid approach is 187.08$ (the rental cost for RIS instances is 10 × 0.448 × 24 = 107.52$; the total number of MIS instances is 0 + 5 + 0 + 9 + 0 + 13 + 2 + 10 = 39, so the rental cost for MIS instances is 39 × 0.680 × 3 = 79.56$; thus, the total cost is 107.52 + 79.56 = 187.08$), which saves 24.3% and 18.8% compared with using purely RIS and purely MIS instances, respectively.
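The cost arithmetic of this example is easy to check mechanically. The short Python sketch below recomputes the three daily costs from the demand vector read off Figure 2 and the hourly prices used above (0.448$ for RIS, 0.680$ for MIS); the function name daily_cost is illustrative only.

```python
# Hourly prices as quoted in the example above (Table 1).
RIS_PRICE = 0.448  # $/hr per reserved instance
MIS_PRICE = 0.680  # $/hr per marginal instance
PHASE_HOURS = 3    # 8 phases of 3 hours each
DEMAND = [10, 15, 9, 19, 5, 23, 12, 20]  # D_n read off Figure 2


def daily_cost(num_ris, demand=DEMAND, phase_hours=PHASE_HOURS):
    """Daily cost of renting num_ris RIS instances all day and topping up
    each phase with max(D_n - num_ris, 0) MIS instances."""
    hours_per_day = phase_hours * len(demand)
    ris_cost = num_ris * RIS_PRICE * hours_per_day
    mis_count = sum(max(d - num_ris, 0) for d in demand)
    mis_cost = mis_count * MIS_PRICE * phase_hours
    return ris_cost + mis_cost


only_ris = daily_cost(max(DEMAND))  # 23 RIS instances, no MIS
only_mis = daily_cost(0)            # all demand served by MIS
hybrid = daily_cost(10)             # the 10-RIS policy discussed above
print(f"RIS only: {only_ris:.3f}  MIS only: {only_mis:.2f}  hybrid: {hybrid:.2f}")
print(f"savings vs RIS: {1 - hybrid / only_ris:.1%}, vs MIS: {1 - hybrid / only_mis:.1%}")
```

Running it reproduces the savings of 24.3% and 18.8% quoted in the text.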
The above analysis rests on two prerequisites. First, the QoS performance in terms of percentile delay must be precisely predictable; second, the numbers of RIS and MIS instances must be determined so as to minimize the VM instance rental cost. The following sections address these two issues in detail.

A General Optimization Framework for the CDIP Problem
The notations used in this paper are listed in the Notations section. The CDIP problem can be formulated as

$$\min_{m_0, m_1, \ldots, m_N} \; N T c_R\, m_0 + T c_M \sum_{n=1}^{N} m_n \qquad (2)$$
$$\text{subject to} \quad \Pr(d_n \geq d_{th}) \leq \epsilon, \quad \forall n \in \{1, \ldots, N\}, \qquad (3)$$

where $m_0$ is the number of RIS instances, $m_n$ ($n > 0$) is the number of MIS instances in phase $n$, $N$ is the number of phases in a day, $T$ is the length of a phase in hours, $c_R$ and $c_M$ are the hourly prices of RIS and MIS instances, and $d_n$ is the response delay in phase $n$. Note that, in the CDIP problem, the distribution of $d_n$ is determined by the characteristics of the exogenous interactive workload arrivals and by the number of active VM instances $m_0 + m_n$. As stated in Section 1, this problem is hard to solve, since we can hardly derive an explicit form of constraint (3). In this section, we show how to approximately characterize constraint (3) and obtain the optimal solution.

In practice, it is impossible to let $k \to \infty$ in Algorithm 1. In fact, Algorithm 1 converges very fast in our numerical analysis (within tens of iterations). Alternatively, a stop criterion based on a threshold value $\delta$, chosen to obtain the desired precision, can be used instead.

The Instance Provisioning Algorithm.
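Based on the VP table, we can obtain the minimum number of VM instances needed to meet the QoS constraint in phase $n$, denoted $D_n$. Finding the number of RIS instances $m_0$ is then equivalent to solving the following optimization problem:

$$\min_{m_0 \in \mathbb{Z}_{\geq 0}} \; N T c_R\, m_0 + T c_M \sum_{n=1}^{N} \left[D_n - m_0\right]^+ \qquad (6)$$
$$D_n = \min\left\{ m \in \mathbb{Z}_{\geq 0} : \Pr\left(d_n(m) \geq d_{th}\right) \leq \epsilon \right\}, \quad n \in \{1, \ldots, N\}, \qquad (7)$$

where the delay $d_n(m)$ is considered as a function of the number of VM instances $m$. The objective of problems (6)-(7) is an integer piecewise-linear function of $m_0$, so the optimal solution must lie at one of the boundary points, namely $m_0 = 0$ or one of the values $D_n$. Algorithm 2 provides the solution method for problem (6). It can be divided into three parts as follows.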
(i) The first part (lines 1-8) uses exhaustive search to obtain the minimum number of VM instances required to satisfy the QoS constraints. The result is stored in the vector $D_n$, $n \in \{1, \ldots, N\}$.
(ii) The second part (lines 9-17) solves problems (6)-(7); the result is $m_0$, the optimal number of RIS instances, together with the corresponding value of the objective function.
(iii) The third part (lines 18-20) computes the number of MIS instances based on $D_n$ and $m_0$.
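The three parts can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's Algorithm 2: each VP table row is assumed to be a list indexed by instance count (a hypothetical layout), part (i) takes the smallest tabulated instance count whose violation estimate meets ε, giving the per-phase minimum D_n, and part (ii) evaluates the cost only at the boundary points m_0 = 0 and m_0 = D_n, which suffices because the objective of problem (6) is piecewise linear in m_0. Function names are illustrative.

```python
def min_instances(vp_row, epsilon):
    """Part (i): smallest m whose estimated violation probability is <= epsilon.
    vp_row[m] is a (hypothetical) VP table estimate for m instances in one phase."""
    for m, p in enumerate(vp_row):
        if p <= epsilon:
            return m
    raise ValueError("no tabulated instance count meets the SLA")


def provision(demand, ris_price, mis_price, phase_hours):
    """Parts (ii) and (iii): pick the RIS count m0 among the boundary points
    {0} and {D_n} that minimizes the daily cost, then derive per-phase MIS counts."""
    n_phases = len(demand)

    def cost(m0):
        ris = n_phases * phase_hours * ris_price * m0
        mis = phase_hours * mis_price * sum(max(d - m0, 0) for d in demand)
        return ris + mis

    candidates = sorted({0, *demand})        # breakpoints of the piecewise-linear cost
    m0 = min(candidates, key=cost)
    mis = [max(d - m0, 0) for d in demand]   # MIS instances needed in each phase
    return m0, mis, cost(m0)


m0, mis, total = provision([10, 15, 9, 19, 5, 23, 12, 20], 0.448, 0.680, 3)
print(m0, mis, round(total, 2))
```

With the Figure 2 demands, the prices 0.448$ and 0.680$, and 3-hour phases, this search returns m_0 = 10, MIS counts [0, 5, 0, 9, 0, 13, 2, 10], and a daily cost of 187.08$, so the 10-RIS hybrid policy of Section 3 is in fact the cost-minimizing choice for that example.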

Extensions
Algorithms 1 and 2 can effectively predict the number of instances needed to satisfy the QoS constraints and reduce the total rental cost for the ISPs. However, the scalability of these two algorithms is questionable: to obtain a precise estimate of the violation probability in the VP table, we must visit all possible instance provisioning policies and collect sufficient violation probability samples. This section simplifies the VP table using function approximation techniques to enhance the scalability of Algorithms 1 and 2.
The idea of function approximation is to use a function $p = f_n(m_n)$ to approximate the mapping between the number of instances $m_n$ and the violation probability $p$ in phase $n$. In this paper, we use two forms of approximation: (i) a linear approximation given by

$$f_n(m_n) = a_n + b_n m_n, \qquad (8)$$

and (ii) a nonlinear approximation given by

$$f_n(m_n) = a_n m_n^{b_n}. \qquad (9)$$

Note that function $f_n(m_n)$ is specific to phase $n$; therefore, the parameters $a$ and $b$ carry the subscript $n$. We make the following remarks on these function approximations.
(1) Intuitively, the QoS violation probability decreases as more VM instances are used; that is, $p$ is a decreasing function of $m_n$. Therefore, $b_n$ must be negative in the nonlinear case.
(2) The value of $p$ drops to 0 once $m_n$ exceeds a certain threshold, since no QoS violations occur when there are sufficient VM instances. When using the linear approximation, we should filter out samples whose violation probability is 0; otherwise, the estimation precision will be remarkably undermined for the cases where $f_n(m_n) > 0$.
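We use the least squares approach to obtain the parameters $a_n$ and $b_n$ of the approximate function $f_n$. Formally, the least squares problem is

$$\min_{a_n, b_n} \sum_{s=1}^{S} \left( f_n(m_s) - p_s \right)^2, \qquad (10)$$

where $S$ is the number of samples and $\mathbf{x}_s = (m_s, p_s)$ is the $s$th unbiased sample of the violation probability $p$, obtained with $m_s$ instances.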
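For the linear approximation, the optimal solution should satisfy

$$\frac{\partial}{\partial a_n}\sum_{s=1}^{S}\left(a_n + b_n m_s - p_s\right)^2 = 0, \qquad \frac{\partial}{\partial b_n}\sum_{s=1}^{S}\left(a_n + b_n m_s - p_s\right)^2 = 0. \qquad (11)$$

Rearranging these two equations, we have

$$S a_n + b_n \sum_{s=1}^{S} m_s = \sum_{s=1}^{S} p_s, \qquad a_n \sum_{s=1}^{S} m_s + b_n \sum_{s=1}^{S} m_s^2 = \sum_{s=1}^{S} m_s p_s. \qquad (12)$$

The above analysis yields

$$b_n = \frac{S\sum_{s} m_s p_s - \sum_{s} m_s \sum_{s} p_s}{S\sum_{s} m_s^2 - \left(\sum_{s} m_s\right)^2}, \qquad a_n = \frac{1}{S}\left(\sum_{s} p_s - b_n \sum_{s} m_s\right). \qquad (13)$$

For the nonlinear approximation, let $y = \ln p$, $a'_n = \ln a_n$, $x_n = \ln m_n$, and $b'_n = b_n$, and take the logarithm of both sides of (9), which transforms the nonlinear approximation into the linear approximation

$$y = a'_n + b'_n x_n. \qquad (14)$$

Following the idea of the linear approximation, with $x_s = \ln m_s$ and $y_s = \ln p_s$, we obtain the solution of the nonlinear approximation as

$$b_n = b'_n = \frac{S\sum_{s} x_s y_s - \sum_{s} x_s \sum_{s} y_s}{S\sum_{s} x_s^2 - \left(\sum_{s} x_s\right)^2}, \qquad a_n = e^{a'_n} = \exp\left(\frac{1}{S}\left(\sum_{s} y_s - b'_n \sum_{s} x_s\right)\right). \qquad (15)$$

We integrate the function approximations into Algorithms 1 and 2 by replacing the VP table with an array func_app[N]. Each item in func_app[] contains two elements, namely $a$ and $b$. With function approximations, some revisions are needed for Algorithms 1 and 2; these are shown in Table 5.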
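A minimal sketch of the two closed-form fits in (13) and (15), using only the Python standard library. The sample format (a list of (m_s, p_s) pairs for one phase) and the function names are assumptions for illustration. Because of the logarithms, the nonlinear fit also needs p_s > 0, so zero-probability samples must be dropped before fitting, and it minimizes the squared error of ln p rather than objective (10) itself.

```python
import math


def fit_linear(samples):
    """Closed-form least-squares fit of p = a + b*m, as in (13).
    samples: list of (m_s, p_s) pairs; zero-probability samples are assumed
    to have been filtered out beforehand."""
    S = len(samples)
    sum_m = sum(m for m, _ in samples)
    sum_p = sum(p for _, p in samples)
    sum_mm = sum(m * m for m, _ in samples)
    sum_mp = sum(m * p for m, p in samples)
    b = (S * sum_mp - sum_m * sum_p) / (S * sum_mm - sum_m ** 2)
    a = (sum_p - b * sum_m) / S
    return a, b


def fit_power(samples):
    """Fit p = a * m**b by linear least squares on (ln m, ln p), as in (14)-(15).
    Requires m_s > 0 and p_s > 0 because of the logarithms."""
    log_samples = [(math.log(m), math.log(p)) for m, p in samples]
    a_log, b = fit_linear(log_samples)  # a'_n = ln a_n, b'_n = b_n
    return math.exp(a_log), b


def predict(kind, params, m):
    """Evaluate the fitted f_n(m) for either approximation."""
    a, b = params
    return a + b * m if kind == "linear" else a * m ** b
```

For example, samples drawn exactly from p = 2 m^(-1.5) are recovered with a ≈ 2 and b ≈ −1.5, and the fitted f_n can then be queried in place of a VP table row.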

Simulation Setup.
Internet traffic shows a strong self-similar property [3, 15]. We use the multiscale Markov-Modulated Poisson Process (MMPP) model to generate self-similar-like traffic. This approach has proved effective in previous research [16-18] and has been successfully applied in the literature, for example, [19-22]. We follow the same approach as in [22], that is, a three-dimensional Markov on-off modulated Poisson process, to generate the interactive workload arrivals, as follows.
(i) The first dimension is the workload burst on the order of 1 second. We assume that the peak workload arrives in the middle of the day, that is, at the 43200th second; the arrival rate is therefore a function of time, $\lambda(t)$, that peaks at $t = 43200$ s.
(ii) The second dimension of workload burst is 2000 requests per 5 seconds.
(iii) The last dimension of workload burst is 5000 requests per 10 seconds.
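The following is a simplified Python sketch of this kind of layered on-off modulated Poisson generator, not the exact model of [22]. The raised-cosine daily profile (200 to 1000 requests per second, peaking at the 43200th second) and the on/off switching probabilities are hypothetical placeholders; only the burst sizes follow the 2000-per-5-second and 5000-per-10-second figures above. It requires NumPy.

```python
import math

import numpy as np

DAY = 86_400  # seconds in a day


def base_rate(t, low=200.0, high=1000.0):
    """Hypothetical diurnal rate (req/s) peaking at t = 43200 s (midday).
    The paper's exact formula is not reproduced; a raised cosine is assumed."""
    return low + (high - low) * 0.5 * (1 - math.cos(2 * math.pi * t / DAY))


def on_off_states(n_steps, p_on, p_off, rng):
    """Two-state Markov chain sampled once per second.
    p_on = P(off -> on), p_off = P(on -> off)."""
    states = np.zeros(n_steps, dtype=bool)
    on = False
    for i in range(n_steps):
        on = (rng.random() >= p_off) if on else (rng.random() < p_on)
        states[i] = on
    return states


def generate_arrivals(seconds=DAY, seed=0):
    """Per-second arrival counts: diurnal base rate plus two on-off burst layers."""
    rng = np.random.default_rng(seed)
    rate = np.array([base_rate(s) for s in range(seconds)])
    # Burst layers: 2000 req per 5 s and 5000 req per 10 s while "on".
    # Switching probabilities are hypothetical placeholders.
    rate += 2000 / 5 * on_off_states(seconds, 0.01, 0.2, rng)
    rate += 5000 / 10 * on_off_states(seconds, 0.002, 0.1, rng)
    return rng.poisson(rate)  # Poisson arrivals given the modulated rate


arrivals = generate_arrivals(3600, seed=1)
print(arrivals[:10], arrivals.mean())
```

Choosing p_off = 0.2 and 0.1 makes the mean burst duration 5 s and 10 s, respectively, since the on-period of a per-second two-state chain is geometric with mean 1/p_off.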

Estimation of the Response Time.
In a production cloud system, it is impractical to log the response time of each incoming request in order to calculate the delay violation probability.
A more practical way is to measure the mean response delay $\bar{d}$ in a small time slot and regard $\bar{d}$ as the response delay of all requests arriving in that slot. This approximation becomes more accurate as the length of the time slot decreases. For example, in [23], the length of the time slot is set to 10 minutes. In our work, we set it to 10 seconds, since we need to measure the delay violation probability with higher precision.
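Under this slot approximation, the violation probability of a phase reduces to a request-weighted count over slots. A minimal Python sketch, assuming per-slot mean delays and per-slot request counts are available (the function and argument names are hypothetical):

```python
def violation_probability(slot_delays, slot_arrivals, d_th):
    """Fraction of requests whose (slot-mean) delay is >= d_th.

    Each request inherits the mean response delay of the slot it arrived in,
    so a slot contributes all of its requests or none of them."""
    total = sum(slot_arrivals)
    if total == 0:
        return 0.0  # no requests in the phase, hence no violations
    violated = sum(n for d, n in zip(slot_delays, slot_arrivals) if d >= d_th)
    return violated / total


# Four 10-second slots; only the second slot (50 requests) exceeds 0.3 s.
print(violation_probability([0.01, 0.5, 0.02, 0.3], [100, 50, 200, 50], 0.4))
```

Weighting by request count, rather than counting violating slots, keeps the estimate aligned with the per-request definition in (1) even when busy and idle slots alternate.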
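To estimate the mean response time in a time slot, we employ the Allen-Cunneen approximation formula [24, 25] for the G/G/c queueing system:

$$R \approx \frac{1}{\mu} + \frac{P_c}{c\mu(1-\rho)} \cdot \frac{C_a^2 + C_s^2}{2},$$

where $R$ is the average response time, $\mu$ is the average service rate, $\lambda$ is the average arrival rate, $\rho = \lambda/(c\mu)$ is the average utilization of a server, and $c$ is the number of servers. $P_c$, the probability that an arriving request has to wait, takes its value from the Erlang C formula:

$$P_c = \frac{\dfrac{(c\rho)^c}{c!\,(1-\rho)}}{\displaystyle\sum_{k=0}^{c-1}\frac{(c\rho)^k}{k!} + \frac{(c\rho)^c}{c!\,(1-\rho)}},$$

and $C_a$ and $C_s$ are the coefficients of variation of request interarrival times and service times, respectively. In this paper, we assume a Poisson service process with $\mu = 100$ requests per second; therefore, $C_s = 1$. To estimate $C_a$ online, we further divide a time slot into $W$ time windows (see Figure 3). The algorithm to estimate $C_a$ is shown in Algorithm 3.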
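A minimal Python sketch of the Allen-Cunneen estimate, assuming the waiting probability P_c is computed with the Erlang C formula for c servers at per-server utilization ρ = λ/(cμ). The helper interarrival_cv is a plain sample estimator of C_a (standard deviation over mean of interarrival gaps) from arrival timestamps, not the windowed procedure of Algorithm 3. All names are illustrative.

```python
import math
import statistics


def erlang_c(c, rho):
    """Probability that an arrival must wait in an M/M/c queue (Erlang C)."""
    a = c * rho  # offered load in Erlangs
    tail = a ** c / (math.factorial(c) * (1 - rho))
    head = sum(a ** k / math.factorial(k) for k in range(c))
    return tail / (head + tail)


def allen_cunneen_response_time(lam, mu, c, ca, cs):
    """Approximate mean response time of a G/G/c queue (Allen-Cunneen)."""
    rho = lam / (c * mu)
    if rho >= 1:
        return math.inf  # unstable: the queue grows without bound
    wait = erlang_c(c, rho) / (c * mu * (1 - rho)) * (ca ** 2 + cs ** 2) / 2
    return 1 / mu + wait


def interarrival_cv(arrival_times):
    """Coefficient of variation (std / mean) of interarrival times."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


# mu = 100 req/s and Poisson service (cs = 1), as assumed in the text.
print(allen_cunneen_response_time(lam=3500, mu=100, c=38, ca=1.3, cs=1.0))
```

With C_a = C_s = 1 the approximation coincides with the exact M/M/c result; for example, λ = 50, μ = 100, and c = 1 give the M/M/1 response time 1/(μ − λ) = 0.02 s.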

Result Analysis
Figure 3: The time structure for simulation. A day is divided into several phases, and the delay violation probability is evaluated in each phase. A phase is divided into several slots; the response delay of each request arriving within a slot is approximated by the mean response delay in that slot. To estimate the mean response delay using the Allen-Cunneen formula, a slot is further divided into several time windows to measure the coefficient of variation of interarrival times.
Using Algorithms 1 and 2, we find that the optimal number of RIS instances is 29. Figure 4 shows the costs of three instance provisioning policies, as follows.
(i) In the RIS mode, the ISP must rent 38 instances in every hour of the day, since the system must satisfy the peak workload demand. This policy costs 408.576$ per day (38 × 0.448 × 24).
(ii) In the MIS mode, the ISP makes instance provisioning decisions every hour according to the predicted demand; therefore, resource utilization is the highest. Unfortunately, the total daily cost (514.08$) is even higher than that of the RIS mode.
(iii) In the hybrid mode, the optimal number of RIS instances is 29. Although this wastes some resources in certain hours, the daily cost of this policy is the lowest (360.768$).

Effects of the Rental Granularity.
In Amazon EC2, the length of a phase (i.e., the interdecision time) is 1 hour.
Here, we vary the length to 2 and 3 hours to study its impact on the daily cost. Figure 5 plots the optimal number of reserved instances in each hour. The curve becomes smoother as the interdecision time becomes longer. For example, the numbers of reserved instances for the three rental granularities in the time interval [10, 15] are {35, 36, 38, 38, 36, 35}, {35, 37, 37, 37, 37, 35}, and {37, 37, 37, 37, 37, 37}, respectively. The mean numbers in the time intervals [4, 6] and [7, 9] are 30 and 34 under the 3 hr granularity, compared with 29.67 and 33 under the 1 hr granularity. This implies that the instance provisioning policy becomes more flexible as the interdecision time becomes shorter. Figure 6 presents the total cost for the three rental granularities. The total cost is clearly an increasing function of the length of the interdecision time. However, this function is not linear; that is, the marginal cost shrinks as the interdecision time becomes shorter. In production systems, a short interdecision time may induce additional system overhead; therefore, there is a tradeoff between rental cost and system overhead.
Figure 7 describes the impact of rental granularity on the delay violation probability. Using the instance provisioning policies generated by Algorithms 1 and 2, the target SLA specification is satisfied under all three rental granularities.

For the function approximation approaches, the parameters $a_n$ and $b_n$ are computed using (13) and (15) for all phases; they are shown in Table 3. Specifically, the results for the first hour are plotted in Figure 8, where we can see that the nonlinear approximation is more accurate than the linear approximation. Figure 9 shows the estimated VM instance demands. The linear approximation tends to overestimate the demand by 2 to 4 instances, whereas the nonlinear approximation underestimates it by 0 to 1 instance. Figure 10 shows the delay violation probability. With the VP table structure, the delay violation probability is around 4%. The linear approximation reduces the delay violation probability to about 1%, since it reserves more instances. By contrast, with the nonlinear approximation, the delay violation probabilities in 13 of the 24 phases exceed the 5% target, and they even exceed 9% in the 10th and 16th phases. The basic instance provisioning algorithm makes the best resource-SLA tradeoff but suffers from the scalability problem. The two function approximation approaches only need to estimate two parameters in each phase. They visit fewer instance provisioning policies and avoid the lookup table structure (VP table); thus, the scalability of Algorithms 1 and 2 is enhanced. Their effectiveness, however, depends on how well the function approximates the behavior of the VP table. A poor approximation may deviate severely from the VP table and generate a wrong instance provisioning policy, which either degrades performance or increases the rental cost. Figures 11 and 12 present the number of RIS instances and the total daily rental cost. The number of RIS instances in the VP table approach is the same as in the nonlinear approximation approach (29 VMs). The linear approximation approach, although it achieves a lower delay violation probability, overestimates the VM instance demand too much (33 VMs).
We quantify the accuracy of each approximation by an instance deviation $\Delta_M$ and a violation probability deviation $\Delta_P$, computed from $M_n^f$ and $P_n^f$, the number of rented instances (including both RIS and MIS instances) and the violation probability obtained with function approximations, and from $M_n$ and $P_n$, the same quantities obtained with the VP table structure. Clearly, smaller $\Delta_M$ and $\Delta_P$ indicate a more accurate approximation. The results are shown in Table 4. The linear approximation achieves a lower violation probability at the expense of a much larger number of instances. In addition, the nonlinear approximation has a lower violation probability deviation. Therefore, we propose using the nonlinear approximation in Algorithms 1 and 2.

Conclusions
Dynamic instance provisioning is a key issue for Internet service providers in the cloud computing environment. In this paper, we investigate the coarse-grain (on the order of hours) QoS-aware dynamic instance provisioning problem for interactive workload. The optimization problem we consider (see (2)-(3)) is not a traditional optimization problem, since the QoS constraint (3) has no analytical form for self-similar Internet traffic; therefore, it cannot be solved using classic methods. We use a lookup table and two function approximations to characterize constraint (3). The lookup table approach suffers from a scalability issue because, to obtain a precise estimate of the violation probability in the table, we must visit all possible instance provisioning policies and collect sufficient violation probability samples. In contrast, function approximations can predict the performance from a small set of samples. Function approximations (especially the nonlinear approximation) address the scalability problem at the expense of a small sacrifice in prediction precision. We conduct extensive simulations to evaluate the effectiveness of the proposed dynamic instance provisioning policy.

Figure 1: A data center in the cloud computing system.

Figure 2: An example of VM instance demand in different hours of a day.

6.3.1. Cost of Various Instance Provisioning Policies. In this experiment, the length of a phase is set to 1 hr.

Figure 7: Impacts of rental granularity on the delay violation probability.

Figure 10: Impacts of function approximation on the delay violation probability.

Figure 11: The optimal number of RIS instances.

Table 1: The pricing structure for Amazon EC2.
A Learning Algorithm to Characterize the Percentile QoS Constraint in Self-Similar Traffic.
Algorithm 1 learns the performance of various instance provisioning policies, in the form of percentile delay, via the stochastic gradient method. The algorithm first creates a data structure called the VP table (Violation Probability Table), in which each item VP_table[n][m] estimates the delay violation probability given m instances in phase n. The algorithm runs for several iterations to obtain unbiased delay violation probability samples p[n][m] for each phase n. These samples, which can be generated by running a real system or by simulation, are smoothed into VP_table[n][m]. Therefore, VP_table[n][m] is an unbiased estimate of the delay violation probability with m VM instances in phase n. Variables k, n, and m are the iteration counter, the decision point counter, and the instance number counter, respectively. Algorithm 1 has the following property: it converges to an unbiased estimate of the percentile QoS performance of using m VM instances in phase n. Indeed, VP_table[n][m] is the mean of all samples up to iteration k. As long as the end-user request arrival process and the service process are stationary stochastic processes in phase n with m VM instances, VP_table[n][m] must be an unbiased estimate of the percentile QoS performance as k → ∞.

Algorithm 1: The learning algorithm to characterize the percentile QoS constraint.
Input: K, N, and SLA specification d_th; {K is the number of iterations and N is the number of decision points in a day.}
Output: VP_table;
(1) Create VP_table and initialize each item in VP_table to 0;
(2) Create p[][] and counter; {p[n][m] is a sample of the QoS violation ratio of using m VM instances in phase n, and counter logs the number of delay violations in a phase.}
m_n, n ∈ {0, ..., N}; {m_0 is the number of RIS instances, and m_n, n ≠ 0, is the number of MIS instances in phase n.}
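A minimal Python sketch of the learning loop, under stated assumptions: the sampling callback sample_violation is hypothetical (it stands in for one day of real-system operation or simulation and returns the violation ratio observed in phase n with m instances), and the stop criterion, that no VP table entry changes by more than δ between successive iterations, is an assumed instance of a threshold-based criterion rather than the paper's exact rule. The 1/k step size makes each entry the exact running mean of its k samples, matching the property stated for Algorithm 1.

```python
import random


def learn_vp_table(sample_violation, n_phases, max_instances, d_th,
                   max_iters=100, delta=1e-3):
    """Running-mean estimate of VP_table[n][m] ~ Pr(d >= d_th | m instances, phase n).

    sample_violation(n, m, d_th) is a hypothetical callback returning one
    sample p[n][m] of the delay violation ratio."""
    vp = [[0.0] * (max_instances + 1) for _ in range(n_phases)]
    k = 0
    for k in range(1, max_iters + 1):
        max_change = 0.0
        for n in range(n_phases):
            for m in range(max_instances + 1):
                p = sample_violation(n, m, d_th)
                new = vp[n][m] + (p - vp[n][m]) / k  # incremental mean of k samples
                max_change = max(max_change, abs(new - vp[n][m]))
                vp[n][m] = new
        if k > 1 and max_change < delta:  # assumed stop criterion
            break
    return vp, k


if __name__ == "__main__":
    rng = random.Random(42)

    def toy(n, m, d_th):
        """Toy simulator: violation ratio decays with m, plus sampling noise."""
        true_p = max(0.0, 0.6 - 0.1 * m)
        return min(1.0, max(0.0, true_p + rng.gauss(0.0, 0.02)))

    vp, iters = learn_vp_table(toy, n_phases=24, max_instances=8, d_th=0.1)
    print(iters, [round(x, 3) for x in vp[0]])
```

With noisy samples the per-iteration change shrinks roughly like 1/k, so δ effectively trades the number of simulated days against the precision of each VP table entry.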

Table 2: Means and standard deviations for various rental granularities.

Figure 8: Function approximations for VP table.

Table 3: Parameters for two function approximation approaches.

Table 4: The instance deviation and the violation probability deviation for two function approximation approaches.