A Two-Phase Cloud Resource Provisioning Algorithm for Cost Optimization

Cloud computing is a new computing paradigm to deliver computing resources as services over the Internet. Under such a paradigm, cloud users can rent computing resources from cloud providers to provide their services. The goal of cloud users is to minimize the resource rental cost while meeting the service requirements. In reality, cloud providers often offer multiple pricing models for virtual machine (VM) instances, including on-demand and reserved pricing models. Moreover, the workload of cloud users varies with time and is not known a priori. Therefore, it is challenging for cloud users to determine the optimal cloud resource provisioning. In this paper, we propose a two-phase cloud resource provisioning algorithm. In the first phase, we formulate the resource reservation problem as a two-stage stochastic programming problem, and solve it by the sample average approximation method and the dual decomposition method. In the second phase, we propose a hybrid ARIMA-Kalman model to predict the workload, and determine the number of on-demand instances based on the predicted workload. The effectiveness of the proposed two-phase algorithm is evaluated using a real-world workload trace and Amazon EC2’s pricing models. The simulation results show that the proposed algorithm can significantly reduce the operational cost while guaranteeing the service level agreement (SLA).


Introduction
Cloud computing [1] is a new computing paradigm to deliver computing resources as services over the Internet. These services are provided at three different levels: Infrastructure as a Service (IaaS) [2], Platform as a Service (PaaS) [3], and Software as a Service (SaaS) [4]. In this paper, we focus on IaaS. IaaS providers such as Amazon EC2 [5] and Microsoft Azure [6] provide their computing resources to cloud users in the form of VMs. Cloud users can rent VMs from cloud providers on a pay-per-use basis.
Cloud providers usually have different billing cycles and offer different pricing models. Take Amazon EC2 as an example. Amazon EC2 has two billing cycles: per-hour billing and per-second billing. In this paper, we adopt per-hour billing. Amazon EC2 offers three pricing models: (1) On-demand pricing model. On-demand instances let users pay for compute capacity by the hour with no long-term commitments. (2) Reserved pricing model. Users pay an upfront fee (all upfront, partial upfront, or no upfront) to reserve an instance for a 1-year or 3-year term and are then charged a discounted hourly rate for the instance during the reservation period. (3) Spot pricing model. Spot instances allow users to bid on unused EC2 instances and run those instances for as long as their bid exceeds the spot price. Spot instances are charged the spot price, which is set by Amazon EC2 and adjusted gradually based on the supply and demand for spot instances. Such diverse pricing models make it challenging for cloud users to determine the optimal cloud resource provisioning.
There have been many studies on cloud resource provisioning, which aim to minimize the resource provisioning cost while satisfying the service requirements. However, most existing studies [7][8][9][10][11] do not consider the pricing models or only consider the on-demand pricing model. Some recent studies [12][13][14][15][16] consider both on-demand and reserved pricing models to reduce the resource provisioning cost. These studies typically use reserved instances to meet the minimum service requirements and use on-demand instances to meet the sudden workload demand.
In this paper, we study the cloud resource provisioning problem. To reduce the resource rental cost, we use both on-demand and reserved instances and propose a two-phase cloud resource provisioning algorithm. In the resource reservation phase, we determine the optimal number of reserved instances to minimize the resource rental cost. In the on-demand resource provisioning phase, on-demand instances are purchased based on the predicted workload to guarantee the SLA. The main contributions of this paper are summarized as follows:
(i) We use both on-demand and reserved instances for cloud resource provisioning and propose a two-phase cloud resource provisioning algorithm to reduce the resource rental cost.
(ii) In the first phase, we formulate the resource reservation problem as a two-stage stochastic programming problem, and solve it by the sample average approximation method and the dual decomposition method.
(iii) In the second phase, we propose a hybrid ARIMA-Kalman model for workload prediction and determine the number of on-demand instances based on the predicted workload.
(iv) We conduct extensive experiments to evaluate the effectiveness of the proposed two-phase algorithm using a real-world workload trace and Amazon EC2's pricing models. The experimental results show that the proposed algorithm can significantly reduce the operational cost while guaranteeing the SLA.
The rest of this paper is organized as follows. Related works are reviewed in Section 2. The problem formulation is given in Section 3. The two-phase cloud resource provisioning algorithm is presented in Section 4 and Section 5. Experimental results are presented in Section 6. Finally, we conclude this paper in Section 7.

Related Work
In cloud computing, cloud users can reduce the cost and guarantee the QoS requirements through adaptive resource provisioning. Adaptive resource provisioning has been widely studied [7][8][9][10][11]. In [7], the autoscaling techniques were classified into five categories: static threshold-based rules, reinforcement learning, queuing theory, control theory, and time series analysis. Calheiros et al. [8] proposed a workload prediction model using the ARIMA model and evaluated its impact on cloud applications' QoS. Islam et al. [9] developed prediction-based resource measurement and provisioning strategies using neural network and linear regression to satisfy upcoming resource demands. To train the neural network, Shah et al. [17] presented a quick Gbest-guided artificial bee colony learning algorithm. Chen et al. [10] proposed an iterative QoS prediction model and a PSO-based runtime decision algorithm to derive a self-adaptive approach for resource allocation in cloud-based software services. Liu et al. [11] presented SPRNT, a reinforcement learning-based aggressive virtualized resource management system for IaaS clouds.
The above works mainly focus on adaptive resource provisioning. However, cloud providers usually offer multiple pricing models: on-demand, reserved, and spot. Cloud users can significantly reduce the cost by exploiting these pricing models. Chaisiri et al. [12] proposed an optimal cloud resource provisioning algorithm by formulating a stochastic programming model in which the demand and price uncertainty is considered. In [13], a two-phase resource provisioning algorithm was presented. In the first phase, the optimal amount of long-term reserved resources was computed by a mathematical formula. In the second phase, the authors used the Kalman filter to predict resource demand and adaptively changed the subscribed on-demand resources. Niu et al. [14] proposed a semielastic cluster computing model for organizations to reserve and dynamically resize a virtual cloud-based cluster. In [15], a dynamic instance provisioning strategy based on the large deviation principle was proposed to minimize the number of active instances subject to a QoS requirement in terms of the overload probability. Mireslami et al. [16] proposed a two-phase cloud resource allocation algorithm. In the first phase, reserved resources were allocated to meet the minimum QoS requirements. In the second phase, a stochastic optimization approach was proposed to allocate on-demand resources under demand uncertainty.
In this paper, the cloud resource provisioning problem is formulated as a two-stage stochastic programming problem. It can be transformed into a deterministic integer program and solved by exact methods such as branch and bound and cutting plane methods, or by heuristic methods such as genetic algorithms, particle swarm optimization, and hybrid algorithms [18][19][20]. Grey [18] presented a hybrid PSO-GA algorithm for solving various constrained optimization problems. In this approach, PSO is used to explore the solution space while GA is used to update the solution.

Problem Formulation
In this section, we present the model assumptions, including the VM configurations and the pricing models. Based on these assumptions, we present the formulation of the cloud resource provisioning problem. The notations used in this paper are listed in Table 1.

Cloud Computing Environment.
Cloud providers offer multiple types of VMs to cloud users. Let $V = \{V_1, V_2, \ldots, V_M\}$ denote the set of VM types, where $M$ is the total number of VM types. Each VM type has its own resource configuration and processing capacity. Let $C_i$ denote the processing capacity of a VM instance of type $V_i$, which is the maximum number of concurrent users or the maximum service request rate that can be handled by a VM instance of type $V_i$ without violating the QoS requirements.
We adopt per-hour billing and consider two pricing models: on-demand instances and reserved instances (1-year term, partial upfront). Let $p_i^o$ denote the hourly usage fee of an on-demand instance of type $V_i$. Let $p_i^R$ and $p_i^r$ denote the one-time upfront payment and the hourly usage fee of a reserved instance of type $V_i$, respectively. Let $T$ be the number of hours in a reservation period. Then, the effective hourly price of a reserved instance of type $V_i$ can be computed as $p_i^R/T + p_i^r$, which is charged for every hour during the reservation period. It is usually assumed that $p_i^R/T + p_i^r < p_i^o$, that is, a reserved instance is cheaper per hour than an on-demand instance of the same type.
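As a rough illustration of this price comparison, the following Python sketch computes the effective hourly price $p_i^R/T + p_i^r$ of a reserved instance and the fraction of the reservation period during which an instance must be needed before reserving becomes cheaper than renting on demand. The function names and the numeric prices are hypothetical and are not taken from Table 2.

```python
HOURS_PER_YEAR = 365 * 24  # T for a 1-year term under per-hour billing


def effective_hourly_price(upfront: float, hourly: float,
                           hours: int = HOURS_PER_YEAR) -> float:
    """Effective hourly price of a reserved instance: p_R / T + p_r."""
    return upfront / hours + hourly


def break_even_utilization(upfront: float, hourly: float, on_demand: float,
                           hours: int = HOURS_PER_YEAR) -> float:
    """Fraction u of the term at which reserved and on-demand costs are equal.

    Reserved cost over the term: upfront + hourly * hours (paid every hour).
    On-demand cost for u * hours hours of use: on_demand * u * hours.
    """
    return (upfront + hourly * hours) / (on_demand * hours)


# Hypothetical prices in USD, chosen only for illustration.
p_R, p_r, p_o = 100.0, 0.02, 0.05
reserved_rate = effective_hourly_price(p_R, p_r)
utilization = break_even_utilization(p_R, p_r, p_o)
print(f"effective reserved rate: {reserved_rate:.4f} USD/h")
print(f"break-even utilization:  {utilization:.1%}")
```

Under these illustrative prices the reservation only pays off if the instance is needed for more than the printed fraction of the term, which is why the reservation decision depends on the workload distribution.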

Cloud Resource Provisioning Problem.
We consider the cloud resource provisioning problem over a reservation period. Let $t = 1, 2, \ldots, T$ be the hour index of the reservation period. Let $d_t$ be the workload at time $t$. Let $R = (n_1^r, n_2^r, \ldots, n_M^r)$ be the reservation decision, where $n_i^r$ is the number of reserved instances of type $V_i$. Then, the reserved processing capacity is $\sum_{i=1}^{M} n_i^r C_i$, and the total cost of reserved instances for the reservation period is

$$\sum_{i=1}^{M} n_i^r \left(p_i^R + p_i^r T\right).$$

For each time $t$, if the workload does not exceed the reserved processing capacity, there is no need to purchase on-demand instances; otherwise, on-demand instances are purchased, and the usage cost of on-demand instances can be written as

$$U(R, d_t) = \min \sum_{i=1}^{M} p_i^o n_{ti}^o \quad \text{s.t.} \quad \sum_{i=1}^{M} n_{ti}^o C_i \ge d_t - \sum_{i=1}^{M} n_i^r C_i, \quad n_{ti}^o \in \mathbb{Z}_+,\ i = 1, \ldots, M, \tag{1}$$

where $n_{ti}^o$ is the number of on-demand instances of type $V_i$ at time $t$.
The resource reservation problem can be formulated as

$$\min_{R \in \mathbb{Z}_+^M} \ \sum_{i=1}^{M} n_i^r \left(p_i^R + p_i^r T\right) + \sum_{t=1}^{T} U(R, d_t), \tag{2}$$

where the objective is to minimize the total cost for the reservation period, including the upfront fee and the usage cost of reserved instances, and the usage cost of on-demand instances. This problem depends on the workload over the reservation period, which is not known a priori. We can estimate the probability distribution of the workload $p_D(d)$ based on historical data. Then, the resource reservation problem can be rewritten as

$$\min_{R \in \mathbb{Z}_+^M} \ \sum_{i=1}^{M} n_i^r \left(\frac{p_i^R}{T} + p_i^r\right) + \mathbb{E}_{d \sim p_D}\left[U(R, d)\right]. \tag{3}$$

This problem is a two-stage stochastic programming problem, where the objective function is the average cost per hour, and the possible realizations of the workload are called scenarios.
The first-stage problem corresponds to the resource reservation problem, where the first-stage decision is the reservation decision. The second-stage problem corresponds to the on-demand resource provisioning problem, where the second-stage decision depends on the realization of the workload.
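The second-stage problem is a small covering problem: given the residual workload that the reserved capacity cannot absorb, choose the cheapest mix of on-demand instances that covers it. The sketch below solves it by dynamic programming instead of a general integer programming solver. It assumes integer capacities, and the helper names, capacities, and prices are hypothetical.

```python
import math
from typing import Sequence


def on_demand_cost(residual: float, capacity: Sequence[int],
                   price: Sequence[float]) -> float:
    """Cheapest hourly on-demand cost that covers `residual` workload units.

    Solves min sum_i price[i] * n_i  s.t.  sum_i n_i * capacity[i] >= residual
    with nonnegative integer n_i, by dynamic programming over workload units.
    """
    need = max(0, math.ceil(residual))
    best = [0.0] + [math.inf] * need  # best[w]: min cost to cover >= w units
    for w in range(1, need + 1):
        for c, p in zip(capacity, price):
            best[w] = min(best[w], best[max(0, w - c)] + p)
    return best[need]


def usage_cost(reservation: Sequence[int], demand: float,
               capacity: Sequence[int], price: Sequence[float]) -> float:
    """On-demand usage cost for one hour, given a reservation decision."""
    reserved_capacity = sum(n * c for n, c in zip(reservation, capacity))
    return on_demand_cost(demand - reserved_capacity, capacity, price)


# Hypothetical capacities (requests/s) and on-demand prices (USD/h).
capacity = [50, 100, 200]
price = [0.05, 0.10, 0.19]
print(usage_cost((0, 1, 0), 220, capacity, price))  # residual workload of 120
print(usage_cost((0, 1, 0), 80, capacity, price))   # workload fits the reservation
```

When the workload fits within the reserved capacity the cost is zero, which is the first case described in the text.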

Resource Reservation
In this section, we use the sample average approximation method and the dual decomposition method to solve the resource reservation problem.

Sample Average Approximation (SAA).
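If the number of scenarios is very large, it is difficult to solve (3) directly. The sample average approximation method can be used to reduce the number of scenarios [21]. Since the workload is a one-dimensional random variable, a uniform discretization grid is used to generate a set of scenarios $\{\tilde{d}_1, \tilde{d}_2, \ldots, \tilde{d}_N\}$, where $N$ is the sample size. Then, problem (3) can be approximated as

$$\min_{R \in \mathbb{Z}_+^M} \ \sum_{i=1}^{M} n_i^r \left(\frac{p_i^R}{T} + p_i^r\right) + \frac{1}{N} \sum_{j=1}^{N} U(R, \tilde{d}_j). \tag{4}$$

Problem (4) is the SAA of problem (3). Problem (4) is also a two-stage stochastic programming problem, which can be transformed into the following deterministic equivalent formulation:

$$\begin{aligned} \min \quad & \sum_{i=1}^{M} n_i^r \left(\frac{p_i^R}{T} + p_i^r\right) + \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{M} p_i^o n_{ji}^o \\ \text{s.t.} \quad & \sum_{i=1}^{M} \left(n_i^r + n_{ji}^o\right) C_i \ge \tilde{d}_j, \quad j = 1, \ldots, N, \\ & n_i^r,\ n_{ji}^o \in \mathbb{Z}_+, \quad i = 1, \ldots, M,\ j = 1, \ldots, N, \end{aligned} \tag{5}$$

where $n_{ji}^o$ is the number of on-demand instances of type $V_i$ in scenario $j$. Problem (5) is an integer linear program, which can be solved using a standard branch and bound algorithm.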
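To make the SAA step concrete, here is a minimal Python sketch that draws equiprobable scenarios from a workload history and solves the resulting small problem by exhaustive enumeration rather than branch and bound. Reading "uniform discretization grid" as the midpoints of a uniform grid over the empirical distribution is an assumption, as are all function names and numbers.

```python
import itertools
import math


def grid_scenarios(history, n):
    """Pick n equiprobable scenarios at the midpoints of a uniform grid
    over the empirical distribution (an assumed reading of the grid)."""
    ordered = sorted(history)
    return [ordered[(2 * j + 1) * len(ordered) // (2 * n)] for j in range(n)]


def on_demand_cost(residual, capacity, price):
    """Cheapest on-demand mix covering `residual` units (integer capacities)."""
    need = max(0, math.ceil(residual))
    best = [0.0] + [math.inf] * need
    for w in range(1, need + 1):
        for c, p in zip(capacity, price):
            best[w] = min(best[w], best[max(0, w - c)] + p)
    return best[need]


def saa_objective(reservation, scenarios, capacity, reserved_price, od_price):
    """Average hourly cost: reserved cost plus mean on-demand cost."""
    fixed = sum(n * p for n, p in zip(reservation, reserved_price))
    reserved_cap = sum(n * c for n, c in zip(reservation, capacity))
    recourse = sum(on_demand_cost(d - reserved_cap, capacity, od_price)
                   for d in scenarios)
    return fixed + recourse / len(scenarios)


def solve_saa_by_enumeration(scenarios, capacity, reserved_price, od_price,
                             max_count):
    """Try every reservation with at most max_count instances per type."""
    candidates = itertools.product(range(max_count + 1), repeat=len(capacity))
    return min(candidates, key=lambda r: saa_objective(
        r, scenarios, capacity, reserved_price, od_price))


# Hypothetical data: workload history, capacities, effective reserved
# hourly prices, and on-demand hourly prices.
history = [40, 60, 80, 90, 100, 120, 150, 200, 90, 70]
capacity, reserved_price, od_price = [50, 100], [0.03, 0.06], [0.05, 0.10]
scenarios = grid_scenarios(history, 5)
best = solve_saa_by_enumeration(scenarios, capacity, reserved_price,
                                od_price, max_count=3)
print(scenarios, best,
      saa_objective(best, scenarios, capacity, reserved_price, od_price))
```

Enumeration is only practical for a handful of VM types and small instance counts; it stands in here for the integer programming machinery so that the objective itself is easy to inspect.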

Dual Decomposition-Based Branch and Bound (DDBnB).
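The standard branch and bound algorithm uses the linear programming relaxation for bounding. In this paper, we use the Lagrangian relaxation obtained by scenario decomposition to improve the bounds [22]. The idea of scenario decomposition is to introduce a copy $R_j = (n_{j1}^r, \ldots, n_{jM}^r)$ of the first-stage decision $R$ for each scenario. Then, problem (5) can be reformulated as

$$\begin{aligned} \min \quad & \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{M} n_{ji}^r \left(\frac{p_i^R}{T} + p_i^r\right) + \sum_{i=1}^{M} p_i^o n_{ji}^o \right] \\ \text{s.t.} \quad & \sum_{i=1}^{M} \left(n_{ji}^r + n_{ji}^o\right) C_i \ge \tilde{d}_j, \quad j = 1, \ldots, N, \\ & n_{ji}^r,\ n_{ji}^o \in \mathbb{Z}_+, \quad i = 1, \ldots, M,\ j = 1, \ldots, N, \\ & R_1 = \cdots = R_N, \end{aligned} \tag{6}$$

where the constraints $R_1 = \cdots = R_N$ are called the nonanticipativity constraints. The nonanticipativity constraints have several equivalent expressions. Here, we represent the nonanticipativity constraints by $\sum_{j=1}^{N} H_j R_j = 0$, where each $H_j$ is an $M(N-1) \times M$ matrix. By dualizing the nonanticipativity constraints, the Lagrange dual function of problem (6) is defined as

$$D(\lambda) = \min \sum_{j=1}^{N} \left\{ \frac{1}{N} \left[ \sum_{i=1}^{M} n_{ji}^r \left(\frac{p_i^R}{T} + p_i^r\right) + \sum_{i=1}^{M} p_i^o n_{ji}^o \right] + \lambda^T H_j R_j \right\}, \tag{7}$$

subject to the capacity and integrality constraints of (6), where $\lambda \in \mathbb{R}^{M(N-1)}$ is the Lagrange multiplier vector associated with the nonanticipativity constraints. Problem (7) can be decomposed into multiple subproblems according to the scenarios, $D(\lambda) = \sum_{j=1}^{N} D_j(\lambda)$, with

$$\begin{aligned} D_j(\lambda) = \min \quad & \frac{1}{N} \left[ \sum_{i=1}^{M} n_{ji}^r \left(\frac{p_i^R}{T} + p_i^r\right) + \sum_{i=1}^{M} p_i^o n_{ji}^o \right] + \lambda^T H_j R_j \\ \text{s.t.} \quad & \sum_{i=1}^{M} \left(n_{ji}^r + n_{ji}^o\right) C_i \ge \tilde{d}_j, \quad n_{ji}^r,\ n_{ji}^o \in \mathbb{Z}_+,\ i = 1, \ldots, M. \end{aligned} \tag{8}$$

Problem (8) is called the scenario subproblem, which is a small integer linear program. The dual problem of problem (6) can be formulated as

$$\max_{\lambda \in \mathbb{R}^{M(N-1)}} D(\lambda). \tag{9}$$

Dual problem (9) can be solved by the subgradient method. From the definition of the subgradient, the subgradient of $D$ at $\lambda$ is $\sum_{j=1}^{N} H_j R_j(\lambda)$, where $R_j(\lambda)$ is the first-stage component of the optimal solution of (8) for a given $\lambda$.
The iterative formula of the subgradient method is as follows:

$$\lambda^{(k+1)} = \lambda^{(k)} + c^{(k)} \sum_{j=1}^{N} H_j R_j\left(\lambda^{(k)}\right), \tag{10}$$

where $k$ is the iteration index and $c^{(k)}$ is a positive step size. Dual problem (9) provides a lower bound for original problem (6). In general, the scenario solutions $R_j$, $j = 1, 2, \ldots, N$, will not satisfy the nonanticipativity constraints unless the duality gap is zero. In this paper, we present a branch and bound algorithm that uses the Lagrangian relaxation of the nonanticipativity constraints for bounding. To obtain a feasible first-stage solution, we compute the average $\bar{R} = \sum_{j=1}^{N} R_j / N$ and round it by some heuristic to obtain an integer solution. The feasible first-stage solution provides an upper bound for problem (6). The branch and bound algorithm is described as follows, where $\mathcal{P}$ denotes the set of current problems, $\underline{z}(P)$ is a lower bound of $P \in \mathcal{P}$, $z_{LD}(P)$ is the optimal value of the Lagrangian dual of $P$, and $\bar{z}$ is the best upper bound found so far:
Step 1. Initialization: set $\bar{z} = +\infty$ and let $\mathcal{P}$ consist of problem (6).
Step 2. Termination: if $\mathcal{P} = \emptyset$, then the solution that yielded $\bar{z}$ is optimal.
Step 3. Node selection: select and delete a problem $P$ from $\mathcal{P}$, and solve its Lagrangian dual.
Step 4. Bounding: if $z_{LD}(P) \ge \bar{z}$, go to Step 2 (this step can be carried out as soon as the value of the Lagrangian dual rises above $\bar{z}$). Otherwise:
(i) The scenario solutions $R_j$, $j = 1, 2, \ldots, N$, are identical: let $\bar{z} = z_{LD}(P)$ and delete from $\mathcal{P}$ all problems $P'$ with $\underline{z}(P') \ge \bar{z}$. Go to Step 2.
(ii) The scenario solutions $R_j$, $j = 1, 2, \ldots, N$, differ: compute the average $\bar{R} = \sum_{j=1}^{N} R_j / N$ and round it by some heuristic to obtain $\hat{R} = (\hat{n}_1^r, \ldots, \hat{n}_M^r)$. If $\sum_{i=1}^{M} \hat{n}_i^r (p_i^R/T + p_i^r) + \sum_{j=1}^{N} U(\hat{R}, \tilde{d}_j)/N < \bar{z}$, then let $\bar{z} = \sum_{i=1}^{M} \hat{n}_i^r (p_i^R/T + p_i^r) + \sum_{j=1}^{N} U(\hat{R}, \tilde{d}_j)/N$ and delete from $\mathcal{P}$ all problems $P'$ with $\underline{z}(P') \ge \bar{z}$.
Step 5. Branching: select a component $n_i^r$ of $R$ and add two new problems to $\mathcal{P}$ obtained from $P$ by adding the constraints $n_i^r \le \lfloor \bar{n}_i^r \rfloor$ and $n_i^r \ge \lfloor \bar{n}_i^r \rfloor + 1$, respectively, where $\bar{n}_i^r$ is the corresponding component of $\bar{R}$. Go to Step 2.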
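The Lagrangian bounding used in the steps above can be sketched for a toy case with a single VM type. In this hedged example the nonanticipativity constraints are written as $n_j - n_{j+1} = 0$ for consecutive scenario copies, which is one of the equivalent expressions and not necessarily the one used in the paper. The scaling factor on the step size, the data, and all function names are assumptions. The sketch only computes the dual lower bound and compares it with the true optimum found by enumeration; it does not implement the branching.

```python
import math


def scenario_cost(n, demand, cap, reserved_price, od_price):
    """Hourly cost in one scenario with n reserved instances (one VM type)."""
    extra = max(0, math.ceil((demand - n * cap) / cap))  # on-demand instances
    return reserved_price * n + od_price * extra


def dual_bound(scenarios, cap, reserved_price, od_price, n_max,
               iters=200, m=100, scale=0.01):
    """Best Lagrangian dual value found by subgradient ascent."""
    N = len(scenarios)
    lam = [0.0] * (N - 1)  # one multiplier per constraint n_j - n_{j+1} = 0
    best = -math.inf
    for k in range(1, iters + 1):
        copies, value = [], 0.0
        for j, d in enumerate(scenarios):
            # Coefficient of n_j in the dualized term sum_j lam_j (n_j - n_{j+1}).
            mu = (lam[j] if j < N - 1 else 0.0) - (lam[j - 1] if j > 0 else 0.0)
            # Scenario subproblem, solved here by enumeration over n.
            n_star = min(range(n_max + 1), key=lambda n: scenario_cost(
                n, d, cap, reserved_price, od_price) / N + mu * n)
            copies.append(n_star)
            value += scenario_cost(n_star, d, cap, reserved_price,
                                   od_price) / N + mu * n_star
        best = max(best, value)
        step = scale * (1 + m) / (k + m)  # diminishing step, scaled (assumed)
        lam = [l + step * (copies[j] - copies[j + 1])
               for j, l in enumerate(lam)]
    return best


def primal_optimum(scenarios, cap, reserved_price, od_price, n_max):
    """True optimum of the single-type SAA problem by enumeration."""
    return min(sum(scenario_cost(n, d, cap, reserved_price, od_price)
                   for d in scenarios) / len(scenarios)
               for n in range(n_max + 1))


# Hypothetical scenarios and prices.
scenarios = [60, 80, 90, 120, 200]
lower = dual_bound(scenarios, 50, 0.03, 0.05, n_max=5)
optimum = primal_optimum(scenarios, 50, 0.03, 0.05, n_max=5)
print(f"dual lower bound {lower:.4f} <= optimum {optimum:.4f}")
```

By weak duality every multiplier vector yields a valid lower bound, so the quality of the step-size rule affects only how tight the bound becomes, not its validity.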

On-Demand Resource Provisioning
On-demand resource provisioning problem (1) is an integer linear program, which can be solved using any standard integer linear programming solver. However, the workload is not known a priori. In this paper, we propose a hybrid ARIMA-Kalman model for workload prediction.
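It has been shown in the literature that the workload exhibits strong autocorrelation. Then, the workload can be modeled by an ARIMA model [8,23]:

$$d'_t = \sum_{i=1}^{p} \phi_i d'_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \tag{11}$$

where $d'_t = \nabla^d d_t$ is the differenced workload, $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$, $\Phi = (\phi_1, \ldots, \phi_p)$ are the AR coefficients, and $\Theta = (\theta_1, \ldots, \theta_q)$ are the MA coefficients. Let $r = \max(p, q + 1)$, and model (11) can be rewritten as

$$d'_t = \sum_{i=1}^{r} \phi_i d'_{t-i} + \varepsilon_t + \sum_{j=1}^{r-1} \theta_j \varepsilon_{t-j}, \tag{12}$$

where $\phi_i = 0$ for $i > p$ and $\theta_j = 0$ for $j > q$. Then, the state-space representation of model (12) can be obtained as [24]

$$d'_t = G x_t + W_t, \tag{13}$$

$$x_t = F x_{t-1} + V_t, \tag{14}$$

where (13) and (14) are the measurement and state equations, $d'_t$ is the measurement variable, $x_t \in \mathbb{R}^r$ is the state vector, $W_t = 0$ is the measurement noise with variance $R = 0$, and $V_t = (\varepsilon_t, 0, \ldots, 0)^T$ is the state noise with covariance matrix $Q = \mathrm{diag}(\sigma^2, 0, \ldots, 0)$. The measurement matrix $G$ and the state transition matrix $F$ are given as

$$G = \begin{bmatrix} 1 & \theta_1 & \cdots & \theta_{r-1} \end{bmatrix}, \qquad F = \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_{r-1} & \phi_r \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{15}$$

From state-space model (13) and (14), the Kalman prediction equations are obtained as follows [25]:

$$x_{t/t-1} = F x_{t-1/t-1}, \quad P_{t/t-1} = F P_{t-1/t-1} F^T + Q, \quad d'_{t/t-1} = G x_{t/t-1}, \quad Z_{t/t-1} = G P_{t/t-1} G^T + R, \tag{16}$$

$$K_t = P_{t/t-1} G^T Z_{t/t-1}^{-1}, \quad x_{t/t} = x_{t/t-1} + K_t \left(d'_t - d'_{t/t-1}\right), \quad P_{t/t} = \left(I - K_t G\right) P_{t/t-1}, \tag{17}$$

where (16) and (17) are the time and measurement update equations, $x_{t/s}$ and $d'_{t/s}$ are the estimates of $x_t$ and $d'_t$ given the observations up to time $s$, $P_{t/s}$ is the error covariance matrix of $x_{t/s}$, $Z_{t/s}$ is the error variance of $d'_{t/s}$, and $K_t$ is the Kalman gain.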
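A minimal pure-Python sketch of the Kalman time and measurement updates for an ARMA state-space model is given below. It assumes a companion-form transition matrix with the AR coefficients in the first row, a measurement row $(1, \theta_1, \ldots, \theta_{r-1})$, zero measurement noise, a zero initial state, and a diagonal initial covariance; these choices and all names are assumptions for illustration. The model parameters are taken as given here, so the EM estimation is not covered, and the input is the already differenced series.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def arma_state_space(phi: Sequence[float],
                     theta: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """Build F (companion form) and G = [1, theta_1, ..., theta_{r-1}]."""
    r = max(len(phi), len(theta) + 1)
    phi_r = list(phi) + [0.0] * (r - len(phi))
    theta_r = list(theta) + [0.0] * (r - 1 - len(theta))
    F = [phi_r] + [[1.0 if c == row else 0.0 for c in range(r)]
                   for row in range(r - 1)]
    return F, [[1.0] + theta_r]


def kalman_predict(obs: Sequence[float], phi: Sequence[float],
                   theta: Sequence[float], sigma2: float,
                   p0: float = 1.0) -> Tuple[List[float], float]:
    """Return the one-step-ahead predictions for obs and the next prediction."""
    F, G = arma_state_space(phi, theta)
    r = len(F)
    x = [[0.0] for _ in range(r)]                               # x_{0/0}
    P = [[p0 if i == j else 0.0 for j in range(r)] for i in range(r)]
    preds = []
    for d in obs:
        # Time update: propagate the state and its covariance (Q adds sigma2).
        x = matmul(F, x)
        P = matmul(matmul(F, P), transpose(F))
        P[0][0] += sigma2
        d_pred = matmul(G, x)[0][0]
        Z = matmul(matmul(G, P), transpose(G))[0][0]            # R = 0
        preds.append(d_pred)
        # Measurement update: correct the state with the new observation.
        K = [[row[0] / Z] for row in matmul(P, transpose(G))]
        x = [[xi[0] + k[0] * (d - d_pred)] for xi, k in zip(x, K)]
        KG = matmul(K, G)
        P = [[P[i][j] - sum(KG[i][l] * P[l][j] for l in range(r))
              for j in range(r)] for i in range(r)]
    return preds, matmul(G, matmul(F, x))[0][0]


# Hypothetical AR(2) coefficients and a short differenced series.
preds, nxt = kalman_predict([1.0, 2.0, 3.0], phi=[0.5, 0.3], theta=[], sigma2=1.0)
print(preds, nxt)
```

For a pure AR model with noise-free measurements, the filter recovers the last observations exactly after a few steps, so the next prediction reduces to the usual AR forecast.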
Let $\psi = \{x_{0/0}, P_{0/0}, \Phi, \Theta, \sigma^2\}$ denote the set of parameters in the Kalman prediction equations, which can be estimated by the maximum likelihood method. In this paper, we use the EM algorithm [26] to obtain the maximum likelihood estimates of the parameters. If we could observe the states $X_n = \{x_0, x_1, \ldots, x_n\}$ in addition to the observations $D_n = \{d_1, d_2, \ldots, d_n\}$, then we would consider $\{X_n, D_n\}$ as the complete data. Under the Gaussian assumption, the log-likelihood of the complete data can be written as

$$\ln L(D_n, X_n; \psi) = -\frac{1}{2} \ln \left|P_{0/0}\right| - \frac{1}{2} \cdots \tag{18}$$

From (18), if we did have the complete data, it would be straightforward to obtain the maximum likelihood estimate of $\psi$ using multivariate normal theory. However, we cannot observe the states. The EM algorithm is an iterative method for finding the maximum likelihood estimate of $\psi$ based on the incomplete data by successively maximizing the conditional expectation of the complete data log-likelihood. Each iteration of the EM algorithm consists of two steps, the expectation step (E-step) and the maximization step (M-step). In the E-step, the conditional expectation of the complete data log-likelihood is computed given the parameter estimates from the previous iteration. From (18), we can obtain expression (20) for this expectation, where $S_{11} = \sum_{t=1}^{n} \left(x_{t/n} x_{t/n}^T + P_{t/n}\right)$, $S_{10} = \sum_{t=1}^{n} \left(x_{t/n} x_{t-1/n}^T + P_{t,t-1/n}\right)$, $S_{00} = \sum_{t=1}^{n} \left(x_{t-1/n} x_{t-1/n}^T + P_{t-1/n}\right)$, and $P_{t,t-1/n}$ is the error covariance of $x_{t-1/n}$ and $x_{t/n}$. In the M-step, (20) is maximized with respect to the parameters to obtain the updated parameter estimates. The flowchart of the EM algorithm is shown in Figure 1.
The one-step-ahead prediction of the workload based on Kalman prediction is given by (23). For each time $t$, even with a workload prediction method, underprovisioning can occur due to underestimation, which causes SLA violations. To reduce the SLA violation rate, (23) is modified to obtain prediction formula (24).

Evaluation
In this section, we conduct extensive experiments to evaluate the effectiveness of the proposed two-phase algorithm based on a real-world workload trace and Amazon EC2's pricing models.

Experiment Setup.
The workload trace used in the experiments is obtained from a 4-week access log file of the NASA web server [27], as shown in Figure 2. The probability distribution of the workload can be estimated based on the workload trace. We consider four types of VM instances offered by Amazon EC2: small (m1.small), medium (m1.medium), large (m1.large), and extralarge (m1.xlarge) [5]. Table 2 shows the configuration and the pricing models of each VM type. The parameters of the algorithms are set as follows. The sample size of the SAA problem is set to 10. In the subgradient method, we use a diminishing step size $c^{(k)} = (1 + m)/(k + m)$ where $m = 100$, and repeat the iterations until the stopping criterion $|D^{(k)} - D^{(k-1)}| / |D^{(k)}| \le \varepsilon$ is satisfied, where $\varepsilon = 0.00001$. In the EM algorithm, the initial values of the parameters are set according to [25].

Performance of Resource Reservation Algorithm.
We first analyze the impact of resource reservation on the operational cost. Figure 3 shows the operational cost under different resource reservations. We can observe that the operational cost can be significantly reduced by resource reservation, and there is a tradeoff between the on-demand cost and the reservation cost. The optimal resource reservation is $n_1^r = 0$, $n_2^r = 1$, $n_3^r = 0$, $n_4^r = 1$ with the reserved processing capacity of 275 requests/s, and the optimal operational cost is $3407.9. By combining on-demand and reserved instances, the operational cost can be reduced by 25.58% compared with the pure on-demand strategy.
We compare the accuracy of the uniform discretization grid with that of the Monte Carlo and quasi-Monte Carlo methods [21]. As can be seen from Figure 4(a), the uniform discretization grid is the best among the three methods under the same sample size. We also study the impact of the sample size on the accuracy of the uniform discretization grid. As can be seen from Figure 4(b), the accuracy of the uniform discretization grid becomes higher as the sample size increases, and reaches 98.01% when $N = 10$. Figure 5 shows the convergence of the dual decomposition-based branch and bound algorithm. It can be seen that the optimal solution can be obtained by the DDBnB algorithm after 9 iterations. Table 3 compares the performance of our resource reservation algorithm based on stochastic programming (RRSP) with two existing algorithms: the RIPAM algorithm considering only the medium instance type [15] and the DCRA algorithm [16]. The RRSP algorithm can reduce the operational cost by 24.92%. Our algorithm achieves 4.14 percentage points more cost saving than RIPAM, and 20.84 percentage points more cost saving than DCRA.

Workload Prediction Based on Hybrid ARIMA-Kalman Model.
In this subsection, we evaluate the performance of the hybrid ARIMA-Kalman model. The data of the first three weeks are used as the training data and the data of the last week as the test data. Figure 6 shows the prediction results. We can observe that the predicted workload is very close to the actual workload. The prediction accuracy of the hybrid ARIMA-Kalman model is compared with the ARIMA model [8] and the neural network method [9] based on three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). The ARIMA model has an autoregressive order of 2 and a moving average order of 1. The neural network method uses a backpropagation neural network with a learning rate of 0.7 and a single hidden layer; the numbers of neurons in the input, hidden, and output layers are 6, 4, and 1, respectively. As can be seen from Table 4, the hybrid ARIMA-Kalman model is better than the other two methods.
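For reference, the three accuracy metrics can be computed as in the following short Python sketch, using their standard definitions. The sample values are made up for illustration and are not taken from the NASA trace.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up workload values (requests/s).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

MAPE is scale-free, whereas RMSE and MAE are in the units of the workload; RMSE penalizes large individual errors more heavily than MAE.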
Although the predicted workload is very close to the actual workload, underprovisioning can still occur when the workload is underestimated. To reduce the SLA violation rate, modified workload prediction formula (24) is used. Figure 7 shows the impact of the parameter α on the SLA violation rate and the on-demand cost. It can be seen that, as the value of α increases, the SLA violation rate decreases while the on-demand cost increases.
Figure 1: Flowchart of the EM algorithm. Initialization: set $j = 0$ and choose the initial values of the parameters $\Psi^{(0)}$. E-step: calculate the smoothed values $x_{t/n}$, $P_{t/n}$, $P_{t,t-1/n}$ based on $\Psi^{(j-1)}$ and use them to calculate $S_{11}$, $S_{10}$, $S_{00}$. M-step: update the parameter estimates using (20).
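The tradeoff between the SLA violation rate and the on-demand cost can be illustrated with a small Python sketch. Since formula (24) is not reproduced here, the sketch assumes a simple multiplicative safety margin, that is, the prediction is inflated by a factor $(1 + \alpha)$; this is an assumption and may differ from the paper's formula. It also assumes a single on-demand VM type, and all names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def on_demand_instances(predicted: float, alpha: float,
                        reserved_capacity: float, cap: float) -> int:
    """Instances needed to cover the padded prediction (assumed padding rule)."""
    padded = (1 + alpha) * predicted
    return max(0, math.ceil((padded - reserved_capacity) / cap))


def evaluate(actual: Sequence[float], predicted: Sequence[float], alpha: float,
             reserved_capacity: float, cap: float,
             price: float) -> Tuple[float, float]:
    """Return (SLA violation rate, total on-demand cost) over the trace."""
    violations, cost = 0, 0.0
    for a, p in zip(actual, predicted):
        n = on_demand_instances(p, alpha, reserved_capacity, cap)
        cost += n * price
        if reserved_capacity + n * cap < a:  # provisioned capacity too small
            violations += 1
    return violations / len(actual), cost


# Hypothetical hourly workloads (requests/s) and predictions.
actual = [340.0, 420.0, 380.0, 540.0]
predicted = [290.0, 400.0, 390.0, 480.0]
for alpha in (0.0, 0.1):
    rate, cost = evaluate(actual, predicted, alpha,
                          reserved_capacity=275.0, cap=50.0, price=0.05)
    print(f"alpha={alpha}: violation rate={rate:.2f}, on-demand cost={cost:.2f}")
```

In this toy trace a larger margin removes one of the two violations at the price of extra on-demand instances, which mirrors the qualitative trend described for Figure 7.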

Conclusion
In this paper, we propose a two-phase cloud resource provisioning algorithm for cloud users to reduce the resource rental cost using both on-demand and reserved instances. In the first phase, we formulate the resource reservation problem as a two-stage stochastic programming problem. We use the sample average approximation method to reduce the number of scenarios, and solve the SAA problem by a dual decomposition algorithm with branch and bound to obtain the optimal resource reservation. In the second phase, we propose a hybrid ARIMA-Kalman model for workload prediction and determine the number of on-demand instances based on the predicted workload. The effectiveness of the proposed two-phase algorithm is evaluated based on a real-world workload trace and Amazon EC2's pricing models. The simulation results show that the proposed algorithm can achieve about 5%-20% more cost saving than existing algorithms while guaranteeing the SLA. In the future, we plan to investigate more pricing models offered by cloud providers, such as the spot pricing model, and use these pricing models to further reduce the resource rental cost of cloud users.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments
This work was supported by the Science and Technology Program of Nantong (grant no. JC2018025).

Table 3 (recovered fragment): RIPAM [15]: 3628.1, 20.78; DCRA [16]: 4392.