A Green Strategy for Federated and Heterogeneous Clouds with Communicating Workloads

Providers of cloud environments must tackle the challenge of configuring their system to provide maximal performance while minimizing the cost of resources used. However, at the same time, they must guarantee an SLA (service-level agreement) to the users. The SLA is usually associated with a certain level of QoS (quality of service). As response time is perhaps the most widely used QoS metric, it was also the one chosen in this work. This paper presents a green strategy (GS) model for heterogeneous cloud systems. We provide a solution for heterogeneous job-communicating tasks and heterogeneous VMs that make up the nodes of the cloud. In addition to guaranteeing the SLA, the main goal is to optimize energy savings. The solution results in an equation that must be solved by a solver with nonlinear capabilities. The results obtained from modelling the policies to be executed by a solver demonstrate the applicability of our proposal for saving energy and guaranteeing the SLA.


Introduction
In cloud computing, an SLA (service-level agreement) is an agreement between a service provider and a consumer in which the former agrees to deliver a service to the latter under specific terms, such as time or performance. In order to comply with the SLA, the service provider must monitor the cloud performance closely. Studying and determining SLA-related issues is a big challenge [1,2].
GS is designed to lower power consumption [3] as much as possible. The main objective of this paper is to develop resource scheduling approaches that improve the power efficiency of data centers by shutting down idle servers or putting them to sleep, as Intel's Cloud Computing 2015 Vision [4] does.
At the same time, GS is aimed at guaranteeing a negotiated SLA and power-aware [3] solutions, leaving aside such other cloud-computing issues as variability [2], system security [3], and availability [5]. Job response time is perhaps the most important QoS metric in a cloud-computing context [1]. That is why it is the QoS parameter chosen in this work. In addition, although good solutions dealing with QoS [6,7] and power consumption [8,9] have been presented in the literature, the model presented here aims to obtain the best scheduling by taking both criteria into account.
This paper is focused on proposing a static green alternative for solving the scheduling problem in cloud environments. Many of the cited solutions consist of dynamically creating ad hoc VMs, depending on the workload, which is made up of independent tasks or a parallel job composed of communicating or noncommunicating tasks. This implies constantly creating, deleting, or moving VMs. These processes consume large amounts of time. When return times are low relative to VM management times, more static scheduling methods among the existing VMs are preferable. The solution described in this paper goes in this direction. However, this solution can be merged and complemented with dynamic proposals.
Our additional contribution with respect to [10] is that our solution tries to optimize two criteria at the same time: scheduling tasks to VMs, saving energy, and consolidating VMs to the nodes. Providing an efficient NLP solution for this problem is a novel challenge in the cloud computing research field.
The Scientific World Journal
Another important contribution of this paper is the method used to model the power of the virtual machines as a function of their workload. We rely on the work done in [11], where the authors formulate the problem of assigning people from various groups to different jobs, so that the jobs are completed in the minimum time, as a stochastic programming problem in which the job completion times are assumed to follow a Gamma distribution. To model the influence of the workload, the computing power of the virtual machine is weighted by a load factor determined by an Erlang distribution (equivalent to a Gamma). Finally, a stochastic programming problem is obtained and transformed into an equivalent deterministic problem with a nonlinear objective function.
The remainder of the paper is organized as follows. Our contribution is based on the previous work presented in Section 2. In the GS section (Section 3), we present our main contributions, a set of scheduling policies. These proposals are arranged by increasing complexity. The experimentation showing the good behavior of our cloud model is presented in the Results section (Section 4). Finally, the Conclusions and Future Work section outlines the main conclusions and possible research lines to explore in the near future.

Related Work
There is a great deal of work in the literature on linear programming (LP) solutions and algorithms applied to scheduling, like those presented in [12,13]. Another notable work was performed in [14], where authors designed a Green Scheduling Algorithm that integrated a neural network predictor in order to optimize server power consumption in cloud computing. Also, the authors in [15] proposed a genetic algorithm that takes into account both makespan and energy consumption.
Shutting down servers when they are not being used is one of the most direct methods to reduce the idle power. However, the authors in [16] state that a power-off requires an additional setup cost, resulting in long system delays. Shutting down servers may sacrifice quality of service (QoS) levels, thus violating the SLA. Instead, they make the server work at a lower service rate rather than stopping it completely during idle periods. The drawback of shutting servers down can be reduced if scheduling is performed for a large enough number of tasks, as in our case.
In [17], the authors treat the problem of consolidating VMs in a server by migrating VMs with steady and stable capacity needs. They proposed an exact formulation based on a linear program described by too small a number of valid inequalities. Indeed, this description does not allow solving, in a reasonable time or an optimal way, problems involving the allocation of a large number of items (or VMs) to many bins (or servers).
In [18], the authors presented a server consolidation (Sercon) algorithm which consists of minimizing the number of used nodes in a data center and minimizing the number of migrations at the same time to solve the bin (or server) packing problem. They show the efficiency of Sercon for consolidating VMs and minimizing migrations. Whereas our proposal (based on NLP) always finds the best solution, Sercon is a heuristic that cannot always reach the optimal solution.
The authors in [19] investigated resource optimization, service quality, and energy saving by the use of a neural network. These actions were specified in two different resource managers, which sought to maintain the application's quality of service in accordance with the SLA and obtain energy savings in a cluster of virtual servers by turning them off when idle and dynamically redistributing the VMs using live migration. Energy saving is only applied in the fuzzy term "intermediate load," using fewer resources while still maintaining satisfactory service quality levels. Large neural network training times and their nonoptimal solutions are problems that can be overcome by using other optimization techniques, such as the NLP one used in this paper.
In [10], the authors modelled energy-aware allocation and consolidation policies to minimize overall energy consumption with an optimal allocation algorithm and a consolidation algorithm. The optimal allocation algorithm is solved as a bin-packing problem with a minimum power consumption objective. The consolidation algorithm is derived from a linear and integer formulation of VM migration to adapt the placement when resources are released.
The authors of [20] presented an effective load-balancing genetic algorithm that spreads the multimedia service task load to the servers with the minimal cost for transmitting multimedia data between server clusters and clients for centralized hierarchical cloud-based multimedia systems. Clients can change their locations, and each server cluster only handled a specific type of multimedia task so that two performance objectives (as we do) were optimized at the same time.
In [21], the authors presented an architecture able to balance load across different virtual machines while providing SLA guarantees. The model presented in that work is similar to the model presented in this paper. The main difference is in the tasks considered. Here, a more complex and generalized model is presented. In addition, communicating and heterogeneous tasks as well as nondedicated environments have been taken into account.
Our proposal goes further than the outlined literature. Instead of designing a single-criterion scheduling problem (LP), we design an NLP scheduling solution which takes multicriteria issues into account. In contrast to the optimization techniques of the literature, our model ensures the best solution available. We also want to emphasize the nondedicated feature of the model, meaning that the workload of the cloud is also considered. This also differentiates our work from the related work and brings the model closer to reality, providing more reliable and realistic results.

GS Model
The NLP scheduling solution proposed in this paper models a problem by giving an objective function (OF). The equation representing the objective function takes various performance criteria into account.
GS tries to assign as many tasks as possible to the most powerful VMs, leaving the remaining ones aside. As we consider clouds made up of various nodes, at the end of the scheduling process, the nodes whose VMs have not been assigned any task can be turned off. As an SLA based on minimizing the return time is also applied, the model also minimizes the computing and communication time of the overall tasks making up a job.
The number of VMs can be different between nodes, so we use the notation $V_n$ ($n = 1, \ldots, N$, where $N$ is the number of nodes) to represent the number of VMs located in $\mathrm{Node}_n$. In other words, each $\mathrm{Node}_n$ is made up of the VMs $\mathrm{VM}_{n1}, \ldots, \mathrm{VM}_{nV_n}$, and $\mathrm{VM}_{nv}$ ($v = 1, \ldots, V_n$) denotes the $v$th VM of $\mathrm{Node}_n$.
Task assignments must show the node, and the VM inside that node, that each task $t_i$ ($i = 1, \ldots, T$) is assigned to. To do so, Boolean variables are used. The notation $t^i_{nv}$ represents the assignment of task $t_i$ to $\mathrm{VM}_{nv}$ of $\mathrm{Node}_n$.
The notation $M^i_{nv}$ represents the amount of Memory allocated to task $t_i$ in $\mathrm{VM}_{nv}$. It is assumed that Memory requirements do not change between VMs, so we simply write $M^i$. Once the solver is executed, the $t^i_{nv}$ variables will inform about the assignment of tasks to VMs. That is, $t^i_{nv} = 1$ if $t_i$ is assigned to $\mathrm{VM}_{nv}$, and $t^i_{nv} = 0$ otherwise.
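As an illustration of the Boolean assignment variables described above, the following minimal Python sketch builds them from a task-to-VM mapping. The function name `assignment_variables`, the dictionary layout, and the sample assignment are hypothetical choices made for this example and are not part of the model's formulation.

```python
def assignment_variables(assignment, vms):
    """Return the Boolean variables as a dict (task, node, vm) -> 0 or 1.

    assignment: dict mapping each task index to the (node, vm) pair hosting it.
    vms: list of all (node, vm) pairs available in the cloud.
    """
    return {
        (task, node, vm): 1 if assignment[task] == (node, vm) else 0
        for task in assignment
        for (node, vm) in vms
    }


# Hypothetical layout: Node 1 holds three VMs, Node 2 holds one VM.
vms = [(1, 1), (1, 2), (1, 3), (2, 1)]
# Hypothetical assignment of three tasks.
assignment = {1: (2, 1), 2: (1, 1), 3: (1, 1)}
t = assignment_variables(assignment, vms)
```

With this layout, `t[(1, 2, 1)]` is 1 because task 1 sits in the only VM of Node 2, and every task has exactly one variable set to 1.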

Virtual Machine Heterogeneity.
The relative computing power ($\Delta_{nv}$) of a $\mathrm{VM}_{nv}$ is defined as the normalized score of such a VM. Formally, consider

$$\Delta_{nv} = \frac{\delta_{nv}}{\sum_{k=1}^{N} \sum_{l=1}^{V_k} \delta_{kl}}, \tag{1}$$

where $N$ is the number of nodes, $V_k$ is the number of VMs in node $k$, and $\sum_{n=1}^{N} \sum_{v=1}^{V_n} \Delta_{nv} = 1$. $\delta_{nv}$ is the score (i.e., the computing power) of $\mathrm{VM}_{nv}$. Although $\delta_{nv}$ is a theoretical concept, there are many valid benchmarks it can be obtained with (e.g., Linpack (http://www.netlib.org/linpack/) or SPEC (http://www.spec.org)). Linpack (available in C, Fortran, and Java), for example, is used to obtain the number of floating-point operations per second. Note that the closer the relative computing power is to one (in other words, the more powerful the VM is), the more likely it is that the requests will be mapped into such a VM.
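The normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `relative_computing_power` and the benchmark scores are hypothetical, and any positive benchmark score could be used in their place.

```python
def relative_computing_power(scores):
    """Normalize per-VM benchmark scores so that they sum to 1.

    scores: dict mapping (node, vm) -> positive benchmark score.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {vm: score / total for vm, score in scores.items()}


# Hypothetical benchmark scores for two nodes (three VMs and one VM).
scores = {(1, 1): 20.0, (1, 2): 10.0, (1, 3): 10.0, (2, 1): 40.0}
delta = relative_computing_power(scores)
```

Here the single VM of Node 2 obtains a relative computing power of 0.5, so it is the preferred target for requests.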

Task Heterogeneity.
In order to model task heterogeneity, each task $t_i$ has its processing cost $P^i_{nv}$, representing the execution time of task $t_i$ in $\mathrm{VM}_{nv}$ with respect to the execution time of task $t_i$ in the least powerful VM (in other words, the one with the lowest $\Delta_{nv}$). It should be a good choice to maximize $t^i_{nv} P^i_{nv}$ to obtain the best assignment (in other words, the OF), as expressed in (2). However, there are still a few criteria to consider.

Virtual Machine Workload.
The performance drop experienced by VMs due to workload saturation is also taken into account. If a VM is underloaded, its throughput (tasks solved per unit of time) will increase as more tasks are assigned to it. When the VM reaches its maximum workload capacity, its throughput starts falling asymptotically towards zero. This behavior can be modeled with an Erlang distribution density function. Erlang is a continuous probability distribution with two parameters, $\alpha$ and $\lambda$. The parameter $\alpha$ is called the shape parameter, and the parameter $\lambda$ is called the rate parameter. These parameters depend on the VM characteristics. When $\alpha$ equals 1, the distribution simplifies to the exponential distribution. The Erlang probability density function is

$$E(x; \alpha, \lambda) = \frac{\lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}}{(\alpha - 1)!}, \quad x, \lambda \ge 0.$$

We consider that the Erlang modelling parameters of each VM can easily be obtained empirically. The Erlang parameters can be obtained by means of a comprehensive analysis of all typical workloads being executed in the server, supercomputer, or data center to be evaluated. In the present work, a common PC server was used. To carry out this analysis, we continuously increased the workload until the server was saturated. We collected measurements of the mean response times at each workload variation. By empirical analysis of that experimentation, we obtained the Erlang that best fitted the measured behaviour.
The Erlang is used to weight the relative computing power $\Delta_{nv}$ of each $\mathrm{VM}_{nv}$ with its associated workload factor, determined by an Erlang distribution. This optimal workload model is used to obtain the maximum throughput performance (number of tasks executed per unit of time) of each $\mathrm{VM}_{nv}$. In the case presented in this paper, the $x$-axis (abscissa) represents the sum of the processing costs $P^i_{nv}$ of the tasks $t_i$ assigned to each $\mathrm{VM}_{nv}$. Figure 1 shows an example in which we depict an Erlang with $\alpha = 76$ and $\lambda = 15$. The Erlang reaches its maximum when $x = 5$. Given that the abscissa represents the workload of a VM, a workload of 5 will give the maximum performance to such a VM in terms of throughput. So we are not interested in assigning less or more workload to a specific $\mathrm{VM}_{nv}$, because this would lead us away from the optimal assignment.
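The Figure 1 example can be checked numerically. The sketch below is only an illustration: it assumes that 76 is the shape parameter and 15 the rate parameter (the reading under which the maximum falls at a workload of 5), uses the standard Erlang density, and evaluates it in log space to avoid overflow for large shapes. The function names are hypothetical.

```python
import math


def erlang_pdf(x, shape, rate):
    """Standard Erlang density for integer shape >= 1 and rate > 0."""
    if x < 0:
        return 0.0
    if x == 0:
        # At x = 0 the density equals the rate only in the exponential case.
        return rate if shape == 1 else 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(x)
               - rate * x - math.lgamma(shape))  # lgamma(shape) = log((shape-1)!)
    return math.exp(log_pdf)


def erlang_mode(shape, rate):
    """Workload at which the Erlang density reaches its maximum."""
    return (shape - 1) / rate


mode = erlang_mode(76, 15)          # 75 / 15
peak = erlang_pdf(mode, 76, 15)
below = erlang_pdf(mode - 0.5, 76, 15)
above = erlang_pdf(mode + 0.5, 76, 15)
```

Under these assumptions the density is larger at a workload of 5 than at 4.5 or 5.5, which matches the behaviour described for Figure 1.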
Given an Erlang distribution function with fixed parameters $\alpha$ and $\lambda$, it is possible to calculate the optimal workload at which the function reaches its maximum by using its derivative function:

$$\frac{d}{dx} E(x; \alpha, \lambda) = 0 \;\Longrightarrow\; x = \frac{\alpha - 1}{\lambda}.$$

Finally, in our case, given that the VM workload, defined as the sum of the processing costs ($P^i_{nv}$) of the tasks assigned to a particular VM, must be a natural number, the optimal workload is obtained by rounding $(\alpha - 1)/\lambda$ to a natural number. Here, $t^i_{nv}$ is the Boolean variable informing of the assignment of $t_i$ to $\mathrm{VM}_{nv}$.

Task Communication and VM Selection.
In this section, the VM selection is also considered. In doing so, each VM can be selected from a range of nodes, forming a federated cloud. We want to obtain an OF that considers the scheduling of heterogeneous and communicating tasks to heterogeneous nodes made up of different numbers of heterogeneous VMs.
The communication cost (in time) between tasks $t_i$ and $t_j$ when in the same VM is denoted by $C^{ij}$ and should be passed to the solver as an argument. For reasons of simplicity, all communication links are considered to have the same bandwidth and latency. The notation $C^{ij}_{nv}$ represents the communication cost between task $t_i$ residing in $\mathrm{VM}_{nv}$ and another task $t_j$ (located in the same VM or elsewhere). Given equivalent bandwidth between any two VMs, $C^{ij}_{nv} = C^{ij}_{n'v'}$ for all $n, n' \le N$, $v \le V_n$, and $v' \le V_{n'}$. In other words, the communication cost does not depend on the VM or the links used between the VMs. Depending on where the tasks are located, we multiply the communication cost between tasks $t_i$ and $t_j$ by a given communication slowdown. If $t_i$ and $t_j$ are located in the same VM, the communication slowdown (denoted by $Cs_{nv}$) is 1. If $t_j$ is assigned to another VM in the same node as $t_i$ ($t^j_{nv'} = 1$ with $v' \ne v$), the communication slowdown ($Cs_{nv'}$) will be in the range $[0, 1]$. Finally, if $t_j$ is assigned to a VM located in another node ($t^j_{n'v'} = 1$ with $n' \ne n$), the corresponding communication slowdown ($Cs_{n'v'}$) will also be in the range $[0, 1]$. $Cs_{nv'}$ and $Cs_{n'v'}$ should be obtained with respect to $Cs_{nv}$. In other words, $Cs_{nv'}$ and $Cs_{n'v'}$ are the respective reductions (in percentage) in task communication between VMs located in the same node and in different nodes compared with task communication inside the same VM. To sum up, $Cs_{nv} = 1 \ge Cs_{nv'} \ge Cs_{n'v'} \ge 0$.
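The three cases above can be summarized in a small lookup function. This is a minimal sketch: the function and parameter names are hypothetical, and the values 0.2 and 0.1 are simply the slowdowns quoted later for the high communication slowdown experiment.

```python
def communication_slowdown(loc_i, loc_j, cs_same_node, cs_other_node):
    """Return the factor applied to the communication cost of two tasks.

    loc_i, loc_j: (node, vm) pairs hosting each task.
    cs_same_node: slowdown between different VMs of the same node.
    cs_other_node: slowdown between VMs located in different nodes.
    """
    # Enforce the ordering 1 >= same-node slowdown >= other-node slowdown >= 0.
    if not 0.0 <= cs_other_node <= cs_same_node <= 1.0:
        raise ValueError("expected 0 <= cs_other_node <= cs_same_node <= 1")
    if loc_i == loc_j:
        return 1.0            # same VM
    if loc_i[0] == loc_j[0]:
        return cs_same_node   # different VMs, same node
    return cs_other_node      # different nodes


same_vm = communication_slowdown((1, 1), (1, 1), 0.2, 0.1)
same_node = communication_slowdown((1, 1), (1, 3), 0.2, 0.1)
other_node = communication_slowdown((1, 1), (2, 1), 0.2, 0.1)
```

The returned factor multiplies the communication cost of the pair of tasks, so grouping tasks in the same VM keeps the full value while spreading them across nodes keeps the least.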
Regarding task communication, the idea is to add a component to the OF that penalizes (or rewards) the communications performed between different VMs and different nodes. Grouping tasks inside the same VM will depend not only on their respective processing costs ($P^i_{nv}$) but also on the communication costs $C^{ij}_{nv}$ and the communication slowdowns $Cs_{nv}$, $Cs_{nv'}$, and $Cs_{n'v'}$. We reward the communications done in the same VM and, to a lesser extent, the ones done between different VMs in the same node. Finally, communications between nodes are left untouched, without rewarding or penalizing.
In the same way as the OF was modelled as a function of the task heterogeneity in (2), a communication component is added to the OF.

3.6. Choosing SLA or Energy Saving.
It is important to highlight that the optimization goal involves two criteria (SLA and energy in our case). Thus, the user could prioritize the criteria. Assignments are performed starting from the most powerful VM. When this becomes saturated, task assignment continues with the next most powerful VM, regardless of the node it resides in. When this VM resides in another node (as in our case), the energy-saving criterion will be harmed. It would be interesting to provide a means of expressing criteria preferences in the model presented.
In order to highlight specific criteria (i.e., energy saving), one more additional component must be added to the OF. This component must enhance the assignment of tasks to the same node by assigning tasks to the most powerful nodes and not only to the most powerful VMs as before. This is the natural procedure to follow, because the OF is a maximum. Thus, the likelihood of less powerful nodes becoming idle increases and this gives the opportunity to power them off, hence saving energy.
The additional component can be defined in a similar way to the relative computing power ($\Delta_{nv}$) of a $\mathrm{VM}_{nv}$. In this case, we obtain the relative node computing power of a $\mathrm{Node}_n$ ($\Theta_n$) as the normalized sum over the VMs forming it. $\Theta_n$ will inform about the computing power of $\mathrm{Node}_n$.
For $N$ nodes, $\Theta_n$ is formally defined as

$$\Theta_n = \frac{\sum_{v=1}^{V_n} \Delta_{nv}}{\sum_{k=1}^{N} \sum_{l=1}^{V_k} \Delta_{kl}},$$

where $V_n$ is the number of VMs in $\mathrm{Node}_n$ and $\sum_{n=1}^{N} \Theta_n = 1$. To obtain $\Theta_n$, the parallel Linpack version (HPL: high performance Linpack) can be used. It is the one used to benchmark and rank supercomputers for the TOP500 list.
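The relative node computing power can be sketched as follows. This is only an illustration, assuming that the node value is the normalized sum of the benchmark scores of its VMs; the function name and the scores are hypothetical.

```python
def relative_node_power(scores):
    """Aggregate per-VM scores into normalized per-node powers.

    scores: dict mapping (node, vm) -> positive benchmark score.
    Returns a dict node -> share of the total score (shares sum to 1).
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    theta = {}
    for (node, _vm), score in scores.items():
        theta[node] = theta.get(node, 0.0) + score / total
    return theta


# Hypothetical scores: Node 1 has three modest VMs, Node 2 one strong VM.
scores = {(1, 1): 30.0, (1, 2): 20.0, (1, 3): 10.0, (2, 1): 40.0}
theta = relative_node_power(scores)
```

In this hypothetical layout the most powerful VM lives in Node 2, yet Node 1 is the more powerful node overall, which is the distinction the energy component relies on.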
Depending on the importance of the energy saving criterion, a weighting factor should be applied to $\Theta_n$. We simply call this factor Energy ($\Xi$). $\Xi$ will be in the range $(0, 1]$. For $\Xi$ close to 0, our main criterion will be energy saving, and for $\Xi = 1$, our goal is only SLA. Thus, for a given $\mathrm{Node}_n$ with $\Theta_n$, the energy saving criterion of such a node is weighted by a factor that combines $\Theta_n$ and $\Xi$, and this factor is included in the resulting OF.

3.7. Enforcing SLA.
For either prioritized criterion, SLA or energy saving, there is a last consideration to be taken into account. Imagine the case where tasks do not communicate. Once they are assigned to a node, one would expect them to be executed in the minimum time. In this case, there is no longer any need to group tasks in the VMs in decreasing order of power in the same node, because this node is no longer eligible to be switched off. A better solution in this case would be to balance the tasks between the VMs of such a node in order to increase SLA performance. Note that this does not apply to communicating tasks, due to the communication slowdown between VMs. To implement this, we only need to assign every noncommunicating task without taking the relative computing power ($\Delta_{nv}$) of each VM into account.
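We only need to replace $\Delta_{nv}$ in (12) by $\Delta$, defined as

$$\Delta = \begin{cases} \Delta_{nv} & \text{if the tasks communicate,} \\ 1 & \text{otherwise.} \end{cases} \tag{13}$$

For the case of noncommunicating tasks, by assigning $\Delta = 1$, all the VMs have the same relative computing power. Thus, tasks are assigned in a balanced way.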

Model Formulation.
Finally, the OF and its constraints are presented. The best task scheduling assignment to VMs which takes all the features into account (GS policy) is formally defined by the nonlinear programming model (14a)-(14c). Equation (14a) is the objective function (OF) to be maximized. Note that the OF defines an integer and nonlinear problem. The inequality in (14b) and the equality in (14c) are the constraints on the objective function variables. Given the constants $T$ (the total number of requests or tasks) and $M_{nv}$ for each $\mathrm{VM}_{nv}$, the solution that maximizes the OF will provide the values of the variables $t^i_{nv}$, representing the tasks assigned to each $\mathrm{VM}_{nv}$. Thus, the $t^i_{nv}$ obtained will be the assignment found by this model.
The OF takes into account the processing costs ($P^i_{nv}$) and the communication times ($C^{ij}_{nv}$) of the tasks assigned to each $\mathrm{VM}_{nv}$ and the communication slowdowns between VMs ($Cs_{nv'}$) and between nodes ($Cs_{n'v'}$), with $Cs_{nv} = 1$. $\Delta$ is defined in Section 3.7. $E(\sum_{i=1}^{T} t^i_{nv} P^i_{nv}; \alpha, \lambda)$ represents the power slowdown of each VM due to its workload (defined in Section 3.4).
To sum up, for the case when the workload is made up of noncommunicating tasks, if we are interested in prioritizing the SLA criteria, OF 3.5 should be applied. If, on the contrary, the goal is to prioritize energy saving, OF (14a) should be used instead.
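To make the interplay between relative computing power and saturation concrete, the following Python sketch enumerates every assignment of a tiny instance. It is not the model (14a)-(14c): it assumes a simplified objective for noncommunicating tasks in which each VM contributes its workload multiplied by its relative computing power and by the Erlang weight of that workload, it omits the communication and energy components, and it replaces the solver with brute force. All names, processing costs, and relative powers are hypothetical.

```python
import itertools
import math


def erlang_pdf(x, shape, rate):
    """Standard Erlang density for x > 0, evaluated in log space."""
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)


def objective(assignment, costs, delta, shape, rate):
    """Assumed simplified objective: sum over used VMs of load * power * Erlang(load).

    assignment[i] is the (node, vm) pair hosting task i.
    """
    load = {}
    for task, vm in enumerate(assignment):
        load[vm] = load.get(vm, 0.0) + costs[task]
    return sum(w * delta[vm] * erlang_pdf(w, shape, rate)
               for vm, w in load.items())


def best_assignment(costs, delta, shape, rate):
    """Try every task-to-VM mapping (each task placed on exactly one VM)."""
    vms = sorted(delta)
    candidates = itertools.product(vms, repeat=len(costs))
    return max(candidates,
               key=lambda a: objective(a, costs, delta, shape, rate))


# Hypothetical instance: 3 tasks with total cost 7, two nodes (3 VMs + 1 VM).
delta = {(1, 1): 0.3, (1, 2): 0.2, (1, 3): 0.1, (2, 1): 0.4}
costs = [1.0, 2.0, 4.0]

# Optimal workload (shape - 1) / rate = 16: no VM saturates.
unsaturated = best_assignment(costs, delta, shape=17, rate=1.0)
# Optimal workload (shape - 1) / rate = 5: a VM saturates above 5.
saturated = best_assignment(costs, delta, shape=76, rate=15.0)
```

Under these assumptions, the unsaturated case places all three tasks on the most powerful VM, whereas the saturated case keeps a workload of 5 on that VM and moves the remaining task elsewhere. Exhaustive search is only practical for such toy sizes, which is why a solver is used for the actual model.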

Results
In this section, we present the theoretical results obtained from solving the scheduling problems aimed at achieving best task assignment. Two representative experiments were performed in order to test the performance of GS.
The experiments were performed by using the AMPL (A Mathematical Programming Language, http://ampl.com) language and the SCIP (Solving Constraint Integer Programs, http://scip.zib.de) solver. AMPL is an algebraic modeling language for describing and solving high-complexity problems for large-scale mathematical computation, supported by many solvers. Integer and nonlinear problems (our model type) can be solved by SCIP, one of the solvers supported by AMPL.
Throughout all the experimentation, the Erlang arguments were obtained empirically by using the strategy explained in Section 3.4.
As the objective of this section is to prove the correctness of the policy, only a small set of tasks, VMs, and nodes was chosen. The size of the experimental framework was chosen to be as representative of actual cases as possible but, at the same time, simple enough to be used as an illustrative example. So, the experimental framework chosen was made up of 2 different nodes: one of them comprised 3 VMs and the other 1 VM; see Figure 2. The objective of this simulation was to achieve the best assignment for 3 tasks. Table 1 shows the processing cost $P^i_{nv}$, relating the execution times of the tasks in each VM. To show the good behavior of the model presented, each task has the same processing cost independently of the VM. The model presented can be efficiently applied to real cloud environments. The only weak point is that the model is static. That means that workload conditions must remain homogeneous and stable in our model. Job executions with different workload sizes can be saved in a database system, providing a means for determining the SLA of such a job in future executions.

Without Communications.
In this section, a hypothetical situation without communication between tasks is evaluated. In this situation, our scheduling policy tends to assign the tasks to the most powerful set of virtual machines (i.e., those with the highest relative computing power $\Delta_{nv}$), considering their individual saturation in this choice. This saturation becomes critical when more and more tasks are added. Here, the most important term is the Erlang function, since it models the behaviour of every virtual machine. Thus, taking this into account, our scheduler knows the exact weight of tasks it can assign to the VMs in order to obtain the best return times. This phenomenon is observed in the following examples. Table 2 shows the parameters used in the first example: the amount of Memory allocated to each task $t_i$ in every $\mathrm{VM}_{nv}$ (as we supposed this amount to be equal in all the VMs, we simply call it $M^i$), the relative computing power ($\Delta_{nv}$) of each VM, and finally the $\alpha$ and $\lambda$ Erlang arguments. Note that all the VMs have the same Erlang parameters. The parameters were chosen this way so that no VM becomes saturated with the overall workload assigned. In other words, the total processing cost 7 is lower than the optimal Erlang workload 16. Table 3 shows the solver assignment results. The best scheduling assigns all the tasks to the same VM ($\mathrm{VM}_{21}$, the only VM in node 2), because this VM has the biggest relative computing power ($\Delta_{nv}$). This result is coherent. Due to the lack of communications, the model tends to assign tasks to the most powerful VM while its workload does not exceed the Erlang optimum (a workload of 10 tasks). As the total workload in our case is 7, $\mathrm{VM}_{21}$ could host even more tasks.

Without Communications, Low Optimal Erlang, and Preserving SLA.
In this example (see Table 4), the VMs have another Erlang. However, the task processing costs do not change, so they remain the same as in Table 1. In this case, each VM becomes saturated when the assigned workload weight is higher than 5 (because 5 is the optimal workload). The best assignment in this case is the one formed by the minimum set of VMs with the best relative computing power $\Delta_{nv}$ (see Table 5, column Task Assignment SLA). Assigning all the tasks to only one VM as before (even though it is the most powerful one) would degrade the return time excessively, due to its saturation.

Without Communications, Low Optimal Erlang, and Preserving Energy Saving.
Provided that the most important criterion is energy saving, the assignment will be somewhat different (see Table 5, column Task Assignment Energy). In this case, OF (14a) with $\Xi = 1$ was used. Then, as expected, all the tasks were again assigned to $\mathrm{VM}_{21}$ of $\mathrm{Node}_2$.

With Communications.
Starting from the same VM configuration shown in Table 4, in this section we present a more realistic situation where the costs of communications between tasks are also taken into account. It is important to highlight that in some situations, the best choice does not include the most powerful VMs (i.e., those with the highest relative computing power $\Delta_{nv}$). Thus, the results shown in this section illustrate the tradeoff between the relative computing power of the VMs, the workload scheduling impact modeled by the Erlang distribution, and the communication efficiency between tasks.

High Communication Slowdown.
This example shows the behaviour of the model under large communication costs between VMs (see Table 6). This table shows the communication costs between tasks ($C^{ij}$), the communication slowdown when communications are done between VMs in the same node ($Cs_{nv'}$), and the penalty cost when communications are performed between different nodes ($Cs_{n'v'}$). Note that the penalties are very high (0.2 and 0.1), so the communications are very influential. The solver assignment is shown in Table 7. In order to avoid communication costs due to slowdowns, the best assignment tends to group tasks first in the same VM and second in the same node. Although $\mathrm{VM}_{21}$ should become saturated with this assignment, the high communication cost compensates for the loss of SLA performance, allowing us to switch off node 1.

Low Communication Slowdown.
This example shows the behaviour of our policy under more normal communication conditions. Here, the communication penalties between different VMs or nodes are not as significant as in the previous case because $Cs_{nv'}$ and $Cs_{n'v'}$ are higher. Table 8 shows the communication costs between tasks and the penalty cost if communications are performed between VMs in the same node ($Cs_{nv'}$) or between different nodes ($Cs_{n'v'}$).
In this case, the solver returned two hosting VMs (see Table 9): $\mathrm{VM}_{11}$ (with the assigned tasks $t_2$ and $t_3$) and $\mathrm{VM}_{21}$ (with task $t_1$). Due to the small differences between the communication slowdowns, task assignment was distributed between the two nodes.

Moderate Communication Slowdown.
We simulated a more normal situation, where the communication slowdown between nodes is higher than the other ones. Starting from the same example, only $Cs_{n'v'}$ was reduced, to 0.4 (see Table 10). As expected, the resulting solver assignment was different from that in the previous case. Theoretically, this assignment should place tasks on the powerful unsaturated VMs but, as far as possible, on VMs residing in the same node. The solver result was exactly what was expected. Although the most powerful VM is in node 2, the tasks were assigned to the VMs of node 1, because they all fit in the same node. That is why a less powerful node with more capacity, like node 1, is able to host more tasks than node 2, given the communication slowdown between nodes. These results are shown in Table 11.

Conclusions and Future Work
This paper presents a scheduling mechanism for cloud-based systems, called GS, that is able to combine low power consumption with SLA compliance. The complexity of the model was increased step by step, adding more factors to be taken into account. The model was also tested using the AMPL modelling language and the SCIP optimizer. The results obtained proved consistent over a range of scenarios. In all the cases, the experiments showed that all the tasks were assigned to the most powerful subset of virtual machines while keeping the subset size to the minimum. Although our proposals still have to be tested in real scenarios, these preliminary results corroborate their usefulness.
Our efforts are directed towards implementing those strategies in a real cloud environment, like the OpenStack [22] or OpenNebula [23] frameworks.
In the longer term, we consider using some statistical method to find an accurate approximation of the workload using well-suited Erlang distribution functions.