A Two-Tier Energy-Aware Resource Management for Virtualized Cloud Computing System

Electric power accounts for the largest share of the total cost of a data center; thus energy conservation is an important issue in cloud computing systems. One well-known technique to reduce energy consumption is the consolidation of Virtual Machines (VMs). However, consolidation alone may sacrifice some energy savings and Quality of Service (QoS) under dynamic workloads. Dynamic Voltage and Frequency Scaling (DVFS) is an efficient technique for saving energy in dynamic environments. In this paper, we combine consolidation with DVFS and propose a cooperative two-tier energy-aware management method comprising local DVFS control and global VM deployment. The DVFS controller adjusts the frequencies of the homogeneous processors in each server at run-time based on a practical energy prediction. The Global Scheduler assigns VMs to designated servers in cooperation with the local DVFS controller. The evaluation results demonstrate the effectiveness of our two-tier method in saving energy.


Introduction
Cloud computing provides elastic computing resources on a pay-as-you-go basis for most conceivable forms of applications, but it also consumes huge amounts of electric energy. Almost 0.5% of the world's total power usage is consumed by the servers in data centers [1]. Among server components, processors (CPUs) account for the most significant part of the power and have the widest range of dynamically adjustable power, while other components can only be completely or partially turned off [2]. For these reasons, reducing the energy consumption of processors by exploiting the dynamic nature of CPU power has become a hot research topic in cloud computing systems.
To serve more users for more income, service providers prefer to share cluster resources among users. In cloud environments, virtualization is widely adopted to allow users to share the physical resources. Keeping the number of servers that host Virtual Machines (VMs) as small as possible and putting the idle servers into a low-power mode improves resource utilization and reduces energy consumption; this is known as VM consolidation. In each server, more energy can be saved by applying Dynamic Voltage and Frequency Scaling (DVFS), which enables dynamic adjustment of the execution frequency on demand. The dynamic power consumption of a CPU is proportional to the frequency and to the square of the voltage. Scaling down the execution frequency reduces the power, but it may also reduce the performance and increase the execution time, which may instead cause more energy consumption (energy is equal to the integral of power $P$ over time $t$, $E = \int_0^T P \, dt$). On the other hand, real-time tasks in cloud computing systems usually have requirements on execution speed; extending the execution time may violate QoS requirements. Thus, it is nontrivial to reduce energy consumption by scaling the execution frequencies of tasks [3].
VM consolidation can indeed improve resource utilization, and many previous works [4][5][6] achieve significant energy savings in virtualized cloud systems. However, most of them do not take advantage of DVFS. Some others only apply DVFS after allocation while not considering

Related Work
Reducing energy consumption has been a critical issue for data centers in recent years. Many works study energy saving strategies in virtualized environments. Kusic et al. [7] defined dynamic resource management in a virtualized environment as a sequential optimization problem. The optimization, whose objective is to maximize the provider's profit, is solved using Limited Lookahead Control (LLC) by minimizing both energy cost and SLA violations. However, the framework captures the behavior of each application by simulation-based learning, and the complexity of the model makes the approach unsuitable for large-scale data centers.
In [8], the authors studied the dynamic resource provisioning and allocation problem with virtualization for energy-efficient cloud computing. They propose self-managing, energy-aware mechanisms to allocate and migrate Virtual Machines (VMs) according to CPU utilization and energy consumption. The placement problem, which can be seen as a bin packing problem, is solved by Modified Best Fit Decreasing (MBFD). For the migration problem, three policies are proposed to choose the VMs to migrate in order to reduce energy consumption.
Cardosa et al. [9] presented a novel approach for power-efficient VM placement in heterogeneous data centers that leverages the min-max and share features of the VMs, based on DVFS and soft scaling techniques. The power consumption and utilization obtained from the running time of a VM are optimized by being set a priori. However, their approach does not strictly support SLAs, and information about the applications' priorities is needed. Cao and Dong [10] propose an energy-aware heuristic framework for VM consolidation which obtains a better tradeoff between energy saving and performance. An SLA violation decision algorithm is proposed to determine the hosts' status with respect to SLA violation. Based on the hosts' status, the minimum power and maximum utilization policy for VM migration is used to achieve the energy saving.
Reference [11] maximizes the utilization at the virtual machine level in a container environment. The objective of that paper is to set the sizes of virtual machines dynamically in order to improve the utilization of VMs, which reduces overall energy consumption. Experiments show that their method can achieve 7.55% savings in energy consumption compared to scenarios where the virtual machine sizes are fixed. Reference [12] proposes a VM allocation algorithm that uses the historical record of VM usage to reduce energy consumption and SLA violations.
Some other works mainly focus on DVFS strategies to decrease the processors' power consumption in hosts. Some of them periodically adjust the frequency according to the performance of the server. Reference [13] monitors the utilization of processors periodically, and the frequency is decreased very carefully when there are observable impacts on the execution time of tasks. Hsu and Feng [14] proposed a β-adaptation algorithm that periodically evaluates the performance and automatically adapts the frequency and voltage at run-time. Reference [15] also developed a periodic DVFS controller for multicore processors without using any performance model. However, the length of the period has a great impact on the performance of these algorithms, so it should be chosen very carefully.
Scaling the frequency according to the type of workload is another efficient way to carry out DVFS control. Such methods save energy with little or limited performance loss by decreasing the frequency during communication, data access, memory access, or idle phases. Lim et al. [16] proposed a run-time scheduler that applies DVFS control during the communication phases, which are identified by intercepting MPI calls. In [17], the authors presented a novel algorithm that exploits opportunities in the execution of hybrid MPI/OpenMP applications to scale the frequency and reduce energy consumption. Tan et al. implemented a DVFS scheduling strategy for data-intensive applications in [18] and achieved energy savings. Their strategy adaptively sets a suitable frequency according to the percentage of CPU-bound time in the total execution time of workloads and is implemented at the source code level.
DVFS is able to reduce energy consumption, but it is limited to a single server. Many works have developed DVFS-based task scheduling among servers, because the distribution of workloads influences the overall energy. References [19,20] propose similar energy-aware strategies that schedule a set of tasks onto physical machines. They adjust the supply voltage by utilizing the slack time of noncritical jobs. Reference [19] also discussed the tradeoff between energy consumption and schedule length. Khan and Ahmad [21] studied the problem of task allocation in grids and used cooperative game theory to minimize the energy consumption and makespan of tasks for DVFS-based clusters. Similar to [21], Mezmaz et al. studied the problem for dependent, precedence-constrained parallel applications [22]. Unlike these works, we study independent real-time services with deadline constraints in multiprocessor systems.
References [23][24][25][26][27] studied energy-efficient task scheduling for real-time systems. Luo and Jha studied the scheduling of periodic tasks in heterogeneous systems and gave a power-efficient solution [27]. In [24], the authors proposed an energy-aware task partitioning algorithm with polynomial time complexity for DVFS-based heterogeneous systems. Awan and Petters proposed a two-phase energy-aware task partitioning method that uses a realistic power model to estimate power consumption [23]. Our task allocation algorithm cooperates with the local DVFS controller to predict the energy consumption in different situations; the influence of frequency scaling on energy consumption is taken into account before allocation in order to save more energy.

Overview
Our framework accepts and analyzes arriving workloads, packages them into Virtual Machines (VMs), and allocates them to suitable servers to reduce energy consumption. We first describe the architecture of our solution in Figure 1 and subsequently introduce the real-time analysis in this section [3]. In our solution, the Global Scheduler assigns a task to a VM to execute it and guarantees its QoS requirement. This VM will be allocated to a host that can load it without causing any violation of QoS requirements and that brings the minimum energy consumption. Our objective is to find the allocation method for VMs and the frequency scaling method for tasks that reduce the energy consumption.

Figure 1: System architecture of our solution to the energy-aware resource management (global tier and local tier).
Definition 1 (host model). Let $host_j = (U_j, F_j)$ denote the resources of the jth host, where $U_j$ and $F_j$ are vectors that record the utilizations and frequencies of each processor.
The Task Analyzer in the Dispatcher receives and analyzes the information of an incoming task and sends it to other components when necessary. The Host Monitor is an assistant component which connects to each server and gathers the basic information of the servers. The Local Monitor monitors the resources of a server and sends the basic information to the Host Monitor when necessary. The basic information of servers is recorded in the Host Model (Definition 1). We mainly focus on the processor resource, so we only record the states of processors in the Host Model. The main mechanism of our solution for scheduling a new task request $task_i$ is described as follows:
(1) When $task_i$ arrives, the Task Analyzer analyzes the basic information of $task_i$ and sends it to the Host Monitor and the Global Scheduler (Section 6).
(2) When the Host Monitor receives the information of $task_i$, it selects a set of candidate servers that can load $task_i$ according to the basic information of the servers and sends the set to the Global Scheduler. In a large data center, the number of candidates can be carefully chosen to improve the effectiveness of allocation.
(3) When the Global Scheduler receives the candidates and the basic information of $task_i$, it sends the task information to the servers in the candidate set.
(4) When a candidate receives the task information, its local DVFS controller (Section 5) runs to estimate, according to the monitored information, the minimum energy change if $task_i$ is allocated to one of its VMs. The controller then returns the result to the Global Scheduler.
(5) When the Global Scheduler receives responses from all the candidates, it allocates $task_i$ to the best server using our allocation algorithm. Communication may suffer from network errors such as packet errors, packet loss, or high network delay. We can set thresholds for the Global Scheduler, for example, on the response time or on the number of retries. When the response time or the number of retries of a candidate exceeds the threshold, the Global Scheduler can discard this candidate.
This is a simple architecture for energy-aware task scheduling, and some implementation details and optimizations are not discussed in this paper. We mainly focus on energy-aware scheduling of tasks and provide a solution to this problem. Problems such as single points of failure and network errors are also important for a distributed cloud system. We consider that these problems have mature solutions in today's cloud systems and that they are unlikely to be an obstacle to our solution.

Task Model
Service requests in a cloud computing system usually have deadline constraints, which are the major aspect of Service Level Agreements (SLAs). We explore an energy saving method for a cluster that accepts requests for tasks. We define the task model (Definition 2) to describe the request for a task. The task model records important information that users provide: two of its fields describe the requirements of a task, and the other two describe its execution characteristics.
For the isolation, scalability, and stability of the system, tasks in a cloud computing system are usually run independently in VMs. We can regard each task as a VM, so the allocation of tasks is, to some degree, equivalent to the allocation of VMs. In our model, we assign a task to a VM to run, and the VM is allocated to an appropriate host. When a task finishes, the VM that loads this task is shut off or put to sleep. The living time of a VM that runs a task is equal to the execution time of this task. Therefore, the living time of the VM should not exceed the deadline of the task. Let $VM_i$ represent the virtual machine that loads $task_i$.
We designed the Task Analyzer to accept and analyze incoming task requests. It sends the basic information of tasks to other components after preprocessing. The living time of a task (i.e., VM) usually includes computing time and CPU idle time. The CPU idle time may consist of communication, memory access, or disk access. The real-time analysis we designed distinguishes the computing time from the CPU idle time. The average utilization of a VM reflects, to some degree, the computing time and the CPU idle time. Let $T^{c}_i(f)$ and $T^{idle}_i$ represent the computing time at frequency $f$ and the idle time of a VM, respectively. We estimate the computing time as $T^{c}_i(f_{max}) = e_i \cdot u_i$ and the idle time as $T^{idle}_i = e_i \cdot (1 - u_i)$ for the ith task, where $e_i$ and $u_i$ denote the execution time and the average utilization recorded in the task model. The Task Analyzer calculates $T^{c}_i(f_{max})$ and $T^{idle}_i$ and sends this information to the other modules.
The computing time is tightly related to the CPU frequency: it extends linearly as the frequency is reduced [28,29], while the idle time of a task barely changes under frequency scaling. Therefore, the living time of the VM of the ith task at frequency $f$ can be expressed as

$$T_i(f) = T^{c}_i(f_{max}) \cdot \frac{f_{max}}{f} + T^{idle}_i, \quad (1)$$

where $T^{c}_i(f_{max})$ is the computing time at the maximum frequency and $T^{idle}_i$ is the idle time. When the local DVFS controller predicts the energy consumption at different frequencies, the living time of VMs can be calculated by (1) according to the task information provided by the Task Analyzer. Although the running time can be predicted under different frequencies, energy prediction and DVFS control are not easy tasks. We introduce the details of our method for solving them in the next sections.
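The living-time estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function and parameter names (`split_task_time`, `living_time`, `exec_time_at_fmax`, `utilization`) are hypothetical, and the sketch assumes that the computing part scales with `f_max / f` while the idle part stays constant.

```python
def split_task_time(exec_time_at_fmax, utilization):
    """Split a task's execution time (measured at f_max) into computing and idle time.

    Assumption: the average utilization is the fraction of time spent computing.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    computing = exec_time_at_fmax * utilization
    idle = exec_time_at_fmax * (1.0 - utilization)
    return computing, idle


def living_time(exec_time_at_fmax, utilization, f, f_max):
    """Predicted living time of a VM when its processor runs at frequency f.

    Computing time is stretched by f_max / f; idle time is unaffected.
    """
    if not 0.0 < f <= f_max:
        raise ValueError("f must be in (0, f_max]")
    computing, idle = split_task_time(exec_time_at_fmax, utilization)
    return computing * (f_max / f) + idle


if __name__ == "__main__":
    # Hypothetical task: 100 s at f_max with 60% average utilization.
    for freq in (2.39, 1.6, 1.195):
        print(freq, living_time(100.0, 0.6, freq, 2.39))
```

A deadline check then reduces to comparing `living_time(...)` against the task's deadline for each candidate frequency.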

Local DVFS Controller
The DVFS Controller plays an important role in our framework and has two main functions. On the one hand, it predicts the energy consumption of a multiprocessor server according to the processors' utilizations and frequencies and the living time of VMs. On the other hand, based on the energy prediction, it runs the k-Phase energy Prediction (kPP) algorithm to find the frequency combinations that bring the minimum energy consumption.

Energy Prediction for Multiprocessor Servers.
The electric energy consumption is the integral of the active power with respect to time. Therefore, the power prediction of a server is crucial. Previous works such as [30][31][32][33][34] provided several methods to estimate the power of a server. However, they only focused on the frequency-power or the utilization-power relationship, and detailed power prediction for multiprocessor platforms was ignored. In this paper, we provide a practical power prediction for multiprocessor servers based on both the frequency-power and the utilization-power relationships. To predict the power consumption, we use the fact that homogeneous processors consume the same power when they are under the same conditions.
The power consumption of a server consists of two parts: static and dynamic power consumption. The static part includes the power consumption of the main board, hard disk, fan, and so forth. The CPU accounts for the largest part of the dynamic power. According to previous studies, the dynamic power consumption of a CPU is proportional to the frequency and to the square of the voltage [34], which can be expressed as

$$P_{dynamic} = A \cdot C \cdot V^2 \cdot f, \quad (2)$$

where $A$ is the percentage of active gates, $C$ is the total capacitance, $V$ is the supply voltage, and $f$ is the operating frequency.
According to [31], the voltage has a linear relationship to the frequency, so the dynamic power of a processor can be reduced to a function of frequency: $P_{dynamic} = \alpha \times f^3$, where $\alpha$ is a proportional coefficient. Processors also have static power when they are active. Let $P_s$ represent the static power of a server and $P^{CPU}_s$ represent the static power of a processor. The power of a host in which all homogeneous processors work at the same frequency $f$ with full utilization can be expressed as follows:

$$P(f) = P_s + N_c \left( P^{CPU}_s + \alpha f^3 \right), \quad (3)$$

where $N_c$ is the number of CPUs. We want to eliminate the static power of processors, which is not easy to measure. For a given host, we can easily measure its maximum power, which is $P_{max} = P_s + N_c ( P^{CPU}_s + \alpha f^3_{max} )$. Subtracting (3) from this expression, we can estimate the power consumption of a host in which all processors work at the same frequency $f$:

$$P(f) = P_{max} - N_c \, \alpha \left( f^3_{max} - f^3 \right). \quad (4)$$

The power consumption is also related to utilization. Figure 2(a) shows the power consumption with only one processor running, and Figure 2(b) shows the power of two processors that work at the same utilization and frequency. As we can see, the power differs with utilization under the same frequency. The power and the utilization present a linear relationship, which agrees with [30,32,33].
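The uniform-frequency estimate can be illustrated with a short Python sketch. It assumes the cubic frequency-power relationship described above and uses the R710 figures reported later in the evaluation (maximum power 192 W at 2.39 GHz, two processors, coefficient 2.33135) as example inputs. The function name `host_power_uniform` and the 1.6 GHz operating point are hypothetical choices made only for illustration.

```python
def host_power_uniform(f, f_max, p_max, n_cpus, alpha):
    """Estimate host power (W) when all CPUs run fully utilized at frequency f (GHz).

    Uses P(f) = P_max - N_c * alpha * (f_max^3 - f^3), which avoids having to
    measure the per-processor static power.
    """
    if not 0.0 < f <= f_max:
        raise ValueError("f must be in (0, f_max]")
    return p_max - n_cpus * alpha * (f_max ** 3 - f ** 3)


# Example inputs taken from the evaluation section (Dell R710).
P_MAX_W = 192.0
F_MAX_GHZ = 2.39
N_CPUS = 2
ALPHA = 2.33135

if __name__ == "__main__":
    for freq in (2.39, 1.6):  # 1.6 GHz is an arbitrary illustrative level
        print(freq, host_power_uniform(freq, F_MAX_GHZ, P_MAX_W, N_CPUS, ALPHA))
```

At `f = f_max` the estimate reduces to the measured maximum power, which is a quick sanity check on the calibration.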
Therefore, the power consumption of one homogeneous CPU with frequency $f$ and utilization $u$ can be denoted as in (5). Finally, the predicted power of a homogeneous multiprocessor server can be expressed as in (6). We can view the power of the jth host as a function of utilizations and frequencies, written as $P_{host}(U_j, F_j)$, where $U_j$ and $F_j$ are defined in the Host Model. If we set consistent frequencies for all processors in an FSU, the power state in this FSU is relatively stable because the workloads in this FSU are fixed. We know the length of this FSU, so the energy consumption in an FSU can be predicted conveniently and precisely by the equation $E = P \times T$, where $P$ and $T$ represent power and time, respectively. Based on the definition of FSU, the power function, and the related notations in Notations, the energy consumption of the jth host to finish all the VMs can be predicted as in (7), which accumulates the predicted energy of every FSU; the power of the host in each FSU can be calculated by (6) in different situations. The length of an FSU can also be estimated under different frequencies by (1).
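The per-FSU energy accumulation can be sketched as follows. This is only an illustration under stated assumptions: each FSU is represented by a hypothetical tuple of per-CPU utilizations, per-CPU frequencies, and length in seconds, and `linear_cubic_power` is an assumed stand-in for the host power function (linear in utilization, cubic in frequency), not the paper's calibrated model. Any other power function can be passed in.

```python
def linear_cubic_power(utils, freqs, p_static=110.0, alpha=2.33135):
    """ASSUMED host power model (W): static power plus, for each CPU,
    utilization * alpha * f^3. A simplified stand-in, not the paper's equation."""
    return p_static + sum(u * alpha * f ** 3 for u, f in zip(utils, freqs))


def predict_energy(fsus, power_fn=linear_cubic_power):
    """Sum E = P * T over all FSUs.

    fsus: iterable of (utilizations, frequencies, length_seconds) tuples, where
    power is assumed constant inside each FSU.
    """
    return sum(power_fn(utils, freqs) * length for utils, freqs, length in fsus)


if __name__ == "__main__":
    # Two hypothetical FSUs on a 2-CPU host.
    plan = [
        ([1.0, 1.0], [2.39, 2.39], 10.0),
        ([0.5, 0.5], [1.60, 1.60], 20.0),
    ]
    print(predict_energy(plan), "J")
```

Because the FSU lengths themselves depend on the chosen frequencies, a caller would typically recompute the lengths for each candidate frequency plan before calling `predict_energy`.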

kPP Algorithm for Energy Minimization.
According to the analysis of energy prediction, if we set consistent frequencies in an FSU, we can predict the energy consumption in that FSU conveniently. If we set FSUs with different frequencies, the living time of VMs and the power state of the server will differ, which brings different energy consumption. There is an optimal solution that consumes the minimum energy when all VMs on this server have finished. We want to find the frequency combinations for all FSUs that bring the minimum energy on the premise of ensuring the requirements of VMs.

A state $s$ records the frequency combination of every FSU. If the frequency combinations of one FSU in two states $s_1$ and $s_2$ are different while the others are the same, then $s_1$ and $s_2$ are neighbors. In addition, we define $cost(s)$ as the cost function that gives the total energy consumption according to (7) if we scale the frequencies as in $s$ in each FSU. Let a node represent a state, and let an edge $(u, v)$ between two nodes represent the neighborhood between $u$ and $v$. The minimization problem is to find, starting from the initial node in the graph, the "optimal" node that brings the minimum energy without any violation of SLAs.

Lemma 4. Let the initial node represent the state in which all processors' frequencies are highest in all FSUs. If the initial state ensures the SLAs for all VMs, there is a path from the initial node to the optimal node with minimum energy consumption without any violation of SLAs.
Proof. Let $s_{opt} = (c_{opt,1}, c_{opt,2}, \ldots, c_{opt,|V_j|})$ represent the optimal state with minimum energy consumption without any violation of SLAs. If all processors' frequencies are highest in all FSUs of $s_{opt}$, the initial state is the optimal state. Otherwise, we select an FSU $k$ in which the frequencies are not highest for all CPUs. If we scale the frequencies to the highest in $k$, the new state $s'$ also ensures the SLAs for all VMs, because the processors work at higher frequencies, which leads to shorter execution times. $s'$ is one of the neighbors of the optimal state, which means that $s'$ can also move to $s_{opt}$. Repeating the process above, we can find a path from $s_{opt}$ to the initial state $s_0$, which means that there is a path from $s_0$ to $s_{opt}$.

The search space, however, is too large to enumerate if there are many VMs. Therefore, we provide two heuristic algorithms to search for the "optimal" solutions, which are based on simulated annealing (SA) [35] and variable depth search (VDS) [36], respectively.

Input: The state of $host_j$; Output: The possible state $s$;
(1) $V_j = host_j$.getVM(), $F = host_j$.getFreqSpace()
(2) set $s_0$ be the state that the frequency is max for each CPU in all FSU
(3) $s = s_0$, $E = E_{max} = cost(s_0)$, $i = 0$, $k = |V_j|$
(4) while $i < i_{max}$ or $s$ doesn't change in $r$ rounds do
(5) $s' = s$, $x$ = random($k$)
(6) $c_x$ = random($F - s'$.get($x$)) /* Select a neighbor */
(7) $s'$.set($x$, $c_x$) /* Change frequencies combination of FSU $x$ to $c_x$ */
(8) for $v = 1$ to $k$ do
(9) if $VM_v$ violates SLAs according to $s'$ then
(10) go to (17)
(11) end if
(12)

(1) Simulated Annealing Based Heuristic Algorithm. By comparing the energy consumption of a random neighbor, we can find a better state that brings less energy. If we repeat the process many times, we may find the optimal state. Let $s_0$ represent the initial state in Lemma 4. Since the simulated annealing (SA) algorithm has been proved to converge to the optimum with probability 1, our algorithm can be expected to output good results given enough iterations. If we know the frequency steps and the tasks' information, the living time of VMs is determined. Therefore, we can estimate the total energy of the state $s_0$ (line 3) using the energy cost function $cost(\cdot)$. The algorithm runs $i_{max}$ iterations to find a state in which less energy is consumed than in the initial state. In each iteration, the algorithm randomly selects an FSU $x$ in which to change the frequencies of processors and generates a new state $s'$. This step takes $O(1)$ time. If the random neighbor $s'$ violates the SLAs of any VM, the state is discarded and our algorithm enters the next iteration. This step takes $O(N_{v,j})$ time, where $N_{v,j}$ is the number of tasks. Otherwise, if the predicted energy $E'$ is less than $E$, $s'$ is selected as the compared state for the next iteration because it consumes less energy. The energy prediction takes $O(N_{v,j} \cdot N_c)$ time according to (7), where $N_c$ is the number of processors. Besides, the algorithm also changes the state from $s$ to $s'$ with the probability $\exp(-(E' - E)/T)$ suggested by Metropolis et al. [37], which makes it possible to escape local minima and find the optimal solution. The details of the simulated annealing based kPP algorithm are presented in Algorithm 1. The time complexity of the SA-based kPP algorithm is $O(N_{v,j} \cdot N_c \cdot i_{max})$.
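The simulated annealing search described above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' Algorithm 1: a state holds a single frequency per FSU (rather than a per-CPU combination), the energy cost and the SLA check are supplied by the caller as hypothetical `cost` and `feasible` callables, and the geometric cooling schedule (`temp`, `cooling`) is an assumption because the source does not specify one.

```python
import math
import random


def sa_kpp(freq_levels, num_fsus, cost, feasible,
           i_max=1000, temp=1.0, cooling=0.995, seed=0):
    """Simulated-annealing search over per-FSU frequencies.

    Starts from the all-max state, moves to random neighbors that differ in one
    FSU, skips neighbors that violate SLAs, and accepts worse neighbors with
    probability exp(-(E' - E) / T). Returns the best feasible state seen.
    """
    rng = random.Random(seed)
    f_top = max(freq_levels)
    current = tuple([f_top] * num_fsus)
    if not feasible(current):
        raise ValueError("the all-max initial state must satisfy the SLAs")
    e_cur = cost(current)
    best, e_best = current, e_cur
    for _ in range(i_max):
        x = rng.randrange(num_fsus)                       # pick one FSU
        choices = [f for f in freq_levels if f != current[x]]
        if not choices:
            break                                         # only one level exists
        cand = current[:x] + (rng.choice(choices),) + current[x + 1:]
        if feasible(cand):                                # discard SLA violations
            e_new = cost(cand)
            if e_new < e_cur or rng.random() < math.exp(-(e_new - e_cur) / temp):
                current, e_cur = cand, e_new
                if e_cur < e_best:
                    best, e_best = current, e_cur
        temp = max(temp * cooling, 1e-9)                  # assumed cooling schedule
    return best, e_best


if __name__ == "__main__":
    levels = [1.0, 1.5, 2.0]                              # hypothetical levels (GHz)
    toy_cost = lambda s: sum(f ** 2 for f in s)           # toy energy proxy
    toy_ok = lambda s: sum(1.0 / f for f in s) <= 2.5     # toy deadline constraint
    print(sa_kpp(levels, 3, toy_cost, toy_ok))
```

Tracking the best feasible state separately from the current state is a design choice of this sketch; it guarantees that the returned energy is never worse than that of the initial state.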
(2) Variable Depth Search Based Heuristic Algorithm. The VDS-based kPP algorithm selects the state that brings the minimum energy in a subset of neighbors and compares it to the current state. If the selected state consumes less energy on the premise of ensuring the SLAs of VMs, we change the state to it. The initial state of the VDS-based algorithm is the same as that of the SA-based algorithm. The algorithm selects a subset of neighbors whose frequency combinations of FSU $x$ are different (lines 5-6). The frequency combination of FSU $x$ with minimum energy is selected (line 7) and generates a new state. The energy prediction takes $O(N_{v,j} \cdot N_c)$ time, so the selection of the state with minimum energy takes $O(|F'| \cdot N_{v,j} \cdot N_c)$ time, where $|F'|$ is the size of the subset. The algorithm then checks the new state for violations of SLAs (lines 9-13). The process repeats $i_{max}$ times or until the state $s$ does not change in $r$ iterations. Therefore, the time complexity of this algorithm is $O(|F'| \cdot N_{v,j} \cdot N_c \cdot i_{max})$. The effectiveness of the variable depth search is proved in [36]. The details of the VDS-based kPP algorithm are shown in Algorithm 2.
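A compact Python sketch of the subset-based search described above is given below. As with any illustration of this kind, it rests on assumptions: one frequency per FSU, caller-supplied `cost` and `feasible` callables, and hypothetical parameter names (`subset_size`, `patience`) for the subset size and the number of unchanged rounds that stops the search.

```python
import random


def vds_kpp(freq_levels, num_fsus, cost, feasible,
            i_max=200, subset_size=2, patience=20, seed=0):
    """Subset-based neighborhood search over per-FSU frequencies.

    In each round: pick a random FSU, sample a subset of alternative levels,
    take the lowest-energy candidate, then keep it only if it satisfies the
    SLAs and improves on the current energy.
    """
    rng = random.Random(seed)
    f_top = max(freq_levels)
    current = tuple([f_top] * num_fsus)
    e_cur = cost(current)
    unchanged = 0
    for _ in range(i_max):
        if unchanged >= patience:                 # state stable for `patience` rounds
            break
        unchanged += 1
        x = rng.randrange(num_fsus)
        pool = [f for f in freq_levels if f != current[x]]
        subset = rng.sample(pool, min(subset_size, len(pool)))
        if not subset:
            continue
        candidates = [current[:x] + (f,) + current[x + 1:] for f in subset]
        cand = min(candidates, key=cost)          # minimum-energy neighbor in subset
        e_new = cost(cand)
        if feasible(cand) and e_new < e_cur:      # SLA check after selection
            current, e_cur = cand, e_new
            unchanged = 0
    return current, e_cur


if __name__ == "__main__":
    levels = [1.0, 1.5, 2.0]                              # hypothetical levels (GHz)
    toy_cost = lambda s: sum(f ** 2 for f in s)           # toy energy proxy
    toy_ok = lambda s: sum(1.0 / f for f in s) <= 2.5     # toy deadline constraint
    print(vds_kpp(levels, 3, toy_cost, toy_ok))
```

Each round evaluates the cost of every sampled candidate, which is why a round of this search is more expensive than a round of the annealing variant but tends to make more progress per iteration.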
The local DVFS controller runs the kPP algorithm when the Global Scheduler asks it to predict the minimum energy consumption, and it returns the result to the Global Scheduler. This is one of the occasions on which the kPP algorithm runs. When the workload changes, the power state changes. In addition, the estimated execution time of VMs may contain errors, which may lead to errors in the energy prediction. Therefore, we also apply frequency scaling when a VM finishes, and we scale the frequency only for the first FSU; that is, the algorithm predicts the frequency combinations for k FSUs but sets the CPUs' frequencies only for the first FSU. As the example in Figure 3 shows, if $VM_4$ arrives at time $t_1$ and is allocated to the host, this host applies the kPP algorithm at that time and sets the CPUs' frequencies as in FSU 1 of the result. When $VM_1$ finishes at $t_2$, the kPP algorithm runs again to obtain the "optimal" state $s$ and scales the frequencies according to the result. Because the kPP algorithm is invoked at run-time, its iterations should complete in a short time so that the local tier can scale the frequencies in time.

Input: The state of $host_j$; Output: The possible state $s$;
(1) $V_j = host_j$.getVM(), $F = host_j$.getFreqSpace()
(2) set $s_0$ be the state that the frequency is max for each CPU in all FSU
(3) $s = s_0$, $i = 0$, $k = |V_j|$
(4) while $i < i_{max}$ or $s$ doesn't change in $r$ rounds do
(5) $s' = s$, $x$ = random($k$)
(6) randomly select a subset $F' \subseteq F$
(7) $c$ = argmin($cost$($s'$.set($x$, $c'$))), for all $c' \in F'$ /* Select the state from the subset with minimum energy consumption */
(8) $s'$.set($x$, $c$) /* Change frequencies combination of FSU $x$ to $c$ */
(9) for $v = 1$ to $k$ do
(10) if $VM_v$ violates SLAs according to $s'$ then
(11) go to (15)
(12) end if
(13)

Global Scheduler
Different allocations of a new VM affect the overall energy consumption, because the new VM brings different energy consumption when executed on different servers. We want to find a scheduling that minimizes the energy consumed to finish all the VMs. If we ask each host to predict its minimum energy consumption, we obtain the energy consumption of each possible allocation and can select a better allocation scheme. Our goal is to minimize the overall energy cost of the whole cluster for finishing all VMs, including the new VM $VM_i$. To solve the energy minimization problem of VM scheduling, we first formalize the problem. Let $H$ denote the set of hosts. The energy minimization problem of VM scheduling is to find the server that brings the minimum energy of the whole cluster if $VM_i$ is allocated to it. The minimum energy consumption of each server can be predicted by the kPP algorithm and is represented by $EMIN_j$ for the jth host. If an incoming VM is allocated to the jth host, the value of $EMIN_j$ changes while the minimum energy consumption of the other hosts does not change. When $VM_i$ is allocated to the yth host, the energy consumption becomes $EMIN'_y + \sum_{j \in H - \{y\}} EMIN_j$, where $EMIN'_y$ is the minimum energy cost if $VM_i$ is allocated to the yth host. We have

$$EMIN'_y + \sum_{j \in H - \{y\}} EMIN_j = \Delta EMIN_y + \sum_{j \in H} EMIN_j, \quad \Delta EMIN_y = EMIN'_y - EMIN_y.$$

Since the sum on the right-hand side does not depend on $y$, we can select the host that brings the minimum energy change $\Delta EMIN_y$ to run the incoming VM. We call the scheduling algorithm Minimum energy Change (MC), shown in Algorithm 3. The Global Scheduler sends the analyzed information of a VM to a subset of hosts (line 1), and each host returns its predicted minimum energy change. Therefore, we can run the energy-efficient algorithm in parallel to obtain the minimum energy consumption for each host when a VM arrives. After the local DVFS controller predicts the minimum energy change, it also records the best processor to hold this VM. When this VM is actually allocated to the host, the VM is scheduled onto this best processor. Once the host is decided, the selected host starts a VM to run the task under the selected frequencies. The time complexity is $O(|C| + T_p + T_c)$, where $C$ is the candidate set and $T_p$ and $T_c$ represent the time complexity of the local prediction algorithm and the communication time, respectively.

Input: A new VM $VM_i$; Output: Designated host and processor for loading $VM_i$;
(1) Host Monitor selects a subset $C$ of active hosts that can load the VM
(2) notify the information of $VM_i$ to all candidates
(3) each host estimates the minimum energy change $\Delta EMIN_j$ if the VM is allocated to processor $p$ of $host_j$
(4) $host_y = \arg\min_{j \in C} (\Delta EMIN_j)$
(5) return $host_y$
Algorithm 3: Minimum energy change allocation.
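The selection step of the minimum energy change allocation can be sketched in Python as below. The sketch assumes the Global Scheduler has already collected one predicted energy change per candidate host, and that a candidate whose response exceeded the time or retry threshold is recorded as `None`. The function name and the dictionary layout are hypothetical.

```python
def select_min_energy_change(delta_emin):
    """Return the id of the candidate host with the smallest predicted energy change.

    delta_emin: dict mapping host id -> predicted energy change in joules,
    or None if the candidate was discarded (timeout or too many retries).
    Returns None when no candidate responded.
    """
    valid = {host: delta for host, delta in delta_emin.items() if delta is not None}
    if not valid:
        return None
    return min(valid, key=valid.get)


if __name__ == "__main__":
    # Hypothetical responses from three candidate hosts.
    responses = {"host-1": 1250.0, "host-2": 980.5, "host-3": None}
    print(select_min_energy_change(responses))
```

Because each candidate computes its own estimate, the scheduler only pays for the slowest response plus this linear scan over the candidates.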
Because the number of candidate hosts affects the total cost of a cluster, we evaluate the influence of the kPP strategy on the total energy cost. Assume that the total number of VMs is $N_v$, the size of the subset of candidate hosts is $N_h$, and the average run-time of the local DVFS algorithm is 0.5 seconds. Let the mean power consumed by a VM be $P$ Watt and the average length of VMs be $L$ seconds. The kPP algorithm runs when the MC algorithm asks candidate hosts to estimate energy consumption; the energy consumption of the kPP algorithm in this part is $N_h \cdot P \cdot N_v \cdot 0.5$. The kPP algorithm also runs when a VM is allocated and when it finishes, so the energy for this part is $2 \cdot N_v \cdot P \cdot 0.5$, and the total energy produced by all VMs is $N_v \cdot P \cdot L$. Therefore, the energy consumption ratio (ECR) of the kPP algorithm compared to the total energy cost is

$$ECR = \frac{N_h \cdot P \cdot N_v \cdot 0.5 + 2 \cdot N_v \cdot P \cdot 0.5}{N_v \cdot P \cdot L} = \frac{N_h + 2}{2L}.$$

In a large-scale data center, the mean VM length can be obtained from historical data, and we can carefully choose the number of candidate hosts that estimate the energy change of loading a new VM so that the energy overhead of the kPP algorithm stays as small as possible.
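The overhead ratio above depends only on the number of candidate hosts, the mean VM length, and the kPP run-time, because the VM count and the mean power cancel out. A minimal Python sketch, with hypothetical function and parameter names and illustrative inputs that are not taken from the experiments, makes this easy to explore:

```python
def energy_consumption_ratio(n_candidates, mean_vm_length_s, kpp_runtime_s=0.5):
    """Ratio of kPP energy overhead to the total energy of all VMs.

    Per VM, kPP runs once on each candidate host (prediction) plus twice on the
    chosen host (on allocation and on completion), each for kpp_runtime_s
    seconds, against a mean VM length of mean_vm_length_s seconds.
    """
    if mean_vm_length_s <= 0:
        raise ValueError("mean VM length must be positive")
    return (n_candidates + 2) * kpp_runtime_s / mean_vm_length_s


if __name__ == "__main__":
    # Illustrative values only: 3 to 50 candidates, VMs averaging 600 s.
    for n_h in (3, 10, 50):
        print(n_h, energy_consumption_ratio(n_h, 600.0))
```

The ratio grows linearly with the candidate count, which is why limiting the candidate set matters more for short-lived VMs.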

Evaluating Power Prediction.
The energy prediction of a server depends on the accuracy of the power prediction under different utilizations and frequencies. We have evaluated the multiprocessor power prediction method by comparing the real power consumption with the estimate of the power model in different states for a specific host. The real experimental environment is shown in Figure 4. The details of the R710 server used in our paper are shown in Table 1. We explore the real power consumption of the R710 and use the first seven frequency steps when we evaluate the power model, because the last frequency is very close to 2.39 GHz. The power consumption of the host when both processors are fully utilized at frequency level 2.39 GHz is 192 Watt, and the static power when the system is not idle is 110 Watt. The proportional coefficient $\alpha = 2.33135$ is obtained and calibrated by offline experiments.
To evaluate the multiprocessor power prediction, we randomly select some frequencies and utilizations for the two processors and use the power model to estimate the power consumption. At the same time, we measure the real power consumption of the R710 server with the same processor frequencies and utilizations; the results are shown in Table 2. We use the Aitek AWE 2101 power analyzer to measure power. Table 2 shows that the estimated power is very close to the real power consumption of the server with the same utilizations and frequencies of the different processors.

Convergence Speed.
In this subsection, we compare the convergence speeds of the two algorithms and present them in Tables 3 and 4. The reported run-times and iteration counts are for running the two algorithms on synthetic 2-processor and 4-processor machines with 7 frequency levels, where 12 and 24 VMs are executed in parallel. RT and EC in Tables 3 and 4 represent running time and energy consumption, respectively. We can draw three conclusions: (1) the SA-based algorithm iterates significantly faster than the VDS-based algorithm, which means that more iterations can be executed during the same period; (2) for the same number of iterations, the VDS-based algorithm outperforms the SA-based algorithm if the host is equipped with several processors; (3) when the number of processors and frequency levels is relatively small, both algorithms converge rapidly and obtain close results. The VDS-based algorithm also performs better than the SA-based algorithm when the number of processors is small. This conclusion suggests that the service provider may prefer the SA-based algorithm if they persist in finding the best frequency configurations. Notice that, in this experiment, we use a very extreme setup in which a 4-processor host is forced to run as many as 24 VMs at the same time, which means a processor must be responsible for 6 VMs on average. In real data centers, the average number of VMs on a single processor can be expected to be far less than 6. Thus our algorithm can run efficiently enough to serve our online VM scheduling algorithm and obtain, within 1 second, an acceptable result that is close to the results of more iterations.

Experiments in Real Environment.
Our real experimental environment has three servers and a controller running on a virtual machine. The power is measured by an Aitek AWE2101 power analyzer. Each R720 server, whose details are shown in Table 1, runs the kPP algorithm to predict energy and control the processors' speed. We combine the kPP algorithm with random (Ran) and first-fit (FF) VM scheduling. The random scheme allocates an incoming task to a processor chosen at random from the subset of processors that can take on the new task without violating any QoS requirements. The first-fit scheme gives each processor an index and allocates the incoming task to the processor with the smallest index that can take on the new task without violating any QoS requirements. Meanwhile, the proposed global assignment (MC) is also combined with Ondemand [38] (DEF), the default DVFS controller in Linux. The two-tier energy-aware resource management proposed in this paper is denoted MC-kPP. We compare these six strategies to evaluate the energy savings of our solution.
For each VM, the execution time is generated uniformly at random between a minimum and a maximum living time. The deadline of a VM is set randomly to between 1 and 1.5 times its execution time. In these experiments, the number of iterations is 10000 for SA-based kPP and 1000 for VDS-based kPP, and the number of neighbors in the VDS-based kPP algorithm is 20. The results of the SA-based and VDS-based algorithms are very close, so we show only the results of the VDS-based kPP algorithm in Figure 5, whose legend gives the number of VMs. Meanwhile, the size of the candidate set in the MC algorithm is equal to the number of servers. As we can see in Figure 5, the energy savings of our solution range from 8% to 17% in the real environment with 3 servers.
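The two baseline schedulers and the VM workload generation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the QoS check is abstracted as a caller-supplied predicate `can_accept`, the names `first_fit`, `random_fit`, `generate_vm`, `t_min`, and `t_max` are hypothetical, and the deadline factor is assumed to be drawn uniformly from [1, 1.5].

```python
import random
from typing import Callable, Iterable, Optional, Tuple


def first_fit(processors: Iterable[int],
              can_accept: Callable[[int], bool]) -> Optional[int]:
    """Return the smallest-index processor that passes the QoS check."""
    for idx in sorted(processors):
        if can_accept(idx):
            return idx
    return None  # no processor can take the task without a QoS violation


def random_fit(processors: Iterable[int],
               can_accept: Callable[[int], bool],
               rng: random.Random) -> Optional[int]:
    """Pick uniformly at random among the processors that pass the QoS check."""
    feasible = [idx for idx in processors if can_accept(idx)]
    return rng.choice(feasible) if feasible else None


def generate_vm(t_min: float, t_max: float,
                rng: random.Random) -> Tuple[float, float]:
    """Draw an execution time in [t_min, t_max] and a deadline 1 to 1.5 times as long."""
    execution_time = rng.uniform(t_min, t_max)
    deadline = execution_time * rng.uniform(1.0, 1.5)
    return execution_time, deadline


rng = random.Random(0)
# Hypothetical QoS check: only processors 2 and 3 have room for the task.
has_room = lambda idx: idx in {2, 3}
chosen_ff = first_fit([3, 1, 2, 0], has_room)
chosen_ran = random_fit(range(4), has_room, rng)
execution_time, deadline = generate_vm(600.0, 7200.0, rng)
```

Both schedulers share the same feasibility filter and differ only in how they choose among the feasible processors, which isolates the effect of the selection rule when they are compared against MC.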

Simulation Results
Due to the inaccessibility of a large-scale datacenter, we conduct simulations to evaluate the MC-kPP solution in a larger cluster. We model Dell R710 servers to serve the dynamically arriving VMs. The attributes of the generated VMs are the same as those introduced in Section 7.3.
As we can see in Figure 6(a), the local kPP algorithm reduces the energy consumption of a specific server compared to the Ondemand strategy. In addition, when different scheduling algorithms are applied, the global scheduling algorithm influences energy savings more than the local DVFS controller does. The lengths of the VMs in Figure 6(b) are generated uniformly at random between 600 and 7200 seconds. The legend in Figure 6(b) gives the number of VMs arriving per minute. The results show that the energy saving ratio tends to increase with the number of VMs, and the best result reaches about 28%. In Figure 6(c), we investigate the influence of VM length; VMs are generated with the different lengths shown in the legend. As shown in Figure 6(c), MC-kPP also saves more energy as the number of VMs grows. At the same time, MC-kPP performs better when the average execution time of VMs becomes longer, because the influence of the local kPP algorithm itself becomes smaller and the effectiveness of frequency scaling becomes more obvious. In addition, the size of the subset in the MC algorithm is investigated in Figure 6(d); the energy consumption of the kPP algorithm on the local machine is below 0.5% of the total energy consumption when the average execution time is long. As the subset size increases, the performance of MC-kPP improves because a better server can be found within a larger set of candidates. We also evaluate the effectiveness of MC-kPP in datacenters of different scales, ranging from 50 to 5000 servers, with different features of arriving VMs. According to the results in Figure 7, MC-kPP outperforms the other strategies at every scale of datacenter, reaching about 25% energy savings. As the numbers of hosts and VMs increase, MC-kPP performs stably across the different scenarios.

Conclusions
In this paper, we propose a cooperative two-tier energy-efficient strategy that manages VM allocation and adapts frequency scaling to save energy. A frequency scaling algorithm is proposed based on practical power and energy prediction. The Global Scheduler collaborates with the local DVFS controller to assign VMs and save overall energy.
In addition, two heuristic algorithms are provided for searching for the solutions with the minimum predicted energy consumption. The time complexities of both algorithms are acceptable, and both produce satisfactory results.

Lemma 4 .
There are |  | possible frequencies combinations in each FSU, so there may be |  | |  | possible states of all FSU with different energy consumption.Let   represent the frequency levels of jth host; we have |  | = |  | |  | .We want to find the optimal solution with minimum energy consumption in these |  | |  | possible states while still ensuring SLAs.However, the searching space may be |  | |  ||  |
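The growth of this search space can be illustrated with the configuration used in the convergence experiment (7 frequency levels, 4 processors, 24 VMs). The sketch below rests on an assumed reading of the counting argument: each processor independently takes one frequency level in each FSU, and the number of FSUs equals the number of VMs. The function name `search_space_size` is hypothetical.

```python
def search_space_size(num_levels: int, num_processors: int, num_fsus: int):
    """Count frequency configurations under the assumed reading.

    Returns (combinations per FSU, combinations over all FSUs).
    """
    # One level per processor, chosen independently.
    per_fsu = num_levels ** num_processors
    # One such combination per FSU, chosen independently.
    total = per_fsu ** num_fsus
    return per_fsu, total


per_fsu, total = search_space_size(num_levels=7, num_processors=4, num_fsus=24)
digits = len(str(total))  # order of magnitude of the exhaustive search space
```

Even for a single host the exhaustive count is astronomically large under this reading, which motivates heuristic search rather than enumeration.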

Figure 5: Energy consumption of real system. The legends in (a) and (b) give the number of VMs to be allocated.

Figure 6: Performance evaluation on different aspects. The legend in (a) represents the "server number-total VM number." The legend in (b) represents the "VM request number arriving in a minute." The legends in (c) and (d) represent the "(minimum living time, maximum living time)."

Figure 7: Energy consumption of datacenters of different scales with different numbers of VMs.
The FSU represents a period of time during which the number of VMs does not change. Once the number of VMs changes, that is, a VM arrives or leaves, the next FSU begins. An example is shown in Figure 3, which includes four FSUs. Assuming one VM stops at time $t_2$ and the next VM ends at time $t_3$, the interval of length $t_3 - t_2$ is an FSU. If a VM is allocated to the server at $t_1$ and finishes at $t_2$, the interval of length $t_2 - t_1$ is said to be the first FSU from the current time. Clearly, the number of VMs in the host equals the number of FSUs if all VMs finish at different times, and in the rest of this paper we suppose that the VMs in a host end at different times.
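The definition above can be sketched in a few lines. This is a minimal illustration that assumes no further VM arrives after the current time, so only the finish times of the VMs already on the host delimit the FSUs; the function name `split_into_fsus` and the tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_fsus(now: float,
                    finish_times: Sequence[float]) -> List[Tuple[float, float, int]]:
    """Split the horizon after `now` into FSUs.

    Returns one (start, end, running_vms) tuple per FSU, where running_vms is
    the number of VMs still executing during that FSU.
    """
    ends = sorted(finish_times)
    if any(t <= now for t in ends):
        raise ValueError("every VM must finish after the current time")
    if len(set(ends)) != len(ends):
        raise ValueError("VMs are assumed to finish at different times")
    boundaries = [now] + ends
    total = len(ends)
    # The k-th FSU starts when k VMs have already finished.
    return [(start, end, total - k)
            for k, (start, end) in enumerate(zip(boundaries, boundaries[1:]))]


fsus = split_into_fsus(0.0, [30.0, 10.0, 20.0])
```

With distinct finish times the list has exactly one FSU per VM, matching the observation that the number of FSUs equals the number of VMs in the host.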

Table 1: Details of servers.