On-demand resource management is a key characteristic of cloud computing. Cloud providers should share computational resources fairly so that no user receives much better resources than others. Another goal is to improve resource utilization by minimizing resource fragmentation when mapping virtual machines to physical servers. This paper proposes a game theoretic resource allocation algorithm, the fairness-utilization tradeoff game algorithm (FUGA), that considers both fairness among users and resource utilization. Experiments with a FUGA implementation on an 8-node server cluster show that the algorithm maintains fairness better than the Hadoop scheduler. Simulations based on the Google workload trace demonstrate that the algorithm reduces resource wastage and achieves a better resource utilization rate than other allocation mechanisms.
Cloud computing is a new paradigm that provides computational resources as a highly scalable service in a pay-as-you-go model and implements high performance computing in a distributed way [
First of all, the fairness problem is considered in resource allocation, which means no one is allocated much better resources than others. In multiresource environment, resources of various types, such as CPU, memory, and disk storage, are required by users with different demands. In this scenario, fair allocation aims to equalize the largest resource fraction of total availability allocated to each user [
The other goal of resource allocation is to ensure that computational resources are fully utilized. Because resource requirements vary across VM types, many resource fragments can be generated on physical servers during VM deployment. Therefore, an efficient resource allocation method should minimize the amount of resource fragments.
Motivated by these goals, the allocation problem considered in our work is based on two key principles. One is that multiple types of resources should be shared among users in a fair way. The other is that complementary types of VMs should be packed on physical servers in order to better utilize the underlying resources.
This paper proposes a resource allocation algorithm based on game theory for multiresource environment. The problem is modeled as a finite extensive game. Each physical server providing resources is treated as a game player and knows the utility information of other players. To achieve a fair allocation among users while keeping a high resource utilization level, we design a fairness-utilization tradeoff utility function. A measurement is established for fairness based on the dominant resource fairness (DRF) mechanism [
In summary, the main contributions of this paper are as follows. A cloud resource management system is designed to provide on-demand resources in time. The multiresource allocation problem on virtual machine level is modeled as a finite extensive game with perfect information and the utility function is designed by trading off fairness and resource utilization. A game theoretic resource allocation algorithm is proposed to get an optimal resource allocation decision, which guarantees fairness of multiple resources sharing among separated users and reduces the resource fragments to increase the efficiency.
This paper is organized as follows. In Section
Resource management is a significant issue in cloud computing because resources are offered on demand. There are various studies on resource management in cloud computing [
Recently, game theory has been applied to solve resource allocation problems in cloud computing. Ye and Chen study noncooperative games for the load balancing and virtual machine placement problem [
The coordination of resource sharing is also one underlying challenge for resource allocation in clouds. Many works have studied the fair allocation so far, for instance, fair scheduler for Hadoop, which divides resources as fixed-size partitions, or slots [
The fair allocation problem for multiple types of resources allocation has been studied by Ghodsi et al. [
Our work uses the DRF approach to measure the fairness of resource allocation and also explores a way to improve resource utilization.
For a cloud with a large amount of heterogeneous physical servers in data center, how to achieve efficient resource consumption is another interesting direction of resource allocation [
Steinder et al. investigate the resource allocation for a heterogeneous mix of workloads and present a system to manage data center to increase the resource consumption of servers [
In contrast to these studies, our work tries to place virtual machines on proper physical servers to minimize resource fragments and achieve the spatial efficiency.
Although some existing researches study on the tradeoff between fairness and efficiency [
The resource allocation mechanism devised in our work exploits the multiresource nature of virtual machines to avoid wastage, while also incentivizing users to share resources in a fair way.
Each cloud provider has a large-scale, distributed data center with heterogeneous physical servers and provides numerous computational resources under a pay-per-use business model. Infrastructure-as-a-Service providers let users apply for virtual machines and charge them for the occupied time. A VM is created by Xen, VMware, or another hypervisor on a physical server. Cloud users deploy their high-performance applications on a cluster of VMs to accomplish their missions (web services or MapReduce jobs), which are called jobs in our work. Cloud providers typically offer a group of possible VM types to simplify selection for users, and each type is defined by specifying the number of CPU cores, the memory size, the storage size, and the quantities of other resources.
Since the VMs required by different users are heterogeneous and vary over time, providers have to adjust their resource allocation decisions dynamically. To this end, a resource management system for the cloud is designed first.
We are interested in providing a fair and effective resource allocation mechanism on a distributed and complex cloud system; thus, a resource management system is necessary to centrally control and coordinate the physical resources.
The figure shows the framework of the cloud resource management system, which consists of four components:
RC: every physical server in the cloud data center registers its information with RC for connection and management.
CEM: this component retrieves information about physical servers, such as host names and IP addresses, and monitors their statuses (starting, running, shutdown) and their consumption of CPU, memory, and disk storage.
IM: it is responsible for deploying and managing the virtualized infrastructure, such as creating and releasing virtual machines.
CC: it is the computing center that provides the most appropriate resource allocation decision.
A framework of cloud resource management system.
CEM monitors the statuses and resource consumption of the physical servers registered in RC. When a new physical server joins the cloud, its information, such as MAC address and IP address, is registered with RC. When a user sends a service request to the cloud, the resource requirements in this request are received by CC. CC makes an intelligent resource allocation decision based on the information collected by CEM. The allocation decision is executed by IM, which manages the physical servers and places the virtual machines.
This paper proposes an adaptive resource allocation mechanism for the cloud environment, which addresses the problem of mapping a limited quantity of resources to independent users so that they can finish their jobs. In our resource management system, resources are allocated in a time-slotted paradigm. User requests that arrive dynamically during the current time slot are recorded and are served at the start of the next time slot. The start of each time slot is called a decision moment. If a user request cannot be served by the idle resources in the current time slot, it is either deferred to the next time slot or triggers the management of physical servers; the latter is not considered in our work.
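The time-slotted paradigm can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of the paper: a single scalar resource, the same idle capacity at every decision moment, and a hypothetical function name `schedule_slots`. Requests recorded during slot t are first considered at the decision moment of slot t+1, and a request that does not fit is deferred to the following slot.

```python
from collections import deque


def schedule_slots(arrivals, idle_capacity):
    """Serve requests at the decision moments of a time-slotted system.

    arrivals[t] lists the (request_id, demand) pairs that arrive during slot t.
    idle_capacity is the (assumed constant) idle amount at each decision moment.
    Returns (served, leftover): served[t] lists ids served at the start of slot t.
    """
    pending = deque()
    served = {}
    # One extra slot so requests recorded in the last slot get a decision moment.
    for slot in range(len(arrivals) + 1):
        free = idle_capacity
        deferred = deque()
        served[slot] = []
        while pending:
            request_id, demand = pending.popleft()
            if demand <= free:
                free -= demand
                served[slot].append(request_id)
            else:
                # Not enough idle resources: defer to the next time slot.
                deferred.append((request_id, demand))
        pending = deferred
        if slot < len(arrivals):
            # Requests arriving during this slot wait for the next decision moment.
            pending.extend(arrivals[slot])
    return served, [request_id for request_id, _ in pending]


arrivals = [[("a", 2), ("b", 3)], [("c", 1)], []]
served, leftover = schedule_slots(arrivals, idle_capacity=4)
print(served)    # {0: [], 1: ['a'], 2: ['b', 'c'], 3: []}
print(leftover)  # []
```

In this example request "b" does not fit next to "a" at the decision moment of slot 1, so it is served one slot later together with "c".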
Suppose there are
The job submitted by user
In Figure
An example of cloud resource allocation.
Resource requests submitted by different users can be defined as a matrix. Let
The goal of the resource allocation problem, given the resource requirement matrix and the capacity sets of physical servers, is to determine a reasonable mapping from resources to cloud users. In other words, the different kinds of resources that each physical server has should be fairly and effectively distributed to all users to create their required VMs.
For the physical server
An allocation decision
Figure
Furthermore, the total number of resource
Each user in a cloud asks for a type of VM to run its job. The execution of a job involves multidimensional resources, and the resource requirements differ from job to job. For example, a data mining job needs a large disk capacity to store a large amount of data, while a computation-intensive job might need more CPU than disk to produce a result.
In order to support elastic multiresource consumption, we propose a fairness-utilization tradeoff game algorithm (FUGA), which makes an optimal tradeoff between fairness and efficiency.
In this paper, the fair allocation problem is considered for multiple types of resources. For a single type of resource, fair allocation means each user has an equal share of the resource. However, in a multiresource environment, since users have heterogeneous requirements for different types of resources, resources should be assigned to users in proportion to their requirements. For each user, the largest fraction of total capacity that the user holds across the different resources is called the dominant share. The major goal of fair allocation considered in our work is to equalize the dominant share of each user.
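As a small illustration of the dominant share, the following Python sketch computes it for one user. The function name `dominant_share` and the example numbers (a pool of 9 CPUs and 18 GB of memory) are assumptions chosen for illustration; exact fractions are used so that equal shares compare as equal.

```python
from fractions import Fraction


def dominant_share(allocated, capacity):
    """Return (share, resource_index) for the user's dominant resource.

    allocated[k] is the integer amount of resource k held by the user and
    capacity[k] is the total integer amount of resource k in the cloud.
    """
    shares = [Fraction(a, c) for a, c in zip(allocated, capacity)]
    index = max(range(len(shares)), key=lambda k: shares[k])
    return shares[index], index


capacity = (9, 18)  # (CPU cores, memory in GB), illustrative values
print(dominant_share((3, 12), capacity))  # (Fraction(2, 3), 1): memory dominates
print(dominant_share((6, 2), capacity))   # (Fraction(2, 3), 0): CPU dominates
```

The two users hold very different bundles, yet their dominant shares are equal, which is the situation that fair allocation in this sense aims for.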
Three widely used properties should be satisfied to achieve fair allocation [
Sharing incentive means the amount of resource each user should receive is at least as much as simply splitting the total resources equally.
Envy-freeness is the property that no user prefers the allocation of another user.
Pareto efficiency means it should be impossible to increase the resource amount of a user without decreasing the allocation of another user.
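The first two properties can be checked mechanically once a user's preference is made precise. The Python sketch below assumes, as is common in the DRF literature but not stated explicitly here, that a user values a bundle by the number of tasks it can run with it (resources are needed in fixed proportions). All function names and numbers are hypothetical.

```python
from fractions import Fraction


def tasks_supported(bundle, demand):
    """Number of (possibly fractional) tasks a bundle supports for one user."""
    return min(Fraction(b, d) for b, d in zip(bundle, demand) if d > 0)


def has_sharing_incentive(allocations, demands, capacity):
    """Each user does at least as well as with an equal split of all resources."""
    n = len(allocations)
    equal_split = [Fraction(c, n) for c in capacity]
    return all(
        tasks_supported(allocations[u], demands[u])
        >= tasks_supported(equal_split, demands[u])
        for u in allocations
    )


def is_envy_free(allocations, demands):
    """No user could run more tasks with another user's bundle."""
    return all(
        tasks_supported(allocations[u], demands[u])
        >= tasks_supported(allocations[v], demands[u])
        for u in allocations
        for v in allocations
    )


capacity = (9, 18)                       # (CPU cores, memory in GB)
demands = {"A": (1, 4), "B": (3, 1)}     # per-task demand of each user
fair = {"A": (3, 12), "B": (6, 2)}
skewed = {"A": (1, 4), "B": (8, 14)}
print(has_sharing_incentive(fair, demands, capacity), is_envy_free(fair, demands))
print(has_sharing_incentive(skewed, demands, capacity), is_envy_free(skewed, demands))
```

With the `skewed` allocation, user A could run more tasks with B's bundle than with its own, so both checks fail for that allocation.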
The fairness of multiresource sharing is measured by extending the dominant resource fairness (DRF) mechanism that Ghodsi et al. put forward in 2011. In other words, to mathematically gauge the fairness of a resource allocation mechanism, the DRF allocation is taken as the benchmark of fair allocation. Each allocation decision may deviate from this fair allocation; the deviation is called the fairness variance.
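To make the DRF benchmark concrete, here is a minimal Python sketch of progressive filling in the spirit of DRF: the user with the smallest dominant share is repeatedly given one more task as long as that task fits. It is a simplified illustration under stated assumptions, not the paper's implementation. The name `drf_allocate` is hypothetical, demands are integer per-task vectors, and a user whose next task no longer fits is simply dropped from consideration.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Progressive filling: always serve the user with the lowest dominant share.

    capacity[k] is the total integer amount of resource k.
    demands[user] is that user's integer per-task demand vector.
    Returns (tasks, shares): tasks granted and dominant share per user.
    """
    if any(not any(d) for d in demands.values()):
        raise ValueError("every user must demand a positive amount of some resource")
    kinds = range(len(capacity))
    used = [0] * len(capacity)
    tasks = {user: 0 for user in demands}
    shares = {user: Fraction(0) for user in demands}
    active = list(demands)
    while active:
        user = min(active, key=lambda u: shares[u])
        demand = demands[user]
        if all(used[k] + demand[k] <= capacity[k] for k in kinds):
            for k in kinds:
                used[k] += demand[k]
            tasks[user] += 1
            shares[user] = max(
                Fraction(tasks[user] * demand[k], capacity[k]) for k in kinds
            )
        else:
            # The next task of this user does not fit any more.
            active.remove(user)
    return tasks, shares


tasks, shares = drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)})
print(tasks)   # {'A': 3, 'B': 2}
print(shares)  # {'A': Fraction(2, 3), 'B': Fraction(2, 3)}
```

In this example both users end with a dominant share of 2/3, so an allocation decision that gives one of them noticeably more would show a positive deviation from the benchmark.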
Given a resource requirement matrix
Secondly, as mentioned before, the dominant share of a user is the largest fraction of any kinds of resources allocated to that user. Let
Consider the example in Figure
Fairness variance is defined to measure the fairness of a resource allocation. Let
We next turn our attention to the resource utilization problem. During the running time, the resources of physical servers may not be fully used. Consider the example in Figure
In a multiresource environment, improving resource utilization requires considering the resource consumption on every resource dimension. To address this challenge, our approach improves the resource utilization rate of physical servers based on two considerations. Firstly, the max-min approach is applied: we maximize the minimum consumption among the multiple resources of each physical server. Secondly, the utilization of a physical server can be optimized by minimizing the unevenness of consumption across the resource dimensions, since most resource fragments are caused by unequal multiresource requirements [
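The two considerations can be sketched numerically in Python. The bottleneck quantity is simply the minimum per-resource utilization of a server. For unevenness, the sketch uses one commonly used root-sum-of-squares form of skewness; this form is an assumption for illustration and may differ from the paper's own formula. All function names are hypothetical.

```python
import math


def utilization(used, capacity):
    """Per-resource utilization of one physical server."""
    return [u / c for u, c in zip(used, capacity)]


def min_utilization(used, capacity):
    """Bottleneck view: the least utilized resource of the server."""
    return min(utilization(used, capacity))


def skewness(used, capacity):
    """Unevenness of utilization across resources (assumed form).

    sqrt(sum_k (r_k / mean - 1)^2), where r_k is the utilization of resource k.
    Returns 0.0 for an empty server.
    """
    util = utilization(used, capacity)
    mean = sum(util) / len(util)
    if mean == 0:
        return 0.0
    return math.sqrt(sum((r / mean - 1) ** 2 for r in util))


capacity = (8, 16)  # (CPU cores, memory in GB), illustrative values
print(min_utilization((4, 8), capacity), skewness((4, 8), capacity))  # 0.5 0.0
print(min_utilization((8, 4), capacity), skewness((8, 4), capacity))
```

The second server has exhausted its CPU while three quarters of its memory is idle; its low minimum utilization and high skewness both flag the stranded memory as a fragment.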
As said earlier, it is critical to consider the bottleneck resources consumption among the multiple types of resources. Let us denote the vector
More formally, skewness is introduced to quantify the unevenness of the utilization of different resources. Reducing skewness helps combine multiple types of resources better and improves utilization:
In order to achieve a high utilization of computing resources, the cloud provider tries to coplace VMs on available machines such that the resource requirements on one server are complementary to each other. This VM placement problem can be reduced to a multidimensional bin packing problem. Several heuristic algorithms such as Best-Fit, First-Fit, or Random-Fit are typically used to address it. In our work, a precombination approach introduced in Section
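For reference, the First-Fit heuristic mentioned above can be sketched for the multidimensional case in a few lines of Python. This is a generic illustration of the baseline heuristic, not the paper's precombination approach; the function name `first_fit` and the example numbers are assumptions.

```python
def first_fit(vms, servers):
    """Place each VM on the first server that has room on every dimension.

    vms is a list of demand vectors and servers a list of capacity vectors.
    Returns (placement, free): placement[i] is the server index chosen for
    VM i (None if it fits nowhere) and free holds the leftover capacities.
    """
    free = [list(capacity) for capacity in servers]
    placement = []
    for demand in vms:
        chosen = None
        for index, room in enumerate(free):
            if all(d <= r for d, r in zip(demand, room)):
                for k, d in enumerate(demand):
                    room[k] -= d
                chosen = index
                break
        placement.append(chosen)
    return placement, free


servers = [(4, 8), (4, 8)]                         # (CPU cores, memory in GB)
vms = [(2, 6), (2, 6), (2, 1), (2, 1), (1, 2)]
placement, free = first_fit(vms, servers)
print(placement)  # [0, 1, 0, 1, None]
print(free)       # [[0, 1], [0, 1]]
```

The last VM cannot be placed even though 2 GB of memory remain in total, because that memory is split across servers with no free CPU. Such leftovers are the resource fragments that an allocation method should minimize.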
In this section, a game theory approach to resource allocation is presented, aiming at keeping a fair allocation as well as reducing the amount of resource fragments. The game model for resource allocation problem is described first, followed by the proposal of the FUGA algorithm.
An extensive game is a specification of a game that describes the sequencing of all players' possible strategies and their decision points. A finite extensive game with perfect information has a finite set of players, and each player knows the other players' strategies and all possible utilities. A subgame perfect Nash equilibrium (SPNE) is a solution in which the players' strategies constitute a Nash equilibrium in every subgame of the original game.
In our work, the resource allocation problem is modeled as a finite extensive game with perfect information. Physical servers with idle resources are modeled as the selfish players and each player has a limited number of possible allocation matrices.
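One way to see why each player's strategy set is finite is to enumerate the VM combinations that fit on a server. The Python sketch below does this by brute force and then orders the combinations by their minimum resource utilization. It is an illustrative sketch, not the paper's precombination procedure. The function names are hypothetical, every VM type is assumed to demand a positive integer amount of each resource, and the example uses the CPU and memory figures of the Tiny, Small, and Medium types from the paper's VM type table on an assumed server with 4 cores and 8 GB.

```python
from itertools import product


def feasible_combinations(capacity, vm_types):
    """Enumerate non-empty VM count vectors that fit within the capacity.

    Returns a list of (counts, used) pairs, where counts[t] is the number
    of VMs of type t and used[k] is the total demand on resource k.
    """
    limits = [min(c // d for c, d in zip(capacity, demand)) for demand in vm_types]
    combos = []
    for counts in product(*(range(limit + 1) for limit in limits)):
        if not any(counts):
            continue  # skip the empty combination
        used = [
            sum(n * demand[k] for n, demand in zip(counts, vm_types))
            for k in range(len(capacity))
        ]
        if all(u <= c for u, c in zip(used, capacity)):
            combos.append((counts, used))
    return combos


def rank_by_min_utilization(combos, capacity):
    """Order combinations by their least utilized resource, best first."""
    return sorted(
        combos,
        key=lambda item: min(u / c for u, c in zip(item[1], capacity)),
        reverse=True,
    )


capacity = (4, 8)                      # (CPU cores, memory in GB), assumed server
vm_types = [(1, 1), (1, 3), (2, 6)]    # Tiny, Small, Medium: (cores, GB)
combos = feasible_combinations(capacity, vm_types)
ranked = rank_by_min_utilization(combos, capacity)
print(len(combos))   # 14
print(ranked[0])     # a combination that uses all 4 cores and all 8 GB
```

Here two combinations fill the server completely: two Tiny VMs with one Medium VM, and two Tiny VMs with two Small VMs. Keeping only the best-ranked combinations as strategies is one way to keep the game small.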
The following symbols are introduced to define the resource allocation game.
A resource allocation game is represented as a four-tuple vector
At decision moment, CC gets the resources consumption information of each physical server in data center from CEM.
For a physical server, there are a variety of possible combinations to be fulfilled by different types of VMs without exceeding the capacity. A combination of physical server
In this resource allocation game, the physical servers with idle resources are the game players, and they are individually rational, seeking to maximize their own utilities. As discussed previously, the design of the utility function has a crucial impact on the players' choices and on the result of the game. In our allocation model, one global objective of the allocation game is to share resources impartially. Furthermore, following the efficiency principle, each individual player tries to minimize its resource wastage; that is, players prefer to choose combinations with high utilization. To exploit fair resource sharing while also maximizing the resource utilization rate, a fairness-utilization tradeoff utility function is designed as follows:
Each player of this game aims to choose a strategy to maximize its own utility so that the goal of a resource allocation game would be naturally considered as the following optimization problem:
Firstly, to achieve a high level resource utilization rate by reducing the resource fragments generated in the virtual machine placement process, a precombination approach is proposed to provide a set of possible strategies for each game player. A precombination phase is defined in this approach to compute any possible coordinate placement combinations for each physical server. For example, to place these three types of VMs together on the Physical Server 1 in Figure
To facilitate efficient selection of resource allocation, the minimum resource utilization of each combination is calculated by making use of formula (
We describe the ranked combinations mathematically as an order
Since each physical server with sufficient idle space has a set of possible combinations calculated in advance and ordered by resource utilization, only the top of
If
Each game player has a set of possible combinations to choose now. Once all players picked up one of their combinations, the allocation matrix
For an allocation
Therefore, to determine the optimal resource allocation decision, the utility function that derives from the fairness variance function and skewness in formula (
Supposing the Physical Server 1 and Physical Server 2 both choose their second combination, then
Assuming that
The resource allocation game is modeled as the interaction of physical servers making choices with perfect information. Each extensive game can be represented as an extensive-form game tree. Physical servers take actions in the ascending order of
The extensive-form game tree.
Backward induction is a quite straightforward solution to find an SPNE for these extensive-form games with perfect information [
In Figure
The pseudocode implementation of FUGA is given in Algorithm
Algorithm: FUGA pseudocode (Input; Output; steps (1) to (36)).
According to Zermelo's theorem, a finite game with perfect information has a pure strategy Nash equilibrium, and for a finite extensive game with perfect information, there always exists an SPNE. If no player has the same utilities at any two terminal nodes of subgames, then a unique SPNE can be derived by backward induction.
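Backward induction on such a game can be sketched in Python as follows. The sketch assumes that players move once each in a fixed order and that a utility function is available for every complete strategy profile. The function name `backward_induction` and the payoff table are hypothetical and do not reproduce the FUGA utility function.

```python
def backward_induction(strategies, utility, chosen=()):
    """Find the SPNE outcome of a sequential game with perfect information.

    strategies[i] lists the actions of player i; players move in index order.
    utility(profile) returns a tuple with one payoff per player.
    Returns (profile, payoffs) on the equilibrium path. Ties are broken in
    favour of the action listed first.
    """
    player = len(chosen)
    if player == len(strategies):
        return chosen, utility(chosen)
    best = None
    for action in strategies[player]:
        # Solve the subgame that follows this action, then compare payoffs.
        outcome = backward_induction(strategies, utility, chosen + (action,))
        if best is None or outcome[1][player] > best[1][player]:
            best = outcome
    return best


# Hypothetical payoffs for two servers, each choosing combination c1 or c2.
payoffs = {
    ("c1", "c1"): (3, 1),
    ("c1", "c2"): (2, 4),
    ("c2", "c1"): (5, 2),
    ("c2", "c2"): (1, 3),
}
strategies = [["c1", "c2"], ["c1", "c2"]]
print(backward_induction(strategies, payoffs.get))  # (('c1', 'c2'), (2, 4))
```

The first server does not obtain its highest payoff of 5, because the second server would never answer c2 with c1. Anticipating the later players' best responses is exactly what distinguishes the SPNE from a naive choice. In this table no player has equal payoffs at two terminal nodes, so the outcome is unique.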
Now we prove that FUGA satisfies the three required properties for fair allocation described in the previous section.
An allocation algorithm satisfies sharing incentives if
Thus,
An allocation algorithm is envy free if
Assume user
This section presents a comprehensive evaluation of the resource allocation algorithm proposed in the previous section. Fairness is evaluated first, through a prototype implementation of our FUGA algorithm running on an 8-node cluster. Google trace-driven simulations then show that FUGA improves resource utilization in comparison with the First-Fit algorithm and the management mechanism of the Google cluster.
The experiments to evaluate fair allocation were done on a small-scale cluster with 8 physical nodes: one Dell PowerEdge R910 with two CPUs (Xeon E7-4820, 2 GHz, 8 cores), 32 GB memory, and 300 GB disk storage; three Dell Optiplex9010 machines, each with one CPU (i7-3770, 3.40 GHz, 4 cores), 8 GB memory, and 500 GB disk storage; and four Dell Optiplex745 machines, each with two CPUs (6600, 2.4 GHz, 2 cores), 4 GB memory, and 200 GB disk storage. Three kinds of resources are considered in this experiment: CPU, memory, and disk storage. The simulations were run on a Dell Optiplex9010 with JDK 1.7. To reduce the complexity of the simulations, the following assumptions are made: (1) Two kinds of resources (i.e., CPU and memory) are considered in our simulations. (2) Each job request submitted by a user indicates the predicted maximum consumption of different resources and will be handled by a cluster of VMs of the same type. (3) The total amount of resources provided for each time slot is estimated in advance by the cloud provider.
Table
The VM types.
VM type | CPU cores | Memory | Disk |
---|---|---|---|
Tiny | 1 | 1024 MB | 5 GB |
Small | 1 | 3072 MB | 15 GB |
Medium | 2 | 6144 MB | 30 GB |
Large | 4 | 12288 MB | 60 GB |
X large | 8 | 24576 MB | 60 GB |
This group of experiments aims to show that FUGA dynamically shares resources in a way that is closer to users' requirements than the Hadoop fair scheduler.
Hadoop is one of the most popular frameworks for storage and large scale data processing. Hadoop fair scheduler groups jobs into different pools and each pool chooses its jobs based on FIFO or fair sharing [
Three services belonging to three users were deployed on a cluster of VMs created on this 8-node cluster. Each service can be divided into a series of MapReduce jobs and has different resource requirements in different phases. The 8-node cluster is initially empty with full capacities. FUGA analyzes the requirements in time and provides optimal resource allocation decisions to create a VM cluster at each decision moment. As a comparison, these three services were also deployed on the Hadoop cluster running on the initialized 8-node. Figures
Resource consumption for the three services.
Requirements for CPU
Requirements for memory
Requirements for disk storage
Figure
Dominant share for Hadoop scheduler and FUGA.
The dominant share for Hadoop scheduler
The dominant share for FUGA
In contrast to the Hadoop fair scheduler, FUGA is aware of the heterogeneous requirements in a multiresource environment and comes significantly closer to the demands of users on each resource dimension.
This section highlights the performance of FUGA in improving the resource utilization rate by analyzing the proportions of allocated resources. The higher the proportion an allocation achieves, the fewer resources are wasted. Simulations were conducted using the Google workload trace as the input. This trace collects data (job workloads, server capacities, execution time, resource utilization, etc.) from a Google cluster of about 12,500 machines over a 29-day period.
As
Allocated resource proportions for first fit and FUGA.
Allocated CPU proportions
Allocated memory proportions
In a large-scale environment, it is clear that the fewer resource fragments are produced during the allocation, the higher the resource utilization rate is. FUGA achieves better proportions of allocated resources when the number of physical servers scales up, and it therefore yields a higher utilization rate than the First-Fit algorithm.
Overall, the results in Figure
To study the performance of efficient allocation by evaluating the resource utilization rate, the total number of physical servers is fixed to 300 in this group of Google trace-driven simulations. The parameter
The Google trace provides the information of users’ requirements and the actual allocated resources of running tasks in Google cluster [
Figure
Resource utilization in Google and FUGA.
CPU utilization
Memory utilization
In contrast to the utilization of the Google cluster, our algorithm provides more efficient resource allocation decisions. This is not only because FUGA leads to fewer resource fragments during the allocation, as discussed in the last section; FUGA also tries to minimize the unevenness of utilization across the multiple resource dimensions when making the resource allocation decision.
In this paper, we have investigated the resource allocation problem in cloud computing. We consider multiple types of resources like CPU, memory, and storage on virtual machine level to propose an allocation algorithm called FUGA. The algorithm supports not only fair resource allocation for users, but also efficient resource utilization for each physical server. The resource allocation problem is modeled as a finite extensive game with perfect information and the FUGA algorithm results in a Nash equilibrium decision.
Experiments and simulations are conducted to evaluate the performance of FUGA by comparison with related work. The results show that the proposed FUGA achieves better performance in fair allocation than the Hadoop scheduler. With proper parameters for the fairness-utilization tradeoff, FUGA also provides more efficient resource allocation than the First-Fit algorithm and the allocation mechanism in the Google cluster.
Future work could usefully study the fairness-utilization tradeoff when jobs have machine preferences. Another direction is to consider the allocation problem when jobs have priorities. Moreover, we plan to investigate how to apply this game theoretic resource allocation to a federated environment with multiple resource providers.
Cloud users
Different kinds of computing resources (CPU, memory, storage, etc.)
Physical servers with idle resources
Resource capacity vector of physical server
The VM type required by user
Possible VM combinations of a physical server
Resource requirement matrix of all users
The amount of resource
Total amount of resource
Resource allocation decision
Fairness variance for an allocation decision
The number of possible strategies for each player
Unevenness for the utilization of resources
The initial resource space of a physical server
Resource utilization for physical server
The utility function of resource allocation game.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was partially supported by the NSF of China under Grants no. 61173048 and no. 61300041 and Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant no. 20130074110015.