A Unified Algorithm for Virtual Desktops Placement in Distributed Cloud Computing

1 School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
2 Public Service Platform of Mobile Internet Application Security Industry, Shenzhen 518057, China
3 Shenzhen Key Laboratory of Internet of Information Collaboration, Shenzhen 518055, China
4 Shenzhen Applied Technology Engineering Laboratory for Internet Multimedia Application, Shenzhen 518055, China


Introduction
Distributed cloud computing has been widely adopted to support service requests from dispersed regions by exploiting differences in location and service capability. As one of the most promising services in cloud computing, virtual desktop technology [1, 2] lets users access virtual machines (VMs) [3], called virtual desktops (VDs), that are deployed in remote data centers (DCs) through local thin clients. Compared with traditional personal computers or desktops, the local clients are equipped with fewer resources and store no data or files, which greatly reduces the initial investment and improves confidentiality. Moreover, it can be greener [4]. VDs have attracted the interest of many cloud service providers (CSPs) such as Microsoft [5], VMware [6], Huawei [7], and traditional telecom operators [8], and they are widely used around the world [7].
The service delivery scheme is depicted in Figure 1. VDs are placed in geodistributed DCs close to the branch companies (BCs). Users in each BC access cloud services through one specified gateway by using local thin clients. Because of the relatively small scale of a distributed DC or a specific availability policy [9] (e.g., an upper limit on the number of VMs that one BC may place in one DC), more than one DC may be needed to accommodate the VDs of a big BC [10]. The distributed DCs are often connected by dedicated high-speed links or expensive long distance links. Inside each DC, servers are normally networked in a tree-like topology. In the context of infrastructure as a service, a VD corresponds to a VM (in this paper, we use servers and physical machines (PMs) interchangeably, and likewise VDs and VMs). The traffic delay inside a DC is dominated by the switches or routers on the path. The cost matrix [11] in Figure 1 gives the number of switches between each pair of PMs for a tree-like topology, such as PortLand, VL2, and BCube.
CSPs face great challenges in VD service delivery. On the one hand, they should guarantee the service level agreement. Especially for latency sensitive VDs, they should shorten the response time while keeping the delay within the agreed threshold. The delay consists of three parts: the time between the thin client and the gateway, the time between the gateway and the DC, and the queueing delay inside the DC. Because the first part is fixed for each BC once the gateway is specified, we consider only the latter two. For the second part, we consider only the one-way delay. Delay optimization drives the CSP to deploy VMs in the DC closest to each BC, but the closest DC may not be the best candidate from an economic perspective.
On the other hand, to maximize profit, CSPs also strive to reduce cost as much as possible. At the DC level, appropriate DCs should be selected to obtain a cheaper electricity price and to consume less long distance inter-DC bandwidth. Sometimes adjacent BCs compete for the resources of one DC, which complicates matters further. Inside each DC, VMs should be consolidated so that fewer PMs are used, power is saved, and less inter-PM bandwidth is consumed. Because nodes, bandwidth, and power comprise 75% of DC cost [12], they are the prime targets for optimization.
When a CSP selects DCs that have cheaper electricity and are closer to the BC, it must also consider the service type and the resource capability of the PMs inside each DC. The usual two-phase scheme first selects DCs according to the power and location objectives, then determines PMs and assigns VMs to these hosts. This is natural but not necessarily good, because fixing the DCs first greatly limits the candidate PMs, and the subsequent assignment of VMs to those PMs cannot guarantee the optimal overall cost. The choice of DCs therefore also depends on how VMs are assigned to PMs inside each DC. In essence, there is a noncooperative relation that leads to a bilevel structure. Each level pursues different objectives, some of which are consistent and some of which are not (see Section 3). A CSP should balance quality with cost by considering DC and server selection simultaneously.
To the best of our knowledge, our work is the first to explore unified algorithms for multi-BC virtual desktops placement. The main contributions are as follows:
(1) We formulate the problem as multiobjective bilevel programming considering resource provision at both the DC and server levels.
(2) A novel unified algorithm, the segmented two-level grouping genetic algorithm (STLGGA), is proposed. It selects DCs and servers simultaneously and minimizes the response delay between VDs and BCs at the least cost in bandwidth, server nodes, and power.
(3) Extensive simulation demonstrates that STLGGA outperforms the baseline algorithm in both multi-BC and single BC scenarios. For multi-BC, it achieves a 13% shorter delay while saving power and resources by 21% and 6% on average, respectively.
The remainder of the paper is organized as follows. Related work is reviewed in Section 2. Section 3 formulates the problem and Section 4 presents a novel GA, which is evaluated in Section 5. Section 6 concludes the paper.

Related Work
Various objectives of virtual desktops placement have been explored recently. Most works consider resource efficiency inside a DC [2, 14-16]. Man and Kayashima [14] use a bin packing (BP) method to find the group of VDs that suits a server so that the fewest hosts are used. Makarov et al. [15] propose a tool for evaluating VDs. It assesses the effect of the VD access protocol so as to guide resource configuration according to the reaction time of task execution. Deboosere et al. [2] study different aspects of optimizing resources and user satisfaction. The resource requirement is predicted first; overbooking is then adopted, and resource allocation and reallocation are used to achieve load balance or energy efficiency. This method provides a VD resource management framework. Secure sharing of VDs is explored in [16]. However, none of these works considers the location of VDs or inter-DC resources.
Other works investigate VD deployment across DCs. Kochut [17] prefers placing VDs across DCs in different locations to lower power consumption, but service quality is not addressed. An OpenFlow based mechanism is proposed to distribute VDs across multiple DCs so that performance and scalability are maximized [18]. However, the authors focus on route setup, route selection, and load balance between the thin client and the DC. The work most similar to ours is [1], which optimizes latency sensitive VDs by exploiting the locations of geodistributed DCs. It presents a greedy algorithm, VMShadow, that migrates a VD toward its user while considering the migration overhead. But it does not establish a mapping between VDs and PMs. Furthermore, resources at the DC level and inside the DC, as well as energy cost, are not considered.
Few papers [13, 19] consider the selection of DC and server simultaneously when placing VMs. Yao et al. [19] propose a two-time-scale Lyapunov optimization algorithm to reduce power cost for delay tolerant workloads. A multilevel group GA (MLGGA) is proposed to reduce carbon emission by exploiting green energy [13]. Its main idea is DC consolidation and PM consolidation: the distributed DCs are viewed as the higher level group and the servers in each DC as the lower level group. This scheme can group the items and is designed for multilevel BP, but it does not consider bandwidth optimization or quality of service guarantees. Both works focus on general VMs. Calyam et al. [10] present a utility driven model, U-RAM, to allocate resources for VDs so that the utility in each DC is maximized. The resources should be sufficient to ensure the timeliness perceived by the user and the coding efficiency of the VD access protocol. It adopts a two-phase scheme: DCs are selected based on balance, power, or express migration, and then VMs are assigned to PMs in the selected DC. Inter-DC bandwidth cost is not considered, and the overall resource cost across the two levels cannot be optimized jointly. None of these works captures the noncooperative relation between the DC network and the servers.

Formulation
Suppose there are $B$ BCs and BC $b$ ($b = 1, 2, \ldots, B$) requires $N_b$ VMs, with $\sum_{b=1}^{B} N_b = N$. There are $D$ candidate DCs. In each DC $d$ ($d = 1, 2, \ldots, D$), there are $M_d$ PMs. The total number of PMs in all DCs is $M$; that is, $\sum_{d=1}^{D} M_d = M$. The cost and quality aware multi-BC virtual desktops placement problem can be summarized as placing $N$ VMs belonging to $B$ BCs on $M$ PMs distributed over $D$ DCs, so that the maximum distance between the DCs and the BCs being served is as short as possible, while minimizing the overall cost at both the DC and server levels, including power, network, and server. The problem is modeled as multiobjective bilevel programming (MOBLP). The low level only considers the resource cost of PM nodes (the former half of $f(x, y)$) and bandwidth (the latter half) inside each DC $d$:

Low Level
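$$\min f(x, y) = \sum_{j=1}^{M_d} y_j \sum_{h=2}^{H} p_h C_j^h + \sum_{l=1}^{3} q_l W_l, \quad (1)$$

where $C_j^h$ is the amount of resource $h$ in PM $j$ and $p_h$ is the price for resource $h$. $h$ is summed from 2 because the bandwidth cost is calculated explicitly in the latter part. The multiplier $y_j$ means that only the cost of active PMs is considered. $q_l$ is the $l$th layer bandwidth price, where $l = 1, 2, 3$ represents bandwidth in the access, aggregate, and core layers (Figure 1), respectively. Normally, $q_1 < q_2 < q_3$. $W_l$ is the $l$th layer bandwidth consumption, defined as the sum of the $l$th layer traffic between VMs placed on different PMs (2); it therefore equals zero for VMs on the same PM (3). In each PM, the resource capacity must be respected. If communicating VMs are placed in one PM, the inter-VM traffic becomes intra-PM traffic and occupies no outgoing bandwidth, so the intra-PM traffic is subtracted in the bandwidth constraint (4). The other resources are constrained by (5). For each DC $d$, the low level programming minimizes $f(x, y)$ subject to (3), (4), (5), and the assignment constraints discussed next, with binary variables $x_{ij}, y_j \in \{0, 1\}$ for $i = 1, \ldots, N_d$ and $j = 1, \ldots, M_d$, where $x_{ij} = 1$ if VM $i$ is placed on PM $j$, $y_j = 1$ if PM $j$ is active, and $N_d$ is the number of VMs placed in DC $d$.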
Constraint (7) states that a PM is viewed as active if it hosts at least one VM.Constraint (9) implies that all VMs should be assigned and a VM can only be placed in one PM.
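The two constraints just described can be checked mechanically. The following is a minimal sketch, assuming the assignment is given as a 0/1 matrix `x` (rows are VMs, columns are PMs) and the PM activity as a 0/1 list `y`; both names and the function `is_feasible` are illustrative and do not appear in the paper. Capacity and bandwidth constraints are deliberately left out.

```python
def is_feasible(x, y):
    """Check the assignment and activity constraints of the low level.

    x[i][j] == 1 if VM i is placed on PM j (hypothetical representation).
    y[j] == 1 if PM j is marked active.
    """
    # Every VM must be assigned, and to exactly one PM.
    if any(sum(row) != 1 for row in x):
        return False
    # A PM must be marked active whenever it hosts at least one VM.
    for j in range(len(y)):
        hosts_vm = any(row[j] == 1 for row in x)
        if hosts_vm and y[j] != 1:
            return False
    return True
```

For instance, two VMs on the first of two PMs are feasible only if that PM is marked active; a VM split across two PMs, or a VM on a PM marked inactive, is rejected.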

High Level Objectives and Constraints.
The high level focuses on DC selection. It optimizes the delay, the overall power, and the physical resource cost. The binary variable $z_{db}$ indicates whether there exists a PM in DC $d$ that is used to serve BC $b$: $z_{db} = 1$ if so and 0 otherwise. Once there exists one active PM in DC $d$, DC $d$ is viewed as active, which is expressed by constraint (11). Suppose the one-way delay between DC $d$ and the BC $b$ being served is $t_{db}$. It can be estimated by extensively studied methods [20]. We require all delays to stay within a threshold $T$, the maximum delay permitted; that is,

$$z_{db} \, t_{db} \le T, \quad d = 1, \ldots, D, \; b = 1, \ldots, B. \quad (12)$$

The multiplier $z_{db}$ means that we only care about the delay between an active DC and the BC being served. At the same time we want to reduce the delay to a minimum. This is the first objective:

$$\min F_1 = \max_{d, b} \; z_{db} \, t_{db}. \quad (13)$$

The second objective optimizes the power cost of all selected DCs by leveraging the geodiverse electricity price:

$$\min F_2(x, y, z) = \sum_{d=1}^{D} e_d \sum_{j=1}^{M_d} \Big( \alpha \sum_{\mathrm{VM}_i \in \mathrm{PM}_j} r_i^2 + y_j P_j^{idle} \Big), \quad (14)$$

where $e_d$ is the electricity price of DC $d$, $\alpha$ is the coefficient that reflects the relation between power and CPU load, and $P_j^{idle}$ is the power consumption of PM $j$ in the idle or standby state. Because power grows roughly in proportion to CPU utilization [21], we use an affine function of the CPU load ($\sum_{\mathrm{VM}_i \in \mathrm{PM}_j} r_i^2$, where $r_i^2$ is the CPU requirement of VM $i$) to estimate power cost. To make the power consumed and the physical resource cost comparable, we follow [12]. All one-time purchase costs of physical resources are amortized over a reasonable lifetime, so all physical resource prices in the formulation are amortized ones. Implicitly, the formulation only balances the cost within the amortization period. The former half of $F_2(x, y, z)$ represents the power cost caused by workload and the latter half is the power in the idle or standby state.
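As a rough illustration of the affine power model described above, the sketch below computes the power cost of a single DC. It assumes that a PM with zero CPU load hosts no VM and is therefore switched off; the function name `dc_power_cost` and its parameters are hypothetical, and the exact form of the paper's objective may differ.

```python
def dc_power_cost(price, alpha, pm_cpu_loads, pm_idle_power):
    """Power cost of one DC under an affine model (illustrative only).

    price: electricity price of the DC.
    alpha: coefficient relating CPU load to power.
    pm_cpu_loads[j]: total CPU load of the VMs hosted by PM j.
    pm_idle_power[j]: idle or standby power of PM j.
    """
    total_power = 0.0
    for load, idle in zip(pm_cpu_loads, pm_idle_power):
        if load > 0:  # assumption: a PM hosting no VM is inactive
            total_power += alpha * load + idle
    return price * total_power
```

Summing this quantity over all DCs, each with its own electricity price, gives a quantity of the same shape as the second objective: a workload term plus an idle term per active PM.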
The third objective is the overall resource cost, including PMs and bandwidth:

$$\min F_3(x, y, z) = \sum_{d=1}^{D} \Big( \sum_{j=1}^{M_d} y_j \sum_{h=2}^{H} p_h C_j^h + \sum_{l=1}^{3} q_l W_l^d \Big) + \gamma W^{DC}, \quad (15)$$

where $W_l^d$ is the $l$th layer bandwidth consumption inside DC $d$, $W^{DC}$ is the inter-DC bandwidth consumption, and $\gamma$ is the inter-DC bandwidth price. The former two parts are consistent with the low level objective $f(x, y)$. However, the low level only considers resources inside each DC, whereas the high level additionally considers the inter-DC bandwidth cost in the third part. The inter-DC bandwidth (16) is defined similarly to the inter-PM bandwidth in (2) and (3). The high level optimization can be summarized as multiobjective programming (MOP): minimize (13), (14), and (15) over $x$, $y$, $z$, subject to (3), (11), (12), (16), (17), (18), and $z_{db} \in \{0, 1\}$ for $d = 1, \ldots, D$ and $b = 1, \ldots, B$, where $x$ and $y$ are the solutions of the low level programming.

Obviously, each level has its own objectives and constraints. The high level objective value depends on the optimal solutions of the low level. The first two parts of $F_3$ are exactly what the low level pursues, and the inter-DC bandwidth (in $F_3$) and the overall power ($F_2$) are both subject to the optimal solution of the low (server) level. Inter-DC bandwidth consists of the traffic between VMs across DCs (16). The overall power is directly related to the CPU load and to how many PMs are used. The low level optimizes itself under the high level decision (the DC is determined by the high level). In particular, there is a noncooperative relation between the DC level and the server level. For example, each DC only tries to minimize the resources it consumes, but this sometimes conflicts with minimizing the inter-DC traffic, because minimizing inter-DC traffic may require consuming more PM resources. This is exactly the case described by bilevel programming [22]. Because the bandwidth related constraints (2), (4), and (16) are nonlinear, MOBLP is nonlinear bilevel programming. Even linear bilevel programming, the simplest bilevel programming problem, has been proved to be strongly NP-hard, so the problem formulated here is NP-hard. GA has been demonstrated to be a very efficient scheme for bilevel programming [23, 24], and we resort to GA to solve it.

Algorithm
For minimization programming, suppose there are two vectors, $u$ and $v$, with the same dimension $n$; we say $u$ dominates $v$ if and only if $u_i \le v_i$ for every $i$ and there exists at least one $i \in \{1, 2, \ldots, n\}$ such that $u_i < v_i$. Suppose the feasible solution set of multiobjective programming (MOP) is $\mathcal{F}$; then a solution $x^*$ is Pareto optimal to MOP if there is no point $x \in \mathcal{F}$ such that $F(x)$ dominates $F(x^*)$, where $F = \{F_1, F_2, F_3\}$. In short, any decrease in one dimension of $F$ must lead to an increase in at least one other dimension. We try to find multiple Pareto solutions for MOBLP.
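The dominance relation and the resulting Pareto set can be sketched in a few lines. This is a generic illustration for minimization, with objective vectors given as tuples; the function names `dominates` and `pareto_front` are illustrative and not taken from the paper.

```python
def dominates(u, v):
    """Return True if u dominates v (minimization): u is no worse in
    every dimension and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A vector never dominates itself, so duplicates of a nondominated point are all retained by this sketch.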
Distinctive encoding and decoding, initial population generation scheme, and genetic operators are designed to address the multiobjective bilevel resource provision problem.

Encoding and Decoding Scheme.
In a GA, it is very important to reflect the structure and information of the problem in the genes of the chromosome (readers are assumed to be familiar with the structure of GA; otherwise, please refer to [25] for details). Considering the multi-BC characteristic and the two levels of the problem, we propose a segmented two-level grouping encoding scheme, as depicted in Figure 2.
All candidate PMs and DCs are numbered first. The encoding gives the serial number (SN) of the PM to which each VM is assigned and the DC to which each PM belongs. It consists of $B$ segments in series and is written as $X = S_1; S_2; \ldots; S_B$. Segments are separated by semicolons, and $S_b$ corresponds to BC $b$ for $b = 1, 2, \ldots, B$. The structure of $S_b$ is the same as the encoding of MLGGA [13]. It comprises three parts, separated by colons. The first part lists the SNs of the PMs used. The second part lists the SN of the DC to which each of these PMs belongs. The third part is the second part with repeated DCs deleted. For example, suppose there are three DCs and six PMs, where the first DC contains three PMs, the second contains two, and the third contains one, and BC 1 requires $N_1 = 10$ VMs. Segment $S_1$ then lists the PMs hosting these VMs, followed by the DC of each listed PM, followed by the same DC list with the later repeated entries deleted.
The new encoding scheme lists the PMs and DCs to which the VMs of each BC are assigned. It captures the placement of the VMs of multiple BCs as a whole and facilitates the resolution of competition between BCs, so better solutions can be found. It also remedies a limitation of MLGGA, which can only place the VMs of one BC at a time, and achieves better performance, as shown in Section 5. The decoding is self-evident.
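The three-part segment structure can be illustrated with a short sketch. The single-letter names used here (PMs `a` to `f`, DCs `A` to `C`) and the helper names `encode_segment` and `encode` are assumptions made for the example only; the paper numbers PMs and DCs by serial number.

```python
def encode_segment(pms_used, pm_to_dc):
    """Build one segment 'PMs:DC of each PM:DCs without repeats'."""
    dcs = [pm_to_dc[pm] for pm in pms_used]
    # dict.fromkeys keeps the first occurrence of each DC, in order.
    unique_dcs = list(dict.fromkeys(dcs))
    return ":".join("".join(part) for part in (pms_used, dcs, unique_dcs))


def encode(segments):
    """Join the per-BC segments with semicolons."""
    return ";".join(segments)


# Hypothetical layout: DC A holds PMs a, b, c; DC B holds d, e; DC C holds f.
PM_TO_DC = {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "C"}
```

With this layout, a BC whose VMs sit on PMs `a`, `b`, and `d` yields the segment `abd:AAB:AB`, and a second BC placed on PM `f` yields `f:C:C`; the full chromosome is the two segments joined by a semicolon.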
Distinctive initial population generation scheme and genetic operators are designed to address the multiobjective bilevel VDs placement problem.

Initial Population Generation.
The Pareto solution aims to optimize each scalar objective $F_i$. So we strive to embody the optimum of each $F_i$ in the initial population. Solutions for each single objective are produced so that the initial population carries good genes to be inherited by the offspring.
For $F_1$, the shortest delay VM placement algorithm (SD) is proposed in Algorithm 1. For BC $b$, only those DCs whose delay to BC $b$ does not exceed $T$ are considered. Denote these candidate DCs for BC $b$ as the feasible DC set $\mathcal{F}_b$. For each BC, SD places the VMs of this BC in the closest DCs in $\mathcal{F}_b$ first, until all VMs are assigned. This procedure is repeated for all BCs.
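The greedy idea behind SD can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 1: it treats the capacity of a DC as a plain count of VM slots and ignores the PM level entirely. All names (`shortest_delay_placement`, `demand`, `delay`, `capacity`, `threshold`) are hypothetical.

```python
def shortest_delay_placement(demand, delay, capacity, threshold):
    """Place the VMs of each BC in its closest feasible DCs first.

    demand[b]: number of VMs required by BC b.
    delay[b][d]: delay between BC b and DC d.
    capacity[d]: number of VM slots in DC d (a simplification).
    threshold: maximum delay permitted.
    Returns {b: {d: number of VMs of b placed in d}}.
    """
    remaining = dict(capacity)
    placement = {}
    for b, n_vms in demand.items():
        # Feasible DC set of BC b, sorted by increasing delay.
        feasible = sorted(
            (d for d in delay[b] if delay[b][d] <= threshold),
            key=lambda d: delay[b][d],
        )
        placement[b] = {}
        for d in feasible:
            if n_vms == 0:
                break
            take = min(n_vms, remaining[d])
            if take > 0:
                placement[b][d] = take
                remaining[d] -= take
                n_vms -= take
        if n_vms > 0:
            raise ValueError(f"not enough feasible capacity for BC {b}")
    return placement
```

Replacing the sort key by the electricity price of each DC would give the corresponding sketch of a lowest-price-first variant.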
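In Algorithm 1, we can replace the sorting criterion $t_{db}$ with the electricity price $e_d$. Then we have another method, which strives to place VDs in the feasible DC with the lowest electricity price. We denote it as LeastPowerCost (LPC). LPC aims to optimize $F_2$.

Algorithm 1: Shortest delay VM placement algorithm (SD).
Input: $\mathcal{F}_b$: feasible DC set for BC $b$; $V_b$: VM set of BC $b$
Output: solution encoding $X$
(1) for $b = 1, \ldots, B$ do
(2)   Sort the DCs in $\mathcal{F}_b$ by increasing delay $t_{db}$
(3)   for each DC $k$ in the sorted sequence do
(4)     Place as many VMs of $V_b$ as possible in DC $k$
(5)     $V_b \leftarrow V_b \setminus \{\text{VMs assigned to DC } k\}$
(6)     if $|V_b| = 0$ then
(7)       break
(8)     end if
(9)   end for
(10) end for
(11) Encode the solution as $X$ according to Section 4.1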
For $F_3$, a modified first fit decreasing algorithm (MFFD), which takes into account both node and inter-PM bandwidth optimization, is given in Algorithm 2. It strives to place VDs in the smallest-cost cluster with the largest capacity, in the largest DC in $\mathcal{F}_b$. The communication cost is defined as the number of switches or routers on the path [11]. PMs with the same cost $k$ form a $k$-cluster. There are three kinds of clusters in the topology in Figure 1. For example, PM1∼PM4 is a 1-cluster. PM9∼PM16 is a 3-cluster. PM5∼PM8 and PM9∼PM11 are both 5-clusters. Each time, we prefer the cluster with the smallest cost and the largest capacity, because a bigger cost means that more aggregate and core links are used. Overconsumption of these relatively scarce top layer links may lead to congestion and communication delay, so this tactic reduces the consumption of higher layer bandwidth. During placement, once a $k$-cluster is selected, another cluster with the same cost $k$ is selected with priority if the two clusters together constitute a cluster of the next higher cost. This preserves the effect of consolidation and saves links between clusters. For example, if PM1∼PM4 is the cluster with the biggest capacity among all 1-clusters, it is selected first, and then the PM5∼PM8 cluster is considered unconditionally because together they constitute a 2-cluster.
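The switch-count cost between two PMs in a three-layer tree can be sketched as below. The sketch assumes PMs are indexed from 0 in topology order and uses hypothetical defaults (4 PMs per access switch, 2 access switches per aggregate switch); the function name `switch_cost` is illustrative, and the actual cost matrix of Figure 1 may be laid out differently.

```python
def switch_cost(pm_a, pm_b, pms_per_access=4, access_per_agg=2):
    """Number of switches on the path between two PMs in a tree.

    Same PM: 0. Same access switch: 1. Same aggregate switch: 3
    (access, aggregate, access). Otherwise the path crosses the core: 5.
    """
    if pm_a == pm_b:
        return 0
    if pm_a // pms_per_access == pm_b // pms_per_access:
        return 1
    pms_per_agg = pms_per_access * access_per_agg
    if pm_a // pms_per_agg == pm_b // pms_per_agg:
        return 3
    return 5
```

Grouping PMs whose pairwise cost is at most a given value yields the clusters of that cost, which is the structure the placement heuristic walks from the cheapest cluster upward.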
The capacity of PM $j$ is defined as in [26]; that is, it is obtained by summing over all resource dimensions (21). If there is at least one VM in the PM, definition (22) is used instead, where $w_h$ is the normalization factor for dimension $h$ and $R_j^h$ is the $h$-dimension residual capacity of PM $j$. The reason is that if any dimension of a machine is used up, the PM cannot support any more VMs. The capacity of a cluster is defined as the sum of the capacities of the PMs in the cluster, and the capacity of a DC is defined similarly. The capacity of VM $i$ is similar to (21), except that the PM's $h$-dimension capacity is replaced by the VM's $h$-dimension requirement $r_i^h$. MFFD strives to place VDs in the smallest-cost cluster with the largest capacity, in the largest DC chosen from $\mathcal{F}_b$.
MFFD, SD, and LPC are each invoked $K$ times to produce $3K$ initial feasible solutions. This scheme produces a rather large initial population, and the three groups embody relatively good assignments for the three objectives, respectively. Thus, the initial parents are endowed with some optimal properties. In the later crossover and mutation, although an initial solution is replaced by any new solution that dominates it, the population size remains at least $3K$, so that the GA can converge faster.
Lines (2)-(21) of MFFD describe the placement scheme for one BC; we denote them as MFFDOneB. Unlike line (2), where a DC with a larger capacity is preferred, in MFFDOneB a DC that has already been used and has the smallest residual capacity has a higher priority, so as to take full advantage of the residual capacity and reduce the number of DCs used. Lines (4)-(21) of MFFD describe the placement scheme inside one DC; we denote them as MFFDOneDC. MFFDOneDC mainly works for the low level objective $f$, while SD, LPC, MFFD, and MFFDOneB work for $F_i$ ($i = 1, 2, 3$).
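For orientation, the classic first fit decreasing heuristic that MFFD builds on can be sketched as follows. This is plain one-dimensional FFD, not the modified algorithm: it ignores clusters, DC ordering, and multidimensional capacities. The names `first_fit_decreasing`, `vm_sizes`, and `pm_capacities` are illustrative.

```python
def first_fit_decreasing(vm_sizes, pm_capacities):
    """Classic one-dimensional FFD (a simplification of MFFD).

    Returns {vm_index: pm_index}, or None if some VM does not fit.
    """
    residual = list(pm_capacities)
    assignment = {}
    # Consider the biggest unassigned VM first.
    order = sorted(range(len(vm_sizes)), key=lambda i: vm_sizes[i], reverse=True)
    for vm in order:
        for pm, free in enumerate(residual):
            if vm_sizes[vm] <= free:  # first PM with enough room
                residual[pm] -= vm_sizes[vm]
                assignment[vm] = pm
                break
        else:
            return None  # overflow: no PM can host this VM
    return assignment
```

In the modified variant described in the text, the PM order would come from the smallest-cost, largest-capacity cluster rather than from a fixed list, and sizes would be the multidimensional capacities.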

Crossover Operator.
The crossover is applied segment by segment, and every segment can affect other segments if the BCs corresponding to these segments compete for the same DC. The crossover operator is depicted in Algorithm 3.

Algorithm 2: Modified first fit decreasing algorithm (MFFD).
Input: $B$: number of BCs; $\mathcal{F}_b$: feasible DC set for BC $b$, with $|\mathcal{F}_b| = D_b$; $C$: cost matrix of each DC as illustrated in Figure 1; $V_b$: VM set of BC $b$
Output: solution encoding $X$
(1) for $b = 1, \ldots, B$ do
(2)   Sort the DCs in $\mathcal{F}_b$ according to their capacity. A bigger DC is selected with higher probability; denote the sorted sequence as $\mathrm{DC}_1, \mathrm{DC}_2, \ldots, \mathrm{DC}_{D_b}$
(3)   for $k = 1, \ldots, D_b$ do
(4)     while ($|V_b| \ne 0$) and (there exists a PM in $\mathrm{DC}_k$ not yet searched) do
(5)       Find the smallest-cost PM cluster with the largest capacity. Suppose there are $m$ PMs in total in this cluster
(6)       Sort these PMs in nonincreasing order of capacity ((21) and (22)) as $\mathrm{PM}_1, \mathrm{PM}_2, \ldots, \mathrm{PM}_m$
(7)       for $j = 1, \ldots, m$ do
(8)         Select the biggest unassigned VM from $V_b$ and put it into $\mathrm{PM}_j$
(9)         $V_b \leftarrow V_b \setminus \{\text{VMs assigned to } \mathrm{PM}_j\}$
(10)        if $|V_b| = 0$ then
(11)          break
(12)        end if
(13)      end for
(14)    end while
(15)    if $|V_b| = 0$ then
(16)      break
(17)    end if
(18)  end for
(19)  if ($|V_b| \ne 0$) and (all DCs are searched) then
(20)    There is overflow, return FAILURE
(21)  end if
(22) end for
(23) Encode the solution as $X$ according to Section 4.1

Algorithm 3: Crossover operator.
Input: $X$, $Y$: two individuals
Output: $X'$, $Y'$: two individuals after crossover
(1) for $b = 1, \ldots, B$ do
(2)   $X(b)$ crossover with $Y(b)$ according to MLGGA [13], with the exception that when "competition" occurs the scheme described in the text is used
(3) end for

$X(b)$ denotes the segment corresponding to BC $b$ in encoding $X$. For each segment, the mechanism of the crossover operator in MLGGA is adopted. But to capture the multi-BC scenario, we make two exceptions.
First, the classic FFD used in MLGGA is replaced by MFFDOneB when placing the VMs of one BC, or by MFFDOneDC when placing VMs inside one DC. Second, a new technique is introduced to address the "competition" case in the multi-BC scenario. MLGGA tries to inherit the property of the parent and keep the VMs in the inserted group (PM or DC) unchanged. So it clears in advance the VMs that reside in the same group of the target chromosome but are not in the inserted group, and keeps the common VMs. For example, consider two chromosomes: $X_1$, in which DC $A$ contains VM 4 and VM 8, and $X_2$, in which DC $a$ contains VM 2 and VM 4. Here the same letter in different case represents the same PM or DC, so $A$ and $a$ indicate the same DC, and | marks a crossover point. For the target chromosome $X_1$, the crossover replaces the part between its crossover points by the group of $X_2$ that contains $a$. Now the offspring of $X_1$ contains the same DC twice, as $A$ and as $a$. So the VMs in $A$ are cleared so that the VMs in $a$ are kept unchanged; that is, VM 8 in $A$ is reassigned by FFD and VM 4 is preserved. But in the multi-BC scenario, this DC may contain many VMs of other BCs, in which case it does not have enough residual capacity to host these VMs. This is the "competition" case, where BCs compete for the same PM or DC. We propose a competition resolution scheme that allows one BC to drive out the VMs of other BCs as follows. The resident VMs of the other BCs and of the driving BC in this DC are cleared first, which ensures that the VMs of the inserted group are kept unchanged and the group property of the parent is inherited. Then the cleared VMs are reassigned in this DC by MFFDOneDC in the following BC order: first the driving BC and then another randomly selected one. The procedure is repeated until all BCs are processed or the DC is full. Finally, the overflow VMs are reassigned by MFFDOneB to the feasible DCs of the BC being served. Crossover can reduce delay as well as PM and network cost.

Algorithm 4: Mutation operator.
(1) for $b = 1, \ldots, B$ do
(2)   Select a DC $d$ in the third part of segment $S_b$ and change it to one DC in $\mathcal{F}_b$, preferring a DC with a lower electricity price with higher probability; denote it as $d'$
(3)   if ($d'$ is used by $b$) or ($d'$ is used by another BC (such as $b'$)) or ($d'$ is shared by BC $b$ and BC $b'$) then
(4)     Clear $d'$. Assign the VMs of BC $b$ in $d$ to $d'$ by MFFDOneDC with priority
(5)     The overflow VMs of $b$, if any, are assigned to other DCs in nonincreasing order of electricity price in $\mathcal{F}_b$. Inside each DC, the assignment is completed by MFFDOneDC
(6)     The VMs of $b$ originally in $d'$ are assigned to $d'$ by MFFDOneDC
(7)     The overflow VMs of $b'$, if any, are preferentially assigned to $d'$, then to other DCs in nonincreasing order of electricity price in $\mathcal{F}_{b'}$. Inside each DC, the assignment is completed by MFFDOneDC
(8)   else
(9)     Place the VMs in $d'$ by MFFDOneDC
(10)  end if
(11)  if all VMs are assigned successfully then
(12)    Clear the VMs of BC $b$ in DC $d$
(13)  else
(14)    Keep the original assignment before mutation unchanged
(15)  end if
(16) end for

Mutation Operator.
The mutation happens in the third part of each segment, that is, among the DCs in $\mathcal{F}_b$, and thus leads to the replacement of VMs in the PMs belonging to the mutated DC. There are three possible scenarios for the mutation. The first is that the DC mutates to an idle candidate. The second is that it mutates to a DC that is used by the same BC. The last is that the target DC has been used by other BCs, so this DC/PM is competed for by two BCs. The latter two scenarios may coexist. Because the newly added VMs may violate the capacity or exceed the upper limit of the BC, some VMs may overflow; that is, the resident VMs need to be reassigned. This facilitates the changeover of DCs/PMs between two BCs so that resources can be balanced between them. See Algorithm 4 for details. In line (7), the overflow VMs of BC $b'$ are preferentially assigned to $d'$ because $d'$ may still not be full after the VMs of BC $b$ are assigned.
Power cost optimization is mainly fulfilled by mutation operator in that the electricity price differs at DC level.

Simulation Results
The DC network is simulated in a 1400 * 1400 grid in the $x$-$y$ plane. Generally, DCs exhibit clustering: 80% of the DCs follow a normal distribution and 20% are selected uniformly from the grid. The distance between DCs is measured in this plane, and the number of PMs inside each DC is drawn at random. Configurations of PMs are borrowed from the IBM System x M5 server and System x3300 M4 server [27]. Four classes of PMs equipped with a 1 Gbps Ethernet card are simulated. Considering the proportional configuration of PMs, we simply give each class a price instead of giving every resource a unique price. For the resource requirements of VMs, we adopt the four configurations of the Amazon m3 series [28] (for consistency with PMs, GiB is replaced by GB). The m3 series is designed for general purpose use and is very suitable for VDs. Table 1 lists the details. For each DC, we simulate a tree-like topology (Figure 1). Each core switch administrates 15 aggregate switches. Each aggregate switch administrates 2 access switches. 5 PMs are connected to the same access switch. According to VMware [29], a virtual machine cannot have more vCPUs than the number of logical cores of the host. The number of logical cores equals the number of physical cores if hyperthreading is disabled, or at most twice that number if hyperthreading is enabled. So we suppose there is a one-to-one relation between vCPUs and physical cores.
For the multi-BC scenario, the number of VDs each BC requires is chosen uniformly between 20 and 300. Traffic between VDs lies in the range 0-1 Mbps [9]. All VDs in the same BC communicate with each other. Among VDs belonging to different BCs, only 10% communicate. The bandwidth prices $q_1 = 0.01$, $q_2 = 0.03$, and $q_3 = 0.05$ are attached to the access, aggregate, and core layers, respectively. The long distance inter-DC bandwidth price is $\gamma = 0.1$.
The electricity price pool is taken from the EIA data of August 2015 [30]. Each simulated DC is assigned a random price selected from the pool. The transmission delay is measured by distance; here we use 300 as the threshold. We assume the queueing delay is the same for all DCs and therefore omit it. We set the idle or standby power consumption $P_j^{idle}$ to 60% of the peak power [21].
The initial solution size of STLGGA is $3K$ with $K = 20$. Our simulation is implemented in Matlab. All numerical experiments stop after 30 thousand iterations. On average, a run takes about 105 seconds, a little slower than MLGGA, which takes about 81 seconds as claimed in [13]. This is because STLGGA needs to deal with the competition scenario. In all the simulations, the numerical results are the average over all the Pareto solutions of the multiobjective programming.
We use a recently proposed unified algorithm, MLGGA [13], as the baseline. Because it can only place the VMs of one BC at a time, the objective $F_3$ cannot be calculated during its run. Therefore, $F_1$ and $F_2$ are used to calculate the fitness values. MLGGA is invoked for a randomly selected BC $b$, within $\mathcal{F}_b$, to optimize $F_1$ and $F_2$. A solution is chosen randomly from the Pareto solutions as the assignment scheme of $b$. Then the other BCs are traversed on the basis of the resources remaining after the deployment of the VDs of previous BCs, until the VMs of all BCs are assigned. $F_3$ can then be calculated from the results. The simulation results are detailed in Section 5.1. We also compare the two algorithms for the single BC case in Section 5.2, where each algorithm pursues the three objectives defined in MOBLP. Both scenarios validate the effectiveness of STLGGA.

Simulation Results for Multi-BC Scenario.
To investigate the scale efficiency of STLGGA, we vary the number of BCs from 5 to 15. Figure 3 shows the results, with the three objectives reported in Figures 3(a)∼3(c). As the number of BCs increases, delay, power, and resources all increase for both algorithms. STLGGA achieves a 13% shorter delay on average, which also means that communication between users served by VDs deployed in different DCs becomes much faster. With STLGGA, power and resources are saved by 21% and 6% on average, respectively.
Resource efficiency is detailed in Figures 3(d)∼3(g). STLGGA uses fewer PMs. The average reduction is about 27%, that is, 118 PMs (Figure 3(d)). Because of the heterogeneity of PMs, we also compare the PM resources consumed. STLGGA leads to 1%∼19% resource cost saving; on average, about 9% of the cost is saved (Figure 3(e)). Fewer PMs mean that less power is needed to keep PMs active. Therefore, power efficiency is improved and the total power is reduced, which further supports Figure 3(b). On average, STLGGA also reduces the expensive inter-DC traffic by 13% (Figure 3(f)). Because the total traffic between all VMs is fixed, STLGGA saves expensive long distance inter-DC bandwidth by converting inter-DC traffic into intra-DC traffic, at the cost of relatively cheaper intra-DC bandwidth in the access, aggregate, and core layers. The traffic across the three layers produced by STLGGA is higher than that produced by MLGGA. But the cost of the total bandwidth required inside DCs under STLGGA is much lower than under MLGGA, with an average reduction of about 13%. This is consistent with our purpose (objective $f$) of optimizing the bandwidth cost inside the DC (Figure 3(g)).
Suppose 5 BCs apply for VDs. We study VD placement with the number of DCs varied from 4 to 60. The capacity of each DC and the number of VDs requested by each BC remain as before. Figure 4 shows the three objectives. On average, delay is shortened by 5%, and power and resource cost are reduced by 10% and 5%, respectively. The detailed resource comparison results show the same tendency as Figures 3(d)∼3(f) and are omitted here.
Naturally, solution quality is expected to improve as the number of DCs increases, because there are more candidates. In practice, however, the Pareto solutions balance the three objectives, and the curves are not smooth. The delay shows an uptrend when the number of DCs increases from 35 to 50 (Figure 4(a)), accompanied by a decline in power cost (Figure 4(b)) and resource cost (Figure 4(c)). Conversely, when the delay decreases as the number of DCs increases from 55 to 60, the latter two objectives go upward. This is due to the random number of PMs in each DC, the location diversity, and the random power assignment. It is also observed that STLGGA still performs much better than MLGGA on the latter two objectives, at the cost of a slightly larger delay when 60 DCs are searched. In general, as the number of DCs increases, all three objectives show a downtrend.

Simulation Results for Single BC Scenario.
We also examine VD placement when there is just one BC. The number of VDs applied for varies from 500 to 1000. This time, MLGGA is invoked to optimize all three objectives simultaneously within the feasible DCs, since the third objective can now be calculated. The average results of the Pareto solutions for both algorithms are reported in Figure 5. Similar to the multi-BC results, STLGGA outperforms MLGGA on the three objectives, as well as on PM number, PM resources, inter-DC traffic, and total required bandwidth cost.
This further validates that STLGGA works well not only for multiple BCs but also for a single BC.
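As an illustration of how averaged Pareto results and percentage savings of this kind can be computed, the following sketch averages each objective over a Pareto set and derives the relative reduction against a baseline. The function names and the tuple-of-objectives representation are assumptions made for this example; it does not reproduce the paper's data.

```python
def mean_objectives(pareto):
    """Average each objective over a non-empty list of objective tuples."""
    n = len(pareto)
    width = len(pareto[0])
    return tuple(sum(sol[i] for sol in pareto) / n for i in range(width))


def relative_reduction(baseline, proposed):
    """Per-objective saving of `proposed` relative to `baseline` (0.1 means 10%)."""
    return tuple((b - p) / b for b, p in zip(baseline, proposed))
```

For example, `relative_reduction((10.0, 200.0), (9.0, 150.0))` gives a 10% and a 25% reduction for the two objectives.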

Conclusion
Considering the bilevel resource provision for the deployment of virtual desktops of multiple BCs in a distributed cloud, this paper explores service delay, power efficiency, and cost optimization. The problem is formulated as a multiobjective bilevel program that captures the noncooperative relation between the DC network level and the server level. The formulation thus facilitates the optimization of node and bandwidth costs at both levels without violating the delay threshold, while striving to further minimize the maximum delay of each BC. Because the problem is NP-hard, a segmented two-level group GA is proposed. Novel coding, initial population generation, and operator schemes are tailored to the problem. The effectiveness of the algorithm is validated by extensive simulations, in which it outperforms the baseline algorithm in both the multi-BC and single-BC scenarios.
Although we focus on VD deployment, it is just one application of the proposed formulation and algorithm. They can also be applied to the placement of VMs that support any location-sensitive or delay-sensitive service [31] in distributed clouds, such as VOD [32] and big data [33].
In this paper, we consider only the different electricity prices of DCs in energy cost optimization, which cannot reflect the utilization of renewable energy. Because renewable energy, such as solar, wind, and tidal energy, varies with time and region, VDs can be migrated to exploit it more efficiently within the delay threshold [34,35]. In future work, we aim to utilize more renewable energy by leveraging DCs distributed widely across the globe, so that not only economic but also social benefits can be achieved.

Figure 1: Virtual desktops in worldwide distributed cloud computing.

Input: : input individual; F  : feasible DC set for BC  and |F  | =  
Output:   : individual after mutation
(1) for  = 1, . . .,   do
(2)   Randomly select one DC  in the third part of   ()

Figure 4: The efficiency comparison with the number of DCs increasing.
Objective and Constraints. The low level is the server level. It aims to select PMs in the DC determined by the high level (the DC level, as illustrated in Figure 1). We will place   VMs in DC . Note that VMs serving different BCs may be placed in the same DC. The number of VMs is fixed; that is, ∑  =1   = ∑  =1   . Each VM  (  ) requires  kinds of  ℎ and   is  ℎ and  ℎℎ equals 0. Herein we suppose  ℎ =  ℎ because a bidirectional link normally has equal bandwidth in each direction. If  ℎ ≠  ℎ, we select the larger one as the traffic.
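The traffic convention stated above (zero traffic from a VM to itself, and the larger of the two directions used when they differ) can be sketched as a small preprocessing step. The nested-list matrix `traffic` and the function name are assumptions made for illustration.

```python
def symmetrize_traffic(traffic):
    """Return a symmetric traffic matrix with a zero diagonal.

    traffic[i][j] is the (possibly asymmetric) traffic from VM i to VM j.
    For each pair, the larger direction is used for both entries.
    """
    n = len(traffic)
    sym = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            t = max(traffic[i][j], traffic[j][i])
            sym[i][j] = sym[j][i] = t   # the diagonal stays at zero
    return sym
```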

Input: number of BCs; F  : feasible DC set for BC  and |F  | =   ;   : VM set for BC 
Output: VM placement solution encoding 
(1) for  = 1, ...,  do
(2)   Sort the DCs in F  according to   . A closer DC is selected with higher probability; denote the sorted sequence as  1 ,  2 , ...,   
(3)   for  = 1, ...,   do
(4)     Randomly select a non-assigned VM from   and put it into a random non-full PM in   until   is full or the upper limit is reached
(5)
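The recoverable steps of this initial-placement procedure can be sketched for a single BC as follows. This is a hedged illustration, not the authors' code: the inverse-distance weighting used to favor closer DCs, the slot-count model of PM capacity, and all function and parameter names are assumptions.

```python
import random

def biased_dc_order(distances, rng):
    """Order DCs randomly so that closer DCs are more likely to come first.

    distances: dict dc -> positive distance from the BC.
    Assumed weighting: selection probability proportional to 1 / distance.
    """
    pool = dict(distances)
    order = []
    while pool:
        dcs = list(pool)
        weights = [1.0 / pool[d] for d in dcs]
        pick = rng.choices(dcs, weights=weights, k=1)[0]
        order.append(pick)
        del pool[pick]
    return order


def initial_placement(vms, distances, pm_slots, dc_limit, rng=None):
    """Assign the VMs of one BC to random non-full PMs, DC by DC.

    pm_slots: dict dc -> dict pm -> free VM slots (mutated in place).
    dc_limit: upper limit of this BC's VMs in any one DC.
    Returns dict vm -> (dc, pm); VMs that do not fit stay unassigned.
    """
    rng = rng or random.Random()
    unassigned = list(vms)
    rng.shuffle(unassigned)             # VMs are taken in random order
    placement = {}
    for dc in biased_dc_order(distances, rng):
        placed_here = 0
        while unassigned and placed_here < dc_limit:
            open_pms = [pm for pm, free in pm_slots[dc].items() if free > 0]
            if not open_pms:
                break                   # this DC is full
            pm = rng.choice(open_pms)   # random non-full PM
            vm = unassigned.pop()
            pm_slots[dc][pm] -= 1
            placement[vm] = (dc, pm)
            placed_here += 1
    return placement
```

Randomizing both the DC order and the PM choice keeps the initial population diverse while still biasing it toward low-delay placements.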

Table 1: PMs resource configurations and VMs resource requirements.

2. Applying Algorithm 3 to  1 ,  2. Produce two offspring , 
(5) Mutation. Applying Algorithm 4 to ,  to produce   ,  
(6) Update of C. For each  in C,
      If (  ) ≤ (), then  =  
      If (  ) ≤ (), then  =  
    The size of C is kept not less than 3 * 
(7) Update of . Remove from  all the points  if () is dominated by (  ) and (  )
    Add (  ) to  if there is no point  in  such that () dominates (  )
    Add (  ) to  if there is no point  in  such that () dominates (  )
(8) end while
Algorithm 5: The unified genetic algorithm (STLGGA).
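The archive update in step (7) of Algorithm 5 relies on Pareto dominance. A minimal sketch, assuming all objectives are minimized and each solution is represented by a tuple of objective values, is given below. The function names are hypothetical, and skipping exact duplicates is a choice made for this example rather than something stated in the algorithm.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive, candidates):
    """Insert each non-dominated candidate and drop archive points it dominates."""
    archive = list(archive)
    for cand in candidates:
        if cand in archive or any(dominates(p, cand) for p in archive):
            continue                    # duplicate or dominated: not added
        archive = [p for p in archive if not dominates(cand, p)]
        archive.append(cand)
    return archive
```

For instance, adding `(1, 4, 5)` to an archive holding `(1, 5, 5)` and `(5, 1, 5)` replaces the first point and keeps the second, since neither of the remaining points dominates the other.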