Task assignment in grid computing, where both processing and bandwidth constraints at multiple heterogeneous devices need to be considered, is a challenging problem. Targeting the optimization of multiple objectives makes it even more challenging. This paper presents a task assignment strategy based on genetic algorithms in which multiple and conflicting objectives are simultaneously optimized. Specifically, we maximize task execution quality while minimizing energy and bandwidth consumption. Moreover, in our video processing scenario, we consider transcoding to lower spatial/temporal resolutions to trade off between video quality, processing, and bandwidth demands. The task execution quality is then determined by the number of successfully processed streams and the spatial-temporal resolution at which they are processed. The results show that the proposed algorithm offers a range of Pareto optimal solutions that outperform all other reference strategies.
Nowadays, multimedia applications such as multicamera surveillance or multipoint videoconferencing are increasingly demanding in both processing power and bandwidth. In addition, there is a tendency towards thin client applications where the processing capacities of the client device are reduced and the tasks are migrated to more powerful devices in the network.
In this respect, grid computing can integrate and make use of these heterogeneous computing resources which are connected through networks, overcoming the limited processing capabilities at a client’s device.
In the context of distributed media processing we can think of scenarios such as video control rooms where multiple video streams are processed and simultaneously displayed. One way to downscale the processing and bandwidth requirements at the displaying device is by transcoding the video streams at the servers to lower temporal or spatial resolutions. This is done, however, at the cost of a degraded perceived video quality and an increased processing cost at the server. Therefore, in grid computing we may need to optimize and trade off multiple objectives, targeting for instance quality maximization of the stream execution and minimization of the energy consumption on the client/servers simultaneously. In this respect, implementing a suitable strategy for task assignment/scheduling becomes crucial for achieving a good performance in grid computing. This subject has been thoroughly studied in literature, and various heuristic approaches have been widely used for scheduling. In particular, Genetic Algorithms (GAs) have received much attention as robust stochastic search algorithms for various optimization problems. In this context, works such as [
In [
Note that the presented works only consider single objective optimization. It is in works such as [
However, in none of these works [
In our approach, we use GAs to target multiple objectives for task assignment in grid computing and we consider bandwidth availability between nodes. Moreover, in contrast to the related work presented, we introduce an extra dimension to the task assignment problem by considering the downscaling of the video streams to lower spatial/temporal resolution. This offers a tradeoff between bandwidth and processing constraints on one hand and perceived video quality on the other hand. By doing this, the effective system capacity to process tasks is increased while a graceful degradation of the video stream quality is allowed. Additionally, we target multiple objectives such as task quality maximization, client’s energy minimization, and minimization of the bandwidth usage.
The rest of the paper is structured as follows. Section
In the context of distributed video processing, we consider a scenario such as that of a video control room. Several video streams are sent towards the client device, where the content is visualized, while the required video processing can be distributed between the client and other processing nodes such as servers.
We assume all processing nodes to be heterogeneous, with different amounts of processing resources such as CPUs and GPUs. In addition, we assume that the client’s device has more limited processing capacities than the server nodes. Concretely, we consider 4 CPUs and 1 GPU at each server node, but only 2 CPUs at the client node. We assume moreover that multiple codec implementations for these different processor types are available. To overcome the limited processing at the client node, we perform distributed processing over other nodes in the network. In this case, the decoding task is executed at a server, and the resulting output (raw video) is transmitted to the client’s device. Note that this greatly increases the bandwidth requirements, which must fit within the maximum available bandwidth towards the client; we assume this to be 1 Gbps, shared among all server nodes transmitting to the client node. Therefore, to fit both processing and bandwidth requirements, one possibility is to transcode (decode and reencode at a lower temporal or spatial resolution) the video streams at the server’s side. This lowers both the bandwidth and the decoding processing requirements at the end device, at the cost of a reduced perceived quality and increased server processing.
The following section describes the task assignment strategies used in the scenario described.
An efficient task assignment strategy is a key element in the context of distributed grid computing. In this section, we describe the assignment strategies that we implement for comparison with our evolutionary-based approach.
The stream processing tasks are assigned in turn to the different available processing elements, that is, the client device and the server nodes.
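As an illustration, round-robin assignment can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name and the node labels (`client`, `server1`, `server2`) are hypothetical, and no processing or bandwidth constraints are checked.

```python
from typing import List


def round_robin_assign(num_streams: int, nodes: List[str]) -> List[str]:
    """Assign stream i to node i mod len(nodes), cycling through all nodes."""
    if not nodes:
        raise ValueError("at least one processing node is required")
    return [nodes[i % len(nodes)] for i in range(num_streams)]


# Hypothetical node labels: one client device and two server nodes.
assignment = round_robin_assign(5, ["client", "server1", "server2"])
print(assignment)
```

Because the strategy ignores the capacity of each node, some of the assigned streams may fail once the client's limited processing power is exceeded.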
This is a well-known heuristic [
We assign all video streams to be spatially transcoded at the server nodes. This lowers the processing requirements for decoding at the client devices and also reduces the bandwidth usage. However, this happens at the cost of a reduced quality of the video streams. As transcoding is an intensive task (decoding plus encoding), the transcoding of the streams is evenly distributed among the available servers (by means of round robin) to avoid overloading any single server.
In addition to the presented strategies, we implement a strategy that targets the maximization of the quality of the stream assignment. We describe this strategy next for the case of 1 server and 1 client node, where
We consider that the assignment and execution, stream assigned to be decoded at the client at original temporal and spatial resolution. stream assigned to be decoded at the server at original temporal and spatial resolution. stream transcoded to lower temporal resolution at the server. stream transcoded to lower spatial resolution at the server. stream transcoded to both lower temporal and spatial resolution at the server.
This way, our task assignment consists of a set of
We want to find a task assignment solution whose bandwidth and processing demands at client and server fit within the bandwidth and processing constraints:
In Step 1, the algorithm assigns as many stream decoding tasks as possible to execute on the client device; this number of tasks is constrained by the processing power at the device.
In Step 2, the remaining tasks, exceeding the processing power at the client device, are assigned for processing at the server.
Then, we check if the current assignment meets bandwidth and processing constraints. While either bandwidth or processing constraints at client or server are not met, the algorithm will gradually transcode video tasks to lower temporal or spatial resolution at the server (done in Step 3) or will migrate some of the decoding tasks from the client device to the server (done in Step 4). This process continues till the assignment fits the system bandwidth and processing constraints.
In Step 3 we proceed as follows. Find those streams which are currently assigned at original temporal and spatial resolution to the server. If there are no streams available at full temporal resolution, then we take a stream at lowered temporal resolution. If all streams have been spatially transcoded, then we pick one of them (at highest bandwidth) and transcode it both temporally and spatially.
Note that at this point (Step 3), we are trying to find those stream tasks (
This procedure is repeated till the bandwidth constraint is met.
Finally, in Step 4, if the processing constraints at the client are exceeded we migrate one client task to the server side. If at the server’s side the processing constraints are not met and all streams have been spatially and temporally transcoded, the assignment loop is stopped. It is not possible to downscale the stream tasks further, and therefore we cannot find an assignment that satisfies all constraints while processing all streams.
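The four steps above can be sketched in Python for the case of 1 client and 1 server. This is a simplified illustration under several assumptions that are not taken from the paper: all streams are SD, the client and server load units are hypothetical, and all identifiers are ours. Only the bandwidth values come from the SD cost table of the paper. Since all streams share one resolution here, the "highest bandwidth" choice in Step 3 reduces to picking the first candidate.

```python
from typing import Dict, List, Optional

# Execution modes, ordered from highest to lowest quality (labels are ours).
CLIENT, SERVER, TEMP, SPAT, BOTH = "client", "server", "temp", "spat", "temp+spat"

# Bandwidth per stream in Mbps: SD values from the paper's cost table.
BANDWIDTH: Dict[str, float] = {CLIENT: 20, SERVER: 146, TEMP: 10, SPAT: 7, BOTH: 3.5}
# Client load per stream: ASSUMED proportional to the client decoding effort.
CLIENT_LOAD: Dict[str, float] = {CLIENT: 1.0, SERVER: 0.0, TEMP: 0.5, SPAT: 0.25, BOTH: 0.125}
# Server load per stream: ASSUMED units (decoding = 1, any transcoding = 2).
SERVER_LOAD: Dict[str, float] = {CLIENT: 0.0, SERVER: 1.0, TEMP: 2.0, SPAT: 2.0, BOTH: 2.0}
# Step 3 downgrade chain for streams handled by the server.
NEXT_LEVEL = {SERVER: TEMP, TEMP: SPAT, SPAT: BOTH}


def total(load: Dict[str, float], modes: List[str]) -> float:
    return sum(load[m] for m in modes)


def downgrade_one(modes: List[str]) -> bool:
    """Step 3: lower the resolution of one server stream, full resolution first."""
    for level in (SERVER, TEMP, SPAT):
        if level in modes:
            modes[modes.index(level)] = NEXT_LEVEL[level]
            return True
    return False


def migrate_one(modes: List[str]) -> bool:
    """Step 4: move one decoding task from the client to the server."""
    if CLIENT in modes:
        modes[modes.index(CLIENT)] = SERVER
        return True
    return False


def max_quality_assign(n: int, client_cap: float, server_cap: float,
                       bw_cap: float) -> Optional[List[str]]:
    # Steps 1 and 2: fill the client first, send the remaining streams to the server.
    n_client = min(n, int(client_cap // CLIENT_LOAD[CLIENT]))
    modes = [CLIENT] * n_client + [SERVER] * (n - n_client)
    while True:
        bw_ok = total(BANDWIDTH, modes) <= bw_cap
        client_ok = total(CLIENT_LOAD, modes) <= client_cap
        server_ok = total(SERVER_LOAD, modes) <= server_cap
        if bw_ok and client_ok and server_ok:
            return modes
        if not bw_ok:
            progressed = downgrade_one(modes) or migrate_one(modes)
        elif not client_ok:
            progressed = migrate_one(modes) or downgrade_one(modes)
        else:
            progressed = downgrade_one(modes)
        if not progressed:
            return None  # nothing left to downscale or migrate: no feasible assignment


plan = max_quality_assign(n=10, client_cap=2.0, server_cap=20.0, bw_cap=200.0)
print(plan)
```

Every iteration moves one stream a step further along the chain client, server, temporal, spatial, temporal and spatial, so the loop always terminates, either with a feasible assignment or with `None` when no further downscaling is possible.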
If the task assignment exceeds any of the system constraints in (
Related to this, we can attach to each assignment solution a corresponding cost in terms of end video quality, bandwidth usage, and energy consumption. This cost is determined by how many stream tasks are successfully completed and how (on which device and at what spatial-temporal resolution) they are executed.
Therefore, for a specific stream assignment solution we first need to estimate which stream processing tasks can be successfully completed and which will fail due to not meeting current processing and bandwidth constraints. Then, depending on the specific execution of each individual stream processing, we can attach a cost, in terms of quality, consumed bandwidth, and energy at the client’s side, as defined in Table
Definition of task execution costs (SD resolution).
Stream Execution | Quality | Bandwidth | Energy |
---|---|---|---|
Failed execution | 0 | 0 Mbps | 0 |
Decoding at client | 1 | 20 Mbps | 1 |
Decoding at server | 1 | 146 Mbps | 0 |
Temporal transcoding | 0.9 | 10 Mbps | 0.5 |
Spatial transcoding | 0.8 | 7 Mbps | 0.25 |
Temporal and Spatial transcoding | 0.7 | 3.5 Mbps | 0.12 |
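The per-stream costs in the table above can be encoded as a simple lookup and summed over an assignment. The following Python sketch is illustrative only: the mode labels and function name are ours, the values are the SD entries of the table, and the example assignment is made up.

```python
from typing import Dict, List, Tuple

# (quality, bandwidth in Mbps, client energy) per execution mode, SD values.
COSTS: Dict[str, Tuple[float, float, float]] = {
    "failed": (0.0, 0.0, 0.0),
    "decode_client": (1.0, 20.0, 1.0),
    "decode_server": (1.0, 146.0, 0.0),
    "temporal": (0.9, 10.0, 0.5),
    "spatial": (0.8, 7.0, 0.25),
    "temporal_spatial": (0.7, 3.5, 0.12),
}


def total_cost(modes: List[str]) -> Tuple[float, float, float]:
    """Sum quality, bandwidth, and client energy over all streams."""
    quality = sum(COSTS[m][0] for m in modes)
    bandwidth = sum(COSTS[m][1] for m in modes)
    energy = sum(COSTS[m][2] for m in modes)
    return quality, bandwidth, energy


# Made-up assignment of four SD streams, one per successful execution mode.
example = ["decode_client", "decode_server", "temporal", "spatial"]
quality, bandwidth, energy = total_cost(example)
print(f"quality={quality:.2f}, bandwidth={bandwidth:.1f} Mbps, energy={energy:.2f}")
```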
Note that the data in Table
A stream processing task fails when it does not fit within the available processing or bandwidth resources. For instance, the node to which the stream is assigned may not have sufficient processing resources, or, even if the processing is completed at the server, the available bandwidth could be insufficient to deliver the server’s output to the client, causing the stream processing to fail.
We assign successfully processed streams a quality value of 1 when the content is displayed at the client at its original temporal and spatial resolution. If the video stream is downscaled to a lower temporal/spatial resolution in order to fit bandwidth or processing constraints, the perceived video quality will be slightly degraded, and therefore we assign a lower quality value. As a result, when maximizing stream quality, assignment solutions that keep the original spatial and temporal resolution of the streams are preferred. Note that the quality value of any stream at its original resolution (CIF, SD, or HD) is identical; we consider the quality degraded only under transcoding. That is, an HD stream spatially transcoded (to SD) is assigned a quality of 0.8 (distortion of 0.2), while a stream at original SD resolution is assigned the maximum quality of 1 (0 distortion).
The bandwidth cost per stream also depends on how the stream processing is performed. If decoding is performed at the server’s side, the stream is transmitted raw to the client, which greatly increases the bandwidth requirements. In contrast, if the stream is transcoded at a server to a lower spatial or temporal resolution, the bandwidth requirements are reduced. For the sake of simplicity, we assume the same bandwidth cost for all video streams with the same spatial-temporal resolution. In addition, we consider that reducing the temporal resolution from 30 frames per second to 15 reduces the bandwidth by approximately half. Similarly, we assume that reducing the spatial resolution to the next lower resolution reduces the bandwidth to approximately one-third of that at the original resolution.
Finally, in terms of energy/processing cost at the client’s device, we assume that the energy cost is negligible when the video decoding task is executed on a server, and the raw output video stream is merely transmitted to the client device for display. When the decoding task is executed at the client, the corresponding energy cost is dependent on the temporal and spatial resolution of the decoded stream. We assume that decoding a video sequence at 15 fps requires approximately half of the processing/energy than decoding the same sequence at 30 fps. In a similar way, when the spatial resolution is lowered, for example from SD to CIF, we can roughly assume 1/4 of the decoding energy costs. Finally, the combination of lowering temporal and spatial resolution corresponds to a decoding cost of 1
To obtain the total quality TQ, bandwidth TBW
The heuristics and strategies presented in the previous section target at most a single optimization objective. However, in practice we may want to optimize multiple objectives simultaneously. For instance, we may need to maximize the quality of the video streams while minimizing the bandwidth usage and the energy cost at the client. This multiobjective optimization is challenging, especially when multiple heterogeneous nodes and multiple ways of processing the streams (decoding, transcoding) are considered. To achieve this, we rely on genetic algorithms and use the concept of Pareto fronts of solutions. This allows us to obtain a set of Pareto optimal assignment solutions from which we can choose the solution that best meets the constraints or our preferences towards a certain objective. In addition, a genetic algorithm is a flexible tool where the target objective can be easily modified. The remainder of this section describes how the genetic algorithm is implemented.
A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. In a genetic algorithm, a population of strings (called chromosomes), which encode candidate solutions (called individuals of the population) to an optimization problem, evolves toward better solutions. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are selected from the current population (based on their fitness), and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Generally, each generation of solutions improves the quality of its individuals. The algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached. The structure of our genetic algorithm can be summarized as follows.
Initialize the population of chromosomes.
Evaluate each chromosome with the
Random
Elitist
Repeat Steps
Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. In our case, we use a decimal representation. Each possible stream assignment solution is represented as a chromosome, which is composed of several genes. The length of the chromosome is equal to the number of streams that need to be scheduled in the system. Each of the genes in the chromosome represents the node that is going to process the stream and how it is going to be processed. Table
Description of genes.
Gene value | Meaning on task execution |
---|---|
“1” | Decoded at client device S0 |
“2” | Decoded at S1 and transmitted to S0 |
“3” | Transcoded at S1 to lower temporal resolution |
“4” | Transcoded at S1 to lower spatial resolution |
“5” | Transcoded at S1 to lower spatial-temporal resolution |
“6” | Decoded at S2 and transmitted to S0 |
“7” | Transcoded at S2 to lower temporal resolution |
“8” | Transcoded at S2 to lower spatial resolution |
“9” | Transcoded at S2 to lower spatial-temporal resolution |
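The gene table above maps directly onto a lookup structure. The following Python sketch decodes a chromosome into (node, action) pairs; the dictionary and function names are ours, and the example chromosome is made up for illustration (it is not the one shown in the paper's figure).

```python
from typing import Dict, List, Tuple

# Gene value -> (processing node, action), following the gene table.
GENE_TABLE: Dict[int, Tuple[str, str]] = {
    1: ("S0", "decode"),
    2: ("S1", "decode"),
    3: ("S1", "temporal"),
    4: ("S1", "spatial"),
    5: ("S1", "spatial-temporal"),
    6: ("S2", "decode"),
    7: ("S2", "temporal"),
    8: ("S2", "spatial"),
    9: ("S2", "spatial-temporal"),
}


def decode_chromosome(chromosome: List[int]) -> List[Tuple[str, str]]:
    """Translate each gene into the node and processing mode it encodes."""
    return [GENE_TABLE[gene] for gene in chromosome]


# Made-up chromosome for five streams (one gene per stream).
example = [2, 2, 1, 4, 9]
for stream, (node, action) in enumerate(decode_chromosome(example), start=1):
    print(f"stream {stream}: {action} at {node}")
```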
Figure
Example of chromosome.
We now detail how to compute the cost of the assignment solution in Figure
For streams 1 and 2 decoded at the server at full resolution and transmitted raw to the client:
In general terms, we initialize the population of assignment solutions by random generation. We also include in the initial population solutions that distribute the task processing evenly among the existing processing elements, as well as solutions that transcode all processing tasks (as this may facilitate convergence to suitable assignments in high-load scenarios). By doing so, we make sure that certain potentially useful features are present in the population.
With respect to the population size, its optimal value is highly dependent on the scenario dimensions. In our case, we experimented with populations of size 10 to 40 and determined experimentally that a population of size 30 was suitable for the considered scenarios.
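A possible way to build such an initial population is sketched below in Python. The sketch assumes the gene encoding with values 1 to 9 (one client S0 and two servers S1, S2) and seeds the population with one evenly distributed solution and one transcode-all solution; the exact seeds used in the paper are not specified, so these two are assumptions, as are all identifiers.

```python
import random
from typing import List

GENES = range(1, 10)      # gene values 1..9 of the gene encoding
DECODE_GENES = [1, 2, 6]  # decode at S0, S1, S2
SPATIAL_GENES = [4, 8]    # spatial transcoding at S1, S2


def init_population(num_streams: int, pop_size: int,
                    rng: random.Random) -> List[List[int]]:
    """Two seeded chromosomes followed by randomly generated ones."""
    seeds = [
        # Seed 1: decoding tasks spread evenly over all nodes.
        [DECODE_GENES[i % len(DECODE_GENES)] for i in range(num_streams)],
        # Seed 2: every stream spatially transcoded, alternating servers.
        [SPATIAL_GENES[i % len(SPATIAL_GENES)] for i in range(num_streams)],
    ]
    n_random = max(0, pop_size - len(seeds))
    randoms = [[rng.choice(GENES) for _ in range(num_streams)]
               for _ in range(n_random)]
    return (seeds + randoms)[:pop_size]


population = init_population(num_streams=5, pop_size=30, rng=random.Random(0))
print(len(population), population[:2])
```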
In addition, in a dynamic scenario where the number of tasks to be processed may be varying over time, we can improve convergence by reusing previously found solutions as part of the initial population for a new scenario. For instance, if the stream tasks to be processed increase from
The goal of the fitness function is to evaluate how good an assignment solution is with respect to the defined target objectives. If we consider the optimization of a single objective, for instance, maximization of the video quality, we can define the fitness function as the quality value of every assignment:
If we are considering multiple objectives such as video quality, bandwidth usage, and energy consumption at the client device, a particular assignment solution will result in a certain value in these three axes. In other words, each assignment solution can be represented as a point in the multi-objective space with the objective values (
Each of these values is obtained as the sum of distortion, bandwidth and energy for all
We are therefore interested in obtaining a range of Pareto optimal solutions in said multiobjective space. In addition, we evaluate the fitness of an assignment solution according to how close the solution is to a Pareto point or to the actual Pareto envelope.
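The set of Pareto optimal (non-dominated) points can be extracted with a straightforward pairwise comparison. The Python sketch below assumes that all three objectives (distortion, bandwidth, energy) are to be minimized; the function names and the sample points are made up for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # (distortion, bandwidth, energy), all minimized


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up objective values for four assignment solutions.
solutions = [(1, 5, 3), (2, 2, 2), (3, 3, 3), (5, 1, 1)]
print(pareto_front(solutions))
```

The comparison is quadratic in the number of points, which is unproblematic for populations of a few tens of individuals.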
In Figure
Pareto front of solutions.
To evaluate the fitness of each solution point, we compute the Euclidean distance from each point to the closest point in the hypothetical Pareto front. In a three-dimensional objective space, this is expressed as
During each successive generation, a proportion of the existing population of solutions is selected to breed a new generation. Individual solutions can be selected through a fitness-based process, where fitter solutions (as measured by the fitness function) are typically more likely to be selected. In our case, we do not use a probabilistic selection but an elitist selection, that is, the fittest members of the population are used to breed a new population. Moreover, after crossover and generation of new child solutions, we apply again elitist selection and retain in the solution space those solutions that are the fittest among the parents and the newly generated children solutions. We use the fitness function as described in the earlier section. In practice, this means that for the selection of the parent chromosomes during the crossover and mutation steps, we select the Pareto optimal points from the pool of chromosome/solutions (as these are the fittest points in our space) and from the non-Pareto points, we take the fittest ones.
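The selection rule described above (Pareto optimal points first, then the fittest of the remaining points) can be sketched as follows in Python. As an assumption, the current non-dominated set stands in for the Pareto front, and fitness of the other points is the Euclidean distance to the nearest front point; all names and sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # objective values, all to be minimized


def dominates(a: Point, b: Point) -> bool:
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def elitist_select(points: Sequence[Point], k: int) -> List[int]:
    """Return the indices of the k fittest points.

    Non-dominated points come first; the rest are ranked by their
    Euclidean distance to the closest non-dominated point.
    """
    front = [i for i, p in enumerate(points)
             if not any(dominates(q, p) for q in points)]
    front_points = [points[i] for i in front]
    rest = [i for i in range(len(points)) if i not in front]
    rest.sort(key=lambda i: min(math.dist(points[i], f) for f in front_points))
    return (front + rest)[:k]


# Made-up two-objective example (for instance energy and distortion).
points = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
selected = elitist_select(points, k=4)
print(selected)
```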
The elitism we apply in the selection is similar to the one applied in the Nondominated sorting GA or NSGA [
The
Single-point crossover.
Crossover is applied on a percentage of the population given by the
We use a high mutation rate in our approach, as this helps the algorithm avoid local minima. Nevertheless, we do not risk losing good features of the solution space, thanks to the elitist selection applied after crossover or mutation, that is, the fittest solutions are always kept; therefore, if the mutated solutions are less fit than the original solutions, the original solutions are retained.
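Single-point crossover, mutation, and the elitist survival step can be sketched in Python as follows. The fitness used here is a simplified single-objective stand-in (sum of per-stream quality values, ignoring processing and bandwidth constraints), and all identifiers and example chromosomes are ours.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]

# Per-gene quality for genes 1..9 (decode = 1, temporal = 0.9, spatial = 0.8,
# both = 0.7); constraints are ignored in this simplified fitness.
GENE_QUALITY = {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.8, 5: 0.7,
                6: 1.0, 7: 0.9, 8: 0.8, 9: 0.7}


def quality_fitness(chromosome: Chromosome) -> float:
    return sum(GENE_QUALITY[g] for g in chromosome)


def single_point_crossover(p1: Chromosome, p2: Chromosome,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents at a random cut (needs length >= 2)."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chromosome: Chromosome, rng: random.Random) -> Chromosome:
    """Replace one randomly chosen gene by a random gene value in 1..9."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.choice(range(1, 10))
    return child


def keep_fittest(candidates: List[Chromosome],
                 fitness: Callable[[Chromosome], float], k: int) -> List[Chromosome]:
    """Elitist survival: parents and children compete, the k fittest stay."""
    return sorted(candidates, key=fitness, reverse=True)[:k]


rng = random.Random(1)
parents = [[1, 2, 3, 4, 5], [9, 8, 7, 6, 1]]
child_a, child_b = single_point_crossover(parents[0], parents[1], rng)
mutant = mutate(child_a, rng)
survivors = keep_fittest(parents + [child_a, child_b, mutant], quality_fitness, k=2)
print(survivors)
```

Because the parents compete with their offspring in `keep_fittest`, a mutation that lowers fitness never displaces a fitter original solution.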
The way the Genetic Algorithm evolves towards fitter solutions is highly dependent on the parameter selection. In this respect, the percentage of the population on which the crossover and mutation steps are implemented is given by the crossover and mutation rate parameters, respectively. As explained in the previous section, a high mutation rate is selected to prevent the algorithm from a too early convergence and falling in local minima. A similar approach is taken in [
Table
Assignment quality versus population size.
Population size (rows) vs. no. of tasks (columns) | 10 | 20 | 30 | 40 |
---|---|---|---|---|
10 | 10 | 19.7 | 28 | 35.9 |
20 | 10 | 19.9 | 28.4 | 36.8 |
30 | 10 | 19.9 | 28.7 | 36.5 |
40 | 10 | 19.9 | 28.4 | 36.6 |
In addition, we observed that to reach a high-quality solution, it is advisable to use a population at least as large as the number of tasks considered. Therefore, for the number of tasks considered in our scenarios, a population size of 30 individuals proves to be suitable. In this respect, we observed that bigger populations cause slow convergence and increased execution cost, while small ones tend to evolve to less fit solutions.
In terms of number of iterations, we let the algorithm evolve for 30 generations. This value was found experimentally to be a good tradeoff between achieving good convergence and keeping the execution time low.
Table
GA parameters.
Parameter | Value |
---|---|
Crossover rate | 0.7 |
Mutation rate | 0.3 |
Population size | 30 |
Max generations | 30 |
One way to analyze the convergence of our multiobjective GA is by measuring the area/volume under the Pareto front of solutions in the multiobjective space. The reason is that the minimization of several objectives in our GA translates to Pareto fronts becoming closer to all objective axes, in other words, Pareto fronts with lower areas/volumes underneath.
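For a two-objective front, such an area can be approximated with the trapezoidal rule. The Python sketch below is one possible reading of the metric, not necessarily the exact definition used in the paper: it integrates only between the first and last front point, and the two sample fronts are made up. For a fair comparison between generations, the fronts should span the same range on the horizontal axis.

```python
from typing import List, Tuple


def area_under_front(front: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a 2D Pareto front, points sorted by x."""
    pts = sorted(front)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up fronts: the later generation lies closer to both axes.
early_front = [(0.0, 4.0), (2.0, 2.0), (4.0, 0.0)]
late_front = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]
early_area = area_under_front(early_front)
late_area = area_under_front(late_front)
print(early_area, late_area)
```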
Figure
Convergence versus iterations.
As explained earlier, tens of iterations are sufficient in our scenario to find good assignment solutions that outperform the reference methods.
Our genetic algorithm is implemented as in-house Matlab code and does not form part of the Matlab Optimization toolbox. The code has not been optimized for speed, and its average execution time for 10 iterations of the algorithm is on the order of a couple of seconds. In this respect, the computational cost of reference methods such as MaxQ and Min-Max is almost negligible compared to that of the GA. However, these methods achieve suboptimal results and are not able to tackle a multiobjective optimization.
Note that genetic algorithms are amenable to parallelization, which can speed up their execution considerably. Therefore, a more dedicated and optimized implementation of the algorithm exploiting parallelism would greatly reduce its execution time. However, developing such an algorithm is out of the scope of this paper. In previous work such as [
In addition, the computational load of the GA is marginal when compared to the high computational load of any transcoding or decoding operation that takes place in the servers in our scenario. Therefore, the execution of the GA can be placed on such a server with high processing power elements such as a GPU.
Last but not least, in our cloud computing scenario we could expect that new stream processing tasks enter or leave the system not faster than every couple of minutes. Therefore, global or partial recomputations of the stream assignments are not frequently needed.
To give an indication of how the complexity of our algorithm scales, Table
Relative execution cost.
Population size (rows) vs. no. of tasks (columns) | 10 | 20 | 30 | 40 |
---|---|---|---|---|
10 | 8 | 9 | 8.5 | 8 |
20 | 14 | 15 | 18 | 19 |
30 | 20 | 20 | 30 | 32 |
40 | 27 | 29 | 40 | 41 |
Note also that for large-scale problems, we could address the task scheduling problem in a hierarchical way, that is, tasks can be initially distributed locally among clusters of processing devices, and within each cluster the GA can be applied to obtain the optimal assignment. This would limit the complexity increase of the GA optimization.
In this section, we compare the performance of the different assignment strategies considered. We first focus on a single objective optimization, namely, quality, where we use the fitness definition in (
Performance of strategies versus system load.
We then focus on multiple objectives optimization for a specific system load. We consider the processing of 20 video streams of mixed spatial resolutions: CIF, SD, and HD at 30 fps. We compare the different strategies as well as the availability of one single server node with respect to two server nodes where tasks can be migrated to.
Figure
Energy-distortion tradeoffs.
Two sets of Pareto solutions are shown for the GA, one corresponding to the use of 1 server and another to the use of 2 servers. Naturally, having two servers available to process the tasks allows the GA to find better assignment solutions. Similarly, for the reference strategies, we show two points (assignment solutions) for each strategy displayed. The two points correspond to the use of 1 server and 2 servers, respectively, where generally the use of 2 servers achieves lower distortion but also higher energy consumption (as more streams can be processed).
Note that both distortion and energy values are given as percentages from the maximum possible distortion or energy. This way, the maximum energy cost at the client is defined as the cost of processing the decoding tasks of all streams at full spatial and temporal resolution, while the maximum distortion (100%) corresponds to a failed execution for all streams. In general, the relative distortion is defined as
We can further analyze the assignment solutions found by the different strategies in Table
Assignment strategies for HD/SD/CIF streams.
Strategy | Failed tasks | Decode @client | Decode @server | Temp trans | Spat trans | Temp & spat trans | Dist (%) | Energy (%) |
---|---|---|---|---|---|---|---|---|
RR | 0/0/4 | 4/3/2 | 3/4/0 | −/−/− | −/−/− | −/−/− | 20 | 37 |
MM | 0/0/6 | 7/6/0 | 0/1/0 | −/−/− | −/−/− | −/−/− | 30 | 30.4 |
TA | 0/0/4 | −/−/− | −/−/− | −/−/− | −/−/− | 7/7/2 | 36 | 14 |
MaxQ | −/−/− | 2/6/0 | 0/0/1 | 0/0/1 | 0/0/0 | 5/1/4 | 15.5 | 36.3 |
GA_1 | −/−/− | 4/0/1 | 2/5/0 | 1/2/0 | 0/0/1 | 0/0/4 | 8.5 | 28 |
GA_2 | −/−/− | 0/0/1 | 3/5/0 | 0/1/0 | 4/0/1 | 0/1/4 | 13 | 22.5 |
GA_3 | −/−/1 | 0/1/0 | 5/5/0 | 1/0/0 | 1/0/0 | 0/1/5 | 15.5 | 11.6 |
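As a worked check of the table above, the Dist (%) column can be recomputed from the stream counts. The Python sketch below assumes that the relative distortion is the average per-stream quality loss (1 minus the quality value of the execution mode, with a failed stream counting as a loss of 1), expressed as a percentage. Under this assumption the values for the RR, MaxQ, and GA_1 rows are reproduced; the identifiers are ours, and each row is read in the order strategy, failed, decoded at client, decoded at server, temporal, spatial, temporal and spatial.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # number of (HD, SD, CIF) streams

# Quality loss per execution mode: 1 - quality from the cost table.
LOSS: Dict[str, float] = {
    "failed": 1.0, "dec_client": 0.0, "dec_server": 0.0,
    "temp": 0.1, "spat": 0.2, "temp_spat": 0.3,
}

# Stream counts copied from three rows of the table ("-/-/-" means no streams).
ROWS: Dict[str, Dict[str, Counts]] = {
    "RR": {"failed": (0, 0, 4), "dec_client": (4, 3, 2), "dec_server": (3, 4, 0)},
    "MaxQ": {"dec_client": (2, 6, 0), "dec_server": (0, 0, 1), "temp": (0, 0, 1),
             "spat": (0, 0, 0), "temp_spat": (5, 1, 4)},
    "GA_1": {"dec_client": (4, 0, 1), "dec_server": (2, 5, 0), "temp": (1, 2, 0),
             "spat": (0, 0, 1), "temp_spat": (0, 0, 4)},
}


def relative_distortion(row: Dict[str, Counts]) -> float:
    """Average per-stream quality loss, in percent of the maximum distortion."""
    streams = sum(sum(counts) for counts in row.values())
    loss = sum(LOSS[mode] * sum(counts) for mode, counts in row.items())
    return 100.0 * loss / streams


for name, row in ROWS.items():
    print(f"{name}: {relative_distortion(row):.1f}%")
```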
In this table, for the different kind of stream processing assignment, a distinction is made between the different original stream resolutions (HD/SD/CIF). This way, for example, in the assignment of the round robin (RR) strategy 4 CIF streams fail, while 4 HD streams, 3 SD, and 2 CIF are successfully decoded at the client, and 3 HD and 4 SD streams are decoded at the server. Finally, no streams are transcoded in this strategy. By analyzing Figure
However, transcoding all streams at the server nodes (TA strategy) is not an optimal assignment either, as we enforce the quality degradation of all streams. Moreover, transcoding tasks are processing intensive and exceed the processing capacity at the servers, resulting in some failed stream processing. In this case, the use of 2 servers increases the overall processing capacity, but the assignment remains quite suboptimal in terms of distortion and energy.
In contrast, the MaxQ strategy succeeds in finding an assignment with low distortion value (15%). However, its energy cost is relatively high (36%). In Table
It is finally the GA that outperforms all strategies by addressing both objectives and finding a set of assignment solutions that is Pareto optimal in both senses. Moreover, having 2 server nodes available for processing allows the GA to find even better tradeoffs (lower Pareto curve) in terms of quality and energy. Note also that the set of GA solutions offers an energy range from 30% to 10% for low distortion values. This way, we can reduce the energy at the client by a factor of 3 by trading off some stream quality. This offers the flexibility to choose between different operating points at grid level according to how scarce the processing power and energy at the client are.
In Table
Figure
Bandwidth-distortion tradeoffs.
Finally, Figure
Pareto points in 3D space.
As in the previous figures, the assignments found for 1 and 2 servers are displayed. Once again, we can see that the assignments found by the GA outperform all other strategies in terms of distortion, energy and bandwidth while at the same time, it provides a good tradeoff for all three objectives. Indeed, we can see that the GA solution points concentrate around lower distortion values (especially those corresponding to use of 2 servers), lower bandwidth, and lower energy values. For the sake of clarity, in Figure
Projection onto energy-distortion space.
In practice, both processing and bandwidth constraints may vary over time. Therefore, we may require a new stream assignment to fit the new constraints. One possible way to tackle this is by simply rerunning the assignment strategy. However, as the GA strategy already produces a set of assignment solutions with different energy-distortion-bandwidth tradeoffs, another possibility is to simply choose from the set of solutions a different operating point (assignment solution) that satisfies the new constraints. This way, if the current assignment solution requires a processing of 50% at the client’s side, switching to a new assignment with 30% processing may be suitable for a more overloaded client device. We can also cope with variations in bandwidth or processing constraints by targeting more limiting constraints, for instance, 80% of the maximum bandwidth and maximum processing power. By doing so, the assignment is slightly overdimensioned and can cope with variations of up to 20–25% above the current constraints. This would also help avoid too frequent task migrations in the system.
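Switching operating points amounts to a simple filter over the stored Pareto set. The Python sketch below uses the distortion and energy percentages of the three GA solutions listed in the assignment table and picks the lowest-distortion solution that fits a given client energy budget. The selection rule, the energy-only constraint, and all identifiers are assumptions made for illustration; a bandwidth limit could be handled in the same way.

```python
from typing import List, NamedTuple, Optional


class OperatingPoint(NamedTuple):
    name: str
    distortion: float  # percent of maximum distortion
    energy: float      # percent of maximum client energy


# Distortion and energy values of the GA solutions from the assignment table.
PARETO_SET: List[OperatingPoint] = [
    OperatingPoint("GA_1", 8.5, 28.0),
    OperatingPoint("GA_2", 13.0, 22.5),
    OperatingPoint("GA_3", 15.5, 11.6),
]


def choose_operating_point(points: List[OperatingPoint],
                           max_energy: float) -> Optional[OperatingPoint]:
    """Lowest-distortion point whose client energy fits the budget, if any."""
    feasible = [p for p in points if p.energy <= max_energy]
    return min(feasible, key=lambda p: p.distortion) if feasible else None


for budget in (30.0, 25.0, 12.0, 10.0):
    choice = choose_operating_point(PARETO_SET, budget)
    print(budget, choice.name if choice else "no feasible solution, rerun the GA")
```

When no stored solution satisfies the new constraints, the fallback is to rerun the assignment strategy, as discussed above.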
We have presented an evolutionary-based strategy for stream processing assignment in a client-cloud multimedia system where multiple heterogeneous devices are considered. In this context, we not only decide on which node each stream is assigned but we also consider the possibility of stream transcoding to a lower temporal or spatial resolution. This extends the system capacity at the cost of smooth quality degradation in the task execution.
Moreover, both processing capacities in the nodes and bandwidth availability are taken into consideration. The proposed strategy is highly flexible and can target multiple objectives simultaneously. It outperforms all other considered strategies while providing a wide range of tradeoffs in the assignment solutions.
The authors would like to thank Yiannis Iosifidis for his insights on genetic algorithms.