A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks, which can be formulated as a directed acyclic graph (DAG). The DAG jobs may be mapped to and scheduled on the computing nodes so as to minimize the total execution time. Finding an optimal DAG scheduling solution is considered to be NP-complete. This paper proposes a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a recently proposed metaheuristic method, chemical reaction optimization (CRO). Compared with other CRO-based algorithms for DAG scheduling, the design of the tuple reaction molecular structure and the four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concepts of constrained critical paths (CCPs), the constrained-critical-path directed acyclic graph (CCPDAG), and a super molecule to accelerate convergence. We have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO on a large set of randomly generated graphs and on graphs for real-world problems.
Modern computer systems with multiple processors working in parallel can enhance the processing capacity available to an application, and effective scheduling of the application's parallel modules allows this parallelism to be fully exploited. The application modules may communicate and synchronize several times during processing. On heterogeneous systems that combine GPUs, multicore processors, and CELL processors, or on distributed memory systems, a large communication cost may limit the overall application performance. Effective scheduling can therefore greatly improve the performance of the application.
Scheduling generally defines not only the processing order of application modules but also the processor assignment of these modules. The concept of makespan (i.e., the schedule length) is used to evaluate the scheduling solution quality including the entire execution and communication cost of all the modules. On the heterogeneous systems [
Heuristic scheduling strategies try to identify a good solution by exploiting heuristics. An important subclass is list scheduling, which builds an ordered task list for a DAG job on the basis of some greedy heuristic. In list scheduling algorithms, the ordered tasks are then selected in turn and allocated to the processors that minimize their start times. In heuristic scheduling, the greedy heuristics narrow the attempted solutions down to a very small portion of the entire solution space, and this restriction of the search leads to low time complexity. However, the more complex the DAG scheduling problem is, the harder it is for greedy heuristics to produce consistent results over a wide range of problems, because the quality of the solutions found relies heavily on the effectiveness of the heuristics.
Metaheuristic scheduling strategies, such as ant colony optimization (ACO), genetic algorithms (GA), Tabu search (TS), and simulated annealing (SA), take more time than heuristic scheduling strategies, but they can produce consistently high-quality results over a wide range of problems by searching the solution space in a directed manner.
Chemical reaction optimization (CRO) is a recently proposed metaheuristic method that has shown its power in dealing with NP-complete problems. As far as we know, there is only one CRO-based algorithm for DAG scheduling on heterogeneous systems, called double molecular structure-based CRO (DMSCRO). DMSCRO has better performance in terms of makespan and convergence rate than the genetic algorithm (GA) for DAG scheduling on heterogeneous systems. However, the convergence rate of DMSCRO as a metaheuristic method is still unsatisfactory. This paper proposes a new CRO-based algorithm, tuple molecular structure-based CRO (TMSCRO), for this problem, which encodes the two basic components of DAG scheduling, module execution order and module-to-processor mapping, into an array of tuples. Combined with the elementary reaction operators designed in TMSCRO, this molecular structure gives a better capability of intensification and diversification than DMSCRO. Moreover, in TMSCRO, the concept of constrained critical paths (CCPs) [
In theory, a metaheuristic method will gradually approach the optimal result if it runs for long enough. According to the No-Free-Lunch Theorem, the search performance of all metaheuristic algorithms is alike when averaged over all possible fitness functions. We have conducted simulation experiments over graphs abstracted from two well-known real applications, Gaussian elimination and a molecular dynamics application, and also over a large set of randomly generated graphs. The experimental results show that the proposed TMSCRO achieves makespan performance similar to that of DMSCRO in the literature and outperforms the heuristic algorithms.
There are three major contributions of this work.
(1) Developing TMSCRO based on the CRO framework by designing a molecule encoding method and elementary chemical reaction operators that are more reasonable than those of DMSCRO for intensification and diversification search.
(2) Accelerating convergence by applying CEFT and CCPDAG to the data pretreatment, utilizing the concept of CCPs in the initialization, and using the first initial molecule, InitS, as a super molecule in TMSCRO.
(3) Verifying the effectiveness and efficiency of the proposed TMSCRO by simulation experiments. The simulation results show that TMSCRO achieves a makespan similar to that of DMSCRO but finds good solutions faster than DMSCRO by 12.89% on average (by 26.29% in the best case).
Most scheduling algorithms can be categorized into heuristic scheduling (including list scheduling, duplication-based scheduling, and cluster scheduling) and metaheuristic (i.e., guided-random-search-based) scheduling. These strategies generate the scheduling solution before the application executes. The approaches adopted by these different scheduling strategies are summarized in this section.
Heuristic methods usually provide near-optimal solutions for a task scheduling problem in less than polynomial time. The approaches adopted by heuristic method search only one path in the solution space, ignoring other possible ones [
The list scheduling [
The heterogeneous earliest finish time (HEFT) scheduling algorithm [
The modified critical path (MCP) scheduling [
Dynamic-level scheduling (DLS) [
Mapping heuristic (MH) [
Levelized-min time (LMT) [
There are two heuristic algorithms for DAG scheduling on heterogeneous systems proposed in [
Comparing with the list scheduling algorithms, the duplication-based algorithms [
The clustering algorithms [
In comparison with the algorithms based on heuristic scheduling, the metaheuristic (guided-random-search-based) algorithms use a combinatorial process for solution searching. In general, with robust performance on many kinds of scheduling problems, the metaheuristic algorithms need sampling candidate solutions in the search space, sufficiently. Many metaheuristic algorithms have been applied to solve the task scheduling problem successfully, such as GA, chemical reaction optimization (CRO), energy-efficient stochastic [
GA [
Chemical reaction optimization (CRO) was proposed very recently [
Our work is concerned with the DAG scheduling problems and the flaw of CRO-based method for DAG scheduling, proposing a tuple molecular structure-based chemical reaction optimization (TMSCRO). Comparing with DMSCRO, TMSCRO applies CEFT [
Constrained earliest finish time (CEFT) based on the constrained critical paths (CCPs) was proposed for heterogeneous system scheduling in [
A constrained critical path (CCP) is a collection that contains only tasks that are ready for scheduling. A task is ready when all its predecessors have been processed. In CEFT, a critical path (CP) is generally the longest path from the start node to the end node of the DAG. The DAG is first traversed to find a critical path, and the nodes that constitute this critical path are then pruned from the graph. Subsequent traversals of the pruned graph produce the remaining critical paths. While nodes are being removed from the task graph, a pseudo-edge to the start or end node is added if a node is left with no predecessors or no successors, respectively. The CCPs are subsequently formed by selecting ready nodes from the critical paths in a round-robin fashion. Each CCP may be assigned to the single processor that gives the minimum finish time for processing all the tasks in the CCP. Scheduling all the tasks of a CCP on one processor not only reduces the communication cost but also benefits from a broader view of the task graph.
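The first phase described above, repeatedly finding a longest path and pruning its nodes, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the CEFT implementation: the function names `longest_path` and `critical_paths`, the use of a single cost per node, and the toy graph are all hypothetical, and the pseudo-edges to the start and end nodes are modelled implicitly by allowing a path to begin and end at any node that remains in the pruned graph. The round-robin formation of CCPs is not shown.

```python
from typing import Dict, List, Tuple


def longest_path(order: List[str],
                 succ: Dict[str, List[str]],
                 cost: Dict[str, float],
                 comm: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the longest cost-weighted path among the nodes in `order`.

    `order` must be a topological order of the nodes still in the graph.
    """
    best: Dict[str, float] = {}  # longest path length starting at each node
    nxt: Dict[str, str] = {}     # next node on that longest path
    for n in reversed(order):
        best[n] = cost[n]
        for s in succ.get(n, []):
            if s in best:  # successors already pruned are never in `best`
                length = cost[n] + comm[(n, s)] + best[s]
                if length > best[n]:
                    best[n] = length
                    nxt[n] = s
    node = max(order, key=lambda n: best[n])
    path = [node]
    while node in nxt:
        node = nxt[node]
        path.append(node)
    return path


def critical_paths(order: List[str],
                   succ: Dict[str, List[str]],
                   cost: Dict[str, float],
                   comm: Dict[Tuple[str, str], float]) -> List[List[str]]:
    """Extract critical paths one by one, pruning each from the graph."""
    remaining = list(order)
    paths: List[List[str]] = []
    while remaining:
        path = longest_path(remaining, succ, cost, comm)
        paths.append(path)
        pruned = set(path)
        remaining = [n for n in remaining if n not in pruned]
    return paths


# Hypothetical 4-task diamond graph (not one of the paper's figures).
order = ["A", "B", "C", "D"]
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
comm = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
paths = critical_paths(order, succ, cost, comm)
print(paths)  # [['A', 'B', 'D'], ['C']]
```

In this toy graph the first traversal finds A-B-D, and the second traversal of the pruned graph leaves C as a critical path of its own.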
Consider the CEFT algorithm generates schedules for n tasks with
Specific terms and their usage for the CEFT algorithm.
|
Execution cost of a node |
|
Communication cost from node |
|
Possible start time of node |
|
Finish time of node |
|
Actual finish time of node |
|
Finish time of the constrained critical path |
|
Availability time of |
|
Set of predecessors of node |
|
Set of successors of node |
|
Average execution cost of node |
The CEFT scheduling approach (Algorithm
(1) //PHASE 1: Find the constrained critical paths (CCPs) (2) Find set of critical paths CP according to the description in the second paragraph of Section (3) (4) (5) (6) Insert ready node (7) (8) (9) (10) (11) //PHASE 2: Assign and schedule tasks (12) (13) (14) (15) Find the start time of node (16) Find the finish time of the node (17) (18) Find the finish time of the CCP (19) (20) Assign the processor to CCP (21) Let (22)
Chemical reaction optimization (CRO) mimics the process of a chemical reaction where molecules undergo a series of reactions between each other or with the environment in a closed container. The molecules are manipulated agents with a profile of three necessary properties of the molecule, including the following. (1) The molecular structure
Four kinds of elementary reactions may happen in CRO, which are defined as below.
On-wall ineffective collision: on-wall ineffective collision is a unimolecule reaction with only one molecule. In this reaction, a molecule
Decomposition: decomposition is the other unimolecule reaction in CRO. A molecule
Intermolecular ineffective collision: intermolecular ineffective collision is an intermolecule reaction with two molecules. Two molecules,
Synthesis: synthesis is also an intermolecule reaction. Two molecules,
The canonical CRO works as follows. Firstly, the initialization of CRO is to set system parameters, such as PopSize (the size of the molecules), KELossRate, InitialKE (the initial energy of molecules), buf (initial energy in the buffer), and MoleColl (MoleColl is a threshold value to determine whether to perform a unimolecule reaction or an intermolecule reaction). Then the CRO processes a loop. In each iteration, whether to perform a unimolecule reaction or an intermolecule reaction is first decided in the following way. A number
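The choice of reaction type in each iteration can be sketched in Python as follows. This is a minimal illustration that follows the usual description of canonical CRO, in which a random number b in [0, 1] is compared with MoleColl. The direction of the comparison, the function name `select_reaction`, and the two Boolean flags `decomposition_due` and `synthesis_due` (which stand in for the threshold tests on the selected molecules) are assumptions of this sketch rather than details taken from this paper.

```python
def select_reaction(b: float,
                    mole_coll: float,
                    decomposition_due: bool,
                    synthesis_due: bool) -> str:
    """Pick one of the four elementary reactions for the current iteration.

    b                  -- random number drawn uniformly from [0, 1]
    mole_coll          -- the MoleColl threshold
    decomposition_due  -- hypothetical flag: decomposition criterion is met
    synthesis_due      -- hypothetical flag: synthesis criterion is met
    """
    if b > mole_coll:
        # Unimolecule reaction: one molecule is selected.
        if decomposition_due:
            return "decomposition"
        return "on-wall ineffective collision"
    # Intermolecule reaction: two molecules are selected.
    if synthesis_due:
        return "synthesis"
    return "intermolecular ineffective collision"


# With MoleColl = 0.2, a draw of 0.7 leads to a unimolecule reaction.
print(select_reaction(0.7, 0.2, decomposition_due=False, synthesis_due=False))
```

Under this assumed comparison, a small MoleColl value makes unimolecule reactions more frequent than intermolecule reactions.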
This section discusses the system, application, and task scheduling model assumed in this work. The definition of the notations can be found in the Notations section.
In this paper, there are multiple heterogeneous processors in the target system, which are presented by
We assume a static computing system model in which the constrained relations and the execution costs of tasks are known a priori and the execution and communication can be performed simultaneously by the processors. In this paper, the heterogeneity is represented by
In DAG scheduling, finding optimal schedules is to find the scheduling solution with the minimum schedule length. The schedule length encompasses the entire execution and communication cost of all the modules and is also termed as makespan. In this paper, the task scheduling problem is to map a set of tasks to a set of processors, aiming at minimizing the makespan. It takes as input a directed acyclic graph
Two simple DAG models with 7 and 10 tasks.
A fully connected parallel system with 3 heterogeneous processors.
Consider
The constrained-critical-path sequence of DAG =
The start time of the task
The communication to computation ratio (CCR) can be used to indicate whether a DAG is communication intensive or computation intensive. For a given DAG, it is computed by the average communication cost divided by the average computation cost on a target computing system. The computation can be formulated as follows:
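As a minimal sketch of this ratio, the following Python function divides the average communication cost over all edges by the average computation cost over all tasks. The function name `ccr`, the dictionary layout, and the choice to average each task's cost over the processors first are assumptions made for illustration; the toy numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def ccr(comm_cost: Dict[Tuple[str, str], float],
        exec_cost: Dict[str, List[float]]) -> float:
    """Average communication cost divided by average computation cost.

    comm_cost -- communication cost of each edge (predecessor, successor)
    exec_cost -- execution cost of each task on every processor
    """
    avg_comm = sum(comm_cost.values()) / len(comm_cost)
    # Average each task over the processors, then average over the tasks.
    per_task = [sum(costs) / len(costs) for costs in exec_cost.values()]
    avg_comp = sum(per_task) / len(per_task)
    return avg_comm / avg_comp


# Hypothetical 3-task graph on 2 processors.
comm_cost = {("A", "B"): 4.0, ("A", "C"): 8.0}
exec_cost = {"A": [2.0, 4.0], "B": [1.0, 3.0], "C": [5.0, 3.0]}
ratio = ccr(comm_cost, exec_cost)
print(ratio)  # 2.0
```

A value above 1, as in this toy example, indicates a communication-intensive DAG, and a value below 1 indicates a computation-intensive one.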
TMSCRO mimics the interactions of molecules in chemical reactions using the concepts of molecule, atom, molecular structure, and molecular energy. The structure of a molecule, which represents the positions of the atoms in the molecule, is unique. The interactions of molecules in the four kinds of basic chemical reactions (on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis) aim to transform molecules into more stable states with lower energy. In DAG scheduling, a scheduling solution, comprising a task order and a processor allocation, corresponds to a molecule in TMSCRO. This paper also designs operators on the encoded scheduling solutions (tuple arrays). These operators correspond to the chemical reactions and change the molecular structures. Arrays with different tuples represent different scheduling solutions, and the makespan of each scheduling solution can be calculated; this makespan corresponds to the energy of the molecule.
In this section, we first present the data pretreatment of the TMSCRO. After the presentation of the encoding of scheduling solutions and the fitness function used in the TMSCRO, we present the design of four elementary chemical reaction operators in each part of the TMSCRO. Finally, we outline the framework of the TMSCRO scheme and discuss a few important properties in TMSCRO.
This subsection first presents the encoding of scheduling solutions (i.e., the molecular structure) and the data pretreatment. Then we state the fitness function for optimization designed in TMSCRO.
A reasonable initial population in CRO-based methods may increase the scope of searching over the fitness function [
The data pretreatment is to generate the CCPDAG from DAG and to construct CCPS for the initialization of TMSCRO. The CCPDAG is a directed acyclic graph with |CCP| nodes representing constrained critical paths The CCP and the processor allocation of each element of CCP in DAG can be obtained by executing CEFT and the first initial CCP solution, After the execution of CEFT for DAG, the CCPDAG is generated with the input of CCP and DAG. A detailed description is given in Algorithm
CCP corresponding to the DAG as shown in Figure
CCP | Tasks |
---|---|
1 | A-B-D |
2 | C-G |
3 | F |
4 | E |
5 | H |
6 | I |
7 | J |
(1) (2) CCP (3) CCP (4) (5) create (6) (7) add Start and End (8) add edges among Start and CCP nodes (9) add edges among End and CCP nodes (10)
As shown in Algorithm
CCPDAG corresponding to the DAG as shown in Figure
In this paper, there are two kinds of molecular structures of TMSCRO, CCPS, and S. CCP molecular structure CCPS is just used in the initialization of TMSCRO, which can be formulated as in (
The initial molecule generator is used to generate the initial solutions for TMSCRO to manipulate. The first molecule InitS is converted from InitCCPS. Part three
(1) InitS = ConvertMole(InitCCPS); (2) update each (3) MoleN = 1; (4) (5) (6) find the first successor Succ( (7) (8) find the first predecessor Pred( (9) (10) interchanged position of (11) (12) (13) (14) Generate a new CCP molecule (15) (16) update each (17) MoleN (18)
(1) (2) (3) (4) (5) (6) (7) Generate a new tuple (8) (9) (10) (11) Generate a new reaction molecule (12) (13) find the first successor Succ( (14) (15) find the first predecessor (16) (17) interchanged position of (18) (19) (20) (21) (22) change (23) (24) return
Potential energy (PE) is defined as the objective function (fitness function) value of the corresponding solution represented by S. The overall schedule length of the entire DAG, namely, makespan, is the largest finish time among all tasks, which is equivalent to the actual finish time of the end node in DAG. For the DAG scheduling problem by TMSCRO, the goal is to obtain the scheduling that minimizes makespan and ensure that the precedence of the tasks is not violated. Hence, each fitness function value is defined as
Algorithm
(1) slength = 0; (2) (3) Calculate the start time of predecessor node (4) Find the finish time of (5) (6) update scheduling length slength (7) (8) (9)
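The fitness evaluation described above, in which the makespan is the largest finish time among all tasks, can be illustrated with the following Python sketch. It is a simplified illustration, not the paper's algorithm: the function name `makespan`, the representation of a solution as a list of (task, processor) tuples in execution order, the non-insertion policy (a task starts no earlier than the time its processor becomes free), and the zero communication cost between tasks on the same processor are all assumptions.

```python
from typing import Dict, List, Tuple


def makespan(schedule: List[Tuple[str, int]],
             exec_cost: Dict[str, List[float]],
             comm_cost: Dict[Tuple[str, str], float],
             pred: Dict[str, List[str]]) -> float:
    """Return the schedule length of an ordered (task, processor) list."""
    finish: Dict[str, float] = {}  # finish time of each scheduled task
    placed: Dict[str, int] = {}    # processor chosen for each task
    avail: Dict[int, float] = {}   # time at which each processor is free
    for task, proc in schedule:
        ready = 0.0
        for p in pred.get(task, []):
            if p not in finish:
                raise ValueError(f"{task} is scheduled before predecessor {p}")
            # Assumption: no communication cost on the same processor.
            delay = 0.0 if placed[p] == proc else comm_cost[(p, task)]
            ready = max(ready, finish[p] + delay)
        start = max(ready, avail.get(proc, 0.0))
        finish[task] = start + exec_cost[task][proc]
        placed[task] = proc
        avail[proc] = finish[task]
    return max(finish.values())


# Hypothetical 3-task graph on 2 processors.
exec_cost = {"A": [2.0, 3.0], "B": [3.0, 2.0], "C": [4.0, 1.0]}
comm_cost = {("A", "B"): 2.0, ("A", "C"): 3.0}
pred = {"B": ["A"], "C": ["A"]}
length = makespan([("A", 0), ("B", 0), ("C", 1)], exec_cost, comm_cost, pred)
print(length)  # 6.0
```

Raising an error for a task placed before one of its predecessors reflects the requirement that the precedence of the tasks must not be violated.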
This subsection presents four elementary chemical reaction operators for sequence optimization and processor allocation optimization designed in TMSCRO, including on-wall collision, decomposition, intermolecular collision, and synthesis.
In this paper, the operator, OnWallT, is used to generate a new molecule
Illustration of molecular structure change for on-wall ineffective collision.
Illustration of the task-to-computing-node mapping for on-wall ineffective collision.
In this paper, the operator, DecompT, is used to generate new molecules
Illustration of molecular structure change for decomposition.
Illustration of the task-to-computing-node mapping for decomposition.
In this paper, the operator, IntermoleT, is used to generate new molecules
Illustration of molecular structure change for intermolecular ineffective collision.
Illustration of the task-to-computing-node mapping for intermolecular ineffective collision.
In this paper, the operator, SynthT, is used to generate a new molecule
Illustration of molecular structure change for synthesis.
Illustration of the task-to-computing-node mapping for synthesis.
The framework of TMSCRO is shown as an outline to schedule a DAG job in Algorithm
(1) Initialize PopSize, KELossRate, MoleColl and InitialKE, (2) Call Algorithm (3) Call Algorithm (4) (5) Generate (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) (28) (29) (30) (31) (32) (33) (34) (35) (36) (37) (38) (39) (40) (41) (42) (43) (44) (45) (46)
It is very difficult to theoretically prove the optimality of the CRO (as well as DMSCRO and TMSCRO) scheme [
First, just like DMSCRO, TMSCRO enjoys the advantages of GA and SA to some extent by analyzing the chemical reaction operators designed in TMSCRO and the operator environment of TMSCRO: (1) the OnWallT and IntermoleT in TMSCRO exchange the partial structure of two different molecules like the crossover operator in GA. (2) The energy conservation requirement in TMSCRO is able to guide the searching of the optimal solution in a similar way as the Metropolis Algorithm of SA guides the evolution of the solutions in SA. Second, constrained earliest finish time (CEFT) algorithm constructs constrained critical paths (CCPs) by taking into account a broader view of the input DAG [
The simulations have been performed to test TMSCRO scheduling algorithm in comparison with heuristic (HEFT_B and HEFT_T) [
Gaussian elimination for a matrix of size 7.
A molecular dynamics code.
A random graph with 10 nodes.
Considering that HEFT_B and HEFT_T have better performance than other heuristics algorithms for DAG scheduling on heterogeneous computing systems, as proposed in the 8th paragraph in Section
The performance has been evaluated in terms of makespan. The makespan values plotted in the bar graphs of makespan and in the convergence trace charts are the averages over 50 and 25 independent runs, respectively, to validate the robustness of TMSCRO. The communication cost is calculated from the computation costs and the communication to computation ratio (CCR) values. The computation can be formulated as in (
All the suggested values for the other parameters of the simulation of TMSCRO and their values are listed in Table
Configuration parameters for the simulation of TMSCRO.
Parameter | Value |
---|---|
InitialKE | 1000 |
 | 500 |
 | 10 |
Buffer | 200 |
KELossRate | 0.2 |
MoleColl | 0.2 |
PopSize | 10 |
 | 0.33 |
Number of runs | 50 |
The real world application set is used to evaluate the performance of TMSCRO, which consists of two real world problem graph topologies, Gaussian elimination [
Gaussian elimination is a well-known method to solve a system of linear equations. Gaussian elimination converts a set of linear equations to the upper triangular form by applying elementary row operators on them systematically. As shown in Figure
The parameters and their values of the Gaussian elimination graphs performed in the simulation are given in Table
Configuration parameters for the Gaussian elimination graphs.
Parameter | Possible values |
---|---|
CCR | {0.1, 0.2, 1, 2, 5} |
Number of processors | {4, 8, 16, 32} |
Number of tasks | 27 |
The makespan of TMSCRO, DMSCRO, HEFT_B, and HEFT_T under the increasing processor number is shown in Figure
Average makespan for Gaussian elimination.
As the intelligent random search algorithms, TMSCRO and DMSCRO search a wider area of the solution space than HEFT_B, HEFT_T, or other heuristic algorithms, which narrow the search down to a very small portion of the solution space. This is the reason why TMSCRO and DMSCRO are more likely to obtain better solutions and outperform HEFT_B and HEFT_T.
The simulation results show that the performance of TMSCRO and DMSCRO is very similar. The fundamental reason is that both are metaheuristic algorithms. According to the No-Free-Lunch Theorem in the field of metaheuristics, the performance of all well-designed metaheuristic search algorithms is the same when averaged over all possible objective functions. In theory, a well-designed metaheuristic algorithm will gradually approach the optimal solution if it runs for long enough. The DMSCRO developed in [
The experiment results for the Gaussian elimination graph under different processors, CCR = 0.2.
The number of processors | HEFT_B | HEFT_T | DMSCRO | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|---|---|---|
4 | 112.2 | 122.227 | 109.9 | 109.31 | 109.2 | 109.9 | 0.2473 |
8 | 112.2 | 112.648 | 108.9 | 107.83 | 107.1 | 108.9 | 0.9613 |
16 | 80.4 | 92.354 | 77.5 | 76.62 | 76.3 | 78.9 | 1.6696 |
32 | 79.64 | 85.454 | 77.5 | 76.62 | 76.1 | 78.9 | 1.7201 |
In Figure
Figure
The experiment results for the Gaussian elimination graph under different CCRs; the number of processors is 8.
CCR | HEFT_B | HEFT_T | DMSCRO | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|---|---|---|
0.1 | 108.2 | 110.312 | 106.78 | 105.04 | 104.76 | 106.6 | 1.7271 |
0.2 | 112.2 | 112.648 | 108.9 | 107.83 | 107.1 | 108.9 | 0.9613 |
1 | 120.752 | 124.536 | 115.63 | 114.717 | 114.3 | 115.4 | 0.3787 |
2 | 207.055 | 197.504 | 189.4 | 188.303 | 188.1 | 188.75 | 0.1522 |
5 | 263.8 | 263.8 | 252.39 | 250.671 | 250.3 | 251.79 | 0.9178 |
Average makespan for Gaussian elimination; the number of processors is 8.
Figure
The parameters and their values of the molecular dynamics code graphs performed in the simulation are given in Table
Configuration parameters for the molecular dynamics code graphs.
Parameter | Possible values |
---|---|
CCR | {0.1, 0.2, 1, 2, 5} |
Number of processors | {4, 8, 16, 32} |
Number of tasks | 41 |
As shown in Figures
The experiment results for the molecular dynamics code graph under different processors, CCR = 1.0.
The number of processors | HEFT_B | HEFT_T | DMSCRO | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|---|---|---|
4 | 149.205 | 142.763 | 139.51 | 138.13 | 137.87 | 138.6 | 0.1749 |
8 | 131.031 | 122.265 | 118.8 | 116.9 | 116.2 | 117.33 | 0.2764 |
16 | 124.868 | 115.584 | 113.52 | 113.36 | 113.1 | 113.43 | 0.0237 |
32 | 120.047 | 103.784 | 102.617 | 101.29 | 101.023 | 101.47 | 0.0442 |
The experiment results for the molecular dynamics code graph under different CCRs; the number of processors is 16.
CCR | HEFT_B | HEFT_T | DMSCRO | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|---|---|---|
0.1 | 82.336 | 90.136 | 80.53 | 77.781 | 77.3 | 78.9 | 0.9459 |
0.2 | 82.356 | 87.504 | 80.53 | 78.704 | 78.21 | 79.13 | 0.2002 |
1 | 124.868 | 115.584 | 113.52 | 113.36 | 113.1 | 113.43 | 0.0237 |
2 | 216.735 | 174.501 | 167.612 | 164.7 | 164.32 | 164.91 | 0.0742 |
5 | 274.7 | 274.7 | 265.8 | 262.173 | 262.022 | 262.6 | 0.1344 |
Average makespan for the molecular dynamics code.
Average makespan for the molecular dynamics code; the number of processors is 16.
Average makespan of different task numbers,
An effective mechanism to generate random graph for various applications is proposed in [
In the random graph generation of this mechanism, the topological order is used to guarantee the precedence constraints; that is, an edge exists between two nodes
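The role of the topological order can be illustrated with a small Python sketch in which tasks are numbered 0 to n-1 and an edge is only ever created from a lower-numbered task to a higher-numbered one, so the result is acyclic by construction. This is not the generation mechanism of the cited work: the function name `random_dag`, the per-pair edge probability `edge_prob`, and the fixed seed are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def random_dag(num_tasks: int, edge_prob: float, seed: int) -> List[Tuple[int, int]]:
    """Generate random edges (i, j) with i < j, so no cycle can form."""
    rng = random.Random(seed)  # seeded for reproducible graphs
    edges: List[Tuple[int, int]] = []
    for i in range(num_tasks):
        for j in range(i + 1, num_tasks):
            # Hypothetical rule: each forward pair becomes an edge
            # independently with probability `edge_prob`.
            if rng.random() < edge_prob:
                edges.append((i, j))
    return edges


edges = random_dag(num_tasks=10, edge_prob=0.3, seed=1)
# Every edge respects the topological order 0, 1, ..., 9.
print(all(i < j for i, j in edges))  # True
```

Because every edge points forward in the numbering, the order 0, 1, ..., n-1 is always a valid execution order for the generated graph.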
The parameters and their values of the random graphs performed in the simulation are given in Table
Configuration parameters for random graphs.
Parameter | Possible values |
---|---|
CCR | {0.1, 0.2, 1, 2, 5, 10} |
Number of processors | {4, 8, 16, 32} |
Number of tasks | {10, 20, 50} |
Figure
The experiment results for the random graph under different task numbers, CCR = 10; the number of processors is 32.
The number of tasks | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|
10 | 73 | 67 | 65.1 | 62.2 |
20 | 148.9 | 143.9 | 139.421 | 136.8 |
50 | 350.7 | 341.7 | 334.17 | 331.9 |
The experiment results for the random graph under different processors, CCR = 0.2; the number of tasks is 50.
The number of processors | HEFT_B | HEFT_T | DMSCRO | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|---|---|---|
4 | 167.12 | 178.023 | 159.234 | 157.63 | 157.12 | 158.3 | 0.3923 |
8 | 136.088 | 145.649 | 128.17 | 127.178 | 127.06 | 127.7 | 0.1949 |
16 | 119.292 | 125.986 | 115.9 | 114.33 | 114.1 | 115.2 | 0.4753 |
32 | 111.866 | 120.065 | 108.7 | 108.71 | 108.31 | 108.9 | 0.0733 |
The experiment results for the random graph under different processors, CCR = 1.0; the number of tasks is 50.
The number of processors | HEFT_B | HEFT_T | DMSCRO | TMSCRO | TMSCRO | TMSCRO | TMSCRO |
---|---|---|---|---|---|---|---|
4 | 178.662 | 175.52 | 168.12 | 167.703 | 167.42 | 168 | 0.0857 |
8 | 138.572 | 136.47 | 131.8 | 131.451 | 131.1 | 131.9 | 0.178 |
16 | 125.772 | 124.31 | 122.91 | 122.32 | 122.1 | 122.432 | 0.0233 |
32 | 117.11 | 116.4 | 114.124 | 113.127 | 112.9 | 113.54 | 0.1348 |
Average makespan of four algorithms under different processor numbers and the low communication costs; the number of tasks is 50.
Average makespan of four algorithms under different processor numbers and the low communication costs; the number of tasks is 50.
As shown in Figure
The experiment results for the random graph under different task CCRs, the number of tasks is 50.
CCR | The number of processors is 4 | The number of processors is 8 | The number of processors is 16 | The number of processors is 32 |
---|---|---|---|---|
0.1 | 156.97 | 115.724 | 110.3 | 101.87 |
0.2 | 157.63 | 127.178 | 114.33 | 108.71 |
1 | 167.703 | 131.451 | 122.32 | 113.127 |
2 | 294.042 | 289.878 | 273.375 | 269.514 |
5 | 473.5 | 467.61 | 429.13 | 428.13 |
Average makespan of TMSCRO under different values of CCR; the number of tasks is 50.
The results in the previous subsections report the final makespan obtained by TMSCRO and DMSCRO, showing that TMSCRO obtains makespan performance similar to that of DMSCRO. Moreover, in some cases the final makespan achieved by TMSCRO is even better than that of DMSCRO after the stopping criteria are satisfied. In this section, the change of makespan as TMSCRO and DMSCRO progress during the search is demonstrated by comparing the convergence traces of the two algorithms. These experiments further reveal the better convergence performance of TMSCRO and also help explain why TMSCRO sometimes outperforms DMSCRO.
The parameters and their values of the Gaussian elimination, molecular dynamics code, and random graphs performed in the simulation are given in Tables
Configuration parameters of convergence experiment for the Gaussian elimination graph.
Parameter | Value |
---|---|
CCR | 0.2 |
Number of processors | 8 |
Number of tasks | 27 |
Configuration parameters of convergence experiment for the molecular dynamics graph.
Parameter | Value |
---|---|
CCR | 1 |
Number of processors | 16 |
Number of tasks | 41 |
Configuration parameters of convergence experiment for the random graphs.
Parameter | Values |
---|---|
CCR | {0.2, 1} |
Number of processors | {8, 16} |
Number of tasks | {10, 20, 50} |
Figures
The convergence trace for Gaussian elimination;
The convergence trace for the molecular dynamics code;
The convergence trace for the randomly generated DAGs with each containing 10 tasks.
The convergence trace for the randomly generated DAGs with each containing 20 tasks.
The convergence trace for the randomly generated DAGs with each containing 50 tasks.
The statistical analysis results over the average coverage rate at 5000 ascending sampling points from start time to end time of all the experiments are shown in Table
The results of the statistical analysis over the average coverage rate at different sampling times of all the experiments (the threshold of
DAG | The value of | Average convergence acceleration ratio |
---|---|---|
Gaussian elimination | | 4.23% |
Molecular dynamics code | | 7.21% |
Random graph with 10 tasks | | 23.27% |
Random graph with 20 tasks | | 16.41% |
Random graph with 50 tasks | | 13.32% |
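As a quick arithmetic check, the five average convergence acceleration ratios in the table can be averaged with a few lines of Python. The dictionary keys are shorthand labels introduced here; the values are copied from the table. The result agrees with the 12.89% average improvement quoted among the contributions of this work.

```python
# Average convergence acceleration ratios (in percent) from the table.
ratios = {
    "Gaussian elimination": 4.23,
    "Molecular dynamics code": 7.21,
    "Random graph with 10 tasks": 23.27,
    "Random graph with 20 tasks": 16.41,
    "Random graph with 50 tasks": 13.32,
}

average = sum(ratios.values()) / len(ratios)
print(f"{average:.2f}%")  # 12.89%
```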
In these experiments, each algorithm stops when the makespan remains unchanged for a preset number of consecutive iterations of the search loop (5000 iterations in the experiments). In practice, an algorithm can also stop when its total processing time reaches a preset value (e.g., 180 s). Moreover, TMSCRO and DMSCRO start from the same initial population. In this case, the fact that TMSCRO outperforms DMSCRO on convergence means that the makespan achieved by TMSCRO could be much better than that of DMSCRO when the stopping criteria are satisfied. The reason for this can be explained by the analysis presented in the last paragraph of Section
In this paper, we developed TMSCRO for DAG scheduling on heterogeneous systems based on the chemical reaction optimization (CRO) method. With a more reasonable reaction molecular structure and four newly designed elementary chemical reaction operators, TMSCRO has a better capability of intensification and diversification search than DMSCRO, which is the only other CRO-based algorithm for DAG scheduling on heterogeneous systems as far as we know. Moreover, in TMSCRO, the constrained earliest finish time (CEFT) algorithm and the constrained-critical-path directed acyclic graph (CCPDAG) are applied to the data pretreatment, and the concept of constrained critical paths (CCPs) is utilized in the initialization. We also use the first initial molecule, InitS, as a super molecule to accelerate convergence. As a metaheuristic method, TMSCRO can cover a much larger search space than heuristic scheduling approaches. The experiments show that TMSCRO outperforms HEFT_B and HEFT_T and can achieve a higher speedup of task executions than DMSCRO.
In future work, we plan to extend TMSCRO by applying a synchronous communication strategy to parallelize its processing. In this design, the molecules are divided into groups, and each group is handled by a CPU or GPU. Multiple groups can thus be manipulated simultaneously in parallel, and molecules can be exchanged among the CPUs or GPUs from time to time in order to reduce the time cost.
Input directed acyclic graph with
Node sequence in which the hypothetical entry node (with no predecessors)
Edge set in which
Set of multiple heterogeneous processors in target system
Constrained-critical-path sequence of
Constrained critical path in which the set
Directed acyclic graph with |CCP| nodes representing CCPs, two virtual nodes (i.e., start and end) representing the beginning and exit of execution, respectively, and |CE| edges representing dependencies among all nodes
A CCP molecule used in the initialization of TMSCRO, in which
A reaction molecule (i.e., solution) in TMSCRO
Atom (i.e., tuple) in
The first CCP molecule for the initialization of TMSCRO
The first molecule in TMSCRO
Edge between CCPs and CCPe
Average computation cost of node
Execution cost of a node
Communication cost from node
Possible start time of node
Finish time of node
Availability time of
Set of predecessors of node
Set of successors of node
Communication to computation ratio
The parameter to adjust the heterogeneity level in a heterogeneous system
Current potential energy of a molecule
Current kinetic energy of a molecule
Initial kinetic energy of a molecule
Threshold value guiding the choice of on-wall collision or decomposition
Threshold value guiding the choice of intermolecule collision or synthesis
Initial energy in the central energy buffer
Loss rate of kinetic energy
Threshold value to determine whether to perform a unimolecule reaction or an intermolecule reaction
Size of the molecules
Total collision number of a molecule.
The authors declare that there is no conflict of interests regarding the publication of this paper.