DRSCRO: A Metaheuristic Algorithm for Task Scheduling on Heterogeneous Systems

Efficient DAG task scheduling is crucial for leveraging the performance potential of a heterogeneous system, and finding a schedule that minimizes the makespan (i.e., the total execution time) of a DAG is known to be NP-complete. A recently proposed metaheuristic method, Chemical Reaction Optimization (CRO), has demonstrated its capability for solving NP-complete optimization problems. This paper develops an algorithm named Double-Reaction-Structured Chemical Reaction Optimization (DRSCRO) for DAG scheduling on heterogeneous systems, which modifies the conventional CRO framework and combines CRO with the variable neighborhood search (VNS) method. DRSCRO has two reaction phases, for super molecule selection and global optimization, respectively. In the super molecule selection phase, CRO as a metaheuristic algorithm is adopted to obtain a super molecule for accelerating convergence. To promote the intensification capability, in the global optimization phase, the VNS algorithm with a new processor selection model is used as the initialization, taking both the scheduling order and the processor assignment into consideration, and the load balance neighborhood structure of VNS is also utilized in the ineffective reaction operator. The experimental results verify the effectiveness and efficiency of DRSCRO in terms of makespan and convergence rate.


Introduction
A large application can be decomposed into several smaller models (i.e., tasks) processed in parallel on heterogeneous computing systems. Efficient task scheduling is crucial for leveraging the performance potential of a heterogeneous system. The task scheduling problem on a heterogeneous system can be stated as assigning processors to tasks so as to minimize the makespan (i.e., the total execution time). As a task can start only after all of its predecessors have been executed, tasks with precedence constraints can be modeled as directed acyclic graphs (DAGs), where the nodes and the directed edges represent the tasks and the communications between the tasks, respectively. Finding a schedule that minimizes the execution time of a parallel program is known to be NP-complete [1]. Therefore, two scheduling strategies, heuristic and metaheuristic, have been developed for finding a suboptimal solution with lower execution time.
Heuristic scheduling strategies focus on identifying a solution by exploiting heuristics. An important class of these algorithms is list scheduling [2][3][4][5][6][7][8][9][10][11][12], such as heterogeneous earliest finish time (HEFT) [3]. List scheduling consists of two basic phases: constructing a scheduling list of tasks ordered by the priority of each task, and mapping each task to a processor in priority order according to a greedy approach (i.e., the task with the highest priority is assigned to the processor that allows the earliest finish time). The performance of heuristic-based algorithms relies heavily on the effectiveness of the heuristics.
Metaheuristic scheduling strategies such as Ant Colony Optimization (ACO) [13], Genetic Algorithms (GA) [14][15][16][17][18][19][20][21], Tabu Search (TS) [22,23], and Simulated Annealing (SA) [24] search the solution space in a direct manner and produce consistent, high-quality results on a wide range of problems; however, in comparison with heuristic-based algorithms, these strategies always cost much more time. Chemical Reaction Optimization (CRO) is a new metaheuristic method and has shown its efficiency in solving NP-complete problems [25][26][27][28][29]. To the best of our knowledge, there are only two CRO-based algorithms [27,30] for DAG scheduling on heterogeneous systems so far. Both algorithms focus on DAG scheduling with the objective of minimizing the makespan. However, as metaheuristic scheduling strategies, CRO-based algorithms for DAG scheduling still have a very high time cost, and their convergence rates also need to be improved. In [30], the concept of the super molecule is applied for accelerating convergence, and the super molecule is selected by heuristic scheduling strategies. However, the performance of this kind of super molecule selection method is affected by the range of problems.
Mathematical Problems in Engineering
This paper proposes an algorithm, Double-Reaction-Structured CRO (DRSCRO), for DAG task scheduling on heterogeneous systems, aiming to obtain schedules of better quality. The conventional CRO framework is modified, and two reaction phases, one for super molecule selection and another for global optimization, are developed in DRSCRO. CRO as a metaheuristic algorithm is utilized in the super molecule selection phase to obtain a super molecule [31] for a better convergence rate. In addition, the variable neighborhood search (VNS) algorithm [32] with a new processor selection model, as well as its neighborhood structure, is utilized to promote the intensification capability in the global optimization phase.
There are three major contributions of this work:
(1) Developing DRSCRO by modifying the conventional CRO framework and utilizing a metaheuristic method to obtain a super molecule for accelerating convergence.
(2) Utilizing the VNS algorithm [32] with a new processor selection model as the initialization of the global optimization phase, which takes into account the optimization of both the scheduling order and the processor assignment, and applying one of its neighborhood structures in the reaction operator to promote the intensification capability of DRSCRO.
(3) Conducting simulation experiments to verify the efficiency and effectiveness of DRSCRO in terms of makespan and convergence rate.
The next section introduces relevant research on the DAG scheduling problem on heterogeneous systems. Section 3 gives a formal statement of the models of the studied problem. Section 4 presents the design of the proposed DRSCRO for DAG scheduling. In Section 5, the simulation performance of DRSCRO is analyzed and compared with some existing algorithms. Section 6 draws the conclusions of this paper and gives suggestions for future research.

Literature Review
The DAG scheduling problem, which has been proven to be NP-hard in general [1], can be formulated as the search for an optimal assignment of the tasks in a DAG onto a set of processors so as to minimize the total schedule length (i.e., makespan). The various scheduling algorithms proposed over the last decade fall into two main categories, heuristic (deterministic) and metaheuristic (nondeterministic). As metaheuristic methods, CRO-based algorithms for DAG scheduling on heterogeneous systems are based on the Chemical Reaction Optimization (CRO) algorithm, which was proposed very recently and has shown its power in dealing with NP-complete problems.
2.1. Heuristic and Metaheuristic Methods. The heuristic methods are based on heuristics extracted from intuitions, and the most important class of them is list scheduling algorithms [2][3][4][5][6][7][8][9][10][11][12]. The HEFT algorithm, proposed by Topcuoglu et al. [3], uses the average execution cost of each task in an upward-ranking heuristic to calculate the task priority. At each step of HEFT, the task with the highest upward rank is selected and mapped to a processor with a greedy approach (i.e., the assigned processor minimizes the earliest finish time of the selected task). Experimental results show that HEFT obtains better schedule quality and lower computational cost than the other list scheduling algorithms. The performance of heuristic-based algorithms relies heavily on the effectiveness of the heuristics. The more complex the DAG scheduling problem is, the harder it is for greedy heuristics to produce consistent results on a wide range of problems.
Different from heuristic-based algorithms, the metaheuristic methods use a guided-random-search-based process for solution searching. They typically require sufficient sampling of candidate solutions in the search space and have shown robust performance on a variety of scheduling problems. In particular, GA, as the most representative metaheuristic method, has been widely used to evolve solutions for many task scheduling problems [21]. Many metaheuristic algorithms have been utilized to solve the DAG scheduling problem, such as GA [14][15][16][17][18][19][20][21], ACO [13], SA [24], TS [22,23], CRO [27,30], VNS [21], and energy-efficient stochastic methods [33].
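The greedy mapping step of list scheduling described in this subsection can be sketched as follows. This is a minimal illustration, not the HEFT implementation of [3]: it assumes the priority order is already given and respects precedence, it appends each task after the last task on a processor (HEFT itself uses an insertion-based policy), and the names `greedy_list_schedule`, `exec_cost`, `comm`, and `preds` are hypothetical.

```python
def greedy_list_schedule(priority_order, exec_cost, comm, preds):
    """Map tasks, taken in priority order, to the processor with the earliest finish time.

    priority_order: task ids in an order that respects precedence (assumed input)
    exec_cost[t][p]: execution cost of task t on processor p
    comm[(u, t)]:    communication cost of edge u -> t (zero on the same processor)
    preds[t]:        immediate predecessors of task t
    """
    n_procs = len(next(iter(exec_cost.values())))
    proc_free = [0.0] * n_procs      # time at which each processor becomes available
    finish, placed = {}, {}          # actual finish time and processor of each task
    for t in priority_order:
        best = None
        for p in range(n_procs):
            # Data-ready time: all predecessor results must have arrived on p.
            ready = max((finish[u] + (0.0 if placed[u] == p else comm[(u, t)])
                         for u in preds.get(t, [])), default=0.0)
            eft = max(proc_free[p], ready) + exec_cost[t][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        finish[t], placed[t] = best
        proc_free[best[1]] = best[0]
    return placed, max(finish.values())
```

Because every decision is made once and never revisited, the result depends entirely on the quality of the priority order, which is the limitation of heuristic methods noted above.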
According to the No-Free-Lunch Theorem [34], all well-designed metaheuristic methods have the same performance in searching for optimal solutions when averaged over all possible fitness functions. In comparison with the heuristic methods, the metaheuristic methods, which always have a much higher computational cost, can obtain better schedule quality, because they search a wider area of the solution space with guided-random-search-based processes, while the search of heuristic-based algorithms is narrowed down to a much smaller portion of the solution space by the heuristics.

2.2. CRO-Based Algorithms for DAG Scheduling on Heterogeneous Systems. CRO was proposed by Lam and Li very recently [25], and, as far as we know, Double Molecular Structure-Based Chemical Reaction Optimization (DMSCRO) [27] and Tuple-Based Chemical Reaction Optimization (TMSCRO) [30] are the only two CRO-based metaheuristic algorithms for DAG scheduling on heterogeneous systems. CRO-based algorithms mimic the chemical reaction process in a closed container, which obeys energy conservation. The molecules in CRO-based algorithms have two kinds of energy, potential energy (PE) and kinetic energy (KE). A buffer is also used in CRO-based algorithms for energy interchange and conservation. Moreover, to find the solution with the global minimal makespan, four types of elementary chemical reactions, on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis, are applied for the intensification and the diversification searches. The typical execution flow of the CRO framework adopted in DMSCRO and TMSCRO is as proposed in [25], and the parameters used in CRO are presented in Table 1.
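To make the role of PE, KE, and the buffer concrete, the following sketch shows the energy bookkeeping of an on-wall ineffective collision as it is commonly described for the general CRO framework of [25]. It is an illustration under assumptions, not code from DMSCRO or TMSCRO: the function name and the parameter name `ke_loss_rate` are hypothetical, and the scheduling-specific part (how the new structure and its PE are produced) is left out.

```python
import random

def on_wall_collision(pe_old, ke_old, pe_new, buffer, ke_loss_rate, rng=random):
    """Return (accepted, ke_new, buffer) for an on-wall ineffective collision.

    The new structure is accepted only if the molecule holds enough energy
    to reach it; part of the surplus stays with the molecule as KE and the
    rest is moved to the central buffer, so total energy is conserved.
    """
    surplus = pe_old + ke_old - pe_new
    if surplus < 0:
        return False, ke_old, buffer          # not enough energy: change rejected
    q = rng.uniform(ke_loss_rate, 1.0)        # fraction of the surplus kept as KE
    ke_new = surplus * q
    return True, ke_new, buffer + surplus * (1.0 - q)
```

A molecule with some KE can therefore accept a slightly worse solution (higher PE), which is what lets the search leave a local optimum.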
As metaheuristic methods, DMSCRO and TMSCRO achieve better schedule quality than heuristic methods, for the reason presented in the last paragraph of Section 2.1. The experimental results in [27,30] show that both DMSCRO and TMSCRO outperform GA. DMSCRO is the first algorithm to apply CRO, proposed by Lam and Li in [25], to the DAG scheduling problem, and it enjoys the advantages of both GA and SA. On the one hand, the intermolecular collision and the on-wall collision designed in DMSCRO have effects similar to the crossover operation and the mutation operation in GA, respectively. On the other hand, the energy conservation requirement in DMSCRO guides the search for the optimal solution in a way similar to how the Metropolis algorithm guides the evolution of the solutions in SA. Two additional operations, decomposition and synthesis, give DMSCRO more opportunities to jump out of a local optimum and explore wider areas of the solution space. This benefit enables DMSCRO to find good solutions faster than GA, which has been widely used to evolve solutions for many task scheduling problems. DMSCRO is not compared with SA in [27,30], because the underlying principles and philosophies of DMSCRO and SA differ considerably [27]. Typically, metaheuristic algorithms that operate on a population of solutions, like CRO-based or GA-based algorithms, are able to find good solutions faster than those that operate on a single solution, like SA-based algorithms. Compared with DMSCRO, TMSCRO applies the constrained earliest finish time algorithm to data pretreatment in order to take advantage of the super molecule and the constrained critical paths [35], which serve as heuristic information for accelerating convergence. Moreover, the molecular structure and the design of the elementary reaction operators in TMSCRO are more reasonable than those in DMSCRO with respect to intensification and diversification of the search of the solution space.
However, for solving the NP-hard problem of DAG scheduling on heterogeneous systems, the CRO-based algorithms TMSCRO and DMSCRO still incur a very large time expenditure as metaheuristic scheduling strategies; therefore, their searching capabilities and convergence rates need to be improved. TMSCRO and DMSCRO have three deficiencies. First, in [30], the concept of the super molecule is applied for accelerating convergence and the super molecule is selected by heuristic scheduling strategies, but the performance of this kind of super molecule selection method is affected by the range of problems. Second, in both TMSCRO and DMSCRO, the initial molecules, which are very important for the whole searching process, are randomly created, and the uncertainty of this kind of initialization undermines the searching capabilities of TMSCRO and DMSCRO. Third, the intensification capabilities of CRO-based algorithms for DAG scheduling need to be improved in order to obtain better average results when the iteration stopping criteria are satisfied.
Therefore, this paper proposes an algorithm, Double-Reaction-Structured CRO (DRSCRO), for DAG task scheduling on heterogeneous systems, aiming to obtain schedules of better quality. The conventional CRO framework is modified, and two reaction phases, one for super molecule selection and another for global optimization, are developed in DRSCRO. CRO as a metaheuristic algorithm is utilized in the super molecule selection phase to obtain a super molecule [31] for a better convergence rate. Moreover, in the global optimization phase, the variable neighborhood search (VNS) algorithm [21,32,36], an effective metaheuristic that uses neighborhood structures and a local search to change the neighborhood systematically, is used to optimize the initial molecule, and one of its neighborhood structures is also adopted in the reaction operator to promote the intensification capability. In addition, a new model for processor selection is proposed and utilized in the neighborhood structures of the VNS algorithm for better effectiveness.
In [21], VNS was combined with GA for DAG scheduling, but the task priority was unchangeable in the VNS algorithm of [21], which reduces the efficiency of VNS in obtaining a better solution. So, different from [21], to promote the intensification capability of the whole algorithm, the VNS in DRSCRO is modified to consider the optimization of both the scheduling order and the processor assignment.

Problem Formulation
The DAG scheduling problem typically has two inputs: a heterogeneous system for parallel task computing and a parallel application program (i.e., a DAG). In this paper, the heterogeneous system is assumed to be a static computing system model represented by $P = \{p_m \mid m = 1, 2, 3, \ldots, |P|\}$, which is a fully connected network of processors. The heterogeneity level in this paper is formulated as (1+hl%)/(1−hl%), where the parameter hl ∈ (0, 1). $\mathrm{EcCost}_{p_m}(v_i)$ represents the computation cost of a task $v_i$ mapped to the processor $p_m$, and the value of each $\mathrm{EcCost}_{p_m}(v_i)$ is randomly chosen within the range [1 − hl%, 1 + hl%].
In general, $\mathrm{DAG} = (V, E)$ consists of a task (node) set $V$ and an edge set $E$. $\mathrm{EcCost}_{p_m}(v_i)$ is as defined in the first paragraph of this section, and a task in the DAG is executed on the same processor without preemption. The constraint between tasks $v_i$ and $v_j$ is denoted as the edge $e_{i,j}$ ($e_{i,j} \in E$), which means that task $v_j$ can be executed only after the execution result of task $v_i$ has been transmitted to task $v_j$. Each edge $e_{i,j}$ has a nonnegative weight $\mathrm{comm}(v_i, v_j)$ denoting the communication cost between $v_i$ and $v_j$. Each task in a DAG can only be executed on one processor, and communication can be performed simultaneously by the processors. In addition, when two communicating tasks are mapped to the same processor, their communication cost is zero. $\mathrm{predecessor}(v_i)$ represents the set of the predecessors of $v_i$, while $\mathrm{successor}(v_i)$ represents the set of the successors of $v_i$. The task with no predecessor is denoted as $v_{entry}$, while the task with no successor is denoted as $v_{exit}$.
Consider a DAG with $|V|$ tasks to be mapped to a heterogeneous system with $|P|$ processors. Assuming that the highest-priority ready task $v_i$ is on the processor $p_m$, the earliest start time of $v_i$, $T_{ESTime}(v_i, p_m)$, can be formulated as
$$T_{ESTime}(v_i, p_m) = \max\{T_{avail}(p_m),\ T_{ready}(v_i, p_m)\}, \tag{1}$$
where $T_{avail}(p_m)$, defined in (2), is the time when processor $p_m$ is available for the execution of the task $v_i$:
$$T_{avail}(p_m) = \max_{v_j \in \mathrm{exec}(p_m)} T_{AFTime}(v_j), \tag{2}$$
where $\mathrm{exec}(p_m)$ represents all the tasks which have already been scheduled on the processor $p_m$ and $T_{AFTime}(v_j)$ denotes the actual finish time when the task $v_j$ finishes its execution.
$T_{ready}(v_i, p_m)$ in (1) represents the time when all the data needed for the processing of $v_i$ have been transmitted to $p_m$, which is formulated as
$$T_{ready}(v_i, p_m) = \max_{v_j \in \mathrm{predecessor}(v_i)} \{T_{AFTime}(v_j) + \mathrm{comm}(v_j, v_i)\}, \tag{3}$$
where $T_{AFTime}(v_j)$ has the same definition as in (2) and $\mathrm{predecessor}(v_i)$ denotes the set of all the immediate predecessors of task $v_i$. $\mathrm{comm}(v_j, v_i)$ is 0 if task $v_j$ and task $v_i$ are mapped to the same processor $p_m$.
If task $v_i$ is mapped to the processor $p_m$ with the nonpreemptive processing approach, the earliest finish time of task $v_i$, $T_{EFTime}(v_i, p_m)$, is formulated as
$$T_{EFTime}(v_i, p_m) = T_{ESTime}(v_i, p_m) + \mathrm{EcCost}_{p_m}(v_i). \tag{4}$$
After the task $v_i$ is executed by the processor $p_m$, $T_{EFTime}(v_i, p_m)$ is assigned to $T_{AFTime}(v_i)$. The makespan of the entire parallel program is equivalent to the actual finish time of the exit task $v_{exit}$:
$$\mathrm{makespan} = T_{AFTime}(v_{exit}). \tag{5}$$
The communication-to-computation ratio (CCR) is computed from the communication costs of the edges and the average computation costs of the tasks, where $W(v_i)$ is the average computation cost of task $v_i$ and can be calculated as follows:
$$W(v_i) = \frac{1}{|P|} \sum_{m=1}^{|P|} \mathrm{EcCost}_{p_m}(v_i). \tag{7}$$
A simple four-task DAG and a heterogeneous computation system with three processors are shown in Figures 1(a) and 1(b), respectively. The definitions of the notations can be found in Table 2.

Design of DRSCRO
DRSCRO imitates the molecular interactions in chemical reactions based on the concepts of atoms, molecules, molecular structure, and the energy of a molecule. In DRSCRO, a molecule corresponds to a scheduling solution in DAG scheduling, with a unique molecular structure representing the atom positions in the molecule. We adopt the molecular structure of TMSCRO in our work because of its capability to represent the constraint relationships between the tasks in a molecule (solution). In addition, the energy of each molecule corresponds to the fitness value of a solution. The molecular interactions try to reconstruct a more stable molecular structure with lower energy. There are four kinds of basic chemical reactions: on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis.

Table 2: Definitions of notations.
Notation: Definition
$T_{ESTime}(v_i, p_m)$: The earliest start time of task $v_i$ which is mapped to processor $p_m$
$T_{EFTime}(v_i, p_m)$: The earliest finish time of task $v_i$ which is mapped to processor $p_m$
$T_{avail}(p_m)$: The time when processor $p_m$ is available
$T_{ready}(v_i, p_m)$: The time when all the data needed for the processing of $v_i$ have been transmitted to $p_m$
$T_{AFTime}(v_i)$: Actual finish time when task $v_i$ finishes its execution
$\mathrm{predecessor}(v_i)$: Set of the predecessors of task $v_i$
$\mathrm{successor}(v_i)$: Set of the successors of task $v_i$
$\mathrm{exec}(p_m)$: Set of the tasks which have already been scheduled on the processor $p_m$
$W(v)$: Average computation cost of task $v$
CCR: Communication-to-computation ratio
hl: Parameter for adjusting the heterogeneity level in a heterogeneous system

[Figure 1: (a) A simple four-task DAG. (b) A heterogeneous computation system with three processors.]

Algorithm 1: Calculating the fitness function value Fit(m).
makespan = 0;
for each node ...
    ...
end for
return makespan;

The VNS algorithm with a new model for processor selection is adopted as the initialization of the global optimization phase, and it is also utilized as a local search process to promote the intensification capability of DRSCRO. There are four kinds of elementary chemical reactions in DRSCRO: on-wall collision, decomposition, intermolecular collision, and synthesis. Each kind of reaction contains two types of operators, which are utilized in the two phases of DRSCRO, respectively. In each iteration, one of the elementary chemical reaction operators is performed to generate new molecules, and the PEs of the newly generated molecules (i.e., their fitness function values) are calculated. In addition, SMole is tracked and only participates in on-wall ineffective collisions and intermolecular ineffective collisions in the global optimization phase, in order to explore as much of the solution space in its neighborhoods as possible; the main purpose is to prevent the super molecule from changing dramatically. The iteration of each phase repeats until the stopping criteria (or the next-phase criteria) are met, and SMole and its fitness function value are the final solution and the makespan (i.e., the global minimum point), respectively. In the experiments in this paper, both the next-phase criterion and the stopping criterion of DRSCRO are met when there is no makespan improvement after 10000 consecutive iterations in the search loop.

4.2. Molecular Structure and Fitness Function. This subsection presents the encoding of scheduling solutions (i.e., the molecular structure) and the statement of the fitness function in DRSCRO.

4.2.1. Molecular Structure. In this paper, an atom with three elements is denoted as a tuple $(v_i, f_i, p_i)$, and the molecular structure $M$, an array of tuples, can be formulated as in (8) to represent a solution to the DAG scheduling problem:
$$M = ((v_1, f_1, p_1), (v_2, f_2, p_2), \ldots, (v_{|V|}, f_{|V|}, p_{|V|})). \tag{8}$$
The order of the tuples in $M$ represents the priority of each DAG task $v_i$ with the allocated processor $p_i$, and $V = (v_1, v_2, \ldots, v_{|V|})$ is a topological sequence of the DAG, in which the hypothetical entry task (with no predecessors) $v_1$ and exit task (with no successors) $v_{|V|}$ represent the beginning and the end of execution, respectively. Moreover, if tuple A is before tuple B and $v_A$ is the predecessor of $v_B$ in the DAG, the second element of tuple B, $f_B$, will be 1; otherwise it will be 0.
4.2.2. Fitness Function. Potential energy (PE) is defined as the fitness function value of the corresponding solution represented by $M$. The overall schedule length of the entire DAG, namely, the makespan, is the largest finish time among all tasks, which is equivalent to the actual finish time of the exit node in the DAG. In this paper, the goal of DAG scheduling by DRSCRO is to obtain the schedule that minimizes the makespan while ensuring that the precedence of the tasks is not violated. Hence, each fitness function value is defined as
$$\mathrm{Fit}(m) = \mathrm{makespan} = T_{AFTime}(v_{exit}).$$
Algorithm 1 presents how to calculate the value of the fitness function Fit(m).
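The fitness evaluation can be illustrated with a short sketch that walks through a molecule in tuple order and applies the start-time and finish-time definitions of Section 3. It is a hedged reading of what Algorithm 1 computes, not a transcription of it: the function name `fitness` and the dictionary-based inputs `exec_cost`, `comm`, and `preds` are assumptions, tasks and processors are plain integers, and the flag $f_i$ is carried along but not used.

```python
def fitness(molecule, exec_cost, comm, preds):
    """Makespan of a molecule given as a list of (task, flag, processor) tuples.

    exec_cost[t][p]: execution cost of task t on processor p
    comm[(u, t)]:    communication cost of edge u -> t
    preds[t]:        immediate predecessors of task t
    """
    proc_free = {}   # time at which each processor becomes available
    finish = {}      # actual finish time of each scheduled task
    placed = {}      # processor of each scheduled task
    for task, _flag, proc in molecule:
        ready = 0.0
        for u in preds.get(task, []):
            if u not in finish:
                raise ValueError(f"precedence violated: {u} must precede {task}")
            # Communication is free when both tasks share a processor.
            c = 0.0 if placed[u] == proc else comm[(u, task)]
            ready = max(ready, finish[u] + c)
        start = max(proc_free.get(proc, 0.0), ready)
        finish[task] = start + exec_cost[task][proc]
        placed[task] = proc
        proc_free[proc] = finish[task]
    # With a single exit task this equals its actual finish time.
    return max(finish.values())
```

The explicit precedence check reflects the requirement that the tuple order be a topological sequence; a molecule that breaks it is rejected rather than scored.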

Initialization.
There are two kinds of initial molecule generators, one used in the super molecule selection phase and the other used in the global optimization phase, to generate the initial solutions for DRSCRO to manipulate. The tuples of the first molecule m used in the initialization of the super molecule selection phase are ascendingly ordered by the upward rank value [27] of their $v_i$, and the third element $p_i$ of each tuple is generated by a random perturbation. The upward rank value is calculated as defined in [27]. A detailed description of the initial molecule generator of the super molecule selection phase is given in Algorithm 2. For the first input molecule m, $p_i$ in each tuple of m is set to $p_1$.
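As an illustration of a rank-based priority, the sketch below computes upward ranks under the standard HEFT-style definition: the rank of a task is its average computation cost plus the largest value of communication cost plus rank over its successors. This definition is an assumption made for the example and may differ in detail from the one used in [27]; the function name and input layout are hypothetical.

```python
from functools import lru_cache

def upward_ranks(exec_cost, comm, succs):
    """Upward rank of every task under the HEFT-style definition (assumed here).

    exec_cost[t]: list of execution costs of task t, one per processor
    comm[(t, s)]: communication cost of edge t -> s
    succs[t]:     immediate successors of task t
    """
    def avg_cost(t):
        return sum(exec_cost[t]) / len(exec_cost[t])

    @lru_cache(maxsize=None)
    def rank(t):
        # An exit task has no successors, so its rank is its average cost.
        tail = max((comm[(t, s)] + rank(s) for s in succs.get(t, [])), default=0.0)
        return avg_cost(t) + tail

    return {t: rank(t) for t in exec_cost}
```

Under this definition a task always has a larger rank than any of its successors, so the direction of the sort determines whether the resulting sequence starts from the entry task or from the exit task.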

Elementary Chemical Reaction Operators.
In DRSCRO, the operators for super molecule selection only change the $p_i$ of each tuple in a molecule randomly, as intensification searches or diversification searches [25], to optimize the processor mapping of a solution. Figures 3, 4 ...

Algorithm 3: InitVNS(pop_set), the subset generator.
tempSet = pop_set;
pop_subset = ∅;
if pop_set is the input of the VNS algorithm for the first time (i.e., the output of the super molecule selection phase)
    for each tempS in tempSet except SMole
        choose a tuple $(v_i, f_i, p_i)$ in tempS, where $f_i = 0$, randomly;
        generate a random number rnd ∈ (0, 1);
        if rnd ≥ 0.5
            find the first predecessor $v_j = \mathrm{Pred}(v_i)$ from $v_i$ to the beginning of molecule tempS;
            interchange the positions of $(v_i, f_i, p_i)$ and $(v_{j+1}, f_{j+1}, p_{j+1})$ in molecule tempS;
            update $f_i$, $f_{j+1}$, and $f_{i+1}$ as defined in the last paragraph of Section 4.2.1;
        end if
        for each $p_i$ in molecule tempS
            change $p_i$ randomly;
        end for
        if Fit(tempS) < Fit(SMole)
            SMole = tempS;
        end if
    end for
end if
...

As shown in Figure 5, the operator IntermoleSMS is used to generate new molecules $m'_1$ and $m'_2$ from given molecules $m_1$ and $m_2$. This operator first uses the steps of OnWallSMS to generate $m'_1$ from $m_1$, and then generates the other new molecule $m'_2$ from $m_2$ in a similar fashion. In the end, the operator generates two new molecules $m'_1$ and $m'_2$ from $m_1$ and $m_2$ as an intensification search. As shown in Figure 6, the operator SynthSMS is used to generate a new molecule $m'$ from given molecules $m_1$ and $m_2$. SynthSMS works as follows: the operator keeps in $m'$ the tuples that are at the same position in $m_1$ and $m_2$ with the same $p_i$, and then changes the remaining $p_i$ values in $m'$ randomly. As a result, the operator generates $m'$ from $m_1$ and $m_2$ as a diversification search.
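The SynthSMS rule lends itself to a very small sketch. The version below assumes that both input molecules list the tasks in the same order, which holds in this phase because its operators only change $p_i$; processors are represented as integers in `range(n_procs)`, and the function name and the `rng` parameter are hypothetical.

```python
import random

def synth_sms(m1, m2, n_procs, rng=random):
    """Combine two molecules: keep agreeing processor choices, randomize the rest.

    m1, m2: lists of (task, flag, processor) tuples with identical task order
    """
    child = []
    for (task, flag, p1), (_t2, _f2, p2) in zip(m1, m2):
        # Positions where both parents agree are inherited unchanged.
        proc = p1 if p1 == p2 else rng.randrange(n_procs)
        child.append((task, flag, proc))
    return child
```

Positions on which the parents disagree are the ones redrawn at random, which is why the operator acts as a diversification search.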

4.4. Global Optimization Phase
4.4.1. Initialization. VNS is utilized by our proposed algorithm as the initialization of the global optimization phase, and it also serves as a local search process to promote the intensification capability of DRSCRO during the running of the whole algorithm.
Algorithms 3 and 4, respectively, present the subset generator of the phase output and the main steps of the whole VNS algorithm (i.e., the initialization of the global optimization phase). In DRSCRO, the VNS algorithm only processes the subset of the population with the super molecule, SMole, after each iteration in the global optimization phase (the output of the super molecule selection phase is the input of VNS for the first time). As presented in Algorithm 3, if pop_set (i.e., the population set) is the output of the super molecule selection phase, the tuple orders and the $p_i$ values of its elements will be adjusted. pop_subset is the subset of the population and pop_subset_num is the number of elements in pop_subset, which is set to PopSize × 50% in this paper.
In Algorithm 4, different from the VNS proposed in [21], the task priority is changeable in the VNS algorithm used in DRSCRO, because an unchangeable task priority reduces the efficiency of VNS in obtaining a better solution. Therefore, to optimize both the scheduling order and the processor assignment, the input molecules of VNS can have a different tuple order (i.e., task priority) in each iteration, as presented in Algorithm 3. $d_{max}$ is set to 2 as presented in [37]. As the essential factor of VNS, two neighborhood structures, the load balance and the communication reduction neighborhood structures, which have demonstrated their power in solving the DAG scheduling problem on heterogeneous systems as presented in [21], are adopted by the VNS algorithm in DRSCRO for their high efficiency. In this paper, a new model is also proposed for the processor selection of these two neighborhood structures. As presented in [21], two intuitions are used to construct the neighborhood structures. One is that balancing the load among the processors usually helps to minimize the makespan, especially when most tasks are allocated to only a few processors; the other is that reducing the communication overhead and the idle waiting time of processors always results in a more effective schedule, especially given a relatively high unit communication cost. However, these two intuitions contradict each other, because reducing the communication overhead and the idle waiting time of processors always means that a few processors hold most of the tasks.
Algorithm 4: The VNS algorithm.
pop_subset = InitVNS(pop_set);
Select the set of neighborhood structures $h_d$ ($d = 1, 2, 3, \ldots, d_{max}$);
for each individual m in pop_subset do
    d = 1;
    while $d < d_{max}$ do
        Randomly generate a molecule $m_1$ from the $d$th neighborhood of m;
        Apply some local search method with $m_1$ as the initial molecule (the local optimum is denoted by $m_2$);
        ...

So, different from the original ones in [21], we develop a new model for processor selection. Let $TC_{load}(p_i)$ be the total task execution cost on processor $p_i$, and let $TC_{comm}(p_i)$ be the communication cost overhead of processor $p_i$ as defined in [21]. The values of $TC_{load}(p_i)$ and $TC_{comm}(p_i)$ express the tendencies of load balancing and communication reduction, respectively (i.e., the tendency of reducing or increasing the tasks on a processor). The greater $TC_{load}(p_i)$ is, the stronger the tendency to reduce the tasks on $p_i$ is, and the greater $TC_{comm}(p_i)$ is, the stronger the tendency to increase the tasks on $p_i$ is. Therefore, a parameter $\mathrm{Tend}(p_i)$ is developed to measure the tendency by combining $TC_{load}(p_i)$ and $TC_{comm}(p_i)$, as in (11). The neighborhood structure computation processes of load balance and communication reduction are presented in Algorithms 5 and 6, respectively. The proposed model takes both intuitions into comprehensive consideration and can make the VNS algorithm more effective than the original one.
The VNS algorithm in DRSCRO utilizes dual termination criteria. Termination criterion 1 sets the upper bound of the local search iterations to 20, and termination criterion 2 sets the maximum number of iterations without improvement to 3. The VNS algorithm stops if either criterion is satisfied. To form a new initial population, a combination strategy is used to combine the current population and the VNS output after the VNS algorithm outputs the subset of the population. The current population and the VNS output are first merged and sorted by increasing makespan; then the first PopSize molecules are selected to form the new initial population.
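The dual termination criteria and the combination strategy can be sketched as two small helpers. This is an illustrative outline only: `neighbor` stands in for whichever neighborhood move is applied, `fit` for the makespan evaluation, and all function and parameter names are assumptions rather than names from the algorithm listings.

```python
def local_search(start, neighbor, fit, max_iters=20, max_no_improve=3):
    """Local search that stops when either termination criterion is met."""
    best, best_fit = start, fit(start)
    stale = 0
    for _ in range(max_iters):             # criterion 1: at most max_iters iterations
        if stale >= max_no_improve:        # criterion 2: too many non-improving steps
            break
        cand = neighbor(best)
        cand_fit = fit(cand)
        if cand_fit < best_fit:
            best, best_fit, stale = cand, cand_fit, 0
        else:
            stale += 1
    return best

def combine_population(current, vns_output, pop_size, fit):
    """Merge both sets, sort by increasing makespan, keep the first pop_size molecules."""
    merged = list(current) + list(vns_output)
    merged.sort(key=fit)
    return merged[:pop_size]
```

Truncating the merged, sorted list keeps the population size constant while letting improved molecules from VNS displace the worst current ones.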

4.4.2. Elementary Chemical Reaction Operators. The operators for global optimization not only vary the $p_i$ of each tuple but also interchange the positions of the tuples in a molecule, as intensification searches or diversification searches [25], to optimize the whole solution.
On-wall ineffective collision (as an intensification search), decomposition (as a diversification search), and synthesis (as a diversification search) are as presented in [30], and we do not repeat them here in order to focus on our main work. In [30], the function of the intermolecular ineffective collision operator is similar to that of the on-wall ineffective collision operator. Therefore, different from [30], a modified intermolecular ineffective collision operator, IntermoleGO, is proposed in this paper, and it utilizes the load balance neighborhood structure used in the VNS mentioned above. The operator first generates a new molecule $m'_1$ from $m_1$ and then generates the other new molecule $m'_2$ from $m_2$ in a similar fashion. In the end, the operator generates two new molecules $m'_1$ and $m'_2$ from $m_1$ and $m_2$ as an intensification search. The detailed execution is presented in Algorithm 7. Figure 7 shows an example of IntermoleGO, in which the molecules correspond to the DAG shown in Figure 1(a).

Illustrative Example. Consider the example shown in Figure 1(a). Its edges are labeled with the communication costs, whereas the execution costs are shown in Table 3.
In the global optimization phase, Algorithm 4, InitMole-GOVNS, is then executed to generate (or to update) the initial population after each iteration, as presented in Section 4.1. The molecules are operated on during the iterations of the global optimization phase as presented in the framework of DRSCRO in Section 4.1, and the global minimal makespan of 40 is finally obtained, for which the corresponding solution (i.e., molecule) is $((v_1, 0, p_2), (v_4, 1, p_2), (v_2, 0, p_2), (v_3, 1, p_2))$.

Analysis of DRSCRO.
As a new metaheuristic strategy, the recently proposed CRO-based methods for DAG scheduling have demonstrated their capability for solving this kind of NP-hard optimization problem. By analyzing the framework, molecular structure, chemical reaction operators, and operational environment of DRSCRO, it can be shown to some extent that the DRSCRO scheme has three advantages in comparison with other CRO-based algorithms for DAG scheduling. First, to some degree, the super molecule in DRSCRO is similar to InitS in TMSCRO [30] or the "elite" in GA [31]. However, the "elite" in GA is usually generated from two chromosomes, while the super molecule is obtained by executing the first phase of DRSCRO. Moreover, in comparison with TMSCRO, DRSCRO uses a metaheuristic strategy (CRO) to get a better super molecule. This is because, as an intelligent random-search algorithm, the CRO used in the super molecule selection phase of DRSCRO searches a wider area of the solution space than the CEFT applied in TMSCRO, which narrows the search down to a very small portion of the solution space. As a result, a better super molecule may contribute to a better global optimum solution and accelerate convergence. Second, the DAG scheduling problem has two complex aspects, task sequence optimization and processor assignment optimization, which lead to a very large and complicated solution space. So, for a better intensification search capability than other CRO-based algorithms for DAG scheduling on heterogeneous systems, DRSCRO applies the VNS algorithm as the initialization of the global optimization phase, which also serves as a local search process during the running of DRSCRO, and one of the neighborhood structures of VNS is also utilized in the ineffective reaction operator. Third, during the running of DRSCRO, the task priority is changeable in our adopted VNS algorithm, and a new model for processor selection is also utilized in the neighborhood structures to promote the efficiency of VNS, different from the VNS proposed in [21]. All three advantages enhance the ability to achieve faster convergence and better search results in the whole solution space, which is demonstrated by the experimental results in Section 5.

Experimental Details
In this section, the simulation experiments and a comparative evaluation of HEFT, DMSCRO, TMSCRO, and the proposed DRSCRO are presented. As presented in [27,30], theoretical analysis and experimental results have shown that TMSCRO and DMSCRO perform better than GA; therefore, our work is a further study of CRO-based algorithms for DAG scheduling on heterogeneous systems, and, since DRSCRO is a metaheuristic algorithm, we focus on the performance of the proposed algorithm itself and on the comparison between DRSCRO and other algorithms of a similar kind.
First, two extensive sets of graphs used as the test beds for the comparative study are described. Next, the parameter settings used in the simulation experiments are presented. The results and analysis of the experiments, including the makespan test and the convergence rate test, are given in the final part.

Test Bed.
As presented in [27,30], two extensive sets of DAGs, real-world application graphs and randomly generated application graphs, are considered as the test beds in the experiments to enhance the comparability of the various algorithms. The first extensive test bed consists of two real-world problem DAGs, molecular dynamics code [38] and Gaussian elimination [8]. Molecular dynamics is a computer simulation of the physical movements of molecules and atoms, which are allowed to interact for a period of time, in the context of N-body simulation. The molecular dynamics code DAG is shown in Figure 8. Gaussian elimination is used to calculate the solution of a linear equation system by systematically applying row operations to convert a set of linear equations to upper triangular form. As shown in Figure 9, the total number of tasks in the Gaussian elimination DAG with a matrix size of 7 is 27, and the largest number of tasks at the same level is 6. These two application graphs are used as a test bed not only to enhance the comparability of the various algorithms but also to illustrate, without loss of generality, how our proposed algorithm is applied. The second extensive test bed for the comparative study consists of the DAGs of random graphs. A random graph generator presented in [39] is implemented to generate random graphs in the simulation experiment. It allows the user to generate a variety of random graphs with different characteristics, such as the CCR, the amount of calculation of a task, the successor number of a task, and the total number of tasks in a random graph. It is also assumed that all tasks and communication links have the same computation cost and communication cost, respectively.
As shown in Figure 2, the next-phase criterion and the stopping criterion of DRSCRO are that the makespan stays unchanged for 5000 consecutive iterations in the search loop. The stopping criterion of TMSCRO and DMSCRO is that the makespan remains the same for 10000 consecutive iterations.
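The task count quoted for the Gaussian elimination DAG can be checked with a small generator. The sketch below assumes the commonly used Gaussian elimination task graph, in which each elimination step k has one pivot task followed by one update task per remaining column; the function name and the tuple labels are illustrative and are not taken from the paper.

```python
def gaussian_elimination_dag(m):
    """Build an assumed Gaussian elimination task graph for an m x m matrix.

    Returns (tasks, edges, levels); levels lists the tasks that share a depth.
    """
    tasks, edges, levels = [], [], []
    for k in range(1, m):
        pivot = ("pivot", k)
        updates = [("update", k, j) for j in range(k + 1, m + 1)]
        tasks.append(pivot)
        tasks.extend(updates)
        levels.append([pivot])
        levels.append(updates)
        # Every update of step k needs the pivot of step k.
        edges.extend((pivot, u) for u in updates)
        if k > 1:
            # The pivot of step k needs column k as updated in step k - 1.
            edges.append((("update", k - 1, k), pivot))
            # Each remaining column needs its own previous update.
            for j in range(k + 1, m + 1):
                edges.append((("update", k - 1, j), ("update", k, j)))
    return tasks, edges, levels


tasks, edges, levels = gaussian_elimination_dag(7)
task_count = len(tasks)
max_width = max(len(level) for level in levels)
```

Under this assumed structure, a matrix of size 7 gives 27 tasks with at most 6 tasks on one level, which agrees with the figures stated for Figure 9; in general the count is (m^2 + m - 2)/2.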
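A stagnation-based stopping rule of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: `step` is a hypothetical callable that returns the makespan found in one iteration, and `patience` stands for the 5000 (DRSCRO) or 10000 (TMSCRO, DMSCRO) consecutive iterations mentioned above.

```python
def run_until_stagnant(step, initial_makespan, patience):
    """Iterate until the best makespan is unchanged for `patience` iterations.

    step: hypothetical callable taking the current best makespan and
          returning the makespan produced by one search iteration.
    Returns (best_makespan, iterations_executed).
    """
    best = initial_makespan
    unchanged = 0
    iterations = 0
    while unchanged < patience:
        candidate = step(best)
        iterations += 1
        if candidate < best:
            best = candidate
            unchanged = 0  # an improvement resets the stagnation counter
        else:
            unchanged += 1
    return best, iterations


# Toy search step: improves by 1 per iteration until it reaches 10.
result = run_until_stagnant(lambda m: max(m - 1, 10), 15, patience=3)
```

In DRSCRO the same test would be evaluated twice, once to move from the super molecule selection phase to the global optimization phase and once to stop the search.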

Parameter Setting.
In the experiments, a parameter hl is set to represent the heterogeneity level, as presented in the first paragraph of Section 3. It complies with the MHM model assumption and results in the speeds of a computing processor being different for different tasks. In doing so, the heterogeneity level (1 + hl%)/(1 − hl%) is equal to the biggest possible ratio of the best processor speed to the worst processor speed for each task. Unless otherwise specified in this paper, hl is set to the value that makes the heterogeneity level 2. The details of the parameter settings are shown in Table 4. Parameters 6-12, which are those of the CRO-based algorithms tested in the simulation, are set as presented in [25].
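The value of hl that yields a given heterogeneity level follows from inverting the ratio above: with x = hl%, L = (1 + x)/(1 − x) gives x = (L − 1)/(L + 1). The short sketch below works this out; the function names are illustrative only.

```python
def heterogeneity_level(hl_percent):
    """Ratio (1 + hl%) / (1 - hl%) for hl given in percent (0 <= hl < 100)."""
    x = hl_percent / 100.0
    return (1.0 + x) / (1.0 - x)


def hl_for_level(level):
    """Invert the ratio: the hl (in percent) that produces the given level."""
    if level < 1.0:
        raise ValueError("the heterogeneity level cannot be below 1")
    return 100.0 * (level - 1.0) / (level + 1.0)


default_hl = hl_for_level(2.0)  # one third, i.e. about 33.3 percent
```

For the default heterogeneity level of 2 this gives hl% = 1/3, so the best processor speed for a task can be at most twice the worst.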

Makespan Tests.
The performance of the proposed algorithm is compared with two state-of-the-art CRO-based scheduling algorithms, DMSCRO and TMSCRO, and a heuristic algorithm, HEFT. Each makespan value plotted in the graphs is the average over a number of independent runs. In the first extensive test bed, the makespan is averaged over 10 independent runs (HEFT is run only once, as it is a deterministic algorithm), while in the second extensive test bed the makespan is averaged over 30 different random graph running instances. Moreover, to prove the robustness of DRSCRO, the best final value achieved in all these runs, the worst final value, and the related standard deviation or variance are also presented. As shown in Figures 10 and 11, the average makespan decreases as the processor number increases. The results also show that DRSCRO, TMSCRO, and DMSCRO, which are all metaheuristic methods, achieve very similar performance. This is because, according to the No-Free-Lunch Theorem, all well-designed metaheuristic methods have the same performance in searching for optimal solutions when averaged over all possible fitness functions. The TMSCRO and DMSCRO used in the simulation are well-designed and taken from the literature. This therefore indicates that the DRSCRO developed in our work is also well-designed.

Real-World Application Graphs
A close observation of the results in Tables 10 and 11 shows that DRSCRO slightly outperforms TMSCRO and DMSCRO on average. The reason is that DRSCRO has a better intensification search capability, obtained by applying VNS and by utilizing one of its neighborhood structures in the ineffective reaction operator, as presented in the last paragraph of Section 4.6. Therefore, the average results obtained by DRSCRO are better than those obtained by TMSCRO and DMSCRO when the stopping criterion is satisfied. Moreover, DRSCRO, TMSCRO, and DMSCRO typically outperform HEFT because, as metaheuristic methods, they search a wider area of the solution space, while the search of HEFT is narrowed down to a much smaller portion by its heuristics.
Figures 12 and 13 and the results in Tables 7 and 8 show the performance of these four algorithms as the CCR value increases. It can be seen that the average makespan increases with the CCR value. This is because the heterogeneous processors stay idle for longer as the DAGs become more communication-intensive. It can also be observed that DRSCRO, TMSCRO, and DMSCRO outperform HEFT, and the advantage becomes more significant as the value of CCR increases, which suggests that a heuristic algorithm like HEFT has less consistent performance across a wide range of scheduling scenarios.
Figure 14 shows the performance of these four algorithms as the processor number increases: DRSCRO always outperforms TMSCRO, DMSCRO, and HEFT. Figure 15 shows that DMSCRO has better performance than the other three algorithms as the task number increases. The reasons for these results are similar to those explained in the third paragraph of Section 5.3.1. Figure 16 shows the average makespan as the CCR value increases. It can be seen that the average makespan increases rapidly as the value of CCR rises. The reason is that the DAG becomes more communication-intensive as CCR increases, which leads to the processors staying idle for longer.

Convergence Tests.
In this section, convergence experiments are conducted to show the change of makespan among DRSCRO, TMSCRO, and DMSCRO. The convergence traces and significance tests further reveal the differences between DRSCRO and the other two algorithms.
In these experiments, as suggested in [27], the stopping criterion of these three algorithms is that the total running time reaches a set value (e.g., 180 s). For comparability, the time counting of DRSCRO begins at the start of the global optimization phase. In the first extensive test bed, the makespan is averaged over 10 independent runs, while in the second extensive test bed the makespan is averaged over 30 different random graph running instances.

Convergence Trace.
The convergence traces of DRSCRO, TMSCRO, and DMSCRO for processing the molecular dynamics code and Gaussian elimination are plotted in Figures 17 and 18, respectively. Figures 19-21 show the convergence traces when processing the randomly generated DAG sets containing 10, 20, and 50 tasks, respectively. As shown in Figures 17-21, the convergence traces of these three algorithms differ noticeably, and DRSCRO converges faster than the other two algorithms in every case. The reason for the better convergence rate of DRSCRO is as presented in the last paragraph of Section 4.6 (i.e., DRSCRO takes advantage of its double-reaction structure to obtain a better super molecule for accelerating convergence). Even though the VNS algorithm adds to the time cost of each iteration, the enhanced optimization capability of DRSCRO still allows it to obtain a better convergence rate than TMSCRO and DMSCRO. The simulation results show that DRSCRO converges faster than TMSCRO by 19.4% on average (by 29.3% in the best case) and faster than DMSCRO by 33.9% on average (by 41.2% in the best case).
Moreover, the statistical analysis based on the average values achieved is also presented in Section 5.4.2, to prove that DRSCRO outperforms the other CRO-based algorithms for DAG scheduling from a statistical point of view.

Significance Tests.
Statistical analysis is necessary for the average convergence rates obtained in all cases by DRSCRO, TMSCRO, and DMSCRO, which are metaheuristic methods, in order to find significant differences among these results. Nonparametric tests are used, following the recommendations in [40], since the experimental results may exhibit neither a normal distribution nor variance homogeneity. Therefore, the Friedman test and the Quade test are applied to check whether significant differences exist in the performance of these three algorithms. A significance level α = 0.05 is used in all statistical tests. Tables 12 and 13 list the results of the Friedman test and the Quade test, respectively, both of which reject the null hypothesis of equivalent performance. In both tests, the proposed DRSCRO is not only compared against all the algorithms but also compared against the remaining ones as the control method. The results in Tables 12 and 13 validate the significant differences in the performance of DRSCRO, TMSCRO, and DMSCRO.
In sum, it can be concluded that DRSCRO, which is the control algorithm, statistically outperforms the other CRO-based DAG scheduling algorithms on convergence rate at a significance level of 0.05.
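For readers unfamiliar with the Friedman test, the statistic can be computed by hand as sketched below. This is a plain-Python illustration without a tie correction, and the sample matrix is made-up data used only to exercise the function; it is not taken from Tables 12 and 13.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(results):
    """Friedman chi-square for n cases (rows) and k algorithms (columns).

    Uses chi2 = 12n / (k(k+1)) * sum(mean_rank_j ** 2) - 3n(k+1).
    """
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    return 12.0 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks) - 3.0 * n * (k + 1)


# Made-up example: three algorithms ranked identically on five cases.
illustrative = [[1.0, 2.0, 3.0]] * 5
statistic = friedman_statistic(illustrative)
```

With three algorithms the statistic is compared against a chi-square distribution with 2 degrees of freedom, whose critical value at the 0.05 level is about 5.991; the made-up example above gives 10.0 and would therefore reject the null hypothesis.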

Discussion
The experimental results of the makespan tests show that the performance of DRSCRO is very similar to that of other metaheuristic algorithms of a similar kind because, according to the No-Free-Lunch Theorem, each well-designed metaheuristic algorithm has the same performance in searching for optimal solutions when averaged over all possible fitness functions. However, as the experimental results of the convergence tests show, the proposed DRSCRO can achieve better performance and find good solutions faster than the other similar metaheuristic algorithms. The reason, as analyzed in the last paragraph of Section 4.6, is that DRSCRO creates a better super molecule by a metaheuristic method and, considering the optimization of both scheduling order and processor assignment, takes advantage of the VNS algorithm in the global optimization phase to improve its optimization capability. A load balance neighborhood structure is also applied in the ineffective reaction operator for a better intensification capability. The new processor selection model utilized in the neighborhood structures also promotes the efficiency of the VNS algorithm.

Conclusion and Future Study
An algorithm named Double-Reaction-Structured CRO (DRSCRO) is developed for DAG scheduling on heterogeneous systems in this paper. DRSCRO includes two reaction phases, one for super molecule selection and another for global optimization. Unlike other CRO-based algorithms for DAG scheduling on heterogeneous systems, the super molecule selection phase uses a metaheuristic method to obtain a super molecule for a better convergence rate. In addition, to promote the intensification capability of DRSCRO, the VNS algorithm, with a new model for processor selection utilized in its neighborhood structures, is used as the initialization of the global optimization phase, and the load balance neighborhood structure of VNS is also applied in the ineffective reaction operator. The experimental results show that DRSCRO achieves a higher speedup than the other CRO-based algorithms known to us. DRSCRO can also obtain a better average makespan in some cases.
In future work, we will analyze the parameter sensitivity of DRSCRO to improve its effectiveness. Moreover, to make the proposed algorithm more practical, DRSCRO will also be extended to address two main objectives: (1) minimization of schedule length (time domain) and (2) minimization of the number of used processors (resource domain).

Algorithm 1 :
Fit(m) calculating the fitness value of a molecule and the processor allocation optimization.

Algorithm 3 :
InitVNS(pop_set, pop_subset_num) initializing the subset of the population pop_set for undergoing the VNS algorithm.

Figure 9 :
Figure 9: A Gaussian elimination DAG for a matrix of size 7.
show the simulation experiment results of DRSCRO, DMSCRO, TMSCRO, and HEFT on the real-world application graphs, and Tables 4-7 list the detail of the experimental results.

Figure 12 :
Figure 12: Average makespan for the molecular dynamics code; the processors number is 16.

Figure 14 :
Figure 14: Average makespan for random graphs under different processor numbers; task number is 50 and CCR = 0.2.

Figure 15 :
Figure 15: Average makespan for random graphs under different task numbers; processors number is 32 and CCR = 10.

Figure 16 :
Figure 16: Average makespan of DRSCRO for the random graph under different CCRs; the task number is 50.

Figure 17 :
Figure 17: Convergence trace for the molecular dynamics code; CCR = 1 and the number of processors is 16.

Figure 18 :
Figure 18: Convergence trace for Gaussian elimination; CCR = 0.2 and the number of processors is 8.

Figure 19 :
Figure 19: Convergence trace for the set of the randomly generated DAGs with 10 tasks.

Figure 20 :
Figure 20: Convergence trace for the set of the randomly generated DAGs with 20 tasks.

Figure 21 :
Figure 21: Convergence trace for the set of the randomly generated DAGs with 50 tasks.
the National High Technology Research and Development Program of China (863 Program) (Grant no. 2015AA020107).

Table 1 :
Parameters used in CRO.
The PE value of a molecule is calculated by the fitness function and is equal to the objective value, the makespan, of the corresponding solution. KE helps the molecule escape from local optima, and its value is nonnegative.
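Since PE equals the makespan of the encoded schedule, a fitness evaluation can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's Fit(m): tasks are placed in the given order without insertion into idle slots, communication cost is charged only between tasks on different processors, and all names and the three-task example are hypothetical.

```python
def makespan(order, assign, preds, comp, comm):
    """Makespan of a schedule given as a task order plus a processor assignment.

    order:  tasks in an order that respects precedence constraints
    assign: task -> processor index
    preds:  task -> list of predecessor tasks
    comp:   comp[task][proc] is the execution cost of task on proc
    comm:   comm[(u, v)] is the communication cost of edge u -> v
    """
    finish = {}
    proc_free = {}
    for task in order:
        proc = assign[task]
        ready = 0.0
        for p in preds.get(task, []):
            # Data from a predecessor on the same processor arrives at no cost.
            delay = 0.0 if assign[p] == proc else comm[(p, task)]
            ready = max(ready, finish[p] + delay)
        start = max(ready, proc_free.get(proc, 0.0))
        finish[task] = start + comp[task][proc]
        proc_free[proc] = finish[task]
    return max(finish.values())


# Hypothetical three-task DAG (a -> b, a -> c) on two processors.
preds = {"b": ["a"], "c": ["a"]}
comp = {"a": [2.0, 3.0], "b": [3.0, 1.0], "c": [2.0, 2.0]}
comm = {("a", "b"): 4.0, ("a", "c"): 1.0}
pe_first = makespan(["a", "b", "c"], {"a": 0, "b": 0, "c": 1}, preds, comp, comm)
pe_second = makespan(["a", "b", "c"], {"a": 0, "b": 1, "c": 1}, preds, comp, comm)
```

In the example, keeping b on the same processor as a avoids the expensive a -> b transfer, so the first assignment has the lower PE.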

Table 2 :
Definitions of notations.
Input directed acyclic graph, with nodes representing tasks and edges representing constrained relations among the tasks
Set of heterogeneous processors in target system
EcCost

pop_subset adds SMole;
pop_subset adds the molecules in tempSet with tuple order different from SMole;
while |pop_subset| ≠ pop_subset_num do
pop_subset adds a molecule in pop_set which does not exist in pop_subset;
end while

Choose the processor p_max with the largest Tend(p_max);
Randomly choose a task v_random from exec(p_max);
Randomly choose a processor p_random different from p_max;
Reallocate v_random to the processor p_random;
Encode and reschedule the changed solution;
return the changed solution;
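The load-balance steps listed above (pick the processor that finishes last, move one of its tasks to another processor) can be sketched as follows. The data layout and names are assumptions made for illustration, and the final encode-and-reschedule step is left to the caller.

```python
import random


def load_balance_move(exec_lists, tend, rng):
    """Return a new assignment with one task moved off the busiest processor.

    exec_lists: processor -> list of tasks executed on it (not modified)
    tend:       processor -> finish time of the last task on that processor
    rng:        a random.Random instance, passed in for reproducibility
    """
    new_lists = {p: list(ts) for p, ts in exec_lists.items()}
    p_max = max(tend, key=tend.get)
    others = [p for p in new_lists if p != p_max]
    if not new_lists[p_max] or not others:
        return new_lists  # nothing can be moved
    v_random = rng.choice(new_lists[p_max])
    p_random = rng.choice(others)
    new_lists[p_max].remove(v_random)
    new_lists[p_random].append(v_random)
    return new_lists


# Hypothetical schedule: processor 0 finishes last, so it gives up one task.
current = {0: ["t1", "t2", "t3"], 1: ["t4"], 2: []}
finish_times = {0: 9.0, 1: 4.0, 2: 0.0}
moved = load_balance_move(current, finish_times, random.Random(0))
```

The returned assignment would then be encoded as a molecule and rescheduled to obtain its new makespan.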

Table 3 :
Execution cost of DAG tasks by each processor.
from given molecules m1 and m2. This operator first uses the steps in OnWallGO to generate m′1 from m1, and then the operator generates the other new molecule m

Table 4 :
Parameter values for simulation experiment.

Table 5 :
Experiment results for the molecular dynamics code, CCR = 1.0.

Table 7 :
Experiment results for the molecular dynamics code; the processors number is 16.

Table 8 :
Experiment results for Gaussian elimination graph; the processors number is 8.

Table 9 :
Experiment results for random graphs under different processor numbers; task number is 50 and CCR = 0.2.

Table 10 :
Experiment results for random graphs under different task numbers; processors number is 32 and CCR = 10.

Table 11 :
Experiment results of DRSCRO for the random graph under different CCRs; the task number is 50.