A Genetic Algorithm for Task Scheduling on NoC Using FDH Cross Efficiency

A CrosFDH-GA algorithm is proposed for the multicriterion task scheduling problem on NoC-based MPSoCs. First, four common criteria, namely, makespan, data routing energy, average link load, and workload balance, are extracted from the task scheduling problem on NoC and used to construct the DEA DMU model. Then FDH analysis is applied to the problem, and an FDH cross efficiency formulation is derived for evaluating the relative advantage among schedule solutions. Finally, we introduce the DEA approach to the genetic algorithm and propose a CrosFDH-GA scheduling algorithm to find the most efficient schedule solution for a given scheduling problem. The simulation results show that our FDH cross efficiency formulation effectively evaluates the performance of schedule solutions. In comparative simulations, CrosFDH-GA produces more metrics-balanced schedule solutions than other multicriterion algorithms.


Introduction
The scheduling problem has been a research hotspot since it was proposed in the 1950s as the job-shop scheduling problem [1]. After computer technology emerged in the 1940s, the scheduling problem also found its place in computer science, as the task scheduling problem on the uniprocessor in the 1960s [2], the multiprocessor in the 1970s [3], distributed computing in the 1980s [4], and grid computing in the early 21st century [5]. Chip fabrication technology has now brought us to the single-chip multicore era [6]. The presence of the chip multiprocessor (CMP), especially the NoC (network-on-chip)-based MPSoC [7], brings new challenges to task scheduling algorithm design.
In the NoC solution, the introduction of network infrastructure to chip design, along with the newly arisen concept of green communication [8], changes the goal of the scheduling algorithm from single-objective optimization of makespan to simultaneous optimization of multiple metrics: not only the traditional makespan, but also energy [9] and NoC criteria [10]. Some of these optimizations even conflict with each other, so the goal of scheduling algorithm design shifts toward balancing these multiple metrics.
On the other hand, DEA (data envelopment analysis) is a nonparametric technique used to measure the relative efficiency of multi-input, multioutput DMUs (decision-making units). It was first presented by Charnes, Cooper, and Rhodes as the CCR model in 1978 [11] and then developed into several variations based on different RTS (returns-to-scale) assumptions. The concept of efficiency in DEA gives us a reasonable standard for making trade-offs between multiple metrics.
In this paper, an FDH DEA model of NoC task scheduling is constructed, and an FDH cross efficiency formulation based on peer appraisal is proposed for further assessment of the relative advantages of DMUs. The proposed DEA approach is then introduced to the genetic algorithm, and a CrosFDH-GA scheduling algorithm is proposed for the task scheduling problem on NoC to find the most efficient and balanced schedule solution.
The rest of this paper is organized as follows. Section 2 summarizes the related work; Section 3 formulates the task scheduling problem on NoC; the FDH and FDH cross efficiency formulations are given in Section 4; Section 5 presents our CrosFDH-GA scheduling algorithm; simulation results and discussion are given in Section 6; Section 7 concludes the paper.

Related Works
Multicriterion scheduling algorithms for CMPs have been widely researched in recent years. In [12], a scheduling algorithm is proposed for multicore processors to avoid resource contention as well as to reduce energy consumption. A modified genetic algorithm that incorporates a bacteriological algorithm is proposed in [13] to maximize system reliability and reduce makespan. In [14], the optimization of makespan and workload balance is addressed, and an NSGA-II based scheduling algorithm is proposed for multicore-based grids. A multiobjective evolutionary algorithm (MOEA) based scheduling heuristic is proposed in [15] for the joint optimization of performance, energy, and temperature on multicore processors.
As for the application of DEA to task scheduling on multicore systems, an FDH-based evaluation method for the assessment of scheduling heuristics is proposed in [16]. Although both [16] and our work adopt the DEA FDH model as the analytic tool, our work introduces the concept of cross efficiency to the FDH model and uses FDH cross efficiency to rank schedules. Moreover, the incorporation of the DEA evaluation method into a metaheuristic also distinguishes our work from [16].

Problem Formulation
3.1. Task Model. In this paper, tasks are modeled using directed acyclic graphs (DAGs). A DAG $G = (V, E)$ is an acyclic graph where $V$ is the set of nodes, which represent the tasks, and $E$ is the set of edges, in which an element $e_{ij}$ denotes the communication from task $i$ to task $j$. The edge indicates the precedence relation between two tasks.
Each node $v_i$ and edge $e_{ij}$ is associated with a weight, denoted by $w_i$ and $c_{ij}$, respectively. Weight $w_i$ is the computational load required by a processing element (PE) to execute task $i$, and $c_{ij}$ is the data transmission load between task $i$ and task $j$. In our work, both $w_i$ and $c_{ij}$ are expressed in time units (cycles).
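As an illustration of the task model, the sketch below stores a small DAG as two dictionaries and checks the precedence relation with a topological sort. The four-task graph, its weights, and the function name `topological_order` are hypothetical and are not taken from the paper's benchmarks.

```python
from collections import deque

# Hypothetical 4-task DAG; all weights are in cycles.
w = {0: 10, 1: 20, 2: 15, 3: 5}                      # node weights (computational load)
c = {(0, 1): 4, (0, 2): 6, (1, 3): 3, (2, 3): 2}     # edge weights (transmission load)

def topological_order(nodes, edges):
    """Return the tasks in an order that respects every edge (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for (i, j) in edges:
        successors[i].append(j)
        indegree[j] += 1
    ready = deque(sorted(v for v in nodes if indegree[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for s in successors[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")
    return order
```

Any schedule must execute the tasks of each PE in an order compatible with such a topological order.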

Network-on-Chip Hardware.
The target hardware is a 2D mesh NoC-based MPSoC, as illustrated in Figure 1(a). Each PE is connected to a router, and routers are interconnected through bidirectional links. Data is transferred through the NoC in the form of packets.
PEs are homogeneous processor cores with a local data cache. If two consecutive tasks are scheduled to the same PE, the successor task reads the predecessor's data directly from the data cache of the PE without routing in the NoC.
The microstructure of a NoC router is shown in Figure 1(b). The router has five Inports and Outports corresponding to the five directions of East, West, North, South, and Local. The decoder in the Inport scans the first flit of the FIFO for any incoming packet. If the decoder detects the head flit of a packet, it performs the XY routing algorithm and sends a request signal to the arbiter of the corresponding Outport. If the arbiter receives multiple request signals, the contention is resolved using round-robin arbitration. The granted Inport then forwards the packet to the downstream router. Wormhole routing is adopted to minimize the buffer requirement as well as the packet latency [17]. The back pressure mechanism is also employed to further reduce end-to-end delay [18].
The energy model of a NoC is given by the bit energy proposed in [19]. Analytically, the average energy consumption of transmitting one bit from node $i$ to node $j$ is calculated by
\[
E_{bit}^{i,j} = n_{hops} \times E_{R_{bit}} + (n_{hops} - 1) \times E_{L_{bit}}, \tag{1}
\]
where $E_{R_{bit}}$ and $E_{L_{bit}}$ represent the energy consumed on the node and on the link, respectively, and $n_{hops}$ is the number of nodes the bit passes on its way from node $i$ to node $j$.
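The bit energy can be evaluated directly once the hop count is known. The sketch below assumes the usual form of the bit energy model, in which a bit that visits $n_{hops}$ nodes crosses $n_{hops} - 1$ links, and assumes XY routing on a mesh whose tiles are addressed by hypothetical (x, y) coordinates. The function names and the energy values in the usage note are illustrative only.

```python
def hops_xy(src, dst):
    """Number of nodes visited under XY routing, both endpoints included.

    src and dst are (x, y) tile coordinates on the mesh (an assumed addressing).
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1]) + 1

def bit_energy(src, dst, e_node, e_link):
    """Energy of moving one bit from src to dst.

    Assumed form: n_hops node traversals plus (n_hops - 1) link traversals.
    """
    n_hops = hops_xy(src, dst)
    return n_hops * e_node + (n_hops - 1) * e_link
```

For example, with the made-up values `e_node = 1.0` and `e_link = 0.5`, a bit sent from tile (0, 0) to tile (2, 1) visits 4 nodes and 3 links, so `bit_energy((0, 0), (2, 1), 1.0, 0.5)` evaluates to 5.5.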

Monitored Metrics.
In this paper, four common metrics of NoC are extracted and monitored for the assessment of schedule solutions: makespan, data routing energy, average link load [20], and workload balance [21].
Makespan (the $M$ metric), which is the amount of time required by a NoC to finish all tasks in a DAG following the instruction of a schedule, is the time metric of a schedule, while the data routing Energy (the $E$ metric) is the total amount of energy dissipated in each NoC component during the execution of the DAG.
The average Link load (the $L$ metric) is calculated by adding up all the data transmission time (in cycles) on each link and then dividing it by the number of links (for the NoC in Figure 1(a), there are 48 links). The $L$ metric represents how busy the NoC is during the execution, and a higher average link load implies a higher possibility of link contention. A good schedule is supposed to reduce the $L$ metric.
Finally, the workload Balance (the $B$ metric) is defined to be the inverse of the coefficient of variation of the total workload on each processor, as shown in (2), where $load_{prc}(p)$ is the actual load on processor $p$ and $load_{ave}$ is the average load. The $B$ metric reflects the load balance of the processors:
\[
\text{workload balance} = \frac{load_{ave}}{\left(\sum_{p} \left(load_{ave} - load_{prc}(p)\right)^{2}\right)^{1/2}}. \tag{2}
\]
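The two NoC-specific metrics are simple enough to compute by hand. The sketch below follows (2) as printed, without a 1/n factor under the root, and assumes that the 48 links quoted for Figure 1(a) count each direction of a 4 × 4 mesh separately. All function names are ours.

```python
import math

def workload_balance(loads):
    """B metric per (2): average load over the root of summed squared deviations."""
    ave = sum(loads) / len(loads)
    deviation = math.sqrt(sum((ave - load) ** 2 for load in loads))
    # A perfectly balanced schedule has zero deviation; report it as infinite balance.
    return math.inf if deviation == 0 else ave / deviation

def mesh_link_count(rows, cols):
    """Unidirectional links in a rows x cols mesh (two per neighbouring router pair)."""
    return 2 * (rows * (cols - 1) + cols * (rows - 1))

def average_link_load(link_cycles, n_links):
    """L metric: total transmission cycles over all links divided by the link count."""
    return sum(link_cycles) / n_links
```

With per-processor loads of 9, 11, 9, and 11 cycles, the average is 10 and the root of the summed squared deviations is 2, so the balance is 5. `mesh_link_count(4, 4)` returns 48.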

DEA Evaluation of Schedules
4.1. A Brief Review of DEA. Data envelopment analysis (DEA) is a nonparametric technique that is widely used to measure the relative efficiency among many-input, many-output decision-making units (DMUs), which in our context are the schedules. The efficiency of a DMU is defined as the weighted sum of its outputs divided by the weighted sum of its inputs. The essence of DEA is that it allows each DMU to choose a particular set of weight coefficients which favors its own efficiency, under the constraint that the efficiencies of all DMUs calculated by this set of coefficients do not exceed 1. A DMU is "efficient" if the efficiency calculated by DEA is 1; otherwise the DMU is marked as "inefficient." The following linear programming (LP) problem is the CCR model of DEA (CCR multiplier form):
\[
\begin{aligned}
\max\quad & z = u \cdot y_i \\
\text{s.t.}\quad & v \cdot x_i = 1, \\
& u \cdot y_j - v \cdot x_j \le 0, \quad j = 1, 2, \ldots, n, \\
& u \ge 0, \quad v \ge 0,
\end{aligned} \tag{3}
\]
where $x_j$ and $y_j$ are the vectors of the inputs and outputs of DMU $j$; $n$ is the number of DMUs; and the objective function $z$ is the efficiency of DMU $i$.
The result of applying DEA is a classification of DMUs into an efficient group and an inefficient group. The efficient DMUs form an efficient frontier in the multi-input, multioutput space that envelops all inefficient DMUs. The projection of an inefficient DMU on the efficient frontier is the hypothetical efficient unit, which is a linear combination of the efficient DMUs. An inefficient DMU can also be converted to an efficient DMU by proportionally scaling down its inputs by the value of its efficiency while maintaining its original outputs. This interpretation of DEA efficiency corresponds to the envelope form of DEA, which is the dual problem of (3) (CCR envelope form):
\[
\begin{aligned}
\min\quad & \theta \\
\text{s.t.}\quad & X\lambda \le \theta x_i, \\
& Y\lambda \ge y_i, \\
& \lambda \ge 0,
\end{aligned} \tag{4}
\]
where $X = (x_1\ x_2\ \cdots\ x_n)$ and $Y = (y_1\ y_2\ \cdots\ y_n)$ are the input and output matrices; $\lambda = (\lambda_1\ \lambda_2\ \cdots\ \lambda_n)^T \in \mathbb{R}^n$ is a nonnegative vector; and $\theta$ is the efficiency of DMU $i$.
DEA models differ in the way the efficient frontier is generated, which reflects different returns-to-scale assumptions. There are four basic returns-to-scale assumptions: constant returns to scale (CRS), corresponding to the CCR model [11]; variable returns to scale (VRS), corresponding to the BCC model [22]; increasing returns to scale; and decreasing returns to scale.
In this paper, we focus on a special case of DEA, namely, the free disposal hull (FDH) [23]. In the VRS FDH formulation of DEA, each DMU is evaluated by comparing it to other DMUs on a one-on-one basis, and a DMU is considered efficient only when no other DMU dominates it.
In most cases, the idea of DEA that lets each DMU specify its own weight coefficients to show its maximum advantage is desirable. However, in some extreme cases, a DMU can "cheat" a high efficiency score by weighting a single input or a single output and setting the remaining weight coefficients close to 0. This can happen when some DMUs have a particularly small input or a particularly large output; in our words, these DMUs have unbalanced metrics, and these "mavericks" need to be depreciated. Moreover, although DEA effectively discriminates between efficient and inefficient DMUs, it does not further assess the relative advantages among the efficient ones.
One solution to the above problems is to introduce cross efficiency into the efficiency measurement. Cross efficiency, as opposed to the simple efficiency implied by the original DEA, is the peer-appraisal equivalent of DEA's self-appraisal process. The cross efficiency of a certain DMU is the efficiency value calculated by using weight coefficients derived by other DMUs. If a DMU is a maverick or has unbalanced metrics, then its cross efficiency value derived from other DMUs' coefficients is not likely to be high.

Mathematical Problems in Engineering
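Two widely accepted cross efficiency formulations, the aggressive and benevolent formulations, were proposed in [24] based on the CCR model. Both formulations add a secondary goal to the normal DEA efficiency calculation (maximizing the reference DMU's efficiency): the aggressive formulation minimizes the target DMU's cross efficiency, while the benevolent formulation maximizes it. Given that the simple DEA efficiency of DMU $i$ is $\theta_i$, the cross efficiency of DMU $j$ evaluated by DMU $i$ is defined by (CCR cross efficiency, benevolent formulation)
\[
\begin{aligned}
\max\quad & z_{ij} = u \cdot y_j \\
\text{s.t.}\quad & v \cdot x_j = 1, \\
& u \cdot y_i - \theta_i \, (v \cdot x_i) = 0, \\
& u \cdot y_k - v \cdot x_k \le 0, \quad k = 1, 2, \ldots, n, \\
& u \ge 0, \quad v \ge 0.
\end{aligned} \tag{5}
\]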

FDH DEA Evaluation of Schedules.
In order to apply data envelopment analysis to schedule evaluation, the multi-input, multioutput DMU model needs to be defined using the schedule metrics, namely, makespan ($M$), routing energy ($E$), average link load ($L$), and workload balance ($B$), proposed in Section 3. The classification of metrics as inputs and outputs follows a simple "rule of thumb" [16]: if a larger value of a metric is better, then the metric is an output; otherwise it is an input. As a result of this classification, the DMU model of our schedule evaluation is as follows: $x = (M\ E\ L)^T$ are the inputs, and $y = B$ is the output.
Moreover, in the rest of this paper the term "schedule" refers to a schedule that is discriminable using the four-metric classification. If two schedules have identical metrics, they are regarded as the same schedule; at least they are not discriminable under the current metrics.
With the scheduling DMU model, the observed schedules are defined as follows.
Another concept that is relevant to DEA is the possible production set.The possible production set is the space enveloped by the efficient frontier on the multi-input multioutput space.Normally, the possible production set is unknown in the DEA and needs to be constructed using the observed DMUs.The FDH possible production set postulates were proposed in [23].Here, under our context, we restate these postulates as the following axiom.
Axiom 1 (possible schedule set).The possible schedule set (PSS) of a schedule set Γ satisfies the following.
Postulate II is the free disposal postulate, which suggests a free disposal hull of $\Gamma$ [25]. Together with the deterministic Postulate I, it defines the FDH possible schedule set of our schedule efficiency analysis. Now, we formally introduce FDH DEA to the schedule evaluation and define the efficient schedule as follows.
Definition 2 (efficient schedule and efficient schedule set). In a schedule set $\Gamma$, a schedule $\gamma_i$ is called efficient if the optimization problem (6) has an optimal value of $z^* = 1$; otherwise $\gamma_i$ is inefficient. The set of all efficient schedules in $\Gamma$ is called the efficient schedule set, denoted by $\Gamma_{\mathrm{eff}}$, and likewise, the inefficient schedule set is denoted by $\Gamma_{\mathrm{ineff}}$:
\[
\begin{aligned}
\min\quad & z = \theta - \varepsilon \, (\mathbf{1} \cdot s^{+} + s^{-}) \\
\text{s.t.}\quad & X\lambda + s^{+} = \theta x_i, \\
& Y\lambda - s^{-} = y_i, \\
& \mathbf{1} \cdot \lambda = 1, \\
& \lambda_j \in \{0, 1\}, \quad j = 1, 2, \ldots, n, \\
& s^{+} \ge 0, \quad s^{-} \ge 0,
\end{aligned} \tag{6}
\]
where $x_i = (M_i\ E_i\ L_i)^T$ and $y_i = B_i$ are the input vector and output scalar of $\gamma_i$; $n$ is the number of schedules in $\Gamma$; $X = (x_1\ x_2\ \cdots\ x_n)$ and $Y = (y_1\ y_2\ \cdots\ y_n)$ are the input and output matrices of $\Gamma$; $\lambda = (\lambda_1\ \lambda_2\ \cdots\ \lambda_n)^T$ is a binary vector in $\mathbb{R}^n$; $s^{+}$ and $s^{-}$ are the nonnegative slack variables, representing input excess and output shortfall; $\varepsilon$ is a non-Archimedean infinitesimal constant; and $\theta$ is the efficiency of $\gamma_i$. Optimization problem (6) is a mixed-integer programming (MIP) problem. The binary vector $\lambda$ and the constraint $\mathbf{1} \cdot \lambda = 1$ enforce a one-on-one comparison between the target schedule $\gamma_i$ and all the schedules in $\Gamma$ to search for a reference schedule $\gamma_j$ that minimizes $\theta$.
The first two constraints of problem (6) can be simplified to (7) by removing the slack variables:
\[
X\lambda \le \theta x_i, \qquad Y\lambda \ge y_i. \tag{7}
\]
Obviously, the problem is feasible, since choosing $\gamma_i$ itself as the reference ($j = i$) gives $\theta = 1$. From this point on, if a reference schedule $\gamma_j$ is found with $\theta < 1$, then $\gamma_j$ produces an output $y_j$ at least as large as $y_i$ with inputs no more than $\theta x_i$, which is a scale-down of $x_i$. This makes $\gamma_i$ inefficient. However, $\theta = 1$ alone is not enough for a schedule to be efficient. Consider an efficient schedule $\gamma = (M\ E\ L\ B)$ and an inefficient schedule $\hat{\gamma} = (M + \Delta_M\ E\ L\ B)$, which is distinguished from $\gamma$ by a small input excess $\Delta_M$ in the makespan; the constraint (7) holding for $\gamma$ also holds for $\hat{\gamma}$; thus $\theta = 1$ for $\hat{\gamma}$. The slack variables $s^{+}$ and $s^{-}$ are introduced to remove these schedules with input excess and output shortfall. Nonzero $s^{+}$ and $s^{-}$ force the objective function in (6) to be less than 1.
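For intuition, the one-on-one comparison behind the FDH model can be carried out by plain enumeration instead of a MIP solver. The sketch below is not the authors' implementation: it assumes each schedule is a tuple (M, E, L, B) with strictly positive inputs, the name `fdh_theta` is ours, and it returns only the radial factor θ, so the slack correction discussed above is deliberately left out.

```python
def fdh_theta(target, schedules):
    """Radial FDH input efficiency of `target` against all `schedules`.

    Each schedule is (M, E, L, B): three inputs (smaller is better) and one
    output (larger is better). Inputs are assumed to be strictly positive.
    """
    *x_t, y_t = target
    best = 1.0  # comparing the target with itself always yields theta = 1
    for *x_r, y_r in schedules:
        if y_r < y_t:
            continue  # a reference must produce at least the target's output
        # Smallest theta for which x_r <= theta * x_t holds in every input.
        theta = max(xr / xt for xr, xt in zip(x_r, x_t))
        best = min(best, theta)
    return best
```

With the made-up pool `[(10, 5, 2, 1.0), (12, 6, 3, 1.0), (11, 5, 2, 1.0)]`, the second schedule scores about 0.83 because the first one uses at most 5/6 of each of its inputs. The third schedule scores 1.0 even though the first one beats it in makespan, which is exactly the input-excess case that the slack variables are meant to catch.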
Definition 2 implies the dominance/Pareto optimality of an efficient schedule. A dominant schedule is defined as follows.

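Definition 3 (dominant schedule). A schedule $\gamma_j$ dominates a schedule $\gamma_i$ if (I) no component of $x_j$ is greater than that of $x_i$; (II) $y_j = B_j$ is not less than $y_i$; and (III.a) at least one component of $x_j$ is less than the corresponding one of $x_i$ (dominates in input), or (III.b) $y_j$ is greater than $y_i$ (dominates in output).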
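The dominance relation translates directly into a pairwise test. The following sketch assumes schedules are stored as (M, E, L, B) tuples with the three inputs first and the output last; the function names are ours.

```python
def dominates(a, b):
    """True if schedule a dominates schedule b; schedules are (M, E, L, B)."""
    *xa, ya = a
    *xb, yb = b
    # (I) and (II): a is no worse than b in every input and in the output.
    no_worse = all(p <= q for p, q in zip(xa, xb)) and ya >= yb
    # (III.a) or (III.b): a is strictly better in at least one metric.
    strictly_better = any(p < q for p, q in zip(xa, xb)) or ya > yb
    return no_worse and strictly_better

def efficient_set(schedules):
    """Schedules that no other schedule in the list dominates."""
    return [s for s in schedules
            if not any(dominates(t, s) for t in schedules)]
```

A schedule never dominates itself under this test, because the strict-improvement condition fails. Filtering a pool with `efficient_set` therefore keeps exactly the non-dominated schedules.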
Then the relationship between efficient schedule and dominant schedule is given in Theorem 4.

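Theorem 4 (dominance and efficiency). In a schedule set $\Gamma$, a schedule $\gamma_i$ is efficient if and only if no other schedule dominates it.
Proof. If Part. Suppose there is a schedule $\gamma_j$ that dominates $\gamma_i$.
The efficiency of $\gamma_i$ is calculated by solving the following optimization problem:
\[
\begin{aligned}
\min\quad & z = \theta - \varepsilon \, (s_M^{+} + s_E^{+} + s_L^{+} + s_B^{-}) \\
\text{s.t.}\quad & \textstyle\sum_{k=1}^{n} \lambda_k M_k + s_M^{+} = \theta M_i, \\
& \textstyle\sum_{k=1}^{n} \lambda_k E_k + s_E^{+} = \theta E_i, \\
& \textstyle\sum_{k=1}^{n} \lambda_k L_k + s_L^{+} = \theta L_i, \\
& \textstyle\sum_{k=1}^{n} \lambda_k B_k - s_B^{-} = B_i, \\
& \textstyle\sum_{k=1}^{n} \lambda_k = 1, \quad \lambda_k \in \{0, 1\}, \\
& s_M^{+},\ s_E^{+},\ s_L^{+},\ s_B^{-} \ge 0.
\end{aligned} \tag{8}
\]
Let $\lambda_j = 1$ and $\theta = 1$; then the constraints of (8) are converted to
\[
M_j + s_M^{+} = M_i, \quad E_j + s_E^{+} = E_i, \quad L_j + s_L^{+} = L_i, \quad B_j - s_B^{-} = B_i. \tag{9}
\]
From the domination relation, we have $M_i \ge M_j$, $E_i \ge E_j$, $L_i \ge L_j$, and $B_i \le B_j$, and at least one of the above inequalities holds strictly. This means at least one of $s_M^{+}$, $s_E^{+}$, $s_L^{+}$, and $s_B^{-}$ in (9) is not zero. So (8) has a feasible solution with objective value $z_i = 1 - \varepsilon \cdot (s_M^{+} + s_E^{+} + s_L^{+} + s_B^{-}) < 1$, and schedule $\gamma_i$ is not efficient.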
Theorem 4 relates the relatively abstract concept of FDH efficiency to the concept of dominance. Moreover, the following two corollaries are deduced from Theorem 4.

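Corollary 5. In a schedule set $\Gamma$, if a schedule $\gamma_i$ has the smallest makespan, the smallest routing energy, the smallest average link load, or the largest workload balance among the schedules in $\Gamma$, then it is in the efficient schedule set $\Gamma_{\mathrm{eff}}$.
Proof. If $\gamma_i$ satisfies one of the above conditions, then there is no schedule dominating $\gamma_i$. From Theorem 4, $\gamma_i$ is efficient.
Corollary 5 points out that the schedule with the smallest makespan, the smallest energy consumption, the smallest average link load, or the best workload balance is an efficient schedule.
Corollary 6. In a schedule set $\Gamma$, removing any inefficient schedule from $\Gamma$ does not change the elements in $\Gamma_{\mathrm{eff}}$.
Proof. The schedules in $\Gamma_{\mathrm{eff}}$ are dominant ones, and removing any inefficient schedule will not change the dominant position of the elements in $\Gamma_{\mathrm{eff}}$. Thus, $\Gamma_{\mathrm{eff}}$ remains unchanged.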

FDH Cross Evaluation of Schedules.
In this section, a cross evaluation process is proposed based on peer-appraisal FDH DEA for further assessment of schedules. DEA calculates the efficiency of a DMU by allowing the DMU to choose a scenario that is best for itself. Although this basis is plausible in most situations, some "maverick" DMUs, especially DMUs with a single small input or a single large output, may "cheat" DEA to achieve a high score by valuing their only strength and depreciating other metrics. These unbalanced DMUs must be devalued during further assessment. Moreover, an assessment of relative advantages among efficient DMUs is also required.
The FDH model is a MIP problem in nature. In order to derive its peer-appraisal variation, the dual problem of the FDH DEA in (6) is needed to construct the formulation. Normally, it is difficult to write the dual problem of a MIP; however, by exploiting the particular structure of the vector $\lambda$, Agrell has proven in [26] that the MIP problem in the FDH model can be simplified to an LP problem. Using his result, the FDH envelope form in (6) is reduced to the LP problem given in
\[
\begin{aligned}
\min\quad & \textstyle\sum_{j=1}^{n} \theta_j \\
\text{s.t.}\quad & \theta_j x_i \ge \lambda_j x_j, \\
& \lambda_j y_j \ge \lambda_j y_i, \\
& \textstyle\sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{aligned} \tag{11}
\]
The dual problem of (11) is (FDH multiplier form equivalent)
\[
\begin{aligned}
\max\quad & z \\
\text{s.t.}\quad & v_j \cdot x_i = 1, \\
& u_j \cdot (y_i - y_j) + v_j \cdot x_j - z \ge 0, \\
& u_j \ge 0, \quad v_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{aligned} \tag{12}
\]
LP problem (12) is the multiplier form equivalent of the FDH model. It also reveals the economic meaning of FDH. Coefficients $(u, v)$ are the prices for output $y$ and input $x$. The profit of DMU $i$ under the price system $(u_j, v_j)$ is calculated by $u_j \cdot y_i - v_j \cdot x_i$, and the input cost of the target DMU is normalized to $v_j \cdot x_i = 1$. The second constraint of (12) is equivalent to
\[
(u_j \cdot y_i - z \, v_j \cdot x_i) - (u_j \cdot y_j - v_j \cdot x_j) \ge 0, \tag{13}
\]
which suggests a nonnegative profit difference between the input-scaled DMU $i$ and DMU $j$. The upper bound of the scale factor $z$, calculated by letting $j = i$, is $z = 1$, which indicates a scaling down of DMU $i$'s input. FDH scans all the DMUs to find a reference DMU and a price system with the largest scale-down factor $z$.
Based on the FDH multiplier form equivalent given in (12), we now define the peer-appraisal FDH cross efficiency as follows.
Definition 7 (FDH cross efficiency, benevolent formulation). Given that the FDH efficiency of schedule $\gamma_i$ is $\theta_i$, the FDH cross efficiency $z_{ij}$ of $\gamma_j$ evaluated by $\gamma_i$ is the optimal value of $z$ in
\[
\begin{aligned}
\max\quad & z \\
\text{s.t.}\quad & v_k \cdot x_j = 1, \\
& u_k \cdot (y_j - y_k) + v_k \cdot x_k - z \ge 0, \\
& u_k \cdot (y_i - y_k) + v_k \cdot x_k - \theta_i \, (v_k \cdot x_i) \ge 0, \\
& u_k \ge 0, \quad v_k \ge 0, \quad k = 1, 2, \ldots, n.
\end{aligned} \tag{14}
\]
Definition 7 is the FDH counterpart of the peer-appraisal CCR (benevolent formulation) in [24], which is reviewed in Section 4.1. The third constraint in (14) ensures that the efficiency of DMU $i$ calculated by coefficients $(u_k, v_k)$ is the FDH efficiency $\theta_i$ (primary goal), and under this constraint, (14) searches for the best efficiency value of $\gamma_j$ (secondary goal).
The following two theorems reveal the relation between FDH efficiency and FDH cross efficiency.
Theorem 8. The cross efficiency of schedule $\gamma_i$ evaluated by itself is its FDH efficiency.
Proof. Assume that the cross efficiency of schedule $\gamma_i$ evaluated by itself is $z_{ii}$ and the FDH efficiency (simple efficiency) is $\theta$.
The calculation of $z_{ii}$ amounts to solving the LP (15), which is (14) with $j = i$. Compared with the FDH multiplier form equivalent in (12), the LP in (15) has the extra constraints (16), namely, the third constraint of (14), where $\theta$ is the optimal value of (12). Since $z = \theta$ satisfies (12), the extra constraints of (16) always hold true. That means $\theta$ is also the optimal value of (15); thus $\theta = z_{ii}$.
Theorem 9. The cross efficiency $z_{ij}$ of schedule $\gamma_j$ evaluated by schedule $\gamma_i$ does not exceed the value of its simple efficiency $\theta_j$.
Proof. It is obvious that the optimal solution in (14) is a feasible solution of the LP (17) that calculates schedule $\gamma_j$'s FDH efficiency using (12). Thus the optimal value of $z$ in (14) is not greater than the optimal value of $z$ in (17).
Using the peer-appraisal FDH proposed in (14), the cross efficiency of a DMU is defined as follows.
Definition 10 (cross efficiency matrix, average cross efficiency, and the most efficient schedule). In a schedule set $\Gamma$ with $n$ schedules, the cross efficiency matrix $Z_{cross}$ is the $n \times n$ matrix (18) of the values $z_{ij}$, where $z_{ij}$ is the FDH cross efficiency of $\gamma_j$ evaluated by $\gamma_i$ using (14). The average cross efficiency of DMU $j$ is the average value of its cross efficiencies evaluated by all DMUs in $\Gamma$, defined by $\bar{z}_j = \sum_{i=1}^{n} z_{ij} / n$. The most efficient schedule ($\gamma_{MES}$) is the schedule with the largest average cross efficiency.
The cross efficiency matrix $Z_{cross}$ is constructed to calculate the cross efficiency of each DMU. The diagonal elements in $Z_{cross}$ are the self-appraisal FDH efficiencies, and the rest of the elements are the peer-appraisal FDH efficiencies. The elements in the $i$th row of $Z_{cross}$ are the efficiencies of DMU $i$ rated by peers, and the elements in the $i$th column are the efficiencies of peers rated by DMU $i$. The cross efficiency of DMU $i$ is the average value of the $i$th row. And $\gamma_{MES}$ is the best (both efficient and well balanced in its metrics) schedule in $\Gamma$ under our evaluation system.
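Once the pairwise cross efficiencies are available, the averaging and the selection of the most efficient schedule are straightforward. The sketch below indexes the values as in Definition 10, with `z[i][j]` holding the cross efficiency of schedule j evaluated by schedule i, which is not necessarily the row and column layout of the printed matrix. The matrix used in the usage note is made up for illustration, and the function names are ours.

```python
def average_cross_efficiency(z):
    """Average over all raters i of z[i][j], for every rated schedule j."""
    n = len(z)
    return [sum(z[i][j] for i in range(n)) / n for j in range(n)]

def most_efficient_schedule(z):
    """Index of the schedule with the largest average cross efficiency."""
    averages = average_cross_efficiency(z)
    return max(range(len(averages)), key=averages.__getitem__)
```

For the hypothetical matrix `[[1.0, 0.5, 0.75], [0.5, 1.0, 0.75], [0.25, 0.5, 1.0]]`, the third schedule has the largest average and is selected, even though all three schedules rate themselves at 1.0.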
Corollary 11. The average cross efficiency $\bar{z}_j$ of a schedule $\gamma_j$, which is defined in Definition 10, is not greater than its simple efficiency $\theta_j$.
Proof. Using the result of Theorem 9, the cross efficiency $z_{ij}$ of schedule $\gamma_j$ evaluated by an arbitrary schedule $\gamma_i$ is not greater than $\theta_j$. Thus $\bar{z}_j$, which is the average value of $z_{ij}$, is not greater than $\theta_j$.
Then the relation between the FDH efficiency and the most efficient schedule is given in the following theorem.
Theorem 12. The most efficient schedule of a schedule set $\Gamma$ is an efficient schedule.
Proof. Suppose the most efficient schedule is an inefficient schedule; then there exists an efficient schedule that dominates it. Formally, let $\gamma_j = (x_j, y_j)$ be the most efficient schedule of $\Gamma$, where $x_j = (M\ E\ L)^T$ is the input vector and $y_j = B$ is the output scalar. Let $z_{ij}^{*}$ be the cross efficiency of schedule $\gamma_j$ evaluated by schedule $\gamma_i$, and let $u_k^{*}$ and $v_k^{*} = (v_{kM}^{*}\ v_{kE}^{*}\ v_{kL}^{*})$, $k = 1, 2, \ldots, n$, be the optimal coefficients. Let $\gamma_t$ be a dominant schedule of $\gamma_j$.
First assume that $\gamma_t$ dominates $\gamma_j$ in output (balance), and $\gamma_t = (x_t, y_t)$, where $x_t = x_j = (M\ E\ L)^T$ and $y_t = B + \Delta$ has a small increment in balance. Now we calculate the cross efficiency $z_{it}$ of schedule $\gamma_t$ evaluated by schedule $\gamma_i$ by solving the LP (19), which is (14) with $\gamma_t$ in place of $\gamma_j$. Compared with the calculation of the cross efficiency of schedule $\gamma_j$ evaluated by schedule $\gamma_i$, it is easy to verify that $\hat{z} = z_{ij}^{*} + \max_k (u_k^{*} \cdot \Delta)$, $u_k^{*}$, $v_k^{*}$, $k = 1, 2, \ldots, n$, is a feasible solution of the above LP. Then the optimal value of $z_{it}$ is not less than $z_{ij}^{*} + \max_k (u_k^{*} \cdot \Delta)$, which is greater than $z_{ij}^{*}$. That means the cross efficiency of schedule $\gamma_t$ evaluated by an arbitrary schedule $\gamma_i$ is greater than $z_{ij}^{*}$, which contradicts the assumption of $\gamma_j$ being the most efficient schedule.
Then assume that $\gamma_t$ dominates $\gamma_j$ in input. Without loss of generality, we assume that $\gamma_t$ dominates $\gamma_j$ in makespan, and $\gamma_t = (x_t, y_t)$, where $y_t = B$ and $x_t = (M - \Delta_M\ E\ L)^T$ has a small decrement in makespan. Then the cross efficiency of schedule $\gamma_t$ evaluated by schedule $\gamma_i$ is calculated by solving the LP (20), which is (14) with $\gamma_t$ in place of $\gamma_j$. By substituting $v_k'$, $u_k'$, and $z'$ in (20), it is easy to verify that they are a feasible solution of (20). Then the optimal value of $z_{it}$ is not less than $z_{ij}^{*} \cdot (1 + \Delta_M \cdot \max_k (v_{kM}^{*}))$, which contradicts the assumption of $\gamma_j$ being the most efficient schedule.
Finally, if $\gamma_t$ dominates $\gamma_j$ in multiple metrics, say $\gamma_t = (M - \Delta_M\ E\ L\ B + \Delta_B)$, intermediary schedules $\gamma_{t1} = (M - \Delta_M\ E\ L\ B)$ and $\gamma_{t2} = (M - \Delta_M\ E\ L\ B + \Delta_B)$ can be constructed to prove that $\gamma_j$ is not the most efficient schedule using the previous results.
Now we will prove the unit invariance property of our peer-appraisal FDH proposal as well as the original FDH model.

Theorem 13 (unit invariance property). The optimal objective values in (6) and (14) are independent of the units in which the inputs and outputs are measured.
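Proof. First we prove that the FDH efficiency (6) is unit invariant. The FDH model in (6) is equivalent to the LP in (11), whose dual is (12). So if (12) is unit invariant, then (6) is too.
Under the new unit system, the inputs and outputs are the original values multiplied by positive conversion coefficients. Let $\hat{z}^{*}$, $\hat{u}^{*}$, and $\hat{v}^{*}$ be the optimal solution under the new units, and let $z^{*}$ be the optimal value of the original problem. Multiplying each component of $\hat{u}^{*}$ and $\hat{v}^{*}$ by the corresponding conversion coefficient gives coefficients $\hat{u}$ and $\hat{v}$ which also satisfy the constraints of the original problem in (12). This suggests that $\hat{z}^{*}$, $\hat{u}$, and $\hat{v}$ are also a feasible solution of the original problem, so $\hat{z}^{*} > z^{*}$ contradicts the optimality of $z^{*}$. By the same argument applied in the opposite direction, $\hat{z}^{*} < z^{*}$ contradicts the optimality of $\hat{z}^{*}$. So $\hat{z}^{*} = z^{*}$ is the only possibility. This proves that the FDH efficiency is not affected by the units in which the inputs and outputs are measured.
Using this result, the unit invariance of the peer-appraisal FDH can be proved in the same manner.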

Efficient Scheduling Using CrosFDH-GA
5.1. Basic Design. The most straightforward way to introduce our FDH cross evaluation method to the genetic algorithm (GA) is to calculate the average cross efficiency of all individuals in the pool and use the efficiency value as the fitness of each individual. The problem with this simple solution is the high computational cost of the DEA calculation.
For a genetic algorithm with a population of 1000, calculating the FDH simple efficiency of a single individual means solving an LP with 4001 variables ($u_j$, $v_{jM}$, $v_{jE}$, $v_{jL}$, $j = 1, 2, \ldots, 1000$, and $z$) and 2000 constraints using (12), and calculating the FDH simple efficiency of all individuals means solving 1000 such LPs.
Calculating the FDH cross efficiency of an individual evaluated by another individual means solving an LP with 4001 variables and 3000 constraints using (14). Calculating the average cross efficiency of one individual requires solving 999 such LPs, and calculating the average cross efficiency of all individuals requires repeating this process 1000 times.
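The problem sizes quoted above follow from simple counting, which the sketch below reproduces for an arbitrary population size. It assumes the three-input, one-output DMU model of this paper and counts only the variables and the non-sign constraints of (12) and (14); the function name and dictionary keys are ours.

```python
def fdh_lp_sizes(n, n_inputs=3, n_outputs=1):
    """Size of the FDH LPs for a pool of n schedules."""
    per_dmu = n_inputs + n_outputs  # one price vector (u_j, v_j) per schedule
    return {
        "variables": per_dmu * n + 1,    # all prices plus the scalar z
        "simple_constraints": 2 * n,     # two constraint families in (12)
        "cross_constraints": 3 * n,      # three constraint families in (14)
        "simple_lps": n,                 # one simple efficiency per schedule
        "cross_lps": n * (n - 1),        # every schedule rated by every peer
    }
```

For `n = 1000` this gives 4001 variables, 2000 and 3000 constraints, 1000 simple-efficiency LPs, and 999000 cross-efficiency LPs per generation, which is why evaluating the whole population directly is impractical.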
In our computing configuration (Intel i5-3210M, 4 GB RAM, 32-bit Win7, VS2010, and GLPK 4.47), the calculation of a single FDH simple efficiency and a single FDH cross efficiency in the above DEA-GA implementation takes about 3.6 seconds and 6.2 seconds on average, respectively. The calculation of an average cross efficiency requires over 10 minutes. The DEA solving time of 1000 individuals in a single generation is estimated to be about 7 days. If the GA runs for 50 generations, the whole solving time is nearly a year, which is beyond an acceptable level.
In this paper, we propose a solution to this problem using a "divide-and-conquer" method. The whole population (Metapopulation) is divided into 4 subpopulations: Subpopulation M, Subpopulation E, Subpopulation L, and Subpopulation B, each of which evolves towards a single optimization goal (Makespan, Energy, average Link load, and workload Balance, respectively). In each generation, after the algorithm evaluates the performance of every individual, the elites of each subpopulation are selected and regrouped as the DEA-ready pool for the DEA evaluation process. The basic idea is that, according to Theorem 4, the more preeminent a schedule is in one metric, the less likely it is to be dominated by another schedule. Then the top performers in the DEA-ready pool are duplicated to each subpopulation, where they replace the bottom individuals, and the subpopulations continue to evolve. The process is shown in Figure 2.
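The regrouping step can be sketched as follows. This is an illustration under assumptions rather than the authors' code: each individual is represented only by its metric tuple (M, E, L, B), the elite count `n_elite` is a hypothetical parameter, and duplicates are merged because schedules with identical metrics are regarded as the same schedule.

```python
def dea_ready_pool(subpops, n_elite):
    """Collect the elites of each subpopulation into one DEA-ready pool.

    subpops maps a goal name ('M', 'E', 'L', or 'B') to a list of
    (M, E, L, B) tuples belonging to that subpopulation.
    """
    index = {"M": 0, "E": 1, "L": 2, "B": 3}
    pool = []
    for goal, members in subpops.items():
        i = index[goal]
        # M, E, and L are minimized; B is maximized.
        ranked = sorted(members, key=lambda s: s[i], reverse=(goal == "B"))
        for schedule in ranked[:n_elite]:
            if schedule not in pool:  # identical metrics count as one schedule
                pool.append(schedule)
    return pool
```

The pool passed to the DEA evaluation is therefore at most four times the elite count, which keeps the number of LPs per generation small.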

Genetic Operations.
In our proposal, a chromosome or an individual represents one schedule solution. The structure of a chromosome is an array whose size is the number of tasks, and the value of its element, $chr[i] = j$, represents that task $i$ is assigned to processor $j$.
The evolution of the four subpopulations is independent, and each of them goes through a complete series of genetic operations. The basic framework of the genetic algorithm is based on the proposal in [27] as follows.
Step 1 (initialization). Chromosomes are randomly created and added to the subpopulation. When the population reaches the subpopulation size, the algorithm goes to the next step.
Step 2 (evaluation). The performance of each individual in the pool is evaluated.
Step 3 (selection). Chromosomes are ordered according to their subpopulation's optimization goal, and the top $sel_{ratio}$ chromosomes directly enter the next generation's pool.
Step 4 (crossover). Two random chromosomes, chr1 and chr2, are selected from the current pool, and two new chromosomes, nchr1 and nchr2, are generated by swapping the middle part of the chromosome array. This step produces a population of $cros_{ratio}$ × subpopulation size.
Step 5 (mutation). A random chromosome is picked, and the values at two random positions of the chromosome are swapped to produce a new chromosome. The population generated in the mutation step accounts for $mut_{ratio}$ of the subpopulation size.
Step 6 (termination). The GA is terminated after a certain number of generations. If the GA does not meet its termination condition, the algorithm returns to Step 2 and repeats the whole process.
For example, consider a scheduling problem of 9 tasks (task 1∼task 9) to be scheduled on 3 processors (processor 1∼processor 3). Two randomly generated chromosomes, chr1 and chr2, are listed in Table 1. Following the previous definition, chr1 represents that task 2 and task 7 are scheduled to processor 1; tasks 4, 5, and 8 are scheduled to processor 2; and tasks 1, 3, 6, and 9 are scheduled to processor 3. Chromosome chr2 represents that tasks 3, 5, 6, 7, and 8 are scheduled to processor 1; task 1 and task 2 are scheduled to processor 2; and task 4 and task 9 are scheduled to processor 3.
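The encoding in this example can be checked with a few lines of code. The two arrays below are derived from the task-to-processor description in the text rather than copied from Table 1, and the helper name `decode` is ours. Tasks and processors are numbered from 1, so task i is stored at list position i - 1.

```python
def decode(chromosome):
    """Map each processor to the list of tasks assigned to it (1-based numbering)."""
    assignment = {}
    for task, processor in enumerate(chromosome, start=1):
        assignment.setdefault(processor, []).append(task)
    return assignment

# chr1[i - 1] is the processor of task i, following the description in the text.
chr1 = [3, 1, 3, 2, 2, 3, 1, 2, 3]
chr2 = [2, 2, 1, 3, 1, 1, 1, 1, 3]
```

Decoding `chr1` yields tasks 2 and 7 on processor 1, tasks 4, 5, and 8 on processor 2, and tasks 1, 3, 6, and 9 on processor 3, which matches the description.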
In the crossover operation, two randomly selected chromosomes, chr1 and chr2, swap their middle part of chromosome to form two new chromosomes nchr1 and nchr2, as shown in Figure 3.
In the mutation operation, a new chromosome nchr1 is generated by randomly picking a chromosome chr1, and swapping two arbitrary positions in chr1 as shown in Figure 4.
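A minimal sketch of the two operators is given below. The cut points of the crossover and the swapped positions of the mutation are passed in explicitly so that the behaviour is reproducible; how the authors choose them is shown only in Figures 3 and 4, so the values in the usage note are assumptions.

```python
import random

def crossover(chr1, chr2, lo, hi):
    """Swap the middle part [lo:hi] of two parents and return the two children."""
    nchr1 = chr1[:lo] + chr2[lo:hi] + chr1[hi:]
    nchr2 = chr2[:lo] + chr1[lo:hi] + chr2[hi:]
    return nchr1, nchr2

def mutate(chrom, a, b):
    """Return a copy of chrom with the values at positions a and b swapped."""
    child = list(chrom)
    child[a], child[b] = child[b], child[a]
    return child

def random_mutation(chrom, rng=random):
    """Mutation with two distinct positions drawn at random."""
    a, b = rng.sample(range(len(chrom)), 2)
    return mutate(chrom, a, b)
```

For instance, `crossover([3, 1, 3, 2, 2, 3, 1, 2, 3], [2, 2, 1, 3, 1, 1, 1, 1, 3], 3, 6)` exchanges positions 3 to 5 and returns `[3, 1, 3, 3, 1, 1, 1, 2, 3]` and `[2, 2, 1, 2, 2, 3, 1, 1, 3]`. Because the mutation only swaps two entries, it preserves the number of tasks assigned to each processor.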

Cross Evaluation of Individuals.
The elites of each subpopulation are selected to form a DEA-ready pool. Then the DEA approach is applied to the individuals in this pool.
First, the FDH simple efficiency of every individual in the pool is calculated, and the inefficient individuals are removed. According to Corollary 6, the removal of inefficient DMUs does not change the dominant position of the efficient ones. Then the average cross efficiencies of the remaining individuals are calculated, and the individual with the largest average cross efficiency is marked as the most efficient schedule. The reason for the removal of inefficient DMUs is threefold: first, as proven in Theorem 12, the most efficient schedule which we are pursuing is not an inefficient schedule; second, the removal of inefficient schedules eliminates the influence of these obviously defective schedules on the coefficients in the subsequent calculation of FDH cross efficiency (otherwise, an inefficient schedule would become a constraint in the calculation of FDH cross efficiency according to (14)); finally, it further reduces the computational demand of our algorithm.
Pseudocode 1 shows cross evaluation process in our algorithm.
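The flow of the cross evaluation can be sketched in Python as below. This is only an illustration under stated assumptions: every metric is treated as a positive cost to be minimized with a common unit output, the simple efficiency is the textbook input-oriented FDH score, and `cross_matrix` is a hypothetical callback standing in for the FDH cross efficiency of (14), which is not reproduced here.

```python
def fdh_simple_efficiency(k, metrics):
    """Textbook input-oriented FDH score of DMU k (1.0 means efficient).

    Assumption: every metric is a positive cost and all DMUs share one unit output.
    """
    xk = metrics[k]
    return min(max(xj[i] / xk[i] for i in range(len(xk))) for xj in metrics)

def most_efficient_schedule(metrics, cross_matrix, tol=1e-9):
    """Remove FDH-inefficient DMUs, then pick the largest average cross efficiency."""
    efficient = [k for k in range(len(metrics))
                 if fdh_simple_efficiency(k, metrics) >= 1.0 - tol]
    pool = [metrics[k] for k in efficient]
    # cross[a][b]: efficiency of pool DMU b appraised with the coefficients of DMU a.
    cross = cross_matrix(pool)
    n = len(pool)
    avg = [sum(cross[a][b] for a in range(n)) / n for b in range(n)]
    best = max(range(n), key=avg.__getitem__)
    return efficient[best], efficient

# Toy pool with two cost metrics; the last schedule is dominated by the second one.
toy_metrics = [[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]]

def toy_cross(pool):
    """Made-up cross efficiency matrix, used only to exercise the selection step."""
    return [[1.0, 0.9, 0.5], [0.8, 1.0, 0.8], [0.5, 0.9, 1.0]]

best_index, efficient_indices = most_efficient_schedule(toy_metrics, toy_cross)
```

The filter runs before the cross efficiency callback, so the matrix only has to be computed for the efficient DMUs, which mirrors the third reason given above.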

Simulation Results of FDH Cross Evaluation Formulation.
In this section, we extract 40 actual schedule solutions from our simulation and use our proposed FDH cross evaluation, as well as other DEA methods, to analyze the performance of these DMUs. The chosen schedules are from the 5th generation of our CrosFDH-GA for the problem of scheduling 100 tasks onto a 4 × 4 mesh NoC. The top 10 schedules in each subpopulation are grouped to form our schedule set. The schedules are listed in Table 2, and, for more intuitive observation, every metric shown is preprocessed by dividing its value by the average value of that metric over the set. As we have proved in Theorem 13, this normalization does not affect the value of a DMU's efficiency.
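The preprocessing step can be illustrated with a short Python sketch. The sample values are made up, and the dominance check assumes that every metric is minimized. The sketch is not a proof of Theorem 13. It only shows that dividing each metric by a positive constant leaves the dominance relations between schedules unchanged.

```python
def normalize_by_mean(metrics):
    """Divide every metric column by its average over the schedule set."""
    n = len(metrics)
    means = [sum(col) / n for col in zip(*metrics)]
    return [[v / m for v, m in zip(row, means)] for row in metrics]

def dominates(a, b):
    """True if a is no worse than b on every metric and better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def dominance_table(metrics):
    """Pairwise dominance relations of a schedule set."""
    return [[dominates(a, b) for b in metrics] for a in metrics]

# Made-up raw metrics of three schedules (three cost metrics each).
raw = [[120.0, 3.5, 0.40],
       [150.0, 3.0, 0.55],
       [160.0, 3.6, 0.60]]
scaled = normalize_by_mean(raw)

same_relations = dominance_table(raw) == dominance_table(scaled)
```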
Two kinds of FDH cross efficiencies, Cros FDH (All) and Cros FDH (Eff), are presented in Table 2. The difference between the two is as follows. Cros FDH (All) calculates the FDH cross efficiency using all 40 schedules, while Cros FDH (Eff) takes into account only the FDH efficient ones, as suggested in Section 5.3. The value of Cros FDH (Eff) for inefficient schedules is not calculated and is marked with "-". Moreover, a maverick index (MI), which is suggested in [24], is calculated for each FDH cross efficiency. The MI for DMU $k$ is calculated by $\mathrm{MI}_k = (E_{kk} - \bar{E}_k)/\bar{E}_k$, where $E_{kk}$ is the simple efficiency of DMU $k$ and $\bar{E}_k$ is its average cross efficiency.
MI measures the relative difference between a DMU's simple efficiency and its cross efficiency. A larger MI value implies that the DMU is more likely to be a maverick that "cheats" its way to a high simple efficiency by choosing a particular set of coefficients that favors its only strength and depreciates the other metrics; in our words, a metric-unbalanced DMU.
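As a quick numerical check of the MI definition, the following Python sketch evaluates it for schedule 4 of Table 2, which is FDH efficient (simple efficiency 1) and has an average cross efficiency of 0.905. The function name is a hypothetical choice.

```python
def maverick_index(simple_eff, avg_cross_eff):
    """Relative gap between the simple and the average cross efficiency."""
    return (simple_eff - avg_cross_eff) / avg_cross_eff

# Schedule 4: FDH efficient, average cross efficiency 0.905.
mi_schedule4 = maverick_index(1.0, 0.905)
```

The result rounds to 0.105, which matches the MI reported for schedule 4.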
From Table 2 we observe that, among the three simple efficiency formulations, CCR has the fewest efficient schedules, with 2, while BCC has 9 and FDH has 13. Both CCR efficient schedules are also efficient under BCC and FDH; all 9 BCC efficient schedules are also FDH efficient; and FDH has 4 more efficient schedules than BCC: schedules 4, 15, 18, and 19. The difference between the numbers of efficient schedules of these three DEA models is caused by the different shapes of the efficient frontier, which result from the different constraints in the model formulations. The convex efficient frontier of BCC in the multi-input multioutput space contains more DMUs than the CCR efficient frontier, while the staircase-like FDH efficient frontier has the largest number of efficient DMUs on it. Moreover, the CCR efficiency of a schedule is generally the smallest among the three models, and the BCC efficiency is generally smaller than the FDH efficiency. In fact, as pointed out in [29], FDH efficiencies are generally higher than CCR and BCC efficiencies.
Three DMU ranking methods, CCR super efficiency, CCR cross efficiency, and FDH cross efficiency, are also compared in Table 2. As observed in the table, CCR super efficiency and CCR cross efficiency are consistent with the CCR simple efficiency, and both CCR ranking methods mark schedule 32 as the best schedule. The FDH average cross efficiency, Cros FDH (All), is clearly more discriminating than the previous methods. The FDH average cross efficiencies of the 13 FDH efficient schedules vary from 0.9934 (schedule 20, ranked 1st of the 40 schedules) to 0.905 (schedule 4, ranked 35th). The low average cross efficiency of the efficient schedule 4 is explained by its MI value. Schedule 4 has a high MI of 0.105, which suggests that it "cheats" in the FDH simple efficiency calculation. A close look at its metrics reveals that it compromises too much on two of them. The same situation occurs with schedule 31 (ranked 25th) and schedule 32 (ranked 24th), both of which are classified as CCR efficient, with schedule 32 even marked as the "best" schedule under CCR super efficiency and CCR cross efficiency analysis. The MIs of schedules 31 and 32 are 0.0568 and 0.0553; these relatively high MIs imply that they are more likely to be mavericks. Examining their metrics shows that both are metric unbalanced because they trade too much performance on two metrics for a third.
The two implementations of the FDH average cross efficiency, Cros FDH (All) and Cros FDH (Eff), deliver very similar DMU rankings. The top 3 schedules under Cros FDH (All) are schedules 20, 17, and 18. Compared with the top 3 under Cros FDH (Eff), schedules 17, 20, and 18, only a slight change of order exists, which validates the removal of inefficient DMUs in the cross evaluation of individuals proposed in Section 5.3.
Moreover, the average values of the four metrics and MI (All) in each subpopulation are calculated and listed in Table 3. The most interesting results are in Subpopulation B. The average workload balance of Subpopulation B is almost twice the average value of the other three subpopulations. However, Subpopulation B performs worse on the other metrics: its makespan, energy, and average link load are 1.2%, 15.4%, and 16% larger, respectively, than the averages of the other three subpopulations.

In this section, comparative simulations are made to evaluate the performance of our CrosFDH-GA scheduling algorithm. Twenty DAGs, tg1∼tg20, are generated using Task Graphs For Free (TGFF) [30]. The task number of the generated DAGs varies from 50 to 100. Together with two real-world applications, solving the Laplace equation using the Gauss-Seidel algorithm [31] and molecular dynamic coding [32], a total of 22 task sets are simulated. The control groups of our simulation are four GAs with different global objective functions based on the global criterion method [33]: multiplication and division (MD), weighted sum (WS), weighted exponential sum (WES), and exponential weighted criterion (EWC).

Each fitness function combines the four metrics using one weight coefficient per metric. In our implementation, all four weight coefficients are set to 0.25, which means that there is no preference among the four metrics. Moreover, the sel ratio, cros ratio, and mut ratio of the GA are 0.2, 0.4, and 0.4, and the GA terminates after 50 generations. The population size of each of the four global objective function-based GAs is 1000. The subpopulation size of CrosFDH-GA is 250, which ensures that its total population is also 1000 individuals.
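For orientation, the generic forms of three of these scalarizations, as they are commonly stated in surveys of multiobjective optimization, can be sketched in Python as below. The sketch assumes that every metric has already been converted into a normalized cost to be minimized, and the exponent `p` is a hypothetical parameter. The exact fitness functions used in the simulation, including the MD form, may differ and are not reproduced here.

```python
import math

def weighted_sum(f, w):
    """WS: sum of w_i * f_i."""
    return sum(wi * fi for wi, fi in zip(w, f))

def weighted_exponential_sum(f, w, p=2):
    """WES: sum of w_i * f_i ** p."""
    return sum(wi * fi ** p for wi, fi in zip(w, f))

def exponential_weighted_criterion(f, w, p=1):
    """EWC: sum of (exp(p * w_i) - 1) * exp(p * f_i)."""
    return sum((math.exp(p * wi) - 1.0) * math.exp(p * fi)
               for wi, fi in zip(w, f))

weights = [0.25, 0.25, 0.25, 0.25]   # equal weights, as in the simulation setup
costs = [1.0, 0.8, 1.2, 1.0]         # made-up normalized metric values

ws = weighted_sum(costs, weights)
wes = weighted_exponential_sum(costs, weights)
ewc = exponential_weighted_criterion(costs, weights)
```

With equal weights, WS treats a 20% gain on one metric and a 20% loss on another as cancelling out, while WES and EWC penalize the loss more heavily.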
All the output schedules are simulated on a SystemC-based cycle-accurate NoC simulator, which is a modified version of [34] with wormhole routing. The implemented NoC simulator is a 4 × 4 mesh NoC with the router structure illustrated in Figure 3. The link width is 16 bits, and the FIFO depth is one flit. Packets are forwarded by the routing algorithm of the simulator, and round-robin (RR) arbitration is adopted to resolve contention. The NoC simulator also integrates Orion 2.0 [35] to measure the actual routing energy during the execution of a DAG.
As shown in Figure 5, the four global criterion GAs render similar performance on the makespan, energy, average link load, and workload balance metrics, while the proposed CrosFDH-GA exhibits a different tendency in the optimization of the metrics. As shown in Figure 5(a), all five scheduling algorithms achieve the same level of makespan. The main performance differences lie in the energy (Figure 5(b)), average link load (Figure 5(c)), and workload balance (Figure 5(d)) metrics. The four global criterion GAs always output the schedule solution with the best workload balance (in tg10, MD-GA and WS-GA do not find the schedule solution with the best workload balance within 50 generations, as WES-GA and EWC-GA do), as shown in Figure 5(d). On the other hand, our proposal always compromises on the workload balance metric in exchange for better optimization of the energy and average link load metrics.
To be more specific, we normalize the four metrics of the five scheduling algorithms to the corresponding CrosFDH-GA values in each of the 22 task sets and calculate the average value of these metrics for each scheduling algorithm. The results are shown in Table 4.
From the table, the four global criterion GAs achieve on average eight times the workload balance metric of CrosFDH-GA. However, our proposal performs better on both the energy and the average link load metrics. The average energy of our algorithm is about 36% smaller, and the average link load is about 29% smaller, than those of the other algorithms on average. This tendency of optimization in CrosFDH-GA is explained by the previous analysis of the schedules' FDH cross efficiency, which suggests that a better metric-balanced schedule should trade the workload balance metric for the energy and average link load metrics.
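The normalization behind Table 4 can be sketched as follows. The data layout, the function name, and the numbers are hypothetical. Only the procedure, dividing each metric by the CrosFDH-GA value on the same task set and then averaging over the task sets, follows the description above.

```python
def normalized_averages(results, reference="CrosFDH-GA"):
    """Average each algorithm's metrics after dividing by the reference algorithm.

    results[task_set][algorithm] is a list of metric values (hypothetical layout).
    """
    sums = {}
    for per_algorithm in results.values():
        ref = per_algorithm[reference]
        for name, values in per_algorithm.items():
            ratios = [v / r for v, r in zip(values, ref)]
            acc = sums.setdefault(name, [0.0] * len(ratios))
            for i, ratio in enumerate(ratios):
                acc[i] += ratio
    n = len(results)
    return {name: [s / n for s in acc] for name, acc in sums.items()}

# Made-up values for two task sets and two metrics.
sample = {
    "tg1": {"CrosFDH-GA": [100.0, 2.0], "WS-GA": [110.0, 3.0]},
    "tg2": {"CrosFDH-GA": [200.0, 4.0], "WS-GA": [180.0, 5.0]},
}
table = normalized_averages(sample)
```

By construction, the reference row is all ones, so every other row reads directly as a multiple of the CrosFDH-GA result.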
Moreover, in our observation, schedule solutions that perform similarly to the output schedules of the four global criterion GAs exist in Subpopulation B of CrosFDH-GA. However, these schedule solutions are depreciated during the peer-appraisal process of FDH cross efficiency, which supports the conclusion that the output schedules of the global criterion GAs are metric unbalanced.
In MD-GA, WS-GA, WES-GA, and EWC-GA, a clear increasing trend of solving time is observed as the scale of the scheduling problem (task number) rises, as shown in Figure 6(a). However, no such trend is observed in CrosFDH-GA, as shown in Figure 6(b).
The reason is that the solving time of CrosFDH-GA is largely determined by the calculation time of the average FDH cross efficiency, which depends on the size of the schedule set, that is, the number of efficient schedules in the DEA-ready pool in each generation. Thus, the solving time of CrosFDH-GA is not directly related to the task number of a scheduling problem but to the number of efficient schedules in the DEA-ready pool in each generation.
Figure 7(a) shows the relation between the number of efficient schedules in the DEA-ready pool and the average time needed to solve the average FDH cross efficiency in a generation. As shown in the figure, the solving time rises rapidly as the number of efficient schedules increases. Moreover, Figure 7(b) gives the distribution of the number of efficient schedules over all 22 simulations. As observed in the figure, in most generations the number of efficient schedules lies between 30 and 70, which requires about 7 to 100 seconds to solve the average FDH cross efficiency.
Finally, Figure 8 shows how the four metrics converge during the iteration. The illustrated metrics are the best ones in the corresponding subpopulations in each generation, and all are normalized to their final values at the 50th generation. Figure 8 shows that one metric converges first, followed by two metrics that converge at the same pace, while the remaining metric converges last.

Conclusion
In this paper, an FDH cross efficiency formulation, as well as a CrosFDH-GA algorithm, is proposed for the task scheduling problem on the NoC-based MPSoC. Four common metrics, namely, makespan, routing energy, average link load, and workload balance, are used to construct the multi-input multioutput DMU model. After FDH simple efficiency is used to eliminate inefficient (dominated) schedules, the peer-appraisal FDH cross efficiency is introduced to rank the schedules, during which the maverick (metric-unbalanced) schedules are depreciated. Then an FDH cross efficiency-based genetic algorithm with four subpopulations, each of which optimizes a single metric, is proposed for solving actual scheduling problems on NoC. According to our simulation results, the proposed FDH cross efficiency effectively distinguishes schedule solutions according to the balance of their metrics, and our CrosFDH-GA always outputs more metric-balanced schedules than the other global criterion GAs.

6.2.2. Results and Discussion. The simulation results of the 22 task sets, which consist of 20 DAGs generated by TGFF and two real-world applications, solving the Laplace equation (LE) using the Gauss-Seidel algorithm and molecular dynamic coding (MDC), under the four global criterion GAs and our CrosFDH-GA, are illustrated in Figure 5. All the makespan and energy metrics shown are the actual results measured in our NoC simulator.

Figure 6: The relation of solving time and task number.

Figure 8: The trend of convergence of four metrics.

Table 2: DEA analysis results of 40 sample schedules.

Table 3: The average value of metrics of each subpopulation.

Table 4: The average value of metrics of each scheduling algorithm.

The solving times shown in Figure 5(e) are preprocessed by applying log10 to the solving time of each algorithm. According to the figure, MD-GA has the smallest solving time among the five algorithms. WS-GA, WES-GA, and EWC-GA take 4.4%, 4.5%, and 54.6% more solving time than MD-GA on average. CrosFDH-GA has the largest solving time, which is nearly 109 times larger than that of MD-GA. This long solving time is caused by the high computation load introduced by the DEA analysis. In CrosFDH-GA, 90.2% of the solving time is used to calculate the average FDH cross efficiency, 8.3% is used to calculate the FDH simple efficiency, and the rest of the algorithm consumes only 1.5%.
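The reported figures can be combined in a short back-of-the-envelope calculation. The Python sketch below reads "nearly 109 times" as a plain ratio of solving times, which is an assumption, and the variable names are illustrative.

```python
import math

ratio_to_md = 109.0  # CrosFDH-GA solving time relative to MD-GA (assumed plain ratio)
shares = {
    "average_cross_efficiency": 0.902,
    "simple_efficiency": 0.083,
    "rest_of_algorithm": 0.015,
}

# Vertical gap between CrosFDH-GA and MD-GA on a log10 time axis.
log_gap = math.log10(ratio_to_md)

# Time spent outside the DEA analysis, relative to MD-GA's total solving time.
non_dea_ratio = ratio_to_md * shares["rest_of_algorithm"]
```

Under this reading, the two algorithms are about two units apart on the log10 axis, and the non-DEA part of CrosFDH-GA costs roughly 1.6 times the total solving time of MD-GA, so the overhead comes almost entirely from the efficiency calculations.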