High-performance heterogeneous computing systems depend on efficient application scheduling algorithms. However, most current algorithms schedule with low efficiency. To address this problem, we propose a novel task scheduling algorithm for heterogeneous computing named HSIP (heterogeneous scheduling algorithm with improved task priority), whose functionality relies on three pillars: (
In the era of big data, data-intensive computing cannot be completed on a single processor; it often relies on a heterogeneous computing system (HCS). A heterogeneous computing system, defined as a computing platform of multiple processors interconnected by a high-speed network, can carry out parallel and distributed intensive computing [
Typical task scheduling algorithms include Heterogeneous Earliest Finish Time [
To address these three problems, this paper proposes Heterogeneous Scheduling with Improved Task Priority (HSIP). It works in two steps: a task prioritizing stage followed by a processor selection stage. In the first stage, the algorithm combines the standard deviation of the computation costs with the communication cost weight to determine task priorities. In the second stage, we propose an entry task duplication strategy that decides whether the entry task should be duplicated onto other processors. At the same time, an improved insertion-based optimizing policy shortens the makespan. Experimental results show that our proposed algorithm outperforms other algorithms in terms of schedule length ratio, efficiency, and frequency of best results.
The rest of this paper is constructed as follows: Section
A task scheduling model is composed of an application, a target computing environment, and performance benchmarks. An application can be characterized by a directed acyclic graph (DAG),
Example of DAG task model and computation cost matrix.
Furthermore, in our model the processors are assumed to be in a fully connected topology. Each processor can execute a task and communicate with other processors simultaneously and without conflict. We now present some common characteristics used in task scheduling, which we discuss in the following sections.
Makespan, or schedule length, represents the finish time of the last task in the scheduled DAG, defined as (
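Since the equation reference above is truncated, the definition can be written out; the following is the standard form used in the DAG-scheduling literature, consistent with the AFT notation used later in this section (for a DAG with multiple exit tasks, the maximum over them is taken):

```latex
\mathrm{makespan} = AFT\!\left(n_{\mathrm{exit}}\right)
```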
Out-degree communication cost weight (OCCW) of task
Out-degree communication cost weight also affects the ordering of task priorities. If a task with a large out-degree communication cost weight is not executed, none of its successors can become ready.
The aim of the scheduling issue is to determine an assignment of the tasks in a given DAG to processors so that the schedule length is reduced to a minimum. When all nodes in the DAG are scheduled, the schedule length will now become AFT, the Actual Finish Time of the exit task, as expressed by (
Recently, a number of task scheduling algorithms for heterogeneous computing systems have been proposed. They can be roughly categorized into two groups: dynamic scheduling and static scheduling. In the dynamic category, the execution costs, communication costs, and relationships of the tasks are unknown, and decisions are made at runtime. In the static category, such information is known ahead of time. Dynamic scheduling is runtime scheduling, whereas static scheduling is compile-time scheduling.
In dynamic scheduling, when a new task arrives, both the tasks about to be executed and the freshly arrived task are reflected in the rescheduling process. Dynamic scheduling is suited to conditions in which the system and task parameters are unknown at compile time, so decisions are made at runtime based on further observations. Some typical dynamic scheduling algorithms have been presented in the literature, such as Batch Mode Mapping Heuristics [
Static scheduling algorithms are categorized into two major groups: guided random search-based algorithms and heuristic-based algorithms. Typical guided random search-based algorithms include GA Multiprocessor Task Scheduling [
HEFT uses the mean value of the computation cost and the mean value of the communication cost as the rank value to determine the scheduling sequence, but this is less reasonable for a heterogeneous environment. If the computation costs of the same task on different processors differ greatly, or if the communication cost weights of the task are large, the HEFT algorithm will not produce a justified schedule. CPOP algorithm [
The SDBATS algorithm is based on the HEFT algorithm and significantly improves its performance [
A recent state-of-the-art DAG scheduling algorithm is the PEFT algorithm [
In this section, we introduce a new scheduling algorithm for a limited number of heterogeneous processors, known as Heterogeneous Scheduling with Improved Task Priority (HSIP). The algorithm contains two key stages: a task prioritizing stage for calculating task priorities and a processor selection stage for choosing the best processor to execute the current task.
In the task prioritizing stage, we improve the task priority strategy. In the processor selection stage, tasks are assigned, in priority order of the scheduling list, to the processor giving the minimum EFT [
The detailed description of the HSIP algorithm is shown in Algorithm 1.

Algorithm 1 (HSIP)
Input: DAG, set of tasks
Output: Schedule result, makespan
(1) Compute the priority weight of each task by traversing the DAG upward from the exit task
(2) Sort the tasks into a scheduling list in descending order of priority weight
(3) while there are unscheduled tasks in the list do
(4)   Select the first task from the scheduling list
(5)   if it is the entry task then
(6)     Apply the entry task duplication selection policy
(7)   end if
(8)   for each processor in the processor set do
(9)     Use the "insertion-based optimizing policy" to find a feasible idle time slot
(10)    Compute the earliest finish time (EFT) of the task on the processor
(11)  end for
(12)  Assign the task to the processor that minimizes its EFT
(13)  Update the idle time slot queue of the selected processor
(14)  Update the scheduling list
(15) end while
Standard deviation responds better than the mean value to differences in computation cost. When the computation costs of the same task on different processors differ greatly, the standard deviation is large; otherwise, it is small. Therefore, using the standard deviation prioritizes nodes with larger computation cost differences and improves the overall scheduling results.
However, the standard deviation of the computation cost is far below the communication cost weight of a task in magnitude. Our algorithm therefore multiplies the standard deviation by the average computation cost to obtain the computation cost weight. Thus, a task with a larger difference in computing cost gets a higher priority, as does a task with a large transmission time to its child nodes. In fact, our algorithm
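The weighting described above can be sketched as an upward-rank computation. The per-task weight (standard deviation of the computation costs times their mean) and the OCCW (sum of outgoing edge costs) follow the text; combining them additively with the maximum successor rank is an assumption of this sketch, not the paper's exact equation, and all names are illustrative:

```python
import statistics

def rank_u(task, W, comm, succ, memo=None):
    """Upward-rank sketch for HSIP-style task prioritizing.

    W[t]        -- computation costs of task t on each processor
    comm[(i,j)] -- communication cost on edge i -> j
    succ[t]     -- list of immediate successors of t
    """
    if memo is None:
        memo = {}
    if task not in memo:
        costs = W[task]
        # computation cost weight: std deviation scaled by the mean cost
        weight = statistics.pstdev(costs) * statistics.fmean(costs)
        # out-degree communication cost weight (OCCW)
        occw = sum(comm.get((task, s), 0) for s in succ[task])
        # propagate the largest successor rank upward
        tail = max((rank_u(s, W, comm, succ, memo) for s in succ[task]),
                   default=0)
        memo[task] = weight + occw + tail
    return memo[task]
```

Tasks would then be sorted in descending order of this rank to form the scheduling list.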
Priority weights of tasks.
Function: priority weight per task (in task index order)
335.6  233.4  209.6  229.1  182.2  184.7  137.9  133.4  154.6  85.0
Schedules of the sample task graph in Figure, as produced by HSIP, SDBATS, CPOP, HEFT, and PEFT.
The traditional task duplication algorithm achieves a shorter schedule length but is limited by the overhead of processor utilization, as mentioned in the literature [
Only the entry task needs the following duplication selection policy:
Choose the processor
Determine whether the entry task should be duplicated on another processor
All processors have been assigned tasks; namely, each processor’s entry task duplication judgment has been completed.
All immediate successor nodes of the entry task (
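The duplication test in the policy above can be sketched as follows. The inequality used here (duplicate the entry task on a processor when executing it locally finishes earlier than waiting for its result from the best processor) and all names are illustrative assumptions, not the paper's exact conditions:

```python
def duplicate_entry(entry_costs, comm_to_child, best_proc):
    """Sketch of the entry-task duplication selection test.

    entry_costs[p]   -- computation cost of the entry task on processor p
    comm_to_child[p] -- communication cost a successor scheduled on p would
                        pay to receive the entry task's result from best_proc
    best_proc        -- processor giving the entry task its minimum EFT
    """
    finish_on_best = entry_costs[best_proc]
    duplicated = []
    for p, cost in entry_costs.items():
        if p == best_proc:
            continue
        # duplicate only if local execution beats waiting for the data
        if cost < finish_on_best + comm_to_child[p]:
            duplicated.append(p)
    return duplicated
```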
The insertion-based strategy was proposed by the HEFT algorithm and has been adopted by many other scheduling algorithms, but there is no precise mathematical description of this mechanism. When multiple idle time slots (ITS) meet the insertion conditions, the HEFT algorithm simply selects the first ITS rather than the one in which the task finishes earliest. This strategy causes unreasonable scheduling. We refine the HEFT algorithm's insertion-based constraints and provide a selection policy for when multiple slots satisfy the conditions. The detailed description is as follows:
After completing a task allocation, update each processor's ITS queue.
When allocating
For all of the ITSs that meet the condition in step (2), determine when
When there are multiple time slots satisfying steps (2) and (3), choose the ITS with the smallest EFT.
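The slot-selection steps above can be sketched as follows; the tuple representation of slots and all names are illustrative assumptions:

```python
def choose_slot(idle_slots, ready_time, exec_time):
    """Refined insertion policy: among all idle time slots (ITS) that can
    hold the task, return the start time giving the smallest EFT.

    idle_slots -- list of (start, end) idle time slots on one processor
    ready_time -- earliest time the task's input data is available
    exec_time  -- the task's computation cost on this processor
    """
    best_start, best_eft = None, None
    for start, end in idle_slots:
        est = max(start, ready_time)      # earliest start inside this slot
        if est + exec_time <= end:        # the task fits in the slot
            eft = est + exec_time
            if best_eft is None or eft < best_eft:
                best_start, best_eft = est, eft
    return best_start, best_eft
```

Unlike plain HEFT, which would take the first fitting slot, this scan keeps the slot with the minimum EFT (step (4)).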
The HSIP algorithm has the same time complexity as the HEFT algorithm. Computing
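For reference, HEFT's time complexity is commonly given as O(e x p) for a DAG with e edges scheduled on p processors; since the text states that HSIP matches HEFT's complexity, the same bound is assumed to apply, and for dense graphs with v tasks it becomes:

```latex
O(e \cdot p) = O\!\left(v^{2} \cdot p\right)
```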
Figure
Results of scheduling DAG.
                     HSIP   SDBATS   CPOP   HEFT   PEFT
Task prioritizing
Makespan              67      76      86     80     85
This section compares the performance of the HSIP algorithm with the algorithms presented above. For this purpose, two sets of workload graphs are considered: randomly generated application graphs [
The metric most commonly adopted to evaluate a schedule for a DAG is the makespan, as defined by (
The denominator in the equation is the minimum computation cost of the critical path tasks (
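Since the equation reference is truncated, the SLR can be written in the standard form from the list-scheduling literature; here w_{i,j} denotes the computation cost of task n_i on processor p_j, Q the processor set, and CP_MIN the critical path based on minimum computation costs, matching the surrounding text:

```latex
\mathrm{SLR} = \frac{\mathrm{makespan}}
{\sum_{n_i \in CP_{\mathrm{MIN}}} \min_{p_j \in Q} \left\{ w_{i,j} \right\}}
```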
Efficiency is defined as the speedup divided by the number of processors used in each run, and the speedup is calculated by dividing the sequential execution time by the parallel execution time (i.e., the makespan). The sequential execution time is computed by assigning all tasks to the single processor that minimizes the overall computation cost of the task graph, as shown in the following equation:
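The two metrics just defined can be computed directly; this sketch assumes a cost table mapping each task to its per-processor computation costs (the names are illustrative):

```python
def speedup_and_efficiency(W, makespan):
    """Compute speedup and efficiency as defined above.

    W[t]     -- list of computation costs of task t on every processor
    makespan -- schedule length produced by the parallel schedule
    """
    n_procs = len(next(iter(W.values())))
    # sequential time: all tasks on the single processor that minimizes
    # the total computation cost of the graph
    seq_time = min(sum(costs[p] for costs in W.values())
                   for p in range(n_procs))
    speedup = seq_time / makespan
    return speedup, speedup / n_procs
```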
In order to obtain a broad range of test DAGs, we designed a task graph generator that randomly generates DAGs with various features depending on input parameters, as in [
The density defines the number of edges between two node levels; a lower value generates fewer edges and a higher value generates more edges. This affects the connectivity between the nodes of each level.
The regularity defines the uniformity of each level. A small value causes the numbers of nodes in each level to differ greatly, yielding an unsymmetrical DAG; conversely, with a large value the number of nodes in each level will be similar.
The jump is the degree of leaping, which determines how many levels downward a node's edges may reach; jump = 1 means that nodes of the current layer connect only to the nodes of the next layer.
The range percentage of computation costs on processors (
For the purpose of the experiments, we chose the range of values for the parameters as follows:
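A layered generator in the spirit of these parameters can be sketched as follows. The level-partitioning scheme, the use of density as an edge probability, and all names are illustrative assumptions rather than the paper's exact generator; regularity, CCR, and heterogeneity handling are omitted for brevity:

```python
import random

def random_dag(n_tasks, levels, density, jump, seed=0):
    """Generate a layered random DAG.

    density -- probability of creating an edge between eligible nodes
    jump    -- maximum number of levels an edge may skip downward
    Returns (layers, edges) with nodes numbered 0..n_tasks-1 by layer.
    """
    rng = random.Random(seed)
    # split tasks across levels as evenly as possible
    per_level = [n_tasks // levels] * levels
    for i in range(n_tasks % levels):
        per_level[i] += 1
    layers, nid = [], 0
    for size in per_level:
        layers.append(list(range(nid, nid + size)))
        nid += size
    edges = set()
    for li in range(len(layers) - 1):
        deepest = min(li + jump, len(layers) - 1)
        for u in layers[li]:
            # connect to layers up to `jump` levels below, with prob `density`
            for lj in range(li + 1, deepest + 1):
                for v in layers[lj]:
                    if rng.random() < density:
                        edges.add((u, v))
    return layers, sorted(edges)
```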
Average SLR is the key factor that evaluates the performance of the algorithm in terms of the graph structure. Figure
Average SLR for different number of tasks.
Average SLR for different CCR.
Average SLR for different heterogeneity.
Efficiency for different number of processors.
In Figure
Table
Pairwise schedule length comparison of the scheduling algorithms.

                   HSIP   PEFT   SDBATS   HEFT   CPOP
HSIP    Better       -     68%     75%     81%    97%
        Worse        -     31%     17%     14%     2%
        Equal        -     <1%      8%      5%    <1%
PEFT    Better     31%      -      77%     70%    95%
        Worse      68%      -      32%     26%     4%
        Equal      <1%      -      <1%      4%    <1%
SDBATS  Better     17%    32%       -      61%    92%
        Worse      75%    77%       -      33%     7%
        Equal       8%    <1%       -       6%    <1%
HEFT    Better     14%    26%     33%       -     85%
        Worse      81%    70%     61%       -     14%
        Equal       5%     4%      6%       -     <1%
CPOP    Better      2%     4%      7%     14%      -
        Worse      97%    95%     92%     85%      -
        Equal      <1%    <1%     <1%     <1%      -
Due to its lookahead feature, the PEFT algorithm considers the child nodes first when setting scheduling priority, so it has some advantages when parallelism is low. But when parallelism becomes high, this advantage fades, especially when the DAG is highly heterogeneous. When the computation cost differences and communication cost weights of the child nodes are large, the PEFT algorithm often loses this advantage and is sometimes not even better than HEFT. The SDBATS algorithm is better than HEFT in some cases, but it ignores the insertion-based strategy and focuses too much on computation cost differences, so it has no advantage when communication costs are large. HEFT is the most classical scheduling algorithm, and its results are relatively stable, as can be seen from its maximum equivalence rate of scheduling results compared with the other algorithms. CPOP has the worst scheduling results because it places too much emphasis on keeping the tasks of the critical path on the same processor, although it occasionally performs well when the critical path tasks meet the optimal scheduling conditions.
The experimental results show that HSIP outperforms the comparison algorithms in random DAG experiments across various parameters. Especially when the heterogeneity difference is large, the advantage of our algorithm is more obvious, because HSIP balances the computation cost difference against the communication cost weight, as presented in Section
In this section, we consider the application graphs of some real-world problems, namely, Gaussian elimination [
In the Gaussian elimination experiments, the heterogeneous computing system uses five processors, and the CCR and
Experimental result for Gaussian elimination.
Average SLR
Efficiency
In the FFT experiments, because the application structure is fixed, only the CCR and range percentage parameters (
Experimental result for FFT.
Average SLR
Efficiency
Montage is used to construct astronomical image mosaics of the sky. We experiment with 25 and 50 task nodes. As with other real applications, the application structure is fixed, so only the CCR, CPU number, and range percentage parameters (
Experimental result for Montage.
Average SLR for different CCR
Average SLR for different number of CPUs
Average SLR for different heterogeneity
Epigenomics is used to compare the epigenetic state of human cells across the whole genome. As with other real applications, the application structure is fixed, so only the CCR, CPU number, and range percentage parameters (
Experimental result for Epigenomic.
Average SLR for different CCR
Average SLR for different number of CPUs
Average SLR for different heterogeneity
The standard deviations of the experimental errors for all the real-world problems above were calculated; they lie in the 4–7% range.
In this paper, we proposed a new list scheduling algorithm for heterogeneous systems, named HSIP. The proposed algorithm demonstrates better performance when scheduling DAG-structured applications on heterogeneous computing systems with respect to the performance metrics (average schedule length ratio, speedup, efficiency, and frequency of best results). The performance of the HSIP algorithm was experimentally evaluated on a large set of randomly generated task graphs with various characteristics and on application graphs of several real-world problems, namely Gaussian elimination, Fast Fourier Transformation, Montage, and Epigenomics. The simulation results confirm that the HSIP algorithm outperforms existing algorithms such as PEFT, SDBATS, CPOP, and HEFT. The complexity of the HSIP algorithm is
The authors declare that they have no competing interests.
This paper is partially supported by the National Natural Science Foundation of China under Grant no. 11372067 and the General Project of Science and Technology Research of the Education Department of Liaoning Province (Grant no. L2014508).