Energy-Efficient Reliability-Aware Scheduling Algorithm on Heterogeneous Systems

1School of Information Science and Engineering, National Supercomputing Center in Changsha, Hunan University, Changsha 410082, China 2Information Science and Technology College/Southern Regional Collaborative Innovation Center for Grain and Oil Crops in China, Hunan Agricultural University, Changsha 410128, China 3Archive, Hunan University of Humanities, Science and Technology, Loudi 417000, China


Introduction
For a long time, energy consumption was simply ignored in the performance evaluation of large-scale parallel computing systems. However, the DatacenterDynamics Intelligence (DCDi) Industry Census reported that the amount of electricity consumed by global data centers reached 40 GW in 2013, a 7% increase over the previous year [1]. According to the latest Top 500 supercomputer ranking, the power consumption of the first-ranked supercomputer "Tianhe-2" is 17.808 MW, and the average power consumption of the Top 10 systems in the list is 6.2939 MW [2]. Thus, it is obvious that high energy cost is a key concern in designing and operating heterogeneous systems.
On the other hand, modern computing systems are groups of heterogeneous processors connected via a high-speed network that supports the execution of parallel applications.
For example, the top supercomputer "Tianhe-2" in the Top 500 list consists of Intel Xeon E5-2692 12C 2.200 GHz processors and Intel Xeon Phi 31S1P (MIC) coprocessors [2]. Within each processor, the number of transistors integrated into today's Intel Xeon EX processor reaches nearly 2.3 billion, and its power consumption exceeds 130 W [3]. This implies the possibility of worsening single-processor reliability, eventually degrading the reliability of the whole heterogeneous system. Furthermore, modern large-scale computing systems usually have a great many processors, such as "Tianhe-2" with 3,120,000 cores and "Titan" with 560,640 cores [2]. One of the main problems in this situation is system reliability, which drastically decreases as the number of processor cores increases [4]. Even when a single processor's one-hour reliability is very high, such as 0.999999, as the system size approaches 10,000 cores the system's MTTF (Mean Time to Failure) drops to less than 10 hours [4]. This leads us to the main problem of this paper: the simultaneous management of system performance, reliability, and energy consumption.
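The reliability collapse described above follows directly from the series-system model. A minimal sketch, assuming independent, exponentially distributed (Poisson) failures per core; the function name is ours:

```python
import math

def system_mttf_hours(r_hourly: float, n_cores: int) -> float:
    """MTTF of an n-core system whose cores fail independently.

    r_hourly is the one-hour reliability of a single core; under a
    Poisson failure process r = exp(-lam * 1h), so lam = -ln(r).
    A failure of any core is a system failure (series model), giving
    a system failure rate of n * lam and hence MTTF = 1 / (n * lam).
    """
    lam = -math.log(r_hourly)
    return 1.0 / (n_cores * lam)
```

With r = 0.999999, a single core's MTTF is about a million hours, but the system MTTF shrinks in proportion to 1/n as cores are added.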
In recognition of this, we first build a reliability and energy aware task scheduling architecture for precedence-constrained parallel applications with an energy consumption model on heterogeneous systems. Then, we propose a single-processor failure rate model based on the DVFS technique and derive the application reliability of the system. Finally, to solve this problem, we propose a heuristic Reliability-Energy Aware Scheduling (REAS) algorithm, which adopts a novel scheduling objective RE. The overall objective of this paper is to achieve a good tradeoff among performance, reliability, and energy consumption.
The rest of the paper is organized as follows: related work is summarized in Section 2. We describe the task scheduling system model in Section 3. In Section 4, we provide a system reliability model. To solve this problem, a heuristic reliability and energy aware task scheduling algorithm is proposed in Section 5. In Section 6, we verify the performance of the proposed algorithm by comparing the results obtained from performance evaluation. Finally, we summarize the contributions and make some remarks on further research in Section 7.

Related Work
A high-performance parallel application running on computing systems is usually composed of intercommunicating tasks, which are scheduled to run on different processors in the system. In most cases, the main objective of scheduling strategies is to map the multiple interacting program tasks onto processors and order their executions so that task precedence requirements are satisfied and, at the same time, the minimum schedule length (makespan) is achieved. The problem of finding the optimal schedule is NP-complete in general [5-9]. Many scheduling algorithms have been proposed to deal with this problem, for example, the dynamic-level scheduling (DLS) algorithm [6] and the heterogeneous earliest-finish-time (HEFT) algorithm [5, 8, 10, 11].
As energy consumption has become an important issue in designing large-scale computing systems over the last few years, many techniques, including dynamic voltage-frequency scaling (DVFS), dynamic powering on/off, slack reclamation, resource hibernation, and memory optimizations, have been investigated and developed to reduce energy consumption [12-14]. DVFS, a technique in which a processor runs at a less-than-maximum frequency when it is not fully utilized in order to conserve power, is perhaps the most appealing method for reducing energy consumption [14, 15]. Most of the early DVFS-enabled research focused on single processors in embedded and real-time computing systems [14, 16, 17]. Recently, there has been a significant amount of work on task scheduling for heterogeneous systems using DVFS-enabled techniques. For instance, Rountree et al. focused on energy optimization of MPI programs in HPC environments and proposed a linear programming (LP) approach, which incorporates allowable time delays, communication slack, and memory pressure into its scheduling using DVFS (i.e., slack reclamation) [18]. Rizvandi et al. proposed a method to find the best processor frequencies for optimal energy consumption [19]. Lee and Zomaya addressed the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems; their scheduling decisions are made using the relative superiority metric (RS), devised as a novel objective function [20]. In [21], Zong et al. proposed two energy-efficient scheduling algorithms (EAD and PEBD) for parallel tasks on homogeneous clusters based on a duplication strategy.
All of this work demonstrated that dynamically adjusting the processor's voltage and frequency can effectively reduce system energy consumption. However, recent research has shown that scaling the processor's voltage and frequency increases the susceptibility of nanoscale semiconductor circuits to cosmic ray radiation, electromagnetic interference, and alpha particles, which degrades processor reliability [22-24]. Thus, it is worthwhile to incorporate reliability into DVFS-based energy aware scheduling. Recently, Zhu et al. focused on reducing energy consumption while preserving system reliability for periodic real-time tasks [25, 26]. They proposed a reliability model in which the processor's reliability decreases as its voltage and frequency are scaled from maximum to minimum and incorporated the reliability requirements into heuristic energy aware task scheduling strategies. However, their techniques are not suitable for precedence-constrained parallel applications on heterogeneous systems with DVFS-enabled processors.
Much research has dealt with reliability on heterogeneous systems. For example, Dogan and Özgüner introduced three reliability cost functions that were incorporated into the dynamic level (DL) and proposed a reliable dynamic level scheduling algorithm (RDLS) [27]; the goal was to minimize not only the execution time but also the failure probability of the application. In our previous work [8], we proposed a scheduling algorithm that considers the task's execution reliability. Qin and Jiang investigated a dynamic and reliability-cost-driven (DRCD) scheduling algorithm for precedence-constrained tasks in heterogeneous clusters [28]. Unfortunately, those works did not consider energy consumption or the reliability effect of scaling the processor's voltage and frequency. In recognition of this, we focus on reliability and energy consumption on DVFS-enabled heterogeneous systems.

System Models
3.1. Scheduling Architecture. Various task scheduling architectures have been proposed in the literature [5, 8, 9, 14, 28, 29]. However, energy consumption and system reliability are not effectively incorporated into their scheduling. In this paper, we propose a reliability and energy aware task scheduling architecture, as depicted in Figure 1(a). It is assumed that all parallel applications, along with information provided by the user, are submitted to the system by a special user command. First, each parallel application is transformed into a task DAG by the Task DAG Model. Then, the estimated energy consumption of tasks, which are executed on the DVFS-enabled heterogeneous processors, is computed by the Energy Consumption Estimator. At the same time, the reliability analysis computes the processors' reliability at different frequencies to obtain the whole system reliability. Finally, the Scheduler schedules tasks based on the above task energy consumption and system reliability.
The failure of a heterogeneous processor is assumed to follow a Poisson process, and each processor has a constant failure rate λ [8, 9, 29]. For example, λ_k denotes the failure rate of processor p_k when it works at its normal voltage and frequency [8, 9, 27, 29]. These failure rates can be derived from system profiling, system logs, and statistical prediction techniques [31]. For demonstration purposes, we illustrate two heterogeneous processors, one with 3 frequency levels and the other with 2 frequency levels; their parameters are listed in Table 1.

3.2. Application Model.
The precedence-constrained tasks of a parallel application are usually denoted as a Directed Acyclic Graph (DAG) G = ⟨V, E, [c_{i,j}], [w_{i,k,h}]⟩ [5, 8-10, 29], where V = {v_1, v_2, ..., v_n} is the set of n tasks that can be scheduled to any available DVFS-enabled processor [5, 8-10, 29]; E represents the precedence relation that defines a partial order on the task set V, such that v_i ≺ v_j implies that task v_i must be finished before v_j can start execution [5, 8-10, 29]. [c_{i,j}] is an n × n communication matrix in which c_{i,j} denotes the communication time between tasks v_i and v_j for 1 ≤ i, j ≤ n. [w_{i,k,h}] is an n × m × h_max computation matrix in which each w_{i,k,h} gives the estimated time to execute task v_i on processor p_k at frequency f_{k,h}. Here, h_max is the maximal frequency level in the system. The communication and computation costs can be evaluated by building a historic table and using code profiling or statistical prediction techniques [31]. Figure 1(b) shows a parallel application DAG, Table 2 lists the task execution times on the two heterogeneous DVFS-enabled processors of Table 1, and the communication times among these tasks are listed in Table 3.
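The DAG abstraction above, with its communication matrix and per-processor, per-frequency computation matrix, can be captured in a few lines. A sketch with hypothetical toy values (not the values of Tables 1-3); the class and field names are ours:

```python
from dataclasses import dataclass

@dataclass
class TaskDAG:
    n: int        # number of tasks v_1 .. v_n
    edges: set    # precedence pairs (i, j), meaning v_i -> v_j
    comm: dict    # comm[(i, j)] = communication time c_{i,j}
    comp: dict    # comp[(i, k, h)] = w_{i,k,h}: time of v_i on
                  # processor p_k at frequency level h

    def pred(self, j):
        # immediate predecessors of v_j
        return [i for (i, jj) in self.edges if jj == j]

    def succ(self, i):
        # immediate successors of v_i
        return [j for (ii, j) in self.edges if ii == i]

# A toy 3-task example on one processor with two frequency levels:
dag = TaskDAG(
    n=3,
    edges={(1, 2), (1, 3), (2, 3)},
    comm={(1, 2): 4.0, (1, 3): 6.0, (2, 3): 3.0},
    comp={(1, 0, 0): 5.0, (1, 0, 1): 8.0,
          (2, 0, 0): 4.0, (2, 0, 1): 7.0,
          (3, 0, 0): 6.0, (3, 0, 1): 9.0},
)
```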
Generally, the common objective of task scheduling is to map tasks with precedence constraints onto processors and obtain a minimum schedule length (also called makespan) [10, 11]. Before presenting the schedule length, it is necessary to define the scheduling attributes EST (earliest start time) and EFT (earliest finish time):

EST(v_i, f_{k,h}) = max{avail(p_k), max_{v_j ∈ pred(v_i)} (AFT(v_j) + c_{j,i})},
EFT(v_i, f_{k,h}) = EST(v_i, f_{k,h}) + w_{i,k,h},   (1)

where avail(p_k) is the earliest time at which processor p_k is free, pred(v_i) is the set of immediate predecessors of v_i, and AFT(v_j) is the actual finish time of v_j. In this paper, let x_{i,k,h} = 1 denote that task v_i is scheduled on processor p_k at frequency f_{k,h}; otherwise x_{i,k,h} = 0. Thus, the schedule length is defined as follows:

SL = max_{v_i ∈ V} AFT(v_i).   (2)
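For a fixed assignment of tasks to processor-frequency pairs, EST, EFT, and the makespan can be computed in one pass over a topological order. A sketch under that assumption (helper names are ours, not from the paper):

```python
def schedule_length(order, assign, comp, comm, pred):
    """order: tasks in topological order;
    assign[i] = (k, h), the chosen processor-frequency pair of task i;
    comp[(i, k, h)] = execution time w_{i,k,h};
    comm[(i, j)] = communication time c_{i,j} (charged only across
    processors); pred(i) -> list of immediate predecessors of i."""
    aft = {}       # actual finish times AFT(v_i)
    avail = {}     # earliest time each processor becomes free
    for i in order:
        k, h = assign[i]
        # data-ready time: every parent finished, plus communication
        # if the parent ran on a different processor
        ready = max((aft[j] + (0.0 if assign[j][0] == k else comm[(j, i)])
                     for j in pred(i)), default=0.0)
        est = max(ready, avail.get(k, 0.0))   # EST(v_i, f_{k,h})
        aft[i] = est + comp[(i, k, h)]        # EFT = EST + w_{i,k,h}
        avail[k] = aft[i]
    return max(aft.values())                  # makespan SL
```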

3.3. Energy Model.
The major energy consumption of computing systems comes from memory, disks, CPUs, and other components. This paper only considers DVFS-enabled CPUs, which consume the largest proportion of energy in such systems [14, 19, 20, 32]. The power consumption of a DVFS-enabled microprocessor based on complementary metal-oxide semiconductor (CMOS) logic circuits mainly consists of static power and dynamic power dissipation, which can be modeled as [25, 26]

P = P_s + ρ(P_ind + P_d),   (3)

where P_s is the static power, a constant used to maintain basic circuits and keep the clock running, and P_ind is the frequency-independent active power. ρ denotes the processor's mode: if the processor is in execution mode, ρ = 1; otherwise, ρ = 0. P_d is the most significant factor of processor power consumption and can be estimated as [14, 16, 19, 20, 32]

P_d = C_ef V^2 f^m,   (4)

where C_ef represents the switched capacitance, V is the supply voltage, f represents the processor's working frequency, and m stands for a circuit-dependent constant. Example processor parameters of this kind are listed in Table 1.
Let EN(v_i, f_{k,h}) be the energy consumption caused by task v_i running on DVFS-enabled processor p_k at frequency f_{k,h}; it is determined by the task execution time and the processor power consumption:

EN(v_i, f_{k,h}) = P_d^k(f_{k,h}) × w_{i,k,h},   (5)

where P_d^k(f_{k,h}) denotes the dynamic power dissipation of processor p_k at frequency f_{k,h} (see (4)). Thus, for an application G, the energy consumption EN(G) is the sum of the energy consumption of all tasks:

EN(G) = Σ_{v_i ∈ V} Σ_{k=1}^{m} Σ_{h=1}^{h_max} x_{i,k,h} × EN(v_i, f_{k,h}).   (6)

At the same time, in heterogeneous systems all processors remain powered on, whether in sleep or execution mode. That is to say, all processors of the system consume static power P_s all the time. Thus, the system energy consumption EN(S) is the sum of the static power of all processors over the schedule length plus the dynamic energy of the application:

EN(S) = Σ_{k=1}^{m} P_s^k × SL + EN(G).   (7)

Obviously, the system energy consumption EN(S) is greater than the application energy consumption EN(G). In this paper, one of our main objectives is to minimize the system energy consumption EN(S).
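Per-task and whole-system energy under the model above can be sketched as follows; all numeric values in the usage are illustrative only, and the function names are ours:

```python
def dynamic_power(c_ef, v, f, m=1.0):
    # P_d = C_ef * V^2 * f^m, with m a circuit-dependent constant
    return c_ef * v * v * f ** m

def task_energy(c_ef, v, f, w, m=1.0):
    # EN(v_i, f_{k,h}) = P_d(f_{k,h}) * w_{i,k,h}
    return dynamic_power(c_ef, v, f, m) * w

def system_energy(static_powers, makespan, app_energy):
    # System energy: every processor pays its static power for the
    # whole schedule length, on top of the application's task energy.
    return sum(static_powers) * makespan + app_energy
```

Since the static term is always nonnegative, the system energy is never smaller than the application energy, which is why the system-level quantity is the one being minimized.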

System Reliability Analysis and Problem Statement
In this section, we first provide the single DVFS-enabled processor failure rate model. Then, we analyze heterogeneous system reliability. Finally, we formulate reliability and energy aware task scheduling as a linear programming problem.

4.1. Single DVFS-Enabled Processor Failure Rate. Among various sources of unreliability in a semiconductor circuit processor, it is predicted that the failure rate due to cosmic ray radiation-induced soft errors dominates all other reliability issues [24]. A transient fault occurs when a high-energy particle such as an alpha particle or neutron strikes a sensitive region in a semiconductor device and flips the logical state of the struck node [33]. Most modern DVFS-enabled processors integrate multibillion transistors on a single chip, leading to an increasing number of sensitive devices in submicron technologies that are vulnerable to soft errors, which consequently raises the Soft Error Rate (SER) [34]. These phenomena become more and more serious with the continued scaling of the processor's voltage and frequency [23, 25].
Traditionally, the reliability of a modern DVFS-enabled processor has been modeled as a Poisson distribution with a failure rate λ when it works at its normal voltage and frequency [8, 9, 27, 29, 35]. Moreover, it has been shown that DVFS has a direct and negative effect on failure rates: blindly applying DVFS to scale the supply voltage and processing frequency for energy savings may cause significant degradation of processor reliability [23, 25, 26]. Therefore, for a DVFS-enabled heterogeneous processor p_k considered in this paper, the failure rate at a reduced frequency f_{k,h} (and the corresponding voltage V_{k,h}) can be modeled as

λ_k(f_{k,h}) = λ_k × g_k(f_{k,h}),   (8)

where λ_k is the failure rate at the normal processing frequency f_nm (and the corresponding normal voltage V_nm). Prior research on the effect of voltage on processor reliability has revealed that failure rates generally increase as processing frequencies (and supply voltages) are scaled away from their normal values [24, 36]. On the other hand, fault rates are exponentially related to the circuit's critical charge (which depends on the threshold voltage). Thus, we have the following equation:

g_k(f) = a_k × 10^{d_k (f_max − f)/(f_max − f_min)},   (9)

where the exponent d_k is the parameter related to the threshold voltage, a_k is a constant representing the sensitivity of fault rates to frequency scaling, and f_min and f_max denote the minimum and maximum frequency, respectively. To obtain precise values of the parameters a_k and d_k, we use the least squares curve fitting method [37]. Taking the natural logarithm of both sides of (9) gives

ln g_k(f) = ln a_k + d_k ln 10 × (f_max − f)/(f_max − f_min).   (10)

Thus, we can obtain approximate values of the parameters a_k and d_k by least squares linear fitting.
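The log-linear form above makes the fit a one-dimensional least squares problem. A sketch, assuming failure-rate samples measured at several frequencies (the function name is ours):

```python
import math

def fit_failure_model(freqs, lambdas, f_min, f_max):
    """Least-squares fit of ln(lambda(f)) = ln(a) + d*ln(10) * x,
    where x = (f_max - f)/(f_max - f_min).  Returns (a, d)."""
    xs = [(f_max - f) / (f_max - f_min) for f in freqs]
    ys = [math.log(lam) for lam in lambdas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # ordinary least squares slope and intercept
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), slope / math.log(10.0)
```

Fitting synthetic samples generated from known parameters recovers those parameters exactly, since the data are exactly log-linear.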

4.2. Application Reliability Analysis. Assume that task v_i executes during the time interval [t_s, t_f] on heterogeneous DVFS-enabled processor p_k at frequency f_{k,h}, where t_s denotes the task start time and t_f denotes the task finish time [5, 8, 9, 29]. Thus, the task execution reliability is given by

R(v_i, f_{k,h}) = exp(−λ_k(f_{k,h}) × (t_f − t_s)) = exp(−λ_k(f_{k,h}) × w_{i,k,h}).   (12)

For a task v_i of application G on processor p_k at frequency f_{k,h}, its reliability R[v_i, f_{k,h}] is the product of the reliabilities of all its immediate parent tasks and its own execution reliability, which can be defined by

R[v_i, f_{k,h}] = ∏_{v_j ∈ pred(v_i)} R[v_j] × exp(−λ_k(f_{k,h}) × w_{i,k,h}),   (13)

where pred(v_i) denotes the set of direct predecessors of v_i and R[v_j] is the reliability of task v_j on its assigned processor-frequency pair:

R[v_j] = Σ_{k=1}^{m} Σ_{h=1}^{h_max} x_{j,k,h} × R[v_j, f_{k,h}].   (14)

For the entry task v_1 of the application, which is executed on processor p_k at frequency f_{k,h} with pred(v_1) = ∅, the reliability is

R[v_1, f_{k,h}] = exp(−λ_k(f_{k,h}) × w_{1,k,h}).   (15)

Generally, application G has one exit task v_exit. The reliability of the application, R[G], is equal to that of the exit task v_exit:

R[G] = R[v_exit].   (16)

This is the other objective of this paper: we try to improve the application reliability R[G]. From the above analysis, we know that allocating tasks with shorter execution times to more reliable processors is a good heuristic to increase reliability.
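The recursive reliability definition above can be evaluated task by task in topological order. A sketch, assuming the assignment (hence each task's failure rate and execution time) is already fixed; names are ours:

```python
import math

def application_reliability(order, pred, lam, w):
    """order: tasks in topological order, exit task last;
    pred(i) -> immediate predecessors of task i;
    lam[i]: failure rate lambda_k(f_{k,h}) of v_i's assigned pair;
    w[i]: execution time w_{i,k,h} of v_i on that pair."""
    r = {}
    for i in order:
        exec_rel = math.exp(-lam[i] * w[i])   # e^{-lambda * w}
        prod = 1.0
        for j in pred(i):                     # product over parents
            prod *= r[j]
        r[i] = prod * exec_rel
    return r[order[-1]]                       # reliability of the exit task
```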

4.3. Problem Statement. As the simultaneous management of scheduling performance, system reliability, and energy consumption is the main problem of this paper, we formulate it as follows: minimize the schedule length SL and the system energy consumption EN(S), and maximize the application reliability R[G], subject to

Σ_{k=1}^{m} Σ_{h=1}^{h_max} x_{i,k,h} = 1, x_{i,k,h} ∈ {0, 1}, for every v_i ∈ V,
EST(v_j, f_{k,h}) ≥ EFT(v_i, f_{k',h'}) + c_{i,j}, for every edge (v_i, v_j) ∈ E.   (17)

Proposed Reliability-Energy Aware Scheduling Algorithm
This section presents a Reliability-Energy Aware Scheduling algorithm for heterogeneous systems, called REAS, which aims at achieving lower energy consumption, high reliability, and a shorter schedule length. Its scheduling decisions are made using a hybrid metric covering energy consumption, reliability, and schedule length, devised as a novel objective function. The pseudocode of the algorithm is shown in Algorithm 1. The algorithm completes in three main phases, described in the following sections.

5.1. Task Priorities Phase. This step is essential for list scheduling algorithms. A task processing list is generated by sorting the tasks in decreasing order of some predefined rank function, such as t-level, b-level, Rank, CP, and DL [5, 6, 8-10, 29]. Here, we use the average computation time of each task, which is defined as

w̄_i = (Σ_{k=1}^{m} Σ_{h=1}^{h_max} w_{i,k,h}) / (m × h_max).   (18)

In this research, we use b-level as the rank function. The b-level of task v_i is the length of the longest path from task v_i to the exit task. We can compute this value by recursively traversing the DAG from the exit task, and it is defined as follows:

b_level(v_i) = w̄_i + RC(v_i) + max_{v_j ∈ succ(v_i)} (c_{i,j} + b_level(v_j)),   (19)

where succ(v_i) is the set of immediate successors of task v_i. RC(v_i) is the average reliability overhead of task v_i and can be computed by

RC(v_i) = (Σ_{k=1}^{m} Σ_{h=1}^{h_max} RO(v_i, f_{k,h})) / (m × h_max).   (20)

For the exit task v_exit, the b-level is equal to

b_level(v_exit) = w̄_exit + RC(v_exit).   (21)

Basically, b_level(v_i) is the length of the critical path from task v_i to the exit task, including the average computation cost and reliability overhead of task v_i. For example, considering the application DAG in Figure 1(b), the heterogeneous system parameters in Table 1, the task execution time matrix in Table 2, and the communication matrix in Table 3, the task b-level values recursively computed by (19) and (21) are shown in Table 4.
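The b-level recursion traverses the DAG backward from the exit task. A sketch, assuming the per-task averages (average computation time and average reliability overhead) are precomputed; names are ours:

```python
def b_levels(tasks, succ, comm, w_bar, rc):
    """tasks: reverse-topological order (exit task first);
    succ(i) -> immediate successors of task i;
    comm[(i, j)] = communication time c_{i,j};
    w_bar[i]: average computation time of task i;
    rc[i]: average reliability overhead RC(v_i)."""
    bl = {}
    for i in tasks:
        children = succ(i)
        if not children:
            # exit task: no outgoing edges
            bl[i] = w_bar[i] + rc[i]
        else:
            # longest path through any successor
            bl[i] = w_bar[i] + rc[i] + max(
                comm[(i, j)] + bl[j] for j in children)
    return bl
```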

5.2. Task Assignment Phase. In this phase, tasks are assigned to processors with the earliest execution finish time EFT(v_i), high reliability, and minimum task energy consumption EN(v_i). However, in heterogeneous systems these performance metrics conflict most of the time. Here, we introduce a novel objective, RE, which achieves a good tradeoff among these metrics. We first redefine the earliest execution finish time of task v_i on processor p_k at frequency f_{k,h} as

EFT(v_i, f_{k,h}) = EST(v_i, f_{k,h}) + w_{i,k,h} + RO(v_i, f_{k,h}),   (22)

where RO(v_i, f_{k,h}) is the reliability overhead of task v_i on processor p_k at frequency f_{k,h}, computed by

RO(v_i, f_{k,h}) = λ_k(f_{k,h}) × w_{i,k,h}.   (23)

On the other hand, we let Min_EFT(v_i) and Min_EN(v_i) denote the minimum earliest execution finish time and the minimum task energy consumption over all processors of the heterogeneous system. Thus, the novel metric RE of task v_i on processor p_k at frequency f_{k,h} is

RE(v_i, f_{k,h}) = θ × EFT(v_i, f_{k,h}) / Min_EFT(v_i) + (1 − θ) × EN(v_i, f_{k,h}) / Min_EN(v_i),   (24)

where θ is the weight of the task earliest execution finish time. If task execution time is more important than energy consumption, we give θ a higher value; otherwise, θ is lower. Moreover, the scheduling objective of this problem is to minimize both schedule length and energy consumption. Thus, in each task assignment step, we find the minimum RE(v_i, f_{k,h}) and assign task v_i to the corresponding processor-frequency pair.
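The assignment rule can then be sketched as picking, for the current task, the processor-frequency pair with the smallest RE value; here eft and en map each candidate pair to its finish time (including reliability overhead) and energy, and theta is the weight θ. Names are ours:

```python
def pick_pair(candidates, eft, en, theta):
    """candidates: list of (k, h) processor-frequency pairs;
    eft[(k, h)]: earliest finish time of the task on that pair;
    en[(k, h)]: energy consumption of the task on that pair.
    RE = theta * EFT/minEFT + (1 - theta) * EN/minEN; lower is better."""
    min_eft = min(eft[c] for c in candidates)
    min_en = min(en[c] for c in candidates)

    def re(c):
        return (theta * eft[c] / min_eft
                + (1.0 - theta) * en[c] / min_en)

    return min(candidates, key=re)
```

Normalizing by the per-task minima keeps the two terms on comparable scales, so θ trades time against energy directly.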

5.3. Slack Reclamation. Tasks of a parallel application may have some slack time in their execution, due primarily to communication events, for example, intertask communication (intertask data dependencies), and these processor slacks are an obvious source of energy wastage. Slack reclamation has been studied to reduce energy consumption using the slack left by completed task instances. The idea behind slack reclamation is to exploit the slack time to slow down the execution speed of the remaining tasks [12, 20]. In this paper, we adopt this technique to reduce energy consumption after making the scheduling decision. The slack time of task v_i is defined by

Slack(v_i) = Sch(v_j, s) − Sch(v_i, f),   (25)

where Sch(v_i, s) and Sch(v_i, f) are the start and finish times of task v_i in the scheduled processor-frequency pairs, and v_j is the task scheduled immediately after v_i on the same processor. If the task slack time Slack(v_i) > 0, we can scale down the execution frequency to save energy. Thus, the optimal frequency f_{k,h} is the one with minimum energy consumption satisfying

w_{i,k,h} ≤ w_{i,k,orig} + Slack(v_i),   (26)

where f_{k,orig} is the originally scheduled frequency and w_{i,k,orig} the corresponding execution time. In the last step, we reassign task v_i to the optimal frequency f_{k,h}.
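A sketch of the reclamation step: among the frequency levels of the already-chosen processor, take the lowest-energy level whose extra execution time still fits within the slack. Names are ours:

```python
def reclaim(levels, w, energy, w_orig, slack):
    """levels: frequency level indices h on the chosen processor;
    w[h]: execution time of the task at level h;
    energy[h]: task energy at level h;
    w_orig: time at the originally scheduled level;
    slack: the task's slack time Slack(v_i)."""
    # a level is feasible if slowing down does not exceed the slack
    feasible = [h for h in levels if w[h] <= w_orig + slack]
    # among feasible levels, minimize energy consumption
    return min(feasible, key=lambda h: energy[h])
```

With zero slack, only levels at least as fast as the original are feasible, so the schedule length is never stretched.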

Experimental Results and Discussion
In this section, we compare the performance, energy consumption, and system reliability of our REAS algorithm with three existing scheduling algorithms: DLS [6], RDLS [27], and ECS [20]. The experiments are performed on synthetic, randomly generated precedence-constrained parallel application graphs, as described below. The performance metrics chosen for the comparison are the schedule length ((2) and (22)), the system energy consumption EN(S) (see (7)), and the application reliability R[G] (see (16)).
To test the performance of these algorithms, we developed a discrete event simulation environment of a heterogeneous system with 8 DVFS-enabled processors using C++. The simulated system includes 2 Intel Core Duo, 2 Intel Xeon, 2 AMD Athlon, 1 TI DSP, and 1 Tesla GPU, mostly based on Intel processors. The processors are interconnected by InfiniBand, a switched fabric communications link primarily used in high-performance computing. For the InfiniBand configuration, the switch considered is a Mellanox InfiniScale III SDR and the NIC is a Mellanox ConnectX IB Dual Copper Card [21]. Other parameters of the model are set as follows. The failure rates of processors are assumed to be uniformly distributed between 1 × 10^−5 and 1 × 10^−4 failures/hr [8, 9, 28]; the transmission rates of links are assumed to be 1000 Mbits/sec.

6.1. Randomly Generated Application Graphs. These experiments use three common DAG characteristics to generate parallel application graphs [5, 8, 9, 29]:

(i) DAG Size. The number of tasks in the application DAG.

(ii) Communication-to-Computation Ratio (CCR). The ratio of the average communication cost to the average computation cost; a high CCR means that the application is communication-intensive [5, 8-10, 29].
(iii) Out-Degree. The out-degree of a task node.
In the experimental setting, DAGs are generated based on the above parameters with 50 and 100 tasks. Task weights are generated randomly from a uniform distribution over [1 × 10^9, 9 × 10^11] execution cycles, with an average of about 4.5 × 10^10 cycles per task. We also generated edge weights from a uniform distribution based on a mean CCR. Applications with different characteristics can be produced by giving various CCR values [5, 8-10, 29]. In these experiments, we varied CCR in a reasonable range of 0.1 to 10.

6.2. Various Weights θ of the REAS Algorithm. In the first set of experiments, we evaluate the effect of the weight θ on the REAS algorithm. Figure 2 shows the simulation results of scheduling 50 and 100 tasks with CCR = 1 while varying the weight θ from 0 to 1 in steps of 0.2. We observe from Figure 2 that the schedule length and energy consumption decrease, while the application reliability stays at almost the same level, as the REAS weight θ increases. This is reasonable: with a high θ, REAS is driven mostly by task execution time, which makes its schedule length shorter and its energy consumption lower. However, once the weight θ exceeds 0.4, the performance of REAS is not much distinguishable. Thus, in the experiments below, we let θ = 0.5.

6.3. Random Task Performance Results. For the set of randomly generated parallel applications, the results are shown in Figures 3 and 4, where each data point is the average over 1,000 experiments. In this set of experiments, we set the weight θ = 0.5 in the metric RE (see (24)) of the REAS algorithm. In other words, REAS puts the same weight on task execution time and energy consumption. We observe from Figure 3(a) that REAS outperforms RDLS and ECS with respect to schedule length, and that the schedule length increases as the CCR increases. The average schedule length of REAS is shorter than that of RDLS and ECS by 2.6% and 1.9%, respectively. This improvement becomes more obvious as CCR increases: for CCR = 5, REAS outperforms RDLS and ECS by 7.5% and 2.6%, respectively. However, REAS is inferior to DLS in terms of schedule length. Figure 3(b) reveals that REAS saves more energy on average than RDLS by 15.3%, ECS by 3.7%, and DLS by 16%, respectively. Figure 3(c) shows that REAS outperforms RDLS, ECS, and DLS by 0.3%, 2%, and 0.7% in terms of average application reliability. This is mainly because the REAS algorithm schedules tasks according to the novel objective RE, which achieves an effective tradeoff among task execution time, energy consumption, and task execution reliability. In contrast, the DLS algorithm optimizes only the task execution time, where the actual execution time includes the task scheduling time and reliability overhead. Thus, the schedule generated by DLS achieves the best schedule length but consumes more energy and has lower reliability. The RDLS algorithm schedules tasks considering their execution reliability while ignoring task energy consumption. The ECS algorithm optimizes both schedule length and energy consumption, but its solutions incur more task execution reliability overhead. Thus, the REAS algorithm outperforms RDLS, ECS, and DLS overall in terms of schedule length, energy consumption, and reliability. Another interesting observation is that RDLS and DLS are better than ECS in terms of reliability. This is mainly because tasks scheduled by RDLS and DLS always execute at the processor's normal frequency, which has the highest reliability among all frequency levels.

The improvement in scheduling performance can also be seen in Figures 3(d), 3(e), and 3(f) for 100 tasks. These results show REAS outperforming RDLS and ECS by 4.9% and 3.5% in terms of average schedule length. REAS also outperforms RDLS, ECS, and DLS by 8.93%, 4.53%, and 8.24% in terms of average energy consumption, and by 1.86%, 6.28%, and 2.1% in terms of average application reliability, respectively. We also simulated a heterogeneous system with 4 Intel Xeon and 4 AMD Athlon processors; the other configurations are the same as before. Figure 4 shows the results of 100 randomly generated tasks on this heterogeneous computing platform. The results show that REAS outperforms RDLS, ECS, and DLS in terms of average schedule length and energy consumption. However, REAS is inferior to RDLS in terms of application reliability.

6.4. Application Graphs of a Real-World Problem. Using real applications to test the performance of algorithms is very common [5, 8-10, 29]. In this section, we also simulate a real-world digital signal processing (DSP) problem; details can be found in [5, 8-10, 29]. From Figure 5, we can conclude that REAS is also better than RDLS, ECS, and DLS on this application.

Conclusions and Future Work
In the past few years, with the rapid development of heterogeneous systems, the high price of energy, system performance, reliability, and various environmental issues have forced the high-performance computing sector to reconsider some of its old practices with the aim of creating more sustainable systems. In this paper, we attempt the simultaneous management of system performance, reliability, and energy consumption. To achieve this goal, we first built a reliability and energy aware task scheduling architecture, which mainly includes the heterogeneous system, the parallel application DAG model, and the energy consumption model. Then, we proposed a relationship between execution reliability and the processor's voltage/frequency and derived approximate parameter values by the least squares curve fitting method. Thirdly, we established a parallel application execution reliability model and formulated the reliability and energy aware scheduling problem as a linear program. Finally, we proposed the heuristic Reliability-Energy Aware Scheduling (REAS) algorithm, which adopts the novel scheduling objective RE; our experimental results show that REAS achieves a good tradeoff among schedule length, energy consumption, and application reliability.

Figure 1: (a) The reliability and energy aware task scheduling architecture. (b) A parallel application task graph.

Figure 2: The experimental results of the REAS algorithm with various weights θ. (a) Schedule length. (b) Energy consumption. (c) Application reliability.

Figure 4: The experimental results of 100 tasks for 4 Intel Xeon and 4 AMD Athlon. (a) Schedule length. (b) Energy consumption. (c) Application reliability.

Figure 5: The experimental results of the real-world DSP problem. (a) Schedule length. (b) Energy consumption. (c) Application reliability.

Table 1: The parameters of the heterogeneous processors.

Table 4: The b-level value of each task.