A Work-Demand Analysis Compatible with Preemption-Aware Scheduling for Power-Aware Real-Time Tasks

Due to the importance of slack-time utilization for power-aware scheduling algorithms, we propose a work-demand analysis method called the parareclamation algorithm (PRA) to increase the slack-time utilization of existing real-time DVS algorithms. PRA is an online scheduling method for power-aware real-time tasks under the rate-monotonic (RM) policy. It can be implemented fully compatibly with preemption-aware or transition-aware scheduling algorithms without increasing their computational complexities. The key technique of the heuristic method doubles the analytical interval and turns the deferrable workload into potential slack time. Theoretical proofs show that PRA guarantees the task deadlines in a feasible RM schedule and takes linear time and space complexity. Experimental results indicate that the proposed method, combined seamlessly with preemption-aware methods, reduces energy consumption by 14% on average over the original algorithms.


Introduction
Power management is increasingly becoming a design factor in portable and hand-held computing/communication systems. Energy minimization is critically important for devices such as laptop computers, smartphones, PDAs, wireless sensor networks (WSNs), and other mobile or embedded computing systems, simply because it leads to extended battery lifetime. The power consumption problem has been addressed in the last decade with a multidimensional effort: the introduction of engineering components and devices that consume less power, and low-power techniques involving VLSI/IC design, computer architecture, algorithms, and compiler development.
Recently, dynamic power management (DPM) and dynamic voltage scaling (DVS) have been employed as available techniques to reduce the energy consumption of CMOS microprocessor systems. DPM changes the power state of cores on a chip to lower the energy consumption subject to performance constraints. DVS involves dynamically adjusting the voltage and frequency (and hence the CPU speed). By reducing the frequency at which a component operates, a specific operation consumes less energy but may take longer to complete. Although reducing the frequency alone lowers the average power drawn by a processor over that period, it may not reduce the overall energy consumption, because the energy depends linearly on the increased execution time and quadratically on the supply voltage. In the context of dynamically voltage-scaled processors, DVS in real-time systems is the problem of assigning appropriate clock speeds to a set of periodic tasks and adjusting the voltage accordingly such that no task misses its predefined deadline while the total energy saving in the system is maximized.
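The voltage/frequency trade-off above can be made concrete with a small sketch. This is the textbook CMOS dynamic-power model, not a formula taken from this paper: dynamic power is roughly P = C_eff * V^2 * f, so executing a fixed number of cycles costs E = C_eff * V^2 * N regardless of frequency, and only a voltage reduction (which forces a frequency reduction) cuts energy.

```python
# Illustrative CMOS dynamic-energy model (an assumption, not the paper's model):
# P = C_eff * V^2 * f, so running N cycles at frequency f takes N/f seconds and
# E = P * (N/f) = C_eff * V^2 * N. Lowering f alone leaves E unchanged;
# lowering V reduces E quadratically.

def energy(cycles, voltage, c_eff=1.0):
    """Dynamic energy for a fixed number of cycles; independent of frequency."""
    return c_eff * voltage ** 2 * cycles

full = energy(10_000, voltage=1.0)
half = energy(10_000, voltage=0.5)    # DVS: half voltage, roughly half speed
print(half / full)                    # -> 0.25: quadratic saving
```

The ratio 0.25 illustrates why DVS algorithms aim to run at the lowest voltage that still meets all deadlines.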
Many studies have proposed different real-time scheduling schemes based on different system models [1-9], such as online and offline scheduling, handling discrete/continuous voltage levels, assuming average-case execution time (ACET), best-case execution time (BCET), or worst-case execution time (WCET) for each task, allowing intratask/intertask voltage transitions, and assuming fixed/dynamic priority assignment. These approaches share a common objective and encounter the same difficulties. Because reducing the supply voltage decreases the clock speed of processors [10], most DVS algorithms for real-time systems reduce the supply voltage dynamically to the lowest possible level while satisfying the soft/hard timing constraints of each task. To satisfy the timing constraints of real-time tasks, a DVS technique must utilize the available slack time when adjusting voltage/speed levels. Consequently, the energy efficiency of a DVS algorithm depends markedly on the accuracy of the computed available slack time.
Work-demand analysis for embedded real-time scheduling has been investigated in previous studies [3, 5-7, 11]. Pillai and Shin [7] proposed a cycle-conserving rate-monotonic (ccRM) scheduling scheme that contains offline and online algorithms. The offline algorithm computes the WCET of each task and derives the maximum speed needed to meet all task deadlines. The online algorithm recomputes the utilization by comparing the actual time for completed tasks against the WCET schedule. In other words, when a task completes early, the actual processor cycles used are compared with a precomputed worst-case execution time schedule. This WCET schedule is also called the canonical schedule [1], whose length can be the least common multiple of the task periods. ccRM is a conservative method, as it only considers possible slack time before the next task arrival (NTA) of the current job. Gruian proposed a DVS method for offline task stretching and online slack distribution [3]. The offline part of this method consists of two separate techniques. The first focuses on intratask stochastic voltage scheduling that employs a task-execution-length probability function. The second computes stretching factors by using a response-time analysis. It is similar to Pillai and Shin's offline technique, but instead of adopting one stretching factor for all tasks before the NTA, Gruian assigns a different stretching factor to each individual task within the longest task period.
Kim et al. [6] proposed a greedy online algorithm called low-power work-demand analysis (lpWDA) that derives slack from low-priority tasks, as opposed to the methods in [3, 7] that gain slack time from high-priority tasks. This algorithm also balances the gap in voltage levels between high-priority and low-priority tasks. Its analysis interval, limited by the longest of the task periods, is longer than the NTA. Thus, lpWDA gains more energy saving than the previous rate-monotonic (RM) DVS schemes applying the NTA. Many slack-time analysis methods consider additional assumptions [4, 11, 12]. Kim et al. proposed a preemption-aware DVS algorithm based on lpWDA, composed of accelerated-completion (lpWDA-AC) and delayed-preemption (lpWDA-DP) techniques, to decrease the number of preemptions in DVS schedules [11]. lpWDA-AC attempts to avoid preemption by adjusting the voltage/clock speed so that it is higher than the lowest possible value computed by lpWDA. lpWDA-DP postpones preemption points by delaying an activated high-priority task as long as possible while guaranteeing a feasible task schedule. Both techniques reduce energy consumption more than the initial ccRM and lpWDA techniques under the assumption of context-switching overhead.
Mochocki et al. [12] also proposed a transition-aware DVS algorithm for decreasing the number of voltage/speed adjustments, called the low-power limited-demand analysis with transition overhead (lpLDAT) scheme, which accounts for both time and energy transition overhead. The algorithm computes an efficient speed level based on the average-case workload; notably, this speed can be used as a limiter. If the limiter is higher than the speed predicted by lpWDA, lpLDAT knows that lpWDA is being too aggressive and applies the limiter to the present schedule. Under the assumption of transition overhead, this slack-time analysis technique also saves considerable energy compared with the previous methods. He and Jia [4] developed a fixed-priority scheduling with preemption threshold (FPPT) scheme that eliminates unnecessary context switches, thereby saving energy. FPPT assigns each task a pair consisting of a predefined priority and a corresponding preemption threshold. He et al. computed static slowdown factors by formulating the problem as a linear optimization problem. In addition, they considered the energy consumption of a task set under different preemption-threshold assignments.
Recently, experimental results obtained by Kim et al. [6] indicated that recent DVS algorithms for fixed-priority real-time tasks are less energy efficient than those for dynamic-priority tasks, leaving room for a better DVS method. The main reason for the energy inefficiency of RM DVS scheduling is that, in RM schedules, priority-based slack-stealing methods do not work as efficiently as they do in earliest-deadline-first (EDF) scheduling [6]. In EDF schedules, high-priority tasks act as efficient slack distributors because their slack can be fully utilized by tasks starting before the NTA. Therefore, the energy saving achieved by EDF scheduling algorithms, such as ccEDF [7], DRA, and AGR [1], is close to the theoretical lower bound [13].

Motivations
So far, a large number of studies have addressed DVS-based RM scheduling for energy saving [1-4, 6-8, 11-14]; most existing studies aim at computing and predicting the length and occurrence of slack time. The reason is that the more precise the estimation of the slack time, the more energy efficiency we obtain. The methods for computing available slack time either construct a canonical schedule and compare it with the current schedule or propose best-effort algorithms based on empirical rules and heuristics. These methods, adopting different strategies and assumptions (such as task preemption or voltage-transition time) on similar models, gain considerable energy savings, but few of them can be combined without difficulty to further enhance their performance. Additionally, a modern processor with DVS or DPM features must be equipped with a dc-to-dc converter that varies the processor speed across levels and requires additional switching time and power [15]. It is harmful to power saving when many fragments of short slack time appear in a system. Many of these methods also propose notions of postponing and advancing task execution to increase the length of slack time. Their performance in accumulating continuous slack time is not impressive, owing to the short analysis intervals adopted in the schedules. Therefore, it is necessary to study a transplantable method that can cooperate with different existing methods without modification. This idea originates from the layered architecture used in designing computer software, hardware, and communications, in which system or network components are isolated in layers so that changes can be made in one layer without affecting the others. A method following this notion must also be able to compute and accumulate the slack time on its own. By applying the layered architecture, it can pass the slack time to lower-layer methods and reveal a synergy effect that enhances overall energy saving.
In this paper, we propose an online work-demand analysis method called the parareclamation algorithm (PRA) for RM scheduling, which computes the length of potential slack in an interval that is two times longer than the longest task period. PRA does not rely on simulation of stochastic data, which usually varies across applications, and it can be applied to many RM scheduling algorithms with various criteria. Moreover, the proposed algorithm has a time complexity of O(n), where n is the number of tasks. In other words, it does not increase the computational complexity of existing online RM scheduling algorithms. Experimental results indicate that existing RM DVS algorithms combined with the proposed method can reduce energy consumption by 5%-21% compared with the initial algorithms such as lpWDA and lpLDAT.
The remainder of this paper is organized as follows. Section 3 introduces the preliminaries of power-aware real-time scheduling. Section 4 introduces our technique and algorithm. Section 5 provides theorems to prove the schedulability of PRA as well as lpWDA. We present the performance evaluation in Section 6. Section 7 gives conclusions and directions for future work.

Preliminaries
This paper focuses on how to obtain additional slack for existing RM DVS scheduling methods. Many slack-time analysis techniques with different purposes (e.g., transition-aware and preemption-aware schemes) can utilize PRA easily; throughout this paper, these techniques are called the host algorithms of PRA. This section also outlines the ideas underlying the lpWDA algorithm. Other techniques, such as the lpLDA, lpWDA-AC, lpWDA-DP [11], and lpLDAT [12] techniques, are abridged.

System Model. This paper considers preemptive hard real-time systems in which periodic real-time tasks are scheduled under an RM scheduling policy. The DVS processor in the model operates at a finite set of supply voltage levels V = {v_1, ..., v_max}, each with an associated speed. Processor speed is normalized so that s_max = 1 corresponds to v_max, yielding a set S = {s_1, ..., 1} of speed levels. A set of n periodic tasks is denoted by T = {τ_1, τ_2, ..., τ_n}, where the tasks are assumed mutually independent. Each task τ_i is described by its worst-case execution cycles C_i and average-case execution cycles A_i (C_i ≥ A_i). Throughout this paper, the execution cycles of each task are called its work for short. Additionally, each task τ_i has a shorter period length p_i (i.e., a higher priority) than that of τ_j when i < j, and p_n is the longest of the task periods. The relative deadline d_i of τ_i is assumed equal to its period length p_i. Each task is invoked periodically by a job, and the jth job of task τ_i is J_{i,j}. The first job of each task is assumed to be activated at time t = 0. Each job is described by a release time r_{i,j}, a deadline d_{i,j}, and the number of cycles it has executed. The utilization U of a task set T is Σ_{τ_i ∈ T} (C_i / p_i). During run time, we refer to the earliest uncompleted job of each task as the current job for that task, indexed by cur. The deadline of the current job of task τ_i is d_i^cur, and c_i^cur denotes the number of cycles that the current job of τ_i has executed.
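The task model above can be encoded directly. The following sketch is a hypothetical encoding (the names `Task`, `utilization`, and `hyperperiod` are ours, not the paper's): WCET cycles C_i, ACET cycles A_i with C_i ≥ A_i, and period p_i equal to the relative deadline, with tasks RM-sorted so a shorter period means a higher priority.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd

# Hypothetical encoding of the paper's task model (illustrative names).

@dataclass
class Task:
    wcet: int      # worst-case execution cycles C_i
    acet: int      # average-case execution cycles A_i (A_i <= C_i)
    period: int    # period length p_i (= relative deadline d_i)

def utilization(tasks):
    """U = sum of C_i / p_i over the task set."""
    return sum(t.wcet / t.period for t in tasks)

def hyperperiod(tasks):
    """Least common multiple of the task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), (t.period for t in tasks))

# RM priority order: sort by period, shortest first.
tasks = sorted([Task(1, 1, 3), Task(1, 1, 4), Task(2, 1, 6)],
               key=lambda t: t.period)
print(utilization(tasks), hyperperiod(tasks))   # -> 0.9166..., 12
```

The sample periods and cycle counts are illustrative values, chosen only so that U < 1 and the hyperperiod stays small.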
Without loss of generality, when τ_i is the first task scheduled after time r_{i,j−1}, the bottleneck (shortened to bn) is the next release time of τ_i (i.e., r_{i,j}). In the work-demand analysis method, the available slack in the interval [bn, r_{i,j+1}) is estimated.

Low-Power Work-Demand Analysis (lpWDA)
This section briefly introduces an online DVS scheme called lpWDA [6]. The notations S_Exchange, ℓ_i^right, and S_asyn belong to the PRA algorithm and are presented in Section 4. In line 2 of Algorithm 1, ε is an infinitesimal, and readyQ contains the currently activated tasks; its subset Γ_ACT(t) contains the active tasks. In lpWDA, the tasks in T are scheduled according to the RM priority policy. When a task τ_i is activated (released), its job is moved to readyQ, and the remaining WCET of this job is set to C_i, that is, C_i^rem(t) = C_i. When τ_i is executed at time t, W_i(t) is the amount of work required to be processed in [t, d_i).
In Algorithms 1, 2, and 3 and Procedure 1, lpWDA performs the following steps. First, the system is initialized by setting the initial upcoming deadline d_i and remaining worst-case execution C_i^rem of each task. When τ_i is active at time t, the analysis scope of each task τ_i is defined as in [6], where ε is the infinitesimal. The jobs that are active during [t, max_i{d_i(t)}] are examined for slack estimation. H_i(t) denotes the estimate of higher-priority work that must be executed before d_i (lines 1-2 and lines 13-14), where d_i is the earliest upcoming deadline with respect to τ_i. Notably, the function L_i(t), which computes the amount of lower-priority work, is evaluated recursively until it reaches the task τ_n with the longest period and the lowest priority with respect to τ_i. As defined in Section 3.1, the length of the interval [0, bn) is p_n. lpWDA then computes the length of the slack time stolen from lower-priority tasks in the interval [d_i, bn) and applies the slack to the current job. Therefore, Algorithms 2 and 3 play crucial roles in the slack-time analysis and dominate the run-time complexity of the lpWDA algorithm. Formally, to describe the slack analysis method of lpWDA, the following notations are defined: W_i(t) is the amount of work required to be processed in the interval [t, d_i), and S_i(t), the available slack for τ_i scheduled at time t, is computed as

S_i(t) = (d_i − t − W_i(t))^+.   (3)

In (3), W_i(t) consists of three types of work: (1) the task's own remaining work C_i^rem(t), (2) the higher-priority work H_i(t), and (3) the lower-priority work L_i(t). The work required by higher-priority tasks is derived as

H_i(t) = H_i^past(t) + H_i^future(t),   (4)

where H_i^past(t) denotes the work required by uncompleted higher-priority jobs released before t, and H_i^future(t) denotes the higher-priority work released during [t, d_i]; their closed forms in (5) follow [6], where ε is the infinitesimal. According to the above statements, the amount of work required by the scheduled task τ_i can be formulated as

W_i(t) = C_i^rem(t) + H_i(t) + L_i(t),   (8)

where the notation (x)^+ stands for max(x, 0). Equations (6), (7), and (8) are applied iteratively until τ_i is the lowest-priority task in T (i.e., L_i(t) = 0). Conceptually, lpWDA uses this linear-time heuristic to estimate the available slack in an interval up to the upcoming deadline of lower-priority tasks.
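The shape of this slack estimate can be sketched in a few lines. The following is a much-simplified illustration, not the published lpWDA: it assumes a synchronous release at t = 0, bounds only the higher-priority work H released in the window, and omits the recursive lower-priority term L and the past-work bookkeeping entirely.

```python
from math import ceil

# Much-simplified sketch of the *shape* of the work-demand/slack estimate
# (NOT the published lpWDA): demand in [t, d) is the running task's own
# remaining cycles plus a crude bound on higher-priority work released in the
# window; slack is whatever part of the window the demand does not fill,
# clamped at zero (the paper's (x)^+ notation).

def work_demand(tasks, i, c_rem, t, d):
    """tasks: list of (wcet, period), RM-sorted; i: index of the running task."""
    hp = sum(ceil((d - t) / p) * c for c, p in tasks[:i])   # H_i(t), crude bound
    return c_rem + hp                                       # L_i(t) omitted here

def slack(tasks, i, c_rem, t, d):
    return max((d - t) - work_demand(tasks, i, c_rem, t, d), 0)

tasks = [(1, 3), (1, 4), (2, 6)]       # illustrative task set
print(slack(tasks, 2, 2, 0, 6))        # task 3 at t = 0, window up to d = 6 -> 0
```

With these illustrative numbers the window [0, 6) is completely filled by worst-case work, so the estimate returns zero slack, matching the situation described in the motivational example.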

Motivational Example.
The proposed method provides lpWDA-based algorithms (e.g., lpWDA, lpLDAT, lpWDA-DP, and lpWDA-AC) with a subroutine that improves their work-demand analysis. The main advantage is that PRA is independent of each function-specific slack analysis method. For instance, the main purpose of the lpWDA-AC and lpWDA-DP techniques is to decrease context-switch overhead, while that of lpLDAT is to reduce transition time and energy overhead. PRA can work together with these lpWDA-based algorithms to enhance their slack-computation capability.
Example 1. Consider the periodic task set T in Table 1, which presents the period length, WCET, and ACET of each task. Figure 1(a) presents the execution schedule under the worst-case workload in the first hyperperiod. Figure 1(b) shows the speed schedule produced by the lpWDA algorithm for task set T, assuming that the actual work of each task equals its ACET. Before assigning J_{1,1} at time t = 0, lpWDA computes the available slack time in an interval up to d_{3,1} = 6 by calling Algorithm 3 recursively. However, the interval [0, 6) has no slack time under the WCET schedule. If the length of the analysis interval is extended to 2 × p_n, one unit of slack time is derived from 2 × p_n − Σ_{i=1}^{n} ⌊2p_n/p_i⌋ × C_i. The slack in [11, 12) can be moved backward to the current scheduling point by deferring the execution of earlier work. For instance, in Figure 1(a), the slack in interval [11, 12) can be exchanged with the work in interval [7, 8); then the slack in interval [7, 8) can be exchanged with the work in interval [4, 5), and once again with the work in interval [2, 3). Finally, the slack in interval [2, 3) can be exchanged with the work in interval [1, 2). Therefore, J_{1,1} is scheduled with speed s_{1,1} = C_1/(C_1 + 1) (Figure 1(c)). Additional slack can thus be reclaimed, without any deadline miss, from an interval two times longer than the longest task period. Notably, this idea neither actually moves all the jobs of a schedule nor physically exchanges the slack with work in order to use this slack time. However, this primitive idea does not work in some situations. For example, in Figure 1(d), when p_2 is increased to 6, the slack in the interval [11, 12) cannot be transferred before t = 6. In fact, jobs J_{1,3}, J_{2,2}, and J_{3,2} are released simultaneously at time 6. The slack in interval [11, 12) cannot follow this idea, because a deadline would likely be missed by one of those three jobs. Our goal is to devise an efficient work-demand analysis method that obtains additional slack while satisfying the tasks' deadlines.
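The doubled-interval formula in the example is easy to evaluate. Table 1 is not reproduced in this text, so the task set below is a hypothetical one chosen to be consistent with the example's numbers (p_3 = 6, exactly one unit of slack in [0, 12)):

```python
# Worst-case slack in the doubled interval [0, 2 * p_n), using the example's
# formula: 2 * p_n - sum(floor(2 * p_n / p_i) * C_i). The task set is
# hypothetical (Table 1 is not available here) but matches the example's
# outcome of one slack unit.

def doubled_interval_slack(tasks):
    """tasks: list of (wcet, period); slack in [0, 2 * max period)."""
    horizon = 2 * max(p for _, p in tasks)
    return horizon - sum((horizon // p) * c for c, p in tasks)

tasks = [(1, 3), (1, 4), (2, 6)]
print(doubled_interval_slack(tasks))   # -> 12 - (4*1 + 3*1 + 2*2) = 1
```

Only integer arithmetic is involved, so the computation is exact; the single unit of slack found here is the one the example moves backward to t = 0.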

Work-Demand Computation
When the analysis cannot make a target slack available for the task to the right of bn, a job can still postpone its work to move the slack forward, approaching bn. For example, in Figure 1(a), when the period length of τ_1 is increased from 3 to 4, the slack in interval [11, 12) cannot be reclaimed by postponing the work of J_{1,3} or J_{2,3}, because it is blocked at time 8. In that case, J_{3,2} can defer its work, and the slack time in [6, 7) becomes available. On the contrary, if one extends the additional analysis interval so that it is longer than, or even several times, p_n, a job cannot move slack located after bn + p_n toward bn and may be blocked within this interval. An analysis interval whose length equals p_n therefore has the following advantage: after deriving the amount of slack time that will be available to the tasks near bn, the jobs whose periods span the bn can be deferred to reclaim additional slack before bn. That is, the current job can utilize the additional slack by performing an lpWDA-based method. Notably, in the actual scheduling process, PRA does not exchange any work with slack. Instead, it only passes the length of the additional slack time for the current job to lpWDA and does not affect the schedulability of subsequent jobs.
To present the proposed method, we define the following notations. Let T_k denote a subset of k tasks, k < n, where τ_n is the task with the longest period in T. A set of tasks is called synchronous at time t if their jobs are released simultaneously at t. In the extended analysis interval [bn, r_{i,j+1}), the number of synchronization points of the tasks in T_k, denoted Syn(T_k, t), can be derived from LCM(T_k), the least common multiple of the task periods in T_k. As shown in Figure 2, the first synchronization point of the tasks in T_k within the interval [bn, r_{i,j+1}) is the earliest common release time no earlier than bn. When that synchronization point falls inside (r_{i,j}, r_{i,j+1}), slack time is likely to be blocked or shrunk there. In Figure 3, when all tasks except τ_i are synchronized at such a point, a slack may not be movable from the right to the left side of that point; in this case, the slack can still be moved to the current time by postponing the execution of the work of τ_i. When n − 1 tasks are synchronized in the interval (bn, r_{i,j+1}), we have Syn(T_{n−1}, t) ≥ 0, and their earliest synchronization point is derived accordingly. Considering the worst-case execution time up to that point, the available slack for J_{i,j+1} in this interval is bounded from below. Similarly, when {τ_i, τ_j} ⊂ T_{n−2} and τ_i ≠ τ_j, there are n − 2 tasks that synchronize at the corresponding point, and the available slack time for J_{i,j+1} in the interval up to it is again bounded from below. Therefore, if k of the tasks are synchronized in the interval [bn, r_{i,j+1}), the minimal available slack time for τ_i before their first synchronization point is denoted S_Exchange, the estimated slack in that interval. For example, when τ_i does not synchronize with the other tasks in T_k (Figure 4), one can compute the value of Syn(T_k − τ_i, t) for each T_k, where k = n − 2 and k = n − 1. Suppose Syn(T_k − τ_i, t) > 0; the earliest synchronization point of the tasks in T_k − τ_i is derived using (11).
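Since every first job is released at t = 0 (Section 3.1), the tasks in a subset are simultaneously released exactly at multiples of the LCM of their periods, so synchronization points can be enumerated directly. The helper names below are illustrative, not the paper's:

```python
from functools import reduce
from math import gcd

# Sketch of the synchronization-point computation, assuming all first jobs are
# released at t = 0 (as in the system model). Function names are illustrative.

def lcm(values):
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def first_sync_point(periods, bn):
    """Smallest simultaneous release time of `periods` that is >= bn."""
    step = lcm(periods)
    return ((bn + step - 1) // step) * step

def sync_points(periods, bn, end):
    """All simultaneous release times of the subset inside [bn, end)."""
    step = lcm(periods)
    return list(range(first_sync_point(periods, bn), end, step))

print(first_sync_point([3, 4], 5))   # -> 12 (LCM = 12)
print(sync_points([2, 3], 5, 19))    # -> [6, 12, 18] (LCM = 6)
```

The count of such points inside the window is then just the length of the returned list, which corresponds to the role of Syn(T_k, t) in the text.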
After deriving the available slack time within the interval from bn to a synchronization point of at most n − 1 tasks, we compute the length of the slack time that is available to the task in that interval. Let T_bn denote the set of tasks whose periods span the bn. For τ_i ∈ T_bn, the lengths of the left and right parts of p_i split by bn are denoted ℓ_i^left and ℓ_i^right, and accu_bn denotes the total amount of work in T_bn. (See Figure 2 for an example of task synchronization points.) As shown in Figure 3, the lengths of ℓ_max^left, ℓ_max^right, and accu_bn limit the maximum length of slack that can be moved into the interval [r_{i,j−1}, bn). Consequently, the restriction on the length of the slack time is as follows:

According to the work demand in a WCET schedule, the slack time in the interval [r_{i,j−1}, r_{i,j+1}) is computed as follows. PRA computes the length of the additional slack time within the interval [bn, r_{i,j+1}) by (17). It then computes the length of this slack time that can be made available to the jobs in the interval [r_{i,j−1}, bn) according to (17) and (18). Finally, it changes the priority of a job that spans the bn when this job is moved to readyQ according to RM scheduling. In line 1 of Procedure PRA, ε denotes an infinitesimal value. Procedure PRA then hands control to the lpWDA algorithm and passes the additional slack S_Exchange to CalcLowerPriorityWork() in Algorithm 3. Notably, the tasks using PRA still execute under the RM priority policy, except for one of the jobs whose period spans the bn. At time t = 0, when jobs J_{1,1}, J_{2,1}, and J_{3,1} enter readyQ, J_{1,1} has the highest priority and utilizes the additional slack S_Exchange^min estimated by PRA. Therefore, job J_{1,1} obtains one unit of slack time and changes its voltage level from 1 to 0.5. On the contrary, if the primitive lpWDA schedules J_{1,1} at time t = 0, J_{1,1} cannot obtain any slack. While lpWDA executes iteratively, the value of S_Exchange does not change until J_{1,1} is completed. Figure 1(c) presents the scheduling result obtained using Procedure PRA. After J_{1,1} completes and the S_Exchange^min units of slack have been used up, the primitive lpWDA continues to perform voltage scaling on the subsequent jobs of τ_1. In the case of J_{2,1}, it begins after J_{1,1} (t = 1) and obtains one unit of slack time from the primitive lpWDA; therefore, its WCET under voltage v = 0.5 is stretched accordingly.

Correctness Proof
In this section, we prove the correctness of the schedules produced by lpWDA and PRA based on worst-case response-time (WCRT) analysis, assuming that the given task sets are feasible under preemptive RM scheduling. For fixed-priority preemptive scheduling, a critical instant for a task τ_i occurs when the release time of τ_i coincides with the releases of all higher-priority tasks. Let R_i denote the WCRT of τ_i; without loss of generality, the higher-priority tasks have release times simultaneous with the job of τ_i.
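The feasibility notion these proofs rely on is the classical fixed-priority response-time recurrence R = C_i + Σ_{j<i} ⌈R/p_j⌉ C_j. The sketch below is the standard iteration (it is background, not part of lpWDA or PRA):

```python
from math import ceil

# Standard worst-case response-time iteration for fixed-priority preemptive
# scheduling; illustrates the feasibility test assumed by the proofs.

def wcrt(tasks, i):
    """tasks: list of (wcet, period), RM-sorted; WCRT of task i, or None."""
    c, p = tasks[i]
    r = c
    while True:
        nxt = c + sum(ceil(r / pj) * cj for cj, pj in tasks[:i])
        if nxt == r:
            return r if r <= p else None   # relative deadline = period
        if nxt > p:
            return None                    # iteration exceeds the deadline
        r = nxt

tasks = [(1, 3), (1, 4), (2, 6)]           # illustrative task set
print([wcrt(tasks, i) for i in range(3)])  # -> [1, 2, 6]
```

A task set is feasible exactly when every task's fixed point exists and is at most its period; the proofs below bound how much extra response time the borrowed slack can add.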
Lemma 3. When a task set T contains only one task τ_i, the available slack produced by lpWDA for τ_i is S_i(t) = d_i − t − C_i^rem(t).

Proof. By (8), with no higher- or lower-priority tasks, the amount of work required to be processed in the interval [t, d_i) is W_i(t) = C_i^rem(t). According to (3), the available slack is derived as S_i(t) = (d_i − t − W_i(t))^+ = d_i − t − C_i^rem(t), which completes the proof.
In lpWDA, the slack derived from lower-priority tasks is given to the highest-priority job in readyQ. In the WCET case, after applying the slack to the highest-priority job, the execution of the lower-priority jobs is postponed, and their WCRTs increase by a length equal to that slack.
Lemma 4. When a task set T contains n tasks, n ≥ 2, the amount of work required to be processed in [t, d_i] (t ≤ d_i) for the highest-priority job τ_i is given by (22).

Proof. Assuming that T contains n tasks, n ≥ 2, we prove this lemma from the lowest-priority task (i.e., τ_n) to the highest-priority task by mathematical induction. The case of τ_n is proved separately, because the third term of W_n(t) differs from those with i < n in (22). From (8), when τ_i is the lowest-priority task (i.e., i = n), the workload of the tasks whose priorities are lower than τ_n is zero (i.e., L_n(t) = 0). Therefore, the amount of work required to be processed at time t follows from (23). Because C_n^rem = C_n before τ_n completes, we add (C_n − C_n^rem) to (23) and derive the base case, and this completes the proof for τ_n.

Inductive Step. When task τ_2 is considered, the value of W_2(t) can be obtained from (22). When we consider the task with the highest priority (i.e., τ_1), the amount of work of its lower-priority tasks follows the recursion; substituting W_2(t) into (30), we obtain (31). When all tasks release at time t, we have C_2^rem(t) = C_2; therefore, we obtain (32). By (8), substituting (32) into W_1(t) yields (33). The proof of Case 2 is similar to that of Case 1, and this completes the proof.
Lemma 5. The length of the slack provided by lpWDA for the highest-priority task in readyQ is bounded from above.

Proof. Assuming that τ_i has the highest priority in readyQ, the bound follows directly from (3), (19), and (22).
The following theorem proves the schedulability of lpWDA by worst-case response-time analysis. We consider that each active job τ_i in readyQ has a release simultaneous with all higher-priority tasks.

Theorem 6. Given a set T of tasks that is feasible in an RM schedule, the maximum response time of each task τ_i under lpWDA is less than or equal to its deadline.

Proof. Assume job τ_i has the highest priority in readyQ. By (8), we obtain W_i(0) at time t = 0. By the slack bound (3) used in lpWDA, the deadline of τ_i is guaranteed when t = 0. Therefore, we derive the corresponding bound.

Mathematical Problems in Engineering
When τ_i uses up its slack S_i, all subsequent jobs of lower-priority tasks must postpone their response times by at most S_i time units compared with those in their WCET RM schedule.
Assuming τ_k has lower priority than τ_i, we prove that the length of the new WCRT of τ_k, including the slack S_i, is less than d_k. In a feasible RM schedule, the WCRT of τ_k is given by the standard recurrence. The new WCRT of τ_k accounting for the length of S_i is denoted R_k^new. Based on (22) in Lemma 4, setting t = 0, we derive the corresponding bound. From the definitions of the function H_i(t) in (4) and (5), where ε is the infinitesimal, we derive the final inequality from (40) and complete the proof.
From (41) in Theorem 6, the difference between R_k^new and R_k is derived in the following corollary.

Corollary 7. For the tasks τ_i ∈ T_bn with i < k and d_k = bn, where d_k is not a multiple of these p_i, the difference between R_k^new and R_k is formulated as in (43). Notably, R_k^new represents the length of the WCRT under lpWDA, and therefore the slack between R_k^new and d_k can be utilized by PRA.
Consider the example shown in Figure 4(a). The value of W_{3,1}(0) is set to the sum of C_3 and H_{3,1}(0), which is shown in the gray box of Figure 4(b). There are 6 time units of work that must be processed before d_{3,1} = 8. To guarantee a feasible schedule for the higher-priority jobs whose periods span d_{3,1} (i.e., J_{1,2} and J_{2,2}), lpWDA estimates how much time should be reserved for those jobs. In this case, C_{1,2} + C_{2,2} = 2 is derived from (43). We investigate the difference between R_k and R_k^new to keep the deadlines of the PRA jobs.

Lemma 8. Algorithm lpWDA selects an effective feasible speed for the active job in the analysis scope generated by the upcoming deadlines.

Proof. Lemma 8 follows directly from Theorem 6 and Corollary 7.

Lemma 9. Let k > i, d_k = bn, and τ_i ∈ T_bn. When job τ_i is feasible under PRA, τ_k also keeps its deadline.
Proof. According to Theorem 6, we have the bound (44). The WCRT of job τ_k under PRA changes to (45). By (45), the deadline of job τ_k is still guaranteed by PRA. Additionally, the priority of job τ_i is temporarily lowered below that of τ_k, and τ_i executes immediately after τ_k. Therefore, the WCRT of job τ_i is bounded accordingly, and we complete the proof.
Lemma 9 proves that the additional slack produced by PRA is shorter than the right part of p_i split by bn. Therefore, the deadline of job τ_i is kept after changing the priority of τ_i to the lowest priority in readyQ. After τ_i completes, the schedule continues under lpWDA. The schedulability proof in the interval [bn, r_{i,j+1}) is similar to that of Theorem 6, except for the additional work S_Exchange in the WCET schedule.
Lemma 10. An lpWDA schedule remains feasible when S_Exchange units of work are postponed into the interval [bn, r_{i,j+1}) by PRA.
Proof. In the interval [bn, r_{i,j+1}), the critical instant arises when a job of τ_i released before bn remains incomplete at bn while all other tasks are released at bn. In accordance with (47), the maximum uncompleted work of job τ_i at bn is S_Exchange. Let R_k' denote the WCRT of job J_{k,j} in the interval [bn, r_{i,j+1}), where n > k > i. According to (40) in Theorem 6, we obtain the corresponding recurrence. According to line 13 in algorithm PRA, the length of the available slack satisfies S_Exchange ≤ ℓ_i^right. From (48), the inequality still holds; therefore, we derive the bound, which completes the proof.
In fact, Procedure PRA focuses on providing the potential available slack time to the lpWDA-based algorithms.

Theorem 11. Procedure PRA provides additional slack that guarantees all task deadlines in lpWDA.
Proof. Suppose job τ_i is being executed in the interval [r_{i,j−1}, bn). By executing line 9 in Algorithm 1 and passing the additional slack S_Exchange to the function CalcSlackTime() in the lpWDA algorithm, Algorithm 2 computes the length of the slack that τ_i can use by calling the function CalcLowerPriorityWork() recursively. When job τ_i consumes the entire S_Exchange and all of its subsequent jobs execute for their WCET, the job spanning the bn is likely to miss its deadline. However, line 14 in Procedure PRA solves this problem by changing the priority of the spanning job to the lowest priority in readyQ. Because its ℓ^right is no shorter than S_Exchange, according to Lemma 9, the deadlines of both jobs are guaranteed in the interval [r_{i,j−1}, bn).
In the interval [bn, r_{i,j+1}), Procedure PRA computes the length of the additional slack in [bn, r_{i,j+1}). Given that the spanning job has been assigned the lowest priority in readyQ, all deadlines before its completion time are met. Based on its completion time, we distinguish two cases.
Case 1. The job completes at a time no later than bn. In this case, the slack computed by Procedure PRA is not used by the jobs started before that completion time. Because the ℓ_Exchange produced by Procedure PRA does not shift to a time before bn, it has no influence on the job execution cycles after bn. Therefore, the initial lpWDA algorithm guarantees the deadlines of the jobs started after bn.
Case 2. The job completes at a time later than bn. When a scheduling point occurs at this completion time, the analysis scope defined in (2) is extended up to the next release by calling Procedure CalcLowerPriorityWork() in Algorithm 3 recursively. In a feasible schedule, when the slack ℓ_Exchange has been exchanged on the left side of bn and the jobs completing before this point have executed for their WCET, the length of the delayed work is not longer than ℓ_Exchange. Therefore, the length of the additional work moved into the interval starting at bn is at most ℓ_Exchange. By Lemma 10, the additional work does not affect the feasibility of the lpWDA schedule, which completes the proof.
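The WCRT used in the proofs above can be obtained by the standard fixed-point response-time iteration for fixed-priority scheduling. The following minimal sketch is generic RM analysis, not the paper's PRA-specific derivation:

```python
# Worst-case response time (WCRT) under fixed-priority RM scheduling,
# computed by the standard response-time iteration
#   R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j.
import math

def wcrt(tasks, i):
    """tasks: list of (C, T) pairs sorted by RM priority (shortest
    period first). Returns the WCRT of task i, or None if it exceeds
    the task's period (deadline = period under RM)."""
    C, T = tasks[i]
    R = C
    while True:
        interference = sum(math.ceil(R / Tj) * Cj for Cj, Tj in tasks[:i])
        R_next = C + interference
        if R_next > T:          # deadline miss
            return None
        if R_next == R:         # fixed point reached
            return R
        R = R_next

# Example: three tasks (C, T) in RM priority order.
tasks = [(1, 4), (2, 6), (3, 12)]
print([wcrt(tasks, i) for i in range(3)])  # [1, 3, 10]
```

The iteration converges whenever the task set is feasible, since R is nondecreasing and bounded by the period.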

Performance Evaluation
In this section, we evaluate the time and energy efficiency of the PRA scheme on randomly generated task sets and compare it with those of the ccRM, lpWDA, and lpLDAT schemes. Both ccRM and lpWDA are modified to account for the transition time overhead. In the simulations, lpWDA and lpLDAT serve as the host algorithms of PRA and cooperate with it, so that its performance can be compared with those of the initial ccRM, lpWDA, and lpLDAT methods.

Complexity and Execution Time of Algorithms
Theorem 12. The PRA algorithm has a computational complexity of O(n) per scheduling point, where n denotes the number of tasks in the system.
Proof. Lines 4 and 5 are completed in constant time for each iterative step according to (19) and (23). In line 8, the value of ℓ_Exchange^asyn is derived from (24)-(27), where the value of ℓ_Exchange(ℓ) in (27) needs O(n) time to compute the length of the slack time. In line 10, the computation of the bn and the corresponding job needs O(n) time. Therefore, the overall time complexity is O(n).
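As a rough illustration of why a single pass over the task set suffices, the following sketch (our own simplification, not the paper's CalcSlackTime procedure) computes a conservative slack bound in O(n):

```python
def slack_until(d, t, remaining):
    """Conservative slack available before deadline d at current time t:
    the interval length minus all remaining work that may execute in it.
    remaining: remaining execution times, one entry per task.
    A single pass over the tasks -> O(n) per scheduling point."""
    demand = sum(remaining)              # O(n) work-demand bound
    return max(0, (d - t) - demand)

print(slack_until(d=20, t=5, remaining=[2, 3, 4]))  # 15 - 9 = 6
```

A real work-demand analysis refines this bound by accounting for job releases and priorities inside the interval, but the per-scheduling-point cost remains linear in the number of tasks.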
In the simulation results, for any given pair of n and U and bc/wc ratio, 10000 task sets are generated randomly. Each reported result is the average over the 10000 task sets. In a task set, every task period p_i (as well as deadline d_i) is uniformly distributed in the range [1, 100] ms. The length of each schedule is at least ten times the hyperperiod of its tasks, except in the fourth experiment in Section 6.2. The execution time c_i of each task is assigned a real number in the range [1, min{p_i − 1, 90}] ms. In a task set, after assigning values to the execution times of all tasks, we give a utilization U to the task set and rescale the c_i of each task such that the summation of the task weights (i.e., c_i/p_i) equals the given U.
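The task-set generation procedure described above can be sketched as follows. This is an illustrative reconstruction; the function and parameter names are ours:

```python
import random

def gen_taskset(n, U, seed=0):
    """Generate n tasks as described in the text: periods uniform in
    [1, 100] ms, raw execution times in [1, min(p - 1, 90)] ms, then
    all execution times rescaled so that sum(c/p) equals the target
    utilization U. (Sketch; clamping of tiny periods is our choice.)"""
    rng = random.Random(seed)
    periods = [rng.uniform(1.0, 100.0) for _ in range(n)]
    execs = [rng.uniform(1.0, max(1.0, min(p - 1.0, 90.0))) for p in periods]
    raw_u = sum(c / p for c, p in zip(execs, periods))
    scale = U / raw_u                       # rescale to hit U exactly
    return [(c * scale, p) for c, p in zip(execs, periods)]

ts = gen_taskset(10, 0.8)
print(round(sum(c / p for c, p in ts), 6))  # 0.8
```

Rescaling preserves the relative weights of the tasks while pinning the total utilization to U.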
Due to the limited execution speed of embedded processors, an LCM function over the task periods is implemented by offline nonrecursive programs. This function is composed of lcm(a, b) and gcd(a, b), which compute the least common multiple (LCM) and the greatest common divisor (GCD), respectively. In these programs, each integer is represented by a 32-bit word. Each experiment (schedule) has a maximum of 20 tasks, and each task has an integer variable for storing the accumulated LCM of the period lengths. Additionally, two integers are needed to record the GCD and LCM of all task periods. Therefore, each schedule requires at most 100 bytes for storing data, including local variables.
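A minimal nonrecursive GCD/LCM implementation of the kind described, sketched in Python for clarity:

```python
def gcd(a, b):
    """Greatest common divisor, iterative (nonrecursive) Euclid."""
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    """Least common multiple via gcd; dividing first avoids overflow
    in fixed-width (e.g., 32-bit) integer arithmetic."""
    return a // gcd(a, b) * b

def hyperperiod(periods):
    """LCM of all task periods, accumulated iteratively per task."""
    h = 1
    for p in periods:
        h = lcm(h, p)
    return h

print(hyperperiod([4, 6, 10]))  # 60
```

In a 32-bit embedded implementation the accumulated LCM must be checked for overflow; Python integers are unbounded, so the sketch omits that check.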
In Figure 5, we examine the execution time required by each online algorithm, including the following:
ccRM: the ccRM algorithm from [7], modified to account for the transition overhead;
lpWDA: the lpWDA algorithm from [6], modified to account for the transition overhead;
lpLDAT: the algorithm from [12];
lpWDA-PRA: lpWDA as the host algorithm cooperating with Procedure PRA;
lpLDAT-PRA: lpLDAT as the host algorithm cooperating with Procedure PRA;
lpWDA-DP-PRA: lpWDA-DP as the host algorithm cooperating with Procedure PRA;
lpWDA-AC-PRA: lpWDA-AC as the host algorithm cooperating with Procedure PRA.
Figure 5 presents the maximum execution time of each algorithm on a processor versus the number of tasks in the system. Notably, the results produced by lpWDA-DP-PRA and lpWDA-AC-PRA are very close to those of lpWDA-PRA and lpLDAT-PRA; for clarity, they are omitted from the figure. All algorithms execute on a simulated processor based on the ARM8 core at its highest speed (100 MHz) and voltage level. The measurements were generated by inserting a system timer function and executing each algorithm individually. Obviously, ccRM has a significant advantage in execution time over the other online algorithms. Because the algorithms are invoked upon each release and completion, one must increase the execution time of each task by two times the maximum execution time of the algorithm to account for the scheduling overhead. The maximum execution times of these algorithms were measured as follows.
First, a set of experiments executed Procedure PRA with its host algorithms (lpWDA and lpLDAT) and the other initial algorithms. The system timer functions are used to record the duration of each algorithm, choose the longest execution time of each algorithm in each schedule, and accumulate the execution times separately for the different methods. At the end of the experiment, these accumulated execution times are divided by the number of generated schedules. PRA is an efficient online algorithm that increases the average execution time of its host algorithms by less than 12%. Before performing these experiments, 10000 task sets were generated randomly, including the number of tasks in each set, the task period lengths, and their worst-case execution requirements, in accordance with a uniform distribution. The early completion time of each job in simulations (1), (2), and (4) was randomly drawn from a Gaussian distribution over the range [BCET, WCET], where BCET/WCET = 0.1. In simulation (3), each experiment was performed by varying the BCET from 10% to 90% of the WCET.
For all experiments, we assume that 10 frequency levels are available in the range of 10-100 MHz, with corresponding voltage levels of 1-3.3 V. The energy consumed by memory accesses and cache misses is ignored, and all experimental results are normalized against the same processor running at maximum speed without any DVS technique (non-DVS for short). Table 3 presents the power specification of the ARM8 processor [15]. The overhead considered in the simulations is as follows.
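Under these assumptions, the normalization against the non-DVS baseline can be sketched as follows. The linear frequency-to-voltage mapping across the 10 levels is our own simplifying assumption, since the exact ARM8 level table (Table 3) is not reproduced here:

```python
# Normalized dynamic energy for a fixed number of CPU cycles under DVS.
# Assumed: 10 discrete levels, frequencies 10..100 MHz, supply voltage
# interpolated linearly from 1.0 V to 3.3 V across the levels, and
# dynamic energy per cycle proportional to V^2.
FREQS = [10 * k for k in range(1, 11)]                    # MHz
VOLTS = [1.0 + (3.3 - 1.0) * (k - 1) / 9 for k in range(1, 11)]

def normalized_energy(level):
    """Energy of executing a fixed cycle count at `level` (0..9),
    relative to the non-DVS baseline at the highest speed/voltage."""
    v, v_max = VOLTS[level], VOLTS[-1]
    return (v / v_max) ** 2

print(round(normalized_energy(0), 3))  # 0.092 at the lowest level
print(round(normalized_energy(9), 3))  # 1.0 at maximum speed
```

Because the cycle count is fixed, the frequency cancels out of the dynamic-energy ratio; only the voltage ratio remains, which is why lowering the speed (and hence the voltage) saves energy quadratically.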
(1) Algorithm Execution Time and Energy. The execution time overhead refers to the simulation results in Figure 5. The energy overhead is obtained under the assumption of the maximum speed.
(2) Voltage Transition Time and Energy. The assumption on the voltage scaling overhead is the same as that in [17]. For a voltage scaling from V1 to V2, the transition time is t_transition = (2C/I_max)·|V2 − V1|, where C and I_max denote the charge-storage capacitance and the maximum output current of the converter, respectively. The transition time is at most 70 μs for a maximum transition [15]. The energy consumed during each transition is E_transition = (1 − η)·C·|V2² − V1²|, where η denotes the efficiency of the DC-DC converter.
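A sketch of this transition-overhead model; the capacitance, current, and efficiency values used below are illustrative placeholders, not the values from [15, 17]:

```python
def transition_time(v1, v2, c=5e-6, i_max=1.0):
    """Voltage transition time of a DC-DC converter, using the model
    t = (2C / I_max) * |V2 - V1|. C (farads) and I_max (amperes) are
    illustrative guesses, not measured converter parameters."""
    return 2 * c / i_max * abs(v2 - v1)

def transition_energy(v1, v2, c=5e-6, eta=0.9):
    """Energy lost per transition, E = (1 - eta) * C * |V2^2 - V1^2|,
    where eta is the converter efficiency (here assumed 90%)."""
    return (1 - eta) * c * abs(v2 * v2 - v1 * v1)

# Full swing between the assumed extreme voltage levels:
print(round(transition_time(1.0, 3.3) * 1e6, 2))  # 23.0 microseconds
```

With these placeholder values a full 1.0 V to 3.3 V swing costs 23 μs, the same order of magnitude as the 70 μs maximum-transition figure quoted for the ARM8 setup.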
(3) Context-Switch Time and Energy. The context-switch time is assumed to be 50 μs at the highest speed, as in [18]. Figures 6, 7, 8, and 9 list the energy consumption of each method. The energy consumption includes both the execution duration of PRA and its host algorithms (i.e., lpWDA and lpLDAT) and the context-switch time required to switch to and from other real-time tasks. Since the range of the task periods has been shortened to the scale of [1, 100] ms, the difference between the task periods and the context-switch or transition times is smaller than those assumed in [6, 12]. Additionally, the energy overheads arising from PRA and its host algorithms are also taken into account, such that the experimental results are close to actual situations. In these simulations, each host algorithm cooperating with PRA performs better than the corresponding initial algorithm. Figures 6, 7, 8, and 9 also present the results for a clairvoyant algorithm, named bound, which knows the actual execution cycles of each task beforehand and adopts the optimal speed accordingly. The length of the analytical interval utilized by the algorithm bound is set to at least four times the length of the hyperperiod, except in the fourth experiment in Figure 9. This setting ensures that bound's analytical length is longer than those utilized by the other methods and equal to the length of the schedules. Every scheduling point in the entire schedule is examined when looking for the best start and finish times. The context-switch and transition overheads assumed in bound are the same as those in the other methods, while its execution time is assumed to be zero. In fact, bound is not a practical algorithm, because it is extremely time-consuming to find the suitable start, preemption, and completion times, and no algorithm can predict the exact amount of job execution cycles beforehand. Thus, bound functions as a yardstick in the simulations because no real DVS algorithm can achieve better performance than bound.
As shown in Figure 6, the lpWDA-PRA and lpLDAT-PRA methods reduce the energy consumption by at least 12% and 4%, respectively, over that of the primitive lpWDA and lpLDAT. The energy efficiency of lpWDA-DP-PRA lies between those of lpWDA-PRA and lpLDAT-PRA, while that of lpWDA-AC-PRA is 3%-6% worse. The difference between lpWDA-AC-PRA and lpWDA-PRA is that lpWDA-AC in the lower layer tries to schedule tasks at a higher speed and leaves additional slack that PRA is unaware of; this slack therefore becomes fragmented slack that is harmful to power saving. The following series of experiments also reveals that the performance of lpWDA-AC-PRA is worse than those of lpWDA-PRA and lpWDA-DP-PRA. In this experiment, the varied parameter of each task set is assigned randomly at 10%-70% by a uniform probability distribution function, while the values of U and the bc/wc ratio for each task set are 0.8 and 0.5, respectively. With a large U, PRA outperforms its host algorithms for a small number of tasks, consuming up to 25% and 24% less energy than lpWDA and lpLDAT, respectively. Although the execution overhead of PRA is included, PRA still outperforms its host algorithms for a large number of tasks. The reason is likely that when the number of tasks increases, the number of task synchronization points appearing in the analytical interval tends to decrease, which benefits the slack time computation.
Experimental results in Figure 7 indicate that lpWDA-PRA and lpLDAT-PRA save up to 25% and 4% more energy than their host algorithms, respectively. Among the lpWDA-based methods, lpWDA-DP-PRA has the best performance, although it only slightly outperforms lpWDA-PRA. In other words, when PRA is applied to lpWDA, the delayed-preemption (DP) technique contributes little energy saving to the host methods. In this experiment, the n and the bc/wc ratio of each task set are 10 and 0.5, respectively. Increasing the value of U in Figure 7 increases the energy consumption of PRA and its host algorithms. With a small U, the gain from PRA is modest, with 1% and 4% savings compared to those of the initial lpLDAT and lpWDA algorithms, respectively. Additionally, with these methods, U is an important factor when computing the slack for deciding the processor speeds. With a moderate U value, lpWDA-PRA and lpLDAT-PRA consume at most 16% and 4% less energy than the initial lpWDA and lpLDAT algorithms, respectively. Therefore, PRA utilizes not only the advantages of its host algorithms but also as much of the slack corresponding to the value of 1 − U as possible, shifting that slack to the current job.
In Figure 8, the set of experiments varies the bc/wc ratio from 0.1 to 0.9, and the values of U and n are 0.8 and 10, respectively. The energy consumed by PRA is positively correlated with the bc/wc ratio, while its hosts are not sensitive to it. In this experiment, lpWDA-DP-PRA still outperforms the other lpWDA-based methods, but not markedly. With a low bc/wc ratio, PRA performs best, consuming up to 26% and 16% less energy than lpWDA and lpLDAT, respectively. With a bc/wc ratio of 0.9, PRA collaborating with lpLDAT consumes slightly more energy than the initial lpLDAT algorithm. The reason is likely that the additional saving gained by the PRA algorithm is offset by its execution overhead.
In Figure 9, the analytical interval in bound is exactly the length of the schedule. Notably, when N_analysis = 2, PRA and bound have analytical intervals of equal length; bound gains at most 33% more energy savings than the proposed schemes. Therefore, extending the analysis interval to several times the hyperperiod does not increase the already substantial energy saving but rather increases the computing overhead of the slack time analysis. There exist some limits and difficulties in extending the analysis interval beyond two times the hyperperiod. Firstly, the length of the potential slack time is affected by the length of the analytical interval, the values of the synchronization points Syn(·, ·), and the locations of the actual execution cycles of the jobs. In Figure 3, the locations of the job workload affect the available length of the slack time. Additionally, the utilization of a task set influences the length of the available slack time. For example, for a task set with high utilization, the actual workload of many tasks may appear after the synchronization point and cannot be exchanged with the slack. Therefore, the longer the analytical interval is, the harder the prediction of the length of the additional slack becomes. For an interval longer than the hyperperiod, we may need considerably more time and memory space than required by PRA. Additionally, the length of a hyperperiod can vary from one to many times its original value when even a single task period is changed. When an analytical interval extends up to a hyperperiod, the execution time required by the algorithms may change severely and adversely affect the predictability of a real-time schedule.
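The sensitivity of the hyperperiod to a single period change can be seen in a short example:

```python
from math import gcd
from functools import reduce

def hyperperiod(periods):
    """LCM of all task periods."""
    return reduce(lambda a, b: a // gcd(a, b) * b, periods, 1)

# Changing one task period by a single unit can change the hyperperiod
# by an order of magnitude, which makes hyperperiod-length analysis
# intervals unpredictable in both time and memory cost.
print(hyperperiod([10, 20, 50]))   # 100
print(hyperperiod([10, 20, 49]))   # 980
```

This is why PRA's fixed, doubled analytical interval gives more predictable online behavior than any scheme whose analysis window tracks the hyperperiod.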

Conclusions
In this paper, we proposed the parareclamation algorithm (PRA) based on the concept of work-demand computation. This method can serve many existing RM scheduling methods as a guest algorithm. PRA cooperating with host algorithms such as lpWDA, lpWDA-DP, lpWDA-AC, lpLDA, and lpLDAT can further decrease energy consumption without increasing their time complexities. It is fully compatible not only with transition-aware methods (i.e., lpLDAT) but also with preemption-aware methods (i.e., lpWDA-AC and lpWDA-DP). Experimental results indicate that PRA can utilize the additional slack produced by lpWDA-AC and lpWDA-DP and reduce the average energy consumption by 14% when compared with that of the initial schemes.
The longest ℓ^left and ℓ^right are defined as ℓ^left_max and ℓ^right_max, respectively. Additionally, we define accu_bn as the sum in (16).

Figure 3: The task periods span astride the bn.

Mathematical Problems in Engineering

Figure 5: Maximum execution time of the scheduling algorithms versus the number of tasks on a 100 MHz processor.

Figure 9: Energy consumption under different values of N_analysis, the number of hyperperiod lengths covered by the analytical interval. The value of N_analysis is varied from 2 to 18, and the total number of tasks in each task set is assigned randomly from 2 to 20. For simplicity, the length of each schedule is also set to N_analysis times the hyperperiod. As N_analysis increases, the additional energy saving of bound is not obvious, and the energy consumption of the schemes with PRA is not sensitive to N_analysis.
Input: a reference task and the current time t. Output: the amount of lower-priority work needed to be done before t.
(17) if the reference task is identical to the lowest-priority task then return 0;
(18) Identify the task that has the earliest upcoming deadline among the tasks whose priorities are lower than that of the reference task;
(19) Compute the work recursively by calling CalcLowerPriorityWork(that task, t, ℓ_Exchange).

Table 1: An example of a real-time task set.
Let bn be the boundary of the first job scheduled at or after the previous release. PRA computes the length of the additional slack in the interval starting at bn. As long as the slack time can be reclaimed at a time earlier than bn, lpWDA can utilize it by postponing a lower-priority task and thus improve the energy efficiency of the schedule. Why does PRA focus on the slack computation in this interval rather than on longer or shorter intervals? Even if all job periods (except that of the current job) lie within this interval, extending it further does not guarantee additional usable slack. In Example 2, the worst-case execution time of job J_{2,1} is 2 and its actual execution time is 1. At time t = 4, job J_{2,2} is released and moved to the readyQ. Its priority is changed to be lower than that of the remaining execution of job J_{3,1} by executing line 14 in Procedure PRA. Therefore, job J_{2,2} begins its work after the remaining work of J_{3,1} completes. Notably, PRA only changes job priorities around the bn and does not affect the feasibility of the lpWDA schedule; the correctness proof is discussed in the next section. Table 2 shows the values of the scheduling parameters. The rightmost job in the queue is being executed at that time. In Algorithm 1, the currently executing job is a global variable. Whenever this job executes and ℓ_Exchange > 0, Procedure PRA lowers its priority to guarantee the timing constraints of the affected jobs.

Table 2: Scheduling parameters in Example 2.