A Dynamic Programming Solution for Energy-Optimal Video Playback on Mobile Devices

Due to the development of mobile technology and the wide availability of smartphones, the Internet of Things (IoT) is starting to handle high volumes of video data to support multimedia-based services, which requires energy-efficient video playback. In video playback, frames have to be decoded and rendered at a high playback rate, which increases the computational load on the CPU. To save CPU power, dynamic voltage and frequency scaling (DVFS) dynamically adjusts the operating voltage of the processor along with its frequency, so that an appropriate choice of frequency can balance performance against power. We present a decoding model that allows frames to be buffered so that the CPU can run at a low frequency, and we then propose an algorithm that uses dynamic programming to determine the CPU frequency needed to decode each frame in a video, with the aim of minimizing power consumption while meeting buffer size and deadline constraints. We finally extend this algorithm to optimize CPU frequencies over a short sequence of frames, producing a practical method of reducing the energy required for video decoding. Experimental results show a system-wide reduction in energy of 27%, compared with a processor running at full speed.


Introduction
The Internet of Things (IoT) allows physical objects to interact and cooperate with one another by exchanging data, and multimedia-related services based on the IoT are now gaining popularity in various application areas [1]. For example, users of home security systems can now view images from cameras on a smartphone, and telemedicine systems allow doctors to monitor a patient's health using video communication.
To support multimedia applications within the IoT, the characteristics of video need to be considered carefully. For example, the amount of data involved requires the use of compression, but the encoding and decoding processes performed by codecs are computationally intensive. Video transmission is a real-time process, which requires continuous periodic decoding to avoid distorted playback. Most importantly, mobile IoT devices have a limited energy budget, making the energy requirements of video transmission an important issue.
An effective way of reducing CPU power consumption is to use a dynamic voltage and frequency scaling (DVFS) technique, which adjusts the operating voltage and frequency of the processor [2-4]. Because the energy dissipated by the CPU scales quadratically with the supply voltage, reducing the voltage saves a lot of energy but also slows program execution, so that an appropriate compromise is always required.
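As a brief aside on why voltage matters so much, the standard first-order CMOS model (a textbook approximation, not a result of this paper) can be written as follows, where $\alpha$ is the switching activity, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency; all four symbols are introduced here for illustration only:

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f,
\qquad
E_{\text{cycle}} = \frac{P_{\text{dyn}}}{f} \approx \alpha C V^{2}.
```

Under this approximation, a task that needs a fixed number of cycles consumes dynamic energy proportional to $V^{2}$, so running at 80% of the voltage would use roughly 64% of the dynamic energy, at the cost of a lower attainable frequency.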
In video playback, frames have to be decoded and rendered at the playback rate to avoid a loss of quality. For example, to play a video at 25 frames per second, a frame must be decoded every 40 ms. This decoding process needs to finish within this period, but the workload imposed by each frame varies significantly with video content [5-10].
In most previous work on the application of DVFS to videos, the lowest frequency that satisfies the decoding deadline is chosen to reduce power consumption [11], but more energy can be saved by introducing flexibility in timing, by means of buffering techniques: if several frames are decoded in advance, the CPU can operate at lower frequencies on average, but buffering comes with its own costs [12]. Therefore, power saving is only effective with an appropriate frequency selection method subject to buffer constraints, but previous work took no account of this issue.

Mobile Information Systems
We propose a new scheme that determines the CPU frequency needed to decode each frame, which minimizes energy consumption while avoiding buffer overrun. We start by developing a video playback and energy model, formulate the energy optimization problem, and go on to use a dynamic programming technique to determine a sequence of frequencies. We finally present experimental results based on measurement of smartphone energy consumption and decoding times.
The rest of this paper is organized as follows. We present related work in Section 2 and the system model in Section 3. We formulate an optimization problem in Section 4, propose a new frequency selection algorithm in Section 5, and extend it in Section 6. We assess our scheme in Section 7 and finally conclude the paper in Section 8.

Related Work
CPU power management has been the subject of a lot of research, and most of the resulting techniques involve either dynamic power management (DPM) or DVFS. DPM puts an idle CPU into sleep mode [13], whereas DVFS reduces the voltage and frequency of an active CPU [2, 4]. DPM is not generally suitable for real-time applications that run continuously, because the idle intervals are too short to allow the CPU to enter sleep mode [8]. Therefore, we review only previous work on DVFS in this section.
DVFS techniques can be classified into interval-based and task-based algorithms [7, 14]. Interval-based schemes monitor the CPU load at intervals and respond by changing the CPU frequency and voltage. A representative scheme is the Linux Ondemand governor, which adjusts frequency periodically based on CPU utilization in the preceding interval [15]. Another scheme is LongRun [16], which varies the frequency to suit the measured utilization. These methods are typically easy to implement but can make inaccurate predictions, because they assume that upcoming loads are similar to recent loads [14].
Task-based schemes can overcome this problem to some extent by classifying tasks into several types to which different frequency selection policies are applied. Ayoub et al. [17] manage frequency and voltage to meet a performance target, expressed as a fraction of maximum system performance. Flautner and Mudge [18] propose a method that chooses a CPU frequency for each task based on its recent computational requirements. Seo et al. [14] present a frequency allocation method to reduce the average response time of tasks. However, all of these methods have been developed for general workloads and therefore may not be suitable for multimedia applications with real-time constraints. DVFS techniques for real-time systems are generally integrated with real-time scheduling [2-4]. Based on the analysis of worst-case execution times, they select CPU frequencies that satisfy the real-time constraints; but tasks often complete before their worst-case execution times, so several algorithms incorporate methods of reclaiming the unused time [2, 4]. The CPU starts each period running at a frequency which will meet the worst-case demands, and the frequency is then reduced in response to the actual computation requirement.
Several groups have investigated DVFS techniques for video applications [6, 7], in which the key issue is to estimate the computational requirements of successive frames. Most of these techniques predict the workload required to decode a frame from the workloads incurred in decoding previous frames and adjust the CPU frequency accordingly. The accuracy of these schemes has been improved by feedback mechanisms, which take previous prediction errors into account [19].
It has been widely observed [5-10] that frame decoding times vary significantly. For example, some of the frames in an MPEG video can take ten times as long to decode as an average frame [20]. That makes it difficult to estimate the computational requirements of successive frames to meet their deadlines [5-7]. Several workload estimation techniques have been proposed for video applications [5-7, 11, 19], and they can be categorized [5] into methods which make use of the relationship between the amount of data in a frame and its decoding time, and methods which predict decoding times based on recent times and aim to correct prediction errors using a feedback mechanism.
A close relationship between frame size and decoding time has been widely observed [5, 6, 11], especially in videos encoded with MPEG-style compression, and this relationship allows decoding times to be predicted with reasonable confidence. For example, Liu et al. [6] established a linear relationship between frame size and decoding time and used it to predict decoding times, while Yang and Song [11] improved the accuracy of this approach by introducing a logarithmic relationship, and Bavier et al. [21] used it to predict decoding times. Lee et al. [5] introduced particle-filter techniques to further improve the accuracy of this approach for H.264 codecs.
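To make the size-based prediction idea concrete, the following is a minimal sketch of an ordinary least-squares line fitted to (frame size, decoding time) pairs. It is in the spirit of the linear model mentioned above but is not the cited authors' implementation; the function names and the sample numbers are hypothetical and purely illustrative, not measurements.

```python
def fit_line(sizes, times):
    """Least-squares fit of times ~ slope * sizes + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_decode_time(size, slope, intercept):
    """Predict the decoding time of a frame from its size."""
    return slope * size + intercept


# Synthetic, made-up samples: frame sizes in KB and decoding times in ms.
sizes_kb = [10, 20, 30, 40]
times_ms = [7, 12, 17, 22]
slope, intercept = fit_line(sizes_kb, times_ms)
print(predict_decode_time(50, slope, intercept))
```

A logarithmic variant could be sketched the same way by fitting against the logarithm of the frame size instead of the size itself.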
Yuan et al. [8-10] proposed several DVFS techniques in which the CPU speed is adjusted on the basis of a statistical analysis of past workloads. Urunuela et al. [7] developed a history-based DVFS technique, but it is tailored to video kiosks rather than general video players. Choi et al. [22] adopted a hybrid approach in which different DVFS policies are applied depending on the characteristics of each frame. Im and Ha [12] presented DVFS techniques in which buffers were used to reclaim unused CPU time, and Huang et al. [20] introduced a method of predicting decoding times from offline analysis of frame characteristics.
Most of these techniques do not consider the characteristics of video playback, in which some deadline misses and frame skipping are acceptable. Kim et al. [23] presented a DVFS scheme specifically for scalable video coding (SVC) codecs, which makes use of temporal scalability. The scheme put forward by Yang and Song [11] acknowledges the effect of the deadline miss ratio on energy consumption, but it does not provide a satisfactory method of selecting the appropriate frequency while minimizing energy consumption, nor does it examine how buffering affects power consumption.

Figure 1: Video playback model.

System Model.
To support the periodic nature of video playback, a video player decodes $N^{\text{fps}}$ frames per second, and so the decoding period of a frame, $T^{p}$, is $1/N^{\text{fps}}$. The Notations explain important symbols used in this paper. Suppose that a CPU supports $N^{\text{freq}}$ frequency levels and that level $i$ is the frequency $f_i$ ($i = 1, \ldots, N^{\text{freq}}$). If $i < k$, then $f_i < f_k$, so that $f_{N^{\text{freq}}}$ is the highest possible frequency. Let $N^{\text{frame}}$ be the number of frames decoded in a video. Let $P^{\text{active}}_i$ and $P^{\text{idle}}_i$, respectively, be the active and idle power consumption of the system at frequency level $i$.
We will assume that the decoding time of each frame is known in advance: decoding times can be predicted by an offline analysis of the bitstream of a video [20] or by formulating a relationship between frame size and decoding time [5, 6, 11]. This decoding time information can be inserted into the header of a video [20], and we assume that these times are available to our frequency selection algorithm. Specifically, $D_{j,i}$ is the decoding time of frame $j$ at frequency level $i$.

Video Playback Model.
Frame-level DVFS is appropriate for a media player [5-7, 11], which then selects the frequency that best matches the CPU workload imposed by the current frame, before that frame is decoded. The CPU does not change its frequency until the frame has been decoded.
Figure 1 shows our video playback model. The decoding task produces frames at the playback rate and passes them to a buffer, which stores frames for consumption by a display task that fetches frames at the playback rate. If there were no buffer, then only one frame could be handled by the display task at a time, so the decoder would enter the sleep state until that frame was consumed by the display task. However, if a number of frames can be stored in the buffer, then decoding can run late, allowing lower frequencies to be selected, although this flexibility is limited by the size of the buffer. For example, suppose that the buffer can accommodate $N^{\text{buf}}$ frames. If there are already $N^{\text{buf}}$ frames in the buffer, then decoding of a new frame must be delayed until the next decoded frame has gone to the display task. In Figure 1, where $N^{\text{buf}} = 4$, if the buffer already contains 4 frames, then the decoder enters the sleep state until frame $j$ has been consumed by the display task.
To explain how this buffering technique can decrease CPU power consumption, consider a CPU with 4 frequency levels of 0.8 GHz, 1.2 GHz, 1.6 GHz, and 1.8 GHz. We assume that $T^{p} = 40$ ms and that processing a frame requires 36 ms at level 4, 40.5 ms at level 3, 54 ms at level 2, and 81 ms at level 1. If there were no buffer, then frequency level 4 would have to be chosen for every frame to keep the decoding time within 40 ms, as shown in Figure 2(a). However, if there is a buffer that already contains decoded frames when playback starts, then frequency level 1 can be selected for the first three frames, level 2 for the next two frames, and level 3 for the final frame, as shown in Figure 2(b), without violating deadlines.
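The arithmetic behind this example can be checked with a short sketch. The text does not state how many frames are already in the buffer when playback starts, so the `prebuffered` parameter below is an assumption; the helper name is hypothetical, and the check considers deadlines only, ignoring buffer capacity.

```python
PERIOD_MS = 40.0
# Decoding time of one frame at each frequency level, taken from the example.
DECODE_MS = {1: 81.0, 2: 54.0, 3: 40.5, 4: 36.0}


def meets_deadlines(levels, prebuffered):
    """Check display deadlines for frames decoded back to back from time 0.

    Assumption: `prebuffered` frames are already decoded at time 0, so the
    k-th newly decoded frame is displayed at (prebuffered + k) * PERIOD_MS.
    """
    finish = 0.0
    for k, level in enumerate(levels, start=1):
        finish += DECODE_MS[level]
        deadline = (prebuffered + k) * PERIOD_MS
        if finish > deadline:
            return False
    return True


schedule = [1, 1, 1, 2, 2, 3]  # levels used in the buffered example
print(meets_deadlines(schedule, prebuffered=4))
print(meets_deadlines(schedule, prebuffered=3))
print(meets_deadlines([4] * 6, prebuffered=0))
```

Under these assumptions the schedule finishes at 81, 162, 243, 297, 351, and 391.5 ms, which meets every deadline when four frames are prebuffered but misses the third deadline (240 ms) when only three are.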

Problem Formulation
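We formulate an optimization problem whose solution minimizes energy consumption subject to the constraints of buffer size and decoding deadlines. Frame $j$ ($j = 1, \ldots, N^{\text{frame}}$) must be decoded before its deadline $T^{\text{end}}_j$, which is $jT^{p}$. At $T^{\text{end}}_j$, frame $j$ leaves the buffer to be displayed on the screen. Let $S_j$ be the frequency level selected for decoding frame $j$, and let $T^{\text{start}}_j$ be the earliest possible time at which the decoding of frame $j$ can start. Because the buffer can contain $N^{\text{buf}}$ frames, the decoding of frame $j$ can start at $T^{\text{end}}_{j-N^{\text{buf}}}$ (i.e., $(j - N^{\text{buf}})T^{p}$), at which point frame $j - N^{\text{buf}}$ can be removed from the buffer and displayed, allowing a new frame to be decoded and stored in the buffer. Thus, $T^{\text{start}}_j$ can be expressed as follows:

$$T^{\text{start}}_j = \max\bigl(0,\ (j - N^{\text{buf}})\,T^{p}\bigr).$$

Let $T^{\text{over}}_{j,i}$ ($j = 1, \ldots, N^{\text{frame}} - 1$) be the length of time by which the decoding of frame $j$ would overrun if frequency level $i$ is chosen, relative to the start time of the next frame, $T^{\text{start}}_{j+1}$. The value of $T^{\text{over}}_{0,i}$ is initialized to 0 for all $i$. This overrun can be expressed as follows:

$$T^{\text{over}}_{j,i} = \max\bigl(0,\ D_{j,i} + T^{\text{over}}_{j-1,S_{j-1}} + T^{\text{start}}_j - T^{\text{start}}_{j+1}\bigr).$$

If the decoding of frame $j$ indeed finishes after $T^{\text{start}}_{j+1}$, then $T^{\text{over}}_{j,i}$ is the time difference between the actual time at which the decoding of frame $j$ finishes (i.e., $D_{j,i} + T^{\text{over}}_{j-1,S_{j-1}} + T^{\text{start}}_j$) and $T^{\text{start}}_{j+1}$. Otherwise, the decoder sleeps until $T^{\text{start}}_{j+1}$ for a period $T^{\text{sleep}}_{j,i}$:

$$T^{\text{sleep}}_{j,i} = \max\bigl(0,\ T^{\text{start}}_{j+1} - (D_{j,i} + T^{\text{over}}_{j-1,S_{j-1}} + T^{\text{start}}_j)\bigr).$$

The energy consumed during the period of $D_{j,i} + T^{\text{sleep}}_{j,i}$ while frame $j$ is decoded at frequency level $i$ is written as $E_{j,i}$, which can be expressed as follows:

$$E_{j,i} = P^{\text{active}}_i D_{j,i} + P^{\text{idle}}_i T^{\text{sleep}}_{j,i}.$$

At this point, we must introduce a further variable $T^{\text{diff}}_{j,i}$, which is the difference between the actual time at which the decoding of frame $j$ finishes ($D_{j,i} + T^{\text{over}}_{j-1,S_{j-1}} + T^{\text{start}}_j$) and $T^{\text{start}}_j$, and this difference can be expressed as follows:

$$T^{\text{diff}}_{j,i} = D_{j,i} + T^{\text{over}}_{j-1,S_{j-1}}.$$

Figure 3 shows the relationship between $T^{\text{over}}_{j,i}$, $T^{\text{sleep}}_{j,i}$, and $T^{\text{diff}}_{j,i}$ in a short sequence of frames.

Figure 4: A two-dimensional array of $(F_{j,u}, V^{\text{diff}}_{j,u})$, in which each entry holds the selected frequency and the index of the previous frame's entry.

The main idea of the dynamic programming is to construct a table of the optimal energy $V^{\text{energy}}_{j,u}$ for each frame $j$ when $T^{\text{diff}}_{j,S_j} = u$ ($u = 1, \ldots, N^{\text{round}}_j$), as described in Table 1, where the minimum value in the final row, $\min_{u=1,\ldots,N^{\text{round}}_{N^{\text{frame}}}} V^{\text{energy}}_{N^{\text{frame}},u}$, represents the amount of energy consumed by the optimal frequency allocation. For this purpose, we first initialize the values of $V^{\text{energy}}_{1,u}$ and then develop the recurrence relationship between consecutive frames so as to find all the values of $V^{\text{energy}}_{j,u}$ in the table. We also maintain a two-dimensional array of tuples $(F_{j,u}, V^{\text{diff}}_{j,u})$ which lead to the minimum energy $V^{\text{energy}}_{j,u}$, as illustrated in Figure 4. Using this array, a backtracking phase proceeds from frame $N^{\text{frame}}$ to frame 1 to select the frequency of every frame. For example, Figure 4 shows an array of these tuples when $N^{\text{frame}} = 5$ and $N^{\text{round}}_j = 10$. Suppose that the third column in the last row has the minimum energy value. Because $V^{\text{diff}}_{j,u}$ points to the column index of the previous frame, a sequence of frequencies can be selected as follows: (6, 3, 4, 5, 1). In summary, our dynamic programming algorithm has three phases: (1) initialization, (2) establishment of the recurrence relation, and (3) backtracking.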
Pseudocode for this frequency selection algorithm (FSA) is presented as Algorithm 1. If $N^{\max}$ is the maximum round length, so that $N^{\max} = \max_{j=2,\ldots,N^{\text{frame}}} N^{\text{round}}_j$, we can easily see from Algorithm 1 that the complexity of FSA is $O(N^{\text{frame}} N^{\text{freq}} N^{\max})$.
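The three phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a transcription of Algorithm 1: decoding times are integer milliseconds, frequency levels are 0-based, the earliest start of frame $j$ is taken to be $\max(0, (j - N^{\text{buf}})T^{p})$, the energy of a frame is active power times decoding time plus idle power times sleep time, and a dictionary keyed by the finish offset $u$ stands in for the table. The function and parameter names are hypothetical.

```python
import math


def fsa(decode_ms, p_active, p_idle, period_ms, n_buf):
    """Return (minimum energy, list of 0-based levels), or (inf, None).

    decode_ms[j][i] is the integer decoding time (ms) of frame j at level i.
    """
    n_frame, n_freq = len(decode_ms), len(p_active)

    def t_start(j):  # earliest start of 1-based frame j (assumed form)
        return max(0, (j - n_buf) * period_ms)

    rows = []                      # rows[j-1][u] = (energy, level, previous u)
    prev = {0: (0.0, None, None)}  # phase 1: virtual frame 0, no overrun
    for j in range(1, n_frame + 1):
        n_round = j * period_ms - t_start(j)  # window between start and deadline
        gap_prev = t_start(j) - t_start(j - 1) if j > 1 else 0
        gap_next = t_start(j + 1) - t_start(j)
        row = {}
        # Phase 2: recurrence from every reachable state of the previous frame.
        for u_prev, (e_prev, _, _) in prev.items():
            over = max(0, u_prev - gap_prev)  # overrun carried into frame j
            for i in range(n_freq):
                d = decode_ms[j - 1][i]
                u = d + over                  # finish offset from t_start(j)
                if u > n_round:
                    continue                  # would miss the deadline
                sleep = max(0, gap_next - u)
                energy = e_prev + p_active[i] * d + p_idle[i] * sleep
                if u not in row or energy < row[u][0]:
                    row[u] = (energy, i, u_prev)
        if not row:
            return math.inf, None             # no feasible schedule
        rows.append(row)
        prev = row
    # Phase 3: backtrack from the cheapest state of the last frame.
    u = min(rows[-1], key=lambda k: rows[-1][k][0])
    best_energy = rows[-1][u][0]
    levels = []
    for row in reversed(rows):
        _, level, u = row[u]
        levels.append(level)
    levels.reverse()
    return best_energy, levels


# Hypothetical toy input: 3 identical frames, 2 levels (slow 50 ms, fast 25 ms).
energy, levels = fsa([[50, 25]] * 3, p_active=[1.0, 3.0], p_idle=[0.1, 0.1],
                     period_ms=40, n_buf=2)
print(energy, levels)
```

Each frame visits at most one state per millisecond of its window and tries every level from it, which matches the stated complexity. In the toy input, running every frame at the fast level would cost 225.5 units, whereas the sketch finds a schedule costing 200 by using the slow level once.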

Algorithm Execution
If the frame decoding time is known in advance, then FSA can run without modification. For example, a frequency allocation table for the entire playback can be obtained before playback by executing the algorithm once. However, since the algorithm complexity depends on the number of frames to be decoded, we divide the algorithm into iterations and limit the number of frames handled in each iteration to $N^{\text{split}}$ ($N^{\text{split}} < N^{\text{frame}}$). Therefore, at the beginning of the $k$th iteration, the algorithm chooses the frequencies for frames between $(k - 1)N^{\text{split}} + 1$ and $kN^{\text{split}}$; we call this variant FSA-split, and it is shown in Algorithm 2.
FSA-split has the following characteristics in comparison with FSA:
(i) FSA-split determines the frequencies of frames between $(k - 1)N^{\text{split}} + 1$ and $kN^{\text{split}}$.
(ii) An initialization part (lines between (9) and (22) in Algorithm 2) takes the length of overrun in the previous iteration ($T^{\text{over}}_{(k-1)N^{\text{split}},\,\text{prev}}$) for the calculation of the parameter values.
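The windowing behind FSA-split can be sketched as a small driver loop. This is an illustrative outline only: `solve_window` is a hypothetical callable standing in for the per-window dynamic programming step, and its signature (first frame, last frame, overrun carried in; returning the chosen levels and the overrun carried out) is an assumption, not the interface of Algorithm 2.

```python
def fsa_split(n_frame, n_split, solve_window):
    """Choose levels window by window, carrying the overrun between windows.

    Iteration k covers frames (k - 1) * n_split + 1 to k * n_split
    (1-based, with the last window clipped to n_frame).
    """
    levels = []
    overrun = 0  # no overrun before the first window
    for first in range(1, n_frame + 1, n_split):
        last = min(first + n_split - 1, n_frame)
        window_levels, overrun = solve_window(first, last, overrun)
        levels.extend(window_levels)
    return levels


# Stub solver used only to show which windows are visited.
visited = []


def stub_solver(first, last, overrun):
    visited.append((first, last, overrun))
    return [0] * (last - first + 1), 0


print(fsa_split(10, 4, stub_solver))
print(visited)
```

Because each window only needs the overrun left by its predecessor, the table built in each iteration grows with $N^{\text{split}}$ rather than with the total number of frames.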
Several methods have been developed for decoding time estimation, most of which predict future decoding times based on recently measured times [5-7, 11, 19]. The decoding times of the frames in a given group of pictures (GOP) do not differ greatly from those of its neighboring GOPs [11]. We can therefore predict the decoding times of the next GOP on the basis of those of the current GOP. For example, if $N^{\text{split}}$ is set to the number of frames in a GOP, then the frequency allocation table can be established for the next GOP by passing the predicted decoding times of the next GOP to FSA-split as input parameters.

Experimental Results
7.1. Setup. We performed simulations to evaluate our schemes using power data and timings obtained experimentally. The power consumption of a Samsung Nexus S smartphone (not just the CPU) was measured, and Table 2 shows its active and idle power values. The time taken to decode video frames was also measured for the two videos in Table 3. We compared our scheme with two other algorithms as follows:
(1) HF always selects the highest frequency, which is equivalent to no DVFS.
(2) LF selects the lowest frequency level which will get each frame decoded in time. This method is a good heuristic, because CPU frequency can be expected to have a monotonic relationship with energy consumption [4-7, 11].

Algorithm 2: FSA-split (excerpt, lines (6) to (13)).
(6) $V^{\text{energy}}_{j,u}$, $V^{\text{over}}_{j,u}$, $V^{\text{diff}}_{j,u}$ and $F_{j,u}$ ← ∞;
(7) end for
(8) end for
(9) for $u = 1$ to $N^{\text{round}}_j$ do
(10) for $i = 1$ to $N^{\text{freq}}$ do
(11) if $D_{(k-1)N^{\text{split}}+1,i} \le T^{\text{end}}_{(k-1)N^{\text{split}}+1}$ and $u = D_{(k-1)N^{\text{split}}+1,i}$ then
(12) if $k > 1$ then
(13) Calculate
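The HF and LF baselines used in the comparison can be sketched as follows. This is a hedged illustration: the function names are hypothetical, levels are 0-based and ordered from lowest to highest frequency, and "decoded in time" is interpreted here as "within one frame period", which is an assumption about LF rather than a detail given in the text.

```python
def hf_levels(decode_ms):
    """HF: always pick the highest frequency level for every frame."""
    return [len(row) - 1 for row in decode_ms]


def lf_levels(decode_ms, period_ms):
    """LF: pick the lowest level whose decoding time fits in one period.

    Falls back to the highest level when no level meets the period.
    """
    levels = []
    for row in decode_ms:
        chosen = len(row) - 1
        for i, t in enumerate(row):  # row is ordered slowest to fastest
            if t <= period_ms:
                chosen = i
                break
        levels.append(chosen)
    return levels


# First row: decoding times from the earlier four-level example.
# Second row: a hypothetical lighter frame needing 5/9 of the work.
frames = [[81.0, 54.0, 40.5, 36.0], [45.0, 30.0, 22.5, 20.0]]
print(hf_levels(frames))
print(lf_levels(frames, period_ms=40.0))
```

LF considers each frame in isolation, so it cannot trade slack between frames; that is the flexibility the buffered scheme exploits.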

Efficacy of FSA.
Table 4 shows how energy consumption depends on the number of frames that are buffered. We see that (1) FSA always shows the best performance, using 13% less energy than LF and 27% less energy than HF on average, and (2) increasing the size of the buffer saves more energy, but the additional saving gradually tails off. In particular, even when $N^{\text{buf}} = 1$, so that only one additional buffer is used, FSA uses 11% less energy than LF on average, suggesting that the buffer overhead of FSA is not high.
The results in Table 4 can be attributed to FSA's effective use of the slack times generated by storing decoded frames in the buffer, allowing the CPU to operate at lower frequencies. For example, Table 5 shows the average percentage of the frames in both video clips that are decoded at each frequency; FSA generally chooses lower frequencies than LF, decreasing energy consumption. FSA also chooses the highest frequency (1000 MHz) more often than LF, which increases the idle time and allows lower frequencies to be chosen for other frames than LF would choose. These results suggest that frequency selection has a great effect on energy consumption.

Efficacy of FSA-Split.
To evaluate the efficacy of FSA-split, we examined how the value of $N^{\text{split}}$ affects the energy consumption for different values of $N^{\text{buf}}$, as tabulated in Table 6. We see that (1) the energy difference between FSA and FSA-split is marginal, at most 1.47%, even when $N^{\text{split}}$ is set to 12, which is the GOP size; (2) increasing the value of $N^{\text{split}}$ decreases the energy gap; and (3) increasing the buffer size increases the energy gap, even though the difference is negligible. Although FSA exhibits slightly better performance than FSA-split, it takes the parameters of all frames as input, requiring a lot of computation. These results suggest that FSA-split is a practical method of reducing the energy required for video decoding.

Conclusions
We have proposed a new frequency allocation scheme which minimizes energy consumption while avoiding buffer overrun, using a dynamic programming technique. This scheme establishes a recurrence relationship between consecutive frames to construct a table of the minimum energy values required to decode each frame and determines the sequence of frequencies required to decode every frame using a backtracking technique. It was extended to optimize CPU frequencies over a short sequence of frames, which gives a basis for energy-saving video decoding in practice.
Experimental results show that it uses 27% less energy on average than a processor running at the highest frequency. In particular, it uses 13% less energy than the widely used heuristic which chooses the lowest frequency that gets each frame decoded in time. We believe that these results give a useful guideline for low-power video services by providing a lower bound on the power consumption required for video playback.

Figure 3: Sequence of frames showing the relationship between $T^{\text{over}}_{j,i}$, $T^{\text{sleep}}_{j,i}$, and $T^{\text{diff}}_{j,i}$.

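Table 1: Table for the value of $V^{\text{energy}}_{j,u}$.

The total energy consumed in decoding all the frames is $\sum_{j=1}^{N^{\text{frame}}} E_{j,S_j}$. We can now formulate this frequency selection problem, which determines $S_j$ ($j = 1, \ldots, N^{\text{frame}}$), as follows:

$$\text{minimize } \sum_{j=1}^{N^{\text{frame}}} E_{j,S_j} \quad \text{subject to } D_{j,S_j} + T^{\text{over}}_{j-1,S_{j-1}} + T^{\text{start}}_j \le T^{\text{end}}_j \quad (j = 1, \ldots, N^{\text{frame}}).$$

The decoding of frame $j$ must start after $T^{\text{start}}_j$ and finish before $T^{\text{end}}_j$. We can express this period for each frame $j$ ($j = 1, \ldots, N^{\text{frame}}$) as $N^{\text{round}}_j$, so that $N^{\text{round}}_j = T^{\text{end}}_j - T^{\text{start}}_j$ in milliseconds. Let $V^{\text{energy}}_{j,u}$ be the minimum amount of energy when $T^{\text{diff}}_{j,S_j}$ is $u$ milliseconds and frame $j$ is decoded ($j = 1, \ldots, N^{\text{frame}}$ and $u = 1, \ldots, N^{\text{round}}_j$). Let $F_{j,u}$ be the frequency level required to achieve the energy consumption of $V^{\text{energy}}_{j,u}$, let $V^{\text{over}}_{j,u}$ be the corresponding value of $T^{\text{over}}_{j,S_j}$, and let $V^{\text{diff}}_{j,u}$ be the value of $T^{\text{diff}}_{j-1,S_{j-1}}$.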

Table 2: Measured power consumption against frequency of Samsung Nexus S phone.

Table 4: Relative energy consumption of FSA against the number of buffers.

Table 5: Percentage of frames decoded at each frequency.

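Table 6: Percentage energy difference between FSA and FSA-split against different values of $N^{\text{split}}$ and $N^{\text{buf}}$.

Notations
$D_{j,i} + T^{\text{over}}_{j-1,S_{j-1}} + T^{\text{start}}_j$: Actual time at which the decoding of frame $j$ finishes.
$E_{j,i}$: Energy consumed during the decoding period for frame $j$ at frequency $f_i$.
Two-dimensional array indexed by $(u, v)$: Values of $E_{j,i}$ when $u = T^{\text{diff}}_{j-1,S_{j-1}}$, $v = T^{\text{diff}}_{j,i}$, and $i$ is the frequency level chosen for decoding frame $j$.
$T^{\text{diff}}_{j,i}$: Time between $T^{\text{start}}_j$ and completion of decoding frame $j$.
$F_{j,u}$: Frequency level to achieve an energy of $V^{\text{energy}}_{j,u}$.
$V^{\text{diff}}_{j,u}$: Value of $T^{\text{diff}}_{j-1,S_{j-1}}$ to achieve an energy of $V^{\text{energy}}_{j,u}$.
$N^{\text{split}}$: Number of frames for which frequencies are determined by FSA-split.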