QoS Management for Embedded Databases in Multicore-Based Embedded Systems

With ubiquitous deployment of sensors and network connectivity, the amount of real-time data in embedded systems is increasing rapidly, and many embedded systems require database capability for systematic management of real-time data. In such embedded systems, supporting the timeliness of tasks accessing databases is an important problem. However, recent multicore-based embedded architectures pose a significant challenge for such data-intensive real-time tasks since the response time of accessing data can be significantly affected by potential intercore interferences. In this paper, we propose a novel feedback control scheme that supports the timeliness of data-intensive tasks against unpredictable intercore interferences. In particular, we use a multiple inputs/multiple outputs (MIMO) control method that exploits multiple control knobs, for example, CPU frequency and the Quality-of-Data (QoD), to handle highly unpredictable workloads in multicore systems. Experimental results, using an actual implementation, show that the proposed approach achieves the target Quality-of-Service (QoS) goals, such as task timeliness and QoD, while consuming less energy compared to baseline approaches.


Introduction
Database functionality is increasingly being embedded into mobile and embedded platforms for systematic management of a large amount of real-time data such as sensor streams. For example, autonomous cars need to process a large volume of real-time data from sensors in real time [1]. In such systems, real-time tasks with intensive database accesses are required to provide a certain level of Quality-of-Service (QoS), such as task timeliness and data freshness.
In our previous work [2], we presented a real-time embedded database, called QeDB, that supports the timeliness of data-intensive tasks using a control-theoretic QoS management architecture. With the feedback control loop, QeDB can achieve the desired QoS by adapting its control knobs based on QoS errors. Hence, a precise system model is not required at design time. Our previous work assumed single-core platforms for the QoS management. However, modern embedded systems are increasingly moving towards multicore platforms, which pose a huge challenge since concurrent data accesses might cause contention at non-CPU resources, such as memory and I/O [3][4][5][6][7][8][9]. There are potential intercore interferences, as the data accesses from one core can be influenced by requests from the other CPU cores. As a result, the response time of a data-intensive task can be delayed significantly due to the bottleneck in accessing data.
To handle such unpredictable intercore interferences, we might consider Quality-of-Data (QoD) scaling as a primary control knob for the QoS management. With QoD scaling, the incoming sensor updates are selectively dropped by the admission controller to control the workload. By scaling down QoD, the workload for data accesses is reduced, resulting in less intercore contention for data accesses. However, a major disadvantage of QoD scaling is that its applicability is limited by users' QoD requirements. Hence, the QoS goals might not be satisfied if QoD is saturated at its maximum or minimum. Moreover, QoD scaling is not useful if a task's response time is dominated by computation, not by accessing data.

Mobile Information Systems
In this paper, we propose an efficient QoS management approach, in which multiple complementing control knobs are exploited simultaneously to handle highly unpredictable workloads in multicore platforms. In our approach, the limitation of QoD scaling is complemented by exploiting Dynamic Voltage/Frequency Scaling (DVFS) [10]. Unlike QoD scaling, DVFS is more appropriate to control the speed of tasks when the workload is less data-intensive. Further, DVFS has a wide operating range. For example, the ARM-based Exynos 5422 mobile processor supports 19 frequency/voltage levels ranging from 200 MHz to 2.0 GHz. The two distinctive control knobs are combined using a novel multiple inputs/multiple outputs (MIMO) control architecture. This MIMO control architecture captures the interrelationships between the multiple control knobs and the system outputs and generates proper combinations of the multiple control signals according to the varying workloads.
We implement the proposed approach in an actual multicore-based embedded device by extending our previous work. The evaluation results demonstrate that the proposed QoS management approach is more effective in QoS enforcement than applying either DVFS or QoD scaling alone. Our approach can achieve the QoS goals with significantly smaller power consumption, particularly when workloads are data-intensive and have a high chance of intercore interferences for accessing data.
The rest of this paper is organized as follows: in Section 2, we summarize our previous work on QeDB including its transaction model and QoS management architecture. In Section 3, we discuss the effect of intercore interferences on data-intensive real-time tasks. In Section 4, we present our approach to QoS management. In Section 5, the performance evaluation settings and results are presented. Related work is presented in Section 6, and Section 7 concludes the paper.

Overview of QeDB
Our current work extends our previous work on QeDB [2]. QeDB is a key/value store database for data-intensive real-time applications running on embedded devices. We briefly introduce QeDB as background for the following sections.

Data and Real-Time Transactions.
Data objects in QeDB can be categorized into temporal and nontemporal data. Temporal data objects are updated by update transactions when new sensor readings become available. User transactions are tasks that perform computation using data objects in the database. User transactions consume both temporal and nontemporal data objects. Algorithm 1 is an example of a user transaction that performs real-time analysis by accessing sensor data in QeDB. Instead of supporting complex queries, data objects in QeDB can be accessed through get(key) and put(key, value) interfaces. Data objects are identified using keys.
User transactions in QeDB can specify their desired response time, or deadline, according to their timing constraints. We call such user transactions real-time transactions. If a real-time transaction is periodic, its periodic instances are supposed to meet the deadline. QeDB only supports soft real-time semantics, and, hence, missing deadlines decreases the QoS (Quality-of-Service) but does not jeopardize correct system behavior.

QoS Metrics.
Since typical embedded systems do not have many transactions, deadline miss ratio is not a stable metric for QoS management [11]. Hence, we define the tardiness of a transaction as follows:

$$\text{tardiness} = \frac{\text{response time}}{\text{desired response time}}$$

For example, a task is tardy if its response time is greater than its deadline, that is, if its tardiness is greater than 1. In QeDB, the average tardiness of real-time transactions is used as a QoS metric to quantify the timeliness of the transactions. Each transaction performs both computation and data accesses. Therefore, a transaction's response time is affected by both its data accessing activities and computation activities. To this end, we define the following parameters for a transaction: $t_{\text{data}}$, the data response time spent on accessing data, and $t_{\text{comp}}$, the computation response time spent on computation.

Another important QoS metric for data-intensive real-time systems is the Quality-of-Data (QoD). In real-time systems, the result of real-time tasks depends on both the logical correctness and the temporal correctness. The result is temporally correct only if the real-time data from sensors are fresh enough. A temporal data object is considered fresh, or temporally consistent, if the time elapsed since its timestamp is less than its absolute validity interval (avi). In this work, we define QoD as the ratio of the number of fresh data objects $N_{\text{fresh}}$ to the total number of temporal data objects $N_{\text{temporal}}$:

$$\text{QoD} = \frac{N_{\text{fresh}}}{N_{\text{temporal}}}$$

Since a higher QoD is desirable as long as the system is not overloaded, users or applications specify only the minimum QoD, denoted by $\text{QoD}_{\min}$, as a QoS specification.
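As an illustration of the two QoS metrics, the following minimal Python sketch computes the average tardiness over one sampling period and the QoD of a set of temporal data objects. It assumes that tardiness is the ratio of response time to deadline and that an object is fresh when the time since its last update is below its avi. The names `TemporalObject`, `average_tardiness`, and `qod` are hypothetical and are not part of QeDB.

```python
from dataclasses import dataclass


@dataclass
class TemporalObject:
    timestamp: float  # time of the last sensor update, in seconds
    avi: float        # absolute validity interval, in seconds


def tardiness(response_time: float, deadline: float) -> float:
    """Ratio of measured response time to deadline; a value above 1 means tardy."""
    return response_time / deadline


def average_tardiness(samples) -> float:
    """Average tardiness over (response_time, deadline) pairs of one sampling period."""
    values = [tardiness(rt, dl) for rt, dl in samples]
    return sum(values) / len(values)


def qod(objects, now: float) -> float:
    """Fraction of temporal objects whose age is below their avi (assumed freshness test)."""
    fresh = sum(1 for o in objects if now - o.timestamp < o.avi)
    return fresh / len(objects)
```

For example, two transactions with a 50 ms deadline that finish in 40 ms and 60 ms have tardiness values of 0.8 and 1.2, so the average tardiness for that sampling period is about 1.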

QoS Management Architecture.
QeDB supports the desired QoS through its QoS management architecture shown in Figure 1.
The architecture follows the feedback control principle, and hence it exploits the closed loop of continuous "monitoring" and "control." The transaction handler includes the core engine of the underlying embedded database, and it processes admitted transactions. At every sampling period, the performance monitor computes the average tardiness of real-time transactions. The tardiness feedback controller generates control signals by taking the difference between the target tardiness and the current average tardiness. The QoS manager enforces these control signals by using available control knobs in the system. In our previous work, we changed the rate of sensor updates and subsequent QoD using the admission controller to control the system's overheads. In recent embedded platforms, the QoS controller might exploit other control knobs, such as DVFS. In the following sections, we discuss how to exploit these multiple control knobs in multicore environments.
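The role of the admission controller can be sketched as follows. This is a simplified, hypothetical illustration and not QeDB's actual admission logic: it treats the commanded QoD as the fraction of incoming sensor updates to admit and spreads the dropped updates evenly with a credit counter.

```python
class AdmissionController:
    """Admits roughly a fraction `qod` of incoming sensor updates (illustrative only)."""

    def __init__(self, qod: float = 1.0):
        self.qod = qod        # commanded fraction of updates to admit, in [0, 1]
        self._credit = 0.0    # accumulated admission credit

    def admit(self) -> bool:
        """Return True if the next incoming update should be applied to the database."""
        self._credit += self.qod
        if self._credit >= 1.0:
            self._credit -= 1.0
            return True
        return False
```

With `qod = 0.5`, every second update is admitted; with `qod = 1.0`, no update is dropped.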

Motivation: Intercore Interferences in Multicore Systems
Modern embedded systems are increasingly moving towards multicore platforms. We might consider scheduling real-time transactions onto a dedicated CPU core to avoid scheduling interferences from non-real-time tasks. However, contention for shared resources, such as memory and I/O, cannot be avoided completely in multicore platforms. In particular, the interferences between the cores pose a significant challenge for data-intensive real-time applications, in which predictable system behavior is highly required.
To illustrate the problem, we ran a microbenchmark on a multicore embedded platform. In the benchmark, a real-time task is invoked periodically every 100 ms, and it is scheduled to run on CPU core #1. The task is a transaction, as shown in Algorithm 1, that performs computational analysis by actively accessing real-time data in the database. At the same time, a stream of independent best-effort transactions is executed on the other CPU cores to interfere with the real-time transaction. These best-effort transactions access different databases in the system. For this benchmark, the QoS management mechanism in Figure 1 is deactivated. We use the CPU affinity feature of the testbed platform to assign the transactions to different CPU cores. Transactions in each active CPU core are scheduled according to the real-time FIFO policy. The details of the testbed platform are discussed in Section 5.
In the benchmark, the real-time transaction's data response time $t_{\text{data}}$ and computation response time $t_{\text{comp}}$ are measured under varying CPU clock speed. Figure 3(a) shows the result when the real-time transaction is executed without interfering best-effort transactions in the other CPU cores. Figures 3(b) and 3(c), respectively, plot the results when the best-effort transactions are scheduled in the other 1 and 3 CPU cores. The results show that the task response time of the real-time transaction increases significantly as more CPU cores are used to execute best-effort transactions. For instance, at 2.0 GHz, the task response time of the transactions increases from 0.21 to 0.36 and to 0.42, respectively, when the other 1 and 3 CPU cores are involved in interfering with the real-time transactions. However, it should be noted that the computation response times $t_{\text{comp}}$ of the real-time transactions are not much affected by the intercore interferences. Only the data response times $t_{\text{data}}$ of the real-time transactions are increased from 0.05 to 0.16 and to 2.3, respectively, in Figures 3(a), 3(b), and 3(c). These results demonstrate that the response time of data-intensive real-time transactions can be affected significantly by intercore interferences. Further, these intercore interferences at the shared resources are hard to predict and pose significant challenges for data-intensive real-time applications.
The potential presence of intercore interferences changes the characteristics of the workload. For example, in Figure 3(a), the real-time transactions are more computation-oriented when no interfering transactions run on other CPU cores. The ratio between $t_{\text{comp}}$ and $t_{\text{data}}$ is about 0.8 : 0.2. Therefore, changing the CPU speed using DVFS can be effective in controlling the total response time of the transaction. However, when transactions have high intercore interferences from other CPU cores, as in Figure 3(c), the ratio between $t_{\text{comp}}$ and $t_{\text{data}}$ changes to about 0.5 : 0.5. In this situation, changing the computation speed using DVFS has limited effect on the task response time. For instance, in Figure 3(a), the normalized response time of 0.4 is achieved at a CPU core frequency of 1100 MHz. In contrast, in Figure 3(c), the normalized response time of 0.4 cannot be supported even at 2000 MHz, which is the maximum CPU frequency.
We performed a second microbenchmark experiment to understand the impact of QoD scaling when transactions incur high intercore interferences. In the experiment, we measure the response time of the real-time transactions while the QoDs of the transactions are varied from 10% to 100%. We can decrease the QoD of temporal data by increasing the update intervals of sensors. During the experiment, the CPU frequency is fixed at 1.0 GHz. Figure 4 shows that the data response time $t_{\text{data}}$ of the real-time transactions is affected significantly by QoD. For instance, in Figure 4, decreasing the QoD from 100% to 50% reduces $t_{\text{data}}$ from about 0.5 to 0.3. This result shows that decreasing QoD is an effective method to reduce the chance of intercore interferences in multicore systems.

QoS Management for Multicore Systems
In this section, we propose a QoS management approach that exploits multiple control knobs to handle highly dynamic workloads in multicore environments.

Metric to Quantify Intercore Interference.
As seen in Section 3, the workload characteristic of a transaction can be significantly affected by intercore interferences. As a consequence, the effectiveness of the control knobs, for example, DVFS and QoD scaling, also changes according to the varying workloads. The QoS management architecture in Figure 1 is supposed to coordinate these multiple control knobs under such highly variable multicore environments. To this end, we define drr (data response ratio) as a metric that characterizes a transaction's workload state:

$$\text{drr} = \frac{t_{\text{data}}}{t_{\text{data}} + t_{\text{comp}}}$$

That is, the drr of a transaction is the ratio of its data response time to its total response time. In this paper, we assume real-time transactions whose data access pattern does not vary much between their repeating periods. Therefore, significant changes in drr imply the presence of intercore interferences. For instance, drr gets higher as more intercore interferences occur. We further define $\text{drr}_{\text{norm}}$ as a transaction's nominal data response ratio, which represents the minimal drr:

$$\text{drr}_{\text{norm}} = \frac{t^{\text{norm}}_{\text{data}}}{t^{\text{norm}}_{\text{data}} + t^{\text{norm}}_{\text{comp}}}$$

in which $t^{\text{norm}}_{\text{data}}$ and $t^{\text{norm}}_{\text{comp}}$, respectively, are the transaction's $t_{\text{data}}$ and $t_{\text{comp}}$ profiled while no interfering tasks are executed in the other CPU cores. Therefore, the gap between drr and $\text{drr}_{\text{norm}}$ can be used as an indicator of how much a transaction is delayed due to tardy data accesses. In multicore-based real-time systems, intercore interferences are the major source of tardy data accesses.
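The metric can be illustrated with a short Python sketch, assuming that drr is the data response time divided by the total (data plus computation) response time. The function names are hypothetical. The example values are the approximate ratios reported for the microbenchmark in Section 3 (about 0.8 : 0.2 without interference and about 0.5 : 0.5 with three interfering cores).

```python
def drr(t_data: float, t_comp: float) -> float:
    """Data response ratio: share of the total response time spent on data accesses."""
    return t_data / (t_data + t_comp)


def interference_gap(t_data: float, t_comp: float,
                     t_data_norm: float, t_comp_norm: float) -> float:
    """Gap between the current drr and the nominal drr profiled without interference."""
    return drr(t_data, t_comp) - drr(t_data_norm, t_comp_norm)


# Nominal profile: comp : data is about 0.8 : 0.2; under interference about 0.5 : 0.5.
gap = interference_gap(t_data=0.5, t_comp=0.5, t_data_norm=0.2, t_comp_norm=0.8)
```

Here the nominal drr is 0.2 and the drr under interference is 0.5, so the gap of about 0.3 indicates that a substantial part of the response time is due to delayed data accesses.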

Feedback Control Procedure.
The primary goal of QoS management is to keep the transaction response time equal to the desired response time. A second goal is to exploit multiple control knobs properly, considering the dynamic workloads of multicore systems. Since we have two control goals, we need to provide at least two control inputs to control them. For example, if real-time transactions are tardy due to intercore interferences, we need a control knob that effectively reduces the intercore interferences. Conversely, if the transactions are tardy because of slow computation activities, we need another control knob to speed up the computation. Given a task, one available control knob that significantly affects its computation response time is the processor speed. The higher the processor speed, the shorter the response time of the task. In modern embedded processors, the processor speed can be controlled by changing the processor frequency using DVFS. Regarding data response time, we can exploit QoD scaling as a control knob. Since a higher QoD translates into more frequent accesses to temporal data, the data response time of a transaction is highly affected by QoD.
To achieve these multiple goals using multiple control knobs, we propose to exploit the MIMO (multiple inputs/multiple outputs) control loop shown in Figure 5. The overall feedback control steps are as follows:
(1) The desired transaction tardiness, $\text{tard}_{\text{target}}$, and the desired data response ratio, $\text{drr}_{\text{target}}$, are set. Typically, we may set them to 1 and $\text{drr}_{\text{norm}}$, respectively. By setting $\text{drr}_{\text{target}}$ to $\text{drr}_{\text{norm}}$, we require the system to maintain the minimal drr against potential intercore interferences.
(2) At the $k$th sampling instant, the average tardiness error $e_{\text{tard}}(k)$ and the drr error $e_{\text{drr}}(k)$ are computed for real-time transactions.
(3) The MIMO controller computes the control signals, Δfreq and ΔQoD, simultaneously, considering both the transaction tardiness and the data response ratio.
(4) The QoS manager changes the CPU core frequency to achieve Δfreq.
(5) ΔQoD is achieved by adjusting the update rates of temporal data objects.
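Steps (4) and (5) must map the continuous control signals onto knobs with limited ranges. The following Python sketch shows one possible way to do this and is not the QeDB implementation. It assumes that Δfreq is expressed in MHz, that the 19 DVFS levels between 200 MHz and 2.0 GHz are evenly spaced at 100 MHz, and that a requested frequency is rounded up to favor timeliness. The names `apply_freq` and `apply_qod` are hypothetical.

```python
# Assumed evenly spaced DVFS levels: 200, 300, ..., 2000 MHz (19 levels).
FREQ_LEVELS_MHZ = list(range(200, 2001, 100))


def apply_freq(current_mhz: float, delta_mhz: float) -> int:
    """Return the lowest DVFS level at or above the requested frequency, clamped to the range."""
    wanted = current_mhz + delta_mhz
    for level in FREQ_LEVELS_MHZ:
        if level >= wanted:
            return level
    return FREQ_LEVELS_MHZ[-1]  # saturate at the maximum frequency


def apply_qod(current_qod: float, delta_qod: float, qod_min: float = 0.5) -> float:
    """Return the new QoD, saturated between the user-specified minimum and 1."""
    return min(1.0, max(qod_min, current_qod + delta_qod))
```

Both functions saturate at the knob limits. This saturation is what limits a single knob on its own and motivates combining the two.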

Feedback Control Loop Design.
In this paper, we take a systematic approach to designing the feedback controller.

System Modeling and Verification. The first step in designing a feedback controller is to construct a model that captures the target system's properties. In this study, QeDB running on a multicore system is the target system. As discussed in previous sections, the goals of the QoS management are to support the desired transaction tardiness while preventing excessive intercore interferences in multicore environments. To achieve these multiple control goals using multiple control knobs, we exploit a MIMO model. The MIMO linear time-invariant model for QeDB has the following form:

$$y(k+1) = A\,y(k) + B\,u(k)$$

where $y(k) = [\text{tard}(k) \;\; \text{drr}(k)]^{\top}$ is the output vector and $u(k) = [\Delta\text{freq}(k) \;\; \Delta\text{QoD}(k)]^{\top}$ is the input vector. The model parameters $A$ and $B$ are 2 × 2 matrices because the system has two inputs and two outputs. We may choose to use two separate single input/single output (SISO) models, one SISO model to relate CPU frequency to transaction tardiness and another SISO model to relate QoD to drr. However, if system inputs affect multiple outputs, then a MIMO model should be considered to capture the interaction between the different control inputs and system outputs [12]. For instance, in our system, changing QoD affects both the transaction tardiness and drr.
In the actual system identification of QeDB, the two inputs are varied simultaneously. Inputs with relatively prime cycle lengths are used to fully stimulate the system by applying all different combinations of the two inputs. Figure 6 shows the result of the system identification. The model parameters obtained through the system identification are

$$A = \begin{bmatrix} 0.8504 & -0.0066 \\ -0.1449 & 0.3882 \end{bmatrix}, \quad B = \begin{bmatrix} -0.1983 & 0.1485 \\ 0.2448 & 0.7762 \end{bmatrix}.$$

These parameters quantify the interaction between the control inputs and the system outputs. For instance, the two components of $B$'s first row have different signs, which means that the CPU frequency and the QoD scaling drive the tardiness of transactions in different directions. One widely used metric to quantify the model accuracy is $R^2$, where $R^2 = 1 - \text{variance}(\text{experimental value} - \text{predicted value}) / \text{variance}(\text{experimental value})$. The $R^2$ values of our model are 0.908 and 0.823 for transaction tardiness and drr, respectively. In general, a model with $R^2 \geq 0.8$ is considered valid [13].
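To make the role of the identified parameters concrete, the following Python sketch evaluates a one-step prediction of the model and the accuracy metric defined above. It assumes the usual state-space form y(k+1) = A y(k) + B u(k), with outputs ordered as (tardiness, drr), inputs ordered as (frequency, QoD), and the printed parameter values read row by row. The helper names are hypothetical.

```python
# Identified model parameters, read row by row from the text (assumed layout).
A = [[0.8504, -0.0066], [-0.1449, 0.3882]]
B = [[-0.1983, 0.1485], [0.2448, 0.7762]]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def predict(y, u):
    """One-step prediction y(k+1) = A y(k) + B u(k)."""
    return [p + q for p, q in zip(mat_vec(A, y), mat_vec(B, u))]


def variance(xs):
    """Population variance of a sequence."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def r_squared(measured, predicted):
    """R^2 = 1 - variance(measured - predicted) / variance(measured)."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return 1.0 - variance(residuals) / variance(measured)
```

For instance, `predict([0.0, 0.0], [1.0, 0.0])` returns the first column of B: a unit increase of the frequency input lowers the predicted tardiness, while `predict([0.0, 0.0], [0.0, 1.0])` shows that a unit increase of QoD raises it.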

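Controller Design. The closed-loop model is constructed as follows:

$$\begin{bmatrix} e(k+1) \\ e_I(k+1) \end{bmatrix} = \begin{bmatrix} A + BK_P & BK_I \\ I & I \end{bmatrix} \begin{bmatrix} e(k) \\ e_I(k) \end{bmatrix} + \begin{bmatrix} I - A \\ 0 \end{bmatrix} r \tag{6}$$

where $r = [1 \;\; \text{drr}_{\text{target}}]^{\top}$. In this model, the control error vector $e(k)$ and the accumulated control error vector $e_I(k)$ are used as the state vector. For robustness against disturbance and for simplicity, we choose to apply a proportional integral (PI) control function, given by

$$u(k) = -\begin{bmatrix} K_P & K_I \end{bmatrix} \begin{bmatrix} e(k) \\ e_I(k) \end{bmatrix} \tag{7}$$

where $K_P$ and $K_I$, respectively, are the proportional and integral controller gains. $K_P$ and $K_I$ are 2 × 2 matrices. At each sampling instant $k$, the performance monitor calculates the control error $e(k) = r - y(k)$, where $y(k)$ is the vector of measured tardiness and drr, and the accumulated control error $e_I(k+1) = e_I(k) + e(k)$.
Using $e(k)$ and $e_I(k)$, the control law in (7) computes the control input $u(k)$.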
The properties of the closed-loop system, such as the settling time, the overshoot, and the stability, are determined by the control gains $K_P$ and $K_I$. We obtained the control gains using the linear quadratic regulator (LQR) technique, which minimizes the cost function $J$:

$$J = \sum_{k=0}^{\infty} \left( \begin{bmatrix} e(k) \\ e_I(k) \end{bmatrix}^{\top} Q \begin{bmatrix} e(k) \\ e_I(k) \end{bmatrix} + u(k)^{\top} R\, u(k) \right)$$

where the weighting matrices $Q$ and $R$ quantify the cost of control error and the cost of control effort, respectively. Since minimizing the transaction tardiness is the primary goal of the QoS management, we put a higher weight on the tardiness control error $e_{\text{tard}}$ than on the data response ratio error $e_{\text{drr}}$ by choosing $Q = \text{diag}(1, 1/10, 1, 1/10)$. The first and the second elements of $Q$ quantify the cost of control errors $e_{\text{tard}}$ and $e_{\text{drr}}$, respectively. Once the weighting matrices $Q$ and $R$ are determined, the MATLAB command dlqr can be used to get the controller gains. The controller gains obtained through dlqr are

$$K_P = \begin{bmatrix} 0.396 & -0.062 \\ -0.107 & -0.067 \end{bmatrix}, \quad K_I = \begin{bmatrix} 0.058 & -0.035 \\ -0.025 & -0.041 \end{bmatrix}.$$

We can analytically prove the stability of the closed-loop system in (6) by showing that the poles of the closed-loop system are all within the unit circle [13]. In (6), the poles are the eigenvalues of the closed-loop state matrix

$$\begin{bmatrix} A + BK_P & BK_I \\ I & I \end{bmatrix}.$$

Computing these eigenvalues, we get the poles of the closed-loop system, which are 0.38, 0.87 ± 0.02i, and 0.93.
These poles are all within the unit circle, and this proves that the designed closed-loop system is analytically stable. However, an actual system might manifest different behavior, and hence we need to verify the stability of the system empirically too. In Section 5, we verify the empirical stability of the proposed system through actual evaluation.
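As a numerical sanity check of this stability argument (not a substitute for the pole analysis), the following Python sketch simulates the identified model under the PI controller with the reported gains. Several points are assumptions of the sketch rather than statements from the text: the error is defined as e(k) = r − y(k), the control law uses the sign convention u(k) = −(K_P e(k) + K_I e_I(k)), the matrices are read row by row, and the model is treated as exact with zero initial output.

```python
A = [[0.8504, -0.0066], [-0.1449, 0.3882]]
B = [[-0.1983, 0.1485], [0.2448, 0.7762]]
KP = [[0.396, -0.062], [-0.107, -0.067]]
KI = [[0.058, -0.035], [-0.025, -0.041]]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate(r, steps):
    """Run the closed loop and return the control error e(k) at every sampling instant."""
    y = [0.0, 0.0]       # model output: [tardiness, drr]
    e_acc = [0.0, 0.0]   # accumulated error e_I(k)
    history = []
    for _ in range(steps):
        e = [ri - yi for ri, yi in zip(r, y)]
        # Assumed PI law: u(k) = -(KP e(k) + KI e_I(k))
        u = [-(p + i) for p, i in zip(mat_vec(KP, e), mat_vec(KI, e_acc))]
        # Plant model: y(k+1) = A y(k) + B u(k)
        y = [a + b for a, b in zip(mat_vec(A, y), mat_vec(B, u))]
        # Integrator: e_I(k+1) = e_I(k) + e(k)
        e_acc = [ai + ei for ai, ei in zip(e_acc, e)]
        history.append(e)
    return history


errors = simulate(r=[1.0, 0.28], steps=400)
```

Under these assumptions, both error components decay towards zero, which is the behavior expected from a closed-loop system whose poles all lie within the unit circle.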

Implementation.
The proposed QoS management approach and baselines are implemented by extending QeDB [2]. QeDB internally exploits Berkeley DB as a transaction handler. Berkeley DB [14] provides low-level database features, such as storage management, multithreading for concurrent data processing, locking, and recovery. However, the original Berkeley DB does not support QoS, such as task tardiness and freshness of temporal data. QeDB extends Berkeley DB with the QoS management architecture shown in Figure 1. Originally, QeDB only supported QoD scaling through admission control. This work integrates QoD scaling with hardware-supported DVFS. In each real-time task, every access to data is performed by invoking Berkeley DB's put and get methods. These data access methods are instrumented to monitor the response time and the data response ratio (drr).
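The instrumentation idea can be sketched in Python as follows. This is only an illustration: a dictionary stands in for the underlying key/value store, and the names `InstrumentedStore` and `run_transaction` are hypothetical and do not correspond to Berkeley DB or QeDB interfaces. The time spent inside get and put is accumulated as the data response time, and the remainder of the transaction's response time is attributed to computation.

```python
import time


class InstrumentedStore:
    """Dict-backed stand-in for a key/value store that times every get and put."""

    def __init__(self):
        self._data = {}
        self.t_data = 0.0  # accumulated time spent in data accesses, in seconds

    def get(self, key):
        start = time.perf_counter()
        try:
            return self._data.get(key)
        finally:
            self.t_data += time.perf_counter() - start

    def put(self, key, value):
        start = time.perf_counter()
        try:
            self._data[key] = value
        finally:
            self.t_data += time.perf_counter() - start


def run_transaction(store, body):
    """Run body(store) once and return (t_data, t_comp, drr) for this invocation."""
    store.t_data = 0.0
    start = time.perf_counter()
    body(store)
    total = time.perf_counter() - start
    t_data = store.t_data
    t_comp = max(0.0, total - t_data)  # everything outside get/put counts as computation
    ratio = min(1.0, t_data / total) if total > 0 else 0.0
    return t_data, t_comp, ratio
```

A performance monitor could average these per-invocation values over each sampling period before passing them to the controller.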

Evaluation
In this section, we introduce the testbed used for the experiment and present the goals and results of the evaluation.

Evaluation Testbed and Settings.
The hardware platform for the testbed is the Odroid-XU3 evaluation board [15]. The specification of the board is shown in Table 1. The Exynos 5422 SoC of the Odroid-XU3 has 4 Cortex-A15 cores and 4 Cortex-A7 cores. During the evaluation, the 4 Cortex-A7 cores are turned off to exclude the effect of heterogeneous cores. The Exynos 5422 has 19 DVFS voltage/frequency steps. The power consumption of the system is measured in real time using Odroid Smart Power [15]. For performance evaluation, we simulate a search-and-rescue scenario adapted from [16] on our testbed. In the scenario, a mobile device, carried by a firefighter, collects streams of sensor readings from nearby sensors. The Odroid-XU3 device is used to simulate the firefighter's mobile device. Sensor streams from the building are simulated by a 3.0 GHz quad-core i7 Linux desktop. The sensor readings were obtained from a realistic simulation using the CFAST (Consolidated Model of Fire and Smoke Transport) simulator [17]. In total, 1024 sensors are recorded using the simulation, and each sensor's reporting period follows the uniform distribution ranging from 1 to 10 seconds. During the evaluation, the desktop sends sensor streams from the trace to the mobile device. When a new sensor reading arrives at the device, an update transaction is invoked to store the sensor data. At the mobile device, one Cortex-A15 core is assigned for real-time transactions/tasks as shown in Figure 2. A real-time transaction is invoked periodically every 100 ms to simulate the real-time analysis of the building state, such as the direction of fire, possibility of explosion, and safe retreat paths. We set the real-time transaction's workload to have $t^{\text{norm}}_{\text{comp}} : t^{\text{norm}}_{\text{data}} = 5 : 5$. The deadline of the real-time transaction is set to 50 ms. The slack time is used for aperiodic jobs, such as updating the GUI and updating sensor data. The minimum QoD is set to 0.5, implying that at most 50% of incoming sensor updates can be dropped.

Table 3: Workload types of best-effort transactions.
Workload type | Description
None | No interfering best-effort transactions in the other CPU cores.
C10-D90 | Data-intensive workload ($t^{\text{norm}}_{\text{comp}} : t^{\text{norm}}_{\text{data}} = 1 : 9$).
The other 3 CPU cores are assigned for aperiodic best-effort transactions/tasks. These best-effort transactions are supposed to generate various intercore interferences according to workload types. Table 3 shows the workload types of best-effort transactions with different ratios between $t^{\text{norm}}_{\text{comp}}$ and $t^{\text{norm}}_{\text{data}}$. C90-D10 is the most computation-intensive, and, conversely, C10-D90 is the most data-intensive. Each best-effort transaction's $t^{\text{norm}}_{\text{data}}$ and $t^{\text{norm}}_{\text{comp}}$ are adjusted by changing the number of data object accesses and the loop counts of a dummy computation loop. However, all transactions are configured to have almost equal nominal response time $t^{\text{norm}}$, which is $t^{\text{norm}}_{\text{data}} + t^{\text{norm}}_{\text{comp}}$. At each core, a best-effort transaction is invoked continuously, and its consecutive invocations are separated by a uniformly distributed time interval between 50 ms and 150 ms.
The real-time transactions and the best-effort transactions are assigned to their respective CPU cores using the processor affinity feature of Linux. We do not assign particular CPU cores to update transactions. Hence, update transactions can be assigned to any CPU core according to the underlying operating system's scheduling policy.

Evaluation Goals and Baselines.
The objectives of the performance evaluation are (1) to verify that the proposed approach can support the QoS specification under various conditions and (2) to test the effectiveness of the proposed QoS management approach.
For the first objective, we investigate the behavior of the proposed system under various conditions, where a set of parameters is varied. We vary the following parameters: (1) the workload characteristics of interfering tasks and (2) the number of interfering CPU cores. For the second objective, we compare the proposed QoS management approach with several state-of-the-art baseline approaches. For performance evaluation, we consider the 4 approaches shown in Table 2. Open is Berkeley DB without QoS support. In Open, the operating system's DVFS governor is set to OnDemand, in which the CPU frequency is adjusted to maintain the CPU utilization between 20% and 90%. Thus, Open represents the state-of-the-art embedded databases with nominal power management support from underlying operating systems. DVFSonly and QoDxxx represent QeDB supporting transaction tardiness using a single input/single output (SISO) controller. In DVFSonly and QoDxxx, the tardiness of real-time transactions is controlled only through DVFS and QoD scaling, respectively. Since QoD scaling does not adjust CPU frequency dynamically, QoDxxx's CPU frequency is fixed at xxx MHz. Finally, MIMO is the proposed QoS management approach that supports the transaction tardiness using the MIMO controller integrating DVFS and QoD scaling.

Average Performance.
In this experiment, the average performance of the proposed approach is investigated under various conditions.

Data-Intensive versus Computation-Intensive Workloads.
In this experiment, we test the performance of each approach when different workloads, shown in Table 3, are applied to interfere with the real-time transactions in one CPU core.
Figure 7 shows the results. As shown in Figure 7(a), both DVFSonly and MIMO closely support the target tardiness of real-time transactions in all interfering workload types. In contrast, Open and QoDxxx do not satisfy the tardiness goal in most interfering workload types. QoD scaling approaches cannot achieve the tardiness goal since, as shown in Figure 7(b), their QoD is saturated at either the minimum, which is 0.5, or the maximum, which is 1. This result demonstrates the limitation of scaling QoD. For DVFSonly and MIMO, the target tardiness is satisfied at the cost of increased CPU frequency as shown in Figure 7(c). In particular, the CPU frequency of DVFSonly increases rapidly as more data-intensive workloads are applied. This shows that intercore interferences for accessing data have significant impact on the tardiness of real-time transactions. In contrast, MIMO's CPU frequency increases slowly as the workload becomes more data-intensive. This is because MIMO exploits not just DVFS but also QoD scaling. Figure 7(b) shows that more QoD degradation occurs in MIMO as the workloads become more data-intensive. It should be noted that MIMO's QoD is saturated at the minimum, which is 0.5, when the C10-D90 workload is applied. However, unlike QoDxxx, MIMO achieves the tardiness goal since it can exploit DVFS as another control knob.
Figure 7(d) shows the average power consumption of different approaches. When no interfering workload is applied, the power consumption of all approaches, except QoD1800, is not much different. However, as more data-intensive workloads are applied, the power consumption of Open and DVFSonly increases rapidly. For example, Open consumes about 2.7 times more power when the C10-D90 workload is applied. This shows that intercore interferences result in significant power consumption. Unlike that of Open and DVFSonly, however, MIMO's power consumption increases slowly. This is because MIMO can maintain relatively lower CPU frequency by reducing intercore interferences using QoD scaling.

Varying Number of Interfering CPU Cores.
In this experiment, we change the number of interfering CPU cores while real-time transactions are executed in one CPU core.
Figure 8 shows the result when the computation-intensive workload C90-D10 is applied. In Figure 8, increasing the number of interfering CPU cores has little impact on the performance of real-time transactions. For instance, each approach, except Open, shows very similar tardiness, QoD, and CPU frequency regardless of the number of interfering CPU cores. Further, in Figure 8(d), the power consumption increases gradually, in proportion to the number of interfering CPU cores. For instance, in Figures 8(b) and 8(c), when 3 CPU cores are used to interfere with real-time transactions, MIMO maintains the maximum QoD while the CPU frequency is increased by no more than 5%. These results demonstrate that when workloads are computation-intensive, the chances of intercore interferences are low, and the power consumption is proportional to the number of active CPU cores.
Figure 9 shows the results when C10-D90, which is data-intensive, is applied. The result shows that increasing the number of interfering CPU cores has significant impact on the performance when the workload is data-intensive. For instance, DVFSonly requires about 72% higher CPU frequency to achieve the tardiness goal when 3 CPU cores are used to interfere with real-time transactions. Further, in Figure 9(d), the power consumption increases exponentially for Open and DVFSonly. In contrast, MIMO requires less than a 20% increase of CPU frequency, at the cost of degrading QoD to the minimum, to achieve the tardiness goal. By combining DVFS and QoD scaling, MIMO incurs gradual power increases. This is because MIMO reduces the intercore interferences by decreasing QoD as shown in Figure 9(b).

Transient Performance.
For real-time applications, average performance is not enough to describe their dynamic behavior. Transient performance metrics such as settling time and overshoot should also be small enough. In this experiment, we introduce sudden intercore interferences in order to observe the transient behavior of the tested approaches. Initially, real-time transactions run in one CPU core without interferences from the other CPU cores. At the 150th sampling instant, a disturbance is introduced by executing best-effort transactions in the other 3 CPU cores. The disturbance persists until the 400th sampling period. The workload type of the best-effort transactions is C50-D50.
Figure 10 shows the transient behavior of the tested approaches. All approaches, except Open in Figure 10(a), support the desired tardiness using the QoS management architecture of QeDB. These approaches react to the disturbance within 3 sampling periods to achieve the target transaction tardiness. Their overshoots, which are the maximum deviations from the QoS goal, are less than 20%.
In Figure 10(b), DVFSonly supports the desired tardiness by increasing CPU frequency by 37%. In Figure 10(c), QoD1000 does not achieve the target tardiness initially because its QoD is saturated at the maximum. However, while the disturbance is injected, it achieves the target tardiness by lowering QoD. This shows that QoD saturation severely limits the applicability of the QoD scaling technique. Neither DVFSonly nor QoD1000 controls drr, and hence drr increases significantly while the disturbance is injected. For instance, DVFSonly's drr increases from 0.28 to 0.44 during the disturbance period. This high drr implies that the real-time transactions' data accesses are delayed due to intercore interferences.
In MIMO, we can control drr by setting drr_target properly. In Figures 10(d) and 10(e), drr_target is set to 0.28 and 0.40, respectively, and MIMO behaves differently depending on this setting. When drr_target is 0.28, which is drr_norm, MIMO's controller maintains drr_target by significantly lowering QoD against the disturbance, while the increase of CPU frequency is less than 10%. This means MIMO exploits QoD scaling more aggressively since the transactions are tardy due to intercore interferences. If a user wants to maintain high QoD, MIMO can be configured to resemble DVFSonly by setting drr_target high. In Figure 10(e), MIMO's drr_target is set to 0.40 and its reaction to the disturbance is similar to DVFSonly's: MIMO maintains QoD as high as 0.96 against the disturbance, and the target tardiness is mostly achieved by increasing CPU frequency, which rises by about 34%.
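The two transient metrics used above can be computed from a sampled tardiness trace. The following is a minimal sketch, not the evaluation code used in this work: the function names, the 5% settling band, and the sample trace are assumptions introduced for illustration. Overshoot follows the definition in the text (maximum deviation from the QoS goal), and settling time is counted in sampling periods from the instant the disturbance is injected.

```python
from typing import Optional, Sequence


def overshoot(trace: Sequence[float], target: float, start: int = 0) -> float:
    """Maximum deviation from the target after `start`, as a fraction of the target."""
    return max(abs(v - target) for v in trace[start:]) / target


def settling_time(
    trace: Sequence[float], target: float, start: int, band: float = 0.05
) -> Optional[int]:
    """Sampling periods after `start` until the trace stays within +/- band of the target.

    Returns None if the trace never settles before it ends.
    The 5% default band is an assumption, not a value from the paper.
    """
    tolerance = band * target
    for k in range(start, len(trace)):
        # Settled at k only if every later sample also remains inside the band.
        if all(abs(v - target) <= tolerance for v in trace[k:]):
            return k - start
    return None


# Hypothetical trace: steady state, a disturbance at index 5, then recovery.
trace = [1.0] * 5 + [1.18, 1.10, 1.02] + [1.0] * 5
peak = overshoot(trace, target=1.0, start=5)
periods = settling_time(trace, target=1.0, start=5)
```

With this made-up trace, the overshoot is 18% of the target and the trace settles two sampling periods after the disturbance, which is the kind of reading taken from Figure 10.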

Related Works
Prior research demonstrates that, in multicore environments, the contention for shared resources might cause performance anomalies [7][8][9]. In particular, existing databases show poor performance in multicore systems due to the interference between cores when accessing data. Hence, developing databases for multicore machines has drawn intense research effort [3][4][5][6]. Papadopoulos et al. proposed to exploit helper cores to efficiently prefetch data needed by working threads [4]. Johnson et al. removed locking contention from existing storage managers [5]. Salomie et al. proposed to partition the multicore machine and use existing databases in a replicated configuration, as if the multicore machine were a distributed system [3]. These works target high-performance server environments and their primary goal is to achieve high throughput. Further, they change the implementation of a specific DBMS to better exploit multiple cores. Unlike these works, we focus on supporting predictable data access response time in multicore embedded systems, and our approach is not tailored to a specific DBMS implementation.
QoD scaling via active load shedding [18] has been applied to real-time databases (RTDBs) [19,20] and data stream management systems (DSMSs) [21,22] for performance management at runtime. A common approach to load shedding is to drop incoming data updates under overload. For instance, Amirijoo et al. exploited imprecise computation on data to allow data objects to deviate from their true values to a certain degree [20]. However, the applicability of load shedding is highly application-dependent and its range is limited by applications' requirements. Hence, for many applications, QoD scaling via load shedding is hard to use as a primary control knob to support the desired performance. In our work, we use QoD scaling together with DVFS to reduce potential intercore interferences. These two control knobs complement each other. A large body of previous work uses DVFS to save processor power while still supporting the timeliness of tasks [10,23,24,25]. Yao et al. first gave a theoretical exploration of DVFS for real-time tasks, considering a set of aperiodic tasks [26]. For non-real-time systems without specific deadlines, performance metrics such as CPU utilization have been used [27,28]. These approaches exploit a simple feedback mechanism based on the chosen performance metric to control processor frequency dynamically. In this work, we showed that the effectiveness of DVFS is diminished when tasks contend for access to non-CPU resources in multicore systems. To address this problem, we integrate DVFS with QoD scaling.
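As a rough illustration of the utilization-driven feedback used for non-real-time systems, the sketch below picks the lowest frequency level expected to bring CPU utilization back to a set point. It is a generic example and not the algorithm of any cited work: the function name, the 0.7 utilization set point, and the frequency levels are all hypothetical.

```python
from typing import Sequence

# Hypothetical frequency levels in MHz; real platforms expose their own tables.
FREQ_LEVELS_MHZ = (600, 800, 1000, 1200, 1500, 1800)


def next_frequency(
    current_mhz: float,
    utilization: float,
    target_utilization: float = 0.7,
    levels: Sequence[int] = FREQ_LEVELS_MHZ,
) -> int:
    """Choose the lowest level that should bring utilization to the set point.

    Assumes the amount of CPU work is fixed, so utilization scales
    inversely with frequency. This ignores time spent stalled on memory
    or I/O, which is exactly the case where DVFS alone loses effectiveness.
    """
    desired_mhz = current_mhz * utilization / target_utilization
    for level in sorted(levels):
        if level >= desired_mhz:
            return level
    return max(levels)  # Saturate at the highest available frequency.


raised = next_frequency(1000, utilization=0.9)    # overloaded: step up
lowered = next_frequency(1500, utilization=0.3)   # mostly idle: step down
capped = next_frequency(1800, utilization=1.0)    # already at the maximum
```

Because the rule only observes CPU utilization, it has no way to tell that a task is tardy because of contention at memory or I/O, which is the limitation that motivates pairing DVFS with QoD scaling.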
Because of its robustness against unpredictable workloads, feedback control theory has been extensively applied to the QoS management of various computing systems, including web servers [29], caching services [30], and email servers [31]. Feedback control theory has also been used to support the timeliness of real-time transactions in real-time data services [2,19,32]. However, these works do not consider modern multicore environments. In this work, we proposed a novel feedback control mechanism that supports the target transaction tardiness while reducing potential intercore interferences in multicore embedded systems.

Conclusions
In this paper, we proposed a QoS management architecture for data-intensive real-time applications running on multicore-based embedded platforms. A novel multidimensional feedback control architecture is proposed to support the timeliness of transactions while reducing the effect of potential intercore interferences. Through the proposed control architecture, two distinct control knobs, DVFS and QoD scaling, are controlled simultaneously to support the QoS goals in an efficient and robust manner. We showed the feasibility of the proposed QoS management scheme by implementing and evaluating it on a modern multicore mobile platform. Our evaluation results show that our approach achieves the target QoS goals, such as task tardiness and data quality, while consuming less energy than the baseline approaches.
Figure 4: Task response time with varying QoD while the best-effort transactions run at 3 CPU cores. (Legend: response time R_comp, data response time R_data.)

Figure 7: Average performance with varying interfering workload patterns.
Algorithm 1: An example of data-intensive real-time task.