PAS: Performance-Aware Job Scheduling for Big Data Processing Systems

Big data analytics has become increasingly vital in many modern enterprise applications such as user profiling and business process optimization. Today’s big data processing systems, such as Hadoop MapReduce, Spark, and Hive, treat big data applications as a batch of jobs for scheduling. Existing schedulers in production systems often maintain fair allocation without considering application performance and resource utilization simultaneously. It is challenging to perform job scheduling in big data systems to achieve both low turnaround time and high resource utilization due to the high complexity of data processing logic and the dynamic variation in workloads. In this article, we propose a performance-aware scheduler, referred to as PAS, which dynamically schedules big data jobs in Hadoop YARN and Spark and autonomously adjusts scheduling policies to improve application performance and resource utilization. Specifically, PAS schedules multiple concurrent jobs using different policies based on the predicted job completion time and employs a greedy approach and a one-step lookahead strategy to opportunistically maximize the average job performance while still maintaining a satisfactory level of resource utilization. We implement PAS in Hadoop YARN and evaluate its performance with HiBench, a well-known big data processing benchmark. Experimental results show that PAS reduces the average turnaround time by 25% and the makespan by 15% in comparison with four state-of-the-art schedulers.


Introduction
With the prosperity of big data analytics and artificial intelligence, big data processing systems (BDPSs) are playing critical roles in modern enterprise applications. The job scheduler continues to be a key component of a BDPS, in which diverse coexisting jobs from many users and applications contend for resources in a shared environment. As the data volume increases and the demand for analytics jobs surges, typical production BDPSs are frequently resource-constrained. Therefore, efficient resource management is the top priority for cluster schedulers [1]. Recently, due to the prevalence of "data analysis as a service" (DAAS), BDPSs running on the public cloud provide data analysis capabilities to different users.
To balance the interests of both users and service providers, performance (from the user side) and resource efficiency (from the provider side) need to be considered simultaneously when designing systems for such shared environments [2]. User-side performance is often measured by the average turnaround time, which is the average of the time intervals from each job's submission to its completion. Provider-side resource efficiency is usually measured by the resource utilization ratio and the makespan of a set of jobs. These objectives conflict with each other because of the conflict of interest between users and service providers. Current production schedulers often settle for isolation guarantees as the primary objective and seek to maintain fair allocations at all times, which is neither necessary nor efficient [1].
Motivated by this intuition, in this article, we propose a performance-aware scheduling algorithm to opportunistically improve the average job performance for fast job completion for users while still achieving good enough resource efficiency for service providers. To this end, we develop a performance prediction method for estimating the completion time of a data analytical job under different resource utilization conditions, formulate a job scheduling problem for big data processing systems (JS-BDPS) in order to minimize the average turnaround time under a user-specified resource utilization constraint, and propose a performance-aware scheduling solution. Specifically, our work makes the following contributions to the field: (1) The method for performance prediction is based on generic computer and program models, which provide an accurate estimation of completion time for big data analytical jobs and make it directly applicable to different jobs running on various big data processing systems. (2) We formulate the JS-BDPS problem and propose a performance-aware scheduling approach that employs a greedy strategy and a one-step lookahead strategy to solve the JS-BDPS problem effectively and efficiently. (3) The performance prediction method is validated and justified by experimental results using a well-known big data benchmark on disparate computing nodes, and the performance superiority of the proposed PAS scheduling approach is illustrated by extensive simulation results in comparison with four state-of-the-art algorithms.
The rest of the article is organized as follows. We discuss the related work in Section 2. Section 3 formulates the JS-BDPS problem and discusses the objectives and constraints. In Section 5, we construct the regression model of a job for performance estimation and propose the performance-aware scheduling (PAS) policies to minimize the average turnaround time for a set of scheduling jobs. Section 6 presents the design and implementation of the PAS algorithm on top of Hadoop YARN and Spark. In Section 7, we describe the experimental setup and evaluate the performance model and scheduling algorithm using a well-known big data benchmark. We conclude with a discussion of our approach and a sketch of future work in Section 8.

Related Work
Job scheduling for BDPSs has received much attention from both industry and academia. From the perspective of scheduling goal, previous studies can be classified into two categories: performance-oriented and fairness-oriented approaches, as discussed next.

Performance-Oriented Scheduling.
Maximizing resource utilization and minimizing makespan are two common goals for performance-oriented scheduling [2]. Yao et al. [3] proposed HaSTE to improve the resource utilization by using efficient task packing according to diverse resource requirements and dependencies between tasks. Their later work proposed OpERA [4] to leverage the knowledge of actual runtime resource utilizations as well as future resource availability for task assignments. Quasar [5] used classification techniques to find appropriate resource allocations to applications in order to fulfill their QoS requirements and maximize system resource utilization. Polo et al. [6] dynamically adjusted slots on each machine to maximize the cluster utilization. Verma et al. [7] allocated resource to jobs by using job profiles to estimate the requested resource that meets the deadline. Cheng et al. [8] proposed a deep reinforcement learning (DRL)-based job scheduler that dispatches the jobs in real time to deal with real-time workloads. Fan et al. [9] proposed an intelligent scheduling framework for different hardware resources and increasingly diverse workloads in modern job scheduling. Zheng et al. [10] designed an online algorithm for SaaS providers to optimally purchase IaaS instances and schedule pleasingly parallel jobs.
To minimize the makespan of a set of independent MapReduce jobs, Verma et al. [11] introduced a heuristic scheduling algorithm. Huang et al. [12] tried to optimize the makespan by estimating the completion time of jobs that are prone to error. Hou et al. [13] proposed a deadline-aware scheduling algorithm to reduce the average job execution time by checking the percentage of tasks and allocating resources. Wang et al. [14] proposed a workflow-based scheduling algorithm to satisfy the budget and the deadline constraints. Khan et al. [15] applied a linear regression method to perform runtime estimation for Hadoop jobs. Lim et al. [16] proposed CP-Scheduler to estimate task execution time and handle MapReduce jobs with deadlines. Shao et al. [17] proposed an energy-aware greedy algorithm (EAGA) for fine-grained task placement to minimize the energy consumption and job execution time. Chen et al. [18] observed that with demand elasticity, a job requires a significantly smaller amount of resources, only at the cost of a moderate performance penalty. Amer et al. [19] tackled the multi-objective scheduling problem and presented a modified Harris hawks optimizer (HHO) for it. Khan et al. [20] proposed a task scheduling method based on a hybrid optimization algorithm, which effectively schedules jobs with the least amount of waiting time. Meyer et al. [21] proposed a machine learning-driven classification scheme for dynamic interference-aware resource scheduling in cloud computing environments. They presented a classification approach to better represent the workload variations for resource scheduling. Chen et al. [22] considered the heterogeneous characteristics of data centers and modeled energy consumption based on the frequency and kernel number of the virtual machine CPU.

Fairness-Oriented Scheduling.
Fairness is another important factor for a scheduling framework. Matei et al. [23] proposed a delay scheduling policy to improve the performance of Fair Scheduler by increasing the data locality of Hadoop. Dominant resource fairness (DRF) [24] is the first work to generalize max-min fairness to multiple resource types on Hadoop YARN. Wang et al. [25] extended the DRF algorithm for the heterogeneous environment. Many production schedulers, such as Hadoop's Fair Scheduler [26], Quincy [27], Mesos [28], and Choosy [29], support max-min fairness or its extensions. Liu et al. [30] presented a resource allocation mechanism to enable fair sharing of multiple types of resource among multiple tenants. Huang et al. [31] calculated the approximated total workload according to the job's runtime distribution and performed resource allocation accordingly in order to maximize the client-specified utilities regarding max-min fairness. Wang et al. [32] corrected the monopolizing behavior of long reduce tasks from large jobs and dynamically balanced the execution of different jobs for fair and fast completion.
Some other studies consider the trade-off between performance and fairness simultaneously. In [33], a general meta-scheduler to leverage existing schedulers in Hadoop YARN to implement the efficiency-fairness trade-off was proposed. Wang et al. [34] utilized many metrics to efficiently balance the performance and the fairness, as well as to reduce the makespan of MapReduce tasks. Tang et al. [35] presented DynamicMR, a dynamic Hadoop slot allocation (DHSA) framework aiming to improve the performance of MapReduce workloads while maintaining the fairness. Pastorelli et al. [36] presented HFSP, a size-based scheduler with aging to implement fairness and near-optimal system response times on Hadoop. Niu et al. [37] presented an adaptive scheduler called Gemini, which adaptively decides the proper scheduling policy according to the running workload, in order to achieve better performance as well as fairness.

Problem Statement
This article focuses on the job scheduling problem for big data processing systems (the JS-BDPS problem for short). Generally, the architecture of scheduling frameworks of BDPSs can be centralized, centralized two-level, distributed two-level, or shared-state [2].
This article focuses on the centralized two-level (CTL) architecture because it is easy to implement and can generate an optimal or near-optimal scheduling plan. Many practical scheduling frameworks, such as Yarn [38], Mesos [28], Fuxi [39], Teris [40], and Corral [41], have adopted the CTL architecture. In a typical scheduling framework with the CTL architecture, the scheduler is responsible for allocating resources to the various running jobs subject to constraints of capacities, queues, etc. The goal of the JS-BDPS problem is to find an optimal scheduling policy that minimizes the average turnaround time (ATAT) of the submitted jobs within a predefined time period, given a set of jobs and the underlying runtime environment. We choose the turnaround time metric because it tells how long a job takes to finish execution after it arrives at the scheduler, which characterizes one of the most important capabilities of a scheduler. Other measurements such as energy, cost, makespan, and load balancing are reflected by the turnaround time in some ways [2].
Notably, the JS-BDPS problem has the following components:

Job.
A job represents a big data processing application running on a specific BDPS. We model it as a 4-tuple $j = \langle f, r, t_w, t_e \rangle$, where $f$ denotes the processing logic of $j$; $r$ represents the required resource for running $j$; $t_w$ records the waiting time, that is, the accumulated queuing time for scheduling, of $j$; and $t_e$ indicates the execution time of $j$. We assume that all jobs have the same priority. Once submitted, jobs need to be queued for a while until they are scheduled. Thereafter, the scheduler assigns the necessary resource $r$ to $j$, and $j$ is continuously executed in the BDPS until completion.

Resource.
A BDPS cluster often contains different types of resource, such as CPU, memory, and network bandwidth, that a job needs for execution, and a job usually needs to declare the resource it requires to start its execution. For example, a MapReduce job running on a YARN scheduler has to declare the number of CPU cores and the memory size at submission. Without loss of generality, we model the resource requirement of a job $j$ as a 2-tuple $r = \langle c, m \rangle$, where $c$ is the number of CPU cores and $m$ denotes the memory size, and $R = \langle C, M \rangle$ is used to indicate the total resource held by a BDPS cluster.

Turnaround Time.
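Turnaround time (TAT) is an important metric in evaluating scheduling algorithms from the users' perspective [2]. For a specific job $j_i$, its turnaround time $\mathrm{TAT}_i$ is the time interval from the submission of $j_i$ to its completion, that is, the sum of its waiting time and its execution time:

$$\mathrm{TAT}_i = t_w^i + t_e^i,$$

where $t_w^i$ and $t_e^i$ are the waiting time and the execution time of $j_i$, respectively.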

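Resource Utilization Ratio.
The resource utilization ratio (RUR) is an important performance metric that is used to measure the efficiency of a BDPS cluster from the service providers' perspective [33]. Given any time point $t$, the $\mathrm{RUR}_t$ of a BDPS cluster is defined as a 2-tuple:

$$\mathrm{RUR}_t = \frac{R_{\text{used}}}{R} = \left\langle \frac{C_{\text{used}}}{C}, \frac{M_{\text{used}}}{M} \right\rangle,$$

where $R_{\text{used}} = \langle C_{\text{used}}, M_{\text{used}} \rangle$ is the resource of the cluster already in use at $t$ and $R = \langle C, M \rangle$ is the total resource of the cluster.

In summary, given a set of $n$ jobs $J = \{j_1, j_2, \ldots, j_n\}$ submitted to the scheduler, the JS-BDPS problem can be stated as follows:

$$\min \; \mathrm{ATAT}(J) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{TAT}_i, \qquad (2)$$

where equation (2) states that the goal of the JS-BDPS problem is to find the optimal job scheduling sequence that minimizes the average turnaround time (ATAT) of $J$. The constraint on equation (2) is that, at any time point $t$, the RUR of any solution must stay within the user-specified resource utilization bounds.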
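The job model, the RUR, and the ATAT objective defined above can be made concrete with a minimal sketch. The class and function names (`Resource`, `Job`, `rur`, `atat`, `within_bounds`) and all numeric values are illustrative assumptions rather than part of PAS; the bounds check only assumes that each RUR component has a lower and an upper limit.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Resource:
    cpu: int     # c (or C): number of CPU cores
    mem: float   # m (or M): memory size, in GiB for this sketch


@dataclass
class Job:
    name: str
    demand: Resource      # r: resource required to run the job
    t_wait: float = 0.0   # t_w: accumulated queuing time
    t_exec: float = 0.0   # t_e: execution time

    @property
    def turnaround(self) -> float:
        # TAT = waiting time + execution time
        return self.t_wait + self.t_exec


def rur(used: Resource, total: Resource) -> Tuple[float, float]:
    """Resource utilization ratio as a (CPU, memory) 2-tuple."""
    return (used.cpu / total.cpu, used.mem / total.mem)


def atat(jobs: List[Job]) -> float:
    """Average turnaround time of a non-empty list of finished jobs."""
    return sum(j.turnaround for j in jobs) / len(jobs)


def within_bounds(ratio: Sequence[float],
                  lower: Sequence[float],
                  upper: Sequence[float]) -> bool:
    """Check that every RUR component lies inside its [lower, upper] range."""
    return all(lo <= x <= hi for x, lo, hi in zip(ratio, lower, upper))


# Hypothetical cluster and jobs, chosen only to exercise the definitions.
total = Resource(cpu=80, mem=1280.0)
used = Resource(cpu=40, mem=960.0)
finished = [
    Job("j1", Resource(2, 8.0), t_wait=10.0, t_exec=20.0),
    Job("j2", Resource(4, 16.0), t_wait=5.0, t_exec=15.0),
]
```

With these hypothetical values, `rur(used, total)` is the pair of CPU and memory ratios, and `atat(finished)` averages the two turnaround times.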

Performance-Aware Job Scheduling
The key idea of our performance-aware job scheduling (PAS) approach is to train a performance prediction model for each job according to its historical profiles and to repeatedly assign resource to the job(s) with the lowest cost gain using the trained model. PAS also applies a heuristic one-step lookahead strategy to find a potentially good scheduling policy.

Predicting Job Completion Time.
A BDPS can support many data analytical jobs running on it simultaneously. We observe that the completion time of a specific job $j$ varies under different resource utilization ratios (RUR) of the BDPS. Suppose the completion time of $j$ is $t$ when it is executed exclusively on a BDPS, that is, when $\mathrm{RUR}_0 = 0$. If the RUR increases to $\mathrm{RUR}_k$ ($0 < \mathrm{RUR}_k < 1$) because more jobs are added to the BDPS, then in each time slot $\Delta t$ only a share of $(1 - \mathrm{RUR}_k) \cdot \Delta t$ of the processing time is statistically available for executing $j$. Furthermore, when $\mathrm{RUR}_k$ continues to grow, the swapping and scheduling time among different jobs becomes significantly large [42]. Based on these observations and the job complexity estimation method proposed in [43], we model the relationship between the job completion time $t$ and the resource utilization ratio (RUR) as a power-law function:

$$t = a \cdot \mathrm{RUR}^{b} + c, \qquad (3)$$

where $a$, $b$, and $c$ are regression constants. The prediction model indicates that when the resource utilization ratio grows, the completion time of a job grows as a power of the RUR. This conclusion is well supported by our experiments.
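To illustrate how such a prediction model could be obtained from profiling data, the following sketch fits the three regression constants, assuming the power law takes the form $t = a \cdot \mathrm{RUR}^b + c$. The fitting procedure (scanning candidate exponents $b$ and solving for $a$ and $c$ by ordinary least squares) is our own simplification for illustration, not necessarily the regression procedure used in the experiments, and the sample data are synthetic.

```python
from typing import Sequence, Tuple


def fit_power_law(rur: Sequence[float],
                  t: Sequence[float],
                  exponents: Sequence[float]) -> Tuple[float, float, float]:
    """Fit t ~ a * rur**b + c.

    For every candidate exponent b, the model is linear in (a, c), so both
    are obtained in closed form; the b with the smallest squared error wins.
    """
    n = len(rur)
    mean_t = sum(t) / n
    best = None
    for b in exponents:
        x = [u ** b for u in rur]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate: all transformed inputs are identical
        sxt = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, t))
        a = sxt / sxx
        c = mean_t - a * mean_x
        sse = sum((a * xi + c - ti) ** 2 for xi, ti in zip(x, t))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no usable exponent; need at least two distinct RUR values")
    _, a, b, c = best
    return a, b, c


def predict(a: float, b: float, c: float, rur: float) -> float:
    """Predicted completion time at a given resource utilization ratio."""
    return a * rur ** b + c


# Synthetic profile: completion time measured at RUR = 0.0, 0.1, ..., 0.9.
sample_rur = [i / 10 for i in range(10)]
sample_t = [100.0 * u ** 3.0 + 50.0 for u in sample_rur]
candidate_b = [k / 10 for k in range(5, 51)]   # b in 0.5, 0.6, ..., 5.0

a, b, c = fit_power_law(sample_rur, sample_t, candidate_b)
```

In this form, $c$ plays the role of the completion time under exclusive execution (RUR = 0), and $a$ and $b$ describe how quickly the job slows down as the cluster fills up.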

Scheduling Gain and ATAT.
To schedule a set of jobs dynamically, PAS has to make decisions periodically. More specifically, at each time point $t$, it needs to decide whether to schedule a set of jobs $J$ from the queue (to decrease the $t_w$ and $t_e$ of all jobs in $J$) or to do nothing (to decrease the execution time of all running jobs). To elaborate on the scheduling process, we discretize the time into many small and equal slots of length $\Delta t$ and denote the $k$th time interval as $\tau_k$, where $t_0$ represents the start time of scheduling. Note that the discretized time slot $\Delta t$ is a hyperparameter of PAS that is configurable by the administrator of the BDPS, and it defines the scheduling frequency of PAS.
Based on the discretized time slot, we can define the scheduling gain to quantify the gain obtained by scheduling a set of jobs:

Definition 1. The scheduling gain at $\tau_k$, denoted by $G_k$, is defined as the decrease of the waiting time for scheduling a subset of submitted jobs ($J^k_s$) minus the increase of the execution time for the running jobs ($J^k_r$) caused by the newly scheduled and added jobs ($J^k_s$). In the formulation of $G_k$, $R^0_{\text{used}}$, $R^{k-1}_{\text{used}}$, and $R^k_{\text{used}}$ denote the used resource at $\tau_0$, $\tau_{k-1}$, and $\tau_k$, respectively.

Based on the definition of scheduling gain, we can state the relationship between $G_k$ and ATAT (Theorem 1).

Proof. By definition, the ATAT can be formulated in terms of the average value of the accumulated scheduling gain (AASG). Comparing equations (5) and (7) yields the relationship between AASG and ATAT.

□
We can see from Theorem 1 that maximizing AASG is equivalent to minimizing ATAT, which is the ultimate goal of our JS-BDPS problem, as defined in equation (2). In order to maximize AASG, we apply a greedy strategy in this article: at any time slot $\tau_k$ ($k \in \mathbb{N}$), we try to find the scheduling job set $J^k_s$ that maximizes $G_k$.

Small Job First Policy.
To minimize the ATAT while achieving a better resource efficiency, the PAS algorithm on the one hand tries to utilize as much resource as possible for running more jobs at any time slot $\tau_k$, and on the other hand needs to compare the $G_k$ values under different scheduling job sets and choose the scheduling sets with the best value of $G_k$. Given a fixed resource utilization ratio $\mathrm{RUR}_k$ at $\tau_k$ and a set of jobs $J^k$ that needs to be scheduled, the number of possible scheduling job sets is $2^{|J^k|}$ in the worst case. To reduce the search space of this process, we propose a small job first policy, defined as follows:

Theorem 2. Given any two valid scheduling job sets $J^k_{s_1}$ and

Proof. According to Definition 1, the scheduling gain $G^k_{s_1}$ over $J^k_{s_1}$ and the scheduling gain $G^k_{s_2}$ over $J^k_{s_2}$ are defined in the same way, which gives an expression for the difference $G^k_{s_1} - G^k_{s_2}$. Suppose that the performance prediction functions of all jobs share a similar shape. Based on equations (12)-(14), we can substitute the second subtraction part of equation (12) with equation (14) and draw the final conclusion that the scheduling gain $G^k_{s_1}$ over $J^k_{s_1}$ is greater than or equal to the scheduling gain $G^k_{s_2}$ over $J^k_{s_2}$.
□
Theorem 2 indicates that, given a fixed resource $r$ that can be used for scheduling at any time slot $\tau_k$, the scheduler should always schedule as many jobs as it can to minimize the ATAT. In other words, the small jobs with lower resource requirements should be scheduled first. We call this strategy the small job first (SmJF) policy.

One-Step Lookahead.
After using the SmJF policy, we can still obtain many scheduling job sets that have the same size and the same $G_k$ value. In this section, we propose a one-step lookahead (OSLA) policy that finds the scheduling job set(s) likely to achieve a smaller ATAT value in the future, which is defined as follows:

Definition 2. One-step lookahead (OSLA) policy: Suppose the further scheduling policy after $\tau_k$ is shortest job first (SJF). According to the assumption, the two candidate sets have the same size, $|J^k_{s_1}| = |J^k_{s_2}|$. Let the following scheduling sequence for $J^k_{s_1}$ be $S_1$ and that for $J^k_{s_2}$ be $S_2$, and let the average turnaround time under $S_1$ be $\mathrm{ATAT}^{S_1}_{s_1}$ and under $S_2$ be $\mathrm{ATAT}^{S_2}_{s_2}$. If $\mathrm{ATAT}^{S_1}_{s_1} \le \mathrm{ATAT}^{S_2}_{s_2}$, we should choose $J_{s_1}$ as our scheduling job set.

Definition 2 indicates that at any time slot $\tau_k$, if two candidate job sets $J_{s_1}$ and $J_{s_2}$ have the same size and the same $G_k$ value, our scheduler should look ahead for one step and apply a simple shortest job first (SJF) policy to $J_{s_1}$ and $J_{s_2}$ to generate two scheduling sequences $S_1$ and $S_2$, respectively. Thereafter, we simulate the scheduling of $J_{s_1}$ under $S_1$ and of $J_{s_2}$ under $S_2$ separately and estimate $\mathrm{ATAT}^{S_1}_{s_1}$ and $\mathrm{ATAT}^{S_2}_{s_2}$. Finally, we choose the job set with the smaller ATAT value. We use the simple OSLA policy here because the subsequent scheduling process is dynamic and too complex to predict; an approximate and greedy policy works well under this circumstance.

Algorithm Design and Implementation
Based on the small job first (SmJF) policy and the one-step lookahead (OSLA) policy, we can implement our PAS algorithm. The detailed process is specified in Algorithm 1.
PAS initially sets the available resource $R'$ to the upper bound $B$ (line 1) and then tries different resource utilization values in decreasing order (lines 3 and 5). At each iteration, it applies the SmJF algorithm to select job sets according to $R'$ (line 4). After trying all possible $R'$, we choose the candidate scheduling job set(s) with the best scheduling gain value (line 7) and apply the OSLA algorithm to find the final scheduling job set $J^k_s$ (line 8).

Small Job First Algorithm.
As Theorem 2 shows, given a fixed resource utilization value $R'$, we should greedily choose jobs with smaller resource demand first to decrease the waiting time for these jobs without severely increasing the running time of the already running jobs.
The SmJF algorithm is described in Algorithm 2.
The SmJF algorithm starts by sorting the jobs in $J^k$ according to their resource demands $r$ in increasing order to generate a bin list $B = \{B_1, B_2, \ldots, B_l\}$, in which each bin contains jobs with the same $r$ (line 3). At each iteration step $i$, SmJF first tries to add all jobs in the bin $B_i$ (line 5) and compares the updated resource utilization $R_c$ with the target value $R'$ (line 6). If $R_c$ is still smaller than or equal to $R'$, we can safely add all jobs in $B_i$ to the scheduling job set $J^k_s$ (line 7) and move to the next iteration. If $R_c$ is already greater than $R'$, we have to release the resource held by the jobs in $B_i$ first and recalculate the desirable number of jobs $n$ (line 9). We then enumerate all possible subsets of $B_i$ containing $n$ jobs (line 10). The algorithm ends after adding subsets of some $B_i$ in $B$ to $J^k_s$ (line 11), or after finally adding all jobs in $J^k$ to $J^k_s$ (line 13).
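The binning and enumeration steps described above can be sketched as follows. This is a simplified illustration under two assumptions of ours: the resource demand is a single positive integer (for example, CPU cores) instead of the 2-tuple $\langle c, m \rangle$, and job identifiers are plain strings. The function name `smjf` and its signature are hypothetical.

```python
from itertools import combinations, groupby
from typing import Dict, List, Tuple


def smjf(waiting: Dict[str, int], used: int, target: int) -> List[Tuple[str, ...]]:
    """Return all candidate scheduling job sets for one target utilization.

    waiting: job id -> resource demand (positive integer, single resource)
    used:    resource already in use (R_used)
    target:  target amount of used resource after scheduling (R')
    """
    chosen: List[str] = []
    current = used  # R_c: running total of used resource
    # Sort by demand (ties broken by id) so equal demands form one bin.
    ordered = sorted(waiting.items(), key=lambda kv: (kv[1], kv[0]))
    for demand, group in groupby(ordered, key=lambda kv: kv[1]):
        bin_jobs = [name for name, _ in group]
        if current + demand * len(bin_jobs) <= target:
            # The whole bin fits: take it and continue with the next bin.
            chosen.extend(bin_jobs)
            current += demand * len(bin_jobs)
            continue
        # The bin overflows: only n of its jobs fit under the target.
        n = (target - current) // demand
        if n <= 0:
            return [tuple(chosen)]
        # Every size-n subset of this bin yields one candidate set.
        return [tuple(chosen) + extra for extra in combinations(bin_jobs, n)]
    return [tuple(chosen)]  # all waiting jobs fit


# Hypothetical queue: two 1-core jobs, three 2-core jobs, one 4-core job.
queue = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2, "f": 4}
candidates = smjf(queue, used=2, target=7)
```

In this example the two 1-core jobs fit entirely, one of the three 2-core jobs fits, and the three resulting candidate sets have the same size, which is exactly the situation the one-step lookahead policy is meant to resolve.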

One-Step Lookahead Algorithm.
Given each candidate scheduling set $J_i$ in $J^k_c$, the OSLA algorithm first appends all jobs in $J_i$ to the scheduling sequence $S$ (lines 2-5). Thereafter, the other jobs in $J^k_c$ are appended to $S$ according to the shortest job first (SJF) policy (line 6). Once $S$ is complete, we call the simulation algorithm to estimate the average turnaround time $\mathrm{ATAT}_i$ of $S$ starting with $J_i$ (line 7). Finally, the candidate scheduling set $J_i$ with the smallest ATAT value is chosen and returned (line 9). The whole process is shown in Algorithm 3.
Given a possible job scheduling sequence $S$, the simulation algorithm estimates the ATAT of $S$ by greedily choosing jobs from the beginning of $S$, putting them into the running job set, and updating their waiting and execution times. The detailed working process is shown in Algorithm 4.
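The interplay between the lookahead step and the simulation step can be sketched as below. The sketch makes several simplifying assumptions that are ours: a single scalar resource, all jobs submitted at time 0, and a fixed execution time per job (the RUR-dependent slowdown of the prediction model is ignored). The names `simulate_atat` and `osla` are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

JobSpec = Tuple[int, float]  # (resource demand, execution time)


def simulate_atat(seq: Sequence[str], jobs: Dict[str, JobSpec], capacity: int) -> float:
    """Estimate the ATAT of a non-empty sequence by replaying it in order."""
    now, free, i = 0.0, capacity, 0
    running: List[Tuple[float, int]] = []  # min-heap of (finish time, demand)
    total_tat = 0.0
    while i < len(seq) or running:
        # Greedily start jobs from the head of the sequence while they fit.
        while i < len(seq) and jobs[seq[i]][0] <= free:
            demand, exec_time = jobs[seq[i]]
            free -= demand
            heapq.heappush(running, (now + exec_time, demand))
            total_tat += now + exec_time  # waiting time + execution time
            i += 1
        if not running:
            raise ValueError("a job demands more than the cluster capacity")
        # Advance time to the next completion and release its resource.
        now, demand = heapq.heappop(running)
        free += demand
    return total_tat / len(seq)


def osla(candidates: Sequence[Tuple[str, ...]],
         jobs: Dict[str, JobSpec],
         capacity: int) -> Tuple[Tuple[str, ...], float]:
    """Pick the candidate set whose SJF-completed sequence has the lowest ATAT."""
    best_set, best_atat = None, float("inf")
    for cand in candidates:
        # Remaining jobs follow in shortest-job-first order.
        rest = sorted((j for j in jobs if j not in cand),
                      key=lambda j: (jobs[j][1], j))
        value = simulate_atat(list(cand) + rest, jobs, capacity)
        if value < best_atat:
            best_set, best_atat = cand, value
    return best_set, best_atat


# Hypothetical jobs that each need the whole 2-core cluster.
demo_jobs = {"a": (2, 10.0), "b": (2, 4.0), "c": (2, 6.0)}
chosen_set, chosen_atat = osla([("a",), ("b",)], demo_jobs, capacity=2)
```

Here both candidates contain one job of equal demand, but starting with the shorter job `b` lets the remaining jobs finish earlier on average, so the lookahead selects `("b",)`.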

Timeout Mechanism.
One potential issue of our PAS algorithm is that long-running jobs with large resource demand may suffer from starvation. To deal with it, we assign each job a timeout value, that is, a maximal waiting deadline. Once the waiting time of a job exceeds the timeout, the job is executed first without any further delay.
We can set the timeout value to the makespan of the current scheduling job set $J^k_s$; this ensures that the long-running job(s) with large resource demand have the opportunity to execute after all other jobs in $J^k_s$ complete their execution. The timeout value for each job can be set the first time the simulation algorithm runs, where the job scheduling sequence $S$ is given and the makespan can be estimated.
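A minimal sketch of the timeout rule follows, assuming the scheduler keeps its waiting queue as an ordered list of (job id, waiting time) pairs and a single timeout value for all jobs; the function name `apply_timeout` and the sample values are hypothetical.

```python
from typing import List, Tuple


def apply_timeout(queue: List[Tuple[str, float]], timeout: float) -> List[str]:
    """Move jobs whose waiting time exceeds the timeout to the front.

    The relative order inside the expired group and inside the remaining
    group is preserved, so the scheduler's own ordering still applies.
    """
    expired = [name for name, waited in queue if waited > timeout]
    pending = [name for name, waited in queue if waited <= timeout]
    return expired + pending


# Hypothetical queue in the scheduler's preferred order (small jobs first).
waiting_queue = [("small", 5.0), ("big", 120.0), ("mid", 30.0)]
order = apply_timeout(waiting_queue, timeout=100.0)
```

With a timeout of 100 time units, the large job that has waited 120 units is promoted ahead of the small and medium jobs, while those two keep their original relative order.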

Implementation on Hadoop YARN.
We incorporate PAS into Hadoop YARN (2.6.0) by implementing an independent PAS algorithm module and adding a PAS scheduler plugin to YARN. The implementation details are shown in Figure 1.
As shown in Figure 1, PAS consists of two modules. The first module is called the PAS algorithm, which is implemented in Python 3.5. At each time slot $\tau_k$, the PAS algorithm first receives the necessary scheduling information, for example $R^k_{\text{used}}$, $J^k$, etc., from the PAS scheduler, and then calculates the suggested scheduling job sets by calling the proposed PAS algorithm with the scheduling information. The communication between the PAS algorithm and the PAS scheduler is implemented over TCP.
We have implemented the PAS scheduler as a customized plugin of YARN. PAS scheduler focuses on sending the scheduling information periodically to the PAS algorithm and allocating resource for the scheduling job set returned by the PAS algorithm.
More specifically, at each time slot $\tau_k$, the PAS scheduler needs to determine whether there is still resource left to allocate in comparison with $U$, the upper bound of resource utilization: (i) If so, the PAS scheduler sends the information, including the currently used resource ($R^k_{\text{used}}$) and the job set waiting for scheduling ($J^k$), to the PAS algorithm and requests the scheduling job set. Once it has received $J^k_s$, the PAS scheduler attaches all the jobs in $J^k_s$ to the execution queue of YARN. (ii) If not, the PAS scheduler does nothing and waits for the next time slot.

Experiments
Algorithm 2: Small job first (SmJF).
Require: $J^k$: the job set waiting for scheduling at $\tau_k$; $R^k_{\text{used}}$: the used resource at $\tau_k$; $R'$: the target resource utilization after scheduling.
Ensure: All possible scheduling job sets $J^k_s$
(1) $J^k_s \leftarrow \emptyset$;
(2) $R_c \leftarrow R^k_{\text{used}}$;
(3) Sort the jobs in $J^k$ by their resource demands $r$ in increasing order to form a bin list $B = \{B_1, B_2, \ldots, B_l\}$, in which each bin contains jobs with the same $r$;

In this section, we first describe our experiment setup and then report the experimental results to demonstrate the efficiency and effectiveness of the proposed approach.

Running Environment.
We use Hadoop YARN (2.6.0) on Spark (1.6.0) and conduct our experiments on a local cluster of five physical servers. Each server is equipped with two 8-core Intel Xeon E5-2650 v2 2.6 GHz processors, 256 GiB RAM, and a 1.5 TB disk, and runs CentOS 6.0, Java 1.7.0_55, and Python 3.5. All servers are connected via a high-speed 1.5 Gbps LAN. To avoid interference and comply with the actual deployment, we run YARN on Spark, the workload generators, and the PAS algorithm on different physical servers in each experiment.

Baseline Algorithms.
To evaluate the performance of PAS, we compare it with four state-of-the-art algorithms, namely FIFO [38], AHP [44], SJF [45], and DRF [24]. We provide a brief description of each algorithm as follows and compare the scheduling goals of the five algorithms in Table 1:
FIFO sorts all jobs in the order of submission (first in, first out), and it is the default scheduler of YARN.
AHP is an improved priority-based job scheduling algorithm for cloud computing, which is based on a multiple-criteria and multiple-attribute decision-making model.
SJF sorts all jobs in the order of execution time, with the shortest job first.
DRF is an extension of classic fair scheduling [26] for multiple types of resource. DRF determines CPU and memory resource shares based on the availability of those resources and the job requirements.
In the PAS algorithm, we set the lower and upper bounds of the CPU and memory utilization ratios to [0.5, 0.9] and [0.75, 1], respectively.

Workloads.
In our experiment, we use HiBench, a well-known big data benchmark, to generate Spark workloads. More specifically, we choose 10 different workloads, which are listed in Table 2.
We construct six groups of jobs for scheduling with different job numbers, that is 15, 30, 45, 60, 75, and 90, by randomly choosing these jobs from the candidate workloads listed in Table 2. For each testing group and a scheduling algorithm, we conduct 10 independent runs and record the results separately.

Evaluation Metrics.
We consider three well-known metrics in our experiments for performance evaluation, namely average turnaround time (ATAT), resource utilization ratio (RUR), and makespan (MS).
Average turnaround time (ATAT). ATAT measures the performance of a scheduler from the users' perspective. The detailed definition of ATAT can be found in Section 3. Here we define the ATAT improvement of an algorithm over the baseline algorithm in comparison as:

$$\mathrm{Imp} = \frac{\mathrm{ATAT}_{\text{baseline}} - \mathrm{ATAT}}{\mathrm{ATAT}_{\text{baseline}}} \times 100\%,$$

where $\mathrm{ATAT}_{\text{baseline}}$ is the baseline ATAT, and $\mathrm{ATAT}$ is that of the algorithm being evaluated.
Resource utilization ratio (RUR). RUR measures the resource efficiency of a BDPS from the service providers' perspective. The detailed definition of RUR can also be found in Section 3.

Makespan (MS).
Makespan defines the time difference between the start and finish of a sequence of jobs. It measures the resource efficiency of a BDPS from the service providers' perspective [2].
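For concreteness, the two scalar metrics can be computed as in the short sketch below. It assumes that the ATAT improvement is the relative reduction with respect to the baseline, expressed in percent, and that each job is represented by a hypothetical (start, finish) pair; the function names and sample numbers are illustrative only.

```python
from typing import Sequence, Tuple


def makespan(intervals: Sequence[Tuple[float, float]]) -> float:
    """Time between the earliest start and the latest finish of a job set."""
    return max(end for _, end in intervals) - min(start for start, _ in intervals)


def atat_improvement(baseline: float, evaluated: float) -> float:
    """Relative ATAT reduction over the baseline, in percent."""
    return (baseline - evaluated) / baseline * 100.0


# Hypothetical (start, finish) times of three jobs and two ATAT values.
example_ms = makespan([(0.0, 10.0), (2.0, 25.0), (5.0, 18.0)])
example_imp = atat_improvement(baseline=200.0, evaluated=150.0)
```

A positive improvement means the evaluated algorithm finishes jobs sooner on average than the baseline; a negative value would indicate a regression.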

Estimation of Job Completion Time under Different RUR.
We choose three workloads, that is, a CPU-intensive workload (SP), a memory-intensive workload (WC), and a machine learning workload (SVM), to verify the prediction model for job completion time. We first run each of these three workloads in the experimental environment under a series of RUR values. For each workload, we perform regressions on the estimated expression, that is, equation (3), and provide in Table 3 the root mean square error (RMSE), normalized root mean square error (NRMSE), and R-square ($R^2$), which clearly indicate that a high level of goodness-of-fit is achieved in these regressions. For a better illustration, we plot the fitted curves of these three workloads in Figure 2. By fitting the data points, we find that the job execution time is approximately a cubic function of the RUR. With these fitted curves, we are able to provide a good estimate of the completion time of a given workload at different RUR values.

Table 4 shows the ATAT (ms) values of the five scheduling algorithms. As expected, our algorithm outperforms the default FIFO scheduler of YARN by 25.3% to 42.0%. Furthermore, PAS outperforms the other three algorithms: 13.6%-36.4% improvement over AHP, 10.2%-28.8% improvement over SJF, and 4.1%-12.1% improvement over DRF.

Figure 3 shows the boxplots of the ATAT performance under six groups of job sets and five algorithms over 10 independent runs. In these boxplots, the bottom and top of the box are the first and third quartiles, and the bands inside the boxes represent robust estimates of the uncertainty about the medians for box-to-box comparison. The ends of the whiskers represent possible alternative values, and the symbol "+" denotes outliers. As shown by the ATAT boxplots, PAS has lower ATAT values than the other four scheduling algorithms on all six job sets, which indicates the superiority and the robustness of our PAS algorithm.

RUR.
For a better illustration, we plot the CPU and memory utilization ratios for running a group of 45 jobs under the five scheduling algorithms in Figures 4 and 5, respectively. We can see from Figure 4 that the CPU utilization ratios under the PAS algorithm (red line) are more stable over time than the others: the CPU utilization ratios are between 50% and 90% (our predefined bounds) during 69.04% of the time slots. Figure 5 shows that the difference in memory utilization ratios among these algorithms is not significant, but the memory utilization ratios under PAS are still kept between 75% and 100% during 76.19% of the time slots. Note that resource utilization jitter exists under all scheduling algorithms because there might be a delay between the release of resource by already finished jobs and the allocation of resource to a set of newly added jobs.

Figure 6 shows the makespan values for six groups of workloads under the five scheduling algorithms. We can see from Figure 6 that PAS outperforms the other algorithms in terms of makespan, with the exception of the DRF algorithm. DRF probably performs well in makespan because it allows schedulers to take into account the heterogeneous resource demands, leading to both fairer allocation of resources and higher utilization [24].

Conclusion and Future Work
In this article, we propose the PAS algorithm to optimize average turnaround time and resource efficiency simultaneously. The PAS scheduler constructs a performance prediction model for an accurate estimation of the completion time of big data analytical jobs. It then dynamically schedules multiple jobs concurrently using different policies based on the prediction model and employs a greedy strategy and a one-step lookahead strategy to opportunistically improve the average job performance for fast job completion while still achieving good enough resource efficiency.
In future work, we plan to refine our prediction model by supporting the automatic selection of appropriate parameters for any job given the hardware and software settings of a BDPS. For practical applications, we will take the priority of jobs and fairness into consideration. We will also investigate the performance dynamics of BDPSs and design better scheduling approaches to account for such dynamics in our algorithm design.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.