Task Priority-Based Cached-Data Prefetching and Eviction Mechanisms for Performance Optimization of Edge Computing Clusters

Advanced Technology Research Center, Korea University of Technology and Education, Cheonan, Republic of Korea
Department of Electrical Engineering, International Islamic University, Islamabad, Pakistan
Department of Electronics Engineering, Korea Polytechnic University, Siheung, Republic of Korea
School of Information Technology and Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
Department of Computer Science and Information Technology, University of Malakand, Chakdara, Pakistan


Introduction
Edge computing is a paradigm that extends cloud computing services to edge nodes in networks; thus, it brings computing services near to Internet of Things (IoT) devices [1]. Placing resources at the edge of the network enables low-latency processing. However, since the enormous number of IoT devices generates a high volume of data, transmitting all of it to the cloud imposes a heavy computational burden. In general, the cloud consists of distributed computing resources and processes data using groups of servers in a parallel and distributed way. Sending all data and tasks to the cloud for processing congests the core network and places a huge load on the cloud servers. To reduce the workload of the core network and the cloud, novel paradigms such as edge computing and fog computing have been developed [2][3][4][5][6][7] to bring computational resources to the edge of the network and offer services near each IoT device, as shown in Figure 1. Due to low computing power and limited data storage, the edge nodes are clustered to perform computation, and large tasks are distributed among them. To distribute tasks resourcefully and efficiently to the edge nodes based on task-associated data, an effective task scheduling strategy is required. In other words, a cost-effective task scheduler is needed to assign tasks close to their data on a cluster node and to bring resources near the computation nodes while improving overall system performance.
In cloud computing systems, complicated tasks and data are gathered in the cloud for processing [8,9]. These data and tasks are generated by IoT devices, which are connected to the cloud through a middle layer, i.e., IoT edge nodes. Thousands of IoT devices are connected to the cloud, which can place a heavy load on the core network and the cloud system. This increases the frequency of communication exchanges and causes long latency for end-users. Such resource limitations of the cloud computing layer have inclined many researchers toward computation at edge devices. Data generated by IoT devices can be processed by middle-layer devices such as IoT nodes and base stations. However, the nodes at the edge level have low processing power and limited resources and thus cannot handle heavy and complicated tasks alone. Therefore, a cost-effective task management strategy is needed to distribute complicated tasks to the edge nodes efficiently.
In a cloud computing cluster, a task manager predicts the amount of data at each computing node and assigns tasks to appropriate target nodes to guarantee data locality [10]. Based on this prediction, each node tries to fetch and preload data from other locations. How well the preloaded data match the task depends on the accuracy of the prediction. A wrong prediction yields preloaded data that are not useful for running the task, wasting communication bandwidth and system resources. Yet, the preloaded data can still be exploited by fetching an associated task from the queue. So far, several scheduling schemes have been proposed to balance the workload in the network based on the amount of available resources and data [11][12][13][14][15][16][17][18][19][20]. All these schemes prefetch data, but they do not consider task priorities with respect to the available cached or stored data. In contrast, our approach in this paper assigns a priority to a task according to whether its required data can be obtained from the cached-data queue. Consequently, it can reduce the overhead required for task eviction.
On the other hand, in distributed systems [21,22], bringing a computation task to its data is cheaper than bringing the data to the computation task. Moving the computation task close to the required data is called data locality in cloud computing environments. It is impossible to guarantee 100% data locality, but it can be improved with the data already present at the edge level by minimizing unnecessary data transmissions. For quick access, recently used data are kept in the cache memory for iterative processes. The cache memory contains two different types of data: static data, which do not change and can be reused in the next round of task execution, and dynamic data, which change but are still usable in the next round. Due to limited memory capacity, it is impossible to keep all the data needed by the tasks in the cache memory of a computing node, so data are frequently swapped out and in between the cache and storage memories. Loading data from storage into the cache is an expensive process in terms of data processing and transfer. If the cache memory becomes full and the system cannot store more data in it, eviction techniques such as least recently used (LRU) and first-in-first-out (FIFO) [23] can be applied to swap unneeded old data out of the cache.
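To make the eviction policy concrete, the following is a minimal, illustrative LRU cache sketch in Python; the class and block names are hypothetical, not taken from the paper, and a FIFO cache would differ only in evicting the block inserted first.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: on overflow, the least recently used block is evicted.
    (A FIFO cache would instead evict the block that was inserted first.)"""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block id -> data, ordered oldest -> newest

    def get(self, key):
        if key not in self.blocks:
            return None               # cache miss: caller must load from storage
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, data):
        if key in self.blocks:
            self.blocks.move_to_end(key)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # swap out the least recently used block

cache = LRUCache(2)
cache.put("d0", "block0")
cache.put("d1", "block1")
cache.get("d0")               # touch d0, so d1 becomes least recently used
cache.put("d2", "block2")     # evicts d1, not d0
```

Note that the policy only decides *which* block to evict; it never asks whether a pending task still needs the evicted block, which is exactly the gap the proposed TPDS addresses.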
In this paper, we extend the idea from our earlier work [11] to utilize existing preloaded data effectively through a cost-effective scheduling strategy, named the task priority-based data-prefetching scheduler (TPDS), which distributes tasks to the computing nodes logically. The proposed TPDS tries to match a task in the queue with the cached-data at a computing node. It assigns a priority to each task and allocates the task to a proper edge node based on the task-associated data in the cache. With this technique, the frequency of data swapping in the cache can be significantly reduced and data utilization can be improved for available tasks. If there is no task in the queue for the cached-data, the data are swapped out and replaced by the required new data. We employ multi-server queuing theory [24] to evaluate the performance of the proposed scheduling strategy. The proposed TPDS achieves better performance in terms of data locality, task distribution, and reduction of the system overheads caused by unnecessary evictions and data exchanges.
The main contributions of this paper are summarized as follows:
(i) Dynamic workload scheduling considering queue-wise job priorities is proposed based on the data locality of the cache memory in order to maximize the resource efficiency and data utilization of a cloud cluster.
(ii) In the cloud cluster, the proposed scheme prefetches and evicts cached-data at a computing node based on task priority. It avoids blind eviction of cached-data and reduces system overhead, thereby improving resource efficiency at each node.
(iii) By assigning each task to a computing node based on data locality, we can minimize the average completion and waiting time of each task.
(iv) A multi-server queuing model applicable to the proposed TPDS scheme is developed in order to improve the schedulability of tasks under different constraints and requirements.
The rest of the paper is organized as follows. In Section 2, we review previous work on scheduling with prefetching and data locality. We propose a scheduling strategy based on priority-based data-prefetching in Section 3. In Section 4, we evaluate the performance of the proposed strategy against conventional schemes. Finally, the paper is concluded in Section 5.

Related Work
Many data locality schemes for task scheduling have been developed to improve the performance of computing systems with respect to task execution. Data locality avoids unnecessary data transmissions for tasks in cloud computing. In distributed cloud systems, tasks are assigned to nodes in the network based on the prediction of associated data [25].
In [26], a new caching algorithm, called similarity-aware popularity-based caching (SAPoC), is proposed to improve the performance of wireless edge-caching by utilizing the similarity among contents in dynamic scenarios, where both mobile devices and contents arrive and leave dynamically. In SAPoC, a content's popularity is determined not only by its request history but also by its similarity with existing contents, enabling a quick start for newly arrived contents. It aims to provide an efficient edge-caching strategy that accounts for the dynamic nature of wireless edge computing systems.
In [10], data locality aware workflow scheduling (D-LAWS), which focuses on data locality, data transfer time based on network bandwidth, virtual machine (VM) consolidation, and fairness of workflow scheduling at the node level, is proposed. The D-LAWS maximizes resource utilization and task parallelism and analytically formulates the data transfer time between VMs. It consolidates VMs and considers task parallelism by using data flow while planning task executions for data-intensive scientific workflows. Moreover, it reflects more complex workflow models and the data locality of data transfers before task execution. In [27], the authors proposed a novel scheduling scheme for real-time bag-of-tasks jobs that arrive dynamically at a hybrid cloud. It takes into account the end-to-end deadlines of the jobs, as well as the monetary cost of using the complementary public cloud resources. In [28], a novel hierarchical architecture for multiple cloudlets is proposed for mobile edge clouds. In this work, the authors target improving the efficiency of cloud resource utilization by organizing the edge cloud servers into a hierarchical architecture. Instead of serving mobile users directly with a flat collection of edge cloud servers, the basic idea of the proposed scheme is to opportunistically aggregate the mobile loads and send the peak loads exceeding the capacities of edge cloud servers at lower tiers to servers at higher tiers in the edge cloud hierarchy. They developed analytical models to compare the performance of flat and hierarchical edge computing designs in terms of resource utilization efficiency. They also provided theoretical results that show the advantages of the proposed hierarchical edge cloud architecture.
In [29], Raicu et al. implemented regulation of data locality and resource utilization. In [30], the authors proposed a cache-aware task scheduling (CATS) technique that finds suitable resources for executing data-intensive workloads. The proposed model minimizes energy consumption for both core network and cache accesses, and achieves a good tradeoff between energy minimization and execution time reduction by employing accurate analytical models. Similarly, to enhance data locality and replication, a delay scheduling scheme called the delay scheduling based replication algorithm (DSBRA) is presented in [31]. The DSBRA tries to replicate and de-replicate blocks of the data based on prior information taken from the scheduler. This algorithm focuses on block-level replication, but some blocks are stored on the least loaded nodes while others are stored on heavily loaded nodes. In [32], a locality-based data scheduling algorithm is proposed. It allocates input data blocks to proper nodes based on their processing capacity in order to enhance the performance of MapReduce in heterogeneous Hadoop clusters. Prefetching is a smart approach to reducing the extra overhead of data traffic in distributed computing systems: with preloaded data already present, the delay of task execution can be reduced. However, prefetching and predicting the data to be preloaded based on the scheduled tasks remain a great challenge. In [31,32], the authors show how to enhance prefetching techniques and also focus on task scheduling for the TaskTracker based on the data. These prefetching strategies maximize data locality in distributed computing environments.
Our approach in this paper builds on these previous studies, which use prefetching to efficiently reuse existing cache data. The main focus of the proposed approach is data eviction and confirmation before task assignment. Our goal is to improve data locality and to guarantee resourceful task scheduling in edge computing environments. In the next section, we present the proposed scheduling strategy, which enhances the performance of data preloading for tasks and reduces the frequency of blind cached-data removal. In our approach, the task scheduler tries to select the most appropriate node in the edge computing cluster from the perspective of data locality and to assign the task to the selected node. This increases cached-data utilization and improves the swapping process, minimizing the overall system overhead.

Proposed Task Priority-Based Data-Prefetching Scheduler (TPDS)
In this section, the proposed TPDS is presented for edge computing clusters. The TPDS tries to avoid unnecessary eviction of data in order to improve the task scheduling and data caching processes. Since the costs of data transfer and eviction have a great impact on system performance, the proposed TPDS attempts to reduce these costs while improving the task execution procedure.

Design Goals.
The design goals of the proposed strategy are (i) prioritization of tasks based on the existing data in the cache memory of the computing node, (ii) improved awareness between the computing nodes and the task manager regarding data and tasks to increase the hit ratio of the cached-data, and (iii) faster task execution by reducing the waiting time of jobs and increasing the utilization of the cached-data. Let us consider a set of tasks T = {t_1, t_2, t_3, ..., t_n} with the associated data set D = {d_1, d_2, d_3, ..., d_n} and edge computing nodes E = {e_1, e_2, e_3, ..., e_n}, which contain different data blocks d_n in the cache memory, C, or storage, S. Under the traditional data locality scheme, a task t_n ∈ T is assigned to the computing node e_n ∈ E that contains its required data, d_n.
Then, task allocation to a node can be expressed as

t_n ⟶ e_n | S_{e_n ∈ E} ∃ d_n, or S_{e_n ∈ E} ← R_{L_n} ∃ d_n, (1)

where R_{L_n} denotes any remote location near node e_n that contains the data d_n. We assume that five tasks arrive in the system as shown in Figure 2. The details of task allocation to the computing nodes e_n ∈ E are given in Table 1. The task t_1 is assigned to the computing node e_1 since the cached-data of the node, C_{e_1 ∈ E}, contain the data block d_8 needed for processing t_1. Similarly, the task t_2 needs d_2, which is unavailable in the cache C_{e_n ∈ E} but available in the storage S_{e_2 ∈ E} of the node e_2. By the LRU cache replacement policy, the old data block d_0 is swapped out for d_2. Similarly, for the task t_3, the data block d_3, which is considered old, is replaced with the required block d_1, as shown in Figure 2.
In Figure 2, note that the two data blocks d_0 and d_3 are replaced with d_2 and d_1 by the LRU policy for the tasks t_2 and t_3, respectively, due to the limited capacity of the cache memory. After the tasks t_2 and t_3 finish, the data blocks d_0 and d_3 must be brought back into the cache C_{e_n ∈ E} for the tasks t_4 and t_5, which require them. Therefore, the proposed scheduling strategy avoids such unnecessary eviction and swapping of data by prioritizing tasks based on the available cached-data at the computing node, C_{e_n} ∃ d_n, as shown in Table 2 and Figure 3. Equations (2) and (3) express the computing node and the task allocation based on the availability of cached-data:

e_n ∈ E = ∀e_n {C_{e_n ∈ E}, S_{e_n ∈ E}}, (2)

∀t_n ⟶ ∀e_n {C_{e_n ∈ E} ∃ d_n, S_{e_n ∈ E} ∃ d_n, S_{e_n ∈ E} ← R_{L_n} ∃ d_n}. (3)
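To make the saving concrete, the following sketch replays a workload shaped like the five-task example of Figure 2 under two orderings: arrival order with plain LRU eviction, and a TPDS-style ordering in which tasks whose block is already cached run first. The initial cache contents and capacity are illustrative assumptions, not values taken from the paper.

```python
def count_evictions(order, initial_cache, capacity):
    """Run tasks in the given order; each task needs one data block.
    A miss loads the block, evicting the least recently used one if full."""
    cache = list(initial_cache)        # ordered oldest -> newest
    evictions = 0
    for _, block in order:
        if block in cache:
            cache.remove(block)
            cache.append(block)        # LRU touch: move hit block to newest
        else:
            if len(cache) >= capacity:
                cache.pop(0)           # evict least recently used block
                evictions += 1
            cache.append(block)
    return evictions

# Hypothetical workload mirroring Figure 2: (task, required block)
tasks = [("t1", "d8"), ("t2", "d2"), ("t3", "d1"), ("t4", "d0"), ("t5", "d3")]
cache0 = ["d0", "d3", "d8"]            # assumed initial cache contents

fifo_evictions = count_evictions(tasks, cache0, capacity=3)

# TPDS-style ordering: tasks whose block is already cached run first
prioritized = sorted(tasks, key=lambda t: t[1] not in cache0)
tpds_evictions = count_evictions(prioritized, cache0, capacity=3)
```

Under these assumptions the arrival-order run evicts d_0 and d_3 only to reload them for t_4 and t_5, while the prioritized run serves t_1, t_4, and t_5 from the cache first and evicts only when loading d_2 and d_1, halving the number of evictions.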

Performance Evaluation Model
In this section, a theoretical model of the proposed TPDS is formulated and derived. We employ an M/M/c queuing model to evaluate the performance of the proposed TPDS.
Suppose that there are n tasks denoted by T = {t_1, t_2, t_3, ..., t_n}, a set of data blocks denoted by D = {d_1, d_2, d_3, ..., d_n}, and a set of computing nodes denoted by E = {e_1, e_2, e_3, ..., e_m}. Here, e denotes a computing node, m is the total number of computing nodes, D is the set of data blocks, and d_n is the specific data block required by a task. When all tasks arrive in the system, the total number of data blocks held in the caches can be expressed as the sum of the cached blocks over all computing nodes. According to the proposed TPDS, before evicting the data d_n ∈ D from the cache memory C_{e_m ∈ E}, the computing node sends a request to the task manager to ask whether any task t_n ∈ T in the queue needs the data d_n ∈ D that is about to be evicted. If such a task exists in the queue of the task manager, the task manager gives it priority and assigns it to the node e_m ∈ E. Otherwise, the data d_n is evicted and swapped out of the cache memory. To estimate and optimize the probabilistic performance of the edge computing nodes, the notation is defined in Table 3.
In this model, we consider two types of tasks based on the cached-data, as shown in Figure 3: high-priority tasks, whose required data are already available in the cache memory, and low-priority tasks, whose required data are not available in the cache memory of the edge node e_n ∈ E:

t_n = t_n⟨C_{e_n ∈ E} ∃ d_n⟩ high priority, or t_n⟨C_{e_n ∈ E} ∄ d_n⟩ low priority.
We consider tasks arriving at the edge computing nodes with rate λ ∈ T. We assume that task arrivals follow a Poisson process and each arrival is dispatched to a different node in the cluster of edge computing nodes. Let ρ = λ/μ be the traffic intensity of the tasks with different priorities based on the available cached-data, where λ and μ are the arrival rate and the service rate, respectively. The parameters for task requests in the queuing model are N_s, W_Q, and T_s. Among these three parameters, W_Q, which is affected by the number of tasks being served, plays the primary role in the performance. As shown in Figure 4, the scheduling policy is based on an M/M/(e_n ∈ E) queueing model, according to which the remaining time, waiting time, and service time of the tasks in edge computing are mathematically evaluated.

(10) while (QT is not empty) do
(11)   if (e_n is idle) then
(12)     for all tasks in queue do
(13)       if t_n.d_n ∈ C_{e_n ∈ E} then
(14)         e_n ∈ E ← t_n⟨h⟩
(15)       else
(16)         if (C_{e_n ∈ E} needs eviction) then
(17)           evict ← C_{e_n ∈ E}.old_data
(18)           C_{e_n ∈ E} ← S_{e_n ∈ E}(d_n)
(19)           e_n ∈ E ← t_n
(20)         end if
(21)       end if
(22)     end for
(23)     busy ← e_n
(24)   end if
(25) end while
ALGORITHM 1: Task priority-based data-prefetching.
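A runnable sketch of the loop in Algorithm 1 follows, under simplifying assumptions: each task needs exactly one data block, one scheduling pass is performed, and the queue and node data structures are hypothetical stand-ins for the task manager's state.

```python
from collections import deque

def tpds_schedule(task_queue, nodes):
    """One scheduling pass in the spirit of Algorithm 1.
    task_queue: deque of (task_id, required_block)
    nodes: list of dicts {"cache": set of block ids, "busy": bool}
    Returns a list of (task_id, node_index) assignments."""
    assignments = []
    while task_queue:
        idle = [i for i, n in enumerate(nodes) if not n["busy"]]
        if not idle:
            break  # all nodes busy: remaining tasks keep waiting in the queue
        # High priority: a queued task whose block is already cached on an idle node
        hit = next(((t, i) for t in task_queue for i in idle
                    if t[1] in nodes[i]["cache"]), None)
        if hit is not None:
            task, i = hit
            task_queue.remove(task)
        else:
            # Low priority: take the next task, evict only a block that no queued
            # task still needs, then swap the required block in from storage
            task = task_queue.popleft()
            i = idle[0]
            still_needed = {t[1] for t in task_queue}
            victims = [b for b in nodes[i]["cache"] if b not in still_needed]
            if victims:
                nodes[i]["cache"].discard(victims[0])
            nodes[i]["cache"].add(task[1])
        nodes[i]["busy"] = True
        assignments.append((task[0], i))
    return assignments

nodes = [{"cache": {"d8"}, "busy": False}, {"cache": {"d2"}, "busy": False}]
queue = deque([("t1", "d8"), ("t2", "d2"), ("t3", "d9")])
plan = tpds_schedule(queue, nodes)
```

In this run, t_1 and t_2 are served as high-priority tasks from the two caches, while t_3 remains queued because both nodes become busy; the key design point is the eviction guard, which never discards a block that a queued task still requires.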
As the requests to the edge nodes come from end devices such as smartphones, tablets, and wearable devices, the pool of tasks and the size of the queue in the task manager at the cluster of edge nodes are considered unbounded. The state transition diagram of the M/M/(e_n ∈ E) model, which can be described through balance equations, is shown in Figure 5. Let c = e_n ∃ d_n denote the number of computing nodes that hold the required data. When the number of tasks t_n ∈ T is less than c, only n of the nodes are busy and the mean service rate equals nμ. From (4), we can obtain

P_n = P_0 (cρ)^n / n!  for n < c. (7)

If the number of tasks is greater than or equal to c, i.e., n ≥ c, all the nodes are busy and the effective service rate equals μ(e_n ∃ d_n). Thus,

P_n = P_0 (cρ)^n / (c! c^{n−c})  for n ≥ c. (8)

Figure 3: Data prefetching and eviction process based on task priority.
Table 1: An example of assigning tasks without considerations of priority and data locality.
(Table 1 columns: arrival of tasks, required data, computing nodes.)
Table 2: An example of assigning tasks based on priority and data locality (columns: arrival of tasks, prioritized tasks, required data, computing nodes).
Here, ρ = λ/(cμ) with c = e_n ∃ d_n, and ρ must be less than 1 for system stability. Note that the expected number of busy nodes equals cρ = λ/μ. To obtain P_0, both sides of (7) and (8) are summed. Since Σ_{n=0}^{∞} P_n = 1, P_0 is derived as

P_0 = [ Σ_{n=0}^{c−1} (cρ)^n / n! + (cρ)^c / (c!(1 − ρ)) ]^{−1}.

The proposed TPDS is an efficient scheduling strategy that minimizes the costs of data transfer and execution latency. To evaluate the system performance, it is necessary to calculate the total number of tasks in the queue, the total waiting time, the service time of the jobs, and the total number of tasks in the system. If the number of incoming tasks is less than the number of nodes in the cluster, as represented in (7), the system is in a stable condition, and it is expected that all tasks can be completed on time without extra waiting in the queue. Otherwise, as in (8), it is highly probable that some tasks wait for a long time and never get served. The proposed TPDS tries to minimize unnecessary eviction and improve the data locality for the tasks. As discussed earlier, when n > c, some tasks must wait in the queue. Thus, the estimated number of tasks in the queue is given by

L_Q = P_0 (cρ)^c ρ / (c!(1 − ρ)^2).

To evaluate the system performance by applying Little's law, it is necessary to obtain the total waiting time of tasks before service, the total number of tasks in the queue, and the total time spent by a single task in the cluster of edge computing nodes.
The probability that all nodes in the edge computing cluster are busy can be derived from (14) and (15):

P_B = P_0 (λ/μ)^c μ / [(c − 1)!(cμ − λ)], (15)

where c = e_n ∃ d_n.
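The quantities above can be checked numerically. The sketch below computes P_0, the all-busy probability P_B (the Erlang C formula), and the Little's-law quantities L_Q, W_Q, and T_s for an M/M/c queue, writing c for the number of nodes e_n ∃ d_n; function and variable names are our own, not from the paper.

```python
from math import factorial

def mmc_metrics(lam, mu, c):
    """Steady-state M/M/c quantities: lam = task arrival rate,
    mu = per-node service rate, c = number of nodes holding the data."""
    rho = lam / (c * mu)          # traffic intensity; must be < 1 for stability
    assert rho < 1, "unstable system: rho must be < 1"
    a = lam / mu                  # offered load (= c * rho, expected busy nodes)
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(c))
                + a**c / (factorial(c) * (1 - rho)))
    pb = a**c / (factorial(c) * (1 - rho)) * p0   # P_B: all c nodes busy (Erlang C)
    lq = pb * rho / (1 - rho)                     # L_Q: mean number of waiting tasks
    wq = lq / lam                                 # W_Q by Little's law
    ts = wq + 1.0 / mu                            # T_s: mean total time in the system
    return {"P0": p0, "PB": pb, "LQ": lq, "WQ": wq, "TS": ts}
```

For c = 1 this reduces to the familiar M/M/1 results (e.g., P_0 = 1 − ρ and T_s = 1/(μ − λ)), which is a convenient sanity check on the reconstructed formulas.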

Performance Evaluation
In this section, the proposed TPDS is evaluated through computer simulations. The job completion time and node utilization under data locality in the cache memory are estimated with CloudSim [33], which includes a broker (task manager node) and client nodes (a number of machines) as entities. The results of the proposed TPDS are compared with existing scheduling and eviction schemes: FIFO, LRU, and HPSO [23,34]. The efficiency of the proposed TPDS is evaluated in terms of the hit ratio of cached-data, task execution time, task waiting time, and data locality. The parameter details for the CloudSim simulator are given in Table 4. Figure 6 shows the used ratio of data for the proposed TPDS compared with the three conventional schemes. The proposed TPDS maximizes the utilization of the cached-data by using it for incoming tasks in the queue; it does not blindly swap out old data without checking the incoming tasks in the queue. Thus, the hit ratio of the proposed TPDS is higher than those of the conventional FIFO, LRU, and HPSO schemes. In particular, as the number of data blocks increases, the time consumed to complete the tasks increases for all schemes because data blocks are swapped out of the cache memory without checking the queue of the task manager; this lowers the cached-data hit ratio as the number of data blocks grows. Figure 7 shows the execution times of tasks for the proposed TPDS and the three conventional schemes. As shown in the figure, the execution time of the proposed TPDS is always smaller than those of the conventional FIFO, LRU, and HPSO schemes. This is because more data swapping forces a task to wait longer for its associated data to be updated, whereas pre-existing data allow a task to execute quickly without waiting for the related data to be brought in.
Similarly, Figure 8 shows the average waiting time of tasks. It is observed that the waiting time of the proposed TPDS is smaller than those of the conventional FIFO, LRU, and HPSO schemes; the proposed TPDS consistently achieves a shorter average waiting time over the whole range of the number of tasks. The number of tasks varies from 200 to 2200, and the same distribution of job sizes is maintained throughout the simulation. The proposed TPDS significantly outperforms the conventional schemes as the number of tasks increases.

Another feature of the proposed TPDS is the priority scheduling of tasks, as shown in Figure 9. With this scheduling strategy, jobs achieve nearly the best data locality, which helps improve the performance of distributed systems. The proposed TPDS takes advantage of cached-data locality to accelerate task computation and to minimize CPU usage and the data transfer load from swapping data in and out of the cache memory. It significantly improves the performance of the computing nodes and the execution of tasks. The proposed TPDS also consistently outperforms the conventional schemes in terms of data locality.
In Figure 10, the average execution time of the proposed TPDS is compared with the conventional schemes. We use six different workloads with different numbers of data blocks (200 to 2200). Compared to the conventional schemes, the proposed TPDS reduces the average execution time by 8.5% to 10.2% over the six workloads, respectively. This demonstrates that the proposed TPDS exploits data locality more efficiently than the existing schemes due to the availability of data blocks in the cache memory.

Table 3: Summary of notation.
N_s: the total number of tasks in the edge computing system
T_s: the total time spent by a task in the edge computing system
W_Q: the total time a task waits in the queue for service
P_n: the probability that the system has n tasks
P_B: the probability that all nodes are busy in the edge computing system
e_n ∃ d_n: the computing node that contains the related data, d_n, for the task

Conclusion
As the number of IoT devices and the scale of cloud computing grow, many edge computing and distributed systems have emerged in recent years. In a general edge computing architecture, computing power, bandwidth, and data at the edge are scarce resources, so an efficient task scheduling strategy is essential for improving system performance. In this paper, we proposed a cache data-locality scheduler for edge-computing cluster environments. The proposed strategy schedules tasks with a global view and adjusts data for tasks dynamically according to the data in the cache memory. Especially in an edge computing cluster environment, where resources are limited, the proposed approach strives to enhance task execution under limited resources and to reduce the extra flow of data in the cluster network. When the computing cluster is overloaded, the proposed strategy takes advantage of the data in the cache and serves first the task that finds its needed data in the cache of a node. The simulation results show that the proposed strategy achieves improvements and also works well in a busy network and cluster. As future work, we plan to improve the proposed task scheduling strategy based on available resources. We will consider aspects that may affect performance, including data distribution and replication in heterogeneous systems. Edge computing and distributed technologies continue to grow due to the massive data volume generated by a large number of IoT devices. Accordingly, it is essential to keep developing scheduling strategies and efficient algorithms for tasks to manage resources in edge computing environments.

Data Availability
The data used to support the findings of this study are included within this article.

Conflicts of Interest
The authors declare no conflicts of interest.