Decentralized Scheduling Algorithm for DAG Based Tasks on P2P Grid

Complex problems consisting of interdependent subtasks are represented by a directed acyclic graph (DAG). Subtasks of this DAG are scheduled by the scheduler on various grid resources. Scheduling algorithms for the grid strive to optimize the schedule. Nowadays many grid resources are attached using a P2P approach. Grid systems and the P2P model are both recent distributed computing approaches. Combining the P2P model with grid systems yields P2P grid systems. Because a P2P grid has no central scheduler, it requires a fully decentralized scheduling algorithm that can schedule interdependent subtasks among heterogeneous computational resources. In this paper we propose a scheduling algorithm that not only optimizes the schedule but also does so in a fully decentralized fashion. Hence, this approach suits P2P grid systems well. Moreover, the algorithm makes scheduling decisions based on both the computation cost and the communication cost associated with the DAG's subtasks.


Introduction
Splitting a huge job into subtasks yields interdependent subtasks. The execution of a successor subtask takes place only once its predecessor subtasks have returned their results. To characterize a set of subtasks and their dependencies on each other we can use a directed acyclic graph (DAG). Nodes represent subtasks, and dependencies are denoted by arcs joining two nodes. Most DAG tasks are highly computation and communication intensive. Intertask dependencies make it very complex to find a solution efficiently. Moreover, because of financial constraints, most organizations do not own high-end computational resources such as clusters of supercomputers. The grid provides a way out of this situation. We can access computational resources available on the grid and schedule our DAG based task upon them. Scheduling is the method of shortlisting nodes from the available computational resources and then assigning tasks to them in an efficient manner. Many scheduling algorithms [1] are in place to schedule tasks upon the grid [2,3]. However, they use either a single server as a central scheduler or a metascheduler approach. For political reasons, depending upon a central scheduler in a grid computing environment is not viable. The metascheduler approach runs into a problem when no single cluster has adequate computational resources to execute a bulky job. Moreover, scalability and bottleneck problems are present in both the metascheduler and the central-scheduler approaches. These shortcomings directed researchers' interest towards P2P and other decentralized approaches to the problem of grid scheduling. However, most of the initial P2P solutions for the grid scheduling problem emphasized only the discovery of accessible computational resources [4][5][6]. In the P2P approach, each of the resources present on the grid takes scheduling decisions on its own [7]. The P2P approach is also based on the concept of decentralization, like the approaches proposed in [7][8][9][10]. Hence, the P2P grid has emerged as an attractive way to schedule DAG based tasks.
A scheduling algorithm targets high throughput by utilizing idle nodes present in the P2P grid [11,12]. Presently, most of the existing algorithms [13] schedule independent tasks over the P2P [14] grid. A fully decentralized technique [15] (computing field scheduling) for scheduling tasks on the grid was proposed in [16]. The drawback of these approaches [13,16] is that they ignore the communication cost. In [17] we proposed the fault tolerant decentralized scheduling (FTDS) algorithm for the grid, which schedules independent tasks by taking into consideration the communication and computational cost associated with the tasks. However, we require a decentralized scheduling algorithm that schedules not only independent tasks but also interdependent tasks over a P2P grid. While scheduling the interdependent subtasks of a huge job, the scheduling algorithm should consider both the communication and the computation cost associated with the subtasks of the job. Scheduling the subtasks of a DAG based task on a heterogeneous decentralized grid is an NP-hard problem. Researchers have used a genetic algorithm to schedule DAG based tasks on a decentralized grid [18]. In this paper we propose a fully decentralized P2P grid scheduling (FDPGS) algorithm, which schedules the subtasks of a DAG based on communication and computation cost. FDPGS gives faster and better results in comparison to the genetic algorithm.
The literature review is given in Section 2. The problem of DAG based task scheduling is explained in Section 3. The FDPGS algorithm is proposed in Section 4. Simulation results are discussed in Section 5. Finally, we conclude and mention the future scope of the work in Section 6.

Related Work
Recently, many researchers have proposed decentralized P2P grid scheduling techniques. A prominent one among them is [19]. It first shortlists resources from the grid using the CYCLON [14] gossip protocol. Then it schedules tasks on the shortlisted computational resources using a genetic algorithm. The limitation of this work is that it schedules only independent jobs. Another approach using P2P strategies for a decentralized grid is proposed in [13]. It uses a shaking algorithm originally used for video streaming in P2P networks. In this approach the authors ignore the cost of sending input and output files and assume that negligible communication cost is required to send a task to remote sites. As already mentioned, a fully decentralized technique [15] (computing field scheduling) for scheduling tasks on the grid was proposed in [16]. In [16], for any given task, the authors calculate the computing field of that job on the direct neighbors of the grid resource where the task is generated. This data is stored in the dynamic information list of the node, and the task is scheduled on the node having the least magnitude of the computing field. The drawback of this approach is that it ignores the communication cost, as in [13]. In our paper [17] we overcome this shortcoming by including the communication cost along with the computing field while making the scheduling decision. In [17] we proposed the fault tolerant decentralized scheduling (FTDS) algorithm for the grid, which schedules independent tasks.
In [18], the authors obtain a schedule for a DAG based task using an optimization heuristic. The optimization heuristic used in [18] to obtain a good schedule is a genetic algorithm. The basic idea of genetic algorithms is given in Figure 1 [20]. In a genetic algorithm we take an initial population and then shortlist some parents from that population. These shortlisted parents are used to obtain new offspring by applying genetic operators. From this new population, we shortlist the offspring that give the best results for the desired properties. To obtain the next generation we repeat the above steps until an offspring with the desired values of the properties is obtained.
A genetic algorithm is a computer-executable generalization of Fisher's formula [21]. This generalization, as mentioned in [22], is expressed as follows.
Definition 1. "Concern with the interaction of genes on a chromosome, rather than assuming alleles act independently of each other, and enlargement of the set of genetic operators to include other well-known genetic operators such as crossing-over (recombination) and inversion."
A genetic algorithm is a four-step technique. In a genetic algorithm an individual in a population is known as a chromosome and represents a feasible solution to the problem. In scheduling, every chromosome gives a schedule of a batch of tasks on a group of computational resources. A chromosome can be denoted as a series of individual schedules (every single schedule is a queue of subtasks assigned to that node), one for each computational resource in the group, separated by a unique value. A second representation uses a matrix arrangement with computational nodes on one dimension and queues on the second dimension. There is also a third representation, used in [18]. We used a variant of the genetic algorithm given in [18] to compare with our work. In [18], each gene is represented as a pair of values denoting a subtask and the processor to which that subtask is assigned. This representation reduces computation costs, as mentioned in [23]. To obtain an initial population of solutions, each subtask of the DAG based task is assigned randomly to a processor. This work amplifies the mutation rate when the population stagnates and vice versa. Genetic operators are applied to chromosomes to obtain a new population of chromosomes from the previous one. Reference [18] uses the roulette wheel selection method [24]. However, the genetic algorithm consumes a lot of time to find the schedule. The schedule length needed to finish the complete DAG based task is taken as the parameter for shortlisting the parents to be used for crossover. The mutation rate increases when the schedule lengths in the parent generation are almost the same. Hence a single test function is taken into consideration. As the number of test functions increases, the complexity of the genetic algorithm also increases.
The algorithm proposed in this paper is compared with the genetic algorithm. To give the genetic algorithm a fair chance we have used a single-parameter genetic algorithm, in which schedule length is taken as the parameter for selection. We propose a new decentralized scheduling algorithm which efficiently schedules a DAG based task on a P2P grid. Our algorithm takes scheduling decisions based on the computing field and the communication cost associated with the DAG based task. The problem of DAG based task scheduling in a decentralized grid is very complex; insight into this problem is given in the next section.
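The gene representation and roulette wheel selection described above can be illustrated with a minimal Python sketch. It is not the implementation of [18]: the function names are hypothetical, and using the inverse of the schedule length as the selection weight is an assumption made here so that shorter schedules are more likely to be chosen as parents.

```python
import random


def random_chromosome(num_subtasks, num_processors, rng=random):
    """Build one chromosome as a list of (subtask, processor) genes.

    Each subtask is assigned to a randomly chosen processor, which is how
    an initial population is seeded in the description above.
    """
    return [(task, rng.randrange(num_processors)) for task in range(num_subtasks)]


def roulette_wheel_select(population, schedule_lengths, rng=random):
    """Select one chromosome with probability proportional to its fitness.

    Assumption: fitness = 1 / schedule length, so a shorter schedule
    occupies a larger slice of the wheel.
    """
    fitness = [1.0 / length for length in schedule_lengths]
    pick = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for chromosome, fit in zip(population, fitness):
        running += fit
        if pick <= running:
            return chromosome
    return population[-1]  # guard against floating-point round-off
```

A parent for crossover would then be drawn with `roulette_wheel_select(population, lengths)`, where `lengths` holds the schedule length of each chromosome.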

Problem of DAG Based Task Scheduling on Decentralized Grid
A computationally intensive task that consists of various interdependent subtasks can be represented by a directed acyclic graph (DAG). A sample DAG based task is shown in Figure 2(a). Subtask 0 is the origin subtask node shown in Figure 2(a). The important fact about the origin subtask node is that it has no incoming edge. Hence subtask 0 does not require output from any predecessor subtask, because it is present at the initial precedence level 1. Another type of subtask node is the final node, subtask 9. There is no outgoing edge from subtask 9 because it is the last subtask of the sample DAG based task. As soon as subtask 9 finishes, our complex task is completed. However, subtask 9 can start executing only once the subtasks present in the previous precedence level (level 3) have finished and returned their results. Subtasks 1 to 8 have similar precedence level dependencies on their parent subtasks. We can schedule these subtask nodes on various computational resources of the existing P2P network. The benefit is that subtasks of the DAG based task present at the same precedence level can be executed in parallel. However, scheduling subtasks efficiently on the available computational resources is an NP-hard problem. Because of precedence constraints, the communication cost of sending a subtask from one computational resource to another varies. Moreover, the computational cost of a subtask varies across computational resources according to their computational capabilities. Further, as the size of the subtasks or the number of available resources increases, the complexity of finding a good schedule increases manifold. Makespan is the time needed to finish all subtasks of a DAG based task. The DAG based task scheduling problem aims at reducing the makespan while following the precedence constraints. This scheduling is better understood with a diagrammatic representation of the level by level scheduling of the sample DAG based task shown in Figure 2(a). We have taken the overlay P2P network shown in Figure 2(b). We consider that the DAG based task is generated on one node of this network, the origin node. Subtasks are executed either at the origin node or on its three direct neighbors.
Figure 3(a) shows how the first subtask is scheduled without regard to precedence constraints, as there is no parent task before subtask 0. While subtask 0 is executing on a node, the rest of the nodes do not execute any other subtask of the sample DAG based task, because all subtasks present at level 2 require the results of subtask 0.
As visible in Figure 4(a), we schedule all subtasks of level 2 in parallel once subtask 0 returns its results. Further, subtasks of level 3 start executing in parallel on the available computational resources when their parent tasks at level 2 have returned results. Scheduling of the subtasks of level 3 is shown in Figure 5(a).
Finally, subtasks present at level 4 start executing on a suitable node, as per the scheduling algorithm's policies, once all subtasks present at level 3 are complete. Figure 6(a) shows the scheduling of subtask 9.
Scheduling these subtasks of a DAG based task requires the scheduler to make decisions based on scheduling algorithms for DAG based tasks. First, all subtasks are assigned a priority and then arranged on the basis of their priority. A subtask at a lower precedence level gets a higher priority than a subtask at a higher precedence level. The subtask with top priority receives access to computational resources first. Once this topmost origin subtask is scheduled, the subtask with the second highest priority gets access to the available computational resources. The subtask under consideration is scheduled to a computational resource using a grid scheduling algorithm for dependent tasks. These scheduling algorithms are further divided into two classes, static and dynamic. Static scheduling algorithms are of various types, such as list algorithms, cluster algorithms, and duplication based algorithms. In a list scheduling algorithm, priority is first assigned to all subtasks, and then the subtask with the highest priority is scheduled to the node giving the earliest start time, whereas our approach schedules a subtask on the computational resource that finishes the subtask faster.
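The precedence levels and the priority ordering described above can be sketched in Python as follows. This is an illustrative sketch only: the edge set of `SAMPLE_DAG` is hypothetical (it reproduces the four levels of a 10-subtask DAG, not necessarily the exact edges of Figure 2(a)), and the function names are assumptions.

```python
from collections import deque


def precedence_levels(parents):
    """Return {subtask: level}; level 1 has no parents, else 1 + max parent level.

    `parents` maps every subtask to the list of subtasks it depends on.
    """
    children = {task: [] for task in parents}
    indegree = {task: len(deps) for task, deps in parents.items()}
    for task, deps in parents.items():
        for dep in deps:
            children[dep].append(task)

    level = {task: 1 for task in parents}
    queue = deque(sorted(t for t, d in indegree.items() if d == 0))
    processed = 0
    while queue:
        task = queue.popleft()
        processed += 1
        for child in children[task]:
            level[child] = max(level[child], level[task] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if processed != len(parents):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return level


def priority_sequence(parents):
    """Order subtasks so that lower precedence levels come first."""
    level = precedence_levels(parents)
    return sorted(level, key=lambda task: (level[task], task))


# Hypothetical edges: subtask 0 is the origin, subtask 9 is the final node.
SAMPLE_DAG = {
    0: [], 1: [0], 2: [0], 3: [0], 4: [0],
    5: [1, 2], 6: [2], 7: [3], 8: [4], 9: [5, 6, 7, 8],
}
```

Subtasks that share a level in the result of `precedence_levels(SAMPLE_DAG)` are the ones that may run in parallel once the previous level has returned its results.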

Proposed Algorithm
We propose a fully decentralized P2P grid scheduling (FDPGS) algorithm for DAG based tasks on the grid. In the next section, we use multiple variants of DAG based jobs for an exhaustive analysis. In this section, however, to explain the basic working of FDPGS we use a single DAG based job consisting of 10 interdependent subtasks. We take scheduling decisions with the help of the contents of the modified information list present on each node. The task's subtasks are interdependent, and they are represented by a DAG. The sample DAG taken into consideration is shown in Figure 2(a), and Figure 2(b) represents the overlay P2P network. All subtasks are scheduled based on the computing field and the communication cost attached to that subtask. In [16], the authors put forward the concept of the computing field to express the workload of a grid node in a consolidated manner. The method to calculate the computing field (CF) [16] is given in (1) of the following definition.
Here each waiting job in the node's queue has a size given in million instructions. The number of cores in the node is given by PE. A single core of the node can process MIPS PE million instructions per second. The computing field of a node is calculated with the help of (1). The entities required to calculate the computing field are obtained from the dynamic information list present on that node. The dynamic information list contains the values of various properties of the node and its direct neighbors. These values are called the entities of that node and its neighbors.
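As a minimal sketch of how a computing field might be evaluated from these quantities, the Python function below assumes that CF is the total size of the queued jobs divided by the node's aggregate processing rate (PE multiplied by MIPS PE). This form is an assumption inferred from the quantities named above; the definition in [16] remains authoritative, and the function and parameter names are hypothetical.

```python
def computing_field(queued_job_sizes_mi, pe, mips_per_pe):
    """Estimate a node's computing field.

    Assumed form: sum of queued job sizes (million instructions) divided by
    the node's total processing rate (cores * MIPS per core).
    """
    if pe <= 0 or mips_per_pe <= 0:
        raise ValueError("pe and mips_per_pe must be positive")
    return sum(queued_job_sizes_mi) / (pe * mips_per_pe)
```

Under this assumption, a node with two cores of 500 MIPS each and queued jobs of 1000 and 3000 million instructions has `computing_field([1000, 3000], 2, 500)` equal to 4.0.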
In our approach the modified information list (shown in Table 1) present on all P2P grid resources contains twelve entities. The first entity in the dynamic information list contains the distinctive names of the P2P grid nodes. The IP addresses of these nodes, denoted by IP, are stored in the second row of the list. Each node contains a different number of processing elements, given in the fourth row of the list; PE denotes the total number of processing elements present on a node. Further, the processing elements of each node have a different processing capacity, measured in million instructions per second (MIPS). MIPS stands for the MIPS of a single processing element present on a node and is stored in the fifth row of the list. Each node can have a different number of processing elements; however, processing elements present on the same node have an identical MIPS value. The sixth row holds the previous workload values of the P2P grid nodes. The previous workload value is used to calculate the new workload after a new subtask is assigned to a node. Wld Old represents the old workload existing on a P2P grid node. It is required to calculate the computation cost of the subtask under consideration at a particular node. Wsz stands for the window size between two nodes. The unit of the window size is kilobits per second, and it is stored in row seven of the information list. Along with Wsz we require the round trip time (RTT) between the two nodes to calculate the communication cost. The round trip time is stored in the eighth row of the list.
Finally, using the values of the seventh and eighth rows, we get the cost of transferring a subtask from one node to another, which is stored in the ninth row of the list. This entity is represented by Trt. The tenth row stores the load (ld) of an individual subtask on a node. We add the weight of the subtask to the old workload and get the assumed workload (Wld). The eleventh row stores the assumed workload (Wld) on a P2P grid resource if the subtask is assigned to it.
The new entities in the list are the third and the twelfth; the rest of the entities are the same as in [17]. The third entity gives the waiting time (Pd ld) for the selected subtask to start executing on any node. The waiting time is the time required by the subtask in the previous precedence level that takes the longest to finish to execute and send back its results. The twelfth row gives the time (Rb) when the output of the subtask is returned to the origin node from the node where it is executed.
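The twelve rows of the information list can be pictured as one record per node. The Python dataclass below is a sketch of that layout, following the row order given in the text; the class and field names are hypothetical, and no units beyond those stated in the text are assumed.

```python
from dataclasses import dataclass, fields


@dataclass
class InfoListEntry:
    """One column of the information list: the entities held for one node."""
    name: str        # row 1: distinctive name of the P2P grid node
    ip: str          # row 2: IP address of the node
    pd_ld: float     # row 3: waiting time before the subtask can start
    pe: int          # row 4: number of processing elements
    mips: float      # row 5: MIPS of a single processing element
    wld_old: float   # row 6: previous workload of the node
    wsz: float       # row 7: window size to the node, kilobits per second
    rtt: float       # row 8: round trip time to the node
    trt: float       # row 9: cost of transferring the subtask to the node
    ld: float        # row 10: load of the subtask on the node
    wld: float       # row 11: assumed workload if the subtask is assigned
    rb: float        # row 12: time at which the result is returned to the origin


ROW_COUNT = len(fields(InfoListEntry))
```

An origin node would hold one such record for itself and one for each direct neighbor, updating the dynamic fields as subtasks are assigned.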
Entities in the information list can be split into two sets: the first set consists of static entities and the second set consists of dynamic entities. The two new entities added to the information list used in [17] are shown as subsets of the second set in Figure 7.
The information list of a node and its neighbors is updated whenever the value of a dynamic entity changes. Even if there is no change in the value of the dynamic entities, the information list is refreshed after a fixed time interval. This causes extra network traffic [19]. However, the network traffic due to periodic updating of the information list on neighbor nodes is extremely low [19].
Figure 8 explains graphically the basic working of the FDPGS algorithm. Any P2P grid node where task T is located is known as the origin node. The origin node either executes the subtasks of task T itself or forwards a subtask to one of its direct neighbors, such that task T finishes in the minimum possible time.
According to the precedence order, we schedule the subtasks of DAG based task T one by one, using the fully decentralized P2P grid scheduling (FDPGS) algorithm shown in Algorithm 1. The algorithm consists of the following steps.
Priority Based Task Sequence. If a massive job is present at any node, then we arrange the subtasks of the job in nonincreasing order of their execution. DAG based task T is generated on the origin node. Task T consists of various subtasks, which are present on various precedence levels. We build a priority based task sequence of the subtasks of T on the basis of precedence level.
Selection of Subtask for Scheduling and Predecessor Prerequisite. Now we choose the first unscheduled subtask ST j from the task sequence. If the precedence level of subtask ST j is one, then the subtask has no predecessor. Hence the time for the predecessor subtask to finish and return results (Pd ld) is zero.
In this way we schedule all subtasks of the DAG using the fully decentralized P2P grid scheduling (FDPGS) algorithm. Our algorithm is fully decentralized because every node on which a huge job is present acts as the origin node for that job. In this example task T is present at one node, and that node acts as the origin node. If task T were present at a different node, that node would act as the origin node instead. The same is true for the other nodes of the P2P grid.
The FDPGS algorithm gives better results than the genetic algorithm, as can be confirmed from the next section.

Simulation Results
We consider a DAG task T that is divided into 10 subtasks, as shown in Figure 2(a). The size of each subtask is given in million instructions and its magnitude in Kb. Details of the subtasks of DAG based task T are shown in Table 2. This DAG based task T is generated on node 0 of the virtual network topology. In the overlay P2P network, node 0 has three direct neighbors: node 1, node 2, and node 3. Specifications of the P2P nodes are shown in Table 3.
When we schedule subtask 0 of task T randomly, it is assigned to node 3, which changes the magnitude of the workload on node 3. However, subtask 1 then has to wait longer before it can execute. The cause of this delay is that we first have to transfer subtask 0 from node 0 to node 3, which takes some time. We add the new workload and the transport time, and then add the round trip time between node 0 and node 3. This is how we get the waiting time for subtask 1.
Similarly we schedule the rest of the subtasks, and task T finishes at 16.10 seconds, as shown in Figure 9(a). The random schedule is obtained by running random scheduling 20 times.
A comparison of the finish times of the genetic algorithm and the FDPGS algorithm for 10 different DAG based tasks is shown in Figure 18. All DAG based tasks gave better results with the FDPGS algorithm than with the variant of the genetic algorithm proposed in [18]. As shown in Figure 19, when we use FDPGS to schedule a communication intensive DAG based task, the last subtask in the priority based task sequence always has a waiting time less than the one obtained by the genetic algorithm.

Figure 4: (a) Scheduling of subtasks present at level 2; (b) position of subtasks in DAG based task at level 2.

Figure 5: (a) Scheduling of subtasks present at level 3; (b) position of subtasks in DAG based task at level 3.

Figure 7: Information list with all its subsets.

Figure 8: Basic working of the FDPGS algorithm. Origin node: I am the origin node, where DAG based task T is located. I choose the subtasks (ST j) of T one by one, find the node for which the resultback time (Rb) of ST j has the smallest value, and assign subtask ST j to that node. Afterwards my information list is updated accordingly, with all the details required for scheduling subtasks, and I notify my direct neighbors to change the entities in their information lists. Neighbor node: I am a direct neighbor of the origin node; hence the origin node can send me any job. The values of the entities in my information list change as instructed by the origin node; my other neighbors are also instructed to change the values of my entities in their information lists.

Figure 11: Waiting time for subtasks of DAG based task T when we schedule using random scheduling, the genetic algorithm, and the FDPGS algorithm.

Figure 12: Utilization of P2P nodes in random scheduling.

Figure 13: Utilization of P2P nodes when genetic algorithm is used for scheduling.

Figure 15: Time taken by random scheduling, the genetic algorithm, and FDPGS to finish DAG based task T.

Figure 16: Memory used, in bytes, to find a schedule using random scheduling, the genetic algorithm, and the FDPGS algorithm.

Table 1: An example of an information list on a P2P grid node.
However, if the precedence levels of the previous and present subtasks are not the same, Pd ld becomes equal to the resultback value RbK. The third entity in the list, shown as the third row in Table 1, is the waiting time (Pd ld). The waiting time (Pd ld) is zero when the precedence level of ST j is one, and it is equal to RbK when the precedence level of ST j−1 is not equal to the precedence level of ST j.
In the equation for the cost of transferring a subtask from the origin node to a neighbor node, ST j is the size of the subtask in Kb, Wsz is the window size between the two nodes, and RTT is the round trip time between the two nodes.
The assumed load (Wld) on a node when subtask ST j is assigned to it is calculated using one of three equations; the equation whose condition is satisfied is used to calculate Wld. The assumed workload is calculated with the help of a general equation in which ST j is the size of subtask ST j in million instructions.
The resultback time (Rb) is the twelfth entity in the list. It is the time required by a node to compute a subtask and send the results back to the node where the subtask was generated. The resultback time depends on two factors. The first is the assumed workload of the node where subtask ST j is assigned, including the load of subtask ST j. The second is the round trip time needed to send the result from the node where the subtask is executed to the origin node, where DAG based task T was initially present. The resultback time is calculated as $Rb = Pd_{ld} + ld + RTT$, where Pd ld is the waiting time, ld is the load of subtask ST j on the node, and RTT is the round trip time between that node and the origin node.
Updating Information List and Finding Value of Resultback for Next Subtask in Task Sequence. Finally we update the value of Wld Old in the list on the node and in the lists of all its direct neighbors. After this, when the precedence level of ST j is one or the precedence level of ST j−1 is not equal to the precedence level of ST j, the value of RbK is determined for use by the next subtask in the task sequence.
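The node selection step can be sketched in Python using the resultback relation Rb = Pd ld + ld + RTT and the rule that a subtask goes to the candidate node with the smallest Rb. This is a simplified illustration rather than the full FDPGS algorithm: the per-node load `ld` is taken as an input instead of being derived from the workload equations, the function names are hypothetical, and giving the origin node a round trip time of zero is an assumption.

```python
def resultback_time(pd_ld, ld, rtt):
    """Rb = Pd_ld + ld + RTT for one candidate node."""
    return pd_ld + ld + rtt


def pick_node(pd_ld, candidates):
    """Return (node, Rb) for the candidate with the smallest resultback time.

    `candidates` maps a node name to a (ld, rtt) pair, where ld is the load
    the subtask would impose on that node and rtt is the round trip time
    between that node and the origin node.
    """
    rb = {
        node: resultback_time(pd_ld, ld, rtt)
        for node, (ld, rtt) in candidates.items()
    }
    best = min(rb, key=rb.get)
    return best, rb[best]


# Hypothetical values; the origin node is assumed to have RTT 0 to itself.
EXAMPLE_CANDIDATES = {
    "origin": (4.0, 0.0),
    "neighbor_1": (2.5, 0.5),
    "neighbor_2": (3.0, 0.25),
}
```

With a waiting time of 1.0, `pick_node(1.0, EXAMPLE_CANDIDATES)` selects `neighbor_1`, whose resultback time of 4.0 is lower than 4.25 for `neighbor_2` and 5.0 for the origin node.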