Complexity Analysis of New Task Allocation Problem Using Network Flow Method on Multicore Clusters

The task allocation problem (TAP) generally aims to minimize total execution cost and internode communication cost in traditional parallel computing systems. This paper investigates a new TAP (NTAP) that additionally considers intranode communication cost in emerging multicore cluster systems. We analyze the complexity of the NTAP with the network flow method and conclude that the intranode communication cost is key to the complexity of the NTAP. We prove that (1) the NTAP can be cast as a generalized linear network minimum cost flow problem and can be solved in $O(m^2 n^4)$ time if the intranode communication cost equals the internode communication cost, and (2) the NTAP can be cast as a generalized convex cost network minimum cost flow problem and can be solved in polynomial time if the intranode communication cost is more than the internode communication cost. In particular, the uniform cost NTAP can be cast as a convex cost flow problem and can be solved in $O(m^2 n^2 \log(m + n))$ time. Furthermore, solutions to the NTAP are also discussed. Our work extends currently known theoretical results, and the theorems and conclusions presented in this paper can provide a theoretical basis for task allocation strategies on multicore clusters.


Introduction
Since single core processors are rapidly reaching the physical limits of possible complexity and speed, computer architects have designed multicore processors, which place two or more processing cores on the same chip. Multicore processors are now an industry trend and are widely used for high performance computing. Further, multicore processors are being configured in a hierarchical manner to compose computing nodes or multicore nodes in cluster systems. Multicore clusters based on these computing nodes or multicore nodes have already become one of the most popular models in parallel computing [1,2].
However, for a multicore node, on one side, the future performance growth in multicore processors will almost certainly come from the exploitation of thread-level parallelism, which consequently can lead to memory access contention when multiple cores concurrently access shared resources such as memory, cache, and disk. The synchronization operations introduced to avoid the access contention can require a lot of overhead. In a larger-scale multicore node or in high-contention situations, synchronization can become a performance bottleneck because contention introduces additional delays and because latency is potentially greater in such a multicore computing node. On the other side, message distribution experiments have found that, on average, about 50% of messages are transferred through intranode communication, which is much higher than intuition suggests. This trend indicates that considering the intranode communication is as important as considering the internode communication on a multicore cluster [1]. As a matter of fact, synchronization can be considered a special form of communication [3]. Therefore, in this paper, to simplify the description, the intranode communication overhead and the synchronization overhead on a multicore node are together referred to as intranode communication cost. The intranode communication cost tends to increase dramatically when the numbers of multicore processors and of tasks communicating on a multicore computing node increase. A report from Berkeley [4] predicts multicore processors with thousands of parallel execution units as the mainstream hardware of the future. Thus, the intranode communication cost has become a key factor to be considered in the TAP on multicore clusters.
In traditional parallel computing systems, the task allocation problem (TAP) is to assign a set of tasks or modules to a set of processors or computing nodes so that the total execution cost and internode communication cost are minimized [5][6][7][8][9][10]. To the best of our knowledge, a new TAP considering overall execution cost, internode communication cost, and intranode communication cost in emerging multicore cluster systems has yet to be investigated. This paper proposes the new TAP (NTAP), which aims to minimize the total execution cost, internode communication cost, and intranode communication cost on multicore clusters. However, we encounter two important and challenging theoretical problems: (1) how can the complexity of the NTAP be efficiently analyzed, and (2) what are the effects of intranode communication cost on the complexity of the NTAP? To address these two problems, we analyze and prove the effects of the intranode communication cost on the complexity of the NTAP by constructing an equivalence relation between the NTAP and the minimum cost flow problem. Moreover, solutions to the NTAP for the different complexity cases are also discussed.
The rest of this paper is organized as follows. After describing related work in Section 2, some basic definitions are provided in Section 3. Complexity analysis of the NTAP is performed in Section 4. Solutions to the NTAP are discussed in Section 5. We conclude this paper in Section 6.

Related Work
TAP is a classical problem in the field of parallel computing research. Solution methods already suggested for this problem can be roughly classified into three categories [5], namely, the graph theoretic approach, the mathematical programming approach, and the heuristic approach. The graph theoretic approach uses a graph to represent the interconnections between modules and represents the tasks to be allocated as a set of nodes or vertices of a graph. The intermodular communication cost between each pair of tasks is represented by the weight of an undirected arc or edge connecting two nodes or vertices. A communication cost of zero means that there is no communication between tasks or computing nodes, while a communication cost of infinity indicates that the communicating nodes or vertices must be assigned to the same processor or computing node. The mathematical programming approach formulates task assignment as an optimization problem and solves it with mathematical programming techniques. The heuristic approach provides fast but suboptimal algorithms for task assignment, which are useful for applications where an optimal solution cannot be obtained in real time.
Our work is closely related to the graph theoretic approach, and our emphasis is on the network flow method, which is one of the important graph theoretic approaches. In the network flow method, each task and each processor is represented by a node or a vertex. The network flow model can be built according to the interconnections between modules, the interprocessor communication, and the task execution overhead on each processor and can be solved with maximum flow and minimum cut algorithms. Research by Stone [6] and Bokhari [7] has shown how an optimal assignment may be found efficiently for dual processor systems using a network flow algorithm. While an extension to three processors was developed by Stone [8], algorithms for four or more processors have not been found. Bokhari [9] has shown that the problem of finding an optimal assignment for four or more processors is an NP-complete problem and that the case where the graph of the communicating tasks, which we call the communication graph, is a tree can be solved exactly using dynamic programming. Towsley [10] generalized Bokhari's results to the case of series-parallel structures. From the theoretical point of view, by combining Bokhari's and Towsley's work, Fernandez-Baca [11] proposed polynomial time optimal algorithms for the case where the intertask communication graph is a k-tree. Lee et al. [12] and Cho and Park [13] have suggested optimal algorithms for the general structure problem in a linear array network with any number of processors. Fernandez de la Vega and Lamari [14] have investigated the case where all the tasks communicate with communication costs all equal to a constant $c_0$ and gave two exact polynomial time algorithms and a polynomial time approximation scheme using minimum cost flow theory. In addition, Bokhari [7] analyzed the problem of finding an optimal dynamic assignment of a modular program for a two-processor system and extended Stone's formulation of the static assignment problem to include the cost of dynamically reassigning a module from one processor to the other and the cost of module residence without execution. Yadav et al. [15] have extended this model and considered the dynamic TAP for a general program structure and heterogeneous processors in distributed computing systems.
The traditional TAP generally aims to minimize the total execution cost and internode communication cost without considering the intranode communication cost in multicore cluster computing, which frequently results in inefficient solutions since it cannot characterize and exploit the hierarchical design features and potential of multicore clusters. Compared with the above-mentioned traditional TAP, the NTAP additionally considers the intranode communication cost and can fully characterize and exploit the hierarchical design features and potential of multicore clusters, but it still remains to be studied.

Preliminaries
Without loss of generality, let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of $n$ tasks and let $P = \{p_1, p_2, \ldots, p_m\}$ be a set of $m$ computing nodes. Let us denote a task assignment by a vector $A = (a_1, a_2, \ldots, a_n) \in \{1, 2, \ldots, m\}^n$ and denote the total cost of an assignment by $C(A)$, where $a_i = k$ means that $t_i$ is allocated to $p_k$ with $1 \le i \le n$ and $1 \le k \le m$. If a task assignment minimizes the sum of execution cost, internode communication cost, and intranode communication cost, then we call it an optimal task assignment. Let $x_k$ be the number of tasks assigned to $p_k$ and let $e_{ik}$ be the execution cost of $t_i$ on $p_k$. Let the binary variable $x_{ik} \in \{0, 1\}$ be 1 if $t_i$ is assigned to $p_k$ and 0 otherwise. Let the ternary variable $y_{ijk} \in \{0, 1, 2\}$ be (1) 0 if neither $t_i$ nor $t_j$ is allocated to $p_k$, (2) 1 if exactly one of $t_i$ and $t_j$ is allocated to $p_k$, and (3) 2 if $t_i$ and $t_j$ are both allocated to $p_k$, where $1 \le k \le m$.
Let $c_{ij}$ denote the internode communication cost incurred between $t_i$ and $t_j$ when they are assigned to distinct computing nodes and let $I_{ij}$ denote the intranode communication cost incurred between $t_i$ and $t_j$ when they are allocated to the same computing node. We assume that $c_{ij} = 0$ if $a_i = a_j$, $I_{ij} = 0$ if $a_i \ne a_j$, and $c_{ij} = c_{ji}$, $I_{ij} = I_{ji}$. If, for any $t_i \in T$, $t_j \in T$ and arbitrary constants $c_0$ and $I_0$, $c_{ij} = c_0$ and $I_{ij} = I_0$, then this version of the NTAP is called the uniform-cost NTAP (UCNTAP); otherwise it is called the nonuniform-cost NTAP (NUCNTAP). In addition, we assume that $c_{ij}$ and $I_{ij}$ are independent of the computing nodes, which means that the computing nodes and the communication network of the multicore clusters considered in this paper are homogeneous.
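The cost model defined above can be made concrete with a small sketch. The function name `total_cost`, the argument layout, and the numbers in the toy instance are illustrative assumptions, not part of the paper; the sketch only adds up execution cost plus, for every pair of tasks, either the intranode cost (same node) or the internode cost (different nodes).

```python
from itertools import combinations

def total_cost(assign, exec_cost, inter, intra):
    """Total cost of a task assignment (illustrative helper, not from the paper).

    assign[i]       -- index of the computing node that task i is allocated to
    exec_cost[i][k] -- execution cost of task i on node k
    inter[i][j]     -- internode cost, charged when tasks i and j are on different nodes
    intra[i][j]     -- intranode cost, charged when tasks i and j share a node
    """
    cost = sum(exec_cost[i][k] for i, k in enumerate(assign))
    for i, j in combinations(range(len(assign)), 2):
        cost += intra[i][j] if assign[i] == assign[j] else inter[i][j]
    return cost

# Uniform-cost toy instance: 3 tasks, 2 nodes, c0 = 2, I0 = 3 (made-up numbers).
n, c0, I0 = 3, 2, 3
exec_cost = [[1, 4], [2, 1], [3, 3]]
inter = [[c0] * n for _ in range(n)]
intra = [[I0] * n for _ in range(n)]
```

Pairs that do not communicate at all can be modeled in this sketch by setting both of their entries to zero.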

Main Results
Some complexity problems of the NTAP on multicore clusters are analyzed in this section, and the main analysis results of this paper are stated in Sections 4.1, 4.2, and 4.3. In addition, we specify the initial amount of flow as $n$ and the flows on all edges as integer flows.

Analysis of Communication
(2) Proving the equivalence between the UCNTAP and the MCF problem. Firstly, we prove that each feasible flow corresponds to a task assignment. With the initial amount of flow being $n$, for any task vertex $v_i \in V_T$, the amount of flow entering $v_i$ equals 1. According to the flow conservation law, the amount of flow leaving $v_i$ is also equal to 1. As the flows on all edges are integer flows, exactly one of the edges emanating from $v_i$ carries an amount of flow 1. In other words, the $i$th task corresponding to $v_i$ is assigned to one and only one computing node $p_{a_i}$. Given any feasible flow $f_0$, without loss of generality, we assume that the set of all edges having amount of flow 1 and pointing to vertices of the computing vertex set $V_P$ from vertices of $V_T$ is $\{(v_1, u_{a_1}), (v_2, u_{a_2}), \ldots, (v_n, u_{a_n})\}$; then, the feasible flow $f_0$ corresponds to a task assignment $A_0 = (a_1, a_2, \ldots, a_n)$.
Secondly, we prove that each task assignment corresponds to a feasible flow. Given any task assignment $A_0$, we can construct a feasible flow $f_0$ as follows. With the number of tasks being $n$, the initial amount of flow is $n$; that is, the amount of flow entering any $v_i \in V_T$ equals 1. If the $i$th task $t_i$ is allocated to $p_{a_i}$, then the amount of flow on edge $(v_i, u_{a_i})$ equals 1. Therefore, we can construct a feasible flow $f_0$ in which all the edges having amount of flow 1 and pointing to computing vertices from task vertices constitute the edge set $\{(v_1, u_{a_1}), (v_2, u_{a_2}), \ldots, (v_n, u_{a_n})\}$.
Lastly, we prove that the total cost of the feasible flow equals the total cost of the corresponding task assignment and that the MCF corresponds to an optimal task assignment. Clearly, the cost function $0.5x_k(n - x_k)c_0 + 0.5x_k(x_k - 1)I_0$ of the MCF problem corresponds to the sum of internode communication cost and intranode communication cost, and $e_{ik}x_{ik}$ corresponds to execution cost. Hence, the total cost of any feasible flow equals the total cost of the corresponding task assignment. In addition, for any MCF $f^*$, assume that $f^*$ corresponds to a nonoptimal task assignment $A_0$; that is, $C(f^*) = C(A_0)$; then, there must exist an optimal task assignment $A^*$ such that $C(A^*) < C(A_0)$. Furthermore, $A^*$ must correspond to a feasible flow $f_0$ such that $C(A^*) = C(f_0)$, so $C(f_0) < C(f^*)$, which contradicts that $f^*$ is an MCF. Thus, each MCF must correspond to an optimal task assignment.
(3) Analyzing the effect of communication cost on problem complexity. Now we analyze the effect of communication cost on the complexity of the NTAP by analyzing the effect of the cost function on the complexity of the MCF problem. According to the construction process, the quadratic cost function of the MCF problem is given as

$f(x_k) = 0.5x_k(n - x_k)c_0 + 0.5x_k(x_k - 1)I_0 = 0.5(I_0 - c_0)x_k^2 + 0.5(nc_0 - I_0)x_k$. (1)

The convexity/concavity of the quadratic cost function is determined by the quadratic coefficient $I_0 - c_0$. According to the sign of $I_0 - c_0$, the MCF problem can be distinguished as

a linear cost network MCF problem if $I_0 - c_0 = 0$; a convex cost network MCF problem if $I_0 - c_0 > 0$; a concave cost network MCF problem if $I_0 - c_0 < 0$. (2)

Here, the MCF problem is a P-problem in the cases of a linear cost network and a convex cost network, and the concave cost network MCF problem is an NP-hard problem. Hence, we can conclude that the UCNTAP is a P-problem if the intranode communication cost is not less than the internode communication cost, in which case it can be transformed into a convex cost network MCF problem. The convex cost network MCF problem can be solved in $O(|E| \log |V| (|E| + |V| \log |V|))$ time [16], where $|E|$ denotes the number of edges and $|V|$ denotes the number of vertices. Thus, the UCNTAP can be solved in $O(m^2 n^2 \log(m + n))$ time if the intranode communication cost is not less than the internode communication cost, where $m$ denotes the number of computing nodes or multicore nodes and $n$ denotes the number of tasks.
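The reduction used in this proof can be sketched in code for small instances. The sketch below is an illustration under stated assumptions, not the algorithm of [16]: it replaces each convex terminal edge by $n$ parallel unit-capacity arcs carrying the marginal costs of the quadratic cost function, and then runs a plain successive shortest path method with Bellman-Ford, which does not attain the time bound quoted above. The names `uc_edge_cost` and `solve_ucntap` and the vertex numbering are hypothetical.

```python
from collections import deque

def uc_edge_cost(x, n, c0, I0):
    """Terminal-edge cost 0.5*x*(n-x)*c0 + 0.5*x*(x-1)*I0 for x tasks on one node."""
    return 0.5 * x * (n - x) * c0 + 0.5 * x * (x - 1) * I0

def solve_ucntap(exec_cost, c0, I0):
    """Min-cost flow for the uniform-cost case with I0 >= c0 (convex terminal costs).

    Each terminal edge of capacity n becomes n parallel unit arcs whose costs are
    the marginal costs f(x) - f(x-1); for convex f this is exact on integer flows.
    Returns (optimal total cost, list mapping each task to its node index).
    """
    assert I0 >= c0, "the unit-arc expansion is only valid for convex costs"
    n, m = len(exec_cost), len(exec_cost[0])
    s, t = 0, n + m + 1                      # tasks are 1..n, nodes are n+1..n+m
    graph = [[] for _ in range(n + m + 2)]   # arc = [head, capacity, cost, reverse index]

    def add_arc(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n):
        add_arc(s, 1 + i, 1, 0.0)
        for k in range(m):
            add_arc(1 + i, 1 + n + k, 1, float(exec_cost[i][k]))
    for k in range(m):
        for x in range(1, n + 1):
            marginal = uc_edge_cost(x, n, c0, I0) - uc_edge_cost(x - 1, n, c0, I0)
            add_arc(1 + n + k, t, 1, marginal)

    total = 0.0
    for _ in range(n):                       # one augmentation per task (unit flows)
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[s] = 0.0
        queue, queued = deque([s]), [False] * len(graph)
        while queue:                         # queue-based Bellman-Ford on the residual graph
            u = queue.popleft()
            queued[u] = False
            for idx, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                    dist[v], prev[v] = dist[u] + cost, (u, idx)
                    if not queued[v]:
                        queue.append(v)
                        queued[v] = True
        v = t
        while v != s:                        # push one unit along the cheapest path
            u, idx = prev[v]
            graph[u][idx][1] -= 1
            graph[v][graph[u][idx][3]][1] += 1
            v = u
        total += dist[t]

    # A task is assigned to the node whose task-to-node arc is saturated.
    assign = [next(arc[0] - 1 - n for arc in graph[1 + i] if arc[0] > n and arc[1] == 0)
              for i in range(n)]
    return total, assign
```

Calling `solve_ucntap([[1, 4], [2, 1], [3, 3], [5, 2]], 2, 3)` returns the minimum total cost together with one optimal assignment; the concave case $I_0 < c_0$ is rejected because the unit-arc expansion would no longer be exact.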

Complexity Analysis of the NUCNTAP
Theorem 4. For any $t_i \in T$ and $t_j \in T$, the NUCNTAP is a P-problem and can be solved in polynomial time if the intranode communication cost is not less than the internode communication cost.
Proof. (1) Transforming the NUCNTAP into a generalized network MCF problem. As shown in Figure 2, the NUCNTAP can be modeled as a generalized MCF problem on a network $G$, of which all vertices, with the exception of the source vertex $s$ and the terminal vertex $t$, are divided into three levels. The first level is a task vertex level $V_T = \{v_1, v_2, \ldots, v_n\}$, where the vertex $v_i$ corresponds to the $i$th task $t_i$. The second level is a task assignment vertex level $V_A = \{v_{1:1}, v_{1:2}, \ldots, v_{n:m}\}$. If the amount of flow through vertex $v_{i:q}$ equals 1 (0), then it denotes that $t_i$ is (not) allocated to the $q$th computing node $p_q$. The third level is a task pair assignment vertex level $V_R = \{v_{1,2:1}, v_{1,2:2}, \ldots, v_{n-1,n:m}\}$. For the amount of flow through vertex $v_{i,j:q}$, a value of 0 denotes that neither the $i$th task $t_i$ nor the $j$th task $t_j$ is assigned to $p_q$; a value of 1 denotes that exactly one of $t_i$ and $t_j$ is allocated to $p_q$; a value of 2 denotes that $t_i$ and $t_j$ are both allocated to $p_q$.
The edges of network $G$ can be divided into four levels. The first level is $E_1 = \{(s, v_i)\}$, a set of edges having capacity 1, cost 0, and gain 1. The second level is $E_2 = \{(v_i, v_{i:q})\}$, a set of edges having capacity 1, cost $e_{iq}x_{iq}$, and gain $n - 1$. The third level is $E_3 = \{(v_{i:q}, v_{i,j:q})\} \cup \{(v_{j:q}, v_{i,j:q})\}$, a set of edges having capacity 1, cost 0, and gain 1. The fourth level is $E_4 = \{(v_{i,j:q}, t)\}$, a set of edges having capacity 2, cost $0.5y_{ijq}(2 - y_{ijq})c_{ij} + 0.5y_{ijq}(y_{ijq} - 1)I_{ij}$, and gain 1. The cost network is a generalized cost network because the gain coefficients on the edges of $G$ are not all 1. In addition, we specify the initial amount of flow as $n$ and the flows on all edges as integer flows.
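To make the size of this construction tangible, the four edge levels can be enumerated with a short sketch. The function name `build_generalized_network` and the tuple-based vertex labels are illustrative assumptions; edge costs are left out, and only capacities and gains as described above are recorded.

```python
from itertools import combinations

def build_generalized_network(n, m):
    """Return the four edge levels as lists of (tail, head, capacity, gain).

    Vertices are labelled "s", "t", ("task", i), ("assign", i, q) and
    ("pair", i, j, q); these labels are hypothetical. Costs are omitted.
    """
    E1 = [("s", ("task", i), 1, 1) for i in range(n)]
    E2 = [(("task", i), ("assign", i, q), 1, n - 1)
          for i in range(n) for q in range(m)]
    E3 = []
    for i, j in combinations(range(n), 2):
        for q in range(m):
            E3.append((("assign", i, q), ("pair", i, j, q), 1, 1))
            E3.append((("assign", j, q), ("pair", i, j, q), 1, 1))
    E4 = [(("pair", i, j, q), "t", 2, 1)
          for i, j in combinations(range(n), 2) for q in range(m)]
    return E1, E2, E3, E4
```

With $n$ tasks and $m$ computing nodes the levels contain $n$, $nm$, $n(n-1)m$, and $n(n-1)m/2$ edges respectively, so the edge count grows on the order of $mn^2$, and every task assignment vertex has exactly $n - 1$ outgoing edges, matching the gain $n - 1$ on the second level.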
(2) Proving the equivalence between the NUCNTAP and the generalized MCF problem. Firstly, we prove that each feasible flow corresponds to a task assignment. With the initial amount of flow being $n$, for any $v_i \in V_T$, the amount of flow entering $v_i$ is equal to 1. According to the flow conservation law, among the $m$ edges leaving $v_i$, there is one and only one edge with amount of flow 1 and all other $m - 1$ edges have amount of flow 0. That is to say, the task $t_i$ corresponding to $v_i$ is allocated to one and only one computing node $p_{a_i}$. Given any feasible flow $f_0$, without loss of generality, we assume that the set of all edges having amount of flow 1 and pointing to vertices of $V_A$ from vertices of $V_T$ is $\{(v_1, v_{1:a_1}), (v_2, v_{2:a_2}), \ldots, (v_n, v_{n:a_n})\}$; then, the feasible flow $f_0$ corresponds to a task assignment $A_0 = (a_1, a_2, \ldots, a_n)$.
Secondly, we prove that each task assignment corresponds to a feasible flow. Given any task assignment $A_0 = (a_1, a_2, \ldots, a_n)$, we can construct a feasible flow $f_0$ as follows. With the number of tasks being $n$, the initial amount of flow is $n$; that is, the amount of flow entering any $v_i \in V_T$ is equal to 1. If $t_i$ is allocated to $p_{a_i}$, then the amount of flow on edge $(v_i, v_{i:a_i}) \in E_2$ is equal to 1. The amount of flow on $E_3$ can be determined once the amount of flow on $E_2$ has been determined. For any edge $(v_i, v_{i:a_i}) \in E_2$ having amount of flow 1 and gain coefficient $n - 1$, we can make the amount of flow leaving $v_{i:a_i}$ equal to $n - 1$. Thereby, the amount of flow on each of the $n - 1$ edges leaving $v_{i:a_i}$ and having capacity 1 equals 1, and we can construct a feasible flow $f_0$, where the edges of $E_2$ with amount of flow 1 constitute the edge set $\{(v_1, v_{1:a_1}), (v_2, v_{2:a_2}), \ldots, (v_n, v_{n:a_n})\}$.
Lastly, we prove that the total cost of the feasible flow equals the total cost of the corresponding task assignment and that the MCF corresponds to an optimal task assignment. Clearly, the cost function $0.5y_{ijq}(2 - y_{ijq})c_{ij} + 0.5y_{ijq}(y_{ijq} - 1)I_{ij}$ of the generalized network MCF problem corresponds to the sum of internode communication cost and intranode communication cost, and $e_{iq}x_{iq}$ corresponds to execution cost. Therefore, the total cost of any feasible flow equals the total cost of the corresponding task assignment. For any MCF $f^*$, assume that $f^*$ corresponds to a nonoptimal task assignment $A_0$; namely, $C(f^*) = C(A_0)$; then, there must exist an optimal task assignment $A^*$ such that $C(A^*) < C(A_0)$. Furthermore, $A^*$ must correspond to a feasible flow $f_0$ such that $C(A^*) = C(f_0)$, so $C(f_0) < C(f^*)$, which contradicts that $f^*$ is an MCF. Thus, each MCF must correspond to an optimal task assignment.
(3) Analyzing the effect of communication cost on problem complexity. We analyze the effect of internode communication cost and intranode communication cost on the complexity of the NTAP by analyzing the effect of the cost function on the complexity of the generalized network MCF problem. According to the construction process, the quadratic cost function of the generalized network MCF problem is given as

$g(y_{ijq}) = 0.5y_{ijq}(2 - y_{ijq})c_{ij} + 0.5y_{ijq}(y_{ijq} - 1)I_{ij} = 0.5(I_{ij} - c_{ij})y_{ijq}^2 + (c_{ij} - 0.5I_{ij})y_{ijq}$. (3)

The convexity/concavity of the quadratic cost function is determined by the quadratic coefficient $I_{ij} - c_{ij}$. According to the sign of $I_{ij} - c_{ij}$, the MCF problem can be distinguished as

a generalized linear cost network MCF problem if $I_{ij} - c_{ij} = 0$; a generalized convex cost network MCF problem if $I_{ij} - c_{ij} > 0$; a generalized concave cost network MCF problem if $I_{ij} - c_{ij} < 0$. (4)

Here, the generalized linear cost network MCF problem and the generalized convex cost network MCF problem are P-problems, and the generalized concave cost network MCF problem is an NP-hard problem. Hence, we can conclude that the NUCNTAP is a P-problem if the intranode communication cost is not less than the internode communication cost, in which case it can be cast as a convex cost network MCF problem. The generalized convex cost network MCF problem can be solved in time polynomial in the number of edges $|E|$ and the number of vertices $|V|$ [16]. Thus, the NUCNTAP can be solved in $O(m^2 n^4)$ time if the intranode communication cost equals the internode communication cost, where $m$ denotes the number of computing nodes or multicore nodes and $n$ denotes the number of tasks.
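The case distinction in this step depends only on the sign of the quadratic coefficient, which a few lines of code can illustrate. The helper names `pair_edge_cost` and `classify_cost` are hypothetical; the cost expression is the fourth-level edge cost used in the construction above.

```python
def pair_edge_cost(y, c_ij, I_ij):
    """Cost 0.5*y*(2-y)*c_ij + 0.5*y*(y-1)*I_ij of a fourth-level edge with flow y."""
    return 0.5 * y * (2 - y) * c_ij + 0.5 * y * (y - 1) * I_ij

def classify_cost(c, I):
    """Classify the quadratic edge cost by the sign of its leading coefficient 0.5*(I - c)."""
    if I == c:
        return "linear"
    return "convex" if I > c else "concave"
```

At the three admissible integer flows the cost evaluates to 0 for $y = 0$, $0.5c_{ij}$ for $y = 1$, and $I_{ij}$ for $y = 2$, which is why the comparison between $I_{ij}$ and $c_{ij}$ alone decides the shape of the cost.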

Discussing Solutions to the NTAP
The effects of communication cost on the complexity of the NTAP have been analyzed and proven. Solutions to the NTAP are discussed in this section. Bokhari [9] has shown that the traditional TAP for four or more processors is an NP-complete problem, and the NTAP, which adds the intranode communication cost, is no easier. Therefore, solving the NTAP is a challenging problem.
The NTAP can be modeled as a generalized network flow model and thus can be solved with minimum cost flow algorithms. However, solutions differ considerably in complexity depending on the convexity/concavity of the minimum cost flow problem [17]. In general, the NTAP is an NP-hard problem for which no polynomial time algorithm is known, and it is usually solved with approximation algorithms or heuristic suboptimal algorithms [5]. When the intranode communication cost equals the internode communication cost, the NTAP can be cast as a linear network minimum cost flow problem and can be solved with the flow augmentation approach or the primal approach [17]. When the intranode communication cost is more than the internode communication cost, the convex network minimum cost flow problem can be converted into a linear network minimum cost flow problem and thus can also be solved with the flow augmentation approach or the primal approach. The transformation process is shown in Figure 3. The convex cost on each edge of set $E_4$ in Figure 2 can be approximately represented as a piecewise linear cost, and each convex cost curve shown in Figure 3(a) can be approximately represented as two linear cost edges or arcs shown in Figure 3(b). Thus, the convex network minimum cost flow problem can be converted into a linear network minimum cost flow problem to be solved.
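The conversion described above can be sketched for a single fourth-level edge. The function name `split_convex_edge` is hypothetical, and the sketch assumes the integer flows specified earlier: with capacity 2, the two unit arcs carry the marginal costs of the convex cost function, so the piecewise linear representation agrees with the quadratic cost at the flow values 0, 1, and 2.

```python
def split_convex_edge(c_ij, I_ij):
    """Replace one capacity-2 convex-cost edge by two unit-capacity linear arcs.

    The arc costs are the marginal costs g(1) - g(0) and g(2) - g(1) of
    g(y) = 0.5*y*(2-y)*c_ij + 0.5*y*(y-1)*I_ij. Valid only when I_ij >= c_ij,
    because only then is the first arc no more expensive than the second.
    """
    if I_ij < c_ij:
        raise ValueError("cost is concave; the two-arc representation is not valid")

    def g(y):
        return 0.5 * y * (2 - y) * c_ij + 0.5 * y * (y - 1) * I_ij

    # Each entry is (capacity, cost per unit of flow), cheapest arc first.
    return [(1, g(1) - g(0)), (1, g(2) - g(1))]
```

A minimum cost flow always fills the cheaper arc first, so sending $y$ units over the two arcs costs exactly as much as the original edge with flow $y$; in the concave case this ordering fails, which is consistent with the hardness result for concave costs.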
Furthermore, the mathematical programming model corresponding to the model represented in Figure 2 can be written as formulation (5), so the NTAP can also be solved with mathematical programming approaches:

Min ∑ (5)

where $\delta^+(v)$ denotes the outgoing edge set of vertex $v$ and $\delta^-(v)$ denotes the incoming edge set of vertex $v$; $c_e = e_{iq}x_{iq}$ when $e$ denotes edge $(v_i, v_{i:q})$; $c_e = [(1 - 0.5y_{ijq})c_{ij} + 0.5(y_{ijq} - 1)I_{ij}]y_{ijq}$ when $e$ denotes edge $(v_{i,j:q}, t)$; $f_e$ denotes the amount of flow on edge $e$; and $V$ denotes the vertex set of $G$.
In fact, the results shown in [18] demonstrate that the solution to the NTAP presented in this paper is particularly efficient when a large number of tasks communicate, solving reasonably large problems faster than other exact approaches available.

Conclusions
This paper investigates the effects of communication cost on the complexity of the NTAP and demonstrates the relationships between complexity and communication cost. We have proved that (1) the NTAP can be solved in $O(m^2 n^4)$ time if the intranode communication cost equals the internode communication cost; (2) the NTAP can be solved in polynomial time if the intranode communication cost is more than the internode communication cost and, specifically, the UCNTAP can be solved in $O(m^2 n^2 \log(m + n))$ time; (3) the NTAP is an NP-hard problem if the intranode communication cost is less than the internode communication cost, which indicates that efficient algorithms for this case still remain to be further investigated. Furthermore, solutions to the NTAP are also discussed and need to be further studied. Our work extends currently known theoretical results, and the theorems and conclusions presented in this paper can provide a theoretical basis for task allocation strategies in multicore cluster systems.

Figure 2: The generalized MCF problem equivalent to NUCNTAP.

Figure 3: The piecewise linear approximation representation for convex cost function: (a) convex cost function and (b) piecewise linear approximation representation.
Analysis of Communication Cost for a Single Computing Node
Theorem 1. For the UCNTAP and any $p_k$ with $x_k$ tasks, if one supposes that every pair of the $n$ tasks communicates, then the total internode communication cost and the total intranode communication cost incurred on $p_k$ are $x_k(n - x_k)c_0$ and $0.5x_k(x_k - 1)I_0$, respectively.
Proof. If $x_k$ tasks are allocated to $p_k$, then the other $n - x_k$ tasks must be assigned to other computing nodes and there are $x_k(n - x_k)$ internode communications on $p_k$ in all, and thus the total internode communication cost on $p_k$ is equal to $x_k(n - x_k)c_0$. The intranode communication cost is only incurred between any two of the $x_k$ tasks, and therefore the total intranode communication cost is $0.5x_k(x_k - 1)I_0$.
Corollary 2. For the NUCNTAP, the internode communication cost and the intranode communication cost incurred on $p_k$ between any two tasks $t_i$ and $t_j$ are $y_{ijk}(2 - y_{ijk})c_{ij}$ and $0.5y_{ijk}(y_{ijk} - 1)I_{ij}$, respectively, where $y_{ijk} = x_{ik} + x_{jk}$.
Proof. From Theorem 1, the total internode communication cost incurred on $p_k$ equals $x_k(n - x_k)c_0$. When considering only two tasks $t_i$ and $t_j$, we have $n = 2$, $c_0 = c_{ij}$, and $x_k = x_{ik} + x_{jk} = y_{ijk}$, so the internode communication cost incurred on $p_k$ is $y_{ijk}(2 - y_{ijk})c_{ij}$. Similarly, the intranode communication cost on $p_k$ is $0.5y_{ijk}(y_{ijk} - 1)I_{ij}$.

Complexity Analysis of the UCNTAP
Theorem 3. The UCNTAP is a P-problem and can be solved in polynomial time if $I_0 \ge c_0$.
Proof. (1) Transforming the UCNTAP into a minimum cost flow problem. As shown in Figure 1, the UCNTAP can be modeled as a minimum cost flow (MCF) problem on a network $G$. The $i$th task corresponds to a task vertex $v_i$ and all tasks correspond to a set $V_T = \{v_1, v_2, \ldots, v_n\}$. Similarly, the $k$th computing node $p_k$ corresponds to a computing vertex $u_k$ and all computing nodes correspond to a set $V_P = \{u_1, u_2, \ldots, u_m\}$. The source $s$ is connected to all task vertices by source edges of capacity 1 and cost 0, and all computing vertices are connected to the terminal $t$ by terminal edges of capacity $n$ and cost $0.5x_k(n - x_k)c_0 + 0.5x_k(x_k - 1)I_0$, where $1 \le k \le m$. Moreover, each task vertex is connected to all computing vertices by edges of capacity 1 and cost $e_{ik}x_{ik}$.
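The counting argument behind Theorem 1 can be checked numerically by enumerating task pairs. The function name `node_communication_counts` and the sample assignment are illustrative assumptions; the quantity `on_k` computed for each pair plays the role of the ternary variable from Corollary 2, taking the values 0, 1, or 2.

```python
from itertools import combinations

def node_communication_counts(assign, k):
    """Count communications touching node k when every pair of tasks communicates.

    Returns (internode, intranode): the number of pairs with exactly one task
    on node k, and the number of pairs with both tasks on node k.
    """
    inter = intra = 0
    for i, j in combinations(range(len(assign)), 2):
        on_k = (assign[i] == k) + (assign[j] == k)  # 0, 1, or 2 tasks of the pair on node k
        inter += on_k == 1
        intra += on_k == 2
    return inter, intra
```

Multiplying the two counts by the uniform internode and intranode costs gives the totals stated in Theorem 1; for example, with 5 tasks of which 3 sit on the chosen node, the counts are $3 \cdot 2 = 6$ and $0.5 \cdot 3 \cdot 2 = 3$.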