The quality of task scheduling is critical to the network communication efficiency and the performance of the entire Network-on-Chip- (NoC-) based multiprocessor System-on-Chip (MPSoC). In this paper, the NoC-based MPSoC design process is divided into two steps: first, scheduling subtasks to processing elements (PEs) of appropriate type and quantity; second, mapping these PEs onto the switching nodes of the NoC topology. After the task model is improved to better reflect real intertask relations, an optimized particle swarm optimization (PSO) algorithm is used in the first step to reduce the expected task running and transfer cost and to minimize task execution time. Based on the NoC topology and the communication diagram produced by the first step, the second step minimizes the expected network transmission delay while reducing resource consumption and balancing power consumption. Comparative experiments show favorable resource and power consumption when the algorithm is adopted in an actual system design.
Advances in integrated circuit technology have provided strong support for integrating multiple processing elements (PEs) on a single chip, and on-chip communication between cores has evolved from bus-based approaches to two-dimensional and three-dimensional Networks-on-Chip (NoCs). The network-based, highly parallel System-on-Chip (SoC) structure has become the inevitable choice for the next generation of complex computer architectures [
Given the tasks, the types and numbers of available PEs, and the NoC topology, NoC-based task scheduling and IP mapping assign tasks to suitable PEs and map the PEs onto the network topology, improving system efficiency as much as possible while the whole system meets its power consumption and delay requirements. Its significance is threefold: (1) it serves as the bridge between applications and architecture and determines how tasks are implemented, as well as the processing performance and efficiency of the architecture; (2) because heterogeneous multicore architectures are usually associated with particular application fields, efficient task scheduling can better support applications in those fields; and (3) as tasks and multicore system architectures grow in size, efficiently dividing the mapping problem helps improve the quality and efficiency of mapping-space exploration and thereby the performance and efficiency of the entire SoC.
Current research seldom distinguishes clearly between task scheduling and IP mapping, and modeling and analysis are conducted on the assumption that each PE performs only one subtask (in some algorithms, subtasks are simply treated as PEs). That is, the task is abstracted into a simple task model that gives only the calling relationships between subtasks; based on this information, the scheduling algorithm allocates as little uptime as possible [
In addition, according to when scheduling decisions are made, task scheduling can be divided into static scheduling and dynamic scheduling. In static scheduling, the compiler makes scheduling decisions at compile time, for example, list-based algorithms [
In dynamic scheduling, a scheduler assigns tasks to appropriate processors at run time according to their performance so that the various system requirements can be met. Research in this area mainly employs heuristic algorithms, such as the genetic algorithm (GA) [
Meanwhile, regarding NoC topology, through-silicon via (TSV) technology [
Based on the analysis above, the whole design process is divided into two stages. As shown in Figure
Two stages of task scheduling and IP mapping.
The rest of the paper is organized as follows. Section
Although the types and quantities of PEs integrated in NoC-based heterogeneous multicore systems keep growing, application tasks vary in size, and current task scheduling algorithms often assign and map tasks according to the number of usable PEs. For some small tasks, this can cause problems. On the one hand, when a task is divided into extremely small subtasks, communication among subtasks becomes too frequent, which may prolong task execution time; on the other hand, underutilizing the performance of PEs may increase system power consumption and reduce overall system efficiency.
This paper assigns tasks to a PE until the PE's computing resources are occupied up to an appropriate ratio (set according to the performance requirements of the system and of the PEs), and only then adds a new PE. This approach ensures that tasks are divided into subtasks of appropriate size and that every PE invoked is used efficiently, thereby improving overall performance.
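The fill-then-add rule above can be sketched as a simple sequential packing. This is a minimal sketch under stated assumptions: each subtask's load on its PE type is given as an integer utilization percentage, `max_ratio` stands for the appropriate occupation ratio, and subtasks are packed in the order given. The names `pack_subtasks`, `load`, and `max_ratio` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def pack_subtasks(subtasks: Sequence[int], load: Dict[int, int],
                  max_ratio: int) -> List[List[int]]:
    """Group subtasks (already assigned to one PE type) onto PEs of that type.

    Hypothetical sketch: load[s] is subtask s's utilization of one PE in
    percent, and max_ratio is the occupation ratio (percent) at which a PE
    counts as full. A new PE is opened only when the current one cannot take
    the next subtask without exceeding max_ratio.
    """
    pes: List[List[int]] = []
    used = 0  # utilization of the most recently opened PE
    for s in subtasks:
        if pes and used + load[s] <= max_ratio:
            pes[-1].append(s)      # keep filling the current PE
            used += load[s]
        else:
            pes.append([s])        # current PE is full (or none yet): add a PE
            used = load[s]
    return pes


# Hypothetical loads for four subtasks of one PE type, with an 80% cap.
print(pack_subtasks([1, 5, 9, 10], {1: 40, 5: 30, 9: 50, 10: 20}, 80))
# [[1, 5], [9, 10]]
```

A subtask whose load alone exceeds `max_ratio` still gets a PE of its own. The sketch ignores precedence constraints between subtasks, which a real scheduler would also have to respect.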
A task could be divided into
Type ( PCU: the running cost of every type of PE per unit time, in which element
The goal of task dividing and scheduling is to find a proper assignment and scheduling strategy that satisfies the task processing order and resource limitations and that could assign
The resource occupation of each subtask is represented by indirect encoding. The encoding length is determined by the number of subtasks (one position per subtask), and each particle corresponds to one task assignment strategy.
Assume there exist
Example of particle coding.
Subtask number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Type of PE | 3 | 2 | 1 | 1 | 3 | 2 | 1 | 2 | 3 | 3 |
Example of decoding.
Type of PE | Subtask number |
---|---|
1 | 3, 4, 7 |
2 | 2, 6, 8 |
3 | 1, 5, 9, 10 |
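Decoding a particle amounts to grouping subtasks by PE type. The sketch below assumes a particle is a list whose i-th entry (for 1-based subtask number i) is the PE type, and it reproduces the decoding table above from the particle-coding example. The function name `decode_particle` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def decode_particle(particle: Sequence[int]) -> Dict[int, List[int]]:
    """Map each PE type to the (1-based) subtask numbers assigned to it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for subtask, pe_type in enumerate(particle, start=1):
        groups[pe_type].append(subtask)
    return dict(sorted(groups.items()))  # order by PE type


particle = [3, 2, 1, 1, 3, 2, 1, 2, 3, 3]  # the particle-coding example
for pe_type, subtasks in decode_particle(particle).items():
    print(pe_type, subtasks)
# 1 [3, 4, 7]
# 2 [2, 6, 8]
# 3 [1, 5, 9, 10]
```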
Task dividing.
Type of PE | Number of PE | Subtask number |
---|---|---|
1 | 1 | 3, 4, 7 |
2 | 2 | 2, 6, 8 |
3 | 3 | 1, 5 |
3 | 4 | 9, 10 |
It follows from the task model that the running time of every subtask in different PEs is already known. The running time on every type of PE is defined as
Assuming that the task set in the
Assuming that the population size is
The fitness function of time is defined as
The overall fitness function is obtained as follows:
The algorithm favors particles with higher fitness values, since they provide a good basis for generating better particles in the next generation.
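The time, cost, and overall fitness formulas are not reproduced in this section, so the following is only a hypothetical stand-in showing how a particle could be scored. It assumes a running-time table `run_time[i][k]` (time of subtask i on PE type k), the per-type unit-time cost PCU, a running cost of PCU × running time summed over subtasks, and an execution time approximated by the busiest PE type's total running time (ignoring precedence and transfer cost). The weights `w_cost` and `w_time` and the reciprocal weighted sum are assumptions, not the paper's formula.

```python
from typing import Dict, List, Sequence


def fitness(particle: Sequence[int], run_time: List[List[float]],
            pcu: List[float], w_cost: float = 0.5, w_time: float = 0.5) -> float:
    """Hypothetical fitness (higher is better).

    particle[i] is the PE type (1-based) chosen for subtask i.
    """
    cost = 0.0
    busy: Dict[int, float] = {}  # accumulated running time per PE type
    for i, pe_type in enumerate(particle):
        t = run_time[i][pe_type - 1]
        cost += pcu[pe_type - 1] * t          # running cost = PCU x time
        busy[pe_type] = busy.get(pe_type, 0.0) + t
    exec_time = max(busy.values())            # crude execution-time estimate
    return 1.0 / (w_cost * cost + w_time * exec_time)


run_time = [[2.0, 4.0], [3.0, 1.0]]  # hypothetical: 2 subtasks x 2 PE types
pcu = [1.0, 3.0]                      # hypothetical unit-time costs
print(fitness([1, 2], run_time, pcu))  # cost 5, time 2
print(fitness([1, 1], run_time, pcu))  # cost 5, time 5
```

Both assignments have the same running cost here, so the one that spreads work across PE types scores higher because of its shorter estimated execution time.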
In every iteration, each particle updates its velocity and position by (
The algorithm proceeds as follows:
(1) Randomly initialize the position and velocity of the particle swarm as described in “Initialization and Fitness Function.”
(2) Compute the velocity and position of every particle.
(3) Compute the fitness value of every particle and update the personal best and global best positions.
(4) If the termination condition is not met, go to step 2.
(5) Assign a reasonable number of PEs to every PE type according to its processing ability and the total amount of tasks to be processed.
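Steps 1 to 4 follow the usual PSO loop. Below is a minimal sketch using the standard inertia-weight velocity update, with continuous positions rounded to the nearest PE type. The rounding decoder and the parameter values (`w`, `c1`, `c2`, swarm size) are assumptions, since the paper's optimized update equations are not reproduced here. The fitness is passed in as a function, so any scoring rule can be plugged in; the name `pso_schedule` is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_schedule(n_subtasks: int, n_types: int,
                 fitness: Callable[[Sequence[int]], float],
                 swarm: int = 20, iters: int = 200,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[int], float]:
    """Standard PSO over PE-type assignments (hypothetical sketch)."""
    rng = random.Random(seed)

    def decode(x: Sequence[float]) -> List[int]:
        # Round each continuous coordinate to a PE type in 1..n_types.
        return [min(n_types, max(1, round(v))) for v in x]

    # Step 1: random initial positions and velocities.
    pos = [[rng.uniform(1, n_types) for _ in range(n_subtasks)]
           for _ in range(swarm)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_subtasks)]
           for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = max(range(swarm), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):                       # Step 4: loop back to step 2
        for k in range(swarm):
            for d in range(n_subtasks):          # Step 2: velocity and position
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = min(n_types + 0.5, max(0.5, pos[k][d] + vel[k][d]))
            f = fitness(decode(pos[k]))          # Step 3: update the bests
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return decode(gbest), gbest_fit


# Toy fitness (hypothetical): count positions matching a target assignment.
target = [3, 2, 1, 1, 3, 2, 1, 2, 3, 3]
best, score = pso_schedule(10, 3, lambda p: sum(a == b for a, b in zip(p, target)))
print(best, score)
```

Step 5, sizing the PE pool of each type, runs after the loop and is not covered by this sketch.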
After task dividing and scheduling, the IP communication diagram is formed. In an NoC-based multicore system, the next question is how to map these PEs onto NoC nodes so as to minimize network transmission delay during task execution while occupying fewer resources and balancing energy consumption. This is the IP mapping problem.
IP mapping usually pursues one of two objectives: minimizing the internal communication cost or minimizing the external communication cost [
Meanwhile, as described above, PEs of different types have different requirements on NoC communication capability, and various heterogeneous network topologies have been designed to save on-chip resources and reduce system consumption. Therefore, IP mapping must comprehensively consider how well communication requirements match on-chip communication capability.
Based on the properties of the PEs to be mapped and on how transmission capability is distributed across the topology, this paper maps PEs with high communication requirements to high-capability areas, balances internal and external communication costs, and achieves on-chip system communication with minimum transmission delay and low resource occupancy. The mapping algorithm consists of two parts, expressing the network topology as a two-dimensional matrix and performing the IP mapping, which are detailed as follows.
The communication diagram can be abstracted into a triple
Expressing an NoC topology directly is complicated, especially for a three-dimensional NoC. A two-dimensional matrix, however, expresses topology well, and many matrix properties can also be applied to topology computation. Therefore, this paper expresses the topology as a two-dimensional matrix before IP mapping.
A three-dimensional mesh topology can be taken as an example. Shown in Figure
Topology and its expression by matrix.
This approach establishes a one-to-one correspondence between the position of every node in the three-dimensional NoC topology and an element of the matrix, and IP mapping performs its optimization on the basis of this matrix.
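Because the figure with the paper's exact layout is not available here, the sketch below assumes one plausible arrangement: the Z layers of an X × Y × Z mesh are placed side by side, so node (x, y, z) becomes matrix cell (row y, column z·X + x). Vertical (TSV) links then connect cells in the same row that are X columns apart. The function names are hypothetical.

```python
from typing import List, Tuple


def node_to_cell(x: int, y: int, z: int, size_x: int) -> Tuple[int, int]:
    """3D mesh node -> (row, col) in a 2D matrix with layers side by side."""
    return y, z * size_x + x


def cell_to_node(row: int, col: int, size_x: int) -> Tuple[int, int, int]:
    """Inverse of node_to_cell."""
    z, x = divmod(col, size_x)
    return x, row, z


def neighbors(x: int, y: int, z: int,
              size: Tuple[int, int, int]) -> List[Tuple[int, int]]:
    """Matrix cells of the 3D-mesh neighbors of node (x, y, z)."""
    sx, sy, sz = size
    out = []
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < sx and 0 <= ny < sy and 0 <= nz < sz:
            out.append(node_to_cell(nx, ny, nz, sx))
    return out


print(node_to_cell(1, 2, 1, 4))        # (2, 5)
print(neighbors(0, 0, 0, (4, 4, 2)))   # [(0, 1), (1, 0), (0, 4)]
```

Any layout works for the mapping step as long as it is a bijection and the neighbor relation is preserved; side-by-side layers simply keep both properties easy to compute.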
Before introducing the concrete algorithm, we define the following three parameters.
Manhattan Distance
Euclidean Distance
Communication cost in mapped area is obtained as follows:
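The formulas for these three parameters are not reproduced in this section. As a hedged sketch, the code below uses the standard definitions of Manhattan and Euclidean distance between node coordinates and assumes, as is common in NoC mapping, that communication cost is traffic volume times hop (Manhattan) distance summed over communicating PE pairs. The paper's own cost formula may differ. The names `comm_cost`, `traffic`, and `placement` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def manhattan(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute coordinate differences (hop count on a mesh)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points (Python 3.8+)."""
    return math.dist(a, b)


def comm_cost(traffic: Dict[Tuple[str, str], float],
              placement: Dict[str, Tuple[int, int]]) -> float:
    """Sum of volume x hop distance over all communicating PE pairs."""
    return sum(v * manhattan(placement[s], placement[d])
               for (s, d), v in traffic.items())


traffic = {("A", "B"): 10, ("B", "C"): 5}            # hypothetical volumes
placement = {"A": (0, 0), "B": (0, 1), "C": (2, 1)}  # matrix coordinates
print(comm_cost(traffic, placement))  # 10*1 + 5*2 = 20
```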
The goal of the algorithm is to map PEs with high communication requirements to topology areas with high communication capability and to find a mapping scheme that has minimum
The mapping algorithm proceeds as follows:
(1) Divide the communication diagram into collections.
(2) Start the mapping computation from one collection. Take the PE with the maximum communication traffic (sum of input and output) and map it to a switching node in the high-communication-capability area whose number of available neighboring nodes is closest to the PE's node degree.
(3) Choose the PE that exchanges the most communication data with the already-mapped area as the next PE to be mapped.
(4) Map this PE to the switching node with the minimum Manhattan Distance to the mapped area. If more than one node qualifies, choose the node whose number of available neighboring nodes is closest to the PE's node degree; if there is still more than one, choose the switching node with the minimum Euclidean Distance from the center of the mapped area.
(5) Repeat steps 3 and 4 until all PEs are mapped, and then start the algorithm on another PE diagram to be mapped.
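The steps above can be sketched as a greedy placement on the two-dimensional matrix. This is a minimal sketch under several assumptions: the topology is a rows × cols 2D mesh with 4-neighbor links, the high-capability area is supplied as a set of nodes, a PE's node degree is its number of communication partners, and "Manhattan Distance to the mapped area" is interpreted as the traffic-weighted hop distance to the PE's already-mapped partners. Step 1 (splitting into collections) is omitted, so the function handles one collection. The name `greedy_map` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Set, Tuple

Node = Tuple[int, int]


def greedy_map(traffic: Dict[Tuple[str, str], int], rows: int, cols: int,
               high_cap: Set[Node]) -> Dict[str, Node]:
    # Undirected communication volume between PEs (input + output).
    comm: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (s, d), v in traffic.items():
        comm[s][d] += v
        comm[d][s] += v

    free = {(r, c) for r in range(rows) for c in range(cols)}
    placement: Dict[str, Node] = {}

    def free_neighbors(n: Node) -> int:
        r, c = n
        return sum((r + dr, c + dc) in free
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    def place(pe: str, node: Node) -> None:
        placement[pe] = node
        free.discard(node)

    # Step 2: heaviest PE goes to the high-capability node whose free-neighbor
    # count is closest to the PE's node degree.
    first = max(comm, key=lambda p: sum(comm[p].values()))
    place(first, min(sorted(high_cap & free),
                     key=lambda n: abs(free_neighbors(n) - len(comm[first]))))

    while len(placement) < len(comm):
        # Step 3: next PE = most traffic with the already-mapped PEs.
        nxt = max((p for p in comm if p not in placement),
                  key=lambda p: sum(v for q, v in comm[p].items() if q in placement))
        mapped = list(placement.values())
        cr = sum(r for r, _ in mapped) / len(mapped)
        cc = sum(c for _, c in mapped) / len(mapped)

        # Step 4: weighted hop distance, then degree match, then distance to centre.
        def key(n: Node) -> Tuple[float, int, float]:
            hops = sum(v * (abs(n[0] - placement[q][0]) + abs(n[1] - placement[q][1]))
                       for q, v in comm[nxt].items() if q in placement)
            return (hops, abs(free_neighbors(n) - len(comm[nxt])),
                    math.hypot(n[0] - cr, n[1] - cc))

        place(nxt, min(sorted(free), key=key))
    return placement


traffic = {("A", "B"): 10, ("B", "C"): 8, ("C", "D"): 2}  # hypothetical chain
print(greedy_map(traffic, 3, 3, high_cap={(1, 1)}))
# {'B': (1, 1), 'A': (0, 1), 'C': (1, 0), 'D': (2, 0)}
```

Sorting the free nodes before `min` makes ties resolve deterministically. The sketch assumes the mesh has at least as many nodes as there are PEs.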
Figure
Description of mapping process.
The mapping algorithm places PEs that communicate directly on neighboring nodes, ensuring that the path between the source node and the destination node is as short as possible and free of conflicts with other transmission paths, thus minimizing the delay across the whole mapped area.
The performance of the designed algorithm is compared and evaluated in two respects. The first is the execution speed of the task dividing and scheduling algorithm itself. GA, ACO, PSO, and the algorithm in this paper are each run on tasks of the same size, and their running times are compared to demonstrate the algorithm's efficiency. This experiment was conducted in MATLAB with 200 iterations; the comparison of the running times is shown in Figure
Comparison of algorithm speed.
The second is a comparison of the actual mapping effect (Figure
Comparison of mapping effect.
In this paper, the task scheduling model is further improved: the operating cost per unit time is employed as a uniform measure for PEs of different types, which simplifies the algorithm, and task dividing and scheduling are handled separately from IP mapping, which makes the resulting scheduling more efficient and realistic. The scheduling objective considers not only the total time spent but also the time cost and resource cost during task running, so as to achieve comprehensive optimization of system performance.
The authors declare that there is no conflict of interests regarding the publication of this paper.