Monitoring and diagnostic systems are required in modern Network-on-Chip implementations to assure high performance and reliability. A dynamically clustered NoC monitoring structure for traffic and fault monitoring is presented. It is a distributed monitoring approach which does not require any centralized control. Essential issues concerning status data diffusion, processing, and format are simulated and analyzed. Monitor communication and placement are also discussed. The results show that the presented monitoring structure can be used to improve the performance of an NoC. Even a small adjustment of parameters, for example, in the monitoring data format or monitor placement, can have a significant influence on the overall performance of the NoC. The analysis shows that the monitoring system must be carefully designed in terms of data diffusion, routing, and monitoring algorithms to realize the potential performance improvement.
Network-on-Chip (NoC) [
NoC monitoring systems are typically designed for two purposes: system diagnostics and traffic management. The former aims to improve the reliability and performance of the computational parts while the latter concentrates on the same issues in the communication resources. Traffic management should take into account the status of the network resources, including their load as well as their possible faultiness.
A technology-independent framework of the dynamically clustered monitoring structure for NoC is presented and its features are discussed in this paper. A SystemC-based NoC simulation model is also presented. The dynamically clustered monitoring structure is a fully scalable monitoring system primarily aimed at traffic management purposes. This paper is organized as follows. NoC traffic management and different monitoring structures are discussed in Section
Traffic management is implemented in an NoC to maintain network performance and functionality in the presence of faults and under high traffic load. Typically there is a monitoring system that collects traffic information from the network and an adaptive routing algorithm that adapts its operation when conditions in the network change.
Two types of information are required in traffic management: the traffic status in the network and the locations of faults in the network. Traffic status can be observed from different network components: router activity, router FIFO occupancy, or link utilization, for instance. Fault information can cover the faultiness of different network components: routers or links, for instance. A network component is considered faulty when it does not operate according to its specification. The network components have to have mechanisms to detect these faults [
The components of a monitoring system are monitors and probes. The probes are attached to network components (e.g., routers, links, or network interfaces) to observe the functionality of a network component (see Figure
Network components.
A monitoring structure defines the number and type of monitors and probes, as well as their placement, connections, and tasks. A centralized monitoring structure has one central monitor and several probes that observe the data and deliver it to the monitor. In a centralized structure the central monitor has complete overall knowledge of the network, but it causes a significant amount of monitoring-related traffic in the network. A clustered monitoring structure has a few cluster monitors and several probes. The network is divided into subnetworks, clusters, each of them having a cluster monitor and several probes. Complete network knowledge can be reached using intercluster communication, but most of the tasks can be executed inside a cluster. However, a clustered structure still causes a considerable amount of monitoring traffic [
In an NoC the data is typically transferred as packets which have a destination address. Routers forward these packets based on this address and the applied routing algorithm [
NoC monitoring systems have been presented in several papers. A dedicated control network is used in a centralized operating system controlled NoC [
Clustered monitoring structures have been discussed in several papers. A monitoring system to collect error, run-time, and functional information from routers and network interfaces (NI) is presented in [
These NoC monitoring structures are nonscalable or only partly scalable. Our goal is to develop clustered monitoring towards yet finer granularity and better scalability. A scalable monitoring structure with regional congestion awareness is presented in [
Our research focuses on scalable NoC monitoring structures where the knowledge about network conditions is spread widely enough over the network. Two main factors are taken into account in the design of our NoC architecture. First, the structure should be aware not only of traffic but also of network faults so that network-level fault tolerance can be actively maintained during routing. Second, the structure should be fully scalable to a mesh of any size. All the probes and monitors are identical, and they work autonomously without any centralized control. The presented ideas can be adapted to different kinds of NoC topologies but, due to its popularity, we have decided to concentrate on the mesh topology.
Our dynamically clustered Network-on-Chip has been previously discussed and analyzed in [
Our in-house NoC simulation model has been presented, and different status update intervals in the dynamically clustered NoC have been discussed and analyzed in [
This paper includes a broader and more detailed presentation of the in-house NoC simulation model (see Section
The proposed DCM structure and its features are simulated and analyzed using a
The analysis presented in this paper is performed using an NoC with 64 cores and routers arranged in eight rows and eight columns. A two-level traffic pattern has been used. The NoC simulation model is widely customizable, which enables the analysis of several different design aspects.
The simulation mechanism of the NoC simulation model is able to execute transient analysis where the cores send packets following a specific traffic pattern and key figures are documented during the simulation. These figures include the numbers of sent and received packets (including the data packets and the monitoring packets, see Section
The router model is designed for mesh networks, so it has five ports: four for traffic to and from the neighboring routers and one for traffic to and from the local core. There is a small FIFO buffer in each input port and a centralized FIFO for packets which cannot be routed at the first attempt but can be rerouted later. If the routing fails repeatedly, the packet is dropped; the router should notify the sender when dropping a packet, but this reporting feature is not implemented at this point. Packet dropping typically happens in severe situations where routing is inhibited permanently due to permanently faulty network resources making the destination unreachable. The sizes of the FIFO buffers are customizable. The router model includes several different routing algorithms. The used algorithm (see Section
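As a hedged illustration of the reroute-or-drop policy described above, the centralized reroute FIFO could be sketched as follows; the retry limit, queue capacity, and all names are our own assumptions, not details taken from the simulator:

```cpp
#include <cstddef>
#include <deque>

// Sketch of the reroute-or-drop policy: a packet that cannot be routed is
// queued in a centralized FIFO and retried later; after too many failed
// attempts it is dropped. Retry limit, capacity, and names are assumptions.
struct PendingPacket {
    int id;
    int attempts;
};

class RerouteQueue {
    std::deque<PendingPacket> fifo_;  // centralized FIFO for unroutable packets
    std::size_t capacity_;
    int max_attempts_;
public:
    int dropped = 0;

    RerouteQueue(std::size_t capacity, int max_attempts)
        : capacity_(capacity), max_attempts_(max_attempts) {}

    // Called when routing a packet fails; returns false if the packet
    // was dropped instead of queued for a later attempt.
    bool defer(PendingPacket p) {
        ++p.attempts;
        if (p.attempts >= max_attempts_ || fifo_.size() >= capacity_) {
            ++dropped;  // the sender is not notified (reporting not implemented)
            return false;
        }
        fifo_.push_back(p);
        return true;
    }

    bool empty() const { return fifo_.empty(); }
};
```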
The monitors are used to observe the functionality and state of the system. The monitor component in our NoC simulation model includes both a probe to collect the monitoring data as well as the monitor to process the collected data. The monitor component can be also configured to act only as a probe or a monitor. This is useful, for instance, when analyzing centralized monitoring structures [
Links in the NoC simulation model are unidirectional, so they are used in pairs between two routers, one in each direction. The link is a simple component which forwards the incoming packets. A link can be set to a usable or unusable state to model faults in the network.
The NoC simulation model has three traffic patterns to be used in analysis. A traffic pattern has two parameters: one defines the amount of data sent during a unit of time while the other defines how the destination cores are determined. The simplest pattern is a fully random traffic pattern which randomizes the packet destinations among all the cores in the network. A weighted random traffic pattern is adjusted so that one-third of traffic is between neighboring cores, another third between neighbors' neighbors, and the last third between all the other cores in the network. This pattern roughly imitates a traffic pattern in a real NoC implementation.
The third implemented traffic pattern is a two-level pattern which includes uniform random traffic and varying hot spots, each of which sends a relatively large number of packets to a single receiver during a certain time interval. A relatively small number of cores operate as hot spots simultaneously and send packets to statically chosen receiver cores. At the same time, the other cores send a relatively smaller amount of traffic to random destinations. This two-level traffic pattern imitates real applications where most of the traffic takes place between certain cores at a time. It is aimed at even more realistic performance simulations. The simulations presented in this paper are done using this two-level traffic pattern.
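The weighted random destination selection described above (one third of packets to neighbors, one third to neighbors' neighbors, one third elsewhere) could be sketched as follows; the mesh indexing and helper names are assumptions of ours, not the simulator's actual interface:

```cpp
#include <cstdlib>
#include <random>
#include <vector>

// Manhattan distance between two routers on a row-major mesh.
int manhattan(int a, int b, int cols) {
    return std::abs(a / cols - b / cols) + std::abs(a % cols - b % cols);
}

// Sketch of weighted random destination selection: one third of packets go
// to neighbors (distance 1), one third to neighbors' neighbors (distance 2),
// and one third anywhere farther away.
int pickDestination(int src, int rows, int cols, std::mt19937& rng) {
    std::uniform_int_distribution<int> third(0, 2);
    int band = third(rng);  // 0: distance 1, 1: distance 2, 2: farther
    std::vector<int> candidates;
    for (int d = 0; d < rows * cols; ++d) {
        if (d == src) continue;
        int dist = manhattan(src, d, cols);
        bool match = (band == 0 && dist == 1) ||
                     (band == 1 && dist == 2) ||
                     (band == 2 && dist > 2);
        if (match) candidates.push_back(d);
    }
    std::uniform_int_distribution<int> pick(0, (int)candidates.size() - 1);
    return candidates[pick(rng)];
}
```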
During simulation, network links can be set faulty. In Network-on-Chip simulations the fault information can be simplified by using only the information on faulty links and representing other faulty components by marking the links around these components as faulty. In our simulation framework the number of faults is defined by the user, and the simulator places the faults randomly in the network. The simulation is executed several times with different fault patterns and the results are averaged. This procedure gives an overall insight into the system's operation when parts of the network are faulty.
Dynamically clustered monitoring (
Network topology showing the connections between routers, networks interfaces (NI), monitors (MO), probes (PR), and cores in a part of mesh shaped Network-on-Chip.
Each router has a dynamic cluster around itself from which the router receives the data it needs for traffic management. There are no fixed cluster borders as there are in traditional clustered networks, and a router can belong to several different dynamic clusters. A dynamic cluster is the area around a single router of which the router has knowledge and to which the router shares its own status. The router's own status is delivered to all the routers in this cluster area. The delivery of router status is called status data diffusion. The dynamic clusters of different routers overlap with each other. The simplest dynamic cluster includes the four closest neighbors of a router, but it can be expanded to the neighbors' neighbors and so on. A system which uses DCM for traffic management could have, for instance, operating-system-level control for tasks that need complete knowledge of the system. When traffic management is implemented with a DCM structure, the load of the network can be optimized.
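Under the assumption that a dynamic cluster is simply the set of routers within a given Manhattan radius of a router (radius 1 giving size 5 and radius 2 giving size 13 in the interior of the mesh), cluster membership can be sketched as:

```cpp
#include <cstdlib>
#include <vector>

// Sketch of dynamic cluster membership on a row-major mesh: the cluster of a
// router is every router within a given Manhattan radius. Radius 1 yields the
// cluster of size 5 and radius 2 the cluster of size 13 for interior routers;
// clusters are naturally smaller at the mesh edges. Names are our own.
std::vector<int> dynamicCluster(int router, int rows, int cols, int radius) {
    std::vector<int> members;
    int r = router / cols, c = router % cols;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            if (std::abs(i - r) + std::abs(j - c) <= radius)
                members.push_back(i * cols + j);
    return members;
}
```

Note how the clusters of adjacent routers overlap, which is what lets status data diffuse hop by hop without fixed cluster borders.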
There are several issues which affect the functionality of a monitoring system. The used simulation environment is presented in Section
The NoC simulation model utilizes an adaptive routing algorithm [
Routing directions. N: North, S: South, E: East, W: West, and R: Router.
We also propose an experimental routing algorithm where the destination's distances from the router in the different routing directions are better taken into account. In this algorithm the destinations are classified into 24 different routing directions which differ in their distances along the different routing dimensions (
Routing directions in our experimental routing algorithm.
For each of these 24 routing directions we have ranked the possible output ports based on the destination and network conditions. Every time a packet is routed, the algorithm identifies the routing direction and uses the available traffic status and fault information to select the appropriate output port.
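The port selection step might be sketched as below; the pre-ranked port list and the Boolean status vectors are our own simplified representation, not the paper's implementation:

```cpp
#include <vector>

// Sketch of port selection in the experimental routing algorithm: for the
// identified routing direction the output ports are pre-ranked, and the first
// port that is neither faulty nor congested is chosen. Representation assumed.
int selectPort(const std::vector<int>& rankedPorts,
               const std::vector<bool>& faulty,
               const std::vector<bool>& congested) {
    for (int port : rankedPorts)
        if (!faulty[port] && !congested[port])
            return port;
    return -1;  // no usable port; the packet would go to the reroute FIFO
}
```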
In the basic DCM structure the monitoring data is transferred in packets using the actual data network. These packets are called monitoring packets. The monitoring packets have a higher priority in the routers so that they can be transferred even when there is congestion in the network. The monitoring packets are sent from monitor to monitor, but because the monitors are not directly connected to each other, the packets are transferred via routers and links.
The router statuses in the DCM structure are represented with two binary numbers, one for the traffic status and another for the fault information. The status of a router is based on the occupancy of the FIFO buffer where packets wait to be routed forward. The faultiness of a single component can be represented using a single bit, while the number of bits in the traffic status values depends on the size of the FIFO buffer, the required accuracy, and the additional status data processing used (see Section
In the DCM structure the monitors exchange their own and their neighbors' statuses with each other. Typically monitoring packets include fault statuses of nearby links and one or more traffic statuses of routers depending on the size of the monitoring cluster. The structure of a monitoring packet payload in systems with monitoring cluster sizes 5 and 13 is presented in Figure
Structure of monitoring packet payload with cluster sizes 5 and 13.
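The mapping from FIFO occupancy to an n-bit traffic status value, described in the preceding paragraphs, could be sketched as follows; the linear scaling is our assumption:

```cpp
// Sketch of mapping FIFO occupancy to an n-bit traffic status value.
// A 5-bit status gives 32 levels; the linear mapping is an assumption.
unsigned trafficStatus(unsigned occupied, unsigned capacity, unsigned bits) {
    unsigned levels = 1u << bits;                 // e.g. 5 bits -> 32 levels
    if (occupied >= capacity) return levels - 1;  // full FIFO -> maximum status
    return occupied * levels / capacity;          // linear scale in between
}
```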
In centralized and clustered monitoring structures the monitoring packets are transferred in a network in the same way as the data packets (see Section
A monitor stores the status data from received monitoring packets in its memory and forwards this information to its own neighbors. This way the routers are able to receive information not only from their neighbors but also from the neighbors of their neighbors. In the dynamically clustered monitoring structure the network status data spreads over the network without centralized control and without routing-related processing.
The update interval of monitoring data defines the conditions under which a monitor sends an up-to-date monitoring packet to its neighbors. In [
Our research has shown that, while being the most complex at the implementation level, the hybrid status update interval leads to the highest network performance in terms of throughput [
The network status data diffusion defines how far the status of a network component (router) spreads in the network. A wider spread area makes it possible to react to problems early and avoid routing packets to the worst hot spots or faulty areas.
There are two factors which affect the status data diffusion: the size of a dynamic cluster and status data processing. The cluster size defines how far the status data diffuses from a router and how much neighbor status data a router has. The cluster size also affects the quantity of neighbor status data to be transferred in a monitoring packet. If the size of a dynamic cluster is 5, it contains the neighbors of a router and the router itself. In this case it is enough to send only the router's own status to the neighbors. If the size of a cluster is 13, it also includes the neighbors of the neighbors, and then four neighbor statuses have to be included in every monitoring packet. In larger dynamic clusters the amount of monitoring data which has to be included in a monitoring packet increases to an impractical level. Therefore, we have limited our analysis to dynamic cluster sizes of 5 and 13.
In the DCM structure the network traffic status values are based on the load of the routers. When additional status data processing is used, each router's status value is based on the state of the router itself and the states of its neighbors. When a neighbor router's status is in turn defined using its own neighbors, the status data diffuses further over the network.
A processed status of a router (
Indexes of the neighbor routers.
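The paper's exact processing formula is not reproduced here; as a hedged sketch in the same spirit, a processed status could weight a router's own status against the average of its neighbors' statuses. The weighting below is purely illustrative:

```cpp
#include <numeric>
#include <vector>

// Illustrative sketch of additional status data processing: a router's
// processed status combines its own status with the average of its
// neighbors' statuses. The 2:1 weighting is an assumption, not the
// paper's actual formula.
unsigned processedStatus(unsigned own, const std::vector<unsigned>& neighbors) {
    if (neighbors.empty()) return own;
    unsigned sum = std::accumulate(neighbors.begin(), neighbors.end(), 0u);
    unsigned avg = sum / (unsigned)neighbors.size();
    return (2 * own + avg) / 3;  // own status dominates (assumed weighting)
}
```

Because each neighbor's status was itself computed from its own neighbors, applying this at every router lets load information diffuse beyond the cluster border, as the text describes.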
The analysis of network status diffusion is presented in Figures
Throughput with different sized clusters (
With additional status data processing
Without additional status data processing
Throughput with different sized clusters (
With additional status data processing
Without additional status data processing
The differences in network performance appear when the throughput has been fully or nearly saturated. The figures show that in a faultless network the performance differences are notable. The proportional differences were measured at the point where 60% (0.6 on the
All the following performance increment percentages are in proportion to the performance of an NoC with similar fault pattern, deterministic routing algorithm, and without network monitoring. In a faultless network (see Figure
The analysis shows that DCM with a small cluster size improves network performance significantly. An especially notable feature is its ability to maintain the network throughput in a faulty network. Without network monitoring the throughput decreases by 41% when 10% of links become faulty. However, if the presented monitoring is used, the decrease is only 11%. Furthermore, the throughput in a faulty network with monitoring is 6% higher than that of a faultless network without monitoring.
A noteworthy observation is that a larger cluster size does not have a positive impact on the performance but actually reduces it. This phenomenon can have multiple reasons. One reason for the inefficiency can be that too much data processing leads to inaccurate status data and dissolves the differences between the statuses. Another reason could be the latency in status data propagation, which makes the data outdated before it is utilized.
The influence of the additional status data processing is small or even nonexistent. In a faulty network there is a very small increase in the throughput. However, in a faultless network the impact is even negative. For example, Figure
Throughput with and without processing.
The inefficiency of the status data processing may stem from the same factors as that of the large cluster size. The differences between status values dissolve and are no longer visible, so the routing algorithm cannot make well-informed decisions.
The network status data delivers information on the state of the network, and it can be used for different purposes. When the main application is traffic management, the data typically includes information concerning network load and faults. Faults are simply denoted using binary values which indicate whether a component of the network is usable or faulty. In more complex systems, multilevel fault indicators could be considered. The network load is denoted using a scale where different values represent different amounts of load on a network component. In our simulations the network load representation is linear. Different scales can be considered in some specific applications.
The status data granularity defines the resolution of the status data values, that is, how many different values there are on the scale which is used to represent the load on a network component. The smallest value indicates that the load of a network component is very low and the highest value represents a high load. The rest of the values indicate the component load linearly between the extreme values. An example of the status data granularity was given in Section
The 64-core NoC has been simulated with different granularity alternatives. 32 was defined as the maximum granularity because of the limited payload size of the monitoring packets. A status value with 32-level granularity can be represented with 5 bits. When the monitoring cluster size is 13, a monitoring packet has to include the router's own status and the statuses of its four neighbors. With a granularity of 32, this takes 25 bits, which can be considered a realistic amount of data in a monitoring packet. The same data granularity is also used between probes and monitors.
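Packing the five 5-bit status values into the 25-bit payload mentioned above could be done as follows; the field order is an assumption:

```cpp
#include <array>
#include <cstdint>

// Sketch of packing the five 5-bit status values (the router plus its four
// neighbors, cluster size 13, granularity 32) into a 25-bit monitoring
// packet payload. The field ordering is our assumption.
uint32_t packStatuses(const std::array<uint32_t, 5>& statuses) {
    uint32_t payload = 0;
    for (int i = 0; i < 5; ++i)
        payload |= (statuses[i] & 0x1Fu) << (5 * i);  // 5 bits per status
    return payload;
}

uint32_t unpackStatus(uint32_t payload, int index) {
    return (payload >> (5 * index)) & 0x1Fu;
}
```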
The throughput of a 64-core NoC with diverse status granularity is presented in Figures
Throughput with different traffic status granularity alternatives.
No faults
10% of links are faulty
Throughput with different traffic status granularity alternatives.
No faults
10% of links are faulty
A method to simplify monitoring status data is to combine traffic and fault information. In the original status data format (see Figure
Traffic status values can be used to indicate faults by defining that the maximum status value indicates not only high traffic load but also faulty resources. If there is a faulty component in some direction, the traffic status value of that direction is set to its maximum value. In this case the payload of a monitoring packet (see Figure
The monitoring data is simplified even further when all the status data is combined into the Boolean fault indicator values. In this approach, a routing direction is marked as faulty when there is high traffic load. The status can be restored when the traffic load decreases. This way packets are not routed in highly loaded directions. A drawback of this approach is the loss of knowledge about the differences between routing directions with low and medium traffic load. Because the traffic status values are not used, the reduction of the monitoring packet payload is
The monitoring data combination approaches were simulated with the NoC model, and the results are presented in Figure
Throughput with separate traffic and fault data as well as with the hybrid formats.
No faults
10% of links are faulty
The presented analysis leads to the conclusion that both monitoring data classes are necessary in a system where faults are a realistic threat. In less critical applications the hybrid formats could be a good compromise.
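The hybrid format described above, where the maximum traffic status value doubles as a fault indicator, could be sketched as follows (assuming the 32-level granularity discussed earlier):

```cpp
#include <cstdint>

// Sketch of the hybrid status format: the maximum traffic status value is
// overloaded to mean "faulty or saturated", so separate fault bits can be
// dropped from the payload. 5-bit granularity (32 levels) is assumed.
constexpr uint32_t kMaxStatus = 31;

uint32_t hybridStatus(uint32_t trafficStatus, bool directionFaulty) {
    return directionFaulty ? kMaxStatus : trafficStatus;
}

// A routing algorithm then avoids any direction at the maximum value,
// whether the cause is a fault or extreme congestion.
bool avoidDirection(uint32_t status) {
    return status == kMaxStatus;
}
```

The trade-off, as noted above, is that a direction under genuinely maximal load becomes indistinguishable from a faulty one.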
In the DCM structure the monitoring data is transferred in the same network which is used by the original data packets. It is a straightforward solution which minimizes the need for additional resources. However, a shared-resource structure is always at least somewhat intrusive, and it consumes network resources which could otherwise be used by the actual data packets.
An alternative solution for intermonitor communication is serial communication implemented with dedicated channels. It can be realized with a relatively small amount of additional resources. A drawback of serial communication is the increased transfer delay. However, because the serial communication resources are dedicated to the monitoring communication, the status can be updated nonstop without paying attention to update intervals [
Serial monitor communication was simulated with the SystemC-based NoC simulation model. Throughput with different status data granularities and serial communication is presented in Figure
Throughput with different granularity alternatives using serial communication. 10% of links are faulty and data processing is enabled.
Essentially, serial communication is slower than the previously discussed parallel, packet-based communication, and its theoretical delays increase further when there is a large amount of data to be transferred, for example, in systems with relatively large monitoring clusters. In contrast, however, serial communication operates on dedicated communication resources which can be used only for this purpose all the time. This way the status values can actually be updated more often than when the monitoring packets are transferred over the shared resources. Somewhat surprisingly the system with
Serial communication could be a useful option when a designer wants to keep the communication resources of different applications separate. The serial approach guarantees that the monitoring communication does not disturb the actual data which is transferred in the network. It may be possible to increase the clock frequency of the serial transmitter beyond what was used in the presented analysis. In that case the performance differences should shrink.
The DCM system is based on a structure where an identical monitor is attached to each router. These monitors include both monitoring and probing components. One potential way to reduce the monitoring structure complexity is to decrease the number of monitors systematically by removing every
Two patterns of removed monitors. Removed monitors are marked with X.
In addition to performance, the reduction of monitors affects the complexity of the NoC implementation. Routers which do not have their own monitoring component could have less complex routing logic, which decreases the router area. This is due to the deterministic routing algorithm which substitutes for the adaptive routing algorithm in routers without a monitoring component. However, the probing components cannot be simplified because they still have to offer status data to the other monitors.
This approach was analyzed using our SystemC-based NoC simulation model and the results are presented in Figure
Throughput with fewer monitors.
No faults
10% of links are faulty
The figure shows that the removal of monitors, even if it is just every 12th, has a notable influence on network throughput, and the influence is even more remarkable when there are faults in the network. Removal of every second monitor causes an 18% performance decrement in a faultless network and 43% in a network with 10% of faulty links. In a faultless network the performance then equals that of a system without traffic monitoring. In a faulty network there is a 2% performance increase compared to the unmonitored system. If just every 12th monitor is removed, the performance decreases by 3% and 10%, respectively.
The removal of monitors has a positive impact on the area and traffic overheads caused by the monitoring system. However, the total area of the monitoring system is almost negligible compared to the area of a 64-core NoC. Therefore, the removal of monitors cannot be justified by the reduced complexity when the performance decrement is as large as presented here. In application-specific NoCs it could be reasonable to remove monitors from areas where traffic is predictable, so that the resources can be sized properly during the design phase and adaptivity is not necessary. However, our work focuses on homogeneous general-purpose NoCs, so the monitors are placed evenly over the network.
The dynamically clustered monitoring structure for fault-tolerant Networks-on-Chip has been presented and analyzed in this paper. Dynamically clustered monitoring does not require any centralized control. A simple monitor and a probe are attached to each router in the network, and the monitors exchange information with each other. Each router has a dynamic cluster around itself from which it collects the data it needs for traffic management. The different features of the dynamically clustered monitoring structure were analyzed, and their influence on the overall performance of the system was studied. Most of the presented features can be utilized in different NoC implementations with various requirements and limitations. However, due to the nature of an adaptive, shared-resource system, the presented DCM structure may not be the best solution for systems with strict real-time requirements.
In future work the analysis of individual cores and routers will be improved. At this stage, the NoC simulation model does not enable the analysis of specific senders and receivers but concentrates only on overall performance. The planned improvement will make it possible to analyze how different monitoring methods and parameters affect performance from a component's point of view.
The performance of the DCM structure could be adjusted by using differently sized monitoring clusters in different areas of the network. Areas with low traffic load may work at reasonable performance using very simple deterministic routing algorithms. At the same time, the same system could have performance-critical areas with high traffic loads and tight quality-of-service requirements. It could be necessary to use larger monitoring clusters and adaptive routing in these areas. Another useful feature could be on-the-fly reconfiguration of the cluster size and the routing algorithm. The performance and energy consumption of the communication resources could be optimized by using more complex mechanisms in the critical areas of the network.
Our SystemC-based NoC simulation model has proved to be an efficient tool for analyzing and simulating different aspects of Networks-on-Chip. The model is reasonably easy to configure for the analysis of different features. The NoC simulation model was used to analyze monitoring algorithms, monitoring data diffusion areas, the format of monitoring data and communication, as well as the number of monitors. The presented research shows that in most cases simple monitoring algorithms and small monitoring cluster areas perform at least as well as more complex implementations. In this paper the most complex structures did not bring significant improvements to performance. However, these structures and algorithms may be developed further in future work. Another observation is that even small adjustments of the system parameters can have a significant influence on the overall performance. Therefore the parameters should be chosen carefully when designing a complex DCM structure.
The authors would like to thank the Academy of Finland, the Nokia Foundation, and the Finnish Foundation for Technology Promotion for financial support.