Abstract models are necessary to assist system architects in evaluating hardware/software architectures and to cope with the ever-increasing complexity of embedded systems. Efficient methods are required to create reliable models of system architectures and to allow early performance evaluation and fast exploration of the design space. In this paper, we present a specific transaction level modeling approach for performance evaluation of hardware/software architectures. This approach relies on a generic execution model that requires little modeling effort. The created models are used to evaluate, by simulation, the expected processing and memory resources for various architectures. The proposed execution model relies on a specific computation method defined to improve the simulation speed of transaction level models. The benefits of the proposed approach are highlighted through two case studies. The first case study is a didactic example illustrating the modeling approach. In this example, a simulation speed-up by a factor of 7.62 is achieved by using the proposed computation method. The second case study concerns the analysis of a communication receiver supporting part of the physical layer of the LTE protocol. In this case study, architecture exploration is carried out in order to improve the allocation of processing functions.
In the consumer domain, current trends in embedded systems design are related to the integration of high-performance applications and the improvement of communication capabilities and mobility. Such functionalities have a strong influence on system architectures, significantly raising the complexity of the software and hardware resources implemented. Typically, hardware resources are organized as multicore platforms consisting of a set of modules such as fully programmable processor cores, standard interface modules, memories, and dedicated hardware blocks. Advances in chip technology will allow more resources to be integrated. Consequently, massively parallel architectures clustered by application category will be adopted [
In this context, the process of system architecting consists of optimally defining the allocation of system applications on platform resources and fixing the characteristics of processing, communication, and memory resources according to functional and nonfunctional requirements. Functional requirements express what the designer wishes to implement, whereas nonfunctional requirements are used to correctly tune the parameters of the related architecture. Typical nonfunctional requirements under consideration for embedded systems are timing constraints, power consumption, and cost. Exploration of the design space is carried out according to these requirements to identify potential architectures. The performances of candidate architectures are then evaluated and compared. To keep design time short and avoid costly design iterations, fast exploration of the design space and reliable evaluation of nonfunctional properties early in the development process have become mandatory. Due to increasing system complexity, evaluation of architecture performances calls for specific methods and tools to assist system architects in creating reliable models.
As reported in [
Simulation speed and accuracy are directly related to the level of abstraction considered to model the system architecture. On both application and platform sides, modeling of computation and modeling of communication can be strongly separated and defined at various abstraction levels. Among simulation-based approaches, the Transaction Level Modeling (TLM) approach has recently received wide interest in industrial and research communities in order to improve system design and its productivity [
This paper presents an approach for the creation of efficient transaction level models for performance evaluation of system architectures. Compared to existing works, the main contribution is a generic execution model used to capture the evolution of the nonfunctional properties assessed for performance evaluation. This execution model serves as a basic instance to create approximately timed models, and it can be parameterized in order to evaluate various configurations of system architectures. Furthermore, it relies on a specific computation method proposed to significantly reduce the number of transactions required during model execution and, consequently, to improve the simulation speed. This computation method is based on decoupling the description of model evolution, which is driven by transactions, from the description of nonfunctional properties. This separation of concerns reduces the number of events in transaction level models. Simulation speedup is then achieved by reducing the number of context switches between modules during model simulation. The proposed execution model and the related computation method have been implemented in a specific modeling framework based on the SystemC language. The considered modeling approach provides fast evaluation of architecture performances and thus allows efficient exploration of architectures. The benefits of this approach are highlighted through two case studies. The modeling approach and the generic execution model are first illustrated through a didactic example. Then, the approach is illustrated through the analysis of two possible architectures of a communication receiver based on the Long Term Evolution (LTE) protocol.
The remainder of this paper is structured as follows. Section
Performance evaluation of embedded systems has been approached in many ways at different levels of abstraction. A good survey of various methods, tools, and environments for early design space exploration is presented in [
The technique called trace-driven simulation has been proposed for performance analysis of architectures in [
Trace-driven simulation is also addressed in the TAPES approach [
Approaches presented in [
The proposed design framework in [
Our approach mainly differs from the above in the way the system architecture is modeled and the models of workload are defined. In our approach, architecture specification is done graphically through a specific activity diagram notation. The behavior related to each elementary activity is captured in a state-action table. Models of workload are thus expressed as finite-state machines in order to describe the influence of the application when executed on platform resources. The resulting architecture model is then automatically generated in SystemC to allow simulation and performance assessment. Compared to the related works, specific attention is paid to reducing the time required to create models. In our work, the architecture model relies on a generic execution model proposed to facilitate the capture of the architecture behavior and the related properties. A specific method is also defined to improve the simulation time of models. The proposed modeling approach and the performance evaluation method are close to the one presented in [
The modeling approach presented in this section aims at creating models in order to evaluate the resources composing system architectures. As previously discussed, a model of the system architecture does not require a complete description of system functionalities. In the considered approach, the architecture model combines the structural description of the system application and the description of nonfunctional properties relevant to the considered hardware and software resources. The utilization of resources is described as sequences of processing delays interleaved with exchanged transactions. This approach is illustrated in Figure
Considered modeling approach for performance evaluation of system architectures.
The lower part of Figure
A notation similar to [
Based on this set of rules, the behavior of an activity can be captured using a state-action table notation as defined in [
State-action table for specification of activities.
Current state | Next state | Actions | ||
Condition | State | Condition | Action | |
— | ||||
— | ||||
— | ||||
— | ||||
The first column specifies the set of current states. The second column specifies the next states and the conditions under which the activity will move to those states. The third column specifies assignment of properties under study and production of output transactions. Other actions are not depicted in Table
In our approach, state-action tables are used to capture the behavior and the time properties related to each elementary activity. As a result, captured behavior and related time properties depend on the considered allocation of the application.
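A state-action table of this kind maps naturally onto a small table-driven state machine. The following is only a minimal sketch of that idea: the state names (`Waiting`, `Processing`), the conditions, the transaction name `M2`, and the property values are hypothetical placeholders and are not taken from the table in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    next_state: str               # state entered when the condition holds
    ops_per_ns: float             # value assigned to the property under study
    output: Optional[str] = None  # output transaction produced, if any


# (current state, condition) -> row. All names and values are hypothetical.
TABLE = {
    ("Waiting", "M1 received"): Row("Processing", ops_per_ns=0.5),
    ("Processing", "delay elapsed"): Row("Waiting", ops_per_ns=0.0, output="M2"),
}


def step(state, condition):
    """Apply one table row; stay in the same state if no row matches."""
    row = TABLE.get((state, condition))
    if row is None:
        return state, None, None
    return row.next_state, row.ops_per_ns, row.output


def run(conditions, state="Waiting"):
    """Feed a sequence of conditions and record (state, property, output)."""
    trace = []
    for condition in conditions:
        state, prop, out = step(state, condition)
        trace.append((state, prop, out))
    return trace
```

Changing the allocation of the application then amounts to swapping the contents of `TABLE` (delays, property values, and state sequence) while leaving `step` and `run` untouched.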
Using languages such as SystemC, the evolution of the model can be analyzed according to the simulated time supported by the simulator. In the following, the simulated time is denoted by
Time evolution of parameter Cc
As depicted in Figure
The time properties used are directly influenced by the characteristics of the processing resources and by the characteristics of the communication nodes used for transaction exchange. These properties can be obtained from estimates, profiling of existing code, or source code analysis, as illustrated in [
Furthermore, the temporal behavior related to each activity depends on the function allocation. In the case of a single-processor architecture, functions are executed sequentially or according to a specific scheduling policy. In the case of a multiprocessor architecture, behaviors should express the parallelism available to execute functions. In the following, the different allocations are obtained by modifying the behavior related to each activity.
Moreover, in the notation adopted in Figure
Following this modeling approach, the resulting model incorporates the evolution of quantitative properties that are defined analytically and are relevant to the use of processing resources, communication nodes, and memories. Using languages such as SystemC, the created models can then be simulated to evaluate the time evolution of performances obtained for a given set of stimuli. Various platform configurations and function allocations can be compared by considering different descriptions of activities. In the following, a generic execution model is proposed to efficiently capture the behavior of activities and the evolution of the nonfunctional properties assessed. This reference model facilitates the creation of transaction level models for performance evaluation. State-action tables are then used to parameterize instances of the generic execution model.
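As an illustration of a property defined analytically, the sketch below represents a computational complexity per time unit as a piecewise-constant profile. It assumes, as a simplification not stated in this form in the text, that the property over a processing interval equals the number of operations divided by the interval duration; the function names and numeric values are hypothetical.

```python
def value_at(profile, t_ns):
    """Return operations per nanosecond at time t_ns (1 op/ns equals 1 GOPS).

    profile is a list of (start_ns, end_ns, operations) intervals.
    Outside every interval the resource is idle and the value is 0.
    """
    for start, end, ops in profile:
        if start <= t_ns < end:
            return ops / (end - start)
    return 0.0


def peak(profile):
    """Maximum value reached by the property over the whole profile."""
    return max(ops / (end - start) for start, end, ops in profile)


# Hypothetical activity: 50 operations in 100 ns, then 400 operations in 200 ns.
example_profile = [(0, 100, 50), (100, 300, 400)]
```

With this representation, comparing two allocations comes down to building one profile per resource and comparing values such as `peak(example_profile)`.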
A generic execution model is proposed to describe the behavior of activities and then to easily build architecture models. This execution model expresses the reception and production of transactions and the evolution of resources utilization. Its description can be adapted according to the number of input and output relations and according to the specification captured in the associated state-action table. The proposed execution model is illustrated in Figure
Proposed execution model for activity description.
As depicted in Figure
Evolution of assessed property Cc
The application of this modeling style to the activity specified in Table
Application of proposed execution model to a specific activity description.
Figure
Evolution of activity
The Figure
The next section details this computation method and how it can be applied to improve the simulation speed of performance models.
As previously discussed, the simulation speed of transaction level models can be significantly improved by avoiding context switches between threads. The computation method described in this section relies on the same principle as temporal decoupling supported by the loosely timed coding style defined by OSCI. Using this coding style, parts of the model are permitted to run ahead in a local time until they reach the point when they need to synchronize with the rest of the model. The proposed method can be seen as an application of this principle to create efficient performance models. This method makes it possible to minimize the number of transactions required for the description of properties assessed for evaluation of performances. Figure
Comparison of two modeling approaches: (a) a transaction-based modeling approach and (b) a state-based modeling approach.
Figure
The evolution of property Cc
Evolution of property Cc
The Figure
The proposed generic execution model makes use of this computation method to provide improved simulation time of performance models. In order to validate this modeling style, we have considered the implementation of the proposed execution model in a specific modeling framework.
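The principle behind the computation method can be sketched with a toy example. The code below is not the authors' implementation and does not use SystemC: `Kernel` is a hypothetical stand-in for a simulator that only tracks simulated time and counts calls to `wait`, each of which would imply a context switch. Both functions record the same time-stamped evolution of a property, but the state-based version computes the time stamps in a local time and synchronizes with the kernel only once.

```python
class Kernel:
    """Hypothetical stand-in for a simulator: tracks time and counts waits."""

    def __init__(self):
        self.now = 0
        self.waits = 0

    def wait(self, delay):
        self.waits += 1   # each wait implies a context switch
        self.now += delay


def transaction_based(kernel, steps):
    """One wait per property change: the kernel is involved at every step."""
    trace = []
    for duration, value in steps:
        trace.append((kernel.now, value))
        kernel.wait(duration)
    return trace


def state_based(kernel, steps):
    """Compute all change instants in local time, then synchronize once."""
    trace = []
    local_time = kernel.now
    for duration, value in steps:
        trace.append((local_time, value))
        local_time += duration
    kernel.wait(local_time - kernel.now)
    return trace


# Hypothetical sequence of (duration, property value) pairs.
steps = [(10, 1.0), (20, 3.0), (5, 0.5)]
k_tx, k_st = Kernel(), Kernel()
trace_tx = transaction_based(k_tx, steps)
trace_st = state_based(k_st, steps)
```

In this sketch the two traces are identical and both kernels end at the same simulated time, while the number of waits drops from one per step to one per activation.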
The proposed execution model has been implemented in the framework CoFluent Studio [
Graphical modeling in the CoFluent Studio framework to implement the proposed execution model.
In Figure
The procedure
We aim to illustrate the application of the proposed generic execution model and of the computation method previously presented in Section
Performance model of a 3-stage pipeline architecture.
The lower part of Figure
A first performance model has been defined by using the state-based modeling approach presented in Section
Behavior of
Current state | Next state | Actions | ||
Condition | State | Condition | Action | |
InputStage3 | — | |||
— | ||||
— | ||||
OutputSymbol |
Figures
Time evolution of the computational complexity (in MOPS) of the third pipeline stage.
Time evolution of the global computational complexity (in MOPS) of the pipeline architecture with configuration considered.
Figure
Figure
The lower part of Figure
A second performance model has been defined at a lower level of granularity following a transaction-based modeling approach as presented in Section
This section highlights the application of the proposed generic execution model to facilitate the creation of transaction level models and to perform architecture exploration. The case study considered here concerns the creation of a transaction level model for the analysis of the processing functions involved at the physical layer of a communication receiver implementing part of the LTE protocol. This protocol is considered for the next generation of mobile radio access [
Activity diagram of the LTE receiver studied.
A single-input single-output (SISO) configuration is analyzed. Figure
Figure
Proposed model to analyze performances obtained with different architectures.
The lower part of Figure
State-action table for specification of the
Current state | Next state | Actions | |
Condition | State | Action | |
PilotSymbol | |||
PilotSymbol | |||
States
Each activity has been captured by using the proposed generic execution model. Activities
Time constraints applied for each activity according to architectures.
Architecture | TCOFDMDemod | TCChanEstim | TCEqualizer | TCTurboDecoder |
---|---|---|---|---|
I | 0.2* | 0.12* | 0.04* | 0.08* |
II | 71428 ns | 14285 ns | 14285 ns | 7142 ns |
These values have been set to guarantee the processing of an LTE subframe every 1 ms. In the case of architecture I, the OFDM demodulation, channel estimation, and equalization functions are executed sequentially on the processor
In the case of architecture II, all functions can be executed simultaneously. TCOFDMDemod is equal to the reception period of input transaction
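The time constraints listed for architecture II can be cross-checked with simple arithmetic. The sketch below assumes 14 OFDM symbols per 1 ms LTE subframe (normal cyclic prefix). The divisors 5 and 10 are inferred from the tabulated values and are not stated in the text, so they should be read as an observation rather than as the authors' derivation.

```python
SUBFRAME_NS = 1_000_000      # one LTE subframe lasts 1 ms
SYMBOLS_PER_SUBFRAME = 14    # assumption: normal cyclic prefix

# One OFDM symbol period, truncated to whole nanoseconds.
tc_ofdm_demod = SUBFRAME_NS // SYMBOLS_PER_SUBFRAME

# Ratios inferred from the table, not given in the text.
tc_chan_estim = tc_ofdm_demod // 5
tc_equalizer = tc_ofdm_demod // 5
tc_turbo_decoder = tc_ofdm_demod // 10

print(tc_ofdm_demod, tc_chan_estim, tc_equalizer, tc_turbo_decoder)
```

Under these assumptions the four results match the values given for architecture II: 71428 ns, 14285 ns, 14285 ns, and 7142 ns.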
Figure
Evolution of the computational complexity per time unit of (a) processor
The Figure
Evolution of the estimated memory cost (in Kbit) for architecture I.
The memory cost evolves each time a transaction is received or sent by one of the three functions studied. Instants (1) correspond to the evolution of the memory cost after reception of a packet of data by the symbol demapper function. The amount of data stored in memory increases each time a packet of data is received by the symbol demapper. At instant (2), the complete LTE subframe has been processed at the physical layer level and the resulting data packet is produced. The maximum value achieved with the considered LTE subframe is estimated to be 1920 bits.
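The way the memory cost follows transactions can be sketched as a running sum. In this minimal illustration, each event is a hypothetical `(time_ns, delta_bits)` pair, positive when a data packet is received and stored, negative when stored data is released on production of an output transaction. The packet sizes and instants are placeholders and do not reproduce the values of the case study.

```python
def memory_profile(events):
    """Return the list of (time_ns, stored_bits) after each transaction."""
    level = 0
    profile = []
    for time_ns, delta_bits in events:
        level += delta_bits
        profile.append((time_ns, level))
    return profile


def peak_memory(events):
    """Maximum amount of data held in memory over the whole sequence."""
    return max(level for _, level in memory_profile(events))


# Hypothetical sequence: three packets received, then one output produced.
events = [(0, 100), (10, 100), (20, 100), (30, -300)]
profile = memory_profile(events)
```

The peak of such a profile gives the memory size estimate for the allocation under study.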
Figure
Evolution of the computational complexity per time unit for architecture II.
The upper part of Figure
Table
Maximal computational complexity and utilization of the processing resources observed for the two architectures considered.
 | Architecture I | | | | Architecture II | |
 | Maximal computational complexity (GOPS) | Resource usage (%) | Maximal computational complexity (GOPS) | Resource usage (%) | Maximal computational complexity (GOPS) | Resource usage (%) |
OFDMDemodulator | 1.612 | 20 | — | — | 0.322 | 100 |
ChannelEstimator | 0.406 | 48 | — | — | 0.244 | 80 |
Equalizer | 0.151 | 12 | — | — | 0.03 | 80 |
TurboDecoder | — | — | 221.77 | 8 | 177.49 | 10 |
Application | 1.612 | 80 | 221.77 | 8 | 177.812 | — |
Observations given in Figures
The main benefit of the presented approach comes from the adoption of a generic modeling style, which significantly reduces the modeling effort. We estimate that the creation of the complete model of the LTE receiver architecture took less than 4 hours. Once the model is created, parameters can easily be modified to address different architectures, and simulation is fast enough to allow exploration. In the presented modeling approach, the simulation method makes it possible to run ahead the evolution of the studied properties in a local time until activities need to synchronize. This favours the creation of models at a higher abstraction level. Synchronization points are defined by the transactions exhibited in the architecture model. However, this approach is sensitive to the estimates related to each function, and further work is needed to evaluate the simulation speed-accuracy tradeoff.
Abstract models of system architectures represent a convenient solution to manage the design complexity of embedded systems and to enable the architecting of complex hardware and software resources. In this paper, we have presented a state-based modeling approach for the creation of transaction level models for performance evaluation. According to this approach, the system architecture is modeled as an activity diagram, and the description of activities incorporates properties relevant to the resources used. The main contribution is a generic execution model defined to facilitate the creation of performance models. This model relies on a specific computation method to significantly reduce the simulation time of performance models. This modeling style has been experimented with using the CoFluent Studio framework. However, the approach is not limited to this specific environment and could be applied to other SystemC-based frameworks. Further research is directed towards validating the estimates provided by simulation and applying the same modeling principle to other nonfunctional properties such as dynamic power consumption.