Title: Model-Integrated Tools for the Design of Dynamically Reconfigurable Systems

Several classes of modern applications demand very high performance from systems 
with minimal resources. These applications must also be flexible enough to operate in a rapidly
changing environment. Achieving high performance from limited resources demands 
application-specific architectures, while flexibility requires architectural adaptation 
capabilities. Reconfigurable computing devices promise to meet both needs. While these 
devices are currently available, the issue of how to design these systems is unresolved. 
This paper describes an environment for design capture, analysis and synthesis of 
dynamically adaptive computing applications. The representation methodology is captured 
in a Domain-Specific, Model-Integrated Computing framework. Formal analysis 
tools are integrated into the design flow to analyze the design space to produce a 
constrained set of solutions. HW/SW Co-simulations verify the function of the system 
prior to implementation. Finally, a set of hardware and software subsystems is
synthesized to implement the multi-modal, dynamically adaptive application. The application 
executes under a runtime environment, which supports common execution 
semantics across software and hardware. An application example is presented.


INTRODUCTION
Modern high-performance embedded systems, such as Automatic Target Recognition for missiles or mobile communications devices with dynamic protocols, face many challenges. Power and volume constraints limit hardware size. Accurate, high-performance algorithms involve massive computations. Systems must respond to demanding real-time specifications. In the past, custom application-specific architectures have been used to satisfy these demands. This implementation approach, while effective, is expensive and relatively inflexible. As the world demands flexible, agile systems, hardwired application-specific architectures fail to meet requirements and become expensive to evolve and maintain. As new algorithms are developed and new hardware components become available, a fixed, application-specific architecture will require significant redesign to assimilate the technologies.
*Address for correspondence: 400 24th Ave. S., Nashville, TN 37235. Tel.: 615-343-6709, Fax: 615-343-6702, e-mail: bapty@vuse.vanderbilt.edu
Flexible systems must function in rapidly changing environments, resulting in multiple modes of operation. At the same time, efficient hardware architectures must match algorithms to maximize performance and minimize resources. Structurally adaptive, reconfigurable architectures can meet both these needs, achieving high performance with changing algorithms. Reconfigurable computing devices, such as Field Programmable Gate Arrays (FPGAs), allow the implementation of architectures that change in response to the changing environment.
The field of Reconfigurable Computing is rapidly advancing for scientific and Digital Signal Processing applications [1][2][3]. While today's Field Programmable Gate Array technology shows great promise for implementing reconfigurable computational systems, its capabilities in certain areas (such as floating-point arithmetic) cannot equal those of other technologies. For this reason, efficient system architectures must encompass a heterogeneous mix of the most suitable technologies, along with the capability to dynamically restructure the system architecture. The target systems are built on a heterogeneous computing platform that includes configurable hardware for computation and structural adaptation, and ASICs, general-purpose processors, and DSPs for computation. The primary difficulty in this approach lies in system design. A designer must now maintain a set of diverse system architectures, which exist at different times in the system's lifetime, and map these architectures onto the same group of resources. The designers must manage the behavior of the system, determining the operational modes of the system, the rules for transitioning between operational modes, and the functional properties within each operational mode. In addition, the system must make efficient use of the resources, enabling the designer to minimize the envelope of hardware required to support the union of all operational modes. Current system design tools are insufficient to manage this complexity.

State of the Art
The standard methods for the design of hardware systems use VHDL specifications together with off-the-shelf components, such as libraries of parameterized modules (LPM). This approach allows hierarchical design of large systems of a fixed structure. The ability to specify multiple behavioral designs for a single entity allows a complex design space to be created. The choice of configuration of a system is largely a manual process. Using this approach to design a dynamically reconfigurable system would involve multiple designs, with only a manual linkage between the individual design modes. Custom runtime support would be added to manage the reconfiguration. This approach relies too much on manual, informal interaction between modes. The pure VHDL approach also provides little support for software/hardware interaction.
The Ptolemy design environment [4] has recently been extended to support reconfigurable hardware. Ptolemy is a comprehensive package that supports hardware/software codesign and heterogeneous architectures, spanning from microprocessors/DSPs to FPGAs to MEMS devices. It is unclear how well it supports the design of dynamically reconfigurable systems and their runtime execution. DEFACTO from USC/ISI [5] is a high-level design tool for the design and synthesis of reconfigurable systems. In DEFACTO, algorithms are specified in a high-level programming language such as C or MATLAB. The target architecture is specified in an architecture description language and is assumed to consist of a general-purpose processor (GPP) and a number of configurable computing units (CCUs). A parallelizing compiler partitions computation and control among the GPP and the CCUs, and also manages data storage and communications. The primary drawback of this approach is the use of sequential programming languages for system specification. The effectiveness and scalability of programming languages in specifying large, complex systems with parallel interacting components have not been established. Runtime support for dynamic reconfiguration has not been addressed. MATCH from NWU [6] employs a similar approach; however, its system specifications are written solely in MATLAB.
The DISC project at BYU [7] employs dynamic reconfiguration to support demand-driven instruction set modification. It can partially reconfigure an FPGA device to page custom application-specific instructions in and out of the device on demand. While the approach takes advantage of partial reconfiguration of the instruction set to improve functional density, it does not address the concerns of global, system-level reconfiguration.
The CHAMPION project at UTK [8] uses KHOROS, a popular design tool for image and signal processing applications, to capture system specifications. These specifications are then automatically transformed, partitioned, and mapped to a distributed, heterogeneous resource network consisting of DSPs and FPGAs. The partitioning process aims to partition not only in space but also in time. However, it is not clear how the reconfiguration process will be orchestrated at runtime.
Several design approaches exist for low-level dynamic reconfiguration. Some of these involve unique FPGA designs with multiple contexts, allowing rapid switching between a limited number of unique designs. Tools for high-level system design using these components in a heterogeneous system have not yet appeared.

Target of this Paper
High-level design tools are being developed to capture designs and to generate functional systems as part of the DARPA Adaptive Computing Systems Program. This paper describes a model-integrated approach to the development of reconfigurable systems. There are many significant issues in the development process. The approach described here divides these issues into several categories: (1) Representation and Capture of design information in terms of Models; (2) Analysis of the models for design/requirements/resource trade-off studies; (3) Synthesis of architectures and executable systems directly from the models; and (4) Runtime support environments to support efficient execution of the synthesized reconfigurable systems.
The Model-Integrated Computing (MIC) approach has been successfully applied to a diverse set of applications [9][10][11][12][13][14]. The general MIC approach involves creating a development environment that is customized for a specific application domain. The resultant development environment is a multiple-aspect graphical editor that directly supports the engineering concepts required in the development process. Where several engineering disciplines are involved in system development (e.g., Software, Hardware, DSP algorithms, Systems Requirement Specification, etc.), the multiple-aspect nature of the approach allows different aspects to be customized for individual disciplines. The graphical editor allows construction of system Models, which capture the specifications and components required along with their relationships. The Models form a database of design information that can then be used in system analysis, trade-off studies, and performance estimation/simulation. These same Models are used to synthesize the executing systems. The synthesis process assumes a runtime environment that hides the low-level hardware/software details from the synthesis process.
This paper will follow a logical progression in describing the Model-Integrated Computing approach for adaptive systems design. The first section will describe the rationale and implementation of the design capture approach. The next section will give an overview of the current and planned analysis capabilities for design-space exploration. The following sections will describe the system synthesis process and the runtime environment architecture and implementation. Finally, we will show the implementation of a missile Automatic Target Recognition application incorporating adaptive system behavior.

DESIGN REPRESENTATION
The customization of the Model-Integrated Computing design environment involves a careful analysis of the needs of the design engineers, the methods and components used in the designs, and the target systems. For an environment to successfully support the creation of systems, the concepts used by designers must be faithfully reproduced by the design environment. The critical concepts required for dynamically adaptive computing architecture targets include the ability to specify the dynamic behavior of the system (in terms of Modes), the function of the system in each of these modes (inputs, outputs, and algorithms), and the resources available. Each of these concepts can be modified by constraints known to the designer. Interactions between the concepts are captured via references within the models. The integration of multiple modeling concepts with the constraints and references is critical in the control and optimization of the dynamically adaptive behavior of the target system. This section will describe the concepts developed in the creation of the Adaptive Computing Systems MIC environment.
The Adaptive Computing Systems (ACS) environment divides the design process into four major categories:
1. Behavioral Modeling. Dynamically adaptive systems must manage multiple system behaviors. In this first category, the adaptive behavior is defined. The designer can specify the operating modes of the system, the legal transitions between modes (and the conditions for transition), and the specifications for system operation while in each operating mode.
2. Algorithm/Structural Modeling. In this category, potential algorithms are described. The algorithms define signal flow specifications to compute required system outputs.
3. Resource Modeling. The resource models describe the hardware available for construction of the system. This consists of physical processors, devices, and the interconnection topology.
4. Constraint Specification. These modeling categories are augmented and linked together with a Constraint framework. The Constraints allow user-defined interactions to be specified, establishing linkages between properties in one category and objects in the same or another category. This modeling category also allows linkages between processing modules in adjacent modes, to guide the transition of state information between computational structures.

Behavioral Models
Behavioral models capture the dynamic behavior of the system and the potential interactions between modes of operation. Since the system will be operating in discrete modes, with specific transitions between these modes, a Discrete Finite State Machine formalism is used [15] (Fig. 1). States define operational modes of the system. Transitions define the potential conditions required for the system to change modes and the end-state of the mode-shift. In order to manage system complexity, where the system may have many potential operational modes, states have hierarchical decomposition.
The event expression that triggers a mode change is defined by the transition rules. A transition rule is a Boolean equation composed of event variables. When the transition rule expression is satisfied, system reconfiguration occurs. Inputs to the transition rules, event variables, are computed in the Algorithmic/Structural modeling view described below. Event variables can be directly sampled external signals or complex computational results.
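A minimal Python sketch of these semantics, assuming each transition rule is a Boolean function of named event variables: on each step, the first enabled transition out of the current mode fires and the mode changes. The mode names and event variables are hypothetical, and hierarchical state decomposition is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Events = Dict[str, bool]  # event variable name -> current value


@dataclass
class Transition:
    source: str
    target: str
    rule: Callable[[Events], bool]  # Boolean equation over event variables


@dataclass
class ModeMachine:
    mode: str
    transitions: List[Transition]

    def step(self, events: Events) -> bool:
        """Fire the first transition out of the current mode whose rule holds.
        Returns True when a mode change (i.e. a reconfiguration) occurs."""
        for t in self.transitions:
            if t.source == self.mode and t.rule(events):
                self.mode = t.target
                return True
        return False


# Hypothetical modes and event variables, for illustration only.
fsm = ModeMachine(
    mode="search",
    transitions=[
        Transition("search", "track",
                   lambda e: e["target_detected"] and not e["low_power"]),
        Transition("track", "search", lambda e: e["target_lost"]),
    ],
)
```

For example, `fsm.step({"target_detected": True, "low_power": False, "target_lost": False})` moves the machine from `"search"` to `"track"`, whereas the same call with `low_power` set leaves it in `"search"`.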
The behavioral modeling aspect is linked to the Algorithmic/Structural aspect by References. A Reference allows the user to establish a pointer from the mode to a defined computational algorithm. Each mode references a model in the Structural Aspect that defines the processing algorithm that is to be operational in that mode. The references allow a single algorithm to be applied to any number of system states.
The behavioral modeling aspect also allows the specification of constraints, such as real-time requirements and maximal runtime power usage. Maximal permitted system delays can be specified for any pair of input and output ports on the algorithm model. Maximum power limits are specified using attributes of the models. Constraints capture the system performance requirements.

Algorithm/Structural Models
The structural modeling aspect is used to define the processing algorithm structure. Algorithms are described in terms of computational components and data interactions. To manage system complexity, a multi-level hierarchy is used to structure algorithm definition.
The algorithm is modeled as a dataflow structure with three classes of objects: compounds, primitives, and templates. The relationship between these objects is captured in Figure 2.
A primitive is the basic element, representing a numerical processing operation. A primitive maps directly to an implementation in either hardware or software. Primitive objects are annotated with attributes, which capture measured performance, resource (memory) requirements, and other user-defined properties.
A compound is an aggregation object that contains primitives, other compounds, and/or templates. These components can be connected within the compound to define the algorithmic dataflow. Compounds provide the hierarchy in the structural description that is necessary for managing the complexity of large designs.
A Template object captures a design alternative.
The Template allows the specification of multiple algorithm architecture alternatives for a given task. These design alternatives can be composed of Compounds, Primitives, or other Templates, allowing hierarchies of design alternatives. The selection among the alternatives occurs at model interpretation, allowing design flexibility to be specified.
Design alternatives allow the model of the system to capture a range of design possibilities.
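The counting rule implied by this hierarchy can be sketched in Python: a compound multiplies the number of configurations of its children, while a template adds the counts of its alternatives, since exactly one alternative is selected. The class and function names below are hypothetical and are not the environment's actual model database.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Primitive:
    name: str  # maps directly to one hardware or software implementation


@dataclass
class Compound:
    name: str
    children: List["Node"] = field(default_factory=list)  # all are used together


@dataclass
class Template:
    name: str
    alternatives: List["Node"] = field(default_factory=list)  # exactly one is chosen


Node = Union[Primitive, Compound, Template]


def count_designs(node: Node) -> int:
    """Number of distinct configurations rooted at `node`."""
    if isinstance(node, Primitive):
        return 1
    if isinstance(node, Compound):
        total = 1
        for child in node.children:
            total *= count_designs(child)  # independent choices multiply
        return total
    return sum(count_designs(alt) for alt in node.alternatives)  # choices add


# Hypothetical example: 2 FFT alternatives x 3 filter alternatives x 1 detector.
fft = Template("fft", [Primitive("fft_hw"), Primitive("fft_sw")])
filt = Template("filter", [Primitive("fir"), Primitive("iir"), Primitive("fir_hw")])
chain = Compound("chain", [fft, filt, Primitive("detect")])
```

Here `count_designs(chain)` is 6. Forty independent four-way templates in one compound already yield 4^40, roughly 1.2 × 10^24 configurations, which shows how quickly the design space grows.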

[Figure: Object Hierarchy; Example Model]
The large design space gives the tools the freedom to search for and select an implementation that meets the specified requirements and fits within available resources. Each of these alternative methods has different performance attributes and different hardware requirements. The selection of the best alternative depends not only on the hardware that is available, but also on whether the hardware is to be time-shared, and what hardware is already allocated to support the processing algorithms that are required for operations in different modes.
For the high-level designer, algorithm alternatives allow a virtual separation of algorithm from implementation. Typical algorithm design requires the engineer/physicist to consider the hardware details of the underlying architecture to achieve an efficient implementation. The ultimate effect is that the resulting algorithm reflects the hardware structure. This practice leads to highly non-portable, technology-specific designs. System upgrades to use more modern technology require a bottom-to-top redesign. Algorithm alternatives separate the algorithm from the architecture, postponing implementation decisions to a much later step in the design process and simplifying technology migration efforts.

Resource Models
Resource models define the target hardware platform. Resources are modeled in terms of hardware components and the physical connections among them. The relationships among the resource model components are shown in Figure 3.
The top-level hardware system is a Network of components. Network components are either programmable processor elements (such as DSPs or standard RISC/CISC processors), programmable logic components (such as FPGAs), or dedicated hardware ASIC components for fixed functions (such as FFT computation). Data Sources and Data Sinks capture the specifics of hardware I/O interfaces and data acquisition/effector interfaces.
The components are constructed using cores and ports. Every processing element must contain one core. The core object captures the inherent performance attributes of the processing element, such as clock speed, memory, and other resources.
A port represents a physical communication channel. Ports have associated protocols and specific pin assignments, capturing physical connection points on a chip. Connections between processing elements are created by connections between ports. The connections capture the "as-built" topology of the physical implementation.
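As an illustration of how such a resource model might be queried, the sketch below stores ports keyed by (processing element, port name) and the "as-built" connections between them, then finds directly wired neighbors and connections whose two ends disagree on protocol. The consistency check, the protocol labels, and all names are hypothetical assumptions, not features described for the actual environment.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

PortId = Tuple[str, str]  # (processing element, port name)


@dataclass(frozen=True)
class Port:
    protocol: str          # hypothetical protocol label
    pins: Tuple[int, ...]  # physical pin assignment on the chip


@dataclass
class ResourceNetwork:
    ports: Dict[PortId, Port]
    connections: List[Tuple[PortId, PortId]]  # "as-built" wiring between ports

    def protocol_mismatches(self) -> List[Tuple[PortId, PortId]]:
        """Connections whose two end ports use different protocols."""
        return [(a, b) for a, b in self.connections
                if self.ports[a].protocol != self.ports[b].protocol]

    def neighbors(self, element: str) -> Set[str]:
        """Processing elements wired directly to `element`."""
        result: Set[str] = set()
        for (ea, _), (eb, _) in self.connections:
            if ea == element:
                result.add(eb)
            elif eb == element:
                result.add(ea)
        return result


net = ResourceNetwork(
    ports={
        ("dsp0", "link0"): Port("lvds32", (10, 11)),
        ("fpga0", "p0"): Port("lvds32", (40, 41)),
        ("fpga0", "p1"): Port("pci", (50, 51)),
        ("host", "bus"): Port("vme", (1, 2)),
    },
    connections=[(("dsp0", "link0"), ("fpga0", "p0")),
                 (("fpga0", "p1"), ("host", "bus"))],
)
```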

Constraint Specification
System constraint specifications have four categories of design constraints: (a) operational constraints, (b) composability constraints, (c) resource constraints, and (d) performance constraints. These constraints establish linkages between modeling object properties in different modeling categories.
Operational constraints express conditions relating design configurations to operational modes. These constraints are applied within the Behavioral models. Operational constraints can be used to restrict implementation alternatives based upon the current operational mode.
Composability constraints are logic expressions that restrict the composition of alternative processing blocks. These constraints express compatibility between related implementation options.
For instance, if hardware FFT is selected from different alternatives in a processing block, then the hardware IFFT must be selected in a related processing block.
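Because a composability constraint is a logic expression, the FFT/IFFT rule above is simply the implication hw_fft → hw_ifft, equivalently (not hw_fft) or hw_ifft. The sketch below applies it to a small, explicitly enumerated design space. The block and alternative names are hypothetical, and the enumeration is only for illustration; the exploration tool described in this paper avoids enumeration by using OBDDs.

```python
from itertools import product

# Hypothetical design space: each processing block has alternative implementations.
choices = {
    "fft_block": ["fft_hw", "fft_sw"],
    "ifft_block": ["ifft_hw", "ifft_sw"],
}


def composable(cfg):
    """hardware FFT selected -> hardware IFFT selected, i.e. (not A) or B."""
    return cfg["fft_block"] != "fft_hw" or cfg["ifft_block"] == "ifft_hw"


blocks = list(choices)
space = [dict(zip(blocks, combo)) for combo in product(*choices.values())]
feasible = [cfg for cfg in space if composable(cfg)]
```

Of the four combinations, only hardware FFT paired with software IFFT is rejected.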
Resource constraints are logic expressions describing the selection of processing blocks based on resource limitations. These constraints allow hardware requirements to be specified for software components. For example, a software component may require access to a large block of memory or to a hardware unit.
Performance constraints are integer constraint expressions governing end-to-end latency, throughput, power consumption, and/or space/volume. Some of the performance constraints are implicitly specified in the properties of Behavioral models. These constraints allow the designer to control the potential design space for the analysis/synthesis process and to ensure that the synthesized system satisfies real-time constraints.
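Performance constraints reduce to integer inequalities over attributes of the selected alternatives. The sketch below assumes, for illustration, that the selected blocks run in series (so their latencies add) and are all powered at the same time (so their power draws add). The attribute values and bounds are invented.

```python
from itertools import product

# Hypothetical attributes per alternative: (latency in us, power in mW).
ATTRS = {
    "fft_hw": (10, 900), "fft_sw": (40, 300),
    "ifft_hw": (10, 900), "ifft_sw": (40, 300),
}
BLOCKS = [["fft_hw", "fft_sw"], ["ifft_hw", "ifft_sw"]]  # executed in series


def meets(cfg, max_latency_us, max_power_mw):
    latency = sum(ATTRS[alt][0] for alt in cfg)  # series composition: delays add
    power = sum(ATTRS[alt][1] for alt in cfg)    # all blocks powered together
    return latency <= max_latency_us and power <= max_power_mw


feasible = [cfg for cfg in product(*BLOCKS) if meets(cfg, 60, 1500)]
```

With a 60 us latency bound and a 1500 mW power budget, the all-hardware pair violates the power limit and the all-software pair violates the latency limit, leaving only the two mixed configurations.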

MODEL ANALYSIS
A design described in the modeling environment defines a design space consisting of modes and requirements, potential implementations, and resource sets. The designer must select appropriate combinations of implementations and resource assignments for all of the desired operational modes. Given the flexibility in defining design alternatives, this space can be extremely large (moderately sized design examples have defined a space of 10^24). A designer cannot handle such a large design space without sufficient tools. The design space must be evaluated to find a set of designs (mode-configuration pairs) that best satisfy the design criteria.
There is a large number of conflicting design criteria in reconfigurable systems. These criteria must be applied across all of the system operational modes. The processing needs of each of the system modes must be satisfied with a single shared hardware platform. The analysis tools must allow efficient exploration, navigation, and pruning of this space to select feasible hardware/software architectures for user-definable cost functions such as weight, power, algorithmic accuracy, and flexibility. Given the size of the design space and the complexity of the analysis, a powerful, scalable analytical method is required.

Constraint Satisfaction Using Symbolic Methods
The design space exploration tool uses a symbolic method based on Ordered Binary Decision Diagrams (OBDDs) to represent, navigate, and prune the design space. In this symbolic representation, sets/spaces are represented as Boolean expressions over the members of the set. The members of the set are encoded as binary variables under a binary encoding scheme. The principal benefit of the approach is that it does not require enumeration of the set/space to perform operations.
Ordered Binary Decision Diagrams [16, 17] are a canonical representation of logic functions (for a fixed variable ordering), representing Boolean functions as directed acyclic graphs in a memory-efficient format. The operations over the Boolean functions are implemented as graph algorithms, thereby rendering "manipulation" of the space fast and efficient.
With this symbolic formalism, the application of logical constraints is relatively straightforward. The user-defined logical constraints can be represented as a Boolean expression over the components of the design space. Constraint application is a conjunction of the constraint Boolean expression with the Boolean expression that represents the design space. The resultant Boolean expression represents the "constrained" design space. Application of integer arithmetic constraints, such as timing and power constraints, requires further analysis (see [20] for details); however, the basic approach remains the same.
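To make the conjunction step concrete, here is a minimal reduced OBDD in Python built with the classic unique-table and apply construction. It is a textbook sketch, not the authors' tool. The variable encoding is hypothetical: x0 means "hardware FFT selected", x1 means "hardware IFFT selected", and x2 is an unconstrained third choice. The design space starts as the constant TRUE over three variables, the composability constraint x0 → x1 is conjoined with it, and counting satisfying assignments gives the size of the pruned space without enumerating it.

```python
import operator


class BDD:
    """Minimal reduced ordered BDD. Node 0 is FALSE and node 1 is TRUE.
    Internal nodes are (var, low, high); terminals use var = num_vars so they
    sort after every real variable in the fixed order x0 < x1 < ..."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = [(num_vars, -1, -1), (num_vars, -1, -1)]
        self.unique = {}  # (var, low, high) -> node id; shares equal subgraphs

    def mk(self, var, low, high):
        if low == high:  # redundant test: both branches agree
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def apply(self, op, u, v, memo=None):
        """Combine BDDs u and v with the binary Boolean operator `op`."""
        memo = {} if memo is None else memo
        if (u, v) in memo:
            return memo[(u, v)]
        if u <= 1 and v <= 1:
            result = int(op(bool(u), bool(v)))
        else:
            top = min(self.nodes[u][0], self.nodes[v][0])
            u0, u1 = self.nodes[u][1:] if self.nodes[u][0] == top else (u, u)
            v0, v1 = self.nodes[v][1:] if self.nodes[v][0] == top else (v, v)
            result = self.mk(top, self.apply(op, u0, v0, memo),
                             self.apply(op, u1, v1, memo))
        memo[(u, v)] = result
        return result

    def count(self, u):
        """Number of satisfying assignments over all num_vars variables."""
        memo = {}

        def level(n):
            return self.nodes[n][0]

        def go(n):  # counts assignments of variables level(n) .. num_vars-1
            if n <= 1:
                return n
            if n not in memo:
                var, lo, hi = self.nodes[n]
                memo[n] = (go(lo) * 2 ** (level(lo) - var - 1)
                           + go(hi) * 2 ** (level(hi) - var - 1))
            return memo[n]

        return go(u) * 2 ** level(u)


def implies(a, b):
    return (not a) or b


bdd = BDD(num_vars=3)
space = 1                                           # TRUE: all 8 configurations
constraint = bdd.apply(implies, bdd.var(0), bdd.var(1))
pruned = bdd.apply(operator.and_, space, constraint)
```

`bdd.count(pruned)` is 6: the two assignments with x0 = 1 and x1 = 0 are removed. Because the representation is canonical, building the same constraint as (not x0) or x1 returns the identical node id.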
While the approach scales well, an exponential explosion of the OBDDs can occur in very large design spaces with many applied constraints.
To address this problem, we support hierarchical constraint processing. Constraints are scoped to a particular level of the hierarchy; they are applied to sub-spaces first, pruning them to the extent possible, and processing then progresses upward in the hierarchy. This technique is very effective when there are a large number of constraints with limited scope. It is not effective when there are many globally scoped constraints in a large design space.
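The scoping idea can be illustrated with explicit sets instead of OBDDs: each sub-space is pruned by its local constraints before the sub-spaces are combined, so the global constraints only see the smaller product of the survivors. The sub-space names, alternatives, and constraints below are hypothetical.

```python
from itertools import product


def prune(configs, constraints):
    """Keep the configurations that satisfy every constraint."""
    return [c for c in configs if all(check(c) for check in constraints)]


def hierarchical_prune(subspaces, local_constraints, global_constraints):
    # 1. Apply each sub-space's locally scoped constraints first.
    pruned = {name: prune(cfgs, local_constraints.get(name, []))
              for name, cfgs in subspaces.items()}
    # 2. Combine the surviving sub-configurations one level up.
    merged = []
    for parts in product(*pruned.values()):
        cfg = {}
        for part in parts:
            cfg.update(part)
        merged.append(cfg)
    # 3. Only now apply the globally scoped constraints.
    return prune(merged, global_constraints)


subspaces = {
    "front_end": [{"fe_alg": a, "fe_impl": i}
                  for a in ("fir", "iir") for i in ("hw", "sw")],
    "back_end": [{"be_alg": a, "be_impl": i}
                 for a in ("fft", "dct") for i in ("hw", "sw")],
}
local = {
    "front_end": [lambda c: not (c["fe_alg"] == "iir" and c["fe_impl"] == "hw")],
    "back_end": [lambda c: c["be_alg"] == "fft"],
}
global_ = [lambda c: [c["fe_impl"], c["be_impl"]].count("hw") <= 1]
result = hierarchical_prune(subspaces, local, global_)
```

Without local pruning the combined space would hold 16 configurations; after local pruning it holds 6, and 5 of those survive the global "at most one hardware block" constraint.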
The constraints "prune" the design space by enforcing the requirements specified in each constraint. These constraints can be iteratively applied to the design space, with the goal of reducing the 10^24 alternatives to a more manageable 10-1000 design alternatives. We have implemented the approach described above in a design space exploration tool. Design engineers can iteratively apply constraints and visualize the sensitivity of the design space to each constraint. If a constraint is extremely tight, its application can eliminate the design space altogether. In this case, the constraint can be relaxed and other constraints applied instead. The outcome of the constraint satisfaction step is a set of design configurations much smaller than the original design space. Figure 4 shows a screen capture of the design space exploration tool. When the design engineer is satisfied with this constrained design set, the design process continues with simulation.

HW/SW Co-simulation
The model constraints encode the behavior of the system at a relatively coarse level of granularity.
While coarse granularity is necessary to work with tremendously large design spaces, the precision of this approach can be too low. Due to constraint uncertainty, the designer will be required to accept designs that are near the fringes of the constraint envelope. To establish a more accurate estimate of in-system performance, a simulation capability is required. Since the tool targets both hardware and software, the simulator must support co-simulation. While this research is still in its early phases, the current approach is to allow the system designer to perform co-simulation at three levels of abstraction. These levels provide a trade-off between execution speed and accuracy of results. Simulation can occur at the performance level, the algorithm level, and the gate/instruction level. This will enable the designer to quickly "zoom in" on the more viable design alternatives and perform more accurate simulations only on this subset.
To be useful for rapid design space exploration, the co-simulation environment must be seamlessly integrated with the rest of the system. Information used to automatically construct the simulation testbench at various levels is extracted directly from the model database to ensure consistency among the various levels of detail. Different levels of simulation can be generated that use different, possibly overlapping, subsets of the model database. Output from the simulation is interpreted and fed back in a high-level form to the user in the same design environment. This is an important aid in the interpretation of the data. In addition to simulating each mode individually, the process of mode shifts must be simulated to estimate the cost of reconfiguration.
At the performance level, only the performance of the structural model is simulated. Performance attributes, such as latency and throughput, associated with processing primitives are used to construct a network of delay models for the system. Data flow is represented by tokens for faster simulation via packages such as PML [18, 21]. No distinction between hardware and software implementation is made at this level; however, different components will have different performance attributes. The output of this step will be an overall performance assessment of the proposed algorithm, flagging the critical components or hot spots of the system.
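A static approximation of this performance-level view can be sketched as a longest-path computation over the network of delay models: each primitive starts when its slowest predecessor has produced its token, and the path that determines the end-to-end latency identifies the hot spots. This is a simplification (no queuing, throughput, or token contention, unlike a token-based simulator such as PML), and the primitive names and latencies are hypothetical. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Optional, Tuple


def critical_path(latency: Dict[str, int],
                  edges: List[Tuple[str, str]]) -> Tuple[int, List[str]]:
    """End-to-end latency of the dataflow graph and the path that sets it."""
    preds = {node: set() for node in latency}
    for src, dst in edges:
        preds[dst].add(src)
    finish: Dict[str, int] = {}
    via: Dict[str, Optional[str]] = {}
    for node in TopologicalSorter(preds).static_order():
        # A node starts once its slowest predecessor has finished.
        start, via[node] = 0, None
        for p in preds[node]:
            if finish[p] > start:
                start, via[node] = finish[p], p
        finish[node] = start + latency[node]
    end = max(finish, key=finish.get)
    path = [end]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return finish[end], path[::-1]


# Hypothetical primitive latencies (microseconds) and dataflow edges.
latency = {"src": 1, "fft": 10, "filter": 4, "ifft": 10, "detect": 3}
edges = [("src", "fft"), ("fft", "ifft"), ("ifft", "detect"),
         ("src", "filter"), ("filter", "detect")]
total, path = critical_path(latency, edges)
```

In this example the end-to-end latency is 24 us and the critical path is `src -> fft -> ifft -> detect`, flagging the FFT/IFFT pair as the hot spot.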
At the algorithm level, the functional computation is simulated. The simulation does not include low-level timing details, allowing the user to quickly verify the correct numerical functionality of the system. Hardware functions are described in VHDL, and software functions are described in C and encapsulated in a VHDL wrapper entity. A commercial VHDL simulator equipped with a foreign language interface will be the target for mapping.
The lowest level of abstraction is gate/instruction-level co-simulation. At this level, a HW/SW co-simulation environment is constructed that models the system platform as described in the resource models of Section 1. VHDL simulation models will be used to describe hardware components such as ASICs and FPGAs. Processor models can range from full functional models that mimic the internal architecture of the processor to simple bus functional models that only describe the interaction of the processors with external components but do not mimic the internal architecture [22]. The former is usually too expensive in terms of execution speed and also difficult to construct from scratch for complex processors. The latter approach is more suitable for debugging the hardware portion but not well suited for viewing software execution.
An intermediate approach is to use an instruction set simulator (ISS) coupled with a bus functional model (BFM) to model the processor, as described in [23]. The ISS will be used to simulate software execution, while the BFM will mimic the interaction with the external circuitry.
Synchronization techniques between the ISS and the BFM are needed to keep the simulation realistic.
Numerical simulations can be performed at the algorithmic level by generating Matlab code.
A Matlab program is generated by selecting the "*.M" file representing each of the processing leaf nodes. A combined Matlab program is generated that is a numerically accurate version of the data flow diagrams. With the proper Matlab functions, precision effects on numerical accuracy can be studied. This capability is used to verify the algorithmic correctness of the data flow graph as drawn.

SYSTEM SYNTHESIS
By the time the user is ready for synthesis, the tools have been used to capture system requirements, design information and alternatives, and the resources available for system implementation in the form of Models. The constraints developed during the design representation phase have been applied to the design space to define a manageable set of implementation alternatives. Expected performance has been estimated using the Co-Simulation tools, providing further assurance that the system will function to design specifications. The selected design alternatives must now be transformed to software and hardware for system implementation. We refer to this process as the model interpretation/system synthesis phase.
A model interpretation process generates hardware architecture specifications, software modules, process/schedule tables, communications maps, synthesizable hardware specifications, and a runtime Configuration Manager for dynamic adaptation to changing environments. The synthesis process attempts to optimize hardware/software architectures for user-definable cost functions such as weight, power, algorithmic accuracy, and flexibility.

Configuration Manager Synthesis
At this point, the synthesis procedure can generate the actual runtime artifacts. The state-based behavior of the system was defined in the Behavioral Models. From the behavioral models, a compact state table is produced for the Configuration Manager. The table contains next-state equations for each operational mode. Interfaces to internal and external events are generated that provide the state transition variables to the state machine. These tables and variable interfaces are created in a form that allows direct execution by the Configuration Manager. The Configuration Manager core library is compiled along with the compact state table to generate an executable configuration manager. This configuration manager will be executed on the system control processor.
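A table-driven configuration manager can be sketched as follows, assuming (hypothetically) that each next-state equation is stored as data in disjunctive normal form: a rule is a list of terms, and a term is a set of required event-variable values. The manager evaluates the current mode's rules against the latest event variables and invokes a reconfiguration hook on a mode change. The table contents, mode names, and the `reconfigure` hook are illustrative assumptions, not the generated format.

```python
from typing import Callable, Dict, List, Tuple

Term = Dict[str, bool]   # conjunction of event-variable literals
Rule = List[Term]        # disjunction of terms (DNF)
StateTable = Dict[str, List[Tuple[Rule, str]]]

# Hypothetical compact table: mode -> ordered list of (rule, next mode).
TABLE: StateTable = {
    "search": [([{"detect": True, "low_power": False}], "track")],
    "track": [([{"lost": True}, {"abort": True}], "search")],
}


def rule_holds(rule: Rule, events: Dict[str, bool]) -> bool:
    # Event variables that were not reported are treated as False.
    return any(all(events.get(var, False) == val for var, val in term.items())
               for term in rule)


def next_mode(table: StateTable, mode: str, events: Dict[str, bool]) -> str:
    for rule, target in table.get(mode, []):
        if rule_holds(rule, events):
            return target
    return mode  # no rule satisfied: stay in the current mode


class ConfigurationManager:
    def __init__(self, table: StateTable, initial: str,
                 reconfigure: Callable[[str, str], None]):
        self.table = table
        self.mode = initial
        self.reconfigure = reconfigure  # hypothetical hook: load bitfiles, swap schedules

    def on_events(self, events: Dict[str, bool]) -> str:
        target = next_mode(self.table, self.mode, events)
        if target != self.mode:
            self.reconfigure(self.mode, target)
            self.mode = target
        return self.mode
```

Keeping the rules as data rather than code mirrors the idea that the table is generated from the models and linked with a fixed manager core.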

Hardware Synthesis
For each configurable component (FPGA), a design specification is generated. This design specification includes a hardware design file for each mode. The design for a component in each mode is specified in structural VHDL. The VHDL design incorporates computational components from the design library. The library can contain user-defined VHDL behavioral/structural descriptions and vendor-supplied Intellectual Property (IP) modules. These modules are combined using components from a standard interface runtime library. The interface library is a key component of the Runtime Environment, described later. These interfaces connect computational components on the same chip with simple FIFOs and asynchronous handshaking interfaces. When the communication must occur across chip boundaries, or between software and hardware components, a set of more complex interface components is used. These interface components manage the physical hardware resources (pins and wires), buffer data, and multiplex multiple logical communications across a single set of wires. Where required, data format conversions are supplied.
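The multiplexing role of the inter-chip interface components can be modeled behaviorally in Python. In this sketch, several logical channels share one physical link. Each word crossing the link is tagged with its channel id, a round-robin arbiter picks at most one word per clock, and bounded FIFOs provide back-pressure on both sides. This is a hypothetical behavioral model, not the VHDL interface library itself; the class and method names are illustrative.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class LinkMux:
    """Hypothetical model of several logical channels sharing one physical link."""

    def __init__(self, channels: List[str], depth: int):
        self.order = list(channels)
        self.depth = depth  # capacity of every FIFO
        self.tx: Dict[str, Deque[int]] = {c: deque() for c in channels}
        self.rx: Dict[str, Deque[int]] = {c: deque() for c in channels}
        self.next = 0       # round-robin pointer

    def send(self, channel: str, word: int) -> bool:
        """Enqueue a word; False means the FIFO is full and the producer must stall."""
        if len(self.tx[channel]) >= self.depth:
            return False
        self.tx[channel].append(word)
        return True

    def clock(self) -> Optional[Tuple[str, int]]:
        """Move at most one word across the shared link; return the tagged frame."""
        n = len(self.order)
        for i in range(n):
            ch = self.order[(self.next + i) % n]
            if self.tx[ch] and len(self.rx[ch]) < self.depth:
                word = self.tx[ch].popleft()
                self.rx[ch].append(word)             # receiver demultiplexes by tag
                self.next = (self.next + i + 1) % n  # round-robin fairness
                return (ch, word)                    # (channel id, payload) on the wire
        return None

    def receive(self, channel: str) -> Optional[int]:
        q = self.rx[channel]
        return q.popleft() if q else None
```

For example, with channels `"a"` and `"b"` and FIFO depth 2, queuing two words on `a` and one on `b` produces the link frames `("a", 1)`, `("b", 10)`, `("a", 2)`, alternating between channels.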
These VHDL files are then compiled using vendor-supplied/COTS VHDL compilers and part-specific Place-and-Route tools. The result is a set of "bitfiles", one for each reconfigurable hardware device in each mode. In the current FPGA market, demand has not yet forced vendors to provide partially reconfigurable devices and support tools. (Xilinx Virtex parts are said to be partially reconfigurable; however, the compiler/PPR tools have not yet provided documentation and support for this mode of operation. An earlier generation, the XC62xx family, was dynamically and partially reconfigurable, but support for these chips has been dropped.) For these reasons, we treat each FPGA as an atomic part, configurable only with a full device reset. A partially reconfigurable device is simulated by aggregating multiple standard, fully reconfigurable FPGAs. The approach proposed here will work for partially reconfigurable devices when the tools become available. For the approach to work with standard compilers/PPR tools, the vendor tools must provide floor-planning methods that restrict logical design components (i.e., all components within a single mode) to non-overlapping regions that coincide with legal chip reconfiguration boundaries.
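Because each FPGA is treated as an atomic part, a mode change can be expressed as the set of devices whose bitfile differs between the old and new modes. Under the assumption that devices with an unchanged bitfile keep running, this is how an aggregate of fully reconfigurable FPGAs can stand in for a partially reconfigurable device. The device names, bitfile names, and the `devices_to_reload` helper below are hypothetical.

```python
from typing import Dict, Set

# Hypothetical synthesis output: one bitfile per (device, mode).
BITFILES: Dict[str, Dict[str, str]] = {
    "fpga0": {"Acquire": "fpga0_acq.bit", "Track": "fpga0_trk.bit"},
    "fpga1": {"Acquire": "fpga1_common.bit", "Track": "fpga1_common.bit"},
}


def devices_to_reload(old_mode: str, new_mode: str) -> Set[str]:
    """Each FPGA is either left untouched or fully reset and reloaded.
    Only devices whose bitfile differs between the two modes are reloaded."""
    return {
        dev
        for dev, per_mode in BITFILES.items()
        if per_mode[old_mode] != per_mode[new_mode]
    }
```

Here a change from "Acquire" to "Track" reloads only `fpga0`, while `fpga1` behaves like an unchanged region of a partially reconfigurable part.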

Software Synthesis
For the general-purpose RISC/DSP components, a set of software implementations is generated. These implementations provide the information needed by the Runtime Environment to enact the desired computational behavior. The Runtime Environment requires several categories of design files:
Software Load Modules contain executable modules that are downloaded to the processors in the system. The system can generate a common load module that contains the superset of all executable functions (if memory is sufficient), or it can generate a customized module for each processor in the system. The customized module is clearly more memory-efficient.
Real-time schedules contain the list of processes and their priorities. A unique schedule is generated for each processor and for each mode of operation.
Communication maps describe the information flow between processes. These "streams" can perform communication between two modules on the same processor, or they can transport data across the network, through intermediate processors, to a remote process anywhere in the system.
Interfaces between software modules, hardware modules, and data sources/sinks are automatically inserted during the synthesis process. These interfaces manage the hardware interfaces, converting complex communication protocols into simpler hardware-compatible protocols. The interfaces also multiplex multiple logical streams over a single physical port and perform data conversion functions. It is the responsibility of the synthesis process to ensure that adequate bandwidth exists on each port for the data flow through that port.
These design files are processed into a set of object modules and tables for inclusion in the configuration manager and for direct download into the array of parallel processors. The result of the synthesis and post-processing is a complete executable system, ready for deployment. The deployment is performed in concert with the Runtime Environment.

RUNTIME ENVIRONMENT FOR DYNAMIC RECONFIGURATION
The runtime environment must support implementation platforms with the following attributes:
Heterogeneity: Optimizing the architecture for performance, size, and power requires that the most appropriate implementation techniques be used. Implementations will require software (implemented on RISC and DSP processors), configurable hardware on FPGAs, and a mix of ASIC components.
Low Overhead/High Performance: The runtime environment must minimize overhead, since overhead results in extra hardware requirements.
Hard Real-Time: The target systems have significant real-time constraints.
Reconfiguration: The execution environment must allow hardware and software resources to be reallocated dynamically. During reconfiguration, the application data must remain consistent and real-time constraints must be satisfied.
These issues must be addressed at multiple levels. At the lowest level, the hardware must be capable of reconfiguration. Software-programmable components, such as DSPs and RISC processors, have excellent inherent hardware support for reconfiguration, since software can change system function by changing memory contents. Internal CPU hardware structures are designed to prevent dangerous conditions that could damage hardware. FPGAs, on the other hand, are an unrestricted collection of gates, switches, and connectors. The safeguards built into CPUs do not exist and must be enforced manually, through cooperation between the design process and the runtime infrastructure.
At a slightly higher level, the internal state of software must be managed under changing tasking. Modern operating systems have evolved to support the flexible implementation of multiple tasks, with dynamic addition and removal of tasks on a single processor in the form of time-sharing and/or multitasking, and real-time kernels allow time-critical tasks to be dynamically scheduled on a single processor. These kernels typically do not address the consistency of dynamic reconfiguration for distributed networks of tasks. Finally, application-specific requirements must be addressed, so that the particular numerical performance and timing requirements of an application can be achieved in an implementation. Potential solutions to these consistency issues are addressed in the next section.

Hardware/System Consistency During Reconfiguration
The runtime system must avoid operational defects during a reconfiguration event. Lack of hardware consistency can have many negative effects, ranging from a temporary loss of performance in an operational mode to hardware damage and total, permanent system malfunction. These defects typically involve interfaces between hardware processes and/or devices. Some of them are illustrated in Figure 5.
Port contention occurs when bi-directional ports are improperly initialized, when a reconfiguration event is not properly sequenced/synchronized, or when an improper/inconsistent design is implemented. In this case, two connected drivers are enabled at the same time. If the resistance between them is sufficiently low, permanent physical damage can occur to the circuits.
Token loss or duplication results from incorrect initialization or a loss of communication integrity. Tokens represent the status of empty or full slots in a communication interface. An extra token on the sender side can cause too much data to be sent, resulting in a FIFO overrun. A lost token can effectively block a communication port, resulting in a system deadlock.
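The two token failure modes can be reproduced with a small model of token (credit) flow control, in which each token held by the sender stands for one empty slot in the receiver's FIFO. This Python sketch is illustrative; `TokenLink` and its fields are hypothetical names, and a real interface would implement the same bookkeeping in hardware.

```python
from collections import deque
from typing import Deque


class TokenLink:
    """Sender/receiver pair using tokens to track free FIFO slots."""

    def __init__(self, depth: int, sender_tokens: int) -> None:
        self.depth = depth                # physical FIFO capacity at the receiver
        self.fifo: Deque[int] = deque()
        self.tokens = sender_tokens       # equals `depth` after correct initialization

    def try_send(self, word: int) -> bool:
        """Send one word if the sender holds a token; otherwise it must wait."""
        if self.tokens == 0:
            return False
        self.tokens -= 1
        if len(self.fifo) >= self.depth:
            # Only possible if the sender held more tokens than free slots.
            raise OverflowError("FIFO overrun: duplicated token")
        self.fifo.append(word)
        return True

    def receive(self) -> int:
        word = self.fifo.popleft()
        self.tokens += 1                  # slot freed: the token returns to the sender
        return word
```

With `TokenLink(depth=2, sender_tokens=3)` the third send raises the overrun; with `TokenLink(depth=1, sender_tokens=0)` the sender can never send, so a receiver waiting for data and a sender waiting for a token block each other indefinitely.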
Device state maintenance refers to the control of a complex external hardware device, such as an attached processor or storage device. In controlling an external device, the controlling computational component must maintain an accurate representation of the device's state. If a reconfiguration occurs during a state transition within the device, or if the reconfiguration modifies the computational component's representation of the device, there can be a state mismatch. This can result in improper commands being sent to the device, or in a deadlock where both components are waiting on each other for triggering events.
These three examples show some of the potential hazards that can occur when the hardware device is improperly reconfigured. Runtime reconfiguration support must not permit any of these conditions to occur.

Software/OS Consistency
Software issues can present a larger challenge to dynamic system reconfiguration. While the hardware built into standard microprocessor devices protects against low-level hardware conflicts, there are many more details that must be managed. Figure 6 summarizes some of the potential problems caused by an improper reconfiguration. The example shows an initial configuration of three processes (A, B, and C) in the normal operational state. A reconfiguration occurs, changing to a new configuration that replaces process A with A' and C with C', and removes process B altogether. The bottom half of the figure shows the new configuration, along with the potential errors. Memory leaks will adversely affect long-term reliability. Task structure mismanagement results in extra tasks executed by the kernel, with a loss in performance. Messages in transit can be delivered when the receiving process no longer exists, resulting in mismatched messages and communication errors.

Application-level Consistency
At a higher level, the application's requirements and implementation details impose restrictions on the reconfiguration process. Typically, these attributes are highly application-specific. Two examples of consistency requirements are displayed in Figure 7:
1. An external system may require signal output continuity and/or a continuous first derivative. In the example, which swaps filters online, the new filter is operating out of sync with the original filter. A rapid switchover will create a discontinuity in both the signal and its first derivative. In a closed-loop system, this might lead to strong transients in the controlled variable.
2. The system can fail to maintain real-time constraints during reconfiguration. If the reconfiguration cannot be completed in sufficient time, deadlines will be missed. In addition, the timebase can be shifted, resulting in a skew in the system output period.
FIGURE 7 Maintaining application consistency through reconfiguration (panels: Application Signal Continuity; Period Skew).

Runtime Reconfiguration Strategies
It is clear that reconfiguration support must be built into the design approach, from the lowest levels of the execution environment to the high-level design/requirements capture tools. The extent of support is defined by the requirements of the target systems. The driving factors include how fast the system must reconfigure, whether intermediate states must be preserved, and whether timing must be preserved. We now examine the potential reconfiguration strategies and their impact on system capabilities.

Reboot Strategy
The simplest reconfiguration strategy is termed the "Reboot" approach. It involves the orderly shutdown of tasks, bringing the system to a known, clean state. From this state, a new processing structure is constructed (Fig. 8). The implementation of this approach is simple: it requires the minimum amount of non-standard support from the execution environment, and there is no need for additional processing capability for overlapping modes.
The drawbacks of this approach are severe. The system is offline during the reconfiguration time. No events can be handled, so a system under control is open-loop during that time. There is no provision for preservation of state, which can lead to long recovery times when the new configuration is started. Both of these factors lead to application transients, in both timing and signal continuity. This approach is not suited to the majority of embedded, closed-loop systems.
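The cost of discarding state can be seen in a small numerical sketch. Below, each mode is modeled, purely for illustration, as a first-order low-pass filter. The "Reboot" strategy leaves the system with no output during the outage and then starts the mode-B filter from a zero state, regardless of where mode A had settled. The filter model, the `reboot` function, and the outage length are assumptions chosen for the example.

```python
from typing import List, Optional


class LowPass:
    """First-order IIR low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""

    def __init__(self, alpha: float, y0: float = 0.0) -> None:
        self.alpha = alpha
        self.y = y0

    def step(self, x: float) -> float:
        self.y += self.alpha * (x - self.y)
        return self.y


def reboot(new_alpha: float, x: float, outage: int) -> List[Optional[float]]:
    """S(A) -> S(NULL) -> S(B): `outage` samples with no output while the
    system is rebuilt, then mode B starts from a clean, zero state."""
    outputs: List[Optional[float]] = [None] * outage  # offline: the loop is open
    mode_b = LowPass(new_alpha)                        # nothing inherited from mode A
    outputs.append(mode_b.step(x))
    return outputs
```

With mode A settled at 1.0 on a constant input of 1.0, `reboot(0.5, 1.0, 2)` returns `[None, None, 0.5]`: two samples with no control output, followed by a step from 1.0 down to 0.5.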

State Transition Approach
The second approach allows the insertion of transitory states between the major system operating modes (Fig. 9). These states allow the system to take smaller steps between operational modes to approximate a continuous-time transition, resulting in smaller transients. The intermediate configurations inherit state from their predecessors. The intermediate algorithms must be designed to gradually shift system behavior. While not continuous, the steps can be made arbitrarily small. This approach has several positive aspects. The state preservation allows transients to be minimized. The magnitude of the steps can be chosen by the designer to limit the disturbance to key application behaviors. Few spare resources are needed, since the system is operating in only one mode at a time. The flexibility is limited only by the designers and by the time available for the transition.
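The trade-off between step size and transition length can be illustrated with a deliberately simple model in which modes A and B are pure gains `g_a` and `g_b`, and the designer inserts `n` intermediate configurations with evenly spaced gains. The linear spacing, the one-sample-per-configuration assumption, and the function names are choices made for this sketch; real intermediate algorithms would be designed per application.

```python
from typing import List, Tuple


def transition_gains(g_a: float, g_b: float, n_intermediate: int) -> List[float]:
    """Gains of S(A), the n intermediate configurations, and S(B), evenly spaced."""
    steps = n_intermediate + 1
    return [g_a + (g_b - g_a) * k / steps for k in range(steps + 1)]


def transition_profile(g_a: float, g_b: float, x: float,
                       n_intermediate: int) -> Tuple[float, int]:
    """Return (largest output step, number of configuration changes) for a
    constant input x, assuming each configuration emits one output sample."""
    outputs = [g * x for g in transition_gains(g_a, g_b, n_intermediate)]
    largest = max(abs(b - a) for a, b in zip(outputs, outputs[1:]))
    return largest, len(outputs) - 1
```

Going from zero to three intermediate configurations shrinks the largest output step from 1.0 to 0.25 (for `g_a = 1`, `g_b = 2`, `x = 1`) but stretches the transition from one configuration change to four, which is the price of smoother transitions.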
There are several difficulties in this approach. The execution infrastructure must support the rapid transition of processes and the transition of the states of the changing processes. The states must be mapped to the structures required by the next step and installed with the new processing structure. The computation of the mapping may be complex.
FIGURE 8 Reconfiguration strategies: "Reboot" approach, S(A) → S(NULL) → S(B).
FIGURE 9 Reconfiguration strategies: state transition approach, S(A) → S(A') → S(B') → S(B).
The design of intermediate states can be complex, depending on the application. These transitory states depend on the initial and final states, the algorithm characteristics, and the timing requirements. For smooth application transitions, many intermediate states may be required, leading to long transition times. (Note that the application system is still under control during the transition, although probably not with the optimal algorithm.)

Parallel State Transition Approach
An extension of the State Transition approach allows the system to execute several modes in parallel. This has the same benefits as the state transition approach, with the added benefit of being able to execute algorithms in an offline mode before they are used. The state of the offline process can be allowed to stabilize before it affects system performance. When transients have disappeared, the system can be transitioned to the new state (Fig. 10).
This approach has several benefits. The application-level transients can be minimized by proper design. The downtime is minimal, as is the time the system spends operating in a less-than-optimal configuration. Multiple states can be preserved, rather than forcing all information to be encoded in one format. This minimizes the impact of the design of one mode on another, thus simplifying design.
FIGURE 10 Reconfiguration strategies: "Parallel execution" approach.
There are also several drawbacks. The underlying runtime environment must support mechanisms for rapid stepping between processes, the ability to execute multiple threads simultaneously, and the combination of attributes from the parallel executing processes. System design is complicated by the need to design parallel structures. (In some cases, the parallel approach allows design separability, simplifying matters.) The necessary computational resources are increased, due to the need to execute multiple parallel processes.
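Continuing with an illustrative low-pass filter model (declared again so the block stands alone), the sketch below runs the mode-B filter offline on the same input as the online mode-A filter and hands over control only when B's output has settled. The settling test (sample-to-sample change below a tolerance), the function name, and the filter parameters are assumptions. The second filter instance running alongside the first is exactly the extra computational resource discussed above.

```python
from typing import List


class LowPass:
    """First-order IIR low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""

    def __init__(self, alpha: float, y0: float = 0.0) -> None:
        self.alpha = alpha
        self.y = y0

    def step(self, x: float) -> float:
        self.y += self.alpha * (x - self.y)
        return self.y


def parallel_execution(xs: List[float], mode_a: LowPass, mode_b: LowPass,
                       tol: float) -> List[float]:
    """Run mode B offline alongside the online mode A on the same input.
    B's output is delivered only once its sample-to-sample change falls
    below `tol` (our stand-in for 'transients have disappeared')."""
    delivered: List[float] = []
    switched = False
    prev_b = mode_b.y
    for x in xs:
        y_b = mode_b.step(x)           # B always runs, online or offline
        if not switched:
            y_a = mode_a.step(x)       # A keeps controlling the system
            switched = abs(y_b - prev_b) < tol
        delivered.append(y_b if switched else y_a)
        prev_b = y_b
    return delivered
```

With A settled at 1.0, B starting from zero, a constant input of 1.0, and `tol = 0.01`, the first six delivered samples come from A and the handover occurs at the seventh, where the delivered output changes by less than 0.01 rather than jumping.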

Execution Environment Design
The previous sections assembled a set of requirements for the execution environment. They also point out some of the design complexities. Working alone, the execution environment cannot solve these problems. The overall system design approach must coordinate from the top-level algorithm designers/system requirement and resource specifications down to the hardware/software implementations. The top-level design issues have been discussed in terms of a domain-specific modeling environment, where the environment is tuned to reconfigurable system design. The Execution Environment forms the infrastructure onto which these designs are projected.
The Execution Environment must be designed with an interface suitable for synthesis from a MIC-Generator approach. The concepts, properties, and interfaces of the runtime environment must be compatible with the design representation and synthesis approach. Capabilities and interfaces should be tuned to simplify the generator. This requirement demands a simple, uniform interface with a well-defined, consistent set of semantics that apply throughout the system. Since the system includes software, hardware, and interactions between parallel modules, a common structure must map to a wide range of components.
The execution environment concepts have been driven by results from using tools developed over the past several years. These tools are currently used to construct large-scale, parallel, real-time signal processing systems. The runtime environment enabled development of CADDMAS systems, which are used by the USAF for turbine engine testing and by NASA for SSME monitoring and analysis [9,19].
The semantics of the execution environment implement a large-grain dataflow architecture. The Worker Function captures the tasks that are performed by the system. Communication nodes capture the transfer of data between workers. Computations can be described as a bipartite graph, where workers connect to Comm nodes, and Comm nodes connect to workers. At this level, there are no implied semantics of the workers. The execution properties of workers (data tokens produced/consumed per execution, timing of execution, etc.) are maintained at a higher level. The semantics of the Comm units are asynchronous queues.
When the generic large-grain dataflow graphs are implemented, they must be mapped down to a physical implementation, which takes the form of either software or hardware. Software workers execute on a DSP or CPU, and we term them Processes. Hardware workers are implemented in reconfigurable hardware (FPGAs), as ASICs, or as combinations of both. Software Processes and Hardware Processors are logically equivalent, representing functions on data. Processes/Processors are connected via logical Comm objects that must buffer, communicate, and match data formats. In software implementations, the Comm object is a logical Stream implemented by the Kernel as a software queue in memory. In hardware, the Comm object is implemented with registers and/or FIFOs, or simply wires (Fig. 11).
The execution environment spans software and reconfigurable hardware. The software environment consists of a simple, portable real-time kernel with a run-time-configurable process schedule, communication schedule, and memory management [19]. Communications interfaces are supported within the kernel, making cross-processor connections invisible. Memory management is integrated with the scheduler and communication subsystems, providing a basis for addressing (but not solving) the problems associated with dynamic reconfiguration. The kernel allows dynamic editing of the process table and of the communications maps. The proper sequencing of these operations, including task execution phases, is necessary to avoid reconfiguration problems. The current approach supports the "Reboot" approach directly, and will support the more advanced reconfiguration approaches with the cooperation of the application tasks.
The hardware execution environment supports the same operational semantics. The implementation, however, is much different. The Virtual Hardware Kernel exists as a concept used in the system synthesis. The MIC Generator synthesizes a set of VHDL structural codes, one for each configurable device in each operational mode. Hardware Processors are directly synthesized using predefined components. Communications elements are selected from a library of interface types, based on the requirements of the workers on either end, the required performance, and the available resources. The communication infrastructure works in cooperation with the software communications, performing the signal buffering and providing the necessary off-chip interfaces and data converters. The interface components are drawn from a library of modules. The modules implement a limited set of standardized communications protocols to transfer data between modules, and present data in the format required by the destination processor. As the system is used for more applications, the set of interface types will grow in capability.
Inherent in these interface components must be the capability to reconfigure. This involves strict synchronization mechanisms, methods for saving and restoring states, and facilities to allow function and structure modification. Global synchronization is greatly aided by having a common system clock and facilities for very low-latency signaling within the system. Our current concepts for reconfiguration require a single interrupt signal to be present at each component participating in a reconfiguration.
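A minimal software model of these large-grain dataflow semantics is sketched below in Python. It represents a computation as a bipartite graph of `Worker` objects and `Comm` queues and fires any worker whose input queues all hold data. The class names echo the terminology above, but they are not the kernel's actual interface, and the one-token-per-firing rate and the simple scheduling loop are assumptions. The same graph could equally be mapped onto hardware Processors connected by FIFOs.

```python
from collections import deque
from typing import Callable, Deque, List


class Comm:
    """Comm node: an asynchronous FIFO queue between workers."""

    def __init__(self) -> None:
        self.queue: Deque[int] = deque()


class Worker:
    """Worker function: per firing, it consumes one token from each input Comm
    and produces one token on each output Comm (a fixed rate assumed here)."""

    def __init__(self, fn: Callable[..., int], inputs: List[Comm],
                 outputs: List[Comm]) -> None:
        self.fn = fn
        self.inputs = inputs
        self.outputs = outputs

    def ready(self) -> bool:
        # A worker without inputs would always be ready, so this sketch
        # models sources by pre-filling a Comm instead.
        return all(len(c.queue) > 0 for c in self.inputs)

    def fire(self) -> None:
        result = self.fn(*(c.queue.popleft() for c in self.inputs))
        for c in self.outputs:
            c.queue.append(result)


def run(workers: List[Worker]) -> None:
    """Data-driven execution: keep firing ready workers until none can fire."""
    progress = True
    while progress:
        progress = False
        for w in workers:
            if w.ready():
                w.fire()
                progress = True
```

For instance, a two-stage pipeline (multiply by 2, then add 1) fed with the tokens 1, 2, and 3 leaves 3, 5, and 7 in its output Comm.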

Reconfiguration Manager
The reconfigurable hardware interfaces and the flexible microkernel provide the facilities to implement system reconfiguration; however, control and synchronization remain critical problems. A global view of the system is necessary: the kernel, in isolation, cannot perform reconfiguration. Synchronization and control of a system during reconfiguration is the responsibility of the Configuration Manager (CM). The CM contains tables capturing the behavioral state machine defined by the designer's Behavioral Models. Tied to these state-based descriptions is the information necessary to configure the hardware and software components of the system.
Given this information, the CM serves as a system observer.The CM monitors relevant signals, as defined in the transitions leading out of the current state.When the logical conditions for a state transition are satisfied, the CM begins the structural transition process.
The first stage of the reconfiguration involves transitioning the system into a known, safe state. All communication interfaces must terminate. Since many of the data ports are bi-directional, the bus direction control token must be returned to the 'safe' state. Computations must be completed and transitioned into the 'safe' state. The safe state may involve using local algorithms to perform the basic required functions to keep the system stable.
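This first stage can be viewed as a barrier: the CM should toggle the global interrupt only after every participating component reports the safe state. The Python sketch below assumes, purely for illustration, that a component becomes safe once its pending transfers have drained and it has released its bi-directional port. The `Component` class, its fields, and the step-based timeout are hypothetical.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    RUNNING = "running"
    SAFE = "safe"


class Component:
    """Hypothetical CM-side view of one hardware or software component."""

    def __init__(self, name: str, in_flight: int) -> None:
        self.name = name
        self.in_flight = in_flight   # transfers that must complete first
        self.port_driving = True     # currently drives its bi-directional port
        self.phase = Phase.RUNNING

    def quiesce_step(self) -> None:
        if self.in_flight > 0:
            self.in_flight -= 1          # let one pending transfer complete
        else:
            self.port_driving = False    # return bus direction control to 'safe'
            self.phase = Phase.SAFE


def bring_to_safe_state(components: List[Component], max_steps: int) -> int:
    """Quiesce all components; return the number of steps taken. Only after
    this returns may the global reconfiguration interrupt be toggled."""
    steps = 0
    while not all(c.phase is Phase.SAFE for c in components):
        if steps == max_steps:
            raise TimeoutError("components did not reach the safe state in time")
        for c in components:
            if c.phase is not Phase.SAFE:
                c.quiesce_step()
        steps += 1
    return steps
```

Raising an error instead of proceeding reflects the requirement that reconfiguration never be enacted from an unknown state; what the CM does on timeout is left open here.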
After all necessary components are in the safe state, the global interrupt is toggled to initiate the reconfiguration event. At this point, all communications must stop for the short period required for reloading the FPGA bitfiles and the software schedules and communication mappings. Since the system was in a known, safe state prior to reconfiguration enactment, there is little overhead beyond the basic information download. The CM reloads the necessary FPGAs using the standard download methods. A sequence of commands is sent to each of the processors to enact the new processing graph and interface components. Once the new programming information is installed, the system interrupt signal is toggled to ensure a globally synchronized start-up operation.

APPLICATION EXAMPLE
The design environment has been used for several applications. Here, we describe an Automatic Target Recognition (ATR) application for missiles. The application is highly resource constrained, has hard real-time requirements, and must function in multiple operational modes.
The initial design process involves iteratively constructing models that capture system design information. The ATR application design first specifies system operational requirements in the form of Behavioral Models. Figures 12 and 13 show the top-level behavioral models for the missile. From the Initial transition, the system enters the INIT phase (upper left). The system then transitions into the 'Ready' state, where it waits for signals from the operator. The 'Seek Target' signal starts active system operation in the 'On Platform Target Seek' state, where the controller locates a target and then waits for a 'Launch' command before transitioning to the 'Tracking' state. A 'Launch' signal can also cause the system to transition directly from the 'Ready' state to the 'Tracking' state. Figure 13 shows the internal composition of the 'Tracking' state. The system enters 'Tracking' via the 'LOBL' (Lock-On-Before-Launch) or 'LOAL' (Lock-On-After-Launch) transition input. The LOAL input transitions directly into 'Acquire_LongRange' mode, where a many-target acquisition is performed and a target is selected. The system then enters long-range tracking, until either the track is lost (proceed to 'Acquire_LongRange') or proximity sensors signal the system to transition into the 'Tracking_MidRange' state. This process repeats itself for Mid-Range and Short-Range tracking.
In parallel with the definition of the behavioral requirements, signal-processing engineers can define algorithm structures in the Algorithm Models using a library of components. Hierarchy allows multiple designers to work at different levels in the design space. Figure 14 shows the top-level signal flow for the long-range target acquisition modes.
In parallel with the design of the Behavioral and Algorithm Models, hardware engineers capture the hardware architecture details in the Resource Models. If the system is to be constructed with flexible hardware modules, the specifics of these modules are captured and the final assembly can be left for future specification. Where the boards are hardwired, the complete topology is captured directly. Figure 16 shows the top level of the Resource Models. This figure shows the FPGA, its external memory, ...
Constraint specifications are developed to express complex relationships (see Figure 17).
The models are analyzed with the symbolic constraint manager to explore the design space. The initial design space in the ATR algorithm is 10^24. The constraints are iteratively applied to reduce the system to approximately 100-1000 potential configurations. Figure 18 shows the design space size at various stages of the iterative constraint application process.
From the remaining configurations, the designer selects one for implementation. The synthesis process produces an implementable hardware and software design. The VHDL designs are compiled using Synopsys for Xilinx and/or MAX+PLUS II for Altera. The software structures are processed via the native C compiler. Finally, the system is loaded and executed using the configuration manager. Figure 19 shows a testbench configuration with the ATR result image and target selection crosshairs displayed on a Windows-based user interface. Intermediate designs can be instrumented with graphical displays to view algorithm internal data structures.
This discussion shows one path through the design process. Typically, the process involves several iterations to optimize the algorithm performance, resource utilization, and system functional behavior.

CONCLUSIONS
The system described within this paper represents an ambitious set of goals for a design tool. The design environment is a comprehensive approach to building heterogeneous, real-time, resource-limited, dynamically adaptive systems. The Model-Integrated approach has been designed to support the many aspects and disciplines of embedded systems design. The flexible representation, analysis, and synthesis of systems will reduce design effort and increase system flexibility. The underlying Runtime Environment, through the abstraction of hardware and software details, presents a uniform architecture for system synthesis and application implementation.
This research adds to the state of the art by defining and implementing a cohesive design environment, where construction of dynamically adaptive systems with structural reconfiguration is a primary concern. The combination of behavior, algorithm, and resource models provides the breadth to represent the design of a flexible reconfigurable system. The ability to define a broad design space permits the synthesis tools to optimize target systems. The extensive set of constraints and model references gives the designer control over the synthesis process and allows user-guided pruning of large design spaces. The design space exploration tools use scalable OBDDs to permit management of large design spaces, allowing rapid iteration over a wide design space. The integration of performance simulation allows users to receive feedback directly, in terms of the native model-based design concepts. Finally, a unified hardware/software runtime environment allows the hardware and software to be treated as conceptual equivalents. Since the abstraction layers are handled at design time, the impact on runtime efficiency is minimized.
There are some limitations in the presented approach. A dataflow model was chosen for representation and execution in the runtime environment. While many data-intensive applications fit this paradigm, many control-oriented applications cannot be modeled or executed efficiently on a dataflow architecture. The OBDD approach to design space exploration typically scales gracefully for large problems; however, OBDDs are sensitive to variable ordering, and some orderings result in an exponential explosion in the number of OBDD nodes. While the end result of the constraint-based pruning is independent of the order of constraint application, the sizes and execution times of the intermediate steps are sensitive to the order in which the constraints are applied. Also, the current runtime environment supports partial reconfiguration only in software and simulates partial hardware reconfiguration with multiple FPGAs. Partial hardware reconfiguration in this design environment awaits devices and vendor tools that fully support these features.
The prototype tool set has been applied to several small-to-medium-sized design projects with significant success. While metrics have not yet been collected, experience indicates high designer efficiency. The tools are still research-quality, and several key components are in the process of design and implementation.
The design approach leads to flexible solutions. The implementation architecture is decoupled from the algorithm, and hardware is modeled as a set of generalized resources. These two factors combine to support device technology evolution, with changes required only to the resource models.
The high-level approach will produce greater design efficiencies and code/component reuse. Given an extensive set of component libraries, complex systems can be assembled rapidly. The component libraries can be extended and specialized, and design alternatives within these hierarchical Models will allow the efficiency of these high-level functions to be maintained near the level of a hand-coded implementation.
There are still many major research challenges to be addressed before achieving a fully functional, robust design tool. These issues are:
1. Optimization: The current approach involves defining a very large design space and using constraint methods to extract a set of potential design solutions. The process relies on the engineer to manipulate a complex, interrelated constraint network. This process should be assisted by the design environment. Simple tools are planned that show a sensitivity analysis of the design space versus user-defined constraints. This will help guide the designer to the appropriate constraints that maximize system performance. Taking this a step further, optimization procedures can be implemented to automate the manipulation of system parameters and constraints. In such a non-linear, discretized space, no guarantee of optimization convergence is possible.
2. Methods for assessing the transient upsets that can occur during a structural reconfiguration are needed. These transient assessment tools are needed for predicting both numerical results and real-time behavior during reconfiguration.
3. Procedures for rapidly incorporating vendor IP into libraries must be available to ensure that up-to-date components are available for the design. This also makes it easier to update the technologies in the target platform.
4. Significant effort is required to transition the tools from a research prototype to a commercial-quality, accepted design methodology and design environment.

FIGURE 4 Design space exploration tool.
assignments are specified by 'dragging' one model into another as a Reference.
Event variables can system