Taking Thermal Considerations into Account during High-Level Synthesis

Submicron feature sizes result in designs with significantly increased power density. This paper proposes high-level synthesis strategies to relieve potential thermal problems. Operators are placed as close as possible to their data predecessors to minimize the interconnection cost without violating the thermal constraints. Spreading overused functional units away from the thermal problem area often degrades performance. When system performance is important, we suggest introducing redundant operators to reduce the utilization of the problem modules, and hence their thermal problems. Our experimental results show that this technique produces quite satisfactory results for a power-dominated example.


INTRODUCTION
As integrated circuit feature sizes have decreased, designers have been forced to consider physical effects at higher levels of the design process. Reduced feature sizes increase power density, partly because of higher operating speeds and partly because of the increased component density on integrated circuits. However, the operating temperature of a chip is limited to a certain range for acceptable system reliability; for silicon devices, this limit lies in the range of 75 to 85 °C [1]. With increasing power density and with limits on the operating temperature, the thermal limitations of the chip must be considered during the design process. Consequently, it is imperative to study not only the thermal properties of the device and package materials but also the run-time thermal properties of the chip/die, so that a better thermal layout can be obtained during the design stage to alleviate potentially high thermal stress. While on-chip thermal effects might be relatively inconsequential for many single MOS chips, the current thermal problems associated with multichip modules indicate that thermal effects cannot be ignored during multichip synthesis.
For designing reliable electronic circuits, there are a number of handbooks, specifications, and guidelines that establish a common basis for comparing, evaluating, and predicting the reliability of related or competing designs. The Military Handbook MIL-HDBK-217E [2] was developed by the Department of Defense to unify reliability prediction methods for military integrated circuits. According to this handbook, surface temperature is one of the major factors affecting circuit reliability. Computer-aided design software should therefore predict and analyze the thermal properties of circuits while they are being designed, highlighting overused areas in order to prevent reliability failures. Engineers can then improve their designs so that the systems operate at lower temperatures and more reliably.

J.-P. WENG and A. C. PARKER
*Corresponding author. Tel.: (213) 740-4476. Fax: (213) 740-9803. E-mail: parker@eve.usc.edu.
Localized heat-concentration problems can appear when a high-level synthesis tool or designer takes a greedy approach to the allocation and binding subtasks of synthesis, overusing some central resources.
The scheduling procedure, the schedule and topology of the functional modules, and the selection of the package material all strongly affect the thermal behavior of the design. Since not all functional modules are active all the time, the utilizations of the functional modules, which result from scheduling, also affect the reliability of the design.
Thermal effects can thus be a limiting factor in the development of ASIC chips, and an accurate model of the thermal behavior of the die structure is necessary to make reliable designs. Thermal analysis methods fall into two general approaches: analytical solutions and numerical solutions. The analytical approach seeks an exact analytical solution for structures with regular geometries, but such a solution is not always obtainable for structures with complex geometries. A numerical method is therefore better suited to structures with irregular geometries. Numerical techniques such as the finite-difference [3], finite-element [4], and boundary-element [5] methods have been widely used to analyze the thermal profiles of electronic circuits.
The purpose of this work is to provide a novel method for improving overall thermal characteristics of circuits during the high-level synthesis process. We assume the amount of heat produced by a functional module is proportional to its work load. The basic idea in this work is to alleviate the heat-concentration phenomenon by averaging work loads between modules of the same functionality during scheduling and by balancing power dissipation during floorplanning. Averaging or spreading the power dissipation of functional modules allows the design to operate with a more balanced thermal profile.
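The work-load averaging idea can be sketched as a binding step. The hypothetical helpers below (their names and input format are ours, not part of 3D scheduling) assign the operations of one type in each time step to a pool of identical modules. Under the paragraph's assumption that heat is proportional to work load, the peak module utilization serves as a proxy for the peak module power. This is an illustrative greedy heuristic, not the authors' procedure.

```python
def least_loaded_binding(ops_per_step, n_modules):
    """Bind operations of one type so that work load is spread evenly.

    ops_per_step[t] is the number of operations of this type scheduled in
    time step t (hypothetical input format). Each module executes at most one
    operation per step. Returns per-module utilizations (busy steps / steps).
    """
    n_steps = len(ops_per_step)
    busy = [0] * n_modules
    for count in ops_per_step:
        if count > n_modules:
            raise ValueError("more operations than modules in one time step")
        # Pick the `count` modules with the lowest accumulated work load
        # (ties broken by module index).
        chosen = sorted(range(n_modules), key=lambda m: (busy[m], m))[:count]
        for m in chosen:
            busy[m] += 1
    return [b / n_steps for b in busy]


def first_free_binding(ops_per_step, n_modules):
    """Load-oblivious baseline: always reuse the lowest-index modules first."""
    n_steps = len(ops_per_step)
    busy = [0] * n_modules
    for count in ops_per_step:
        if count > n_modules:
            raise ValueError("more operations than modules in one time step")
        for m in range(count):
            busy[m] += 1
    return [b / n_steps for b in busy]


# Hypothetical schedule: 2, 1, 2, 1 operations in four steps, three modules.
schedule = [2, 1, 2, 1]
print(first_free_binding(schedule, 3))    # [1.0, 0.5, 0.0]
print(least_loaded_binding(schedule, 3))  # [0.5, 0.5, 0.5]
```

Both bindings execute the same six operations, but always choosing the least-loaded module lowers the peak utilization from 100% to 50%. The sketch ignores placement and interconnect, which 3D scheduling also has to weigh.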
In the following sections, we introduce our approach to the problem and present thermal models of the chip. An analytical solution is then derived, and a numerical finite-difference method is presented. An example is used to illustrate the thermal impact of floorplanning and scheduling. Finally, some remarks on the thermal analysis are given.

PROBLEM APPROACH
Our approach to this problem is to combine thermal analysis with an existing scheduling/floorplanning program called 3D scheduling, which constructs a floorplan simultaneously with scheduling, allocation, and binding. 3D scheduling can introduce additional ("redundant") operators to alleviate wiring delays, even if other operators are free during a given clock cycle. We perform thermal analysis on the floorplans produced by 3D scheduling and illustrate how the thermal profile can be improved either by moving operators away from the high-temperature area (possibly degrading performance) or by creating additional redundant operators to smooth out the thermal profile. Although the thermal analysis programs have been coded and tested, we manually modified the floorplans produced by 3D scheduling to illustrate our ideas.

THERMAL MODELS
To calculate the induced thermal stress of a design, the temperature profile must be known. In our analysis, heat conduction from the top of the die surface to the working fluid (usually air) is assumed to be negligible compared with the heat conducted laterally through the doped substrate to the printed circuit board. The thermal conductivities of the substrate and package material are assumed to be constant in the steady state, and the surface temperature is primarily a function of the functional module utilizations and the topology of the functional modules.
For a steady-state conductive cooling environment, the single-chip heat transfer environment can be modeled by a network of thermal resistances, as shown in Figure 1 [6]. In the thermal resistance model, temperature and heat flux are analogous to voltage and current in the analysis of an electronic circuit. The thermal resistances in Figure 1 … Note that Equation 1 only considers one-dimensional heat flow. Since the amount of heat produced inside a chip and the ambient temperature are known, the temperature on the surface of the package can then be derived from the network of thermal resistances.
In order to obtain the analytical solution of the temperature profile on the surface of the silicon substrate, we modified this thermal resistance model into a four-layer thermal model, as shown in Figure 2 [7].
The thermal characteristics are retained in this simplified model. For a common silicon device, the first (top) layer is usually silicon. The second layer is usually a bonding material, such as epoxy or solder. The third layer represents metallization on the substrate (usually gold). The fourth (bottom) layer, the package, uses a material such as alumina. Several assumptions have been made in this four-layer thermal model. The surface temperature at any position on the silicon depends on the heat dissipated by the functional modules, the thermal conductivities of the silicon and package materials, and the temperature of the working fluid (heat dissipated by convection and radiation is not considered in our current approach). The heat generated by each functional module on the substrate is assumed to be uniformly distributed. The heat dissipation of a functional module depends on the clock frequency and its utilization.
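As a rough illustration of the one-dimensional resistance view applied to the four layers, each layer of thickness t, conductivity k, and area A contributes a standard conduction resistance R = t/(kA), the layers add in series, and the top-surface temperature is T_top = T_bottom + P·ΣR, the thermal analogue of Ohm's law. This is a minimal sketch under assumptions: the class and function names are hypothetical, and the thicknesses and conductivities are rough placeholder values, not data from this paper. Lateral spreading is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    thickness_m: float   # layer thickness t, in metres
    conductivity: float  # thermal conductivity k, in W/(m*K)


def layer_resistance(layer, area_m2):
    """One-dimensional conduction resistance R = t / (k * A), in K/W."""
    return layer.thickness_m / (layer.conductivity * area_m2)


def top_surface_temperature(power_w, layers, area_m2, t_bottom_c):
    """Top-surface temperature of a stack of layers in series.

    All heat is assumed to flow straight down through the stack to a bottom
    surface held at t_bottom_c, so T_top = T_bottom + P * sum(R_layer).
    """
    r_total = sum(layer_resistance(layer, area_m2) for layer in layers)
    return t_bottom_c + power_w * r_total


# Hypothetical four-layer stack; thicknesses and conductivities are
# illustrative placeholders, not values from the paper.
STACK = [
    Layer("silicon", 0.5e-3, 150.0),
    Layer("solder bond", 25e-6, 50.0),
    Layer("gold metallization", 2e-6, 315.0),
    Layer("alumina package", 0.6e-3, 25.0),
]
AREA = 2e-3 * 2e-3  # hypothetical 2 mm x 2 mm die, in m^2

print(round(top_surface_temperature(0.5, STACK, AREA, 25.0), 2))  # 28.48
```

In this hypothetical stack the package dominates the total resistance, while the thin bond and metallization layers contribute little. A purely one-dimensional model cannot capture how heat spreads laterally from individual modules, which is what the three-dimensional formulation addresses.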
The average power dissipation of a functional module is used in the thermal analysis, since the thermal time constant is much greater than the period of the clock signal.
Note that the average power dissipation of a functional module is the product of the maximal power dissipated at the working clock frequency and its utilization. For example, the power dissipated by a typical adder circuit operating at a frequency of 10 MHz with V_DD = +5 V is about 2 mW. If the utilization of this adder in a given design is 80%, its average power dissipation is 1.6 mW.
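The relation above is a single multiplication. The hypothetical helpers below make the units explicit, reproduce the adder example, and sum the average powers of several modules, which is the quantity each module contributes to the heat-source map.

```python
def average_power(max_power_w, utilization):
    """Average power of a module: maximal power at the working clock
    frequency multiplied by the module's utilization (fraction of cycles busy)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie between 0 and 1")
    return max_power_w * utilization


def total_average_power(modules):
    """Sum of average powers over (max_power_w, utilization) pairs."""
    return sum(average_power(p, u) for p, u in modules)


# Adder example from the text: 2 mW at 10 MHz and 5 V, 80% utilization.
print(average_power(2e-3, 0.8))  # about 0.0016 W, i.e. 1.6 mW
```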

DERIVATION OF THE ANALYTICAL SOLUTION
Due to the small thickness of the bonding and metallization layers, only the first (silicon) layer and the fourth (package) layer are considered. Therefore, the four-layer thermal model shown in Figure 2 is further simplified to a two-layer thermal model, as shown in Figure 3 [8].

FIGURE 1 A typical single-chip heat transfer model.

FIGURE 2 A four-layer chip thermal model (heat sources on top of the silicon; package at the bottom).

In this model, the package size is assumed to be the same as the die size, and only the thermal properties of the silicon and package layers are considered. To derive the temperature distribution of the die structure in the steady state, the classic heat-flow governing equation must be solved:

$$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} = 0 \qquad (2)$$

where the thermal conductivity of each medium is assumed not to be a function of temperature. A three-dimensional solution is needed because heat transfer in the die is three-dimensional. The following boundary conditions are considered: the energy flux is continuous at the interfaces between layers; the bottom surface (the package surface) has an arbitrary but known temperature distribution $T_0(x,y)$; the lateral sides of the layer structure are adiabatic (there is no heat transfer between the analyzed structure and the working environment through them); and the energy flux dissipated on the top surface of the silicon chip is described by the function $P(x,y)$ [7].

Equation 2 can be solved by separation of variables or other equivalent techniques. The solution for the temperature $T(x,y)$ on the top silicon surface takes the form of an infinite double Fourier cosine series [8]. In the case of ASIC design, the power distribution $P(x,y)$ can be modeled by a group of uniformly distributed heat sources, one per module:

$$P(x,y) = \sum_{i \in M} \frac{Q_i}{A_i}\,\chi_i(x,y)$$

where $Q_i$ is the heat dissipated by module $i$, $A_i$ is the area of module $i$, $\chi_i(x,y)$ equals 1 inside the footprint of module $i$ and 0 elsewhere, and $M$ is the allocated module set. Since the solution has the form of an infinite series, a criterion is needed to truncate the summation at a given required precision. Through a closer inspection of the infinite double Fourier cosine series, a rule of thumb, truncating at $m = 6L_1/U$ and $n = 6L_2/V$, is used, which generally allows the temperature to converge to within … percent of the final value [7].
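The superposition of uniform module heat sources and its double cosine expansion can be made concrete. The sketch below computes only the cosine-series coefficients of P(x, y), not the full temperature solution, whose layer-dependent factors are given in [8] and are not reproduced here. The names Module, cosine_coefficients, and evaluate_series are hypothetical, the die and module sizes are illustrative, and the truncation limits m_max and n_max are left as free parameters (the rule of thumb in the text is one way to pick them).

```python
import math
from dataclasses import dataclass


@dataclass
class Module:
    x1: float   # footprint [x1, x2] x [y1, y2] on the die, in metres
    x2: float
    y1: float
    y2: float
    q_w: float  # average heat dissipated by the module, in W


def _cos_integral(k, a, b, length):
    """Integral of cos(k*pi*s/length) over s in [a, b]."""
    if k == 0:
        return b - a
    w = k * math.pi / length
    return (math.sin(w * b) - math.sin(w * a)) / w


def cosine_coefficients(modules, lx, ly, m_max, n_max):
    """Coefficients a[m][n] of P(x, y) = sum a[m][n] cos(m*pi*x/lx) cos(n*pi*y/ly).

    P(x, y) is the superposition of uniform heat-flux rectangles, one per
    module. Cosine modes have zero slope at x = 0, lx and y = 0, ly, matching
    adiabatic lateral sides.
    """
    coeffs = [[0.0] * (n_max + 1) for _ in range(m_max + 1)]
    for m in range(m_max + 1):
        for n in range(n_max + 1):
            # Normalisation per axis: 1/L for the zero mode, 2/L otherwise.
            norm = (1 if m == 0 else 2) * (1 if n == 0 else 2) / (lx * ly)
            total = 0.0
            for mod in modules:
                flux = mod.q_w / ((mod.x2 - mod.x1) * (mod.y2 - mod.y1))
                total += (flux * _cos_integral(m, mod.x1, mod.x2, lx)
                          * _cos_integral(n, mod.y1, mod.y2, ly))
            coeffs[m][n] = norm * total
    return coeffs


def evaluate_series(coeffs, x, y, lx, ly):
    """Evaluate the truncated double cosine series at (x, y)."""
    return sum(
        a * math.cos(m * math.pi * x / lx) * math.cos(n * math.pi * y / ly)
        for m, row in enumerate(coeffs)
        for n, a in enumerate(row)
    )


# Hypothetical 2 mm x 2 mm die with one 1 mm x 1 mm module dissipating 10 mW.
die = 2e-3
mods = [Module(0.0, 1e-3, 0.0, 1e-3, 0.01)]
c = cosine_coefficients(mods, die, die, 40, 40)
print(c[0][0])  # mean flux = total power / die area, about 2500 W/m^2
print(evaluate_series(c, 0.5e-3, 0.5e-3, die, die))  # close to 10000 W/m^2
```

The zero-order coefficient always equals the total power divided by the die area, which is a quick sanity check. Because each uniform rectangle is a step function, the series converges slowly near the modules (the pointwise error falls off only roughly as 1/m), which is why a truncation criterion is needed.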

DERIVATION OF THE NUMERICAL SOLUTION
The basic principle of the finite-difference approach is derived from the differential equations via Taylor expansion. Instead of using the finite-difference equations directly, these equations can be interpreted physically to permit a more convenient application. Consider the die structure shown in Figure 4. The die is divided into a number of cubes, and the cross-sectional area of each cube equals the area of a unit cell. We use a constructive approach to floorplanning in the preliminary 3D scheduling research, and we define a unit cell as a primitive block, so a larger functional module (such as a multiplier) is divided into several cells. In this case, the heat dissipated in the module is assumed to be distributed equally among all topmost cubes that cover the module. The heat dissipation of a cube is assumed to be a point heat source at the geometric center of the cube. A nodal network is thus obtained, representing the structure under a steady-state condition. Each node $i$, which is the geometric center of cube $i$, must satisfy the equation

$$\sum_{j \in B} \frac{T_j - T_i}{R_{ij}} + q_i = 0 \qquad (6)$$

where $B$ is the set of all neighboring nodes adjacent to node $i$, and $R_{ij}$ is the thermal resistance between node $i$ and node $j$, which equals $\delta_{ij}/(k_s A_i)$. Here $\delta_{ij}$ denotes the conduction distance between node $i$ and node $j$, $A_i$ is the cross-sectional area for heat conduction normal to $\delta_{ij}$, and $k_s$ is the thermal conductivity of the substrate. $q_i$ is the heat produced in the volume lump at $i$ (i.e., cube $i$), which is the average heat dissipated in the module divided by the number of topmost cubes covering the module. Note that the heat produced by a module is assumed to come only from the topmost layer of cubes, since the power is mainly consumed during signal switching by the semiconductor circuits implanted on the surface of the silicon substrate.
The formulation so far is restricted to interior points of the structure in which heat conduction takes place. In order to solve the equation set represented by Equation 6, imposed boundary conditions must also be satisfied. The boundary condition here is the equilibrium temperature on the bottom surface of the structure, which is the surface of the package [6]. This temperature is assumed to be uniformly distributed over the package surface due to the high thermal conductivity of the package material. By solving the set of n equations simultaneously, the temperature profile can be obtained, and the profile of the thermal stress can then be calculated.
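A minimal sketch of solving Equation 6 follows, under simplifying assumptions that are ours: a uniform grid of identical cubes of a single material (so every neighbor resistance is the same), an adiabatic top surface and adiabatic lateral sides, and bottom cubes connected over half a cube height to a package surface at one uniform temperature. Gauss-Seidel iteration is our choice of simultaneous solver; the text only states that the equations are solved simultaneously. The function name and parameters are hypothetical. A caller would set each top-cube heat q_i to the module's average power divided by the number of top cubes covering it.

```python
def solve_nodal_network(nx, ny, nz, h, k, q_top, t_bottom,
                        tol=1e-10, max_iter=100_000):
    """Gauss-Seidel solution of sum_j (T_j - T_i) / R_ij + q_i = 0.

    The die is an nx x ny x nz grid of identical cubes of edge h (z = 0 is
    the top layer) made of one material with conductivity k. q_top maps
    (ix, iy) of a top-layer cube to the heat q_i (W) injected there. The top
    surface and lateral sides are adiabatic; bottom cubes conduct to the
    package surface, held at the uniform temperature t_bottom.
    Returns {(ix, iy, iz): temperature}.
    """
    g = k * h             # 1/R_ij = k * A / delta = k * h**2 / h between neighbours
    g_bottom = 2 * k * h  # bottom cube centre to package surface: delta = h / 2
    temps = {(i, j, l): t_bottom
             for i in range(nx) for j in range(ny) for l in range(nz)}
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for _ in range(max_iter):
        max_change = 0.0
        for node in list(temps):
            i, j, l = node
            num = q_top.get((i, j), 0.0) if l == 0 else 0.0
            den = 0.0
            for di, dj, dl in steps:
                nb = (i + di, j + dj, l + dl)
                if nb in temps:  # cubes outside the grid contribute nothing
                    num += g * temps[nb]
                    den += g
            if l == nz - 1:      # bottom layer conducts into the package
                num += g_bottom * t_bottom
                den += g_bottom
            new = num / den      # energy balance solved for T_i
            max_change = max(max_change, abs(new - temps[node]))
            temps[node] = new
        if max_change < tol:
            return temps
    raise RuntimeError("Gauss-Seidel iteration did not converge")


# One column of three cubes with 10 mW injected at the top: in this 1-D case
# the exact answer is T_top = t_bottom + Q * (R_bottom + 2 * R).
column = solve_nodal_network(1, 1, 3, 1e-4, 150.0, {(0, 0): 0.01}, 25.0)
print(round(column[(0, 0, 0)], 4))  # 26.6667
```

Gauss-Seidel keeps the sketch dependency-free; for realistically sized grids a sparse direct or conjugate-gradient solver would be considerably faster.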

THERMAL EXPERIMENTS AND RESULTS
The thermal model presented in this paper was used to experiment with the ADAM high-level synthesis tools developed at USC. Figure 5 shows an example of a data flow graph used as input for high-level synthesis. Table I lists the module library set, which was derived from the Cascade Design Automation ChipCrafter silicon compiler. The data path synthesis program used to experiment with the thermal model is based on the preliminary 3D scheduling research, which incorporates interconnection delays during scheduling by using floorplanning. A two-time-step schedule for a non-pipelined FIR filter design is shown by the cross-hatched line in Figure 5. The 3D scheduling algorithm minimizes interconnection delays along critical paths to improve design performance. When the synthesized design cannot meet user-specified timing constraints, the software tries to introduce redundant operators (operators not required for the feasible design with minimum operators) to alleviate interconnection delays [9]. For example, the floorplan of the two-time-step non-pipelined FIR filter design of Figure 5 is shown in Figure 6. In this example, the feasible design with minimum operators contains 8 adders and 4 multipliers. The software found that interconnection delays along critical paths in this design are reduced by introducing one more adder, shown shaded in Figure 7.
We assume the size of the floorplan is 1.74 mm by … . In order to demonstrate the problem of heat concentration, adders are assumed to be "hot-problem devices" which produce three times the amount of heat that multipliers produce when both adders and multipliers are fully utilized. The thermogram of the design with minimum operators in Figure 8a shows a smaller "hot spot" than the thermogram of the design with a redundant adder in Figure 8b. The reason became obvious when the two floorplans were compared and analyzed. The design with a redundant adder does produce better system performance than the design with minimum operators. However, it uses the modules around the central area intensively to shorten wiring delays, which results in a large amount of heat being produced in the central area and causes heat dissipation problems.
To resolve the heat-concentration problem, two solutions are proposed. First, the heat-concentration problem can be alleviated by spreading the adders around the problem area over the unused space on the floorplan. We applied this strategy to the FIR filter example; the resulting floorplan and thermogram are shown in Figures 9 and 8c, respectively. The strategy was successful: the temperature dropped by about 20% (the percentage is the ratio of the temperature reduction to the overall temperature difference). This heat-balancing strategy resolved the heat-concentration problem at the cost of performance degradation.
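The parenthetical definition of the temperature drop can be written as a one-line formula. The sketch below assumes that the "overall temperature difference" is the peak temperature rise of the original design above a reference temperature, such as the package-surface temperature; this reading is our assumption, and the function name and example temperatures are hypothetical.

```python
def temperature_drop_percent(t_peak_before, t_peak_after, t_reference):
    """Reduction of the peak temperature as a percentage of the original
    peak rise above t_reference (assumed reading of 'overall temperature
    difference')."""
    rise = t_peak_before - t_reference
    if rise <= 0:
        raise ValueError("peak temperature must exceed the reference temperature")
    return 100.0 * (t_peak_before - t_peak_after) / rise


# Hypothetical thermogram peaks: 30.0 C before, 29.0 C after, 25.0 C package.
print(temperature_drop_percent(30.0, 29.0, 25.0))  # 20.0
```

Normalizing by the temperature rise rather than the absolute temperature makes the figure independent of the ambient or package temperature.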
In a design with critical timing constraints, the synthesis program or designer may not allow any degradation of system performance. In this case, we propose a second strategy: allocating additional redundant adders to reduce the utilization of the adders around the central area of the die or module. The resulting floorplan is shown in Figure 10. Three redundant adders were allocated to reduce the heat-concentration problem in this example, and the utilizations of some adders are reduced in this floorplan. The corresponding thermogram, shown in Figure 11, reveals that the thermal constraint and the performance constraint are met simultaneously. The thermal improvement is about the same as in the previous case, a temperature reduction of about 20% (in fact, the thermal profile of this floorplan is slightly better than the previous one). These two improved designs illustrate that tradeoffs are possible among area, performance, and reliability. We believe that such tradeoffs can be made much better if thermal effects are considered during the data path synthesis process, instead of treating synthesis and thermal analysis as two separate and/or sequential problems.

CONCLUSIONS AND FUTURE RESEARCH
The main objective of this work is to floorplan heat-balanced designs by avoiding overuse of the functional modules around the central area. We have demonstrated a localized heat-concentration problem that induces high thermal stress in the problem area and causes reliability problems. We have also shown that thermal problems may be successfully alleviated by rearranging the functional modules around the problem area or by introducing extra redundant operators. Both analytical and numerical solutions were investigated in our study; of the two, the analytical solution gives us more flexibility and efficiency in computing thermal profiles.
In future work, a more exact thermal model that allows more layers to be considered in computing thermal profiles will be studied. Computation time may be reduced by using an infinite plate model [10] when the dimension ratio of the heat sources to the die structure is large.