Performance and Area Optimization of VLSI Systems Using Genetic Algorithms

A new performance and area optimization algorithm for complex VLSI systems is presented. It is widely believed within the VLSI CAD community that the relationship between delay and silicon area of a VLSI chip is convex. This conclusion is based on a simplified linear RC model to predict gate delays. In the proposed optimization algorithm, a nonlinear, non-RC based transistor delay model was used which resulted in a non-convex relationship between the delay and the silicon area of a VLSI chip. Genetic algorithms are better suited for discrete, non-convex, non-linear optimization problems than traditional calculus-based algorithms. By using the genetic algorithms in the performance and area optimization, we are able to find the optimal values for both delay and silicon area for the ISCAS benchmark circuits.

1. INTRODUCTION
The techniques for performance and area optimization of VLSI systems can be divided into two categories. One is to change the circuit structure by re-synthesizing or re-timing the target system. The other is to change the transistor sizes of the circuit so that the driving and load conditions in the circuit are optimal. The latter approach does not involve any topological changes and is often referred to as "transistor sizing".
Transistor sizing finds an optimum set of transistor sizes in a circuit so that the circuit performance and/or circuit area are optimized. The size of a transistor includes two components: the transistor channel length, L, and the channel width, W. Because the transistor channel length is often fixed to its minimum value, varying transistor sizes to change the circuit performance is often accomplished by varying the transistor widths. The objective function used most often has been $f(A, T) = A \cdot T$, where A is the total active area and T is the longest delay of the circuit.
Most of the existing transistor sizing algorithms [1, 4, 8, 2] use linear RC delay models in the timing analysis. Fishburn and Dunlop [2] concluded that the relationship between delay and size of a VLSI system is convex. More recently, Sapatnekar [3] and Dunlop [9] formalized the assumption of convexity and further improved the performance of their transistor sizing algorithms. However, these algorithms based on RC delay models suffer from a number of drawbacks. Firstly, the RC-based delay models lead to inaccurate estimations of circuit delays. A deviation of 20% to 30% from SPICE simulation is expected when delay is predicted by an RC-based delay model [13].
Secondly, these algorithms are calculus-based algorithms which have difficulties in optimizing a discrete search space. Even though Shyu [1] made improvements in dealing with the non-differentiability by expanding the definition of "gradient" to cover both differentiable and non-differentiable points, the algorithm in [1] was reported to have convergence problems on complex circuits and difficulties in automatically finding an optimum parameter set. Finally, the convex relationship between the circuit size and the circuit delay may not hold if a more accurate, non-RC delay model is used, as our experiments have shown [20]. The delay calculation proposed by Hofman and Kim [12] is accomplished by a lookup table. The table-lookup method for delay calculation does not have the inaccuracy problems of RC models, but it often suffers from inflexibility in adapting to different technologies.
We used an analytic delay model [10] which is similar to the analytic delay model described by Weste and Eshraghian [11]. The main difference is that our delay model takes the input slew rate into account, resulting in a more complex but more accurate delay model. Compared to SPICE simulations, our analytic delay model has a delay prediction accuracy between 0.5% and 2% over a wide range of input slew rates, transistor sizes and output load capacitances.
The research presented in this paper is motivated by the fact that the relationship between the circuit delay and the circuit size may be non-convex if a more accurate delay model is used rather than the simple RC model. We present a VLSI performance and area optimization algorithm that finds a set of optimum transistor sizes for a given VLSI CMOS circuit. To effectively search for an optimal solution, a chain-like geno-structure, referred to as a chromosome, is used to emulate a circuit path structure. The size of a gate on the given path is represented by the encoding of each gene on the chromosome. The search for the global optimum is facilitated using the genetic algorithms [7]. Because the optimization does not take interconnects into consideration, it is mainly used for pre-layout optimization.
The experiments on the ISCAS benchmark circuits show that a substantial improvement can be achieved in circuit delay, and very often, in both circuit delay and circuit size.
The remaining part of this paper is organized in 4 sections. In Section 2, a new critical path selection strategy used in our timing optimization is introduced. The organization of the timing and area optimization algorithm based on the genetic algorithms is described in Section 3. Experimental results on the ISCAS benchmark circuits are discussed in Section 4. Finally, in Section 5, concluding remarks and the future work are presented.

2. CRITICAL PATH SELECTION
Extracting critical paths from a given circuit is the first step before timing optimization can be performed. In timing and area optimization, the choice of an optimum set of transistor sizes often cannot be determined solely from the one critical path concerned. The close interdependency between different paths through shared gates plays an important role in both timing and area optimization. We refer to the gates not shared by more than one path as "unique gates". By optimizing only the unique gates in a given critical path, the interdependency between critical paths is eliminated, thus simplifying the optimization process.
The selection of critical paths for timing and area optimization is based on the following rules:
1. Select critical paths according to their rankings in path delays, starting from the most critical path.
2. Select only the paths which have at least one unique gate.
Table 1 shows the number of unique gates for each of the 10 longest critical paths in the ISCAS benchmark circuit C432. The most critical path has 17 unique gates. The next critical path has only 3 unique gates. The number of unique gates in the remaining critical paths decreases as more critical paths are identified. This situation is very common among most of the ISCAS benchmark circuits.
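The two selection rules can be sketched as follows. This is a minimal illustration under one stated assumption: a gate counts as unique to a path if it does not appear on any more critical path already examined, which is one reading consistent with the most critical path having all of its gates counted as unique. The function name, the data layout (a path as a list of gate names), and the example values are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def select_critical_paths(
    paths: Sequence[Sequence[str]],
    delays: Sequence[float],
) -> List[Tuple[int, List[str]]]:
    """Return (path index, unique gates) for every path with a unique gate.

    Paths are visited from the largest delay to the smallest (rule 1).
    Assumption: a gate is unique to a path if no more critical path,
    i.e. no path visited earlier, contains it.
    """
    order = sorted(range(len(paths)), key=lambda i: delays[i], reverse=True)
    seen: Set[str] = set()
    selected: List[Tuple[int, List[str]]] = []
    for i in order:
        unique = [gate for gate in paths[i] if gate not in seen]
        if unique:  # rule 2: keep only paths with at least one unique gate
            selected.append((i, unique))
        seen.update(paths[i])
    return selected

# Hypothetical example: three paths that share gates "a" and "b".
example_paths = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"]]
example_delays = [9.0, 7.0, 5.0]
example_selection = select_critical_paths(example_paths, example_delays)
```

In this example the third path is dropped because every gate on it already belongs to a more critical path, so only the unique gates "c" and "d" beyond the first path remain to be sized.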
The advantages of critical path selection based on both the path length and the number of unique gates are twofold: the identification of unique gates among the critical paths allows us to dramatically reduce the search space for optimizing a given circuit, and also enables us to isolate different critical paths during optimization. Such a simplification, though it may not yield a globally optimized solution, will significantly reduce the complexity of the optimization problem.
In general, by ranking the critical paths, longer critical paths are optimized first, fixing the unique gates in the shorter critical paths, thus reducing the computational complexity of the shorter critical paths without affecting the longer critical paths.
* For c6288, 5000 paths are searched. For all the other circuits, 10000 paths are searched.
Table 2 shows the number of critical paths with at least one unique gate among 10,000 critical paths extracted for each of the ISCAS benchmark circuits.

3. A TRANSISTOR SIZING ALGORITHM BASED ON GENETIC ALGORITHMS
Genetic algorithms (GAs) are discrete, probabilistic optimization algorithms. Results have shown [7] that genetic algorithms can search for the global optimum in a non-convex, discrete search space. The genetic algorithms can operate on a diversity of coded strings, ranging from binary and integer strings to strings of alphabets. Therefore, the genetic algorithms have been proposed to generate solutions to a wide range of problems [14], [7]. In particular, several optimization problems have been investigated. These include control systems [15], function optimization [16], combinational problems [17], [18], test pattern generation [17], and VLSI floorplanning [19].
The problem of transistor sizing consists of two parts: timing analysis to extract critical paths from a circuit, and optimization of transistor sizes on the extracted critical paths. We map the transistor sizing problem to a problem for the genetic algorithms as follows:
1. An extracted critical path is treated as a chromosome. The length of the chromosome is determined by the number of genes on the chromosome.
2. A gate on a critical path is treated as a gene of the chromosome. A gene is encoded as an integer which represents the size of the corresponding gate on the critical path.
Assuming only CMOS gates are used and the risetime and falltime of the output are balanced, the size of a gate is represented by the effective transistor channel width of the n-tree in the gate. For instance, in a 2-input NAND gate, the effective transistor channel width of the n-tree is half of the n-type transistor channel width.
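As a small illustration of this encoding, the sketch below represents a path as a list of integer genes and computes the effective n-tree width for series-connected n-type transistors. The function name, the generalization from two to k series transistors (width divided by k, a first-order approximation), and the example gene values are assumptions made for illustration and are not taken from the implementation described in this paper.

```python
from typing import List

def effective_ntree_width(n_width: float, series_count: int) -> float:
    """Effective width of `series_count` equal n-type transistors in series.

    First-order assumption: series channel resistances add, so k equal
    transistors of width W behave like one transistor of width W / k.
    """
    if series_count < 1:
        raise ValueError("series_count must be at least 1")
    return n_width / series_count

# A chromosome: one integer gene (gate size) per gate, in path order.
# The values are hypothetical.
chromosome: List[int] = [4, 2, 6]

# 2-input NAND: two series n-type transistors of width 8 give an
# effective n-tree width of 4, half of the transistor width.
nand2_effective = effective_ntree_width(8.0, 2)
```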
The optimization process is divided into the following steps:
1. Extract the critical paths and rank them in the order of path delays.
2. Take one of the critical paths which does not satisfy the delay requirement and generate the initial chromosome population.
3. Calculate the load capacitance of every gate on the critical path according to gate sizes.
4. Use the analytic delay model [10] to calculate the gate delay and the fitness values of the population according to the fitness function (1), where A is the total active area and T is the path delay. The input slew rate of a gate is approximated by the risetime or falltime of the previous gate.
5. Calculate the average fitness value for the entire population.
6. If at least one of the chromosomes representing the corresponding path satisfies the timing and the area constraints, or the improvement in the average fitness value of the entire population is within a specified threshold, the optimization process is completed. Otherwise, take two chromosomes from the population according to a set of selection criteria related to their fitness values, and perform crossover or mutation operations to generate a pair of new chromosomes, i.e., a pair of new path configurations.
7. If the new paths are better in terms of their fitness values, they are kept in the population and two existing chromosomes are eliminated from the population according to a set of criteria. Otherwise, the new paths are eliminated.
8. Go to Step 3.
The above optimization algorithm is illustrated in Figure 1. The critical path of a given circuit is first extracted as shown in bold lines. In Figure 1, we assume that there are 17 gates on the critical path b-to-y. The size of each gate in the critical path is then initialized to a random size, and the size of each gate (an integer) is represented in a chromosome structure according to the order of the gate in the critical path. Many such chromosomes are generated in the same way to form an initial population. Genetic algorithms are then applied to the population to obtain the final optimized gate sizes, which are shown in Figure 1 as the optimized chromosome.
Several parameters determine the quality of the optimization using genetic algorithms. These parameters include the population size, Psize, the probability of using the crossover operator, Pxover, and the probability of using the mutation operator, Pmut. The population size affects the convergence behavior as well as the final results. Typically, the larger the population, the better the quality of the final results and the longer it takes to reach them. The probabilities of applying the crossover and mutation operators also have some impact on the quality of the final results. The optimal values of these probabilities must be determined experimentally.
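The optimization loop and its three parameters can be sketched as below. This is a minimal illustration, not the implementation described in this paper: `path_cost` is a toy stand-in for the analytic delay model [10] and assumes the product of area and delay as the objective (lower is better), parents are chosen by a two-way tournament, new chromosomes replace the current worst one, and the loop runs for a fixed number of iterations instead of testing the constraints or the average-fitness threshold. All names and default values are hypothetical.

```python
import random
from typing import List, Tuple

Chromosome = List[int]  # one integer gate size per gate on the path

def path_cost(sizes: Chromosome, out_load: float = 20.0) -> float:
    """Toy objective A * T (lower is better); NOT the delay model of [10]."""
    area = float(sum(sizes))
    delay = 0.0
    for i, size in enumerate(sizes):
        # Assumed load: the next gate's size, or a fixed load at the output.
        load = sizes[i + 1] if i + 1 < len(sizes) else out_load
        delay += 1.0 + load / size
    return area * delay

def crossover(a: Chromosome, b: Chromosome,
              rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Single-point crossover; returns two new chromosomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(c: Chromosome, max_size: int, rng: random.Random) -> Chromosome:
    """Return a copy of c with one gene reset to a random size."""
    child = list(c)
    child[rng.randrange(len(child))] = rng.randint(1, max_size)
    return child

def pick(pop: List[Chromosome], rng: random.Random) -> Chromosome:
    """Two-way tournament: the cheaper of two random chromosomes."""
    a, b = rng.sample(pop, 2)
    return a if path_cost(a) <= path_cost(b) else b

def optimize_path(n_gates: int, p_size: int = 100, p_xover: float = 0.8,
                  p_mut: float = 0.1, max_size: int = 32,
                  iterations: int = 2000, seed: int = 0) -> Chromosome:
    """Sketch of the loop; requires n_gates >= 2 and p_size >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(1, max_size) for _ in range(n_gates)]
           for _ in range(p_size)]
    for _ in range(iterations):
        a, b = pick(pop, rng), pick(pop, rng)
        if rng.random() < p_xover:
            a, b = crossover(a, b, rng)
        if rng.random() < p_mut:
            a, b = mutate(a, max_size, rng), mutate(b, max_size, rng)
        for child in (a, b):
            worst = max(range(len(pop)), key=lambda i: path_cost(pop[i]))
            # Keep a child only if it is new and beats the current worst.
            if child not in pop and path_cost(child) < path_cost(pop[worst]):
                pop[worst] = child
    return min(pop, key=path_cost)
```

Because a chromosome is only ever replaced by a cheaper one, the best cost in the population never gets worse from one iteration to the next, which makes the effect of Psize, Pxover and Pmut easy to compare across runs with the same seed.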
The transistor sizing algorithm based on the genetic algorithm was implemented in the C programming language with about 20,000 lines of code.

4. EXPERIMENTAL RESULTS
The transistor size optimizer was applied to the ISCAS85 benchmark circuits. A commercial 1.5 µm CMOS technology was used as the target technology. We compared the optimized results to two configurations. Under the first configuration, all the transistor sizes are set to the minimum size allowed by the technology. Under the second configuration, all the transistor sizes are set to the sizes of a commercial standard cell library, as if these benchmark circuits were implemented using the standard cell library. In most cases, our transistor size optimizer was able to find better solutions in terms of the objective function. In some cases, the circuit size and the circuit delay were both reduced compared to the two configurations. The parameters for the genetic algorithms chosen for the experiments were: The

No valid results because of memory allocation problem
The value of the size of a circuit is the sum of all the transistor widths in the circuit times the channel length of the transistors, so the unit for size is cm (unit for width)* channel length.
The rationale for choosing these parameters will be discussed later in this section.
In order to compare the optimized results with the two different configurations, we define the figure of merit (FM) as:

$$\mathrm{FM} = \frac{(\text{initial delay or size}) - (\text{optimized delay or size})}{\text{initial delay or size}} \qquad (2)$$

Table 3 shows the comparison between the optimized ISCAS85 benchmark circuits and the minimum size configuration. The results are largely as expected. We reduced the circuit delay at the cost of an increase in circuit size in 8 out of 10 cases. For C1355, our optimizer could not obtain any reduction in delay even when the circuit size was dramatically increased. Our optimizer ran out of memory during the optimization for C6288.
Table 4 shows the comparison between the optimized ISCAS85 benchmark circuits and the standard cell library implementation. In 7 out of 10 benchmark circuits, we achieved a reduction in both the delay and the size of the circuits. The reduction in circuit sizes ranges from 4.6% to 31.3%. At the same time, the reduction in circuit delays ranges from 2.4% to 22.5%. For C499 and C1355, the circuit sizes were reduced at the cost of a smaller increase in circuit delay. This evidence demonstrates that the relationship between circuit size and circuit delay may not be convex, as many others have originally believed.
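Reading Equation (2) as the relative reduction with respect to the initial value, the figure of merit can be computed as in the short sketch below. The function name and the numbers are hypothetical and are not results from Table 3 or Table 4; a positive value means the optimized delay or size is smaller than the initial one, and a negative value means it has grown.

```python
def figure_of_merit(initial: float, optimized: float) -> float:
    """Relative reduction: (initial - optimized) / initial.

    Assumes the reading of Equation (2) stated above. The same function
    applies to delay and to size; `initial` must be nonzero.
    """
    if initial == 0:
        raise ValueError("initial value must be nonzero")
    return (initial - optimized) / initial

# Hypothetical values: a delay cut from 100 to 80 units is a 20% reduction,
# while a size growing from 50 to 60 units gives a negative figure of merit.
delay_fm = figure_of_merit(100.0, 80.0)
size_fm = figure_of_merit(50.0, 60.0)
```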
In all circuits, 10000 paths were searched. The population size in the GA was 100.
During the optimization, we also kept track of the number of unique gates which increased or decreased in size as compared to the initial configuration. They are shown in the column "% of gates" with "size increase" and "size decrease" in Table 4. This is an indication of whether the genetic algorithm has successfully identified a small number of critical gates to improve the circuit delay. In most cases, the number of gates that increase in size is smaller than the number of gates that decrease in size.
We also experimented with various parameters for the genetic algorithm, such as the population size and the probabilities of performing crossover and mutation.
Figure 2 shows the objective values, $A \cdot T$, of a 17-gate path in C432 during the optimization, in which the genetic algorithm updates the generation of the population as new and better chromosomes are generated. The population size varied from 10 to 10,000 chromosomes. The objective values obtained with larger population sizes are better than those obtained with smaller population sizes. However, a larger population size requires longer computation time to update each population. Our experiments show that the results with population sizes of 100, 1000 and 10,000 are similar in terms of their objective values after a certain number of generations. This phenomenon is very typical in other benchmark circuits. Figure 3 shows the same relationship for C7552 with a path length of 43 gates.
The probabilities of performing crossover and mutation are less well behaved than the population size in terms of finding the best value. Choosing the best value for a given circuit requires a certain amount of experimentation. However, varying these two parameters does not have a significant impact on the final results as long as they are within a certain range. Figures 4 and 5 show the relationship between the objective values and the population generated by the genetic algorithm under various probabilities of performing mutation, Pmut. As these figures indicate, no definite conclusion can be drawn from the experiments. Our choice of Pmut is 0.1.
Although the delay model in [10] has taken the load capacitance into consideration, the load capacitance contributed by on-chip interconnects was not included in the experiments presented in this paper due to lack of layout information. The load capacitance contributed by the fanout gate capacitance was, however, included in our experiments.
This is the first time, to the best of our knowledge, that the entire set of ISCAS85 benchmark circuits has been used for evaluation. All previous publications on transistor sizing have given experimental results based on several non-public domain circuits to which we do not have access. Therefore, direct comparison of our results and previously published results is not possible.

5. CONCLUSION
By using a non-RC-based delay model, we have shown that the relationship between the delay and the size of a given circuit may not be convex as many others originally believed. Therefore, we generalized the transistor sizing problem as a non-convex optimization problem and used genetic algorithms to search for the global optimum for both the delay and the size of a circuit. The results show that the genetic algorithms have several advantages over traditional calculus-based algorithms, such as the ability to search for the global optimum in a discrete, non-convex search space. By using the proper path selection method, the search space can be greatly reduced to make it feasible to optimize a large complex VLSI chip. We are especially encouraged by the results that show the opportunities to reduce both circuit delay and circuit size for a large number of VLSI circuits if the transistor sizes are chosen properly.
One problem with our current experiments is that they fail to consider the impact of on-chip interconnects. As IC technology advances to deep-submicron feature sizes, the impact of on-chip interconnects on the overall chip optimization will greatly increase. Our future work will include incorporating layout information about on-chip interconnects by back-annotating such information to the gate level.
In addition, area estimation will include both the active area in terms of transistor sizes and the routing area.
FIGURE A simplified flow chart illustrating the optimization process.

FIGURE 5 The effect of Pmut on the objective values of a 43-gate path in C7552.

TABLE 2
The number of critical paths with at least one unique gate in the ISCAS85 benchmark circuits

TABLE 3
Results of the optimized ISCAS85 benchmark circuits with the minimum initial size for all the transistors.

TABLE 4
Results of the optimized ISCAS85 benchmark circuits with the initial implementation using a commercial standard cell library.