Simple Exact Algorithm for Transistor Sizing of Low-Power High-Speed Arithmetic Circuits

A new transistor sizing algorithm, SEA (Simple Exact Algorithm), for optimizing low-power and high-speed arithmetic integrated circuits is proposed. Compared with other transistor sizing algorithms, its advantages are simplicity, accuracy, independence from the order and initial values of the transistor sizing factors, and flexibility in choosing the optimization parameter, such as power consumption, delay, Power-Delay Product (PDP), chip area, or a combination of them. More exhaustive rules for grouping transistors are the main trait of our algorithm. Hence, the SEA algorithm improves on major transistor sizing metrics such as optimization rate, simulation speed, and reliability. In an approximate comparison of the SEA algorithm with MDE and ADC for a number of conventional full adder circuits, delay and PDP improve by 55.01% and 57.92% on average, respectively. Compared with Chang's algorithm, SEA achieves a 25.64% improvement in PDP and a 33.16% improvement in delay. All simulations have been performed in a 0.13 µm technology based on the BSIM3v3 model using the HSpice simulator.


Introduction
Optimization of VLSI circuits relies heavily on efficient implementation of arithmetic operations considering signal delay, power consumption, and chip area. Such important design considerations and trade-offs lead to a general approach towards transistor sizing that will prove to be extremely useful. In fact, transistor sizing, that is, the operation of enlarging or reducing the channel width of transistors, is a powerful and effective performance optimization tool in the hands of the designer.
Although VLSI circuits can be optimized in a number of ways, such as circuit style selection, structural optimizations, and transistor sizing [1][2][3][4][5][6][7][8][9], various reasons exist as to why transistor sizing is an important issue. First of all, the power dissipation is a strong function of transistor sizing which affects physical capacitance. Sources of power consumption such as glitches and short-circuit currents can be minimized by careful circuit design and transistor sizing. Second, transistor sizing affects not only the resistance of devices and time constant but also propagation delay of the gate due to the parasitic capacitors. Third, careful transistor sizing is necessary to maintain sufficient noise margins. Even more important is the observation that transistor sizing becomes critical in ensuring proper functionality of a circuit. For all the above reasons, transistor sizing is an essential means of implementing high-performance circuits [10].
Furthermore, with the continued scaling of technology and reduced transistor sizes, the behavior and performance of a circuit cannot be properly investigated without transistor sizing. Simulation results show that a small change in the transistor size for a given technology leads to a remarkable change in the characteristics of a circuit. Therefore, an appropriate transistor sizing method must be applied to a circuit before its parameters are measured.

VLSI Design
One practical formulation that recognizes a designer's objective to achieve the best performance at a given time period can be stated as: minimize Power(w) subject to Delay(w) ≤ Tspec, Area(w) ≤ Aspec, and w_i ≥ Minsize for i = 1, ..., n, where Power, Delay, and Area are functions of the transistor sizes, w ∈ R^n, and n represents the number of transistors in a circuit. Tspec and Aspec are, respectively, the constraints on the circuit delay and area, and Minsize is the minimum transistor size allowed by the technology [11]. Note that the optimization problem specified in this formula must be solved for one subcircuit at a time, and although the entire circuit may have a million or more transistors, the number of variables in the sizing problem will be reasonably small.
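The constrained formulation above can be illustrated on a toy subcircuit. The following is a minimal sketch, not the method of [11]: the `power`, `delay`, and `area` models, the constant `MINSIZE`, and the function `size_subcircuit` are hypothetical stand-ins chosen only to make the constraints concrete, and the search is a plain exhaustive grid search that is practical only because the subcircuit has few variables.

```python
from itertools import product

MINSIZE = 1.0  # hypothetical minimum width, arbitrary units


def power(w):
    # Toy model (assumption): switched capacitance grows with total width.
    return sum(w)


def delay(w):
    # Toy model (assumption): each stage's resistance falls as 1/width.
    return sum(1.0 / wi for wi in w)


def area(w):
    # Toy model (assumption): area is proportional to total width.
    return sum(w)


def size_subcircuit(n, t_spec, a_spec, candidates):
    """Return the lowest-power width vector on the grid that satisfies
    Delay(w) <= t_spec, Area(w) <= a_spec and w_i >= MINSIZE, or None."""
    best = None
    for w in product(candidates, repeat=n):
        if min(w) < MINSIZE or delay(w) > t_spec or area(w) > a_spec:
            continue  # infeasible point
        if best is None or power(w) < power(best):
            best = w
    return best


widths = size_subcircuit(n=2, t_spec=1.0, a_spec=10.0,
                         candidates=[1.0, 2.0, 3.0, 4.0])
print(widths)
```

With these toy models, the delay bound forces both widths above the minimum size, and the power objective then selects the smallest feasible widths.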
Transistor sizing is somewhat complicated, especially within complex circuits. Sizing a transistor to speed up one signal path may slow down another due to the capacitive loading effect of path interactions. This technique should therefore be used with caution. The algorithms presented in [12,13] illustrate a linear method to make a trade-off between power, area, and delay in CMOS circuits. In [14], the relationship between transistor sizes and total circuit delay has been considered as nonlinear. It has been shown by Fishburn and Dunlop [14] that the transistor sizing problem is convex under the simple lumped RC model.
Hence, a transistor sizing tool to meet our requirements in the performance optimization of VLSI circuits is crucial. An appropriate algorithm is the one that is simple to analyze and implement, decreases the execution time, increases the optimization rate of the goal parameter, and offers flexibility in choosing different optimization parameters.
The choice of the optimization factor depends on the designer's needs. At present, in many studies, the Power-Delay Product (PDP), which is a metric for the energy consumption of a circuit, is vitally important, and the transistors are sized to meet the minimum PDP [1,2,6,15,16].
Several approaches have been applied to transistor sizing. The first is mathematical optimization [17][18][19][20][21], in which the transistor sizing problem is formulated as a constrained nonlinear mathematical program of the optimization factors. The MDE (Minimum Delay Estimation) and ADC (Area-Delay Curve) algorithms [22] belong to this group. The second is the heuristic approach, which was proposed for the first time in TILOS [14]. In this method, the transistor sizes are changed iteratively until the optimum is reached. The third approach combines these two methodologies: a two-stage approach that combines the advantages of the heuristic and the mathematical programming techniques has been proposed. After a heuristic method performs an initial sizing, a timing analyzer and a mathematical optimizer are used to optimize the design. In this paper, after reviewing the pros and cons of various transistor sizing approaches, we suggest a systematic and effective algorithm to size the transistors of various full adder cells for minimal energy consumption. The objective of our work is to explore the improvement in the performance of different full adders that can be obtained using various transistor sizing algorithms.
The rest of this paper is organized as follows. Section 2 reviews the transistor sizing algorithms based on logical effort. In Section 3, heuristic algorithms are analyzed and surveyed, where we begin with the attributes of Chang's algorithm as it is closest to our proposed algorithm. Then, after exploring the rules of grouping transistors for sizing operation, we propose our new algorithm. In Section 4, the circuits are simulated and the results are analyzed and compared. Finally, the conclusion is presented in Section 5.

Transistor Sizing Based on Logical Effort
One of the transistor sizing approaches is to use mathematical techniques. Logical effort is a technique that solves the transistor sizing problem in this way. The two algorithms discussed here, MDE and ADC, are based on logical effort. The work in [22] shows how these algorithms can size the transistors without running a heuristic sizing tool by calculating the minimum achievable delay and the cost of achieving a target delay.
This approach estimates the size of transistors without incurring the overhead of running a sizing tool. Different implementations are evaluated based on two metrics. First, the problem of estimating the minimum delay is considered. This metric allows a designer to determine whether an implementation can meet a given delay specification. The delay of a circuit is the maximum delay over all input-to-output paths of the circuit. In order to meet design goals, transistor sizing is applied to the circuit to reduce this delay. The smallest delay value that can be obtained in this way is referred to as the minimum achievable delay. Because of the associated high area overhead, circuits are rarely sized to meet this minimum delay value, except along their critical paths. In addition, the minimum achievable delay, together with the unsized circuit delay, helps to determine the range of delay values over which an implementation can be used. In short, these two algorithms present a technique that estimates the minimum achievable delay of a circuit and then traces the area-delay curve.

MDE Algorithm.
Briefly, the MDE algorithm estimates the minimum achievable delay of a specific implementation. The most significant point in this algorithm is that the minimum achievable delay is computed for all fan-out paths of a transistor. The MDE algorithm obtains the Delay-C_in curve (C_in is the input capacitance of a circuit) based on the formulas in [22].
The characteristics of the MDE algorithm, beginning with its advantages, are as follows [22].
(1) This algorithm is very adaptable for tree structures as multiple paths can be processed at the same time.
(2) This approach is utilized in gaining the actual size of a transistor.
(3) It is faster than the transistor sizing tools which work heuristically. For instance, MDE has less execution time in comparison with TILOS in [14].
(4) The circuit can be sized at a minimum delay.
(5) Assume that there are K sizes for each gate in a circuit with N gates and that the maximum fan-out of any gate is |FO|. According to the pseudocode in [21], the innermost for loop is executed O(K · |FO|) times and the cost of determining the maximum delay point is O(|FO|). The second for loop is executed K times. Finally, since there are N gates in a circuit, the outermost for loop is executed N times. Thus, the running time of the MDE algorithm is O(N · K^2 · |FO|^2).
(6) On the other hand, because the circuit is sized for the minimum achievable delay, the total size of the circuit increases. Hence, the algorithm is practical only for optimizing critical paths.
(7) As a consequence of item (6), there is no desirable trade-off between chip area, delay, and power consumption; thus PDP is not optimum.
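The running-time bound in item (5) can be checked by counting elementary operations in a loop nest with the same shape. This is only a counting model built from the description above, not the pseudocode of MDE itself; the function name `mde_operation_count` and its arguments are hypothetical.

```python
def mde_operation_count(n_gates, k_sizes, max_fanout):
    """Count unit operations in a loop nest shaped like the one described
    for MDE: N gates, K sizes, an inner loop of K*|FO| steps, each of
    which pays |FO| to locate the maximum delay point."""
    ops = 0
    for _gate in range(n_gates):                    # outermost loop: N gates
        for _size in range(k_sizes):                # second loop: K sizes
            for _ in range(k_sizes * max_fanout):   # innermost loop: K*|FO|
                ops += max_fanout                   # max-delay point: |FO|
    return ops


print(mde_operation_count(3, 4, 2))
```

The count equals N · K^2 · |FO|^2 exactly for this model, which matches the stated bound.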

ADC Algorithm.
Not all circuits need to be sized to operate at the minimum achievable delay. For these circuits, a target delay is known and multiple implementations of the circuit are available. Therefore, ADC algorithm trades area for delay and also achieves a reduction in power. The cost (in terms of area) for achieving the given delay is determined, which results in estimating the entire area-delay curve of each implementation. The characteristics of ADC algorithm are as follows [21].
(1) It is in the class of mathematical algorithms.
(2) ADC utilizes the Delay-C_in curve for sizing, which is the output of the MDE algorithm.
(3) The sized circuits trade a larger delay for a smaller area.
(4) It is not an accurate solution but can estimate the suboptimal area value.
According to the pseudocode in [22], the execution time of ADC is O(N), where N is the transistor count.
As we see, the MDE and ADC algorithms are complementary. The delay parameters are generated by the MDE algorithm, and the ADC algorithm then uses them for calculating the PDP parameter.

Heuristic Transistor Sizing Algorithms
3.1. Chang's Algorithm. By optimizing the transistor sizes of the circuits, it is possible to reduce the delay without significantly increasing the power consumption and transistor sizes can be set to achieve minimum PDP [1]. To provide a fair and insightful evaluation of circuits, a systematic and effective way of sizing the transistors for optimal performance is necessary. To provide a good trade-off between the conflicting sizing requirements for power and delay performances, the goal of optimization is to minimize the power-delay product, that is, the energy consumption. Chang's algorithm is an appropriate algorithm for sizing the transistors of a circuit and it is suitable for optimizing PDP [2]. The characteristics and functionality of this convergent algorithm are similar to our proposed algorithm.
Initially in Chang's algorithm, the sizes of the transistors in the circuit are set to reasonable values. The scaling operations are carried out transistor by transistor over several iterations. In [2], w_j(T_i) is the width of the ith transistor at step j and Θ_k is the PDP of the circuit at the kth iteration. In every optimization iteration, one transistor at a time is tuned for minimal PDP in 2 × m steps with a step resolution of ±ψ. The optimization stops when the performance difference between two successive iterations is smaller than a given error ε. More than one iteration may be necessary because each time a new transistor is sized in the current run, the other transistors sized in the previous run may no longer maintain their optimality [2].
In order to obtain enough coverage so that the optimal or quasi-optimal sizing falls in the search region, the step resolution, ψ, is made variable. A large step size is used in the first few iterations. Two optimization strategies are adopted in this transistor sizing procedure to accelerate the process.
(1) The corresponding pMOS and nMOS in a complementary pair are optimized in successive runs because the output transitions of the node driven by one transistor are often influenced mostly by the driving capability of its complementary counterpart.
(2) Series transistors or parallel transistors of the same type that source current to or sink current from the same node have equal size and can be optimized simultaneously [2].
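The iteration described above can be sketched as follows. This is a simplified illustration, not the implementation of [2]: `toy_pdp` is a hypothetical stand-in for a circuit simulation, the two-level step schedule (`psi_coarse`, `psi_fine`, `coarse_iters`) is an assumed way of making ψ variable, and the two acceleration strategies (complementary pairs and equal-size groups) are omitted.

```python
def toy_pdp(widths):
    """Hypothetical stand-in for a simulator run; not a real PDP model."""
    targets = [0.5, 1.2, 0.8]
    return 1.0 + sum((w - t) ** 2 for w, t in zip(widths, targets))


def chang_style_sizing(widths, pdp, m=5, psi_coarse=0.4, psi_fine=0.1,
                       coarse_iters=2, eps=1e-6, w_min=0.13, max_iters=100):
    """Tune one transistor at a time over 2*m steps of size +/-psi and
    stop when the PDP changes by less than eps between iterations."""
    widths = list(widths)
    prev = pdp(widths)
    for k in range(max_iters):
        # Large step in the first few iterations, fine step afterwards.
        psi = psi_coarse if k < coarse_iters else psi_fine
        for i in range(len(widths)):
            centre = widths[i]
            best_w, best_val = centre, pdp(widths)
            for step in range(-m, m + 1):
                cand = centre + step * psi
                if step == 0 or cand < w_min:
                    continue  # skip the current point and illegal widths
                widths[i] = cand
                val = pdp(widths)
                if val < best_val:
                    best_w, best_val = cand, val
            widths[i] = best_w  # keep the best width found for transistor i
        cur = pdp(widths)
        # Convergence is only tested once the fine step is in use.
        if k >= coarse_iters and abs(prev - cur) < eps:
            break
        prev = cur
    return widths, cur


final_widths, final_pdp = chang_style_sizing([1.0, 1.0, 1.0], toy_pdp)
print(final_widths, final_pdp)
```

In a real flow, each call to `pdp` would correspond to one simulator run, so the number of evaluations per iteration (roughly 2 × m per transistor) dominates the cost.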

Transistor Sizing in Three-Dimensional Space.
Transistor sizing is not a linear problem, because modifying the size of one transistor influences the performance of the whole circuit. In this space, we encounter different sizes and different parameters. Therefore, we must focus on an n-dimensional space in which, in general, one parameter is optimized; that is, n − 1 dimensions correspond to the K_i values. In this paper, K_i is the coefficient assigned to the ith group of transistors in a circuit. The initial value of each K_i is the initial coefficient of the channel width of the transistors in that group. In other words, the K_i values are the sizing factors of the transistors, and the nth dimension is the actual target that we desire to optimize. This target can be any of the circuit specifications, including power consumption, delay, power-delay product, chip area, or a combination of them with given weights. If transistor sizing is used to optimize only one target parameter, the other parameters of the circuit are neglected, which may lead to weak designs. For this reason, we can form a new parameter as a product of two or more target parameters with determined weights. Transistor sizing is then performed for this new parameter [6]. As an example, K1 to K4 are the sizing coefficients of the transistor groups shaded in gray in the XOR/XNOR circuit of Figure 1. These coefficients can optimize a parameter such as the power-delay product. Therefore, transistor sizing must be performed in a five-dimensional space. Hence, in an n-dimensional space, the K_i values are sequentially modified until the target parameter arrives at the best possible position. In fact, changing the size of a transistor in an n-dimensional space can modify the circuit specifications. With this in mind, changing the size of transistor T_i in an n-dimensional space initiates the process of transistor sizing.
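A combined target parameter of this kind can be sketched as a weighted product. The text does not specify how the weights enter, so the use of exponents below is an assumption, and the function name `target_parameter` and its arguments are hypothetical.

```python
def target_parameter(power, delay, area=1.0,
                     w_power=1.0, w_delay=1.0, w_area=0.0):
    """Weighted product of circuit metrics (assumption: weights act as
    exponents). The defaults reduce to the plain power-delay product."""
    return (power ** w_power) * (delay ** w_delay) * (area ** w_area)


pdp = target_parameter(2.0, 3.0)                      # power * delay
delay_heavy = target_parameter(2.0, 3.0, w_delay=2.0) # power * delay^2
print(pdp, delay_heavy)
```

Setting a weight to zero removes that metric from the target, so the same function covers single-parameter and combined optimization.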
There are many directions that transistor sizing can take in an n-dimensional space. Unfortunately, we can change only one K_i at a time. Different sequences of operations can lead to multiple peaks, each with its own advantages.
Consequently, we are seeking an appropriate algorithm for transistor sizing that has the following features. Firstly, the highest peak is to be obtained, indicating that the target parameter has reached its optimum. Secondly, this algorithm must be as simple as possible, and the number of steps needed to reach the peaks must be at a minimum. This means that it should be neither computationally expensive nor time consuming. Also, this algorithm must be capable of being simulated by related software such as HSpice.

Simple Exact Algorithm (SEA)
Group all the transistors in the circuit using Transistor Grouping Rules;
Initialize W_i, the width of the transistors in group i, so that all groups have the equal width W* (k_i = 1);
Initialize S, the step size, and m, the number of step sizes;
for (each value of W* in the sweep) {
    Compute the target parameter, Θ, and save it to an array;}
W_opt is the point where Θ has become minimum;
do {
    for (each group i = 1, ..., n) {
        for (each of the m steps of k_i) {
            Compute Θ and save it to another array;}
        k_i is the point where Θ has become minimum;}
} until (all k_i's converge to a specific value);
Algorithm 1: Proposed transistor sizing algorithm (SEA) for optimizing.

Figure 2 describes transistor sizing in the simplest manner. In this sample figure, we assume a three-dimensional space, which corresponds to a circuit whose transistors can be categorized into two groups. K1 and K2 are the sizing factors of the transistors of these groups. In this space, point M, which is the absolute maximum or peak, represents the parameter that is intended to be optimized by transistor sizing. To acquire this parameter, the sizing factors of the transistors must converge or must lie within a specified range. Consequently, our aim in optimizing the circuits or specifying the sizes of transistors is to obtain appropriate values for K1 and K2 such that we reach the absolute optimum. The three-dimensional case can be easily extended to four dimensions, five dimensions, and up to an n-dimensional space. This algorithm begins by sweeping K1 while keeping K2 constant. In this condition, m1 on the curve, which is the relative optimum in the current state, is chosen. Now we hold K1 constant at point m1 and sweep K2.
Then we choose point m2 on the curve as the relative optimum parameter. In return, the value of K2 at point m2 is considered as constant. This part of the algorithm is within a loop, because each time the size of a particular transistor is chosen, it is not guaranteed that the result we obtain is optimum for the other previously sized transistors.
In general, the number of iterations depends on the technique we employ to reach the peak. The algorithm terminates when the values of K1 and K2 in successive iterations become identical. When this happens, we are assured that point M is the absolute optimum. Alternatively, the algorithm may terminate when the variation of the target parameter between iterations falls within a small range. To ensure that our sizing is optimum within the search window, the step size of the sweep is changeable. When we get closer to the optimum peak, we can reduce the step size to improve accuracy. In other words, a small step size gives a small error, higher accuracy, and a better approximation of the optimum peak. If K2 is swept first and then K1, point M is again chosen as the unique absolute optimum. In other words, in this algorithm the order of the sizing factors of the transistors is not important and does not affect the optimum value obtained for the target parameter.
Another factor that may cause problems in transistor sizing is the presence of local peaks on the curve, because there is a danger of selecting a local peak instead of the absolute peak. However, considering Figure 2, if point L, which is a local peak, is taken as the optimum point, then, since the algorithm is iterative, sweeping K1 or K2 in the next iteration moves away from point L, and point M is chosen as the optimum point.
One important point is that a symmetrical three-dimensional space can increase the probability and speed of selecting an optimum point. For example, if a sphere is considered as the most symmetrical object in a three-dimensional space, the same position is finally reached from any location at which K1 and K2 are swept.
In this regard, the best approach is to place conditions that make the n-dimensional space symmetrical. This depends on how the transistors of a circuit are categorized. In this paper, we introduce a new rule for grouping transistors; that is, transistors with similar positions are placed into one category. This improves circuit performance and increases the optimization speed by means of a symmetrical n-dimensional space.

The Rules of Grouping Transistors for Sizing Operation.
Because the time needed for transistor sizing is strongly influenced by the number of sizing factors, it is necessary to reduce these factors as much as possible before determining the transistor sizes. This decreases the complexity and execution time of the sizing operation. The reduction of parameters must be performed carefully so that sizing accuracy is not degraded. Hence, in this paper, some rules are offered to recognize transistors that are in similar positions and to group them accordingly. From the viewpoint of transistor sizing, this reduces the number of sizing operation steps. These rules are stated as follows.
(1) Series transistors of the same type have equal sizes so that they have the same conductivity. Otherwise, the slower transistor limits the speed of the others, and the delay increases [2].
(2) Parallel transistors of the same type have equal sizes so that they have the same conductivity. Otherwise, the slower transistor determines the speed of the path, and the delay increases [2].
(3) Transistors of the same type which are in similar positions and are not series or parallel must have equal sizes. For instance, consider P1 and P2 transistors or N1 and N2 transistors of XOR/XNOR 6T circuit in Figure 1. Even though these transistors are not series or parallel, they still have to have the same size to equalize the resistance and drivability of the paths which they are located on. Thus, the delays of each path are the same.
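The three rules can be sketched as a grouping step over a small netlist. This is a simplified illustration under stated assumptions: each transistor is given as a hypothetical `(type, drain, source)` tuple, series connection is approximated as two same-type devices sharing a node that no other channel terminal touches (gate connections are ignored), and rule (3) relies on pairs supplied by the designer in `similar_pairs`, since "similar position" is not detected automatically here. The two-input NAND example is our own and does not come from the figures.

```python
from collections import defaultdict


def group_transistors(transistors, similar_pairs=()):
    """Merge transistors into sizing groups with a union-find structure."""
    parent = {name: name for name in transistors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Which channel terminals touch each node.
    touching = defaultdict(list)
    for name, (_kind, drain, source) in transistors.items():
        touching[drain].append(name)
        touching[source].append(name)

    names = list(transistors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            kind_a, da, sa = transistors[a]
            kind_b, db, sb = transistors[b]
            if kind_a != kind_b:
                continue
            if {da, sa} == {db, sb}:
                union(a, b)  # rule (2): parallel, same two nodes
                continue
            shared = {da, sa} & {db, sb}
            if any(sorted(touching[n]) == sorted([a, b]) for n in shared):
                union(a, b)  # rule (1): series through a private node

    for a, b in similar_pairs:
        if transistors[a][0] == transistors[b][0]:
            union(a, b)      # rule (3): designer-declared similar positions

    groups = defaultdict(list)
    for name in names:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical two-input NAND gate used only as an illustration.
nand2 = {
    "P1": ("pmos", "out", "vdd"),
    "P2": ("pmos", "out", "vdd"),
    "N1": ("nmos", "out", "x"),
    "N2": ("nmos", "x", "gnd"),
}
print(group_transistors(nand2))
```

Four transistors collapse into two sizing groups here, which halves the number of sizing factors that have to be swept.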

Introducing Proposed Transistor Sizing Algorithm.
As shown in [23], the transistor sizing for optimal performance is technology dependent. For a certain technology, the channel lengths of all transistors are fixed at the minimal feature size. So, the only variable to be optimized is the channel width [2]. The proposed algorithm, which is called SEA, is described by the pseudocode in Algorithm 1. At the beginning, all the transistors of a circuit are categorized on the basis of the rules of grouping transistors. Then the channel width of the transistors in group i is considered to be equal to W_i, where i = 1, 2, ..., n. A coefficient k_i is assigned to each group, where k_i is the sizing factor of the transistors' widths in the ith group. Each k_i is initialized to 1, where i = 1, 2, ..., n.
The point is that W*, or in other words the W_i values, are not permitted to be smaller than the technology feature size under any circumstances. Now assume that all transistors have equal widths. To obtain the optimum width, W* is swept once, starting from the minimum feature size, in a two-dimensional space. The value of W* at the optimum point is the optimum width, W_opt.
In each iteration, the sizing factor k_i is swept in an (n + 1)-dimensional space while the other k_j values, where j = 1, 2, ..., n and j ≠ i, are held constant, in order to find the optimum point. The value of k_i at the optimum point is then held constant in turn. This procedure is repeated n times, once for each k_i. The iterations continue until every k_i, where i = 1, 2, ..., n, converges to a specific value.
In our proposed algorithm, the outermost for loop is executed n times where n is the number of transistor groups. The innermost for loop that implements the sweep operation is executed m times where m is the number of step sizes. Thus, the running time of SEA algorithm is O(mn).
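The procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_theta` is a hypothetical, separable stand-in for a simulator run, each group width is assumed to be k_i × W_opt, and both sweeps are assumed to use the same fixed grid of m widths starting at the minimum width with step S.

```python
def toy_theta(widths):
    """Hypothetical target parameter; a real flow would run a simulator."""
    targets = (0.53, 0.93)
    return sum((w - t) ** 2 for w, t in zip(widths, targets))


def sea_sizing(n_groups, theta, w_min, step, m, max_passes=50):
    """Initial sweep of the common width W*, then repeated per-group
    sweeps of k_i until no sizing factor changes in a full pass."""
    grid = [w_min + j * step for j in range(m)]  # never below w_min

    # Step 1: all groups share one width (k_i = 1); sweep W* once.
    w_opt = min(grid, key=lambda w: theta([w] * n_groups))
    k = [1.0] * n_groups
    k_grid = [w / w_opt for w in grid]

    def theta_with(i, k_i):
        # Evaluate the target with group i set to k_i, others unchanged.
        trial = k[:i] + [k_i] + k[i + 1:]
        return theta([kj * w_opt for kj in trial])

    # Step 2: sweep one k_i at a time (n groups, m points each).
    for _ in range(max_passes):
        previous = list(k)
        for i in range(n_groups):
            k[i] = min(k_grid, key=lambda k_i: theta_with(i, k_i))
        if k == previous:  # every k_i has converged
            break
    return w_opt, k, theta([kj * w_opt for kj in k])


w_opt, factors, best = sea_sizing(2, toy_theta, w_min=0.13, step=0.1, m=20)
print(w_opt, factors, best)
```

Each pass of the outer loop costs n × m evaluations of the target, in line with the O(mn) figure above. Because the toy function is separable, the sweeps settle after the first pass; a real circuit response need not behave this way.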
Attributes of SEA Algorithm. The transistor sizing operation in the SEA algorithm follows a logical and feasible approach. The algorithm is also flexible in the choice of target specification: the goal can be one of the optimization parameters or a combination of them. In addition, the algorithm reduces the number of optimization parameters by means of more exhaustive rules for grouping transistors that are in the same positions in symmetrical circuits. This reduction not only decreases the execution time of the transistor sizing operation but also leads to lower dependency among those parameters. Therefore, the SEA algorithm seeks the optimum point in a smaller state space, which results in a greater probability that the goal parameter assumes its best-case value. The more symmetrical a circuit is, the better the algorithm performs.
The behavior of some transistors of the New HPSC circuit, which is shown in Figure 1, is investigated in Figure 3 with respect to their influence on the target parameter. In this evaluation, the sizes of the other transistors are fixed, only the size of the transistor under study is changed, and the PDP values are then measured. According to the rules of grouping transistors, P1 and P2 are placed in the same group since they have the same position. P4 and P5 are grouped together as they are in series. Considering Figure 3, the behavior of the P1 and P2 transistors towards the goal parameter is similar. Thus, classifying these transistors in the same group is reasonable.
In the XOR/XNOR 6T circuit, the P1 and P2 transistors, and also N1 and N2, have the same position and can be classified in the same groups. The preliminary results for XOR/XNOR 6T which is shown in Figure 1 are listed in    Tables 2 and 3 are randomly selected. A comparison of the PDP results with the values achieved in Tables 2 and 3 demonstrates that different initial values lead to only negligible differences in PDP. Thus, the transistor sizing operation in the proposed algorithm is not determined by the initial values of the sizing factors.
The initial sweep of W*, which is done in the first step, is one of the main strengths of the SEA algorithm. By beginning the transistor sizing with a sweep of W that is applied identically to all transistors, we can seek the optimized performance within a region closer to the absolute optimum point. Hence, convergence happens faster. This is because each transistor contributes a share to the value of the target parameter. The transistors that play the most important roles in determining the target parameter are the most effective in the sizing procedure.
In other words, when all the transistors' widths are swept together, the target parameter is influenced mainly by the most critical transistors. The other transistors do not have such important roles in defining the circuit performance. It is therefore clear that various transistors have different effects on the goal parameter, according to their position in a circuit. This is demonstrated by the simulations presented as convergence trends in Section 4. The big jumps in Figure 11 at the beginning of the simulation, which result mainly from sweeping W* in the first run of the proposed algorithm, show the convergence trend of the SEA algorithm in comparison with Chang's algorithm. This jump towards the optimum point accelerates the simulation and is observed in most of the circuits.

Simulation Setup.
In this section, the performance of the proposed algorithm is investigated through simulations under a variety of conditions. The full adder is a versatile and widely used building block in arithmetic circuits and the core element of complex arithmetic operations such as addition, multiplication, division, exponentiation, and so forth [2,5,24-26]. Therefore, the seven full adder circuits C-CMOS, CPL, TFA, TGA, New 14T, 10T, and New HPSC of Figure 5, which are mostly low-power and high-speed circuits, are used for simulation. These full adder circuits are all optimized using MDE, ADC, Chang's algorithm, and the proposed transistor sizing algorithm.
In order to experience all the possible transitions as inputs, an input test pattern with 56 transitions must be applied to a full adder circuit. An input transition may or may not result in a change at the output node. Even if there is no switching activity at the output node, some internal nodes may be switching. This switching activity results in some power dissipation. Thus, for an accurate result, all the possible input combinations are considered for all the test circuits [1]. Therefore, a group of input test patterns which offers all the 56 different transitions from one input combination to another are used as the input vectors for the full adder cell. Figure 4 shows the input stimulus, and Sum and C out outputs.
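The figure of 56 transitions follows from the 8 input combinations of a full adder: each combination can change to any of the 7 others. The short sketch below enumerates them and also counts how many change at least one output; the helper name `full_adder` and the variable names are our own.

```python
from itertools import permutations, product


def full_adder(a, b, cin):
    """Reference behavior of a one-bit full adder: (Sum, Cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


combos = list(product((0, 1), repeat=3))      # 8 input combinations
transitions = list(permutations(combos, 2))   # ordered pairs, from != to

# Transitions that change at least one of the two outputs.
output_changes = sum(
    1 for before, after in transitions
    if full_adder(*before) != full_adder(*after)
)
print(len(transitions), output_changes)
```

The remaining transitions leave both Sum and Cout unchanged, yet they can still toggle internal nodes and dissipate power, which is why the complete set is applied.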
The circuit performances of the seven full adders are evaluated in terms of average power consumption, worst-case delay, and power-delay product for a range of supply voltages from 0.8 V to 2.0 V. For all the rise and fall transitions of the outputs, the delay is measured from 50% of the input voltage level to 50% of the output voltage level. The full adder delay is the larger of the delay values of the Sum and C_out outputs, Delay = max(Delay_Sum, Delay_Cout), and the power-delay product is defined as PDP = Power_avg × Delay. All the circuits are designed using the TSMC 0.13-μm CMOS technology and were simulated using the BSIM3v3 model with a Level 49 technology file. The threshold voltages of the PMOS and NMOS transistors are approximately 0.33 and 0.35 V, respectively. Simulations are carried out using Star HSPICE. The environment temperature has been set to 27 °C.
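The way these metrics combine can be sketched as follows, assuming that the full adder delay is the worst case over all measured Sum and C_out transitions and that PDP is the product of average power and that delay. The function name `full_adder_metrics` and the sample values are hypothetical and are not simulation results.

```python
def full_adder_metrics(avg_power_w, sum_delays_s, cout_delays_s):
    """Return (worst-case delay, PDP) from per-transition delays.

    Assumption: PDP = average power * worst-case delay, where the
    worst case is taken over both the Sum and Cout outputs."""
    worst_delay = max(max(sum_delays_s), max(cout_delays_s))
    return worst_delay, avg_power_w * worst_delay


# Hypothetical numbers, chosen only to show the units (W, s, J).
delay, pdp = full_adder_metrics(2e-6, [1e-10, 3e-10], [2e-10])
print(delay, pdp)
```

With power in watts and delay in seconds, the product has units of joules, which is why PDP is read as an energy metric.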
The generic test bench used to simulate the full adders being compared is shown in Figure 6. This simulation environment has been commonly used to compare the performance of the full adders in [1,[7][8][9]25]. To simulate a real environment, input buffers for all inputs of the test circuit are used to generate a real waveform and output buffers for both outputs are used to generate output load.

Simulation Results.
In order to have a fair and exact comparison of the proposed algorithm with the three previously reported algorithms, simulation conditions for these algorithms were kept identical as far as possible. Since the C_load capacitance is needed for the MDE and ADC algorithms, the equivalent capacitance of the output buffers in a single test bench is calculated (about 32 fF) and has been used as C_load.
The simulation results for the MDE and ADC algorithms under supply voltages of 0.8 V, 1.2 V, 1.6 V, and 2.0 V for the five full adder circuits C-CMOS, CPL, TFA, 10T, and New HPSC are listed in Table 4. The values of power, delay, and power-delay product (PDP) obtained for the considered values of V_DD (0.8-2.0 V) for these full adders are shown in Figure 7.
The seven full adder circuits of Figure 5 are all simulated to achieve the optimum power-delay product (PDP) using Chang's algorithm and the proposed transistor sizing algorithm. Optimization of the transistor sizing is carried out at seven different voltages: 0.8 V, 1.0 V, 1.2 V, 1.4 V, 1.6 V, 1.8 V, and 2.0 V. The step size of the subsequent iterations of these algorithms is set to 0.1 μm. Thus, the final transistor sizes have a precision of 77% of the channel length, which is 0.13 μm for our targeted technology. The final transistor widths for the New HPSC full adder cell, optimized by both algorithms at different supply voltages, are listed in Table 11.
The power, delay, and power-delay product of these full adder cells for supply voltages ranging from 0.8 V to 2.0 V are listed in Tables 5 and 6, respectively, for comparison. To make the comparison easier, the simulation results are shown in Figures 8 and 9.

Comparison of Algorithms.
As discussed earlier, the MDE and ADC algorithms are based on logical effort and are classified as arithmetic transistor sizing algorithms. Their sizing approach therefore differs from that of Chang's algorithm and the proposed algorithm, which are both heuristic. Once the delay has been optimized by the MDE algorithm, the ADC algorithm takes the measured delay and optimizes the area-delay product through further calculations. Hence, delay is the target parameter in both of these algorithms; power is not optimized and can only be measured for a given delay.
Thus, the MDE and ADC algorithms do not specifically optimize the power-delay product (PDP), whereas in this paper PDP is the target parameter for Chang's algorithm and the proposed algorithm. Given the nature of the MDE and ADC algorithms, their comparison with Chang's and the proposed algorithms, which specifically optimize PDP, is not strictly exact. The discussion of these four algorithms has two aspects: (1) delay and (2) PDP.
Table 7 compares the proposed algorithm with the MDE, ADC, and Chang's algorithms in terms of the delay values obtained at V_DD = 1.2 V for five full adder circuits: C-CMOS, CPL, TFA, 10T, and New HPSC. Although the MDE algorithm optimizes delay only, and the ADC algorithm is then used to optimize the area-delay product, the delay measured for the full adder circuits sized with the proposed algorithm is on average 55.01% lower than with the MDE and ADC algorithms. This result is notable because the proposed algorithm optimizes PDP rather than delay alone. Table 7 also shows that the proposed algorithm improves delay by 33.16% on average compared with Chang's algorithm.
Table 8 compares the MDE, ADC, and proposed algorithms in terms of power-delay product. As the table shows, the proposed algorithm improves PDP by 57.92% on average. Although this comparison is not strictly fair, it is useful for showing how each algorithm approaches the target parameter, PDP.
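The average improvements quoted above can be illustrated with a minimal sketch. It assumes that the average is the arithmetic mean of the per-circuit relative reductions, which the text does not state explicitly, and the delay values are hypothetical placeholders, not entries from Table 7.

```python
def improvement_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


def average_improvement(pairs):
    """Arithmetic mean of per-circuit improvements (assumed averaging rule)."""
    values = [improvement_percent(b, p) for b, p in pairs]
    return sum(values) / len(values)


# Hypothetical (baseline, proposed) delays in ps; NOT taken from Table 7.
hypothetical_delays = {
    "circuit_A": (200.0, 100.0),
    "circuit_B": (400.0, 300.0),
}

average = average_improvement(hypothetical_delays.values())
print(f"average delay improvement: {average:.2f}%")
```

The same two functions apply unchanged to PDP values, since both metrics are of the lower-is-better kind.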
An exact comparison can be made between the proposed algorithm and Chang's algorithm, since both are used to optimize the power-delay product of the full adder circuits. Table 9 lists the values of the power-delay product at V_DD = 1.2 V in order to compare circuit optimization using these algorithms. Compared to Chang's algorithm, the proposed algorithm improves PDP by 25.64% on average. The optimization results for the seven full adder circuits are also shown in Figure 10. Figure 11 shows the convergence of PDP for some of the full adder circuits at a 1.2 V supply voltage using Chang's algorithm and the proposed algorithm. The MDE and ADC algorithms, which are based on logical effort, cannot be included in this comparison.
Figure 11 shows that the proposed algorithm converges faster than Chang's algorithm, since it needs fewer iterations (each color represents one iteration) to reach the optimal PDP. As mentioned earlier, this is due to the more exhaustive rules for grouping transistors, the treatment of transistors with identical positions in a circuit, and the sweep of W* in the first run of the algorithm.
Therefore, the number of transistor groups in the proposed algorithm is smaller than in Chang's algorithm, and the entire process of reaching better performance is faster with the proposed algorithm. In addition, as can be seen in Figure 11, the value of the target parameter obtained with the proposed algorithm is lower than that of Chang's algorithm in almost all cases. Hence, the proposed algorithm is more successful in providing a lower PDP.

SEA Algorithm in Comparison with Chang's Algorithm at a Glance.
In Chang's algorithm, there is no systematic rule for the initial values of the transistor sizes. The designer assigns approximate values to the transistors empirically, according to their type and position in the circuit. In addition, there is no specific rule for the order in which the transistors are sized. Therefore, both the initial transistor sizes and the order in which transistors are selected for sizing can vary, which results in different optimization rates of the target parameter. In some cases this difference is negligible, but in others it is considerable. The SEA algorithm performs better than Chang's algorithm in both cases. The following characteristics briefly summarize the advantages of the SEA algorithm over Chang's algorithm.
(1) In the SEA algorithm, sweeping W* at the beginning of the sizing operation identifies the more sensitive transistors, which speeds up convergence (first sizing run in Figure 10). This strategy steers the optimization procedure in favor of the more sensitive transistors.
(2) In the SEA algorithm, the initial values of the transistor sizes are not chosen arbitrarily. They are also obtained from the initial sweep of W*.
(3) The new rule of grouping transistors in the same position results in fewer optimization parameters, a shorter execution time, higher reliability, and a higher optimization rate in many symmetrical circuits.
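The three points above can be illustrated with a minimal sketch. It is not the authors' implementation: the PDP function is a hypothetical analytic stand-in for an HSpice measurement, the group names, transistor names, and width bounds are assumptions, and the refinement loop is a plain one-group-at-a-time step search used only to show the effect of a common initial sweep of W* and of shared group widths.

```python
STEP_UM = 0.1   # sizing step quoted in the text
W_MIN_UM = 0.2  # assumed lower width bound (hypothetical)
W_MAX_UM = 3.0  # assumed upper width bound (hypothetical)

# Hypothetical grouping: transistors in identical positions share one width.
GROUPS = {
    "pull_up": ["MP1", "MP2"],
    "pull_down": ["MN1", "MN2"],
    "pass": ["MN3", "MN4"],
}


def toy_pdp(widths):
    """Hypothetical stand-in for a simulated PDP; not a circuit model."""
    optima = {"pull_up": 1.3, "pull_down": 0.7, "pass": 0.4}
    return 1.0 + sum((widths[g] - optima[g]) ** 2 for g in widths)


def width_grid():
    """All candidate widths between the bounds, on the sizing grid."""
    count = int(round((W_MAX_UM - W_MIN_UM) / STEP_UM))
    return [round(W_MIN_UM + i * STEP_UM, 1) for i in range(count + 1)]


def initial_sweep(cost):
    """Give every group the same width W* and keep the best W*."""
    best = min(width_grid(), key=lambda w: cost({g: w for g in GROUPS}))
    return {g: best for g in GROUPS}


def refine(start, cost):
    """Step one group at a time by +/- STEP_UM until nothing improves."""
    widths, passes, improved = dict(start), 0, True
    while improved:
        improved = False
        passes += 1
        for g in list(widths):
            for delta in (STEP_UM, -STEP_UM):
                trial = dict(widths)
                trial[g] = round(widths[g] + delta, 1)
                in_range = W_MIN_UM <= trial[g] <= W_MAX_UM
                if in_range and cost(trial) < cost(widths):
                    widths, improved = trial, True
                    break
    return widths, passes


def per_transistor(widths):
    """Expand group widths to individual transistor widths."""
    return {t: widths[g] for g, names in GROUPS.items() for t in names}


swept = initial_sweep(toy_pdp)
with_sweep, passes_with = refine(swept, toy_pdp)
without_sweep, passes_without = refine({g: W_MIN_UM for g in GROUPS}, toy_pdp)
print(swept, with_sweep, passes_with, passes_without)
print(per_transistor(with_sweep))
```

In this toy setting both starting points reach the same group widths, but the refinement needs fewer passes when it starts from the swept W* than from a uniform minimum-width start. Three group widths are optimized instead of six individual transistor widths, which is the kind of reduction in optimization parameters that point (3) describes.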

Conclusion
In this paper, we proposed a novel transistor sizing algorithm for optimizing low-power high-speed arithmetic circuits, known as the Simple Exact transistor sizing Algorithm (SEA). The proposed algorithm has been compared with the MDE, ADC, and Chang's algorithms, and its overall performance has been shown to be significantly superior. The simplicity of its sizing approach reduces the sizing execution time. Moreover, the SEA algorithm increases reliability, since neither the order in which transistors are sized nor the initial values of their sizing factors affects the final value of the target parameter. This advantage stems from the properties of symmetrical circuits. The proposed algorithm has also been shown to converge faster and to reach a better target parameter than Chang's algorithm. In terms of delay, it achieves an average improvement of 55.01% over the MDE and ADC algorithms and 33.16% over Chang's algorithm. The comparison with MDE and ADC in terms of PDP is not strictly exact, since these algorithms are not designed to optimize the power-delay product (PDP) and power is merely measured at the optimum delay. Even under this approximate comparison, the saving in PDP is about 57.92% on average, which indicates that the proposed algorithm is preferable to the MDE and ADC algorithms.
When Chang's algorithm and the proposed algorithm are used to optimize PDP for seven full adder circuits, the simulation results show an average improvement of 25.64% for the proposed algorithm, which demonstrates its efficiency in optimizing the target parameter.
Table 10 compares the algorithms in terms of time complexity and flexibility in choosing the target parameter. There are several reasons for the running-time advantage of the SEA algorithm: (1) grouping transistors reduces the running time of each iteration; (2) decreasing the number of transistor groups and reducing the interdependence of the parameters speeds up convergence; (3) the initial sweep of the width W of all transistors at the beginning of the algorithm produces a large jump toward the optimum PDP (Figure 11). The time complexity of SEA and Chang's algorithm is nearly the same, but the running time is markedly improved.
Although the proposed transistor sizing algorithm (SEA) has been used in this paper to minimize the PDP of several full adders, it is highly flexible in the choice of the target parameter that the sizing is performed to optimize. This parameter can be any circuit performance metric, such as power dissipation, delay, or area, or a combination of them.
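One way to picture this flexibility is a sizing loop that receives the target parameter as a pluggable function. The sketch below is an illustration under stated assumptions, not part of the published algorithm: the `Metrics` record, the `TARGETS` table, the units, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Hypothetical per-simulation measurements."""
    power_uw: float   # average power in microwatts
    delay_ps: float   # worst-case delay in picoseconds
    area_um2: float   # active area in square micrometers


# Each target maps measurements to one number to be minimized.
TARGETS = {
    "power": lambda m: m.power_uw,
    "delay": lambda m: m.delay_ps,
    "area": lambda m: m.area_um2,
    "pdp": lambda m: m.power_uw * m.delay_ps,          # uW * ps = aJ
    "area_delay": lambda m: m.area_um2 * m.delay_ps,
}


def target_value(metrics, target="pdp"):
    """Evaluate the selected target parameter for one set of measurements."""
    try:
        return TARGETS[target](metrics)
    except KeyError:
        raise ValueError(f"unknown target: {target}") from None


# Hypothetical measurements, not simulation results from the paper.
sample = Metrics(power_uw=2.0, delay_ps=100.0, area_um2=50.0)
print(target_value(sample, "pdp"))
print(target_value(sample, "area_delay"))
```

Because the sizing loop would only ever see a single number per evaluation, switching from PDP to delay or to an area-delay product changes one argument rather than the search procedure.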
The execution time of the proposed algorithm is lower than that of MDE but higher than that of ADC. The rule of finding transistors with identical positions can be defined and used in symmetric circuits such as balanced XOR/XNOR circuits; it results in fewer transistor groups and hence a smaller value of n. Thus, the execution time of the proposed algorithm can be lower than that of Chang's algorithm. The proposed algorithm could be applied to other VLSI circuits and evaluated with respect to properties such as reliability and the applicability of the transistor grouping rules.