Shared Reed-Muller Decision Diagram Based Thermal-Aware AND-XOR Decomposition of Logic Circuits

The increasing number of complex functional units raises the power-density within a very-large-scale integration (VLSI) chip, which results in overheating. Power-density translates directly into temperature, which reduces the yield of the circuit. An adverse effect of power-density reduction is an increase in area, so there is a trade-off between area and power-density. In this paper, we introduce a Shared Reed-Muller Decision Diagram (SRMDD) based on fixed polarity AND-XOR decomposition to represent multioutput Boolean functions. By recursively applying transformations and reductions, we obtain a compact SRMDD. A heuristic based on Genetic Algorithm (GA) increases the sharing of product terms by judicious choice of the polarity of input variables in the SRMDD expansion, and a suitable area and power-density trade-off is enumerated. This is the first effort to incorporate power-density as a measure of temperature estimation in the AND-XOR expansion process. The results of logic synthesis are incorporated with physical design in the CADENCE digital synthesis tool to obtain the floor-plan silicon area and power profile. The proposed thermal-aware synthesis has been validated by obtaining the absolute temperature of the synthesized circuits using the HotSpot tool. We have experimented with 29 benchmark circuits. The minimized AND-XOR circuit realization shows average savings of up to 15.23% in silicon area and up to 17.02% in temperature over sum-of-product (SOP) based logic minimization.


Introduction
With the rapid increase in the functional complexity and miniaturization of chips, power-density is becoming a critical concern in VLSI design and synthesis methodologies. Feature size scaling to meet the demands of portability and performance has increased the total power utilization of the chip. Consequently, the power-density becomes high and generates a thermal effect, which reduces the performance and efficiency of the circuit. The integrated circuit (IC) chip may even burn out due to thermal runaway. In recent times, power-density has become an important constraint in designing VLSI circuits to reduce the thermal effect, because power-density translates directly into temperature [1]. So, an optimized realization of a circuit that takes power-density as a cost parameter is very important for limiting temperature generation. Researchers in the physical design domain have given importance to temperature, but the cooling cost became high. With high performance processors, the cost of cooling solutions is rising at $1-3 or more per watt of power dissipation [2,3]; this shows that cooling costs increase exponentially with the increase of power-density. So, design-time thermal-aware techniques can be used to improve the power and thermal characteristics of integrated circuits.
Logic minimization plays an important role in the combinational synthesis domain, optimizing the circuit by increasing the shared logic within the functions. Once the minimized circuit is obtained, the switching activity and transition probabilities of the logic (dynamic power) determine the power consumption of the circuit. The power-density is then obtained as the ratio of the power consumption to the utilized chip area. In this paper, we propose a logic synthesizer that optimizes the chip area and power-density by providing trade-offs between the two and thereby tries to reduce the thermal effect of combinational logic circuits.

VLSI Design
Multioutput function optimization aims at reducing the circuit area by extracting common subexpressions within the subfunctions. The most popular CAD tool packages that utilize this logic are Espresso [4], SIS [5], and ABC [6]. Espresso targets the AND-OR based PLA structure and is more commonly known as a two-level minimizer. On the other hand, SIS and ABC utilize multilevel logic circuits to increase the sharing between subfunctions. The logic implementation reported in [4,5,7] utilizes AND-OR realization to reduce the circuit area. However, many real-life circuits used in the fields of coding theory, telecommunication, linear systems, computer arithmetic coding circuits, error detection-correction circuits, and data encryption and decryption circuits are inherently functions of mod-2 sum form. In such cases, AND-XOR minimization algorithms often produce more compact circuits than AND-OR based realizations. AND-XOR based PLA realization also offers higher testability than AND-OR based circuits. However, applications of AND-XOR based circuits have so far not become popular due to the following two obstacles:
(i) XOR gates are slow and require a large silicon area to realize in comparison with OR gates.
(ii) The problem of optimizing AND-XOR functions is difficult, although there has been a great deal of research in recent years.
With the development of new technologies and the advent of various field programmable gate array (FPGA) devices, the first obstacle has been solved. In programmable devices, the XOR gate is either easily realized in "universal modules" or directly available. For example, the ATMEL FPGA series AT6000 uses various two-input gates such as XORs, ANDs, and NANDs to configure logic blocks [8]. Regarding the second obstacle, more recently, there has been some success in achieving area reduction by employing optimization techniques specifically targeted towards initial AND/XOR representations in the well-known Reed-Muller (RM) form.
To develop an AND-XOR based circuit realization, several types of expressions are available, such as positive polarity Reed-Muller (PPRM), fixed polarity Reed-Muller (FPRM), pseudo Reed-Muller, generalized Reed-Muller, XOR sum of products, and Kronecker and pseudo Kronecker forms [9]. Each of these forms has its own advantages. As far as XOR synthesis is concerned, this paper concentrates on the synthesis of FPRM circuits only.
Against this background, the problem addressed in the current work can be stated as follows.
Given a multi-input, multioutput Boolean function $f$ and weight factors, perform FPRM decomposition and share the product terms. Minimization is performed using a weighted sum approach for area (number of product terms) and power-density. The circuit realization is carried up to physical design synthesis to obtain the actual area and temperature.
The rest of the paper is organized as follows. Section 2 illustrates the motivation and previous work on AND-XOR synthesis. Section 3 presents the thermal-aware AND-XOR problem formulation and synthesis approach. Section 4 illustrates the GA formulation for thermal-aware SRMDD based AND-XOR network synthesis. Section 5 presents experimental results, and finally Section 6 draws the conclusion and outlines future work.

Motivation and Previous Works
The motivation of AND-XOR realization comes from Example 1.
Example 1. Consider a Boolean function $f$ with 3 inputs and 2 outputs, consisting of the subfunctions $f_1$ and $f_2$. Realizing $f$ as an AND-OR network requires 4 product terms. If we realize the same function $f$ as an AND-XOR network with all positive polarities, only 3 product terms are required.
The Reed-Muller (RM) canonical expansion of an $n$-variable Boolean function $f$ can be represented by $2^n$ terms. The general expansion of RM is given by
$$f(x_1, x_2, \ldots, x_n) = a_0 \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_1 x_2 \oplus \cdots \oplus a_{2^n - 1} x_1 x_2 \cdots x_n,$$
where $a_i \in \{0, 1\}$. If we decompose the function $f$ using Shannon's expansion, three gates are required (two ANDs and one XOR), whereas only two gates (one AND and one XOR) are required to realize the same function $f$ using the positive Davio (pD) or negative Davio (nD) expansion. In this work, we apply the positive Davio expansion or the negative Davio expansion to the given function $f$, using either the positive or the negative polarity of each variable but not both. The Boolean function $f$ is then logically expressed as a fixed polarity Reed-Muller (FPRM) expansion. For an $n$-variable function, there are at most $2^n$ different FPRMs. The minimization problem is to find the one with the minimum number of products among the $2^n$ possible FPRMs. To solve this problem, we apply a Genetic Algorithm (GA) based formulation to identify the best polarity assignment of the input variables.
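For reference, the three single-variable expansions compared above can be written in their standard textbook form. The cofactor names $f_0$, $f_1$, and $f_2$ are notation assumed here for illustration and are not taken from the original equations:

```latex
\begin{align*}
f &= \bar{x}_i f_0 \oplus x_i f_1        && \text{(Shannon: two ANDs, one XOR)} \\
f &= f_0 \oplus x_i f_2                  && \text{(positive Davio, pD: one AND, one XOR)} \\
f &= f_1 \oplus \bar{x}_i f_2            && \text{(negative Davio, nD: one AND, one XOR)}
\end{align*}
% with f_0 = f|_{x_i = 0}, \quad f_1 = f|_{x_i = 1}, \quad f_2 = f_0 \oplus f_1
```

Choosing pD or nD for a variable corresponds to fixing that variable's polarity to positive or negative, respectively, which is why an $n$-variable function has at most $2^n$ FPRM forms.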
Detailed descriptions of two-level AND-XOR network synthesis are given in [12,13]. AND-XOR logic synthesis can yield a better, more compact realization than AND-OR synthesis in terms of the number of product terms, as reported by Sasao and Besslich in [14] and Ye and Roy in [15]. Sasao et al. deal with the problem of minimizing two-level AND-XOR PLAs by utilizing both the positive and negative polarity of variables and proposed several heuristic methodologies for mod-2 SOPs in [14,16]. Realization of Boolean functions in the positive polarity AND-XOR form was proposed long ago as the Reed-Muller expansion in [17]. Modified versions of this basic canonical form have since been studied by several researchers. The representation in which a variable can have either positive or negative polarity throughout the function is known as the fixed polarity Reed-Muller (FPRM) form, as given by Davio and Deschamps [18]. FPRM expansion utilizes a much smaller number of product terms than the original Reed-Muller form, with high testability. An FPRM based heuristic approach has been proposed by Sarabi and Perkowski to find the best polarity assignment [19]. A GA based polarity selection scheme for FPRM realization of multioutput Boolean functions to minimize the area was presented by Chattopadhyay et al. in [12]. Low-power decomposition for XOR based synthesis has been presented by Narayanan and Liu [10]. In [13], a GA based area-power trade-off analysis has been reported by Pradhan and Chattopadhyay. An elaborate survey of the work done so far is given in [20]. However, none of the above works considered power-density as a proxy for estimating the temperature of AND-XOR based circuits in order to analyze the thermal effect. We contribute a trade-off analysis that takes power-density into account along with area. At the logic synthesis level, the absolute value of temperature is unknown, so we use power-density to evaluate temperature. Temperature is directly proportional to power-density, which can be established by [21]
$$T_{\text{chip}} = T_a + R_{th} \cdot \frac{P_{\text{total}}}{A}. \quad (5)$$
In (5), $T_{\text{chip}}$ is the average chip temperature, $T_a$ is the ambient temperature ($T_a = 25\,^{\circ}$C), $R_{th}$ is the equivalent thermal resistance of the substrate (Si) layer plus the package and heat sink (cm$^2 \cdot {}^{\circ}$C/W), $P_{\text{total}}$ (in W) is the total power consumption, and $A$ (in cm$^2$) is the chip area. Keeping the ambient temperature constant in (5), it can be concluded that temperature generation depends only on power and area (since the equivalent thermal resistance is constant for a particular substrate). This has led us to consider power-density as the constraint for temperature and, subsequently, to consider power-density minimization along with area during polarity selection in FPRM based AND-XOR network synthesis.
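The relationship between power-density and temperature can be illustrated with a small numeric sketch. It assumes the linear model implied by the symbol definitions above (ambient temperature plus thermal resistance times power per unit area); the function name and the thermal resistance value used in the example are hypothetical and chosen only for illustration.

```python
def chip_temperature(p_total_w, area_cm2, r_th, t_ambient_c=25.0):
    """Average chip temperature in degrees Celsius (illustrative model).

    p_total_w : total power consumption in W
    area_cm2  : chip area in cm^2
    r_th      : equivalent thermal resistance in cm^2*degC/W (assumed value)
    """
    power_density = p_total_w / area_cm2  # W/cm^2
    return t_ambient_c + r_th * power_density


# Same power spread over twice the area: the rise above ambient is halved.
hot = chip_temperature(p_total_w=2.0, area_cm2=1.0, r_th=10.0)
cool = chip_temperature(p_total_w=2.0, area_cm2=2.0, r_th=10.0)
print(hot, cool)
```

With ambient temperature and thermal resistance held constant, only the ratio of power to area changes the result, which is the motivation for using power-density as the optimization target.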

Thermal-Aware AND-XOR Problem Formulation and Synthesis Approach
To represent a Boolean function efficiently in FPRM form, the critical issue is to select the polarity thoughtfully for maximum sharing, considering the optimization parameters.
In this section, we explain the method for assigning the input variable polarity and for calculating area and power-density depending on the polarity assigned. An $n$-input, $m$-output Boolean function can be realized as an FPRM expansion by keeping each variable in a fixed polarity, either positive or negative, throughout the expansion. A variable appears either in true or in complemented form within the expansion. This can be achieved by the following steps:
(i) Boolean functions are expressed in disjoint cube representation.
(ii) Without affecting the functionality, ORs are replaced with XORs.
(iii) Each variable is assigned a consistent polarity for the FPRM representation.
(iv) Literals are decomposed into the assigned consistent polarity.
The output subfunctions of the Boolean function are represented as AND-OR cubes. The AND-OR cubes are converted to a set of disjoint cubes. Then, without affecting the functionality, the ORs can be replaced with XORs. After obtaining the set of AND-XOR cubes, the next task is to determine the polarity assignment. The polarity can be assigned as $\langle p_1, p_2, p_3, \ldots, p_n \rangle$, $p_i \in \{0, 1\}$, to the variables $\langle x_1, x_2, x_3, \ldots, x_n \rangle$ to maximize the sharing of product terms in the FPRM realization. A variable with polarity 1 occurs in true form in all product terms, whereas a variable with polarity 0 occurs only in complemented form. To obtain the FPRM form, literals whose polarity differs from that assigned to the corresponding variable are rewritten: if the input variable polarity $p_i$ is 1, the literal $\bar{x}_i$ is replaced by $(1 \oplus x_i)$; otherwise, the literal $x_i$ is replaced by $(1 \oplus \bar{x}_i)$. Depending on the polarity, the Boolean function is represented by a different FPRM realization. Each realization yields a different area, in the form of the number of product terms, and a corresponding power-density, and allows a trade-off between the two. Example 2, in Section 3.1, explains the area computation, which is followed by power-density estimation.
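The conversion steps described in this section can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes cubes are already disjoint and written PLA-style as strings over `0`, `1`, and `-`, and the function name `fprm_terms` is hypothetical. Each product term is returned as a set of variable indices, where index `i` stands for the literal of variable `i` in its assigned polarity and the empty set stands for the constant 1.

```python
from itertools import product


def fprm_terms(cubes, polarity):
    """Expand disjoint cubes into FPRM product terms for a polarity vector.

    cubes    : list of strings over '0', '1', '-' (assumed disjoint)
    polarity : list of 0/1, one entry per input variable
    Returns a set of frozensets of variable indices.
    """
    terms = set()
    for cube in cubes:
        choices = []
        for i, lit in enumerate(cube):
            if lit == '-':
                choices.append((None,))        # variable absent from cube
            elif int(lit) == polarity[i]:
                choices.append((i,))           # literal already in assigned polarity
            else:
                choices.append((None, i))      # literal rewritten as (1 XOR literal)
        for combo in product(*choices):
            term = frozenset(v for v in combo if v is not None)
            terms ^= {term}                    # XOR: equal terms cancel in pairs
    return terms


# x0 OR x1 as disjoint cubes: x0, and (NOT x0) AND x1
cubes = ["1-", "01"]
print(len(fprm_terms(cubes, [1, 1])))  # x0 XOR x1 XOR x0.x1
print(len(fprm_terms(cubes, [0, 0])))  # 1 XOR (x0'.x1')
```

For a multioutput function, the term sets of the individual outputs can be combined with a set union, so a product term shared by several outputs is counted once. The two polarity vectors in the example give different term counts for the same function, which is the effect the polarity search exploits.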

Shared Reed-Muller Decision Diagram (SRMDD) Decomposition Based on Fixed Polarity and Area Computation.
Shared decision diagrams are used to represent multioutput Boolean functions of the form $f: B^n \rightarrow B^m$, where $B = \{0, 1\}$ and $n$ and $m$ denote the number of input and output variables, respectively. The $m$ different logic functions are decomposed into an AND-XOR based realization by maintaining a fixed polarity. The realized functions share their identical terms, which are represented by a common part of the SRMDD. In this paper, the shared FPRMs within the subfunctions are termed SRMDD. By iteratively applying FPRM decomposition and sharing the identical product terms, we obtain a compact SRMDD. Example 2 shows the formation of the SRMDD of the full-adder circuit. Subsequently, the area computation is illustrated.
Example 2. In a full-adder circuit, $x$, $y$, and $z$ are the three inputs added to produce the "Sum" and "Carry" outputs. The functions are given by
$$\text{Sum} = \bar{x}\bar{y}z + \bar{x}y\bar{z} + x\bar{y}\bar{z} + xyz, \qquad \text{Carry} = xy + yz + zx.$$
Both output functions can be realized as an FPRM based AND-XOR network by applying the positive Davio expansion to $x$ and $y$ and the negative Davio expansion to $z$. Thus $x$ and $y$ appear in true form and $z$ appears in complemented form, obtained by substituting $\bar{x} = (x \oplus 1)$, $\bar{y} = (y \oplus 1)$, and $z = (\bar{z} \oplus 1)$ into the output functions Sum and Carry. After decomposition, the identical product terms are shared between the two output functions.
Figure 1 illustrates the formation of the FPRM expansion tree of the Sum function. The nodes labeled pD denote positive Davio expansions, and the nodes labeled nD denote negative Davio expansions. On each path from the root node to the constant 1, the logical product of the labels on the path corresponds to a product term in an FPRM. Figure 2 shows the FPRM expansion tree for the Carry function. After sharing the identical product terms, the SRMDD tree of Sum and Carry generates 6 product terms, whereas expanding the trees separately would require 10 product terms. The product terms are the representative area for the Boolean functions. By changing the polarity of a variable in a given function, the structure of the circuit is changed. Initially, the circuit is in the form of an AND-OR circuit. After polarity assignment, the circuit is represented in the form of an AND-XOR circuit. The functionality of both structures is the same; in particular, the set of input responses of both circuits is the same. So, the truth vector formed in the AND-XOR form may differ from that of the initial AND-OR realization while the functionality remains unchanged. In the Reed-Muller AND-XOR realization, the chance of sharing product terms among the subfunctions increases, which results in area reduction. The Shared Reed-Muller Decision Diagram based on fixed polarity obeys the commutative law of addition and multiplication, so variable ordering does not affect the decision diagram. We kept the variable ordering fixed and changed only the polarity of the variables. In Figure 3, the bold straight lines are shared between the Sum and Carry functions, and the dotted lines are branches expanded only for Carry.

Power-Density Estimation.
Power-density can be defined as the amount of power drawn per unit area. In CMOS logic circuits, power is dissipated mainly through three components: switching (capacitive), short-circuit, and leakage power dissipation. Among these, switching power is the main contributor, and short-circuit and leakage powers become significant when technology scales down to 65 nm. Switching power consumption occurs due to the charging and discharging of load and parasitic capacitors. It is determined by $ESA_L$ and $ESA_{int}$, the switching activity at the load and at the internal node of the circuit, respectively, by the supply voltage $V_{DD}$, by the frequency of operation $f$, and by $C_L$ and $C_{int}$, the load and internal gate capacitances, respectively. To estimate the switching power dissipation, we need to compute the expected switching activity (ESA) of the logic gates. It is defined as the expected number of signal transitions at the outputs of the gates of a combinational logic circuit, and we use the same technique as in [13]. We assume that the primary inputs are uncorrelated and statistically independent of each other, with the primary input probability
$$P(x_i = 1) = P(x_i = 0) = 0.5.$$
First, we need to determine the ESA of a single gate. A logic gate changes its state when the current state of its output differs from the previous one. The probability of the output of a gate changing its state is therefore the probability of a 0-to-1 transition plus that of a 1-to-0 transition. We assume that the ON-probability $P_g$ of the gate output does not change with time. So, the estimated switching activity of a logic gate ($ESA_g$) is given by
$$ESA_g = 2 P_g (1 - P_g).$$
The estimated switching activity for an $n$-input AND gate with primary inputs ($ESA_{AND}$) is given by
$$ESA_{AND} = 2 \cdot \frac{1}{2^n} \left(1 - \frac{1}{2^n}\right).$$
For $N$ such AND gates in the first level, the switching activity is $N \cdot ESA_{AND}$. To compute the ON-probability of the second-level XOR gates, we consider the Boolean function implemented by them. If a gate with $n$ inputs realizes a function with $k$ ON-terms, the ON-probability is given by $k/2^n$. Thus, the switching activity of the node is estimated by
$$ESA_{XOR} = 2 \cdot \frac{k}{2^n} \left(1 - \frac{k}{2^n}\right).$$
After estimation of the switching activity, which represents the switching power dissipated by the required Boolean function, it is divided by the estimated area of the logic after the realization of the SRMDD expansion to obtain the power-density:
$$\text{power-density} = \frac{\text{estimated switching activity}}{\text{estimated area}}.$$
This parameter is used when calculating the fitness of a particular expansion.
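The estimation procedure of this section can be sketched as follows. The sketch rests on two assumptions that should be read as such: primary inputs are independent with probability 0.5, and the switching activity of a node with ON-probability p is taken as 2p(1 - p), a common zero-delay model. All function names are hypothetical, and the area is simply a product-term count.

```python
def esa(p_on):
    """Expected switching activity of a node with ON-probability p_on
    (assumed model: 0->1 and 1->0 transitions under temporal independence)."""
    return 2.0 * p_on * (1.0 - p_on)


def esa_and(n_inputs):
    """ESA of an n-input AND gate fed by equiprobable primary inputs."""
    return esa(0.5 ** n_inputs)


def esa_xor_node(on_terms, n_inputs):
    """ESA of a second-level node realizing a function with `on_terms`
    ON-terms out of 2**n_inputs input combinations."""
    return esa(on_terms / 2 ** n_inputs)


def power_density(total_esa, area):
    """Switching-activity proxy for power divided by the estimated area."""
    return total_esa / area


# Two 2-input AND gates feeding one output that is ON for 2 of 4 input patterns
total = 2 * esa_and(2) + esa_xor_node(2, 2)
print(total, power_density(total, area=2))
```

Because the area appears in the denominator, a polarity assignment that removes product terms can raise the power-density even when total switching activity falls, which is the source of the trade-off discussed later.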

Genetic Algorithm Formulation for SRMDD AND-XOR Network Synthesis
Genetic Algorithm (GA) is a stochastic heuristic search method that utilizes the mechanism of natural selection.
In this section, we structured the solutions of AND-XOR SRMDD of each circuit as chromosomes to reduce the area and power-density (temperature).The genetic formulation involves the careful and proficient choice of proper encoding of the input variables to form chromosome (each chromosome represents a possible solution), cost function measuring the suitability of the chromosomes in a population, elitism (direct copy to save the best chromosomes), crossover operator, mutation operator, and termination criterion.

Chromosome Encoding.
An $n$-input, $m$-output Boolean function is represented as a chromosome in the form of a string of bits of length $n$. The chromosome is a set of $n$ variables $(x_1, x_2, x_3, \ldots, x_n)$ of positive and negative polarity. If the $i$th bit is "1," the $i$th variable is implemented in positive polarity, whereas if the $i$th bit is "0," the $i$th variable is implemented in negative polarity. For a five-input Boolean function, the structure of a chromosome may be as described in Figure 4. The first, third, and fifth bits are "1"; that is, the first, third, and fifth input variables are in positive polarity, whereas the second and fourth bits are "0," which means that the corresponding variables are in negative polarity. We considered a population size of 50 to 100 depending on the number of outputs.

Fitness Function Measurement.
The fitness of a chromosome is determined by the suitability of the resulting circuit.
We have used a weighted linear combination method for the area (number of product terms) and the estimated power-density. The fitness of a particular chromosome $x$ is determined by the following formula:
$$\text{fitness}(x) = w_1 \cdot \frac{\text{area}(x)}{\text{area}_{\max}} + w_2 \cdot \frac{\text{power-density}(x)}{\text{power-density}_{\max}}. \quad (17)$$
In (17), $\text{area}_{\max}$ and $\text{power-density}_{\max}$ are the maximum area and maximum power-density of any chromosome after SRMDD realization of the circuit in the first generation.
For a chromosome $x$, the area and power-density are represented by $\text{area}(x)$ and $\text{power-density}(x)$. The weights $w_1$ and $w_2$ can be set by the designer with $w_1 + w_2 = 1$.
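The weighted-sum fitness can be sketched directly. The function below is an illustrative reading of the formula, with hypothetical parameter names; since both terms are normalized costs, the sketch assumes that a lower value indicates a better chromosome.

```python
def fitness(area, power_density, area_max, pd_max, w1, w2):
    """Weighted sum of normalized area and power-density (lower is better).

    area_max and pd_max are the first-generation maxima used for
    normalization; w1 and w2 are designer-chosen weights with w1 + w2 = 1.
    """
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * area / area_max + w2 * power_density / pd_max


# Area-only, power-density-only, and mixed weighting of the same candidate
for w1, w2 in [(1.0, 0.0), (0.0, 1.0), (0.6, 0.4)]:
    print(w1, w2, fitness(area=6, power_density=0.3,
                          area_max=10, pd_max=0.5, w1=w1, w2=w2))
```

Normalizing by the first-generation maxima keeps both terms on a comparable scale, so the weights express a relative preference rather than compensating for different units.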

Elitism (Direct Copy).
Elitism is a technique to prevent losing the best-found solution in a population [22]. The best 20% of the chromosomes of the present generation are directly copied to the next generation, and these are considered the "elite group." The best solutions are thus propagated to the next generation. Elitism guarantees that the best solutions are not lost or inadvertently degraded by crossover or mutation.

Crossover.
The crossover operator constructs new solutions by crossing over two parent chromosomes at randomly selected crossover points. In our GA formulation, the selection of parent chromosomes is not fully random; it is conditionally biased towards the chromosomes with better fitness.
Using the two-point methodology, the crossover operation generates 60% of the chromosomes and propagates them to the next generation. The selection of participating chromosomes for crossover is biased towards the "elite group" of the total population. To obtain the "elite group," the whole population is sorted by fitness value, and the 20% of the population with the best fitness values is taken.
To select a chromosome participating in crossover, first, a uniform random number between 0 and 1 is generated. If the number is greater than 0.5, a chromosome from the "elite group" is selected randomly. Otherwise, a chromosome is selected from the entire population. This biased selection method generates better offspring than a truly random one. After generating each pair of chromosomes, a check is made against the members of the present population, and duplicate chromosomes are eliminated.
Figures 5 and 6 show the two methods of crossover operation. Two parents, Chromosome ($x_1$) and Chromosome ($x_2$), are selected from the present generation for the evolution of offspring, which will participate as chromosomes in the next generation. Two crossover points (Pt1 and Pt2) are selected randomly. These two points segment each parent chromosome into three parts. Chromosome ($x_1$) is divided into the segments $x_{11}$, $x_{12}$, and $x_{13}$, whereas Chromosome ($x_2$) is divided into the segments $x_{21}$, $x_{22}$, and $x_{23}$. Method 1 produces Chromosome ($y_1$) as $x_{11} x_{22} x_{13}$. Figure 6 shows the second method of crossover offspring generation. In this case, Chromosome ($y_2$) is generated as $x_{21} x_{12} x_{23}$. After a redundancy check, the generated offspring contribute to the population of chromosomes for the next generation.
Figure 5: Crossover operation (Method 1).
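The biased parent selection and two-point crossover described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions: chromosomes are Python lists of 0/1 with at least three bits, the population is already sorted best first, and the function names are hypothetical.

```python
import random


def select_parent(population, rng, elite_fraction=0.2):
    """Pick a parent, biased towards the elite group.

    `population` is assumed sorted best first; the top 20% form the elite.
    """
    elite = population[:max(1, int(len(population) * elite_fraction))]
    if rng.random() > 0.5:
        return rng.choice(elite)       # draw from the elite group
    return rng.choice(population)      # draw from the whole population


def two_point_crossover(x1, x2, rng):
    """Swap the middle segment between two parents (needs length >= 3)."""
    pt1, pt2 = sorted(rng.sample(range(1, len(x1)), 2))
    y1 = x1[:pt1] + x2[pt1:pt2] + x1[pt2:]   # Method 1: x11 x22 x13
    y2 = x2[:pt1] + x1[pt1:pt2] + x2[pt2:]   # Method 2: x21 x12 x23
    return y1, y2


rng = random.Random(1)
population = [[1, 0, 1, 0, 1], [0, 0, 1, 1, 0], [1, 1, 1, 0, 0],
              [0, 1, 0, 1, 1], [1, 1, 0, 0, 1]]
a, b = select_parent(population, rng), select_parent(population, rng)
print(two_point_crossover(a, b, rng))
```

Drawing the two cut points from positions 1 to n-1 guarantees that each offspring keeps its first and last segment from one parent and receives a non-empty middle segment from the other. Duplicate elimination against the current population would follow as a separate step.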

Mutation.
In a Genetic Algorithm, the genetic diversity from one generation of the population to the next is maintained by the mutation operation. It is intended to prevent all solutions in the population from falling into a local optimum of the problem being solved. 20% of the chromosomes of the next generation are produced using the mutation operator. To perform mutation, a few randomly selected bit positions (mutation points) within the chromosome are inverted. Figure 7 illustrates the mutation operation. Chromosome ($x$) is chosen from the present generation for the mutation operation, using the same selection procedure as in crossover. Two positions are then selected randomly as mutation points (Mp1 and Mp2). The bits at the selected positions are inverted from "0" to "1" or from "1" to "0"; the remaining bits are unaltered. The newly generated offspring become chromosomes of the next generation.
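The mutation step can be sketched in a few lines, assuming chromosomes are lists of 0/1 polarity bits; the function name and the default of two mutation points are illustrative choices.

```python
import random


def mutate(chromosome, rng, n_points=2):
    """Return a copy of `chromosome` with `n_points` distinct bits inverted."""
    child = list(chromosome)
    for pos in rng.sample(range(len(child)), n_points):
        child[pos] ^= 1                # flip 0 -> 1 or 1 -> 0
    return child


rng = random.Random(0)
parent = [1, 0, 1, 0, 1]
print(parent, mutate(parent, rng))
```

Sampling positions without replacement ensures that exactly `n_points` bits differ between parent and offspring, so the offspring is never an unchanged copy.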

Termination Criterion.
The termination of algorithm depends on the fitness criteria.GA terminates if there is no improvement in fitness value for 100 consecutive generations.
The best chromosome at the final generation is taken as the optimum solution with respect to weighted sum of area and power-density (temperature).
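The stopping rule can be sketched as a stall counter. In this illustrative sketch, `step` is a hypothetical callable that advances the GA by one generation and returns the best fitness found in it, and lower fitness is assumed to be better.

```python
import itertools


def run_until_stalled(step, initial_best, patience=100):
    """Iterate generations until `patience` consecutive ones bring no improvement.

    Returns the best fitness seen and the number of generations executed.
    """
    best, stall, generations = initial_best, 0, 0
    while stall < patience:
        candidate = step()
        generations += 1
        if candidate < best:
            best, stall = candidate, 0   # improvement resets the counter
        else:
            stall += 1
    return best, generations


# Toy run: fitness improves twice and then stays flat
values = itertools.chain([0.9, 0.7], itertools.repeat(0.7))
print(run_until_stalled(lambda: next(values), initial_best=1.0, patience=100))
```

In the toy run the loop executes two improving generations followed by 100 non-improving ones before stopping.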

Experimental Results
In this section, we present the results of the proposed GA based thermal-aware SRMDD AND-XOR implementation technique. The algorithm has been implemented in the C language and simulated on a Pentium IV machine with a 3.4-GHz clock frequency and 3 GB of RAM running Linux. For experimental validation, we applied the algorithm to a number of benchmark circuits from the LGSynth93 benchmark suite. The discussion is divided into three parts. The first part concerns the simulation results for area and power-density aware SRMDD AND-XOR circuits. It is followed by RTL synthesis of the resulting circuits using the CADENCE "RTL Compiler." The last part covers the ASIC implementation using CADENCE "encounter."

Simulation Results Based on Area and Power-Density Aware SRMDD AND-XOR Circuits.
Boolean functions formatted as ".pla" files are taken as the input benchmark circuits. Figure 8 shows the format of the full-adder circuit as a .pla file. Subsequently, the input variable encoding and the SRMDD AND-XOR circuit realization of the full adder are explained. Functions are encoded into chromosomes as explained in Section 4.1. After encoding, the next task is to convert the .pla file into SRMDD AND-XOR format. This is achieved by the method explained in Section 3.1.
The Sum function can be converted into an FPRM AND-XOR decomposition by replacing all the OR gates with XOR gates after obtaining the disjoint cubes and then maintaining a fixed polarity for each variable. The input formats generated for Sum and Carry after FPRM transformation are given in Figures 9(a) and 9(b), respectively. Figure 9(c) shows the SRMDD encoding of both functions in a single .pla file.
Following the same procedure, we have applied the SRMDD to a number of benchmark circuits. Table 1 shows a comparative study of the best area results of our approach against the RM tree based and heuristic algorithm based areas reported in [10]. It has been observed that our GA based SRMDD formulation is quite comparable, because we obtained around 5.00% improvement with respect to the heuristic based approach. Except for three circuits (misex1, rd73, and squar5), all other benchmark circuits reached the optimal value with respect to the RM tree based approach. Columns 2, 3, and 4 show the number of inputs, outputs, and product terms present in the benchmark circuits. Column 5 shows our area optimal results, and columns 6 and 7 show the area reported in [10].
The trade-offs have been observed by varying the weights of area and power-density as shown in (17). Table 2 shows some example cases of area and power-density trade-offs for different weight values assigned to area ($w_1$) and power-density ($w_2$) in the range of 0 to 1. When ($w_1 = 1$, $w_2 = 0$), 100% weightage is given to area and the power-density constraint is ignored. Similarly, when ($w_1 = 0$, $w_2 = 1$), the circuit shows power-density aware optimization only. Column 16 shows the maximum CPU time required over all decompositions, expressed in milliseconds (ms). To outline the results, we have calculated the average percentage increase with respect to minimum area and minimum power-density.
The trade-off analysis between area and power-density is shown in Table 3. It is clear from Table 3 that the optimum area result is obtained at the weights ($w_1 = 1$, $w_2 = 0$), where the area is the minimum over all combinations but the power-density is increased by 20.13% from its minimum value. Similarly, the optimum power-density aware circuit is obtained at the weights ($w_1 = 0$, $w_2 = 1$), where the power-density is the minimum over all combinations but the area is increased by 79.39% on average. It is clear from the graph in Figure 10 that the best combined result with respect to area and power-density is obtained at the weights ($w_1 = 0.6$, $w_2 = 0.4$), where the average percentage increases in area and power-density are 9.17% and 13.84% with respect to the minimum area and power-density, respectively. In the next section, we discuss RTL (Register-Transfer Level) synthesis of the results obtained at the algorithmic level to find the absolute temperature of the optimized circuit, as given in [23,24].

Synthesis Using CADENCE RTL Compiler.
The optimum results obtained at the algorithmic level are translated into the Verilog hardware description language and fed as input into the CADENCE digital synthesis tool RTL Compiler (RC) for synthesis. This tool converts the RTL into a standard cell based gate level netlist. The generated netlist is used at the physical design level to generate the layout. The necessary inputs for synthesis are the RTL, the standard cell library, and the constraints. The timing constraint information is provided in SDC file format. To perform the synthesis, we use the command "synthesize -to_mapped -effort medium," which combines generic, mapped, and incremental synthesis with medium effort given to the synthesis process. The effort can be set to "low," "medium," or "high" depending upon the application area. By using the "report" commands, we can write out the results for area, power, and time utilization at the synthesis level. After completion of synthesis, we write the postsynthesis HDL (netlist file) and the generated constraints into a Verilog HDL file and an SDC file, respectively, for layout. Table 4 shows the postsynthesis analysis of the SRMDD AND-XOR circuit realization.
For analyzing the postsynthesis results, we have considered only the optimum output combinations with respect to area, power-density, and the combination of both, that is, the combinations ($w_1 = 1$, $w_2 = 0$), ($w_1 = 0$, $w_2 = 1$), and ($w_1 = 0.6$, $w_2 = 0.4$), respectively. The last two columns of the table report the maximum CPU computation time (in seconds) and the maximum delay (in nanoseconds). The CPU computation time is obtained with the "get_attribute runtime" command, and the maximum delay is obtained with the "report timing" command in RC. To sum up the result analysis, we have computed the average percentage increment with respect to the optimum area and optimum power-density and tabulated the results in Table 5. The results of Table 5 are plotted in Figure 11. From the figure, it can be seen that if the application is area specific, we can use the weights ($w_1 = 1$, $w_2 = 0$), which give the optimized area, but the power-density in that case is at its maximum, 10.11% greater than its minimum value. For a temperature specific application, we can use the weight combination ($w_1 = 0$, $w_2 = 1$). In this case, the power-density is at its minimum value, but the area is increased by about 30.18% compared with its minimum value. As observed from the graph, the best result considering both area and temperature (power-density) is obtained at the weight combination ($w_1 = 0.6$, $w_2 = 0.4$), for which area and power-density are 9.24% and 6.61% higher than their respective minimum values. So far, we have not discussed the absolute temperature estimation. This is discussed in the next section at the physical synthesis level.

Physical Design Using CADENCE Encounter at 45 nm
Technology. To obtain the actual silicon floor-plan area (in square micrometers) and absolute temperature (in degrees Celsius) of a logic circuit, we carry the design into the physical design domain using the CADENCE Encounter Digital Implementation (EDI) tool. The inputs required by EDI are the netlist, the SDC (Synopsys Design Constraints) file, and the physical details of the standard cells, which are provided in a Library Exchange Format (LEF) file. The design netlist created in the synthesis stage (as explained in Section 5.2) was imported and, after assigning the LEF file, we configured the design analysis table. We selected the "macro" and "tech" LEF files at 45 nm technology provided by CADENCE for our analysis. In the design analysis table, we set the maximum and minimum delay library files, the cap table, and the technology library file at 45 nm technology. Standard cells were placed in the Placement step of the design tool using the command "place standard cells." After setting all the criteria and placing the standard cells, we saved the floor-plan information, which acts as an input to the HotSpot tool for calculating the temperature profile. After the design is complete, several physical verification steps are run to rectify any errors introduced during design. The report generated on completion includes the standard cell area (in square micrometers), the power dissipated by the standard cells (in nanowatts), the computational time required, and so forth for a particular circuit. The power information acts as the second input file for the HotSpot tool. HotSpot takes the floor-plan and power profile information and produces the temperature profile in degrees Celsius.
We report the area in square micrometers (μm²) and the temperature in degrees Celsius (°C) in Table 6. We have considered only the optimum output combinations, that is, (0, 1), (0.6, 0.4), and (1, 0), as explained in Section 5.2. The column "std. cell area" gives the total silicon floor-plan area in square micrometers. The column "Max Temp" reports the maximum temperature generated by a logic design in degrees Celsius. The "Max Delay" column shows the maximum time delay of the circuit, in nanoseconds, among all combinations. The memory usage for implementing the circuit is also reported, in megabytes, in columns 10 and 15 for our approach and the Espresso driven circuits. Columns 9 and 14 report the maximum CPU computation time (in seconds) required for implementing the circuits using "encounter." We used three commands, "set initial time," "set stop time," and "set total execution time," to evaluate the CPU computation time. We have considered 29 benchmark circuits and compared our results with those driven by Espresso, which decomposes the circuits into AND-OR based logic.
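The hand-off from the layout report to HotSpot can be illustrated with a minimal sketch. It assumes the HotSpot floor-plan layout of one block per line (name, width, height, left x, bottom y, all in meters) and a power trace with a header row of block names followed by a row of power values in watts. The function name `hotspot_inputs`, the treatment of each block as a square, and the sample block are hypothetical and are not taken from the paper.

```python
import math

def hotspot_inputs(blocks):
    """Build HotSpot-style floor-plan and power-trace text.

    blocks: list of (name, area_um2, power_nw, left_x_um, bottom_y_um).
    Each block is ASSUMED to be square; a real flow would use the
    placed width and height from the layout tool instead.
    """
    flp_lines, names, powers_w = [], [], []
    for name, area_um2, power_nw, x_um, y_um in blocks:
        side_m = math.sqrt(area_um2) * 1e-6      # micrometers -> meters
        flp_lines.append(
            f"{name}\t{side_m:.6e}\t{side_m:.6e}"
            f"\t{x_um * 1e-6:.6e}\t{y_um * 1e-6:.6e}"
        )
        names.append(name)
        powers_w.append(power_nw * 1e-9)         # nanowatts -> watts
    flp_text = "\n".join(flp_lines) + "\n"
    ptrace_text = (
        "\t".join(names) + "\n"
        + "\t".join(f"{p:.6e}" for p in powers_w) + "\n"
    )
    return flp_text, ptrace_text

# Illustrative block: 100 um^2 of standard cells dissipating 2000 nW.
flp, ptrace = hotspot_inputs([("core", 100.0, 2000.0, 0.0, 0.0)])
print(flp, ptrace, sep="")
```

The two strings would be saved as the floor-plan and power-trace files passed to HotSpot; the unit conversions matter because the layout report uses micrometers and nanowatts while HotSpot expects meters and watts.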
As observed from Table 6 and the graph in Figure 12, the best result, at combination (0.6, 0.4), shows average savings of 9.12% in standard cell area and 14.86% in maximum temperature. The best area decomposition, at combination (1, 0), shows average savings of 15.23% in area and 12.93% in maximum temperature. The best temperature combination, (0, 1), shows average savings of 17.02% in maximum temperature but a 5.17% increase in standard cell area.
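The averages quoted above can be read as per-circuit percentage savings relative to the Espresso (AND-OR) baseline, averaged over the benchmark set. The sketch below assumes exactly that definition, which the text does not spell out; the function names and the sample values are illustrative and are not taken from Table 6.

```python
def percent_saving(baseline, proposed):
    """Saving of `proposed` relative to `baseline`, in percent.

    A negative value means the proposed circuit is worse (an increase).
    """
    return 100.0 * (baseline - proposed) / baseline

def average_saving(pairs):
    """Mean of per-circuit savings over (baseline, proposed) pairs."""
    savings = [percent_saving(b, p) for b, p in pairs]
    return sum(savings) / len(savings)

# Illustrative (baseline, proposed) area pairs for two circuits.
areas = [(100.0, 80.0), (50.0, 45.0)]
print(average_saving(areas))          # 15.0
print(percent_saving(100.0, 105.0))   # -5.0, that is, a 5% increase
```

Under this convention, the 5.17% area increase reported for combination (0, 1) corresponds to a negative average saving.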
These results show a trade-off between area and temperature: as we try to improve the area, the temperature degrades, and vice versa. We have also observed average savings of 5.46% in maximum delay and 0.86% in memory usage (in megabytes) with respect to AND-OR dominated circuits.
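The effect of the weight combinations (1, 0), (0.6, 0.4), and (0, 1) can be illustrated with a small sketch of a weighted cost over area and power-density. The exact cost function is defined in Section 5.2 and is not reproduced here; the normalization by the largest value in the candidate set, the function names, and the candidate values are assumptions made only for illustration.

```python
def weighted_cost(area, power_density, w_area, w_pd, area_ref, pd_ref):
    """Weighted sum of normalized area and power-density (assumed form)."""
    return w_area * area / area_ref + w_pd * power_density / pd_ref

def pick_best(candidates, w_area, w_pd):
    """Return the (area, power_density) candidate with the lowest cost."""
    area_ref = max(a for a, _ in candidates)
    pd_ref = max(p for _, p in candidates)
    return min(
        candidates,
        key=lambda c: weighted_cost(c[0], c[1], w_area, w_pd, area_ref, pd_ref),
    )

# Hypothetical decompositions of one circuit: (area, power-density).
candidates = [(100.0, 9.0), (120.0, 6.0), (150.0, 5.0)]
for weights in [(1.0, 0.0), (0.6, 0.4), (0.0, 1.0)]:
    print(weights, pick_best(candidates, *weights))
```

With these values, (1, 0) selects the smallest-area candidate, (0, 1) selects the lowest power-density candidate, and (0.6, 0.4) selects the intermediate one, which mirrors the trade-off described above.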

Conclusion
We have presented a thermal-aware, GA based heuristic approach for selecting the input polarity of variables in Shared Reed-Muller Decision Diagram based fixed polarity AND-XOR circuit decomposition. Efficient polarity selection of the input variables can reduce the total silicon area and temperature, as has been shown in this paper. The paper also shows a trade-off between the area utilized and the temperature of the circuit.
It also couples logic synthesis with physical design. A range of solutions is achieved by varying the weights of area and temperature at the logic synthesis level. The best results of the logic synthesis level were carried into the physical design level, where they yielded a considerable improvement over AND-OR based circuits. The floor-plan and power profile information of each circuit is fed into the temperature estimation tool HotSpot. In this work, we have reported 29 benchmark circuits. Among these, "x3.pla" has the largest input, with 135 variables and 99 output functions. We have designed the algorithm to work dynamically: after reading the benchmark input circuit, the program allocates the memory required for bit manipulation in the different stages of the optimization technique. To verify this, we ran the algorithm on the benchmarks "o64.pla," "apex5.pla," and "x3.pla," which may be considered large circuits in terms of input variables and output functions. The algorithm ran successfully and produced optimized Shared Reed-Muller Decision Diagram based AND-XOR outputs. One important point to note is that CPU runtime depends on the processor frequency and the memory available. The same design can have a different runtime on the same system if other work is running in parallel. According to our survey, thermal-aware consideration in fixed polarity selection for AND-XOR circuit synthesis has been done for the first time in this work. Future work involves the thermal-aware realization of other XOR based circuit forms, such as generalized Reed-Muller (GRM), mixed polarity Reed-Muller (MPRM), and pseudo Reed-Muller techniques, and we are currently working on them.

Figure 3: SRMDD expansion tree for Sum and Carry.

Figure 4: Structure of a chromosome.

Figure 10: Average percentage improvement with respect to min. area and min. power-density.

Figure 11: Average percentage improvement with respect to min. area and min. power-density (postsynthesis report).

Figure 12: Average percentage savings with respect to area and temperature of AND-OR decomposed circuits (postlayout report).

Table 2: SRMDD based area power-density trade-off analysis.

Table 6: Postlayout analysis of area (in μm²