The intrinsic variability of nanoscale VLSI technology must be taken into account when analyzing circuit designs to predict likely yield. Monte-Carlo- (MC-) and quasi-MC- (QMC-) based statistical techniques do this by analysing many randomised or quasirandomised copies of circuits. The randomisation must model forms of variability that occur in nano-CMOS technology, including “atomistic” effects without intradie correlation and effects with intradie correlation between neighbouring devices. A major problem is the computational cost of carrying out sufficient analyses to produce statistically reliable results. The use of principal components analysis, behavioural modeling, and an implementation of “Statistical Blockade” (SB) is shown to be capable of achieving significant reduction in the computational costs. A computation time reduction of 98.7% was achieved for a commonly used asynchronous circuit element. Replacing MC by QMC analysis can achieve further computation reduction, and this is illustrated for more complex circuits, with the results being compared with those of transistor-level simulations. The “yield prediction” analysis of SRAM arrays is taken as a case study, where the arrays contain up to 1536 transistors modelled using parameters appropriate to 35 nm technology. It is reported that savings of up to 99.85% in computation time were obtained.
Anticipating the impact of device variability on performance is a critical aspect of design procedures for integrated circuits. With nanoscale technology, “intradie” variability, causing behavioural variation from device to device within each die, is an important consideration [
Because of the high dimensionality of the parameter space, it is very difficult to derive analytical models of large-scale ICs for analysis. Sets of equations describing specific circuit performance parameters, like timing or yield, in terms of the huge number of parameters would be very difficult to derive analytically. Therefore, the use of conventional statistical methods, based on the analysis of such equations, has restricted applicability for the variability analysis of nanoscale ICs. With Monte Carlo (MC) analysis methods, the integrated circuits are simulated directly, and there is no need to derive differential equations that describe the dependency of the properties of interest on circuit parameters. The only requirement is that the circuit properties to be modeled are capable of being described by probability density functions (pdfs) that are dependent on the pdfs of circuit parameters and input variables. Monte Carlo analysis proceeds by generating a random sample of the value of each circuit parameter and each input variable, based on their known statistical properties. It then simulates the circuit thus obtained, using a finite difference analysis package, to compute a random sample of the circuit property of interest. The process is repeated many times to obtain a sequence of samples of the property of interest from which statistical models can be derived.
In nanoscale technology, “atomistic” variabilities are random and generally uncorrelated from device to device. According to Srivastava et al. [
MC analyses are particularly suitable for nanoscale IC statistical simulation to achieve statistical estimates of properties of interest. General conclusions drawn from the analysis of many randomised samples of a circuit can be made representative of the effects of variability. To quantify these conclusions, statistical averages are produced. The statistical reliability of the conclusions generally improves as the number of circuits analysed increases, though the rate of improvement can often be increased by carefully choosing the samples. MC techniques are especially useful for predicting the “yield” from many copies of the circuit when these are manufactured with typical accuracy and parameter variations.
A major problem is the computation required for carrying out sufficient analyses to produce statistically reliable averages. The introduction of intradie correlation into circuit models increases the computational complexity considerably. To simplify the computation, a form of Extreme Value Theory known as statistical blockade (SB) [
This paper contains seven sections. The first is an introduction, and Section
Monte Carlo methods use repeated pseudorandom sampling of the behaviour of mathematical equations, or real or simulated systems, to solve mathematical problems or to determine the properties of systems [
One of the most often quoted applications of Monte Carlo methods is the evaluation of multidimensional integrals [
Here,
Monte Carlo sampling avoids the inefficiency of the rectangular grids created by regular sampling by using a purely random set of
Figures
Convergence of MC and regular integration for (a) 3D integral, (b) 4D integral, and (c) 5D integral.
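The contrast between regular-grid and Monte Carlo integration can be sketched in a few lines. The integrand below (a sum of squares over the unit hypercube, whose exact integral is d/3) is an assumption chosen for illustration and is not the integrand used for the figures; the sample counts are likewise arbitrary.

```python
import random

def integrand(x):
    """Toy integrand (an assumption): exact integral over [0,1]^d is d/3."""
    return sum(xi * xi for xi in x)

def mc_integrate(f, dim, n_samples, rng):
    """Plain Monte Carlo estimate of the integral of f over the unit hypercube."""
    total = 0.0
    for _ in range(n_samples):
        total += f([rng.random() for _ in range(dim)])
    return total / n_samples

def grid_integrate(f, dim, points_per_axis):
    """Midpoint rule on a regular grid; cost grows as points_per_axis ** dim."""
    h = 1.0 / points_per_axis
    n_total = points_per_axis ** dim
    index = [0] * dim
    total = 0.0
    for _ in range(n_total):
        total += f([(i + 0.5) * h for i in index])
        # Advance the multi-index like an odometer.
        for axis in range(dim):
            index[axis] += 1
            if index[axis] < points_per_axis:
                break
            index[axis] = 0
    return total / n_total

rng = random.Random(1)
dim = 5
exact = dim / 3.0
mc_estimate = mc_integrate(integrand, dim, 20000, rng)
grid_estimate = grid_integrate(integrand, dim, 6)  # 6**5 = 7776 evaluations

print(f"exact={exact:.4f}  MC={mc_estimate:.4f}  grid={grid_estimate:.4f}")
```

The number of grid evaluations grows exponentially with the dimension, whereas the Monte Carlo sample count can be chosen independently of it.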
Much research has been devoted to finding ways of decreasing the Monte Carlo error even further to make the technique still more efficient. One approach has been to use variance reduction techniques [
The “Vegas” Monte Carlo approach of Lepage [
The ideas proposed for integration have inspired similar ideas for efficiency improvement when Monte Carlo techniques are used for simulation, and these prove to be especially valuable for VLSI circuit simulation where the dimensionality and complexity are very high.
Monte Carlo simulation is the application of Monte Carlo methods to study properties of systems having stochastic components. It uses repeated pseudorandom sampling of input variables to determine the behaviour of some physical system as characterized by a computer model. In this work, the physical system is an integrated circuit modelled by SPICE, the input variables are component values which are variable due to the uncontrollability of manufacturing effects referred to in the introduction, and the behaviour we are interested in may be viability, or otherwise, of the circuit. With repeated sampling used to simulate the fabrication of batches of nominally identical integrated circuits with the specified component variation, the estimation of the probability of a circuit being viable, that is, that it works, can be considered an estimate of the expected “yield,” that is, the percentage of working circuits within a batch. The criteria that determine viability are many, including correct logical operation, the power consumption, and the propagation delay in the whole circuit or parts of it.
As argued in [
The direct simulation approach samples observations of the random vector,
The integration formulation of simulation samples the “source of randomness”
Calculating the distribution, between limits, of
The direct approach is clearly applicable when actual component or parameter measurements are available, or when they have been synthesised as for the transistor set provided by RandomSPICE [
However, even when real sets of parameters are available, instead of using them directly, it may be advantageous to produce a model of them as the transformation of a smaller set of independent uniformly distributed random variables. Then the integration formulation may be adopted, based on models derived from real data. The models may be derived by employing Principal Components Analysis (PCA) to extract a smaller set of statistically independent parameters that may be transformed back to the complete set with little distortion. The dimensionality may thus be reduced, and each of the independent parameters may be modelled as the transformation of a uniformly distributed random variable. If the independent variables may be considered Gaussian, it is straightforward to model each of them as a transformed uniform random variable. The transformation to Gaussian is achieved by the Gaussian ICDF function. Transformations to independent random variables with distributions other than Gaussian may be achieved by different ICDFs. Multivariate versions of these functions allow correlation to be introduced. With the different distributions, essentially the same methodology with respect to statistical delay estimation as used with Gaussian, Pareto, and low-discrepancy sequences in this work can remain valid.
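The transformation of a uniformly distributed variable to a Gaussian one by the inverse cumulative distribution function can be illustrated as follows. This is a minimal sketch: the parameter (a threshold voltage with mean 0.30 V and standard deviation 0.02 V) and the clipping constant `EPS` are hypothetical, and the Python standard library `NormalDist` class stands in for whatever ICDF routine is used in practice.

```python
import random
from statistics import NormalDist

EPS = 1e-12  # assumed clipping constant: keeps u strictly inside (0, 1)

def uniform_to_gaussian(u, mean, sigma):
    """Map a uniform sample u in (0, 1) to a Gaussian sample via the ICDF."""
    u = min(max(u, EPS), 1.0 - EPS)
    return NormalDist(mean, sigma).inv_cdf(u)

rng = random.Random(7)
# Hypothetical device parameter: threshold voltage, mean 0.30 V, sigma 0.02 V.
samples = [uniform_to_gaussian(rng.random(), 0.30, 0.02) for _ in range(10000)]

sample_mean = sum(samples) / len(samples)
sample_sigma = (sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)) ** 0.5
print(f"mean={sample_mean:.4f} V  sigma={sample_sigma:.4f} V")
```

Replacing `rng.random()` with another source of uniform numbers changes the sampling scheme without altering the rest of the transformation.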
Apart from the likely reduction in dimensionality, this transformation of a direct simulation approach to an integration-like formulation allows a much larger set of randomised devices to be generated than are available in the original set. Hence more simulations may be run with different sets of parameters based on parameters obtained by measuring real devices.
Viewed in either formulation, Monte Carlo simulation samples the probability distribution of all the input variables and system parameters to produce many repeated versions of the system. These are in turn analysed to determine how certain key output measurements vary due to the input variability. A histogram of each key output measurement gives an estimate of its likely distribution, the estimate becoming more and more reliable as the number of simulations increases. Since the number of simulations must be restricted for practical reasons, the accuracy of these results is also limited by practicality.
Where valid assumptions can be made about the shape of the distribution, for example, that it is Gaussian or Chi-squared, a maximum likelihood fit to the histogram can be made on such assumptions. Such a fit would produce a value of mean and variance thus allowing a pdf as shown in Figure
Gaussian probability density function of a circuit propagation delay and delay threshold
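Under the Gaussian assumption, the probability of exceeding a delay threshold follows directly from the fitted mean and standard deviation. The sketch below uses hypothetical delay values (100 ps mean, 5 ps standard deviation) and places the threshold three standard deviations above the mean; it also shows how few tail samples a run of 1000 simulations would be expected to produce.

```python
from statistics import NormalDist

def failure_probability(mean_ps, sigma_ps, threshold_ps):
    """P(delay > threshold) for a Gaussian delay model."""
    return 1.0 - NormalDist(mean_ps, sigma_ps).cdf(threshold_ps)

# Hypothetical fitted values, for illustration only.
mean_ps, sigma_ps = 100.0, 5.0
threshold_ps = mean_ps + 3.0 * sigma_ps

p_fail = failure_probability(mean_ps, sigma_ps, threshold_ps)
predicted_yield = 1.0 - p_fail
expected_tail_hits = 1000 * p_fail  # tail samples expected in 1000 simulations

print(f"P(fail)={p_fail:.5f}  yield={predicted_yield:.5f}  "
      f"expected tail samples in 1000 runs={expected_tail_hits:.2f}")
```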
The accuracy of this estimation will depend on the number of simulations which must be limited. Unfortunately, the effect of the limitation will be most serious over the tails of the distribution, which is the part we are most interested in. For example, for a Gaussian pdf, the probability of being more than three standard deviations greater than the mean is
In general, Monte Carlo methods proceed as follows. The characteristics of the input vectors are determined. Random vectors are generated with appropriate distributions and intercorrelation either directly or by transforming independent uniformly distributed random vectors. A deterministic computation is performed to simulate the behaviour of the system for each of the randomized input vectors. For each key output measurement, its pdf is estimated in the best way possible, given the inevitable limitations in the amount of data available. Deductions about the probability of certain events are made from the estimated pdf.
The Monte Carlo simulation of an IC proceeds as follows.
(1) Find the statistical distribution for each parameter.
(2) Sample the statistical process to produce a value for each parameter.
(3) Parameterize one circuit and simulate it.
(4) Repeat for many copies of the circuit and obtain the statistical distribution of a specific measurement.
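The procedure above can be sketched with a toy model in place of a SPICE run. Everything numerical here is an assumption made for illustration: three gates in series, each with a Gaussian delay of 10 ps mean and 1 ps standard deviation, and a 33 ps delay threshold. The function `simulate_path_delay` is a hypothetical stand-in for the circuit simulator.

```python
import random

def simulate_path_delay(gate_delays_ps):
    """Stand-in for a circuit simulation: path delay as a sum of gate delays."""
    return sum(gate_delays_ps)

def monte_carlo_yield(n_circuits, threshold_ps, rng):
    """Randomise, simulate, repeat; return the yield estimate and all delays."""
    nominal_ps, sigma_ps, n_gates = 10.0, 1.0, 3  # hypothetical statistics
    delays = []
    for _ in range(n_circuits):
        # Sample one value per parameter, then "simulate" that circuit.
        gates = [rng.gauss(nominal_ps, sigma_ps) for _ in range(n_gates)]
        delays.append(simulate_path_delay(gates))
    passing = sum(1 for d in delays if d <= threshold_ps)
    return passing / n_circuits, delays

rng = random.Random(42)
yield_estimate, delays = monte_carlo_yield(5000, 33.0, rng)
print(f"estimated yield = {yield_estimate:.3f}")
```

For this toy model the path delay is Gaussian with mean 30 ps and standard deviation of about 1.73 ps, so the estimate can be checked against the analytical value of roughly 0.958.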
Correlation between device parameters may be due to the intradie proximity of components, or other causes. The device parameters may be direct physical quantities, for example, resistance or capacitance, or they may be coefficients of principal component vectors which are ultimately transformed back to physical quantities. Define a matrix
We need to generate, for each Monte Carlo randomised circuit
Then find matrix
Finally, generate a multivariate (4-element) correctly correlated random vector as follows:
Introducing intradie variability into the multiple parameters of transistor devices is possible using the same approach as used for component values. In such applications, it is useful to partition a large correlation matrix into a number of smaller ones, each catering for one type of parameter or principal component. This is possible when there can be assumed no correlation between the different types or principal components. This is guaranteed for principal components. Clearly, a different value of
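One common way of generating a correctly correlated random vector, assumed here for illustration, is to factorise the correlation matrix by Cholesky decomposition and multiply the factor by a vector of independent standard Gaussian samples. The device positions, the correlation length, and the exponential form exp(-d/L) used to fill the matrix are hypothetical choices for this sketch.

```python
import math
import random

def exponential_correlation(positions, corr_length):
    """Correlation matrix with entries exp(-distance / corr_length)."""
    return [[math.exp(-abs(p - q) / corr_length) for q in positions]
            for p in positions]

def cholesky(a):
    """Lower-triangular factor low such that low @ low^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def correlated_sample(low, rng):
    """Transform independent N(0,1) samples into a correlated vector."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]

positions = [0.0, 1.0, 2.0, 3.0]   # hypothetical positions of 4 devices
corr = exponential_correlation(positions, corr_length=2.0)
low = cholesky(corr)

rng = random.Random(5)
vectors = [correlated_sample(low, rng) for _ in range(20000)]

# Sample correlation between devices 0 and 1, for comparison with corr[0][1].
a = [v[0] for v in vectors]
b = [v[1] for v in vectors]
ma, mb = sum(a) / len(a), sum(b) / len(b)
cov_ab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
var_a = sum((x - ma) ** 2 for x in a)
var_b = sum((y - mb) ** 2 for y in b)
sample_corr = cov_ab / math.sqrt(var_a * var_b)
print(f"target={corr[0][1]:.3f}  sample={sample_corr:.3f}")
```

Scaling each element by the standard deviation of the corresponding parameter and adding its mean gives the randomised parameter values.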
The example below demonstrates the effect of correlation on statistical performance analysis of a binary full adder with behavioural models of NAND gates. Consider the binary full adder shown in Figure
Binary full adder (BFA) circuit.
On-chip layout assumed for BFA circuit.
With one delay parameter for each behavioural NAND gate [
Knowing the standard deviations of the parameters allows
Histogram of delay times for 500 BFA circuits (
Consider
Repeating the same procedure with
Histogram of delay times for 500 BFA circuits (
The assertion that the error resulting from MC analyses is inversely proportional only to the square root of the sample size does not mean that the same sample size is appropriate to any circuit no matter how complicated it is. There are clear advantages in reducing the dimensionality of analysis problems including the simplification of the computational complexity of the analyses. This section deals with two methods of reducing the dimensionality of Monte Carlo analysis. The first is Principal Components Analysis (PCA) which transforms the random variables required to characterize a circuit to a reduced number of statistically independent variables. PCA is also useful as a means of introducing intradie and interdie correlations. The second is the use of statistical behavioural circuit blocks (SBCB) which substitute functional but computationally simpler circuit models for device-level analogue subcircuits. The aim of this research is to apply them in new ways which may be suitable for inclusion in the ongoing NGSPICE open-source project [
PCA is a technique for transforming samples of
Taking the summed squared differences between the elements of an original component (“feature”) vector and a reconstructed one as a measure of the error or loss of information incurred by PCA, of all possible linear transformations to a lower dimensional space, PCA is optimal in minimising this error over all vectors. Further, the PC vectors are conveniently ordered in the sense that the first one has the highest variance and accounts for as much variability as possible. Each succeeding PC vector has lower variance, and therefore less importance, but has the highest variance possible while being uncorrelated to all the previous ones.
PCA may be carried out by eigenvalue/eigenvector decomposition of the
The previous outline of PCA hides the obvious difficulty of deciding what error is incurred by removing components with nonzero eigenvalues which are considered small and how small an eigenvalue must be to be considered negligible. Such considerations of PCA are application specific and best related to the specific objective and how the error is to be quantified.
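The question of how much information the leading components retain can be explored numerically. The sketch below is a minimal illustration, not the procedure used in this work: it builds a synthetic data set of three parameters driven by one hidden factor (an assumption), extracts the first principal component of the covariance matrix by power iteration (swapped in for a full eigendecomposition), and reports the fraction of total variance that this component explains.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance of the columns, plus the mean-subtracted data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centred

def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))  # Rayleigh quotient
    return eigenvalue, v

rng = random.Random(3)
rows = []
for _ in range(2000):
    z = rng.gauss(0.0, 1.0)  # hidden common factor (synthetic data)
    rows.append([2.0 * z + rng.gauss(0.0, 0.1),
                 1.0 * z + rng.gauss(0.0, 0.1),
                 -1.0 * z + rng.gauss(0.0, 0.1)])

cov, centred = covariance_matrix(rows)
eigenvalue, pc1 = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance

# PC coefficient of each sample: projection of the centred data onto pc1.
coefficients = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centred]
print(f"first PC explains {100 * explained:.1f}% of the variance")
```

The unexplained fraction, one minus `explained`, is the mean square reconstruction error relative to the total variance when only the first component is kept.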
Assume that we have a database of sets of randomised device parameters which have been published by a manufacturer or generated on the basis of theoretical modelling. Taking the RandomSPICE [
Values of first ten ordered eigenvalues for Toshiba NMOS data.
Mean square difference between original (mean-subtracted) data and PCA approximated data as the number of eigenvectors increases.
PCA clearly has value in reducing the dimensionality of the randomisation required for MC analysis. It is now possible to randomize the principal component coefficient vectors rather than the complete list of device parameters, and then transform these back to a set of parameters for each device. An independent randomisation may clearly be performed for each device, where the effects of interdie or intradie correlations are not required to be modeled.
However, the modelling of intradie variability is afforded in a convenient way by the use of PC coefficient vectors, and the correlation introduced by proximity on the die can be conveniently applied to these. Interdie variability can also be introduced in this way. The approach is to determine a set of PCs for each device model and then to introduce correlation into the corresponding PC coefficient vectors for each device within a circuit or subcircuit, according to the exponential model outlined previously. The correlation matrix should be partitioned into a number of smaller ones, each catering for one principal component which will be independent of all the others. If the device model has
A statistical behavioural circuit block (SBCB) is a behavioural model of a device such as a transistor, or a circuit building block such as a gate or an adder. Such a block may be based on a combination of a look-up table and a simplified passive circuit. Its purpose is to model the most important aspects of the device’s or circuit building block’s behaviour, to an acceptable accuracy, with a relatively small number of parameters. An SBCB may be defined as a dependent voltage or current source combined with a linear time-invariant “tau” circuit. Such a combination can be optimised to match a required delay and switching waveshape, and this approach has been found to eliminate difficulties with the time-step adjustment algorithm that are sometimes encountered when using MC analysis with versions of SPICE.
SPICE provides versatile dependent voltage and current sources [
Netlist for 2-input NAND gate using e-element.
e | |
---|---|
+0.0 | 5.0 v |
+0.5 | 4.8 v |
+1.0 | 4.5 v |
+4.0 | 0.5 v |
+5.0 | 0.0 v |
Response of 2-input NAND as defined in Table
The use of such ideal voltage-dependent elements provides a good way to build up behavioural models suitable for augmenting with delay modelling for statistical timing analysis. It is also useful to employ voltage-controlled resistors to implement a switch-level MOSFET.
The tau model of a transistor has long been used as a simple behavioural model in many transistor optimisation tools for designing integrated circuits, such as TILOS [
Tau model for CMOS “pull down” sub-circuit.
The following expressions are obtained for
Look-up tables, as exemplified in Table
If the output of a tau model is applied to the look-up table-defined e-element in Table
A curve fitting procedure can optimize the tau-model elements and the look-up table for the true waveshape produced by the device for which a behavioural model is required.
This “tau and delay” SBCB model may be used for statistical static timing analysis. In some ways, it is similar to the “composite current source” (CCS) modelling technique produced by Synopsys [
The “tau and delay” modelling approach proposed previously uses voltage-controlled voltage sources (VCVS) rather than voltage- or time-dependent current sources. The choice was made arbitrarily since we did not utilise any existing libraries, though the dominance of interconnection delay in submicron technologies offers some justification for our approach. It is argued [
Our SBCB modelling approach was used to produce a single model of each cell type. Each SBCB is characterised by parameters which may be randomised to simulate the effect of statistical circuit variation. The statistical characteristics of the randomisation (distribution, mean, and standard deviation) were derived from transistor-level simulations of true devices. In principle, a different look-up table is needed for each value of delay, but the required adjustments were found to be very small.
SPICE offers a large number of different ways of defining and randomising behavioural models, and we have suggested yet another alternative. The most sophisticated version of SPICE, HSPICE, does not often specify how certain features are implemented, and the open-sourced NGSPICE, though based on the same long-established analysis engine, does not have many of these features. All dependent sources in HSPICE have “ideal” delay as an option, which is trivial to achieve in digital circuits but unachievable exactly in analogue circuits. Modelling ideal delay in analogue circuits can cause difficulties with MC analysis since randomising such delays can cause SPICE to make increasingly slow progress and eventually to “hang.” Investigations revealed that, when this happened, it was not the models, but the step-size selection that was to blame. If ideal delays are specified with high numerical precision, the relative timing of events on a single chip can appear to vary almost continuously. Coincident switching events when randomised may become different but very close together. The time-step adaptation algorithm will try to model the very small timing differences and thus generate exceedingly small time steps. Quantizing the Monte Carlo variation eliminates this problem which means that the randomisation should ideally be done with reference to the anticipated time-step size. However, the effect on the results of the statistical analysis must then be investigated.
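Quantizing the Monte Carlo variation can be illustrated as rounding each randomised delay to a multiple of the anticipated time step. This is a sketch under assumed values (a 10 ps nominal delay, 0.5 ps standard deviation, and a 0.01 ps step); the function name `quantise_delay` is hypothetical and no particular SPICE option is implied.

```python
import random

def quantise_delay(delay_ps, step_ps):
    """Round a randomised delay to the nearest multiple of the time step."""
    return round(delay_ps / step_ps) * step_ps

step_ps = 0.01  # assumed anticipated simulator time step

# Two switching events that randomisation has made almost, but not exactly,
# coincident collapse onto the same instant after quantisation.
first = quantise_delay(10.0003, step_ps)
second = quantise_delay(10.0004, step_ps)

rng = random.Random(9)
raw = [rng.gauss(10.0, 0.5) for _ in range(1000)]
quantised = [quantise_delay(d, step_ps) for d in raw]
worst_shift = max(abs(q - d) for q, d in zip(quantised, raw))
print(f"first={first:.4f}  second={second:.4f}  worst shift={worst_shift:.4f} ps")
```

The largest shift introduced is half a step, which gives a simple bound on the perturbation whose effect on the statistics must then be checked.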
The modified tau-modelling approach has been adopted for the statistical behavioural circuit blocks to be described in the next subsection. Matching MC randomisation of behavioural model parameters to the step-adaptation algorithm of SPICE is a matter deserving further investigation as there may be great economies and insights to be gained.
SBCBs are used to replace true or accurately modelled subblocks to reduce the dimensionality and therefore the complexity of the analysis. By analysing a representative and random sample of fabricated gates, a delay distribution may be obtained whose statistical properties (pdf, mean, variance, etc.) can be used to define the SBCB.
Extreme value theory (EVT) [
The idea of SB is to try to concentrate on parameter vectors that are likely to generate the “rare events” of failing circuits and block out or disregard the ones that are unlikely to produce such failing circuits. Many input vectors are generated, but only the ones likely to produce “rare events” are simulated. This partial sampling of the performance distributions is the basis of EVT. The computational complexity involved in introducing the bias and compensating for it is much less expensive than performing many uninteresting circuit simulations. The “blockade filter” is a standard classifier used in machine learning and data mining. It is trained by simulating a relatively small “training set” of randomized circuits and is further refined as more and more simulations are carried out. Statistical blockade with this recursive updating is intended to make estimation of rare event statistics computationally feasible.
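The blocking idea can be sketched with a toy simulator and a linear estimator used as the classifier. All specifics below are assumptions for illustration: the function `simulate_delay` stands in for a SPICE run, the three parameters are standard Gaussian, the tail is taken to start two standard deviations above the training mean, and the least-squares fit is obtained from the normal equations, which gives the same coefficients as a pseudoinverse when the training matrix has full column rank.

```python
import math
import random

def simulate_delay(x):
    """Hypothetical stand-in for a SPICE run: a mostly linear delay in ps."""
    return 50.0 + 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[2] + 0.05 * x[0] * x[1]

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear_estimator(params, delays):
    """Least-squares coefficients [intercept, w0, w1, ...]."""
    rows = [[1.0] + list(p) for p in params]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, delays)) for i in range(k)]
    return solve_linear(ata, atb)

def estimate(coeffs, x):
    """Cheap prediction of the delay from the parameter vector."""
    return coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))

def random_params(rng, dim=3):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(11)

# Training: simulate a small set of circuits and fit the estimator.
train_x = [random_params(rng) for _ in range(200)]
train_y = [simulate_delay(x) for x in train_x]
coeffs = fit_linear_estimator(train_x, train_y)
mean_tr = sum(train_y) / len(train_y)
sigma_tr = math.sqrt(sum((y - mean_tr) ** 2 for y in train_y) / (len(train_y) - 1))
tail_start = mean_tr + 2.0 * sigma_tr

# Blocking: generate many candidates, simulate only the predicted tail.
candidates = [random_params(rng) for _ in range(5000)]
unblocked = [x for x in candidates if estimate(coeffs, x) > tail_start]
tail_delays = [simulate_delay(x) for x in unblocked]
fraction_simulated = len(unblocked) / len(candidates)
print(f"simulated {len(unblocked)} of {len(candidates)} circuits "
      f"({100 * fraction_simulated:.1f}%)")
```

Only the unblocked circuits incur the cost of a full simulation, which is where the time saving arises.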
The implementation of SB is initiated by a “seed” netlist which specifies the basic circuit with its SBCB blocks. This netlist also specifies which parameters are to be randomized and the statistics (mean, standard deviation, etc.) of the required randomisation. The SB procedure itself can be divided into four parts.
(1) The training of an estimator for predicting circuit performance with minimal computation. This requires a sufficiently large training set of randomized circuits to be generated and analysed by SPICE. In our work, the coefficients of a linear estimator are calculated using the “pseudoinverse” approach, and there must be many more randomized circuits than parameters. The coefficients are computed to make the estimator minimise the sum of the squared differences between the estimated circuit output measurements and the “true” circuit output measurement, as obtained from SPICE, over the whole set of training circuits.
(2) The generation of a much larger set of randomized versions of the circuit, and the use of a classifier to “block” the versions that are not likely to be within the tail. The classifier consists of the “linear estimator” followed by a “threshold” comparison with a “start of tail” parameter. Only the circuit copies which are estimated to fall within the tail will be unblocked and submitted for analysis by simulation.
(3) The “recursive” refinement of the linear estimator as more and more simulations are carried out. When a sufficient number of nonblocked “tail” circuits have been analysed, a second estimator is calculated using the “pseudoinverse” technique. The second estimator is more accurate than the original estimator for the tail and may be used for more accurate blocking. The use of recursion can move the defined “start of tail” parameter further away from the mean: typically from 2 to 3 or 4 standard deviations. Through recursion, we can thus get more accuracy in more extreme parts of the tail.
(4) The fitting of a Pareto Distribution (PD) to the measurements obtained from the nonblocked (“statistical tail”) versions of the circuit. This is necessary because, with SB, the non-tail circuits are blocked (not analysed) so we can no longer use Gaussian statistics. Also, very few measurements will occur in the “far tail” even when large numbers of circuits are generated. The use of PD fitting to the rarely occurring “tail circuits” allows the prediction of likely yield without the very large number of circuit simulations that would be required with traditional MC analysis.
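A tail fit of this kind can be sketched as follows, assuming the generalised Pareto form commonly used in extreme value theory and a simple method-of-moments fit swapped in for whatever fitting routine is actually employed. The exceedances (amounts by which tail delays exceed the start of the tail) are synthetic, drawn here from an exponential distribution with a 0.5 ps mean, and the 2% tail probability is a hypothetical value.

```python
import math
import random

def fit_gpd_moments(exceedances):
    """Method-of-moments estimates of generalised Pareto shape and scale."""
    n = len(exceedances)
    mean = sum(exceedances) / n
    var = sum((y - mean) ** 2 for y in exceedances) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale

def gpd_survival(y, shape, scale):
    """P(exceedance > y) under the fitted generalised Pareto distribution."""
    if abs(shape) < 1e-9:
        return math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    return base ** (-1.0 / shape) if base > 0.0 else 0.0

rng = random.Random(13)
# Synthetic exceedances over the start of the tail, in ps (illustration only).
exceedances = [rng.expovariate(1.0 / 0.5) for _ in range(20000)]
shape, scale = fit_gpd_moments(exceedances)

# Failure probability at a threshold 1 ps beyond the start of the tail,
# given an assumed probability of 0.02 of reaching the tail at all.
p_tail = 0.02
p_fail = p_tail * gpd_survival(1.0, shape, scale)
print(f"shape={shape:.3f}  scale={scale:.3f}  P(fail)={p_fail:.5f}")
```

An exponential tail corresponds to a shape of zero, so the fitted shape should be close to zero for this synthetic data.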
To illustrate the computation time savings that may be achieved when synchronous and asynchronous circuits employing SBCB blocks are statistically analysed by MC techniques with SB, a frequently used handshaking component in the asynchronous control circuits produced by the BALSA design package [
The accuracy of the linear estimator obtained with the start of tail defined
Accuracy of linear estimator. Tail defined to start at
To obtain accurate predictions of behaviour further from the mean, recursion was employed to refine the accuracy of the original estimator using the results of nonblocked simulations. Figure
Refining linear estimator by recursion. Panels: original; refined.
4-Phase 3-stage Bundled Data Muller pipeline “ring.”
The estimated failure probability distributions shown in Figures
Muller C-element: (a) gate level, (b) transistor level.
Failure probability for a “C-element” realisation from 500 versions: (a) without SB, (b) with SB (
Panels: without SB; with SB; comparison of (a) and (b); more accurate comparison.
Error analysis of linear estimators in RandomLA training phase: (a) MC simulation and (b) QMC simulation.
It may be seen that the maximum difference in yield threshold delay between the two graphs for any yield failure probability is about 0.06 ps, which is about 0.1 standard deviations. A more useful measure of difference is the maximum difference in yield failure probability. This cannot accurately be deduced from the graphs, but a resampling of the data plotted in one of the two graphs (since the sampling instants are different) revealed that this maximum difference occurred at a yield threshold of 31.23 ps and is equal to a probability difference of 0.002. This represents a discrepancy of approximately 14.2% from the probability 0.0129 predicted by the non-SB Monte Carlo simulation being used as a reference. Thus the maximum discrepancy in the yield failure probability is 14.2% which occurs when the yield threshold delay is 0.1 standard deviations.
Since a possible source of this discrepancy is the quality and suitability of the Pareto fit, some investigations were carried out. It was observed that one source of the discrepancy was the difference in mean and standard deviation of the Gaussian fits to the delay measurements produced on one hand by the non-SB MC simulations (“meanNB” and “sigmaNB”) and on the other hand by the training procedure (“meanTR” and “sigmaTR”). These are used to determine the “start of tail” parameter at two standard deviations from the mean. The more accurate estimates “meanNB” and “sigmaNB” are available for producing the comparisons since a computationally expensive non-SB simulation will have been carried out for test purposes. But in reality, only the less accurate “meanTR” and “sigmaTR” estimates (based on far fewer randomised circuits) will be available to the SB version and were therefore used in the comparison.
As a test, “meanTR” and “sigmaTR” were replaced by “meanNB” and “sigmaNB,” thus eliminating two sources of discrepancy and allowing the suitability of the Pareto fit to be seen more clearly in Figure
With a more accurate estimator, the “start of tail” parameter may then be redefined as two or even three standard deviations from the mean to obtain even greater time saving since even fewer circuits need to be analysed. This increases the possibility of finding measurements yet further from the mean, that is, “rarer events,” in reasonable computational time and allows a yet more accurate estimation of the statistics of the “far tail.”
Table
Time saving illustrated by comparing simulations with SB to simulations without SB.
Circuit | Binary full adder (9 parameters) | | C-element (12 parameters) | | Muller pipeline ring (21 parameters) | |
---|---|---|---|---|---|---|
Start of tail | 1.5 | 2 | 1.5 | 2 | 1.5 | 2 |
1000 circuits without SB | 215.99 s | 221.34 s | 250.05 s | 288.51 s | 949.59 s | 1003.9 s |
1000 circuits with SB | 6.75 s | 3.96 s | 7.63 s | 4.24 s | 27.17 s | 13.15 s |
Time saving | 96.9% | 98.2% | 96.94% | 98.5% | 97.1% | 98.7% |
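The time savings in the table follow from the run times as the relative reduction (t_without − t_with)/t_without. A short check using the tabulated run times is given below; the function name and the dictionary labels are introduced here for illustration only.

```python
def time_saving_percent(t_without_sb, t_with_sb):
    """Relative reduction in run time, as a percentage."""
    return 100.0 * (t_without_sb - t_with_sb) / t_without_sb

# (seconds without SB, seconds with SB) for 1000 circuits, from the table.
runs = {
    "Binary full adder, start of tail 1.5": (215.99, 6.75),
    "Binary full adder, start of tail 2": (221.34, 3.96),
    "C-element, start of tail 1.5": (250.05, 7.63),
    "C-element, start of tail 2": (288.51, 4.24),
    "Muller pipeline ring, start of tail 1.5": (949.59, 27.17),
    "Muller pipeline ring, start of tail 2": (1003.9, 13.15),
}

savings = {name: time_saving_percent(*times) for name, times in runs.items()}
for name, saving in savings.items():
    print(f"{name}: {saving:.2f}%")
```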
This section investigates the use of “low-discrepancy” sampling to achieve further efficiency improvements, over what was achieved in earlier sections, with Monte Carlo circuit simulation. Low-discrepancy sampling is the basis of “quasi-Monte Carlo” (QMC) techniques as often applied to multidimensional integration; therefore, this approach to circuit simulation may be referred to as “quasi-Monte Carlo” simulation. QMC methods are modified Monte Carlo methods where the input vectors are not totally random but are to a degree deterministic in that they conform to “low-discrepancy sequences” [
As mentioned in Section
Uniformly distributed numbers in the interval (0, 1) can be generated as pseudorandom numbers or quasirandom numbers, and the variables for all other distributions may be derived from these by means of the appropriate cumulative distribution function inversion. In practice the range must be restricted from (0, 1) to (
The MATLAB functions “haltonset” and “sobolset” are provided for constructing initial sequences of
As suggested in this work, the idea is to use a low-discrepancy sequence generator to replace the uniform random number generator as the source of randomisation in both the training and the recursive SB phases of RandomLA, the developed MATLAB software in the research. The choice of LDS will be “Sobol.” First, we present an example that compares the effect of using QMC rather than MC for training the linear estimator. Then we investigate the effectiveness of QMC for MC simulation with and without Statistical Blockade.
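The replacement of a uniform random number generator by a low-discrepancy generator can be illustrated with a Halton sequence, used here instead of Sobol only because it needs no table of direction numbers; this sketch is not the RandomLA implementation. Each coordinate is the radical inverse of the sample index in a different prime base, and the resulting points in the unit square are then mapped to Gaussian variables through the inverse cumulative distribution function.

```python
from statistics import NormalDist

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_point(index, bases):
    """One point of the Halton sequence; use index >= 1 to avoid the origin."""
    return [radical_inverse(index, b) for b in bases]

bases = (2, 3)  # one prime base per dimension
points = [halton_point(i, bases) for i in range(1, 9)]

# Map each quasirandom uniform coordinate to a standard Gaussian variable.
standard_normal = NormalDist()
gaussian_points = [[standard_normal.inv_cdf(u) for u in p] for p in points]

for p, g in zip(points, gaussian_points):
    print([round(u, 4) for u in p], [round(v, 3) for v in g])
```

Starting the index at 1 keeps every coordinate strictly between 0 and 1, so the inverse cumulative distribution function is always finite.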
To provide comparison for training, an SRAM
As shown in Figure
Figure
(a) MC-SB compared to MC-non-SB for BFA (3000 circuits), (b) QMC-SB compared to MC-non-SB for BFA (3000 circuits).
Both MC and QMC Statistical Blockades were applied using 300 training circuits in both cases. The estimator order, as always, was equal to the number of parameters, that is, 72 in this case. The analysis time for recursive SB with MC and QMC was 146.6 s and 120.6 s, respectively, achieving close to 99% savings in each case with statistical variation from run to run, depending on how many circuits are blocked; it is not uncommon for QMC-SB to take longer than MC-SB when the same number of circuits is specified. Where the criterion is accuracy and reliability, QMC reduces the required sample size. For a given sample size, the advantages of QMC with SB over “non-SB” are not as striking as those of MC-SB over MC without SB. More analysis is needed on this matter. As in Section
The
SRAM
There are 48 transistors within the circuit. Transistor-level simulations of the array were carried out to estimate the yield failure probability for different values of yield threshold when statistical variability is included in the model for each transistor. This was obtained by connecting the eight outputs from the array, all initialised to zero, to a behaviourally modelled NAND gate whose output switching delay was compared to the threshold. The statistics for the parameter variation were derived from analyses of the 35 nm transistor model data set provided by RandomSPICE. Two extreme cases were considered: firstly where there is assumed to be strong intradie cell-to-cell correlation between randomised device parameters of a particular type and secondly where there is no intradie correlation between devices from cell to cell. A graph obtained for yield failure probability against allowable delay threshold for the strong correlation case is shown in Figure
Yield obtained from 3000 transistor-level simulations of SRAM
As estimated by the MC runs with 3000 randomised circuits, the mean delay was found to be 16.7 ps from the input pulse edge occurring 20 ps after a timing reference. The standard deviation was found to be 0.376 ps. The graph, in Figure
The results obtained from the noncorrelation case are presented in Figure
Yield obtained from 3000 transistor-level simulations of SRAM
Thirty-two copies of Figure
SRAM
Figures
Run times and time savings for MC/QMC simulations of the SRAM32 × 8 array (strongly correlated).
| Model | Method | Non-SB CPU time/s | SB CPU time/s | Time saving |
|---|---|---|---|---|
| Transistor model: ngSRAM32 × 8.seed | MC | 47263.96 | 1481.69 | 96.87% |
| Transistor model: ngSRAM32 × 8.seed | QMC | 39373.99 | 1183.59 | 96.99% |
| SBCB model: ngswSRAM32 × 8.seed | MC | 2376.14 | 63.70 | 97.32% |
| SBCB model: ngswSRAM32 × 8.seed | QMC | 2148.03 | 69.84 | 96.75% |
Overall time saving = (47263. |
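The time savings in the table above can be reproduced from the CPU times as 1 − t_SB / t_non-SB. The short sketch below does this; the dictionary layout and function name are illustrative. The final figure compares the transistor-level MC run without SB against the behavioural (SBCB) QMC run with SB. That this is the comparison behind the overall saving is an assumption, since the formula in the table note is cut off, although the result agrees with the 99.85% quoted in the abstract.

```python
# CPU times in seconds, copied from the table above.
CPU_TIME = {
    ("transistor", "MC"): {"non_sb": 47263.96, "sb": 1481.69},
    ("transistor", "QMC"): {"non_sb": 39373.99, "sb": 1183.59},
    ("SBCB", "MC"): {"non_sb": 2376.14, "sb": 63.70},
    ("SBCB", "QMC"): {"non_sb": 2148.03, "sb": 69.84},
}


def saving_percent(baseline, reduced):
    """Percentage of the baseline run time that is saved."""
    return 100.0 * (1.0 - reduced / baseline)


# Saving of SB over non-SB within each model/method pair.
savings = {key: saving_percent(t["non_sb"], t["sb"])
           for key, t in CPU_TIME.items()}

# Assumed overall comparison: slowest configuration against the
# behavioural QMC run with SB.
overall = saving_percent(CPU_TIME[("transistor", "MC")]["non_sb"],
                         CPU_TIME[("SBCB", "QMC")]["sb"])

for key, value in savings.items():
    print(key, f"{value:.2f}%")
print("overall (assumed comparison)", f"{overall:.2f}%")
```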
Yield obtained from 3000 behavioural-level simulations of SRAM
(a) MC without SB, (b) QMC without SB, (c) MC with SB, (d) QMC with SB.
Comparison of yield analysis results of SB and non-SB for behavioural-level SRAM
(a) MC, (b) QMC.
Analysis of the results revealed the following. The SB version with Pareto fitting is reasonably accurate and much faster than the “nonblockade” version. For the SB simulations with strong correlation, the number of wrong decisions not to block “ For the noncorrelation examples, the estimator was much less accurate for both MC and QMC training. This caused many more wrong decisions not to block. The results of these wrong decisions are discarded for the Pareto tail fitting procedure, with some loss of efficiency. The behaviour of the linear estimator when adapting to the maximum delay criterion explains this loss of efficiency. Using QMC with “Sobol” vectors makes the nonblockade analysis more efficient than with MC, in that a given accuracy is achieved with fewer runs. QMC with SB compared with QMC alone offers further savings, but these remain to be fully analysed. When comparing overall time savings for MC with SB and QMC with SB, both the training and the analysis times must be taken into account. Fewer training circuits were required for QMC simulations than for MC to reach a given estimator accuracy.
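The blocking decisions discussed above can be sketched as follows. This is a toy illustration under stated assumptions, not the RandomLA code: `toy_delay` is a hypothetical stand-in for a circuit simulation with three parameters instead of 72, its coefficients are invented, and the `margin` that relaxes the blocking threshold is an assumption not taken from the text. A linear estimator is trained by least squares, circuits predicted to fall below the threshold are blocked without simulation, and unblocked circuits whose simulated delay turns out not to lie in the tail are counted as wrong decisions not to block and excluded from the tail sample.

```python
import random
from statistics import mean, stdev


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(xs, ys):
    """Least-squares fit of y ~ w0 + w . x via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(w, x):
    """Evaluate the linear estimator for one parameter vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def toy_delay(x):
    """Hypothetical delay model in ps; stands in for a circuit simulation."""
    return 16.7 + 0.25 * x[0] + 0.2 * x[1] + 0.15 * x[2] + 0.02 * x[0] * x[1]


def run_blockade(n_train=300, n_total=3000, tail_sigmas=2.0, margin=0.1, seed=1):
    rng = random.Random(seed)
    draw = lambda: [rng.gauss(0.0, 1.0) for _ in range(3)]
    # Training phase: simulate every training circuit and fit the estimator.
    train_x = [draw() for _ in range(n_train)]
    train_y = [toy_delay(x) for x in train_x]
    w = fit_linear(train_x, train_y)
    tail_t = mean(train_y) + tail_sigmas * stdev(train_y)
    simulated, wrong_unblocked, tail = 0, 0, []
    for _ in range(n_total):
        x = draw()
        if predict(w, x) < tail_t - margin:
            continue  # blocked: the circuit is never simulated
        simulated += 1
        delay = toy_delay(x)
        if delay >= tail_t:
            tail.append(delay)      # genuine tail sample
        else:
            wrong_unblocked += 1    # simulated, then discarded
    return {"simulated": simulated, "wrong_unblocked": wrong_unblocked,
            "tail": tail, "tail_threshold": tail_t}


result = run_blockade()
print(result["simulated"], result["wrong_unblocked"], len(result["tail"]))
```

A less accurate estimator, as reported for the noncorrelation examples, would show up here as a larger `wrong_unblocked` count for the same number of tail samples.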
If the delay distribution is Gaussian and the tail is assumed to start at two standard deviations from the mean, the percentage of unblocked circuits may be expected to be about 2.1%. Therefore, out of 3000 randomly generated circuits, about 63 unblocked circuits should be observed. Out of the first 500 circuits, about ten unblocked circuits should occur, and this observation suggests a simple adaptation mechanism for countering inaccuracy in the estimator. After a certain number of random circuits, say 500, have been generated, if the number of unblocked circuits is significantly different from what is expected, say 10, the tail threshold can be decreased or increased accordingly. The decision can be revisited later in the run, say after 1000 circuits, 2000 circuits, and so on. This adaptation was found to be useful in the noncorrelated examples presented in this paper, where the accuracy of the estimator was found to be lower than for the strongly correlated examples. Decreasing the threshold does not greatly affect the computation run time if the intention is to base the tail estimation on a specific number of circuits, say 2.1% of the total. This approach appears even more advantageous when higher deviations from the mean are to be examined, say three or more standard deviations. Instead of specifying a fixed number of randomised circuits, the simulations could be allowed to continue until a suitable number of unblocked circuits have been produced to allow a reliable estimation of the tail distribution.
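The adaptation mechanism described above can be sketched as a simple checkpoint rule. The function names, the step size, and the use of a two-standard-deviation binomial band to decide what counts as “significantly different” are assumptions made for illustration; the text does not specify them. Note also that the exact one-sided Gaussian tail beyond two standard deviations is about 2.28%, so the 2.1% used in the text is passed in as a parameter rather than derived.

```python
import math
from statistics import NormalDist


def tail_fraction(n_sigma):
    """Exact one-sided Gaussian tail probability beyond n_sigma."""
    return 1.0 - NormalDist().cdf(n_sigma)


def expected_unblocked(n_circuits, fraction):
    """Expected number of unblocked circuits among n_circuits."""
    return n_circuits * fraction


def adapt_threshold(threshold, n_seen, n_unblocked, fraction, step, n_sd=2.0):
    """Move the tail threshold if the unblocked count is far from expectation.

    The band of n_sd binomial standard deviations is an assumed criterion.
    """
    expected = n_seen * fraction
    spread = math.sqrt(n_seen * fraction * (1.0 - fraction))
    if n_unblocked < expected - n_sd * spread:
        return threshold - step  # too few unblocked: lower the threshold
    if n_unblocked > expected + n_sd * spread:
        return threshold + step  # too many unblocked: raise the threshold
    return threshold


print(f"exact tail beyond 2 sigma: {100 * tail_fraction(2.0):.2f}%")
print("expected unblocked in 3000:", expected_unblocked(3000, 0.021))
print("expected unblocked in 500:", expected_unblocked(500, 0.021))

# Hypothetical checkpoint after 500 circuits with only 2 unblocked.
print(adapt_threshold(17.45, n_seen=500, n_unblocked=2, fraction=0.021, step=0.05))
```

The same function can be called again at later checkpoints, for example after 1000 and 2000 circuits, with the running totals.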
The timing results quoted in this section are for single-core nondistributed computation. The RandomLA SPICE harness has been developed in such a way that it may be run on multicore machines and distributed frameworks such as Condor. Using parallel or distributed computing facilities can achieve great time savings. For example, rerunning the simulations in this paper on a dual-core PC achieves a time saving which is very close to 50%, that is, a factor-of-two reduction in run time. Using Condor, the run time of the transistor-level simulations of SRAM
Monte Carlo (MC) analysis with analogue simulation is an effective tool for the statistical variability analysis of nanoscale IC designs, taking into account the effects of intradie correlation as modelled by well-known techniques. Computational complexity is a major problem, though a variety of adaptations to the standard MC approach can greatly reduce this complexity as demonstrated by examples based on the simple “exponential” model of correlation due to proximity. These examples indicate that disregarding the effects of intradie correlation may give pessimistic estimates of yield. The results obtained from the simulations of SRAM arrays demonstrate the potential of RandomLA to achieve computation reduction for yield analysis with a delay specification. The RandomLA software is highly suitable for parallel and distributed implementations, which have already been shown to achieve great computation time savings.
The software packages mentioned in this paper, such as Cadence, Synopsys, and MATLAB, were used purely for research under the licenses of the University of Manchester. The authors of this paper do not have a direct financial relationship with the owners of those software packages. RandomLA is software developed by the authors and is intended as a contribution to the “gEDA” project [
This research was sponsored by the Engineering and Physical Sciences Research Council (EPSRC) under Grant no. EP/E001947/1. The authors acknowledge the financial support from EPSRC and the collaboration among the people of the Nano-CMOS pilot project, Meeting the Design Challenges of Nano-CMOS Electronics.