Advanced Numerical Methods and Software Approaches for Semiconductor Device Simulation

In this article we concisely presentseveralmodern strategiesthat are applicable to drift-dominated carriertransportin higher-orderdeterministicmodels such as the drift-diffusion,hydrodynamic, and quantumhydrodynamic systems. The approachesinclude extensions of “upwind” and artificialdksipation schemes, generalizationof the tradl-tional Scharfetter-Gummel approach, Petrov-Galerkin and streamline-upwindPetrov Galerkin (SUPG), “entropy” variables,transformations, least-squaresmixed methods and other stabilized Galerkinschemessuch as Galerklnleast squaresand dkcontinuous Galerkin schemes. The treatment is representativerather than an exhaustive review and severalschemesare mentionedonly briefly with appropriatereferenceto the literature. Some of the methods have been applied to the semiconductor device problem while others are still in the early stages of development for this class of applications. We have included numericalexamplesfrom our recent researchtests with some of the methods. A second aspect of the work dealswith algorithmsthat employ unstructured grids in conjunction with adaptive refinementstrategies. The full benefits of such approaches have not yet been developed in this application area and we emphasize the need for further work on analysis,data structuresand softwareto support adaptivity. Finally, we briefly consider some aspects of software frameworks. These include dial-an-operator approaches such as that used in the industrialsimulatorPROPHET, and object-oriented software support such as those in the SANDIA National Laboratory frameworkSIERRA.


Introduction
The dramatic advances in microelectronics during the past two decades are largely a result of "shrinking" the technology.The semiconductor device is an integral part of the DISCLAIMER Portions of this document may be illegible in electronic image products.Images are produced from the best available original " document.-------,,  ,,,, ,  hardware, and device size is now well below a micron with channel lengths from source to drain less than 0.1 micron and gate oxide layers less than 10 nanometers in thickness.For given voltage bias and operating conditions, as device size shrinks the local field strength inside the device increases and the interior layers in the property fields become more abrupt.
Other physical effects such as quantum tunneling in the inversion layer become significant and several numerical difficulties commonly arise.These numerical difficulties are typically associated with the following issues: (1) an inadequate physical model that ignores physics (such as the quantum effect) that was negligible at the previous scale; (2) numerical effects associated with the high local gradients in the solution that adversely impact the convergence of the nonlinear iterative solver; (3) other numerical effects such as oscillations in the approximate solutions that are intrinsically tied to the resolution of the underlying grid and the stability of the chosen discretization scheme.These three difficulties are obviously not unique to the semiconductor device problem.In fact, they are endemic to numerical simulation of convection-dominated transport processes.Yet the device problem does embody some of the most extreme behaviors one may encounter in this class of problems.Part of the difficulty has to do with the scale and the reliability of a deterministic mathematical model such as those based on augmented drift-diffusion or hydrodynamic PDE systems.
This is particularly evident in the vicinity of a charge accumulation region at the gate-oxide interface.Other discrete models using Monte Carlo solutions are possible but still not a practical alternative for the device designer's needs.Extending the deterministic models to include quantum tunneling in this regime is one approach that is proving very useful for present generation technology.For example, the use of WKB asymptotic expansion solutions to Schroedinger's equation extends the applicability of drift-diffusion and hydrodynamic models to smaller length scales [41,42].Since the issue of multiscale capability is a topical research subject in a number of modeling applications areas, we suggest that this is a good framework for interpreting the above problems -that is, difFerent microscale (here quantum level) and macroscale (carrier transport by drift diffusion) effects need to be accommodated.This concept has not yet been explored to develop alternative simulation models and strategies and remains an open research opportunity.
The second difficulty -convergence of the nonlinear scheme -is also more sensitive in the device simulation application [40] than in many other convection-diffusion problems.This is partly due to the strength of the solution gradients but a more significant issue appears to be the nonlinear source terms that describe carrier recombination (a reaction-like term [74]).Finally, the local strength of the electric field in the drift terms is extreme relative to the practical grid size, again because of the small length scales over -which significant solution variations and sharp gradients occur.(The doping concentration of modern devices varies by several orders of magnitude over very short lengths.)Likewise, the potential and carrier concentration solutions to the device equations will have abrupt interior layers.This implies the "usual" stabilization needs for treating convection (the drift term) as well as a strong recommendation for graded meshes to avoid excessive dissipation introduced by the stabilization mechanisms.These last two items -stabilization of deterministic models and adaptive grid refinement -are the focus of the technical discussion in Sections 3 and 4.
As the scale of the devices shrinks, more complex models are needed and different types of analysis components are being linked to integrate simulation and design capability.This latter aspect implies a greater demand for computational flexibility and interoperability between analysis modules or simulation models and systems for both process and device simulation.Likewise, there is a need for improved pre-and post-processing, from geometric modeling and CAD, to automated grid generation, through integrated simulations, to collaborative visualization and visual steering.In section 5 we sketch some recent and ongoing ideas related to software frameworks to support these endeavors.Here we briefly describe the dial-an -operator approach in the industrial simulator PROPHET developed at Lucent and the work in progress on the SANDIA national Laboratory system SIERRA.
Another consequence of the growing complexity of the PDE systems and the need for rapid simulation of high resolution grids is the increase in computational requirements to solve the problems.Advances in commodity processor speed have been instrumental in providing the needed computing power economically via desktop systems, area networks, tightly linked , , f , clusters of personal computers or workstations, and large scale distributed supercomputer systems.Affordable parallel shared and distributed memory systems are now available and are beginning to be used for semiconductor applications.In section 6 we briefly summarize some recent developments related to parallel computation.

Transport Equations
The best known models for device simulations are based on the stationary drift-diffusion (DD) equations for electrostatic potential # and carrier concentrations n and p Quantum effects can be included by modifiing these underlying hydrodynamic transport equations using potential solutions for Schroedinger's equation in subregions or by more general treatments.For example, the momentum displaced Wigner distribution function may be introduced in the moment expansion [33,39].
These PDE systems are discretized and the resulting nonlinear system is solved for a sequence of applied voltages to determine the I -V curve for the device design in question and to analyze other effects concerning the performance or breakdown of the device.The potential and transport systems are often solved in an iteratively decoupled form (Gummel iteration) but other algorithms are also applicable and used.

Stabilization
It is easy to verify that the central difference scheme permits oscillatory approximations to a monotone analytic solution of the model 1-D drift-diffusion problem with zero source when the grid is not adequately refined.To see this, consider the 1-D convection-diffusion equation expressed in the form u" -~u' = O on the domain O < x <1.A standard Galerkin scheme with linear elements or the standard central difference scheme both yield the same 3-point difference equation at a representative node i of a uniform mesh.By writing this difference equation in terms of slopess+ and s-to the right and left of the center node i of this difference patch and simpli~ing we find that the sign of the ratio s + /s-is negative, so the computed solution will be oscillatory if the mesh size h exceeds 2.D/l?.This classical result is known as the cell Peclet condition (~< 2).
As with many other transport problems, in the semiconductor device problem it is the convective or 'drift' term in the carrier and energy transport equations that leads to the use of so-called 'stabilization' strategies to suppress numerical oscillations.In the case of the device problem the difficulty is compounded by the fact that these oscillatory errors have large gradients that alternate in sign and usually promote divergence of the nonlinear iterative solution scheme for the system [54].Since there is natural diffusion in the system, this suggests that stabilization may not be necessary provided the grid is graded to a sufficiently fine resolution into the regions where there are Klgh solution gradients.The adaptive grid strategies that we discuss later provide a logical approach to achieve this goal.Nevertheless some form of stabilization that accommodates the near-hyperbolic nature of the problem is desirable, especially since the graded mesh and adequate initial solution iterate on that mesh are not known a pn"on".Instead, they must be arrived at incrementally horn an initial coarse grid and iterate using some form of continuation process.In our studies we have achieved some success by the following type of continuation process: generate an initial coarse grid and solve a simpler problem at low applied voltage (e.g. the potential problem at zero bias); improve the grid, adjust the model to drift-diffusion and use a strongly stabilized scheme to compute the next iterate; improve the grid and adjust the model to hydrodynamic and use a moderately stabilized scheme to solve the problem; repeat the last step to convergence at this applied voltage with adaptive refinement and modified stabilization; apply Euler-Newton continuation [64] to obtain the starting iterate for the next point on the I -V curve and proceed in a similar vein to the last part of the previous 1 -V step.
It is clear from the above that stabilization and mesh adjustment are key components of a successful algorithm.The 'bear hyperbolic" nature of the transport problem implies that the convective term should be treated with care.This is well known both in the device simulation community and in other flow and transport applications such as high speed gas dynamics where convective effects are strong and shock-like layers can arise [49,56].In fact, the behavior of the charge carriers in the device problem is analogous to an "electron gas" [32,34].Classical approaches for ensuring stability of hyperbolic problems involve "upstream" or "upwind" differencing to incorporate the directional property of the drift, ideally within some modified method of characteristics formulation.These upwind schemes introduce numerical diffusion corresponding to the leading order truncation error terms from the discretization.This artificial diffusion is a function of the grid size and convective coefficient.One consequence is that approximations on coarse grids may be very dissipative and the layers are smeared.Since the numerical or artificial diffusion added by these schemes is much greater than any physical diffusion in the original problem this often makes the solution of little practical use other than as a starting iterate for a new solution on an improved grid.However, at a sufficiently fine grid resolution, such as the 1-D resonant tunneling QHD studies in Gardner [33], a uniform fine grid can be used to obtain good results, but solutions in 2D or 3D can not be efficiently obtained.
Degradation of the physical layers (smearing over several grid cells) can be mitigated by the introduction of higher-order upwind schemes that still suppress oscillations, such schemes being frequently adapted horn computational fluid dynamics (CFD).Some examples are the flux limiter schemes [73] and similar total variation diminishing (TVD) or bounded (TVB) schemes [21,46,78] or the non-oscillatory ENO schemes [19,70].This latter strategy has been applied to simple diode simulations to obtain non-oscillatory approximations with sharp fronts [29,35].
The idea of discretizing the drift term to better reflect the underlying physics is also the key to the well known Scharfetter-  which yields a non-oscillatory approximate method .This scheme can be related to exponentially upwinding the test function [12].
In higher dimensions the upwinding issue is more complicated.A common approach for finite difference schemes on Cartesian grids has been to apply the ID formula in the respective coordinate directions.However, since the field .E is in general oriented at some angle to $he axes, simply using its components as will yield a scheme with excessive cross-dissipation.the associated ID drift components (Similaz observations apply to other transport applications in CFD where this approach has been followed.)An alternative is to rotate the coordinate frame into a "streamline-normal" system and apply the ID formula in the streamline direction, then rotate back to the original frame.Similarly, in the higher dimensional Petrov-Galerkin scheme the upwind bias for the test function can be constructed to be aligned with the field vector.The resulting scheme is then a streamline upwind Petrov-Galerkin (SUPG) form [9,44].
A streamline diffusion scheme can be constructed simply by adding artificial diffusion in the streamline direction.For our model steady drift-diffusion equation we then obtain the following weak statement: find u such that holds for all admissible u, where & is the unit vector in the direction of E, the artificial diffusivity ~is a function of the electric field strength and ~is in the direction of the field.
(In two dimensions Uf = Uzzf+uyyg = Eluz+l%uv where 131,Ez are the field components.) This weak statement is equivalent to solving the "dissipative" differential equation: The function coefficient yin the stabilization term is chosen to satisfy -y= O if e = D/ll?l> h; that is, the mesh size is such that there are grid points in the solution layers.Hence wemayset~=h-e if h > e and ~= O otherwise to get a viable stabilization scheme.

Howeverj since this dissipation is an O(h) modification of the original problem it is not
surprising that the asymptotic accuracy is now only O(h).That is, this scheme is only first-order accurate.
The streamline upwind Petrov-Galerkin construction recovers the second-order accuracy: " First, consider the degenerate hyperbolic problem obtained by letting D ~O.We then have, The construction is completed by "interpreting" the last term on the left as a sum of element integral contributions.(Note that this implies that interface jumps are ignored in computing this contribution and that for linear elements Aue = O in the element interior implies that this term is zero.)In our opinion this is a less than satisfactory situation, but the fact that ey scales the term with both e and -y small implies that asymptotically the "correction" is valid.It should also be kept in mind that we are, in fact, perturbing the original problem in the sense that a higher-order artificial dissipation may be associated with the upstream bias term.
The time dependent SUPG scheme follows in a similar fashion with E" Vu replaced by ut + E. Vu and the remainder of the formulation as above.This, however, also allows us to introduce other treatments that have been the subject of recent study in other applications areas.Of particular interest are the space-time and discontinuous Galerkin methods [2,6,22,47].
Let us consider the 1-D case.Introducing the "pseudo-material derivative" -Du/llt = ut + E. Vu = u<, then in the new E(z, t) space-time frame we have uc -DUZZ= f, U< = ut + EUZ (11) Adding an artificial diffusion -yuz.we can construct a corresponding spac~time Galerkin Scheme or in higher dimensions The space-time formulations can also be applied on individual time strips S. = fl x [in, tn+l] where Atn = tn+ltn is the time interval of interest and !2 is the spatial domain (0= [0, 1] in the example above).For example, a Petrov-Galerkin space-time formulation could be introduced of the form: find u satisfying the "initial" condition at t = tn and the essential boundary conditions on 17~= 80 x [tn, tn+l] and such that " holds for all admissible test functions w(fl, t).(Here, for convenience, we have taken u(f2, t) to be specified on l?~, so w = O on this part of the strip boundary as well as on the surface $2 of the strip at t = tn.)A finite element strip method follows on discretizing the space time strip as a single layer of elements and introducing an appropriate basis for the approximation trial and test functions.For example, a tensor product bilinear basis for trial and test functions on a strip of rectangular elements will yield an implicit difference scheme similar to those encountered in the Crank Nicolson difference method and the Crank Nicolson Galerkin semidiscrete strip schemes.In a similar manner, a strip of triangular elements with a continuous conforming basis might be applied to yield a more general fully discrete space-time algebraic system for the strip, and the strip need not be of fixed width in time.Other generalizations are also feasible.For instance, the test functions in the previous tensor product discretization of rectangles may be constant in time and continuous piecewise linear in space.This implies a test basis that is discontinuous across the time interfaces between adjacent strips.This Petrov G,alerkin scheme also yields an implicit system to be solved for each time interval.Error estimates and superconvergence properties (in time) have been shown for this type of scheme applied to the model diffusion equation [3].
The continuity of the trial space across the strip interfaces at tn can also be weakened denote the approximations as tn is approached from above or below, respectively.Then the strip interface jump condition [u] = O on the initial surface can be enforced weakly in the variational statement on each strip.That is, we add surface integrals to the variational problem on S'n of the form J~@]2df2.Note that in this scheme the trial and test spaces as well as the mesh need not be conforming across time strip boundaries.These schemes can also be extended to include upwind strategies and additional stabilisation treatments such as SUPG to accommodate the drift term in the semiconductor device problem.A further generalisation to treat drift-dominated and hyperbolic PDE problems is to use discontinuous Galerkin schemes at the individual element level.In this case the "inflow" and "outflow" boundaries are identified for a specified field direction and an arbitrarily oriented element.
The approximation and test functions are now discontinuous across interelement faces with jump conditions enforced weakly using the correct directional drift inclusion.This concept is applicable to arbitrary space-time elements but has not apparently been investigated to date in this broader context.
The time-dependent transport problem also can be treated by introducing an equivalent numerical dissipation term via the time discretization.This is, in fact, the basis of the familiar Lax-Wendroff approach for hyperbolic problems.The basic idea is to use the differential equation as an auxiliary relation.This relation can be differentiated and manipulated to express the leading time truncation error as a spatial dissipation term that can in turn be difference or similarly discretized.A simple illustrative example is afForded by the fundamental drift equation ut = -Euz.Forward differencing with respect to time we get Differentiating ut = -Euz with respect to t and simpli~ing we have utt = E2UZZso that the time discretized equation becomes Neglecting terms of O((At)2) and differencing centrally in z yields the Lax-Wendroff dissipative stabilized scheme.In like fashion, we can take this resulting dissipative form and integrate against a test function W(Z) to obtain a Taylor-Galerkin scheme.This idea has been generalized to construct a number of higher-order stabilized difference schemes [71,72].Note that the artificial dissipation in x varies as the square of E and linearly with At.This implies that care must be exercised that both the scheme not be excessively dissipative (E2 and At large) and that At not be so small that oscillations arise.Results for a simple 0.4pm silicon diode are shown in Figure 2 x 1015 cm-3 in state.
1 (from [9]).Here the doping is 2 x 1018 cm-3 at source and drain with the channel.The solution shown is obtained by time-marching to a steady Other variants of the SUPG scheme may also be developed.For example, in [10] a transformation to "entropy" variables is introduced and used to symmetrize the flux jacobian matrix for the hydrodynamic system.Consider, for example, the non-parabolic energy band transport system Introducing a change of variables U = U(V) we have -; where now the entropy variable transformation is constructed such that A. = ~is symmetric positive definite and Ai = AiAo are symmetric.The SUPG scheme can then be formulated for the transformed system.
For example, in the above case we introduce the transformation for U = (n, nu, nv, nw)T where p = 2/3nwi arises horn the Wiedemann-l?ranzassumption for Q = -kVU.
In fact, SUPG is a member of a broader class of stabilized finite element methods known as Galerkin/Least-Squares (GLS) [45].In this approach, the symmetric form of the governing differential equation may be written as

*
. Then Galerkin's method may be written as /

Wt.CVdfl = O. (21) c1
The GLS method adds a functional to (21) of least-squazes form, namely where ~is a matrix of local element intrinsic time scales which is constructed from the modal matrix of the differential operator.In (22), if ,CW is replaced by the convective part of the operator A" VV, then the SUPG method is recovered.Similarly, GLS and SUPG coincide for purely convective, steady-state problems.
There is a famous theorem of Godunov [36] which asserts that, in general, no linear numerical scheme for hyperbolic problems may simultaneously be monotonic and better than first-order accurate in space.Hencej modern numerical methods for capturing steep layers in convection-dominated problems typically incorporate some sort of nonlinear feedback mechanism to enforce monotonicity of the solution.In this way, the magnitude of the artificial dissipation is made proportional to the solution gradients.This is the motivation behind the so-called "discontinuity capturing operators" in the SUPG/GLS literature, and slope limiters for TVD and ENO methods.In this way, the formal order of accuracy of the numerical solution may be improved in regions where the solution is smooth, and reduced in regions of large, local gradients to suppress spurious oscillations and enforce monotonicity of the solution.The first order system for carrier transport in (l)-(2) can also be approximated directly using a mixed method [31] by introducing the current densities Jn and Jp as additional variables.This incresses the number of nodal unknowns and therefore the size of the algebraic system to be solved so these ideas have not been pursued in practice.However, in other applications areas such ss coupled fluid flow and transport mixed methods are receiving increased attention because the flux quantities are approximated directly to greater accuracy.Moreover, local conservation can be enforced at the element level using appropriate elements such as the Raviart-Thomas family.This approach can also be applied to  the electrostatic potential equation and, for simplicity of exposition, we will describe the formulation for this case.Accordingly, let us first write the electrostatic equation as a first order system by introducing a flux c to obtain where we have also set j = q(n -p + C).In the following we assume as in a block iteration where carrier concentration iterates are available step with the decoupled carrier transport equations.Note that the full (23) that ~is known , from the previous system could also be considered as a single large first-order system using the mixed Galerkin formulation.
The Galerkin statement for the mixed formulation follows after introducing test functions w and v corresponding to the variations of u and # in a weighted residual statement: find which can be solved for the nodal values S, @ of u and ~.Note that this system can be reduced by static condensation to the Schur's complement system which corresponds to the symmetric positive system obtained using the standwd Galerkin method for the second order electrostatic potential equation in (l).The low order Raviart-Thomas spaces are frequently chosen to develop the system in (26).For a rectangular  2) follows in similar fashion: the weighted residual statement for the first order system is constructed and the approximation spaces introduced to give a saddle point problem and algebraic system of the same structure as that appearing previously in ( 25), where the electrostatic potential and electric field are now presumed known from the above electrostatic Gummel step.
The Galerkin approach leading to a saddle point problem is not the only mixed weighted residual formulation that can be developed.One such alternative is to use a least squares minimization formulation for the residuals in the first order problem .This avoids the restrictions associated with a saddle point formulation and generates a mixed system that is symmetric positive definite.The analysis of this type of treatment and variations of this approach are under investigation.For the electrostatic potential problem (23) a least squares residual functional can be easily written where a is a weight and we seek a minimizer of 1 for (c, 4) in ~div x L2.The approximate problem is obtained by introducing finite element expansions oh and ~h in (27) and setting the first variation of I to zero to obtain a system for the nodal vectors that is now symmetric and positive [14,15].Wehavecommented previously that locally adapting the~idcan bevery beneficial.Obviously, grading the mesh into the layers will improve accuracy and efficiency.Furtherj numerical oscillations arise as a consequence of discretizing on a grid that is too coarse.
While adding dissipation or applying other strategies to suppress oscillations is important, the underlying difficulties stem from the mesh.The best approach is to combine a stabilized scheme with mesh adaption.
The main approaches for adapting the mesh are by (1) redistributing the nodes or (2) refining cells.The redistribution approach is best suited to computations using stencil-based finite difference schemes on mapped structured grids.For example, in a recent study we developed a mapped formulation for the device problem as follows: (1) first a coordinate mapping is introduced between a Cartesian reference grid and. a topologically equivalent curvilinear graded mesh in the physical domain; (2) then the governing device transport equations are mapped to the reference domain and difference (together with the metric coefficient introduced by the inverse map) on the Cartesian grid.This is standard practice for similar discretization schemes on structured grids in aerodynamics and CFD.The distinction now is that we have extended the Scharfetter-Gummel discretization procedure to the mapped equation and For example, consider along the edges of the grid cells in the reference domain [57].
the following hydrodynamic model for electron transport [7,8] V" where Jn and S'n respectively denote the current density and energy flux, which are defined by the following equations .

J. = rpq [
1 The momentum and energy relaxation times, TPand ~W,are empirical functions, which are chosen from the work of Bordelon et al. [8] as Tw = 0.46 X 10-12 S (30 The system is closed by assuming a Fourier type constitutive relation for the heat flux, Q, of the form (31) The other quantities in equations ( 28 The mapped Scharfetter-Gummel approach is derived for the current-density and energyflux terms in this hydrodynamic system.Details of this derivation for the case of a general coordinate mapping are given in [57].Here we state these results in the original coordinate system, and give the usual one-dimensional version that is used in a finite-volume setting to discretize the V. Jn and V. Sn terms.Accordingly, if we use subscript i to denote quantities at node i, the (constant) current-density and energy flux components on the mesh segment between nodes i and i + 1 are given by J.

c,.B(wy)
'ni+l where x= [ Note that the transformation takes into account the way the mesh is graded and this permits grading the mesh into regions where solution gradients or errors are large.
The second approach for improving the grid is to add new grid points locally to enrich the mesh in certain regions and simultaneously remove grid points in those parts of the domain where they are not needed.The ID FETG result in Figure 1  final nonuniform grid of 150 elements obtained by point insertion.The situation in higher dimensions is obviously more difficult.However, points can be added conveniently as part of a Delaunay triangulation process [13].The Delaunay triangulation is optimal in the sense that it connects the nodes (grid points) so that the local triangle shape is the best possible in a certain geometric sense (in the sense of maximizing the minimum angle via edge swaps).The refinement algorithm for grid point insertion is very straightforward: assume a new point p is to be added in a designated element based on the solution behavior or a local error "indicator"; add the point and identi~any neighboring triangles that will be influenced by the Delaunay process; remove the corresponding interior edges to define a "cavity" around p and connect p to the vertices of the cavity.This process is repeated recursively until the point insertion is completed.Similarly a vertex center of an interior patch can be deleted to form a polygonal cavity which can then be retriangulated to again Not only is this point insertion strategy simple, but it also incurs little overhead to the data structure, using only the edge neighbor information available from the Delaunay process.Yet, surprisingly, it appears to be little used for adaptive grid enrichment.In the case of existing industrial software that uses unstructured triangulation, this point insertion approach is appealing because it is relatively straightforward to retrofit the adaptive component to the analysis software.Hence, this is the easiest path for upgrading existing device and process simulators to include adaptivity and yield more accurate, reliable simulations that are more stable and not oscillatory.
A more common approach for adaptive refinement that does not require a Delaunay property is to simply insert points at the midpoints of specified edges of a triangulation.
For example, we can refine a designated triangle to a quartet of similar subtriangles by simply connecting new nodes at the midpoints of the three sides.The neighbor triangles can then be refined by connecting these midpoint nodes to the opposing vertices or a similar strategy.This idea has been applied by Bank et al [4,5] to device simulations using adaptive refinement with a multigrid solver.The approach could also be combined with Delaunay swaps to help avoid generating poorly-shaped sub-triangles.Such a scheme relies on an element-based data structure using an element error indicator.A variant of this method is to split designated edges (rather than elements) guided by an edge-based error indicator.A point is then inserted in a given edge and the adjacent triangles are appropriately subdivided.
(Clearly this can also be done using the previous Delaunay procedure).These strategies and variants of them can easily be generalized to tetrahedral.
For example, see Plaza and Carey [62,63] for recent tessellations based on longest edge bisection of tetrahedral using the skeleton triangulation.The sketch in Figure 2 shows a tetrahedron refined in this manner.
The approach of subdividing the triangle to a quartet of sub-triangles or the tetrahedron to an octet of sub-tetrahedra generates respectively a quadtree and an octree data structure that can be exploited in the simulation to yield a more efficient adaptive algorithm [18].
Rather than reconnect midside nodes of edges shared by unrefined neighbor elements, one can include constraints on the solution behavior along these edges to ensure appropriate continuity or smoothness of the approximation (conformity).This approach can also be applied to quadrilateral and hexahedral (quadrilateral brick) elements with their associated quadtree and octree data structures.One of the first studies of this type for device simulation used quadrilateral elements for a MOSFET simulation [17,69].A sketch of the potential field from that early calculation is given in Figure 3.More recently Dutton et al [20,26,37] and Hitschfield et device simulations respectively.
Instead of refining the mesh al [43] have developed adaptive schemes for process and to improve accuracy, one may increase the order of the difference scheme or the degree p of the finite element basis.The latter p-type or spectral element approach is particularly appealing, since the grid with mesh parameter h remains fixed and the element degree can be increased as needed.For elliptic boundary value problems with smooth solutions, a polynomial element basis of degree p will yield a global asymptotic rate of convergence in the L2 norm that is O(h~+l).This high accuracy and rapid convergence can be achieved by increasing p to the necessary level on a relatively coarse grid.
The element matrices increase with p and the bandwidth grows correspondingly, but the high accuracy implies that these methods will be more efficient when the solution is sufficiently regular.However, for problems with singularities, the reduced global regularity restricts the rate of convergence: if the solution is in H' then the rate in L2 becomes p = min(p + 1, r) , so r limits the rate of a p scheme.In this case it is desirable to adapt by refining the mesh towards singularities (local h refinement) and increase the polynomial degree on elements remote from the singularities.Such schemes are called adaptive hp methods and have not yet been applied to semiconductor device or process simulation although they are used in other field problems in engineering mechanics and electromagnetic.
A uniform p refinement scheme has been applied to compute solutions to the augmented drift-diffusion equations mentioned previously.This study involved the use of parallel multilevel iterative solution techniques in which the level corresponded to the degree of the element polynomial basis [24].Adaptive p schemes can be constructed in a manner similar to the adaptive h schemes where now the polynomial degree varies across the elements of the discretization.This implies that, as in the case of h-refinement, a strategy is needed to permit transition between refined and unrefined elements.In the adaptive h scheme on simplices this can be achieved by connecting the "hanging" node on the element interface to the opposite vertices of the adjacent unrefined element.For hexahedral grids, special transition elements can be constructed or techniques for locally constraining the solution by penalties or Lagrange multiplier methods can be introduced.Similarly, in the adaptive p scheme, continuity of the approximation across the element interface can be enforced by constraining the higher degree basis functions on the element interface.If a hierarchic basis is used, then this simply implies that the appropriate degree of freedom be set to zero at the interface node of the refined element.Further details on adaptive p and hp strategies are provided in [60].

Software Frameworks
The use of more sophisticated algorithms such as those with complex data structures for grid adaption has increased the complexity of the associated software.In addition there is a desire from the applications area to be able to handle a more diverse class of problems in more general settings (variable spatial dimensions, coupled fields, etc.).Finally, and this is particularly the case for the device problem, the formulation, algorithms and software should be easily extensible to treat new models or a gradation of models.We have seen that, depending on the application, one may wish to solve in order of increasing complexity the potential problem, the drift-diffusion system, the hydrodynamic system, quantum hydrodynamic systems or higher order models.Moreover, the "constitutive" models for mobility (augmented mobility models), inversion layer treatment, thermal closure etc. should be encompassed within the one software framework.These requirements imply a concomitant demand on the software design and the use of higher level programming languages and tools to facilitate a flexible, extensible package.
The use of object-oriented software paradigms certainly facilitates design of such systems [25, 28,30, 80] and this approach has been applied in both the commercial and university sectors to varying degrees (Avanti, Floods, Maple, Mathematical) [16,51].The Stanford University ALAMODE simulator [79] is another example of an object oriented dial-anoperator tool for TCAD simulations.
Symbolic manipulators have also been receiving increasing attention as a mechanism for expressing differential equations explicitly in software using, for instance, Mathematical or Maple.While symbolic manipulation does incur a modest overhead, it facilitates design of a framework which can accommodate a broad applications set.This implies that changes to the differential equation system may often be possible directly in the higher level symbolic language without affecting the discretization procedure, data structure and solvers.
Part of our recent work on device simulation has involved the industrial simulator PROPHET.This software framework was originally developed for semiconductor process simulation and we have been collaborating with the Lucent developers and colleagues at Stanford on the extension of the capability to device analysis [67].One long-term goal is to provide a single integrated framework for both process and device simulation.PROPHET embodies some of the features mentioned above -in particular, it provides a '(dial-anoperator" capability that allows the user to "build" a differential equation system.This implies that the analyst may even construct mathematical models for other classes of applications beyond the process and device problems of immediate interest [59].
Of course, the ability to handle very general differential systems at the symbolic level presumes that individual differential operators such as div, grad, curl and integral operators can be discretized appropriately.This, in turn, places considerable demand on the data structure at the next lower level.Some operators will require element or cell information, others edge or patch information and so on.For example, discretization strategies for stabilization such as streamline upwinding or exponential weighting via a local Green's function may use patch or edge data structures in specific ways.
The basic strategy used in PROPHET consists of decomposing equations into terms, and treating each term as a combination of a geometric and a physical operator.New Over the past three years the SIERRA C++ framework [75] has been under development at Sandia National Laboratories as part of the U.S. Department of Energy's Accelerated Strategic Computing Initiative (ASCI).The goals of this project are to provide software support services that are common to finite element applications.An important aspect of this effort is the construction of a set of high-level abstractions that allows the details of services such as adaptive mesh refinement, cache management, message-passing, linear solvers, and so forth to be hidden from the applications developer.
The basic paradigm is that a finite element code is a set of nested computational mechanics objects.The highest level, called a domain, is essentially a container for one or more procedures that manages the time integration of a set of regions.Each region is responsible for solving nonlinear sets of strongly coupled equations at a single timestep.Different sets of physics correspond to loosely coupled regions.In turn, the region contains lower-level element mechanics, which perform the actual element integrations, etc.At the lowest level, the element mechanics may contain nested material mechanics that provide the constitutive

Parallel and distributed computing
The need for more sophisticated physical models leading to larger coupled PDE systems together with the requirement of high resolution grids has fostered interest in parallel de vice simulation [23,76].Traditionally, the device designer has depended on uniprocessor workstation capability, but the recent emergence of multithreaded parallel shared-memory workstations and of tightly-coupled distributed parallel PC workstation clusters provides an inexpensive means of scaling up the application and reducing run time [38].
A standard approach for parallelizing PDE simulation is via domain decomposition: The domain and grid are partitioned to a set of subdomains and corresponding subdomain grids.Both overlapping and non-overlapping domain decomposition strategies are .applicable.Typically, the subdomain calculations are carried out on their respective processors with overlap or interface communication using MPI between adjacent processors [1,11,23,38].To date there has been little use of these ideas for device and process simulation except in a few university research studies [61,77], but but they are now widely used for other engineering applications.We will see more widespread interest and use in the near future, particularly as software frameworks are developed to support parallelism for process and device simulation.
We are particularly interested in parallel solution strategies that can accommodate unstructured grids.Hence, it is important that the grid partitioning problem be efficiently treated and that the resulting partition have good load balancing properties for parallel computation.Sandia software package CHACO [50] provides several algorithms of varying complexity ranging fi-om simple inertial bisection to more costly spectral schemes.These approaches have been continued in the METIS software which is widely available [48].Both software systems provide effective means for partitioning unstructured grids in applications to the stationary device equations.
A representative example is included here from a recent study.The test problem was the n-channel depleted MOSFET described in [11] using the drift-diffusion model on a mesh of 7722 triangles generated by Delaunay triangulation.A partitioning of the unstructured grid to 6 subdomains is indicated in Figure 5. See [11] for numerical results as well as more details of the algorithm and implementation.--.-----... .:..: . 1-. .. ...             (from [11]).

2 )
with J. = q~.nE+ qD.Vn Jp = q/+PE -qDpvP where pn and pP are electron and hole nobilities, Dn and E= -V+ is the electric field, q is the unit charge, C (Dp are corresponding diffusivities, is the electrically active impurity concentration, and R is the electron-hole recombination term [27, 53, 55].An augmented system that permits a simplified treatment of hot electron effects can be constructed by making the mobility p. a function of the local field gradient, such as pn = pn (~) where ~corresponds to the gradient of -E in the field direction.A more rigorous "hydrodynamic" model for hot carriers can be obtained by including transport of energy density w as an additional (the third-order) moment of the Boltzman transport equation [52].Introducing the electron temperature 7' and a closure relation for the heat flux in the moment system we obtain an additional convection-diffusion equation for 2. .-.. ...-. --where the subscript c denotes collision contributions and the closure relation for the heat flux Q is an "appropriate" constitutive relation (a Fourier type relation Q = -kVT is frequently assumed with some question as to both the validity of this form and the value of k [52]).

.
scheme involves exponential (or hyperbolic cotangents) in the difference coefficients and
z) = x/ (eZ -1) denotes the Bernoulli function.The other quantities introduced in = 0.007 x 10-12, local mesh spacing = (Z~+l -Z~), average energy along edge Ax = ~(wi+l + Wi) ,4 was first computed on a uniform grid of 100 elements and generated solution profiles with strong oscillations near regions of large solution gradient.The non-oscillatory results in the Figure are for a .
meet the Delaunay requirement.Local coarsening can thus be achieved by successive point deletions, again guided by a local error or feature indicator.The approach can be extended to point insertion into tetrahedral grids.

20 ,
be treated by either combining the predefine geometric and physical operators to construct new PDE systems, or by creating new physical operators via a welldefined interface.The package also includes a database library which enables easy access to any coefficients, parameter values or other properties that pertain to pre-configuring the supported applications.More comprehensive details regarding the set up of PDE systems and the structure of the database library are discussed in the references[65, 66].The analyst may 'interact' with the PROPHET framework in three main modes:(1) at the top-most application level; (2) at the middle "dial-an-operator" level; and (3) at the lower-most discretization level.For instance, adding the new drift capability (grad operator) for the device problem using the mapped S-G approach on edges requires expanding the discretization capability at the lowest level, whereas modifying the differential equations with operators from the existing library involves the mid level and simpler parameter changes involve only the top level.An example of a MOSFET simulation with PROPHET is given in Figure4.Further details are provided in[58].
relations such as stress-strain, thermal conductivity, etc. Boundary conditions are typically implemented as mechanics classes that are owned by the region and hence are peers of the element mechanics.In this way, separate physics may be loosely coupled at the procedure level, by a region that contains each distinct set.For example, a microelectromechanical defining systems (MEMS) problem might consist of several regions: one (or more) in which fluid motion is modeled, another in which structural deformation is modeled, a region for electromagnetic field calculations and regions for heat transfer and radiation modelling.Data transfer among regions occurs at the procedure level via abstract transfer objects that hide the details of the mesh projections from the developer.Regions may overlap and it is not necessary that the two meshes align.As another example, a transistor might be modeled as two .nonoverlappingregions: the first could be the oxide, in which the nonlinear Poisson equation is solved for the electric field, and the second might contain the rest of the device, in which the coupled Poisson equation and the charge transport equations are solved.Alternatively, if Gummel iteration is used to decouple the electrostatic potential and transport calculations, then the first region corresponding to the nonlinear Poisson equation could span the entire device, and thus overlap with the second, in which only transport equations are solved.In ', this case, the transfer object would pass the electric field from the first region to the second, and also pass the carrier concentrations from the second region to the Poisson region.In this latter strategy, adaptive refinement could be used to obtain two separate meshes in the semiconductor, one optimized for the potential solution and the other optimized for the concentration solution.The details of determining the mesh intersections on distributed memory computers and projecting solutions from one mesh to another are transparent to the application developer.