Topologically Rectangular Grids in the Parallel Simulation of Semiconductor Devices

Topologically rectangular grids offer simplicity and efficiency in the design of parallel semiconductor device simulators tailored for mesh connected MIMD platforms. This paper presents several approaches to the generation of topologically rectangular 2D and 3D grids. The effects of the partitioning of such grids on different processor configurations are studied. A simulated annealing algorithm is used to optimise the partitioning of 2D and 3D grids on two dimensional arrays of processors. Problems related to the discretization, parallel matrix generation and solution strategy are discussed. The use of topologically rectangular grids is illustrated through the example of power electronic device simulation.


INTRODUCTION
The development of parallel device simulators is a widely accepted method for overcoming the speed and memory restrictions of a single processor system [1]. This is particularly important for 3D problems [2] where the speed of existing 3D single-processor simulators significantly restricts their use in practical device design. To achieve maximum speed-up and efficiency, parallel code design should reflect the architecture of the parallel platform, while remaining scalable and portable in this changing world of parallel systems.
Topologically rectangular (structured) two and three-dimensional grids are attractive for parallel simulation of semiconductor devices because they are easily partitioned onto processor arrays, enhancing the scalability and portability of the simulator code. There are speed-up and efficiency advantages when large topologically rectangular grids are partitioned onto mesh connected arrays of processors using physical domain decomposition instead of logical spectral methods [3].
In this paper we report on different aspects of the implementation of topologically rectangular grids in the parallel simulation of semiconductor devices, including grid generation, optimum partitioning, discretization and solution strategy.

TOPOLOGICALLY RECTANGULAR GRIDS
Structured topologically rectangular grids allow an index ordering that preserves the number of grid nodes in each of the index directions. In addition, nodes with neighbouring indices are physically adjacent in the grid. Most finite difference grids are inherently topologically rectangular. However, it is also possible to construct topologically rectangular finite element (FE) grids.
Topologically rectangular grids can be easily partitioned not only onto pipelines but also onto 2D and 3D arrays of mesh connected processors. Such partitioning only requires nearest-neighbour communication in the design of iterative solvers, because adjacent nodes in adjacent partition subdomains appear on neighbouring processors. Topologically rectangular grids lead to a regular band structure in the discretization matrix, which may also be advantageous in the design of parallel direct solvers. They are also ideal for the design of multigrid solvers. However, because of the constraints of grid generation, a topologically rectangular grid may require more nodes than an unstructured FE grid.
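The nearest-neighbour property can be checked with a minimal sketch. Python is used here purely for illustration, and the function names are hypothetical rather than taken from the simulator. Each index direction is split into contiguous ranges, one per processor, and the sketch measures how far apart in the processor array the owners of two index-adjacent nodes can be.

```python
def block_ranges(n_nodes, n_procs):
    """Split n_nodes consecutive indices into n_procs contiguous
    half-open ranges whose sizes differ by at most one node."""
    base, extra = divmod(n_nodes, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def owner(index, ranges):
    """Return the processor coordinate whose range contains index."""
    for p, (lo, hi) in enumerate(ranges):
        if lo <= index < hi:
            return p
    raise ValueError("index outside the grid")


def max_processor_distance(nx, ny, px, py):
    """Largest processor-array (Manhattan) distance between the owners
    of any two nodes whose indices differ by one in a single direction."""
    rx, ry = block_ranges(nx, px), block_ranges(ny, py)
    worst = 0
    for i in range(nx):
        for j in range(ny):
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < nx and nj < ny:
                    dist = (abs(owner(ni, rx) - owner(i, rx))
                            + abs(owner(nj, ry) - owner(j, ry)))
                    worst = max(worst, dist)
    return worst
```

For a 19x19 grid on a 3x3 processor array, `block_ranges(19, 3)` gives subdomain widths of 7, 6 and 6 nodes, and `max_processor_distance(19, 19, 3, 3)` never exceeds one processor hop.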
One simple approach to the generation of topologically rectangular grids is the deformation of an originally rectangular grid to the shape of the solution domain boundaries. Such an approach is used in the FE Monte Carlo simulation of recess gate compound FETs [4]. Fig.1(a,b) illustrates the generation of a quadrilateral grid in the recess region of a MESFET by deformation, following the shape of the recess and the mushroom gate.
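The deformation idea can be illustrated with a short sketch. It is not the generator used in [4]. It assumes a domain bounded below and above by single-valued curves, and the names `deformed_grid` and `recessed_surface` and the Gaussian recess profile are hypothetical. Each column of an originally rectangular grid is stretched linearly between the two boundaries, so the (i, j) index structure is untouched.

```python
import math


def deformed_grid(nx, ny, width, bottom, top):
    """Return nodes[i][j] = (x, y) for an nx-by-ny structured grid whose
    columns are stretched linearly between bottom(x) and top(x)."""
    nodes = []
    for i in range(nx):
        x = width * i / (nx - 1)
        y_lo, y_hi = bottom(x), top(x)
        column = []
        for j in range(ny):
            t = j / (ny - 1)          # 0 on the bottom boundary, 1 on the top
            column.append((x, y_lo + t * (y_hi - y_lo)))
        nodes.append(column)
    return nodes


def recessed_surface(x):
    """Hypothetical smooth recess: a flat surface at y = 1 with a
    0.2 deep dip centred at x = 0.5."""
    return 1.0 - 0.2 * math.exp(-((x - 0.5) / 0.1) ** 2)


def flat_substrate(x):
    """Flat lower boundary at y = 0."""
    return 0.0
```

Calling `deformed_grid(11, 5, 1.0, flat_substrate, recessed_surface)` returns 11 columns of 5 nodes each, with the top row following the recess. Node density in this sketch is controlled only by the linear stretching.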
A more complex approach can also be used for the generation of the grids. Here, non-equidistantly spaced guiding contours resembling the shape of the domain boundary control the position of the grid nodes. The number of nodes on each guiding contour varies to satisfy the criteria for a topologically rectangular grid. This second approach allows better control over the density of nodes and the shape of the finite elements. The quadrilateral elements of the grid could be further subdivided into two triangles if simplex type triangulation is required. Attention must be paid to avoid obtuse triangles, and if necessary some nodes may be slid along the guiding contours. A similar strategy may be applied to 3D grid design, where the guiding contours are replaced by guiding surfaces following the domain boundary.

PARTITIONING
In the domain decomposition approach a topologically rectangular 2D grid may be partitioned on 1D (pipeline) or 2D arrays of processors, and a 3D grid may be partitioned on 1D, 2D or 3D arrays of processors. This is done by dividing the number of grid nodes in each index direction by the number of processors available in the processor array in that direction. For a 3D topologically rectangular grid, Fig.2 shows the theoretical speed-up of a hypothetical linear equation solver as a function of problem size for a pipeline, 2D and 3D array of processors. The calculations are based on detailed performance theory [5] with local and global communication times typical for yO. First-order load balancing only occurs when the number of grid nodes is exactly divisible by the number of processors in each dimension of the processor array. Otherwise deep oscillations in speed-up and efficiency occur.
We apply a simulated annealing algorithm to improve the first-order load balancing of such a partition. In the case of a 2D array of processors, every grid partitioning (state) has an associated score function $E(V, S_x, S_y, r)$, where $V$ is the number of grid points assigned to each partition subdomain, $S_x$ and $S_y$ are the partition edge lengths in the x and y directions, and $r$ is the processing/comms ratio for the device simulation implementation. State transformations have been developed (Fig.3) that allow transitions into states which preserve the four way connectivity of the partition subdomains. Transformations are chosen randomly, and only followed if they decrease the score function (i.e. $\Delta E < 0$), or if $\mathrm{random}(0,1) < \exp(-\Delta E/T)$, where $T$ is the annealing temperature.
FIGURE 3 Non-repeating partition transformations which preserve processor mesh connectivity. Subdomain boundaries marked by heavy lines
If transformations merely perturb the score function, and enough of them occur at each T to form a Markov chain in equilibrium, then an appropriate schedule of T reductions will result in convergence towards an optimal final state [6]. Fig.4 shows the annealed partitioning of a 19x19 grid over a 3x3 array of processors. Fig.5 shows the theoretical speed-up of a coloured SOR parallel solver for rectilinear and annealed partitionings as a function of problem size.
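The acceptance rule and cooling schedule can be sketched as follows. This is only an illustration under stated assumptions. The form of the score function and the transformations of Fig.3 are not reproduced here. Instead, a toy 1D problem is used, in which 19 nodes are distributed over 3 processors, the score is the load of the busiest processor, and a move transfers one node between adjacent parts. The names `accept`, `anneal` and `propose` and the geometric cooling schedule are hypothetical.

```python
import math
import random


def accept(delta_e, temperature, rng=random):
    """Metropolis rule: always accept a decrease in score, otherwise
    accept with probability exp(-delta_e / temperature)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def propose(sizes, rng):
    """Toy transformation: move one node between two adjacent parts.
    The state is returned unchanged if the move is not allowed."""
    donor = rng.randrange(len(sizes))
    receiver = donor + rng.choice((-1, 1))
    if receiver < 0 or receiver >= len(sizes) or sizes[donor] <= 1:
        return sizes
    new = list(sizes)
    new[donor] -= 1
    new[receiver] += 1
    return tuple(new)


def anneal(state, score, propose, t_start, t_end, cooling, moves_per_t,
           rng=random):
    """Anneal from state, trying moves_per_t moves at each temperature
    and multiplying the temperature by cooling until it reaches t_end.
    Returns the best state visited and its score."""
    current, e_cur = state, score(state)
    best, e_best = current, e_cur
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            candidate = propose(current, rng)
            e_cand = score(candidate)
            if accept(e_cand - e_cur, t, rng):
                current, e_cur = candidate, e_cand
                if e_cur < e_best:
                    best, e_best = current, e_cur
        t *= cooling
    return best, e_best
```

A call such as `anneal((17, 1, 1), max, propose, 2.0, 0.01, 0.9, 50, random.Random(0))` starts from a badly balanced split. The best achievable score for this toy problem is 7, since 19 nodes cannot be shared among 3 processors with fewer than 7 on the busiest one.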
FIGURE 4 Partitioning of a 19x19 grid on a 3x3 array of processors after simulated annealing
To enhance the portability of our parallel device solvers, the code is split in two parts: a hardware dependent communication harness and the simulation engine. The communication harness, written using PARIX message passing primitives, provides all global and local communications between the processors required for the operation of the simulation engine on 1D, 2D and 3D arrays of processors.

PARALLEL DISCRETIZATION AND SOLUTION
The Galerkin finite element approach has been adopted to solve the Poisson equation. The calculation of matrix coefficients is carried out by an isoparametric mapping of the quadrilateral elements into rectangles (in the 2D case) or the distorted brick elements into cubes (in 3D). For parallel matrix generation and assembly a node based approach is used where the solution subdomain on each processor is scanned node by node. For each node only the contributions of the elements to this particular node are calculated. This leads to almost 100% efficiency for annealed partitioning. For the non-linear Poisson equation we use a four colour Block Newton SOR scheme.
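The text does not state how the four colours are assigned. One common choice, assumed here, is a parity colouring of the node indices. With bilinear quadrilateral elements each node couples to its eight index neighbours, and the sketch below checks that no two coupled nodes share a colour, so that all nodes of one colour can be updated simultaneously. The function names are hypothetical.

```python
def node_colour(i, j):
    """Parity colouring: one of four colours from the parities of (i, j)."""
    return (i % 2) + 2 * (j % 2)


def colouring_is_valid(nx, ny):
    """True if no node shares a colour with any of its eight index
    neighbours (the nine-point coupling of quadrilateral elements)."""
    for i in range(nx):
        for j in range(ny):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) == (0, 0):
                        continue
                    ni, nj = i + di, j + dj
                    if (0 <= ni < nx and 0 <= nj < ny
                            and node_colour(ni, nj) == node_colour(i, j)):
                        return False
    return True


def colour_sweeps(nx, ny):
    """Group the nodes into four lists, one per colour, in the order in
    which a coloured SOR iteration would sweep them."""
    sweeps = [[] for _ in range(4)]
    for i in range(nx):
        for j in range(ny):
            sweeps[node_colour(i, j)].append((i, j))
    return sweeps
```

Because the colour depends only on the global indices, each processor can colour the nodes of its own subdomain without communication.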
A modified control volume approach has been developed for the discretization of the current continuity equation using quadrilateral elements in the 2D case. Alternatively the quadrilaterals may be subdivided into triangles and the standard control volume procedure applied. In the 3D case a control volume approach amenable to distorted bricks is in the process of development. However each distorted brick may be subdivided into six tetrahedral elements and a standard control volume approach applied. A parallel implementation of the BiCGSTAB(2) method with polynomial preconditioning has been adopted for the current continuity equations.
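The subdivision of a brick into six tetrahedra can be done in several ways, and the text does not say which one is used. The sketch below shows one standard option, assumed here for illustration, in which all six tetrahedra share the main diagonal of the brick. It is written for the unit reference cube, and for a distorted brick the same corner pattern would be applied to its eight nodes. The function names are hypothetical.

```python
from itertools import permutations


def brick_to_tets():
    """Return six tetrahedra, each given as four corners of the unit cube.
    Every tetrahedron follows a path from (0,0,0) to (1,1,1) that steps
    along the three axes in one of the six possible orders."""
    tets = []
    for order in permutations(range(3)):
        vertex = [0, 0, 0]
        path = [tuple(vertex)]
        for axis in order:
            vertex[axis] = 1
            path.append(tuple(vertex))
        tets.append(path)
    return tets


def tet_volume(tet):
    """Volume of a tetrahedron from the determinant of its edge vectors."""
    a, b, c, d = tet
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0
```

Each of the six tetrahedra has volume 1/6, so together they fill the cube. Using the same pattern in every brick keeps the face diagonals of neighbouring bricks consistent.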

EXAMPLE
An illustration of the use of topologically rectangular grids is the finite element 3D simulation of cellular power IGBTs. The complex shape and doping distribution of these devices requires a 3D finite element discretization. This, together with the computational complexity of power device simulation, makes the problem ideal for parallel processing. Because of symmetry only one quarter of the cell has been discretized for calculation of the in-cell breakdown after punchthrough stopper implantation. The grid at this stage (Fig.6) conforms to the shape of the gate and the metallurgical p-n junctions at the cell surface.

CONCLUSIONS
In this paper we have presented a method of using topologically rectangular grids in the simulation of semiconductor devices. Such grids facilitate the development of parallel codes, enhancing their scalability and portability. Two procedures for the automated generation of topologically rectangular grids are outlined. Based on detailed performance theory we have demonstrated that under domain decomposition, optimum performance is achieved when the processor array topology reflects the dimensionality of the simulation problem.