Ensemble Monte Carlo and Full-Wave Electrodynamic Models Implemented Self-Consistently on a Parallel Processor Using Perfectly Matched Layer Boundary Conditions

We have been using a self-consistent formulation of full-wave electromagnetic solvers and ensemble Monte Carlo techniques to model ultrafast photoconductivity. Our simulations are running on a MasPar machine. This paper will address aspects ofthis simulation which may interest workers who are simulating not only photoconductive systems but other systems as well which involve electrodynamics, waves and wave phenomena and ensemble Monte Carlo transport models. In particular, we will report on the inclusion of perfectly matched layer approaches to absorbing boundary conditions for electromagnetic waves. These have in the past several years become widely used in computational electromagnetics codes because they reduce error due to spurious numerical wave reflection off of an absorbing boundary by several orders of magnitude. We will also address the issue of computational cost and show that a full-wave electromagnetic approach is more competitive with a Poisson's equation approach than one might believe. Lastly, our system has the feature that the active portion where the electrons and holes lie is in fact a small fraction of the total experimental system's volume. Unless care is exerted one either has a very significant load imbalance problem or high communications overhead. We compare two different tradeoffs between load imbalance and communications overhead.


INTRODUCTION
Semiconductor device models always incorporate electromagnetic forces, usually by a quasistatic field calculation.The electric field is allowed to vary in time but its spatial derivatives are assumed to be electrostatic in nature, that is X7.E:p/ (1) and 27 E 0. (2) The irrotational electric field in equation ( 2) is the essential approximation being made.In con- trast, one can use an electrodynamic or full-wave electromagnetic model [1][2][3][4][5][6][7][8] in which one solves Maxwell's curl equations v E -O /Ot (3) X7 H J + OD/Ot. (4) In this paper we will discuss this issue using a system where the transport model is an ensemble Monte Carlo model.We will discuss load imbal- ance/communications overhead problems asso- ciated with the parallelization of such a system, show that a full-wave electromagnetic approach is more cost competitive than one might believe and will report on the inclusion of perfectly matched layer approaches to absorbing boundary condi- tions for electromagnetic waves.

TRADEOFFS BETWEEN QUASI-STATIC AND FULL-WAVE MODELS
The physical motivation for switching to a full- wave model of a semiconductor device may be either internal or external to the device.Our devices are often incorporated into a larger system.In some cases, e.g. when computing broadband response in the millimeter-wave and submilli- meter-wave regions, electrodynamic models of the external components are used.Incorporating full-wave techniques into a device model is then quite natural.
The internal physical issue is when does a quasistatic approach to the internal field calcula- tion break down?The typical quasi-static device model solves Poisson's equation v2v -pl (5) to determine a scalar potential function V. Fields and forces are then computed by taking the gradient of this scalar potential i.e.E -VV.(6) The boundary value problem associated with Poisson's equation is computationally demanding.
The gradient operation of equation ( 6) on the other hand is computationally trivial.Poisson's equation can always be used, even in a fully electrodynamic setting, if one makes the appro- priate selection of gauge [9, 10].What fails in electrodynamic setting is the computationally trivial gradient operation of equation (6).Instead,   we must additionally solve for the vector potential and compute the electric field using E -V V OA/Ot.( 7) Equation ( 6) implies the irrotational field of equation (2) but equation (7) does not.
Another part of the tradeoff between quasistatic and full-wave approaches is the issue of computa- tional cost.Before dealing with this issue, a more detailed discussion of our computational system is needed.We have been implementing this system on a MasPar MP-2 machine.This machine has 8192 Processing Elements (PE) connected in a mesh.It is a SIMD machine with two routes for data transfer between elements.The fastest com- munication is between nearest neighbor PEs on a network called the Xnet.The slower route is a global router which communicates between any two PEs.As described before [1-3], we will couple two different calculations together.We self-con- sistently couple an Ensemble Monte Carlo (EMC) model with a finite difference solution of Maxwell's curl equations.These two calculations involve different types of data.The EMC calcula- tion has data which is associated with a specific particle while the electromagnetic calculation has data which is associated with a location in space.There are natural methods for parallelizing both computations but these methods differ in their utilization of the PE array due to this difference in data type.In the EMC calculation one wishes to evenly distribute carries over the PE array while in the field computation one assigns adjoining regions of space to nearest neighbors PEs.Our problem is to simultaneously parallelize both solutions.
Two basic strategies for this simultaneous parallelization will be compared here.In both algorithms, we compute the fields using the adjacent PE approach mentioned above.(The initial condition for the fields is produced by solving Poisson's equation using a red/black SOR technique [11]).In algorithm 1, an equal number of particles is assigned to each processor thus alleviating any load imbalance in the EMC calculation but a high price is paid for commu- nication between the PEs during the process where we switch between the two calculations.In algorithm 2, a region of space is assigned to a single processor even during the EMC calculation.This processor handles the particles found in this region.This minimizes communication overhead but introduces a significant load imbalance as the EMC region is only a small part of the simulation domain.In order to alleviate this load imbalance, the field information for just the EMC region is duplicated throughout the PE array so that carriers can be located on any PE which has the appropriate field information.
We apply this to the physical system shown in Figure 1.A nonuniform grid was used.The grid used a 2 micron spacing in all three dimensions in the passive region and a 0.1 micron spacing in the EMC region.There is a gradual transition between the two extremes.The run times of these two algorithms are shown in Table I.They are broken down into entries for the field calculation, the MC calculation and the coupling of these two.As expected, algorithm spent less time on the EMC calculation due to better load balance but more time in coupling the two calculations due to the communication overhead.Over all, algorithm was somewhat superior to algorithm 2 for this particular example.The performance of the algorithms though is sufficiently close that no conclusion should be reached that algorithm is always superior to algorithm 2. Now, how does a quasistatic computation compare in numerical cost with a full wave approach?Since we have both a Poisson solver and a full-wave model here, we can compare the two.We duplicate the computation just described only we proceed quasistatically.(Of course, a quasistatic solution of a transmission line transient response in unphysical but our goal is to compare computational costs).This run-time data is also shown in Table I along with the number of iterations required per time step.The full-wave model uses iteration per time step while the number of iterations per timestep in the Poisson's solver depends on the convergence criteria.If too many iterations are required per timestep, the quasi-static model loses any cost advantage it may have had over a full-wave solution.In fact, for a very tight convergence criteria, the quasistatic model is signficantly more expensive than the full-wave!.
Examination of Figure shows that we are simulating an open system with electromagnetic waves propagating away towards infinity.As our grid is finite in size we need to minimize the numerical reflection of these outgoing waves at the / LiYaO3 / GaAs / Transmission Line

FIGURE
The system simulated here is a microstrip line structure on a GaAs substrate.A dc bias is applied and the system is which will be excited by a subpicosecond pulse in the Monte Carlo or MC region shown.An electro-optic material is place over the top of the system.The grid size for the electromagnetics code was 127 x 63 x 54. 8192electrons and 8192 holes were included in the Monte Carlo calculations.end of our numerical grid.There has been notice- able advance in such absorbing boundary condi- tions for electromagnetic waves in the past several years, usually associated with the phrase "Perfectly Matched Layer" or PML [12-18].These techni- ques have reduced the numerical error associated with spurious numerical reflection by several orders of magnitude and are particularly advanta- geous for the case of oblique incidence.We use the technique of  although other techniques have been developed as well [15][16][17][18].Here we will outline the conceptual nature of these techniques and refer the reader to the references for details.
The essential idea is to add a fictitious "Perfectly Matched Layer" on the outside of all surfaces where we have to implement a wave absorbing boundary.We then extend our numerical grid out into this fictitious layer, terminating it at the end of this layer.Since this is a purely artifical layer, we are free to pick its material properties to meet two constraints.First, the reflection coefficient asso- ciated with the interface between the valid solution space and the fictitious absorbing space must be zero for all frequencies and all angles of incidence.This will eliminate numerical reflection off of this interface.However, accomplishing this goal just moves the original problem to the new truncation of the grid at the far end of the fictitious absorbing layer.We will have a reflection off of this truncation.Therefore our second constraint on the PML is that it must be a lossy medium which attenuates electromagnetic waves.The techniques described in the references succeed in meeting both constraints.
In summary, one can proceed with an electro- dynamically based approach in the modeling a semiconductor device.For the cases where one is already expending significant computational re- sources e.g. three dimensional device models, quasistatic and electrodynamic techniques do not necessarily differ significantly in computational cost.If one does so, the use of a PML type of absorbing boundary is recommended.Care how- ever may be required if one implements the system on a massively parallel processor array as the field calculation and the transport model may employ very different types of data and additionally, the device itself may be a comparatively small portion of the simulation domain.

TABLE
Computational run times for various solution techniques