Multilevel Algorithms for Large-scope Molecular Dynamics Simulations of Nanostructures on Parallel Computers

curvilinear-coordinate load balancing; iii) hierarchical dynamics via a rigid-body/ implicit-integration/normal-mode approach; iv) variable-charge MD based on electronegativity equalization; and v) multilevel preconditioned conjugate gradient method. Fuzzy clustering is used to facilitate the seamless integration of the multiple levels of

As new fabrication technologies for quantum nanostructures emerge (such as strain-induced self-organized growth [1] and substrate encoded size-reduced epitaxy [2] ), there is growing need for hybrid atomistic/mesoscopic computer simula- tions.Molecular Dynamics (MD) [3] is a powerful tool for the atomistic understanding of long-range stress-mediated phenomena, phonon properties, and mechanical failure of nanostructures.For realistic modeling of nanostructures, however, the scope of simulations must be extended to larger system sizes, longer simulated times, and more complex realism than what has been feasible until recently.In this paper we describe various new multilevel algorithms for large-scale, long-time MD simulations.In MD simulations, a system is represented by a set of atomic coordinates, {xil 1,..., N }, where N is the number of atoms.Time evolution ofthe system is governed by Newton's second law of motion [3], d2xi mi---gi(xi), (1) where m and gi({Xi}):--OV/Ox are the mass and force for the i-th atom.The potential energy function V({x;}) consists of a sum over atomic pairs and triples [4].
The most prohibitive computational problem in MD simulations is associated with the Coulomb potential.Because of its long range, each atom interacts with all the other atoms in the system.Therefore the evaluation of the Coulomb potential requires O(N 2) operations.An MD algorithm is developed based on multiresolutions in both space and time [5].The long-range Coulomb interaction is computed with the Fast Multipole Method (FMM) [6, 7].The FMM uses the truncated multi- pole expansion and local Taylor expansion for the Coulomb potential field (see Fig. 1).By computing both expansions recursively on a hierarchy of cells, the Coulomb potential is computed with O(N) operations.Short-and medium-range non- Coulombic interactions are computed with the Multiple Time-Scale (MTS) approach [8, 9].The MTS method is based on the fact that the farther the distance between particles the slower is the time variation of forces.Therefore different time steps are used to computer forces for diffe- rent interparticle separations.To implement this MultiResolution Molecular Dynamics (MRMD) algorithm on parallel computers, we use spatial decomposition.Processors are logically orga- nized as a cubic array of dimensions Px Py Pz, and we partition the simulation system into subsystem into subsystems of equal volume accordingly (Fig. 2).For a 4.2 million-atom SiO2 system, one MD step takes only 4.8 seconds on the 512-node Intel Touchstone Delta machine [5].The memory-bound parallel efficiency [10] of the program is 0.92 and the communication overhead is 8%.
Simulation of nanostructures is often characterized by irregular atomic distribution.One practical problem in simulating such irregular systems on parallel computers is that of load imbalance [11].Because of the irregular distribution of atoms, the uniform spatial decomposition results in unequal partition of workloads among processors.As a result the parallel efficiency is degraded signifi- cantly.

Physical System Parallel Computer
Mapping FIGURE 2 Regular spatial decomposition scheme for parallel computing.(Left) The physical system is divided into sub- systems of equal volume.Spheres and planes represent atoms and subsystem boundaries, respectively.(Right) Each subsystem is mapped onto a computing node in a parallel computer.
To avoid this problem, a dynamic-load-balan- cing capability is added to the MRMD program [12].The new load-balancing scheme introduces a curvilinear coordinate system [13], {, which is related to the atomic coordinate, x, by a mapping, x / ZxQexp(iQx). (2)  Q Workloads are partitioned with a uniform 3- dimensional mesh in the curvilinear coordinate system.The variational parameters {XQ} are chosen to minimize the load-imbalance and com- munication costs.Simulated annealing is used to solve the optimization problem.For an irregular nanocluster-assembled material, the load-balancing scheme sped up simulations by a factor of 4.2 [12].
Many important material processes (e.g., sinter- ing and sol-gel processes) are characterized by time scales that are many orders-of-magnitude larger than atomic time scales (10-15sec).A new algorithm is developed for large-scale, long-time MD simulations by combining a hierarchy of subdynamics (Fig. 3) [14].Equation ( 1) is numerically integrated using a reference system ri, which represents two types of essential dynamics, i.e., global conformational changes and fast atomic oscillations.The reference system is thus defined as a superposition, r r0i + rhi, where r0. and rm are the collective and harmonic parts, respectively.
The collective part of the reference system represents the rigid-body motion of clusters.We use the quaternion formulation of rigid-body dynamics in order to avoid the numerical singularity associated with angular coordinates [15].A large time step (10 -12 SeC) is used for the numerical integration.The harmonic part rm of the reference FIGURE 3 Various physical processes involved in the sin- tering of nanoclusters, i) Relative rotation of clusters is included by rigid-body dynamics with fuzzy clustering; ii) anharmonic atomic motions lead to surface diffusion and the growth of the neck between clusters, and these motions are included by implicit integration of Newton's equations; iii) thermal atomic motions assist the above diffusion process, and these highfrequency motions are dealt with through the normal-mode analysis.
system represents the fast oscillation of each atom around the local potential minimum, and its equation of motion can be integrated analytically in terms of trigonometric functions [16].The residual system, defined as z x;-r, is expected to vary slowly, since the rapidly oscillating harmonic motions have been subtracted.Therefore its equation is integrated by an implicit integration scheme using At which is much larger than atomic time scales [17].The integration scheme is stable for an arbitrarily large At, and it is also symplectic [17].Symplectic integrators preserve the phase-space volume, and this is essential for the long-time stability of orbitals.
The greatest challenge, however, is to integrate these heterogeneous abstraction levels into a seamless, unified scheme.To facilitate such inte- gration, we find it useful to introduce the concept of fuzzy clustering [18,19].We introduce a mem- bership function P(ic) which describes the degree of association between atom and cluster c.The principle of maximum entropy is used to determine P c).The fuzzy-body/Implicit-inte- gration/Normal-mode (FIN) scheme sped up a simulation of nanocluster sintering by a factor of 28 over a conventional explicit integration scheme, without loss of accuracy.A parallel implementa- tion of the scheme achieves an efficiency of 0.94 for a 12.7 million-atom nanocrystalline solid on 64 nodes of an IBM SP2 computer [14].
Conventional interatomic potential functions used in MD simulations are often fitted to bulk solid properties, and they are not transferable to systems containing defects, cracks, surfaces, and interfaces.In these systems, the partial charges on the atoms vary dynamically according to the change in the local environment.This environ- ment-dependent charge distribution is crucial for the physical properties of these systems including the fracture toughness.Transferability of intera- tomic potentials is greatly enhanced by incorpor- ating variable atomic charges which dynamically adapt to the local environment.Atomic charges can be determined by equalizing electronegativity [20].
However, the increased physical realism in the variable-charge MD is accompanied by increased computational cost for minimizing the electro- static energy at every MD step.This minimization is equivalent to the electronegativity equalization condition that the chemical potentials be equal for all the atoms.This condition leads to a linear equation system for atomic charges, {qi}: (3) where Mini denotes the Coulomb-interaction ma- trix, Xi is the electronegativity, and the Lagrange's multiplier # is determined from the chargeneutrality constraint.
A Multilevel Preconditioned Conjugate-Gradi- ent (MPCG) method is developed for this minimization problem by splitting the Coulomb- interaction matrix into short-and long-range components: M M + Mt [21].The short-range matrix is the contributions from atomic pairs (i,j) within the nearest neighbor leaf cells used in the FMM [5][6][7].The sparse short-range matrix M. is used as a preconditioner to improve the spectral property of the linear system and thereby accel- erating the convergence [22].For c-A1203 crystal, the preconditioner reduces the execution time to achieve the same convergence level by 20% [21].
Numerical tests involving up to 26.5 million atoms are performed on an IBM SP2 computer.Figure 4 shows the parallel efficiency (solid lines) and communication overhead (dashed lines) as a function of the number of atoms.The results with and without preconditioning are denoted by circles and squares, respectively.For the largest system, the preconditioning improves the parallel effi- ciency from 0.92 to 0.95 [21].The communication overhead of the MPCG scheme is 5% of the total execution time for the largest system.The multi- level preconditioning scheme enhances the locality of computation by extensively using the short- range interaction matrix M, and consequently the program runs efficiently on parallel platforms.In summary, we have developed various parallel multilevel algorithms for multiscale phenomena in nanostructures.Using these algorithms, multi- million-atom MD simulations are being performed for: i) nanocluster-assembled Si3N4 [23, 24], SiO2, SiC, and A1203; ii) Si/Si3N4 and A1/A1203 inter- faces; and iii) GaAs stepped surfaces and mesas.Mechanical properties including fracture, long- range stress-mediated phenomena, and phonon properties in these nanostructures are being investigated.
FIGURESchematic representation of spatial multiresolu- tion for a two-dimensional system.(Left) A hierarchy of cells in the fast multipole method.(Right) The direct forces on a particle (solid circle) are due to the primary (open circles within the hatched area), secondary (open circles within the shaded area), and tertiary (the other open circles) neighbor atoms.

FIGURE 4
FIGURE4 Memory-bound parallel efficiency of the Multi- level Preconditioned Conjugate Gradient (MPCG) program (solid lines) as a function of the number of atoms.Open circles and open squares are the results for the MPCG and the Conjugate Gradient (CG) methods, respectively.Communica- tion overheads of the same program are shown by the dashed lines.