Efficient Ab-Initio Electron Transport Calculations for Heterostructures by the Nonequilibrium Green ’ s Function Method

We present an efficient computation technique for ab-initio electron transport calculations based on density functional theory and the nonequilibrium Green’s function formalism for application to heterostructures with two-dimensional (2D) interfaces. The computational load for constructing the Green’s functions, which depends not only on the energy but also on the 2D Bloch wave vector along the interfaces and is thus catastrophically heavy, is circumvented by parallel computational techniqueswith themessage passing interface, which divides the calculations of the Green’s functions with respect to energy and wave vectors. To demonstrate the computational efficiency of the present code, we perform ab-initio electron transport calculations of Al(100)-Si(100)-Al(100) heterostructures, one of themost typicalmetal-semiconductor-metal systems, and show their transmission spectra, density of states (DOSs), and dependence on the thickness of the Si layers.


Introduction
Recent technological advances have made it possible to fabricate nanometer-scale devices.Since device operations are strongly affected by the interfaces of different materials, understanding of the electronic structures and transport properties from atomistic viewpoints has become indispensable for such nanodevices.Ab-initio electron transport calculations based on density functional theory (DFT) [1,2] combined with the nonequilibrium Green's function (NEGF) formalism [3,4] have been widely recognized to be highly advantageous for analyzing the electron transport properties of nanostructures.
Metal-semiconductor-metal heterostructures are one of the most important systems for nanodevices such as field effect transistors (FETs) and capacitors.This system has been studied intensively to clarify the electronic states of interfaces, which determine the performance of nanometerscale electronic devices.For two-dimensional (2D) interfaces, since the electron transport properties depend not only on the incident electron energy across the interfaces but also on the 2D momentum parallel to the interfaces, the computational load for calculating the Green's functions at each incident energy using a 2D Bloch wave vector has become severely demanding.
When the electron-phonon interactions are small and can be ignored, the couplings to other states with a different energy and wave vector are neglected and then the Green's functions are computed separately for different energies and wave vectors.This enables us to perform a parallel computation with message passing interface (MPI) and reduce the computational burden significantly.To demonstrate the efficiency of this technique, here, we perform ab-initio electron transport calculations of metal-semiconductor-metal heterostructures with 2D interfaces using our program code based on DFT and the NEGF formalism [5,6] combined with MPI parallelization with respect to the incident energy and 2D Bloch wave vectors.Specifically, we present electronic states and transmission spectra for Al(100)-Si(100)-Al(100) heterojunctions in which a small number of Si layers are sandwiched between Al electrodes and discuss the dependence of transport properties on the Si layer thickness.

Basis Functions and Matrix
Elements with Two-Dimensional Periodicity.In this paper, we use atomically localized basis functions  k ‖ (r) for the expansion of wavefunctions [7].Here, the numerical pseudoatomic orbitals |  k ‖ ⟩ are the eigenfunctions of the pseudopotentials under the confinement potential [8] in the 2D Bloch representation: where the index  is used as an abbreviation of atom   , multiplicity   , and azimuthal and magnetic quantum numbers   and   , respectively; that is,  = {  ,   ,   ,   }. k ‖ and  represent the 2D Bloch wave vector and the index of the unit cells along the direction parallel to the electrode interfaces, respectively.Using these basis functions, the overlap matrix and the Kohn-Sham Hamiltonian matrix are written as The overlap matrix and the Kohn-Sham Hamiltonian matrix in (2) are calculated by the standard procedure for DFT calculations using the numerical pseudoatomic orbital basis set [9].

Charge Density Construction with Massive Parallelization.
In the DFT-NEGF calculations, the density matrix is obtained by integration of the Green's function with respect to energy and the wave vectors [3,4].For a system with 2D periodicity, the density matrix is calculated as follows: Here,  L and  R are the chemical potentials of the left and right electrodes, and  r and  < are the retarded and lesser Green's functions, respectively, where  BZ is the area of the Brillouin zone in the 2D reciprocal space.The two Green's functions are calculated from where  a is the advanced Green's function and Σ L and Σ R are the self-energy matrices of the left and right electrodes, respectively, which are obtained using the surface Green's function technique: The first term on the right hand side of (3) is the contribution from equilibrium states and the second term is the contribution from nonequilibrium states.They are integrated along the complex energy  contour and the real energy  axis, respectively, as shown in Figure 1.
The most time consuming part of the computation based on the DFT-NEGF formalism is the calculation of the Green's functions  r and  < to obtain the density matrix by integration of the Green's functions with respect to energy and the wave vector.In a typical system, the total number of discretized points for the energy and the wave vector in practical calculations are about 10 2 -2 × 10 2 and 10 2 -10 3 , respectively, and thus the total number of discretized points is of the order 10 4 -10 5 .This heavy computational load is imposed at each iteration procedure for self-consistent field (SCF) in the DFT-NEGF formalism, causing difficulty in the SCF transport calculation.
This heavy computational load can be avoided by using parallel computing.When the electron-phonon interactions are small and can be ignored, the Green's functions depend only on a single set of the energy and the wave vectors as shown in (4), which means that the calculations of the Green's function are performed independently for various sets of energies and wave vectors.The calculation of (3) is separated by parallelization with MPI as shown in Figure 2(a), and the MPI processes are distributed among multicore processors.The integration of the Green's functions is also performed in parallel.

Yes
No Eq (3) and ( 4) Eq ( 8) and ( 9) The Green's functions are calculated independently for various sets of complex/real energies and 2D Bloch wave vectors, whose processes are assigned to multicore processor by MPI.During the calculations of the Green's functions, the density matrix is partially calculated by integration with respect to the energy and the wave vectors in each core.After the calculations of the density matrix in each core are completed, the Hamiltonian matrix is constructed by summing the elements with communication among MPI processes, and then all the matrix elements of the density matrix are obtained.(b) Elapsed time and speed-up ratio of total elapsed time resulting from the parallelization by the MPI in the ab-initio electron transport calculations for an Al(100)-Si(100)-Al(100) heterostructure.The speed-up ratio is calculated as  1 /  , where  1 and   are the elapsed times for serial and parallel computing, respectively.The number of discretized grid points in the calculations are  k ‖ = 121,   = 50, and   = 100.For reference, the ideal speed-up ratio is also represented by the green dotted line.The calculation speed increased linearly while lagging slightly behind the ideal speed-up ratio.
The partially integrated density matrix    in each core is calculated using where   k ‖ // and  k ‖ // are the total number of discretized points and the weight factor for the integration with respect to k ‖ , , and , respectively.After the partial integration, the density matrix    in each processor is distributed among all the processors and is summed to obtain the total density matrix   .Only values of    are communicated in the parallel computing, which significantly reduces the MPI communication latency.Since the total number of discretized points of energy and wave vectors is more than 10 3 , the computational efficiency is improved as the total number of processors increases.The number of distributed arrays in the communication between the processors required for the calculation of the density matrix becomes larger as a trade-off in such large-scale parallel computing, and thus the maximum efficiency should be determined.
The speed-up ratio of the present numerical code is shown in Figure 2(b) as a function of the total number of cores for ab-initio SCF electron transport calculations of an Al(100)-Si(100)-Al(100) heterostructure.Here, the speed-up ratio is defined as  1 /  , where  1 and   are the total elapsed times for serial and parallel computing, respectively.We see that the speed-up ratio exhibits almost linear behavior with increasing number of cores, showing the efficiency of MPI parallel computing using the present code.This enables us to perform the transport calculations of large-scale systems with 2D interfaces.
After the SCF loop converges, the transmission function () for the electron transport and the density of states (DOS) () are calculated from x, y z Left electrode Right electrode Figure 3: Schematic diagram of Al(100)-Si(100)-Al(100) heterostructure.The system is periodic along the  and  directions parallel to the interfaces and the transport direction is taken along the  direction, which is oriented in the (100) direction normal to the interfaces.The square region surrounded by dashed lines represents 2D unit cell.Here,  0 and  are the lattice constant of a Si monolayer and the total thickness of the Si layers, respectively.
The electric current () flowing through the system at an applied bias voltage  is obtained by integrating the transmission function with respect to the energy within the different chemical potentials of the left and right electrodes as Note that, in the same way as for the parallel computations of the Green's functions in the SCF calculations, the computations of the Green's functions along the real energy axis for the transmission function T() are also performed by parallelization with respect to the energy and the wave vectors.

Results
As an application of the present method, we calculate the electron transport properties of Al(100)-Si(100)-Al(100) heterostructures.The system is composed of a small number of Si layers sandwiched between Al electrodes with (100) direction normal to the interfaces in the transport direction.
A schematic diagram of the system is shown in Figure 3.For this system, we consider the dependence of the transport properties and the electronic states on the thickness of the Si layers  between the interfaces.First, note that the system becomes metallic when the Si layers are thin owing to the hybridization of electronic states of metallic electrodes.On the other hand, the system becomes semiconducting with a finite energy gap as the Si layers become thick.It is interesting to consider the Si layer thickness needed for the system to show semiconducting behavior.
Here, we analyze the transport properties of Al(100)-Si(100)-Al(100) heterostructures with various Si layer thicknesses for the bulk lattice constant of  0 = 5.431 [ Å]. Single- polarized basis sets [8] with the GGA-PBE functional [10] for the exchange-correlation energy and nonlocal pseudopotentials in the fully separable Kleinman-Bylander form [11] are used in the calculations.
Figure 4 shows the transmission probability (Figure 4(a)) through the two Al electrodes and the DOS (Figure 4(b)) for various thicknesses of Si layers.We see that the system becomes metallic owing to metallic electrode contacts in the case of the minimum Si layer thickness of  =  0 .We also note that the direct interaction between the electrodes exists when their distance is less than twice the cutoff radius of the basis functions for Al atoms.In the case of more than three Si layers, finite energy gaps appear around the Fermi level, which originate from the semiconducting properties of the Si layers.Since the lattice constants of Al and Si along the  and  directions are comparable, that is,  Al / Si = 0.997 in the present heterostructure, the electron density and effective potential near the center of the Si layers rapidly recover the bulk values as the Si layers become thick.Thus, the electron transport properties at the atomic scale are governed by the thickness of the layers.

Conclusions
We have developed an efficient numerical calculation code for ab-initio electron transport with 2D interfaces based on DFT and the NEGF formalism in atomically localized basis sets.The parallel computation techniques with the MPI are employed for efficient computation of the density matrix via the Green's functions, which depend on the energy and 2D wave vectors parallel to the interfaces.Since each matrix element of the Green's functions depends only on a single energy and 2D wave vector point, it is possible to divide the calculations with respect to the energy and the wave vectors, which are carried out separately in different processors.
As a demonstration of the efficiency of the present code, numerical calculations of the electron transport properties of Al(100)-Si(100)-Al(100) heterostructures are presented, where the total number of points of energy and wave vectors is about 10 5 .The computational efficiency improves as the total number of processors increases, with the communication latency between the processors becoming negligible.As an application of the present technique, electron transport properties such as transmission spectra and the DOS were investigated for the above system with different thicknesses of Si layers.The system shows metallic behavior with finite transmission for the entire energy regime for the minimum Si layer thickness.For a thickness of more than three-layers, finite energy gaps are formed around the Fermi level, which originate from the semiconducting properties of the Si layers.

Figure 1 :
Figure1: Integration contour for the Green's functions used to calculate the density matrix.The complex contour  is discretized by the Gauss-Legendre scheme, where the total number of grid points of contour  is typically about 40.The contour along the real axis is discretized by a fine mesh.

Figure 2 :
Figure 2: (a) Flowchart of DFT-NEGF calculations with parallelization by MPI.The calculation process surrounded by the dashed line corresponds to the calculations of the Green's functions, which are the central and most time consuming parts of the present formalism.The Green's functions are calculated independently for various sets of complex/real energies and 2D Bloch wave vectors, whose processes are assigned to multicore processor by MPI.During the calculations of the Green's functions, the density matrix is partially calculated by integration with respect to the energy and the wave vectors in each core.After the calculations of the density matrix in each core are completed, the Hamiltonian matrix is constructed by summing the elements with communication among MPI processes, and then all the matrix elements of the density matrix are obtained.(b) Elapsed time and speed-up ratio of total elapsed time resulting from the parallelization by the MPI in the ab-initio electron transport calculations for an Al(100)-Si(100)-Al(100) heterostructure.The speed-up ratio is calculated as  1 /  , where  1 and   are the elapsed times for serial and parallel computing, respectively.The number of discretized grid points in the calculations are  k ‖ = 121,   = 50, and   = 100.For reference, the ideal speed-up ratio is also represented by the green dotted line.The calculation speed increased linearly while lagging slightly behind the ideal speed-up ratio.

Figure 4 :
Figure 4: Transmission probability (a) and DOS (b) for various thicknesses of Si layers around the Fermi level.The system shows metallic behavior with finite transmission for the entire energy regime for the minimum Si layer thickness.For a thickness of more than three-layers, finite energy gaps are formed around the Fermi level, which originate from the semiconducting properties of the Si layers.