The parallel FDTD method is applied to analyze the electromagnetic problems of electrically large targets on a supercomputer. It is well known that the more processors are used, the less computing time is consumed. Nevertheless, with the same number of processors, computing efficiency is affected by the choice of MPI virtual topology. Therefore, the influence of different virtual topology schemes on the parallel performance of parallel FDTD is studied in detail, and general rules are presented on how to obtain the highest efficiency of the parallel FDTD algorithm by optimizing the MPI virtual topology. To show the validity of the presented method, several numerical results are given in the later part, various comparisons are made, and some useful conclusions are summarized.

The finite-difference time-domain (FDTD) method, introduced by Yee in 1966 [

The code has now been developed to solve larger-scale electromagnetic problems. It runs on a Linux-based supercomputer belonging to the Shanghai Supercomputer Center of China (SSC). Numerical examples prove that the virtual topology severely affects the computational efficiency of parallel FDTD. In this paper, the influence of different virtual topology schemes on the parallel performance of parallel FDTD is studied in detail, and general rules are presented on how to obtain the highest efficiency of the parallel FDTD algorithm by optimizing the MPI virtual topology.

In Section

MPI was proposed as a standard by a broadly based committee of vendors, implementers, and users. It has since become the standard interface for communication among the nodes of a cluster or the processors of a multiprocessor parallel computer. The key problem in MPI-based programming is how to distribute the users' tasks to processors according to the capability of each processor while reducing interprocessor communication as much as possible. Reducing communication is especially crucial because communication is far slower than computation.

FDTD is easy to implement because the Yee scheme is explicit. Besides, it has the principal advantage that, since the grid is regular and orthogonal, electromagnetic field components are easily indexed by

Division and communication of Parallel FDTD in 3D.

Partition of the simulated problem

Field components communication

The FDTD algorithm is combined with MPI to run on a parallel system. MPI functions are employed to exchange the tangential electric (magnetic) field components on the boundaries of each subdomain with its adjacent neighbors.

The parallel algorithm can be described as follows.

1. Initialization:
   (a) MPI initialization;
   (b) reading the modeling parameters from the input files;
   (c) creation of the three-dimensional Cartesian topology;
   (d) creation of the derived data types for communication;
   (e) start of time measurement;
   (f) allocation of memory;
   (g) setting all field components to zero.
2. At each time step:
   (a) exciting the source only on processors that include the source plane;
   (b) calculation of new magnetic field components on each processor;
   (c) communication of the magnetic field components between processors;
   (d) calculation of new electric field components on each processor;
   (e) communication of the electric field components between processors;
   (f) calculation of the transmission only on processors that include the transmission plane;
   (g) collection of field variables only on processors that include detection points;
   (h) reducing the transmission to a fixed processor and writing it to file.
3. Saving results in files.
4. Deallocation of memory.
5. Stop of time measurement.
6. MPI finalization.
7. End.
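The time-stepping cycle of the algorithm (update H, exchange H-field boundary values, update E, update the shared boundary E) can be sketched in pure Python, reduced to 1D for brevity. The two MPI ranks are emulated by two local array pairs, and the send/receive of boundary field components is replaced by direct reads of the neighbor's array, so the sketch runs without an MPI runtime. Grid size, Courant number, and source are illustrative assumptions, not the paper's setup.

```python
import numpy as np

N, steps, c = 200, 300, 0.5          # total cells, time steps, Courant number
half = N // 2

# "Rank 0" owns global Ez[0..half] and Hy[0..half-1]; "rank 1" owns
# Ez[half..N] and Hy[half..N-1]. The point Ez[half] is shared: each rank
# updates its own copy after receiving the neighbor's boundary Hy.
ez0, hy0 = np.zeros(half + 1), np.zeros(half)
ez1, hy1 = np.zeros(half + 1), np.zeros(half)

for n in range(steps):
    # Excite a soft Gaussian source only on the rank containing it (rank 0).
    ez0[half // 2] += np.exp(-((n - 30) / 10.0) ** 2)

    # Update H on each rank; with the overlapped shared Ez point, the 1D
    # H update is fully local.
    hy0 += c * (ez0[1:] - ez0[:-1])
    hy1 += c * (ez1[1:] - ez1[:-1])

    # Update interior E, then the shared boundary E using the neighbor's Hy
    # (this read stands in for an MPI send/receive of boundary components).
    ez0[1:-1] += c * (hy0[1:] - hy0[:-1])
    ez1[1:-1] += c * (hy1[1:] - hy1[:-1])
    ez0[-1] += c * (hy1[0] - hy0[-1])    # rank 0 uses rank 1's boundary Hy
    ez1[0] += c * (hy1[0] - hy0[-1])     # rank 1 uses rank 0's boundary Hy

# After any number of steps, the shared boundary value is identical on
# both ranks, as required for a consistent global solution.
```

Cells ez0[0] and ez1[-1] are never updated, which corresponds to PEC outer walls; in the actual 3D code, UPML absorbing layers and three-dimensional halo exchanges take their place.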

Its machine-type model is 4155-D43, with a total of 24 Intel(R) Xeon(R) X5650 CPU cores (2.67 GHz per CPU) and total RAM of approximately 64 GB.

The 37 nodes from the Magic-cube machine provide a total of 512 AMD CPU cores (1.9 GHz per CPU, 4 cores per CPU), with 16 CPU cores per node and 4 GB RAM per core, for total RAM of approximately 2.3 TB. InfiniBand is used for the network interconnection.

For the absorbing UPML medium, we use a thickness of 5 cells in the following examples.
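A common way to set up such a 5-cell UPML layer is polynomial grading of the conductivity across the layer. The grading order m, target reflection R0, and cell size below follow the usual polynomial-grading prescription and are illustrative assumptions, not values stated in the paper:

```python
import math

# Polynomial-graded UPML conductivity profile (illustrative parameters).
d_cells = 5            # UPML thickness in cells, as in the examples above
m = 4                  # grading order (assumed)
R0 = 1e-6              # target normal-incidence reflection (assumed)
dx = 1e-3              # cell size in meters (assumed)
eta0 = 376.730313668   # free-space wave impedance

# Maximum conductivity at the outer edge of the layer.
sigma_max = -(m + 1) * math.log(R0) / (2 * eta0 * d_cells * dx)

# Conductivity sampled at the center of each of the 5 cells, rising
# polynomially from the inner interface to the outer boundary.
sigma = [sigma_max * ((i + 0.5) / d_cells) ** m for i in range(d_cells)]
```

The smooth polynomial rise keeps the numerical reflection at the vacuum-UPML interface small while still absorbing strongly by the outer wall.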

For validation, the bistatic RCS is calculated for a PEC sphere with a 1 m (

Comparisons of computation time.

| Cores | Virtual topology | Computation time |
| --- | --- | --- |
| 1 |  | 646.25 |
| 2 |  | 330.75 |
| 4 |  | 172.25 |
| 4 |  | 170.88 |
| 8 |  | 98.38 |
| 8 |  | 92.75 |
| 8 |  | 91.25 |
| 16 |  | 95.12 |
| 16 |  | 91.38 |
| 16 |  | 89.00 |
| 24 |  | 69.38 |
| 24 |  | 67.38 |

(a) Comparison of the RCS results in

Smooth contour fill of near field distribution in frequency domain on three planes:

In Table

From Table

We discuss the parallel performance of parallel FDTD using virtual topologies of different dimensions with the same number of processors.

From the above, it is obvious that for the same number of processors, the higher the dimension of the virtual topology, the less computation time is required.
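As a quick check, the speedup and parallel efficiency implied by the timing table above can be computed directly from the 1-core time and the best 24-core time (the time unit is assumed to be seconds):

```python
# Speedup and parallel efficiency from the comparison table above:
# 646.25 on 1 core versus 67.38 on 24 cores with the best virtual topology.
t1, t24, cores = 646.25, 67.38, 24
speedup = t1 / t24            # about 9.59
efficiency = speedup / cores  # about 0.40, i.e. roughly 40% efficiency
print(round(speedup, 2), round(efficiency, 2))
```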

Parallel efficiency is shown in Figure

Parallel efficiency.

In addition, for a virtual topology of the same dimension, partitioning along the direction with the larger number of FDTD grids saves computation time. Different subdomain divisions with the same topology dimension lead to different amounts of transferred data. The total number of grids lying on the interfaces between processors is N_int = (px - 1)NyNz + (py - 1)NxNz + (pz - 1)NxNy, where Nx, Ny, and Nz are the numbers of FDTD grids along the three axes and px, py, and pz are the numbers of processors along each axis in the virtual topology.
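Assuming the standard count for a Cartesian decomposition, in which each cut plane contributes the full cross-section of grids perpendicular to it, the interface total for a (px, py, pz) topology of an (Nx, Ny, Nz) grid can be computed as follows; the grid dimensions below are illustrative, not those of the examples:

```python
# Total grids on subdomain interfaces for a (px, py, pz) virtual topology
# of an (Nx, Ny, Nz) FDTD grid: each of the (p-1) cut planes along an axis
# contributes the full perpendicular cross-section.
def interface_grids(N, p):
    Nx, Ny, Nz = N
    px, py, pz = p
    return (px - 1) * Ny * Nz + (py - 1) * Nx * Nz + (pz - 1) * Nx * Ny

N = (300, 200, 100)   # illustrative grid, not the paper's
# With the same 8 cores, a 3D topology exchanges far fewer grids than a 1D
# one, and a 1D split along the longest axis beats one along the shortest.
print(interface_grids(N, (2, 2, 2)))   # 110000
print(interface_grids(N, (8, 1, 1)))   # 140000
print(interface_grids(N, (1, 1, 8)))   # 420000
```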

The general rules on how to obtain the highest efficiency of the parallel FDTD algorithm by optimizing the MPI virtual topology can now be drawn as follows.

If possible, the virtual topology should be created in three dimensions; a two-dimensional topology is the next best choice, and both yield higher efficiency than a one-dimensional topology.

For a virtual topology of the same dimension, the topology should partition the grid along the directions where the number of FDTD grids is larger.
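Both rules can be folded into a small brute-force search that, under the assumption that communication cost is proportional to the number of grids on subdomain interfaces, enumerates every factorization of the core count and keeps the cheapest topology; the core counts and grid sizes below are illustrative assumptions:

```python
from itertools import product

# Brute-force choice of the MPI virtual topology minimizing the number of
# grids lying on subdomain interfaces (a proxy for communication volume).
def best_topology(grid, cores):
    Nx, Ny, Nz = grid
    best = None
    for px, py, pz in product(range(1, cores + 1), repeat=3):
        if px * py * pz != cores:
            continue  # not a valid factorization of the core count
        cost = (px - 1) * Ny * Nz + (py - 1) * Nx * Nz + (pz - 1) * Nx * Ny
        if best is None or cost < best[1]:
            best = ((px, py, pz), cost)
    return best

# Cubic grid: the fully 3D topology wins (rule 1).
print(best_topology((200, 200, 200), 8))   # ((2, 2, 2), 120000)
# Grid elongated along x: more partitions go along x (rule 2).
print(best_topology((300, 200, 100), 8))   # ((4, 2, 1), 90000)
```

Exhaustive search is cheap here because the number of factorizations of the core count is tiny compared with the FDTD run time itself.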

A waveguide with ten slots is analyzed by parallel FDTD. The dimensions of the waveguide and the slot structure in this example are chosen as follows: the thickness of the waveguide wall is 1.27 mm, the length of each slot is 15.785 mm, the width of each slot is 2.54 mm, and all slot offsets are 6.35 mm. Its FDTD model is shown in Figure

Comparisons of computation time.

| Cores | Virtual topology | Total grids on interfaces between processors | Computation time |
| --- | --- | --- | --- |
| 8 |  | 24500 | 121.09 |
| 8 |  | 37380 | 124.56 |
| 8 |  | 29700 | 116.09 |
| 8 |  | 49580 | 118.00 |

(a) Model of the waveguide with ten slots; (b) smooth contour fill of E field distribution of the

Results of radiation: (a) radiation pattern of the

From Table

Then we analyze the scattering of a perfectly conducting airplane whose FDTD model is shown in Figure

(a) Cartesian Mesh of the airplane and (b) smooth contour fill of the amplitude of surface inductive current on the airplane surface.

RCS of the airplane. (a) E-plane (

Smooth contour fill of near-field distribution in frequency domain on

This example is calculated on a supercomputer with 512 cores belonging to the SSC. The time consumed by two virtual topology schemes with the same number of cores is listed in Table

Comparisons of computation time.

| Cores | Virtual topology | Computation time | Total grids on interfaces between processors |
| --- | --- | --- | --- |
| 512 |  | 896.00 | 2479680 |
| 512 |  | 3968.00 | 3000640 |

In this paper, the parallel FDTD method is applied to analyze the scattering of electrically large targets. The code we developed runs successfully on a supercomputer at the Shanghai Supercomputer Center of China (SSC). The influence of different virtual topology schemes on the parallel performance of parallel FDTD is studied in depth. The results show that computational efficiency can be improved by properly choosing the MPI virtual topology scheme; following the two rules above, the highest computational efficiency can be obtained.

This work is partly supported by the Fundamental Research Funds for the Central Universities of China (JY10000902002 and K50510020017) and the National Science Foundation of China (61072019). This work is also supported by Shanghai Supercomputer Center of China (SSC).