In parallel computing based on finite element analysis, domain decomposition is a key preprocessing technique. Generally, a domain decomposition of a mesh can be realized by partitioning a graph converted from the finite element mesh. This paper discusses methods for graph partitioning and ways to implement mesh partitioning. Relevant software packages are introduced, together with the data structures and key functions of Metis and ParMetis. A mesh partitioning interface program based on these key functions is written, compiled, and tested. The results indicate some objective rules and characteristics that can guide users who apply graph partitioning algorithms and software when writing PFEM programs, and good partitioning results can be achieved by performing mesh partitioning through the program. The interface program can also be used directly by engineering researchers as a module of PFEM software. It can thereby lower the threshold for applying graph partitioning algorithms, improve computational efficiency, and promote the application of graph theory and parallel computing.
Numerical simulation has become an important means of solving complex scientific and engineering problems. Scientific and technical workers in many fields have developed specialized CAE programs that offer solutions to a variety of practical problems. However, as the problems addressed by simulation become more complex, many professional software packages fail to meet current demands on computation speed, data accuracy, and computation scale. With increasing the speed and scale of calculation as its primary objective, high-performance computing has become a fundamental solution to this problem. Many commercial CAE packages have developed parallel computing functions, mostly based on domain decomposition. Domain decomposition refers to the coarse-grained decomposition of the entire computational domain. Therefore, the effectiveness and efficiency of domain decomposition have a great influence on the performance of such software.
The optimal partitioning of irregular and unstructured graphs is the key to efficient scientific simulation on high-performance parallel machines. For finite element computing, especially on distributed-memory systems, the physical mesh needs to be mapped onto the processors so that each processor receives the same number of mesh cells and the exchange of information between processors is minimal. Therefore, a proper way is to define the mesh as a graph and then perform graph partitioning [
The methods for graph partitioning include geometric method [
Multilevel partitioning process.
The
For large-scale graph partitioning (at the level of tens of millions and above), the memory of a single processor is insufficient. In parallel computing for self-adaptive problems, the graph needs to be repartitioned. With serial partitioning, the graph has to be gathered onto a single processor, which becomes a serious bottleneck for the efficiency of the entire computation. Thus, parallel graph partitioning is especially important. At present, parallel graph partitioning mainly targets the parallelization of the geometric, spectral, and multilevel methods. The geometric method is easy to parallelize, but it usually requires a parallel sorting algorithm. By contrast, the spectral and multilevel methods are more difficult to parallelize. The execution time of these methods is equal to the time needed for a parallel matrix-vector multiplication with a randomly distributed matrix. For a graph that already has an initial partitioning, the time needed by the parallel multilevel and spectral methods is reduced to that of a matrix-vector multiplication with an already partitioned matrix. Generally, no initial partitioning is available in static graph partitioning, whereas a high-quality initial partitioning is available in repartitioning. Therefore, the parallel execution time for self-adaptive problems is shorter than that for static graph partitioning. Thus, the initial partitioning should be done with the geometric partitioning method, followed by multilevel repartitioning to improve parallel efficiency. At present, both ParMetis and Jostle have implemented parallel multilevel partitioning [
Mesh partitioning for parallel finite element computing is based on two principles: ensuring the same number of nodes in each subdomain and minimizing the number of nodes on the interfaces between subdomains. The purpose of the two principles is to balance the load across processors and to minimize the communication volume between processors. Mesh partitioning by means of graph partitioning first requires a graph that models the computational structure of the mesh, that is, the mesh must be converted to a graph. Each node of the mesh can be taken as a vertex of the graph, with the connecting lines between nodes as the edges of the graph. The graph obtained in this way is called the nodal graph. Alternatively, each mesh cell can correspond to a vertex of the graph, so that an edge exists between two vertices when the corresponding cells share a face or an edge. This graph is called the dual graph. These two graphs correspond to partitioning based on mesh nodes and mesh cells, respectively. Figure
Graph for the mesh.
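The two conversions described above can be sketched in a few lines. The following is a minimal illustration, not the conversion routine used by Metis: it assumes a small, hypothetical triangular mesh given as a list of node-index triples, and the function names `nodal_graph` and `dual_graph` are chosen only for this example. For triangles, every pair of nodes in a cell is joined by a mesh edge, so pairing the nodes of each cell reproduces the connecting lines between nodes; other cell types would need an explicit edge list per cell.

```python
from collections import defaultdict
from itertools import combinations


def nodal_graph(triangles):
    """Vertices are mesh nodes; an edge joins two nodes connected by a mesh edge."""
    adj = defaultdict(set)
    for cell in triangles:
        # In a triangle, every pair of nodes is a mesh edge.
        for a, b in combinations(cell, 2):
            adj[a].add(b)
            adj[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def dual_graph(triangles):
    """Vertices are mesh cells; an edge joins two cells that share a mesh edge."""
    edge_to_cells = defaultdict(list)
    for cell_id, cell in enumerate(triangles):
        for a, b in combinations(sorted(cell), 2):
            edge_to_cells[(a, b)].append(cell_id)
    adj = {cell_id: set() for cell_id in range(len(triangles))}
    for cells in edge_to_cells.values():
        for c, d in combinations(cells, 2):
            adj[c].add(d)
            adj[d].add(c)
    return {cell_id: sorted(nbrs) for cell_id, nbrs in adj.items()}


# Hypothetical mesh: a square with corner nodes 0..3 and centre node 4,
# split into four triangles around the centre.
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
nodal = nodal_graph(triangles)
dual = dual_graph(triangles)
print("nodal graph:", nodal)
print("dual graph:", dual)
```

In this small mesh the nodal graph has five vertices and the dual graph has four, which mirrors the distinction between node-based and cell-based partitioning.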
In the past two decades, graph partitioning methods have developed rapidly, giving rise to many graph partitioning software packages, such as METIS, ParMETIS, Chaco, Jostle, Party, and Scotch. Of these, ParMETIS and Jostle have also implemented parallel graph partitioning. Among these software packages, the most representative ones are METIS and its parallel version ParMETIS. METIS is a set of programs for serial graph partitioning, finite element mesh partitioning, and the computation of fill-reducing orderings of sparse matrices. The implemented algorithms include multilevel recursive bisection, multilevel
Metis provides several programs, including partnmesh and partdmesh, which partition a mesh by converting it to a nodal graph and a dual graph, respectively. By calling the two programs with input in the corresponding data format, mesh partitioning based on nodes and on cells can be carried out. The data file format is shown in Figure
Data file and node numbering format.
ParMetis is mainly a function library. To perform graph partitioning, the user needs to write a program that calls the ParMetis functions. The required data structure for the graph is DCSR (distributed compressed storage format). As for the graph shown in Figure
Data format on three processors.
A simple graph.
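To make the distributed format concrete, the sketch below splits a global compressed (CSR) adjacency structure into per-processor pieces. It is an illustration under stated assumptions rather than the paper's data: the graph is a hypothetical 2 x 3 grid, not the graph in the figure, and `to_distributed_csr` is a helper name introduced here. The array names `vtxdist`, `xadj`, and `adjncy` follow the ParMetis convention: `vtxdist` gives the range of vertices owned by each processor, each local `xadj` starts at 0, and `adjncy` stores global vertex numbers.

```python
def to_distributed_csr(xadj, adjncy, vtxdist):
    """Split a global CSR graph into one (local_xadj, local_adjncy) pair per processor."""
    parts = []
    for rank in range(len(vtxdist) - 1):
        first, last = vtxdist[rank], vtxdist[rank + 1]
        start = xadj[first]
        # Local offsets are shifted so that each processor's xadj begins at 0.
        local_xadj = [xadj[v] - start for v in range(first, last + 1)]
        # Neighbour lists keep their global vertex numbers.
        local_adjncy = adjncy[start:xadj[last]]
        parts.append((local_xadj, local_adjncy))
    return parts


# Hypothetical 2 x 3 grid graph (vertices 0-1-2 on the first row, 3-4-5 on the second).
xadj = [0, 2, 5, 7, 9, 12, 14]
adjncy = [1, 3, 0, 2, 4, 1, 5, 0, 4, 1, 3, 5, 2, 4]

# Three processors owning two vertices each: [0, 2), [2, 4), [4, 6).
vtxdist = [0, 2, 4, 6]

parts = to_distributed_csr(xadj, adjncy, vtxdist)
for rank, (local_xadj, local_adjncy) in enumerate(parts):
    print(f"processor {rank}: xadj={local_xadj} adjncy={local_adjncy}")
```

Each processor thus holds only the adjacency lists of its own vertices, while `vtxdist` is replicated on all processors so that the owner of any global vertex number can be found.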
The ParMETIS_V3_PartKway function can then be called to perform graph partitioning. If the vertex coordinates of the graph are available, parallel graph partitioning can be carried out with the ParMETIS_V3_PartGeomKway and ParMETIS_V3_PartGeom functions. As they both use the coordinate-based multiconstraint
Building on the functions provided by ParMetis, the authors wrote an interface program for ParMETIS_V3_PartMeshKway, which reads the data file in parallel, converts the mesh to a graph, and then performs the parallel partitioning. Take a finite element mesh as an example. For a mesh with 810,000 cells and 836,381 nodes, the Metis and ParMetis functions are each called to partition the mesh. The data show that the partitioning results of Metis and ParMetis are consistent, as displayed in Table
The results for multilevel recursive bisection algorithm.
Graph type | Subdomain 1 | Subdomain 2 | Subdomain 3 | Subdomain 4 | Cut edge number |
---|---|---|---|---|---|
Nodal graph | 209095 | 209095 | 209095 | 209096 | 22002 |
Dual graph | 202500 | 202500 | 202500 | 202500 | 21242 |
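As a quick check of the load balance reported in the table above, the subdomain sizes can be compared with the mesh totals. The sketch below is only an arithmetic illustration; the imbalance measure used here (largest subdomain divided by the average subdomain size) is an assumption of this example and is not stated in the paper.

```python
def imbalance(sizes):
    """Ratio of the largest subdomain to the average subdomain size (1.0 is perfect)."""
    return max(sizes) * len(sizes) / sum(sizes)


# Subdomain sizes taken from the table: nodes for the nodal graph, cells for the dual graph.
nodal_sizes = [209095, 209095, 209095, 209096]
dual_sizes = [202500, 202500, 202500, 202500]

# The sizes add up to the node and cell counts of the test mesh.
assert sum(nodal_sizes) == 836381
assert sum(dual_sizes) == 810000

print(f"nodal graph imbalance: {imbalance(nodal_sizes):.7f}")
print(f"dual graph imbalance:  {imbalance(dual_sizes):.7f}")
```

Since 836,381 is not divisible by 4, one subdomain of the nodal graph necessarily holds one extra node, so both partitionings are as balanced as the mesh allows.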
Metis (including ParMetis) uses two types of objective function. One minimizes the edge cut, while the other minimizes the communication volume, which is represented by the sum of the numbers of domains to which the boundary points belong or the total number of data transmissions. For the partitioning of finite element meshes, the two objective functions generally produce consistent results. The above test used the objective of minimizing the edge cut. Table
The comparison of the partitioning time of Metis and ParMetis (second).
Program | Stage | 4 partitions | 16 partitions | 64 partitions | 256 partitions |
---|---|---|---|---|---|
Metis | I/O | 2.350 | 2.350 | 2.400 | 2.370 |
Metis | Partition | 3.730 | 3.750 | 4.130 | 4.700 |
Metis | Total time | | | | |
ParMetis (2CPU) | Mesh2Dual | 2.559 | 2.568 | 2.567 | 2.571 |
ParMetis (2CPU) | Partition | 1.533 | 1.663 | 1.81 | 2.047 |
ParMetis (2CPU) | Total time | | | | |
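The two objective functions described before the timing table, edge cut and communication volume, can be illustrated on a small example. This is a minimal sketch assuming unit vertex and edge weights; the graph (a hypothetical 2 x 3 grid), the partition vector, and the function names are introduced only for illustration. The communication volume is computed here by counting, for each vertex, the distinct other subdomains that contain at least one of its neighbours, that is, the number of times its data must be sent.

```python
def edge_cut(adj, part):
    """Number of edges whose two end vertices lie in different subdomains."""
    return sum(
        1
        for u, nbrs in adj.items()
        for v in nbrs
        if u < v and part[u] != part[v]  # count each undirected edge once
    )


def communication_volume(adj, part):
    """Sum over vertices of the number of other subdomains holding a neighbour."""
    return sum(len({part[v] for v in adj[u]} - {part[u]}) for u in adj)


# Hypothetical 2 x 3 grid graph (vertices 0-1-2 on the first row, 3-4-5 on the second).
adj = {
    0: [1, 3],
    1: [0, 2, 4],
    2: [1, 5],
    3: [0, 4],
    4: [1, 3, 5],
    5: [2, 4],
}

# Three subdomains: {0, 1}, {2, 5}, and {3, 4}.
part = [0, 0, 1, 2, 2, 1]

print("edge cut:", edge_cut(adj, part))
print("communication volume:", communication_volume(adj, part))
```

The two measures differ because a boundary vertex with several cut edges into the same neighbouring subdomain contributes once to the communication volume but once per edge to the edge cut.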
It can be seen from Table
Figure
Time of 256 partitionings on different CPUs.
Diagram for parallel partitioning.
In this paper, the authors discuss graph partitioning methods and ways to implement mesh partitioning and introduce relevant software for parallel partitioning algorithms. The data structures and key functions of Metis and ParMetis are analyzed. A partitioning program is written based on these key functions, and a detailed test on a typical mesh partitioning case is conducted. The results show that good partitioning results are obtained when domain decomposition is implemented by programming against Metis and ParMetis. Parallel partitioning is mainly intended for simulations with large computing scale and large data size and for multiphysics, multistage, and multimesh simulations, while for static finite element problems of ordinary size, serial partitioning is sufficient.
The authors declare that they have no conflict of interest regarding the content of the paper.
This work was financially supported by the National Natural Science Foundation of China (51209235), the National Basic Research Program of China (973 Program) (2013CB035904, 2013CB036406), the National “Twelfth Five-Year” Plan for Science & Technology Support (SQ2013BAJY4138B02), the Governmental Public Industry Research Special Funds for Projects of MWR (201201050), and Special Research Funds of IWHR (1361/1353/1169/1309).