The aim of this paper is to discuss efficient adaptive parallel solution techniques on unstructured 2D and 3D meshes. We concentrate on parallel a posteriori mesh adaptation. One of the main advantages of unstructured grids is their capability to adapt dynamically during the calculation, by localised refinement and derefinement, in order to enhance the solution accuracy and to optimise the computational time. Grid adaptation also involves optimisation of the grid quality, which is described here for both structural and geometrical optimisation.
The accuracy of a numerical simulation is strongly dependent on the distribution of grid points in the computational domain. For this reason grid generation remains a topical task in CFD applications. Prior knowledge of the flow solution is usually required for a grid to be efficient, that is, for the grid resolution to match the features of the flow field. Such knowledge, however, may not be available, requiring human intervention: analysing the results of an initial solution, going back to the preprocessing stage, and taking an educated guess at how the mesh should be modified. Alternatively, a generally fine grid is generated over most of the domain to obtain a relatively good solution. Both cases, however, require excessive time, effort, and computational resources.
Let us consider the case with manual intervention by the user. This step can be automated by adaptation, whereby the flow solution is analysed automatically, following predefined criteria, and the grid resolution is adjusted to the problem. Such techniques distribute grid points with computational precision (rather than by eye) and greatly reduce user intervention, thus addressing the time and effort issues. They also reduce computational time and cost, as the adapted grid can have fewer points overall, with similar resolution in the areas of interest, than an unadapted fine mesh.
Grid enrichment (h-refinement) is used here; that is, the density of grid points is increased in regions in order to minimise the space discretisation error [
In this method the mesh topology is drastically changed, as nodes are added and removed in order to capture flow features and at the same time reduce the computational load in areas where the solution is sufficiently smooth. Therefore it is particularly suitable for unstructured grids, where the structure can undergo significant changes.
Refinement and derefinement can be driven by solution-based criteria and/or error-estimation criteria. Grid enrichment may be further divided into two main streams, grid remeshing and grid subdivision. We use the second method, with the grid being divided into smaller elements where necessary. New nodes are added to edges that are identified for refinement, and the cells are divided in turn. It is therefore easy to see how the use of unstructured grids can be particularly beneficial. The advantage of this method is its speed and efficiency. Its drawbacks are the complex data structure and, most often, the lack of information about the underlying geometry on the bounding surfaces.
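As a sketch of the subdivision idea, the following minimal example splits marked 2D triangles one-to-four after inserting midpoint nodes on flagged edges. The data layout is hypothetical and deliberately simplified; the actual code handles general elements, three dimensions, and the conformity closure ("green" splits) that is omitted here:

```python
# Minimal sketch of 2D grid subdivision by edge marking (hypothetical
# data layout). Triangles whose three edges are all marked are split
# 1-to-4; conformity closure for partially marked cells is omitted.

def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def refine(nodes, triangles, marked_edges):
    """nodes: list of (x, y); triangles: list of (i, j, k) node indices;
    marked_edges: set of frozenset({i, j}) flagged by the error criterion."""
    new_nodes = list(nodes)
    mid_index = {}                      # edge -> index of its midpoint node

    def split_point(i, j):
        key = frozenset((i, j))
        if key not in mid_index:        # create the midpoint node only once
            new_nodes.append(midpoint(nodes[i], nodes[j]))
            mid_index[key] = len(new_nodes) - 1
        return mid_index[key]

    new_tris = []
    for (i, j, k) in triangles:
        edges = [frozenset((i, j)), frozenset((j, k)), frozenset((k, i))]
        if all(e in marked_edges for e in edges):
            a, b, c = split_point(i, j), split_point(j, k), split_point(k, i)
            new_tris += [(i, a, c), (a, j, b), (c, b, k), (a, b, c)]
        else:
            new_tris.append((i, j, k))  # closure step omitted in this sketch
    return new_nodes, new_tris
```

A fully marked triangle thus produces four similar sub-triangles and three new nodes, without any parent-child bookkeeping, consistent with the nonhierarchical approach described below.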
This technique can be approached in two different ways: with a hierarchical framework, which stores parent-child relationships between cells at every step, or with a nonhierarchical approach, which discards the history of the original mesh during the filiation of successive grids. The method adopted here is completely nonhierarchical [
The paper is organised as follows. Grid adaption is treated in detail in Section
The algorithm outlined in [
Note that different space discretisation techniques can be used at different grid adaptation cycles. In general, we start with first-order schemes on nonadapted grids, then we turn to second-order schemes on adapted grids.
The goal of grid adaptation is to increase the accuracy of the solution process by locally enforcing the
The first step in a grid adaptation algorithm is therefore to evaluate locally the criteria corresponding to the solution error estimate and to mark out the zones to be modified in order to minimise the error globally. The criteria used should be as close as possible to the error estimates of the underlying discretisation scheme, taking as adaptation criteria functions of the current solution field (local error). There are several ways to derive the adaptation criteria. One is to evaluate the error a priori, based on the form of the original equations, which is a challenging task for nonlinear complex systems such as the hyperbolic-elliptic system of the Euler equations. Another is to drive an optimisation procedure based on solution derivatives. For the grid adaptation procedures developed here, we adopt a strategy based on an a posteriori error estimate, where the computed residual of the solution is used to define the error [
Strictly speaking, all the error estimation-based criteria require a complete formulation of the underlying discretisation scheme of the non-linear system that is used to model the physical problem. For finite element discretisations, there are several rules for assuring admissibility, conformality, and regularity of the geometrical properties of the grid elements [
Adaptation requires in all cases a local error estimate per grid cell,
In this work local estimates are all based on a posteriori criteria, which require a solution on the starting grid. Then, grid refinement and coarsening operations are performed, taking as adaptation criteria functions of the current solution field (local error) and geometrical properties of the current grid (optimisation).
These criteria first drive the grid refinement and derefinement operations. An optimisation step then follows, based on the geometrical properties of the current grid, followed by repartitioning, reordering, and renumbering. These phases are now detailed.
The physical adaptation criteria adopted in this work are based on flow quantities such as density, Mach number, pressure, and entropy, as are many error estimators, but they differ in the simplicity of their construction: they use the physical quantities directly. A first method is to take the difference between the values at the nodes of a segment and use its absolute value as an indicator for the adaptation process. Although this may seem a very crude way of identifying flow features, it is very effective when applied to the grid enrichment method mentioned earlier. Another method employed is the undivided gradient along an edge [
From this it can be clearly seen that the value of
Various modifications to this method have been developed, such as inclusion of local mesh length scale:
This leads to a more effective refinement criterion [
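An edge-based criterion with length-scale weighting can be sketched as follows. This is an illustration with hypothetical function names; the exponent `p` and the marking fraction are tunable parameters chosen here for the example, not values from the paper:

```python
import math

# Sketch of an edge-based refinement indicator (hypothetical names).
# For each edge (i, j), the undivided difference |u_j - u_i| of the
# sensed quantity is weighted by the local mesh length scale h**p;
# edges whose indicator is large are then flagged for refinement.

def edge_indicators(coords, edges, u, p=0.5):
    """coords: list of point tuples; edges: list of (i, j) index pairs;
    u: nodal values of the sensed quantity (e.g. density or Mach number)."""
    out = {}
    for (i, j) in edges:
        h = math.dist(coords[i], coords[j])   # local length scale
        out[(i, j)] = (h ** p) * abs(u[j] - u[i])
    return out

def mark_edges(indicators, fraction=0.3):
    """Flag the edges whose indicator lies in the top `fraction`
    (an illustrative thresholding rule, not the paper's filter)."""
    cutoff = sorted(indicators.values())[int(len(indicators) * (1 - fraction))]
    return {e for e, v in indicators.items() if v >= cutoff}
```

With `p = 0` this reduces to the plain nodal difference of the first method; increasing `p` biases refinement away from already fine regions, which is the effect the length-scale modification aims for.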
In fact the so-called “physical” adaptation criteria correspond to exact mathematical error estimators. In [
Two adaptation criteria used herein are based on physical criteria. Let us consider a solution field; the first is built from the difference between the gradient of the upwind flux and that of the downwind flux of
where
Further control on the field is obtained through the use of a filter
Graphical representation of the filter
The refinement criterion is then built with
Graphical representation of factor
The adaptation procedures developed here apply to elements of general shape in two or three dimensions. They are based on the concept of an element built up as an agglomerate of a cell and its neighbours, called a shell. The filters work on the shells, as do the structural optimisation procedures described below. The first step is the local refinement and coarsening step, which requires evaluation of the criteria and successive marking out of the interior properties of the shells.
Initially, the algorithm tests all the segments of the existing grid and decides whether or not a new node has to be created on each segment. Usually, a low-pass and a high-pass filter are applied to the solution fields. These filter operations provide an error estimate in an appropriate norm and also detect discontinuities. By filtering the gradient of the pressure or the Mach number, for instance, a local densification and stretching of the grid is applied. Criteria based on the original geometry of the grid are also used to optimise the grid structure. These criteria depend on factors such as absolute segment length, the relation to neighbouring cells, and so forth.
Then, in order to minimise the number of nodes and elements and to improve the geometrical quality of the grid, a grid coarsening algorithm is employed. The procedure is essentially the same as the one outlined for grid refinement. As a result, a set of nodes to be deleted is found. These nodes are removed using an edge collapsing procedure (see Figure
Segment collapsing with shell control: (a) initial shell, (b) shell inversion collapse, (c) collapse with element distortion, and (d) best collapse available.
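The collapse-with-shell-control idea can be sketched in 2D as follows. The data layout is hypothetical; a signed-area test rejects collapses that would invert or degenerate a shell element, corresponding to the rejected cases in the figure:

```python
# Sketch of 2D edge collapsing with a shell-inversion check (hypothetical
# data layout). Node `a` is merged onto node `b`; the collapse is rejected
# if any surviving triangle of the shell around `a` would invert.

def signed_area(p, q, r):
    # Positive for counterclockwise triangles.
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))

def try_collapse(coords, triangles, a, b):
    """Return the updated triangle list, or None if the collapse would
    invert (or flatten) an element of the shell."""
    survivors = []
    for tri in triangles:
        if a in tri and b in tri:
            continue                        # triangle degenerates: drop it
        new_tri = tuple(b if v == a else v for v in tri)
        if a in tri:                        # a shell triangle was modified
            pts = [coords[v] for v in new_tri]
            if signed_area(*pts) <= 0.0:    # inversion or zero area
                return None
        survivors.append(new_tri)
    return survivors
```

In practice several collapse directions are tested and the one giving the best shell quality is retained, in the spirit of "best collapse available" in the figure.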
The low- or high-pass filter which is applied on the solution field
The segment collapsing method used herein, developed by Savoy and Leyland [
The procedure works in both two and three dimensions and is very efficient in the first case. Efficiency is somewhat reduced in the 3D case due to the large number of constraints imposed during the marking, especially to avoid element inversion, which in turn prevents the removal of a large number of nodes.
Mesh quality and the precision of the underlying discretisation are highly dependent on the shape of the elements and shells just described. An equilibrium state of the cells is therefore desirable; it is achieved by equilateral triangles in 2D and equilateral-type tetrahedra in 3D. However, the mesh obtained after the refinement and coarsening steps will be far from this desired equilibrium state, due to the differing local node densities and strong variations between element sizes and node angles. The number of node neighbours may also differ dramatically between vertices. In order to overcome these problems, the mesh must be optimised. This is done in several ways that may be grouped into two major strategies: structural optimisation (diagonal swapping, edge collapsing) and geometrical optimisation (spring analogy, boundary smoothing, treatment of inverted elements).
In this step the mesh is analysed and modified as a function of the number of node neighbours
This consists in swapping the internal edge of two neighbouring triangles, as shown in Figure
Two-dimensional edge swapping.
The procedure is carried out to reduce the number of node neighbours
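A minimal sketch of the degree-driven swap decision follows (hypothetical layout; it relies on the standard fact that an interior node of a regular 2D triangulation has six neighbours):

```python
# Sketch of degree-driven diagonal swapping (hypothetical data layout).
# For two triangles (a, b, c) and (b, a, d) sharing edge (a, b), a swap
# replaces edge (a, b) by (c, d). The swap is accepted when it reduces
# the total deviation of the four vertex degrees from the optimum of 6.

OPT = 6  # optimal neighbour count of an interior node in 2D

def swap_gain(degree, a, b, c, d):
    """degree: dict node -> current neighbour count."""
    before = sum(abs(degree[v] - OPT) for v in (a, b, c, d))
    after = (abs(degree[a] - 1 - OPT) + abs(degree[b] - 1 - OPT) +
             abs(degree[c] + 1 - OPT) + abs(degree[d] + 1 - OPT))
    return before - after          # positive gain -> swap is worthwhile

def maybe_swap(degree, a, b, c, d):
    if swap_gain(degree, a, b, c, d) > 0:
        degree[a] -= 1; degree[b] -= 1   # a and b lose their shared edge
        degree[c] += 1; degree[d] += 1   # c and d gain the new diagonal
        return True
    return False
```

A geometric validity check (no inverted or degenerate triangles after the swap) would accompany this degree criterion in a real implementation.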
The three-dimensional case requires more effort and attention, as the swap implies a face swap, leading to a complete remeshing of the shell built from the elements surrounding the deleted segment. Volume conservation must also be checked in order to avoid cell inversions during the shell remeshing. An example of a face swap is shown in Figure
Three-dimensional face swapping.
This intervention is done when
These criteria are valid in both two and three dimensions. An example of edge collapsing in 2D is shown in Figure
Edge collapsing in 2D.
The goal of this step is to modify the mesh without changing the global data structure. This is achieved primarily by node displacement based on a spring analogy. However, other techniques must be applied to ensure better control of the node displacement. The number of node neighbours, for example, is employed again to adjust the spring stiffness. Particular care is given to nodes lying on the bounding geometry and to avoiding element inversion.
This technique has been heavily developed for moving mesh algorithms [
Spring analogy: springs replacing segments.
The number of node neighbours is once again very useful for mesh optimisation. In fact, if the spring stiffness is
To partially avoid this problem, the following weight function can be used to determine the spring stiffness:
This relates the spring stiffness to
Springs’ movement based on node neighbours: springs contracting.
Springs’ movement based on node neighbours: springs expanding.
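The spring-analogy smoothing with neighbour-dependent stiffness can be sketched as follows. The weight function is illustrative, not the one used in the paper, and boundary nodes are simply held fixed here, whereas the text above treats them with additional boundary springs:

```python
# Sketch of 2D spring-analogy smoothing (hypothetical weight function).
# Each edge acts as a spring; the stiffness of the spring towards a
# neighbour grows with that neighbour's degree, biasing the equilibrium
# position as suggested by the contracting/expanding spring figures.

def smooth(coords, adjacency, boundary, k=lambda deg: deg / 6.0,
           iters=20, relax=0.5):
    """coords: dict node -> (x, y); adjacency: dict node -> neighbour list;
    boundary: set of nodes held fixed; relax: under-relaxation factor."""
    pts = {i: list(p) for i, p in coords.items()}
    for _ in range(iters):
        new = {}
        for i, nbrs in adjacency.items():
            if i in boundary:
                new[i] = pts[i]                    # boundary nodes fixed
                continue
            w = [k(len(adjacency[j])) for j in nbrs]
            tot = sum(w)
            # Stiffness-weighted equilibrium position of the springs.
            tgt = [sum(wj * pts[j][d] for wj, j in zip(w, nbrs)) / tot
                   for d in (0, 1)]
            new[i] = [pi + relax * (ti - pi) for pi, ti in zip(pts[i], tgt)]
        pts = new
    return pts
```

Under-relaxation (`relax < 1`) is what makes it practical to interleave smoothing with the inversion checks described below, since each node moves only part of the way towards its equilibrium per sweep.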
Nodes lying on the geometric boundaries have to be moved with caution (if moved at all). Whatever the method used for positioning the node on the underlying geometry, a sufficient node density must be guaranteed within critical regions where the boundary curvature is large. This can be achieved by maintaining boundary nodes with a new spring joining the reference point
The stiffness of the new spring is then defined as a function of the curvature angle
Curvature angle and filter.
This is a major issue [
Inverted elements’
Cell inversion may also occur in the interior of the mesh. A method to avoid this is to set critical cells rigid, with segment springs working in only one direction, yielding a relaxation of the elements. The vertex movement is then made free if it increases the element quality, which means that the introduced segment springs work like
Cell inversion
To determine the stiffness of the segment spring when it acts as a
Torsional spring.
The nonisotropic behaviour of the
Finally the effect of the torsion spring is shown in Figure
Torsion spring effect: (a) initial grid, (b) cell inversion, and (c) torsion spring.
We note that the smoothing and stretching algorithms are global; hence their parallelisation requires a global renumbering of all the grid nodes. This means that each node in the overlapping region must be assigned to the update set of a unique processor before the smoothing and stretching can be performed. Apart from the renumbering phase, the communication required to solve (
In order to perform the grid adaptation procedures on a parallel computer, two approaches can be followed. The first one is the
The paradigm we have adopted is based upon the concept of nonhierarchical grid adaptation; that is, the successive grids do not remember their original affiliation; see [
Note that the incorporation of parallel grid adaptation within the solution process requires load balancing partitioning techniques to obtain well-balanced subdomains. This introduces other algorithmic concepts such as parallel sorting and renumbering techniques.
The grid adaptation techniques are applied globally throughout a prepartitioned mesh. They require careful renumbering and reordering of the addresses of the entities (cells, shells, faces, edges, nodes, and so forth), both internally per processor (locally) and globally, so that the adaptation renders a global mesh that is in turn repartitioned. All this is dynamic, and the partitioning procedure must be an integral part of the adaptation procedure.
The parallelisation of the refinement requires the tracking of nodes created on updated interface segments, which are treated as new border (interface) nodes. For coarsening, the principle applied was that, when attempting to delete a border node, a border segment must be chosen for the collapse.
The parallelisation of the structural changes is one of the hardest points, especially regarding the choice of overlapping partitions. For this reason, swapping and collapsing work most efficiently on internal segments. Diagonal swapping or face swapping nonetheless remains straightforward across partition interfaces, whereas collapsing is often harder to control.
The smoothing procedures do not modify significantly the internal mesh topology; the parallelisation is hence straightforward as long as a coherent numbering of the nodes, segments, faces, and cells is employed.
From the point of view of parallel computing, the grid adaptation procedure may result in an unbalanced distribution of the workload among the processors. The workload per subdomain may then differ, producing inefficient parallel performance: in effect, the worker with the largest workload delays the whole process. The starting domain decomposition was obtained to balance the workload for the initial grid (i.e., the same number of nodes on each subdomain and a minimum number of cut elements), whereas the adaptation algorithm may have generated many nodes on some subdomains (demanding more computing resources on the corresponding processors) and may have derefined the subdomains assigned to other processors (thus requiring fewer computational resources). Therefore, the computational domain is repartitioned dynamically within the parallel adaptation procedure using a parallel graph partitioning algorithm. For these purposes, the library ParMETIS [
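The imbalance that motivates repartitioning can be quantified by the ratio of the largest subdomain workload to the mean workload, since parallel run time is bounded by the most loaded processor. A toy illustration (this is not the ParMETIS interface, just the imbalance measure):

```python
# Toy illustration of adaptation-induced load imbalance. The imbalance
# factor is max workload over mean workload; 1.0 is perfectly balanced.

def imbalance(nodes_per_subdomain):
    mean = sum(nodes_per_subdomain) / len(nodes_per_subdomain)
    return max(nodes_per_subdomain) / mean

# A balanced initial partition ...
print(imbalance([1000, 1000, 1000, 1000]))   # 1.0
# ... degraded after refinement concentrates nodes in one subdomain
# (hypothetical counts): roughly 2.29, so that processor dominates.
print(imbalance([4000, 1200, 900, 900]))
```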
To complete the parallel adaptation procedure, fast and efficient multiple renumbering techniques are necessary for the grid entities: elements, segments, faces, and nodes. For this, MPI library routines are called explicitly, and a fast dynamic search tree for sorting during the renumbering procedures is implemented, based on the balanced binary search tree algorithm AVL (Adelson-Velskii and Landis) [
An AVL tree is a dynamically balanced binary search tree that is height balanced: for every node, the heights of its two subtrees differ by at most one.
When a new node is inserted into the tree, the insertion starts at the root and follows the branches until the node finds its attachment point. Once the node is inserted, the tree balance is checked. If no imbalance is found, the next node is inserted and the process continues; if an imbalance is found, rotations restore the heights of the affected nodes before the process is repeated. Deletion is handled similarly: after a node is removed, the balance along the search path is checked and restored by rotations where needed.
Lookup, insertion, and deletion operations are of O(log n) complexity, where n is the number of nodes in the tree.
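A minimal AVL sketch (insertion and lookup only; deletion follows the same rebalancing pattern) illustrates the height-balanced structure used for sorting during renumbering:

```python
# Minimal AVL tree sketch: height-balanced binary search tree with
# insertion (including the four rotation cases) and iterative lookup.

class Node:
    __slots__ = ("key", "left", "right", "h")
    def __init__(self, key):
        self.key, self.left, self.right, self.h = key, None, None, 1

def height(n): return n.h if n else 0
def update(n): n.h = 1 + max(height(n.left), height(n.right))
def balance(n): return height(n.left) - height(n.right)

def rot_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rot_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    update(root)
    b = balance(root)
    if b > 1 and key < root.left.key:        # left-left case
        return rot_right(root)
    if b < -1 and key >= root.right.key:     # right-right case
        return rot_left(root)
    if b > 1:                                # left-right case
        root.left = rot_left(root.left)
        return rot_right(root)
    if b < -1:                               # right-left case
        root.right = rot_right(root.right)
        return rot_left(root)
    return root

def contains(root, key):
    while root:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```

Even under the worst case of monotonically increasing keys (which degenerates a plain binary search tree into a list), the rotations keep the height logarithmic, which is what makes the renumbering sort fast.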
In order to assess the various functionalities of the techniques in place, a few test cases have been carried out in both two and three dimensions. For the two-dimensional case, the transonic flow over a NACA 0012 airfoil is used. For the three dimensional case we consider the supersonic flow over a forward wedge and different flow regimes for a concept aircraft. For all test cases, a parallel, unstructured grid, Euler solver THOR [
For this standard 2D test, the starting, nonadapted grid is composed of 2355 nodes. A first-order solution is computed on this grid; then, 4 steps of adaptation are performed, in order to improve the solution quality. The adaptation criteria are based on the density. Figure
Evolution of the NACA0012 grid and the corresponding solution field (density) for
The second test case we present is a 3D forward wedge. We start from a rather coarse, hand-made grid; the final adapted grid is composed of 80629 nodes and 480442 elements. The adaptive module is used to refine the grid according to the physics of the solution field. This test case is interesting since, despite its simple geometry, it presents shock reflections of different strengths, which are to be captured by the adaptive procedure.
The successive grids and their corresponding solutions are presented in Figure
Evolution of the 3-dimensional wedge and the corresponding solution field (density) for
The reasonably good quality of the last grid requires a large number of smoothing iterations. It is indeed essential to proceed carefully to avoid any element inversion. The solution scheme chosen is the standard N-scheme, which is a first-order scheme. Note that even though the starting grid is too coarse to permit an acceptable solution, adaptivity allows us to obtain a solution which clearly captures the complex physics of this problem.
The second three-dimensional test case is a concept aircraft, Smartfish [
Here we present the results of some adaptations with different initial grid sizes and adapted with different physical criteria. The tests are carried out at transonic Mach numbers and with nonzero angles of attack. In particular we first test a very coarse grid for this type of problem, with
Adapted coarse grid with respect to the Mach gradient. 2 adaptation cycles only with refinement at the Mach number 0.8 and angle of attack 2°,
Moving on to a denser initial grid (
Adapted coarse grid with respect to the Mach difference. 1 adaptation cycle with refinement and derefinement at the Mach number 0.9 and angle of attack 4°,
Finally a relatively fine grid was used to start the process (
Adapted finer grid with respect to the Mach difference. 4 adaptation cycles at the Mach number 0.9 and angle of attack 4°,
In order to verify the performance of the adaptive procedures, some of the test cases presented in the previous sections have been re-run. First a 2D NACA 0012 airfoil is used to test the code on the Linux cluster (
In order to measure the cost of the adaptation process with respect to the total time of the CFD computation, the same NACA 0012 airfoil case was executed with different numbers of processors. In particular, the adaptation process was run with refinement and coarsening procedures, adaptation with respect to the Mach gradient, 20 optimisation cycles (swapping and collapsing), and 20 smoothing cycles. Four adaptation steps were carried out; hence from an initial grid of
Final grid sizes for different numbers of processors after 4 adaptation steps. 2D NACA 0012 airfoil.
| Processors | Nodes | Elements | Boundary nodes |
|---|---|---|---|
| 1 | 35 130 | 69 615 | 645 |
| 2 | 35 082 | 69 527 | 637 |
| 4 | 34 335 | 68 031 | 639 |
| 8 | 34 839 | 69 035 | 643 |
Execution time for flow solver and adaptation process on multiple processors. 2D NACA 0012, four adaptation steps.
In a similar way to the 2D case above, we first compare the execution time of the flow solver and that of the adaptation for different numbers of processors. The test is carried out with the 3D wedge example, with an initial mesh of
Final grid sizes for different numbers of processors after 3 adaptation steps, and after 1 adaptation step for 64 and 128 processors restarting from the 32-processor final solution and grid. 3D wedge.
| Processors | Nodes | Elements | Boundary nodes |
|---|---|---|---|
| 8 | 1 118 350 | 6 721 070 | 50 714 |
| 16 | 1 123 326 | 6 753 716 | 50 963 |
| 32 | 1 130 577 | 6 801 332 | 51 073 |
| 64 | 3 843 209 | 23 302 721 | 85 652 |
| 128 | 3 842 653 | 23 290 638 | 85 641 |
Here the computations start from 8 processors, rather than 1, due to memory requirements for the grid obtained after the third adaptation step. Figures
Execution time for flow solver and adaptation process on multiple processors. 3D wedge, three adaptation steps.
Execution time for flow solver and adaptation process on multiple processors. 3D wedge, one adaptation step after restart.
Although the adaptation execution times are far less than those of the total computation, where the flow solver is accounted for, it is interesting to examine the various stages of the adaptation cycle and the impact they have on computational resources. What follows is therefore a breakdown of the adaptive cycle into three main blocks: refinement/derefinement and renumbering as a first block, swapping/collapsing and renumbering as a second block, and smoothing as the third and last block.
The previous 3D wedge initial mesh was used to start a computation with two adaptation steps on 4 processors and a third adaptation step on 8, 16, and 32 processors. The reason for this choice is that it is not possible to run three adaptation steps on 4 processors due to memory constraints. The final solution and mesh of this last computation were once again used as the starting point for an adaptation step carried out on 64 and 128 processors. The adaptation conditions were kept the same as in the previous case. Grid sizes for all steps and numbers of processors are given in Table
Step-by-step grid sizes for different numbers of processors after 1, 2, and 3 adaptation steps, and after 1 adaptation step for 64 and 128 processors restarting from the 32-processor final solution and grid. 3D wedge.
| Steps | Processors | Nodes | Elements | Boundary nodes |
|---|---|---|---|---|
| Step 1 | 4 | 140 180 | 811 918 | 20 902 |
| | 8 | 140 129 | 811 258 | 20 868 |
| | 16 | 140 133 | 810 941 | 20 915 |
| | 32 | 140 108 | 810 176 | 20 941 |
| Step 2 | 4 | 362 718 | 2 145 897 | 31 844 |
| | 8 | 364 891 | 2 158 249 | 31 728 |
| | 16 | 365 409 | 2 161 629 | 31 911 |
| | 32 | 367 912 | 2 176 392 | 31 923 |
| Step 3 | 8 | 1 118 350 | 6 721 070 | 50 714 |
| | 16 | 1 123 896 | 6 756 786 | 50 931 |
| | 32 | 1 131 649 | 6 806 732 | 50 949 |
| Restart | 64 | 3 843 363 | 23 310 788 | 85 648 |
| | 128 | 3 842 828 | 23 298 937 | 85 647 |
Execution time for adaptation blocks on multiple processors. 3D wedge, first adaptation step.
Execution time for adaptation blocks on multiple processors. 3D wedge, second adaptation step.
Execution time for adaptation blocks on multiple processors. 3D wedge, third adaptation step.
Execution time for adaptation blocks on multiple processors. 3D wedge, one adaptation step after restart.
In this paper, mesh adaptation techniques based on physical phenomena are developed in a parallel environment. The various steps (refinement, coarsening, optimisation, smoothing, reordering, and renumbering) and their algorithms are described. The techniques are validated on an extensive set of complex flow simulations. The parallel adaptation performance results demonstrate the efficiency of the implementation of these methods.
The Swiss National Science Foundation (SNSF) and the Swiss Federal Office for Education and Science (OFES) are acknowledged for financial support.