Parallel Pseudo ArcLength Moving Mesh Schemes for Multidimensional Detonation

We discuss the multidimensional parallel computation of pseudo arc-length moving mesh schemes, which can be used to capture the strong discontinuities of multidimensional detonations. Different from traditional Euler numerical schemes, parallelizing pseudo arc-length moving mesh schemes raises the additional problems of diagonal processor communication and mesh point communication, which are illustrated by schematic diagrams and key pseudocodes. Finally, numerical examples are given to show that the pseudo arc-length moving mesh schemes are second-order convergent and can successfully capture the strong discontinuity of the detonation wave. In addition, our parallel methods are shown to be effective, and the computational time is significantly decreased.


Introduction
Detonation is a type of combustion involving a supersonic exothermic front that accelerates through a medium and drives a shock front propagating directly ahead of it. Detonations occur in conventional solid and liquid explosives [1] as well as in reactive gases [2]. Due to the strong discontinuity, one of the most important challenges in detonation simulation is to capture the precursor shock wave. To capture the strong discontinuity in detonation waves, mesh adaptation is an indispensable tool for the efficient numerical solution of this type of problem. The moving mesh method is one such mesh adaptation method; it relocates mesh point positions while maintaining the total number of mesh points and the mesh connectivity [3][4][5]. In particular, Tang et al. proposed a moving mesh method that consists of two parts: physical PDE time evolution and mesh redistribution [6][7][8]. The physical PDE time evolution and the mesh redistribution alternate, and conservative interpolation is used to transfer solutions from the old mesh to the new mesh. The method is shown to work well for general hyperbolic conservation laws. Later, Ning et al. improved this method and proposed the pseudo arc-length moving mesh schemes, which can deal with multidimensional chemically reacting detonation problems [9].
In this paper, we discuss the parallel computation of pseudo arc-length moving mesh schemes for multidimensional detonation. There are several existing works on parallel schemes for Euler numerical schemes. Knepley and Karpeev developed a new programming framework, called Sieve, to support parallel numerical partial differential equation (PDE) algorithms operating over distributed meshes [10]. Morris et al. demonstrated how to build a parallel application that encapsulates the Message Passing Interface (MPI) without requiring the user to make direct calls to MPI except for startup and shutdown [11]. Mubarak et al. presented a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation while minimizing interprocess communication [12]. Wang et al. used a parallel scheme to reduce the cost of an adaptive mesh refinement WENO scheme for multidimensional detonation [13]. However, there has been no discussion of parallel computation for moving mesh schemes, which is the concern of this paper. Different from the traditional Euler numerical scheme, the data communications of moving mesh schemes between processors are more complex: they include both physical values and mesh points, and the processor communications involve adjacent as well as diagonal processors. Here, we adopt the software architecture of MPI. The most important advantages of this model are twofold: achievable performance and portability. Performance is a direct result of the available optimized MPI libraries and full user control in the program development cycle. Portability arises from the standard API and the existence of MPI libraries on a wide range of machines. In general, an MPI program runs on distributed memory machines. We consider three kinds of processor partitions to present our parallel schemes.

This article is organized as follows. Section 2 introduces the chemical and physical model. Section 3 presents the numerical scheme. Section 4 is devoted to the parallel computation. Section 5 conducts several numerical experiments to demonstrate our schemes. The paper ends with a conclusion and discussion in Section 6.

Governing Equations
Instead of using many real elementary reactions, a two-step model is utilized as the test model. The two-step reaction model treats a complicated chemical reaction as an induction reaction followed by an exothermic reaction. The progress parameters $\alpha$ (induction) and $\beta$ (exothermic) are both unity at first; $\alpha$ then decreases to zero, and $\beta$ decreases until an equilibrium state is reached. The reaction rates $\omega_\alpha$ and $\omega_\beta$ are given in [14], where $\rho$ is the mass density, $p$ the pressure, $T$ the temperature, $R$ the gas constant, $q_0$ the heat release parameter, $k_1$ and $k_2$ the constants of the reaction rates, and $E_1$ and $E_2$ the activation energies.
In deriving the fundamental equations, the gas is assumed to be perfect, nonviscous, and non-heat-conducting. In Cartesian coordinates, the governing equations for the gaseous detonation problem, including the above chemical reaction, are

$$ \frac{\partial \mathbf{w}}{\partial t} + \nabla_{\mathbf{x}} \cdot \mathbf{F}(\mathbf{w}) = \mathbf{s}(\mathbf{w}), \qquad (2) $$

where $\mathbf{x}$ is the multidimensional coordinate vector, $\mathbf{F}(\mathbf{w})$ is the multidimensional flux function, and $\mathbf{s}(\mathbf{w})$ is the chemical reaction source term. For one-dimensional space,

$$ \mathbf{w} = (\rho, \rho u, E, \rho\alpha, \rho\beta)^T, \quad \mathbf{F} = \big(\rho u,\ \rho u^2 + p,\ u(E+p),\ \rho u \alpha,\ \rho u \beta\big)^T, \quad \mathbf{s} = (0, 0, 0, \rho\omega_\alpha, \rho\omega_\beta)^T. $$

For two-dimensional space, the conserved vector gains the momentum component $\rho v$ and the flux has two columns,

$$ \mathbf{w} = (\rho, \rho u, \rho v, E, \rho\alpha, \rho\beta)^T, \quad \mathbf{F}_1 = \big(\rho u,\ \rho u^2 + p,\ \rho u v,\ u(E+p),\ \rho u\alpha,\ \rho u\beta\big)^T, \quad \mathbf{F}_2 = \big(\rho v,\ \rho u v,\ \rho v^2 + p,\ v(E+p),\ \rho v\alpha,\ \rho v\beta\big)^T. $$

The three-dimensional case additionally includes the momentum $\rho w$ and a third flux column of the same form. Here $u$, $v$, and $w$ are the Cartesian components of the fluid velocity in the $x$, $y$, and $z$ directions, respectively. The total energy density $E$ is defined as

$$ E = \frac{p}{\gamma - 1} + \frac{1}{2}\rho \left(u^2 + v^2 + w^2\right) + q_0 \rho \beta. $$

Here $\gamma$ is the specific heat ratio.
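To make the one-dimensional case concrete, the pressure and flux evaluation can be sketched as follows (a minimal Python sketch; the state layout $\mathbf{w} = (\rho, \rho u, E, \rho\alpha, \rho\beta)$, the chemical energy term $q_0\rho\beta$ inside $E$, and the default $\gamma$ value are assumptions consistent with the description above, not the paper's code):

```python
import numpy as np

GAMMA = 1.2  # specific heat ratio (assumed illustrative value)

def pressure(w, q0=0.0):
    # w = (rho, rho*u, E, rho*alpha, rho*beta); E stores kinetic and chemical energy
    rho, m, E, ra, rb = w
    u = m / rho
    return (GAMMA - 1.0) * (E - 0.5 * rho * u**2 - q0 * rb)

def flux(w, q0=0.0):
    # one-dimensional flux F(w) of the reactive Euler system
    rho, m, E, ra, rb = w
    u = m / rho
    p = pressure(w, q0)
    return np.array([m, m * u + p, u * (E + p), u * ra, u * rb])
```

For a fluid at rest ($u = 0$) only the pressure contributes, through the momentum component of the flux.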

Numerical Method
Firstly, we present the framework of the pseudo arc-length moving mesh schemes for the gaseous detonation problem (2). Our adaptive scheme is formed by two independent parts: the evolution of the governing equations and the iterative mesh redistribution.

Time Evolution of Governing Equations.
The physical domain for the computation is $\Omega_p$. Given a partition of the mesh domain $\{A_j \mid A_j \subset \Omega_p,\ \bigcup_j A_j = \Omega_p\}$ and a partition of the time interval $[0, T]$, the average on the cell $A_j$ is

$$ \bar{\mathbf{w}}_j = \frac{1}{|A_j|} \int_{A_j} \mathbf{w}\, d\mathbf{x}. $$

Here, $|A_j|$ denotes the length of the region $A_j$ in one dimension, the area in two dimensions, and the volume in three dimensions. Integrating (2) over $A_j$, applying the Green-Gauss theorem, and rewriting, we have

$$ \frac{d \bar{\mathbf{w}}_j}{dt} = -\frac{1}{|A_j|} \oint_{\partial A_j} \mathbf{F}(\mathbf{w}) \cdot \mathbf{n}_j \, dS + \mathbf{s}(\bar{\mathbf{w}}_j), $$

where $\mathbf{n}_j$ is the outward unit normal vector of the boundary surface $\partial A_j$. The Lax-Friedrichs flux is defined by

$$ \hat{\mathbf{F}}(\mathbf{u}, \mathbf{v}; \mathbf{n}) = \frac{1}{2}\left[ \mathbf{F}(\mathbf{u})\cdot\mathbf{n} + \mathbf{F}(\mathbf{v})\cdot\mathbf{n} - \lambda (\mathbf{v} - \mathbf{u}) \right], $$

where $\lambda = \max_{\mathbf{u},k} |\mathbf{F}'(\mathbf{u}) \cdot \mathbf{n}|$. It satisfies conservation and consistency:

$$ \hat{\mathbf{F}}(\mathbf{u}, \mathbf{v}; \mathbf{n}) = -\hat{\mathbf{F}}(\mathbf{v}, \mathbf{u}; -\mathbf{n}), \qquad \hat{\mathbf{F}}(\mathbf{u}, \mathbf{u}; \mathbf{n}) = \mathbf{F}(\mathbf{u})\cdot\mathbf{n}. $$

Let $\{\Gamma_{jk}\}_{k=1}^{K}$ with $\bigcup_{k} \Gamma_{jk} = \partial A_j$ be a partition of the boundary surface $\partial A_j$ and $\mathbf{n}_{jk}$ be the outward unit normal vector of the face $\Gamma_{jk}$. Then we have a semidiscrete scheme of (2),

$$ \frac{d \bar{\mathbf{w}}_j}{dt} = -\frac{1}{|A_j|} \sum_{k=1}^{K} |\Gamma_{jk}| \, \hat{\mathbf{F}}\big(\mathbf{w}^{\mathrm{int}(k)}_j, \mathbf{w}^{\mathrm{ext}(k)}_j; \mathbf{n}_{jk}\big) + \mathbf{s}(\bar{\mathbf{w}}_j). \qquad (12) $$

By the initial reconstruction technique [15] used to reset $\mathbf{w}^{\mathrm{int}(k)}_j$, $\mathbf{w}^{\mathrm{ext}(k)}_j$

$k = 1, \dots, K$, on the edges $\Gamma_{jk}$, we can obtain the second-order accurate spatial discretization, where $\phi$ is a nonlinear limiter function used to suppress possible spurious oscillations. In our computations, we use van Leer's limiter [15],

$$ \phi(a, b) = \big(\operatorname{sign}(a) + \operatorname{sign}(b)\big) \frac{|a|\,|b|}{|a| + |b|}. $$

For two-dimensional space, $K = 4$, and the diagram of the edges $\Gamma_{jk}$, $k = 1, 2, 3, 4$, of the quadrilateral element $A_j$ is shown in Figure 1.
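The limited reconstruction feeding the Lax-Friedrichs flux can be sketched in one dimension as follows (the function names, the undivided-difference slopes, and the uniform-mesh setting are illustrative assumptions; the paper's scheme works on moving multidimensional meshes):

```python
import numpy as np

def van_leer(a, b):
    """van Leer limiter: harmonic mean for same-sign slopes, zero otherwise."""
    if abs(a) + abs(b) == 0.0:
        return 0.0
    return (np.sign(a) + np.sign(b)) * abs(a) * abs(b) / (abs(a) + abs(b))

def lax_friedrichs(phys_flux, w_int, w_ext, lam):
    """1D Lax-Friedrichs interface flux with wave-speed bound lam."""
    return 0.5 * (phys_flux(w_int) + phys_flux(w_ext)) - 0.5 * lam * (w_ext - w_int)

def interface_flux(phys_flux, w, i, lam):
    """Second-order flux at interface i+1/2 of cell averages w (uniform mesh):
    limited linear reconstruction on both sides, then Lax-Friedrichs.
    Requires 1 <= i <= len(w) - 3 so all stencil cells exist."""
    sL = van_leer(w[i] - w[i - 1], w[i + 1] - w[i])          # limited slope in cell i
    sR = van_leer(w[i + 1] - w[i], w[i + 2] - w[i + 1])      # limited slope in cell i+1
    w_int = w[i] + 0.5 * sL        # interior trace at i+1/2
    w_ext = w[i + 1] - 0.5 * sR    # exterior trace at i+1/2
    return lax_friedrichs(phys_flux, w_int, w_ext, lam)
```

The limiter returns the harmonic mean when the two one-sided slopes share a sign and zero otherwise, which suppresses oscillations near discontinuities while preserving second-order accuracy in smooth regions.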
The values $\mathbf{w}^{\mathrm{int}(k)}$, $\mathbf{w}^{\mathrm{ext}(k)}$ at the edges $\Gamma_{jk}$, $k = 1, 2, 3, 4$, are defined by matching each exterior trace with the interior trace of the neighboring element that shares the edge. For the three-dimensional hexahedron element, the situation is more complex: due to the movement of the mesh points $x_{i,j,k}, \dots, x_{i+1,j+1,k+1}$, the hexahedron element may change into a general polyhedron element.

Time discretization can be achieved by the strong stability preserving high-order Runge-Kutta time discretization [16, 17]. The semidiscrete scheme (12) can be written as

$$ \frac{d\mathbf{w}}{dt} = L(\mathbf{w}). \qquad (18) $$

The second-order TVD Runge-Kutta method for (18) in the time discretization is

$$ \mathbf{w}^{(1)} = \mathbf{w}^n + \Delta t\, L(\mathbf{w}^n), \qquad \mathbf{w}^{n+1} = \frac{1}{2}\mathbf{w}^n + \frac{1}{2}\big(\mathbf{w}^{(1)} + \Delta t\, L(\mathbf{w}^{(1)})\big). \qquad (19) $$

3.2. Pseudo Arc-Length Moving Mesh Scheme. Let $\mathbf{x} = (x_1, x_2, \dots, x_{N_d})$ and $\boldsymbol{\xi} = (\xi_1, \xi_2, \dots, \xi_{N_d})$ denote the physical and computational coordinates, respectively. Here, $N_d$ denotes the number of spatial dimensions. A one-to-one coordinate transformation from the computational domain $\Omega_c$ to the physical domain $\Omega_p$ is denoted by $\mathbf{x} = \mathbf{x}(\boldsymbol{\xi})$. In the variational approach, the mesh map between the domains $\Omega_c$ and $\Omega_p$ is usually provided by the minimizer of

$$ E(\boldsymbol{\xi}) = \frac{1}{2} \sum_{k=1}^{N_d} \int_{\Omega_p} (\nabla_{\mathbf{x}} \xi_k)^T G_k^{-1} \nabla_{\mathbf{x}} \xi_k \, d\mathbf{x}, \qquad (21) $$

where $\nabla_{\mathbf{x}} := (\partial_{x_1}, \partial_{x_2}, \dots, \partial_{x_{N_d}})^T$ and $G_k$, $k = 1, 2, \dots, N_d$, are given symmetric positive definite matrices called monitor functions, depending on the underlying solution to be adapted to. The Euler-Lagrange equations of the functional (21) have the form

$$ \nabla_{\mathbf{x}} \cdot \big( G_k^{-1} \nabla_{\mathbf{x}} \xi_k \big) = 0, \qquad k = 1, 2, \dots, N_d. \qquad (22) $$

In practice, $\Omega_p$ may have a very complex geometry, and as a result it is not realistic to solve (22) directly. An alternative is to consider a functional posed on the computational domain,

$$ \tilde{E}(\mathbf{x}) = \frac{1}{2} \sum_{k=1}^{N_d} \int_{\Omega_c} (\nabla_{\boldsymbol{\xi}} x_k)^T G_k \nabla_{\boldsymbol{\xi}} x_k \, d\boldsymbol{\xi}, \qquad (23) $$

where $\nabla_{\boldsymbol{\xi}} := (\partial_{\xi_1}, \partial_{\xi_2}, \dots, \partial_{\xi_{N_d}})^T$. One of the choices of monitor functions is $G_k = \omega I$, where $I$ is the identity matrix and $\omega$ is a positive weight function. In order to obtain more accuracy near the nonsmooth parts of the solution, we introduce the monitor function of pseudo arc-length type [18, 19],

$$ \omega = \sqrt{1 + \alpha_1 \|\nabla \mathbf{u}\|^2 + \alpha_2 \|\nabla^2 \mathbf{u}\|^2}, \qquad (24) $$

where $\alpha_1$ and $\alpha_2$ are some nonnegative constants. The corresponding Euler-Lagrange equations of (23) are

$$ \nabla_{\boldsymbol{\xi}} \cdot \big( G_k \nabla_{\boldsymbol{\xi}} x_k \big) = 0, \qquad k = 1, 2, \dots, N_d. \qquad (25) $$

In our computations, we use a Gauss-Seidel-type iteration to solve the mesh equation (25) for $1 \le i \le N_k$, $1 \le k \le N_d$, and $\nu = 0, 1, \dots$
where the iteration uses the latest available mesh values. In addition, let us consider the solution updating on the new grid $\mathbf{x}^{[\nu+1]}$ from the old grid $\mathbf{x}^{[\nu]}$. Here, the solution updating should preserve the conservative property of $\mathbf{w}$. Next, we show examples of calculating the weights $\omega_k$, $k = 1, 2, \dots, N_d$, in the different dimensions.
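As one concrete example of such a weight, a gradient-based pseudo arc-length monitor can be evaluated discretely as follows (the central-difference stencil, the uniform one-dimensional grid, and the default parameter values are illustrative assumptions):

```python
import numpy as np

def arc_length_monitor(u, dx, alpha1=0.1, alpha2=0.0):
    """omega = sqrt(1 + alpha1*|u_x|^2 + alpha2*|u_xx|^2) on a uniform 1D grid."""
    ux = np.gradient(u, dx)    # central differences in the interior
    uxx = np.gradient(ux, dx)  # discrete second derivative
    return np.sqrt(1.0 + alpha1 * ux**2 + alpha2 * uxx**2)
```

Near steep gradients the weight $\omega$ is large, so the mesh equation (25) concentrates points there; on flat parts of the solution $\omega$ reduces to 1 and the mesh stays nearly uniform.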

Step 1. Given the mesh $\{x_i\}$ and the cell averages $\{\bar{\mathbf{w}}_j\}$ at time level $t_n$.

Step 2. Redistribute the mesh points by the Gauss-Seidel iteration for (25), for $1 \le i \le N_k$, $1 \le k \le N_d$, and update the solution on the new mesh by conservative interpolation. The redistribution for the other boundaries can be carried out in a similar way; numerical experiments show that the above procedure for moving the boundary points is useful in improving the solution resolution.

Step 3. Evolve the underlying PDEs using the finite volume scheme (12) and the time discretization scheme (19) on the new mesh.
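The time discretization scheme (19) used in Step 3 is the two-stage TVD Runge-Kutta method; a generic sketch, with `L` standing for the spatial operator of (12):

```python
def tvd_rk2(L, w, dt):
    """One step of the second-order TVD (SSP) Runge-Kutta method (19)."""
    w1 = w + dt * L(w)                        # forward Euler predictor
    return 0.5 * w + 0.5 * (w1 + dt * L(w1))  # averaged corrector
```

Each stage is a convex combination of forward Euler steps, so the stability (TVD/SSP) properties of the Euler step carry over to the full second-order update.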

Parallelization Strategy
A perfect parallel strategy would reach the conflicting goals of portability and high performance. Here, we adopt the software architecture of MPI, which is a de facto standard for parallel programming on distributed memory systems [20]. A primary difficulty with the MPI model is its distributed memory view in programming, which makes programs hard to write and often involves a long development cycle [21, 22]. Carefully thought-out strategies for partitioning the data and managing the resulting communication are essential for a good MPI program. Secondly, due to the distributed nature of the model, global data may be duplicated for performance reasons, resulting in an increase in the overall memory requirement. Besides, because of machine instabilities and the lack of fault tolerance in the model, it is challenging to keep very large MPI jobs running on large parallel systems (although this is true in general for other programming models as well). Considering our numerical schemes, the schematic diagram of our parallel computation is given in Figure 6.
Here, $p$ corresponds to the number of processors. Next, the critical issues in the schematic diagram will be discussed.

Partition the Spatial Mesh Domain.
In the parallel computation, the whole domain is divided into blocks, and each block is distributed to a different processor, so that each processor is allocated exactly one block. Because the computational complexity is related to the number of mesh points, the whole spatial domain is divided according to mesh points rather than spatial coordinates. In addition, the number of mesh points assigned to each processor is made as equal as possible. The common partitions of the whole spatial domain are given in Figure 7. Figure 7(a) is the line partition, which fits a stripe-shaped domain.
The surface partition is shown in Figure 7(b), which can be used for surface and body domains. The body partition in Figure 7(c) is used for the three-dimensional spatial domain. Next, we discuss the relationship between the whole domain and the processors, and give pseudocodes. It is assumed that $N_c$ is the number of cells in the whole spatial domain. An array, here written cell_proc($i$), is created to store which processor is assigned to cell $i$, and an array proc_ncell($p$) is created to store the number of cells on each processor $p$. The local number of the $i$-th cell within its processor is obtained by looping over all cells and counting, for each processor, the cells assigned to it so far.
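The balanced cell-to-processor assignment described above can be sketched as follows (the names `cell_proc` and `proc_ncell` mirror the arrays in the text; giving the remainder cells to the lowest-numbered processors is an assumed convention):

```python
def partition_cells(n_cells, n_procs):
    """Assign cells to processors as evenly as possible (line partition).

    Returns cell_proc (owner of each cell, in order) and
    proc_ncell (number of cells on each processor).
    """
    base, extra = divmod(n_cells, n_procs)
    # the first `extra` processors get one additional cell
    proc_ncell = [base + (1 if p < extra else 0) for p in range(n_procs)]
    cell_proc = []
    for p, cnt in enumerate(proc_ncell):
        cell_proc.extend([p] * cnt)
    return cell_proc, proc_ncell
```

Because the counts differ by at most one cell, the work per processor is balanced up to the cost of a single cell.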
In addition, a two-dimensional array, here written proc_cell(proc_ncell($p$), $p$), is created to store the relationship between the whole domain and the processors.

Reduction and Synchronization.
The size of the spatial mesh is different on each processor, which makes the admissible time step different on each processor. Thus the time step needs to be synchronized during the computation; MPI_Reduce and MPI_Bcast are used to achieve this. Let h_local be the time step of each processor computed from the CFL condition, and h_global the time step after synchronization. The corresponding MPI Fortran calls are

call MPI_Reduce(h_local, h_global, 1, MPI_DOUBLE_PRECISION, MPI_MIN, 0, MPI_COMM_WORLD, ierr)
call MPI_Bcast(h_global, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

Processor Communication.
For data communication, it is necessary to know the neighbor processors of each processor. For the line partition, there are two neighbors for the current processor to communicate with; if the current processor is on the boundary, so that a neighbor does not exist, that neighbor is marked as empty. For two adjacent processors $P_1$ and $P_2$ ($P_1$ on the left, $P_2$ on the right), there are n_comm adjacent cells along their common interface. Thus, processor $P_1$ needs n_comm cells to communicate with processor $P_2$, and in the same way processor $P_2$ needs n_comm cells to communicate with processor $P_1$. Pseudocode describes the communication between $P_1$ and $P_2$: arrays, here written p1_send(n_comm) and p1_recv(n_comm), hold the cells that $P_1$ sends and receives, while p2_send(n_comm) and p2_recv(n_comm) hold the cells that $P_2$ sends and receives. For the surface partition, there are eight neighbor processors for the current processor, as shown in Figure 8(a) (here, $P$ is the current processor). Because the neighbors $P_k$, $k = 1, 2, 3, 4$, share a common communicating edge with the current processor $P$, we can communicate with them in the same way as in the line partition. The positions of the diagonal processors $P_k$, $k = 5, 6, 7, 8$, can be obtained from $P_k$, $k = 1, 2, 3, 4$; for example, pseudocode describes the communication between $P_5$ and $P$.
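For the surface partition, the eight neighbor ranks (four edge neighbors and four diagonal ones) can be computed from the current rank as in the following sketch (the row-major rank ordering and the value -1 for a missing neighbor at the domain boundary are assumptions):

```python
def neighbors_2d(rank, px, py):
    """Return the 8 neighbor ranks (W, E, S, N, SW, SE, NW, NE) of `rank`
    on a px-by-py row-major processor grid; -1 means no neighbor."""
    ix, iy = rank % px, rank // px

    def at(jx, jy):
        # map grid coordinates back to a rank, or -1 outside the grid
        if 0 <= jx < px and 0 <= jy < py:
            return jy * px + jx
        return -1

    return [at(ix - 1, iy), at(ix + 1, iy), at(ix, iy - 1), at(ix, iy + 1),
            at(ix - 1, iy - 1), at(ix + 1, iy - 1),
            at(ix - 1, iy + 1), at(ix + 1, iy + 1)]
```

The diagonal entries are exactly the ranks needed for the corner-cell exchanges that distinguish the moving mesh scheme from a traditional Euler solver, since a corner mesh point is shared by four blocks.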

Numerical Examples
In this section, the numerical convergence and the efficiency of our parallel schemes are shown, and a practical problem is considered. In our simulation, one of the parameters is taken as 1.

The final time for this computation is $T = 2.0$. The monitor function for this computation is the pseudo arc-length monitor with $\alpha_1 = 0.1$; the remaining model parameter is 1.2. The computational domain is a symmetric interval about the origin. The errors and convergence orders in the density are obtained and compared in Table 2, from which we can see that our scheme is second-order convergent. In addition, Sod's classical shock tube problem is studied to show that our schemes can capture the singularities and avoid nonphysical phenomena; the Riemann initial values are the classical Sod data. It is found that the mesh adapts to the solution and our schemes capture the discontinuous jump in the numerical simulations. The computational complexity and errors are compared between the fixed mesh and moving mesh schemes in Table 3. Comparing (F1) and (M1), we can use 50 mesh points to reach the effect of 100 fixed mesh points while costing the same CPU time. In addition, different arc-length parameters with 100 mesh points are simulated for comparison with the 200 and 250 fixed mesh point schemes. Our schemes reach better results with fewer mesh points; in particular, the $L_2$ and $L_\infty$ errors are obviously reduced. Thus we can conclude that our schemes capture the singularities and avoid nonphysical phenomena.
The monitor function of the same form is used in the computation. The terminal time is 0.5. The parallel cost with different numbers of processors is compared in Tables 5 and 6. Table 5 is the comparison of parallel efficiency with different processors for the line partition, while Table 6 is for the surface partition. Both tables illustrate that our parallel schemes are effective and the computational time is reduced. Figure 13 shows the computational results at time 0.15, and Figure 14 at time 0.3. The solution figures show that our schemes capture the shock waves and the mesh trajectory is adaptive.

Example 3.
The last example is a three-dimensional problem. The numerical accuracy test is performed first. The periodic boundary condition is used, and the initial conditions are set to smooth data. The errors and convergence orders in the density are shown in Table 7, which implies that our schemes are second-order convergent. The computational domain and boundary conditions are shown in Figure 15. The initial values for the reacted region and the unreacted region are the same as in Example 2. Table 8 shows the parallel efficiency for the body partition, which illustrates that the computational time is reduced and our parallel scheme can deal with three-dimensional problems. There are 120 × 120 × 120 cells in our simulations. The pseudo arc-length monitor function is used, and Figure 16 shows the three-dimensional sliced effect of the adaptive mesh with the solutions at different time points.

Conclusion and Discussion
In this paper, we have discussed the parallel computation of pseudo arc-length moving mesh schemes. Different from the traditional Euler numerical scheme, the communications between processors involve both physical values and mesh points, and include adjacent as well as diagonal processors. Numerical examples demonstrate that the schemes are second-order convergent, that they capture the strong discontinuities of detonation waves, and that the parallelization clearly reduces the computational time.

Figure 17:
(a) is the simulated result of the density in the block at the center of the domain at time 0.15, while (b) is a sliced simulation of the density along one coordinate direction at time 0.3. The spatial cells are distorted to adapt to the shock wave front. This shows that our schemes can capture shock waves.

Table 2:
Errors and convergence orders in the density.

Table 3:
Comparison of various fixed and moving mesh schemes.

Table 4:
Errors and convergence orders in the density.
Following [23], we apply our schemes to a two-dimensional explosion problem. The computational domain is illustrated in Figure 12. The boundary condition is reflective, and the initial state in the unreacted region is (1, 0, 0, 0, 1, 1). The initial conditions in the reacted region are obtained by the model of [23].

Table 5:
Parallel efficiency with different processors for line partition.

Table 6:
Parallel efficiency with different processors for surface partition.

Table 7:
Errors and convergence orders in the density.

Table 8:
Parallel efficiency with different processors for the body partition.

The monitor function of the same form is used in the computation. The terminal time is 0.5. Figures 16 and 17 show the adaptive solutions.