A Hybrid Algorithm Based on Optimal Quadratic Spline Collocation and Parareal Deferred Correction for Parabolic PDEs

Parareal is a kind of time parallel numerical methods for time-dependent systems. In this paper, we consider a general linear parabolic PDE, use optimal quadratic spline collocation (QSC) method for the space discretization, and proceed with the parareal technique on the time domain.Meanwhile, deferred correction technique is also used to improve the accuracy during the iterations. In fact, the optimal QSC method is a correction of general QSC method. Along the temporal direction we embed the iterations of deferred correction into parareal to construct a hybrid method, parareal deferred correction (PDC) method. The error estimation is presented and the stability is analyzed. To save computational cost, we find out a simple way to balance the two kinds of iterations as much as possible. We also argue that the hybrid algorithm has better system efficiency and costs less running time. Numerical experiments by multicore computers are attached to exhibit the effectiveness of the hybrid algorithm.


Introduction
The parareal algorithm was firstly proposed by Lions et al. [1] in 2001 as a parallel-in-time approach for solving timedependent differential equations, mainly for ordinary differential equations (ODEs) and parabolic equations [2][3][4].The significant advantage of the parareal algorithm is that it does not require the consecutive solving of subproblems along the temporal direction.As a purely parallel algorithm, the parareal algorithm has been applied to many practical problems, such as hyperbolic problems [5], fluid mechanics [6], quantum control [7,8], and optimized control problem [9,10].Recently, some innovative variants of the parareal were proposed by integrating with other efficient methods, such as spectral deferred corrections [11], domain decomposition technique [12], the Krylov subspace method [13], and adaptive technique [14].We have also proposed a kind of parareal waveform relaxation methods for ODEs [15] and parabolic PDEs [16], as well as parareal deferred correction methods for PDEs [17].
The parareal algorithm was initially derived from multiple shooting, time multigrid, and algebraic deferred correction [18,19] through borrowing the idea of domain decomposition on the time domain.In particular, it employs two operators on two time decompositions of different step sizes with the name of coarse and fine grids to propagate the values along the time direction.The initial guesses of subproblems defined on the coarse grid are given by the coarse propagator for the first iteration.Then, the values at the interface of two adjacent coarse-grid subproblems are updated using fine propagators in iterations, which is quite similar to the Schwarz methods for elliptic systems.Spectral deferred correction method, a kind of iterative methods for solving ODEs, is originally proposed in [20] and has been combined with parareal in [11].It makes use of the previously computed solution on the same time subdomain.High order approximations can be achieved while the computation cost is reduced sharply.
In this paper, we apply the idea of correction to the problem of parabolic PDEs.In detail, we employ the optimal QSC method for the spatial variables and PDC method for the temporal variables, which will result in high accuracy approximations with low computational cost.The new method can 2 Mathematical Problems in Engineering also be implemented in parallel, just like the same parallel mechanism as the classical parareal method.
The outline of our paper is as follows.In Section 2, we give a brief review about parareal algorithm and deferred correction.In Section 3, we employ the optimal QSC method for the space discretization of parabolic PDE, which results in an ODEs system.Then we present the PDC method and apply the method to the ODEs system.In Section 4, we give a proper error estimation for the PDC method.In Section 5, we analyze the stability of the algorithm and show the relationship between the stability and the propagators employed.In Section 6, we present a simple way to choose the time step of the fine propagator.The speedup and system efficiency of the new algorithm are analyzed in Section 7. Numerical experiments are attached in Section 8.

Materials and Methods
2.1.Parareal Algorithm.We briefly review the parareal algorithm on the following nonlinear ODEs:  ()  =  ( () , ) ,  ∈ [0, ] , where the nonlinear function  : R  × R → R  is Lipschitz on R  and the function  : R → R  is to be computed.The parareal algorithm requires a decomposition of the time domain [0, ] into  time subdomains [  ,  +1 ],  = 0, 1, . . .,  − 1, with 0 =  0 <  1 < ⋅ ⋅ ⋅ <  −1 <   = .For simplicity, we suppose that all time subdomains have the same length Δ fl  +1 −   .The approximations at these points {  |  = 1, 2, . . ., } can be obtained using the following scheme: where ( +1 ,   ,   ) and ( +1 ,   ,   ) denote, respectively, the coarse and fine approximation at the point  +1 with initial value   at the point   .The coarse propagator  is cheap and lowly accurate, and the fine propagator  is of high precision but also more expensive.The initialization of the iterative algorithm (2) is usually realized by the propagation of the coarse propagator along {  |  = 1, 2, . . ., } consecutively; that is,  0 +1 = ( +1 ,   ,  0  ).The parareal algorithm has several well-known properties in the following: (a) the initial values for the fine propagator on each time subdomain are known in every iteration, so the fine propagators on different time subdomains can be implemented by different processors.(b) The final result of parareal algorithm will achieve the accuracy of serial implement of the fine propagator on the whole time domain after  iterations, where  is the number of time subdomains.Note that the number of iterations is usually set to be much less than the number of time subdomains in practice to avoid necessary computations, while achieving a competitive accuracy [13].

Deferred Correction Method.
Deferred correction method is an iterative method for ODEs.We employ the scheme designed in [21] as one of the basic elements for our proposed method, in spite of many other variants of the original deferred corrections.We adopt the scheme in [21] because of its simplicity for implementation with high accuracy and also robustness for different problems.
We also take system (1) as our example to describe the execution details of deferred correction method.We firstly define a grid on [0, ] with ( + 1) equispaced nodes, where 0 =  0 <  1 < ⋅ ⋅ ⋅ <   = , with time step ℎ = /.An th order method on this grid could bring a numerical solution { 0 ,  1 , . . .,   } with error   = (  ) + O(ℎ  ).In addition, we can also generate a unique th-degree polynomial ũ() on [0, ] with this grid using Lagrange interpolation.Then we can define an error function which satisfies the differential equation This equation is usually regarded as the neighbor problem of original system (1) in the regime of deferred correction methods.Meanwhile, if we use the same th numerical method to solve this neighbor problem, we could get another group of numerical values {Δ 0 , Δ 1 , . . ., Δ  } on the same grid with error Δ  = (  ) + O(ℎ  ).The numerical solution {  + Δ  ,  = 0, 1, . . ., } is a better approximation of the original system after one correction since we have   + Δ  = (  ) + O(ℎ 2 ) [22][23][24].If we iterate such correction  times, the order of accuracy of the iterated deferred correction is O(ℎ min[(+1),] ).In practice, the degree  can not be very large.

QSC-PDC Methods for Parabolic PDEs
In this study, we consider the following linear PDE as our basic model to demonstrate our proposed method: where Ω ≡ [, ] is the spatial domain, [0, ] is the time domain with  > 0, and (, ) is the unknown function.
We take the time domain decomposition for system (5) and design the generic iteration scheme: The essential efforts of this paper are concentrated on the design of techniques, which could bring execution cost and high accuracy for scheme (6).
3.1.QSC Methods for Spatial Variables.We firstly consider reducing the computational cost in spatial domain.Note that, for every iteration , we have to use the propagators  and  to solve a linear homogeneous PDE defined on a fixed spatial domain the same as that of the original system for every time subdomain.The discretization of the spatial and temporal variables will result in algebraic equations for subproblems.The approximate order of the discretization procedure is crucial to the scale of the algebraic equation.In this paper, we use the optimal QSC method for the spatial variable, which was originally used for two-point boundary value problem [25], then elliptic PDEs [26], and parabolic PDEs [27,28].
For a fixed , system (5) is a two-point boundary value problem, whose quadratic spline approximate solution can be written as Therefore, we can write the approximation of system (5) as where   (),  = 0, 1, . . ., , are degrees of freedom (DOFs).
Then the quadratic spline approximate solution of system (5) can be written as With the values of the basis functions on collocation points { 1 , . . .,   }, we can obtain the following ODEs systems on the DOFs   () of the optimal one-step QSC methods: where () = [ 1 (), . . .,   ()]  , the coefficient matrices are and ⃗ () is the interpolating vector of (, ) at all the collocation points.The vector  0 ∈   satisfies  0  0 =  0 , with  0 the interpolation of () at collocation points.
The discretization error of the spatial variable in the uniform norm is O(Δ 3 ) globally and O(Δ 4 ) at the nodes Mathematical Problems in Engineering of the uniform partition and the collocation points [25].In contrast, the classical finite difference discretization could only generate approximations with error O(Δ 2 ) locally, which demonstrates the great benefits of the optimal QSC.

PDC Method for the Temporal Variables.
The parareal algorithm has a very important property.The approximation at all the endpoints of time subdomains will achieve the accuracy of the -propagator after  iterations [8,18], where  is the number of time subdomains.In fact, for any  < , the approximation at the time point   will be updated  times and reach the final approximation which is exactly the result by the -propagator at this time point.
With these final DOFs   0  , we can obtain the approximations of original system (5) on the grid consisting in spatial collocation points and temporal points.
Remark 1.When solving neighbor problem (15), we can use some simple numerical integrators, such as the backward Euler method.An alternative formulation is to compute the approximate solution in two steps, and the order of accuracy O(Δ 4 ) is also preserved [25,27,28].The realization process of such strategy based on backward Euler method is as follows: (i) first compute a second order approximation (ii) then compute an optimal order approximation { The two-step version involves two linear systems over every time step, and both the two coefficient matrices of the above two systems are tridiagonal.So the total computational cost will be smaller than that of one-step integrator (15), in which the bandwidth of the coefficient matrix is 4, because of the matrix   .

The Convergence of the PDC Method
The convergence analysis of the optimal QSC method can be found in the literature [25], and we just need to present the convergence of the PDC algorithm in the temporal domain.Bal [2], Gander, and Hairer [29] gave a convergence analysis on the classical parareal algorithm.Based on their work, we present the convergence analysis on the PDC method.
We also assume that the coarse propagator  satisfies error expansion for Δ small and the Lipschitz condition where ( +1 ,   , ) means the true solution at point  +1 of the original system with the value  at the time point   .Within every time subinterval [  ,  +1 ], the difference between the true solution and the approximation by the PDC method can be written as Based on this relationship and also the conditions (20) We first look at the last term ‖( +1 ,   ,    ) −  DC ( +1 ,   ,    )‖, which is an upper bound of the error caused by the deferred correction.For simplicity, we assume the solution of the original system to be smooth enough and solve the original equation in the prediction steps and the neighbor equations in the correction steps by the forward Euler integrators.The prediction step in deferred correction includes a numerical integration with initial value  0  on every time interval   .Let ⃗  [1]   = ( [1]  ,0 ,  [1]  ,1 , . . .,  [1]  , ) be the numerical solution from the forward Euler integrators on ( + 1) uniformly distributed nodes in   with time step ℎ.We know from [30] that the error satisfies ‖  (ℎ) −  [1]  , ‖ ∞ ∼ O(ℎ), where   (ℎ) is the true solution at  +1 with the same initial value  0  at   .In fact, the error at  +1 is related to the length of   , so it is reasonable to hypothesize ‖  (ℎ) −  [1]  , ‖ ∞ ≤ Δℎ, where  is a constant.
The correction steps in the hybrid algorithm are different from these steps of the pure deferred correction.The initial values of neighbor problems are always 0 for pure deferred correction.Meanwhile, for the correction steps which are embedded in parareal process, the initial values of the neighbor problems are kept being updated in parareal iterations.In other words, the initial value of problem (15) on time subinterval   is   0 − −1 0 .However, this formula of the initial value does not affect the iterative improvement property of the deferred correction.For the th parareal iteration, we have where  , = (Δ + ℎ).We can see from (24) that the error of deferred correction method equals the error of the corresponding neighbor problem.Similar to the inductive convergence proof of spectral deferred correction in [30], we can obtain that ‖( +1 ,   ,    ) −  DC ( +1 ,   ,    )‖ ∼ O(ℎ +1 ), considering the solution of the original system to be smooth enough.
Based on these results, we have the convergence of the PDC method.
Theorem 2. Let the solution of the original system be smooth enough.If the coarse propagator  satisfies condition (20) and condition (21), where   ,  =  + 1,  + 2, . .., are continuously differentiable, then the error at time point   after  PDC iterations is bounded: Obviously, when the time step ℎ approaches 0, such error bound will go to 0 with enough iterations.
We denote   () fl ∑ ≥0   +1  +1 .This function satisfies the following recurrence relation: Then, we have Solving for   (), we obtain after induction where  = /(1−).We know that replacing the factor 1− in the denominator by 1− only will increase the coefficients in the power series of   ().It means the coefficients in the power series of   () are smaller than the corresponding coefficients in the power series of ρ (), where Using the binomial series expansion it is easy to find that the coefficient of the term   in the expression ρ () is For any  > 0, we have 1 +  ≤   .So, it follows Remark 3. In practice, we usually take Δ very small and ℎ much smaller.We notice that the first term in (25) decreases to 0 superlinearly as  increases, and such decrease is very quick with the help of small Δ.For another hand, the increase of  also takes the summation in (25) much smaller despite more terms.Therefore, a couple of iterations are enough to pull the error down below some tolerance.

Stability Analysis of the PDC Method
To describe the stability of the PDC method, we apply such numerical method to the equation We follow the definition of stability in [20] and define the amplification factor Am (Δ) for  ∈ C as where φ(1) denotes the numerical approximation at the point  = 1.The numerical method is stable for some given value of , if For simplicity, we adopt forward and backward Euler methods in the analysis.The time domain [0, 1] is divided into  subdomains.The coarse propagator is forced to take large time steps, and implicit Euler is better than explicit Euler for the coarse propagator [31].Therefore, we choose one-step backward Euler method to be the coarse propagator on every subdomain, while the fine propagator is chosen as  steps of Euler method on every subdomain for the original equations and corresponding neighbor problems in the process of deferred correction.
We notice that the approximation φ( 1) is related to the iteration number  of the PDC method.We denote by Φ   the approximation of system (36) at th subdomain point and on the th iteration, where  = 0, 1, . . .,  and  = 0, 1, . ... The approximations satisfy Because of the one-step backward Euler method for the coarse propagator, we have We denote  = 1/(1 − Δ) as the amplification factor of the coarse propagator and let  be the amplification factor of the fine propagator, which is from the deferred correction method.In fact, the first iteration of the PDC method is exactly the classical deferred correction, but with smaller numerical steps on each time subdomain.Therefore,  = (1/(1 − Δ/))  for  steps of backward Euler integrators, and  = (1 + Δ/)  for  steps of forward Euler integrators.However, the expressions of  will be much more complicated when  ≥ 1.
For the initial approximation, we have For the first iteration ( = 0), we have and, after inductions, we get which is also the amplification factor after one iteration of the PDC method.We will continue to analyze the stability in two different cases, respectively.
Case 1.For the  steps of implicit Euler integrators as the fine propagators Am ImIm (Δ,  = 0) =          1 We denote  = Δ ∈ C, and the factor Am ImIm (Δ) is always smaller than 1 except two areas around its two poles  = 1 and  = .If we choose  = 20 and  = 20, the stability regions are shown in Figure 1.
When  ≥ 1, we can not get the explicit expressions of the amplification factors.The stability regions are estimated numerically in Figure 2. We see that the size of the instable region expands as the iteration number is increased.Nevertheless, these instable regions always lie in the right half plane, so all the PDC methods with implicit fine propagators at these tested iterations are -stable.

Additionally, we notice that lim
and the PDC method with both implicit propagators is stable at the first iteration.We take several values for  along   1.
From Table 1, we can conjecture that the PDC methods at different iterations could also be -stable.
Case 2. For the  steps of explicit Euler integrators as the fine propagators, the amplification factor is Am ImEx (Δ,  = 0) =          1 The regions of  = Δ to make Am ImEx (Δ,  = 0) ≤ 1 are related to the relationship between  and .
If  >  − 1, the factor Am ImEx (, 0) has two poles, that is,  = 1 and infinity.We choose  = 20 and  = 20, and the stability regions of the PDC method at different iterations are shown in Figure 3, in which the stable areas are shrunken as the iteration proceeds.
If  <  − 1, the factor Am ImEx (, 0) has one pole, that is,  = 1.We choose  = 20 and  = 16, the stability regions of the PDC method at different iterations are shown in Figure 4.In fact, there are very small unstable areas located at the left half plane in Figures 4(a  If  =  − 1, the factor Am ImEx (| ||=+∞ , 0) is always a positive constant much smaller than 1 because the number of time steps  is usually a moderate integer.Therefore, we are reasonable in merging this case into the previous one.
In Figure 4, we observe that there is a significant change about the pattern of the size of the stable region at the third iteration.The stability region before the third iteration is quite similar to the stability region of the backward Euler method, while the stability region after the third iteration has a similar shape of the stability region of the forward Euler method.The change means that the dominant control is transferred between the components of the PDC method.Figure 5 illustrates the relationship between this critical iteration and the number of interpolation points .
We can see a common trend that the stability regions are decreased with more iterations according to Figures 2-4 in this section.In fact, there are several factors having impact on the size of the stability region, including the coarse propagators, the fine propagators, the way of interpolation at those equispaced nodes, and the numerical differentiation formula of the original ODEs [32,33].We believe that the change of the stable regions is the result of all these factors working together accumulatively.

The Choice of the Fine Time Step
The PDC method includes two iterative processes, that is, parareal and deferred correction.Their individually different iteration speeds will alternatively influence the performance of the proposed hybrid algorithm.To this end, we study the balance strategy to avoid the waste of resources as well as a slowed-down speed.It is intuitively to tune the parameters of deferred correction since they are embedded in the framework parareal for our proposed PDC method.In fact, there are several ways to improve the accuracy of the final approximations for pure deferred corrections, such as smaller time step, higher order numerical integrators, or more iterations.In this study, we choose to tune the time step of the fine propagators of the PDC method.
Here we take the following system as an example to show how to find an optimized time step for the PDC method: The fine time step ℎ can be regarded as the convergence factor of deferred correction iterations.If th order numerical integrator is employed in deferred correction, this factor will be obviously ℎ  .The constant , which describes the difference between the true-solution propagator and the coarse propagator, can be regarded as a part of the convergence factor of parareal in contrast to deferred correction.We can choose ℎ close enough to  to balance the two iterations in the PDC method.Note that the fine time step ℎ is crucial for the final approximation performance of the PDC method.We denote the number of the fine time steps on each time subdomain   by , where  = Δ/ℎ.Polynomials with degree of  are employed in the process of deferred correction, and  is the upper bound of the final approximate order of the PDC method.So there should be a moderate integer  1 such that  ≥  1 , which means the final result could reach at least the accuracy O(ℎ  1 ).However, the  can not be very large either because the polynomials of low degree are preferred for interpolations.So there should be a little bigger integer  2 such that  ≤  2 to avoid Runge's phenomenon.Now, we get a value interval for the fine time step, with Δ/ 2 ≤ ℎ ≤ Δ/ 1 .The choice of the optimal time step is which means that ℎ should be chosen as the nearest point to  from the value interval [Δ/ 2 , Δ/ 1 ].

System Efficiency Analysis
Parareal can be implemented in two parallel mechanisms which are called serial-parallel parareal and pipelined parareal in [34].For the pipelined parareal, each processor alternatively compute the fine and coarse propagators without waiting except the initial wait.It had been shown that the pipelined fashion could be better in terms of the computation speed.In this section, we study the parallel speedup and efficiency of the PDC method based on pipelined parareal method.
We first investigate the PDC method for ODEs system (12) from optimal one-step QSC method for PDE.Here we suppose backward Euler methods are used in both the coarse and fine propagators.We also assume that the computing costs of the algebraic equations involved in  and  propagators are the same if they have the same scale and sparse structure.When analyzing the speedup of the classical parareal method, we equalize the computing costs of one time step at each iteration.In this paper we do not give such assumption any more, since interpolations are necessary in correction steps of the PDC method, which is not included in the prediction step.The costs of information transmission and data storage are ignored in this paper.
Similar to the speedup and system efficiency analysis shown in [34], we adopt the following notation: (i)  is the length of time domain.
(ii) Δ is the length of each time subdomain.
(iii) Δ is the time increment for the coarse propagator.
(iv)  is the time increment for the fine propagator.
(v)   is the cost of the numerical method at each time step in .
(vi)   is the number of time steps of the coarse propagator on each time subdomain, and we often choose   = 1.
(vii)   is the cost of the numerical method at each time step in .
(viii)   is the number of time steps of the fine propagator on each time subdomain.
(ix)   is the cost of interpolation averaged at each time step in correction steps.
(x)  is the number of time subdomains on [0, ]; that is,  = Δ.where the ratio / is a small constant.The cost of PDC method for system (57) is about  1 /4 times of the cost of the PDC method for system (12).Since  1 is a very big integer, the QSC method reduces the total cost greatly.Furthermore, if the optimal two-step QSC method is employed, the reduction factor will be  1 /2 rather than  1 /4.

Numerical Experiments
In this section, we take the following parabolic PDE as an example to present the advantages of the QSC-PDC algorithm: (, ) =  2  2  (, ) + (2 2 − 1)  −/2 sin () ,  (0, ) =  (1, ) = 0, 0 ≤  ≤ 5,  (, 0) = 2 sin () , 0 ≤  ≤ 1. (61) The true solution of system (61) is  (, ) = 2 −/2 sin () , 0 ≤  ≤ 1, 0 ≤  ≤ 5. (62) To perform the QSC-PDC method, we first employ a uniform partition of the spatial domain [0, 1] with 160 equispaced grid points, and the step size is Δ = 1/160.Then we take the midpoint of each segment as the collocation points; that is, where the matrices  0 ,  2 , and   ∈  160×160 are the same as the matrices in system (12).Here, we only consider the optimal one-step QSC method.We divide the time domain [0, 5] into 40 subdomains.On each subdomain, we use 1-step backward Euler method for the  propagators and use 5-step backward Euler method for the  DC propagators.Polynomials of degree 5 are employed for interpolation in each correction loop.The PDC method is programmed in the pipelined manner, and information is communicated between processors through MPI, and the code is written using C language with GNU Scientific Library.The program is carried out on the Shared Hierarchical Academic Research Computing Network of Canada (https://www.sharcnet.ca/).41 CPUs are employed in the experiments, including one root CPU and the other 40 CPUs.The root CPU is in charge of sending and receiving data and displaying outputs.The iterative computing of the propagators on the 40 time subdomains are distributed onto the other 40 CPUs.
For simplicity, we only examine the error at a special point (, ) = ((80 − 1/2)Δ, 5).The resulting PDC error and corresponding running time are shown in Table 2.In order to exhibit the advantages of the QSC-PDC method, we perform the classical parareal method for system (64), and the spatial variable is also discretized by optimal QSC.Here, we call the reference method as QSC-parareal method.The time domain [0, 5] is also divided into 40 subdomains, and the coarse propagator is also chosen as 1-step backward Euler method on each subdomain.The fine propagator is chosen as 100-step 2-stage Gauss method on each subdomain, which is of high order and easy to carry out.
The classical parareal method for (64) is also programmed in pipelined manner, and the corresponding error at the same point (, ) = ((80 − 1/2)Δ, 5) and running time are shown in Table 3.
To show the advantages of the deferred correction clearly, we plot the iteration error and running time of the two methods together in Figure 6.We can see that the deferred correction technique makes the parareal method achieve better approximations in fixed iterations and costs much less running time.

Conclusions
In this paper, we firstly applied the optimal one-step QSC method to solve linear parabolic PDEs, which lead to linear ODEs with respect to the degrees of freedom.We then combined the parareal algorithm with deferred correction method along the temporal direction to deal with the collocation equations.With the fourth order error on the spatial

10 Figure 1 :
Figure 1: Instability region (blank area) using implicit Euler integrators for both the coarse and fine propagators.The -axis is Re(Δ) and the -axis is Im(Δ), where  = 20 and  = 20.
) and 4(b), so the PDC methods are not -stable in such situations either.

Figure 3 :
Figure 3: Stability region (doted area) using implicit Euler integrators for the coarse propagators and explicit Euler integrators for the fine propagators after 2 iterations (a), 3 iterations (b), 4 iterations (c), and 6 iterations (d).In fact, there exists a relatively very small unstable area around the point  = 1 in (a).The -axis is Re(Δ) and the -axis is Im(Δ), where  = 20 and  = 20.

Table 1 :
Amplification factors at different values of .

Table 2 :
The iteration error and running time of QSC-PDC method, with  = 40,  = 5.

Table 3 :
The iteration error and running time of classical method for the QSC equations, with Gauss methods of  = 40 and 200 steps on each subdomain.