An Efficient Iteration Method for Toeplitz-Plus-Band Triangular Systems Generated from Fractional Ordinary Differential Equation

It is time consuming to numerically solve fractional differential equations. The fractional ordinary differential equations may produce Toeplitz-plus-band triangular systems. An efficient iterationmethod for Toeplitz-plus-band triangular systems is presented with O (Mlog (M)) computational complexity and O (M) memory complexity in this paper, compared with the regular solution with O (M2) computational complexity and O (M2) memory complexity. M is the discrete grid points. Some methods such as matrix splitting, FFT, compress memory storage and adjustable matrix bandwidth are used in the presented solution. The experimental results show that the presentedmethod compares well with the exact solution and is 4.25 times faster than the regular solution.


Introduction
Fractional differential equation (FDE) plays an important role in dynamical systems [1] and has more than 300 years of research history [2].Many analytical solutions and numerical solutions [3][4][5][6] have been proposed for FDE, such as finite difference method [7,8], finite element method [9], and spectral method [10,11].In recent times, interest in fractional ordinary differential equations (FODE) has increased [12][13][14][15].The derivatives in the FODE are approximated by linear combinations of function values at the discrete grid points.Compared with integer ordinary differential equations, the FODE has nonlocal effect, which means a grid point may rely on the grid points far away from its position.And a grid point of the classical integer equations may only rely on its several neighboring grid points.
For integer order equations, the coefficient matrices are often sparse.Because of the nonlocal property of fractional differential operators, the numerical methods for fractional diffusion equations often generate dense or even full coefficient matrices [16].This nonlocal property makes the computation of FODE and FDE much heavier than that of the traditional integer equations.The short memory principle [17], parallel computing [18][19][20][21], fast Fourier transformation (FFT) [22,23], multigrid method [24], and preconditioner technologies [25,26] are used to overcome this heavy computation.Gong et al. presented many parallel algorithms for different FDEs on both traditional and heterogeneous parallel platforms [16,18].Diethelm [19] proposed a parallel secondorder Adams-Bashforth-Moulton method for a FODE.Wang and Du [26] proposed a superfast-preconditioned iterative method for steady-state two-side space-fractional diffusion equations.
The fractional ordinary differential equations may produce Toeplitz-plus-band triangular systems.Toeplitz-plusband systems were studied by professors Chan and Ng [27].They considered the solutions of Hermitian Toeplitz-plusband systems (  +   ) = , where   are -by- Toeplitz matrices and   are -by- band matrices with bandwidth independent of .  and   are both Hermitian matrix.The authors proved that if   is generated by a nonnegative piecewise continuous function and   is positive semidefinite, then there exists a band matrix   , with bandwidth independent of , such that the spectra of  −1  (  +   ) are uniformly bounded by a constant independent of .The band preconditioner was developed for Hermitian Toeplitz systems [28].The recursive blocked algorithms were proposed for triangular systems and the recursive algorithms lead to an automatic variable blocking that has the potential of matching the memory hierarchies of today's HPC (high performance computing) systems [29,30].
The fractional derivative is in the Caputo form [31].
Define   =  for 0 ≤  ≤ , where  is a positive integer, and  = / are step size.Assume   to be the numerical approximation to (  ) and   the numerical approximation to (  ).Using the Grünwald approximation, the finite difference scheme for ( 1) is shown as follows: where the normalized Grünwald weight  is defined by Equation ( 2) results in a linear system of equations where  = ( 1 ,  2 , . . .,   )  and  = ( 1 ,  2 , . . .,   )  .If  0 ̸ = 0, the term should be included in . = (  ) × is the coefficient matrix. is defined by (5)
The linear system (4) can be solved with computational complexity ( 2 ), shown in Algorithm 1.The output  equals .
From ( 6), we can see that  has some properties.
Assume the bandwidth of matrix  is  and  = (  ) + .Solving  +1 =  needs about  arithmetical operations.If  is near log 2 , there are about log 2  arithmetical operations with forward substitution.So the computational complexity of  +1 =  is (log 2 ).
So the cost of each iteration of (10) is (log 2 ).If  is a diagonal dominant matrix, we can expect (10) can converge with not too many iterations.The efficient iteration method is shown in Algorithm 2.
In Algorithm 2,  1 →  stands for the value of previous iteration and  1 →  stands for the current iteration.The value of  can affect the performance of Algorithm 2 shown in Table 1.
Algorithm 2 has five features/advantages compared to Algorithm 1.
(1) Split the coefficient matrix and solve the triangular system iteratively.
(2) Use FFT to compute matrix vector multiplication.

Numerical Example
The experiment platform is a laptop with Intel(R) Core (TM) i3-3110M CPU, 2 GB main memory, and Windows 7 operating system.The CPU clock frequency is 2.40 GHz.The code is developed with MATLAB R2012a and runs on default double precision floating point operations.
The following fractional ( = 0.8) ordinary differential equation [13] was considered: where () = (14/Γ(3.8)) 1.8 + (5/2) 2 + (5/Γ(3.8))(1+ ) 2.8 .The exact solution of ( 14) is The efficient iteration method of Algorithm 2 compares well with the exact solution to the FODE in the test case of (14), shown in Figure 1.The  is 1.0/100.The maximum absolute error is 9.78 × 10 −3 .The difference between the efficient iteration method and the forward substitution Algorithm 1 is only 2.37 × 10 −10 .The efficient iteration method and The performance comparison between regular forward substitution solution of Algorithm 1 and efficient iteration method of Algorithm 2 is shown in Table 2. Columns 2 and 3 of Table 2 are the runtime and the runtime is recorded in seconds.With  = 8 × 10 4 , the maximum speedup is 4.25.Because the speedup increases with , the bigger  is, the higher the speedup that can be expected is.Because of the 2 GB memory limitation, the compress memory usage is also used in Algorithm 1.
The impact of  on the performance of Algorithm 2 is shown in Table 3.The runtime of the presented method varies with .So  is a key parameter for the performance of Algorithm 2. In real fractional ordinary applications, the proper  should be chosen.
The presented iteration method should be regarded as an iteration method to solve not only the system generated from FODE but also the more general Toeplitz-plus-band triangular systems.The technology of parallel computing is very useful, but with less mathematical background.Parallel computing is attractive for fractional differential equations [34].As a part of future work, first, we would like to parallelize the presented solution on shared memory or distributed memory systems.Second, accelerating the presented efficient iteration method on heterogeneous architecture [35][36][37][38] should also be interesting.

Figure 1 :
Figure 1: Comparison of exact solution to the solution of the fast solution at time  = 1.0.
←   / , One has | ,+1 | > | , | for 1 ≤  ≤ ,  < .This property is determined by the normalized Grünwald weight   and is the mathematical background of short memory principle.This property means that for grid point , if the distance of grid point  1 is smaller than that of grid point  2 ,  1 has more impact on  than  2 .

Table 2 :
Performance comparison between regular solution and the presented efficient iteration method.