A Polynomial Preconditioned Global CMRH Method for Linear Systems with Multiple Right-Hand Sides

The restarted global CMRH method (Gl-CMRH(m)) (Heyouni, 2001) is an attractive method for linear systems with multiple right-hand sides. However, Gl-CMRH(m) may converge slowly or even stagnate due to a limited Krylov subspace. To ameliorate this drawback, a polynomial preconditioned variant of Gl-CMRH(m) is presented. We give a theoretical result for the square case that assures that the number of restarts can be reduced with increasing values of the polynomial degree. Numerical experiments from real applications are used to validate the effectiveness of the proposed method.


Introduction
We consider the linear systems of the form where  is a nonsingular and sparse matrix of order ,  ∈ R × with usually  ≤ .Such a situation arises from, for instance, wave-scattering problems, image restorations, recursive least squares computations, numerical methods for integral equations, and structural mechanics problems; see [1,2] and references therein.Numerical solvers for (1) can be roughly divided into two categories.The first is the direct method; for instance, the LU factorization is competent since  is only factorized once to recast (1) into a triangular system which is easy to solve.However, if the coefficient matrix  is large and sparse or sometimes not readily available, then iterative solvers may become the only choice and possibly fall into the following three classes.
The first class is the block method.For symmetric problems, the first block methods are due to O'Leary, including the block conjugate gradient method and block biconjugate gradient method [3].For nonsymmetric cases, the block generalized minimal residual method [4][5][6], the block BiCGSTAB method [7], the block quasiminimal residual method [8], the block least squares method [9], the block Lanczos method [10], and the block IDR() method [11] have been proposed recently.In general, the block solvers are more suitable for dense systems with precondition.
The second class is the seed method.The main idea of this kind of methods is briefed below.We first select a single system (seed system) and develop the corresponding Krylov subspace.Then we project all the residuals of the other systems onto the same Krylov subspace to find new approximations as initial guess; see [2,12,13] for details.
The last class is the global method.To our knowledge, the term global is at least due to Saad [14,Chapter 10] and has been further populated by Jbilou et al. [15] with the global FOM and global GMRES methods for matrix equations.Following the work [15], many other global methods have been developed, including, to name just a few of them, the global BiCG and global BiCGSTAB methods [16,17], the global Hessenberg and global CMRH (changing minimal residual method based on the Hessenberg process) methods [18] and their weighted variants [19], the skew-symmetric methods [20], and the global SCD method [21].Generally, the global methods are more appropriate for large and sparse systems.
It is well known that the performance of the above Krylov subspace methods can be reinforced with a suitable preconditioner [14] or through effective matrix splitting techniques [22,23].In this paper, we are interested in preconditioning the global methods.Specifically, we aim at improving the convergence behavior of the restarted global CMRH method (Gl-CMRH()) [18], which is originally proposed to reduce the increasing storage requirement in its full version.However, because of the use of a small subspace (say  ≪ ), Gl-CMRH() is likely to slow down or even stalls out.Heyouni and Essai give a weighted version of Gl-CMRH() (WGl-CMRH()) [19] to alleviate such disadvantage.Instead, we propose a different approach, that is, by polynomial preconditioner to improve Gl-CMRH() in this paper.
The remainder of this work is organized as follows.In Section 2, we first recall some notations and properties of the global method, and then we sketch the Gl-CMRH() method.In Section 3, we construct the polynomial preconditioner tailored to Gl-CMRH() by exploiting the relation between the Krylov matrix and the global basis.For square right-hand side matrices, we also give a theoretical result that justifies the use of the proposed polynomial preconditioner.In Section 4, several numerical examples are employed to substantiate the effectiveness of the proposed method.Some concluding remarks and potential future work are briefed in the last section.

Notations and the Global CMRH Method
In this section, we first give some notations and properties used in the global methods, which will henceforth be adopted extensively in deriving the main results.Then we present a brief introduction of the Gl-CMRH() method [18].More details about the global methods can be found, for instance, in [15,18,19].

Notations and Properties.
Throughout this paper, the following notations will be used.The norms ‖ ⋅ ‖ 2 and ‖ ⋅ ‖  represent the vector 2-norm and matrix -norm, respectively.Let M be the set of  ×  rectangular matrices.If  ∈ M, then   stands for its transpose.For a square matrix ,  −1 indicates the inverse of  if existed.Unless otherwise stated, subscripts denote the corresponding iteration step; for example,   denotes the th iterate of the matrix (vector) .Moreover, the (, ) entries of matrices  and   are denoted by () , and (  ) , , respectively.If a column or a row of a matrix is invoked, then we denote it in a dot format; that is, (  ) ⋅, and (  ) ,⋅ mean correspondingly the th column and the th row of   .Besides, (  ) :,: extracts the submatrix from  to  rows and from  to  columns of   .
Next we present some notations and basic properties used in the global methods [15].Given the  ×  block matrix V  = [ 1 ,  2 , . . .,   ], where   ∈ M,  = 1, 2, . . ., , then we define the matrix product * as where  ∈ R  .For any matrix  ∈ R × , we define analogously the * product by It can be verified that such matrix product satisfies the following properties: where ,  ∈ R  .
As for the numerical performance, Gl-CMRH is in general competitive with the classic global GMRES method (Gl-GMRES) [15].Now we give a brief sketch of Gl-CMRH.Let  0 ∈ M be the initial guess of (1) with the associated residual matrix  0 =  −  0 .The th iteration   is searched in the affine subspace , where   is the th correction matrix.The matrix Krylov subspace is defined as K  (,  0 ) = span{ 0 ,  0 , . . .,  −1  0 }.Using the basis V  of K  (,  0 ) given by the global Hessenberg process [18], we get where   ∈ R  .The global Hessenberg process in [18] also yields where   is an ( + 1) ×  upper Hessenberg matrix,   is obtained by deleting the last row of   , and 0 is the zero matrix in M. Thus it follows immediately that where To obtain the vector   , a restriction is imposed on the Gl-CMRH method; that is, Relations ( 7) and ( 8) yield Instead of solving (9), which requires O( 2 ) operations and O() storage, we solve a smaller problem min which leads to 1 by assuming that   is of full rank.From ( 5) and ( 10), the th iterate   can be updated by ) .(11) As in the Gl-GMRES method [15], a restarting strategy is used to address the problem that the computational and storage requirements increase with iterations.Algorithm 1 gives a framework of the restarted version of Gl-CMRH (Gl-CMRH()).We refer to [18,19] for elaborate explanation for the Gl-CMRH method.

A New Polynomial Preconditioned Gl-CMRH(𝑚) Method
In many cases, the accuracy of Gl-CMRH() is sufficient.Due to the limited dimension of the matrix Krylov subspace K  , however, Gl-CMRH() may suffer from slow convergence or even stagnation in practice, just like in GMRES() [25] and Gl-GMRES() [15].To remedy this drawback, some accelerating techniques are demanded, for instance, a weighting strategy exploited in [19].Besides, a polynomial preconditioner can also be adapted to improve the convergence [26].In this section we focus on constructing an efficient polynomial preconditioner pertinent to the Gl-CMRH() method.
The essence of the polynomial preconditioned method is to devise a polynomial () ≈  −1 such that an easier system  ()  =  ()  (12) is solved instead of solving the original system (1).In what follows, we obtain a polynomial preconditioner () by extracting some useful information from Gl-CMRH().Now suppose that the block Krylov matrix   is of the form where  0 is the initial residual matrix.By comparing the last  columns of the second equation in ( 6), we have The equality ( 14) can be rearranged as Let us consider the relationship between   and the basis V  .Since V  and   span the same space, it follows that where   is an upper triangular matrix.The relation ( 16), however, does not shed too much light because how to compute   still remains unclear.Fortunately, an explicit recurrence for   can be derived in terms of  −1 and  −1 .By combining ( 16) and ( 4), we get Since   = (V  ) ⋅,(−1)+1: =   * (  ) ⋅, , then ) . ( Substituting ( 17) and ( 18) into (15) gives rise to Besides, the relation (16) gives  +1 =  +1 * ( +1 ) ⋅,+1 .By combining it with (19), we obtain a recurrence for the ( + 1)st column of  +1 ; that is, Therefore,   in ( 16) can be updated recursively by (20).Recall that in (5)   is updated on the basis V  .Here we will show another way to update   which is based on the block Krylov matrix   .It follows from ( 5), (16), and (4) that where     = [ 0 ,  1 , . . .,  −1 ]  and   is solved from (10).Denote by  −1 a polynomial in  of degree  − 1; that is,  −1 () = ∑ −1 =0     .Hence ( 21) can be recast as The matrix polynomial  −1 () in ( 22) can be regarded as the approximation to  −1 in some sense.This is justified for the case  =  in (1) by the following result.
Remark 2. In ( 23), the term ‖  ‖  becomes smaller with growing  and hence the upper bound diminishes correspondingly, which in turn implies that  −1 () approximates  −1 asymptotically.This justifies the use of the polynomial preconditioner.In general, (23) assures that the number of restarts will be reduced correspondingly with increasing .Yet this does not necessarily mean that the CPU time will be reduced simultaneously since the time saved from the reduction of restarts may be offset by the extra time spent in constructing the polynomial.In practice, we are often more concerned with the CPU time than the restarting number.Therefore, we restrict ourselves to small values of .For  < , an inequality similar to ( 23) is generally unavailable.Nevertheless, numerical examples seem to demonstrate that the asymptotical property of ( 23) is also shared by the case  < ; see Example 3 in Section 4 for more discussions.
By putting all together, we propose the new polynomial preconditioned global CMRH method (PGl-CMRH(, deg)) that is shown in Algorithm 2.

Numerical Examples
In this section, we present some numerical experiments which are coded with MATLAB 7.8.0.For fair comparisons, some other global solvers mentioned earlier like Gl-CMRH() [18], WGl-CMRH() [19], and Gl-GMRES() [15] have also been implemented.From now on we drop the parameters  and deg in brackets without ambiguity.In all examples, we assume that  0 = 0.The terminating criterion for the th iteration is tol = ‖  ‖  /‖ 0 ‖  ≤ 10 −10 .Though other alternatives are possible, we use  = √|( 0 ) , |/‖ 0 ‖  as the the weighting matrix for WGl-CMRH, which is also preferred in [19].The coefficient matrices  in the first two examples are derived from the discretizations of the Poisson's equation and convection-diffusion problems which occur frequently in applied science and engineering.The coefficient matrices  in the third example are quoted from the Matrix Market [27].
Example 1.We consider the linear systems of (1) in which its coefficient matrix  is obtained from the discretization of on the unit square (0, 1) × (0, 1) with  = 0 on the boundary.It can be discretized through the centered difference scheme at the gird points (  ,   ) with   = ℎ,   = ℎ, where the mesh size ℎ = 1/( + 1) for ,  = 0, . . .,  + 1.This yields a block tridiagonal matrix of size  =  2 .The right-hand side matrix  is chosen with entries uniformly distributed on [0, 1]; see [14,Chapter 2] for more details about (24).Related parameters are given by  = 2,  = 20 and deg = 5.The number of restarts and CPU time for matrices  of different sizes are given in Table 1.As observed from Table 1, PGl-CMRH improves the original CMRH method by time ratios from 17.8% to 55.4%.Compared with WGl-CMRH, PGl-CMRH requires less number of restarts and CPU time to achieve the required accuracy.Note that WGl-CMRH does not speed up the convergence of Gl-CMRH.This indicates that a different weighting matrix should be used.To find the optimal weighting matrix, however, remains an open problem [19].
Example 2. Consider the linear systems of (1) where its coefficient matrix  is obtained from the discretization of the three-dimensional convection-diffusion problem on the unit cube Here  is a constant coefficient and (25) subjects to Dirichlet-type boundary conditions.This equation can be discretized by applying seven-point finite difference discretizations.For instance, we use the centered difference to the diffusive terms and the first-order upwind approximations to the convective terms.This approach yields a coefficient matrix  of size  =  3 , where the equidistant mesh size ℎ = 1/( + 1) is used, and the natural lexicographic ordering is adopted to Set  = ( 0 )  0 , 0 ,  1 =  0 /,  1,1 =  0 and  1,2 =  0 .Let  1,1 = ( 0 )  0 , 0 (the upper triangular matrix defined in ( 16)).
( Example 3. In practice, the degree of the polynomial preconditioner  deg −1 has a great impact on the numerical performance of PGl-CMRH.Thus it deserves our attention to investigate how to choose the "optimal" degree (if existed) for generic matrices.Nevertheless, theoretical analysis to this end can be very hard.Instead, we show empirically how to choose a range of degrees for the polynomial  deg −1 such that PGl-CMRH at least yields a modest performance.To this end, we use ten unsymmetrical testing matrices from [27] and illustrate how PGl-CMRH performs for each matrix with  rdb2048l is rather irregular and hence unpredictable.However, this does not contradict Theorem 1 where it is stated that  deg −1 () can approximate  −1 better with growing values of deg.In other words, Theorem 1 explains theoretically that the total number of restarts will be reduced with increasing values of deg.However, this does not apply to the change of CPU time.In fact, it is likely that PGl-CMRH with high degree preconditioner takes more CPU time in generating the polynomial preconditioner (even with less number of restarts) and hence uses more time to converge than that of its low degree counterpart.Second, most curves locate the corresponding shortest CPU time point with deg between 2 and 10.This can be the first reason for favoring small values of deg.Finally, more rounding errors can be introduced in developing high-degree polynomial preconditioners from the numerical point of view.This is the second reason for the approval of low-degree polynomial preconditioners.Therefore it is useful to test with deg from 2 to 10.Under extreme situations, however, higher degree may be demanded if a low-degree preconditioner fails to bring the required accuracy.

Conclusion
To remedy the slow convergence of the original Gl-CMRH() method, a new variant of Gl-CMRH() for linear systems with multiple right-hand sides is developed.The proposed method often yields better performance than its predecessor Gl-CMRH() and other global variants in terms of CPU time.We show experimentally that polynomial preconditioners with degree lower than 10 should be considered if no prior knowledge is known.

Figure 1 :
Figure 1: Normalized CPU time against deg (from 2 to 15) for ten testing matrices.

Table 1 :
Number of restarts and CPU time (in brackets) for Example 1.

Table 3 :
Properties of testing matrices in Example 3.
in Figure1.This approach facilitates our comparison since different curves become more clustered now.Some remarks can be made from Figure1.First, the curves seem rather problem-dependent and are not necessarily nonincreasing with increasing values of deg; for instance, the curve of