A Polynomial Preconditioned Global CMRH Method for Linear Systems with Multiple Right-Hand Sides

Ke Zhang (http://orcid.org/0000-0001-5073-959X) and Chuanqing Gu

Department of Mathematics, Shanghai University, Shanghai 200444, China

Academic Editor: Jinyun Yuan

Research Article. Journal of Applied Mathematics (Hindawi Publishing Corporation), Volume 2013, Article ID 457089. ISSN 1687-0042, 1110-757X. DOI: 10.1155/2013/457089.

Received 14 March 2013; Revised 23 June 2013; Accepted 26 September 2013; Published 14 November 2013

Copyright © 2013 Ke Zhang and Chuanqing Gu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The restarted global CMRH method (Gl-CMRH(m)) (Heyouni, 2001) is an attractive method for linear systems with multiple right-hand sides. However, Gl-CMRH(m) may converge slowly or even stagnate due to a limited Krylov subspace. To ameliorate this drawback, a polynomial preconditioned variant of Gl-CMRH(m) is presented. We give a theoretical result for the square case that assures that the number of restarts can be reduced with increasing values of the polynomial degree. Numerical experiments from real applications are used to validate the effectiveness of the proposed method.

1. Introduction

We consider linear systems of the form

$$AX = B, \tag{1}$$

where $A$ is a nonsingular sparse matrix of order $n$ and $B \in \mathbb{R}^{n \times s}$, usually with $s \ll n$. Such systems arise in, for instance, wave-scattering problems, image restoration, recursive least squares computations, numerical methods for integral equations, and structural mechanics problems; see [1, 2] and references therein.

Numerical solvers for (1) fall roughly into two categories: direct methods and iterative methods. Among direct methods, the LU factorization is effective because A is factorized only once, after which (1) reduces to triangular systems that are easy to solve. However, if the coefficient matrix A is large and sparse, or not explicitly available, then iterative solvers may be the only choice. These can be grouped into the following three classes.

The first class is the block method. For symmetric problems, the first block methods are due to O’Leary, including the block conjugate gradient method and block biconjugate gradient method . For nonsymmetric cases, the block generalized minimal residual method , the block BiCGSTAB method , the block quasiminimal residual method , the block least squares method , the block Lanczos method , and the block IDR(s) method  have been proposed recently. In general, block solvers are more suitable for dense systems with preconditioning.

The second class is the seed method. The main idea of these methods is as follows. We first select a single system (the seed system) and build the corresponding Krylov subspace. We then project the residuals of all the other systems onto this Krylov subspace to obtain new approximations, which serve as initial guesses; see [2, 12, 13] for details.

The last class is the global method. To our knowledge, the term global dates back at least to Saad [14, Chapter 10] and has been further popularized by Jbilou et al.  with the global FOM and global GMRES methods for matrix equations. Following the work , many other global methods have been developed, including, to name just a few, the global BiCG and global BiCGSTAB methods [16, 17], the global Hessenberg and global CMRH (changing minimal residual method based on the Hessenberg process) methods  and their weighted variants , the skew-symmetric methods , and the global SCD method . Generally, the global methods are more appropriate for large and sparse systems.

It is well known that the performance of the above Krylov subspace methods can be improved with a suitable preconditioner  or through effective matrix splitting techniques [22, 23]. In this paper, we are interested in preconditioning the global methods. Specifically, we aim at improving the convergence behavior of the restarted global CMRH method (Gl-CMRH(m)) , which was originally proposed to limit the growing storage requirement of the full version. However, because a small subspace is used (say $m \ll n$), Gl-CMRH(m) is likely to slow down or even stall. Heyouni and Essai give a weighted version of Gl-CMRH(m) (WGl-CMRH(m))  to alleviate this disadvantage. In this paper we propose a different approach, namely, improving Gl-CMRH(m) by means of a polynomial preconditioner.

The remainder of this work is organized as follows. In Section 2, we first recall some notation and properties of the global method, and then we sketch the Gl-CMRH(m) method. In Section 3, we construct the polynomial preconditioner tailored to Gl-CMRH(m) by exploiting the relation between the Krylov matrix and the global basis. For square right-hand side matrices, we also give a theoretical result that justifies the use of the proposed polynomial preconditioner. In Section 4, several numerical examples are employed to substantiate the effectiveness of the proposed method. Some concluding remarks and potential future work are given in the last section.

2. Notations and the Global CMRH Method

In this section, we first give some notations and properties used in the global methods, which will henceforth be adopted extensively in deriving the main results. Then we present a brief introduction of the Gl-CMRH(m) method . More details about the global methods can be found, for instance, in [15, 18, 19].

2.1. Notations and Properties

Throughout this paper, the following notation is used. The norms $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the vector 2-norm and the matrix Frobenius norm, respectively. Let $\mathbb{M}$ be the set of $n \times s$ rectangular matrices. If $X \in \mathbb{M}$, then $X^T$ stands for its transpose. For a square matrix $A$, $A^{-1}$ denotes the inverse of $A$ if it exists. Unless otherwise stated, subscripts denote the corresponding iteration step; for example, $X_k$ denotes the $k$th iterate of the matrix (vector) $X$. Moreover, the $(i,j)$ entries of matrices $Y$ and $X_k$ are denoted by $(Y)_{i,j}$ and $(X_k)_{i,j}$, respectively. A column or a row of a matrix is written in a dot format; that is, $(X_k)_{\cdot,j}$ and $(X_k)_{i,\cdot}$ denote the $j$th column and the $i$th row of $X_k$, respectively. Finally, $(X_k)_{i:j,\,s:t}$ extracts the submatrix of $X_k$ formed by rows $i$ to $j$ and columns $s$ to $t$.

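Next we present some notation and basic properties used in the global methods . Given the $n \times ms$ block matrix $\mathcal{V}_m = [V_1, V_2, \ldots, V_m]$, where $V_i \in \mathbb{M}$, $i = 1, 2, \ldots, m$, we define the matrix product $*$ as

$$\mathcal{V}_m * f = \sum_{i=1}^{m} (f)_i V_i, \tag{2}$$

where $f \in \mathbb{R}^m$. For any matrix $H \in \mathbb{R}^{m \times m}$, we define analogously the $*$ product by

$$\mathcal{V}_m * H = [\mathcal{V}_m * (H)_{\cdot,1},\ \mathcal{V}_m * (H)_{\cdot,2},\ \ldots,\ \mathcal{V}_m * (H)_{\cdot,m}]. \tag{3}$$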

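It can be verified that this matrix product satisfies the following properties:

$$\mathcal{V}_m * (f + g) = (\mathcal{V}_m * f) + (\mathcal{V}_m * g), \qquad (\mathcal{V}_m * H) * f = \mathcal{V}_m * (Hf), \tag{4}$$

where $f, g \in \mathbb{R}^m$.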
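
The product in (2) and (3) and the two properties in (4) can be checked numerically. The following is a minimal NumPy sketch, not taken from the paper: the helper names `star_vec` and `star_mat`, the storage of the block matrix as a Python list of $n \times s$ arrays, and the random test data are all assumptions made for illustration.

```python
import numpy as np

def star_vec(V_blocks, f):
    """Global product V_m * f = sum_i f[i] * V_i, as in (2)."""
    return sum(fi * Vi for fi, Vi in zip(f, V_blocks))

def star_mat(V_blocks, H):
    """Global product V_m * H as in (3): one n-by-s block per column of H."""
    return [star_vec(V_blocks, H[:, j]) for j in range(H.shape[1])]

# Hypothetical small test data (sizes chosen only for illustration).
rng = np.random.default_rng(0)
n, s, m = 6, 2, 3
V = [rng.standard_normal((n, s)) for _ in range(m)]
f, g = rng.standard_normal(m), rng.standard_normal(m)
H = rng.standard_normal((m, m))

# First property in (4): linearity in the coefficient vector.
lhs_add = star_vec(V, f + g)
rhs_add = star_vec(V, f) + star_vec(V, g)

# Second property in (4): (V_m * H) * f = V_m * (H f).
lhs_assoc = star_vec(star_mat(V, H), f)
rhs_assoc = star_vec(V, H @ f)
```

Storing the blocks in a list keeps the code close to the notation $\mathcal{V}_m = [V_1, \ldots, V_m]$; a single $n \times ms$ array with column slicing would work equally well.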

2.2. The Global CMRH Method

The Gl-CMRH method  is an efficient extension of the CMRH method  for solving (1). It is based on the global Hessenberg process . As for the numerical performance, Gl-CMRH is in general competitive with the classic global GMRES method (Gl-GMRES) .

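Now we give a brief sketch of Gl-CMRH. Let $X_0 \in \mathbb{M}$ be the initial guess for (1), with associated residual matrix $R_0 = B - AX_0$. The $m$th iterate $X_m$ is sought in the affine subspace $X_0 + \mathcal{K}_m(A, R_0)$; that is, $X_m - X_0 = W_m \in \mathcal{K}_m(A, R_0)$, where $W_m$ is the $m$th correction matrix. The matrix Krylov subspace is defined as $\mathcal{K}_m(A, R_0) = \operatorname{span}\{R_0, AR_0, \ldots, A^{m-1}R_0\}$. Using the basis $\mathcal{V}_m$ of $\mathcal{K}_m(A, R_0)$ given by the global Hessenberg process , we get

$$X_m = X_0 + \mathcal{V}_m * y_m, \tag{5}$$

where $y_m \in \mathbb{R}^m$. The global Hessenberg process in  also yields

$$A\mathcal{V}_m = \mathcal{V}_{m+1} * \bar{H}_m = \mathcal{V}_m * H_m + (\bar{H}_m)_{m+1,m}\,[0, 0, \ldots, 0, V_{m+1}], \tag{6}$$

where $\bar{H}_m$ is an $(m+1) \times m$ upper Hessenberg matrix, $H_m$ is obtained by deleting the last row of $\bar{H}_m$, and $0$ is the zero matrix in $\mathbb{M}$. Thus it follows immediately that

$$R_m = R_0 - A\mathcal{V}_m * y_m = \mathcal{V}_{m+1} * (\beta e_1^{(m+1)} - \bar{H}_m y_m), \tag{7}$$

where $\beta = \max_{1 \le i \le n,\, 1 \le j \le s} \{|(R_0)_{i,j}|\}$ and $e_1^{(m+1)} = [1, 0, \ldots, 0]^T \in \mathbb{R}^{m+1}$. To obtain the vector $y_m$, a restriction is imposed on the Gl-CMRH method; that is,

$$\|R_m\|_F = \min_{W \in \mathcal{K}_m(A, R_0)} \|R_0 - AW\|_F. \tag{8}$$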

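Relations (7) and (8) yield

$$y_m = \arg\min_{y \in \mathbb{R}^m} \|\mathcal{V}_{m+1} * (\beta e_1^{(m+1)} - \bar{H}_m y)\|_F. \tag{9}$$

Instead of solving (9), which requires $\mathcal{O}(nm^2)$ operations and $\mathcal{O}(nm)$ storage, we solve the smaller problem

$$\min_{y \in \mathbb{R}^m} \|\beta e_1^{(m+1)} - \bar{H}_m y\|_2, \tag{10}$$

which leads to $y_m = \beta(\bar{H}_m^T \bar{H}_m)^{-1}\bar{H}_m^T e_1^{(m+1)}$, assuming that $\bar{H}_m$ has full rank. From (5) and (10), the $m$th iterate $X_m$ can be updated by

$$X_m = X_0 + \beta\,\mathcal{V}_m * \big((\bar{H}_m^T \bar{H}_m)^{-1}\bar{H}_m^T e_1^{(m+1)}\big). \tag{11}$$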
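
As an illustration of the small problem (10), the sketch below solves it with a standard least-squares routine and compares the result with the normal-equations formula $y_m = \beta(\bar{H}_m^T\bar{H}_m)^{-1}\bar{H}_m^T e_1^{(m+1)}$. The function name `solve_small_lsq`, the random upper Hessenberg matrix, and the value of `beta` are assumptions for demonstration only; in the actual method $\bar{H}_m$ and $\beta$ come from the global Hessenberg process.

```python
import numpy as np

def solve_small_lsq(H_bar, beta):
    """Solve min_y || beta*e1 - H_bar @ y ||_2 for an (m+1)-by-m matrix H_bar."""
    rhs = np.zeros(H_bar.shape[0])
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H_bar, rhs, rcond=None)
    return y

# Hypothetical data: a random (m+1)-by-m upper Hessenberg matrix.
rng = np.random.default_rng(1)
m = 4
H_bar = np.triu(rng.standard_normal((m + 1, m)), k=-1)
beta = 2.5

y_lstsq = solve_small_lsq(H_bar, beta)

# Normal-equations formula from the text (valid when H_bar has full rank).
e1 = np.zeros(m + 1)
e1[0] = 1.0
y_normal = beta * np.linalg.solve(H_bar.T @ H_bar, H_bar.T @ e1)
```

In practice an orthogonal factorization, as used inside `np.linalg.lstsq`, is usually preferred over forming $\bar{H}_m^T\bar{H}_m$ explicitly, since the latter squares the condition number.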

As in the Gl-GMRES method , a restarting strategy is used to address the problem that the computational and storage requirements increase with the iterations. Algorithm 1 gives a framework of the restarted version of Gl-CMRH (Gl-CMRH(m)). We refer to [18, 19] for a detailed explanation of the Gl-CMRH method.

Algorithm 1: The Gl-CMRH($m$) method for $AX = B$ [18].

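(1) Choose an initial guess $X_0$, the restarting frequency $m$, and the tolerance tol. Compute $R_0 = B - AX_0$.

Determine $i_0, j_0$ such that $|(R_0)_{i_0,j_0}| = \max_{1 \le i \le n,\, 1 \le j \le s} \{|(R_0)_{i,j}|\}$. Set $\beta = |(R_0)_{i_0,j_0}|$, $V_1 = R_0/\beta$, $p_{1,1} = i_0$, and $p_{1,2} = j_0$.

(2) Construct the matrix basis $\mathcal{V}_m = [V_1, V_2, \ldots, V_m]$ and $\bar{H}_m$ by the global Hessenberg process .

(3) Solve for $y_m$ from (10) and update $X_m$ by (5).

(4) Compute $R_m = B - AX_m$. If $\|R_m\|_F/\|R_0\|_F \le \text{tol}$, then stop; otherwise set $X_0 = X_m$, $R_0 = R_m$.

Choose $i_0, j_0$ such that $|(R_0)_{i_0,j_0}| = \max_{1 \le i \le n,\, 1 \le j \le s} \{|(R_0)_{i,j}|\}$.

Set $\beta = |(R_0)_{i_0,j_0}|$, $V_1 = R_0/\beta$, $p_{1,1} = i_0$, and $p_{1,2} = j_0$. Go to Step 2.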
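
The restart loop of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading rather than the authors' implementation. The assumptions are: the global Hessenberg process is written out with the pivoting steps that appear in Algorithm 2; $\beta$ is taken as the signed entry of largest modulus, as in Algorithm 2; the relative residual is measured against the very first residual; breakdown is not handled; and the function names and the small dense test matrix are hypothetical.

```python
import numpy as np

def global_hessenberg(A, R0, m):
    """Global Hessenberg process with pivoting: blocks V_1..V_{m+1}, H_bar, beta."""
    i0, j0 = np.unravel_index(np.argmax(np.abs(R0)), R0.shape)
    beta = R0[i0, j0]                      # signed entry of largest modulus
    V, pivots = [R0 / beta], [(i0, j0)]
    H_bar = np.zeros((m + 1, m))
    for k in range(m):
        M = A @ V[k]
        for j in range(k + 1):
            H_bar[j, k] = M[pivots[j]]     # entry of M at the j-th pivot position
            M = M - H_bar[j, k] * V[j]
        i0, j0 = np.unravel_index(np.argmax(np.abs(M)), M.shape)
        H_bar[k + 1, k] = M[i0, j0]        # breakdown (M == 0) is not handled here
        V.append(M / H_bar[k + 1, k])
        pivots.append((i0, j0))
    return V, H_bar, beta

def gl_cmrh_restarted(A, B, m, tol=1e-10, max_restarts=200):
    """Restarted Gl-CMRH(m) with zero initial guess; returns (X, number of restarts)."""
    X = np.zeros_like(B, dtype=float)
    R = B - A @ X
    norm_r0 = np.linalg.norm(R)            # Frobenius norm for 2-D arrays
    for restart in range(1, max_restarts + 1):
        V, H_bar, beta = global_hessenberg(A, R, m)
        rhs = np.zeros(m + 1)
        rhs[0] = beta
        y = np.linalg.lstsq(H_bar, rhs, rcond=None)[0]   # small problem (10)
        X = X + sum(yi * Vi for yi, Vi in zip(y, V))     # update (5)
        R = B - A @ X
        if np.linalg.norm(R) <= tol * norm_r0:
            break
    return X, restart

# Hypothetical well-conditioned test problem (not one of the paper's examples).
rng = np.random.default_rng(0)
n, s = 40, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.random((n, s))
X, restarts = gl_cmrh_restarted(A, B, m=5)
rel_res = np.linalg.norm(B - A @ X) / np.linalg.norm(B)
```

The pivot bookkeeping replaces the inner products of an Arnoldi-type process with single-entry reads, which is the source of the lower cost per step.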

3. A New Polynomial Preconditioned Gl-CMRH($m$) Method

In many cases, the accuracy of Gl-CMRH(m) is sufficient. Due to the limited dimension of the matrix Krylov subspace $\mathcal{K}_m$, however, Gl-CMRH(m) may suffer from slow convergence or even stagnation in practice, just like GMRES(m)  and Gl-GMRES(m) . To remedy this drawback, some acceleration technique is needed, for instance, the weighting strategy exploited in . Alternatively, a polynomial preconditioner can be adopted to improve the convergence . In this section we focus on constructing an efficient polynomial preconditioner tailored to the Gl-CMRH(m) method.

The essence of the polynomial preconditioned method is to devise a polynomial $Q(A) \approx A^{-1}$ such that the easier system

$$Q(A)AX = Q(A)B \tag{12}$$

is solved instead of the original system (1). In what follows, we obtain a polynomial preconditioner $Q(x)$ by extracting some useful information from Gl-CMRH(m).

Now suppose that the block Krylov matrix $K_k$ is of the form

$$K_k = [R_0, AR_0, \ldots, A^{k-1}R_0], \tag{13}$$

where $R_0$ is the initial residual matrix. By comparing the last $s$ columns of the second equation in (6), we have

$$AV_k = \mathcal{V}_k * (H_k)_{\cdot,k} + (\bar{H}_k)_{k+1,k}V_{k+1}. \tag{14}$$

The equality (14) can be rearranged as

$$V_{k+1} = \big((\bar{H}_k)_{k+1,k}\big)^{-1}\big(AV_k - \mathcal{V}_k * (H_k)_{\cdot,k}\big). \tag{15}$$

Let us consider the relationship between $K_k$ and the basis $\mathcal{V}_k$. Since $\mathcal{V}_k$ and $K_k$ span the same space, it follows that

$$\mathcal{V}_k = K_k * U_k, \tag{16}$$

where $U_k$ is an upper triangular matrix. The relation (16) by itself, however, is of limited use because it does not say how to compute $U_k$. Fortunately, an explicit recurrence for $U_k$ can be derived in terms of $U_{k-1}$ and $H_{k-1}$. By combining (16) and (4), we get

$$\mathcal{V}_k * (H_k)_{\cdot,k} = K_k * \big(U_k (H_k)_{\cdot,k}\big) = [K_k, A^kR_0] * \begin{pmatrix} U_k (H_k)_{\cdot,k} \\ 0 \end{pmatrix}. \tag{17}$$

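Since $V_k = (\mathcal{V}_k)_{\cdot,(k-1)s+1:ks} = K_k * (U_k)_{\cdot,k}$, we have

$$AV_k = [AR_0, A^2R_0, \ldots, A^kR_0] * (U_k)_{\cdot,k} = K_{k+1} * \begin{pmatrix} 0 \\ (U_k)_{\cdot,k} \end{pmatrix}. \tag{18}$$

Substituting (17) and (18) into (15) gives

$$V_{k+1} = \big((\bar{H}_k)_{k+1,k}\big)^{-1} K_{k+1} * \left( \begin{pmatrix} 0 \\ (U_k)_{\cdot,k} \end{pmatrix} - \begin{pmatrix} U_k (H_k)_{\cdot,k} \\ 0 \end{pmatrix} \right). \tag{19}$$

Besides, the relation (16) gives $V_{k+1} = K_{k+1} * (U_{k+1})_{\cdot,k+1}$. By combining it with (19), we obtain a recurrence for the $(k+1)$st column of $U_{k+1}$; that is,

$$(U_{k+1})_{\cdot,k+1} = \big((\bar{H}_k)_{k+1,k}\big)^{-1} \left( \begin{pmatrix} 0 \\ (U_k)_{\cdot,k} \end{pmatrix} - \begin{pmatrix} U_k (H_k)_{\cdot,k} \\ 0 \end{pmatrix} \right). \tag{20}$$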
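
The recurrence (20) can be checked numerically by running the global Hessenberg process, updating $U$ column by column, and rebuilding every basis block from the Krylov blocks as in (16). The sketch below does this under several assumptions: the starting value $U_{1,1} = 1/\beta$ follows from $V_1 = R_0/\beta = K_1 * U_1$; the function name `global_hessenberg_with_U`, the zero-based indexing, and the random test data are illustrative; breakdown is not handled.

```python
import numpy as np

def global_hessenberg_with_U(A, R0, k_max):
    """Global Hessenberg process that also tracks the triangular matrix U of (16)."""
    i0, j0 = np.unravel_index(np.argmax(np.abs(R0)), R0.shape)
    beta = R0[i0, j0]
    V, pivots = [R0 / beta], [(i0, j0)]
    H_bar = np.zeros((k_max + 1, k_max))
    U = np.zeros((k_max + 1, k_max + 1))
    U[0, 0] = 1.0 / beta                   # V_1 = R_0 / beta = K_1 * U_1
    for k in range(k_max):
        M = A @ V[k]
        for j in range(k + 1):
            H_bar[j, k] = M[pivots[j]]
            M = M - H_bar[j, k] * V[j]
        i0, j0 = np.unravel_index(np.argmax(np.abs(M)), M.shape)
        H_bar[k + 1, k] = M[i0, j0]        # breakdown (M == 0) is not handled here
        V.append(M / H_bar[k + 1, k])
        pivots.append((i0, j0))
        # Recurrence (20): shifted last column minus U_k times the new H column.
        col = np.zeros(k + 2)
        col[1:] = U[:k + 1, k]
        col[:k + 1] -= U[:k + 1, :k + 1] @ H_bar[:k + 1, k]
        U[:k + 2, k + 1] = col / H_bar[k + 1, k]
    return V, H_bar, U

# Hypothetical small test data.
rng = np.random.default_rng(0)
n, s, k_max = 8, 2, 4
A = rng.standard_normal((n, n)) / np.sqrt(n)
R0 = rng.standard_normal((n, s))
V, H_bar, U = global_hessenberg_with_U(A, R0, k_max)

# Krylov blocks R0, A R0, ..., A^k_max R0 as in (13).
K = [R0]
for _ in range(k_max):
    K.append(A @ K[-1])

# Relation (16): each V_j is the combination of Krylov blocks given by column j of U.
V_from_K = [sum(U[i, j] * K[i] for i in range(j + 1)) for j in range(k_max + 1)]
max_err = max(np.abs(Vj - Wj).max() for Vj, Wj in zip(V, V_from_K))
```

For larger degrees the monomial Krylov blocks become increasingly ill conditioned, so this direct check is only meaningful for small `k_max`.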

Therefore, $U_k$ in (16) can be updated recursively by (20). Recall that in (5) $X_k$ is updated using the basis $\mathcal{V}_k$. Here we show another way to update $X_k$, based on the block Krylov matrix $K_k$. It follows from (5), (16), and (4) that

$$X_k = X_0 + (K_k * U_k) * y_k = X_0 + K_k * (U_k y_k) = X_0 + \sum_{i=0}^{k-1} \alpha_i A^i R_0, \tag{21}$$

where $U_k y_k = [\alpha_0, \alpha_1, \ldots, \alpha_{k-1}]^T$ and $y_k$ is solved from (10). Denote by $Q_{k-1}$ a polynomial in $A$ of degree $k-1$; that is, $Q_{k-1}(A) = \sum_{i=0}^{k-1} \alpha_i A^i$. Hence (21) can be recast as

$$X_k = X_0 + Q_{k-1}(A)R_0. \tag{22}$$
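
Once the coefficients $\alpha_0, \ldots, \alpha_{k-1}$ are known, the product $Q_{k-1}(A)R_0$ in (22) can be formed with $k-1$ multiplications by $A$ using Horner's rule, without ever building $Q_{k-1}(A)$ as a matrix. The following sketch illustrates this; the function name `apply_polynomial` and the sample coefficients are hypothetical, and the explicit evaluation with `np.linalg.matrix_power` is included only as a cross-check.

```python
import numpy as np

def apply_polynomial(A, alpha, Y):
    """Return Q(A) @ Y with Q(x) = sum_i alpha[i] * x**i, using Horner's rule."""
    result = alpha[-1] * Y
    for a in alpha[-2::-1]:                # remaining coefficients, highest first
        result = A @ result + a * Y
    return result

# Hypothetical data: coefficients alpha are arbitrary here.
rng = np.random.default_rng(3)
n, s = 7, 2
A = rng.standard_normal((n, n)) / np.sqrt(n)
R0 = rng.standard_normal((n, s))
X0 = np.zeros((n, s))
alpha = np.array([1.0, -0.5, 0.25])

Xk = X0 + apply_polynomial(A, alpha, R0)   # update (22)

# Cross-check against the explicit sum in (21).
Xk_explicit = X0 + sum(a * np.linalg.matrix_power(A, i) @ R0
                       for i, a in enumerate(alpha))
```

Because only products of $A$ with $n \times s$ blocks are needed, the same routine works unchanged when $A$ is stored in a sparse format that supports the `@` operator.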

The matrix polynomial $Q_{k-1}(A)$ in (22) can be regarded as an approximation to $A^{-1}$ in some sense. This is justified for the case $n = s$ in (1) by the following result.

Theorem 1.

Let $Q_{k-1}(A)$, $X_0$, and $R_0 = B - AX_0$ be the square matrices defined in (22), and let $X^*$ be the true solution of (1). Suppose that $X^* - X_0$ is nonsingular. Then one has

$$\|I - Q_{k-1}(A)A\|_F \le \|E_k\|_F \|E_0^{-1}\|_F, \tag{23}$$

where $E_k := X^* - X_k$ and $E_0 := X^* - X_0$.

Proof.

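Since $R_0 = AX^* - AX_0 = AE_0$, relation (22) gives $E_k = E_0 - Q_{k-1}(A)AE_0 = (I - Q_{k-1}(A)A)E_0$. Hence $I - Q_{k-1}(A)A = E_kE_0^{-1}$, and the inequality (23) follows from the submultiplicativity of the Frobenius norm.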
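
Rearranging (22) with $R_0 = AE_0$ gives the identity $I - Q_{k-1}(A)A = E_kE_0^{-1}$, which is what the bound (23) rests on. The sketch below checks both the identity and the bound on a small square example ($n = s$). Everything specific in it is an assumption made for illustration: the test matrix, the random "true" solution, and the coefficients `alpha`, which are simply the Taylor coefficients of $1/x$ about $x = 4$ rather than coefficients produced by Gl-CMRH.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
X_star = rng.standard_normal((n, n))       # "true" solution, square case n = s
B = A @ X_star
X0 = np.zeros((n, n))
R0 = B - A @ X0
E0 = X_star - X0                           # assumed nonsingular

# Illustrative polynomial Q(x) = 3/4 - 3x/16 + x^2/64 (Taylor expansion of 1/x at 4).
alpha = [0.75, -0.1875, 0.015625]
Q = sum(a * np.linalg.matrix_power(A, i) for i, a in enumerate(alpha))

Xk = X0 + Q @ R0                           # relation (22)
Ek = X_star - Xk

identity_lhs = np.eye(n) - Q @ A
identity_rhs = Ek @ np.linalg.inv(E0)

bound_lhs = np.linalg.norm(identity_lhs)   # Frobenius norm
bound_rhs = np.linalg.norm(Ek) * np.linalg.norm(np.linalg.inv(E0))
```

The identity holds for any polynomial, so the quality of $Q_{k-1}(A)$ as an approximate inverse is governed entirely by how small the error $E_k$ is relative to $E_0$.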

Remark 2.

In (23), the term $\|E_k\|_F$ becomes smaller as $k$ grows, and hence the upper bound diminishes correspondingly, which in turn implies that $Q_{k-1}(A)$ approximates $A^{-1}$ asymptotically. This justifies the use of the polynomial preconditioner. In general, (23) assures that the number of restarts will be reduced with increasing $k$. Yet this does not necessarily mean that the CPU time will be reduced as well, since the time saved by the reduction of restarts may be offset by the extra time spent in constructing the polynomial. In practice, we are often more concerned with the CPU time than with the number of restarts. Therefore, we restrict ourselves to small values of $k$. For $s < n$, an inequality similar to (23) is generally unavailable. Nevertheless, numerical examples seem to demonstrate that the asymptotic property of (23) is also shared by the case $s < n$; see Example 3 in Section 4 for more discussion.

Putting everything together, we propose the new polynomial preconditioned global CMRH method (PGl-CMRH(m,deg)), which is shown in Algorithm 2.

Algorithm 2: PGl-CMRH($m$, deg) method for $AX = B$.

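(1) Input: $m$, deg, and $X_0$.

(2) Phase I: Compute $R_0 = B - AX_0$. Determine $i_0, j_0$ such that $|(R_0)_{i_0,j_0}| = \max_{1 \le i \le n,\, 1 \le j \le s} \{|(R_0)_{i,j}|\}$.

Set $\beta = (R_0)_{i_0,j_0}$, $V_1 = R_0/\beta$, $p_{1,1} = i_0$, and $p_{1,2} = j_0$. Let $U_{1,1} = 1/\beta$ (the first entry of the upper triangular matrix defined in (16), so that $V_1 = K_1 * U_1$).

(3) for $k = 1 : \text{deg}$

(4)  $M = AV_k$

(5)  for $j = 1 : k$

(6)   $(\bar{H})_{j,k} = (M)_{p_{j,1},p_{j,2}}$

(7)   $M = M - (\bar{H})_{j,k}V_j$

(8)  end for

(9)  Determine $i_0, j_0$ such that $|(M)_{i_0,j_0}| = \max_{1 \le i \le n,\, 1 \le j \le s} \{|(M)_{i,j}|\}$.

(10)  Set $(\bar{H})_{k+1,k} = (M)_{i_0,j_0}$, $V_{k+1} = M/(\bar{H})_{k+1,k}$, $p_{k+1,1} = i_0$, and $p_{k+1,2} = j_0$.

(11)  Compute $(U_{k+1})_{\cdot,k+1}$ from (20).

(12) end for

(13) $X_{\text{deg}} = X_0 + \mathcal{V}_{\text{deg}} * y_{\text{deg}}$, where $y_{\text{deg}} = \arg\min_{y \in \mathbb{R}^{\text{deg}}} \|\beta e_1^{(\text{deg}+1)} - \bar{H}_{\text{deg}}\,y\|_2$.

(14) Construct the polynomial preconditioner $Q_{\text{deg}-1}(A)$, whose coefficients are the entries of $U_{\text{deg}}\,y_{\text{deg}}$.

(15) Phase II: Solve $Q_{\text{deg}-1}(A)AX = Q_{\text{deg}-1}(A)B$ using Algorithm 1.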
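
The two phases of Algorithm 2 can be sketched end to end in NumPy. This is a simplified illustration under stated assumptions, not the authors' MATLAB code: $X_0 = 0$; the operator is passed as a callable so that Phase II can reuse the same restart loop; $U_{1,1} = 1/\beta$ is used so that (16) holds for $k = 1$; the Phase II stopping test is applied to the preconditioned residual; breakdown is not handled; and all function names and the small dense test matrix are hypothetical.

```python
import numpy as np

def hessenberg_cycle(op, R0, k_max):
    """Global Hessenberg process with pivoting; also tracks U of (16) via (20)."""
    i0, j0 = np.unravel_index(np.argmax(np.abs(R0)), R0.shape)
    beta = R0[i0, j0]
    V, piv = [R0 / beta], [(i0, j0)]
    H = np.zeros((k_max + 1, k_max))
    U = np.zeros((k_max + 1, k_max + 1))
    U[0, 0] = 1.0 / beta
    for k in range(k_max):
        M = op(V[k])
        for j in range(k + 1):
            H[j, k] = M[piv[j]]
            M = M - H[j, k] * V[j]
        i0, j0 = np.unravel_index(np.argmax(np.abs(M)), M.shape)
        H[k + 1, k] = M[i0, j0]            # breakdown (M == 0) is not handled here
        V.append(M / H[k + 1, k])
        piv.append((i0, j0))
        col = np.zeros(k + 2)
        col[1:] = U[:k + 1, k]
        col[:k + 1] -= U[:k + 1, :k + 1] @ H[:k + 1, k]
        U[:k + 2, k + 1] = col / H[k + 1, k]
    return V, H, beta, U

def small_lsq(H, beta):
    """Solve the small least-squares problem (10)."""
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta
    return np.linalg.lstsq(H, rhs, rcond=None)[0]

def horner(A, alpha, Y):
    """Apply Q(A) to Y, where Q has coefficients alpha (lowest degree first)."""
    out = alpha[-1] * Y
    for a in alpha[-2::-1]:
        out = A @ out + a * Y
    return out

def gl_cmrh(op, B, m, tol=1e-10, max_restarts=200):
    """Restarted Gl-CMRH(m) for op(X) = B with zero initial guess."""
    X, R = np.zeros_like(B), B.copy()
    norm_r0 = np.linalg.norm(R)
    for restart in range(1, max_restarts + 1):
        V, H, beta, _ = hessenberg_cycle(op, R, m)
        y = small_lsq(H, beta)
        X = X + sum(yi * Vi for yi, Vi in zip(y, V))
        R = B - op(X)
        if np.linalg.norm(R) <= tol * norm_r0:
            break
    return X, restart

def pgl_cmrh(A, B, m, deg, tol=1e-10):
    # Phase I: one cycle of length deg yields the coefficients U_deg @ y_deg.
    _, H, beta, U = hessenberg_cycle(lambda Y: A @ Y, B, deg)
    alpha = U[:deg, :deg] @ small_lsq(H, beta)
    # Phase II: solve Q(A) A X = Q(A) B with the restarted method.
    X, restarts = gl_cmrh(lambda Y: horner(A, alpha, A @ Y), horner(A, alpha, B), m, tol)
    return X, restarts, alpha

# Hypothetical well-conditioned test problem (not one of the paper's examples).
rng = np.random.default_rng(0)
n, s = 40, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.random((n, s))
X_plain, restarts_plain = gl_cmrh(lambda Y: A @ Y, B, m=5)
X_pre, restarts_pre, alpha = pgl_cmrh(A, B, m=5, deg=5)
rel_res_plain = np.linalg.norm(B - A @ X_plain) / np.linalg.norm(B)
rel_res_pre = np.linalg.norm(B - A @ X_pre) / np.linalg.norm(B)
```

Each application of the preconditioned operator costs `deg` products with $A$, which is the extra work per iteration that Remark 2 weighs against the saving in restarts.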

4. Numerical Examples

In this section, we present some numerical experiments coded in MATLAB 7.8.0. For fair comparison, the other global solvers mentioned earlier, namely Gl-CMRH(m) , WGl-CMRH(m) , and Gl-GMRES(m) , have also been implemented. From now on we drop the parameters $m$ and deg in brackets when no ambiguity arises. In all examples, we take $X_0 = 0$. The iteration is terminated at the $k$th step once $\|R_k\|_F/\|R_0\|_F \le \text{tol} = 10^{-10}$. Though other alternatives are possible, we use $D = ns\,|(R_0)_{i,j}|/\|R_0\|_F$ as the weighting matrix for WGl-CMRH, which is also preferred in . The coefficient matrices $A$ in the first two examples are derived from discretizations of the Poisson equation and of convection-diffusion problems, which occur frequently in applied science and engineering. The coefficient matrices $A$ in the third example are taken from the Matrix Market .

Example 1.

We consider the linear systems (1) in which the coefficient matrix $A$ is obtained from the discretization of

$$\Delta u = u_{xx} + u_{yy} \tag{24}$$

on the unit square $(0,1) \times (0,1)$ with $u = 0$ on the boundary. The operator is discretized by the centered difference scheme at the grid points $(x_i, y_j)$ with $x_i = ih$, $y_j = jh$ for $i, j = 0, \ldots, N+1$, where the mesh size is $h = 1/(N+1)$. This yields a block tridiagonal matrix of size $n = N^2$. The right-hand side matrix $B$ is chosen with entries uniformly distributed on $[0,1]$; see [14, Chapter 2] for more details about (24). The parameters are $s = 2$, $m = 20$, and deg $= 5$. The number of restarts and the CPU time for matrices $A$ of different sizes are given in Table 1. As observed from Table 1, PGl-CMRH needs only 17.8% to 55.4% of the CPU time of the original Gl-CMRH method. Compared with WGl-CMRH, PGl-CMRH requires fewer restarts and less CPU time to achieve the required accuracy. Note that WGl-CMRH does not speed up the convergence of Gl-CMRH here, which indicates that a different weighting matrix should be used. Finding the optimal weighting matrix, however, remains an open problem .
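
A test matrix of this kind can be assembled from Kronecker products. The sketch below is a minimal illustration, assuming the standard five-point stencil for the negative Laplacian scaled by $h^2$ and a natural row-wise ordering of the unknowns; the sign, scaling, and ordering used in the paper's experiments are not stated, and the function name `poisson_2d` is hypothetical. A dense array is used only to keep the example short; a sparse format would be needed at the sizes in Table 1.

```python
import numpy as np

def poisson_2d(N):
    """Five-point finite-difference matrix on an N-by-N interior grid (size N^2)."""
    T = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 1-D second difference
    I = np.eye(N)
    return np.kron(I, T) + np.kron(T, I)

N = 4
A = poisson_2d(N)                          # n = N^2 = 16
rng = np.random.default_rng(0)
B = rng.random((N * N, 2))                 # s = 2 right-hand sides, uniform on [0, 1)
```

The resulting matrix has $N \times N$ blocks, with tridiagonal blocks on the diagonal and negative identity blocks next to it, which is the block tridiagonal structure mentioned above.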

Table 1: Number of restarts and CPU time (in brackets) for Example 1.

| n | Gl-GMRES | Gl-CMRH | WGl-CMRH | PGl-CMRH |
|---|---|---|---|---|
| 10,000 | 121 (18.0) | 85 (7.4) | 89 (8.4) | 24 (4.1) |
| 14,400 | 150 (34.6) | 85 (11.8) | 116 (17.3) | 23 (6.0) |
| 22,500 | 259 (133.4) | 165 (42.8) | 173 (53.5) | 37 (17.0) |
| 40,000 | 450 (585.8) | 255 (173.0) | 302 (235.2) | 26 (32.3) |
| 44,100 | 496 (699.9) | 322 (253.3) | 368 (313.6) | 39 (45.0) |
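
The range of time ratios quoted in Example 1 can be reproduced directly from the CPU times in Table 1. The short sketch below takes the Gl-CMRH and PGl-CMRH timings from the table and computes the PGl-CMRH time as a percentage of the Gl-CMRH time; the variable names are illustrative.

```python
# CPU times from Table 1, in row order (n = 10,000 up to n = 44,100).
cpu_gl_cmrh = [7.4, 11.8, 42.8, 173.0, 253.3]
cpu_pgl_cmrh = [4.1, 6.0, 17.0, 32.3, 45.0]

# PGl-CMRH time as a percentage of the Gl-CMRH time, per problem size.
ratios = [100.0 * p / g for p, g in zip(cpu_pgl_cmrh, cpu_gl_cmrh)]

smallest = round(min(ratios), 1)   # 17.8, attained for n = 44,100
largest = round(max(ratios), 1)    # 55.4, attained for n = 10,000
```

The ratio decreases steadily as $n$ grows, so on this example the benefit of the preconditioner is largest on the largest problems.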

Example 2.

Consider the linear systems (1) in which the coefficient matrix $A$ is obtained from the discretization of the three-dimensional convection-diffusion problem

$$\mathcal{T}u = -(u_{xx} + u_{yy} + u_{zz}) + q(u_x + u_y + u_z) \tag{25}$$

on the unit cube $\Omega = [0,1] \times [0,1] \times [0,1]$. Here $q$ is a constant coefficient and (25) is subject to Dirichlet-type boundary conditions. This equation can be discretized by seven-point finite differences. Specifically, we apply centered differences to the diffusive terms and first-order upwind approximations to the convective terms. This approach yields a coefficient matrix $A$ of size $n = N^3$, where the equidistant mesh size $h = 1/(N+1)$ is used and the unknowns are arranged in the natural lexicographic ordering; we refer to [22, Section 4] for more details. The right-hand side matrix $B$ is chosen with entries uniformly distributed on $[0,1]$. Here, $s = 2$, $m = 15$, and deg $= 5$. The number of restarts and the CPU time for $q = 0.1$ and $q = 1$ are given in Table 2. For this large problem, as expected, PGl-CMRH performs better than Gl-CMRH and the other variants in terms of CPU time.
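
A matrix of this type can again be built from one-dimensional operators and Kronecker products. The sketch below is one possible reading of the description above, under explicit assumptions: $q > 0$, so that the first-order upwind approximation of each convective term is a backward difference; the matrix is not rescaled by $h^2$; $x$ is the fastest-varying index in the lexicographic ordering; and a dense array is used only for brevity. The exact scaling and ordering in [22] may differ, and the function name `convection_diffusion_3d` is hypothetical.

```python
import numpy as np

def convection_diffusion_3d(N, q):
    """Seven-point discretization of -(u_xx+u_yy+u_zz) + q(u_x+u_y+u_z), size N^3."""
    h = 1.0 / (N + 1)
    I = np.eye(N)
    # Centered second difference for -d^2/dx^2.
    D2 = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
    # Backward (upwind for q > 0) first difference for d/dx.
    D1 = (np.eye(N) - np.eye(N, k=-1)) / h
    L = D2 + q * D1                        # 1-D convection-diffusion operator
    return (np.kron(np.kron(I, I), L)
            + np.kron(np.kron(I, L), I)
            + np.kron(np.kron(L, I), I))

N, q = 4, 1.0
h = 1.0 / (N + 1)
A = convection_diffusion_3d(N, q)          # n = N^3 = 64
```

The upwind term makes $A$ nonsymmetric for $q \neq 0$, which is why nonsymmetric solvers such as Gl-GMRES and Gl-CMRH are the relevant comparison here.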

Table 2: Number of restarts and CPU time (in brackets) for Example 2; q = 0.1 (top) and q = 1 (bottom).

q = 0.1

| n | Gl-GMRES | Gl-CMRH | WGl-CMRH | PGl-CMRH |
|---|---|---|---|---|
| 8,000 | 14 (1.2) | 11 (0.6) | 13 (0.7) | 2 (0.3) |
| 27,000 | 26 (8.4) | 23 (5.0) | 21 (4.9) | 5 (2.3) |
| 64,000 | 40 (40.7) | 32 (21.2) | 32 (22.8) | 7 (9.5) |
| 125,000 | 58 (147.2) | 41 (53.3) | 47 (74.6) | 9 (23.2) |
| 216,000 | 81 (298.8) | 58 (122.5) | 61 (160.7) | 17 (74.4) |

q = 1

| n | Gl-GMRES | Gl-CMRH | WGl-CMRH | PGl-CMRH |
|---|---|---|---|---|
| 8,000 | 14 (1.2) | 13 (0.6) | 14 (0.7) | 2 (0.3) |
| 27,000 | 25 (7.4) | 22 (4.6) | 22 (4.8) | 5 (2.4) |
| 64,000 | 39 (37.0) | 32 (20.6) | 34 (23.3) | 7 (9.9) |
| 125,000 | 57 (114.8) | 43 (53.6) | 48 (70.8) | 9 (23.3) |
| 216,000 | 79 (313.1) | 51 (114.7) | 61 (181.0) | 17 (76.2) |

Example 3.

In practice, the degree of the polynomial preconditioner $Q_{\text{deg}-1}$ has a great impact on the numerical performance of PGl-CMRH. It is therefore worth investigating how to choose the “optimal” degree (if it exists) for generic matrices. A theoretical analysis to this end, however, can be very hard. Instead, we show empirically how to choose a range of degrees for the polynomial $Q_{\text{deg}-1}$ such that PGl-CMRH yields at least a modest performance. To this end, we use ten nonsymmetric test matrices from  and illustrate how PGl-CMRH performs for each matrix with deg varying from 2 to 15; see Figure 1. Some properties of these test matrices are listed in Table 3. The right-hand side matrix $B$ is chosen with entries uniformly distributed on $[0,1]$.

Since we are only concerned with the value of deg that makes PGl-CMRH perform stably with the shortest CPU time, we have normalized the CPU times on each curve by dividing them by the maximum CPU time on that curve. Take the matrix pde2961 for example. The longest time is 4.2 seconds (with deg = 3); we therefore divide all CPU times for pde2961 by 4.2 and plot the result in Figure 1. This normalization facilitates the comparison since the different curves become more clustered.

Some remarks can be made from Figure 1. First, the curves seem rather problem-dependent and are not necessarily nonincreasing with increasing values of deg; for instance, the curve of rdb2048l is rather irregular and hence unpredictable. However, this does not contradict Theorem 1, which states that $Q_{\text{deg}-1}(A)$ approximates $A^{-1}$ better with growing values of deg. In other words, Theorem 1 explains theoretically that the total number of restarts will be reduced with increasing values of deg, but it says nothing about the change in CPU time. In fact, PGl-CMRH with a high-degree preconditioner is likely to spend more CPU time in generating the polynomial preconditioner (even with fewer restarts) and hence may take longer to converge than its low-degree counterpart.

Second, most curves attain their shortest CPU time for deg between 2 and 10. This is the first reason for favoring small values of deg. Finally, from the numerical point of view, more rounding errors can be introduced in building high-degree polynomial preconditioners. This is the second reason for preferring low-degree polynomial preconditioners. Therefore it is useful to test with deg from 2 to 10. Under extreme situations, however, a higher degree may be needed if a low-degree preconditioner fails to deliver the required accuracy.

Table 3: Properties of test matrices in Example 3.

| Matrix | n | nnz | Discipline |
|---|---|---|---|
| add32 | 4960 | 19848 | Electronic circuit design |
| cdde6 | 961 | 4681 | Computational fluid dynamics |
| fs680.1 | 680 | 2184 | Chemical kinetics |
| fidap001 | 216 | 4339 | Finite element modeling |
| gre115 | 115 | 421 | Simulation studies in computer systems |
| pde2961 | 2961 | 14585 | Partial differential equations |
| rdb200 | 200 | 1120 | Chemical engineering |
| rdb2048l | 2048 | 12032 | Chemical engineering |
| rdb3200l | 3200 | 18880 | Chemical engineering |
| sherman4 | 1104 | 3786 | Oil reservoir modeling |

Figure 1: Normalized CPU time against deg (from 2 to 15) for ten test matrices.

5. Conclusion

To remedy the slow convergence of the original Gl-CMRH(m) method, a new polynomial preconditioned variant of Gl-CMRH(m) for linear systems with multiple right-hand sides is developed. The proposed method often yields better performance than its predecessor Gl-CMRH(m) and other global variants in terms of CPU time. Our experiments suggest that polynomial preconditioners of degree lower than 10 should be considered if no prior knowledge is available.

Acknowledgments

The authors would like to thank Professor Jinyun Yuan and the referees for their valuable remarks that improved this paper. The work is supported by the National Natural Science Foundation (11371243), the Key Disciplines of Shanghai Municipality (S30104), the Innovation Program of Shanghai Municipal Education Commission (13ZZ068), and the Anhui Provincial Natural Science Foundation (1308085QF117).