The restarted global CMRH method (Gl-CMRH(m)) (Heyouni, 2001) is an attractive method for linear systems with multiple right-hand sides. However, Gl-CMRH(m) may converge slowly or even stagnate due to a limited Krylov subspace. To ameliorate this drawback, a polynomial preconditioned variant of Gl-CMRH(m) is presented. We give a theoretical result for the square case that assures that the number of restarts can be reduced with increasing values of the polynomial degree. Numerical experiments from real applications are used to validate the effectiveness of the proposed method.
1. Introduction
We consider the linear systems of the form
(1)AX=B,
where A is a nonsingular and sparse matrix of order n, B∈ℝn×s with usually s≤n. Such a situation arises from, for instance, wave-scattering problems, image restorations, recursive least squares computations, numerical methods for integral equations, and structural mechanics problems; see [1, 2] and references therein.
Numerical solvers for (1) can be roughly divided into two categories. The first is the direct method; for instance, LU factorization is effective because A is factorized only once to recast (1) into triangular systems that are easy to solve. However, if the coefficient matrix A is large and sparse, or sometimes not explicitly available, then iterative solvers may become the only choice; these broadly fall into the following three classes.
The first class is the block method. For symmetric problems, the first block methods are due to O'Leary, including the block conjugate gradient method and block biconjugate gradient method [3]. For nonsymmetric cases, the block generalized minimal residual method [4–6], the block BiCGSTAB method [7], the block quasiminimal residual method [8], the block least squares method [9], the block Lanczos method [10], and the block IDR(s) method [11] have been proposed recently. In general, the block solvers are more suitable for dense systems with preconditioning.
The second class is the seed method. The main idea of these methods is as follows. We first select a single system (the seed system) and develop the corresponding Krylov subspace. Then we project the residuals of all the other systems onto the same Krylov subspace to find new approximations as initial guesses; see [2, 12, 13] for details.
The last class is the global method. To our knowledge, the term global is at least due to Saad [14, Chapter 10] and was further popularized by Jbilou et al. [15] with the global FOM and global GMRES methods for matrix equations. Following the work [15], many other global methods have been developed, including, to name just a few, the global BiCG and global BiCGSTAB methods [16, 17], the global Hessenberg and global CMRH (changing minimal residual method based on the Hessenberg process) methods [18] and their weighted variants [19], the skew-symmetric methods [20], and the global SCD method [21]. Generally, the global methods are more appropriate for large and sparse systems.
It is well known that the performance of the above Krylov subspace methods can be reinforced with a suitable preconditioner [14] or through effective matrix splitting techniques [22, 23]. In this paper, we are interested in preconditioning the global methods. Specifically, we aim at improving the convergence behavior of the restarted global CMRH method (Gl-CMRH(m)) [18], which was originally proposed to reduce the increasing storage requirement of its full version. However, because of the use of a small subspace (say m≪n), Gl-CMRH(m) is likely to slow down or even stall. Heyouni and Essai give a weighted version of Gl-CMRH(m) (WGl-CMRH(m)) [19] to alleviate this disadvantage. In this paper we propose a different approach, namely, improving Gl-CMRH(m) with a polynomial preconditioner.
The remainder of this work is organized as follows. In Section 2, we first recall some notations and properties of the global method, and then we sketch the Gl-CMRH(m) method. In Section 3, we construct the polynomial preconditioner tailored to Gl-CMRH(m) by exploiting the relation between the Krylov matrix and the global basis. For square right-hand side matrices, we also give a theoretical result that justifies the use of the proposed polynomial preconditioner. In Section 4, several numerical examples are employed to substantiate the effectiveness of the proposed method. Some concluding remarks and potential future work are briefed in the last section.
2. Notations and the Global CMRH Method
In this section, we first give some notations and properties used in the global methods, which will henceforth be adopted extensively in deriving the main results. Then we present a brief introduction of the Gl-CMRH(m) method [18]. More details about the global methods can be found, for instance, in [15, 18, 19].
2.1. Notations and Properties
Throughout this paper, the following notations will be used. The norms ∥·∥2 and ∥·∥F represent the vector 2-norm and matrix F-norm, respectively. Let 𝕄 be the set of n×s rectangular matrices. If X∈𝕄, then XT stands for its transpose. For a square matrix A, A-1 indicates the inverse of A if it exists. Unless otherwise stated, subscripts denote the corresponding iteration step; for example, Xk denotes the kth iterate of the matrix (vector) X. Moreover, the (i,j) entries of matrices Y and Xk are denoted by (Y)i,j and (Xk)i,j, respectively. If a column or a row of a matrix is invoked, then we denote it in a dot format; that is, (Xk)·,j and (Xk)i,· mean correspondingly the jth column and the ith row of Xk. Finally, (Xk)i:j,s:t extracts the submatrix of Xk from rows i to j and columns s to t.
Next we present some notations and basic properties used in the global methods [15]. Given the n×ms block matrix 𝒱m=[V1,V2,…,Vm], where Vi∈𝕄, i=1,2,…,m, then we define the matrix product * as
(2)𝒱m*f=∑i=1m(f)iVi,
where f∈ℝm. For any matrix H∈ℝm×m, we define analogously the * product by
(3)𝒱m*H=[𝒱m*(H)·,1,𝒱m*(H)·,2,…,𝒱m*(H)·,m].
It can be verified that such matrix product satisfies the following properties:
(4)𝒱m*(f+g)=(𝒱m*f)+(𝒱m*g),(𝒱m*H)*f=𝒱m*(Hf),
where f,g∈ℝm.
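The * product is straightforward to implement by storing 𝒱m as a list of n×s blocks. A minimal numpy sketch (the function names `star` and `star_mat` are ours, not from [15]):

```python
import numpy as np

def star(V_blocks, f):
    """The * product of (2): V_m * f = sum_i (f)_i V_i, each V_i being n-by-s."""
    return sum(fi * Vi for fi, Vi in zip(f, V_blocks))

def star_mat(V_blocks, H):
    """The * product of (3): V_m * H, applied column by column."""
    return [star(V_blocks, H[:, j]) for j in range(H.shape[1])]
```

The properties (4) then follow directly from linearity and can be checked numerically on random blocks.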
2.2. The Global CMRH Method
The Gl-CMRH method [18] is an efficient extension of the CMRH method [24] for solving (1). It is based on the global Hessenberg process [18]. As for the numerical performance, Gl-CMRH is in general competitive with the classic global GMRES method (Gl-GMRES) [15].
Now we give a brief sketch of Gl-CMRH. Let X0∈𝕄 be the initial guess of (1) with the associated residual matrix R0=B-AX0. The mth iterate Xm is sought in the affine subspace X0+𝒦m(A,R0); that is, Xm-X0=Wm∈𝒦m(A,R0), where Wm is the mth correction matrix. The matrix Krylov subspace is defined as 𝒦m(A,R0)=span{R0,AR0,…,Am-1R0}. Using the basis 𝒱m of 𝒦m(A,R0) given by the global Hessenberg process [18], we get
(5)Xm=X0+𝒱m*ym,
where ym∈ℝm. The global Hessenberg process in [18] also yields
(6)A𝒱m=𝒱m+1*H-m=𝒱m*Hm+(H-m)m+1,m[0,0,…,0,Vm+1],
where H-m is an (m+1)×m upper Hessenberg matrix, Hm is obtained by deleting the last row of H-m, and 0 is the zero matrix in 𝕄. Thus it follows immediately that
(7)Rm=R0-A𝒱m*ym=𝒱m+1*(βe1(m+1)-H-mym),
where β=max1≤i≤n,1≤j≤s{|(R0)i,j|} and e1(m+1)=[1,0,…,0]T∈ℝm+1. To obtain the vector ym, a restriction is imposed on the Gl-CMRH method; that is,
(8)∥Rm∥F=minW∈𝒦m(A,R0)∥R0-AW∥F.
Relations (7) and (8) yield
(9)ym=argminy∈ℝm∥𝒱m+1*(βe1(m+1)-H-my)∥F.
Instead of solving (9), which requires 𝒪(nm2) operations and 𝒪(nm) storage, we solve a smaller problem
(10)miny∈ℝm∥βe1(m+1)-H-my∥2,
which leads to ym=β(H-mTH-m)-1H-mTe1(m+1) by assuming that H-m is of full rank. From (5) and (10), the mth iterate Xm can be updated by
(11)Xm=X0+β𝒱m*((H-mTH-m)-1H-mTe1(m+1)).
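In floating point, ym is better obtained from (10) by a least-squares solve than from the explicit normal-equations formula in (11). A small numpy sketch (the helper name is hypothetical):

```python
import numpy as np

def solve_small_problem(Hbar, beta):
    """Solve (10): min_y || beta*e1 - Hbar y ||_2 for the (m+1)-by-m
    Hessenberg matrix Hbar.  lstsq is preferred over forming the
    normal equations, which squares the condition number."""
    rhs = np.zeros(Hbar.shape[0])
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)
    return y
```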
As in the Gl-GMRES method [15], a restarting strategy is used to address the problem that the computational and storage requirements increase with iterations. Algorithm 1 gives a framework of the restarted version of Gl-CMRH (Gl-CMRH(m)). We refer to [18, 19] for elaborate explanation for the Gl-CMRH method.
(1) Choose an initial guess X0, the restarting frequency m and the tolerance tol. Compute R0=B-AX0.
Determine i0,j0 such that |(R0)i0,j0|=max1≤i≤n,1≤j≤s{|(R0)i,j|}. Set β=|(R0)i0,j0|, V1=R0/β, p1,1=i0 and p1,2=j0.
(2) Construct the matrix basis 𝒱m=[V1,V2,…,Vm] and H-m by the global Hessenberg process [18].
(3) Solve ym by (10) and update Xm by (5).
(4) Compute Rm=B-AXm. If ∥Rm∥F/∥R0∥F≤tol, then stop; otherwise set X0=Xm, R0=Rm.
Choose i0,j0 such that |(R0)i0,j0|=max1≤i≤n,1≤j≤s{|(R0)i,j|}.
Set β=|(R0)i0,j0|, V1=R0/β, p1,1=i0 and p1,2=j0. Go to Step 2.
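For concreteness, Algorithm 1 can be sketched for dense A as follows. This is a minimal illustration, not the authors' implementation; it uses the signed pivot β=(R0)i0,j0 (as in Algorithm 2) so that each basis block Vj has an exact unit entry at its pivot position, which makes the eliminations in the Hessenberg process exact:

```python
import numpy as np

def gl_cmrh(A, B, m=10, tol=1e-10, max_restarts=200):
    """Minimal dense-matrix sketch of Gl-CMRH(m) (Algorithm 1)."""
    X = np.zeros_like(B)
    R = B - A @ X
    nrm0 = np.linalg.norm(R, 'fro')
    for _ in range(max_restarts):
        # pivot: entry of largest modulus in the residual block (signed)
        p0 = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        beta = R[p0]
        V, p = [R / beta], [p0]
        H = np.zeros((m + 1, m))
        for k in range(m):                    # global Hessenberg process
            M = A @ V[k]
            for j in range(k + 1):
                H[j, k] = M[p[j]]             # eliminate previous pivot positions
                M = M - H[j, k] * V[j]
            pk = np.unravel_index(np.argmax(np.abs(M)), M.shape)
            H[k + 1, k] = M[pk]
            V.append(M / H[k + 1, k])
            p.append(pk)
        rhs = np.zeros(m + 1)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H, rhs, rcond=None)      # problem (10)
        X = X + sum(yk * Vk for yk, Vk in zip(y, V[:m]))  # update (5)
        R = B - A @ X
        if np.linalg.norm(R, 'fro') / nrm0 <= tol:
            break
    return X
```

For a well-conditioned test matrix this sketch reaches the tolerance within a few restarts.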
3. A New Polynomial Preconditioned Gl-CMRH(m) Method
In many cases, the accuracy of Gl-CMRH(m) is sufficient. Due to the limited dimension of the matrix Krylov subspace 𝒦m, however, Gl-CMRH(m) may suffer from slow convergence or even stagnation in practice, just like GMRES(m) [25] and Gl-GMRES(m) [15]. To remedy this drawback, accelerating techniques are needed, for instance, the weighting strategy exploited in [19]. A polynomial preconditioner can also be applied to improve convergence [26]. In this section we focus on constructing an efficient polynomial preconditioner tailored to the Gl-CMRH(m) method.
The essence of the polynomial preconditioned method is to devise a polynomial Q(A)≈A-1 such that an easier system
(12)Q(A)AX=Q(A)B
is solved instead of solving the original system (1). In what follows, we obtain a polynomial preconditioner Q(x) by extracting some useful information from Gl-CMRH(m).
Now suppose that the block Krylov matrix Kk is of the form
(13)Kk=[R0,AR0,…,Ak-1R0],
where R0 is the initial residual matrix. By comparing the last s columns of the second equation in (6), we have
(14)AVk=𝒱k*(Hk)·,k+(H-k)k+1,kVk+1.
The equality (14) can be rearranged as
(15)Vk+1=((H-k)k+1,k)-1(AVk-𝒱k*(Hk)·,k).
Let us consider the relationship between Kk and the basis 𝒱k. Since 𝒱k and Kk span the same space, it follows that
(16)𝒱k=Kk*Uk,
where Uk is an upper triangular matrix. The relation (16) alone, however, sheds little light, since it is unclear how to compute Uk. Fortunately, an explicit recurrence for Uk can be derived in terms of Uk-1 and Hk-1. By combining (16) and (4), we get
(17)𝒱k*(Hk)·,k=Kk*(Uk(Hk)·,k)=[Kk,AkR0]*(Uk(Hk)·,k0).
Since Vk=(𝒱k)·,(k-1)s+1:ks=Kk*(Uk)·,k, then
(18)AVk=[AR0,A2R0,…,AkR0]*(Uk)·,k=Kk+1*(0(Uk)·,k).
Substituting (17) and (18) into (15) gives rise to
(19)Vk+1=((H-k)k+1,k)-1Kk+1*((0(Uk)·,k)-(Uk(Hk)·,k0)).
Besides, the relation (16) gives Vk+1=Kk+1*(Uk+1)·,k+1. By combining it with (19), we obtain a recurrence for the (k+1)st column of Uk+1; that is,
(20)(Uk+1)·,k+1=((H-k)k+1,k)-1((0(Uk)·,k)-(Uk(Hk)·,k0)).
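The recurrence (20) amounts to a shift, a small triangular matrix-vector product, and a scaling. A numpy sketch (the helper name `extend_U` is ours):

```python
import numpy as np

def extend_U(U, h_col, h_sub):
    """Append the (k+1)-st column of recurrence (20).
    U     : k-by-k upper triangular U_k,
    h_col : (H_k)_{.,k}, the last column of the square Hessenberg matrix,
    h_sub : the subdiagonal entry (Hbar_k)_{k+1,k}."""
    k = U.shape[0]
    shifted = np.concatenate(([0.0], U[:, -1]))   # [0; (U_k)_{.,k}]
    padded = np.concatenate((U @ h_col, [0.0]))   # [U_k (H_k)_{.,k}; 0]
    col = (shifted - padded) / h_sub
    U_next = np.zeros((k + 1, k + 1))
    U_next[:k, :k] = U
    U_next[:, k] = col
    return U_next
```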
Therefore, Uk in (16) can be updated recursively by (20). Recall that in (5) Xk is updated on the basis 𝒱k. Here we will show another way to update Xk which is based on the block Krylov matrix Kk. It follows from (5), (16), and (4) that
(21)Xk=X0+(Kk*Uk)*yk=X0+Kk*(Ukyk)=X0+∑i=0k-1αiAiR0,
where Ukyk=[α0,α1,…,αk-1]T and yk is solved from (10). Denote by Qk-1 a polynomial in A of degree k-1; that is, Qk-1(A)=∑i=0k-1αiAi. Hence (21) can be recast as
(22)Xk=X0+Qk-1(A)R0.
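Once the coefficients α0,…,αk-1 are available, a product Qk-1(A)Y should be evaluated by Horner's scheme using only matrix-block products, never by forming powers of A. A sketch (the helper name is hypothetical):

```python
import numpy as np

def apply_Q(alpha, A, Y):
    """Evaluate Q(A) Y = sum_{i=0}^{k-1} alpha[i] A^i Y by Horner's scheme;
    only k-1 matrix-block products are needed and A^i is never formed."""
    Z = alpha[-1] * Y
    for a in reversed(alpha[:-1]):
        Z = A @ Z + a * Y
    return Z
```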
The matrix polynomial Qk-1(A) in (22) can be regarded as the approximation to A-1 in some sense. This is justified for the case n=s in (1) by the following result.
Theorem 1.
Let Qk-1(A), X0 and R0=B-AX0 be the square matrices defined in (22), and let X* be the true solution of (1). Suppose that X*-X0 is nonsingular. Then one has
(23)∥I-Qk-1(A)A∥F≤∥Ek∥F∥E0-1∥F,
where Ek:=X*-Xk and E0:=X*-X0.
Proof.
Since R0=A(X*-X0)=AE0, relation (22) gives Ek=X*-Xk=E0-Qk-1(A)AE0=(I-Qk-1(A)A)E0. As E0 is nonsingular, I-Qk-1(A)A=EkE0-1, and (23) follows from the submultiplicativity of the Frobenius norm.
Remark 2.
In (23), the term ∥Ek∥F becomes smaller with growing k and hence the upper bound diminishes correspondingly, which in turn implies that Qk-1(A) approximates A-1 asymptotically. This justifies the use of the polynomial preconditioner. In general, (23) assures that the number of restarts will be reduced with increasing k. Yet this does not necessarily mean that the CPU time will be reduced simultaneously, since the time saved from the reduction of restarts may be offset by the extra time spent in constructing the polynomial. In practice, we are often more concerned with the CPU time than with the number of restarts. Therefore, we restrict ourselves to small values of k. For s<n, an inequality similar to (23) is generally unavailable. Nevertheless, numerical examples suggest that the asymptotic behavior of (23) is also shared by the case s<n; see Example 3 in Section 4 for more discussion.
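The asymptotic behavior described above is easy to observe numerically. The following self-contained sketch uses a truncated Neumann series Q(A)=∑i<deg(I-A)i as a simple stand-in for the CMRH-derived coefficients (an assumption made only for this illustration; it requires ∥I-A∥2<1), and shows ∥I-Qdeg-1(A)A∥F shrinking as the degree grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# well-conditioned test matrix with ||I - A||_2 < 1
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)

def neumann_Q(A, deg):
    """Degree-(deg-1) truncated Neumann series Q(A) = sum_{i<deg} (I - A)^i,
    a simple stand-in for the CMRH-derived polynomial preconditioner."""
    I = np.eye(A.shape[0])
    Q, P = np.zeros_like(A), I.copy()
    for _ in range(deg):
        Q = Q + P
        P = P @ (I - A)   # next power of (I - A)
    return Q

# for this Q, I - Q(A)A = (I - A)^deg, so the error shrinks geometrically
errs = [np.linalg.norm(np.eye(n) - neumann_Q(A, d) @ A, 'fro') for d in (2, 5, 10)]
```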
Putting it all together, we propose the new polynomial preconditioned global CMRH method (PGl-CMRH(m,deg)) shown in Algorithm 2.
(1) Choose an initial guess X0, the restarting frequency m, the polynomial degree deg and the tolerance tol.
(2) Phase I: Compute R0=B-AX0. Determine i0,j0 such that |(R0)i0,j0|=max1≤i≤n,1≤j≤s{|(R0)i,j|}.
Set β=(R0)i0,j0, V1=R0/β, p1,1=i0 and p1,2=j0. Let U1,1=1/β (the (1,1) entry of the upper triangular matrix
defined in (16)).
(3) for k=1:deg
(4) M=AVk
(5) for j=1:k
(6) (H-)j,k=(M)pj,1,pj,2
(7) M=M-(H-)j,kVj
(8) end for
(9) Determine i0,j0 such that (M)i0,j0=max1≤i≤n,1≤j≤s{|(M)i,j|}.
(10) Set (H-)k+1,k=(M)i0,j0, Vk+1=M/(H-)k+1,k, pk+1,1=i0 and pk+1,2=j0.
(11) Compute (Uk+1)·,k+1 from (20).
(12) end for
(13) Xdeg=X0+𝒱deg*ydeg, where ydeg=argminy∈ℝdeg∥βe1(deg+1)-H-degy∥2.
(14) Construct the polynomial preconditioner Qdeg-1(A) with its coefficients decided by entries of Udegydeg.
(15) Phase II: Solve Qdeg-1(A)AX=Qdeg-1(A)B using Algorithm 1.
4. Numerical Examples
In this section, we present some numerical experiments; all methods are coded in MATLAB 7.8.0. For fair comparisons, the other global solvers mentioned earlier, namely Gl-CMRH(m) [18], WGl-CMRH(m) [19], and Gl-GMRES(m) [15], have also been implemented. From now on we drop the parameters m and deg in brackets when there is no ambiguity. In all examples, we take X0=0. The stopping criterion for the kth iteration is ∥Rk∥F/∥R0∥F≤tol=10-10. Though other alternatives are possible, we use D=ns|(R0)i,j|/∥R0∥F as the weighting matrix for WGl-CMRH, which is also preferred in [19]. The coefficient matrices A in the first two examples are derived from discretizations of the Poisson equation and a convection-diffusion problem, both of which occur frequently in applied science and engineering. The coefficient matrices in the third example are taken from the Matrix Market [27].
Example 1.
We consider the linear systems of (1) whose coefficient matrix A is obtained from the discretization of
(24)ℒu=uxx+uyy
on the unit square (0,1)×(0,1) with u=0 on the boundary. The equation is discretized by the centered difference scheme at the grid points (xi,yj) with xi=ih, yj=jh, where the mesh size is h=1/(N+1) for i,j=0,…,N+1. This yields a block tridiagonal matrix of size n=N2. The right-hand side matrix B is chosen with entries uniformly distributed on [0,1]; see [14, Chapter 2] for more details about (24). Related parameters are given by s=2, m=20 and deg=5. The number of restarts and CPU time for matrices A of different sizes are given in Table 1. As observed from Table 1, PGl-CMRH reduces the CPU time of Gl-CMRH to between 17.8% and 55.4% of the original. Compared with WGl-CMRH, PGl-CMRH requires fewer restarts and less CPU time to achieve the required accuracy. Note that WGl-CMRH does not speed up the convergence of Gl-CMRH here, which indicates that a different weighting matrix should be used; finding the optimal weighting matrix, however, remains an open problem [19].
Table 1: Number of restarts and CPU time (in brackets, seconds) for Example 1.

    n        Gl-GMRES      Gl-CMRH       WGl-CMRH      PGl-CMRH
    10,000   121 (18.0)    85 (7.4)      89 (8.4)      24 (4.1)
    14,400   150 (34.6)    85 (11.8)     116 (17.3)    23 (6.0)
    22,500   259 (133.4)   165 (42.8)    173 (53.5)    37 (17.0)
    40,000   450 (585.8)   255 (173.0)   302 (235.2)   26 (32.3)
    44,100   496 (699.9)   322 (253.3)   368 (313.6)   39 (45.0)
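The block tridiagonal matrix used above can be assembled with Kronecker products. A sketch of the standard (negated, positive definite) 5-point stencil, scaled by h2; the helper name is ours:

```python
import numpy as np

def poisson2d(N):
    """5-point finite-difference discretization of the Laplacian in (24) on an
    N-by-N interior grid, giving a block tridiagonal matrix of order n = N^2
    (sign-flipped and scaled by h^2, the conventional positive definite form)."""
    T = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)  # 1D second difference
    I = np.eye(N)
    return np.kron(I, T) + np.kron(T, I)
```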
Example 2.
Consider the linear systems of (1) whose coefficient matrix A is obtained from the discretization of the three-dimensional convection-diffusion problem
(25)𝒯u=-(uxx+uyy+uzz)+q(ux+uy+uz)
on the unit cube Ω=[0,1]×[0,1]×[0,1]. Here q is a constant coefficient and (25) is subject to Dirichlet-type boundary conditions. The equation is discretized by a seven-point finite difference scheme: centered differences for the diffusive terms and first-order upwind approximations for the convective terms. This yields a coefficient matrix A of size n=N3, where the equidistant mesh size h=1/(N+1) is used and the unknowns are ordered lexicographically; we refer to [22, Section 4] for more details. The right-hand side matrix B is chosen with entries uniformly distributed on [0,1]. Here, s=2, m=15, and deg=5. The number of restarts and CPU time for q=0.1 and q=1 are given in Table 2. For this large problem, as expected, PGl-CMRH outperforms Gl-CMRH and the other variants in terms of CPU time.
Table 2: Number of restarts and CPU time (in brackets, seconds) for Example 2; q=0.1 (top) and q=1 (bottom).

    n         Gl-GMRES     Gl-CMRH      WGl-CMRH     PGl-CMRH
    8,000     14 (1.2)     11 (0.6)     13 (0.7)     2 (0.3)
    27,000    26 (8.4)     23 (5.0)     21 (4.9)     5 (2.3)
    64,000    40 (40.7)    32 (21.2)    32 (22.8)    7 (9.5)
    125,000   58 (147.2)   41 (53.3)    47 (74.6)    9 (23.2)
    216,000   81 (298.8)   58 (122.5)   61 (160.7)   17 (74.4)

    n         Gl-GMRES     Gl-CMRH      WGl-CMRH     PGl-CMRH
    8,000     14 (1.2)     13 (0.6)     14 (0.7)     2 (0.3)
    27,000    25 (7.4)     22 (4.6)     22 (4.8)     5 (2.4)
    64,000    39 (37.0)    32 (20.6)    34 (23.3)    7 (9.9)
    125,000   57 (114.8)   43 (53.6)    48 (70.8)    9 (23.3)
    216,000   79 (313.1)   51 (114.7)   61 (181.0)   17 (76.2)
Example 3.
In practice, the degree of the polynomial preconditioner Qdeg-1 has a great impact on the numerical performance of PGl-CMRH, so it is worth investigating how to choose the "optimal" degree (if one exists) for generic matrices. Theoretical analysis to this end, however, can be very hard. Instead, we show empirically how to choose a range of degrees for Qdeg-1 such that PGl-CMRH yields at least a modest performance. To this end, we use ten nonsymmetric test matrices from [27] and illustrate how PGl-CMRH performs for each matrix with deg varying from 2 to 15; see Figure 1. Some properties of these matrices are listed in Table 3. The right-hand side matrix B is chosen with entries uniformly distributed on [0,1]. Since we are only concerned with the value of deg that makes PGl-CMRH perform stably with the shortest CPU time, we normalize the CPU times of each curve by dividing by the maximum CPU time on that curve. Take the matrix pde2961 for example: the longest time is 4.2 seconds (with deg=3), so we divide all CPU times for pde2961 by 4.2 and plot the result in Figure 1. This normalization facilitates comparison since the different curves become more clustered. Some remarks can be made from Figure 1. First, the curves are rather problem-dependent and not necessarily nonincreasing with increasing values of deg; for instance, the curve of rdb2048l is quite irregular and hence unpredictable. This does not contradict Theorem 1, which states that Qdeg-1(A) approximates A-1 better with growing deg: Theorem 1 explains why the total number of restarts decreases with increasing deg, but this need not carry over to the CPU time.
In fact, PGl-CMRH with a high-degree preconditioner may spend more CPU time generating the polynomial preconditioner (even with fewer restarts) and hence take longer to converge than its low-degree counterpart. Second, most curves attain their shortest CPU time with deg between 2 and 10. This is the first reason for favoring small values of deg. Finally, from the numerical point of view, more rounding errors can be introduced in constructing high-degree polynomial preconditioners, which is the second reason for preferring low degrees. Therefore it is useful to test with deg from 2 to 10. Under extreme situations, however, a higher degree may be needed if a low-degree preconditioner fails to deliver the required accuracy.
Table 3: Properties of testing matrices in Example 3.

    Matrix     n      nnz     Discipline
    add32      4960   19848   Electronic circuit design
    cdde6      961    4681    Computational fluid dynamics
    fs680.1    680    2184    Chemical kinetics
    fidap001   216    4339    Finite element modeling
    gre115     115    421     Simulation studies in computer systems
    pde2961    2961   14585   Partial differential equations
    rdb200     200    1120    Chemical engineering
    rdb2048l   2048   12032   Chemical engineering
    rdb3200l   3200   18880   Chemical engineering
    sherman4   1104   3786    Oil reservoir modeling
Figure 1: Normalized CPU time against deg (from 2 to 15) for the ten testing matrices.
5. Conclusion
To remedy the slow convergence of the original Gl-CMRH(m) method, a new variant of Gl-CMRH(m) for linear systems with multiple right-hand sides is developed. The proposed method often yields better performance than its predecessor Gl-CMRH(m) and other global variants in terms of CPU time. We show experimentally that polynomial preconditioners of degree lower than 10 should be considered first if no prior knowledge is available.
Acknowledgments
The authors would like to thank Professor Jinyun Yuan and the referees for their valuable remarks that improved this paper. The work is supported by the National Natural Science Foundation (11371243), the Key Disciplines of Shanghai Municipality (S30104), the Innovation Program of Shanghai Municipal Education Commission (13ZZ068), and the Anhui Provincial Natural Science Foundation (1308085QF117).
References

[1] T. F. Chan and M. K. Ng, Galerkin projection methods for solving multiple linear systems.
[2] T. F. Chan and W. L. Wan, Analysis of projection methods for solving linear systems with multiple right-hand sides.
[3] D. P. O'Leary, The block conjugate gradient algorithm and related methods.
[4] B. Vital.
[5] V. Simoncini and E. Gallopoulos, An iterative method for nonsymmetric systems with multiple right-hand sides.
[6] V. Simoncini and E. Gallopoulos, Convergence properties of block GMRES and matrix polynomials.
[7] A. El Guennouni, K. Jbilou, and H. Sadok, A block version of BiCGSTAB for linear systems with multiple right-hand sides.
[8] R. W. Freund and M. Malhotra, A block QMR algorithm for non-Hermitian linear systems with multiple right-hand sides.
[9] S. Karimi and F. Toutounian, The block least squares method for solving nonsymmetric linear systems with multiple right-hand sides.
[10] A. El Guennouni, K. Jbilou, and H. Sadok, The block Lanczos method for linear systems with multiple right-hand sides.
[11] L. Du, T. Sogabe, B. Yu, Y. Yamamoto, and S.-L. Zhang, A block IDR(s) method for nonsymmetric linear systems with multiple right-hand sides.
[12] C. F. Smith, A. F. Peterson, and R. Mittra, Conjugate gradient algorithm for the treatment of multiple incident electromagnetic fields.
[13] A. M. Abdel-Rehim, R. B. Morgan, and W. Wilcox, Improved seed methods for symmetric positive definite linear equations with multiple right-hand sides.
[14] Y. Saad.
[15] K. Jbilou, A. Messaoudi, and H. Sadok, Global FOM and GMRES algorithms for matrix equations.
[16] K. Jbilou and H. Sadok, Global Lanczos-based methods with applications, LMA 42, Université du Littoral, Calais, France, 1997.
[17] K. Jbilou, H. Sadok, and A. Tinzefte, Oblique projection methods for linear systems with multiple right-hand sides.
[18] M. Heyouni, The global Hessenberg and CMRH methods for linear systems with multiple right-hand sides.
[19] M. Heyouni and A. Essai, Matrix Krylov subspace methods for linear systems with multiple right-hand sides.
[20] C. Gu and H. Qian, Skew-symmetric methods for nonsymmetric linear systems with multiple right-hand sides.
[21] C. Gu and Z. Yang, Global SCD algorithm for real positive definite linear systems with multiple right-hand sides.
[22] Z.-Z. Bai, G. H. Golub, and M. K. Ng, Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems.
[23] Z.-Z. Bai, G. H. Golub, L.-Z. Lu, and J.-F. Yin, Block triangular and skew-Hermitian splitting methods for positive-definite linear systems.
[24] H. Sadok, CMRH: a new method for solving nonsymmetric linear systems based on the Hessenberg reduction algorithm.
[25] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems.
[26] M. B. van Gijzen, A polynomial preconditioner for the GMRES algorithm.
[27] Matrix Market, http://math.nist.gov/MatrixMarket/.