A Flexible Global GCRO-DR Method for Shifted Linear Systems and General Coupled Matrix Equations

,


Introduction
In the current study, we focus on the solution of sequences of shifted linear systems of the form where A i s i�1 ⊂ C n×n are large nonsingular matrices of order n, the right-hand sides b ℓ i p i ℓ�1 ⊂ C n are vectors of the length n, and σ ℓ i p i ℓ�1 ⊂ C are called shifts. Such shifted linear systems (1) arise in many realistic applications, for example, electromagnetic scattering [1], QCD [2][3][4][5], dynamics of structures [6], complex network [7][8][9], and model reduction [10].
Note that problem (1) is reduced to the solution of a sequence of linear systems with the form when ℓ � 1, where the coefficient matrix A i � A i − σ i I and the right-hand sides b i change from one system to the next. One approach to more efficient solution of such systems is to exploit Krylov subspace recycling; that is, recycling a judiciously selected subspace of the search space typically reduces the number of iterations substantially. ere are several algorithms that take advantage of recycling for subsequent systems as well as for restarts of generalized minimal residual (GMRES) type methods, such as the generalized conjugate residual method with inner orthogonalization (GCRO) [11], GCRO with deflated restarting (GCRO-DR) [12], GMRES with deflated restarting (GMRES-DR) [13] (only for a single system), and GCRO with optimal truncation (GCROT) [14].
It is also noted that, for each i, we substantially solve a family of shifted linear systems with the form A − σ ℓ I x ℓ � b ℓ , for ℓ � 1, 2, . . . , p i .
Krylov subspace solvers [15] are particularly attractive for the solutions of (3) because of its shift-invariance property [16].
at is, for any shift σ, the m-th Krylov subspace satisfies as long as b � βb. is property has led to many approaches for solving such linear systems (3) simultaneously by generating only one subspace. Some solvers are built upon the (Bi-)Lanczos recurrences [17][18][19][20][21] and some others use the Arnoldi-like iteration [16,[22][23][24][25][26][27][28]. However, the shiftinvariance property will be lost when using the general preconditioning, such as Jacobi preconditioning, SSOR preconditioning, ILU preconditioning, AMG preconditioning, and AINV preconditioning. Furthermore, when b ℓ i p i ℓ�1 are unrelated, we cannot employ the shiftinvariance property. It means that all residuals are not simultaneously minimized whiling maintaining collinearity, for example, the GMRES-type methods [16,23,29]. erefore, we propose an alternative without exploiting the shift-invariance. In fact, the solution of (3) is equivalent to solve a Sylvester equation [30], where T � diag(σ 1 , σ 2 , . . . , σ p i ) ∈ C p i ×p i and B � [b 1 , b 2 , . . . , b p i ] ∈ C n×p i . erefore, (1) can also be reformulated as a family of Sylvester equations with the slowly changing coefficient matrices For simplicity of the presentation, a linear operator A is defined as follows: Applying the linear operator A, we can rewrite (1) as Moreover, we also assume that the linear operator A is invertible; that is, Obviously, the linear systems (8) can be seen as generalized version of (2). erefore, based on the special structure of (1), we propose a new generalized global version of the GCRO-DR method that not only combines with any type of the preconditioning but also allows one to retain important spectral information generated during the solution of the i-th linear system and then to exploit such information to accelerate convergence of the iterations when solving the subsequent i + 1-th system. Moreover, the proposed method can be combined with any form of preconditioning and does not require the right-hand sides to maintain collinearity.
For simplicity, we denote A(X) by AX. In the following, we utilize the operator A to present our flexible global GCRO-DR algorithms for solving (1). e paper is organized as follows. A global version of GCRO-DR to solve (1) is derived in detail in Section 2. We also apply the flexible preconditioning operator at each iteration to improve the global GCRO-DR algorithm. By performing an error analysis, we deduce that a much looser tolerance can be applied to save computation by limiting the flexible preconditioned work without sacrificing the closeness of the computed and the true residuals. Meanwhile, we also propose a global version of the FGMRES-DR method and then analyze the relationship between two flexible global methods. In Section 2.5, the flexible global GCRO-DR method is applied to solve the general coupled matrix equations. Section 3 numerically demonstrates the effectiveness of the novel method. Finally, some conclusions and remarks are summarized in Section 4.

Flexible Global GCRO-DR Method
GCRO-DR is a well-established Krylov subspace solver that paired with a harmonic Ritz vector recycling strategy [12]. GCRO [11] is a nested Krylov method, which uses GCR as the outer solver and the restarted GMRES as an inner method. In [12], Parks et al. propose GCRO-DR that exploits deflated restarting within the framework of GCRO for a family of slowly changing linear equations. Because of its superior numerical behavior, we introduce a global version of GCRO-DR to solve (1). Here, we not only employ the subspace recycling at each cycle but also apply the flexible preconditioning operator at each iteration. e proposed method is referred to as flexible global GCRO-DR (FGL-GCRO-DR).

Flexible Global Arnoldi Process.
We first present a global Arnoldi process used in the flexible setting to create an Forthonormal basis for the global Krylov subspace in Algorithm 1. We define M j as the preconditioning operator used at the j-th iteration, and the action M j on rectangular matrix V is denoted by M j (V).
e flexible global Arnoldi process, recursively, generates an F-orthonormal basis V m+1 � [V 1 , V 2 , . . . , V m+1 ] of the global Krylov subspace spanned by the rectangular matrices V 1 , AZ 1 , . . . , AZ m−1 . At the end of the m-th iteration, we obtain a matrix relation (flexible global Arnoldi relation); that is, where is an upper Hessenberg matrix.

e FGL-GCRO-DR Algorithm. Suppose that
where Z i , i � 1, 2, . . . , k is a search vector, recycled from either the previous linear system or the previous cycle, and 0 ≤ k ≤ m. By minimizing the residual F-norm, over the search space X 0 + R(Z k ), X k and R k ∈ C n×p satisfy Based upon what is proven in [11,14] and using the framework discussed in [12,32], the m − k steps of the flexible global Arnoldi process are carried out, with the starting rectangular matrix V k+1 � R k /‖R k ‖, while holding the orthogonality to V k , to eventually obtain Since R k ⊥ F R(V k ), that is, V k+1 ⊥ F V k , the following flexible global Arnoldi-like relation holds: At the end, we find the approximate solution with β 0 � 〈V m+1 , R 0 〉 F . To improve the convergence of iterative solvers, the smallest magnitude eigenvalues will be deflated. Following the idea of [12,32], we use harmonic Ritz vectors with those eigenvalues to construct approximate invariant subspace and then the subspace for next cycles is retained. Firstly, we where W m ∈ C n×mp , and each harmonic Ritz pair (θ i , g i ) satisfies It is noted that the harmonic residual vectors satisfy Journal of Mathematics . So, the residual and the harmonic residuals are in the same p-dimensional en we can characterize the relationship by the following formula: where a(: Due to the relations (10) and (21) can be also expressed as Next, we compute the reduced QR-factorization of H m G k � QR with Q ∈ C (m+1)×k and R ∈ C k×k and then de- erefore, at the end of the loop, an update Z k is generated, scaled such that V k � AZ k has F-orthonormal columns; the previous global Krylov subspace basis is dropped, and then we restart. Details of the FGL-GCRO-DR method are summarized in Algorithm 2.
In addition, the major advantage of FGL-GCRO-DR is that it can retain important spectral information generated during the solution of the i-th linear system and then exploit such information to accelerate convergence of the iterations when solving the subsequent i + 1-th system. en we detail how to recycle subspace when solving sequences of slowly changing linear systems (8) as follows (Algorithm 3).

Comments on the FGL-GCRO-DR Method.
To achieve better efficiency of the linear system solvers, we not only employ the subspace recycling at each cycle but also apply the flexible preconditioning operator at each iteration. As we know, the variable preconditioning can be construed as an inexact subspace method. If the matrix A is difficult to invert, we will resort to several steps of iterative method as flexible preconditioning, so we obtain in some way an approximate result, In such a situation, what are the properties of the true residual obtained by using the flexible preconditioning? We will address this question in this section. Following [33,34], we set to be the residual that results after the iterative solver has been terminated; then erefore, the perturbation in linear operation with A can be written explicitly as e perturbation implies that the error computed has the bound In fact, the solution is obtained directly from the approximation subspace R(Z m ); that is, indicating that the perturbation E j does not affect the computation of the true residual. In other words, in exact arithmetic Input: A ∈ C n×n , B ∈ C n×p . Choose m, k, tol and initial guesses X 0 . (1) Suppose that V i k , Z i k and W i k are defined from solving i-th linear system linear and satisfy Journal of Mathematics and is orthogonal to R(V m+1 ). As a consequence, the perturbation of the linear operations only influences the goodness of R(Z m ) and the accuracy of the approximate solution. erefore, we deduce that, in the flexible setting, a dynamic stopping criterion or a much looser tolerance can be employed to save computation by limiting the flexible preconditioning work without sacrificing the closeness of the computed and the true residuals.

e Relationship between Two Flexible Global Methods with Subspace
Recycling. As we know, FGCRO-DR [32] and FGMRES-DR [35] are two important methods in the general category of augmented (deflated) GMRES-type methods, which choose harmonic Ritz vectors as augmented (deflated) subspace. ey emphasize on the solution of linear system Ax � b which has only one equation. e relationship between FGCRO-DR and FGMRES-DR has been well described by [32]. It is shown that they are algebraically equivalent as long as a collinearity condition is satisfied.In fact, FGL-GCRO-DR is a global generalization of FGCRO-DR. For the sake of descriptive integrality, we also present a global generalization of FGMRES-DR algorithm, referred to as FGL-GMRES-DR, in Algorithm 4. As for global versions, the FGL-GCRO-DR method is also closely related to the FGL-GMRES-DR approach in algebraical analysis, but their numerical behavior will be slightly different. Details will be shown in Section 3. Furthermore, we compare the main calculation cost and storage capacity for each cycle of FGL-GCRO-DR and FGL-GMRES-DR in Table 1. In practical applications, n ≫ m, k.
Here, opM and opA denote floating-point calculation counts for the preconditioning application and the linear operations, respectively. e computational cost of harmonic Ritz vectors and "min‖c − H m y‖ 2 " have not been induced in Table 1 since the cost is approximately O(p 3 ), much more less than n. C � (opM + opA)(m − k) + (2(m + 1)kp + p + 2mkp + (m − k)(2m + 2k + 6))np is referred to as the total cost of FGL-GCRO-DR. One sees that the time complexities of two global algorithms are the same, that is, O((m − k)n 2 p), but FGL-GMRES-DR requires slightly more computation per cycle than FGL-GCRO-DR in terms of computational cost.

Numerical Experiments
is section presents some numerical examples to illustrate the effectiveness of our algorithm for solving shifted linear system (1) and matrix equations (8) and compare the performance with the global GMRES (m) approach (referred to as GL-GMRES (m)) [31,50,59] and GL-GMRES-DR [60], with or without preconditioning. In each example, the flexible preconditioning is dominated by 5 steps of the global full GMRES [50].
We consider three numerical illustrations here. e first two experiments are sequences of linear systems arising from lattice QCD [2][3][4]61] and the finite difference scheme of the convection-diffusion-reaction equation [62], respectively. e last one is the general coupled Sylvester matrix equations. e comparisons are mainly made in two aspects: the execution time in seconds (T (s), for short) and number of matrix-vector products (mvps, for short).
All the numerical experiments are carried out on MATLAB 2017b. We choose X 0 � 0 n×p , m � 20, and k � 10. e stopping criteria are . . , s for all the approaches, or when mvps exceed the set maximum of matrix-vector multiplications (MAXIT, for short). Moreover, the symbol † represents that the solver does not achieve the stopping criteria within MAXIT and RHSs � rand(n, p).  Tables 2 and  3. In addition, we also depict the residual histories for linear systems solved using iterative algorithms with or without flexible preconditioning.
As seen from Tables 2 and 3 and Figures 1 and 2 , it is found that using deflation and flexible preconditioning is conducive to improving the convergence rate. FGL-GCRO-DR has a significant decrease in terms of mvps and T(s), which are similar to FGL-GMRES-DR, but performs slightly better. e GL-GMRES method is less effective and does not converge in some test problems. e above results show that FGL-GCRO-DR has the potential to improve the convergence, compared with other global Krylov subspace solvers.
Example 2. For the second case, we assess the performance of global solvers on three-dimensional separable model convection-diffusion problem in the unit cube with the homogeneous Dirichlet boundary conditions [62]:  Tables 4 and 5 and Figures 3 and 4 show that FGL-GMRES is slightly superior to FGL-GCRO-DR when the mesh size h � 0.025. However, as the number of grid points increases, FGL-GCRO-DR enjoys the significant decrease in the number of mvps and T(s). Meanwhile, according to Tables 4 and 5, we can see that the convergence is speeded up remarkably by considering the flexible preconditioning. Moreover, even the size of grid points increases, and the average number of mvps for FGL-GCRO-DR does not increase.

Example 3.
For the last test case, we are interested in solving the general coupled Sylvester matrix equations [45,47] where Input: A ∈ C n×n , B ∈ C n×p . Choose m, k, tol and initial guesses X 0 Output:
Here, we consider n � 50, 60, 70, 80, 90, 100 different orders. As the definition of MAXIT, we choose MAXIT � 500. e corresponding numerical results of mvps and T(s) are reported in Table 6 and the residual histories are depicted in Figures 5 and 6. As seen from Table 5 and Figures 5 and 6, it is noted that the flexible deflated GMRES-type methods (FGL-GCRO-DR and FGL-GMRES-DR) are found to be efficient in terms of performance test problems of different orders, while GL-GMRES (m) and FGL-GMRES (m) fail to converge in the given steps with or without variable preconditioning, while the parameter n is greater than 80. e convergence is improved remarkably by considering the eigenvalue deflation technique and the flexible preconditioning. One should note that FGL-GCRO-DR enjoys the significant decrease in the number of mvps and T(s) and performs slightly better than FGL-GMRES-DR method. In addition, when n increases, the average number of mvps and T(s) for FGL-GCRO-DR do not increase.

Conclusions
In this study, a new flexible global GCRO-DR method is proposed for addressing sequences of shifted linear systems and general coupled matrix equations. e proposed method does not require the right-hand sides to be related and can also be compatible with general preconditioning. By performing an error analysis, we deduce that a much looser tolerance can be applied to save computation by limiting the flexible preconditioned work without sacrificing the closeness of the computed and the true residuals. Meanwhile, we also propose a global version of the FGMRES-DR method and then analyze the relationship between two flexible global methods. Finally, numerical experiments show that FGL-GCRO-DR significantly reduces the mvps, saves the runtime, and enjoys faster convergence than some other global GMRES-type approaches on tough problems.
As future work, we plan to investigate the numerical properties of FGL-GCRO-DR on large-scale realistic problems. Of interest are applications related to, for example, model adaptation in Gaussian process models [63] or stochastic finite element methods [64] in three dimensions where variable preconditioning using inexact solvers has to be usually considered. Another extension of this work is a distributed-memory parallel implementation and testing using MPI + OpenMP and MPI + CUDA. Investigation will focus on implementing, testing, and comparing the runtime of the introduced FGL-GCRO-DR on CPUs and GPUs with respect to other existent similar methods.
Data Availability e data will be made available upon reasonable request via e-mail to the corresponding author.

Conflicts of Interest
e authors declare that they have no conflicts of interest.