The Rate of Convergence of the SOR Method in the Positive Semidefinite Case

In this paper, we derive upper bounds that characterize the rate of convergence of the SOR method for solving a linear system of the form Gx = b, where G is a real symmetric positive semidefinite n × n matrix. The bounds are given in terms of the condition number of G, which is the ratio κ = α/β, where α is the largest eigenvalue of G and β is the smallest nonzero eigenvalue of G. Let H denote the related iteration matrix. Then, since G has a zero eigenvalue, the spectral radius of H equals 1, and the rate of convergence is determined by the size of η, the largest eigenvalue of H whose modulus differs from 1. The bound has the form |η|^2 ≤ 1 − 1/(κc), where c = 2 + log_2 n. The main consequence of this bound is that a small condition number forces fast convergence, while a large condition number allows slow convergence.


Introduction
The SOR method is one of the basic iterative algorithms for solving a large sparse linear system of the form

Gx = b, (1)

where G ∈ ℝ^{n×n}, b ∈ ℝ^n, and x ∈ ℝ^n denotes the vector of unknowns. That is, we need to solve a system of n linear equations in n unknowns. The abbreviation SOR stands for "successive overrelaxation." For a detailed description and discussion of this method, see the references discussed in the sequel. In this paper, we investigate the SOR rate of convergence in the special case when G is a real symmetric positive semidefinite matrix. This means that G has at least one zero eigenvalue and that the system (1) can be inconsistent.

It is also assumed that all the diagonal entries of G are positive. This assumption has two justifications. First, it is not possible to apply the SOR iteration without it. Second, since G is positive semidefinite, a zero diagonal entry implies that the corresponding row and column are null and can be deleted. The last assumption enables us to express G in the form

G = D − L − L^T,

where L is a strictly lower triangular matrix and D is a positive definite diagonal matrix. The kth SOR iteration, k = 1, 2, ⋯, starts with x_{k-1} ∈ ℝ^n and computes x_k by the rule

(D/w − L) x_k = ((1/w − 1)D + L^T) x_{k-1} + b. (3)

The matrix

H_w = (D/w − L)^{-1} ((1/w − 1)D + L^T)

is called the SOR iteration matrix, and w is a relaxation parameter that satisfies

0 < w < 2.

Remark 1. In this paper, we study the asymptotic rate of convergence of the SOR iteration (3). Thus, when using simpler terms like "fast rate of convergence" or "fast convergence," we always refer to asymptotic behavior.
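For concreteness, the update rule (3) can be carried out row by row without forming the iteration matrix. The following sketch is our own minimal illustration (the helper name `sor_sweep` and the random test instance are not from the paper); it performs one forward SOR sweep at a time:

```python
import numpy as np

def sor_sweep(G, b, x, w):
    """One forward SOR sweep for G x = b, overwriting x in place.

    Componentwise form of iteration (3):
    x_i <- (1 - w) x_i + (w / G_ii) * (b_i - sum_{j != i} G_ij x_j),
    where entries j < i already hold their updated values.
    Requires G_ii > 0 for all i, as assumed in the paper."""
    n = G.shape[0]
    for i in range(n):
        sigma = G[i] @ x - G[i, i] * x[i]   # sum over j != i
        x[i] = (1.0 - w) * x[i] + w * (b[i] - sigma) / G[i, i]
    return x

# Example: a symmetric positive definite system, where SOR converges.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
G = M @ M.T + 5.0 * np.eye(5)      # symmetric positive definite
x_true = rng.standard_normal(5)
b = G @ x_true

x = np.zeros(5)
for _ in range(200):
    sor_sweep(G, b, x, w=1.2)
err = np.linalg.norm(x - x_true)
```

For a positive definite G and 0 < w < 2, Ostrowski's theorem (recalled below) guarantees that the error shrinks; the semidefinite case studied here is more delicate.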
The role of H_w is seen from the following known observation. Assume for a moment that the linear system (1) has a solution x̄ ∈ ℝ^n. Then, the iteration (3) satisfies

x_k − x̄ = H_w (x_{k-1} − x̄) = ⋯ = H_w^k (x_0 − x̄),

where x_0 denotes the starting point. The last equality reveals the importance of ρ(H_w), the spectral radius of H_w. If ρ(H_w) < 1, then the sequence {x_k} converges toward x̄, and the rate of convergence depends on the size of ρ(H_w): the smaller ρ(H_w) is, the faster the convergence. Otherwise, when ρ(H_w) > 1, the SOR iteration diverges. Yet, when G has a zero eigenvalue, ρ(H_w) = 1, and the situation depends on the consistency of (1). If the SOR iteration attempts to solve an inconsistent system, it diverges. Otherwise, when the system to solve is consistent, it converges. The question discussed in this paper is how the spectral properties of G affect the rate of convergence in this case.

The treatment of the positive semidefinite case becomes easier by noting the relation with Kaczmarz's method. For a detailed description and discussion of Kaczmarz's method, see [4-6, 8, 10, 11, 13, 15, 20, 21, 31, 33, 34, 36] and the references therein. Let r denote the rank of G. Then, 0 < r < n and G has t = n − r zero eigenvalues. Moreover, using the spectral decomposition, it is possible to express G in the form

G = AA^T,

where A ∈ ℝ^{n×r} has r orthogonal columns. Let the sequence y_k, k = 0, 1, 2, ⋯, be generated by Kaczmarz's method for solving the linear system

Ay = b,

where y ∈ ℝ^r denotes the vector of unknowns. Then, the following observation is well known, e.g., [4, 8]. If the starting points satisfy

y_0 = A^T x_0,

then the equalities

y_k = A^T x_k

hold for all k. This relation implies that Kaczmarz's method obeys the rule

y_k = F_w y_{k-1} + A^T B_w^{-1} b, (11)

where B_w = D/w − L denotes the matrix inverted in (3) and

F_w = I − A^T B_w^{-1} A

is the corresponding iteration matrix. The role of F_w will be clarified in the coming discussions. The literature on the SOR method is huge and includes various types of convergence results.
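The correspondence between the SOR sequence and the Kaczmarz sequence is easy to verify numerically. The sketch below is our own illustrative instance (the factor A is obtained from the spectral decomposition, as described above); it runs both recursions and checks that y_k = A^T x_k is preserved:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 4
M = rng.standard_normal((n, r))
G = M @ M.T                              # symmetric PSD with rank r
D = np.diag(np.diag(G))
L = -np.tril(G, -1)                      # G = D - L - L^T
w = 1.0
B = D / w - L                            # matrix inverted in (3)

# A with r orthogonal columns such that G = A A^T (spectral decomposition)
vals, V = np.linalg.eigh(G)
pos = vals > 1e-10
A = V[:, pos] * np.sqrt(vals[pos])

x_true = rng.standard_normal(n)
b = G @ x_true                           # consistent right-hand side

Binv = np.linalg.inv(B)
H = np.eye(n) - Binv @ G                 # SOR iteration matrix
F = np.eye(A.shape[1]) - A.T @ Binv @ A  # Kaczmarz iteration matrix

x = rng.standard_normal(n)
y = A.T @ x                              # starting points are linked
for _ in range(50):
    x = H @ x + Binv @ b                 # SOR step
    y = F @ y + A.T @ (Binv @ b)         # Kaczmarz step (11)
gap = np.linalg.norm(y - A.T @ x)
```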
However, many of the early results are derived under certain assumptions on G, such as being an "M-matrix" or "consistently ordered," e.g., [1, 17-19, 35, 37, 41]. One of the first results without such assumptions is Ostrowski's theorem [29, 37], which ensures that ρ(H_w) < 1 whenever G is positive definite. Yet, it took about forty years until Oswald [30] derived a quantitative bound on ρ(H_w). See also Axelsson [1] (pp. 241-242) for a similar bound. The current paper extends Oswald's bound to the positive semidefinite case. The difficulty here is that ρ(H_w) = 1. So we need to see what dominates the rate of convergence in this case.
The interest in the semidefinite case was initiated in the work of Keller [22]. Then, the surprising behavior of the SOR method when solving inconsistent linear systems was studied in a small number of papers, e.g., [2, 7, 8, 28]. It is shown there that if the linear system (1) is inconsistent, then although the SOR sequence {x_k} diverges, it obeys the rule

x_k = x̄_k + k·v,

where {x̄_k} is a converging sequence and v is a fixed vector that belongs to Null(G) = Null(A^T). Otherwise, when the linear system is consistent, v = 0 and x_k = x̄_k. This explains why the related Kaczmarz sequence (11) always converges.

The convergence properties of iterative methods for solving consistent positive semidefinite linear systems have attracted the attention of several authors. See, for example, [2, 3, 7, 8, 14, 22-26, 40], and the references therein. In particular, if G is positive semidefinite as above, then the SOR iteration matrix, H_w, is known to be "semiconvergent". Let λ_j, j = 1, ⋯, n, denote the eigenvalues of H_w and assume that they are sorted to satisfy

|λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|,

which implies that ρ(H_w) = |λ_1|. Then "semiconvergent" means that λ_1 = 1, and that any eigenvalue of H_w that satisfies |λ_j| = 1 must equal 1 and have a 1 × 1 Jordan block. Furthermore, since G has t zero eigenvalues, the eigenvalues of H_w satisfy

λ_j = 1 for j = 1, ⋯, t, and |λ_j| < 1 for j = t + 1, ⋯, n. (16)

Consequently, the Jordan canonical form of H_w shows that the rate of convergence is determined by the size of |λ_{t+1}|, which is sometimes called the "convergence factor," e.g., [26]. This situation means that we need an upper bound on |λ_{t+1}|.
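These spectral claims can be probed on a random instance. The following sketch (ours, for illustration) sorts the eigenvalues of H_w by modulus and checks that exactly t = n − r of them equal 1, with the convergence factor |λ_{t+1}| strictly below 1:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 7, 4
M = rng.standard_normal((n, r))
G = M @ M.T                              # rank r, so t = n - r zero eigenvalues
D = np.diag(np.diag(G))
L = -np.tril(G, -1)                      # G = D - L - L^T
w = 1.3
B = D / w - L
H = np.eye(n) - np.linalg.inv(B) @ G     # SOR iteration matrix

lam = np.linalg.eigvals(H)
lam = lam[np.argsort(-np.abs(lam))]      # sort: |lam_1| >= ... >= |lam_n|
t = n - r
ones = np.sum(np.isclose(lam, 1.0, atol=1e-6))
conv_factor = np.abs(lam[t])             # |lam_{t+1}| in the paper's indexing
```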
The bound is gained in two stages. First, we show that

|λ_{t+1}| = ρ(F_w). (17)

Then, we establish the inequality

ρ(F_w)^2 ≤ ρ(F_w^T F_w),

and derive a bound on ρ(F_w^T F_w).

Computational and Mathematical Methods
The plan of the paper is as follows. We start by exploring the relations between the eigenvalues of H_w and F_w, showing that the two matrices share all the eigenvalues that differ from 1, which proves (17). Then, we study the relations between the iteration matrices of the symmetric SOR (SSOR) method and the symmetric Kaczmarz method and use these relations to simplify the expression for F_w^T F_w. The bound on the spectral radius of this matrix is derived in the third section. The bound, attained for a suitable choice of the relaxation parameter, has the form

ρ(F_w^T F_w) ≤ 1 − 1/(κc),

where κ denotes the condition number of G and

c = 2 + log_2 n.

The condition number is defined as the ratio κ = α/β, where α is the largest eigenvalue of G and β is the smallest nonzero eigenvalue of G. The bound shows that a small condition number forces a fast rate of convergence, while a large condition number allows slow convergence. However, as this is an upper bound, a large condition number does not force slow convergence. Hence, it is worthwhile to take a close look at the reasons behind slow convergence. This issue is discussed in Section 4. It is shown there that small nonzero eigenvalues of G are likely to cause a slow asymptotic rate of convergence. Finally, in the last section, we compare our approach with former attempts to derive such bounds.

Iteration Matrices and Their Eigenvalues
The assumption that G has positive diagonal entries allows us to make the following simplification. Consider the SOR iteration for solving the normalized system

G̃x̃ = b̃, where G̃ = D^{-1/2} G D^{-1/2}, b̃ = D^{-1/2} b, and x̃ = D^{1/2} x.

Then, the related iteration matrix is similar to H_w. Thus, when studying the rate of convergence of the SOR method, it is possible to replace (1) with its normalized form. That is, there is no loss of generality in assuming that D = I. Hence, from now on, we assume that G has the form

G = I − L − L^T,

where I denotes the identity matrix and L is a strictly lower triangular matrix. As before, r denotes the rank of G and r < n. Consequently, G can be factorized in the form

G = AA^T, (21)

where the matrix A ∈ ℝ^{n×r} has r orthogonal columns. Moreover, let a_i^T denote the ith row of A, i = 1, ⋯, n. Then, (21) implies

a_i^T a_i = g_ii = 1 for i = 1, ⋯, n.

That is, the rows of A have unit length. The SOR iteration splits G in the form

G = B_w − C_w, (24)

where

B_w = I/w − L

and

C_w = (1/w − 1)I + L^T.

Recall that w is a given relaxation parameter that satisfies 0 < w < 2. The kth SOR iteration, k = 1, 2, ⋯, starts with x_{k-1} and ends with x_k, which is computed by solving the linear system

B_w x_k = C_w x_{k-1} + b.

In other words, x_k is obtained from x_{k-1} by the rule

x_k = H_w x_{k-1} + B_w^{-1} b, (28)

where

H_w = B_w^{-1} C_w

is the related iteration matrix. Observe that (24) enables us to express H_w in the form

H_w = I − B_w^{-1} G = I − B_w^{-1} AA^T.

Multiplying (28) by A^T and using (11) gives

A^T x_k = A^T x_{k-1} − A^T B_w^{-1} A (A^T x_{k-1}) + A^T B_w^{-1} b,

while substituting y_k = A^T x_k shows that the iteration matrix of Kaczmarz's method has the form

F_w = I − A^T B_w^{-1} A. (34)

Note that H_w is an n × n matrix while F_w is an r × r matrix. However, as the next theorem shows, these matrices share several eigenvalues.

Theorem 2.
Let λ be a nonzero eigenvalue of the matrix B_w^{-1} AA^T; then λ is also an eigenvalue of the matrix A^T B_w^{-1} A. Conversely, let λ be a nonzero eigenvalue of A^T B_w^{-1} A; then λ is also an eigenvalue of B_w^{-1} AA^T.

Proof. Let u ∈ ℝ^n be a unit eigenvector of B_w^{-1} AA^T that corresponds to a nonzero eigenvalue λ. Then, the equality

B_w^{-1} AA^T u = λu

implies A^T u ≠ 0, and multiplying this equality by A^T gives

(A^T B_w^{-1} A)(A^T u) = λ(A^T u),

which means that λ is an eigenvalue of A^T B_w^{-1} A. The converse direction is proved in a similar way. Let v ∈ ℝ^r be a unit eigenvector of A^T B_w^{-1} A that corresponds to a nonzero eigenvalue λ. Then, the equality

A^T B_w^{-1} A v = λv

implies Av ≠ 0, and

(B_w^{-1} AA^T)(B_w^{-1} A v) = λ(B_w^{-1} A v),

which means that λ is an eigenvalue of B_w^{-1} AA^T.
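The proof rests on the familiar fact that XY and YX share their nonzero eigenvalues, here with X = B_w^{-1} A and Y = A^T. A direct numerical check on a random instance (ours, for illustration) reads:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 3
A = rng.standard_normal((n, r))
# B_w = I/w - L with w = 1.5 and a random strictly lower triangular L
Bw = np.eye(n) / 1.5 - np.tril(rng.standard_normal((n, n)), -1)

big = np.linalg.inv(Bw) @ A @ A.T        # n x n matrix B_w^{-1} A A^T
small = A.T @ np.linalg.inv(Bw) @ A      # r x r matrix A^T B_w^{-1} A

eb = np.linalg.eigvals(big)
es = np.sort_complex(np.linalg.eigvals(small))
# big has n - r extra zero eigenvalues; its nonzero spectrum matches small's
nz_big = np.sort_complex(eb[np.abs(eb) > 1e-8])
match = (nz_big.shape[0] == r) and np.allclose(nz_big, es, atol=1e-6)
```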
We have seen that the eigenvalues of H_w satisfy (16). The next theorem shows that |λ_{t+1}| equals the spectral radius of F_w.

Theorem 3. The eigenvalues of H_w satisfy (16) and (17) with t = n − r.
In particular, the matrix A^T B_w^{-1} A is nonsingular. Now the last theorem implies that the largest eigenvalue of F_w equals the largest eigenvalue of H_w whose modulus differs from 1.
The bounds derived in the next section use the close relations between the Kaczmarz-SOR method and its symmetric version. The symmetric iteration consists of two parts. The first one is the usual ("forward") iteration, while the second is a "backward" iteration in which the rows of the linear system are approached in reverse order. See, for example, [1, 13, 14, 26, 33, 34, 37, 41]. The iteration matrix of the backward SOR method has the form

Ĥ_w = I − B_w^{-T} G,

where B_w^{-T} denotes the matrix (B_w^{-1})^T = (B_w^T)^{-1}. This implies that the iteration matrix of the backward Kaczmarz method is

F̂_w = I − A^T B_w^{-T} A = F_w^T.

Consequently, the iteration matrix of the symmetric SOR method (SSOR in brief) has the form

H̃_w = Ĥ_w H_w,

while that of the symmetric Kaczmarz method is

F̃_w = F̂_w F_w = F_w^T F_w.

The next assertion expresses these matrices in a useful form.

Theorem 4. The iteration matrix of the SSOR method has the form

H̃_w = I − ((2 − w)/w) B_w^{-T} B_w^{-1} G,

while that of the symmetric Kaczmarz method is

F̃_w = F_w^T F_w = I − ((2 − w)/w) A^T B_w^{-T} B_w^{-1} A. (45)

Proof. The second equality is a direct consequence of the first one, which is derived from the following identities. Since B_w + B_w^T − G = (2/w − 1)I, we have

H̃_w = (I − B_w^{-T} G)(I − B_w^{-1} G)
    = I − B_w^{-T} [G + (B_w^T − G) B_w^{-1} G]
    = I − B_w^{-T} [G + ((2/w − 1)I − B_w) B_w^{-1} G]
    = I − (2/w − 1) B_w^{-T} B_w^{-1} G
    = I − ((2 − w)/w) B_w^{-T} B_w^{-1} G.
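Both identities of Theorem 4 can be confirmed numerically. The sketch below (our own random normalized instance) builds the forward and backward iteration matrices and compares their products against the closed forms:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 6, 4
M = rng.standard_normal((n, r))
G = M @ M.T
d = np.sqrt(np.diag(G))
G = G / np.outer(d, d)                   # normalized: unit diagonal, G = I - L - L^T
L = -np.tril(G, -1)
w = 0.8
B = np.eye(n) / w - L                    # B_w
Binv = np.linalg.inv(B)

H = np.eye(n) - Binv @ G                 # forward SOR matrix
Hback = np.eye(n) - Binv.T @ G           # backward SOR matrix
ssor = Hback @ H                         # SSOR iteration matrix
ssor_formula = np.eye(n) - ((2 - w) / w) * Binv.T @ Binv @ G

vals, V = np.linalg.eigh(G)
pos = vals > 1e-10
A = V[:, pos] * np.sqrt(vals[pos])       # G = A A^T
F = np.eye(A.shape[1]) - A.T @ Binv @ A
sym_k = F.T @ F                          # symmetric Kaczmarz matrix
sym_k_formula = np.eye(A.shape[1]) - ((2 - w) / w) * A.T @ Binv.T @ Binv @ A

err1 = np.max(np.abs(ssor - ssor_formula))
err2 = np.max(np.abs(sym_k - sym_k_formula))
```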
The importance of the last theorem is that it gives better insight into the eigenvalues of these matrices. In particular, by following the proof of Theorem 2, we obtain the following conclusions.
The next section uses these results to derive upper bounds on ρ(F_w) and ρ(F_w^T F_w).

Upper Bounds on the Spectral Radius
Let T ∈ ℝ^{n×n} be an arbitrary square matrix, and let ‖T‖_2 denote the spectral norm of T. Then, it is well known that ‖T‖_2 = σ_1(T), where σ_1(T) denotes the largest singular value of T. It is also well known that the spectral radius of T cannot exceed its spectral norm. That is,

ρ(T) ≤ ‖T‖_2 and ‖T‖_2^2 = ρ(T^T T).

Combining these relations with Corollary 6 yields the following useful observation.

Theorem 7. The Kaczmarz iteration matrix, F_w, and the symmetric Kaczmarz iteration matrix, F_w^T F_w, satisfy the relations

ρ(F_w)^2 ≤ ‖F_w‖_2^2 = ρ(F_w^T F_w). (50)

The rest of this section is aimed at deriving an "effective" upper bound on the right-hand side of (50). In particular, we are looking for a bound that shows how the condition number of G affects the rate of convergence. The first step is to establish a lower bound on the value of θ, where, by Theorem 4,

ρ(F_w^T F_w) = 1 − θ and θ = ((2 − w)/w) λ_min(A^T B_w^{-T} B_w^{-1} A). (47)
Let β > 0 denote the smallest nonzero eigenvalue of G. Then, β is also the smallest eigenvalue of the matrix A^T A. Note that the smallest eigenvalue of the matrix B_w^{-T} B_w^{-1} is 1/‖B_w‖_2^2. Now, from (47), we see that

θ ≥ ((2 − w)/w) β/‖B_w‖_2^2,

where

‖B_w‖_2 ≤ 1/w + ‖L‖_2, (51)

and the last inequality follows from the triangle inequality for the matrix B_w = I/w − L. Observe that ‖L‖_2 is not expected to be much larger than ‖G‖_2. Indeed, using induction on n, one can verify that

‖L‖_2 ≤ (1/2) log_2(2n) ‖G‖_2. (52)

For a detailed proof of this assertion, see Oswald [30]. Let α = ‖G‖_2 denote the largest eigenvalue of G. Then, (52) can be rewritten as

‖L‖_2 ≤ γα, (53)

where

γ = (1/2) log_2(2n). (54)

Combining (51) with (53) gives

‖B_w‖_2 ≤ 1/w + γα,

and from (50), we obtain that

ρ(F_w)^2 ≤ ρ(F_w^T F_w) ≤ φ(w) = 1 − w(2 − w)β/(1 + wγα)^2

for any w from the interval (0, 2). A further improvement is gained by noting that the bound function φ(w) has a unique minimizer in this interval. Computing the derivative of φ(w) and extracting w′ from the equality φ′(w′) = 0 shows that the minimizer lies at the point

w′ = 1/(1 + γα).

It is also easy to verify that

φ(w′) = 1 − β/(1 + 2γα),

while the assumption that A has unit rows implies α ≥ 1 and

1 + 2γα ≤ (1 + 2γ)α.

In other words, the spectral radii of F_{w′} and F_{w′}^T F_{w′} satisfy the inequalities

ρ(F_{w′})^2 ≤ ρ(F_{w′}^T F_{w′}) ≤ 1 − β/(cα) = 1 − 1/(cκ), (61)

where

c = 1 + 2γ = 2 + log_2 n, (62)

and κ = α/β denotes the condition number of G. The bound on ρ(F_{w′}) can be simplified by using the inequality

(1 − x)^{1/2} ≤ 1 − x/2 for 0 ≤ x ≤ 1,

which shows that

ρ(F_{w′}) ≤ 1 − 1/(2cκ). (64)

Let w″ ∈ (0, 2) be a value of w for which ρ(F_w^T F_w) attains its smallest value. That is, w″ is an optimal relaxation parameter for the symmetric Kaczmarz method. Then, clearly,

ρ(F_{w″}^T F_{w″}) ≤ ρ(F_{w′}^T F_{w′}),

and the inequalities (61) and (64) remain valid when w″ replaces w′. Similarly, let w_opt denote the optimal relaxation parameter for Kaczmarz's method. Then, the inequality

ρ(F_{w_opt}) ≤ ρ(F_{w′})

implies

ρ(F_{w_opt}) ≤ 1 − 1/(2cκ).

Since α is often considerably larger than 1, the point w′ = 1/(1 + γα) is often much smaller than 1.
On the other hand, in many cases, w_opt is larger than 1, and the function ψ(w) = ρ(F_w) is decreasing in the interval [w′, w_opt], which implies that the bound is likely to hold for all w ∈ [w′, w_opt], including w = 1.
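The whole chain of inequalities can be tested on a random normalized instance. The sketch below (ours, for illustration) evaluates the iteration matrices at w′ and checks (50), (61), and (64):

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 8, 5
M = rng.standard_normal((n, r))
G = M @ M.T
d = np.sqrt(np.diag(G))
G = G / np.outer(d, d)                   # unit diagonal, G = I - L - L^T

vals = np.linalg.eigvalsh(G)
pos = vals[vals > 1e-10]
alpha, beta = pos.max(), pos.min()       # largest / smallest nonzero eigenvalue
kappa = alpha / beta                     # condition number of G
gamma = 0.5 * np.log2(2 * n)             # (54)
c = 1 + 2 * gamma                        # = 2 + log2(n), (62)
wp = 1.0 / (1.0 + gamma * alpha)         # w' minimizing the bound function

L = -np.tril(G, -1)
B = np.eye(n) / wp - L
Binv = np.linalg.inv(B)
ew, V = np.linalg.eigh(G)
keep = ew > 1e-10
A = V[:, keep] * np.sqrt(ew[keep])       # G = A A^T
F = np.eye(A.shape[1]) - A.T @ Binv @ A

rho_F = np.max(np.abs(np.linalg.eigvals(F)))
rho_sym = np.max(np.abs(np.linalg.eigvals(F.T @ F)))
bound = 1.0 - 1.0 / (c * kappa)          # right-hand side of (61)
```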
The main consequence of these bounds is that a small condition number forces a fast rate of convergence, while a large condition number allows slow convergence. Yet the bounds are not tight, in the sense that the actual rate of convergence is often considerably faster than the implied rate. This behavior has a number of reasons. First, in many symmetric positive semidefinite matrices, the ratio ‖L‖_2/‖G‖_2 is considerably smaller than γ. Second, as noted above, w′ is expected to be considerably smaller than 1, so the rate of convergence for w_opt (or w = 1) is expected to be much faster. Third, let P ∈ ℝ^{n×n} be an arbitrary permutation matrix and consider the SOR method for solving the permuted linear system

(PGP^T)(Px) = Pb.

Then, since the iteration matrix of PGP^T is not necessarily similar to that of G, we might get a different rate of convergence, e.g., [27, 32, 37]. On the other hand, since PGP^T has the same eigenvalues as G, both matrices share the same upper bound. This shows that the bound holds for the worst possible ordering.
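The ordering effect is easy to observe numerically. In the sketch below (our own illustrative instance; the helper `conv_factor` is ours), G and PGP^T share their eigenvalues, and hence the bound, although their SOR convergence factors may differ:

```python
import numpy as np

def conv_factor(G, w):
    """Convergence factor |lam_{t+1}| of the SOR matrix for unit-diagonal G."""
    n = G.shape[0]
    t = n - np.linalg.matrix_rank(G, tol=1e-10)
    L = -np.tril(G, -1)
    B = np.eye(n) / w - L
    H = np.eye(n) - np.linalg.inv(B) @ G
    lam = np.sort(np.abs(np.linalg.eigvals(H)))[::-1]
    return lam[t]

rng = np.random.default_rng(7)
n, r = 7, 4
M = rng.standard_normal((n, r))
G = M @ M.T
d = np.sqrt(np.diag(G))
G = G / np.outer(d, d)                   # unit diagonal

P = np.eye(n)[rng.permutation(n)]        # random permutation matrix
Gp = P @ G @ P.T                         # same eigenvalues, different ordering

f1 = conv_factor(G, w=1.0)
f2 = conv_factor(Gp, w=1.0)
same_spectrum = np.allclose(np.sort(np.linalg.eigvalsh(G)),
                            np.sort(np.linalg.eigvalsh(Gp)))
```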
Finally, we note that the above treatment of the positive semidefinite case is easily adapted to the positive definite case. In the latter case, A is an n × n invertible matrix, and F_w is similar to H_w, so the bounds on ρ(F_w) apply to ρ(H_w).

Slow Rate of Convergence
The bounds derived in the former section indicate that a slow rate of convergence is possible only when G has a large condition number. On the other hand, the assumption that G has unit diagonal implies that the largest eigenvalue of G satisfies 1 ≤ α ≤ n. Consequently, a large condition number occurs whenever G has small positive eigenvalues. These observations raise the question of whether small positive eigenvalues are the reason behind slow convergence. Indeed, as explained below, a small positive eigenvalue may cause a slow rate of convergence. The first two lemmas provide the tools for proving this claim.
Proof. The definition of B_w implies that B_w = (1/w)I − L and

Now the equality

Lemma 9. Let β denote a nonzero eigenvalue of G, and let v denote the corresponding unit eigenvector. That is, Gv = βv, v^T v = 1, and β > 0. Then

Proof. Using (71) and the fact that the matrix H_w^T G H_w is positive semidefinite, we obtain the inequalities

and

Theorem 10.
Let v be an eigenvector of G as above, and let u and η be defined by the equalities

Then

where η satisfies

and

Proof. A further use of the equality

while from (74) we see that

proves (80).
One consequence of (79) is that small β implies small η, while (80) shows that for small η the error component in the direction of v decays slowly. That is, small β leads to slow error decay. Another consequence is that small τ (which means w close to 2) may compensate for the slowing effect of small β.
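The slowing effect of a small nonzero eigenvalue can be illustrated directly. The construction below (ours; the helper `conv_factor` is not from the paper) plants one zero eigenvalue and one tiny positive eigenvalue β in G and watches the convergence factor approach 1 as β shrinks:

```python
import numpy as np

def conv_factor(G, w):
    """Convergence factor |lam_{t+1}| of the SOR matrix for a PSD G."""
    n = G.shape[0]
    t = n - np.linalg.matrix_rank(G, tol=1e-12)
    D = np.diag(np.diag(G))
    L = -np.tril(G, -1)
    B = D / w - L
    H = np.eye(n) - np.linalg.inv(B) @ G
    return np.sort(np.abs(np.linalg.eigvals(H)))[::-1][t]

rng = np.random.default_rng(8)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis

factors = []
for beta in (1e-1, 1e-3, 1e-5):
    eigs = np.array([0.0, beta, 1.0, 1.5, 2.0, 2.5])  # one zero, one tiny
    G = (Q * eigs) @ Q.T                              # PSD with these eigenvalues
    factors.append(conv_factor(G, w=1.0))
```

As β decreases, the computed convergence factors climb toward 1, matching the discussion above.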

Concluding Remarks
The SOR method and Kaczmarz's method have been intensively studied for many years. Thus, naturally, some of the mentioned results can be found elsewhere in different forms. In particular, the Kaczmarz iteration matrix (34) and the symmetric Kaczmarz iteration matrix (45) both come out easily as special cases of a more general iteration; see [13] (Propositions 4 and 10). Also, the relation between the spectral radii of the Kaczmarz and symmetric Kaczmarz iteration matrices (the first part of Theorem 7) was already observed in [13] (Section 4).
Estimates of the rate of convergence in the semidefinite case are derived in a series of papers by Lee et al. [23, 24, 40]. However, these estimates have a different flavor, as they do not use the eigenvalues of G or its condition number.
More recently, Oswald and Zhou [31] have used the concept of stable Hilbert splittings to develop a unified approach for studying the convergence of multiplicative Schwarz methods. This approach was used in [31] to derive upper bounds on the rate of convergence of Kaczmarz's method, and later, in [32], it was modified to bound the SOR convergence in the semidefinite case.
The current treatment of the semidefinite case is quite different. It is based on direct arguments from linear algebra, such as the Jordan canonical form of H w and the relations between the eigenvalues of H w and F w . This simplifies the proof and adds important insight into the semidefinite case.
The upper bounds on the convergence factor explain why small condition number ensures fast convergence. Another related question is whether and why large condition number leads to slow convergence. The analysis in Section 4 provides a convincing explanation. Yet, as this is the first attempt to resolve this enigma, there may be further ways to answer this question.
The relation between the condition number and the rate of convergence stands behind the "Kaczmarz anomaly" phenomenon [10, 11]. The Kaczmarz-SOR method is often considered a prototype of more sophisticated methods from the families of row-action methods [5, 6, 9], projection methods [6, 33, 36], column-action methods [8, 12, 38], and coordinate-descent algorithms [39]. This suggests that other members of these families may share similar asymptotic behavior. Examples that illustrate this connection are described in [9].

Data Availability
No data were used to support this study.

Conflicts of Interest
The author declares that there are no conflicts of interest.