Approximating the Inverse of a Square Matrix with Application in Computation of the Moore-Penrose Inverse

This paper presents a computational iterative method to find approximate inverses for the inverse of matrices. Analysis of convergence reveals that the method reaches ninth-order convergence. The extension of the proposed iterative method for computingMoore-Penrose inverse is furnished. Numerical results including the comparisons with different existingmethods of the same type in the literature will also be presented to manifest the superiority of the new algorithm in finding approximate inverses.


Introduction
The main purpose of this paper is to present an efficient method in terms of speed of convergence, while its convergence can be easily achieved and also be economic for large sparse matrices possessing sparse inverses.We also discuss the extension of the new scheme for finding the Moore-Penrose inverse of singular and rectangular matrices.
Finding a matrix inverse is important in some practical applications such as finding a rational approximation for the Fermi-Dirac functions in the density functional theory [1], because it conveys significant features of the problems dealing with.
We further mention that in some certain circumstances, the computation of a matrix inverse is necessary.For example, there are many ways to encrypt a message, whereas the use of coding has become particularly significant in recent years.One way to encrypt or code a message is to use matrices and their inverses.Indeed, consider a fixed invertible matrix .Convert the message into a matrix  such that  is possible to perform.Send the message generated by .At the other end, they will need to know  −1 in order to decrypt or decode the message sent.
The direct methods such as Gaussian elimination with partial pivoting or LU decomposition require a reasonable time to compute the inverse when the size of the matrices is high.To illustrate further, the Gaussian elimination with partial pivoting method cannot be highly parallelized and this restricts its applicability in some cases.In contrary, Schulz-type methods, which could be applied for large sparse matrices (possessing sparse inverses [2]) by preserving the sparsity feature and can be parallelized, are in focus in such cases.
Some known methods were proposed for the approximation of a matrix inverse, such as Schulz scheme.The oldest scheme, that is, the Schulz method (see [3]), is defined as wherein  is the identity matrix.Note that [4] mentions that Newton-Schulz iterations can also be combined with wavelets or hierarchical matrices to compute the diagonal elements of  −1 independently.The proposed iteration relies on matrix multiplications, which destroy sparsity, and therefore the Schulz-type methods are less efficient for sparse inputs possessing dense inverses.However, a numerical dropping is usually applied to  +1 to keep the approximated inverse sparse.Such a strategy is useful for preconditioning.

Journal of Applied Mathematics
To provide more iterations from this classification of methods, we remember that W. Li and Z. Li in [5] proposed In 2011, Li et al. in [6] represented (2) in the following form: and also proposed another iterative method for finding  −1 as comes next: Much of the application of these solvers (specially the Schulz method) for structured matrices was investigated in [7] while the authors in [8] showed that the matrix iteration of Schulz is numerically stable.Further discussion about such iterative schemes might be found at [9][10][11][12].
The Schulz-type matrix iterations are dependent on the initial matrix  0 too much.In general, we construct the initial guess for square matrices as follows.For a general matrix  × satisfying no structure, we choose as in [13] or  0 = , where  is the identity matrix, and  ∈ R should adaptively be determined such that ‖ − ‖ < 1.
For rectangular or singular matrices, one may choose  0 =  * /(‖‖ 1 ‖‖ ∞ ) or  0 =   /(‖‖ 1 ‖‖ ∞ ), based on [16].Also, we could choose  0 :=   , where 0 <  < 2/(  ) or with 0 <  < 2/( * ) [17].Note that these choices could also be considered for square matrices.However, the most efficient way for producing  0 (for square nonsingular matrices) is the hybrid approach presented in Algorithm 1 of [18].The rest of this paper is organized as follows.The main contributions of this article are given in Sections 2 and 3.In Section 2, we analyze a new scheme for matrix inversion while a discussion about the applicability of the scheme for Moore-Penrose inverse will be given in Section 3. Section 4 will discuss the performance and the efficiency of the new proposed method numerically in comparison with the other schemes.Finally, concluding remarks are presented in Section 5.

Main Result
The derivation of different Schulz-type methods for matrix inversion relies on iterative (one-or multipoint) methods for the solution of nonlinear equations [19,20].For instance, imposing Newton's iteration on the matrix equation  =  would result in (1), as fully discussed in [21].Here, we apply the following new nonlinear solver on the matrix equation () =  −1 −  = 0: Note that ( 7) is novel and constructed in such a way to produce a new matrix iterative method for finding generalized inverses efficiently.Now, the following iteration method for matrix inversion could be obtained: The obtained scheme includes matrix power, which is certainly of too much cost.To remedy this, we rewrite the obtained iteration as efficiently as possible to also reduce the number of matrix-matrix multiplications in what follows: whereas  is the identity matrix and the sequence of matrix iterates {  } =∞ =0 converges to  −1 for a good initial guess.It should be remarked that the convergence of any order for nonsingular square matrices is generated in Section 6 of Chapter 2 of [22], whereas the general way for the rectangular matrices was discussed in Chapter 5 of [23] and the recent paper of [24].In those constructions, always a convergence order  will be attained by  times of matrix-matrix products, such as (1) which reaches the order 2 using two matrix-matrix multiplications.
Two important matters must be mentioned at this moment to ease up the perception of why a higher order (efficient) method such as (9) with 7 matrix-matrix products to reach at least the convergence order 9 is practical.First, by following the comparative index of informational efficiency for inverse finders [25], defined by  = /, wherein  and  stand for the convergence order and the number of matrixmatrix products, then the informational index for (9), that is, 9/7 ≈ 1.28, beats its other competitors, 2/2 = 1 of (1), 3/3 = 1 of ( 2)-(3), and 3/4 = 0.75 of (4).And second, the significance of the new scheme will be displayed in its implementation.To illustrate further, such iterations depend totally on the initial matrices.Though there are certain and efficient ways for finding  0 , in general such initial approximations take a high number of iterations (see e.g., Figure 3, the blue color) to arrive at the convergence phase.On the other hand, each cycle of the implementation of such Schulz-type methods includes one stopping criterion based on the use of a matrix norm, and this would impose further burden and load in general, for the low order methods in contrast to the high order methods, such as (9).Because the computation of a matrix norm (usually ‖ ⋅ ‖ 2 for dense complex matrices and ‖ ⋅ ‖  for large sparse matrices) takes time and therefore higher number of steps/iterations (which is the result of lower order methods), it will be costlier than the lower number of steps/iterations of high order methods.Hence, the higher order (efficient) iterations are mostly better solvers in terms of computational time to achieve the desired accuracy.
After this complete discussion of the need and efficacy of the new scheme (9), we are about to prove the convergence behavior of (9) theoretically in what follows.
then the iterative method (9) converges with at least ninth order to  −1 .
Proof.Let (10) hold, and for the sake of simplicity assume that  0 =  −  0 and   =  −   .It is straightforward to have Hence, by taking an arbitrary norm from both sides of (11), we obtain The rest of the proof is similar to Theorem 2.1 of [26], and it is hence omitted.Finally,  −   → 0, when  → ∞, and thus   →  −1 , as  → ∞.So the sequence {‖  ‖ 2 } is strictly monotonic decreasing.Moreover, if we denote   =   −  −1 , as the error matrix in the iterative procedure (9), then we could easily obtain the following error inequality: The error inequality (13) reveals that the iteration (9) converges with at least ninth order to  −1 .The proof is now complete.
Some features of the new scheme ( 9) are as follows.
(1) The new method can be taken into consideration for finding approximate inverse (preconditioners) of complex matrices.Using a dropping strategy, we could keep the sparsity of a matrix to be used as a preconditioner.Such an action will be used throughout this paper for sparse matrices using the command Chop[exp, tolerance] in Mathematica, to preserve the sparsity of the approximate inverses   , for any .
(2) Unlike the traditional direct solvers such as GEPP, the new scheme has the ability to be parallelized.
(3) In order to further reduce the computational burden of the matrix-by-matrix multiplications per computing step of the new algorithm when dealing with sparse matrices, in the new scheme by using Sparse Array[] command in Mathematica, we will preserve the sparsity features of the output approximate inverse in a reasonable computational time.Additionally, for some special matrices, such as Toeplitz or Vandermonde types of matrices, further computational savings are possible as discussed by Pan in [27].See for more information [28].
(4) If the initial guess commutes with the original matrix, then all matrices in the sequence {  } =∞ =0 satisfy the same property of commutation.Notice that only some initial matrices guarantee that in all cases the sequence commutes with the matrix .
In the next section, we present some analytical discussion about the fact that the new method (9) could also be used for computing the Moore-Penrose generalized inverse.

An Iterative Method for Moore-Penrose Inverse
It is well known in the literature that the Moore-Penrose inverse of a complex matrix  ∈ C × (also called pseudoinverse), denoted by  † ∈ C × , is a matrix  ∈ C × satisfying the following conditions: wherein  * is the conjugate transpose of .This inverse uniquely exists.Various numerical solution methods have been developed for approximating Moore-Penrose inverse (see e.g., [29]).The most comprehensive review of such iterations could be found in [17], while the authors mentioned that iterative Schulz-type methods can be taken into account for finding pseudoinverse.For example, it is known that (1) converges to the pseudoinverse in the general case if  0 :=  * , where 0 <  < 2/( * ) and (⋅) denotes the spectral radius.
Accordingly, it is known that the new scheme (9) converges to the Moore-Penrose inverse.In order validate this fact analytically, we provide some important theoretics in what follows.
Lemma 2. For the sequence {  } =∞ =0 generated by the iterative Schulz-type method (9) and the initial matrix (6), for any  ≥ 0, it holds that Proof.The proof of this lemma is based on mathematical induction.Such a procedure is similar to Lemma 3.1 of [30], and it is hence omitted.
Theorem 3.For the rectangular complex matrix  ∈ C × and the sequence {  } =∞ =0 generated by (9), for any  ≥ 0, using the initial approximation (6), the sequence converges to the pseudoinverse  † with ninth order of convergence.
Proof.We consider E  =   −  † , as the error matrix for finding the Moore-Penrose inverse.Then, the proof is similar to the proof of Theorem 4 in [18] and it is hence omitted.We finally have Thus, ‖  −  † ‖ → 0; that is, the obtained sequence of ( 9) converges to the Moore-Penrose inverse as  → +∞.This ends the proof.
Theorem 4. Considering the same assumptions as in Theorem 3, then the iterative method (9) has asymptotical stability for finding the Moore-Penrose generalized inverse.
Proof.The steps of proving the asymptotical stability of ( 9) are similar to those which have recently been taken for a general family of methods in [31].Hence, the proof is omitted.
Remark 5.It should be remarked that the generalization of our proposed scheme for generalized outer inverses, that is, , , is straight forward according to the recent work [32].
Remark 6.The new iteration ( 9) is free from matrix power in its implementation and this allows one to apply it for finding generalized inverses easily.

Computational Experiments
Using the computer algebra system Mathematica 8 [33], we now compare our iterative algorithm with other Schulz-type methods.For numerical comparisons in this section, we have used the methods (1), ( 3), (4), and (9).We implement the above algorithms on the following examples using the built-in precision in Mathematica 8.
Example 7. Let us consider the large complex sparse matrix of the size 30000 × 30000 with 79512 nonzero elements possessing a sparse inverse as in Algorithm 1 ( = √ −1), where the matrix plot of  showing its structure has been drawn in Figure 1(a).Methods like (9) are powerful in finding an approximate inverse or in finding robust approximate inverse preconditioners in low number of steps, in which the output form of the approximate inverse is also sparse.
In this test problem, the stopping criterion is ‖ −   ‖ 1 ≤ 10 −7 .Table 1 shows the number of iterations for different methods.As the programs were running, we found the running time using the command Absolute Timing[] to report the elapsed CPU time (in seconds) for this experiment.Note that the initial guess has been constructed using  0 = diag(1/ 11 , 1/ 22 , . . ., 1/  ) in Example 7. The plot of the approximate inverse obtained by applying (9) has been portrayed in Figure 1(b).The reason which made (4) the worse method is in the fact of computing matrix power which is time consuming for sparse matrices.In the tables of this paper "No. of nonzero ele." stands for the number of nonzero elements.
In Algorithm 2, we provide the written Mathematica 8 code of the new scheme (9) in this example to clearly reveals the simplicity of the new scheme in finding approximate inverses using a threshold Chop[exp, 10 Table 2 shows the number of iterations for different methods in order to reveal the efficiency of the proposed iteration with a special attention to the elapsed time.The matrix plot of  in this case showing its structure alongside its approximate inverse has been drawn in Figure 2, respectively.Note that in Example 8, we have used an initial value constructed by  0 =  * /(‖‖ 1 ‖‖ ∞ ).The stopping criterion is same as the previous example.
The new method ( 9) is totally better in terms of the number of iterations and the computational time.In this paper, the computer specifications are Microsoft Windows XP Intel(R), Pentium(R) 4, CPU 3.20 GHz, and 4 GB of RAM.
Matrices used up to now are large and sparse enough to clarify the applicability of the new scheme.Anyhow, for some classes of problems, there are collections of matrices that are used when proving new iterative linear system solvers or new preconditioners, such as Matrix Market, http://math.nist.gov/MatrixMarket/.In the following test we compare the computational time of solving a practical problem using the above source.
Example 9. Consider the linear system of  = , in which  is defined by A = Example Data ["Matrix", "Bai/pde900"] and the right hand side vector is b = Table [1., {i, 1, 900}].In Table 3, we compare the consuming time (in seconds) of solving this system by GMRES solver and its left preconditioned system  1  =  1 .In Table 3, for example, PGMRES- (9) shows that the linear system  1  =  1 , in which  1 has been calculated by (9), is solved by GMRES solver.The tolerance is the residual norm to be less than 10 −14 in this test.The results support fully the efficiency of the new scheme (9) in constructing robust preconditioners.
Example 10.This experiment evaluates the applicability of the new method for finding Moore-Penrose inverse of 20 random sparse matrices (possessing sparse pseudo-inverses)  ×  = 1200 × 1500 as shown in Algorithm 4.
Note that the identity matrix should then be an  ×  matrix and the approximate pseudoinverses would be of the size  × .In this example, the initial approximate Moore-Penrose inverse is constructed by )) for each random test matrix.And the stopping criterion is ||V n+1 − V n || 1 ≤ 10 −8 .We only compared methods (1) denoted by "Schulz", (3) denoted by "KMS, " and the new iterative scheme (9) denoted by "PM." The results for the number of iterations and the running time are compared in Figures 3-4.They show a clear advantage of the new scheme in finding the Moore-Penrose inverse.

Conclusions
In this paper, we have developed an iterative method in inverse finding of matrices, which has the capability to be applied on real or complex matrices with preserving the sparsity feature in matrix-by-matrix multiplications using a threshold.This allows us to interestingly reduce the computational time and usage memory in applying the new iterative method.We have shown that the suggested method (9) reaches ninth order of convergence.The extension of the scheme for pseudoinverse has also been discussed in the paper.The applicability of the new scheme was illustrated numerically in Section 4.
Finally, according to the numerical results obtained and the concrete application of such solvers, we can conclude that the new method is efficient.
The new scheme (9) has been tested for matrix inversion.However, we believe that such iterations could also be applied for finding the inverse of an integer in modulo   [34].Following this idea will, for sure, open a new topic of research which is a combination of Numerical Analysis and Number Theory.We will focus on this trend of research for future studies.

Figure 1 :
Figure 1: The matrix plot (a) and the inverse (b) of the large complex sparse 30000 × 30000 matrix  in Example 7.

Figure 2 :
Figure 2: The matrix plot (a) and the inverse (b) of the large sparse 10000 × 10000 matrix  in Example 8.

Example 8 .
Let us consider a large sparse 10000 × 10000 matrix (with 18601 nonzero elements) as shown in Algorithm 3.

Figure 3 :
Figure 3: The results of comparisons for Example 10 in terms of the number of iterations.

Figure 4 :
Figure 4: The results of comparisons for Example 10 in terms of the elapsed time.

Table 1 :
Results of comparisons for Example 7.

Table 2 :
Results of comparisons for Example 8.

Table 3 :
Results of elapsed computational time for solving Example 9.