Hierarchical Matrices Method and Its Application in Electromagnetic Integral Equations

Hierarchical (H-) matrices method is a general mathematical framework providing a highly compact representation and efficient numerical arithmetic. When applied in integral-equation(IE-) based computational electromagnetics, H-matrices can be regarded as a fast algorithm; therefore, both the CPU time and memory requirement are reduced significantly. Its kernel independent feature also makes it suitable for any kind of integral equation. To solve H-matrices system, Krylov iteration methods can be employed with appropriate preconditioners, and direct solvers based on the hierarchical structure of H-matrices are also available along with high efficiency and accuracy, which is a unique advantage compared to other fast algorithms. In this paper, a novel sparse approximate inverse (SAI) preconditioner in multilevel fashion is proposed to accelerate the convergence rate of Krylov iterations for solving H-matrices system in electromagnetic applications, and a group of parallel fast direct solvers are developed for dealing with multiple right-hand-side cases. Finally, numerical experiments are given to demonstrate the advantages of the proposed multilevel preconditioner compared to conventional “single level” preconditioners and the practicability of the fast direct solvers for arbitrary complex structures.


Introduction
Integral equation (IE) method [1] is widely used in electromagnetic analysis and simulation.Compared to finite difference time domain (FDTD) method [2] and Finite element (FE) method [3], IE method is generally more accurate and avoids annoying numerical dispersion problems as well as complex boundary conditions.Despite the many advantages, IE method is often involved in enormous computational consumption, since its numerical discretization leads to dense linear system.With the efforts for years, various fast algorithms had developed focusing on reducing the computational complexity for IE method such as adaptive integral method (AIM) [4], fast multipole method (FMM) [5], IE-Fast Fourier transform (IE-FFT) [6], and fast low-rank compression methods [7], Though these fast algorithms are based on different theories, they use the same idea to reduce CPU time and memory usage complexity, which is to compute and store the major entries of the dense system matrix indirectly, and employ iterative methods instead of direct methods (such as LU decomposition) to solve the system.Preconditioners are often used to accelerate the convergence of iterative methods.However, iterative methods cannot always guarantee a reasonable solution with high precision; therefore, in some complex cases, we expect to employ powerful preconditioners to obtain visible acceleration of convergence or even use direct methods to avoid this problem entirely.But to implement these ideas in traditional fast algorithms encounters some difficulties that are, once a fast algorithm is applied, the major entries of the system matrix are computed and stored indirectly, and only a few entries can be accessed to form the sparse pattern which is essential to construct the preconditioners.This condition confines the flexibility of the preconditioning; therefore, a powerful preconditioner is generally hard to get.For direct methods, accessibleness of the whole matrix entries is also prerequisite, and employing them in fast algorithms is difficult as well as flexible preconditioning.
Hierarchical (H -) matrices method [8,9] is a general mathematical framework providing a highly compact representation and efficient numerical arithmetic.Applying Hmatrices in IE method can reduce both CPU time and International Journal of Antennas and Propagation memory usage complexity significantly, so it can be regarded as a fast algorithm to IE method.Unlike the traditional fast algorithms mentioned above, any entry of H -structured system matrix can be recovered easily, though the major entries are still implicitly expressed.This quality enables us to construct more efficient preconditioners, and moreover, a fast direct solver is available since its unique format.In this paper, a novel sparse approximate inverse (SAI) preconditioner in multilevel fashion is proposed to accelerate the convergence rate of Krylov iterations for solving Hmatrices system in electromagnetic applications, which is mentioned in [10], and a group of parallel fast direct solvers are developed for dealing with multiple right-hand-side cases.
This paper is organized as follows.Section 2 gives a brief review of the IE method and basic conception of H -matrices.In Section 3, we elaborate the construction of the proposed multilevel SAI preconditioner and the implementation of parallel fast direct solvers.Numerical experiments are given in Section 4 to demonstrate the advantages of the proposed multilevel preconditioner compared to conventional "single level" preconditioners and the practicability of the fast direct solvers for arbitrary complex structures.Finally, some conclusions are given in Section 5.

H-Matrices Representation for IE Method
We first proceed with a description of the IE method for solving electromagnetic scattering problems from 3D perfectly electric conductor (PEC).For concise introduction, only the electric field integral equation (EFIE) [1] is considered.The EFIE is written as Discretize integral equation ( 1) by expanding J with local basis functions where N is the number of unknowns, denoting the vector basis functions, and x n is the unknown expansion coefficients.Applying Galerkin's method results in a matrix equation where If matrix A is in H -matrices formatted, we use A H to denote it.
Considering that the construction of proposed multilevel SAI preconditioner is built upon the structure of Hmatrices, some of its basic concepts need be reviewed at first.
The basis functions set indexed by I := {1, 2, . . ., N} can be constructed as a tree T I = (V , E) with vertex set V and edge set E. For each vertex v ∈ V , we define the set of son of v as S I (v) := {w ∈ V | (v, w) ∈ E}.The tree T I is called a cluster tree if the following conditions hold: ( Each vertex v in T I is called a cluster, representing a bunch of basis functions.Typically, each vertex has two sons.If the amount of basis functions contained in a cluster is less than a certain number of n min , that cluster has no sons, also called the leaves of T I .By rearranging the basis functions by their indices, they are numbered consecutively in each cluster, and, moreover, they are concentrated geometrically.Precisely, let Ω v denote the domain of basis functions represented by cluster v, and it is bounded by the cardinality of v: (6) in which diam(•) is the Euclidean diameter of a set.c g is a real constant; let the inequality valid for all clusters.The electromagnetic interaction of any two clusters, including self-interaction, maps certain subblock of the system coefficient matrix.Practically, most of these subblocks can be approximated by low-rank matrices with highaccuracy.Therefore, a systemic and appropriate partitioning procedure is needed for the coefficient matrix.Based on the cluster tree T I which contains hierarchy of partitions of I, we are able to construct the block-cluster tree T I×I describing a hierarchical partition of I × I by the following maps: in which the definition of S I×I (•) is similar to S I (•) that denotes the sons of certain block-cluster t × s.Admissible is a criterion which will be elaborated later and decides whether the subblock should be approximated by low-rank matrix, or split and combined by its sons, suspending for further transaction.Finally, the whole system matrix is segmented into pieces of subblocks by the procedure above.
Corresponding to the block-cluster tree, these subblocks are called the leaves of T I×I , denoted by L(T I×I ).
If two clusters are well separated geometrically, the Green function which connects the interaction of them is barely varying in their domains.That means, only a few patterns of interactional vectors can represent the whole mode; therefore, the subblock representing their interaction is rank deficit.In order to discerning these subblocks appropriately, we introduce the admissibility condition [9]: in which dist(•, •) is the Euclidean distance of two sets, Ω t and Ω s denote the domains of basis functions of cluster t, and s and η are a controllable real parameter, which is also illustrated in Figure 1.If the statement ( 8) is true, sub-block t × s is admissible, otherwise it is nonadmissible.The leaves of block-cluster tree L(T I×I ) are either admissible or nonadmissible.For admissible subblocks, we substitute them by low-rank representation through specific algorithms such as adaptive cross approximation (ACA) [11].
For nonadmissilbe subblocks, we calculate the entries and store them directly.The total computational complexity for constructing is close to O(N log N), in which N regards to the number of unknowns, and we give a numerical investigation Part A of Section 4.

Multilevel SAI Preconditioning.
If the Krylov iterative methods are used to solve the linear system, we always expect to find a high-performance preconditioner to accelerate the convergence.Generally, instead of solving the linear system of the form A H x = b, we solve it by the following forms: In which M is a sparse matrix satisfying the condition M ≈ A −1 H to a certain degree.Therefore, MA H ≈ I or A H M ≈ I can make the iterative solver more efficient.Because of limited space, we only discuss the left preconditioning case below.The right preconditioning case can be analyzed similarly.Traditional preconditioner has a fixed form that we could only execute "single level" preconditioning.Here, we elaborate how to construct a multilevel preconditioner under H -matrices format.
For a block-cluster tree T I×I in H -matrices, let L(T I×I ) denotes its depth and L(T I×I ) denotes all the leaves of the cluster tree.Any leaf block-cluster b = (t × s) ∈ L(T I×I ) belongs to certain level of T I×I ; hence, 1 ≤ level(b) ≤ L(T I×I ).We define that he finest level is the 1st level, where the smallest block cluster is located.A substructure of A H denoted by A (l)  H can be defined as follows: in which B (l) is a subset of L(T I×I ), so A (l)  H is sparse regarded as a standard matrix, which could be made use of as the primary data to form sparse approximate inverse.A trivial example of A (l)  H is that The intuitive grasp of A (l) H is shown in Figure 2. Consider an IE linear system matrix in H -matrices form, the nonzero entries distribution of A (1)  H , A (2)  H , and A (3)  H are marked as shadow area corresponding to the whole matrix.Obviously high level of A (l)  H contains more useful data of the system.Correspondingly, a set of preconditioning matrices H , which satisfied the conditions below: In practice, M (l) can be solved by the sparse approximate inverse technique, which aims to minimize I − M (1) A (l)   H F [12]. • F is the Frobenius norm of a matrix, and M (l) is subjected to a certain sparsity pattern, expressed as min where e i and m (l)  i are the row vectors of the matrices I and M (l) , respectively.G (l) represents the sparsity pattern for H .This H -matrices-formatted system is derived from an aircraft model which will be introduced in numerical experiments of Section 4.
different M (l) .Each of the subminimization problems in the right hand of ( 14) can be solved independently.
Because m (l) i in ( 14) is constrained to a certain sparse pattern G, it has many zero entries that force the corresponding rows of A (l)  H to be zero in matrix-vector multiply.Let m (l)  i denotes the subvector containing the nonzero entries of m (l)  i , and its corresponding rows of A (l) H are denoted by A (l)  H i .Besides, since A (l) H is sparse, the submatrix A (l) H i has many columns that are identically zero.By removing the zero columns, we have a much smaller submatrix A (l)  Hi .The individual minimization problem of ( 14) is reduced to a least squares problem: H that we implement the QR decomposition for solving (15).

Hierarchical Fast Direct
Methods.If the system matrix is severely ill-conditioned and even the iterative solver with powerful preconditioner cannot obtain acceptable results, fast direct solvers are good alternative for IE H -matrices.According to the arithmetic of partition matrices, the inversion of a matrix can be calculated by operating its submatrices.Considering the hierarchical structure of Hmatrices is indeed a nested quaternary submatrices structure, and a potential direct method can be made to solve the linear system.If a system matrix A can be partitioned as then its inversion can be written as where Applying this arithmetic into H -matrices, we can obtain a hierarchical inversing procedure as elaborated in Algorithm 1.
The operator and ⊕ manipulate matrices multiplication and addition which are specific for H -matrices.Reference [9] gives detailed information about these operators.By implementing submatrices partition, aggregation, and truncation of singular value decomposition (SVD), the hierarchical structure can be maintained after the manipulation of these operators, and, more importantly, both CPU time and memory cost are saved compared to conventional matrices arithmetic.Consequently, the complexity of hierarchical direct solvers is O(N 2 ) for CPU time and O(N 1.5 ) for memory usage, in contrast with O(N 3 ) and O(N 2 ) of conventional direct solver.This leads to realizing kinds of fast direct solving processes including the hierarchical inversing algorithm described above.
For numerical implementation, hierarchical inversing is not as fast as hierarchical LU decomposition, which is another fast direct method based on matrices decomposition [9,13].Considering LU decomposition for partition matrix A: matrices L and U can be figured out by the procedures: (1) computing L 11 and U 11 from the LUD L 11 U 11 = A 11 ; (2) computing U 12 from L 11 U 12 = A 12 ; (3) computng L 21 from L 21 U 11 = A 21 ; (4) computing L 22 and U 22 from the LUD Process (2) and process (3) can be refined as subprocesses of partitioned LU decomposition; therefore, by recursive applying of this procedure, the whole decomposition can be achieved.In the process of recursion, when the partitioning reaches the finest level in which submatrices are explicit expressed, conventional LU decomposition is employed.
After the decomposition done, the primary L and U are still in H -form and overwrite the original A H .Then, we can use partitioned backward/forward processes to solve the linear system which is similar to linear equation solving by conventional LU decomposition.For EFIE, we can utilize its symmetric property and then construct a fast LL T decomposition solver which is even much faster and memory saving.Considering LL T decomposition for partition matrix A: Comparing with the procedure of hierarchical LU decomposition, one step is removed because we can obtain L T  12 from trivial transposition.Therefore, the implementing of LL T decomposition is approximately one time faster than that of LU decomposition, in a recursive way.

Complexity of Constructing H -Matrices.
To investigate the complexity of constructing IE H -matrices for electromagnetic analysis, there are two themes, namely, discretization density varying and wave frequencies varying.The former one means we change the number of unknowns along with varying the mesh density regarding electromagnetic wavelength.And the latter one means we change the number of unknowns along with varying wave frequencies, but the mesh density regarding wavelength is fixed.Here, we use a PEC sphere model of 1.0 λ (wavelength) radius to test both these two themes, as shown in Figures 3 and 4.
From the investigation, we can easily see that if incident wave frequency is fixed, the complexity of both CPU time and memory usage approach to O(N log N), where N regards to the number of unknowns.If we fix the discretization density and change the frequency of incident wave, the complexity curve is close to O(N 4/3 log N).This is because the object domain contains more phase information when incident by higher frequency wave; therefore, the compression ratio of low rank submatrices in H -matrices is lower in higher frequency cases.

Iterative Solving with Multilevel SAI Preconditioner.
Firstly, a conducting sphere is used to demonstrate the improvement of the spectrum characteristics of the linear coefficient matrix by employing multilevel-SAI preconditioner (ML-SAI).Supposing M (l) is the lth level preconditioner and x (i)  rnd is the ith randomized vector, we define the regression index c (p) r of lth preconditioning level as r (l) → 0 is expected.Particularly, M (0) = I, which means no preconditioner is imposed.A sphere of 2 λ diameter, discretized by Rao-Wilton-Gliso (RWG) basis with different densities, is used to obtain the linear system matrix A H .The different regression indices of different preconditioning levels are presented in Table 1 and compared to analogous indices of conventional SAI preconditioning in MLFMA.
From Table 1, we can see that with the preconditioning level of ML-SAI increased, the regression index is decreased, which means the effect of preconditioning is getting better.And the preconditioning effect of conventional SAI in MLFMA is better than the low level ML-SAI but not as good as high level ML-SAI.It should be noted that the c (100) r (3) in case of 957 unknowns is close to zero because the H -matrices  structure of this model is only 3 levels, which means M (3) is very close to the exact inverse of the linear system.Next, a 40 λ diameter PEC sphere is solved by H -matrices method.Combine field integral equation (CFIE) is used, and the number of unknowns is 426,827.To set up the hierarchical system matrix, the memory cost is 42.6 GBytes and 26,387 seconds are used by 8 core parallel computing, on a workstation of Intel Xeon X5460 processor with 64 GBytes RAM.Generalized minimal residual (GMRES) [14] algorithm is employed as the Krylov iterative solver, and we use 1∼3 level SAI preconditioners and diagonal block preconditioners (DBPs) as reference to accelerate the convergence of iteration.From Figure 5, we can see that the RCS curve conform to the Mie series identically.And the iterative history in Figure 6 shows that ML-SAIs have a distinct impact on accelerating the convergence of iteration.
Another example is an aircraft model which is 50.75 λ long, 29.20 λ wide, and 13.57λ high, excited by a plane wave incoming from its nose with an upper 45 • angle to the center axis of its main body.The CFIE is used, and there are 412,737 unknowns to simulate the exciting surface currents as shown in Figure 7.By solving the linear system by Hmatrices method, there are 14 levels in the block-cluster tree.To set up the hierarchical system matrix, the memory  cost is 38.4 GBytes and 25,634 seconds are used by 8 core parallel computing, on a workstation of Intel Xeon X5460 processor with 64 GBytes RAM.The data of A (l)  H in the finest 3 levels are to construct preconditioners M (1) , M (2) , and M (3) , respectively.GMRES(90) is used to solve the linear system.The convergence histories of iteration with different preconditioners are presented in Figure 8.This is a little tough case because of complicated structure and relatively high frequency.If no preconditioner or just DBP is employed, the convergence is very poor.By employing ML-SAI, the iteration converges quickly under satisfied residual error, and the convergence is much better with the higher level preconditioning processed.The conventional SAI with MLFMA is also able to achieve the request, but its convergence is not as good as higher level ML-SAI.Table 2 shows the solving time and memory cost of ML-SAI, conventional SAI with MLFMA, and diagonal blocks preconditioner.The memory cost is mainly constituted by the storage of preconditioner M and Krylov subspace vectors of GMRES.In the fact that the restart number of GMRES, namely, the number of subspace vectors, is chosen the same for all cases, so the memory cost can be regarded as the scale of preconditioner M.

Hierarchical Fast Direct Solvers.
In this part, we give some numerical examples solved by fast H -matrices direct solvers.We use hierarchical LU decomposition to solve CFIE and use hierarchical LL T decomposition to solve EFIE, since utilizing its symmetric property.First, a PEC sphere model is used to investigate the accuracy of fast direct solvers.From Figure 9, we can see that the bistatic RCSs obtained by both fast LU and LL T decomposition solvers have very high precision regarding to analytic Mie series.The EFIE result has a slight bigger relative error than CFIE because the system matrix of EFIE is not diagonal dominated, consequently, the numerical error accumulated through LL T decomposition is more than that in CFIE.
Next, the computational complexity of fast LL T decomposition solver is tested via the PEC sphere model.The number of unknowns is varying along with the change of frequency of the incident wave, and the mesh discretization density is fixed at 10.0 points per wavelength.The test results are shown in Figures 10 and 11  and memory usage is close to O(N 1.5 ).This is much more efficient than conventional direct solvers.The aircraft model shown in the previous part is also used here to test fast LL T decomposition solver by employing EFIE.The incident wave is 10.0 GHz, and mesh discretization density is 10.0 points per wave number.The number of unknowns is 119,202, and 8 levels are set for H -matrices.This example is parallelly solved by a workstation equipped with dual Intel Xeon X5460 processor and 64 GBytes of RAM.2,256 seconds is used for building up the hierarchical system matrix, and the memory cost is 5,128.4MBytes.LL T decomposition time is 8,415 seconds, and 6,437.2MBytes are used after decomposition. Figure 12 presents the surface current solved by LL T decomposition.
The most notable advantage of fast direct solver is the high efficiency of handling multiple right-hand-side cases.After LU or LL T decomposition, solving multiple righthand vectors is only one-round process of forward/backward substitution, so that it is much faster than iterative solvers, which gives a full-round iteration for each vector step by step.In IE electromagnetic analysis, monostatic RCS is the typical multiple right-hand-side problem.We here use LL T   calculated.The total solving time is 468.3 seconds, contrast to MLFMA combined with GMRES iterative solver, which costs 4215.7 seconds, almost ten times of the former one.In order to compare to LL T decomposition, here, we use EFIE for iterative MLFMA.Figure 13 shows the comparing results between LL T decomposition and iteration of MLFMA, we can observe that direct solved curve is smoother than that of iterative solved and there is a big difference between them.We believe that the result of LL T decomposition is more accurate and the reason is elaborated as follow.Theoretically, the monostatic RCS curve of this intermediatefrequency example should be smooth, because the sampling is sufficiently continuous comparing to the wavelength.However, for EFIE iteration, the convergence is poor due to  the large condition number of its system matrix.Therefore, a loose relative residual error threshold 0.005 is set here, for the purpose of getting iterative result with a reasonable CPU time cost, and this causes the iterative result not very accurate.

Conclusion
The hierarchical matrice methods presented in this paper is embedded in electromagnetic IE method.Due to its special structure, we can construct a multilevel SAI preconditioner to accelerate the convergence of iterative solving, and even kinds of fast direct solvers can be made, which is not viable for the traditional IE fast algorithm.The multilevel SAI preconditioner proposed here is more efficient than conventional "single level" preconditioners, and hierarchical fast direct solvers are good alternatives to iterative solvers, very suitable for ill-conditioned system and multiple righthand-side problems.Furthermore, the kernel independence feature of hierarchical matrices method is adapted to varied electromagnetic problems without being limited to integral equation with free-space Green function.

Figure 1 :
Figure 1: Two basis function clusters domains and their distance, describe the definition of admissible condition.

4 InternationalFigure 2 :
Figure 2: The data distribution of the finest 3 levels of A (l)H .This H -matrices-formatted system is derived from an aircraft model which will be introduced in numerical experiments of Section 4.

Figure 3 :
Figure 3: Computational complexity of constructing H -matrices by fixed incident wave at 300.0 MHz.

Figure 4 :
Figure 4: Computational complexity of constructing H -matrices by fixed discretization mesh density at 8 points per wavelength.

Figure 5 :
Figure 5: The bistatic RCS of a conducting sphere with 40 λ.

Figure 6 :
Figure 6: The iterative history of solving the PEC sphere cases.GMRES(30) is used and accelerated by different preconditioners.

Figure 7 :
Figure 7: The surface current distribution of the aircraft model obtained by solving the ML-SAI preconditioned H -matrices method.The unit of surface current is A/m 2 .

Figure 8 :
Figure 8: The convergence histories of GMRES(90) with no preconditioner, DB preconditioner, and ML-SAI preconditioners of different levels.

Figure 9 :
Figure 9: The bistatic RCS solved by hierarchical fast direct solvers.CFIE is solved by LU decomposition solver, and EFIE is solved by LL T decomposition.

Figure 10 :
Figure 10: CPU time complexity of hierarchical LL T decomposition.

Figure 11 :
Figure 11: Memory usage complexity of hierarchical LL T decomposition.

Figure 12 :Figure 13 :
Figure 12: The surface current distribution of the aircraft model obtained by solving EFIE with hierarchical LL T decomposition.The frequency of incident wave is at 10.0 GHz.The unit of surface current is A/m 2 .

Table 1 :
The regression index of different levels of ML-SAI and conventional SAI preconditioning.

Table 2 :
The time and memory cost of iterative solving aircraft model with different preconditioners.