Application of a Sparsity Pattern and Region Clustering for Near Field Sparse Approximate Inverse Preconditioners in Method of Moments Simulations

This document presents a technique for the generation of Sparse Inverse Preconditioners based on the near field coupling matrices of Method of Moments simulations where the geometry has been partitioned in terms of regions. A distance parameter is used to determine the sparsity pattern of the preconditioner. The rows of the preconditioner are computed in groups at a time, according to the number of unknowns contained in each region of the geometry. Two filtering thresholds allow considering only the coupling terms with a significant weight for a faster generation of the preconditioner and storing only the most significant preconditioner coefficients in order to decrease the memory required. The generation of the preconditioner involves the computation of as many independent linear least square problems as the number of regions in which the geometry is partitioned, resulting in very good scalability properties regarding its parallelization.


Introduction
Many of the modern approaches for electromagnetic analysis based on the Method of Moments (MoM) [1] rely on the idea of only storing the near field coupling terms of the impedance matrix, which typically extends about onequarter of the wavelength under analysis.The coupling effects between distant parts of the geometry are taken into account in the matrix-vector product computation as a part of the iterative solution process, and the specific manner in which this is carried out depends on each approach.Some popular schemes are based on the use of the Multilevel Fast Multipole Algorithm (MLFMA) [2], which includes the processes of aggregation, translation, and disaggregation of multipole expansions in the computation of such products.Other schemes use matrix compression techniques [3,4] to compute these multiplications efficiently.All these methods circumvent the burden of storing the full MoM matrix, which would easily surpass the memory capacity and cripple the efficiency for the analysis of even moderately sized problems.
Since the full coupling matrix is no longer to be calculated and due to the size of the system to be computed, iterative solvers play a major role in modern simulation methods, although it is worthwhile to mention that some approaches extend the range in which direct solvers can be applied, such as those based on the use of Macro Basis functions [5], which reduce the number of unknowns of the original problem.The necessity of iteration, however, is unavoidable for electrically large or very large problems and with the use of iterative solvers arises the problem of slow convergence for some cases, due to reasons of different nature, such as geometrical features of the model, multiscale problems, or to the intrinsic electromagnetic behavior, such as the presence of resonant regions within the geometry.In any case the use of preconditioners is very important in order to ease the iterative process, which, in many cases, accounts for most of the total simulation time.
It is noteworthy to mention a family of preconditioners that are based on the physical properties of the problem and more specifically on the idea that inverse matrix that is approximately represented by the preconditioner is essentially an approximate solution of the electromagnetic problem.Previous works have been published for the efficient generation of physical-based preconditioners considering quasiplanar structures [6] or exploiting the quasiplanar electromagnetic behavior of conducting rough surfaces [7].A method for the generation of a preconditioner for problems involving multiple scatterers, based on splitting the system matrix according to the types of material of the subdomains as well as currents on different parts, is shown in [8].
Another group of preconditioners that are also commonly used in the context mentioned above rely on the numerical manipulation of the system matrix rather than on physical properties of the problem under analysis.The Incomplete LU (ILU) decomposition [9][10][11] and the Sparse Approximate Inverse (SAI) [12][13][14] show a good behavior and are extensively used at the present time.The ILU preconditioner is considered slightly more efficient for the same amount of data [15], although the SAI approach is much more scalable, since it can be computed by solving independent linear least square (LLS) systems that generate the rows of the preconditioner.This property renders the SAI preconditioner very well suited to be used in very large problems that require a fair number of processing nodes.It should be noted, however, that modern ILU preconditioners based on multifrontal/multilevel schemes allow better performance than the conventional ILU approach while offering a good compromise between robustness and efficiency [16][17][18].The application of multilevel ILU preconditioning techniques applied to electromagnetic problems involving the Electric Field Integral Equation has been documented in [19].
The preconditioners based on the ILU and SAI approaches are often generated considering only the near field part of the coupling matrix, which includes the strongest interactions between basis and testing functions.This allows a fast generation and reasonably reduced size, although the performance of such preconditioners may be lacking in the analysis of problems in which there are strong interactions between parts of the geometry that are physically distant, like the interaction between the feed of a reflector antenna and the main reflector or between parts of certain cavities.
The most common approach in the application of the SAI preconditioner involves the definition of a sparsity pattern, understood as the pattern of the elements of the preconditioner that are not null, which is the same as that of the near field coupling matrix [12].In this work we propose an improved strategy based on the clustering of the least squares systems that fall under each near field region as well as a distance threshold parameter that allows a finer control of the sparsity pattern of the SAI preconditioner.We have obtained noticeable improvements on the convergence properties of the conventional use of the SAI preconditioner compared to the approach described here.Illustrative examples are given in the Numerical Results.

Description of the Approach
The application of the MoM is based on the solution of a linear system of equations as follows: where In order to incorporate the near field preconditioning scheme proposed in the present work, it is useful to write the MoM equation separating the coupling matrix into the near field term ([ NF ]) and the far field term ([ FF ]) as shown as follows: where [] is the preconditioner.This matrix should ideally be as close as possible to the inverse of However, it is very important to keep the generation procedure scalable and efficient both in terms of size and CPU time.
Under the assumption that the near field part of the coupling matrix is stronger than the far field term, it is possible to rewrite (3) in the following fashion: (4)

Region Clustering in the Generation of the Preconditioner. A commonly used SAI preconditioner relies on the generation of [𝑀] by minimizing the Frobenius norm of {[𝐼] − [𝑀][𝑍 NF ]}, where [𝐼]
is the identity matrix.The scalability properties of this preconditioner arise from the possibility of decomposing such expression into the minimization of the norm of the difference between each row of the identity matrix and the preconditioner multiplied by the near field coupling matrix as follows: where  indicates the total number of basis functions of the problem, e  stands for the th row of the identity matrix, and m  denotes the th row of the preconditioner It should be remarked that the technique described in this work relies on the compartmentalization of the computational space in terms of regions, so that only the coupling terms between subdomain functions located in the same or in adjacent regions are to be computed.This is a widely adopted scheme by many modern methods.Under this context, it is possible to take advantage of the reutilization of the same LLS matrix to obtain as many rows of the preconditioner as the number of the unknowns contained in each region.This can be illustrated by rewriting (5) in the following manner: where  indicates the total number of regions of the problem.
Considering at this point that   is the number of subdomain functions contained in region  and that    represents the absolute index of the th subdomain contained in region , the [  ] matrix in (6) consists of   rows of the unit matrix and, analogously, [  ] contains   rows of the preconditioner:

Parametric Sparsity Pattern.
Noting that [  ] will be generally a full matrix it is necessary to define a sparsity pattern that enforces null values for some of its coefficients.
It is common to see approaches in the literature that consider such a sparsity pattern to be identical as that of the original near field coupling matrix [12].We introduce an alternative approach by defining in this context the sparsity distance parameter  as the distance threshold that determines whether the subdomains contained in a region are to be included in the sparsity pattern.Let us denote for this purpose   as the region in which subdomain  is contained and (  ,   ) as the distance between the centres of regions   and   , where  an  are two arbitrary subdomains.The sparsity mask matrix for region  is defined as where The [Ω]   matrix, applied for all the regions, defines the sparsity pattern used to generate the preconditioner.With this consideration, the right hand side of expression ( 6) can be approximated as where [ M ] is an approximation of [  ] obtained as the solution of the LLS where the matrices marked with the tilde overscore denote the application of the sparsity pattern by computing the Hadamard product with the sparsity mask and eliminating the null rows and columns of the result: In this expression ⊙ denotes the element-wise matrix multiplication operator and ℵ([]) represents the matrix that results from the elimination of the null rows and columns of [].The term ℶ   ([ NF ]), in turn, is an operator that returns a filtered version of [ NF ] discarding the coefficients with a module lower than  times the largest self-impedance in the region where they belong.The  parameter, named preprocessing threshold parameter through the rest of this document, allows a reduction of the size of the LLS problems to be solved for the generation of the preconditioner by eliminating low-magnitude rows and columns, resulting in faster computation.Considering these comments ℶ   ([ NF ]) can be written as follows: International Journal of Antennas and Propagation with 0 ≤  ≤ The preconditioner, therefore, can be computed by calculating the result of  independent LLS problems, which can be distributed among an arbitrary number of processing nodes, and assembling the preconditioner using the partial [ M ] matrices.It is convenient, however, to filter these results to retain only the elements that pose a significant weight for each row of the preconditioner.Experimental results have shown that for typical applications the size of the preconditioner can be reduced at least by an order of magnitude without a compromise in the convergence performance.Expression (12) illustrates the row filtering of [ M ] introducing the postprocessing threshold parameter : where Ψ  ([ M ]) retains the elements with an absolute value over  times the largest element in each row of the preconditioner: , where The sparsity distance parameter  introduced in this section allows a fine control of the size of the preconditioning matrix.Large values (similar to the size of the geometry) imply no restrictions in the sparsity pattern of the resulting preconditioner, which can lead to the computation of the exact inverse of [ NF ] with a prohibitive computational cost.Low values, around the size of the regions in which the geometry has been previously partitioned, render an approximate inverse with the same size as [ NF ] that can be computed fast.These sizes can then be modified by the filtering threshold  that discards the least significant terms.Our experience, as shown in the next section, shows that sparsity distance parameter values ranging from 0.5  to  present a good tradeoff between size and performance.

Numerical Results
Some test cases have been selected in this section in order to illustrate the performance of the approach described in the present document.All the simulations have been run on an HP Z820 workstation using 16 Intel Xeon E5-2660 2.2 GHz processing cores and 128 GB of RAM.In all the cases we have applied the Method of Moments combined with the Multilevel Fast Multipole Algorithm using 0.25  as the region size unless otherwise indicated.The SAI preconditioners have been obtained distributing the computation of [ M𝑟 ] shown in expression ( 13) for all the regions of the problem between the computing nodes using the OpenMP paradigm [20].The Generalized Minimal Residual (GMRES) iterative solver [21] with a restart parameter of 300 has been used for all the simulations presented in this section.The parallelization scheme of the SAI preconditioner followed in the simulations shown in this section consists of a loop that iterates over the regions in which the geometry has been previously partitioned.For each iteration the corresponding LLS matrix is assembled and the LAPACK numerical library [22] is used to obtain the solution.Since we use OpenMP to merely distribute the iterations of this loop over all the processing nodes, it is extremely important that the iterations are independent of each other, as is the case for the proposed preconditioner.
In the results presented in this section  represents the polar angle measured from the -axis and  denotes the azimuthal angle measured from the -axis, with 0 ∘ ≤  ≤ 180 ∘ , 0 ∘ ≤  < 360 ∘ .These angular coordinates have been illustrated in Figure 1 for the sake of clarification.
Figure 2 depicts the geometry of the PLACYL case, originally proposed for the JINA EM 2006 workshop [23], consisting of a cylinder with a radius of 0.2 m and a length of 1 m connected to a semispherical surface on one side and located on top of a plane plate that extends 1.8 m along the  direction and 1.2 m along the  direction.The gap between the cylinder and the plate is 0.02 m.The monostatic RCS of this setup has been computed at a frequency of 10 GHz for a set of 181 theta polarized incident waves corresponding to the directions given by  = 45 ∘ and  ranging from 0 ∘ to 180 ∘ .The electrical length of the cylinder at this frequency is 40  and the radius is 6.67 , while the plate is 60  long and 40  wide.The Electric Field Integral Equation (EFIE) formulation has been considered for all the surfaces of this case instead of the Combined Field Integral Equation (CFIE) in order to obtain a moderately ill-conditioned problem useful to test the performance of the preconditioners.The geometry has been meshed using a sampling rate of 10 divisions per wavelength, resulting 831,734 unknowns, and the compartmentalization of the geometry has generated 73,590 regions.The iteration error threshold of the GMRES solver has been 10 −3 .Figure 3  shows the results obtained for this case.Table 1 shows the CPU time required for the generation of the preconditioner as well as the memory for its storage considering different configurations and the average number of iterations per monostatic direction necessary to compute the solution.The values of the SAI parameters (sparsity distance , preprocessing threshold , and postprocessing threshold ) defined in the approach presented in this work are also included in the table.
The results plotted in Figure 3 show that there is a certain amount of specular reflection on the hemisphere and between the hemisphere and the plate at  = 0 ∘ that provides a RCS level that fades in a very narrow angular margin.The convergence for this direction is not especially slow because of the predominant specular effect over the high-order coupling reflections between plat and hemisphere.From  = 0 ∘ to  = 90 ∘ the backscattering RCS contributions are due to diffuse reflections on the plate, cylinder, and high-order effects between both bodies, leading to slow convergence due to the low RCS levels and strong coupling.The RCS increases very noticeably for angular values around  = 90 ∘ , due to the larger cross section of the cylinder that provides specular reflection, as well as the high-order specular reflections between the side of the cylinder and the plate and creeping effects on the cylinder.From  = 90 ∘ to  = 180 ∘ the contribution of the interaction between the flat part of the cylinder and the plate becomes increasingly stronger and for the angular margin around  = 180 ∘ the double reflection between the plate and the flat cap of the cylinder is overwhelmingly predominant, yielding high RCS values and easing the convergence rate.
It can be seen in Table 1 that the memory requirements for the storage of the preconditioner have a weak dependency on the sparsity distance parameter .This is due to the use of the postprocessing threshold  that eliminates the least significant terms of the preconditioner.In general, it is normal to see a mild increase of the size of the preconditioner as  increases because, as more coefficients are included in the sparsity pattern of the preconditioner, a number of them are probably going to have a magnitude that exceeds the threshold, adding information to the computed preconditioner.
In addition to the monostatic results shown above, it is interesting to test the behavior of the convergence of the proposed approach for a different number of configurations.For the next example we have considered the PLACYL geometry increasing the frequency to 20 GHz, resulting in an electrical length of 80  for the cylinder with a radius of 13.35  and a plate size of 120  × 80 .Now the CFIE formulation with parameter  = 0.5 has been applied for the closed body (cylinder and semisphere), and the EFIE formulation has been applied for the plane plate.In this case, meshing with the standard rate of 10 samples per wavelength, we obtain 3,358,631 unknowns and 297,132 regions.A bistatic RCS analysis has been performed for the excitation incidence given by  = 45 ∘ and  = 0 ∘ with observation directions separated by 0.1 ∘ steps between  = 0 ∘ and  = 90 ∘ .The sparsity distance parameter considered in this case has been  = 0.5 , and the preprocessing and postprocessing thresholds have been, respectively,  = 0.05 and  = 0.03. Figure 4 shows the results obtained for the -polarized incident field considering an iteration error threshold of 10 −3 .Figure 5 shows the convergence analysis of this case considering the use of the conventional SAI approach (with the same sparsity pattern as the near field coupling matrix), the SAI approach proposed in this document and without any preconditioner.
For the next example, the RCS of a realistic model of an Opel Astra car has been simulated at a frequency of 5.9 GHz.The geometry is shown in Figure 6 and has been modeled using 282 NURBS surfaces.This case consists of 2,255,500 unknowns and 200,936 regions using a sampling rate of 10 samples per wavelength.In order to analyze a more realistic problem, several surfaces have been treated as dielectric layers, defining their relative permeability   .The windows and lights have been modeled using a thin layer approximation with   = 4.The wheels have been modeled by means of a thin layer approximation using   = 7, and dielectric losses have been introduced by assigning the loss tangent tan() = 0.04, considering these values as a realistic approach for the rubber compound of the tires [24].The size of the car is approximately 4.14 meters long, 1.64 meters wide, and 1.35 meters tall, which at the working frequency corresponds to 81.5  tall, 32.25  wide, and 26.55  high.
The bistatic RCS has been computed for a -polarized plane wave impinging from  = 80 ∘ and  = 45 ∘ .The observation directions are contained in the  = 45 ∘ angular cut  ranging from 0 ∘ to 90 ∘ in 0.1 ∘ steps.The EFIE formulation has been applied, due to the geometrical and material features of this problem.Figure 7 shows the computed results.Convergence has been obtained for an error of 10 −3 in 281 iterations.Realistic cases often offer a slow convergence due to the more complex geometrical models.The inclusion of very small patches, degenerate surfaces, and other sublambda details produces an irregular mesh that favors slower convergence rates.In addition, the use of dielectric materials assigned to some of the surfaces of the model combined with PEC surfaces in the rest of the body also poses a burden to the convergence properties of the problem.In this case the conventional SAI approach did not reach convergence and stalled with an error value of 2.2 ⋅ 10 −2 .The conventional SAI preconditioner uses the same sparsity pattern as the near field coupling matrix, which is in essence equivalent to generating the preconditioner using a sparsity distance of  = 0.25  in this case.However, due to the features of this geometrical model the resulting preconditioner does not adequately resemble the inverse of [ NF ].The extended value of the sparsity distance used with the proposed approach ( = 0.75 ) allows the computation of a denser preconditioner that provides better convergence.The total simulation time

Conclusion
A new sparsity pattern and filtering approach that can be used to generate Sparse Approximate Inverse preconditioners considering the near field part for computations based on the Method of Moments have been presented in this document.The rows of the preconditioner are generated as groups that result from the solution of similar least squares problems.In addition, a parametric distance sparsity parameter allows a fine control of the amount of data to be used.The scalability properties of this preconditioner are excellent and make it well suited to solve large and realistic problems with a moderate computing time and RAM footprint.Some illustrative examples have been provided in order to test the performance of the proposed approach.
From the experience gathered by the authors, some recommended typical values of the parameters of the proposed preconditioner can be a sparsity distance ranging between 0.5  and , as well as preprocessing and postprocessing thresholds between 0.01 and 0.03.For those cases with an especially difficult convergence the sparsity distance can be raised up to 2  at the expense of higher CPU time and memory consumption.In cases where the memory consumption is a critical issue the postprocessing threshold can be increased to values around 0.05 in order to reduce the size of the preconditioner, with the possible consequence of a slightly slower convergence.

Table 1 :
Convergence performance for several SAI parameters.