The Application of Improved Spherical Harmonics Expansion-Based Multilevel Fast Multipole Algorithm in the Solution of Volume-Surface Integral Equation

During the solution of volume-surface integral equation (VSIE), to reduce the core memory requirement of the radiation patterns (RPs) of the basis functions, an improved spherical harmonics expansion-based multilevel fast multipole algorithm (SE-MLFMA) using the mixed-potential representation and the triangle-/tetrahedron-based grouping scheme is applied. Numerical results show that accompanying with a faster speed, the memory requirement of the RPs in the improved SE-MLFMA is several times less than that in the conventional MLFMA without compromising accuracy. A result employing the OpenMP parallelization and vector arithmetic logic unit (VALU) hardware acceleration technique is also shown to illustrate the robustness and scalability of the improved SE-MLFMA method.


Introduction
In the computational electromagnetics, the method of moments (MoM) solution of volume-surface integral equation (VSIE) is an attractive approach, since it can be generally applied to the simulation of the arbitrary composite conductor-inhomogeneous dielectric objects [1][2][3][4][5].As one of the most powerful fast solvers, the multilevel fast multipole algorithm (MLFMA) can make MoM be applied to the analysis of objects with large electrical size [4][5][6][7][8].Based on the addition theorem of Green's function and diagonalization of the translation operator, MLFMA drastically reduces the overall computational complexity from the order of O(N 2 ) to O(NlogN) through three processes: aggregation, translation, and disaggregation.In the MLFMA, the far interactions between the well-separated groups are efficiently computed using the k-space integrations over the Ewald sphere.Before iterative solution, to enhance the computing efficiency, the k-space samples of the basis/testing functions, called radiation patterns (RPs), should be computed and stored prior.Because the quadrature sampling rate is determined by the bandwidth of the diagonalized translation matrix at the finest level, the RPs in the k-space are oversampled, which causes a significant amount of redundant memory usage.
To alleviate this problem, a spherical harmonics expansion-based MLFMA (SE-MLFMA) for the dyadic form of surface integral equation (SIE) was proposed [6].With the use of symmetry, the memory requirement of RPs is significantly reduced from 2N L + 1 2 c 1 c 2 bytes in the conventional MLFMA to 3N/2 P + 1 P + 2 c 1 c 2 bytes in the SE-MLFMA.Here, N is the number of unknowns, while L and P are the order of multipole expansion and the SE degree of the RPs, respectively.The constant c 1 is equal to 1 in case of the electric surface field integral equation (ESFIE) or 2 in case of the combined surface field integral equation (CSFIE), and c 2 is 8 for single precision or 16 for double precision.In [6], P for the RPs of the RWG [9] basis functions in the Cartesian coordinates is selected as P = L/2 − 1 P ≥ 2 .Further, based on the mixed-potential form of the SIE and the trianglebased grouping scheme, an improved SE-MLFMA with more efficiency was proposed [7].In contrast to the traditional SE-MLFMA, the memory requirement of RPs is further reduced to 2N t P + 1 P + 2 c 2 bytes, which is proportional to the number of triangles N t .Besides, c 1 is eliminated because the memory cost is independent of the type of SIE.Moreover, since the RP computation is operated in the Cartesian coordinates, as the same as the aggregation, translation, and disaggregation processes, the improved SE-MLFMA totally obviates the repeated transformations between the spherical and the Cartesian coordinates, which are necessary for the traditional SE-MLFMA to eliminate the Gibbs phenomenon.
Nevertheless, the previous discussions were focused on SIE.Although the applicability of SE-MLFMA in SIE has been well verified, since the numerical characteristics and expression form of VSIE are very different from SIE, we still want to know whether this method can be effectively applied to the solution of VSIE.In this paper, when the RWG and SWG [10] basis functions are used to disperse the mixed-potential form of VSIE, how to efficiently apply the improved SE-MLFMA in the solution process is shown in detail.In contrast to the conventional MLFMA and the traditional SE-MLFMA, the improved SE-MLFMA shows the following improvements: firstly, the memory requirement of RPs is only proportional to the total number of triangles and tetrahedrons rather than that of the basis functions, leading to a considerable reduction of core memory requirement.Secondly, the memory requirement does not depend on the type of the SIE part in VSIE, which means that the application of CSFIE will not cause any memory increasing; Thirdly, since the Cartesian components of RPs are permanently used during the SE procedures, the repeated transformations between Cartesian and spherical coordinates are totally eliminated, resulting in a faster implementation.

Formulation
2.1.Derivation of the Proposed Method.Using Garlerkin's type of MoM, the VSIE is discretized into a matrix equation as where Z is a N × N impedance matrix, and I and V are the N × 1 unknown current coefficient and generalized voltage vectors, respectively.In the MLFMA, Z is decomposed into two parts as where Z near and Z far denote the near-and far-interaction matrices between the basis and test functions, respectively.As we know, in the VSIE, Z far can be further decomposed as In (3), N S and N V are the number of surface and volume unknowns, respectively, α 0 < α ≤ 1 is a real constant, and η 0 is the free-space intrinsic impedance.The superscript S and V represent the type of interaction (S for conducting surface and V for dielectric volume), while the secondary subscript E or H denotes that the related entry is generated by the electric field integral equation, or that by the magnetic surface field integral equation (MSFIE).The entry in (3) can be expressed in the mixed potential form as

4
where or 1 for Q = S, f S j r and f V j r are the jth RWG and SWG basis functions, respectively, and G is the Green function.
Since RWG/SWG basis function is defined on two adjacent triangles/tetrahedrons, each double integral in (4) consists of four subintegrals of the same type, which represent the interactions between the double two triangles/tetrahedrons involved in the basis and test functions.The final result can be efficiently obtained by adding the contributions of all related triangle/tetrahedron-to-triangle/tetrahedron integrals to the basis-to-test function entry.Thus, in the following, only the subintegral of two "positive" triangles/tetrahedrons is considered, while other integrals can be obtained by simply changing the signs.If we assume that the S m /V m and S n /V n are the two "positive" triangle/tetrahedron containing the jth and ith basis functions, respectively, the contribution of the interaction between S m /V m and S n /V n to the far-interaction matrix entry in (4) will be expressed as In ( 5), c 0 = 1 if P = V as well as the jth face a j is on the boundary of dielectric bodies, in which case it is assumed that f V j r is defined only over the interior tetrahedron and that the exterior tetrahedron is fictitious, or c 0 = 0 other else.In this paper, the number of the faces of tetrahedrons constituting the boundary of dielectric bodies is counted as a part of the number of triangles.
Using the identical relation (5) can be expressed as where In the MLFMA, the far-field interactions are done through groups.It is worthy to point out that our grouping scheme is based on the triangles/tetrahedrons but not the edges/faces.In other words, the index of the leaf box to which a given triangle/tetrahedron belongs is determined by comparing its center coordinates to those of the triangle/tetrahedron's barycenter.Obviously, compared to the edge-/facebased grouping method, the triangle-/tetrahedron-based grouping scheme needs slightly more preprocessing time to collect the unknowns in various boxes at the finest MLFMA level.However, due to the utilization of binary-tree searching algorithm, whose complexity is the order of O(NlogN) for N basis functions, this increasing time is very limited.
If S m /V m and S n /V n are grouped into the m ′ th and n ′ th leaf boxes, and the barycenter coordinates of the two boxes are r m′ and r n′ , respectively, then via the addition theorem, the Green function G is transformed as with the translation operator T L [7].Thus, ( 9) and ( 6) become respectively, where the superscript " * " denotes the complex conjugate and 13 If we substitute the definition and the divergence equation of the RWG and SWG basis functions into (13), we will obtain 3 International Journal of Antennas and Propagation In (16), l ς and a are length of the ςth edge and the area of the ςth face, and A β and V β are the area of the βth triangle and the volume of the βth tetrahedron, respectively.It can be seen that the scalar S and vector V contain all information needed by the aggregation and disaggregation processes in the MLFMA, which means that they can be regarded as the RPs defined for given triangles/tetrahedrons.Further, the RPs can be expressed as a series of the spherical harmonics as where Y pq is the orthonormalized spherical harmonics [3].
The expansion coefficients are computed by Instead of the RPs, the expansion coefficients can be computed and stored in the setup of the MLFMA, which can reduce the memory requirement significantly.The remaining steps, such as the computation of SE coefficients as well as aggregation and disaggregation at the finest level using the spherical harmonics series, are done in the same way as presented in [6].However, it is important to note that if MSFIE is involved into the simulation of the objects that contained closed conducting surfaces, when all incoming waves are collected in a certain group on the finest level, the disaggregation process needs to be operated elaborately.Let and utilize the orthonormality of SE, (20) is simplified to (1) The memory requirement of RPs is further decreased.In the traditional SE-MLFMA [6], the basis functions are grouped according to the center of the edges/faces on which the basis functions are defined.As a consequence, the memory requirement of RPs is proportional to the number of basis functions N.However, the improved SE-MLFMA groups the basis functions according to the barycenter of triangles/tetrahedrons containing the basis functions, which makes that the memory requirement of RPs is proportional to the total number of triangles and tetrahedrons N t .Since generally, N t is much less than N, the memory requirement of RPs is drastically reduced.
Further, according to (16), we need to store four SE coefficients (one for scalar S and the other three for vector V ) for each triangle/tetrahedron in the conventional MLFMA.With the symmetry of the RPs, the memory requirement of RPs for the conventional MLFMA in the mixed-potential form is bytes.If we define the memory cost ratio for RPs as R M = Mem sph /Mem conv then from ( 23) and ( 24), we obtain Usually, P is equal L/2 − 1 [6,7].to the conventional MLFMA, the improved SE-MLFMA shows significant decrease in the memory requirement of RPs.
(2) When MSFIE is in the computation, no extra memory requirement of RPs will be needed.
In the traditional SE-MLFMA, when MSFIE is coupled with ESFIE to form the CSFIE to analyze the objects containing closed conducting surfaces, double times core memory for the RPs of RWG basis functions will be consumed [6].However, in the improved SE-MLFMA, due to the symmetry of the triangle's RPs, the complex conjugate of ( 16) can be reused as the receiving patterns in the disaggregation process.Therefore, the memory requirement of RPs is not dependent on the type of SIE.
(3) The repeated translation between coordinates is totally eliminated.For the dyadic representation of integral equations in spherical coordinates, when all incoming waves are collected in a certain finest box, due to the discontinuities in the spherical vector components, the Gibbs phenomenon will arise if the integral operation of the spherical harmonics representation is also evaluated spherical coordinates [6].Moreover, it is found that P can be one degree smaller if the SE is performed for Cartesian vector components instead of spherical ones [6].
From the above, for the dyadic form of integral equations, two spherical components of the RPs tangential to the Ewald surface should be firstly transformed into three Cartesian components for the SE process, which need to be transformed back to the spherical coordinate system for upward aggregation.These repeated translations between different coordinates occupy a portion of the computation time of matrix-vector product (MVP).However, in the improved SE-MLFMA, not only the RPs but also the aggregation, translation, and disaggregation processes are all operated in the Cartesian coordinates.Therefore, the improved SE-MLFMA totally obviates the need for transforming the RPs between different coordinates.

Numerical Results
To illustrate the effectiveness of the improved SE-MLFMA in VSIE, two composite closed conductor-dielectric objects are simulated.RWG and SWG basis functions are used to discretize the conducting surfaces and dielectric bodies, respectively.All matrix equations are iteratively solved by a restarted GMRES solver [11] with a simple diagonal preconditioner to reach the 0.001 convergence criterion, while the restarted number is fixed to 100.The finest box size in the MLFMA is 0.2λ.All experiments are carried out on a workstation with 2.4 GHz CPU in single precision (c 2 = 8).

3.1.
Conductor-Dielectric Cylinder.The top half of the cylinder is perfectly conducting, while the bottom half is dielectric (ε r = 2 2 − j0 00198).The object is 5λ in diameter and 10λ in height, with the conducting segment in the middle.ESFIE is adopted to model the conducting cylinder part.A moderate mesh size is chosen to generate totally 833,574 unknowns with respect to 33,490 triangles and 387,318 tetrahedrons.
For the detailed comparison of the conventional MLFMA (conv) in the mixed-potential form and the improved SE-MLFMA (imp), Table 1 shows the influence of L and P on the memory cost of RPs (Mem), the number of iterations, and the computing time per MVP.The symbol P L denotes the various selections of P with a certain L. It is observed that due to the use of SE-MLFMA, a considerable core memory is saved, while the memory requirement of RPs matches with (23) or (24) that is only proportional to the total number of triangles and tetrahedrons but not that of the basis functions.
Besides, the improved SE-MLFMA is even more efficient than the conventional MLFMA in the MVP implementation, since the aggregation of the outgoing waves and the disaggregation of the incoming waves at the finest level can also be computed in a faster way by summation of the spherical harmonics instead of the integrations.Moreover, since in the improved SE-MLFMA the whole process is totally executed in the Cartesian coordinates, the repeated transformations between Cartesian and spherical coordinates which are essential in the traditional SE-MLFMA are totally eliminated, resulting in more fast MVP implementation.Figure 1 compares the bistatic radar cross section (BiRCS) by using the conventional MLFMA and the improved with different L and P. conventional MLFMA solution with finest box (L = 7) is also presented for comparison.It is found that the results with L = 5L = 6, and L = 7 show the accuracy.This indicates that in the solution of VSIE, neither a larger multipole truncation order nor a larger finest box size is needed for the improved SE-MLFMA, both of which lead to a increase of memory consumption.Besides, for L = 5, both P = 2 and 3 yield accurate results, while P = 1 leads to an obviously unacceptable deviation.The case L = 3 also shows an unacceptable deviation but is closer to the L = 5 results than P = 1, which phenomenon is the same as stated in [2].
3.2.Coated Sphere.The BiRCS of a dielectric-covered conducting sphere in diameter of 30λ and a dielectric (ε r = 3 5) coating in thickness of 0.1λ with 368,582 triangles; 1,140,829 tetrahedrons; and 3,207,256 unknowns is computed by the improved SE-MLFMA with P = 2. CSFIE is adopted to model the conducting sphere part.Moreover, to verify the scalability of this method, both of the OpenMP parallelization and the hardware acceleration technique based on vector arithmetic logic unit (VALU) are employed [8], while 12 cores are used in the hybrid-parallel computation.The related results are shown in Figure 2. High agreement between the results from the serial code and hybrid-parallel code (OpenMP and VALU) is observed, which indicated that the improved SE-MLFMA can be well paralleled.
In this simulation, the total memory requirement was about 22.9 GB, including 280.5 MB for the RPs.The number of iterations was 547.The computing time per MVP for the serial code and the hybrid-parallel code are 48.72 s and 1.82 s, respectively, while the speedup ratio is 26.8.The total computation time for the hybrid-parallel code is 1239 s.Obviously, OpenMP and VALU techniques have significant improvement on the computing efficiency.However, the speedup of the hybrid-parallel performance is not as high as that in [8].This is because in the implementation of MLFMA, MVP is formed by two parts: far-field interactions (FFI) and near-field interactions (NFI).OpenMP can accelerate both of the two parts successfully.The FFI can be well accelerated by the VALU acceleration technique, while the NFI cannot be accelerated by it at all.Compared to SIE, VSIE needs to divide the dielectric regions into 3D tetrahedrons, which leads to the increasing number of unknowns contained by each box, and the proportion of NFI part in the MVP.Because NFI cannot be accelerated by VALU, the total speedup of MVP is reduced.

Conclusion
The improved SE-MLFMA has been successfully applied to the solution of VSIE.With the triangle-/tetrahedron-based grouping scheme and the mixed-potential representation, a considerable amount of core memory requirement of RPs has been reduced without compromising accuracy and computing speed.Numerical results are presented to verify the effectiveness of the method.At last, the salability of the improved SE-MLFMA was illustrated by using OpenMP parallelization and VALU hardware acceleration technique, which can improve the computing efficiency tremendously.International Journal of Antennas and Propagation

2 .
Advantages of the Proposed Method.The advantages of the improved SE-MLFMA are stated as follows:

Figure 1 :
Figure 1: BiRCS of a conductor-dielectric cylinder by using conventional MLFMA and proposed method with different P and L.

Figure 2 :
Figure 2: BiRCS of a coated sphere with SE-MLFMA by using the serial and hybrid-parallel codes.

Table 1 :
Memory requirement by RPs and time cost per iteration versus various L and P.