Three-Dimensional Induced Polarization Parallel Inversion Using Nonlinear Conjugate Gradients Method

Four kinds of array of induced polarization (IP) methods (surface, borehole-surface, surface-borehole, and borehole-borehole) are widely used in resource exploration. However, due to the presence of large amounts of the sources, it will take much time to complete the inversion. In the paper, a new parallel algorithm is described which usesmessage passing interface (MPI) and graphics processing unit (GPU) to accelerate 3D inversion of these four methods. The forward finite differential equation is solved by ILU0 preconditioner and the conjugate gradient (CG) solver. The inverse problem is solved by nonlinear conjugate gradients (NLCG) iteration which is used to calculate one forward and two “pseudo-forward”modelings and update the direction, space, andmodel in turn. Because each source is independent in forward and “pseudo-forward” modelings, multiprocess modes are opened by calling MPI library. The iterative matrix solver within CULA is called in each process. Some tables and synthetic data examples illustrate that this parallel inversion algorithm is effective. Furthermore, we demonstrate that the joint inversion of surface and borehole data produces resistivity and chargeability results are superior to those obtained from inversions of individual surface data.


Introduction
IP methods are important in geophysical electrical surveys.Surface exploration is used for detecting metallic and nonmetallic minerals, using various observational techniques.Borehole-surface, surface-borehole, and borehole-borehole exploration methods are also widely used in mining field contexts and give improved results for the prospecting for ores, oil, and gas.
Many previous resistivity and IP inversion methods have been developed, including 2D least-squares inversion [1], a 3D resistivity inversion using alpha center method [2], many linear and nonlinear IP inversion methods [3,4], and a 3D IP conjugate inversion method [5].
Since 3D problems are very much larger than 1D and 2D problems, their solution is correspondingly more costly in computer time.Parallel MPI and GPU algorithms are generally used for numerical calculation.MPI is a library that is called by Fortran77, Fortran90, C, and C++ software and serves to communicate between processes.It has two basic patterns: the equal pattern and the principal-subordinate pattern [6].
GPUs are parallel processors with a large number of computation units and are superior to CPUs in both processing capability and memory bandwidth.They are cheaper and consume less power, and they are therefore used for large-scale, high-performance computation work [7,8].Programs are written in the familiar C, C++, Fortran, or an ever-expanding list of supported languages [9].CULA is a set of GPU-accelerated linear algebra libraries utilizing the NVIDIA CUDA parallel computing platform to dramatically improve the speed of advanced mathematical computation.In other words, to use these functions an NVIDIA GPU with CUDA support is required.The library contains CULA Dense and CULA Sparse.
Solution is very slow for forward and "pseudo-forward" equations using the nonlinear conjugate gradients method.Therefore, a set of 3D IP forward and inversion parallel algorithms are needed for surface, borehole-surface, surfaceborehole, and borehole-borehole techniques.This paper 2 Mathematical Problems in Engineering introduces a parallel algorithm developed by parallel programming to improve 3D NLCG IP inversion.First the theory of the NLCG algorithm for 3D IP inversion is presented.Second, implementation of the MPI and GPU parallel algorithm is briefly introduced.Finally, the efficiency of inversion codes is analyzed, with tables and synthetic data examples.

Forward
The electrical potential for arbitrary conductivity in a halfspace is given by the differential equation: where  is the conductivity of rocks (S/m); U  denotes the potential (V) in absence of IP effect; I is the electric current.
The quantity on the right-hand side is in Ampere/meter where  is the angle between the radial distances from the source point to the outward normal spatial coordinate n on the outer boundary.Then, the forward mapping operator F is defined as If ground is chargeable, then the potential U  , which is induced by the application of a constant current will be different from U  .According to Siegel's formulation [11], with the conductivity replaced by  * = (1 − ), the effect of the chargeability of ground can be modeled by direct current resistivity forward mapping operator F.Then, apparent chargeability  a can be defined by equation where  * a and  a are the apparent resistivity computed by the potential U  and U  , respectively.So the apparent chargeability can be computed by two DC resistivity modelings with conductivity  * and .
2.1.Finite Difference Method.Usually, there are two ways to discretize the differential equation: the finite difference method and the finite element method.The finite difference method was applied for 3D forward modeling in the present study.The matrix form of (1) can be written as where A is the coefficient matrix; U is the potential vector; and b is a vector containing the locations of the positive and negative current sources.There are many numerical methods for solving linear equations; of these, the ILU0 preconditioner and the BICGSTAB iterative method [12] was chosen to solve (5) in serial algorithm.The ILU0 preconditioner is an incomplete LU factorization with zero fill-in [13]: where L is the lower triangular and U is the upper triangular.

Forward Test.
In order to test the accuracy of the forward algorithm, we compared our result of using the proposed BICGSTAB method with results from the preconditioned conjugate gradient (CGPC) algorithm [14].The model in the present study consisted of a 10 Ω⋅m rectangular prism measuring 40 m × 40 m × 50 m embedded in a 100 Ω⋅m homogeneous half-space.The top of the model was located 30 m below ground surface.Four data acquisition methods were used for recording pole-pole data: surface-surface, surface-borehole, borehole-surface, and borehole-borehole (Figure 1).Their responses are shown in Figure 2, which indicates that the proposed BICGSTAB algorithm produced highly accurate results.

Inverse
Based on the result of resistivity result, nonlinear conjugate gradients IP inversion was adopted in this study since it constrains the model and is a nonlinear inversion method [15].The objective function in the problem is in two parts: where  obs is a vector of the observed data;  ai is a vector of the calculated data obtained by the forward algorithm in the th inversion iteration; C d is the random error matrix;  is the Lagrange multiplier that balances the effect of data misfit and model regularization during the iteration;  ref is the a priori model parameters vector;  is the model parameters vector at a given iteration; and L is the second-difference Laplacian.
The corresponding gradients of the objective function are expressed as where J is the Jacobian matrix and e is the vector containing data residuals.The initial search direction p 0 = h 0 is obtained from where C is the precondition factor, written as where  is a specific scalar.The search direction in the th inversion iteration can be defined: The updated step size is then obtained from The gradients and updated step are important parts of the inversion; Jp and J  C −1 d e are calculated directly as the characteristics of nonlinear conjugate gradients inversion.From ( 4) and ( 5), the sensitivity matrix J can be derived as

MPI and GPU Parallel Programing
The main parallel computations in the NLCG inversion of 3D resistivity data are the forward and pseudo-forward modeling.Forward and pseudo-forward modeling of the same source is mainly done to form the coefficient matrix A and the right-hand vector b and solve the equation in a similar fashion to (5).Because forward and pseudo-forward modeling is independent for each source, MPI parallel computation is suitable for this calculation.
We developed a parallel procedure based on the 3D NLCG serial procedure by calling Linux system interfaces with Intel Fortran complier and used the principalsubordinate pattern for programming.The main process controls the assigning and sending tasks for every process and of reclaiming and assembling the results from other processes.The subordinate processes calculate the task sent from the main process and send the result back.3D parallel inversion divides the 3D forward and pseudo-forward modeling tasks into several subordinate processes which are implemented in separate parallel execution nodes.The main process (NO.0) reads all input parameters and broadcasts command messages to all subordinate processes.The subordinate processes finished calculation and sent result back to main process.
A workstation with a GPU card (NVIDIA Tesla C2075) is very fast for computation.Our parallel code was developed using CUDA (6.0) with the solver of CULA Sparse (S6) library.The BICGSTAB iteration method used in the serial code for solving linear equations was replaced by calling CULA Sparse library that provides iterative solvers for sparse systems.Therefore, the CULA iterative matrix solver is called during forward and pseudo-forward modeling in different processes.Table 1 shows the iteration times and runtimes of six different methods.These methods choose the same ILU0 preconditioner, double precision data type, and compressed sparse row (CSR) matrix storage format."Iteration times" Mathematical Problems in Engineering   are the iteration times when achieving the convergence error 10 −6 ."Overhead time" means the time that the memory transfers to GPU. "Preconditioner time" is the time for the preconditioner generative portion of the iterative solver."Solver time" represents the time spent on iteration solver."Total time" is the sum of overhead time, preconditioner time, and solver time.It is easy to know (from Table 1) that the CG method is the fastest one; the GMRES method spends the smallest number of iteration times.In our program, overall consideration, the CSR matrix storage format, ILU0 preconditioner, and CG iteration solver were chosen for solving linear equations.The details of our parallel programming are shown in Figure 3.The flowchart of NLCG inversion shows that one forward modeling and two pseudo-forward modelings are conducted for each inversion iteration.

Examples with Synthetic Data.
To reflect the influence of borehole data in the inversion, we carried out four field scenarios for a synthetic simple resistivity model consisting of a 10 Ω⋅m and 20% chargeability rectangular prism (40 m × 40 m × 70 m) embedded in a 100 Ω⋅m and 1% chargeability background.The top of the model is 30 m below ground surface (Figure 4).The input apparent chargeability with 5% Gaussian random noise was generated by applying 3D IP forward finite difference modeling algorithm for these four tests.The reference chargeability models were a homogeneous 1% half-space in the inversion.And all the four experiments were run using a fixed Lagrange multiplier  = 5000 used to balance the effect of data misfit and model regularization.When one source starts working, all the receivers will record using pole-dipole.
For the first test, 189 current electrodes and 189 receiver electrodes were distributed equally along nine survey lines on the surface (Figure 5).Data was recorded using leftand right-side pole-dipole (AMN and MNB) in this surface resistivity method.The minimum distance between each consecutive electrode was 10 m and maximum distance was 50 m, giving a total of 33,660 data points for the entire survey.The vertical sections of resistivity and chargeability inversion result are illustrated in Figure 6 from  direction.Both the resistivity and chargeability anomalies are located near the top of the block, not at its center, and the target values of anomalies are not recovered.Furthermore, the boundary of the anomalous body is not delineated very well, especially in the bottom of block.
In the second test, the first test was repeated, but with the original distribution of electrodes and receivers enhanced by 28 pairs of additional receivers placed at similar positions in two boreholes.Thus, for each source, 28 additional independent voltage measurements were taken from two boreholes, combining the surface and surface-borehole methods.The minimum distance between borehole electrodes was 10 m, and maximum distance was 50 m (Figure 7).The total number of data points was 38,952.Figure 8 shows the vertical sections of resistivity and chargeability inversion result.Although the anomaly is not located in the center of block, it is closer than in the first test.And the boundary of the anomalous body is delineated more clearly than the result of first test.Overall, the top of block is recovered better; because of the data from borehole, the anomaly is stretched in the direction.
The third experiment combined the borehole-surface and borehole-borehole methods.The distributions of receivers were the same as in the second test, and 28 current electrodes were put in similar positions into two boreholes (Figure 9) and total 5,824 sets of voltage data were joined to the inversion calculation.Figure 10 shows improved inversion results compared to those from the first and second tests: the anomalous body appears at the center of the block, and its boundary is clearly delineated; furthermore, the bottoms of anomalies are relatively wider.
In the fourth and final test, all the receivers and current electrodes from test one to test three were used (Figure 11), combining the surface, surface-borehole, borehole-surface, and borehole-borehole methods.The entire experiment yielded 44,776 sets of synthetic data.The inversion captured the correct location of the block in the vertical sections and produced the best result of the four tests.Compared with second and third test, the boundary is more clearly seen in the vertical section (Figure 12).The magnitudes of the resistivity and chargeability values and the position of the anomaly matched the theoretical model reasonably well.

Computation Efficiency.
In order to test the GPU computation efficiency, the runtime of linear equation computation with GPU was compared with CPUs in the same method, processes, sources, and iteration times (see Table 2).Next, we tested the computation efficiency of the parallel algorithm of NLCG.Some four-way tests were carried out in two modes (MPI and MPI + GPU): (i) a parallel algorithm test using       a single machine with one process; (ii) a parallel algorithm test using two machines with one process in each; (iii) a parallel algorithm test using three machines with one process in each; and (iv) a parallel algorithm test using four machines with one process in each.All the four tests with the grid size of 67 × 67 × 65 were computed by ten inversion iteration times.Table 3 shows the statistical runtimes for the prism model inversion for the four tests."Runtime (hours)" is the time spent until all the processes stop."Percentage of serial algorithm (%)" is the time spent by the parallel algorithm as a percentage of the time spent by the serial algorithm.
In Table 2, the runtime with GPU is about 3.8 times the runtime with CPU.Table 3 shows that the efficiency of the algorithm is greatly enhanced by MPI + GPU parallel computation, especially when using four machines.The runtime using MPI + GPU is about 3.8 times the runtime using MPI alone, with the same number of machines and processes.

Conclusion
Based on the analysis of NLCG algorithm for 3D IP inversion with MPI and GPU parallel programming, we developed a parallel NLCG 3D resistivity inversion code for data generated by surface, borehole-surface, surface-borehole, and borehole-borehole IP methods.The inversion results show that data from the borehole improves the quality of the inversion and delineates the boundary of an anomalous body more clearly.The borehole data can constrain the bottom of the anomalous body very well; similarly, the surface data can constrain the top of block.We tested the parallel code using synthetic data and compared it with a serial procedure.This showed that our proposed parallel algorithm not only is effective but also greatly improved the speed of inversion.It also shows that the computing speed of a workstation with a GPU card (NVIDIA Tesla C2075) is faster than one with CPU cores.The tables demonstrate that the CG method of CULA library is suitable for double precision data type and symmetric positive definite coefficient matrix; the equation solved by the iterative method using GPU is faster than that with CPU; the parallel algorithm accelerates inversion in cases when high efficiency is required; furthermore, the use of larger numbers of machines enhances this effect.

Figure 2 :
Figure 2: Forward modeling results from the BICGSTAB and CGPC numerical methods: (a) surface method (transmitters and receivers both on surface); (b) surface-borehole method (transmitters on the surface and receivers in borehole); (c) borehole-surface method (transmitters in borehole and receivers at surface); (d) borehole-borehole method (transmitters and receivers both in borehole).

1 Figure 3 :
Figure 3: Flowchart of parallel algorithm of the 3D IP NLCG inversion.Rectangles and ovals indicate main process and subordinate process, respectively.The cards on the right-hand side of the ovals indicate that the iterative matrix solvers are called for forward and pseudo-forward modeling.

Figure 6 :
Figure 6: Inversion results for surface method in the main profile ( = 0 m); (a) shows corresponding resistivity inversion results; (b) shows corresponding inversion chargeability results.

Figure 7 :
Figure 7: Observation system for surface-borehole and surface method; (a) shows 3D graph of acquisition mode; (b) shows synthetic model and electrodes in the  direction main profile; (c) shows surface electrodes and projection of anomalous body on the ground (black crosses = current electrodes; red circles = receivers).

Figure 8 :
Figure 8: Inversion results for surface-borehole and surface method in the main profile ( = 0 m); (a) shows corresponding resistivity inversion results; (b) shows corresponding inversion chargeability results.

Figure 9 :
Figure 9: Observation system for combined borehole-surface and borehole-borehole method; (a) shows 3D graph of acquisition mode; (b) shows synthetic model and electrodes in the  direction main profile; (c) shows surface electrodes and projection of anomalous body on the ground (black crosses = current electrodes; red circles = receivers).

Table 1 :
Iteration times and runtimes of different method for linear equation computation.

Table 2 :
Statistical runtimes for linear equation computation.

Table 3 :
Statistical runtimes for the prism model inversion.