Geoelectrical Data Inversion by Clustering Techniques of Fuzzy Logic to Estimate the Subsurface Layer Model

Soft computing based geoelectrical data inversion differs from conventional computing in how it handles uncertainty. It is tractable, robust, efficient, and inexpensive. In this paper, fuzzy logic clustering methods are used in the inversion of geoelectrical resistivity data. To characterize the subsurface features of the earth, one should rely on validation against true field data. This paper works with field data obtained from published results and takes an interdisciplinary approach to solving complex problems. Three clustering algorithms of fuzzy logic, namely, fuzzy C-means clustering, fuzzy K-means clustering, and fuzzy subtractive clustering, were analyzed with the help of fuzzy inference system (FIS) training on synthetic data. In this approach, a graphical user interface (GUI) was developed that integrates the three algorithms; once the input data (AB/2 and apparent resistivity) are imported, each algorithm processes them and interprets the layer model parameters (true resistivity and depth). A complete overview of the three algorithms is presented in the text. The results indicate that the fuzzy subtractive clustering algorithm gives more reliable results and show the efficacy of soft computing tools in the inversion of geoelectrical resistivity data.


Introduction
In recent years, soft computing has come to play a key role in the earth sciences. This is in part due to the subjective nature of the rules governing many physical phenomena in the earth sciences. As our problems involve nonlinear parameters of the earth, it becomes too complex to rely on only one discipline, and we find ourselves in the midst of an information explosion that calls for interdisciplinary analysis methods.
To solve complex problems, we need to rely on knowledge-based approaches rather than on standard mathematical techniques alone. We need to complement the conventional analysis methods with a number of emerging methodologies and soft computing techniques such as expert systems, artificial intelligence, neural networks, fuzzy logic, genetic algorithms, probabilistic reasoning, and parallel processing techniques.
Recent applications of soft computing techniques have already begun to enhance our ability to estimate the subsurface features of the earth. In this paper, vertical electrical sounding (VES) data obtained from the field are fed as input to the FIS training, which generates the synthetic data needed by the clustering algorithms. Once the synthetic data are available, a MATLAB-based program runs the specially designed major algorithm (Figure 2). The data processing depends on various parameters, mainly the number of iterations (user dependent) and the error percentage. The model with the lowest error percentage gives the best output (true resistivity and depth) for the subsurface.

Geophysical Method
A Schlumberger electrode array is used to study the electrical resistivity distribution of the subsurface in order to understand groundwater conditions in terms of resistivity, thickness, and depth (Figure 1). Usually the depth of penetration is proportional to the separation between the electrodes, so varying the electrode separation provides information about the stratification of the ground.
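The text does not state how apparent resistivity is obtained from a Schlumberger reading. As a minimal sketch, the standard geometric factor for this array can be applied to the measured potential difference and injected current. The function and parameter names below are illustrative assumptions, not taken from the paper.

```python
import math


def schlumberger_apparent_resistivity(ab2, mn2, delta_v, current):
    """Apparent resistivity (ohm-m) for a Schlumberger array.

    ab2     : half current-electrode spacing AB/2 in metres
    mn2     : half potential-electrode spacing MN/2 in metres
    delta_v : measured potential difference in volts
    current : injected current in amperes
    """
    if not ab2 > mn2 > 0:
        raise ValueError("require AB/2 > MN/2 > 0")
    # Standard Schlumberger geometric factor: K = pi * (L^2 - l^2) / (2 l),
    # with L = AB/2 and l = MN/2.
    k = math.pi * (ab2 ** 2 - mn2 ** 2) / (2.0 * mn2)
    return k * delta_v / current
```

Each field reading then yields one (AB/2, apparent resistivity) pair, which is the input format used throughout the paper.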
The apparent resistivity value depends on the electrical conductivities of the different rocks and minerals. Thus electrical prospecting can be carried out to understand the subsurface. The data collected from the field have been interpreted using fuzzy clustering algorithms. The FIS algorithm provides the database needed for interpretation. The best model in the trained database is the one that fits the apparent resistivity curve, and the corresponding layer model is produced as output with the lowest root mean square error within a given number of iterations.
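The selection of the best-fitting model by root mean square error can be sketched as follows. The paper does not give its exact error formula, so the relative (percentage) form used here is an assumption, and both function names are hypothetical.

```python
import math


def rms_percent_error(observed, calculated):
    """RMS of the relative misfit between two apparent resistivity curves, in percent."""
    if not observed or len(observed) != len(calculated):
        raise ValueError("curves must be non-empty and of equal length")
    rel_sq = [((o - c) / o) ** 2 for o, c in zip(observed, calculated)]
    return 100.0 * math.sqrt(sum(rel_sq) / len(rel_sq))


def best_model(observed, candidates):
    """Return (index, error) of the candidate curve with the lowest RMS percent error."""
    errors = [rms_percent_error(observed, curve) for curve in candidates]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]
```

Here `candidates` stands for the synthetic curves in the trained database, and the returned index identifies the layer model reported as output.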

Fuzzy Logic Applications on Geoscience Data Inversion
Fuzzy logic is considered appropriate for dealing with the uncertainty arising from system and human errors, which are not included in current reliability theories. The basic theory of fuzzy sets was first introduced by Zadeh [12]. In recent years, fuzzy logic, or more generally fuzzy set theory, has been applied extensively in many geophysical characterization studies. Fuzzy set theory has the ability to deal with such uncertain information and to combine it with quantitative observations. The applications are many, including resistivity inversion, magnetic studies, seismic and stratigraphic modeling, and formation evaluation.
Nordlund [14] has presented a study on dynamic stratigraphic modeling using fuzzy logic. Cuddy [15] has applied fuzzy logic to solve a number of petrophysical problems in several North Sea fields. Fang and Chen [16] also applied fuzzy rules to predict porosity and permeability from five compositional and textural characteristics of sandstone in the Yacheng Field (South China Sea). Huang et al. [17] have presented a simple but practical fuzzy interpolator for predicting permeability from well logs in the North West Shelf (offshore Australia). The basic idea was to simulate local fuzzy reasoning. Bois [18, 19] proposed the use of fuzzy set theory in the interpretation of seismic sections.

Fuzzy Inference System (FIS) Training
The fuzzy inference system is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. It has found successful applications in a wide variety of fields, such as automatic control, data classification, decision analysis, expert systems, time series prediction, robotics, and pattern recognition.
The nonlinear mapping between AB/2 values and apparent resistivity values by means of the FIS is implemented in the primary class training shown in Figure 2(a). This mapping is accomplished by a number of fuzzy if-then rules, each of which describes the local behaviour of the mapping. In particular, the antecedent of a rule defines a fuzzy region in the input space, while the consequent specifies the output in that fuzzy region for the imported data.

Clustering Tool to Invert Geoelectrical Data
The idea of data grouping, or clustering, is simple in nature and close to the human way of thinking: whenever we are presented with a large amount of data, we tend to summarize it into a small number of groups or categories in order to facilitate its analysis. Moreover, most of the data collected in many nonlinear problems related to the earth seem to have inherent properties that lend themselves to natural groupings. Nevertheless, finding these groupings or trying to categorize the data is not a simple task for humans. This is why soft computing clustering techniques have been proposed to solve the geoelectrical resistivity inversion problem.
In clustering, three algorithms were employed, namely, fuzzy C-means clustering, fuzzy K-means clustering, and fuzzy subtractive clustering. For clustering, a large number of synthetic databases were developed during the FIS training, and these play a unique role in data interpretation.

Fuzzy C-Means Clustering
Fuzzy C-means clustering (FCM) is a data clustering algorithm in which each data point belongs to a cluster to a degree specified by a membership grade. Bezdek in 1981 [20] proposed this algorithm as an improvement over the hard C-means clustering algorithm. Each of the $n$ data pairs belongs to each of the $c$ groups with a membership coefficient $u_{ij}$, the membership degree of pair $j$ to cluster $i$. Let $d_{ij}^2$ be the distance between pair $j$ and cluster $i$, basically defined as the Euclidean norm and more generally as

$$d_{ij}^{2} = (x_j - v_i)^{T} A \, (x_j - v_i), \tag{1}$$

$x_j$ being the $j$th data pair used for the clustering, $A$ being a positive definite symmetric matrix, and $v_i$ being the prototype of cluster $i$.
FCM partitions a collection of $n$ vectors obtained from the imported field data, $x_j$, $j = 1, \ldots, n$, into $c$ fuzzy groups and finds a cluster center in each group such that a cost function of dissimilarity measure is minimized. To accommodate the introduction of fuzzy partitioning, the membership matrix $U$ is allowed to have elements with values between 0 and 1. The normalised values of AB/2 and apparent resistivity data, plotted with the cluster centres, are shown in the GUI panels of Figures 4 and 8 for the corresponding data. However, imposing normalization stipulates that the summation of degrees of belongingness for a data set always be equal to unity:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \ldots, n. \tag{2}$$

The cost function (or objective function) for FCM is then a generalization of (1):

$$J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \tag{3}$$

where $u_{ij}$ is between 0 and 1; $c_i$ is the cluster center of fuzzy group $i$; $d_{ij} = \|c_i - x_j\|$ is the Euclidean distance between the $i$th cluster center and the $j$th data point; and $m \in [1, \infty)$ is a weighting exponent.
The necessary conditions for (3) to reach a minimum can be found by forming a new objective function $\bar{J}$ as follows:

$$\bar{J}(U, c_1, \ldots, c_c, \lambda_1, \ldots, \lambda_n) = J(U, c_1, \ldots, c_c) + \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{c} u_{ij} - 1 \right), \tag{4}$$

where $\lambda_j$, $j = 1$ to $n$, are Lagrange multipliers for the $n$ constraints in (2). By differentiating $\bar{J}(U, c_1, \ldots, c_c, \lambda_1, \ldots, \lambda_n)$ with respect to all its input arguments, the necessary conditions for (3) to reach its minimum are

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}, \tag{5}$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}. \tag{6}$$

The fuzzy C-means algorithm is simply an iterated procedure through the preceding two necessary conditions.
Step 1. Initialize the membership matrix $U$ with random values between 0 and 1 such that the constraints in (2) are satisfied, so that the essence of the field apparent resistivity data is not lost.
Step 2. Calculate the $c$ fuzzy cluster centers $c_i$, $i = 1, \ldots, c$, using (5).
Step 3. Compute the cost function according to (3). Stop if either it is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.
Step 4. Compute a new $U$ using (6). Go to Step 2 and repeat the steps until the performance goal is reached.
The cluster centers can also be initialized first and the iterative procedure carried out afterwards. The performance depends on the initial cluster centers, so one can either use another fast algorithm to determine the initial cluster centers or run FCM several times, each starting with a different set of initial cluster centers. At each iteration, different cluster centers are formed, and the centers on which the procedure finally converges fall on the synthetic data that correlate with the field data with minimum error percentage.
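The four steps above can be sketched in plain Python as follows. This is an illustrative sketch and not the authors' MATLAB program: the function name, the optional `u0` argument for a user-supplied initial membership matrix, and the small floor on the squared distances are assumptions added for the example, and the Euclidean distance is used throughout.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-9, u0=None, seed=0):
    """Cluster `points` (list of equal-length tuples) into `c` fuzzy groups.

    Returns (centers, u, cost), where u[i][j] is the membership of point j
    in cluster i. The weighting exponent m must be greater than 1 here.
    """
    n, dim = len(points), len(points[0])
    if u0 is None:
        # Step 1: random memberships in (0, 1).
        rng = random.Random(seed)
        u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    else:
        u = [list(row) for row in u0]
    # Normalise each column so the memberships of every point sum to 1.
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col

    prev_cost = float("inf")
    for _ in range(max_iter):
        # Step 2: centers as membership-weighted means of the data.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / total
                for d in range(dim)))
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = [[max(sum((a - b) ** 2 for a, b in zip(centers[i], points[j])), 1e-12)
               for j in range(n)] for i in range(c)]
        # Step 3: cost = sum of u_ij^m * d_ij^2; stop when it stops improving.
        cost = sum(u[i][j] ** m * d2[i][j] for i in range(c) for j in range(n))
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
        # Step 4: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        p = 1.0 / (m - 1.0)
        for j in range(n):
            for i in range(c):
                u[i][j] = 1.0 / sum((d2[i][j] / d2[k][j]) ** p for k in range(c))
    return centers, u, cost
```

In the setting of this paper, `points` would hold the normalised (AB/2, apparent resistivity) pairs. Running the function with several values of `seed` corresponds to the repeated runs with different initialisations mentioned above.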

Fuzzy K-Means Clustering
The K-means clustering [21,22] has been applied to a variety of areas, including image and speech data compression [23], data preprocessing for system modelling [24], and task decomposition [23].
The K-means algorithm partitions a collection of $n$ vectors $x_j$, $j = 1, \ldots, n$ (the input AB/2 and apparent resistivity data), into $c$ groups $G_i$, $i = 1, \ldots, c$, and finds a cluster center in each group such that a cost function (or an objective function) of dissimilarity (or distance) measure is minimized. When the Euclidean distance is chosen as the dissimilarity measure between a vector $x_k$ in group $i$ and the corresponding cluster center $c_i$, the cost function can be defined by

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{k,\, x_k \in G_i} \|x_k - c_i\|^{2} \right), \tag{7}$$

where $J_i = \sum_{k,\, x_k \in G_i} \|x_k - c_i\|^{2}$ is the cost function within group $i$. Thus, the value of $J_i$ depends on the geometrical properties of $G_i$ and the location of $c_i$.
In general, a generic distance function $d(x_k, c_i)$ can be applied for vector $x_k$ in group $i$; the corresponding overall cost function is thus expressed as

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{k,\, x_k \in G_i} d(x_k, c_i) \right). \tag{8}$$

For simplicity, the Euclidean distance is used as the dissimilarity measure and the overall cost function is expressed as in (7). The partitioned groups are typically defined by a $c \times n$ binary membership matrix $U$, where the element $u_{ij}$ is 1 if the $j$th data point $x_j$ belongs to group $i$ and 0 otherwise. Once the cluster centers $c_i$ are fixed, the minimizing $u_{ij}$ for (7) can be derived as follows:

$$u_{ij} = \begin{cases} 1 & \text{if } \|x_j - c_i\|^{2} \le \|x_j - c_k\|^{2} \text{ for each } k \ne i, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

On the other hand, if $u_{ij}$ is fixed, then the optimal center $c_i$ that minimizes (7) is the mean of all vectors in group $i$:

$$c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k, \tag{11}$$

where $|G_i|$ is the size of $G_i$, or $|G_i| = \sum_{j=1}^{n} u_{ij}$.
For a batch-mode operation, the K-means algorithm is presented with a data set $x_j$, $j = 1, \ldots, n$; the algorithm determines the cluster centers $c_i$ and the membership matrix $U$ iteratively using the following steps.
Step 1. Initialize the cluster centers $c_i$, $i = 1, \ldots, c$. In this approach they are chosen randomly within the range of the minimum and maximum values of the normalised apparent resistivity data.
Step 2. Determine the membership matrix $U$ by assigning each data point to its closest cluster center, as in (9).
Step 3. Compute the cost function according to (7). Stop if either it is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold. The number of iterations can be fixed by the user through the GUI panel.
Step 4. Update the cluster centers according to (11). Go to Step 2 and repeat the steps until the minimum error percentage is reached.
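A compact sketch of this batch-mode procedure is given below. It is illustrative only: the names `k_means`, `sq_dist`, and the optional `init` argument (user-supplied starting centers) are assumptions, and an empty group simply keeps its previous center, a choice the paper does not discuss.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, c, max_iter=100, tol=1e-9, init=None, seed=0):
    """Hard K-means; returns (centers, labels, cost) with labels[j] the group of point j."""
    dim = len(points[0])
    if init is None:
        # Step 1: random centers within the min/max range of the data.
        rng = random.Random(seed)
        lo = [min(p[d] for p in points) for d in range(dim)]
        hi = [max(p[d] for p in points) for d in range(dim)]
        centers = [tuple(rng.uniform(lo[d], hi[d]) for d in range(dim))
                   for _ in range(c)]
    else:
        centers = [tuple(ctr) for ctr in init]

    prev_cost = float("inf")
    for _ in range(max_iter):
        # Step 2: assign every point to its closest center (binary membership).
        labels = [min(range(c), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        # Step 3: cost is the sum of squared distances to the assigned centers.
        cost = sum(sq_dist(p, centers[g]) for p, g in zip(points, labels))
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Step 4: move each center to the mean of its group.
        for i in range(c):
            members = [p for p, g in zip(points, labels) if g == i]
            if members:  # an empty group keeps its previous center
                centers[i] = tuple(sum(p[d] for p in members) / len(members)
                                   for d in range(dim))
    return centers, labels, cost
```

Because the result depends on the starting centers, calling the function with different `seed` values, or passing centers from another method through `init`, corresponds to the front-end initialisation strategies described in the text.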
The algorithm is inherently iterative, and the performance of the K-means algorithm depends on the initial positions of the cluster centers. It is therefore advisable either to employ some front-end method to find good initial cluster centers or to run the algorithm several times, each with a different set of initial cluster centers. The synthetic data are thereby smoothed so that they correlate with the actual field data.

Fuzzy Subtractive Clustering
The subtractive clustering technique was proposed in [25]; in it, data points (not grid points) are considered as the candidates for cluster centers. By using this method, the computation is simply proportional to the number of resistivity data points and independent of the dimension of the inverse problem under consideration.
Consider a collection of $n$ data points $\{x_1, \ldots, x_n\}$ in an $M$-dimensional space. Without loss of generality, the apparent resistivity data points are assumed to have been normalized within a hypercube. Since each data point is a candidate for cluster centers, a density measure at data point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\left( -\frac{\|x_i - x_j\|^{2}}{(r_a/2)^{2}} \right),$$

where $r_a$ is a positive constant. Hence, a data point will have a high density value if it has many neighbouring data points. The radius $r_a$ defines a neighbourhood; apparent resistivity data points outside this radius contribute only slightly to the density measure.
After the density measure of each data point has been calculated, the data point with the highest density measure is selected as the first cluster center. Let $x_{c_1}$ be the point selected and $D_{c_1}$ its density measure. Next, the density measure for each data point $x_i$ is revised by the formula

$$D_i = D_i - D_{c_1} \exp\left( -\frac{\|x_i - x_{c_1}\|^{2}}{(r_b/2)^{2}} \right),$$

where $r_b$ is a positive constant. Therefore, the data points near the first cluster center $x_{c_1}$ will have significantly reduced density measures, thereby making the points unlikely to be selected as the next cluster center. The constant $r_b$ defines a neighbourhood that has measurable reductions in density measure. The constant $r_b$ is normally larger than $r_a$ to prevent closely spaced cluster centers; generally $r_b$ is equal to $1.5\, r_a$.
After the density measure for each data point is revised, the next cluster center $x_{c_2}$ is selected and all of the density measures for data points are revised again. This process is repeated until a sufficient number of cluster centers are generated.
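The density computation and its revision can be sketched as below. The sketch assumes the usual Gaussian form of the density measure with radii $r_a$ and $r_b = 1.5\,r_a$, and it stops after a fixed, user-chosen number of centers, which is a simplification of "a sufficient number of cluster centers". The function and argument names are illustrative.

```python
import math


def subtractive_clustering(points, r_a=0.5, r_b=None, n_centers=2):
    """Pick `n_centers` cluster centers from `points` (tuples normalised to a unit hypercube)."""
    if r_b is None:
        r_b = 1.5 * r_a  # common choice, prevents closely spaced centers
    alpha = 1.0 / (r_a / 2.0) ** 2
    beta = 1.0 / (r_b / 2.0) ** 2

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Density measure: every data point is a candidate cluster center.
    density = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
               for p in points]

    centers = []
    for _ in range(n_centers):
        # The point with the highest (remaining) density becomes a center.
        best = max(range(len(points)), key=density.__getitem__)
        d_best = density[best]
        centers.append(points[best])
        # Subtract the influence of the new center from every density value.
        density = [d - d_best * math.exp(-beta * sq_dist(p, points[best]))
                   for d, p in zip(density, points)]
    return centers
```

Unlike the two previous methods, no random initialisation is involved, so the same data always yield the same centers.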
When applying subtractive clustering to a set of input-output data, each of the cluster centers represents a prototype that exhibits certain characteristics of the system to be modelled. These cluster centers can reasonably be used as the centers for the fuzzy rules' premise in a zero-order Sugeno fuzzy model. For instance, assume that the center for the $i$th cluster is $c_i$ in an $M$-dimensional space. The center $c_i$ can be decomposed into two component vectors $p_i$ and $q_i$, where $p_i$ is the input part and contains the first $N$ elements of $c_i$, and $q_i$ is the output part and contains the last $M - N$ elements of $c_i$. Then, given an input vector $x$, the degree to which fuzzy rule $i$ is fulfilled is defined by

$$\mu_i = \exp\left( -\frac{\|x - p_i\|^{2}}{(r_a/2)^{2}} \right).$$

After these procedures are completed, more accuracy can be gained by using gradient descent or other advanced derivative-based optimisation schemes for further refinement.
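As a hedged illustration of how such cluster centers could drive a zero-order Sugeno model, the sketch below splits each center into an input part and an output part, computes the degree of fulfilment of each rule, and combines the output parts by a weighted average. The weighted-average step is the standard zero-order Sugeno output and is an assumption here, since the text only defines the degree of fulfilment. All names are hypothetical.

```python
import math


def sugeno_zero_order(x, centers, n_inputs, r_a=0.5):
    """Zero-order Sugeno output for input tuple `x`, with rules built from cluster centers.

    Each center is a tuple whose first `n_inputs` entries form the input part
    and whose remaining entries form the output part.
    """
    alpha = 1.0 / (r_a / 2.0) ** 2
    # Degree of fulfilment of each rule: Gaussian in the distance to the input part.
    weights = [math.exp(-alpha * sum((a - b) ** 2
                                     for a, b in zip(x, ctr[:n_inputs])))
               for ctr in centers]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("input is too far from every cluster center")
    n_outputs = len(centers[0]) - n_inputs
    # Weighted average of the output parts of all rules.
    return [sum(w * ctr[n_inputs + k] for w, ctr in zip(weights, centers)) / total
            for k in range(n_outputs)]
```

For a sounding curve, `x` would be a normalised AB/2 value and the output a normalised apparent resistivity, so that an input lying between two centers receives a blend of their outputs.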

Step by Step Procedure
The GUI panel follows the algorithm path shown in the flowchart of Figure 2.
(i) The AB/2 and apparent resistivity data needed for interpretation are imported with the push button shown in Figure 3(a).
(ii) The imported data are shown in the table panel of Figure 3(c).
(iii) The optional slider (Figure 3(b)) is used to set the number of iterations for which the clustering tool will run.
(iv) The push button of Figure 3(d) lets the user modify the imported table values. After editing the necessary values, the user can click the push button below the table of the GUI to import the modified data.
(v) Figure 3(e) provides the push buttons for the different clustering algorithms; the corresponding program runs after the necessary data have been imported.
(vi) The output layer model and cluster panel graph are shown in Figures 3(f) and 3(g), respectively.
(vii) A running message is shown in the GUI panel of Figure 3(h).
(viii) After the algorithm has finished iterating, the user can save the respective plots and exit by pressing the corresponding push buttons.

Results and Discussions
For validation and comparative analysis, the algorithms were applied to resistivity data from different geological regions, and the resulting performance is the basis for the conclusions. The performance measures show that the subtractive clustering algorithm performs better than the other two algorithms in this application. In the subtractive clustering technique, the neighbourhood radius adjustment based on the density measure of each apparent resistivity data point converged in every set of iterations, which accounts for the good performance. If the raw field data contain more noise or field errors, the convergence rate will be slower, but convergence can still be achieved by increasing the number of iterations. Moreover, the adjustments for dimensionality have already been made in each of the clustering algorithms, so no dimensionality problems occur while iterating. Data 1 was chosen from the Singhbhum Shear Zone of Jaduguda, Jharkhand, India [26]. The GUI panel of Figure 3 shows the interpreted model with successful clustering classifications based on centres.
Figures 4(a)-4(c) show the cluster centres of the fuzzy C-means, fuzzy K-means, and fuzzy subtractive clustering algorithms, respectively.
The performance measure graph is shown in Figure 6. Data 2 was chosen from Kamuli district, Eastern Uganda [27]. The fuzzy-based inverted results of Data 2 are shown in the GUI panel of Figure 7. Figures 9 and 10 show the output panel and performance plot for data 2 (see Figure 5 for the data 1 output panel). One of the major reasons for the successful interpretation by the fuzzy subtractive clustering technique is that, at every iteration of the algorithm, the cluster centre is revised on the basis of the density measure of each resistivity data point. In this geoelectrical-computational approach, it is essential to update the density measure of each data point, as an apparent resistivity curve can fit many models while iterating. The algorithm stops once a reliable model is obtained, keeping the root mean square error as low as possible while limiting the computational time.
The performance of these techniques was compared using the two datasets along with their lithologs. The performance measures for data 1 and data 2 are tabulated in Tables 1 and 2, respectively, and indicate that subtractive clustering is the best of the three algorithms for the inversion of geoelectrical resistivity data. On the other hand, fuzzy -means clustering suffers from inconsistent performance. Fuzzy -means provides accuracy comparable to that of fuzzy subtractive clustering, but it falls short of producing more accurate results because of the way the cluster centers are adjusted and updated. This may cause loss of original information of the field data. Therefore the overall result indicates that the performance of fuzzy subtractive clustering is satisfactory for the problem under consideration.

Conclusion
Three clustering algorithms, namely, fuzzy C-means clustering, fuzzy K-means clustering, and fuzzy subtractive clustering, have been applied in this paper for the inversion of geoelectrical resistivity data. These approaches solve the problem of categorizing the results to predict the appropriate layer model. The FIS training provides the synthetic data necessary for framing the clusters. The three clustering algorithms have been implemented and tested with the field data based on the cluster centers, and the results obtained from all three algorithms do not deviate from the earlier results. It was evident from the inverted results that the fuzzy -means algorithm fails to provide appropriate results compared to the other two algorithms. Subtractive clustering appears to perform better than the fuzzy -means algorithm, since the density measure calculated at each iteration is very helpful in modelling the subsurface parameters. Finally, the clustering techniques discussed in this paper can be used as a standalone approach for the interpretation of the subsurface layer model.
International Journal of Geophysics

Figure 2: Flowchart showing the specially designed fuzzy clustering algorithm with (a) primary class training and (b) major class training.

Figure 2(b) thus provides the major class training for the inversion of synthetic data to layer model based on the following clustering algorithms.

Figures 8(a)-8(c) show the cluster centres of the fuzzy C-means, fuzzy K-means, and fuzzy subtractive clustering algorithms, respectively.

Figure 3: GUI panel showing the fuzzy subtractive clustering inversion for data 1.

Figure 5: Output panel showing the fuzzy subtractive clustering inverted model for data 1.

Figure 7: GUI panel showing the fuzzy subtractive clustering inversion for data 2.

Figure 9: Output panel showing the fuzzy subtractive clustering inverted model for data 2.
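For the K-means membership matrix, the data point $x_j$ belongs to group $i$ if $c_i$ is the closest center among all centers, that is, $u_{ij} = 1$ if $\|x_j - c_i\|^{2} \le \|x_j - c_k\|^{2}$ for each $k \ne i$, and $u_{ij} = 0$ otherwise. Since a given data point can only be in one group, the membership matrix $U$ has the following properties:

$$\sum_{i=1}^{c} u_{ij} = 1, \ \forall j = 1, \ldots, n; \qquad \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} = n. \tag{10}$$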

Table 1: Accuracy of the three fuzzy clustering algorithms for data 1.

Table 2: Accuracy of the three fuzzy clustering algorithms for data 2.