Prediction of Hydrocarbon Reservoirs Permeability Using Support Vector Machine

Permeability is a key parameter in the characterization of any hydrocarbon reservoir. In fact, many petroleum engineering problems cannot be solved accurately without an accurate permeability value. The conventional methods for permeability determination are core analysis and well test techniques, both of which are very expensive and time consuming. Therefore, artificial neural networks have often been used to identify the relationship between well log data and core permeability. More recently, work on artificial intelligence techniques has introduced a robust machine learning methodology called the support vector machine (SVM). This paper uses the SVM to predict the permeability of three gas wells in the Southern Pars field. The results show that the correlation coefficient between core and predicted permeability is 0.97 for the testing dataset. Comparing the SVM with a general regression neural network (GRNN) revealed that the SVM approach is faster and more accurate than the GRNN in predicting hydrocarbon reservoir permeability.


Introduction
From the point of view of reservoir engineering, reservoir management, and enhanced recovery design, permeability is the most important rock parameter affecting fluid flow in a reservoir. Knowledge of rock permeability and its spatial distribution throughout the reservoir is of utmost importance. In fact, the key parameter for reservoir characterization is the permeability distribution. However, in most reservoirs, permeability measurements are rare, and permeability must therefore be predicted from the available data. Thus, the accurate estimation of permeability can be considered a difficult task. Permeability is generally measured in the laboratory on cored rocks taken from the reservoir, or it can be determined by well test techniques. The well testing and coring methods are, however, very expensive and time consuming compared to wire-line logging techniques [1]. Moreover, in a typical oil or gas field, almost all wells are logged using various tools to measure geophysical parameters such as porosity and density, while both well test and core data are available only for a few wells [2, 3]. As well log data are usually available for most wells, many researchers have attempted to predict permeability by establishing statistical empirical correlations between permeability, porosity, water saturation, and other physical properties of rocks. This technique has been used with some success in sandstone and carbonate reservoirs, but it often falls short of the accuracy required for permeability prediction from well log data in heterogeneous reservoirs. Proton magnetic resonance (PMR) is another modern approach for predicting permeability as a continuous log, but it has significant technological constraints [4].
Alternatively, neural networks have been increasingly applied to predict reservoir properties using well log data [5-7]. Previous investigations [8-15] have revealed that the neural network is a proper tool for identifying the complex relationship among permeability, porosity, fluid saturations, depositional environments, lithology, and well log data. However, recent work on artificial intelligence has produced a suitable machine learning theory called the support vector machine (SVM). Support vector machines, based on the structural risk minimization (SRM) principle [16], seem to be a promising method for data mining and knowledge discovery. The SVM was introduced in the early 1990s as a nonlinear solution for classification and regression tasks [17, 18]. It stems from the framework of statistical learning theory, or Vapnik-Chervonenkis (VC) theory [19, 20], and was originally developed for pattern recognition problems [21, 22]. VC theory is, to date, the most successful tool for accurately describing the capacity of a learned model, and it further shows how to ensure generalization performance on future samples [23] by controlling the capacity of the model. The theory is mainly based on the consistency and rate of convergence of a learning process [20]. The VC dimension and the SRM principle are the two most important elements of VC theory. The VC dimension is a measure of the capacity of a set of functions, and the SRM principle aims to ensure that the model generalizes well. The SVM is firmly rooted in VC theory and has advantages over traditional learning methods. There are at least three reasons for the success of the SVM: its ability to learn well with only a very small number of parameters, its robustness against errors in the data, and its computational efficiency compared with several other intelligent computational methods such as neural networks and fuzzy networks [24, 25]. By minimizing the structural risk, the SVM works well not only in classification [26-28] but also in regression [29, 30]. It was soon introduced into many other research fields, for example, image analysis [31, 32], signal processing [33], drug design [34], gene analysis [35], climate change [36], remote sensing [37], protein structure/function prediction [38], and time series analysis [39], and it usually outperformed the traditional statistical learning methods [40]. Thus, the SVM has been receiving increasing attention and has quickly become an active research field.
The objective of this study is to evaluate the ability of the SVM to predict reservoir permeability. To this end, the Kangan and Dalan gas reservoirs in the Southern Pars field, Iran, are selected as the case study. In addition, the results of the SVM are compared with those given by a general regression neural network (GRNN).

Study Area
The Iranian South Pars field is the northern extension of Qatar's giant North Field. It covers an area of 500 square miles and is located 3,000 m below the seabed at a water depth of 65 m. The Iranian side accounts for 10% of the world's and 60% of Iran's total gas reserves. Iran's portion of the field contains an estimated 436 trillion cubic feet of gas. The field consists of two independent gas-bearing formations, Kangan (Triassic) and Dalan (Permian). Each formation is divided into two different reservoir layers, separated by impermeable barriers. The field is part of the N-trending Qatar Arch structural feature, which is bounded by the Zagros fold belt to the north and northeast. In the field, gas accumulation is mostly limited to the Permian-Triassic stratigraphic units. These units, known as the "Kangan-Dalan Formations," constitute very extensive natural gas reservoirs in the Persian Gulf area and consist of a carbonate-evaporite series also known as the Khuff Formation [41]. Figure 1 shows the geographical position of the Southern Pars gas field.

Data Set
The main objective of this study is to predict the permeability of the gas reservoirs by incorporating well logs and core data of three gas wells in the Southern Pars field. The well logs are taken as the inputs, whereas the horizontal permeability ($k_h$) is taken as the output of the networks. The available digitized well logs consist of the sonic log (DT), gamma ray log (GR), compensated neutron porosity log (NPHI), density log (RHOB), photoelectric factor log (PEF), microspherical focused resistivity log (MSFL), and shallow and deep laterolog resistivity logs (LLS and LLD). For the present study, a total of 175 well log and core permeability data sets were obtained from 3 wells of the Kangan and Dalan gas reservoirs. In view of the requirements of the network computation algorithms, the data of the input and output variables were normalised to an interval by a transformation process. In this study, the inputs and outputs were normalised to the range [−1, 1] using (3.1), and the training (125) and testing (50) data were then selected randomly:

$$p_n = 2\,\frac{p - p_{\min}}{p_{\max} - p_{\min}} - 1, \tag{3.1}$$

where $p_n$ is the normalised parameter, $p$ denotes the actual parameter, $p_{\min}$ represents the minimum of the actual parameters, and $p_{\max}$ stands for the maximum of the actual parameters. In addition, leave-one-out (LOO) cross-validation of the whole training set was used for adjusting the associated parameters of the networks [42].
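The normalisation to [−1, 1] and the random 125/50 split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the seed, and the sample gamma-ray values are hypothetical.

```python
import random


def normalise(values):
    """Map raw values onto [-1, 1]: p_n = 2 (p - p_min) / (p_max - p_min) - 1."""
    p_min, p_max = min(values), max(values)
    if p_max == p_min:
        raise ValueError("a constant column cannot be normalised")
    return [2.0 * (p - p_min) / (p_max - p_min) - 1.0 for p in values]


def random_split(n_samples, n_train, seed=0):
    """Return shuffled index lists for the training and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability
    return indices[:n_train], indices[n_train:]


# Hypothetical gamma-ray readings, for illustration only.
gr_log = [20.0, 35.0, 50.0, 80.0]
gr_normalised = normalise(gr_log)

# 175 samples split into 125 training and 50 testing samples, as in the text.
train_idx, test_idx = random_split(175, 125)
```

In practice the minimum and maximum would be taken per variable (each well log and the permeability), so that every column spans the same interval.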

Independent Component Analysis
Independent component analysis (ICA) is a suitable method for feature extraction. Unlike principal component analysis (PCA), this method both decorrelates the input signals and reduces higher-order statistical dependencies [43, 44]. ICA is based on the non-Gaussian assumption of the independent sources and uses higher-order statistics to reveal interesting features. The applications of the IC transformation include dimension reduction, extraction of image characteristics, anomaly and target detection, feature separation, classification, noise reduction, and mapping. Compared with principal component (PC) analysis, IC analysis provides some unique advantages: (i) PC analysis is an orthogonal decomposition based on covariance matrix analysis and a Gaussian assumption, whereas IC analysis is based on the non-Gaussian assumption of the independent sources; (ii) PC analysis uses only second-order statistics, whereas IC analysis uses higher-order statistics. The use of higher-order statistics is a stronger statistical assumption, revealing interesting features in the usually non-Gaussian datasets [45, 46].

The ICA Algorithm
Independent component analysis of the random vector $x$ is defined as the process of finding a linear transform $s = Wx$, where $W$ is a linear transformation matrix, such that the components $s_i$ are as independent as possible. They maximize a function $F(s_1, \ldots, s_m)$ that provides a measure of independence [46]. The individual components $s_i$ are independent when their joint probability distribution factorizes, that is, when there is no mutual information between them, which can be written as (3.2):

$$p(s_1, \ldots, s_m) = \prod_{i=1}^{m} p_i(s_i). \tag{3.2}$$

The approach used to minimize the mutual information involves maximizing the joint entropy $H(g(s))$. This is accomplished using a stochastic gradient ascent method, termed Infomax. If the nonlinear function $g$ is the cumulative density function of the independent components $s_i$, then this method also minimizes the mutual information. The procedure of transforming data into a higher dimension is shown in Figure 2.
Notice that the nonlinear function $g$ is chosen without knowing the cumulative density functions of the independent components. In case of a mismatch, the algorithm may not converge to a solution [47]. A set of nonlinear functions has been tested, and it was found that super-Gaussian probability distribution functions converge to an ICA solution when the joint entropy is maximized. The optimization step in obtaining the independent components relies on changing the weights according to the entropy gradient, which can be expressed as (3.3):

$$\Delta W \propto \frac{\partial H(y)}{\partial W} = E\left[\frac{\partial \ln |J|}{\partial W}\right], \tag{3.3}$$

where $E$ is the expected value, $y = [g(s_1) \cdots g(s_n)]^T$, and $|J|$ is the absolute value of the determinant of the Jacobian matrix of the transformation from $x$ to $y$. From this formula, $\Delta W$ can be calculated using Equation (3.4). Equation (3.4) involves a matrix inversion, and thus an alternative formulation (3.5) involving only simple matrix multiplication is preferred [43, 47].
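For orientation, the two update rules referred to as (3.4) and (3.5) are commonly written in the Infomax literature in the forms below. This is a hedged sketch of the standard textbook expressions, not a reproduction of the authors' equations. The score function $\varphi$ is notation assumed here, and the closed form $\varphi(s) = 1 - 2y$ holds only for a logistic nonlinearity $g$.

```latex
% Gradient form, which requires inverting W^T (the role played by (3.4)):
\Delta W \propto \left[ W^{T} \right]^{-1} + \varphi(s)\, x^{T},
\qquad
\varphi_i(s_i) = \frac{\partial}{\partial s_i} \ln g'(s_i)

% Natural-gradient form, obtained by right-multiplying by W^{T} W
% (the role played by (3.5)); only matrix multiplications remain:
\Delta W \propto \left( I + \varphi(s)\, s^{T} \right) W

% Logistic case, g(s) = 1 / (1 + e^{-s}):
\varphi(s) = 1 - 2y
```

The second form follows from the first because $\left([W^T]^{-1} + \varphi(s)x^T\right) W^T W = W + \varphi(s)(Wx)^T W$ and $s = Wx$.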

Support Vector Machine
In pattern recognition, the SVM algorithm constructs nonlinear decision functions by training a classifier to perform a linear separation in a high-dimensional space that is nonlinearly related to the input space. To generalize the SVM algorithm to regression analysis, an analogue of the margin is constructed in the space of the target values $y$ using Vapnik's ε-insensitive loss function (Figure 3) [48-50]:

$$|y - f(x)|_{\varepsilon} = \max\{0,\ |y - f(x)| - \varepsilon\}.$$

Only the samples outside the ±ε margin have a nonzero slack variable, so they are the only ones that form part of the solution [50].
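The ε-insensitive loss can be illustrated with a few lines of Python. This is a minimal sketch; the function name and the sample values are hypothetical, and ε = 0.08 is used only because it is the value selected later in this paper.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the +/- eps tube,
    linear in the absolute error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical normalised permeability values, for illustration only.
inside_tube = eps_insensitive_loss(1.0, 1.05, eps=0.08)   # error 0.05 < eps
outside_tube = eps_insensitive_loss(1.0, 1.50, eps=0.08)  # error 0.50 > eps
```

A sample with a nonzero loss lies outside the tube and therefore contributes a slack variable, which is why only such samples can become support vectors.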
To estimate a linear regression

$$f(x) = w \cdot x + b,$$

where $w$ is the weight vector, $x$ is the input vector, and $b$ is the bias term, with precision ε, one minimizes

$$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*), \tag{3.9}$$

where $C$ is a trade-off parameter that balances maximization of the margin against minimization of the errors $\xi$ and $\xi^*$. Considering a set of constraints, this can be written as a constrained optimization problem: minimize (3.9) subject to

$$y_i - w \cdot x_i - b \le \varepsilon + \xi_i, \tag{3.10}$$

$$w \cdot x_i + b - y_i \le \varepsilon + \xi_i^*, \tag{3.11}$$

$$\xi_i,\ \xi_i^* \ge 0.$$

According to relations (3.10) and (3.11), any error smaller than ε does not require a nonzero $\xi_i$ or $\xi_i^*$ and does not enter the objective function (3.9) [51-53]. By introducing Lagrange multipliers $\alpha$ and $\alpha^*$, with $C > 0$ and $\varepsilon > 0$ chosen a priori, the optimum hyperplane is obtained by maximizing

$$W(\alpha, \alpha^*) = -\varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j)$$

subject to $0 \le \alpha_i, \alpha_i^* \le C$ and $\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0$, where $x_i$ only appears inside an inner product. To get a potentially better representation of the data in the nonlinear case, the data points can be mapped into an alternative space, generally called the feature space (a pre-Hilbert or inner product space), through the replacement

$$x_i \cdot x_j \rightarrow \phi(x_i) \cdot \phi(x_j).$$

The functional form of the mapping $\phi(x_i)$ does not need to be known since it is implicitly defined by the choice of kernel, $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$, the inner product in Hilbert space. With a suitable choice of kernel, the data can become separable in feature space even though they are not linearly separable in the original input space. Thus, whereas data for n-parity or the two-spiral problem is nonseparable by a hyperplane in input space, it can be separated in the feature space by the proper kernels [54-59]. Table 1 gives some of the common kernels.

Table 1: Common kernel functions (columns: kernel function, type of classifier). Gaussian RBF kernel with parameter σ, which controls the half-width of the curve-fitting peak.

Then, the nonlinear regression estimate takes the following form:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b,$$

where $b$ is computed using the fact that (3.10) becomes an equality with $\xi_i = 0$ if $0 < \alpha_i < C$ and relation (3.11) becomes an equality with $\xi_i^* = 0$ if $0 < \alpha_i^* < C$ [60, 61].
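Once the dual coefficients are known, the nonlinear regression estimate is a kernel expansion over the support vectors, which can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the $1/(2\sigma^2)$ scaling of the Gaussian kernel is an assumed convention, and the support vectors, coefficients, and bias are made-up numbers, not fitted values.

```python
import math


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel; the 1 / (2 sigma^2) scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b.

    dual_coefs[i] holds the difference (alpha_i - alpha_i*) for support vector i.
    """
    return bias + sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
bias = 0.1

prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias, sigma=1.0)
```

Only samples with a nonzero coefficient appear in the sum, so the cost of a prediction grows with the number of support vectors rather than with the size of the training set.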

General Regression Neural Network
The general regression neural network was proposed in [62]. GRNN is a supervised network that trains quickly on sparse data sets and produces continuous-valued outputs rather than categorising the data. GRNN is a three-layer network with one hidden neuron for each training pattern. It is a memory-based network that provides estimates of continuous variables and converges to the underlying regression surface. GRNN is based on the estimation of probability density functions, trains quickly, and can model nonlinear functions. It is a one-pass learning algorithm with a highly parallel structure. The GRNN algorithm provides smooth transitions from one observed value to another even with sparse data in a multidimensional measurement space. The algorithmic form can be used for any regression problem in which an assumption of linearity is not justified. GRNN can be thought of as a normalised radial basis function (RBF) network in which there is a hidden unit centred at every training case. These RBF units are usually probability density functions such as the Gaussian. The only weights that need to be learned are the widths of the RBF units. These widths are called "smoothing parameters." The main drawback of GRNN is that it suffers badly from the curse of dimensionality. GRNN cannot ignore irrelevant inputs without major modifications to the basic algorithm. So, GRNN is not likely to be the top choice if there are more than 5 or 6 nonredundant inputs.
The regression of a dependent variable, $Y$, on an independent variable, $X$, is the computation of the most probable value of $Y$ for each value of $X$ based on a finite number of possibly noisy measurements of $X$ and the associated values of $Y$. The variables $X$ and $Y$ are usually vectors. In order to implement system identification, it is usually necessary to assume some functional form. In the case of linear regression, for example, the output $Y$ is assumed to be a linear function of the input, and the unknown parameters, $a_i$, are linear coefficients.
The GRNN method does not need to assume a specific functional form. A squared Euclidean distance $D_i^2$ is computed between an input vector and the weights and is then rescaled by the spreading factor. The radial basis output is then the exponential of the negatively weighted distance. The GRNN equation can be written as

$$D_i^2 = (X - X_i)^T (X - X_i), \qquad \hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-D_i^2 / 2\sigma^2\right)}{\sum_{i=1}^{n} \exp\left(-D_i^2 / 2\sigma^2\right)},$$

where σ is the smoothing factor (SF). The estimate $\hat{Y}(X)$ can be visualised as a weighted average of all of the observed values, $Y_i$, where each observed value is weighted exponentially according to its Euclidean distance from $X$. $\hat{Y}(X)$ is built from a sum of Gaussian distributions centred at each training sample; however, the sum is not limited to being Gaussian. In this theory, the optimum smoothing factor is determined after several runs according to the mean squared error of the estimate, which must be kept at a minimum. This process is referred to as the training of the network. If a number of iterations pass with no improvement in the mean squared error, then the current smoothing factor is taken as the optimum for that data set. When the network is applied to a new set of data, increasing the smoothing factor decreases the range of output values [63]. In this network, there are no training parameters such as the learning rate, momentum, optimum number of neurons in the hidden layer, and learning algorithm as in a back-propagation neural network (BPNN); there is only a smoothing factor, whose optimum is found by trial and error. The smoothing factor must be greater than 0 and can usually range from 0.1 to 1 with good results. The number of neurons in the input layer is the number of inputs in the proposed problem, and the number of neurons in the output layer corresponds to the number of outputs. Because GRNN networks evaluate each output independently of the other outputs, they may be more accurate than BPNN when there are multiple outputs. GRNN works by measuring how far a given sample pattern is from the patterns in the training set. The output predicted by the network is a proportional amount of all the outputs in the training set. The proportion is based upon how far the new pattern is from the given patterns in the training set.
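The weighted-average form of the GRNN estimate can be sketched in a few lines of Python. This is a minimal illustration, assuming the Gaussian weighting $\exp(-D_i^2 / 2\sigma^2)$; the function name and the two-point training set are hypothetical.

```python
import math


def grnn_predict(x, train_inputs, train_targets, sigma):
    """GRNN estimate: a weighted average of the training targets, where each
    weight is exp(-D_i^2 / (2 sigma^2)) and D_i^2 is the squared Euclidean
    distance between x and training pattern i."""
    weights = []
    for x_i in train_inputs:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ZeroDivisionError("all weights underflowed; sigma is too small")
    return sum(w * y for w, y in zip(weights, train_targets)) / total


# Hypothetical one-dimensional training set, for illustration only.
train_inputs = [[0.0], [1.0]]
train_targets = [0.0, 1.0]

midpoint = grnn_predict([0.5], train_inputs, train_targets, sigma=0.3)
near_first = grnn_predict([0.0], train_inputs, train_targets, sigma=0.1)
```

At the midpoint both patterns receive the same weight, so the estimate is the plain average of the two targets; close to a training pattern with a small σ, the estimate is dominated by that pattern's target.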

ICA Implementation and Determination of the Most Relevant Well Logs
As mentioned above, ICA is a suitable method for extracting the most important and relevant features of a dataset. Hence, in this paper, we used this method to identify the well logs that have a good relationship with permeability. Table 2 gives the correlation matrix of the well logs and permeability after applying ICA in the first step.
As seen in Table 2, most of the well logs have a good relationship with permeability, except Y and PEF. Hence, it can be concluded that the photoelectric factor log (PEF) and the Y coordinate should be excluded from the inputs because of their weak relationship with permeability. For further analysis, the rotated component matrix was determined, showing the number of components of the dataset. This matrix is shown in Table 3.
Regarding Table 3, it can be observed that the well logs and the coordinates are divided into two different components. The first component groups the X and Z coordinates, the GR, DT, RHOB, NPHI, MSFL, LLD, and LLS well logs, and permeability. The second component consists of the Y coordinate and the PEF well log. This table clearly shows that component no. 1 is the one that can be used for the prediction of permeability. Figure 4 presents the results of Table 3 in graphical form.
As also shown in Figure 4, Y and PEF do not lie on the same side of the cube as the other parameters. Based on the results of ICA, seven well logs (GR, DT, RHOB, NPHI, MSFL, LLD, and LLS) and two coordinates (X and Z) are taken into account for the prediction of permeability using the networks.

Implementation of SVM
Similar to other multivariate statistical models, the performance of the SVM for regression depends on the combination of several parameters: the capacity parameter C, the ε of the ε-insensitive loss function, and the kernel type K with its corresponding parameters. C is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error. If C is too small, insufficient stress is placed on fitting the training data. If C is too large, the algorithm overfits the training data. However, Wang et al. [64] indicated that the prediction error was scarcely influenced by C. In order to make the learning process stable, a large value should be set for C (e.g., C = 100). The optimal value for ε depends on the type of noise present in the data, which is usually unknown. Even if enough knowledge of the noise is available to select an optimal value for ε, there is the practical consideration of the number of resulting support vectors. ε-insensitivity prevents the entire training set from meeting the boundary conditions and so allows for the possibility of sparsity in the solution of the dual formulation. So, in theory, choosing an appropriate value of ε is critical.
Since the nonlinear SVM is applied in this study, it is necessary to select a suitable kernel function. Previously published studies [64, 65] indicated that the Gaussian radial basis function is more efficient than other kernel functions. As seen in Table 1, the Gaussian kernel has the form

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right),$$

where σ is a constant parameter of the kernel. This parameter controls the width of the Gaussian function and the generalization ability of the SVM, so σ has to be optimized. In order to find the optimum values of the two parameters σ and ε and to prevent overfitting of the model, the data set was randomly separated into a training set of 125 samples and a test set of 50 samples, and leave-one-out cross-validation was performed on the whole training set; the results are shown in Figures 5 and 6. To obtain the optimal value of σ, the SVM was trained with different values of σ, varying from 0.01 to 0.3 in steps of 0.01. We calculated the RMS error for each σ, using LOO cross-validation on the training set as the measure of the generalization ability of the model. The curve of RMS error versus σ is shown in Figure 5. The optimal σ was found to be 0.16.
In order to find an optimal ε, the RMS error for different values of ε was calculated. The curve of RMS error versus ε is shown in Figure 6, from which the optimal ε was found to be 0.08. From the above discussion, σ, ε, and C were fixed at 0.16, 0.08, and 100, respectively, and the resulting SVM model had 41 support vectors. Figure 7 is a schematic diagram showing the construction of the SVM.

Implementation of GRNN
In order to check the accuracy of the SVM in predicting permeability, the results of the SVM are compared with those of the general regression neural network (GRNN). The GRNN constructed in this study was a multilayer neural network with one hidden layer of radial basis functions consisting of 49 neurons and an output layer containing only one neuron. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between the input and output vectors.
In addition, the performance of the GRNN depends mostly on the choice of the smoothing factor (SF), which is in a sense equivalent to the choice of the ANN structure. To address this issue, the LOO cross-validation technique was used, and the optimal SF was found to be 0.23. Figure 8 shows the results obtained in this study.
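Selecting a smoothing factor by leave-one-out cross-validation can be sketched as follows. This is an illustrative outline, not the authors' MATLAB code: the synthetic data, the candidate grid, and all function names are assumptions. The same scan-and-pick structure applies to the σ and ε searches carried out for the SVM.

```python
import math


def grnn_predict(x, train_inputs, train_targets, sigma):
    """GRNN estimate with Gaussian weights. The smallest squared distance is
    subtracted before exponentiating, which leaves the ratio unchanged but
    avoids all weights underflowing to zero for small sigma."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, x_i)) for x_i in train_inputs]
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    return sum(w * y for w, y in zip(weights, train_targets)) / sum(weights)


def loo_rms(inputs, targets, sigma):
    """Leave-one-out RMS error: predict each sample from all the others."""
    sq_err = 0.0
    for i in range(len(inputs)):
        rest_x = inputs[:i] + inputs[i + 1:]
        rest_y = targets[:i] + targets[i + 1:]
        err = grnn_predict(inputs[i], rest_x, rest_y, sigma) - targets[i]
        sq_err += err * err
    return math.sqrt(sq_err / len(inputs))


def select_sigma(inputs, targets, candidates):
    """Return the candidate smoothing factor with the lowest LOO RMS error."""
    return min(candidates, key=lambda s: loo_rms(inputs, targets, s))


# Synthetic one-dimensional data on [-1, 1], for illustration only.
inputs = [[-1.0 + 0.1 * i] for i in range(21)]
targets = [x[0] ** 2 for x in inputs]

# Candidate grid from 0.01 to 0.30 in steps of 0.01 (an assumed grid).
candidates = [0.01 * k for k in range(1, 31)]
best_sigma = select_sigma(inputs, targets, candidates)
```

Because every sample is held out exactly once, the procedure uses the whole training set for both fitting and validation, at the cost of one model evaluation per sample and per candidate value.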

Prediction of Permeability Using SVM
After building an optimum SVM based on the training dataset, the performance of the constructed SVM was evaluated in the testing process. Figure 9 demonstrates the ability of the SVM to predict permeability.
As illustrated in Figure 9, there is an acceptable agreement (correlation coefficient of 0.97) between the predicted and measured permeability. The SVM is therefore an appropriate method for the prediction of permeability. Nonetheless, the performance of this method should be compared with that of another suitable method to highlight the strength of the SVM.

Prediction of Permeability Using GRNN
As already mentioned, to check the accuracy of the SVM in the prediction of permeability, the results of the SVM are compared with those of the general regression neural network (GRNN). Figure 10 shows the prediction results obtained with the GRNN.
As shown in Figure 10, the GRNN is a good network for the prediction of permeability. However, it is not as strong as the SVM in the prediction process. Hence, it can be considered the second alternative after the SVM for the prediction of permeability.

Discussion
In this research work, we have demonstrated one application of artificial intelligence techniques in forecasting hydrocarbon reservoir permeability. Firstly, independent component analysis (ICA) was used to determine the relationship between the well log data and permeability. In that section, we found that Y and PEF are not suitable inputs for the networks because of their weak relationship with permeability (Figure 4). Consequently, the GR, DT, RHOB, NPHI, MSFL, LLD, and LLS well logs along with the X and Z coordinates were used for the training of the networks. We developed two MATLAB codes (M-files) and compared the performance of the SVM with the best-performing GRNN model. In this comparison (Table 4), the SVM showed better overall efficiency than the GRNN in terms of RMS error in both the training and testing processes.
According to this table, the RMS error of the SVM is smaller than that of the GRNN. In terms of running time, the SVM takes considerably less time for the prediction (3 seconds) than the GRNN (6 seconds). These results indicate that the SVM is a robust algorithm for the prediction of permeability.
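The two measures used in this comparison, the RMS error and the correlation coefficient between measured and predicted permeability, can be computed as sketched below. The function names and sample values are hypothetical and are not data from this study; the Pearson form of the correlation coefficient is assumed.

```python
import math


def rms_error(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def correlation_coefficient(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Made-up normalised permeability values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

rms = rms_error(measured, predicted)
r = correlation_coefficient(measured, predicted)
```

The RMS error is expressed in the units of the (normalised) permeability, whereas the correlation coefficient is dimensionless, so the two measures complement each other when models are compared.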

Conclusions
The support vector machine (SVM) is a novel machine learning methodology based on statistical learning theory (SLT). Its notable features include the fact that the requirements on the kernel and the nature of the optimization problem result in a unique global optimum, high generalization performance, and avoidance of convergence to a local optimum. In this research work, we have shown the application of the SVM, compared with a GRNN model, to the prediction of the permeability of three gas wells in the Kangan and Dalan reservoirs of the Southern Pars field, based on digital well log data. Although both methods are data-driven models, the SVM was found to run considerably faster and with higher accuracy. In terms of accuracy, the SVM technique resulted in an RMS error reduction relative to that of the GRNN model (Table 4). Regarding the running time, the SVM required about half the computational time of the GRNN (3 seconds versus 6 seconds), an important factor in choosing an appropriate and high-performance data-driven model.
In future work, we are going to test the trained SVM for predicting the permeability of other reservoirs in the south of Iran. Meaningful results from other reservoirs using well log data would further support the ability of the SVM to predict petrophysical parameters, including permeability.

Figure 1: Geographical position of Southern Pars gas field.

Figure 2: Mechanism of data transformation via ICA algorithm.

Figure 4: A graphical representation of the relationship between the well logs and permeability (axes: Component 1, Component 2, Component 3).

Figure 9: Relationship between the measured and predicted permeability obtained by SVM (a); estimation capability of SVM (b).

Figure 10: Relationship between the measured and predicted permeability obtained by GRNN (a); estimation capability of GRNN (b).

Table 2: Correlation matrix of the well logs and permeability after applying ICA.

Table 3: Rotated component matrix of the parameters.

Table 4: Comparison of the performance of the SVM and GRNN methods in the training and testing processes.