A Novel Homogenous Hybridization Scheme for Performance Improvement of Support Vector Machines Regression in Reservoir Characterization

Hybrid computational intelligence is defined as a combination of multiple intelligent algorithms such that the resulting model has superior performance to the individual algorithms. Fusing two or more intelligent algorithms is therefore an important route to better performance. In this work, a novel homogenous hybridization scheme is proposed for improving the generalization and predictive ability of support vector machines regression (SVR). The proposed hybrid SVR (HSVR) treats the initial SVR prediction as a feature extraction process and then employs the SVR output, which is the extracted feature, as its sole descriptor. The developed hybrid model is applied to the prediction of reservoir permeability, and the predicted permeability is compared to core permeability, which is regarded as the standard in the petroleum industry. The results show that the proposed hybrid scheme (HSVR) performed better than the existing SVR in both generalization and prediction ability. The outcome of this research will assist petroleum engineers in predicting the permeability of carbonate reservoirs with a higher degree of accuracy and will invariably lead to better reservoir characterization. Furthermore, the encouraging performance of this hybrid will serve as an impetus for further exploration of homogenous hybrid systems.


Introduction
Permeability is the ease with which oil and gas flow in a porous rock and is associated with subsurface production and injection [1]. Permeability is an indication of the interconnection of rock pores and a measure of oil and gas fluidity through the rock spaces. It is an important flow parameter whose accurate determination is fundamental to successful oil and gas exploration. Important details such as the scale of the medium under exploration, the amount of recoverable hydrocarbon from the medium, and the flow rate can be deduced from the correct estimation of permeability. Despite the importance of this parameter, its accurate determination has been a long-standing subject of intense research in the petroleum industry, as it has proven very difficult to estimate accurately. The proposed methods for permeability estimation can be categorized broadly into three categories, namely, empirical methods, statistical methods, and computational intelligence schemes [2]. The standard procedure is the empirical method [3], which makes use of laboratory equipment to determine permeability from core samples obtained from the oil exploration fields. However, this method is limited because obtaining a large number of core samples is very expensive and highly impractical [4]. Expensive special drills are required to obtain cored samples, and their transportation and analysis are also cumbersome and costly. Furthermore, liquid nitrogen is usually required to freeze the cores in order to prevent potential degradation between retrieval and the final laboratory analysis; hence, the cost of a coring operation can be very high and run into tens of millions [5,6].
An alternative procedure is the use of statistical analysis, which is mainly based on regression analysis. This involves using linear regression to predict permeability from several descriptors, leading to the development of a linear model used for future estimation and generalization to unseen data. Regression analysis, however, fails to account for the high nonlinearity which exists between the well log data used as descriptors and the core permeability used as the target variable. Hence, statistical analysis does not perform adequately in permeability estimation [7].
Computational intelligence is a set of nature-inspired computational methodologies and approaches for addressing complex real-world problems for which traditional approaches such as statistical modeling schemes are ineffective or infeasible. Computational intelligence can better deal with the problems which statistical modeling schemes often fail to handle adequately, which is usually the case in the face of uncertainty, noise, and an ever-changing context. The use of computational intelligence schemes for learning pattern distributions and nonlinear relationships among variables has been widespread in several major areas and has enjoyed wide-ranging success [8]. The schemes have been employed in areas such as medicine [9], materials science [10], biometrics [11], and many other important applications [12]. The use of computational intelligence techniques (CIT) in the estimation of petrophysical parameters has also gained widespread attention and is now a firmly established procedure in the petroleum industry [2]. The models developed based on CIT are able to learn adequately the underlying nonlinear relationships existing between permeability and the descriptors used for its estimation and, as such, outperform estimations based on statistical analysis. However, in extreme cases where the CIT scheme does not perform well enough, hybrid systems are often employed to boost performance and increase accuracy [4]. Hybrid techniques have been employed and deployed in many fields such as medicine, reservoir characterization, optimization, and ambient intelligent systems [13][14][15][16][17][18][19][20]. These hybrid systems are usually heterogeneous in the sense that multiple and differing computational schemes are employed.
In this work, a homogenous hybridization scheme is proposed to improve the performance of CIT in the estimation of carbonate reservoir permeability. The CIT considered is support vector regression (SVR), chosen for its excellent performance, which derives from features such as its sound mathematical foundation, nonconvergence to local minima, and accurate generalization and predictive ability [2]. In order to improve SVR performance, a novel hybridization scheme is proposed which uses the predicted output of one SVR model as the descriptor of another SVR model. This idea is inspired by the hybridization procedure which uses the target of one model as the descriptor of another model, as shown in Figure 1. The proposed scheme uses a single descriptor that contains the intricacies, patterns, and information embedded in the several descriptors necessary for accurate prediction of permeability, so that the second SVR model has access to this pool of information from a single descriptor, which leads to a better and more efficient model. Intuitively, the initial training of the SVR model can be viewed as a feature extraction process in which the predicted output is the extracted feature, while the subsequent SVR model makes use of this extracted feature for performance improvement.
The results obtained attest to the effectiveness of this scheme and its potential for improving the generalization and prediction ability of SVR models. This improvement will yield more accurate permeability prediction, invariably resulting in more productive and successful oil and gas exploration.

Support Vector Machine
Support vector regression (SVR) is a computational tool used for regression tasks and is derived from statistical learning theory [21]. It is an extension of the support vector machine (SVM), which was originally derived solely to tackle classification tasks [22]. SVM is a supervised learning model with associated learning algorithms that analyze data and recognize patterns. The basic idea behind the SVM principle is that input vectors are nonlinearly mapped into a high-dimensional feature space where the algorithm constructs a decision surface, known as a hyperplane, with special characteristics and properties. The hyperplane or set of hyperplanes can then be used for classification, regression, or other tasks. The original optimal hyperplane algorithm proposed by Vapnik and Lerner in 1963 was a linear classifier [23]. In 1992, Boser et al. suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes [24]. The maximum-margin hyperplanes work on the principle of selecting an optimal plane based on minimization of the generalization error or on defining an upper bound on the error using structural risk minimization (SRM) for separable classes. SRM is an inductive principle employed in machine learning in order to provide a good balance between the complexity of the model and its generalization capability. SRM uses the VC dimension, as proposed by Vapnik and Chervonenkis in 1974, to reduce training error on a finite set of data [25]. SVM uses a predefined constant to develop the model by making a tradeoff between the number of misclassification errors and the maximum margin [26].
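SVM takes a set of inputs and, with the help of the kernel trick, maps the data to a higher-dimensional space in order to perform linear classification on the data [27]. It constructs a maximally separating plane for this task [28]. The objective function of SVM can be stated as follows:
$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} \quad \text{subject to} \quad \lvert y_i - (\mathbf{w}\cdot\phi(\mathbf{x}_i) + b) \rvert \le \varepsilon,$$
where $\mathbf{w}$ are the adjustable parameters, $y_i$ is the $i$th desired output, $\mathbf{x}_i$ is the vector of the $i$th input, $\phi$ is the transformation rule, $b$ is the bias (offset) term, and $\varepsilon$ is the error-limit factor. Slack variables are introduced into the SVM objective function in order to ease the earlier defined $\varepsilon$-error limit on all training pairs. The modified objective function can then be written as
$$\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\sum_{i}\left(\xi_i + \xi_i^{*}\right) \quad \text{subject to} \quad \begin{cases} y_i - (\mathbf{w}\cdot\phi(\mathbf{x}_i) + b) \le \varepsilon + \xi_i, \\ (\mathbf{w}\cdot\phi(\mathbf{x}_i) + b) - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \end{cases}$$
where $\xi_i$ and $\xi_i^{*}$ are the slack variables and $C$ is a regularization parameter that governs the performance of the developed model [29]. The modified optimization objective can be solved by introducing Lagrangian multipliers [30]. A version of SVM for regression was proposed in 1996 by Drucker et al. and is referred to as support vector regression (SVR) machines [31]. SVR has been applied successfully to problems in numerous fields and has proved to be a better prediction model than most regression schemes [32].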
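To make the role of the error limit and the slack variables concrete, the following minimal sketch evaluates the epsilon-insensitive slack for each training pair and the regularized objective for a given weight vector. The function names, the weight vector, and the numeric values are illustrative assumptions, not quantities from this work, and the sketch evaluates the objective rather than solving the optimization.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample slack: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def svr_objective(weights, slacks, c):
    """Regularized objective: 0.5 * ||w||^2 + C * (sum of slacks)."""
    return 0.5 * sum(w * w for w in weights) + c * sum(slacks)


# Illustrative values only (assumed, not taken from the well datasets).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]

# The first prediction lies inside the tube, so it contributes no slack.
slacks = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
objective = svr_objective(weights=[0.5, -0.5], slacks=slacks, c=10.0)
```

A larger regularization parameter penalizes points outside the tube more heavily, while a larger epsilon widens the tube and leaves more points unpenalized.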

Description of Data and Research Methodology
Description of Data.
Real-life datasets obtained from three wells located in a Middle Eastern oil and gas field have been used for studying the effectiveness of the proposed scheme. The number of data points for each of the wells is presented in Table 1, and the geophysical data (descriptors) available for the wells are neutron porosity (NPHI), total porosity (PHIT), bulk density (RHOB), water saturation (SWT), bulk density correction (DRHO), and microspherically focused log (MSFL). Table 2 shows the statistical analysis of the data from each well, which reveals the variation existing from measurement to measurement and provides insight into the intricacies within the dataset.

Experimental Design.
The datasets for all the wells were first normalized in order to prevent bias from some of the descriptors. The descriptors were measured on different scales and, without normalization, relatively large values of some of the descriptors could bias the developed model, thereby degrading its predictive ability. The total dataset was divided into a training set and a testing set in the ratio 8 to 2 using stratified sampling in order to develop the SVR model; hence, the training set consists of 80% of the total dataset, while the testing set takes 20% (shown in Table 1). The use of stratified sampling minimizes bias and ensures that the variations within and across the dataset are adequately captured and represented in both the training set and the testing set. The training phase involves using the descriptors to train the SVR model in a supervised learning process with cored permeability as the target variable, since cored permeability is the standard in the petroleum industry. The model is adjusted iteratively until the correlation coefficient and the error between the SVR-predicted permeability and the cored permeability reach the desired values, indicating a satisfactory confidence level in the developed model.
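The preprocessing described above can be sketched as follows. The text does not state which normalization or which stratification rule was used, so min-max scaling and equal-sized strata of samples ordered by target value are assumptions made purely for illustration; the data are randomly generated placeholders.

```python
import random


def min_max_normalize(rows):
    """Scale every column of `rows` (a list of equal-length lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def stratified_split(targets, test_fraction=0.2, n_strata=5, seed=0):
    """Return (train_idx, test_idx), drawing the test share from each stratum.

    Strata are equal-sized groups of samples ordered by target value
    (an assumed rule; the paper does not specify one).
    """
    rng = random.Random(seed)
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    size = -(-len(order) // n_strata)  # ceiling division
    train_idx, test_idx = [], []
    for start in range(0, len(order), size):
        stratum = order[start:start + size]
        rng.shuffle(stratum)
        n_test = round(len(stratum) * test_fraction)
        test_idx.extend(stratum[:n_test])
        train_idx.extend(stratum[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical data: 50 samples, two descriptors on very different scales.
data_rng = random.Random(1)
descriptors = [[data_rng.uniform(0, 1), data_rng.uniform(2000, 3000)]
               for _ in range(50)]
permeability = [data_rng.uniform(0.1, 500.0) for _ in range(50)]

scaled = min_max_normalize(descriptors)
train_idx, test_idx = stratified_split(permeability)
```

Because every stratum contributes the same share to the testing set, low, medium, and high permeability values appear in both partitions in roughly the 8 to 2 proportion.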

Implementation Strategy for the Proposed Homogenous Hybridization Scheme.
The predicted permeability from the developed SVR model is a rich source of information for the hybridization scheme, as it contains the flows, connections, patterns, and intricacies previously found in the descriptors. The advantage is that this information is now aggregated and centralized in a single parameter. Hence, the predicted permeability is an excellent parameter for use as an SVR descriptor. This is the basis of the proposed homogenous hybridization scheme, which works by employing the output of an earlier developed SVR model as its own descriptor, resulting in a model with better predictive and generalization ability. The data flow diagram for the proposed hybrid scheme is shown in Figure 2. It is seen that the hybrid scheme involves multiple SVR models, where the output of the first SVR model serves as the input of the second SVR model, leading to a much improved performance.
It must be stated here that the proposed model strictly followed the standard training and testing procedures, in which the testing set is kept away as unseen data before it is sent to the model for testing. The data flow diagram depicted in Figure 2 makes clear how the dataset is first partitioned into a training set and a testing set. The standard procedure for the sequential implementation of a hybrid model is followed in this work, where the preceding method (SVR1) acts on the dataset to generate an output that is then fed to the next model in the hybrid setup (i.e., SVR2 in the present work). This hybridized implementation is referred to as hybridized SVR (HSVR).
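The sequential data flow can be sketched as below. To keep the sketch runnable without third-party packages, a least-squares line (`LineFit`, a hypothetical stand-in) takes the place of SVR; it is not an SVR and only mimics a fit/predict interface. All names and data here are illustrative assumptions.

```python
class LineFit:
    """Stand-in learner: least-squares line on the first descriptor only.

    It is NOT an SVR; it only mimics a fit/predict interface so that the
    two-stage data flow can be run end to end.
    """

    def fit(self, rows, targets):
        xs = [row[0] for row in rows]
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(targets) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, rows):
        return [self.intercept + self.slope * row[0] for row in rows]


def fit_hybrid(make_model, train_rows, train_targets):
    """Train stage 1 on the descriptors, then stage 2 on stage-1 output alone."""
    stage1 = make_model().fit(train_rows, train_targets)
    extracted = [[p] for p in stage1.predict(train_rows)]  # single descriptor
    stage2 = make_model().fit(extracted, train_targets)
    return stage1, stage2


def predict_hybrid(stage1, stage2, rows):
    """Pass unseen rows through stage 1, then feed its output to stage 2."""
    return stage2.predict([[p] for p in stage1.predict(rows)])


# Hypothetical toy data; the test rows are never seen during fitting.
train_rows = [[0.0, 5.0], [1.0, 3.0], [2.0, 8.0], [3.0, 1.0]]
train_targets = [1.0, 3.0, 5.0, 7.0]
test_rows = [[4.0, 2.0], [5.0, 9.0]]

stage1, stage2 = fit_hybrid(LineFit, train_rows, train_targets)
hybrid_predictions = predict_hybrid(stage1, stage2, test_rows)
```

With this linear stand-in the second stage reduces to an identity map, so the sketch shows only the data flow, not any accuracy gain. A nonlinear learner exposing the same interface, such as scikit-learn's `SVR`, could in principle be supplied through `make_model` instead.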

Criteria for Performance Evaluation.
The criteria implemented for assessing performance are the ones commonly used in regression analysis and petroleum engineering journals [7]. This is done in order to align with best practice and to carry out a fair assessment of the developed models. The correlation coefficient (CC) and cross plots of the predicted and cored permeability are examined for both the developed SVR and the proposed hybridized SVR. The correlation coefficient indicates the strength of the relationship existing between the predicted permeability and the cored permeability. The higher the CC, the better the developed model, and vice versa. It is expressed as a value ranging from 0 to 1, with 0 indicating lack of any correlation, while a value of 1 indicates perfect generalization accuracy. The formula for the correlation coefficient is
$$\mathrm{CC} = \frac{\sum_{i}\left(y_{a,i} - \bar{y}_a\right)\left(y_{p,i} - \bar{y}_p\right)}{\sqrt{\sum_{i}\left(y_{a,i} - \bar{y}_a\right)^{2}\,\sum_{i}\left(y_{p,i} - \bar{y}_p\right)^{2}}},$$
where $y_a$ and $y_p$ stand for the actual and predicted values, respectively, while $\bar{y}_a$ and $\bar{y}_p$, respectively, represent their mean values. Cross plots are commonly used in well log analysis and geophysical data interpretation. They show the emerging trend between variables and reflect patterns which are not discernible by single-valued criteria like the correlation coefficient.
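As a worked illustration of this criterion, the sketch below computes the correlation coefficient between actual and predicted values, assuming the standard Pearson form. The function name and the sample values are illustrative assumptions.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Predictions that are an exact linear function of the actual values.
cc_perfect = correlation_coefficient([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Predictions with two values swapped, giving a weaker relationship.
cc_noisy = correlation_coefficient([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

The first case shows that a perfect coefficient only requires a linear relationship, not equal values, which is one reason cross plots are examined alongside it.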

Validation Set Procedure.
The performance of SVR is highly dependent on the selection of its parameters. The parameter values that give the best performance are referred to as optimum parameters. The mode of selection for the optimum parameters is briefly detailed in this section and the optimum parameters are presented. A test-set-validation procedure [10] is adopted which involves running the algorithm for a fixed set of parameters while varying a single one of them until optimum performance is recorded, as shown in Figure 3. The value corresponding to the optimum performance is then recorded, and this procedure is repeated for all the SVR parameters. Finally, the optimum set of parameters is used to train the model in order to develop the best model. The procedure is implemented as follows. A set containing all the available kernel functions is first defined. Three indexes are used for the kernel function and for the selected values of two further SVR parameters, respectively, while three further indexes represent the optimum kernel function and the optimum values of those two parameters, respectively. The indexes run up to the total number of available kernel functions and the maximum values of the two parameters. The recorded performance measure is stored separately. The algorithm can then be depicted as in Algorithm 1. The varied parameters for SVR are the regularization factor ($C$), the kernel function, the kernel option, Epsilon ($\varepsilon$), and the hyperparameter Lambda ($\lambda$). The parameter which has the greatest effect on the model performance is the regularization factor ($C$); for this reason, the sensitivity of the model performance to varying values of the regularization factor is shown in Figures 4-9. A notable observation is that, while the radial basis function (RBF) (Gaussian) kernel was the best for SVR, which has been observed and reported in many publications [8,10,26,33], the polynomial kernel (Poly) performed best for HSVR. This observation can be attributed to the reduction in the number of descriptors, since HSVR only takes a single descriptor, which greatly reduces the complexity faced by the polynomial kernel and gives it an edge. Table 3 shows the optimum values for the parameters of the models. These optimum values are shown in order to aid the reproducibility of the experiments detailed in this work.
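The one-parameter-at-a-time search described in this section can be sketched as follows. The scoring function `toy_score` is a hypothetical stand-in for "train the model with these parameters and return its validation performance"; the parameter names, candidate values, and starting point are likewise assumptions chosen for illustration, not the values used in this work.

```python
def one_at_a_time_search(evaluate, grid, initial):
    """Vary one parameter at a time while holding the others fixed.

    evaluate: callable mapping a parameter dict to a score (higher is better).
    grid:     dict of parameter name -> list of candidate values.
    initial:  dict of starting values for every parameter.
    """
    best = dict(initial)
    best_score = evaluate(best)
    for name, candidates in grid.items():
        for value in candidates:
            trial = dict(best, **{name: value})
            score = evaluate(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score


def toy_score(params):
    """Hypothetical stand-in for training a model and scoring it."""
    kernel_bonus = {"rbf": 0.0, "poly": -0.1}[params["kernel"]]
    c_penalty = 0.01 * abs(params["C"] - 100) / 100
    eps_penalty = abs(params["epsilon"] - 0.1)
    return 1.0 - c_penalty - eps_penalty + kernel_bonus


grid = {
    "kernel": ["rbf", "poly"],
    "C": [1, 10, 100, 1000],
    "epsilon": [0.01, 0.1, 0.5],
}
best_params, best_score = one_at_a_time_search(
    toy_score, grid, initial={"kernel": "poly", "C": 1, "epsilon": 0.5}
)
```

This search evaluates far fewer combinations than a full grid, at the cost of possibly missing optima that depend on interactions between parameters.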

Experimental Results and Discussions
The results of the experiment are presented and discussed in this section. The normal SVR is termed SVR, while the model developed with the proposed scheme is referred to as hybridized SVR (HSVR). Figures 10 and 11 show the correlation coefficient for both the training phase and the testing phase of the two models considered in this work. The training phase is the process where the model is developed using a supervised learning scheme, while the testing phase performance shows the ability of the developed and trained model to generalize to new, unseen data.
It is seen that the proposed scheme improved the performance of the SVR model in both Well-A and Well-C, increasing the correlation coefficient by about 18.5% and 16%, respectively, during the testing phase, while it achieves a similar result in Well-B. This shows the effectiveness of the proposed scheme in improving the performance of SVR. Furthermore, the cross plots of the cored and predicted permeability are shown for the testing phase of the wells in Figures 12-17.
It can be seen in Figure 12 that the cross plot for the SVR model in Well-A is not accurately aligned, as the points are scattered, showing the discrepancies between the predicted permeability and the cored permeability. However, the HSVR model realized from the proposed scheme and shown in Figure 13 has a much better aligned cross plot, as the points are closer together, showing that the predicted permeability is closer to the cored permeability, which is a result of the effectiveness of the proposed scheme.
Figures 14 and 15 show the cross plots for both models in Well-B. It is seen that the two models have relatively similar performance, as the correlation coefficient is very high with a well-aligned cross plot. This indicates that the proposed scheme does not degrade performance, as the good performance of SVR is maintained by the HSVR model.
Finally, Figures 16 and 17 show the cross plots for SVR and HSVR in Well-C. The cross plot for HSVR shows that the predicted permeability is much closer to the cored permeability, as the points are closer and better aligned compared to the cross plot for the SVR model, which has scattered points that are not well aligned.

Conclusion and Recommendation
A new homogenous hybrid based on SVR has been implemented in this work. The permeability of a carbonate reservoir is predicted using the developed novel homogenous hybridized SVR (HSVR), which is trained and developed with real industrial datasets obtained from a Middle Eastern oil and gas exploration field. The performance of the proposed HSVR is then compared to that of the ordinary SVR. The superior performance of the proposed hybridized SVR has been established in terms of a higher correlation coefficient and more accurate permeability prediction compared to the model developed from conventional or standalone SVR. Therefore, the proposed scheme can be employed in increasing the production of hydrocarbons through better characterization of carbonate reservoirs resulting from improvement in the generalization and predictive ability of SVR for prediction of permeability. The outcome of this research will assist engineers and experts in the petroleum industry in tackling the difficulty experienced in attaining a high degree of accuracy in the prediction of reservoir permeability and will improve the exploration of hydrocarbon deposits. Further research could investigate the effect of the proposed scheme on other computational schemes.
Furthermore, the proposed hybrid could be used on other prediction problems related to oil and gas and even outside the oil and gas fields.

Figure 1: Flow chart for homogenous hybridization of SVR.

Figure 2: Data flow diagram of the proposed hybrid model (HSVR).

Start
Initialize the parameters
Run algorithm and record performance criteria (pf)
Vary one parameter (with others constant) till the best pf is recorded
Record the parameter value corresponding to the best pf
Repeat the procedure for all the parameters
End

Figure 12: Cross plot of cored and predicted permeability for SVR model.

Table 1: Number of available data points.

Table 2: Statistics of the geophysical data for all the wells.

Table 3: Optimum parameters for each of the wells.