
In petroleum exploration, the acoustic (sonic) log (DT) is widely used to estimate formation porosity, to support petrophysical studies, and to assist geological analysis and research (e.g., mapping abnormal pore-fluid pressure). However, it is sometimes absent in old wells drilled some 20 years ago, either because the data were lost or because the log was never recorded. Synthesizing the DT log therefore becomes a necessary task for researchers. In this paper we propose using the kernel extreme learning machine (KELM) to predict missing sonic (DT) logs when only common logs (e.g., natural gamma ray: GR, deep resistivity: REID, and bulk density: DEN) are available. The common logs serve as predictors and the DT log is the target. Using KELM, a prediction model is first built from the experimental data and then validated by blind testing in wells that contain both the predictors and the target (DT) values used in the supervised training. Finally, the optimal model is applied as a predictor. A case study on velocity inversion using the KELM-estimated DT values is presented for wells in the GJH survey in the Erdos Basin. The results are promising.

Oil and gas exploration in sedimentary basins is complicated because the targets are buried underground and cannot be observed or sampled directly. The properties of the buried targets therefore have to be predicted or estimated using modern electrical or magnetic tools. The physical properties of the geologic formations include pore-fluid pressure, rock lithology, porosity, permeability, and oil or water saturation. Nowadays the conventional tool for characterizing these properties is well logging, and logs such as gamma ray (GR), dual induction, compensated formation density (DEN), deep resistivity (REID), self-potential (SP), and sonic (DT) are usually recorded. Among them, the sonic log (DT) has been widely used to predict rock porosity, to perform petrophysical analysis, and to carry out well-to-seismic inversion.

Owing to historical operational mistakes or recording loss, the sonic log may not be available in a well logging suite. The traditional way of solving this problem is to transform the DEN or REID log into a DT log using an empirical formula established between these logs. This may be feasible in some areas, but the errors are sometimes unacceptable.

Artificial intelligence techniques are well suited to relating seemingly unrelated parameters and solving nonlinear problems. Such techniques, including BP neural networks, fuzzy reasoning, and evolutionary computing for data analysis and interpretation, have become effective tools in the workflow for well drilling and reservoir characterization [

Extreme learning machine (ELM) is a single-hidden layer feed-forward neural network (SLFN) proposed by Huang et al. [

In this paper, the kernel-based extreme learning machine is used to predict missing sonic (DT) logs when only common logs (e.g., natural gamma ray: GR, bulk density: DEN, or deep resistivity: REID) are available. Using KELM, we first create and train a supervised network model based on experimental data and then validate the model by blind testing. The optimal model is finally applied to wells that contain the predictor data but lack the DT log. We apply this workflow to the GJH survey in the Erdos Basin, and the KELM-estimated DT logs are then integrated into the seismic inversion to identify the sandstone reservoir.

The rest of this paper proceeds as follows. Section

In this study, the kernel extreme learning machine (KELM) is employed to predict the DT logs for the wells in the GJH survey. An overview of the ELM and the kernel-based ELM is therefore given below.

The classical ELM was proposed for SLFNs by Huang et al. [

As proposed in Huang et al. [

Similar to the SVM,

For the given type of the kernel function, the training dataset, and the initial parameters of the network, the following steps are considered.

Initialize the population based on the kernel function.

Evaluate the fitness function of each parameter.

Determine the optimal parameters of the kernel function and, based on the optimized parameters, compute the hidden layer kernel matrix.

Determine the final output weights.
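The last two steps above can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel and the commonly cited closed-form KELM solution, in which the output weights solve (I/C + Ω)β = T, where Ω is the kernel matrix of the training inputs and C is the regularization factor. The function names, the synthetic data, and the values of `C` and `gamma` are illustrative assumptions and are not taken from this study.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kelm_fit(X, t, C, gamma):
    """Compute the kernel matrix and solve (I/C + Omega) beta = t."""
    omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + omega, t)

def kelm_predict(X_new, X_train, beta, gamma):
    """Prediction f(x) = K(x, X_train) @ beta."""
    return rbf_kernel(X_new, X_train, gamma) @ beta

# Synthetic stand-in for four normalized predictor logs (assumed data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
t = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] - 0.3 * X[:, 3]

beta = kelm_fit(X, t, C=100.0, gamma=2.0)
train_mse = float(np.mean((kelm_predict(X, X, beta, gamma=2.0) - t) ** 2))
print(f"training MSE: {train_mse:.2e}")
```

The dense solve scales cubically with the number of training samples. For a training set of about 40,000 samples the kernel matrix alone would hold 1.6 × 10^9 entries, so a practical implementation may need subsampling or a blocked solver.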

Well logging is the practice of making a detailed record of the geologic formations penetrated by a borehole. Normally the log is based on physical measurements made by instruments lowered into the borehole. According to the geophysical properties of the rocks, the logs are usually classified as electrical logs, porosity logs, lithology logs, and miscellaneous logs. The sonic log (DT) belongs to the porosity logs. It provides the formation interval transit time, which typically varies with lithology and rock texture, and especially with rock porosity. The gamma ray log records the natural radioactivity of the formation along the borehole, measured in API units, and is particularly useful for distinguishing between sands and shales in a siliciclastic environment. This is because sandstones are usually composed of nonradioactive quartz, whereas shales are naturally radioactive owing to potassium isotopes in clays and adsorbed uranium and thorium.

The main datasets used in this study include the acoustic log (DT); the gamma ray log (GR); the resistivity log (REID), which represents the variation of electrical resistivity; the density log (DEN), which records the density variation with depth in the borehole; and the self-potential log (SP), a measurement of natural electric potential. These geophysical parameters are intrinsically linked, since each of them reflects some physical property of the same rock layer. Take sandstone as an example. Pores are certain to exist in a sandstone interval, and if they are not filled with tight materials, fluid is the only, and an important, pore filling. The fluid may be oil, gas, or water. Since the fluid has different physical properties from the surrounding sandstone, obvious differences will be recorded on the logs: lower GR, lower DT, higher REID, lower DEN, and an abnormal change in SP. Thus, just by observing the character of the logs, especially the abnormal changes, experienced researchers can confidently infer geological information along the borehole. Some researchers have also tried to build theoretical relationships between the logs, and thousands of experiments have resulted in empirical equations. For example, DEN can be derived from the DT log when DEN is missing, and the relation is defined by the Gardner formula [

In this study, we focus on the DT log and seek the best way to obtain it when it is missing.
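As a point of reference for the empirical approach, the Gardner relation can be inverted to obtain a rough DT estimate from the density log. The sketch below assumes the commonly quoted coefficients a = 0.23 and b = 0.25 (velocity in ft/s, density in g/cm^3) and DT in µs/ft. These coefficients are generic defaults rather than values calibrated for the GJH survey, and the function name is hypothetical.

```python
import numpy as np

def dt_from_density_gardner(rho_gcc, a=0.23, b=0.25):
    """Invert rho = a * V**b (V in ft/s, rho in g/cm^3) and return DT in us/ft."""
    v_ftps = (np.asarray(rho_gcc, dtype=float) / a) ** (1.0 / b)
    return 1.0e6 / v_ftps

# Illustrative densities only (assumed values, not from the studied wells).
rho = np.array([2.30, 2.45, 2.60])
dt_est = dt_from_density_gardner(rho)
print(dt_est)
```

Because the exponent is fixed, such a transform cannot adapt to local lithology or fluid content, which is one reason a data-driven model is considered here.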

The sonic log (DT) is very important in the petroleum exploration phase. One use of DT is to estimate rock porosity, which is a critical parameter for reservoir evaluation, and to identify fluid information along the borehole. Additionally, since the DT log carries both time and velocity information, it is a reliable key for time-depth conversion when seismic data are used for structural interpretation and geological mapping. In short, the DT log is indispensable for geophysical and geological study.
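The porosity use of DT can be made concrete with the Wyllie time-average equation, a standard textbook relation that is not stated in this paper: φ = (DT − DT_ma) / (DT_f − DT_ma). The sketch below assumes typical values of 55.5 µs/ft for a sandstone matrix and 189 µs/ft for the pore fluid. Both defaults, and the function name, are assumptions for illustration.

```python
import numpy as np

def wyllie_porosity(dt, dt_matrix=55.5, dt_fluid=189.0):
    """Wyllie time-average porosity (fraction) from DT in us/ft."""
    phi = (np.asarray(dt, dtype=float) - dt_matrix) / (dt_fluid - dt_matrix)
    # Clip to the physically meaningful range [0, 1].
    return np.clip(phi, 0.0, 1.0)

dt_samples = np.array([55.5, 82.2, 189.0])  # assumed example readings
phi = wyllie_porosity(dt_samples)
print(phi)
```

An error in a synthesized DT log propagates linearly into porosity under this relation, so the accuracy of the DT prediction directly bounds the accuracy of the derived porosity.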

In practice, however, the DT log may be unavailable in some wells owing to operational mistakes or recording loss. One solution is to derive the DT log from other logs through an empirical transformation built by experimental analysis. Such a formula holds only for specific field conditions and cannot be used for all formation conditions. For instance, the Faust formula only calculates DT from the REID log, and cases [

Since DT has intrinsic links with the other geophysical logs, researchers often use logs such as GR and REID as inputs and DT as the output. Linear and nonlinear relationships have been established using soft-computing methods, but the results are not always satisfactory. Our purpose is therefore to build an optimal and reliable relationship between those geophysical logs and the DT log.

In this paper, we investigate the capability of a kernel extreme learning machine in building the nonlinear mathematical model that best explains DT (target) as a function of GR, REID, DEN, and SP (predictors).

To validate the use of KELM on log data recorded in oil and gas wells, we employed datasets obtained from seven wells drilled in the GJH survey in the Erdos Basin.

The study involves the following well logging parameters: gamma ray (GR), deep resistivity (REID), self-potential (SP), formation density (DEN), and sonic log (DT). Among the wells, YQ2, Y209, S211, S212, and S215 have full suites of well logs, while the DT log is not available in the other two wells (S219 and S205). Based on the quality evaluation of the logging process, we choose the former four wells as sources of the training dataset and well S215 as the testing dataset. The Shanxi group of the Permian formation is set as the analysis interval. The GR, REID, SP, DEN, and DT logs in this interval from the four training wells are collected and grouped as the training dataset, while the logs of well S215 serve as the validation target.

Figure

Log display of well YQ2 in the Shanxi group of the Permian formation.

We select data in the same interval from the four wells YQ2, Y209, S210, and S212 as the training samples. To ensure the quality of the logs, we use the caliper log (CAL) as the reference. A constant wellbore diameter (as described by CAL) indicates a good measuring environment for the other logs. In total, about 40,000 samples are available for the training process.
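The caliper-based quality check can be sketched as a simple mask that keeps only samples where the borehole diameter stays close to the bit size. The bit size, the tolerance, and the sample values below are hypothetical, since the acceptance criterion used in this study is not specified.

```python
import numpy as np

def caliper_mask(cal, bit_size, tol=0.5):
    """True where the measured diameter is within tol of the bit size (same units)."""
    return np.abs(np.asarray(cal, dtype=float) - bit_size) <= tol

# Assumed example: caliper in inches and two log columns (e.g., GR and DEN).
cal = np.array([8.5, 8.6, 10.2, 8.4, 9.3])
logs = np.array([[60.0, 2.45],
                 [75.0, 2.50],
                 [90.0, 2.10],   # washed-out interval, unreliable readings
                 [55.0, 2.48],
                 [80.0, 2.30]])

mask = caliper_mask(cal, bit_size=8.5, tol=0.5)
clean_logs = logs[mask]
print(clean_logs.shape)
```

Applying the same mask to predictors and target keeps the training pairs aligned in depth.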

Data normalization is needed both to speed up the convergence of the gradient descent algorithm and because the above-mentioned logs have different measurement units. All of the logs are therefore normalized before being input to the network. The normalized variable has the following form:
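A common choice for this step, assumed here purely for illustration, is min-max scaling of each log to [0, 1], x' = (x − x_min) / (x_max − x_min). The sketch below computes the limits on the training wells only and reuses them for other wells, so that validation and prediction data are scaled consistently. The function names and sample values are hypothetical.

```python
import numpy as np

def minmax_fit(X):
    """Column-wise minimum and maximum from the training data."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def minmax_apply(X, lo, hi):
    """Scale columns to [0, 1] using training limits; constant columns map to 0."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (np.asarray(X, dtype=float) - lo) / span

def minmax_invert(Xn, lo, hi):
    """Map normalized values back to the original units."""
    return np.asarray(Xn, dtype=float) * (hi - lo) + lo

# Assumed example: two columns standing in for GR (API) and DEN (g/cm^3).
X_train = np.array([[30.0, 2.2], [90.0, 2.6], [150.0, 2.4]])
lo, hi = minmax_fit(X_train)
Xn = minmax_apply(X_train, lo, hi)
X_back = minmax_invert(Xn, lo, hi)
print(Xn)
```

The inverse transform is needed for the target, so that a predicted DT can be reported in its original units.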

In KELM network learning, the output model is created by learning patterns from the training examples provided. The training dataset should therefore be chosen carefully so that it provides correct examples. Noise should also be removed from the samples; otherwise, errors may degrade the final performance.

For the KELM network model, there are four input neurons in total and one target at the output layer. The four inputs are the GR, REID, SP, and DEN logs, and the main task is to build a reliable prediction model between these input logs and the DT log (shown as Figure

Example of multi-input versus single-output sonic log prediction using KELM. Details about the well logging parameters depicted in the figure are given in text.

In the algorithms of KELM, two hyperparameters, namely, the regularization factor (

The quality of the trained model is evaluated based on the prediction accuracy. The Mean Squared Error (MSE) is computed as the average over all squared deviations of the predictions from the real values.
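The MSE criterion can also drive the choice of the two hyperparameters. The sketch below uses a plain grid search on a hold-out set as a simple stand-in for the population-based optimization outlined earlier; it is not the procedure used in this study. The RBF kernel, the candidate grids, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared deviations between predictions and real values."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rbf(A, B, gamma):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def holdout_search(X_tr, t_tr, X_va, t_va, Cs, gammas):
    """Return (best MSE, C, gamma) over the grid, scored on the hold-out set."""
    best = None
    for C in Cs:
        for g in gammas:
            beta = np.linalg.solve(np.eye(len(X_tr)) / C + rbf(X_tr, X_tr, g), t_tr)
            err = mse(t_va, rbf(X_va, X_tr, g) @ beta)
            if best is None or err < best[0]:
                best = (err, C, g)
    return best

# Synthetic stand-in for normalized logs (assumed data).
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 4))
t = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=300)

Cs = [1.0, 10.0, 100.0]
gammas = [0.1, 1.0, 10.0]
best_err, best_C, best_gamma = holdout_search(X[:200], t[:200], X[200:], t[200:], Cs, gammas)
print(best_err, best_C, best_gamma)
```

Scoring on data held out from training guards against selecting a pair that merely memorizes the training wells.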

After training, the model could be presented in the following form:

Furthermore, to verify the advantages of KELM, a BP network is also trained and tested for comparison. The backpropagation (BP) feed-forward network is the most commonly used ANN approach, but it is criticized for the difficulty of choosing learning rates, its tendency to become stuck in local minima, overfitting, and long training times [

Table

Comparison of porosity prediction performance of KELM against the BP method for well YQ2. The compared strata belong to the Shanxi group of the Permian formation.

Algorithm | Accuracy | MSE (%) | Training time (s) |
---|---|---|---|
BP | 0.752 | 1.812 | 912 |
KELM | 0.906 | 0.423 | 23 |

Through the above-mentioned training process, the KELM model for predicting DT is finally established. Although the training dataset has almost 40,000 data points, training takes very little time and the performance is satisfactory. To validate the KELM model, we use well S215 as a blind well. The four input logs of this well are collected and processed and then fed into the model with the network parameters kept fixed. Since the validation data form a small group of nearly 6000 samples, the process takes only 6 seconds to generate the predicted DT. Because well S215 has a recorded DT log, the predicted DT can be compared with the real DT. Figure

Log comparison in the Shanxi group of the Permian formation in well S215.

In this study, the DT log is missing in two wells, S219 and S205, and the KELM model is used to predict it for both. Fortunately, the four input logs (GR, REID, DEN, and SP) are available in both wells. Applying the same noise-filtering and normalization steps as in training and validation, we first input the four predictor logs of well S219 into the model and generate the DT log for this well. We then repeat the steps for well S205 and obtain its DT log as well. Figure

Estimated DT log using KELM model (colored in blue) for S219.

The above analysis has shown the reliability and accuracy of the KELM-based prediction model. All of the 7 wells in the studied area have DT logs now, although two of them are generated using KELM model.

In the reservoir description phase, seismic profiles are displayed as wiggle traces, which makes it difficult for researchers to identify potential fluid zones. Transforming the wiggle display of seismic sections into velocity or lithological profiles is therefore a necessary step in seismic interpretation. In geophysical processing, this transformation is called seismic inversion. Since the DT log carries both time and velocity information, while seismic data are recorded in time only, DT can be used in the inversion task to perform the well-to-seismic calibration and to mark the reservoir interval. Here we focus on the application of the KELM-estimated DT in seismic inversion rather than on the inversion technique itself.
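The role of DT in well-to-seismic calibration can be sketched as follows: the interval velocity is the reciprocal of the transit time, and integrating the slowness over depth gives the two-way travel time needed to place well depths on the seismic time axis. The sketch assumes DT in µs/m and depth in m; the units of the logs in this study are not stated, and the function names are hypothetical.

```python
import numpy as np

def dt_to_velocity(dt_us_per_m):
    """Interval velocity in m/s from transit time in us/m."""
    return 1.0e6 / np.asarray(dt_us_per_m, dtype=float)

def two_way_time(depth_m, dt_us_per_m, t0_s=0.0):
    """Cumulative two-way time (s) at each depth sample, trapezoidal integration."""
    depth = np.asarray(depth_m, dtype=float)
    slowness = np.asarray(dt_us_per_m, dtype=float) * 1.0e-6  # s/m
    dz = np.diff(depth)
    one_way = np.concatenate([[0.0], np.cumsum(0.5 * (slowness[1:] + slowness[:-1]) * dz)])
    return t0_s + 2.0 * one_way

# Assumed example: constant DT of 250 us/m over a 400 m interval.
depth = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
dt = np.full(5, 250.0)
velocity = dt_to_velocity(dt)
twt = two_way_time(depth, dt)
print(velocity[0], twt[-1])
```

The `t0_s` argument stands for the two-way time at the top of the logged interval, which in practice would come from a checkshot or a seismic horizon pick.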

Figure

KELM-based inverted velocity section crossing well S205.

This paper discusses the kernel extreme learning machine as a tool for predicting the sonic log in gas and oil wells from other commonly available logs. Careful data normalization, training set selection, and optimization of the ELM parameters are very important in determining the prediction power, the generalization capability, and the complexity of the derived regression model. The model-predicted DT logs are further applied to seismic inversion to examine their practical value.

The method presented here is not limited to modeling DT logs. With appropriate modifications of the algorithm, it can be extended to any area of well logging studies where missing log values must be estimated. Thus, we offer a blueprint for similar future applications.

The authors declare that they have no conflict of interests.

This work is supported by the National Natural Science Foundation of China (no. 61402331). The Foundation of Educational Commission of Tianjin City, China (Grant no. 20140803) also funded this research.