Potential for Evaluation of Interwell Connectivity under the Effect of Intraformational Bed in Reservoirs Utilizing Machine Learning Methods

Machine learning has gradually become an important and effective tool for analyzing reservoir parameters in reservoir numerical simulation. This paper presents a machine learning method to evaluate the connectivity between injection and production wells controlled by an interlayer in the reservoir. Back propagation (BP) and convolutional neural networks (CNNs) are trained on dynamic production data that reflect the influence of the interlayer on connectivity. The dataset consists of dynamic production data generated under different permeabilities, interlayer dip angles, and injection pressures. The connectivity is calculated with the deep learning model, and the connectivity factor K is defined. The results show that, compared with BP, CNN performs better in predicting connectivity, with an average absolute relative deviation (AARD) as low as 10.01%. Moreover, the CNN prediction results are close to those of the traditional methods. This paper provides new insights and methods for evaluating interwell connectivity in conventional or unconventional reservoirs.


Introduction
An interlayer acts as an impermeable barrier or a very low-permeability, high-resistance layer for fluid flow. It blocks fluid flow and separates flow units within a certain range, which has a great influence on the oil displacement process. It is one of the main factors affecting the formation and distribution of remaining oil and a key subject of reservoir heterogeneity research. Dynamically adjusting reservoir development in real time according to the connectivity results for different interlayer distributions and dip angles is of great significance for well pattern adjustment and for identifying the direction of potential tapping in oilfield development. In particular, accurately judging the dynamic connectivity between injection and production wells is very important when a waterflood reservoir enters the later stage of development [1][2][3][4].
In the study of interlayers, geological data are mainly used to predict the scale of the configuration bed, but there is no unified understanding of scale and fineness. In particular, the relationship with production performance is not yet clear enough, so it is difficult to guide the exploration and development of oil and gas fields directly with the results of reservoir configuration research. This greatly limits the rapid development of reservoir configuration research [5][6][7][8][9][10][11][12].
At present, there are three main traditional methods for evaluating connectivity between injectors and producers. First, some scholars use the Spearman correlation coefficient (SCC) to evaluate the connectivity among wells. Heffer et al., Refunjol, and Soeriawinata and Kelkar established connectivity inversion models based on the superposition principle and analyzed connectivity by calculating the Spearman correlation coefficient [13][14][15]. Second, many scholars have applied multiple linear regression (MLR) to study well connectivity. For instance, Albertoni et al. established an improved multiple linear regression model to evaluate connectivity between injectors and producers [16]. Third, some scholars have used the capacitance model (CM) to analyze connectivity. Based on bottom hole pressure (BHP) and rate data, Yousef et al. and Kaviani et al. presented two enhancements to the CM for cases in which BHP data are unavailable [17,18].
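As a rough illustration of the first traditional approach, the sketch below computes a Spearman rank correlation between an injection-rate series and a production-rate series. It is not the inversion model of the cited authors; the function names and the sample rates are hypothetical, and ties are handled with average ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily rates for one injector and one producer.
injection_rate = [100, 120, 90, 150, 130]
production_rate = [40, 50, 35, 65, 52]
rho = spearman(injection_rate, production_rate)
```

A coefficient near 1 would suggest that the producer responds monotonically to the injector, which is the kind of signal the SCC-based methods interpret as connectivity.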
Using machine learning to evaluate connectivity between wells has two advantages. First, the common methods, such as the CM and MLR models, limit the input dynamic data to production and pressure data, because they establish a mathematical model of a connectivity coefficient to characterize connectivity. In contrast, the machine learning model learns the characteristics of dynamic production data under different conditions and outputs the average permeability and the interlayer angle, so its input variables are less restricted. In this paper, the water cut information provided by dynamic production data is added as an input feature, which makes the predicted results more accurate. Second, prediction with a machine learning model takes very little time. Once dynamic production data are fed to the trained model, the connectivity result can be returned within about a second, far less than the several days or longer required by traditional methods such as tracer testing, interference testing, and pressure testing [19][20][21][22][23].
In this paper, a machine learning method is used to evaluate the connectivity between wells controlled by a reservoir interlayer. The specific research steps are as follows: (1) a mathematical model of two-phase flow between injection and production wells is established, and the production, water cut, and injection pressure under different interlayer distributions and dip angles are taken as input parameters; (2) based on the training dataset, the BP and CNN methods are used to build deep learning models that are trained to predict Trained-K and Trained-θ; (3) the connectivity is described by the calculated Trained-K and Trained-θ, and the accuracy of the BP and CNN methods is compared. The research results of this paper could provide new insight and prospects for future oilfield development.

Methodology
2.1. BP Neural Network. The BP neural network is a neural network model that combines the network structure of the multilayer perceptron with the error back propagation algorithm. The model has been widely used in various fields and has achieved great success. Its biggest advantage is its ability to map inputs to outputs nonlinearly, which makes it particularly suitable for problems with extremely complex internal mechanisms. Second, because of the error back propagation algorithm, the network has strong self-learning and self-adaptive abilities during training. Finally, the BP neural network has a certain fault tolerance: abnormal parameters in a few local neurons do not have a great impact on the overall training results. These advantages make the BP neural network well suited to predicting interwell connectivity. It can quickly establish a nonlinear mapping between dynamic production data and the average formation permeability without a physical model, so that dynamic production data can be used to invert the average permeability through model training and testing [24,25].
In structure, the BP neural network consists of three parts: the input layer, the hidden layer, and the output layer. The calculation process is divided into two stages: forward propagation and backward propagation. Forward propagation uses the weights and thresholds in the network to calculate the desired output variable from the input data. Backward propagation continuously updates the weights and thresholds according to the error of the output variable, so that the output approaches the true value. The input layer of the neural network designed in this study has a total of 3600 neurons, which characterize the oil production, water cut, and injection volume. The hidden part has a three-layer structure, and each layer contains 120 neurons to fully extract the data features of the input layer. The output layer has only two neurons, which represent the average permeability and the interlayer angle (as shown in Figure 1).
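The forward propagation stage for the layer sizes given above (3600 inputs, three hidden layers of 120 neurons, 2 outputs) can be sketched as follows. This is a minimal, untrained illustration: the ReLU hidden activation, the linear output layer, and the random weight initialization are assumptions, and the function names are hypothetical.

```python
import random

# Layer widths taken from the text: 3600 inputs, 3 hidden layers of 120, 2 outputs.
LAYER_SIZES = [3600, 120, 120, 120, 2]


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) for each layer with small random weights (untrained)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward propagation: ReLU on hidden layers (assumed), linear output layer."""
    activation = x
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = z if is_output else [max(0.0, v) for v in z]
    return activation


def count_parameters(layer_sizes):
    """Number of weights plus thresholds (biases) in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


network = init_network(LAYER_SIZES)
sample = [0.5] * 3600              # placeholder for one normalized input vector
prediction = forward(network, sample)  # two values: permeability and angle outputs
```

Under these assumptions the network has 461,402 trainable weights and thresholds, most of them in the first layer; backward propagation would adjust all of them from the output error.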

Convolutional Neural Network.
Convolutional neural networks have been widely used in various fields and have achieved great success owing to their distinctive feature extraction and dimension reduction approach. A CNN is mainly composed of five parts: the input layer, convolutional layer, pooling layer, fully connected layer, and output layer. The convolutional layer extracts data features, and the pooling layer then reduces the dimension of the extracted data [26,27].
In this study, we merge the oil production (1 × 1200), water cut (1 × 1200), and injection volume (1 × 1200) data into a new dataset (3 × 1200) and reshape it to 60 × 60 as the input layer, which facilitates feature extraction by the convolutional layers. The input layer is followed by two convolutional layers, each followed by a pooling layer. The convolution kernels in the two convolutional layers are 3 × 5 and 5 × 3 in size, respectively, and the numbers of kernels are 60 and 120. Both pooling layers use max pooling with a kernel size of 2 × 2. The last pooling layer is followed by three consecutive fully connected layers, each consisting of 40 neurons. Finally, the output layer again has two neurons, which give the average permeability Trained-K and the interlayer angle Trained-θ (as shown in Figure 2).
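To make the layer dimensions concrete, the sketch below reshapes three 1 × 1200 series into a 60 × 60 grid and traces the feature-map sizes through the two convolution and pooling stages. The text does not state the stride, padding, or element ordering, so stride 1, no padding, floor division in pooling, and row-major concatenation are assumptions; the function names are hypothetical.

```python
def reshape_to_grid(oil, water_cut, injection, side=60):
    """Concatenate three series and cut them into `side` rows (row-major, assumed)."""
    flat = list(oil) + list(water_cut) + list(injection)
    if len(flat) != side * side:
        raise ValueError("expected %d values, got %d" % (side * side, len(flat)))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def conv_output(height, width, kernel_h, kernel_w):
    """Output size of a stride-1 convolution without padding (assumed)."""
    return height - kernel_h + 1, width - kernel_w + 1


def pool_output(height, width, pool=2):
    """Output size of non-overlapping pooling; odd sizes are floored (assumed)."""
    return height // pool, width // pool


def trace_shapes():
    """Return (layer name, channels, height, width) for each stage of the CNN."""
    shapes = [("input", 1, 60, 60)]
    h, w = conv_output(60, 60, 3, 5)      # first convolution: 60 kernels of 3 x 5
    shapes.append(("conv1", 60, h, w))
    h, w = pool_output(h, w)              # 2 x 2 max pooling
    shapes.append(("pool1", 60, h, w))
    h, w = conv_output(h, w, 5, 3)        # second convolution: 120 kernels of 5 x 3
    shapes.append(("conv2", 120, h, w))
    h, w = pool_output(h, w)              # 2 x 2 max pooling
    shapes.append(("pool2", 120, h, w))
    return shapes


grid = reshape_to_grid([0.0] * 1200, [0.0] * 1200, [0.0] * 1200)
shapes = trace_shapes()
_, channels, height, width = shapes[-1]
flattened_size = channels * height * width  # input width of the first dense layer
```

With these assumed settings the second pooling layer yields 120 feature maps of 12 × 13, so the first fully connected layer would receive 18,720 values; a different padding or stride choice would change these numbers.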

Model Evaluation.
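When the calculation is completed, the model needs to be verified with statistical evaluation indicators. This paper uses several error criteria to study the accuracy of the model results: the average relative deviation (ARD), the average absolute relative deviation (AARD), and the root mean square error (RMSE), defined as follows [28,29]:
$$\mathrm{ARD} = \frac{100\%}{N}\sum_{i=1}^{N}\frac{X_i^{\mathrm{predict}} - X_i^{\mathrm{real}}}{X_i^{\mathrm{real}}},$$
$$\mathrm{AARD} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{X_i^{\mathrm{predict}} - X_i^{\mathrm{real}}}{X_i^{\mathrm{real}}}\right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i^{\mathrm{predict}} - X_i^{\mathrm{real}}\right)^{2}},$$
where N represents the total number of data points in each set, $X_i^{\mathrm{real}}$ is the real value expected from each set, and $X_i^{\mathrm{predict}}$ is the corresponding predicted value calculated by the neural network.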
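The three criteria can be computed as in the sketch below. It assumes the commonly used definitions (relative deviations taken as predicted minus real, divided by real, and reported in percent); the sign convention of ARD varies between authors, and the function names and sample values are hypothetical.

```python
import math


def ard(real, predicted):
    """Average relative deviation in percent (signed; assumed convention)."""
    n = len(real)
    return 100.0 / n * sum((p - r) / r for r, p in zip(real, predicted))


def aard(real, predicted):
    """Average absolute relative deviation in percent."""
    n = len(real)
    return 100.0 / n * sum(abs((p - r) / r) for r, p in zip(real, predicted))


def rmse(real, predicted):
    """Root mean square error, in the units of the data (e.g. md for permeability)."""
    n = len(real)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(real, predicted)) / n)


# Hypothetical permeabilities in md: one over-prediction and one under-prediction.
real_k = [100.0, 200.0]
predicted_k = [110.0, 180.0]
scores = (ard(real_k, predicted_k), aard(real_k, predicted_k), rmse(real_k, predicted_k))
```

In this example the signed deviations cancel in ARD while AARD does not, which is why AARD is the more informative measure of typical error.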

Acquisition of Connectivity Factor.
Traditional methods establish a mathematical model of the interwell connectivity coefficient based on the production and pressure data of injection and production wells. Another, experimental, method evaluates the connectivity between wells in a water drive reservoir by tracer monitoring, but it has a high cost and a long period. This paper studies the influence of an interlayer with a certain angle on the connectivity between injection and production wells. Training factors: dynamic data, average permeability, and interlayer angle. Target factors: average permeability and interlayer angle. Compared with the traditional model, it is more reasonable to use the dynamic data to calculate the average permeability and interlayer angle and to express the interwell connectivity with the average permeability Trained-K and the interlayer angle Trained-θ. The equation for connectivity evaluation is shown as follows: Here, f_k and f_θ are the connectivity factors, and Trained-K and Trained-θ are the values obtained from machine learning training.

Dataset Collection.
This study prepares, as training data, dynamic data for different permeabilities and interlayer angles together with the corresponding injection-production response. In actual oilfield development, it is difficult to obtain enough permeability models, and field dynamic data contain a lot of noise, which makes them unsuitable for deep learning training. Therefore, this paper uses numerical simulation data for training.
This experiment takes one injection-production well pair as an example. As shown in Figure 3, the reservoir contains an interlayer at a certain angle, and the permeability of the interlayer is 0. We then use the dynamic data of the production well to back-calculate the average permeability and the interlayer angle between the injection and production wells, which indicate the connectivity between the wells. We design the average permeability variation range (1400), step size 5 md, and, under each permeability, the angle between the interlayer and the main flow line (0, 90°), step size 4.5°. With each iteration of permeability, the corresponding models with different interlayer angles can be generated. The model can also change the mesh size, the injection volume, and other conditions. Based on the oil-water two-phase seepage model, the simulation includes 1701 seed models and the corresponding dynamic data, obtained with the IMPES method.
Figure 4(a) shows the oil production of the production well affected by different interlayer angles under the same permeability. The horizontal axis is the production time, and the vertical axis is the oil production. The curve clearly shows that, because of the shielding effect of the interlayer, the driving speed of the water flow is inversely related to the angle of the interlayer. The results show that the larger the angle, the later the water breakthrough in the oil well, the higher the production, and the higher the peak production. Figure 4(b) shows the water cut curve affected by different interlayer angles under the same permeability. The horizontal axis is the production time, and the vertical axis is the water cut. Under the same permeability, the larger the interlayer angle, the more obvious the shielding effect, the later the water breakthrough, and the slower the water cut rise. The smaller the angle, the faster the water drive, the earlier the water breakthrough, and the faster the water cut rises.
Figure 5(a) shows the oil production of the production well affected by different permeabilities under the same interlayer angle. The horizontal axis is the production time, and the vertical axis is the oil production. The curve clearly shows that as permeability increases, production is higher and declines faster. As permeability decreases, the peak production decreases. Figure 5(b) shows the water cut curve affected by different permeabilities under the same interlayer angle. The horizontal axis is the production time, and the vertical axis is the water cut. The greater the permeability, the earlier the oil well sees water. The lower the permeability, the later the water breakthrough. However, because of the barrier effect of the interlayer, the lower the permeability, the lower the water displacement speed and the larger the swept volume, and the faster the water cut rises in the later period.
The injection volume also affects these characteristic parameters to a certain extent, so we add variation in injection pressure to the data for the deep learning model. The simulated production period is 1200 days, and the model parameters are shown in Table 1.
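The size of the parameter sweep described in this section can be checked with a few lines of arithmetic. The sketch assumes that both endpoints of the 0° to 90° angle range are included and that the 1701 seed models form a full permeability-by-angle grid; neither is stated explicitly in the text, and the variable names are hypothetical.

```python
ANGLE_STEP_DEG = 4.5   # step size of the interlayer angle, from the text
ANGLE_MAX_DEG = 90.0   # upper end of the angle range, from the text
N_SEED_MODELS = 1701   # number of simulated seed models, from the text

# Angles 0, 4.5, ..., 90 degrees, assuming both endpoints are included.
angles = [i * ANGLE_STEP_DEG for i in range(int(ANGLE_MAX_DEG / ANGLE_STEP_DEG) + 1)]

# If the models form a full grid, the number of permeability values follows directly.
n_permeabilities, remainder = divmod(N_SEED_MODELS, len(angles))
```

Under these assumptions there are 21 angle values, and 1701 models divide evenly into 81 permeability values, which is what a 5 md step over a span of 400 md would give.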

Training Process.
The training process in the machine learning workflow is as follows:
(1) Normalize the input and output [30]. Normalizing the data can improve the computational efficiency of the model to a certain extent:
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}},$$
where X is the initial value, X′ is its normalized value, and X_min and X_max are the minimum and maximum values of the variable, respectively.
(2) The dataset is divided into a training set and a test set in a certain proportion. Either the BP or the CNN model can be selected for training, and the test set is used to evaluate the model.
(3) In practical applications, dynamic production data are input into the trained model to obtain connectivity information quickly, without retraining.
3.3. Optimization Technology. In this paper, the optimization technology consists of ReLU and Dropout [31,32]. ReLU is piecewise linear, so its forward pass, backward pass, and derivative are all piecewise linear, which makes it easier to optimize. The traditional sigmoid function tends to lose information during propagation because it saturates at both ends. ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence of parameters, and alleviates overfitting, as shown in Figure 6. Other parameters of the neural network are shown in Table 2.
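Two pieces of the procedure above, the min-max normalization in step (1) and the ReLU and Dropout operations, can be sketched in a few lines. The normalization assumes the usual min-max form implied by X_min and X_max; the dropout variant shown is "inverted" dropout, and the dropout rate is a hypothetical value because the one used in the paper is not given in the text.

```python
import random


def min_max_normalize(values):
    """Scale values to [0, 1] using (X - X_min) / (X_max - X_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: no spread to normalize
    return [(v - lo) / (hi - lo) for v in values]


def relu(values):
    """ReLU: negative inputs become 0, positive inputs pass through unchanged."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


normalized = min_max_normalize([5.0, 10.0, 15.0])
activated = relu([-1.0, 0.0, 2.0])
# rate=0.5 is a hypothetical choice for illustration only.
dropped = dropout([1.0] * 10, rate=0.5, rng=random.Random(0))
```

Dropout is applied only during training; at prediction time the layer is left unchanged, which is why the surviving activations are rescaled by 1 / (1 - rate) in this variant.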

Model Calibration.
The number of hidden layers has an important influence on the predictions of a neural network model. Theoretically, the more layers a neural network has, the more accurately it can simulate the nonlinear mapping between variables. However, too many layers lead to a sharp increase in the amount of computation and may cause the gradient to vanish during the calculation. Conversely, if the number of hidden layers is too small, the accuracy of the model results is doubtful. Therefore, to better predict connectivity with neural networks, it is necessary to calibrate the number of hidden layers. Table 3 shows that the AARD value decreases significantly when the number of layers changes from 1 to 2, for both the CNN and the BP neural network. Compared with only one layer, the AARD of BP and CNN decreased by 7.6% and 6.9%, respectively. However, when the number of hidden layers increases to 3 or even 4, the AARD value no longer changes significantly for CNN and is basically stable at about 25%. For the BP neural network, the AARD value continues to decrease at three layers, and increasing the number of layers further has no significant effect on AARD. Considering computing resources and accuracy, the CNN and BP neural networks use two and three layers, respectively.
In addition to the number of hidden layers, the division of the dataset also affects the prediction performance of the deep learning model. With the other parameters fixed, the AARD is calculated for each division ratio, and the optimal ratio is obtained from a comprehensive analysis of the results.
Table 4 shows that the error is largest when the ratio of training set to test set is 5 : 5, with AARD values of 27.3% and 19.7% for the two neural network models. This may be because the training set contains too few samples, so the dynamic production data of certain types of geological models are not learned by the neural network. We therefore enlarge the training set, and the error is clearly reduced when the ratio is 6 : 4. As the proportion of the training set continues to increase, the AARD values of the two models tend to be stable, which indicates that the training set has covered all types of geological models. Finally, the division ratio of the dataset is set to 6 : 4.
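A 6 : 4 division of the 1701 simulated samples can be sketched as below. The random shuffle, the seed, and rounding the training-set size down are assumptions, since the text does not say how samples are assigned to each set; the function name is hypothetical.

```python
import random


def split_dataset(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # rounds down (assumed)
    return indices[:n_train], indices[n_train:]


# 1701 seed models divided 6 : 4 into training and test sets.
train_idx, test_idx = split_dataset(1701, 0.6)
```

With these choices the training set holds 1020 samples and the test set 681. Shuffling before splitting helps each set contain a mix of permeabilities and interlayer angles rather than one end of the parameter sweep.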

Model Verification and Comparison.
After each training pass, the neural network calculates the model error with the loss function and then updates the weights and thresholds with the back propagation algorithm to obtain the optimal deep learning model. The accuracy of model prediction is therefore evaluated by monitoring the change in the RMSE curve. Figure 7 shows the change in error during neural network training. The horizontal axis represents the training time, and the vertical axis represents the error of Trained-K calculated by the loss function. For both the CNN and the BP neural network, the overall trend of the loss curve is downward. This shows that the prediction accuracy of the model improves continuously with each training pass of the back propagation algorithm. We can also observe that the loss curves converge at about 30 epochs and that the error of the test set is slightly higher than that of the training set. Finally, the RMSE of CNN and BP test set are 12.43 md and 2.36 md, respectively, which shows that the CNN model is better than the BP model.

Geofluids
As shown in Figure 8, the scatter points obtained by both the BP and CNN methods are generally distributed near the x = y line, but their dispersion differs. The deviation of the CNN predictions from x = y is within 11%, whereas that of the BP neural network predictions even exceeds 20%. This scatter diagram further shows that the CNN predictions are more accurate than those of the BP neural network.
According to the evaluation results in Table 5, the AARD of the BP model is 15.35% on the training set and 22.56% on the test set. In contrast, the AARD of the CNN model is as low as 10.01% and 13.47%, which shows that the CNN model has better prediction performance. In short, we evaluate the accuracy of the model from several aspects and verify the prediction of Trained-K and Trained-θ. The CNN's training is more reliable and accurate.
In addition to verifying the accuracy of the models, we also compared their calculation times, as shown in Table 6. Because of the large amount of input data, training takes a long time: 2 hours and 50 minutes for the CNN and 1 hour and 15 minutes for the BP neural network. The CNN needs more time because it extracts more complex data features. In practice, predictions are made with the trained models, so the prediction time matters more. The two models (BP and CNN) need only 0.8 seconds and 1.1 seconds to predict, respectively.
4.3. Analysis of the Influencing Factors of Input Variables. The input variables studied in this paper include not only the oil production and injection pressure used in traditional models but also the water cut, which makes the data more comprehensive. Different combinations of variables are used as the model input, and the influence of each variable is analyzed through the AARD of the prediction results.
As shown in Table 7, we use different combinations of production, water injection pressure, and water cut data as the model input and compare the calculation results. First, when the input is based on injection pressure, adding the water cut or the production gives AARD values of 28.8% and 23.2%, respectively, which indicates that production has a greater impact on the results than water cut. The AARD falls further to 17.1% when production and water cut data are used together as input variables. Finally, when injection pressure data are added to this combination, the AARD decreases only slightly further, to 16.8%. In short, adding water cut data to the traditional production and pressure inputs allows the model to calculate Trained-K and Trained-θ more accurately.

Comparison of Connectivity Factor.
We randomly select a sample from the test set and calculate the interwell connectivity coefficient from its average permeability as the actual coefficient. For comparison, the corresponding connectivity coefficients are calculated under the same conditions with CNN and with traditional methods such as the CM and MLR models, as shown in Table 8.

Conclusion
This paper established a machine learning method to evaluate interwell connectivity based on dynamic production data. First, the oil production, water cut, and injection pressure data under different reservoir scales and water injection conditions are calculated by numerical simulation. Then, the differences between these dynamic characteristics are learned through deep learning with BP and CNN. Finally, the connectivity coefficient is calculated from Trained-K and Trained-θ to represent the connectivity between wells. From the analysis and discussion of the prediction results, we obtain the following conclusions:
(1) Based on a large amount of data, the overall AARD of the CNN predictions can be controlled at 10.01%, which is about 5.34% lower than that of the BP method.
(2) Integrating production, pressure, and water cut data makes the model more complete and the connectivity prediction more accurate. The analysis of the influencing factors of the input variables shows that adding water cut data to the input variables keeps the AARD of the neural network model at 16.8%, which is significantly lower than that of the traditional artificial neural network model.
(3) The prediction of the neural network model is relatively accurate, and the prediction time is very short. The comparison of connectivity factors shows that the CNN predictions are closer to the real connectivity factors than those of some traditional methods. In addition, the prediction times of BP and CNN are 0.8 s and 1.1 s, respectively.

Data Availability
The manuscript is a self-contained data article. The entire data used to support the findings of this study are included within the article or available from the corresponding author upon request. If any additional information is required, this is available from the corresponding author upon request to e-mail: jinzi19811216@126.com.