
In the middle and late stages of heavy oil development, formulating a scientific and reasonable development plan is the key to improving oilfield efficiency. At present, steam stimulation is still the main development method for heavy oil. Its production is limited not only by boiler conditions, surface pipelines, and wellbore conditions but also by the steam absorption capacity of the formation. Therefore, analyzing each part in isolation cannot achieve the best effect over the whole steam stimulation process. The mechanism model is the most commonly used method to predict heavy oil production, but its many idealized assumptions make the predictions differ considerably from actual production. With the rapid development of machine learning, production can be predicted rapidly from field data. However, when the range of the actual parameters is small, the generalization ability of the model is weak and overfitting occurs. Against this background, this paper conducts a coupled study of surface steam pipeline flow, steam injection wellbore flow, and formation flow from a data-driven perspective. Firstly, based on the correlation coefficient and Random Forest feature selection, the characteristics affecting liquid production and water content were ranked by importance. Secondly, by comparing five typical machine learning algorithms, we select the optimal prediction model and the optimal characteristics for the samples of this paper. Finally, because the prediction model generalizes poorly, we sampled the mechanism model to increase the diversity of steam dryness samples. After training on the new samples, both the accuracy and the generalization ability of the optimal prediction model improved. This paper provides a new idea for the production prediction of heavy oil steam stimulation reservoirs, which is helpful for the efficient development of heavy oil reservoirs.

Heavy oil is a rich mineral resource, and developing it efficiently and economically is of great practical significance. However, due to the high viscosity and poor flowability of heavy oil, it is difficult to achieve ideal results with conventional technology. Therefore, steam stimulation is still the main development method for heavy oil. The local analysis theory of steam stimulation in surface pipelines, wellbores, and formations is relatively mature and has been applied to actual oilfield production [

The dynamic prediction of steam stimulation wells is the basis of injection parameter design and production design optimization. To improve the effect of steam stimulation, researchers have extensively studied index prediction for steam stimulation wells. Marx and Langenheim used the energy balance to calculate the heating area of the oil layer [

From the perspective of percolation mechanics and cybernetics, the reservoir system is a distributed parameter system. The basic physical quantities describing the reservoir state are the water saturation field and the reservoir pressure field. Different parameters represent different underground conditions. The mechanism model reflects our induction and summary of real phenomena and is reliable prior knowledge of the flow law of underground fluid. Although the mechanism model has become increasingly refined, it considers far fewer parameters than the reservoir numerical simulation method. In 1953, Bruce et al. simulated the one-dimensional gas-phase unstable radial and linear flow [

So far, reservoir numerical simulation software has made a great breakthrough in the integration of functions. Such software can now handle almost all types of oil and gas reservoirs and mining methods [

In recent years, artificial intelligence methods have been widely used in the field of petroleum engineering [

The innovations of this paper are as follows. (1) Building on previous studies, we conduct a data-driven coupled study of surface steam pipeline flow, steam injection wellbore flow, and formation flow. (2) Based on the correlation coefficient and Random Forest feature selection, this paper ranks the features that affect liquid production and water content by importance. (3) For a heavy oil field in eastern China, we used five typical machine learning algorithms to model and compare its field data. We find that six characteristics (produced degree, dynamic liquid surface, soaking time, stroke, stroke times, and well pattern mode) have little effect on liquid production and water content, so they are eliminated. The Random Forest prediction models of liquid production and water content have the highest accuracy, 86% and 83%, respectively, but their generalization ability is poor. (4) We sampled the mechanism model, increased the diversity of steam dryness samples, and trained on the new samples again. Both the accuracy and the generalization ability of the previously obtained optimal prediction model improved, making the prediction results more accurate and reliable.

The rest of this paper is arranged as follows. The second part introduces the data source and data preprocessing. The third part establishes and verifies the data-driven input and output model of the reservoir system. The fourth part establishes and verifies the hybrid data-driven input and output model of the reservoir system. The fifth part is the conclusion.

The data used in this paper are collected from the dynamic and static information, steam injection data, and production data of 109 heavy oil blocks in a heavy oil field in eastern China. Among them, the static information includes oil area, produced reserves, porosity, permeability, and other information data. Dynamic indicators include cumulative oil and cumulative water production. Steam injection data include steam quantity at the boiler outlet, steam pressure at the boiler outlet, and so on. Production data include liquid production and water content.

Data preprocessing is also an important part of data-driven index prediction and greatly affects prediction accuracy. The actual production data contain many missing or abnormal values, so they cannot be used directly for training. Therefore, data cleaning and other operations must be carried out first to obtain higher prediction accuracy.

We remove outliers according to the PauTa criterion (

For the collected samples, if there is too much missing data for a certain group of samples or the sample is missing the two important data of liquid production and water content, the sample is deleted. For the missing values of other parameters such as steam temperature and steam pressure, the

After outlier processing and missing value filling, we retained 97 of the 109 heavy oil blocks, for a total of 780 groups of samples.
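The PauTa criterion is commonly known as the three-sigma rule: a value is treated as an outlier when it lies more than three standard deviations from the mean. The sketch below is a minimal illustration under that reading; the function name `pauta_filter` and the pressure readings are hypothetical and are not taken from the field data.

```python
import math


def pauta_filter(values):
    """Keep values within three standard deviations of the mean (PauTa / 3-sigma rule)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [v for v in values if abs(v - mean) <= 3 * std]


# Hypothetical steam pressure readings: nineteen plausible values and one gross error.
pressures = [10.0 + 0.1 * i for i in range(19)] + [100.0]
cleaned = pauta_filter(pressures)
```

A single pass is shown here. In practice the rule is sometimes reapplied until no further values are removed, because a large outlier inflates the standard deviation used to detect the others.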

Feature selection is also called feature subset selection or attribute selection. It is a data preprocessing operation that selects a subset of the original features to reduce the data dimension and improve the generalization ability of the model. In practical applications, although more parameters carry more information, too many parameters reduce learning efficiency and can even harm prediction accuracy.

Since many factors affect the development index of heavy oil steam stimulation, a systematic index analysis is needed to determine the development index more accurately. Based on the basic theory of reservoir engineering and combined with related research [

Reservoir characteristics: reservoir type, surface crude oil viscosity, initial formation temperature, reservoir buried depth, edge-bottom water, oil area, dynamic reserve, primitive oil-bearing saturability, reservoir effective thickness, porosity, net total ratio of oil layers, permeability, original formation pressure, and dynamic liquid surface, numbered in turn as features 1∼14

Productive regulation: soaking time, well distance, well spacing density, well pattern mode, startup well number, stroke, stroke times, production time, and annual turnover, numbered in turn as features 15∼23

Characteristics of historical production: cumulative oil production, cumulative water production, and produced degree, numbered in turn as features 24∼26

Control variables: steam quantity at the boiler outlet, steam flow rate at the boiler outlet, steam pressure at the bottom of the steam injection well, and steam dryness at the bottom of the steam injection well

Output variables: liquid production and water content, numbered as outputs 1 and 2, respectively

In the data-driven process, interactions between variables may have a negative impact on the final result, so the inputs must be chosen carefully. Because the four control variables directly affect the final recovery effect, they are always kept as model inputs, and feature selection is applied only to the remaining 26 variables.

The correlation coefficient is a type of statistical analysis index, which is usually used to determine the direction and degree of linear correlation of variables. The formula is as follows:

We get the correlation coefficient between 26 independent variables and 2 dependent variables, as shown in Table

Correlation coefficient of partial variables.

Feature index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Liquid production | 0.574 | 0.718 | 0.671 | 0.622 | 0.419 | 0.698 | 0.697 | 0.785 | 0.727 | 0.772 | 0.640 | 0.485 | 0.386 |
Water content | 0.685 | 0.526 | 0.532 | 0.399 | 0.640 | 0.778 | 0.715 | 0.688 | 0.641 | 0.675 | 0.583 | 0.640 | 0.665 |

Feature index | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Liquid production | 0.562 | 0.078 | 0.760 | 0.746 | 0.109 | 0.646 | 0.353 | 0.029 | 0.509 | 0.575 | 0.876 | 0.745 | 0.237 |
Water content | 0.082 | 0.056 | 0.399 | 0.558 | 0.117 | 0.622 | 0.165 | 0.336 | 0.656 | 0.684 | 0.796 | 0.787 | 0.348 |
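For readers who want to compute such coefficients themselves, the following sketch implements the Pearson linear correlation coefficient between one feature and one output. Pearson's coefficient is assumed here because the text describes a linear correlation measure. The sample values are hypothetical, and whether the tabulated values are absolute values is not stated in the text.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical feature and output columns.
porosity = [0.28, 0.30, 0.32, 0.34]
liquid_production = [2.0, 4.0, 6.0, 8.0]
r_up = pearson_r(porosity, liquid_production)
r_down = pearson_r(porosity, list(reversed(liquid_production)))
```

The sign of the coefficient gives the direction of the linear relationship and its magnitude gives the strength, so a ranking by relevance would normally use the absolute value.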

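Feature screening based on Random Forest measures how much each feature contributes on each tree in the Random Forest [

The Gini index of node $m$ is

$$GI_{m} = 1 - \sum_{k=1}^{K} p_{mk}^{2},$$

where $K$ is the number of categories and $p_{mk}$ represents the proportion of category $k$ at node $m$.

Then, the importance of feature $j$ at node $m$, that is, the change in the Gini index before and after node $m$ branches, is

$$VIM_{jm} = GI_{m} - GI_{l} - GI_{r},$$

where $GI_{l}$ and $GI_{r}$, respectively, represent the Gini index of the two new nodes after branching.

If the nodes of feature $j$ in the decision tree $i$ form the set $M$, then the importance of feature $j$ in the tree $i$ is

$$VIM_{ij} = \sum_{m \in M} VIM_{jm}.$$

Assuming that there are $n$ trees in the Random Forest, the importance of feature $j$ throughout the Random Forest is as follows:

$$VIM_{j} = \sum_{i=1}^{n} VIM_{ij}.$$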

We get the importance of 26 characteristics that affect liquid production and water content, as shown in Table

Characteristic importance based on the Random Forest Gini index.

Feature index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Liquid production | 0.126 | 0.135 | 0.186 | 0.094 | 0.167 | 0.008 | 0.113 | 0.006 | 0.096 | 0.094 | 0.126 | 0.178 | 0.167 |
Water content | 0.169 | 0.087 | 0.167 | 0.172 | 0.005 | 0.135 | 0.124 | 0.107 | 0.085 | 0.167 | 0.091 | 0.134 | 0.142 |

Feature index | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Liquid production | 0.001 | 0.008 | 0.124 | 0.148 | 0.001 | 0.007 | 0.001 | 0.001 | 0.183 | 0.139 | 0.009 | 0.191 | 0.002 |
Water content | 0.007 | 0.002 | 0.098 | 0.109 | 0.002 | 0.136 | 0.003 | 0.014 | 0.009 | 0.008 | 0.126 | 0.135 | 0.016 |
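The two quantities behind these importance scores, the Gini index of a node and its decrease after a split, can be sketched in a few lines. This is an illustration only. Weighting each child node by its share of the parent's samples is an assumption that follows common CART practice, and the labels are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: one minus the sum of squared category proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Drop in Gini index when `parent` splits into `left` and `right`.

    Assumption: each child is weighted by its fraction of the parent's samples.
    """
    n = len(parent)
    weighted_children = (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)
    return gini_index(parent) - weighted_children


# Hypothetical node with two categories that one split separates perfectly.
parent = ["high", "high", "low", "low"]
decrease = gini_decrease(parent, ["high", "high"], ["low", "low"])
```

Summing such decreases over every node where a feature is used, and then over every tree, gives that feature's forest-wide importance.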

We obtained the correlation coefficients and the importance scores from Random Forest feature selection and then added them to obtain a comprehensive importance ranking of the variables affecting liquid production and water content. The results are shown in Table

Ranking of characteristic variables affecting liquid production and water content.

Ranking | Parameter | Ranking | Parameter |
---|---|---|---|
1 | Cumulative water production | 2 | Cumulative oil production |
3 | Porosity | 4 | Initial formation temperature |
5 | Well spacing density | 6 | Reservoir type |
7 | Dynamic reserve | 8 | Primitive oil-bearing saturability |
9 | Oil area | 10 | Original formation pressure |
11 | Well distance | 12 | Surface crude oil viscosity |
13 | Reservoir effective thickness | 14 | Permeability |
15 | Production time | 16 | Annual turnover |
17 | Startup well number | 18 | Reservoir buried depth |
19 | Net total ratio of oil layers | 20 | Edge-bottom water |
21 | Produced degree | 22 | Dynamic liquid surface |
23 | Soaking time | 24 | Stroke times |
25 | Stroke | 26 | Well pattern mode |
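One simple way to combine the two scores is to add them per feature and sort. The sketch below does this for three features using their tabulated values. How the two outputs and the two scores are weighted is not stated, so the plain sum used here is an assumption and need not reproduce every position in the ranking table.

```python
# Per feature: (correlation with liquid production, correlation with water content,
#               importance for liquid production, importance for water content).
# Values are taken from the two tables above for features 24, 25, and 26.
scores = {
    "Cumulative oil production": (0.876, 0.796, 0.009, 0.126),
    "Cumulative water production": (0.745, 0.787, 0.191, 0.135),
    "Produced degree": (0.237, 0.348, 0.002, 0.016),
}


def rank_features(score_table):
    """Order feature names by the plain sum of their scores, largest first."""
    totals = {name: sum(values) for name, values in score_table.items()}
    return sorted(totals, key=totals.get, reverse=True)


ranking = rank_features(scores)
```

For these three features the order agrees with the table: cumulative water production first, cumulative oil production second, and produced degree well behind.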

The steam stimulation oil recovery process is composed of a steam injection system, reservoir system, and lifting system. They perform the steam injection, soaking, and production, as shown in Figure

Steam stimulation oil recovery process [

Steam flow process.

This paper uses the steam injection wellhead and bottom hole as nodes to couple surface steam pipeline flow, steam injection wellbore flow, and formation flow. To explore the complex formation flow law, firstly, we convert the field data from the boiler outlet to the bottom of the well through a simplified mechanism, as shown in Figure

We make the following assumptions [

The pressure loss when steam flows in the pipeline is not considered

The steam temperature and atmospheric temperature are fixed

There is an insulating layer outside the steam pipeline

Since the steam reaching the wellhead is still saturated and we ignore the change of pressure, its temperature is constant. We also neglect changes in kinetic and potential energy and consider only the change in the internal energy of the steam. The wellhead dryness can then be calculated from the energy balance principle. We have

The dryness loss of the steam pipeline is as follows:
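Under the assumptions listed above (constant pressure and temperature, no kinetic or potential energy change), the heat lost along the pipeline equals the latent heat given up by the steam that condenses. The sketch below applies that energy balance. It is a minimal illustration, not necessarily the exact expression used in this paper, and all names and numbers are hypothetical.

```python
def wellhead_dryness(boiler_dryness, heat_loss_per_m, length_m, mass_flow, h_steam, h_water):
    """Steam dryness at the wellhead from a constant-pressure energy balance.

    heat_loss_per_m : heat loss per unit time and unit length, kcal/(h*m)
    length_m        : pipeline length, m
    mass_flow       : saturated steam mass flow rate, kg/h
    h_steam, h_water: enthalpy of saturated steam and saturated water, kcal/kg
    """
    latent_heat = h_steam - h_water  # kcal/kg released when steam condenses
    dryness_loss = heat_loss_per_m * length_m / (mass_flow * latent_heat)
    return boiler_dryness - dryness_loss


# Hypothetical operating point, chosen only to exercise the formula.
x_wellhead = wellhead_dryness(
    boiler_dryness=0.8,
    heat_loss_per_m=100.0,
    length_m=500.0,
    mass_flow=5000.0,
    h_steam=660.0,
    h_water=260.0,
)
```

With these inputs the dryness loss is 100 × 500 / (5000 × 400) = 0.025, so the wellhead dryness is 0.775.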

We make the following assumptions [

The steam injection rate, steam pressure, and steam quality of the wellhead remain unchanged

We assume that the heat transfer from the oil well to the cement ring is one-dimensional and steady, that the heat transfer from the cement ring to the formation is one-dimensional and unsteady, and that heat transfer along the well depth direction can be ignored

We consider pressure changes in the wellbore

We assume that the thermal conductivity of the formation is constant

This paper only considers the case of vertical injection wells. Since saturated steam is injected into the well, it becomes a two-phase flow of water and vapor. Therefore, according to the pressure balance equation, the pressure drop formula is expressed as follows:

We obtain the steam pressure change of the steam injection wellbore as follows:

Considering the limitation of the article content, the proof process is shown in Appendix

In unit time, the heat loss on the length

The heat loss of the wellbore will inevitably lead to a decrease in saturated steam energy, which will result in a decrease in steam dryness. We have

Furthermore,

We make the following transformation:

Therefore, the solution of equation (

We obtain the following dryness loss of the steam injection wellbore:

Considering the limitation of the article content, the proof process is shown in Appendix
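The link between wellbore heat loss and dryness loss can be illustrated by marching down the well in small depth steps. The sketch assumes the standard radial heat-loss form, 2π × radius × overall heat transfer coefficient × temperature difference per unit length. It also holds the steam temperature and latent heat constant, which ignores the wellbore pressure change that this paper does consider. The function and parameter names are hypothetical.

```python
import math


def bottomhole_dryness(wellhead_dryness, depth_m, r2, u2, t_steam, t_cement_at,
                       mass_flow, latent_heat, steps=100):
    """March down the wellbore, removing the dryness equivalent of each step's heat loss.

    r2          : outer radius of the inner pipe, m
    u2          : overall heat transfer coefficient, kcal/(h*m^2*degC)
    t_steam     : steam temperature, degC (held constant: a simplification)
    t_cement_at : function of depth (m) giving the cement ring outer-edge temperature, degC
    mass_flow   : saturated steam mass flow rate, kg/h
    latent_heat : enthalpy of saturated steam minus saturated water, kcal/kg
    """
    dz = depth_m / steps
    dryness = wellhead_dryness
    for step in range(steps):
        depth = (step + 0.5) * dz  # midpoint of the current segment
        heat_loss = 2.0 * math.pi * r2 * u2 * (t_steam - t_cement_at(depth)) * dz  # kcal/h
        dryness = max(dryness - heat_loss / (mass_flow * latent_heat), 0.0)
    return dryness


# Hypothetical well with a uniform cement ring temperature.
x_bottom = bottomhole_dryness(
    wellhead_dryness=0.8, depth_m=1000.0, r2=0.05, u2=10.0, t_steam=300.0,
    t_cement_at=lambda depth: 100.0, mass_flow=5000.0, latent_heat=400.0,
)
```

Passing the cement ring temperature as a function of depth makes it easy to substitute a geothermal gradient or a time-dependent formation response for the constant used here.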

Based on the importance ranking of the characteristics affecting liquid production and water content, we take the $R^{2}$ (determination coefficient) of the model on liquid production and water content as the measurement standard. The larger the $R^{2}$, the better the model accuracy. The formula for $R^{2}$ is as follows:

$$R^{2} = 1 - \frac{\sum_{j}\left(y_{j,c} - y_{j,p}\right)^{2}}{\sum_{j}\left(y_{j,c} - y_{j,a}\right)^{2}},$$

where $y_{j,c}$ is the actual observation value, $y_{j,p}$ is the predicted value, and $y_{j,a}$ is the average value of the actual observation value.
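The determination coefficient described above is one minus the ratio of the residual sum of squares to the total sum of squares. It can be computed directly, as in the following sketch; the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Hypothetical observed and predicted liquid production values.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]
score = r_squared(observed, predicted)
```

Here the residual sum of squares is 1 and the total sum of squares is 20, so the score is 0.95. A model that always predicts the mean of the observations scores 0.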

For the 780 groups of samples sorted out during data preprocessing, the $R^{2}$ of cross-validation results was used as the estimation of algorithm accuracy. The effects of the feature number on the determination coefficients of liquid production and water content are shown in Figures

The effect of the feature number on liquid production.

The effect of the feature number on water content.

It can be seen that when the number of features is 24, the Random Forest algorithm gives the highest prediction accuracy for liquid production and water content, 86% and 83%, respectively. At this time, the determination coefficients of the five algorithms for liquid production and water content are shown in Table

Coefficient of determination of liquid production and water content.

Algorithm | $R^{2}$ of liquid production | $R^{2}$ of water content |
---|---|---|
 | 0.44 | 0.78 |
Linear Regression | 0.72 | 0.46 |
Random Forest | 0.86 | 0.83 |
AdaBoost | 0.78 | 0.77 |
Support Vector Regression | 0.20 | 0.44 |
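Scores of this kind come from cross-validation: the samples are split into folds, each fold is held out once, and the held-out determination coefficients are averaged. The sketch below shows the procedure with a one-feature least-squares line standing in for the five algorithms. The fold count, the stand-in model, and the data are all assumptions made for illustration.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


def cross_val_r2(x, y, k=5):
    """Average held-out coefficient of determination over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        actual = [y[i] for i in test_idx]
        predicted = [slope * x[i] + intercept for i in test_idx]
        mean_actual = sum(actual) / len(actual)
        ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
        ss_total = sum((a - mean_actual) ** 2 for a in actual)
        scores.append(1.0 - ss_residual / ss_total)
    return sum(scores) / len(scores)


# Hypothetical noise-free data, so every fold is predicted exactly.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
cv_score = cross_val_r2(x, y, k=5)
```

Repeating this for each algorithm and each feature count gives curves like those described above. With real field data the samples would normally be shuffled before the folds are formed.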

To further verify the model, we randomly selected two blocks (A and B) from the 97 heavy oil blocks and used the established model to simulate the influence of steam quantity, bottom-hole steam pressure, and bottom-hole steam dryness on oil production and liquid production by the control variable method. The results are shown in Figures

Effect of steam quantity on oil and liquid production.

Effect of bottom-hole steam pressure on oil and liquid production. (a) Block

Effect of bottom-hole steam dryness on oil and liquid production. (a) Block

According to Figure

According to Figure

According to Figure

The essence of training the model through field data is function fitting, and the fitting function has no clear direction, as shown in Figure

Hybrid data-driven process.

In Section

Coefficient of determination of liquid production and water content.

Algorithm | $R^{2}$ of liquid production | $R^{2}$ of water content |
---|---|---|
 | 0.49 | 0.80 |
Linear Regression | 0.76 | 0.47 |
Random Forest | 0.88 | 0.85 |
AdaBoost | 0.80 | 0.77 |
Support Vector Regression | 0.20 | 0.45 |
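The hybrid step, sampling the mechanism model to widen the range of steam dryness seen in training, can be sketched as a data augmentation routine. Everything below is hypothetical: `toy_mechanism` is a placeholder and not the mechanism model of this paper, and the feature names and numbers are invented for illustration.

```python
def augment_with_mechanism(field_samples, mechanism_model, dryness_grid):
    """Return the field samples plus mechanism-model samples that sweep steam dryness.

    field_samples   : list of (features dict, measured target) pairs
    mechanism_model : function mapping a features dict to a predicted target
    dryness_grid    : dryness values to add for every field sample
    """
    augmented = list(field_samples)
    for features, _measured in field_samples:
        for dryness in dryness_grid:
            new_features = dict(features, dryness=dryness)
            augmented.append((new_features, mechanism_model(new_features)))
    return augmented


def toy_mechanism(features):
    # Placeholder relation, NOT the paper's mechanism model:
    # production grows linearly with bottom-hole steam dryness.
    return features["steam_quantity"] * (0.5 + 0.5 * features["dryness"])


# One hypothetical field sample whose dryness sits in a narrow observed range.
field = [({"steam_quantity": 100.0, "dryness": 0.7}, 82.0)]
training_set = augment_with_mechanism(field, toy_mechanism, [0.5, 0.6, 0.8])
```

The augmented set keeps every measured sample and adds physically motivated points outside the observed dryness range, so the retrained model has examples to learn from in that region.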

According to Table

In order to further verify the accuracy of the model after adding dryness samples, we used a new prediction model for blocks

Effect of steam quantity on oil and liquid production. (a) Block

Effect of bottom-hole steam pressure on oil and liquid production. (a) Block

Effect of bottom-hole steam dryness on oil and liquid production. (a) Block

According to Figure

According to Figure

Figure

Building on previous studies, this paper conducts a data-driven coupled study of surface steam pipeline flow, steam injection wellbore flow, and formation flow. This provides a new idea for predicting heavy oil steam stimulation production and a theoretical basis for formulating scientific and reasonable development plans.

Based on the correlation coefficient and Random Forest feature selection, this paper ranks the features that affect liquid production and water content by importance.

For a heavy oil field in eastern China, we compared five typical machine learning algorithms on the field data and selected the optimal prediction model and the optimal number of features for the samples in this paper, but the generalization ability of the prediction model was poor. Therefore, we sampled the mechanism model, increased the diversity of steam dryness samples, and trained on the new samples again. Both the accuracy and the generalization ability of the previously obtained optimal prediction model improved.

It is feasible to study the steam stimulation production of heavy oil by combining the mechanism model with field data, as in this paper. However, this paper still has some limitations. Firstly, there is a certain error in the collection of field data, which may affect our results. Secondly, the lack of samples leads to weak generalization ability after training. Thirdly, steam stimulation is complex, and many factors affect its production. In the selection of features, this paper did not consider the influence of heavy oil lifting methods and viscosity reduction technology.

Based on the assumptions in Section

The change of kinetic energy has obvious significance only in the case of the fog flow. For the fog flow, the gas volume flow is much larger than the liquid volume flow. Therefore, according to the law of ideal gas, we have

At the same time,

So,

We replace equation (

In unit time, the heat loss on the length

The heat loss of the wellbore will inevitably lead to a decrease in saturated steam energy, which will result in a decrease in steam dryness. We have

Among them,

At the same time,

So,

We make the following transformation:

Therefore, the solution of equation (

We obtain the following dryness loss of the steam injection wellbore:

Table

Parameter description.

Letter | Physical quantity | Unit |
---|---|---|
_{s} | The enthalpy of saturated steam at a certain pressure | kcal/kg |
 | The enthalpy of saturated water under a certain pressure | kcal/kg |
 | Steam dryness of boiler outlet | Decimal |
 | Steam dryness of steam injection wellhead | Decimal |
 | Length of pipeline | m |
_{s} | Saturated steam mass flow rate | kg/h |
_{l} | Heat loss per unit time and unit length of steam pipeline | kcal/(h·m) |
 | The pressure at a point in the wellbore | kgf/m^{2} |
 | Well depth | m |
_{m} | The density of the saturated steam mixture | kg/m^{3} |
 | Acceleration of gravity | m/s^{2} |
_{f} | Friction loss gradient | (kgf/m^{2})/m |
 | The flow rate of the saturated steam mixture | m/h |
_{2} | The outer radius of the inner pipe | m |
_{2} | Overall heat transfer coefficient | kcal/(h·m^{2}·°C) |
_{s} | Steam temperature | °C |
_{h} | Outer-edge temperature of cement ring | °C |
 | The volume flow of steam | m^{3}/h |
_{p} | Pipe cross-sectional area | m^{2} |

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

The authors are grateful to all of the anonymous reviewers for their careful reading and valuable comments on how to improve this work. This work was supported by the National Natural Science Foundation of China (no. 11601451), the International Cooperation Program of Chengdu City (no. 2020-GH02-00023-HZ), and the Scientific Research Project of Sinopec Corporation “Heavy oil steam stimulation low-consumption and high-efficiency development of overall optimization technology” (no. P19018-5).