Well Performance from Numerical Methods to Machine Learning Approach: Applications in Multiple Fractured Shale Reservoirs

Horizontal well fracturing technology is widely used in unconventional reservoirs such as tight or shale oil and gas reservoirs. Meanwhile, the potential of enhanced oil recovery (EOR) methods including huff-n-puff miscible gas injection are used to further increase oil recovery in unconventional reservoirs. The complexities of hydraulic fracture properties and multiphase flow make it difficult and time-consuming to understand the well performance (i.e., well production) in fractured shale reservoirs, especially when using conventional numerical methods. Therefore, in this paper, two methods are developed to bridge this gap by using the machine learning technique to forecast well production performance in unconventional reservoirs, especially on the EOR pilot projects. The first method is the artificial neural network, through which we can analyze the big data from unconventional reservoirs to understand the underlying patterns and relationships. A bunch of factors is contained such as hydraulic fracture parameters, well completion, and production data. Then, feature selection is performed to determine the key factors. Finally, the artificial neural network is used to determine the relationship between key factors and well production performance. The second is time series analysis. Since the properties of the unconventional reservoir are the function of time such as fluid properties and reservoir pressure, it is quite suitable to apply the time series analysis to understand the well production performance. Training and test data are from over 10000 wells in different fractured shale reservoirs, including Bakken, Eagle Ford, and Barnett. The results demonstrate that there is a good match between the available and predicated well performance data. The overall R values of the artificial neural network and time series analysis are both above 0.8, indicating that both methods can provide reliable results for the prediction of well performance in fractured shale reservoirs. Especially, when dealing with the EOR field cases, such as huff-n-puff miscible gas injection, Time series analysis can provide more accurate results than the artificial neural network. This paper presents a thorough analysis of the feasibility of machine learning in multiple fractured shale reservoirs. Instead of using the time-consuming numerical methods, it also provides a more robust way and meaningful reference for the evaluation of the well performance.


Introduction
Concerns about the environmental impact of fossil energy (oil, gas, and coal) promote an increase in the energy produced from renewable energy [1][2][3]. However, fossil energy is still the largest contributor to the total global energy consumption [4][5][6] and is predicted to compromise more than 70% of the world's total energy consumption in 2050 [4]. Among fossil energy, oil and other liquids nota-bly provide the largest share, underlining the importance of exploiting oil. Recently, due to the tremendous progress in horizontal well technology and hydraulic fracturing, the production from unconventional reservoirs is increased largely. However, the expected oil recovery is still below 10% of the original oil-in-place [7,8]. To address this issue, various enhanced oil recovery methods such as huff-n-puff miscible gas injection are used to further increase the oil recovery. Although we can obtain an increased oil recovery via these two methods, they also increase the complexity when evaluating reservoir and well performance such as using simulation of multiphase flow in fractured reservoirs and well production performance. The latter one is the current area of interest at the time of writing this paper. This work primarily focuses on using machine learning methods instead of conventional numerical methods to characterize the well performance, especially on well production.
When it comes to saying well production, a lot of numerical models can characterize it, starting from the well testing to decline curve analysis (DCA). Mokhtari and Waltrich [9] used the multiphase flow models to evaluate the well performance in virtual flow meter simulators and concluded that virtual flow meter simulators can provide accurate predictions for multiphase flow metering, such as flow rates of oil, gas, and water. Wang et al. [10] used a multiscale flow model to analyze the well production performance in shale gas reservoirs with fractal geometry and concluded that there are six regimes of flow characteristics of type curves and a small fracture conductivity and a large region conductivity can lead to a large pressure depletion. Zhu et al. [11,12] investigated the performance of downhole equipment and technology by experiments, simulation, and mechanistic modeling. Ogunyomi et al. [13] performed a statistical-based model to evaluate well production in unconventional reservoirs and summarized that the production signatures show varying slopes on a diagnostic log-log plot that ranges from one-half to one-and-a-half. Ghorbani et al. [14] applied a customized genetic optimization model to predict liquid flow rate performance in unconventional reservoirs and stated that their model can be easily applied to other reservoirs and to achieve rapid practical flow prediction applications. Male [15] used a segregated flow model to forecast the production of oil, gas, and water in shale oil reservoirs and concluded that compared to empirical observations, their model can be used to perform reliable, fast, automated decline analysis for tight oil fields. Zhao and Du [16] proposed a new numerical model for the multistage fractured horizontal well to predict well production in tight oil reservoirs and concluded that their model can easily obtain eighth flow regimes. Wu et al. [17] established a transient two-phase flow model to predict the oil production from fractured tight gas reservoirs with fluid-induced formation damage and concluded that their model can deal with production data in the flow back and production stage and evaluate the formation damage for unconventional reservoirs. Although much progress has been made in conventional numerical models on well production performance, the complexity of calculation is still an obstacle we cannot step over. Therefore, decline curve analysis (DCA) seems to be an alternative way to predict well production performance. And it indeed has been used to predict well production performance in field for a couple of years [18].
The earliest ones proposed the DCA could stem from Arnold and Anderson [19], where they did not apply the terminology "Decline Curve Analysis" but "a great mass of data" and "methods of computation". Until 1945, Arps [20] proposed the most classical DCA model in which three parameters are used to fit the production data. The Arps model is simple and fast and widely used in conventional reservoirs [21], but it still suffers some limitations, including boundary-dominated flow regime [22,23], constant work condition [24], and overestimating the estimated ultimate recovery (EUR) [25]. Especially, these limitations can be exaggerated when applying them in unconventional reservoirs. For example, due to the ultralow permeability and porosity of unconventional reservoirs, it is well accepted that it is rather hard for the fluid to reach boundary-dominated flow regime in a short time (when we say 'short,' we mean the time in unconventional reservoirs is much longer than conventional ones). Therefore, the Arps model cannot be used directly to the unconventional reservoirs without modifications. Although various modified models are proposed to handle the boundary issue, such as the modified hyperbolic decline model [26], power-law exponential decline model [27], stretched exponential decline model [28,29], Duong model [30], logistic growth model [31], extended exponential DCA model [32], and fractional decline curve model [33], they still have another important issue, that is, constant work condition. The reason is that nowadays, EOR methods are widely used in unconventional reservoirs to improve oil recovery just as mentioned before, so it is impossible to remain constant work conditions [34] in this case. If we persist to use these methods, we have to repeatedly do the fitting process, which is tedious and time-consuming. Thus, an accurate, economical, simple, and fast method is urgently needed to do the production forecast.
Machine learning is such a fast developing and compelling approach to perform well production performance, which automatically learns and improves from the experience to output the expected results without being explicitly programmed [35]. Meanwhile, this method is also based on big data, ignoring a lot of ideal scenario restrictions, such as constant work conditions and boundarydominated flow regimes in the DCA model [36,37]. Zhong et al. [38] proposed a deep learning-based method for reservoir production forecast under uncertainty and concluded that their method can predict the reservoir pressure and fluid saturation with high accuracy. Amini and Mohaghegh [39] used machine learning and artificial intelligence in the proxy model for fluid flow in porous media and stated that their approach can save much computation time when the reservoir simulation model is complicated. Teixeira et al. [40] applied machine learning models to support reservoir production optimization and mentioned that coupled with traditional reservoir simulation, their model shows promising results. Chen et al. [41] did a comparative study among machine learning and numerical models and concluded that machine learning models provided more accurate predictions than numerical models. Mohaghegh [42], Bravo et al. [43], and Li et al. [44] did comprehensively explained several models of AI and common application areas in the exploration and production industry and aimed to be a guide for production and asset management.

Geofluids
Therefore, through the previous discussion, the machine learning method does provide some benefits to us compared to the traditional numerical simulations. Here, the artificial neural network (ANN) and time series analysis (TSA) are used to perform the well production performance. The reasons behind are that first ANN can analyze a bunch of data to uncover the hidden pattern and relationships [45]. Second, EOR methods are widely used in unconventional reservoirs to improve oil recovery [2,5,[46][47][48][49][50][51][52], changing their work conditions frequently and leading to a more complicated well production performance. TSA is a good method to handle this issue because the stock markets have used this technique to predict the stock price for a long time [53]. Therefore, it is quite suitable to use this method to handle the well production performance in unconventional reservoirs with hydraulic fracturing.
In this work, field data from over 10000 wells in different fractured shale reservoirs, including Bakken, Eagle Ford, and Barnett, are used to testify the feasibility of ANN and TSA. However, the validation through the whole fractured shale reservoir shares similar results. Hence, we only present part of the results from the Eagle Ford formation. The results from the Bakken formation are provided in the Figure 1-7. Our paper is organized as follows: In Section 2, we use the conventional field data from the Eagle Ford formation. The meaning of "conventional" is that there is no enhanced oil recovery method applied. In this section, the procedure and results of ANN are discussed. In Section 3, different from data used in ANN, here, we use the data from one of the EOR pilot projects from the Eagle Ford formation. In this section, we discuss the details regarding the time series analysis such as hyperparameter selection and predicted results. Conclusions are presented in the last section.

Neural Network
Artificial neural networks (ANN), usually simply called neural networks (NNs), are one of the representative machine learning methods based on a collection of connected units or nodes called artificial neurons [54][55][56]. Figure 8 shows a layer-wise structure of a neural network, cited from Vieira et al. [57]. Figure 8(a) presents the underlying algorithm in each unit or node. Each input x i has an associated weighted w i . The sum of all weighted inputs, ∑x i w i , is passed through a nonlinear activation function such as ReLu or Logistic or SoftMax activation and finally to transform the preactivation level of the neuron to an output y j . It should be noted that for the simplicity, the bias terms have been omitted here. The output y j will serve as input to node in the next layer. Figure 8(b) shows the conventional (whole) structure of neural networks. It generally consists of an interconnected system including (a) input layers: gathering data; (b) hidden layers: data transformation; and (c) output layers: prediction of the response variable. Here, the structure of the neural network has 3 input features, 2 hidden layers, and 2 output features. Therefore, the application of this structure could be dimension deduction. Besides this application, ANN can also conduct on pattern recognition, classification, and clustering.
In this work, the first 3 years of production history from unconventional shale plays were used to predict future a year production. Related production features are cumulative oil and gas, first-month oil and gas production, and peak rate of oil and gas. In addition to the typical production history, we also have other features including well perforation (upper and lower perforation, cross perforated interval), and hydraulic fracturing data (fracture length and width) and reservoir location (surface latitude and longitude) to reflect field operation conditions. Four hidden layers are applied, and each of them contains 80 neurons. The whole layer-wise structure of a neural network is shown in Figure 9.
Initially, the whole data are scaled using MinMaxScaler, and a box plot is used to eliminate the outliers. 80% of the data is used as training data, and the rest is as test data. Cross-validation is used to determine the best split. ReLu is  3 Geofluids used as the nonlinear activation, and the optimizer is Adam. The loss function is mean squared error and metric is acc, meaning the average training accuracy at the end of an epoch where the total epochs in our models are 700. It should be noted that the performance of the neural network is enhanced by adding more hidden layers and hidden neurons. A more complicated network structure can converge to an unintended local optimum. Overfitting or underfitting that occurred by the extent of the dataset is another concern to disturb the optimization process [56,58]. The target feature (output layer) is well production performance. The predicted result is shown in Figure 10. The predicted accuracy after 700 epochs is 0.83.
Although we include hundreds of well information in the ANN system, the predicted accuracy is still not satisfactory. Therefore, to design an optimized ANN system, feature selection was implemented to control the quality of data by removing inefficient features in input data. There are several ways to do feature selection such as the principal component analysis, random forest, and multiTaskLassoCV. Usually, the principal component analysis can do the dimensional deduction and choose the most important component. It can reduce the effect of inefficient features to some extent but the effect is still there. A random forest can follow the process similar to a decision tree and pick out the importance of features in a sequential way. Meanwhile, multiTaskLassoCV can loop over the feature and determine its sequence. Therefore, both random forest and multiTaskLassoCV are used to process the feature selection. The result shows that gross perforated interval, surface latitude, and longitude are inefficient features and should be ignored.
Then, we constructed the optimized deep neural network (DNN) that followed the way described before. To build a DNN system close to an ideal network system with   5 Geofluids

Input layer
Output layer 1 st hidden layer 2 nd hidden layer Figure 8: Representation of the layer-wise structure of the neural network [57].    7 Geofluids reasonable complexity and powerful performance, the dataset is divided into 80% of training data and 20% of validation data. Then, after several trials and errors, four hidden layers with 80 neurons for each are the best for our cases as shown in Figure 9. Figure 11 is the simulation results of our DNN system. Before the validation of the system, the DNN indicates an overall underpredicted production, especially at the production data of the first four months, indicating the inefficient features can lead to a big deviation for the prediction. Through the reconstruction of the network and data processing, the predicted production history using an optimized neural network is closer to real data, where the total predicted accuracy is 0.92. Figure 12 is a comparison of DNN performance in terms of the predicted peak oil rate. As we can see, the improved DNN system shows much less dispersed output around the target data.

Time Series Analysis
The time series analysis is a statistical technique that deals with time series data, or trend analysis. Time series data means that data is in a series of particular periods or intervals. It has been widely applied in different areas including stock prices, weather forecasting, pattern recognition, and control engineering. Nowadays, the most used time series models are the AR (autoregressive) model, the integrated models, and the MA (moving average) models. However, these three models more or less suffer some limitations. The autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models are the combinations of two or three of these models in which they can apply to more broad areas. In this work, ARIMA models are applied to characterize the well production performance.
Before we use the ARIMA model to analyze data, we must primarily check the data stationarity. There are several ways to do this, and one of the popular ways is to apply the autocorrelation and partial autocorrelation methods. This process is important since we can use the autocorrelation and partial autocorrelation of the data to determine some hyperparameters in ARIMA models. Usually, there are three important parameters including p (trend autoregression order in AR model), q (trend difference order in the integrated model), and d (trend moving average order in moving average model). The physical meaning behind these three parameters is that p demonstrates that the previous occurrence can affect the current occurrence. q indicates the stationary process of the models. d is the impulse effect propagation to the future such as seasonality and seasonal effects of production. It        9 Geofluids should be noted that although autocorrelation and partial autocorrelation methods are feasible to determine the hyperparameters in the ARIMA model, they can only provide the ranges of the hyperparameters. The final values of hyperparameters are determined by some other criteria such as the accuracy of the model. Therefore, in this work, first, we use the autocorrelation and partial autocorrelation to determine the range of hyperparameters, and then, the grid search method is used to determine the ARIMA hyperparameters, in which the built-in package 'MODEL_ARIMA.FIT' in Python can save much effort. Figure 13 shows the cumulative oil production and oil rate as the function of time in the Eagle Ford formation where the EOR method was applied in 2012. Note that the scale of the y-axis in Figure 13 is in a scientific style where Figures 13(a) and 13(b) are on a scale of 10 5 and 10 3 , respectively. Figures 14 and 15 show the partial autocorrelation for cumulative oil and oil rate, respectively. To estimate the amount of p hyperparameter, we need to look at the partial autocorrelation plot. First, ignore the value at lag 0 ( Figures 14 and 15). It will always show a perfect correlation since we are estimating the correlation between today's value with itself. Note that there is a blue area in the plot, representing the confidence interval. To estimate how much p hyperparameter we should use, start counting how many "lollipop" are above or below the confidence interval before the next one enters the blue area. So, looking at the partial autocorrelation plot below, we can estimate to use 2 AR terms for our model, since lag 0, and 1 is out of the confidence interval, and lag 2 is in the blue area. For estimating the q hyperparameter, this is a simple part. All we need to estimate the amount of the q hyper-parameter is to know how many differentials are used to make the data stationary. Here, we use log difference to transform the data, so the amount of q hyperparameter is 1. For estimating d hyperparameter, just like the partial autocorrelation, this time we will look at the value in the autocorrelation plot and the same logic is applied here. In our data, the estimated d hyperparameter is around 0.
Next, we will use the built-in package to obtain the ARIMA hyperparameters. According to the range of hyperparameters estimated via partial autocorrelation and autocorrelation, initially, we guess the ranges of these three parameters from 0 to 5. Meanwhile, the fitting cannot be perfect. If in that case, it will be overfitting which should be avoided. In our case, we find p = 2, q = 1, and d = 0 give us the least Akaike information criterion (AIC) which is equal to about 1003. Note that the less the AIC is, the better the model will be. Figure 16 shows the oil rate prediction with varying timesteps. Here, four different timesteps are used in our model (1, 3, 6, and 12 months). It is worthy to note that the meaning of timestep is the updating frequency of training data in the ARIMA model. This is the most important advantage of the time series model. In conventional machine learning methods such as random forest, once the training data is determined, it will fix through the whole simulation process. However, as for the time series analysis, we will update the training data in a set timeline. Therefore, it will be more accurate. Hence, the timestep of 1 month means that the training dataset will update every month. In other words, when we have one more month of data, the training data will include this new data, and the ARIMA model will fit the new training data; then, the updated ARIMA model will be used to predicting future data in one month. In the time series analysis, the first 50% of the data is treated as training and the rest of the data is test data. Figure 16 indicates that when we update our data with a smaller timestep, the prediction of the outcome will be better. This is reasonable because the ARIMA model can make a better prediction when you input more data. The total accuracy of the prediction is 0.94 when the time step is 1 month.
As shown in Figure 17, when the timestep is 12 months, the prediction variance is the biggest. However, when we update the data every month, the prediction is the best. This result also confirms our previous remarks. For the prediction of cumulative oil as shown in Figures 18 and 19, the results share the same reason as the prediction of the oil rate. Therefore, from these results, the time series analysis shows an important characteristic where other conventional machine learning methods do not have. That is, conventional machine learning methods do not consider the newly produced data and the training data is always kept the same, and they never update their training data. However, the time series analysis treats the training data as the function of time and updates it at a fixed time. That is the reason it can output a good prediction in stock price. In petroleum engineering, nowadays, EOR methods such as huff-n-puff have been applied in  many fields to improve oil recovery, leading to the change of production trend. In this situation, conventional machine learning cannot make a good prediction.

Conclusion
In this work, two machine learning methods are used to predict well production performance. Instead of using conventional numerical models, machine learning methods can provide us a more robust way. In the artificial neural network, different features are used such as well completion, hydraulic fracturing, and production data. Meanwhile, a modified deep neural network is built where feature selection is processed. The result shows that suitable feature selection can improve the prediction accuracy and avoid some noised features. Different from artificial neural networks, in time series analysis, we only consider the well production performance. We think the well production performance is the outcome of other factors such as well completion and hydraulic fracturing. The result indicates that time series analysis is more suitable to handle the production data, especially for the EOR production data, for the underlying algorithm of the time series analysis makes it update the training data as the function of time. Therefore, it can yield a good match result. Compared to the conventional numerical models such as multiphase flow models and decline curve analysis, these two methods provide us an effective way to characterize and predict well performance. However, there are still some limitations of these two machine learning methods. Taking the time series analysis as an example, the trickest part of this method is to determine the hyperparameters (p, q, d).
Although two methods are used including autocorrelation or partial autocorrelation function and the built-in package "MODEL_ARIMA.FIT," they often provide us a set of hyper-parameters (p, q, d) that share some similar prediction accuracies. Therefore, as for future work, setting another criterial to acquire more reasonable parameters in the time series analysis is needed further investigation.

Data Availability
Data are available on request.

Disclosure
An earlier version of this work was presented as a conference paper at SPE Improved Oil Recovery Conference, August 2020 as SPE 200365, and the feedback obtained from colleagues at this conference is greatly appreciated.

Conflicts of Interest
The authors declare that they have no conflicts of interest.