System Identification Based Proxy Model of a Reservoir under Water Injection

Simulation of numerical reservoir models with thousands and millions of grid blocks may consume a significant amount of time and effort, even when high performance processors are used. In cases where the simulation runs are required for sensitivity analysis, dynamic control, and optimization, the act needs to be repeated several times by continuously changing parameters. This makes it even more time-consuming. Currently, proxy models that are based on response surface are being used to lessen the time required for running simulations during sensitivity analysis and optimization. Proxy models are lighter mathematical models that run faster and perform in place of heavier models that require large computations. Nevertheless, to acquire data for modeling and validation and develop the proxy model itself, hundreds of simulation runs are required. In this paper, a system identification based proxy model that requires only a single simulation run and a properly designed excitation signal was proposed and evaluated using a benchmark case study. The results show that, with proper design of excitation signal and proper selection of model structure, system identification based proxy models are found to be practical and efficient alternatives for mimicking the performance of numerical reservoir models. The resulting proxy models have potential applications for dynamic well control and optimization.


Introduction
Reliable prediction is a natural requirement to the design of an optimal management strategy whereby important decisions on future investment, facility development, and expansion will be made [1].Typically, in the petroleum industry, forecasting oil, gas, and water production rates is undertaken on a regular basis.Some of the factors that determine precision and accuracy of these forecasts include quality of geological and geophysical (G&G) data, quality and quantity of observed production rate and pressure data, and modeling procedures employed.
Decline curve analysis (DCA) and numerical reservoir simulation are classical methods that are frequently employed to forecast reservoir performance [2][3][4][5][6].Each of these methods has strengths, limitations, and restrictions [7].In practice, it is common to implement more than one method to reduce uncertainty and increase prediction accuracy.Decline curve analysis (DCA) is popular in the industry and is relatively a modest technique.It is one of the oldest methods, which was developed by Arps [2] and has been utilized even when there were no high-speed processors for handling large amounts of production and pressure data.The empirical Arps equations recognized for different curtailment trends are shown in (1), (2), and (3) [8].The analysis is based on fitting one of these three empirical relations using measured production data.
For the exponential case ( = 0),  =    (−) . ( For the hyperbolic case (0 <  < 1), For the harmonic case ( = 1), where  is the nominal (instantaneous) decline rate,  is the decline exponent constant,  is the flow rate, and   is the initial flow rate 2 Modelling and Simulation in Engineering However, it is recognized that the application of DCA is limited to the early production stage of a well, where production is at steady state and the reservoir is under boundarydominated flow.In addition, rock and fluid compressibility must be low and constant.In cases such as gas reservoir and solution gas derived oil production, where compressibility is higher, the decline is less severe and hence none of the three curtailment trends will fit.An improved DCA which considers reservoir parameters as part of the decline analysis was developed by Fetkovich [4] and is referred to as Fetkovich's type-curve analysis.Fetkovich showed that the value of  was an indication for efficiency of recovery, where zero was the lowest efficiency with small compressibility and 0.5 was the highest efficiency that is achievable from any known recovery mechanism with optimistic relative permeability.Analytical flow equations were applied to generate type curves for transient flow, and they were combined with the empirical decline curve equations documented by Arps [2].However, Fetkovich's analysis has some notable limitations.It is applicable if the reservoir is homogeneous and has constant thickness, no-flow outer boundary, and constant flowing pressure at the wellbore.In addition, Fetkovich's type-curve analysis assumes a constant productivity index, which implies a straight-line relationship between pressure and flow rate.However, Evinger and Muskat [9] stated that for multiphase flow the relationship between pressure and flow rate is curved.Therefore, Fetkovich's type-curve analysis performs well if only the reservoir is saturated with a single-phase fluid of constant and small compressibility.
On the other hand, numerical reservoir simulation based on first principles provides a more accurate and robust solution to the task of forecasting reservoir performance.Reservoir simulation models are mathematical descriptions of an actual reservoir and are based on material balance, momentum balance, and constitutive equations such as Darcy's Law.Reservoir models are often used to analyze, optimize, and forecast both pressure and saturation dependent terms.Geological, geophysical, petrophysical, well log, core, and fluid data are required to construct reservoir simulation models.However, development of an accurate and representative numerical model is challenging and it involves utilization of a large number of uncertain data.This is because reservoirs are gigantic subsurface systems, which are both heterogeneous and anisotropic.Due to the relatively large size, samples of rock and fluid are collected from only a few locations within the boundary of the reservoir.Statistical techniques are then employed to populate and infer rock and fluid properties for the remaining portion.Moreover, contrary to direct measurement, most of the input data required to build a simulation model are inferred from physical and chemical measurements on rock and fluid samples taken at only few locations of the vicinity of the reservoir.For instance, fluid resistivity measurements are used to determine saturation of water within the reservoir and levels of natural gamma ray and spontaneous potential of a reservoir rock are used to determine the type of rock (lithology) and mineral content.Thus, reservoir property data used in modeling and forecasting are only best estimates and will not be exact representatives of the reservoir at large.Besides, the historymatching stage by which the inferred reservoir properties are adjusted until the numerical simulation model reasonably mimics the actual reservoir is a formidable task and has substantial limitations [3,5].
History matching is the process of altering sensitive reservoir model parameters and developing a list of representative numerical models which are capable of reproducing observed production and pressure data [10,11].History matching is undertaken via manual alteration of sensitive parameters or automatically by using dedicated multiobjective optimization algorithms [12].The former is time-consuming and requires expert knowledge and experience on the reservoir under study.In addition, manually adjusting reservoir parameters results in trial-and-error solutions; therefore, it is impossible to use all information; it is not suitable for complex reservoir models; it is time-consuming and it provides suboptimal results.A relatively new technique, which is referred to as assisted history matching (AHM), substantially reduces the time required.In AHM, global optimization algorithms help to reduce a misfit function between the observed and simulated production and pressure data.AHM techniques apply methods that allow for multiple realizations and are able to deal with the uncertainty in an optimal manner.
The Kalman Filter and its variants, such as the Ensemble Kalman Filter (EnKF) and Particle Filter, are currently popular among the research community [13][14][15][16][17]. Though AHM shortened the time taken during the history-matching phase, it is an ill-posed optimization problem, which is known to result in many realizations.Different sets of reservoir parameters may equally reproduce the historical data which results in a different prediction [6].The so-called best realization could also produce a geologically inconsistent reservoir model [11].The reader is advised to refer to [3] which has critically summarized the limitations of assisted history matching.
Once history matching is successfully completed, the models are used for production optimization and prediction of future reservoir performance under various development strategies.The process is repeated several times until a viable strategy is realized.For a medium sized reservoir model, each simulation run may require hours to complete.In spite of the demanding effort and time spent on model construction and history matching, the challenge encountered during optimization makes it difficult to exploit the full potential of numerical reservoir simulators.The use of proxy models that mimic the behavior of numerical reservoir models is currently on the rise and significantly eases the optimization stage.Once developed, these proxy models can be evaluated within seconds.Many commercial simulators that generate polynomial proxy models are being developed and used in both industry and academia.In all the simulators, response surface based proxy models are being employed.However, several runs are required to develop a representative proxy model that accurately mimics the response of numerical reservoir models.For instance, if there are ten factors (e.g., water injection wells) and three responses (oil, water, and gas production rates), then the Box-Behnken experiment design will require at least 170 runs.Again, if each run takes four hours to complete, then 680 computational hours are required to gather data for proxy modeling.
In this paper, we propose system identification based proxy models for optimization of a reservoir model.System identification is the art and science of model building that engages statistical methods to build mathematical models of dynamic systems from measured data.System identification involves proper design of excitation signal, model structure selection, parameter estimation, and validation of the identified model.By properly designing the excitation signal and by selection of an appropriate model structure, it is possible to obtain a representative proxy model using data from a single run of the numerical simulation model.System identification is extremely flexible in a sense that there are several model structures to capture linear as well as nonlinear relationships.In this study, a benchmark reservoir model originally developed to evaluate history-matching techniques is used to validate the efficacy of system identification based proxy models, which have the potential to be used for automatic control and production optimization.The proposed methodology assumes that the numbers of wells and operating conditions except for injection rates remain the same.In cases where infill drilling is required, it is necessary to redo the identification anew.

System Identification Based Proxy Models.
System identification is a popular and steadily growing modeling approach.It is used to build analytical models of a dynamic system and forecast future behavior easily with less computational effort and time [18][19][20].It is commonly used in engineering where prediction is part of an integrated technology such as model based control systems.System identification develops a uniquely structured mathematical model of a system through the application of certain model structure and conventional statistical methods.The number of researches and results reported related to the use of system identification in the oil and gas industry is increasingly growing.Aïfa [21] has presented applications of neural network based black-box model, to various problems that arise in the process of understanding the behavior of reservoirs.Sheremetov et al. [22] applied nonlinear autoregressive neural networks with exogenous input to forecast production rate of a naturally fractured reservoir.Olominu and Sulaimon [7] pointed out that autoregressive integrated moving average (ARIMA), which is a popular model structure in time series analysis, has a better efficacy than DCA.In addition, they showed that the model might be useful for short-to mid-longterm forecasting.However, ARIMA model structure does not consider external inputs, such as water injection.As such, the application is limited to primary recovery only.Elmabrouk et al. [23] studied the use of feedforward backpropagation artificial neural network in the prediction of oil production and employed real data from a Libyan oil field to test the hypothesis.The study proves the potential use of advanced empirical models for better prediction of oil production, which is essential for better reservoir management.However, the study does not incorporate the effect of external inputs.System identification is the process of developing mathematical models of a physical system from input and output data [18,19].In this study, the system is a reservoir under water injection, the inputs are either water injection rates or well bottomhole pressures, and outputs are production rates.During the lifetime of a reservoir, data such as wellhead flowing pressure, bottomhole flowing pressure, production rates, injection rate, gas/oil ratio, and water cut are recorded regularly to quantify production and evaluate field performance.Therefore, one can develop a dynamic model that assists in decision-making using system identification techniques and those readily available sets of data.In this study, linear polynomial model structures, which are among the many variations of system identification model structures, are used to build a proxy model, which can be used for production optimization.Equation ( 4) displays the general representation of a linear, dynamic time invariant system [24,25]: where (), (), (), (), and () are polynomials with respect to the shift operator  −1 , where  − () = (−1).Depending on the nature of these polynomials, different structures can be formulated as shown in Table 1.
Noise integrated versions of the model structures listed in Table 1 are also available for system identification.These are special cases of general linear input-output models.They correspond to linear difference equations relating the input to the output under various noise assumptions.For ARX model structure, addition of noise integration leads to an "ARIX" model structure: Similarly, the ARMAX and BJ structures can be enhanced to handle nonstationary disturbances and are referred to as ARIMAX and BJI, respectively.The development of the system identification based models comprises the following key procedures: A detailed discussion on the steps of system identification and model development is available by Ljung [24] and Nelles [25].

Excitation Input Selection.
Selection of a proper excitation input dictates efficacy in system identification.There are several input sequences, which can be used in modeling different complexities.Among them, a pseudorandom binary sequence (PRBS) is very popular and efficient in equally exciting all frequencies.PRBS is a sequence that switches between an upper and a lower bound at varying frequencies.PRBS can also have multiple amplitudes; that is, different upper and lower bounds may be specified at different times for a given input sequence.However, such PRBS inputs are often employed during nonlinear system identification.Since this article studies the application of linear polynomial model structures, the excitation signal used was PRBS that only switches between two amplitudes.

Model
where ŷ is the predicted output,  is the observed output,  mean is the mean of the observed output, and NRMSE is the normalized root mean square error.
Whiteness test of a residual indicates whether the model has extracted all the information or requires further refining.This is achieved by plotting a histogram of residuals.A good model is one that has a mean value of zero and some variance, indicating that the residuals are uncorrelated.On the other hand, a crossplot analysis is a graphical and visual method of comparing two sets of data whereby the plot of simulated output and validation data was compared graphically and visually, with a 45-degree straight line.The crossplot of a residual is an - plot obtained by plotting the simulated output on the -axis and validation data on the -axis.A good model will produce an output that gives a perfectly 45-degree straight line when plotted against the validation data set.

Description of the Case Study Reservoir Model.
The reservoir model, which was used to generate the input and output data, is referred to as UNISIM-I-M and is based on a field located in Campos basin in Brazil [27].The model was originally developed to serve as a benchmark case study on decision analysis regarding reservoir management.For a detailed description of the model, readers are referred to Avansi and Schiozer [26] and Gaspar et al. [27].Figure 1 shows the porosity distribution while Figure 2 presents the reservoir geometry including the wells placement and vertical permeability distribution of the UNISIM-I-M reservoir model.All the red wells represent producer wells while blue wells represent injector wells.The reservoir volume is meshed into smaller 3D blocks, referred to as grid blocks.Each grid block has a thickness of 50 m, a width of 100 m, and a length of 100 m.There are a total of 93960 such active grid blocks with 81, 58, and 20 blocks arranged in the , , and  directions, respectively.The water-oil and gas-oil contacts are located at a depth of 3200 m and 3000 m, respectively.Initial oil originally in place is calculated to be 175000 m 3 .The initial reservoir pressure was 300 bar.The total numbers of water injectors and producers are 11 and 14, respectively.The original well placement was slightly modified to optimize production; however, production constraints are kept the same as the original.Production and injection well constraints are described in Tables 2 and 3.

Result and Discussion
In developing a proxy model, the inputs consisted of water injection rate from individual wells and the output consisted  4.
The response of the numerical reservoir model is presented in Figure 3.The figure shows that the total number of input and output data is 2807, of which 1500 were used for identification and the next 10 data points were removed to fulfill requirements set by the initial condition of prediction equations.The remaining 1297 data points were utilized for cross validation.Each of the 2807 data points presents the input and output calculated on a daily basis.The numerical simulation run time required to generate the outputs was 4 hours and 53 minutes on Intel5 Core6 i5-3210 M CPU @2.5 GHz Processor with 6 GB RAM.This is considered very large compared to the few seconds required to run polynomial models, which are examined in this study.
In addition, when carrying out multiple simulation runs to evaluate reservoir performance, the essence of utilizing reduced model or metamodel as a substitute for numerical reservoir simulator becomes extremely substantial.

Initial Model Estimation.
A simple and yet useful system identification model structure, that is, ARX, and its noise integrated version were used to initiate the system identification process.This allows narrowing the model-order search region and determining time delay.Noise integrated ARX model (ARIX) with both poles and zeros equal to three and a time delay of zero was found to result in a percentage fit of 71.92%.The model was named ARIX (3-3-0).Model structure of ARIX model is where () is a polynomial in the shift operator () of order , () is a polynomial in the shift operator () of order , () is the output (oil production rate), () is the input (injection rates),  is the number of poles, and  is the number of zeros.Figure 4 presents the cross validation result of the ARIX (3-3-0) model.Akaike's final prediction error (FPE) and mean squared error (MSE) exhibited were 0.8157 and 0.8724.The MSE is a measure of the quality of an estimator; it is always nonnegative, and values closer to zero are better.Moreover, according to Akaike's theory, the most accurate model has the smallest FPE.The step response, which illustrates the time behavior of the outputs of ARX (3-3-0) when its inputs change from zero to one in a very short time, is shown in Figure 5.It can be seen that the model has responded without a time delay and has become stable.introducing a variation in model order and time delay close to the ARIX (3-3-0) model.The best model that resulted in the best percentage fit with the validation data set was found to be Box-Jenkins model with noise integration (BJI).The structure of the BGI model is

Identification of a
]  () .(9) The number of poles and zeros of the deterministic part was found to be four for all the inputs.The stochastic part of the BJI model structure that gave the best performance was also of order four in both poles and zeros.The time delay is found to be two.The model can be referred to as BJI (4-4-4-4-2).Figure 6 In addition, the step response of the identified model is shown in Figure 7.It can be observed that the response of model BJI (4-4-4-4-2) is stable after an abrupt change in input.Knowing how the system responds to a sudden input is important because large and possibly fast deviations from the long-term steady state may have extreme effects on the prediction ability of the model.This will in turn influence decisions.In addition, a crossplot of simulated output versus validation data was plotted to allow further analysis on the identified model.The data set was concatenated and sorted in descending order based on the simulated output before it was plotted on - plane.A good model will have its crossplot aligned with the 45-degree line.A model that underestimates the output will have its crossplot below the 45-degree line and a model that overestimates the output will have its crossplot above the 45-degree line.Figure 9 presents a crossplot of model BJI (4-4-4-4-2).It can be seen that the model neither underestimates nor overestimates during the validation period.In other words, the prediction ability of the identified model is satisfactory.

Summary and Conclusion
Numerical reservoir modeling and simulation has many applications in the upstream oil and gas industry.However, due to the large computational time required, full exploitation of its potential use remains challenging.Currently, the use of response surface based proxy models is becoming a common trend.Nowadays, several simulation software packages incorporate a module for proxy modeling.While such proxy models are useful and significantly reduce optimization time, their development still requires several simulation runs.In this study, a system identification based proxy model that only requires a single run and properly designed excitation signal was proposed.A case study that involves water injection was used to investigate and establish a procedure for application of system identification techniques in reservoir simulation.Injection flow rates were designed in the form of a pseudorandom binary sequence that is known to excite all frequencies equally and allow successful identification.Input and output data were collected at a frequency of 1/day and were divided into modeling and validation sets.The modeling set was used to construct a proxy model.The validation set was used for cross validation and residual analysis.Residual analysis and crossplot test together with the calculation of percentage fit have shown that the identified model was able to accurately mimic the performance of the numerical reservoir simulator while predicting total oil production during oil recovery by water injection.In conclusion, with proper selection of input sequence and model structure, system identification can deliver a proxy model with just one simulation run.

Nomenclature
: Liquid flow rate   : Oil flow rate DCA: Decline curve analysis   : Water flow rate WC: Water cut BHP: Bottomhole pressure GOR: Gas-to-oil ratio NRMSE: Normalized root mean square error PRBS: Pseudorandom binary sequence MISO: Multiple input-single output.

xFigure 1 :Figure 2 :
Figure 1: Porosity distribution of the first layer of the UNISIM-I-M reservoir model.

Figure 3 :Figure 4 :Figure 5 :
Figure 3: Oil production rate (a) and water injection rate of well INJ-1 (b), used during modeling and validation of linear time invariant model structures.

Table 2 :
Groups and wells control data for water injection case.

Table 3 :
Group wells control for water injection case.

Table 4 :
Lower and upper bounds of input design for modeling fluid production during water injection.Well name Lower bound (Sm 3 /day) Upper bound (Sm 3 /day)