Examination of Experimental Designs and Response Surface Methods for Uncertainty Analysis of Production Forecast : A Niger Delta Case Study

The purpose of this paper is to examine various DoE methods for uncertainty quantification of production forecast during reservoir management. Considering all uncertainties for analysis can be time consuming and expensive. Uncertainty screening using experimental design methods helps reducing number of parameters to manageable sizes. However, adoption of various methods is more often based on experimenter discretions or company practices. This is mostly done with no or little attention been paid to the risks associated with decisions that emanated from that exercise. The consequence is the underperformance of the project when compared with the actual value of the project. This study presents the analysis of the three families of designs used for screening and four DoE methods used for response surface modeling during uncertainty analysis. The screening methods (sensitivity by one factor at-a-time, fractional experiment, and Plackett-Burman design) were critically examined and analyzed using numerical flow simulation. The modeling methods (Box-Behnken, central composite, D-optima, and full factorial) were programmed and analyzed for capabilities to reproduce actual forecast figures.The best method was selected for the case study and recommendations were made as to the best practice in selecting various DoE methods for similar applications.


Introduction
The increasing necessity for systematic and consistency of methodologies for analyzing uncertainties and their associated effects on investment decisions has become very important in management decisions.This is one of the major reasons why the use of experimental design has been on the increase in the exploration and production (E&P) industry.To be able to harness the benefits associated with design of experiment (DoE), users must apply it wisely, failure of which can result in waste of time and costly decisions.Important areas of application of DoE in petroleum engineering include uncertainty screening and quantification.With DoE one can significantly reduce the number of simulations required to assess uncertainties since considering large number of simulations is not feasible due to limited available time and resources.Previous screening methodologies [1] have considered static sensitivity analysis.Though still applicable for some studies, the dynamic behaviour of the sensitivity is important to observe because different uncertainty attributes may influence reservoir performance at different time.
Recent research and applied reservoir characterization and modelling projects have focused on the application of experimental design (ED) and response surface techniques [2][3][4].Successes recorded from the use of ED depend largely on number of factors.These include available time, number of factors, the study objectives, and amount of details required from integrated reservoir simulation studies.There are several designs.Morris [5] classified ED into fractional factorial design (FRFD), optimal design, and Latin hypercube design (LHD).The first two classes are extensively used in the area of physical experiments for model development [6,7].
LHD is an efficient design that allows representation of entire parameter range in the design matrix.Its construction is based on the idea of stratified sampling which requires a search algorithms and prespecified optimality criteria [6].
The underlining assumption using 2-level factorial designs is that the 3-factor interactions higher order factors are negligible.Consequently, most of the experiments are omitted from the analysis and therefore seldom used for construction of response surfaces.For illustration, if we consider 7 parameters system that require a 2 7 = 128-run full design.Among the 127 degrees of freedom in the design, only 7 are used to estimate the linear terms and 21 to estimate the 2-factor interactions.The remaining 119 degrees of freedom are associated with three-factor interactions higher order factors that are supposed to be negligible.In practice, if the number of factors ranges from 2 to 15, fractional factorial can be used to screen the significant few by estimating the main effects and interactions.There are various possible fraction designs: 64-, 32-, or 16-run designs.The 16-run design usually is preferred since this is the cheapest design with resolution IV that is available for uncertainties higher than eight.
Two-level Plackett-Burman (PB) designs have been used in the context of classical statistical experiments for screening purposes.However, PB designs cannot be used when higher number of levels is required or where simulator proxy's development remains the priority of the study.One variable at a time is a screening process involving varying a parameter values between its minimum (−1) and maximum (+1) ranges within the parameter space.The degree of the displacement from the base case of resultant response indicates the extent of influence the parameter has on the measured response.One variable at a time method is a static sensitivity common in classical laboratory experiments.However, its application for screening large number of uncertainties has been reported [8].
Three-level experimental design methods are high resolution algorithms for response surface construction.Central composite design (CCD) consists of a factorial design with the corners at +1 of the cube, augmented by additional "star" and "centre" points, which allow the estimation of the second-order polynomial equation [9].Like CCDs, Box-Behnken designs are a family of three-level designs and can be constructed by the combination of factorial and incomplete block designs [7].The application of 3-level full factorial is limited only to four factors or lower due to the accompanied large number of runs for large number of uncertainties.However, 2-level factorial designs (fractional designs) have also been used where 3-level full factorial is infeasible for response surface construction [10].
There are various response modeling techniques described in the literature [7,11].The creation of proxymodels of a high quality is related to the selection of appropriate experimental design algorithms, the quality of the input dataset, and the degree of linearity and nonlinearity of input-output relationship.Yeten et al. [12] studied the capability of different experimental designs and proxy models to predict uncertainties.A good result was reported in application of Kriging and Splines models with Latin hypercube design.However, underperformance of models developed using traditional experimental designs was observed due to under sampling of uncertainty space and high degree of nonlinearity of input-output relationship.
Fetel and Caumon [13] demonstrated how flow simulation results can be combined with secondary information to overcome sampling problem for modeling complex response surfaces.A more accurate response surface at affordable cost was generated when compared with the conventional techniques.Recent studies have shown that the problem of nonlinearity is addressed using ANN [14] and fuzzy inference system [15] that employs hybrid-learning rules for training process.
Whenever experimental design is used, RSM are usually constructed with regression, interpolation, and neural network [16,17] methods.The methodology integrated 3level designs with some screening linear designs.In reservoir simulation, a response surface model is essentially an equation derived from the multiple regressions of all the main attributes (reservoir parameters) that impact the reservoir response.This provides a good approximation to simulator and also provides simplified solution to even very complex problems that required huge computation time [18].The resulting proxy model is a function of the number of values for each uncertainty and is readily amenable to sampling by Monte Carlo simulation for uncertainty quantification.
This study critically examined three common families of experimental designs used for screening during uncertainty analysis in simulation for reservoir management.These include (a) sensitivity by one factor at-a-time, (b) fractional experiment, and (c) Placket-Burman design.The selection of "heavy-hitters" was based on the dynamic sensitivity with responses measured at the end-of-simulation.However, the potential of missing out some parameters that are time dependent as pointed out by Amudo et al., 2008 [4], was minimized by taking measurement of the response after 15 and 30 years of simulation.With this measure, uncertainty propagation was adequately captured and all uncertainties were included in the analysis.The study developed possible response surface models that are consequence of the various screening methods.The models were validated and subjected to statistical error and Monte Carlo analysis.

Description of Uncertainty.
There are total of 10 major uncertainties in this study.Table 1 shows the descriptions, the keywords used, and the corresponding ranges of uncertainties as multipliers on the base case value.

Experimentation.
Experiments were performed using history matched model.The response is the cumulative oil production (FOPT) after 15 and 30 years of forecast.Simulation runs vary according to design methods and at the end of simulation, responses are available for statistical analysis.Figure 1 shows a typical cumulative production curve at the end of history or beginning of production forecast.The profile shows the existence of uncertainty in the  2, 3, and 4. The "1", "−1, " and "0" represent absolute high, low, and base case values of the parameters.The contribution of each uncertainty factor was estimated following the significant test for the regression used in the analysis of variance (ANOVA) for all the methods.The two hypotheses used for the test are as follows.
(1) The null hypothesis: all treatments are of equal effects: (2) The alternative hypothesis: some treatment is of unequal effects: To reject the null hypothesis  0 at least one of the variables explains significantly the variability observed on the response so that the model is valid.
The effects of the different attributes on the response for Table 4 were estimated using [19] Attribute where  max and  min are maximum and minimum responses, respectively.

Development of Response Surfaces. Figures 2, 3, and 4 show
Pareto charts associated with different screening methods highlighting the impact of the various parameters on the reserves over the forecast periods.The vertical black lines correspond to a 95% confidence level implying that any parameter to the right is significant ("heavy-hitter") with 95% confidence.
Figure 2 shows the six "heavy-hitters" identified for fractional experiment at the end of 15-year forecast: the OVISC, SWI, PERMX, PORO, KRW, and PERMZ.The four other parameters, that is, the SGCR, the AQUIPV, MULTFLT, and SWCR, are insignificant on reserve forecast within these periods.However, at the end of 30-year simulation, KRW effect was insignificant and this reduced the number of "heavy-hitters" to five.
Figure 3 shows the five "heavy-hitters" identified for PB experiment at the end of 15-year forecast: the OVISC, SWI, PERMX, PORO, and PERMZ.However, at the end of 30-year simulation, PERMZ became insignificant on the model.The four identified "heavy-hitters" include OVISC, SWI, PERMX, and PORO.
Figure 4 shows the "heavy-hitters" identified for varying one parameter at-a-time at the end of 15-and 30-year forecast, respectively.In both cases, OVISC, PERMX, and SWI have the most influence on the response.One important observation is the time-dependent nature of the aquifer properties (AQUIPV).
To determine whether there is a linear relationship between the response and various "heavy-hitters, " test for significance of regression was performed by ANOVA.This study utilized the computed Fisher variable () and its associated probability (prob > ) to assess the model significance as well as to select regressors with high impact on the model by assuming 95% confidence level or 5% significant limit (i.e.,  = 0.05).There is only a 0.01% chance that a "model -value" this large could occur due to noise.Values of "Prob > " less than 0.0500 indicate models terms are significant.In this case A (PORO), B (PERMX), C (PERMZ), D (SWI), E (OVISC), B 2 , C 2 , D 2 , and E 2 are all significant model terms.
In general, 10 different response surface correlations were developed.The regression equations were based on the outcome from the three screening methods earlier discussed.Four methods were considered for response surface modelling: Box-Behnken design (BBD), central composite design (CCD), full factorial design (FFD), and D-optimal design (DOD).Thus, model equations ( 4), ( 5), (6), and ( 7) are associated with fractional factorial method (FRFDRSMs), model equations ( 8), ( 9), (10), and ( 11) are associated with Placket-Burman method (PBDRSMs), and the 2 feasible response model equations ( 12) and ( 13) are associated with screening using one variable at-a-time (OVAATRSMs).Tables 6, 7, and 8 show the constants in all the equations:   the prediction methods were a perfect fit of the experimental data then all of the points would lie on the  =  line.Parity plots do not reveal much information.However, the plots for the Box-Behnken method generally demonstrate that the method predict the actual value more accurately.On these plots the vast majority of the points are along the  =  line.In addition, the plots reveal that regardless of the screening method, CCD exhibits the least accurate prediction.

Model Validation
The validation using cross plots only assess model efficiencies within the experimental range of parameters.In order to determine model predictability elsewhere this study performed a "blind test" using parameter values outside the range already defined in Table 1.The simulation and prediction were made only on OVISC because it is controllable.
Figures 6(a), 6(b), and 6(c) show, respectively, the comparison of RSMs from FRFD, PBD, and OVAAT with the actual experimental/simulated values.The simulated production upon degradation of OVISC by 30, 50, and 70% is shown in Figure 6(d).It was noticed that prediction using full factorial method was excellent with maximum deviation of 0.79 MMSTB followed by central composite method with maximum deviation of 3.9 MMSTB.Box-Behnken generally overpredicted the actual reserves volume with deviation approximately 20 MMSTB.

Statistical Error Analysis
The performance indices used are summarized in Table 9.Each of the response surfaces developed was used to estimate the production forecast for each of the data sets.Tables 10, 11, and 12 show the result from the error analysis.In addition to the estimated errors, the coefficients of correlations are   CCD on the other hand shows maximum estimated error and least correlation coefficients.Again this was performed on predictions within the actual parameter range of values.

Representative Model
As shown in Table 13, all models were ranked using the following criteria: number of uncertainties associated with the different screening methods, result from error analysis, result from the "blind" test, and coefficients of correlation.The decision to select "best" model was based on scenario objectives.

Objective 1.
If it is desired to develop risk curves from where P90/P50/P10 values for the response can be determined alongside its representative models, the analysis favours the selection of Box-Behnken method.There are however three possible response surfaces depending on the screening method.These are response surface equations (4) (with 5 factors, 46 runs), (8) (with 4 factors, 29 runs), and (13) (with 7 factors, 62 runs).For practical purposes, response surface equation ( 8) is desirable with fewer number of factors and experimental runs.

Objective 2.
If it is desired to utilized the proxy equation to simulate and evaluate future development strategies such as the needs for acquiring additional information, undertaking stimulation or any of EOR such as in situ combustion, a more efficient model capable of estimating reservoir performance within acceptable margin of error is desired.Both central composite (CCD) and full factorial methods are adequate.However, in this analysis, full factorial method performed far better than the CCD.FFD tends to be impractical for large number of uncertainties.We have demonstrated in this study that a 2-level factorial experiment can be used for constructing response surface of quality as comparable as that from 3-level full factorial.However, we highly recommended consideration for resolution of the factorial fractions to be used.

Monte Carlo Simulation
The Monte Carlo technique [20] was used to combine the uncertain attributes and allows generating values for model input variables.2000 iterations were made while assuming the following distribution functions: triangular for OVISC, uniform for PORO, log normal for PERMX and PERMZ, normal for SWC, triangular for AQUIPV, and triangular for KRW.The risk curves generated were in terms of cumulative oil production.
Figure 7 shows risk curves from various RSMs for the uncertainty assessment.It is clearly shown that the P10 and P90 values differ according to the assumed response surface methods.Full factorial method gives the highest P10 (38.4 MMSTB) value and minimum P90 (28 MMSTB) value.CCD is characterized by lowest P10 (35.4 MMSTB) and highest P90 (32 MMSTB).Both Box-Behnken and D-optimal methods give approximately equal values of P90 (36 MMSTB) and P10 (30 MMSTB).These figures require additional evaluation framework for ranking different prospects regarding different outcomes so that feasible decisions when comparing risky capital investment can be made instead of decisions based on "best guess." The costs and benefits of performing 46 and 62 experiments instead of the desired 29 PB experiments  were determined by constructing multiple risk curves with equal or about the same P50 (mean) values for all models.Figure 8(a) shows a good agreement of risk curves of Box-Behnken associated with Placket-Burman and fractional factorial screening designs.The same applied to full factorial risk curves shown in Figure 8(c).However, the risk curve associated with screening using the one variable at-a-time exhibits wider variability from others and tends to be more optimistic which is perhaps due to different occurrence probability and larger number of factors involved in the regression analysis.
The risk curves obtained by CCD developed from Placket-Burman and fractional factorial are a little at variance from each other due to combination that possesses dissimilar occurrence probability.
In summary, both fractional screening design and PB design tend to give approximate result.Based on the analysis, there is no significant added advantage in performing 46 experiments as required by fractional factorial screening method.PB with fewer experimental runs is therefore desirable.

Forecast Distribution
It was found that 2000 equiprobable realisations iteratively built in Excel were enough to stabilize the resulting forecast distributions.One critical measure used to generate each realisation of the model is that the deterministic reserve value was fairly maintained at simulation base case value throughout the process.Cumulative probability distribution for the forecast reserves is shown in Figure 9.The impact of the major uncertain parameters was quantified.It clearly appears in Figure 10 that the oil viscosity and water saturation uncertainties are the most influential parameters.The other three parameters (porosity, horizontal, and vertical permeability) have less importance.The spread in production profiles (P10-P90 range) from the deterministic forecast (P50) volume as shown in Figure 11 indicates occurrence of uncertainty in the forecast.The extreme quantities to a large extent are investment indicators.

Summary
(i) The major objective of this study was to investigate the implications of various experimental design assumptions usually made while performing uncertainty analysis after reservoir simulation for reservoir management.
(ii) The three families of screening designs considered are fractional factorial, Placket-Burman, and one variable at-a-time.
(iii) The four response surface methods considered for the regression modeling are full factorial, central composite, Box-Behnken, and D-optimal.
(iv) A total of 9 response surface models developed were validated and subjected to statistical error analysis.
(v) The models were ranked using some criteria and based on case objectives, selection of appropriate model was made.
(vi) The risk curves generated were used to provide information about the costs and benefits of conducting additional experiments due to differences in the number of factors emanated from using different screening methods.
(vii) Also the risk curve was employed to identify stochastic models associated with the P10/P50/P90 realizations.

Conclusion and Recommendations
This study examined three screening designs (Placket-Burman, fractional factorial, and one variable at-a-time) and four response surface methodologies (Box-Behnken, central composite, D-optimal, and full factorial) commonly used for uncertainty analysis.In all screening methods, years of production forecast played important role on associated number of "heavy-hitters." One variable at-a-time was identified with largest number of parameters and hence considered not ideal because of the attendant large number of simulation runs.Unlike Placket-Burman, a low resolution fractional factorial in addition to main effects, considered significance of factor interactions, required more simulation runs, but can prevent exclusion of some factors with minimal main effect but significant interaction effect.Nevertheless, the analysis performed in this study using Monte Carlo simulation shows that there was no added advantage using fractional factorial in lieu of Placket-Burman for screening.
The "best" model for uncertainty quantification must be selected based on the reservoir management objectives.Box-Behnken method was adequate to determine P10/P50/P90 and associated models.On the other hand, to evaluate future development strategies such as EOR, stimulation, and the needs for acquiring additional information, full factorial and central composite designs are more efficient predictors within acceptable margin of error.A full 2-level factorial or high resolution fractional factorial method was equally adequate for the construction of response surfaces for uncertainty quantification where 3-level full factorial was not feasible.

Figures 5 (
Figures 5(a), 5(b), and 5(c) are parity plots for the various prediction methods corresponding to different screening method.These graphs are plots of the predicted production forecast as a function of the experimental reserves values.If

Figure 7 :
Figure 7: Impact of different RSMs on uncertainty assessment.

Table 1 :
Experimental range in terms of multipliers on the base case uncertain parameters.

Table 2 :
DoE matrix for fractional factorial method.

Table 3 :
PB design table for 10 parameters.

Table 5 :
Analysis of variance for Box-Behnken model associated with fractional factorial screening design.

Table 5
is a typical ANOVA table for Box-Behnken method associated with fractional factorial screening design.The model -value of 1031.25 implies the model is significant.

Table 9 :
Performance indices for model evaluation.

Table 10 :
Summary of the statistical analysis of RSMs from fractional factorial screening.

Table 11 :
Summary of the statistical analysis of RSMs from Placket-Burman screening.

Table 12 :
Summary of the statistical analysis of RSMs from OVAAT screening.

Table 13 :
Summary table for model ranking.