Developing a Robust Surrogate Model of Chemical Flooding Based on the Artificial Neural Network for Enhanced Oil Recovery Implications

The application of chemical flooding in petroleum reservoirs has become a hot topic in recent research. Development strategies for this technique are more robust and precise when both the economic point of view (net present value, NPV) and the technical point of view (recovery factor, RF) are considered. In the current study, a predictive model is proposed for estimating the efficiency of chemical flooding in oil reservoirs. To this end, a hybrid of swarm intelligence and an artificial neural network (ANN) is employed. Valuable and highly precise chemical flooding data banks reported in the previous literature are used to test and validate the proposed intelligent model. According to the mean square error (MSE), correlation coefficient, and average absolute relative deviation, the suggested swarm approach has acceptable reliability, integrity, and robustness. Thus, the proposed intelligent model can be considered as an alternative for predicting the efficiency of chemical flooding in oil reservoirs when the required experimental data are not available or accessible.


Introduction
The upstream oil and gas industry has recently been confronted with the challenge of producing hydrocarbon resources for which conventional technologies face growing technical limitations. This is because most oilfields around the world have reached the decline phase. Therefore, how to postpone the abandonment of reservoirs has turned into a priority for researchers worldwide. Their research normally highlights the need for new techniques, generally classified as tertiary oil recovery methods, that are able to maintain an economic production rate [1][2][3].
Chemical enhanced oil recovery approaches, as one of the most effective subsets of tertiary methods, are regarded as a key to unlocking these resources. Different methods have been developed for this process, such as polymer, surfactant/polymer (SP), and alkaline/surfactant/polymer (ASP) flooding. These methods increase the rate of oil production by lowering the interfacial tension and reducing the water mobility. In more detail, the previous literature has widely reported that designing, managing, and running a chemical enhanced oil recovery operation requires very expensive and time-consuming but precise experimental procedures, whose results are needed to plan the injection of chemicals effectively [4][5][6][7][8][9].
The laboratory results are then used to derive two parameters, recovery factor (RF) and net present value (NPV), which are used to evaluate the performance of chemical flooding, one of the most popular methods of chemical enhanced oil recovery. Knowledge of these two parameters is vital for deciding whether it is beneficial to run the operation. Unfortunately, there is no general method for interpreting both factors simultaneously, although numerous software packages and numerical or analytical methods are capable of making very precise quantitative estimates of either RF or NPV [10][11][12]. Hence, there is a great need in the oilfield for a solution or model that can predict both parameters at the same time. The major aim of the current study is to apply a new kind of artificial intelligence approach to provide a robust and accurate predictive method for forecasting the efficiency of chemical flooding in petroleum reservoirs. To achieve this goal, a hybrid of an artificial neural network and particle swarm optimization (PSO) was applied to databases from the previous literature. The integrity and performance of the proposed predictive approach in estimating the RF and NPV reported in the literature are described in detail.

Data Gathering
The data used throughout this research were gathered from a previous study [9] in which chemical flooding had been simulated in the Benoist sand reservoir using the UTCHEM simulator. That reservoir has been produced under primary and secondary processes for over fifty years. The original dataset contained 202 data points. Each data point had 7 inputs: surfactant slug size, surfactant concentration in the surfactant slug, polymer concentration in the surfactant slug, polymer drive size, polymer concentration in the polymer drive, $k_V/k_h$ ratio, and salinity of the polymer drive. The outputs were RF and NPV. The ranges of the data banks used are reported in Table 1 [9].

Artificial Neural Network and Particle Swarm Optimization
An artificial neural network (ANN) consists of simple nodes, called neurons, which are connected to each other to form a network model. In this way, an ANN loosely simulates biological nervous systems.
The main purpose of an ANN model is to determine the target function through internal computation during the training phase, given the values of the input variables. The most common type of ANN is the multilayer feed-forward neural network, which is made up of interconnected neurons organized in layers: an input layer, one or more hidden layers, and an output layer, each comprising a group of neurons, as presented in Figure 1. This network is strictly acyclic, since signals propagate only in the forward direction from the input neurons to the output neurons and no signals are fed back among the neurons. The number of neurons in the input and output layers is determined by the number of input and output variables planned for the predictive tool. However, the optimal number of neurons in the hidden layer(s) depends strongly on the nonlinearity and dimensionality of the problem under study [13][14][15][16][17][18][19][20][21][22][23][24].
The artificial neuron is the fundamental building block of a neural network. Each artificial neuron, excluding neurons in the input layer, receives and processes inputs gathered from other neurons. In other words, each artificial neuron is a mathematical information-processing unit, and the processed information is presented at the output end of the neuron. Figure 2 illustrates how an artificial neuron treats the data entering the model. Each input signal ($x_i$) is first multiplied by the corresponding weight value ($w_{ij}$), and the resulting products are summed to give the total weighted input $w_{1j}x_1 + w_{2j}x_2 + \cdots + w_{nj}x_n$. The sum of the weighted inputs and the bias ($z_j = \sum_{i=1}^{n} w_{ij} \cdot x_i + b_j$) forms the input to the activation function, $f$. The activation function processes this sum and gives the output, $y_j$, of the neuron as follows [13][14][15][16][17][18][19][20][21][22][23][24][25][26]:

$$y_j = f(z_j) = f\left(\sum_{i=1}^{n} w_{ij} \cdot x_i + b_j\right). \tag{1}$$

This output will be the input signal for the neurons in the following layer. The linear (purelin) transfer, tan-sigmoid (tansig) activation, and log-sigmoid (logsig) activation functions are the ones most commonly employed in science and engineering applications. These functions are defined, respectively, by (2)-(4), as given below [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]:

$$f(z) = z, \tag{2}$$

$$f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \tag{3}$$

$$f(z) = \frac{1}{1 + e^{-z}}. \tag{4}$$

The weight factors are the adaptive parameters of the network and determine the strength of the input signals. A bias is a weight that does not connect the output of one neuron to the input of another; it sets a particular level of the neuron output signal that does not depend on the input signals. The weight factors and biases are tuned during the training phase so that the network is able to forecast the target parameter accurately for a given set of inputs.
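As a minimal sketch of the neuron computation described above, the following Python snippet forms the weighted sum of the inputs plus the bias and passes it through each of the three activation functions. The function names mirror the purelin, tansig, and logsig labels used in the text; the helper `neuron_output` and all numeric values are hypothetical and serve only as an illustration, not as the implementation used in this study.

```python
import math


def purelin(z):
    """Linear transfer function: returns its input unchanged."""
    return z


def tansig(z):
    """Tan-sigmoid activation: maps z into the interval (-1, 1)."""
    return math.tanh(z)


def logsig(z):
    """Log-sigmoid activation: maps z into the interval (0, 1)."""
    # Two algebraically equivalent branches avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical input signals, weights, and bias (illustration only).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

y_lin = neuron_output(x, w, b, purelin)
y_tan = neuron_output(x, w, b, tansig)
y_log = neuron_output(x, w, b, logsig)
```

With these values the weighted sum plus bias is about -0.4, so the three outputs differ only in how the activation function rescales that number.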
A number of training algorithms with different methodologies are available in the context of intelligent systems. A variety of optimization tools, such as particle swarm optimization (PSO) [15,18,19], genetic algorithm (GA) [21], hybrid genetic algorithm and particle swarm optimization (HGAPSO) [13,16], unified particle swarm optimization (UPSO) [14], and imperialist competitive algorithm (ICA) [17,20,23], have been used for training the weights of neural networks. Kennedy [27] introduced PSO as a powerful population-based stochastic optimization technique that simulates the social behaviour of birds within a flock. It searches for an optimum solution by iteratively updating a swarm of particles.
The model initially includes a group of random particles (solutions). A random velocity is assigned to each candidate particle, which flies through the problem space. Each particle has memory and tries to attain the best position and/or fitness it has found so far. This value is denoted by $p_{\text{best}}$ and is linked only to a specific particle. The model also retains the best fitness found among all particles in the swarm, known as $g_{\text{best}}$. The candidate particle that obtains this fitness is the global best in the population [25][26][27][28]. In the current study, a particle's fitness is calculated by determining the network output for every point in the training set and then computing the mean squared error (MSE) from the resulting errors for performance evaluation. The basic PSO theory involves changing each particle's velocity toward its $p_{\text{best}}$ and $g_{\text{best}}$ locations at each time step. The particles' new velocity and position are updated according to the following equations [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]:

$$V_i^{t+1} = \omega V_i^{t} + c_1 r_1^{t}\left(p_{\text{best},i}^{t} - x_i^{t}\right) + c_2 r_2^{t}\left(g_{\text{best}}^{t} - x_i^{t}\right), \tag{5}$$

$$x_i^{t+1} = x_i^{t} + V_i^{t+1}, \tag{6}$$

where $V_i^{t}$ and $V_i^{t+1}$ are the velocities of particle $i$ at iterations $t$ and $t + 1$; $x_i^{t}$ and $x_i^{t+1}$ are the positions of particle $i$ at iterations $t$ and $t + 1$; $\omega$ represents the inertia weight, which governs the balance between exploitation and exploration of the search space as it continuously updates the velocity; $c_1$ and $c_2$ are termed the cognition and social components, respectively, and are the acceleration constants that alter the velocity of a solution in the direction of $p_{\text{best}}$ and $g_{\text{best}}$ [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]; and $r_1^{t}$ and $r_2^{t}$ refer to two random variables uniformly distributed in the interval [0, 1].
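The velocity and position update described above can be sketched for a single particle as follows. This is an illustrative Python fragment under stated assumptions: the function name `update_particle`, the default values of the inertia weight and acceleration constants, and the choice to draw fresh random numbers for every coordinate are all assumptions made here, not settings reported in this study.

```python
import random


def update_particle(position, velocity, p_best, g_best,
                    inertia=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one PSO step to a single particle.

    Returns (new_position, new_velocity). The velocity is pulled toward the
    particle's own best position (p_best) and the swarm's best (g_best).
    """
    new_position = []
    new_velocity = []
    for x, v, pb, gb in zip(position, velocity, p_best, g_best):
        # Assumption: independent uniform random numbers per coordinate.
        r1, r2 = rng.random(), rng.random()
        v_next = inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# Hypothetical two-dimensional particle (illustration only).
rng = random.Random(42)
position = [0.0, 2.0]
velocity = [0.5, -0.5]
p_best = [1.0, 1.0]
g_best = [1.0, 1.0]
new_position, new_velocity = update_particle(
    position, velocity, p_best, g_best, rng=rng)
```

A particle that already sits at both best positions with zero velocity does not move, which is a quick sanity check on the update rule.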
Herein, the PSO algorithm has been used to evolve the weights of a multilayer feed-forward neural network. In this case, a particle's position at any iteration is a vector whose coordinates are the connection weights. The weight vector of particle $i$ is denoted by $x_i$. Throughout the training process, equations (5) and (6) adjust the network weights until a stopping criterion is met. Training stops when a sufficiently low MSE, that is, a sufficiently good fitness, is achieved; otherwise, the iterative search is terminated at a maximum number of iterations or when no improvement is observed over a number of consecutive generations. The flowchart of the PSO-based training algorithm for the ANN is shown in Figure 3.
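To make the idea of a particle as a weight vector concrete, the sketch below trains a small one-hidden-layer network with PSO, using the MSE over the training points as the fitness. Everything specific here is an assumption for illustration: the toy sine dataset, the network size, the swarm settings, the fixed iteration count as the only stopping rule, and the names `forward`, `mse`, and `train_pso_ann`. It is not the configuration or the data of this study.

```python
import math
import random


def forward(weights, x, n_in, n_hidden):
    """One hidden layer with tanh units and a single linear output neuron."""
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        z = weights[idx + n_in]  # bias of this hidden neuron
        for k in range(n_in):
            z += weights[idx + k] * x[k]
        hidden.append(math.tanh(z))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # bias of the output neuron
    for j in range(n_hidden):
        out += weights[idx + j] * hidden[j]
    return out


def mse(weights, data, n_in, n_hidden):
    """Mean squared error of the network over (inputs, target) pairs."""
    total = 0.0
    for x, y in data:
        diff = forward(weights, x, n_in, n_hidden) - y
        total += diff * diff
    return total / len(data)


def train_pso_ann(data, n_in, n_hidden, n_particles=15, n_iter=60,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Evolve the flattened weight vector with PSO; returns (weights, history)."""
    rng = random.Random(seed)
    dim = n_hidden * (n_in + 1) + n_hidden + 1  # all weights and biases
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_fit = [mse(p, data, n_in, n_hidden) for p in pos]
    g = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    history = [g_fit]  # best MSE after initialization and after each iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = mse(pos[i], data, n_in, n_hidden)
            if fit < p_fit[i]:
                p_best[i], p_fit[i] = pos[i][:], fit
            if fit < g_fit:
                g_best, g_fit = pos[i][:], fit
        history.append(g_fit)
    return g_best, history


# Hypothetical toy data: 21 samples of sin(pi * x) on [-1, 1].
data = [([k / 10.0], math.sin(math.pi * k / 10.0)) for k in range(-10, 11)]
best_weights, history = train_pso_ann(data, n_in=1, n_hidden=4)
```

Because the global best is only replaced by a strictly better particle, the recorded best MSE can never increase from one iteration to the next; how low it ends up depends on the swarm settings and the random seed.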
PSO uses a random procedure in the search space of the problem such that particles in the population are directed toward optimum positions but not in or between optimal areas [27]. Thus, PSO can be used to train neural networks with nondifferentiable (even discontinuous) neuron activation functions. It can also be applied in cases where gradient or error information is not accessible. PSO is easy to implement and has few parameters to adjust. However, the uniqueness of the algorithm lies in the dynamic interactions among the particles, which turn it into a social-psychological model of knowledge management [27].

Results and Discussion
According to the study by Cybenko [29], a network with only a single hidden layer can approximate nearly any kind of nonlinear function. However, determining the ideal number of neurons in the hidden layer is a challenging task: too few neurons will not give adequate precision, and too many hidden neurons may lead to overfitting. In the latter case, the training data may be fitted adequately, but considerable oscillations appear between the points in the fitted curve, resulting in poor interpolation and extrapolation. The network performance for different numbers of hidden neurons is shown in Figure 4. A model with one hidden layer containing just one neuron was first built in the current study to predict the recovery factor (RF) and net present value (NPV) of chemical flooding in oil reservoirs. Prediction accuracy was then analyzed as the number of neurons was increased up to 10 in order to select the most precise configuration.
As is clear from the results in Figure 4, a 3-6-1 architecture (3 neurons in the input layer, 6 neurons in the hidden layer, and one neuron in the output layer) offers the best model for recovery factor and net present value (NPV) prediction in terms of MSE and $R^2$: the optimum structure found by the trial-and-error procedure has a very low mean squared error of MSE = 0.0012 and a satisfactory coefficient of determination of $R^2$ = 0.9996, based on a comparison between the predicted and real data.
The results of the proposed intelligent approach are depicted in Figures 5 to 10. Figure 5 shows the regression plot of the PSO-ANN predictions against the corresponding recovery factor (RF) of chemical flooding in the oil reservoir. In this scatter plot, the PSO-ANN outputs lie on the line $y = x$, which indicates close agreement between the outputs of the suggested PSO-ANN model and the corresponding RF data samples. For a better understanding of the results, Figure 6 compares the RF obtained from the model with the real RF data versus the data index. As illustrated in Figure 6, the results of the proposed model are very close to the real RF data samples; in other words, the outputs of the PSO-ANN approach follow the same behaviour as the actual data, which again confirms the efficiency and accuracy of the approach in predicting the RF dataset of chemical flooding. Moreover, the robustness of the PSO-ANN is demonstrated in Figure 7 in terms of the relative deviations of the model outputs from the corresponding RF data. As can be observed in Figure 7, the highest deviations occur near the early boundary of the RF data samples, and the maximum relative deviation is 5%.

Figure 8 shows the regression plot of the PSO-ANN predictions against the corresponding net present value (NPV) of chemical flooding in the oil reservoir. In this scatter plot, the PSO-ANN outputs again lie on the line $y = x$, which indicates close agreement between the outputs of the suggested PSO-ANN model and the corresponding NPV data samples. Figure 9 compares the NPV generated by the model with the real NPV data versus the data index. As illustrated in Figure 9, the results of the proposed model are very close to the NPV data samples; in other words, the outputs of the PSO-ANN approach follow the same behaviour as the actual data. Furthermore, the effectiveness of the proposed intelligent model is depicted in terms of the relative deviations of the PSO-ANN model outputs from the corresponding NPV data in Figure 10. As can be seen from Figure 10, the highest deviations occur near the early boundary of the NPV data, and the maximum relative deviation is 6%.

The performance of the selected network is assessed using various error analysis parameters. Table 2 tabulates the PSO-ANN accuracy in terms of correlation coefficient ($R$), coefficient of determination ($R^2$), and mean absolute error measures. In the definitions of these parameters, $N$ represents the total number of data points in the training, testing, or whole data set (input and output pairs), $y_i^{\text{actual}}$ refers to the actual value at sampling point $i$, $y_i^{\text{pred}}$ is the $i$th output of the model, and $\bar{y}^{\text{actual}}$ and $\bar{y}^{\text{pred}}$ stand for the averages of the actual and predicted data, respectively.
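The defining expressions for these statistical parameters are not reproduced in the text, so the sketch below uses common textbook definitions as an assumption: Pearson's correlation coefficient, the coefficient of determination computed as one minus the ratio of residual to total sum of squares, the mean squared error, and the average absolute relative deviation in percent. The function names and the sample numbers are hypothetical and are not results from this study.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between predictions and data."""
    n = len(actual)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n


def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    n = len(actual)
    mean_a = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def aard_percent(actual, predicted):
    """Average absolute relative deviation, in percent (actual values nonzero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((p - a) / a) for a, p in zip(actual, predicted))


# Hypothetical actual and predicted values (illustration only).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]

mse_value = mean_squared_error(actual, predicted)
r_value = correlation_coefficient(actual, predicted)
r2_value = r_squared(actual, predicted)
aard_value = aard_percent(actual, predicted)
```

Other conventions exist, for example taking $R^2$ as the square of the Pearson coefficient, so the values in Table 2 should be read against the definitions given in the original publication.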

Conclusions
Based on the results of this contribution, the following major conclusions can be drawn.
(1) Adequate agreement is observed between the values obtained from the developed intelligent model and the corresponding real recovery factor/net present value (NPV) values. By comparison, the conventional approaches fail to track the real recovery factor/net present value (NPV) of chemical flooding according to statistical criteria such as mean square error (MSE) and correlation coefficient.
(2) The developed intelligent network model for monitoring the real recovery factor/net present value (NPV) of chemical flooding is user friendly, fast, and cheap to implement. Moreover, it is useful for improving the accuracy and robustness of commercial reservoir simulators such as ECLIPSE and Computer Modelling Group (CMG) software for enhanced oil recovery (EOR) from oil reservoirs.

Figure 1: Architecture of multilayer feed-forward neural network. The symbol $w_{jk}^{o}$ denotes the synaptic weight between the output of the $j$th neuron in the hidden layer and the input of the $k$th neuron in the output layer. The symbol $b_{j}^{h}$ denotes the bias of the $j$th neuron in the hidden layer. The superscript $o$ stands for output layer.

Figure 2: Information processing by an artificial neuron.

Figure 3: PSO-based algorithm flowchart in optimization of the weights of ANN. Flowchart steps: initialize each particle with random position and velocity (each particle's position contains a series of ANN weights); find the fitness ($f$) value for each particle; if $f(x)$ is better than $f(p_{\text{best}})$, set the current value as the new $p_{\text{best}}$; if $f(x)$ is better than $f(g_{\text{best}})$, set the current value as the new $g_{\text{best}}$; generate new swarm.

Figure 4: Effect of number of hidden neurons on PSO-ANN accuracy of (a) recovery factor and (b) NPV predictions in terms of MSE and $R$-squared.

Figure 6: Comparison between the suggested network model and recovery factor versus relevant data index: (a) training phase and (b) testing phase.

Figure 8: Performance plot of the suggested network model for determining net present value (NPV) of chemical flooding in terms of the coefficient of determination ($R^2$): (a) training phase and (b) testing phase.

Table 2: Statistical parameters of the proposed approach in prediction of the efficiency of chemical flooding in oil reservoirs.