History Matching of a Channelized Reservoir Using a Serial Denoising Autoencoder Integrated with ES-MDA

,


Introduction
In the petroleum industry, history matching is an essential process to calibrate reservoir properties (e.g., facies, permeability, and PVT parameters) by conditioning one or more reservoir models to field observations such as production and seismic data [1].Ensemble-based data assimilation methods based on Bayes theorem [2][3][4] have been applied to solve a variety of petroleum engineering problems since the early 2000's [5].Specifically, ensemble Kalman filter (EnKF) [2], ensemble smoother (ES) [6], and ensemble smoother with multiple data assimilation (ES-MDA) [7] have been utilized for history matching of geological features such as channels (the subject of this study).Loss of geological characteristics due to pixel-based manipulation of channel features (such as shape and connectivity) is challenging for an ensemble-based history matching of a channelized reservoir.Note, the loss is regarded as noise in this study.Despite sufficient conditioning to field observations during ensemble updates, increase in noise often causes failure to deliver the geologically plausible reservoir models.This decreases the reliability of history matching results [8].For this reason, a relevant problem is how to update the reservoir models with consideration for geological plausibility in a practical manner.
Previous studies have improved the performance of ensemble-based history matching by adopting data transformation [9][10][11][12][13].In general, transformation methods have a substantial energy compaction property that is useful for feature extraction and dimension reduction of parameters and helping to save computational cost in data processing.If essential features are adequately acquired, updating the features can also yield an improved history matching performance over calibrating original parameters.For these reasons, discrete cosine transform (DCT) [14,15], discrete wavelet transform [1], K-singular value decomposition (K-SVD) [16,17], and autoencoder (AE) [18] have been employed as ancillary parameterizations of ensemblebased methods.For history matching of channelized reservoirs, DCT has been utilized to preserve channel properties because DCT figures out overall trends and main patterns of channels by using only essential DCT coefficients [19][20][21][22].Updating essential DCT coefficients implies the importance of determining the optimal number of DCT coefficients for preserving channel connectivity and continuity [22].K-SVD has an advantage of sparse representations of data as weighted linear combinations of prototype realizations.However, it takes preprocessing time to construct a set of prototype realizations called a dictionary.As a remedy, a combination of DCT and iterative K-SVD was proposed to complement the limitations of both methods [23].Canchumuni et al. [18] coupled AE with ES-MDA for an efficient parameterization and compared its performance with that of ES-MDA coupled with principal component analysis.
Recent advances in machine learning have offered opportunities for using complex meta-heuristic tools based on artificial neural networks (if the tools are well trained at affordable computational cost).In petroleum engineering, examples include production optimization [24][25][26] and history matching [18,[27][28][29].As investigated in [18,27], AE is a multilayer neural network that learns efficient data coding in an unsupervised manner.This is useful for representation (encoding) of a given dataset followed by reconstruction of the encoded dataset [30,31].In image processing and recognition, AE has a capability of denoising through encoding and decoding processes if noise data is input and purified data is output [32].This type of AE is called denoising autoencoder (DAE) [33,34].
Taking this capability of DAE into consideration, this study designs a serial denoising autoencoder (SDAE) and integrates the algorithm in the ensemble update of ES-MDA to improve the performance of ensemble-based history matching.The SDAE learns how to eliminate the noise and restore the clean reservoir models through encoding and decoding processes using the noise realizations as inputs and the original realizations as outputs of the SDAE.The trained SDAE imports the posterior reservoir models derived using Kalman gain of ES-MDA for purifying the models and exports the purified models as prior models for the subsequent assimilation of ES-MDA.The ES-MDA coupled with SDAE is applied to history matching of a channelized gas reservoir.Its performance is compared with that of the conventional ES-MDA.Also, denoising effects are investigated for ES-MDA coupled with dimension reduction methods such as DCT and K-SVD.

Methodology
In this study, ES-MDA is the platform to calibrate the reservoir models for history matching.A procedure of ES-MDA is mainly composed of two steps: numerical simulation for the reservoir models and update of reservoir parameters.A brief description of ES-MDA is given in Section 2.1.SDAE purifies noise in the updated reservoir models (Section 2.2).AE (Section 2.2), DCT (Section 2.3), and K-SVD (Section 2.4) are introduced as parameterization techniques.Section 2.5 proposes the ES-MDA algorithm coupled with the SDAE and the parameterization methods.

ES-MDA.
Ensemble-based history matching methods update parameters of the target models using observed data such as production rate and 4D seismic data.For model updates, EnKF utilizes observed data one-time step by one-time step in time sequence.The principle of EnKF might cause inconsistency between the updated static models and dynamic behaviors due to sequential updates without returning to an initial time step [7,35].ES updates models using observed data measured at all time steps at once to solve the inconsistency issue [6].However, history matching performance obtained using ES was often less satisfactory due to the one-time calibration of the reservoir models.ES-MDA is a variant of ES.ES-MDA repeats ES with inflation coefficients for the covariance matrix of observed data measurement error.Therefore, it has advantages in not only history matching performance but also the consistency between static data and dynamic data [35].
For ensemble-based history matching, the equation of model update is as follows: where m is the state vector consisting of reservoir parameters (e.g., facies and permeability), the subscript i means the ith ensemble member, the superscript b means before update in this study, C md is the cross-covariance matrix of m and d, C dd is the autocovariance matrix of d, α p is the inflation coefficients for C D (which is the covariance matrix of the observed data measurement error [7]), d is the simulated responses obtained by running a forward simulation, d unc is the observation data perturbed by inflated observed data measurement error, and N ens is the number of ensemble members (i.e., the reservoir models in the ensemble).In equation (1), the Kalman gain that is computed with regularization by singular value decomposition (SVD) using 99.9% of the total energy in singular values [7].

Geofluids
Definitions of C md and C dd are as follows: where m is the mean of state vectors and d is the mean of dynamic vectors.The condition for α p is as follows: where N a is the number of assimilations in ES-MDA.ES-MDA updates all state vectors N a times using an inflated covariance matrix of measurement error compared to the single assimilation of ES [7,35].In other words, ES has N a = 1 and α p=1 = 1 because of the single assimilation.
In equation ( 1), the perturbed observation d unc is computed as follows: where d obs means the original observed data.On the righthand side of equation ( 5), the second term is the perturbation term quantifying reservoir uncertainty caused by data measurement, processing, and interpretation.[36].Here, the manifold refers to the dimension that represents essential features of the original data [33,37].As a well-designed manifold is useful for data compression and restoration, AE has been recently utilized as a preprocessing tool for feature extraction of the reservoir models [18,38].Figure 1(a) is a schematic diagram of AE that shows compression and reconstruction of a channelized reservoir model composed of two facies: sand channels with high permeability and shale background with low permeability.Throughout this paper, indicators for shale and sand facies are 0 and 1, respectively (see the original reservoir model in Figure 1(a)).As a multilayer neural network, AE typically consists of three types of layers: one input layer, one or more hidden layers, and one output layer.Each layer is composed of interconnected units called neurons or nodes [39].In Figure where f AE • is the encoder of AE, W enc and b enc are the weight matrix and bias vector for f AE • , respectively.The subscript enc refers to encoding.For example, in Figure 1(a), m composed of facies indexes in 5,625 gridblocks is compressed into 2,500 coefficients denoted as h 1 in the hidden layer.
The above encoding process is followed by the decoding process as follows: where m is the reconstructed reservoir model, g AE • is the decoder of AE, W dec is the weight matrix for g AE • , and b dec is the bias vector for g AE • .The subscript dec refers to decoding.In Figure 1(a), the encoded coefficients h 1 are reconstructed as m in the output layer.In Figure 1(b), the encoded h 1 is encoded again into h 2 .For feature extraction, the number of nodes in hidden layers is smaller than that of the input layer.For data reconstruction, the number of nodes in the output layer is the same as that in the input layer.
Training AE indicates tuning W enc , b enc , W dec , and b dec in the direction of minimizing the dissimilarity between the original model m and the reconstructed model m.The dissimilarity is quantified as the loss function E given as follows: where N train is the number of the reservoir models used for AE training, N para is the number of parameters in each reservoir realization, m ij is the jth model parameter value for the ith model, λ is the coefficient for the L2 regularization, Ω W is the sum of squared weights, β is the coefficient for the sparsity regularization, and Ω s is the sparsity of network links between nodes of adjacent layers.Ω W and Ω s are given as follows: 3 Geofluids where w ij is the weight for a node of the jth parameter of the ith model, N node is the number of nodes in a hidden layer, ρ is a desired value for the average output activation measure, ρk is the average output activation measure of the kth node in a hidden layer, and h k m i is an assigned value in that kth node [40].
For further feature extraction, an AE (Figure 1(b)) can be nested in another AE, as shown in Figure 1(c).This nested AE is called a stacked AE (SAE) [33].In Figure 1(b), the encoded model h 1 composed of 2,500 coefficients is compressed into another encoded model h 2 composed of 465 coefficients.Figure 1(c) is a combination of Figures 1(a) and 1(b).In Figure 1  4 Geofluids realizations updated at each data assimilation and make the realizations preserve clean channel features in terms of shape and connectivity.This denoising process is also called purification in this study.Figure 2 shows a procedure of DAE applied to the purification of a channelized reservoir.
For obtaining training data of DAE in this study, the clean original models are generated using a multipoint statistics modelling technique called single normal equation simulation (SNESIM) [41] (Figure 2(a)).Black solid circles in Figure 2(a) and m of Figure 2(b) are the original models which are corrupted stochastically with artificial noise using the conditional probability, q m | m (Figure 2 [42] and Gaussian noise (GN) [43].SPN can be caused by sharp and sudden disturbances in the image signal.GN is statistical noise having a probability density function equal to that of the Gaussian distribution [43].Both SPN and GN are typical noise types in digital image recognition and so have been used for DAEs [33]. Figure 3 compares a clean reservoir model, the model corrupted with SPN, and the model corrupted with GN.In Figure 3(a), the clean model consists of two facies values: 0 (shale) and 1 (sand).Because the facies value of a grid cell blurred with SPN is converted either from 0 to 1 or from 1 to 0, SPN makes mosaic effects that might break and disconnect sand channels in the shale background while inducing sparse sand facies in the shale background (Figure 3(b)).Grid cells to be perturbed are randomly selected using a specific ratio over the reservoir domain.The range of GN values is [-1, 1].For preserving the range [0, 1] of given data, the facies value at a grid cell is set as 0 if the value is negative after GN is added.The value is 1 if any facies value added with GN is greater than 1 (Figure 3(c)).In brief, the need to remove two noise types necessitates the design of serial DAEs as an auxiliary module embedded in an ensemble-based data assimilation algorithm.
Figure 4 shows how trained SPN and GN filters denoise a channelized reservoir model updated using ES-MDA.This study executes SPN and GN filters sequentially.Grid cells of the updated model have facies values following the Gaussian distribution, as seen in the histogram of Figure 4(a).Recall that the ideal facies models follow the discrete bimodal distribution (not the Gaussian) before import to the simulator.Using the SPN filter (Figure 4(b)), a purified model yields the histogram following the bimodal distribution.However, it still reveals blurred channel borders.Some grid values are neither 0 (shale) nor 1 (sand).To obtain more distinct channel traits, the GN filter is applied to Figure 4(b) which yields Figure 4(c).After the GN filtering, a cutoff method is carried out to keep every facies value as either 0 or 1 for reservoir simulation [23].

Serial Denoising
Autoencoder.This research proposes a serial denoising autoencoder (SDAE) with consideration of the two noise types.Figure 5 describes the operating procedure of the SDAE denoising a channelized reservoir model.Figure 5(a) shows a DAE trained with the reservoir models corrupted with SPN.i in equation (1) after passing the cutoff method.

Extraction of Geologic Features Using Discrete Cosine
Transform.An efficient reduction of the number of parameters can contribute to improving the history matching performance [1,15,17].Discrete cosine transform (DCT) presents finite data points in a sum of coefficients of cosine functions at different frequencies [44].Figure 6 depicts   7 Geofluids an example that applies DCT for extracting features of a channelized reservoir model.Figure 6(a) shows an image of physical parameters (e.g., facies) for a channel reservoir.The DCT application to Figure 6(a) yields Figure 6(b) which shows a distribution of DCT coefficients.In Figure 6(b), DCT coefficients are arranged following an order of cosine frequencies: the upper left part is filled with lower (i.e., more essential) frequencies of cosine functions, and the lower right part is filled with higher frequencies of the functions.DCT coefficients in the lower-frequency region (regarded as essential) have higher energy (for representing the channel image) than those in the higher-frequency region.The total number of DCT coefficients is the same as the number of gridblocks: 75 × 75 = 5,625.The number of the upper left coefficients is 465 which is equal to the sum ∑ 30 i=1 i: from 30 DCT coefficients on the 1st row to 1 DCT coefficient on the 30th row.Throughout this study, the number of essential rows is constant at 30.Hence, the number of essential DCT coefficients N DCT becomes 465 if any data assimilation algorithm is combined with DCT.Thus, the data compression ratio is approximately 12.1 ≈5, 625/465 .In Figure 6(c), the coefficients in the upper left part within the red dotted triangle are preserved while the other coefficients are assumed as zero (corresponding to the negative infinity in the color scale).Figure 6(d) shows that a channel image reconstructed using the 465 DCT coefficients is similar to the original image shown in Figure 6(a).This reconstruction is referred to as inverse DCT (IDCT).Due to the data compression, the channel borders get blurred in Figure 6(d).Nevertheless, it seems that Figure 6(d) reliably restores the main channel trend of Figure 6(a).

Construction of Geologic Dictionaries.
Sparse coding is the process used to calculate representative coefficients for the prototype models composing a geologic dictionary [45][46][47].The premise of sparse coding is that geomodels are presented with a weighted linear combination of the prototype models [48,49].Once a library Y is built with a large number of sample geomodels, K-SVD extracts essential features from Y and then constructs both the dictionary matrix D and its weight matrix X: Y ≅ DX [16].Orthogonal matching pursuit (OMP) aids the decomposition of Y [50,51].
Figure 7 compares sparse coding to construct geologic dictionaries using K-SVD and OMP in the original facies domain and in the DCT domain.The procedure starts with constructing the library matrix Y (a N para by N lib matrix in Figure 7(c  computational cost of sparse coding is reduced more in the DCT domain than in the original grid domain [16,23].Furthermore, the previous work by the authors [23] claimed that iterating the modified sparse coding has the potential to improve the overall history matching performance of a channelized reservoir by updating the geologic dictionary in every assimilation of ES-MDA.N qual qualified ensemble members are selected for the efficient update of the geologic dictionary.More details on iterative update of sparse geologic dictionaries can be found in [23]. 2.5.Integration of DCT, K-SVD, and SAE in ES-MDA with SDAE.In this study, ten variants of ES-MDA are investigated to analyze the effects of SDAE on history matching performance of a channelized reservoir.Table 1 summarizes state vectors and postprocesses of the ten ES-MDA algorithms.Note, some algorithms are integrated with one or more of the transformation techniques addressed in Section 2. The first to fifth ES-MDA algorithms update the reservoir models without the SDAE.The sixth to tenth ES-MDA algorithms (which correspond to the first to fifth algorithms in numerical order) apply the SDAE as a noise remover to the ensemble update addressed in equation (1).
Figure 9 shows the flowchart of the ten algorithms.First, the N lib (thousands of) reservoir models are realized using SNESIM (Figure 9(a)) for considering various geological scenarios [17,18,23].For the parameterization techniques, the K-SVD uses the whole realization pool for constructing the geologic dictionary (see the left box of Figure 9(a)).The SAE and SDAE utilize some realizations as their training data.The initial ensemble is composed of randomly selected N ens realizations from the pool (see the right box of Figure 9(a)).As shown in Figure 9(b), forward reservoir simulation is run for the initial ensemble and the initial parameters are imported to the ten algorithms (depending on their transformation techniques).For each ES-MDA algorithm, the transformed parameters are updated using the Kalman gain (Figure 9(c)).For example, for the first algorithm, the facies indexes (i.e., 0 for shale and 1 for sand) are the target parameters of the reservoir models updated using the conventional ES-MDA without any transformation.These original coefficients are transformed into the DCT domain for the second algorithm.The third algorithm updates weight coefficients of K-SVD [23].The fourth algorithm adjusts weight coefficients of K-SVD in the DCT domain with an iterative update of the dictionary matrix [23].The fifth algorithm updates coefficients encoded using SAE (as described in Section 2.2.1).It should be clarified that the SAE and SDAE have different purposes.Similar to the DCT, the SAE is used to represent the facies distribution of a reservoir model in a lower dimension.The number of nodes in the hidden layer of the SAE equals the number of representatives.Meanwhile, the SDAE introduced in Section 2.2.4 aims at purifying the reservoir models in each data assimilation.
The updated reservoir parameters are retransformed into the facies domain to figure out the updated reservoir realizations in the physical state (see the left box in Figure 9(d)).Neither 0 nor 1 facies values are changed as 0 or 1 using the cutoff (see the right box in Figure 9(d)).In this study, 0.5 is the threshold facies value distinguishing sand from shale.The ensemble update is repeated until the assimilation count p reaches the number of assimilations N a (Figure 9(e)).After the final assimilation is complete, well behaviors are predicted through forward simulation for the updated reservoir models (Figure 9(f)).

Results and Discussion
The ten ES-MDA algorithms addressed in Table 1 were applied to history matching and production forecasts of a channelized gas reservoir to investigate the efficacy of denoising using the proposed SDAE on ensemble-based reservoir model calibration.Section 3.1 provides the field description and experimental settings for the algorithms.Section 3.2 describes experimental settings for SAE and SDAE.The simulation results of the ten algorithms are compared   regarding history-matched and history-predicted production rates (Section 3.3), updated facies distribution (Section 3.4), and error estimation (Section 3.5).
3.1.Field Description.Table 2 summarizes properties of a channelized gas reservoir model applied to the ES-MDA algorithms.This gas reservoir is composed of two facies: sand and shale.Boundaries of the gas reservoir are attached to a numerical aquifer modelled by pore volume multiplication.
Figure 10 shows the training image (Figure 10(a)) and reference model (Figure 10(b)) employed for the ten algorithms.Sixteen vertical gas production wells are set up: eight wells (P1, P4, P6, P7, P9, P12, P14, and P15) are drilled in the sand formation while the other eight wells (P2, P3, P5, P8, P10, P11, P13, and P16) are drilled in the shale formation.Facies information at the well locations are used as hard data of SNESIM that realizes the reference model and N lib reservoir realizations.
Table 3 describes well coordinates, operating conditions, and simulation period for history matching and forecast.The total simulation period was 7,000 days: 3,500 days for history matching was followed by production forecasts for 3,500 days.Target parameters of history matching were gas production rate and bottomhole pressure (BHP).Water production rate was regarded as the watch parameter (thus excluded from the matching targets).
Table 4 presents the number of the reservoir models and parameters used for the ten algorithms.N a = 4 and α p = 4 according to equation ( 4) for all the ES-MDA algorithms.

Experimental Settings for SAE and SDAE.
Recall that the SDAE was designed for denoising updated ensemble members, while the SAE was adopted as a feature extraction tool  11 Geofluids (such as the DCT).All autoencoders were developed using the trainAutoencoder toolbox in MATLAB [40].
Table 5 describes experimental settings for the SDAE.As the SDAE was the sequence of SPN and GN filters (Figure 5(c)), the number of hidden nodes in each filter was the same.With the 15% visiting probability, SPN altered the rock facies values of the visited gridblocks either from 0 to 1 or vice versa for each training the reservoir model.The SPN filter was trained with 2,100 noise reservoir models: 700 clean reservoir models were equiprobably noise three times.The number of the reservoir models used for training the GN filter was 2,000.All the training models originated from one clean model.For each training model, GN contaminated all gridblocks with the mean of 0 and standard deviation of 0.35.If a contaminated value of a gridblock was smaller than the minimum facies index of 0, the minimum was assigned to that gridblock.Also, the maximum of 1 was assigned if a value exceeded the maximum.
Table 6 describes hyperparameters used for the SAE.5,625 gridblocks were represented by 465 node values in the second hidden layer via 2,500 node values in the first hidden layers.Note, the number of SAE coefficients in the second hidden layer is kept equal to the number of DCT coefficients for a fair comparison throughout the study.

History Matching and Forecasts of Dynamic Data.
Figure 11 presents profiles of gas production rate during the 10-year history matching and the following 10-year prediction periods.Figures 11(a)-11(e) are the profiles obtained using the five ES-MDA algorithms uncoupled with SDAE.Figures 11(f)-11(j) are those obtained using the algorithms coupled with SDAE.In each subfigure, the solid grey and light blue lines represent the production behaviors of the initial and final updated ensembles, respectively.The dark blue line corresponds to the mean of the final ensemble.The red line indicates the production profile from the reference model (Figure 10(b)) regarded as actual data.The profiles at the production wells (P1, P4, P9, and P15) located on the sand formation were presented because these wells near the reservoir boundary were sensitive to the aquifer water influx in this case study.
For all ten ES-MDA algorithms, the updated ensembles decreased the discrepancies between the reference and updated ensemble models compared to the initial ensembles.Furthermore, the comparison of the subfigures indicates the denoising using the SDAE was effective to improve both matching and prediction accuracy during data assimilation.The five ES-MDA algorithms with SDAE yielded better performance (Figures 11(f)-11(j)) than the uncoupled algorithms (Figures 11(a)-11(e)) after the assimilations were complete.For the updated ensembles, reservoir uncertainty was somewhat left at well P15 during the prediction period.This was because the decrease in gas rate due to water breakthrough was hardly observed at this well during the history matching period.As shown in the reference model, the inflow from the numerical aquifer could arrive at well P15 after breaking through wells P12 or P14.This late water breakthrough caused the remaining uncertainty at well P15 despite the satisfactory assimilation results at the other wells.Figure 12 shows well BHP profiles during both periods.Every final ensemble got conditioned to the reference data sufficiently.This yielded the narrow bandwidth of the simulation results including the reference data at most wells.Also, denoising effects caused by the use of the SDAE were captured at well P15.
Figure 13 compares matched and predicted water production rate at the four production wells.Discrepancies between the updated ensemble mean profiles (dark blue lines) and the reference profiles (red lines) remained in the results of the ES-MDA algorithms without SDAE.For example, simulation results were somewhat unmatched to observations at well P1 in Figure 13(a) and at well P4 in Figure 13(b).The SDAE was effective in correcting these discrepancies.When comparing Figures 13(a)-13(e) and corresponding Figures 13(f)-13(j), both matching and prediction accuracy improved due to the coupling of SDAE and ES-MDA.In particular, remarkable improvements due to denoising were captured in prediction at wells P4 and P9.At wells P9 and P15, discrepancies were observed but acceptable considering water rate was used as the watch parameter and not used for history matching.

Distribution of Facies and Permeability after History
Matching.Figure 14 presents the evolution of an ensemble member obtained using the ES-MDA coupled with the SDAE method over four assimilations.The row number indicates the assimilation sequence.The first column presents the ensemble member obtained using ES-MDA (Figure 14(a)).The second column shows the member denoise using the SPN filter (Figure 14(b)).The denoise model was purified again using the GN filter, as shown in the third column (Figure 14(c)).Channel features blurred in Figure 14(a) improved significantly by passing the two filters in sequence.The filtering functionality of the trained SDAE refined the facies value of each gridblock in the ensemble member similar to 0 (i.e., shale) or 1 (i.e., sand).Finally, applying the cutoff 12 Geofluids to the filtered model yielded the prior model of the next assimilation (Figure 14(d)).The cutoff delivered the models only composed of sand and shale facies.Figure 15 compares the updated permeability distributions obtained using the ten ES-MDA algorithms.The first row of Figure 15 deploys the reference field and the mean of the initial ensemble members.The initial ensemble mean reveals high dissimilarity to the reference in terms of channel connectivity and pattern.The average maps of the updated ensemble members obtained using the ten algorithms are arrayed in the second and third rows.The conventional ES-MDA without SDAE had broken and thus had geologically less plausible channels due to the direct perturbation of gridblock pixels (i.e., facies) (Figure 15(a)).Though coupling DCT with ES-MDA reduced the pixel heterogeneity, the quality of the ensemble mean was less satisfactory.ES-MDA-K-SVD showed better results than the two previous algorithms.However, there was a room for improvement regarding connectivity between wells P9 and P14 (Figure 15(c)).For the ES-MDA-DCT-i-K-SVD, inconsistent channel widths and broken connectivity between wells P1 and P6 were observed (Figure 15(d)).Similar to the ES-MDA-K-SVD, ES-MDA-SAE suffered from the connectivity issue (Figure 15(e)).
Coordinates of well locations in shale facies (dimensionless) (30,14), (46,14), (14,30), (62, 30), (30,46), (46,46), (14,62), (62, 62) When comparing the plots on the second and third rows in the same column, the results obtained using the five ES-MDA algorithms with SDAE (Figures 15(f)-15(j)) preserved the main X-shaped channel patterns with consistent channel width, though Figures 15(g)-15(j) had unexpected channel connectivity between wells P9 and P14.In the case of Figures 15(b) and 15(g), it seems that coupling feature extraction techniques (such as DCT and SAE) might deteriorate history matching performance due to data compression of a reservoir realization.The same issue was seen in Figures 15(e) and 15(j).An improvement in the results is expected if an optimal number of essential coefficients and hyperparameters are used for the transformation.In summary, the above results imply a well-trained machine learning-based noise remover has the potential to preserve geological plausibility of a target reservoir model during ensemble-based history matching.
Figure 16 displays error histograms of facies distributions for the final ensembles.The error equals N m /N grid × 100 % , where N m is the number of facies-unmatched gridblocks where the updated facies are different from the facies in the reference model.Histograms are scaled 0 to 50 for the y-axis to make the results more readable.Note, the sum of the frequencies for ensemble members is 100 in each histogram.For the initial ensemble, the range of errors is between 17 and 37% (see the first row of Figure 16).All  cumulative gas production was insignificant at the wells on the shale background.The quality of the updated ensemble was quantified using the equations as follows: , for i = 1, … , N ens , 13 where ε i,type is the error of the ith ensemble member in terms of each dynamic data type, μ ε is the normalized mean of ε, and σ ε is the normalized standard deviation of ε.The superscripts upd and init indicate the updated and initial ensemble members, respectively.The overall ensemble quality was improved using the SDAE as the errors were significantly reduced after SDAE  16 Geofluids activation.The SDAE helped any ES-MDA algorithm reduce μ ε and σ ε .For example, μ ε for the matched gas rate obtained using ES-MDA without and with the SDAE were 20.98% and 5.17%, respectively.σ ε was also reduced from 40.23% to 8.27%.Furthermore, the average values of μ ε and σ ε for the ES-MDA algorithms were decreased for all data types after coupling the SDAE.For the history matching, the average error values of gas rate, water rate, and BHP were improved from 13.44% to 4.95%, from 76.85% to 37.65%, and from 21.04% to 13.14%, respectively.For the prediction, the average error values went from 35.29% to 5.39%, from 6.29% to 6.78%, and from 53.03% to 18.18%, respectively.The ES-MDA-DCT yielded ε values greater than 100% for the prediction, which indicated the degradation of the ensemble.This phenomenon claims any less-or unoptimized feature extraction might cause the deterioration of the ensemble quality.
3.5.Computational Costs for the Denoising Process.Table 9 summarizes the computational cost required for executing the transformation and denoising methods embedded in the ES-MDA.The specification of the computer used in this study was Intel (R) Core (TM) i7-8700 CPU at 3.2 GHz with 16 GB RAM.The cost for reservoir simulation was excluded from Table 9 because each ES-MDA algorithm expended the same cost for the ensemble update.The total number of reservoir simulation runs was 400, which was the product of the ensemble size (100) and the number of assimilations

Conclusions
The SDAE was implemented as the postprocessor of ES-MDA for enhancing the preservation of geological plausibility during ensemble-based history matching.The SDAE is composed of SPN and GN filters facilitating and purifying the updated reservoir models resulting from the smoothing effects.The denoising effects were investigated by comparing the results of the five ES-MDA algorithms coupled with the SDAE and those uncoupled.The application was to history matching of the channelized gas reservoir.The results obtained using ten different algorithms showed the performance difference between the cases with and without the 19 Geofluids SDAE in terms of production data matching and geological plausibility.The SDAE showed excellent accuracy in history matching and prediction for gas rate, water rate, and BHP.Executing the SDAE decreased the average matching error by 75% in the ES-MDA results.The SDAE was also efficient for improving the performance of the ES-MDA algorithms combined with the data transformation methods.The improvement in the matching and prediction accuracy of dynamic data resulted from the conservation of geological plausibility achieved using the SDAE.Consequently, the purified models followed the discrete bimodal distribution for mimicking the channelized reservoir while maintaining channel width and connectivity consistently.These results highlight the potential of the machine learning-based noise remover as an efficient auxiliary method that enhances the performance of ensemble-based history matching if the proxy is designed properly at affordable computational cost.

Figure 2 :
Figure 2: Schematic diagram of manifold learning using a denoising autoencoder (DAE) for geomodels: (a) generating the original models as training outputs; (b) noising the original models stochastically as training inputs; (c) learning of purification from the noise models to the original models; (d) building of a manifold dimension; (e) applying DAE to the noise models obtained from ES-MDA for obtaining the purified models; (f) delivering the purified models to ES-MDA.
Figure 5(b) is a trained DAE with GN.The two neural networks are herein called the SPN filter and the GN filter.Orange circles are original data.Red circles correspond to data noise with SPN or GN.Dark and light green diamonds indicate encoded coefficients in the hidden layers of SPN and GN filters, respectively.Peach colored circles indicate reconstructed data.m, m, and m represent the original, noise, and reconstructed models, respectively.In Figure 5(c), m is imported to the input layer of the SDAE.The quality of m is improved (in terms of geological plausibility such as channel pattern and connectivity) using the SPN and GN filters.It is expected that the output of the SDAE mGN will be similar to the corresponding original model m during training the SDAE.During data assimilation, mGN becomes m b

Figure 3 :
Figure 3: Three realizations of the (a) original, (b) salt and pepper noise, and (c) Gaussian noise reservoir models.

Figure 4 :
Figure 4: (a) A reservoir realization updated using ES-MDA.(b) A purified model using a salt and pepper noise (SPN) filter.(c) A further purified model using a Gaussian noise (GN) filter on (b).

Figure 5 :
Figure 5: (a) A denoising autoencoder eliminating salt and pepper noise.(b) A denoising autoencoder eliminating Gaussian noise.(c) A serial denoising autoencoder with the original, noise, and purified reservoir models.
)) which consists of a variety of channel reservoir realizations generated by SNESIM (Figure7(b)) with a given training image (Figure 7(a)).Herein, N para is the number of parameters in each reservoir realization and N lib is the number of reservoir realizations in Y.In Figure 7(c), N para equals N grid .Applying OMP and K-SVD decomposes Y into D and X (Figure 7(d)).Strictly speaking, the multiplication of D and X produces Y ′ , which is the reconstructed Y (see the right-hand side of Figure 7(d)).D is a N grid by N dict matrix and X is a N dict by N lib matrix.D and Y ′ are visualized in Figures 8(a) and 8(b), respectively.The above procedure to construct D and X can be carried out in the DCT domain as well if each reservoir realization is transformed into DCT coefficients (Figure 7(e)).For this modified sparse coding, Figure 7(f) shows that applying K-SVD and OMP builds D DCT (Figure 8(c)) and Y DCT ′ (Figure 8(d)) of which the dimensions are N DCT by N dict and N dict by N lib , respectively.It appears that both procedures sufficiently capture the channel connectivity and pattern of the original realizations (compare Figures 8(b) and 8(d) with Figure 7(b)).Also, N DCT is typically smaller than N grid for dimensionality reduction.For this reason, the

Figure 6 :
Figure 6: Example of discrete cosine transform (DCT) and inverse DCT (IDCT) applied to the reproduction of shale and sand facies of a channelized reservoir.

Figure 7 :
Figure 7: Construction of sparse geologic dictionaries using K-SVD without and with DCT: (a) training image, (b) generation of the initial channel models (Y) using SNESIM, (c) organization of the models in a matrix, (e) transformation of Y into DCT coefficients, and construction of D and X from Y using K-SVD in (d) facies domain and (f) in DCT domain.

Figure 8 :
Figure 8: Reservoir realizations of dictionaries and reconstructed libraries from Figure 7. (a, b) D and Y′ of Figure 7(d), respectively.(c, d) D DCT and Y DCT ′ of Figure 7(f), respectively.

Figure 9 :
Figure 9: Flowchart of ten ES-MDA algorithms conducted in this study.

Figure 10 :
Figure 10: Training images and the reference model used for history matching.P1 to P16 indicate well numbers.

Figure 12 :
Figure 12: Profiles of BHP at the four wells (P1, P4, P9, and P15) obtained by executing the ten ES-MDA algorithms.The latter five algorithms from (f) to (j) are the algorithms coupled with SDAE.

Figure 13 :
Figure 13: Profiles of water production rate at the four wells (P1, P4, P9, and P15) obtained by executing the ten ES-MDA algorithms.The latter five algorithms from (f) to (j) are the algorithms coupled with SDAE.

( 4 )
. The machine learning-based methods were more expensive than the transformation methods.It took a few seconds for the activation of the DCT to transform and reconstruct reservoir facies in each assimilation.It took approximately 3.6 hours for the K-SVD to construct the library and weight matrices, as addressed in[23].In contrast, the ES-MDA-DC-T-i-K-SVD needed only about one-ninth of the time for the ES-MDA-K-SVD due to the dimension reduction of reservoir parameters by DCT.It took approximately 8.7 hours to train the SAE.The SDAE was the most computationally expensive method.It took 17.9 hours to train the two filters: 7.2 hours for the SPN filter and 10.7 hours for the GN filter.Nonetheless, the overall results addressed in this study highlight the efficacy of the proposed SDAE as a supplementary tool to the data assimilation algorithm for improving the ensemble quality.Once developed, the noise remover could be easily applied to other algorithms without additional learnings.It is also anticipated that the development of computer hardware will enhance the efficacy of soft computing for big data machine learning.

Figure 14 :
Figure 14: Updated facies distribution results of the first ensemble member in every assimilation by the ES-MDA with SDEA.

Figure 16 :
Figure 16: Error histograms for facies distribution of the final ensembles obtained using the ten ES-MDA algorithms.

Figure 15 :
Figure 15: Permeability distributions of the ensemble mean maps obtained using the ten ES-MDA algorithms.
1, orange and peach circles indicate original and reconstructed data.Dark blue diamonds and purple squares represent encoded and double-encoded coefficients, respectively.Light blue diamonds are reconstructed double-encoded coefficients.If an original reservoir model m is imported to the input layer, the encoded model h is as follows:

Table 1 :
Comparison of state vectors and postprocesses after each assimilation for ten ES-MDA algorithms investigated in this study.

Table 2 :
Reservoir properties of the channelized gas reservoir.

Table 3 :
Experimental settings for reservoir simulation.

Table 4 :
Number of parameters used for the ten ES-MDA algorithms.

Table 5 :
Experimental settings for the SDAE.

Table 6 :
Experimental settings for the SAE.

Table 7 :
Statistical parameters of history matching errors for gas rate, water rate, and BHP (only for the wells installed in the sand formation).μ ε and σ ε refer to the mean and standard deviation of errors, respectively.

Table 8 :
Statistical parameters of prediction errors for gas rate, water rate, and BHP (only for the wells installed in the sand formation).μ ε and σ ε refer to the mean and standard deviation of errors, respectively.

Table 9 :
Comparison of the computational cost required for executing an auxiliary module embedded in ES-MDA.
AE • : Encoder function of autoencoder f DAE • : Encoder function of denoising autoencoder g AE • : Decoder function of autoencoder g DAE • : Decoder function of denoising autoencoder h: lib : Number of reservoir models in the library matrix Y N node : Number of nodes in a hidden layer N train : Number of reservoir models used for AE training N para : Number of parameters in a reservoir model w:Weight coefficients of a node in a hidden layer W dec :Weight matrix of the neural network in a decoder