Pore Space Reconstruction of Shale Using Improved Variational Autoencoders

Pore space reconstruction is of great significance to some fields such as the study of seepage mechanisms in porous media and reservoir engineering. Shale oil and shale gas, as unconventional petroleum resources with abundant reserves in the whole world, attract extensive attention and have a rapid increase in production. Shale is a type of complex porous medium with evident fluctuations in various mineral compositions, dense structure, and low hardness, leading to a big challenge for the characterization and acquisition of the internal shale structure. Numerical reconstruction technology can achieve the purpose of studying the engineering problems and physical problems through numerical calculation and image display methods, which also can be used to reconstruct a pore structure similar to the real pore spaces through numerical simulation and have the advantages of low cost and good reusability, casting light on the characterization of the internal structure of shale. The recent branch of deep learning, variational auto-encoders (VAEs), has good capabilities of extracting characteristics for reconstructing similar images with the training image (TI). The theory of Fisher information can help to balance the encoder and decoder of VAE in information control. Therefore, this paper proposes an improved VAE to reconstruct shale based on VAE and Fisher information, using a real 3D shale image as a TI, and saves the parameters of neural networks to describe the probability distribution. Compared to some traditional methods, although this proposed method is slower in the first reconstruction, it is much faster in the subsequent reconstructions due to the reuse of the parameters. The proposed method also has advantages in terms of reconstruction quality over the original VAE. The findings of this study can help for better understanding of the seepage mechanisms in shale and the exploration of the shale gas industry.


Introduction
The demand for oil and gas resources is soaring in the whole world with the fast development of economy. Faced with huge energy demands, the world's conventional oil and gas production is relatively insufficient, so people begin to pay more attention to unconventional oil and gas resources. Shale oil and shale gas are two kinds of unconventional resources with wide distributions and considerable potentials. However, because of the complex geological conditions of shale reservoirs, it is necessary to figure out the internal structures including the distribution of pore space in shale before any large-scale engineering exploitation [1][2][3].
Shale is a typical porous medium with low porosity, low permeability, and anisotropy, making the exploration and exploitation of shale oil and gas difficult. Some macroscopic properties of shale (such as porosity, permeability, etc.) depend on its microstructures, so it is quite useful to reconstruct a 3D microscopic shale model describing the statistical and topological properties of the pore structure. An important step for building such a model is to reconstruct the internal microstructure of pore space with the characteristics of real shale. The general pore space reconstruction methods are divided into physical experimental methods and numerical reconstruction methods [4][5][6][7][8][9][10].
Commonly used physical experimental methods are the scanning electron microscope (SEM) [4], the focused ion beam SEM (FIB-SEM) [5], the CT-scanning method [6], and the atomic force microscope SEM (AFM-SEM) [7]. With the help of the above experimental methods and physical equipment, some important parameters and variation characteristics such as the variation in permeability during CO 2 -CH 4 displacement in coal seams [11] can be effectively obtained and studied. Generally speaking, these physical methods normally are fast and their imaging quality mostly is quite good, but the imaging costs are usually quite high and experimental samples are difficult to prepare due to the fragile structure of some porous media like shale, leading to the limitation of physical experimental methods for wide application [12].
Some traditional numerical reconstruction methods extract the statistical information (such as porosity and variogram) of the 2D pore structures as constraints to complete reconstruction. For example, the early reconstruction methods such as the simulated annealing method [8], the process-based method [9], and the sequential indicator simulation method [10] use the low-order statistical information for 3D reconstruction. Then, some methods such as the multiple-point statistics (MPS) [13] used for the reconstruction of porous media focus on the reconstruction through the high-order statistical information describing the correlation between multiple points of the pore structures. It is quite obvious that the methods like MPS using high-order information have the advantages in depicting complex pores and the reconstructions are more similar to the realistic pore structures. However, this also makes it a big challenge for the branches of MPS and some improved MPS methods (e.g. direct sampling (DS) [14], filter-based simulation (FIL-TERSIM) [15], and single normal equation simulation (SNE-SIM) [13]) to perform reconstruction due to more hardware burdens for CPU and memory as well as a lengthy computational time.
Recently, many engineering and scientific researchers have benefited from the fast development of deep learning thanks to its robust extraction capabilities for characteristics, leading to the possibility of deep learning extracting the structural characteristics of porous media for pore space reconstruction [16]. Compared to the traditional reconstruction methods, there are two important advantages introduced by deep learning and its branches. The first one is the training time which can be greatly reduced due to the acceleration of GPU (Graphics Processing Unit) and the reuse of model parameters. Because of the frameworks (e.g., Tensorflow and PyTorch) provided by the public deep learning communities, these deep learning methods can be used by common users even without knowing the complicated assignment and design of parallelization for GPU. The detailed work about GPU is performed using the APIs of the frameworks like the black-box mechanism, which makes it easier for the general public. The second advantage relies on the strong ability of deep learning extracting characteristics from training images (TIs), improving the similarity between reconstructions and the TIs to obtain higher reconstruction quality.
As one of the important technologies in deep learning, the autoencoder is an unsupervised learning algorithm whose output can realize the learning and represent of the characteristics of input data. The concept of autoencoder was first put forward in [17]. Hinton et al. [18] derived a fast greedy algorithm based on the autoencoder by using the complementary prior method to generate the deep autoencoder. Vincent et al. [19] added noise to the input data to improve the robustness of the algorithm to form a denoising autoencoder.
Based on variational bayes (VB) inference, the variational autoencoder (VAE) model [20] is a deep latent generation model and widely used in image generation, but it often suffers from the complicated expression model leading to the low quality of generated images. In information theory, Fisher information has always been an important tool to describe information behavior in information systems [21]. Fisher introduced Fisher information in the context of statistical estimation, which can describe the behavior of the dynamic system accurately [22]. Recent studies have reported that VAE is difficult to balance the information control between the encoder and the decoder in its structure [23]. For example, when the decoder is too expressive, latent variables generated by the encoder are almost ignored. Therefore, this paper applies the information theory specifically the Fisher information to improve VAE for the reconstruction of shale.
Compared to the traditional methods, our method can reuse the parameters obtained in the first training or reconstruction, so the subsequent reconstructions can be largely accelerated without the repeatedly training process existing in the traditional methods, which is a practical contribution for a large-quantity reconstruction mission. In addition, the combination with Fisher information also helps to improve the reconstruction quality of pore space compared to VAE.
The sketch of the research technical route of this paper is shown in Figure 1. The remainder of this paper is organized below. The main idea of the proposed method is given in Section 2. Section 3 gives a detailed procedure of the proposed method in this paper. Section 4 gives experimental results and comparison with some other methods. Section 5 provides conclusions.

The Main Idea of the Proposed Method
2.1. The Introduction of VAE. In VAE, two models of probability density distribution are established using two neural networks: one is called the encoder, used for the variational inference of original input data to generate the variational probability distribution of the latent variable z; the other is the decoder, reconstructing the data by generating the approximate probability distribution of the original data according to the variational probability distribution of the generated latent variable z.
The typical structure of VAE is shown in Figure 2, in which "+" and " * ," respectively, represent the addition and multiplication of elements; X is the input dataset and X ′ is the reconstructed result. X usually is quite highdimensional and used as the training data, whose internal characteristics are learned by the encoder. VAE uses stochastic gradient descent (SGD) to optimize the loss function [20]. SGD is a simple and effective optimization method that updates parameters until losses are within acceptable limits. Although SGD can handle random input, it cannot handle 2 Geofluids random operation. Hence, VAE needs to "reparameterized" for optimization by introducing an auxiliary parameter ε, which is obtained by sampling from the standard normal distribution Nð0, 1Þ. The latent variable z can be expressed by Equation (1) [24]: where σ and μ are, respectively, the standard deviation and mean of Gaussian distribution calculated from the encoder. Suppose x is the input vector and x′ is the output vector. The encoder model q φ ðz | xÞ parameterized by φ is introduced into the encoder network to replace the undetermined real posterior distribution P θ ðz | xÞ since the distribution of z is unknowable. Kullback-Leibler (KL) divergence [25] is used to measure the similarity between q φ ðz | xÞ and P θ ðz | xÞ. The constraint parameters φ and θ are repeatedly trained and optimized to minimize the KL divergence, meaning that the evidence lower bound (ELBO) function Lðθ, φ ; XÞ is maximized, which is defined as follows: where D KL represents the KL divergence, lb is a base-2 logarithm, P θ ðx ′ | zÞ is the probability distribution of a θ -parameterized decoder, and P θ ðzÞ represents the probability distribution of the latent variable z. Finally, z is input into the decoder model to obtain the final reconstruction X ′. It seems Equation (2) has a penalty item (KL divergence) and a reconstruction item.

Fisher Information and Shannon Entropy.
As can be seen from the Lðθ, φ ; XÞ in Equation (2), only the KL divergence is considered a penalty item for regularization, so it is difficult to balance between the representation of Reconstruct the pore space of shale based on the proposed network.
Estimating the reconstruction quality by comparing with other methods.
A 3D shale cube is used as a TI for the training of our proposed network.
Comparison of porosity.

Comparison of variogram.
Comparison of permeability.
Comparison of pore distribution.

Comparison of reconstruction time
and CPU/GPU/memory usage. 3 Geofluids the latent variable z and the likelihood maximization of the reconstructed results [23]. Since the representation of the VAE encoder network is controlled by the divergence error but there is no control exerted on the VAE decoder network, some schemas based on information theory were used to solve this problem; e.g., a constraint item based on Shannon entropy is introduced in ELBO to ensure that latent variables can learn sufficient data characteristics, thus improving the quality of representation learning [26]. However, Shannon information is usually difficult to be processed in calculation, and only approximate substitutes can be obtained in previous studies.
For a continuous probability distribution function (PDF), Shannon entropy is a measure of "global characteristics," which is less sensitive to changes in the distribution of local details of a small area. Fisher information is quite different from Shannon entropy [21]. The former is a measure of the gradient content of the distribution, so it is also quite sensitive to small local details. Let the Shannon entropy and Fisher information of input data X be HðXÞ and JðXÞ, respectively. Shannon entropy is usually converted into the form of entropy weight [22]: where exp represents the exponential function with base e. When X represents a random vector and Fisher information JðXÞ represents a Fisher information matrix, the relation between NðXÞ and the trace of JðXÞ writes [22,27] where K is a constant and trð•Þ represents the trace of a matrix. As shown in Equation (4), Fisher information and Shannon entropy are intrinsically related and have the nature of uncertainty, in which the higher Fisher information is, the lower Shannon entropy is [27], but the calculation of Fisher information is usually easier than Shannon entropy. Hence, Fisher information is considered to be complementary for Shannon entropy and used in this paper.
Compared with Lðθ, φ ; XÞ shown in Equation (2), Equation (5) adds two more penalty items (PI-1 and PI-2) that adjust the encoder and decoder, respectively: PI-1 controls Fisher information of the encoder network, and PI-2 controls Fisher information of the decoder network. λ z and λ x ′ are adjustment coefficients. Both F x ′ and F z are positive constants, representing the expected Fisher information values in the decoder and the encoder, respectively, meaning that Fisher information in the decoder and encoder can be controlled by F x ′ and F z . The larger F x ′ and F z are, the more likely the distribution is estimated by the θand φ-parameterized model; otherwise, the smaller F x ′ and F z show that Input the TI of real shale and initialize all parameters.
Use SGD to update the parameters.
The network parameters are saved and output the reconstructed images.
Calculate the latent result z by , and random sampling from N (0,1).
Obtain the probability distribution and fisher information of training data to optimize the encoder. 4 Geofluids the influence of Shannon information is enhanced. Fisher information estimation can be directly calculated according to its definition [28].
The optimization procedures of the model are discussed, respectively, for the encoder network and the decoder network in the following section. Take the encoder network as an example. If the Fisher information is only considered in the encoder network, set λ x ′ = 0 in Equation (5). Meanwhile, the KL divergence is considered, but the reconstruction item in Equation (5) is not considered. The ELBO of the encoder network, denoted as L e ðθ , φ ; XÞ, contains the error item of KL divergence and information error item: where the posterior distribution q φ ðz | xÞ obeys the normal distribution and P θ ðzÞ obeys the standard normal distribu-tion. Therefore, the KL divergence in Equation (6) is calculated as [20] follows: According to the definition, trðJðzÞÞ in Equation (6) can be obtained [25]: Equations (7) and (8) are substituted into Equation (6), and then L e ðθ, φ ; XÞ writes

Geofluids
Similarly, Fisher information is used in ELBO at the decoder network, denoted as L d ðθ, φ ; XÞ. Now, the reconstruction item in Equation (5) is considered, but the KL divergence is not considered. Then, set λ z = 0 in Equation (5), and L d ðθ, φ ; XÞ writes As shown in Equation (2), the original VAE only considers KL divergence and a reconstruction item, while IVAE includes a reconstruction item and three penalty items (KL divergence, PI-1, and PI-2) related to the Fisher information and KL divergence, as shown in Equation (5). IVAE balances both the likelihood estimation and the dependence between input data and latent variables through the new penalty items, making the new model include KL divergence and Fisher information together so as to improve the reconstruction quality.

The Procedure of the Proposed Method
The steps of the proposed method are as follows: Step 1. Input the TI of real shale and initialize all parameters.
Step 2. Use SGD to update the parameters.
Step 3. Use the designed deep neural network to iteratively fit the training data, and then the probability distribution and information content of the training data are obtained. Optimize the encoder network according to Equation (6).
Step 4. The random sampling results from Nð0, 1Þ as well as σ and μ obtained from the encoder network are substituted into Equation (1) to calculate the latent result z : Once the training error reaches the thresholds, the network parameters of the encoder are saved.
Step 5. Take the latent result z as the input of the decoder network, which is iteratively optimized according to Equation (10) to obtain z. The final decoding result from decoder is the final reconstruction result, and the network parameters of the decoder are saved.
The flowchart of the above procedures is shown in Figure 3.

Experimental Results and Analyses
In this section, the experiments were implemented with an Intel Core i7-9700k 4.1 GHz CPU, 16GB memory, and GeForce

Geofluids
RTX2070s GPU with 8GB video RAM to evaluate the effectiveness of IVAE in the pore space reconstruction of shale by the metrics of porosity, variogram curves, permeability, distribution of pores, and CPU/GPU/memory performance. The experimental software framework for IVAE is Tensorflow-GPU.

Training Data.
To estimate the effectiveness of the proposed method in the pore space reconstruction of shale, the real shale volume data with the resolution of 64 nanometers obtained by nano-CT scanning were used as the test data for the following tests. Figure 4 shows three cross-sections (64 × 64 pixels) of the shale image with two facies: grain and pore space.
Note that in 2D conditions the basic image unit is pixel while in 3D conditions the unit of 3D images is voxel. Before applying any reconstruction methods, a 3D cube with 64 × 64 × 64 voxels was used as the TI, extracted from the original shale volume data. Figures 5(a)-5(c) are the exterior (64 × 64 × 64 voxels), cross-sections (X = 32, Y = 32, and Z = 32) and pore space of the TI (porosity = 0:2690).

Reconstructions
Using IVAE, VAE, SNESIM, and DS. In the section, IVAE, VAE, and two typical reconstruction methods-SNESIM and DS-were, respectively, used to reconstruct shale for comparison. The main parameters of IVAE are initialized as follows: learning rate is 0.001, the number of training epochs is 4000, batch size is 64, λ z = λ x ′ = 0:1, and F x ′ = F z = 5. Figure 6 is the reconstruction by IVAE. Figures 7-9 are, respectively, the reconstructed images using VAE, SNESIM, and DS. All the reconstructed images are 64 × 64 × 64 voxels. It can be found that the results of the four reconstruction methods all have similar longconnected pore spaces with the TI.

Comparison of Porosity.
Porosity of shale indicates the shale's ability to store fluids and is one of the characteristics for evaluating reconstruction quality. The definition of porosity ϕ is as follows: where V is the total volume and V p is the pore volume. The porosities of the TI and the images shown in Figures 6-9 are displayed in Table 1. It is seen that the porosity of the IVAE-reconstructed image is closest to that of the TI. To obtain an average performance in porosity, IVAE, VAE, SNESIM, and DS were performed for another 20 times to, respectively, achieve 20 reconstructed images (64 × 64 × 64 voxels). As shown in Table 2, the porosity of 3D images reconstructed by IVAE is closest to that of the TI, and the variance is smallest, indicating a high reconstruction quality and low fluctuation.

Comparison of Variogram.
Variogram is widely used to evaluate the variability of spatial structures in different direction [29], which is defined as follows: where E is the mathematical expectation, h is the lag distance between two positions: x + h and x, and ZðxÞ means the property value at the position x. If the variogram curves of two geological bodies are similar in a certain direction, it means that the spatial structures of the two geological bodies are similar in this direction; otherwise, it means that their structures are quite different. In this section, the variogram curves of TI and average variograms of 20 reconstructed images by IVAE, SNESIM, VAE, and DS were, respectively, calculated in the three directions of X, Y, and Z, as shown in Figure 10. In general, the variogram curves of IVAE are closest to those of the TI.
In order to quantitatively measure the difference of variogram curves between the TI and the reconstruction of each method, a difference degree (DD) is defined as follows:    where TI and re, respectively, represent the TI and the reconstruction method and X h and x h represent the variogram value of the TI and the reconstructed image when the distance is h. A smaller DD means a smaller difference between the TI and the reconstruction. DDs between the TI and the reconstructions of IVAE, SNESIM, VAE, and DS measured by variogram differences in the X, Y, and Z directions are shown in Table 3. It seems that reconstructed images by IVAE are closest to the TI.  [29,30]. The data of the TI and the reconstructed images were, respectively, used as the input files of LBM simulation to calculate the permeabilities of those models with the size of 64 × 64 × 64 voxels. As shown in Table 4, the permeability of the TI and the average permeabilities of 20 reconstructed images using IVAE, SNESIM, VAE, and DS in three directions were computed by LBM. The permeability values of the reconstructed images using IVAE are quite close to those of the TI, displaying the good reconstruction quality of IVAE.

Distribution and Numbers of Pores.
The diameter of a pore is defined as follows: where V is the volume of a pore. The 3D shale images reconstructed by IVAE, SNESIM, VAE, and DS are imported into the software Avizo [31] to analyze the number of pores and the pore size. Table 5 shows the pore number in the TI and the average pore numbers in the reconstructions of these methods. Table 6 shows the maximum, minimum and average pore diameters in the TI and the reconstructions of IVAE, SNESIM, VAE, and DS. From Tables 5 and 6, it is seen that IVAE has shown the best reconstruction quality since the reconstruction of IVAE has the most similar pore number and average pore diameter with those of the TI.

4.7.
Comparison of Reconstruction Time and CPU/GPU/Memory Usage. Since the pore space reconstruction of shale is quite CPU-intensive and normally uses large memory, the reconstruction time and CPU/GPU/memory usage of IVAE, SNESIM, VAE, and DS are compared in this section. As for the computational hardware platforms, these four methods are different. SNESIM and DS are typical CPU-based reconstruction methods, while IVAE and VAE can be performed by both CPU and GPU. Table 7 shows the time consumption and the average usage of CPU, GPU, and memory by IVAE, SNESIM, VAE, and DS for 20 reconstructions.
In Table 7, the time for 20 reconstructions is divided into two parts: the time for the first reconstruction and the time for the other 19 reconstructions. The former means the time used for the first reconstruction of the traditional reconstruction methods (SNESIM and DS) and deep learning methods (VAE and IVAE). The latter is the time consumed for the other 19 reconstructions by IVAE, SNESIM, VAE, and DS.
Deep learning methods normally benefit from the ability of saving model parameters during training. Therefore, the reconstruction process after the first model training that is finished in the first reconstruction only needs to reuse the saved parameters and takes much less time to reconstruct images. Hence, as typical deep learning methods, VAE and IVAE require much longer training time to set up the training model before reconstruction, usually consuming more time in the first reconstruction than the time used for the subsequent reconstruction. On the contrary, SNESIM and DS need to rescan the TI from scratch for each reconstruction because they only store training modes in memory. When a reconstruction process is over, training data of SNE-SIM and DS in memory will be cleared, resulting in the repeated processes of scanning the TI and extracting characteristics from the TI in the subsequent reconstruction.
Since VAE and IVAE can save and reuse the parameters after the first reconstruction, each subsequent reconstruction only needs to input the parameters and complete the reconstruction very soon. Hence, as shown in Table 7, although VAE and IVAE spend more time on the first reconstruction than SNESIM, the time for the subsequent reconstructions decreases largely. From an overall point of view, VAE and IVAE have great advantages in the speed of multiple reconstructions over the other CPU-based reconstruction 9 Geofluids methods. As for the overall performance of time consumption and average use of CPU/GPU/memory, VAE and IVAE are evenly matched and each one has its own strengths.

Summary and Conclusions
The properties of shale such as low porosity, low permeability, and complex inner structures are the main reasons challenging the exploitation of shale reservoirs. The establishment of a 3D pore space model of shale can help to analyze the characteristics of shale, improving the producing efficiency of shale resources. Main conclusions, including the advantages and disadvantages (or limitations) of our method, are as follows: (1) A real 3D shale cube acquired by nano-CT scanning was used as the TI, providing the necessary real structural information of pore space for pore space reconstruction, so the reconstructed structures have similar structures with the real shale (2) Traditional numerical reconstruction methods need to repeatedly scan the TI to extract the statistical information in reconstruction, leading to consuming more time in repeated reconstructions. The proposed method shows the advantage in shale reconstruction in terms of both speed and quality by reusing the saved models after the first reconstruction or training (3) Our method combines VAE and Fisher information together for the pore space reconstruction of shale, using new penalty terms related to Fisher information in the encoder and the decoder, respectively, so our method performs better than the original VAE in the reconstruction quality, also verified by the real experiments (4) Our method still has some disadvantages or limitations. First, compared to traditional reconstruction methods, our method has higher CPU/memory usage and consumes much more time in the first reconstruction. Second, our method is established on the framework of Tensorflow-GPU, so a GPU is necessary for our method, increasing the hardware cost of reconstruction Nomenclature VAE: Variational autoencoder TI: Training image SEM: Scanning electron microscope MPS: Multiple-point statistics DS: Direct sampling FILTERSIM: Filter-based simulation SNESIM: Single normal equation simulation GPU: Graphics Processing Unit z: Latent variable X: Input dataset X ′ : Reconstructed result SGD: Stochastic gradient descent ε: Auxiliary parameter σ: Standard deviation of Gaussian distribution μ: Mean of Gaussian distribution x: Input vector x ′ : Output vector q φ ðz | xÞ: Encoder model P θ ðz | xÞ: Real posterior distribution φ: Constraint parameter θ: Constraint parameter Lðθ, φ ; XÞ: Evidence lower bound function D KL : KL divergence lb: Base-2 logarithm P θ ðx′ | zÞ: Probability distribution of a θ-parameterized decoder P θ ðzÞ: Probability distribution of the latent variable z ELBO: Evidence lower bound PDF: Probability distribution function HðXÞ: Shannon entropy of input data X JðXÞ: Fisher information of input data X exp: Exponential function with base e K: A constant trðÞ: The trace of a matrix IVAE: Information variational autoencoder PI-1: Penalty item PI-2: Penalty item λ z : Adjustment coefficient The expected Fisher information value in the decoder F z : The expected Fisher information value in the encoder L e ðθ, φ ; XÞ: Evidence lower bound of the encoder network L d ðθ, φ ; XÞ: Evidence lower bound of the decoder network KL: Kullback-Leibler ϕ: Porosity V: Total volume V p : Pore volume E: Mathematical expectation h: Lag distance ZðxÞ: Property value at the location x DD: Difference degree re: Reconstruction method X h : The variogram value of the TI when the distance is h x h : The variogram value of the reconstructed image when the distance is h LBM: Lattice Boltzmann Method AFM: Atomic force microscope.

Data Availability
The data used to support this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflict of interest.