Reconstruction of Three-Dimensional Porous Media Using Deep Transfer Learning

The reconstruction of porous media is widely used in the study of fluid flows and engineering sciences. Some traditional reconstruction methods for porous media extract features from real natural porous media and copy them to realize reconstructions. Currently, as one of the important branches of machine learning, the deep transfer learning (DTL) method has shown good performance in extracting features and transferring them to predicted objects, so it can be used for the reconstruction of porous media. Hence, a method for reconstructing porous media is presented that applies DTL to extract features from a training image (TI) of porous media, replacing the process of scanning a TI for different patterns as in multiple-point statistical methods. A deep neural network is used to extract the complex features from the TI of porous media, and then a reconstructed result is obtained by transfer learning through copying these features. The proposed method was evaluated on shale and sandstone samples by comparing multiple-point connectivity functions, variogram curves, permeability, porosity, etc. The experimental results show that the proposed method achieves high efficiency while preserving features similar to those of the target image, shortening the reconstruction time, and reducing the burden on the CPU.


Introduction
The reconstruction of porous media plays a key role in many engineering disciplines. A model of porous media can be used to quantitatively study the effects of various microscopic factors (e.g., pore structures, wettability, and aqueous films) on the macroscopic properties of oil and gas reservoirs, showing its great significance for the study of the seepage mechanisms of oil and gas [1][2][3].
Different approaches are used to model the internal structures and features of porous media such as geological facies and petrophysical properties. Some typical physical experimental methods, including the Serial Sections Tomography Method (SSTM) and the X-ray Computer Tomography Scanning Method (XRCTSM) [4][5][6][7], use experimental instruments to scan a sample of porous media to obtain a large number of two-dimensional cross-section images and then superimpose these images with a modeling program or software to form three-dimensional digital porous media. The SSTM is quite time-consuming, and the connectivity between slices is sometimes unsatisfactory. The XRCTSM has the advantage of being fast and precise, but the experiments are very expensive. Generally speaking, physical methods (e.g., SSTM and XRCTSM) can obtain high-resolution real images of porous media but are constrained by the high cost of equipment or experimental difficulties.
Different from physical experimental methods, numerical reconstruction methods such as the cross-correlation-based simulation (CCSIM) [8], the Sequential Indicator Simulation Method (SISM) [9], and the process-based method [10] are often based on a small number of two- or three-dimensional real images and reconstruct the three-dimensional porous media by stochastic simulation or sedimentary process simulation. Compared with physical experimental methods, numerical methods are cost-effective and can reconstruct many kinds of porous media, but some of them are still quite time-consuming and cannot do well in the reconstruction of porous media with complex inner structures like shale [3].
Multiple-point statistics (MPS) [11,12] is also considered a typical numerical method for the reconstruction of porous media, calculating the conditional cumulative probability function by replacing the variogram with training images (TIs) in a sequential simulation process. Typical MPS implementations include single normal equation simulation (SNESIM) [12], pattern-based simulation (SIMPAT) [13], filter-based simulation (FILTERSIM) [14], distance-based pattern simulation (DISPAT) [15], and direct sampling [16,17]. However, the whole simulation is quite slow and memory-demanding: the probabilistic information from the TIs is stored in memory instead of a file, so it must be rescanned for every new simulation, leading to a great waste of both time and hardware resources.
Currently, it is possible to find new solutions in flourishing machine learning, especially deep learning and transfer learning, to achieve better reconstruction of porous media in terms of efficiency and quality. Deep learning is an algorithm that extracts complex features by performing multiple nonlinear transformations on data through multilayer neural networks [18]. Neural networks originated in the 1950s and were called perceptrons at that time. A basic neural network has three types of layers: an input layer, an output layer, and a hidden layer. The input feature vectors connect to the output layer through the hidden layer, and the result is obtained at the output layer. The single-layer perceptron is based on a linear combination of the input vector, and the result is calculated by a nonlinear function. As shown in Figure 1, the first layer of a perceptron is a linear combination of the input vector x = (x_1, x_2, ⋯, x_k) with the weights w = (w_1, w_2, ⋯, w_k) and the bias b. The output a can be used as the input of the next-level network after a nonlinear transformation by an activation function f(·). If multiple single-layer perceptrons are connected as multiple layers, a fully connected network called an artificial neural network (ANN) is obtained.
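As a concrete illustration, the perceptron in Figure 1 computes a = f(w·x + b). A minimal numpy sketch (the sigmoid activation and the numeric values here are illustrative choices, not from the paper):

```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: a linear combination of the inputs followed by
    a nonlinear activation f(.) (here the sigmoid)."""
    y = np.dot(w, x) + b                 # linear combination w.x + b
    return 1.0 / (1.0 + np.exp(-y))     # sigmoid activation

x = np.array([0.5, -1.0, 2.0])  # input feature vector (x1, x2, x3)
w = np.array([0.2, 0.4, 0.1])   # weights (w1, w2, w3)
b = 0.05                        # bias
a = perceptron(x, w, b)         # output a, fed to the next-level network
```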
In 2006, Hinton and Salakhutdinov [18] proposed an effective way of using the restricted Boltzmann machines (RBMs) to learn features from datasets, establishing the framework of deep learning. As an algorithm based on learning data and characterizing inner features, deep learning uses many simple nonlinear features to transform raw data into more complex, higher-level, and more abstract representations [19]. Compared with the traditional ANN, deep learning has more hierarchical layers, so it has a stronger ability to abstract complex features.
Deep learning has also developed into a number of related branches, such as Convolutional Neural Networks (CNNs) [20,21] and Generative Adversarial Networks (GANs) [22]. A CNN is a multilayer perceptron specially designed to recognize two-dimensional shapes. It uses spatial relationships to reduce the number of parameters that need to be learned to improve the general training performance of the back propagation (BP) algorithm. A GAN trains two deep networks (generators and discriminators) and then lets them perform adversarial learning, improving the capabilities of the generators and discriminators in continuous adversarial learning. GANs have been used for the reconstruction of porous media by using a discriminator to learn the TIs and a generator to reconstruct a new image of porous media [2].
In many machine learning algorithms, an important assumption is that the current training data and the future (test) data must be in the same feature space and have the same distribution. However, in many realistic cases, this assumption fails. Transfer learning is proposed to address this problem by using different transformations to bring the distributions of different data closer, extracting information from one or more source tasks, and then applying it to the target task. It is not necessary to retrain on the TIs for every new reconstruction or simulation when using transfer learning, so it can save much time and reduce the necessary amount of training data [23].
Many different transfer learning methods have been developed. The distribution adaptation methods, including Transfer Component Analysis (TCA) [24] and Deep Domain Confusion (DDC) [25], are typical transfer learning methods. The joint distribution adaptation method [26] improves transfer learning by reducing the distance between the joint probability distributions of the source and target domains. Deep transfer learning (DTL) [27,28] is based on transfer learning and is currently one of the most important technologies in deep learning, which can quickly transfer well-trained models from one dataset to another [29]. The network of DTL can simulate a new dataset well by borrowing the features from previously trained datasets [30].
This paper proposes a reconstruction method of porous media based on DTL, which uses deep learning to learn the features of porous media and then copies the learned features into the reconstructed results by transfer learning. Specifically, the method designs a deep learning model to learn the features of TIs and then to reconstruct porous media by transferring the features learned from TIs through a deep neural network (DNN). The reconstructed results have similar features (e.g., pore structures, connectivity, and permeability) hidden in the TIs.
There are two important tasks in the proposed method. The first one is to determine the proper parameters to learn TIs when training DNNs to achieve high accuracy. The second one is to transfer the features learned by DNNs into a new reconstruction. For the first task, the appropriate network structures (e.g., hidden layers and neural cells) and the optimization methods (e.g., the activation function and gradient descent algorithms), as well as some ways of tuning them, will be considered. The second task can be solved by improving DTL and choosing appropriate transfer conditions. The processing time of the proposed method can be largely shortened due to GPU acceleration and the optimization algorithms for deep learning, compared to traditional simulation methods such as MPS. Experiments show that DTL can reconstruct structures of porous media similar to those of the target images. Meanwhile, once the parameters for the reconstruction are determined, they are stored after training and can be quickly used for new reconstructions, displaying the effectiveness of the proposed method in reconstruction quality, speed, and memory demands.

Figure 1: The structure of a perceptron.

The Main Idea of the Proposed Method
In the proposed method, deep learning is used to learn the complex features of porous media, and then, these features are transferred into reconstruction by transfer learning, so the details for the reconstruction of porous media will be discussed in two sections: Deep Learning Phase and Transfer Learning Phase.

Deep Learning Phase.
In the training process of deep learning, the errors or distances between the output and the expectation can often be obtained by calculating an objective function. These errors are reduced by modifying the internal tunable parameters of the model, which are often referred to as weights. In a typical deep learning system, there may be millions of samples and weights for training a model. In order to adjust the weight vectors correctly, it is necessary to calculate the gradient vector of each weight and then adjust the weight vectors in the opposite direction of the gradient vector so that the overall errors are reduced to a reasonable interval. In practice, the gradient descent algorithm iterates over this process: it takes input samples, computes the outputs and errors, calculates the average gradient over these samples, and then adjusts the weights accordingly [31]. This process is repeated by continuously inputting samples to train the network until the objective function is optimal. Compared with other optimization techniques, gradient descent is faster and has stronger generalization ability [23].
A typical DNN in deep learning is shown in Figure 2, in which a neuron, expressed as "○," is the basic node unit and the neurons constitute the hidden layers. (x_1, x_2, ⋯, x_i) represents the input feature vector, and y is the predicted output value of the output layer. Excluding the input layer and the output layer, the neural network has n hidden layers, and l (= 1, 2, ⋯, n) denotes the serial number of the lth layer. Eq. (1) gives the output of each hidden layer, and Eq. (2) gives the output activated by an activation function:

z^[l] = w^[l]·a^[l−1] + b^[l], (1)

a^[l] = f(z^[l]), (2)

where z is the output of each hidden layer; a is the output of the activation function taking z as the only parameter; w and b are, respectively, the weight and the bias; and f(·) is the activation function, which has several options including the sigmoid (Eq. (3)), tanh (Eq. (4)), and ReLU (Eq. (5)):

f(z) = 1 / (1 + e^(−z)), (3)

f(z) = (e^z − e^(−z)) / (e^z + e^(−z)), (4)

f(z) = max(0, z). (5)

In the training process, as the number of layers of the neural network increases, the gradients of the training parameters become smaller. This common issue, called "gradient disappearance," prevents the parameters from changing their values and can even break the training process. The ReLU can alleviate the problem of gradient disappearance, so it is preferable to the sigmoid and tanh functions [32,33]. Hence, the ReLU is used as the activation function in the proposed method.
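The layer-by-layer forward pass described by Eqs. (1) and (2) can be sketched in numpy as follows (the toy layer sizes are illustrative):

```python
import numpy as np

def relu(z):
    """ReLU activation, which avoids the gradient-disappearance problem."""
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """Forward propagation: z[l] = w[l] a[l-1] + b[l], a[l] = f(z[l])."""
    a = x
    for w, b in zip(weights, biases):
        z = w @ a + b   # Eq. (1): linear combination of the previous layer
        a = relu(z)     # Eq. (2): nonlinear activation
    return a

rng = np.random.default_rng(0)
# a toy network with two hidden layers: 4 inputs -> 3 neurons -> 2 neurons
weights = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
biases = [np.zeros(3), np.zeros(2)]
y = forward(np.ones(4), weights, biases)
```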
In DNNs, the loss function measures the quality of simulation on a single training sample. Over all the training samples, a loss function L(y_i, y_i^pre) and a cost function J(w, b) need to be defined, where the cost is the loss averaged over the m samples:

J(w, b) = (1/m) Σ_{i=1}^{m} L(y_i, y_i^pre),

where m is the number of input samples; w and b are the weights and biases of the neurons, respectively; y_i and y_i^pre, respectively, represent the real and predicted values of the output; and L(y_i, y_i^pre) measures the discrepancy between them (e.g., the squared error).
The neural network is trained to iterate over its parameters by forward propagation [34] and backward propagation [35]. As shown in Eqs. (1) and (2), forward propagation calculates the outputs z and a of each hidden layer from the input, the total output y, and the cost function J(w, b) [36].

Meanwhile, as shown in Eqs. (8) and (9), backward propagation calculates dw^[l] and db^[l] by the chain derivation rule [37] and then updates w and b after every forward and backward pass:

dw^[l] = ∂J(w, b)/∂w^[l],  db^[l] = ∂J(w, b)/∂b^[l], (8)

w^[l] ⟵ w^[l] − α·dw^[l],  b^[l] ⟵ b^[l] − α·db^[l], (9)

where α is the learning rate. The purpose of training is to find the appropriate w and b that minimize the total cost J(w, b), computed by forward propagation and backward propagation. The parameters (e.g., w, b) are derived through the chain derivation rule and updated throughout the network in the above computation. When learning complex features, having many nodes in each layer is less important than the number of layers: the first few hidden layers learn low-level simple features, while the later ones combine them into more complex features.
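For intuition, the update w ⟵ w − α·dw can be demonstrated on a single linear neuron with a halved squared loss (a toy example, not the paper's full network):

```python
import numpy as np

def gd_step(w, b, x, y, alpha):
    """One gradient-descent step for a single linear neuron with the loss
    L = 0.5 * (y_pre - y)^2, so dw = (y_pre - y) * x and db = y_pre - y
    by the chain rule."""
    y_pre = np.dot(w, x) + b
    err = y_pre - y
    return w - alpha * err * x, b - alpha * err

w, b = np.zeros(2), 0.0
x, y = np.array([1.0, 2.0]), 3.0
for _ in range(200):                 # repeated forward/backward passes
    w, b = gd_step(w, b, x, y, alpha=0.1)
# after training, the prediction w.x + b approaches the target y = 3
```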
2.2. Transfer Learning Phase. DTL needs to transfer some features into a new network, so the layers (an input layer, an output layer, and some hidden layers) are divided into two categories, fixed layers and transfer layers, as shown in Figure 3. In transfer learning, the fixed layers do not change their parameters while the transfer layers update theirs, so it is very important to determine which layers should be transferred or fixed in DTL.
As mentioned above, fixed layers in DTL are not changed and only learn simple features from raw data. On the contrary, transfer layers are changeable and learn complex features. The number of transfer layers is adjustable and is determined according to the total number of layers in a DNN. A typical transfer learning method named Finetune [29,30] is introduced in the proposed method, which has the following characteristics: (1) saving much time since it does not need to start a new task from the very beginning when learning and training data; (2) easily appending pretrained models to current datasets to extend training data; (3) simple implementation; and (4) not requiring the distributions of the training data and test data to be completely identical.
In the structure of a neural network, Finetune can change the last layer (i.e., the output layer) and then perform a round of new training on new input data. For more complex models, Finetune can be expanded to include several hidden layers before the output layer for better accuracy. However, Finetune performs poorly when the distributions of the training data and test data are quite different. This issue can be addressed by adding an adaptation layer into the layers of DTL, ensuring the accuracy of the transfer learning model. When using the adaptation method, there are two key points to be determined: (i) after which layers the adaptation layer is added and (ii) the adaptation method itself. The first determines the quality of the whole transfer learning, and the second determines the generalization ability of the network.
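The core mechanism of Finetune, freezing the fixed layers while updating only the transfer layers, can be sketched as follows (the layer dictionaries and gradient values are hypothetical placeholders, not the paper's implementation):

```python
import numpy as np

def finetune_step(layers, grads, alpha):
    """Apply a gradient update only to trainable (transfer) layers;
    fixed (pretrained) layers keep their parameters unchanged."""
    for layer, g in zip(layers, grads):
        if layer["trainable"]:
            layer["w"] -= alpha * g   # transfer layer: updated
        # fixed layer: left as pretrained

layers = [
    {"w": np.ones((3, 3)), "trainable": False},  # fixed layer
    {"w": np.ones((2, 3)), "trainable": True},   # transfer layer
]
grads = [np.full((3, 3), 0.5), np.full((2, 3), 0.5)]
finetune_step(layers, grads, alpha=0.1)
```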

When adding an adaptive layer in DTL, there are altogether three steps: (1) determining the layers that are suitable for adaptation, (2) adding an adaptive layer between the transfer layers and the fixed layers, and (3) using the data from the target domain to train the network by Finetune. The following loss function is defined in DTL:

l = l_c(D_s, y_s) + λ·l_A(D_s, D_t), (10)

where D_s and D_t are, respectively, the input of the source dataset (i.e., TIs) and the target dataset (e.g., known conditioning data); y_s is the output of the source dataset; l is the total loss of the network; l_c(D_s, y_s) is the loss of the network; l_A(D_s, D_t) is the adaptive loss of the network; and λ is the weight of the adaptive loss. The learning quality of DTL can be well improved by adding an adaptive layer, but the computational complexity is additionally increased, and the adaptive layer is also difficult to select. In order to simplify the deep network adaptation, the adaptive layer is replaced by a BatchNorm (BN) layer [38] for normalization to incorporate the adaptation of statistical features, which is called the Adaptive Batch Normalization (AdaBN) method. The structure of DTL using the AdaBN method is shown in Figure 4.
As shown in Figure 4, the transfer layers consist of some hidden layers and the BN layer. The latter can reduce the differences in distribution between the source dataset and the target dataset. The goal of data normalization for each layer in AdaBN is to make each layer receive data from a similar distribution to mitigate the problem of dataset shift. AdaBN normalizes samples from the source dataset to zero mean and the same variance. The BN layer is defined in Eq. (11) for the jth neuron:

y_j^(t) = w_j·(x_j^(t) − μ_j^(t)) / σ_j^(t) + b_j, (11)

where x_j and y_j are the input and output of the jth neuron; w_j and b_j are, respectively, the weight and the bias; μ_j and σ_j are the mean and standard deviation; and t denotes the tth iteration.
For the input of m samples at the nth iteration, suppose the input mean and input variance are μ and σ², respectively. The calculated mean μ_j and variance σ²_j for the jth neuron can be updated as follows:

d = μ − μ_j,

μ_j ⟵ μ_j + d·m / (n_j + m),

σ²_j ⟵ (n_j·σ²_j + m·σ² + d²·n_j·m / (n_j + m)) / (n_j + m), (12)

where d is the difference between the input mean μ at the nth iteration and the calculated mean μ_j at the (n − 1)th iteration, and n_j is the sum of all the samples of the previous n − 1 iterations, i.e., the cumulative number of samples. The variables on the left of the operator "⟵" are updated at each iteration. At the first iteration, μ_j and σ²_j are initialized to 0 and 1, respectively.
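One standard way to realize such a cumulative update of the per-neuron statistics is the pooled-moments form below (a numpy sketch consistent with the meanings of d and n_j described above; the batch layout is illustrative):

```python
import numpy as np

def update_stats(mu_j, var_j, n_j, batch):
    """Merge a new batch of m samples into the running mean/variance of
    the jth neuron; d is the shift between the batch mean and the running
    mean, and n_j is the cumulative number of samples seen so far."""
    m = len(batch)
    mu, var = batch.mean(), batch.var()
    d = mu - mu_j
    mu_new = mu_j + d * m / (n_j + m)
    var_new = (n_j * var_j + m * var + d * d * n_j * m / (n_j + m)) / (n_j + m)
    return mu_new, var_new, n_j + m

rng = np.random.default_rng(1)
data = rng.standard_normal(1000)
mu_j, var_j, n_j = 0.0, 1.0, 0          # initial values, as in the text
for batch in data.reshape(10, 100):     # ten iterations of 100 samples
    mu_j, var_j, n_j = update_stats(mu_j, var_j, n_j, batch)
# the running statistics now match those of all 1000 samples
```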

Hyperparameters in DTL.
In DTL, some parameters, called hyperparameters, can be set manually before training. Unlike the network weights, hyperparameters are not learned during training and are often divided into three categories: network parameters, optimization parameters, and regularization parameters.
Network parameters include the interaction mode (addition, multiplication, concatenation, etc.) between network layers, the number of network layers (also called depth), and the activation functions. Optimization parameters include the learning rate, batch sizes, parameters of different optimizers, and adjustable parameters of some loss functions. Regularization parameters include the coefficient of weight attenuation and dropout.
At the beginning of the training process, a large learning rate can speed up training. The learning rate α is defined as

α = α_0 / (1 + k·t), (13)

where α_0 is the initial learning rate, k is the number of rounds of gradient descent, and t is the training time. As training proceeds, the learning rate gradually decreases, ensuring that the iteration governed by Eq. (13) converges. Since the DTL model is complex and possibly prone to overfitting in calibration and underfitting in validation, the generalization ability of the model should be considered. For overfitting, a regularization method can be used by adding a regular term Ω to the loss function [39]:

J̃(w, b) = J(w, b) + λ·Ω(w), (14)

where λ is the weight of regularization, and Ω reduces the number of feature vectors and the complexity of the model to prevent overfitting. Dropout is another regularization method for overfitting [40]: during each round of training, it discards the values of some neuron nodes by randomly setting them to 0 (i.e., these neural cells are considered "dropped out"), improving the generalization ability of the model. Both the regular term and dropout are used in the proposed method to prevent overfitting. For underfitting in validation, choosing an appropriate network, adjusting hyperparameters, and training more times should be used, which will be discussed together with some other hyperparameters, such as the number of hidden layers and the BN layer, in Section 4.4.
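The two hyperparameter mechanisms above, a decaying learning rate and dropout, can be sketched as follows (the decay form and keep probability are illustrative choices, not the paper's exact settings):

```python
import numpy as np

def decayed_lr(alpha0, k, t):
    """A learning rate that starts at alpha0 and shrinks as training
    proceeds (one common decay form, chosen here for illustration)."""
    return alpha0 / (1.0 + k * t)

def dropout(a, keep_prob, rng):
    """Inverted dropout: randomly zero activations and rescale the
    survivors so the expected activation is unchanged."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones(1000)
a_drop = dropout(a, keep_prob=0.8, rng=rng)  # ~20% of cells "dropped out"
```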

Procedures of the Proposed Method
As shown in Figure 5, the procedures of the proposed method are as follows.
Step 1. Design the deep learning structure (number of layers, activation functions, etc.) and add the BN layer into the hidden layers.

Step 2. Input 2D or 3D TIs of porous media, then use gradient descent to update the parameters and set the adaptive learning rate.
Step 3. Save all the network parameters (w, b, the layer number, and the number of neurons in the layer) and hyperparameters (e.g., learning rate α and the activation function).
Step 4. Load the model and parameters and then take the conditioning data from the new dataset as input.
Step 5. Set the fixed layers and initialize the transfer layers, then update the parameters of transfer layers using Finetune.
Step 6. Export the neural network model including the structure and corresponding parameters to a file.
Step 7. Output 2D or 3D images of porous media using the above model and parameters.
Multiple source datasets (i.e., TIs) can be used in the proposed method. Besides, reconstructing porous media with any different sizes of 2D or 3D images can be realized according to the features extracted from TIs. Since TIs are real 3D images, the reconstruction retains the features of porous media in the real world.
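Steps 3-5 hinge on saving the trained parameters to a file and reloading them for each new reconstruction so the TI never has to be re-trained; a minimal sketch (the file name and helper functions are illustrative, not the paper's implementation):

```python
import numpy as np

def save_model(path, weights, biases, alpha):
    """Steps 3/6: export the network parameters and hyperparameters."""
    np.savez(path,
             alpha=alpha,
             n_layers=len(weights),
             **{f"w{l}": w for l, w in enumerate(weights)},
             **{f"b{l}": b for l, b in enumerate(biases)})

def load_model(path):
    """Step 4: reload the saved model for a new reconstruction task."""
    data = np.load(path)
    n = int(data["n_layers"])
    weights = [data[f"w{l}"] for l in range(n)]
    biases = [data[f"b{l}"] for l in range(n)]
    return weights, biases, float(data["alpha"])

save_model("dtl_model.npz", [np.ones((2, 2))], [np.zeros(2)], alpha=0.01)
w, b, alpha = load_model("dtl_model.npz")
```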

Experimental Results and Analyses
Since DTL can be run on a tensorflow-gpu framework [41] accelerated by GPU, the following tests were performed based on a tensorflow-gpu framework with a CPU of Intel Core i7-8700 (3.2 GHz), a memory of 8 GB, and a GPU of GeForce GTX 1070 (6 GB memory). As mentioned previously, the proposed method uses deep learning to extract all features from TIs and then saves the trained model and the corresponding parameters. The number of layers can often be determined through trials and experience. Generally, the more complicated the porous media are, the more hidden layers should be applied. In the practical tests, the reconstruction of porous media is based on the model displayed in Figure 6, in which the network has 8 hidden layers using the ReLU function as the activation function [33]. The details of the architecture shown in Figure 6 will be discussed in Section 4.4.

Training Data and the Representative Elementary Volume.

To evaluate the effects and applicability of the proposed method in the reconstruction of porous media, the real shale volume data with a resolution of 64 nanometers obtained by nano-CT and the real sandstone volume data with a resolution of 10.9 micrometers obtained by micro-CT were used as the test data for the following tests. Figure 7 shows some cross-sections of the volume data of shale and sandstone with two facies: grains (white) and pores (black).
4.1.1. 3D Experimental Shale Images. Before applying any reconstruction methods, two 3D cubes with 80 × 80 × 80 voxels were extracted from different parts of the original shale volume data: one is used as a TI and the other is a target image. A target image serves as a judge for comparing the reconstructed results when extracting the features from a TI and then reproducing them using the proposed method conditioned to some conditioning data from the target image. Before reconstruction, the representative elementary volume (REV) [42,43] of the training data should be determined first. It is important to observe the influence that the scale of the studied sample exerts when reconstructing porous media; i.e., a key point is to determine the minimum size of a studied sample at which the features tend to be substantially stable and independent of the size of the sample. When the size of the sample is less than an REV, the features may change easily with different sample sizes, showing obvious fluctuations. On the contrary, samples bigger than the REV have stable features. According to the definition of an REV, all the samples of porous media bigger than an REV have the same statistical distribution as long as they are taken from the original data, so the influence of different locations in the original data does not need to be considered. The REV actually conforms to the concepts of ergodicity and stationarity in statistics, because a sample bigger than an REV has a fixed statistical distribution, meaning the experimental samples also have ergodicity and stationarity when they meet the REV requirements.
There are two major methods for determining an REV. The first uses porosity and is widely applied in soil science and material science, regardless of the macroscale parameters; the second determines an REV based on some macroscale parameters without considering the microscale parameters of a sample and is often used in engineering mechanics [44,45]. For homogeneous porous media, the porosity-based method may be effective, but it does not work for heterogeneous porous media since porosity tends to vary largely with different sizes. Therefore, other methods should be considered to determine an REV for heterogeneous porous media to ensure that their features will not change largely with different sizes. In our tests, the size of the REV was determined by the variogram γ(h), which is often used to represent the correlation and variability of spatial structural changes in a certain direction and is defined as

γ(h) = (1/2)·E[(Z(x) − Z(x + h))²], (15)

where E means the mathematical expectation, Z(x) is a variable value at the location x, and h is the lag between the two locations x and x + h. The specific procedure of determining an REV is the following: firstly, the variogram curves of pores in the X, Y, and Z directions are plotted, respectively; secondly, when the variogram curves in all three directions begin to become stable, the corresponding size of porous media is an REV. The variogram curves computed from the original volume data of shale and sandstone in three directions (X, Y, and Z) are shown in Figures 12 and 13. The abscissa indicates the spatial distance h (unit: voxel), and the ordinate indicates the value of the variogram. In Figure 12, the variogram curves in the three directions tend to be stable at distances of 20, 31, and 19 voxels, respectively (indicated with red dashed lines), so the REV can be at least 20 × 31 × 19 voxels for shale; similarly, the REV can be at least 20 × 20 × 26 voxels for sandstone, inferred from Figure 13.
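The variogram of Eq. (15) for a binary pore/grain volume can be computed directly in numpy (a sketch on a synthetic cube; real runs would use the CT volume data):

```python
import numpy as np

def variogram(vol, h, axis=0):
    """gamma(h) = 0.5 * E[(Z(x) - Z(x + h))^2] along one axis of a
    3D indicator volume (pore = 1, grain = 0)."""
    a = np.moveaxis(vol, axis, 0).astype(float)
    return 0.5 * np.mean((a[:-h] - a[h:]) ** 2)

rng = np.random.default_rng(0)
vol = (rng.random((40, 40, 40)) < 0.3).astype(int)  # toy ~30%-porosity cube
# an REV check plots such curves in X, Y, Z and looks for where they flatten
curve = [variogram(vol, h, axis=0) for h in range(1, 20)]
```

For this uncorrelated toy cube the curve is flat near p(1 − p) ≈ 0.21 at every lag; real shale or sandstone data show a rising limb before stabilizing, and the lag where all three directional curves flatten gives the REV size.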
Since the TIs and the target images of shale and sandstone in our tests are 80 × 80 × 80 voxels, their sizes are bigger than the REV and meet the experimental requirements.

Reconstructions and Comparisons with Other Methods.
Some sample points extracted from the target images of shale and sandstone were, respectively, used as conditioning data for the shale and sandstone reconstructions, accounting for 1% of the total voxels of the target images, in which the pore voxels and the grain voxels have the same number (i.e., the pore and grain voxels each account for 50% of the conditioning data). The proportion of conditioning data is deliberately quite different from the porosity of the target image to prove the applicability of the proposed method. Suppose the pore value is 1 and the grain value is 0. For convenience, the proposed method is called DTL hereafter. Reconstructions of porous media (shale and sandstone) were performed using DTL and some typical MPS methods (SNESIM, FILTERSIM, and DISPAT) with conditioning data and TIs. Figure 14 shows the reconstructed 3D pore spaces of shale using DTL, SNESIM, FILTERSIM, and DISPAT, respectively. It is seen that all the reconstructed pore spaces by the four methods have structures similar to the target image (Figure 9(c)), and the long-range connectivity of the pore spaces is well reproduced. Figure 15 shows the reconstructed 3D pore spaces of sandstone using DTL, SNESIM, FILTERSIM, and DISPAT, respectively. The reconstructed images by the four methods also have structures similar to the target image (Figure 11(c)), and the long-range connectivity of the pore spaces is well reproduced, too.

Averages of Reconstructions.
For convenience, the porosity values of shale and sandstone reconstructions are put together. The porosity values of TIs, the target images, and the reconstructed images are shown in Table 1.
To get the average performance, another 10 shale and sandstone reconstructions using SNESIM, FILTERSIM, DISPAT, and DTL were performed, as shown in Table 2. All the reconstructed images are cubes with the same size of 80 × 80 × 80 voxels. Since each voxel within the cube has its fixed location and value, the average values of these ten reconstructions can be computed and constitute an "average cube" with the size of 80 × 80 × 80 voxels for each method (SNESIM, FILTERSIM, DISPAT, and DTL), as shown in Figures 16 and 17. The attribute value of each voxel in the "average cube" is the average of the voxel at the same location in the reconstructed images. It seems that the reconstructed shale image using DTL has a clearer distinction between the pore spaces and grains, showing that DTL has relatively fixed reconstructed results compared with the other three methods.
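The "average cube" construction is a voxel-wise mean over the ten runs; a numpy sketch with synthetic binary reconstructions standing in for the real ones:

```python
import numpy as np

rng = np.random.default_rng(0)
# ten toy binary reconstructions of the same 80x80x80 size (pore = 1)
recons = (rng.random((10, 80, 80, 80)) < 0.3).astype(float)

# voxel-wise mean across the ten runs: each voxel of the "average cube"
# holds the frequency with which that location was reconstructed as pore
average_cube = recons.mean(axis=0)
porosity = average_cube.mean()   # average porosity over the ten runs
```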

Variogram and Multiple-Point Connectivity.
Variogram depends on the independent variable h, and variogram curves can represent the spatial variability of two points in one direction. In the reconstruction of porous media, a variogram is also used for evaluation. The variogram curves of the TIs, the target images, and the reconstructions of SNESIM, FILTERSIM, DISPAT, and DTL were computed in the directions of X, Y, and Z, as shown in Figures 18 and 19. It is seen that the variogram curves of the DTL method, in all three directions, are quite close to those of the target images.
Multiple-point connectivity (MPC) [12,46] can measure the joint connectivity between multiple points in one direction, which is defined as

MPC(h, n) = E[ ∏_{j=0}^{n−1} S(u + j·h) ], (16)

where S(u) is the attribute value at the position u, h is the lag distance, n is the number of nodes or points in one direction, and E is the mathematical expectation. Suppose S(u) = 1 when u corresponds to pore space; otherwise, S(u) = 0. As shown in Figures 20 and 21, the MPC curves of the TIs, the target images, and the reconstructions are very similar in the X and Y directions, but in the Z direction (especially in Figure 20), DTL shows better performance since its curves are much closer to those of the target images.
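The MPC of Eq. (16) can likewise be estimated with numpy as the product of n lagged copies of the indicator volume, averaged over all valid origins (a sketch on a synthetic cube):

```python
import numpy as np

def mpc(vol, h, n, axis=0):
    """E[prod_{j=0}^{n-1} S(u + j*h)] along one axis of a 3D indicator
    volume: the probability that n points spaced h apart are all pore."""
    a = np.moveaxis(vol, axis, 0).astype(float)
    span = (n - 1) * h
    prod = np.ones_like(a[: a.shape[0] - span])
    for j in range(n):
        prod = prod * a[j * h : a.shape[0] - (span - j * h)]
    return prod.mean()

rng = np.random.default_rng(0)
vol = (rng.random((60, 60, 60)) < 0.5).astype(int)  # toy 50%-porosity cube
p1 = mpc(vol, h=1, n=1)   # n = 1 reduces to the porosity
p3 = mpc(vol, h=1, n=3)   # joint connectivity of 3 consecutive points
```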

Permeability Estimation Using the Lattice Boltzmann Method.

Permeability means the ability to allow fluid to pass through porous media and is often related to porosity, the geometry of pores in the direction of liquid penetration, and other factors [47]. The permeability of the reconstructed porous media is computed using the Lattice Boltzmann Method (LBM) [3,48] with the D3Q19 lattice. The evolution equation is defined as follows:

f_i(x + e_i·Δt, t + Δt) − f_i(x, t) = −(1/τ)·[f_i(x, t) − f_i^eq(x, t)],  f_i^eq = w_i·ρ·[1 + 3(e_i·u)/c² + 9(e_i·u)²/(2c⁴) − 3u²/(2c²)], (17)

where c = Δx/Δt is the lattice speed, u is the fluid velocity, and τ is the relaxation time. w_i denotes the weights, whose values are w_i = 1/3 (i = 0), w_i = 1/18 (i = 1, 2, ⋯, 6), and w_i = 1/36 (i = 7, ⋯, 18). e_i denotes the discrete velocities:

e_0 = (0, 0, 0),  e_{1,⋯,6} = (±1, 0, 0)c, (0, ±1, 0)c, (0, 0, ±1)c,  e_{7,⋯,18} = (±1, ±1, 0)c, (±1, 0, ±1)c, (0, ±1, ±1)c. (18)

The density of the momentum ρu and the fluid density ρ are

ρ = Σ_i f_i,  ρu = Σ_i e_i·f_i. (19)

Since the internal structures of the reconstructed systems are very complicated, the bounce-back scheme is used to obtain no-slip velocity conditions. The inlet and outlet of the models are computed using pressure conditions. The data of the TIs, the target images, and the reconstructed images were, respectively, used as the input files of the LBM simulation to calculate the permeability of those models with the size of 80 × 80 × 80 voxels. Some parameters in LBM are set as follows: Δx = Δt = 1, τ = 1. Two faces of the reconstructed model, perpendicular to the flow direction, are left open while the other four faces are sealed with the matrix phase. When convergence is reached, the permeability along this flow direction can be computed according to Darcy's law.
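Once the LBM flow field converges, the Darcy post-processing step is a one-line formula; a sketch in lattice units (the numeric inputs are illustrative placeholders, not values from the paper's runs):

```python
# Darcy's law in lattice units: k = q * nu * L / dp, where q is the mean
# velocity through the open faces, nu the kinematic viscosity (with unit
# lattice density), L the model length along the flow direction, and dp
# the pressure drop between inlet and outlet.
def darcy_permeability(q, nu, length, dp):
    return q * nu * length / dp

# for the BGK model with tau = 1 (as set in the text), nu = (tau - 0.5) / 3
nu = (1.0 - 0.5) / 3.0
k = darcy_permeability(q=1e-4, nu=nu, length=80, dp=1e-3)
```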
As shown in Tables 3 and 4, the permeability of the TIs and the target images and the average permeability of ten reconstructed images using DTL, SNESIM, FILTERSIM, and DISPAT in three directions were computed by the LBM. The permeability values (especially in the Z direction) of the images reconstructed using DTL are quite close to those of the target images, indicating the good reconstruction quality of DTL.
4.2.6. Distribution and Numbers of Pores. Analyses of pore structures were performed by the software Avizo [49] through importing the TIs, the target images, and the reconstructed images of DTL, SNESIM, FILTERSIM, and DISPAT.

Geofluids
The pore structures, including the number of pores, the pore sizes, and the diameter of each pore, were calculated. The diameter of a pore is approximated by that of a sphere of equal volume,

d = \left( \frac{6V}{\pi} \right)^{1/3},

where V is the volume of each pore. Table 5 shows the average numbers of pores in the TIs, the target images, and 10 reconstructed images using FILTERSIM, SNESIM, DISPAT, and DTL. The diameters of pores in the TIs, the target images, and the reconstructed images of SNESIM, FILTERSIM, DISPAT, and DTL are displayed in Table 6 (shale) and Table 7 (sandstone). Figure 22 shows the distributions of pore diameters of the TIs, the target images, and the reconstructed images of SNESIM, FILTERSIM, DISPAT, and DTL. DTL is not the best in every individual item, but judging from Tables 5-7 and Figure 22, it shows good overall performance.
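The equivalent-sphere diameter used above is straightforward to compute once the per-pore volumes are known (e.g., exported from a labelling tool such as Avizo). A minimal NumPy sketch, with a function name of our own choosing:

```python
import numpy as np

def equivalent_diameters(pore_volumes):
    """Equivalent-sphere diameter d = (6V/pi)^(1/3) for each pore volume.

    pore_volumes: iterable of pore volumes (in voxels), e.g. as reported
    by a pore-labelling analysis.
    """
    v = np.asarray(pore_volumes, dtype=float)
    return np.cbrt(6.0 * v / np.pi)
```

For example, a pore whose volume equals that of a sphere of diameter 2 (V = 4π/3) is assigned d = 2 exactly.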
The average memory usage, CPU/GPU utilization, and running time of SNESIM, DISPAT, FILTERSIM, and DTL over ten reconstructions (shale and sandstone) are shown in Tables 8 and 9. The "first-round running time" is reported separately from the overall running time because all four methods cost more time in their first run: SNESIM, DISPAT, and FILTERSIM scan all points or patterns in the TIs, and DTL needs to use deep learning to train a basic model that learns all the structural features in the TIs. After the first round, however, the time consumption of all four methods is largely reduced because the training or learning processes have already finished. It is seen that DTL is much better than the other methods in reconstruction time and memory demands due to the use of the GPU.

Comparisons Using Different Conditioning Data.
The results of the four methods with 1% conditioning data (i.e., accounting for 1% of the total voxels of the target image) were compared in Section 4.2. Since the porosity values of the target image and the training image are quite different, more conditioning data extracted from the target image can better demonstrate the applicability of the proposed method. Hence, 5% conditioning data (i.e., accounting for 5% of the total voxels of the target image) were extracted from the target image and used for reconstruction. The pore voxels and grain voxels in the conditioning data are equal in number. The variogram and MPC curves (averaged over ten reconstructions) of sandstone and shale using the 5% conditioning data show that, with more conditioning data, the variogram and MPC curves of the reconstructed DTL images are closer to those of the target image, proving the applicability of the proposed method.
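A hedged sketch of how such conditioning data might be drawn, with equal numbers of pore and grain voxels totalling a given fraction of the target image; the function name, seeding, and flat-index representation are our own assumptions, not the paper's code:

```python
import numpy as np

def sample_conditioning(target, fraction=0.05, seed=0):
    """Draw conditioning data from a binary target image.

    Returns flat voxel indices with equal numbers of pore (1) and
    grain (0) voxels, totalling `fraction` of all voxels.
    """
    rng = np.random.default_rng(seed)
    n_total = int(fraction * target.size)
    n_each = n_total // 2              # same number of pore and grain voxels
    flat = target.ravel()
    pores = np.flatnonzero(flat == 1)
    grains = np.flatnonzero(flat == 0)
    return np.concatenate([rng.choice(pores, n_each, replace=False),
                           rng.choice(grains, n_each, replace=False)])
```

Because the sample is balanced by construction, its porosity is 50% regardless of the porosity of the target image, matching the equal pore/grain counts described above.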

Parameters Used for DTL.
The DTL network actually used and some of its parameters are discussed in this section. When using the DTL method, the network structure needs to be designed, including the number of hidden layers, the number of neural cells in the hidden layers, the optimization method, the activation function, and some other parameters, all of which largely affect the training time, learning ability, accuracy, generalization ability, etc. When designing the structure of DTL, the number of hidden layers can be determined from the learning effect, which is measured by the accuracy during the deep learning phase. The accuracy is calculated by comparing the voxel values of the results from the training model with those of the TI during the deep learning phase. The accuracy (ranging from 0 to 100%) is in effect the similarity between the results of the deep learning phase and the TI, measuring whether the network structure can learn the structural features of the training data quickly and accurately. High accuracy in the deep learning phase means that the network learns the features of the training data well. However, high accuracy has little effect on the generalization ability of DTL and does not imply overfitting of the reconstruction, because during the transfer phase the conditioning data are added as constraints and the model is trained again. The effect of overfitting on the whole DTL is small during the deep learning phase, whereas the conditioning data have a significant impact on the generalization ability of DTL during the transfer learning phase. Therefore, the principle for selecting network parameters is to choose a network with a simple structure while ensuring accuracy.
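The voxel-comparison accuracy described above amounts to a one-line NumPy check; this is an illustrative helper of our own, not the paper's implementation:

```python
import numpy as np

def voxel_accuracy(prediction, ti):
    """Voxel-wise accuracy (%) between a training-phase result and the TI.

    Both inputs are binary 3D arrays of the same shape; the accuracy is
    the fraction of voxels whose phase labels agree, scaled to 0-100%.
    """
    return 100.0 * np.mean(prediction == ti)
```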
As shown in Figure 27 and Table 10, more hidden layers and neural cells yield higher accuracy but need more training time. In our experiments, the number of hidden layers is 8 and the number of neural cells is 500, balancing training time and accuracy. A BN layer is inserted among the transfer layers, but its specific location should be determined by experiments. A BN layer was inserted at different locations of a DTL network to form, respectively, 6-, 7-, 8-, 9-, and 10-layer networks for a test, and the accuracy during the deep learning phase was calculated for each. As shown in Figure 28, the abscissa ranging from 1 to 6 is the number of layers from the location of the inserted BN layer to the last layer, considered the "distance" (unit: layer) from the currently inserted BN layer to the last layer of the DTL network. For example, in Figure 28 the accuracy is 80% when the inserted BN layer is next to the last layer (distance = 1) for a 6-layer DTL network, and the accuracy is 98% when distance = 2 for an 8-layer DTL network. Since the number of hidden layers was 8 in the real experiments (see the discussion of Figure 27 and Table 10) and the accuracy was close to 100% when distance = 2, the BN layer was added at the penultimate layer (i.e., distance = 2).
As for the optimization algorithm, the Adam gradient descent method was used [40], which has a faster convergence speed and a more effective learning effect in practical applications. The weight w and the bias b are randomly initialized. The learning rate α can be set to 0.5 at the beginning and is adjusted to a smaller level after some training. The loss function in our experiments is the cross-entropy loss function, chosen to speed up training and improve accuracy [13], which is defined as

L = -\frac{1}{N} \sum_{k=1}^{N} \left[ y_k \log \hat{y}_k + (1 - y_k) \log (1 - \hat{y}_k) \right],

where y_k is the true label of the k-th voxel, \hat{y}_k is the predicted pore probability, and N is the number of voxels.
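For reference, the binary cross-entropy loss described above can be sketched in NumPy (an illustration, not the paper's training code; the `eps` clipping is our own numerical safeguard against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy loss.

    y_true: target voxel labels in {0, 1}.
    y_pred: predicted pore probabilities in (0, 1).
    eps:    clipping bound to keep log() finite at 0 or 1.
    """
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
```

An uninformative predictor that outputs 0.5 everywhere incurs a loss of ln 2 ≈ 0.693, the usual baseline for balanced binary targets.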

Conclusions
Statistical methods represented by MPS and some other methods are widely used for the reconstruction of high-resolution 3D porous media. However, the applicability of these methods is limited by their large CPU cost and memory requirements. Meanwhile, the models they establish are not deterministic but a series of stochastic realizations with equal probabilities, so it is necessary to reconstruct models many times to obtain an average result. Owing to the rapid development of machine learning technologies, it has become feasible to use deep learning to solve the reconstruction of 3D porous media. In this paper, a reconstruction method based on DTL is proposed, which can be considered a combination of deep learning and transfer learning. Deep learning is used to learn the structural features of porous media, and transfer learning then reproduces those features in new reconstructions.
Instead of reconstructing the unknown regions pixel by pixel or pattern by pattern, the proposed method first learns the features of, and relationships between, the TIs and the conditioning data. The modeling process of the neural network is one of iterative optimization, making the errors gradually smaller and the results more certain rather than a series of equally probable realizations, which is also why DTL produces more stable results. The method has lower CPU costs and memory demands than the traditional MPS-like methods. The experiments on reconstructing shale and sandstone images have demonstrated the advantages of the proposed method in reconstruction quality, time consumption, and CPU utilization.

Data Availability
The data used to support this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflict of interest.