Characterization of Aquifer Multiscale Properties by Generating Random Fractal Field with Truncated Power Variogram Model Using Karhunen – Loève Expansion

1Department of Oil-Gas Field Development, College of Petroleum Engineering, China University of Petroleum, Beijing 102249, China 2State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum, Beijing 102249, China 3State Key Laboratory of Shale Oil and Gas EnrichmentMechanisms and Effective Development, Sinopec Group, Beijing 100083, China 4Department of Hydrosciences, School of Earth Sciences and Engineering, Nanjing University, Nanjing 210093, China


Introduction
Groundwater resources management and contamination prevention are the important issues to sustain the development of human society.One of the most challenging tasks to address these issues is to describe the heterogeneity of the hydrogeological parameters, such as hydraulic conductivity and porosity over scales.Many techniques have been devoted to obtain more information on the heterogeneity of the hydrogeological parameters in different scales, such as core samples analysis [1], direct push technology [2], and hydraulic tomography [3].Due to the complex nature of the spatial distribution of these hydrogeological parameters, these parameters tend to be treated as random variables, and thus their spatial distribution can be quantitatively described using stochastic process [4,5].The governing equations of groundwater flow and contaminant transport become stochastic partial differential equations.In this manner, the state variables, such as hydraulic head, velocity, and contaminant concentration, are predicted with uncertainty.The merit of the stochastic method is that it can produce the probabilistic distribution of the prediction.The best prediction (quantified by the first moment, i.e., mean) and the associated uncertainty (quantified by the second moment, variance/covariance) can be inferred from this distribution.Geostatistics was introduced to hydrogeology and became popular in numerical simulation of groundwater flow and solute transport [6][7][8].The assumption underlying the traditional geostatistics [9] is that the spatial distribution of the hydrogeological property is statistically homogeneous; that is, the joint probability of hydrogeological parameters depends only on their relative locations, but not on the exact locations.However, it is evident that the random hydrogeological field may manifest it as random fractals with statistically homogeneous increments [10][11][12][13][14]. Stochastic fractal models to characterize such random field include Gaussian-based fractional Brownian motion and fractional Gaussian noise [10,12,15], non-Gaussian-based fractional Levy motion or fractional Levy noise [13], and multifractals [16].The Gaussian-based fractional Brownian motion model that can be characterized by power law variogram can be inferred from collected hydrogeological data, such as permeability or porosity, at several sites over diversified distance scales [12,17].The random field described using such model exhibits the self-affine scaling properties and thus can take the multiscale variation into account.
To analyze the scale-dependency of the random fractal field and investigate the effect of multiscale random fractals to the behavior of groundwater flow and contaminant transport, it requires generating the realizations of hydrogeological properties based on their stochastic characteristics.If measurements on the hydrogeological properties are directly available, the generated multiscale random fractal field should be further conditional on the measurement values.The most common method used to generate a Gaussian random field is sequential Gaussian simulation (SGSIM) due to the wide usage of geostatistical software library (GSLIB) developed by Deutsch and Journel [9].Although power variogram model is included in GSLIB, it can not be used directly to generate a multiscale random fractals field.The traditional geostatistical method can only generate Gaussian random field with the assumption of stationarity or statistical homogeneity.Since the power variogram does not have a finite sill and integral scale and lacks an explicit form of covariance function, the SGSIM module in GSLIB does not have the support to use power variogram.The spectral method can be used to generate a multiscale random fractals field, which represents the random field in Fourier space and thus can be referred to as Fourier method [18].The Fourier method has been improved to be a hierarchical method, such as the hybrid method [19] and Fourier-Wavelet method [20].
In this work, we adopt a truncated power variogram model proposed by Di Federico and Neuman [21] to generate the multiscale random fractal field that can be characterized by power variogram model.The truncated power variogram can describe the multiscale characteristics of the random field and can take measurement scale, observation scale, and window scale into account.This model also has a reasonable hydrogeological interpretation, which states that the power law variogram behavior of log hydraulic conductivity random field can be considered as a weighted integral of infinite hierarchies of mutually uncorrelated stationary fields.After truncating the higher and/or lower frequency, the truncated power variogram models become stationary and have explicit forms of covariance functions.Each hierarchy is referred to as a mode in the spectra of such random field.Mathematically, it has been proved that each mode can be characterized by either the exponential or the Gaussian variogram.It renders two types of truncated power variogram: truncated power variogram with exponential modes (TpvE) and truncated power variogram with Gaussian modes (TpvG).We have modified the GSLIB code to incorporate these truncated power variogram models to generate random fractal field, both unconditional and conditional, using sequential Gaussian simulation method [22].Other methods can also generate random fractal field, such as the spectral methods [23] and fractal method [24].However, these existing methods do not have the capability to generate the conditional random field.The conditioning data of hydrogeological properties are usually available and they are critical components to improve the characterization of the targeted field.Therefore, it is necessary to take these hard data into account by the conditioning technique.The comparison of the existing methods to generate random fractal fields is listed in Table 1.
Here in this work, an innovative method to generate the multiscale random fractal field is proposed by incorporating the truncated power variogram models with Karhunen-Loève (KL) expansion method.The advantage of the proposed method is that KL expansion has the capability to conduct the model dimensionality reduction.If it is further incorporated with polynomial chaos expansion on the state variables, such as hydraulic head in the single-phase flow problem [25], saturation in the multiphase problem [26], and contaminant transport problem [27], the computational cost to perform uncertainty analysis in the hydrological research can be greatly reduced.The proposed KL-based multiscale random fractal field generator provides the foundation to establish a high-efficiency stochastic analysis framework.
The rest of the paper is arranged as follows: the theoretical bases of truncated power variogram model and KL expansion are introduced in Section 2; the results of unconditional and conditional multiscale random fractal field generations and the capability of KL expansion to perform dimensionality reduction are presented in Section 3; and the conclusions are drawn in Section 4.

Methodology
2.1.Truncated Power Variogram Models.The heterogeneous log hydraulic conductivity, denoted as , random fractal field with statistical homogeneous increments, can be described using power variogram model: where  0 is a constant,  is Hurst coefficient (0 <  < 1), and  is the lag distance.
The log hydraulic conductivity random field is self-affine since the power variogram function has the scaling property: where  is any positive constant ( > 0).The fractal dimension of the self-affine log hydraulic conductivity random field is where  is the Euclidean or topologic dimension.Define the mode number  = 1/ representing the spatial frequency of random fluctuation (where  is the integral scale), integrate continuously over all the possible modes within the range [  ,   ], and weight each mode with a factor of 1/ [21], where the lower and upper limits of mode number   and   correspond with the upper and lower limits of integral scale   and   due to the reciprocal relationship.The upper limit of integral scale   is proportional to the characterization length of window scale (e.g., the domain size of an aquifer), and the lower limit of integral scale   is proportional to the characterization length of data support or measurement scale (e.g., the size of the soil core If each statistically homogeneous mode in an infinite hierarchy log hydraulic conductivity random field with uncorrelated spatial increments can be characterized by exponential variograms in terms of  integral scale, where  2 () =  2 is the variance of log hydraulic conductivity and  is a constant.The consequent truncated power variogram model with exponential modes (TpvE) is with the analytical form (0 <  ≤ 0.5) where  = ,  and Γ(⋅, ⋅) is the incomplete gamma function.
The corresponding covariance function of TpvE model is with the analytical form If each statistically homogeneous mode in an infinite hierarchy random field with uncorrelated spatial increments can be characterized by Gaussian variogram, The analytical form of truncated power variogram model with Gaussian modes (TpvG) is (0 <  ≤ 1): The corresponding analytical form of TpvG covariance is If the lower limit   approaches 0 (i.e., the point measurement scale) when the size of the log hydraulic conductivity measurement is so small that it can be treated as a point and the upper limit   approaches infinity (i.e., infinitely large window scale or domain size), the truncated variogram models can also reduce the power variogram model (1) with the condition

KL Expansion of Stochastic Process.
The spatial distribution of hydrogeological variables, such as log hydraulic conductivity , can be treated as a stochastic process.The covariance function (x, y), as in (9) for TpvE and in (12) for TpvG, of a zero-mean stochastic process, (x), is bounded, symmetric, and positive definite, and it can be expanded as where x, y is the spatial coordinate in a topological space D, the lag distance  = |x − y|,   is the eigenvalue, and   (x) is eigenfunction.
The eigenfunctions are orthogonal and form a complete set.They satisfy the condition where   is the Kronecker delta function.
The KL expansion of the stochastic process can be expressed as where   is the standard Gaussian random variable following the normal distribution (0, 1).The unconditional random field can be generated by using this equation by finding its eigenvalues and eigenfunctions with the stochastic characteristics known.
The eigenvalues and eigenfunctions of the covariance function can be solved from the Fredholm equation If there are   measurements  1 (x 1 ),  2 (x 2 ), . . .,    (x   ), it may require generating random field conditioning on these measurements.The generation of the conditional random field can be achieved by the conditional Kriging.The conditional Kriging mean and variance are, respectively, where   (x) are weighting functions and the subscript  represents "conditional." The weighting function can be computed by using the following Kriging equations: Since the set of eigenfunctions {  (x)} is complete, we can expand   (x) on the basis of {  (x)}, where   is the coefficients to be determined.Substitute this expansion to Kriging equation (20), multiply   (x), and integrate both sides with respect to x over the domain : Similar to unconditional case, the conditional eigenvalues  ()   and eigenfunctions  ()  (x) of the conditional covariance functions  ()   (x, y) can be solved from the following Fredholm equation: To determine  ()  (x), we expand it in terms of unconditional eigenfunctions   (x) in a similar way to Kriging coefficients.Write the conditional eigenfunction as Substituting this expression into (23), multiply   (y) and integrate it with respect to y over , Write it in the matrix form where the components of , and E is  ×  identity matrix.This equation indicates that the problem of finding the eigenvalues of conditional covariance function can be transformed into the problem of finding the eigenvalues of matrix A.
After finding the eigenvalue of matrix A,  () , the corresponding eigenvector d can be used to construct the conditional eigenfunction of the conditional covariance Once the eigenvalue  () and eigenfunction  ()  (x) of conditional covariance function are obtained, the conditional log hydraulic conductivity random field can be generated using where  is the number of terms required to generate the conditional random field and   ∼ (0, 1) is the standard Gaussian random variable.unconditional multiscale random fractal field are shown in Figure 2(a).In this case, the generated 50 × 50 random multiscale fractal field is shown in Figure 3.As can be found in Figure 3, the upper region is associated with relative large values and the lower right region is associated with relative small values.To confirm the quality of the generated unconditional multiscale random fractal field, the sample variogram of the generated 2500 data values is used to compare with its theoretical counterpart.The sample variogram can be estimated using

Results and Discussion
where   is the number of pairs of samples ,  such that |x  − x  | = ℎ.This task can be achieved by using the GAMV module in GSLIB code.As can be observed in Figure 4, the sample variogram model can reproduce the variogram structure of theoretical power variogram with parameter set { 0 , } = {0.046,0.25} very well, which justifies the quality of the generated multiscale random fractal field.Although not shown here, a series of similar tests with different power variogram model parameter values have also been performed.They include the cases when the Hurst coefficient 0 <  < 0.5 (indicating negatively correlated spatial increments), 0.5 <  < 1 (indicating a positively correlated spatial increments), and  = 0.5 (indicating independent spatial increments).All the test results confirm that sample variograms can coincide well with their corresponding theoretical variograms.

Capability of Dimensionality Reduction by Using KL Expansion.
One of the significant advantages to use KL expansion to generate the multiscale random fractal field is that it has the capability of reducing the dimensionality of the field in probabilistic space.This can be achieved by Obtain the covariance matrix with the given parameters by using the TpvG model Equation ( 12)

Solve the eigenvalues and eigenfunctions of the covariance matrix numerically
Equation ( 16) Obtain the realization of the random fractal eld by using KL expansion a er generating a set of standard Gaussian random numbers Equation ( 17) Obtain the Kriging weights of conditional dataset Equation ( 20)

Derive the conditional Kriging mean and covariance
Equation ( 19) Generate the conditional random fractal eld using conditional KL expansion with the given random standard Gaussian variables Equation ( 28)

Obtain the eigenvalues and eigenfunctions of conditional covariance function
Equation ( 26)  truncating out the expanded KL terms with small eigenvalues.The eigenvalues of covariance function are plotted in Figure 5.
It can be found that the eigenvalues decrease dramatically with the increasing of the number of the remained terms after truncation.Approximately, all the eigenvalues are less than 1 after 200 terms.As stated in Chang and Zhang [28], each eigenvalue represents the energy and information content for each term to express the spatial variability of the random field.
To determine the number of terms that needs to be retained to accurately represent the random field, Chang and Zhang [28] propose to define an energy criterion   :   where   is the number of the retained terms,  is the domain size, and  2 is the variance of the random field.The relationship of   and   is shown in Table 2.As can be observed in Table 2, the first 10 terms associated with large eigenvalues has already taken up to more than 50% of the total energy.To investigate how the number of the retained terms affects the generation of the multiscale random fractal field, a set of realizations of multiscale random fractal field are generated with different number of retained terms (e.g., 50, 100, 200, 500, 1000, and 1500) in KL expansion, as shown in Figure 6.It can be found that even only 50 leading terms, taking up to approximately 70% of the total energy, are retained in KL expansion during the generation of the multiscale random fractal field; the general pattern of the spatial variation can be captured.With the increasing of the number of the retained terms in KL expansion, the details of the heterogeneous characteristics become more and more revealed.When 1500 leading terms, corresponding to approximately 94% of the total energies, are retained in the KL expansion, the generated multiscale random fractal field is almost indistinguishable from that generated with full KL expansion by comparing Figure 6(f) and Figure 3.This finding indicates that the heterogeneous characteristics of the random field are dominated by the leading KL expansion terms with large eigenvalues, which provides the foundation to conduct the dimensionality reduction.Since the small details of the heterogeneous characteristics may not affect the flow and transport characteristics dramatically, the KL expansion terms with small eigenvalues can be safely truncated to improve computational efficiency in the stochastic analysis of flow and transport.For example, it has been demonstrated that the mean and variance of hydraulic head obtained with only the first few leading KL expansion terms (e.g.,   = 6 for a 1D case) is very close to those obtained from thousands of Monte Carlo simulation with full models [25].Similar observations are also reported in Li et al. [26] and Liao and Zhang [27].In a quantitative sense, if the polynomial chaos expansion (PCE) is used to construct a surrogate model for the state variables (e.g., hydraulic head) and KL expansion on the model parameter (e.g., log hydraulic conductivity), the number of model evaluation required to obtain the PCE coefficients is (  + )!/  !/!, where  is the degree of PCE.For a given , the number of model evaluation is greatly reduced when   decreases; hence the construction of surrogate model can be more efficient.

Generation of Conditional Multiscale Random Fractal
Field.When direct measurement data are available, it may require generating the multiscale random fractal field that can be conditional on these measurements.To take the measurements into account, the procedures to generate the conditional random fractal field are shown in Figure 2(b), and the measurement values are obtained from a realization of the generated unconditional field to maintain the underlying random fractal characterizations.If there are 100 measurement data (the location of the measurements is shown in Figure 7(a)) collected from the unconditional multiscale random fractal field, the two generated conditional random fractal field can be depicted as shown in Figures 7(b) and 7(c).It can be observed that the generated values on the measurement locations honor the measurement values as shown in Figure 7(d), and the spatial correlation of the values on the unmeasured locations is governed by the given power variogram.Therefore, both the measurement values and the multiscale random fractal features can be represented in the generated unconditional field.With the aid of the generated multiscale random fractal field conditioning on the available measurements on the hydrogeological properties, the conditional stochastic analysis on flow and transport can be further performed.

Advantages and Limitations of TPV Models in Simulating
Real Fields.This study is focused on random field generation, but it will be helpful to discuss when and how the proposed method can be used to characterize an aquifer based on the real observations.Due to the diversity and complexity of the geologic heterogeneity, there exists no versatile geostatistical model which works consistently well for all cases.Different models and tools have been developed to describe specific characteristics, for example, (non)stationarity, log-normal distribution, channeling, and self-similarity or fractal scaling [9,21,29]; [Carle and Fogg, 1996].Compared to traditional geostatistical models employed in GSLIB, TPV models have some additional and unique merits.First, they can capture fractal scaling, a widely reported feature in real sediments, rocks, aquifers, and reservoirs ( [30][31][32][33][34][35][36][37], among many others).Fractal scaling in many cases can be significant but cannot be described by traditional models.Second, unlike traditional models which are limited to stationarity assumption, TPV models are able to treat true fields as nonstationary in general and allow them to be stationary through upper cutoffs representing the impact of finite observation ranges or geologic homogeneity.That is, TPV models are compatible to both stationary fields and nonstationary fields, as demonstrated in Figures 1 and 4.
In the meanwhile, TPV models inherit the advantages of the traditional models; for example, they can take the anisotropy, nested structures, and conditioning effect into account.In order to capture the property of geological layering in real fields, one may choose long correlation lengths in the horizontal directions and a much shorter one in the vertical direction.More specifically, the parameters  and/or   above can be anisotropic to reflect the anisotropy of geologic layering.
As discussed above, it may be unwise to expect a versatile tool to handle all the fields.TPV model has its limitations in certain circumstances.The main limitation is that TPV, as a member of two-point statistical tools, is weak in featuring apparently observed connectivity, branching, or channeling.At the present stage, to solve this problem one may also need postprocessing using tools like sequential indicator simulation or directly resort to multipoint statistics (MPS [29,38]).Combining fractal scaling and connectivity will be a promising work in the future.

Figure 1 :
Figure 1: The equivalency of TpvG and power variogram models.
It indicates that when   is small (e.g.,   = 10 2 ), the TpvG model is associated with a finite sill and integral scale, which are 0.4 and 33.33, respectively.With the increasing of   , sill and integral scale increases.When   increases to 10 5 in this case, the TpvG model can approximate the power variogram model accurately in the range of the investigated lag distance.
(12)results obtained by TpvE and TpvG models are very similar to each other, and it may be redundant to show both results in this work.Due to the larger range of  parameter in the TpvG model and thus its flexibility (especially in the inverse modeling process when observation scale needs to be accounted for), only the results obtained by TpvG model is shown in this paper.The lower limit   approaching to 0 corresponds with a point scale measurement.In reality, most of the measurements, such as the permeability measurements of soil cores, satisfy this condition since its scale is much smaller than the scale of domain size.Therefore, we set   as 0 and vary   in an increasing order to show the equivalency of TpvG with power variogram within a prescribed finite range of lag distance.As shown in(1)and(11), the parameter set is {, ,   } for TpvG model with a point measurement scale and { 0 , } for power variogram model.The relationship of  and  0 is described by(14).Here, we set the parameter set of TpvG as {, } = {0.02,0.25}and the corresponding parameter set for power variogram model is { 0 , } = {0.046,0.25}andthenvaryfrom 10 2 to 10 6 .The obtained results are shown in Figure1. to 10 6 and the parameter set {, ,   } = {0.02,0.25, 10 6 } in TpvG is used to ensure the sufficient accuracy to approximate the power variogram with parameter set { 0 , } = {0.046,0.25}),atwo-dimensional multiscale random fractals field can be generated by using KL expansion method with the TpvG model given by(12)as the covariance function.The procedures to generate the

Table 2 :
The relationship of energy criterion and number of retained terms.