Uncertainty and Sensitivity Analysis of Significant Parameters for Superlarge Diameter Shield Excavation

Excavation of a superlarge diameter tunnel by tunnel boring machine (TBM) is different from that of a shield tunnel with normal dimension, in which the control system of the superlarge TBM is very complicated and difficult to operate. Hence, it is very important to focus on the control and management of significant parameters to ensure excavation stability under uncertainty. In this paper, we (i) utilize a BIM-based big data platform (BIM-BDP) to manage the essential construction data of tunnel project in digital format; (ii) adopt the global sensitivity analysis (SA) to recognize significant parameters for shield excavation based on polynomial chaos expansion (PCE)–extended Fourier amplitude sensitivity test (eFAST) model; and (iii) employ the uncertainty analysis (UA) to discover the correlation between significant parameters from the data of the BIM-BDP.'is research contributes to (i) the body of knowledge of proposing a more appropriate research methodology that can cope with aleatory and epistemic uncertainty and support uncertainty and sensitivity analysis (UA/SA) processes based on data from BIM-BDP and (ii) the state of practice by providing a data-driven surrogate model to simulate system behaviors of shield excavation with high reliability and to reduce dependency on domain experts. Here, we pay close attention to the most influential parameters that require priority parameter control, which can help administrators optimize the management of shield parameters during tunnel excavation.


Introduction
Tunnel boring machine (TBM) is the main tool for urban tunnel excavation due to its high excavation efficiency, high safety, environmental friendliness, and high cost-effectiveness [1]. In order to guarantee the safety and efficiency of tunnel excavation, the administrators need to (i) record and store real-time monitoring data during TBM tunneling, including hydrological, geological, and TBM parameters; (ii) identify the influential and noninfluential parameters for excavation stability based on the monitoring data; (iii) control and manage influential parameters under system uncertainty. To avoid excavation instability, it is very meaningful to manage and analyze the construction parameters [2][3][4][5][6][7][8]. Building Information Modeling (BIM) enables the design and analysis of complex, multidisciplinary systems as often encountered in infrastructure projects and, in particular, tunneling [9].
Reasonable control of TBM is considered as a complicated process, in which it is very difficult to rigorously analyze interactive relationship between construction parameters due to the uncertainty of heterogeneous soil and complex surroundings. e uncertainty and sensitivity analysis (UA/SA), which helps researchers better understand the relative importance of each parameter [10,11], of shield machine construction parameters is an important research direction for parameters control. Understanding the relative "sensitivity" of parameters can aid in the development of better monitoring strategies and experiment design, for example, indicating the priority and amount of parameters to be controlled. In the SA methods, the global sensitivity analysis (GSA) can perceive and distinguish the influence magnitude of parameters associated with the excavation stability to help reduce the cognitive uncertainty caused by the limitation of human cognition and uncertain factors, for example, incomplete information. Hence, GSA provides guidance for identifying parameters whose information should be collected to enhance the system reliability. e extended Fourier amplitude sensitivity test (eFAST) method is a global and quantitative GSA algorithm that can be applied to complex nonlinear and nonmonotonic models [12][13][14]. e key idea of eFAST is that from one simulation run to another, all factors fluctuate around their nominal values [15]. e importance of the factor is determined by analyzing the Fourier decomposition of the model response.
e eFAST method has been utilized in this paper to investigate the complicated and uncertain interaction between the construction parameters as well as the excavation stability in the SRHT project.
is research contributes to (i) developing a framework of UA/SA for investigating the interactive relationship between construction parameters under system uncertainty; (ii) proposing a noval polynomial chaos expansion-extended Fourier amplitude sensitivity test (PCE-eFAST) model for GSA with high accuracy and reliability; (iii) exploring the reasonable sample size for PCE model and investigating the limitation of eFAST method to get a more convincing result. e remainder of this paper is organized as follows. First, the literature review of parameters analysis for superlarge diameter shield excavation is briefly presented. Second, Section 3 shows the construction of the BIM-BDP, which includes four modules (i.e., data sensing, data storage, data analysis, and data presentation). ird, the global uncertainty and sensitivity analysis process based on BIM-BDP data is exhibited in Section 4, with three steps (i.e., generating data according to the input-output index system, running the surrogate model, and performing GUA and GSA by PCE-eFAST method). Fourth, the UA/SA results are revealed in Section 5. Fifth, we discuss the limitation of the eFAST method and this study. e final section summarizes the main observations of the study and lists future research directions.

Parameters Analysis for Superlarge Diameter Shield
Excavation. TBM is extremely sensitive to geological changes and excessively dependent on the operator's own experience [16]. BIM for decision making during the life cycle of infrastructure projects is a vital tool for the analysis of complex, integrated, multidisciplinary systems [9]. e BIM mainly contains both geometric and nongeometric data, and a strong link should be formed between the visual model and database [17]. For underground projects, [12] established a BIM cloud-based safety risk knowledge database to sort out all kinds of safety risk knowledge. A study integrated WNS and BIM to carry out long-term underground environment detection and monitoring, and the collected data can be preserved in a database for analysis and management [18]. As for the integrated database, data incompatibility is the most significant problem [19]. One of the most valuable assets in any database is the content itself [20]. However, as far as the literature is concerned, there are relatively few studies that make full use of and analyze the data, especially in the shield tunnel project.
TBM is a kind of very complex and comprehensive equipment and is composed of a variety of systems. Although TBM can provide hundreds of shield machine operating parameters, the TBM operator still has an inadequate awareness of geological conditions and shield tunneling machine status at present and of whether such shield machine operation status corresponds to such geological conditions. erefore, there are still many problems involved in the operation of the shield machine during the TBM construction, such as how to evaluate rock parameters in real-time based on equipment parameters and how to use the evaluation results as the basis for decision making and optimization to achieve scientific control and management of shield parameters for excavation stability.

Polynomial Chaos Expansion (PCE).
e random spectrum method [21] is a new technique for establishing a surrogate model. e method has a multidimensional random response surface in the sample space and a large number of response samples. e most famous random spectrum method is polynomial chaos extension (PCE). e authors in [22] proposed an initial Hermite polynomial chaos method, which uses normal random variables. e authors in [23] proposed a generalized PCE method and extended the method to non-normal random variables (e.g., lognormal, Weibull, and beta). Besides, after further development, generalized PCE can already use non-global smoothing polynomial basis functions. e Hermite polynomial chaos method for moving and windowing can estimate a small activation probability and improve the accuracy of the response surface [24]. PCE has many advantages over other surrogate models [25]. is method can produce highly accurate results, and it has proven to have significant computational efficiency in the expansion of large-scale systems [26]. e use of nonintrusive PCE in the quantification of uncertainty has received wide attention. e method has been applied as well in solving fluid dynamics, circuit simulation, environment, and sound field. e basic idea is to approximate the stochastic process by the sum of orthogonal polynomial chaos with independent random variables. e key step is to determine the coefficients of each polynomial. e model responses are characterized by a multivariate polynomial whose distribution relative to the input variables is orthogonal. In this case, characterizing the probability density function (PDF) is equivalent to evaluating the polynomial chaos coefficients.

Global Uncertainty and Sensitivity Analysis.
e fact that SA should be a fundamental part of any analysis that involves the assessment and propagation of uncertainty is underlined in [27]. SA of a model aims at quantifying the relative importance of each input parameter [28]. In particular, techniques for GSA have become an accepted standard for the evaluation of the impact and interactions of uncertain inputs in complex environmental effects [29][30][31].
Uncertainty is a physical quantity that characterizes how reliable a random variable is and refers to situations involving imperfect and unknown information. It appears in a partially observable and/or random environment, which is suitable for the prediction of future events [32]. e GSA considers the output behavior of the model over the full domain of uncertain inputs; specifically, this implies that the full distribution of each input parameter should be evaluated and that the importance of each input should be evaluated across the domain of all other parameters [33,34]. e main methods for performing GSA include variance-based methods, regression methods, and Morris screening methods. Among these GSA methods, the variance-based methods have restrictions on the output distribution. Regression methods require knowledge of the input-output relationship form. Morris screening methods cannot compare the magnitude of sensitivity. Only variancebased methods can explore the whole space of the uncertain inputs [35]. e inputs are usually carried out via Monte Carlo (MC) sampling, and the basic idea of SA based on variance is to analyze the effect of input on output variance.
A common type of GSA is the variance-based method, which operates by apportioning the variance of the model's output into different sources of variation in the inputs [15,36]. More specifically, it quantifies the sensitivity of a particular input (the percentage of the total variability in the output attributed to the changes in that input) by averaging over other inputs rather than fixing them at specific values.
e Fourier amplitude sensitivity test (FAST) was one of the variance-based methods [37]. e classical FAST method uses spectral analysis to apportion the variance, after first exploring the input space using sinusoidal functions of different frequencies for each input factor or dimension [38]. e FAST is a variance-based method that is used to calculate the main sensitivity index (MSI) by scanning the parameter space with periodic functions such that the entire sample space can be analyzed [31]. A multidimensional integration is reduced to 1D integration along with a curve by associating each variable with a sampling frequency of the system in the Fourier transform space, which greatly improves the sampling efficiency [39,40]. en, Saltelli et al. [12] extended the FAST method to perform the total sensitivity index (TSI), which is called eFAST method.
is eFAST method, combining the benefits of the FAST and Sobol's method, is highly efficient and may be used to analyze the interactive relationship between various parameters.
In the present work, the MC method is used to evaluate the effect of each input error on the output [27]. Firstly, the PCE-eFAST model is established. en, the statistical characteristics of the parameters are calculated by performing a large amount of random sampling on the parameters in the model. Finally, the approximate value of the solution is obtained. It is concluded that the necessary precautions should be taken to reduce the impact of changes in these parameters on the parameters of the shield machine equipment and to minimize errors or disturbances.

The Architecture of the BIM-BDP
TBM operations are greatly affected by geological conditions and heavily rely on the operator's own subjectivity. In the case that external environment changes, it is difficult for TBM to perceive the status of geotechnical and shield machines in time to make scientific judgments to guide construction, leading to frequent occurrence of engineering accidents during TBM tunneling, such as ground settlement and cutter head damage, and thus resulting in huge economic losses. erefore, it is necessary to remotely monitor and control different types and brands of shield machines to collect construction information to guide construction. In order to realize the scientific and efficient construction management of the shield tunnel, a shield construction information management system integrating project management, data analysis, and construction decision support is particularly important.
In this study, a UA/SA method is proposed for data analysis and decision making during tunnel excavating. To improve its prediction accuracy, a variety of sensors are employed, covering both shield machine and stratum. Furthermore, in order to facilitate the data collection, processing, analysis, and presentation, an BIM-BDP is developed.
e entire system is structurally divided into a sensing module, a storage module, an analysis module, and a presentation module, as shown in Figure 1.
e sensing module and storage module are used to collect and store the shield machine data of the tunnel project, and the analysis module is used to analyze shield machine parameters. e presentation module exhibits the BIM-BDP through the browser client and realizes the functions of visualizing the shield parameter data and applying the platform. e TBM construction data sensing layer needs to collect real-time information related to TBM operation and main control parameters.
is real-time information should be related to the whole process of shield tunnel construction, types, and stages.
is information is divided into three categories according to engineering objects: shield machine (construction equipment), geological space (Earth layer information), and tunnel structure (engineering entity). e shield machine information mainly includes its overall geometry, performance parameter description, and key construction parameters of each subsystem of the shield; the tunnel structure includes tunnel linear design, functional design parameters, segment ring design parameters, and tunnel forming quality parameters; geological space includes geological information such as soil parameters, soil stratification, and hydrological information (e.g., groundwater level, head, and seepage).
is information plays an important role in supporting the progress, quality, cost, and safety management of the shield tunnel project. e information classification and storage methods are shown in Figure 1. e data stored by the module includes real-time data of the shield machine, manual data, and BIM 3D model data.
e three types of data are stored in the database server and Advances in Civil Engineering establish a request and response relationship with the web server. e web server then establishes a request with the browser. e browser sends a request to the web server in an HTTP format to access the cloud database. After the web server accepts the server request, the request is automatically converted into a SQL query statement and passed to the database server. After receiving the request, the database server needs to verify its legitimacy, perform data processing, and then return the processed structure to the web server. e web server then converts the obtained result into an HTML document form and displays a friendly web page through a browser. e analysis module is the core of the BIM-BDP, which embeds the sensitivity and uncertainty analysis algorithm.
e UA/SA makes use of the previously developed BIM-BDP that collects the excavating information of SRHT. Numerous methods, based on either deterministic or statistical concepts, have been proposed for performing UA/ SA. Most of the existing methods require explicit expressions (equations or functions) between the inputs and outputs. e common way of quantifying uncertainty is estimating the PDF [41] and the cumulative distribution function (CDF) [31,42]. It is considered that the parameters or the interaction between the input parameters causes the variance of the model output; thus, PDF and CDF can reflect the sensitivity of the model output to the input parameters [43,44]. erefore, the coupling among the parameters and the total variance parameter can be analyzed by GSA [45]. For the GSA study, the surrogate model (also known as metamodel) is commonly employed to represent the uncertainty in the output and calculate the mean and standard deviation [46]. We adopt the generalized polynomial chaos expansion (gPCE) method [47] to construct a computationally cheaper surrogate to approximate the mapping between input parameters and output parameters. en, the eFAST method is adopted to obtain the total-order sensitivity index, which takes into account the main effect of a single parameter on the output and the interaction between The gPCE the parameters and describes the total influence of a parameter on the output. e data presentation module is based on the browser display and uses the Silverlight plug-in to develop an interactive interface to increase system interactivity. e module can realize the visualization of the BIM and its associated files, full network visualization, and real-time key parameters of the shield machine. In addition, the system provides multiple data analysis tools to support construction decision making. Besides, users can easily find the construction time and material consumption of any component by only selecting it on the BIM. For example, if any lining ring is selected, the construction time and material consumption of the ring can be counted. By querying the historical data and exporting the data, the excavation surface stability analysis and GSA can be further developed based on the data.
In order to better collect, transmit, store, and use these data, the platform supports multiuser access and remote login monitoring via the Internet, while reducing the dependence on front-end hardware and software, and its maintenance and upgrade are easy. e system is built using the browser/server (B/S) architecture and uses Java Development Kit (JDK) as the development platform, MyEclipse as the development tool, Java as the programming language, and SQL Server 2008 as the database.

The UA/SA Framework
e global scheme that describes the required different steps for the computation of partial variances is presented in Figure 2. e proposed framework (as given in Figure 2) consists of three main steps. In Step 1: generating data based on the input-output index system, we confirm the inputs and outputs from the BIM-BDP and generate the input data by the same "level of uncertainty" according to the same coefficient of variation (COV). In Step 2: running the surrogate model, surrogate models are built for the response and coupling variables, and then we can extract the output data by rerunning of the surrogate model. In Step 3: performing GUA and GSA by PCE-eFAST method, the uncertainty and sensitivity measures of inputs are estimated, and the input parameters include the MSIs and TSIs. In Sections 4.1-4.3, we explain the three steps in detail. For illustration, we use the BIM-BDP to verify the applicability of this method.

Step 1: Generating Data according to the Input-Output Index System.
e parameters of BIM-BDP data can be sorted in to three categories: shield machine, tunnel structure, and geological space, including more than 900 groups of shield construction parameters as well as more than 30 construction monitoring parameters. However, due to the limited attention of the shield machine operator, it is impossible to monitor and control all these parameters simultaneously. Hence, some significant parameters, which have a prominent effect on the safety of shield tunneling, are considered as key control parameters for the shield machine operator.
Previous studies [48,49] have primarily focused on the mechanical performance of the TBM in the construction phase, such as the maximum thrust force, permeability, and rotation speed of the cutter head. Berthoz et al. investigated the identification of pertinent control parameters to guarantee the safe advancement of the machine and the ground-supporting function of the cutting wheel, such as advancing speed, total volume loss, thrust effort, and torque on the cutter head [3]. Zou [50] indicated that a synchronous grouting technique can improve the stability of the surrounding rock and control the ground surface deformation in the shield tunnel construction process. Besides, based on construction experiences and previous studies [14,51], the deviation angle and bubble chamber pressure are the main monitoring parameters for a shield machine operator to evaluate the stability of shield excavation. According to statistical analysis [52], it is found the attitude and position of the shield machine are mainly determined by the thrust cylinder and cutting wheel systems. Hence, some pertinent control parameters (as shown in Table 1) on the BIM-BDP are regarded as significant parameters for UA/SA study. Herein, the x1-x10 are the inputs and the s1-s2 are the outputs. In this study, the distribution of input parameters and output parameters is defined as follows: (i) Beta (r, s, a, b): r > 0, s > 0. e support of the distribution is given by the parameters [a, b]. e shape of the distribution is related to parameters [r, s] and their ratio.
e logistic distribution is similar to the normal distribution in shape but has heavier tails (higher kurtosis). (iii) Laplace (μ, b): µ ∈ R, b > 0. Laplace distribution is also known as double exponential distribution, because it can be thought of as two exponential distributions (with an additional location parameter) spliced together back-to-back.
Determining the range and distribution of each input parameter is important in GSA study [53,54], because it may affect the parameter TSIs and rank [55]. In general, two methods can be used to set the parameter ranges and distributions: (1) for those parameters with clear physical interpretations, their values should cover the entire physical range as much as possible; (2) if a parameter has no clear physical interpretation, one should refer to the literature or expert experience. In this study, all the parameters in the BIM-BDP have clear physical or geometric significance. e beta, logistic, and Laplace distributions have been adopted to fit the parameters. According to the fitting accuracy, the best fitting distribution and the parameter of each input have been summarized in Table 1.
en, the inputs sets for UA/SA can be generated based on the best fitting distribution. Besides, scatter plots constitute an important class of visualization method. Scatter matrix plots are high-dimensional extensions of scatter plots, which can effectively show the pairwise relationship of multidimensional data. As shown in Figure 3, there is a high correlation between x4 (advancing speed) and x9 (average soil excavation volume with cutter rotation 360°). It is obvious that, with the increase in the advancing speed, the average soil excavation volume with cutter rotation 360°a lso increases significantly.
e PCE model is popular for uncertainty propagation and GSA to determine the effect of input uncertainties on complex computational models [56,57]. Currently, we use random variables to represent the uncertainty in the system. In this section, we introduce the basics of the gPCE [47] for stochastic uncertainty propagation.
We use a triplet (Ω, F, P) to represent a complete probability space, where the sample space Ω is the set of all possible outcomes, F ⊂ 2 Ω is the σ -algebra collection of all measurable events belonging to Ω, and P: F ⟶ [0, 1] is a probability measure indicating the likelihood of occurrence of each event. Let ζ � ζ 1 (υ), ζ 2 (υ), . . . , ζ n (υ) : Ω ⟶ Ξ ⊆ R n be a set of uncorrelated random variables that represent the uncertainty in the system. en, any second-order random variables μ(υ) ∈ L(Ω, P)(i.e., (μ, μ) � ‖μ‖ 2 Ω < ∞) can be represented as follows [28,57]: where Ψ i 's were originally proposed by Wiener as Wiener-Hermite polynomial chaos (assuming that ζ has a Gaussian distribution) and later extended to generalized polynomial chaos (more classical orthogonal polynomials) by the Askey scheme [58]. e choice of polynomial chaos types depends on the distribution of random inputs. In the current work, we use Hermite polynomials for Gaussian random variables and Legendre polynomials for uniform random variables; the parameters μ i 's are called the gPCE coefficients.
For the purpose of numerical calculation, the series is usually truncated up to polynomial order p with N terms (N � (n + p)!/n!p!) to approximate the exact output μ(ζ(υ)): Based on the orthogonality of the polynomial basis, the gPCE coefficients can be calculated by projecting μ on each basis using the inner product: where ρ(ζ) is the probability distribution of the variable ζ, and the integration can be estimated using a quadrature rule: where ζ k , ε k G k�1 is a set of quadrature points and the corresponding weights. For low-dimensional problems, the use of tensor product quadrature may have significant efficiency; however, since the number of quadrature points increases exponentially as the dimension increases, it suffers from the "curse of dimensionality" for high-dimensional cases.
Once an accurate gPCE μ p (ζ) is constructed to approximate the random function μ(ζ), one can obtain an analytical representation of the function μ in terms of ζ.
erefore, all the statistical information of μ can be retrieved from the gPCE in a straightforward manner using their definitions directly. e jth moment of μ for j ∈ N is which can be calculated either analytically or with small computational effort. In particular, the first and second moments of μ can be obtained directly from the gPCE coefficients as follows. e expectation of μ only depends on the first coefficient: Step 1 Generating data according to input-output index system Step 2 Running of surrogate model Step 3 UA/SA by EFAST method Figure 2: e UA/SA scheme that describes the required different steps for the computation of partial variances. 6 Advances in Civil Engineering e variance σ 2 μ is obtained as a weighted sum of its squared gPCE coefficients: By gPCE method, we can quantify the uncertainty though the quantity of interest g (propagated from the uncertainty in the inputs) efficiently using its stochastic moments. e maximum thrust force needs to be 1.5∼2 times the total resistance. e total resistance mainly includes the resistance of shield shell and surrounding strata, the thrust resistance of knife disc plate, the friction resistance between tube sheet and shield tail, the penetration resistance of the stratum through the notched ring, steering resistance, and the drag resistance of the supporting trailer after traction.

4.3.
Step 3: Performing GUA and GSA by PCE-eFAST Method. e eFAST is a variance decomposition method, which uses a periodic sampling method and a Fourier transformation to partition the whole variance of the model output and quantify the degree to which variation in each input factor accounts for the output variance [59]. A periodic sampling approach is used to generate a search curve in the parameter space, and partitioning is implemented by assigning the periodic sample of each parameter with a distinct frequency [15,46]. en, a Fourier transformation is applied to the model output to measure how strongly a factor's frequency propagates from the input to the output, i.e., the variance contribution of the factor to the whole variance of the output [12].
First, in order to use the model evaluations more effectively, a conversion function in (8) is selected for the input parameters x i (i � 1, 2, . . ., N t ).
where ] i is a random phase in [0, 2π]. Here, ] i � π/3, λ i represents the characteristic frequency of the input parameters, and t is an independent variable for all input parameters. In (−π, π), N t � 2ωλ max + 1 is taken as a uniform interval. ω is usually set to 4 or 6, and λ max is the maximum value of the sequence λ i that should satisfy the following nonlinear correlation: n i�1 s i ·λ i ≠ 0. In the above formula, s i is an integer. erefore, y � f(x 1 , x 2 , . . . , x 10 ) can be transformed intoy � f(t); that is, f(t) � f X 1 sin λ 1 , X 2 sin λ 2 , . . . , X 10 sin λ 10 t ∈ (−π, π).
e Fourier series expansion of f(t) is   Advances in Civil Engineering e Fourier frequency spectrum curve is defined as Λ p � U 2 j + V 2 j with j ∈ Z, U j , V j , Λ p having the following properties: e variance of the model caused by the parameters' input changes can be expressed as the sum of the parameters 1, 2, 3, . . ., ω frequency doubling the spectral curves.
where z 0 � Z − 0 { }, λ i is the integer frequency defined by the parameter x i , and P is the integer. en, the total variance of the model is e total variance of the model output W and the variance of the model W i can be obtained by U p , V p , and λ i . Since the total variance of the model output is obtained by parameters or coupling the parameters, the decomposition can be expressed as (16) where W ij is the variance (coupling variance) contributed by the input item x i through the input item x j , and W ijm is the variance contributed by the input item x i through the error items x j and x m . W 12, . . ., k is the variance contributed by the input item x i through the input items x 1 , x 2 , . . ., x k . erefore, the first-order sensitivity index R i of the input item x i can be defined as (17) by the normalized processing: e global sensitivity index reflects the sum of the input item, direct contribution rate, and interaction with input parameters indirectly to the variance of the model output. By decomposing the output variance of the model, the eFAST method can quantitatively obtain every order and global sensitivity of each error item. erefore, the eFAST method can not only test the effect of the change in multiple error items on the input model results but also analyze the direct and indirect effects of the change of each input item on the simulation results. e specific methods for calculating the total sensitivity are as follows. e input item x i was assigned to a larger frequency λ i , and then, a set of smaller different integer frequencies was assigned to other input items. e selected integer frequency should meet the following relationship: λ i ≥ 2ω · max λ i ′ , so the frequency domain was divided into two parts: [1, ω · max λ i ′ ] and [ω · max λ i ′ + 1, N t − 1/2]. Finally, the total sensitivity of the input item was calculated by e aforementioned steps are summarized in Algorithm 1.

The UA/SA for Superlarge Diameter
Shield Excavation e data of this study comes from the Sanyang Road Highway-railway Tunnel (SRHT). e SRHT project of rail transit line 7 is located in Wuhan, China, with a total length of about 31.3 km. is tunnel is designed for both road and subway use and constructed using two superlarge (diameters are 15.76 meters) mud-water balance shield machines. In the project of SRHT, especially in the compound stratum around the under-river section, the control of excavation stability is a vital problem. In the SRHT project, the BIMbased big data platform (BIM-BDP) has been established for the monitoring of construction parameters.
Global UA/SA aims to depict the entire set of possible model outcomes, together with their associated frequencies of occurrence [59]. We conducted SA for investigating the differences of outcome uncertainty derived from the variation in input parameters with eFAST. Besides, we identified influential input parameters by calculating both the MSI and the TSI. In line with [59,60], for each output variable, a threshold of MSI > 0.01 and TSI > 0.1 was imposed to identify influential parameters.

Surrogate Model Validation.
e surrogate modeling methods by gPCE described in Section 4.2 are used to construct the relationship between input and output parameters. Because the parameters in the BIM-BDP are nonlinear and constitute strongly coupled system, it is extremely difficult to use a finite number of sampling points to construct a very high accuracy surrogate model that can completely replace the dynamic model. erefore, the adjusted error-squared [61], R 2 adjusted , is calculated to validate the accuracy of the surrogate models. ey can be calculated by

Advances in Civil Engineering
where p is the total number of explanatory variables in the model (excluding the constant term), n is the number of validation sample size, y i is the observed value, y * i is the value predicted by the surrogate model, and ȳis the average value of the observed value. Larger values of the R 2 adjusted reveal that the surrogate model is a better approximation of the original dynamic model. Figure 4 shows the scatters of the observation data in BIM-BDP and the prediction data by the surrogate model. Herein, the R 2 adjusted values of s1 and s2 are 0.850 and 0.896, respectively, using (18). ose values are all close to 1, indicating that the surrogate model can predict the average magnitudes of s1 and s2 with high reliability. e black dash line in Figure 4 means that the observation data is equal to prediction data, and the red line is the fit line of the observation data versus prediction data. e small deviation of the above two lines also indicates the good fitting accuracy of the surrogate model.

e Impacts of Sample Size on the Sensitivity Measures.
e goal of parameter GSA is to explore the entire input space with a reasonable sample size and to identify the sensitive parameters [62]. A key element of GSA is the sampling of input parameters for the simulation, and the sample size determines the computational cost of the analysis [46]. Different metrics of the train sample have been adopted to evaluate the accuracy of the surrogate model. e metamodel is built using a training set (X; Y) of size N train that is much smaller than that of the total sample N total , which is got from the BIM-BDP database. It is noteworthy that the values in the training set are continuously selected along the segment ring. en, the prediction data by different metamodels can be validated with respect to the original observation data. In this research, 7 groups of training samples (sample sizes are 704, 1408, 2816, 5632, 11264, 22528, and 45056, respectively) are collected with the mean and standard deviation of the 10 input parameters by the Latin hypercube sampling (LHS) method. en, the s1 and s2 can be obtained by running the surrogate model. e evolution of the MSI and TSI with increasing sample size is shown in Figure 5.

Advances in Civil Engineering
With regard to bubble chamber pressure (s1), the MSIs and TSIs of input parameters vary greatly when the sample size is less than 22528. e most important parameter, advancing speed (x4), is chosen as an example to exhibit the results of SA. erefore, if the objective of the parameter GSA is solely to calculate an input parameter ranking, then the eFAST analysis can be applied with a sample size of 22528 to yield a reliable ranking result.
From Figure 5, the sample size is the main determining factor for the convergence of the MSI and TSI with the surrogate model. e sample size over 22528 yields the most stable sensitivity indices. When the sample size is small, i.e., only 1408, the MSI of advancing speed (x4) shows strong variations and cannot reach a stable convergence result. For most of parameters, fewer than 11264 samples are not sufficient to reach a stable value. In general, it appears from the plots that a sample size of greater than 22528 is required to reach the final converged value for the most of parameters.
For the input parameter with the highest SIs, such as advancing speed (x4) to s1 or actual volume excavation (x10) to s2, the final stable sensitivity index value is attained rather slowly, and greater fluctuations are observed. Convergence can be obtained for this type of parameter under the condition of a large sample size. However, those insensitive parameters are more prone to minor fluctuations that can appear with increasing sample size. For those insensitive parameters, the SA can quickly obtain convergence. ese results are consistent with the research of [53], which verifies the robustness and universality of this study. Figure 6 reports the full results about MSIs and TSIs for all 10 input parameters on the two output parameters, under the sample size of 22528 (according to the conclusion in Section 5.2.1). e influential parameters (corresponding to MSI > 0.01 and TSI > 0.1) are found for two outputs.

e MSIs and TSIs of the Entire Section.
For the output bubble chamber pressure (s1), the synchronous grouting pressure (x2), advancing speed (x4), cutter rotation speed (x6), ratio of intake/discharge mud (x7), cutter torque (x8), and soil excavation volume (x9) are influential parameters according to the MSIs, whereas synchronous grouting pressure (x2), advancing speed (x4), and soil excavation volume (x9) are influential parameters according to the TSIs. e most important parameter is advancing speed (x4) reaching an MSI of up to 0.292 and a TSI of up to 0.331. e next important parameter, soil excavation volume (x9), reaches an MSI value of up to 0.258 and a TSI value of up to 0.313. e third important parameter, synchronous grouting pressure (x2), reaches an MSI value of up to 0.195 and a TSI value of up to 0.235. Besides, the MSIs and TSIs of other input parameters are much lower than the above-mentioned three parameters.
For the output deviation angle (s2), the maximum thrust force (x1), synchronous grouting pressure (x2), advancing speed (x4), cutter rotation speed (x6), cutter torque (x8), soil excavation volume (x9), and actual volume excavation (x10) are influential parameters according to the MSIs, while cutter rotation speed (x6), cutter torque (x8), soil excavation volume (x9), and actual volume excavation (x10) are influential parameters according to the TSIs. e most important parameter is actual volume excavation (x10) reaching an MSI of up to 0.353 and a TSI of up to 0.388. e next important parameter, soil excavation volume (x9), reaches an MSI value of up to 0.299 and a TSI value of up to 0.380.
e MSIs and TSIs of other input parameters are much lower than the above two parameters.
Comparing the MSIs and TSIs in Figure 6, we find that the impact of an input parameter's interaction with other parameters on the outputs (TSI-MSI) can be much greater than that of the parameter's own contribution (MSI); for example, the MSI and TSI of actual volume excavation (x10) to bubble chamber pressure (s1) are 0.002 and 0.024, where the TSI is ten times the MSI. In this study, the TSI, which includes the effects of parameter interactions, yields basically the same ranking of the parameters as the MSI (as in Figure 6). Parameter importance may be better determined by the TSI, which is likely to give a better quantitative view of the overall parameter importance over the whole simulation parameter than indices computed for the individual parameter by MSI.

UA Results.
We use global UA to study the propagation of uncertainty from basic input parameters to output parameters, and to quantify the confidence of the model output from existing data and parameter estimates. In this study, we characterize the well-trained metamodel by varying all of the input factors of the metamodel within a certain range. is process can be divided into four steps: Step 1, defining the type of random distribution, including Beta, Logistic, and Laplace in the current study; Step 2, creating a random sample of the input based on the selected distribution and introducing a random error into the model input; Step 3, running the model using random samples as input; Step 4, obtaining the error of each parameter to obtain a 95% confidence interval.
As mentioned above, the input parameters have their initial conditions and a specific area. Next, the LHS method is used to generate inputs based on the distribution, and then the output is generated using the PCE-eFAST model. After several iterations, the uncertainty of the PCE-eFAST model will be represented by CDF of the model outputs. Figure 7 shows the results of the UA of the inherent parameters of the shield machine in different model outputs. Due to the uncertainty of the input parameters and their interaction, the model output s1 (average bubble chamber pressure per ring) ranges from 4.4 to 5.4 MPa; s2 (deviation angle per ring) ranges from −80°to −50°.
is study takes s1 (average bubble chamber pressure per ring) as an example. Figure 8 shows the comparison between the univariate uncertainty CDF curve and the baseline. Taking Figure 8(a) as an example, set x1 as the average of its distribution, the blue thick solid line is the baseline CDF of the model output s1 (the same as that shown in Figure 7(a)), and the red dotted line is the univariate of the model output s1 uncertainty CDF. If the two curves are highly similar, this indicates that the change in value x1 has no significant effect on the model result s1. Obviously, the factors x2 (Figure 8(b)), x4 (Figure 8(d)), and x9 (Figure 8(i)) have the greatest impact on model output because their univariate uncertainty CDF curves are very different from baseline CDF. At the same time, the factors x1 (Figure 8(a)), x2 (Figure 8(c)), x3 (Figure 8(e)), and x4 (Figure 8(j)) have little effect on the model output because their univariate uncertainty CDF curves almost coincide with the baseline. e results of the UA are identical with those obtained in Figure 6, which indicates that the method can identify inputs that have a significant impact on the output. In this way, the UA result of the proxy model method can be verified in some aspects.

Discussion
e BIM-BDP is used for monitoring shield construction parameters and status during tunnel construction. e database of BIM-BDP, including BIM database, real-time collection database, and manual filling database, supports shield tunneling management and further analysis. en, we performed GSA of the control parameters of the shield machine from BIM-BDP data on SRHT, under high uncertainty. We use the eFAST which has proven to be among the most reliable and efficient GSA methods [13,59]. e finding that significant differences exist in parameters identified as influential and noninfluential, in parameter importance rankings, and in the magnitude of MSI and TSI under high uncertainty has not been reported previously. roughout our research, it is found that the set of significant first-order MSI and total-order TSI sensitivity indices returned by eFAST exhibit the same ranking.
According to the previous research [63,64], LHS develops a procedure to establish correlations between sampled values; hence, the LHS has been adopted as a sampling method in contrast to the samples by the eFAST method. Furthermore, the Pearson correlation coefficient (PCC) (see [65]) has been used to investigate the correlation between samples. e PCCs of samples from BIM-BDP data, LHS, and eFAST are used to obtain 1000 random parameter combinations, and the LHS and eFAST method are illuminated in Figure 9. From Figure 9, the sample PCC from BIM-BDP data is similar to that from LHS. However, the sample PCC from the eFAST method is almost close to zero, which means the eFAST method cannot fit correlation coefficients of the input samples effectively. To be specific, the maximum correlation coefficient from BIM-BDP data is between x4 and x9, which is 0.938 and indicates a strong relevance between x4 and x9. In contrast, the maximum correlation coefficient of x4 and x9 from LHS is 0.947, which is very similar to BIM-BDP data. However, this coefficient of x4 and x9 from the eFAST method is 0.001, which is almost close to zero and quite different from the coefficient from BIM-BDP data and LHS.
In order to elucidate the sample technology by the eFAST method, the parallel coordinates plot of selected samples is shown in Figure 10, which reveals the distribution of samples within their variability ranges. In Figure 10, the parameter ranges are standardized to percentiles and allowed for comparison across parameters.  Figures 10(a), 10(c), and 10(e) show that the samples vary from the entire range, while Figures 10(b), 10(d), and 10(f ) indicate all the samples with the x4 varying from 0.2 to 0.3 (set as an example to illustrate the correlation coefficient of x4 to other parameters). Comparing Figures 10(a), 10(c), and 10(e), we find that the samples do not fill the entire ranges of the parameters from BIM-BDP system in Figure 10(a), while from LHS and eFAST method, the samples are set as the desired distribution (e.g., uniform and normal) over the entire range in Figures 10(b) and 10(c). e samples with the x4 varying from 0.2 to 0.3 are extracted from the entire range. Figure 9 shows that relevance between x4 and x9 is very intense, which can be also demonstrated by Figure 10. From the BIM-DMSCC system, the sample range of x4 is very similar to x9; to be specific, with the x4 varying from 0.2 to 0.3, the x9 almost varied from 0.18 to 0.41 in Figure 10(b). For LHS, the sample range of x4 is also very similar to x9, and with the x4 varying from 0.2 to 0.3, the x9 almost varies from 0.16 to 0.41 in Figure 10(d). However, as to the eFAST method, with the x4 varying from 0.2 to 0.3, the x9 almost varies from 0.01 to 0.81 in Figure 10(f ).
is is because the sampling procedure implemented in eFAST defines a sinusoidal function of a particular frequency for each input parameter [65], i.e., the search curve in equation (8), that assigns a value to x based on the sample number 1 through the total number of samples per search curve, N t . e choice of the sinusoidal function depends on the distribution of parameter values desired (e.g., uniform, normal). Hence, the correlation coefficients of the input samples by the eFASTmethod are quite different from the correlation coefficients of the inputs from BIM-BDP data.
In general, the computational execution time of the model is the major concern when performing the eFAST method [65]. However, the sampling procedure of the eFAST method can be further investigated for the influence on SA results. Hence, more SA methods need to be adopted for comparisons, such as Morris elementary effects [66,67] and Sobol's indices [68,69]. In addition, only 10 input parameters and two output parameters have been regarded as significant parameters and investigated in this study; however, the control system of the superlarge diameter shield machine is very complicated and has more than 900 groups of control parameters. is is because it is difficult to use the metamodel to fit the relevance of so many parameters. More effort to expand the input-output index system would further improve the application scope of GSA results.    x8  x6 x7  x5  x4  x3  x2  x1  x10  x9  x8  x6 x7  x5  x4  x3  x2  x1   x10  x9  x8  x6 x7  x5  x4  x3  x2  x1  x10  x9  x8  x6 x7  x5  x4  x3  x2  x1   x10  x9  x8  x6 x7  x5  x4  x3  x2  x1  x10  x9  x8  x6 x7  x5  x4  x3  x2  x1 Samples from BIM-DMSSC system

Conclusion
It is important to control and adjust the shield parameters for excavation stability, especially in the SRHT project, which needs to cross 1365 meters of soft-hard composite strata in the under-river section. In this paper, BIM technology has been utilized to manage the essential project data of SRHT project in digital format throughout shield excavation, and then the GSA framework has been adopted to investigate the correlation between parameters from the BIM-BDP, which can help the administrators optimize their management scheme of shield parameters during the tunnel excavation. SA can help the administrator identify the importance of the parameters, which can reduce the epistemic uncertainty of project safety during tunnel excavating. In addition, SA helps the administrator determine what type of additional information of parameters should be gathered to enhance project safety and reliability. For the output s1, the x2, x4, x6, x7, x8, x9 are influential parameters according to the MSIs, while x2, x4, and x9 are influential parameters according to the TSIs. e most important parameter is x4 reaching an MSI of up to 0.292 and a TSI of up to 0.331. e next important parameter, x9, reaches an MSI value of up to 0.258 and a TSI value of up to 0.313. e third important parameter, x2, reaches an MSI value of up to 0.195 and a TSI value of up to 0.235. Besides, the MSIs and TSIs of other input parameters are much lower than the above three parameters. For the output s2, the x1, x2, x4, x6, x8, x9, x10 are influential parameters according to the MSIs, whereas x6, x8, x9, x10 are influential parameters according to the TSIs. e most important parameter is x10 reaching an MSI of up to 0.353 and a TSI of up to 0.388. e next important parameter, x9, reaches an MSI value of up to 0.299 and a TSI value of up to 0.380. Besides, the MSIs and TSIs of other input parameters are much lower than those of the above two parameters. e GSA from the BIM-BDP can determine the quantitative correlation between shield parameters. e decision maker can then evaluate the parameter adjustment strategy during shield advancing. ese findings are of general value and can be extrapolated to other similar projects under high uncertainty. Some subsequent research can be investigated in the future. Only ten input parameters were considered in this study. However, it is necessary to study more shield machine control parameters to promote the establishment of more effective models and achieve specific goals more effectively. More complex and effective hybrid models can be studied to more accurately predict the stability of the excavation and optimize the construction plan. In addition, with the accumulation of more data, the modeling accuracy will be greatly improved. Future work can focus on how to reduce system uncertainty by the GSA technology and how to enlarge the application scope.
is is expected to reduce system uncertainty and expand its application range through GSA technology.
Nomenclature Ω: e sample space F: e σ-algebra collection of all measurable events belonging to Ω

P:
A probability measure indicating the likelihood of occurrence of each event Ζ: A set of uncorrelated random variables that represent the uncertainty in the system μ i : e gPCE coefficient E: Wiener-Hermite polynomial chaos N: Positive integer Ρ: e probability distribution density Ε: Weights P: Polynomial order ‖·‖: Euclidean Data Availability e data generated by the superlarge tunnel boring machine used to support the findings of this study have not been made available because of the nondisclosure agreement the research team signed with the property owner and the contractor. 18 Advances in Civil Engineering

Conflicts of Interest
e authors declare that they have no conflicts of interest.