Comparisons of Two Approaches for Geotechnical Model Calibration with Scarce Data

Geotechnical models are usually built upon assumptions and simplifications, inevitably resulting in discrepancies between model predictions and measurements. To enhance prediction accuracy, geotechnical models are typically calibrated against measurements by introducing additional empirical or semiempirical correction terms. Different approaches have been used in the literature to determine the optimal values of the empirical parameters in the correction terms. When measured data are abundant, calibration outcomes using different approaches can be expected to be practically the same. However, if measurements are scarce, calibration outcomes could differ significantly, depending largely on the adopted calibration approach. In this study, we examine the two most commonly used approaches for geotechnical model calibration in the literature, namely, (1) the purely data-catering (PDC) approach and (2) the root mean squared error (RMSE) approach. Here, the purely data-catering approach refers to the selection of empirical parameter values that minimize the coefficient of variation of the model factor while maintaining its mean value at one, based solely on measured data. A real case of calibrating the Federal Highway Administration (FHWA) simplified facing load model for the design of soil nail walls is used to illustrate the differences in practical calibration and design outcomes between the two approaches under scarce data conditions.


Introduction
It has been well recognized that model uncertainty plays a key role in reliability-based design of geotechnical structures [1][2][3][4][5], as it is usually much larger than the uncertainty associated with design parameters (e.g., soil cohesion, unit weight, and internal friction angle). Typically, geotechnical models need to be assessed and calibrated against measured or observed data before being used for design. However, in many cases, the measurements or observations available for model assessment and calibration are limited, mainly for two reasons: first, obtaining in situ geotechnical data is generally costly and time consuming; and second, monitored data have long been undervalued and thus not well collected and pooled, although in recent years geotechnical engineers have started to realize the value of data and to make efforts to make the best use of it [6][7][8][9][10][11][12][13].
Despite this situation, assessment and calibration of geotechnical models using limited data are far better than doing nothing at all [3]. Usually, calibration of a geotechnical model can be done in two steps: (1) introduce an empirical or semiempirical correction term to the model and (2) determine the constants in the correction term according to certain criteria. Based on the calibration criteria, two methods have been widely adopted for geotechnical model calibration in the literature. One is the purely data-catering (PDC) approach, and the other is the root mean squared error (RMSE) approach.
The PDC approach calibrates a model by adjusting the constants to satisfy two criteria: keeping the mean of bias equal to one while minimizing the coefficient of variation (COV) of bias. Here, bias is defined as the ratio of the measured to the predicted value. Bias, also referred to as model bias or model factor elsewhere, is commonly treated as a random variable and used as an indicator for quantification of model accuracy. Previous geotechnical model calibration using the PDC approach can be seen in, e.g., Lin and Liu [14], Lin et al. [15], Lin et al. [16], Phoon and Kulhawy [17], Phoon and Tang [18], Tang and Phoon [19], Yuan et al. [20], and Yuan et al. [21].
On the other hand, the RMSE approach determines the constants as the set that minimizes the root mean squared error between measurements and predictions. Common examples are geotechnical models developed using the response surface method and machine learning methods, e.g., Bathurst and Yu [22], Lin et al. [16], Liu et al. [23], Yu and Bathurst [24], Zhang et al. [25], Zhang et al. [26], Zhang et al. [27], and Zhang et al. [28]. These two calibration approaches are not equivalent, especially when the data for calibration are scarce. This will result in different calibrated values for the constants, which in turn lead to different geotechnical design outcomes. The differences between these two approaches are discussed later in this study.
Although the two approaches are extensively used, the influence of the choice between them on geotechnical calibration outcomes and the resulting consequences has not yet been thoroughly examined. To fill this gap, this study investigates the differences in both calibration and practical design outcomes that result from using the above two calibration approaches. To allow model competence to be compared from a third angle, the Bayesian information criterion (BIC) is employed [29]. A case study of calibrating the default Federal Highway Administration (FHWA) facing load model for facing design of soil nail walls is presented to illustrate the influences.

Approaches for Model Calibration and Ranking
This section introduces in detail the PDC and RMSE approaches for calibration of geotechnical models against observed data. The commonalities and differences of the two approaches are discussed. A likelihood-based model ranking method called the Bayesian information criterion (BIC) is also introduced, which will be used later to quantify the competence of the calibrated models.

Purely Data-Catering Approach.
Let $y = f(x)$ be the geotechnical model to be calibrated. Here, $y$ is the model output, which is taken to be a scalar for simplicity, and $x$ is the model input parameter vector, $x = (x_1, x_2, \ldots, x_n)$. Note that the form of the input parameter vector $x$ is the same across different design scenarios, while the values of its elements could vary. For convenience, we denote the output and input for design scenario $k$ by $y_k$ and $x_k$, respectively.
Let $D = (d_1, d_2, \ldots, d_m)$ be $m$ observed or measured values of $y$ from $m$ real cases, and let $\lambda$ be the model factor (which is a random variable) for $y = f(x)$, with realizations $\lambda_k = d_k / y_k$. Then, based on the method of moments, the sample mean and sample standard deviation of $\lambda$, denoted as $\bar{\lambda}$ and $s_{\lambda}$, respectively, can be computed as
$$\bar{\lambda} = \frac{1}{m}\sum_{k=1}^{m}\frac{d_k}{y_k}, \tag{1}$$
$$s_{\lambda} = \sqrt{\frac{1}{m-1}\sum_{k=1}^{m}\left(\frac{d_k}{y_k} - \bar{\lambda}\right)^{2}}. \tag{2}$$
The purely data-catering (PDC) approach assumes that a model can be adequately (albeit as a compromise) calibrated in terms of $\bar{\lambda}$ and $s_{\lambda}$ obtained by the method of moments. The PDC approach introduces an empirical correction term, $M$, to the original model for calibration purposes, where $M$ is a function of $x$ and $C$, with $C$ being a vector of empirical constants to be determined using $D$. In general, the correction term can be written as $M = f_M(x; C)$. As such, the calibrated model can be expressed as $y' = M f(x) = f_M(x; C)\, f(x)$, where $y'$ is the model output after calibration. The sample mean and sample standard deviation of the model factor for the calibrated model, denoted as $\lambda'$, can then be calculated as
$$\bar{\lambda}' = \frac{1}{m}\sum_{k=1}^{m}\frac{d_k}{y_k'}, \tag{3}$$
$$s_{\lambda'} = \sqrt{\frac{1}{m-1}\sum_{k=1}^{m}\left(\frac{d_k}{y_k'} - \bar{\lambda}'\right)^{2}}. \tag{4}$$
The calibration principle of the PDC approach is to adjust the values of the empirical constants in $C$ until two criteria are simultaneously satisfied: (1) $\bar{\lambda}'$ is equal to 1, and (2) $s_{\lambda'}$ is minimized. Evidently, when the data available for calibration are limited, i.e., $m$ in equations (3) and (4) is not sufficiently large, both $\bar{\lambda}'$ and $s_{\lambda'}$ could be significantly influenced by $m$ and $D$, resulting in unstable calibration outcomes.
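The bias statistics used by the PDC approach can be sketched in a few lines of Python. This is a minimal illustration only: the function name `bias_statistics` and the four data pairs are hypothetical and are not taken from the study, and the sample standard deviation is assumed to use the (m - 1) divisor.

```python
import math

def bias_statistics(measured, predicted):
    """Return (mean, std, COV) of the bias lambda_k = d_k / y_k."""
    biases = [d / y for d, y in zip(measured, predicted)]
    m = len(biases)
    mean = sum(biases) / m
    # Sample standard deviation, assuming the (m - 1) divisor.
    std = math.sqrt(sum((b - mean) ** 2 for b in biases) / (m - 1))
    return mean, std, std / mean

# Hypothetical measured values d_k and model predictions y_k (not from the paper).
measured = [8.0, 12.0, 9.0, 15.0]
predicted = [10.0, 10.0, 12.0, 20.0]

mean, std, cov = bias_statistics(measured, predicted)
print(f"mean bias = {mean:.3f}, std = {std:.3f}, COV = {cov:.3f}")
```

With only four points, changing a single measurement shifts both statistics noticeably, which is the instability under scarce data noted above.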

Root Mean Squared Error (RMSE) Approach.
This method assesses the accuracy of a model by computing the root mean squared error (RMSE) between model predictions (i.e., $y_k$ or $y_k'$) and measured (true) values ($d_k$). The RMSE can be computed as
$$r = \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(d_k - y_k\right)^{2}} \tag{5}$$
for the original model, i.e., $y = f(x)$, and
$$r' = \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(d_k - y_k'\right)^{2}} \tag{6}$$
for the calibrated (corrected) model, i.e., $y' = M f(x)$. The calibration principle of the RMSE method is to select the $C$ that minimizes $r'$. Note that this method does not necessarily result in $\bar{\lambda}' = 1$ with minimal $\mathrm{COV}_{\lambda'}$, where $\mathrm{COV}_{\lambda'} = s_{\lambda'}/\bar{\lambda}'$ is the coefficient of variation of $\lambda'$. Although the RMSE method has been widely used for geotechnical model development and calibration, it does not provide an intuitive impression of model accuracy.
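The remark that minimizing the RMSE does not generally give a mean bias of one can be checked with a small Python sketch. The simplest possible correction term is assumed here, a single constant multiplier c (so y' = c*y); this choice and the data are hypothetical and serve only as an illustration.

```python
import math

def rmse(measured, predicted):
    """Root mean squared error between measurements d_k and predictions y_k."""
    m = len(measured)
    return math.sqrt(sum((d - y) ** 2 for d, y in zip(measured, predicted)) / m)

def mean_bias(measured, predicted):
    """Sample mean of the bias d_k / y_k."""
    return sum(d / y for d, y in zip(measured, predicted)) / len(measured)

# Hypothetical data (not from the paper).
measured = [8.0, 12.0, 9.0, 15.0]
predicted = [10.0, 10.0, 12.0, 20.0]

# Multiplier minimizing sum (d_k - c*y_k)^2 (closed-form least squares).
c_rmse = sum(d * y for d, y in zip(measured, predicted)) / sum(y * y for y in predicted)
# Multiplier that makes the mean bias of the corrected model equal to one.
c_pdc = mean_bias(measured, predicted)

for label, c in (("min-RMSE", c_rmse), ("mean-bias-one", c_pdc)):
    corrected = [c * y for y in predicted]
    print(f"{label}: c = {c:.4f}, RMSE = {rmse(measured, corrected):.4f}, "
          f"mean bias = {mean_bias(measured, corrected):.4f}")
```

The two multipliers differ, so the RMSE-optimal model keeps a mean bias different from one, while the model with a mean bias of one has a larger RMSE.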

Model Competence Ranking.
The Bayesian information criterion (BIC) is a relative measure of goodness-of-fit among models given observed data and has been widely used in ranking model competence [29,30]. The BIC is computed as
$$\mathrm{BIC} = \kappa \ln(n) - 2\ln(L_{\max}), \tag{7}$$
where $\ln(L_{\max})$ is the log of the maximum likelihood, $\kappa$ is the number of model parameters, and $n$ is the number of data points. Typically, the maximum value of the likelihood function ($L$) and the corresponding maximum likelihood estimators can be found using numerical methods. Technical descriptions of the maximum likelihood estimation method, e.g., the construction of a likelihood function, are not provided here for brevity. Interested readers are directed to, e.g., Juang et al. [31] and Lin and Liu [14] for more details.
The criterion simply states that the smaller the BIC value of a fitted model, the better the model captures the observations. It should be emphasized that the absolute value of the BIC itself is meaningless in terms of model competence; only the difference between BICs helps in ranking the models.
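As a minimal sketch of how a BIC value could be obtained for a set of bias values, the Python snippet below assumes a normally distributed bias, for which the maximum log-likelihood has a closed form. The function names and sample values are hypothetical; other distributions would generally need a numerical optimizer.

```python
import math

def normal_max_log_likelihood(samples):
    """Maximum log-likelihood of a normal distribution fitted to the samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n  # the MLE uses the divisor n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def bic(ln_l_max, kappa, n):
    """BIC = kappa * ln(n) - 2 * ln(L_max)."""
    return kappa * math.log(n) - 2.0 * ln_l_max

# Hypothetical bias values (not from the paper).
biases = [0.80, 1.20, 0.75, 0.75, 1.05, 0.95]
ln_l = normal_max_log_likelihood(biases)
print(f"ln(L_max) = {ln_l:.4f}, BIC = {bic(ln_l, kappa=2, n=len(biases)):.4f}")
```

Here kappa = 2 counts the mean and standard deviation of the fitted distribution.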

Discussion.
Although the above two approaches have been widely adopted for geotechnical model calibrations based on observed data, there are some fundamental differences in calibration outcomes and interpretations, as given in Table 1.
The PDC approach uses two indicators to jointly describe the model accuracy, i.e., $\bar{\lambda}'$ and $s_{\lambda'}$, whereas the RMSE approach uses only one indicator, $r'$. Typically, $\bar{\lambda}'$ is interpreted as an indicator of on-average accuracy, while $s_{\lambda'}$ is taken as an indicator of dispersive accuracy. The advantage of such an accuracy assessment scheme is that it provides an immediate and general idea of the performance of a model. For example, $\bar{\lambda}' = 0.90$ and $\mathrm{COV}_{\lambda'} = s_{\lambda'}/\bar{\lambda}' = 0.30$ suggest that overall model predictions are 10% larger than the corresponding observations, while the dispersion in prediction accuracy is 30%. Clearly, a perfect model would have $\bar{\lambda}' = 1$ and $\mathrm{COV}_{\lambda'} = 0$. The disadvantage is the limited ability to compare accuracies among models. This can be easily seen for two models, for example, A and B, where A has $\bar{\lambda}' = 0.90$ and $\mathrm{COV}_{\lambda'} = 0.30$ and B has $\bar{\lambda}' = 0.80$ and $\mathrm{COV}_{\lambda'} = 0.20$. In such a case, model A has a better on-average accuracy, but its predictions are more dispersive, whereas model B has a lower on-average accuracy, but its predictions spread less. Hence, it is difficult to directly determine which model is more accurate without further analysis.
For the RMSE approach, the smaller the $r'$, the better the model accuracy. For a perfect model, $r' = 0$. As this method uses a single index for model assessment, accuracy comparison among models is straightforward. However, this approach does not provide an intuitive sense of the accuracy of the model itself.
Another difference is the criteria set for calibration. The PDC approach minimizes $s_{\lambda'}$ or $\mathrm{COV}_{\lambda'}$ conditioned on $\bar{\lambda}' = 1$. A mean of $\lambda'$ equal to one indicates that the calibrated model is unbiased on average, within the context of the observations $D$.
The objective of the RMSE approach is to minimize $r'$ to obtain the best accuracy. As pointed out earlier, this is usually not equivalent to the criteria of the PDC approach.
Generally, both the PDC and RMSE approaches can only handle uncensored data. If observation data are censored, then more robust approaches such as the maximum likelihood method or the Bayesian inference technique can be employed. However, to do so, the maximum likelihood method and the Bayesian approach would first have to assume the probability distribution of $\lambda'$ so as to construct the likelihood function $L$. Therefore, the estimators from these approaches are conditioned on the distribution of $\lambda'$. For the PDC and RMSE approaches, $\bar{\lambda}'$, $s_{\lambda'}$, and $r'$ are computed without any assumption about the distribution of $\lambda'$.
Last, it is pointed out that while each measurement or observation is equally weighted in the PDC calibration approach, it is not the case in the RMSE calibration approach. For the RMSE approach, measurements with large values weigh much more than those with small values in the calibration process.
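The weighting effect described above can be seen in a small numerical sketch. The two data points are hypothetical: both predictions overestimate the measurement by the same 20%, yet their contributions to the squared error differ by two orders of magnitude.

```python
# Hypothetical measurements and predictions (not from the paper).
measured = [10.0, 100.0]
predicted = [12.0, 120.0]  # each prediction is 20% above its measurement

# Contribution of each point to the sum of squared errors used by the RMSE.
squared_errors = [(d - y) ** 2 for d, y in zip(measured, predicted)]
shares = [e / sum(squared_errors) for e in squared_errors]

# Bias of each point, as used by the PDC approach.
biases = [d / y for d, y in zip(measured, predicted)]

for d, share, bias in zip(measured, shares, biases):
    print(f"d = {d:6.1f}: share of squared error = {share:.3f}, bias = {bias:.3f}")
```

The two biases are identical, so both points count equally in the bias statistics, whereas the larger measurement dominates the squared-error sum.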

Case Study
This section presents a case study to illustrate the difference in model calibration outcomes using the two approaches. The example is to calibrate the default Federal Highway Administration (FHWA) simplified model for computation of facing loads of soil nail walls using a total of 23 measured data points. These data were collected by Liu et al. [32] from the literature. They correspond to facing loads monitored during or at completion of wall construction. Hence, they should be interpreted as "short-term" facing loads.
In the following, the measured facing load database established by Liu et al. [32] is first introduced, followed by a brief review of the default FHWA simplified facing load model. Section 3.3 presents the calibration results along with comparisons and discussion. Note that calibration of the FHWA facing load model has been done by Liu et al. [32] using the PDC approach. Section 3.4 shows how the selection of the calibration approach would affect the practical design of the facing of soil nail walls.
Figure 1 shows the side and front views of the facing of a typical soil nail wall. Nails are structurally connected to the facing at their heads. As the wall deforms, lateral active earth pressures act on the facing and are then transferred to the nails through the nail-facing connections. In the equilibrium state, a nail is responsible for the lateral earth pressure within a tributary area centered on the nail head. The product of the earth pressure and the tributary area is referred to as the nail head tensile load or facing load (Figure 1(a)) in this study. Here, the tributary area is the product of the horizontal and vertical nail spacings (Figure 1(b)). Liu et al. [32] developed a database containing measured long-term and short-term facing loads. The short-term load data are extracted and briefly reviewed here for the reader's convenience. They collected 31 short-term load data points in total; however, 8 were identified as questionable and thus excluded from the analyses. Table 2 provides the wall geometry, soil properties, facing type, and nail spacing for the soil nail walls from which the remaining 23 data points were obtained. The data were collected from five wall sections, ranging from 4 to 12 m in height. All walls had vertical or steep facings and horizontal back slopes. Four walls were in cohesionless soils, while one was in cohesive soil. Two walls were subjected to surcharge, in addition to soil self-weight.
The facings were constructed with shotcrete or concrete panels. Nail spacings were set between 1 and 2 m, which is typical. Readers are directed to Liu et al. [32] for a detailed description of the collected facing load data.

FHWA Facing Load Models.
The facing of a soil nail wall, as shown in Figure 1, can be simplified as a continuous two-way slab. According to the FHWA soil nail wall design manual [33], under working conditions, the maximum facing load due to the lateral active earth pressure, $T_f$, can be calculated as
$$T_f = \alpha\,\eta\,K_a\left(\gamma H + q_s\right) S_h S_v, \tag{8}$$
where $\alpha$ is the empirical spacing factor expressed as $\alpha = 0.6 + 0.2(S_{\max} - 1)$; $S_{\max}$ (unit: m) is the larger of the horizontal and vertical nail spacings, $S_h$ and $S_v$, respectively; $\eta$ is the empirical depth factor expressed as $\eta = 1.25\,h/H + 0.$ …, where $h$ and $H$ are the depth and wall height, respectively; and $K_a$, $\gamma$, and $q_s$ are the Coulomb active earth pressure coefficient, soil unit weight, and surcharge, respectively. The measured facing loads ($T_m$) are plotted against the corresponding $T_f$ predicted using equation (8), as shown in Figure 2(a). The data points in the figure appear to form two clusters: one around the 1 : 1 correspondence line and the other below the line. This suggests that the current default FHWA facing load model is conservative, as it generally overestimates the maximum facing loads.
This observation is confirmed by computing the sample mean ($\bar{\lambda}$) and sample COV ($\mathrm{COV}_{\lambda}$) of $\lambda$ for equation (8), which are $\bar{\lambda} = 0.77$ and $\mathrm{COV}_{\lambda} = 0.669$. Hence, on average, equation (8) overpredicts the maximum facing loads by 23%, and the predictions are highly dispersive, according to the ranking scheme proposed by Phoon and Tang [18]. Furthermore, Figure 2(b) shows that $\lambda$ tends to decrease as $T_f$ increases. The dependency is quantitatively confirmed at a significance level of 0.05 by the Spearman's rank correlation test results that are also given in the figure. Such a dependency is undesirable, and its effects on reliability-based geotechnical design have been investigated by Lin and Bathurst [34]. As a result, it is necessary to calibrate equation (8) to improve its accuracy.
To identify the sources of model errors and the abovementioned dependency, $\lambda$ values are plotted against each input parameter of equation (8), as shown in Figure 3. Spearman's rank correlation test results show that $\lambda$ is statistically correlated with $\alpha$, $K_a$, and $S_h S_v$ at a significance level of 0.05. Therefore, a correction term that is a function of these three parameters can be introduced to the equation for calibration, i.e., $M = f(\alpha, K_a, S_h S_v)$. However, since $\alpha$ and $S_h S_v$ are highly correlated, only $S_h S_v$ is kept, while $\alpha$ is removed from the formulation of $M$ for simplification. Moreover, $K_a$ is removed from $M$ to further simplify the calibration, as the focus of this study is on the comparison of calibration approaches rather than model development. Last, a power form expression is assumed for $M$, and thus, equation (8) becomes
$$T_f' = M\,T_f = a\left(\frac{S_h S_v}{A_t}\right)^{b} T_f, \tag{9}$$
where $T_f'$ is the calibrated facing load and $T_f$ on the right-hand side is given by equation (8); $a$ and $b$ are the empirical constants to be determined using the 23 measured facing load data points; and $A_t = 2.25$ m$^2$ is the typical tributary area used to make $M$ dimensionless. The PDC and RMSE approaches discussed earlier in this study are then used to determine the values of $a$ and $b$. The results are shown and discussed in the next section.
Calibration of equation (9) is carried out in this section. The accuracy of the calibrated models is then compared based on the sample mean and sample COV and also the root mean squared errors (RMSEs). After that, the accuracy is rescrutinized from a maximum likelihood perspective, which helps in understanding the calibration outcomes. Table 3 provides the calibration outcomes for equation (9) using the two approaches. By using the PDC approach, the constants in equation (9) …
Second, based on the bias statistics of equation (9), the PDC approach appears to be superior, as its results are unbiased on average and less dispersive compared to those by the RMSE approach (i.e., 4% overestimation and 2% more dispersive).
Interestingly, comparing the RMSEs, one would easily reach the opposite conclusion that the RMSE approach is better than the PDC approach, as it gives a smaller RMSE, 35.9635 against 38.3618. Therefore, it is difficult to judge which one is better based merely on the results given in Table 3. The comparison should be made from a third angle, which will be discussed in the next subsection.
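The two calibration routes for a power-form correction term can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the study: the correction is taken as M = a*x^b with x standing for the normalized tributary area, the six data triples are hypothetical, and b is found by a simple grid search. For a fixed b, the constant a has a closed form in both cases: the mean of d/(x^b*y) for the PDC criterion, and the least-squares multiplier for the RMSE criterion.

```python
import math

def bias_mean_cov(a, b, x, y, d):
    """Sample mean and COV of the bias d_k / (a * x_k**b * y_k)."""
    lam = [dk / (a * xk ** b * yk) for xk, yk, dk in zip(x, y, d)]
    mean = sum(lam) / len(lam)
    std = math.sqrt(sum((v - mean) ** 2 for v in lam) / (len(lam) - 1))
    return mean, std / mean

def rmse(a, b, x, y, d):
    """RMSE between measurements and the corrected predictions."""
    return math.sqrt(sum((dk - a * xk ** b * yk) ** 2
                         for xk, yk, dk in zip(x, y, d)) / len(d))

def calibrate_pdc(x, y, d, b_grid):
    # The COV does not depend on a, so choose b by COV, then scale a for mean = 1.
    b = min(b_grid, key=lambda bb: bias_mean_cov(1.0, bb, x, y, d)[1])
    a = bias_mean_cov(1.0, b, x, y, d)[0]
    return a, b

def least_squares_a(b, x, y, d):
    # For fixed b, the RMSE-optimal a is the least-squares multiplier.
    p = [xk ** b * yk for xk, yk in zip(x, y)]
    return sum(dk * pk for dk, pk in zip(d, p)) / sum(pk * pk for pk in p)

def calibrate_rmse(x, y, d, b_grid):
    b = min(b_grid, key=lambda bb: rmse(least_squares_a(bb, x, y, d), bb, x, y, d))
    return least_squares_a(b, x, y, d), b

# Hypothetical data: normalized tributary area, predicted load, measured load.
x = [0.44, 0.64, 1.00, 1.00, 1.44, 1.78]
y = [20.0, 30.0, 45.0, 50.0, 60.0, 80.0]
d = [18.0, 22.0, 30.0, 28.0, 35.0, 40.0]
b_grid = [i / 100.0 for i in range(-300, 301)]

a_pdc, b_pdc = calibrate_pdc(x, y, d, b_grid)
a_rmse, b_rmse = calibrate_rmse(x, y, d, b_grid)
for label, a, b in (("PDC", a_pdc, b_pdc), ("RMSE", a_rmse, b_rmse)):
    mean, cov = bias_mean_cov(a, b, x, y, d)
    print(f"{label}: a = {a:.3f}, b = {b:.2f}, mean bias = {mean:.3f}, "
          f"COV = {cov:.3f}, RMSE = {rmse(a, b, x, y, d):.3f}")
```

By construction, the PDC result has the smaller bias COV and the RMSE result has the smaller RMSE over the same grid, which mirrors the conflicting rankings discussed above.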

Comparisons between PDC and RMSE Approaches.
Last, both approaches are effective and efficient in model calibration for accuracy enhancement. The essence of calibration is to move the data points towards the 1 : 1 correspondence line in general, as shown in Figure 4. The difference is that the $\lambda'$ values by the PDC approach seem to be more skewed, while those by the RMSE approach appear to be more uniform. Despite this, the $\lambda'$ values by the two approaches follow closely along the 1 : 1 correspondence line, and thus, the two methods in general do not lead to drastically different outcomes.

Rescrutinization from a Maximum Likelihood Perspective.
The accuracies of the three facing load models, i.e., default FHWA, calibrated FHWA by PDC, and calibrated FHWA by RMSE, are reassessed in this section using the maximum likelihood method. The cumulative distributions of $\lambda$ and $\lambda'$ are shown in Figure 5. Kolmogorov-Smirnov tests are applied to the three bias datasets, and the results show that all the datasets can be considered as both normally and log-normally distributed at a significance level of 0.05. Hence, the maximum likelihood estimation (MLE) is carried out assuming normal and log-normal $\lambda$ and $\lambda'$. The estimation outcomes are given in Table 4.
For both calibrated cases, the means estimated by MLE are practically the same as the sample means, while the COVs estimated by MLE are slightly less than the sample COVs, i.e., by about 2-4%, which is practically negligible. However, for the default model case (i.e., equation (8)), the bias COV estimated by MLE assuming log-normal $\lambda$ is much larger than the sample COV or that assuming normal $\lambda$, i.e., about 20% higher.
For the normal case, the computed maximum log-likelihood values $\ln(L_{\max})$ are −12.8317 and −12.9868 for the PDC and RMSE approaches, respectively. For both models, the number of parameters is $\kappa = 2$ and the number of data points is $n = 23$; by using equation (7), the BIC values are calculated to be 31.93 and 32.24, respectively. This means that, if $\lambda'$ is a normal random variable, then in this case the PDC calibration approach is better than the RMSE approach, as the BIC value for the former is smaller. On the other hand, for the log-normal case, $\ln(L_{\max})$ is −9.5680 if using the PDC approach and −8.6353 if using the RMSE approach. The corresponding BIC values are 25.41 and 23.54. This suggests that if $\lambda'$ is a log-normal random variable, then model calibration using the RMSE approach is preferable. For both approaches, the $\ln(L_{\max})$ values for the log-normal case are always greater (less negative) than those for the normal case; hence, it can be said that $\lambda'$ is more likely a log-normal random variable. Comparing the four $\ln(L_{\max})$ values, one would conclude that the RMSE-log-normal scenario is the best one, as its $\ln(L_{\max})$ is the largest and, equivalently, its BIC is the smallest.
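The four BIC values quoted above can be reproduced from the reported maximum log-likelihoods, assuming the standard form BIC = kappa*ln(n) - 2*ln(L_max). The short Python check below uses only numbers stated in the text; the dictionary layout is just an illustrative choice.

```python
import math

def bic(ln_l_max, kappa, n):
    """BIC = kappa * ln(n) - 2 * ln(L_max)."""
    return kappa * math.log(n) - 2.0 * ln_l_max

# Maximum log-likelihood values reported in the text (kappa = 2, n = 23).
reported = {
    ("normal", "PDC"): -12.8317,
    ("normal", "RMSE"): -12.9868,
    ("log-normal", "PDC"): -9.5680,
    ("log-normal", "RMSE"): -8.6353,
}

bic_values = {key: bic(ln_l, kappa=2, n=23) for key, ln_l in reported.items()}
for (dist, approach), value in bic_values.items():
    print(f"{dist:>10s} / {approach:<4s}: BIC = {value:.2f}")
```

Since kappa and n are identical for all four cases, the BIC ranking here is the same as the ranking by maximum log-likelihood.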

Practical Influences.
Analyses presented earlier show how the selection of a calibration approach affects the calibration outcomes, i.e., $C = (a, b)$. In this part, we investigate the influence of the calibration outcomes on the practical design of the facing of soil nail walls. The facing design must ensure adequate margins of safety against various limit states, including facing flexural, punching shear, and headed-stud tensile failures. For illustration purposes, here we only consider the facing flexural limit state, which requires estimation of the maximum facing load and the ultimate facing flexural capacity. The primary design parameter for this limit state is the reinforcement ratio, i.e., the reinforcement cross-sectional area per unit width at the nail head and at midspan.
The influences of the two calibration approaches are assessed by using $C$ to compute the maximum facing loads and to determine the reinforcement ratio. The example wall used for analysis is 10 m in height with a horizontal back slope ($\beta = 0°$) and a vertical facing ($\omega = 0°$). The means of the soil parameters are assumed to be $c = 0$ kPa, $\phi = 30°$, and $\gamma = 18$ kN/m$^3$. The COVs are taken as 0.15 for $\phi$ and 0.075 for $\gamma$. Nails are spaced at 1.2 m horizontally and vertically. …
The $T_f$ computed using the FHWA model calibrated by the RMSE approach is about 36 kN. The difference in $T_f$ using the two approaches is about 8%. Figure 7 shows the distributions of facing loads corrected by biases, $\lambda T_f$ or $\lambda' T_f$, where the statistics of $\lambda$ and $\lambda'$ are estimated by the method of moments (referred to as sample bias) and the maximum likelihood method (referred to as MLE bias). Visually, the distributions of $\lambda' T_f$ by the PDC and RMSE approaches are highly similar, regardless of the method used to compute the bias statistics. Compared to the calibrated FHWA cases, the distributions of $\lambda T_f$ for the default FHWA case move leftward, meaning that $\lambda T_f$ is smaller overall than $\lambda' T_f$.
There is a noticeable difference in the $\lambda T_f$ distributions by sample bias and MLE bias, mainly due to the large difference in the COV of $\lambda T_f$, as given in Table 5, which summarizes the computed means and COVs of $\lambda T_f$ and $\lambda' T_f$ at $h/H = 0.5$ using different calibration approaches and bias estimation approaches. On the other hand, $\lambda T_f$ for the default FHWA case has longer tails than those for the calibrated FHWA cases; this is due to the much higher COVs of $\lambda T_f$ compared to those of $\lambda' T_f$. Last, it is observed that, in general, the differences in the means and COVs of $\lambda' T_f$ based on the RMSE and PDC approaches are practically insignificant, as given in Table 5.

On Facing Design.
Considering identical reinforcement cross-sectional areas per unit width at the nail head and at midspan, $(a_n + a_m)$, in the horizontal and vertical directions, and with $S_h = S_v$, the facing flexural capacity, $R_F$, is calculated as [33]
$$R_F = \frac{C_F}{265}\left(a_n + a_m\right) t f_y, \tag{10}$$
where $C_F$ is the empirical factor accounting for the nonuniformity of soil pressures behind the facing and is equal to 2.0, 1.5, and 1.0 for temporary walls when the facing thickness is $t = 100$, 150, and 200 mm, respectively [35], and $f_y$ is the reinforcement tensile yield strength. In this example, the facing thickness is taken as $t = 100$ mm, and thus, $C_F = 2.0$. The nominal value of $f_y$ is taken as 414 MPa. The reinforcement area $(a_n + a_m)$ is the main design parameter to be determined given a target margin of safety (e.g., factor of safety or reliability index). The design factor of safety for this limit state can be calculated as
$$FS = \frac{R_F}{T_f}. \tag{11}$$
The performance function, $g_F$, can be written as
$$g_F = \lambda_R R_F - \lambda_L T_f, \tag{12}$$
where $\lambda_R$ and $\lambda_L$ are the model factors (biases) for $R_F$ and $T_f$, respectively. In this study, $\lambda_L$ is $\lambda$ if using the default FHWA model and $\lambda'$ if using the calibrated FHWA models. $\lambda_R$ is taken as a log-normal random variable with a mean of 1.1 and a COV of 0.1 [36,37].
Table 6 provides the design outcomes of $(a_n + a_m)$ using the deterministic approach, where the margin of safety is taken as $FS = 2.0$, and using the reliability approach, where the margin of safety is taken as $\beta_T = 3.50$. Here, $\beta_T$ is the target reliability index, and $\beta_T = 3.50$ roughly corresponds to a probability of failure of 1/5000. Note that these $FS$ and $\beta_T$ values are consistent with those recommended in the FHWA soil nail wall design manual by Lazarte et al. [33] and by AASHTO [38]. Based on the deterministic approach (equation (11)), using the FHWA facing load model calibrated by the PDC approach gives the smallest $(a_n + a_m)$, which is 212 mm$^2$/m, whereas the default FHWA model gives the highest value, i.e., 266 mm$^2$/m. The difference is about 20%.
However, based on the reliability approach, the computed $(a_n + a_m)$ values using the default FHWA model are much larger than those using the calibrated FHWA models, i.e., 730 versus 446 and 492, and 1135 versus 430 and 447. The difference is about 40-60%. The design outcomes by the FHWA models calibrated by the PDC and RMSE approaches are broadly similar, although those by the RMSE approach are slightly higher. Figure 8 shows the $(a_n + a_m)$ values with respect to $FS$ ranging from 1 to 5 and $\beta_T$ ranging from 2 to 4 using the default and calibrated FHWA facing load models. It confirms that the difference in $(a_n + a_m)$ is insignificant between the two calibrated FHWA models; both are much less than those obtained by the default model. This highlights the importance of performing geotechnical model calibration, while the influence of the selection of the model calibration approach is secondary. Tables 5 and 6 and Figure 8 together
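The deterministic design step can be sketched numerically. The snippet below assumes the flexural capacity has the form R_F = (C_F / 265) * (a_n + a_m) * t * f_y, with (a_n + a_m) in mm^2/m, t in metres, and f_y in MPa, giving R_F in kN; this unit convention is an assumption that appears consistent with the values quoted in the text. The function names are hypothetical.

```python
def facing_flexural_capacity(area_mm2_per_m, c_f, t_m, f_y_mpa):
    """R_F in kN, assuming area in mm^2/m, thickness in m, and f_y in MPa."""
    return c_f / 265.0 * area_mm2_per_m * t_m * f_y_mpa

def required_reinforcement(t_f_kn, fs, c_f, t_m, f_y_mpa):
    """Reinforcement area (a_n + a_m) in mm^2/m such that R_F / T_f equals FS."""
    return fs * t_f_kn * 265.0 / (c_f * t_m * f_y_mpa)

# Values stated in the text: t = 100 mm, C_F = 2.0, f_y = 414 MPa, FS = 2.0,
# and a facing load of about 36 kN for the RMSE-calibrated model.
area_rmse = required_reinforcement(t_f_kn=36.0, fs=2.0, c_f=2.0, t_m=0.1, f_y_mpa=414.0)
print(f"required (a_n + a_m) for T_f = 36 kN: {area_rmse:.1f} mm^2/m")

# Facing load implied by the reported 212 mm^2/m (PDC-calibrated model) at FS = 2.0.
t_f_pdc = facing_flexural_capacity(212.0, c_f=2.0, t_m=0.1, f_y_mpa=414.0) / 2.0
print(f"implied T_f for 212 mm^2/m: {t_f_pdc:.1f} kN")
```

Because the required area scales linearly with T_f at a fixed FS, the roughly 8% difference in facing load between the two calibrated models carries over directly to the deterministic reinforcement demand.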

Conclusions
Calibration of geotechnical models in many cases has to be carried out with scarce data. This study examines two approaches that have been widely adopted for geotechnical model calibration in the literature, namely, the purely data-catering (PDC) approach and the root mean squared error (RMSE) approach. The PDC approach calibrates a model by adhering to two criteria: maintaining a mean of model bias of one while minimizing the COV of model bias, where model bias is defined as the ratio of the measured to the predicted value. The RMSE approach calibrates a model by minimizing the root mean squared error between measured and predicted values. A case study is presented to illustrate the influence of the selection of the model calibration approach from a practical point of view. The case study is on calibration of the default Federal Highway Administration (FHWA) simplified facing load model for facing design of soil nail walls. A total of 23 measured facing load data points collected by Liu et al. [32] are adopted for calibration. Calibration results confirm that the two approaches are not equivalent when the data available for calibration are scarce. A model calibrated by the PDC approach usually does not reach the minimal RMSE, and vice versa.
The Bayesian information criterion (BIC) is introduced to rank the competence of the PDC- and RMSE-calibrated models in fitting the data. According to the BIC, a model calibrated by the PDC approach may or may not be superior to its counterpart calibrated by the RMSE approach, depending on the assumed distribution of model bias.
The PDC- and RMSE-calibrated models are then used for estimation of facing loads and design of the reinforcement ratio against the facing flexural limit state using both deterministic and reliability-based design approaches. It is demonstrated that the estimated facing loads and the determined reinforcement ratios using the two calibrated models do not differ significantly from each other. Therefore, in practice, either approach can be adopted for geotechnical model calibration even with scarce data.
Last, there are also other approaches for model calibration, for example, the Bayesian inference technique. The Bayesian approach provides distributions rather than point estimates of model parameters. This is the basic difference between the Bayesian approach and the PDC and RMSE approaches. Discussion on parameter determination using Bayesian approaches can be found in, e.g., Lin and Yuan [39] and Lin et al. [40].
Figure 8: Plots of design outcomes of $(a_n + a_m)$ against $FS$ ranging from 1 to 5 and $\beta_T$ ranging from 2 to 4.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.