An Improved Bayesian Structural Identification Using the First Two Derivatives of Log-Likelihood Measure

The posterior density of structural parameters conditioned on the measurements is obtained by a differential evolution adaptive Metropolis (DREAM) algorithm. The surface of the formal log-likelihood measure is studied under measurement-error uncertainty to illustrate the problem of equifinality. To overcome this problem, the first two derivatives of the log-likelihood measure are used to formulate a new informal likelihood measure that improves the accuracy of the estimator. The proposed measure also reduces the standard deviation (uncertainty range) of the posterior samples. The benefit of the proposed approach is demonstrated by simulations on identifying structural parameters with limited output data and noise-polluted measurements.


Introduction
Recent years have witnessed growing interest in Bayesian estimation for structural parametric systems that quantifies the inevitable uncertainties, such as measurement error and structural model error, as reviewed by Simoen et al. [1]. In particular, Beck and Au [2] used Laplace's method of asymptotic approximation to obtain the posterior PDF in a small-dimensional parameter space. To solve higher dimensional problems, Muto and Beck [3] developed an adaptive Markov chain Monte Carlo (MCMC) simulation for Bayesian model updating. Gibbs sampling and transitional Markov chain Monte Carlo (TMCMC) were used by Ching and Chen [4] to obtain the posterior PDF of parameters. Cheung and Beck [5] used a hybrid Monte Carlo method, known as the Hamiltonian Markov chain, to solve higher dimensional model updating problems. Huhtala and Bossuyt [6] explored a Bayesian inference framework to solve the inverse problem of locating structural damage. An et al. [7] proposed statistical model parameter estimation using Bayesian inference when parameters are correlated and the observed data are noisy. Green [8] used a novel MCMC algorithm, data annealing, which is similar to simulated annealing, for the Bayesian identification of a nonlinear dynamic system.
The difficulty of Bayesian estimation lies in the efficiency with which the posterior samples of the Markov chain converge to the acceptable model set. Moreover, because of the noise-corrupted measurements, the surface of the prediction error lies on a hypersurface of a multidimensional parameter space. This causes the probability-density surface of the posterior sequences to have multiple regions of attraction and numerous local optima, and it inevitably yields a biased estimator (whether the maximum likelihood, ML, or the maximum a posteriori, MAP, estimator). This problem is known as "equifinality" [9-11]. The surfaces of the prediction error using formal likelihood measures, namely the maximum log-likelihood (ML), are studied here. From these fitness surfaces, it can be concluded that the formal likelihood measure underestimates or overestimates the uncertainty intervals of the posterior samples, because several candidate models in the neighborhood of the estimator also attain high likelihood values.
In this paper, the bias between the ML/MAP estimator and the actual value is derived via a Taylor expansion. It is found that the gradient and Hessian matrix of the likelihood measure can bridge the biased estimator and the actual value; they are therefore used to improve the accuracy of the posterior samples. The parameter estimation problem is cast as a two-step strategy. In the first step, the MAP/ML estimator is obtained from the formal Bayesian likelihood measure using the differential evolution adaptive Metropolis (DREAM) algorithm. In the second step, a new fitness measure is proposed, which can be seen as an informal likelihood measure under the framework of generalized likelihood uncertainty estimation (GLUE) [12-14]. Numerical examples of linear structural systems are presented, with which the effectiveness and efficiency of the proposed method are investigated.

Least Squares (LS) Estimator for the Inverse Problem
Let y_m(t) denote the measured response at each time step t (t = 1, ..., N_t) and ŷ_m(x, t) denote the output of a candidate model. The difference between the measured response and the model output defines the residual error e_m(x, t) = y_m(t) − ŷ_m(x, t), where m = 1, ..., N_o and N_o is the number of outputs. The common approach to the inverse problem is to force the residual vector as close to zero as possible by tuning the model parameter vector x. The fitness measure can thus be defined as the sum of squared residuals (SSR):

J(x) = Σ_{m=1}^{N_o} Σ_{t=1}^{N_t} e_m(x, t)².  (1)

This is a d-dimensional optimization problem that maximizes the likelihood measure of the SSR (equivalently, minimizes the LS measure). Such a measure, however, provides only a point estimate of the optimal value x*. To quantify the uncertainty of the estimator, one must estimate the underlying posterior PDF of the parameters, p(x(θ) | D), which falls within the framework of Bayesian probabilistic estimation.
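As a minimal sketch of the residual and SSR measure above (all names, and the one-parameter toy model, are illustrative assumptions rather than the paper's code):

```python
import numpy as np

def residuals(y_meas, y_model):
    """Residual error e(x, t) = y_m(t) - yhat(x, t)."""
    return y_meas - y_model

def ssr(y_meas, y_model):
    """Sum of squared residuals over all time steps (the LS measure)."""
    e = residuals(y_meas, y_model)
    return float(np.sum(e ** 2))

# Toy example: a one-parameter model yhat(k, t) = k * t compared against
# a noise-free "measurement" generated with true k = 2.
t = np.linspace(0.0, 1.0, 50)
y_meas = 2.0 * t
candidates = [1.5, 2.0, 2.5]
scores = {k: ssr(y_meas, k * t) for k in candidates}
best = min(scores, key=scores.get)   # minimizing SSR picks the true parameter
```

Minimizing the SSR over the candidate set recovers the true parameter here, but, as the text notes, it yields only a point estimate without uncertainty.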

Bayes Estimate Using Formal Log-Likelihood (LL) Measures.
In the Bayesian estimation framework, the model set M is a class of probabilistic models, each of which predicts the response of the actual system. The identification problem is to infer the plausibility of each candidate model through a posterior density conditioned on the measured data, p(x(θ) | D, M); it is not a quest for the true structural parameters. Here x(θ) is a stochastic parameter vector defining each possible model in the model set. The model set M is defined by the random parameters x(θ) = (x_1(θ), x_2(θ), ..., x_d(θ)) ∈ R^d (θ ∈ Ω, the random variables of the probability space Ω), where d is the number of parameters of a model M_j ∈ M and N_s is the number of stochastic samples. The initial plausibility of each model parameterized by x(θ_i) (i = 1, 2, ..., N_s) is defined by a prior density function p(x(θ) | M). The updated plausibility of the I/O model follows from Bayes' theorem:

p(x(θ) | D, M) = c · p(D | x(θ), M) · p(x(θ) | M),  (2)

where p(D | x(θ), M) is the likelihood measure L(x(θ)) and c is a normalizing constant (the inverse of the model evidence). If the measurement error is assumed to obey a Gaussian distribution with constant variance σ_m² for the m-th observed response, the posterior PDF in (2) becomes

p(x(θ) | D, M) ∝ exp(−(1/2) Σ_{m=1}^{N_o} Σ_{t=1}^{N_t} e_m(x, t)² / σ_m²) · p(x(θ) | M),  (3)

with the Metropolis-Hastings (MH) acceptance ratio given in (4). A clear advantage of the MH algorithm is that, when computing the acceptance ratio, there is no need to evaluate the model evidence, since the constant c cancels out. The transition of samples generates a Markov chain (x_0, x_1, ..., x_k, ...). Following a burn-in period, the Markov chain approaches its stationary distribution, and the samples after the burn-in period converge to the posterior PDF, as in (2). From (4), it can be seen that the Bayesian estimate relies on the likelihood measure p(D | x(θ), M). It is more convenient to use the logarithm of the likelihood measure, LL(x(θ)), rather than the likelihood function itself:

LL(x(θ)) = −(1/2) Σ_{m=1}^{N_o} Σ_{t=1}^{N_t} e_m(x, t)² / σ_m² + const.  (5)

Either the log-likelihood measure in (5) or the least squares measure in (1) obeys the rule of "goodness-of-fit," because only models with high likelihood values are accepted by the MH method.
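The role of the log-likelihood in the MH acceptance step can be sketched as follows. The Gaussian log-likelihood of (5) is used; the scalar toy model, step size, and chain length are illustrative assumptions, and the evidence constant cancels because only log-likelihood differences enter the acceptance test:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(x, y_meas, model, sigma):
    """Gaussian log-likelihood, LL = -0.5 * sum(e^2) / sigma^2 (up to a constant)."""
    e = y_meas - model(x)
    return -0.5 * np.sum(e ** 2) / sigma ** 2

def mh_accept(ll_new, ll_old, rng):
    """Metropolis rule on log-likelihoods; the normalizing constant cancels."""
    return np.log(rng.random()) < (ll_new - ll_old)

# Toy chain for a scalar stiffness-like parameter with true value 2.0.
t = np.linspace(0.0, 1.0, 100)
y = 2.0 * np.sin(t)                      # noise-free "measurement"
model = lambda k: k * np.sin(t)
x, ll = 1.0, log_likelihood(1.0, y, model, sigma=0.1)
chain = []
for _ in range(2000):
    x_new = x + 0.05 * rng.standard_normal()   # random-walk proposal
    ll_new = log_likelihood(x_new, y, model, sigma=0.1)
    if mh_accept(ll_new, ll, rng):
        x, ll = x_new, ll_new
    chain.append(x)
posterior_mean = np.mean(chain[1000:])   # samples after the burn-in period
```

After the burn-in period the chain concentrates around the true value, as described for (2) above.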

The Surface of the Likelihood Measures.
To illustrate the problem of "equifinality," the surfaces of the commonly used likelihood measure LL(x(θ)), as in (5), are simulated in identifying the stiffness parameters of a 2-DOF linear dynamic system. The state space of the system is written as

v̇(t) = [[0, I], [−M⁻¹K, −M⁻¹C]] v(t) + [0, −Γ]ᵀ u(t),  (6)

where M, C, and K are the mass, damping, and stiffness matrices, I is the n × n identity matrix, and Γ = [1, 1, ..., 1]ᵀ is the n × 1 position vector. v_1(t) and v_2(t) are the state vectors representing displacement and velocity, respectively, and u(t) is the input of the system. The system output is an acceleration assumed to be contaminated by Gaussian white noise ε_m(t) ∼ N(0, σ_m(t)) (m = 1, ..., N_o); the measured output vector is thus

y_m(t) = ŷ_m(x, t) + ε_m(t).  (7)

The mass and stiffness of each DOF are set to 100 kg and 1000 N/m. The simulation of (7) includes Rayleigh damping (Mita [15]), with the damping ratio of the first two modes (ζ) set to 5%:

C = aM + bK.  (8)

The parameter domain is meshed at 5% deviations of the true values. The output acceleration (acc.) with different noise levels, per (7), was used in the simulation, where the noise level (nl.) was defined as σ_m = nl. × σ_acc. The contour plots of the likelihood measure (5) for the noise-free and noisy scenarios are exhibited in Figure 1.
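The Rayleigh damping coefficients a and b in (8) follow from prescribing the damping ratio in the first two modes. A minimal sketch for the 2-DOF setup above (100 kg, 1000 N/m, ζ = 5%; the shear-building stiffness pattern and all variable names are illustrative assumptions):

```python
import numpy as np

# 2-DOF shear-building matrices: m_i = 100 kg, k_i = 1000 N/m (values from the text).
m, k = 100.0, 1000.0
M = np.diag([m, m])
K = np.array([[2 * k, -k],
              [-k,     k]])

# Undamped natural frequencies from the generalized eigenproblem K v = w^2 M v.
omega_sq = np.sort(np.linalg.eigvals(np.linalg.solve(M, K)).real)
w1, w2 = np.sqrt(omega_sq)

# Rayleigh damping C = a*M + b*K with zeta = 5% in the first two modes:
# zeta_i = a / (2 w_i) + b * w_i / 2  ->  solve the 2x2 linear system for (a, b).
zeta = 0.05
A = np.array([[1.0 / (2 * w1), w1 / 2],
              [1.0 / (2 * w2), w2 / 2]])
a, b = np.linalg.solve(A, [zeta, zeta])
C = a * M + b * K
```

Solving the 2x2 system reproduces ζ = 5% exactly in both modes; intermediate modes of larger systems receive slightly different ratios, which is the usual trade-off of Rayleigh damping.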
From Figure 1, we find that only when the measurement error is ignored is the center of the posterior sample model set (the ML/MAP estimator) unbiased; once the noise is taken into account, the optimal solution with maximum PDF is biased. Moreover, many local optima surround the search neighborhood of the optimal solution. It can be concluded that, when measurement error is considered, the common likelihood measures in (1) and (5) are weak against the problem of equifinality. The bias of the estimator grows with the number of parameter dimensions and with the noise level. It is thus necessary to improve the identified ML/MAP estimator.

The First Two Derivatives of the Likelihood Measure
With a Taylor expansion, the likelihood measure can be written as

L(x(θ)) ≈ L(x(θ*)) + G(x(θ*))ᵀ Δx + (1/2) Δxᵀ H(x(θ*)) Δx,  (9)

where Δx denotes x(θ) − x(θ*). Differentiating (9) with respect to x(θ) and ignoring the higher-order terms, one obtains

G(x(θ)) ≈ G(x(θ*)) + H(x(θ*)) Δx,  (10)

where G(x(θ)) is the first derivative of the likelihood measure, that is, the gradient of L(x(θ)), and H(x(θ)) is its second derivative, the Hessian matrix of L(x(θ)). G(x(θ*)) and H(x(θ*)) are the gradient and Hessian evaluated at the true value x(θ*).
Since G(x(θ_ML)) = 0, it follows from (10) that

x(θ_ML) − x(θ*) = −H(x(θ*))⁻¹ G(x(θ*)).  (11)

As seen in (11), the bias x(θ_ML) − x(θ*) equals the negative product of the inverse Hessian and the gradient at the actual value. Equation (11) is therefore used in this study to formulate the informal likelihood measure. Let β(x(θ)) = x(θ) − x(θ_ML) denote the bias between the MAP/ML estimator and each posterior sample. The proposed Bayesian updating of the posterior samples using the DREAM algorithm is divided into two steps. The first step finds the ML/MAP estimator x(θ_ML). The second step searches for the optimal point x(θ*) of the proposed criterion in the neighborhood of the ML/MAP estimator. The informal likelihood measure can be written as

‖β(x(θ)) − H(x(θ))⁻¹ G(x(θ))‖ < ε,  (12)

where ε denotes the credible range, which can be set by the user as the stopping criterion.
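Equation (11) can be checked numerically on a toy quadratic log-likelihood, for which the Taylor expansion (9) is exact; the matrix A and the point values below are arbitrary illustrative choices, not quantities from the paper:

```python
import numpy as np

# Toy quadratic log-likelihood LL(x) = -0.5 (x - x_ML)^T A (x - x_ML),
# for which (9) is exact, so (11) should hold to machine precision.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # positive definite, so -A is the Hessian
x_ml = np.array([2.0, -1.0])         # the ML estimator (maximizer of LL)

def grad_ll(x):
    """Gradient G(x) = -A (x - x_ML)."""
    return -A @ (x - x_ml)

def hess_ll(x):
    """Hessian H(x) = -A (constant for a quadratic LL)."""
    return -A

# Treat x_star as the actual parameter value and evaluate the right-hand
# side of (11): x_ML - x_star = -H(x_star)^{-1} G(x_star).
x_star = np.array([1.7, -0.8])
bias = -np.linalg.solve(hess_ll(x_star), grad_ll(x_star))
# bias should equal x_ml - x_star, i.e. [0.3, -0.2]
```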

Illustration of the Proposed Likelihood Measure
From (12), it can be found that the extreme points of the likelihood measure L*(x(θ)) are the actual value x(θ*) and the MAP/ML estimator x(θ_ML): the former is the null point of (11), and the latter satisfies G(x(θ_ML)) = 0. Since the MAP/ML estimator x(θ_ML) has already been obtained in the first step, after the distribution of the posterior samples has become stationary, the posterior samples on the Markov chains in the second step will converge into the neighborhood of the actual value under the proposed likelihood measure. The accuracy of the estimator is thus improved, and the uncertainty range of the posterior samples is narrowed around the actual value.
To verify the proposed likelihood measure (12), the surface and contour for the 2-DOF linear system, simulated in the 100% noise scenario exactly as in Section 2.3, are shown in Figure 2. As seen in Figure 2, the accuracy of the posterior samples is improved relative to the MAP estimator of Figure 1(b).

Two Steps of the DREAM Based Updating
The candidate sample is generated by the differential evolution jump

x*(i) = x(i) + (1 + e) γ(δ, d) Σ_{j=1}^{δ} (x(r_1(j)) − x(r_2(j))) + ε,  (13)

where ε is a small random vector drawn from N(0, Σ) and δ is the number of chosen chain pairs. The chain evolves through (13) and (14) and approaches a stationary distribution after a burn-in period. The convergence of the posterior samples is checked by the Gelman-Rubin criterion [16]. The R-statistic, R̂, is computed using the last 50% of the samples in each chain:

R̂ = sqrt(σ̂² / W),  (15)

where n is the number of posterior samples used; W = Σ_{j=1}^{m} s_j² / m, with s_j² the variance of the j-th sequence; the pooled posterior variance is estimated as σ̂² = ((n − 1)/n) W + (1/n) B, with B the between-sequence variance, computed as B = n Σ_{j=1}^{m} (x̄_j − x̄)² / (m − 1).
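The Gelman-Rubin check described above can be sketched as follows for a single parameter; the function and variable names are illustrative, and the last 50% of each chain is used as in the text:

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for one parameter.

    chains: array of shape (n_chains, n_samples); only the last 50%
    of each chain is used, as in the convergence check in the text.
    """
    chains = np.asarray(chains, dtype=float)
    chains = chains[:, chains.shape[1] // 2:]       # keep the last half
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()           # within-chain variance
    B = n * chain_means.var(ddof=1)                 # between-chain variance
    var_hat = (n - 1) / n * W + B / n               # pooled posterior variance
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)

# Well-mixed chains sampling the same distribution: R-hat close to 1.
mixed = rng.standard_normal((4, 4000))
r_mixed = gelman_rubin(mixed)

# Chains stuck at different locations: R-hat far above the 1.2 threshold.
split = np.stack([rng.standard_normal(4000) + 5.0 * i for i in range(4)])
r_split = gelman_rubin(split)
```

The 1.2 threshold used in Step 2 below then simply compares R̂ against 1.2 for every parameter dimension.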

3.3.2. Step 2: Density Updating of the Samples That Satisfy the Proposed Criteria. When R̂ in (15) for each dimension is less than 1.2, the posterior samples on the Markov chain have converged to a stationary distribution. The MAP estimator, x_MAP, is obtained as the sample with the maximum posterior PDF, and the standard deviation of the posterior samples, σ_x(θ), can also be obtained. The algorithm then enters the second step: the gradient and Hessian matrix of the posterior samples within the bounds x_MAP ∓ 3σ_x(θ) are calculated at each iteration.
The proposed informal likelihood measure (12) is used for the updating of the posterior samples, and the estimation procedure stops once the prescribed precision, ε, is satisfied. The gradient and the Hessian matrix at each posterior sample x(θ_i) (i the sample index, over the d parameter dimensions) are approximated by central finite differences, with the diagonal and off-diagonal elements of H(x(θ_i)) given by (16) and (17), respectively.

3.4. Identification Procedures. The procedure of the proposed posterior density estimation is as follows.
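The central-difference approximations referred to as (16) and (17) can be sketched as below. The paper's exact difference formulas are not reproduced here, so this is a generic central-difference scheme with illustrative step sizes, checked against a function whose derivatives are known:

```python
import numpy as np

def fd_gradient(f, x, h=1e-5):
    """Central-difference gradient, one entry per parameter dimension."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def fd_hessian(f, x, h=1e-4):
    """Central-difference Hessian: diagonal and off-diagonal entries."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    f0 = f(x)
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        H[i, i] = (f(x + ei) - 2 * f0 + f(x - ei)) / h ** 2
        for j in range(i + 1, n):
            ej = np.zeros(n); ej[j] = h
            H[i, j] = H[j, i] = (f(x + ei + ej) - f(x + ei - ej)
                                 - f(x - ei + ej) + f(x - ei - ej)) / (4 * h ** 2)
    return H

# Check against f(x) = x0^2 * x1, whose derivatives are known analytically.
f = lambda x: x[0] ** 2 * x[1]
x0 = np.array([1.5, -2.0])
G = fd_gradient(f, x0)   # analytic: [2*x0*x1, x0^2] = [-6.0, 2.25]
H = fd_hessian(f, x0)    # analytic: [[2*x1, 2*x0], [2*x0, 0]] = [[-4, 3], [3, 0]]
```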
Step 1. Use the LHS method to generate N_s sequences for the initial states of the Markov chains, respecting the prescribed limits of the search space. The likelihood measure of each sample is obtained by (5).
Step 2. Update the posterior samples of the Markov chains by the mutation strategy (13) and the crossover probability (14). Calculate the density of the updated samples.
Step 3. The Metropolis acceptance rule (4) is used to choose the accepted posterior samples.
Step 4. Return to Steps 2 and 3; after the burn-in period, calculate the convergence criterion (15) for each dimension.

Step 5. Calculate the gradient and Hessian matrix at each sample within the interval [x_MAP ∓ 3σ_x(θ)] by (16) and (17).
Step 6. The informal likelihood measure (12) is used for the transition of the posterior samples until the predefined stop condition is satisfied.
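The Latin hypercube initialization in Step 1 can be sketched as follows; the 0.5-2.0 search range mirrors the one used in the numerical study, and the function and variable names are illustrative assumptions:

```python
import numpy as np

def latin_hypercube(n_chains, lo, hi, rng):
    """One sample per stratum in every dimension: stratify [0, 1) into
    n_chains bins, draw a point in each bin, then shuffle bins per dimension."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    d = lo.size
    u = (rng.random((n_chains, d)) + np.arange(n_chains)[:, None]) / n_chains
    for j in range(d):                       # independent shuffle per dimension
        u[:, j] = u[rng.permutation(n_chains), j]
    return lo + u * (hi - lo)

# 20 chains over a 2-parameter search domain of 0.5-2.0 times a
# hypothetical true stiffness of 1000 N/m.
rng = np.random.default_rng(3)
x0 = latin_hypercube(20, lo=[500.0, 500.0], hi=[2000.0, 2000.0], rng=rng)
```

Compared with purely random starts, this guarantees every stratum of each parameter's search range seeds exactly one chain.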

Numerical Study
4.1. Identification of a 5-DOF LTI System. A numerical simulation of a 5-DOF LTI system was carried out to verify the proposed method. The structural system is simulated as in (6) and the measured signal as in (7). The input was an El-Centro wave lasting 40 s, and the sampling frequency was 100 Hz (Figure 3). The measurement noise of the 5th DOF in the 100% noise level scenario is shown in Figure 3. Table 1 lists the structural properties of the dynamic system. The influence of the limited availability of measurements on the proposed method is also assessed in this study. In the "full output" scenario, measurements of all DOFs are available, whereas in the "partial output" case only data from DOFs 3 and 5 are available. The mass is assumed to be known; hence an n-DOF system with m available measurements is described by a model set whose parameter vector of interest collects the unknown stiffnesses. The parameters of the DREAM algorithm were set as follows: the number of Markov chains (N_c) was 20, the crossover probability (CR) was 0.85, and the number of sample pairs (δ) was 5. The search domain was taken to be 0.5-2.0 times the true value. The identified results are exhibited in Table 2.
From Table 2 it can be concluded that the proposed method finds the optimal value x(θ*) in both the "full outputs" and "partial outputs" scenarios, even at large noise levels. Ignoring measurement noise, the proposed method identifies the structural parameters without bias; the coefficient of variation (cov.) of the posterior samples is identified as zero, which demonstrates that the inverse problem in the noise-free scenario is deterministic. When the acceleration of every DOF is measured, the maximum relative error ranges from 0.494% in the 30% noise case to 0.726% in the 100% noise case. When only the measurements of the 3rd and 5th DOFs are available, the maximum relative error ranges from 0.701% in the 30% noise case to 1.044% in the 100% noise case. The credible range of the posterior samples is described by the coefficient of variation of the MC sequences. Clearly, the parametric uncertainty grows with increasing measurement error. In the "full outputs" scenario, the maximum coefficient of variation (cov.) of the posterior samples ranges from 0.897% at the 30% noise level to 2.397% at the 100% noise level; correspondingly, in the partial outputs scenario, the maximum cov. rises from 2.423% to 4.197%.
The convergence of the MC samples for each structural parameter in the partial output scenario with 100% noise is shown in Figure 4. The probability density of the posterior samples in the Markov chain approaches stationarity after 1500 iterations, as is clearly shown in Figure 4(b). The proposed likelihood measure, used for the transition of posterior samples to find the optimal x(θ*), takes effect after 2500 iterations. The posterior uncertainty range assuring 95% reliability can be obtained from the posterior samples of the model class, which express the plausibility of each I/O system. Figure 5 shows the 95% uncertainty range of the response with measurement errors at each time step, obtained by incorporating the identified standard deviation of the prediction error (the initial 10 s of the time history are shown). The percentage of the response with 100% measurement error falling within the uncertainty range that accounts for prediction error is 94.76%. In Table 2, "Error" and "Cov." (the ratio of standard deviation to mean) are both given in %.

The Identification of a 10-DOF LTI System.
To show the advantage of the proposal, the identification results of a 10-DOF LTI system using the improved method of (12) are compared with the estimates obtained with the formal log-likelihood measure (5). For simplicity, the mass of each floor is assumed to be 50 kg, the stiffness of each DOF is 5000 N/m, and the damping ratio of the first two modes is 0.05. The input excitation is the same as that used in the 5-DOF simulation. White noise is added to the measured response simulated by (7), with noise levels of 10%, 30%, and 100%. In the partial measurement scenario, only the even DOFs (2nd, 4th, 6th, 8th, and 10th) are assumed to be available for Bayesian updating. The results are shown in Figure 6.
In the "full outputs" case, the maximum relative error of x_MAP using the proposed method ranges from zero (noise-free) to 0.617% at the 10% noise level, 1.175% at 30%, and 1.766% at 100%. Correspondingly, the maximum relative error of x_MAP using the traditional likelihood measure increases from zero in the noise-free case to 0.754% at 10%, 1.999% at 30%, and 6.491% at 100% noise. The improvement is also clear in the partial outputs scenario: the maximum relative error of x_MAP using the proposed method grows from zero in the noise-free case to 1.479% at 10%, 2.175% at 30%, and 3.543% at 100% noise, whereas with the traditional method it rises from zero to 2.621%, 5.974%, and 8.451%, respectively. The minimum and maximum relative errors of the posterior sample means in the model set are thus all reduced by the proposed method. Moreover, Figure 6 shows that with (12) the parametric uncertainty (the cov.) becomes smaller than that obtained with the formal log-likelihood measure (5). For instance, at the 100% noise level with partial outputs, the maximum coefficient of variation of the MC samples using the proposed method is 6.198%, compared with 11.09% for the traditional method. Comparing the identification results in Figure 6, it is clear that the accuracy of the estimator using the proposed likelihood measure is improved.

Conclusions
To improve the accuracy of the MAP estimator obtained by the traditional likelihood measure in DREAM based structural identification, the gradient and Hessian of the log-likelihood measure are used to formulate a generalized likelihood measure for the density transition of the Markov chains. Compared with the formal likelihood function, both the relative error of the MAP estimator and the uncertainty range of the posterior samples become smaller with the proposed method. Numerical simulations of 5-DOF and 10-DOF LTI systems demonstrated its effectiveness in solving identification problems with high noise levels and missing measurement data. In conclusion, DREAM based Bayesian estimation with the proposed improvement has the potential to resolve the problem of "equifinality," especially under large measurement error.

Figure 3 :
Figure 3: Input and simulated measurement error.

Figure 6 :
Figure 6: Identified results using the proposed and traditional likelihood measures (gray bars: traditional likelihood measure; blue bars: proposed likelihood measure).
The updated samples, x*, are produced by the jumping distribution J(x_1, x_2), the probability of moving to x_2 given a previous value x_1. The restriction on the jumping distribution is that the transition probability be symmetric, J(x_1, x_2) = J(x_2, x_1). The acceptance ratio between the updated candidate x* and the current posterior sample x^{i−1} is α = p(x*)/p(x^{i−1}). If α > 1, the candidate sample x* is accepted; if the jump decreases the density (α < 1), the candidate is accepted with probability α, and otherwise the update is rejected and the current sample x^{i−1} is kept. In (13), r_1(j) and r_2(j) are distinct random integers chosen from the set {1, 2, ..., i − 1, i + 1, ..., N_c}. The term I_d signifies the d-dimensional identity matrix, and e signifies a small random vector drawn from a uniform distribution to ensure the ergodicity of the samples on the Markov chain. The scaling factor γ is determined by the values of δ and d, where d is the parameter dimension. The DREAM algorithm also applies a differential evolution crossover strategy in the d dimensions of the current samples x_i(t): the updated posterior sample x_i(t+1) is generated from a trial sample produced by the crossover of (14).

Table 2 :
Identified results of structural parameters.