Structural Health Monitoring under Nonlinear Environmental or Operational Influences

Vibration-based structural health monitoring is based on detecting changes in the dynamic characteristics of the structure. It is well known that environmental or operational variations can also influence the vibration properties. If these effects are not taken into account, they can result in false indications of damage. If the environmental or operational variations cause nonlinear effects, they can be compensated for using a Gaussian mixture model (GMM) without measurement of the underlying variables. The number of Gaussian components can also be estimated. For the local linear components, minimum mean square error (MMSE) estimation is applied to eliminate the environmental or operational influences. Damage is detected from the residuals after applying principal component analysis (PCA). Control charts are used for novelty detection. The proposed approach is validated using simulated data and the identified lowest natural frequencies of the Z24 Bridge under temperature variation. Nonlinear models are most effective if the data dimensionality is low. On the other hand, linear models often outperform nonlinear models for high-dimensional data.


Introduction
In structural health monitoring (SHM), changes in damage-sensitive features are an indication of damage. However, other sources of deviation are often present, for example, environmental or operational variability. If these effects are not taken into account, they can result in false identifications of damage or a loss of sensitivity to minor damage. It is important to distinguish between the two sources of changes in the dynamic characteristics. One option is to make a physical model of the different environmental or operational phenomena, but this can be expensive and inaccurate. An alternative is to include the normal variability in the training data and build a model based solely on the data. Using multivariate statistics, the environmental or operational effects can be eliminated even without measuring the underlying variables (see [1] and the references therein). A third source of change in the monitoring data is sensor fault. Kullaa [1] proposed a unified model to distinguish between the three sources of changes in a monitoring system.
Most models assume linear correlation between the measured variables or features. However, the environmental or operational variations often cause nonlinear effects. For example, as the temperature falls below zero, its influence on the natural frequencies can change abruptly. This often also results in nonlinear correlation between the features, especially if the data dimensionality is low. On the other hand, a linear model may be sufficient with a large data dimensionality, because the correlation structure may become linear [1]. There are only a few studies of nonlinear models. Kullaa [2] used the mixture of factor analyzers model [3] to compensate for the nonlinear effects. A similar approach was used by Yan et al. [4], with local PCA models for local regions in the data space. Sohn et al. [5] used an autoassociative neural network, which can be thought of as a nonlinear PCA [6]. Figueiredo et al. [7] applied the Bayesian approach to a mixture model and the Mahalanobis squared distance for the mixture components.
A nonlinear model is studied in this paper. A Gaussian mixture model (GMM) is proposed in Section 2 to compensate for the nonlinear effects. It is based on a mixture of linear models, each modelling a region in the input space. The approach needs a clustering algorithm to assign each new measurement to the corresponding class. Clustering can be performed independently of the local linear models.

Shock and Vibration
Therefore, clustering is first performed by identifying a Gaussian mixture model, followed by local linear models to eliminate the underlying effects within each class. The number of classes is often unknown but can also be estimated. Minimum mean square error (MMSE) estimation is applied to the local linear models, as described in Section 3. Damage can be detected from the residuals between the data and the model. Damage detection is discussed in Section 4.
The first applications in Section 5 are numerical studies, in which the objective is to validate the proposed approach. Section 5.3 shows the experimental results of the Z24 Bridge, in which the natural frequencies varied due to temperature. Finally, concluding remarks are given in Section 6.

Gaussian Mixture Model (GMM)
Let x be the multivariate measurement data (also subsequently called variables), which can be time series (e.g., accelerations or strains) from a simultaneously sampled sensor network, or a feature vector comprising identified dynamic properties of the structure (e.g., natural frequencies or mode shapes). Nonlinear data are not normally distributed and cannot be modelled as a single Gaussian distribution. One may instead try a mixture of Gaussian components, in which the distribution is written as a linear superposition of K Gaussian densities in the form [8]

p(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k), (1)

which is called a mixture of Gaussians. Each Gaussian density N(x | μ_k, Σ_k) is called a component of the mixture and has its own mean μ_k and covariance Σ_k. The parameters π_k are called mixing coefficients, which are positive and sum to one. The first step is to identify the model parameters. The difficulty lies in the fact that the data points are unlabeled; that is, it is typically not known which component was responsible for generating each data point. The data labels can be considered as latent variables, and the expectation-maximization (EM) algorithm can be used to identify the mixture model. It is momentarily assumed that the number of components K is known.
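As an illustration (not the author's code), the mixture density above can be evaluated directly with NumPy; the function names `gaussian_pdf` and `gmm_pdf` are ours:

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density N(x | mu, cov) at a single point x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)      # (x-mu)^T cov^{-1} (x-mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def gmm_pdf(x, pis, mus, covs):
    """Mixture density: sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p * gaussian_pdf(x, m, c) for p, m, c in zip(pis, mus, covs))
```

With a single component (one mixing coefficient equal to one), the mixture reduces to an ordinary Gaussian density, which provides a simple sanity check.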
The EM algorithm is iterative and consists of two steps: the E step and the M step. In the expectation step, or E step, the model parameters are held fixed and the posterior probability of component k (the latent variable) given the data point x is evaluated. In the maximization step, or M step, the latent variables are assumed to be known, and the model parameters are obtained by maximizing the log-likelihood function.
A K-dimensional binary random variable z is introduced, having a 1-of-K representation in which a particular element z_k is equal to 1 and all other elements are zero. For an observation x_n, z_nk denotes the kth component of z_n.
The algorithm is outlined as follows. In the E step, the expected value of the indicator variable z_nk under the posterior distribution is

γ(z_nk) = π_k N(x_n | μ_k, Σ_k) / Σ_{j=1}^{K} π_j N(x_n | μ_j, Σ_j). (2)

In the M step, the model parameters are updated to maximize the log-likelihood function, resulting in [8]

μ_k = (1/N_k) Σ_{n=1}^{N} γ(z_nk) x_n,
Σ_k = (1/N_k) Σ_{n=1}^{N} γ(z_nk)(x_n − μ_k)(x_n − μ_k)^T, (3)
π_k = N_k / N,

where N is the number of observations and

N_k = Σ_{n=1}^{N} γ(z_nk). (4)

The log-likelihood is then evaluated:

ln p(X | π, μ, Σ) = Σ_{n=1}^{N} ln [ Σ_{k=1}^{K} π_k N(x_n | μ_k, Σ_k) ]. (5)

The steps are repeated until the log-likelihood converges. It is not guaranteed that the algorithm converges to the global maximum. Therefore, it is often advised to run the algorithm a couple of times with different initial guesses of μ_k and Σ_k to find a satisfactory maximum. An example of convergence to a local maximum is given in Section 5.1. Another problem is that the number of components K is often unknown. To that end, different models can be identified by varying K, and the model resulting in the highest log-likelihood is chosen. In order to avoid overfitting, a penalty term −(1/2) P ln N is added to the log-likelihood [8], where N is the number of training samples and P is the number of model parameters:

P = (K − 1) + KD + KD(D + 1)/2, (6)

where D is the data dimensionality.
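The E and M steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation; the random initialization, the small regularization added to the covariances, and the parameter count P = (K − 1) + KD + KD(D + 1)/2 used in the penalty are our assumptions:

```python
import numpy as np

def log_gauss(X, mu, cov):
    """Row-wise log N(x | mu, cov) for a data matrix X (N x D)."""
    D = len(mu)
    diff = X - mu
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + np.sum(diff * sol, axis=1))

def em_gmm(X, K, n_iter=200, seed=0):
    """EM for a K-component GMM; returns (pi, mu, Sigma, penalized log-likelihood)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)].copy()          # random initial means
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)  # broad initial covariances
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk), computed in log space for stability
        logp = np.stack([np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)
        logp_max = logp.max(axis=1, keepdims=True)
        lse = logp_max[:, 0] + np.log(np.exp(logp - logp_max).sum(axis=1))
        gamma = np.exp(logp - lse[:, None])
        # M step: update pi_k, mu_k, Sigma_k
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    loglik = lse.sum()
    P = (K - 1) + K * D + K * D * (D + 1) / 2   # assumed free-parameter count
    return pi, mu, Sigma, loglik - 0.5 * P * np.log(N)
```

Running `em_gmm` for a range of K values and keeping the model with the highest penalized log-likelihood mimics the model-selection procedure described above.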
Once the model parameters are identified and fixed, the objective is to decide whether new data are generated by the model (undamaged) or by another model (damage). To this end a residual is estimated, which is the difference between the true data point and that estimated by the model:

r = x − x̂, (7)

where

x̂ = Σ_{k=1}^{K} x̂_k p(k | x). (8)

The last term on the RHS of (8) is obtained using Bayes' theorem:

p(k | x) = π_k N(x | μ_k, Σ_k) / Σ_{j=1}^{K} π_j N(x | μ_j, Σ_j), (9)

which is the same as (2). The first term on the RHS of (8) is given by the local linear model, which in this paper is the minimum mean square error (MMSE) estimate for each component [9]:

x̂_k = μ_k + A_k(x − μ_k), (10)

where the coefficient matrix A_k is composed of rows, each estimating the variable corresponding to that row using the remaining variables. Therefore, the diagonal components of A_k are zero. The MMSE estimation is discussed in the next section.
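The residual generation described above can be sketched as follows, assuming the coefficient matrices A_k have already been estimated (Section 3); this is illustrative code, not the author's:

```python
import numpy as np

def log_gauss(x, mu, cov):
    """log N(x | mu, cov) at a single point x."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def gmm_residual(x, pi, mus, covs, A):
    """Residual r = x - xhat, where xhat = sum_k p(k|x) * (mu_k + A_k (x - mu_k))."""
    K = len(pi)
    logp = np.array([np.log(pi[k]) + log_gauss(x, mus[k], covs[k]) for k in range(K)])
    w = np.exp(logp - logp.max())
    w /= w.sum()                                  # posterior p(k | x) via Bayes' theorem
    xhat = sum(w[k] * (mus[k] + A[k] @ (x - mus[k])) for k in range(K))
    return x - xhat
```

With a single component and A set to zero, the estimate collapses to the component mean and the residual is simply the deviation from that mean.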

Local Linear Models Using MMSE Estimation
With enough redundancy, a subset of observation x can be estimated using the remaining variables. Each observation is divided into observed variables v and missing variables u. It is assumed here that u is the ith variable x_i and the remaining variables are collected in vector v:

x = [u, v^T]^T, u = x_i. (11)

The partitioned covariance matrix Σ of the training data is

Σ = [Σ_uu Σ_uv; Σ_vu Σ_vv], (12)

and correspondingly

Γ = Σ^{-1} = [Γ_uu Γ_uv; Γ_vu Γ_vv], (13)

where the precision matrix Γ is defined as the inverse of the covariance matrix Σ and is also written in the partitioned form. A linear minimum mean square error (MMSE) estimate for u | v (u given v) is obtained by minimizing the mean square error (MSE), resulting in [9]

û = μ_u + H(v − μ_v), H = −Γ_uu^{-1} Γ_uv, (14)

where μ_u and μ_v are the means of u and v, respectively, and the error variance is

E[(u − û)²] = Γ_uu^{-1}. (15)

An MMSE model is estimated for each mixture component. For component k, the estimate of variable x_i is given by (14). Then, the ith row of matrix A_k in (10) is composed of the partitioned row matrix H and a zero:

A_k(i, :) = [0 H], (16)

with the zero permuted to the ith position. The partitioning should be clear from (11). The zero element hits the diagonal in A_k, originating from the fact that x_i is not used to estimate itself; only the remaining variables are used. Therefore, the diagonal elements of matrix A_k are all zeros. The other rows of A_k are obtained similarly by estimating each variable in turn using the remaining variables. Matrix A_k is estimated for each mixture component k.
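The row-wise MMSE construction can be sketched compactly: in precision-matrix form, the conditional mean of a Gaussian variable x_i given the remaining variables has coefficients −Γ_ij/Γ_ii, so the whole matrix A for one component follows from a single matrix inversion. A minimal sketch (our function name, not the author's code):

```python
import numpy as np

def mmse_matrix(Sigma):
    """Build A whose i-th row estimates x_i from the remaining variables.

    Row i is -Gamma[i, :] / Gamma[i, i] with the diagonal entry zeroed,
    where Gamma is the precision matrix (inverse covariance)."""
    Gamma = np.linalg.inv(Sigma)
    D = Sigma.shape[0]
    A = np.zeros((D, D))
    for i in range(D):
        A[i] = -Gamma[i] / Gamma[i, i]
        A[i, i] = 0.0                 # x_i is not used to estimate itself
    return A
```

For two unit-variance variables with correlation ρ, this reduces to the familiar regression estimate x̂_1 = μ_1 + ρ(x_2 − μ_2).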
To show the relation between (10) and (14), compute the estimate of the ith variable x_i for a fixed mixture component k using (10). For clarity, the component index k is omitted. Consider the ith row of (10):

x̂_i = μ_i + [0 H](x − μ) = μ_i + H(v − μ_v), (17)

which is equal to û in (14).

Damage Detection
Using the mixture model for damage detection introduces an issue of residual scaling, because each class may have a different error variance (15). Therefore, the residual of each variable within each class is divided by the corresponding standard deviation, which is the square root of (15). In addition, the data dimensionality may be too high for statistical reliability (the curse of dimensionality). Therefore, the first principal component scores [10] of the residual (7) are used for damage detection.
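The scaled residuals are then projected onto the first principal component. A minimal sketch of this scoring step via SVD (our code, not the author's):

```python
import numpy as np

def first_pc_scores(R):
    """First principal component scores of a residual matrix R (N x D)."""
    Rc = R - R.mean(axis=0)                       # center the residuals
    _, _, Vt = np.linalg.svd(Rc, full_matrices=False)
    return Rc @ Vt[0]                             # project onto the leading direction
```

When the residuals vary essentially along one direction, the scores capture almost all of that variation in a single scalar per observation, which is what the control chart then monitors.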
Control charts [11] are used for damage detection. The control chart used in this study is the Shewhart chart [11], and the plotted variable is the subgroup mean of successive observations. The robustness of damage detection is believed to increase because (1) additional variability due to environmental or operational influences can be removed, in this paper using a nonlinear model; (2) PCA is applied to the residuals, avoiding the curse of dimensionality; and (3) control charts utilize averaging for better statistical reliability.
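A Shewhart chart of subgroup means can be sketched as follows; the 3-sigma limits estimated from the in-control subgroup means are one common convention and not necessarily the exact chart design used in the paper:

```python
import numpy as np

def shewhart_means(scores, subgroup, n_train_groups):
    """Shewhart chart of subgroup means: returns (means, LCL, UCL).

    The center line and 3-sigma limits are estimated from the first
    n_train_groups (in-control) subgroups."""
    m = len(scores) // subgroup
    means = scores[:m * subgroup].reshape(m, subgroup).mean(axis=1)
    center = means[:n_train_groups].mean()
    s = means[:n_train_groups].std(ddof=1)
    return means, center - 3 * s, center + 3 * s
```

A subgroup mean falling outside the control limits is flagged as out of control, i.e., a potential damage indication.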

Experimental Results
The proposed nonlinear model and the subsequent SHM functions are applied to two numerical studies and the experimental data of the Z24 Bridge.

Five Gaussian Components.
The first numerical example is a mixture of five Gaussian components in a two-dimensional space. Each component has 10,000 data points. This example was chosen because the model assumptions are satisfied. In addition to damage detection, the objective is to test the model identification performance and the number of components selected by the algorithm. The data were created as follows. The data dimensionality was two and the number of components was five. For each component k, the components of the mean vector μ_k were sampled from a uniform random distribution between −10 and 10. The covariance matrix was generated by first generating the variances of the principal directions, resulting in a diagonal covariance matrix:

Σ_0 = diag(σ_1², σ_2²), (18)

where σ_1² and σ_2² were uniform random variables, σ_1² varying between 1 and 2 and σ_2² varying between 0.01 and 0.5. This diagonal covariance matrix was then rotated to a random orientation θ, resulting in the covariance matrix Σ_k:

Σ_k = R Σ_0 R^T, (19)

where

R = [cos θ −sin θ; sin θ cos θ]. (20)

The data were then generated by sampling from a multivariate Gaussian distribution:

x ~ N(μ_k, Σ_k). (21)

Once data from each component were generated, all data were concatenated and a random permutation was applied to randomize the data labels.
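The data generation described above can be sketched as follows; treating the sampled quantities as variances of the principal directions is our reading of the text, and the seed is arbitrary:

```python
import numpy as np

def random_component(rng, n=10000):
    """One Gaussian component: random mean in [-10, 10]^2 and a diagonal
    covariance diag(s1, s2) rotated by a random angle theta."""
    mu = rng.uniform(-10, 10, size=2)
    s1 = rng.uniform(1, 2)        # variance of the first principal direction
    s2 = rng.uniform(0.01, 0.5)   # variance of the second principal direction
    theta = rng.uniform(0, 2 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = R @ np.diag([s1, s2]) @ R.T
    return rng.multivariate_normal(mu, Sigma, size=n)

rng = np.random.default_rng(0)
X = np.vstack([random_component(rng) for _ in range(5)])
X = X[rng.permutation(len(X))]    # randomize the data labels
```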
The data are plotted in Figure 1. The training data are the first 10,000 data points, shown in Figure 1(a), which were used to identify the model. Because the number of components is often unknown, different models were identified, varying the number of components between 1 and 10. The log-likelihoods with the penalty term are plotted in Figure 2(a) for the different numbers of components. The maximum was correctly found with a five-component model. It should be noted that sometimes the components were not correctly identified, and the solution converged to a local maximum (Figure 2(b)). Therefore, it is suggested that the identification is repeated until a satisfying model is obtained, and the model with the highest log-likelihood is selected.
Damage was an equal shift in mean for all components. A bias vector [0.5, −0.25]^T was added to each data point.
The damaged data were the last 25,000 data points, plotted in Figure 1(b) together with the identified model for the training data. The shift in mean can be visually observed. The residuals were estimated for all data points, and the first principal component scores were used for the control chart. The Shewhart chart was designed with a subgroup size of 100 and in-control samples 1-10,000. The shift in mean was clearly detected (Figure 3).

Three Piecewise Linear Components.
The second example is more realistic, in which the data are continuous with piecewise linear correlation between the two monitored variables. The piecewise linear regions are not necessarily Gaussian. Also, the variances are different in each region. The data were created as follows. The data dimensionality was two and the number of piecewise linear components was three. Variable x_1 was uniformly distributed between 0 and 1. Variable x_2 was a piecewise linear function of x_1:

x_2 = a_j x_1 + b_j, (22)

where the parameters a_j and b_j and their validity regions are given in Table 1. Gaussian noise was added to the variables, with the standard deviations within each component shown in Table 1.
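A sketch of this data generation; since the Table 1 values are not reproduced here, the slopes, intercepts, region boundaries, and noise levels below are hypothetical stand-ins chosen only to keep x_2 continuous at the region boundaries:

```python
import numpy as np

# Hypothetical stand-ins for Table 1 (not the paper's actual values).
breaks = np.array([0.0, 0.4, 0.7, 1.0])   # validity regions of the three components
a = np.array([1.0, -0.5, 2.0])            # slopes a_j
b = np.array([0.0, 0.6, -1.15])           # intercepts b_j (continuity at 0.4 and 0.7)
noise = np.array([0.02, 0.05, 0.01])      # per-region noise standard deviations

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 20000)
j = np.clip(np.searchsorted(breaks, x1, side="right") - 1, 0, 2)  # region index
x2 = a[j] * x1 + b[j] + rng.normal(0.0, noise[j])                  # piecewise linear + noise
X = np.column_stack([x1, x2])
```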
The test data came from a limited region of component 2. The first half of the test data was healthy and the second half damaged. The data are plotted in Figure 4. They consist of 20,000 data points in a two-dimensional space. The first half is randomly distributed over all three regions (Figure 4(a)), but the last 10,000 data points are confined to the middle region (Figure 4(b)).
The training data include a larger variability than the test data.This is also more realistic as the training typically consists of monitoring under a full range of environmental or operational conditions, while the test data often come from a limited number of measurements at more or less constant conditions.
The training data were the first 5,000 data points. The model identification suggested 7 Gaussian components (Figure 5(a)), which are plotted in Figure 4.
Data points 10,001-15,000 are from the undamaged case, and data points 15,001-20,000 are from the damaged case with a shift in mean, shown in Figure 4(b) together with the identified model for the training data.
The residuals were estimated for all data points, and the first principal component scores were used for damage detection. The Shewhart chart was designed with a subgroup size of 100 and in-control samples 1-5,000. Damage was clearly detected with no false alarms (Figure 5(b)).

The Z24 Bridge.
The data in the last case are the four lowest identified natural frequencies of the Z24 Bridge (see [12] for details), shown in Figure 6. Their pairwise correlation is plotted in Figure 7. It was reported that the frequencies varied considerably due to environmental effects, and they can be seen to be nonlinearly correlated. The physical reason was the different behaviour of the bridge below and above the freezing point.
Progressive damage test scenarios were introduced: settlement of the pier, spalling of concrete, landslide at the abutment, failure of a concrete hinge, failure of anchor heads, and rupture of tendons [12]. The first damage was introduced around measurement number 3517, shown with a vertical dashed line in Figure 6.
The training data were the first 3,000 samples, shown in blue in Figure 7 and delimited with another vertical dashed line in Figure 6. The training algorithm suggested 6 Gaussian components (Figure 8). After identifying the Gaussian mixture model, the residuals were estimated for all data points, and the first principal component scores were used for damage detection. The Shewhart control chart was designed with a subgroup size of 4 and in-control samples 1-3,000. Damage was clearly detected (Figure 9(a)). The control limits are probably too tight, resulting in several false alarms. In particular, a few false indications can be observed just prior to damage. Some activity was reported as the settlement system was installed [12], which may have changed the natural frequencies. Another control chart is shown in Figure 9(b), obtained using a corresponding linear model (with one component only). Comparing it with the chart in Figure 9(a), it can be concluded that the GMM outperformed the linear model in this case.

Conclusion
A Gaussian mixture model was proposed to eliminate nonlinear environmental or operational influences from structural health monitoring data. The main advantages are that (1) measurement of the underlying variables is not necessary, (2) the number of Gaussian components can be estimated, (3) the GMM can be identified independently of the local linear models, and (4) it is a data-based method; no finite element model is needed. The main disadvantages are that the EM algorithm is not guaranteed to find the global maximum and that the training may be quite slow. Nonlinear models are most effective if the data dimensionality is low. Linear models often outperform nonlinear models for high-dimensional data [1]. The number of environmental or operational variables is usually relatively small. Therefore, their influences on the data are virtually confined to a low-dimensional subspace, and a linear analysis is capable of removing this subspace from the subsequent analysis, thus eliminating the environmental or operational effects from the data.
Once the GMM was identified, MMSE estimation was applied to each component to take into account the local linear correlation. The Mahalanobis distance or a whitening transformation [13] could also be applied to the linear components.
The question of how small a damage can be detected was not addressed in this study. Detection performance depends on the signal-to-noise ratio (SNR), in which the signal is the shift or variance change due to damage, and the noise is often the measurement error or, more generally, everything that cannot be explained by the model. The SNR should be as high as possible. In this study, the noise was decreased by building an accurate model for the nonlinear data.
Damage detection comprises several functions and models, many of which are classical. Many methods have been applied to SHM by the author and other researchers. This paper focused on residual generation when the data are nonlinearly correlated. The remaining functions were merely referred to by name.

Figure 1: Training data (a) and test data (b) with a change in mean. The identified GMM model is shown in red.

Figure 2: Log-likelihood with the penalty term (a) and an identified GMM model (in red) converged to a local maximum (b).

Figure 4: Training data (a) and test data (b) with a change in mean. The identified GMM model is shown in red.

Figure 7: Correlation between the four lowest natural frequencies of the Z24 Bridge. Blue symbols indicate the training data.

Table 1: Parameters of the three linear components (22).